Track A: Autoresearch
Can a small model autonomously search, read, and synthesise sources into a structured research output — without human intervention at each step?
Track B: Task agents
Structured multi-step task completion: planning, tool use, and error recovery. How far can a 4B-parameter model go before scaffolding fails to compensate?
Track C: Scaffolding strategies
Systematic comparison of prompting, memory, tool access, and loop design. Which scaffolding choices matter most for small-model agentic reliability?
Track D: Framework evaluation
Head-to-head comparison of agentic frameworks (LangChain, LlamaIndex, smolagents, custom) on identical tasks with identical models.