Can small models think in loops? We're finding out. Modern agentic AI assumes frontier models. We're testing whether a 4B model with the right scaffolding can run useful agent loops — on consumer hardware, for free.
The idea is simple: give a small model a task, let it modify code, check the result, and loop. The real question is whether the right scaffolding can make a 4B model useful in this loop. We're building the tools to answer that.
The agent reads state — a metric, a file, a test result — and decides what to do next. This is where scaffolding (CoT, RE2, constrained choices) makes the biggest difference for small models.
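For illustration, two of these scaffolds can be expressed as plain prompt transforms. This is a minimal sketch; the function names and template wording are assumptions, not LocoAgente's actual prompts:

```python
# Illustrative prompt scaffolds (wording is hypothetical, not the
# project's real templates).

def cot(task: str) -> str:
    """Chain-of-thought: ask the model to reason step by step."""
    return f"{task}\n\nLet's think step by step."

def re2(task: str) -> str:
    """RE2 (re-reading): present the task twice before answering."""
    return f"{task}\n\nRead the question again: {task}"

prompt = re2("Which test is failing, and why?")
```

The point of scaffolds like these is that they cost nothing at train time; they only change what the model sees before it decides.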
The agent modifies code, runs a command, or calls a tool. The action space is deliberately constrained — fewer choices mean fewer errors, and errors compound in loops.
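A constrained action space can be as simple as a closed set of tokens that free-form model output is mapped onto. A sketch, with hypothetical action names; anything outside the set becomes a safe no-op instead of being executed:

```python
# Deliberately small action space (action names are illustrative).
ACTIONS = {"edit_file", "run_tests", "run_command", "read_file", "stop"}

def parse_action(model_output: str) -> str:
    """Map free-form model output onto the closed action set.

    Unknown or empty output falls back to "noop" rather than
    letting a hallucinated action reach the executor.
    """
    stripped = model_output.strip()
    token = stripped.split()[0].lower() if stripped else ""
    return token if token in ACTIONS else "noop"
```

The fallback is the key design choice: a small model will sometimes emit prose instead of an action, and in a loop that mistake must be absorbed, not compounded.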
Did it improve? The agent checks, then loops back. Fixed time budgets and automatic metrics make every experiment directly comparable.
The sensible approach is to use frontier models for agentic tasks. We'd rather find out what's possible with a 4B model and the right scaffolding. Maybe that's loco. We think the answer matters.
A reliable agent with 5 possible actions beats an unreliable agent with 500. Start with the smallest viable action space and expand only when you have the evidence.
Every experiment is a semester project. Students don't just run agents — they study why scaffolding helps, where models fail, and what that tells us about reasoning.
If small models can't do this, we publish that. Understanding the capability floor is as valuable as pushing through it. No hype. Just data.
The frontier baseline costs API money. Every other experiment runs on your own hardware, for free. That's the same LocoLLM advantage applied to agent research.
If any of these sound like you, welcome to the club.
You want to know which prompting and reasoning strategies actually matter for multi-step AI tasks. Not vibes. Systematic comparison with real metrics.
You want to build useful AI agents without frontier API costs. If a 4B model can handle your task loop, your agent runs free forever.
You want a capstone project that's genuinely novel. Nobody has published systematic small-model agent scaffolding research. Your results will be among the first.
You already know what adapters and routing can do for single queries. Now you want to know if the same strategies work when the model has to think across multiple steps.
A task comes in, the agent observes state, plans with scaffolding, acts, and evaluates. If the metric improved, it keeps the change. If not, it reverts. The loop continues until the goal is met or the budget is exhausted.
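That loop can be sketched in a few lines of Python. The function names and signatures here are hypothetical, not LocoAgente's API: assume `observe()` returns a numeric metric, `plan(state)` proposes an action, `act(action)` applies it, and `revert()` undoes the last change.

```python
import time

def agent_loop(observe, plan, act, revert, goal_met, budget_s=60.0):
    """Observe -> plan -> act -> evaluate, keeping only changes that
    improve the metric, until the goal is met or the budget runs out."""
    best = observe()
    deadline = time.monotonic() + budget_s  # fixed time budget
    while time.monotonic() < deadline and not goal_met(best):
        act(plan(best))
        score = observe()
        if score > best:
            best = score   # the change improved the metric: keep it
        else:
            revert()       # otherwise undo it and try again
    return best
```

The keep-or-revert step is what makes experiments comparable: every run ends with the best metric it reached inside the same fixed budget.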
LocoAgente is a young project. We're not pretending otherwise. Here's what exists today and where we're headed.
Core question defined, experiment matrix designed, three research tracks planned. The framework exists.
Done: Porting Karpathy's autoresearch loop to work with local 4B models. The simplest possible agent — one file, one metric, automatic evaluation.
Building: Systematic comparison: bare model vs CoT vs RE2 vs voting vs constrained actions vs agent adapter. Which strategies compound in loops?
Next: Task-specific agent adapters contributed to the LocoLLM ecosystem. Multi-agent coordination on consumer hardware via LocoConvoy.
Vision: LocoAgente is a collaborative research project. Every experiment, adapter, and analysis makes the whole effort stronger. The barrier to entry is low. The ceiling is high.
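For a sense of scale, the comparison on the roadmap is small enough to enumerate directly. A toy sketch, crossing the six scaffolding strategies with a few hypothetical task domains:

```python
from itertools import product

# Strategy names follow the roadmap; the task names are hypothetical.
STRATEGIES = ["bare", "cot", "re2", "voting", "constrained", "adapter"]
TASKS = ["code_fix", "data_analysis", "docs"]

# Every (strategy, task) cell runs under the same fixed budget,
# so results are directly comparable across the matrix.
matrix = list(product(STRATEGIES, TASKS))
```

Eighteen cells is a weekend of compute on consumer hardware, which is exactly why each comparison can be a self-contained student project.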
Pick a scaffolding strategy, run the experiment matrix, publish results. Each comparison is a self-contained research contribution.
Use LocoLLM's adapter pipeline to train a specialist agent adapter. Code modification, data analysis, documentation — pick your domain.
We'll generate experiment data faster than we can analyse it. Bring your statistics and visualisation skills.
Every experiment is a potential paper. Scaffolding for small-model agents is unstudied territory. Your results will be among the first published.
If a project interests you but you're not sure you have the skills, that's probably the right project. The one that stretches you is the one you'll learn the most from.
LocoAgente is a School of Management and Marketing initiative at Curtin University. Whether you're a student looking for a capstone project, a researcher interested in collaboration, or just curious — we'd love to hear from you.
Project Lead: Michael Borck