Yes, we're a little loco · Research Project · MIT License

LocoAgente

Can small models think in loops? We're finding out. Modern agentic AI assumes frontier models. We're testing whether a 4B model with the right scaffolding can run useful agent loops — on consumer hardware, for free.

Get Started · View on GitHub
terminal
$ loco-agente run --task autoresearch --scaffolding re2+cot
[loop 1] modified train.py → val_bpb: 1.842
[loop 2] modified train.py → val_bpb: 1.831 ↓ kept
[loop 3] modified train.py → val_bpb: 1.845 ↑ reverted
$ loco-agente compare --baseline frontier --local re2+cot
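The `--scaffolding re2+cot` flag above composes two prompting strategies: RE2 (re-reading the task) and chain-of-thought. As an illustration only — the actual harness isn't shown on this page, so the function names and prompt wording below are assumptions — each strategy can be modeled as a function on the prompt, and composed strategies are just function composition:

```python
# Illustrative prompt transforms; LocoAgente's real harness is not shown
# here, so names and wording are assumptions.
def bare(task: str) -> str:
    return task

def cot(task: str) -> str:
    # Chain-of-thought: cue the model to reason before answering.
    return task + "\nThink step by step, then give your answer."

def re2(task: str) -> str:
    # RE2 ("re-reading"): repeat the task so the model reads it twice.
    return f"{task}\nRead the question again: {task}"

# Strategies compose: re2+cot applies RE2 first, then the CoT cue.
STRATEGIES = {
    "bare": bare,
    "cot": cot,
    "re2": re2,
    "re2+cot": lambda task: cot(re2(task)),
}
```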
How It Works

Observe. Act. Evaluate. Loop.
Sounds loco. Let's find out.

The idea is simple: give a small model a task, let it modify code, check the result, and loop. The real question is whether the right scaffolding can make a 4B model useful in this loop. We're building the tools to answer that.

1

Observe & Plan

The agent reads state — a metric, a file, a test result — and decides what to do next. This is where scaffolding (CoT, RE2, constrained choices) makes the biggest difference for small models.

2

Act & Execute

The agent modifies code, runs a command, or calls a tool. The action space is deliberately constrained — fewer choices mean fewer errors, and errors compound in loops.

3

Evaluate & Iterate

Did it improve? The agent checks, then loops back. Fixed time budgets and automatic metrics make every experiment directly comparable.
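Step 3's keep-or-revert rule fits in a few lines. This is a hypothetical reading of the loop, not LocoAgente's published API, assuming a loss-like metric (such as the val_bpb in the terminal trace) where lower is better:

```python
def evaluate_step(new_metric: float, best_metric: float,
                  lower_is_better: bool = True) -> tuple[str, float]:
    """Keep the latest change only if the tracked metric improved.

    Hypothetical decision rule, not LocoAgente's real API.
    """
    improved = (new_metric < best_metric) if lower_is_better \
        else (new_metric > best_metric)
    if improved:
        return "kept", new_metric       # the change survives; new best
    return "reverted", best_metric      # roll back; best is unchanged
```

Under this rule, a val_bpb move from 1.842 to 1.831 is kept and a move from 1.831 to 1.845 is reverted, matching the trace at the top of the page.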

We're all a little loco here.

The sensible approach is to use frontier models for agentic tasks. We'd rather find out what's possible with a 4B model and the right scaffolding. Maybe that's loco. We think the answer matters.

Constrained Over Capable

A reliable agent with 5 possible actions beats an unreliable agent with 500. Start with the smallest viable action space and expand only when you have the evidence.
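One way to implement "constrained over capable": validate every model output against a small whitelist before executing anything. A minimal sketch, with hypothetical action names (not LocoAgente's actual action set):

```python
from enum import Enum

class Action(Enum):
    # A deliberately tiny, hypothetical action space.
    EDIT_FILE = "edit_file"
    RUN_TESTS = "run_tests"
    READ_METRIC = "read_metric"
    REVERT = "revert"
    STOP = "stop"

def parse_action(model_output: str) -> Action:
    """Map model text onto the whitelist.

    Anything outside the whitelist is an error, never an executed command.
    """
    token = model_output.strip().lower()
    try:
        return Action(token)    # Enum lookup by value
    except ValueError:
        raise ValueError(f"disallowed action: {token!r}") from None
```

The payoff is that a hallucinated action fails loudly at parse time instead of compounding silently across loop iterations.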

🎓

Built to Learn From

Every experiment is a semester project. Students don't just run agents — they study why scaffolding helps, where models fail, and what that tells us about reasoning.

🔬

Honest Results

If small models can't do this, we publish that. Understanding the capability floor is as valuable as pushing through it. No hype. Just data.

💰

Free After Row One

The frontier baseline costs API money. Every other experiment runs on your own hardware, for free. That's the same LocoLLM advantage applied to agent research.

Who It's For

Are you loco enough?

If any of these sound like you, welcome to the club.

🔬

The Scaffolding Researcher

You want to know which prompting and reasoning strategies actually matter for multi-step AI tasks. Not vibes. Systematic comparison with real metrics.

🤖

The Agent Builder

You want to build useful AI agents without frontier API costs. If a 4B model can handle your task loop, your agent runs free forever.

🎓

The Student

You want a capstone project that's genuinely novel. Nobody has published systematic small-model agent scaffolding research. Your results will be among the first.

🧩

The LocoLLM User

You already know what adapters and routing can do for single queries. Now you want to know if the same strategies work when the model has to think across multiple steps.

Architecture

The agent loop.

A task comes in, the agent observes state, plans with scaffolding, acts, and evaluates. If the metric improved, it keeps the change. If not, it reverts. The loop continues until the goal is met or the budget is exhausted.

Task / Goal
Observe State
Plan (LLM + Scaffolding)
Modify Code
Run Command
Call Tool
Evaluate (metric improved?)
Loop continues until goal met or budget exhausted · Scaffolding applied at every step

Early days. Eyes wide open.

LocoAgente is a young project. We're not pretending otherwise. Here's what exists today and where we're headed.

Research Design

Core question defined, experiment matrix designed, three research tracks planned. The framework exists.

Done

Autoresearch Port

Porting Karpathy's autoresearch loop to work with local 4B models. The simplest possible agent — one file, one metric, automatic evaluation.

Building

Scaffolding Experiments

Systematic comparison: bare model vs CoT vs RE2 vs voting vs constrained actions vs agent adapter. Which strategies compound in loops?

Next

Local Agent Ecosystem

Task-specific agent adapters contributed to the LocoLLM ecosystem. Multi-agent coordination on consumer hardware via LocoConvoy.

Vision
Get Involved

Join the loco ones.

LocoAgente is a collaborative research project. Every experiment, adapter, and analysis makes the whole effort stronger. The barrier to entry is low. The ceiling is high.

🔄 Run Experiments

Pick a scaffolding strategy, run the experiment matrix, publish results. Each comparison is a self-contained research contribution.

🧠 Train an Agent Adapter

Use LocoLLM's adapter pipeline to train a specialist agent adapter. Code modification, data analysis, documentation — pick your domain.

📊 Analyse Results

We'll generate experiment data faster than we can analyse it. Bring your statistics and visualisation skills.

📝 Write It Up

Every experiment is a potential paper. Scaffolding for small-model agents is unstudied territory. Your results will be among the first published.

If a project interests you but you're not sure you have the skills, that's probably the right project. The one that stretches you is the one you'll learn the most from.

Contact

Say hello.

LocoAgente is a School of Management and Marketing initiative at Curtin University. Whether you're a student looking for a capstone project, a researcher interested in collaboration, or just curious — we'd love to hear from you.

Project Lead: Michael Borck

Get in Touch · View on GitHub