Published 2025-08-07
We're launching Octofriend, an open-source coding agent that works with GPT-5, Claude, and open-source (and even local) LLMs like GLM-4.5 and GPT-OSS-120B. Octo runs in your terminal: it's like Claude Code, but it works with pretty much any LLM.
Octo has two optional custom-trained models that automatically fix minor diff edit or JSON encoding errors that even very good coding models sometimes run into. Using the autofix models is usually faster and cheaper than retrying the large coding models, and it helps keep the large models from getting confused by their own mistakes. Octo can use these autofix models with any LLM! Naturally, we're open-sourcing the autofix models we trained, all the way down to the training pipelines themselves.
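To make the idea concrete, here's a minimal sketch of the parse-then-repair flow. In Octo, the repair step is a call to the small fix-json model; the stand-in below is just a trailing-comma fix for illustration, and the function and example payload are hypothetical, not Octo's actual code.

```python
import json
import re

def parse_with_fallback(raw: str) -> dict:
    """Try to parse model output as JSON; on failure, attempt a cheap repair.

    In Octo the repair step would be a call to the small fix-json model;
    here a trailing-comma fix stands in for it, purely for illustration.
    """
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        repaired = re.sub(r",\s*([}\]])", r"\1", raw)  # drop trailing commas
        return json.loads(repaired)

# A tool call a strong model might emit with a stray trailing comma:
broken = '{"tool": "edit_file", "path": "main.py",}'
print(parse_with_fallback(broken))  # {'tool': 'edit_file', 'path': 'main.py'}
```

The point is that a tiny, cheap model (or even a heuristic) in this fallback slot means you rarely have to re-prompt the expensive coding model just to fix a syntax slip.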
Octo works especially well with reasoning models. Many coding agents struggle to handle reasoning tokens correctly, especially the encrypted reasoning tokens returned by OpenAI's and Anthropic's APIs. Octo handles those tokens carefully, and we think you'll notice how much smarter it is as a result.
We've been busy for the past few months: we've also shipped improvements to the main Synthetic site, like new model support (including some excellent coding models you can use with Octo), and a free trial.
We're open-sourcing Octofriend, the cute terminal coding agent we've been working on for the past couple of months. Octo works great with GPT-5 and Claude 4, and, of course, we've also made sure it works great with open models we host on Synthetic like zai-org/GLM-4.5 and moonshotai/Kimi-K2.
Octo is sort of like Claude Code, except that it works with just about any model in existence — even LLMs run locally on your own machine. It also has two optional helper models we trained, which automatically fix the minor diff edit inaccuracies and JSON encoding errors that even very good coding models sometimes produce. If you're familiar with the Aider Polyglot benchmarks, you'll recognize that even the top coding models sometimes fail to solve problems due to edit format inaccuracies. Octo should run into far fewer of those problems, because of the autofix models we trained. This helps in a few ways:
Naturally, we're also open-sourcing the models! There are two models we're releasing today:
You can run both of them on Synthetic. If you do, we'd recommend setting the temperature parameter to zero: in the UI, you can do this by clicking the little gear icon below the main text box.
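If you're calling the models over an API rather than the UI, the same advice applies: pin `temperature` to zero in the request. Here's a minimal sketch assuming an OpenAI-compatible chat completions endpoint; the endpoint URL, model id, and prompt below are placeholders, not Synthetic's actual values.

```python
import json
import urllib.request

# Assumed OpenAI-compatible chat completions request; the URL and model id
# are placeholders, not Synthetic's real values.
payload = {
    "model": "fix-json",  # placeholder autofix model id
    "temperature": 0,     # deterministic decoding, as recommended
    "messages": [
        {"role": "user", "content": '{"tool": "edit", "path": "main.py",}'},
    ],
}

request = urllib.request.Request(
    "https://api.example.com/v1/chat/completions",  # placeholder endpoint
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Content-Type": "application/json",
        "Authorization": "Bearer YOUR_API_KEY",  # placeholder credential
    },
)
# response = urllib.request.urlopen(request)  # uncomment with real credentials
```

Greedy decoding makes sense here because the autofix task has one right answer: the repaired JSON or diff, with no creativity wanted.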
We're also open-sourcing the training pipeline: it's in the Octofriend GitHub repo!
These models are tiny Llama 3.1 8B Instruct LoRAs, the same kind anyone can train and run for $0.20/million tokens on Synthetic. They didn't take much GPU time to train — the fix-json model in particular only took 2.5 hours on a single H100 NVL, which costs less than $10 on GPU rental clouds like RunPod.
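For a rough sense of that cost claim: assuming an H100 NVL rents for around $3.50/hour (the exact rate varies by provider and is an assumption here), 2.5 hours of training pencils out comfortably under $10.

```python
# Back-of-the-envelope training cost for the fix-json LoRA.
hours = 2.5
hourly_rate_usd = 3.50  # assumed H100 NVL rental rate; varies by provider
cost = hours * hourly_rate_usd
print(f"${cost:.2f}")  # prints $8.75
```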
One of the things we're excited about is showing how easy it is to train these kinds of small models, and how useful they can be despite their size. We hope you're inspired to train your own, too.
We've shipped a free trial for new users, to help people get a feel for the site before deciding to spend money on talking to LLMs. When they sign up, new users get a few free messages to try out the different always-on LLMs.
UI-only for now! We'll see how it goes and consider opening it up to API usage.
There have been a lot of releases lately! We've added support for quite a few models since our last newsletter:
We also added support for new on-demand architectures like EXAONE 4.0 and Ernie 4.5.
You've probably noticed the site becoming smoother and easier on the eyes over the last couple of months. That's no accident, and we'll keep working on it!
If you made it this far: thanks for reading! We're hard at work on more improvements, including but not limited to:
If you have any thoughts or feedback, please continue to reach out at [email protected]. We appreciate all the emails we've gotten so far!
— Matt & Billy