Published 2025-08-07
We're launching Octofriend, an open-source coding agent that works with GPT-5, Claude, and open-source (and even local) LLMs like GLM-4.5 and GPT-OSS-120B. Octo runs in your terminal: it's like Claude Code, but it works with pretty much any LLM.
Octo has two optional custom-trained models that automatically fix minor diff edit or JSON encoding errors that even very good coding models sometimes run into. Using the autofix models is usually faster and cheaper than retrying the large coding models, and it helps keep the large models from getting confused by their own mistakes. Octo can use these autofix models with any LLM! Naturally, we're open-sourcing the autofix models we trained, all the way down to the training pipelines themselves.
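To make the idea concrete, here's a minimal sketch of the parse-then-repair flow. In Octo, the repair step is a call to the small fix-json model; the stand-in below is just a trailing-comma fix for illustration, and the function and example payload are hypothetical, not Octo's actual code.

```python
import json
import re

def parse_with_fallback(raw: str) -> dict:
    """Try to parse model output as JSON; on failure, attempt a cheap repair.

    In Octo the repair step would be a call to the small fix-json model;
    here a trailing-comma fix stands in for it, purely for illustration.
    """
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        repaired = re.sub(r",\s*([}\]])", r"\1", raw)  # drop trailing commas
        return json.loads(repaired)

# A tool call a strong model might emit with a stray trailing comma:
broken = '{"tool": "edit_file", "path": "main.py",}'
print(parse_with_fallback(broken))  # {'tool': 'edit_file', 'path': 'main.py'}
```

The point is that a tiny, cheap model (or even a heuristic) in this fallback slot means you rarely have to re-prompt the expensive coding model just to fix a syntax slip.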
Octo works especially well with reasoning models. Many coding agents struggle to handle reasoning tokens correctly, especially the encrypted reasoning tokens returned by OpenAI's and Anthropic's APIs. Octo handles those tokens carefully, and we think you'll notice how much smarter it is as a result.
We've been busy for the past few months: we've also shipped improvements to the main Synthetic site, like new model support (including some excellent coding models you can use with Octo), and a free trial.
We're open-sourcing Octofriend, the cute terminal coding agent we've been working on for the past couple of months. Octo works great with GPT-5 and Claude 4, and, of course, we've also made sure it works great with open models we host on Synthetic like zai-org/GLM-4.5 and moonshotai/Kimi-K2.
Octo is sort of like Claude Code, except that it works with just about any model in existence — even LLMs run locally on your own machine. It also has two optional helper models we trained, which automatically fix the minor diff edit inaccuracies and JSON encoding errors that even very good coding models sometimes produce. If you're familiar with the Aider Polyglot benchmarks, you'll recognize that even the top coding models sometimes fail to solve problems due to edit format inaccuracies. Octo should run into far fewer of those problems, because of the autofix models we trained. This helps in a few ways:
Naturally, we're also open-sourcing the models! There are two models we're releasing today:
You can run both of them on Synthetic. If you do, we'd recommend setting the temperature parameter to zero: in the UI, you can do this by clicking the little gear icon below the main text box.
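If you're calling the models over an API rather than the UI, the same advice applies: pin `temperature` to zero in the request. Here's a minimal sketch assuming an OpenAI-compatible chat completions endpoint; the endpoint URL, model id, and prompt below are placeholders, not Synthetic's actual values.

```python
import json
import urllib.request

# Assumed OpenAI-compatible chat completions request; the URL and model id
# are placeholders, not Synthetic's real values.
payload = {
    "model": "fix-json",  # placeholder autofix model id
    "temperature": 0,     # deterministic decoding, as recommended
    "messages": [
        {"role": "user", "content": '{"tool": "edit", "path": "main.py",}'},
    ],
}

request = urllib.request.Request(
    "https://api.example.com/v1/chat/completions",  # placeholder endpoint
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Content-Type": "application/json",
        "Authorization": "Bearer YOUR_API_KEY",  # placeholder credential
    },
)
# response = urllib.request.urlopen(request)  # uncomment with real credentials
```

Greedy decoding makes sense here because the autofix task has one right answer: the repaired JSON or diff, with no creativity wanted.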
We're also open-sourcing the training pipeline: it's in the Octofriend GitHub repo!
These models are tiny Llama 3.1 8B Instruct LoRAs, the same kind anyone can train and run for $0.20/million tokens on Synthetic. They didn't take much GPU time to train — the fix-json model in particular only took 2.5 hours on a single H100 NVL, which costs less than $10 on GPU rental clouds like RunPod.
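For a rough sense of that cost claim: assuming an H100 NVL rents for around $3.50/hour (the exact rate varies by provider and is an assumption here), 2.5 hours of training pencils out comfortably under $10.

```python
# Back-of-the-envelope training cost for the fix-json LoRA.
hours = 2.5
hourly_rate_usd = 3.50  # assumed H100 NVL rental rate; varies by provider
cost = hours * hourly_rate_usd
print(f"${cost:.2f}")  # prints $8.75
```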
One of the things we're excited about is showing how easy it is to train these kinds of small models, and how useful they can be despite their size. We hope you're inspired to train your own, too.
We've shipped a free trial for new users, to help people get a feel for the site before deciding to spend money on talking to LLMs. When they sign up, new users get a few free messages to try out the different always-on LLMs.
UI-only for now! We'll see how it goes and consider opening it up to API usage.
There have been a lot of releases lately! We've added support for quite a few models since our last newsletter:
We also added support for new on-demand architectures like EXAONE 4.0 and Ernie 4.5.
You've probably noticed the site becoming smoother and easier on the eyes over the last couple of months. That's no accident, and we'll keep working on it!
If you made it this far: thanks for reading! We're hard at work on more improvements, including but not limited to:
If you have any thoughts or feedback, please continue to reach out at [email protected]. We appreciate all the emails we've gotten so far!
— Matt & Billy