Published 2025-08-28
On May 1st of this year, Anthropic launched a flat, monthly subscription to use Claude models inside Claude Code. This was groundbreaking for coding use cases: coding agents consume tokens hungrily, and being able to wrap your coding agent spending into a flat subscription fee meant it was much harder to lose your shirt trying to refactor your codebase.
Until now, there wasn't a similar subscription for open-source models, which made them harder to justify for coding agents. So: today we're launching one. It's $20/month for 100 requests per five hours (roughly double Claude's $20/month rate limits), or $60/month for 1,000, which is more requests than Claude's $200/month Max subscription allows, at less than a third of the price.
We're launching subscriptions for all of our always-on LLMs: we support basically every major open-source coding model. Some of our favorite coding LLMs include:
The subscription covers both the UI and the API: we don't put any restrictions on how you use it. It should work out of the box in pretty much any OpenAI-compatible coding agent framework, like:
And pretty much everything else, too. And it'll work in standard LLM clients as well, like OpenWebUI or SillyTavern.
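Because the API is OpenAI-compatible, pointing a client at it is mostly a matter of swapping the base URL. Here's a minimal sketch of the request shape any compatible framework sends under the hood; the model name is a placeholder for illustration, not a guaranteed identifier:

```python
import json

def chat_request(model: str, prompt: str) -> dict:
    # OpenAI-compatible /v1/chat/completions request body; any client
    # that speaks this format should work against the API unchanged.
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }

# Placeholder model name -- substitute whichever open-source
# coding LLM you've subscribed to.
body = chat_request("qwen3-coder", "Write a binary search in Python.")
print(json.dumps(body, indent=2))
```

With the official `openai` Python client, for instance, you'd pass the same `model` and `messages` fields to `client.chat.completions.create(...)` after setting `base_url` to the provider's endpoint.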
In addition to offering better pricing, we think we compare favorably to aggregators like OpenRouter on reliability. Many of the inference backends on OpenRouter serve broken or lobotomized versions of these models, especially when it comes to function calling: in Aider's testing, there's a massive 10-percentage-point gap in coding agent task completion between the official Alibaba Qwen3 Coder API and round-robining across OpenRouter's Qwen3 Coder hosts. We test our LLMs against our coding agent, Octofriend, to make sure the implementations work well before we release them: not all implementations of open-source LLMs are equal.
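Function calling is where broken implementations tend to show themselves, so it's worth seeing what's actually on the wire. Below is a sketch of the OpenAI-style `tools` field a coding agent might send; the tool name, its parameters, and the model name are made up for illustration:

```python
import json

# An OpenAI-compatible tool definition. A model that handles function
# calling correctly should respond with a structured tool call (name +
# JSON arguments) rather than plain text when the tool is relevant.
tools = [{
    "type": "function",
    "function": {
        "name": "read_file",  # hypothetical agent tool
        "description": "Read a file from the repository.",
        "parameters": {
            "type": "object",
            "properties": {
                "path": {"type": "string", "description": "Repo-relative path"},
            },
            "required": ["path"],
        },
    },
}]

request_body = {
    "model": "qwen3-coder",  # placeholder model name
    "messages": [{"role": "user", "content": "Open README.md"}],
    "tools": tools,
}
print(json.dumps(request_body, indent=2))
```

Backends that mangle this schema (or the tool-call responses it elicits) are exactly what drags down coding agent task completion in benchmarks like Aider's.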
A lot of you have used our on-demand LLMs, where you pay per minute to launch arbitrary open-source models and finetunes from Hugging Face repos on reserved GPUs. We can't yet make GPU pricing affordable enough to bundle into a flat subscription... But you can still pay our low per-minute rates (typically single-digit cents per minute) to launch LLMs on demand. We compare very favorably to companies like Modal Labs and Replicate: for 80GB GPUs, for example, we're half the price.
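To make per-minute billing concrete, here's a back-of-the-envelope sketch. The 5¢/minute figure is a hypothetical within the "single-digit cents" range above, not a published price:

```python
def gpu_cost(cents_per_minute: float, minutes: float) -> float:
    """Dollar cost of a per-minute billed GPU session."""
    return cents_per_minute * minutes / 100

# Hypothetical: two hours of testing a finetune at 5 cents/minute.
session = gpu_cost(5, 120)
print(f"${session:.2f}")  # -> $6.00
```

At rates like that, occasional on-demand runs stay cheap, but an always-on model is where the flat subscription wins.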
We're working on getting cheaper hardware running for on-demand models, so that we can include a monthly allotment of on-demand GPU hours in your subscription. Currently we're experimenting with Framework Desktops as a relatively low-cost way to run reasonably-sized models, and we also have orders out for Nvidia DGX Spark boxes. Stay tuned!
You may have noticed the site getting smoother and easier on the eyes over the past few months. That's no accident, and we'll keep working on it. We also launched a nifty token calculator that helps you understand how different LLMs tokenize different strings and makes per-token pricing easier to reason about, although we think subscriptions are simpler for most people!
If you made it this far: thanks for reading! We're hard at work on more improvements, including but not limited to:
If you have any thoughts or feedback, please continue to reach out at [email protected]. We appreciate all the emails we've gotten so far!
— Matt & Billy