Published 2025-08-28
On May 1st of this year, Anthropic launched a flat, monthly subscription to use Claude models inside Claude Code. This was groundbreaking for coding use cases: coding agents consume tokens hungrily, and being able to wrap your coding agent spending into a flat subscription fee meant it was much harder to lose your shirt trying to refactor your codebase.
Until now, there wasn't a similar subscription for open-source models, which made them harder to justify for coding agents. So: today we're launching one. It's $20/month for 100 requests per five hours (roughly double Claude's $20/month rate limits), or $60/month for 1,000, which is more requests than Claude's $200/month Max subscription allows, at less than a third of the price.
We're launching subscriptions for all of our always-on LLMs: we support basically every major open-source coding model. Some of our favorite coding LLMs include:
The subscription covers both the UI and the API: we don't put any restrictions on how you use it. It should work out of the box in pretty much any OpenAI-compatible coding agent framework, like:
And pretty much everything else, too. And it'll work in standard LLM clients as well, like OpenWebUI or SillyTavern.
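Because the API is OpenAI-compatible, pointing a client at it is mostly a matter of swapping the base URL. Here's a minimal sketch of the request shape any compatible framework sends under the hood; the model name is a placeholder for illustration, not a guaranteed identifier:

```python
import json

def chat_request(model: str, prompt: str) -> dict:
    # OpenAI-compatible /v1/chat/completions request body; any client
    # that speaks this format should work against the API unchanged.
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }

# Placeholder model name -- substitute whichever open-source
# coding LLM you've subscribed to.
body = chat_request("qwen3-coder", "Write a binary search in Python.")
print(json.dumps(body, indent=2))
```

With the official `openai` Python client, for instance, you'd pass the same `model` and `messages` fields to `client.chat.completions.create(...)` after setting `base_url` to the provider's endpoint.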
In addition to offering better pricing, we think we compare favorably to aggregators like OpenRouter on reliability. Many of the inference backends on OpenRouter serve broken or lobotomized versions of these models, especially when it comes to function calling: in Aider's testing, there's a massive 10-percentage-point gap in coding agent task completion between the official Alibaba Qwen3 Coder API and round-robining across OpenRouter's Qwen3 Coder hosts. We test our LLMs against our coding agent, Octofriend, to make sure the implementations work well before we release them: not all implementations of open-source LLMs are equal.
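Function calling is where broken implementations tend to show themselves, so it's worth seeing what's actually on the wire. Below is a sketch of the OpenAI-style `tools` field a coding agent might send; the tool name, its parameters, and the model name are made up for illustration:

```python
import json

# An OpenAI-compatible tool definition. A model that handles function
# calling correctly should respond with a structured tool call (name +
# JSON arguments) rather than plain text when the tool is relevant.
tools = [{
    "type": "function",
    "function": {
        "name": "read_file",  # hypothetical agent tool
        "description": "Read a file from the repository.",
        "parameters": {
            "type": "object",
            "properties": {
                "path": {"type": "string", "description": "Repo-relative path"},
            },
            "required": ["path"],
        },
    },
}]

request_body = {
    "model": "qwen3-coder",  # placeholder model name
    "messages": [{"role": "user", "content": "Open README.md"}],
    "tools": tools,
}
print(json.dumps(request_body, indent=2))
```

Backends that mangle this schema (or the tool-call responses it elicits) are exactly what drags down coding agent task completion in benchmarks like Aider's.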
A lot of you have used our on-demand LLMs, where you pay per minute to launch arbitrary open-source models and finetunes from Hugging Face repos on reserved GPUs. We can't yet make GPU pricing affordable enough to bundle into a flat subscription... But you can still pay our low per-minute rates (typically single-digit cents per minute) to launch LLMs on demand. We compare very favorably to companies like Modal Labs and Replicate: for 80GB GPUs, for example, we're half the price.
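To make per-minute billing concrete, here's a back-of-the-envelope sketch. The 5¢/minute figure is a hypothetical within the "single-digit cents" range above, not a published price:

```python
def gpu_cost(cents_per_minute: float, minutes: float) -> float:
    """Dollar cost of a per-minute billed GPU session."""
    return cents_per_minute * minutes / 100

# Hypothetical: two hours of testing a finetune at 5 cents/minute.
session = gpu_cost(5, 120)
print(f"${session:.2f}")  # -> $6.00
```

At rates like that, occasional on-demand runs stay cheap, but an always-on model is where the flat subscription wins.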
We're working on getting cheaper hardware running for on-demand models, so that we can include a monthly allotment of on-demand GPU hours in your subscription. Currently we're experimenting with Framework Desktops as a relatively low-cost way to run reasonably-sized models, and we also have orders out for Nvidia DGX Spark boxes. Stay tuned!
You may have noticed the site getting smoother and easier on the eyes over the past few months. That's no accident, and we'll keep working on it. We also launched a nifty token calculator that helps you understand how different LLMs tokenize different strings and makes per-token pricing easier to reason about, although we think subscriptions are simpler for most people!
If you made it this far: thanks for reading! We're hard at work on more improvements, including but not limited to:
If you have any thoughts or feedback, please continue to reach out at [email protected]. We appreciate all the emails we've gotten so far!
— Matt & Billy