Synthetic offers both subscription and usage-based pricing. Choose the plan that works best for you.
- Run any agent for $1/day.
- Agents for enterprise.
We keep popular open-source models always on: there's no boot time; they're ready the moment you call them. On usage-based plans, always-on models are billed per token.
Much as we read word by word, LLMs break text down into tokens, which can be whole words or pieces of words. On average, two words come out to about three tokens.
Always-on models are very affordable, usually costing only fractions of a cent per conversation.
Not sure how many tokens your prompt takes? Try our interactive token calculator →
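As a rough illustration of the rule of thumb above (two words per three tokens), here's a quick estimate in Python. Real tokenizers vary by model, so treat this as a ballpark, not an exact count:

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate using the 2-words-to-3-tokens rule of thumb.

    Real tokenizers vary by model; use the interactive calculator
    for exact counts.
    """
    words = len(text.split())
    # 3 tokens for every 2 words, rounded up
    return -(-words * 3 // 2)

print(estimate_tokens("The quick brown fox jumps over the lazy dog"))  # 9 words -> 14
```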
Here's the list of our always-on models:
| Model | Context length | Input price (per million tokens) | Output price (per million tokens) |
|---|---|---|---|
| deepseek-ai/DeepSeek-R1-0528 | 128k tokens | $3.00/mtok | $8.00/mtok |
| deepseek-ai/DeepSeek-V3 | 128k tokens | $1.25/mtok | $1.25/mtok |
| deepseek-ai/DeepSeek-V3.2 | 159k tokens | $0.56/mtok | $1.68/mtok |
| meta-llama/Llama-3.3-70B-Instruct | 128k tokens | $0.88/mtok | $0.88/mtok |
| MiniMaxAI/MiniMax-M2.5 | 187k tokens | $0.40/mtok | $2.00/mtok |
| moonshotai/Kimi-K2.5 | 256k tokens | $0.45/mtok | $3.40/mtok |
| nvidia/Kimi-K2.5-NVFP4 | 256k tokens | $0.45/mtok | $3.40/mtok |
| nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-NVFP4 | 256k tokens | $0.30/mtok | $1.00/mtok |
| openai/gpt-oss-120b | 128k tokens | $0.10/mtok | $0.10/mtok |
| Qwen/Qwen3-235B-A22B-Thinking-2507 | 256k tokens | $0.65/mtok | $3.00/mtok |
| Qwen/Qwen3-Coder-480B-A35B-Instruct | 256k tokens | $2.00/mtok | $2.00/mtok |
| Qwen/Qwen3.5-397B-A17B | 256k tokens | $0.60/mtok | $3.60/mtok |
| zai-org/GLM-4.7 | 198k tokens | $0.45/mtok | $2.19/mtok |
| zai-org/GLM-4.7-Flash | 192k tokens | $0.10/mtok | $0.50/mtok |
| zai-org/GLM-5 | 192k tokens | $1.00/mtok | $3.00/mtok |
| zai-org/GLM-5.1 (Beta!) | 192k tokens | $1.00/mtok | $3.00/mtok |
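To see how per-token billing adds up, here's a small sketch. The prices come from the table above; the token counts are made-up example numbers:

```python
def usage_cost(input_tokens: int, output_tokens: int,
               input_price_per_mtok: float, output_price_per_mtok: float) -> float:
    """Cost in dollars for one request under per-token billing."""
    return (input_tokens * input_price_per_mtok
            + output_tokens * output_price_per_mtok) / 1_000_000

# deepseek-ai/DeepSeek-V3: $1.25/mtok for both input and output (see table above)
cost = usage_cost(input_tokens=2_000, output_tokens=500,
                  input_price_per_mtok=1.25, output_price_per_mtok=1.25)
print(f"${cost:.6f}")  # $0.003125 -- a fraction of a cent
```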
Low-rank adapters ("LoRAs") are small, efficient fine-tunes that run on top of existing models. They can make a model much more effective at specific tasks.
LoRAs of the following models are always on and, on usage-based plans, are billed per token.
| Model | Context length | Input price (per million tokens) | Output price (per million tokens) |
|---|---|---|---|
| meta-llama/Llama-3.2-1B-Instruct | 128k tokens | $0.06/mtok | $0.06/mtok |
| meta-llama/Llama-3.2-3B-Instruct | 128k tokens | $0.06/mtok | $0.06/mtok |
| meta-llama/Meta-Llama-3.1-8B-Instruct | 128k tokens | $0.20/mtok | $0.20/mtok |
| meta-llama/Meta-Llama-3.1-70B-Instruct | 128k tokens | $0.90/mtok | $0.90/mtok |
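Why LoRAs are so small: instead of updating a full d x d weight matrix, a LoRA learns two low-rank factors, B (d x r) and A (r x d). A quick parameter count makes the savings concrete (the dimensions here are illustrative, not the actual layer sizes of the models above):

```python
def lora_params(d: int, r: int) -> tuple[int, int]:
    """Parameters for a full d x d weight update vs. a rank-r LoRA (B: d x r, A: r x d)."""
    full = d * d        # full fine-tune of one square layer
    lora = 2 * d * r    # low-rank factors B and A
    return full, lora

# Example: a 4096-wide layer with a rank-16 adapter
full, lora = lora_params(d=4096, r=16)
print(full, lora, f"{lora / full:.2%}")  # the adapter is under 1% of the full update
```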
Embedding models convert text into numerical vectors, placing similar text close together and dissimilar text far apart: these vectors are called "embeddings". Embedding models are often used by AI-enabled tools for tasks like codebase indexing and search.
Embedding models are billed per token on usage-based plans.
| Model | Context length | Input price (per million tokens) |
|---|---|---|
| nomic-ai/nomic-embed-text-v1.5 | 8k tokens | $0.01/mtok |
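A tiny sketch of how embeddings power search: compare vectors by cosine similarity, where related text scores near 1 and unrelated text near 0. The 3-dimensional vectors below are toy values; real embedding models such as nomic-embed-text-v1.5 produce vectors with hundreds of dimensions:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Higher score = more similar text; embeddings of related text point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy 3-d "embeddings"; a real model returns much longer vectors.
query    = [0.9, 0.1, 0.0]
doc_hit  = [0.8, 0.2, 0.1]
doc_miss = [0.0, 0.1, 0.9]

print(cosine_similarity(query, doc_hit))   # close to 1.0: similar text
print(cosine_similarity(query, doc_miss))  # close to 0.0: unrelated text
```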