Synthetic

    Synthetic offers both subscription and usage-based pricing. Choose the plan that works best for you.

    Subscription Packs

    1 pack: $1/day ($30/mo) · Rate limit: 500 messages/5hr · 3× higher limits than Claude

    Run any agent for $1/day.
    ✓ Works with any agent framework
    ✓ 3× higher rate limits than Claude's $20/month plan
    ✓ 1 concurrent request per model (buy more packs to increase)
    ✓ UI and API access

    Usage-based

    Agents for enterprise.
    ✓ Works with any agent framework
    ✓ UI and API access
    ✓ All models are pay-per-token


    Always-on model pricing

    We keep popular open-source models always-on: there's no boot time, they're just ready to go. For usage-based plans, always-on models are charged per-token.

    What's a token?

    Just like we read word-by-word, LLMs break text down into tokens, which can be whole words or parts of words. On average, two words are worth about three tokens.

    Always-on models are very affordable, usually only costing fractions of a cent per conversation.

    Not sure how many tokens your prompt takes? Try our interactive token calculator →
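If you'd rather estimate by hand, the heuristic above translates into a quick back-of-the-envelope calculation. A minimal sketch in Python, using openai/gpt-oss-120b's $0.10/mtok rates from the table below (the function names are our own, for illustration only):

```python
def estimate_tokens(text: str) -> int:
    # Heuristic from above: two words are worth about three tokens.
    return round(len(text.split()) * 3 / 2)

def conversation_cost(input_tokens: int, output_tokens: int,
                      input_price: float, output_price: float) -> float:
    # Prices are in dollars per million tokens; result is in dollars.
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000

# A ~500-word prompt with a ~300-word reply on openai/gpt-oss-120b
# ($0.10/mtok input and output):
prompt_tokens = estimate_tokens("word " * 500)  # ~750 tokens
reply_tokens = estimate_tokens("word " * 300)   # ~450 tokens
cost = conversation_cost(prompt_tokens, reply_tokens, 0.10, 0.10)
# cost is $0.00012 -- a small fraction of a cent.
```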

    Here's the list of our always-on models:

    Model | Context length | Input price (per million tokens) | Output price (per million tokens)
    deepseek-ai/DeepSeek-R1-0528 | 128k tokens | $3.00/mtok | $8.00/mtok
    deepseek-ai/DeepSeek-V3 | 128k tokens | $1.25/mtok | $1.25/mtok
    deepseek-ai/DeepSeek-V3.2 | 159k tokens | $0.56/mtok | $1.68/mtok
    meta-llama/Llama-3.3-70B-Instruct | 128k tokens | $0.88/mtok | $0.88/mtok
    MiniMaxAI/MiniMax-M2.5 | 187k tokens | $0.40/mtok | $2.00/mtok
    moonshotai/Kimi-K2.5 | 256k tokens | $0.45/mtok | $3.40/mtok
    nvidia/Kimi-K2.5-NVFP4 | 256k tokens | $0.45/mtok | $3.40/mtok
    nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-NVFP4 | 256k tokens | $0.30/mtok | $1.00/mtok
    openai/gpt-oss-120b | 128k tokens | $0.10/mtok | $0.10/mtok
    Qwen/Qwen3-235B-A22B-Thinking-2507 | 256k tokens | $0.65/mtok | $3.00/mtok
    Qwen/Qwen3-Coder-480B-A35B-Instruct | 256k tokens | $2.00/mtok | $2.00/mtok
    Qwen/Qwen3.5-397B-A17B | 256k tokens | $0.60/mtok | $3.60/mtok
    zai-org/GLM-4.7 | 198k tokens | $0.45/mtok | $2.19/mtok
    zai-org/GLM-4.7-Flash | 192k tokens | $0.10/mtok | $0.50/mtok
    zai-org/GLM-5 | 192k tokens | $1.00/mtok | $3.00/mtok
    zai-org/GLM-5.1 (Beta!) | 192k tokens | $1.00/mtok | $3.00/mtok

    LoRA pricing

    What's a LoRA?

    Low-rank adapters — called "LoRAs" — are small, efficient fine-tunes that run on top of existing models. They can modify a model to be much more effective at specific tasks.

    LoRAs of the following models are always-on, and are charged per-token for usage-based plans.

    Model | Context length | Input price (per million tokens) | Output price (per million tokens)
    meta-llama/Llama-3.2-1B-Instruct | 128k tokens | $0.06/mtok | $0.06/mtok
    meta-llama/Llama-3.2-3B-Instruct | 128k tokens | $0.06/mtok | $0.06/mtok
    meta-llama/Meta-Llama-3.1-8B-Instruct | 128k tokens | $0.20/mtok | $0.20/mtok
    meta-llama/Meta-Llama-3.1-70B-Instruct | 128k tokens | $0.90/mtok | $0.90/mtok
    LoRA sizes are measured in "ranks," starting at rank-8. We keep LoRAs up to rank-64 always-on, and we run them in FP8 precision. The rank is set during the fine-tuning process: if you create your own LoRA, you can choose exactly the rank you want through your training framework's standard configuration.
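To make "rank" concrete: a LoRA factors a weight update into two small matrices, so the adapter's size grows linearly with its rank. A rough sketch (illustrative only, not how Synthetic meters LoRAs):

```python
def lora_param_count(d_in: int, d_out: int, rank: int) -> int:
    # A LoRA adapter replaces a full (d_out x d_in) weight update
    # with the product B @ A, where A is (rank x d_in) and
    # B is (d_out x rank) -- far fewer parameters than d_out * d_in.
    return rank * d_in + d_out * rank

# A rank-8 adapter on a single 4096x4096 projection matrix:
small = lora_param_count(4096, 4096, 8)    # 65,536 adapter params
# The same projection at rank-64, the always-on ceiling:
large = lora_param_count(4096, 4096, 64)   # 524,288 adapter params
```

Either way the adapter is tiny next to the 16.7M parameters of the full matrix, which is why these fine-tunes can stay always-on.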

    Embedding pricing

    What are embeddings?

    Embedding models convert text into numerical coordinates, placing more-similar text closer together and less-similar text farther apart; these coordinates are referred to as "embeddings." Embedding models are often used by AI-enabled tools for tasks like codebase indexing or search.
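"Closer" is usually measured with cosine similarity between embedding vectors. A minimal sketch, independent of any particular embedding model:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    # 1.0 means the vectors point the same way (very similar text);
    # values near 0 mean the texts are unrelated.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)
```

A search tool embeds every document once, embeds the query at lookup time, and returns the documents whose embeddings score highest against the query's.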

    Embedding models are charged per-token for usage-based plans.

    Model | Context length | Input price (per million tokens)
    nomic-ai/nomic-embed-text-v1.5 | 8k tokens | $0.01/mtok
    Since embedding models aren't full LLMs and can't be used for chat — only for creating embedding coordinates — these models are only accessible via the API.
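Calling the API typically means a standard OpenAI-compatible embeddings request. A minimal sketch using only the standard library; the base URL, auth scheme, and endpoint path are assumptions based on the OpenAI-compatible convention, so check your API docs for the exact values:

```python
import json
import urllib.request

def embedding_request(base_url: str, api_key: str, text: str) -> urllib.request.Request:
    # Note the "hf:" prefix on the model string.
    payload = {"model": "hf:nomic-ai/nomic-embed-text-v1.5", "input": text}
    return urllib.request.Request(
        f"{base_url}/v1/embeddings",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )

# Send with urllib.request.urlopen(req); the response body contains
# the embedding vector under data[0]["embedding"].
req = embedding_request("https://api.example.com", "YOUR_API_KEY", "hello world")
```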

    Roo & KiloCode setup

    In the Codebase Indexing setup, select the "OpenAI Compatible" embedding provider and paste in your API key. Set "Model Dimension" to the embedding model's default dimension: in the case of nomic-ai/nomic-embed-text-v1.5, use 768.

    Make sure to copy the model string with an "hf:" prefix (short for "Hugging Face," the open-source hub where these models are hosted); for example, hf:nomic-ai/nomic-embed-text-v1.5.

    Your configuration should look roughly like so:

    [Screenshot: codebase indexing configuration for KiloCode and Roo Code]