synthetic

    Rate limits

    Synthetic subscriptions have three types of rate limits: a five-hour request limit, a weekly credit limit, and a concurrency throttle. All rate limits scale with subscription packs: you can buy more packs to get higher limits. Here's a simulator to explain how they work.

    Subscription packs

    Price: $1/day ($30/mo)
    Rate limit: 500 requests/5hr
    Weekly credits: $24 (over $102/month)
    Concurrent: 1 request per model

    Weekly credit quota

    Each pack gives you $24 of API credits per week. That nets out to slightly more than $102/month of value.
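    The monthly figure can be checked with a little arithmetic. The sketch below assumes a 30-day month (matching the $1/day and $30/mo pricing above) and assumes that credits scale linearly with the number of packs; the function name is made up for illustration.

```python
# Assumptions (not stated verbatim in the text): a 30-day month and
# linear scaling of credits with the number of packs.
WEEKLY_CREDITS_PER_PACK = 24.00  # dollars of API credits per pack per week
DAYS_PER_MONTH = 30              # assumed month length
DAYS_PER_WEEK = 7


def monthly_credit_value(packs: int = 1) -> float:
    """Approximate dollar value of credits available per 30-day month."""
    weekly = WEEKLY_CREDITS_PER_PACK * packs
    return weekly * DAYS_PER_MONTH / DAYS_PER_WEEK


print(f"{monthly_credit_value(1):.2f}")  # prints 102.86
```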

    Credits regenerate incrementally over the course of a week. Every 202 minutes (about 3.4 hours), you get back 2% of your total weekly quota. That means it takes about one week (50 ticks of 202 minutes each) to fully regenerate from zero.
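    As a rough illustration of the tick schedule, the sketch below computes the balance some minutes after a given starting balance. It assumes ticks are counted from the starting moment and that the balance is capped at the quota; the real service may align ticks differently, and the names here are hypothetical.

```python
WEEKLY_QUOTA = 24.00   # dollars of credits for one pack
TICK_MINUTES = 202     # regeneration interval from the text
TICKS_PER_FULL = 50    # 100% / 2% per tick


def credits_after(minutes_elapsed: float, starting_balance: float = 0.0,
                  quota: float = WEEKLY_QUOTA) -> float:
    """Balance after `minutes_elapsed`, assuming ticks start counting at t=0."""
    ticks = int(minutes_elapsed // TICK_MINUTES)
    regenerated = quota * ticks / TICKS_PER_FULL
    # The balance never exceeds the weekly quota.
    return min(quota, starting_balance + regenerated)


# One tick returns 2% of $24, and 50 ticks (10,100 minutes) refill from zero.
print(credits_after(202))
print(credits_after(50 * 202))
```

    Under these assumptions, exactly seven days (10,080 minutes) after hitting zero you would have 49 ticks, or 98% of the quota; the 50th tick lands 20 minutes later.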

    Weekly credit regeneration (simulator): fully regenerated, $24.00 of $24.00 (100%)

    Since these are API credits, using cheaper models means your weekly limit will stretch further. Route lightweight tasks — summarization, title generation, codebase exploration — to a smaller model and your quota goes a lot further.

    Per-five-hour request limit

    Each pack gives you 500 requests per five hours.

    Requests regenerate incrementally over five hours. Every 15 minutes, you get back 5% of your total five-hour quota. That means it takes five hours to fully regenerate from zero.
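    To make the tick arithmetic concrete, here is a small sketch that estimates how long you might wait until a given number of requests is available again. It returns an upper bound, since it assumes the next tick is a full 15 minutes away; the function name and parameters are illustrative only.

```python
import math

FIVE_HOUR_QUOTA = 500  # requests for one pack
TICK_MINUTES = 15      # regeneration interval from the text
TICK_PERCENT = 5       # percent of the quota returned per tick


def minutes_until_available(needed: float, available: float = 0.0,
                            quota: int = FIVE_HOUR_QUOTA) -> int:
    """Upper bound on the wait (minutes) until `needed` requests are available."""
    if needed > quota:
        raise ValueError("needed exceeds the total quota")
    shortfall = needed - available
    if shortfall <= 0:
        return 0
    per_tick = quota * TICK_PERCENT / 100  # 25 requests per tick for one pack
    ticks = math.ceil(shortfall / per_tick)
    return ticks * TICK_MINUTES


print(minutes_until_available(1))    # prints 15
print(minutes_until_available(500))  # prints 300
```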

    Five-hour request regeneration (simulator): fully regenerated, 500 of 500 requests (100%)

    As with the weekly credits, requests are scaled by the input price of the model you use. The baseline is our default model, currently zai-org/GLM-5.2: one call counts as exactly 1 request against your limit. Cheaper models cost fewer requests; for example, zai-org/GLM-4.7-Flash is 10× cheaper, so a typical call only counts as about 0.1 requests. Routing lightweight tasks to smaller models makes your quota go a lot further, just like the weekly limit.
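    The scaling can be sketched as a simple price ratio. The relative prices below are placeholders: only the 10× ratio between the two models comes from the text, and the exact accounting for any single call may differ since the text describes a typical call as "about" 0.1 requests.

```python
BASELINE_MODEL = "zai-org/GLM-5.2"

# Hypothetical relative input prices; only the 10x ratio is from the text.
RELATIVE_INPUT_PRICE = {
    "zai-org/GLM-5.2": 10,
    "zai-org/GLM-4.7-Flash": 1,
}


def request_cost(model: str) -> float:
    """Approximate requests charged per typical call, relative to the baseline."""
    return RELATIVE_INPUT_PRICE[model] / RELATIVE_INPUT_PRICE[BASELINE_MODEL]


def calls_per_window(model: str, quota: int = 500) -> int:
    """Approximate number of typical calls that fit in one five-hour quota."""
    return quota * RELATIVE_INPUT_PRICE[BASELINE_MODEL] // RELATIVE_INPUT_PRICE[model]


def window_usage(calls: dict[str, int]) -> float:
    """Requests consumed by a mix of calls, keyed by model name."""
    return sum(count * request_cost(model) for model, count in calls.items())


print(calls_per_window("zai-org/GLM-5.2"))        # prints 500
print(calls_per_window("zai-org/GLM-4.7-Flash"))  # prints 5000
```

    For example, `window_usage({"zai-org/GLM-5.2": 400, "zai-org/GLM-4.7-Flash": 1000})` comes to roughly 500 requests, a full one-pack window.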

    Why incremental regeneration?

    Unlike traditional rate limits — which reset on a fixed timer and lock you out for the entire reset period once you go over them — Synthetic continuously regenerates your quota in small increments: you're never waiting for a full reset. Even if you burn through your entire allowance, the next tick is always just around the corner:

    Five-hour limit

    Worst-case wait is 15 minutes. In practice, you're usually back online within a few minutes.

    Weekly credit limit

    Worst-case wait is about 3.4 hours. You start getting credits back the same day.

    This also means there is no penalty for burst usage. You can use all your requests at once for a heavy coding session, then walk away knowing that part of your quota will have regenerated by the time you return.

    Concurrent requests per model

    Each subscription pack lets you run 1 request per model at full speed. Requests to different models run in parallel — they don't interfere with each other. But if you send multiple requests to the same model at the same time, anything beyond your concurrency throttle gets queued behind the earlier ones.

    For example, with 1 pack you could have 1 request to zai-org/GLM-5.2 running while 1 request to zai-org/GLM-4.7-Flash runs in parallel. But a 2nd request to the same model would wait in line until the first one finishes.
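    The per-model throttle described above can be sketched as a counter per model. This is an illustration rather than the service's actual scheduler, and it assumes the concurrency limit is one slot per model per pack.

```python
from collections import defaultdict


def schedule(requests: list[str], packs: int = 1) -> tuple[list[str], list[str]]:
    """Split simultaneous requests (given as model names) into running and queued."""
    limit = packs  # assumed: one concurrent request per model per pack
    in_flight: dict[str, int] = defaultdict(int)
    running: list[str] = []
    queued: list[str] = []
    for model in requests:
        if in_flight[model] < limit:
            in_flight[model] += 1
            running.append(model)
        else:
            # Over the per-model limit: wait behind the earlier requests.
            queued.append(model)
    return running, queued


batch = ["zai-org/GLM-5.2", "zai-org/GLM-4.7-Flash", "zai-org/GLM-5.2"]
running, queued = schedule(batch, packs=1)
print(running)  # prints ['zai-org/GLM-5.2', 'zai-org/GLM-4.7-Flash']
print(queued)   # prints ['zai-org/GLM-5.2']
```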

    Many coding agent harnesses let you assign different models to different tasks: for example, using a fast model to summarize sessions and generate titles, or to explore your codebase, while using a stronger model for actual code generation. We recommend configuring multiple models where possible. It keeps each model's queue shorter, gives you better overall throughput, and uses your rate limits more efficiently because work stays parallel instead of piling up behind a single bottleneck.

    Simulation controls

    Arrival interval: 1.0s (one new request every 1.0s in total)
    Processing time: 1.5s (each request takes 1.5s to finish)

    Same model queued

    All requests go to one model, arriving every 1.0s, taking 1.5s to process. Capacity is 1 request per 1.5s, so arrivals outpace completions and a queue forms. Increase the number of packs to clear arrivals faster.

    zai-org/GLM-5.2: 0/1 running, 0 queued
    Total: 0 running, 0 queued

    Different models in parallel

    Requests arrive every 1.0s and alternate between two models, so each queue sees a request every 2.0s, with the same 1.5s processing time. Capacity per model is 1 request per 1.5s, so completions outpace arrivals and no queue forms.

    zai-org/GLM-5.2: 0/1 running, 0 queued
    zai-org/GLM-4.7-Flash: 0/1 running, 0 queued
    Total: 0 running, 0 queued
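    The two scenarios can be reproduced with a small deterministic simulation. This sketch assumes requests are spread round-robin across the listed models and that each pack adds one concurrency slot per model; the function and its parameters are hypothetical, not part of any Synthetic API.

```python
import heapq


def simulate(num_requests: int, models: list[str], interval: float = 1.0,
             service: float = 1.5, packs: int = 1) -> list[float]:
    """Return the queueing delay in seconds of each request, in arrival order."""
    # Per model, a min-heap of the times at which each concurrency slot frees up.
    free_at = {m: [0.0] * packs for m in models}
    waits = []
    for i in range(num_requests):
        arrival = i * interval
        model = models[i % len(models)]  # assumed round-robin routing
        slot_free = heapq.heappop(free_at[model])
        start = max(arrival, slot_free)  # wait if the slot is still busy
        heapq.heappush(free_at[model], start + service)
        waits.append(start - arrival)
    return waits


same = simulate(4, ["zai-org/GLM-5.2"])
split = simulate(4, ["zai-org/GLM-5.2", "zai-org/GLM-4.7-Flash"])
print(same)   # prints [0.0, 0.5, 1.0, 1.5]
print(split)  # prints [0.0, 0.0, 0.0, 0.0]
```

    With a single model, each request waits 0.5s longer than the one before it, so the queue grows without bound. Splitting across two models, or passing `packs=2` for a single model, keeps every wait at zero under these assumptions.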