Synthetic subscriptions have three types of rate limits: a five-hour request limit, a weekly credit limit, and a concurrency throttle. All rate limits scale with subscription packs: you can buy more packs to get higher limits. Here's a simulator to explain how they work.
Each pack gives you $24 of API credits per week. That nets out to slightly more than $102/month of value.
Credits regenerate incrementally over the course of a week. Every 202 minutes (about 3.4 hours), you get back 2% of your total weekly quota. That means it takes one week to fully regenerate from zero.
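To make the arithmetic concrete, here is a minimal sketch of the weekly numbers above. It assumes a 30-day month, that credits come back only in whole ticks, and that they are capped at the weekly quota. The function names are illustrative and not part of any Synthetic API.

```python
WEEKLY_CREDITS_PER_PACK = 24.0  # dollars per pack per week (from the text)
TICK_MINUTES = 202              # one regeneration tick (from the text)
TICK_FRACTION = 0.02            # 2% of the weekly quota per tick (from the text)


def monthly_value(packs: int = 1, days_in_month: int = 30) -> float:
    """Approximate monthly credit value; the 30-day month is an assumption."""
    return WEEKLY_CREDITS_PER_PACK * packs * days_in_month / 7


def credits_after(minutes: float, start: float = 0.0, packs: int = 1) -> float:
    """Credits available after `minutes`, assuming whole ticks and a cap at the quota."""
    quota = WEEKLY_CREDITS_PER_PACK * packs
    ticks = int(minutes // TICK_MINUTES)
    return min(quota, start + ticks * TICK_FRACTION * quota)


print(round(monthly_value(), 2))          # 102.86
print(round(credits_after(202), 2))       # 0.48
print(round(credits_after(50 * 202), 2))  # 24.0
```

With one pack, each tick is worth about $0.48, and 50 ticks bring an empty balance back to the full $24.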
Since these are API credits, using cheaper models means your weekly limit will stretch further. Route lightweight tasks — summarization, title generation, codebase exploration — to a smaller model and your quota goes a lot further.
Each pack gives you 500 requests per five hours.
Requests regenerate incrementally over five hours. Every 15 minutes, you get back 5% of your total five-hour quota. That means it takes five hours to fully regenerate from zero.
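One way to picture this is as a bucket that refills in fixed steps. The sketch below is an illustrative model only: the `RegeneratingQuota` class and its methods are hypothetical names, and it assumes that refills happen in whole ticks and never exceed the total.

```python
class RegeneratingQuota:
    """Quota that refills in fixed steps, capped at its total (illustrative model)."""

    def __init__(self, total: float, tick_minutes: float, tick_fraction: float):
        self.total = total                  # e.g. 500 requests for one pack
        self.tick_minutes = tick_minutes    # e.g. 15
        self.tick_fraction = tick_fraction  # e.g. 0.05 (5% of the total per tick)
        self.available = float(total)       # start with a full allowance

    def consume(self, amount: float) -> bool:
        """Spend quota if enough is available; return whether the call was allowed."""
        if amount > self.available:
            return False
        self.available -= amount
        return True

    def advance(self, minutes: float) -> None:
        """Apply every whole tick that fits in the elapsed time."""
        ticks = int(minutes // self.tick_minutes)
        refill = ticks * self.tick_fraction * self.total
        self.available = min(self.total, self.available + refill)


five_hour = RegeneratingQuota(total=500, tick_minutes=15, tick_fraction=0.05)
five_hour.consume(500)   # burn the whole allowance in one burst
five_hour.advance(60)    # one hour later: 4 ticks of roughly 25 requests each
print(round(five_hour.available))  # 100
```

Advancing the same object by a full 300 minutes from empty refills it to the 500-request total, matching the five-hour figure in the text.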
Similar to the weekly credits, requests are scaled by the input price of the model you use. The baseline is our default model, currently zai-org/GLM-5.2: one call counts as exactly 1 request against your limit. Cheaper models cost fewer requests; for example, zai-org/GLM-4.7-Flash is 10× cheaper, so a typical call only counts as about 0.1 requests. Routing lightweight tasks to smaller models makes your quota go a lot further, just like the weekly limit.
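As a rough illustration of the scaling, the sketch below charges each call in proportion to the model's input price relative to the baseline. The dollar prices are hypothetical placeholders; only the 10× ratio between the two models comes from the text, and the exact formula Synthetic uses may differ.

```python
# Hypothetical input prices (dollars per million tokens). Only the 10x ratio
# between the two models is taken from the text.
INPUT_PRICE = {
    "zai-org/GLM-5.2": 1.00,
    "zai-org/GLM-4.7-Flash": 0.10,
}
BASELINE_MODEL = "zai-org/GLM-5.2"


def request_cost(model: str) -> float:
    """Requests charged per call, relative to the baseline model's input price."""
    return INPUT_PRICE[model] / INPUT_PRICE[BASELINE_MODEL]


def calls_per_window(model: str, packs: int = 1, requests_per_pack: int = 500) -> float:
    """Approximate number of calls that fit in one five-hour allowance."""
    return packs * requests_per_pack / request_cost(model)


print(round(request_cost("zai-org/GLM-4.7-Flash"), 2))   # 0.1
print(round(calls_per_window("zai-org/GLM-5.2")))        # 500
print(round(calls_per_window("zai-org/GLM-4.7-Flash")))  # 5000
```

Under these assumptions, one pack covers roughly 5,000 calls to the cheaper model in the same window that covers 500 calls to the baseline.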
Unlike traditional rate limits — which reset on a fixed timer and lock you out for the entire reset period once you go over them — Synthetic continuously regenerates your quota in small increments: you're never waiting for a full reset. Even if you burn through your entire allowance, the next tick is always just around the corner:
Five-hour request limit: worst-case wait is 15 minutes. In practice, you're usually back online within a few minutes.
Weekly credit limit: worst-case wait is about 3.4 hours. You start getting credits back the same day.
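The two worst-case waits above follow from the tick sizes. Here is a minimal sketch that bounds the wait for any shortfall, assuming the worst case where a tick has only just passed, so every tick you need costs a full interval. The function name and parameters are illustrative.

```python
import math


def worst_case_wait_minutes(
    needed: float,
    available: float,
    total: float,
    tick_minutes: float,
    tick_fraction: float,
) -> float:
    """Upper bound on the wait until `needed` units of quota are available."""
    if needed > total:
        raise ValueError("needed exceeds the total quota, so it can never be met")
    shortfall = needed - available
    if shortfall <= 0:
        return 0.0
    per_tick = tick_fraction * total          # quota returned by one tick
    ticks = math.ceil(shortfall / per_tick)   # whole ticks needed to cover the gap
    return ticks * tick_minutes


# Five-hour limit, one pack, empty quota, one baseline request needed.
print(worst_case_wait_minutes(1, 0, 500, 15, 0.05))       # 15
# Weekly limit, one pack, empty balance, $0.10 of credits needed.
print(worst_case_wait_minutes(0.10, 0, 24.0, 202, 0.02))  # 202
```

The same function shows how larger needs scale: getting 100 requests back from an empty five-hour quota takes at most four ticks, or one hour.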
This also means there is no penalty for burst usage. You can use all your requests at once for a heavy coding session, then walk away knowing that part of your quota will have regenerated by the time you return.
Each subscription pack lets you run 1 request per model at full speed. Requests to different models run in parallel — they don't interfere with each other. But if you send multiple requests to the same model at the same time, anything beyond your concurrency throttle gets queued behind the earlier ones.
For example, with 1 pack you could have 1 request to zai-org/GLM-5.2 running while 1 request to zai-org/GLM-4.7-Flash runs in parallel. But a 2nd request to either model would wait in line until the earlier request to that model finishes.
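The behavior can be mimicked locally with one semaphore per model. This is only a client-side sketch of the idea, not how Synthetic implements its throttle: `run_batch` is a hypothetical helper, and each "request" is just a sleep standing in for a real API call.

```python
import asyncio
import time
from collections import defaultdict


async def run_batch(models: list[str], seconds: float = 0.1, packs: int = 1) -> float:
    """Run one fake request per entry in `models`; return the total elapsed time."""
    # One semaphore per model name, each allowing `packs` requests in flight.
    slots = defaultdict(lambda: asyncio.Semaphore(packs))

    async def call(model: str) -> None:
        async with slots[model]:          # queue here if the model's slot is busy
            await asyncio.sleep(seconds)  # stand-in for the real request

    start = time.perf_counter()
    await asyncio.gather(*(call(m) for m in models))
    return time.perf_counter() - start


same = asyncio.run(run_batch(["zai-org/GLM-5.2", "zai-org/GLM-5.2"]))
mixed = asyncio.run(run_batch(["zai-org/GLM-5.2", "zai-org/GLM-4.7-Flash"]))
print(f"same model: {same:.2f}s, two models: {mixed:.2f}s")
```

Two requests to the same model take about twice as long as one, because the second waits for the first. Two requests to different models finish in about the time of one.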
Many coding agent harnesses let you assign different models to different tasks: for example, using a fast model to summarize sessions and generate titles, or to explore your codebase, while using a stronger model for actual code generation. We recommend configuring multiple models where possible. It keeps each model's queue shorter, gives you better overall throughput, and uses your rate limits more efficiently because work stays parallel instead of piling up behind a single bottleneck.
Simulator settings: one new request every 1.0s in total, and each request takes 1.5s to finish.
One model: all requests go to one model, arriving every 1.0s and taking 1.5s to process. Capacity is 1 request per 1.5s, so arrivals outpace completions and a queue forms. Increase the number of packs to clear arrivals faster.
Two models: requests still arrive every 1.0s in total, but they are split evenly, so each model's queue sees a request every 2.0s with the same 1.5s processing time. Capacity per model is 1 request per 1.5s, so completions outpace arrivals and no queue forms.
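The two scenarios can be reproduced with a small deterministic simulation. This sketch assumes first-in, first-out queues and an even round-robin split of requests across models; `simulate_waits` is an illustrative name, not part of the simulator.

```python
def simulate_waits(
    n_requests: int,
    arrival_interval: float = 1.0,
    service_time: float = 1.5,
    models: int = 1,
    packs: int = 1,
) -> list[float]:
    """Return each request's queueing delay in seconds."""
    # free_at[m][k] is the time at which slot k of model m next becomes free.
    free_at = [[0.0] * packs for _ in range(models)]
    waits = []
    for i in range(n_requests):
        arrival = i * arrival_interval
        slots = free_at[i % models]                    # round-robin across models
        k = min(range(packs), key=slots.__getitem__)   # earliest-free slot
        start = max(arrival, slots[k])                 # wait if the slot is still busy
        slots[k] = start + service_time
        waits.append(start - arrival)
    return waits


print(simulate_waits(5))            # [0.0, 0.5, 1.0, 1.5, 2.0]
print(simulate_waits(5, models=2))  # [0.0, 0.0, 0.0, 0.0, 0.0]
print(simulate_waits(5, packs=2))   # [0.0, 0.0, 0.0, 0.0, 0.0]
```

With one model and one pack, each request waits 0.5s longer than the one before it, so the queue grows without bound. Splitting across two models or adding a second pack removes the wait entirely at this arrival rate.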