We have a few small updates to share, with some bigger announcements coming soon!
We’ve gotten a lot of feedback that our billing page could be improved, so we’ve taken a first pass at improving it! You can check it out at https://synthetic.new/billing (and /referrals). Open to feedback!
As part of this, we moved usage-based pricing to https://dev.synthetic.new/usage, as subscriptions and always-on models are becoming more popular. However, we do have some exciting “developer platform” features down the line, so stay tuned!
This has been out for a few days, but we now publish detailed information about all our models:
curl "https://api.synthetic.new/openai/v1/models" --header "authorization: Bearer ${SYNTHETIC_API_KEY}" | jq
For our self-hosted models (GLM 4.6, MiniMax M2, and Kimi K2 Thinking), we provide the most information, including quantization (we generally run the original model weights without quantizing, so you get maximum intelligence†), context length, and more.
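For example, here’s a quick way to pull that metadata out per model. This is just a sketch: the field names quantization and context_length are assumptions until the /models docs land, and the response is assumed to use the standard OpenAI-style top-level data array.

# Assumed schema; "quantization" and "context_length" are illustrative field names
curl -s "https://api.synthetic.new/openai/v1/models" --header "authorization: Bearer ${SYNTHETIC_API_KEY}" | jq '.data[] | {id, quantization, context_length}'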
You can filter by provider by adding ?provider=, e.g.
https://api.synthetic.new/openai/v1/models?provider=synthetic
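So, for example, to list just the model IDs for a given provider (again assuming the standard OpenAI-style response with a top-level data array):

curl -s "https://api.synthetic.new/openai/v1/models?provider=synthetic" --header "authorization: Bearer ${SYNTHETIC_API_KEY}" | jq -r '.data[].id'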
Documentation for the new /models features will be updated soon™.
It’s been Kimi launch week! We (by which I mean Matt) have been hard at work fixing the launch week bugs.
Huge shoutout to the SGLang team for maintaining such a high-quality, readable codebase; it makes it easy to understand and fix even launch-day bugs for new models.
(We’re aware of an issue where Kimi K2 Thinking is reluctant to call tools, which we believe is due to Kimi K2 generating invalid tool-call JSON; this should be fixed in a future version of SGLang.)
We’ve had a handful of minor outages over the last couple of weeks, and Matt and I have been hard at work making a slew of infrastructure improvements up and down the stack: tuning SGLang parameters, improving our Kubernetes setup, and improving our monitoring stack so we’re properly paged when things go wrong and can track performance and error rates over time.
We’re sorry if you’ve been impacted, and we appreciate everyone’s patience. We believe we’re mostly out of the woods on stability, but we take it very seriously and have a couple more important improvements down the line‡.
Thank you again for bearing with us!
- Billy
If you made it this far, thanks for being a Synthetic user and part of our community! Feedback from our users is what lets us keep iterating and improving.
Please keep the feedback coming, and consider joining our Discord for faster updates and to engage with us directly!
We've been working on a lot of exciting things in the background, so stay tuned for more updates!
† For GLM 4.6 we run a custom FP8 quant that worked around a launch-day speculative decoding bug (since fixed upstream).
‡ For the nerds: the last major issue seems to be that sglang-router isn’t properly load balancing requests. Often, even when we have plenty of capacity, user requests take a long time to complete or get 503 errors because they’re being sent to an already-full node. We plan to have a fix out for this soon, which should solve the rest of our infra issues. 🙂
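In the meantime, if you occasionally hit a 503, curl can retry it for you with no extra tooling (curl treats HTTP 503 as a transient error when --retry is set); a minimal sketch against the models endpoint:

# Retry up to 5 times with a 2-second pause between attempts
curl --retry 5 --retry-delay 2 "https://api.synthetic.new/openai/v1/models" --header "authorization: Bearer ${SYNTHETIC_API_KEY}"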