We have a few small updates to share, with some bigger announcements coming soon!
We’ve gotten a lot of feedback that our billing page could be improved, so we’ve taken a first pass at improving it! You can check it out at https://synthetic.new/billing (and /referrals). Open to feedback!
As part of this, we moved usage-based pricing to https://dev.synthetic.new/usage, as subscriptions and always-on models are becoming more popular. However, we do have some exciting “developer platform” features down the line, so stay tuned!
This has been out for a few days, but we now publish detailed information about all our models:
curl "https://api.synthetic.new/openai/v1/models" --header "authorization: Bearer ${SYNTHETIC_API_KEY}" | jq
For our self-hosted models (GLM 4.6, MiniMax M2, and Kimi K2 Thinking), we provide the most information, including quantization (we generally run the original model weights without quantizing, so you get maximum intelligence†), context length, and more.
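For example, here’s a quick way to pull that metadata out per model. This is just a sketch: the field names quantization and context_length are assumptions until the /models docs land, and the response is assumed to use the standard OpenAI-style top-level data array.

# Assumed schema; "quantization" and "context_length" are illustrative field names
curl -s "https://api.synthetic.new/openai/v1/models" --header "authorization: Bearer ${SYNTHETIC_API_KEY}" | jq '.data[] | {id, quantization, context_length}'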
You can filter by provider by adding ?provider=, e.g.
https://api.synthetic.new/openai/v1/models?provider=synthetic
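So, for example, to list just the model IDs for a given provider (again assuming the standard OpenAI-style response with a top-level data array):

curl -s "https://api.synthetic.new/openai/v1/models?provider=synthetic" --header "authorization: Bearer ${SYNTHETIC_API_KEY}" | jq -r '.data[].id'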
Documentation for the new /models features will be updated soon™.
It’s been Kimi launch week! We (by which I mean Matt) have been hard at work fixing the launch week bugs.
Huge shoutout to the SGLang team for maintaining such a high-quality, readable codebase; it makes it easy to understand and fix even launch-day bugs for new models.
(We’re aware of an issue where Kimi K2 Thinking is reluctant to call tools, which we believe is due to Kimi K2 generating invalid tool-call JSON; this should be fixed in a future version of SGLang.)
We’ve had a handful of minor outages over the last couple of weeks, and Matt and I have been hard at work making a slew of infrastructure improvements up and down the stack: tuning SGLang parameters, improving our Kubernetes setup, and improving our monitoring stack so we’re properly paged when things go wrong and can track performance and error rates over time.
We’re sorry if you’ve been impacted, and we appreciate everyone’s patience. We believe we’re mostly out of the woods on stability, but we take it very seriously and have a couple more important improvements down the line‡.
Thank you again for bearing with us!
- Billy
If you made it this far, thanks for being a Synthetic user and part of our community! Feedback from our users is what lets us keep iterating and improving.
Please keep the feedback coming, and consider joining our Discord for faster updates and to engage with us directly!
We've been working on a lot of exciting things in the background, so stay tuned for more updates!
† For GLM 4.6 we run a custom FP8 quant that worked around a launch-day speculative decoding bug (since fixed upstream).
‡ For the nerds: the last major issue seems to be that sglang-router isn’t properly load balancing requests. Often, even when we have plenty of capacity, user requests take a long time to complete or get 503 errors because they’re being sent to an already-full node. We plan to have a fix out for this soon, which should solve the rest of our infra issues. 🙂
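In the meantime, if you occasionally hit a 503, curl can retry it for you with no extra tooling (curl treats HTTP 503 as a transient error when --retry is set); a minimal sketch against the models endpoint:

# Retry up to 5 times with a 2-second pause between attempts
curl --retry 5 --retry-delay 2 "https://api.synthetic.new/openai/v1/models" --header "authorization: Bearer ${SYNTHETIC_API_KEY}"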