Published 2025-05-10
That's right: we're finally shipping multimodal image input support in our API! For any model that supports image input, you can now use our OpenAI-compatible API to chat about images you provide. We've also shipped support for several new LLM architectures, significantly improved our UI, and made large performance and reliability improvements for on-demand models. We're working on multimodal image support in our UI as well, so stay tuned!
By popular demand, we've launched support for multimodal image input in our API! If you use our API, you can now ask LLMs that support image input (like Llama 4) questions about images. For example, let's ask Llama 4 Scout about this image:
We'll use a curl command in Bash to ask it to describe the image:
curl -X POST "https://api.synthetic.new/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer ${SYNTHETIC_API_KEY}" \
  -d '{
    "model": "hf:meta-llama/Llama-4-Scout-17B-16E-Instruct",
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "text",
            "text": "Describe the contents of this image."
          },
          {
            "type": "image_url",
            "image_url": {
              "url": "https://synthetic.new/images/totoro.jpg"
            }
          }
        ]
      }
    ],
    "max_tokens": 300
  }' 2>/dev/null | jq -r ".choices[0].message.content"
In response, we get:
The image depicts a scene from the animated film "My Neighbor Totoro," featuring a young girl and two fantastical creatures in a forest setting.
The background of the image features a dense forest with numerous trees and green foliage. The overall atmosphere suggests that the girl has encountered these magical creatures while exploring the woods.
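Your images don't have to be hosted at a URL, either: the OpenAI-compatible message format also accepts images as base64-encoded data URLs. Assuming our endpoint handles data URLs the same way (the example above only demonstrates a hosted image, so treat this as a sketch), sending a local file looks like this:

# Sketch: send a local image as a base64 data URL. Assumes the endpoint
# accepts data URLs like OpenAI's API does; cat.jpg is a placeholder file.
IMAGE_B64=$(base64 -w 0 cat.jpg)   # on macOS, use: base64 -i cat.jpg
curl -X POST "https://api.synthetic.new/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer ${SYNTHETIC_API_KEY}" \
  -d "$(jq -n --arg img "data:image/jpeg;base64,${IMAGE_B64}" '{
    model: "hf:meta-llama/Llama-4-Scout-17B-16E-Instruct",
    messages: [{
      role: "user",
      content: [
        {type: "text", text: "Describe the contents of this image."},
        {type: "image_url", image_url: {url: $img}}
      ]
    }],
    max_tokens: 300
  }')" 2>/dev/null | jq -r ".choices[0].message.content"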
We're launching this in the API first because it's ready! We expect to bring multimodal image input to the UI in the coming weeks.
Since our last newsletter, we've launched support for several new architectures!
We're keeping Llama 4 Scout and Maverick always-on, as well as Qwen 3's largest model, Qwen/Qwen3-235B-A22B. We've also added deepseek-ai/DeepSeek-V3-0324 to our always-on list: it's really good!
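And since our API is OpenAI-compatible, you can list the model IDs currently available the standard way (assuming the usual /v1/models listing endpoint, which isn't shown above):

# List available model IDs; assumes a standard OpenAI-style /v1/models endpoint.
curl -s "https://api.synthetic.new/v1/models" \
  -H "Authorization: Bearer ${SYNTHETIC_API_KEY}" \
  | jq -r ".data[].id"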
Our UI has changed a lot over the past few weeks, as you may have noticed. Our goal is to make it simpler to quickly start conversations with LLMs: now, when you go to synthetic.new, you'll automatically land on a new chat page for the last model you used in the UI. And hitting the + icon in a thread will automatically start a new thread with the same model! Here's what our new chat page looks like:
You can also share links directly to models you like! For example, synthetic.new/hf/deepseek-ai/DeepSeek-V3-0324 will take you directly to DeepSeek-V3-0324.
We've overhauled how we launch on-demand models: we now use a new GPU provider on our backend and have significantly better caching in place. Launching on-demand models should be much faster and more reliable now. It's still not fast to launch a model on-demand (we still need to load potentially hundreds of gigabytes of weights into GPU VRAM and compile them!), but it should be much better than it used to be.
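If you're scripting against an on-demand model, it's worth tolerating a slow or failed first request while the model spins up. Here's a minimal retry sketch in Bash; the model ID is a hypothetical placeholder, and the retry counts and timeouts are suggestions rather than documented API behavior:

# Retry a request while an on-demand model warms up. The model ID is a
# hypothetical placeholder; retry/timeout values are illustrative.
for attempt in 1 2 3 4 5; do
  RESPONSE=$(curl -s --max-time 600 -w '\n%{http_code}' \
    "https://api.synthetic.new/v1/chat/completions" \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer ${SYNTHETIC_API_KEY}" \
    -d '{"model": "hf:example-org/some-on-demand-model",
         "messages": [{"role": "user", "content": "Hello!"}]}')
  STATUS=$(echo "$RESPONSE" | tail -n 1)
  if [ "$STATUS" = "200" ]; then
    echo "$RESPONSE" | sed '$d' | jq -r ".choices[0].message.content"
    break
  fi
  echo "HTTP $STATUS on attempt $attempt; the model may still be loading..." >&2
  sleep 30
done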
If you made it this far: thanks for reading! We're hard at work on more improvements, including (but not limited to) multimodal image input in the UI.
If you have any thoughts or feedback, please continue to reach out at [email protected]. We appreciate all the emails we've gotten so far!
— Matt & Billy