Releases

Gemini 3 Flash

Google has released the smaller, cheaper, faster version of their Gemini 3 Pro model, Gemini 3 Flash.

Benchmarks

It is a strong model, especially given its pricing. For most tasks it will be at least passable, and often equal to many of the top models out there right now. It sits on the Pareto frontier for intelligence per dollar. If you are looking for an alternative to the cheap Chinese LLMs, this is a great option, as it seems to be just as good if not better across the board.

For coding, it is definitely usable for easy and maybe even medium-difficulty tasks. You may notice that in some of the coding benchmarks it actually outdoes Gemini 3 Pro. According to the team at Google, this is because they were able to spend more time on reinforcement learning (RL) before release. This does not seem to translate to real-world performance, however, and looks like a bit of benchmaxxing.

Model	Input ($ per million tokens)	Output ($ per million tokens)	Tokens per second
GLM 4.6	$0.60	$2.20	90
Claude Sonnet 4.5	$3.00	$15.00	57
GPT 5.2	$1.75	$14.00	34
Kimi K2 Thinking	$0.60	$2.50	25
Gemini 3 Pro	$2.00	$12.00	80
Gemini 3 Flash	$0.50	$3.00	75

Numbers from OpenRouter. I am unsure why Gemini 3 Flash is listed as slower than Gemini 3 Pro. According to Artificial Analysis it should be 35% faster.
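
To make the pricing gap concrete, here is a rough sketch that turns the table above into a per-request cost and a generation time estimate. The prices and speeds are copied from the table; the function names and the example request size (10,000 input tokens, 1,000 output tokens) are my own assumptions for illustration, and real bills will vary with caching, reasoning tokens, and provider.

```python
# Prices ($ per million tokens) and speeds (tokens per second) copied from
# the table above. Tuple layout: (input price, output price, tokens/sec).
MODELS = {
    "GLM 4.6": (0.60, 2.20, 90),
    "Claude Sonnet 4.5": (3.00, 15.00, 57),
    "GPT 5.2": (1.75, 14.00, 34),
    "Kimi K2 Thinking": (0.60, 2.50, 25),
    "Gemini 3 Pro": (2.00, 12.00, 80),
    "Gemini 3 Flash": (0.50, 3.00, 75),
}


def request_cost(model, input_tokens, output_tokens):
    """Return the dollar cost of one request at the listed prices."""
    in_price, out_price, _ = MODELS[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000


def generation_seconds(model, output_tokens):
    """Return a rough time to generate the output at the listed speed."""
    _, _, tokens_per_second = MODELS[model]
    return output_tokens / tokens_per_second


if __name__ == "__main__":
    # Hypothetical request size, chosen only for illustration.
    for name in MODELS:
        cost = request_cost(name, 10_000, 1_000)
        secs = generation_seconds(name, 1_000)
        print(f"{name:<20} ${cost:.4f}  {secs:.1f}s")
```

At that request size, Gemini 3 Flash comes out at a quarter of the cost of Gemini 3 Pro, which is the kind of gap the intelligence per dollar argument rests on.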

Overall, this is a very solid model that I would check out if you are looking for a good-quality LLM that is fast and cheap.

Nemotron 3 Nano

Nvidia has been ramping up its open source releases as of late, and has now released the first model in their new Nemotron 3 series.

The model is the smallest in the family. Called Nemotron 3 Nano, it is a mixture-of-experts model with 30 billion parameters, 3 billion of them active, putting it at the same size as the Qwen3 30B model. Where it differs from its Qwen3 counterpart is its architecture. Instead of relying only on the usual transformer layers with quadratic attention, it uses Mamba-2 layers, which scale linearly with sequence length, allowing much better speeds as the sequence length increases.
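
To see why linear scaling matters at long contexts, here is a toy sketch that only counts operations. It is not how either architecture is implemented, and it ignores constant factors, hidden sizes, and hardware effects; the two function names are made up for this illustration.

```python
def attention_pair_count(seq_len):
    """Full self-attention compares every token with every token: O(n^2)."""
    return seq_len * seq_len


def linear_step_count(seq_len):
    """A linear-time layer does a fixed amount of work per token: O(n)."""
    return seq_len


# Compare how the two counts grow as the sequence gets longer.
for n in (1_000, 10_000, 100_000):
    quadratic = attention_pair_count(n)
    linear = linear_step_count(n)
    print(f"{n:>7} tokens: quadratic={quadratic:>15,}  linear={linear:>7,}  "
          f"ratio={quadratic // linear:,}x")
```

Doubling the sequence length doubles the linear count but quadruples the quadratic one, so the gap keeps widening as prompts get longer.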

Benchmarks and throughput measurements

For the throughput numbers, ISL is input sequence length (how many tokens are in the prompt) and OSL is output sequence length (how many tokens the model generates).

Unlike the Qwen models, which open source only the model itself and none of the training infrastructure or data, Nemotron 3 makes everything available, including the pretraining data, supervised finetuning data, and reinforcement learning environments.

In the real world, people seem to prefer Nemotron over Qwen, which was the previous best model for its size. This is also reflected in some of the internal benchmarks I have been making (official release coming soon), where it shows almost triple the real-world knowledge of Qwen3 30B.

If you run models at home, I would definitely recommend checking out this model if you haven’t already. Also expect releases of its two bigger brothers in the near future, which will be ~120 billion and ~480 billion parameters respectively.

GPT 5.2 Codex

OpenAI has released the coding-focused version of their GPT 5.2 model. It is finetuned specifically for use in their Codex coding framework, hence the name GPT 5.2 Codex.

SWE-Bench Pro Scores

There is not much that stands out about the model on the benchmark side of things. It is most likely just more reliable in the Codex harness that they have for it.

What I have been seeing people say about GPT 5.2 is that it excels at very hard tasks, but for day-to-day coding it is slow and, depending on the task, still may not be better than Opus 4.5.

I recently switched to Opus 4.5 on the Claude Max plan ($100/month) and have yet to run into anything that Opus was unable to crush in one or two prompts for me.

Quick Hits

SOTA Open Source TTS

The open source TTS community has been falling behind companies like ElevenLabs, especially when it comes to voice cloning.

We no longer have that issue, as Resemble AI has released their Chatterbox Turbo TTS model.

It is an MIT-licensed open source model that can do voice cloning from just 5 seconds of audio, has better performance than ElevenLabs Turbo v2.5, and has a 150ms time to first utterance.

I used it to clone my own voice on Hugging Face and it did a very good job. It is definitely far above any other open source model I have used, and is just as good as ElevenLabs v3 was for cloning my voice.

Benchmarks

Qwen Image Layered

Qwen has released an interesting new image model based on their Qwen Image and Edit lineup.

This model is a layer decomposition model, which allows you to take your reference image and split it into multiple layers for you to edit like you would in Photoshop.

The model can then be used to edit each layer for you as well, similar to Qwen Image Edit, giving you much more fine grained control over your images.

Like the other Qwen Image models, it is open source under an Apache 2.0 license.

Claude Code can use Chrome

A small quality of life improvement for those using Claude Code. If you run the /chrome command, you can set up Claude to be able to use Chrome directly using an extension, which will allow it to view and debug frontend issues all by itself.

Function Gemma

Alongside the Gemini 3 Flash release, Google also released a small 270 million parameter model called Function Gemma that has been trained specifically for function calling.

You are meant to finetune it for your specific task, since it is not a very general model given its small size. Its size does allow it to be deployed pretty much anywhere, and it will be able to run faster than pretty much any other edge LLM out there.
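
For anyone new to function calling, here is a minimal sketch of the application side: the model emits a structured call, and your code parses it and runs the matching function. The JSON shape, the tool names, and the dispatch helper are all hypothetical and chosen for illustration; they are not Function Gemma's actual output format, so check the model card before wiring anything up.

```python
import json


# Hypothetical tools an edge device might expose to the model.
def set_timer(minutes: int) -> str:
    return f"Timer set for {minutes} minutes"


def get_weather(city: str) -> str:
    return f"(stub) weather for {city}"


TOOLS = {"set_timer": set_timer, "get_weather": get_weather}


def dispatch(model_output: str) -> str:
    """Parse an assumed JSON function call and run the matching tool."""
    try:
        call = json.loads(model_output)
        name = call["name"]
        args = call["arguments"]
    except (json.JSONDecodeError, KeyError, TypeError):
        return "error: output was not a valid function call"

    tool = TOOLS.get(name)
    if tool is None:
        return f"error: unknown tool {name!r}"

    try:
        return tool(**args)
    except TypeError:
        # Wrong or missing argument names, or arguments that are not a dict.
        return f"error: bad arguments for {name!r}"


if __name__ == "__main__":
    # Stand-in for text a finetuned model might produce.
    print(dispatch('{"name": "set_timer", "arguments": {"minutes": 5}}'))
```

Because a small model will sometimes produce malformed calls, the sketch returns an error string instead of raising, so the caller can retry or fall back.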

Finish

I hope you enjoyed the news this week. If you want to get the news every week, be sure to join our mailing list below.

Algorithmic art by yuruyurau on Twitter
