FeedCity: Simon Willison's Weblog

XBai o4

Yet another open source (Apache 2.0) LLM from a Chinese AI lab. The model card claims: XBai o4 excels in complex reasoning capabilities and has now completely surpassed OpenAI-o3-mini in Medium mode. This is a 32.8 billion parameter model released by MetaStone AI, a ...

Simon Willison's Weblog
03 Aug 19:33

From Async/Await to Virtual Threads

Armin Ronacher has long been critical of async/await in Python, both for necessitating colored functions and because of the more subtle challenges they introduce like managing back pressure. Armin argued convincingly for the threaded progr...

Simon Willison's Weblog
02 Aug 20:12

Re-label the "Save" button to be "Publish", to better indicate to users the outcomes of their action

Fascinating Wikipedia usability improvement issue from 2016: From feedback we get repeatedly as a development team from interviews, user testing and other solicited and unso...

Simon Willison's Weblog
01 Aug 23:45

Faster inference

Two interesting examples of inference speed as a flagship feature of LLM services today. First, Cerebras announced two new monthly plans for their extremely high speed hosted model service: Cerebras Code Pro ($50/month, 1,000 messages a day) and Cerebras Code Max ($200/month...

Simon Willison's Weblog
01 Aug 17:15

Deep Think in the Gemini app

Google released Gemini 2.5 Deep Think this morning, exclusively to their Ultra ($250/month) subscribers: It is a variation of the model that recently achieved the gold-medal standard at this year's International Mathematical Olympiad (IMO). Whil...

Simon Willison's Weblog
01 Aug 16:03

July newsletter for sponsors is out

This morning I sent out the third edition of my LLM digest newsletter for my $10/month and higher sponsors on GitHub. It included the following section headers:

Claude Code
Model releases in July
Gold medal performances in the IMO
Reverse engineering system prompts
Tools I'm using at the moment

The newsletter is a condensed summary of highlights from the past month of my blog. I published 98 posts in July - the concept for the newsletter is that you can pay me for the version that only takes 10 minutes to read!

Here are the newsletters I sent out for June 2025 and May 2025, if you want a taste of what you'll be getting as a sponsor. New sponsors instantly get access to the archive of previous newsletters, including the one I sent this morning.

Tags: newsletter

Simon Willison's Weblog
01 Aug 14:57

Quoting Logan Kilpatrick

Gemini Deep Think, our SOTA model with parallel thinking that won the IMO Gold Medal 🥇, is now available in the Gemini App for Ultra subscribers!! [...]

Quick correction: this is a variation of our IMO gold model that is faster and more optimized for daily use! We are also giving the IMO gold full model to a set of mathematicians to test the value of the full capabilities.

— Logan Kilpatrick, announcing Gemini Deep Think

Tags: gemini, logan-kilpatrick, llm-reasoning, ai, llms, llm-release, google, generative-ai

Simon Willison's Weblog
01 Aug 00:12

Reverse engineering some updates to Claude

Anthropic released two major new features for their consumer-facing Claude apps in the past couple of days. Sadly, they don't do a very good job of updating the release notes for those apps - neither of these releases came with any documentation at all beyond short announcem...

Simon Willison's Weblog
31 Jul 22:18

More model releases on 31st July

Here are a few more model releases from today, to round out a very busy July: Cohere released Command A Vision, their first multi-modal (image input) LLM. Like their others it's open weights under Creative Commons Attribution Non-Commercial, so you need to license it (or us...

Simon Willison's Weblog
31 Jul 22:18

Quoting Christina Wodtke

The old timers who built the early web are coding with AI like it's 1995. Think about it: They gave blockchain the sniff test and walked away. Ignored crypto (and yeah, we're not rich now). NFTs got a collective eye roll. But AI? Different story. The same folks who hand-co...

Simon Willison's Weblog
31 Jul 20:12

Trying out Qwen3 Coder Flash using LM Studio and Open WebUI and LLM

Qwen just released their sixth model(!) of this July called Qwen3-Coder-30B-A3B-Instruct - listed as Qwen3-Coder-Flash in their chat.qwen.ai interface. It's 30.5B total parameters with 3.3B active at any one time. This means it will fit on a 64GB Mac - and even a 32GB Mac if...

Simon Willison's Weblog
31 Jul 01:24

Ollama's new app

Ollama has been one of my favorite ways to run local models for a while - it makes it really easy to download models, and it's smart about keeping them resident in memory while they are being used and then cleaning them out after they stop receiving traffic.

The one missing feature to date has been an interface: Ollama has been exclusively command-line, which is fine for the CLI-literate among us and not much use for everyone else.

They've finally fixed that! The new app's interface is accessible from the existing system tray menu and lets you chat with any of your installed models. Vision models can accept images through the new interface as well.

Via Hacker News

Tags: ai, generative-ai, local-llms, llms, ollama

Simon Willison's Weblog
30 Jul 21:54

Quoting Steve Krouse

When you vibe code, you are incurring tech debt as fast as the LLM can spit it out. Which is why vibe coding is perfect for prototypes and throwaway projects: It's only legacy code if you have to maintain it! [...]

The worst possible situation is to have a non-programmer vibe code a large project that they intend to maintain. This would be the equivalent of giving a credit card to a child without first explaining the concept of debt. [...]

If you don't understand the code, your only recourse is to ask AI to fix it for you, which is like paying off credit card debt with another credit card.

— Steve Krouse, Vibe code is legacy code

Tags: vibe-coding, ai-assisted-programming, generative-ai, steve-krouse, ai, llms

Simon Willison's Weblog
30 Jul 16:39

The best available open weight LLMs now come from China

Something that has become undeniable this month is that the best available open weight models now come from the Chinese AI labs. I continue to have a lot of love for Mistral, Gemma and Llama but my feeling is that Qwen, Moonshot and Z.ai have positively smoked them over the ...

Simon Willison's Weblog
30 Jul 16:09

Qwen3-30B-A3B-Thinking-2507

Yesterday was Qwen3-30B-A3B-Instruct-2507. Qwen are clearly committed to their new split between reasoning and non-reasoning models (a reversal from Qwen 3 in April), because today they released the new reasoning partner to yesterday's model: Qwen...

Simon Willison's Weblog
29 Jul 19:30

OpenAI: Introducing study mode

New ChatGPT feature, which can be triggered by typing /study or by visiting chatgpt.com/studymode. OpenAI say: Under the hood, study mode is powered by custom system instructions we’ve written in collaboration with teachers, scientists, and pe...

Simon Willison's Weblog
29 Jul 19:03

Qwen/Qwen3-30B-A3B-Instruct-2507

New model update from Qwen, improving on their previous Qwen3-30B-A3B release from late April. In their tweet they said: Smarter, faster, and local deployment-friendly. ✨ Key Enhancements: ✅ Enhanced reasoning, coding, and math skills ✅ Broa...

Simon Willison's Weblog
29 Jul 17:06

Quoting Nilay Patel

Our plan is to build direct traffic to our site. and newsletters just one kind of direct traffic in the end. I don’t intend to ever rely on someone else’s distribution ever again ;

— Nilay Patel, on The Verge's new newsletter strategy

Tags: nilay-patel, journalism, email

Simon Willison's Weblog
29 Jul 13:10

My 2.5 year old laptop can write Space Invaders in JavaScript now

I wrote about the new GLM-4.5 model family yesterday - new open weight (MIT licensed) models from Z.ai in China which their benchmarks claim score highly in coding even against models such as Claude Sonnet 4. The models are pretty big - the smaller GLM-4.5 Air model is still...

Simon Willison's Weblog
29 Jul 00:03

Quoting Anthropic

We’re rolling out new weekly rate limits for Claude Pro and Max in late August. We estimate they’ll apply to less than 5% of subscribers based on current usage. [...]

Some of the biggest Claude Code fans are running it continuously in the background, 24/7.

These uses are remarkable and we want to enable them. But a few outlying cases are very costly to support. For example, one user consumed tens of thousands in model usage on a $200 plan.

— Anthropic

Tags: anthropic, claude-code, llm-pricing, generative-ai, ai, llms

Simon Willison's Weblog
28 Jul 17:09

GLM-4.5: Reasoning, Coding, and Agentic Abilities

Another day, another significant new open weight model release from a Chinese frontier AI lab. This time it's Z.ai - who rebranded (at least in English) from Zhipu AI a few months ago. They just dropped GLM-4.5-Base, GLM-4.5...

Simon Willison's Weblog
28 Jul 00:09

The many, many, many JavaScript runtimes of the last decade

Extraordinary piece of writing by Jamie Birch who spent over a year putting together this comprehensive reference to JavaScript runtimes. It covers everything from Node.js, Deno, Electron, AWS Lambda, Cloudflare Workers and Bun all the way to much smaller projects like dukluv and txiki.js.

Via Hacker News

Tags: javascript, nodejs, deno

Simon Willison's Weblog
27 Jul 23:18

TIL: Exception.add_note

Neat tip from Danny Roy Greenfeld: Python 3.11 added a .add_note(message: str) method to the BaseException class, which means you can add one or more extra notes to any Python exception and they'll be displayed in the stacktrace!

Here's PEP 678 – Enriching Exceptions with Notes by Zac Hatfield-Dodds proposing the new feature back in 2021.

Via Lobste.rs

Tags: debugging, python

Simon Willison's Weblog
27 Jul 22:39

Enough AI copilots! We need AI HUDs

Geoffrey Litt compares Copilots - AI assistants that you engage in dialog with and that work with you to complete a task - with HUDs, Head-Up Displays, which enhance your working environment in less intrusive ways.

He uses spellcheck as an obvious example, providing underlines for incorrectly spelt words, and then suggests his AI-implemented custom debugging UI as a more ambitious implementation of that pattern.

Plenty of people have expressed interest in LLM-backed interfaces that go beyond chat or editor autocomplete. I think HUDs offer a really interesting way to frame one approach to that design challenge.

Tags: design, design-patterns, ai, generative-ai, llms, geoffrey-litt

Simon Willison's Weblog
26 Jul 16:34

Official statement from Tea on their data leak

Tea is a dating safety app for women that lets them share notes about potential dates. The other day it was subject to a truly egregious data leak caused by a legacy unprotected Firebase cloud storage bucket: A legacy data stor...

Simon Willison's Weblog
25 Jul 23:12

Qwen3-235B-A22B-Thinking-2507

The third Qwen model release this week, following Qwen3-235B-A22B-Instruct-2507 on Monday 21st and Qwen3-Coder-480B-A35B-Instruct on Tuesday 22nd. Those two were both non-reasoning models - a change from the previous models in the Qwen 3 family which...

Simon Willison's Weblog
24 Jul 15:48

Using GitHub Spark to reverse engineer GitHub Spark

GitHub Spark was released in public preview yesterday. It's GitHub's implementation of the prompt-to-app pattern also seen in products like Claude Artifacts, Lovable, Vercel v0, Val Town Townie and Fly.io’s Phoenix New. In this post I reverse engineer Spark and explore its f...

Simon Willison's Weblog
24 Jul 08:03

Quoting Recurse Center

[...] You learn best and most effectively when you are learning something that you care about. Your work becomes meaningful and something you can be proud of only when you have chosen it for yourself. This is why our second self-directive is to build your volitional muscles...

Simon Willison's Weblog
24 Jul 00:36

I Drank Every Cocktail

Adam Aaronson drank his way through all 102 cocktails on the IBA cocktails list - published by the International Bartenders Association since 1961, with the most recent update in 2024.

Adam's write-up is delightful, incorporating pedantry, data nerdery, a trip to the Internet Archive, some excellent bar recommendations in New York and London and hints at illicit rum smuggling to help make the final cocktail, the IBA Tiki, using two different Havana Club rums that are illegal in the USA thanks to import restrictions.

Via Andy Baio

Tags: cocktails

Simon Willison's Weblog
23 Jul 19:15

Instagram Reel: Veo 3 paid preview

@googlefordevs on Instagram published this reel featuring Christina Warren with prompting tips for the new Veo 3 paid preview (mp4 copy here).

(Christina checked first if I minded them using that concept. I did not!)

Tags: google, ai, generative-ai, gemini, pelican-riding-a-bicycle, text-to-video