Sign up

Simon Willison's Weblog

Not verified No WebSub updates Supports Webmention Not yet validated

Author
Simon Willison
Public lists
Featured
Fetched

Simon Willison's Weblog Supports Webmention

Chromium Docs: The Rule Of 2

Chromium Docs: The Rule Of 2 Alex Russell pointed me to this principle in the Chromium security documentation as similar to my description of the lethal trifecta. First added in 2019, the Chromium guideline states: When you write code to parse, evaluate, or otherwise handle...

Simon Willison's Weblog Supports Webmention

Qwen3-4B-Thinking: "This is art - pelicans don't ride bikes!"

I've fallen a few days behind keeping up with Qwen. They released two new 4B models last week: Qwen3-4B-Instruct-2507 and its thinking equivalent Qwen3-4B-Thinking-2507. These are relatively tiny models that punch way above their weight. I’ve been running the 8bit GGUF vari...

Simon Willison's Weblog Supports Webmention

Quoting Sam Altman

the percentage of users using reasoning models each day is significantly increasing; for example, for free users we went from <1% to 7%, and for plus users from 7% to 24%.

Sam Altman, revealing quite how few people used the old model picker to upgrade from GPT-4o

Tags: openai, llm-reasoning, ai, llms, gpt-5, sam-altman, generative-ai, chatgpt

Simon Willison's Weblog Supports Webmention

Quoting Ethan Mollick

The issue with GPT-5 in a nutshell is that unless you pay for model switching & know to use GPT-5 Thinking or Pro, when you ask “GPT-5” you sometimes get the best available AI & sometimes get one of the worst AIs available and it might even switch within a single conversation.

Ethan Mollick, highlighting that GPT-5 (high) ranks top on Artificial Analysis, GPT-5 (minimal) ranks lower than GPT-4.1

Tags: gpt-5, ethan-mollick, generative-ai, ai, llms

Simon Willison's Weblog Supports Webmention

Quoting Thomas Dohmke

You know what else we noticed in the interviews? Developers rarely mentioned “time saved” as the core benefit of working in this new way with agents. They were all about increasing ambition. We believe that means that we should update how we talk about (and measure) success when using these tools, and we should expect that after the initial efficiency gains our focus will be on raising the ceiling of the work and outcomes we can accomplish, which is a very different way of interpreting tool investments.

Thomas Dohmke, CEO, GitHub

Tags: careers, coding-agents, ai-assisted-programming, generative-ai, ai, github, llms

Simon Willison's Weblog Supports Webmention

When a Jira Ticket Can Steal Your Secrets

When a Jira Ticket Can Steal Your Secrets Zenity Labs describe a classic lethal trifecta attack, this time against Cursor, MCP, Jira and Zendesk. They also have a short video demonstrating the issue. Zendesk support emails are often connected to Jira, such that incoming supp...

Simon Willison's Weblog Supports Webmention

My Lethal Trifecta talk at the Bay Area AI Security Meetup

I gave a talk on Wednesday at the Bay Area AI Security Meetup about prompt injection, the lethal trifecta and the challenges of securing systems that use MCP. It wasn't recorded but I've created an annotated presentation with my slides and detailed notes on everything I talk...

Simon Willison's Weblog Supports Webmention

Hypothesis is now thread-safe

Hypothesis is now thread-safe Hypothesis is a property-based testing library for Python. It lets you write tests like this one: from hypothesis import given, strategies as st @given(st.lists(st.integers())) def test_matches_builtin(ls): assert sorted(ls) == my_sort(ls) ...

Simon Willison's Weblog Supports Webmention

Quoting @pearlmania500

I have a toddler. My biggest concern is that he doesn't eat rocks off the ground and you're talking to me about ChatGPT psychosis? Why do we even have that? Why did we invent a new form of insanity and then they charge people for it?

@pearlmania500, on TikTok

Tags: ai-ethics, chatgpt, tiktok, ai

Simon Willison's Weblog Supports Webmention

Quoting Sam Altman

GPT-5 rollout updates: We are going to double GPT-5 rate limits for ChatGPT Plus users as we finish rollout. We will let Plus users choose to continue to use 4o. We will watch usage as we think about how long to offer legacy models for. GPT-5 will seem smarter starting tod...

Simon Willison's Weblog Supports Webmention

The surprise deprecation of GPT-4o for ChatGPT consumers

I've been dipping into the r/ChatGPT subreddit recently to see how people are reacting to the GPT-5 launch, and so far the vibes there are not good. This AMA thread with the OpenAI team is a great illustration of the single biggest complaint: a lot of people are very unhappy...

Simon Willison's Weblog Supports Webmention

Previewing GPT-5 at OpenAI's office

A couple of weeks ago I was invited to OpenAI's headquarters for a "preview event", for which I had to sign both an NDA and a video release waiver. I suspected it might relate to either GPT-5 or the OpenAI open weight models... and GPT-5 it was!

OpenAI had invited five developers: Claire Vo, Theo Browne, Ben Hylak, Shawn @swyx Wang, and myself. We were all given early access to the new models and asked to spend a couple of hours (of paid time) experimenting with them, while being filmed by a professional camera crew.

The resulting video is now up on YouTube. Unsurprisingly most of my edits related to SVGs of pelicans.

Tags: youtube, gpt-5, generative-ai, openai, pelican-riding-a-bicycle, ai, llms

Simon Willison's Weblog Supports Webmention

GPT-5: Key characteristics, pricing and model card

I've had preview access to the new GPT-5 model family for the past two weeks, and have been using GPT-5 as my daily-driver. It's my new favorite model. It's still an LLM - it's not a dramatic departure from what we've had before - but it rarely screws up and generally feels ...

Simon Willison's Weblog Supports Webmention

Jules, our asynchronous coding agent, is now available for everyone

Jules, our asynchronous coding agent, is now available for everyone

I wrote about the Jules beta back in May. Google's version of the OpenAI Codex PR-submitting hosted coding tool graduated from beta today.

I'm mainly linking to this now because I like the new term they are using in this blog entry: Asynchronous coding agent. I like it so much I gave it a tag.

I continue to avoid the term "agent" as infuriatingly vague, but I can grudgingly accept it when accompanied by a prefix that clarifies the type of agent we are talking about. "Asynchronous coding agent" feels just about obvious enough to me to be useful.

Via Hacker News

Tags: google, ai, generative-ai, llms, ai-assisted-programming, gemini, agent-definitions, asynchronous-coding-agents

Simon Willison's Weblog Supports Webmention

Tom MacWright: Observable Notebooks 2.0

Tom MacWright: Observable Notebooks 2.0 Observable announced Observable Notebooks 2.0 last week - the latest take on their JavaScript notebook technology, this time with an open file format and a brand new macOS desktop app. Tom MacWright worked at Observable during their fi...

Simon Willison's Weblog Supports Webmention

Qwen3-4B Instruct and Thinking

Qwen3-4B Instruct and Thinking

Yet another interesting model from Qwen—these are tiny compared to their other recent releases (just 4B parameters, 7.5GB on Hugging Face and even smaller when quantized) but with a 262,144 context length, which Qwen suggest is essential for all of those thinking tokens.

The new model somehow beats the significantly larger Qwen3-30B-A3B Thinking on the AIME25 and HMMT25 benchmarks, according to Qwen’s self-reported scores.

The easiest way to try it on a Mac is via LM Studio, who already have their own MLX quantized versions out in

Simon Willison's Weblog Supports Webmention

Quoting Artificial Analysis

gpt-oss-120b is the most intelligent American open weights model, comes behind DeepSeek R1 and Qwen3 235B in intelligence but offers efficiency benefits [...]

We’re seeing the 120B beat o3-mini but come in behind o4-mini and o3. The 120B is the most intelligent model that can be run on a single H100 and the 20B is the most intelligent model that can be run on a consumer GPU. [...]

While the larger gpt-oss-120b does not come in above DeepSeek R1 0528’s score of 59 or Qwen3 235B 2507s score of 64, it is notable that it is significantly smaller in both total and active parameters than both of those models.

Artificial Analysis, see also their updated leaderboard

Tags: evals, openai, deepseek, ai, qwen, llms, gpt-oss, generative-ai

Simon Willison's Weblog Supports Webmention

No, AI is not Making Engineers 10x as Productive

No, AI is not Making Engineers 10x as Productive Colton Voege on "curing your AI 10x engineer imposter syndrome". There's a lot of rhetoric out there suggesting that if you can't 10x your productivity through tricks like running a dozen Claude Code instances at once you're f...

Simon Willison's Weblog Supports Webmention

OpenAI's new open weight (Apache 2) models are really good

The long promised OpenAI open weight models are here, and they are very impressive. They're available under proper open source licenses - Apache 2.0 - and come in two sizes, 120B and 20B. OpenAI's own benchmarks are eyebrow-raising - emphasis mine: The gpt-oss-120b model ac...

Simon Willison's Weblog Supports Webmention

Claude Opus 4.1

Claude Opus 4.1 Surprise new model from Anthropic today - Claude Opus 4.1, which they describe as "a drop-in replacement for Opus 4". My favorite thing about this model is the version number - treating this as a .1 version increment looks like it's an accurate depiction of t...

Simon Willison's Weblog Supports Webmention

Quoting greyduet on r/teachers

I teach HS Science in the south. I can only speak for my district, but a few teacher work days in the wave of enthusiasm I'm seeing for AI tools is overwhelming. We're getting district approved ads for AI tools by email, Admin and ICs are pushing it on us, and at least half...

Simon Willison's Weblog Supports Webmention

ChatGPT agent's user-agent

I was exploring how ChatGPT agent works today. I learned some interesting things about how it exposes its identity through HTTP headers, then made a huge blunder in thinking it was leaking its URLs to Bingbot and Yandex... but it turned out that was a Cloudflare feature that...

Simon Willison's Weblog Supports Webmention

A Friendly Introduction to SVG

A Friendly Introduction to SVG

This SVG tutorial by Josh Comeau is fantastic. It's filled with newt interactive illustrations - with a pleasing subtly "click" audio effect as you adjust their sliders - and provides a useful introduction to a bunch of well chosen SVG fundamentals.

I finally understand what all four numbers in the viewport="..." attribute are for!

Via Lobste.rs

Tags: svg, explorables, josh-comeau

Simon Willison's Weblog Supports Webmention

ChatGPT agent triggers crawls from Bingbot and Yandex

ChatGPT agent is the recently released (and confusingly named) ChatGPT feature that provides browser automation combined with terminal access as a feature of ChatGPT - replacing their previous Operator research preview which is scheduled for deprecation on August 31st. In e...

Simon Willison's Weblog Supports Webmention

Usage charts for my LLM tool against OpenRouter

Usage charts for my LLM tool against OpenRouter

OpenRouter proxies requests to a large number of different LLMs and provides high level statistics of which models are the most popular among their users.

Tools that call OpenRouter can include HTTP-Referer and X-Title headers to credit that tool with the token usage. My llm-openrouter plugin does that here.

... which means this page displays aggregate stats across users of that plugin! Looks like someone has been running a lot of traffic through Qwen 3 14B recently.

Screenshot of LLM usage statistics dashboard showing a stacked bar chart from July 5 to August 4, 2025, with a legend on the right displaying "Top models" including Qwen: Qwen3 14B (480M), Google: Gemini 2.5 Flash Lite Preview 06-17 (31.7M), Horizon Beta (3.77M), Google: Gemini 2.5 Flash Lite (1.67M), google/gemini-2.0-flash-exp (1.14M), DeepSeek: DeepSeek V3 0324 (1.11M), Meta: Llama 3.3 70B Instruct (228K), Others (220K), Qwen: Qwen3 Coder (218K), MoonshotAI: Kimi K2 (132K), and Horizon Alpha (75K), with a total of 520M usage shown for August 3, 2025.

Tags: ai, generative-ai, llms, llm, openrouter

Simon Willison's Weblog Supports Webmention

Qwen-Image: Crafting with Native Text Rendering

Qwen-Image: Crafting with Native Text Rendering Not content with releasing six excellent open weights LLMs in July, Qwen are kicking off August with their first ever image generation model. Qwen-Image is a 20 billion parameter MMDiT (Multimodal Diffusion Transformer, origina...

Simon Willison's Weblog Supports Webmention

I Saved a PNG Image To A Bird

I Saved a PNG Image To A Bird

Benn Jordan provides one of the all time great YouTube video titles, and it's justified. He drew an image in an audio spectrogram, played that sound to a talented starling (Internet Celebrity "The Mouth") and recorded the result that the starling almost perfectly imitated back to him.

This video is full of so much more than just that. Fast forward to 5m58s for footage of a nest full of brown pelicans showing the sounds made by their chicks!

Tags: audio, youtube

Simon Willison's Weblog Supports Webmention

Quoting @himbodhisattva

for services that wrap GPT-3, is it possible to do the equivalent of sql injection? like, a prompt-injection attack? make it think it's completed the task and then get access to the generation, and ask it to repeat the original instruction?

@himbodhisattva, coining the term prompt injection on 13th May 2022, four months before I did

Tags: prompt-injection, security, generative-ai, ai, llms

Simon Willison's Weblog Supports Webmention

Quoting Nick Turley

This week, ChatGPT is on track to reach 700M weekly active users — up from 500M at the end of March and 4× since last year.

Nick Turley, Head of ChatGPT, OpenAI

Tags: openai, chatgpt, ai

Simon Willison's Weblog Supports Webmention

The ChatGPT sharing dialog demonstrates how difficult it is to design privacy preferences

ChatGPT just removed their "make this chat discoverable" sharing feature, after it turned out a material volume of users had inadvertantly made their private chats available via Google search. Dane Stuckey, CISO for OpenAI, on Twitter: We just removed a feature from @ChatGP...