FeedCity: Simon Willison's Weblog

Adding dynamic features to an aggressively cached website

My blog uses aggressive caching: it sits behind Cloudflare with a 15 minute cache header, which guarantees it can survive even the largest traffic spike to any given page. I've recently added a couple of dynamic features that work in spite of that full-page caching. Here's h...
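The excerpt is cut off before it explains the technique, so the following is not the author's method. It sketches one common way to combine full-page caching with dynamic features, as a minimal WSGI app. HTML pages get a 15 minute `Cache-Control: max-age=900` header so a CDN such as Cloudflare can serve them. A separate JSON endpoint is marked `no-store` so client-side JavaScript on the cached page can fetch fresh data. The `/api/` prefix, `PAGE_HTML` and the JSON payload are all hypothetical.

```python
import json

# Hypothetical stand-in for a fully rendered blog page.
PAGE_HTML = b"<html><body>Entry text<div id='dynamic'></div></body></html>"
CACHE_SECONDS = 15 * 60  # the 15 minute cache window, i.e. 900 seconds


def app(environ, start_response):
    """Minimal WSGI app: cacheable HTML plus an uncached JSON endpoint."""
    path = environ.get("PATH_INFO", "/")
    if path.startswith("/api/"):  # hypothetical prefix for dynamic data
        # Fetched by client-side JavaScript after the cached page loads,
        # so it is marked as never cacheable.
        body = json.dumps({"dynamic": True}).encode("utf-8")
        start_response("200 OK", [
            ("Content-Type", "application/json"),
            ("Cache-Control", "no-store"),
        ])
        return [body]
    # Everything else can be served from the CDN cache for 15 minutes.
    start_response("200 OK", [
        ("Content-Type", "text/html; charset=utf-8"),
        ("Cache-Control", f"max-age={CACHE_SECONDS}"),
    ])
    return [PAGE_HTML]
```

For local experiments this can be served with `wsgiref.simple_server.make_server("", 8000, app).serve_forever()`.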

Simon Willison's Weblog
28 Jan 22:18

The Five Levels: from Spicy Autocomplete to the Dark Factory

The Five Levels: from Spicy Autocomplete to the Dark Factory Dan Shapiro proposes a five level model of AI-assisted programming, inspired by the five (or rather six, it's zero-indexed) levels of driving automation. Spicy autocomplete, aka original GitHub Copilot or copying...

Simon Willison's Weblog
27 Jan 17:09

One Human + One Agent = One Browser From Scratch

One Human + One Agent = One Browser From Scratch embedding-shapes was so infuriated by the hype around Cursor's FastRender browser project - thousands of parallel agents producing ~1.6 million lines of Rust - that they were inspired to have a go at building a web browser usi...

Simon Willison's Weblog
27 Jan 15:36

Kimi K2.5: Visual Agentic Intelligence

Kimi K2.5: Visual Agentic Intelligence Kimi K2 landed in July as a 1 trillion parameter open weight LLM. It was joined by Kimi K2 Thinking in November which added reasoning capabilities. Now they've made it multi-modal: the K2 models were text-only, but the new 2.5 can handl...

Simon Willison's Weblog
27 Jan 00:24

Tips for getting coding agents to write good Python tests

Someone asked on Hacker News if I had any tips for getting coding agents to write decent quality tests. Here's what I said: I work in Python which helps a lot because there are a TON of good examples of pytest tests floating around in the training data, including things lik...

Simon Willison's Weblog
26 Jan 19:36

ChatGPT Containers can now run bash, pip/npm install packages, and download files

One of my favourite features of ChatGPT is its ability to write and execute code in a container. This feature launched as ChatGPT Code Interpreter nearly three years ago, was half-heartedly rebranded to "Advanced Data Analysis" at some point and is generally really difficult...

Simon Willison's Weblog
26 Jan 00:06

the browser is the sandbox

the browser is the sandbox Paul Kinlan is a web platform developer advocate at Google and recently turned his attention to coding agents. He quickly identified the importance of a robust sandbox for agents to operate in and put together these detailed notes on how the web br...

Simon Willison's Weblog
25 Jan 05:06

Kākāpō Cam: Rakiura live stream

Kākāpō Cam: Rakiura live stream Critical update for this year's Kākāpō breeding season: the New Zealand Department of Conservation have a livestream running of Rakiura's nest! You’re looking at the underground nest of 23-year-old Rakiura. She has chosen this same site to ne...

Simon Willison's Weblog
24 Jan 23:36

Don't "Trust the Process"

Don't "Trust the Process" Jenny Wen, Design Lead at Anthropic (and previously Director of Design at Figma) gave a provocative keynote at Hatch Conference in Berlin last September. Jenny argues that the Design Process - user research leading to personas leading to user journ...

Simon Willison's Weblog
24 Jan 21:42

Quoting Jasmine Sun

If you tell a friend they can now instantly create any app, they’ll probably say “Cool! Now I need to think of an idea.” Then they will forget about it, and never build a thing. The problem is not that your friend is horribly uncreative. It’s that most people’s problems are not software-shaped, and most won’t notice even when they are. [...]

Programmers are trained to see everything as a software-shaped problem: if you do a task three times, you should probably automate it with a script. Rename every IMG_*.jpg file from the last week to hawaii2025_*.jpg, they tell their terminal, while the rest of us painfully click and copy-paste. We are blind to the solutions we were never taught to see, asking for faster horses and never dreaming of cars.

— Jasmine Sun
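The rename task in the quote is the kind of job a programmer would script. Below is a minimal Python sketch. It assumes "from the last week" means a modification time within the past 7 days, and that the `*` part of `IMG_*.jpg` is kept unchanged after the new `hawaii2025_` prefix. The function name and its parameters are illustrative.

```python
import time
from pathlib import Path


def rename_recent_photos(folder, old_prefix="IMG_", new_prefix="hawaii2025_",
                         days=7, now=None):
    """Rename recent OLD_PREFIX*.jpg files to NEW_PREFIX*.jpg, keeping the suffix."""
    now = time.time() if now is None else now
    cutoff = now - days * 24 * 60 * 60
    renamed = []
    # sorted() takes a snapshot of the matches before any file is renamed
    for path in sorted(Path(folder).glob(f"{old_prefix}*.jpg")):
        if not path.is_file() or path.stat().st_mtime < cutoff:
            continue  # older than the window, or not a regular file
        target = path.with_name(new_prefix + path.name[len(old_prefix):])
        if target.exists():
            continue  # never overwrite an existing file
        path.rename(target)
        renamed.append(target.name)
    return renamed
```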

Tags: vibe-coding, coding-agents, claude-code, generative-ai, ai, llms

Simon Willison's Weblog
23 Jan 21:54

Wilson Lin on FastRender: a browser built by thousands of parallel agents

Last week Cursor published Scaling long-running autonomous coding, an article describing their research efforts into coordinating large numbers of autonomous coding agents. One of the projects mentioned in the article was FastRender, a web browser they built from scratch usi...

Simon Willison's Weblog
23 Jan 09:36

Quoting Theia Vogel

[...] i was too busy with work to read anything, so i asked chatgpt to summarize some books on state formation, and it suggested circumscription theory. there was already the natural boundary of my computer hemming the towns in, and town mayors played the role of big men to drive conflict. so i just needed a way for them to fight. i slightly tweaked the allocation of claude max accounts to the towns from a demand-based to a fixed allocation system. towns would each get a fixed amount of tokens to start, but i added a soldier role that could attack and defend in raids to steal tokens from other towns. [...]

— Theia Vogel, Gas Town fan fiction

Tags: parallel-agents, llms, ai, generative-ai

Simon Willison's Weblog
23 Jan 00:18

SSH has no Host header

SSH has no Host header exe.dev is a new hosting service that, for $20/month, gives you up to 25 VMs "that share 2 CPUs and 8GB RAM". Everything happens over SSH, including creating new VMs. Once configured you can sign into your exe.dev VMs like this: ssh simon.exe.dev Here...
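For context on the title: HTTP/1.1 clients send a `Host` header naming the site they want, which lets a single IP address serve many hostnames (name-based virtual hosting). An SSH client sends no equivalent field when it connects, so an SSH server cannot route by hostname from the protocol alone. The sketch below covers only the HTTP side, to show what SSH lacks. The `SITES` mapping and hostnames are made up.

```python
# Hypothetical mapping of hostnames served from a single shared IP address.
SITES = {
    "simon.example.com": "simon's VM",
    "alice.example.com": "alice's VM",
}


def route_http_request(raw_request: bytes) -> str:
    """Pick a backend using the HTTP/1.1 Host header, as name-based virtual hosts do."""
    head = raw_request.split(b"\r\n\r\n", 1)[0].decode("latin-1")
    for line in head.split("\r\n")[1:]:  # skip the request line
        name, _, value = line.partition(":")
        if name.strip().lower() == "host":
            hostname = value.strip().split(":")[0].lower()  # drop any :port
            return SITES.get(hostname, "404 unknown host")
    return "400 missing Host header"
```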

Simon Willison's Weblog
23 Jan 00:18

Quoting Chris Lloyd

Most people's mental model of Claude Code is that "it's just a TUI" but it should really be closer to "a small game engine".

For each frame our pipeline constructs a scene graph with React then:

-> layout elements
-> rasterize them to a 2d screen
-> diff that against the previous screen
-> finally use the diff to generate ANSI sequences to draw

We have a ~16ms frame budget so we have roughly ~5ms to go from the React scene graph to ANSI written.

— Chris Lloyd, Claude Code team at Anthropic
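A ~16ms budget corresponds to roughly 60 frames per second (1000 / 60 ≈ 16.7 ms). The toy Python sketch below covers the last three steps of the pipeline described in the quote: rasterizing laid-out elements to a 2D screen, diffing against the previous screen, and emitting ANSI sequences for only the changed characters. It is not Claude Code's implementation. It skips React and layout by assuming each element arrives as an already positioned `(row, col, text)` tuple, and it assumes both screens have the same dimensions.

```python
def rasterize(elements, width, height):
    """Draw already laid-out (row, col, text) elements onto a character grid."""
    grid = [[" "] * width for _ in range(height)]
    for row, col, text in elements:
        for i, ch in enumerate(text):
            if 0 <= row < height and 0 <= col + i < width:
                grid[row][col + i] = ch  # clip anything off-screen
    return ["".join(cells) for cells in grid]


def diff_to_ansi(prev, curr):
    """Emit ANSI output that turns screen `prev` into `curr` (same size)."""
    out = []
    for row, (old, new) in enumerate(zip(prev, curr)):
        col = 0
        while col < len(new):
            if old[col] == new[col]:
                col += 1
                continue
            start = col
            while col < len(new) and old[col] != new[col]:
                col += 1
            # ESC[row;colH moves the cursor (1-indexed); then write only the changed run
            out.append(f"\x1b[{row + 1};{start + 1}H{new[start:col]}")
    return "".join(out)
```

An unchanged frame produces an empty string, so nothing is written to the terminal.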

Tags: react, claude-code

Simon Willison's Weblog
22 Jan 18:06

Qwen3-TTS Family is Now Open Sourced: Voice Design, Clone, and Generation

Qwen3-TTS Family is Now Open Sourced: Voice Design, Clone, and Generation I haven't been paying much attention to the state-of-the-art in speech generation models other than noting that they've got really good, so I can't speak for how notable this new release from Qwen is. ...

Simon Willison's Weblog
22 Jan 15:51

Quoting Thariq Shihipar

Most people's mental model of Claude Code is that "it's just a TUI" but it should really be closer to "a small game engine".

For each frame our pipeline constructs a scene graph with React then

-> layouts elements
-> rasterizes them to a 2d screen
-> diffs that against the previous screen
-> finally uses the diff to generate ANSI sequences to draw

We have a ~16ms frame budget so we have roughly ~5ms to go from the React scene graph to ANSI written.

— Thariq Shihipar

Tags: react, claude-code

Simon Willison's Weblog
21 Jan 23:57

Claude's new constitution

Claude's new constitution Late last year Richard Weiss found something interesting while poking around with the just-released Claude Opus 4.5: he was able to talk the model into regurgitating a document which was not part of the system prompt but appeared instead to be baked...

Simon Willison's Weblog
20 Jan 23:27

Electricity use of AI coding agents

Electricity use of AI coding agents

Previous work estimating the energy and water cost of LLMs has generally focused on the cost per prompt using a consumer-level system such as ChatGPT.

Simon P. Couch notes that coding agents such as Claude Code use way more tokens in response to tasks, often burning through many thousands of tokens across many tool calls.

As a heavy Claude Code user, Simon estimates his own usage at the equivalent of 4,400 "typical queries" to an LLM, which corresponds to around $15-$20 in daily API token spend. He figures that is about the same as running a dishwasher once, or the energy a domestic refrigerator uses in a day.

Via Hacker News

Tags: ai, generative-ai, llms, ai-ethics, ai-energy-usage, coding-agents, claude-code

Simon Willison's Weblog
20 Jan 18:12

Giving University Exams in the Age of Chatbots

Giving University Exams in the Age of Chatbots

Detailed and thoughtful description of an open-book and open-chatbot exam run by Ploum at École Polytechnique de Louvain for an "Open Source Strategies" class.

Students were told they could use chatbots during the exam but they had to announce their intention to do so in advance, share their prompts and take full accountability for any mistakes they made.

Only 3 out of 60 students chose to use chatbots. Ploum surveyed half of the class to help understand their motivations.

Via lobste.rs

Tags: education, ai, generative-ai, llms, ai-ethics

Simon Willison's Weblog
20 Jan 00:27

jordanhubbard/nanolang

jordanhubbard/nanolang Plenty of people have mused about what a new programming language specifically designed to be used by LLMs might look like. Jordan Hubbard (co-founder of FreeBSD, with serious stints at Apple and NVIDIA) just released exactly that. A minimal, LLM-frie...

Simon Willison's Weblog
19 Jan 05:33

Scaling long-running autonomous coding

Scaling long-running autonomous coding Wilson Lin at Cursor has been doing some experiments to see how far you can push a large fleet of "autonomous" coding agents: This post describes what we've learned from running hundreds of concurrent agents on a single project, coordi...

Simon Willison's Weblog
19 Jan 00:28

FLUX.2-klein-4B Pure C Implementation

FLUX.2-klein-4B Pure C Implementation On 15th January Black Forest Labs, a lab formed by the creators of the original Stable Diffusion, released black-forest-labs/FLUX.2-klein-4B - an Apache 2.0 licensed 4 billion parameter version of their FLUX.2 family. Salvatore Sanfilipp...

Simon Willison's Weblog
17 Jan 17:12

Quoting Jeremy Daer

[On agents using CLI tools in place of REST APIs] To save on context window, yes, but moreso to improve accuracy and success rate when multiple tool calls are involved, particularly when calls must be correctly chained e.g. for pagination, rate-limit backoff, and recognizin...
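The quote mentions tool calls that must be chained correctly for pagination and rate-limit backoff. Below is a minimal Python sketch of that chaining wrapped in a single function, the sort of logic a CLI tool can hide from an agent. `fetch_page`, `RateLimited` and the cursor scheme are hypothetical stand-ins rather than any real API client. The `sleep` parameter is injectable so the backoff can be tested without waiting.

```python
import time
from typing import Callable, List, Optional, Tuple


class RateLimited(Exception):
    """Hypothetical error a client raises when the API answers HTTP 429."""


Page = Tuple[List[object], Optional[str]]  # (items, next_cursor or None)


def fetch_all(fetch_page: Callable[[Optional[str]], Page],
              max_retries: int = 5,
              base_delay: float = 0.5,
              sleep: Callable[[float], None] = time.sleep) -> List[object]:
    """Follow cursors until exhausted, backing off exponentially on rate limits."""
    items: List[object] = []
    cursor: Optional[str] = None
    while True:
        for attempt in range(max_retries):
            try:
                page, next_cursor = fetch_page(cursor)
                break
            except RateLimited:
                sleep(base_delay * 2 ** attempt)  # 0.5s, 1s, 2s, ...
        else:
            raise RuntimeError("still rate limited after retries")
        items.extend(page)
        if next_cursor is None:
            return items
        cursor = next_cursor
```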

Simon Willison's Weblog
16 Jan 21:42

Our approach to advertising and expanding access to ChatGPT

Our approach to advertising and expanding access to ChatGPT The long-rumored introduction of ads to ChatGPT just became a whole lot more concrete: In the coming weeks, we’re also planning to start testing ads in the U.S. for the free and Go tiers, so more people can benefit...

Simon Willison's Weblog
16 Jan 00:15

Open Responses

Open Responses This is the standardization effort I've most wanted in the world of LLMs: a vendor-neutral specification for the JSON API that clients can use to talk to hosted LLMs. Open Responses aims to provide exactly that as a documented standard, derived from OpenAI's R...

Simon Willison's Weblog
15 Jan 16:36

The Design & Implementation of Sprites

The Design & Implementation of Sprites I wrote about Sprites last week. Here's Thomas Ptacek from Fly with the insider details on how they work under the hood. I like this framing of them as "disposable computers": Sprites are ball-point disposable computers. Whatever ma...

Simon Willison's Weblog
15 Jan 01:18

Quoting Boaz Barak, Gabriel Wu, Jeremy Chen and Manas Joglekar

When we optimize responses using a reward model as a proxy for “goodness” in reinforcement learning, models sometimes learn to “hack” this proxy and output an answer that only “looks good” to it (because coming up with an answer that is actually good can be hard). The philos...

Simon Willison's Weblog
14 Jan 22:42

Claude Cowork Exfiltrates Files

Claude Cowork Exfiltrates Files

Claude Cowork defaults to allowing outbound HTTP traffic to only a specific list of domains, to help protect the user against prompt injection attacks that exfiltrate their data.

Prompt Armor found a creative workaround: Anthropic's API domain is on that list, so they constructed an attack that includes an attacker's own Anthropic API key and has the agent upload any files it can see to the https://api.anthropic.com/v1/files endpoint, allowing the attacker to retrieve their content later.
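The weakness is easier to see in code. Below is a minimal Python sketch of a hostname-only egress allowlist, a hypothetical simplification rather than Cowork's actual implementation. The allowlist contents and the API key values are placeholders. Because the decision looks only at the destination host, an upload to `api.anthropic.com` authenticated with the user's key and one authenticated with an attacker's key are treated identically.

```python
from urllib.parse import urlsplit

# Hypothetical egress allowlist; the real Cowork list isn't reproduced here.
ALLOWED_HOSTS = {"api.anthropic.com", "pypi.org"}


def egress_allowed(url: str, headers: dict) -> bool:
    """Allow a request purely on its destination hostname."""
    host = urlsplit(url).hostname  # urllib lower-cases the hostname
    # `headers` is deliberately ignored: a host-only check cannot tell
    # whose API key is attached to the request.
    return host is not None and host in ALLOWED_HOSTS
```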

Via Hacker News

Tags: security, ai, prompt-injection, generative-ai, llms, anthropic, exfiltration-attacks, ai-agents, claude-code, lethal-trifecta

Simon Willison's Weblog
14 Jan 00:12

Anthropic invests $1.5 million in the Python Software Foundation and open source security

Anthropic invests $1.5 million in the Python Software Foundation and open source security This is outstanding news, especially given our decision to withdraw from that NSF grant application back in October. We are thrilled to announce that Anthropic has entered into a two-y...

Simon Willison's Weblog
14 Jan 00:12

TIL from taking Neon I at the Crucible

TIL from taking Neon I at the Crucible

Things I learned about making neon signs after a week-long intensive evening class at the Crucible in Oakland.

Tags: art, til