FeedCity: Simon Willison's Weblog

Introducing Showboat and Rodney, so agents can demo what they’ve built

A key challenge working with coding agents is having them both test what they’ve built and demonstrate that software to you, their overseer. This goes beyond automated tests - we need artifacts that show their progress and help us see exactly what the agent-produced software...

Simon Willison's Weblog
10 Feb 00:18

Structured Context Engineering for File-Native Agentic Systems

New paper by Damon McMillan exploring challenging LLM context tasks involving large SQL schemas (up to 10,000 tables) across different models and file formats: Using SQL generation as a proxy for programmatic ag...

Simon Willison's Weblog
09 Feb 17:12

AI Doesn’t Reduce Work—It Intensifies It

Aruna Ranganathan and Xingqi Maggie Ye from Berkeley Haas School of Business report initial findings in the HBR from their April to December 2025 study of 200 employees at a "U.S.-based technology company". This captures an effec...

Simon Willison's Weblog
08 Feb 17:40

Kākāpō mug by Karen James

Friend and neighbour Karen James made me a Kākāpō mug. It has a charismatic Kākāpō, four Kākāpō chicks (in celebration of the 2026 breeding season) and even has some rimu fruit! I love it so much. Tags: kakapo, art

Simon Willison's Weblog
08 Feb 03:00

Quoting Thomas Ptacek

People on the orange site are laughing at this, assuming it's just an ad and that there's nothing to it. Vulnerability researchers I talk to do not think this is a joke. As an erstwhile vuln researcher myself: do not bet against LLMs on this. Axios: Anthropic's Claude Opus ...

Simon Willison's Weblog
08 Feb 00:06

Vouch

Mitchell Hashimoto's new system to help address the deluge of worthless AI-generated PRs faced by open source projects now that the friction involved in contributing has dropped so low. He says: The idea is simple: Unvouched users can't contribute to your projects. Ve...

Simon Willison's Weblog
07 Feb 23:33

Claude: Speed up responses with fast mode

New "research preview" from Anthropic today: you can now access a faster version of their frontier model Claude Opus 4.6 by typing /fast in Claude Code... but at a cost that's 6x the normal price. Opus is usually $5/million input and...

Simon Willison's Weblog
07 Feb 21:45

Quoting David Crawshaw

I am having more fun programming than I ever have, because so many more of the programs I wish I could find the time to write actually exist. I wish I could share this joy with the people who are fearful about the changes agents are bringing. The fear itself I understand, I have fear more broadly about what the end-game is for intelligence on tap in our society. But in the limited domain of writing computer programs these tools have brought so much exploration and joy to my work.

— David Crawshaw, Eight more months of agents

Tags: coding-agents, ai-assisted-programming, generative-ai, ai, llms

Simon Willison's Weblog
07 Feb 15:54

How StrongDM's AI team build serious software without even looking at the code

Last week I hinted at a demo I had seen from a team implementing what Dan Shapiro called the Dark Factory level of AI adoption, where no human even looks at the code the coding agents are producing. That team was part of StrongDM, and they've just shared the first public des...

Simon Willison's Weblog
07 Feb 00:03

Quoting Tom Dale

I don't know why this week became the tipping point, but nearly every software engineer I've talked to is experiencing some degree of mental health crisis.

[...] Many people assuming I meant job loss anxiety but that's just one presentation. I'm seeing near-manic episodes triggered by watching software shift from scarce to abundant. Compulsive behaviors around agent usage. Dissociative awe at the temporal compression of change. It's not fear necessarily just the cognitive overload from living in an inflection point.

— Tom Dale

Tags: ai-ethics, careers, coding-agents, generative-ai, ai, llms

Simon Willison's Weblog
06 Feb 22:48

Running Pydantic's Monty Rust sandboxed Python subset in WebAssembly

There's a jargon-filled headline for you! Everyone's building sandboxes for running untrusted code right now, and Pydantic's latest attempt, Monty, provides a custom Python-like language (a subset of Python) in Rust and makes it available as both a Rust library and a Python ...

Simon Willison's Weblog
06 Feb 18:54

An Update on Heroku

An ominous headline to see on the official Heroku blog and yes, it's bad news. Today, Heroku is transitioning to a sustaining engineering model focused on stability, security, reliability, and support. Heroku remains an actively supported, production-rea...

Simon Willison's Weblog
06 Feb 01:12

Quoting Karel D'Oosterlinck

When I want to quickly implement a one-off experiment in a part of the codebase I am unfamiliar with, I get codex to do extensive due diligence. Codex explores relevant slack channels, reads related discussions, fetches experimental branches from those discussions, and cherry picks useful changes for my experiment. All of this gets summarized in an extensive set of notes, with links back to where each piece of information was found. Using these notes, codex wires the experiment and makes a bunch of hyperparameter decisions I couldn’t possibly make without much more effort.

— Karel D'Oosterlinck, I spent $10,000 to automate my research at OpenAI with Codex

Tags: codex-cli, coding-agents, ai-assisted-programming, generative-ai, openai, ai, llms

Simon Willison's Weblog
06 Feb 00:03

Mitchell Hashimoto: My AI Adoption Journey

Some really good and unconventional tips in here for getting to a place with coding agents where they demonstrably improve your workflow and productivity. I particularly liked: Reproduce your own work - when learning to use coding...

Simon Willison's Weblog
05 Feb 20:36

Opus 4.6 and Codex 5.3

Two major new model releases today, within about 15 minutes of each other. Anthropic released Opus 4.6. Here's its pelican: OpenAI released GPT-5.3-Codex, albeit only via their Codex app, not yet in their API. Here's its pelican: I've had a bit of preview access to both of ...

Simon Willison's Weblog
05 Feb 00:45

Spotlighting The World Factbook as We Bid a Fond Farewell

Somewhat devastating news today from the CIA: One of CIA’s oldest and most recognizable intelligence publications, The World Factbook, has sunset. There's not even a hint as to why they decided to stop maintaining t...

Simon Willison's Weblog
04 Feb 23:06

Voxtral transcribes at the speed of sound

Mistral just released Voxtral Transcribe 2 - a family of two new models, one open weights, for transcribing audio to text. This is the latest in their Whisper-like model family, and a sequel to the original Voxtral which they release...

Simon Willison's Weblog
04 Feb 15:28

Distributing Go binaries like sqlite-scanner through PyPI using go-to-wheel

I've been exploring Go for building small, fast and self-contained binary applications recently. I'm enjoying how there's generally one obvious way to do things and the resulting code is boring and readable - and something that LLMs are very competent at writing. The one cat...

Simon Willison's Weblog
03 Feb 23:18

Introducing Deno Sandbox

Here's a new hosted sandbox product from the Deno team. It's actually unrelated to Deno itself - this is part of their Deno Deploy SaaS platform. As such, you don't even need to use JavaScript to access it - you can create and execute code in a hoste...

Simon Willison's Weblog
03 Feb 07:06

January sponsors-only newsletter is out

I just sent the January edition of my sponsors-only monthly newsletter. If you are a sponsor (or if you start a sponsorship now) you can access it here. In the newsletter for January:

LLM predictions for 2026
Coding agents get even more attention
Clawdbot/Moltbot/OpenClaw went very viral
Kakapo breeding season is off to a really strong start
New options for sandboxes
Web browsers are the "hello world" of coding agent swarms
Sam Altman addressed the Jevons paradox for software engineering
Model releases and miscellaneous extras

Here's a copy of the December newsletter as a preview of what you'll get. Pay $10/month to stay a month ahead of the free copy!

Tags: newsletter

Simon Willison's Weblog
03 Feb 03:00

Quoting Brandon Sanderson

This is the difference between Data and a large language model, at least the ones operating right now. Data created art because he wanted to grow. He wanted to become something. He wanted to understand. Art is the means by which we become what we want to be. [...] The book,...

Simon Willison's Weblog
02 Feb 20:15

Introducing the Codex app

OpenAI just released a new macOS app for their Codex coding agent. I've had a few days of preview access - it's a solid app that provides a nice UI over the capabilities of the Codex CLI agent and adds some interesting new features, most notably fir...

Simon Willison's Weblog
02 Feb 17:12

A Social Network for A.I. Bots Only. No Humans Allowed.

I talked to Cade Metz for this New York Times piece on OpenClaw and Moltbook. Cade reached out after seeing my blog post about that from the other day. In a first for me, they decided to send a photographer, Jason Henry...

Simon Willison's Weblog
02 Feb 00:30

TIL: Running OpenClaw in Docker

I've been running OpenClaw using Docker on my Mac. Here is the first of my ongoing notes on how I set that up and the commands I'm using to administer it.

Here's a screenshot of the web UI that this serves on localhost:

Tags: ai, docker, til, generative-ai, llms, ai-agents, openclaw

Simon Willison's Weblog
31 Jan 22:06

Quoting Andrej Karpathy

Originally in 2019, GPT-2 was trained by OpenAI on 32 TPU v3 chips for 168 hours (7 days), with $8/hour/TPUv3 back then, for a total cost of approx. $43K. It achieves 0.256525 CORE score, which is an ensemble metric introduced in the DCLM paper over 22 evaluations like ARC/MMLU/etc.

As of the last few improvements merged into nanochat (many of them originating in modded-nanogpt repo), I can now reach a higher CORE score in 3.04 hours (~$73) on a single 8XH100 node. This is a 600X cost reduction over 7 years, i.e. the cost to train GPT-2 is falling approximately 2.5X every year.

— Andrej Karpathy
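
The arithmetic in the quote can be checked with a few lines of Python. This is only a sketch that reuses the figures quoted above (32 TPU v3 chips, 168 hours, $8 per chip-hour, about $73 today, 7 years); the variable names are illustrative and nothing here is independently measured.

```python
# Figures taken from the quote above; variable names are illustrative only.
tpu_chips = 32            # TPU v3 chips used in 2019
hours_2019 = 168          # training time in hours (7 days)
usd_per_chip_hour = 8     # quoted 2019 price per TPU v3 chip-hour

cost_2019 = tpu_chips * hours_2019 * usd_per_chip_hour
cost_now = 73             # quoted approximate cost on one 8XH100 node
years = 7

# Overall reduction, and the constant yearly factor that compounds to it.
reduction = cost_2019 / cost_now
annual_factor = reduction ** (1 / years)

print(f"2019 cost: ${cost_2019:,}")
print(f"Overall reduction: {reduction:.0f}x")
print(f"Implied yearly factor: {annual_factor:.2f}x")
```

Under these inputs the 2019 cost works out to $43,008, the overall reduction to a little under 600x, and the implied yearly factor to just under 2.5x, which matches the rounded numbers in the quote.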

Tags: andrej-karpathy, gpt-2, generative-ai, ai, llms, openai

Simon Willison's Weblog
31 Jan 01:33

Singing the gospel of collective efficacy

Lovely piece from Matt Webb about how you can "just do things" to help make your community better for everyone: Similarly we all love when the swifts visit (beautiful birds), so somebody started a group to get swift nest boxes made ...

Simon Willison's Weblog
30 Jan 22:48

Quoting Steve Yegge

Getting agents using Beads requires much less prompting, because Beads now has 4 months of “Desire Paths” design, which I’ve talked about before. Beads has evolved a very complex command-line interface, with 100+ subcommands, each with many sub-subcommands, aliases, alternate syntaxes, and other affordances.

The complicated Beads CLI isn’t for humans; it’s for agents. What I did was make their hallucinations real, over and over, by implementing whatever I saw the agents trying to do with Beads, until nearly every guess by an agent is now correct.

— Steve Yegge, Software Survival 3.0

Tags: steve-yegge, coding-agents, generative-ai, ai-agents, ai, llms

Simon Willison's Weblog
30 Jan 17:00

Moltbook is the most interesting place on the internet right now

The hottest project in AI right now is Clawdbot, renamed to Moltbot, renamed to OpenClaw. It's an open source implementation of the digital personal assistant pattern, built by Peter Steinberger to integrate with the messaging system of your choice. It's two months old, has ...

Simon Willison's Weblog
30 Jan 04:30

We gotta talk about AI as a programming tool for the arts

Chris Ashworth is the creator and CEO of QLab, a macOS software package for “cue-based, multimedia playback” which is designed to automate lighting and audio for live theater productions. I recently started following him...

Simon Willison's Weblog
29 Jan 17:54

Datasette 1.0a24

New Datasette alpha this morning. Key new features: Datasette's Request object can now handle multipart/form-data file uploads via the new await request.form(files=True) method. I plan to use this for a datasette-files plugin to support attaching files to ...