Simon Willison's Weblog

Author
Simon Willison

Simon Willison's Weblog Supports Webmention

Introducing Showboat and Rodney, so agents can demo what they’ve built

A key challenge working with coding agents is having them both test what they’ve built and demonstrate that software to you, their overseer. This goes beyond automated tests - we need artifacts that show their progress and help us see exactly what the agent-produced software...

Structured Context Engineering for File-Native Agentic Systems

New paper by Damon McMillan exploring challenging LLM context tasks involving large SQL schemas (up to 10,000 tables) across different models and file formats: Using SQL generation as a proxy for programmatic ag...

AI Doesn’t Reduce Work—It Intensifies It

Aruna Ranganathan and Xingqi Maggie Ye from Berkeley Haas School of Business report initial findings in the HBR from their April to December 2025 study of 200 employees at a "U.S.-based technology company". This captures an effec...

Kākāpō mug by Karen James

Friend and neighbour Karen James made me a Kākāpō mug. It has a charismatic Kākāpō, four Kākāpō chicks (in celebration of the 2026 breeding season) and even has some rimu fruit! I love it so much. Tags: kakapo, art

Quoting Thomas Ptacek

People on the orange site are laughing at this, assuming it's just an ad and that there's nothing to it. Vulnerability researchers I talk to do not think this is a joke. As an erstwhile vuln researcher myself: do not bet against LLMs on this. Axios: Anthropic's Claude Opus ...

Vouch

Mitchell Hashimoto's new system to help address the deluge of worthless AI-generated PRs faced by open source projects now that the friction involved in contributing has dropped so low. He says: The idea is simple: Unvouched users can't contribute to your projects. Ve...

Claude: Speed up responses with fast mode

New "research preview" from Anthropic today: you can now access a faster version of their frontier model Claude Opus 4.6 by typing /fast in Claude Code... but at a cost that's 6x the normal price. Opus is usually $5/million input and...

Quoting David Crawshaw

I am having more fun programming than I ever have, because so many more of the programs I wish I could find the time to write actually exist. I wish I could share this joy with the people who are fearful about the changes agents are bringing. The fear itself I understand, I have fear more broadly about what the end-game is for intelligence on tap in our society. But in the limited domain of writing computer programs these tools have brought so much exploration and joy to my work.

David Crawshaw, Eight more months of agents

Tags: coding-agents, ai-assisted-programming, generative-ai, ai, llms

How StrongDM's AI team build serious software without even looking at the code

Last week I hinted at a demo I had seen from a team implementing what Dan Shapiro called the Dark Factory level of AI adoption, where no human even looks at the code the coding agents are producing. That team was part of StrongDM, and they've just shared the first public des...

Quoting Tom Dale

I don't know why this week became the tipping point, but nearly every software engineer I've talked to is experiencing some degree of mental health crisis.

[...] Many people assuming I meant job loss anxiety but that's just one presentation. I'm seeing near-manic episodes triggered by watching software shift from scarce to abundant. Compulsive behaviors around agent usage. Dissociative awe at the temporal compression of change. It's not fear necessarily just the cognitive overload from living in an inflection point.

Tom Dale

Tags: ai-ethics, careers, coding-agents, generative-ai, ai, llms

Running Pydantic's Monty Rust sandboxed Python subset in WebAssembly

There's a jargon-filled headline for you! Everyone's building sandboxes for running untrusted code right now, and Pydantic's latest attempt, Monty, provides a custom Python-like language (a subset of Python) in Rust and makes it available as both a Rust library and a Python ...

An Update on Heroku

An ominous headline to see on the official Heroku blog and yes, it's bad news. Today, Heroku is transitioning to a sustaining engineering model focused on stability, security, reliability, and support. Heroku remains an actively supported, production-rea...

Quoting Karel D'Oosterlinck

When I want to quickly implement a one-off experiment in a part of the codebase I am unfamiliar with, I get codex to do extensive due diligence. Codex explores relevant slack channels, reads related discussions, fetches experimental branches from those discussions, and cherry picks useful changes for my experiment. All of this gets summarized in an extensive set of notes, with links back to where each piece of information was found. Using these notes, codex wires the experiment and makes a bunch of hyperparameter decisions I couldn’t possibly make without much more effort.

Karel D'Oosterlinck, I spent $10,000 to automate my research at OpenAI with Codex

Tags: codex-cli, coding-agents, ai-assisted-programming, generative-ai, openai, ai, llms

Mitchell Hashimoto: My AI Adoption Journey

Some really good and unconventional tips in here for getting to a place with coding agents where they demonstrably improve your workflow and productivity. I particularly liked: Reproduce your own work - when learning to use coding...

Opus 4.6 and Codex 5.3

Two major new model releases today, within about 15 minutes of each other. Anthropic released Opus 4.6. Here's its pelican: OpenAI released GPT-5.3-Codex, albeit only via their Codex app, not yet in their API. Here's its pelican: I've had a bit of preview access to both of ...

Spotlighting The World Factbook as We Bid a Fond Farewell

Somewhat devastating news today from the CIA: One of CIA’s oldest and most recognizable intelligence publications, The World Factbook, has sunset. There's not even a hint as to why they decided to stop maintaining t...

Voxtral transcribes at the speed of sound

Mistral just released Voxtral Transcribe 2 - a family of two new models, one open weights, for transcribing audio to text. This is the latest in their Whisper-like model family, and a sequel to the original Voxtral which they release...

Distributing Go binaries like sqlite-scanner through PyPI using go-to-wheel

I've been exploring Go for building small, fast and self-contained binary applications recently. I'm enjoying how there's generally one obvious way to do things and the resulting code is boring and readable - and something that LLMs are very competent at writing. The one cat...

Introducing Deno Sandbox

Here's a new hosted sandbox product from the Deno team. It's actually unrelated to Deno itself - this is part of their Deno Deploy SaaS platform. As such, you don't even need to use JavaScript to access it - you can create and execute code in a hoste...

January sponsors-only newsletter is out

I just sent the January edition of my sponsors-only monthly newsletter. If you are a sponsor (or if you start a sponsorship now) you can access it here. In the newsletter for January:

  • LLM predictions for 2026
  • Coding agents get even more attention
  • Clawdbot/Moltbot/OpenClaw went very viral
  • Kakapo breeding season is off to a really strong start
  • New options for sandboxes
  • Web browsers are the "hello world" of coding agent swarms
  • Sam Altman addressed the Jevons paradox for software engineering
  • Model releases and miscellaneous extras

Here's a copy of the December newsletter as a preview of what you'll get. Pay $10/month to stay a month ahead of the free copy!

Tags: newsletter

Quoting Brandon Sanderson

This is the difference between Data and a large language model, at least the ones operating right now. Data created art because he wanted to grow. He wanted to become something. He wanted to understand. Art is the means by which we become what we want to be. [...] The book,...

Introducing the Codex app

OpenAI just released a new macOS app for their Codex coding agent. I've had a few days of preview access - it's a solid app that provides a nice UI over the capabilities of the Codex CLI agent and adds some interesting new features, most notably fir...

A Social Network for A.I. Bots Only. No Humans Allowed.

I talked to Cade Metz for this New York Times piece on OpenClaw and Moltbook. Cade reached out after seeing my blog post about that from the other day. In a first for me, they decided to send a photographer, Jason Henry...

TIL: Running OpenClaw in Docker

I've been running OpenClaw using Docker on my Mac. Here is the first of my ongoing notes on how I set that up and the commands I'm using to administer it.

Here's a screenshot of the web UI that this serves on localhost:

Screenshot of the OpenClaw Gateway Dashboard web interface. Header shows "OpenCLAW GATEWAY DASHBOARD" with a green "Health OK" indicator. Left sidebar contains navigation sections: Chat (Chat highlighted), Control (Overview, Channels, Instances, Sessions, Cron Jobs), Agent (Skills, Nodes), Settings (Config, Debug, Logs), and Resources (Docs). Main content area displays "Chat" with subtitle "Direct gateway chat session for quick interventions." and "telegram:6580064359" identifier. A user message at 4:08 PM reads "Show me a detailed list of all your available configured tools". The assistant response states: "Here's the full list of tools I have available in this OpenClaw session (as configured). These are the only ones I can call programmatically:" followed by categorized tools: "File & workspace" (read — Read a file (text or image). Supports offset/limit for large files; write — Create/overwrite a file (creates parent dirs); edit — Precise in-place edit by exact string replacement), "Shell / processes" (exec — Run a shell command (optionally PTY, backgrounding, timeouts); process — Manage running exec sessions (list/poll/log/write/kill/etc.)), "Web" (web_search — Search the web (Brave Search API); web_fetch — Fetch a URL and extract readable content (markdown/text); browser — Control a browser (open/navigate/snapshot/screenshot/act/etc.)), "UI / rendering" (canvas — Present/eval/snapshot a Canvas surface (for node canvases/UI rendering)), and "Devices / nodes" (cut off). Bottom shows message input with placeholder "Message (↵ to send, Shift+↵ for line breaks, paste images)" and "New session" and coral "Send" buttons.

Tags: ai, docker, til, generative-ai, llms, ai-agents, openclaw

Quoting Andrej Karpathy

Originally in 2019, GPT-2 was trained by OpenAI on 32 TPU v3 chips for 168 hours (7 days), with $8/hour/TPUv3 back then, for a total cost of approx. $43K. It achieves 0.256525 CORE score, which is an ensemble metric introduced in the DCLM paper over 22 evaluations like ARC/MMLU/etc.

As of the last few improvements merged into nanochat (many of them originating in modded-nanogpt repo), I can now reach a higher CORE score in 3.04 hours (~$73) on a single 8XH100 node. This is a 600X cost reduction over 7 years, i.e. the cost to train GPT-2 is falling approximately 2.5X every year.

Andrej Karpathy

Tags: andrej-karpathy, gpt-2, generative-ai, ai, llms, openai
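
The figures in this quote can be sanity-checked with a few lines of arithmetic. This is a rough sketch, not Karpathy's own calculation: the variable names are illustrative, and the only inputs are the numbers quoted above (32 TPU v3 chips, 168 hours, $8 per chip-hour, roughly $73 today, 7 years).

```python
# Back-of-the-envelope check of the quoted GPT-2 training costs.
# All inputs come from the quote; variable names are illustrative only.
tpu_chips = 32
hours_2019 = 168
usd_per_chip_hour = 8

cost_2019 = tpu_chips * hours_2019 * usd_per_chip_hour  # 43008, i.e. "approx. $43K"
cost_now = 73   # quoted cost of the 3.04 hour run on one 8XH100 node
years = 7

reduction = cost_2019 / cost_now            # overall cost reduction factor
annual_factor = reduction ** (1 / years)    # compound yearly reduction factor

print(f"2019 cost: ${cost_2019:,}")
print(f"Reduction: {reduction:.0f}x")
print(f"Annual factor: {annual_factor:.2f}x")
```

The overall reduction comes out a little under 600 and the yearly factor a little under 2.5, so the quoted "600X" and "2.5X every year" are rounded versions of these results.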

Singing the gospel of collective efficacy

Lovely piece from Matt Webb about how you can "just do things" to help make your community better for everyone: Similarly we all love when the swifts visit (beautiful birds), so somebody started a group to get swift nest boxes made ...

Quoting Steve Yegge

Getting agents using Beads requires much less prompting, because Beads now has 4 months of “Desire Paths” design, which I’ve talked about before. Beads has evolved a very complex command-line interface, with 100+ subcommands, each with many sub-subcommands, aliases, alternate syntaxes, and other affordances.

The complicated Beads CLI isn’t for humans; it’s for agents. What I did was make their hallucinations real, over and over, by implementing whatever I saw the agents trying to do with Beads, until nearly every guess by an agent is now correct.

Steve Yegge, Software Survival 3.0

Tags: steve-yegge, coding-agents, generative-ai, ai-agents, ai, llms

Moltbook is the most interesting place on the internet right now

The hottest project in AI right now is Clawdbot, renamed to Moltbot, renamed to OpenClaw. It's an open source implementation of the digital personal assistant pattern, built by Peter Steinberger to integrate with the messaging system of your choice. It's two months old, has ...

We gotta talk about AI as a programming tool for the arts

Chris Ashworth is the creator and CEO of QLab, a macOS software package for “cue-based, multimedia playback” which is designed to automate lighting and audio for live theater productions. I recently started following him...

Datasette 1.0a24

New Datasette alpha this morning. Key new features: Datasette's Request object can now handle multipart/form-data file uploads via the new await request.form(files=True) method. I plan to use this for a datasette-files plugin to support attaching files to ...