FeedCity: Simon Willison's Weblog

Claude as a calculator

Here's a quick demo of the kind of casual things I use LLMs for on a daily basis. I just found out that Perplexity offer their Deep Research feature via their API, through a model called Sonar Deep Research. Their documentation includes an example response, which included th...

Simon Willison's Weblog
28 May 14:10

AI Hallucination Cases

AI Hallucination Cases Damien Charlotin maintains this database of cases around the world where a legal decision has been made that confirms hallucinated content from generative AI was presented by a lawyer. That's an important distinction: this isn't just cases where AI may...

Simon Willison's Weblog
28 May 14:10

GitHub issues for notes

GitHub issues is almost the best notebook in the world. Free and unlimited, for both public and private notes. Comprehensive Markdown support, including syntax highlighting for almost any language. Plus you can drag and drop images or videos directly onto a note. It has fant...

Simon Willison's Weblog
28 May 14:10

GitHub Issues search now supports nested queries and boolean operators: Here’s how we (re)built it

GitHub Issues search now supports nested queries and boolean operators: Here’s how we (re)built it GitHub Issues got a significant search upgrade back in January. Deborah Digges provides some behind the scene details about how it works and how they rolled it out. The signatu...

Simon Willison's Weblog
28 May 14:10

Luis von Ahn on LinkedIn

Luis von Ahn on LinkedIn Last month's Duolingo memo about becoming an "AI-first" company has seen significant backlash, particularly on TikTok. I've had trouble figuring out how much of this is a real threat to their business as opposed to protests from a loud minority, but ...

Simon Willison's Weblog
28 May 14:10

CSS Minecraft

CSS Minecraft Incredible project by Benjamin Aster: There is no JavaScript on this page. All the logic is made 100% with pure HTML & CSS. For the best performance, please close other tabs and running programs. The page implements a full Minecraft-style world editor: yo...

Simon Willison's Weblog
28 May 14:10

GitHub MCP Exploited: Accessing private repositories via MCP

GitHub MCP Exploited: Accessing private repositories via MCP GitHub's official MCP server grants LLMs a whole host of new abilities, including being able to read and issues in repositories the user has access to and submit new pull requests. This is the lethal trifecta for p...

Simon Willison's Weblog
28 May 14:10

Build AI agents with the Mistral Agents API

Build AI agents with the Mistral Agents API Big upgrade to Mistral's API this morning: they've announced a new "Agents API". Mistral have been using the term "agents" for a while now. Here's how they describe them: AI agents are autonomous systems powered by large language ...

Simon Willison's Weblog
28 May 14:10

Large Language Models can run tools in your terminal with LLM 0.26

LLM 0.26 is out with the biggest new feature since I started the project: support for tools. You can now use the LLM CLI tool - and Python library - to grant LLMs from OpenAI, Anthropic, Gemini and local models from Ollama with access to any tool that you can represent as a ...

Simon Willison's Weblog
28 May 14:10

At Amazon, Some Coders Say Their Jobs Have Begun to Resemble Warehouse Work

At Amazon, Some Coders Say Their Jobs Have Begun to Resemble Warehouse Work I got a couple of quotes in this NYTimes story about internal resistance to Amazon's policy to encourage employees to make use of more generative AI: “It’s more fun to write code than to read code,”...

Simon Willison's Weblog
28 May 14:10

llm-llama-server 0.2

llm-llama-server 0.2 Here's a second option for using LLM's new tool support against local models (the first was via llm-ollama). It turns out the llama.cpp ecosystem has pretty robust OpenAI-compatible tool support already, so my llm-llama-server plugin only needed a quick ...

Simon Willison's Weblog
25 May 13:45

Highlights from the Claude 4 system prompt

Anthropic publish most of the system prompts for their chat models as part of their release notes. They recently shared the new prompts for both Claude Opus 4 and Claude Sonnet 4. I enjoyed digging through the prompts, since they act as a sort of unofficial manual for how be...

Simon Willison's Weblog
25 May 06:06

Subscribe to my sponsors-only monthly newsletter.

Subscribe to my sponsors-only monthly newsletter I’ve never liked the idea of charging for my content. I get enormous value from putting all of my writing and research out there for free. So I’m trying something a little different: pay me to send you less. I’m starting a sp...

Simon Willison's Weblog
25 May 05:52

System Card: Claude Opus 4 & Claude Sonnet 4

System Card: Claude Opus 4 & Claude Sonnet 4 Direct link to a PDF on Anthropic's CDN because they don't appear to have a landing page anywhere for this document. Anthropic's system cards are always worth a look, and this one for the new Opus 4 and Sonnet 4 has some parti...

Simon Willison's Weblog
24 May 21:09

How I used o3 to find CVE-2025-37899, a remote zeroday vulnerability in the Linux kernel’s SMB implementation

How I used o3 to find CVE-2025-37899, a remote zeroday vulnerability in the Linux kernel’s SMB implementation Sean Heelan: The vulnerability [o3] found is CVE-2025-37899 (fix here), a use-after-free in the handler for the SMB 'logoff' command. Understanding the vulnerabilit...

Simon Willison's Weblog
24 May 19:20

f2

f2 Really neat CLI tool for bulk renaming of files and directories by Ayooluwa Isaiah, written in Go and designed to work cross-platform. There's a lot of great design in this. Basic usage is intuitive - here's how to rename all .svg files to .tmp.svg in the current director...

Simon Willison's Weblog
23 May 18:22

Honey badger

I'm helping make some changes to a large, complex and very unfamiliar to me WordPress site. It's a perfect opportunity to try out Claude Code running against the new Claude 4 models.

It's going extremely well. So far Claude has helped get MySQL working on an older laptop (fixing some inscrutable Homebrew errors), disabled a CAPTCHA plugin that didn't work on localhost, toggled visible warnings on and off several times and figured out which CSS file to modify in the theme that the site is using. It even took a reasonable stab at making the site responsive on mobile!

I'm now calling Claude Code honey badger on account of its voracious appetite for crunching through code (and tokens) looking for the right thing to fix.

I got ChatGPT to make me some fan art:

Tags: anthropic, claude, wordpress, ai, llms, ai-assisted-programming, generative-ai, homebrew, claude-4

Simon Willison's Weblog
23 May 14:39

Remote Prompt Injection in GitLab Duo Leads to Source Code Theft

Remote Prompt Injection in GitLab Duo Leads to Source Code Theft Yet another example of the classic Markdown image exfiltration attack, this time affecting GitLab Duo - GitLab's chatbot. Omer Mayraz reports on how they found and disclosed the issue. The first part of this is...

Simon Willison's Weblog
22 May 19:07

Agents are models using tools in a loop

I was going slightly spare at the fact that every talk at this Anthropic developer conference has used the word "agents" dozens of times, but nobody ever stopped to provide a useful definition.

I'm now in the "Prompting for Agents" workshop and Anthropic's Hannah Moran finally broke the trend by saying that at Anthropic:

Agents are models using tools in a loop

I can live with that! I'm glad someone finally said it out loud.

Tags: anthropic, generative-ai, ai-agents, ai, llms

Simon Willison's Weblog
22 May 19:03

Updated Anthropic model comparison table

Updated Anthropic model comparison table A few details in here about Claude 4 that I hadn't spotted elsewhere: The training cut-off date for Claude Opus 4 and Claude Sonnet 4 is March 2025! That's the most recent cut-off for any of the current popular models, really impress...

Simon Willison's Weblog
22 May 18:36

llm-anthropic 0.16

llm-anthropic 0.16 New release of my LLM plugin for Anthropic adding the new Claude 4 Opus and Sonnet models. You can see pelicans on bicycles generated using the new plugin at the bottom of my live blog covering the release. I also released llm-anthropic 0.16a1 which works ...

Simon Willison's Weblog
22 May 16:32

Live blog: Claude 4 launch at Code with Claude

I'm at Anthropic's Code with Claude event, where they are launching Claude 4. I'll be live blogging the keynote here.

Tags: llm-release, liveblogging, anthropic, claude, generative-ai, ai, llms, pelican-riding-a-bicycle, claude-4

Simon Willison's Weblog
22 May 01:55

No docs, no bugs

If your library doesn't have any documentation, it can't have any bugs.

Documentation specifies what your code is supposed to do. Your tests specify what it actually does.

Bugs exist when your test-enforced implementation fails to match the behavior described in your documentation. Without documentation a bug is just undefined behavior.

If you aim to follow semantic versioning you bump your major version when you release a backwards incompatible change. Such changes cannot exist if your code is not comprehensively documented!

Inspired by a half-remembered conversation I had with Tom Insam many years ago.

Tags: testing, semantic-versioning, documentation

Simon Willison's Weblog
21 May 22:02

Devstral

Devstral New Apache 2.0 licensed LLM release from Mistral, this time specifically trained for code. Devstral achieves a score of 46.8% on SWE-Bench Verified, outperforming prior open-source SoTA models by more than 6% points. When evaluated under the same test scaffold (Ope...

Simon Willison's Weblog
21 May 21:44

Gemini Diffusion

Gemini Diffusion Another of the announcements from Google I/O yesterday was Gemini Diffusion, Google's first LLM to use diffusion (similar to image models like Imagen and Stable Diffusion) in place of transformers. Google describe it like this: Traditional autoregressive la...

Simon Willison's Weblog
21 May 15:03

Chicago Sun-Times Prints AI-Generated Summer Reading List With Books That Don't Exist

Chicago Sun-Times Prints AI-Generated Summer Reading List With Books That Don't Exist Classic slop: it listed real authors with entirely fake books. There's an important follow-up from 404 Media in their subsequent story: Victor Lim, the vice president of marketing and comm...

Simon Willison's Weblog
21 May 14:38

I really don't like ChatGPT's new memory dossier

Last month ChatGPT got a major upgrade. As far as I can tell the closest to an official announcement was this tweet from @OpenAI: Starting today [April 10th 2025], memory in ChatGPT can now reference all of your past chats to provide more personalized responses, drawing on ...

Simon Willison's Weblog
20 May 22:34

We did the math on AI’s energy footprint. Here’s the story you haven’t heard.

We did the math on AI’s energy footprint. Here’s the story you haven’t heard. James O'Donnell and Casey Crownhart try to pull together a detailed account of AI energy usage for MIT Technology Review. They quickly run into the same roadblock faced by everyone else who's tried...

Simon Willison's Weblog
20 May 20:34

Gemini 2.5: Our most intelligent models are getting even better

Gemini 2.5: Our most intelligent models are getting even better A bunch of new Gemini 2.5 announcements at Google I/O today. 2.5 Flash and 2.5 Pro are both getting audio output (previously previewed in Gemini 2.0) and 2.5 Pro is getting an enhanced reasoning mode called "Dee...

Simon Willison's Weblog
20 May 19:24

Google I/O Pelican

Tucked into today's Google I/O keynote, a blink-and-you'll miss it moment:

The pelican in the keynote was created by Alexander Chen. Here's the code they wrote with the help of Gemini, which uses p5.js to power the animation.

Tags: pelican-riding-a-bicycle, google-io, google