FeedCity: Simon Willison's Weblog

Quoting Bruce Schneier and Barath Raghavan

Prompt injection might be unsolvable in today’s LLMs. LLMs process token sequences, but no mechanism exists to mark token privileges. Every solution proposed introduces new injection vectors: Delimiter? Attackers include delimiters. Instruction hierarchy? Attackers claim pr...

Simon Willison's Weblog
21 Oct 02:48

Quoting Phil Gyford

Since getting a modem at the start of the month, and hooking up to the Internet, I’ve spent about an hour every evening actually online (which I guess is costing me about £1 a night), and much of the days and early evenings fiddling about with things. It’s so complicated. All the hype never mentioned that. I guess journalists just have it all set up for them so they don’t have to worry too much about that side of things. It’s been a nightmare, but an enjoyable one, and in the end, satisfying.

— Phil Gyford, Diary entry, Friday February 17th 1995 1.50 am

Tags: phil-gyford, computer-history

Simon Willison's Weblog
20 Oct 19:54

Claude Code for web - a new asynchronous coding agent from Anthropic

Anthropic launched Claude Code for web this morning. It's an asynchronous coding agent - their answer to OpenAI's Codex Cloud and Google's Jules, and has a very similar shape. I had preview access over the weekend and I've already seen some very promising results from it. It...

Simon Willison's Weblog
20 Oct 17:51

Getting DeepSeek-OCR working on an NVIDIA Spark via brute force using Claude Code

DeepSeek released a new model yesterday: DeepSeek-OCR, a 6.6GB model fine-tuned specifically for OCR. They released it as model weights that run using PyTorch and CUDA. I got it running on the NVIDIA Spark by having Claude Code effectively brute force the challenge of gettin...

Simon Willison's Weblog
18 Oct 19:48

TIL: Exploring OpenAI's deep research API model o4-mini-deep-research

TIL: Exploring OpenAI's deep research API model o4-mini-deep-research

I landed a PR by Manuel Solorzano adding pricing information to llm-prices.com for OpenAI's o4-mini-deep-research and o3-deep-research models, which they released in June and document here.

I realized I'd never tried these before, so I put o4-mini-deep-research through its paces researching locations of surviving orchestrions for me (I really like orchestrions).

The API cost me $1.10 and triggered a small flurry of extra vibe-coded tools, including this new tool for visualizing Responses API traces from deep research models and this mocked up page listing the 19 orchestrions it found (only one of which I have fact-checked myself).

Tags: ai, openai, generative-ai, llms, deep-research, vibe-coding

Simon Willison's Weblog
18 Oct 04:06

The AI water issue is fake

The AI water issue is fake Andy Masley (previously): All U.S. data centers (which mostly support the internet, not AI) used 200--250 million gallons of freshwater daily in 2023. The U.S. consumes approximately 132 billion gallons of freshwater daily. The U.S. circulates a l...

Simon Willison's Weblog
18 Oct 03:36

Andrej Karpathy — AGI is still a decade away

Andrej Karpathy — AGI is still a decade away Extremely high signal 2 hour 25 minute (!) conversation between Andrej Karpathy and Dwarkesh Patel. It starts with Andrej's claim that "the year of agents" is actually more likely to take a decade. Seeing as I accepted 2025 as the...

Simon Willison's Weblog
17 Oct 21:30

Quoting Alexander Fridriksson and Jay Miller

Using UUIDv7 is generally discouraged for security when the primary key is exposed to end users in external-facing applications or APIs. The main issue is that UUIDv7 incorporates a 48-bit Unix timestamp as its most significant part, meaning the identifier itself leaks the record's creation time.

This leakage is primarily a privacy concern. Attackers can use the timing data as metadata for de-anonymization or account correlation, potentially revealing activity patterns or growth rates within an organization.

— Alexander Fridriksson and Jay Miller, Exploring PostgreSQL 18's new UUIDv7 support

Tags: uuid, postgresql, privacy, security

Simon Willison's Weblog
17 Oct 18:51

Should form labels be wrapped or separate?

Should form labels be wrapped or separate?

James Edwards notes that wrapping a form input in a label event like this has a significant downside:

<label>Name <input type="text"></label>

It turns out both Dragon Naturally Speaking for Windows and Voice Control for macOS and iOS fail to understand this relationship!

You need to use the explicit <label for="element_id"> syntax to ensure those screen readers correctly understand the relationship between label and form field. You can still nest the input inside the label if you like:

<label for="idField">Name
  <input id="idField" type="text">
</label>

Via Chris Ferdinandi

Tags: accessibility, html, screen-readers

Simon Willison's Weblog
16 Oct 22:57

Quoting Barry Zhang

Skills actually came out of a prototype I built demonstrating that Claude Code is a general-purpose agent :-)

It was a natural conclusion once we realized that bash + filesystem were all we needed

— Barry Zhang, Anthropic

Tags: skills, claude-code, ai-agents, generative-ai, ai, llms

Simon Willison's Weblog
16 Oct 21:45

Claude Skills are awesome, maybe a bigger deal than MCP

Anthropic this morning introduced Claude Skills, a new pattern for making new abilities available to their models: Claude can now use Skills to improve how it performs specific tasks. Skills are folders that include instructions, scripts, and resources that Claude can load ...

Simon Willison's Weblog
16 Oct 05:48

NVIDIA DGX Spark + Apple Mac Studio = 4x Faster LLM Inference with EXO 1.0

NVIDIA DGX Spark + Apple Mac Studio = 4x Faster LLM Inference with EXO 1.0 EXO Labs wired a 256GB M3 Ultra Mac Studio up to an NVIDIA DGX Spark and got a 2.8x performance boost serving Llama-3.1 8B (FP16) with an 8,192 token prompt. Their detailed explanation taught me a lot...

Simon Willison's Weblog
16 Oct 04:39

Quoting Riana Pfefferkorn

Pro se litigants [people representing themselves in court without a lawyer] account for the majority of the cases in the United States where a party submitted a court filing containing AI hallucinations. In a country where legal representation is unaffordable for most people, it is no wonder that pro se litigants are depending on free or low-cost AI tools. But it is a scandal that so many have been betrayed by them, to the detriment of the cases they are litigating all on their own.

— Riana Pfefferkorn, analyzing the AI Hallucination Cases database for CIS at Stanford Law

Tags: ai-ethics, generative-ai, law, hallucinations, ai, llms

Simon Willison's Weblog
16 Oct 04:09

Coding without typing the code

Last year the most useful exercise for getting a feel for how good LLMs were at writing code was vibe coding (before that name had even been coined) - seeing if you could create a useful small application through prompting alone.

Today I think there's a new, more ambitious and significantly more intimidating exercise: spend a day working on real production code through prompting alone, making no manual edits yourself.

This doesn't mean you can't control exactly what goes into each file - you can even tell the model "update line 15 to use this instead" if you have to - but it's a great way to get more of a feel for how well the latest coding agents can wield their edit tools.

Tags: coding-agents, ai-assisted-programming, generative-ai, ai, llms

Simon Willison's Weblog
15 Oct 20:39

Quoting Catherine Wu

While Sonnet 4.5 remains the default [in Claude Code], Haiku 4.5 now powers the Explore subagent which can rapidly gather context on your codebase to build apps even faster.

You can select Haiku 4.5 to be your default model in /model. When selected, you’ll automatically use Sonnet 4.5 in Plan mode and Haiku 4.5 for execution for smarter plans and faster results.

— Catherine Wu, Claude Code PM, Anthropic

Tags: coding-agents, anthropic, claude-code, generative-ai, ai, llms, sub-agents

Simon Willison's Weblog
15 Oct 19:45

Quoting Claude Haiku 4.5 System Card

Previous system cards have reported results on an expanded version of our earlier agentic misalignment evaluation suite: three families of exotic scenarios meant to elicit the model to commit blackmail, attempt a murder, and frame someone for financial crimes. We choose not to report full results here because, similarly to Claude Sonnet 4.5, Claude Haiku 4.5 showed many clear examples of verbalized evaluation awareness on all three of the scenarios tested in this suite. Since the suite only consisted of many similar variants of three core scenarios, we expect that the model maintained high unverbalized awareness across the board, and we do not trust it to be representative of behavior in the real extreme situations the suite is meant to emulate.

— Claude Haiku 4.5 System Card

Tags: ai-ethics, anthropic, claude, generative-ai, ai, llms

Simon Willison's Weblog
15 Oct 19:45

Introducing Claude Haiku 4.5

Introducing Claude Haiku 4.5 Anthropic released Claude Haiku 4.5 today, the cheapest member of the Claude 4.5 family that started with Sonnet 4.5 a couple of weeks ago. It's priced at $1/million input tokens and $5/million output tokens, slightly more expensive than Haiku 3....

Simon Willison's Weblog
15 Oct 05:12

A modern approach to preventing CSRF in Go

A modern approach to preventing CSRF in Go Alex Edwards writes about the new http.CrossOriginProtection middleware that was added to the Go standard library in version 1.25 in August and asks: Have we finally reached the point where CSRF attacks can be prevented without rel...

Simon Willison's Weblog
15 Oct 00:00

NVIDIA DGX Spark: great hardware, early days for the ecosystem

NVIDIA sent me a preview unit of their new DGX Spark desktop "AI supercomputer". I've never had hardware to review before! You can consider this my first ever sponsored post if you like, but they did not pay me any cash and aside from an embargo date they did not request (no...

Simon Willison's Weblog
14 Oct 21:45

Just Talk To It - the no-bs Way of Agentic Engineering

Just Talk To It - the no-bs Way of Agentic Engineering Peter Steinberger's long, detailed description of his current process for using Codex CLI and GPT-5 Codex. This is information dense and full of actionable tips, plus plenty of strong opinions about the differences betwe...

Simon Willison's Weblog
13 Oct 20:36

nanochat

nanochat Really interesting new project from Andrej Karpathy, described at length in this discussion post. It provides a full ChatGPT-style LLM, including training, inference and a web Ui, that can be trained for as little as $100: This repo is a full-stack implementation o...

Simon Willison's Weblog
12 Oct 16:42

Quoting Slashdot

Slashdot: What's the reason OneDrive tells users this setting can only be turned off 3 times a year? (And are those any three times — or does that mean three specific days, like Christmas, New Year's Day, etc.)

[Microsoft's publicist chose not to answer this question.]

— Slashdot, asking the obvious question

Tags: slashdot, ai-ethics, ai, microsoft

Simon Willison's Weblog
11 Oct 21:39

Claude Code sub-agents

Claude Code includes the ability to run sub-agents, where a separate agent loop with a fresh token context is dispatched to achieve a goal and report back when it's done. I wrote a bit about how these work in June when I traced Claude Code's activity by intercepting its API ...

Simon Willison's Weblog
11 Oct 16:36

Vibing a Non-Trivial Ghostty Feature

Vibing a Non-Trivial Ghostty Feature Mitchell Hashimoto provides a comprehensive answer to the frequent demand for a detailed description of shipping a non-trivial production feature to an existing project using AI-assistance. In this case it's a slick unobtrusive auto-updat...

Simon Willison's Weblog
11 Oct 12:54

Note on 11th October 2025

I'm beginning to suspect that a key skill in working effectively with coding agents is developing an intuition for when you don't need to closely review every line of code they produce. This feels deeply uncomfortable!

Tags: vibe-coding, coding-agents, ai-assisted-programming, generative-ai, ai, llms

Simon Willison's Weblog
11 Oct 04:00

An MVCC-like columnar table on S3 with constant-time deletes

An MVCC-like columnar table on S3 with constant-time deletes s3's support for conditional writes (previously) makes it an interesting, scalable and often inexpensive platform for all kinds of database patterns. Shayon Mukherjee presents an ingenious design for a Parquet-back...

Simon Willison's Weblog
11 Oct 00:03

simonw/claude-skills

simonw/claude-skills One of the tips I picked up from Jesse Vincent's Claude Code Superpowers post (previously) was this: Skills are what give your agents Superpowers. The first time they really popped up on my radar was a few weeks ago when Anthropic rolled out improved Of...

Simon Willison's Weblog
10 Oct 23:39

Superpowers: How I'm using coding agents in October 2025

Superpowers: How I'm using coding agents in October 2025 A follow-up to Jesse Vincent's post about September, but this is a really significant piece in its own right. Jesse is one of the most creative users of coding agents (Claude Code in particular) that I know. He's put a...

Simon Willison's Weblog
10 Oct 23:18

A Retrospective Survey of 2024/2025 Open Source Supply Chain Compromises

A Retrospective Survey of 2024/2025 Open Source Supply Chain Compromises Filippo Valsorda surveyed 18 incidents from the past year of open source supply chain attacks, where package updates were infected with malware thanks to a compromise of the project itself. These are im...

Simon Willison's Weblog
10 Oct 22:48

Video of GPT-OSS 20B running on a phone

Video of GPT-OSS 20B running on a phone

GPT-OSS 20B is a very good model. At launch OpenAI claimed:

The gpt-oss-20b model delivers similar results to OpenAI o3‑mini on common benchmarks and can run on edge devices with just 16 GB of memory

Nexa AI just posted a video on Twitter demonstrating exactly that: the full GPT-OSS 20B running on a Snapdragon Gen 5 phone in their Nexa Studio Android app. It requires at least 16GB of RAM, and benefits from Snapdragon using a similar trick to Apple Silicon where the system RAM is available to both the CPU and the GPU.

The latest iPhone 17 Pro Max is still stuck at 12GB of RAM, presumably not enough to run this same model.

Tags: android, ai, openai, generative-ai, local-llms, llms, gpt-oss