FeedCity: Simon Willison's Weblog

XSLT on congress.gov

Today I learned - via a proposal to remove mentions of XSLT from the HTML spec - that congress.gov uses XSLT to serve XML bills as XHTML - here's H. R. 3617 117th CONGRESS 1st Session for example. View source on that page and it starts like this: <?xml version="1.0"?> ...

Simon Willison's Weblog
19 Aug 20:43

PyPI: Preventing Domain Resurrection Attacks

Domain resurrection attacks are a nasty vulnerability in systems that use email verification to allow people to recover their accounts. If somebody lets their domain name expire an attacker might snap it up and use it to gain acce...

Simon Willison's Weblog
19 Aug 20:43

llama.cpp guide: running gpt-oss with llama.cpp

Really useful official guide to running the OpenAI gpt-oss models using llama-server from llama.cpp - which provides an OpenAI-compatible localhost API and a neat web interface for interacting with the models. TLDR version for ...

Simon Willison's Weblog
19 Aug 05:03

r/ChatGPTPro: What is the most profitable thing you have done with ChatGPT?

This Reddit thread - with 279 replies - offers a neat targeted insight into the kinds of things people are using ChatGPT for.

Lots of variety here but two themes that stood out for me were ChatGPT for written negotiation - insurance claims, breaking rental leases - and ChatGPT for career and business advice.

Tags: reddit, ai, openai, generative-ai, chatgpt, llms

Simon Willison's Weblog
19 Aug 00:15

Google Gemini URL Context

New feature in the Gemini API: you can now enable a url_context tool which the models can use to request the contents of URLs as part of replying to a prompt. I released llm-gemini 0.25 with a new -o url_context 1 option adding support for this feat...

Simon Willison's Weblog
17 Aug 03:57

TIL: Running a gpt-oss eval suite against LM Studio on a Mac

The other day I learned that OpenAI published a set of evals as part of their gpt-oss model release, described in their cookbook on Verifying gpt-oss implementations. I decided to try and run that eval suite on my ...

Simon Willison's Weblog
17 Aug 01:00

Quoting Sam Altman

Most of what we're building out at this point is the inference [...] We're profitable on inference. If we didn't pay for training, we'd be a very profitable company.

— Sam Altman, during a "wide-ranging dinner with a small group of reporters in San Francisco"

Tags: openai, sam-altman, ai

Simon Willison's Weblog
16 Aug 17:21

Maintainers of Last Resort

Filippo Valsorda founded Geomys last year as an "organization of professional open source maintainers", providing maintenance and support for critical packages in the Go language ecosystem backed by clients in retainer relationships. This is an in...

Simon Willison's Weblog
15 Aug 23:18

GPT-5 has a hidden system prompt

It looks like GPT-5 when accessed via the OpenAI API may have its own hidden system prompt, independent from the system prompt you can specify in an API call. At the very least it's getting sent the current date. I tried this just now: llm -m...

Simon Willison's Weblog
15 Aug 23:00

The Summer of Johann: prompt injections as far as the eye can see

Independent AI researcher Johann Rehberger (previously) has had an absurdly busy August. Under the heading The Month of AI Bugs he has been publishing one report per day across an array of different tools, all of which are vulnerable to various classic prompt injection probl...

Simon Willison's Weblog
15 Aug 20:33

Meta’s AI rules have let bots hold ‘sensual’ chats with kids, offer false medical info

This is grim. Reuters got hold of a leaked copy of Meta's internal "GenAI: Content Risk Standards" document:

Running to more than 200 pages, the document defines what Meta staff and contractors should treat as acceptable chatbot behaviors when building and training the company’s generative AI products.

Read the full story - there was some really nasty stuff in there.

It's understandable why this document was confidential, but also frustrating because documents like this are genuinely some of the best documentation out there in terms of how these systems can be expected to behave.

I'd love to see more transparency from AI labs around these kinds of decisions.

Tags: ai, meta, ai-ethics

Simon Willison's Weblog
15 Aug 17:09

Open weight LLMs exhibit inconsistent performance across providers

Artificial Analysis published a new benchmark the other day, this time focusing on how an individual model - OpenAI’s gpt-oss-120b - performs across different hosted providers. The results showed some surprising differences. Here's the one with the greatest variance, a run o...

Simon Willison's Weblog
15 Aug 16:24

Quoting Steve Wozniak

I gave all my Apple wealth away because wealth and power are not what I live for. I have a lot of fun and happiness. I funded a lot of important museums and arts groups in San Jose, the city of my birth, and they named a street after me for being good. I now speak publicly and have risen to the top. I have no idea how much I have but after speaking for 20 years it might be $10M plus a couple of homes. I never look for any type of tax dodge. I earn money from my labor and pay something like 55% combined tax on it. I am the happiest person ever. Life to me was never about accomplishment, but about Happiness, which is Smiles minus Frowns. I developed these philosophies when I was 18-20 years old and I never sold out.

— Steve Wozniak, in a comment on Slashdot

Tags: apple, careers, slashdot

Simon Willison's Weblog
14 Aug 20:57

Quoting Cory Doctorow

NERD HARDER! is the answer every time a politician gets a technological idée-fixe about how to solve a social problem by creating a technology that can't exist. It's the answer that EU politicians who backed the catastrophic proposal to require copyright filters for all use...

Simon Willison's Weblog
14 Aug 17:51

Introducing Gemma 3 270M: The compact model for hyper-efficient AI

New from Google: Gemma 3 270M, a compact, 270-million parameter model designed from the ground up for task-specific fine-tuning with strong instruction-following and text structuring capabilities already tra...

Simon Willison's Weblog
13 Aug 18:48

pyx: a Python-native package registry, now in Beta

Since its first release, the single biggest question around the uv Python environment management tool has been around Astral's business model: Astral are a VC-backed company and at some point they need to start making real r...

Simon Willison's Weblog
13 Aug 18:03

Screaming in the Cloud: AI’s Security Crisis: Why Your Assistant Might Betray You

I recorded this podcast conversation with Corey Quinn a few weeks ago: On this episode of Screaming in the Cloud, Corey Quinn talks with Simon Willison, founder of Datasette and creator of LLM...

Simon Willison's Weblog
13 Aug 16:48

How Does A Blind Model See The Earth?

Fun, creative new micro-eval. Split the world into a sampled collection of latitude longitude points and for each one ask a model:

If this location is over land, say 'Land'. If this location is over water, say 'Water'. Do not say anything else.

Author henry goes a step further: for models that expose logprobs they use the relative probability scores of Land or Water to get a confidence level; for other models they prompt four times at temperature 1 to get a score.

And then... they plot those probabilities on a chart! Here's Gemini 2.5 Flash (one of the better results):

This reminds me of my pelican riding a bicycle benchmark in that it gives you an instant visual representation that's very easy to compare between different models.
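
As a rough illustration of the scoring described above, here is a minimal Python sketch. It is not henry's actual code: the function names (sample_grid, confidence_from_logprobs, confidence_from_samples), the regular grid spacing and the choice to report the score as the probability of "Land" are all assumptions made for this example, and the model calls themselves are left out.

```python
import math


def sample_grid(step_degrees: int) -> list[tuple[int, int]]:
    """Hypothetical sampler: a regular grid of (latitude, longitude) points."""
    return [
        (lat, lon)
        for lat in range(-90, 91, step_degrees)
        for lon in range(-180, 180, step_degrees)
    ]


def confidence_from_logprobs(logprob_land: float, logprob_water: float) -> float:
    """Relative probability of 'Land' versus 'Water' from two log-probabilities."""
    p_land = math.exp(logprob_land)
    p_water = math.exp(logprob_water)
    return p_land / (p_land + p_water)


def confidence_from_samples(answers: list[str]) -> float:
    """Fraction of sampled answers that said 'Land' (assumed fallback scoring
    for models without logprobs, e.g. four samples at temperature 1)."""
    if not answers:
        raise ValueError("need at least one sampled answer")
    land_votes = sum(1 for answer in answers if answer.strip().lower() == "land")
    return land_votes / len(answers)


if __name__ == "__main__":
    points = sample_grid(30)
    print(len(points), "points to ask about")
    # Four sampled replies for a single point, as in the no-logprobs case.
    print(confidence_from_samples(["Land", "Water", "Land", "Land"]))
```

Either score lands between 0 and 1 for each point, which is what makes it possible to plot the whole grid as a map.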

Via @natolambert

Tags: ai, generative-ai, llms, evals

Simon Willison's Weblog
13 Aug 05:54

simonw/codespaces-llm

GitHub Codespaces provides full development environments in your browser, and is free to use for anyone with a GitHub account. Each environment has a full Linux container and a browser-based UI using VS Code. I found out today that GitHub Codespaces co...

Simon Willison's Weblog
12 Aug 18:42

Claude Sonnet 4 now supports 1M tokens of context

Gemini and OpenAI both have million token models, so it's good to see Anthropic catching up. This is 5x the previous 200,000 context length limit of the various Claude Sonnet models. Anthropic have previously made 1 million t...

Simon Willison's Weblog
12 Aug 04:03

Quoting Nick Turley

I think there's been a lot of decisions over time that proved pretty consequential, but we made them very quickly as we have to. [...]

[On pricing] I had this kind of panic attack because we really needed to launch subscriptions because at the time we were taking the product down all the time. [...]

So what I did do is ship a Google Form to Discord with the four questions you're supposed to ask on how to price something.

But we went with the $20. We were debating something slightly higher at the time. I often wonder what would have happened because so many other companies ended up copying the $20 price point, so did we erase a bunch of market cap by pricing it this way?

— Nick Turley, Head of ChatGPT, interviewed by Lenny Rachitsky

Tags: chatgpt, discord, generative-ai, openai, llm-pricing, ai, llms

Simon Willison's Weblog
12 Aug 00:18

LLM 0.27, the annotated release notes: GPT-5 and improved tool calling

I shipped LLM 0.27 today, adding support for the new GPT-5 family of models from OpenAI plus a flurry of improvements to the tool calling features introduced in LLM 0.26. Here are the annotated release notes. GPT-5 New models: gpt-5, gpt-5-mini and gpt-5-nano. #1229 I w...

Simon Willison's Weblog
11 Aug 18:39

Reddit will block the Internet Archive

Well this sucks. Jay Peters for the Verge:

Reddit says that it has caught AI companies scraping its data from the Internet Archive’s Wayback Machine, so it’s going to start blocking the Internet Archive from indexing the vast majority of Reddit. The Wayback Machine will no longer be able to crawl post detail pages, comments, or profiles; instead, it will only be able to index the Reddit.com homepage, which effectively means Internet Archive will only be able to archive insights into which news headlines and posts were most popular on a given day.

Tags: internet-archive, reddit, scraping, ai, training-data, ai-ethics

Simon Willison's Weblog
11 Aug 16:27

Codex upgrade

If you've been experimenting with OpenAI's Codex CLI and have been frustrated that it's not possible to select text and copy it to the clipboard, at least when running in the Mac terminal (I genuinely didn't know it was possible to build a terminal app that disabled copy and...

Simon Willison's Weblog
11 Aug 06:48

qwen-image-mps

Ivan Fioravanti built this Python CLI script for running the Qwen/Qwen-Image image generation model on an Apple silicon Mac, optionally using the Qwen-Image-Lightning LoRA to dramatically speed up generation. Ivan has tested this on 512GB and 128GB machines...

Simon Willison's Weblog
11 Aug 05:36

AI for data engineers with Simon Willison

I recorded an episode last week with Claire Giordano for the Talking Postgres podcast. The topic was "AI for data engineers" but we ended up covering an enjoyable range of different topics. How I got started programming with a Commo...

Simon Willison's Weblog
11 Aug 04:33

Chromium Docs: The Rule Of 2

Alex Russell pointed me to this principle in the Chromium security documentation as similar to my description of the lethal trifecta. First added in 2019, the Chromium guideline states: When you write code to parse, evaluate, or otherwise handle...

Simon Willison's Weblog
11 Aug 00:42

Qwen3-4B-Thinking: "This is art - pelicans don't ride bikes!"

I've fallen a few days behind keeping up with Qwen. They released two new 4B models last week: Qwen3-4B-Instruct-2507 and its thinking equivalent Qwen3-4B-Thinking-2507. These are relatively tiny models that punch way above their weight. I’ve been running the 8bit GGUF vari...

Simon Willison's Weblog
10 Aug 23:33

Quoting Sam Altman

the percentage of users using reasoning models each day is significantly increasing; for example, for free users we went from <1% to 7%, and for plus users from 7% to 24%.

— Sam Altman, revealing quite how few people used the old model picker to upgrade from GPT-4o

Tags: openai, llm-reasoning, ai, llms, gpt-5, sam-altman, generative-ai, chatgpt

Simon Willison's Weblog
09 Aug 16:15

Quoting Ethan Mollick

The issue with GPT-5 in a nutshell is that unless you pay for model switching & know to use GPT-5 Thinking or Pro, when you ask “GPT-5” you sometimes get the best available AI & sometimes get one of the worst AIs available and it might even switch within a single conversation.

— Ethan Mollick, highlighting that GPT-5 (high) ranks top on Artificial Analysis, GPT-5 (minimal) ranks lower than GPT-4.1

Tags: gpt-5, ethan-mollick, generative-ai, ai, llms