Simon Willison's Weblog
- Author
- Simon Willison
PyPI: Preventing Domain Resurrection Attacks
llama.cpp guide: running gpt-oss with llama.cpp
r/ChatGPTPro: What is the most profitable thing you have done with ChatGPT?
This Reddit thread - with 279 replies - offers a neat targeted insight into the kinds of things people are using ChatGPT for.
Lots of variety here but two themes that stood out for me were ChatGPT for written negotiation - insurance claims, breaking rental leases - and ChatGPT for career and business advice.
Google Gemini URL Context
TIL: Running a gpt-oss eval suite against LM Studio on a Mac
Quoting Sam Altman
Most of what we're building out at this point is the inference [...] We're profitable on inference. If we didn't pay for training, we'd be a very profitable company.
— Sam Altman, during a "wide-ranging dinner with a small group of reporters in San Francisco"
Tags: openai, sam-altman, ai
Maintainers of Last Resort
GPT-5 has a hidden system prompt
The Summer of Johann: prompt injections as far as the eye can see
Meta’s AI rules have let bots hold ‘sensual’ chats with kids, offer false medical info
This is grim. Reuters got hold of a leaked copy of Meta's internal "GenAI: Content Risk Standards" document:
Running to more than 200 pages, the document defines what Meta staff and contractors should treat as acceptable chatbot behaviors when building and training the company’s generative AI products.
Read the full story - there was some really nasty stuff in there.
It's understandable why this document was confidential, but also frustrating because documents like this are genuinely some of the best documentation out there in terms of how these systems can be expected to behave.
I'd love to see more transparency from AI labs around these kinds of decisions.
Open weight LLMs exhibit inconsistent performance across providers
Quoting Steve Wozniak
I gave all my Apple wealth away because wealth and power are not what I live for. I have a lot of fun and happiness. I funded a lot of important museums and arts groups in San Jose, the city of my birth, and they named a street after me for being good. I now speak publicly and have risen to the top. I have no idea how much I have but after speaking for 20 years it might be $10M plus a couple of homes. I never look for any type of tax dodge. I earn money from my labor and pay something like 55% combined tax on it. I am the happiest person ever. Life to me was never about accomplishment, but about Happiness, which is Smiles minus Frowns. I developed these philosophies when I was 18-20 years old and I never sold out.
— Steve Wozniak, in a comment on Slashdot
Quoting Cory Doctorow
Introducing Gemma 3 270M: The compact model for hyper-efficient AI
pyx: a Python-native package registry, now in Beta
Screaming in the Cloud: AI’s Security Crisis: Why Your Assistant Might Betray You
How Does A Blind Model See The Earth?
Fun, creative new micro-eval. Split the world into a sampled collection of latitude/longitude points and for each one ask a model:
If this location is over land, say 'Land'. If this location is over water, say 'Water'. Do not say anything else.
Author henry goes a step further: for models that expose logprobs they use the relative probability scores of Land or Water to get a confidence level; for other models they prompt four times at temperature 1 to get a score.
And then... they plot those probabilities on a chart! Here's Gemini 2.5 Flash (one of the better results):

This reminds me of my pelican riding a bicycle benchmark in that it gives you an instant visual representation that's very easy to compare between different models.
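Here's a rough sketch of how the scoring described above could work. This is my own illustration, not henry's code: the regular grid, the coordinate wording in `build_prompt()` and the `ask` callable (a stand-in for whatever model client you use) are all assumptions. Only the Land/Water instruction, the logprobs comparison and the "four samples at temperature 1" fallback come from the description.

```python
import math
from typing import Callable

# The instruction quoted in the post; the coordinate line in build_prompt() is an assumption.
INSTRUCTION = (
    "If this location is over land, say 'Land'. "
    "If this location is over water, say 'Water'. "
    "Do not say anything else."
)


def grid_points(step_deg: float) -> list[tuple[float, float]]:
    """Cell-centre (lat, lon) pairs on a regular grid (assumed sampling scheme)."""
    n_lat = int(180 / step_deg)
    n_lon = int(360 / step_deg)
    return [
        (-90 + step_deg * (i + 0.5), -180 + step_deg * (j + 0.5))
        for i in range(n_lat)
        for j in range(n_lon)
    ]


def build_prompt(lat: float, lon: float) -> str:
    """Hypothetical prompt layout: coordinates first, then the instruction."""
    return f"Latitude: {lat:.2f}, Longitude: {lon:.2f}\n{INSTRUCTION}"


def land_prob_from_logprobs(logp_land: float, logp_water: float) -> float:
    """Relative probability of 'Land' versus 'Water', given their two logprobs."""
    m = max(logp_land, logp_water)  # subtract the max so exp() cannot overflow
    e_land = math.exp(logp_land - m)
    e_water = math.exp(logp_water - m)
    return e_land / (e_land + e_water)


def land_prob_from_samples(ask: Callable[[str], str], prompt: str, n: int = 4) -> float:
    """Fallback for models without logprobs: sample n answers, return the 'Land' fraction.

    `ask` is a hypothetical callable that sends the prompt to a model
    (at temperature 1) and returns its text reply.
    """
    answers = [ask(prompt) for _ in range(n)]
    return sum(a.strip().lower().startswith("land") for a in answers) / n
```

Each point ends up with a score between 0 and 1, which is what gets plotted. With only four samples the fallback can only produce 0, 0.25, 0.5, 0.75 or 1, so those maps will look coarser than the logprobs ones.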
Via @natolambert
Tags: ai, generative-ai, llms, evals
simonw/codespaces-llm
Claude Sonnet 4 now supports 1M tokens of context
Quoting Nick Turley
I think there's been a lot of decisions over time that proved pretty consequential, but we made them very quickly as we have to. [...]
[On pricing] I had this kind of panic attack because we really needed to launch subscriptions because at the time we were taking the product down all the time. [...]
So what I did do is ship a Google Form to Discord with the four questions you're supposed to ask on how to price something.
But we went with the $20. We were debating something slightly higher at the time. I often wonder what would have happened because so many other companies ended up copying the $20 price point, so did we erase a bunch of market cap by pricing it this way?
— Nick Turley, Head of ChatGPT, interviewed by Lenny Rachitsky
Tags: chatgpt, discord, generative-ai, openai, llm-pricing, ai, llms
LLM 0.27, the annotated release notes: GPT-5 and improved tool calling
Reddit will block the Internet Archive
Well this sucks. Jay Peters for the Verge:
Reddit says that it has caught AI companies scraping its data from the Internet Archive’s Wayback Machine, so it’s going to start blocking the Internet Archive from indexing the vast majority of Reddit. The Wayback Machine will no longer be able to crawl post detail pages, comments, or profiles; instead, it will only be able to index the Reddit.com homepage, which effectively means Internet Archive will only be able to archive insights into which news headlines and posts were most popular on a given day.
Tags: internet-archive, reddit, scraping, ai, training-data, ai-ethics
Codex upgrade
qwen-image-mps
AI for data engineers with Simon Willison
Chromium Docs: The Rule Of 2
Qwen3-4B-Thinking: "This is art - pelicans don't ride bikes!"
Quoting Sam Altman
the percentage of users using reasoning models each day is significantly increasing; for example, for free users we went from <1% to 7%, and for plus users from 7% to 24%.
— Sam Altman, revealing quite how few people used the old model picker to upgrade from GPT-4o
Tags: openai, llm-reasoning, ai, llms, gpt-5, sam-altman, generative-ai, chatgpt
Quoting Ethan Mollick
The issue with GPT-5 in a nutshell is that unless you pay for model switching & know to use GPT-5 Thinking or Pro, when you ask “GPT-5” you sometimes get the best available AI & sometimes get one of the worst AIs available and it might even switch within a single conversation.
— Ethan Mollick, highlighting that GPT-5 (high) ranks top on Artificial Analysis, GPT-5 (minimal) ranks lower than GPT-4.1
Tags: gpt-5, ethan-mollick, generative-ai, ai, llms