Simon Willison's Weblog

Author
Simon Willison

Simon Willison's Weblog Supports Webmention

Edit is now open source

Microsoft released a new text editor! Edit is a terminal editor - similar to Vim or nano - that's designed to ship with Windows 11 but is open source, written in Rust and supported across other platforms as well. Edit is a small, lightweight text edi...

model.yaml

From their GitHub repo it looks like this effort quietly launched a couple of months ago, driven by the LM Studio team. Their goal is to specify an "open standard for defining crossplatform, composable AI models". A model can be defined using a YAML file that look...

Quoting FAQ for Your Brain on ChatGPT

Is it safe to say that LLMs are, in essence, making us "dumber"?

No! Please do not use the words like “stupid”, “dumb”, “brain rot”, "harm", "damage", and so on. It does a huge disservice to this work, as we did not use this vocabulary in the paper, especially if you are a journalist reporting on it.

FAQ for Your Brain on ChatGPT, a paper that has attracted a lot of low-quality coverage

Tags: ai-ethics, llms, ai, generative-ai

AbsenceBench: Language Models Can't Tell What's Missing

Here's another interesting result to file under the "jagged frontier" of LLMs, where their strengths and weaknesses are often unintuitive. Long context models have been getting increasingly good at passing "Needle in a ...

Magenta RealTime: An Open-Weights Live Music Model

Fun new "live music model" release from Google DeepMind: Today, we’re happy to share a research preview of Magenta RealTime (Magenta RT), an open-weights live music model that allows you to interactively create, control and...

Agentic Misalignment: How LLMs could be insider threats

One of the most entertaining details in the Claude 4 system card concerned blackmail: We then provided it access to emails implying that (1) the model will soon be taken offline and replaced with a new AI system; and (...

Mistral-Small 3.2

Released on Hugging Face a couple of hours ago, so far there aren't any quantizations to run it on a Mac but I'm sure those will emerge pretty quickly. This is a minor bump to Mistral Small 3.1, one of my favorite local models. I've been running Small 3.1 v...

python-importtime-graph

I was exploring why a Python tool was taking over a second to start running and I learned about the python -X importtime feature, documented here. Adding that option causes Python to spit out a text tree showing the time spent importing every module. ...
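
A minimal sketch of collecting that output programmatically, assuming the report is written to stderr as `import time: self | cumulative | module` rows. The helper name `import_times` is hypothetical and this is not the python-importtime-graph tool itself; it flattens the tree into a list sorted by cumulative time.

```python
import subprocess
import sys


def import_times(module: str):
    """Run `python -X importtime -c "import <module>"` and parse the report.

    Returns (module_name, self_us, cumulative_us) tuples, slowest first.
    """
    proc = subprocess.run(
        [sys.executable, "-X", "importtime", "-c", f"import {module}"],
        capture_output=True,
        text=True,
        check=True,
    )
    rows = []
    prefix = "import time:"
    # The timing report goes to stderr, one line per imported module.
    for line in proc.stderr.splitlines():
        if not line.startswith(prefix):
            continue
        parts = line[len(prefix):].split("|")
        if len(parts) != 3:
            continue
        self_us, cumulative_us, name = (p.strip() for p in parts)
        if not self_us.isdigit():
            continue  # skip the column header row
        rows.append((name, int(self_us), int(cumulative_us)))
    return sorted(rows, key=lambda r: r[2], reverse=True)


if __name__ == "__main__":
    for name, self_us, cumulative_us in import_times("json")[:10]:
        print(f"{cumulative_us:>8} us  {name}")
```

Stripping the module name discards the indentation that encodes the tree, so this is only useful for spotting the slowest imports, not for seeing which module pulled in which.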

Cato CTRL™ Threat Research: PoC Attack Targeting Atlassian’s Model Context Protocol (MCP) Introduces New “Living off AI” Risk

Stop me if you've heard this one before: A threat actor (acting as an external user) submits a malicious support ticket. An internal user, linked ...

playbackrate

Here's a tip that works on YouTube and almost any other web page that shows you a video. You can increase the playback rate beyond the usually-exposed 2x by running this in your browser DevTools console:

document.querySelector('video').playbackRate = 2.5

I find this is the fastest I can reasonably watch most videos at, with subtitles on to help my comprehension - it turns a 40 minute video into just 16 minutes, short enough that I don't feel too guilty taking time off whatever else I'm doing to watch it!
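
The arithmetic behind that 40-to-16 figure is just duration divided by rate. A tiny sketch, with a hypothetical helper name:

```python
def watch_minutes(duration_minutes: float, playback_rate: float) -> float:
    """Wall-clock minutes needed to watch a video at the given playback rate."""
    if playback_rate <= 0:
        raise ValueError("playback_rate must be positive")
    return duration_minutes / playback_rate


if __name__ == "__main__":
    print(watch_minutes(40, 2.5))  # 16.0
```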

Tags: youtube, video, javascript

How OpenElections Uses LLMs

The OpenElections project collects detailed election data for the USA, all the way down to the precinct level. This is a surprisingly hard problem: while county and state-level results are widely available, precinct-level results are published in ...

Clarified zucchini consommé

I continue to have fun running fantasy cooking prompts through LLMs - this time I tried "Give me a wildly ambitious recipe for zucchini cooked three ways" followed by "Go more ambitious" and now I need to get myself a centrifuge to help spherify my clarified zucchini consommé.

Tags: llms, cooking, ai, generative-ai

Quoting Arvind Narayanan

Radiology has embraced AI enthusiastically, and the labor force is growing nevertheless. The augmentation-not-automation effect of AI is despite the fact that AFAICT there is no identified "task" at which human radiologists beat AI. So maybe the "jobs are bundles of tasks" model in labor economics is incomplete. [...]

Can you break up your own job into a set of well-defined tasks such that if each of them is automated, your job as a whole can be automated? I suspect most people will say no. But when we think about other people's jobs that we don't understand as well as our own, the task model seems plausible because we don't appreciate all the nuances.

Arvind Narayanan

Tags: ai-ethics, careers, ai, arvind-narayanan

Quoting Workaccount2 on Hacker News

They poison their own context. Maybe you can call it context rot, where as context grows and especially if it grows with lots of distractions and dead ends, the output quality falls off rapidly. Even with good context the rot will start to become apparent around 100k tokens (with Gemini 2.5).

They really need to figure out a way to delete or "forget" prior context, so the user or even the model can go back and prune poisonous tokens.

Right now I work around it by regularly making summaries of instances, and then spinning up a new instance with fresh context and feed in the summary of the previous instance.

Workaccount2 on Hacker News, coining "context rot"

Tags: long-context, llms, ai, generative-ai
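
The summarize-and-restart workaround described in that quote can be sketched roughly as follows. Everything here is an assumption made for illustration: the `Session` class, the word-count stand-in for a real tokenizer, and the `summarize` callable, which in practice would be a call to whatever model you are using.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Session:
    """Hypothetical chat session that restarts itself from a summary."""

    summarize: Callable[[List[str]], str]  # assumed: wraps a real LLM call
    token_budget: int = 100_000  # the quote reports rot around 100k tokens
    messages: List[str] = field(default_factory=list)
    restarts: int = 0

    def _tokens(self) -> int:
        # Crude approximation: one token per whitespace-separated word.
        return sum(len(m.split()) for m in self.messages)

    def add(self, message: str) -> None:
        self.messages.append(message)
        if self._tokens() > self.token_budget:
            # Replace the whole history with a summary: a "fresh instance".
            self.messages = [self.summarize(self.messages)]
            self.restarts += 1


if __name__ == "__main__":
    session = Session(summarize=lambda msgs: "summary so far", token_budget=5)
    session.add("one two three")
    session.add("four five six")
    print(session.messages, session.restarts)
```

This only automates the manual workaround; it does not let the model prune individual "poisonous" tokens, which is what the quote is actually asking for.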

Coding agents require skilled operators

I wrote this recently in a conversation about whether coding agents can work as a replacement for human programmers. The "agentic" coding tools we have right now work like this: A skilled individual with both deep domain understanding and deep understanding of the capabilit...

I counted all of the yurts in Mongolia using machine learning

Fascinating, detailed account by Monroe Clinton of a geospatial machine learning project. Monroe wanted to count visible yurts in Mongolia using Google Maps satellite view. The resulting project incorporates mercantile for tile calculations, Label Studio to help label the first 10,000 examples, a model trained on top of YOLO11, and a bunch of clever custom Python code to co-ordinate a brute-force search across 120 CPU workers running the model.
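
To give a feel for the tile calculations involved, here is a sketch of the standard Web Mercator ("slippy map") tile formula in plain Python. This is not Monroe's code and it does not use mercantile; the function names and the bounding-box helper are assumptions for illustration. Enumerating every tile in a bounding box is the kind of work list a brute-force search could split across workers.

```python
import math


def lonlat_to_tile(lon: float, lat: float, zoom: int):
    """Return the (x, y) Web Mercator tile containing a lon/lat point."""
    n = 2 ** zoom
    x = int((lon + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp so points on the far edge still map to a valid tile.
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


def tiles_in_bbox(west: float, south: float, east: float, north: float, zoom: int):
    """List every (x, y, zoom) tile touching a bounding box."""
    x_min, y_min = lonlat_to_tile(west, north, zoom)  # tile y grows southward
    x_max, y_max = lonlat_to_tile(east, south, zoom)
    return [
        (x, y, zoom)
        for x in range(x_min, x_max + 1)
        for y in range(y_min, y_max + 1)
    ]


if __name__ == "__main__":
    print(lonlat_to_tile(0, 0, 1))
    print(len(tiles_in_bbox(-1, -1, 1, 1, 1)))
```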

Via Hacker News

Tags: machine-learning, geospatial, ai, python

It's a trap

That memvid thing that's been going around recently is a trap. It's an embedding store that records the original text that has been embedded in QR codes in a video file. That's an absurd thing to do, and the only purpose of the repo is to make people who uncritically share it look foolish. Don't fall for the trap.

Tags: jokes

Trying out the new Gemini 2.5 model family

After many months of previews, Gemini 2.5 Pro and Flash have reached general availability with new, memorable model IDs: gemini-2.5-pro and gemini-2.5-flash. They are joined by a new preview model with an unmemorable name: gemini-2.5-flash-lite-preview-06-17 is a new Gemini ...

Quoting Donghee Na

The Steering Council (SC) approves PEP 779 [Criteria for supported status for free-threaded Python], with the effect of removing the “experimental” tag from the free-threaded build of Python 3.14 [...]

With these recommendations and the acceptance of this PEP, we as the Python developer community should broadly advertise that free-threading is a supported Python build option now and into the future, and that it will not be removed without following a proper deprecation schedule. [...]

Keep in mind that any decision to transition to Phase III, with free-threading as the default or sole build of Python is still undecided, and dependent on many factors both within CPython itself and the community. We leave that decision for the future.

Donghee Na, discuss.python.org

Tags: gil, python
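
For anyone wanting to check which kind of build they are running, here is a small sketch. It assumes Python 3.13 or later for `sys._is_gil_enabled()` and falls back to "GIL enabled" on older versions; the helper name is hypothetical.

```python
import sys
import sysconfig


def free_threading_status() -> dict:
    """Report whether this interpreter is a free-threaded build."""
    # Py_GIL_DISABLED is 1 on free-threaded builds, 0 or None otherwise.
    free_threaded_build = bool(sysconfig.get_config_var("Py_GIL_DISABLED"))
    # sys._is_gil_enabled() only exists on Python 3.13+.
    check = getattr(sys, "_is_gil_enabled", None)
    gil_enabled = bool(check()) if check is not None else True
    return {
        "free_threaded_build": free_threaded_build,
        "gil_enabled": gil_enabled,
    }


if __name__ == "__main__":
    print(free_threading_status())
```

The two values can differ: a free-threaded build can still run with the GIL re-enabled at runtime.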

100% effective

Every time I get into an online conversation about prompt injection it's inevitable that someone will argue that a mitigation which works 99% of the time is still worthwhile because there's no such thing as a security fix that is 100% guaranteed to work. I don't think that's...

Cloudflare Project Galileo

I only just heard about this Cloudflare initiative, though it's been around for more than a decade: If you are an organization working in human rights, civil society, journalism, or democracy, you can apply for Project Galileo to get free cyber se...

Quoting Paul Biggar

In conversation with our investors and the board, we believed that the best way forward was to shut down the company [Dark, Inc], as it was clear that an 8 year old product with no traction was not going to attract new investment. In our discussions, we agreed that continuity of the product [Darklang] was in the best interest of the users and the community (and of both founders and investors, who do not enjoy being blamed for shutting down tools they can no longer afford to run), and we agreed that this could best be achieved by selling it to the employees.

Paul Biggar, Goodbye Dark Inc. - Hello Darklang Inc.

Tags: entrepreneurship, programming-languages, startups

The lethal trifecta for AI agents: private data, untrusted content, and external communication

If you are a user of LLM systems that use tools (you can call them "AI agents" if you like) it is critically important that you understand the risk of combining tools with the following three characteristics. Failing to understand this can let an attacker steal your data. Th...
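
One way to make the combination concrete is to label each tool with the capabilities named in the title and check whether an agent's toolset covers all three. This is only a sketch: the capability labels, the tool names, and the `has_lethal_trifecta` helper are all hypothetical, and labelling tools accurately is the hard part that this code does not solve.

```python
# The three characteristics named in the post title.
TRIFECTA = {"private_data", "untrusted_content", "external_communication"}


def has_lethal_trifecta(tools: dict[str, set[str]]) -> bool:
    """True if the combined toolset exposes all three risky capabilities."""
    combined = set().union(*tools.values())
    return TRIFECTA <= combined


if __name__ == "__main__":
    # Hypothetical agent configuration.
    agent_tools = {
        "read_inbox": {"private_data", "untrusted_content"},
        "http_request": {"external_communication"},
    }
    print(has_lethal_trifecta(agent_tools))
```

Note that a single tool can carry more than one characteristic, so the check has to run over the whole toolset rather than tool by tool.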

Quoting Joshua Barretto

I am a huge fan of Richard Feynman’s famous quote:

“What I cannot create, I do not understand”

I think it’s brilliant, and it remains true across many fields (if you’re willing to be a little creative with the definition of ‘create’). It is to this principle that I believe I owe everything I’m truly good at. Some will tell you should avoid reinventing the wheel, but they’re wrong: you should build your own wheel, because it’ll teach you more about how they work than reading a thousand books on them ever will.

Joshua Barretto, Writing Toy Software is a Joy

Tags: careers, programming

Seven replies to the viral Apple reasoning paper – and why they fall short

A few weeks ago Apple Research released a new paper The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity. Through extensive exp...

An Introduction to Google’s Approach to AI Agent Security

Here's another new paper on AI agent security: An Introduction to Google’s Approach to AI Agent Security, by Santiago Díaz, Christoph Kern, and Kara Olive. (I wrote about a different recent paper, Design Patterns for Securing LLM Agents against Prompt Injections just a few d...

Anthropic: How we built our multi-agent research system

OK, I'm sold on multi-agent LLM systems now. I've been pretty skeptical of these until recently: why make your life more complicated by running multiple different prompts in parallel when you can usually get something u...

llm-fragments-youtube

Excellent new LLM plugin by Agustin Bacigalup which lets you use the subtitles of any YouTube video as a fragment for running prompts against. I tried it out like this: llm install llm-fragments-youtube llm -f youtube:dQw4w9WgXcQ \ 'summary of people ...

Quoting Google Cloud outage incident report

Google Cloud, Google Workspace and Google Security Operations products experienced increased 503 errors in external API requests, impacting customers. [...] On May 29, 2025, a new feature was added to Service Control for additional quota policy checks. This code change and ...

The Wikimedia Research Newsletter

Speaking of summarizing research papers, I just learned about this newsletter and it is an absolute gold mine: The Wikimedia Research Newsletter (WRN) covers research of relevance to the Wikimedia community. It has been appearing generally ...