FeedCity: Simon Willison's Weblog

Reflections on OpenAI

Calvin French-Owen spent just over a year working at OpenAI, during which time the organization grew from 1,000 to 3,000 people and Calvin found himself in "the top 30% by tenure". His reflections on leaving are fascinating - absolutely crammed with det...

Simon Willison's Weblog
15 Jul 14:12

xAI: "We spotted a couple of issues with Grok 4 recently that we immediately investigated & mitigated"

They continue: One was that if you ask it "What is your surname?" it doesn't have one so it searches the internet leading to undesirable results, such as when its sear...

14 Jul 22:03

Application development without programmers

This book by James Martin, published in 1982, includes the following in the preface: Applications development did not change much for 20 years, but now a new wave is crashing in. A rich diversity of nonprocedural techniques and la...

14 Jul 17:33

ccusage

Claude Code logs detailed usage information to the ~/.claude/ directory. ccusage is a neat little Node.js tool which reads that information and shows you a readable summary of your usage patterns, including the estimated cost in USD per day.

You can run it using npx like this:

npx ccusage@latest

Tags: javascript, nodejs, anthropic, claude-code

13 Jul 19:00

Happy 20th birthday Django! Here's my talk on Django Origins from Django's 10th

Today is the 20th anniversary of the first commit to the public Django repository! Ten years ago we threw a multi-day 10th birthday party for Django back in its birthtown of Lawrence, Kansas. As a personal celebration of the 20th, I'm revisiting the talk I gave at that event...

12 Jul 18:39

Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity

METR - for Model Evaluation & Threat Research - are a non-profit research institute founded by Beth Barnes, a former alignment researcher at OpenAI (see Wikipedia). They've previously...

12 Jul 17:15

Grok 4 Heavy won't reveal its system prompt

Grok 4 Heavy is the "think much harder" version of Grok 4 that's currently only available on their $300/month plan. Jeremy Howard relays a report from a Grok 4 Heavy user who wishes to remain anonymous: it turns out that Heavy, unli...

12 Jul 16:42

crates.io: Trusted Publishing

crates.io is the Rust ecosystem's equivalent of PyPI. Inspired by PyPI's GitHub integration (see my TIL, I use this for dozens of my packages now) they've added a similar feature:

Trusted Publishing eliminates the need for GitHub Actions secrets when publishing crates from your CI/CD pipeline. Instead of managing API tokens, you can now configure which GitHub repository you trust directly on crates.io.

They're missing one feature that PyPI has: on PyPI you can create a "pending publisher" for your first release. crates.io currently requires the first release to be manual:

To get started with Trusted Publishing, you'll need to publish your first release manually. After that, you can set up trusted publishing for future releases.

Via @charliermarsh

Tags: github, packaging, pypi, rust

12 Jul 15:54

Quoting @grok

On the morning of July 8, 2025, we observed undesired responses and immediately began investigating. To identify the specific language in the instructions causing the undesired behavior, we conducted multiple ablations and experiments to pinpoint the main culprits. We iden...

12 Jul 04:12

Musk’s latest Grok chatbot searches for billionaire mogul’s views before answering questions

I got quoted a couple of times in this story about Grok searching for tweets from:elonmusk by Matt O’Brien for the Associated Press. “It’s extraordinary,” said Simon Willison, an in...

11 Jul 18:54

moonshotai/Kimi-K2-Instruct

Colossal new open weights model release today from Moonshot AI, a two year old Chinese AI lab with a name inspired by Pink Floyd’s album The Dark Side of the Moon. My HuggingFace storage calculator says the repository is 958.52 GB. It's a mixture-...

11 Jul 17:12

Quoting Django’s security policies

Following the widespread availability of large language models (LLMs), the Django Security Team has received a growing number of security reports generated partially or entirely using such tools. Many of these contain inaccurate, misleading, or fictitious content. While AI ...

11 Jul 05:42

Generationship: Ep. #39, Simon Willison

I recorded this podcast episode with Rachel Chalmers a few weeks ago. We talked about the resurgence of blogging, the legacy of Google Reader, learning in public, LLMs as weirdly confident interns, AI-assisted search, prompt injection,...

11 Jul 04:54

Postgres LISTEN/NOTIFY does not scale

I think this headline is justified. Recall.ai, a provider of meeting transcription bots, noticed that their PostgreSQL instance was being bogged down by heavy concurrent writes.

After some spelunking they found this comment in the PostgreSQL source explaining that transactions with a pending notification take out a global lock against the entire PostgreSQL instance (represented by database 0) to ensure "that queue entries appear in commit order".

Moving away from LISTEN/NOTIFY to trigger actions on changes to rows gave them a significant performance boost under high write loads.
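
To make the locking behaviour described above concrete, here is a toy Python model. It is not PostgreSQL code and does not talk to a database: the class ToyDatabase, its lock, and the helper peak_concurrent_commits are all hypothetical names invented for this sketch. It only illustrates the idea that commits which carry a pending notification queue up behind a single instance-wide lock, while other commits can overlap.

```python
import threading
import time


class ToyDatabase:
    """Toy stand-in for a database instance (hypothetical, not PostgreSQL)."""

    def __init__(self):
        # Assumption: one instance-wide lock taken only by notifying commits.
        self._global_notify_lock = threading.Lock()
        self._stats_lock = threading.Lock()
        self.in_commit = 0  # commits currently in progress
        self.peak = 0       # highest number of overlapping commits seen

    def _commit_work(self):
        with self._stats_lock:
            self.in_commit += 1
            self.peak = max(self.peak, self.in_commit)
        time.sleep(0.01)  # pretend the commit takes a little while
        with self._stats_lock:
            self.in_commit -= 1

    def commit(self, has_pending_notify):
        if has_pending_notify:
            # Notifying transactions serialize on the global lock.
            with self._global_notify_lock:
                self._commit_work()
        else:
            # Other transactions commit without that lock.
            self._commit_work()


def peak_concurrent_commits(has_pending_notify, writers=8):
    """Run `writers` concurrent commits and report the peak overlap."""
    db = ToyDatabase()
    threads = [
        threading.Thread(target=db.commit, args=(has_pending_notify,))
        for _ in range(writers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return db.peak


if __name__ == "__main__":
    print("peak overlap with NOTIFY:", peak_concurrent_commits(True))
    print("peak overlap without NOTIFY:", peak_concurrent_commits(False))
```

In this model the notifying case can never have more than one commit in flight, which is the serialization the source comment describes; the non-notifying case is free to overlap, though the exact peak depends on thread scheduling.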

Via Hacker News

Tags: databases, performance, postgresql

11 Jul 00:42

Grok: searching X for "from:elonmusk (Israel OR Palestine OR Hamas OR Gaza)"

If you ask the new Grok 4 for opinions on controversial questions, it will sometimes run a search to find out Elon Musk's stance before providing you with an answer. I heard about this today from Jeremy Howard, following a trail that started with @micah_erfan and led throug...

10 Jul 19:54

Grok 4

Released last night, Grok 4 is now available via both API and a paid subscription for end-users. Key characteristics: image and text input, text output. 256,000 context length (twice that of Grok 3). It's a reasoning model where you can't see the reasoning tokens or t...

09 Jul 19:24

Infinite Monkey

Mihai Parparita's Infinite Mac lets you run classic MacOS emulators directly in your browser. Infinite Monkey is a new feature which taps into the OpenAI Computer Use and Claude Computer Use APIs using your own API keys and uses them to remote control the emulated Mac!

Here's what happened when I told OpenAI Computer Use to "Open MacPaint and draw a pelican riding a bicycle" - video sped up 3x.

Via @persistent.info

Tags: macos, mihai-parparita, ai, webassembly, generative-ai, llms, ai-agents, pelican-riding-a-bicycle

09 Jul 00:15

uv cache prune

If you're running low on disk space and are a uv user, don't forget about uv cache prune:

uv cache prune removes all unused cache entries. For example, the cache directory may contain entries created in previous uv versions that are no longer necessary and can be safely removed. uv cache prune is safe to run periodically, to keep the cache directory clean.

My Mac just ran out of space. I ran OmniDiskSweeper and noticed that the ~/.cache/uv directory was 63.4GB - so I ran this:

uv cache prune                    
Pruning cache at: /Users/simon/.cache/uv
Removed 1156394 files (37.3GiB)

And now my computer can breathe again!
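
The two figures above use different units (GB for the directory size, GiB for what uv removed), so a quick conversion helps when comparing them. This is only a rough sketch: it assumes the 63.4GB reading is in decimal gigabytes, which the post does not state, and the names gib_to_gb, before_gb, removed_gb and remaining_gb are made up for the example.

```python
GIB = 2**30  # bytes in one gibibyte (binary unit, as printed by uv)
GB = 10**9   # bytes in one gigabyte (decimal unit)


def gib_to_gb(gib):
    """Convert a size in GiB to decimal GB."""
    return gib * GIB / GB


# Figures quoted in the post.
before_gb = 63.4               # assumed to be decimal GB
removed_gb = gib_to_gb(37.3)   # uv reported 37.3GiB removed

# Rough estimate only: assumes nothing else changed in the directory.
remaining_gb = before_gb - removed_gb

print(f"{removed_gb:.2f} GB removed, roughly {remaining_gb:.2f} GB left")
```

Under that assumption, 37.3GiB is about 40.05 GB, which would leave roughly 23.35 GB in the cache directory.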

Tags: uv, python

07 Jul 19:51

Quoting Aphyr

I strongly suspect that Market Research Future, or a subcontractor, is conducting an automated spam campaign which uses a Large Language Model to evaluate a Mastodon instance, submit a plausible application for an account, and to post slop which links to Market Research Future reports. [...]

I don’t know how to run a community forum in this future. I do not have the time or emotional energy to screen out regular attacks by Large Language Models, with the knowledge that making the wrong decision costs a real human being their connection to a niche community.

— Aphyr, The Future of Forums is Lies, I Guess

Tags: spam, ai, llms, ai-ethics, slop, generative-ai, mastodon, community, moderation

07 Jul 19:03

Become a command-line superhero with Simon Willison's llm tool

Christopher Smith ran a mini hackathon in Albany, New York at the weekend around uses of my LLM - the first in-person event I'm aware of dedicated to that project! He prepared this video version of the opening tal...

07 Jul 16:00

Adding a feature because ChatGPT incorrectly thinks it exists

Adrian Holovaty describes how his SoundSlice service saw an uptick in users attempting to use their sheet music scanner to import ASCII-art guitar tab... because it turned out ChatGPT had hallucinated that as a feature SoundSlice supported and was telling users to go there!

So they built that feature. Easier than convincing OpenAI to somehow patch ChatGPT to stop it from hallucinating a feature that doesn't exist.

Adrian:

To my knowledge, this is the first case of a company developing a feature because ChatGPT is incorrectly telling people it exists. (Yay?)

Via Hacker News

Tags: adrian-holovaty, ai, openai, generative-ai, chatgpt, llms, ai-ethics

06 Jul 23:18

I Shipped a macOS App Built Entirely by Claude Code

Indragie Karunaratne has "been building software for the Mac since 2008", but recently decided to try Claude Code to build a side project: Context, a native Mac app for debugging MCP servers: There is still skill and itera...

06 Jul 10:12

Quoting Nineteen Eighty-Four

There was a whole chain of separate departments dealing with proletarian literature, music, drama, and entertainment generally. Here were produced rubbishy newspapers containing almost nothing except sport, crime and astrology, sensational five-cent novelettes, films oozing with sex, and sentimental songs which were composed entirely by mechanical means on a special kind of kaleidoscope known as a versificator. [...]

It was one of countless similar songs published for the benefit of the proles by a sub-section of the Music Department. The words of these songs were composed without any human intervention whatever on an instrument known as a versificator.

— Nineteen Eighty-Four, George Orwell predicts generative AI, published 1949

Tags: ai-ethics, ai, generative-ai

06 Jul 03:03

Supabase MCP can leak your entire SQL database

Here's yet another example of a lethal trifecta attack, where an LLM system combines access to private data, exposure to potentially malicious instructions and a mechanism to communicate data back out to an attacker. In this cas...

05 Jul 23:57

Serving 200 million requests per day with a cgi-bin

Jake Gold tests how well 90s-era CGI works today, using a Go + SQLite CGI program running on a 16-thread AMD 3700X. Using CGI on modest hardware, it’s possible to serve 2400+ requests per second or 200M+ requests per day. ...
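
The two throughput figures quoted above are the same number in different units, which a few lines of Python can confirm. The function name requests_per_day is just for illustration, and the calculation assumes the per-second rate is sustained for a full 24 hours.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def requests_per_day(requests_per_second):
    """Scale a sustained per-second rate up to a full day."""
    return requests_per_second * SECONDS_PER_DAY


daily = requests_per_day(2400)
print(f"{daily:,} requests per day")
```

At exactly 2,400 requests per second that comes to 207,360,000 requests per day, consistent with the "200M+" claim.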

05 Jul 05:48

Cursor: Clarifying Our Pricing

Cursor changed their pricing plan on June 16th, introducing a new $200/month Ultra plan with "20x more usage than Pro" and switching their $20/month Pro plan from "request limits to compute limits". This confused a lot of people. Here's Cursor'...

04 Jul 19:30

Identify, solve, verify

The more time I spend using LLMs for code, the less I worry for my career - even as their coding capabilities continue to improve. Using LLMs as part of my process helps me understand how much of my job isn't just bashing out code. My job is to identify problems that can be ...

04 Jul 15:39

awwaiid/gremllm

Delightfully cursed Python library by Brock Wilcox, built on top of LLM:

from gremllm import Gremllm

counter = Gremllm("counter")
counter.value = 5
counter.increment()
print(counter.value)  # 6?
print(counter.to_roman_numerals())  # VI?

You tell your Gremllm...

03 Jul 22:18

Quoting Adam Gordon Bell

I think that a lot of resistance to AI coding tools comes from the same place: fear of losing something that has defined you for so long. People are reacting against overblown hype, and there is overblown hype. I get that, but I also think there’s something deeper going on here. When you’ve worked hard to build your skills, when coding is part of your identity and where you get your worth, the idea of a tool that might replace some of that is very threatening.

— Adam Gordon Bell, When AI Codes, What’s Left for me?

Tags: llms, careers, ai, generative-ai, ai-assisted-programming

03 Jul 21:39

TIL: Rate limiting by IP using Cloudflare's rate limiting rules

My blog started timing out on some requests a few days ago, and it turned out there were misbehaving crawlers that were spidering my /search/ page even though it's restricted by robots.txt.

I run this site behind Cloudflare and it turns out Cloudflare's WAF (Web Application Firewall) has a rate limiting tool that I could use to restrict requests to /search/* by a specific IP to a maximum of 5 every 10 seconds.
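
For readers who want to see what a rule like "at most 5 requests every 10 seconds per IP on /search/*" means in practice, here is a small Python sketch. It is not how Cloudflare implements its rate limiting rules and it is not Cloudflare configuration: the class SlidingWindowLimiter and its allow method are hypothetical, and the sliding-window approach is simply one common way to model such a limit.

```python
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Illustrative per-IP limiter (hypothetical, not Cloudflare's algorithm)."""

    def __init__(self, max_requests=5, window_seconds=10.0, prefix="/search/"):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.prefix = prefix
        self._hits = defaultdict(deque)  # ip -> timestamps of allowed requests

    def allow(self, ip, path, now):
        """Return True if a request from `ip` to `path` at time `now` passes."""
        if not path.startswith(self.prefix):
            return True  # the rule only covers the search path

        hits = self._hits[ip]
        # Forget requests that have aged out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()

        if len(hits) >= self.max_requests:
            return False  # over the limit for this window

        hits.append(now)
        return True


if __name__ == "__main__":
    limiter = SlidingWindowLimiter()
    # Six requests from one (documentation-range) IP, one second apart.
    for second in range(6):
        print(second, limiter.allow("203.0.113.7", "/search/?q=llm", second))
```

Timestamps are passed in explicitly rather than read from the clock, which keeps the sketch easy to test. In the demo loop the first five requests pass and the sixth is rejected; a different IP, or a path outside /search/, is unaffected.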

Tags: rate-limiting, security, cloudflare, til