Simon Willison's Weblog

Author
Simon Willison

Simon Willison's Weblog Supports Webmention

Quoting Scott Aaronson

Given a week or two to try out ideas and search the literature, I’m pretty sure that Freek and I could’ve solved this problem ourselves. Instead, though, I simply asked GPT5-Thinking. After five minutes, it gave me something confident, plausible-looking, and (I could tell) ...

Quoting Nick Turley

We’ve seen the strong reactions to 4o responses and want to explain what is happening.

We’ve started testing a new safety routing system in ChatGPT.

As we previously mentioned, when conversations touch on sensitive and emotional topics the system may switch mid-chat to a reasoning model or GPT-5 designed to handle these contexts with extra care. This is similar to how we route conversations that require extra thinking to our reasoning models; our goal is to always deliver answers aligned with our Model Spec.

Routing happens on a per-message basis; switching from the default model happens on a temporary basis. ChatGPT will tell you which model is active when asked.

Nick Turley, Head of ChatGPT, OpenAI

Tags: generative-ai, openai, chatgpt, ai, llms, nick-turley
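
To make "per-message" routing concrete, here is a toy router. It is a minimal sketch and not OpenAI's implementation: the model names, the keyword list and the `looks_sensitive`, `route` and `route_conversation` functions are all hypothetical stand-ins for whatever classifier ChatGPT actually uses. What it does show is that the decision is made fresh for every message, so a switch away from the default model does not carry over to the next turn.

```python
# Hypothetical model names; ChatGPT's real routing targets are not public.
DEFAULT_MODEL = "default-model"
SAFETY_MODEL = "reasoning-model"

# Hypothetical keyword list standing in for a real sensitivity classifier.
SENSITIVE_TERMS = {"grief", "self-harm", "panic", "hopeless"}


def looks_sensitive(message: str) -> bool:
    """Crude stand-in classifier: flags a message containing any sensitive term."""
    words = {w.strip(".,!?") for w in message.lower().split()}
    return bool(words & SENSITIVE_TERMS)


def route(message: str) -> str:
    """Pick a model for this one message; no state carries over between calls."""
    return SAFETY_MODEL if looks_sensitive(message) else DEFAULT_MODEL


def route_conversation(messages: list[str]) -> list[str]:
    """Route each message independently, so any switch is temporary."""
    return [route(m) for m in messages]


if __name__ == "__main__":
    chat = ["What's a good pasta recipe?", "I feel hopeless lately", "Thanks, what about dessert?"]
    for msg, model in zip(chat, route_conversation(chat)):
        print(f"{model:16} <- {msg}")
```

A real system would replace the keyword check with a trained classifier, but the stateless `route` call per message is the part that matches the description above.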

Video models are zero-shot learners and reasoners

Video models are zero-shot learners and reasoners Fascinating new paper from Google DeepMind which makes a very convincing case that their Veo 3 model - and generative video models in general - serve a similar role in the machine learning visual ecosystem as LLMs do for text...

Quoting Dan Abramov

Conceptually, Mastodon is a bunch of copies of the same webapp emailing each other. There is no realtime global aggregation across the network so it can only offer a fragmented user experience. While some people might like it, it can't directly compete with closed social pr...

ForcedLeak: AI Agent risks exposed in Salesforce AgentForce

ForcedLeak: AI Agent risks exposed in Salesforce AgentForce Classic lethal trifecta image exfiltration bug reported against Salesforce AgentForce by Sasi Levi and Noma Security. Here the malicious instructions come in via the Salesforce Web-to-Lead feature. When a Salesforce...
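
Image exfiltration bugs of this kind rely on a client rendering a markdown image whose URL the attacker controls, so that fetching the image leaks data in its query string. A common mitigation is to only render images from an allowlist of trusted hosts. This is a hypothetical sketch of that general idea, not Salesforce's fix: `ALLOWED_IMAGE_HOSTS` and `strip_untrusted_images` are illustrative names.

```python
import re
from urllib.parse import urlparse

# Hypothetical allowlist: images are only rendered from these exact hosts.
ALLOWED_IMAGE_HOSTS = {"images.example.com"}

# Matches markdown images like ![alt](url) or ![alt](url "title"), capturing alt and url.
MD_IMAGE = re.compile(r"!\[([^\]]*)\]\(([^)\s]+)[^)]*\)")


def strip_untrusted_images(markdown: str) -> str:
    """Replace images on non-allowlisted hosts with inert text so no request is made."""

    def replace(match: re.Match) -> str:
        alt, url = match.group(1), match.group(2)
        parsed = urlparse(url)
        if parsed.scheme == "https" and (parsed.hostname or "") in ALLOWED_IMAGE_HOSTS:
            return match.group(0)  # trusted host: keep the image
        return f"[image removed: {alt}]"

    return MD_IMAGE.sub(replace, markdown)


if __name__ == "__main__":
    leaky = "Summary ![logo](https://images.example.com/a.png) ![x](https://evil.test/p.png?d=lead-email)"
    print(strip_untrusted_images(leaky))
```

An allowlist like this is only as safe as the domains on it: any listed host an attacker can write to or take over reopens the exfiltration channel.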

How to stop AI’s “lethal trifecta”

How to stop AI’s “lethal trifecta” This is the second mention of the lethal trifecta in the Economist in just the last week! Their earlier coverage was Why AI systems may never be secure on September 22nd - I wrote about that here, where I called it "the clearest explanation...
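
The lethal trifecta is the combination of three capabilities in one agent: access to private data, exposure to untrusted content, and the ability to communicate externally. The defensive rule of thumb is to never let a single agent hold all three at once. Here is a minimal sketch of that check, assuming each tool can be hand-labeled with the capabilities it grants; the `Tool` class, the capability names and the example tools are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum, auto


class Capability(Enum):
    PRIVATE_DATA = auto()       # can read the user's private data
    UNTRUSTED_CONTENT = auto()  # can pull in text an attacker might control
    EXTERNAL_COMMS = auto()     # can send data out (HTTP requests, email, rendered images)


TRIFECTA = frozenset(Capability)


@dataclass(frozen=True)
class Tool:
    name: str
    capabilities: frozenset[Capability]


def _granted(tools: list[Tool]) -> frozenset[Capability]:
    """Union of capabilities across every tool available to the agent."""
    return frozenset().union(*(t.capabilities for t in tools))


def has_lethal_trifecta(tools: list[Tool]) -> bool:
    """True if the combined tool set grants all three trifecta capabilities."""
    return TRIFECTA <= _granted(tools)


def missing_legs(tools: list[Tool]) -> set[Capability]:
    """Capabilities absent from the tool set; any missing leg breaks the trifecta."""
    return set(TRIFECTA - _granted(tools))


if __name__ == "__main__":
    inbox = Tool("read_inbox", frozenset({Capability.PRIVATE_DATA, Capability.UNTRUSTED_CONTENT}))
    fetch = Tool("fetch_url", frozenset({Capability.UNTRUSTED_CONTENT, Capability.EXTERNAL_COMMS}))
    print(has_lethal_trifecta([inbox, fetch]))  # True: all three legs present
    print(missing_legs([inbox]))                # only EXTERNAL_COMMS is missing
```

Note that a single tool such as an email inbox can supply two legs on its own, which is why the check runs over the combined tool set rather than tool by tool.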

GitHub Copilot CLI is now in public preview

GitHub Copilot CLI is now in public preview GitHub now have their own entry in the coding terminal CLI agent space: Copilot CLI. It's the same basic shape as Claude Code, Codex CLI, Gemini CLI and a growing number of other tools in this space. It's a terminal UI which you ac...

Improved Gemini 2.5 Flash and Flash-Lite

Improved Gemini 2.5 Flash and Flash-Lite Two new preview models from Google - updates to their fast and inexpensive Flash and Flash Lite families: The latest version of Gemini 2.5 Flash-Lite was trained and built based on three key themes: Better instruction following: Th...

Don't hide your best documentation

If you hide the system prompt and tool descriptions for your LLM agent, what you're actually doing is deliberately hiding the most useful documentation describing your service from your most sophisticated users!

Tags: ai-agents, llms, ai, generative-ai

Quoting Stanford CS221 Autumn 2025

[2 points] Learn basic NumPy operations with an AI tutor! Use an AI chatbot (e.g., ChatGPT, Claude, Gemini, or Stanford AI Playground) to teach yourself how to do basic vector and matrix operations in NumPy (import numpy as np). AI tutors have become exceptionally good at creating interactive tutorials, and this year in CS221, we're testing how they can help you learn fundamentals more interactively than traditional static exercises.

Stanford CS221 Autumn 2025, Problem 1: Linear Algebra

Tags: stanford, computer-science, education, ai, llms, python, numpy, generative-ai
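
For a sense of what "basic vector and matrix operations in NumPy" covers, here is a short sketch of the kind of material such a tutoring session might walk through. The choice of operations is an assumption for illustration, not taken from the assignment itself.

```python
import numpy as np

# Vectors: elementwise arithmetic, dot product and Euclidean norm.
u = np.array([1.0, 2.0, 3.0])
v = np.array([4.0, 5.0, 6.0])
elementwise_sum = u + v        # array([5., 7., 9.])
dot = u @ v                    # 1*4 + 2*5 + 3*6 = 32.0
norm = np.linalg.norm(u)       # sqrt(1 + 4 + 9) = sqrt(14)

# Matrices: matrix-vector product, matrix product with the transpose.
A = np.array([[1.0, 2.0], [3.0, 4.0]])
x = np.array([1.0, 1.0])
Ax = A @ x                     # array([3., 7.])
AAT = A @ A.T                  # [[5., 11.], [11., 25.]]

# Broadcasting: add a row vector to every row of a matrix.
shifted = A + np.array([10.0, 20.0])  # [[11., 22.], [13., 24.]]

# Solve the linear system A w = b.
b = np.array([5.0, 11.0])
w = np.linalg.solve(A, b)      # array([1., 2.]) since 1*1 + 2*2 = 5 and 3*1 + 4*2 = 11
```

Each commented result can be checked by hand, which is exactly the kind of step-by-step verification an interactive tutor can prompt for.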

Cross-Agent Privilege Escalation: When Agents Free Each Other

Cross-Agent Privilege Escalation: When Agents Free Each Other Here's a clever new form of AI exploit from Johann Rehberger, who has coined the term Cross-Agent Privilege Escalation to describe an attack where multiple coding agents - GitHub Copilot and Claude Code for exampl...

Qwen3-VL: Sharper Vision, Deeper Thought, Broader Action

Qwen3-VL: Sharper Vision, Deeper Thought, Broader Action I've been looking forward to this. Qwen 2.5 VL is one of the best available open weight vision LLMs, so I had high hopes for Qwen 3's vision models. Firstly, we are open-sourcing the flagship model of this series: Qwe...

GPT-5-Codex

GPT-5-Codex OpenAI half-released this model earlier this month, adding it to their Codex CLI tool but not their API. Today they've fixed that - the new model can now be accessed as gpt-5-codex. It's priced the same as regular GPT-5: $1.25/million input tokens, $10/million out...
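
At those rates, estimating the cost of a single call is simple arithmetic. A minimal sketch using the two per-million prices quoted above; `estimate_cost` and the example token counts are illustrative, and it ignores any discounts for cached input.

```python
# Prices quoted above for gpt-5-codex, in US dollars per million tokens.
INPUT_PRICE_PER_M = 1.25
OUTPUT_PRICE_PER_M = 10.00


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimated dollar cost of one call; reasoning tokens are billed as output tokens."""
    return (input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000


if __name__ == "__main__":
    print(f"${estimate_cost(50_000, 5_000):.4f}")
```

So a call with 50,000 input tokens and 5,000 output tokens works out to about $0.11, with the output side contributing nearly half despite being a tenth of the tokens.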

Why AI systems might never be secure

Why AI systems might never be secure The Economist have a new piece out about LLM security, with this headline and subtitle: Why AI systems might never be secure A “lethal trifecta” of conditions opens them to abuse I talked with their AI Writer Alex Hern for this piece. ...

Quoting Kate Niederhoffer, Gabriella Rosen Kellerman, Angela Lee, Alex Liebscher, Kristina Rapuano and Jeffrey T. Hancock

We define workslop as AI generated work content that masquerades as good work, but lacks the substance to meaningfully advance a given task. Here’s how this happens. As AI tools become more accessible, workers are increasingly able to quickly produce polished output: well-f...

Four new releases from Qwen

It's been an extremely busy day for team Qwen. Within the last 24 hours (all links to Twitter, which seems to be their preferred platform for these announcements): Qwen3-Next-80B-A3B-Instruct-FP8 and Qwen3-Next-80B-A3B-Thinking-FP8 - official FP8 quantized versions of thei...
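
FP8 quantization stores each weight in one byte rather than the two bytes used by BF16, which roughly halves the size of the weights. A back-of-the-envelope sketch, taking the "80B" in the model name at face value and ignoring overhead such as scaling factors or any layers kept at higher precision; `weight_size_gb` is an illustrative helper.

```python
def weight_size_gb(num_params: float, bits_per_param: int) -> float:
    """Approximate size of the raw weights in gigabytes (10**9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


PARAMS = 80e9  # the "80B" in Qwen3-Next-80B-A3B

if __name__ == "__main__":
    for label, bits in [("BF16", 16), ("FP8", 8)]:
        print(f"{label}: ~{weight_size_gb(PARAMS, bits):.0f} GB")
```

That puts the FP8 release at roughly 80 GB of weights versus roughly 160 GB at BF16, before any runtime memory for activations or the KV cache.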

CompileBench: Can AI Compile 22-year-old Code?

CompileBench: Can AI Compile 22-year-old Code? Interesting new LLM benchmark from Piotr Grabowski and Piotr Migdał: how well can different models handle compilation challenges such as cross-compiling gucr for ARM64 architecture? This is one of my favorite applications of cod...

ChatGPT Is Blowing Up Marriages as Spouses Use AI to Attack Their Partners

ChatGPT Is Blowing Up Marriages as Spouses Use AI to Attack Their Partners

Maggie Harrison Dupré for Futurism. It turns out having an always-available "marriage therapist" with a sycophantic instinct to always take your side is catastrophic for relationships.

The tension in the vehicle is palpable. The marriage has been on the rocks for months, and the wife in the passenger seat, who recently requested an official separation, has been asking her spouse not to fight with her in front of their kids. But as the family speeds down the roadway, the spouse in the driver’s seat pulls out a smartphone and starts quizzing ChatGPT’s Voice Mode about their relationship problems, feeding the chatbot leading prompts that result in the AI browbeating her wife in front of their preschool-aged children.

Tags: ai, generative-ai, chatgpt, llms, ai-ethics, ai-personality

Locally AI

Locally AI

Handy new iOS app by Adrien Grondin for running local LLMs on your phone. It just added support for the new iOS 26 Apple Foundation model, so you can install this app and instantly start a conversation with that model without any additional download.

The app can also run a variety of other models using MLX, including members of the Gemma, Llama 3.2 and Qwen families.

Tags: apple, ios, ai, generative-ai, local-llms, llms, mlx

llm-openrouter 0.5

llm-openrouter 0.5 New release of my LLM plugin for accessing models made available via OpenRouter. The release notes in full: Support for tool calling. Thanks, James Sanford. #43 Support for reasoning options, for example llm -m openrouter/openai/gpt-5 'prove dogs exist'...

Grok 4 Fast

Grok 4 Fast New hosted reasoning model from xAI that's designed to be fast and extremely competitive on price. It has a 2 million token context window and "was trained end-to-end with tool-use reinforcement learning". It's priced at $0.20/million input tokens and $0.50/milli...

Quoting Leaked Amazon memo

Amazonians, We've reviewed the Presidential Proclamation on H-1B visas that was released today and are actively working to gain greater clarity. Here's what you need to know right now: The proclamation creates a travel restriction starting September 21, 2025, at 12:01 a.m. ...

httpjail

httpjail Here's a promising new (experimental) project in the sandboxing space from Ammar Bandukwala at Coder. httpjail provides a Rust CLI tool for running an individual process against a custom configured HTTP proxy. The initial goal is to help run coding agents like Claud...
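
The core idea is that every HTTP request the jailed process makes passes through a proxy that decides whether to let it through. The sketch below illustrates only that decision step, as a host allowlist. It is not httpjail's rule syntax or API; `ALLOWED_HOSTS`, `ALLOWED_SUFFIXES` and `is_allowed` are hypothetical.

```python
from urllib.parse import urlparse

# Hypothetical policy: exact hosts, plus any subdomain of the listed suffixes.
ALLOWED_HOSTS = {"api.github.com", "pypi.org"}
ALLOWED_SUFFIXES = (".pypi.org",)  # leading dot: subdomains only, not lookalikes


def is_allowed(url: str) -> bool:
    """Decide whether a proxied request may proceed, based on scheme and host."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https"):
        return False
    host = (parsed.hostname or "").lower()
    return host in ALLOWED_HOSTS or host.endswith(ALLOWED_SUFFIXES)


if __name__ == "__main__":
    for url in ["https://files.pypi.org/x.whl", "https://attacker.example/?data=secret"]:
        print(url, "->", "allow" if is_allowed(url) else "block")
```

Matching suffixes with a leading dot is what stops a lookalike such as evil-pypi.org from slipping through, while still permitting real subdomains like files.pypi.org.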

The Hidden Risk in Notion 3.0 AI Agents: Web Search Tool Abuse for Data Exfiltration

The Hidden Risk in Notion 3.0 AI Agents: Web Search Tool Abuse for Data Exfiltration Abi Raghuram reports that Notion 3.0, released yesterday, introduces new prompt injection data exfiltration vulnerabilities thanks to enabling lethal trifecta attacks. Abi's attack involves ...

Magistral 1.2

Mistral quietly released two new models yesterday: Magistral Small 1.2 (Apache 2.0, 96.1 GB on Hugging Face) and Magistral Medium 1.2 (not open weights, same as Mistral's other "medium" models). Despite being described as "minor updates" to the Magistral 1.1 models, these hav...

Quoting Steve Jobs

Well, the types of computers we have today are tools. They’re responders: you ask a computer to do something and it will do it. The next stage is going to be computers as “agents.” In other words, it will be as if there’s a little person inside that box who starts to anticipate what you want. Rather than help you, it will start to guide you through large amounts of information. It will almost be like you have a little friend inside that box. I think the computer as an agent will start to mature in the late '80s, early '90s.

Steve Jobs, 1984 interview with Access Magazine (via)

Tags: agent-definitions, steve-jobs, computer-history

I think "agent" may finally have a widely enough agreed upon definition to be useful jargon now

I've noticed something interesting over the past few weeks: I've started using the term "agent" in conversations where I don't feel the need to then define it, roll my eyes or wrap it in scare quotes. This is a big piece of personal character development for me! Moving forwa...

Anthropic: A postmortem of three recent issues

Anthropic: A postmortem of three recent issues Anthropic had a very bad month in terms of model reliability: Between August and early September, three infrastructure bugs intermittently degraded Claude's response quality. We've now resolved these issues and want to explain ...

ICPC medals for OpenAI and Gemini

In July it was the International Math Olympiad (OpenAI, Gemini), today it's the International Collegiate Programming Contest (ICPC). Once again, both OpenAI and Gemini competed with models that achieved gold medal performance. OpenAI's Mostafa Rohaninejad: We received the p...

Announcing the 2025 PSF Board Election Results!

Announcing the 2025 PSF Board Election Results!

I'm happy to share that I've been re-elected for a second term on the board of directors of the Python Software Foundation.

Jannis Leidel was also re-elected and Abigail Dogbe and Sheena O’Connell will be joining the board for the first time.

Tags: python, psf