Django 6.0 released
Django 6.0 includes a flurry of neat features, but the two that most caught my eye are background workers and template partials.
Background workers started out as DEP (Django Enhancement Proposal) 14, proposed and shepherded by Jake Howard. Jake prototype...
I take tap dance evening classes at the College of San Mateo community college. A neat bonus of this is that I'm now officially a student of that college, which gives me access to their library... including the ability to send text messages to the librarians asking for help ...
Since the beginning of the project in 2023 and the private beta days of Ghostty, I've repeatedly expressed my intention that Ghostty legally become a non-profit. [...]
I want to squelch any possible concerns about a "rug pull". A non-profit structure provides enforceable as...
TIL: Dependency groups and uv run
I wrote up the new pattern I'm using for my various Python project repos to make them as easy to hack on with uv as possible. The trick is to use a PEP 735 dependency group called dev, declared in pyproject.toml like this:
[dependency-groups...
Anthropic acquires Bun
Anthropic just acquired the company behind the Bun JavaScript runtime, which they adopted for Claude Code just in July. Their announcement includes an impressive revenue update on Claude Code:
In November, Claude Code achieved a significant milestone:...
Introducing Mistral 3
Four new models from Mistral today: three in their "Ministral" smaller model series (14B, 8B, and 3B) and a new Mistral Large 3 MoE model with 675B parameters, 41B active.
All of the models are vision capable, and they are all released under an Apache 2...
Claude 4.5 Opus' Soul Document
Richard Weiss managed to get Claude 4.5 Opus to spit out this 14,000 token document which Claude called the "Soul overview". Richard says:
While extracting Claude 4.5 Opus' system message on its release date, as one does, I noticed an interest...
DeepSeek-V3.2
Two new open weight (MIT licensed) models from DeepSeek today: DeepSeek-V3.2 and DeepSeek-V3.2-Speciale, both 690GB, 685B parameters. Here's the PDF tech report.
DeepSeek-V3.2 is DeepSeek's new flagship model, now running on chat.deepseek.com.
The difference be...
More than half of the teens surveyed believe journalists regularly engage in unethical behaviors like making up details or quotes in stories, paying sources, taking visual images out of context or doing favors for advertisers. Less than a third believe reporters correct their errors, confirm facts before reporting them, gather information from multiple sources or cover stories in the public interest — practices ingrained in the DNA of reputable journalists.
— David Bauder, AP News, A lost generation of news consumers? Survey shows how teenagers dislike the news media
YouTube embeds fail with a 153 error
I just fixed this bug on my blog. I was getting an annoying "Error 153: Video player configuration error" on some of the YouTube video embeds (like this one) on this site. After some digging it turns out the culprit was this HTTP header, ...
I am increasingly worried about AI in the video game space in general. [...] I'm not sure that the CEOs and the people making the decisions at these sorts of companies understand the difference between actual content and slop. [...]
It's exactly the same cryolab, it's exact...
It's ChatGPT's third birthday today.
It's fun looking back at Sam Altman's low key announcement thread from November 30th 2022:
today we launched ChatGPT. try talking with it here:
chat.openai.com
language interfaces are going to be a big deal, i think. talk to the compute...
The most annoying problem is that the [GitHub] frontend barely works without JavaScript, so we cannot open issues, pull requests, source code or CI logs in Dillo itself, despite them being mostly plain HTML, which I don't think is acceptable. In the past, it used to gracefully degrade without enforcing JavaScript, but now it doesn't.
Matt Webb coins the term context plumbing to describe the kind of engineering needed to feed agents the right context at the right time:
Context appears at disparate sources, by user activity or changes in the user’s environment: what they’re working on changes, emails appear, documents are edited, it’s no longer sunny outside, the available tools have been updated.
This context is not always where the AI runs (and the AI runs as closer as possible to the point of user intent).
So the job of making an agent run really well is to move the context to where it needs to be. [...]
So I’ve been thinking of AI system technical architecture as plumbing the sources and sinks of context.
Large language models (LLMs) can be useful tools, but they are not good at creating entirely new Wikipedia articles. Large language models should not be used to generate new Wikipedia articles from scratch.
In June 2025 Sam Altman claimed about ChatGPT that "the average query uses about 0.34 watt-hours".
In March 2020 George Kamiya of the International Energy Agency estimated that "streaming a Netflix video in 2019 typically consumed 0.12-0.24kWh of electricity per hour" - that...
I've been having a lot of fun hacking on my Bluesky Thread Viewer JavaScript tool with Claude Code recently. Here it renders a thread (complete with demo video) talking about the latest improvements to the tool itself.
I've been mostly vibe-coding this thing since April, now spanning 15 commits with contributions from ChatGPT, Claude, Claude Code for Web and Claude Code on my laptop. Each of those commits links to the transcript that created the changes in the commit.
Bluesky is a lot of fun to build tools like this against because the API supports CORS (so you can talk to it from an HTML+JavaScript page hosted anywhere) and doesn't require authentication.
To evaluate the model’s capability in processing long-context inputs, we construct a video “Needle-in-
a-Haystack” evaluation on Qwen3-VL-235B-A22B-Instruct. In this task, a semantically salient “needle”
frame—containing critical visual evidence—is inserted at varying tempo...
New on Hugging Face, a specialist mathematical reasoning LLM from DeepSeek. This is their entry in the space previously dominated by proprietary models from OpenAI and Google DeepMind, both of which achieved gold medal scores on the International Mathematical Olympiad earlier this year.
We now have an open weights (Apache 2 licensed) 685B, 689GB model that can achieve the same. From the accompanying paper:
DeepSeekMath-V2 demonstrates strong performance on competition mathematics. With scaled test-time compute, it achieved gold-medal scores in high-school competitions including IMO 2025 and CMO 2024, and a near-perfect score on the undergraduate Putnam 2024 competition.
I talked with CL Kao and Dori Wilson for an episode of their new Data Renegades podcast titled Data Journalism Unleashed with Simon Willison.
I fed the transcript into Claude Opus 4.5 to extract this list of topics with timestamps and illustrative quotes. It did such a good ...
Google Antigravity Exfiltrates Data
PromptArmor demonstrate a concerning prompt injection chain in Google's new Antigravity IDE:
In this attack chain, we illustrate that a poisoned web source (an integration guide) can manipulate Gemini into (a) collecting sensitive credent...
Constant-time support lands in LLVM: Protecting cryptographic code at the compiler level
Substantial LLVM contribution from Trail of Bits. Timing attacks against cryptography algorithms are a gnarly problem: if an attacker can precisely time a cryptographic algorithm they ca...
This took longer to release than I had hoped because it was blocked on Anthropic shipping 0.75.0 of their Python library with support for thinking effort.
LLM SVG Generation Benchmark
Here's a delightful project by Tom Gally, inspired by my pelican SVG benchmark. He asked Claude to help create more prompts of the form Generate an SVG of [A] [doing] [B] and then ran 30 creative prompts against 9 frontier models - prompts like "an octopus operating a pipe organ" or "a starfish driving a bulldozer".
Here are some for "butterfly inspecting a steam engine":
And for "sloth steering an excavator":
It's worth browsing the whole collection, which gives a really good overall indication of which models are the best at SVG art.
Via tkgally on Hacker News
Tags: benchmarks, svg, ai, generative-ai, llms, evals, pelican-riding-a-bicycle, tom-gally
If the person is unnecessarily rude, mean, or insulting to Claude, Claude doesn't need to apologize and can insist on kindness and dignity from the person it’s talking with. Even if someone is frustrated or unhappy, Claude is deserving of respectful engagement.
Anthropic released Claude Opus 4.5 this morning, which they call "best model in the world for coding, agents, and computer use". This is their attempt to retake the crown for best coding model after significant challenges from OpenAI's GPT-5.1-Codex-Max and Google's Gemini 3...
sqlite-utils 3.39
I got a report of a bug in sqlite-utils concerning plugin installation - if you installed the package using uv tool install further attempts to install plugins with sqlite-utils install X would fail, because uv doesn't bundle pip by default. I had the same ...
I released a new alpha version of sqlite-utils last night - the 128th release of that package since I started building it back in 2018.
sqlite-utils is two things in one package: a Python library for conveniently creating and manipulating SQLite databases and a CLI tool for ...
"Good engineering management" is a fad
Will Larson argues that the technology industry's idea of what makes a good engineering manager changes over time based on industry realities. ZIRP hypergrowth has been exchanged for a more cautious approach today, and expectations of m...