Solomon Hykes just presented the best definition of an AI agent I've seen yet, on stage at the AI Engineer World's Fair:
An AI agent is an LLM wrecking its environment in a loop.
I collect AI agent definitions and I really like this how this one combines the currently popular "tools in a loop" one (see Anthropic) with the classic academic definition that I think dates back to at least the 90s:
An agent is something that acts in an environment; it does something. Agents include worms, dogs, thermostats, airplanes, robots, humans, companies, and countries.
OpenAI slams court order to save all ChatGPT logs, including deleted chats
This is very worrying. The New York Times v OpenAI lawsuit, now in its 17th month, includes accusations that OpenAI's models can output verbatim copies of New York Times content - both from training d...
Guilherme Rambo reports on a weird iOS messages bug:
The bug is that, if you try to send an audio message using the Messages app to someone who’s also using the Messages app, and that message happens to include the name “Dave and Buster’s”, the message will never be received.
Guilherme captured the logs from an affected device and spotted an XHTMLParseFailure error.
It turned out the iOS automatic transcription mechanism was recognizing the brand name and converting it to the official restaurant chain's preferred spelling "Dave & Buster’s"... which was then incorrectly escaped and triggered a parse error!
PR #537: Fix Markdown in og descriptions
Since OpenAI Codex is now available to us ChatGPT Plus subscribers I decided to try it out against my blog.
It's a very nice implementation of the GitHub-connected coding "agent" pattern, as also seen in Google's Jules and Microsoft's...
Codex agent internet access
Sam Altman, just now:
codex gets access to the internet today! it is off by default and there are complex tradeoffs; people should read about the risks carefully and use when it makes sense.
This is the Codex "cloud-based software engineering ag...
We're hosting the sixth in our series of Datasette Public Office Hours livestream sessions this Friday, 6th of June at 2pm PST (here's that time in your location).
The topic is going to be tool support in LLM, as introduced here.
I'll be walking through the new features, and we're also inviting five minute lightning demos from community members who are doing fun things with the new capabilities. If you'd like to present one of those please get in touch via this form.
Back in March New Scientist reported on a successful Freedom of Information request they had filed requesting UK Secretary of State for Science, Innovation and Technology Peter Kyle's ChatGPT logs:
New Scientist has obtained records of Kyle’s ChatGPT use under the Freedom o...
Anthony Lewis published this neat, concise tutorial on using my LLM tool to run local models on your own machine, using llm-mlx.
An under-appreciated way to contribute to open source projects is to publish unofficial guides like this one. Always brightens my day when something like this shows up.
By making effort an optional factor in higher education rather than the whole point of it, LLMs risk producing a generation of students who have simply never experienced the feeling of focused intellectual work. Students who have never faced writer's block are also students who have never experienced the blissful flow state that comes when you break through writer's block. Students who have never searched fruitlessly in a library for hours are also students who, in a fundamental and distressing way, simply don't know what a library is even for.
— Benjamin Breen, AI makes the humanities more important, but also a lot weirder
Shisa V2 405B: Japan’s Highest Performing LLM
Leonard Lin and Adam Lensenmayer have been working on Shisa for a while. They describe their latest release as "Japan's Highest Performing LLM".
Shisa V2 405B is the highest-performing LLM ever developed in Japan, and surpasses ...
My AI Skeptic Friends Are All Nuts
Thomas Ptacek's frustrated tone throughout this piece perfectly captures how it feels sometimes to be an experienced programmer trying to argue that "LLMs are actually really useful" in many corners of the internet.
Some of the smartest pe...
Tom MacWright does some archaeology and describes the three different magic comment formats that can affect how JavaScript/TypeScript files are processed:
"a directive"; is a directive prologue, most commonly seen with "use strict";.
/** @aPragma */ is a pragma for a transpiler, often used for /** @jsx h */.
//# aMagicComment is usually used for source maps - //# sourceMappingURL=<url> - but also just got used by v8 for their new explicit compile hints feature.
It took me a few days to build the library [cloudflare/workers-oauth-provider] with AI.
I estimate it would have taken a few weeks, maybe months to write by hand.
That said, this is a pretty ideal use case: implementing a well-known standard on a well-known platform with a ...
claude-trace
I've been thinking for a while it would be interesting to run some kind of HTTP proxy against the Claude Code CLI app and take a peek at how it works.
Mario Zechner just published a really nice version of that. It works by monkey-patching global.fetch and the No...
My constant struggle is how to convince them that getting an education in the humanities is not about regurgitating ideas/knowledge that already exist. It’s about generating new knowledge, striving for creative insights, and having thoughts that haven’t been had before. I don’t want you to learn facts. I want you to think. To notice. To question. To reconsider. To challenge. Students don’t yet get that ChatGPT only rearranges preexisting ideas, whether they are accurate or not.
And even if the information was guaranteed to be accurate, they’re not learning anything by plugging a prompt in and turning in the resulting paper. They’ve bypassed the entire process of learning.
Progressive JSON
This post by Dan Abramov is a trap! It proposes a fascinating way of streaming JSON objects to a client in a way that provides the shape of the JSON before the stream has completed, then fills in the gaps as more data arrives... and then turns out to be a sn...
A fun new benchmark just dropped! Inspired by the Claude 4 system card - which showed that Claude 4 might just rat you out to the authorities if you told it to "take initiative" in enforcing its morals values while exposing it to evidence of malfeasance - Theo Browne built a...
deepseek-ai/DeepSeek-R1-0528
Sadly the trend for terrible naming of models has infested the Chinese AI labs as well.
DeepSeek-R1-0528 is a brand new and much improved open weights reasoning model from DeepSeek, a major step up from the DeepSeek R1 they released back in Janua...
If you've found web development frustrating over the past 5-10 years, here's something that has worked worked great for me: give yourself permission to avoid any form of frontend build system (so no npm / React / TypeScript / JSX / Babel / Vite / Tailwind etc) and code in HTML and JavaScript like it's 2009.
The joy came flooding back to me! It turns out browser APIs are really good now.
You don't even need jQuery to paper over the gaps any more - use document.querySelectorAll() and fetch() directly and see how much value you can build with a few dozen lines of code.
There's a new kind of coding I call "hype coding" where you fully give into the hype, and what's coming right around the corner, that you lose sight of whats' possible today. Everything is changing so fast that nobody has time to learn any tool, but we should aim to use as m...
Using voice mode on Claude Mobile Apps
Anthropic are rolling out voice mode for the Claude apps at the moment. Sadly I don't have access yet - I'm looking forward to this a lot, I frequently use ChatGPT's voice mode when walking the dog and it's a great way to satisfy my cur...
I was interviewed by News Nation's Natasha Zouves about the very complicated topic of how we should think about AI in terms of threatening our jobs and careers. I previously talked with Natasha two years ago about Microsoft Bing.
I'll be honest: I was nervous about this one...
Saying Bye to Glitch
Pirijan, co-creator of Glitch - who stopped working on it six years ago, so has the benefit of distance:
Here lies Glitch, a place on the web you could go to write up a website or a node.js server that would be hosted and updated as you type. 🥀 RIP 2015...
llm-github-models 0.15
Anthony Shaw's llm-github-models plugin just got an upgrade: it now supports LLM 0.26 tool use for a subset of the models hosted on the GitHub Models API, contributed by Caleb Brose.
The neat thing about this GitHub Models plugin is that it picks up an...
llm-mistral 0.14
I added tool-support to my plugin for accessing the Mistral API from LLM today, plus support for Mistral's new Codestral Embed embedding model.
An interesting challenge here is that I'm not using an official client library for llm-mistral - I rolled my own c...
llm-tools-exa
When I shipped LLM 0.26 yesterday one of the things I was most excited about was seeing what new tool plugins people would build for it.
Dan Turkel's llm-tools-exa is one of the first. It adds web search to LLM using Exa (previously), a relatively new search en...
I wonder if one of the reasons I'm finding LLMs so much more useful for coding than a lot of people that I see in online discussions is that effectively all of the code I work on has automated tests.
I've been trying to stay true to the idea of a Perfect Commit - one that bu...
Codestral Embed
Brand new embedding model from Mistral, specifically trained for code. Mistral claim that:
Codestral Embed significantly outperforms leading code embedders in the market today: Voyage Code 3, Cohere Embed v4.0 and OpenAI’s large embedding model.
The model i...