Here's a quick demo of the kind of casual things I use LLMs for on a daily basis.
I just found out that Perplexity offer their Deep Research feature via their API, through a model called Sonar Deep Research.
Their documentation includes an example response, which included th...
AI Hallucination Cases
Damien Charlotin maintains this database of cases around the world where a legal decision has been made that confirms hallucinated content from generative AI was presented by a lawyer.
That's an important distinction: this isn't just cases where AI may...
GitHub issues is almost the best notebook in the world.
Free and unlimited, for both public and private notes.
Comprehensive Markdown support, including syntax highlighting for almost any language. Plus you can drag and drop images or videos directly onto a note.
It has fant...
GitHub Issues search now supports nested queries and boolean operators: Here’s how we (re)built it
GitHub Issues got a significant search upgrade back in January. Deborah Digges provides some behind the scene details about how it works and how they rolled it out.
The signatu...
Luis von Ahn on LinkedIn
Last month's Duolingo memo about becoming an "AI-first" company has seen significant backlash, particularly on TikTok. I've had trouble figuring out how much of this is a real threat to their business as opposed to protests from a loud minority, but ...
CSS Minecraft
Incredible project by Benjamin Aster:
There is no JavaScript on this page. All the logic is made 100% with pure HTML & CSS. For the best performance, please close other tabs and running programs.
The page implements a full Minecraft-style world editor: yo...
GitHub MCP Exploited: Accessing private repositories via MCP
GitHub's official MCP server grants LLMs a whole host of new abilities, including being able to read and issues in repositories the user has access to and submit new pull requests.
This is the lethal trifecta for p...
Build AI agents with the Mistral Agents API
Big upgrade to Mistral's API this morning: they've announced a new "Agents API". Mistral have been using the term "agents" for a while now. Here's how they describe them:
AI agents are autonomous systems powered by large language ...
LLM 0.26 is out with the biggest new feature since I started the project: support for tools. You can now use the LLM CLI tool - and Python library - to grant LLMs from OpenAI, Anthropic, Gemini and local models from Ollama with access to any tool that you can represent as a ...
At Amazon, Some Coders Say Their Jobs Have Begun to Resemble Warehouse Work
I got a couple of quotes in this NYTimes story about internal resistance to Amazon's policy to encourage employees to make use of more generative AI:
“It’s more fun to write code than to read code,”...
llm-llama-server 0.2
Here's a second option for using LLM's new tool support against local models (the first was via llm-ollama).
It turns out the llama.cpp ecosystem has pretty robust OpenAI-compatible tool support already, so my llm-llama-server plugin only needed a quick ...
Anthropic publish most of the system prompts for their chat models as part of their release notes. They recently shared the new prompts for both Claude Opus 4 and Claude Sonnet 4. I enjoyed digging through the prompts, since they act as a sort of unofficial manual for how be...
Subscribe to my sponsors-only monthly newsletter
I’ve never liked the idea of charging for my content. I get enormous value from putting all of my writing and research out there for free.
So I’m trying something a little different: pay me to send you less.
I’m starting a sp...
System Card: Claude Opus 4 & Claude Sonnet 4
Direct link to a PDF on Anthropic's CDN because they don't appear to have a landing page anywhere for this document.
Anthropic's system cards are always worth a look, and this one for the new Opus 4 and Sonnet 4 has some parti...
How I used o3 to find CVE-2025-37899, a remote zeroday vulnerability in the Linux kernel’s SMB implementation
Sean Heelan:
The vulnerability [o3] found is CVE-2025-37899 (fix here), a use-after-free in the handler for the SMB 'logoff' command. Understanding the vulnerabilit...
f2
Really neat CLI tool for bulk renaming of files and directories by Ayooluwa Isaiah, written in Go and designed to work cross-platform.
There's a lot of great design in this. Basic usage is intuitive - here's how to rename all .svg files to .tmp.svg in the current director...
I'm helping make some changes to a large, complex and very unfamiliar to me WordPress site. It's a perfect opportunity to try out Claude Code running against the new Claude 4 models.
It's going extremely well. So far Claude has helped get MySQL working on an older laptop (fixing some inscrutable Homebrew errors), disabled a CAPTCHA plugin that didn't work on localhost, toggled visible warnings on and off several times and figured out which CSS file to modify in the theme that the site is using. It even took a reasonable stab at making the site responsive on mobile!
I'm now calling Claude Code honey badger on account of its voracious appetite for crunching through code (and tokens) looking for the right thing to fix.
Remote Prompt Injection in GitLab Duo Leads to Source Code Theft
Yet another example of the classic Markdown image exfiltration attack, this time affecting GitLab Duo - GitLab's chatbot.
Omer Mayraz reports on how they found and disclosed the issue. The first part of this is...
I was going slightly spare at the fact that every talk at this Anthropic developer conference has used the word "agents" dozens of times, but nobody ever stopped to provide a useful definition.
I'm now in the "Prompting for Agents" workshop and Anthropic's Hannah Moran finally broke the trend by saying that at Anthropic:
Agents are models using tools in a loop
I can live with that! I'm glad someone finally said it out loud.
Updated Anthropic model comparison table
A few details in here about Claude 4 that I hadn't spotted elsewhere:
The training cut-off date for Claude Opus 4 and Claude Sonnet 4 is March 2025! That's the most recent cut-off for any of the current popular models, really impress...
llm-anthropic 0.16
New release of my LLM plugin for Anthropic adding the new Claude 4 Opus and Sonnet models.
You can see pelicans on bicycles generated using the new plugin at the bottom of my live blog covering the release.
I also released llm-anthropic 0.16a1 which works ...
If your library doesn't have any documentation, it can't have any bugs.
Documentation specifies what your code is supposed to do. Your tests specify what it actually does.
Bugs exist when your test-enforced implementation fails to match the behavior described in your documentation. Without documentation a bug is just undefined behavior.
If you aim to follow semantic versioning you bump your major version when you release a backwards incompatible change. Such changes cannot exist if your code is not comprehensively documented!
Inspired by a half-remembered conversation I had with Tom Insam many years ago.
Devstral
New Apache 2.0 licensed LLM release from Mistral, this time specifically trained for code.
Devstral achieves a score of 46.8% on SWE-Bench Verified, outperforming prior open-source SoTA models by more than 6% points. When evaluated under the same test scaffold (Ope...
Gemini Diffusion
Another of the announcements from Google I/O yesterday was Gemini Diffusion, Google's first LLM to use diffusion (similar to image models like Imagen and Stable Diffusion) in place of transformers.
Google describe it like this:
Traditional autoregressive la...
Chicago Sun-Times Prints AI-Generated Summer Reading List With Books That Don't Exist
Classic slop: it listed real authors with entirely fake books.
There's an important follow-up from 404 Media in their subsequent story:
Victor Lim, the vice president of marketing and comm...
Last month ChatGPT got a major upgrade. As far as I can tell the closest to an official announcement was this tweet from @OpenAI:
Starting today [April 10th 2025], memory in ChatGPT can now reference all of your past chats to provide more personalized responses, drawing on ...
We did the math on AI’s energy footprint. Here’s the story you haven’t heard.
James O'Donnell and Casey Crownhart try to pull together a detailed account of AI energy usage for MIT Technology Review.
They quickly run into the same roadblock faced by everyone else who's tried...
Gemini 2.5: Our most intelligent models are getting even better
A bunch of new Gemini 2.5 announcements at Google I/O today.
2.5 Flash and 2.5 Pro are both getting audio output (previously previewed in Gemini 2.0) and 2.5 Pro is getting an enhanced reasoning mode called "Dee...