Snake-in-the-Box Problem
Jules, our asynchronous coding agent, is now available for everyone
I wrote about the Jules beta back in May. Google's version of the OpenAI Codex PR-submitting hosted coding tool graduated from beta today. I'm mainly linking to this now because I like the new term they are using in this blog entry: Asynchronous coding agent. I like it so much I gave it a tag.
I continue to avoid the term "agent" as infuriatingly vague, but I can grudgingly accept it when accompanied by a prefix that clarifies the type of agent we are talking about. "Asynchronous coding agent" feels just about obvious enough to me to be useful.
Via Hacker News
Tags: google, ai, generative-ai, llms, ai-assisted-programming, gemini, agent-definitions, asynchronous-coding-agents
From restitution to confronting authoritarian regimes, here are five ways museums can be more ethical. This list is from Gareth Harris, author of the book Towards the Ethical Art Museum.
“As a clinical psychologist, I was curious: Could ChatGPT function like a thinking partner? A therapist in miniature? I gave it three months to test the idea. A year later, I’m still using ChatGPT like an interactive journal.”
Qwen3-4B Instruct and Thinking
Yet another interesting model from Qwen: these are tiny compared to their other recent releases (just 4B parameters, 7.5GB on Hugging Face and even smaller when quantized) but with a 262,144 context length, which Qwen suggest is essential for all of those thinking tokens. The new model somehow beats the significantly larger Qwen3-30B-A3B Thinking on the AIME25 and HMMT25 benchmarks, according to Qwen’s self-reported scores.
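A rough back-of-the-envelope check on that 7.5GB figure, as a minimal sketch. It assumes exactly 4 billion parameters stored as 16-bit weights and ignores file metadata and any per-block overhead that real quantization formats add, so the quantized numbers are only approximate. The helper name `weights_size_gib` is made up for illustration.

```python
def weights_size_gib(params: float, bits_per_param: float) -> float:
    """Approximate size of the raw weights in GiB (2**30 bytes)."""
    return params * bits_per_param / 8 / 2**30


# Assumption: a round 4 billion parameters, not the model's exact count.
PARAMS = 4_000_000_000

for bits in (16, 8, 4):
    size = weights_size_gib(PARAMS, bits)
    print(f"{bits:>2}-bit weights: ~{size:.2f} GiB")
```

Under those assumptions the 16-bit case comes out at roughly 7.45 GiB, which lines up with the size quoted above, and each halving of the bit width halves the estimate.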
The easiest way to try it on a Mac is via LM Studio, who already have their own MLX quantized versions out in
Song Exploder
• Hrishikesh Hirway
Fall Out Boy is a band from Chicago that formed in 2001. Their first album, Take This To Your Grave, was a hit, especially in the punk rock world. When they put out their second album, though, in 2005, that was on a whole other scale. That album is called From Under the Cork Tree, and it went double platinum, and they were nominated for a Grammy for Best New Artist. For this episode, I talked to the band’s singer, Patrick Stump, about how they made their breakout hit from that album, the song “Sugar, We’re Goin Down.”
For more info, visit songexploder.net/fall-out-boy.
gpt-oss-120b is the most intelligent American open weights model, comes behind DeepSeek R1 and Qwen3 235B in intelligence but offers efficiency benefits [...]
We’re seeing the 120B beat o3-mini but come in behind o4-mini and o3. The 120B is the most intelligent model that can be run on a single H100 and the 20B is the most intelligent model that can be run on a consumer GPU. [...]
While the larger gpt-oss-120b does not come in above DeepSeek R1 0528’s score of 59 or Qwen3 235B 2507s score of 64, it is notable that it is significantly smaller in both total and active parameters than both of those models.
— Artificial Analysis, see also their updated leaderboard
Tags: evals, openai, deepseek, ai, qwen, llms, gpt-oss, generative-ai
arstechnica.com/tech-policy/2025/07/meta-pirated-and-seeded-porn-for-years-to-train-ai-lawsuit-says/
theverge.com/news/718191/google-apple-intelligence-dunk-pixel-10-ad
Tom Warren:
In a new Pixel 10 ad, Google dunks on Apple’s failed promise of Siri AI improvements, with a narrator that suggests you could “just change your phone” if you bought “a new phone because of a feature that’s coming soon, but it’s been coming soon for a full year.”
The 30-second spot appeared on YouTube and X today, teasing the launch of Google’s new Pixel 10 devices on August 20th.
The whole Siri/Apple Intelligence thing has been an enormous self-inflicted embarrassment, but when it comes to Pixel phones, all I can think of is that Mad Men “I don’t think about you at all” GIF.
Link: theverge.com/news/718191/google-apple-intelligence-dunk…
macrumors.com/2025/06/20/apple-discussing-perplexity-ai-bid/