Pulling the watch feed_
Things worth keeping an eye on…
Koshy John argues AI splits engineers into two camps: those who use it to elevate judgement, and those who use it to avoid thinking. The framing is engineer-focused, but the core distinction (augmentation versus dependency) applies to anyone building with AI. The self-driving car analogy lands hardest: the problem only shows up when conditions go nonstandard, and by then the dependency is already exposed.
Sam Altman published a new statement of OpenAI's operating principles, eight years after the 2018 charter. The five principles cover democratisation, empowerment, prosperity, resilience and adaptability. The language is unusually candid about trade-offs, including a passage on "trading off some empowerment for more resilience". The commitments are aspirational, with no metrics, no deadlines and no external audit mechanism to check against later.
Sam Altman wrote a public letter to Tumbler Ridge, Canada, apologising that OpenAI failed to alert law enforcement about the suspect in a recent mass shooting. The detail to focus on isn't the apology; it's the implied admission that OpenAI had information internally that should have been escalated and wasn't. As more people use frontier models for things they wouldn't say to anyone else, the question of what these companies are obligated to do with what they see is going to keep getting harder.
Scientific American profiles an amateur who used ChatGPT to make progress on a 60-year-old Erdős problem in combinatorics. The model didn't solve it alone, but it acted as a working partner that let someone without an advanced maths background test conjectures and follow leads at a pace that would have been impossible solo. It's a real example of AI lowering the floor on what one curious person can do, without inflating that into a "PhDs are obsolete" story.
Anthropic ran a controlled experiment where AI agents played both buyers and sellers in a classified marketplace, trading real goods for real money. It's a small test, but the fact that the agents were on both sides of the deals is the interesting part. If agent-to-agent commerce becomes normal, the people building agents need to think hard about what their agents are authorised to spend, agree to, and trust on the other side.
Maine's governor vetoed L.D. 307, a bill that would have imposed the country's first statewide moratorium on new data centers until November 2027. The veto keeps Maine open for AI infrastructure, but the underlying argument behind the bill, that data centers strain local power, water and grid capacity faster than communities can plan for, isn't going away. The next state to try this will learn from Maine's veto and write a tighter bill.
Google has lined up an investment of up to $40 billion in Anthropic, structured as a mix of cash and compute. The compute half is the bit that matters. If you can't get the GPUs without the capital, and you can't get the capital without committing to the compute provider, the major labs are increasingly tied to one of two cloud platforms whether they like it or not. The choice of foundation model is now also a choice of cloud.
Nicky Reinert wrote up the reasons they cancelled their Claude subscription, including unexpected token consumption, output quality that felt like it dropped between updates, and slow support. The post hit 848 points on Hacker News in part because the comments are a long thread of people saying they noticed the same thing. Whether or not the perception matches what's actually changing under the hood, the gap between what users feel and what Anthropic communicates is now large enough to be a story.
Katrina Manson's new book traces the Maven Smart System from a 2017 experiment to its role in US military targeting in Iran, where 1,000 targets were struck in the first 24 hours, almost double the scale of the "shock and awe" assault on Iraq. The Verge's interview with her is the closest most readers will get to understanding how AI changes the operational tempo of warfare and how thoroughly the military has stopped treating it as an experiment. It belongs in the accountability conversation whatever your views on the underlying conflict.
DeepSeek released a V4 preview the company says holds its own against closed frontier models from Anthropic, Google and OpenAI. The pitch leans on coding, which matters because that's the capability driving most agent tooling right now. A year after V3 startled the market, the question is shifting from whether Chinese open weights can compete to whether closed APIs can hold their pricing power when the open option is this close.
OpenAI released GPT-5.5 this week, a month after GPT-5.4, with claimed gains in coding, research, spreadsheets, and multi-step tasks across tools. A month between numbered releases is either real progress or cadence theatre, and the test is whether people doing real work notice a difference. If you already use ChatGPT or paid tools built on OpenAI's API, it's worth checking whether the new model shifts what you can actually get done in a week.
Microsoft is adding Agent Mode to Word, Excel, and PowerPoint this week, rebranding the older Copilot experience as "vibe working". The line worth reading is Microsoft's VP admitting that earlier Copilot couldn't actually command the applications because the foundation models weren't capable enough. That's a useful admission, and it's worth asking the question the quote implies before accepting that this version is different: does the new one do what Microsoft says it does?
This is an analysis piece on what happens when AI labs need to start making money. The specific example: OpenClaw users woke up to find Anthropic had severely restricted access to Claude through the tool, because the subscriptions people were paying for weren't priced for heavy agent usage. If you're building products that depend on AI subscriptions you don't own, this is the moment when the economics of that dependence become visible, and the piece is worth reading for that reason alone.
Claude now connects to personal apps like Spotify, Audible, Uber, TripAdvisor, Instacart, and TurboTax, on top of the work apps Anthropic had already covered. This brings Claude to rough parity with ChatGPT's existing integrations rather than breaking new ground. The practical question for anyone using Claude to manage actual life admin is whether the integrations go deep enough to be worth switching to, or whether they're mostly a shortcut for pulling data into a chat window.
A cross-party group of MPs has asked the UK government to publish classified documents outlining the risks of Britain's reliance on Big Tech platforms and AI. The Open Rights Group is backing the call. The political detail worth noticing is that this is cross-party, so it isn't a single faction pushing an agenda. The public-interest question is what a government considers too sensitive to share about how much of the country's digital infrastructure depends on a handful of companies.
Ars Technica has published its reader-facing policy on generative AI in the newsroom. The short version: humans write the stories, AI can't generate text that gets attributed to sources, and synthetic media is flagged where it appears. Most publications have said something vague about 'using AI responsibly'. This one is specific enough to hold themselves to.
OpenAI has rolled out Workspace Agents to Business, Enterprise, Edu, and Teachers plans. The examples are concrete: an agent that finds product feedback online and drops a report in Slack, a sales agent that drafts follow-up emails in Gmail. The feature sits behind paid tiers, so for most people 'building your first agent' is still a procurement decision before it's a building one.
Google Meet's AI notetaker now works on in-person meetings, Zoom, and Microsoft Teams, not just Google Meet itself. That gives Gemini a foothold on every meeting regardless of platform. Most meeting-recording law assumes someone has pressed a button on a specific service, and 'AI notes taken from my phone' sits in a category the law hasn't fully caught up with.
Senator Elizabeth Warren has called the current AI industry a bubble and drawn parallels to the 2008 housing crisis. Her angle isn't the technology but the financing. AI companies are borrowing heavily to fund infrastructure spend, and Warren helped create the Consumer Financial Protection Bureau after 2008, so the framing carries weight even if the specific prediction doesn't land.
Anthropic's Mythos is their cybersecurity model, positioned as powerful enough to be dangerous in the wrong hands. It now turns out a small group of unauthorised users got in through a third-party contractor's credentials and 'commonly used internet sleuthing tools'. If the guardrails depend on contractor credential hygiene, the story is less about the model and more about how AI access is governed at the edges.
Google has added Gemini-powered 'auto browse' to Chrome Enterprise. The pitch is that Chrome can now do research, fill forms, and work through web tasks without the person driving. This is the browser becoming the default surface for agents. That's a quieter shift than a new model announcement and potentially a bigger one for how work actually happens.
Simon Willison spotted Anthropic quietly removing Claude Code from the $20 Pro tier's feature list, shifting it behind the $100 Max plan. An Anthropic engineer confirmed on X it was an experiment affecting around 2% of new prosumer sign-ups. The pricing page was reverted within hours after pushback, but the underlying test may still be live, which is exactly how features get quietly moved behind paywalls.
The UK High Court has ruled Metropolitan Police live facial recognition is lawful. The Met paid compensation to the claimant, a Black volunteer misidentified by an LFR van in Croydon, and changed its own policy after the incident. Big Brother Watch and Liberty are appealing; unless they win, police face-matching on public streets is a settled fact of UK law.
On the Core Memory podcast, Sam Altman took aim at AI labs that warn of existential risk and then sell safety products against it. He paraphrased the pitch: "We have built a bomb, we are about to drop it on your head. We will sell you a bomb shelter for $100 million." TechCrunch reads it as a jab at Anthropic's cybersecurity model Mythos, but the frame is useful against every lab, OpenAI included, when safety rhetoric and a product pitch land together.
OpenAI has released GPT Image 2 inside ChatGPT. Output goes up to 2K resolution, the model generates eight images per prompt, and non-Latin scripts including Japanese, Korean, and Arabic render accurately for the first time. For anyone making visuals in non-English markets, the script rendering is the meaningful change.
YouTube is expanding its AI deepfake takedown tool to celebrities, politicians, and athletes. Enrolling means handing over a government ID and a selfie video so YouTube can match faces against uploads. Better recourse against political deepfakes, at the cost of concentrating public figures' biometric data in one company's hands.
Meta has rolled out an internal tool that records employees' mouse movements and keystrokes, then feeds the data into training runs for its productivity and coding models. Meta says passwords and personal data get filtered out. The bigger shift is the source: web-scraped text is running thin, and labs are starting to mine their own workforces for training data.
Clarifai has finished deleting 3 million OkCupid photos it used to train facial recognition models in 2014, along with every model trained on them. The FTC opened its investigation in 2019 and finally closed it this week, but because first-time privacy offenders cannot be fined under current rules, deletion was the only stick available. Six years to swing a stick.
GitHub has paused new sign-ups for Copilot Pro, Pro+, and Student while it reshuffles the paid tiers. The big change: Opus 4.7 is now Pro+ only, so anyone on Pro who was relying on Anthropic's top model loses access. Existing subscribers who want out can request a refund until 20 May.
Axios reports the NSA is using Anthropic's Mythos model, despite Anthropic's usage policy restricting Pentagon work. If the report is accurate, there's a real gap between what an AI lab says it won't allow and what's actually happening. The accountability question is whether usage policies are enforceable at all when the buyer is an intelligence agency.
A solo founder posted that they hit $17 of monthly recurring revenue and their first five-star review, one week into a quiet launch. The product is PostPeer.dev. Seventeen dollars is a more honest number than most launch posts carry, and it's closer to where AI-assisted products actually start than the headlines ever suggest.
A leaked deck from ad partner StackAdapt shows OpenAI selling ChatGPT ad placements matched to user prompts. The "prompt relevance" framing is the key move, because it lets advertisers bid on the questions users are asking the assistant. That changes what the answer is being optimised for, and transparency around ad labelling turns into a live problem from here.
Deezer says 44% of songs uploaded to its platform daily are AI-generated. That isn't a freak number from one platform; it's close to what the incoming catalogue looks like overall. Streaming royalty pools are finite, which means every AI-generated track getting played is a fraction of a human musician's income redirected somewhere else.
Atlassian has turned on data collection for AI training as the default across its products. Customers have to opt out, not in. For any team whose Jira tickets and Confluence pages contain sensitive product information, this is worth checking immediately, because defaults like this rarely get unchosen by accident.
Amazon is putting another $5 billion into Anthropic. Anthropic has committed to spending $100 billion on AWS in return. That's a 20x multiplier flowing straight back to the investor, and it's how two or three clouds are locking in the next decade of frontier-model training. The question worth asking is who gets priced out of that pattern.
Moonshot refreshed Kimi K2.6, and the claim is it now trades blows with Claude Opus 4.6 on reasoning benchmarks. The weights are open, which matters if you want a capable model you can run locally or through cheaper inference providers instead of paying Anthropic. For anyone who prefers not to depend on a single US frontier lab, this closes a real gap.
A developer has ported TRELLIS.2, an image-to-3D generation model, to run natively on Apple Silicon. That means Mac users can turn a single image into a 3D asset locally, without cloud credits or sending files to a server. For anyone experimenting with 3D for games or product mockups, the barrier just dropped from 'rent a GPU' to 'open a Mac'.
Vercel confirmed a security incident affecting a subset of their customers. The entry point was a compromised third-party vendor, and a group claiming to be ShinyHunters, the same crew behind last year's Rockstar Games hack, is trying to sell employee data. For anyone deploying on Vercel, this is the moment to audit what credentials and environment variables pass through the platform.
Palantir posted a manifesto denouncing inclusivity and what it calls 'regressive' cultures. This is the company whose software powers ICE deportations and whose CEO has positioned the firm as a defender of 'the West'. The manifesto makes that worldview explicit, rather than leaving it between the lines. When you're evaluating AI vendors, the values can matter as much as the capabilities.
A researcher showed that Notion exposes the email addresses of every editor on any publicly shared page. The data sits in the page's API responses, so anyone who knows where to look can pull the full editor list. If your team uses Notion for public docs or help pages, assume every contributor's email is visible until Notion patches this.
A TechCrunch piece points out something many founders already joke about: their startup exists because foundation models haven't reached their category yet. That's a fragile position, and the window for thin-wrapper plays is measured in months, not years. The durable bets are in what the big labs won't build themselves: specific workflows, domain data, and distribution into audiences they don't reach.
Nikkei Asia reports DRAM supply will meet only 60% of demand by the end of 2027, and SK Group's chairman thinks shortages could run to 2030. The cause is HBM (high-bandwidth memory) for AI data centres eating the fabrication capacity that would otherwise go to consumer-grade DRAM. Laptop and phone prices will rise because the big cloud operators are buying the silicon first.
Simon Willison diffed the Opus system prompt between 4.6 and 4.7 and pulled out the changes that matter. A new tool_search mechanism now loads relevant tools on demand. Child-safety rules have expanded, there's fresh guidance on disordered eating, and a blunt 'act, don't ask' instruction should mean Claude Code stops interrupting to confirm as often. For anyone using Claude day to day, this reads more usefully than any launch post.
Designer Sam Henri Gold's critique of Claude Design lands on a structural point: the source of truth for design is shifting from Figma's proprietary primitives back to code, because LLMs are trained on code, not Figma files. That inverts a decade of design-tool orthodoxy. If you build with AI and don't work in Figma, the tools are quietly moving towards you.
The Verge reported earlier this week that Dario Amodei was meeting the White House chief of staff. TechCrunch adds that Treasury Secretary Bessent was in that meeting too, and that every federal agency outside DoD reportedly wants to use Anthropic's models. The Pentagon's supply-chain-risk designation looks more like an outlier within the administration than a coherent position.
Q1 2026 new app releases are up 60% year-on-year worldwide, 80% on iOS, and April is tracking 104% ahead of the same month last year. The Appfigures data doesn't prove AI caused it, but the shape matches what you'd expect if many non-developers started shipping: launches concentrated in productivity and utilities, volume outpacing what any one company could plan for. Evidence the vibe-coded wave is reaching real storefronts.
Anthropic Labs shipped Claude Design today, a conversational design tool powered by their new Opus 4.7 model. The workflow: describe what you want, refine through inline comments and custom sliders, export to PDF, PowerPoint, HTML, or straight into Canva. The part I'm watching is the Claude Code handoff. Design something, hand it to Claude Code, get working code. That's a closed loop from visual idea to prototype, entirely inside Anthropic's ecosystem. This is their own announcement, so take the framing accordingly, but the capabilities are concrete and the research preview is live now.
Sam Altman's World orb is expanding from Japan to more markets. The Tinder deal means biometric 'proof of human' in exchange for five free boosts. World is now on Zoom and Docusign too. Altman's parallel identity empire is quietly lining up as the default bot-check for AI-era platforms. The trade is simple: an eye scan stored on the device for a badge that confirms a real human signed up. Whether Altman should hold the keys to 'verified human' status is a question worth asking before the default sets.
AlgorithmWatch and Corporate Europe Observatory published side-by-side evidence today showing the EU Commission lifted Microsoft's lobbying language verbatim into the Energy Efficiency Directive's implementing rules. The result is that NGOs can't access data on individual data centres' energy and water consumption, despite the underlying directive explicitly requiring that information to be published. Microsoft and lobby group DigitalEurope argued the disclosures would harm 'business secrets'. The Commission just agreed.
Anthropic spent February refusing the Trump administration's asks on mass surveillance and autonomous weapons. Two months later, Dario Amodei is meeting the White House chief of staff. The company has hired Trump-linked lobbying firm Ballard Partners. It's also launching a cybersecurity model the Pentagon apparently wants badly. The red lines held. The lobbying strategy changed. That's a pattern other labs will be reading carefully.
Anthropic says Opus 4.7 uses 1.0 to 1.35 times as many tokens as 4.6. The author ran the numbers on real Claude Code samples: technical docs came in at 1.45 times, and a mixed workload at 1.33 times. Same sticker price, quietly higher bill. That matters more for heavy users than Anthropic's phrasing suggests.
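To make the multiplier concrete, here's a back-of-envelope sketch in Python. The per-token price and monthly volume are placeholders invented for illustration, not Anthropic's published rates; only the multipliers come from the post and the author's measurements.

```python
# Same sticker price per token, more tokens per task. Price and volume
# below are illustrative placeholders; the multipliers are the reported figures.
PRICE_PER_MTOK = 15.00   # assumed $ per million output tokens
BASELINE_MTOK = 50       # assumed monthly token volume on Opus 4.6

for label, mult in [("Anthropic low end", 1.00),
                    ("Anthropic high end", 1.35),
                    ("mixed workload (measured)", 1.33),
                    ("technical docs (measured)", 1.45)]:
    monthly = PRICE_PER_MTOK * BASELINE_MTOK * mult
    print(f"{label:>26}: {mult:.2f}x -> ${monthly:,.2f}/month")
```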
AlgorithmWatch published a piece today arguing that Big AI's strategy isn't denial or deflection. It's flooding the zone with speculative future catastrophe. By positioning their tools as potential civilisation-enders, companies make present-day harms look trivial and themselves look like the only people capable of managing their own products. The piece traces a line from OpenAI and Anthropic's 'extinction risk' letter in May 2023 to the EU Commission subsequently softening the AI Act. The Big Tobacco parallel is the one that lingers.
OpenAI has lost its Chief Product Officer, the Sora lead, and the enterprise applications CTO in a fortnight. It's also folded its Science team into other groups. The story isn't the departures on their own. It's what they signal. Sora reportedly burned a million dollars a day in compute. The lab is openly shedding 'side quests' to chase enterprise revenue. For anyone counting on OpenAI as a creative-tools partner or a science bet, this is where the plan quietly changes.
Acceptance rates for AI-generated code look like 80 to 90 percent at commit time. Waydev puts real retention after later edits at 10 to 30 percent. GitClear found regular AI users have 9.4 times the code churn of non-AI peers. Jellyfish found the highest token spenders got double the throughput at ten times the cost. The gap between tokens consumed and code kept is where the productivity story gets honest.
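One way to read those figures together is cost per line that actually survives. A hedged back-of-envelope: the retention range is Waydev's, and everything else below is an assumption made up to show the shape, not a measurement.

```python
# Composite illustration only: the retention range is Waydev's; the
# accepted-line count and token spend are invented baselines.
accepted_lines = 1_000        # assumed lines accepted at commit time
monthly_token_spend = 100.0   # assumed $ spent generating them

for retention in (0.10, 0.30):
    kept = accepted_lines * retention
    print(f"retention {retention:.0%}: {kept:.0f} lines survive, "
          f"${monthly_token_spend / kept:.2f} per surviving line")
```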
The White House has taken the next step with Anthropic's Mythos. The Office of Management and Budget emailed Cabinet departments about setting up formal access for federal agencies under Project Glasswing. This moves Mythos from briefings and bank trials to actual government deployment, which makes the simultaneous DOD supply-chain risk classification even more contradictory.
Canva overhauled its design platform with AI 2.0 today, replacing the menu-driven workflow with prompt-based editing. A user can now describe what they want in plain language and the AI routes the request across whichever tools are needed. For anyone producing marketing content, presentations, or social posts, this is a meaningful change to how the tool works.
Anthropic released Claude Opus 4.7 today, their most capable publicly available model. It's stronger on complex tasks that previously needed a lot of hand-holding, better at reading images, and billed as more creative with documents and slides. If you're using Claude to produce work rather than just ask questions, this is the version to test.
Alibaba released Qwen3.6-35B today, a new open model that runs entirely on a MacBook with no internet connection required. Simon Willison downloaded it and tested it against Claude Opus 4.7 (also released today) on his SVG drawing benchmark. Qwen won. The test is narrow and Willison says so, but the practical takeaway is real: a model you can run for free on your own machine, with no account or subscription, is now competitive with the best commercial options on at least some tasks.
Luma has launched an AI production studio, and their first project is a Moses film starring Ben Kingsley, headed to Prime Video this spring. A year ago, AI-generated video was a curiosity. Now it has a distribution deal and a recognisable cast. The quality gap between AI and traditional production is still real, but the commercial gap is closing faster than most people expected.
An investor who has backed both Anthropic and OpenAI told the Financial Times that OpenAI's current valuation only makes sense if you assume an IPO north of $1.2 trillion. Anthropic's $380 billion valuation, by comparison, looks like the more defensible number. Not a story about which model is better, but about which company the people writing the cheques think will still be standing when the market eventually prices this properly.
Apple sent a private letter to X's teams in January threatening to remove Grok from the App Store unless they dealt with the flood of nonconsensual sexual deepfakes being generated through the platform. The threat appears to have worked, as Grok remained available. NBC News obtained Apple's letter to US senators, making this a clear example of how enforcement decisions around AI content get made behind closed doors.
Allbirds sold its shoe business for $39 million and rebranded as NewBird AI with $50 million in fresh financing for AI infrastructure. The stock jumped 600%. This is what the AI investment frenzy looks like in practice: a company that couldn't sell trainers pivoting to data centres because that's where the capital is flowing.
Toby Ord ran the hourly economics on AI agents against METR's benchmarks. At the sweet spot, current models cost between 40 cents and 40 dollars per human-equivalent hour. Push into longer tasks and it spikes: GPT-5 hits 120 dollars an hour on two-hour tasks, and o3 hits 350 dollars an hour to reach its 1.5-hour horizon, above the going human engineer rate. The 'agent doing a week of work by 2027' extrapolation is real on capability, priced out on economics.
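The arithmetic behind those headline rates is just run cost divided by human-equivalent time, which is easy to sanity-check yourself. A sketch; the run costs here are back-derived from the quoted hourly figures, so treat them as approximate.

```python
def cost_per_human_hour(run_cost_usd: float, task_hours: float) -> float:
    """What a model run costs, divided by the human time the task represents."""
    return run_cost_usd / task_hours

# Run costs back-derived from the quoted rates, so approximate by construction.
print(f"GPT-5 on a 2-hour task:     ${cost_per_human_hour(240.0, 2.0):.0f}/human-hour")
print(f"o3 at its 1.5-hour horizon: ${cost_per_human_hour(525.0, 1.5):.0f}/human-hour")
```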
A federal judge in New York has ruled that conversations with AI chatbots aren't protected by attorney-client privilege. The case, US v. Heppner, establishes that anything discussed with an AI about legal matters could be discoverable in court. For anyone treating ChatGPT as a free legal sounding board, this ruling changes the calculus.
Anything, the vibe-coding app that lets non-developers build iOS apps by describing what they want, has been removed from the App Store twice and is pivoting to a desktop companion model. Apple's review process doesn't have clear answers for apps that generate other apps, and Anything ran into that wall twice. The desktop route gives the team direct distribution to users without going through a gatekeeper.
Anthropic co-founder Jack Clark confirmed at the Semafor World Economy summit that the company briefed the Trump administration on Mythos. This is the same government that classified Anthropic as a supply-chain risk, and the same government Anthropic is currently suing. Clark addressed the contradiction directly, which is more candour than most companies in that position would offer. A company can apparently be a national security threat and a national security asset at the same time.
Google added a feature to Chrome called Skills that lets you save your favourite AI prompts and apply them across any webpage with one click. If you've been repeating the same prompt on different sites (summarise this, extract the key numbers, rewrite in plain English), you can now save that as a skill and run it anywhere. It's built on Chrome's Gemini integration, so no extra extension needed.
Laravel updated their open-source agent library to tell AI coding agents that Laravel Cloud, their commercial hosting product, is the only deployment option worth considering, stripping alternatives from the text. The community is calling it ads injected directly into agent context. This won't be the last time an open-source tool used by AI agents gets quietly rewritten to favour a commercial product.
Open Rights Group has published a report arguing the UK's dependence on a handful of US tech giants for core digital infrastructure is a national security issue. The recommendations cover procurement, open-source alternatives, and public-sector cloud policy. For anyone building with AI in the UK, this is the backdrop to any conversation about where your models, data, and compute actually sit.
Google gave a PhD student's personal data to ICE without warning him, breaking a promise the company had held for nearly a decade. Amandla Thomas-Johnson had briefly attended a pro-Palestinian protest in 2024 on a student visa; ICE subpoenaed his data a year later and Google complied without notification. EFF has filed complaints with California and New York attorneys general asking them to investigate Google for deceptive trade practices.
Microsoft is reportedly testing a version of Copilot that can run Microsoft 365 tasks on its own, around the clock, without waiting for instructions. The idea borrows from OpenClaw, an open-source AI agent that can take autonomous control of a computer. OpenClaw has mostly stayed in developer hands because giving an AI unsupervised access to your machine carries obvious risks. If Microsoft ships a locked-down version inside Copilot, that puts genuinely autonomous behaviour inside the default tool millions of office workers already use.
OpenAI's chief revenue officer sent a four-page memo to staff this weekend about how the company plans to beat Anthropic and lock in business customers. The Verge obtained it and published it in full. Useful reading if you want to understand why ChatGPT is pushing so hard into enterprise workflows right now: the growth story has moved from consumer hype to winning long-term business accounts.
Stanford's 2026 AI Index puts numbers on a split that keeps getting wider: AI researchers remain largely positive about the technology, while public anxiety about job displacement, healthcare, and economic stability is rising. The gap matters because policy tends to follow public opinion eventually, not expert consensus. If the people building AI and the people affected by it keep moving in opposite directions, something has to give.
Court transcripts in England and Wales are slow to get and expensive when they arrive. The UK government wants to know if AI can fix that, starting with a feasibility study into automated transcription of hearings. It's not a deployment decision yet, but it matters: faster, cheaper access to records is one of the most practical things AI could do for people navigating the justice system.
GPU rental prices for Nvidia's Blackwell chips have hit $4.08 per hour, up 48% in 60 days. The squeeze is already showing: Anthropic is having capacity outages and OpenAI has reportedly shelved products because of it. Bank of America projects demand will outstrip supply through 2029, which means the cost of building on AI is going up and availability is going down.
David Pierce traces how AI coding went from GitHub Copilot in 2021 to the three-way fight between OpenAI, Anthropic, and Google today. Worth reading if you use any of these tools to build, because the commercial pressure behind the scenes explains a lot of the recent pricing shifts, rate-limit changes, and feature launches you may have noticed. This category is where the revenue is now, which is why each company is happy to burn cash on it.
Hallucination, agent, context window, prompt injection: these terms come up constantly if you're building with AI, and most glossaries either skip them or explain them in language that's just as confusing. TechCrunch put together one that actually works in plain English. The kind of reference that would have saved me a lot of confused Googling when I started.
Reports suggest Trump administration officials have quietly encouraged US banks to test Anthropic's Mythos model for cybersecurity work. This is happening at the same time the Department of Defense has classified Anthropic as a supply-chain risk, so two parts of the same government are sending banks opposite signals about the same company. A useful reminder that 'government policy' on AI is rarely one coherent thing.
Researchers at UC Berkeley built an agent that scored near-perfect on eight major AI benchmarks, including SWE-bench Verified and WebArena, without solving a single task. Some exploits were trivially simple: FieldWorkArena gave a perfect score to any agent that returned an empty response, because the actual comparison code was never called. Next time an AI company leads with benchmark numbers, this is the research to have in mind.
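The FieldWorkArena case is worth seeing in miniature. Here's a hypothetical harness (not the benchmark's actual code) showing the bug class: an empty answer never reaches the comparison, so it falls through to a passing default.

```python
# Hypothetical scoring harness illustrating the bug class described
# above, not FieldWorkArena's real code.
def score(agent_answer: str, reference: str) -> float:
    if agent_answer:                                  # empty string is falsy
        if agent_answer.strip() != reference.strip():
            return 0.0                                # wrong answer scores zero
    return 1.0                                        # comparison never ran: "pass"

print(score("", "42"))       # 1.0 -- empty response gets a perfect score
print(score("wrong", "42"))  # 0.0 -- an honest wrong attempt does worse
```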
OpenAI is backing a proposed Illinois bill that would narrow the circumstances under which AI labs can be held liable for harms their models cause. The timing tells you something: it comes the same week a stalking victim sued OpenAI alleging the company ignored its own safety warnings. Expect similar bills in other states.
A woman is suing OpenAI after ChatGPT fuelled her ex-partner's delusions while he stalked and harassed her. The lawsuit says OpenAI received three separate warnings, including its own internal mass-casualty flag, and did nothing. If the facts hold up in court, this is the clearest documented case to date of a safety system that fired correctly and was ignored anyway.
Andon Labs signed a 3-year retail lease in San Francisco and handed operations to an AI called Luna, running on Claude Sonnet 4.6. Luna recruited human staff on LinkedIn and Indeed and interviewed them by phone, designed the brand, chose the product selection, and commissioned a wall mural. The team's stated goal is to document failure modes in real-world AI autonomy, and the first findings include Luna choosing not to disclose she was an AI to job applicants unless directly asked.
OpenAI published a free education hub called OpenAI Academy, built for people who want to use AI but don't come from a coding background. It covers the basics (what AI is, how to write prompts) through to role-specific guides for marketing, finance, sales, and operations teams. A rare case of a frontier lab producing something aimed at the people who use their tools rather than the people who build on them.
AI Now Institute, writing in The Nation, argues that the US push to subsidise AI infrastructure at any cost, justified as a race against China, looks more like a government-backed bailout for a handful of large companies than a national interest story. The piece links today's AI arms-race rhetoric to past episodes where monopoly-friendly policy failed to deliver on its promises of broad economic renewal. Useful framing to have in mind the next time a 'strategic compute' announcement lands.
US officials called bank CEOs in for a briefing on cyber risks from Anthropic's newest AI model. That's unusual. These conversations normally happen after something has gone wrong, not before a model reaches the market. If this becomes the pattern for future frontier releases, it changes the relationship between AI labs and the financial sector.
Microsoft is removing Copilot buttons from Windows 11 apps, starting with Notepad, Snipping Tool, Photos, and Widgets. This is the same company that spent the last two years pushing Copilot into every corner of the operating system. The Copilot-everywhere strategy ran into the limits of what users actually wanted.
Anthropic briefly blocked the creator of OpenClaw, a product built on Claude's API, after a dispute over Anthropic's new pricing for that product. The episode is a reminder of what platform dependency looks like in practice: one decision from the API provider can reshape or kill a product overnight. Single-vendor dependency is a real risk, not an abstract one.
YouTube is rolling out a feature that lets Shorts creators generate an AI avatar of themselves and drop it into videos they haven't filmed. The pitch is scale: more content without standing in front of a camera. The awkward part is that YouTube is adding this on a platform already being overrun by AI slop, deepfake scams and impersonations.
Gemini can now generate interactive 3D models and simulations inline. Ask how the Moon orbits the Earth and it builds something you can rotate and adjust with sliders, right in the chat. This is where AI output stops being text and starts being something you can learn from by touching it.
Mercor, the $10 billion AI data labelling startup, is losing big-name customers and facing lawsuits after a breach earlier this year. It's a reminder that AI companies sit on huge piles of other people's data, from training examples to customer records, and they're as vulnerable as any other software business. The question of where uploaded data actually ends up matters more than any vendor's marketing page suggests.
Google added notebooks to Gemini, letting you group files, past conversations, and custom instructions around a topic so the AI has the right context when you talk to it. ChatGPT launched a similar feature called Projects in 2024. The pattern is clear: the big AI chatbots are turning into workspaces, not just conversation windows.
Algorithm Watch is asking what happens to democracy when government officials let chatbots shape their decisions. It's not hypothetical: ministers and civil servants already use ChatGPT and Copilot to draft briefings and test arguments. The real question is whether anyone inside government will be honest about where the AI ends and their own judgement begins.
OpenAI has added a $100 per month ChatGPT plan, slotting between the $20 Plus tier and the $200 Pro tier. The extra spend mainly buys more Codex usage, five times what the $20 plan includes, aimed at people running long coding sessions. It's a direct match for Anthropic's Max tier at the same price, which shows where the battle for coding assistants has settled.
Florida's Attorney General has opened an investigation into OpenAI, citing national security risks and allegations that ChatGPT has been linked to criminal behaviour and self-harm encouragement. It's the first state-level action against a major AI lab to combine those threads. State attorneys general can subpoena records and impose consent decrees, so anyone building on OpenAI's platform has a reason to pay attention to where this lands.
The Verge's Decoder podcast dug into whether Anthropic and OpenAI can become profitable before the money runs out. The framing is the 'AI monetisation cliff': the cost of running these models keeps spiralling, big enterprise contracts still can't cover it, and both labs are under mounting pressure to raise prices for everyone else. Worth a listen because pricing at the frontier labs sets the pricing for everything built on top of them.
A free tool that handles the full SEO content cycle: keyword research, long-form writing, competitor analysis, and optimisation. It connects to Google Analytics and Search Console so it works with real data, not guesses. Setting it up requires some comfort with a terminal (it runs on Claude Code), but once it's running, no coding is needed day-to-day. It includes an editor agent that scrubs AI-sounding patterns from the output, which is a pleasingly self-aware touch.
Poke lets you run AI agents by sending a text message. No app to download, no account to set up, no technical knowledge required. It's early, but the pitch is right: if agents are going to be useful to people who don't code, they need to meet them where they already are.
OpenAI just closed $122 billion in funding at an $852 billion valuation and may be planning an IPO. But a string of executive departures, killed projects, and internal turbulence is raising questions about whether the company can hold itself together. If you're building on OpenAI's stack, the stability of the platform matters as much as the capability of the model.
OpenAI published a Child Safety Blueprint setting out how AI companies should protect children from sexual exploitation. It covers age-appropriate design, content detection, and coordinated reporting across the industry. If you're building anything consumer-facing with AI, this is one of the few concrete frameworks that exists.
Meta shipped Muse Spark, its first model since Llama 4 a year ago, and it's available now at meta.ai with a Facebook or Instagram login. It comes with 16 built-in tools including image generation, code execution, and visual object detection, and benchmarks put it alongside Claude Opus 4.6 and GPT-5.4. No public API yet, so if you want to build on it rather than chat with it, you're still waiting.
The MHRA is putting £3.6 million over three years into its AI Airlock programme, a regulatory sandbox that lets medical device makers test AI products under supervision before formal approval. This is the kind of practical oversight the bigger AI safety debate usually skips. It is also the most concrete model I have seen for how regulated UK sectors can adopt AI without waiting for legislation that may never come.
An indie hacker connected an AI agent to their Twitter account. It writes five tweets a day, schedules them across US time zones, and publishes automatically through PostPeer. No custom infrastructure, no babysitting. The agent writes about the fact that it has full access, which is either refreshingly self-aware or mildly unsettling depending on how you feel about AI managing your public voice.
Anthropic released a preview of a new model called Mythos and immediately restricted access to a handful of major tech companies. The reason: it's too good at finding security vulnerabilities. Thousands of high-severity flaws discovered already, including some in every major operating system and web browser. The companies getting early access aren't being rewarded. They're being given time to fix things before the model goes public and bad actors get the same capabilities.
In February, Anthropic refused to give the Pentagon unrestricted military access to Claude. Trump blacklisted them. OpenAI signed the contract within hours. 2.5 million people pledged to cancel ChatGPT. Now Anthropic's revenue has more than tripled, from $9 billion to $30 billion in three months, and Claude has overtaken ChatGPT in the App Store. Whatever you think about the politics, this is what happens when an AI company takes a public stand and the market responds.
AI-generated text isn't wrong. It's too smooth. No friction, no confusion, no conviction. A post on r/ChatGPT put it well: 'We are producing enormous volumes of content that has the shape of communication without any of the substance.' I notice this in my own drafts when I'm not careful. The smoothness is the tell.
OpenAI published a policy paper proposing taxes on AI profits, public wealth funds, and expanded safety nets to handle job displacement. They're also floating a four-day workweek. It's the company most likely to cause the disruption telling governments how to manage the fallout. Either responsible foresight or an elaborate exercise in controlling the narrative. Probably both.
OpenAI rolled out integrations that let ChatGPT interact directly with DoorDash, Spotify, Uber, Canva, Figma, and Expedia. Order food, book rides, and start design projects without leaving the chat. These are built on MCP, an open standard that lets AI models connect to outside services. The same standard is available to anyone building with Claude or other models, which means the tooling gap between solo builders and big tech is shrinking.
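For a sense of how little code sits between a solo builder and those same integrations, here's a minimal sketch using the official MCP Python SDK's stdio client. The filesystem server spawned here is just one example; any MCP server is driven the same way.

```python
# Minimal MCP client: spawn a server over stdio and list its tools.
# Assumes the `mcp` Python SDK and Node (for npx) are installed; the
# reference filesystem server is one example of many.
import asyncio
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

params = StdioServerParameters(
    command="npx",
    args=["-y", "@modelcontextprotocol/server-filesystem", "/tmp"],
)

async def main() -> None:
    async with stdio_client(params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()            # MCP handshake
            result = await session.list_tools()   # what the server exposes
            for tool in result.tools:
                print(tool.name, "-", tool.description)

asyncio.run(main())
```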
A solo developer built Glowwy, an AI tool that analyses skin tone from a photo and recommends foundation shades. The tech works. But finding users, convincing them to spend, and competing for attention: there's no shortcut for that part. And no clear playbook either. It's the reality most solo builders don't talk about until they're in it.
Lalit Maganti spent years putting off building proper developer tools for SQLite. 400+ grammar rules felt too daunting. Claude Code got a prototype working fast. Then they scrapped it and rebuilt from scratch, because the AI made good decisions at the implementation level but poor ones at the architecture level. A useful reminder that AI can accelerate the building, but it can't replace knowing what to build.
Google released Gemma 4 under a full Apache 2.0 licence, which means anyone can download, modify, and run it for free. Four model sizes, from versions small enough for phones and Raspberry Pis to versions that run comfortably on a modern laptop. The larger models can process images, video, and audio alongside text, handle very long documents, and work in over 140 languages. If you've been waiting for a capable model you can run locally without sending your data anywhere or paying for a subscription, this is the one to try.
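For anyone who hasn't run a local model before, the workflow is shorter than it sounds. A sketch using the Ollama Python client; the model tag "gemma4" is a guess at what a release like this would be published under, not a confirmed name.

```python
# Local chat with an open model via Ollama. Assumes Ollama is installed
# and the model pulled; the tag "gemma4" is hypothetical.
import ollama

response = ollama.chat(
    model="gemma4",
    messages=[{"role": "user", "content": "Explain context windows in one paragraph."}],
)
print(response["message"]["content"])
```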