How to improve AI agent performance
Trusting a new AI agent you just released can take time. You run it through your work data, watch it closely for days and weeks, always judging if it's working for you or against you. Just when you're starting to relax and enjoying the productivity boost, the AI provider launches a model update: the responses have now shifted, your instructions get interpreted differently, and you're back to zero trust again. So… progress? Improving AI agent performance is an ongoing process—not a one-time setup
How we used Gemini to build Google I/O 2026
Learn how Googlers used AI to produce Google I/O 2026.
Our new community investments in Virginia support local jobs and expand energy affordability.
We’re helping build the state’s next-generation workforce and investing in energy programs.
Catch up on 12 major I/O 2026 moments
Here are 12 of the biggest Google I/O 2026 keynote moments, including news about Gemini Omni, Gemini 3.5 Flash and more.
Nearly 100% of patients surveyed say they’d want to know when AI is used in imaging - Radiology Business
How ChatGPT adoption has expanded
New OpenAI Signals data shows how ChatGPT adoption is growing globally, with users increasing usage, exploring more capabilities, and driving growth across regions and languages.
Podcasting platform Riverside enters the newsletter publishing game
Users will be able to use AI to create newsletters based on their recordings.
New research shows how AMIE, our medical AI, could help manage health conditions.
Research in “Nature” shows our conversational AI system matches primary care physicians in complex disease management.
Introducing GeneBench-Pro
Introducing GeneBench-Pro, a new benchmark testing AI performance in genomics, biology, and scientific research using complex, real-world datasets.
What is agentic AI? And how you can start using it
The idea of AI tools that can be trusted to operate independently has always been exciting and, for a long time, just out of reach. That's changing: 84% of enterprise leaders now say they're likely or certain to increase AI agent investments over the next 12 months. Not everything marketed as agentic AI actually clears the bar, though. Here, we'll explore what agentic AI really is, how it works, and some real-world examples of agentic AI workflows you can start experimenting with today.
Employers who laid off workers citing AI are already starting to regret it - CNBC
Powering the world’s first AI arts museum
Refik Anadol Studio opens Dataland, the first museum of AI arts, powered by Google Cloud and supported by Google Arts & Culture.
Mapping Europe’s AI Workforce Opportunity
A new OpenAI report maps how AI could reshape jobs across the EU, highlighting which occupations may face automation, growth, or workflow changes.
Amazon launches new $1 billion FDE org, following OpenAI and Anthropic
Engineers on the new team will embed within companies to deploy purpose-built agents, focusing on fast deployments and customer self-sufficiency.
Nous Research's NousCoder-14B is an open-source coding model landing right in the Claude Code moment
Nous Research, the open-source artificial intelligence startup backed by crypto venture firm Paradigm, released a new competitive programming model on Monday that it says matches or exceeds several larger proprietary systems, trained in just four days using 48 of Nvidia's latest B200 graphics processors.
The model, called NousCoder-14B, is another entry in a crowded field of AI coding assistants, but it arrives at a particularly charged moment: Claude Code, the agentic programming tool from rival Anthropic, has dominated social media discussion since New Year's Day, with developers posting breathless testimonials about its capabilities. The simultaneous developments underscore how quickly AI-assisted software development is evolving, and how fiercely companies large and small are competing to capture what many believe will become a foundational technology for how software gets written.
NousCoder-14B achieves a 67.87 percent accuracy rate on LiveCodeBench v6, a standardized evaluation that tests models on competitive programming problems published between August 2024 and May 2025. That figure represents a 7.08 percentage point improvement over the base model it was trained from, Alibaba's Qwen3-14B, according to Nous Research's technical report published alongside the release.
"I gave Claude Code a description of the problem, it generated what we built last year in an hour," wrote Jaana Dogan, a principal engineer at Google responsible for the Gemini API, in a viral post on X last week that captured the prevailing mood around AI coding tools. Dogan was describing a distributed agent orchestration system her team had spent a year developing, a system Claude Code approximated from a three-paragraph prompt.
The juxtaposition is instructive: while Anthropic's Claude Code has captured imaginations with demonstrations of end-to-end software development, Nous Research is betting that open-source alternatives trained on verifiable problems can close the gap, and that transparency in how these models are built matters as much as raw capability.
How Nous Research built an AI coding model that anyone can replicate
What distinguishes the NousCoder-14B release from many competitor announcements is its radical openness. Nous Research published not just the model weights but the complete reinforcement learning environment, benchmark suite, and training harness, all built on the company's Atropos framework, enabling any researcher with sufficient compute to reproduce or extend the work.
"Open-sourcing the Atropos stack provides the necessary infrastructure for reproducible olympiad-level reasoning research," noted one observer on X, summarizing the significance for the academic and open-source communities.
The model was trained by Joe Li, a researcher in residence at Nous Research and a former competitive programmer himself. Li's technical report reveals an unexpectedly personal dimension: he compared the model's improvement trajectory to his own journey on Codeforces, the competitive programming platform where participants earn ratings based on contest performance. Based on rough estimates mapping LiveCodeBench scores to Codeforces ratings, Li calculated that NousCoder-14B's improvement, from approximately the 1600-1750 rating range to 2100-2200, mirrors a leap that took him nearly two years of sustained practice between ages 14 and 16. The model accomplished the equivalent in four days.
"Watching that final training run unfold was quite a surreal experience," Li wrote in the technical report. But Li was quick to note an important caveat that speaks to broader questions about AI efficiency: he solved roughly 1,000 problems during those two years, while the model required 24,000.
Humans, at least for now, remain dramatically more sample-efficient learners.
Inside the reinforcement learning system that trains on 24,000 competitive programming problems
NousCoder-14B's training process offers a window into the increasingly sophisticated techniques researchers use to improve AI reasoning capabilities through reinforcement learning. The approach relies on what researchers call "verifiable rewards": the model generates code solutions, those solutions are executed against test cases, and the model receives a simple binary signal, correct or incorrect.
This feedback loop, while conceptually straightforward, requires significant infrastructure to execute at scale. Nous Research used Modal, a cloud computing platform, to run sandboxed code execution in parallel. Each of the 24,000 training problems contains hundreds of test cases on average, and the system must verify that generated code produces correct outputs within time and memory limits of 15 seconds and 4 gigabytes, respectively.
The training employed a technique called DAPO (Decoupled Clip and Dynamic Sampling Policy Optimization), which the researchers found performed slightly better than alternatives in their experiments. A key innovation involves "dynamic sampling": discarding training examples where the model either solves all attempts or fails all attempts, since these provide no useful gradient signal for learning.
The researchers also adopted "iterative context extension," first training the model with a 32,000-token context window before expanding to 40,000 tokens. During evaluation, extending the context further to approximately 80,000 tokens produced the best results, with accuracy reaching 67.87 percent.
Perhaps most significantly, the training pipeline overlaps inference and verification: as soon as the model generates a solution, it begins work on the next problem while the previous solution is being checked.
This pipelining, combined with asynchronous training where multiple model instances work in parallel, maximizes hardware utilization on expensive GPU clusters.
The looming data shortage that could slow AI coding model progress
Buried in Li's technical report is a finding with significant implications for the future of AI development: the training dataset for NousCoder-14B encompasses "a significant portion of all readily available, verifiable competitive programming problems in a standardized dataset format." In other words, for this particular domain, the researchers are approaching the limits of high-quality training data.
"The total number of competitive programming problems on the Internet is roughly the same order of magnitude," Li wrote, referring to the 24,000 problems used for training. "This suggests that within the competitive programming domain, we have approached the limits of high-quality data."
This observation echoes growing concern across the AI industry about data constraints. While compute continues to scale according to well-understood economic and engineering principles, training data is "increasingly finite," as Li put it. "It appears that some of the most important research that needs to be done in the future will be in the areas of synthetic data generation and data efficient algorithms and architectures," he concluded.
The challenge is particularly acute for competitive programming because the domain requires problems with known correct solutions that can be verified automatically. Unlike natural language tasks where human evaluation or proxy metrics suffice, code either works or it doesn't, which makes synthetic data generation considerably more difficult.
Li identified one potential avenue: training models not just to solve problems but to generate solvable problems, enabling a form of self-play similar to techniques that proved successful in game-playing AI systems.
"Once synthetic problem generation is solved, self-play becomes a very interesting direction," he wrote.
A $65 million bet that open-source AI can compete with Big Tech
Nous Research has carved out a distinctive position in the AI landscape: a company committed to open-source releases that compete with, and sometimes exceed, proprietary alternatives. The company raised $50 million in April 2025 in a round led by Paradigm, the cryptocurrency-focused venture firm founded by Coinbase co-founder Fred Ehrsam. Total funding reached $65 million, according to some reports. The investment reflected growing interest in decentralized approaches to AI training, an area where Nous Research has developed its Psyche platform.
Previous releases include Hermes 4, a family of models that we reported "outperform ChatGPT without content restrictions," and DeepHermes-3, which the company described as the first "toggle-on reasoning model," allowing users to activate extended thinking capabilities on demand.
The company has cultivated a distinctive aesthetic and community, prompting some skepticism about whether style might overshadow substance. "Ofc i'm gonna believe an anime pfp company. stop benchmarkmaxxing ffs," wrote one critic on X, referring to Nous Research's anime-style branding and the industry practice of optimizing for benchmark performance.
Others raised technical questions. "Based on the benchmark, Nemotron is better," noted one commenter, referring to Nvidia's family of language models. Another asked whether NousCoder-14B is "agentic focused or just 'one shot' coding," a distinction that matters for practical software development, where iterating on feedback typically produces better results than single attempts.
What researchers say must happen next for AI coding tools to keep improving
The release includes several directions for future work that hint at where AI coding research may be heading. Multi-turn reinforcement learning tops the list.
Currently, the model receives only a final binary reward, pass or fail, after generating a solution. But competitive programming problems typically include public test cases that provide intermediate feedback: compilation errors, incorrect outputs, time limit violations. Training models to incorporate this feedback across multiple attempts could significantly improve performance.
Controlling response length also remains a challenge. The researchers found that incorrect solutions tended to be longer than correct ones, and response lengths quickly saturated available context windows during training, a pattern that various algorithmic modifications failed to resolve.
Perhaps most ambitiously, Li proposed "problem generation and self-play": training models to both solve and create programming problems. This would address the data scarcity problem directly by enabling models to generate their own training curricula. "Humans are great at generating interesting and useful problems for other competitive programmers, but it appears that there still exists a significant gap in LLM capabilities in creative problem generation," Li wrote.
The model is available now on Hugging Face under an Apache 2.0 license. For researchers and developers who want to build on the work, Nous Research has published the complete Atropos training stack alongside it.
What took Li two years of adolescent dedication to achieve, climbing from a 1600-level novice to a 2100-rated competitor on Codeforces, an AI replicated in 96 hours. He needed 1,000 problems. The model needed 24,000. But soon enough, these systems may learn to write their own problems, teach themselves, and leave human benchmarks behind entirely. The question is no longer whether machines can learn to code. It's whether they'll soon be better teachers than we ever were.
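For readers who want a concrete picture of the "verifiable rewards" and "dynamic sampling" ideas described in the NousCoder-14B story above, here is a minimal Python sketch. It is not Nous Research's code and does not use the Atropos framework. The names `binary_reward` and `dynamic_sampling_filter`, the toy "double the input" task, and the test-case format are all assumptions made for illustration. A real system would run untrusted code in a sandbox under the time and memory limits the article mentions, which this sketch skips entirely.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical toy format: each test case is (input, expected output).
TestCase = Tuple[int, int]
Solution = Callable[[int], int]


def binary_reward(solution: Solution, tests: Sequence[TestCase]) -> int:
    """Return 1 only if the candidate passes every test case, else 0."""
    for given, expected in tests:
        try:
            if solution(given) != expected:
                return 0
        except Exception:
            return 0  # a crash counts as a failed attempt
    return 1


def dynamic_sampling_filter(
    rewards_by_problem: Dict[str, List[int]]
) -> Dict[str, List[int]]:
    """Keep only problems whose attempts have mixed outcomes.

    Problems where every attempt passed or every attempt failed are
    dropped, since they carry no signal about which attempts were better.
    """
    return {
        problem_id: rewards
        for problem_id, rewards in rewards_by_problem.items()
        if 0 < sum(rewards) < len(rewards)
    }


# Toy task (an assumption for this sketch): double the input.
tests = [(1, 2), (3, 6), (10, 20)]

# Stand-ins for model-generated solutions, grouped by problem.
attempts = {
    "all_pass": [lambda x: x * 2, lambda x: x + x],
    "all_fail": [lambda x: x, lambda x: x + 1],
    "mixed": [lambda x: x * 2, lambda x: x + 1],
}

rewards = {
    problem_id: [binary_reward(solution, tests) for solution in solutions]
    for problem_id, solutions in attempts.items()
}
kept = dynamic_sampling_filter(rewards)
print(rewards)
print(kept)
```

In this toy run only the "mixed" problem survives the filter, which mirrors the article's point that groups the model always solves or always fails are discarded.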
Unlocking Britain’s next era of productivity: Building a nation of AI trailblazers
Google UK shares its latest Economic Impact Report and how to enable more people to unlock the benefits of AI-powered technologies.
How GPT-5 helped immunologist Derya Unutmaz solve a 3-year-old mystery
GPT-5 Pro helped solve a 3-year-old immunology mystery, offering insights into T cell behavior. The breakthrough could support cancer and autoimmune research.
Core dump epidemiology: fixing an 18-year-old bug
OpenAI engineers used large-scale core dump analysis to debug rare infrastructure crashes, uncovering both a hardware fault and a long-standing software bug.
HP Inc. launches Frontier strategic partnership with OpenAI
HP Inc. scales its OpenAI Frontier partnership to deploy AI across customer experiences, software development, and enterprise operations.
8 AI agent use cases and examples in the workplace
As an extremely cool person, I've recently gotten really into Minecraft, the open-world sandbox game that's basically virtual LEGOs. But I've found that the sheer possibility of building anything you want makes it weirdly tricky to actually start. AI agents have a similar problem. The idea of software that can take a goal, make decisions, and do work on your behalf is genuinely compelling. But figuring out which AI agent use cases are actually worth building is where most teams get stuck. To he
Pope Leo XIV Declares AI a Threat to Human Dignity and Workers’ Rights
Pope Leo XIV is taking a bold stance on artificial intelligence, calling it “a challenge to human dignity, justice and labour” in his first major address since being elected leader of the Catholic Church. The new pontiff is placing AI at the center of the Church’s moral agenda, warning that we’re entering a new industrial revolution with the same threats to workers and human rights seen over a century ago. “In our own day… developments in the field of artificial intelligence pose new challenges,” Leo said, addressing the College of Cardinals on Saturday in the New Synod Hall. He echoed The post Pope Leo XIV Declares AI a Threat to Human Dignity and Workers’ Rights appeared first on DailyAI.
Trump drops restrictions on Anthropic’s Mythos and Fable models
The Trump administration's erratic approach to AI policymaking has left companies across the industry with little clarity about what will govern future model releases.
Daybreak: Tools for securing every organization in the world
OpenAI introduces new Daybreak tools, including Codex Security and GPT-5.5-Cyber, to help organizations find, validate, and patch vulnerabilities at scale.
X now offers an MCP server to make its platform easier for AI tools to use
X has launched a hosted MCP server, making it easier for developers to connect AI applications with the company’s API.
Code by Zapier: Add custom code to your workflows
With Zapier, you can connect thousands of apps inside a Zap (what we call an automated workflow). Add forms, tables, and the ability to reach your data from any AI tool to the mix, and there's a lot you can do. But sometimes you need more. Maybe Zapier's existing actions or triggers can't quite get you where you need to go. Maybe you're pulling information from App A, but it's not in the right format for App B. Or maybe you have something more involved in mind, like looping through records from
5 ways Google Search can level up your thrift and vintage shopping
Uncover second-hand scores with AI tools in Google Search and Shopping.
The latest AI news we announced in May 2026
Here are Google’s latest AI updates from May 2026
Murder Victim Speaks from the Grave in Courtroom Through AI
Chris Pelkey was shot and killed in a road rage incident. At his killer’s sentencing, he forgave the man via AI. In a historic first for Arizona, and possibly the U.S., artificial intelligence was used in court to let a murder victim deliver his own victim impact statement. What happened: Pelkey, a 37-year-old Army veteran, was gunned down at a red light in 2021. This month, a realistic AI version of him appeared in court to address his killer, Gabriel Horcasitas. “In another life, we probably could’ve been friends,” said AI Pelkey in the video. “I believe in forgiveness, and
Bailey campaign embraces artificial intelligence in new era of politics - NPR Illinois
Start building with Nano Banana 2 Lite and Gemini Omni Flash
Scale your ideas with Nano Banana 2 Lite, our fastest, most cost-efficient Gemini Image model, and Gemini Omni Flash for high-quality video and conversational editing.
Fluid, natural voice translation with Gemini 3.5 Live Translate
Gemini 3.5 Live Translate brings near real-time, natural speech translation to Google AI Studio, Google Translate and Google Meet.
Inside GeneBench-Pro
Artificial intelligence could usher in a new era of vaccine development - CIDRAP
Prevent lock-in with AI model flexibility on Zapier
Every AI provider comes with models of varying strengths. I'm a Claude stan because it just gets my writing style, but I'll often reach for Sonnet over the higher-tier models because its results are more consistent for me. And for some tasks, Claude's lineup doesn't cut it at all—when I need to process data at scale, for example, I might reach for Gemini. When I need a versatile generalist for classification or routing, GPT might be my pick. Other people across my team and at Zapier have altoget
The first reply wins: Meet the builders turning Yelp leads into booked jobs
The 2026 Zappy Awards are open — and this month, we're partnering with Yelp to spotlight something specific: what Yelp advertisers are actually building with Zapier. Turns out, the integration story goes well beyond connecting a form to a spreadsheet. These builders are using Yelp leads as the trigger for systems that route, respond, follow up, and convert — automatically, across multiple locations, at any hour. With 70+ submissions in already, here are two that stood out. Caleb Whalen, Owner, C
Anthropic’s Claude Science bets on workflow, not a new model, to win over scientists
Anthropic's Claude Science is a workbench that gives scientists one environment to do computational research, saving them from the need to bounce between databases, pipelines, and tools.
Google introduces a faster, cheaper image generator with Nano Banana 2 Lite
Google is updating its image generator to make it faster and cheaper, making it a more useful tool for creators looking to make AI content.
How to optimize your vibe coding spend
I've been vibe coding since the beginning of 2025 and have played around with just about all the available options. The premise definitely got me: I can build an app I just dreamed up, without spending a lot of money or taking a lot of time to get to a working product. But the "without spending a lot of money" part is tricky. It's confusing enough that it's left many new vibe coders shouting about scams in the respective subreddits for each of the major vibe coding tools. And to be sure, it's no
9 demos of Gemini Omni and Gemini 3.5 in action
Watch 9 videos showing the capabilities of Gemini Omni and Gemini 3.5, announced at Google I/O 2026.
How Omio is building the future of conversational travel
Discover how Omio uses OpenAI to power conversational travel experiences, accelerate product development, and transform into an AI-native company.
Fable and Mythos: Anthropic says US lifts export ban on its advanced AI tools - BBC
Helping build shared standards for advanced AI
OpenAI helps build shared standards for advanced AI, supporting evaluation frameworks, safety practices, and global cooperation through the Appia Foundation.
Lumo, Proton’s privacy-focused AI chatbot, gets an upgrade
Proton's Lumo 2.0 is dropping this week, giving users a broader variety of capabilities.
Could your job be on this list? AI ranked the careers it thinks it can replace - Click2Houston
Anthropic: US has lifted export controls on Fable and Mythos AI models after security risk fears - The Guardian
10 top women in AI in 2026
AI is changing our world, but the stories of those who build it often get lost in the noise. Behind the headlines and hype, a group of women are solving AI’s fundamental challenges – despite working in an industry persistently impacted by gender inequality. Women make up just 22% of AI professionals worldwide and only 12% of AI researchers. In academic publishing, female researchers account for just 29% of first authors on AI papers, a number that hasn’t increased since the mid-2000s. This is a story about ten leaders who have influenced AI despite the odds being stacked against them. Their
OpenClaw is finally available on Android and iOS
The free open source agentic program is finally invading your phone.
Zapier SDK: Connect your code files to thousands of actions
Right when I perfected my AI chatbot workflows, I found out all the cool kids had already migrated to building with AI coding agents. So I made the switch. And luckily for me, technical builders, and fellow vibe coders everywhere, Zapier SDK launched right on cue. Zapier SDK is a resource that gives AI coding agents access to pre-built app integrations in the Zapier directory. They all run through Zapier's governance layer, so you can build safely while you carry out more than 30,000 actions in
Catch up on the Dialogues stage at Google I/O 2026.
A recap of the 2026 I/O Dialogues, where leaders discuss the future of AI, quantum computing, robotics and creativity.
The ‘Father of the Internet’ is finally retiring
Vinton Cerf, one of the creators of the protocols underlying the internet, will step down as Google's chief internet evangelist next week.