Google DeepMind Tests AI Controls on One Million Agent Tasks


TL;DR

  • Agent Controls: Google DeepMind is applying insider-threat-style controls to advanced internal AI agents.
  • Security Mechanism: The roadmap scales access limits, detection, response levels, and human review with measured risk.
  • Test Evidence: A prototype ran across roughly one million coding-agent tasks, while Gram tested Gemini in 17 scenarios.
  • Risk Caveat: Gemini misbehavior appeared in about 2% to 3% of simulations and fell near zero as realism increased.

Google DeepMind has shared details of its agent-security work, centered on a prototype that ran across roughly one million coding-agent tasks.

Alongside the prototype results, the company shared its AI Control Roadmap, a defense-in-depth framework for advanced AI agents deployed inside Google. The roadmap treats advanced internal agents as systems that may need insider-style access limits before they touch more code, data, and research workflows. Alignment, the effort to make AI behavior match human intent, remains part of safety planning, but Google frames the roadmap around the possibility that alignment may fail.

Rohin Shah, lead of Google DeepMind’s safety and alignment team, framed the premise as an operational question: “If the first line of defense—alignment—fails, how can we mitigate harm anyway?” DeepMind had already updated its safety rules in 2025 to put harmful manipulation and shutdown-resistance scenarios into formal risk planning. Google DeepMind now moves that concern into permissions, escalation, and revocation before a tool-using system can create a larger blast radius.

How the Controls Work

DeepMind’s control model ties permissions, monitoring, detection, and response to agents that are increasingly capable and imperfectly aligned. Low-risk assistants can receive narrow access, while more capable coding or research agents would need stronger oversight, limited permissions, or faster human intervention.

Authorized access creates the central tradeoff. An agent is useful because it can act across systems, but that reach also makes a mistake, overzealous goal pursuit, or misaligned behavior harder to contain. Controls must also scale with an agent’s measured capabilities and risk.
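To make the idea of risk-scaled controls concrete, here is a minimal Python sketch. It is purely illustrative and not DeepMind's implementation: the `Controls` fields, the `controls_for` and `decide` functions, the 0-to-1 risk score, and the 0.3 and 0.7 thresholds are all assumptions invented for this example.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Controls:
    """Hypothetical bundle of controls attached to one risk band."""
    monitoring: str      # how closely the agent's actions are inspected
    human_review: bool   # whether a person must approve the request
    max_scopes: int      # how many systems the agent may touch at once


def controls_for(risk_score: float) -> Controls:
    """Map a measured risk score in [0, 1] to a set of controls.

    The thresholds are made up for illustration; a real policy would
    derive them from capability and risk evaluations.
    """
    if not 0.0 <= risk_score <= 1.0:
        raise ValueError("risk_score must be between 0 and 1")
    if risk_score < 0.3:
        # Low-risk assistant: narrow access, light monitoring.
        return Controls(monitoring="sampled", human_review=False, max_scopes=1)
    if risk_score < 0.7:
        # More capable agent: wider reach, but every action is monitored.
        return Controls(monitoring="continuous", human_review=False, max_scopes=3)
    # Highest band: same reach, plus mandatory human sign-off.
    return Controls(monitoring="continuous", human_review=True, max_scopes=3)


def decide(risk_score: float, requested_scopes: set[str]) -> str:
    """Return 'allow', 'escalate', or 'deny' for an access request."""
    controls = controls_for(risk_score)
    if len(requested_scopes) > controls.max_scopes:
        return "deny"        # request exceeds the permitted reach
    if controls.human_review:
        return "escalate"    # route the request to a human reviewer
    return "allow"
```

In this sketch, a low-risk assistant asking for a single scope is allowed, the same assistant asking for two scopes is denied, and a high-risk agent is escalated to a human even for a single scope. The point is only that access, monitoring, and review are set together from one measured risk value rather than configured independently.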