Introduction to MATS 10.0 and AI verification
This southern hemisphere winter I am participating in MATS 10.0, a fantastic program for building yourself as a researcher (of many types) in the world of AI safety. While I’m doing the project remotely, I came to Berkeley for the first couple of weeks to meet other engaged newcomers to the field and learn from old hands. Beyond choosing a project and getting to work, one of the early challenges of MATS – and of AI safety research, and I suspect impact-driven research in general – is working out what you intend to change with your work, what you might be able to work on, and how the two connect. In the spirit of learning to write in public, and of having something to point people to when they ask what my project is about, this first MATS post sets out my theory of change for technical AI verification work.
Why work on AI verification?
The line of argument runs as follows:
- AI is developing quickly. Many experts are confident we will reach artificial general intelligence (AGI) or artificial superintelligence (ASI) at some point. As of mid-2026, Metaculus puts even odds on the first general AI system arriving by around 2033, while expert panels put the median nearer 2050.1
- Even short of AGI or ASI, AI can already produce bad outcomes including cyber espionage2 and deepfakes around elections.3 Increasing capability will make harms like these more likely and more intense.
- Work is underway to mitigate or eliminate these risks. Some of it moves alongside capability development (controlling outputs related to WMDs), some far slower (alignment with human values).
- We cannot be certain existing safety work will generalise to models nearer human intelligence, and some of the problems (see alignment) may be unsolvable, or impossible even to articulate precisely.
- These problems are more likely to be addressed given more time and more focus on safety. That could come from some combination of (a) AI framed as cooperation between companies or countries rather than a race to supremacy, (b) training and deployment agreed with multiple parties (AI companies and third-party auditors, say, or other countries) and constrained accordingly,4 or (c) AI development being paused or limited outright.5
- At the domestic level these measures would be implemented through regulation, perhaps with an overarching international treaty.
- Regulation and treaties can only be trusted to mitigate risk if an underlying suite of verification measures can test compliance with their requirements.6
- Therefore, improving the feasibility, cost-effectiveness and trustworthiness of AI verification measures increases the viability of regulation and treaties, and so may reduce AI risk.
1 Forecasting Research Institute 2026 (expert median 2050), Metaculus. Estimates move around a lot.
2 Anthropic 2025: AI autonomously executed 80–90% of a ~30-target espionage campaign.
3 Irish Times 2025: a fake RTÉ news bulletin announced the front-runner’s withdrawal and the election’s cancellation, days before the vote.
4 e.g. Trager et al. 2023, Baker et al. 2025.
5 e.g. the 2023 FLI open letter.
Where verification tools stand
Current thinking is that verification built into the chips themselves (‘on-chip’ verification) and verification done outside the computational part of a chip (‘off-chip’ verification) are both necessary for a robust system in which chips may be untrusted. On-chip is the harder near-term leg – tamper-proofing and secure hardware still need substantial development, current AI chips lack the necessary security features, and suppliers and buyers across continents do not trust one another.7 Off-chip verification is therefore the more tractable place to push now, and where I am starting my research.
Note: today’s US export-control directive forcing Anthropic to suspend Fable 5 and Mythos 5 shows things can change very quickly, motivating this work even more.