CyberGym: Evaluating AI Agents' Real-World Cybersecurity Capabilities at Scale

Wang, Zhun; Shi, Tianneng; He, Jingxuan; Cai, Matthew; Zhang, Jialin; Song, Dawn

Computer Science > Cryptography and Security

arXiv:2506.02548 (cs)

[Submitted on 3 Jun 2025 (v1), last revised 24 Mar 2026 (this version, v3)]

Abstract: AI agents have significant potential to reshape cybersecurity, making a thorough assessment of their capabilities critical. However, existing evaluations fall short because they rely on small-scale benchmarks and measure only static outcomes, failing to capture the full, dynamic range of real-world security challenges. To address these limitations, we introduce CyberGym, a large-scale benchmark featuring 1,507 real-world vulnerabilities across 188 software projects. Adjustable to different vulnerability analysis settings, CyberGym primarily tasks agents with generating a proof-of-concept test that reproduces a vulnerability, given only its text description and the corresponding codebase. Our extensive evaluation highlights that CyberGym effectively differentiates agents' and models' cybersecurity capabilities. Even the top-performing agent-model combinations achieve only a ~20% success rate, demonstrating the overall difficulty of CyberGym. Beyond static benchmarking, we show that CyberGym leads to the discovery of 34 zero-day vulnerabilities and 18 historically incomplete patches. These results underscore that CyberGym is not only a robust benchmark for measuring AI's progress in cybersecurity but also a platform for creating direct, real-world security impact.
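
To make the proof-of-concept task format concrete, the following Python sketch shows one way a grader could check a submitted PoC. It assumes (this is not stated in the abstract) that a PoC counts as reproducing the vulnerability when it crashes the vulnerable version of the target but not the patched version. The `Task` type, the toy parsers, and every function name here are hypothetical, and Python exceptions stand in for the sanitizer-detected crashes a real harness would observe in compiled projects.

```python
from dataclasses import dataclass
from typing import Callable


# Hypothetical task record. In this sketch the agent would see only
# `description` (plus the codebase); the two callables belong to the grader.
@dataclass(frozen=True)
class Task:
    task_id: str
    description: str
    vulnerable: Callable[[bytes], None]  # pre-patch target
    patched: Callable[[bytes], None]     # post-patch target


def crashes(target: Callable[[bytes], None], poc: bytes) -> bool:
    """Run `target` on the PoC input; any uncaught exception counts as a crash.

    Assumption: a real harness would instead run a sanitizer-instrumented
    binary and inspect its exit status and crash report.
    """
    try:
        target(poc)
    except Exception:
        return True
    return False


def reproduces(task: Task, poc: bytes) -> bool:
    # Assumed success criterion: crash before the fix, no crash after it.
    return crashes(task.vulnerable, poc) and not crashes(task.patched, poc)


def success_rate(results: dict[str, bool]) -> float:
    # Fraction of tasks whose submitted PoC reproduced the vulnerability.
    return sum(results.values()) / len(results) if results else 0.0


# Toy target: the first byte declares a payload length that the buggy
# version trusts without checking how many bytes actually follow.
def vulnerable_parse(data: bytes) -> None:
    if not data:
        return
    n = data[0]
    payload = data[1 : 1 + n]
    for i in range(n):  # bug: may index past the end of payload
        _ = payload[i]


def patched_parse(data: bytes) -> None:
    if not data:
        return
    n = data[0]
    payload = data[1 : 1 + n]
    for i in range(min(n, len(payload))):  # fix: bound by the real length
        _ = payload[i]


task = Task(
    "toy-1",
    "Out-of-bounds read when the declared length exceeds the payload.",
    vulnerable_parse,
    patched_parse,
)

print(reproduces(task, bytes([5, 1, 2])))  # True: only the vulnerable build crashes
print(reproduces(task, bytes([2, 1, 2])))  # False: well-formed input crashes neither
print(success_rate({"toy-1": True, "toy-2": False}))  # 0.5
```

Checking the patched build as well as the vulnerable one is what separates a PoC that triggers the specific described bug from an input that merely crashes the program for some unrelated reason; a benchmark-wide success rate is then the fraction of tasks whose PoC passes this check.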

Subjects:	Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Cite as:	arXiv:2506.02548 [cs.CR]
	(or arXiv:2506.02548v3 [cs.CR] for this version)
	https://doi.org/10.48550/arXiv.2506.02548

Submission history

From: Zhun Wang
[v1] Tue, 3 Jun 2025 07:35:14 UTC (315 KB)
[v2] Wed, 8 Oct 2025 06:32:58 UTC (604 KB)
[v3] Tue, 24 Mar 2026 00:56:07 UTC (658 KB)
