LLM-Powered Vulnerability Detection and Patching

An autonomous system that discovered 28 security vulnerabilities, including 6 zero-days, and patched 14 of them, placing 4th in DARPA's AI Cyber Challenge

[Architecture diagram of FuzzingBrain's four core services: CRS Web Service, Static Analysis, Worker Services, and Submission Service]

About Our System

🎯 Autonomous Detection

Our system automatically generates Proofs-of-Vulnerability (POVs) and produces patches for discovered security issues without human intervention.

LLM-Powered

Leverages 23 distinct LLM-based strategies across multiple frontier models from Anthropic, Google, and OpenAI for comprehensive analysis.

Massively Parallel

Deployed across ~100 VMs with thousands of concurrent threads, enabling rapid vulnerability discovery and patch generation.

Technical Approach

System Architecture

FuzzingBrain consists of four core services working in parallel; a minimal coordination sketch follows the list:

  • CRS Web Service: Central coordinator for task decomposition and fuzzer distribution
  • Static Analysis Service: Provides function metadata, reachability, and call path analysis
  • Worker Services: Execute parallel POV generation and patching strategies
  • Submission Service: Handles deduplication, SARIF validation, and bundling
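
To make the division of labor concrete, here is a minimal Python sketch of the coordinator's task fan-out. The `FuzzTask` fields and `decompose` helper are illustrative assumptions, not FuzzingBrain's actual schema.

```python
from dataclasses import dataclass

@dataclass
class FuzzTask:
    """One unit of work handed to a Worker Service.
    Field names are illustrative, not FuzzingBrain's actual schema."""
    project: str     # challenge project under analysis
    fuzzer: str      # fuzzing harness to exercise
    sanitizer: str   # e.g. "address", "memory", "undefined"
    strategy: str    # which POV-generation strategy to apply

def decompose(project: str, fuzzers: list[str], sanitizers: list[str],
              strategies: list[str]) -> list[FuzzTask]:
    # The coordinator enumerates the full (fuzzer x sanitizer x strategy)
    # matrix; Worker Services then execute the tasks in parallel.
    return [FuzzTask(project, f, s, st)
            for f in fuzzers for s in sanitizers for st in strategies]
```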

POV Generation Strategies

Delta-Scan · Full-Scan · SARIF-Based

10 LLM-based strategies for vulnerability discovery, from basic iterative refinement to advanced multi-input generation with coverage feedback.
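
As a rough illustration of how the three scan modes differ, the sketch below selects analysis targets per mode. Function and parameter names here are assumptions for illustration; the real strategies also draw on static-analysis metadata such as reachability.

```python
from typing import Iterable

def candidate_functions(mode: str,
                        all_functions: Iterable[str],
                        diff_functions: frozenset[str] = frozenset(),
                        sarif_functions: frozenset[str] = frozenset()) -> list[str]:
    # Pick analysis targets per scan mode. Purely illustrative.
    if mode == "delta":    # focus on functions touched by the challenge commit
        return [f for f in all_functions if f in diff_functions]
    if mode == "sarif":    # start from locations flagged in a SARIF report
        return [f for f in all_functions if f in sarif_functions]
    return list(all_functions)   # full scan: consider everything
```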

Patching Strategies

Multi-Model · XPatch · Path-Aware

13 patching strategies, including our novel XPatch approach, which generates patches even without POVs.
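
A minimal sketch of that no-POV pattern, assuming hypothetical `ask_llm` and `build_and_test` callables: the candidate patch is gated on building and passing regression tests rather than on replaying a crash. This is a sketch of the idea, not the actual XPatch implementation.

```python
from typing import Callable, Optional

def xpatch_attempt(function_src: str, bug_hint: str,
                   ask_llm: Callable[[str], str],
                   build_and_test: Callable[[str], bool]) -> Optional[str]:
    # Patch without a POV: propose a fix from source plus a bug description,
    # then validate via build + regression tests instead of crash replay.
    prompt = ("Potential vulnerability:\n" + bug_hint
              + "\n\nFunction under suspicion:\n" + function_src
              + "\n\nReturn a unified diff that fixes the issue "
                "while preserving intended behavior.")
    patch = ask_llm(prompt)
    return patch if build_and_test(patch) else None
```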

Key Technical Innovations

🔄 Iterative LLM Refinement

Multi-turn dialogue with structured feedback loops incorporating execution results and coverage data
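
One way to picture this loop, as a hedged Python sketch: `ask_llm` and `execute` are stand-ins for the model and sandbox interfaces, not our actual APIs.

```python
from typing import Callable, Optional

def refine_pov(harness: str, max_rounds: int,
               ask_llm: Callable[[list[dict]], str],
               execute: Callable[[str], tuple[bool, str]]) -> Optional[str]:
    # Multi-turn refinement: each failed attempt is answered with structured
    # feedback (sanitizer verdict, coverage) before reprompting.
    messages = [{"role": "user",
                 "content": f"Harness:\n{harness}\nProduce a crashing input."}]
    for _ in range(max_rounds):
        candidate = ask_llm(messages)
        crashed, feedback = execute(candidate)   # run in the sandbox
        if crashed:
            return candidate                     # POV found
        messages += [{"role": "assistant", "content": candidate},
                     {"role": "user",
                      "content": f"No crash. Feedback: {feedback}. Try again."}]
    return None
```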

🎭 Multi-Model Fallback

Resilient architecture with automatic model switching when individual LLMs fail or reach limits
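
A minimal sketch of such a fallback chain, with each model wrapped as a plain callable; names and error handling are simplified assumptions.

```python
from typing import Callable, Optional, Sequence

def with_fallback(prompt: str,
                  models: Sequence[tuple[str, Callable[[str], str]]]) -> str:
    # Try each (name, call) pair in order; rate limits, timeouts, and
    # provider outages surface as exceptions and trigger the next model.
    last_err: Optional[Exception] = None
    for _name, call in models:
        try:
            return call(prompt)
        except Exception as err:
            last_err = err    # remember the failure, fall through
    raise RuntimeError("all models failed") from last_err
```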

📊 Static/Dynamic Analysis Integration

Call paths, reachability, and real-time coverage feedback to guide vulnerability discovery
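
For example, a shortest call path from the fuzzer entry point to a suspect function can be computed and quoted in the prompt to steer input generation. A minimal BFS sketch, assuming a simple caller-to-callees map; the graph shape and the `parse_header` target name are illustrative.

```python
from collections import deque
from typing import Optional

def call_path(call_graph: dict[str, list[str]],
              entry: str, target: str) -> Optional[list[str]]:
    # BFS over a caller -> callees map: shortest call path from the fuzzer
    # entry point to a suspect function, to be quoted in the LLM prompt.
    queue, seen = deque([[entry]]), {entry}
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path
        for callee in call_graph.get(path[-1], []):
            if callee not in seen:
                seen.add(callee)
                queue.append(path + [callee])
    return None   # target not reachable from this harness

# e.g. call_path(graph, "LLVMFuzzerTestOneInput", "parse_header")
```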

Competition Results

4th place in the DARPA AI Cyber Challenge (out of 7 finalists)

  • 28 Total Vulnerabilities: discovered across real-world C and Java projects
  • 6 Zero-Day Discoveries: previously unknown vulnerabilities
  • 14 Successful Patches: validated fixes that preserve functionality
  • 60 Total Challenges: in the final competition round

Performance Insights

⏱️ Speed

Most vulnerabilities detected within the first 30 minutes of analysis

🎯 Effectiveness

LLM-based strategies discovered nearly all POVs, vastly outperforming traditional fuzzing

🏗️ Scalability

Successfully processed 50+ fuzzers per project across multiple sanitizer configurations

Research Team

  • Ze Sheng, Texas A&M University
  • Qingxiao Xu, Texas A&M University
  • Jianwei Huang, Texas A&M University
  • Matthew Woodcock, Texas A&M University
  • Heqing Huang, City University of Hong Kong
  • Alastair F. Donaldson, Imperial College London
  • Guofei Gu, Texas A&M University
  • Jeff Huang (Team Lead), Texas A&M University

Open Source & Resources

📂 FuzzingBrain CRS

Complete Cyber Reasoning System implementation with all 23 LLM-based strategies

View on GitHub →

🏆 LLM Leaderboard

Benchmark comparing state-of-the-art LLMs on vulnerability detection and patching tasks

View Leaderboard →

📄 Technical Paper

Detailed technical description of our CRS with emphasis on LLM-powered components

Read Paper →