📊 Scale Snapshot

🖥️ ~100 VMs
🧵 Thousands of Concurrent Jobs
🧠 100K+ LLM Requests
🧪 50+ Fuzzers/Project

๐Ÿ—๏ธ Four Core Services

FuzzingBrain decomposes the security workflow into four independently scalable services, each with clear contracts and idempotent operations.

🌐 CRS Web Service

Role: Central coordinator. Decomposes each challenge into 50+ fuzzer-target jobs per sanitizer configuration, tracks state, and assigns work.

Scale Tactics: Sharded queues, idempotent job tokens, and backpressure when workers saturate.
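A minimal sketch of that decomposition, with the Job and decompose names as hypothetical stand-ins: each challenge fans out into one job per fuzzer target and sanitizer, keyed by a deterministic token so retries and re-enqueues stay idempotent.

import hashlib
from dataclasses import dataclass

@dataclass(frozen=True)
class Job:
    token: str        # deterministic ID, safe to re-enqueue
    fuzzer: str
    sanitizer: str

def decompose(challenge_id: str, fuzzers: list[str], sanitizers: list[str]) -> list[Job]:
    # One job per fuzzer-target x sanitizer pair; the token is a hash of the
    # coordinates, so resubmitting the same challenge never creates duplicates.
    jobs = []
    for fuzzer in fuzzers:
        for san in sanitizers:
            key = f"{challenge_id}:{fuzzer}:{san}".encode()
            jobs.append(Job(hashlib.sha256(key).hexdigest()[:16], fuzzer, san))
    return jobs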

🔍 Static Analysis Service

Role: Precomputes reachability, call paths, and function metadata. Exposes results as JSON to keep workers fast and stateless.

Scale Tactics: Aggressive caching and timeouts on oversized projects; results reused across strategies.
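To stay stateless, workers can read those precomputed results straight off disk; a rough sketch follows, where the reachability.json layout (fuzzer entry point mapped to reachable functions) is an assumption, not the service's actual format.

import json
from functools import lru_cache
from pathlib import Path

@lru_cache(maxsize=None)
def load_reachability(analysis_dir: str) -> dict:
    # Written once by the static analysis service; cached per process so
    # repeated lookups inside a worker cost nothing.
    return json.loads(Path(analysis_dir, "reachability.json").read_text())

def reachable_functions(analysis_dir: str, fuzzer: str) -> list[str]:
    # Missing entries (e.g. analysis timed out on an oversized project)
    # degrade to an empty list instead of failing the strategy.
    return load_reachability(analysis_dir).get(fuzzer, [])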

⚡ Worker Services

Role: Execute discovery and patch strategies in parallel, each in an isolated workspace.

Scale Tactics: Per-job temp dirs, unique artifact paths, and limited concurrency per worker to avoid I/O contention.
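A sketch of the isolation discipline, with job_workspace as a hypothetical helper: each strategy run gets its own temporary directory and writes artifacts under unique paths, so parallel jobs never clobber each other's files.

import shutil
import tempfile
from contextlib import contextmanager
from pathlib import Path

@contextmanager
def job_workspace(job_id: str):
    # Unique per-job directory (e.g. /tmp/job_1234_ab12cd); removed on exit
    # so a crashed strategy cannot leak state into the next run.
    root = Path(tempfile.mkdtemp(prefix=f"job_{job_id}_"))
    try:
        (root / "artifacts").mkdir()
        yield root
    finally:
        shutil.rmtree(root, ignore_errors=True)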

📤 Submission Service

Role: Validates and deduplicates POVs/patches, bundles SARIF, and prepares submissions.

Scale Tactics: Bloom-style fast checks + deep validation on candidates; multi-LLM consensus for near-duplicate detection.
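A rough sketch of the two-stage check, assuming a cheap fingerprint over the top sanitizer crash frames stands in for the Bloom-style filter: only candidates that pass the fast membership test proceed to deep validation and, for near-duplicates, multi-LLM consensus.

import hashlib

class Deduper:
    def __init__(self):
        self._seen: set[str] = set()    # stand-in for the Bloom-style filter

    def fingerprint(self, crash_frames: list[str]) -> str:
        # Cheap, order-sensitive digest of the top crash frames.
        return hashlib.sha1("|".join(crash_frames).encode()).hexdigest()

    def is_new(self, crash_frames: list[str]) -> bool:
        fp = self.fingerprint(crash_frames)
        if fp in self._seen:
            return False                # fast path: signature already handled
        self._seen.add(fp)
        return True                     # proceed to deep validation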

โฑ๏ธ Scheduling for Throughput

1. Shard by Fuzzer × Sanitizer

Splitting jobs along the fuzzer-target × sanitizer axes balances CPU-bound compilation against I/O-bound LLM calls.

2. Idempotent Job Tokens

Jobs can be retried or stolen without double-submission; workers record atomic checkpoints (see the sketch after this list).

3. Backpressure & Timeouts

Adaptive concurrency caps and exponential backoff prevent model rate-limit cascades and queue explosions.
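A minimal sketch of the idempotent-token idea from step 2, assuming a shared store with an atomic set-if-absent primitive (modeled here with a dict behind a lock): any worker may claim or retry a job, but only one claim wins, and checkpoints are recorded under the job token so stolen work resumes rather than resubmits.

import threading

class JobLedger:
    # Stand-in for a shared store with atomic set-if-absent semantics.
    def __init__(self):
        self._lock = threading.Lock()
        self._claims: dict[str, str] = {}
        self._checkpoints: dict[str, str] = {}

    def claim(self, token: str, worker: str) -> bool:
        # Exactly one worker wins the claim; retries by the winner are no-ops.
        with self._lock:
            return self._claims.setdefault(token, worker) == worker

    def checkpoint(self, token: str, stage: str) -> None:
        # Progress is keyed by token, so a re-run picks up where the job left off.
        with self._lock:
            self._checkpoints[token] = stage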

🎭 Multi-Model Orchestration

Routing & Fallback

class LLMRouter:
    # Preference-ordered model aliases; call_model(), backoff(), and the
    # RateLimit / Overload exceptions are provided elsewhere in the CRS.
    MODELS = ["claude", "gpt", "gemini"]

    async def call(self, prompt, validate):
        # Walk the fallback chain: the first model whose output passes the
        # caller-supplied validate() check wins.
        for name in self.MODELS:
            try:
                out = await call_model(name, prompt)
                if validate(out):
                    return out
            except (RateLimit, Overload):
                # Provider throttled or overloaded: wait, then try the next model.
                await backoff()
                continue
        raise RuntimeError("All models failed")

This simple pattern becomes non-trivial at scale; backoff and per-model quotas avoid failure cascades.
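One way to express the per-model quotas is a semaphore per provider wrapped around the router's call; a sketch with static limits, where the numbers are illustrative and would in practice be tuned from observed rate limits.

import asyncio

QUOTAS = {"claude": 8, "gpt": 8, "gemini": 4}    # illustrative max in-flight calls
_slots = {name: asyncio.Semaphore(n) for name, n in QUOTAS.items()}

async def with_quota(name: str, make_call):
    # Bound concurrency per provider so one saturated model cannot push a
    # stampede of fallback traffic onto the next one.
    async with _slots[name]:
        return await make_call()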

Validation Gates

Workers treat LLM outputs as untrusted: compile, run under sanitizers, and verify POV negation for patches before promotion.
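A sketch of that gate sequence, with the build, sanitizer run, and POV replay injected as callables because the real plumbing is project-specific: a patch is promoted only if it compiles, stays clean under sanitizers, and makes the original POV stop reproducing.

def promote_patch(patch, pov, build, run_sanitized, reproduces) -> bool:
    # build / run_sanitized / reproduces stand in for the real build system,
    # sanitizer harness, and POV replay.
    if not build(patch):
        return False        # gate 1: must compile
    if not run_sanitized(patch):
        return False        # gate 2: no sanitizer findings on the test corpus
    if reproduces(patch, pov):
        return False        # gate 3: the original POV must be negated
    return True             # only now is the patch promoted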

Observability

Per-model success rates, token costs, and latency distributions drive dynamic routing and cost-aware throttling.
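A minimal sketch of that bookkeeping, assuming nothing about the real telemetry stack: a rolling record of successes, latencies, and token cost per model is enough to bias routing toward whichever model is currently cheapest and most reliable.

from collections import defaultdict
from statistics import median

class ModelStats:
    def __init__(self):
        self._calls = defaultdict(list)    # model -> [(ok, latency_s, cost_usd)]

    def record(self, model: str, ok: bool, latency_s: float, cost_usd: float):
        self._calls[model].append((ok, latency_s, cost_usd))

    def summary(self, model: str) -> dict:
        rows = self._calls[model]
        if not rows:
            return {"success_rate": 0.0, "p50_latency_s": 0.0, "cost_usd": 0.0}
        return {
            "success_rate": sum(ok for ok, _, _ in rows) / len(rows),
            "p50_latency_s": median(lat for _, lat, _ in rows),
            "cost_usd": sum(cost for _, _, cost in rows),
        }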

🛠️ Hard-Learned Scale Lessons

Process Isolation Prevents Races

Unique per-job paths (/tmp/job_{id}/...) eliminated cross-strategy file clobbering and nondeterminism.

Locks Are Not a Silver Bullet

We removed coarse locks in favor of lock-free maps and message passing to avoid deadlocks during peak submission windows.
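The shape of that change, sketched with the standard library only: producers enqueue candidate submissions and a single consumer owns the mutable state, so there is no shared map to lock and nothing to deadlock on. The handler name is hypothetical.

import queue
import threading

submissions: "queue.Queue[dict]" = queue.Queue()

def handle_submission(item: dict) -> None:
    # Hypothetical placeholder for dedup + validation + upload.
    pass

def submitter_loop():
    # Single consumer owns submission state; producers only enqueue.
    while True:
        item = submissions.get()
        if item is None:              # sentinel: shut down cleanly
            break
        handle_submission(item)

threading.Thread(target=submitter_loop, daemon=True).start()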

Static Analysis Must Be Cached

Precomputing call graphs and reachability shaved minutes per job and made performance predictable across VMs.

Backoff Beats Fallback Storms

Without exponential backoff, rate-limit bursts on one model stampede the next. Adaptive caps stabilized throughput.
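A sketch of the backoff itself, with the base delay and cap as illustrative values rather than the ones used in competition; full jitter keeps throttled workers from retrying in lockstep.

import asyncio
import random

async def backoff(attempt: int, base: float = 1.0, cap: float = 30.0) -> None:
    # Exponential growth with full jitter: the delay rises with the attempt
    # count but is randomized so workers hitting the same rate limit spread out.
    delay = min(cap, base * (2 ** attempt))
    await asyncio.sleep(random.uniform(0, delay))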

🚀 From AIxCC to Real-World Workloads

CI/CD Integration

security_scan:
  - static_analysis: precompute
  - llm_discovery: parallel_strategies
  - patch_generation: consensus
  - verification: pov_negation + regression
  - deploy: gated

The same decomposition scales to monorepos and nightly scans.

Cost Controls

Token budgets and model tiers per strategy keep API costs manageable under load.
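One concrete way to encode this, with the numbers purely illustrative: each strategy is assigned a model tier and a token budget, and calls are refused once the budget is spent.

# Illustrative budgets; real values depend on strategy value and load.
STRATEGY_BUDGETS = {
    "seed_generation":  {"tier": "small", "max_tokens": 200_000},
    "crash_triage":     {"tier": "large", "max_tokens": 500_000},
    "patch_generation": {"tier": "large", "max_tokens": 1_000_000},
}

def within_budget(strategy: str, tokens_used: int) -> bool:
    # Checked before every LLM call; over-budget strategies are throttled
    # instead of silently burning spend.
    return tokens_used < STRATEGY_BUDGETS[strategy]["max_tokens"]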

Reproducibility

Seeded runs and artifact bundles (inputs, logs, patches) make results auditable for security review.
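A sketch of what such a bundle can look like, with the layout as an assumption: the seed and input list that produced a result are written next to its log and patch, so a reviewer can replay the run later.

import json
import time
from pathlib import Path

def write_bundle(out_dir: str, seed: int, inputs: list[str], patch: str, log: str) -> Path:
    # Everything needed to replay and audit one result lives in one directory.
    bundle = Path(out_dir) / f"bundle_{int(time.time())}"
    bundle.mkdir(parents=True)
    (bundle / "manifest.json").write_text(json.dumps({"seed": seed, "inputs": inputs}))
    (bundle / "patch.diff").write_text(patch)
    (bundle / "run.log").write_text(log)
    return bundle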

🔍 Explore the Architecture

Our open-source CRS demonstrates this architecture end-to-end, from job scheduling to patch validation.

Validated in competition: thousands of concurrent jobs, robust outputs, predictable costs.