🧠 Strategy Overview
10 Discovery Strategies
Delta-scan and full-scan modes, SARIF-guided refinement, call-path targeting, and input synthesis blended with sanitizers and coverage to produce fast, verifiable POVs.
13 Patching Strategies
From minimal, path-aware guards to structural refactors and XPatch (patching without a POV), all gated by compile, test, and POV-negation checks.
Multi-Model Orchestration
Routing and fallback across Anthropic, OpenAI, and Google models, with quotas, backoff, and success-rate tracking to avoid cascade failures.
🔍 Discovery Strategies (10)
Discovery aims to produce a robust proof-of-vulnerability (POV). Strategies combine static signals (SARIF, call graphs, reachability) with dynamic feedback (sanitizers, coverage) to steer LLMs toward executable triggers.
1) Delta-Scan (Patch Diff Focus)
Prioritize files and functions touched by recent changes. Parse diffs, map to call paths, and have the LLM hypothesize likely CWE classes and inputs that traverse the modified path.
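As a minimal sketch of the diff-to-target step (the function name and diff content below are illustrative, not from our pipeline), git's default hunk headers already carry the enclosing function name, which is often enough to seed call-path analysis:

```python
import re

def touched_functions(diff_text):
    """Map a unified diff to (file, hunk-context) pairs; the text after
    the second @@ is the enclosing declaration in git's default output."""
    targets = []
    current_file = None
    for line in diff_text.splitlines():
        if line.startswith("+++ b/"):
            current_file = line[len("+++ b/"):]
        elif line.startswith("@@") and current_file:
            m = re.match(r"@@ [^@]+ @@\s*(.*)", line)
            targets.append((current_file, m.group(1) if m else ""))
    return targets

diff = """\
+++ b/src/png.c
@@ -10,6 +10,7 @@ int png_read_chunk(struct reader *r)
+    len = read_u32(r);
"""
print(touched_functions(diff))
# [('src/png.c', 'int png_read_chunk(struct reader *r)')]
```

Each (file, function) pair then becomes the focus of a prompt asking the LLM to hypothesize CWE classes and inputs reaching that code.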
2) Full-Scan (Hotspot Ranking)
Rank all files using heuristics (unsafe APIs, complexity, historical bug density). Use LLM to draft targeted test harnesses per hotspot with auto-build/run loops.
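A toy version of such a ranking heuristic might look like the following (the weights, API list, and `hotspot_score` name are illustrative assumptions, not our tuned values):

```python
import re

UNSAFE_APIS = ("strcpy", "sprintf", "memcpy", "alloca", "gets")

def hotspot_score(source, churn=0):
    """Cheap heuristic: unsafe-API hits, a branch-count complexity proxy,
    and recent churn (e.g., commit count) combined into one sortable score."""
    unsafe = sum(source.count(api) for api in UNSAFE_APIS)
    branches = len(re.findall(r"\b(if|for|while|case)\b", source))
    return 5 * unsafe + branches + 2 * churn

files = {
    "png.c": "void f(char *d, char *s) { if (s) strcpy(d, s); }",
    "util.c": "int add(int a, int b) { return a + b; }",
}
ranked = sorted(files, key=lambda f: hotspot_score(files[f]), reverse=True)
print(ranked)  # ['png.c', 'util.c']
```

Only the top-ranked hotspots get harness drafts, which keeps the LLM budget concentrated where bugs are likeliest.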
3) SARIF-Guided Refinement
Ingest SARIF from static analyzers. For each finding, ask the LLM to convert the warning into a runnable POV with concrete inputs, then validate under ASAN/UBSAN.
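Flattening SARIF into prompt-ready tuples is straightforward with the standard 2.1.0 object model; a minimal sketch (error handling omitted, single-location findings assumed):

```python
import json

def sarif_findings(sarif_text):
    """Flatten SARIF 2.1.0 results into (ruleId, file, line, message)
    tuples ready to embed in an LLM prompt."""
    doc = json.loads(sarif_text)
    out = []
    for run in doc.get("runs", []):
        for res in run.get("results", []):
            loc = res["locations"][0]["physicalLocation"]
            out.append((
                res.get("ruleId"),
                loc["artifactLocation"]["uri"],
                loc["region"]["startLine"],
                res["message"]["text"],
            ))
    return out

sarif = json.dumps({"runs": [{"results": [{
    "ruleId": "cpp/unbounded-write",
    "message": {"text": "memcpy with unchecked length"},
    "locations": [{"physicalLocation": {
        "artifactLocation": {"uri": "src/png.c"},
        "region": {"startLine": 42}}}],
}]}]})
print(sarif_findings(sarif))
```

Each tuple anchors a prompt of the form "produce a concrete input that makes line 42 of src/png.c fault under ASAN".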
4) Call-Path Targeting
Generate candidate input shapes that traverse specific call sequences to the sink. LLM reasons about required invariants and state to reach the vulnerable site.
5) Taint-Spot Exploration
Surface user-controlled data flows (CLI args, HTTP params, file parsers). LLM proposes minimally valid inputs that survive parsing and reach memory-unsafe operations.
6) Sanitizer-Driven Generalization
Cluster sanitizer crashes and let the LLM generalize a stable repro from noisy stack traces. Convert flaky crashes into deterministic POVs.
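One simple clustering key, sketched below, hashes the top in-app frames of each report while ignoring addresses and sanitizer runtime frames (the `crash_signature` helper and the sample traces are illustrative):

```python
import hashlib
import re

def crash_signature(stack_trace, top_n=3):
    """Bucket sanitizer reports by their top in-app frames, ignoring
    addresses and ASan runtime frames, so variants of the same bug
    land in one cluster."""
    frames = re.findall(r"#\d+ 0x[0-9a-f]+ in (\S+)", stack_trace)
    frames = [f for f in frames if not f.startswith("__asan")][:top_n]
    return hashlib.sha1("|".join(frames).encode()).hexdigest()[:12]

a = "#0 0xdeadbeef in __asan_memcpy\n#1 0x1234 in png_read_chunk\n#2 0x5678 in png_decode"
b = "#0 0xfeedface in __asan_memcpy\n#1 0x9999 in png_read_chunk\n#2 0x7777 in png_decode"
print(crash_signature(a) == crash_signature(b))  # True: same bucket
```

Within each bucket, the LLM sees a handful of representative traces and is asked for the shared root cause plus one stable input.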
7) Grammar-Guided Input Synthesis
Ask the LLM to emit a minimal grammar/schema for inputs (e.g., PNG, JSON). Mutate within the grammar to preserve reachability while stressing edge counts.
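To make the idea concrete, here is a toy grammar for a length-prefixed record; mutations stay inside the grammar so parsers accept the input and reachability is preserved (the format itself is invented for illustration):

```python
import random

def gen_record(rng, payload_len=None):
    """Emit one grammar-valid record: magic byte, length, payload."""
    n = payload_len if payload_len is not None else rng.randint(0, 255)
    payload = bytes(rng.randrange(256) for _ in range(n))
    return bytes([0x7F, n]) + payload

rng = random.Random(0)
corpus = [gen_record(rng) for _ in range(3)]
# Stress boundary lengths explicitly while keeping valid structure:
edges = [gen_record(rng, payload_len=k) for k in (0, 1, 255)]
print(all(r[0] == 0x7F and r[1] == len(r) - 2 for r in corpus + edges))  # True
```

The same pattern scales up: the LLM emits the generator for PNG or JSON, and the harness mutates only the free variables the grammar exposes.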
8) Coverage-Loop Refinement
Instrument harnesses, report missed branches back to the LLM, and request inputs that flip specific predicates or increase rare-edge hit counts.
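The loop reduces to "run, diff against full branch set, ask for the missing predicate"; a self-contained toy (the branch-recording `toy_program` stands in for an instrumented harness):

```python
def run_with_coverage(inputs, program):
    """Execute a branch-recording program over a corpus; return the
    union of covered branch IDs."""
    covered = set()
    for x in inputs:
        covered |= program(x)
    return covered

def toy_program(x):
    hit = {"entry"}
    if x > 10:
        hit.add("gt10")
        if x % 2 == 0:
            hit.add("gt10_even")
    return hit

ALL_BRANCHES = {"entry", "gt10", "gt10_even"}
corpus = [3, 11]
missed = ALL_BRANCHES - run_with_coverage(corpus, toy_program)
print(missed)  # {'gt10_even'} -> ask the model for an even input > 10
corpus.append(12)  # model-proposed input flipping the missed predicate
print(ALL_BRANCHES - run_with_coverage(corpus, toy_program))  # set()
```

In practice the "missed" set is a list of source locations and predicates, which gives the LLM a far tighter target than raw fuzzing feedback.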
9) Exception-Mining (Java)
Exploit language-level stack traces and messages to shortcut to faulting APIs. Request inputs that transform a handled exception into a crash or integrity violation.
10) Pattern Replay
Leverage a library of historical bug patterns (e.g., off-by-one in image decoders). LLM adapts known triggers to the current codebase with type- and path-aware tweaks.
Design Notes
Discovery treats model output as untrusted. Every candidate is compiled and executed under sanitizers; only deterministic repros graduate to POVs. Details in our technical report.
🩹 Patching Strategies (13)
Patches must compile, negate the POV, and preserve functionality. We bias toward minimal, auditable diffs unless the LLM justifies a larger refactor. Each strategy runs through the same validation gates.
1) Minimal Guard
Add precondition checks (bounds, null, state) at the faulting site with early returns or error codes.
// Before
memcpy(dst, src, len);
// After
if (len > dst_size) return ERR_INVALID_SIZE;
memcpy(dst, src, len);
2) Path-Aware Fix
Harden only the failing path: guard specific states along the call-chain that lead to the sink; avoid broad behavioral changes.
3) Size-Checked Copy
Replace unsafe copies with bounded variants (`strncpy`, `memcpy_s`) or explicit length checks with clear error handling.
4) Input Validation
Enforce strict parsing and reject malformed structures early (magic bytes, lengths, indices, state machines).
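A sketch of this reject-early shape for a PNG-like header (the `validate_png_header` helper is illustrative; a real patch would live in the target's own language and parser):

```python
import struct

PNG_MAGIC = b"\x89PNG\r\n\x1a\n"

def validate_png_header(data):
    """Reject malformed input before any decoding: check magic bytes,
    then the first chunk's declared length against what is present."""
    if len(data) < 16 or not data.startswith(PNG_MAGIC):
        return False
    (length,) = struct.unpack(">I", data[8:12])
    if length > 0x7FFFFFFF:  # PNG spec caps chunk length at 2^31 - 1
        return False
    # 8 = signature, 12 = length/type/CRC fields of the first chunk
    return len(data) >= 8 + 12 + length

good = PNG_MAGIC + struct.pack(">I", 0) + b"IHDR" + b"\x00" * 4
print(validate_png_header(good))        # True
print(validate_png_header(b"MZ junk"))  # False
```

Validating lengths and indices at the boundary means downstream code can assume well-formed structure instead of re-checking everywhere.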
5) Signedness & Overflow
Normalize types, add overflow checks on arithmetic, and clamp to safe ranges before allocations or indexing.
6) Resource Safety
Fix leaks and double-frees by clarifying lifetime rules; prefer RAII/`defer`-like scopes where available.
7) Concurrency Guard
Introduce minimal synchronization (atomic flags, fine-grained locks) to eliminate races causing memory corruption or TOCTOU.
8) Defensive Defaults
On parser or API failure, return safe defaults rather than partially initialized structures.
9) API-Level Replacement
Swap to safer APIs (e.g., `snprintf` over `sprintf`), or centralize validation in a wrapper used across call sites.
10) State Machine Tightening
For complex formats, enforce valid transitions and terminal states to prevent invalid memory access.
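A minimal sketch of an explicit transition table for a toy chunked format (states and events invented for illustration); anything outside the table is rejected instead of silently proceeding with stale state:

```python
TRANSITIONS = {
    "start":  {"header"},
    "header": {"data", "end"},
    "data":   {"data", "end"},
    "end":    set(),
}

def check_sequence(events):
    """Walk the parser's event stream; raise on any illegal transition
    and require a proper terminal state."""
    state = "start"
    for ev in events:
        if ev not in TRANSITIONS[state]:
            raise ValueError(f"illegal transition {state} -> {ev}")
        state = ev
    return state == "end"

print(check_sequence(["header", "data", "data", "end"]))  # True
try:
    check_sequence(["data"])  # data before header: rejected
except ValueError as e:
    print(e)
```

Making the legal transitions a data structure also makes the patch auditable: a reviewer can compare the table against the format's specification directly.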
11) Spec-Conformant Refactor
Where minimal guards aren't enough, perform small refactors that align with spec rules while preserving public APIs.
12) Regression-Aware Patch
Augment patches with new unit tests derived from the POV and near-miss inputs to prevent reintroduction.
13) XPatch (No-POV Fix)
When a POV cannot be produced, synthesize a patch from high-confidence static findings plus local invariants; validate by negative testing and coverage invariants.
// Example (Java): safer length check
public byte[] read(byte[] buf, int len) {
    if (buf == null || len < 0 || len > buf.length) {
        throw new IllegalArgumentException("invalid length");
    }
    // ... existing logic ...
}
🏗️ Orchestration & Validation Gates
LLM Router
class LLMRouter:
    MODELS = ["claude", "gpt", "gemini"]

    async def call(self, prompt, validate):
        for name in self.MODELS:
            try:
                out = await call_model(name, prompt)
                if validate(out):
                    return out
            except (RateLimit, Overload):
                await backoff()
                continue
        raise RuntimeError("All models failed")
Per-model quotas, exponential backoff, and success-rate telemetry avoid stampede failures and control cost.
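The `backoff()` awaited above is left abstract; one common choice, sketched here under the illustrative name `backoff_delay`, is full-jitter exponential backoff:

```python
import random

def backoff_delay(attempt, base=1.0, cap=60.0, rng=random.random):
    """Full-jitter exponential backoff: the ceiling grows 1s, 2s, 4s, ...
    up to `cap`, with uniform jitter to de-synchronize retrying workers."""
    return rng() * min(cap, base * (2 ** attempt))

# With jitter pinned to 1.0, the ceilings are visible directly:
delays = [backoff_delay(a, rng=lambda: 1.0) for a in range(8)]
print(delays)  # [1.0, 2.0, 4.0, 8.0, 16.0, 32.0, 60.0, 60.0]
```

Jitter matters under shared provider quotas: without it, every worker that hit the same rate limit retries at the same instant and trips it again.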
Validation Gates
- Compile under sanitizers; reject non-deterministic crashes
- POV must be deterministic (N ≥ 3 runs)
- Patch must negate POV and pass regression
- Cost/latency budgets enforced per strategy
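The determinism gate reduces to a small check, sketched here with a stub in place of "execute harness under ASAN, return crash signature or None" (names are illustrative):

```python
def deterministic_pov(run_crash, candidate, runs=3):
    """Gate: a candidate input graduates to a POV only if it crashes
    with the same signature on every one of `runs` executions."""
    sigs = {run_crash(candidate) for _ in range(runs)}
    return len(sigs) == 1 and None not in sigs

# Stub standing in for a sanitized harness execution:
stub = lambda x: "heap-buffer-overflow@png_read_chunk" if x == b"boom" else None
print(deterministic_pov(stub, b"boom"))    # True
print(deterministic_pov(stub, b"benign"))  # False
```

Requiring an identical signature across runs, not merely "crashed each time", is what filters out flaky, state-dependent repros.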
Reproducibility
Per-job directories (/tmp/job_{id}/...) and artifact bundles (inputs, logs, patches) make results auditable, aligning with our report.
🧪 What Worked (and What Didn't)
Start Minimal
Minimal guards neutralize most memory errors without side effects; escalate to refactors only when justified by failing regression tests.
Cache Static Facts
Precomputed call graphs and symbol maps make strategy runs predictable and cheap at scale.
Feedback Loops Matter
Coverage- and sanitizer-driven hints dramatically reduce LLM trial-and-error during discovery.
POV-First, Except When Not
XPatch salvages high-confidence findings when a deterministic POV is elusive, but requires tighter negative tests.
📚 Dive Deeper
The full system, data, and ablations are documented in our technical report and open-source CRS.