
Architecture

pwnkit is a general-purpose autonomous pentesting framework that covers LLM endpoints, web applications, npm packages, and source code. It runs AI agents in a discover-attack-verify-report pipeline. Each agent uses tools (read_file, run_command, send_prompt, save_finding) and makes multi-turn decisions, adapting its strategy to what it learns. Blind verification filters out false positives: every finding is independently re-exploited by a second agent that never sees the original reasoning.

The core pipeline has four stages:

Discover -> Attack -> Verify -> Report

These stages are grouped into two agent sessions:

1. Research agent (Discover + Attack + PoC)

A single agent session that:

  1. Discovers the attack surface — maps endpoints, detects models, identifies features, fingerprints web technologies, and enumerates exposed paths
  2. Attacks the target — crafts multi-turn attacks spanning prompt injection, jailbreaks, tool poisoning, data exfiltration (LLM), CORS misconfiguration, SSRF, XSS, path traversal, header injection (web), supply chain and malicious code analysis (npm), and vulnerability patterns (source code)
  3. Writes PoC code — produces a proof-of-concept that demonstrates each vulnerability

The research agent has access to tools like send_prompt (for LLM endpoints), read_file (for source review), run_command (for package audits and web probing), and http_request (for web app pentesting). It adapts its strategy based on what it discovers — if a naive prompt injection fails, it may try encoding bypasses, multi-turn escalation, or indirect injection. For web apps, it escalates from fingerprinting to active exploitation. For source code, it traces data flows from user input to dangerous sinks.

The second session is the verify agent, which receives only the PoC code and the file path. It never sees the research agent’s reasoning, chain of thought, or attack strategy. This is the same principle as double-blind peer review: the finding has to stand on its evidence alone.

The verify agent independently:

  • Traces data flow from the PoC
  • Attempts to reproduce the finding
  • Confirms or kills the finding

If the verify agent cannot reproduce the vulnerability, the finding is killed as a false positive. This removes unreproducible findings before they reach the report, which is the noise that plagues other scanners.
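The information barrier between the two sessions can be sketched in a few lines. Everything named here is hypothetical (the `ResearchResult` and `VerifyInput` types, `blind_handoff`, `triage`); the sketch only illustrates the principle that the verify agent's input is built from the PoC and the file path, and is not pwnkit's actual implementation.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class ResearchResult:
    """What the research agent produces (hypothetical shape)."""
    poc_code: str
    file_path: str
    reasoning: str  # chain of thought; must never reach the verifier


@dataclass(frozen=True)
class VerifyInput:
    """The only data the verify agent is given."""
    poc_code: str
    file_path: str


def blind_handoff(result: ResearchResult) -> VerifyInput:
    # Copy only the PoC and the path; the reasoning is dropped here.
    return VerifyInput(poc_code=result.poc_code, file_path=result.file_path)


def triage(results: List[ResearchResult],
           reproduce: Callable[[VerifyInput], bool]) -> List[ResearchResult]:
    """Keep the findings the verifier reproduces; kill the rest."""
    return [r for r in results if reproduce(blind_handoff(r))]
```

Because `VerifyInput` has no field for the reasoning, the verifier cannot be influenced by it even by accident. `reproduce` stands in for the whole verify agent session.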

Only confirmed findings (those that survived blind verification) are included in the final report. Output formats:

  • SARIF — for the GitHub Security tab
  • Markdown — human-readable report
  • JSON — machine-readable for pipelines

Each finding includes a severity score, category, PoC code, and remediation guidance.
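As an illustration of the JSON output, the sketch below models a finding with the four fields listed above and serializes only confirmed findings. The field names, the numeric severity, the `confirmed` flag, and the `{"findings": [...]}` envelope are all assumptions made for this example, not pwnkit's documented schema.

```python
import json
from dataclasses import asdict, dataclass
from typing import List


@dataclass
class Finding:
    """Hypothetical finding record; field names are assumed."""
    severity: float          # assumed to be a numeric score
    category: str
    poc_code: str
    remediation: str
    confirmed: bool = False  # set once blind verification succeeds


def to_json_report(findings: List[Finding]) -> str:
    """Serialize only the findings that survived verification."""
    kept = [asdict(f) for f in findings if f.confirmed]
    return json.dumps({"findings": kept}, indent=2)
```

Filtering at serialization time keeps unconfirmed findings out of every downstream consumer, whichever output format is chosen.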

The pipeline adapts its tooling and attack strategy based on the target type:

| Mode | Target | What it does |
| --- | --- | --- |
| deep | LLM endpoint URL | Prompt injection, jailbreaks, tool poisoning, data exfiltration, multi-turn escalation |
| probe | LLM endpoint URL | Lightweight surface scan of an LLM endpoint |
| web | Web application URL | CORS, headers, exposed files, SSRF, XSS, path traversal, fingerprinting |
| mcp | MCP server | Tool poisoning, schema abuse, permission escalation |
| audit | npm package name | Supply chain analysis, malicious code detection, dependency risk |
| review | Local path or GitHub URL | AI-powered source code vulnerability analysis |

The mode is auto-detected from the target when possible, or set explicitly with --mode.
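The source does not describe how auto-detection works, so the following is only a guess at what such a heuristic might look like. The function `guess_mode` and all of its rules are assumptions: a GitHub URL or an existing local path maps to review, a string shaped like an npm package name maps to audit, and any other URL is treated as ambiguous (deep, probe, and web all take URLs), which is where an explicit --mode would be needed.

```python
import os
import re
from typing import Optional
from urllib.parse import urlparse

# Shape of an npm package name, with an optional @scope/ prefix.
NPM_NAME = re.compile(r"^(@[a-z0-9~-][a-z0-9._~-]*/)?[a-z0-9~-][a-z0-9._~-]*$")


def guess_mode(target: str) -> Optional[str]:
    """Return a mode name, or None when the target is ambiguous."""
    parsed = urlparse(target)
    if parsed.scheme in ("http", "https"):
        if parsed.hostname == "github.com":
            return "review"
        # Could be an LLM endpoint or a web app; the URL alone cannot tell.
        return None
    if os.path.exists(target):
        return "review"
    if NPM_NAME.match(target):
        return "audit"
    return None
```

Returning `None` rather than guessing keeps the ambiguous cases explicit, which matches the "when possible" wording above.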

pwnkit decouples the scanning pipeline from the LLM backend through runtime adapters. Each adapter implements the same interface but connects to a different provider:

| Adapter | Backend | How it works |
| --- | --- | --- |
| ApiRuntime | OpenRouter / Anthropic / OpenAI | Direct HTTP calls to the provider’s API |
| ClaudeRuntime | Claude Code CLI | Spawns claude as a subprocess with tool definitions |
| CodexRuntime | Codex CLI | Spawns codex as a subprocess |
| GeminiRuntime | Gemini CLI | Spawns the Gemini CLI |
| McpRuntime | MCP servers | Connects to Model Context Protocol servers |
| AutoRuntime | Best available | Detects installed CLIs and picks the best per stage |

The --runtime flag selects which adapter to use. The auto runtime probes for installed CLIs and picks the most capable one for each pipeline stage (for example, using Claude for deep reasoning and the API for quick classification).
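The adapter pattern and the CLI probing can be sketched as follows. This is a simplified illustration, not pwnkit's code: the `Runtime` protocol, its `complete` method, the `CliRuntime` stub, the preference order, and the `gemini` binary name are all assumptions, and the sketch picks a single runtime rather than one per pipeline stage.

```python
import shutil
from typing import Callable, Optional, Protocol


class Runtime(Protocol):
    """Hypothetical interface shared by every adapter."""
    name: str

    def complete(self, prompt: str) -> str: ...


class CliRuntime:
    """Adapter that would spawn a CLI as a subprocess (stubbed here)."""

    def __init__(self, name: str, binary: str) -> None:
        self.name = name
        self.binary = binary

    def complete(self, prompt: str) -> str:
        # A real adapter would run self.binary; this stub only echoes.
        return f"[{self.name}] {prompt}"


class ApiRuntime:
    """Adapter that would call a provider's HTTP API (stubbed here)."""
    name = "api"

    def complete(self, prompt: str) -> str:
        return f"[api] {prompt}"


# Assumed preference order and binary names.
CLI_CANDIDATES = [("claude", "claude"), ("codex", "codex"), ("gemini", "gemini")]


def pick_runtime(which: Callable[[str], Optional[str]] = shutil.which) -> Runtime:
    """Return the first CLI found on PATH, falling back to the API."""
    for name, binary in CLI_CANDIDATES:
        if which(binary):
            return CliRuntime(name, binary)
    return ApiRuntime()
```

Passing `which` as a parameter makes the probe testable without installing any CLI; by default it uses `shutil.which` to look up each binary on the PATH.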

pwnkit integrates with the Model Context Protocol (MCP) in two ways:

As a runtime: the McpRuntime adapter connects to MCP servers and uses their exposed tools as the LLM backend for the scanning pipeline, so any MCP-compatible model server can drive a scan.

As a scan target: the --mode mcp scan mode (coming soon) will probe MCP servers for:

  • Tool poisoning — malicious tool definitions that inject instructions
  • Schema abuse — tool schemas designed to exfiltrate data
  • Permission escalation — tools that request more access than needed

The product is intentionally split into two surfaces:

  • CLI — the execution surface for local runs, CI, replay, and exports
  • Dashboard — the local verification workbench for triage, evidence review, and human sign-off

The CLI runs scans and produces findings. The dashboard consumes those findings and provides a Kanban-style board for triage, evidence inspection, and disposition tracking. Both share the same local SQLite database.
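A shared SQLite file is what lets the two surfaces stay decoupled: one writes rows, the other reads and annotates them. The sketch below illustrates that split with a made-up schema. The `findings` table, its columns, the disposition values, and all function names are assumptions for this example only.

```python
import sqlite3
from typing import Dict, List


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open the shared database and create the (hypothetical) schema."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS findings (
               id INTEGER PRIMARY KEY,
               title TEXT NOT NULL,
               disposition TEXT NOT NULL DEFAULT 'new'
           )"""
    )
    return conn


def cli_save_finding(conn: sqlite3.Connection, title: str) -> int:
    """CLI side: record a finding produced by a scan."""
    cur = conn.execute("INSERT INTO findings (title) VALUES (?)", (title,))
    conn.commit()
    return cur.lastrowid


def dashboard_set_disposition(conn: sqlite3.Connection,
                              finding_id: int, disposition: str) -> None:
    """Dashboard side: record a human triage decision."""
    conn.execute("UPDATE findings SET disposition = ? WHERE id = ?",
                 (disposition, finding_id))
    conn.commit()


def dashboard_board(conn: sqlite3.Connection) -> Dict[str, List[str]]:
    """Dashboard side: group findings into Kanban-style columns."""
    board: Dict[str, List[str]] = {}
    rows = conn.execute("SELECT disposition, title FROM findings ORDER BY id")
    for disposition, title in rows:
        board.setdefault(disposition, []).append(title)
    return board
```

In a real deployment both processes would pass the same file path to `open_db`; the in-memory default is only convenient for trying the sketch out.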

For web application pentesting, pwnkit uses a shell-first approach. Instead of routing the agent through structured tools like crawl_page, submit_form, or http_request, the web mode gives the agent a minimal tool set:

  • shell_exec — run any bash command (curl, sqlmap, python, nmap, etc.)
  • save_finding — record a confirmed vulnerability with PoC
  • done — signal completion

This works because the model already knows curl, bash pipelines, and standard pentesting tools from training data. A single curl -c cookies.txt ... | jq command replaces multiple structured tool calls and eliminates the state-threading confusion that causes agents to loop.
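To show how small the shell-first tool surface is, here is a minimal dispatcher for the three tools. Only the tool names come from the text above; the `ShellSession` class, the argument names, and the timeout are assumptions, and the sketch does no sandboxing, which a real deployment would need.

```python
import subprocess
from typing import Any, Dict, List


class ShellSession:
    """Minimal dispatcher for the three web-mode tools (hypothetical shape)."""

    def __init__(self) -> None:
        self.findings: List[Dict[str, str]] = []
        self.finished = False

    def shell_exec(self, command: str, timeout: float = 30.0) -> str:
        # Run through the shell so pipes and redirects work as typed.
        proc = subprocess.run(command, shell=True, capture_output=True,
                              text=True, timeout=timeout)
        return proc.stdout + proc.stderr

    def save_finding(self, title: str, poc: str) -> None:
        self.findings.append({"title": title, "poc": poc})

    def done(self) -> None:
        self.finished = True

    def dispatch(self, tool: str, **args: Any) -> Any:
        """Route a tool call from the agent to its handler."""
        handlers = {
            "shell_exec": self.shell_exec,
            "save_finding": self.save_finding,
            "done": self.done,
        }
        if tool not in handlers:
            raise ValueError(f"unknown tool: {tool}")
        return handlers[tool](**args)
```

The agent keeps state such as cookies and tokens in files and shell pipelines rather than in tool arguments, so the dispatcher itself stays stateless apart from the findings list.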

The structured tools (crawl_page, submit_form, http_request) are still available as optional additions, but benchmarking showed the agent performs better with just shell access. On the XBOW benchmark, the shell-first approach scored 70% (7/10) without any benchmark-specific tuning.

See the philosophy page for the full rationale behind this design decision.

Each agent has access to a set of tools depending on the scan type:

| Tool | Used by | Purpose |
| --- | --- | --- |
| read_file | Research, Verify | Read source files for code review |
| run_command | Research, Verify | Execute commands in a sandbox |
| send_prompt | Research, Verify | Send prompts to LLM endpoints |
| save_finding | Research | Record a discovered vulnerability with PoC |
| list_files | Research | Enumerate files in a directory |
| search_code | Research | Search for patterns across a codebase |
| http_request | Research, Verify | Send HTTP requests for web app pentesting |
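The per-agent tool access in the table can be expressed as a lookup that a dispatcher checks before running a call. The tool-to-agent mapping below is taken from the table; the `TOOL_ACCESS` structure and the `tools_for` and `check_call` helpers are hypothetical names for this sketch.

```python
from typing import Dict, List, Set

# Which agents may call each tool, as listed in the table.
TOOL_ACCESS: Dict[str, Set[str]] = {
    "read_file": {"research", "verify"},
    "run_command": {"research", "verify"},
    "send_prompt": {"research", "verify"},
    "save_finding": {"research"},
    "list_files": {"research"},
    "search_code": {"research"},
    "http_request": {"research", "verify"},
}


def tools_for(agent: str) -> List[str]:
    """List the tools an agent may use, sorted by name."""
    return sorted(t for t, agents in TOOL_ACCESS.items() if agent in agents)


def check_call(agent: str, tool: str) -> None:
    """Raise PermissionError if the agent is not allowed to call the tool."""
    if agent not in TOOL_ACCESS.get(tool, set()):
        raise PermissionError(f"{agent} agent may not call {tool}")
```

Note that the verify agent cannot call save_finding: it can confirm or kill a finding, but only the research agent records new ones.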