$2,500. 5 days. PDF report with reproducible PoCs. Solo founder, paid CVE researcher, no SDR follow-up.
AI agents in production leak system prompts on the first try. RAG pipelines ingest poisoned docs that quietly rewrite their behavior. Function-calling LLMs get jailbroken into running tools they shouldn't.
Most teams ship LLM features without a single adversarial test, because the big consultancies (Bishop Fox, NCC, Trail of Bits) charge $50K+ for an LLM engagement and book 8 weeks out.
The result: most production LLM apps are wide open and the founders don't know it.
12 attack categories, 286 detectors, real PoCs.
Executive summary (one page, non-technical), finding-by-finding details with reproducible PoCs (curl/python), CVSS-style severity, concrete fix recommendations.
Walk through findings with your team. Q&A. Prioritization help. No slides, just your report on screen.
Every finding ships with a reproducible PoC. No theoretical "this could happen" filler. If I can't reproduce it, it's not in the report.
During the engagement and 2 weeks after delivery. You email me, I reply.
Three tiers. Pay direct or book a call first.
Type-confusion bug in the GGUF parser. Reported, fixed upstream, $4,000 bounty paid. Real exploitation, real fix, real check cleared.
In-house scanner that runs on every engagement. Same toolchain that found the GGUF bug. Maintained continuously as new attack classes appear.
Every finding gets a reproducible PoC synthesized by an LLM, then validated manually. You're not getting a Burp-Suite report dressed up.
Solo founder, direct line. You email me, I reply. No account managers, no SDR follow-up cadence.
You get the report anyway, documenting exactly what was tested and why each test didn't yield a finding. That report is itself useful for SOC2/ISO27001 evidence. Refunds: Basic — no refund (the testing was done). Pro/Enterprise — 50% refund if zero findings of medium severity or higher (this has not happened yet).
Yes. NDA signed before kickoff. All testing happens against endpoints you provide. No data leaves my workstation except the final PDF. Test artifacts (logs, payloads) deleted 30 days after delivery unless you ask me to keep them. Encrypted MacBook, FileVault, no cloud sync of engagement folders.
Those are platforms — you self-serve, you interpret results. This is a service — I do the work, hand you a report, walk you through it. Their automated scans miss the stack-specific stuff (your custom system prompt, your specific RAG pipeline, your tool surface). I find that because I'm a human looking at your specific app, not running you through a generic scanner.
Anything that exposes an HTTP endpoint or a chat interface. Tested: OpenAI, Anthropic, Google, Mistral, local llama.cpp/Ollama, vLLM, LangChain, LlamaIndex, Haystack, AutoGen, CrewAI, MCP servers, custom orchestrators. Exotic stack? Ask before booking.
Yes, mutual NDA before any technical conversation. I'll sign yours, or send mine. Either works.
I default to staging. If you only have prod, I rate-limit aggressively (≤1 req/sec) and skip destructive tests. We agree on the scope in writing before kickoff.
Basic: 3 business days from kickoff. Pro: 5-7 business days. Enterprise: 2-3 weeks. I usually have one slot open for next week — book the discovery call to check.
Pro and Enterprise: yes, within 60 days. Basic: no, but the re-test is $1,000 flat (one focused pass on the previously found issues).
Yes — add-on workshop, 2 hours over Zoom, $1,500. Hands-on with real payloads against a sandbox. Best booked alongside Pro or Enterprise.
Yes — $2,500/month retainer. I re-scan on every model swap, every prompt change you push, and any time a new public jailbreak/CVE drops that affects your stack. Includes a monthly 30-min sync.
Book a 30-min discovery call (free, no slides) or pay direct and I'll send the kickoff doc within an hour.
Or email hello@owlmind.dev