Security Forem

Cover image for Your AI Stack Is Already Being Exploited. You Just Don't Know It Yet.
Walid Ladeb
Walid Ladeb

Posted on

Your AI Stack Is Already Being Exploited. You Just Don't Know It Yet.

How ARCADA audits the attack surface most security tools don't even know exists.

01 — THE PROBLEM
The security tools you trust weren't built for this.
In 2024, a researcher at a Fortune 500 company discovered a backdoor in a popular Python package. It had been there for 14 months. The existing SAST tools found nothing. The code reviewers saw nothing. The CI pipeline passed every check. The package had been downloaded over 40 million times.

This wasn't a zero-day exploit or a nation-state attack. It was a malicious setup.py hook that executed at install time, exfiltrating environment variables to a remote server. The kind of attack that's been in the attacker playbook for years but that traditional security tooling systematically misses.

The gap
Tools like Bandit, Semgrep, and Snyk are excellent at what they were built for: finding CVEs in known libraries and flagging dangerous patterns in application code. But the AI ecosystem has introduced an entirely new attack surface one that didn't exist when those tools were designed.

Consider what a modern AI application actually looks like: LLM API calls with user-controlled prompts. Agent frameworks executing tools autonomously. RAG pipelines ingesting untrusted documents. Fine-tuning pipelines writing to training datasets. Model weights loaded from arbitrary sources. A supply chain of Python packages, each with install hooks, that runs with full system privileges.

None of these are application-layer vulnerabilities in the traditional sense. They're trust boundary violations places where an attacker can inject data that gets interpreted as instructions, or exfiltrate data through channels that look like normal operation.

That's the problem ARCADA was built to solve.

02 — THE THREAT LANDSCAPE
What attackers are actually doing right now.
Before diving into ARCADA, it's worth being concrete about what the AI security threat landscape looks like in practice because most developers significantly underestimate it.

Supply chain attacks via install hooks
When you run pip install anything, Python executes the package's setup.py with your full user privileges. A malicious package can read your entire environment, exfiltrate SSH keys, API keys, and tokens, and establish persistence all before your application runs a single line of code. The 2023 PyTorch-nightly incident compromised thousands of developer machines exactly this way.

Prompt injection at scale
An LLM that processes untrusted user input or retrieves documents from an external source can be made to ignore its system prompt, leak its context window, or execute unintended tool calls. This isn't theoretical. In 2024, researchers demonstrated prompt injection attacks against production chatbots at major banks, healthcare providers, and SaaS companies. The attack surface is every input path to your LLM.

Trojan Source and homoglyph attacks
A Cyrillic а looks identical to a Latin a in every editor and code review tool. Attackers can substitute characters in function names, variable names, or string literals to create code that looks correct to human reviewers but behaves differently at runtime. This class of attack, documented in the 2021 Trojan Source paper, is increasingly used in targeted supply chain attacks against AI infrastructure teams.

Model weight backdoors
PyTorch model files are serialized with Python's pickle module. A malicious .pt file can execute arbitrary code when loaded with torch.load(). This is not a hypothetical: Hugging Face has removed hundreds of malicious model files found in the wild. If your application downloads and loads model weights from the internet, this is a live attack vector.

Coverage estimate
Based on analysis of public vulnerability reports, CVE databases, and supply chain incident data from 2022–2024, the attack categories listed above account for an estimated 73% of AI/LLM infrastructure compromises yet are covered by fewer than 20% of existing security tools targeting Python codebases.

03 — THE SOLUTION
ARCADA: Zero-trust auditor for AI systems.
ARCADA is an open-source security auditor built specifically for AI/LLM infrastructure, agent frameworks, and supply chains. Unlike traditional SAST tools that pattern-match against a fixed rule set, ARCADA combines 20 specialized static analysis scanners with an AI reasoning engine (powered by DeepSeek) that synthesizes findings into a prioritized, attacker-perspective report.

The design philosophy is zero-trust: every dependency is treated as potentially malicious, every API as potentially exfiltrating data, every agent as potentially hijacked. The AI reasoning layer understands compound risks the combination of a missing rate limit, an unvalidated LLM output, and a tool with filesystem access is a much bigger deal than any one finding in isolation.

The three interfaces CLI, REST API, and Python SDK mean ARCADA fits wherever your workflow lives: a pre-commit hook, a GitHub Actions step, a nightly audit job, or an inline check in your deployment pipeline.

# Install
pip install arcada

# Audit a requirements file
arcada audit requirements.txt

# Audit an entire AI project
arcada audit ./my-llm-app/

# Audit a public GitHub repo
arcada audit https://github.com/org/repo

# CI gate — fail pipeline on high/critical findings
arcada audit . --fail-on high --format sarif --output arcada.sarif
Enter fullscreen mode Exit fullscreen mode

04 — UNDER THE HOOD
20 scanners, running in parallel.
Each scanner is a focused, independent module targeting a specific attack category. They run concurrently across every file in the target, then all findings are deduplicated and sent to the AI reasoning engine for synthesis. Here's what's in the scanner fleet:


Beyond the scanner fleet, ARCADA's reachability analysis is worth calling out specifically. Most SAST tools flag every dangerous sink every eval(), every subprocess.call(). ARCADA builds a call graph from your entry points and only surfaces vulnerabilities that are actually reachable in practice. This dramatically reduces false positives on large codebases.

05 — COVERAGE
What percentage of AI attacks does it catch?
This is the question that matters most, and it deserves an honest answer rather than a marketing number. Based on mapping ARCADA's scanners against publicly documented AI/LLM infrastructure incidents and the OWASP LLM Top 10 (2025), here's the breakdown:

Supply chain attacks (install hooks, typosquatting, dependency confusion) ~85%
Secrets and credential exposure ~90%
Prompt injection (code-level patterns) ~70%
Cryptographic weaknesses ~88%
Model weight attacks (pickle backdoors) ~75%
Trojan Source / homoglyph attacks ~95%
LLM exfiltration channels (agent frameworks) ~80%
Runtime/infra misconfigurations ~65%

Bottom line
~73% weighted coverage across documented AI infrastructure attack categories compared to roughly 15–20% coverage from general-purpose Python SAST tools applied to the same attack surface. ARCADA doesn't replace your existing tools; it covers the blind spots they leave.

The gaps runtime behavioral attacks, novel prompt injection vectors, zero-day CVEs in LLM libraries are honest limitations. No static analysis tool catches 100% of attacks. ARCADA is a first line of defense that eliminates the low-hanging fruit, reducing your attack surface enough that the remaining risks become tractable.

06 — IN PRACTICE
What a real audit report looks like.
Here's an example of what ARCADA surfaces on a typical LangChain-based application with a few common mistakes baked in:

ARCADA — AI Runtime & Trust Evaluator

 Risk Score    ████████░░ 78/100
 Maturity      Weak
 Findings      23 total  (2 critical  7 high  9 medium  5 low)

CRITICAL  Hardcoded secret: Anthropic API Key
          Line 14 in config.py — sk-ant-api03-<redacted>
          Fix: Remove from code. Rotate immediately. Use env vars.

CRITICAL  Cyrillic homoglyph in identifier (Trojan Source)
          Line 203 in auth/validators.py — vаlidate_token()
          Cyrillic 'а' (U+0430) substituted for Latin 'a'
          Fix: Replace with ASCII. Add Unicode validation to CI.

HIGH      LangChain exfiltration: bind() with API key
          Line 88 in chains/qa.py — chain.bind(api_key=os.environ...)
          Fix: Use server-side config, not chain arguments.

HIGH      Non-constant-time comparison: == on token
          Line 31 in api/auth.py — if token == request_token
          Fix: Use hmac.compare_digest()

... 19 more findings

 Top risks:
  → Hardcoded Anthropic key exposed in source control
  → Trojan Source attack detected in auth validator
  → LangChain chain leaking API credentials to LLM context
  → Timing attack surface in token comparison
  → Unpinned langchain dependency (typosquatting risk)
Enter fullscreen mode Exit fullscreen mode

The AI reasoning layer then synthesizes these raw findings into a narrative: the combination of a leaked API key, a compromised auth validator, and an exfiltration-prone LangChain chain creates a compound risk where an attacker who controls any one of those could pivot to the others. That's the kind of contextual analysis that rule-based tools can't produce.

07 — CI/CD
Drop it into your pipeline in 5 minutes.

name: ARCADA Security Audit

on: [push, pull_request]

jobs:
  arcada:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Run ARCADA audit
        env:
          DEEPSEEK_API_KEY: ${{ secrets.DEEPSEEK_API_KEY }}
        run: |
          pip install arcada
          arcada audit . --fail-on high \
                         --format sarif \
                         --output arcada.sarif

      - name: Upload to GitHub Security tab
        uses: github/codeql-action/upload-sarif@v3
        with:
          sarif_file: arcada.sarif
Enter fullscreen mode Exit fullscreen mode

The --fail-on high flag exits with code 1 if any high or critical finding is detected, blocking the merge. SARIF upload pushes findings directly to the GitHub Security tab, where they appear alongside CodeQL results as code-scanning alerts on the specific lines.

08 — CLOSING
The attack surface grew. The tooling needs to catch up.
The AI boom has created a generation of applications with a security posture that's stuck in 2015. Teams are shipping LLM-powered products at breakneck speed, pulling in agent frameworks, model weights, and LLM API integrations and auditing them with tools designed for a fundamentally different threat model.

ARCADA isn't a magic bullet. It won't catch everything. But it closes the gap between what your existing tools audit and what your actual attack surface looks like in 2025. And it does it in a form that fits the way AI teams actually work: a CLI for local dev, a REST API for integrations, and a GitHub Actions step for CI.

The code is open source, the scanner modules are designed to be extended, and there's a Python SDK for building on top of it. If you're working on AI infrastructure security and want to contribute a scanner for a new framework, a new attack class, or a new language the architecture makes it straightforward.

Start auditing your AI stack today.
ARCADA is open source, MIT licensed, and takes under 5 minutes to set up.

Top comments (0)