AI-augmented security auditing, for the in-house engineer.
A reproducible workflow for running AI-assisted IT security audits on your own infrastructure. Kali Linux in a Docker container, the official MCP server as the tool seam, and GitHub Copilot agent mode in VS Code as the driver — with every tool call gated behind your explicit approval.
Who this is for
Two readers, both served by every section.
| Reader | Prior knowledge | What they want |
|---|---|---|
| Primary — the operator. SME in-house IT/security engineer; technically capable owner-operator. | Comfortable with the CLI, basic networking, an IDE. No prior penetration-testing training assumed. | A reproducible method they can run on their own kit. Exact commands, verifiable steps, cross-platform parity. |
| Secondary — the sponsor. Non-specialist IT leadership; learners new to AI-augmented security work. | General IT literacy; no command-line fluency required. | Enough understanding of what an AI-assisted audit looks like to sponsor, oversee, or sit alongside it. |
What you will be able to do
- L1 State the legal and ethical preconditions for an internal audit in the UK (anchored on the Computer Misuse Act 1990) and produce a one-page written authorisation against a real target.
- L2 Stand up Kali Linux in a Docker container with the official MCP server running as a systemd-managed service on Linux, Windows, or macOS (a minimal sketch of the container setup follows this list).
- L3 Wire VS Code and GitHub Copilot Chat (in agent mode) to that MCP server, with the operator approval boundary intact on every tool call.
- L4 Run an end-to-end audit session against a scoped target: framing, reconnaissance, CVE verification, authentication and exposure analysis, and a written report.
- L5 Recognise the new risks an agentic-AI toolchain introduces — prompt injection, excessive agency, tool-call abstraction, supply-chain risk, data leakage — and threat-model their own organisation's adoption of it.
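For a first taste of the manual path, a minimal sketch of the container setup is below. The image tag is the official kalilinux/kali-rolling; the container name, the keep-alive command, and the omission of the extra flags needed to run systemd (and therefore the MCP server as a service) inside the container are illustrative assumptions. Section 04 explains every flag the real setup uses.

```bash
# Minimal sketch, not the guide's exact setup. The image is the official Kali
# rolling image; the container name and keep-alive command are assumptions,
# and the extra flags needed to run systemd inside the container are omitted.
docker run -d --name kali-audit --hostname kali-audit \
  kalilinux/kali-rolling sleep infinity

# Open an interactive shell inside the running container.
docker exec -it kali-audit /bin/bash
```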
How the guide is built
Four teaching commitments shape every section.
- Legal before technical. Authorisation, scope, and disclosure are taught in Section 02, before any tool is introduced. The same Nmap scan is professional work or a Computer Misuse Act offence depending on whether the operator can produce written authorisation.
- Two install paths. The Docker setup is taught manually first (every flag visible), then automated with Docker Compose (a compose-file sketch follows this list). You learn what each flag does before letting a compose file hide it.
- Example prompts, not a guided audit. Section 07 gives a handful of prompt templates you can adapt — framing, CVE triangulation, report drafting — rather than walking through one specific engagement step by step.
- The assistant proposes; the operator approves. Every tool call in agent mode is a proposal you must accept. This approval boundary is the load-bearing safety property of the whole workflow.
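To show what the compose path condenses, here is the shape such a file might take. The service name, the container name, and the first-boot entrypoint script are placeholder assumptions; Section 05 gives the real file.

```yaml
# docker-compose.yml sketch under assumptions: the service name, container
# name, and first-boot entrypoint script are placeholders, not the guide's file.
services:
  kali:
    image: kalilinux/kali-rolling
    container_name: kali-audit
    tty: true
    volumes:
      - ./entrypoint.sh:/entrypoint.sh:ro   # hypothetical first-boot script
    entrypoint: ["/bin/bash", "/entrypoint.sh"]
```

With a file like this in place, `docker compose up -d` brings the lab up in one command.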
The eight sections
02 Legal & ethical foundations
Authorisation in writing, scope drift, and the Computer Misuse Act 1990 applied to a port scan.
03 The toolchain at a glance
The four-part architecture and the two trust boundaries that hold it together.
04 Kali in Docker — Method 1
Manual step-by-step setup with docker run and docker exec. Every flag explained.
05 Kali in Docker — Method 2
The same lab in one command using Docker Compose and a first-boot entrypoint script.
06 Wiring VS Code to the MCP server
Configure mcp.json, enable agent mode, and verify the approval boundary (a minimal mcp.json sketch follows this list).
07 Running the workflow
A five-stage session structure plus three example prompts you can adapt to your own engagement.
08 Threat-modelling agentic AI
The new risks the toolchain itself introduces — prompt injection, excessive agency, and four more.
G Glossary & key
Every acronym, tool, and standard cited anywhere in the guide.
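Ahead of Section 06, this is the general shape of a VS Code mcp.json entry. The server key, the transport type, and the URL/port are assumptions, not the values the guide uses for the official MCP server.

```jsonc
// .vscode/mcp.json sketch: the server key "kali-mcp", the SSE transport, and
// the port are assumptions; Section 06 gives the real values.
{
  "servers": {
    "kali-mcp": {
      "type": "sse",
      "url": "http://localhost:8000/sse"
    }
  }
}
```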
What a session actually feels like
Section 07 walks through the shape of a working session: how to frame the engagement, how to force the assistant to triangulate a candidate finding, and how to constrain the report draft at the end. The prompts are deliberately generic — you supply the target, the scope, and the constraints; the prompts give you the scaffolding.
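As an illustration of that generic scaffolding (this is not one of Section 07's templates, only the shape a framing prompt might take):

```text
You are assisting with an authorised internal security audit.
In scope: [the hosts or subnets named in the written authorisation]
Out of scope: everything else; do not propose tool calls against it.
Constraints: non-destructive enumeration only. Propose one tool call at a
time and wait for my approval before proposing the next.
First task: summarise what you can establish about the target without any
active scanning, then propose the least intrusive scan that would help.
```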
One lesson worth previewing here: in the kinds of audits this workflow is built for, the headline risk is usually deployment posture, not unpatched CVEs. Firmware patching is necessary but not sufficient. The highest-leverage remediation is almost always configuration-level — what is exposed, on which port, with which encryption, behind which authentication — and an AI-augmented audit gets you to those conclusions faster than working unaided.
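To make "configuration-level" concrete, checks of that kind tend to look like the following (the target address is a placeholder; the exact ports and scripts depend on your scope):

```bash
# What is exposed, and on which ports? (192.0.2.10 is a placeholder target)
nmap -sV -p- 192.0.2.10

# With which encryption? Enumerate TLS versions and cipher suites on a web port.
nmap --script ssl-enum-ciphers -p 443 192.0.2.10

# Behind which authentication? e.g. check whether FTP allows anonymous login.
nmap --script ftp-anon -p 21 192.0.2.10
```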
What this guide deliberately does not cover
- Penetration-testing certification material. OWASP WSTG, NIST SP 800-115, and the CREST CRT body of knowledge are referenced but not replaced.
- Unauthorised access of any kind. Every technique is taught inside a lawful scope. Section 02 sits ahead of every technical section by design.
- Adjacent disciplines. Web application testing, mobile, cloud, and social engineering are out of scope — this guide stays focused on internal IT and infrastructure audits.
Check yourself
Three questions on what this guide is for
Pick the option you think is right and expand it to see the verdict and a one-sentence explanation. These aren't trivia — getting any of them wrong means you've misunderstood something the rest of the guide will assume you've got.
Who is this guide intended for?
- Professional penetration testers preparing for a certification exam.
- An SME in-house IT or security engineer auditing infrastructure they are authorised to test.
- Anyone who wants to learn how to break into IP cameras.
What does the workflow's "the assistant proposes; the operator approves" grammar describe?
- A polite convention for talking about the AI in writing.
- The load-bearing safety property of the whole workflow: every tool call is a proposal that must be approved.
- An indication that the AI is restricted to read-only commands.
Does the architecture in this guide replace your judgement about which targets are in scope?