What a Penetration Test Actually Looks Like: Two Weeks With the Team

A calm account of a testing team’s two weeks — scoping, recon, verification, reporting — plus how GOTROOT, through an APT lens, finds and proves 1-day (patch-gap) and 0-day flaws in components and libraries, with field examples.

GOTROOT Research Team Jun 5, 2026

There’s one question clients ask most at kickoff — “Is our service safe?” It’s a great question, and the honest answer is a careful one: “It depends on how much we look at, and how.” This is a calm walk through what our testing team thinks about and does during the roughly two weeks between the quote and the final report. We hope it helps anyone preparing for a first penetration test.

How is a vulnerability scan different from a pentest?

Let’s start with the most common question. Scanners are genuinely useful: they sweep known patterns quickly and broadly. What they can’t easily add is the judgment of what happens when several small issues are connected. That connecting step is where a human tester contributes.

For example, here’s a path we once mapped for a commerce client. Each piece was “medium” or lower on its own.

[low]   password-reset response leaks the user's email
   │
   ▼
[med]   account enumeration → a valid internal operator account
   │
   ▼
[med]   admin page protected only by an IP allowlist
   │       └─ reach it internally via the SSRF found earlier
   ▼
[high]  operator access → all orders/payments + unlimited coupons

On a scanner report these show up as unrelated items. That doesn’t mean the scanner is lacking — it just means a person adds the step of reading how they connect.
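
The connecting step can be pictured as a path search over findings. Here is a minimal sketch, assuming each finding is a node and each "this enables that" relation is an edge. The finding names and the table below are hypothetical labels for the example above, not the output of any tool, and the SSRF carries no rating because the example does not give one.

```python
from collections import deque

# Hypothetical labels for the chain above: name -> (severity, findings it enables)
FINDINGS = {
    "reset_leaks_email": ("low", ["account_enumeration"]),
    "account_enumeration": ("med", ["allowlisted_admin_page"]),
    "ssrf_found_earlier": ("unrated", ["allowlisted_admin_page"]),
    "allowlisted_admin_page": ("med", ["operator_access"]),
    "operator_access": ("high", []),
}


def find_chain(start, goal, findings=FINDINGS):
    """Breadth-first search for the shortest chain of findings from start to goal."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in findings[path[-1]][1]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None  # no chain connects the two findings


chain = find_chain("reset_leaks_email", "operator_access")
print(" -> ".join(f"{name} [{FINDINGS[name][0]}]" for name in chain))
```

The search itself is trivial. The human judgment lies in deciding which edges are real, which is exactly what a list of unrelated scanner items does not contain.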

Scoping matters most

This is why the answer to “Is our service safe?” has to be a careful one. Testing begins by agreeing on the Rules of Engagement (RoE) together: target assets, test windows, hard “please never do this” lines (no outages, no real-data changes), and a contact path if anything goes wrong. The more carefully we set these, the more value everyone gets from the two weeks.
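
One way to keep those agreements from living only in a document is to write them down as data that every test step can be checked against. The sketch below is an assumption about how that could look, not a description of our tooling. The class name, field names, action labels, host patterns, and contact address are all illustrative placeholders.

```python
from dataclasses import dataclass
from datetime import datetime, time
from fnmatch import fnmatch


@dataclass(frozen=True)
class RulesOfEngagement:
    in_scope_hosts: tuple         # glob patterns for the agreed target assets
    window_start: time            # daily test window (same-day window assumed)
    window_end: time
    forbidden_actions: frozenset  # the hard "please never do this" lines
    emergency_contact: str        # who to call if anything goes wrong

    def allows(self, host: str, action: str, when: datetime) -> bool:
        """True only if host, action, and time all fall inside the agreement."""
        host_ok = any(fnmatch(host, pattern) for pattern in self.in_scope_hosts)
        time_ok = self.window_start <= when.time() <= self.window_end
        action_ok = action not in self.forbidden_actions
        return host_ok and time_ok and action_ok


# Every value below is a placeholder for illustration.
roe = RulesOfEngagement(
    in_scope_hosts=("*.target.com",),
    window_start=time(1, 0),
    window_end=time(5, 0),
    forbidden_actions=frozenset({"cause_outage", "modify_real_data"}),
    emergency_contact="security-oncall@target.com",
)

print(roe.allows("api.target.com", "read_only_probe", datetime(2026, 6, 8, 2, 0)))
```

A window that crosses midnight would need an extra branch in `allows`. The sketch keeps to a same-day window for brevity.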

Type	Information shared	We’d suggest it when…
Black box	Almost none (outsider)	Closest to a real outside attack; recon takes time, so it fits when the schedule allows.
Gray box	A user account or two, some docs	Broad coverage for the cost — the most commonly chosen option.
White box	Source, design, admin	The deepest look; good when you want business-logic flaws covered too.

If it’s your first test, we’d gently suggest gray box: time saved on recon goes into the actual testing, which tends to make a real difference the first time around.

Recon: everyone ends up with a forgotten asset

Testing is hard without a map of the attack surface, so the first days go into carefully inventorying assets. Any service that runs long enough accumulates a forgotten server or two, and that server often turns out to be an important starting point.

# gather subdomains → keep the live ones → see what they run
subfinder -d target.com -silent | httpx -silent -status-code -title -tech-detect

# sample output
https://api.target.com       [200] [main API · nginx]
https://event2022.target.com [200] [Apache/2.4.29 · PHP/5.6]   ← worth reviewing together

When we find an old asset like the one on that last line, we add it to a list to tidy up together rather than treating it as something to blame. It’s no one’s fault; assets simply accumulate over time.
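
Building that tidy-up list can be partly mechanical. The sketch below assumes the recon results have already been reduced to (URL, technology list) pairs, and it flags anything below a version floor. The floors in `MIN_VERSIONS` are illustrative policy values chosen for this example, not an advisory feed, and the function names are hypothetical.

```python
# Illustrative policy floors chosen for this sketch, not an advisory feed.
MIN_VERSIONS = {"Apache": (2, 4, 50), "PHP": (8, 0)}


def parse_tech(tech):
    """Split 'Apache/2.4.29' into ('Apache', (2, 4, 29)); version is None if absent."""
    name, _, version = tech.partition("/")
    parts = version.split(".")
    if version and all(part.isdigit() for part in parts):
        return name, tuple(int(part) for part in parts)
    return name, None


def review_list(inventory):
    """Return (url, tech) pairs whose version is below the policy floor."""
    flagged = []
    for url, techs in inventory:
        for tech in techs:
            name, version = parse_tech(tech)
            floor = MIN_VERSIONS.get(name)
            if version is not None and floor is not None and version < floor:
                flagged.append((url, tech))
    return flagged


inventory = [
    ("https://api.target.com", ["nginx"]),
    ("https://event2022.target.com", ["Apache/2.4.29", "PHP/5.6"]),
]
for url, tech in review_list(inventory):
    print(f"review together: {url} runs {tech}")
```

Versions are compared as integer tuples so that, for example, 2.4.9 sorts below 2.4.29. A plain string comparison would get that wrong.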

“Probably” vs “we confirmed it”

Once we’ve narrowed the suspects, we verify them. One principle we try to keep is to avoid writing “probably” in a report. Either we reproduce the issue in a controlled way to confirm it, or we explain why it doesn’t reproduce, because guesses can cloud a client’s decisions. All verification stays within the agreed scope and the safety lines set in the Rules of Engagement (RoE).

The part after gaining a foothold is actually the more important one. Following credential reuse and permission gaps, we show, as a scenario, how a single intrusion could affect real operations. What helps decisions tends to be that story, more than a vulnerability ID.

Going deeper — through a real attacker’s (APT) eyes

This is the part we put special care into. Real targeted-attack (APT) groups don’t only go after application logic. More quietly, they go after what a service leans on — open-source libraries, frameworks, embedded components, and external dependencies. Even well-written code ultimately sits on dozens of components, and a gap in one of them can become the way into everything. So we look there with the same eyes.

1-day: already disclosed, but not yet closed

Starting with the most realistic threat: the moment a component’s security patch is published, the diff between “before and after” becomes a map of what was vulnerable. While production hasn’t applied it yet (the “patch gap”), attackers read the patch in reverse to reconstruct an exploit. We do the same — fingerprint the components and versions in your environment and confirm, within a controlled scope, whether a known flaw is actually exploitable in this configuration.
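
To make the "diff as a map" idea concrete, here is a toy sketch using Python's standard `difflib`. The before and after snippets are hypothetical and come from no real project. The point is only that the lines a patch adds, typically new checks, show what went unchecked before.

```python
import difflib

# Hypothetical before/after source of a patched function (not from a real project).
BEFORE = """\
def read_record(buf, length):
    return buf[:length]
""".splitlines()

AFTER = """\
def read_record(buf, length):
    if length < 0 or length > len(buf):
        raise ValueError("bad length")
    return buf[:length]
""".splitlines()


def added_lines(before, after):
    """Return the lines a patch adds; new checks hint at what was unchecked before."""
    diff = difflib.unified_diff(before, after, lineterm="")
    return [
        line[1:]
        for line in diff
        if line.startswith("+") and not line.startswith("+++")  # skip file header
    ]


for line in added_lines(BEFORE, AFTER):
    print("patch added:", line.strip())
```

In this toy case the added bounds check suggests the unpatched function accepted any `length`. Real patch diffing usually works on much larger changes, and often on compiled binaries, so treat this as an illustration of the reasoning only.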

# identify components/versions from headers, static assets, error pages
$ httpx -u https://target -status-code -title -tech-detect -server
  target  [200]  [Spring Boot · Jackson]  [nginx/1.18.0]

# estimate front-end library versions from bundle hashes
# cross-check identified versions against public advisories
# → reconstruct a PoC via patch diffing, and safely confirm it only works here

The point isn’t “your version is old,” but “this version, in this configuration, is exploitable like this” — which is what makes patch priority clear.
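
The cross-check against public advisories comes down to a range test: is the fingerprinted version at or above the first affected release and below the first fixed one? A minimal sketch follows, assuming advisories are already available as records. The `Advisory` class, the component names, and the `EXAMPLE-` IDs are placeholders, not real advisories.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Advisory:
    advisory_id: str   # placeholder ID, not a real CVE
    component: str
    introduced: tuple  # first affected version, inclusive
    fixed: tuple       # first patched version, exclusive


def to_tuple(version):
    """'2.9.10' -> (2, 9, 10) so versions compare numerically, not as strings."""
    return tuple(int(part) for part in version.split("."))


def patch_gap(fingerprints, advisories):
    """Return advisories whose affected range contains a fingerprinted version."""
    hits = []
    for component, version in fingerprints.items():
        current = to_tuple(version)
        for adv in advisories:
            if adv.component == component and adv.introduced <= current < adv.fixed:
                hits.append((component, version, adv.advisory_id))
    return hits


advisories = [
    Advisory("EXAMPLE-0001", "examplelib", (2, 0, 0), (2, 9, 4)),
    Advisory("EXAMPLE-0002", "examplelib", (1, 0, 0), (1, 5, 0)),
]
fingerprints = {"examplelib": "2.9.3", "otherlib": "3.1.0"}
print(patch_gap(fingerprints, advisories))
```

A hit here is only a candidate. Whether it is exploitable in this configuration still has to be confirmed in a controlled way, which is the step that turns "your version is old" into a patch priority.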

0-day: the gap nobody knows yet

For higher-risk pieces — externally exposed core components or libraries you built yourselves — we sometimes go a step further to find flaws nobody knows yet, using targeted fuzzing and source review of parsers, decoders, and protocol handlers to look for unknown memory corruption or authorization bypass. This runs only when scope and safety lines are fully agreed in advance, and always in a controlled way.
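
For readers new to fuzzing, the loop is simple: mutate a valid input, feed it to the parser, and record anything that fails in an unexpected way. The sketch below is a toy under stated assumptions. `parse_record` is a hypothetical parser with a bug planted on purpose, and an unexpected Python exception stands in for the memory corruption a native parser might show. Real harnesses typically rely on coverage-guided fuzzers rather than blind mutation.

```python
import random


def parse_record(data: bytes) -> bytes:
    """Toy length-prefixed parser with a deliberate bug: it trusts the length byte."""
    if not data:
        raise ValueError("empty input")  # expected, clean rejection
    length = data[0]
    payload = data[1:]
    checksum = payload[length]           # BUG: IndexError when length >= len(payload)
    return payload[:length] + bytes([checksum])


def mutate(seed: bytes, rng: random.Random) -> bytes:
    """Overwrite one random byte, then sometimes truncate the input."""
    data = bytearray(seed)
    data[rng.randrange(len(data))] = rng.randrange(256)
    if rng.random() < 0.3:
        del data[rng.randrange(len(data)):]
    return bytes(data)


def fuzz(target, seed: bytes, iterations: int, rng_seed: int = 0):
    """Run mutated inputs through target; keep one input per unexpected error type."""
    rng = random.Random(rng_seed)        # fixed seed keeps runs reproducible
    crashes = {}
    for _ in range(iterations):
        case = mutate(seed, rng)
        try:
            target(case)
        except ValueError:
            continue                     # clean rejection, not a finding
        except Exception as exc:         # anything else is worth triaging
            crashes.setdefault(type(exc).__name__, case)
    return crashes


crashes = fuzz(parse_record, b"\x03abcX", iterations=2000)
for kind, case in crashes.items():
    print(kind, case.hex())
```

Keeping the crashing input alongside the error type matters. It is what lets a finding be replayed later and reported as confirmed rather than as "probably".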

The CVEs we’ve registered over the years are an extension of this work. If you’d like the detailed process, we’ve written it up — patch diffing through fuzzing-harness design — in How 0-days Are Found.

Field note: “our code is clean” often isn’t the end of the story. Even with sound code, one library it relies on can become the real way in. Looking at that layer together is, to us, the heart of APT-perspective testing.

The report is what we care about most

The test is two weeks, but what stays with the client is the report. So the question we ask ourselves is simple — “can the responsible developer read this and fix it right away?” Rather than listing many findings, we believe a report with clear priority and copy-along reproduction steps is more genuinely helpful.
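
As a rough picture of what "clear priority and copy-along reproduction steps" can mean in practice, here is a small sketch of a finding record and a renderer that lists the highest severity first. The `Finding` fields, the severity labels, and the example steps are assumptions made for illustration, not our report template.

```python
from dataclasses import dataclass, field

SEVERITY_ORDER = {"high": 0, "med": 1, "low": 2}


@dataclass
class Finding:
    title: str
    severity: str
    repro_steps: list = field(default_factory=list)
    fix_hint: str = ""


def render_report(findings):
    """Order findings by severity and render numbered, copy-along steps."""
    lines = []
    ordered = sorted(findings, key=lambda f: SEVERITY_ORDER[f.severity])
    for finding in ordered:
        lines.append(f"[{finding.severity}] {finding.title}")
        for number, step in enumerate(finding.repro_steps, start=1):
            lines.append(f"  {number}. {step}")
        if finding.fix_hint:
            lines.append(f"  fix: {finding.fix_hint}")
    return "\n".join(lines)


# Example entries; the steps and fix hints are invented for illustration.
findings = [
    Finding(
        "password-reset response leaks the user's email",
        "low",
        ["Request a password reset for a known username", "Inspect the response body"],
        "Return the same generic response for every request",
    ),
    Finding(
        "admin page protected only by an IP allowlist",
        "med",
        ["Send a request to the admin page from an internal address"],
        "Require authentication in addition to the allowlist",
    ),
]
print(render_report(findings))
```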

Field note: in our experience, the clients who schedule the re-test together at the end tend to stabilize fastest. For them, security becomes less of a one-time event and more of a process they keep up.

FAQ

How long does it take?

It depends on scope and depth. A single web app is usually 1–2 weeks; complex infrastructure or a red-team engagement takes a bit longer. We set the exact schedule together during scoping.

Could it affect production?

This is what we’re most careful about. Risky steps run only within agreed test windows, with prior approval, and under controlled conditions. When there’s any concern, we verify in staging instead, and a contact path is agreed up front so testing can stop immediately if needed.

Could we just take the scanner output?

Scanner output is a meaningful starting point. Please think of us as adding the step of removing false positives and checking how things connect.

Closing

The heart of a penetration test seems to lie in process and steady human judgment more than in flashy tools: scoping it well together, finding and tidying forgotten assets, confirming things directly, and writing them up so they’re easy to fix. These are the parts we always try to do better. If you’re curious how far your service holds up, feel free to ask us through our penetration testing page. We’d love to start small and build something solid together.