Have you ever held a really good fake Rolex? Not the forty-dollar beach version. The kind that makes a jeweler pause. The movement is Swiss. The crystal is sapphire. The bracelet is 904L steel — the same alloy Rolex actually uses. Every component is genuine, sourced from real suppliers, assembled with real craftsmanship. The only thing that is fake is the crown on the dial and the person selling it to you.
That is the state of phishing in 2026.
The emails hitting enterprise inboxes right now are built from real parts. Real SendGrid accounts. Real Cloudflare CAPTCHAs. Real Google redirects. Real Microsoft domains. Your security tools inspect each component and return a clean verdict, because each component is legitimate. The only counterfeit is the intent.
We built StrongestLayer on one belief: AI would break email security. We expected the weapon to be AI-generated personalisation — hyper-targeted lures no signature system could catch. That is why we built a reasoning engine instead of another rule set.
We were wrong about the weapon. We were right about the problem.
When we analysed our first 5,000 real enterprise phishing alerts — live enterprise traffic from Microsoft 365 and Google Workspace tenants over a six-month window, every one of them bypassing at least one of the top three secure email gateways — the answer was not AI personalisation. It was structural evasion by design.
This post is what we found.
5,000 alerts in the dataset. 100% bypassed at least one enterprise SEG. 100% autonomously investigated with full reasoning. Every alert analysed end-to-end — not sampled, not triaged. This is what the data actually shows.
The assumption behind most AI-era email security thinking — including our own founding thesis — was that the primary shift would be at the lure layer. Better writing. More convincing pretexts. Deeper personalisation. The attacker as a better copywriter.
The data does not support that framing.
When we examined the 5,000 alerts in our dataset and asked what made each attack succeed in reaching the inbox — in bypassing SPF, DKIM, DMARC, URL reputation filtering, sandbox detonation, content NLP, and human judgment — the answer was almost never the quality of the writing. It was the architecture of the attack chain.
Every alert was autonomously investigated with full semantic reasoning — read the way a human analyst would read it, not dismissed as a false positive. What we found was a consistent structural pattern: attackers are not crafting better emails. They are engineering better kill chains — sequences of techniques, each one chosen to defeat a specific defence layer, assembled into combinations that pass every check the stack can run.
The craft is not in the prose. The craft is in the construction.
The fake Rolex maker does not need to write better. They need to source better components and assemble them in the right order. That is what modern attackers are doing — sourcing legitimate infrastructure and assembling it into a sequence your security tools cannot inspect as a whole, only as individual parts.
Across ~3,000 detections from December 2025 through February 2026 in enterprise Microsoft 365 and Google Workspace environments, we mapped every evasion technique present in attacks that reached the inbox. The result: 22 distinct techniques across 5 categories, appearing in over 1,400 documented combinations.
The five categories, with their prevalence in the dataset:
This category moves the malicious action off the email plane entirely. The email is not the threat. It is the delivery mechanism for a secondary interaction that the SEG cannot see and has no jurisdiction over. The SEG is structurally irrelevant — there is no URL, no attachment, no executable content to scan.
This category controls what a URL resolves to at scan time versus click time, or routes through infrastructure that security tools trust, cannot follow beyond their recursion-depth limits, or have explicitly allowlisted.
This category makes the email body and attachments appear benign to keyword filters, NLP classifiers, and content analysis systems. It exploits the gap between what a machine parses and what a human reads — or between what exists as a file and what the browser assembles from components.
This category exploits the gap between what authentication protocols verify — domain ownership — and what humans trust: visual similarity. SPF/DKIM/DMARC cannot distinguish a lookalike domain from a spoofed domain. They only verify that the sending infrastructure is authorised to send on behalf of the sending domain.
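The gap between protocol verification and visual similarity can be made concrete with a small sketch. The homoglyph table, domains, and threshold here are illustrative assumptions, not detection rules from the dataset; the point is that DMARC never asks the similarity question at all, because the attacker genuinely owns the lookalike domain.

```python
from difflib import SequenceMatcher

# Illustrative subset of digit-for-letter homoglyph substitutions.
HOMOGLYPHS = str.maketrans({"0": "o", "1": "l", "3": "e", "5": "s"})


def lookalike_score(candidate: str, brand: str) -> float:
    """Similarity of the registrable labels after homoglyph normalisation."""
    cand_label = candidate.split(".")[0].translate(HOMOGLYPHS)
    brand_label = brand.split(".")[0]
    return SequenceMatcher(None, cand_label, brand_label).ratio()


# DMARC only asks: is this sender authorised for d0cusign.net? It is --
# the attacker registered it. Visual similarity is the question it misses.
print(lookalike_score("d0cusign.net", "docusign.com"))   # identical after normalisation
print(lookalike_score("newsletter.example", "docusign.com"))  # clearly unrelated
```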
The most prevalent category by attack presence. This category targets the human, not the technology. It uses vocabulary identical to legitimate business communications, modelled on real DocuSign and Microsoft notification language. NLP classifiers cannot distinguish it from genuine communications because the language model has been optimised against the same training data the classifier uses.
56.8% of attacks in the dataset use four or more evasion techniques simultaneously.
The average attack combines 4.11 techniques. Year-over-year, combination attacks have grown by +130%. And more than 80% of attacks use combinations outside the top 10 most common patterns — meaning the long tail of novel combinations is where most attacks live, not the known, documented chains.
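A long-tail figure like the one above can be computed by treating each attack as a set of techniques and counting distinct combinations. The attack list below is a synthetic toy, not the dataset; it only shows the shape of the measurement.

```python
from collections import Counter

# Synthetic toy data: each attack is the set of techniques it combines.
attacks = [
    frozenset({"qr_code", "captcha_gate", "brand_impersonation"}),
    frozenset({"qr_code", "captcha_gate", "brand_impersonation"}),
    frozenset({"toad", "brand_impersonation"}),
    frozenset({"html_smuggling", "redirect_chain", "captcha_gate", "urgency"}),
    frozenset({"redirect_chain", "lookalike_domain", "urgency", "qr_code"}),
]


def long_tail_share(attacks: list, top_n: int) -> float:
    """Fraction of attacks whose combination falls outside the top-N patterns."""
    counts = Counter(attacks)
    covered = sum(n for _, n in counts.most_common(top_n))
    return 1 - covered / len(attacks)


# The duplicated QR chain is the only repeated pattern: with top_n=1 it
# covers 2 of 5 attacks, leaving a 60% long tail in this toy sample.
print(long_tail_share(attacks, top_n=1))
```

Cataloguing the head of that distribution is what signature systems do; the measurement above is why the tail keeps winning.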
This finding is why signature-based detection is structurally inadequate for this threat class. A detection system built to recognise known-bad patterns will catalogue the top ten chains and miss the other 1,400+. Attackers are not staying inside the documented patterns. They are exploring the combination space.
There is also a critical structural characteristic of combination attacks that makes them more dangerous than the individual technique prevalence suggests: there is zero detection rule overlap between different variant families. The rules that catch a QR code attack and the rules that catch a DocuSign variant attack are completely different. A rule that fires on one fires on nothing in the other. Each combination requires its own detection logic — and no organisation can write 1,400+ rules before the next combination appears.
This is the most sophisticated chain in the dataset and the one with the highest financial impact. It crosses the email-to-phone boundary, which means no email security tool can complete the analysis chain.
What each layer misses: SPF/DKIM/DMARC all pass. URL destination is docusign.com (legitimate). Content matches standard template. No payload detected in any attachment. Only a reasoning system that evaluates the entire chain — domain age, sending history, the 3-hop redirect behind docusign.com, the phone number that appears in no legitimate DocuSign communication — produces the correct verdict.
This chain was specifically designed to eliminate the URL surface that automated scanners need.
URL never appears in the email body. Automated scanners see nothing to detonate. 98% bypass rate on M365. 96% on Google Workspace.
This chain is notable because the malicious payload never exists as a file anywhere on the network.
Malicious payload assembled inside the browser — never exists as a file on the wire.
The combination is the attack. Not the QR code. Not the CAPTCHA. Not the multi-hop redirect. The specific sequence in which these techniques are combined — each one chosen to defeat the layer that the previous technique exposed — is the intelligence that makes modern phishing structurally different from its predecessors.
Enterprise email security stacks — in both Microsoft 365 and Google Workspace environments — can be mapped to six sequential defence layers. The attacks in our dataset were analysed against each layer. Here is what defeats each one, with prevalence from the data:

Layer 5 — Human Judgment — has the highest effective bypass rate in the dataset at ~78%. That is not an accident. Social engineering techniques are specifically tuned to defeat deliberation. The language is identical to legitimate communications. The urgency is manufactured at a level calibrated not to trigger suspicion but to suppress the instinct to pause and verify. NLP classifiers cannot distinguish it from genuine communications because it was trained on the same templates.
Layer 6 — Channel Escape — represents the most structurally interesting bypass. When the attack moves off the email plane entirely (TOAD, QR to mobile, Teams pivot), the email security stack becomes completely irrelevant. There is nothing in the email to scan. The threat activates in a channel the SEG cannot see. This is 35.9% of the dataset and growing.
Every security team managing email defence faces a configuration paradox that our data makes concrete. Tighten detection — you break the business. Loosen it — attackers win.
This is not a theoretical tension. Here is what the data shows happens at both extremes:
Attackers calibrate their attacks to stay just below the threshold where blocking them would break your business. The dial is a trap. Every configuration decision you make to protect the business also defines the attack surface attackers will optimise for. A rule is not a guarantee of what cannot get through — it is a published target that tells attackers exactly what to engineer under.
This is why the data shows +130% year-over-year growth in combination attacks. Attackers are not just discovering that combinations work. They are systematically exploring the combination space to find chains that pass just below the threshold your rules enforce — chains that your configuration decisions have inadvertently defined for them.
One of the most practically significant findings in the dataset: attackers know which email security platform their target uses, and they choose techniques accordingly.
The data shows clear platform-specific technique selection:
The platform targeting split is stark:
This platform specificity is important for security teams to understand. It means that threat intelligence from a competitor's Microsoft 365 breach is not directly applicable to your Google Workspace environment and vice versa. The attack chains are different by design.
Let me show you how our reasoning engine reads one of these attacks, using a representative case from the dataset.
The attack: DocuSign impersonation → Legitimate redirect → CAPTCHA gate → Credential harvest → Authority + Financial lure → M365 tenant
Rules verdict: CLEAN. ML verdict: SUSPICIOUS (ignored — medium confidence, not above quarantine threshold).
Reasoning verdict: MALICIOUS (blocked). Resolution time: under 2 minutes. Full explanation attached. No analyst investigation required.
The fake Rolex is not exposed by a better loupe examining each component. It is exposed by a craftsman who asks: why would a genuine manufacturer route through an unknown workshop before final assembly? That question — why is this chain constructed this way — is what reasoning asks. And it is the only question that exposes a fake built from real parts.
Three conclusions from the data that should change how you think about email security architecture in 2026:
With 80%+ of attacks using combinations outside the top 10 most common patterns, and zero detection rule overlap between variant families, no organisation can rule-engineer their way to comprehensive coverage. The combination space is too large and growing at +130% year-over-year. The right architecture evaluates the combination's intent, not the presence of individual indicators.
Every rule you loosen to protect the business defines an attack surface. Attackers in our dataset demonstrate systematic optimisation against the specific thresholds of the specific platforms they are targeting. The only way to break this cycle is to move detection to the intent layer — where the attacker's optimisation goal (stay below the threshold) does not apply, because the question is not 'how suspicious is this indicator?' but 'what is this email trying to accomplish?'
35.9% of attacks in the dataset exit the email plane entirely — through phone callbacks, QR codes to mobile, Teams pivots. No email security architecture currently monitors what happens after a user dials a number or scans a QR code on their personal phone. This is not an edge case. It is more than a third of the dataset. Security teams need to build awareness programmes specifically around TOAD and QR interaction, because no technical control can see what happens in that channel.
We built StrongestLayer expecting AI personalisation to be the weapon. What 5,000 real alerts taught us is that the weapon is structural — it is the combination, the chain, the sequence of techniques assembled to defeat each defence layer in order.
The fake Rolex is not convincing because the writing on the dial is good. It is convincing because every component is genuine, and the only counterfeit is the intent behind the assembly.
Detecting it requires the right question: not 'is each part legitimate?' but 'why would any legitimate manufacturer assemble these legitimate parts in this specific sequence?' That is intent reasoning. That is what our data shows works. And it is the architectural distinction that defines what email security needs to be in 2026 — not a better rule set, not a better anomaly detector, but a system that reasons about purpose.
The data represents approximately 5,000 phishing alerts from live enterprise traffic — Microsoft 365 and Google Workspace tenants — collected over a six-month window ending February 2026. Every alert bypassed at least one of the top three secure email gateways. Every alert was autonomously investigated end-to-end with full semantic reasoning — not sampled, not triaged, not dismissed as a false positive without investigation. The detections used for technique mapping represent approximately 3,000 confirmed phishing attempts from December 2025 through February 2026.
TOAD stands for Telephone-Oriented Attack Delivery. The attack delivers a phishing email with a phone number rather than a link. The email contains a legitimate-looking notification — a subscription confirmation, a security alert, a package delivery notice — and instructs the user to call a number if they did not initiate the action. The attacker's operator handles the call, walking the target through 'account verification' to capture credentials, MFA codes, or financial information directly. The 99% bypass rate on both M365 and Google Workspace reflects a structural limitation: email security tools inspect email content. TOAD attacks contain no URL, no attachment, no executable content. There is nothing for the security stack to scan. The threat activates entirely in a channel — the phone — that no email security platform monitors.
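The structural signature described above — a callback phone number and nothing scannable — can be expressed as a trivial check. The regexes and sample lure below are illustrative assumptions, not production rules; real TOAD detection also needs sender and context analysis.

```python
import re

URL_RE = re.compile(r"https?://\S+", re.IGNORECASE)
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{8,}\d")  # loose callback-number pattern


def toad_shape(body: str) -> bool:
    """True when the email carries a phone number but zero URLs to scan."""
    return bool(PHONE_RE.search(body)) and not URL_RE.search(body)


lure = ("Your annual subscription of $389.99 has renewed. If you did not "
        "authorise this charge, call +1 (800) 555-0143 within 24 hours.")
print(toad_shape(lure))  # the SEG sees nothing to detonate
```

The inverse is the structural limitation the bypass rate reflects: with no URL and no attachment, every content-scanning layer returns clean by construction.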
HTML Smuggling is an evasion technique where the malicious payload is not delivered as a file through the network. Instead, a user is directed to a web page that assembles the malicious payload entirely within the browser using encoded components distributed across the page's HTML. The payload never exists as a file on the wire — it is constructed locally, inside the browser, after the page loads. This means there is no file to hash, no file to sandbox, no network transfer to intercept. The payload is assembled from components that appear individually as normal HTML elements. The only way to detect HTML Smuggling is to evaluate the assembled intent of those components — what they produce when combined — rather than inspecting them individually.
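A static heuristic for the pattern described above might look for the in-browser assembly APIs co-occurring with a large inline base64 run. The indicator list and thresholds below are illustrative assumptions and are easy to evade individually — which is exactly the point: no single token is malicious, only the combination is.

```python
import re

# In-browser payload-assembly primitives commonly seen together (assumed list).
ASSEMBLY_APIS = ("atob(", "Uint8Array", "new Blob", "createObjectURL")
B64_BLOB = re.compile(r"[A-Za-z0-9+/=]{200,}")  # long inline base64 run


def smuggling_suspect(html: str) -> bool:
    """Flag pages combining assembly APIs with a large embedded blob."""
    api_hits = sum(token in html for token in ASSEMBLY_APIS)
    return api_hits >= 2 and bool(B64_BLOB.search(html))


page = ("<script>var p=atob('" + "QUJD" * 60 + "');"
        "var b=new Blob([p]);</script>")
print(smuggling_suspect(page))
```

Each component of `page` — a script tag, a string literal, a `Blob` — is a normal HTML element on its own; the verdict only emerges from evaluating what they produce when combined.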
The data shows that attackers select evasion techniques based on the specific detection capabilities and allowlist configurations of the target platform. Microsoft 365 with Defender has stricter URL scanning and more aggressive Safe Links behaviour for certain URL patterns, which pushes attackers toward channel-shifting techniques (QR codes, TOAD) and redirect chains that route through allowlisted Microsoft infrastructure. Google Workspace has different sandbox behaviour and different allowlist configurations, making calendar invite phishing and redirect-heavy chains more effective. The 7x concentration of DocuSign brand impersonation on M365 reflects that DocuSign is a primary Microsoft 365 workflow integration — attackers go where the trust is established.
Intent reasoning means the detection question is 'what is this email trying to make the recipient do, and is that purpose consistent with any legitimate business communication?' rather than 'does this email match a known-bad pattern?' In practice this means: a system reads the sender analysis, the chain of redirect hops behind the URL, the social engineering framing in the message body, the relationship history between sender and recipient, and the specific combination of techniques present — and asks whether that combination serves any legitimate purpose.
A DocuSign notification that routes through 3 redirect hops behind a CAPTCHA gate to an unknown destination has no legitimate purpose. That assessment does not require matching a known pattern. It requires reasoning about whether the construction makes sense for what the email claims to be.
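The reasoning step above can be caricatured in a few lines. The checks, inputs, and two-flag threshold are hypothetical simplifications of intent reasoning, not the engine's logic; they only illustrate that the verdict comes from coherence of the construction, not from any single indicator.

```python
def intent_verdict(claimed_brand: str, final_domain: str,
                   redirect_hops: int, captcha_gated: bool) -> str:
    """Ask whether the construction makes sense for what the email claims to be."""
    incoherent = []
    if claimed_brand not in final_domain:
        incoherent.append("destination does not match the claimed brand")
    if redirect_hops >= 3:
        incoherent.append("notification routed through a redirect chain")
    if captcha_gated:
        incoherent.append("content hidden behind a human-only gate")
    # No single observation is malicious on its own; the combination is.
    if len(incoherent) >= 2:
        return "MALICIOUS"
    return "NEEDS_REVIEW" if incoherent else "CLEAN"


print(intent_verdict("docusign", "harvest.example", 3, True))  # incoherent chain
print(intent_verdict("docusign", "docusign.com", 0, False))    # coherent notification
```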