Spam Slayer: Why I Built My Own Email Filter

May 28, 2026 · Jeff Conn

Building in Public · Spam Slayer · AI Pipeline · Email · Production Systems

Inbox spam in 2026 is a different problem than it was in 2010. The volume is higher, the senders are smarter, and the standard Gmail-style filter is either too aggressive (quarantining legitimate outreach) or too lenient (letting obvious AI-written garbage through). I lived this every week while running operations across multiple businesses with shared inboxes.

So I built Spam Slayer: a multi-tenant IMAP spam filter with a hybrid pipeline that actually thinks before it quarantines.

Spam Slayer operations dashboard — mailbox health, protection metrics, reliability, AI usage and cost.

What It Does

You sign up, connect your IMAP mailbox (with credentials encrypted at rest), and a continuous worker watches incoming mail. The worker classifies each new message, quarantines what looks like spam, and leaves the legitimate stuff alone. You review quarantine in a clean dashboard, restore anything that was wrongly flagged, and one-click block senders or domains that get through.

Every decision is logged. Every restore feeds the recommendations engine. Every quarantine has a full audit trail.

The Hybrid Pipeline

The core insight: rules and AI each fail in different ways, and combining them carefully is dramatically better than either alone. Spam Slayer scores every message through three stages, in order:

1. Allowlist/blocklist override: explicit user preferences win, always. Mail from your CRM's sender is never quarantined, no matter what the model says.
2. Rule scoring (0–100): fast, deterministic, free. SPF/DKIM/DMARC results, subject-line patterns, sender reputation, link characteristics. If a message scores clearly spam or clearly clean, it's decided here.
3. AI classifier in the gray zone only: messages the rules can't confidently call are handed to an LLM with the full context.

The AI sits at the end of the funnel, not the front. The rules handle 80%+ of decisions for free.

This pattern matters because LLM calls have a cost. Running every inbound message through an LLM is uneconomic at scale. Running only the ambiguous ones is cheap and accurate.
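
To make the ordering concrete, here is a minimal sketch of the three-stage flow described above. It is not Spam Slayer's actual code: the names (triage, SenderLists), the cutoffs of 30 and 70, the convention that a higher score means more spam-like, and the choice to let the allowlist beat the blocklist on a conflict are all assumptions made for illustration.

```typescript
// Illustrative sketch only; names and thresholds are assumptions.
type Outcome = "clean" | "spam" | "gray";
type Stage = "list" | "rules";

interface Triage {
  outcome: Outcome; // "gray" means: the rules could not decide, ask the LLM
  stage: Stage;     // which stage produced the outcome
}

interface SenderLists {
  allow: Set<string>; // lowercase addresses or domains the user trusts
  block: Set<string>; // lowercase addresses or domains the user blocked
}

// Hypothetical cutoffs on the 0-100 rule score (higher = more spam-like).
const CLEAN_AT_OR_BELOW = 30;
const SPAM_AT_OR_ABOVE = 70;

function triage(sender: string, ruleScore: number, lists: SenderLists): Triage {
  const address = sender.trim().toLowerCase();
  const domain = address.slice(address.lastIndexOf("@") + 1);
  const matches = (list: Set<string>): boolean =>
    list.has(address) || list.has(domain);

  // Stage 1: explicit user preferences override everything else.
  // Assumption: the allowlist wins if a sender appears on both lists.
  if (matches(lists.allow)) return { outcome: "clean", stage: "list" };
  if (matches(lists.block)) return { outcome: "spam", stage: "list" };

  // Stage 2: deterministic rule score settles the clear-cut cases.
  if (ruleScore <= CLEAN_AT_OR_BELOW) return { outcome: "clean", stage: "rules" };
  if (ruleScore >= SPAM_AT_OR_ABOVE) return { outcome: "spam", stage: "rules" };

  // Stage 3 is the caller's job: only "gray" results reach the LLM.
  return { outcome: "gray", stage: "rules" };
}
```

In a worker built this way, the LLM call would sit behind a check for outcome === "gray", so its cost scales with the number of ambiguous messages rather than with total volume.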

The Boring Parts That Actually Mattered

The pipeline was the fun part. The production-grade IMAP plumbing was where most of the time went:

Worker with IDLE + polling fallback: IMAP IDLE is great when it works and fails silently when it doesn't, so both paths run at once.
UID checkpointing: no message gets processed twice. Sounds obvious. Took three iterations to get right.
Credential encryption: AES-256-GCM with key-ID-based decryption fallback and an admin rotation workflow. Storing IMAP passwords correctly is non-negotiable.
Circuit breaker: when a mailbox fails repeatedly (bad creds, server outage), it pauses itself and sends a critical alert instead of hammering the server forever.
Junk folder auto-detection: every IMAP provider names theirs differently. The validator figures it out.
Stuck-mailbox detector + failure class telemetry: so I can see which kind of failure is happening, not just that something failed.
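
For the credential-encryption item, one way key-ID-based decryption can work is sketched below using Node's built-in crypto module. The record layout (EncryptedSecret), the in-memory key ring, and the function names are assumptions for illustration, not the production schema. The idea is that each ciphertext remembers which key sealed it, so secrets written before a rotation stay readable until they are re-encrypted.

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from "node:crypto";

// Hypothetical stored record: everything needed to decrypt except the key itself.
interface EncryptedSecret {
  keyId: string; // which key in the ring sealed this record
  iv: string;    // base64, 12-byte nonce, unique per encryption
  tag: string;   // base64 GCM authentication tag
  data: string;  // base64 ciphertext
}

// Assumption: keys are 32-byte Buffers loaded from a secret store, indexed by ID.
type KeyRing = Map<string, Buffer>;

function encryptSecret(plaintext: string, keyId: string, keys: KeyRing): EncryptedSecret {
  const key = keys.get(keyId);
  if (!key) throw new Error(`unknown key id: ${keyId}`);
  const iv = randomBytes(12);
  const cipher = createCipheriv("aes-256-gcm", key, iv);
  const data = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  return {
    keyId,
    iv: iv.toString("base64"),
    tag: cipher.getAuthTag().toString("base64"),
    data: data.toString("base64"),
  };
}

function decryptSecret(secret: EncryptedSecret, keys: KeyRing): string {
  // Look up the key the record was sealed with, not the current key.
  const key = keys.get(secret.keyId);
  if (!key) throw new Error(`unknown key id: ${secret.keyId}`);
  const decipher = createDecipheriv("aes-256-gcm", key, Buffer.from(secret.iv, "base64"));
  decipher.setAuthTag(Buffer.from(secret.tag, "base64"));
  // final() throws if the tag does not verify (wrong key or tampered data).
  const plain = Buffer.concat([
    decipher.update(Buffer.from(secret.data, "base64")),
    decipher.final(),
  ]);
  return plain.toString("utf8");
}

// Rotation step: decrypt under the old key, re-encrypt under the new one.
function rotateSecret(secret: EncryptedSecret, newKeyId: string, keys: KeyRing): EncryptedSecret {
  return encryptSecret(decryptSecret(secret, keys), newKeyId, keys);
}
```

A rotation job in this style would walk the stored records, call rotateSecret on any whose keyId is not the current one, and retire the old key only once none remain.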
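
The circuit-breaker item can be sketched in a few lines. This is a simplified illustration, not the production implementation: the class name, the consecutive-failure threshold, and the onTrip alert callback are all assumptions, and a real breaker would likely also distinguish failure classes and support timed retries.

```typescript
// Illustrative per-mailbox circuit breaker; all names are assumptions.
class MailboxBreaker {
  private consecutiveFailures = 0;
  private open = false;
  private readonly maxFailures: number;
  private readonly onTrip: (reason: string) => void;

  constructor(maxFailures: number, onTrip: (reason: string) => void) {
    this.maxFailures = maxFailures;
    this.onTrip = onTrip;
  }

  // The worker checks this before touching the mailbox.
  get paused(): boolean {
    return this.open;
  }

  // Any successful sync clears the failure streak.
  recordSuccess(): void {
    this.consecutiveFailures = 0;
  }

  // Trips once after maxFailures failures in a row and alerts exactly once.
  recordFailure(reason: string): void {
    if (this.open) return;
    this.consecutiveFailures += 1;
    if (this.consecutiveFailures >= this.maxFailures) {
      this.open = true;
      this.onTrip(reason);
    }
  }

  // Called after a human fixes the cause (for example, new credentials).
  reset(): void {
    this.consecutiveFailures = 0;
    this.open = false;
  }
}
```

Counting consecutive failures rather than total failures means an occasional timeout never pauses a healthy mailbox, while a run of bad-credential errors stops the retries quickly.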

The Stack

Next.js 16 (React 19) + TypeScript on the web. A separate long-running Node 22 worker for IMAP. PostgreSQL + Prisma. Pino for structured logging with explicit redaction policy for sensitive fields. Argon2 for password hashing. JWT cookie sessions + CSRF checks.

Deployment is a Render blueprint with web + worker + scheduled-checks + Postgres. Health endpoint with worker heartbeat so I can spot a wedged worker before users do.
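
The heartbeat check behind a health endpoint like that can be quite small. The sketch below assumes the worker periodically writes a timestamp somewhere the web process can read it (a Postgres row, for instance) and that the endpoint compares it against a staleness limit. The function name, the report shape, and the limit are hypothetical.

```typescript
// Illustrative heartbeat staleness check; names and shapes are assumptions.
interface WorkerHealth {
  status: "ok" | "degraded";
  heartbeatAgeMs: number | null; // null when no heartbeat has ever been recorded
}

function checkWorkerHealth(
  lastHeartbeat: Date | null,
  now: Date,
  maxAgeMs: number,
): WorkerHealth {
  // A worker that has never reported in is treated as unhealthy.
  if (lastHeartbeat === null) {
    return { status: "degraded", heartbeatAgeMs: null };
  }
  const heartbeatAgeMs = now.getTime() - lastHeartbeat.getTime();
  return {
    status: heartbeatAgeMs <= maxAgeMs ? "ok" : "degraded",
    heartbeatAgeMs,
  };
}
```

Passing now in as a parameter keeps the function pure, which makes the stale and fresh cases easy to cover in tests.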

The Lesson

I came into this thinking the AI classifier was the product. By the time I was done, I realized the reliability layer was the product. Anyone can build a one-shot LLM email classifier in an afternoon. Almost nobody builds the IDLE-reconnect, UID-checkpoint, credential-encryption, circuit-breaker, audit-trail infrastructure that makes that classifier safe to point at a real inbox.

That gap — between "AI demo" and "production system" — is where the actual moat lives.