
The Cold Email Disaster Recovery SOP Every Agency Should Have

Every cold email agency will eventually face a deliverability crisis. The agencies that handle it well are the ones who planned for it in advance. Here's the SOP.

Deliverability crises aren't a question of if; they're a question of when. Domains burn, inboxes get flagged, and clients get angry. The agencies that get through these moments without losing clients are the ones with a documented response process in place before the crisis happens.

Part 1: Prevention infrastructure

The best disaster recovery starts before disaster hits.

Monitoring baseline

Run weekly placement tests on every active sending domain so that problems surface before they become crises. A domain whose test emails have landed in the Promotions tab for two weeks running is a warning sign; a domain that suddenly drops to spam is a crisis. Weekly testing gives you that early warning.
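
One way to turn that rule of thumb into something a script can check is sketched below. It assumes each weekly test is recorded as one of three labels ("inbox", "promotions", "spam"); the labels and the function name `classify_domain` are illustrative, not part of any placement-testing tool.

```python
from typing import List


def classify_domain(weekly_results: List[str]) -> str:
    """Classify a domain from its weekly placement results, oldest first.

    Each entry is assumed to be "inbox", "promotions", or "spam".
    """
    if not weekly_results:
        return "no-data"
    # A spam result in the most recent test is treated as a crisis.
    if weekly_results[-1] == "spam":
        return "crisis"
    # Two consecutive weeks in promotions is treated as an early warning.
    if weekly_results[-2:] == ["promotions", "promotions"]:
        return "warning"
    return "healthy"


if __name__ == "__main__":
    print(classify_domain(["inbox", "promotions", "promotions"]))
    print(classify_domain(["inbox", "inbox", "spam"]))
```

Running this over every active domain after each weekly test gives a short list of domains to look at before anything reaches the client.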

Backup inbox pool

Maintain a reserve of pre-warmed inboxes equal to 25–30% of your active inbox pool. These should be warmed, configured correctly, and ready to deploy within hours. This is your emergency capacity — don't use it for normal campaigns.
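
As a quick worked example of the 25–30% rule, the sketch below computes the reserve range for a given active pool. Rounding up to whole inboxes is an assumption made here, and `reserve_range` is a hypothetical helper name.

```python
from typing import Tuple


def reserve_range(active_inboxes: int, low_pct: int = 25, high_pct: int = 30) -> Tuple[int, int]:
    """Return the (minimum, maximum) reserve size, rounded up to whole inboxes."""
    if active_inboxes < 0:
        raise ValueError("active_inboxes must be non-negative")
    # Integer ceiling division avoids floating-point rounding surprises.
    low = (active_inboxes * low_pct + 99) // 100
    high = (active_inboxes * high_pct + 99) // 100
    return low, high


def has_sufficient_reserve(active_inboxes: int, reserve_inboxes: int) -> bool:
    """True when the reserve meets at least the lower bound of the range."""
    return reserve_inboxes >= reserve_range(active_inboxes)[0]


if __name__ == "__main__":
    print(reserve_range(40))
```

For an active pool of 40 inboxes this gives a reserve of 10 to 12 pre-warmed inboxes.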

Infrastructure segregation

Never put all of one client's campaigns on the same domain. Spread each client across multiple domains and keep different clients on different domain pools. When one domain fails, it only affects part of one client's campaign — not everything.
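
A segregation rule like this is easy to audit automatically. The following is a minimal sketch that assumes you keep a mapping of client name to sending domains; the data shape, the two-domain minimum, and the name `segregation_issues` are all assumptions for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Set


def segregation_issues(assignments: Dict[str, Set[str]], min_domains: int = 2) -> List[str]:
    """Flag clients with too few domains and domains shared between clients."""
    issues: List[str] = []
    owners: Dict[str, Set[str]] = defaultdict(set)
    for client, domains in assignments.items():
        # A client on a single domain loses everything when that domain fails.
        if len(domains) < min_domains:
            issues.append(f"{client}: only {len(domains)} domain(s)")
        for domain in domains:
            owners[domain].add(client)
    for domain, clients in sorted(owners.items()):
        # A shared domain lets one client's problem spill over to another.
        if len(clients) > 1:
            issues.append(f"{domain}: shared by {', '.join(sorted(clients))}")
    return issues


if __name__ == "__main__":
    pools = {
        "acme": {"a1.example", "a2.example"},
        "globex": {"a2.example"},
    }
    for issue in segregation_issues(pools):
        print(issue)
```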

Documentation

Every client domain should have documented: auth configuration, sending limits, warmup date, sending history, and the current backup infrastructure designated for that client.
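
The documentation can live in a spreadsheet, but a structured record makes gaps easier to spot. The sketch below mirrors the fields listed above; the class name, field names, and types are assumptions rather than a required schema.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Dict, List


@dataclass
class DomainRecord:
    domain: str
    client: str
    auth_config: Dict[str, str]   # e.g. SPF, DKIM, and DMARC records as published
    daily_send_limit: int
    warmup_date: date
    sending_history: List[str] = field(default_factory=list)
    backup_domains: List[str] = field(default_factory=list)

    def missing_fields(self) -> List[str]:
        """Return the names of fields that still need to be filled in."""
        missing = []
        if not self.auth_config:
            missing.append("auth_config")
        if self.daily_send_limit <= 0:
            missing.append("daily_send_limit")
        if not self.backup_domains:
            missing.append("backup_domains")
        return missing


if __name__ == "__main__":
    record = DomainRecord(
        domain="a1.example",
        client="acme",
        auth_config={"spf": "v=spf1 include:example.net ~all"},
        daily_send_limit=40,
        warmup_date=date(2024, 1, 15),
    )
    print(record.missing_fields())
```

Here the record is flagged because no backup infrastructure has been designated for the client yet.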

Part 2: Crisis detection

Alert triggers

Define what triggers a deliverability incident for your agency:

  • Open rate drops more than 30% week-over-week on any domain
  • Bounce rate exceeds 3% in any campaign
  • Placement test shows spam on any active sending domain
  • Client reports emails going to spam
  • ESP dashboard shows sending account warnings
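
The triggers above translate directly into a check that can run on a schedule. This is a sketch under a few assumptions: rates are stored as fractions between 0 and 1, the 30% open-rate drop is read as a relative drop, and the bounce rate is tracked per domain snapshot rather than per campaign. `DomainSnapshot` and `triggered_alerts` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DomainSnapshot:
    open_rate_prev_week: float   # fraction, e.g. 0.50 for 50%
    open_rate_this_week: float
    bounce_rate: float           # fraction, e.g. 0.03 for 3%
    placement_result: str        # assumed labels: "inbox", "promotions", "spam"
    client_reported_spam: bool
    esp_warning: bool


def triggered_alerts(s: DomainSnapshot) -> List[str]:
    """Return the names of every incident trigger the snapshot trips."""
    alerts: List[str] = []
    if s.open_rate_prev_week > 0:
        drop = (s.open_rate_prev_week - s.open_rate_this_week) / s.open_rate_prev_week
        if drop > 0.30:
            alerts.append("open_rate_drop")
    if s.bounce_rate > 0.03:
        alerts.append("bounce_rate")
    if s.placement_result == "spam":
        alerts.append("placement_spam")
    if s.client_reported_spam:
        alerts.append("client_report")
    if s.esp_warning:
        alerts.append("esp_warning")
    return alerts


if __name__ == "__main__":
    snapshot = DomainSnapshot(0.50, 0.30, 0.02, "inbox", False, False)
    print(triggered_alerts(snapshot))
```

Any non-empty result would open an incident and start the first-30-minutes response.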

Immediate response (first 30 minutes)

  1. Pause all sequences on the affected infrastructure
  2. Run the full diagnostic stack: authentication checks (SPF, DKIM, DMARC), blacklist check, placement test, tracking domain check
  3. Classify the issue: technical / reputation-recoverable / reputation-replace
  4. Do not communicate with client until classification is complete
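
The classification step can be sketched as a small decision function over the diagnostic results. The SOP names the three classes but not the exact criteria for telling them apart, so the rules here are assumptions: any auth or tracking failure counts as technical, and the blacklist-count threshold separating "recoverable" from "replace" is a placeholder to tune against your own history.

```python
from dataclasses import dataclass


@dataclass
class Diagnostics:
    auth_ok: bool               # authentication records pass
    tracking_domain_ok: bool    # tracking domain resolves and is not flagged
    blacklist_hits: int         # number of blacklists listing the domain or IP
    placement_result: str       # assumed labels: "inbox", "promotions", "spam"


def classify_incident(d: Diagnostics, replace_threshold: int = 3) -> str:
    """Map diagnostic results to one of the three incident classes.

    replace_threshold is a hypothetical cut-off, not a value from the SOP.
    """
    # Configuration problems are checked first because they are fixable in hours.
    if not d.auth_ok or not d.tracking_domain_ok:
        return "technical"
    if d.blacklist_hits >= replace_threshold:
        return "reputation-replace"
    if d.blacklist_hits > 0 or d.placement_result == "spam":
        return "reputation-recoverable"
    return "no-issue-found"


if __name__ == "__main__":
    print(classify_incident(Diagnostics(False, True, 0, "spam")))
    print(classify_incident(Diagnostics(True, True, 1, "spam")))
```

The three classes map onto resolution Paths A, B, and C respectively.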

Part 3: Resolution paths

Path A: Technical fix (resolution in hours)

  1. Fix the broken auth record or tracking configuration
  2. Verify DNS propagation
  3. Rerun placement test to confirm fix
  4. Resume sending if placement test passes
  5. Send client a brief update: issue identified and resolved, no campaign impact

Path B: Reputation recovery (resolution in 2–8 weeks)

  1. Move campaign to backup infrastructure immediately
  2. Submit blacklist delisting requests with documentation
  3. Reduce or halt sending on damaged domain
  4. Begin 2-week pause on damaged inboxes
  5. Retest after 2 weeks; if clean, resume at 20% volume and ramp over 4 weeks
  6. Send client a technical summary (see communication template below)
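
To make the ramp in step 5 concrete, here is a sketch that turns "resume at 20% volume and ramp over 4 weeks" into weekly send volumes. It assumes a linear ramp that reaches full volume in the final week; the SOP does not specify the shape of the ramp, and `ramp_schedule` is a hypothetical helper.

```python
from typing import List


def ramp_schedule(normal_volume: int, start_pct: int = 20, weeks: int = 4) -> List[int]:
    """Return the daily send volume for each week of the ramp.

    Week 1 starts at start_pct of normal volume; the last week is 100%.
    """
    if weeks < 2:
        raise ValueError("ramp needs at least 2 weeks")
    schedule = []
    for i in range(weeks):
        # Linear interpolation between start_pct and 100, in whole percent.
        pct = start_pct + ((100 - start_pct) * i) // (weeks - 1)
        schedule.append(normal_volume * pct // 100)
    return schedule


if __name__ == "__main__":
    print(ramp_schedule(100))
```

For a domain that normally sends 100 emails a day, this yields 20, 46, 73, and 100 per day across the four weeks.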

Path C: Full replacement (resolution in hours with pre-warmed inboxes, or 4–6 weeks with fresh ones)

  1. Deploy backup pre-warmed inboxes to client campaign
  2. Migrate campaign sequences to new infrastructure
  3. Run placement test on new infrastructure before first send
  4. Begin blacklist removal on retired infrastructure (for future use)
  5. Send client update: infrastructure upgraded, campaign continuing

Part 4: Client communication templates

Initial communication (within 2 hours of detection)

"We've identified a technical issue affecting deliverability on one of your sending domains. We're investigating and will have a full update within 2 hours. Sequences are paused on the affected domain while we assess."

Technical summary (within 24 hours)

"Root cause: [specific issue — e.g., DKIM key rotation by our ESP wasn't reflected in DNS]. Impact: emails sent from [domain] between [dates] may have had reduced placement rates. Resolution: [fix applied]. Campaign status: migrated to backup infrastructure, sends continuing normally. No action needed from you."

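Keeping the technical summary as a fill-in template helps make sure nothing is sent with a bracket placeholder left in. The sketch below uses Python string formatting; the field names are assumptions, and the date range and campaign status from the template above are split into separate fields for illustration.

```python
TECHNICAL_SUMMARY = (
    "Root cause: {root_cause}. Impact: emails sent from {domain} between "
    "{start_date} and {end_date} may have had reduced placement rates. "
    "Resolution: {resolution}. Campaign status: {campaign_status}. "
    "No action needed from you."
)

REQUIRED_FIELDS = (
    "root_cause", "domain", "start_date", "end_date", "resolution", "campaign_status",
)


def render_summary(**fields: str) -> str:
    """Fill in the template, refusing to render if any field is missing or empty."""
    missing = [name for name in REQUIRED_FIELDS if not fields.get(name)]
    if missing:
        raise ValueError(f"missing fields: {', '.join(missing)}")
    return TECHNICAL_SUMMARY.format(**fields)


if __name__ == "__main__":
    print(render_summary(
        root_cause="DKIM key rotation by our ESP wasn't reflected in DNS",
        domain="a1.example",
        start_date="March 1",
        end_date="March 3",
        resolution="DNS record updated and verified",
        campaign_status="migrated to backup infrastructure, sends continuing normally",
    ))
```
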
Part 5: Post-mortem

After every incident, document:

  • What happened and when it was detected
  • How long between failure and detection
  • How long between detection and resolution
  • Campaign impact (emails sent during failure, estimated placement drop)
  • What would have caught this earlier
  • What process change prevents this
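
The two timing questions in that list are simple subtractions once the timestamps are recorded. A minimal sketch, assuming you log when the failure began, when it was detected, and when it was resolved (`incident_durations` is a hypothetical helper):

```python
from datetime import datetime, timedelta
from typing import Dict


def incident_durations(
    failed_at: datetime, detected_at: datetime, resolved_at: datetime
) -> Dict[str, timedelta]:
    """Return time-to-detect and time-to-resolve for one incident."""
    if not failed_at <= detected_at <= resolved_at:
        raise ValueError("timestamps must be in order: failed, detected, resolved")
    return {
        "time_to_detect": detected_at - failed_at,
        "time_to_resolve": resolved_at - detected_at,
    }


if __name__ == "__main__":
    durations = incident_durations(
        failed_at=datetime(2024, 3, 1, 9, 0),
        detected_at=datetime(2024, 3, 3, 9, 0),
        resolved_at=datetime(2024, 3, 3, 15, 0),
    )
    for name, value in durations.items():
        print(name, value)
```

Tracking these two numbers across incidents shows whether monitoring or response is the slower step.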

Agencies that skip post-mortems tend to hit the same failure modes again and again. A documented incident review often teaches more than extra monitoring setup, because it shows exactly which check would have caught the failure.

Pre-crisis readiness checklist

  • Weekly placement tests scheduled for all active domains
  • Backup inbox pool maintained at 25–30% of active pool
  • Each client's domains documented with auth config and limits
  • Alert triggers defined and monitored
  • Client communication templates written and accessible
  • Path A / B / C resolution playbooks documented
  • Designated person responsible for incident response

Agencies that handle crises best are the ones with pre-warmed backup infrastructure already on standby. WarmInboxes is one source for pre-warmed inboxes that campaigns can be moved onto within hours. Set this up before a crisis, not during one.

Run the checks first

Before replacing anything, run a free inbox placement test. You might find the issue is DNS, not the domain — and save yourself a week of unnecessary work.
