Click and open tracking errors for some SparkPost and SparkPost Enterprise customers (US hosted only)
Incident Report for SparkPost
Postmortem

Summary

On Wednesday, March 18th, click redirection and open tracking for the domain “go.sparkpostmail.com” were temporarily unavailable from 16:50 EDT to 18:45 EDT (20:50 to 22:45 UTC). During this time, recipients who clicked a link in a received message saw a blank page or a 503 error, and no opens were recorded. Other default tracking domains and custom tracking domains were not impacted.

Timeline and Impact - March 18th, all times in UTC

  • 20:50 - A scheduled change to load balancer settings incorrectly sets the health check target for go.sparkpostmail.com. [IMPACT PERIOD BEGINS]
  • 21:05 - First customer reports redirect links not working.
  • 22:19 - More customers report issues with link redirection; the Support team escalates to the Engineering on-call, who begins investigating.
  • 22:45 - A fix is deployed to the load balancer settings and the issue is resolved. [IMPACT PERIOD ENDS]
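
The impact window in the timeline can be cross-checked against US Eastern local time. The following is a minimal Python sketch, assuming a fixed UTC-4 offset for Eastern Daylight Time (in effect on March 18, 2020); the variable names are illustrative only.

```python
from datetime import datetime, timedelta, timezone

# Assumption: US Eastern Daylight Time, a fixed UTC-4 offset on this date.
EDT = timezone(timedelta(hours=-4), "EDT")

# Impact period boundaries taken from the timeline (UTC).
impact_start = datetime(2020, 3, 18, 20, 50, tzinfo=timezone.utc)
impact_end = datetime(2020, 3, 18, 22, 45, tzinfo=timezone.utc)

duration = impact_end - impact_start
start_local = impact_start.astimezone(EDT).strftime("%H:%M %Z")
end_local = impact_end.astimezone(EDT).strftime("%H:%M %Z")

print(f"Impact: {start_local} to {end_local} ({duration})")
# prints "Impact: 16:50 EDT to 18:45 EDT (1:55:00)"
```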

Root Cause

During a routine deployment, an engagement tracking load balancer target was misconfigured, which caused the load balancer (LB) to mark all hosts as unhealthy/unavailable. The target was set incorrectly because the script used to set health check values contained an error that was not caught during the review process. The breakage was not caught immediately after deployment because the deployment playbook lacked a clear and complete regression test procedure. It also went unnoticed afterward because these particular endpoints were not adequately monitored.
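
The failure mode described above, where a bad health check target causes every host to be marked unhealthy, is the kind of error a pre-apply validation step can catch. The following is a minimal Python sketch of such a guard, not SparkPost's actual tooling: the function names, the `/health` path, and the host names are all hypothetical, and the probe is passed in as a parameter so the check can be exercised without network access.

```python
from typing import Callable, Iterable, List


def hosts_passing(hosts: Iterable[str], target_path: str,
                  probe: Callable[[str, str], int]) -> List[str]:
    """Return the hosts whose probe of target_path yields a 2xx status."""
    return [h for h in hosts if 200 <= probe(h, target_path) < 300]


def validate_health_check(hosts: Iterable[str], target_path: str,
                          probe: Callable[[str, str], int]) -> None:
    """Raise ValueError if the proposed target would leave no healthy hosts."""
    hosts = list(hosts)
    healthy = hosts_passing(hosts, target_path, probe)
    if not healthy:
        raise ValueError(
            f"health check target {target_path!r} fails on all "
            f"{len(hosts)} hosts; refusing to apply"
        )


# Hypothetical stand-in for a real HTTP probe: only "/health" exists.
def fake_probe(host: str, path: str) -> int:
    return 200 if path == "/health" else 404


hosts = ["tracker-1.example", "tracker-2.example"]
validate_health_check(hosts, "/health", fake_probe)  # valid target, no error

try:
    validate_health_check(hosts, "/helth", fake_probe)  # mistyped target
    rejected = False
except ValueError:
    rejected = True
```

In a real deployment the probe would issue an HTTP request to each backend host; how that is wired into a change process is an assumption here, not something the report specifies.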

Corrective Actions

  1. Audit and address gaps in monitoring of load balancers and their health status. [IN PROGRESS, target March 25, 2020]
  2. Audit monitoring of all SparkPost default tracking domains. [IN PROGRESS, target March 27, 2020]
  3. Update the Change Management process to ensure sufficient regression testing procedures and peer review are in place for this type of configuration change. [IN PROGRESS, target March 27, 2020]

[REF RCA-359]

Posted Mar 19, 2020 - 15:06 EDT

Resolved
RESOLVED: Open and click-tracking links using the domain go.sparkpostmail.com were returning a 503 error for a period of time earlier today. This is one of the default domains provided by SparkPost. Custom tracking domains were not affected.
Posted Mar 18, 2020 - 18:43 EDT