Summary
On Wednesday, March 18th, click redirection and open tracking for the domain “go.sparkpostmail.com” were temporarily unavailable from 16:50 EST to 18:45 EST. During this time, recipients who clicked a link in a received message saw a blank page or a 503 error, and no opens were recorded. Other default tracking domains and custom tracking domains were not impacted.
Timeline and Impact - March 18th, all times in UTC
Root Cause
During a routine deployment, an engagement tracking load balancer (LB) target was misconfigured, which caused the LB to mark all hosts as unhealthy/unavailable. The target was set incorrectly because the script that was run to set health check values contained an error that was not caught during the review process. The breakage was not caught immediately after deployment because the deployment playbook lacked a clear and complete regression test procedure. The error also went unnoticed because these particular endpoints were not adequately monitored.
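As an illustration only, the two gaps described above (an unvalidated health check change and no post-deployment regression check) could be guarded against with something like the sketch below. It is not the tooling used in this incident: the `HealthCheck` fields, the validation rules, and the `host_states` mapping are all hypothetical, and a real check would read these values from the load balancer's own API.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class HealthCheck:
    """Hypothetical health check settings for a load balancer target."""
    path: str
    port: int
    expected_status: int
    interval_seconds: int
    timeout_seconds: int


def validate_health_check(check: HealthCheck) -> list[str]:
    """Return a list of problems; an empty list means the settings look sane."""
    problems = []
    if not check.path.startswith("/"):
        problems.append("path must start with '/'")
    if not 1 <= check.port <= 65535:
        problems.append("port must be in 1..65535")
    if not 200 <= check.expected_status <= 399:
        problems.append("expected_status should be a 2xx or 3xx code")
    if check.timeout_seconds >= check.interval_seconds:
        problems.append("timeout must be shorter than interval")
    return problems


def deployment_is_healthy(host_states: dict[str, bool], min_healthy: int = 1) -> bool:
    """Post-deployment gate: require at least min_healthy hosts marked healthy."""
    healthy = sum(1 for ok in host_states.values() if ok)
    return healthy >= min_healthy
```

Running `validate_health_check` before applying a change, and `deployment_is_healthy` as a playbook step afterwards, would flag the case where every host is reported unhealthy before recipients are affected.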
Corrective Actions
[REF RCA-359]