During this incident, outbound message delivery was delayed for all customers provisioned in the US region.
Impact Period: 16:40 Oct 28 - 01:00 Oct 29 UTC
Impact to Customers: No messages were delivered between 16:40 and 19:15 UTC on Oct 28. From 19:15 UTC until as late as 01:00 UTC on Oct 29, delivery attempts were delayed while the backlog of queued messages cleared for all customers. Message injection (both the REST Transmissions API and the SMTP API) was not impacted and remained fully operational for the duration of the incident.
This incident was precipitated by a networking issue with our cloud service provider: the network that routes messages from SparkPost to the internet failed. After our cloud service provider fixed the issue, however, we did not recover as expected because of a bottleneck in our architecture. It took several hours to process the queued messages, which resulted in longer time-to-first-attempt metrics. Our corrective actions include: (1) reviewing our architecture both internally and with our providers, (2) building resiliency against the type of network failure that precipitated this event, and (3) making improvements that shorten recovery time for this type of failure, reducing the impact to our customers.