CenturyLink/Level 3 Outage
Incident Report for MacStadium
Postmortem

The following is based on a Reason For Outage document provided by CenturyLink:

Incident Start: August 30, 2020 10:04 GMT
Incident Clear: August 30, 2020 15:10 GMT
Affecting Internet Services in Multiple Markets

Cause
A problematic Flowspec announcement prevented Border Gateway Protocol (BGP) sessions from establishing correctly, which significantly affected client services.

Resolution
CenturyLink deployed a configuration change to block the problematic Flowspec announcement, which restored services to normal.

Summary
On August 30, 2020 at 10:04 GMT, CenturyLink became aware of an issue that was starting to affect customers in multiple markets. CenturyLink teams were immediately engaged, and they began an intensive investigation. They were unable to determine a cause at first but attempted to deploy potential solutions.

By 10:52 GMT, the effects on MacStadium customers had grown, and the NOC escalated the incident to the Incident Manager and Engineering Team, who immediately began analyzing the situation based on information from monitoring alerts and customers. The team attempted to contact CenturyLink but, as of 11:36 GMT, was still unable to get through because of overwhelming demand on CenturyLink support.

By 12:18 GMT, the Engineering Team had shut down the CenturyLink links to temporarily bypass the CenturyLink issues within MacStadium’s own network. MacStadium network traffic returned to normal, but some traffic was still affected because intermediate carriers across the globe depended on CenturyLink.

At approximately 14:00 GMT, CenturyLink identified a problematic Flowspec announcement, a mechanism used to manage routing rules, that was preventing Border Gateway Protocol (BGP) sessions from establishing as intended. The original source was determined to be the unintentional introduction of wildcards into a rule intended to block a single IP address.
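The effect of a wildcard in a rule meant for a single IP can be illustrated with a minimal sketch. This is not CenturyLink's configuration or the real Flowspec encoding, neither of which was provided; the `FlowRule` class, the `dropped` helper, and the addresses (taken from reserved documentation ranges) are hypothetical. The only external fact assumed is that BGP sessions run over TCP port 179.

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional

BGP_PORT = 179  # BGP sessions run over TCP port 179


@dataclass
class FlowRule:
    """Hypothetical, simplified stand-in for a Flowspec-style discard rule."""
    dst_prefix: str                 # destination prefix the rule matches
    dst_port: Optional[int] = None  # None acts as a wildcard: any port

    def matches(self, dst_ip: str, dst_port: int) -> bool:
        in_prefix = ipaddress.ip_address(dst_ip) in ipaddress.ip_network(self.dst_prefix)
        port_ok = self.dst_port is None or self.dst_port == dst_port
        return in_prefix and port_ok


def dropped(rule, packets):
    """Return the packets that the discard rule would drop."""
    return [p for p in packets if rule.matches(*p)]


# (destination IP, destination port) pairs; all addresses are illustrative.
packets = [
    ("203.0.113.7", 80),         # traffic to the single IP meant to be blocked
    ("198.51.100.1", BGP_PORT),  # BGP session traffic to a peer router
    ("192.0.2.55", 443),         # unrelated customer traffic
]

intended = FlowRule("203.0.113.7/32")  # matches exactly one address
wildcard = FlowRule("0.0.0.0/0")       # wildcard prefix: matches every address

print(dropped(intended, packets))
print(dropped(wildcard, packets))
```

In this sketch the intended rule drops only traffic to the one target address, while the wildcard rule also drops the BGP session traffic, which is one way a rule of this kind could stop BGP from establishing.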

At 14:14 GMT, CenturyLink deployed a global configuration change to block the problematic Flowspec announcement. The change began propagating across devices, and the problematic announcement was successfully removed. This allowed BGP sessions to establish correctly again. By 15:10 GMT, all alarms had cleared and services had returned to normal.

After an observation and monitoring period, MacStadium restored CenturyLink links at 04:55 GMT on August 31, 2020.

Posted Sep 08, 2020 - 01:16 UTC

Resolved
We have detected no further issues related to this incident. CenturyLink/Level 3 confirms full restoration of services, and our monitoring will continue as usual.
Posted Aug 31, 2020 - 20:07 UTC
Monitoring
CenturyLink/Level 3 has recovered. MacStadium has re-introduced CenturyLink Internet routing for Atlanta, Las Vegas, and Dublin alongside our other transit carrier partners. Network traffic remains nominal, but the Network Team will continue monitoring for possible anomalies.
Posted Aug 31, 2020 - 13:09 UTC
Identified
CenturyLink/Level 3 is experiencing a widespread outage internationally. MacStadium has rerouted traffic to other providers, and all traffic is restored. Customers might still see problems at a global scale due to CenturyLink's outage. We are working with this provider now to identify an estimated time to restoration.
Posted Aug 30, 2020 - 11:45 UTC
Investigating
CenturyLink/Level 3 Communications is experiencing a widespread network outage affecting some customers. We are now rerouting traffic from those affected links.
Posted Aug 30, 2020 - 11:29 UTC
This incident affected: Atlanta Data Center, Dublin Data Center, and Las Vegas Data Center.