This issue has been resolved and all services are stable.
Posted about 2 years ago. Aug 03, 2017 - 17:07 UTC
We have been tracking an intermittent core switching issue that began late Saturday 7/29 around 14:30 UTC and has affected service to a segment of customers in Atlanta. The condition has been highly intermittent: traffic fails to flow for several minutes, then flows freely again for many hours before the failure recurs.
Logs and NMS reports initially pointed us to a single 10Gb optic that had been identified as failed, and we replaced it to repair what appeared to be an isolated issue. Unfortunately, the failure condition resurfaced early Sunday morning. Since then, on-site staff have been re-arranging network paths to isolate the issue.
More recently, at around 10:30 UTC on 7/30, the issue expanded, pointing to a multi-port 10Gb switch line card which unexpectedly crashed and rebooted itself. We have since moved all network traffic off that card and are in the process of replacing it entirely. This development has extended the impact of this incident to a more significant portion of Atlanta customers.
We do not expect any further intermittent service disruption related to this incident. However, because the issue has been highly intermittent and has expanded in scope, we will monitor it closely for 48+ hours before officially closing it out as resolved.
Posted about 2 years ago. Jul 30, 2017 - 15:46 UTC