NOTICE TIME ZONE |
Brisbane GMT+10 |
STATUS (Open/Closed) |
Closed |
INCIDENT START DATE |
20200615 |
INCIDENT START TIME (HH:MM) |
05:15 |
ESTIMATED TIME TO RESOLUTION |
N/A |
INCIDENT END DATE |
20200615 |
INCIDENT END TIME (HH:MM) |
09:38 |
OUTAGE DURATION |
4 Hours 23 Minutes |
INCIDENT CAUSED BY A 3RD PARTY? |
Yes |
IF YES, NAME OF 3RD PARTY |
ECN – https://ecn.net.au |
XCRM TICKET NUMBER |
Not Applicable |
BRAND |
XNET |
PRIORITY |
P2 |
CUSTOMERS AFFECTED |
Some or ALL services for X0, X32, X26, X83, X92, X44, X40, X73, X52, X91, X33 |
DESCRIPTION OF INCIDENT |
A fiber link running between NEXTDC B1 and ECN data centers was not functioning |
DESCRIPTION IMPACT |
Users may not be able to access their cloud-based phone systems, servers, virtual desktops, and websites |
Primary Effect |
Impacted clients had no access to a range of services |
Secondary Effect |
None |
Residual Effect |
None expected once the problem is resolved |
EVENT TIMELINE |
|
|
Thursday 25th June 2020 |
06:15 |
Tod from XSTRA confirmed that a major piece of infrastructure was down |
06:30 |
Luke from XSTRA en route to the ECN data center to investigate |
07:05 |
All XSTRA infrastructure and services confirmed to be working correctly. Identified the issue as the link between the ECN data center and the NEXT DC data center |
07:10 – 08:15 |
XSTRA made repeated unsuccessful attempts to contact ECN staff |
08:20 |
ECN staff arrived and said that they had an issue with the link last night and that the problem had been fixed. XSTRA explained that the problem was not fixed. ECN left to investigate. |
08:35 |
ECN reconfirmed that everything was fine at their end, but XSTRA maintained that the issue was with ECN |
08:50 |
XSTRA decided to move its last remaining infrastructure out of ECN to NEXT DC B2, with resolution estimated for 10 am. Clients with failover infrastructure were advised not to fail over, as XSTRA believed the window until the proposed 10 am restoration was too short to justify starting a failover. |
09:00 |
XSTRA started the physical move |
09:38 |
Physical move complete and most services restored |
09:40 |
XSTRA confirmed that the ECN issue is now resolved |
09:55 |
Confirmed all services restored after testing |
09:55 – 10:20 |
XSTRA called back any clients who had called the XSTRA CEO's personal phone, 0400 596 366 |
10:23 |
XSTRA called ECN and asked for a full explanation of what happened. See “ROOT CAUSE” below |
RECOVERY & RESOLUTION |
The infrastructure that clients rely on was moved from ECN to NEXT DC B2 |
ROOT CAUSE |
ECN suffered a hardware failure, followed by human error on the part of an ECN technician. ECN knew of the issue with the link to NEXT DC at about 2 am. They despatched a technician to NEXT DC, who confirmed that a piece of network equipment (an SFP module) had failed. ECN then sourced a new SFP module, returned, and replaced the faulty unit with it. The problem then appeared to be fixed, which XSTRA can confirm. However, approximately 20 minutes later, at about 5:15 am, the link went down again. ECN eventually asked NEXT DC staff to remove the fiber cable from the SFP module and re-seat it. This has fixed the issue so far. It appears that the ECN technician who performed the work had not correctly re-installed the fiber cable into the new SFP module, and some time after the technician left the NEXT DC site, the cable came out of the SFP far enough to break communications across the link. ECN's own record of the incident can be found here: https://status.ecn.net.au/incidents/42 |
CORRECTIVE & PREVENTATIVE MEASURES |
ECN has taken corrective measures for now. XSTRA can confirm that today's physical move of equipment from the ECN data center to the NEXT DC B2 data center has permanently removed any reliance on the link between B2 and the ECN data center |