4.5.2.3. 20241017 – Disk Subsystem Errors on Server in Data Center
SERVICE DISRUPTION NOTICE (SDN)
NOTICE TIME ZONE
Brisbane UTC +10
STATUS (Open/Closed)
Closed
CAUSED BY A 3RD PARTY?
No
IF YES, NAME OF 3RD PARTY
START DATE
20241017
START TIME (HHMM)
0300
ESTIMATED TIME TO RESOLUTION
N/A
END DATE
N/A
END TIME (HHMM)
N/A
TOTAL DURATION
ESTIMATED DOWN TIME FOR AFFECTED USERS
90 minutes
XCRM TICKET NUMBER
Not Applicable
BRAND
XCLOUD
PRIORITY
P2
CLIENTS AFFECTED
All clients using virtual machines hosted on Z-0-0-VH70 in the NextDC B2 Data Center in Brisbane
DESCRIPTION OF INCIDENT
Monitoring reported an error with the Z drive on Z-0-0-VH70.
EVENT TIMELINE
Thursday 17th Oct 2024
0615
Monitoring showed issues with Z-0-0-VH70. The disk subsystem for the Z drive is reporting PCI errors, and the RAID card is showing errors. We are resorting to restoring the virtual machines hosted on this server from backups to other servers.
0752
The affected clients are X36, X66, and X73.
0810
We have been able to get the RAID hardware back online. Users may now log in and start working. However, the issue is not permanently resolved.
0817
Further investigation points to a possible power supply issue in the server hardware. As a precaution, provided the hardware holds up through today, we will begin migrating virtual machine loads off this hardware to other hosts overnight. Once the loads have been transferred, we can take the hardware offline and investigate further.
Friday 18th Oct 2024
0734
Between 8pm last night and 2am today, all production virtual machines were moved off the host with the hardware issues to other hosts.
Wednesday 30th Oct 2024
1735
Closing this ticket as the server is now fully decommissioned and will be used for parts. We were unable to guarantee 100% stability of the system in its current configuration.