4.5.2.2. 20241017 – Disk Subsystem Errors on Server in Data Center
SERVICE DISRUPTION NOTICE (SDN)
NOTICE TIMEZONE
Brisbane UTC +10
STATUS (Open/Closed)
Closed
CAUSED BY A 3RD PARTY?
No
IF YES, NAME OF 3RD PARTY
START DATE
20241017
START TIME (HHMM)
0300
ESTIMATED TIME TO RESOLUTION
N/A
END DATE
N/A
END TIME (HHMM)
N/A
TOTAL DURATION
ESTIMATED DOWNTIME FOR AFFECTED USERS
90 minutes
XCRM TICKET NUMBER
Not Applicable
BRAND
XCLOUD
PRIORITY
P2
CLIENTS AFFECTED
All clients using virtual machines hosted on Z-0-0-VH70 in the NextDC B2 data center in Brisbane
DESCRIPTION OF INCIDENT
Monitoring showed an error with the Z drive on Z-0-0-VH70.
EVENT TIMELINE
Thursday 17th Oct 2024
0615
Monitoring showed issues with Z-0-0-VH70. The disk subsystem for the Z drive is reporting PCI errors, and the RAID card is showing errors. We are restoring the virtual machines hosted on this server from backups onto other servers.
0752
The affected clients are X36, X66, and X73.
0810
We have brought the RAID hardware back online and will allow users to log in and start working. However, this is not a permanent fix; the underlying issue is not yet resolved.
0817
Further investigation suggests a possible power supply fault in the server hardware. As a precaution, provided the hardware holds up through the rest of today, we will migrate the virtual machine workloads off this server to other hosts overnight. Once the workloads have been transferred, we will take the hardware offline and investigate further.
Friday 18th Oct 2024
0734
Between 8pm last night and 2am today, all production virtual machines were moved off the faulty host to other hosts.
Wednesday 30th Oct 2024
1735
Closing this ticket. We were unable to guarantee 100% stability of the system in its current configuration, so the server has been fully decommissioned and will be used for parts.