NOTICE TIME ZONE Australia/Brisbane (GMT+10)
STATUS (Open/Closed) Closed
INCIDENT START DATE 20200528
INCIDENT START TIME (HH:MM) 13:05
OUTAGE DURATION 20 Minutes
ESTIMATED TIME TO RESOLUTION Resolved
XCRM TICKET NUMBER Not Applicable
BRAND XRACK
PRIORITY P2
CUSTOMERS AFFECTED Approximately 4 clients: X32, X67, X73 & X98
DESCRIPTION OF INCIDENT Multiple virtual machines lost network connectivity. Initial findings pointed to a faulty NIC. A switch to the redundant NIC was attempted, but the hypervisor locked up and prevented any changes. The server was rebooted and services resumed, resulting in a 20 minute outage.
DESCRIPTION OF IMPACT
  • Primary Effect
6 virtual machines lost network connectivity through the primary network port
  • Secondary Effect
Some users were disconnected from the VMs and the services they run
  • Residual Effect
None expected once the problem is resolved
EVENT TIMELINE
  • 13:05
PRTG alerts indicated a problem with numerous client services
  • 13:06
NOC staff alerted senior technicians that services were offline
  • 13:11
Technician Luke commenced identification of primary cause
  • 13:12
Technician Luke isolated the cause to one server, specifically the network interface used for customers
  • 13:13
Technician Luke attempted to migrate a virtual machine over to the redundant network interface
  • 13:17
Technician Luke found changes were hanging and not applying due to the hypervisor entering a hung state
  • 13:18
Technician Luke initiated a restart of the server
  • 13:23
Technician Luke confirmed server was back online and accessible
  • 13:25
Machine was fully operational again and all services resumed their normal operation
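The stated 20 minute outage can be cross-checked against the timeline above. The following is a minimal sketch only: the event labels are shortened paraphrases of the entries in this notice, and the helper names (parse, minutes_between, step_durations) are illustrative assumptions, not part of any XRACK or PRTG tooling.

```python
from datetime import datetime

# Incident date and timeline times as given in this notice (Brisbane local time).
INCIDENT_DATE = "20200528"

# Labels are shortened paraphrases of the timeline entries (illustrative only).
TIMELINE = [
    ("13:05", "PRTG alerts raised"),
    ("13:06", "NOC escalated to senior technicians"),
    ("13:11", "Diagnosis commenced"),
    ("13:12", "Cause isolated to one server"),
    ("13:13", "Migration to redundant interface attempted"),
    ("13:17", "Hypervisor found in hung state"),
    ("13:18", "Server restart initiated"),
    ("13:23", "Server back online"),
    ("13:25", "All services resumed"),
]


def parse(hhmm):
    """Combine the incident date with an HH:MM time from the timeline."""
    return datetime.strptime(INCIDENT_DATE + " " + hhmm, "%Y%m%d %H:%M")


def minutes_between(start, end):
    """Whole minutes elapsed between two HH:MM times on the incident date."""
    return int((parse(end) - parse(start)).total_seconds() // 60)


def step_durations(timeline):
    """Return (label, minutes since the previous event) for each later event."""
    return [
        (label, minutes_between(prev_time, time))
        for (prev_time, _), (time, label) in zip(timeline, timeline[1:])
    ]


# Total outage: first alert to full service restoration.
outage_minutes = minutes_between(TIMELINE[0][0], TIMELINE[-1][0])
print("Outage duration (minutes):", outage_minutes)

for label, minutes in step_durations(TIMELINE):
    print(f"+{minutes} min  {label}")
```

Listing the per-step gaps shows where the time went, for example the interval between the restart being initiated and the server being confirmed back online.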
RECOVERY & RESOLUTION XSTRA identified that the issue was related to communication between the network interface and the hypervisor of the host server. A reboot allowed the hypervisor to return to normal operation.
ROOT CAUSE Hypervisor entered a hung state, resulting in no communication between the virtual switch and the physical network interface
CORRECTIVE & PREVENTATIVE MEASURES Remove the host from production and perform routine maintenance and updates
Revision: 1
Last modified: May 28, 2020
