NOTICE TIME ZONE Brisbane GMT+10
STATUS (Open/Closed) Closed
INCIDENT START DATE 20200706
INCIDENT START TIME (HH:MM) Approx: 06:15
ESTIMATED TIME TO RESOLUTION N/A
INCIDENT END DATE 20200706
INCIDENT END TIME (HH:MM) Approx: 08:45
OUTAGE DURATION Approx. 2 Hours 30 Minutes
INCIDENT CAUSED BY A 3RD PARTY? No
IF YES, NAME OF 3RD PARTY
XCRM TICKET NUMBER Not Applicable
BRAND XRACK
PRIORITY P2
CUSTOMERS AFFECTED X101
DESCRIPTION OF INCIDENT A virtual machine for X101 on Z-0-0-VH209 did not come up after a host restart on Saturday 4th July 2020
PRIMARY IMPACT EFFECT Users may not have been able to log in to their XCLOUD Desktop
SECONDARY IMPACT EFFECT None
EVENT TIMELINE
SATURDAY 4TH JULY 2020
21:15 Z-0-0-VH209 Host that contains the A-101-0-VS2 virtual server was restarted to fix a network card issue
21:30 All VMs were checked; however, A-101-0-VS2 was missing from the Hyper-V Manager interface. XSTRA's engineers did not detect that the VM was missing from Hyper-V, as this had never happened before and should not happen.
MONDAY 6TH JULY 2020
07:05 First calls started to come in from users who could not log in. It was then noted that the XSTRA monitoring system was reporting A-101-0-VS2, which hosts the PRTG probe, as down. Investigations by XSTRA engineers could not determine why the VM did not exist on any Hyper-V host. XSTRA then started a replica of the missing VM as a temporary measure until the issue could be resolved permanently. Users could log in again at approx. 08:15, except for some users new to the company.
08:15 - 08:45 XSTRA found that the missing VM should have been on Z-0-0-VH209. The VHD was found on that host but, again, the VM was not present in Hyper-V Manager on that host. The VM was re-created on the host and tested OK. The new users were listed in the directory, which confirmed that this was the correct VM to have running. The failover VM was shut down and the new VM was started. Users were then able to log in again, including the new users. By 08:40, 15 users were logged in and working.
08:42 Daniel from XSTRA started re-establishing the usual backup protocols for the new VM
RECOVERY & RESOLUTION A new VM was created from the original VM's virtual hard disk (VHD)
ROOT CAUSE XSTRA is still working on the root cause. However, it is clear that the monitoring systems were working correctly and that XSTRA did not act on the alarms in a timely fashion on Monday morning. The work to resolve the issue should have started at least 1 hour earlier.
CORRECTIVE & PREVENTATIVE MEASURES XSTRA will remind the early morning tech team that, as a first step at the start of their shift, they must always check the PRTG monitoring system for any alarms and bring in senior engineers if they cannot resolve an issue quickly. XSTRA has never seen a VM completely disappear from Hyper-V Manager on any host in over 10 years. We will investigate the missing VM issue further, but it is more than likely due to human error.
RESIDUAL EFFECT None
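
ILLUSTRATIVE POST-RESTART CHECK The 21:30 entry in the timeline describes a VM check that did not catch the missing VM. The sketch below shows one possible way to compare the VMs a host reports after a restart against a list of VMs expected on that host. It is an assumption-laden illustration, not XSTRA's actual tooling: the function name find_missing_vms, the expected-VM list, and the placeholder name EXAMPLE-VS1 are all hypothetical, and obtaining the reported list from Hyper-V is outside the scope of this sketch.

```python
def find_missing_vms(expected, reported):
    """Return the expected VM names that the host did not report.

    Both arguments are iterables of VM names. The comparison ignores
    case and surrounding whitespace, and the result is sorted.
    """
    reported_normalised = {name.strip().lower() for name in reported}
    return sorted(
        name for name in expected
        if name.strip().lower() not in reported_normalised
    )


if __name__ == "__main__":
    # Hypothetical inventory for host Z-0-0-VH209; only A-101-0-VS2
    # is a name taken from this report, the other is a placeholder.
    expected_on_host = ["A-101-0-VS2", "EXAMPLE-VS1"]
    # Hypothetical listing as it might be read from the host after a restart.
    reported_by_host = ["example-vs1"]

    for vm in find_missing_vms(expected_on_host, reported_by_host):
        print(f"VM expected on host but not listed: {vm}")
```

A check of this kind would only flag a VM as missing if the expected list is kept up to date for each host, so it complements rather than replaces the PRTG alarms.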
