NOTICE TIME ZONE | Brisbane GMT+10 |
---|---|
STATUS (Open/Closed) | Closed |
INCIDENT START DATE | 20200706 |
INCIDENT START TIME (HH:MM) | Approx: 06:15 |
ESTIMATED TIME TO RESOLUTION | N/A |
INCIDENT END DATE | 20200706 |
INCIDENT END TIME (HH:MM) | Approx: 08:45 |
OUTAGE DURATION | Approx. 2 Hours 30 Minutes |
INCIDENT CAUSED BY A 3RD PARTY? | No |
IF YES, NAME OF 3RD PARTY | |
XCRM TICKET NUMBER | Not Applicable |
BRAND | XRACK |
PRIORITY | P2 |
CUSTOMERS AFFECTED | X101 |
DESCRIPTION OF INCIDENT | A virtual machine for X101 on Z-0-0-VH209 did not come up after a host restart on Saturday 4th July 2020 |
PRIMARY IMPACT EFFECT | Users may not have been able to log in to their XCLOUD Desktop |
SECONDARY IMPACT EFFECT | None |
EVENT TIMELINE | |
SATURDAY 4TH JULY 2020 | |
21:15 | The host Z-0-0-VH209, which contains the A-101-0-VS2 virtual server, was restarted to fix a network card issue |
21:30 | All VMs were checked after the restart. However, A-101-0-VS2 was missing from the Hyper-V Manager interface, and XSTRA's engineers did not detect this because a VM had never gone missing from Hyper-V before and it should not happen |
MONDAY 6TH JULY 2020 | |
07:05 | The first calls came in from users who could not log in. Attention was then drawn to the fact that the XSTRA monitoring system was reporting A-101-0-VS2, which hosts the PRTG probe, as down. XSTRA engineers investigated but could not determine why the VM did not exist on any Hyper-V host. XSTRA then started a replica of the missing VM until the issue could be resolved permanently. Users could log in again at approx. 08:15, except for some users new to the company. |
08:15 – 08:45 | XSTRA found that the missing VM should have been on Z-0-0-VH209. The VHD was found on that host but, again, the VM was not present in Hyper-V Manager on that host. The VM was re-created on the host and tested OK. The new users were listed in its directory, which confirmed that this was the correct VM to have running. The failover VM was shut down and the new VM was started. Users, including the new users, were then able to log in again. By 08:40, 15 users were logged in and working. |
08:42 | Daniel from XSTRA started on re-establishing the usual backup protocols for the new VM |
RECOVERY & RESOLUTION | A new VM was created from the original VM's virtual hard disk (VHD) |
ROOT CAUSE | XSTRA is still working on the root cause. However, it is clear that the monitoring systems were working correctly but XSTRA did not act on the alarms in a timely fashion on Monday morning. Work to resolve the incident should have started at least 1 hour earlier. |
CORRECTIVE & PREVENTATIVE MEASURES | XSTRA will remind the early morning tech team that their first step at the start of a shift must always be to check the PRTG monitoring system for alarms, and to bring in senior engineers if they cannot resolve an issue quickly. XSTRA has never seen a VM completely disappear from Hyper-V Manager on any host in over 10 years. We will investigate the missing VM issue further, but human error looks the most likely cause. |
RESIDUAL EFFECT | None |
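
The 21:30 post-restart check in the timeline missed a VM that was no longer listed on the host. One way such a check could be made mechanical is to compare an inventory recorded before the restart against the list reported afterwards. The sketch below is illustrative only and is not XSTRA's tooling: the function name `missing_vms`, both lists, and the VM name `A-101-0-VS1` are assumptions made for the example, and how the lists are obtained from the host is left open.

```python
def missing_vms(expected, reported):
    """Return the expected VM names that the host no longer reports, sorted."""
    # Compare case-insensitively and ignore stray whitespace in either list.
    reported_set = {name.strip().upper() for name in reported}
    return sorted(
        name for name in expected if name.strip().upper() not in reported_set
    )


# Hypothetical inventory recorded before the host restart.
expected = ["A-101-0-VS1", "A-101-0-VS2"]
# Hypothetical list read from the host after the restart.
reported = ["A-101-0-VS1"]

print(missing_vms(expected, reported))  # ['A-101-0-VS2']
```

An empty result would mean every expected VM is still registered on the host. Any name in the result would need follow-up before the maintenance window is closed.
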
Revision: 4
Last modified: Jul 05, 2020