[Comtec Announce] EMERGENCY Engineering Work 29/09/2012 - Update #6
David Croft
david.croft at comtec.com
Mon Oct 1 16:45:07 BST 2012
Further to the outage at our Telecity (LNPOP02) location on Saturday
night, please find below an analysis of the events. All times are in
BST (UTC+01:00).
Timeline
19:33 - Our operations centre observed that management connectivity
to the router bb01.lnpop01, the core device handling traffic to and
from that data centre, had been lost. Logs received from the device
indicated a memory exhaustion error. Although the device still appeared
to be passing traffic normally, it was not possible to log in to
diagnose the problem, so we decided to schedule an emergency reboot of
the device to prevent the situation from degrading further.
19:50 - Announcement sent scheduling the reboot for 23:00.
23:00 - Router rebooted as scheduled.
23:14 - After the router did not return to service within the expected
time, a call was logged with our remote hands service at Telecity.
23:27 - Remote hands arrived and manually power cycled the router,
observing that the device lights indicated a boot failure. First Comtec
engineer dispatched to site. We continued to work with remote hands
and engaged Cisco TAC to attempt to restore service.
23:43 - Second Comtec engineer dispatched to site.
01:08 - First engineer arrived on site and began troubleshooting.
01:24 - Second engineer arrived on site.
01:39 - A total hardware failure was diagnosed and the engineers left
to obtain replacement equipment from the depot, as this was faster than
waiting for a replacement to be delivered.
02:35 - Engineers returned with the replacement router and began
cabling and configuring it.
02:47 - Router fully configured and booted into service.
02:52 - Network and routing protocols converged, service fully restored.
Outage Duration
23:00 - 02:52 (3 hours 52 minutes)
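For anyone checking the figure above, the following is a minimal
sketch of the arithmetic for an outage window that crosses midnight.
The function name outage_duration is purely illustrative and not part
of any Comtec tooling; it assumes both times are on a 24-hour clock in
the same time zone and that the window is shorter than 24 hours.

```python
from datetime import datetime, timedelta


def outage_duration(start: str, end: str) -> timedelta:
    """Return the time elapsed between two HH:MM clock times.

    Assumption: if the end time is earlier than the start time,
    the window is taken to run past midnight into the next day.
    """
    fmt = "%H:%M"
    t0 = datetime.strptime(start, fmt)
    t1 = datetime.strptime(end, fmt)
    if t1 < t0:
        # End time falls on the following calendar day.
        t1 += timedelta(days=1)
    return t1 - t0


# Reboot at 23:00, service fully restored at 02:52.
duration = outage_duration("23:00", "02:52")
hours, remainder = divmod(int(duration.total_seconds()), 3600)
minutes = remainder // 60
print(f"{hours} hours {minutes} minutes")
```

This agrees with the 3 hours 52 minutes stated above.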
Root Cause
Hardware failure in a core router.
Further Steps
A maintenance window will be scheduled in the near future to replace
the temporary router with a new one.
Affected Customers
Transit customers in LNPOP02.
21CN Cloud Ethernet customers single-homed to LNPOP02.
Regards,
David Croft
--
David Croft
Service Delivery Manager
Comtec Enterprises Ltd
Comtec House
46a Albert Road North
Reigate Industrial Estate
Reigate
Surrey RH2 9EL
Tel: 0845 899 1400
Fax: 0845 899 1401
www.comtec.com
For urgent operational issues please always contact noc at comtec.com
or 0845 899 1423 and not any named individual.