[Comtec Announce] EMERGENCY Engineering Work 29/09/2012 - Update #6

David Croft david.croft at comtec.com
Mon Oct 1 16:45:07 BST 2012


Further to the outage at our Telecity (LNPOP02) location on Saturday
night, please find below an analysis of the events. All times in BST
(UTC+0100).

Timeline

19:33 - Our operations centre observed that management connectivity
was lost to the router bb01.lnpop01, the core device handling traffic
to and from that data centre. Logs received from the device
indicated a memory exhaustion error. Although the device appeared to
still be passing traffic normally, it was not possible to log into it
to diagnose the problem, so a decision was made to schedule an
emergency reboot of the device to prevent the situation degrading further.

19:50 - Announcement sent scheduling the reboot for 23:00.

23:00 - Router rebooted as scheduled.

23:14 - After the router did not return to service in the expected
amount of time, a call was logged to our remote hands service at
Telecity.

23:27 - Remote hands arrives and manually power cycles the router,
observing that the device lights indicate a boot failure. First Comtec
engineer is dispatched to site. We continue to work with remote hands
and engage Cisco TAC to attempt to restore service.

23:43 - Second Comtec engineer dispatched to site.

01:08 - First engineer arrives on site and begins troubleshooting.

01:24 - Second engineer arrives on site.

01:39 - A total hardware failure is confirmed. The engineers leave to
obtain replacement equipment from the depot, as this is faster than
waiting for a replacement to be delivered.

02:35 - Engineers return with replacement router and begin cabling and
configuring it.

02:47 - Router is fully configured and is booted into service.

02:52 - Network and routing protocols converged; service fully restored.

Outage Duration

23:00 - 02:52 (3 hours 52 minutes)
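
For anyone wishing to check the figure above, the calculation can be
sketched as follows. This is a minimal illustration only, assuming both
times are BST and that the end time falls on the following calendar
day; the function name outage_duration is hypothetical and is not part
of any Comtec tooling.

```python
from datetime import datetime, timedelta


def outage_duration(start: str, end: str) -> timedelta:
    """Return the elapsed time between two HH:MM clock times.

    Assumption: if the end time is not later than the start time, the
    outage is taken to have run past midnight into the next day.
    """
    fmt = "%H:%M"
    t_start = datetime.strptime(start, fmt)
    t_end = datetime.strptime(end, fmt)
    if t_end <= t_start:
        # The interval crosses midnight, so move the end to the next day.
        t_end += timedelta(days=1)
    return t_end - t_start


# Reboot at 23:00, service fully restored at 02:52 the next morning.
duration = outage_duration("23:00", "02:52")
hours, remainder = divmod(int(duration.total_seconds()), 3600)
minutes = remainder // 60
print(f"{hours} hours {minutes} minutes")  # prints "3 hours 52 minutes"
```

The outage is measured from the scheduled reboot at 23:00 rather than
from the loss of management connectivity at 19:33, because the router
appeared to be passing traffic normally until it was rebooted.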

Root Cause

Hardware failure in a core router.

Further Steps

A maintenance window will be scheduled in the near future to replace
the temporary router with a new one.

Affected Customers

Transit customers in LNPOP02.
21CN Cloud Ethernet customers single-homed to LNPOP02.

Regards,

David Croft

--
David Croft
Service Delivery Manager

Comtec Enterprises Ltd
Comtec House
46a Albert Road North
Reigate Industrial Estate
Reigate
Surrey RH2 9EL

Tel: 0845 899 1400
Fax: 0845 899 1401
www.comtec.com

For urgent operational issues please always contact noc at comtec.com
or 0845 899 1423 and not any named individual.


