[Comtec Announce] [IP Voice Services] - Post Incident Review : IPVS Network Incident 01/09/2011

Courtenay Mills courtenay.mills at comtec.com
Fri Sep 2 20:49:09 BST 2011


Please see the Service Alert reviewing the IPVS network incident on 01/09/2011:

Start Date & Time:  01/09/2011 12:15
End Date & Time:  01/09/2011 18:30


ISSUE

A core Cisco network switch suffered a hardware failure at 12:15.  End users may have experienced intermittent issues with calls for a short period whilst the voice service switched paths onto other core switches. Automatic failover occurred and all voice traffic migrated to resilient equipment. This should have resulted in normal service and quality levels being resumed automatically.

Unfortunately, voice quality was degraded after the automated failover process. This secondary issue was caused by incorrect auto-negotiation on switch ports in the alternative voice path. The switch ports were set to auto-negotiate, but had negotiated at only 100Mb instead of 1Gb. This caused traffic congestion on the path to the resilient voice Session Border Controller (SBC), affecting voice quality. Once this was identified, Engineering manually hard-set the ports to 1Gb, resolving the voice quality degradation.
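
The kind of check that would catch this condition can be sketched as below. This is an illustrative example only, not the monitoring actually in use: the port names, the PortStatus record and the underspeed_ports helper are all hypothetical, and the speeds simply mirror the 100Mb versus 1Gb mismatch described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PortStatus:
    """Hypothetical record of one switch port's expected and negotiated speed."""
    name: str
    expected_mbps: int
    negotiated_mbps: int


def underspeed_ports(ports):
    """Return the ports whose negotiated speed is below the expected speed."""
    return [p for p in ports if p.negotiated_mbps < p.expected_mbps]


# Invented sample data: one port negotiated at 100Mb where 1Gb was expected.
sample = [
    PortStatus("sbc-uplink-1", expected_mbps=1000, negotiated_mbps=100),
    PortStatus("sbc-uplink-2", expected_mbps=1000, negotiated_mbps=1000),
]

flagged = underspeed_ports(sample)
for port in flagged:
    print(f"{port.name}: negotiated {port.negotiated_mbps}Mb, "
          f"expected {port.expected_mbps}Mb")
```

In practice the negotiated speed would have to be read from the switches themselves; how that is done is outside the scope of this sketch.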
 
Separately from the voice traffic, the Business Portal was also affected by the switch failure, resulting in a loss of access.

Once all services were restored, Engineers continued to work through the night to test and confirm the resolution was working as expected.  Network monitoring systems are being fine-tuned using the information and knowledge gathered in relation to this event.


IMPACT ON SERVICE

High – some customers affected on the voice path (a high number of customers using the resilient SBCs).  All customers unable to access the Business Portal.


RESOLUTION TIMELINE 

Issue occurred – 12:15
Automatic disaster recovery occurred – 12:15
Engineers identify Cisco network switch hardware failure – 12:45
Initial Service Alert issued to IPVS Help Desk – 12:47
Remote On-Call Engineer alerted – 12:50
Remote On-Call Engineer arrived at Data Centre with replacement hardware – 14:30
Misconfigured port causing traffic congestion identified – 15:00
Engineering correct misconfigured port to restore voice quality – 16:00
1st update to Service Alert issued – 16:09
2nd update to Service Alert issued – 16:46
Replacement hardware configured, tested and deployed – 17:00
Access to non-voice services (Business Portal) re-established – 18:30
3rd update to Service Alert informing of resolution issued – 18:45
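
For reference, the elapsed times implied by the timeline above can be worked out as in the short sketch below. The times are taken from the timeline; the variable names and the minutes_between helper are illustrative only, and all events are assumed to fall on the same day (01/09/2011).

```python
from datetime import datetime


def minutes_between(start, end):
    """Whole minutes between two same-day times given as HH:MM strings."""
    fmt = "%H:%M"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds() // 60)


def as_hours_minutes(minutes):
    """Format a minute count as e.g. '3h 45m'."""
    return f"{minutes // 60}h {minutes % 60:02d}m"


# Times copied from the resolution timeline.
ISSUE_START = "12:15"
FAILURE_IDENTIFIED = "12:45"
VOICE_QUALITY_RESTORED = "16:00"
PORTAL_RESTORED = "18:30"

time_to_identify = minutes_between(ISSUE_START, FAILURE_IDENTIFIED)
voice_degraded = minutes_between(ISSUE_START, VOICE_QUALITY_RESTORED)
portal_outage = minutes_between(ISSUE_START, PORTAL_RESTORED)

print("Time to identify hardware failure:", as_hours_minutes(time_to_identify))
print("Voice quality affected for up to:", as_hours_minutes(voice_degraded))
print("Business Portal unavailable for:", as_hours_minutes(portal_outage))
```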
 

COMMUNICATION FROM BT 

We sincerely apologise for the inconvenience this has caused. We are working hard to ensure that this issue is not experienced again.  Partner communication processes are under review, and a detailed review of all processes will be undertaken, with a further update to be provided by 18:00 on 09/09/2011.


Regards,

Comtec Network Operation Centre

For urgent operational issues please always contact noc at comtec.com or
0845 899 1423 and not any named individual.


