[24/06/11] Networking issues

 
NetHosted Community Index -> Technical Announcements

NetHosted - Andrew
NetHosted Staff
Joined: 22 Mar 2004
Posts: 7017

Posted: Fri Jun 24, 2011 8:08 pm    Post subject: [24/06/11] Networking issues
 
Hi,

The datacentre where Pluto, Mars, Earth, Venus, VPS1, VPS2 and Sun reside is currently experiencing networking issues.

We will update you when we have further information and an ETA for a fix.

Thanks,

Andrew

_________________
| Andrew Bassett
| Managing Director, NetHosted Ltd.
| Follow us on Twitter: http://twitter.com/nethosted 
| Members, tell us what you think of NetHosted!

NetHosted - Andrew
Posted: Fri Jun 24, 2011 8:17 pm
 
Hi,

Earth, Venus, VPS1 and VPS2 are now back online.

Thanks,

Andrew

NetHosted - Andrew
Posted: Fri Jun 24, 2011 9:00 pm
 
Hi,

We are keeping a permanent line of communication open with the datacentre over the extended networking issues being experienced by Pluto, Mars and Sun.

They are unable at this time to give an ETA on when these issues will be resolved.

We will update this thread as we learn more.

Thanks,

Andrew

NetHosted - Andrew
Posted: Fri Jun 24, 2011 9:53 pm
 
Hi,

All servers are now back online.

Thanks,

Andrew

NetHosted - Andrew
Posted: Sun Jun 26, 2011 12:30 am
 
Here is the full report from the datacentre:

Quote:
Please find below the reason for outage report for the issues seen yesterday evening.
All times quoted are BST (GMT+1)

At 19:35 on Friday 24th June 2011 our monitoring systems indicated an issue with our Telecity HEX core router. At the same time our system logging recorded a large amount of routing protocol instability coming from the router, with knock-on instability detected on various other parts of the network as a result. We also saw instability in HSRP, the protocol that provides redundancy between the two core routers for customer VLAN gateway addresses.

Suspecting a problem similar to the recent issues with the Telehouse East router, we connected to the HEX switch which connects the router to backhaul infrastructure, and disabled the ports connecting to the router at 20:00. This should have forced all customer traffic to go via the Telehouse East router, which we believed would also reduce the CPU load on the Telecity HEX router and subsequently stabilise the routing protocols. However, the Telecity HEX router did not correctly detect that the ports connecting to it had gone down, and it continued to believe it should be the master for certain customer circuits. Nor did the change reduce the CPU load on the Telecity HEX router, so the instabilities continued.
We then took the decision to isolate the router entirely from both external and internal connectivity by logging in to the router via remote out-of-band access and shutting down all peering, transit and customer-facing interfaces. This stabilised the network for all customers who were not single-homed to the Telecity HEX router. This was completed by 20:30.

After careful consideration we decided that the best course of action was to bring forward the planned router firmware update, which was due to be announced for July. The update brings the Telecity HEX router in line with the version running on the Telehouse East router, which was upgraded last month to resolve the instability issues seen during May. This new firmware was recommended to us by our vendor after in-depth analysis of the earlier issues.

We then proceeded to perform the firmware upgrade with the router isolated from the network entirely. This began at 20:50. The upgrade required rebooting the Telecity HEX router into the new firmware whilst keeping all interfaces down, to ensure the reboot did not cause any further instability. The new firmware was installed by 21:05, at which point the router was rebooted.

Once the router was back up and running the new firmware, at 21:18 we started slowly bringing selected interfaces back online to bring the Telecity HEX router back into the network in a controlled manner, whilst carefully monitoring its impact on the rest of the network. This proved successful, so we began to slowly re-enable external connectivity to our IP transit suppliers and peers whilst continuing to monitor for any impact. This took until approximately 21:30.

Once full external and internal routing connectivity was restored and had been stable for a period of time, we proceeded to re-enable customer-facing interfaces in a staggered manner, whilst checking that the customer redundant gateways established their pairings with the Telehouse East router successfully. This restored redundancy to all multihomed customers and restored connectivity to single-homed customers. This took from 21:45 until 22:15.

We continued to monitor the platform as a whole to ensure that it was now stable, and at 23:13 we issued an all clear.

We are aware that the network performance throughout the second quarter of this year has been sub-optimal. We will be conducting a full review of the routing element of the network architecture in the coming weeks to determine whether any changes in topology or architecture can be made to reduce the likelihood of similar issues occurring in the future. We would like to stress, however, that we and our vendor believe the new firmware which both core routers are now running should have resolved the issues that we have been working with our vendor to fix.

We would like to apologise for the inconvenience caused by yesterday evening's issues.


Thanks,

Andrew
