| NetHosted - Andrew |
|
NetHosted Staff

Joined: 22 Mar 2004 Posts: 7017
|
Posted: Fri Jun 24, 2011 8:08 pm Post subject: [24/06/11] Networking issues |
| |
Hi,
The datacentre where Pluto, Mars, Earth, Venus, VPS1, VPS2 and Sun reside is currently experiencing networking issues.
We will update you when we have further information and an ETA for a fix.
Thanks,
Andrew _________________ | Andrew Bassett
| Managing Director, NetHosted Ltd.
| Follow us on Twitter: http://twitter.com/nethosted
| Members, tell us what you think of NetHosted! |
|
|
| NetHosted - Andrew |
|
NetHosted Staff

Joined: 22 Mar 2004 Posts: 7017
|
Posted: Fri Jun 24, 2011 8:17 pm Post subject: |
| |
Hi,
Earth, Venus, VPS1 and VPS2 are now back online.
Thanks,
Andrew
|
| NetHosted - Andrew |
|
NetHosted Staff

Joined: 22 Mar 2004 Posts: 7017
|
Posted: Fri Jun 24, 2011 9:00 pm Post subject: |
| |
Hi,
We are keeping a permanent line of communication open with the datacentre over the extended networking issues being experienced by Pluto, Mars and Sun.
They are unable at this time to give an ETA on when these issues will be resolved.
We will update this thread as we learn more.
Thanks,
Andrew
|
| NetHosted - Andrew |
|
NetHosted Staff

Joined: 22 Mar 2004 Posts: 7017
|
Posted: Fri Jun 24, 2011 9:53 pm Post subject: |
| |
Hi,
All servers are now back online.
Thanks,
Andrew
|
| NetHosted - Andrew |
|
NetHosted Staff

Joined: 22 Mar 2004 Posts: 7017
|
Posted: Sun Jun 26, 2011 12:30 am Post subject: |
| |
Here is the full report from the datacentre:
| Quote: | Please find below the reason for outage report for the issues seen yesterday evening.
All times quoted are BST (GMT+1)
At 19:35 on Friday 24th June 2011 our monitoring systems indicated an issue with our Telecity HEX core router. At the same time our system logging recorded a large amount of routing protocol instability coming from the router, with knock-on instability detected on various other parts of the network as a result. We also saw instability in HSRP, which provides redundancy between the two core routers for customer VLAN gateway addresses.
Suspecting a problem similar to the recent issues with the Telehouse East router, we connected to the HEX switch which connects the router to the backhaul infrastructure, and disabled the ports connecting to the router at 20:00. This should have forced all customer traffic to go via the Telehouse East router, which we believed would also reduce the CPU load on the Telecity HEX router and subsequently stabilise the routing protocols. However, the Telecity HEX router did not correctly detect that its ports had gone down and continued to believe it should be the master for certain customer circuits. Nor did this reduce the CPU load on the Telecity HEX router, so the instabilities continued.
We then took the decision to isolate the router entirely from both external and internal connectivity by logging in to the router via remote out-of-band access and shutting down all peering, transit and customer-facing interfaces. This stabilised the network for all customers who were not single-homed to the Telecity HEX router. This was completed by 20:30.
After careful consideration we decided that the best course of action was to bring forward the planned router firmware update, which was due to be announced for July, to bring the Telecity HEX router in line with the version running on the Telehouse East router. That router was upgraded last month in order to resolve the instability issues seen during the month of May. The new firmware was recommended to us by our vendor after in-depth analysis of the issues seen previously.
We then proceeded to perform the firmware upgrade with the router isolated from the network entirely. This began at 20:50. This would require a reboot of the Telecity HEX router to the new firmware, whilst keeping all interfaces down to ensure this reboot did not cause any further instability. The new firmware was installed by 21:05, at which point the router was rebooted.
Once the router was back up and running the new firmware, at 21:18 we started slowly bringing selected interfaces back online to bring the Telecity HEX router back in to the network in a controlled manner, whilst carefully monitoring its impact on the rest of the network. This proved successful; therefore we began to slowly re-enable external connectivity to our IP transit suppliers and peers whilst continuing our monitoring for any impact. This took until approximately 21:30.
Once full external and internal routing connectivity was restored and had been stable for a period of time, we proceeded to re-enable customer-facing interfaces in a staggered manner, checking that the customer redundant gateways established their pairings with the Telehouse East router successfully. This restored redundancy to all multihomed customers and connectivity to single-homed customers. This took from 21:45 until 22:15.
We continued to monitor the platform as a whole to ensure that it was now stable and at 23:13 we issued an all clear.
We are aware that the network performance throughout the second quarter of this year has been sub-optimal. We will be having a full review of the routing element of the network architecture in the coming weeks to determine if any changes in topology or architecture can be made to reduce the likelihood of similar issues occurring again in the future. We would like to stress however that we and our vendor believe the new firmware which both core routers are now running should have resolved the issues that we have been working with our vendor to fix.
We would like to apologise for the inconvenience caused by yesterday evening's issues. |
Thanks,
Andrew
|