Engineers mess up causing Microsoft Azure downtime

Microsoft engineers make a roll-out mistake, costing Azure some downtime.

Due to gaps in its engineers' deployment policies, Microsoft's Azure cloud service was taken offline for a period in November 2014. The details come from a detailed mea culpa analysis published by Microsoft itself.

Jason Zander, a member of the Azure team, recently published a final root cause analysis (RCA), saying that the engineers intended to push software changes that would improve performance and reduce processor load on the service's front-end systems. Instead, the change caused an outage that left customers unable to connect to Azure's storage, virtual machine, website, Active Directory or management portal functions.

The code change did improve performance during testing, but the full roll-out ran into two main issues. Microsoft usually deploys these updates in waves, gradually expanding the updated infrastructure bit by bit rather than rolling out everywhere at once. However, an engineer judged this update to be a low-risk exercise after a small testing phase and pushed it to everyone in one hit. As a result of this blunder and the subsequent outage, Microsoft is strictly enforcing staged deployments from now on.
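For readers curious what a deployment "in waves" looks like in practice, here is a minimal Python sketch. It is not Microsoft's tooling: the function names `plan_waves` and `staged_rollout`, the wave fractions, and the `deploy` and `healthy` callbacks are all hypothetical, chosen only to illustrate the idea of updating a small slice first and stopping if that slice misbehaves.

```python
from typing import Callable, List, Sequence


def plan_waves(hosts: Sequence[str], fractions: Sequence[float]) -> List[List[str]]:
    """Split hosts into waves using cumulative fractions such as (0.1, 0.5, 1.0)."""
    waves: List[List[str]] = []
    start = 0
    for fraction in fractions:
        # Each fraction is the share of all hosts updated once this wave is done.
        end = max(start, round(len(hosts) * fraction))
        waves.append(list(hosts[start:end]))
        start = end
    return waves


def staged_rollout(
    hosts: Sequence[str],
    fractions: Sequence[float],
    deploy: Callable[[str], None],   # hypothetical: applies the update to one host
    healthy: Callable[[str], bool],  # hypothetical: health probe for one host
) -> List[str]:
    """Deploy wave by wave and halt after the first wave with an unhealthy host.

    Returns the hosts that received the update before the rollout stopped.
    """
    updated: List[str] = []
    for wave in plan_waves(hosts, fractions):
        for host in wave:
            deploy(host)
            updated.append(host)
        if not all(healthy(host) for host in wave):
            break  # contain the damage to the hosts updated so far
    return updated
```

With ten hosts and fractions of `(0.1, 0.5, 1.0)`, a fault that only shows up in the second wave stops the rollout at five hosts rather than all ten, which is the containment a one-hit push gives up.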

The second main mistake was explained by iTnews as leading "to the software change being wrongly enabled on Azure Blob (binary large object) storage front-ends when it had only been tested against table storage front-ends. This exposed a bug that caused some Blob storage front-ends being stuck in infinite loops, and ceasing to respond to requests."
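One common safeguard against this kind of mismatch is to refuse to enable a change on any front-end type it has not been tested against. The Python sketch below illustrates that idea only; the names `enable_change` and `UntestedTargetError` and the flag dictionary are hypothetical and do not describe how Azure's configuration system works.

```python
from typing import Dict, Iterable, Set


class UntestedTargetError(Exception):
    """Raised when a change is requested for a target it was never tested on."""


def enable_change(
    flags: Dict[str, bool],
    targets: Iterable[str],
    tested: Set[str],
) -> Dict[str, bool]:
    """Return a copy of flags with the change enabled on targets.

    Every requested target must appear in tested; otherwise nothing is
    enabled and an UntestedTargetError is raised.
    """
    requested = list(targets)
    untested = sorted(set(requested) - tested)
    if untested:
        raise UntestedTargetError(
            "change not tested against: " + ", ".join(untested)
        )
    updated = dict(flags)  # leave the caller's dictionary untouched
    for target in requested:
        updated[target] = True
    return updated
```

Under this sketch, a change tested only against `"table"` could be enabled for `["table"]` but a request that also named `"blob"` would be rejected before any flag was flipped.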

It seems that Microsoft has learned from its mistakes, and here's hoping the engineer still has a job to feed his family and lives to work another day. Alongside these two errors, which rendered its online service useless for many, Microsoft has also blamed poor communication during the outage, stating that tweets from the @Azure Twitter account and its live blogs did not keep customers well enough informed with quick updates.

NEWS SOURCE: itnews.com.au

I'm a competitive gamer and was an eSports employee. Recent changes have seen me hang up the mouse and move over to the technology world, covering all news for TweakTown, ranging from gaming news to opinion articles and the latest tech releases. Expect to see a few different articles on international eSports news and competitive game releases, as well as audio and mobile device content.
