AWS says a single typo caused its service interruption
Amazon Web Services (AWS) has finally issued a statement explaining the service interruption that affected a significant portion of the Internet earlier this week. AWS attributed the problem to its engineering team and said that it originated from a single typo ¹ ².
AWS’s popular web hosting service was inaccessible for more than four hours on Tuesday because of an outage affecting its S3 servers. The sites affected by the disruption include Quora, Trello, IFTTT, Business Insider, Medium, Uber, and Grammarly, to name a few. The problem was not limited to websites; according to reports from users, the AWS downtime also interrupted IFTTT applets.
The AWS statement explaining the cause of the interruption came today. According to the blog post, on Tuesday morning members of the S3 team needed to shut down some servers while debugging AWS’s billing system. However, because one input to the command was typed incorrectly, more servers were shut down than planned. This made other subsystems used for load balancing and data storage unavailable, and it affected services that depend on S3, including Amazon’s Elastic Compute Cloud (EC2).
AWS says it took a long time to restart the system after such an extensive interruption, and that the huge growth it has seen in recent years also contributed to the length of the interruption. AWS, which evidently cannot keep up with its demand and growth rate, says that it will make the necessary improvements and deploy solutions that minimize human error in the future.
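
To illustrate the kind of safeguard that could limit the damage from a mistyped input, here is a minimal sketch in Python. It is not AWS's tooling: the function `plan_removal`, the `CapacityGuardError` exception, the server names, and the thresholds are all hypothetical, chosen only to show how a removal command might refuse a request that is too large or that would leave too few servers running.

```python
class CapacityGuardError(Exception):
    """Raised when a removal request would take too many servers offline."""


def plan_removal(active, requested, min_required, max_batch):
    """Return the servers that are safe to remove, or raise if the request is unsafe.

    active:       set of server names currently in service (hypothetical names)
    requested:    list of server names the operator asked to remove
    min_required: minimum number of servers that must stay in service
    max_batch:    largest number of servers a single command may remove
    """
    # Drop duplicates (keeping order) and ignore names that are not in service.
    targets = [name for name in dict.fromkeys(requested) if name in active]

    # Guard 1: a single command may only remove a small batch.
    if len(targets) > max_batch:
        raise CapacityGuardError(
            f"refusing to remove {len(targets)} servers; limit per command is {max_batch}"
        )

    # Guard 2: never drop below the minimum capacity the subsystem needs.
    remaining = len(active) - len(targets)
    if remaining < min_required:
        raise CapacityGuardError(
            f"removal would leave {remaining} servers; at least {min_required} are required"
        )

    return targets


if __name__ == "__main__":
    fleet = {f"srv-{i}" for i in range(10)}

    # A small, intended removal passes both guards.
    print(plan_removal(fleet, ["srv-1", "srv-2"], min_required=6, max_batch=3))

    # A mistyped input that selects far more servers is rejected.
    try:
        plan_removal(fleet, [f"srv-{i}" for i in range(8)], min_required=6, max_batch=3)
    except CapacityGuardError as err:
        print("blocked:", err)
```

The two checks are deliberately independent: the batch limit catches an oversized request even when plenty of capacity remains, while the minimum-capacity check catches a series of small removals that would together drain the fleet.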