Major websites like Amazon, Reddit, CNN, The New York Times, Hulu, and others went down for about an hour. The errors mostly affected websites that generate plenty of traffic, but the outage wasn’t universal. Users in Berlin were not affected, which led many to wonder whether it was a targeted attack.
What Caused It?
Though some speculated that this massive outage was caused by a cyberattack, the source of the problem was traced to Fastly, a company that provides a content delivery network (CDN) service. When working normally, this service improves the speed and reliability of the websites it serves. The company was quick to acknowledge on its status page that it was experiencing problems. Some websites, such as the BBC, which has its own backup system, got back online sooner, but the other major sites that rely on Fastly for their CDN had to wait until the problem was fixed.
Why are CDNs Important?
A content delivery network, like the one Fastly provides, is essential to making the internet fast and accessible to users across the globe. Instead of every visitor reaching a website through a single server, visitors are directed to Fastly and its large server farms placed around the world. These server farms usually hold cached copies of their clients’ websites for quick access.
Basically, a CDN reduces the time it takes for a page to load. For example, suppose the main website is hosted in the UK and you are in Asia. Instead of your request and the response traveling all the way to the UK and back, the connection is directed to a nearby server farm. Because that server farm is closer to you, the requested data arrives much faster. CDNs also help websites remain stable and reachable from different parts of the world.
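To make the "nearest server farm" idea concrete, here is a minimal sketch that picks the closest point of presence (POP) for a visitor. It assumes the choice is made purely by geographic distance; real CDNs steer traffic with DNS, anycast routing, and live latency measurements, so treat this as an illustration only. The POP names and coordinates are hypothetical, and the round-trip estimate assumes light in optical fibre covers roughly 200 km per millisecond along a straight line, which real network paths never quite achieve.

```python
import math
from typing import NamedTuple, Sequence


class Pop(NamedTuple):
    """A CDN point of presence (server farm) and its approximate location."""
    name: str
    lat: float
    lon: float


# Hypothetical edge locations; coordinates are approximate city centres.
POPS = [
    Pop("london", 51.51, -0.13),
    Pop("new-york", 40.71, -74.01),
    Pop("singapore", 1.35, 103.82),
    Pop("tokyo", 35.68, 139.69),
]

EARTH_RADIUS_KM = 6371.0
FIBRE_KM_PER_MS = 200.0  # light in optical fibre travels roughly 200 km per millisecond


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    # Clamp to guard against rounding pushing the value just above 1.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def min_rtt_ms(distance_km: float) -> float:
    """Lower bound on round-trip time over fibre for a straight-line distance."""
    return 2 * distance_km / FIBRE_KM_PER_MS


def nearest_pop(user_lat: float, user_lon: float, pops: Sequence[Pop] = POPS) -> Pop:
    """Pick the POP with the shortest great-circle distance to the user."""
    return min(pops, key=lambda p: haversine_km(user_lat, user_lon, p.lat, p.lon))


if __name__ == "__main__":
    # A visitor in Bangkok, comparing the nearest POP with a UK origin server.
    user = (13.75, 100.50)
    pop = nearest_pop(*user)
    origin = POPS[0]  # treat London as the origin server for comparison
    print(pop.name)
    print(round(min_rtt_ms(haversine_km(*user, pop.lat, pop.lon))), "ms vs",
          round(min_rtt_ms(haversine_km(*user, origin.lat, origin.lon))), "ms")
```

For a visitor in Bangkok, the sketch picks the hypothetical Singapore POP, and the best-case round trip to it is several times shorter than to a London origin, which is the effect the paragraph above describes.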
Is Fastly Reliable?
When things are running smoothly, Fastly gets the job done. It is one of the major companies offering CDN services today, alongside Cloudflare and Amazon’s CloudFront. Fastly is trusted to the point that even Amazon uses its service for its retail website rather than its own CDN service. Amazon has been using Fastly since 2020, which suggests the company knows what it is doing.
It is not yet clear what triggered the massive outage, but a Fastly spokesperson stated that the company had traced it to a service configuration that caused disruptions across its points of presence (POPs), the server farms scattered across the globe. This configuration caused a widespread failure of its CDN, which affected its clients and their users. Based on the statement, it appears that a single configuration issue set off a cascading effect, showing how one small problem can lead to a massive disruption of service.
Is a Cyberattack Behind It?
Fastly has denied that the outage was caused by a malicious attack, since it has already traced the issue to a bad configuration. And since no other evidence has been presented that a cyberattack caused the interruption, it is highly unlikely that Fastly’s systems were attacked.
The outage is somewhat similar to what happened to Cloudflare last year. According to the investigation, a single error in the physical link between Chicago and Newark caused that connection to fail. Traffic was then diverted to the link between Atlanta and Washington, D.C., which also failed because the extra traffic overloaded it.
This failure caused an internet outage in several cities in Europe and America for about half an hour, and the cascade nearly took down 20 of Cloudflare’s data centers around the world.
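The cascade described above can be modelled in a few lines. This is a toy sketch, not Cloudflare’s actual network: the link names, capacities, and traffic figures are made up, and it assumes each failed link hands all of its traffic to exactly one backup link. A backup link fails only if the extra traffic pushes it past its capacity, which is how one broken link can take down a second one.

```python
from typing import Dict, List, Optional


def simulate_cascade(
    capacity: Dict[str, float],
    load: Dict[str, float],
    first_failure: str,
    reroute: Dict[str, str],
) -> List[str]:
    """Return the links that fail, in order, after `first_failure` goes down.

    capacity: maximum traffic each link can carry
    load:     traffic each link carries before the incident
    reroute:  for each link, the single backup link that absorbs its traffic
    """
    load = dict(load)  # work on a copy so the caller's numbers are untouched
    failed: List[str] = []
    current: Optional[str] = first_failure
    while current is not None and current not in failed:
        failed.append(current)
        backup = reroute.get(current)
        if backup is None or backup in failed:
            break  # nowhere left to send the traffic
        # Shift the failed link's traffic onto its backup.
        load[backup] += load[current]
        load[current] = 0
        # The backup fails too if it now carries more than it can handle.
        current = backup if load[backup] > capacity[backup] else None
    return failed


if __name__ == "__main__":
    capacity = {"link-a": 100, "link-b": 100, "link-c": 200}
    load = {"link-a": 60, "link-b": 70, "link-c": 50}
    reroute = {"link-a": "link-b", "link-b": "link-c"}
    print(simulate_cascade(capacity, load, "link-a", reroute))  # ['link-a', 'link-b']
```

Here link-b fails because it inherits link-a’s traffic and exceeds its capacity, while link-c has enough headroom to absorb everything and stops the cascade.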
Why Do Internet Outages Happen?
With technology evolving and users demanding faster internet speeds, much of the internet’s infrastructure has become concentrated in the hands of a handful of companies. Unfortunately, content delivery networks, like those operated by Cloudflare and Fastly, have been identified as choke points.
Cloud hosts, such as AWS (Amazon Web Services), Google Cloud Platform, and Microsoft Azure, are also considered choke points. Although they rarely fail, a single human error can occasionally cause a massive failure that affects their clients. One reason these cloud hosts stay up most of the time is that they invest heavily in reliability and resilience.
A website can run on more than one provider so that it has a backup when an issue like the Fastly outage occurs. Although this helps protect the site’s users from downtime when the first provider fails, it is a costly endeavor. Using two or more providers also adds complexity, and there is no guarantee that an outage will be avoided. One example is the UK government’s own website, which uses two providers: Fastly, with Cloudflare as its backup. However, switching to the backup requires manual intervention, which did not happen during the outage.
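The manual switch mentioned above is the kind of step that can be automated. Below is a hypothetical sketch of the decision logic only: providers are listed in priority order, each has a health probe, and traffic moves to the backup only after the primary fails several probes in a row, so a single blip does not trigger a switch. The names (`primary-cdn`, `backup-cdn`, `probe`, `threshold`) are illustrative; in practice the actual switch would be made through DNS or a traffic-steering service, which this sketch does not cover.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Provider:
    name: str
    probe: Callable[[], bool]  # hypothetical health check: True if a test request succeeded
    failures: int = 0          # consecutive failed probes


@dataclass
class FailoverRouter:
    providers: List[Provider]  # priority order: primary first, then backups
    threshold: int = 3         # failed probes in a row before a provider is skipped

    def check(self) -> None:
        """Run one round of health probes and update each provider's failure count."""
        for p in self.providers:
            p.failures = 0 if p.probe() else p.failures + 1

    def active(self) -> str:
        """Name of the highest-priority provider that is still considered healthy."""
        for p in self.providers:
            if p.failures < self.threshold:
                return p.name
        # Everything is failing: stay on the primary instead of bouncing around.
        return self.providers[0].name


if __name__ == "__main__":
    primary_ok = {"up": True}
    router = FailoverRouter([
        Provider("primary-cdn", lambda: primary_ok["up"]),
        Provider("backup-cdn", lambda: True),
    ])
    router.check()
    print(router.active())  # primary-cdn
    primary_ok["up"] = False  # simulate the primary CDN going down
    for _ in range(3):
        router.check()
    print(router.active())  # backup-cdn
```

Automating the switch removes the wait for a human, but it brings its own risk: a misbehaving probe could move traffic away from a healthy provider, which is why the sketch requires several consecutive failures before failing over.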
This recent outage shows how vulnerable the internet can be. A cyberattack is usually the first thing that comes to mind as a cause of Business Interruption (BI), but this incident shows the devastating potential of a simple system failure.
Luckily, this time the BI lasted only about an hour, but will businesses be as fortunate next time?