Facebook reveals maintenance error was cause of global outage

Messaging app Telegram says it gained over 70 million new users during Facebook outage

Facebook has blamed a maintenance error for the global outage that left more than 2.9 billion users unable to access Facebook, WhatsApp, Instagram and other tools on Monday.

In a blog post, Facebook vice president of engineering Santosh Janardhan explained that the outage was triggered by the system that manages Facebook's global backbone network capacity.

"The backbone is the network Facebook has built to connect all our computing facilities together, which consists of tens of thousands of miles of fibre-optic cables crossing the globe and linking all our data centres," Janardhan said.

The problem began when Facebook's engineers issued a command intended to assess the availability of global backbone capacity.

However, the command unintentionally took down all the connections in the company's backbone network, disconnecting Facebook's data centres from the rest of the world.

While Facebook has a system in place to audit commands to prevent errors like this, a bug in the audit system prevented it from properly stopping the command.

This change caused a complete disconnection between the company's data centres and the internet, leading to the collapse of the company's global system for more than six hours.

Facebook, WhatsApp and Instagram - which are all owned by Facebook and run on shared infrastructure - stopped working. Users who tried to access these services through smartphone apps or over the web were met with error messages. Other related products, such as Facebook Workplace and Messenger, also went down.

Outage detector site Downdetector said it was the largest failure it had ever seen, with more than 10.6 million issues reported worldwide.

Many of Facebook's internal tools and systems used in day-to-day operations - including door access badges - were also affected, complicating attempts to diagnose and resolve the issue.

Facebook said it sent a team of engineers onsite to the data centres to debug and restart the systems. However, this was challenging because these facilities are designed with high levels of physical and system security in mind and are hard to get into. Even once an individual has managed to get into the facilities, it is difficult for them to modify hardware and router settings.

Facebook said it was also worried about the spike in traffic after restoration of network connectivity, as that could cause its websites and apps to crash.

However, access to services returned relatively quickly as the firm had previously run drills to prepare for such situations.

This was the worst outage for Facebook since 2008, when a bug caused the social media service to remain offline for about a day, although the platform had just 80 million users at that time. Today, Facebook has nearly 3 billion users.

Pavel Durov, the founder of messaging app Telegram, said on Tuesday that his platform gained over 70 million new users during the Facebook outage, as people worldwide were left without key messaging services for several hours.

"The daily growth rate of Telegram exceeded the norm by an order of magnitude, and we welcomed over 70 million refugees from other platforms in one day," Durov wrote on his Telegram channel.

Commenting on the Facebook outage, Julian Dunn, director of product marketing & advocacy at PagerDuty, a digital operations platform, said: "Although multi-hour outages are relatively rare, even short ones - 15 minutes or half an hour - have an outsized impact, as impatient consumers are all too eager to leave a down site and go elsewhere."

Meanwhile, Facebook faced further problems yesterday when whistleblower Frances Haugen told the US Congress that Facebook consistently chose profits over the safety and wellbeing of its users, hiding its own research that revealed the polarising effects of its products and the deleterious effects of Instagram on teenage girls in particular.

"The result has been more division, more harm, more lies, more threats, and more combat. In some cases, this dangerous online talk has led to actual violence that harms and even kills people," Haugen said.