Why Facebook Never Goes Down: The Two Systems Behind 3 Billion Users
Most engineers never study LogDevice and RocksDB. Here is why they should.
Every time you open Facebook, something extraordinary happens behind the scenes.
Three billion people are doing the exact same thing at the same time. Posting. Scrolling. Messaging. Watching. And somehow the whole thing just works. No crashes. No waiting. No downtime.
The answer is not more servers. It is not a bigger database. It is a fundamental design decision that most engineers never think about until their system is already on fire.
Meta treats reads and writes as two completely separate problems.
In most systems, reads and writes share the same path. The same database handles both. Which means when one gets busy, the other suffers. Your users are trying to load their feed while your pipeline is ingesting millions of new events at the same time. They are fighting over the same resources. And under load, everybody loses.
Most engineers respond to this by adding memory, scaling horizontally, or upgrading their database tier. None of that fixes the actual problem. Because the actual problem is architectural, not operational.
Meta solved it by building two completely different systems from scratch.
LOGDEVICE: BUILT FOR WRITES ONLY
LogDevice is Meta’s distributed log storage system. It was designed with one purpose: ingest data at massive speed without ever slowing down.
Every like, every message, every video view, every backend sensor ping across three billion users - LogDevice takes it all in. It uses a log-structured approach, which means it writes data sequentially rather than jumping around the disk at random. Sequential writes are dramatically faster than random writes. That is not an accident. That is a deliberate design choice made specifically to maximize write throughput.
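The log-structured idea fits in a few lines. This toy log is my own sketch, not LogDevice's actual format: real LogDevice replicates each record across many storage nodes and shards logs across the cluster. The point it shows is that every write is a sequential append to the tail of one file.

```python
import os
import tempfile

class MiniLog:
    """Toy append-only log: every write is a sequential append to one file.

    A sketch of the log-structured idea only. Real LogDevice replicates
    records across storage nodes; none of that is modeled here.
    """
    def __init__(self, path):
        self.f = open(path, "ab+")

    def append(self, record: bytes) -> int:
        """Append a length-prefixed record; return its byte offset."""
        offset = self.f.seek(0, os.SEEK_END)  # always write at the tail
        self.f.write(len(record).to_bytes(4, "big") + record)
        self.f.flush()
        return offset

    def read(self, offset: int) -> bytes:
        """Random reads work, but they are not what this layout is tuned for."""
        self.f.seek(offset)
        size = int.from_bytes(self.f.read(4), "big")
        return self.f.read(size)

log = MiniLog(tempfile.mktemp())
off = log.append(b"like:user42:post7")
log.append(b"view:user42:video9")
print(log.read(off))  # b'like:user42:post7'
```

Notice the asymmetry baked into the design: `append` never seeks anywhere but the end, while `read` has to seek first. That is the write-optimized trade in miniature.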
LogDevice does not try to be a general-purpose read store. Readers stream the log back in order, but it was never designed for random point lookups. That is the point. By narrowing its job that far, it becomes extraordinarily good at the one thing it was built for.
Most engineers designing their first production system try to pick one database that handles everything. LogDevice is the proof that this instinct, while understandable, is wrong at scale.
ROCKSDB: BUILT FOR READS WITH SURGICAL PRECISION
RocksDB started life as Google’s LevelDB. Meta forked it, rebuilt it for server workloads, and open-sourced the result in 2013. Today it powers systems at Facebook, LinkedIn, Yahoo, Twitter, and hundreds of other companies running at scale.
The reason Meta built RocksDB instead of using an existing solution is the same reason they built LogDevice. Nothing on the market gave them the control they needed.
RocksDB is an embeddable key-value store that lets you tune read and write performance independently at the instance level. This is the part most engineers miss.
You can deploy one RocksDB instance configured entirely for fast point lookups, tuned to the read patterns of a news feed where you need a specific user’s data in milliseconds. You deploy another instance configured for high write throughput, tuned to the ingestion patterns of an analytics pipeline processing billions of events. Same underlying technology. Completely different configurations. Completely different jobs.
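To make "same engine, different configs" concrete, here is a sketch of what those two deployments might look like. The knob names echo real RocksDB options (write buffer size, block cache, bloom filter bits per key), but the dict form and the values are illustrative, not a working RocksDB setup or a tuning recommendation.

```python
# Two configs for the same storage engine. Names echo real RocksDB knobs;
# values are made up for illustration, not tuned recommendations.

feed_reads = {                       # point-lookup heavy: serve a user's data fast
    "write_buffer_size_mb": 16,      # small memtable; writes are rare here
    "block_cache_mb": 2048,          # big cache so hot keys stay in memory
    "bloom_filter_bits_per_key": 10, # skip disk entirely for keys that aren't there
}

analytics_ingest = {                 # write heavy: land billions of events
    "write_buffer_size_mb": 512,     # big memtable; batch more before each flush
    "block_cache_mb": 64,            # reads are rare; don't spend RAM on cache
    "bloom_filter_bits_per_key": 0,  # no filter; this instance rarely does point reads
}
```

Every resource decision inverts between the two: memory goes to the cache on the read side and to the write buffer on the ingest side.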
They never compete for the same resources because they were never meant to run the same workload.
RocksDB also uses a data structure called an LSM tree (Log-Structured Merge tree), which batches writes in memory and flushes them to disk in sorted order. This makes writes fast and keeps related data physically close together on disk. When you request data, the disk seeks less to find it. Less seeking means faster reads. Meta pushes this further by laying out the hottest data so it sits physically adjacent on disk. The result is a feed that loads in milliseconds regardless of how many people are using it simultaneously.
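The LSM write path is easier to see in code than in prose. This is a minimal sketch of the core idea only: real LSM engines like RocksDB add a write-ahead log, background compaction of the flushed runs, and bloom filters, none of which appear here.

```python
from bisect import bisect_left

class TinyLSM:
    """Minimal LSM sketch: batch writes in a memtable, flush sorted runs.

    Illustrative only. Real engines (RocksDB included) add a write-ahead
    log, compaction, and bloom filters on top of this core loop.
    """
    def __init__(self, memtable_limit=4):
        self.memtable_limit = memtable_limit
        self.memtable = {}
        self.runs = []  # each run: a sorted list of (key, value), newest last

    def put(self, key, value):
        self.memtable[key] = value  # absorb the write in memory
        if len(self.memtable) >= self.memtable_limit:
            # one sequential flush writes the whole batch in sorted order
            self.runs.append(sorted(self.memtable.items()))
            self.memtable = {}

    def get(self, key):
        if key in self.memtable:          # newest data first
            return self.memtable[key]
        for run in reversed(self.runs):   # then newest run back to oldest
            keys = [k for k, _ in run]
            i = bisect_left(keys, key)    # binary search: runs are sorted
            if i < len(keys) and keys[i] == key:
                return run[i][1]
        return None

db = TinyLSM(memtable_limit=3)
for i in range(7):
    db.put(f"user:{i}", f"profile-{i}")
print(db.get("user:2"))  # profile-2, found in a flushed, sorted run
```

Writes never touch disk one at a time; they land in batches, sequentially, already sorted. That is where both the write speed and the "related data sits close together" property come from.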
WHY THIS MATTERS FOR YOUR SYSTEM RIGHT NOW
You are probably not building for three billion users. But the principle applies at every scale.
If you have a system that slows down under load, the first question to ask is not "what hardware do I need?" It is "are my reads and writes competing for the same resources?"
I have seen this exact problem at companies with 50 engineers and companies with 5,000. A shared database handling both analytical queries and transactional writes. A single Kafka consumer group processing both real-time and batch workloads. One pipeline serving five different use cases with completely different performance requirements.
The symptom is always the same. Things work fine until load increases. Then everything degrades together because everything is coupled together.
The fix is always the same too. Separate the concern. Define the job. Build for that job specifically.
LogDevice does not try to be RocksDB. RocksDB does not try to be LogDevice. And Facebook never goes down.
Here is the three step framework I apply before designing any new data system:
Step one. Write down every read pattern your system needs to support. How frequently. What latency is acceptable. What the data shape looks like.
Step two. Write down every write pattern separately. How much volume. How fast does it need to land. What consistency guarantees do you need.
Step three. Ask honestly whether one system can serve both patterns without compromising either. If the answer is no, you already know what to do.
Separate the concern first. Then optimize. That is how you build something that survives contact with real traffic.
If you found this valuable, Thursday’s paid newsletter goes even deeper.
I am breaking down the exact career moves that separate engineers who understand systems from engineers who just operate them. The difference in compensation between those two groups at companies like Meta is not small.
Thursday 5:30pm. Paid subscribers only.
See you Thursday.
— Avantika


