This list is highly subjective and by no means complete. If you need a more comprehensive list of papers, Papers We Love is probably a much better resource.
- Efficient and robust approximate nearest neighbor search using Hierarchical Navigable Small World graphs
- FIFO Queues are All You Need for Cache Eviction
- Hashed and Hierarchical Timing Wheels: Efficient Data Structures for Implementing a Timer Facility
- SPFresh: Incremental In-Place Update for Billion-Scale Vector Search
- A simple totally ordered broadcast protocol
- Amazon Aurora: On Avoiding Distributed Consensus for I/Os, Commits, and Membership Changes
- Calvin: Fast Distributed Transactions for Partitioned Database Systems
- In Search of an Understandable Consensus Algorithm
- Logical Physical Clocks and Consistent Snapshots in Globally Distributed Databases
- Paxos Made Live - An Engineering Perspective
- Paxos Made Simple
- Time, Clocks, and the Ordering of Events in a Distributed System
- Unreliable Failure Detectors for Reliable Distributed Systems
- Viewstamped Replication Revisited
- Are You Sure You Want to Use MMAP in Your Database Management System?
- Can Applications Recover from fsync Failures?
- Simple Testing Can Prevent Most Critical Failures
- A Critique of ANSI SQL Isolation Levels
- Amazon Aurora: Design Considerations for High Throughput Cloud-Native Relational Databases
- Amazon DynamoDB: A Scalable, Predictably Performant, and Fully Managed NoSQL Database Service
- Cassandra - A Decentralized Structured Storage System
- Dynamo: Amazon’s Highly Available Key-value Store
- F1: A Distributed SQL Database That Scales
- High Performance Transactions via Early Write Visibility
- Highly Available Transactions: Virtues and Limitations
- Large-scale Incremental Processing Using Distributed Transactions and Notifications
- Linearizability: A Correctness Condition for Concurrent Objects
- Procella: Unifying serving and analytical data at YouTube
- Spanner, TrueTime & The CAP Theorem
- Spanner: Becoming a SQL System
- Spanner: Google’s Globally-Distributed Database
- Bigtable: A Distributed Storage System for Structured Data
- CFS: A Distributed File System for Large Scale Container Platforms
- CRUSH: Controlled, Scalable, Decentralized Placement of Replicated Data
- Ceph: A Scalable, High-Performance Distributed File System
- Dynamic Metadata Management for Petabyte-scale File Systems
- Facebook’s Tectonic Filesystem: Efficiency from Exascale
- Finding a needle in Haystack: Facebook’s photo storage
- Megastore: Providing Scalable, Highly Available Storage for Interactive Services
- RADOS: A Scalable, Reliable Storage Service for Petabyte-scale Storage Clusters
- Replex: A Scalable, Highly Available Multi-Index Data Store
- SLM-DB: Single-Level Key-Value Store with Persistent Memory
- The Google File System
- WiscKey: Separating Keys from Values in SSD-conscious Storage
- f4: Facebook’s Warm BLOB Storage System
- Borg, Omega, and Kubernetes
- Large-scale cluster management at Google with Borg
- Mesos: A Platform for Fine-Grained Resource Sharing in the Data Center
- Omega: flexible, scalable schedulers for large compute clusters
This work is licensed under a Creative Commons Attribution 4.0 International License.