Effective Strategies for Handling Concurrent Writes in System Design
Understanding Concurrent Writes
Concurrent writes come up often in system design interviews, so it is worth being precise about what the term means. Informally, concurrent writes are multiple write operations that hit the same data at the same time. In a distributed system, however, clocks on different nodes cannot be perfectly synchronized, so it is often impossible to tell whether two writes truly happened at the same moment.
Before looking at conflict resolution for quorum writes, we therefore need a definition that does not depend on clocks: two writes are concurrent when neither one knows about the other, regardless of when each physically occurred.
Consider a system that manages key/value pairs. Suppose Node A writes the value (X, 7) to the database. Shortly after, Node B reads this value, increments it, and writes back (X, 8). Node B's write builds on the value that Node A wrote, so Node A's write happened before Node B's and there is a causal dependency between them. These operations are therefore not concurrent.
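To make the distinction concrete, here is a minimal sketch of how causal dependency could be checked. It assumes each write records the identifier of the write whose value it read first; the `Write` class, the `based_on` field, and the identifiers are hypothetical names for illustration, and the check only covers direct dependencies, not chains of them.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Write:
    write_id: str            # hypothetical unique id for this write
    value: int
    based_on: Optional[str]  # id of the write that was read before writing, if any


def happens_before(earlier: Write, later: Write) -> bool:
    """True if `later` was derived from the value that `earlier` wrote."""
    return later.based_on == earlier.write_id


def are_concurrent(a: Write, b: Write) -> bool:
    """Two writes are concurrent when neither one builds on the other."""
    return not happens_before(a, b) and not happens_before(b, a)


write_a = Write("a1", 7, based_on=None)   # Node A writes (X, 7)
write_b = Write("b1", 8, based_on="a1")   # Node B read 7, then wrote (X, 8)
write_c = Write("c1", 39, based_on=None)  # a write made without seeing write_a

print(are_concurrent(write_a, write_b))
print(are_concurrent(write_a, write_c))
```

Under these assumptions, `write_a` and `write_b` are causally ordered, while `write_a` and `write_c` count as concurrent no matter what their wall-clock times were.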
In contrast, imagine both nodes trying to update the same key without knowledge of each other's actions. Even if their write requests do not overlap in time, network delays and failures can result in inconsistent data. Take, for example, the following sequence where Node A aims to set key X to 17, while Node B attempts to set it to 39:
- Node A's write is received by replica#1, but Node B's write fails to reach it due to a network issue.
- Replica#2 processes Node A's write first, followed by Node B's.
- Replica#3 receives Node B's write before Node A's.
This situation leads to inconsistencies across replicas: replica#1 holds (X, 17), replica#2 has (X, 39), and replica#3 stores (X, 17). Clearly, without proper conflict resolution, the system cannot maintain a consistent value for key X.
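The sequence above can be reproduced with a small simulation. This is only a sketch: each replica is modeled as a plain dictionary that applies writes in arrival order, and the function and variable names are made up for illustration.

```python
def apply_in_arrival_order(writes):
    """Return the final store of a replica that applies each write as it arrives."""
    store = {}
    for key, value in writes:
        store[key] = value  # a later arrival simply overwrites an earlier one
    return store


write_a = ("X", 17)  # Node A's write
write_b = ("X", 39)  # Node B's write

# Arrival order at each replica, matching the sequence described above.
arrivals = {
    "replica#1": [write_a],           # Node B's write never arrives
    "replica#2": [write_a, write_b],  # A first, then B
    "replica#3": [write_b, write_a],  # B first, then A
}

final_values = {
    name: apply_in_arrival_order(writes)["X"] for name, writes in arrivals.items()
}
print(final_values)
```

The replicas end up with 17, 39, and 17 respectively, because each one trusts its own arrival order.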
The video "Google SWE teaches systems design | EP3: Multileader replication" explores the intricacies of managing writes in distributed systems and offers valuable insights into replication challenges.
The Need for Convergence
For replication systems to function effectively, it is essential for all nodes to converge on a consistent value over time. As previously mentioned, concurrent writes do not establish a clear order, leaving their relationship ambiguous. When one event occurs before another, it is logical for the latter to overwrite the former. However, conflicts arising from concurrent writes must be addressed to ensure data integrity.
Forcing an Order on Writes
To handle the lack of order among concurrent writes, one approach is to impose an order on them. This can be achieved by associating a timestamp or a unique identifier with each write, allowing for unambiguous comparisons. The method of adopting timestamps to determine the order and selecting the most recent write as the final value is known as Last Write Wins (LWW). This technique is employed by databases like Cassandra and is also an option in Riak.
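As a rough illustration of LWW, the sketch below attaches a timestamp to each write and lets a replica keep whichever write has the larger one. The timestamps, the `node_id` tie-breaker, and all names are assumptions made for this example rather than how any particular database implements it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Write:
    key: str
    value: int
    timestamp: float  # assumed to be attached when the write is issued
    node_id: str      # hypothetical tie-breaker for equal timestamps


def lww_apply(store, write):
    """Keep `write` only if it is newer than what the replica already holds."""
    current = store.get(write.key)
    if current is None or (write.timestamp, write.node_id) > (
        current.timestamp,
        current.node_id,
    ):
        store[write.key] = write


def replay(writes):
    """Apply a sequence of writes to an empty replica and return its store."""
    store = {}
    for write in writes:
        lww_apply(store, write)
    return store


write_a = Write("X", 17, timestamp=100.0, node_id="A")
write_b = Write("X", 39, timestamp=100.2, node_id="B")

# Different arrival orders; replica#1 is assumed to receive B's write late.
arrivals = {
    "replica#1": [write_a, write_b],
    "replica#2": [write_a, write_b],
    "replica#3": [write_b, write_a],
}

converged = {name: replay(writes)["X"].value for name, writes in arrivals.items()}
print(converged)
```

Because the comparison depends only on the timestamp and not on arrival order, every replica that eventually sees both writes settles on the same value, here 39.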
In the video "Systems Design 0 to 1 with Ex-Google SWE," viewers gain a deeper understanding of system design fundamentals, including effective strategies for managing concurrent operations.
Challenges with Last Write Wins
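While LWW guarantees that replicas eventually converge, it does so at the cost of durability. If several clients write to the same key concurrently, each write may be acknowledged as successful, yet only the one with the latest timestamp survives and the others are silently discarded. Because the ordering relies on timestamps from unsynchronized clocks, LWW can even drop a write that was not concurrent at all. This makes LWW unsuitable for scenarios where data loss is unacceptable, although it can be a viable option in cache designs where some data loss is tolerable. With a database like Cassandra, a common way to avoid the problem is to treat keys as immutable, for example by giving each write its own UUID key, so that no two writes ever target the same key.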
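The following sketch contrasts the two situations. The shopping-cart keys, timestamps, and helper name are hypothetical, chosen only to show one acknowledged write disappearing under LWW and both surviving when every write gets its own UUID-based key.

```python
import uuid


def lww_put(store, key, value, timestamp):
    """Keep the write only if its timestamp is newer than the stored one."""
    if key not in store or timestamp > store[key][1]:
        store[key] = (value, timestamp)


# Case 1: two clients write to the same key; both writes are accepted,
# but only the one with the later timestamp remains.
shared = {}
lww_put(shared, "cart:42", ["milk"], timestamp=100.0)
lww_put(shared, "cart:42", ["eggs"], timestamp=100.5)
print(shared["cart:42"][0])  # the "milk" write has been discarded

# Case 2: each write uses a fresh, never-reused key, so LWW has nothing to discard.
unique = {}
for item, ts in [("milk", 100.0), ("eggs", 100.5)]:
    lww_put(unique, f"cart:42:{uuid.uuid4()}", item, ts)

items = sorted(value for value, _ in unique.values())
print(items)
```

The trade-off is that readers must now gather all entries belonging to the cart and combine them, instead of reading a single key.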