Replication
Append-entries is the only RPC used to replicate logs from the leader to followers or learners. Installing a snapshot can be seen as a special form of appending logs.
Append-entry
Raft logs can together be seen as a single value: conceptually, an append-entries RPC forwards all logs to a follower and replaces all logs on the follower. This guarantees that a committed log can always be seen by the next leader. In practice, though, it is infeasible to send all logs in one RPC, so the receiving end of the algorithm becomes:
- Proceed only when `prev_log_id` matches the local log id at the same index.
- Save every input entry into the local store if:
  - the entry at the target index is empty, or
  - the entry at the target index is the same as the input one.

  Otherwise there is an inconsistent entry: the follower must delete this entry and all entries after it before storing the input one.
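The per-entry rule above can be sketched as follows. This is a minimal model, not a real library API: the `LogId` struct and `Action` enum are illustrative names.

```rust
// A minimal model of a log id: the (term, index) pair identifies an entry.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
struct LogId {
    term: u64,
    index: u64,
}

/// What a follower does with one input entry, given the local entry
/// stored at the same index (if any).
#[derive(PartialEq, Debug)]
enum Action {
    /// The slot is empty: store the input entry.
    Store,
    /// The local entry equals the input one: keep it, nothing to do.
    Skip,
    /// The local entry conflicts: delete it and every entry after it,
    /// then store the input entry.
    TruncateThenStore,
}

fn decide(local: Option<LogId>, input: LogId) -> Action {
    match local {
        None => Action::Store,
        Some(l) if l == input => Action::Skip,
        Some(_) => Action::TruncateThenStore,
    }
}
```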
Why deletion is needed
The following diagram shows only the log terms.

```
R1  5
R2  5
R3  5  3  3
R4
R5  2  4  4
```
If log 5 is committed by R1, and the term-3 logs on R3 are not removed, R5 could in the future become the new leader and override log 5 on R3.
Caveat: deleting all entries after prev_log_id loses committed logs

One common mistake is to delete all entries after `prev_log_id` when a matching `prev_log_id` is found, e.g.:
```
fn handle_append_entries(req) {
    if store.has(req.prev_log_id) {
        store.delete_logs(req.prev_log_id.index..);
        store.append_logs(req.entries);
    }
}
```
This results in the loss of a committed entry, because the delete and the append are not executed atomically. E.g., the logs are as follows and R1 is now the leader:
```
R1  1,1  1,2  1,3
R2  1,1  1,2
R3
```
When the following steps take place, the committed entry `{1,2}` is lost:

- R1 to R2: `append_entries(entries=[{1,2}, {1,3}], prev_log_id={1,1})`
- R2 deletes `{1,2}`
- R2 crashes
- R2 restarts and is elected as leader, but only sees `{1,1}`; the committed entry `{1,2}` is lost.
The safe way is to skip every entry in the append-entries message that is already present locally, then delete only the inconsistent entries.
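This safe handling can be sketched with a toy in-memory `Vec<LogId>` as the store (slot `i` holds the entry at index `i + 1`); a real implementation would persist each step, but the skip-then-truncate logic is the same:

```rust
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
struct LogId {
    term: u64,
    index: u64,
}

/// Append entries safely: skip entries that are already present and
/// identical; truncate only from the first inconsistent entry onward.
/// Returns false if `prev_log_id` does not match the local log.
fn append_entries(log: &mut Vec<LogId>, prev: Option<LogId>, entries: &[LogId]) -> bool {
    if let Some(p) = prev {
        // Proceed only when prev_log_id matches the local entry at the same index.
        match log.get(p.index as usize - 1).copied() {
            Some(l) if l == p => {}
            _ => return false,
        }
    }
    for &e in entries {
        let slot = e.index as usize - 1;
        match log.get(slot).copied() {
            // Same entry already stored: skip it; never delete it first.
            Some(l) if l == e => {}
            // Inconsistent entry: delete it and all entries after it, then store.
            Some(_) => {
                log.truncate(slot);
                log.push(e);
            }
            // Empty slot: just store the entry.
            None => log.push(e),
        }
    }
    true
}
```

With this handling, the crash scenario above is harmless: the request `entries=[{1,2}, {1,3}], prev_log_id={1,1}` leaves the already-present `{1,2}` untouched, so a crash at any point cannot lose it.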
Caveat: commit-index must not advance past the last known consistent log
Because we cannot simply delete `log[prev_log_id.index..]` (which would lose committed entries), the commit index must be updated only after the append-entries, and it must point to a log entry that is consistent with the leader. Otherwise there is a chance of applying an uncommitted entry on a follower:
```
R0  1,1  1,2  3,3
R1  1,1  1,2  2,3
R2  1,1  1,2  3,3
```
- R0 to R1 append_entries: `entries=[{1,2}], prev_log_id={1,1}, commit_index=3`
- R1 accepts this append-entries request, but is not aware that entry `{2,3}` is inconsistent with the leader. It would then update `commit_index` to `3` and apply `{2,3}`.
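One way to enforce this rule (a sketch; the function name is illustrative): the follower commits up to the smaller of the leader's commit index and the index of the last entry it has verified to be consistent with the leader, i.e. the last entry covered by this append-entries request.

```rust
/// Clamp the leader's commit index to the last log index that is known
/// to be consistent with the leader (the last entry covered by the
/// current append-entries request).
fn effective_commit_index(leader_commit: u64, last_consistent_index: u64) -> u64 {
    leader_commit.min(last_consistent_index)
}
```

In the scenario above, the request only covers entries up to index 2, so R1 commits at most index 2 and never applies the inconsistent entry `{2,3}`.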
Snapshot replication
Snapshot replication can be considered a special form of log replication: it replicates all committed logs from index 0 up to some index.
Similar to append-entries:

- (1) If the logs contained in the snapshot match the logs stored on the Follower/Learner, nothing is done.
- (2) If the logs conflict with the local logs, ALL non-committed logs are deleted, because we do not know which logs conflict; and the effective membership has to be reverted to some previous, non-conflicting one.
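These two cases can be sketched with the same toy model as before (`log[i]` holds the entry at index `i + 1`, log indexes start at 1, and `committed` is the highest committed index; not a real API):

```rust
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
struct LogId {
    term: u64,
    index: u64,
}

/// On receiving a snapshot: keep the local logs if the entry at
/// `snapshot_last.index` matches the snapshot; otherwise delete ALL
/// non-committed logs, since we cannot tell which of them conflict.
fn handle_snapshot_logs(log: &mut Vec<LogId>, committed: u64, snapshot_last: LogId) {
    let matches = log
        .get(snapshot_last.index as usize - 1)
        .map(|&l| l == snapshot_last)
        .unwrap_or(false);
    if !matches {
        // Keep only entries up to `committed`; everything after may conflict.
        log.truncate(committed as usize);
    }
}
```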
Delete conflicting logs
If `snapshot_meta.last_log_id` conflicts with the local log, the conflicting entries have to be deleted.

A node that holds conflicting logs cannot become a leader right away: according to the Raft spec, a leader has to contain all committed logs, but the local entry at `last_applied.index` is not committed, so this node cannot be elected. However, it could become a leader after receiving more logs. By then the logs after `snapshot_meta.last_log_id` would all have been cleaned, but the logs at or before `snapshot_meta.last_log_id` would not. There would then be a chance this node becomes leader and uses these conflicting logs for replication.
Delete all non-committed
It simply truncates ALL non-committed logs here, because `snapshot_meta.last_log_id` is committed: if a local log id conflicts with `snapshot_meta.last_log_id`, there must be a quorum that contains `snapshot_meta.last_log_id`, so it is safe to remove all logs on this node. However, removing committed logs leads to trouble with membership management, so we only remove the logs from `committed+1` onward.
Not safe to clean conflicting logs after installing snapshot

It is not safe to remove the conflicting logs that are less than `snap_last_log_id` after installing the snapshot: if the node crashes in between, dirty logs remain, and they may be forwarded to other nodes if this node later becomes a leader.
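A tiny simulation of why the order matters (purely illustrative): assume a crash can happen after the first persisted step, so conflicting logs only survive if the delete step had not yet run.

```rust
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
enum Step {
    DeleteConflictingLogs,
    InstallSnapshot,
}

/// Simulate a crash after the first persisted step: conflicting ("dirty")
/// logs survive the crash unless the delete step ran first.
fn dirty_logs_survive_crash(order: [Step; 2]) -> bool {
    order[0] != Step::DeleteConflictingLogs
}
```

Hence the safe order is to delete the conflicting logs first, and only then install the snapshot.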