Redis Source Code - High Availability - Sentinel
Earlier articles in this series walked through Redis internals from event handling to command processing, persistence, and master-slave replication.
How does Redis implement high availability?
This article looks at Sentinel, the HA mechanism for non-clustered deployments (Redis Cluster has its own).
- What features does Sentinel provide?
- How does Sentinel communicate with the master/slaves?
- How does Sentinel detect and confirm failures?
- How does Sentinel perform leader election?
- How does Sentinel carry out a failover?
  - With multiple slaves, which one gets promoted?
  - What are the switchover steps?
Sentinel Features - Big Picture
- Monitoring - continuously checks the health of the Redis nodes
- Notification - Sentinel can notify the system, via an API, that a monitored node has a problem
- Automatic failover - if the master is not working as expected, Sentinel can start a failover: promote one slave to master, reconfigure the remaining slaves to use the new master, and inform applications using the Redis servers of the new address to connect to
- Configuration provider - Sentinel acts as a source of authority for client service discovery: clients connect to Sentinel and ask for the address of the current Redis master of a given service; if a failover occurs, Sentinel reports the new address
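As a concrete example of the configuration-provider role: a client can ask any sentinel for the current master address with SENTINEL get-master-addr-by-name. A minimal hiredis sketch (the sentinel address and the master name "mymaster" are assumptions for illustration):
#include <stdio.h>
#include <hiredis/hiredis.h>

int main(void) {
    /* Address of any sentinel; 26379 is the conventional sentinel port. */
    redisContext *c = redisConnect("127.0.0.1", 26379);
    if (!c || c->err) { fprintf(stderr, "connect failed\n"); return 1; }

    /* Ask for the master of the monitored group named "mymaster". */
    redisReply *r = redisCommand(c, "SENTINEL get-master-addr-by-name %s", "mymaster");
    if (r && r->type == REDIS_REPLY_ARRAY && r->elements == 2)
        printf("master is %s:%s\n", r->element[0]->str, r->element[1]->str);

    freeReplyObject(r);
    redisFree(c);
    return 0;
}
After a failover, the same query returns the new master's address, which is exactly how clients are expected to rediscover the master.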
Sentinel Communication
Sentinel startup
Startup commands
- redis-sentinel /path/to/your/sentinel.conf
- redis-server /path/to/your/sentinel.conf --sentinel
Code omitted; the startup path is sentinelHandleConfiguration -> createSentinelRedisInstance -> createInstanceLink, which creates:
- sentinelRedisInstance
- instanceLink
Connections
- each sentinel connects to the master/slaves
- sentinels also connect to one another
- sentinel is a special server mode, so it can also accept incoming client connections
Establishing connections
serverCron periodically invokes sentinelTimer, which ultimately calls sentinelReconnectInstance to establish asynchronous (non-blocking hiredis) connections, so connection setup does not stall the main event loop:
sentinelTimer->
sentinelHandleDictOfRedisInstances->
sentinelHandleRedisInstance->
sentinelReconnectInstance
struct instanceLink {
    int refcount;          /* Number of sentinelRedisInstance owners. */
    int disconnected;      /* Non-zero if we need to reconnect cc or pc. */
    int pending_commands;  /* Number of commands sent waiting for a reply. */
    redisAsyncContext *cc; /* Hiredis context for commands. */
    redisAsyncContext *pc; /* Hiredis context for Pub / Sub. */
    ......
}
/* Create the async connections for the instance link if the link
 * is disconnected. Note that link->disconnected is true even if just
 * one of the two links (commands and pub/sub) is missing. */
void sentinelReconnectInstance(sentinelRedisInstance *ri) {
    ......
    /* Commands connection. */
    if (link->cc == NULL) {
        link->cc = redisAsyncConnectBind(ri->addr->ip,ri->addr->port,NET_FIRST_BIND_ADDR);
        if (link->cc->err) {
            sentinelEvent(LL_DEBUG,"-cmd-link-reconnection",ri,"%@ #%s",
                link->cc->errstr);
            instanceLinkCloseConnection(link,link->cc);
        } else {
            link->pending_commands = 0;
            link->cc_conn_time = mstime();
            link->cc->data = link;
            redisAeAttach(server.el,link->cc);
            redisAsyncSetConnectCallback(link->cc,
                    sentinelLinkEstablishedCallback);
            redisAsyncSetDisconnectCallback(link->cc,
                    sentinelDisconnectCallback);
            sentinelSendAuthIfNeeded(ri,link->cc);
            sentinelSetClientName(ri,link->cc,"cmd");

            /* Send a PING ASAP when reconnecting. */
            sentinelSendPing(ri);
        }
    }
    /* Pub / Sub */
    if ((ri->flags & (SRI_MASTER|SRI_SLAVE)) && link->pc == NULL) {
        link->pc = redisAsyncConnectBind(ri->addr->ip,ri->addr->port,NET_FIRST_BIND_ADDR);
        if (link->pc->err) {
            sentinelEvent(LL_DEBUG,"-pubsub-link-reconnection",ri,"%@ #%s",
                link->pc->errstr);
            instanceLinkCloseConnection(link,link->pc);
        } else {
            int retval;

            link->pc_conn_time = mstime();
            link->pc->data = link;
            redisAeAttach(server.el,link->pc);
            redisAsyncSetConnectCallback(link->pc,
                    sentinelLinkEstablishedCallback);
            redisAsyncSetDisconnectCallback(link->pc,
                    sentinelDisconnectCallback);
            sentinelSendAuthIfNeeded(ri,link->pc);
            sentinelSetClientName(ri,link->pc,"pubsub");
            /* Now we subscribe to the Sentinels "Hello" channel. */
            retval = redisAsyncCommand(link->pc,
                sentinelReceiveHelloMessages, ri, "%s %s",
                sentinelInstanceMapCommand(ri,"SUBSCRIBE"),
                SENTINEL_HELLO_CHANNEL);
            if (retval != C_OK) {
                /* If we can't subscribe, the Pub/Sub connection is useless
                 * and we can simply disconnect it and try again. */
                instanceLinkCloseConnection(link,link->pc);
                return;
            }
        }
    }
    ......
}
Connection Flow & Communication
- Sentinel gets the master's address from its configuration file and connects to it (the actual connection setup is shown above), publishing a +monitor event that carries the configured quorum
- Once connected, it sends a PING as soon as possible
- Sentinel sends the INFO command to the master to obtain the list of its slaves
- Because sentinels form a many-to-many topology, Pub/Sub is used for exchanging information
- Each sentinel subscribes to the __sentinel__:hello channel on every master/slave; when another sentinel publishes to that channel, all subscribers are notified, so the current sentinel discovers other sentinels and connects to them
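If memory serves, the hello payload is eight comma-separated fields: the announcing sentinel's ip, port, runid, and current_epoch, followed by the monitored master's name, ip, port, and config_epoch. A small parsing sketch over a made-up sample:
#include <stdio.h>
#include <string.h>

int main(void) {
    /* Made-up sample of a __sentinel__:hello payload (field order from memory
     * of sentinelSendHello; verify against your source tree). */
    char hello[] = "10.0.0.5,26379,abcd1234,7,mymaster,10.0.0.1,6379,5";
    const char *names[8] = {
        "sentinel_ip","sentinel_port","sentinel_runid","current_epoch",
        "master_name","master_ip","master_port","master_config_epoch"
    };
    int i = 0;
    for (char *tok = strtok(hello, ","); tok && i < 8; tok = strtok(NULL, ","), i++)
        printf("%-20s %s\n", names[i], tok);
    return 0;
}
This is how a sentinel both announces itself and propagates the latest master configuration in one message.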
struct redisCommand sentinelcmds[] = {
    {"ping",pingCommand,1,"",0,NULL,0,0,0,0,0},
    {"sentinel",sentinelCommand,-2,"",0,NULL,0,0,0,0,0},
    {"subscribe",subscribeCommand,-2,"",0,NULL,0,0,0,0,0},
    {"unsubscribe",unsubscribeCommand,-1,"",0,NULL,0,0,0,0,0},
    {"psubscribe",psubscribeCommand,-2,"",0,NULL,0,0,0,0,0},
    {"punsubscribe",punsubscribeCommand,-1,"",0,NULL,0,0,0,0,0},
    {"publish",sentinelPublishCommand,3,"",0,NULL,0,0,0,0,0},
    {"info",sentinelInfoCommand,-1,"",0,NULL,0,0,0,0,0},
    {"role",sentinelRoleCommand,1,"l",0,NULL,0,0,0,0,0},
    {"client",clientCommand,-2,"rs",0,NULL,0,0,0,0,0},
    {"shutdown",shutdownCommand,-1,"",0,NULL,0,0,0,0,0},
    {"auth",authCommand,2,"sltF",0,NULL,0,0,0,0,0}
};
These are the commands available when running in sentinel mode.
For example, the sentinel command is handled by sentinelCommand.
Command processing
processCommand - see the earlier article on Redis event handling for details.
Sentinel Failure Handling
sentinelTimer -> sentinelHandleRedisInstance periodically manages every node.
The procedure is:
- Reconnect: sentinelHandleRedisInstance calls sentinelReconnectInstance to re-establish any dropped connections
- Heartbeat: sentinelSendPeriodicCommands sends PING, INFO, and other commands to the instances
- Subjective-down check: sentinelCheckSubjectivelyDown
- Objective-down check: sentinelCheckObjectivelyDown
- Failure handling: sentinelStartFailoverIfNeeded
  - If a failover should start, sentinelAskMasterStateToOtherSentinels is called to collect the other sentinels' view of the master, sending them the is-master-down-by-addr command and kicking off leader election
  - sentinelFailoverStateMachine performs the actual failover
  - sentinelAskMasterStateToOtherSentinels is then called again to refresh the other sentinels' judgment of the master
/* Perform scheduled operations for the specified Redis instance. */
void sentinelHandleRedisInstance(sentinelRedisInstance *ri) {
    /* ========== MONITORING HALF ============ */
    /* Every kind of instance */
    sentinelReconnectInstance(ri);
    sentinelSendPeriodicCommands(ri);

    /* ============== ACTING HALF ============= */
    /* We don't proceed with the acting half if we are in TILT mode.
     * TILT happens when we find something odd with the time, like a
     * sudden change in the clock. */
    if (sentinel.tilt) {
        if (mstime()-sentinel.tilt_start_time < SENTINEL_TILT_PERIOD) return;
        sentinel.tilt = 0;
        sentinelEvent(LL_WARNING,"-tilt",NULL,"#tilt mode exited");
    }

    /* Every kind of instance */
    sentinelCheckSubjectivelyDown(ri);

    /* Masters and slaves */
    if (ri->flags & (SRI_MASTER|SRI_SLAVE)) {
        /* Nothing so far. */
    }

    /* Only masters */
    if (ri->flags & SRI_MASTER) {
        sentinelCheckObjectivelyDown(ri);
        if (sentinelStartFailoverIfNeeded(ri))
            sentinelAskMasterStateToOtherSentinels(ri,SENTINEL_ASK_FORCED);
        sentinelFailoverStateMachine(ri);
        sentinelAskMasterStateToOtherSentinels(ri,SENTINEL_NO_FLAGS);
    }
}
Failure Detection
Heartbeat
sentinelSendPeriodicCommands drives the heartbeat toward all node types (sentinel/master/slave):
- It issues the commands through redisAsyncCommand
- It first sends INFO to masters/slaves to collect state:
  - when node attributes change, the new attributes are synchronized
  - when new slaves appear in the master's INFO reply, connections to them are established
  - when a node's role changes, failover or other related logic is triggered
- It then sends PING to all three node types:
  - the three roles use PING as a heartbeat to confirm that the other side is alive
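The cadence of these commands is set by constants in sentinel.c. The values below are quoted from memory of the 5.x source (all in milliseconds; verify against your tree), wrapped in a trivial program so the snippet compiles:
#include <stdio.h>

/* Heartbeat cadence used by sentinelSendPeriodicCommands (ms).
 * Quoted from memory of sentinel.c; verify against the source tree. */
#define SENTINEL_INFO_PERIOD    10000
#define SENTINEL_PING_PERIOD     1000
#define SENTINEL_PUBLISH_PERIOD  2000

int main(void) {
    printf("INFO every %ds, PING every %ds, hello publish every %ds\n",
        SENTINEL_INFO_PERIOD/1000, SENTINEL_PING_PERIOD/1000,
        SENTINEL_PUBLISH_PERIOD/1000);
    return 0;
}
These periods matter below: SENTINEL_INFO_PERIOD feeds directly into the second subjective-down condition.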
Subjective-down check
sentinelCheckSubjectivelyDown checks every node type (sentinel/master/slave) for subjective down:
- Compute the elapsed interval
- Close timed-out connections (code omitted)
- Timeout conditions:
  - elapsed exceeds the down_after_period threshold
  - the sentinel believes the node is a master, but the node reports itself as a slave, and it still has not completed the role change after down_after_period plus two INFO periods
- If either condition holds, the sentinel judges the node subjectively down (SDOWN)
- sentinelEvent is called to emit the "+sdown" event
/* Is this instance down from our point of view? */
void sentinelCheckSubjectivelyDown(sentinelRedisInstance *ri) {
    mstime_t elapsed = 0;

    if (ri->link->act_ping_time)
        elapsed = mstime() - ri->link->act_ping_time; /* time since the last pending PING was sent */
    else if (ri->link->disconnected)
        elapsed = mstime() - ri->link->last_avail_time; /* time since the link was last available */
    ......
    /* Update the SDOWN flag. We believe the instance is SDOWN if:
     *
     * 1) It is not replying.
     * 2) We believe it is a master, it reports to be a slave for enough time
     *    to meet the down_after_period, plus enough time to get two times
     *    INFO report from the instance. */
    if (elapsed > ri->down_after_period ||
        (ri->flags & SRI_MASTER &&
         ri->role_reported == SRI_SLAVE &&
         mstime() - ri->role_reported_time >
          (ri->down_after_period+SENTINEL_INFO_PERIOD*2)))
    {
        /* Is subjectively down */
        if ((ri->flags & SRI_S_DOWN) == 0) {
            sentinelEvent(LL_WARNING,"+sdown",ri,"%@");
            ri->s_down_since_time = mstime();
            ri->flags |= SRI_S_DOWN;
        }
    } else {
        /* Is subjectively up */
        if (ri->flags & SRI_S_DOWN) {
            sentinelEvent(LL_WARNING,"-sdown",ri,"%@");
            ri->flags &= ~(SRI_S_DOWN|SRI_SCRIPT_KILL_SENT);
        }
    }
}
Objective-down check
sentinelCheckObjectivelyDown detects whether the master is objectively down.
Objective down requires agreement from enough sentinels (the configured quorum), so the logic has two sides:
- The current sentinel (the one that noticed the master's problem):
  - a quorum counter records how many sentinels consider the master subjectively down
  - if the current sentinel has itself marked the master subjectively down, it initializes quorum to 1
  - it then iterates over the other sentinels, checking whether their flags carry SRI_MASTER_DOWN
  - each match increments quorum
  - after walking the sentinels hash table, sentinelCheckObjectivelyDown compares quorum against the preset threshold (configured in sentinel.conf)
  - if quorum >= master->quorum, the master is marked objectively down (ODOWN)
  - sentinelEvent emits the +odown event
/* Is this instance down according to the configured quorum?
 *
 * Note that ODOWN is a weak quorum, it only means that enough Sentinels
 * reported in a given time range that the instance was not reachable.
 * However messages can be delayed so there are no strong guarantees about
 * N instances agreeing at the same time about the down state. */
void sentinelCheckObjectivelyDown(sentinelRedisInstance *master) {
    ......
    if (master->flags & SRI_S_DOWN) {
        /* Is down for enough sentinels? */
        quorum = 1; /* the current sentinel. */

        /* Count all the other sentinels. */
        di = dictGetIterator(master->sentinels);
        while((de = dictNext(di)) != NULL) {
            sentinelRedisInstance *ri = dictGetVal(de);
            if (ri->flags & SRI_MASTER_DOWN) quorum++;
        }
        dictReleaseIterator(di);
        if (quorum >= master->quorum) odown = 1;
    }
    ......
}
How are the other sentinels' flags obtained?
sentinelAskMasterStateToOtherSentinels uses redisAsyncCommand to send the is-master-down-by-addr command to the other sentinels, registering sentinelReceiveIsMasterDownReply as the reply callback.
sentinelReceiveIsMasterDownReply processes each reply and sets SRI_MASTER_DOWN on the corresponding sentinel instance.
- The other sentinels:
  They receive the is-master-down-by-addr request sent above and evaluate it in sentinelCommand:
  - if the queried sentinel is not in TILT (time-anomaly protection) mode, and its own periodic checks have also marked the master in question subjectively down
  - it replies to the asking sentinel, confirming the down state (recorded there as SRI_MASTER_DOWN)
if (!strcasecmp(c->argv[1]->ptr,"is-master-down-by-addr")) {
    /* SENTINEL IS-MASTER-DOWN-BY-ADDR <ip> <port> <current-epoch> <runid>
     *
     * Arguments:
     *
     * ip and port are the ip and port of the master we want to be
     * checked by Sentinel. Note that the command will not check by
     * name but just by master, in theory different Sentinels may monitor
     * differnet masters with the same name.
     *
     * current-epoch is needed in order to understand if we are allowed
     * to vote for a failover leader or not. Each Sentinel can vote just
     * one time per epoch.
     *
     * runid is "*" if we are not seeking for a vote from the Sentinel
     * in order to elect the failover leader. Otherwise it is set to the
     * runid we want the Sentinel to vote if it did not already voted.
     */
    sentinelRedisInstance *ri;
    long long req_epoch;
    uint64_t leader_epoch = 0;
    char *leader = NULL;
    long port;
    int isdown = 0;

    if (c->argc != 6) goto numargserr;
    if (getLongFromObjectOrReply(c,c->argv[3],&port,NULL) != C_OK ||
        getLongLongFromObjectOrReply(c,c->argv[4],&req_epoch,NULL)
                                                              != C_OK)
        return;
    ri = getSentinelRedisInstanceByAddrAndRunID(sentinel.masters,
        c->argv[2]->ptr,port,NULL);

    /* It exists? Is actually a master? Is subjectively down? It's down.
     * Note: if we are in tilt mode we always reply with "0". */
    if (!sentinel.tilt && ri && (ri->flags & SRI_S_DOWN) &&
                                (ri->flags & SRI_MASTER))
        isdown = 1;
    ......
    /* Reply with a three-elements multi-bulk reply:
     * down state, leader, vote epoch. */
    addReplyMultiBulkLen(c,3);
    addReply(c, isdown ? shared.cone : shared.czero);
    addReplyBulkCString(c, leader ? leader : "*");
    addReplyLongLong(c, (long long)leader_epoch);
    if (leader) sdsfree(leader);
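Seen from the asking side, this exchange is an ordinary command round trip. A minimal hiredis sketch (sentinel address, master address, and epoch are made-up values) that sends the same command and prints the three-element reply:
#include <stdio.h>
#include <hiredis/hiredis.h>

int main(void) {
    /* Connect to a peer sentinel (address made up for this sketch). */
    redisContext *c = redisConnect("127.0.0.1", 26379);
    if (!c || c->err) { fprintf(stderr, "connect failed\n"); return 1; }

    /* runid "*" means: report the down state only, we are not asking for a vote. */
    redisReply *r = redisCommand(c,
        "SENTINEL is-master-down-by-addr %s %s %s %s",
        "10.0.0.1", "6379", "0", "*");
    if (r && r->type == REDIS_REPLY_ARRAY && r->elements == 3) {
        printf("down=%lld leader=%s leader_epoch=%lld\n",
            r->element[0]->integer,   /* 1 if that sentinel sees SDOWN */
            r->element[1]->str,       /* voted leader runid, or "*" */
            r->element[2]->integer);  /* leader epoch */
    }
    freeReplyObject(r);
    redisFree(c);
    return 0;
}
The same command doubles as the vote request: passing a real runid instead of "*" asks the peer to vote, as shown in the election section below.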
Failure Handling
After sentinelCheckObjectivelyDown has observed objective down,
sentinelStartFailoverIfNeeded decides whether a failover should be started.
Switchover conditions:
- the master's flags already carry SRI_O_DOWN
- no failover is currently in progress
- if a previous failover was started, its start time must lie more than twice sentinel failover-timeout (from sentinel.conf) in the past
Once the above conditions hold, sentinelStartFailover is triggered to set up the state:
- master->failover_epoch, used in the upcoming vote
- master->failover_state = SENTINEL_FAILOVER_STATE_WAIT_START
- master->failover_start_time, randomized to prevent several nodes from starting a failover vote simultaneously
It then goes on to call sentinelAskMasterStateToOtherSentinels to solicit leader votes.
/* Setup the master state to start a failover. */
void sentinelStartFailover(sentinelRedisInstance *master) {
    serverAssert(master->flags & SRI_MASTER);

    master->failover_state = SENTINEL_FAILOVER_STATE_WAIT_START;
    master->flags |= SRI_FAILOVER_IN_PROGRESS;
    master->failover_epoch = ++sentinel.current_epoch; /* bump current_epoch */
    sentinelEvent(LL_WARNING,"+new-epoch",master,"%llu",
        (unsigned long long) sentinel.current_epoch);
    sentinelEvent(LL_WARNING,"+try-failover",master,"%@");
    master->failover_start_time = mstime()+rand()%SENTINEL_MAX_DESYNC;
    master->failover_state_change_time = mstime();
}
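The jitter added to failover_start_time deserves a note: every sentinel delays its attempt by a random amount (SENTINEL_MAX_DESYNC, 1000 ms in sentinel.c if memory serves), which de-synchronizes competing sentinels much like Raft's randomized election timeouts. A toy illustration:
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define SENTINEL_MAX_DESYNC 1000 /* ms; value quoted from memory, verify against sentinel.c */

int main(void) {
    srand((unsigned)time(NULL));
    /* Three sentinels computing failover_start_time at the same instant
     * are still spread across a one-second window by the jitter. */
    for (int i = 0; i < 3; i++)
        printf("sentinel %d delays its failover attempt by %d ms\n",
               i, rand() % SENTINEL_MAX_DESYNC);
    return 0;
}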
void sentinelCommand(client *c) {
    ......
    /* Vote for the master (or fetch the previous vote) if the request
     * includes a runid, otherwise the sender is not seeking for a vote. */
    if (ri && ri->flags & SRI_MASTER && strcasecmp(c->argv[5]->ptr,"*")) {
        leader = sentinelVoteLeader(ri,(uint64_t)req_epoch,
                                    c->argv[5]->ptr,
                                    &leader_epoch);
    }

    /* Reply with a three-elements multi-bulk reply:
     * down state, leader, vote epoch. */
    addReplyMultiBulkLen(c,3);
    addReply(c, isdown ? shared.cone : shared.czero);
    addReplyBulkCString(c, leader ? leader : "*");
    addReplyLongLong(c, (long long)leader_epoch);
    if (leader) sdsfree(leader);
    ......
}
Leader Election
Why hold a vote before failing over? To guarantee that only one sentinel performs the job. sentinelCommand handles the vote request and calls sentinelVoteLeader.
Voting flow:
- The initiating sentinel (call it sentinel A) starts the vote, sending its req_epoch and runid
- The other sentinels receive the solicitation and vote:
  - if req_epoch > the sentinel's own epoch, it first syncs: sentinel.current_epoch = req_epoch
  - if the locally recorded master->leader_epoch < req_epoch and its own epoch <= req_epoch, it votes for the requester
    - and publishes the +vote-for-leader event
  - otherwise sentinelVoteLeader simply returns the leader ID already recorded on the master, i.e. the vote this sentinel cast earlier
- Sentinel A processes the other sentinels' vote replies in sentinelReceiveIsMasterDownReply so the votes can be tallied later, recording per sentinel:
  - the voted leader runid (matched against the requested runid)
  - the leader_epoch of the vote
- Finally sentinelFailoverWaitStart determines whether the initiating sentinel A is the leader.
  Two conditions must both hold:
  - it received more than half of the known sentinels' votes
  - its vote count also reaches the configured quorum threshold
- If both conditions hold, sentinel A becomes the leader; otherwise the voting process continues
- The tally itself is done by sentinelGetLeader:
  - first count the other sentinels' votes; a candidate satisfying the two conditions above becomes the winner
  - if the winner is not me, it will probably win anyway, so vote for it
  - otherwise vote for myself
  - then re-check the two conditions above; if they are satisfied, return the winner
/* Vote for the sentinel with 'req_runid' or return the old vote if already
 * voted for the specified 'req_epoch' or one greater.
 *
 * If a vote is not available returns NULL, otherwise return the Sentinel
 * runid and populate the leader_epoch with the epoch of the vote. */
char *sentinelVoteLeader(sentinelRedisInstance *master, uint64_t req_epoch, char *req_runid, uint64_t *leader_epoch) {
    if (req_epoch > sentinel.current_epoch) {
        sentinel.current_epoch = req_epoch;
        sentinelFlushConfig();
        sentinelEvent(LL_WARNING,"+new-epoch",master,"%llu",
            (unsigned long long) sentinel.current_epoch);
    }

    if (master->leader_epoch < req_epoch && sentinel.current_epoch <= req_epoch)
    {
        sdsfree(master->leader);
        master->leader = sdsnew(req_runid);
        master->leader_epoch = sentinel.current_epoch;
        sentinelFlushConfig();
        sentinelEvent(LL_WARNING,"+vote-for-leader",master,"%s %llu",
            master->leader, (unsigned long long) master->leader_epoch);
        /* If we did not voted for ourselves, set the master failover start
         * time to now, in order to force a delay before we can start a
         * failover for the same master. */
        if (strcasecmp(master->leader,sentinel.myid))
            master->failover_start_time = mstime()+rand()%SENTINEL_MAX_DESYNC;
    }

    *leader_epoch = master->leader_epoch;
    return master->leader ? sdsnew(master->leader) : NULL;
}
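Grounded in the two if conditions above, the rule is: a sentinel grants at most one vote per epoch, and for a stale epoch it just repeats its old vote. A toy sketch of only that decision (the struct and names are local to this sketch, not the Redis ones):
#include <stdio.h>
#include <stdint.h>
#include <string.h>

/* Toy state for one sentinel; mirrors the concepts, not the Redis structs. */
struct toy_sentinel {
    uint64_t current_epoch;
    uint64_t leader_epoch;
    char leader[41];
};

/* Returns the runid this sentinel ends up endorsing for req_epoch. */
const char *toy_vote(struct toy_sentinel *s, uint64_t req_epoch, const char *req_runid) {
    if (req_epoch > s->current_epoch) s->current_epoch = req_epoch;
    if (s->leader_epoch < req_epoch && s->current_epoch <= req_epoch) {
        snprintf(s->leader, sizeof(s->leader), "%s", req_runid);
        s->leader_epoch = s->current_epoch;
    }
    return s->leader[0] ? s->leader : NULL; /* old vote returned for stale epochs */
}

int main(void) {
    struct toy_sentinel s = { .current_epoch = 3, .leader_epoch = 3 };
    snprintf(s.leader, sizeof(s.leader), "runidA"); /* already voted for A in epoch 3 */
    printf("epoch 3, B asks: vote stays with %s\n", toy_vote(&s, 3, "runidB"));
    printf("epoch 4, B asks: vote goes to %s\n",   toy_vote(&s, 4, "runidB"));
    return 0;
}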
/* Scan all the Sentinels attached to this master to check if there
 * is a leader for the specified epoch.
 *
 * To be a leader for a given epoch, we should have the majority of
 * the Sentinels we know (ever seen since the last SENTINEL RESET) that
 * reported the same instance as leader for the same epoch. */
char *sentinelGetLeader(sentinelRedisInstance *master, uint64_t epoch) {
    ......
    /* Count other sentinels votes */
    di = dictGetIterator(master->sentinels);
    while((de = dictNext(di)) != NULL) {
        sentinelRedisInstance *ri = dictGetVal(de);
        if (ri->leader != NULL && ri->leader_epoch == sentinel.current_epoch)
            sentinelLeaderIncr(counters,ri->leader);
    }
    dictReleaseIterator(di);

    /* Check what's the winner. For the winner to win, it needs two conditions:
     * 1) Absolute majority between voters (50% + 1).
     * 2) And anyway at least master->quorum votes. */
    di = dictGetIterator(counters);
    while((de = dictNext(di)) != NULL) {
        uint64_t votes = dictGetUnsignedIntegerVal(de);
        if (votes > max_votes) {
            max_votes = votes;
            winner = dictGetKey(de);
        }
    }
    dictReleaseIterator(di);

    /* Count this Sentinel vote:
     * if this Sentinel did not voted yet, either vote for the most
     * common voted sentinel, or for itself if no vote exists at all. */
    if (winner)
        myvote = sentinelVoteLeader(master,epoch,winner,&leader_epoch);
    else
        myvote = sentinelVoteLeader(master,epoch,sentinel.myid,&leader_epoch);

    if (myvote && leader_epoch == epoch) {
        uint64_t votes = sentinelLeaderIncr(counters,myvote);
        if (votes > max_votes) {
            max_votes = votes;
            winner = myvote;
        }
    }

    voters_quorum = voters/2+1;
    if (winner && (max_votes < voters_quorum || max_votes < master->quorum))
        winner = NULL;

    winner = winner ? sdsnew(winner) : NULL;
    sdsfree(myvote);
    dictRelease(counters);
    return winner;
}
To summarize:
- Compare req_epoch against the sentinel's local master->leader_epoch
- Winning requires a majority of the voters and at least the configured quorum of votes
- Randomized vote timing - each sentinel invokes sentinelTimer from serverCron with added randomness (similar to Raft's randomized election timeouts):
  - server.hz = CONFIG_DEFAULT_HZ + rand() % CONFIG_DEFAULT_HZ;
- First mover wins - a sentinel that was not first to start the vote must wait: master->failover_start_time enforces a delay, and a new vote for the same master can only be launched after master->failover_timeout*2
- Election timeout control via election_timeout
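A quick worked check of the two winning conditions with made-up numbers - 5 known sentinels (voters) and quorum 2:
#include <stdio.h>
#include <stdint.h>

int main(void) {
    /* Made-up scenario: 5 voters (this sentinel + 4 others), quorum 2. */
    int voters = 5;
    uint64_t master_quorum = 2;
    uint64_t max_votes = 3; /* votes collected by the front-runner */

    uint64_t voters_quorum = voters/2 + 1; /* absolute majority: 3 */
    int is_leader = max_votes >= voters_quorum && max_votes >= master_quorum;
    printf("majority needed=%llu, quorum=%llu, got=%llu -> leader? %s\n",
        (unsigned long long)voters_quorum,
        (unsigned long long)master_quorum,
        (unsigned long long)max_votes,
        is_leader ? "yes" : "no");
    return 0;
}
Note that with a large quorum the second condition can be stricter than the majority; with a small one, the majority dominates.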
Failover
sentinelFailoverStateMachine performs the failover; it is executed by the leader sentinel elected above:
- If SRI_FAILOVER_IN_PROGRESS is not set, no failover is underway and the state machine returns immediately
- Otherwise it dispatches on failover_state (initialized in sentinelStartFailover) to drive the failover:
  - SENTINEL_FAILOVER_STATE_WAIT_START - wait for the vote result
  - SENTINEL_FAILOVER_STATE_SELECT_SLAVE - select the best slave to promote. A slave qualifies only if:
    - none of S_DOWN, O_DOWN, DISCONNECTED is set (its link is up and it was never judged subjectively or objectively down)
    - its last PING reply is no older than 5 times the PING period
    - its info_refresh is no older than 3 times the INFO refresh period
    - its master_link_down_time does not exceed (now - master->s_down_since_time) + (master->down_after_period * 10); essentially, given that the master is down anyway, the slave must not have reported its replication link down for longer than 10 times the configured down-after-period - a bit of black magic
    - its slave priority is not zero, otherwise the slave is discarded
    Among all slaves satisfying the conditions above, the candidates are ordered by the following sort keys (see the comparator sketch after the state machine code below):
    - higher priority first (a lower slave_priority value means higher priority)
    - larger processed replication offset next
    - smaller runid as the final tiebreaker when the first two are equal
  - SENTINEL_FAILOVER_STATE_SEND_SLAVEOF_NOONE - sentinelFailoverSendSlaveOfNoOne promotes the selected best slave to be the new master
  - SENTINEL_FAILOVER_STATE_WAIT_PROMOTION - wait for the promotion to succeed, i.e. the slave's reported role flips to master
  - SENTINEL_FAILOVER_STATE_RECONF_SLAVES - instruct the remaining slaves to replicate from the new master
  - SENTINEL_FAILOVER_STATE_UPDATE_CONFIG - once the slave has been promoted, update the master <-> slave data structures accordingly
The remaining code is not expanded in detail here.
/* Failover machine different states. */
#define SENTINEL_FAILOVER_STATE_NONE 0 /* No failover in progress. */
#define SENTINEL_FAILOVER_STATE_WAIT_START 1 /* Wait for failover_start_time*/
#define SENTINEL_FAILOVER_STATE_SELECT_SLAVE 2 /* Select slave to promote */
#define SENTINEL_FAILOVER_STATE_SEND_SLAVEOF_NOONE 3 /* Slave -> Master */
#define SENTINEL_FAILOVER_STATE_WAIT_PROMOTION 4 /* Wait slave to change role */
#define SENTINEL_FAILOVER_STATE_RECONF_SLAVES 5 /* SLAVEOF newmaster */
#define SENTINEL_FAILOVER_STATE_UPDATE_CONFIG 6 /* Monitor promoted slave. */
void sentinelFailoverStateMachine(sentinelRedisInstance *ri) {
    serverAssert(ri->flags & SRI_MASTER);

    if (!(ri->flags & SRI_FAILOVER_IN_PROGRESS)) return;

    switch(ri->failover_state) {
        case SENTINEL_FAILOVER_STATE_WAIT_START:
            sentinelFailoverWaitStart(ri);
            break;
        case SENTINEL_FAILOVER_STATE_SELECT_SLAVE:
            sentinelFailoverSelectSlave(ri);
            break;
        case SENTINEL_FAILOVER_STATE_SEND_SLAVEOF_NOONE:
            sentinelFailoverSendSlaveOfNoOne(ri);
            break;
        case SENTINEL_FAILOVER_STATE_WAIT_PROMOTION:
            sentinelFailoverWaitPromotion(ri);
            break;
        case SENTINEL_FAILOVER_STATE_RECONF_SLAVES:
            sentinelFailoverReconfNextSlave(ri);
            break;
    }
}
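The SELECT_SLAVE ordering referenced above is implemented in sentinel.c as a qsort comparator over candidate slaves (compareSlavesForPromotion, if memory serves). A simplified, self-contained sketch of the same ordering, using toy structs rather than sentinelRedisInstance:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Toy candidate; the real code compares sentinelRedisInstance pointers. */
struct candidate {
    int slave_priority;             /* lower value = higher priority */
    unsigned long long repl_offset; /* more replicated data = better */
    char runid[41];                 /* lexicographically smaller wins ties */
};

static int compare_slaves(const void *a, const void *b) {
    const struct candidate *sa = a, *sb = b;
    if (sa->slave_priority != sb->slave_priority)
        return sa->slave_priority - sb->slave_priority; /* lower priority value first */
    if (sa->repl_offset > sb->repl_offset) return -1;   /* larger offset first */
    if (sa->repl_offset < sb->repl_offset) return 1;
    return strcasecmp(sa->runid, sb->runid);            /* smaller runid first */
}

int main(void) {
    struct candidate slaves[] = {
        {100, 5000, "cccc"},
        {100, 6000, "bbbb"},
        { 10, 1000, "aaaa"},
    };
    qsort(slaves, 3, sizeof(slaves[0]), compare_slaves);
    /* Winner is slaves[0]: priority 10 beats a larger replication offset. */
    printf("promote runid=%s (priority=%d offset=%llu)\n",
        slaves[0].runid, slaves[0].slave_priority, slaves[0].repl_offset);
    return 0;
}
Note that priority dominates the replication offset: an operator can pin a preferred failover target even if another slave has replicated slightly more data.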
References
Redis 5.0.1 source code