Nacos-Server jraft fails to initialize, causing inconsistent instance counts across cluster nodes; restarting the node does not recover, and in the end the only fix was deleting the data directory #1118
Comments
I reproduced the problem again. After starting the machine, the failing node's jraft log contained nothing useful, but when I logged into the ld (leader) node I found this error: 2024-06-25 00:15:39,043 WARN Fail to issue RPC to 10.254.16.7:7848, consecutiveErrorTimes=1, error=Status[ENOENT<1012>: Peer id not found: 10.254.16.7:7848, group: naming_persistent_service] I suspect jraft is the problem.
Our error looks the same as #1096.
This error means the node 10.254.16.7:7848 was removed from the naming_persistent_service group and then shut itself down.
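When a peer has been removed from a group like this, group membership can be acted on from outside via the Nacos raft ops endpoint. A hedged sketch follows; the endpoint path and command name are taken from the Nacos ops documentation as I recall it for 2.x, so verify them against your server version before use. The hosts and group name are the ones from this issue.

```shell
# Hedged sketch: build the URL of the Nacos raft ops endpoint.
# Note the HTTP API port (8848) is used here, not the raft port (7848).
raft_ops_url() {
  printf 'http://%s/nacos/v1/core/ops/raft' "$1"
}

# usage (not executed here; command/value syntax assumed from Nacos ops docs):
# curl -X POST "$(raft_ops_url 10.254.16.7:8848)" \
#   -d 'command=transferLeader&value=naming_persistent_service,10.254.18.46:7848'
```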
Why would the 10.254.16.7 node be removed? This is my startup log.
After the 10.254.16.7 node started, the leader node kept reporting the error above. On the Nacos console, the two existing cluster nodes both showed 65 instances for a service. After I started the 10.254.16.7 follower (call it node 3), node 1 showed 65 instances, node 2 showed 56, and node 3 showed 40, and they never converged. Over time node 2's count kept fluctuating, sometimes 61, sometimes 50. With no other option I shut node 3 (10.254.16.7) down, and nodes 1 and 2 recovered within a short time, again showing 65 instances.
You should probably ask nacos about that, because jraft never shuts a node down on its own.
Cluster environment: 3 Aliyun ECS instances, 16C 32G
Nacos-Server version: 2.1.2
Symptoms:
The 3 Nacos-Server nodes had been running normally for about half a month when one of them had to be restarted because of a memory problem. Call it node 1, and the other two nodes 2 and 3. We restarted node 1 by running the shutdown script in the bin directory and then the startup script, and that is when the problem appeared. On the Nacos console, node 1 showed 45 instances for a certain service while nodes 2 and 3 showed 65 (65 was later confirmed to be correct). In other words, node 1's data was wrong, so we checked the logs and found:
Errors in the alipay-jraft log:
2024-06-19 00:16:35,087 WARN Node <naming_persistent_service/10.254.16.7:7848> RequestVote to 10.254.18.46:7848 error: Status[EINTERNAL<1004>: RPC exception:UNKNOWN].
2024-06-19 00:16:35,707 WARN Fail to issue RPC to 10.254.18.46:7848, consecutiveErrorTimes=1, error=Status[EINTERNAL<1004>: RPC exception:UNAVAILABLE: io exception]
2024-06-19 00:16:35,710 WARN Fail to issue RPC to 10.254.18.46:7848, consecutiveErrorTimes=1, error=Status[EINTERNAL<1004>: RPC exception:UNAVAILABLE: io exception]
2024-06-19 00:16:35,707 WARN Node <naming_persistent_service_v2/10.254.16.7:7848> RequestVote to 10.254.18.46:7848 error: Status[EINTERNAL<1004>: RPC exception:UNAVAILABLE: io exception].
2024-06-19 00:16:35,710 WARN Fail to issue RPC to 10.254.18.46:7848, consecutiveErrorTimes=1, error=Status[EINTERNAL<1004>: RPC exception:UNAVAILABLE: io exception]
2024-06-19 00:16:38,277 WARN Fail to issue RPC to 10.254.18.46:7848, consecutiveErrorTimes=11, error=Status[ENOENT<1012>: Peer id not found: 10.254.18.46:7848, group: naming_service_metadata]
2024-06-19 00:18:21,216 WARN Fail to issue RPC to 10.254.17.172:7848, consecutiveErrorTimes=1, error=Status[ENOENT<1012>: Peer id not found: 10.254.17.172:7848, group: naming_persistent_service]
2024-06-19 00:18:21,264 WARN Fail to issue RPC to 10.254.17.172:7848, consecutiveErrorTimes=1, error=Status[ENOENT<1012>: Peer id not found: 10.254.17.172:7848, group: naming_service_metadata]
2024-06-19 00:18:21,266 WARN Fail to issue RPC to 10.254.17.172:7848, consecutiveErrorTimes=1, error=Status[ENOENT<1012>: Peer id not found: 10.254.17.172:7848, group: naming_persistent_service_v2]
2024-06-19 00:18:26,139 WARN Node <naming_instance_metadata/10.254.16.7:7848> RequestVote to 10.254.17.172:7848 error: Status[EINTERNAL<1004>: RPC exception:UNAVAILABLE: io exception].
2024-06-19 00:18:26,326 WARN Fail to issue RPC to 10.254.17.172:7848, consecutiveErrorTimes=11, error=Status[EINTERNAL<1004>: RPC exception:UNAVAILABLE: io exception]
2024-06-19 00:18:26,328 WARN Fail to issue RPC to 10.254.17.172:7848, consecutiveErrorTimes=11, error=Status[EINTERNAL<1004>: RPC exception:UNAVAILABLE: io exception]
2024-06-19 00:18:26,336 WARN Fail to issue RPC to 10.254.17.172:7848, consecutiveErrorTimes=11, error=Status[EINTERNAL<1004>: RPC exception:UNAVAILABLE: io exception]
2024-06-19 00:18:28,668 WARN Fail to issue RPC to 10.254.17.172:7848, consecutiveErrorTimes=1, error=Status[EINTERNAL<1004>: RPC exception:UNAVAILABLE: io exception]
2024-06-19 00:18:31,188 WARN Fail to issue RPC to 10.254.17.172:7848, consecutiveErrorTimes=11, error=Status[EINTERNAL<1004>: Check connection[10.254.17.172:7848] fail and try to create new one]
2024-06-19 00:18:31,360 WARN Fail to issue RPC to 10.254.17.172:7848, consecutiveErrorTimes=21, error=Status[EINTERNAL<1004>: Check connection[10.254.17.172:7848] fail and try to create new one]
2024-06-19 00:18:31,385 WARN Fail to issue RPC to 10.254.17.172:7848, consecutiveErrorTimes=21, error=Status[EINTERNAL<1004>: Check connection[10.254.17.172:7848] fail and try to create new one]
2024-06-19 00:18:31,388 WARN Fail to issue RPC to 10.254.17.172:7848, consecutiveErrorTimes=21, error=Status[EINTERNAL<1004>: Check connection[10.254.17.172:7848] fail and try to create new one]
2024-06-19 00:18:33,710 WARN Fail to issue RPC to 10.254.17.172:7848, consecutiveErrorTimes=21, error=Status[EINTERNAL<1004>: RPC exception:UNAVAILABLE: io exception]
2024-06-19 00:18:36,225 WARN Fail to issue RPC to 10.254.17.172:7848, consecutiveErrorTimes=31, error=Status[EINTERNAL<1004>: RPC exception:UNAVAILABLE: io exception]
2024-06-19 00:18:36,400 WARN Fail to issue RPC to 10.254.17.172:7848, consecutiveErrorTimes=31, error=Status[EINTERNAL<1004>: RPC exception:UNAVAILABLE: io exception]
2024-06-19 00:18:36,424 WARN Fail to issue RPC to 10.254.17.172:7848, consecutiveErrorTimes=31, error=Status[EINTERNAL<1004>: RPC exception:UNAVAILABLE: io exception]
2024-06-19 00:18:36,449 WARN Fail to issue RPC to 10.254.17.172:7848, consecutiveErrorTimes=31, error=Status[EINTERNAL<1004>: RPC exception:UNAVAILABLE: io exception]
2024-06-19 00:18:38,786 WARN Fail to issue RPC to 10.254.17.172:7848, consecutiveErrorTimes=41, error=Status[EINTERNAL<1004>: Check connection[10.254.17.172:7848] fail and try to create new one]
2024-06-19 00:18:41,462 WARN Fail to issue RPC to 10.254.17.172:7848, consecutiveErrorTimes=41, error=Status[ENOENT<1012>: Peer id not found: 10.254.17.172:7848, group: naming_service_metadata]
2024-06-19 00:18:41,477 WARN Fail to issue RPC to 10.254.17.172:7848, consecutiveErrorTimes=41, error=Status[ENOENT<1012>: Peer id not found: 10.254.17.172:7848, group: naming_persistent_service_v2]
2024-06-19 00:18:41,530 WARN Fail to issue RPC to 10.254.17.172:7848, consecutiveErrorTimes=51, error=Status[ENOENT<1012>: Peer id not found: 10.254.17.172:7848, group: naming_instance_metadata]
2024-06-19 00:19:36,094 WARN ThreadId: Replicator [state=Destroyed, statInfo=<running=IDLE, firstLogIndex=171, lastLogIncluded=0, lastLogIndex=171, lastTermIncluded=0>, peerId=10.254.18.46:7848, waitId=2, type=Follower] already destroyed, ignore error code: 1001
2024-06-19 00:19:36,143 WARN Fail to issue RPC to 10.254.17.172:7848, consecutiveErrorTimes=1, error=Status[EINTERNAL<1004>: RPC exception:DEADLINE_EXCEEDED: deadline exceeded after 2.499983956s. [remote_addr=10.254.17.172/10.254.17.172:7848]]
2024-06-19 00:19:36,272 WARN Fail to issue RPC to 10.254.17.172:7848, consecutiveErrorTimes=1, error=Status[EINTERNAL<1004>: RPC exception:DEADLINE_EXCEEDED: deadline exceeded after 2.499984812s. [remote_addr=10.254.17.172/10.254.17.172:7848]]
2024-06-19 00:19:36,303 WARN ThreadId: Replicator [state=Destroyed, statInfo=<running=IDLE, firstLogIndex=3446087, lastLogIncluded=0, lastLogIndex=3446087, lastTermIncluded=0>, peerId=10.254.18.46:7848, waitId=270, type=Follower] already destroyed, ignore error code: 1001
2024-06-19 00:19:36,501 WARN ThreadId: Replicator [state=Destroyed, statInfo=<running=IDLE, firstLogIndex=72, lastLogIncluded=0, lastLogIndex=72, lastTermIncluded=0>, peerId=10.254.18.46:7848, waitId=2, type=Follower] already destroyed, ignore error code: 1001
[admin@b01_nacos_service_test_hk logs]$ cat alipay-jraft.log|grep ERROR
2024-06-19 00:16:35,666 ERROR Fail to connect 10.254.18.46:7848, remoting exception: java.util.concurrent.TimeoutException.
2024-06-19 00:18:26,134 ERROR Fail to connect 10.254.17.172:7848, remoting exception: java.util.concurrent.ExecutionException: io.grpc.StatusRuntimeException: UNAVAILABLE: io exception.
2024-06-19 00:18:26,165 ERROR Fail to connect 10.254.17.172:7848, remoting exception: java.util.concurrent.ExecutionException: io.grpc.StatusRuntimeException: UNAVAILABLE: io exception.
2024-06-19 00:18:26,165 ERROR Fail to init sending channel to 10.254.17.172:7848.
2024-06-19 00:18:26,165 ERROR Fail to start replicator to peer=10.254.17.172:7848, replicatorType=Follower.
2024-06-19 00:18:26,165 ERROR Fail to add a replicator, peer=10.254.17.172:7848.
Errors in the protocol-raft log:
2024-06-19 00:16:35,175 ERROR Fail to refresh route configuration for group : naming_service_metadata, status is : Status[UNKNOWN<-1>: io.grpc.StatusRuntimeException: UNKNOWN]
2024-06-19 00:18:21,467 ERROR Fail to refresh leader for group : naming_instance_metadata, status is : Status[UNKNOWN<-1>: Unknown leader, No nodes in group naming_instance_metadata, Unknown leader]
2024-06-19 00:18:21,469 ERROR Fail to refresh route configuration for group : naming_instance_metadata, status is : Status[ENOENT<1012>: Fail to find node 10.254.17.172:7848 in group naming_instance_metadata]
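With this volume of repeated WARN lines, it helps to aggregate them per peer to see which node is actually unreachable and how bad it got. A small offline sketch (awk field positions are assumed from the sample lines pasted above):

```shell
# Summarize "Fail to issue RPC" WARN lines from an alipay-jraft.log-style
# input: count lines per peer and track the largest consecutiveErrorTimes.
summarize() {
  awk '/Fail to issue RPC to/ {
    peer = $9; sub(/,$/, "", peer)                 # field 9 is "ip:port,"
    n[peer]++
    if (match($0, /consecutiveErrorTimes=[0-9]+/)) {
      # skip the 22-char "consecutiveErrorTimes=" prefix of the match
      v = substr($0, RSTART + 22, RLENGTH - 22) + 0
      if (v > max[peer]) max[peer] = v
    }
  }
  END {
    for (p in n)
      printf "%s count=%d maxConsecutiveErrors=%d\n", p, n[p], max[p]
  }'
}

# usage: cat alipay-jraft.log | summarize
```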
We shut node 1 down for 10 minutes and restarted it again; the problem persisted. Searching the community issues, we found earlier reports very similar to ours, where the fix was to delete the data directory and restart. We did the same, and it did resolve the problem, but how can this be avoided in the first place?
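For reference, the workaround above (stop the node, clear its local raft state, restart) can be sketched as follows. The `NACOS_HOME` default and the `data/protocol/raft` path are assumptions about a standard Nacos 2.x layout, so verify them on your install, and move the directory aside as a backup rather than deleting it outright. Clearing it discards local raft state, so the node re-joins and re-syncs from the healthy majority.

```shell
# Hypothetical recovery sketch for a node whose jraft state is corrupt.
# NACOS_HOME and the raft data path are guesses to verify on your install.
NACOS_HOME="${NACOS_HOME:-/opt/nacos}"
RAFT_DATA="$NACOS_HOME/data/protocol/raft"

if [ -d "$RAFT_DATA" ]; then
  "$NACOS_HOME/bin/shutdown.sh"
  # keep a timestamped backup instead of deleting outright
  mv "$RAFT_DATA" "${RAFT_DATA}.bak.$(date +%s)"
  "$NACOS_HOME/bin/startup.sh" -m cluster
else
  echo "raft data dir not found: $RAFT_DATA (nothing to do)"
fi
```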