How to solve 'node unavailable, kubelet stopped posting node status' when using Rancher

Problem

When using Rancher, a worker node sometimes stops working and the cluster shows a warning like this:

Unavailable
kubelet stopped posting node status

Environment

  • Docker: Server Version: 19.03.13
  • Rancher 2.x

Debug

You can debug the node status by running this command:

kubectl describe nodes
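
On a cluster with many nodes, the full describe output is long. The sketch below narrows it down first by listing only the nodes whose status is not Ready. The helper name not_ready_nodes is made up for this example, and it assumes the default column layout of kubectl get nodes (name first, status second).

```shell
# Print the names of nodes whose STATUS column does not start with "Ready".
# Expects the output of `kubectl get nodes --no-headers` on stdin.
# (not_ready_nodes is a hypothetical helper name, not part of kubectl.)
not_ready_nodes() {
  awk '$2 !~ /^Ready/ { print $1 }'
}

# On a machine with cluster access:
#   kubectl get nodes --no-headers | not_ready_nodes
```

Each name it prints can then be passed to kubectl describe node <name> to read that node's conditions and recent events.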

Then check the kubelet logs on the node:

journalctl -u kubelet
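
On clusters that Rancher provisioned with RKE, kubelet typically runs as a Docker container named kubelet rather than as a systemd unit, in which case journalctl may show nothing useful. The sketch below tries the container first and falls back to journalctl. The container name, the one-hour window, and the helper names are assumptions for illustration.

```shell
# Print recent kubelet logs from whichever source exists on this node.
kubelet_logs() {
  if command -v docker >/dev/null 2>&1 &&
     docker container inspect kubelet >/dev/null 2>&1; then
    # Assumption: RKE-style node where kubelet is a container named "kubelet".
    docker logs --since 1h kubelet 2>&1
  else
    # Otherwise assume kubelet is managed by systemd.
    journalctl -u kubelet --since "1 hour ago" --no-pager
  fi
}

# Keep only lines that look like problems; reads log text on stdin.
# "|| true" keeps the exit status at 0 when nothing matches.
only_problems() {
  grep -iE 'error|failed|unable|timeout' || true
}

# On the affected node:
#   kubelet_logs | only_problems | tail -n 50
```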

Solution

Solution #1: Restart docker/kubelet service

Try restarting the Docker service on the affected node:

On CentOS:

service docker restart

Or on Ubuntu:

systemctl restart docker
systemctl restart kubelet
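
After a restart it can take a little while before the node reports Ready again. The following is a minimal sketch of a polling loop for that. The helper names and the node name worker-1 are placeholders, and it assumes kubectl is configured on the machine where you run it.

```shell
# Retry a command until it succeeds:
#   wait_until <tries> <delay_seconds> <command...>
wait_until() {
  tries=$1; delay=$2; shift 2
  i=0
  while [ "$i" -lt "$tries" ]; do
    "$@" && return 0
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}

# Succeeds when the given node's STATUS column starts with "Ready".
node_is_ready() {
  kubectl get node "$1" --no-headers 2>/dev/null |
    awk '$2 ~ /^Ready/ { ok = 1 } END { exit !ok }'
}

# Example (worker-1 is a placeholder node name):
#   wait_until 30 10 node_is_ready worker-1 && echo "node is Ready again"
```

If the loop gives up without success, the restart alone probably did not fix the problem and the kubelet logs are the next place to look.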

Solution #2: Reboot the node

If you have root access and the server can tolerate a reboot, run:

reboot

Solution #3: Recreate the cluster

You can follow this guide to create the cluster again.

Solution #4: Remove and then re-add the node

  1. Remove the node from the cluster.
  2. Add the node to the cluster again, or restore from an etcd snapshot by following this guide.
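
The removal step can be sketched with kubectl as below. This is only one possible way to do it (the node can also be deleted from the Rancher UI), and the run and remove_node helpers as well as the node name are made up for the example. The commands are only printed unless DRY_RUN=0 is set, so you can review them first.

```shell
# Echo commands instead of running them unless DRY_RUN=0 is set.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Evict workloads from the node, then delete the node object.
remove_node() {
  run kubectl drain "$1" --ignore-daemonsets &&
  run kubectl delete node "$1"
}

# Preview:  remove_node worker-1
# Execute:  DRY_RUN=0 remove_node worker-1
```

Note that drain may stall when the kubelet on that node is not responding, because the evicted pods are never confirmed as stopped. In that case deleting the node object directly may be the only option. To add the node back, use the registration command that Rancher shows for your cluster.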

Solution #5: Disable swap on the node

You can follow this guide or just run the following command:

swapoff -a 
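
swapoff -a only lasts until the next reboot. The sketch below shows one way to check whether swap is currently active and to comment out swap entries in /etc/fstab so that it stays off. The helper names are hypothetical, and it assumes the standard fstab layout in which the third field is the filesystem type.

```shell
# Succeeds when the given /proc/swaps-style file lists at least one
# swap device, i.e. has any line after the header.
swap_is_active() {
  awk 'NR > 1 { found = 1 } END { exit !found }' "${1:-/proc/swaps}"
}

# Comment out swap entries in fstab-format text read from stdin.
# Lines that are already comments are left unchanged.
comment_out_swap() {
  awk '$1 !~ /^#/ && $3 == "swap" { print "#" $0; next } { print }'
}

# On the node (keeps a backup of the original file):
#   if swap_is_active; then sudo swapoff -a; fi
#   sudo cp /etc/fstab /etc/fstab.bak
#   comment_out_swap < /etc/fstab.bak | sudo tee /etc/fstab >/dev/null
```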

Solution #6: Re-enable IP forwarding for Docker

dockerd enables IP forwarding (sysctl net.ipv4.ip_forward) when it starts. However, if you run service network restart, IP forwarding is disabled while networking is stopped, and you need to re-enable it yourself.

You can verify the ip_forward status by running:

docker info 2>&1 | grep WARNING

If you see this:

WARNING: IPv4 forwarding is disabled

Then re-enable IP forwarding, either temporarily:

sudo sysctl -w net.ipv4.ip_forward=1

Or permanently:

echo "net.ipv4.ip_forward = 1" | sudo tee -a /etc/sysctl.conf
sudo sysctl -p
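
Appending with tee -a adds a duplicate line every time it is run. As a rough sketch, the check and the permanent setting can be wrapped in two small helpers that are safe to run repeatedly. The helper names are invented for this example. The second one must run as root when pointed at /etc/sysctl.conf.

```shell
# Succeeds when the given file (default: the kernel's live setting)
# contains exactly "1".
ip_forward_enabled() {
  [ "$(cat "${1:-/proc/sys/net/ipv4/ip_forward}")" = "1" ]
}

# Append "net.ipv4.ip_forward = 1" to a sysctl config file unless an
# uncommented line already sets it to 1, so reruns add no duplicates.
persist_ip_forward() {
  file=$1
  pattern='^[[:space:]]*net\.ipv4\.ip_forward[[:space:]]*=[[:space:]]*1[[:space:]]*$'
  if ! grep -Eq "$pattern" "$file" 2>/dev/null; then
    echo "net.ipv4.ip_forward = 1" >> "$file"
  fi
}

# As root on the node:
#   ip_forward_enabled || sysctl -w net.ipv4.ip_forward=1
#   persist_ip_forward /etc/sysctl.conf && sysctl -p
```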