How to solve 'node unavailable, kubelet stopped posting node status' when using Rancher

Dec 3, 2020

Problem

When using Rancher, a worker node sometimes stops working and shows a warning like this:

Unavailable
kubelet stopped posting node status

Environment

Docker: Server Version: 19.03.13
Rancher 2.x

Debug

You can debug the node status by running this command:

kubectl describe nodes

Then check the kubelet logs on the node:

journalctl -u kubelet
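On a cluster with many nodes, it can help to narrow the list down to the failing nodes before describing them. The following is a minimal sketch rather than part of the original procedure: list_not_ready is a hypothetical helper name, and it assumes the default column layout of kubectl get nodes --no-headers (node name first, status second).

```shell
# Hypothetical helper: read `kubectl get nodes --no-headers` output on stdin
# and print the name of every node whose STATUS does not start with "Ready".
# "Ready,SchedulingDisabled" still counts as Ready here.
list_not_ready() {
    awk '$2 !~ /^Ready/ { print $1 }'
}

# Usage on a machine with cluster access:
#   kubectl get nodes --no-headers | list_not_ready
#   kubectl describe node <node-name>
```

On nodes provisioned by Rancher (RKE), the kubelet usually runs as a Docker container named kubelet rather than as a systemd unit. If journalctl -u kubelet returns nothing, docker logs kubelet may show the logs instead.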

Solution

Solution #1: Restart the Docker and kubelet services

You can try restarting the Docker service on the failing node:

On CentOS:

service docker restart

Or on Ubuntu:

systemctl restart docker
systemctl restart kubelet
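After a restart it is worth confirming that the services actually came back up. This is a minimal sketch, assuming a systemd-based node where docker and kubelet exist as systemd units; restart_services is a hypothetical helper name, not something Rancher provides.

```shell
# Hypothetical helper: restart each named systemd service and report
# whether it is active afterwards. Run as root.
restart_services() {
    for svc in "$@"; do
        systemctl restart "$svc" || echo "restart of $svc failed" >&2
        if systemctl is-active --quiet "$svc"; then
            echo "$svc: active"
        else
            echo "$svc: NOT active"
        fi
    done
}

# Usage:
#   restart_services docker kubelet
```

If a service is reported as not active, journalctl -u with that service name is the next place to look.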

Solution #2: Reboot the node

If you have root permission and the server can safely be rebooted, run:

reboot

Solution #3: Recreate the cluster

You can follow this guide to create the cluster again.

Solution #4: Remove and then re-add the node

First, remove the node from the cluster.
Second, add the node to the cluster again, or restore from an etcd snapshot by following this guide.

Solution #5: Disable swap on the node

You can follow this guide or just run the following command:

swapoff -a
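Note that swapoff -a only lasts until the next reboot; to keep swap off permanently, the swap entries in /etc/fstab also need to be commented out. To check whether swap is currently active, something like the sketch below can be used. It assumes a Linux node where /proc/swaps is available; swap_in_use is a hypothetical helper name, and the optional file argument exists only so the check can be tried against a sample file.

```shell
# Hypothetical helper: succeed if any swap device is active.
# /proc/swaps has one header line; every further line is an active swap area.
swap_in_use() {
    awk 'NR > 1 { found = 1 } END { exit !found }' "${1:-/proc/swaps}"
}

# Usage (as root):
#   if swap_in_use; then swapoff -a; fi
```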

Solution #6: Re-enable IP forwarding for Docker

dockerd enables IP forwarding (sysctl net.ipv4.ip_forward) when it starts. However, running service network restart disables IP forwarding while networking is stopped, so you need to re-enable it afterwards.

You can verify the ip_forward status by running:

docker info|grep WARNING

If you get this:

WARNING: IPv4 forwarding is disabled

Then re-enable IP forwarding temporarily:

sudo sysctl -w net.ipv4.ip_forward=1

Or permanently:

echo "net.ipv4.ip_forward = 1" | sudo tee -a /etc/sysctl.conf
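Appending to /etc/sysctl.conf does not change the running kernel setting by itself; the file is read at boot or when sysctl -p is run. Running the append command more than once also leaves duplicate lines. The sketch below shows one way to handle both points. The helper names ip_forward_enabled and persist_ip_forward are hypothetical, and the optional file arguments exist only so the functions can be tried against sample files.

```shell
# Hypothetical helper: succeed if IPv4 forwarding is currently enabled.
ip_forward_enabled() {
    [ "$(cat "${1:-/proc/sys/net/ipv4/ip_forward}")" = "1" ]
}

# Hypothetical helper: append the setting to a sysctl config file
# only if an identical setting is not already present.
persist_ip_forward() {
    conf="${1:-/etc/sysctl.conf}"
    if ! grep -Eq '^[[:space:]]*net\.ipv4\.ip_forward[[:space:]]*=[[:space:]]*1[[:space:]]*$' "$conf" 2>/dev/null; then
        echo "net.ipv4.ip_forward = 1" >> "$conf"
    fi
}

# Usage (as root):
#   ip_forward_enabled || sysctl -w net.ipv4.ip_forward=1
#   persist_ip_forward && sysctl -p
```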