Problem

I was running Kubernetes upgrades with my script (kubify) but they were hanging on random machines. Not always the same one.

Running

2023-06-22 00:37:07,984 kubify.py:628 DEBUG running ssh -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no -t -t ubuntu@10.10.2.140 sudo apt-mark unhold kubeadm && sudo apt update && sudo apt install -y kubeadm=1.25.11-00 && sudo apt-mark hold kubeadm

always led to a hang, though never on the same machine.

Looking at the machine in question, I found this output in ps:

ubuntu   2663021  0.0  0.0   7892  3520 pts/0    Ss+  Jun21   0:00         bash -c sudo apt-mark unhold kubeadm && sudo apt update && sudo apt install -y kubeadm=1.25.11-00 && sudo apt-mark hold kubeadm
root     2663603  0.0  0.1  11896  4548 pts/0    S+   Jun21   0:00           sudo apt install -y kubeadm=1.25.11-00
root     2663604  0.0  0.0  11896   892 pts/1    Ss   Jun21   0:00             sudo apt install -y kubeadm=1.25.11-00
root     2663605  0.0  1.7  78816 69156 pts/1    S+   Jun21   0:03               apt install -y kubeadm=1.25.11-00
root     2663817  0.0  0.4  78816 19672 pts/1    S+   Jun21   0:00                 apt install -y kubeadm=1.25.11-00
root     2663826  0.0  0.0   2888   996 pts/1    S+   Jun21   0:00                   sh -c test -x /usr/lib/needrestart/apt-pinvoke && /usr/lib/needrestart/apt-pinvoke || true
root     2663827  0.0  0.4  25332 19228 pts/1    S+   Jun21   0:01                     /usr/bin/perl -w /usr/share/debconf/frontend /usr/sbin/needrestart
root     2663877  0.0  0.6  31288 24940 pts/1    S+   Jun21   0:00                       /usr/bin/perl /usr/sbin/needrestart
root     2663941  0.0  0.1  10820  4296 pts/1    S+   Jun21   0:00                       whiptail --backtitle Package configuration --title Pending kernel upgrade --output-fd 11 --msgbox Newer kernel available  The currently running kernel version is 5.15.0-73-generic which is not the expected kernel version 5.15.0-75-generic.  Restarting the system to load the new kernel will not be handled automatically, so you should consider rebooting. 11 122

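The install was stuck on a whiptail dialog from needrestart ("Pending kernel upgrade") that was waiting for someone to acknowledge it, and nobody was at that terminal. So the machine needed to be rebooted to continue?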
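
A stall like this can be spotted from a script by scanning ps output for a whiptail process. The following is a minimal sketch, not part of kubify: the function name `find_whiptail_prompts` and the `BlockedPrompt` type are hypothetical, and it assumes `ps aux`-style columns like the listing above, with the start time printed as a single token.

```python
from typing import List, NamedTuple


class BlockedPrompt(NamedTuple):
    """A whiptail dialog found in a process listing (hypothetical helper type)."""
    pid: int
    tty: str
    command: str


def find_whiptail_prompts(ps_output: str) -> List[BlockedPrompt]:
    """Return the whiptail processes found in `ps aux`-style output.

    Assumed column layout:
    USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
    """
    prompts = []
    for line in ps_output.splitlines():
        # Split into the 10 fixed columns plus the full command line.
        fields = line.split(None, 10)
        if len(fields) < 11 or not fields[1].isdigit():
            continue  # header row or a line we cannot parse
        command = fields[10]
        # Match on the executable only, not on arguments mentioning whiptail.
        if command.split()[0].endswith("whiptail"):
            prompts.append(BlockedPrompt(int(fields[1]), fields[6], command))
    return prompts
```

Feeding it the listing above would return one entry for PID 2663941 on pts/1, which is enough to tell a package prompt apart from a slow download.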

Solution

First, detecting if a reboot was needed.

Enter /var/run/reboot-required, a file whose existence you can check for.

If you are curious why a reboot is needed, /var/run/reboot-required.pkgs has you covered.

$ cat /var/run/reboot-required.pkgs
linux-image-5.15.0-75-generic
linux-base
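
Both checks fit in a few lines of Python. This is a minimal sketch meant to run on the host itself; the function name `reboot_status` and its `run_dir` parameter are hypothetical, added so the logic can be pointed at a test directory instead of /var/run.

```python
from pathlib import Path
from typing import List, Tuple


def reboot_status(run_dir: str = "/var/run") -> Tuple[bool, List[str]]:
    """Return (reboot_needed, packages_that_asked_for_it).

    reboot_needed is True when <run_dir>/reboot-required exists.
    The package list comes from <run_dir>/reboot-required.pkgs and is
    empty when that file is missing.
    """
    base = Path(run_dir)
    needed = (base / "reboot-required").exists()

    packages: List[str] = []
    pkgs_file = base / "reboot-required.pkgs"
    if pkgs_file.exists():
        # One package name per line; skip blank lines.
        packages = [
            line.strip()
            for line in pkgs_file.read_text().splitlines()
            if line.strip()
        ]
    return needed, packages
```

On the machine above, `reboot_status()` would be expected to report a pending reboot along with the two packages listed in the .pkgs file.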

So, reboot before installing the packages? Reboot periodically to always be on the newest kernel?

Given I use Ansible as much as I can, enter the reboot module.

I wrote up a simple little playbook to reboot a set of hosts:


# Reboot host(s) but only if necessary
---
- hosts: all
  become: yes
  gather_facts: no
  # Only do one host at a time.
  serial: 1

  tasks:
  - name: check if reboot required
    stat:
      path: /var/run/reboot-required
    register: reboot_required_path

  - name: reboot required found
    debug:
      msg: "Reboot-required file found on host."
    when: reboot_required_path.stat.exists

  - name: reboot host
    ansible.builtin.reboot:
      # Seconds to wait after the reboot command succeeds
      # before checking that the host is back up.
      post_reboot_delay: 300 # 5 min
      # How long to wait for machine to reboot
      # and respond to test command.
      reboot_timeout: 600 # 10 min
    when: reboot_required_path.stat.exists

  - name: pause for 2 minutes after rebooting a host
    pause:
      minutes: 2
    when: reboot_required_path.stat.exists
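
To call the playbook from a Python script, one option is to build the ansible-playbook command line and hand it to subprocess. This is a sketch under assumptions: the helper name `playbook_command` and the file names `reboot-if-required.yml` and `inventory.ini` are hypothetical, and nothing here reflects how kubify is actually wired up. The `-i` and `--limit` options are standard ansible-playbook flags.

```python
from typing import List, Optional


def playbook_command(
    playbook: str,
    inventory: str,
    limit: Optional[str] = None,
) -> List[str]:
    """Build the argv for running a playbook against an inventory.

    `limit` optionally restricts the run to a host or group pattern.
    """
    argv = ["ansible-playbook", "-i", inventory, playbook]
    if limit:
        argv += ["--limit", limit]
    return argv


# Example (file names are placeholders):
#   import subprocess
#   subprocess.run(
#       playbook_command("reboot-if-required.yml", "inventory.ini", limit="10.10.2.140"),
#       check=True,  # raise if ansible-playbook exits non-zero
#   )
```

Passing the arguments as a list avoids shell quoting problems, and since the playbook already sets serial: 1, a run against a whole group still reboots one host at a time.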

Docs