Borked Cluster
Categories: tech
Tags: kubernetes
I let my homelab Kubernetes version stagnate for a while. Life got busy, all the typical excuses.
I made the mistake of updating last night. containerd’s update overwrote its configuration file,
borking everything. Of course I did not have it under Ansible version control :-) . After wrestling with kubelet
losing track of all the containers, I made the decision to restart the node. Come along with me to find out how to fix
some of the things!
Athens Proxy
I have an Athens proxy set up in proxy-athens-prod, installed via Helm, and I need to figure out what it’s used for. The installation
predated my use of IaC on my cluster. To grab values from a Helm release you can use `helm -n proxy-athens-prod get values r0`,
where r0 is the release. Using `helm -n proxy-athens-prod list` will show the chart like the following:
| NAME | NAMESPACE | REVISION | UPDATED | STATUS | CHART | APP VERSION |
|---|---|---|---|---|---|---|
| r0 | proxy-athens-prod | 11 | 2024-05-14 12:50:43.961647 -0700 PDT | deployed | athens-proxy-0.11.0 | v0.14.0 |
| redis-r0 | proxy-athens-prod | 1 | 2024-05-14 12:36:58.556789 -0700 PDT | deployed | redis-18.17.0 | 7.2.4 |
Unfortunately, this doesn’t provide me with the original chart URL. The Athens repo points to the chart repository at https://github.com/gomods/athens-charts. The most recent version is 0.15.5! Looks like the only major change is the Redis chart, unsurprisingly.
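Before re-adopting these releases into IaC, it seemed worth snapshotting what is actually deployed. A minimal sketch, assuming the output file names are my own choice:

```bash
# Snapshot the user-supplied values of every release in the namespace
# so they can be re-created declaratively later.
NS=proxy-athens-prod
for rel in $(helm -n "$NS" list -q); do
  helm -n "$NS" get values "$rel" > "values-${rel}.yaml"
done
```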
Once I pushed this up I found a major problem: my internal Gitea instance is down. Turns out my disks filled up.
containerd on a different volume
By default containerd uses /var/lib/containerd to store metadata. This effectively filled up my / volume and
stopped everything. I resolved this by issuing the following commands:
```bash
NODE=controlplane-00
OTHER_DRIVE_PATH=/data/some-drive/path
OLD_PATH=/var/lib/containerd

# Get workloads off the node before touching the runtime
kubectl cordon "$NODE"
kubectl drain "$NODE" --delete-emptydir-data --ignore-daemonsets
systemctl stop kubelet
systemctl stop containerd

# Move the data directory to the larger volume; leave a symlink behind
mkdir -p "$OTHER_DRIVE_PATH"
mv "$OLD_PATH" "$OTHER_DRIVE_PATH/containerd"
ln -s "$OTHER_DRIVE_PATH/containerd" "$OLD_PATH"

# Bring the runtime back up before readmitting workloads
systemctl start containerd
systemctl start kubelet
kubectl uncordon "$NODE"
```
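A quick sanity check afterwards (a sketch; adjust paths to taste) confirms containerd is really writing to the new volume:

```bash
readlink /var/lib/containerd                 # should print the new location
df -h "$(readlink -f /var/lib/containerd)"   # should show the larger volume
```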
After about 10 minutes things in the cluster stabilized and we could return to normal. Or so I thought. I eventually needed
to run `deployment-replica-mismatch.sh` to find deployment mismatches and decided to “go nuclear” to reduce complexity
with `deployment-shutdown.sh`.
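Neither script is shown here, so the following is a hypothetical sketch of what they might do: flag deployments whose ready replicas don’t match the desired count, then scale everything down to zero:

```bash
#!/usr/bin/env bash
# Hypothetical deployment-replica-mismatch.sh: list deployments whose
# ready replica count differs from the desired count.
kubectl get deployments -A -o json | jq -r '
  .items[]
  | select((.status.readyReplicas // 0) != .spec.replicas)
  | "\(.metadata.namespace)/\(.metadata.name): \(.status.readyReplicas // 0)/\(.spec.replicas) ready"'

# Hypothetical deployment-shutdown.sh: the "go nuclear" option, scaling
# every deployment in the cluster to zero replicas.
kubectl get deployments -A --no-headers | awk '{print $1, $2}' |
  while read -r ns name; do
    kubectl -n "$ns" scale deployment "$name" --replicas=0
  done
```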
ArgoCD -> Gitea
Using ArgoCD with repositories from my Gitea instance means Gitea is a strict dependency of modifying Kubernetes. Kind
of like a snake eating itself in some ways, I guess. Gitea was refusing to start, with the only error message being an
fsnotify error. This appeared to actually be an error related to restarting the devices above.
Really this was a symptom of the host being swamped. I went through and shut down all pods stuck in CrashLoopBackOff or
ContainerCreating.
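The sweep itself was manual, but something along these lines (status names as they appear in `kubectl get pods` output) would do it:

```bash
# Delete every pod currently stuck in CrashLoopBackOff or ContainerCreating.
kubectl get pods -A --no-headers |
  awk '$4 == "CrashLoopBackOff" || $4 == "ContainerCreating" {print $1, $2}' |
  while read -r ns pod; do
    kubectl -n "$ns" delete pod "$pod"
  done
```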
Re-approaching the problem
This obviously isn’t an “I’ll just jiggle a knob to fix it” type of problem. I reprioritized based on what is most important
to my users: my family. We run home-assistant as our house brains, with a Longhorn volume for data storage.
While booting up, it complained loudly about DNS failing! Fair, as unbound did not listen on the expected target.
Amazingly, this came up without any intervention. Paperless-NGX came up with an error, which likely indicates the faulting node was saturated.
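To confirm where unbound was actually listening, a check along these lines helps (the IP and hostname are placeholders for the host’s external address and a real record):

```bash
# What sockets is unbound actually bound to?
ss -lntup | grep unbound
# Query the host's external IP directly; a timeout means unbound isn't listening there.
dig @192.0.2.10 example.com
```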
Gitea
Gitea was broken because DNS was broken, which in turn traced back to the Bitnami charts.
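One way (a sketch) to verify cluster-internal DNS from the inside is a throwaway pod:

```bash
# Run a disposable pod and resolve a cluster-internal name from it.
kubectl run dns-test --rm -it --restart=Never --image=busybox:1.36 -- \
  nslookup kubernetes.default.svc.cluster.local
```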
TODOs
- Find out why `unbound` on hosts is not properly binding to the external IP address of the host.
- Cluster-internal DNS needs to be checked.
Short-term TODOs
- Move `deployment-replica-mismatch.sh` and `deployment-shutdown.sh`.
- Update this to point here: https://bsky.app/profile/did:plc:rqh2ntpoz3mafvazrzbqfkxx/post/3mf5wrja5hk2h