Nuking the Hidden State

My old homelab was an overloaded Raspberry Pi running a mess of Docker containers tangled with system dependencies. When something broke, restoring it was a manual nightmare.

I wanted a proper, declarative way to manage my virtual machines and containers. Earlier this year, I found an article by Nijho detailing a move from Proxmox to NixOS and Incus. As a heavy NixOS user, I dug into his solution to see if it fit.

When the hardware finally arrived, I replaced the Pi with a dedicated WTR PRO. It runs 32GB of RAM on a single module, so memory is single-channel. That starves the iGPU of bandwidth, but I'm not ready to spend 200€ on a second module.

The NixOS Stack: Foundation, Engine, and Brain

The golden rule is simple: Zero application services run on the bare metal. If you mix your host OS with your applications, you lose the ability to nuke and pave. Infrastructure must be modular.

I built the system on three strictly isolated pillars. No overlap, no host pollution.

Figure: Three-layer homelab architecture. OpenTofu (brain): declarative HCL, provisions Incus via API. Incus (engine): LXC containers + KVM VMs, ZFS-backed snapshots. NixOS (foundation): bare metal, ZFS pool, Incus daemon, ~150 lines.

NixOS is the foundation. It handles the bare metal, the ZFS pool, and the Incus daemon. That is it. A declarative configuration.nix of barely 150 lines defines the host's entire configuration. If the motherboard fails, I can reprovision an identical host on new hardware in ten minutes using nixos-anywhere.
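The configuration itself isn't reproduced here. To give a rough picture of the host-side pieces named above, this is a minimal sketch of a NixOS module that enables ZFS and the Incus daemon, assuming a recent NixOS release. The hostId and the admin user are placeholders.

```nix
{ ... }:
{
  # ZFS support at boot. NixOS requires a unique 8-hex-digit hostId when ZFS
  # is enabled, so ZFS can tell which machine last imported a pool.
  boot.supportedFilesystems = [ "zfs" ];
  networking.hostId = "4e98920d"; # placeholder: generate your own
  services.zfs.autoScrub.enable = true;

  # The Incus daemon. The NixOS module expects nftables rather than iptables.
  virtualisation.incus.enable = true;
  networking.nftables.enable = true;

  # Hypothetical admin account allowed to use the Incus socket.
  users.users.admin = {
    isNormalUser = true;
    extraGroups = [ "incus-admin" ];
  };
}
```

Everything application-shaped stays out of this file. The host only knows about storage and the hypervisor.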

Incus is the engine. While Docker manages processes, Incus manages systems. I use it to run LXC containers, which share the host kernel and add almost no overhead, and KVM virtual machines, which pay a small virtualization cost in exchange for their own kernel. Backing the Incus storage pool with ZFS gives every instance instant, copy-on-write snapshots. If an upgrade breaks a service, I am one command away from a full rollback.
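How Incus gets wired to ZFS isn't shown in the post. One way to do it on NixOS is the virtualisation.incus.preseed option, which mirrors the YAML accepted by incus admin init --preseed. This is a minimal sketch, assuming Incus is already enabled on the host. The dataset rpool/incus, the bridge subnet, and the profile layout are hypothetical.

```nix
{ ... }:
{
  virtualisation.incus.preseed = {
    # Bridge for instances; NAT gives them outbound access.
    networks = [{
      name = "incusbr0";
      type = "bridge";
      config = {
        "ipv4.address" = "10.0.100.1/24"; # hypothetical subnet
        "ipv4.nat" = "true";
      };
    }];

    # Storage pool backed by an existing ZFS dataset (hypothetical name).
    # Each instance becomes its own dataset, so Incus snapshots are
    # native, copy-on-write ZFS snapshots.
    storage_pools = [{
      name = "default";
      driver = "zfs";
      config.source = "rpool/incus";
    }];

    # Default profile: root disk on the ZFS pool, NIC on the bridge.
    profiles = [{
      name = "default";
      devices = {
        root = { path = "/"; pool = "default"; type = "disk"; };
        eth0 = { name = "eth0"; network = "incusbr0"; type = "nic"; };
      };
    }];
  };

  # Let DHCP and DNS from the Incus bridge through the host firewall.
  networking.firewall.trustedInterfaces = [ "incusbr0" ];
}
```

With the pool on ZFS, the one-command rollback looks like `incus snapshot create web pre-upgrade` before an upgrade and `incus snapshot restore web pre-upgrade` if it goes wrong (web being a hypothetical instance name).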

OpenTofu is the brain. Manual configuration is technical debt. Every container, network bridge, and proxy device is defined in OpenTofu, the open-source fork of Terraform. The entire internal cloud lives in declarative HCL files that OpenTofu applies through the Incus API. The code is the documentation.

Security: Mesh and TLS

Internal networks are not an excuse for weak security.

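Tailscale handles the mesh, letting me reach the stack from anywhere as if I were on the LAN. For ingress, a dedicated Traefik instance obtains Let's Encrypt wildcard certificates for my own domain over ACME, using the DNS-01 challenge. Let's Encrypt only issues wildcards through DNS-01, which proves control of the domain with a DNS TXT record instead of an inbound connection, so nothing has to be reachable from the internet.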

Services stay internal — never publicly exposed — but the certs are browser-trusted and rotate themselves. No private CA to push to every device, no “insecure” warnings, no manual bypasses.
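A minimal sketch of the certificate side follows. It assumes the Traefik instance runs in its own NixOS-based Incus container and the domain's DNS is hosted at Cloudflare. The domain, contact email, DNS provider, and secrets path are all hypothetical. Traefik requests one wildcard certificate for the HTTPS entrypoint and answers the DNS-01 challenge through the provider's API.

```nix
{ ... }:
{
  services.traefik = {
    enable = true;
    staticConfigOptions = {
      entryPoints.websecure = {
        address = ":443";
        # Request one wildcard certificate for routers on this entrypoint.
        http.tls = {
          certResolver = "letsencrypt";
          domains = [{
            main = "home.example.com";        # hypothetical domain
            sans = [ "*.home.example.com" ];
          }];
        };
      };

      certificatesResolvers.letsencrypt.acme = {
        email = "admin@example.com";          # hypothetical contact address
        storage = "/var/lib/traefik/acme.json";
        # DNS-01: Traefik writes a TXT record through the provider's API,
        # so no port needs to be open to the internet.
        dnsChallenge.provider = "cloudflare"; # hypothetical provider
      };
    };
  };

  # Provider credentials (for Cloudflare: CF_DNS_API_TOKEN), kept out of
  # the Nix store. Hypothetical path.
  systemd.services.traefik.serviceConfig.EnvironmentFile = "/run/secrets/traefik.env";

  networking.firewall.allowedTCPPorts = [ 443 ];
}
```

Because TLS is configured at the entrypoint, routers on websecure inherit the resolver, and subdomains covered by the wildcard shouldn't need certificates of their own.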

State Protection: Snapshots, Replication, Alerts

Snapshots are not backups, but they are the first line of defense.

Sanoid takes hourly ZFS snapshots of rpool/persistent and rpool/home, keeping 24 hourly, 30 daily, and 6 monthly snapshots. Syncoid replicates them every 30 minutes to a second pool on a separate HDD. Restic ships a smaller critical subset (Vaultwarden DB, OpenTofu secrets, Authelia configs) to Backblaze B2, encrypted client-side. For that critical data, this is the 3-2-1 rule: three copies, on separate disks and providers, one of them offsite. All of it is declared in roughly 60 lines of Nix.
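To illustrate what those lines can look like, here is a sketch built on the NixOS sanoid, syncoid, and restic modules. The target pool name (backup), the B2 bucket, the secret file paths, and the restic source paths are hypothetical. The restic prune policy is an arbitrary example, since the post doesn't state one.

```nix
{ ... }:
{
  # Hourly snapshots, keeping 24 hourly, 30 daily and 6 monthly.
  services.sanoid = {
    enable = true;
    interval = "hourly";
    datasets = {
      "rpool/persistent" = { hourly = 24; daily = 30; monthly = 6; autosnap = true; autoprune = true; };
      "rpool/home"       = { hourly = 24; daily = 30; monthly = 6; autosnap = true; autoprune = true; };
    };
  };

  # Replicate to a second pool on the separate HDD every 30 minutes
  # (systemd calendar "*:0/30" = minutes 0 and 30 of every hour).
  services.syncoid = {
    enable = true;
    interval = "*:0/30";
    commands = {
      "rpool/persistent".target = "backup/persistent"; # hypothetical pool
      "rpool/home".target = "backup/home";
    };
  };

  # Off-site, client-side encrypted copy of the critical subset.
  services.restic.backups.b2 = {
    repository = "b2:my-homelab-bucket:critical";    # hypothetical bucket
    passwordFile = "/run/secrets/restic-password";    # hypothetical path
    environmentFile = "/run/secrets/restic-b2.env";   # B2_ACCOUNT_ID, B2_ACCOUNT_KEY
    initialize = true;
    paths = [
      # Hypothetical locations of the critical data.
      "/persistent/vaultwarden"
      "/persistent/opentofu"
      "/persistent/authelia"
    ];
    timerConfig = { OnCalendar = "daily"; Persistent = true; };
    pruneOpts = [ "--keep-daily 7" "--keep-weekly 4" "--keep-monthly 6" ];
  };
}
```

Running syncoid more often than sanoid snapshots is harmless. By default syncoid takes its own sync snapshot before each send, so every run replicates the latest state.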

What matters more is what happens when a job fails. A templated systemd unit posts to a self-hosted ntfy instance whenever sanoid, syncoid, or restic fails, hooked in through each unit's OnFailure setting. For sanoid:

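```nix
# %n expands to the failing unit's full name (sanoid.service), which becomes
# the instance of the failure-notify@ template unit.
systemd.services.sanoid.unitConfig.OnFailure = [ "failure-notify@%n.service" ];
```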

Phone buzzes, runbook opens, no silent rot.
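The failure-notify@ template that this OnFailure line starts isn't shown in the post. Here is a minimal sketch, assuming ntfy is reachable at a hypothetical https://ntfy.home.example.com with a homelab topic and no authentication, and a hypothetical runbook wiki. The template's instance name (%i) carries the failing unit's name. The last 20 journal lines become the message body, and ntfy's Click header makes tapping the notification open the runbook.

```nix
{ config, pkgs, ... }:
{
  # Template unit: "failure-notify@sanoid.service.service" runs with
  # %i = "sanoid.service", passed to the script as $1.
  systemd.services."failure-notify@" = {
    description = "Push a failure alert for %i to ntfy";
    scriptArgs = "%i";
    script = ''
      ${pkgs.curl}/bin/curl -fsS \
        -H "Title: $1 failed on ${config.networking.hostName}" \
        -H "Priority: high" \
        -H "Tags: warning" \
        -H "Click: https://wiki.home.example.com/runbooks/$1" \
        -d "$(${pkgs.systemd}/bin/journalctl -u "$1" -n 20 --no-pager)" \
        https://ntfy.home.example.com/homelab
    '';
    serviceConfig.Type = "oneshot";
  };
}
```

The same OnFailure line then goes on the syncoid and restic units. On NixOS, a restic job's unit is named restic-backups-<name>.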

The Takeaway

Code is disposable. State is sacred.

This architecture requires effort. You have to write HCL, manage ZFS datasets, and handle deployment keys. But the reward is absolute control. When a system fails—and it will—there is no panic.

You just re-apply the state.