# zuzu-system-backup
Python-based backup tool for Linux hosts that pushes **snapshots** and **live mirrors** to a remote NAS over SSH/rsync.
- Snapshots are compressed archives (tar + optional zstd) created on the host, then uploaded.
- Mirrors are “single-copy” rsync trees kept in sync with `--delete`.
- Configuration is per-host, in YAML.
- Scheduling is done via systemd timers.
All hostnames, users, and paths in this README use **fictional example values** so it's safe to publish as-is. Replace them with your own.
---
## 1. High-level architecture
### 1.1 Components
- **Host script**: `zuzu-system-backup.py`
  - Runs on each Linux host.
  - Reads per-host `backup.yaml`.
  - Builds local temp trees, compresses them, and uploads archives.
  - Manages retention (max snapshot count).
- **Remote backup share** (NAS / backup server):
  - Exposed via SSH (and optionally a mounted share).
  - Receives snapshot archives and sync mirrors.
### 1.2 Snapshot vs mirror
The script supports two complementary backup modes:
1. **Snapshots** (versioned)
   - Controlled by:
     - `system.include_paths`
     - `user.include_dirs`
     - `user.include_files`
   - On each run:
     - rsync selected paths into a **local temp tree** per category.
     - compress each category into a single `*.tar` or `*.tar.zst` archive.
     - upload archives to a timestamped directory on the backup share.
2. **Single-copy mirrors** (no history)
   - Controlled by `single_copy_mappings`.
   - Each mapping is `"LOCAL_SRC|REMOTE_DEST"` (see the sketch after this list).
   - rsync with `--delete` keeps the remote tree in sync with local.
   - Intended for large data (databases, model stores, etc.).
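The mapping format and the rsync invocation it implies can be sketched in a few lines of Python. This is illustrative, not the script's actual code; the `mirror` helper and its defaults are made up for the example:

```python
import shlex
import subprocess

def mirror(mapping: str, remote: str = "backupuser@backup-nas.local") -> None:
    """Split a "LOCAL_SRC|REMOTE_DEST" mapping and rsync it with --delete.

    Trailing slashes matter to rsync: syncing "src/" copies the *contents*
    of src into the destination directory, rather than nesting src inside it.
    """
    local_src, remote_dest = mapping.split("|", 1)
    cmd = [
        "rsync", "-aHAX", "--delete", "--numeric-ids",
        f"{local_src.rstrip('/')}/",   # sync the contents of the source dir
        f"{remote}:{remote_dest}",
    ]
    print("running:", shlex.join(cmd))
    subprocess.run(cmd, check=True)

mirror("/srv/data/postgres|/srv/backup/automated/sync/system-orion-postgres")
```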
---
## 2. Data-flow diagrams
### 2.1 Backup flow on a single host
```mermaid
flowchart LR
  subgraph Host["Linux host (example: orion.example.net)"]
    A[Snapshot sources\nsystem + user] --> B[Local temp trees\nsystem / user-dirs / user-files]
    B --> C[Tar plus optional zstd\ncreate archives]
    D[Mirror sources\nsingle_copy_mappings] --> E[rsync --delete]
  end
  subgraph NAS["Backup share (example: backup-nas.local)"]
    C --> F[Snapshots directory\n/daily/system-orion/<timestamp>/]
    E --> G[Sync mirrors\n/sync/system-orion-*]
  end
```
* Snapshots: archives are compressed **locally** and only the archives are sent.
* Mirrors: rsync goes directly from host paths to remote paths, with `--delete`.
### 2.2 Control flow inside the script
```mermaid
flowchart TD
  Start([systemd timer or manual run]) --> LCFG[Load backup.yaml]
  LCFG --> PREP[Compute paths and compression root]
  PREP --> SNAPROOT[Ensure remote snapshots root exists]
  SNAPROOT --> SNAPNAME[Generate snapshot name\nYYYY-MM-DD_HH-MM-SS]
  SNAPNAME --> MKDIR_REMOTE[Create remote snapshot directory]
  MKDIR_REMOTE --> TMPDIR[Create local temp root]
  TMPDIR --> RSYNC_SYS[rsync system.include_paths\ninto temp/system]
  TMPDIR --> RSYNC_UDIR[rsync user.include_dirs\ninto temp/user-dirs]
  TMPDIR --> RSYNC_UFILE[rsync user.include_files\ninto temp/user-files]
  RSYNC_SYS --> COMP_SYS[Compress system category\ncreate system archive]
  RSYNC_UDIR --> COMP_UDIR[Compress user-dirs category\ncreate user-dirs archive]
  RSYNC_UFILE --> COMP_UFILE[Compress user-files category\ncreate user-files archive]
  COMP_SYS --> UPLOAD_SYS[Upload system archive\nto remote snapshot directory]
  COMP_UDIR --> UPLOAD_UDIR[Upload user-dirs archive\nto remote snapshot directory]
  COMP_UFILE --> UPLOAD_UFILE[Upload user-files archive\nto remote snapshot directory]
  UPLOAD_UFILE --> MIRRORS[Process mirrors\nsingle_copy_mappings with rsync --delete]
  MIRRORS --> PRUNE[Prune old snapshots\nkeep at most N]
  PRUNE --> CLEANUP[Remove temp tree and excludes file]
  CLEANUP --> Done([Exit])
```
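Two of those steps are easy to make concrete. A minimal Python sketch of snapshot naming and remote directory creation, using the example values from this README (the real script's helpers and exact flags may differ):

```python
import subprocess
from datetime import datetime

# Names like 2025-01-02_03-00-01 sort lexically in chronological order,
# which is what makes the count-based pruning in section 8 safe by name alone.
snapshot_name = datetime.now().strftime("%Y-%m-%d_%H-%M-%S")

remote = "backupuser@backup-nas.local"
remote_dir = f"/srv/backup/automated/daily/system-orion/{snapshot_name}"

# mkdir -p is idempotent: a retried run will not fail on an existing directory.
subprocess.run(
    ["ssh", "-p", "22", "-i", "/home/backupuser/.ssh/id_ed25519-orion",
     remote, "mkdir", "-p", remote_dir],
    check=True,
)
```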
---
## 3. Backup share layout
This project assumes a structured backup share on the NAS or backup server, for example:
```text
/srv/backup/
  automated/
    daily/
      system-orion/
        2025-01-01_03-00-01/
          system.tar.zst
          user-dirs.tar.zst
          user-files.tar.zst
        2025-01-02_03-00-01/
          ...
      system-pegasus/
      system-hera/
    sync/
      system-orion-databases/
        ... live rsync mirror ...
      system-orion-models/
        ...
      system-pegasus-luns/
        ...
  manual/
    installer-isos/
    http-bookmarks/
    license-keys/
    ...
```
Typical patterns:
* **automated/daily/system-<host>/**
  Date-based snapshot directories, each containing only a few archives.
* **automated/sync/system-<host>-<purpose>/**
  One dir per mirror, updated in place.
* **manual/**
  Hand-managed backups, not touched by this tool.
You can adapt the root (`/srv/backup`) and naming (`system-orion`, etc.) to match your environment.
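For orientation, here is how the configured `remote.base` and `remote.host_dir` (section 6) compose into the paths above. The `daily`/`sync` segments follow the example tree; the real script may assemble them differently:

```python
remote_cfg = {"base": "/srv/backup/automated", "host_dir": "system-orion"}

snapshots_root = f"{remote_cfg['base']}/daily/{remote_cfg['host_dir']}"
# -> /srv/backup/automated/daily/system-orion

mirror_root = f"{remote_cfg['base']}/sync/{remote_cfg['host_dir']}-postgres"
# -> /srv/backup/automated/sync/system-orion-postgres
```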
---
## 4. Features
* Python 3 script, no shell gymnastics.
* YAML configuration, per-host.
* Snapshot categories:
  * `system.include_paths` (root-owned config and service dirs)
  * `user.include_dirs` (full user trees, relative to `user.home`)
  * `user.include_files` (one-off important files)
* Compression modes (see the sketch after this list):
  * `high` (zstd `-19`)
  * `light` (zstd `-3`)
  * `none` (plain `.tar`)
* Local compression only:
  * No dependency on the NAS having zstd or GNU tar.
* Single-copy mirrors:
  * Declarative `"LOCAL_SRC|REMOTE_DEST"` mappings.
* Retention:
  * Keep at most `retention.snapshots` snapshot directories per host.
* Systemd integration:
  * One-shot service + timer per host.
* Logging:
  * Structured, timestamped logs via `journalctl`.
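The three compression modes map onto `tar`/`zstd` roughly as follows. A hedged sketch: the `-I` compressor selection and the `-T0` (all cores) flag are illustrative choices, not necessarily what the script does; only the `-19`/`-3` levels come from the feature list above:

```python
import subprocess

ZSTD_BY_MODE = {
    "high": "zstd -19 -T0",   # maximum ratio; -T0 (all cores) is an illustrative extra
    "light": "zstd -3 -T0",   # fast, still worthwhile
    "none": None,             # plain .tar
}

def compress(category_dir: str, out_base: str, mode: str = "high") -> str:
    """Tar up a temp-tree category, optionally piping through zstd."""
    compressor = ZSTD_BY_MODE[mode]
    if compressor:
        out = f"{out_base}.tar.zst"
        # GNU tar's -I runs the given command as the compression filter.
        subprocess.run(["tar", "-I", compressor, "-cf", out,
                        "-C", category_dir, "."], check=True)
    else:
        out = f"{out_base}.tar"
        subprocess.run(["tar", "-cf", out, "-C", category_dir, "."], check=True)
    return out
```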
---
## 5. Dependencies
On each host:
* `python` (3.x)
* `python-yaml` (PyYAML)
* `rsync`
* `openssh`
* `tar`
* `zstd` (optional but strongly recommended if using `compression.mode: high|light`)
Example (Arch-based host):
```bash
sudo pacman -S python python-yaml rsync openssh tar zstd
```
---
## 6. Configuration (`backup.yaml`)
Each host has its own `backup.yaml` next to the script.
### 6.1 Schema overview
* `remote`: where to send backups
* `retention`: how many snapshots to keep
* `compression`: how to compress snapshot archives
* `rsync`: extra rsync flags
* `system`: system-level include paths
* `user`: user home and per-user include paths/files
* `exclude_patterns`: rsync-style excludes
* `single_copy_mappings`: one-way mirrors (no history)
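Loading a config with this schema is a one-liner with PyYAML; a minimal sketch:

```python
import yaml

with open("backup.yaml") as fh:
    cfg = yaml.safe_load(fh)

print(cfg["remote"]["host"])          # backup-nas.local
print(cfg["retention"]["snapshots"])  # 7
```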
### 6.2 Example `backup.yaml` (for host `orion`)
```yaml
remote:
  user: backupuser
  host: backup-nas.local
  port: 22
  key: /home/backupuser/.ssh/id_ed25519-orion
  base: /srv/backup/automated
  host_dir: system-orion

retention:
  # Max number of snapshot directories to keep on NAS
  snapshots: 7

compression:
  # high | light | none
  mode: high
  # Optional: where local temp trees and archives live
  path: /srv/tmp/backups

rsync:
  extra_opts:
    - --numeric-ids
    - --info=progress2
    - --protect-args

system:
  include_paths:
    - /etc/nftables.conf
    - /etc/snapper/configs
    - /etc/NetworkManager/system-connections
    - /etc/chromium/policies/managed
    - /etc/fstab
    - /etc/systemd/system/*.mount
    - /etc/systemd/system/*.automount
    - /etc/nut/nut.conf
    - /etc/nut/upsmon.conf

user:
  home: /home/devuser
  include_dirs:
    - .ssh
    - .gnupg
    - .local/share/wallpapers
    - projects
    - pkgbuilds
    - venvs
  include_files:
    - .config/chromium/Default/Preferences
    - .config/chromium/Default/Bookmarks
    - .config/vlc/vlcrc
    - .gitconfig
    - .bashrc
    - .bash_profile
    - .local/share/user-places.xbel

exclude_patterns:
  # Caches (generic)
  - "**/Cache/**"
  - "**/GPUCache/**"
  - "**/shadercache/**"
  - "**/ShaderCache/**"
  - "**/Code Cache/**"
  # SSH ControlMaster sockets
  - "${USER_HOME}/.ssh/ctl-*"
  - "**/.ssh/ctl-*"
  # JetBrains bulk (plugins + Toolbox app bundles)
  - "${USER_HOME}/.local/share/JetBrains/**/plugins/**"
  - "${USER_HOME}/.local/share/JetBrains/Toolbox/apps/**"
  - "${USER_HOME}/.cache/JetBrains/**"
  # Chromium bulk (we include only specific files above)
  - "${USER_HOME}/.config/chromium/**"

single_copy_mappings:
  # Example mirrors:
  - "/srv/data/postgres|/srv/backup/automated/sync/system-orion-postgres"
  - "/srv/data/models|/srv/backup/automated/sync/system-orion-models"
```
Notes:
* `user.include_dirs` and `user.include_files` are **relative to `user.home`** unless they start with `/`.
* `${USER_HOME}` and `${HOME}` in `exclude_patterns` are expanded to `user.home` by the script.
* `single_copy_mappings` paths are *not* expanded; use absolute paths.
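Both rules are simple string operations. A sketch of plausible semantics (the helper names are made up for illustration):

```python
import os

USER_HOME = "/home/devuser"  # from user.home

def expand_exclude(pattern: str) -> str:
    """Expand ${USER_HOME} / ${HOME} placeholders in an exclude pattern."""
    return (pattern
            .replace("${USER_HOME}", USER_HOME)
            .replace("${HOME}", USER_HOME))

def resolve_user_path(p: str) -> str:
    """Paths are relative to user.home unless they start with '/'."""
    return p if p.startswith("/") else os.path.join(USER_HOME, p)

assert expand_exclude("${USER_HOME}/.ssh/ctl-*") == "/home/devuser/.ssh/ctl-*"
assert resolve_user_path(".gnupg") == "/home/devuser/.gnupg"
```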
---
## 7. Script usage
### 7.1 Manual run
From the directory where the script lives, or via its full path:
```bash
sudo /usr/local/sbin/zuzu-system-backup/zuzu-system-backup.py
```
(or whatever path you install it to)
Logs go to stderr; under systemd, they land in `journalctl`.
### 7.2 Systemd service & timer
Example service:
```ini
# /etc/systemd/system/host-backup.service
[Unit]
Description=Host backup to NAS via zuzu-system-backup
Wants=network-online.target
After=network-online.target
[Service]
Type=oneshot
ExecStart=/usr/local/sbin/zuzu-system-backup/zuzu-system-backup.py
Nice=10
IOSchedulingClass=best-effort
IOSchedulingPriority=7
```
Example timer:
```ini
# /etc/systemd/system/host-backup.timer
[Unit]
Description=Nightly host backup to NAS
[Timer]
OnCalendar=*-*-* 03:15:00
RandomizedDelaySec=20min
Persistent=true
[Install]
WantedBy=timers.target
```
Enable and start:
```bash
sudo systemctl daemon-reload
sudo systemctl enable --now host-backup.timer
```
Check status:
```bash
systemctl list-timers 'host-backup*'
journalctl -u host-backup.service -n 50
```
---
## 8. Retention policy
Retention is implemented as “keep at most **N snapshots**” for each host:
* `retention.snapshots: 7` → keep the newest 7 snapshot directories under the host's snapshot root (e.g. `/srv/backup/automated/daily/system-orion/`).
* Snapshot directory names are timestamps: `YYYY-MM-DD_HH-MM-SS`.
* Older snapshot dirs are deleted entirely (`rm -rf` on the NAS via SSH).
No time math on `mtime`, just count-based retention by sorted timestamp name; simple and predictable.
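That policy fits in a dozen lines. A sketch of count-based pruning over SSH, using the example values from this README; the real script's listing and deletion commands may differ:

```python
import subprocess

REMOTE = "backupuser@backup-nas.local"
SNAP_ROOT = "/srv/backup/automated/daily/system-orion"
KEEP = 7  # retention.snapshots

# Timestamp names (YYYY-MM-DD_HH-MM-SS) sort lexically in date order,
# so plain string sorting gives oldest-first.
listing = subprocess.run(["ssh", REMOTE, "ls", "-1", SNAP_ROOT],
                         check=True, capture_output=True, text=True)
snapshots = sorted(listing.stdout.split())

# snapshots[:-KEEP] is empty when there are KEEP or fewer entries,
# so nothing is deleted until the limit is actually exceeded.
for name in snapshots[:-KEEP]:
    subprocess.run(["ssh", REMOTE, "rm", "-rf", f"{SNAP_ROOT}/{name}"],
                   check=True)
```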
---
## 9. Restore basics
### 9.1 Restoring from a snapshot archive
On the NAS or after copying archives locally:
```bash
# Example: restore system snapshot for 2025-01-02 from host "orion"
cd /restore/target
# If compressed
zstd -d /srv/backup/automated/daily/system-orion/2025-01-02_03-00-01/system.tar.zst -o system.tar
tar -xf system.tar
# If compression.mode was "none"
tar -xf /srv/backup/automated/daily/system-orion/2025-01-02_03-00-01/system.tar
```
Repeat for `user-dirs.tar(.zst)` and `user-files.tar(.zst)` as needed.
### 9.2 Restoring from a mirror
Mirrors are just rsynced trees; you can restore them with rsync or cp:
```bash
# rsync mirror back to host
rsync -aHAX --numeric-ids \
    backupuser@backup-nas.local:/srv/backup/automated/sync/system-orion-postgres/ \
    /srv/data/postgres/
```
Always test restores on a non-production target first.
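One low-risk way to do that is rsync's `--dry-run`, which reports what would be transferred without touching the target. A sketch (same paths as above):

```python
import subprocess

# --dry-run makes rsync report what it *would* transfer without modifying
# the destination; --itemize-changes shows a per-file change summary.
subprocess.run([
    "rsync", "-aHAX", "--numeric-ids", "--dry-run", "--itemize-changes",
    "backupuser@backup-nas.local:/srv/backup/automated/sync/system-orion-postgres/",
    "/srv/data/postgres/",
], check=True)
```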
---
## 10. Safety notes
* **Mirrors are destructive:**
  * `single_copy_mappings` use rsync `--delete`.
  * Deletes on the host will remove files on the backup side in the next run.
* **Snapshots are immutable per run:**
  * Each run creates a new directory, writes archives, and then retention may remove older snapshot dirs.
* **Local compression uses space:**
  * `compression.path` should point at a filesystem with enough free space to hold a full snapshot's uncompressed temp trees **plus** the compressed archives (see the sketch after this list).
* **Permissions:**
  * The script expects to be run as root (or with enough privileges) to read system paths and user homes.
* **SSH keys:**
  * Use dedicated SSH keys per host with restricted accounts on the NAS where possible.
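A quick pre-flight check for the free-space point is easy to sketch with `shutil.disk_usage`; the 2x margin and the size estimate below are illustrative, not values from the script:

```python
import shutil

COMPRESSION_PATH = "/srv/tmp/backups"   # compression.path
estimated_snapshot_bytes = 50 * 2**30   # rough size of one uncompressed snapshot

free = shutil.disk_usage(COMPRESSION_PATH).free
# Need room for the uncompressed temp trees *plus* the archives being written.
if free < 2 * estimated_snapshot_bytes:
    raise SystemExit(f"not enough space in {COMPRESSION_PATH}: {free} bytes free")
```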
---
## 11. Contributing
Contributions are very welcome, especially around:
* additional backup backends or layout conventions,
* smarter snapshot/mirror strategies (e.g., per-path compression settings),
* restore helpers and verification tooling,
* better safety guards around destructive operations (`--delete`, pruning),
* distro packaging (Arch, Debian, containers, etc.).
Basic guidelines:
* Treat this as infrastructure code:
  * avoid surprises in defaults (compression, retention, paths),
  * keep the YAML schema stable and well-documented,
  * make new features opt-in whenever they could delete or overwrite data.
* Be conservative with `rsync --delete`:
  * mirrors are intentionally destructive, but code paths that trigger deletion should be obvious and well-commented.
* Keep logs readable and actionable:
  * clear “what is happening” messages,
  * explicit summary per run (snapshot name, mirrors processed, pruning done).
Bug reports and pull requests are preferred over vibes and interpretive dance.
---
## 12. License
This project is licensed under the **MIT License**.
See the `LICENSE` file in this repository for the full text.
---
## 13. Author & acknowledgements
**Author / Maintainer**
* **Name:** \<your-name-here\>
**Acknowledgements**
* Inspired by earlier shell-based backup scripts and ad-hoc rsync one-liners that deserved a nicer life.
* Thanks to everyone who runs this on real systems, weird filesystems, and “creative” NAS setups and reports back what explodes.
The goal of this project is to make Linux backups **boring, predictable, and inspectable**—for both humans and tools trying to reason about how and where data is stored.