As you may have noticed, we had some downtime.
For the IT nerds, here’s what happened:
All BrewPi sites were hosted on a bare metal server in Germany, running a self-managed Proxmox cluster to host a couple of virtual machines.
I had some issues with the KVM for the community running out of disk space. While trying to solve this, I upgraded the host machine, but by mistake I had apt lists for a newer Linux release than the one the server was running. The upgrade failed and left the server in a partially updated state, with no way to undo the updates and no way to proceed either. Running containers and KVMs stopped working completely, so all BrewPi services and my mood were down.
I had a backup script on the host, but I did not test it regularly. The backups had stopped working a while ago without me noticing, so I did not have recent complete VM dumps. Luckily, I was still able to access all files.
I moved everything to DigitalOcean to be able to manage this better from now on.
To do so, I had to reinstall all VMs. To recover the databases, I had to copy the var/lib/mysql directories to a newly created, similar LAMP VM, get the database going again, then export a proper SQL dump.
This SQL dump could be imported into the newly installed VMs, and the file copying was a breeze with rsync. Some server config, some cursing, lots of further googling, and we are back up and running!
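For anyone who wants to script the same dump-and-copy steps, here is a minimal sketch in Python. It is not the exact set of commands I ran. It assumes `mysqldump` and `rsync` are installed and that MySQL credentials come from a client config file. The function names are made up for illustration.

```python
import subprocess
from typing import List


def dump_command(result_file: str) -> List[str]:
    """Build a mysqldump call that writes every database to one SQL file."""
    # Assumption: credentials are picked up from the MySQL client config.
    return ["mysqldump", "--all-databases", f"--result-file={result_file}"]


def rsync_command(source: str, destination: str) -> List[str]:
    """Build an rsync call in archive mode (keeps permissions and timestamps)."""
    return ["rsync", "-a", source, destination]


def run(command: List[str]) -> None:
    """Run a command and raise CalledProcessError if it exits non-zero."""
    subprocess.run(command, check=True)
```

Calling `run(dump_command("all-databases.sql"))` on the recovery VM would produce the dump, and `run(rsync_command(...))` with your own source and destination would copy the site files. On the new VM the dump can then be fed to the `mysql` client.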
The community KVM was a different story, because it was a full KVM rather than a container. I had to convert the qcow2 disk image to a VDI that could be loaded into VirtualBox. It still worked and had a recent community backup on the system. I could import this backup on a fresh install of Discourse in a DigitalOcean Droplet. After a lot of minor issues and fiddling with the SSL config, the community is back up again too, on a completely fresh install.
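The disk image conversion can be done with `qemu-img`. Below is a small sketch of that step, assuming `qemu-img` is installed. The function names and file names are placeholders, not the ones from my server.

```python
import subprocess
from typing import List


def qcow2_to_vdi_command(source: str, target: str) -> List[str]:
    """Build a qemu-img call that converts a qcow2 image to VirtualBox's VDI format."""
    return ["qemu-img", "convert", "-f", "qcow2", "-O", "vdi", source, target]


def convert(source: str, target: str) -> None:
    """Run the conversion and raise CalledProcessError if qemu-img fails."""
    subprocess.run(qcow2_to_vdi_command(source, target), check=True)
```

For example, `convert("community.qcow2", "community.vdi")` would write a VDI file that can be attached to a new VirtualBox machine.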
Back online now:
- Front page
- Docs (now hosted on readthedocs.io, redirected)
If you are missing anything else, let me know.
And if you want to try out DigitalOcean yourself, please use my referral link to get $10 free:
Based on my 2 days of usage so far, I do recommend them.
And some friendly advice (mainly to my future self): I was lucky to get everything back up in 2 days, because the data was still intact and accessible, and only running containers was broken. If the server had gone up in flames, I would have had a much bigger problem. My backup mechanism was broken, because I didn’t regularly check that it was still working well. Go test your backups now, even when you have other important things to do (like next week, and next week, and next week).
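A first step towards checking backups automatically could look like the sketch below. It only checks that the newest file in a backup directory is recent and not empty. That would have caught my silently failing script, but it is not a substitute for an actual test restore. The function names and the thresholds are assumptions for illustration.

```python
import time
from pathlib import Path
from typing import Optional


def newest_file(backup_dir: str) -> Optional[Path]:
    """Return the most recently modified file in backup_dir, or None if there is none."""
    files = [p for p in Path(backup_dir).iterdir() if p.is_file()]
    return max(files, key=lambda p: p.stat().st_mtime, default=None)


def backup_is_fresh(backup_dir: str, max_age_hours: float = 26.0, min_bytes: int = 1) -> bool:
    """True if the newest backup is young enough and at least min_bytes in size."""
    newest = newest_file(backup_dir)
    if newest is None:
        return False  # no backup files at all
    stat = newest.stat()
    age_hours = (time.time() - stat.st_mtime) / 3600
    return age_hours <= max_age_hours and stat.st_size >= min_bytes
```

Run from a daily cron job, a `False` result could trigger an email, so a broken backup gets noticed within a day instead of months later.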