Salvage old brewblox data

Thank you for the extremely quick reply, Bob! :slight_smile:

I managed to run brewblox-ctl log:
https://termbin.com/a21k

(To run brewblox-ctl I need to manually add the ~/.local/bin directory to the $PATH every time, even though it’s specified in .profile and .bashrc, and there’s a lot of other weird behaviour from the Pi. I got random ZRAM errors (I’ve since disabled ZRAM) and more…)

brewblox-ctl snapshot save also worked; I now have a 65MB backup file. Can I treat that like a “normal” backup?

I’ve already installed brewblox from scratch on another Pi (which I use for experimenting) and fortunately, my fermentation fridge is still behaving nicely. The spark with all the blocks is still there and running.

Logs indicate the services are currently offline. Could you start them, and then re-run the log command?

If your Pi is all-round behaving weirdly, it does often indicate that another SD card died. I haven’t yet seen it throw PATH errors though - typically it slows down, and starts throwing low-level OS errors.

Snapshots can be loaded using either brewblox-ctl snapshot load, or brewblox-ctl install --snapshot FILE. They differ from backups in that backups only include settings and are more selective about what they save, while snapshots just zip the entire directory, history and all. Loading a snapshot will overwrite the entire brewblox dir.
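For example (use --help on either command to check the exact arguments):

# Restore the snapshot into an existing install (overwrites the entire brewblox dir)
brewblox-ctl snapshot load

# Or restore it while doing a fresh install
brewblox-ctl install --snapshot FILE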

brewblox-ctl up led to the following output:

[20:10:29] pi@brewpi:~/brewblox$ brewblox-ctl up
Starting brewblox_redis_1 …
Starting brewblox_spark-one_1 …
Starting brewblox_eventbus_1 …
Starting brewblox_ui_1 …
Starting brewblox_influx_1 …
Starting brewblox_redis_1 … done
Starting brewblox_spark-one_1 … done
Starting brewblox_eventbus_1 … done
Starting brewblox_ui_1 … done
Starting brewblox_influx_1 … done
Starting brewblox_history_1 … done
…d340b): Bind for 0.0.0.0:443 failed: port is already allocated

ERROR: for traefik  Cannot start service traefik: driver failed programming external connectivity on endpoint brewblox_traefik_1 (10330d14f76b330f33ad4aa17947be57c452f3b2f32521fe08a4ed9ca…0b): Bind for 0.0.0.0:443 failed: port is already allocated
ERROR: Encountered errors while bringing up the project.
Command ‘docker-compose up -d --remove-orphans’ returned non-zero exit status 1.

What I see is that traefik seems to be started twice… That’s weird.

The new log is here:
https://termbin.com/m9av

I’ll try using the snapshot later when I have access to the other Pi again.
A new SD card is already ordered, and I’ll try to hook up an HDD to the Pi this time. :slight_smile:

It’s telling you that something is already using port 443. Maybe your Home Assistant?
To check, you can run

sudo netstat -tulpn
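If the list is long, you can filter for port 443 specifically:

sudo netstat -tulpn | grep ':443'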

I managed to use the snapshot to restore my data.
Thanks so much, Bob! :slight_smile: This saved my day.

For curiosity’s sake: when using brewblox-ctl update to try to fix my installation, the following happens:

INFO Starting services…
Creating network “brewblox_default” with the default driver
Creating brewblox_eventbus_1 …
Creating brewblox_history_1 …
Creating brewblox_ui_1 …
Creating brewblox_redis_1 …
Creating brewblox_influx_1 …
Creating brewblox_spark-one_1 …
Creating brewblox_traefik_1 …
Creating brewblox_traefik_1 … error

ERROR: for brewblox_traefik_1 Cannot start service traefik: driver failed programming external connectivity on endpoint brewblox_traefik_1 (5b4c6aeaa1e03d8c06008264b3e93c49153add9fec83e8075060e77a7a2d5518): Bind for 0.0.0.0:443 failed: port is already allocated

Here’s the point where traefik_1 is apparently started twice and the second one reports an error…

And this is also weird:

docker.io/brewblox/brewblox-ctl-lib:edge
WARNING: The requested image’s platform (linux/amd64) does not match the detected host platform (linux/arm/v7) and no specific platform was requested
6bf50d01f92343d2b1ea9791ffdacd94df73bb457fce2574e3f4398e3f45426a

Probably something to do with messed up environment variables?

But I’ve got a running Brewblox installation with my setup again. So everything else is, as I said, just curiosity. :slight_smile:

Good to hear it works again!

You can ignore the platform warning for ctl-lib. It’s technically correct, but we only use the container to copy python files, so it never becomes an issue.

The error during the update is the same again: something else is using port 443 already. It’s the default port for HTTPS, so it’s probably some webserver on your Pi.

Well, weirdly enough, only “docker-proxy” is shown running on 443 with sudo netstat -tulpn.

This can happen if Docker failed to clean up some earlier containers and their claims.

A fix would be to run brewblox-ctl kill (this will stop and clean up all containers), and then reboot.
After reboot, netstat should no longer show a process being attached to 443.
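As a rough sequence (the netstat line is only there to verify afterwards, it’s not a Brewblox command):

brewblox-ctl kill                     # stop and clean up all containers
sudo reboot
# after the reboot:
sudo netstat -tulpn | grep ':443'     # should print nothing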

I am having the same issue again with a freshly installed Raspi. Apparently there is some conflict between openHAB and Brewblox. I installed openHAB 3 first and after that proceeded to install Brewblox.
(I don’t want to use two different Raspis for two programs.)

First issue I encountered again is that brewblox-ctl is missing from the PATH. I found out why:
openHAB uses .bash_profile while brewblox writes to .profile, which is apparently ignored when a .bash_profile exists. I don’t know under which conditions one would use which file, but maybe the brewblox installer could check which file is being used and then write the path to the corresponding file (I could imagine this might be an issue for more people).
That was fixed easily enough (I just deleted the .bash_profile; there wasn’t much useful stuff in there).

Brewblox installed just fine, but when running brewblox-ctl up, it hangs at the following point:

INFO Starting configured services…
Creating network “brewblox_default” with the default driver
Creating brewblox_traefik_1 … done
Creating brewblox_history_1 … done
Creating brewblox_influx_1 … done
INFO Configuring history settings…
Connecting https://localhost:443/history/history/ping, attempt 1/60
Connecting https://localhost:443/history/history/ping, attempt 2/60
Success!
INFO Stopping services…
Stopping brewblox_history_1 … done
Stopping brewblox_influx_1 … done
Stopping brewblox_traefik_1 … done
Removing brewblox_history_1 … done
Removing brewblox_influx_1 … done
Removing brewblox_traefik_1 … done
Removing network brewblox_default
INFO All done!
pi@ANNIE:~/brewblox $ brewblox-ctl up
Creating network “brewblox_default” with the default driver
Creating brewblox_ui_1 … done
Creating brewblox_spark-one_1 … done
Creating brewblox_traefik_1 … done
Creating brewblox_redis_1 …
Creating brewblox_history_1 …
Creating brewblox_influx_1 …
Creating brewblox_eventbus_1 …

ERROR: for brewblox_eventbus_1 UnixHTTPConnectionPool(host=‘localhost’, port=None): Read timed out. (read timeout=60)

ERROR: for brewblox_influx_1 UnixHTTPConnectionPool(host=‘localhost’, port=None): Read timed out. (read timeout=60)

ERROR: for brewblox_history_1 UnixHTTPConnectionPool(host=‘localhost’, port=None): Read timed out. (read timeout=60)

ERROR: for brewblox_redis_1 UnixHTTPConnectionPool(host=‘localhost’, port=None): Read timed out. (read timeout=60)

ERROR: for eventbus UnixHTTPConnectionPool(host=‘localhost’, port=None): Read timed out. (read timeout=60)

…and it crashed the entire pi. I can’t reach it via network anymore (SSH, web, anything).
I’ll try to find more information about this. Rebooted the pi now, here’s the brewblox-log:
https://termbin.com/ungox

Second very weird thing: On my other Pi, where I thought the SD card was corrupted, for some strange reason brewblox is running again. Flawlessly. Last thing I did was a brewblox-ctl kill and then I left it alone. Yesterday I noticed that brewblox was running again neatly. SD card seems to be fine as well.

Update: Hard-rebooting (unplugging) the Pi solved the problem and Brewblox is now running.
But this looked too much like some of the issues I had before so I fear there might be something lurking deep inside the code…

Well-behaved .bash_profile implementations should load .profile as well. As usual, Stack Overflow explains this in more depth.

We can check anyway, but we’ll probably have to limit ourselves to a warning if .bash_profile exists, but has not loaded .profile. We try and keep our changes to the host system to an absolute minimum. The $HOME/.local/bin directory where brewblox-ctl lives is added to PATH by default in Debian Buster. It was considered a bug that Debian Stretch didn’t do this.
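If you’d rather keep a .bash_profile around, the usual fix is to have it load .profile, something like:

# at the top of ~/.bash_profile
if [ -f "$HOME/.profile" ]; then
    . "$HOME/.profile"
fi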

These Docker timeout errors can have a variety of causes.
The primary suspect is that OpenHAB + Brewblox together cause memory issues.
Have you disabled swap memory?
If you open htop, what is idle / active memory usage with OpenHAB and Brewblox both running?
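If not: on Raspberry Pi OS / Raspbian, swap is usually managed by dphys-swapfile, so (assuming the default setup) something like this should turn it off:

sudo dphys-swapfile swapoff
sudo systemctl disable --now dphys-swapfile
free -h    # the Swap row should now read 0B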

Thanks for pointing that out. I created a GitHub issue for openHAB, stating very politely that their setup should also load .profile. :slight_smile:

I have disabled swap memory now. I think I had done that on my old machine as well, but I’m not sure anymore.
htop is showing 500MB of memory usage, and on my old Pi (which is currently running both openHAB and Brewblox in a production environment) even 750MB. Maybe I should invest in a Pi4 with more RAM? Or just run both systems on two different Pis after all?

I’d love to run all of it on my NAS, but back then I managed to buy one of the only systems that runs on powerPC… and almost nothing compiles on a PPC…

I’d keep an eye on memory usage. Scaling up hardware would be a solution if you’re commonly maxed out.

We support the Pi because it’s a cheap way to get a dedicated server, but Brewblox typically uses ~400MB memory. That is a significant chunk of the Pi3’s 1GB RAM.
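To see where that memory goes per container, the standard Docker CLI can give a one-shot overview:

docker stats --no-stream    # per-container CPU and memory usage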

You’re correct on this count. We’re already waiting for upstream support for ARM64. PPC is comparatively obscure.

Reviving this old topic. I fooled around with my Raspberry Pi a bit and had to do a couple of ungraceful restarts (read: pulling the plug).

I encountered something similar, and the problem seemed to be that the eventbus container was being created twice (and thus complaining that the port was already in use). The same problem occurred above, only with traefik; it’s actually in the log I posted above:

Creating brewblox_traefik_1 …
Creating brewblox_traefik_1 … error

Had the same now with eventbus. I deleted all Docker images I could find, ran brewblox-ctl kill a couple of times and rebooted. Then I stopped all other software and restarted Brewblox. Then it worked.

But apparently, when the Pi crashes or is unplugged, there might be some leftover duplicate containers? I have no idea why that might be happening, but maybe it gives you a lead (if this is something that needs solving).

Thanks for taking the effort to provide extra info!

We haven’t looked into the exact cause beyond “hard crashes can cause undefined state”. The chain of events is clear enough, but it’s not an issue in our code or something we can work around.

Deleting images is slightly overkill: they are the executable data, and will not be changed at runtime.

What would be the best way to solve the “double services / containers” problem? Should brewblox-ctl kill and a reboot be sufficient?

Another observation: Using ZRAM swapping also seems to cause trouble.

Kill and reboot should indeed be sufficient.

Best I can tell, on crash either the port claim persisted, or the restart: unless-stopped configuration starts a malfunctioning container which claims the port.
Either way, there’s a leftover process with a claim. If the process doesn’t unclaim the resource, the OS will do so roughly a minute after process death. A reboot will also wipe current claims.

It’s not actually trying to start the service twice: the two lines in the log should be interpreted as “starting container”, and “failed to start container”.

The best way to deal with the problem is to prevent hard crashes.

Thanks Bob!

Yes, I’m working on that. I will give Brewblox its own dedicated Pi at some point (when I’m done tinkering). :slight_smile:

I happened to come across an alternative fix for forcing docker to unclaim ports: https://stackoverflow.com/a/61239636/6868722

To quote:

This worked for me on Linux, and without having to delete any resources.

# Stop docker
sudo service docker stop

# Find your particular zombie proxy processes
sudo netstat -pna | grep docker-proxy
# tcp6       0      0 :::8025       :::*     LISTEN      <PID_A>/docker-proxy  
# tcp6       0      0 :::13306      :::*     LISTEN      <PID_B>/docker-proxy
# ...

# Kill them
sudo kill -9 PID_A PID_B ...

# restart
sudo service docker start

Awesome, thanks @Bob_Steers for sharing this! :slight_smile: