I am experiencing periodic Spark 4 reboots. Over the last couple of days they have been happening irregularly. The firmware is dated 2022-01-21, yet the UI keeps telling me a firmware update is available, and the update does not seem to go through.
It is connected over wifi, shows an address on the display and responds to a browser with the prompt for getting a QR code, but in the info block does not show an IP address.
What would be the appropriate troubleshooting steps for this issue?
The firmware version seems in order, but there are repeated connection resets in the Spark service.
The Tilt service has its own issues, with repeated bluetooth errors (100 Network is Down). The Tilt service restarting after every error could explain the Spark connection issues.
If you remove the Tilt service for now with brewblox-ctl service remove tilt (this will not remove tilt configuration or history data), does that stabilize the Spark? We can then hunt down the bluetooth errors.
Ugh, well should have done that when I stopped fermentation! I’ve removed the tilt service and restarted the services. Will see if this resolves the problem!
I’m hopeful. What’s weird is I’ve not used the fermentation chamber for a while and it was fine, then a couple days ago the spark4 started rebooting. Still wondering…
Thank you for the suggestion and I will remember to disable the Tilt service from now on!
Normally, the Tilt service should function just fine if the Tilt isn’t broadcasting. It’s passively listening, not making an active connection.
There are multiple steps and checks when a Spark service connects to a controller. One of them is the verification that the controller firmware matches the service firmware. The service broadcasts its current connection state to the eventbus, where the UI is listening.
I suspect that when the service loses connection and immediately reconnects, the UI gets confused by the state changes. After all, is_latest_firmware is false when the service is disconnected.
I’ll have a look at making this information more explicit. Right now the UI has to juggle too many variables to interpret the data it has (Is the eventbus connected? Is the spark service available? Is the controller connected with outdated firmware, or not connected at all?)
I will keep an eye on it and how the behavior changes. The last restart was 50 minutes ago, while I was outside. My system really isn’t doing that much, runs the kegerator and fermenter. The Pi seems underwhelmed in terms of CPU and memory usage. The Pi 4b is a little overkill for just Brewblox, but was available and my other Pi 4b is very busy with various tasks.
Seems like it would be an improvement to have some event handlers take care of those tasks and simply update data the UI reads, rather than having the UI manage all of that. I’ve not looked at the source, so can’t really be sure. After 35 years in IT, I’m loath to dig in.
The service has to be aware of connection state as well (it should not be reading/writing blocks if firmware is incompatible). I’m probably just replacing various boolean flags with enums, so we can have firmware_status: 'DISCONNECTED' | 'INCOMPATIBLE' | 'OUTDATED' | 'OK'.
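To illustrate the idea, here is a minimal Python sketch of collapsing separate flags into one status value. Only the four status names come from the post above; the flag names (connected, compatible, latest) and the function itself are hypothetical, and this is not the actual service code.

```python
from enum import Enum


class FirmwareStatus(Enum):
    DISCONNECTED = 'DISCONNECTED'
    INCOMPATIBLE = 'INCOMPATIBLE'
    OUTDATED = 'OUTDATED'
    OK = 'OK'


def firmware_status(connected: bool, compatible: bool, latest: bool) -> FirmwareStatus:
    """Derive a single status from three hypothetical boolean flags."""
    # Check the connection first: the other flags mean nothing
    # while there is no controller to compare against.
    if not connected:
        return FirmwareStatus.DISCONNECTED
    # Incompatible firmware must block reading/writing blocks.
    if not compatible:
        return FirmwareStatus.INCOMPATIBLE
    # Compatible but older firmware: usable, update available.
    if not latest:
        return FirmwareStatus.OUTDATED
    return FirmwareStatus.OK
```

With a single value like this, a disconnected service reports DISCONNECTED instead of looking like it has outdated firmware, which is the confusion described earlier in the thread.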
It’s definitely restarting the controller itself. Could you please connect it over USB, and run brewblox-ctl coredump?
It looks like it crashed on simultaneous use of wifi and bluetooth. I believe this is part of the laundry list of fixes that are waiting for release, but I’ll look into it to be sure.
Correction: there was a memory allocation failure. From your log, it looks like you have 108 blocks, the vast majority of which have default assigned names (they start with New|). This is significantly more than what you’d expect for a setup that runs a kegerator and fermenter.
I extracted the blocks that were not hardware-related (not a sensor, not a GPIO module), and listed them in a quick and dirty Python script that will delete them. You’ll want to run the script on your Pi (or edit the localhost in the URL).
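For anyone wanting to do a similar cleanup, the selection step could look roughly like the sketch below. This is not the script from this thread. The block shape (a dict with 'id' and 'type' keys), the function name, and the example type names are assumptions for illustration, and the sketch only picks candidates without sending any delete requests.

```python
def deletion_candidates(blocks, hardware_types, default_prefix='New|'):
    """Return ids of blocks that have a default name and are not hardware-related.

    blocks: list of dicts with 'id' and 'type' keys (assumed shape).
    hardware_types: set of type names to always keep (sensors, GPIO modules, ...).
    """
    return [
        block['id']
        for block in blocks
        if block['id'].startswith(default_prefix)
        and block['type'] not in hardware_types
    ]


# Illustrative data only; the names and types here are made up for the example.
example_blocks = [
    {'id': 'New|Pid-3', 'type': 'Pid'},
    {'id': 'New|TempSensorOneWire-1', 'type': 'TempSensorOneWire'},
    {'id': 'Ferment Setpoint', 'type': 'SetpointSensorPair'},
]
candidates = deletion_candidates(example_blocks, {'TempSensorOneWire'})
print(candidates)
```

Review the resulting list by hand before deleting anything: a default name does not prove a block is unused.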
It did. I only see one now: New|TempSensorInterface-4. It was rebooting constantly throughout the day, but since the reboot right after the script ran, I’ve not heard any more. When it was rebooting, it would do so for a couple of days, stop for a few days, and then start again. So it will take a few days to know that it has stopped.
When generating the list of blocks it removed, I filtered to exclude blocks that represent hardware. From a quick check of your blocks, it looks like that’s an actual sensor, but only used by blocks that no longer exist.
Immediately after, or minutes / hours later? When parsed, it’s the same error as before (memory allocation failure in the wifi thread).