Spark 3 hangs every few hours

Another hang this evening; I used the reset button this time.
Logs: https://termbin.com/gqfq

Exported blocks
brewblox-blocks-spark-one_20211021_reset.json (15.1 KB)

Thank you.
I have not been able to reproduce it yet. One thing that may differ between your environment and mine is the mDNS traffic on the network. Today I refactored the mDNS responder to make the code more robust. It’s a long shot, but it’s a code quality improvement anyway.

I’ll let you know when it’s ready to try, after code review tomorrow.

I noticed this in your dmesg logs:

blk_update_request: I/O error, dev mmcblk0, sector 16376 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0

I think your SD card may be dying. I haven’t really looked into it, but mmcblk0 is your SD card and it reports an I/O error. Maybe @Bob_Steers knows more about this, but if you have a new card, swapping it in might be a good idea.

Thanks for the pointer on the SD card; I’ll get a new one.

For a change, this evening’s hang has a solid blue LED.
Memory figures: 66% and 70%.

Reset with button.
Log: https://termbin.com/7342

I am having very similar issues, which seem to have started after I updated to build 2021-09-20T14:16:22.040Z.

It has happened twice while no actuator was running, and once while my fridge was cooling, which froze my beer.

I have been pulling the power to get it going again.

https://termbin.com/ts65v

I think this recent hotfix came out just after my last update. Could it be related?

  • (fix) Resolved an issue where firmware communication would crash when many blocks were present on the controller.

My memory indicators are 64% and 77%

In the graph below you can see the big jumps where the Spark was unresponsive.

It died around Oct 9th, then on the morning of Oct 22nd (9am), then again that night (10pm), when it was stuck on cooling.

I am now running brewblox-ctl update and reflashing the firmware.

Logs after the update: https://termbin.com/l0ox

I was reading the documentation on the blocks API, and I was hoping I could get some data directly from the Spark using:

$ curl 'https://localhost/brewpi/blocks/discover' -k
405: Method Not Allowed
$ curl 'https://localhost/brewpi/blocks/all/read' -k
405: Method Not Allowed

While we are sorting this out, it would be good to poll the device and get an alert if it has stopped.

You can, but you have to add -X POST to avoid the 405 error: these endpoints only accept POST requests.
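For the polling idea above, here is a minimal sketch of the same request in Python, using only the standard library. It assumes the host and the brewpi service path from the curl commands above; replace brewpi with your own Spark service name if it differs. The function names are hypothetical, and disabling certificate verification mirrors curl’s -k flag (presumably needed for the self-signed certificate).

```python
import json
import ssl
import urllib.request

# Assumption: host and service path copied from the curl commands above.
# Replace "brewpi" with your own Spark service name if it differs.
BASE_URL = "https://localhost/brewpi"


def build_read_request(base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a POST request for blocks/all/read (a plain GET returns 405)."""
    return urllib.request.Request(f"{base_url}/blocks/all/read", method="POST")


def insecure_context() -> ssl.SSLContext:
    """Equivalent of curl -k: skip certificate and hostname verification."""
    ctx = ssl.create_default_context()
    ctx.check_hostname = False  # must be disabled before setting CERT_NONE
    ctx.verify_mode = ssl.CERT_NONE
    return ctx


def read_all_blocks(base_url: str = BASE_URL, timeout: float = 10.0):
    """Return the parsed JSON response; raises on HTTP errors, network errors or timeout."""
    req = build_read_request(base_url)
    with urllib.request.urlopen(req, context=insecure_context(), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    try:
        read_all_blocks()
        print("Spark service answered")
    except Exception as exc:  # timeout, connection refused, HTTP error, ...
        print(f"ALERT: Spark service did not answer: {exc}")
```

Running it from cron every few minutes and sending yourself a message on the ALERT line is one way to get notified. A hung controller may show up either as an error response or as a timeout, so both end up on the ALERT branch.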

I installed a new SD card and did a fresh install earlier today.
I imported the blocks from the original install.
It locked up again with a solid green LED.
Reset with the button.
Logs https://termbin.com/lrjo

Any chance of reverting the last update and firmware, to rule them out as a possible cause?

I’ll build a feature branch with previous firmware tomorrow.

Perfect. I can use the Ticks block and watch millisSinceBoot.
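A minimal Python sketch of that watch, comparing two millisSinceBoot readings taken some wall-clock seconds apart. It assumes the readings come from the blocks/all/read endpoint (called with POST), that each block is a dict with "type" and "data" keys, and that the Ticks block’s data holds millisSinceBoot either as a number or as a dict with a "value" key; check this against your own read output. The function names and the 0.5 tolerance are hypothetical choices.

```python
from typing import Any, Dict, List, Optional


def find_millis_since_boot(blocks: List[Dict[str, Any]]) -> Optional[float]:
    """Return millisSinceBoot from the first Ticks block, or None if absent.

    Assumption: blocks look like {"type": "Ticks", "data": {"millisSinceBoot": ...}},
    where the value is either a plain number or a dict with a "value" key.
    """
    for block in blocks:
        if block.get("type") == "Ticks":
            value = block.get("data", {}).get("millisSinceBoot")
            if isinstance(value, dict):
                value = value.get("value")
            return None if value is None else float(value)
    return None


def classify(prev_ms: float, curr_ms: float, elapsed_s: float, tolerance: float = 0.5) -> str:
    """Compare two uptime readings taken elapsed_s wall-clock seconds apart."""
    if curr_ms < prev_ms:
        return "rebooted"  # uptime went backwards: the controller restarted
    advanced_s = (curr_ms - prev_ms) / 1000.0
    if advanced_s < elapsed_s * tolerance:
        return "stalled"  # uptime barely moved while wall-clock time passed
    return "ok"
```

The stall check compares uptime progress against wall-clock progress with a tolerance, rather than expecting the two to match exactly, because request latency makes them differ slightly. If the read request itself fails or times out, that is worth treating as a hang too.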

I have been monitoring it for 24 hours since the firmware update; no freezes yet.

Build is up. In your docker-compose.yml file, change the image tag of the Spark services from ${BREWBLOX_RELEASE} to firmware-fix-rollback:

services:  
  ...
  spark-one:
    image: brewblox/brewblox-devcon-spark:firmware-fix-rollback
    ...

It will automatically download and use the image when you run brewblox-ctl up.
Later, replace it with ${BREWBLOX_RELEASE} again to undo the change.

It’s unlikely to fix the problem, but may as well give it a try.

Thanks for the assistance. The firmware revert is complete; I should know soon if it helps.

After my vacation I reactivated my setup. This time it ran longer before my Spark 3 froze again. The status LED was solid green. These are the logs from before I power cycled: https://termbin.com/chx5

Sad to say, the firmware revert did not prevent another hang.
Sorry, I am away again, so I do not know the LED status or memory usage.
I power cycled it; here are the logs:
https://termbin.com/7amw

Thanks for letting us know!

We narrowed down the problem, and are now looking for the root cause and a fix.
It appears that OneWire sensors suddenly take a long time to read. If you have enough sensors, this slows down the system to the point of becoming unresponsive.

Not sure if it’s the same issue, but my Spark 2 hung recently:

Luckily, neither of my mini fridges was cooling at the time, so it only made my kegerator serve warm beer and forced a diacetyl rest on my already-conditioning Helles.

Blocks: brewblox-blocks-spark-one_28-10-2021.json (13.3 KB)

Logs: https://termbin.com/n65z

Hi.
My Spark also hangs at least once a day.

Blocks: brewblox-blocks-spark-one (1).json (22.1 KB)

Logs: https://termbin.com/1eoe

I have not been able to make a system hang myself; the best I could do was trigger a watchdog timeout.

Any idea which update the problems started with?

I’m pretty sure it started just a few days ago, and I’m running the latest update.

We just released an update that hopefully also fixes the hangs. I was not able to pinpoint the cause, though, so I started refactoring the things that could have caused it.

Please report any differences that you see.
