Spark 3 hangs every few hours

Hi, I have to regretfully report that Spark is still hanging or freezing after I installed the latest update last night.
When it wakes up, all regulation starts from zero again.


Logs: https://termbin.com/d8nk

Happened again today.

Logs: https://termbin.com/fjju

Just installed the update and the new Spark 3 firmware… Is it just me, or does the UI load and respond a bit faster?!

Possibly? The last performance-related change to the UI was made 2021/10/4, and since then we’ve also updated some dependencies.

@Clrwl04 @Arnt We’ve been busy hunting firmware bugs, and while we still can’t reproduce the hangup itself, we’re confident we found and fixed a suspicious bug that may well have caused it.

If you have the time and inclination, could you please run the fix branch for a while to see whether that fixed the issue?

In docker-compose.yml, for the Spark service, replace

  image: brewblox/brewblox-devcon-spark:${BREWBLOX_RELEASE}

with

  image: brewblox/brewblox-devcon-spark:preview-firmware-fix

Then run brewblox-ctl up to download and apply.
To undo the change, revert to ${BREWBLOX_RELEASE}, and run brewblox-ctl up again.

OK, I have made the change in docker-compose.yml and applied it.
Should I undo the change again right away, or wait and see whether this fixed the bug?

You’ll want to let it run for a while, to see whether it still has hangups.
Undo the change when you want to stop using the test version (e.g. if we release the changes to everyone).

That's good :slight_smile:
I will report back within a couple of days.

Had my first issue since flashing the latest firmware.

Not sure if it is related, but here are the logs: https://termbin.com/930i. Maybe they can help with the issue.

I have a simple script that reads all blocks from my Spark, and also reads the ticks block to watch the millisSinceBoot field.

There are a few interesting error messages around 2021/11/02 00:53:29, and I can confirm that the Spark was still up until then: it had been up for 97 hours when this happened 15 minutes ago.

Uptime is lower: 351791597 -> 16120

It seems the watchdog restarted it in this instance.
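
For readers who want to run a similar check, here is a minimal sketch of the comparison described above. It only assumes that millisSinceBoot is an uptime counter in milliseconds; the function names are hypothetical and not part of Brewblox, and the original script was not posted.

```python
# Hypothetical helpers for spotting a controller restart from two
# consecutive millisSinceBoot readings. Names are illustrative only.

MS_PER_HOUR = 60 * 60 * 1000


def uptime_hours(millis_since_boot: int) -> float:
    """Convert an uptime reading in milliseconds to hours."""
    return millis_since_boot / MS_PER_HOUR


def rebooted(previous_ms: int, current_ms: int) -> bool:
    """A lower uptime than the previous reading implies a restart in between."""
    return current_ms < previous_ms


if __name__ == "__main__":
    # The two readings reported in this post.
    previous, current = 351791597, 16120
    print(f"uptime before: {uptime_hours(previous):.1f} h")
    print("restart detected:", rebooted(previous, current))
```

With the values from this post, the earlier reading works out to a little under 98 hours, which matches the 97 hours of uptime mentioned above.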

Thanks for reporting! It indeed looks like something is slowing down the controller, to the point where it’s either getting timeout errors, or triggering the watchdog. This is very much in line with what we’ve seen so far.
I just pushed a firmware release that fixes a display bug on the Spark 4, and will likely (but we're not 100% sure) fix these hangups on the Spark 2/3.

Some more specific observations:

If you have a script that exclusively reads the ticks block, why not use the /blocks/read endpoint?

response = requests.post("https://localhost:40443/brewpi/blocks/read", json={"id": "SystemTime"}, verify=False).json()

Traefik is complaining about the sharemycook service not having endpoints. You may want to add a traefik.enable=false label.

Hi, have applied the update with the preview firmware. Will let you know what happens next.

No need for the preview branch anymore, it has been merged into edge.

Hi,
The problem was not solved by the preview firmware fix; it got worse. The controller now reboots more often than it did before the preview firmware fix.
I’m going back to ${BREWBLOX_RELEASE}.

Logs: https://termbin.com/og8m

I tried it with

requests.post("https://localhost:40443/brewpi/blocks/read", data={"id": "SystemTime"}, verify=False)

forgot about json=, will update.

Cheers, logs that you never look at :smiley:
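
A note on data= versus json= for anyone hitting the same thing: in the requests library, passing a dict as data= sends a form-encoded body, while json= sends a JSON body. The suggested call above uses json=. The sketch below reproduces the two encodings with the standard library only, so it runs without a Spark or network access; the helper names are hypothetical.

```python
import json
from urllib.parse import urlencode

# Same payload as in the call above.
payload = {"id": "SystemTime"}


def form_body(data: dict) -> str:
    """Body in the style of requests' data=<dict> (form-encoded)."""
    return urlencode(data)


def json_body(data: dict) -> str:
    """Body in the style of requests' json=<dict> (JSON-encoded)."""
    return json.dumps(data)


if __name__ == "__main__":
    print("data= style:", form_body(payload))
    print("json= style:", json_body(payload))
```

The first body is a key=value string and the second is a JSON object, so a service that parses the body as JSON would presumably only accept the second.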

That now includes the fixes.
I wonder what’s different in your situation, because I couldn’t reproduce this.

By exporting the eeprom as a binary blob, I can make an exact copy of your Spark. Could you please export it by connecting the Spark over USB, and running brewblox-ctl particle -c eeprom?

I reverted to ${BREWBLOX_RELEASE} earlier this evening. It hung some time later. I tried the suggested command to export the eeprom, but it failed. The final output of the command is below:

IMPORTANT: Copy the following bashupload url. The line starts with ‘wget’
curl: (60) SSL certificate problem: self signed certificate in certificate chain
More details here: curl - SSL CA Certificates

curl failed to verify the legitimacy of the server and therefore could not
establish a secure connection to it. To learn more about this situation and
how to fix it, please visit the web page mentioned above.
Command ‘docker run -it --rm --privileged -v /dev:/dev --pull always brewblox/firmware-flasher:edge eeprom’ returned non-zero exit status 60.

I tried creating a .curlrc file in $HOME containing the option: insecure
I also tried the option: --insecure
No luck: the error message did not change.

Hi Elco, I’m currently at work and the Spark is not connected via USB at home.
If the problem has not disappeared by the time I get home in 14 days, I will send you a copy of the Particle eeprom.

It hung again last night. Any advice on how to extract the eeprom image?

Run

docker run -it --name flasher --privileged -v /dev:/dev --pull always brewblox/firmware-flasher:edge eeprom

docker cp flasher:/app/eeprom.bin ./

docker rm flasher

The first command will give an error, but only after it has created the eeprom file. The docker cp command will copy the file to your Pi. From there you can get it to your own PC using FileZilla, VS Code, or scp.

We’ll be replacing the bashupload in the script.


Nice instructions. Be sure to have the USB cable connected before running the first command.
If it complains that the docker container “flasher” already exists, run docker rm flasher to remove it. Then connect the USB cable and try again.

Here is the eeprom binary
eeprom.bin (128 KB)
