Blocks reverting to "NEW" after latest update (2023-05-06) AND Reboot Loop

HoboStatus · June 8, 2023, 2:05pm

Hey guys,

Two days ago I updated the 6 Spark 4s I have running at the brewery. After the update, all of the blocks on one of the Sparks reverted from my designated names to “NEW”. I thought it was a glitch with the update process and reconfigured the blocks.

Today I had to power cycle all the Sparks because of a different issue that is already being addressed in another post. After the power cycle, all of the Sparks got stuck in a reboot loop: they would power up for about 10-15 seconds and then reboot. Rinse and repeat. While on, they showed the temp sensor and PID settings on the display (as I have set it up), and the lights on the casing were flashing orange (if I remember correctly). The Ethernet icon was there, but with no IP address after it. By watching my Ethernet switch, I could see when the Sparks rebooted based on the (lack of) data transfer.

I eventually unplugged my server (a Raspberry Pi) and all of the Sparks. I then turned on the Sparks, then the Pi, and ran brewblox-ctl up. Everything is back up, but it was weird and troubling. Any thoughts?

Log: https://termbin.com/054s

Also, following the power cycle, a different Spark has now reverted half of its blocks to “NEW”.

Any ideas what could be causing this?

Elco · June 8, 2023, 7:25pm

There’s a bug on our to-do list that we still need to investigate:

Maybe it is related.
Can you connect one of the Spark 4s that rebooted via USB and run
brewblox-ctl coredump?

That will give us the code location of the crash to help us find the cause.
A network failure should never cause a reboot, of course.

HoboStatus · June 9, 2023, 3:24pm

Hey Elco,

Here is the generated coredump file. This Spark was running normally (though failing to stay connected to Ethernet, as discussed in this thread).

https://termbin.com/2ki9

If you need a coredump from a Spark that has just been in a reboot loop, I’ll have to wait until I see that behavior again.

Bob_Steers · June 12, 2023, 9:47am

I parsed the coredump, and it matches crashes we see when the Spark can’t get a DHCP response.
The next release will include a fix for the Spark crashing when it can’t get an IP address over Ethernet via DHCP, but that won’t magically fix the underlying network problem.

Do the Sparks have a good connection to the router?
Is DHCP active on the router?
Do the Sparks have a static DHCP lease?

HoboStatus · June 13, 2023, 11:11am

The connection to the router is good. Not sure about the DHCP questions so I’ll get back to you.

On a possibly related note: since the software update, my server (Raspberry Pi 3) keeps losing connectivity. It isn’t reachable on the back end, and the only way to get it back online is to physically reset it. How long it stays online varies, but it can lose connection within a couple of hours. I read in another thread that it could (also) be an SD card issue…

Any troubleshooting thoughts are always appreciated.

Bob_Steers · June 13, 2023, 11:38am

Occasional freeze-ups and loss of connectivity (it’s hard to tell the difference on a headless Pi) are a known issue. I haven’t been able to consistently reproduce the problem, but there are multiple theories, with fixes that seem to help. We documented these fixes at Troubleshooting | Brewblox.

DHCP failure is a new theory, and I’m currently looking into it. When I tried a possible solution, I managed to render the test Pi unresponsive, so I’ve had to fix that first.
Admittedly, I had previously been testing the use of the Pi as a Wi-Fi → Ethernet bridge for the Spark, and I made the change while containers were still running. I probably only have myself to blame (as usual).