Service not connected to controller

Hey guys, hope this isn’t a repeat of an existing post (didn’t find anything when I looked).

I have been experiencing issues with my two Spark 4s sometimes losing connection to my controller (Raspberry Pi) and being unable to reconnect. It happens to both Sparks intermittently, and so far the only fix I have found is to power cycle the Spark itself (not ideal, as I am in a commercial brewery). Below you can see what I see in the UI:

And the log file: https://termbin.com/gxbl

I have noticed that sometimes there is a statement that I need to update the firmware, but having read another post about that issue, I think the firmware message is more of a coincidence than a cause. Today, for example, there is no message about the firmware.

I can leave this Spark hanging for a bit to see if it comes back online, or in case someone wants me to try something specific.

One note: I did realize that my controller may be attached to the “Brewhouse Lights” socket (my fail there), so it is likely that the controller is offline outside of working hours. Gonna correct this today, but thought I’d add it to the description.

Cheers.

From the logs, it looks like it discovers spark-one, but then struggles to communicate. spark-two is not discovered at all right now, but was seen earlier by the spark-one service.

spark-one at least has a 6d uptime, so it doesn’t look like it’s crashing.

If you edit docker-compose.yml to add a --device-host={IP_ADDRESS} argument to the command section for both Sparks, does this improve matters?
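
Something like this, roughly (the flags already in your existing command will differ, and {DEVICE_ID} / {IP_ADDRESS} are placeholders):

  spark-one:
    # ...image and other settings left unchanged...
    command: --device-id={DEVICE_ID} --device-host={IP_ADDRESS}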

Are the Sparks using wifi or ethernet?

Correct. The reported firmware is up to date. The firmware popup is due to the UI getting confused about the connection status.

Hey Bob,

Both Sparks are connected via ethernet (too much stainless for super-stable WiFi here).
I’ve added the --device-host={IP ADDRESS} to the “command” sections for both Sparks, using the Ethernet IP addresses for each. I just whacked it in after the --device-id=... text. Hope that’s correct. Not a coder… So let’s see how that goes. Is there anything I can do to test the stability other than just waiting?

One interesting thing: when I went to check the IP addresses of the Sparks, Spark-two was flashing an ethernet address of 0.0.0.0 on and off. There was no address or symbol for the ethernet connection for, say, 4 seconds, then one second of 0.0.0.0, then blank again. I had to power cycle Spark-two to re-establish the ethernet connection.

Log file after the change: https://termbin.com/nawj

Looks like I missed a step in the instructions: after changing the compose file, you need to run brewblox-ctl up to apply the changes.
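
That's run from the brewblox install directory (assuming the default ~/brewblox):

cd ~/brewblox
brewblox-ctl up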

Ethernet being flaky would explain matters. If you run ping {SPARK_IP} on the Pi, does that remain steady?
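
For example, to keep a record while it runs (spark-ping.txt is just an example filename; stop it with ctrl-C to print the summary):

ping {SPARK_IP} | tee spark-ping.txt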

Ran brewblox-ctl up.
New log: https://termbin.com/hl18

The ping on Spark 1 fluctuates between 1 and 60 ms, with my unofficial average somewhere around 45 ms. It seems to have about 10% packet loss over the short-ish ping interval. The output after the Spark 1 ping:

269 packets transmitted, 242 received, 10.0372% packet loss, time 690ms
rtt min/avg/max/mdev = 0.886/20.111/82.401/24.146 ms

The ping on Spark 2 started at 1.83 ms, briefly went up to 50.0 ms, and then came back and re-stabilized between 1.0 and 3.0 ms. Will leave it running while I have lunch and see what happens. The output after the Spark 2 ping:

72 packets transmitted, 72 received, 0% packet loss, time 175ms 
rtt min/avg/max/mdev = 0.887/13.595/49.956/16.098 ms

Can you try swapping the ethernet cables of spark 1 and spark 2? If the packet loss moves to the other spark, that pinpoints the problem.

Technically possible, but it would be quite challenging as the system is kind of “permanently” installed in the brewery. It should be easier to attach a different/new ethernet cable to Spark 1, which I will do after running the ping on it with the existing cable for an extended period.

I ran the ping on Spark 2 (the one which had connectivity issues today) for an extended time, and this is the result:

7467 packets transmitted, 7461 received, 0.0803536% packet loss, time 8386ms
rtt min/avg/max/mdev = 0.627/15.235/99.602/18.521 ms

Good morning all,

Adding my somewhat new issue to this old thread as there is enough similarity:

Since the previous posts, I have re-wired the brewery which solved the previous packet loss issue.

The current challenge:

I am currently running 6 x Spark 4s with the same setup, and one of them, spark-two, keeps losing its ethernet connection. They are all powered by one 24V/2.0A power supply and are each connected to an 8-port ethernet switch.

I tried swapping the ethernet cables to different ports on the switch, but the issue moves with the cable connected to spark-two. As of yesterday, all are running the latest firmware (f6ac8cef - 2023-05-06), but the problem remains.

I was concerned that my ethernet cable might be faulty, so I re-crimped the connectors and tested the connection and speed by connecting a laptop, which seemed to work. I pinged the Spark while it was connected, and the result was:

--- 192.168.5.183 ping statistics ---
656 packets transmitted, 656 received, 0% packet loss, time 1519ms
rtt min/avg/max/mdev = 0.593/0.929/2.360/0.268 ms

When I lose connection again I will try to ping the Spark and see what happens.

This morning I had a similar message as before: the service was online, but not connected to my controller.

Previously, I could regain the connection by disconnecting/reconnecting the ethernet cable from the switch, but that seems to have stopped working. This morning I was able to regain connection by power cycling the Spark AND disconnecting/reconnecting the ethernet cable (power cycling alone wasn’t enough).
I also checked the voltage in the System Info and I believe it is good:

After regaining connection this morning, I snagged the log which can be found here:
https://termbin.com/op8ao

I can see a lot of DISCONNECTED messages for spark-two but I’m far from technical enough to identify what is causing it.

I will update this thread with both a log and the ping results when I lose connectivity again.

Looking forward to any suggestions.

Your logs don’t show anything that immediately stands out, apart from the obvious errors.

A useful next step may be to listen to the USB console output, which prints more detailed messages.
To do so, attach a Spark over USB, and run

. .venv/bin/activate
pyserial-miniterm /dev/serial/by-id/{tab to autocomplete long name} 115200 | tee output.txt

This will both show the output and write it to output.txt. If it loses connection again, please upload the output.txt file with

brewblox-ctl termbin output.txt

Thanks Bob. I’ll try to hook it up to a PC over the weekend when there will be no brewing going on. Will update when I have some logs.