Service not connected to controller

Hey guys, hope this isn’t a repeat of an existing post (didn’t find anything when I looked).

I have been experiencing issues with my two Spark 4s sometimes losing connection to my controller (Raspberry Pi) and being unable to reconnect. It happens to both of the Sparks intermittently, and so far the only fix I have found is to power cycle the Spark itself (not ideal as I am in a commercial brewery). Below you can see what I see in the UI:

And the log file: https://termbin.com/gxbl

I have noticed that sometimes there is a statement that I need to update the firmware, but having read another post about that issue, I think the firmware message is more of a coincidence than a cause. Today, for example, there is no message about the firmware.

I can leave this Spark hanging for a bit to see if it comes back online or if someone wants me to try something specific.

One note, I did realize that my Controller may be attached to the “Brewhouse Lights” socket (my fail there) so it is likely that the Controller is offline outside of working hours. Gonna correct this today, but thought I’d add it in the description.

Cheers.

From logs, it looks like it discovers spark-one, but then struggles to communicate. Spark-two is not discovered at all right now, but was seen earlier by the spark-one service.

spark-one at least has a 6d uptime, so it doesn’t look like it’s crashing.

If you edit docker-compose.yml to add --device-host={IP_ADDRESS} to the command for both sparks, does this improve matters?

Are the Sparks using wifi or ethernet?

Correct. The reported firmware is up to date. The firmware popup is due to the UI getting confused about the connection status.

Hey Bob,

Both Sparks are connected via ethernet (too much stainless for super-stable WiFi here).
I’ve added the --device-host={IP ADDRESS} to the “command” sections for both Sparks using the Ethernet IP addresses for each. I just whacked it in after the --device-id=... text. Hope that’s correct. Not a coder… So let’s see how that goes. Is there anything I can do to test the stability other than just waiting?

One interesting thing: When I went to check the IP addresses of the Sparks, Spark-two was flashing on/off an ethernet address of 0.0.0.0. There was no address or symbol for the ethernet connection for, say, 4 seconds, then one second of the 0.0.0.0, and then blank again. Had to power cycle Spark-two to re-establish the ethernet connection.

Log file after the change: https://termbin.com/nawj

Looks like I missed a step in the instructions: after changing the compose file, you need to run brewblox-ctl up to apply the changes.

Ethernet being flaky would explain matters. If you run ping {SPARK_IP} on the pi, does that remain steady?
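
Because the dropouts are intermittent, a probe that logs failures with a timestamp can be easier to read afterwards than a long ping session. The following is only a rough sketch to run on the Pi, not part of Brewblox: the function names are made up for illustration, it tests a TCP connect rather than ICMP ping, and the address and port in the usage comment are assumptions to replace with whatever the Spark actually listens on.

```python
import socket
import time
from datetime import datetime
from typing import Optional


def probe(host: str, port: int, timeout: float = 2.0) -> Optional[float]:
    """Return the TCP connect time in milliseconds, or None if the connect fails."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return (time.monotonic() - start) * 1000.0
    except OSError:
        # Covers timeouts, refused connections and unreachable hosts.
        return None


def watch(host: str, port: int, interval: float = 5.0, rounds: int = 12) -> int:
    """Probe `rounds` times, print a timestamped line per failure, return the failure count."""
    failures = 0
    for _ in range(rounds):
        if probe(host, port) is None:
            failures += 1
            stamp = datetime.now().isoformat(timespec="seconds")
            print(f"{stamp} no response from {host}:{port}")
        time.sleep(interval)
    return failures


# Example call (hypothetical IP and port, adjust both before use):
# watch("192.168.0.50", 8332, interval=5.0, rounds=720)
```

Comparing the failure timestamps with the times the UI shows the service as disconnected would show whether the two line up.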

Ran brewblox-ctl up.
New log: https://termbin.com/hl18

The ping on Spark1 fluctuates between 1-60.0ms with my unofficial average being somewhere around 45 ms. Seems to have about 10% packet loss over the short-ish ping interval. The output after the Spark1 ping:

269 packets transmitted, 242 received, 10.0372% packet loss, time 690ms
rtt min/avg/max/mdev = 0.886/20.111/82.401/24.146 ms

The ping on Spark 2 started at 1.83ms, briefly went up to 50.0ms, and then came back and re-stabilized between 1.0-3.0ms. Will leave it running while I have lunch and see what happens. The output after the Spark2 ping:

72 packets transmitted, 72 received, 0% packet loss, time 175ms 
rtt min/avg/max/mdev = 0.887/13.595/49.956/16.098 ms

Can you try swapping the ethernet cables of spark 1 and spark 2? If the packet loss moves to the other spark, that pinpoints the problem.

Technically possible, but would be quite challenging as the system is kind of “permanently” installed in the brewery. Should be easier to attach a different/new Ethernet cable to Spark 1 which I will do after running the ping on it, with the existing cable, for an extended period.

I ran the ping on the Spark2 (the one which had connectivity issues today) for an extended time and this is the result:

7467 packets transmitted, 7461 received, 0.0803536% packet loss, time 8386ms
rtt min/avg/max/mdev = 0.627/15.235/99.602/18.521 ms
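
To compare the three ping runs side by side, the loss percentage can be recomputed from the transmitted and received counts in each summary line. This is a small sketch for that arithmetic only; the helper name and the run labels are made up for illustration, and the summary lines are copied from the posts above.

```python
import re

# Matches the packet counts in a Linux ping summary line.
SUMMARY = re.compile(r"(\d+) packets transmitted, (\d+) received")


def packet_loss(summary: str) -> float:
    """Return the packet loss in percent, computed from the summary counts."""
    match = SUMMARY.search(summary)
    if match is None:
        raise ValueError("not a ping summary line")
    sent, received = int(match.group(1)), int(match.group(2))
    if sent == 0:
        raise ValueError("no packets were transmitted")
    return 100.0 * (sent - received) / sent


runs = {
    "Spark 1, short run": "269 packets transmitted, 242 received, 10.0372% packet loss, time 690ms",
    "Spark 2, short run": "72 packets transmitted, 72 received, 0% packet loss, time 175ms",
    "Spark 2, extended run": "7467 packets transmitted, 7461 received, 0.0803536% packet loss, time 8386ms",
}

for label, line in runs.items():
    print(f"{label}: {packet_loss(line):.2f}% loss")
```

For these inputs the computed loss is about 10.04% for Spark 1, 0% for the short Spark 2 run, and 0.08% for the extended Spark 2 run, which matches what ping itself reported.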