Heat and cool race condition

I came home to find the fermentation fridge in a heat and cool race condition. Both SSRs were in the ON state, as you can see from the attached graph (and logs). Latest firmware and UI. Had to restart the brewpi.
I have also had problems with multiplexed mutual heaters on the separate HERMS setup, not being able to turn all elements OFF and “restarting” the boil (BK). A restart of the brewpi was needed on that occasion as well.

https://termbin.com/z1l2a

Could you zoom in on the 0-1 range for a short period of time? Values are averaged when viewing long periods. This includes actuator states.
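
To illustrate why the averaging matters, here is a minimal Python sketch. The sample values and the bucket size are made up for illustration and are not taken from the graph above. When 0/1 actuator states are averaged into one bucket, two actuators that were never on at the same instant can both show a non-zero value.

```python
def downsample(values, bucket_size):
    """Average consecutive samples into buckets of bucket_size."""
    return [
        sum(values[i:i + bucket_size]) / len(values[i:i + bucket_size])
        for i in range(0, len(values), bucket_size)
    ]

# Hypothetical 1-second samples: heat and cool are never on together.
heat = [1, 1, 0, 0, 0, 0, 0, 0]
cool = [0, 0, 0, 0, 1, 1, 1, 1]

overlap_raw = any(h and c for h, c in zip(heat, cool))
heat_avg = downsample(heat, 8)
cool_avg = downsample(cool, 8)

print(overlap_raw)         # False
print(heat_avg, cool_avg)  # [0.25] [0.5]
```

Both averaged series are above zero in the same bucket even though the raw states never overlap, which is why a short time window is needed to judge simultaneous activation.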

I’m also not sure I follow “multiplexed mutual heaters”. Which blocks did you attempt to turn off, and how?

Your logs show warnings about your DS2413 disconnecting multiple times. (New|DS2413-1, used by LF Cool Actuator / LF Heat Actuator). Could this be related to your issue?

The Tilt service is also logging some of the same errors reported in Lost connection to Tilt and UI. It may be background noise.

@Bob The latch-up started at 16:55 according to the graph. Looking at the logs, I cannot pinpoint a distinct error message that indicates the direct cause of the latch-up (time-wise).
There are similar warnings both before and after the point where the graph shows heater and cooler on simultaneously.

Snippet from the spark-one log:
spark-one_1 | 2020-02-18T13:00:44.678812160Z 2020/02/18 13:00:44 INFO …_devcon_spark.communication Spark log: WARNING:DS2413 disconnected: 3A3F1B21000000A5
spark-one_1 | 2020-02-18T13:00:45.680005744Z 2020/02/18 13:00:45 INFO …_devcon_spark.communication Spark log: INFO:DS2413 connected: 3A3F1B21000000A5
spark-one_1 | 2020-02-18T14:03:52.551397823Z 2020/02/18 14:03:52 INFO …_devcon_spark.communication Spark log: WARNING:DS2413 disconnected: 3A3F1B21000000A5
spark-one_1 | 2020-02-18T14:03:53.552340445Z 2020/02/18 14:03:53 INFO …_devcon_spark.communication Spark log: INFO:DS2413 connected: 3A3F1B21000000A5
spark-one_1 | 2020-02-18T15:56:36.095121372Z 2020/02/18 15:56:36 INFO …_devcon_spark.communication Spark log: WARNING:DS2413 disconnected: 3A3F1B21000000A5
spark-one_1 | 2020-02-18T15:56:37.096488798Z 2020/02/18 15:56:37 INFO …_devcon_spark.communication Spark log: INFO:DS2413 connected: 3A3F1B21000000A5
spark-one_1 | 2020-02-18T17:28:36.411556159Z 2020/02/18 17:28:36 INFO …_devcon_spark.communication Spark log: WARNING:DS2413 disconnected: 3A3F1B21000000A5
spark-one_1 | 2020-02-18T17:28:37.055356491Z 2020/02/18 17:28:37 INFO …_devcon_spark.communication Spark log: INFO:DS2413 connected: 3A3F1B21000000A5
spark-one_1 | 2020-02-18T18:04:17.985277519Z 2020/02/18 18:04:17 INFO …_devcon_spark.communication Spark log: WARNING:DS2413 disconnected: 3A3F1B21000000A5
spark-one_1 | 2020-02-18T18:04:18.985348234Z 2020/02/18 18:04:18 INFO …_devcon_spark.communication Spark log: INFO:DS2413 connected: 3A3F1B21000000A5
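
To line these events up against the graph, the disconnect/reconnect pairs can be extracted with a small script. This is only a sketch: the function name `outages` is made up, and it assumes every relevant line has the same shape as the lines above (an ISO timestamp followed by a `DS2413 disconnected:` or `DS2413 connected:` message). Timestamps are truncated to whole seconds.

```python
import re
from datetime import datetime

# Matches the firmware message at the end of each log line.
EVENT_RE = re.compile(r"DS2413 (disconnected|connected): ([0-9A-F]{16})")
# Matches the ISO timestamp; fractional seconds are dropped.
TIME_RE = re.compile(r"(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})\.\d+Z")

def outages(lines):
    """Return (device_id, disconnect_time, seconds_until_reconnect) tuples."""
    pending = {}
    result = []
    for line in lines:
        event = EVENT_RE.search(line)
        stamp = TIME_RE.search(line)
        if not event or not stamp:
            continue  # not a DS2413 connection event
        when = datetime.strptime(stamp.group(1), "%Y-%m-%dT%H:%M:%S")
        kind, device = event.groups()
        if kind == "disconnected":
            pending[device] = when
        elif device in pending:
            start = pending.pop(device)
            result.append((device, start, (when - start).total_seconds()))
    return result

sample = [
    "spark-one_1 | 2020-02-18T15:56:36.095121372Z 2020/02/18 15:56:36 INFO …_devcon_spark.communication Spark log: WARNING:DS2413 disconnected: 3A3F1B21000000A5",
    "spark-one_1 | 2020-02-18T15:56:37.096488798Z 2020/02/18 15:56:37 INFO …_devcon_spark.communication Spark log: INFO:DS2413 connected: 3A3F1B21000000A5",
]
for device, start, seconds in outages(sample):
    print(device, start.isoformat(), seconds)
```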

“multiplexed mutual heaters” = Standard HERMS setup (Wizard) where the two heaters are on a “mutex block”. I referred to the incident because, from the user perspective, the nature of the problem is similar: I could not manually turn off the corresponding “spark pins” even after disabling ALL blocks in the HERMS “relation” (control chain). (In the HERMS setup the heaters are connected to the Brewpi-spark directly and not through a DS2413.)

Obviously the two spark pins should NOT be on simultaneously (even if the DS2413 is not responding)…
(controller fail-safe, user warning in GUI, auto-recovery, restart…)

How can we get closer to the root-cause of this latch-up?

Atle

Your graph doesn’t appear to be showing any mutex violations. The cool pin state goes to 1 around 1655, but heat pin state remains at 0 for the duration of the graph.
This is not to say all is well. The earlier posted graph indeed shows a mismatch between desired and actual beer temp.

The earlier picture suggests that issues started around 1800. Could you go to your graph settings, and select a time period of 10 minutes to 1800? (It’s important you use graph settings for this, and not zoom in on an existing graph).

I agree that timing indicates we can rule out the DS2413 reconnects.

As for pin behavior:
You can’t manually toggle SparkPins objects. You can only toggle digital actuators that are not driven. To disable a chain, disable a setpoint, and the rest will follow.
Spark pins targeted by a disabled digital actuator are always off.

How did you observe SSRs being enabled? Software readouts, the activity LED on the SSR, heater state?

Thanks for your prompt replies @Bob_Steers

Attached I have zoomed in on around 18:00.
To me it is clear that the cool pin is ON while the heat (pid) is at the same time trying to increase the beer temp (naturally ;-))

The latch-up was observed through several factors, in chronological order: 1) the graph showed an abnormality 2) the graph indicated both heat and cool action 3) the pin state(s) indicated the same 4) the fridge was running and the heater was hot 5) the LEDs on the SSRs were all ON.

As for pin behaviour: I observed a similar latch-up by NOT being able to get the second heater element to turn ON in the HERMS system, while heating the HLT at the same time as boiling. The boiler PWM indicated a 60% on state at that moment, yet the mutex was holding the HLT at 0% (at a later stage I discovered that the BT element was at 100%). Disabling both setpoints (even the MT one), and in turn disabling all blocks, did not clear the latch-up. I tried to “free the driver for the pins” to see if I could toggle the output manually. I did not find any workable solution until the BrewPI-spark was restarted by a power cycle.
I mention this in case it points to the same symptom(s) as we see in the fermenter issue from yesterday.
Atle

Agreed that this picture is very much showing simultaneous activation of cool / heat pins.

I have a hunch that the cool pin may have become stuck, causing the heater to try and compensate. I can’t recall seeing this behavior before.

Either way, firmware/hardware bugs are outside my area of expertise, so I’ll have to summon @Elco to take a look at it.

@Bob,
These are the kind of glitches where it is important to find the root cause. The firmware is probably the best place to add safeguards such as “fail-safe” modes or even “watchdog” functionality to avoid serious behavior like this. The worst-case scenario is burned-out fridge motors or even overheated fermenter cabinets.
I hope @Elco can shed some light on the probable causes and even fixes.
Thanks again,
Atle

Can you do a full export of your blocks from the Spark page?
And make a screenshot of your block relations diagram.

This is exactly what the mutex should prevent. I would like to see what the settings are to see if it was configured correctly. There is a watchdog that resets the spark if it becomes unresponsive, otherwise the blocks should maintain correct behavior themselves.
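
As a rough illustration of the intended behavior, here is a minimal Python sketch of a mutex shared by two digital actuators. The class names are hypothetical and this is not the firmware implementation; it only shows the invariant that at most one actuator may be on at a time.

```python
class Mutex:
    """Grants exclusive permission to at most one holder at a time."""

    def __init__(self):
        self.holder = None

    def acquire(self, actuator):
        if self.holder is None or self.holder is actuator:
            self.holder = actuator
            return True
        return False

    def release(self, actuator):
        if self.holder is actuator:
            self.holder = None


class MutexedActuator:
    """A digital actuator that may only switch on while holding the mutex."""

    def __init__(self, name, mutex):
        self.name = name
        self.mutex = mutex
        self.state = False

    def set(self, desired):
        if desired:
            # Only switch on if no other actuator holds the mutex.
            self.state = self.mutex.acquire(self)
        else:
            self.state = False
            self.mutex.release(self)
        return self.state


mutex = Mutex()
heat = MutexedActuator("heat", mutex)
cool = MutexedActuator("cool", mutex)

cool.set(True)
print(heat.set(True))   # False: cool holds the mutex
cool.set(False)
print(heat.set(True))   # True: the mutex is free again
```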

@Elco, Thank you for the response!
I have attached the relations and the configured “blox”.
This setup runs two fermentation fridges: SmallFridge (SF) and LargeFridge (LF).
Small Fridge is the standard wizard setup (two PIDs).
Large Fridge has the advanced setup (with beer PID) due to larger volumes and, in my opinion, better control. High air circulation is achieved by a 230V 5" computer fan (2.5 m³/min). Heater and cooler SSRs are driven by means of the BREWPI ONEWIRE SSR EXPANSION BOARD.

In my experience this is probably not a setup problem.
The fridge was controlling the beer temp OK before the incident (ca. 16:55) (image in the original post, detail in the follow-up post): first a ramp down from 19.5 C to 18 C, then stable for several hours at 18 C (although 0.5 C low, with a slow integrator). At 16:55 the graph indicates both heater and cooler in the ON state. It looks like the heater and cooler are “disconnected” from the mutex at this point: the heater ramps to 100% while the cooler is also ON, though still following the max on time / min off time constraints (approx. 30 min period), as the temperature and set values indicate.

(As mentioned above, I have also experienced a latch-up in the HERMS setup where the mutex block did not let go of the BT heater to allow the HLT element some power.)

Hope this brings us a step closer.

Atle


brewblox-blocks-spark-one.json (11.0 KB)

We just spent some time debugging this issue, and managed to reproduce the OFF/OFF mutex deadlock.
The ON/ON state you had in your fermentation fridge is more elusive, but seems possible.
The problem is caused by incorrect handling of mutex state after the DS2413 is disconnected / reconnected.

To confirm, as it caused some confusion: your graphs are about the Large Fridge?

Could you also please export your graph data to CSV? You can do so by selecting the relevant period in settings (18th 0300 -> 18th 2200), and then choosing “export to CSV” in the graph action menu.
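
Once exported, the CSV can be scanned for rows where both actuators report activity. The sketch below is only an illustration: the column names `time`, `heat_pin` and `cool_pin` are made up, and a real export will use its own names. If the export is averaged, a hit only shows that both were active within the same interval.

```python
import csv
import io

def overlapping_rows(csv_text, heat_col, cool_col, time_col="time"):
    """Return the time values of rows where both columns are non-zero."""
    hits = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        try:
            heat = float(row[heat_col])
            cool = float(row[cool_col])
        except (KeyError, TypeError, ValueError):
            continue  # skip rows with missing or empty values
        if heat > 0 and cool > 0:
            hits.append(row[time_col])
    return hits

# Made-up sample data, not taken from the actual export.
sample = """time,heat_pin,cool_pin
16:54,0,0
16:55,0,1
18:00,1,1
18:01,1,1
"""
print(overlapping_rows(sample, "heat_pin", "cool_pin"))  # ['18:00', '18:01']
```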

That is promising! :slight_smile:
I have exported the original graph and can export more parameters if needed.
And of course to be clear, this is regarding the Large Fridge (LF)

Thank you for your effort and for the great work on a very promising and feature-rich brewing kit!

Looking forward to your analysis,
Atle

graph-LF Graph-spark-one-downsample_1m.csv (90.6 KB)

Hi @Bob_Steers, @Elco
How is the fix for the mutex deadlock progressing? Today it has struck the LargeFridge setup again (attached screenshot, CSV file and link to log). Both the heater and the cooler are currently in the ON state. :open_mouth:
The profile is on a ramp from 18C to a 4C cold crash. The BrewPI-spark has been running steadily at 18C since the last incident.
Notice the Cool PWM value that “freezes” at 29.37 when the deadlock appears.
Can you suggest a workaround or a change in setup until, as I understand it, a firmware update is available that will catch the situation and reset the fault?

https://termbin.com/ozmi


graph-LF Graph-spark-one-downsample_1m(1).csv (71.4 KB)

Can you confirm that this only happens with DS2413 actuators? I have an idea for a potential fix and will work on it today.

You can also try to replace the RJ12 cables to avoid the communication errors that trigger the bug.
Don’t route high current wires next to them.

I can confirm that the LargeFridge SSRs are driven through the DS2413.
I will patch up a new cable for the one-wire bus to the DS2413 (short and dedicated).

Question: as the wifi-connected sparks are, in my experience, not as predictable as the USB-connected ones, I have instead hooked up an RPI to each brewpi-spark in use (HERMS and Fermentation). For the Large and Small Fridge I am currently using one brewpi-spark_v2. Is it possible to have TWO brewpi-sparks_v2 connected to one RPI through USB at the same time? (I have a “spare” brewpi-spark_v2 and could drive the LF and the SF directly from separate spark_v2 pins and omit the DS2413 altogether.)

Atle

Yes, you can have both on USB, connecting to one Pi.
If you have the device-id flag in your compose file, they can be uniquely identified:
https://brewblox.netlify.com/user/connect_settings.html#device-id

The RJ12 cables I have in the store are made from CAT5 cable. 2 wires are left unconnected. 6-wire cable used for phone lines is much thinner than UTP cable. You could try that too.

Thanks @Elco, I was hoping that the host ID would be a means to enable multiple brewpi-s on USB to the RPI (as on wifi), but I have not had time to test.
However, first I'll patch up a short CAT5 cable.
We can decide the order of changes (cable/SW) once you have a firmware fix ready for testing.

Atle

device-host is for WiFi devices.
The last example is what you will need to force USB.

      --device-id=300045000851353532343835
      --discovery=usb

To reconfigure, plug in your second spark, and run these commands.

brewblox-ctl add-spark -n spark-one --discovery=usb  --force
brewblox-ctl add-spark -n spark-two --discovery=usb

It will let you pick from discovered devices.
Your log file reports your spark-one having device ID 3E0035000A47353235303037.

:heartpulse: Love the new features in brewblox-ctl!

If I have time, I'll test the DUAL brewpi-spark setup for the two fridges as a reliability test. However, I would rather have them both controlled by a single brewpi-spark and use the spare for a separate project.
For now, I will leave the LF setup in peace at least until the current brew is kegged (mango infused, pressure fermented (NE)IPA).

Atle

That sounds delicious!