Brewblox SSR suddenly stuck on cooling

We have a strange error where one of our PID / PWM / SSR chains suddenly gets “stuck” and keeps cooling the fermenter. It resets itself when we unplug and replug the OneWire temperature sensor, or reboot the BrewPi / Raspberry Pi. We are running 4 sets of PID, PWM and SSR, each driving a magnetic valve that cools a fermenter with cooling water. This problem has occasionally occurred on the other PIDs as well.
Attaching a picture of the PID graph. I need help to understand why this is happening and how to fix it if possible :grinning:

btw, we are using Edge release 2019/08/22

The PID seems to have the correct target (0), so I think this is a PWM bug.
You are using very low PWM values, only 1%. The PWM driver can stretch periods to achieve those very low values to correct for previous periods that were high for too long or low for too long.

Are you also using the latest release for the firmware?
This looks like something that could be caused by a bug that I fixed in the firmware release 21 days ago, bbe0ced. Maybe I missed a test case.

Please check your firmware version on the Spark service page, Spark widget.
Can you also share your blocks (export blocks on the Spark service page) and show a graph of the PWM block and the Digital Actuator block?

@Elco Thanks for replying!
Here is the firmware info:

PWM graph :

Digital Actuator graph :

brewblox-blocks-spark-one.json (9.0 KB)

Looking at your blocks, it looks like you are controlling all the SSRs over OneWire, correct?

That is another thing for me to look into: whether the issue could be in the combination of PWM and DS2413. I run that same setup at home without problems, though.

I did notice that you use very short PWM periods. I use a 10 minute period on mine. To keep the temperature regulated, the PWM value is often only 1%. By using a 10 min PWM period, this will still be 6 seconds.

In your case, with a PWM period of 2 or 5 seconds, 1% will be only 0.02 or 0.05 seconds. This will cause a lot of OneWire communication and fluctuation of the achieved duty cycle because of time jitter.
I think that in most cases the solenoid doesn’t even get to move before the valve-closed signal is sent. Because the system doesn’t respond, the PID keeps increasing its output.
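To make the arithmetic above concrete, here is a minimal sketch that computes the on-time per PWM period for a given duty cycle. The function name `on_time_seconds` is made up for illustration and is not part of Brewblox.

```python
def on_time_seconds(period_s: float, duty_percent: float) -> float:
    """On-time within one PWM period, in seconds (hypothetical helper)."""
    return period_s * duty_percent / 100.0


# Compare the short periods from this thread with a 10 minute period.
for period_s in (2, 5, 600):
    on_s = on_time_seconds(period_s, 1)
    print(f"period {period_s:>3} s at 1% duty -> {on_s:.2f} s on")
```

With a 2 s or 5 s period the on-time is a few hundredths of a second, which is likely shorter than a solenoid valve needs to open. With a 600 s period the same 1% gives a 6 s pulse.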

I think you will have much better results if you increase your PWM period to 5 or 10 minutes and perhaps add a bit of filtering in the setpoint.

I will still try to reproduce the bug by copying your settings. The firmware should handle even this bad configuration without major issues.

Yes, all the SSRs are connected over OneWire. One of the DS2413s is even connected via OneWire from another DS2413. It seems to work fine (except for this issue :slight_smile:)

I’ll try to increase the PWM period and filtering to see if it helps. Thanks a lot for your help!

I have found a possible cause and implemented a fix. This is a small change to how PWM toggling is calculated and limits how much a previous period can affect the new period.
When an actuator was held back by the mutex, it would cause a long low period. A long high period would follow to compensate, but this is often not desirable.
With the new code, the startup after being held back is normal and not too aggressive.
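The idea of limiting how much a previous period can affect the next one can be sketched roughly as below. This is only an illustration and not the actual firmware code: the function `next_high_time`, the `owed_high_s` bookkeeping and the `max_carry_fraction` parameter are all assumptions made up for this example.

```python
from typing import Optional


def next_high_time(period_s: float, duty: float, owed_high_s: float,
                   max_carry_fraction: Optional[float] = None) -> float:
    """Planned high time for the next PWM period (hypothetical model).

    owed_high_s: high time that earlier periods fell short (positive)
                 or overshot (negative), e.g. while blocked by a mutex.
    max_carry_fraction: if set, the correction is clamped to this
                 fraction of the period; None means no limit.
    """
    nominal = period_s * duty
    carry = owed_high_s
    if max_carry_fraction is not None:
        limit = period_s * max_carry_fraction
        carry = max(-limit, min(limit, carry))
    # High time can never be negative.
    return max(0.0, nominal + carry)


# 10 minute period, 1% duty, 120 s of high time "owed" after being held back.
print(next_high_time(600, 0.01, 120.0))        # unlimited compensation
print(next_high_time(600, 0.01, 120.0, 0.05))  # compensation clamped
```

Without a limit, the period after a long blocked stretch gets a long high pulse to catch up. With the clamp, the restart stays close to the nominal duty cycle.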

This fix is included in today’s firmware release. Thanks for reporting the issue. Let me know how the system responds with the new firmware and the 10 minute period.

@Elco I updated the firmware and adjusted PWM and filtering, but now the problem occurred again.

Attaching file with blocks : brewblox-blocks-spark-one_20.09.2019.json (9.3 KB)
FW :

PID :

PWM :

How long of actual cooling was that?
Checking your block config now.

Ah, you have your Ti set to 1 second!

That means that every second, the integral part of your output will grow by the proportional part!

The integrator is to correct steady state errors: when the temperature flatlines a bit under the setpoint, the integrator will pull it towards the setpoint slowly. 1s is not slowly.
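To illustrate how much Ti matters, here is a simplified sketch of an integrator in a standard-form PID, where the integral part grows by `p * dt / Ti` each step. It assumes a constant error and ignores anti-windup and output clamping, so it is not the Brewblox implementation. The function name `integral_after` and the example numbers are made up for illustration, and magnitudes are used to keep the signs simple.

```python
def integral_after(kp: float, error: float, ti_s: float,
                   elapsed_s: float, dt_s: float = 1.0) -> float:
    """Integral part of the output after elapsed_s of constant error.

    Simplified model: no anti-windup, no clamping (assumption).
    """
    p = kp * error          # proportional part
    integral = 0.0
    steps = int(elapsed_s / dt_s)
    for _ in range(steps):
        # With ti_s == dt_s == 1 this adds the full proportional part
        # every second.
        integral += p * dt_s / ti_s
    return integral


# Constant error of 0.1 degrees with Kp = 5, so p = 0.5, after one minute.
print(integral_after(5, 0.1, ti_s=1, elapsed_s=60))
print(integral_after(5, 0.1, ti_s=2 * 3600, elapsed_s=60))
```

After one minute the integral is 60 times the proportional part with Ti = 1 s, against a small fraction of it (about 0.004) with Ti = 2 h.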

You should set Ti to at least 2 hours. Try a Kp of 5, a Ti of 2h and a Td of 10m. Td should be the duration of an expected overshoot after a burst of cooling. Ti should be 2-3 times as long as the system takes to respond to a step.

All your PIDs have this wrong configuration looking at your export.

Aha! Thanks.
Like this? (-5 instead of 5 on Kp)

Yes, and indeed negative Kp for the cooler.
For the heater, use a Kp of 50. It has much less power than the glycol.