GPIO SPI Error and Temp Sensors Failing/Undiscoverable

Hey guys.

I’m running multiple Spark4s and for months things have been super smooth. However, 2 days ago one of two temp sensors connected to “CCT 21/22 Spark” lost connection with the error “OneWire Sensor could not be read.” I thought maybe the sensor itself had crapped out so I borrowed one from another Spark4 (BBT Spark) to see if I could replace it. No luck. I tried Discovering New OneWire Blocks but nothing was discovered. I power cycled the Spark4 and after doing so still didn’t see the sensor and had an error on the GPIO Module: SPI error. The other temp sensor on CCT 21/22 Spark still works.

To make things more wonky, I tried to return the borrowed sensor to BBT Spark and it doesn’t show up. I also tried to connect the originally wonky sensor from CCT 21/22 Spark to BBT Spark and it also isn’t discovered.

I have tried resetting everything, Stopping and Starting all services, and power cycling everything. I also removed all blocks on BBT Spark and tried discovery again. No joy.

Hoping you can help and that I haven’t fried 2 temp sensors, and/or my GPIO module.

I tried to perform a coredump, but I got the following error, which is new to me (I used to run the server on a Pi and am now on a Linux PC):

A fatal error occurred: Could not connect to an Espressif device on any of the 0 available serial ports.
Command 'sudo -E env "PATH=$PATH" esptool.py --chip esp32 --baud 115200 read_flash 0xA10000 81920 coredump.bin' returned non-zero exit status 2.

If a single OneWire sensor dies, it will block all other sensors from being read. If you plug in a single sensor, does it then work?

The coredump error suggests that the Spark is not connected over USB to the server.
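For anyone hitting the same esptool message: "0 available serial ports" means the tool found no serial device at all on the host. Below is a minimal sketch for checking this on a Linux server before retrying the coredump. It assumes the Spark shows up as /dev/ttyUSB* or /dev/ttyACM*, which are the usual names for USB serial devices on Linux; the exact name on your system may differ, and the function name is just illustrative.

```python
import glob

# Assumed device name patterns for USB serial devices on Linux.
SERIAL_PATTERNS = ("/dev/ttyUSB*", "/dev/ttyACM*")


def find_serial_ports(patterns=SERIAL_PATTERNS):
    """Return a sorted list of device paths matching the given glob patterns."""
    ports = []
    for pattern in patterns:
        ports.extend(glob.glob(pattern))
    return sorted(ports)


if __name__ == "__main__":
    ports = find_serial_ports()
    if ports:
        print("Serial ports found:", ", ".join(ports))
    else:
        print("No serial ports found: check the USB cable between the Spark and the server.")
```

If the list is empty, the coredump command has nothing to talk to, which matches the error above.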

Also on the system block, what are the voltages reported?

CCT 21/22 Spark
Internal: 5.135V
External: 24.403V

BBT Spark
Internal: 5.119V
External: 24.423V

Will try for a coredump via USB shortly. In the meantime:

If a single OneWire sensor dies, it will block all other sensors from being read. If you plug in a single sensor, does it then work?

A single sensor works on both CCT and BBT Sparks still. The potentially broken one isn’t read on either, nor is the one that was working on the BBT. Is it accurate then that if a OneWire sensor dies, it will block “new” sensors from being read? That could explain what I’m seeing. On the CCT Spark the sensor may have died, making it unable to read a working sensor. I then tried plugging the potentially broken sensor into the BBT Spark while the working sensor was disconnected. When I reconnect the working sensor it isn’t readable, so the broken one may have blocked the BBT Spark’s ability to read other sensors. Again, both BBT and CCT Sparks have one working sensor connected.

If this is the case, is there something I can/should do to make the GPIO units able to read working sensors?

To be more precise: if a OneWire device fails, the OneWire bus master will be unable to interact with any connected devices. This includes reading existing sensors and discovering new ones.

However, we have a OneWire bus per GPIO module. If a broken sensor is connected to module 1, it will not block module 2.
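To illustrate the behavior described above, here is a toy model, not actual firmware code. The class name, sensor names, and the "healthy" flag are all hypothetical; the only things taken from the explanation are that each GPIO module has its own bus and that one failed device blocks everything on that bus.

```python
from dataclasses import dataclass, field


@dataclass
class OneWireBus:
    """Hypothetical model of the single OneWire bus on one GPIO module."""

    # Maps a (made-up) device name to True if the device is healthy.
    devices: dict = field(default_factory=dict)

    def readable(self):
        # One failed device blocks the bus master for every device on this bus,
        # so nothing can be read or discovered until it is unplugged.
        if not all(self.devices.values()):
            return []
        return sorted(self.devices)


module_1 = OneWireBus({"sensor-a": True, "sensor-b": False})  # sensor-b is broken
module_2 = OneWireBus({"sensor-c": True})

print(module_1.readable())  # []
print(module_2.readable())  # ['sensor-c']
```

In this model, unplugging the broken device from module 1 makes the remaining sensor readable again, while module 2 is never affected.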

Would this explain the symptoms, or are the issues position-independent?

Here’s the coredump from CCT 21/22 Spark.

https://termbin.com/4wss

I can add that both the Spark and the GPIO Module feel more than warm and a bit less than hot to the touch. I don’t expect that’s supposed to be the case.

Would this explain the symptoms, or are the issues position-independent?

It seems that the issues are position-independent. I moved the working sensor to different ports on the GPIO module on CCT 21/22 and it still works. I hesitate to mess around with the working sensor on BBT Spark as we need it to work and I don’t wanna mess it up as well.

Hey guys,
Any thoughts on this? I’m starting to think the GPIO module crapped out. Before ordering a new one (and sensors) I’d be interested in knowing your thoughts.

Thanks for your patience - it took a while to get around to looking into it. (parsed coredump output: https://termbin.com/araf)

It looks like you faced a heap memory overflow. The upcoming release already includes some improvements that drastically reduce memory usage when you have a lot of blocks. Could you please export and upload the blocks on your controller so we can check whether this theory is plausible for your setup?

Hey Bob,

Thanks for taking the time. Attached you can find the blocks. At the moment the one working sensor is shared between the two setpoint blocks as there isn’t another working sensor. Obviously this isn’t how it would be if I had both sensors working.

We don’t have a ton of blocks, but you’ll let me know what you think.

Is this what would cause the SPI error?

blocks_spark-two_2023-10-27_11-44.json (10.8 KB)

The number and type of blocks is nowhere near enough to cause a heap overflow on its own. The error itself is located in the network stack. It’s very unlikely to be caused by a faulty GPIO module.

If I understand correctly, the current state is:

  • Both CCT and BBT Spark have a single GPIO module
  • sensor 1 → still plugged in on CCT, works fine
  • sensor 2 → was plugged in on CCT, now unreadable on both CCT and BBT
  • sensor 3 → was plugged in on BBT, now unreadable on both CCT and BBT
  • sensor 4 → still plugged in on BBT, works fine

I’m indeed leaning towards suspecting a hardware failure, but @Elco is the expert on that.

Hey Bob,

It’s very unlikely to be caused by a faulty GPIO module.

Is this to say you think the problem is with the Spark 4? I was really hoping it was the GPIO module, as that is significantly cheaper to replace.

A small amendment/update to your summary of the current state:

Sensor 3 is working again on the BBT (not the CCT). I didn’t do anything in particular to get it to work again, but maybe a power cycle here and there helped… So the current state is:

  • Both CCT and BBT Spark have a single GPIO module
  • sensor 1 → still plugged in on CCT, works fine
  • sensor 2 → was plugged in on CCT, now unreadable on both CCT and BBT
  • sensor 3 → was plugged in on BBT, unreadable on CCT, was unreadable on BBT after connection to CCT but is now miraculously readable again on BBT
  • sensor 4 → still plugged in on BBT, works fine

Is there anything I can do on my side to help in determining if this is a hardware failure?

Hey guys,

Just checking in again to see if there is anything I can do on my side to check the hardware side of things. It’s getting kind of time critical that I find a solution as I’m relying on Inkbirds for my temp control and we’ve had some bad experiences with them in the past.

Thanks in advance.

Hey @Bob_Steers and @Elco,

Following up again on this as it feels it has fallen through the cracks. It’s quite disappointing as this forum has traditionally been exceptionally helpful and speedy in troubleshooting issues.

I am hesitant to hook up a GPIO module from another Spark in case it gets “blown up” by the CCT Spark. If you think this would be a safe test, I’m glad to try it.

The Spark and the GPIO module are built to withstand both overvoltage and overcurrent safely. I’m really not qualified to predict scenarios where this protection is insufficient.

I pinged Elco again to look at this. He has been swamped in hardware work for the last few weeks. Our apologies for the delay.

This can use some clarification: the heap overflow spotted in the logs does not match the other symptoms. It is most likely unrelated.

Hi Henri,

We’ll just ship you a new IO module to try. Should I also include sensors just to be sure?
It is unlikely that the Spark 4 itself is the issue.

Best regards,

Elco

Thanks for the reply Elco,

I think new sensors would be smart, just in case something goes wrong. Should I email you the address to send the unit and sensors to?

Yes, and include how many and which sensors :slight_smile:

Good afternoon @Elco. I sent an email to what I believe is your email address on November 30th but haven’t heard anything back yet. In case I have the wrong email, could you possibly email me directly via the contact info attached to my BrewPi Community account? I don’t know what the rules/etiquette for sharing email addresses in this forum are, so I don’t want to overstep somehow.

I can’t find an email in any of my inboxes. You could also send a PM on the forum. I’ll give you a call tomorrow.