I’m running multiple Spark4s and for months things have been super smooth. However, 2 days ago one of two temp sensors connected to “CCT 21/22 Spark” lost connection with the error “OneWire Sensor could not be read.” I thought maybe the sensor itself had crapped out so I borrowed one from another Spark4 (BBT Spark) to see if I could replace it. No luck. I tried Discovering New OneWire Blocks but nothing was discovered. I power cycled the Spark4 and after doing so still didn’t see the sensor and had an error on the GPIO Module: SPI error. The other temp sensor on CCT 21/22 Spark still works.
To make things more wonky, I tried to return the borrowed sensor to BBT Spark and it doesn’t show up. I also tried to connect the originally wonky sensor from CCT 21/22 Spark to BBT Spark and it also isn’t discovered.
I have tried resetting everything, Stopping and Starting all services, and power cycling everything. I also removed all blocks on BBT Spark and tried discovery again. No joy.
Hoping you can help and that I haven’t fried 2 temp sensors, and/or my GPIO module.
I tried to perform a coredump, but I got the following error, which is new to me (I used to run a Pi as a server and am now on a Linux PC):
A fatal error occurred: Could not connect to an Espressif device on any of the 0 available serial ports.
Command 'sudo -E env "PATH=$PATH" esptool.py --chip esp32 --baud 115200 read_flash 0xA10000 81920 coredump.bin' returned non-zero exit status 2.
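The “0 available serial ports” part of the error suggests that the host did not see any USB serial device when the command ran. Below is a minimal sketch for checking this on a Linux host before retrying. It assumes the controller would show up as /dev/ttyUSB* or /dev/ttyACM*; those patterns and the function name find_serial_ports are illustrative assumptions, not part of esptool or Brewblox.

```python
import glob

# Device name patterns commonly used on Linux for USB serial adapters.
# Assumption: the controller appears under one of these names.
SERIAL_PATTERNS = ("/dev/ttyUSB*", "/dev/ttyACM*")


def find_serial_ports(patterns=SERIAL_PATTERNS):
    """Return a sorted list of device paths matching the given glob patterns."""
    ports = []
    for pattern in patterns:
        ports.extend(glob.glob(pattern))
    return sorted(ports)


if __name__ == "__main__":
    ports = find_serial_ports()
    if ports:
        print("Serial ports found:", ", ".join(ports))
    else:
        print("No serial ports found; check the USB cable and connection.")
```

If the list is empty, the read_flash command has nothing to connect to, which would match the error above.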
Will try for a coredump via USB shortly. In the meantime:
If a single OneWire sensor dies, it will block all other sensors from being read. If you plug in a single sensor, does it then work?
A single sensor still works on both the CCT and BBT Sparks. The potentially broken one isn’t read on either, nor is the one that was working on the BBT. Is it accurate then that if a OneWire sensor dies, it will block “new” sensors from being read? That could explain what I’m seeing. On the CCT Spark the sensor may have died, making it unable to read a working sensor. I then tried plugging the potentially broken sensor into the BBT Spark while the working sensor was disconnected. When I reconnect the working sensor it isn’t readable, so the broken one may have blocked the BBT Spark’s ability to read other sensors. Again, both BBT and CCT Sparks have one working sensor connected.
If this is the case, is there something I can/should do to make the GPIO units able to read working sensors?
To be more precise: if a OneWire device fails, the OneWire bus master will be unable to interact with any connected devices. This includes reading existing sensors and discovering new ones.
However, we have a OneWire bus per GPIO module. If a broken sensor is connected to module 1, it will not block module 2.
Would this explain the symptoms, or are the issues position-independent?
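The bus behaviour described above can be sketched as a small model. This is an illustration only, not the firmware’s actual logic: the class names Sensor and GpioModule and the faulty flag are hypothetical, and the model assumes that any failed device makes its whole bus unreadable.

```python
from dataclasses import dataclass, field


@dataclass
class Sensor:
    name: str
    faulty: bool = False


@dataclass
class GpioModule:
    """Hypothetical model: each GPIO module has its own OneWire bus."""
    name: str
    sensors: list = field(default_factory=list)

    def readable_sensors(self):
        # Assumption: one failed device blocks every device on the same bus.
        if any(sensor.faulty for sensor in self.sensors):
            return []
        return [sensor.name for sensor in self.sensors]


module_1 = GpioModule("module 1", [Sensor("A"), Sensor("B", faulty=True)])
module_2 = GpioModule("module 2", [Sensor("C")])

print(module_1.readable_sensors())  # [] : the faulty sensor blocks sensor A too
print(module_2.readable_sensors())  # ['C'] : a separate bus is unaffected
```

In this model, unplugging the faulty sensor from module 1 would make sensor A readable again.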
I can add that both the Spark and the GPIO Module feel more than warm and a bit less than hot to the touch. I don’t expect that’s supposed to be the case.
Would this explain the symptoms, or are the issues position-independent?
It seems that the issues are position-independent. I moved the working sensor to different ports on the GPIO module on CCT 21/22 and it still works. I hesitate to mess around with the working sensor on BBT Spark as we need it to work and I don’t wanna mess it up as well.
Hey guys,
Any thoughts on this? I’m starting to think the GPIO module crapped out. Before ordering a new one (and sensors) I’d be interested in knowing your thoughts.
Thanks for your patience - it took a while to get around to looking into it. (parsed coredump output: https://termbin.com/araf)
It looks like you faced a heap memory overflow. The upcoming release already includes some improvements that drastically reduce memory usage when you have a lot of blocks. Could you please export and upload the blocks on your controller so we can check whether this theory is plausible for your setup?
Thanks for taking the time. Attached you can find the blocks. At the moment the one working sensor is shared between the two setpoint blocks as there isn’t another working sensor. Obviously this isn’t how it would be if I had both sensors working.
We don’t have a ton of blocks, but let me know what you think.
The number and type of blocks is nowhere near enough to cause a heap overflow on its own. The error itself is located in the network stack. It’s very unlikely to be caused by a faulty GPIO module.
If I understand correctly, the current state is:
Both CCT and BBT Spark have a single GPIO module
sensor 1 → still plugged in on CCT, works fine
sensor 2 → was plugged in on CCT, now unreadable on both CCT and BBT
sensor 3 → was plugged in on BBT, now unreadable on both CCT and BBT
sensor 4 → still plugged in on BBT, works fine
I’m indeed leaning towards suspecting hardware failure, but @Elco is the expert on that.
It’s very unlikely to be caused by a faulty GPIO module.
Is this to say you think it is on the Spark 4? I was really hoping it was the GPIO module as that is significantly cheaper to replace.
A small amendment/update to your summary of the current state:
Sensor 3 is working again on the BBT (not the CCT). I didn’t do anything in particular to get it to work again, but maybe a power cycle here and there helped… So the current state is:
Both CCT and BBT Spark have a single GPIO module
sensor 1 → still plugged in on CCT, works fine
sensor 2 → was plugged in on CCT, now unreadable on both CCT and BBT
sensor 3 → was plugged in on BBT, unreadable on CCT, was unreadable on BBT after connection to CCT but is now miraculously readable again on BBT
sensor 4 → still plugged in on BBT, works fine
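To keep track of swap tests like these, the state above can be written down as data and grouped per sensor. This is only a bookkeeping sketch: the function name classify and the labels are made up for illustration, and the output is not a hardware diagnosis.

```python
# Latest observations from the summary above: (sensor, controller) -> readable?
# Sensor 3 on BBT is recorded in its most recent state (readable again).
observations = {
    ("sensor 1", "CCT"): True,
    ("sensor 2", "CCT"): False,
    ("sensor 2", "BBT"): False,
    ("sensor 3", "CCT"): False,
    ("sensor 3", "BBT"): True,
    ("sensor 4", "BBT"): True,
}


def classify(observations):
    """Group results per sensor and attach a descriptive label to each."""
    per_sensor = {}
    for (sensor, controller), readable in observations.items():
        per_sensor.setdefault(sensor, {})[controller] = readable

    labels = {}
    for sensor, results in per_sensor.items():
        if all(results.values()):
            labels[sensor] = "readable wherever tested"
        elif not any(results.values()):
            labels[sensor] = "unreadable everywhere tested"
        else:
            labels[sensor] = "depends on controller"
    return labels


for sensor, label in sorted(classify(observations).items()):
    print(f"{sensor}: {label}")
```

A sensor that is unreadable everywhere it was tested is the first candidate for replacement, while a result that depends on the controller points more towards the controller or module side. Both are hints to guide further testing, not conclusions.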
Is there anything I can do on my side to help in determining if this is a hardware failure?
Just checking in again to see if there is anything I can do on my side to check the hardware side of things. It’s getting kind of time critical that I find a solution as I’m relying on Inkbirds for my temp control and we’ve had some bad experiences with them in the past.
Following up again on this as it feels it has fallen through the cracks. It’s quite disappointing as this forum has traditionally been exceptionally helpful and speedy in troubleshooting issues.
I am hesitant to hook up a GPIO module from another Spark in case it gets “blown up” by the CCT Spark. If you think this would be a safe test, I’m glad to try it.
The Spark and the GPIO module are built to withstand both overvoltage and overcurrent safely. I’m really not qualified to predict scenarios where this protection is insufficient.
I pinged Elco again to look at this. He has been swamped in hardware work for the last few weeks. Our apologies for the delay.
This can use some clarification: the heap overflow spotted in the logs does not match the other symptoms. It is most likely unrelated.
Good afternoon @Elco. I sent an email to what I believe is your email address on November 30th but haven’t heard anything back yet. In case I have the wrong email, could you possibly email me directly via the contact info attached to my BrewPi Community account? I don’t know what the rules/etiquette of sharing email addresses in this forum are, so I don’t want to overstep somehow.