Spark v3 drops wifi

adempewolff · January 3, 2018, 3:58pm

I’m running the first test of my fermentation chamber, which is controlled by a Spark 3. I started the test run around midnight, and a little less than 9 hours later the Pi lost contact with the Spark and could not reconnect.

 Jan 03 2018 08:56:33   Error: controller is not responding anymore. Exiting script.
 Jan 03 2018 08:57:04   Notification: Script started for beer 'Constant water (start: ~23C) test'
 Jan 03 2018 08:57:04   Connecting to controller...
 Jan 03 2018 08:57:04   Opening serial port
 Jan 03 2018 08:57:36   Errors while opening serial port:
Could not open port socket://192.168.86.32:6666: [Errno 113] No route to host

When I noticed the problem (around 10am) I checked my wifi router and saw that the Spark was not connected. I went to look at the Spark and it was running, appeared to be continuing to run the control algorithm, and the green LED was breathing (which I thought meant that the Photon was connected to the Internet but not the cloud).

Restarting the brewpi spark made it reconnect to the wifi, and the pi was able to connect. Interestingly, after reconnecting it made a strange decision to lower the fridge setting briefly before reverting to the original trend a few minutes later.

I have a few questions:

If this keeps happening, how can I collect sufficient information to debug the problem (i.e. logs from the Spark 3)?
How does the Pi deal with logging outages? I notice that it correctly inferred that the heating remained on throughout the entire wifi outage, despite the Spark booting directly to an idle state (when it presumably sent the first new datapoint for logging).
Why the sudden change in fridge setting after the reboot? Was this a product of the Spark (and thus the algorithm) rebooting, or a product of reconnecting to the Pi for the first time in 90 minutes?

Thanks in advance for any help or advice!

Best,
Austin

Elco · January 4, 2018, 8:01pm

I think I finally found a fix for the WiFi connectivity issues today. I’ll do a bit more testing and then release a new version.

The change in fridge setpoint is because of the reset: the integrator has to regrow to its stable value. I don’t think it heated continuously in this case; the graph just interpolates.
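To illustrate the integrator effect, here is a minimal sketch of a discrete PI controller that loses its integral term on a reboot. The gains, temperatures, and the `fridge_setting` helper are made up for illustration and are not taken from the BrewPi firmware; the error is also held constant, which a real closed loop would not do.

```python
class PI:
    """Minimal discrete PI controller (illustrative gains, not BrewPi's)."""

    def __init__(self, kp, ki):
        self.kp = kp
        self.ki = ki
        self.integral = 0.0  # this accumulated value is lost on a reboot

    def update(self, setpoint, measured, dt=1.0):
        error = setpoint - measured
        self.integral += self.ki * error * dt
        return self.kp * error + self.integral


def fridge_setting(pi, beer_setpoint, beer_temp):
    # Hypothetical helper: fridge setting = beer setpoint + PI correction.
    return beer_setpoint + pi.update(beer_setpoint, beer_temp)


# Run long enough for the integral part to build up.
pi = PI(kp=2.0, ki=0.01)
before = [fridge_setting(pi, 23.0, 22.9) for _ in range(600)]

# "Reboot": a fresh controller starts with an empty integrator.
pi = PI(kp=2.0, ki=0.01)
after = [fridge_setting(pi, 23.0, 22.9) for _ in range(600)]

print("last setting before reboot: %.3f" % before[-1])
print("first setting after reboot: %.3f" % after[0])
print("setting once regrown:       %.3f" % after[-1])
```

In this sketch the first setting after the reboot is lower than the last one before it, and it climbs back as the integral accumulates again, which is the kind of brief excursion described above.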

adempewolff · January 5, 2018, 3:52am

Great news. Please let me know if there is anything I can do to help test the fix (or provide debugging info from the current firmware).

My spark lost wifi 3-4 more times over the past day. Interestingly, this morning it reconnected by itself after 2 hours. Unfortunately, the next time it disconnected in late afternoon I waited 4 hours and it still hadn’t reconnected by itself–so even if it sometimes might automatically reconnect, this behavior is inconsistent at best.

adempewolff · January 5, 2018, 6:30pm

I’ve been following the discussion(s) on the particle github and forums and your commits to the brewpi development branch. I’m excited to see if they work. I wonder if these changes will also make using the particle app to set up wifi less buggy… I seem to remember that it was often hanging when it tried and failed to connect to the cloud.

Elco · January 12, 2018, 9:57am

I have put version 0.5.4 online, which seems to resolve the wifi issues.
It is still tagged as pre-release, because I have to look at the update process.
Updating the system layer only works over DFU now.

You can already update by taking these steps:

Docker

Make sure the Spark has WiFi credentials and Internet access. This version requires an updated bootloader. The Spark will download this from the Particle cloud after the update. It will hang in safe mode until this has succeeded.
Log in to the Docker host via SSH (PuTTY). Do NOT go to the console in Portainer. You want the console of the host, not the container.
Connect your BrewPi Spark to the Raspberry Pi with USB.
Run the command below to put it in DFU mode, then use Ctrl-C to stop the container (it hangs).
Run the command again to restart the container. It now has access to the DFU device and updates the BrewPi Spark.

docker run -it --name brewpi-dfu --privileged -v ~/brewpi-data:/data --rm brewpi/brewpi-raspbian sudo python utils/flashDfu.py --tag=0.5.4 --noreset --autodfu

adempewolff · January 14, 2018, 12:26am

Great, I just updated as instructed and it looks like everything works (it’s breathing Cyan, so it’s connected to the cloud at least). Interestingly, when it went into listening mode the cooling PID relay turned on. Luckily the update only took a minute or so and the relay turned off as soon as the spark 3 rebooted. I’ll keep an eye on it over the next couple days and let you know if the wifi connectivity problems have indeed disappeared.

Thanks!

Arnt · January 16, 2018, 8:58am

Hi.
Should it be like this after updating to 0.5.4?

BrewPi: wifiChecker: Attempt 1 to reach 192.168.10.1 failed (ti. 16. jan. 09:10:11 +0100 2018)
BrewPi: wifiChecker: Attempt 1 to reach 192.168.10.1 failed (ti. 16. jan. 09:40:12 +0100 2018)
Jan 16 2018 09:43:28 Error: controller is not responding anymore. Exiting script.
Exception in thread Thread-1 (most likely raised during interpreter shutdown):
Traceback (most recent call last):
File "/usr/lib/python2.7/threading.py", line 810, in __bootstrap_inner
File "/usr/lib/python2.7/threading.py", line 763, in run
File "/home/brewpi/backgroundserial.py", line 87, in __listen_thread
File "/usr/local/lib/python2.7/dist-packages/serial/urlhandler/protocol_socket.py", line 140, in in_waiting
: 'NoneType' object has no attribute 'select'
Jan 16 2018 09:44:03 Notification: Script started for beer 'test 0.5.4'
Jan 16 2018 09:44:03 Connecting to controller...
Jan 16 2018 09:44:03 Opening serial port
Jan 16 2018 09:44:03 Checking software version on controller...
Jan 16 2018 09:44:03 Found BrewPi v0.5.4 build 0.5.4-0-gb18a0ebdf, running on a Particle p1 with a V3 shield on port socket://192.168.10.74:6666

bpascucci · January 17, 2018, 1:14am

Same for me. Had one occasion where the BrewPi would not reconnect and needed to be restarted. I got the same error as above tonight.

Jan 15 2018 19:17:39 Error: controller is not responding anymore. Exiting script.
Exception in thread Thread-1 (most likely raised during interpreter shutdown):
Traceback (most recent call last):
File "/usr/lib/python2.7/threading.py", line 810, in __bootstrap_inner
File "/usr/lib/python2.7/threading.py", line 763, in run
File "/home/brewpi/backgroundserial.py", line 87, in __listen_thread
File "/usr/local/lib/python2.7/dist-packages/serial/urlhandler/protocol_socket.py", line 140, in in_waiting
: 'NoneType' object has no attribute 'select'
Jan 15 2018 19:18:11 Notification: Script started for beer 'None'
Jan 15 2018 19:18:11 Connecting to controller...
Jan 15 2018 19:18:11 Opening serial port
Jan 15 2018 19:18:11 Checking software version on controller...
Jan 15 2018 19:18:11 Found BrewPi v0.5.4 build 0.5.4-0-gb18a0ebdf, running on a Particle p1 with a V3 shield on port socket://192.168.86.35:6666

adempewolff · January 17, 2018, 1:50am

Strange. I don’t see any such errors in my logs and it’s now gone 3 days without dropping wifi once since the upgrade (compared to before when it would drop several times a day and sometimes never reconnect).

Some clarifying questions:

You upgraded according to Elco’s instructions above and the upgrade appeared to work correctly but then you encountered these errors afterwards. Is that correct?
It looks like it ultimately does connect at 19:18:11 about 30 seconds after the initial error. Is that correct?
Did the error just occur once (i.e. the first time it tried connecting after the upgrade) or is it a recurring error? If the latter, how often does the error occur and under what circumstances (e.g. “every 10 minutes” or “the first time it connects after rebooting the spark” or “about once a day but with no apparent pattern”)?

Arnt · January 17, 2018, 7:09am

In my case it dropped wifi for some seconds and reconnected again once every 20 minutes to an hour until it couldn’t reconnect and had to be restarted.
Kept on like this for a couple of days.
All this after updating according to instructions.
I have my Spark in the garage (outdoor temperature in northern Norway, so it’s cold there). I moved it half a meter to a warmer place, in the same room as the router, and it has been running without errors for 12 hours now.

Arnt · January 17, 2018, 8:05pm

@elco: Hi,
I have found that the Spark won’t connect to wifi if, for example, there has been a power cut and the router restarts. It can try for hours to reconnect but can’t do so, and it has to be restarted to get a wifi connection again. This has occurred several times since I updated to v0.5.4. I have even seen it happen when the router is just restarting.
This is a problem for me, because I want to read the Spark remotely while I am away for work. I am away from home for 2 weeks at a time and don’t have the opportunity to restart the Spark.
I often ferment beer while I am away, since the beer usually ferments for 2 weeks.

adempewolff · January 17, 2018, 9:49pm

I tried recreating this by cycling the power on my router but didn’t see the same behavior (the spark reconnected immediately once the router was back up). Interestingly, the pi did not reconnect on its own and had to be manually restarted, but that sounds like a different problem from what you are experiencing.

Elco · January 17, 2018, 10:47pm

I think this might be a problem with the TCP server instead of WiFi. I have experienced this while testing and the device would still respond to ping, but not to TCP requests. This is an improvement over earlier versions in which WiFi would not be restored.

Could you confirm this? If that’s the case, I can try switching to a different TCP server implementation, because the one from Particle seems to have issues.
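One way to check this from the Pi (or any machine on the same network) is to go one step further than ping and try an actual TCP exchange. The following is a rough sketch using only Python's standard `socket` module; the `probe_tcp` name and its result labels are invented here, and what payload the firmware would answer is an assumption that is not specified in this thread.

```python
import socket


def probe_tcp(host, port, payload=b"", timeout=5.0):
    """Return 'refused', 'timeout', 'unreachable', 'silent', 'closed' or 'ok'."""
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except ConnectionRefusedError:
        return "refused"        # host answered, but nothing is listening on the port
    except socket.timeout:
        return "timeout"        # no answer to the connection attempt at all
    except OSError:
        return "unreachable"    # e.g. [Errno 113] No route to host
    with sock:
        if not payload:
            return "ok"         # handshake succeeded; no request was sent
        try:
            sock.sendall(payload)
            reply = sock.recv(1024)
        except socket.timeout:
            return "silent"     # connection accepted, but no reply within the timeout
        except OSError:
            return "closed"     # connection was reset while waiting
        return "ok" if reply else "closed"


# Example call (address and port taken from the log earlier in the thread):
# print(probe_tcp("192.168.86.32", 6666))
```

If ping succeeds while this returns anything other than "ok", that would point at the TCP server rather than at WiFi itself. A result of "unreachable" would instead match the original "No route to host" symptom.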

bpascucci · January 18, 2018, 12:44am

I can confirm that I was able to ping the Spark but it would not establish the socket connection.

Arnt · January 18, 2018, 9:03am

It happened again today when I had to reboot my router. I can’t reconnect to the Spark. Like @bpascucci, I am able to ping the Spark, but no socket connection is established.
How can I restart the TCP server without rebooting the RPi?

Elco · January 18, 2018, 9:17am

I don’t think restarting the Pi will work, does it? It is a problem in the BrewPi Spark.
I think this is a problem that I should solve in the code.

Arnt · January 18, 2018, 9:27am

No, restarting the Pi doesn’t help.

adempewolff · January 22, 2018, 8:23am

Okay, I can confirm that the spark has stopped responding to TCP requests (while continuing to stay connected to wifi and respond to pings) twice since I updated (about once every 2 days). The first time, I barely noticed because I was in between batches and cycling power to the spark regularly. This time I’ve got a batch in primary fermentation now, so I don’t dare cycle power, but I’ll try that once fermentation cools down and expect it should start accepting TCP connections again.

@Elco do let us know if there is anything we can do to help debug. Thanks.

Edit: restarting the wifi network and forcing the spark to reconnect to wifi did not resolve the problem. It was worth a try though…

overcookedTOFU · February 14, 2018, 3:49pm

@Elco, I am having the same problem with my BrewPi.

The script (running via docker) is unable to connect to the Spark via WiFi. The Spark will respond to ping and nmap suggests port 6666 is open on the Spark. Restarting the docker container or clicking “restart script” does not work. The only way to reconnect to the Spark is to power-cycle the device.

Is there any logging on the Spark TCP server that could be useful? Otherwise I agree, the problem sure seems like it is the TCP server on the Spark.

Please let me know if there is anything I can do to help debug. Thanks!

adempewolff · February 14, 2018, 4:42pm

I believe the working hypothesis is that the particle firmware removes all listeners (including TCP listeners) when Wifi resets and does not recreate them automatically. The solution is presumably to have the Brewpi code running on top of the boilerplate particle firmware check for a wifi reset in an intelligent way and recreate listeners when it happens. I’m guessing this is on Elco’s to-do list and we’ll see some sort of patch in the next couple months. I’m personally not rushing him because a) the problem is mostly just an inconvenience of occasionally losing logging but maintaining temperature control and b) it looks like his attention is currently focused on the next version of the whole controller/ui architecture which looks to be making really exciting progress. If I have to wait a couple months for a wifi fix but in the meantime get multi-process control I will be a very happy camper.
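For what it’s worth, the "check for a wifi reset and recreate listeners" idea could look roughly like the sketch below. It is written in Python purely to show the logic; the real firmware is not Python, and the `ListenerSupervisor` class and the three callbacks are hypothetical stand-ins for whatever the firmware actually exposes.

```python
class ListenerSupervisor:
    """Recreate a listener every time the network link comes back up."""

    def __init__(self, wifi_ready, start_listener, stop_listener):
        self.wifi_ready = wifi_ready          # callable returning True when the link is up
        self.start_listener = start_listener  # callable that (re)creates the TCP listener
        self.stop_listener = stop_listener    # callable that tears the listener down
        self.was_ready = False
        self.listening = False

    def poll(self):
        """Call this regularly from the main loop."""
        ready = self.wifi_ready()
        if ready and not self.was_ready:
            # Link just came up: assume any old listener is gone and make a new one.
            self.start_listener()
            self.listening = True
        elif not ready and self.was_ready and self.listening:
            # Link just dropped: release the stale listener.
            self.stop_listener()
            self.listening = False
        self.was_ready = ready


# Simulated link states: up, up, down, down, up again.
events = []
link = {"up": False}
supervisor = ListenerSupervisor(
    lambda: link["up"],
    lambda: events.append("start"),
    lambda: events.append("stop"),
)
for state in [True, True, False, False, True]:
    link["up"] = state
    supervisor.poll()

print(events)  # ['start', 'stop', 'start']
```

The point of tracking the previous state is that the listener is only recreated on a change from down to up, not on every pass through the loop.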