New V3 keeps rebooting

I wonder whether it is related to this long standing bug:



Which just got fixed in particle firmware 0.7.0-rc2

The green blinking stopping for 30 seconds before the hard fault is why I think it could maybe be related.

But if the photon is right next to it with the same wifi, that maybe rules it out.

@mdma, any ideas?

Could that be caused if I have multiple AP’s?

Good question and I don’t know. We might have uncovered a new bug in the particle stack. If there is a way to test this, that would be interesting info. Can temporarily switch one of the AP’s to a separate SSID?

i’ll temporary turn one of the AP’s off, signal will diminish but still be fair, just not good.

Now that is interesting, just turned one AP off and the V3 just rebooted at the same time and associated with the other AP.

I’ll leave it like that for a few days and let you know if it reboots at all.

It is now moving between the other AP’s, here is a list of where the P1 has been associated

As you can see it just associated with Landing and a little later with Log Cabin, just the it also rebooted.

Mobility Trail (Last 7 days)
 
Access Points 
Date/Time 
Log Cabin
Fri, 28 Jul 2017 18:56:13 GMT
Landing
Fri, 28 Jul 2017 18:50:10 GMT
Living
Fri, 28 Jul 2017 16:45:41 GMT
Landing
Fri, 28 Jul 2017 13:54:56 GMT
Living
Fri, 28 Jul 2017 13:46:54 GMT
Landing
Fri, 28 Jul 2017 13:44:55 GMT
Living
Thu, 27 Jul 2017 15:47:06 GMT
Landing
Thu, 27 Jul 2017 14:34:06 GMT

INFO:pySerial.socket:ignored port configuration change
INFO:pySerial.socket:ignored port configuration change
Jul 28 2017 19:51:09 Error: controller is not responding anymore. Exiting script.
Jul 28 2017 19:51:39 Notification: Script started for beer 'Cellar-01'
Jul 28 2017 19:51:39 Connecting to controller...
Jul 28 2017 19:51:39 Opening serial port
DEBUG:pySerial.socket:enabled logging
INFO:pySerial.socket:ignored port configuration change
INFO:pySerial.socket:ignored _update_dtr_state(True)
INFO:pySerial.socket:ignored _update_rts_state(True)
INFO:pySerial.socket:ignored reset_output_buffer
INFO:pySerial.socket:ignored reset_output_buffer
Jul 28 2017 19:51:39 Checking software version on controller... 
INFO:pySerial.socket:ignored port configuration change
INFO:pySerial.socket:ignored reset_output_buffer
INFO:pySerial.socket:ignored port configuration change
Jul 28 2017 19:51:39 Found BrewPi v0.5.2 build 0.5.2-0-g72e633171, running on a Particle p1 with a V3 shield on port socket://192.168.0.147:6666?logging=debug

INFO:pySerial.socket:ignored port configuration change
INFO:pySerial.socket:ignored port configuration change
Jul 28 2017 19:55:20 Serial Error: read failed: [Errno 104] Connection reset by peer)
DEBUG:pySerial.socket:enabled logging
INFO:pySerial.socket:ignored port configuration change
INFO:pySerial.socket:ignored _update_dtr_state(True)
INFO:pySerial.socket:ignored _update_rts_state(True)
INFO:pySerial.socket:ignored reset_output_buffer
DEBUG:pySerial.socket:enabled logging
INFO:pySerial.socket:ignored port configuration change
INFO:pySerial.socket:ignored _update_dtr_state(True)
INFO:pySerial.socket:ignored _update_rts_state(True)
INFO:pySerial.socket:ignored reset_output_buffer
Exception in thread Thread-1:
Traceback (most recent call last):
File "/usr/lib/python2.7/threading.py", line 801, in __bootstrap_inner
self.run()
File "/usr/lib/python2.7/threading.py", line 754, in run
self.__target(*self.__args, **self.__kwargs)
File "/home/brewpi/backgroundserial.py", line 110, in __listen_thread
if self.writeln("") > 0:
File "/home/brewpi/backgroundserial.py", line 56, in writeln
return self.write(data + "\n")
File "/home/brewpi/backgroundserial.py", line 59, in write
self.exit_on_fatal_error()
File "/home/brewpi/backgroundserial.py", line 75, in exit_on_fatal_error
self.stop()
File "/home/brewpi/backgroundserial.py", line 38, in stop
self.thread.join() # wait for background thread to terminate
File "/usr/lib/python2.7/threading.py", line 931, in join
raise RuntimeError("cannot join current thread")
RuntimeError: cannot join current thread

Jul 28 2017 19:55:20 Lost serial connection. Cannot write to serial
Terminating due to fatal serial error
Jul 28 2017 19:55:51 Notification: Script started for beer 'Cellar-01'
Jul 28 2017 19:55:51 Connecting to controller...
Jul 28 2017 19:55:51 Opening serial port
DEBUG:pySerial.socket:enabled logging
DEBUG:pySerial.socket:enabled logging
INFO:pySerial.socket:ignored port configuration change
INFO:pySerial.socket:ignored _update_dtr_state(True)
INFO:pySerial.socket:ignored _update_rts_state(True)
INFO:pySerial.socket:ignored reset_output_buffer
INFO:pySerial.socket:ignored reset_output_buffer
Jul 28 2017 19:55:52 Checking software version on controller... 
INFO:pySerial.socket:ignored port configuration change
INFO:pySerial.socket:ignored reset_output_buffer
INFO:pySerial.socket:ignored port configuration change
Jul 28 2017 19:55:52 Found BrewPi v0.5.2 build 0.5.2-0-g72e633171, running on a Particle p1 with a V3 shield on port socket://192.168.0.147:6666?logging=debug

INFO:pySerial.socket:ignored port configuration change
INFO:pySerial.socket:ignored port configuration change

Good info, feels like we are narrowing down the cause! Especially if you turn off one AP and it causes the SOS immediately. That sounds like a repeatable test case :slight_smile:, which will make fixing the bug a lot easier.

This is defiantly repeatable, I have just rebooted Log Cabin at 20:06 and

Jul 28 2017 20:06:48 Lost serial connection. Cannot write to serial

ends up in the logs

Do both access points have the same SSID?
The Spark can remember multiple SSIDs, and will remember 5 networks if I remember correctly.

To remove all access points from the memory of the Spark and start fresh, you can hold the setup button for about 10 seconds until you get fast blue flashing. But if you have multipe APs with the same credentials, that’s not the problem here.

If the Spark decides to connect to a different access point and gets a new IP, then it is of course normal that the connection is lost. A hard fault should still not occur of course, but maybe we could detect that and prevent the hard fault.

Yes, they all broadcast the same SSID’s with active roaming between them, I’ll check the DHCP reservation is correct but as you say it should just loose connection not hard fault.

Bit more investigation, I have moved the Spark next to me so I can see what happens when the Spark roams between AP’s

The hard fault does not happen during AP switch, but the Web interface does loose connection.

Jul 28 2017 20:42:51 Connecting to controller...
Jul 28 2017 20:42:51 Opening serial port
DEBUG:pySerial.socket:enabled logging
DEBUG:pySerial.socket:enabled logging
DEBUG:pySerial.socket:enabled logging
DEBUG:pySerial.socket:enabled logging
DEBUG:pySerial.socket:enabled logging
DEBUG:pySerial.socket:enabled logging
DEBUG:pySerial.socket:enabled logging
DEBUG:pySerial.socket:enabled logging
DEBUG:pySerial.socket:enabled logging
DEBUG:pySerial.socket:enabled logging
Jul 28 2017 20:43:01 Errors while opening serial port: 

This is happening continuously even though I can ping the Spark.

If I manually reboot the Spark it all comes back and is working again.

“The hard fault does not happen during AP switch, but the Web interface does loose connection.”

Do you mean the script? Does it hard fault soon after?

Yes sorry the script, and no the hard fault does not happen soon after, I have just left it after a AP switch and 15 mins later it still will not recover unless I power cycle the Spark.

Think we might have a couple of issues that seem to be related but might not be.

This is the relevant code on the Spark:

0.7.0-rc2 has some interesting bug fixes that could be related:

@elco would you like me to install the pre-release firmware?

That’s only the system firmware from particle. I don’t think our user firmware is compatible, because its compiled against 0.5.3.

I think it would be a good test, but I think I’ll need to recompile the firmware for you. I expect breaking changes in the particle system firmware update, so I expect this to be non-trivial.

I’m deep in financial figures for the quarterly tax report, so I won’t have time for that this weekend.

Ok I won’t bother, It does seem to not be V3 specific during this testing I didn’t look at my V2 Spark which is also setup for wi-fi and that to has lost connection until I have rebooted it.

I assume it was moving between the AP;s during this testing swell.

I’ll temporarly config a seperate SSID that is only broadcast on one AP until you find a solution/fix.

If you need me to do any more testing let me know.

Rich

Great, thanks for the tests! You gave me a lot of info and I should be able to reproduce the scenario by setting up an extra router myself.

Still getting the Hard Fault, This is with no devices plugged in as I had left it on in the living room still in Beer Constant mode.

There was not roaming of the AP since 2034 yesterday.

Terminating due to fatal serial error
Jul 29 2017 05:08:05 Notification: Script started for beer 'Cellar-01'
Jul 29 2017 05:08:05 Connecting to controller...
Jul 29 2017 05:08:05 Opening serial port
DEBUG:pySerial.socket:enabled logging
INFO:pySerial.socket:ignored port configuration change
INFO:pySerial.socket:ignored _update_dtr_state(True)
INFO:pySerial.socket:ignored _update_rts_state(True)
INFO:pySerial.socket:ignored reset_output_buffer
INFO:pySerial.socket:ignored reset_output_buffer
Jul 29 2017 05:08:05 Checking software version on controller... 
INFO:pySerial.socket:ignored port configuration change
INFO:pySerial.socket:ignored reset_output_buffer
INFO:pySerial.socket:ignored port configuration change
Jul 29 2017 05:08:05 Found BrewPi v0.5.2 build 0.5.2-0-g72e633171, running on a Particle p1 with a V3 shield on port socket://192.168.0.240:6666?logging=debug

INFO:pySerial.socket:ignored port configuration change
INFO:pySerial.socket:ignored port configuration change
Jul 29 2017 05:31:04 Serial Error: read failed: [Errno 104] Connection reset by peer)
DEBUG:pySerial.socket:enabled logging
INFO:pySerial.socket:ignored port configuration change
INFO:pySerial.socket:ignored _update_dtr_state(True)
INFO:pySerial.socket:ignored _update_rts_state(True)
INFO:pySerial.socket:ignored reset_output_buffer
DEBUG:pySerial.socket:enabled logging
INFO:pySerial.socket:ignored port configuration change
INFO:pySerial.socket:ignored _update_dtr_state(True)
INFO:pySerial.socket:ignored _update_rts_state(True)
INFO:pySerial.socket:ignored reset_output_buffer
Exception in thread Thread-1:
Traceback (most recent call last):
File "/usr/lib/python2.7/threading.py", line 801, in __bootstrap_inner
self.run()
File "/usr/lib/python2.7/threading.py", line 754, in run
self.__target(*self.__args, **self.__kwargs)
File "/home/brewpi/backgroundserial.py", line 110, in __listen_thread
if self.writeln("") > 0:
File "/home/brewpi/backgroundserial.py", line 56, in writeln
return self.write(data + "\n")
File "/home/brewpi/backgroundserial.py", line 59, in write
self.exit_on_fatal_error()
File "/home/brewpi/backgroundserial.py", line 75, in exit_on_fatal_error
self.stop()
File "/home/brewpi/backgroundserial.py", line 38, in stop
self.thread.join() # wait for background thread to terminate
File "/usr/lib/python2.7/threading.py", line 931, in join
raise RuntimeError("cannot join current thread")
RuntimeError: cannot join current thread

Jul 29 2017 05:31:05 Lost serial connection. Cannot write to serial
Terminating due to fatal serial error
Jul 29 2017 05:31:35 Notification: Script started for beer 'Cellar-01'
Jul 29 2017 05:31:35 Connecting to controller...
Jul 29 2017 05:31:35 Opening serial port
DEBUG:pySerial.socket:enabled logging
DEBUG:pySerial.socket:enabled logging
INFO:pySerial.socket:ignored port configuration change
INFO:pySerial.socket:ignored _update_dtr_state(True)
INFO:pySerial.socket:ignored _update_rts_state(True)
INFO:pySerial.socket:ignored reset_output_buffer
INFO:pySerial.socket:ignored reset_output_buffer
Jul 29 2017 05:31:36 Checking software version on controller... 
INFO:pySerial.socket:ignored port configuration change
INFO:pySerial.socket:ignored reset_output_buffer
INFO:pySerial.socket:ignored port configuration change
Jul 29 2017 05:31:37 Found BrewPi v0.5.2 build 0.5.2-0-g72e633171, running on a Particle p1 with a V3 shield on port socket://192.168.0.240:6666?logging=debug

INFO:pySerial.socket:ignored port configuration change
INFO:pySerial.socket:ignored port configuration change

@elco, further update.

This thing just rebooted even while in off mode, what does that tell us?

As you can see from the first line “Temperature control disable” then an hour later two reboots in quick succession.

Jul 29 2017 06:27:00 Notification: Temperature control disabled
Jul 29 2017 06:27:00 Controller debug message: INFO MESSAGE 12: Received new setting: mode = o
Jul 29 2017 07:30:25 Error: controller is not responding anymore. Exiting script.
Jul 29 2017 07:30:55 Notification: Script started for beer 'None'
Jul 29 2017 07:30:55 Connecting to controller...
Jul 29 2017 07:30:55 Opening serial port
DEBUG:pySerial.socket:enabled logging
INFO:pySerial.socket:ignored port configuration change
INFO:pySerial.socket:ignored _update_dtr_state(True)
INFO:pySerial.socket:ignored _update_rts_state(True)
INFO:pySerial.socket:ignored reset_output_buffer
INFO:pySerial.socket:ignored reset_output_buffer
Jul 29 2017 07:30:55 Checking software version on controller... 
INFO:pySerial.socket:ignored port configuration change
INFO:pySerial.socket:ignored reset_output_buffer
INFO:pySerial.socket:ignored port configuration change
Jul 29 2017 07:30:55 Found BrewPi v0.5.2 build 0.5.2-0-g72e633171, running on a Particle p1 with a V3 shield on port socket://192.168.0.240:6666?logging=debug

INFO:pySerial.socket:ignored port configuration change
INFO:pySerial.socket:ignored port configuration change
Jul 29 2017 07:31:14 Serial Error: socket disconnected)
DEBUG:pySerial.socket:enabled logging
DEBUG:pySerial.socket:enabled logging
Jul 29 2017 07:31:24 Lost serial connection. Error: Could not open port socket://192.168.0.240:6666?logging=debug: timed out)
Terminating due to fatal serial error
Jul 29 2017 07:31:54 Notification: Script started for beer 'None'
Jul 29 2017 07:31:54 Connecting to controller...
Jul 29 2017 07:31:54 Opening serial port
DEBUG:pySerial.socket:enabled logging
INFO:pySerial.socket:ignored port configuration change
INFO:pySerial.socket:ignored _update_dtr_state(True)
INFO:pySerial.socket:ignored _update_rts_state(True)
INFO:pySerial.socket:ignored reset_output_buffer
INFO:pySerial.socket:ignored reset_output_buffer
Jul 29 2017 07:31:54 Checking software version on controller... 
INFO:pySerial.socket:ignored port configuration change
INFO:pySerial.socket:ignored reset_output_buffer
INFO:pySerial.socket:ignored port configuration change
Jul 29 2017 07:31:55 Found BrewPi v0.5.2 build 0.5.2-0-g72e633171, running on a Particle p1 with a V3 shield on port socket://192.168.0.240:6666?logging=debug

INFO:pySerial.socket:ignored port configuration change
INFO:pySerial.socket:ignored port configuration change

That rules out a big part of the code. So it makes it more likely that it’s WiFi related. To rule out more, can you test with no devices connected and installed and in off mode?

You can do a factory reset by clicking the button at the bottom of advanced settings to go to that state.