Spark service continuously reconnecting after CommandTimeout() errors

I keep losing sync between the spark-one service and my brewpi.
Firmware was flashed yesterday, and I’ve reinstalled brewblox and the tilt integration from scratch to eliminate any prior config issues.

https://termbin.com/v3k3

Getting a lot of this sort of thing every 2-3 minutes…

spark-one_1  | 2021/02/24 17:54:08 INFO     ...evcon_spark.synchronization  Service synchronized!
spark-one_1  | 2021/02/24 17:55:11 ERROR    brewblox_service.repeater       <Broadcaster> error during runtime: CommandTimeout(ListObjectsCommand)
spark-one_1  | 2021/02/24 17:55:11 INFO     brewblox_devcon_spark.spark     Checking connection...
spark-one_1  | 2021/02/24 17:55:17 ERROR    ...blox_devcon_spark.commander  Error parsing message `7A0005BA|0000,010080FEFF81C4,02008000010A0C23002A000A473531383831381208636166356564646618063A083639356364626631420A323032302D31312D30324A0A323032302D31302D30380099,0300800101088FEAA31A10EBA5DA8106180128053001002C,0400800201008E,05008038010A08626161687977656828BEFFFFFFFFFFFFFFFF01320D3139322E3136382E342E323434000E,060080390108012040284000A2,0700803A010A1308021203DF2B351A08486561742050494470700A1308011203037CD51A08436F6F6C20504944706F0A1708031203D66C001A0C426565722053657474696E6758680A1408061203A3A3001A09426565722054656D7050650A17080412034C4C4C1A0C416D6269656E742054656D7050640A16080512034169E11A0B4672696467652054656D7050661218427265776572792038343535204B6567657261746F722031180100DD,13008040010A060A04080210010A0212000A021A000A042202080140020041,6400012E01088088082128FF7E5E9015012F0041,6500012E0108FF1F2128FF451C901503800035,6600012E010880800D2128FF1F8590150166002B,6700012F0110662880800A30EEFD0C38014080800A48015080C0025880800D00D1,6800012F011065288E4730FF1F3801408E4748015080C00258FF1F00E7,690001360110809F49004C,6A00013E01081310042A190A0408E0A7120A0410C0A9070A0B2209086910E0E5A401180100A6,6B00013E010813100118012A0E0A0C220A086910809F4918012001300100FD,6C00013301086A18C0EE6D386A40010058,6D00013301086B18904E2080803228808032386B4001488080320025,6E000137010A04108080050A080880F524108080050A080880DE341080E00520683096F3D9810600C3,6F000130010868106C28FF1F308E475801600168FFFF1870E0A80178880E8001BBA32888011D98018E67B0016CD8010600F3,70000130010868106D28FF1F308E473880803240808032580160016880803270E0A80178880E8001F8C65088013C98018E67B0016DD8010600DC` : ValueError(Unexpected message)
spark-one_1  | 2021/02/24 17:55:17 INFO     ...lox_devcon_spark.connection  HandshakeMessage(name='BREWBLOX', firmware_version='caf5eddf', proto_version='695cdbf1', firmware_date='2020-11-02', proto_date='2020-10-08', system_version='2.0.0-rc.1', platform='photon', reset_reason_hex='8C', reset_data_hex='02', device_id='23002A000A47353138383138', reset_reason='USER', reset_data='CBOX_RESET')
spark-one_1  | 2021/02/24 17:55:20 INFO     brewblox_service.repeater       <Broadcaster> resumed OK
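
To put a number on "every 2-3 minutes", a small script can pull the CommandTimeout timestamps out of a saved copy of the service log. This is only a rough sketch: it assumes every line has the same shape as the excerpt above (container name, `|`, timestamp, level, logger, message), and the helper names `timeout_times` and `gaps_in_seconds` are made up for illustration, not part of brewblox.

```python
import re
from datetime import datetime

# Assumed line shape, taken from the excerpt above:
# "<container> | YYYY/MM/DD HH:MM:SS LEVEL  <logger>  <message>"
LINE_RE = re.compile(
    r"\|\s+(\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2})\s+(\w+)\s+\S+\s+(.*)$"
)


def timeout_times(lines):
    """Return the timestamp of every ERROR line that mentions CommandTimeout."""
    times = []
    for line in lines:
        match = LINE_RE.search(line)
        if not match:
            continue  # skip anything that does not look like a log line
        stamp, level, message = match.groups()
        if level == "ERROR" and "CommandTimeout" in message:
            times.append(datetime.strptime(stamp, "%Y/%m/%d %H:%M:%S"))
    return times


def gaps_in_seconds(times):
    """Seconds between each pair of consecutive timeouts."""
    return [(later - earlier).total_seconds()
            for earlier, later in zip(times, times[1:])]


# The first three lines of the excerpt above, copied verbatim.
sample = [
    "spark-one_1  | 2021/02/24 17:54:08 INFO     ...evcon_spark.synchronization  Service synchronized!",
    "spark-one_1  | 2021/02/24 17:55:11 ERROR    brewblox_service.repeater       <Broadcaster> error during runtime: CommandTimeout(ListObjectsCommand)",
    "spark-one_1  | 2021/02/24 17:55:11 INFO     brewblox_devcon_spark.spark     Checking connection...",
]

found = timeout_times(sample)
print(len(found), "timeout(s) found; gaps (s):", gaps_in_seconds(found))
```

With only the excerpt there is a single timeout, so the list of gaps is empty. Passing in the lines of a longer log should show whether the interval is regular or random.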

Spotty wifi can trigger a soft hangup in the controller. It seems to deal badly with router channel switches, but we’re also aware of a bug in the Wifi network stack.
For the first, we recommend setting your router to a fixed channel. For the second, we’ll release a fix soon. If your Spark is close to the Pi, you may also just switch to USB for stability.

WiFi in that part of the basement is probably the “worst”, though it’s only about 10 ft from an access point with a wall in between… Signal strength for both devices has been between -68 and -58 dBm for the last 24+ hours. From the client perspective, there have been no roaming or channel changes, and zero Tx drops/retries for either device, though the AP is reporting a fair amount (60-90 packets/sec) of Rx drops in the 2.4 GHz band. Probably due to the low Tx power of the rpi2 and Spark WiFi chipset/antennae?

First, I’ll relocate the rpi running brewblox to see if it’s the rpi that’s having the wifi issues. If that doesn’t do it, I’ll see if I can relocate the in-ceiling access point. Moving the fermentation chamber and attached brewpi is harder than the ceiling tiles and structured wiring. :slight_smile:

Is there a thread or issue for the wifi stack bug I can link to and keep tabs on?

Likely, but unconfirmed. I’ve been looking into it, but right now my data is n=1 in a system without any problems. I live in an apartment, so no bad wifi coverage spots either.
We’re adding some wifi diagnostics to brewblox-ctl log in the next release, and packaging a tool people can run to collect wifi stats. We have no feasible way to collect this data for the controller though - memory constraints are much too severe.

I put up an issue for it with informal discussions from Slack.

I have just merged some fixes that seem to help.
It is running on our test setups now and will be released soon.

Wonderful. Hope the tests pass. Thanks Elco.