RPi stops responding periodically

Hello all,

I am running Brewblox on a Raspberry Pi 3 Model B Rev 1.2. Recently I have had issues with it no longer responding: the web interface dies and I cannot connect over SSH. This happens more frequently during updates using apt, or when updating with brewblox-ctl. From this I assume it is possibly the SD card, as this seems to be a common cause.

On StackExchange, https://raspberrypi.stackexchange.com/questions/49862/sd-card-related-random-freezes , someone commented that disabling swap helps. Would that negatively impact Brewblox at all?

Another suggestion was: "More practically you should try implementing the watchdog timer (included on the SOC) to detect failure and perform a graceful restart". I am not sure if that is something which can be enabled in Brewblox.

Any other ideas on how to troubleshoot what might be causing this? I can easily buy a new card, but I would like to confirm first that this will fix it. I wish the RPi had SATA ports, as I have a couple of spare SSDs :slight_smile:

The /var/log/syslog entries leading up to the point where it appeared to become unavailable, followed by those from when I then restarted it, are below:

Jun 22 20:59:02 fridgepi dhcpcd[511]: veth3a94e19: using IPv4LL address 169.254.193.216
Jun 22 20:59:02 fridgepi dhcpcd[511]: veth3a94e19: adding route to 169.254.0.0/16
Jun 22 20:59:02 fridgepi avahi-daemon[351]: Joining mDNS multicast group on interface veth3a94e19.IPv4 with address 169.254.193.216.
Jun 22 20:59:02 fridgepi avahi-daemon[351]: New relevant interface veth3a94e19.IPv4 for mDNS.
Jun 22 20:59:02 fridgepi avahi-daemon[351]: Registering new address record for 169.254.193.216 on veth3a94e19.IPv4.
Jun 22 20:59:04 fridgepi dhcpcd[511]: veth3a94e19: no IPv6 Routers available
Jun 22 21:00:55 fridgepi containerd[519]: time="2020-06-22T21:00:55.031207044+01:00" level=info msg="shim reaped" id=16f0cbf366fb04f6cc7de913a31c82857bff194c69ad1ece634621cb8ca45b52
Jun 22 21:00:55 fridgepi dockerd[521]: time="2020-06-22T21:00:55.042214846+01:00" level=info msg="ignoring event" module=libcontainerd namespace=moby topic=/tasks/delete type="*events.TaskDelete"
Jun 22 21:00:55 fridgepi dhcpcd[511]: veth3a94e19: carrier lost
Jun 22 21:00:55 fridgepi dhcpcd[511]: veth3a94e19: deleting address fe80::74f9:47e:790d:ec6f
Jun 22 21:00:55 fridgepi avahi-daemon[351]: Withdrawing address record for fe80::74f9:47e:790d:ec6f on veth3a94e19.
Jun 22 21:00:55 fridgepi avahi-daemon[351]: Leaving mDNS multicast group on interface veth3a94e19.IPv6 with address fe80::74f9:47e:790d:ec6f.
Jun 22 21:00:55 fridgepi avahi-daemon[351]: Joining mDNS multicast group on interface veth3a94e19.IPv6 with address fe80::c882:2ff:fec3:5b7b.
Jun 22 21:00:55 fridgepi avahi-daemon[351]: Interface veth3a94e19.IPv6 no longer relevant for mDNS.
Jun 22 21:00:55 fridgepi avahi-daemon[351]: Leaving mDNS multicast group on interface veth3a94e19.IPv6 with address fe80::c882:2ff:fec3:5b7b.
Jun 22 21:00:55 fridgepi avahi-daemon[351]: Interface veth3a94e19.IPv4 no longer relevant for mDNS.
Jun 22 21:00:55 fridgepi avahi-daemon[351]: Leaving mDNS multicast group on interface veth3a94e19.IPv4 with address 169.254.193.216.
Jun 22 21:00:55 fridgepi avahi-daemon[351]: Withdrawing address record for fe80::c882:2ff:fec3:5b7b on veth3a94e19.
Jun 22 21:00:55 fridgepi avahi-daemon[351]: Withdrawing address record for 169.254.193.216 on veth3a94e19.
Jun 22 21:00:55 fridgepi dhcpcd[511]: veth3a94e19: deleting route to 169.254.0.0/16
Jun 22 21:00:55 fridgepi systemd[1]: run-docker-netns-1b0bbb8d6ec1.mount: Succeeded.
Jun 22 21:00:55 fridgepi dhcpcd[511]: veth3a94e19: removing interface
Jun 22 21:00:55 fridgepi systemd[1]: var-lib-docker-containers-16f0cbf366fb04f6cc7de913a31c82857bff194c69ad1ece634621cb8ca45b52-mounts-shm.mount: Succeeded.
Jun 22 21:00:55 fridgepi systemd[1]: var-lib-docker-overlay2-91358d1b953db5560cd5643f4e4ff491b13c47bc20de1a2384a30b8b3ae65c07-merged.mount: Succeeded.
Jun 22 20:17:05 fridgepi fake-hwclock[112]: Mon 22 Jun 19:17:01 UTC 2020
Jun 22 20:17:05 fridgepi systemd-fsck[138]: e2fsck 1.44.5 (15-Dec-2018)
Jun 22 20:17:05 fridgepi systemd[1]: Started udev Coldplug all Devices.
Jun 22 20:17:05 fridgepi systemd[1]: Starting Helper to synchronize boot up for ifupdown...
Jun 22 20:17:05 fridgepi systemd[1]: Started Helper to synchronize boot up for ifupdown.
Jun 22 20:17:05 fridgepi systemd-fsck[138]: root: clean, 283538/1826816 files, 1999591/7303296 blocks
Jun 22 20:17:05 fridgepi systemd[1]: Started File System Check on Root Device.
Jun 22 20:17:05 fridgepi systemd[1]: Starting Remount Root and Kernel File Systems...
Jun 22 20:17:05 fridgepi systemd[1]: Started Remount Root and Kernel File Systems.
Jun 22 20:17:05 fridgepi systemd[1]: Condition check resulted in Rebuild Hardware Database being skipped.
Jun 22 20:17:05 fridgepi systemd[1]: Starting Flush Journal to Persistent Storage...
Jun 22 20:17:05 fridgepi systemd[1]: Starting Load/Save Random Seed...
Jun 22 20:17:05 fridgepi systemd[1]: Starting Create System Users...
Jun 22 20:17:05 fridgepi systemd[1]: Started Set the console keyboard layout.
Jun 22 20:17:05 fridgepi systemd[1]: Started Load/Save Random Seed.
Jun 22 20:17:05 fridgepi systemd[1]: Started Flush Journal to Persistent Storage.
Jun 22 20:17:05 fridgepi systemd[1]: Started Create System Users.
Jun 22 20:17:05 fridgepi systemd[1]: Starting Create Static Device Nodes in /dev...
Jun 22 20:17:05 fridgepi systemd[1]: Started Create Static Device Nodes in /dev.
Jun 22 20:17:05 fridgepi systemd[1]: Reached target Local File Systems (Pre).
Jun 22 20:17:05 fridgepi systemd[1]: Starting udev Kernel Device Manager...
Jun 22 20:17:05 fridgepi systemd[1]: Started udev Kernel Device Manager.
Jun 22 20:17:05 fridgepi systemd[1]: Found device /dev/serial1.
Jun 22 20:17:05 fridgepi systemd-udevd[152]: Using default interface naming scheme 'v240'.
Jun 22 20:17:05 fridgepi systemd[1]: Found device /dev/mmcblk0p6.
Jun 22 20:17:05 fridgepi systemd-udevd[155]: Using default interface naming scheme 'v240'.
Jun 22 20:17:05 fridgepi systemd-udevd[152]: Process '/usr/sbin/alsactl -E HOME=/run/alsa restore 0' failed with exit code 99.
Jun 22 20:17:05 fridgepi systemd[1]: Listening on Load/Save RF Kill Switch Status /dev/rfkill Watch.
Jun 22 20:17:05 fridgepi systemd[1]: Condition check resulted in FUSE Control File System being skipped.
Jun 22 20:17:05 fridgepi systemd[1]: Condition check resulted in Huge Pages File System being skipped.
Jun 22 20:17:05 fridgepi systemd[1]: Condition check resulted in Set Up Additional Binary Formats being skipped.
Jun 22 20:17:05 fridgepi systemd[1]: Condition check resulted in Rebuild Hardware Database being skipped.
Jun 22 20:17:05 fridgepi systemd[1]: Starting File System Check on /dev/mmcblk0p6...
Jun 22 20:17:05 fridgepi systemd[1]: Starting Load/Save RF Kill Switch Status...
Jun 22 20:17:05 fridgepi systemd[1]: Started Load/Save RF Kill Switch Status.
Jun 22 20:17:05 fridgepi systemd-fsck[273]: fsck.fat 4.1 (2017-01-24)
Jun 22 20:17:05 fridgepi systemd-fsck[273]: 0x41: Dirty bit is set. Fs was not properly unmounted and some data may be corrupt.
Jun 22 20:17:05 fridgepi systemd-fsck[273]: Automatically removing dirty bit.
Jun 22 20:17:05 fridgepi systemd-fsck[273]: Performing changes.
Jun 22 20:17:05 fridgepi systemd-fsck[273]: /dev/mmcblk0p6: 236 files, 104463/516188 clusters
Jun 22 20:17:05 fridgepi systemd[1]: Started File System Check on /dev/mmcblk0p6.
Jun 22 20:17:05 fridgepi systemd[1]: Mounting /boot...
Jun 22 20:17:05 fridgepi systemd[1]: Mounted /boot.
Jun 22 20:17:05 fridgepi systemd[1]: Reached target Local File Systems.
Jun 22 20:17:05 fridgepi systemd[1]: Starting Raise network interfaces...

Not sure if that helps.

Thanks,
Jerry
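
Two things in the excerpt above hint at an unclean restart: the timestamps jump from 21:00:55 back to 20:17:05 at the fake-hwclock line, which suggests the Pi came back up and restored its saved clock, and fsck reports that the dirty bit was set on /dev/mmcblk0p6. A minimal Python sketch for scanning a longer syslog for the same two patterns is shown here. The function names and the default year are made up for illustration, and it assumes the usual "Jun 22 20:59:02" timestamp prefix with English month names.

```python
import sys
from datetime import datetime


def parse_stamp(line, year=2020):
    """Parse the leading 'Jun 22 20:59:02' syslog timestamp, or return None.

    Syslog lines carry no year, so one is assumed (hypothetical default).
    """
    try:
        return datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")
    except ValueError:
        return None


def find_restart_hints(lines, year=2020):
    """Return (reason, line) pairs for lines that hint at an unclean restart."""
    hints = []
    previous = None
    for line in lines:
        line = line.rstrip("\n")
        stamp = parse_stamp(line, year)
        if stamp is None:
            continue  # skip anything that does not look like a syslog entry
        # A Pi has no battery-backed clock, so after a reboot the time is
        # restored from a saved value and the log can appear to go back in time.
        if previous is not None and stamp < previous:
            hints.append(("clock went backwards", line))
        # fsck.fat prints this when the partition was not unmounted cleanly.
        if "Dirty bit is set" in line:
            hints.append(("unclean unmount", line))
        previous = stamp
    return hints


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], errors="replace") as log:
        for reason, entry in find_restart_hints(log):
            print(f"{reason}: {entry}")
```

Run against /var/log/syslog (path passed as the first argument), the last entry before each "clock went backwards" hit is roughly where the system stopped logging. This only narrows down when the hang happened; it does not by itself confirm the SD card as the cause.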

@Bob_Steers just found the same recommendation to disable swap this week. So I'll let him weigh in on that.

It is possible to use a USB to SATA adapter and run the OS from the USB drive.

In general, I also recommend using a big SD card. A 64GB card will have a lot more free space to do wear leveling, and this will increase the life span. Each page on an SD card can only be erased/rewritten a limited number of times. When those cycles are exhausted, the card will go into read-only mode to prevent data loss. Or it might fail less neatly.

The network issues might also be related to container restarts, which seem to cause the entire network stack to reset.

I used the commands here to disable swap memory. Since then I've been keeping an eye on it to check its effect. So far I haven't seen any new hangups, so I'm inclined to be positive about it.
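
To confirm that swap is really off after a change like this, one option is to read the SwapTotal and SwapFree fields from /proc/meminfo. The following is a minimal Python sketch of that check, not part of Brewblox or brewblox-ctl; the function names are hypothetical.

```python
def swap_kb(meminfo_text):
    """Return (SwapTotal, SwapFree) in kB, parsed from /proc/meminfo text."""
    values = {}
    for line in meminfo_text.splitlines():
        key, _, rest = line.partition(":")
        if key in ("SwapTotal", "SwapFree"):
            # Lines look like "SwapTotal:       102396 kB"
            values[key] = int(rest.split()[0])
    return values.get("SwapTotal", 0), values.get("SwapFree", 0)


def swap_enabled(meminfo_text):
    """True if the kernel reports any swap space at all."""
    total, _free = swap_kb(meminfo_text)
    return total > 0


if __name__ == "__main__":
    try:
        with open("/proc/meminfo") as meminfo:
            text = meminfo.read()
    except FileNotFoundError:
        print("No /proc/meminfo found; run this on the Pi itself.")
    else:
        total, free = swap_kb(text)
        state = "enabled" if swap_enabled(text) else "disabled"
        print(f"swap {state}: total={total} kB, free={free} kB")
```

If swap is still reported as enabled after a reboot, the change probably did not persist.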

Do try it out, and please let us know if there are any issues. The watchdog idea also isn't bad. I'll look into that.
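
For reference, the general pattern behind a Linux hardware watchdog looks roughly like the sketch below: a process keeps writing to the watchdog device while the system is healthy, and if the writes stop, the timer expires and the board resets. This is only an illustration in Python, not something Brewblox does today. It assumes the watchdog is exposed as /dev/watchdog and that the script runs as root; the function name and the health check are hypothetical.

```python
import time


def pet_while_healthy(healthy, device="/dev/watchdog", interval=5.0, disarm=False):
    """Keep the watchdog fed for as long as healthy() returns True.

    Returns the number of times the watchdog was fed. Opening the device
    arms the timer. If disarm is False, the device is closed without the
    magic 'V' byte, so the timer is normally left running and resets the
    board once it expires.
    """
    pets = 0
    # Unbuffered binary writes, so every write reaches the driver immediately.
    with open(device, "wb", buffering=0) as watchdog:
        while healthy():
            watchdog.write(b"\0")  # any write restarts the countdown
            pets += 1
            time.sleep(interval)
        if disarm:
            # "Magic close": writing 'V' right before closing asks the driver
            # to stop the timer (drivers built with nowayout ignore this).
            watchdog.write(b"V")
    return pets
```

Note that a watchdog reset is a hard reset rather than a graceful restart, so it recovers from a hang but does not protect the SD card. systemd can also feed the hardware watchdog itself through the RuntimeWatchdogSec setting in /etc/systemd/system.conf, which may be simpler than a custom script.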

I have tried disabling swap and will see how it goes. Hopefully this will make things a little more stable, otherwise I may look into a USB to SATA interface - I think I have a few hanging around from old external drives... :slight_smile:

Thanks @Elco and @Bob_Steers for your responses, prompt as always. I do have to wonder if you two ever sleep, though. A number of times I have thought "It's late, I'll just post this now and check tomorrow" and you respond before I have managed to do anything else! It is appreciated, but surely you need some down-time, too!

Thanks for the idea of disabling swap! I already bought a new SD card and upgraded the power to the official Rpi adapter AND connected the Spark to 12V, but my Rpi 3 keeps crashing as well. Let's try this.

PS: Where can I see the current power levels of the Spark? I remember they were visible in Device info, but can't find them anymore.

The Spark Pins block shows actual voltage for the 5V/12V sources.
