kiwid restarts after assertion failure in support/coroutines.cpp [fixed in v1.184]

edited April 2018 in Problems Now Fixed
Hello,

Just found quite a few records in /var/log/messages, suggesting that the kiwid restarts after assertion failure in support/coroutines.cpp:

Apr 22 18:39:30 kiwisdr rsyslogd-2007: action 'action 17' suspended, next retry is Sun Apr 22 18:41:00 2018 [try http://www.rsyslog.com/e/2007 ]
Apr 22 18:40:36 kiwisdr kiwid: 00:26:37.291 0123    3  7020.00 kHz lsb z2  "x.x.x.x" Saratov, Russia (ARRIVED)
Apr 22 18:42:43 kiwisdr kiwid: 00:28:44.766 012.    3  7045.23 kHz lsb z12 "x.x.x.x" Saratov, Russia (LEAVING after 0:02:18)
Apr 22 18:42:43 kiwisdr rsyslogd-2007: action 'action 17' suspended, next retry is Sun Apr 22 18:44:13 2018 [try http://www.rsyslog.com/e/2007 ]
Apr 22 18:43:23 kiwisdr kiwid: 00:29:24.747 0123    3  7020.00 kHz lsb z0  "x.x.x.x" Businovo, Russia (ARRIVED)
Apr 22 18:51:20 kiwisdr kiwid: 00:37:20.887 0123      TaskSleepID: pending_sleep =========================================
Apr 22 18:51:20 kiwisdr rsyslogd-2007: action 'action 17' suspended, next retry is Sun Apr 22 18:52:50 2018 [try http://www.rsyslog.com/e/2007 ]
Apr 22 18:51:20 kiwisdr kiwid: 00:37:20.903 0123      assertion failed: "ct->sleeping == FALSE" support/coroutines.cpp line 1574
Apr 22 18:51:31 kiwisdr kernel: [164727.096403] bone_capemgr bone_capemgr: part_number 'BB-SPI0-01', version 'N/A'
Apr 22 18:51:31 kiwisdr kernel: [164727.096435] bone_capemgr bone_capemgr: slot #20: override
Apr 22 18:51:31 kiwisdr kernel: [164727.096452] bone_capemgr bone_capemgr: Using override eeprom data at slot 20
Apr 22 18:51:31 kiwisdr kernel: [164727.096467] bone_capemgr bone_capemgr: slot #20: 'Override Board Name,00A0,Override Manuf,BB-SPI0-01'
Apr 22 18:51:31 kiwisdr kernel: [164727.097242] device-tree: Duplicate name in spi@48030000, renamed to "spidev@0#13"
Apr 22 18:51:31 kiwisdr kernel: [164727.134509] bone_capemgr bone_capemgr: slot #20: dtbo 'BB-SPI0-01-00A0.dtbo' loaded; overlay id #16
Apr 22 18:51:31 kiwisdr kiwid: 00:00:00.569 ....      KiwiSDR v1.179 --------------------------------------------------------------------
Apr 22 18:51:31 kiwisdr kiwid: 00:00:00.573 ....      compiled: Apr 19 2018 15:23:17
Apr 22 18:51:31 kiwisdr kiwid: 00:00:00.577 ....      -debian 8
Apr 22 18:51:31 kiwisdr kiwid: 00:00:00.577 ....      /etc/debian_version 8.7
Apr 22 18:51:31 kiwisdr kiwid: 00:00:00.577 ....      background mode: delaying start 30 secs...
Apr 22 18:52:01 kiwisdr kiwid: 00:00:30.610 ....      reading configuration from file /root/kiwi.config/kiwi.json: 139 tokens
Apr 22 18:52:01 kiwisdr kiwid: 00:00:30.615 ....      reading configuration from file /root/kiwi.config/admin.json: 73 tokens
Apr 22 18:52:02 kiwisdr kiwid: 00:00:31.003 ....      serial number from EEPROM: 1247
Apr 22 18:52:02 kiwisdr kiwid: 00:00:31.124 ....      reading configuration from file /root/kiwi.config/dx.json: 6956 tokens
Apr 22 18:52:02 kiwisdr kiwid: 00:00:31.127 ....      883 dx entries
Apr 22 18:52:02 kiwisdr kiwid: 00:00:31.140 ....      listening on default port 8073/8073 for "openwebrx"
Apr 22 18:52:02 kiwisdr kiwid: 00:00:31.144 ....      webserver for "openwebrx" on port [::]:8073
Apr 22 18:52:02 kiwisdr kiwid: 00:00:31.378 ....      ### using SPI_DEV
Apr 22 18:52:03 kiwisdr kiwid: 00:00:32.529 ....      FPGA version 1

Looks like this is recurring issue:

root@kiwisdr:~# grep assertion\ failed /var/log/messages
Apr 22 06:35:36 kiwisdr kiwid: 00:24:16.124 0123      assertion failed: "ct->sleeping == FALSE" support/coroutines.cpp line 1574
Apr 22 09:49:01 kiwisdr kiwid: 03:13:14.778 012.      assertion failed: "ct->sleeping == FALSE" support/coroutines.cpp line 1574
Apr 22 11:49:21 kiwisdr kiwid: 02:00:08.680 0.23      assertion failed: "ct->sleeping == FALSE" support/coroutines.cpp line 1574
Apr 22 13:27:19 kiwisdr kiwid: 01:37:46.848 012.      assertion failed: "ct->sleeping == FALSE" support/coroutines.cpp line 1574
Apr 22 14:51:18 kiwisdr kiwid: 01:23:47.137 0123      assertion failed: "ct->sleeping == FALSE" support/coroutines.cpp line 1574
Apr 22 15:09:25 kiwisdr kiwid: 00:17:56.342 01..      assertion failed: "ct->sleeping == FALSE" support/coroutines.cpp line 1574
Apr 22 18:13:47 kiwisdr kiwid: 03:04:10.845 0123      assertion failed: "ct->sleeping == FALSE" support/coroutines.cpp line 1574
Apr 22 18:51:20 kiwisdr kiwid: 00:37:20.903 0123      assertion failed: "ct->sleeping == FALSE" support/coroutines.cpp line 1574
Apr 22 18:53:38 kiwisdr kiwid: 00:02:07.843 01..      assertion failed: "ct->sleeping == FALSE" support/coroutines.cpp line 1574

Anyone else got this, or is it just me?

Ivan,
UR5VIB

Comments

  • There was a bug fixed in the locking stuff recently. But I've never seen this assertion fail before. If you don't mind I'd like to add some debugging to your Kiwi and see what I can find out.

  • Sure, John, go ahead, please.

    Ivan,
    UR5VIB

  • jksjks
    edited April 2018
    Okay Ivan, I've got a fix running on your Kiwi and so far it looks good. I was able to write some test code for my Kiwi that caused the assertion failure to occur. So I'm pretty sure that the bug I fixed is really the source of the problem. I'm not sure why it happens so often with your Kiwi. The problem occurs when the GPS acquisition task is started and stopped as the number of user connections changes between zero and greater than zero. That is the only case that uses a special case of the coroutine task sleep/lock code.

    I'll include this fix in the next update.

    UR5VIB
  • Thanks!

    I replaced the failed GPS frontend IC (bought on Aliexpress, not sure about it's quality) a few months before, may be it's somehow related to this assertion ocurrence.

    Ivan,
    UR5VIB

  • jksjks
    edited April 2018
    No, that shouldn't be related. There was definitely a bug in the task lock code for a special case. It's just that the case only ever occurs when the GPS acquisition is enabled or disabled as users make SDR connections. I was able to trigger the bug on my own Kiwi by artificially increasing the rate of occurrence of the special case.

    UR5VIB
  • Hi John,

    Found another assertion failure in /var/log/messages:

    Apr 24 09:33:20 kiwisdr kiwid: 06:38:06.599 0123      TaskSleepID: pending_sleep =========================================
    Apr 24 09:35:22 kiwisdr kiwid: 06:40:08.656 0123      TaskSleepID: pending_sleep =========================================
    Apr 24 09:38:08 kiwisdr kiwid: 06:42:54.699 0123      TaskSleepID: pending_sleep =========================================
    Apr 24 09:58:13 kiwisdr kiwid: 07:02:59.194 0123      TaskSleepID: pending_sleep =========================================
    Apr 24 09:58:47 kiwisdr kiwid: 07:03:33.403 0123      TaskSleepID: pending_sleep =========================================
    Apr 24 10:00:01 kiwisdr kiwid: 07:04:47.478 0123      TaskSleepID: pending_sleep =========================================
    Apr 24 11:18:34 kiwisdr kiwid: 08:23:20.670 0123      ca_pause -1594 0xfffff9c6 ca_shift=5588 code_creep=45110 code_period_ms=1 code_period_samples=16368
    Apr 24 11:18:34 kiwisdr kiwid: 08:23:20.674 0123      assertion failed: "ca_pause <= 0xffff" gps/channel.cpp line 279

    This one is less frequent:

    root@kiwisdr:~# grep channel.cpp /var/log/messages
    Apr 23 03:27:38 kiwisdr kiwid: 01:25:36.151 0...      assertion failed: "ca_pause <= 0xffff" gps/channel.cpp line 276
    Apr 24 00:40:49 kiwisdr kiwid: 03:27:48.405 .1..      assertion failed: "ca_pause <= 0xffff" gps/channel.cpp line 276
    Apr 24 11:18:34 kiwisdr kiwid: 08:23:20.674 0123      assertion failed: "ca_pause <= 0xffff" gps/channel.cpp line 279

  • Thanks for reporting this. As you can see I added some additional debugging in the latest release (the "ca_pause -1594 ..." stuff).

    I added the assertion a little while ago to make sure a hardware (FPGA) constraint wasn't getting overrun. If it is then a sat that is acquired may not be tracked properly. And we see this behavior sometimes. But this assertion has never occurred in my testing (or at least I've never caught it happening). So if I can understand what's going on and fix a bug this might result in greater tracking successes which would be very helpful.

    UR5VIB
  • Thanks John!

    Ivan,
    UR5VIB
  • I think I understand this issue now. v1.184 has a fix and will no longer panic.

    Andrew's original code assumed an upper bound in the delay between starting an acquisition and when the "code creep" correction had to be applied at the start of tracking (code creep is a result of sat doppler shift). On his GPS-only system that assumption was completely reasonable. On the Kiwi this bound is much larger because there is much more going on because of the presence of the SDR and a slightly different ca_pause calculation is required.

Sign In or Register to comment.