4 of 5 Kiwis won't stay on the LAN

At the Northern Utah WebSDR we have five identical KiwiSDRs (using BBGs), all running WSPRDaemon with a couple slots on each available for public use, and as of a few days ago, something strange started happening that I've not seen before:

Units 2-5 will disappear from the network - cannot be pinged locally ("Destination Host Unreachable") until, on the router, each respective port is disabled and then re-enabled ("flashed"). They will immediately come back online after this - temporarily: They will then, after just a couple of minutes, drop off the network again in the same order that the Ethernet ports were "flashed".

Unit #1 seems to be working normally.

What I have done:

  • In the brief time that they are up, I looked at the /admin page logs and saw no obvious errors. Unfortunately, I don't really have enough time to poke around on the command line and look at the various logs before they go offline, although it should be possible if I knew which specific log to concentrate on.
  • Forced a reboot of the Beagle itself on each of the units - no change in behavior.
  • The site is quite remote and a physical removal of power has not yet been done. A site visit will not be possible until, perhaps, tomorrow.

I believe that all five units are running V1.451 with their Ethernet ports set to 10M FDX to minimize VHF QRM.

Any ideas?

TNX,

Clint, KA7OEI

Comments

  • To me, "flashed" is a very confusing term to be using when discussing network connections because I immediately think of "re-flashing" the software on the Beagle's onboard eMMC flash memory via sd card (or whatever). You're not talking about that, right?

    Does your router use DHCP to assign local ip addresses to the Kiwis? (e.g. an ip range like 192.168.x.x)

    Definitely cycle the power on all network equipment. I've seen so much strange behavior from consumer grade network gear it's just incredible. Recently around here people complained the WiFi went down at exactly 11pm interrupting their Netflix viewing (and long after I had gone to bed). A power cycle fixed it! Damn Huawei junk!!

    For remote sites I would really recommend installing a device that allows remote power cycling and watchdog power cycling (e.g. can't ping 1.1.1.1).
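
A minimal sketch of the watchdog idea above, assuming a Linux host with the iputils `ping` command. The `Watchdog` class, the failure threshold, and `power_cycle()` are hypothetical names for illustration; the actual switching would depend on whatever relay or smart power strip is installed.

```python
import subprocess
import time


def host_reachable(host: str, timeout_s: int = 2) -> bool:
    """Send one ICMP echo request (Linux iputils ping); True if a reply arrived."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


class Watchdog:
    """Counts consecutive failed probes and signals when a power cycle is due."""

    def __init__(self, max_failures: int = 5):
        self.max_failures = max_failures  # assumed threshold, tune per site
        self.failures = 0

    def update(self, reachable: bool) -> bool:
        """Record one probe result; return True when a power cycle should happen."""
        if reachable:
            self.failures = 0
            return False
        self.failures += 1
        if self.failures >= self.max_failures:
            self.failures = 0  # start counting again after the power cycle
            return True
        return False


def power_cycle() -> None:
    # Hypothetical placeholder: drive a relay or smart power strip here.
    print("power cycle requested")


def run(host: str = "1.1.1.1", interval_s: int = 60) -> None:
    """Probe the host forever, power cycling after repeated failures."""
    watchdog = Watchdog()
    while True:
        if watchdog.update(host_reachable(host)):
            power_cycle()
        time.sleep(interval_s)
```

Calling `run()` starts the loop. Requiring several consecutive failures avoids power cycling on a single dropped packet.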

  • Are you sure you need that force-10M option? I tried enabling that and my Kiwi refused to talk to the network at all. I think at the time it was plugged directly into a 48-port Juniper switch, so no "Huawei junk" in this case, but it's possible that the switch was configured not to allow 10M connections, I suppose. I never checked to see if that might be a thing those switches can do, or do by default. But I know the 10M option caused me problems, so maybe it's causing problems for you, too.

    Another thing I'm wondering - are you using DHCP, and if so, are you using an older version of Windows for your DHCP server? There's a potential issue there that tripped me up at work recently that might apply here.

  • edited April 2021

    A bit more information:

    • A site visit was made today (not by me) and it was found that while KiwiSDR #1 was functioning normally, KiwiSDRs 2-5 were extremely warm to the touch, despite working fans.
    • KiwiSDRs 2-5 were unplugged for some time (10-20 minutes) and allowed to cool. When plugged in, 3-5 came up and so far, appear to be operating normally.
    • KiwiSDR #2 is still "flapping" around - that is, constantly dropping its LAN interface and coming back up for 10-20 seconds. During the brief time that it is up, it is refusing SSH connections.

    Additional comments:

    • KiwiSDRs 1-3 share the same power supply and switch, while KiwiSDRs 4 and 5 share a different power supply and switch.
    • In reviewing the log, it would appear that KiwiSDR #2 went offline at about 1014 MDT on 4/16 and never came back up - except for constantly bouncing, as evidenced by the network logs.
    • KiwiSDRs 1, 3, 4 and 5 all went offline at about 0435 MDT on 4/19 - apparently to do a firmware update (that coincides with when WSPRDaemon drops the connections every night, to "reset" Kiwirecorder sessions and allow pending updates to happen) and only #1 came back online after 10-15 minutes - presumably after an update. KiwiSDRs 3, 4 and 5 began "flapping" on the network from this point on, based on the thousands of network log entries.

    It's worth noting the KiwiSDR #2 has been a problem child for a while: It may be recalled that a year or so ago, it was stuck in a similar loop (I don't know about the excess heat on that occasion): It was only by interfacing via the OTG that a diagnosis could be made and the problem resolved. We'll attempt to do that again, but barring success with that we'll completely reload the firmware from an image. Given available time, the person who lives near-ish the WebSDR site will attempt to recover KiwiSDR #2 tomorrow.

    While one can reasonably expect occasional non-deterministic behavior from a computer (something to do with Murphy), it is a bit alarming that 3 of the 4 KiwiSDRs got scrogged at precisely the same time - with a problem that apparently could be resolved only by powering down - for reasons that we'll likely never know.

    Clint, KA7OEI

    P.S. The "flash" term mentioned in the previous message relates to working for years in a company with network greybeards that had all been phone guys in a previous life - hence the nature of the reference to something that is momentarily cycled.

  • Agreed, that's pretty disturbing. Are these Kiwis run in 4-channel mode or more than that? I can't think of what would account for the heat, unless the Beagles were from the batch some number of years ago that had faulty Ethernet PHY chips. I had a couple here with that problem.

    It might be worth buying a few replacement BBGs/BBBs to test that theory and have as spares.

    p.s. You mean "hook flash"? Wow, I had almost forgotten about that term. Respect for old time telecom though. https://en.wikipedia.org/wiki/Hook_flash

  • jks
    edited April 2021

    If it turns out to be faulty Beagles I'll reimburse you for their replacement (PayPal easiest for me). Mouser and DigiKey have plenty of BBGs.

    For BBGs DigiKey will likely add the 25% import tax/fee and Mouser may not. Not sure about that. Mouser will likely make you fill out the dreaded EAR paperwork. So you promise not to export to China a product manufactured in China 🙄

    It also looks like Mouser now sells a BBB manufactured by Seeed. Its catalog page doesn't have the EAR flag set which is odd. The BBB won't have the Grove connector of the BBG for powering a fan if that's how you're doing it now.

  • Clint,

    Just a few stupid questions: what is the voltage, is it solid, and is the supply cable the same length for all devices?

    Is the Ethernet shielded, are the antennas grounded, and does the network switch/router have shielded ports?

    It all seems a bit bizarre that you could have that many faulty devices hit at once, so long after the original Ethernet chip issue.

    To me there could be some supply - ground current path issue.

    (I know same comment every post, but even a stopped clock is right twice a day)

    Stu

  • Another site visit was made (not by me - it's a 120+km drive each way) - with enough time to take things apart and do a more thorough investigation:

    • Kiwi #2 was found to have low supply voltage: We found that the power cable had been crushed/stretched (but not shorted) and it had rather high resistance: With 5.05 volts at the power supply, there was only 4.35 volts measured at the PCB connector. We ended up removing about half of the power cable for the time-being as we no longer trusted it. I'm guessing that #2 went offline coincidentally a few days before the others, largely for this reason.
    • As per suggestions, we did check the power supplies. Even under start-up load, nothing seemed amiss: The supply for Kiwis 1-3 is a pair of diode-OR'd 3 amp-rated linear supplies (International IHB5-3) and we couldn't find anything strange about it (about 4.95 volts at the Kiwi PCB when it's up and running): We were wondering if one of the two supplies had crowbarred - but this wasn't the case. (We have had one of the two supplies crowbar in the past due to lightning and it was still enough to run the three Kiwis as the actual fold-back current of these 3 amp supplies appears to be a bit north of 4 amps).
    • I believe that the power supply for units 4 and 5 is a 4 amp switcher (with lots of added filtering on the AC input and DC output to avoid RFI and a lot of bulk capacitance on the output for impulse loads).
    • Just to be safe - and since it had been a problem child for a long time (even before it was used on the current, dual 3-amp ORed supply) we backed up the newest KiwiSDR (#5) and put the image on KiwiSDR #2, just in case something was amiss.

    Why Kiwis 3-5 (but not #1) all failed at the same instant is unknown: as far as we could ascertain, conditions were not quantifiably different from past occasions when updates were done. The reason for #2, however, is now quite clear.
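
As a rough worked example of the Kiwi #2 measurement (5.05 volts at the supply, 4.35 volts at the PCB connector), the drop in the cable and the resistance it implies can be estimated with Ohm's law. The load current was not measured, so the 1.0 amp figure below is purely an assumption for illustration, and the function names are hypothetical.

```python
def cable_drop_v(v_supply: float, v_load: float) -> float:
    """Voltage lost in the cable between the supply and the load."""
    return v_supply - v_load


def cable_resistance_ohm(v_supply: float, v_load: float, current_a: float) -> float:
    """Round-trip cable resistance implied by the drop at a given load current."""
    if current_a <= 0:
        raise ValueError("current_a must be positive")
    return cable_drop_v(v_supply, v_load) / current_a


# Measured values from the site visit.
V_SUPPLY = 5.05
V_AT_PCB = 4.35

# ASSUMPTION: the load current was not measured; 1.0 A is a placeholder.
ASSUMED_CURRENT_A = 1.0

drop = cable_drop_v(V_SUPPLY, V_AT_PCB)
resistance = cable_resistance_ohm(V_SUPPLY, V_AT_PCB, ASSUMED_CURRENT_A)
print(f"Cable drop: {drop:.2f} V")
print(f"Implied resistance at {ASSUMED_CURRENT_A:.1f} A: {resistance:.2f} ohm")
```

Because the drop scales with current, the voltage at the Kiwi would sag further during any peak in load, so a damaged cable can look marginal on a meter and still cause resets.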

    * * *

    On the next "official" site visit (where there are several of us with a carload of gear and parts) I'll bump up the output of at least the dual 3-amp supply to about 5.25 volts - something that has been suggested here before.

    Thanks to everyone who made comments and suggestions.

    73,

    Clint, KA7OEI

  • From personal experience 4.95 volts is probably the cause of your problems. I run my Kiwi at 5.2V.

    73,

    -Zyg- AF4MP

  • edited April 2021

    I don't understand how you are feeding 1-3: "diode OR" to me means Kiwis 1-3 get their 5V supply through a couple of diodes from two PSUs with a crowbar set at possibly 5.8V?

    (OVERVOLTAGE PROTECTION: PROVIDED. FACTORY SET AT 6.2VDC, +/-0.4VDC)

    What sort of diode, silicon or Schottky? I may be daft, as I can't quite see how that has enough headroom to keep the supply at a robust 5V (adjustment range not obvious from the spec sheet). Two Kiwis might work but three is almost certain to push it into current fold-back at the worst possible moment (?).

    To me, if you had the ability to supply ~5.7-5.9V (5.2V + diode drop), in this use you'd probably be better off with a timed relay (as suggested before on this forum) and some decent capacitance after the diode(s) so that the PSU gets a chance to settle before the BeagleBones load it.

    I might have misunderstood but two red flags to me there already, voltage and current even if everything is decent copper and corrosion free.
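
The headroom concern can be put into numbers with a small sketch. It uses the 5.2V target and the quoted overvoltage spec (6.2VDC, +/-0.4VDC); the diode drops of 0.5V and 0.7V are assumed values chosen to match the ~5.7-5.9V range above, not measurements, and the function names are hypothetical.

```python
OVP_NOMINAL_V = 6.2    # factory overvoltage setting quoted from the spec sheet
OVP_TOLERANCE_V = 0.4  # quoted tolerance, so the crowbar may trip as low as 5.8 V


def required_supply_v(target_v: float, diode_drop_v: float) -> float:
    """PSU output needed so the load still sees target_v after the OR-ing diode."""
    return target_v + diode_drop_v


def ovp_margin_v(supply_v: float) -> float:
    """Margin between the PSU output and the lowest possible crowbar trip point."""
    worst_case_trip_v = OVP_NOMINAL_V - OVP_TOLERANCE_V
    return worst_case_trip_v - supply_v


TARGET_AT_KIWI_V = 5.2

# ASSUMPTION: illustrative forward drops only; real values depend on the
# diode type, the current, and the temperature.
for label, drop_v in (("lower drop", 0.5), ("higher drop", 0.7)):
    supply_v = required_supply_v(TARGET_AT_KIWI_V, drop_v)
    margin_v = ovp_margin_v(supply_v)
    print(f"{label}: set PSU to {supply_v:.2f} V, margin to worst-case OVP {margin_v:+.2f} V")
```

A negative margin means the required setting is above the lowest point at which the crowbar could trip, assuming the overvoltage threshold is left at its factory setting.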

    73

    Stu

  • Also, while bulk capacitance is good, you have to be careful of it causing excessive rise time of the 5V presented to the Beagle at power on. The spec is 50 milliseconds max or the Beagle's PMIC will refuse to turn on.
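
For a feel of how bulk capacitance interacts with the 50 millisecond limit, here is a deliberately simplified estimate. It assumes the supply charges the capacitor at a constant current (for example, while in current limit), which gives t = C * V / I. The capacitance and current values are hypothetical, and a real supply with soft-start and a load drawing current during ramp-up will behave differently.

```python
MAX_RISE_TIME_S = 0.050  # 50 ms limit mentioned above


def rise_time_s(capacitance_f: float, v_final: float, charge_current_a: float) -> float:
    """Time to charge a capacitor to v_final at a constant current: t = C * V / I."""
    if charge_current_a <= 0:
        raise ValueError("charge_current_a must be positive")
    return capacitance_f * v_final / charge_current_a


def max_capacitance_f(charge_current_a: float, v_final: float,
                      limit_s: float = MAX_RISE_TIME_S) -> float:
    """Largest capacitance that still reaches v_final within limit_s."""
    if v_final <= 0:
        raise ValueError("v_final must be positive")
    return charge_current_a * limit_s / v_final


# ASSUMPTION: hypothetical values for illustration only.
BULK_CAPACITANCE_F = 0.010   # 10,000 uF
CHARGE_CURRENT_A = 1.0       # current left over for charging the capacitor
V_RAIL = 5.0

t = rise_time_s(BULK_CAPACITANCE_F, V_RAIL, CHARGE_CURRENT_A)
verdict = "within" if t <= MAX_RISE_TIME_S else "over"
print(f"Estimated rise time: {t * 1000:.1f} ms ({verdict} the 50 ms limit)")
```

Switching the already-settled supply onto the Kiwi through a relay sidesteps this estimate, since the rail is at full voltage before the Beagle sees it.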

  • edited April 2021

    I think that was Yuri's motivation for the fix.

    http://forum.kiwisdr.com/index.php?p=/discussion/1762/bbg-and-some-psu-problem

    I couldn't remember the thread while at work.

    (the relay being the last thing in the DC chain before the Kiwi)

  • This is likely not your issue, but something that came up recently for me was Cisco's "port security" which rather inconveniently will shut down a port when MAC addresses are detected that it doesn't expect. In my case, we brought some new VMs online (networked through the hypervisor) and the switch shut it down! This was also a remote site so you can imagine how I feel about this "feature".

    Anyway, certainly check switch/router logs and reboot them. The part that struck me about your report is how after you disabled and re-enabled the ports, the Kiwis worked and then went down, serially, in the order the ports were enabled. To me, this speaks to a networking problem. Sure, it could be power too since the network connection draws current (more than you might think) but if you can rule that out, it sure sounds like some piece of networking gear trying to be smart or green...

    Nick
