4 of 5 Kiwis won't stay on the LAN
At the Northern Utah WebSDR we have five identical KiwiSDRs (using BBGs), all running WSPRDaemon with a couple slots on each available for public use, and as of a few days ago, something strange started happening that I've not seen before:
Units 2-5 will disappear from the network - cannot be pinged locally ("Destination Host Unreachable") until, on the router, each respective port is disabled and then re-enabled ("flashed"). They will immediately come back online after this - temporarily: They will then, after just a couple of minutes, drop off the network again in the same order that the Ethernet ports were "flashed".
Unit #1 seems to be working normally.
What I have done:
- In the brief time that they are up, I looked at the /admin page logs and saw no obvious errors. Unfortunately, I don't really have enough time to poke around on the command line and look at the various logs before they go offline - although checking one specific log should be possible if I know which one to concentrate on.
- Forced a reboot of the Beagle itself on each of the units - no change in behavior.
- The site is quite remote and a physical removal of power has not yet been done. A site visit will not be possible until, perhaps, tomorrow.
I believe that all five units are running V1.451 with their Ethernet ports set to 10M FDX to minimize VHF QRM.
Any ideas?
TNX,
Clint, KA7OEI
Comments
To me, "flashed" is a very confusing term to be using when discussing network connections because I immediately think of "re-flashing" the software on the Beagle's onboard eMMC flash memory via sd card (or whatever). You're not talking about that, right?
Does your router use DHCP to assign local ip addresses to the Kiwis? (e.g. an ip range like 192.168.x.x)
Definitely cycle the power on all network equipment. I've seen so much strange behavior from consumer-grade network gear it's just incredible. Recently around here people complained the WiFi went down at exactly 11pm interrupting their Netflix viewing (and long after I had gone to bed). A power cycle fixed it! Damn Huawei junk!!
For remote sites I would really recommend installing a device that allows remote power cycling and watchdog power cycling (e.g. can't ping 1.1.1.1).
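As a rough illustration of the watchdog idea, here is a minimal Python sketch. It assumes a Linux host with the iputils `ping` command (`-c` count, `-W` timeout in seconds). The `power_cycle` function is a hypothetical placeholder: how the power actually gets switched (relay board, smart PDU, etc.) depends entirely on the hardware, and the failure threshold and interval are arbitrary example values.

```python
import subprocess
import time


def ping_ok(host: str, timeout_s: int = 2) -> bool:
    """Return True if a single ICMP echo to host succeeds (Linux ping flags assumed)."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def power_cycle() -> None:
    """Hypothetical placeholder: replace with whatever drives your relay or PDU."""
    print("power cycle requested")


def watchdog(check=ping_ok, act=power_cycle, host="1.1.1.1",
             max_failures=5, interval_s=60, sleep=time.sleep, max_checks=None):
    """Call act() after max_failures consecutive failed checks.

    Runs forever unless max_checks is given. Returns the number of times
    act() was called.
    """
    failures = 0
    checks = 0
    cycles = 0
    while max_checks is None or checks < max_checks:
        checks += 1
        if check(host):
            failures = 0          # any success resets the counter
        else:
            failures += 1
            if failures >= max_failures:
                act()
                cycles += 1
                failures = 0      # start counting again after the power cycle
        sleep(interval_s)
    return cycles

# To run it for real (loops forever): watchdog()
```

Requiring several consecutive failures keeps a single dropped ping from cycling the power, and it should run on something that is not powered from the outlet being cycled.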
Are you sure you need that force-10M option? I tried enabling that and my Kiwi refused to talk to the network at all. I think at the time it was plugged directly into a 48-port Juniper switch, so no "Huawei junk" in this case, but it's possible that the switch was configured not to allow 10M connections, I suppose. I never checked to see if that might be a thing those switches can do, or do by default. But I know the 10M option caused me problems, so maybe it's causing problems for you, too.
Another thing I'm wondering - are you using DHCP, and if so, are you using an older version of Windows for your DHCP server? There's a potential issue there that tripped me up at work recently that might apply here.
A bit more information:
Additional comments:
It's worth noting that KiwiSDR #2 has been a problem child for a while. It may be recalled that a year or so ago it was stuck in a similar loop (I don't know about the excess heat on that occasion), and it was only by interfacing via the OTG that a diagnosis could be made and the problem resolved. We'll attempt to do that again, but barring success with that we'll completely reload the firmware from an image. Given available time, the person who lives near-ish the WebSDR site will attempt to recover KiwiSDR #2 tomorrow.
While one can reasonably expect occasional non-deterministic behavior from a computer now and then (something to do with Murphy), it is a bit alarming that 3 of the 4 KiwiSDRs got scrogged at precisely the same time - with a problem that apparently could be resolved only by powering down - for reasons that we'll likely never know.
Clint, KA7OEI
P.S. The "flash" term mentioned in the previous message relates to working for years in a company with network greybeards that had all been phone guys in a previous life - hence the nature of the reference to something that is momentarily cycled.
Agreed, that's pretty disturbing. Are these Kiwis run in 4-channel mode or more than that? I can't think of what would account for the heat, unless the Beagles were from the batch some number of years ago that had faulty Ethernet PHY chips. I had a couple here with that problem.
It might be worth buying a few replacement BBGs/BBBs to test that theory and have as spares.
p.s. You mean "hook flash"? Wow, I had almost forgotten about that term. Respect for old time telecom though. https://en.wikipedia.org/wiki/Hook_flash
If it turns out to be faulty Beagles I'll reimburse you for their replacement (PayPal easiest for me). Mouser and DigiKey have plenty of BBGs.
For BBGs DigiKey will likely add the 25% import tax/fee and Mouser may not. Not sure about that. Mouser will likely make you fill out the dreaded EAR paperwork. So you promise not to export to China a product manufactured in China 🙄
It also looks like Mouser now sells a BBB manufactured by Seeed. Its catalog page doesn't have the EAR flag set which is odd. The BBB won't have the Grove connector of the BBG for powering a fan if that's how you're doing it now.
Clint,
just a few stupid questions, what is the voltage, is it solid, is the supply cable same length for all devices?
Is the Ethernet shielded, antennas grounded, does the network switch/router have shielded ports?
It all seems a bit bizarre that you could have that many faulty devices hit at once, so long after the original Ethernet chip issue.
To me there could be some supply - ground current path issue.
(I know same comment every post, but even a stopped clock is right twice a day)
Stu
Another site visit was made (not by me - it's a 120+km drive each way) - with enough time to take things apart and do a more thorough investigation:
Why Kiwis 3-5 (but not #1) did what they did all at the same instant is unknown: conditions could not be ascertained to be quantifiably different from past occasions when updates were done. The reason for #2, however, is quite clear now.
On the next "official" site visit (where there are several of us with a carload of gear and parts) I'll bump up the output of at least the dual 3-amp supply to about 5.25 volts - something that has been suggested here before.
Thanks to everyone who made comments and suggestions.
73,
Clint, KA7OEI
From personal experience 4.95 volts is probably the cause of your problems. I run my Kiwi at 5.2V.
73,
-Zyg- AF4MP
I don't understand how you are feeding 1-3. "Diode OR" to me means Kiwis 1-3 get their 5V supply through a couple of diodes from two PSUs, with a crowbar set at possibly 5.8V?
(OVERVOLTAGE PROTECTION: PROVIDED. FACTORY SET AT 6.2VDC, +/-0.4VDC)
What sort of diode, silicon or Schottky? - I may be daft as I can't quite see how that has enough headroom to keep the supply at a robust 5V (adjustment range not obvious from the spec sheet). Two Kiwis might work but three is almost certain to push it into current fold-back at the worst possible moment (?).
To me if you had the ability to supply ~5.7-5.9V (5.2V + diode drop), in this use you'd probably be better off with a timed relay (as suggested before on this forum) and some decent capacitance after the diode(s) so that the PSU gets a chance to settle before the Beaglebones load it.
I might have misunderstood but two red flags to me there already, voltage and current even if everything is decent copper and corrosion free.
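To put rough numbers on the headroom question, here is a small Python sketch of the arithmetic. The 5.2V target and the 6.2VDC +/-0.4VDC crowbar figure are the ones quoted in this thread. The diode forward drops (0.4V Schottky, 0.7V silicon) are assumed typical values, not measurements; the real drop depends on the part, the current and the temperature.

```python
def psu_setpoint(target_v: float, diode_drop_v: float) -> float:
    """PSU output needed so the load still sees target_v after the OR-ing diode."""
    return target_v + diode_drop_v


def crowbar_min(nominal_v: float, tolerance_v: float) -> float:
    """Lowest voltage at which the crowbar might trip, given its tolerance."""
    return nominal_v - tolerance_v


TARGET_V = 5.2                           # voltage wanted at the Kiwi
CROWBAR_MIN_V = crowbar_min(6.2, 0.4)    # from the quoted spec line

# Assumed typical forward drops, for illustration only
DROPS = {"Schottky (assumed 0.4 V)": 0.4, "silicon (assumed 0.7 V)": 0.7}

for name, drop in DROPS.items():
    setpoint = psu_setpoint(TARGET_V, drop)
    margin = CROWBAR_MIN_V - setpoint    # negative means a crowbar trip is possible
    print(f"{name}: PSU set to {setpoint:.2f} V, "
          f"margin to earliest crowbar trip {margin:+.2f} V")
```

Under those assumptions a Schottky leaves a couple of tenths of a volt below the earliest possible crowbar trip, while a silicon diode would need a setpoint above it.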
73
Stu
Also, while bulk capacitance is good, you have to be careful of it causing excessive rise time of the 5V presented to the Beagle at power on. The spec is 50 milliseconds max or the Beagle's PMIC will refuse to turn on.
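For a feel of how much capacitance is too much, here is a crude Python estimate. It assumes the supply charges the capacitor bank at a constant current (i.e. it sits in current limit during start-up) and ignores the current drawn by the load, so t = C * V / I. Real supplies have their own soft-start behavior, so treat this as an order-of-magnitude check only. The 50 ms limit is the figure quoted above; the example current and voltage are arbitrary.

```python
MAX_RISE_S = 0.050  # rise-time limit quoted above for the Beagle's PMIC


def rise_time_s(cap_f: float, v_final: float, i_charge_a: float) -> float:
    """Time to charge cap_f farads to v_final volts at a constant i_charge_a amps."""
    if i_charge_a <= 0:
        raise ValueError("charging current must be positive")
    return cap_f * v_final / i_charge_a


def max_capacitance_f(v_final: float, i_charge_a: float,
                      max_rise_s: float = MAX_RISE_S) -> float:
    """Largest capacitance that still reaches v_final within max_rise_s."""
    if v_final <= 0:
        raise ValueError("final voltage must be positive")
    return i_charge_a * max_rise_s / v_final


# Example with assumed values: 1 A of spare charging current, 5.0 V rail
c_max = max_capacitance_f(5.0, 1.0)
print(f"max bulk capacitance: {c_max * 1e6:.0f} uF")
print(f"rise time with that capacitance: {rise_time_s(c_max, 5.0, 1.0) * 1e3:.0f} ms")
```

Any current the Kiwi itself draws during power-up comes out of the charging current, so the usable capacitance in practice would be lower than this estimate.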
I think that was Yuri's motivation for the fix.
http://forum.kiwisdr.com/index.php?p=/discussion/1762/bbg-and-some-psu-problem
I couldn't remember the thread while at work.
(the relay being the last thing in the DC chain before the Kiwi)
This is likely not your issue, but something that came up recently for me was Cisco's "port security" which rather inconveniently will shut down a port when MAC addresses are detected that it doesn't expect. In my case, we brought some new VMs online (networked through the hypervisor) and the switch shut it down! This was also a remote site so you can imagine how I feel about this "feature".
Anyway, certainly check switch/router logs and reboot them. The part that struck me about your report is how after you disabled and re-enabled the ports, the kiwis worked and then went down, serially, in the order the ports were enabled. To me, this speaks to a networking problem. Sure, it could be power too since the network connection draws current (more than you might think) but if you can rule that out, it sure sounds like some piece of networking gear trying to be smart or green...
Nick