Opened 4 years ago

Last modified 3 months ago

#12372 reopened defect

ar71xx/ath9k (WRT160NL): wifi client's connection quality suddenly drops after 18-24h hostapd uptime

Reported by: dap@…
Owned by: nbd
Priority: normal
Milestone: Barrier Breaker 14.07
Component: kernel
Version: Attitude Adjustment 12.09 Beta
Keywords: ar71xx ath9k wrt160nl loss
Cc:

Description

After I upgraded from a trunk release more than 8 months old to AA-beta1 on my WRT160NL, I found that wifi clients' connection quality suddenly drops after 18-24h of hostapd uptime. The quality drop means 10-20% packet loss and high RTT. The clients are within 5 meters, the wifi traffic is very light, and the noise floor is low. My WRT configuration is almost default.

I recently began monitoring some wifi parameters with munin, and found the following on the WRT AP when this problem kicks in:

  • iw station dump: rx bitrates drop to 1.0 MBit/s; tx bitrates vary but do not fall as low
  • iw station dump: tx retries increase 4 times faster than tx packets (!) and tx failed is 10-20% of tx packets
  • iw station dump: signal strength of clients drops *just a little* (about -10%), so this is not the cause by itself, because signal levels are sometimes much lower without such issues
  • iw survey dump: does not change; the sum of times (except 'active') on the channel in use is always under 200ms per second
  • hostapd does not log anything worth mentioning (even with -dd), only the usual "group key handshake completed" messages
  • the clients do not disconnect or reconnect
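
The retry and failure ratios in the list above can be computed directly from the station dump text. This is a minimal sketch, assuming the field labels match the station dumps posted in this ticket; the function name station_ratios is made up for illustration and is not part of iw.

```shell
# Summarise "tx retries" and "tx failed" as a percentage of "tx packets"
# for each station. Reads `iw ... station dump` text on stdin.
# station_ratios is a hypothetical helper name, not an iw feature.
station_ratios() {
    awk '
        $1 == "Station"                 { mac = $2 }
        $1 == "tx" && $2 == "packets:"  { pkts = $3 }
        $1 == "tx" && $2 == "retries:"  { retries = $3 }
        $1 == "tx" && $2 == "failed:"   {
            # "tx failed" is the last of the three counters in the dump,
            # so the line for this station can be printed here.
            if (pkts > 0)
                printf "%s retries=%.1f%% failed=%.1f%%\n", mac, 100 * retries / pkts, 100 * $3 / pkts
        }
    '
}

# On the router (the interface name wlan0 is an assumption):
# iw wlan0 station dump | station_ratios
```

The counters are cumulative since association, so comparing two samples taken some minutes apart shows how fast retries are growing relative to packets.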

The last time, this issue was (probably) triggered by me when I arrived home with my smartphone, but currently I am not sure that the trigger is always a new client - I will keep my eyes open from now on.

A simple hostapd restart *always* resolves the problem for 16-24 hours. Client reboot/reconnect does not help.

AA-beta2 is under testing right now. My wireless config and a station dump snapshot are attached. Is there anything more I can provide?

Attachments (4)

uci_wireless.txt (511 bytes) - added by anonymous 4 years ago.
station_dump.txt (1.7 KB) - added by anonymous 4 years ago.
wlan0_station_dump.txt (736 bytes) - added by Francesco Lotti <francesco@…> 3 years ago.
wlan0 station dump
wlan0.sta1_station_dump.txt (374 bytes) - added by Francesco Lotti <francesco@…> 3 years ago.
wlan0.sta1 station dump (WDS)


Change History (167)

comment:1 Changed 4 years ago by stefan.joosten+openwrt@…

My box, TP-Link TL-WR1043ND, possibly suffers from the same problem.
Connected clients remain connected, rx bitrates drop too.

Another thing happens over here which is not on your list: new clients are unable to connect!
Tested two smartphones, a laptop with Broadcom chip using Windows and another laptop using Atheros 9k chip and Linux (Debian Wheezy).

Solved the problem by restarting the wireless. (wifi down; wifi up).

I'm not at home, so I can't provide dumps/logs right now.
But I can provide whatever you ask for when I get there.

comment:2 Changed 4 years ago by dap@…

I'm done with AA-beta2 testing: it failed too. I have begun testing different trunk revisions, picked at random between head and 29289 - the latter served me nicely for months. I'll update this ticket with my findings:

r33917 - same issue as AA-beta2
r32510 - constantly very high tx retries and tx failed, but rx bitrates do not drop to 1.0 MBit/s. A hostapd restart does not help. It is not exactly the same problem, but the ratio of retries and failures to tx packets matches (~ 350%/15%).

comment:3 Changed 4 years ago by Rascas

I have the same problem in 12.09-beta, on a TP-Link TL-WR1043ND in Client+VAP mode. I can see the wireless rate dropping and staying at 1 Mbit (rx bitrates only) after an uptime of roughly 24 hours, but sometimes only 2-3 hours are needed for this to happen. This is my first time with OpenWrt; I could configure everything I needed, and this is the only problem. I can post some logs, but someone has to tell me what to do, because I don't know. I'm testing beta2 now.

comment:4 Changed 4 years ago by Roland Pallai <dap@…>

r29294 - seems to work fine (again). There are some "ath: Could not stop RX" messages in dmesg, but no problem with wifi connections after 23 hours, yet.

Guys, try r29294 if you can. Probably you're hitting the same bug as here if that release fixes your problems too.

(My next try is r30403 with kernel 3.2)

comment:5 Changed 4 years ago by Rascas

The same happens in beta2.

comment:6 Changed 4 years ago by Roland Pallai <dap@…>

r30403 and r31337 are OK. (Next try is r32055)

comment:7 Changed 3 years ago by riproute@…

Any progress on this? I am seeing the same issue but only with certain environments. I am testing the same build on 8 identical access points and only see the failure in one of those environments. I've changed out the hardware in that environment and still see the issue.

comment:8 Changed 3 years ago by Roland Pallai <dap@…>

I'm still in testing. Unfortunately 24 hours is not enough for a test run - once it came after 43 hours.

Now I suspect the offending patch is between r32420 and r32510. I have to confirm that, then revert patches one by one and run the tests over and over. It takes weeks to see results, but I have no other option. I'll report on progress.

comment:9 Changed 3 years ago by stefan.joosten+openwrt@…

Could very well be, Roland.
The issue seems to have been resolved for me.
I'm a bit unsure what revision I used during the time I experienced the issue.

I rebuilt using more recent code and I've been running Attitude Adjustment branch r33969 for 7 days without any problems now. I'm going to refresh my sources now and compile a new build from them.
So far so good :)

comment:10 Changed 3 years ago by Roland Pallai <dap@…>

Stefan, thanks for the hint. There's some interesting fixes in the log, specifically r33989 and r33992. I checked out changes between beta2(r33883) and what you use(r33969), but I didn't find a change that targeted my problem - but maybe I'm wrong, so I'll try the latest trunk next.

comment:11 follow-up: Changed 3 years ago by Roland Pallai <dap@…>

r34123 (latest trunk) failed after 2d 22h, the symptoms are same.

comment:12 Changed 3 years ago by Roland Pallai <dap@…>

comment:13 Changed 3 years ago by vadim@…

I also have the same problem, on a TL-WR941N with OpenWrt Attitude Adjustment 12.09-beta2 r33883.

comment:14 Changed 3 years ago by elvstone@…

I can confirm problems on my WRT160NL with r34325 that I built today. I'm guessing they might be the same you're seeing, but possibly there's some crash involved as well. Today after starting the AP up I tried a 2 GB torrent download and it went fine, but then after a little while the connection got flaky and then the clients got disconnected. Had to reboot.

It seems ath9k is really finicky :(

I'm new to OpenWRT, but if I get instructions on how to make useful information for the bug report I'll see what I can do.

comment:15 Changed 3 years ago by elvstone@…

I should note that I don't have to wait 1-2 days to get problems. They usually show up after minutes/hours and quite randomly. They don't seem to be related to the amount of traffic. Like I said, a 2 GB torrent was downloaded successfully in a few minutes, and then during the slow traffic period that followed the connection got flaky again and finally all clients were disconnected.

comment:16 Changed 3 years ago by elvstone@…

Is anyone currently running a WRT160NL on a revision that works without problems? If so, I'm willing to try a bisect to find the exact offending commit. But perhaps there are no (completely) good revisions?

comment:17 in reply to: ↑ 11 Changed 3 years ago by anonymous

Replying to Roland Pallai <dap@…>:

r34123 (latest trunk) failed after 2d 22h, the symptoms are same.

Yup, a more recent AA build failed for me as well on TL-WR1043ND.
Although I just get very slow WiFi. All RX rates on the router drop to 1 mbit, which kills the experience. Restarting the wireless (wifi down; wifi up) fixes it and everything works like it should.

I noticed the problems start occurring after some decent to heavy load with for example a torrent client making lots of connections. I do everything over Ethernet, but my roommates are stubborn and everything is WiFi for them...

comment:18 Changed 3 years ago by anonymous

was me: Stefan <stefan.joosten+openwrt@…>

comment:19 follow-up: Changed 3 years ago by Roland Pallai <dap@…>

Good news, I found the root of my problem! My WRT160NL uptime is 6 days without issues. ;)

The buggy patch is: https://dev.openwrt.org/browser/trunk/package/mac80211/patches/562-ath9k_reduce_ani_interval.patch?rev=32510
I reverted the patch on r34287 and the problem is gone.

There is an easy workaround for this:

echo 1 >/sys/kernel/debug/ieee80211/phy0/ath9k/disable_ani

Although I didn't try it yet, it should fix this issue too. Try it and report - I will try this workaround too.
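
Values written to debugfs do not survive a reboot, so the workaround has to be reapplied each time. This is a minimal sketch that applies it to every radio, assuming the disable_ani file exists at the path quoted above on the build in use; the function name disable_ani_all is made up for illustration.

```shell
# Write 1 to every ath9k disable_ani debugfs file under the given root.
# The default root matches the path quoted in this comment; the optional
# argument exists only so the loop can be tried against a scratch directory.
# disable_ani_all is a hypothetical helper name.
disable_ani_all() {
    root="${1:-/sys/kernel/debug/ieee80211}"
    for f in "$root"/phy*/ath9k/disable_ani; do
        # Skip when the glob matched nothing or the file is absent in this build.
        [ -e "$f" ] || continue
        echo 1 > "$f"
    done
}
```

One option is to call disable_ani_all from /etc/rc.local, though that assumes the wireless driver is already loaded at that point. Later comments in this ticket use a file named ani instead of disable_ani, so the file name depends on the revision.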

comment:20 Changed 3 years ago by elvstone@…

That's great. I have enabled the workaround and will report if I see any problems in the coming days. The connection has actually been surprisingly stable in the last few days, though my girlfriend who was home yesterday said that it was slow and she got disconnected once.

comment:21 Changed 3 years ago by stefan.joosten+openwrt@…

I resorted to restarting wireless during the night, which does help a bit.

Will try the disable ANI workaround as well.
Funny enough I had found out about the same workaround last week, just hadn't gotten around to trying it yet.

But I do hope some OpenWRT devs check this out and find some middle ground in ANI that works. Because disabling ANI can be detrimental for performance can it not?

comment:22 Changed 3 years ago by elvstone@…

I'd just like to mention that the disable ANI workaround seems to have worked for me. No problems in the last 5 days. I don't know what impacts on performance this has, since I've never had a flawless connection until now. At the moment I think I'm getting ~5-6 MB/s, which I think is okay if not great. Would also like the problem to be fixed at its root.

comment:23 Changed 3 years ago by nbd

  • Owner changed from developers to nbd
  • Status changed from new to accepted

comment:24 Changed 3 years ago by stefan.joosten+openwrt@…

Boy, that was quick. :-)
Testing disable_ani on the TL-WR1043ND. Will report back in a couple of days. If there is anything else I can try or should provide, let me know.

comment:25 Changed 3 years ago by Roland Pallai <dap@…>

The disable_ani workaround also worked for me; 5d9h+ uptime without issues on (unpatched) r34287.

I don't know the performance impact of the reverted patch nor disable_ani, I'm not really interested in wifi performance now.

Changed 3 years ago by Francesco Lotti <francesco@…>

wlan0 station dump

Changed 3 years ago by Francesco Lotti <francesco@…>

wlan0.sta1 station dump (WDS)

comment:26 Changed 3 years ago by Francesco Lotti <francesco@…>

disable_ani didn't work here. I'm using AR5416 on Asus WL500GP in Access Point (WDS) mode.

After a few hours clients get disconnected and the AP no longer accepts any new connections. The strange thing is that the client connected to the AP in WDS mode continues to work without problems.

comment:27 Changed 3 years ago by nbd

what version of openwrt is that?

comment:28 Changed 3 years ago by Francesco Lotti <francesco@…>

AA beta2 .

comment:29 Changed 3 years ago by nbd

ok, then update to rc1, it might fix your issue

comment:30 Changed 3 years ago by vadim@…

So is it fixed in rc1, or will only the workaround work?

comment:31 Changed 3 years ago by Francesco Lotti <francesco@…>

Ok thanks. I'll upgrade asap:-)

comment:32 Changed 3 years ago by Francesco Lotti <francesco@…>

Ok, meanwhile I swapped my AR5416 (ath9k) minipci card for an AR5413/AR5414 (ath5k) one.
Everything seemed to work properly until today, when clients suddenly disconnected and new clients weren't able to connect anymore.
So ath5k seems to have the same problems as ath9k, at least on AA beta2.

# iw wlan0 station dump
Station 00:19:d2:XX:XX:XX (on wlan0)

inactive time: 388 ms
rx bytes: 153449
rx packets: 2475
tx bytes: 20941
tx packets: 161
tx retries: 0
tx failed: 0
signal: -58 dBm
signal avg: -58 dBm
tx bitrate: 1.0 MBit/s
rx bitrate: 54.0 MBit/s
authorized: yes
authenticated: yes
preamble: long
WMM/WME: yes
MFP: no
TDLS peer: no

comment:33 Changed 3 years ago by nbd

I thought you wanted to update... I already told you the problem of no clients being able to connect anymore is fixed in rc1. It's not driver-specific by the way, it's a mac80211 issue.

comment:34 Changed 3 years ago by anonymous

Hello

Has anyone else noticed a problem functioning router wr1043nd v1.8 for pre-OpenWRT Backfire 10.03.X?
Previously I was using DD-WRT, and later I went back to the original firmware. Up to then everything was OK. When I upgraded to OpenWrt Backfire 10.03.1, the WAN port became inactive. After switching to DD-WRT I found that the WAN port did not work there either (it worked only on the original firmware). After upgrading to Attitude Adjustment 12.09-rc1 the WAN port came back into operation, but when I tried to install the original firmware I bricked the router. A short time later I successfully debricked it over the RS232 port. I found that the router does not work the same as before: in DD-WRT the WAN port still does not work, and the wifi problems mentioned here are present. I also wonder why the LED behaviour at startup (boot) differs from the factory settings (original firmware).
Does anyone know how to restore the contents of the ROM chip?

comment:35 Changed 3 years ago by Francesco Lotti <francesco@…>

nbd, as it happened I didn't have the time to upgrade, so I tried replacing the card.
Now I just flashed AA rc1 and everything seems fine again.

comment:36 Changed 3 years ago by karsten.bier@…

I am seeing disconnects too, but the connection returns pretty quickly, after a few seconds.
I'm on AA-rc1 too, on a TP-Link TL-WR1043N/ND v1.
I tried the disable_ani workaround, but it didn't work.
To check whether the connection is stable I start an FTP transfer. The connection speed is very good in HT40 while it lasts: I'm getting up to 10 mb/s in HT40 and around 6 mb/s in HT20.

For now I can only use 802.11g. There was a bug in kamikaze before which led to drops in connection speed, but that seems fixed now.

Here's a snippet from the system log when the disconnects happen in 802.11n:
Dec 23 13:06:16 OpenWrt daemon.info hostapd: wlan0: STA xx:xx:xx:xx:xx:xx IEEE 802.11: disconnected due to excessive missing ACKs
Dec 23 13:06:46 OpenWrt daemon.info hostapd: wlan0: STA xx:xx:xx:xx:xx:xx IEEE 802.11: deauthenticated due to inactivity (timer DEAUTH/REMOVE)
Dec 23 13:06:47 OpenWrt daemon.info hostapd: wlan0: STA xx:xx:xx:xx:xx:xx IEEE 802.11: authenticated
Dec 23 13:06:47 OpenWrt daemon.info hostapd: wlan0: STA xx:xx:xx:xx:xx:xx IEEE 802.11: associated (aid 1)
Dec 23 13:06:47 OpenWrt daemon.info hostapd: wlan0: STA xx:xx:xx:xx:xx:xx WPA: pairwise key handshake completed (RSN)

As you can see, it only takes about 30 seconds and everything is fine again.
I would love to see this fixed in the release. If needed, I can provide more info or test a newer trunk version.

comment:37 Changed 3 years ago by karsten.bier@…

I tried again with two clients connected, but only the one with a high network load (doing the FTP transfer) got disconnected.
Can anybody confirm that behaviour?

comment:38 Changed 3 years ago by anonymous

using AA rc1, same behaviour (wr1043nd) as karsten:
with light load I get nearly no problems (uptime of a week or so), but using a client with high network load causes disconnect (sometimes tens of minutes, sometimes it takes couple hours).

comment:39 Changed 3 years ago by anonymous

Just wanted to report my findings. It seems the disconnect issue is cleared up for me on RC1. However I'm still getting the slowdown issues mentioned earlier. Most of the RX rates drop to 1Mbit. Doing the wifi up/down command clears the problem.

TL-WR1043N
AA-RC1

comment:40 Changed 3 years ago by stefan.joosten+openwrt@…

That is the same behaviour my router shows. It slows down to 1 mbit, making it extremely slow and unusable from a user perspective. Restarting wifi solves this problem until it occurs again.
Using the disable_ani workaround keeps WiFi working here, but I get a whole lot more "Could not stop TX" errors. WiFi continues to work though.

I've noticed there are some new backported fixes to ath9k, but I'm unsure if those possibly address this problem. I will probably compile from AA branch later this month and test some more.

comment:41 Changed 3 years ago by stefan.joosten+openwrt@…

Just to add some info: that's a TL-WR1043ND using AA 12.09-rc1 (r34457)

comment:42 Changed 3 years ago by spamsales@…

I use the latest trunk and see the same behaviour as Karsten. I have a WRT841N V8.1 with ath9k.

comment:43 Changed 3 years ago by stefan.joosten+openwrt@…

Can confirm this still happens on AA-rc1 built 3 days ago on TL-WR1043ND without disable_ani workaround.
Seemed to work fine, as it did yesterday and the day before. But today, again 1Mbit RX rates, but really no data going through, so wireless is useless.

I have no warnings or errors in kernel log, other than:

hrtimer: interrupt took 31346 ns

Restarting my wireless now to solve it.
The disable_ani workaround is still necessary it seems, at least on my end.

Any news or progress on this yet?
Is there anything I can do as a user to help?

comment:44 Changed 3 years ago by anonymous

I think this is the same problem but I rebooted the router so I can't compare.
I have a TL-WR1043ND with AA 12.09-rc1.

I was running a download and torrents, then speeds went to zero.
I disconnected and reconnected wireless connection.

It connected but everything was slow.
Ping to router was 49-222ms, average ~100ms, 33% (4/12) packet loss.
Memory usage was normal, load average 0.03.

Rebooted router, everything is fine.

comment:45 Changed 3 years ago by matti.laakso@…

I started to encounter this problem after getting a new wireless client (Acer Iconia Tab W510). Running 12.09-rc1 on a Buffalo WZR-HP-G300NH. Restarting wifi nightly seems to be enough to get mostly error free operation.

comment:46 Changed 3 years ago by nbd

please try r35786 or newer

comment:47 Changed 3 years ago by stefan.joosten+openwrt@…

After 9 days of running this newer version I can say this fix helps. It does not seem to fix the true cause of the issue, but it helps greatly with the experience from a user's point of view.
I have experienced WiFi drop to the 1 mbit speed after this fix, albeit less frequently. And I bet that is due to the driver cold resetting the chip when most of the problems occur. This way it "fixes" the problem by restarting itself in most cases.

So I would like to thank you for this fix, as it does help to workaround the issue here most of the time. I hope more tweaks and fixes keep coming, and I will continue testing of course.

comment:48 Changed 3 years ago by nbd

another fix committed, please try r35974 or newer

comment:49 Changed 3 years ago by stefan.joosten+openwrt@…

I'm sorry to report this issue still occurs on r35974 and the r36052 I'm currently running.
r35786 seems to have been the best one for me. But that's hardly a scientific conclusion, because that one just happened to last the longest without me having to reset the WiFi.
I'm considering reverting back to it, to see if it can pull it off a second time. I will keep you posted in case I do return to it and it happens to be more stable. I expect it won't and I just got lucky with it.

So while the fixes do address some of it, because the WiFi is more usable than it was before, it still really slows down. Instead of several MB/s, my download speed falls to between 200KB/s and 400KB/s.

comment:50 Changed 3 years ago by stefan.joosten+openwrt@…

AA r36052 is throwing a fit again today. WiFi slowed to a crawl and it's not fixing itself, so I will have to restart it manually after having done that just yesterday.
I'm reverting back to AA r35786 because that was better in my experience. I will inform you if that happens to be better than the more current revisions.

comment:51 Changed 3 years ago by matti.laakso@…

I'm running AA r36099 now, and I noticed something strange: When this problem occurs download speeds from internet with wireless drop to around 600 kB/s, however, I can still download with samba from a USB hard drive connected to the router at 3.5 MB/s, which is pretty much the maximum I can get from the 65 Mbps wifi link! Also, minstrel_ht statistics from rc_stats always show a throughput of ~35 Mbps. Wired connection to internet is stable at 50 Mbps which is what the ISP gives me. How is this possible?

comment:52 Changed 3 years ago by anonymous

r36088 looks OK at 150 mbps and 40 MHz in 802.11n; transfer rates are a bit unstable, but basically I get something like 9 mb/s on FTP transfers.
i just installed the 12.09 release and was eager to check :)
now it's a waiting game to see how well it works in the long run, but things are definitely looking good.
i guess a huge thank you is in order !

comment:53 Changed 3 years ago by elvstone@…

I just upgraded to 12.09 release and it took only an hour or so until I got disconnected and had to restart the AP (a WRT160NL) :(

comment:54 Changed 3 years ago by nbd

Please try current AA SVN - the commit r36664 should hopefully have fixed this.

comment:55 Changed 3 years ago by anonymous

nbd:
Tested AA r36716: it worked fine for 2 days, but on the 3rd day I got the problem again. I still need to remove 550-ath9k_reduce_ani_interval.patch for normal use.

comment:56 Changed 3 years ago by nbd

Please try changing ATH9K_ANI_POLLINTERVAL in that patch to 200 and see if that makes things more stable for you.

comment:57 Changed 3 years ago by igor

nbd:
With '200' it has been working well so far. I was planning to test for 2-4 weeks, but then saw r36823. Do I need to continue testing, or begin a re-test with '300'?

comment:58 Changed 3 years ago by nbd

If 200 worked and 1000 worked, then 300 is going to work as well. Thanks for testing.

comment:59 Changed 3 years ago by igor

nbd:
Got the problem today after 1 week of uptime. Will start testing with '300'.

comment:60 Changed 3 years ago by nbd

Please change .config to set CONFIG_BUSYBOX_CONFIG_FEATURE_IPC_SYSLOG_BUFFER_SIZE to 512,
and make sure CONFIG_PACKAGE_ATH_DEBUG is enabled.

After you've brought up wifi, run this:

echo 0x49 > /sys/kernel/debug/ieee80211/phy0/ath9k/debug

As soon as the problem appears, run

logread | gzip -c > /tmp/log.gz

And send me (or attach) the contents of /tmp/log.gz

Thanks
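
Because the problem can take days to appear, the capture step above can be triggered automatically. This is a minimal sketch, assuming the symptom to watch for is an rx bitrate of exactly 1.0 MBit/s as described in this ticket; the function name rx_rate_collapsed, the interface name wlan0 and the polling interval are all assumptions.

```shell
# Succeeds (exit status 0) if any station in the `iw ... station dump`
# text on stdin reports an rx bitrate of exactly 1.0 MBit/s.
# rx_rate_collapsed is a hypothetical helper name.
rx_rate_collapsed() {
    awk '
        $1 == "rx" && $2 == "bitrate:" && $3 == "1.0" { found = 1 }
        END { exit !found }
    '
}

# Poll once a minute, then save the log as soon as the symptom shows up:
# while ! iw wlan0 station dump | rx_rate_collapsed; do sleep 60; done
# logread | gzip -c > /tmp/log.gz
```

An idle client may also report 1.0 MBit/s for a moment, so a single match is better treated as a prompt to look closer than as proof that the bug has triggered.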

comment:61 Changed 3 years ago by anonymous

I'm still seeing the excessive missing ACKs issue with ATH9K_ANI_POLLINTERVAL set to 300 (using snapshot r36859). I also posted about it in a forum thread (https://forum.openwrt.org/viewtopic.php?pid=204139#p204139)

comment:62 Changed 3 years ago by anonymous

I've been doing some testing, and it looks like the 'disassoc_low_ack' setting (in /var/run/hostapd-phy0.conf) affects the excessive missing ACKs message.
(forum post https://forum.openwrt.org/viewtopic.php?pid=204236#p204236)

comment:63 Changed 3 years ago by igor

nbd:
1 month passed after I started test with ani_pollinterval '300' and enabled debug. No problem at all.

comment:64 Changed 3 years ago by nbd

  • Resolution set to fixed
  • Status changed from accepted to closed

Good to know, thanks for testing.

comment:65 in reply to: ↑ 19 Changed 3 years ago by anonymous

Replying to Roland Pallai <dap@…>:

There is an easy workaround for this:

echo 1 >/sys/kernel/debug/ieee80211/phy0/ath9k/disable_ani

Although I didn't try it yet, it should fix this issue too. Try it and report - I will try this workaround too.

nope, does not fix the wlan clients losing connections after 1-2days.

comment:66 Changed 3 years ago by nbd

please test with the fixes in r37616

comment:67 Changed 3 years ago by anonymous

I had a problem with wifi too (it usually disappeared and I had to reboot the router or use the "wifi" command to get it back). Now wifi keeps working, but the connection slows down after a while (especially when downloading big files from the internet). One thing is fixed, though: wifi no longer disappears! Will there be another patch for the slowdown? My current version is r37673.

comment:68 Changed 3 years ago by nbd

there were some more changes after that, please test latest.

comment:69 Changed 3 years ago by anonymous

Ok. One more question: where can i see the changelist? Is there a link for that? I'll report as soon as possible.

comment:71 Changed 3 years ago by dap@…

2 days ago I upgraded to r37948 from my old r34287 and now the problem is back. The symptoms are the same as in my original report. A hostapd restart resolved the issue, again. r34287 with the disable_ani workaround was stable for months.

Now I'm running r37948 with 'echo 0 >/sys/kernel/debug/ieee80211/phy0/ath9k/ani' - I'll report back next week.

comment:72 follow-up: Changed 3 years ago by vadim@…

  • Resolution fixed deleted
  • Status changed from closed to reopened

Also have speed slow down, after some time.

comment:73 in reply to: ↑ 72 Changed 3 years ago by nbd

Replying to vadim@…:

Also have speed slow down, after some time.

your report is way too vague to be useful in any way. please mention your hardware, your openwrt version, what kind of client, etc.
also, make sure you use r38249 or newer.

comment:74 Changed 3 years ago by awilchak@…

r38249 is definitely better. Instead of dropping to zero and staying there, it seems like speeds occasionally drop to 500KB/s and then go back up. Need to test further but I think that last commit is making a big difference. Thank you!

comment:75 Changed 3 years ago by nbd

please also try r38257, it can prevent spurious reconnects

comment:76 Changed 3 years ago by dap@…

r37948 with ANI disabled is stable enough; I used it for weeks. There were some hiccups with "ath: phy0: Failed to stop TX DMA, queues=0x100!" but that's another issue.

Now I'm running r38259 with enabled ANI, I'll report back.

comment:77 Changed 3 years ago by rw_trac

r38294 - the problem with disconnects still present:

Oct 4 09:26:22 OpenWrt kernel: [ 837.060000] ath: phy0: Failed to stop TX DMA, queues=0x004!
Oct 4 09:26:22 OpenWrt kernel: [ 837.080000] ath: phy0: DMA failed to stop in 10 ms AR_CR=0x00000024 AR_DIAG_SW=0x42000020 DMADBG_7=0x000084c0
Oct 4 09:26:22 OpenWrt kernel: [ 837.090000] ath: phy0: Could not stop RX, we could be confusing the DMA engine when we start RX up
Oct 4 09:26:23 OpenWrt kernel: [ 837.320000] ath: phy0: Failed to stop TX DMA, queues=0x004!
Oct 4 09:26:23 OpenWrt kernel: [ 837.570000] ath: phy0: Failed to stop TX DMA, queues=0x004!
Oct 4 09:26:23 OpenWrt kernel: [ 837.810000] ath: phy0: Failed to stop TX DMA, queues=0x004!
Oct 4 09:26:23 OpenWrt kernel: [ 838.050000] ath: phy0: Failed to stop TX DMA, queues=0x004!
Oct 4 09:26:24 OpenWrt kernel: [ 838.300000] ath: phy0: Failed to stop TX DMA, queues=0x004!
Oct 4 09:26:24 OpenWrt kernel: [ 838.540000] ath: phy0: Failed to stop TX DMA, queues=0x004!
Oct 4 09:26:24 OpenWrt kernel: [ 839.020000] ath: phy0: Failed to stop TX DMA, queues=0x004!
Oct 4 09:26:27 OpenWrt kernel: [ 841.350000] ath: phy0: Failed to stop TX DMA, queues=0x004!
Oct 4 09:26:29 OpenWrt kernel: [ 843.450000] ath: phy0: Failed to stop TX DMA, queues=0x004!
Oct 4 09:26:31 OpenWrt kernel: [ 846.010000] ath: phy0: Failed to stop TX DMA, queues=0x004!
Oct 4 09:26:32 OpenWrt kernel: [ 846.250000] ath: phy0: Failed to stop TX DMA, queues=0x004!
Oct 4 09:26:32 OpenWrt kernel: [ 846.500000] ath: phy0: Failed to stop TX DMA, queues=0x004!
Oct 4 09:26:32 OpenWrt hostapd: wlan0: STA 00:1e:4c:47:f5:08 IEEE 802.11: disconnected due to excessive missing ACKs

comment:78 Changed 3 years ago by anonymous

Same problem here. Drops to 1 MB RX and TX (sometimes only one).

Some clients disconnect completely.

comment:79 follow-up: Changed 3 years ago by dap@…

Something between r37948 and r38259 has fixed this long standing issue of mine: 2 weeks on r38259 without problems with enabled ANI.

I have not used ANI for almost 1 year - hard to believe it works again. :) I agree this bug can be closed. Thanks!

comment:80 in reply to: ↑ 79 Changed 3 years ago by nbd

  • Resolution set to fixed
  • Status changed from reopened to closed

Thanks for testing

comment:81 Changed 3 years ago by Steffen

Does this fix the TL-WR1043ND v1.8 problem "Suspect of hardware bug that bring down WiFi after a while"? (http://wiki.openwrt.org/toh/tp-link/tl-wr1043nd)

comment:82 Changed 3 years ago by kisssandoradam@…

  • Resolution fixed deleted
  • Status changed from closed to reopened

Today I flashed my router with the latest trunk version, and while I was downloading at high speed the wifi stopped working. I think it is not fixed. I connected my computer to the router by cable, but I didn't see anything interesting in the kernel and system logs.

comment:83 Changed 3 years ago by nbd

what kind of router, what revision, what configuration?

comment:84 Changed 3 years ago by kisssandoradam@…

1043nd router, version number: 1.8
OpenWrt configuration:
Wlan:
WPA2-PSK
Channel 11, second channel below (Force 40MHz mode)
Country code: Hungary
Transmit Power: 20dbm (100mW)

comment:85 Changed 3 years ago by kisssandoradam@…

The wifi stopped working again. Twice in 24 hours. I think the trunk version from 29 September 2013 was a bit more stable than the current one. The connection drops when I download continuously at about 5MB/s. But sometimes it works for weeks at the same speed, sometimes not.

comment:86 Changed 3 years ago by dap@…

Seems like I'm hitting kisssandoradam's issue on my WRT160NL with r38259 right now. During a massive download, the wifi client's traffic stops after a while. The client remains connected and there are no error messages, but the connection stalls. All other connected clients keep working fine, without interruption. Reconnecting the client does solve the problem. I can reproduce it in 2-3 minutes.

Now I tried with ANI disabled and have had no problem for 15 minutes. That's not enough to say that disabling ANI is a workaround, but it's worth trying. Kisssandoradam, please try:

echo 0 >/sys/kernel/debug/ieee80211/phy0/ath9k/ani

and report back!

comment:87 Changed 3 years ago by Adam <kisssandoradam@…>

My problem was a bit more complicated than yours, dap. When I lose the connection, every client loses its wifi connection, because something stops working in the router. Today I reverted to Backfire 10.03.1 and I think it's more stable than the newer builds. Maybe the problem is in the Linux kernel and not in OpenWrt. If this stops working too I will use the latest trunk again, but I hope I don't have to.

comment:88 Changed 3 years ago by dap@…

I agree, Adam, it's another problem.

However, I have now been downloading for 40 minutes with ANI disabled and no problem. I suspect my download issue is still an ANI issue - the ticket status "reopened" is valid. I'll do some tests tomorrow.

comment:89 Changed 2 years ago by anonymous

Yes, i can confirm it's a new issue, I have been running an old r36715 build and wifi is fine for most of the time (i restart wifi daily, router weekly) but when I built r38347, i started having frequent wifi issues after a few hours. Speed would drop down to around 5 Mbit/s, some connections would just timeout. In a nutshell, wifi is basically useless, had to revert back to old firmware.

comment:90 Changed 2 years ago by nbd

please try the latest version.

comment:91 Changed 2 years ago by anonymous

I also have this problem. Connection quality drops after 12 to 24 hours of wifi uptime. It seems to depend on the wifi load.

I tried OpenWrt 12.09 and the latest BB r38999. The symptoms are identical.

Router model Buffalo WZR-HP-G300NH.

cat /sys/kernel/debug/ieee80211/phy0/ath9k/ani

            ANI: ENABLED
      ANI RESET: 221
        SPUR UP: 62341
      SPUR DOWN: 62341
 OFDM WS-DET ON: 0
OFDM WS-DET OFF: 0
     MRC-CCK ON: 0
    MRC-CCK OFF: 0
    FIR-STEP UP: 59683
  FIR-STEP DOWN: 59821
 INV LISTENTIME: 0
    OFDM ERRORS: 299990116
     CCK ERRORS: 18037358

cat /sys/kernel/debug/ieee80211/phy0/ath9k/reset

Baseband Hang: 2
Baseband Watchdog: 0
Fatal HW Error: 0
TX HW error: 0
TX Path Hang: 0
PLL RX Hang: 0
MCI Reset: 0

iw wlan0 station dump

Station xx:22 (on wlan0) (macbook)

inactive time: 150 ms
rx bytes: 1351429
rx packets: 7394
tx bytes: 10212479
tx packets: 7963
tx retries: 2313
tx failed: 12
signal: -51 [-60, -52, -58] dBm
signal avg: -51 [-59, -53, -59] dBm
tx bitrate: 117.0 MBit/s MCS 14
rx bitrate: 5.5 MBit/s
authorized: yes
authenticated: yes
preamble: short
WMM/WME: yes
MFP: no
TDLS peer: no

Station xx:eb (on wlan0) (xbox 360)

inactive time: 10440 ms
rx bytes: 22653983
rx packets: 240071
tx bytes: 643697633
tx packets: 464528
tx retries: 149820
tx failed: 139
signal: -39 [-42, -49, -42] dBm
signal avg: -41 [-46, -51, -43] dBm
tx bitrate: 78.0 MBit/s MCS 12
rx bitrate: 104.0 MBit/s MCS 13
authorized: yes
authenticated: yes
preamble: short
WMM/WME: yes
MFP: no
TDLS peer: no

comment:92 Changed 2 years ago by nbd

You forgot to post your wireless config.

comment:93 Changed 2 years ago by anonymous

sorry.

cat /etc/config/wireless

config wifi-device 'radio0'
	option type 'mac80211'
	option macaddr '00:xx:xx:xx:xx:8e'
	option hwmode '11ng'
	list ht_capab 'SHORT-GI-40'
	list ht_capab 'DSSS_CCK-40'
	option distance '20'
	option country 'DE'
	option htmode 'HT20'
	option channel '11'
	option txpower '9'

config wifi-iface
	option device 'radio0'
	option network 'lan'
	option mode 'ap'
	option ssid 'wlan-1234'
	option key 'xyz'
	option encryption 'psk2+ccmp'
	option macfilter 'allow'
	list maclist 'xx:xx:xx:xx:xx:xx'
	list maclist 'xx:xx:xx:xx:xx:xx'

comment:94 Changed 2 years ago by anonymous

I tried latest r39096, wifi slowed down again after 12h.

# cat /sys/kernel/debug/ieee80211/phy0/ath9k/ani

            ANI: ENABLED
      ANI RESET: 11
        SPUR UP: 11796
      SPUR DOWN: 11796
 OFDM WS-DET ON: 0
OFDM WS-DET OFF: 0
     MRC-CCK ON: 0
    MRC-CCK OFF: 0
    FIR-STEP UP: 6916
  FIR-STEP DOWN: 6919
 INV LISTENTIME: 0
    OFDM ERRORS: 30786720
     CCK ERRORS: 1669543

# cat /sys/kernel/debug/ieee80211/phy0/ath9k/reset

Baseband Hang: 1
Baseband Watchdog: 0
Fatal HW Error: 0
TX HW error: 0
TX Path Hang: 0
PLL RX Hang: 0
MCI Reset: 0

# iw wlan0 station dump
Station xx:xx:xx:xx:xx:22 (on wlan0)

inactive time: 1890 ms
rx bytes: 166795
rx packets: 1064
tx bytes: 786085
tx packets: 951
tx retries: 1619
tx failed: 19
signal: -47 [-56, -48, -54] dBm
signal avg: -48 [-56, -49, -54] dBm
tx bitrate: 117.0 MBit/s MCS 14
rx bitrate: 1.0 MBit/s
authorized: yes
authenticated: yes
preamble: short
WMM/WME: yes
MFP: no
TDLS peer: no

Station xx:xx:xx:xx:xx:10 (on wlan0)

inactive time: 20 ms
rx bytes: 573743
rx packets: 4682
tx bytes: 12198774
tx packets: 8598
tx retries: 2812
tx failed: 22
signal: -52 [-54, -56, -67] dBm
signal avg: -53 [-54, -60, -63] dBm
tx bitrate: 52.0 MBit/s MCS 5
rx bitrate: 19.5 MBit/s MCS 2
authorized: yes
authenticated: yes
preamble: short
WMM/WME: yes
MFP: no
TDLS peer: no

#uptime

10:54:56 up 12:35, load average: 0.01, 0.02, 0.04

comment:95 Changed 2 years ago by dap@…

Hi,

I have now tried r39124 with ANI enabled, and the wifi stopped working after 10 minutes of heavy downloading. All clients were disconnected and the SSID disappeared.

Latest log messages:
Tue Dec 17 23:23:54 2013 daemon.info hostapd: wlan0: STA a0:f3:c1:f8:9b:e0 IEEE 802.11: disconnected due to excessive missing ACKs
Tue Dec 17 23:24:02 2013 daemon.info hostapd: wlan0: STA 7c:d1:c3:6d:16:e6 IEEE 802.11: disconnected due to excessive missing ACKs
Tue Dec 17 23:24:24 2013 daemon.info hostapd: wlan0: STA a0:f3:c1:f8:9b:e0 IEEE 802.11: deauthenticated due to inactivity (timer DEAUTH/REMOVE)
Tue Dec 17 23:24:32 2013 daemon.info hostapd: wlan0: STA 7c:d1:c3:6d:16:e6 IEEE 802.11: deauthenticated due to inactivity (timer DEAUTH/REMOVE)

The 'wifi' command fixed it.

Now I'm running r39124 with disabled ANI and I'll report back if something goes wrong.

comment:96 Changed 2 years ago by anonymous

Hi,

r39139 seems to be an improvement on Buffalo WZR-HP-G300NH. No Baseband hangs for the last 24h.

comment:97 Changed 2 years ago by fa11enangel

I've tested it with r39096 on a TP-Link WR1043nd v1.11. After about 7-8 days the DMA errors occurred again, but the router was not used for 3 days during holidays.

[   20.380000] br-lan: port 2(wlan0) entered forwarding state
[240600.930000] ath: phy0: Failed to stop TX DMA, queues=0x004!
[240619.490000] ath: phy0: Failed to stop TX DMA, queues=0x004!
[240620.100000] ath: phy0: Failed to stop TX DMA, queues=0x004!
[240621.100000] ath: phy0: Failed to stop TX DMA, queues=0x004!
[240622.170000] ath: phy0: Failed to stop TX DMA, queues=0x004!
[240622.990000] ath: phy0: Failed to stop TX DMA, queues=0x001!
[268452.990000] ath: phy0: Failed to stop TX DMA, queues=0x004!
[268485.950000] ath: phy0: Failed to stop TX DMA, queues=0x004!
[268486.560000] ath: phy0: Failed to stop TX DMA, queues=0x004!
[268487.460000] ath: phy0: Failed to stop TX DMA, queues=0x004!
[268488.740000] ath: phy0: Failed to stop TX DMA, queues=0x004!
[496566.560000] ath: phy0: Failed to stop TX DMA, queues=0x004!
...
# run command "wifi"
...
[743186.430000] ath: phy0: Failed to stop TX DMA, queues=0x100!
[743186.830000] ath: phy0: Failed to stop TX DMA, queues=0x100!
[743187.340000] ath: phy0: Failed to stop TX DMA, queues=0x100!
[743245.040000] device wlan0 left promiscuous mode
[743245.040000] br-lan: port 2(wlan0) entered disabled state
[743245.560000] IPv6: ADDRCONF(NETDEV_UP): wlan0: link is not ready
[743245.560000] device wlan0 entered promiscuous mode
[743245.570000] br-lan: port 2(wlan0) entered forwarding state
[743245.570000] br-lan: port 2(wlan0) entered forwarding state
[743245.940000] IPv6: ADDRCONF(NETDEV_CHANGE): wlan0: link becomes ready
[743247.570000] br-lan: port 2(wlan0) entered forwarding state
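
To compare runs like the one above, it can help to count the DMA errors per queue mask instead of reading the raw log. This is only a sketch: the function name dma_error_summary is made up, and it assumes the message format stays exactly as shown in the log above ("Failed to stop TX DMA, queues=0x...!").

```shell
# Summarize "Failed to stop TX DMA" kernel messages by queue mask.
# Reads kernel log text on stdin, for example: dmesg | dma_error_summary
dma_error_summary() {
    awk '
        /Failed to stop TX DMA/ {
            mask = $NF            # last field, e.g. "queues=0x004!"
            sub(/!$/, "", mask)   # drop the trailing "!"
            count[mask]++
        }
        END { for (m in count) print m, count[m] }
    ' | sort
}
```

Each output line is a queue mask followed by the number of times it appeared, sorted by mask.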

I've upgraded to r39163 from snapshots. I'll report how it works.

Last edited 2 years ago by fa11enangel

comment:98 Changed 2 years ago by anonymous

Buffalo WZR-HP-G300NH with r39155 slows down after 4 days of uptime.

20:03:28 up 4 days, 18:40, load average: 0.04, 0.02, 0.04

cat /sys/kernel/debug/ieee80211/phy0/ath9k/reset

Baseband Hang: 2
Baseband Watchdog: 0
Fatal HW Error: 0
TX HW error: 0
TX Path Hang: 0
PLL RX Hang: 0
MCI Reset: 0

cat /sys/kernel/debug/ieee80211/phy0/ath9k/ani

            ANI: ENABLED
      ANI RESET: 82
        SPUR UP: 75060
      SPUR DOWN: 75060
 OFDM WS-DET ON: 0
OFDM WS-DET OFF: 0
     MRC-CCK ON: 0
    MRC-CCK OFF: 0
    FIR-STEP UP: 63234
  FIR-STEP DOWN: 63250
 INV LISTENTIME: 0
    OFDM ERRORS: 298793033
     CCK ERRORS: 14665933

iw wlan0 station dump
Station xx:xx:xx:xx:xx:22 (on wlan0)

inactive time: 950 ms
rx bytes: 65533
rx packets: 517
tx bytes: 72318
tx packets: 226
tx retries: 332
tx failed: 9
signal: -41 [-48, -41, -51] dBm
signal avg: -35 [-42, -37, -46] dBm
tx bitrate: 130.0 MBit/s MCS 15
rx bitrate: 1.0 MBit/s
authorized: yes
authenticated: yes
preamble: short
WMM/WME: yes
MFP: no
TDLS peer: no

comment:99 Changed 2 years ago by anonymous

I don't know if it is related to this problem, but I got a kernel warning with a stack trace after a few days of uptime with r39155 on the WZR-HP-G300NH.

[871033.260000] ------------[ cut here ]------------
[871033.260000] WARNING: at /store/buildbot/slave/ar71xx/build/build_dir/target-mips_34kc_uClibc-0.9.33.2/linux-ar71xx_generic/compat-wireless-2013-11-05/net/mac80211/rx.c:3365 mac80211_ieee80211_rx+0x134/0x800 [mac80211]()
[871033.280000] Rate marked as an HT rate but passed status->rate_idx is not an MCS index [0-76]: 79 (0x4f)
[871033.290000] Modules linked in: ath9k ath9k_common pppoe ppp_async iptable_nat ath9k_hw ath pppox ppp_generic nf_nat_ipv4 nf_conntrack_ipv4 mac80211 ipt_MASQUERADE cfg80211 xt_time xt_tcpudp xt_state xt_nat xt_multiport xt_mark xt_mac xt_limit xt_conntrack xt_comment xt_TCPMSS xt_REDIRECT xt_LOG xt_CT slhc nf_nat_irc nf_nat_ftp nf_nat nf_defrag_ipv4 nf_conntrack_irc nf_conntrack_ftp iptable_raw iptable_mangle iptable_filter ipt_REJECT ip_tables crc_ccitt compat ledtrig_usbdev ledtrig_netdev ip6t_REJECT ip6t_rt ip6t_hbh ip6t_mh ip6t_ipv6header ip6t_frag ip6t_eui64 ip6t_ah ip6table_raw ip6table_mangle ip6table_filter ip6_tables x_tables nf_conntrack_ipv6 nf_conntrack nf_defrag_ipv6 ipv6 arc4 crypto_blkcipher leds_gpio ohci_hcd ledtrig_timer ledtrig_default_on ehci_platform ehci_hcd gpio_button_hotplug usbcore nls_base usb_common
[871033.360000] CPU: 0 PID: 0 Comm: swapper Not tainted 3.10.24 #1
[871033.370000] Stack : 00000006 00000000 00000000 00000000 00000000 00000000 803a2ac6 00000032
[871033.370000] 803276b8 802d7664 80382a38 8032743b 00000000 00000400 00000010 00000000
[871033.370000] 83884010 800790b0 00000003 80076af0 00000000 00000000 802d8f2c 80321b9c
[871033.370000] 00321b9c 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[871033.370000] 00000000 00000000 00000000 00000000 00000000 00000000 00000000 80321b28
[871033.370000] ...
[871033.410000] Call Trace:
[871033.410000] [<8006e2f0>] show_stack+0x48/0x70
[871033.410000] [<80076bec>] warn_slowpath_common+0x78/0xa8
[871033.420000] [<80076ca4>] warn_slowpath_fmt+0x2c/0x38
[871033.420000] [<8329e814>] mac80211_ieee80211_rx+0x134/0x800 [mac80211]
[871033.430000] [<83166a88>] ath_rx_tasklet+0xd40/0xe34 [ath9k]
[871033.440000] [<83164a10>] ath9k_tasklet+0x100/0x180 [ath9k]
[871033.440000] [<8007e100>] tasklet_action+0x78/0xc8
[871033.450000] [<8007d92c>] do_softirq+0xc8/0x1b4
[871033.450000] [<8007dac8>] do_softirq+0x48/0x68
[871033.460000] [<8007dd04>] irq_exit+0x54/0x70
[871033.460000] [<8006082c>] ret_from_irq+0x0/0x4
[871033.460000] [<80060a60>] r4k_wait+0x20/0x40
[871033.470000] [<8009ecb4>] cpu_startup_entry+0xa0/0x108
[871033.470000] [<8033e908>] start_kernel+0x380/0x3a0
[871033.480000]
[871033.480000] ---[ end trace 7b6176610614fce4 ]---

comment:100 Changed 2 years ago by elvstone@…

I've been running a quite old revision (r36088) for a long time, and my WRT160NL has been quite annoying, with frequent disconnects so that I've had to restart the router. But now that my girlfriend has gotten a new laptop, the problems have gotten worse. The speed has been crawling (unusably slow) for several days.

So now I'm going to upgrade. Should I try Attitude Adjustment final or the latest trunk revision?

comment:101 Changed 2 years ago by nbd

please try r39688 or newer

comment:102 Changed 2 years ago by dap@…

r39688 seems OK, but the counters in /sys/kernel/debug/ieee80211/phy0/ath9k/ani are weird:

            ANI: ENABLED
      ANI RESET: 3
        SPUR UP: 0
      SPUR DOWN: 0
 OFDM WS-DET ON: 0
OFDM WS-DET OFF: 0
     MRC-CCK ON: 0
    MRC-CCK OFF: 0
    FIR-STEP UP: 0
  FIR-STEP DOWN: 0
 INV LISTENTIME: 0
    OFDM ERRORS: 601939
     CCK ERRORS: 23662

There are too many zeros after hours of uptime. I've never seen this in older releases, if I remember correctly. I'm not sure if ANI is *really* working now.

comment:103 Changed 2 years ago by nbd

is it still this way with newer versions?

comment:104 Changed 2 years ago by fa11enangel

Device: TP-Link TL-WR1043ND v2.1
OpenWRT Version: r39535

After a long time the error came back:

[   16.590000] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[   17.050000] br-lan: port 1(eth1) entered forwarding state
[   17.220000] IPv6: ADDRCONF(NETDEV_UP): wlan0: link is not ready
[   17.230000] device wlan0 entered promiscuous mode
[   17.390000] br-lan: port 2(wlan0) entered forwarding state
[   17.390000] br-lan: port 2(wlan0) entered forwarding state
[   17.400000] IPv6: ADDRCONF(NETDEV_CHANGE): wlan0: link becomes ready
[   19.390000] br-lan: port 2(wlan0) entered forwarding state
[258894.260000] ath: phy0: Failed to stop TX DMA, queues=0x004!
[346640.110000] ath: phy0: Failed to stop TX DMA, queues=0x005!

The wireless LAN was working, but users sometimes complained about connection problems with some sites. After rebooting the device the wireless LAN worked properly again.

comment:105 Changed 2 years ago by dap@…

It's certain now: r39688 is also broken with ANI enabled. It mainly fails during heavy downloads.

The symptoms vary: sometimes only the downloading client loses the connection, sometimes the whole AP disappears.

It's stable when ANI is disabled.

comment:106 Changed 2 years ago by nbd

I changed ANI to improve behavior on older chips in r39767 - please test.

comment:107 Changed 2 years ago by dap@…

Ok, I'm running r39788 now.

The counters in /sys/kernel/debug/ieee80211/phy0/ath9k/ani look normal this time:

root@OpenWrt:~# uptime
 13:37:43 up 7 min,  load average: 0.00, 0.06, 0.04
root@OpenWrt:~# cat /sys/kernel/debug/ieee80211/phy0/ath9k/ani
            ANI: ENABLED
      ANI RESET: 4
        SPUR UP: 95
      SPUR DOWN: 95
 OFDM WS-DET ON: 0
OFDM WS-DET OFF: 0
     MRC-CCK ON: 0
    MRC-CCK OFF: 0
    FIR-STEP UP: 98
  FIR-STEP DOWN: 93
 INV LISTENTIME: 0
    OFDM ERRORS: 117313
     CCK ERRORS: 4094

I'm starting to stress test it.

comment:108 Changed 2 years ago by dap@…

Early report: the patch fixed the stability problem but introduced a new performance issue.

I was downloading when there was a short hiccup in the traffic flow. One client was disconnected (maybe the user reconnected because traffic stalled), and my workstation's peak rx/tx bitrate dropped to 117 Mb/s from that point on.

After about 10 minutes I ran the 'wifi' command, which immediately restored my workstation's rx/tx bitrate to 270 Mb/s, and the download speed jumped up.

iw wlan0 station dump after the hiccup:

Station a0:f3:c1:f8:9b:e0 (on wlan0)
        inactive time:  0 ms
        rx bytes:       899045603
        rx packets:     2658734
        tx bytes:       1910238837
        tx packets:     4335089
        tx retries:     423763
        tx failed:      115
        signal:         -42 [-52, -43] dBm
        signal avg:     -42 [-51, -43] dBm
        tx bitrate:     117.0 MBit/s MCS 14
        rx bitrate:     104.0 MBit/s MCS 13
        authorized:     yes
        authenticated:  yes
        preamble:       short
        WMM/WME:        yes
        MFP:            no
        TDLS peer:      no

iw wlan0 station dump after the 'wifi' command was issued:

Station a0:f3:c1:f8:9b:e0 (on wlan0)
        inactive time:  0 ms
        rx bytes:       8033073
        rx packets:     32395
        tx bytes:       80907015
        tx packets:     56014
        tx retries:     3566
        tx failed:      0
        signal:         -47 [-56, -47] dBm
        signal avg:     -46 [-56, -47] dBm
        tx bitrate:     270.0 MBit/s MCS 14 40MHz short GI
        rx bitrate:     270.0 MBit/s MCS 14 40MHz short GI
        authorized:     yes
        authenticated:  yes
        preamble:       short
        WMM/WME:        yes
        MFP:            no
        TDLS peer:      no

There's nothing interesting in the log.
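
A drop like the one from 270 to 117 MBit/s is easier to spot if the station dump is reduced to one line per client. The following is a rough sketch only: the function name station_rates is made up, and it assumes the field layout of the dumps in this ticket, where "tx bitrate" is printed before "rx bitrate" for each station.

```shell
# Print "MAC tx_rate rx_rate" (rates in MBit/s) for every station.
# Reads the text output of "iw wlan0 station dump" on stdin.
station_rates() {
    awk '
        $1 == "Station"                { sta = $2 }
        $1 == "tx" && $2 == "bitrate:" { tx = $3 }
        $1 == "rx" && $2 == "bitrate:" { print sta, tx, $3 }
    '
}
```

On the router it would be used as: iw wlan0 station dump | station_rates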

comment:109 Changed 2 years ago by dap@…

There are still stability issues: now the downloading client lost the connection. AP log message:

hostapd: wlan0: STA a0:f3:c1:f8:9b:e0 IEEE 802.11: disconnected due to excessive missing ACKs

I'm now running this stress test without ANI.

comment:110 Changed 2 years ago by nbd

Please update to latest trunk with the change I just committed.
Afterwards, show me /sys/kernel/debug/ieee80211/phy0/ath9k/ani both in working state (after running a few minutes), and when the stability issue has appeared.

comment:111 Changed 2 years ago by dap@…

Ok, r39860 is running.

(Neither performance nor stability issues on r39788 if ANI is disabled.)

comment:112 Changed 2 years ago by dap@…

I hit the first traffic stall after a few minutes and had to reconnect the client.

Right before:

            ANI: ENABLED
      ANI RESET: 4
     OFDM LEVEL: 9
      CCK LEVEL: 0
        SPUR UP: 4
      SPUR DOWN: 4
 OFDM WS-DET ON: 0
OFDM WS-DET OFF: 0
     MRC-CCK ON: 0
    MRC-CCK OFF: 0
    FIR-STEP UP: 77
  FIR-STEP DOWN: 71
 INV LISTENTIME: 0
    OFDM ERRORS: 320963
     CCK ERRORS: 20037

Right after:

            ANI: ENABLED
      ANI RESET: 4
     OFDM LEVEL: 8
      CCK LEVEL: 0
        SPUR UP: 4
      SPUR DOWN: 4
 OFDM WS-DET ON: 0
OFDM WS-DET OFF: 0
     MRC-CCK ON: 0
    MRC-CCK OFF: 0
    FIR-STEP UP: 78
  FIR-STEP DOWN: 73
 INV LISTENTIME: 0
    OFDM ERRORS: 324167
     CCK ERRORS: 20078

comment:113 Changed 2 years ago by dap@…

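As I found out, "traffic stalling" means the AP becomes deaf while the client still sees the AP's traffic. In this capture the client keeps sending echo requests and ARP replies, and the AP keeps repeating its ARP requests as if it never heard them: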

19:35:42.780542 IP 192.168.5.118 > 192.168.5.2: ICMP echo request, id 11482, seq 2999, length 64
19:35:42.901602 ARP, Request who-has 192.168.5.118 tell 192.168.5.2, length 28
19:35:42.901616 ARP, Reply 192.168.5.118 is-at a0:f3:c1:f8:9b:e0, length 28
19:35:43.720796 ARP, Request who-has 192.168.5.116 tell 192.168.5.2, length 28
19:35:43.780559 IP 192.168.5.118 > 192.168.5.2: ICMP echo request, id 11482, seq 3000, length 64
19:35:43.925598 ARP, Request who-has 192.168.5.118 tell 192.168.5.2, length 28
19:35:43.925613 ARP, Reply 192.168.5.118 is-at a0:f3:c1:f8:9b:e0, length 28
19:35:44.324578 IP 192.168.5.118.50297 > 217.20.130.72.openvpn: UDP, length 14
19:35:44.744820 ARP, Request who-has 192.168.5.116 tell 192.168.5.2, length 28
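
The "deaf AP" pattern in a capture like this can be checked mechanically by counting echo requests against echo replies. This is a minimal sketch with a made-up function name (echo_stats); it assumes tcpdump's default text output as shown above and does not try to match replies to specific requests.

```shell
# Count ICMP echo requests and replies in tcpdump text output read from stdin.
echo_stats() {
    awk '
        /ICMP echo request/ { req++ }
        /ICMP echo reply/   { rep++ }
        END { printf "requests=%d replies=%d\n", req, rep }
    '
}
```

Many requests with zero replies, as in the capture above, fit the picture of an AP that no longer receives the client's frames.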

comment:114 Changed 2 years ago by nbd

Please try r39865, it should be better now. It was missing one more change to properly toggle weak signal detection.

comment:115 Changed 2 years ago by dap@…

Still not fixed in r39865, but the behavior changed: after a few minutes every client's connection quality dropped, and some clients tried to reconnect repeatedly. High ping times, very slow networking, hard to reconnect. It reminded me of my original report.

The ani file shows weird counter values this time:

Before:

            ANI: ENABLED
      ANI RESET: 9
     OFDM LEVEL: 8
      CCK LEVEL: 0
        SPUR UP: 29
      SPUR DOWN: 29
 OFDM WS-DET ON: 295
OFDM WS-DET OFF: 296
     MRC-CCK ON: 0
    MRC-CCK OFF: 0
    FIR-STEP UP: 326
  FIR-STEP DOWN: 305
 INV LISTENTIME: 0
    OFDM ERRORS: 234493
     CCK ERRORS: 21398

Meanwhile:

            ANI: ENABLED
      ANI RESET: 9
     OFDM LEVEL: 0
      CCK LEVEL: 0
        SPUR UP: 29
      SPUR DOWN: 29
 OFDM WS-DET ON: 358
OFDM WS-DET OFF: 358
     MRC-CCK ON: 0
    MRC-CCK OFF: 0
    FIR-STEP UP: 389
  FIR-STEP DOWN: 375
 INV LISTENTIME: 0
    OFDM ERRORS: 283564
     CCK ERRORS: 27361

After the 'wifi' command was issued and connections were restored:

            ANI: ENABLED
      ANI RESET: 14
     OFDM LEVEL: 5
      CCK LEVEL: 0
        SPUR UP: 33
      SPUR DOWN: 33
 OFDM WS-DET ON: 358
OFDM WS-DET OFF: 358
     MRC-CCK ON: 0
    MRC-CCK OFF: 0
    FIR-STEP UP: 396
  FIR-STEP DOWN: 380
 INV LISTENTIME: 0
    OFDM ERRORS: 290320
     CCK ERRORS: 30242

comment:116 Changed 2 years ago by nbd

When the issue occurs again, please measure how much the "OFDM ERRORS" counter increases over a period of 10 seconds or so (I need the average errors per second), and check whether the OFDM LEVEL is at 0 again.
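
One possible way to take this measurement is sketched below. The function names ani_counter and ani_rate are made up for illustration; the sketch assumes the "LABEL: value" layout of the ani file shown in this ticket and reports an integer average (rounded down).

```shell
# Print the value of one counter from an ath9k "ani" debugfs dump.
# $1 = file, $2 = counter label exactly as printed, e.g. "OFDM ERRORS".
ani_counter() {
    awk -F': *' -v want="$2" '
        { key = $1; sub(/^ +/, "", key) }   # strip the leading padding
        key == want { print $2; exit }
    ' "$1"
}

# Average increase per second of a counter between two reads of the same file.
# $1 = file, $2 = label, $3 = seconds between the reads (default 10).
ani_rate() {
    secs=${3:-10}
    before=$(ani_counter "$1" "$2")
    sleep "$secs"
    after=$(ani_counter "$1" "$2")
    echo $(( (after - before) / secs ))
}
```

For example, ani_rate /sys/kernel/debug/ieee80211/phy0/ath9k/ani "OFDM ERRORS" 10 would print the average errors per second, and ani_counter with the label "OFDM LEVEL" would print the current level.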

comment:117 Changed 2 years ago by dap@…

I have identified multiple kinds of ANI-related problems. There may be multiple bugs, or these may just be different results of one bug. Let me explain them before I give the numbers.

The "OFDM LEVEL zero" issue, symptom: very bad connection quality on all clients.
The "AP disaster" issue, symptom: all clients are disconnected and impossible to connect to the AP (the SSID is there).
The "downloader stalling" issue, symptom: I have to reconnect. Other clients may not be affected.

First I hit the "downloader stalling" issue:

         ANIR OFDM CCKL SPUP SPDW OWD1 OWD0 MRC1 MRC0 FIRU FIRD INVL OERR CERR
22:03:58    0    6    0    2    2    0    0    0    0    2    2    0 1787  354
22:04:08    0    6    0    1    1    0    0    0    0    1    1    0 1591  394
22:04:18    0    5    0    1    1    0    0    0    0    1    2    0 1767  438
22:04:28    0    6    0    1    1    0    0    0    0    1    0    0 1779  450
22:04:38    0    6    0    1    1    0    0    0    0    1    1    0 1852  352
22:04:48    0    6    0    2    2    0    0    0    0    2    2    0 1776  390
22:04:58    0    7    0    3    3    0    0    0    0    3    2    0 1938  422
22:05:08    0    5    0    1    1    0    0    0    0    1    3    0 1482  305
22:05:18    0    6    0    2    2    0    0    0    0    2    1    0 1858  326
22:05:28    0    6    0    2    2    0    0    0    0    2    2    0 1880  298
22:05:39    0    6    0    2    2    0    0    0    0    2    2    0 2036  248
22:05:49    0    6    0    1    1    0    0    0    0    1    1    0 1828  258
22:05:59    0    4    0    1    1    0    0    0    0    1    3    0 2098  332
22:06:09    0    5    0    3    3    0    0    0    0    3    2    0 2230  277
22:06:19    0    5    0    1    1    0    0    0    0    1    1    0 1755  236
22:06:29    0    3    0    2    2    0    0    0    0    2    4    0 2461  172
22:06:39    0    5    0    3    3    0    0    0    0    3    1    0 4238    1
22:06:49    0    5    0    2    2    0    0    0    0    2    2    0 3150    0
22:06:59    0    4    0    2    2    0    0    0    0    2    3    0 3080    0
22:07:09    0    4    0    1    1    0    0    0    0    1    1    0 2924    0
22:07:19    0    4    0    0    0    0    0    0    0    0    0    0 3358    0
22:07:29    0    4    0    1    1    0    0    0    0    1    1    0 3302    0
22:07:39    0    4    0    2    2    0    0    0    0    2    2    0 3139    1
22:07:49    0    3    0    0    0    0    0    0    0    0    1    0 3364    0
22:07:59    0    4    0    3    3    0    0    0    0    3    2    0 3419    0
22:08:09    0    4    0    2    2    0    0    0    0    2    2    0 3138    0
22:08:19    0    4    0    2    2    0    0    0    0    2    2    0 3179    1
22:08:29    0    4    0    1    1    0    0    0    0    1    1    0 3208    1
22:08:39    1    6    2    3    3    0    0    0    0    3    0    0 3938   52

ANIR: ANI RESET
OFDM: OFDM LEVEL
CCKL: CCK LEVEL
SPUP: SPUR UP
SPDW: SPUR DOWN
OWD1: OFDM WS-DET ON
OWD0: OFDM WS-DET OFF
MRC1: MRC-CCK ON
MRC0: MRC-CCK OFF
FIRU: FIR-STEP UP
FIRD: FIR-STEP DOWN
INVL: INV LISTENTIME
OERR: OFDM ERRORS
CERR: CCK ERRORS
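
For others who want to collect the same kind of table, here is a sketch of how one row could be computed from two consecutive reads of the ani file. It is not the script that produced the table above. It assumes that the two LEVEL columns are current values and that all other columns are increases since the previous read, which is what the numbers above suggest. The function name ani_row is made up.

```shell
# Print one table row from two snapshots of the ath9k "ani" file.
# $1 = older snapshot, $2 = newer snapshot.
# LEVEL fields are printed as-is, all other counters as newer minus older.
ani_row() {
    awk -F': *' '
        NF < 2 { next }                              # ignore blank lines
        { key = $1; sub(/^ +/, "", key) }            # strip the leading padding
        key == "ANI" { next }                        # skip the ENABLED/DISABLED line
        NR == FNR { old[key] = $2; next }            # first file: remember values
        key ~ /LEVEL$/ { printf " %4d", $2; next }   # levels: current value
        { printf " %4d", $2 - old[key] }             # counters: increase
        END { print "" }
    ' "$1" "$2"
}
```

A sampling loop on the router could copy the ani file to a temporary file every 10 seconds and print the output of date +%H:%M:%S followed by ani_row for the previous and the current copy.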

Mon Mar 10 22:06:25 2014 daemon.info hostapd: wlan0: STA a0:f3:c1:f8:9b:e0 IEEE 802.11: disconnected due to excessive missing ACKs
Mon Mar 10 22:06:55 2014 daemon.info hostapd: wlan0: STA a0:f3:c1:f8:9b:e0 IEEE 802.11: deauthenticated due to inactivity (timer DEAUTH/REMOVE)

Second, I hit the "AP disaster" issue:

         ANIR OFDM CCKL SPUP SPDW OWD1 OWD0 MRC1 MRC0 FIRU FIRD INVL OERR CERR
22:39:51    0    4    0    2    2    0    0    0    0    2    2    0 1992  103
22:40:01    0    3    0    3    3    0    0    0    0    3    4    0 2332   94
22:40:11    1    5    2    5    5    0    0    0    0    5    2    0 2762  162
22:40:21    0    7    1    3    3    0    0    0    0    3    1    0 1799  127
22:40:32    0    4    0    1    1    0    0    0    0    1    4    0 1692  143
22:40:42    0    6    0    3    3    0    0    0    0    3    1    0 2219  244
22:40:52    0    5    0    1    1    0    0    0    0    1    2    0 1655  124
22:41:02    0    7    0    2    2    0    0    0    0    2    0    0 1990  182
22:41:12    0    7    0    1    1    0    0    0    0    1    1    0 1578  119
22:41:22    0    4    0    1    1    0    0    0    0    1    4    0 1585  107
22:41:32    0    4    0    1    1    0    0    0    0    1    1    0 3165    0
22:41:42    0    4    0    1    1    0    0    0    0    1    1    0 3149    1
22:41:52    0    4    0    2    2    0    0    0    0    2    2    0 3396    0
22:42:02    0    4    0    2    2    0    0    0    0    2    2    0 3369    1
22:42:12    0    5    0    4    4    0    0    0    0    4    3    0 3107    0
22:42:22    0    4    0    3    3    0    0    0    0    3    4    0 3335    0
22:42:32    0    3    0    2    2    0    0    0    0    2    3    0 3225    0
22:42:42    0    3    0    2    2    0    0    0    0    2    2    0 3508    1
22:42:52    0    4    0    3    3    0    0    0    0    3    2    0 3573    0
22:43:02    0    4    0    1    1    0    0    0    0    1    1    0 3036    0

Mon Mar 10 22:40:07 2014 kern.err kernel: [ 2701.660000] ath: phy0: Failed to stop TX DMA, queues=0x004!
Mon Mar 10 22:40:07 2014 kern.err kernel: [ 2701.680000] ath: phy0: DMA failed to stop in 10 ms AR_CR=0x00000024 AR_DIAG_SW=0x42000020 DMADBG_7=0x00028cc1
Mon Mar 10 22:40:07 2014 kern.err kernel: [ 2701.690000] ath: phy0: Could not stop RX, we could be confusing the DMA engine when we start RX up
Mon Mar 10 22:41:22 2014 daemon.info hostapd: wlan0: STA a0:f3:c1:f8:9b:e0 IEEE 802.11: disconnected due to excessive missing ACKs
Mon Mar 10 22:41:52 2014 daemon.info hostapd: wlan0: STA a0:f3:c1:f8:9b:e0 IEEE 802.11: deauthenticated due to inactivity (timer DEAUTH/REMOVE)

The counters and the messages are similar in both cases, but the user experience is very different.
I'm hunting for the "OFDM LEVEL zero" issue now.

comment:118 Changed 2 years ago by dap@…

Here is the "OFDM LEVEL zero" issue too. After 3 minutes the AP restored the service without manual intervention.

         ANIR OFDM CCKL SPUP SPDW OWD1 OWD0 MRC1 MRC0 FIRU FIRD INVL OERR CERR
00:35:50    0    5    0    1    1    0    0    0    0    1    2    0 1660  140
00:36:00    0    5    0    2    2    0    0    0    0    2    2    0 1683  155
00:36:11    0    5    0    2    2    0    0    0    0    2    2    0 1745  161
00:36:21    0    5    0    1    1    0    0    0    0    1    1    0 1821  171
00:36:31    0    5    0    1    1    0    0    0    0    1    1    0 1814  165
00:36:41    0    6    0    1    1    0    0    0    0    1    0    0 1786  138
00:36:51    0    6    0    2    2    0    0    0    0    2    2    0 1836  228
00:37:01    0    6    0    1    1    0    0    0    0    1    1    0 1687  138
00:37:11    0    5    0    0    0    0    0    0    0    0    1    0 1842  166
00:37:21    0    6    0    3    3    0    0    0    0    3    2    0 1930  211
00:37:31    0    6    0    0    0    0    0    0    0    0    0    0 1523  119
00:37:41    0    6    0    2    2    0    0    0    0    2    2    0 1611  124
00:37:51    0    6    0    2    2    1    1    0    0    3    3    0 1593  112
00:38:01    0    5    0    1    1    0    0    0    0    1    2    0 1674  171
00:38:11    0    6    0    3    3    0    0    0    0    3    2    0 2004  192
00:38:21    0    6    0    1    1    0    0    0    0    1    1    0 1951  128
00:38:31    0    6    0    2    2    0    0    0    0    2    2    0 1829  117
00:38:41    0    7    0    2    2    0    0    0    0    2    1    0 2125  146
00:38:51    0    5    0    0    0    0    0    0    0    0    2    0 1963  144
00:39:01    0    6    0    2    2    0    0    0    0    2    1    0 1851  135
00:39:11    0    6    0    1    1    0    0    0    0    1    1    0 1609  150
00:39:21    0    5    0    1    1    0    0    0    0    1    2    0 1793  139
00:39:32    0    5    0    2    2    0    0    0    0    2    2    0 1938  199
00:39:42    0    5    0    1    1    0    0    0    0    1    1    0 2092  184
00:39:52    0    6    0    1    1    0    0    0    0    1    0    0 1949  233
00:40:02    0    6    0    2    2    0    0    0    0    2    2    0 1807  156
00:40:12    0    6    0    1    1    0    0    0    0    1    1    0 1437  135
00:40:22    0    5    0    1    1    0    0    0    0    1    2    0 1814  128
00:40:32    0    5    0    1    1    0    0    0    0    1    1    0 1899  134
00:40:42    0    5    0    0    0    0    0    0    0    0    0    0 2047  216
00:40:52    0    5    0    2    2    0    0    0    0    2    2    0 1860  147
00:41:02    0    6    0    2    2    0    0    0    0    2    1    0 1853  195
00:41:12    0    5    0    1    1    0    0    0    0    1    2    0 1636  206
00:41:22    0    5    0    1    1    0    0    0    0    1    1    0 1906  209
00:41:32    0    6    0    2    2    0    0    0    0    2    1    0 1832  144
00:41:42    0    5    0    1    1    0    0    0    0    1    2    0 1597  193
00:41:52    0    6    0    1    1    0    0    0    0    1    0    0 1762  127
00:42:02    0    6    0    2    2    0    0    0    0    2    2    0 1914  156
00:42:12    0    6    0    1    1    0    0    0    0    1    1    0 1687  139
00:42:22    0    7    0    3    3    0    0    0    0    3    2    0 1937  160
00:42:32    0    7    0    1    1    1    1    0    0    2    2    0 1561  116
00:42:42    0    5    0    1    1    0    0    0    0    1    3    0 1602  128
00:42:52    0    0    0    1    1    0    0    0    0    1    5    0  766   52
00:43:02    0    0    0    0    0    0    0    0    0    0    0    0    0    0
00:43:12    0    0    0    0    0    0    0    0    0    0    0    0    0    0
00:43:22    0    0    0    0    0    0    0    0    0    0    0    0    0    0
00:43:32    0    0    0    0    0    0    0    0    0    0    0    0    0    0
00:43:42    0    0    0    0    0    0    0    0    0    0    0    0    0    0
00:43:53    0    0    0    0    0    0    0    0    0    0    0    0    0    0
[..only zeros..]
00:46:07    0    0    0    0    0    0    0    0    0    0    0    0    0    0
00:46:17    1    6    2    3    3    0    0    0    0    3    0    0 2876   12
[some clients are reconnected at this point, only light traffic, the download tcp stream still stalling]
00:46:27    0    7    1    2    2    0    0    0    0    2    1    0 2719    9
00:46:37    0    6    0    1    1    0    0    0    0    1    2    0 2920   10
00:46:47    0    4    0    0    0    0    0    0    0    0    2    0 2624   16
00:46:57    0    4    0    1    1    0    0    0    0    1    1    0 3190   28
00:47:07    0    4    0    0    0    0    0    0    0    0    0    0 2880   17
00:47:17    0    4    0    0    0    0    0    0    0    0    0    0 2979   26
00:47:27    0    4    0    3    3    0    0    0    0    3    3    0 3235   20
00:47:37    0    4    0    1    1    0    0    0    0    1    1    0 3115   22
00:47:47    0    6    0    2    2    0    0    0    0    2    0    0 3731   24
00:47:57    0    5    0    1    1    0    0    0    0    1    2    0 3014   16

Tue Mar 11 00:42:46 2014 daemon.info hostapd: wlan0: STA a0:f3:c1:f8:9b:e0 IEEE 802.11: disconnected due to excessive missing ACKs
Tue Mar 11 00:43:16 2014 daemon.info hostapd: wlan0: STA a0:f3:c1:f8:9b:e0 IEEE 802.11: deauthenticated due to inactivity (timer DEAUTH/REMOVE)

Now I doubt that there is any situation in which some clients are not affected. Maybe the ANI reset fooled me by restoring the service while I walked over to another client to check the network. So I think the "downloader stalling" issue is the same as the "AP disaster"; the only difference is the intervention of the ANI reset.

Anyway, this is definitely different.

comment:119 Changed 2 years ago by nbd

Got another one for you: http://nbd.name/950-ath9k_ani_test.patch
The stats are very useful, please also include them for the next round.

Thanks for testing!

comment:120 Changed 2 years ago by dap@…

Running r39865 with the patch applied; here is the first issue.

The download began stalling at 14:17:05. I was able to reconnect without ANI reset this time and did not get "disconnected due to excessive missing ACKs" message. This is not typical.

         ANIR OFDM CCKL SPUP SPDW OWD1 OWD0 MRC1 MRC0 FIRU FIRD INVL OERR CERR
14:07:50    0    6    0    1    1    0    0    0    0    1    0    0 2146  110
14:08:00    0    5    0    1    1    0    0    0    0    1    2    0 2017   92
14:08:10    0    6    0    3    3    0    0    0    0    3    2    0 2200   75
14:08:20    0    5    0    1    1    0    0    0    0    1    2    0 1860   93
14:08:30    0    5    0    1    1    0    0    0    0    1    1    0 2103   95
14:08:40    0    6    0    1    1    0    0    0    0    1    0    0 2342   80
14:08:50    0    4    0    0    0    0    0    0    0    0    2    0 1909   76
14:09:00    0    5    0    3    3    0    0    0    0    3    2    0 2263   81
14:09:10    0    5    0    0    0    0    0    0    0    0    0    0 1999   83
14:09:20    0    5    0    1    1    0    0    0    0    1    1    0 2032   86
14:09:30    0    5    0    1    1    0    0    0    0    1    1    0 1813   94
14:09:40    0    5    0    0    0    0    0    0    0    0    0    0 1612   91
14:09:50    0    5    0    1    1    0    0    0    0    1    1    0 1706  100
14:10:00    0    6    0    4    4    0    0    0    0    4    3    0 2113   71
14:10:10    0    5    0    2    2    0    0    0    0    2    3    0 2060   99
14:10:21    0    6    0    1    1    0    0    0    0    1    0    0 2143   74
14:10:31    0    5    0    2    2    0    0    0    0    2    3    0 1809   72
14:10:41    0    5    0    2    2    0    0    0    0    2    2    0 2330   69
14:10:51    0    5    0    1    1    0    0    0    0    1    1    0 1834   73
14:11:01    0    5    0    2    2    0    0    0    0    2    2    0 2041   67
14:11:11    0    5    0    0    0    0    0    0    0    0    0    0 2113   62
14:11:21    0    6    0    3    3    0    0    0    0    3    2    0 2311   85
14:11:31    0    6    0    2    2    1    1    0    0    3    3    0 2003   77
14:11:41    0    7    0    3    3    0    0    0    0    3    2    0 2539   95
14:11:51    0    5    0    2    2    0    0    0    0    2    4    0 2289   95
14:12:01    0    7    0    3    3    0    0    0    0    3    1    0 2932   90
14:12:11    0    6    0    4    4    0    0    0    0    4    5    0 2177   73
14:12:21    0    6    0    3    3    0    0    0    0    3    3    0 1999   78
14:12:31    0    6    0    2    2    0    0    0    0    2    2    0 1800   71
14:12:41    0    6    0    2    2    0    0    0    0    2    2    0 2114   82
14:12:51    0    4    0    2    2    0    0    0    0    2    4    0 1924   95
14:13:01    0    5    0    2    2    0    0    0    0    2    1    0 1984   84
14:13:11    0    5    0    0    0    0    0    0    0    0    0    0 1814  110
14:13:21    0    5    0    2    2    0    0    0    0    2    2    0 1887  103
14:13:31    0    6    0    4    4    0    0    0    0    4    3    0 2049   79
14:13:41    0    6    0    1    1    0    0    0    0    1    1    0 1760   77
14:13:51    0    5    0    3    3    0    0    0    0    3    4    0 2237   84
14:14:02    0    5    0    2    2    0    0    0    0    2    2    0 1756   70
14:14:12    0    4    0    2    2    0    0    0    0    2    3    0 2183   99
14:14:22    0    5    0    2    2    0    0    0    0    2    1    0 2172   75
14:14:32    0    6    0    3    3    0    0    0    0    3    2    0 2099   80
14:14:42    0    5    0    0    0    0    0    0    0    0    1    0 1882   77
14:14:52    0    6    0    2    2    0    0    0    0    2    1    0 2365   69
14:15:02    0    6    0    3    3    0    0    0    0    3    3    0 1890   55
14:15:12    0    5    0    1    1    0    0    0    0    1    2    0 1905   79
14:15:22    0    5    0    1    1    0    0    0    0    1    1    0 1647   81
14:15:32    0    5    0    1    1    0    0    0    0    1    1    0 2247   77
14:15:42    0    5    0    3    3    0    0    0    0    3    3    0 1955   67
14:15:52    0    5    0    3    3    0    0    0    0    3    3    0 2220   78
14:16:02    0    6    0    3    3    0    0    0    0    3    2    0 2192   65
14:16:12    0    6    0    3    3    0    0    0    0    3    3    0 2244   68
14:16:22    0    5    0    1    1    0    0    0    0    1    2    0 2063   79
14:16:32    0    5    0    1    1    0    0    0    0    1    1    0 2144   85
14:16:42    0    5    0    2    2    0    0    0    0    2    2    0 2168   78
14:16:52    0    5    0    2    2    0    0    0    0    2    2    0 1910   83
14:17:02    0    5    0    2    2    0    0    0    0    2    2    0 2439   51
14:17:12    0    5    0    1    1    0    0    0    0    1    1    0 3052    3
14:17:22    0    4    0    0    0    0    0    0    0    0    1    0 3083   10
14:17:32    0    5    0    1    1    0    0    0    0    1    0    0 3033    8
14:17:42    0    5    0    1    1    0    0    0    0    1    1    0 3152    3
14:17:52    0    5    0    0    0    0    0    0    0    0    0    0 3001    8
14:18:02    0    5    0    2    2    0    0    0    0    2    2    0 3347    5
14:18:12    0    5    0    1    1    0    0    0    0    1    1    0 3115   14
14:18:22    0    5    0    1    1    0    0    0    0    1    1    0 3208    6
14:18:32    0    5    0    0    0    0    0    0    0    0    0    0 2954   14
14:18:42    0    5    0    2    2    0    0    0    0    2    2    0 3527   11
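
A rough sketch for locating that transition in a saved copy of the table: it prints the first sample whose CERR value falls below a limit after at least one sample at or above it. The 15-column layout is taken from the table above; the function name and the default limit of 20 are assumptions chosen for illustration, not part of any OpenWrt tool.

```shell
# Hypothetical helper: read an ANI counter table on stdin
# (columns: time ANIR OFDM CCKL ... OERR CERR, as in this ticket)
# and print "time OERR CERR" for the first sample where CERR drops
# below the limit after having been at or above it.
ani_cerr_drop() {
    awk -v limit="${1:-20}" '
        # Skip the header and anything that is not a 15-field sample row.
        NF != 15 || $1 !~ /^[0-9][0-9]:[0-9][0-9]:[0-9][0-9]$/ { next }
        ($15 + 0) >= (limit + 0) { seen = 1; next }
        seen { print $1, $14, $15; exit }
    '
}
```

Fed the table above with the default limit, this should report the 14:17:12 sample, which matches the stall observed at about 14:17:05.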

And there's a kernel message:

Tue Mar 11 14:08:59 2014 kern.warn kernel: [55094.520000] ------------[ cut here ]------------
Tue Mar 11 14:08:59 2014 kern.warn kernel: [55094.530000] WARNING: at /usr/src/openwrt/trunk/build_dir/target-mips_34kc_uClibc-0.9.33.2/linux-ar71xx_generic/compat-wireless-2014-01-23.1/net/mac80211/rx.c:3397 mac80211_ieee80211_rx+0x13c/0x818 [mac80211]()
Tue Mar 11 14:08:59 2014 kern.warn kernel: [55094.550000] Rate marked as an HT rate but passed status->rate_idx is not an MCS index [0-76]: 101 (0x65)
Tue Mar 11 14:08:59 2014 kern.warn kernel: [55094.560000] Modules linked in: ath9k ath9k_common ath9k_hw ath pppoe ppp_async iptable_nat pppox ppp_generic nf_nat_ipv4 nf_conntrack_netlink nf_conntrack_ipv4 mac80211 ipt_MASQUERADE cfg80211 xt_time xt_tcpudp xt_state xt_nat xt_multiport xt_mark xt_mac xt_limit xt_conntrack xt_comment xt_TCPMSS xt_REDIRECT xt_LOG xt_CT slhc nfnetlink nf_nat_irc nf_nat_ftp nf_nat nf_defrag_ipv4 nf_conntrack_irc nf_conntrack_ftp iptable_raw iptable_mangle iptable_filter ipt_REJ
Tue Mar 11 14:08:59 2014 kern.warn kernel: [55094.640000] CPU: 0 PID: 3 Comm: ksoftirqd/0 Tainted: G        W    3.10.32 #1
Tue Mar 11 14:08:59 2014 kern.warn kernel: [55094.650000] Stack : 00000000 00000000 00000000 00000000 80372e7a 00000041 81828a08 80dea010
Tue Mar 11 14:08:59 2014 kern.warn kernel: [55094.650000]         802d2600 803213db 00000003 80372628 81828a08 80dea010 80f05d74 00000014
Tue Mar 11 14:08:59 2014 kern.warn kernel: [55094.650000]         00000018 80078fb0 00000003 800769c0 80cc9688 80dea010 802d3ec0 8183bc64
Tue Mar 11 14:08:59 2014 kern.warn kernel: [55094.650000]         00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
Tue Mar 11 14:08:59 2014 kern.warn kernel: [55094.650000]         00000000 00000000 00000000 00000000 00000000 00000000 00000000 8183bbf0
Tue Mar 11 14:08:59 2014 kern.warn kernel: [55094.650000]         ...
Tue Mar 11 14:08:59 2014 kern.warn kernel: [55094.690000] Call Trace:
Tue Mar 11 14:08:59 2014 kern.warn kernel: [55094.690000] [<8006e278>] show_stack+0x48/0x70
Tue Mar 11 14:08:59 2014 kern.warn kernel: [55094.690000] [<80076b30>] warn_slowpath_common+0x78/0xa8
Tue Mar 11 14:08:59 2014 kern.warn kernel: [55094.700000] [<80076b8c>] warn_slowpath_fmt+0x2c/0x38
Tue Mar 11 14:08:59 2014 kern.warn kernel: [55094.700000] [<80c9f388>] mac80211_ieee80211_rx+0x13c/0x818 [mac80211]
Tue Mar 11 14:08:59 2014 kern.warn kernel: [55094.710000] [<80e46c74>] ath_rx_tasklet+0xcc0/0xda8 [ath9k]
Tue Mar 11 14:08:59 2014 kern.warn kernel: [55094.720000] [<80e44214>] ath9k_tasklet+0x1ac/0x230 [ath9k]
Tue Mar 11 14:08:59 2014 kern.warn kernel: [55094.720000] [<8007e0ac>] tasklet_action+0x84/0xcc
Tue Mar 11 14:08:59 2014 kern.warn kernel: [55094.730000] [<8007d8ac>] __do_softirq+0xd0/0x1b8
Tue Mar 11 14:08:59 2014 kern.warn kernel: [55094.730000] [<8007d9c0>] run_ksoftirqd+0x2c/0x58
Tue Mar 11 14:08:59 2014 kern.warn kernel: [55094.740000] [<80099d40>] smpboot_thread_fn+0x134/0x164
Tue Mar 11 14:08:59 2014 kern.warn kernel: [55094.740000] [<80092e9c>] kthread+0xb0/0xb8
Tue Mar 11 14:08:59 2014 kern.warn kernel: [55094.740000] [<80060878>] ret_from_kernel_thread+0x14/0x1c
Tue Mar 11 14:08:59 2014 kern.warn kernel: [55094.750000] 
Tue Mar 11 14:08:59 2014 kern.warn kernel: [55094.750000] ---[ end trace 8c56e57a4320c6d8 ]---

comment:121 Changed 2 years ago by nbd

The fact that it doesn't get stuck anymore sounds like progress to me.

Here's another patch: http://nbd.name/951-rifs_test.patch (keep the last one in your tree)
Increasing the PHY search delay should hopefully reduce the number of errors as well.

comment:122 Changed 2 years ago by dap@…

Yes, it's definitely progress! Before the patch I was able to reproduce an issue in a few minutes, but now I have had only one problem in hours of testing (and possibly it isn't ANI-related; I have not tested this version without ANI for as long as I have now tested it with ANI).

I'm applying the new patch and restarting the test.

comment:123 Changed 2 years ago by dap@…

ANI counters with 951-rifs_test.patch:

         ANIR OFDM CCKL SPUP SPDW OWD1 OWD0 MRC1 MRC0 FIRU FIRD INVL OERR CERR
16:15:01    0    5    0    1    1    0    0    0    0    1    1    0 2620  123
16:15:11    0    4    0    1    1    0    0    0    0    1    2    0 1405  209
16:15:21    0    5    0    2    2    0    0    0    0    2    1    0 2119  259
16:15:31    1    4    2    3    3    0    0    0    0    3    6    0 2089  119
16:15:41    0    3    0    1    1    0    0    0    0    0    1    0 1562  128
16:15:51    0    3    0    1    1    0    0    0    0    0    0    0 1763   99
16:16:01    0    3    0    0    0    0    0    0    0    0    0    0 2080  127
16:16:11    0    2    0    0    0    0    0    0    0    0    0    0 1799  109
16:16:21    0    1    0    2    2    0    0    0    0    0    1    0 2115  126
16:16:32    0    2    0    2    2    0    0    0    0    1    0    0 2316   98
16:16:42    0    5    0    5    5    0    0    0    0    4    2    0 2727  278
16:16:52    0    6    0    1    1    0    0    0    0    1    0    0 2026  230
16:17:02    0    4    0    1    1    0    0    0    0    1    3    0 1692  236
16:17:12    0    6    0    3    3    0    0    0    0    3    1    0 2044  185
16:17:22    0    5    0    1    1    0    0    0    0    1    2    0 1688  133
16:17:32    0    5    0    1    1    0    0    0    0    1    1    0 1347  189
16:17:42    0    5    0    1    1    0    0    0    0    1    1    0 2187  178

I can't see a big difference; performance is good with both versions.

comment:124 Changed 2 years ago by dap@…

Aie, "OFDM LEVEL zero" is back:

         ANIR OFDM CCKL SPUP SPDW OWD1 OWD0 MRC1 MRC0 FIRU FIRD INVL OERR CERR
16:38:34    0    5    0    3    3    0    0    0    0    3    3    0 1591  104
16:38:44    0    5    0    0    0    0    0    0    0    0    0    0 1804  135
16:38:54    0    6    0    2    2    0    0    0    0    2    1    0 1808  112
16:39:04    0    5    0    1    1    0    0    0    0    1    2    0 1765  100
16:39:14    0    5    0    2    2    0    0    0    0    2    2    0 1710  126
16:39:24    0    5    0    1    1    0    0    0    0    1    1    0 1708  105
16:39:34    0    4    0    0    0    0    0    0    0    0    1    0 1847  102
16:39:44    0    5    0    2    2    0    0    0    0    2    1    0 1742  113
16:39:54    0    5    0    2    2    0    0    0    0    2    2    0 1750   87
16:40:04    0    5    0    2    2    0    0    0    0    2    2    0 1762  112
16:40:15    0    6    0    3    3    0    0    0    0    3    2    0 1743  122
16:40:25    0    5    0    1    1    0    0    0    0    1    2    0 1711  147
16:40:35    0    6    0    2    2    0    0    0    0    2    1    0 1810  105
16:40:45    0    5    0    2    2    0    0    0    0    2    3    0 1618  126
16:40:55    0    5    0    3    3    0    0    0    0    3    3    0 1727   89
16:41:05    0    5    0    0    0    0    0    0    0    0    0    0 1748  102
16:41:15    0    5    0    0    0    0    0    0    0    0    0    0 1864  103
16:41:25    0    5    0    0    0    0    0    0    0    0    0    0 1851  100
16:41:35    0    5    0    1    1    0    0    0    0    1    1    0 1671   89
16:41:45    0    4    0    1    1    0    0    0    0    1    2    0 1688  130
16:41:55    0    5    0    3    3    0    0    0    0    3    2    0 1836  144
16:42:05    0    5    0    1    1    0    0    0    0    1    1    0 1567  118
16:42:15    0    4    0    2    2    0    0    0    0    2    3    0 1737  123
16:42:25    0    5    0    3    3    0    0    0    0    3    2    0 1915  154
16:42:35    0    5    0    2    2    0    0    0    0    2    2    0 1872  124
16:42:45    0    5    0    2    2    0    0    0    0    2    2    0 1652  196
16:42:55    0    5    0    1    1    0    0    0    0    1    1    0 1619  155
16:43:05    0    5    0    1    1    0    0    0    0    1    1    0 1660  103
16:43:15    0    5    0    1    1    0    0    0    0    1    1    0 1479  115
16:43:25    0    4    0    0    0    0    0    0    0    0    1    0 1526  135
16:43:36    0    5    0    3    3    0    0    0    0    3    2    0 1738  137
16:43:46    0    5    0    2    2    0    0    0    0    2    2    0 1533  112
16:43:56    0    4    0    3    3    0    0    0    0    3    4    0 1888  142
16:44:06    0    4    0    4    4    0    0    0    0    4    4    0 1719  123
16:44:16    0    4    0    4    4    0    0    0    0    4    4    0 1917  153
16:44:26    0    4    0    3    3    0    0    0    0    3    3    0 1805  143
16:44:36    0    4    0    3    3    0    0    0    0    3    3    0 2229  126
16:44:46    0    4    0    4    4    0    0    0    0    4    4    0 1676  182
16:44:56    0    5    0    5    5    0    0    0    0    5    4    0 2128  137
16:45:06    0    4    0    3    3    0    0    0    0    3    4    0 1974  150
16:45:16    0    5    0    4    4    0    0    0    0    4    3    0 2007  149
16:45:26    0    0    0    0    0    0    0    0    0    0    4    0 1068  103
16:45:36    0    0    0    0    0    0    0    0    0    0    0    0 1030  110
16:45:46    0    0    0    0    0    0    0    0    0    0    0    0  863  104
16:45:56    0    0    0    0    0    0    0    0    0    0    0    0  918  133
16:46:06    0    0    0    0    0    0    0    0    0    0    0    0  956   92
16:46:16    0    0    0    0    0    0    0    0    0    0    0    0 1009  109
16:46:26    0    0    0    0    0    0    0    0    0    0    0    0 1282  114
16:46:36    0    0    0    0    0    0    0    0    0    0    0    0 1382   90
16:46:46    0    0    0    0    0    0    0    0    0    0    0    0 1240  101
16:46:56    0    0    0    0    0    0    0    0    0    0    0    0 1333  105
16:47:06    0    0    0    0    0    0    0    0    0    0    0    0 1224   78
16:47:17    0    0    0    0    0    0    0    0    0    0    0    0 1297  118
16:47:27    0    0    0    0    0    0    0    0    0    0    0    0 1358   98
16:47:37    0    0    0    0    0    0    0    0    0    0    0    0 1340   83
16:47:47    0    0    0    0    0    0    0    0    0    0    0    0 1287  103
16:47:57    0    0    0    0    0    0    0    0    0    0    0    0 1293  117
16:48:07    0    0    0    0    0    0    0    0    0    0    0    0 1331  108
16:48:17    0    0    0    0    0    0    0    0    0    0    0    0 1128  108
16:48:27    0    0    0    0    0    0    0    0    0    0    0    0 1295   93
16:48:37    0    1    0    1    1    0    0    0    0    1    0    0 1462   69
16:48:47    0    0    0    0    0    0    0    0    0    0    1    0 1118   81
16:48:57    0    0    0    0    0    0    0    0    0    0    0    0 1235   91
16:49:07    0    0    0    0    0    0    0    0    0    0    0    0 1308   77
16:49:17    0    0    0    0    0    0    0    0    0    0    0    0 1345   84
16:49:27    0    0    0    0    0    0    0    0    0    0    0    0 1432  100
16:49:37    0    0    0    0    0    0    0    0    0    0    0    0 1022   64
16:49:47    0    0    0    0    0    0    0    0    0    0    0    0    0    0
16:49:57    0    0    0    0    0    0    0    0    0    0    0    0    0    0
16:50:07    0    0    0    0    0    0    0    0    0    0    0    0    0    0
16:50:17    0    0    0    0    0    0    0    0    0    0    0    0    0    0
16:50:27    0    0    0    0    0    0    0    0    0    0    0    0    0    0
16:50:37    0    0    0    0    0    0    0    0    0    0    0    0    0    0
16:50:47    0    0    0    0    0    0    0    0    0    0    0    0    0    0
16:50:57    0    0    0    0    0    0    0    0    0    0    0    0    0    0
16:51:07    0    0    0    0    0    0    0    0    0    0    0    0    0    0
16:51:17    0    0    0    0    0    0    0    0    0    0    0    0    0    0
16:51:27    0    0    0    0    0    0    0    0    0    0    0    0    0    0
16:51:37    0    0    0    0    0    0    0    0    0    0    0    0    0    0
16:51:47    0    0    0    0    0    0    0    0    0    0    0    0    0    0
16:51:57    0    0    0    0    0    0    0    0    0    0    0    0    0    0
16:52:07    0    0    0    0    0    0    0    0    0    0    0    0    0    0
16:52:17    0    0    0    0    0    0    0    0    0    0    0    0    0    0
16:52:27    0    0    0    0    0    0    0    0    0    0    0    0    0    0
16:52:37    0    0    0    0    0    0    0    0    0    0    0    0    0    0
16:52:47    0    0    0    0    0    0    0    0    0    0    0    0    0    0
16:52:57    0    0    0    0    0    0    0    0    0    0    0    0    0    0
16:53:08    0    0    0    0    0    0    0    0    0    0    0    0    0    0
16:53:18    0    0    0    0    0    0    0    0    0    0    0    0    0    0
16:53:28    0    0    0    0    0    0    0    0    0    0    0    0    0    0

Tue Mar 11 16:49:36 2014 daemon.info hostapd: wlan2: STA a0:f3:c1:f8:9b:e0 IEEE 802.11: disconnected due to excessive missing ACKs
Tue Mar 11 16:49:53 2014 daemon.info hostapd: wlan2: STA 7c:d1:c3:6d:16:e6 IEEE 802.11: disconnected due to excessive missing ACKs
Tue Mar 11 16:50:06 2014 daemon.info hostapd: wlan2: STA a0:f3:c1:f8:9b:e0 IEEE 802.11: deauthenticated due to inactivity (timer DEAUTH/REMOVE)
Tue Mar 11 16:50:23 2014 daemon.info hostapd: wlan2: STA 7c:d1:c3:6d:16:e6 IEEE 802.11: deauthenticated due to inactivity (timer DEAUTH/REMOVE)

I couldn't wait any longer, so I issued the 'wifi' command, which restored the AP.
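
A minimal sketch for spotting the two transitions in a saved copy of a table like the one above: the first sample where the OFDM column reads 0, and the first sample where OERR and CERR both read 0. It assumes the 15-column layout shown in this ticket; the function name is hypothetical.

```shell
# Hypothetical helper: read an ANI counter table on stdin
# (columns: time ANIR OFDM CCKL ... OERR CERR, as in this ticket).
ani_first_zero() {
    awk '
        # Skip the header and anything that is not a 15-field sample row.
        NF != 15 || $1 !~ /^[0-9][0-9]:[0-9][0-9]:[0-9][0-9]$/ { next }
        # First sample with the OFDM column at 0.
        !ofdm && $3 == 0 { print "ofdm_zero " $1; ofdm = 1 }
        # First sample with both error counters at 0.
        !errs && $14 == 0 && $15 == 0 { print "errors_zero " $1; errs = 1 }
    '
}
```

For the table above this should report 16:45:26 for the OFDM column and 16:49:47 for the error counters, which brackets the "excessive missing ACKs" disconnect logged at 16:49:36.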

comment:125 Changed 2 years ago by pedro@…

I seem to be hitting this issue on 12.09 on a WDR4300. Adding a comment here as Trac account registration seems to be broken and there doesn't seem to be a way to just subscribe to the bug report. Sorry for the spam.

comment:126 follow-up: Changed 2 years ago by nbd

please try copying http://nbd.name/950-test3.patch to package/kernel/mac80211/patches, rebuild and test again.

comment:127 Changed 2 years ago by nbd

  • Resolution set to no_response
  • Status changed from reopened to closed

comment:128 Changed 2 years ago by anonymous

I have the same issue with Firmware Version OpenWrt Barrier Breaker r40887 on my TL-WA901N/ND v2.

Has somebody found a solution?

Falcon83

comment:129 Changed 23 months ago by anonymous

Try BB r41156 or later. Works for me on TL-WA901N/ND v2. See #15320

comment:130 Changed 22 months ago by thelists@…

Can confirm still broken for me; I just experienced this problem again on my TL-WDR4300 on BB r41391.

I only recently found this bug report, but have experienced this on the same hardware with an older revision of BB.

comment:131 Changed 22 months ago by nbd

Instead of saying "this problem", please describe the *exact* symptoms you're seeing. What you're experiencing is most likely a different bug, because you're using a chipset that is quite different from the one in the WRT160NL

comment:132 Changed 22 months ago by thelists@…

I arrived here searching for ar71xx (from my tl-wdr4300) and issues with wireless. I'm experiencing wireless-only connection issues (significant packet loss) after the router has been up for more than 24 to 48 hours. The period of time after which the problem manifests is not constant, and a restart resolves the issue.

I apologize for any confusion, or that I may be posting in the wrong location. It just seemed like the same, or very similar, issue.

comment:133 Changed 21 months ago by jow

  • Milestone changed from Attitude Adjustment 12.09 to Barrier Breaker 14.07

Milestone Attitude Adjustment 12.09 deleted

comment:134 in reply to: ↑ 126 ; follow-up: Changed 20 months ago by dap@…

Replying to nbd:

please try copying http://nbd.name/950-test3.patch to package/kernel/mac80211/patches, rebuild and test again.

Sorry, I lost interest: the workaround has worked well and I got a little tired of this long-standing bug.

Now that I have some spare time, I'm testing r42516;
everything has been OK in the first few hours, even under stress.

I'll report in a few days. I hope you can close this ticket as "fixed".

comment:135 in reply to: ↑ 134 Changed 20 months ago by dap@…

Replying to dap@…:

Replying to nbd:

please try copying http://nbd.name/950-test3.patch to package/kernel/mac80211/patches, rebuild and test again.

Now that I have some spare time, I'm testing r42516;

Stability is currently OK, but the performance issue I mentioned in comment:108 is still there. It hit the AP after about 24h of uptime.

ANI stat under degraded performance when clients are idle:

         ANIR OFDM CCKL SPUP SPDW OWD1 OWD0 MRC1 MRC0 FIRU FIRD INVL OERR CERR
14:22:17    0    7    2    1    1    0    0    0    0    1    1    0 2859    5
14:22:27    0    6    2    1    1    0    0    0    0    1    2    0 2760    3
14:22:37    0    6    2    1    1    0    0    0    0    1    1    0 3310    6
14:22:47    0    6    2    1    1    0    0    0    0    1    1    0 3028   10
14:22:57    0    7    2    1    1    0    0    0    0    1    0    0 3120    3
14:23:07    0    7    2    1    1    0    0    0    0    1    1    0 3021   18
14:23:17    0    6    2    0    0    1    1    0    0    1    2    0 2218    4
14:23:27    0    7    2    1    1    0    0    0    0    1    0    0 3234    6

ANI stat under degraded performance when a client is downloading:

         ANIR OFDM CCKL SPUP SPDW OWD1 OWD0 MRC1 MRC0 FIRU FIRD INVL OERR CERR
14:26:48    0    6    2    0    0    0    0    0    0    0    1    0 1666   58
14:26:58    0    7    2    2    2    0    0    0    0    2    1    0 1866   58
14:27:08    0    7    2    0    0    1    1    0    0    1    1    0 1903   55
14:27:18    0    7    2    0    0    3    3    0    0    3    3    0 1386   33
14:27:28    0    6    2    1    1    0    0    0    0    1    2    0 1721   45
14:27:38    0    7    2    2    2    0    0    0    0    2    1    0 2266   40
14:27:48    0    5    2    1    1    0    0    0    0    1    3    0 1755   41
14:27:58    0    5    2    2    2    0    0    0    0    2    2    0 1915   39

Here I issued the "wifi" command; normal performance was restored immediately.

ANI stat when clients are idle:

         ANIR OFDM CCKL SPUP SPDW OWD1 OWD0 MRC1 MRC0 FIRU FIRD INVL OERR CERR
14:39:10    0    8    2    0    0    0    0    0    0    1    1    0 2899    5
14:39:20    0    6    2    0    0    1    0    0    0    0    2    0 2899    4
14:39:30    0    9    2    1    1    0    1    0    0    3    0    0 3413    9
14:39:40    0    7    2    0    0    1    0    0    0    0    2    0 2708   15
14:39:50    0    9    2    0    0    0    1    0    0    2    0    0 3557   18
14:40:00    0    9    2    0    0    0    0    0    0    1    1    0 3190    7
14:40:10    0    7    2    0    0    1    0    0    0    0    2    0 2598    7
14:40:20    0    9    2    0    0    0    1    0    0    2    0    0 3099   43

ANI stat when a client is downloading at high speed:

         ANIR OFDM CCKL SPUP SPDW OWD1 OWD0 MRC1 MRC0 FIRU FIRD INVL OERR CERR
14:36:22    0    7    2    0    0    1    0    0    0    0    1    0 2897   13
14:36:32    0    7    2    0    0    1    1    0    0    1    1    0 2293   42
14:36:42    0    9    2    0    0    0    1    0    0    2    0    0 1766   92
14:36:52    0    9    2    0    0    0    0    0    0    0    0    0 1370   84
14:37:02    0    8    2    0    0    0    0    0    0    0    1    0 1550   88
14:37:12    0    9    2    0    0    0    0    0    0    1    0    0 1581   92
14:37:22    0    9    2    0    0    0    0    0    0    1    1    0 1287   96
14:37:32    0    9    2    0    0    0    0    0    0    0    0    0 1495   84

The performance issue is noticeable during normal web browsing too.

Good to see the improvement in stability, but this performance issue is still a significant problem.

comment:136 Changed 16 months ago by anonymous

  • Resolution no_response deleted
  • Status changed from closed to reopened

I can't disable ANI.
It says there is no such directory for:

/sys/kernel/debug/ieee80211/phy0/ath9k/ani

Should I build OpenWrt from scratch with ath9k debugging on?

comment:137 Changed 15 months ago by italovalcy@…

Same problem with WR842NDv2 (chipset Atheros AR9341). I'm using OpenWrt BB (kernel 3.10.49) and wifi in Ad-hoc mode. The problem is transmission errors:

prompt# iwconfig wlan0
wlan0     IEEE 802.11bgn  ESSID:"foobar"  
          Mode:Ad-Hoc  Frequency:2.412 GHz  Cell: 06:DA:85:6E:97:4B   
          Tx-Power=18 dBm   
          RTS thr:off   Fragment thr:off
          Encryption key:off
          Power Management:off
prompt# iw dev wlan0 station dump
Station a0:f3:c1:0e:16:c3 (on wlan0)
	inactive time:	50 ms
	rx bytes:	548427
	rx packets:	12661
	tx bytes:	45830
	tx packets:	455
	tx retries:	134
	tx failed:	363
	signal:  	-44 [-46, -49] dBm
	signal avg:	-44 [-46, -48] dBm
	tx bitrate:	24.0 MBit/s
	rx bitrate:	54.0 MBit/s
	authorized:	yes
	authenticated:	yes
	preamble:	long
	WMM/WME:	yes
	MFP:		no
	TDLS peer:	no
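
The failure rate in a dump like the one above can be summarized with a short sketch. It only reads the `tx packets`, `tx retries` and `tx failed` lines as printed by `iw`; the function name is hypothetical, and reporting failures as a percentage of `tx packets` is an assumption about how to read the counters, not something `iw` defines.

```shell
# Hypothetical helper: summarize per-station TX failures from
# `iw dev <iface> station dump` output supplied on stdin.
station_tx_summary() {
    awk '
        # Print one summary line for the station collected so far.
        function report() {
            if (mac != "" && pkts > 0)
                printf "%s failed=%d/%d (%.1f%%) retries=%d\n", mac, failed, pkts, 100 * failed / pkts, retries
        }
        $1 == "Station" { report(); mac = $2; pkts = failed = retries = 0 }
        $1 == "tx" && $2 == "packets:" { pkts = $3 }
        $1 == "tx" && $2 == "retries:" { retries = $3 }
        $1 == "tx" && $2 == "failed:"  { failed = $3 }
        END { report() }
    '
}
```

Usage would be `iw dev wlan0 station dump | station_tx_summary`; for the station above it reports 363 failed out of 455 tx packets, roughly 80%.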

If I run a tcpdump on that interface, I can see the kernel answer, but the packet is not transmitted, as I cannot see it on the other device. I have tried the workaround (echo 0 > /sys/kernel/debug/ieee80211/phy0/ath9k/ani), but it didn't work for me.

Do you have any other ideas?

Thanks.

comment:138 Changed 14 months ago by italovalcy@…

Hello everyone,

I've tried the recent kernel (3.18.7) from OpenWrt trunk, and the packet loss was solved! I built the image following the OpenWrt doc [1].

[1] http://wiki.openwrt.org/doc/howto/build

comment:139 Changed 14 months ago by TaiSHi

I tried 3.18.7 and it failed after 36 hours. Initially speed dropped to 1mb/s and then failed completely.

comment:140 Changed 14 months ago by TaiSHi

Also, I can't seem to disable ANI; it returns "permission denied" when I try to echo to the file.

comment:141 follow-up: Changed 14 months ago by nbd

please try r44696

comment:142 in reply to: ↑ 141 Changed 14 months ago by dap@…

Replying to nbd:

please try r44696

I'm testing it. I'll report in a few days.

comment:143 Changed 14 months ago by dap@…

The performance problem is still there.

ANI stat under degraded performance:

         ANIR OFDM CCKL SPUP SPDW OWD1 OWD0 MRC1 MRC0 FIRU FIRD INVL OERR CERR
20:22:10    0    5    2    2    2    0    0    0    0    2    1    0 1429   89
20:22:20    0    4    2    1    1    0    0    0    0    1    2    0 1334   89
20:22:30    0    5    2    3    3    0    0    0    0    3    2    0 1527  111
20:22:40    0    5    2    1    1    0    0    0    0    1    1    0 1450   98
20:22:50    0    4    2    1    1    0    0    0    0    1    2    0 1536  116
20:23:00    0    4    2    0    0    0    0    0    0    0    0    0 1792  142
20:23:10    0    5    2    1    1    0    0    0    0    1    0    0 1559  124
20:23:20    0    4    2    0    0    0    0    0    0    0    1    0 1568   98
20:23:30    0    5    2    1    1    0    0    0    0    1    0    0 1585  127
20:23:40    0    4    2    2    2    0    0    0    0    2    3    0 1463  118

The "wifi" command restored normal AP performance. ANI stat while downloading:

         ANIR OFDM CCKL SPUP SPDW OWD1 OWD0 MRC1 MRC0 FIRU FIRD INVL OERR CERR
20:28:12    0    5    2    0    0    0    0    0    0    0    0    0  586  161
20:28:22    0    5    2    0    0    0    0    0    0    0    0    0  691  148
20:28:32    0    5    2    0    0    0    0    0    0    0    0    0  674  158
20:28:42    0    5    2    0    0    0    0    0    0    0    0    0  660  143
20:28:52    0    5    2    1    1    0    0    0    0    1    1    0  915  123
20:29:02    0    5    2    0    0    0    0    0    0    0    0    0  670  156
20:29:12    0    5    2    1    1    0    0    0    0    1    1    0  791  140
20:29:22    0    6    2    1    1    0    0    0    0    1    0    0 1044  155
20:29:32    0    6    2    1    1    0    0    0    0    1    1    0  702  137

The ANI stats look very similar to me. Should I keep posting them, or do they not help anymore?

Now I'm testing with ANI disabled to be sure this is still an ANI-related issue.

comment:144 Changed 13 months ago by dap@…

I have serious performance issues on r44696 without ANI too. I have not found a stable and high-performance setup yet. I have no idea what's going on, but it's definitely not (just) an ANI issue anymore.

comment:145 Changed 13 months ago by dap@…

I've identified 2 independent performance problems:

  1. r44655 introduced a 100% reproducible regression: the signal level is lower than with r44654, and throughput dropped by 50% in the same environment.
  2. The ANI issue that I have described above.

The r44655 regression is independent of this ticket.

comment:146 Changed 12 months ago by dap@…

I've been using r44654 for 6 weeks and it seems this issue has been resolved in this version. I did not find what exactly fixed this problem, but it surely happened somewhere between r42516 and r44654.

I still have some problems, but that is another issue, another ticket. This one is resolved in r44654. Thank you!

comment:147 Changed 11 months ago by nbd

  • Resolution set to fixed
  • Status changed from reopened to closed

comment:148 follow-up: Changed 11 months ago by taishi@…

I'm using r44685 and have this issue, did you happen to test newer builds as well as r44654?

comment:149 in reply to: ↑ 148 Changed 11 months ago by dap@…

Replying to taishi@…:

I'm using r44685 and have this issue, did you happen to test newer builds as well as r44654?

r44655 introduced a serious regression (see comment:145) but r45743 is OK again for me (uptime: 10 days). Try r44654 or something after r45743.

comment:150 follow-up: Changed 11 months ago by taishi@…

r45884 failed after 48 hours or so. I'll build r44654 and see how it performs.

comment:151 Changed 11 months ago by taishi@…

r44654 failed after <20 hours with ANI enabled. I have now done a factory reset and disabled ANI; let's see how it goes.

comment:152 in reply to: ↑ 150 ; follow-up: Changed 10 months ago by dap@…

Replying to taishi@…:

r45884 failed after 48 hours or so.

Damn, you're right. The breakdown is much rarer than before; maybe something changed in my environment, but yes, it's still there. :(

Last time the following command fixed it instantly; I did not have to restart anything:

echo 0 >/sys/kernel/debug/ieee80211/phy0/ath9k/ani
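
Since comment:136 and comment:140 report the file being absent or not writable, here is a guarded sketch of the same write. The path is the one quoted in this ticket; `phy0` is an assumption and may differ per radio, and the function name and exit codes are hypothetical. The hints in the error messages are guesses at likely causes, not a diagnosis.

```shell
# Hypothetical helper: write 0 to the ath9k ANI debugfs file, with checks.
# An alternative path can be passed as the first argument.
disable_ani() {
    ani_file="${1:-/sys/kernel/debug/ieee80211/phy0/ath9k/ani}"
    if [ ! -e "$ani_file" ]; then
        # Possible causes: debugfs not mounted, or ath9k built without debugfs support.
        echo "missing: $ani_file" >&2
        return 1
    fi
    if [ ! -w "$ani_file" ]; then
        # Possible cause: not running as root.
        echo "not writable: $ani_file" >&2
        return 2
    fi
    echo 0 > "$ani_file"
}
```

Calling `disable_ani` with no argument performs the same write as the command above, but fails with a message instead of a bare shell error when the file is not there.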

comment:153 in reply to: ↑ 152 Changed 10 months ago by taishi@…

Replying to dap@…:

Last time the following command fixed it instantly; I did not have to restart anything:

echo 0 >/sys/kernel/debug/ieee80211/phy0/ath9k/ani

That's pretty detrimental to performance, although it's the only way to keep it up. This issue seems to affect just a small number of people, even with the same hardware; could it be something hardware-related?

comment:154 follow-up: Changed 8 months ago by dap@…

  • Resolution fixed deleted
  • Status changed from closed to reopened

It is not fixed, still an issue on r45743.

The additional problems summarized in #comment:117 are still present too; disabling ANI does not affect them:

  • "AP disaster" issue, symptom: all clients are disconnected and impossible to connect to the AP (the SSID is there).
  • "downloader stalling" issue, symptom: I have to reconnect. Other clients may not be affected.

However, some hours of heavy stress are needed to reproduce either of them, and they occur randomly. Are there open tickets which may be about them?

Due to the WRT160NL stability issues I recently bought a WRT1200AC. My WRT160NL is now idle, in case something is worth a try. The downloader stalling issue might be a client bug; I will use the new AP to narrow it down.

comment:155 Changed 8 months ago by anonymous

I am running r46724 on a TP-LINK TL-WR703N and the exact issues described in comment:154 are still present. Is there a list of ath9k devices that are NOT affected by this issue?

comment:156 Changed 8 months ago by taishi@…

I'm not exactly sure if this affects all of ath9k, specific router models, or just SOME routers (as I know people with the same router as mine, not sure if same rev, working flawlessly).
@anonymous on comment:155: I tested a WDR4300 for a customer (can't recall the rev, but I think they're all 1.0 :P) and it worked beyond perfect.

/cheers

comment:157 follow-up: Changed 8 months ago by anonymoux

I'm asking because I have a TP-LINK TL-WR841ND and a D-Link DIR-615 that both have the exact same problem and they are all ath9k devices. It works fine on stock firmware so I know it's not a hardware or interference issue. This seems like a regression since I remember it working flawlessly a few years back.

comment:158 in reply to: ↑ 157 Changed 8 months ago by dap@…

Replying to anonymoux:

I'm asking because I have a TP-LINK TL-WR841ND and a D-Link DIR-615 that both have the exact same problem and they are all ath9k devices. It works fine on stock firmware so I know it's not a hardware or interference issue. This seems like a regression since I remember it working flawlessly a few years back.

As far as I can remember, my ath9k was rock stable with Backfire; the troubles came with AA. I had plans to tie the issues to a specific revision, starting from Backfire, but ath9k has been broken and fixed so many times since then that I'm afraid it would not help with the current issues.

I have not found a flawlessly working release in the past years either.

comment:159 Changed 8 months ago by dap@…

I reviewed all comments; the affected wifi chipsets are probably:

  • AR9102
  • AR9103

Reports describing different problems were excluded. Many of you did not report a hardware revision, but every router reported with this problem has a version with one of the chipsets above.

I traded my WRT160NL for a TL-WDR4300 yesterday. I am stressing this new toy with heavy traffic and noise, and it has been stable so far. The wifi chip is an AR9340.

Does anybody have this issue with a different wifi chip?

comment:160 Changed 7 months ago by taishi@…

My 1043ND has an AR9103, but I have reports that the same router performs fine.
I'll be getting a TL-WDR4300 in a few weeks, although I've already installed one of those and it has >60 days of uptime with no hiccups.

comment:161 Changed 6 months ago by anonymous

I seem to be having similar issues on Mikrotik rb493g with R52Hn miniPCI card with ar9220 installed.

comment:162 in reply to: ↑ 154 Changed 5 months ago by anonymous

Replying to dap@…:

It is not fixed, still an issue on r45743.

Additional problems summarized in #comment:117 are actual too, disabling ANI does not affect those:

  • "AP disaster" issue, symptom: all clients are disconnected and impossible to connect to the AP (the SSID is there).
  • "downloader stalling" issue, symptom: I have to reconnect. Other clients may not be affected.

Although some hours of heavy stressing needed to reproduce one of those randomly. Are there open tickets which may be about them?

See also https://dev.openwrt.org/ticket/11862.

comment:163 in reply to: ↑ description Changed 3 months ago by anonymous

Replying to dap@…:

I began to monitor some wifi parameters with munin recently, I found the following on the WRT AP when this problem kicks in:

Can you attach or pastebin these plugins? Hopefully you're still monitoring this ticket.
