Opened 3 years ago

Last modified 3 years ago

#13612 new defect

Client on wndr3700 suddenly stops receiving ARP replies from LAN side

Reported by: syzop@… Owned by: developers
Priority: normal Milestone: Chaos Calmer 15.05
Component: packages Version: Trunk
Keywords: Cc:


==[ SUMMARY ]==
My WNDR3700 access points occasionally and suddenly stop forwarding ARP replies from a server on the LAN side to a particular wireless client. It is not just ARP: the same happens to all Ethernet traffic destined for that client.

The affected client (laptop) is random. Other clients on the same AP are (almost?) always unaffected. Identical laptop hardware sitting next to the affected laptop works perfectly fine.

Bringing wifi down & up doesn't fix the issue.

Version: r36692 and r34200

See below for more information.


Laptop )wifi)) AP <-wired-> SWITCH <-wired-> SERVER
                                       `---- SERVER2

I have 14 of these APs. All kinds of laptops (different brands, etc.) experience this issue, and it happens on all the APs (all of which are WNDR3700).

Around 350 clients in total are associated with the 14 APs at any given time. There are a great many associate/disassociate events at certain times, as this is a high school and people move from one place to another multiple times a day. Just in case it matters...

==[ WHAT WORKS ]==

  • Pinging from laptop to AP (and vice versa)
  • Pinging from AP to SERVER & SERVER2
  • Traffic from laptops on the same AP to SERVER & SERVER2

==[ WHAT DOES NOT WORK ]==

From affected client (laptop):

  • Pinging SERVER (
  • Pinging SERVER2
  • Any traffic from SERVER to LAPTOP

11:31:52.506153 08:3e:8e:a2:f2:2d > ff:ff:ff:ff:ff:ff, ethertype 802.1Q (0x8100), length 46: vlan 1, p 0, ethertype ARP, Request who-has tell, length 28
11:31:53.506151 08:3e:8e:a2:f2:2d > ff:ff:ff:ff:ff:ff, ethertype 802.1Q (0x8100), length 46: vlan 1, p 0, ethertype ARP, Request who-has tell, length 28

11:34:03.221254 08:3e:8e:a2:f2:2d > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 60: Ethernet (len 6), IPv4 (len 4), Request who-has tell, length 46
11:34:03.221265 00:01:03:c1:d4:30 > 08:3e:8e:a2:f2:2d, ethertype ARP (0x0806), length 42: Ethernet (len 6), IPv4 (len 4), Reply is-at 00:01:03:c1:d4:30, length 28
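
When comparing captures like the ones above, it can help to count ARP requests and replies instead of reading them by eye. The following is only a sketch: the helper name arp_summary is made up for illustration, and it assumes tcpdump was run with -e -n so that lines look like the ones quoted in this ticket.

```shell
# Summarize ARP requests and replies found in `tcpdump -e -n` output on stdin.
# Matches the literal words "Request who-has" / "Reply" on lines that contain
# "ethertype ARP", as in the captures quoted in this ticket.
arp_summary() {
    awk '
        /ethertype ARP/ && /Request who-has/ { req++ }
        /ethertype ARP/ && /Reply/           { rep++ }
        END { printf "requests=%d replies=%d\n", req, rep }
    '
}

# Example use (interface name eth0 is an assumption; -c 50 stops after 50 packets):
#   tcpdump -e -n -c 50 -i eth0 'arp or (vlan and arp)' | arp_summary
```

Running it at both ends of the path would be expected to show replies on one side and none on the other while a client is affected.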

==[ JUST ARP? NO ]==
Actually it's not just ARP. When I manually add the MAC address of SERVER on the wifi client, the ARP request/reply trouble goes away, but traffic still doesn't work.
For example, when pinging from LAPTOP to SERVER, only ping requests are visible at the AP, while both ping requests and replies are visible on SERVER. This is very similar to the ARP story, and it shows that ARP itself is not the problem.

What works:

  • Rebooting the access point
  • /etc/init.d/network restart

What does not work:

  • kill hostapd and restart wifi (or through wifi down & wifi up)
  • brctl to remove/add everything
  • in addition to that, bringing all interfaces down and up, the same as /etc/init.d/network reload (reload, not restart)

In short: it SEEMS related to a driver(?), since 'network restart' fixes the issue and 'network reload' doesn't.

When it occurs and who is affected is random. It happens both to existing users (who are already connected) and users that bring their laptop that very same day and never were able to get online.

Other laptops (wifi clients) on the access point are almost always unaffected.

==[ OTHER ]==
I don't have a reproducible test case, so I rely on staff to bring affected laptops. There are always a few people per day who experience this issue. I've already spent hours hunting this down, without much success.

Any help or suggestions would be greatly appreciated.

Attachments (2)

config-dump.txt (15.0 KB) - added by syzop@… 3 years ago.
uci export of one of the affected access points

Change History (6)

comment:1 Changed 3 years ago by anonymous

component should probably be kernel

Changed 3 years ago by syzop@…

uci export of one of the affected access points

comment:2 Changed 3 years ago by syzop@…

Today I changed the following on 21 APs:

uci set network.@switch[0].enable_learning=0
uci delete network.@switch[0].enable_vlan
uci commit

& restart

I hope this helps, as I suspect the internal switch to be the source of this issue.

comment:3 Changed 3 years ago by syzop@…

It seems we are no longer having any problems with disappearing traffic.

uci set network.@switch[0].enable_learning=0
uci commit
# and reboot..

(the vlan change I mentioned earlier is unlikely to be related, though I haven't double checked)
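
For anyone applying the same workaround on many APs, a quick check that the option was actually committed might look like the sketch below. The function name switch_learning_disabled is made up for illustration. It only reads the saved UCI configuration and says nothing about the state of the switch hardware itself.

```shell
# Succeeds (exit status 0) when the first switch section in /etc/config/network
# has enable_learning set to 0. `uci -q` suppresses the error message when the
# option is not set at all, in which case the check fails.
switch_learning_disabled() {
    [ "$(uci -q get 'network.@switch[0].enable_learning')" = "0" ]
}

# Example use on the AP:
#   if switch_learning_disabled; then echo "learning disabled"; else echo "learning enabled or unset"; fi
```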

Does this indicate a hardware problem in the WNDR3700v2 internal switch? Or the driver?

Well anyway, I'm happy this workaround works for me :)
I've added a note about this in the WNDR3700 wiki so others won't have to waste tens of hours of time on this.

comment:4 Changed 3 years ago by syzop@…

By the way, I now see I didn't mention this earlier: when I change the MAC address of the affected wifi client, that client works perfectly again. That's another reason why I highly suspect the internal switch.
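
Since changing the client's MAC address makes the problem go away, it may be worth checking where the Linux bridge on the AP has learned the affected MAC while the problem is occurring. This is only a sketch: the helper name fdb_ports_for_mac is made up, the bridge name br-lan is an assumption, and the output covers the software bridge table only, not the table inside the switch chip.

```shell
# Print the bridge port number(s) on which a given MAC address is currently
# learned, reading `brctl showmacs` output on stdin. The MAC address is the
# second column of that output; matching is case-insensitive.
fdb_ports_for_mac() {
    awk -v mac="$1" 'tolower($2) == tolower(mac) { print $1 }'
}

# Example use on the AP (bridge name br-lan is an assumption):
#   brctl showmacs br-lan | fdb_ports_for_mac 08:3e:8e:a2:f2:2d
```

If the bridge shows the client on the expected wireless port while traffic still fails to arrive, that would be consistent with the fault lying outside the software bridge.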
