Opened 7 years ago

Last modified 8 months ago

#6819 reopened defect

DHCP passing through wan to lan on TL-WR1043ND

Reported by: KillaB
Owned by: juhosg
Priority: high
Milestone: Barrier Breaker 14.07
Component: base system
Version: Trunk
Keywords: ar71xx
Cc: juhosg

Description

During router boot (running r20010) my client PC is obtaining an IP from my upstream router, requiring a DHCP release/renew once the router is fully booted. Confirmed with wireshark.

Attachments (2)

wr1043.txt (25.1 KB) - added by KillaB 7 years ago.
Timing of OEM firmware
dhcpdump-tl-wr1043nd.txt (4.7 KB) - added by hanno.schupp@… 5 years ago.
dhcpdump from my PC connected to the booting tl-wr1043nd

Change History (37)

comment:1 Changed 7 years ago by nico

That seems really odd because DHCP packets are not usually routed between interfaces without using some kind of DHCP relay or forwarding.

comment:2 Changed 7 years ago by jow

I believe this is "normal", the switch will pass anything through until swconfig finished the vlan setup.

No idea how to fix that though, there will always be a delay until swconfig is launched. During the time between the bootloader initializing (or not initializing) the switch and swconfig taking over, traffic will leak through.

comment:3 Changed 7 years ago by cezary@…

Very early script to isolate wan from switch? Or hard-coded initial vlan for wan in driver.

comment:4 Changed 7 years ago by KillaB

Loaded the original firmware back onto the WR1043ND and confirmed that there is still switch leakage; however, the timing does not allow the PC to obtain an IP from the upstream DHCP server.

comment:5 Changed 7 years ago by KanjiMonster

Probably the window could be made smaller if the switch driver defaults to a port isolation setup (i.e. all ports can communicate with the cpu port, but not with any other port).
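The port isolation default suggested here can be pictured as a per-port forwarding mask. The following is only an illustrative Python sketch, not the rtl8366 driver code: the port count, the CPU port number (5) and the function names `isolation_masks` and `can_forward` are assumptions made for the example.

```python
# Illustrative sketch only: NUM_PORTS and CPU_PORT are assumed values,
# not taken from the rtl8366 driver.
NUM_PORTS = 6   # assumed: ports 0-4 are external, port 5 is the CPU port
CPU_PORT = 5


def isolation_masks(num_ports=NUM_PORTS, cpu_port=CPU_PORT):
    """Return {port: bitmask of ports it may forward to}.

    Every external port may reach only the CPU port; the CPU port may
    reach every external port.
    """
    all_ports = (1 << num_ports) - 1
    masks = {}
    for port in range(num_ports):
        if port == cpu_port:
            masks[port] = all_ports & ~(1 << cpu_port)
        else:
            masks[port] = 1 << cpu_port
    return masks


def can_forward(masks, src, dst):
    """True if a frame entering on src may leave on dst."""
    return bool(masks[src] & (1 << dst))


if __name__ == "__main__":
    masks = isolation_masks()
    for port, mask in sorted(masks.items()):
        print(f"port {port}: forward mask {mask:06b}")
```

With masks like these, a DHCP request arriving on a LAN port could not reach the WAN port directly, even before any VLAN setup has run.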

Changed 7 years ago by KillaB

Timing of OEM firmware

comment:6 Changed 7 years ago by KillaB

Sorry for the extra formatting in the attach. It's my first time running cereal and for some reason couldn't strip them out using sed.

comment:7 Changed 7 years ago by Bodlio

Same situation with stock, openwrt and dd-wrt firmwares.
Currently there is no fix :(

comment:8 Changed 7 years ago by KillaB

Yes, there is leakage with the stock firmware, but it is very minimal in comparison with OpenWrt.

The stock firmware appears to do multiple up/down/up actions, which causes my PC to think there is a cable disconnect before it can finish its first DHCP renew. By the time the switch is fully configured and the firewall is up, I can obtain an IP in the proper subnet. Sadly, with OpenWrt this is not the case. I'm able to obtain an IP through the switch during boot, which requires a manual DHCP release/renew to correct.

comment:9 Changed 6 years ago by KillaB

Is this a problem with just the WR1043ND, or all rtl8366 switch based devices?

I just connected the WAN port of my WR1043ND to an ADSL modem in bridged mode (no PPPoX with my ISP) and rebooted. In the short time it took for the WR1043ND to boot, my LAN devices had already pulled two public IPs from my ISP!

comment:10 Changed 6 years ago by jow

It probably affects all devices with no dedicated phy for wan.

comment:11 Changed 6 years ago by anonymous

If the WAN port is always on the same switch port, hardcoding a different vlan for it on the driver or simply keeping it down until configured seems to be the most elegant way

comment:12 Changed 6 years ago by anonymous

Is there any fix for that? Still present on 10.3.1rc4.

comment:13 Changed 6 years ago by anonymous

Still present in rc4, but I can confirm that an early openwrt.groov.pl Kamikaze build (maybe r19581) for the WR1043ND didn't have this problem; at least my PCs never obtained IPs from the upstream router. In rc4 I can even browse the internet for maybe 10 seconds before the switch gets configured, after which I need to release/renew and restart networking on all my DHCP-based client PCs.

comment:14 Changed 6 years ago by anonymous

Also, openwrt.groov.pl used its own patches to add WR1043ND support, so the OpenWrt SVN probably does not have the code needed to view the changes.

comment:15 Changed 6 years ago by obsy

openwrt.groov.pl uses old code with hardcoded vlan's (this is code from openwrt's forum, before modification based on swconfig)

comment:16 Changed 6 years ago by anonymous

Maybe it's possible to deactivate whole switch at boot-up inside driver by default and use swconfig to turn it on while configuring it.

comment:17 follow-up: Changed 6 years ago by juhosg

  • Owner changed from developers to juhosg
  • Status changed from new to accepted

comment:18 in reply to: ↑ 17 Changed 6 years ago by snowyowlster

Replying to juhosg:
Looks like this has been fixed. I ran tests comparing old with new, and there doesn't seem to be any DHCP leakage at all. Also interesting: since the router resets its interfaces while booting, Windows clients automatically discarded the leaked DHCP address and got the right one.

comment:19 Changed 6 years ago by KillaB

Excellent work juhosg! Even with a static IP on my Windows client, the first packet I see is the failsafe packet. Absolutely no leakage from what I can see as well.

comment:20 Changed 6 years ago by KillaB

Any chance these changes can be easily rolled into backfire?

comment:21 Changed 6 years ago by juhosg

  • Resolution set to fixed
  • Status changed from accepted to closed

Fixed in r24938 (trunk) and r24998 (Backfire).

comment:22 Changed 5 years ago by Hanno.Schupp@…

  • Resolution fixed deleted
  • Status changed from closed to reopened

Unfortunately this exact same problem is occurring again in trunk 31314. I suspect the major set of changes for the rtl8366 type of switches introduced in revisions 30842-30857 reintroduced this problem and led to a regression.

comment:23 Changed 5 years ago by anonymous

Just done some more testing and found the problem also exists in backfire 10.03.1 and in trunk 80835. The regression must have started a while back.

Changed 5 years ago by hanno.schupp@…

dhcpdump from my PC connected to the booting tl-wr1043nd

comment:24 Changed 5 years ago by anonymous

Attached some dhcpdump output from my linux pc connected to the tl-wr1043nd router running openwrt trunk.

The first dhcp request and response are happening within the first second of a boot process, before the four lan LEDs flash and long before the sys LED starts flashing. You can see how the pc receives the ip address 192.168.20.107 from the cable modem on 192.168.20.1 to which the tp-link is connected.

The other requests happen much later, presumably when the network starts and dnsmasq kicks in, but they get nowhere, as the connection is already broken due to the incorrect IP address assignment.
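A quick way to flag such a leaked lease is to check whether the address the client received lies inside the router's own LAN subnet. This is a minimal sketch, assuming the router's LAN is 192.168.1.0/24 (a common default that is not stated in this ticket); the function name `lease_leaked` is made up for the example.

```python
import ipaddress

# Assumed for illustration: the router's own LAN subnet. 192.168.1.0/24 is
# a common default, but it is not stated in this ticket.
EXPECTED_LAN = ipaddress.ip_network("192.168.1.0/24")


def lease_leaked(leased_ip, expected_lan=EXPECTED_LAN):
    """Return True if the leased address lies outside the router's LAN."""
    return ipaddress.ip_address(leased_ip) not in expected_lan


if __name__ == "__main__":
    # Address taken from the dhcpdump described above.
    print(lease_leaked("192.168.20.107"))
```

Run against each lease seen in a capture, a check like this separates leases handed out by the upstream device from those handed out by the router itself.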

comment:25 Changed 5 years ago by hanno.schupp@…

.. and some more. Extract of dmesg of a boot.
You can see how early in the OpenWrt load process, after the LEDs are activated and before cfg80211 starts activating the USB drive and wireless, the LAN is activated and deactivated and br-lan is put into forwarding mode, allowing the bogus DHCP request to be forwarded to the upstream device.

...
Please be patient, while OpenWrt loads ...
ar71xx: pll_reg 0xb8050014: 0x1a000000
eth0: link up (1000Mbps/Full duplex)
Registered led device: tl-wr1043nd:green:usb
Registered led device: tl-wr1043nd:green:system
Registered led device: tl-wr1043nd:green:qss
Registered led device: tl-wr1043nd:green:wlan
mini_fo: using base directory: /
mini_fo: using storage directory: /overlay
eth0: link down
ar71xx: pll_reg 0xb8050014: 0x1a000000
eth0: link up (1000Mbps/Full duplex)
device eth0.1 entered promiscuous mode
device eth0 entered promiscuous mode
br-lan: port 1(eth0.1) entering forwarding state
Compat-wireless backport release: compat-wireless-2010-12-10-3-g880bb0b
Backport based on wireless-testing.git master-2010-12-16
cfg80211: Calling CRDA to update world regulatory domain
usbcore: registered new interface driver usbfs
usbcore: registered new interface driver hub
usbcore: registered new device driver usb
cfg80211: World regulatory domain updated:
cfg80211:     (start_freq - end_freq @ bandwidth), (max_antenna_gain, max_eirp)
cfg80211:     (2402000 KHz - 2472000 KHz @ 40000 KHz), (300 mBi, 2000 mBm)
cfg80211:     (2457000 KHz - 2482000 KHz @ 20000 KHz), (300 mBi, 2000 mBm)
cfg80211:     (2474000 KHz - 2494000 KHz @ 20000 KHz), (300 mBi, 2000 mBm)
cfg80211:     (5170000 KHz - 5250000 KHz @ 40000 KHz), (300 mBi, 2000 mBm)
cfg80211:     (5735000 KHz - 5835000 KHz @ 40000 KHz), (300 mBi, 2000 mBm)
...

comment:26 Changed 5 years ago by anonymous

Sorry, the above dmesg was for r24943. The following is for trunk 31356.
You can see how the switch is started up at about 0.6 seconds and eth0 is clamped shut at about 6 seconds into the boot, so there are about 5.5 seconds where traffic can 'leak' uncontrolled between the LAN and WAN interfaces, leading to the DHCP confusion.

...
[    0.520000] Creating 5 MTD partitions on "spi0.0":
[    0.530000] 0x000000000000-0x000000020000 : "u-boot"
[    0.540000] 0x000000020000-0x0000000fcc00 : "kernel"
[    0.540000] mtd: partition "kernel" must either start or end on erase block boundary or be smaller than an erase block -- forcing read-only
[    0.560000] 0x0000000fcc00-0x0000007f0000 : "rootfs"
[    0.560000] mtd: partition "rootfs" must either start or end on erase block boundary or be smaller than an erase block -- forcing read-only
[    0.580000] mtd: partition "rootfs" set to be root filesystem
[    0.580000] mtd: partition "rootfs_data" created automatically, ofs=4A0000, len=350000 
[    0.590000] 0x0000004a0000-0x0000007f0000 : "rootfs_data"
[    0.600000] 0x0000007f0000-0x000000800000 : "art"
[    0.600000] 0x000000020000-0x0000007f0000 : "firmware"
[    0.610000] Realtek RTL8366RB ethernet switch driver version 0.2.3
[    0.620000] rtl8366rb rtl8366rb: using GPIO pins 18 (SDA) and 19 (SCK)
[    0.630000] rtl8366rb rtl8366rb: RTL5937 ver. 3 chip found
[    0.670000] rtl8366rb: probed
[    0.680000] eth0: Atheros AG71xx at 0xb9000000, irq 4
[    0.980000] TCP westwood registered
[    0.990000] NET: Registered protocol family 17
[    0.990000] 8021q: 802.1Q VLAN Support v1.8
[    1.000000] VFS: Mounted root (squashfs filesystem) readonly on device 31:2.
[    1.010000] Freeing unused kernel memory: 192k freed
[    3.020000] ar71xx: pll_reg 0xb8050014: 0x1a000000
[    3.020000] eth0: link up (1000Mbps/Full duplex)
[    3.050000] Registered led device: tp-link:green:usb
[    3.050000] Registered led device: tp-link:green:system
[    3.050000] Registered led device: tp-link:green:qss
[    3.060000] Registered led device: tp-link:green:wlan
[    6.350000] JFFS2 notice: (406) jffs2_build_xattr_subsystem: complete building xattr subsystem, 1 of xdatum (1 unchecked, 0 orphan) and 24 of xref (0 dead, 10 orphan) found.
[    6.540000] eth0: link down
[    8.580000] Compat-wireless backport release: compat-wireless-2012-04-17-1-r31356
[    8.590000] Backport based on wireless-testing.git master-2012-04-17
[    8.620000] cfg80211: Calling CRDA to update world regulatory domain
[    9.200000] cfg80211: World regulatory domain updated:
[    9.200000] cfg80211:   (start_freq - end_freq @ bandwidth), (max_antenna_gain, max_eirp)
[    9.210000] cfg80211:   (2402000 KHz - 2472000 KHz @ 40000 KHz), (300 mBi, 2000 mBm)
[    9.220000] cfg80211:   (2457000 KHz - 2482000 KHz @ 20000 KHz), (300 mBi, 2000 mBm)
[    9.230000] cfg80211:   (2474000 KHz - 2494000 KHz @ 20000 KHz), (300 mBi, 2000 mBm)
[    9.230000] cfg80211:   (5170000 KHz - 5250000 KHz @ 40000 KHz), (300 mBi, 2000 mBm)
[    9.240000] cfg80211:   (5735000 KHz - 5835000 KHz @ 40000 KHz), (300 mBi, 2000 mBm)
...
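The size of the leak window can be estimated directly from a log like the one above. The sketch below is a rough Python illustration; treating the `rtl8366rb: probed` line as the start and the first `eth0: link down` line as the end of the window is an assumption, and the helper names are invented for the example.

```python
import re

# Matches kernel log lines of the form "[    6.540000] message".
LINE_RE = re.compile(r"^\[\s*(\d+\.\d+)\]\s+(.*)$")


def first_timestamp(lines, needle):
    """Return the timestamp (seconds) of the first line containing needle."""
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match and needle in match.group(2):
            return float(match.group(1))
    return None


def leak_window(lines, start_marker="rtl8366rb: probed",
                end_marker="eth0: link down"):
    """Seconds between the switch probe and the first eth0 link-down."""
    start = first_timestamp(lines, start_marker)
    end = first_timestamp(lines, end_marker)
    if start is None or end is None:
        return None
    return round(end - start, 2)


if __name__ == "__main__":
    # Three lines copied from the dmesg extract above.
    sample = [
        "[    0.670000] rtl8366rb: probed",
        "[    3.020000] eth0: link up (1000Mbps/Full duplex)",
        "[    6.540000] eth0: link down",
    ]
    print(leak_window(sample))
```

With these markers the window for the log above comes out a little under six seconds; choosing different start or end lines shifts the figure accordingly.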

comment:27 Changed 4 years ago by juhosg

  • Resolution set to fixed
  • Status changed from reopened to closed
  • Version changed from Kamikaze trunk to Trunk

Fixed in r32946.

comment:28 Changed 3 years ago by ok1djo@…

  • Resolution fixed deleted
  • Status changed from closed to reopened

This issue still persists in 12.09: tp-link dir300, tp-link dir 600 and asus wl-500gp all do the same. For me, a reboot of the router means a block from the ISP, as all devices see the LAN interface go down and up, and all of them then request an IP through the switch directly from the ISP, which allows only one connected device at a time, leading to a ban from the ISP (guess how I found out :/ )

comment:29 Changed 2 years ago by jow

  • Milestone changed from Attitude Adjustment 12.09 to Barrier Breaker 14.07

Milestone Attitude Adjustment 12.09 deleted

comment:30 Changed 2 years ago by khmtanveer@…

Facing the same problem in Barrier Breaker. Hardware is rtl8196d with 8192ce wireless. On LAN the client gets an IP from the upstream router; strangely, on WiFi it gets an IP from the network configured in /etc/config/network. I thought the problem was related to dnsmasq, so I upgraded it and also the firewall package. Still the same results, i.e. wired gives an upstream IP and WiFi gives an IP from the configured network.

comment:31 Changed 22 months ago by anonymous

Same issue in Chaos Calmer r42830. Router forwarding DHCP requests to wan instead of handing ips to clients on the lan. Hardware: Lamobo-R1

comment:32 Changed 18 months ago by anonymous

Same issue with D-Link DIR-620 (ralink chip, sticker says P/N: RIR620EEU....A1E, H/W Ver.: A1). The router makes all its ports work as a switch at boot time, which leads clients connected to LAN ports to obtain a lease from the upstream DHCP server. Tried 12.09, 14.07, 15.05-rc2. No issue with stock firmware (1.4.0).

comment:34 Changed 8 months ago by sangamshukla.cvs@…

Also facing the same issue on tplink743NDV2 with Chaos Calmer 15.05.1 and Barrier Breaker 14.07, 14.07-rc1 and rc2.
The WAN port is working as a switch port and forwarding all packets from WAN to LAN.

comment:35 Changed 8 months ago by pe0fko@…

I'm running OpenWrt on the "KPN Experiabox V8" with version "OpenWrt Chaos Calmer 15.05 / LuCI (git-15.248.30277-3836b45)" and have the same problem.

When connecting to WiFi I get a local DHCP address; when connecting to the LAN ports I get an address from the WAN DHCP server (provider router). After disconnecting the WAN port I get a local address, and after reconnecting the WAN port it works fine.
The WiFi and LAN (eth0) are bridged; the WAN is connected to eth1.
