Opened 5 years ago

Closed 5 years ago

Last modified 22 months ago

#9851 closed defect (fixed)

OpenWrt crashes after WPA rekeying

Reported by: Luiz Angelo Daros de Luca <luizluca@…>
Owned by: developers
Priority: highest
Milestone: Barrier Breaker 14.07
Component: base system
Version: Backfire 10.03.1 RC5
Keywords:
Cc:

Description

Hello,

I have been using OpenWrt on my TL-WR740N for a year, reflashing frequently. After updating to yesterday's Backfire RC6 (SVN), I noticed that OpenWrt crashes a few seconds after some (most) of the WPA rekeys. Below is the syslog from my laptop and from OpenWrt. Tico is my laptop and 192.168.3.1 is the OpenWrt router.

 Jul 30 16:50:33 tico NetworkManager[1325]: <info> (wlan0): supplicant connection state:  completed -> group handshake
 Jul 30 16:50:33 tico wpa_supplicant[1454]: WPA: Group rekeying completed with d8:5d:4c:f8:82:78 [GTK=TKIP]
 Jul 30 16:50:33 tico NetworkManager[1325]: <info> (wlan0): supplicant connection state:  group handshake -> completed
 Jul 30 16:50:33 192.168.3.1 hostapd: wlan0: STA 00:16:ea:d4:61:9e WPA: group key handshake completed (RSN)
 Jul 30 16:51:45 192.168.3.1 kernel: REJECT(wan):IN=eth1 OUT= MAC=d8:5d:4c:f8:82:79:00:0e:83:ca:bb:3f:08:00 SRC=177.0.48.72  DST=189.4.112.123 LEN=131 TOS=0x00 PREC=0x00 TTL=53 ID=27416  PROTO=UDP SPT=11584 DPT=10572 LEN=111 MARK=0x4 
 Jul 30 16:51:45 192.168.3.1 kernel: REJECT(wan):IN=eth1 OUT= MAC=d8:5d:4c:f8:82:79:00:0e:83:ca:bb:3f:08:00 SRC=201.59.75.144 DST=189.4.112.123 LEN=179 TOS=0x00 PREC=0x00 TTL=113 ID=15387 DF PROTO=TCP SPT=56693 DPT=40396 WINDOW=65535 RES=0x00 ACK PSH URGP=0 MARK=0
 Jul 30 16:52:46 tico wpa_supplicant[1454]: CTRL-EVENT-DISCONNECTED bssid=d8:5d:4c:f8:82:78 reason=0
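
To see how long after each group rekey the client actually loses the link, the laptop's syslog can be scanned for the two wpa_supplicant messages shown above. This is only a minimal sketch: the function name rekey_to_disconnect is made up for illustration, it assumes the classic "Mon DD HH:MM:SS" syslog timestamp layout seen in this ticket, and it ignores intervals that cross midnight.

```shell
#!/bin/sh
# Sketch: report the delay between the last "Group rekeying completed"
# message and the next CTRL-EVENT-DISCONNECTED in a syslog file.
# Assumes field 3 is HH:MM:SS; does not handle a wrap past midnight.
rekey_to_disconnect() {
    awk '
        # Convert HH:MM:SS to seconds since midnight.
        function secs(t,    p) {
            split(t, p, ":")
            return p[1] * 3600 + p[2] * 60 + p[3]
        }
        /Group rekeying completed/ { rekey = secs($3); seen = 1 }
        /CTRL-EVENT-DISCONNECTED/ && seen {
            print $1, $2, $3, "disconnect", secs($3) - rekey, "s after last rekey"
            seen = 0
        }
    ' "$@"
}

# Usage on the laptop (the log path is an assumption, it varies by distro):
#   rekey_to_disconnect /var/log/messages
```

With no file argument the function reads standard input, so it can also be fed from grep or from a saved excerpt of the log.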

BTW, before reflashing I was using a mid-April SVN checkout of Backfire without this problem.

I was running a procfs top over SSH, and the system still had free resources:

root@router:/tmp/usr/bin# ./top -d1

top - 16:52:25 up 22 min,  0 users,  load average: 0.40, 0.50, 0.47
Tasks:  38 total,   1 running,  37 sleeping,   0 stopped,   0  zombie
Cpu(s):  6.7%us, 14.4%sy,  0.0%ni, 40.4%id, 26.9%wa,  1.0%hi,  10.6%si,  0.0%st 
Mem:     29532k total,    24368k used,     5164k free,      516k buffers
Swap:        0k total,        0k used,        0k free,     4884k cached
Write failed: Broken pipe

The only process that constantly uses 10-20% of CPU is luci-bwc.
I have no serial port. Is there any way to detect why the router rebooted?

Change History (19)

comment:1 Changed 5 years ago by nbd

if it rebooted, check if there is a /sys/kernel/debug/crashlog file
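
A small check along these lines can be run on the router right after an unexpected reboot. It is a sketch only: the path is the one named in this comment, the function name show_crashlog is hypothetical, and it assumes debugfs is compiled in and mounted under /sys/kernel/debug.

```shell
#!/bin/sh
# Sketch: print the crash log left by the previous boot, if one exists.
# The default path comes from this ticket; the optional argument only
# exists so the function can be tried against another file.
show_crashlog() {
    crashlog="${1:-/sys/kernel/debug/crashlog}"
    if [ -e "$crashlog" ]; then
        cat "$crashlog"
    else
        echo "no crashlog at $crashlog"
        return 1
    fi
}

# Usage on the router:
#   show_crashlog
```

If the file is missing, it is worth confirming that /sys/kernel/debug exists at all before concluding that no crash was recorded.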

comment:2 Changed 5 years ago by nbd

  • Resolution set to no_response
  • Status changed from new to closed

comment:3 Changed 5 years ago by Luiz Angelo Daros de Luca <luizluca@…>

  • Resolution no_response deleted
  • Status changed from closed to reopened

Sorry for the "no_response" status. Isn't Trac supposed to mail me when something changes in an issue I reported?

I'll look for the crashlog file in sysfs. Do I need to add any options to have the crashlog available?

comment:4 Changed 5 years ago by nbd

Crashlog support is enabled by default, no need to enable it separately. By the way, there are some updated RC6 builds here, you may want to give those a try if your build still crashes: http://openwrt.linux-appliance.net/ar71xx/

comment:5 Changed 5 years ago by Luiz Angelo Daros de Luca <luizluca@…>

I'm building the updated version right now. I can easily reproduce the crash. The interesting part is that it only happens when my laptop is connected (wlan: Intel Corporation WiFi Link 5100) and running Linux. There is a known bug that makes this wlan disconnect from time to time. Currently, I'm using "options iwlagn 11n_disable=1" as a workaround. Maybe it is related. I have no problem with an Atheros wlan or when running Windows.
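
For anyone who wants to try the same workaround on the laptop, the option line quoted above normally lives in a file under /etc/modprobe.d. The sketch below only writes that line; the file name iwlagn-11n.conf and the function name are assumptions chosen for illustration, and the module still has to be reloaded (or the laptop rebooted) before the setting takes effect.

```shell
#!/bin/sh
# Sketch: write the iwlagn workaround quoted in this ticket to a
# modprobe configuration file. The default file name is hypothetical;
# modprobe reads any *.conf file in /etc/modprobe.d.
write_iwlagn_workaround() {
    conf="${1:-/etc/modprobe.d/iwlagn-11n.conf}"
    echo "options iwlagn 11n_disable=1" > "$conf"
}

# Usage on the laptop, as root:
#   write_iwlagn_workaround
```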

Sorry, I cannot use your compiled firmware version. I need to remove some packages (e.g. ppp) in order to make room for IPv6. My device has only 4MB. Also, it is faster to compile it myself than to download 300M of ImageBuilder. It would be interesting to have something like http://rom-o-matic.net/ but for firmwares.

I didn't have debugfs support. I'll compile it in. Is it this one?

--- Kernel build options
[*] Compile the kernel with Debug FileSystem enabled

It eats 70k of my image :)

(some minutes later)

The changes in SVN bricked my router! It just blinks the LED and reboots. I'll try to bring it back to life somehow.

comment:6 Changed 5 years ago by Luiz Angelo Daros de Luca <luizluca@…>

I'm back. I soldered on a serial cable in order to debrick my router. My model does not have easy serial access like others do. I had to solder directly onto the board (with neither good skill nor good equipment).

So, I reproduced the problem. This is the serial log when the reboot occurs. No good info:

root@router:/#

U-Boot 1.1.4 (Mar 8 2010 - 10:29:42)

AP91 (ar7240) U-boot
DRAM:
sri
#### TAP VALUE 1 = 9, 2 = a
32 MB
id read 0x100000ff
(...)

And about the /sys/kernel/debug/crashlog file, I get nothing.

root@router:/# ls /sys/kernel/debug/
bdi gpio ieee80211 mips

What do I need to do to get this crashlog? I'm curious how it survives the device reboot process.

BTW, this problem seems to be what is reported in https://dev.openwrt.org/ticket/8600 and https://forum.openwrt.org/viewtopic.php?id=28009. I found them while looking for how to get the crashlog.

comment:7 Changed 5 years ago by nbd

Can you try compiling latest backfire SVN with this patch? http://nbd.name/backfire-mac80211-update.patch

comment:8 Changed 5 years ago by Luiz Angelo Daros de Luca <luizluca@…>

Wow, 200k patch. Is this from trunk?

Patched, compiled and running. I did a quick 15min stress test (torrent) and it passed. I'll try to do a longer test when my client (wife) stops using the Internet.

Generally, the router does not last a full day. It crashes especially when my Intel wlan laptop shuts down.

comment:9 Changed 5 years ago by Luiz Angelo Daros de Luca <luizluca@…>

It crashed again after running for 02h 02min 51s (Backfire (10.03.1-RC6, r28106)). It didn't last long with some torrents running. The router (re)booted at Aug 29 00:18:00.

This is the last sign of life from it in my laptop's syslog:

Aug 29 00:17:45 tico wpa_supplicant[1229]: CTRL-EVENT-DISCONNECTED bssid=d8:5d:4c:f8:82:78 reason=0
Aug 29 00:17:45 tico NetworkManager[1107]: <info> (wlan0): supplicant connection state:  completed -> disconnected
Aug 29 00:17:45 tico kernel: [20684.220317] cfg80211: All devices are disconnected, going to restore regulatory settings
Aug 29 00:17:45 tico kernel: [20684.220330] cfg80211: Restoring regulatory settings
Aug 29 00:17:45 tico kernel: [20684.220340] cfg80211: Calling CRDA to update world regulatory domain
Aug 29 00:17:45 tico NetworkManager[1107]: <info> (wlan0): supplicant connection state:  disconnected -> scanning
Aug 29 00:17:45 tico kernel: [20684.341841] cfg80211: Ignoring regulatory request Set by core since the driver uses its own custom regulatory domain 
Aug 29 00:17:45 tico kernel: [20684.341857] cfg80211: World regulatory domain updated:
Aug 29 00:17:45 tico kernel: [20684.341861] cfg80211:     (start_freq - end_freq @ bandwidth), (max_antenna_gain, max_eirp)
Aug 29 00:17:45 tico kernel: [20684.341869] cfg80211:     (2402000 KHz - 2472000 KHz @ 40000 KHz), (300 mBi, 2000 mBm)
Aug 29 00:17:45 tico kernel: [20684.341876] cfg80211:     (2457000 KHz - 2482000 KHz @ 20000 KHz), (300 mBi, 2000 mBm)
Aug 29 00:17:45 tico kernel: [20684.341883] cfg80211:     (2474000 KHz - 2494000 KHz @ 20000 KHz), (300 mBi, 2000 mBm)
Aug 29 00:17:45 tico kernel: [20684.341890] cfg80211:     (5170000 KHz - 5250000 KHz @ 40000 KHz), (300 mBi, 2000 mBm)
Aug 29 00:17:45 tico kernel: [20684.341896] cfg80211:     (5735000 KHz - 5835000 KHz @ 40000 KHz), (300 mBi, 2000 mBm)

I had already considered overheating, but the weather is not hot (about 15 °C) and the plastic case is merely warm.

Also, the patch introduced a problem:

Aug 29 00:18:18 router user.info sysinit: /sbin/wifi: eval: line 1: hostapd_set_log_options: not found

comment:10 Changed 5 years ago by nbd

I wonder if your device might be having a hardware issue. I'm not aware of other people's TP-Link devices crashing like that.

comment:11 Changed 5 years ago by luizluca@…

Maybe it is a hardware problem. I found other reports about "intel chipset + heavy load + atheros = reboot", but they are months or years old.

comment:12 Changed 5 years ago by nbd

yeah, those problems are unlikely to be in any way related.

comment:13 Changed 5 years ago by nbd

please try the latest backfire mac80211 update patch, i just fixed some crash bugs

comment:14 Changed 5 years ago by Luiz Angelo Daros de Luca <luizluca@…>

I applied the patch, after reverting the previous one, and got this error.

rm -rf /mnt/usuarios/luizluca/prog/openwrt/backfire/build_dir/linux-ar71xx/compat-wireless-2011-06-22
mkdir -p /mnt/usuarios/luizluca/prog/openwrt/backfire/build_dir/linux-ar71xx/compat-wireless-2011-06-22
bzcat /mnt/usuarios/luizluca/prog/openwrt/backfire/dl/compat-wireless-2011-06-22.tar.bz2 | /bin/tar -C /mnt/usuarios/luizluca/prog/openwrt/backfire/build_dir/linux-ar71xx/compat-wireless-2011-06-22/.. -xf -

Applying ./patches/000-disable_ethernet.patch using plaintext:
patching file Makefile
Hunk #1 FAILED at 29.
1 out of 1 hunk FAILED -- saving rejects to file Makefile.rej
Patch failed! Please fix ./patches/000-disable_ethernet.patch!

I'm using an updated svn copy. Shouldn't it be using compat-wireless-2011-08-26.tar.bz2? The previous patch changed this but not the new one.

I also tried to apply both patches but the amount of errors tells me that I wasn't supposed to do it.

comment:15 Changed 5 years ago by nbd

should be fixed now

comment:16 Changed 5 years ago by luizluca@…

I don't know if it is due to this patch alone, but the load average of my router constantly increases over time. After a reboot, it starts at 0.2. After some time, it reaches 1.2. According to top, userspace processes do not explain this increase, as I get 98% idle. It seems to be kernel activity.

I notice that at this load average, latency increases a lot. Online FPS games become unplayable and I quickly get kicked.
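
To put numbers on the slow climb, the 1-minute load average can be sampled at intervals and compared across builds. This is a rough sketch, not something taken from the OpenWrt tree: the function names load1 and log_load are made up, and the optional file argument exists only so the parsing can be tried against a saved /proc/loadavg line.

```shell
#!/bin/sh
# Sketch: sample the 1-minute load average with a timestamp.

# Print the first field of a /proc/loadavg-style file.
load1() {
    read -r one rest < "${1:-/proc/loadavg}"
    echo "$one"
}

# log_load INTERVAL COUNT [FILE]
# Print COUNT samples, sleeping INTERVAL seconds between them.
log_load() {
    interval="$1"
    count="$2"
    src="${3:-/proc/loadavg}"
    i=0
    while [ "$i" -lt "$count" ]; do
        echo "$(date '+%H:%M:%S') $(load1 "$src")"
        i=$((i + 1))
        if [ "$i" -lt "$count" ]; then
            sleep "$interval"
        fi
    done
}

# Usage on the router: one sample per minute for an hour.
#   log_load 60 60 > /tmp/loadavg.log
```

Writing the samples to /tmp keeps them in RAM, so they are lost on reboot but do not wear the flash.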

Should I revert the patch? Does the SVN version also contain this patch? Is there anything new in SVN that warrants an update?

comment:17 Changed 5 years ago by luizluca@…

Hello,

A recent SVN copy (r28285) solved both problems: the crash and the high load average. It seems that something fixed it.

comment:18 Changed 5 years ago by nbd

  • Resolution set to fixed
  • Status changed from reopened to closed

comment:19 Changed 22 months ago by jow

  • Milestone changed from Attitude Adjustment 12.09 to Barrier Breaker 14.07

Milestone Attitude Adjustment 12.09 deleted
