Opened 8 years ago

Closed 8 years ago

#3071 closed defect (fixed)

rb532a watchdog & madwifi

Reported by: acoul <alex@…>
Owned by: florian
Priority: high
Milestone: Kamikaze 8.09 RC1
Component: kernel
Version:
Keywords: rb532a watchdog madwifi kernel panic
Cc:

Description

I get a kernel panic when bringing the interface up on an Atheros CM9 in station/managed mode. This was first noticed on svn revisions later than #9705. Turning off the new watchdog code introduced in patchset [9897] does not solve the issue.

CPU 0 Unable to handle kernel paging request at virtual address 00000010, epc == 80101f40, ra == 8014a83c
Oops[#1]:
Cpu 0
$ 0   : 00000000 1010ea00 80370000 ffffff7f
$ 4   : 00000000 00000004 00000087 00000080
$ 8   : 1010ea00 1000001f ffffffff 00000000
$12   : ffffffff 00000010 1b4a1b80 002c002b
$16   : 80347cd8 83dbae80 0000008f 80380000
$20   : 81254164 00000014 812545a8 c00b66d0
$24   : 00000010 802204f4
$28   : 83dae000 83dafcb8 80310000 8014a83c
Hi    : 002fe7e2
Lo    : 2d3840ea
epc   : 80101f40     Tainted: P
ra    : 8014a83c Status: 1010ea02    KERNEL EXL
Cause : 10800008
BadVA : 00000010
PrId  : 0001800a
Modules linked in: hostap_pci hostap ath_pci ath_rate_minstrel ath_hal(P) wlan_scan_sta wlan ieee80211_crypt_ccmp ieee80211 ieee80211_crypt
Process killall (pid: 377, threadinfo=83dae000, task=8039e480)
Stack : 00000000 00000000 00000000 00000000 00000018 00000068 81254380 00000001
        80101e4c ffffffff 00000018 00000000 00000001 83d38000 83dafe80 83dbb000
        801027c4 00000000 00000003 00000000 83d30000 83d30380 81300320 8121f000
        00000000 1010ea00 83dbb000 00000001 00000028 812545a8 81254164 81254164
        00000003 83dbb014 ffffffff 00000000 ffffffff 00000010 1b4a1b80 002c002b
        ...
Call Trace:[<80101e4c>][<801027c4>][<c00b66d0>][<802204f4>][<c00bac9c>][<c00bacfc>][<c00bab00>][<8012e090>][<8012df48>][<801298a0>][<80129984>][<80175e04>]

Code: 3c028037  8c4424d0  00071827 <8c820010> 00431024  ac820010  24a20002  24040100  00442004
Kernel panic - not syncing: Fatal exception in interrupt
Rebooting in 3 seconds..
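
As a reading aid for the `Code:` line above: the word in angle brackets, `8c820010`, is the instruction at epc. The following is a minimal sketch for decoding it, assuming the standard MIPS32 I-type field layout (6-bit opcode, 5-bit rs, 5-bit rt, 16-bit signed immediate). The names `decode_itype` and `describe` and the two-entry mnemonic table are illustrative only; this is not a full disassembler.

```python
# Minimal MIPS32 I-type decoder, enough for the load/store words in the oops.
# Only the lw and sw opcodes are listed; anything else is reported by number.
MNEMONICS = {0x23: "lw", 0x2B: "sw"}


def decode_itype(word):
    """Split a 32-bit instruction word into (opcode, rs, rt, signed imm)."""
    opcode = (word >> 26) & 0x3F
    rs = (word >> 21) & 0x1F      # base register for loads and stores
    rt = (word >> 16) & 0x1F      # register being loaded or stored
    imm = word & 0xFFFF
    if imm & 0x8000:              # sign-extend the 16-bit offset
        imm -= 0x10000
    return opcode, rs, rt, imm


def describe(word):
    """Render a load/store word in 'op $rt, offset($rs)' form."""
    opcode, rs, rt, imm = decode_itype(word)
    name = MNEMONICS.get(opcode, "op0x%02x" % opcode)
    return "%s $%d, %d($%d)" % (name, rt, imm, rs)


print(describe(0x8C820010))
```

The faulting word decodes to a load from offset 16 of register `$4`. The register dump shows `$4` as 00000000, so the load address is 0x10, which matches BadVA. That pattern is consistent with a NULL pointer being dereferenced at offset 0x10.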

Attachments (0)

Change History (15)

comment:1 Changed 8 years ago by acoul <alex@…>

The problem is not related to the madwifi driver. Backporting madwifi snapshot #10277 to a snapshot #9705 base system does not trigger this issue. Something after #9705 and before #10175 broke madwifi functionality on the rb532 series.
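
A regression window like this one can be narrowed by bisection. The sketch below is only an outline under stated assumptions: #9705 is taken as known-good, #10175 as known-bad, and the breakage is assumed to persist once introduced. `find_first_bad` and the `is_bad` callback are hypothetical names; in practice `is_bad` would mean building and flashing that revision and checking whether `ifconfig ath0 up` panics.

```python
def find_first_bad(good, bad, is_bad):
    """Return the lowest revision in (good, bad] for which is_bad(rev) is true.

    Assumes `good` is known-good, `bad` is known-bad, and every revision
    after the first bad one is also bad (a single transition).
    """
    if good >= bad:
        raise ValueError("good revision must be lower than bad revision")
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_bad(mid):
            bad = mid      # breakage is at mid or earlier
        else:
            good = mid     # breakage is after mid
    return bad


# Stand-in predicate for demonstration only: the threshold 9900 is made up
# and says nothing about where the real regression was introduced.
tested = []

def fake_is_bad(rev):
    tested.append(rev)
    return rev >= 9900

print(find_first_bad(9705, 10175, fake_is_bad), len(tested))
```

With a window of 470 revisions this needs at most nine test builds, which is far fewer than checking each revision in turn.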

comment:2 Changed 8 years ago by acoul <alex@…>

This problem remains up to snapshot [10463]. This is a serious issue. The further we move from the last functional snapshot, the harder it will be to locate and fix the problem. RB532 should either be marked as broken or reverted to the last functional snapshot, [9705] in this case. Whoever made changes to the rb532 tree after snapshot [9705] should really look into this. The following is from a freshly flashed rb532a:

BusyBox v1.8.2 (2008-02-15 06:48:27 EET) built-in shell (ash)
Enter 'help' for a list of built-in commands.

  _______                     ________        __
 |       |.-----.-----.-----.|  |  |  |.----.|  |_
 |   -   ||  _  |  -__|     ||  |  |  ||   _||   _|
 |_______||   __|_____|__|__||________||__|  |____|
          |__| W I R E L E S S   F R E E D O M
 KAMIKAZE (bleeding edge, r10463) -------------------
  * 10 oz Vodka       Shake well with ice and strain
  * 10 oz Triple sec  mixture into 10 shot glasses.
  * 10 oz lime juice  Salute!
 ---------------------------------------------------
root@OpenWrt:/# wlanconfig ath0 create wlandev wifi0 wlanmode sta
ath0
root@OpenWrt:/# cat /proc/interrupts
           CPU0
  1:          0           RB500  S1
  7:       8399           RB500  timer
 40:         15           RB500  Korina ethernet Rx
 41:          3           RB500  Korina ethernet Tx
104:       1155           RB500  serial
114:          0           RB500  Ethernet Underflow
142:          0           RB500  wifi0
143:          0           RB500  wifi1

ERR:          0
root@OpenWrt:/# lsmod
Module                  Size  Used by    Tainted: P
ath_pci               125680  0
ath_rate_minstrel       8240  2
ath_hal               271168  4 ath_pci,ath_rate_minstrel
wlan_scan_sta           8960  0
wlan                  152608  5 ath_pci,ath_rate_minstrel,wlan_scan_sta
root@OpenWrt:/# ifconfig ath0 up
root@OpenWrt:/# CPU 0 Unable to handle kernel paging request at virtual address 00000010, epc == 80101f40, ra == 8014a83c
Oops[#1]:
Cpu 0
$ 0   : 00000000 1010ea00 80350000 ffffffbf
$ 4   : 00000000 00000004 00000086 00000040
$ 8   : 1010ea00 1000001f ffffffff 00000000
$12   : ffffffff 00000019 8b16d100 0060005c
$16   : 80329ca0 83d95980 0000008e 80360000
$20   : 83eae164 00000014 83eae5ac c009fa98
$24   : 00000010 8022050c
$28   : 80322000 80323ce0 80300000 8014a83c
Hi    : 00405b19
Lo    : caf8d2c9
epc   : 80101f40     Tainted: P
ra    : 8014a83c Status: 1010ea02    KERNEL EXL
Cause : 10800008
BadVA : 00000010
PrId  : 0001800a
Modules linked in: ath_pci ath_rate_minstrel ath_hal(P) wlan_scan_sta wlan
Process swapper (pid: 0, threadinfo=80322000, task=80324188)
Stack : 00000000 00000000 00000000 00000003 00000019 00000067 ffffb57e 00000001
        80101e4c 00000000 00000000 00000000 00000000 00000000 80323ed0 00000000
        801027c4 83d98000 00000000 00000000 ffffffff 00000018 00000000 00000001
        00000000 1010ea00 8035c0e8 1010ea01 8035c0e8 83f0b464 ffffb57e ffffb56b
        8123bc00 83f0b014 ffffffff 00000000 ffffffff 00000019 8b16d100 0060005c
        ...
Call Trace:[<80101e4c>][<801027c4>][<c009fa98>][<8022050c>][<c009fcfc>][<c009fcd4>][<8012de94>][<8012deb0>][<c009fa98>][<c00a4320>][<c00a40f0>][<8012e090>]

Code: 3c028035  8c4404d0  00071827 <8c820010> 00431024  ac820010  24a20002  24040100  00442004
Kernel panic - not syncing: Fatal exception in interrupt
Rebooting in 3 seconds..

comment:3 Changed 8 years ago by acoul <alex@…>

Snapshot [9705] works fine with the latest madwifi from OpenWrt trunk ( http://10.2.19.1/airo/openwrt/firmware/kamikaze/2.6/rb5xx/9705-latest-madwifi/packages/kmod-madwifi_2.6.22.4+r3314-rb532-1_mipsel.ipk ), so this issue is not a madwifi driver problem.

comment:4 Changed 8 years ago by florian

  • Owner changed from developers to florian
  • Status changed from new to assigned

Can you please try with the rc32434_wdt driver disabled (CONFIG_RC32434_WDT is not set)?

Also enable kernel debugging, so that we can see which symbol is causing this crash.
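
The "is not set" wording comes from the kernel `.config` format, where a disabled option appears as a comment line of the form `# CONFIG_FOO is not set`. A small sketch for checking the state of an option in a saved config follows. `kconfig_state` is a hypothetical helper, and the sample text is made up for the example rather than taken from the reporter's build.

```python
import re


def kconfig_state(config_text, symbol):
    """Return the value of `symbol` in a kernel .config text.

    Returns "not set" for a '# SYMBOL is not set' line, the right-hand
    side (for example "y" or "m") for a 'SYMBOL=value' line, and None
    if the symbol does not appear at all.
    """
    unset_line = "# %s is not set" % symbol
    set_re = re.compile(r"^%s=(.*)$" % re.escape(symbol))
    for raw in config_text.splitlines():
        line = raw.strip()
        if line == unset_line:
            return "not set"
        match = set_re.match(line)
        if match:
            return match.group(1)
    return None


# Illustrative sample only, not the actual rb532 configuration.
SAMPLE = """\
# CONFIG_RC32434_WDT is not set
CONFIG_KALLSYMS=y
"""

print(kconfig_state(SAMPLE, "CONFIG_RC32434_WDT"))
print(kconfig_state(SAMPLE, "CONFIG_KALLSYMS"))
```

`CONFIG_KALLSYMS` is included because it is the option that lets the kernel print symbol names in an oops; whether other debug options were intended here is not stated in the ticket.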

comment:5 Changed 8 years ago by acoul <alex@…>

In the URL above, please replace 10.2.19.1 with wifi.ozo.com (I have no edit rights). The rc32434_wdt driver is already disabled (CONFIG_RC32434_WDT is not set). Here is the output with most of the verbose kernel debug options enabled:

root@OpenWrt:/# wlanconfig ath0 create wlandev wifi0 wlanmode sta
kobject ath0: registering. parent: net, set: devices
kobject_uevent_env
fill_kobj_path: path = '/devices/pci0000:00/0000:00:04.0/net/ath0'
ath0
root@OpenWrt:/# ifconfig ath0 up
root@OpenWrt:/# CPU 0 Unable to handle kernel paging request at virtual address 00000010, epc == 80101f50, ra == 80155564
Oops[#1]:
Cpu 0
$ 0   : 00000000 1010ea00 803c0000 ffffffbf
$ 4   : 00000000 00000004 00000086 00000040
$ 8   : 1010ea00 1000001f ffffffff 00000000
$12   : ffffffff 00000009 e9d44e80 0060005c
$16   : 80397cf8 80397d28 812c4400 0000008e
$20   : 805d0000 00000014 812476cc c00a5b98
$24   : 00000010 80242f1c
$28   : 8038e000 8038fd08 80360000 80155564
Hi    : 00000000
Lo    : 00000000
epc   : 80101f50 rb500_end_irq+0x90/0xf0     Tainted: P
ra    : 80155564 __do_IRQ+0x124/0x150
Status: 1010ea02    KERNEL EXL
Cause : 10800008
BadVA : 00000010
PrId  : 0001800a
Modules linked in: ath_pci ath_rate_minstrel ath_hal(P) wlan_scan_sta wlan
Process swapper (pid: 0, threadinfo=8038e000, task=80390200)
Stack : 0000002e a1210000 8119bf44 81220000 00000019 00000067 812474a0 00000000
        81247164 80101e54 00000003 00000000 ffffffff 00000018 8038fed0 805fe800
        80102844 812474a0 812284a0 c006c198 812284a0 81228000 812474a0 81336000
        00000000 1010ea00 00000000 00000000 00000000 80370000 c006f130 00000000
        0000002e 00000003 ffffffff 00000000 ffffffff 00000009 e9d44e80 0060005c
        ...
Call Trace:
[<80101f50>] rb500_end_irq+0x90/0xf0
[<80155564>] __do_IRQ+0x124/0x150
[<80101e54>] plat_irq_dispatch+0xd0/0xf4
[<80102844>] ret_from_irq+0x0/0x4
[<c00aaafc>] ieee80211_scan_attach+0x30c/0x11b8 [wlan]


Code: 3c02803c  8c4434d0  00071827 <8c820010> 00431024  ac820010  24a20002  24040100  00442004
Kernel panic - not syncing: Fatal exception in interrupt
Rebooting in 3 seconds..
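
The symbolized frames in this trace (`rb500_end_irq+0x90`, `__do_IRQ+0x124`) are what the earlier raw-address traces lacked. When only raw addresses are available, they can be matched by hand against the System.map of the same build. A rough sketch, assuming the usual `address type name` line format; the three start addresses in the sample are back-computed from the offsets printed in this trace, not copied from a real System.map, and `parse_system_map` and `resolve` are hypothetical helpers.

```python
import bisect


def parse_system_map(text):
    """Parse 'address type name' lines into a sorted list of (addr, name)."""
    symbols = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue               # skip blank or malformed lines
        addr, _kind, name = parts
        symbols.append((int(addr, 16), name))
    symbols.sort()
    return symbols


def resolve(symbols, addr):
    """Map an address to 'symbol+0xoffset' using the nearest lower symbol."""
    starts = [start for start, _ in symbols]
    index = bisect.bisect_right(starts, addr) - 1
    if index < 0:
        return None                # address lies below the first symbol
    start, name = symbols[index]
    return "%s+0x%x" % (name, addr - start)


# Synthetic map: start addresses derived as (trace address - printed offset).
SAMPLE_MAP = """\
80101d84 T plat_irq_dispatch
80101ec0 t rb500_end_irq
80155440 T __do_IRQ
"""

syms = parse_system_map(SAMPLE_MAP)
for address in (0x80101F50, 0x80155564, 0x80101E54):
    print("%08x %s" % (address, resolve(syms, address)))
```

Addresses in the `c00xxxxx` range belong to loadable modules and will not appear in System.map, which is why the `[wlan]` frame needs the kernel's own symbol lookup.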

comment:6 Changed 8 years ago by acoul <alex@…>

Also, here is some more information from the boot process:

Please press Enter to activate this console.

=================================
[ INFO: inconsistent lock state ]
2.6.23.16 #2
---------------------------------
inconsistent {in-hardirq-W} -> {hardirq-on-W} usage.
ash/182 [HC0[0]:SC1[1]:HE1:SE0] takes:
 (&lp->lock){++..}, at: [<8026e7c8>] korina_poll+0x60/0x840
{in-hardirq-W} state was registered at:
  [<80149f74>] __lock_acquire+0x6b0/0x1184
  [<8014aaf4>] lock_acquire+0xac/0xfc
  [<803107ec>] _spin_lock+0x30/0x44
  [<8026ddfc>] korina_dma_interrupt+0x20/0x110
  [<801553c8>] handle_IRQ_event+0x38/0xb0
  [<8015550c>] __do_IRQ+0xcc/0x150
  [<80101e54>] plat_irq_dispatch+0xd0/0xf4
  [<80102844>] ret_from_irq+0x0/0x4
  [<8017f6d0>] check_poison_obj+0xdc/0x22c
  [<8017fd14>] cache_alloc_debugcheck_after+0x40/0x234
  [<80182360>] kmem_cache_alloc+0x258/0x2a4
  [<80190194>] getname+0x28/0xdc
  [<801840b4>] do_sys_open+0x30/0xc0
  [<8010ba30>] stack_done+0x20/0x3c
irq event stamp: 1930
hardirqs last  enabled at (1930): [<80298c48>] net_rx_action+0xa8/0x288
hardirqs last disabled at (1929): [<80298c04>] net_rx_action+0x64/0x288
softirqs last  enabled at (1564): [<8012b264>] do_softirq+0x64/0xe8
softirqs last disabled at (1927): [<8012b264>] do_softirq+0x64/0xe8

other info that might help us debug this:
no locks held by ash/182.

stack backtrace:
Call Trace:
[<8010956c>] dump_stack+0x8/0x34
[<801479d4>] print_usage_bug+0x154/0x174
[<80148988>] mark_lock+0x290/0x5dc
[<8014a000>] __lock_acquire+0x73c/0x1184
[<8014aaf4>] lock_acquire+0xac/0xfc
[<803107ec>] _spin_lock+0x30/0x44
[<8026e7c8>] korina_poll+0x60/0x840
[<80298c88>] net_rx_action+0xe8/0x288
[<8012b160>] __do_softirq+0x9c/0x13c
[<8012b264>] do_softirq+0x64/0xe8
[<80102844>] ret_from_irq+0x0/0x4
[<8017f6d0>] check_poison_obj+0xdc/0x22c
[<8017fd14>] cache_alloc_debugcheck_after+0x40/0x234
[<80182360>] kmem_cache_alloc+0x258/0x2a4
[<80190194>] getname+0x28/0xdc
[<801840b4>] do_sys_open+0x30/0xc0
[<8010ba30>] stack_done+0x20/0x3c

comment:7 Changed 8 years ago by florian

The problem seems to be related to the korina ethernet driver, not the watchdog. I will try to fix this.

comment:8 Changed 8 years ago by acoul <alex@…>

Until this issue (quite a serious one, I may say) is resolved, it may not be a bad idea to revert to the last working kernel, 2.6.22-4 I believe.

comment:9 follow-up: Changed 8 years ago by zgreycoat

Does anyone know if there has been any progress on this? I'm seeing the same issue with the most current svn (10622) when using madwifi and the ADM5120 target on a Mikrotik RB133.

comment:10 in reply to: ↑ 9 Changed 8 years ago by agb

Replying to zgreycoat:

Does anyone know if there has been any progress on this? I'm seeing the same issue with the most current svn (10622) when using madwifi and the ADM5120 target on a Mikrotik RB133.

Are you sure you aren't running into the problem described in #3213? If so, I reverted the problematic code in [10628]. The RB133 and RB532a use different processors and different ethernet drivers.

comment:11 Changed 8 years ago by acoul <alex@…>

The problem remains with snapshot [10635]. It is possible that the HAL bug nbd is talking about is triggered earlier on this platform than on the others.

comment:12 Changed 8 years ago by acoul <alex@…>

This problem remains even with the madwifi-0.9.4 drivers. For anyone who wants to try, the package is here: http://wifi.ozo.com/airo/openwrt/firmware/kamikaze/2.6/rb5xx/10635/packages/kmod-madwifi_2.6.23.16+0.9.4-rb532-1_mipsel.ipk

comment:13 Changed 8 years ago by acoul <alex@…>

Meanwhile, anyone who wants a working Kamikaze for the rb532 with the latest madwifi and ethtool functionality for the korina ethernet driver can use an image from here: http://wifi.ozo.com/airo/openwrt/firmware/kamikaze/2.6/rb5xx/9705-latest-madwifi-ethtool-new/

comment:14 Changed 8 years ago by acoul <alex@…>

Snapshot 10931 does not have this issue. This ticket can now be closed.

comment:15 Changed 8 years ago by florian

  • Resolution set to fixed
  • Status changed from assigned to closed

Ok, thanks for testing.
