
Opened 7 years ago

Closed 6 years ago

Last modified 22 months ago

#5864 closed defect (fixed)

programs hang

Reported by: anonymous Owned by: developers
Priority: normal Milestone: Barrier Breaker 14.07
Component: packages Version:
Keywords: Cc:

Description

I downloaded and compiled the newest trunk for the Routerboard 411. Busybox works fine, but when I install and run packages like "python" or "apache", they simply hang, using 100% CPU. I tested the same root filesystem with the same binaries under QEMU and on the original kernel (2.6.27.21 by Mikrotik), and on those kernels these programs start fine.

I tried other packages like "bash" and they work fine.

I checked this on both the RB411U and the older RB411; it hangs on both.

gdb shows that the program is simply stuck at a certain address (0x42a6b4) and cannot be stepped any further.

Attachments (1)

999-mini_fo_lock_fix.patch (745 bytes) - added by psvopenwrt@… 6 years ago.


Change History (8)

comment:1 Changed 7 years ago by anonymous

I've done additional research and tried .ipk packages of various precompiled Python builds. They all end up hanging in roughly the same place on the trunk kernel (on the Mikrotik kernel they work perfectly).

comment:2 Changed 7 years ago by arus at poczta dot onet dot pl

I think it is a more general problem. I see similar behaviour on brcm47xx. There are problems with restoring the configuration from a backup (using LuCI or unpacking the tarball manually). Processes hang and show status "D" (uninterruptible sleep) in the process list.
The following example operations cause processes to hang:
1)
scp tarball to directory /tmp/tmp on router
login to router
cd /
tar zxf /tmp/tmp/backup-Openwrt-date.tar.gz
tar hangs
2)
scp tarball to directory /tmp/tmp on router
login to router
cd /tmp/tmp
tar zxf backup-OpenWrt-date.tar.gz
cd etc
cp -pR config /etc
cp hangs

However, copying each FILE one by one works.

comment:3 Changed 6 years ago by anonymous

This is likely a bug in mini_fo, jffs or the flash driver. Easy to reproduce on a wrt160nl as well.

comment:4 Changed 6 years ago by psvopenwrt@…

The bug seems to be in nondir_mod_to_del in mini_fo's state.c. I think it takes the mutex of the writable directory containing the file twice: once in nondir_mod_to_del (for the file itself) and once in meta_write_d_entry (for the meta file).

As far as I can tell, the bug will bite whenever a modified file is deleted. After a reboot the file will be gone from the writable store but will remain in the meta file.

Since the lock is taken in meta_write_d_entry only to protect the lookup of the meta file, and nondir_mod_to_del does nothing else after calling meta_write_d_entry, maybe it is safe to simply move the mutex_unlock to before the call to meta_write_d_entry?

Here is the lock analysis:
INFO: task tar:1283 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
tar D 80720000 0 1283 1281
Stack : 7fa9fb20 800a543c 0000002c 813c66c0 8013d8e8 80720000 00000001 813c67a8

81ec56a0 813c67c8 80350000 80069310 802cdc18 00000001 818cbd28 8030bf30
00000001 00000000 8012f328 00100100 813c67c8 813c67c8 81ec56a0 819dbdb0
ffffffff 802d0000 8188f6e0 00000000 00000001 00000004 818cb394 7fa9fb20
0048e03c 8012f328 0048e040 8012e3f8 8030bf30 00200200 8188f6e0 818cb394
...

Call Trace:
[<800683f4>] schedule+0x368/0x3cc
[<80069310>] mutex_lock_nested+0x1cc/0x2d8
[<8012f328>] meta_write_d_entry+0x9c/0x2b0
[<8012f578>] meta_add_d_entry+0x3c/0x5c
[<801349d0>] nondir_mod_to_del+0x138/0x17c
[<80132e60>] mini_fo_unlink+0x60/0x1fc
[<800e833c>] vfs_unlink+0x70/0xbc
[<800ea53c>] do_unlinkat+0xd4/0x174
[<80062504>] stack_done+0x20/0x3c

4 locks held by tar/1283:

#0: (&sb->s_type->i_mutex_key#8/1){......}, at: [<800ea4c4>] do_unlinkat+0x5c/0x174
#1: (&type->i_mutex_dir_key#4){......}, at: [<800e831c>] vfs_unlink+0x50/0xbc
#2: (&sb->s_type->i_mutex_key#10){......}, at: [<8013493c>] nondir_mod_to_del+0xa4/0x17c
#3: (&sb->s_type->i_mutex_key#10){......}, at: [<8012f328>] meta_write_d_entry+0x9c/0x2b0

Bug 6125 indicates that there may be more recursive locking in mini_fo.

Changed 6 years ago by psvopenwrt@…

comment:5 Changed 6 years ago by psvopenwrt@…

I have attached a patch that seems to work for me. No further lock warnings and restore etc works as well.

comment:6 Changed 6 years ago by nbd

  • Resolution set to fixed
  • Status changed from new to closed

added in r19203

comment:7 Changed 22 months ago by jow

  • Milestone changed from Attitude Adjustment 12.09 to Barrier Breaker 14.07

Milestone Attitude Adjustment 12.09 deleted
