Opened 3 years ago

Closed 3 years ago

Last modified 18 months ago

#17839 closed defect (fixed)

Reboot command sometimes hangs the router on TL-WDR3600 (HW V1.5) and TL-WDR4300 (HW V1.7)

Reported by: michaeluray Owned by: developers
Priority: high Milestone: Barrier Breaker 14.07
Component: base system Version: Barrier Breaker 14.07
Keywords: reboot Cc:

Description

With the new TP-Link TL-WDR4300 hardware revision (HW V1.7), the router sometimes hangs after a reboot command, and a power cycle is required to bring it back to life.
Others have experienced the same problem with the new TL-WDR3600 hardware revision (HW V1.5).

With hardware revisions below V1.7 (TL-WDR4300) and below V1.5 (TL-WDR3600), the problem never occurred.
The problem showed up with a TL-WDR4300 (HW V1.7) on OpenWRT Attitude Adjustment and it also happens with BB-RC3 on this device.

Sometimes a reboot succeeds; sometimes it does not, and the device has to be power-cycled.

Forum entry for TL-WDR3600 V1.5 reboot problem:
https://forum.openwrt.org/viewtopic.php?pid=246752
There, one user ran a reboot test with a cron job for 24 h, and only 2 of 6 devices stayed alive over that time.

Forum entry for problems with TL-WDR4300 V1.7, including the reboot problem:
https://forum.openwrt.org/viewtopic.php?pid=245767#p245767

Attachments (7)

openwrt r42625.log (22.8 KB) - added by ctbenergy 3 years ago.
openwrt r42625 debug-mode 4.log (344.8 KB) - added by ctbenergy 3 years ago.
TL-WDR4300_v1_140916.log (171.8 KB) - added by ctbenergy 3 years ago.
openwrt r42625 cronjob.log (34.7 KB) - added by ctbenergy 3 years ago.
openwrt r42625 cronjob with kernel crash.log (31.1 KB) - added by ctbenergy 3 years ago.
openwrt r42625 cronjob with ath9k disable.log (165.1 KB) - added by ctbenergy 3 years ago.
openwrt r42625 cronjob with will_crash_now.log (94.5 KB) - added by ctbenergy 3 years ago.

Change History (118)

comment:1 follow-up: Changed 3 years ago by bittorf@…

comment:2 Changed 3 years ago by anonymous

It turned out my cron job had failed. None of the routers were up and running after 24 h. When rebooting by hand, I have never managed more than 10 (ten) successful reboots before the router hung. This is with AA as well as with the BB release candidates (unmodified BB images from openwrt.org) on several tens of devices.

comment:3 in reply to: ↑ 1 Changed 3 years ago by michaeluray

Replying to bittorf@…:

sounds like https://dev.openwrt.org/ticket/17824

The problem came up with a new hardware revision, so I don't think it's a general software problem in procd, the process management daemon.
I would expect a procd problem to occur on every hardware platform.

comment:4 Changed 3 years ago by anonymous

I had the same problem some time ago on my WNDR3800. I was able to fix it by adding a sleep 15 to the scripts I have in rc.local. I don't know why, but without these sleeps the router hangs after about 30% of reboots.

comment:5 Changed 3 years ago by anonymous

comment:6 Changed 3 years ago by balazs.pomykala

This week I bought a brand-new TP-Link WDR4300 router, hardware rev. 1.7. I tested the original firmware (TL-WDR4300_V1_130617) for a couple of days and did not have any problems with it.

After that I flashed BB-RC3 (openwrt-ar71xx-generic-tl-wdr4300-v1-squashfs-factory.bin from http://downloads.openwrt.org/barrier_breaker/14.07-rc3/ar71xx/generic/openwrt-ar71xx-generic-tl-wdr4300-v1-squashfs-factory.bin) and noticed that in most cases, when I issue a reboot from an SSH terminal, it cannot restart. When I power-cycle it, it starts every time.

At first I thought there was a problem with the installed packages, so I erased rootfs_data in failsafe mode (http://wiki.openwrt.org/doc/howto/generic.failsafe).
But the same problem still exists, even though I have not installed any packages.

I performed a couple of reboot tests, and most of the time it cannot restart. When I issue a reboot from the terminal, I can see all LEDs go off except the power LED. But the router never reaches the point in the boot process where the sys LED starts to blink.

comment:7 follow-up: Changed 3 years ago by bittorf@…

This is fixed (worked around) in trunk r42545, so please try this revision or later and report again.

comment:8 Changed 3 years ago by balazs.pomykala

Hi,

I flashed Bleeding Edge r42616 from http://downloads.openwrt.org/snapshots/trunk/ar71xx/openwrt-ar71xx-generic-tl-wdr4300-v1-squashfs-sysupgrade.bin onto my WDR4300 router.

After the initial configuration it did not restart. Maybe I applied a wrong setting, so I power-cycled it.

After that it worked. I quickly performed 6 reboot tests and all of them worked: the router rebooted every time I issued the reboot command from an SSH terminal.

Great!

I hope this fix will be backported to BB RC4.

Maybe this ticket can be closed.

comment:9 in reply to: ↑ 7 ; follow-up: Changed 3 years ago by ctbenergy

Replying to bittorf@…:

This is fixed (worked around) in trunk r42545, so please try this revision or later and report again.


I flashed Barrier Breaker (built from source) including the commit "procd: update to latest git HEAD":

http://git.openwrt.org/?p=14.07/openwrt.git;a=blobdiff;f=package/system/procd/Makefile;h=de73784fcf7bc2ba70009660d59046377ad92b3b;hp=840778af5b1f29458fff4dc2fb05e425a988f563;hb=bfaaa74a45520bf0e70c2bd85a0ca2508a771161;hpb=f6ae2c25f8242a8d00b5ea09545130940945c25d

I performed a couple of reboot tests from an SSH terminal, and most of the time it cannot restart.
When I issue a reboot from the terminal, I can see all LEDs go off except the power LED.
But the router never reaches the point in the boot process where the sys LED starts to blink.

Last edited 3 years ago by ctbenergy

comment:10 in reply to: ↑ 9 ; follow-ups: Changed 3 years ago by bittorf@…

Replying to ctbenergy:

I performed a couple of reboot tests from an SSH terminal, and most of the time it cannot restart.
When I issue a reboot from the terminal, I can see all LEDs go off except the power LED.
But the router never reaches the point in the boot process where the sys LED starts to blink.

How long did you wait? 180 sec?
If you have a serial console, please use procd debug mode 4 and post the output during shutdown.

comment:11 in reply to: ↑ 10 ; follow-up: Changed 3 years ago by ctbenergy

Replying to bittorf@…:

Replying to ctbenergy:

I performed a couple of reboot tests from an SSH terminal, and most of the time it cannot restart.
When I issue a reboot from the terminal, I can see all LEDs go off except the power LED.
But the router never reaches the point in the boot process where the sys LED starts to blink.

How long did you wait? 180 sec?
If you have a serial console, please use procd debug mode 4 and post the output during shutdown.

The first reboot over SSH was OK. I waited 5 minutes, rebooted over SSH again, and only the power LED was on.
I have a serial console; what do you mean by "debug-mode 4 for procd"?

comment:12 in reply to: ↑ 11 Changed 3 years ago by michaeluray

Replying to ctbenergy:

I have a serial console; what do you mean by "debug-mode 4 for procd"?

Press the '4' key on the serial console during bootup.
Then run a reboot and post the output.

Last edited 3 years ago by michaeluray

Changed 3 years ago by ctbenergy

comment:13 in reply to: ↑ 10 ; follow-ups: Changed 3 years ago by ctbenergy

Replying to bittorf@…:

Replying to ctbenergy:

I performed a couple of reboot tests from an SSH terminal, and most of the time it cannot restart.
When I issue a reboot from the terminal, I can see all LEDs go off except the power LED.
But the router never reaches the point in the boot process where the sys LED starts to blink.

How long did you wait? 180 sec?
If you have a serial console, please use procd debug mode 4 and post the output during shutdown.

See attachment "openwrt r42625.log" for the serial console log with debug mode 4.
OpenWRT r42625, built from source.
Reboot via cron job every 5 minutes.
After 4 reboots, only the power LED is on.

comment:14 in reply to: ↑ 13 ; follow-up: Changed 3 years ago by michaeluray

Replying to ctbenergy:

See attachment "openwrt r42625.log" for the serial console log with debug mode 4.
Reboot via cron job every 5 minutes.

Is this log from exactly the boot after which the router hung?
How did you actually switch to debug mode 4 when the reboot was triggered by a cron job?

comment:15 in reply to: ↑ 14 Changed 3 years ago by ctbenergy

Replying to michaeluray:

Replying to ctbenergy:

See attachment "openwrt r42625.log" for the serial console log with debug mode 4.
Reboot via cron job every 5 minutes.

Is this log from exactly the boot after which the router hung?
How did you actually switch to debug mode 4 when the reboot was triggered by a cron job?

Yes, the log is from exactly the boot after which the router hung.
I switch to debug mode 4 by pressing the 4 key and then Enter once the serial console shows the message "Press the [1], [2], [3] or [4] key and hit [enter] to select the debug level".
The last message in the log file is "Restarting system". This message comes from the kernel function "void kernel_restart(char *cmd)", so I think this is a Linux kernel (3.10.49) problem.

comment:16 in reply to: ↑ 13 ; follow-up: Changed 3 years ago by michaeluray

Replying to ctbenergy:

See attachment "openwrt r42625.log" for the serial console log with debug mode 4.
After 4 reboots, only the power LED is on.

You wrote about 4 reboots, but there are only 2 reboots in the log.
Are these the last two reboots, and was the last one the hanging one?

Changed 3 years ago by ctbenergy

comment:17 in reply to: ↑ 16 ; follow-up: Changed 3 years ago by ctbenergy

Replying to michaeluray:

Replying to ctbenergy:

See attachment "openwrt r42625.log" for the serial console log with debug mode 4.
After 4 reboots, only the power LED is on.

You wrote about 4 reboots, but there are only 2 reboots in the log.
Are these the last two reboots, and was the last one the hanging one?

Attachment "openwrt r42625.log" without debug mode 4 (2 reboots)
Attachment "openwrt r42625 debug-mode 4.log" with debug mode 4 (4 reboots)

comment:18 in reply to: ↑ 17 ; follow-up: Changed 3 years ago by bittorf@…

Replying to ctbenergy:

Attachment "openwrt r42625.log" without debug mode 4 (2 reboots)
Attachment "openwrt r42625 debug-mode 4.log" with debug mode 4 (4 reboots)

Thank you for the logs.
Can you reproduce this with another power supply (does it still hang)?
I have the same router here, without such problems.

comment:19 in reply to: ↑ 18 Changed 3 years ago by ctbenergy

Replying to bittorf@…:

Replying to ctbenergy:

Attachment "openwrt r42625.log" without debug mode 4 (2 reboots)
Attachment "openwrt r42625 debug-mode 4.log" with debug mode 4 (4 reboots)

Thank you for the logs.
Can you reproduce this with another power supply (does it still hang)?
I have the same router here, without such problems.

I tested 4 WDR4300 v1.7 routers at the company, and all 4 have the same problem.
One of these 4 routers hangs only after an hour.

Changed 3 years ago by ctbenergy

comment:20 follow-up: Changed 3 years ago by ctbenergy

New attachment "TL-WDR4300_v1_140916.log".
This is the log file from the same test router with the original TP-Link firmware.
I did 10 reboots through the web interface, and the router didn't hang.

comment:21 follow-up: Changed 3 years ago by bittorf@…

Can you please set up a cron job that does the following:

sysctl -w kernel.panic_on_oops=1
sysctl -w kernel.panic=5
echo c >/proc/sysrq-trigger

and check whether this also hangs.

comment:22 in reply to: ↑ 21 ; follow-up: Changed 3 years ago by ctbenergy

Replying to bittorf@…:

Can you please set up a cron job that does the following:

sysctl -w kernel.panic_on_oops=1
sysctl -w kernel.panic=5
echo c >/proc/sysrq-trigger

and check whether this also hangs.

I set up a cron job as described, but after the third reboot over SSH the router hangs as well.
Log file: attachment "openwrt r42625 cronjob.log"

Changed 3 years ago by ctbenergy

comment:23 in reply to: ↑ 20 Changed 3 years ago by underthehood

Replying to ctbenergy:

New attachment "TL-WDR4300_v1_140916.log".
This is the log file from the same test router with the original TP-Link firmware.
I did 10 reboots through the web interface, and the router didn't hang.

I can confirm this on the TL-WDR3600 as well: the original TP-Link firmware works as expected (reboot via the web interface), but AA and BB_rcN fail to reboot.

comment:24 in reply to: ↑ 22 ; follow-up: Changed 3 years ago by bittorf@…

Replying to ctbenergy:

sysctl -w kernel.panic_on_oops=1
sysctl -w kernel.panic=5
echo c >/proc/sysrq-trigger

and check whether this also hangs.

I set up a cron job as described, but after the third reboot over SSH the router hangs as well.
Log file: attachment "openwrt r42625 cronjob.log"

This log does not show a crash (from sysrq-trigger). Please try again.

Changed 3 years ago by ctbenergy

comment:25 in reply to: ↑ 24 ; follow-up: Changed 3 years ago by ctbenergy

Replying to bittorf@…:

Replying to ctbenergy:

sysctl -w kernel.panic_on_oops=1
sysctl -w kernel.panic=5
echo c >/proc/sysrq-trigger

and check whether this also hangs.

I set up a cron job as described, but after the third reboot over SSH the router hangs as well.
Log file: attachment "openwrt r42625 cronjob.log"

This log does not show a crash (from sysrq-trigger). Please try again.


On Linux, you might have to run echo 1 > /proc/sys/kernel/sysrq before you are able to run echo c > /proc/sysrq-trigger.

This is the current cron job:

sysctl -w kernel.panic_on_oops=1
sysctl -w kernel.panic=5
echo 1 > /proc/sys/kernel/sysrq
*/5 * * * * echo c >/proc/sysrq-trigger

The log is in the attachment "openwrt r42625 cronjob with kernel crash.log".
After 3 reboots the router hung.

comment:26 in reply to: ↑ 25 ; follow-up: Changed 3 years ago by bittorf@…

The log is in the attachment "openwrt r42625 cronjob with kernel crash.log".
After 3 reboots the router hung.

Can you please repeat the test with Wi-Fi disabled?
You can achieve this by adding a '#' in front of each line in:

/etc/modules.d/ath9k

and by adding 'option disabled 1' to each radio in /etc/config/wireless.

comment:27 Changed 3 years ago by anonymous

@bittorf: I have a spare unit that also does not reboot reliably. I could ship it to you if you find it useful. Please contact me at sverre.slotte@….

Changed 3 years ago by ctbenergy

comment:28 in reply to: ↑ 26 ; follow-up: Changed 3 years ago by ctbenergy

Replying to bittorf@…:

The log is in the attachment "openwrt r42625 cronjob with kernel crash.log".
After 3 reboots the router hung.

Can you please repeat the test with Wi-Fi disabled?
You can achieve this by adding a '#' in front of each line in:

/etc/modules.d/ath9k

and by adding 'option disabled 1' to each radio in /etc/config/wireless.

The log is in the attachment:
https://dev.openwrt.org/attachment/ticket/17839/openwrt%20r42625%20cronjob%20with%20ath9k%20disable.log

After 14 reboots the router hung.

comment:29 in reply to: ↑ 28 ; follow-up: Changed 3 years ago by bittorf@…

Replying to ctbenergy:

After 14 reboots the router hung.

Interesting, because it hangs during normal operation:
no crash can be seen just before the hang. Can you repeat the test
with a leading 'echo will_crash_now >/dev/console;sleep 3'
before the crash command?

comment:30 Changed 3 years ago by anonymous

Any news?

Changed 3 years ago by ctbenergy

comment:31 in reply to: ↑ 29 ; follow-up: Changed 3 years ago by ctbenergy

Replying to bittorf@…:

Replying to ctbenergy:

After 14 reboots the router hung.

Interesting, because it hangs during normal operation:
no crash can be seen just before the hang. Can you repeat the test
with a leading 'echo will_crash_now >/dev/console;sleep 3'
before the crash command?


I hope the cron job is correct:

sysctl -w kernel.panic_on_oops=1
sysctl -w kernel.panic=5
echo will_crash_now >/dev/console
sleep 3
echo 1 > /proc/sys/kernel/sysrq
echo c >/proc/sysrq-trigger


https://dev.openwrt.org/attachment/ticket/17839/openwrt%20r42625%20cronjob%20with%20will_crash_now.log
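For anyone who wants to rerun this crash test from a small compiled helper instead of a shell cron job, the sequence above can be sketched in C roughly as follows. This is only a sketch under stated assumptions: write_str(), crash_test() and the root-prefix parameter are hypothetical names of mine, not part of OpenWRT. With root set to "" it writes the real /proc and /dev/console files, and the last write crashes the router, which is the point of the test; with a prefix pointing at a scratch directory that mirrors those paths it only writes ordinary files. It keeps the ordering that matters from comment 25: /proc/sys/kernel/sysrq is set to 1 before 'c' is written to /proc/sysrq-trigger.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <unistd.h>

/* Write val to root + rel (e.g. "" + "/proc/sys/kernel/sysrq").
 * Returns 0 on success, -1 on any error. */
static int write_str(const char *root, const char *rel, const char *val)
{
    char path[512];
    int n = snprintf(path, sizeof path, "%s%s", root, rel);
    if (n < 0 || (size_t)n >= sizeof path)
        return -1;

    FILE *f = fopen(path, "w");
    if (f == NULL)
        return -1;
    int ok = fputs(val, f) != EOF;
    if (fclose(f) != 0)          /* fclose flushes; a failed flush is an error too */
        ok = 0;
    return ok ? 0 : -1;
}

/* Same steps as the cron job above, in the same order.
 * root is "" on the router; any other prefix redirects every write. */
int crash_test(const char *root, unsigned int pause_s)
{
    if (write_str(root, "/proc/sys/kernel/panic_on_oops", "1\n") != 0 ||
        write_str(root, "/proc/sys/kernel/panic", "5\n") != 0 ||
        write_str(root, "/dev/console", "will_crash_now\n") != 0)
        return -1;

    sleep(pause_s);              /* let the marker reach the serial console */

    /* sysrq must be enabled before the trigger accepts 'c' */
    if (write_str(root, "/proc/sys/kernel/sysrq", "1\n") != 0)
        return -1;
    return write_str(root, "/proc/sysrq-trigger", "c");  /* crashes a real router */
}
```

On the router this would be called as crash_test("", 3); it returns -1 as soon as any of the files cannot be written.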

comment:32 Changed 3 years ago by anonymous

It seems the release version of BB is out. Has the rebooting problem been fixed there? What does the cron-based test report?

comment:33 Changed 3 years ago by anonymous

I tested the final BB release, which came out today. With a simple reboot command from the shell, I can reproduce the reboot problem on my TP-Link WDR4300 v1.7.
It seems this bug was not fixed in the final BB. :-(

comment:34 Changed 3 years ago by anonymous

I also tested the final BB today. I rebooted the router via LuCI and can confirm that the reboot problem is still present, although it is not deterministic. TP-Link WDR4300 v1.7.

The only additional packages I have installed are openvpn-openssl, openvpn-easy-rsa and openssh-sftp-server.

comment:35 Changed 3 years ago by hausladen.j@…

comment:36 in reply to: ↑ 31 ; follow-up: Changed 3 years ago by bittorf@…

https://dev.openwrt.org/attachment/ticket/17839/openwrt%20r42625%20cronjob%20with%20will_crash_now.log

Does it really hang after the last "Rebooting in 3 seconds.."?
Also: why can't I see 'will_crash_now' in the log?

comment:37 Changed 3 years ago by anonymous

Same here... reboot fails on a WDR3600.

comment:38 in reply to: ↑ 36 Changed 3 years ago by ctbenergy

Replying to bittorf@…:

https://dev.openwrt.org/attachment/ticket/17839/openwrt%20r42625%20cronjob%20with%20will_crash_now.log

Does it really hang after the last "Rebooting in 3 seconds.."?
Also: why can't I see 'will_crash_now' in the log?

Yes, it really does hang after the last "Rebooting in 3 seconds..".
Search for 'will_crash_now' in the log file.

comment:39 follow-ups: Changed 3 years ago by lc

We have a bunch of TL-WDR3600s. Some devices have this issue, others don't, so we thought there must be a difference. We found that the difference is the RAM used.

Devices with reboot issues are equipped with Zentel A3R12E40CBF.
Devices without reboot issues have SKhynix H5PS5162GFR.
Devices without reboot issues have Winbond W9751G6KB-25.

It seems SKhynix and Winbond were used in older production lots (serial numbers starting with 13B, 13C, 2141).
Zentel was used for serial numbers starting with 2143 and 2145.

Can you guys check which RAM you have on your boards? Can you confirm our findings?

Maybe the hardware gurus know why Zentel RAM chips cause the reboot issue (timing?) and can find a fix.

comment:40 in reply to: ↑ 39 Changed 3 years ago by _Steffen_

Replying to lc:

Can you guys check which RAM you have on your boards? Can you confirm our findings?

I have a TL-WDR3600 V1.5 with Zentel A3R12E40CBF that has reboot problems.
I have a TL-WDR3600 V1.4 with Winbond W9751G6KB-25 without problems.
This confirms your findings.
All TL-WDR3600 V1.4 units I know of work fine. The reboot problem occurs only with V1.5.
My V1.5 test router shows the reboot problem with several OpenWRT versions: AA, trunk and BB.
The last output on the serial console looks pretty much the same as what ctbenergy posted.
In my tests the problem occurs less often when procd debug mode 4 is active (current stable BB release image with default configuration).

comment:41 Changed 3 years ago by lc

I forgot to mention: all the devices we tested are TL-WDR3600 V1.5.

V1.5 devices with SKhynix H5PS5162GFR or Winbond W9751G6KB-25 work perfectly.

comment:42 Changed 3 years ago by anonymous

Interesting progress. How does one check the RAM type?

comment:43 in reply to: ↑ 39 Changed 3 years ago by ctbenergy

Replying to lc:

Can you guys check which RAM you have on your boards? Can you confirm our findings?


I have a TL-WDR4300 V1.7 with Zentel A3R12E40CBF with reboot problems.

comment:44 Changed 3 years ago by anonymous

Is physical disassembly the only way to check?

comment:45 Changed 3 years ago by anonymous

We have TL-WDR3600 devices V1.5 with Zentel A3R12E40CBF with reboot problems.
We have TL-WDR3600 devices V1.5 with Winbond W9751G6KB-25 without problems.
(We do not have any devices with SKhynix RAM.)

One of the Winbond-machines has the serial number starting with 13A.

This supports your findings; thank you for your detective work. Where do we go from here?
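Reading the marking on the RAM chip remains the definitive check. As a rough pre-sort before opening anything, the serial-number prefixes reported in this thread for TL-WDR3600 V1.5 units (13A, 13B, 13C and 2141 for lots with SKhynix or Winbond RAM; 2143 and 2145 for Zentel) can be encoded as a lookup. This is a hypothetical heuristic built only from a handful of reports in this ticket, not a TP-Link mapping, and guess_lot() and the enum names are made up for illustration.

```c
#include <string.h>

/* Serial-number prefixes reported in this ticket for TL-WDR3600 V1.5.
 * Anecdotal data from a few users, NOT an official TP-Link mapping. */
static const char *const older_ram_lots[] = { "13A", "13B", "13C", "2141" };
static const char *const zentel_lots[]    = { "2143", "2145" };

enum lot_guess { LOT_UNKNOWN, LOT_OLDER_RAM, LOT_ZENTEL_RAM };

static int has_prefix(const char *s, const char *prefix)
{
    return strncmp(s, prefix, strlen(prefix)) == 0;
}

/* Rough guess only; the chip marking on the board is the real answer. */
enum lot_guess guess_lot(const char *serial)
{
    for (size_t i = 0; i < sizeof older_ram_lots / sizeof older_ram_lots[0]; i++)
        if (has_prefix(serial, older_ram_lots[i]))
            return LOT_OLDER_RAM;
    for (size_t i = 0; i < sizeof zentel_lots / sizeof zentel_lots[0]; i++)
        if (has_prefix(serial, zentel_lots[i]))
            return LOT_ZENTEL_RAM;
    return LOT_UNKNOWN;
}
```

LOT_UNKNOWN only means that no report in this ticket covers that prefix.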

comment:46 follow-up: Changed 3 years ago by lc

I believe this is a timing issue with Zentel. I do not know exactly what OpenWRT does shortly before triggering the actual restart. However, we experimented with sleep commands in /etc/rc.d/K99umount and found that adding a 5-second sleep makes the reboot problem occur less often. Unfortunately the issue does not go away completely; it just seems to take more reboots before the device hangs. Can you reproduce this change in behaviour?

procd: - reboot -
[  163.230000] Removing MTD device #3 (rootfs_data) with use count 1
[  163.250000] Restarting system.

Maybe a little more time (1 sec?) before the actual restart command is issued could solve the problem. Currently there seems to be only 0.02 seconds between "Removing MTD device #3" and the actual restart. Do you know where to increase this value or add a pause of 1 sec?
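The 0.02 s figure is simply the difference between the two printk timestamps above (163.250000 - 163.230000). A minimal C sketch for measuring that gap across many captured serial logs, assuming the "[  seconds.micros]" printk prefix shown in the excerpt; klog_time() and klog_gap() are hypothetical helper names:

```c
#include <stdio.h>

/* Extract the timestamp from a printk line such as
 * "[  163.230000] Restarting system." Returns 0 on success, -1 otherwise. */
int klog_time(const char *line, double *secs)
{
    /* " [ %lf": skip leading blanks, match '[', skip blanks, read seconds */
    return sscanf(line, " [ %lf", secs) == 1 ? 0 : -1;
}

/* Seconds elapsed between two timestamped log lines,
 * or -1.0 if either line has no timestamp. */
double klog_gap(const char *earlier, const char *later)
{
    double a, b;
    if (klog_time(earlier, &a) != 0 || klog_time(later, &b) != 0)
        return -1.0;
    return b - a;
}
```

Lines without a timestamp, such as "procd: - reboot -", are rejected rather than parsed as zero.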

comment:47 in reply to: ↑ 46 Changed 3 years ago by bittorf@…

Replying to lc:

Maybe a little more time (1 sec?) before the actual restart command is issued could solve the problem. Currently there seems to be only 0.02 seconds between "Removing MTD device #3" and the actual restart. Do you know where to increase this value or add a pause of 1 sec?

This will not help: it seems to hang also when the kernel panics and no userspace is involved.

comment:48 Changed 3 years ago by cedrix

The RAM seems to work fine in general. How can it cause trouble only when the reboot occurs? Maybe other parts also differ between production lots and have not been discovered yet?

comment:49 Changed 3 years ago by anonymous

EXCUSE ME!!! How do I check the type of RAM?

comment:50 Changed 3 years ago by anonymous

To examine the RAM you need to open the case. See http://wiki.openwrt.org/toh/tp-link/tl-wdr4300#opening.the.case.v.1.1 for instructions.

comment:51 follow-ups: Changed 3 years ago by anonymous

I have looked at the reboot implementation in TP-Link's published kernel source code for the TL-WDR3600 and compared it with the implementation in OpenWRT.

TP-Link source code:
File TL-WDR3600_GPL_2.6.31.tar.gz: /./GPL_2.6.31/db12x/linux/kernels/mips-linux-2.6.31/arch/mips/atheros/setup.c

void ath_restart(char *command)
{
        for (;;) {
                if (is_ar934x_10()) {
                        ath_reg_wr(ATH_GPIO_OE, ath_reg_rd(ATH_GPIO_OE) & (~(1 << 17)));
                } else {
                        ath_reg_wr(ATH_RESET, ATH_RESET_FULL_CHIP);
                }
        }
}

OpenWRT trunk source code:
openwrt/trunk/build_dir/target-mips_r2_uClibc-0.9.33.2/linux-ar71xx_generic/linux-3.7.8/arch/mips/ath79/setup.c

static void ath79_restart(char *command)
{
        ath79_device_reset_set(AR71XX_RESET_FULL_CHIP);
        for (;;)
                if (cpu_wait)
                        cpu_wait();
}

/srv/openwrt/trunk/build_dir/target-mips_r2_uClibc-0.9.33.2/linux-ar71xx_generic/linux-3.7.8/arch/mips/ath79/common.c

void ath79_device_reset_set(u32 mask)
{
        unsigned long flags;
        u32 reg;
        u32 t;

        if (soc_is_ar71xx())
                reg = AR71XX_RESET_REG_RESET_MODULE;
        else if (soc_is_ar724x())
                reg = AR724X_RESET_REG_RESET_MODULE;
        else if (soc_is_ar913x())
                reg = AR913X_RESET_REG_RESET_MODULE;
        else if (soc_is_ar933x())
                reg = AR933X_RESET_REG_RESET_MODULE;
        else if (soc_is_ar934x() ||
                 soc_is_qca955x())
                reg = AR934X_RESET_REG_RESET_MODULE;
        else
                BUG();

        spin_lock_irqsave(&ath79_device_reset_lock, flags);
        t = ath79_reset_rr(reg);
        ath79_reset_wr(reg, t | mask);
        spin_unlock_irqrestore(&ath79_device_reset_lock, flags);
}

The TP-Link code writes the ATH_RESET_FULL_CHIP value (0x01000000) to the RESET register in a loop.
The OpenWRT (current kernel?) implementation reads the RESET register (t), sets bit 24 (t | mask) and writes the result back to the RESET register.

I think this read-modify-write is not necessary for a FULL_CHIP reset. I have checked the values of the variables with "printk":

mask = 0x01000000
t = 0x24044008

I saw these values both on a normal restart and on a broken restart (system hangs).
I also noticed that a "printk" after the line "ath79_reset_wr(reg, t | mask);" is not executed in case of a FULL_CHIP reset.
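To make the difference concrete, here is a small userspace model (not kernel code) of the value each implementation ends up writing to the RESET register. It only reproduces the arithmetic from the printk values above: the TP-Link code writes 0x01000000 alone, while the read-modify-write in ath79_device_reset_set() writes 0x24044008 | 0x01000000 = 0x25044008, i.e. it also writes back every bit that was already set. The function names are hypothetical, and the model says nothing about what those other bits control on the AR934x.

```c
#include <stdint.h>

/* Bit 24 of the RESET register, the FULL_CHIP reset bit (0x01000000). */
#define RESET_FULL_CHIP (UINT32_C(1) << 24)

/* TP-Link's ath_restart(): writes the full-chip bit on its own. */
uint32_t tplink_write_value(void)
{
    return RESET_FULL_CHIP;
}

/* OpenWRT's ath79_device_reset_set(): read-modify-write, so every bit
 * already set in the register is written back together with the mask. */
uint32_t openwrt_write_value(uint32_t current, uint32_t mask)
{
    return current | mask;
}

/* Store the positions of the set bits of v (LSB first) in out;
 * returns how many there are. */
int set_bits(uint32_t v, int out[32])
{
    int n = 0;
    for (int i = 0; i < 32; i++)
        if (v & (UINT32_C(1) << i))
            out[n++] = i;
    return n;
}
```

With t = 0x24044008, set_bits() reports bits 3, 14, 18, 26 and 29, which the OpenWRT path rewrites alongside bit 24.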

I tried to change the logic of "ath79_device_reset_set" by inserting the following code right before the "spin_lock_irqsave" call:

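        if (mask == AR71XX_RESET_FULL_CHIP) {
                /* full chip reset: write only the reset bit, and keep
                 * writing it instead of returning to the caller */
                printk(KERN_WARNING "TEST: ath79_device_reset_set: RESET_FULL_CHIP\n");
                for (;;) {
                        ath79_reset_wr(reg, mask);
                }
        }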

I built an image with this modification based on my (old) AA branch, and it looks like the problem is gone.

I don't think this change can be the final solution, because this code is used by all Atheros devices and I have no experience in kernel programming.

But maybe this points in the right direction.

comment:52 in reply to: ↑ 51 Changed 3 years ago by michaeluray

Replying to anonymous:

I don't think this change can be the final solution, because this code is used by all Atheros devices and I have no experience in kernel programming.

But maybe this points in the right direction.

Thanks for the information; I think this will help.

As far as I have seen, the reset register is written correctly in the OpenWRT code. Your solution as well as the OpenWRT code sets bit 24, which is the "FULL_CHIP_RESET" bit.

FULL_CHIP_RESET:
"Used to command a full chip reset. This is the software
equivalent of pulling the reset pin. The system will reboot with
PLL disabled. Always zero when read."

Maybe the register gets overwritten by an interrupt routine that fires only under some circumstances, which would explain why the reboot problem does not occur every time.
If my assumption is right, there is probably an interrupt routine somewhere that does not write this register correctly, or maybe there is a fault in the chip that clears this bit under some circumstances.

I have no experience with kernel programming or low-level hardware programming either, but I will try to compile and test BB with the following modified function to fix this problem:

static void ath79_restart(char *command)
{
        for (;;)
        {
                ath79_device_reset_set(AR71XX_RESET_FULL_CHIP);
                if (cpu_wait)
                        cpu_wait();
        }
}

I will probably find some time tomorrow to verify whether this solution works on the affected routers.

Last edited 3 years ago by michaeluray

comment:53 Changed 3 years ago by _Steffen_

Thanks for the feedback. I'm curious about the result.
There are still two differences between your approach and mine.
In your approach the FULL CHIP RESET bit is still merged with the current contents of the RESET register (e.g. 0x24044008 | 0x01000000 = 0x25044008), and the function "spin_lock_irqsave" is called.

comment:54 Changed 3 years ago by anonymous

Hats off to you guys! We implemented the fix suggested in comment 51 (file common.c) for AA. Reboots now work on both Zentel- and Winbond-equipped routers.

The above was with the stock (default) AA setup. We usually build our system using the Image Generator. I have not been able to figure out how to use the Image Generator with the newly built AA. Any hints?

comment:55 in reply to: ↑ 51 ; follow-ups: Changed 3 years ago by michaeluray

Replying to michaeluray:

I have no experience with kernel programming or low-level hardware programming either, but I will try to compile and test BB with the following modified function to fix this problem:

static void ath79_restart(char *command)
{
        for (;;)
        {
                ath79_device_reset_set(AR71XX_RESET_FULL_CHIP);
                if (cpu_wait)
                        cpu_wait();
        }
}

I will probably find some time tomorrow to verify whether this solution works on the affected routers.

I tested my proposed solution, but it did not work.

After that I tested a solution very similar to the one posted in comment 51, and it looks like it works well.
I did more than 20 reboots with it, and the router never hung.

I used the following code within BB:

build_dir/target-mips_34kc_uClibc-0.9.33.2/linux-ar71xx_generic/linux-3.10.49/arch/mips/ath79/common.c

void ath79_device_reset_set(u32 mask)
{
        unsigned long flags;
        u32 reg;
        u32 t;

        if (soc_is_ar71xx())
                reg = AR71XX_RESET_REG_RESET_MODULE;
        else if (soc_is_ar724x())
                reg = AR724X_RESET_REG_RESET_MODULE;
        else if (soc_is_ar913x())
                reg = AR913X_RESET_REG_RESET_MODULE;
        else if (soc_is_ar933x())
                reg = AR933X_RESET_REG_RESET_MODULE;
        else if (soc_is_ar934x())
                reg = AR934X_RESET_REG_RESET_MODULE;
        else if (soc_is_qca953x())
                reg = QCA953X_RESET_REG_RESET_MODULE;
        else if (soc_is_qca955x())
                reg = QCA955X_RESET_REG_RESET_MODULE;
        else
                panic("Reset register not defined for this SOC");

        spin_lock_irqsave(&ath79_device_reset_lock, flags);
        t = ath79_reset_rr(reg);
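        /* full chip reset: write only the reset bit and spin here, with
         * interrupts still disabled by spin_lock_irqsave() */
        if (mask == AR71XX_RESET_FULL_CHIP)
                for (;;)
                        ath79_reset_wr(reg, mask);
        else
                ath79_reset_wr(reg, t | mask);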
        spin_unlock_irqrestore(&ath79_device_reset_lock, flags);
}

This was the original code from BB:

void ath79_device_reset_set(u32 mask)
{
        unsigned long flags;
        u32 reg;
        u32 t;

        if (soc_is_ar71xx())
                reg = AR71XX_RESET_REG_RESET_MODULE;
        else if (soc_is_ar724x())
                reg = AR724X_RESET_REG_RESET_MODULE;
        else if (soc_is_ar913x())
                reg = AR913X_RESET_REG_RESET_MODULE;
        else if (soc_is_ar933x())
                reg = AR933X_RESET_REG_RESET_MODULE;
        else if (soc_is_ar934x())
                reg = AR934X_RESET_REG_RESET_MODULE;
        else if (soc_is_qca953x())
                reg = QCA953X_RESET_REG_RESET_MODULE;
        else if (soc_is_qca955x())
                reg = QCA955X_RESET_REG_RESET_MODULE;
        else
                panic("Reset register not defined for this SOC");

        spin_lock_irqsave(&ath79_device_reset_lock, flags);
        t = ath79_reset_rr(reg);
        ath79_reset_wr(reg, t | mask);
        spin_unlock_irqrestore(&ath79_device_reset_lock, flags);
}

Does anyone have a clue why this works, but not the original version?
Maybe the CPU has a problem executing a submodule reset and a full chip reset at the same time? And why did it work on the earlier hardware revisions?

Last edited 3 years ago by michaeluray

comment:56 in reply to: ↑ 55 Changed 3 years ago by michaeluray

I have now also tried the following code, but it did not work; the router hung after a reset:

void ath79_device_reset_set(u32 mask)
{
        unsigned long flags;
        u32 reg;
        u32 t;

        if (soc_is_ar71xx())
                reg = AR71XX_RESET_REG_RESET_MODULE;
        else if (soc_is_ar724x())
                reg = AR724X_RESET_REG_RESET_MODULE;
        else if (soc_is_ar913x())
                reg = AR913X_RESET_REG_RESET_MODULE;
        else if (soc_is_ar933x())
                reg = AR933X_RESET_REG_RESET_MODULE;
        else if (soc_is_ar934x())
                reg = AR934X_RESET_REG_RESET_MODULE;
        else if (soc_is_qca953x())
                reg = QCA953X_RESET_REG_RESET_MODULE;
        else if (soc_is_qca955x())
                reg = QCA955X_RESET_REG_RESET_MODULE;
        else
                panic("Reset register not defined for this SOC");

        spin_lock_irqsave(&ath79_device_reset_lock, flags);
        t = ath79_reset_rr(reg);
        if (mask == AR71XX_RESET_FULL_CHIP)
                ath79_reset_wr(reg, mask);
        else
                ath79_reset_wr(reg, t | mask);
        spin_unlock_irqrestore(&ath79_device_reset_lock, flags);
}

It looks as if it is important to disable interrupts, set the reset bit, and wait right there until the CPU reboots.
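The hypothesis above can be made concrete with a user-space toy model. Everything in it is invented for illustration: `struct fake_soc`, `FAKE_RESET_FULL_CHIP` and the fixed reset latency are not taken from the ath79 code, and the model says nothing about why the AR934x actually hangs. It only shows the control-flow difference between the two shapes: the original BB code hands control back to its caller while the full-chip reset may still be pending, whereas the comment 58 variant busy-spins and never returns before the reset fires. Interrupts are not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for AR71XX_RESET_FULL_CHIP; the real value
 * lives in the ath79 headers and is not reproduced here. */
#define FAKE_RESET_FULL_CHIP (UINT32_C(1) << 24)

/* Toy reset controller: a requested full-chip reset only takes effect
 * after `latency` further CPU steps. Purely illustrative. */
struct fake_soc {
	uint32_t reset_reg;
	int latency;
	bool reset_fired;
};

/* Advance the model by one CPU step. */
static void soc_step(struct fake_soc *soc)
{
	bool pending = (soc->reset_reg & FAKE_RESET_FULL_CHIP) != 0;

	if (pending && !soc->reset_fired && --soc->latency <= 0)
		soc->reset_fired = true;
}

/* Shape of the original BB code: read-modify-write, then return.
 * Returns true if control got back to the caller before the reset. */
static bool reset_set_and_return(struct fake_soc *soc, uint32_t mask)
{
	assert(soc != NULL);
	soc->reset_reg |= mask;
	soc_step(soc);                /* the write itself costs one step */
	return !soc->reset_fired;
}

/* Shape of the comment 58 fix: write once, then spin. In the kernel
 * this is `for (;;);`; here the loop ends when the model's reset fires
 * so that the example can run in user space. */
static bool reset_set_and_spin(struct fake_soc *soc, uint32_t mask)
{
	assert(soc != NULL);
	soc->reset_reg |= mask;
	soc_step(soc);
	if (mask == FAKE_RESET_FULL_CHIP)
		while (!soc->reset_fired)
			soc_step(soc);
	return !soc->reset_fired;
}
```

With a latency of 5 steps, `reset_set_and_return()` reports that the caller regained control, while `reset_set_and_spin()` only stops once the reset has fired. Whether the code that runs in that window (whatever `ath79_restart()` does after the call) is what actually wedges the chip is exactly the open question in this ticket; the model does not answer it.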

Last edited 3 years ago by michaeluray

comment:57 Changed 3 years ago by _Steffen_

Thank you for testing. It looks like my proposal in comment 51 was a good assumption.
My test system (TL-WDR3600) is running with the "patch" for 4 days and executed more than 100 reboots without any problems.

comment:58 in reply to: ↑ 55 ; follow-ups: Changed 3 years ago by michaeluray

I did a couple of tests, and in the end I just added an endless loop after the FULL_CHIP_RESET bit is set.

if (mask == AR71XX_RESET_FULL_CHIP)
        for (;;);

The reboot works well with this solution.

File: build_dir/target-mips_34kc_uClibc-0.9.33.2/linux-ar71xx_generic/linux-3.10.49/arch/mips/ath79/common.c

void ath79_device_reset_set(u32 mask)
{
        unsigned long flags;
        u32 reg;
        u32 t;

        if (soc_is_ar71xx())
                reg = AR71XX_RESET_REG_RESET_MODULE;
        else if (soc_is_ar724x())
                reg = AR724X_RESET_REG_RESET_MODULE;
        else if (soc_is_ar913x())
                reg = AR913X_RESET_REG_RESET_MODULE;
        else if (soc_is_ar933x())
                reg = AR933X_RESET_REG_RESET_MODULE;
        else if (soc_is_ar934x())
                reg = AR934X_RESET_REG_RESET_MODULE;
        else if (soc_is_qca953x())
                reg = QCA953X_RESET_REG_RESET_MODULE;
        else if (soc_is_qca955x())
                reg = QCA955X_RESET_REG_RESET_MODULE;
        else
                panic("Reset register not defined for this SOC");

        spin_lock_irqsave(&ath79_device_reset_lock, flags);
        t = ath79_reset_rr(reg);
        ath79_reset_wr(reg, t | mask);
        if (mask == AR71XX_RESET_FULL_CHIP)
                for (;;);
        spin_unlock_irqrestore(&ath79_device_reset_lock, flags);
}

I uploaded the compiled BB version with the reset fix to our web server.
If anyone wants to test it on their device, it can be downloaded from the following links:

http://www.ctb.co.at/download/OpenWRT/14.07-rf/ar71xx/generic/openwrt-ar71xx-generic-tl-wdr3600-v1-squashfs-factory.bin

http://www.ctb.co.at/download/OpenWRT/14.07-rf/ar71xx/generic/openwrt-ar71xx-generic-tl-wdr3600-v1-squashfs-sysupgrade.bin

http://www.ctb.co.at/download/OpenWRT/14.07-rf/ar71xx/generic/openwrt-ar71xx-generic-tl-wdr4300-v1-squashfs-factory.bin

http://www.ctb.co.at/download/OpenWRT/14.07-rf/ar71xx/generic/openwrt-ar71xx-generic-tl-wdr4300-v1-squashfs-sysupgrade.bin

What are the next steps here and what is the right way to get this fix into the OpenWRT system or into the Linux kernel?

Last edited 3 years ago by michaeluray

comment:59 Changed 3 years ago by anonymous

I have tried both fixes (comment 51 and comment 58). Both work: over 1000 successful reboots via cron.

How can we converge on one "right" fix and how do we get it into the official source tree? Are there any maintainers following this thread?

comment:60 follow-up: Changed 3 years ago by jow

I strongly suggest sending comment 58 as an RFC patch to the OpenWrt development list.

comment:61 in reply to: ↑ 58 Changed 3 years ago by anonymous

Replying to michaeluray:

I can confirm that the patch by @michaeluray in comment:58 works on a 4300 v1.7.

I'm using my own build with this patch.

@michaeluray thanks for the patch

comment:62 Changed 3 years ago by markit

FYI, I had similar reboot issues with various Ubiquiti devices (mostly UniFi AP and UniFi Outdoor).

Doing the reset within the loop, as in comment:55, cures the issues on them too.

comment:63 in reply to: ↑ 51 Changed 3 years ago by markit

Replying to anonymous:

I think this change may not be the final solution, because this code is used by all Atheros devices and I have no experience in kernel programming.

Btw, I tested about 50 different Ubiquiti devices (NanoStations, Locos, Rockets, UniFi AP and UniFi Outdoor, in fact all of them AR7242-based) for multiple days in an automated test setup.

IMHO all of these models are affected by the same reboot issue; the difference is just how often. Some models hang on average on every 1000th reboot, others on every 5th.

So the bottom line is that this issue is definitely not limited to the WDR3400/4300 or to AR9344-based hardware! Maybe nearly all Atheros-based routers are affected, but many of them so rarely that normally no one would notice.
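These failure rates also show why short reboot tests can mislead. Assuming each reboot hangs independently with probability p (the independence assumption is ours, not the thread's), the chance of n clean reboots in a row is (1 - p)^n. The helper below, `survival_probability()` (a name chosen for this sketch), just evaluates that formula by repeated multiplication.

```c
#include <assert.h>

/* Probability that a device whose per-reboot hang probability is `p`
 * survives `n` consecutive reboots, assuming independent reboots:
 * (1 - p)^n, computed by repeated multiplication (no libm needed). */
static double survival_probability(double p, unsigned n)
{
	assert(p >= 0.0 && p <= 1.0);

	double s = 1.0;
	for (unsigned i = 0; i < n; i++)
		s *= 1.0 - p;
	return s;
}
```

For example, a unit that hangs on every 5th reboot survives 10 reboots only about 11% of the time (0.8^10), while a unit that hangs on every 1000th reboot passes a 100-reboot test about 90% of the time (0.999^100). So 100 clean reboots say little about rare hangs, and even 1000 clean reboots are still consistent, with roughly 5% probability (0.997^1000), with a hang rate of 3 in 1000.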

comment:64 in reply to: ↑ 60 Changed 3 years ago by michaeluray

Replying to jow:

I strongly suggest sending comment 58 as an RFC patch to the OpenWrt development list.

I put the proposal from comment 58 into a patch and sent it to the
OpenWRT development mailing list.

Please see the following link:
https://lists.openwrt.org/pipermail/openwrt-devel/2014-October/028783.html

comment:65 Changed 3 years ago by nbd

please try this patch as well: http://nbd.name/restart-fix.patch

comment:66 Changed 3 years ago by markit

I tested both patches on 10 unifi APs, which previously failed on every 5th to 10th reboot.

Both patches seem to work fine so far, but I'll keep them rebooting continuously for the next few hours.

comment:67 follow-up: Changed 3 years ago by nbd

  • Resolution set to fixed
  • Status changed from new to closed

fix committed in r42955

comment:68 in reply to: ↑ 58 Changed 3 years ago by anonymous

Replying to michaeluray:

I uploaded the compiled BB version with the reset fix to our web server.
If anyone wants to test it on their device, it can be downloaded from the following links:

Tried using opkg? You broke it with the unofficial version name.

comment:69 Changed 3 years ago by lc

I cannot confirm nbd's fix. The reboot issue even seems to get worse: with this fix, the WDR3600 hangs every single time I issue the reboot command.

Who else has given it a try? What are your results?

comment:70 in reply to: ↑ 67 Changed 3 years ago by michaeluray

Replying to nbd:

fix committed in r42955

I tried applying your patch to BB, and it did not work there.
I also tried the following code modifications, and only the "common.c - for loop" modification works for me.

"common.c - for loop" - works

void ath79_device_reset_set(u32 mask)
{
        unsigned long flags;
        u32 reg;
        u32 t;

        if (soc_is_ar71xx())
                reg = AR71XX_RESET_REG_RESET_MODULE;
        else if (soc_is_ar724x())
                reg = AR724X_RESET_REG_RESET_MODULE;
        else if (soc_is_ar913x())
                reg = AR913X_RESET_REG_RESET_MODULE;
        else if (soc_is_ar933x())
                reg = AR933X_RESET_REG_RESET_MODULE;
        else if (soc_is_ar934x())
                reg = AR934X_RESET_REG_RESET_MODULE;
        else if (soc_is_qca953x())
                reg = QCA953X_RESET_REG_RESET_MODULE;
        else if (soc_is_qca955x())
                reg = QCA955X_RESET_REG_RESET_MODULE;
        else
                panic("Reset register not defined for this SOC");

        spin_lock_irqsave(&ath79_device_reset_lock, flags);
        t = ath79_reset_rr(reg);
        ath79_reset_wr(reg, t | mask);
        if (mask == AR71XX_RESET_FULL_CHIP)
                for(;;);
        spin_unlock_irqrestore(&ath79_device_reset_lock, flags);
}

"common.c - irq disabled, unreachable macro" - router hangs

void ath79_device_reset_set(u32 mask)
{
        unsigned long flags;
        u32 reg;
        u32 t;

        if (soc_is_ar71xx())
                reg = AR71XX_RESET_REG_RESET_MODULE;
        else if (soc_is_ar724x())
                reg = AR724X_RESET_REG_RESET_MODULE;
        else if (soc_is_ar913x())
                reg = AR913X_RESET_REG_RESET_MODULE;
        else if (soc_is_ar933x())
                reg = AR933X_RESET_REG_RESET_MODULE;
        else if (soc_is_ar934x())
                reg = AR934X_RESET_REG_RESET_MODULE;
        else if (soc_is_qca953x())
                reg = QCA953X_RESET_REG_RESET_MODULE;
        else if (soc_is_qca955x())
                reg = QCA955X_RESET_REG_RESET_MODULE;
        else
                panic("Reset register not defined for this SOC");

        spin_lock_irqsave(&ath79_device_reset_lock, flags);
        t = ath79_reset_rr(reg);
        if (mask == AR71XX_RESET_FULL_CHIP)
                local_irq_disable();
        ath79_reset_wr(reg, t | mask);
        if (mask == AR71XX_RESET_FULL_CHIP)
                unreachable();
        spin_unlock_irqrestore(&ath79_device_reset_lock, flags);
}

"common.c - unreachable macro" - router hangs

void ath79_device_reset_set(u32 mask)
{
        unsigned long flags;
        u32 reg;
        u32 t;

        if (soc_is_ar71xx())
                reg = AR71XX_RESET_REG_RESET_MODULE;
        else if (soc_is_ar724x())
                reg = AR724X_RESET_REG_RESET_MODULE;
        else if (soc_is_ar913x())
                reg = AR913X_RESET_REG_RESET_MODULE;
        else if (soc_is_ar933x())
                reg = AR933X_RESET_REG_RESET_MODULE;
        else if (soc_is_ar934x())
                reg = AR934X_RESET_REG_RESET_MODULE;
        else if (soc_is_qca953x())
                reg = QCA953X_RESET_REG_RESET_MODULE;
        else if (soc_is_qca955x())
                reg = QCA955X_RESET_REG_RESET_MODULE;
        else
                panic("Reset register not defined for this SOC");

        spin_lock_irqsave(&ath79_device_reset_lock, flags);
        t = ath79_reset_rr(reg);
        ath79_reset_wr(reg, t | mask);
        if (mask == AR71XX_RESET_FULL_CHIP)
                unreachable();
        spin_unlock_irqrestore(&ath79_device_reset_lock, flags);
}

"setup.c - irq disabled, for loop" - router hangs

static void ath79_restart(char *command)
{
        local_irq_disable();
        ath79_device_reset_set(AR71XX_RESET_FULL_CHIP);
        for (;;)
                if (cpu_wait)
                        cpu_wait();
}

comment:71 follow-up: Changed 3 years ago by michaeluray

  • Resolution fixed deleted
  • Status changed from closed to reopened

Since the patch from r42955 did not work, I have again asked for my patch to be added to the OpenWRT source:
https://lists.openwrt.org/pipermail/openwrt-devel/2014-October/028838.html

comment:72 in reply to: ↑ 71 ; follow-up: Changed 3 years ago by anonymous

Replying to michaeluray:

Since the patch from r42955 did not work, I have again asked for my patch to be added to the OpenWRT source:
https://lists.openwrt.org/pipermail/openwrt-devel/2014-October/028838.html

You should check the router's RAM.

opkg update
opkg install memtester
memtester 90m 1

comment:73 in reply to: ↑ 72 ; follow-up: Changed 3 years ago by michaeluray

Replying to anonymous:

You should check the router's RAM.

opkg update
opkg install memtester
memtester 90m 1

I checked the router's RAM as you recommended, and everything is fine with it.

root@OpenWrt:~# memtester 90m 1
memtester version 4.1.3 (32-bit)
Copyright (C) 2010 Charles Cazabon.
Licensed under the GNU General Public License version 2 (only).

pagesize is 4096
pagesizemask is 0xfffff000
want 90MB (94371840 bytes)
got  90MB (94371840 bytes), trying mlock ...locked.
Loop 1/1:
  Stuck Address       : ok
  Random Value        : ok
  Compare XOR         : ok
  Compare SUB         : ok
  Compare MUL         : ok
  Compare DIV         : ok
  Compare OR          : ok
  Compare AND         : ok
  Sequential Increment: ok
  Solid Bits          : ok
  Block Sequential    : ok
  Checkerboard        : ok
  Bit Spread          : ok
  Bit Flip            : ok
  Walking Ones        : ok
  Walking Zeroes      : ok

Done.

At the moment I have 5 routers (TL-WDR4300, HW version V1.7) where the reset problem occurs.
I was also contacted by "lc" two weeks ago; he confirmed that this fix (r42955) does not work on his routers either, and he has a bunch of them.
As far as I can see, it happens on all of the new TL-WDR4300 units and probably on other devices too.
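For readers unfamiliar with the pass names in the memtester output above, the idea behind the "Walking Ones" pass can be sketched as follows. This is a simplified illustration run on an ordinary buffer rather than locked physical RAM, and it is not memtester's actual implementation; the function name `walking_ones()` is our own.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified "walking ones" pass: for each bit position, fill the buffer
 * with a word that has only that bit set, read everything back and count
 * mismatches. `volatile` keeps the compiler from eliding the read-back. */
static size_t walking_ones(volatile uint32_t *buf, size_t words)
{
	assert(buf != NULL || words == 0);

	size_t errors = 0;
	for (unsigned bit = 0; bit < 32; bit++) {
		uint32_t pattern = UINT32_C(1) << bit;

		for (size_t i = 0; i < words; i++)
			buf[i] = pattern;
		for (size_t i = 0; i < words; i++)
			if (buf[i] != pattern)
				errors++;
	}
	return errors;
}
```

On healthy memory the function returns 0; a stuck or shorted data line would show up as mismatches for the bit positions it affects.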

comment:74 in reply to: ↑ 73 ; follow-up: Changed 3 years ago by anonymous

Replying to michaeluray:

Replying to anonymous:
At the moment I have 5 routers (TL-WDR4300, HW version V1.7) where the reset problem occurs.

Hmm... Can you test other firmware? Please try AA 12.09 with some patches: http://drive.google.com/file/d/0B_-qnrQh__6KaUNHUzBmWmhCS0E/edit

I found this link while reading an extensive Russian forum thread about OpenWrt on the TL-WDR4300.

comment:75 in reply to: ↑ 74 Changed 3 years ago by michaeluray

Replying to anonymous:

Hmm... Can you test other firmware? Please try AA 12.09 with some patches: http://drive.google.com/file/d/0B_-qnrQh__6KaUNHUzBmWmhCS0E/edit

I tested this version, but the router hangs with it too.

The only solution which works for me is from comment:58 which is based on comment:51.

For me, the final solution is to use my self-compiled BB with the patch from comment:58, but it would be good to include it in the next OpenWRT version as well.

Replying to michaeluray:

Since the patch from r42955 did not work, I have again asked for my patch to be added to the OpenWRT source:
https://lists.openwrt.org/pipermail/openwrt-devel/2014-October/028838.html

I did not get any response to my last request on the mailing list to add this patch to the official source, so I don't know how to proceed.

Last edited 3 years ago by michaeluray

comment:76 Changed 3 years ago by anonymous

Any progress?

comment:77 Changed 3 years ago by aversario

I can confirm that a TL-WDR3600 (HW 1.5) with CHAOS CALMER (Bleeding Edge, r43375) still hangs on reboot, even with u-boot_mod by pepe2k: https://github.com/pepe2k/u-boot_mod/releases/tag/2014-11-19

comment:78 follow-up: Changed 3 years ago by michaeluray

I compiled my own Barrier Breaker build with the mentioned patch, and it works fine in a production environment.

You can try the compiled versions on your devices if you want:

http://www.ctb.co.at/download/OpenWRT/14.07-rf/ar71xx/generic/openwrt-ar71xx-generic-tl-wdr3600-v1-squashfs-factory.bin
http://www.ctb.co.at/download/OpenWRT/14.07-rf/ar71xx/generic/openwrt-ar71xx-generic-tl-wdr3600-v1-squashfs-sysupgrade.bin
http://www.ctb.co.at/download/OpenWRT/14.07-rf/ar71xx/generic/openwrt-ar71xx-generic-tl-wdr4300-v1-squashfs-factory.bin
http://www.ctb.co.at/download/OpenWRT/14.07-rf/ar71xx/generic/openwrt-ar71xx-generic-tl-wdr4300-v1-squashfs-sysupgrade.bin

or you can build your own by using my patch:

svn checkout -rr42625 svn://svn.openwrt.org/openwrt/branches/barrier_breaker/
cd barrier_breaker
scripts/feeds update -a && scripts/feeds install -a
wget http://www.ctb.co.at/download/OpenWRT/14.07-rf/ar71xx/generic/.config
wget http://www.ctb.co.at/download/OpenWRT/14.07-rf/ar71xx/generic/903-MIPS-ath79-fix-restart.patch -P target/linux/ar71xx/patches-3.10
make menuconfig
make

The patch may also work on CC, but I have never tried it.

comment:79 Changed 3 years ago by anonymous

I can also confirm that current trunk (r43322) hangs after reboot on my WDR4300 v1.7 router.
So far, the only stable fix I have tested is the one below, from comment 58:

http://www.ctb.co.at/download/OpenWRT/14.07-rf/ar71xx/generic/openwrt-ar71xx-generic-tl-wdr4300-v1-squashfs-sysupgrade.bin

I am going to test it again; I had already tested the above firmware for a while.

comment:80 in reply to: ↑ 78 ; follow-up: Changed 3 years ago by aversario

Replying to michaeluray:

or you can build your own by using my patch:

tnx, there's no doubt your patch works on BB r.42625

The patch may also work on CC, but I have never tried it.

I'm building CC r43375, which should include this patch, but strangely it doesn't work: https://dev.openwrt.org/browser/trunk/target/linux/ar71xx/patches-3.14/728-MIPS-ath79-fix-restart.patch?rev=42955

Last edited 3 years ago by aversario

comment:81 in reply to: ↑ 80 ; follow-up: Changed 3 years ago by michaeluray

Replying to aversario:

The patch may also work on CC, but I have never tried it.

I'm building CC r43375, which should include this patch, but strangely it doesn't work: https://dev.openwrt.org/browser/trunk/target/linux/ar71xx/patches-3.14/728-MIPS-ath79-fix-restart.patch?rev=42955

This is not the patch I proposed, and it does not work on my devices either.
For this reason I reopened this ticket some weeks ago; please see comment:70 and comment:71.

Replying to michaeluray:

Since the patch from r42955 did not work, I have again asked for my patch to be added to the OpenWRT source:
https://lists.openwrt.org/pipermail/openwrt-devel/2014-October/028838.html

I did not get any response to my last request on the mailing list to add this patch to the official source, so I don't know how to proceed.

I think I will try once more to get the patch into the official sources via the mailing list.

comment:82 in reply to: ↑ 81 Changed 3 years ago by anonymous

Replying to michaeluray:

I think I will try once more to get the patch into the official sources via the mailing list.

Maybe it makes sense to replace the endless loop with a delay?

--- a/arch/mips/ath79/common.c
+++ b/arch/mips/ath79/common.c
@@ -14,6 +14,7 @@
 
 #include <linux/kernel.h>
 #include <linux/module.h>
+#include <linux/delay.h>
 #include <linux/types.h>
 #include <linux/spinlock.h>
 
@@ -80,6 +81,9 @@ void ath79_device_reset_set(u32 mask)
 	spin_lock_irqsave(&ath79_device_reset_lock, flags);
 	t = ath79_reset_rr(reg);
 	ath79_reset_wr(reg, t | mask);
+	if (mask == AR71XX_RESET_FULL_CHIP)
+		udelay(100);
+
 	spin_unlock_irqrestore(&ath79_device_reset_lock, flags);
 }
 EXPORT_SYMBOL_GPL(ath79_device_reset_set);
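The practical difference between this proposal and the endless loop is what happens if the reset has not taken effect when the delay runs out: `udelay(100)` eventually returns to the caller, while `for (;;);` never does. A user-space sketch of a bounded wait makes that explicit. The `struct pending_reset` countdown is a hypothetical stand-in for the hardware and the poll budget stands in for the 100 µs; neither reflects real AR934x timing.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for hardware: the full-chip reset fires after
 * `ticks_until_reset` polls. Not a model of any real SoC timing. */
struct pending_reset {
	int ticks_until_reset;
};

static bool reset_has_fired(struct pending_reset *r)
{
	return --r->ticks_until_reset <= 0;
}

/* Bounded wait, analogous to udelay(): poll at most `budget` times.
 * Returns true if the reset fired within the budget; false means the
 * function would return to its caller with the reset still pending. */
static bool wait_for_reset_bounded(struct pending_reset *r, int budget)
{
	assert(r != 0 && budget >= 0);
	for (int i = 0; i < budget; i++)
		if (reset_has_fired(r))
			return true;
	return false;
}
```

If the reset fires within the budget, the bounded wait behaves like the endless loop; if it does not, execution falls back into the same path as the original code. This ticket does not show whether 100 µs is long enough on the affected boards, and the endless loop sidesteps that question.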

comment:83 follow-ups: Changed 3 years ago by nbd

please try this patch: http://nbd.name/reboot-fix.patch

comment:84 in reply to: ↑ 83 Changed 3 years ago by michaeluray

Replying to anonymous:

Replying to michaeluray:

I think I will try once more to get the patch into the official sources via the mailing list.

Maybe it makes sense to replace the endless loop with a delay?

Replying to nbd:

please try this patch: http://nbd.name/reboot-fix.patch

As I have mentioned a couple of times here, there is a patch that works.
So what is wrong with it, and why should I try your proposals?
I sent this patch to the development mailing list again, and again I got no response:
https://lists.openwrt.org/pipermail/openwrt-devel/2014-December/029662.html
I do not want to spend more time on this issue without a specific reason, especially since it is fixed by the mentioned patch.

comment:85 follow-up: Changed 3 years ago by nbd

What's wrong with that patch is that it is unclear why it works. I'm trying other approaches in order to find out more about the problem.

comment:86 in reply to: ↑ 85 Changed 3 years ago by Deny

We've tried michaeluray's patch and it clearly solves the problem we had (thanks for that by the way).
Will try nbd's patch and see what happens.

comment:87 follow-up: Changed 3 years ago by anonymous

Does anyone know what type of RAM is used in the TL-WDR3600/4300 starting with serial # 2141?

comment:88 in reply to: ↑ 87 Changed 3 years ago by aversario

Replying to anonymous:

Does anyone know what type of RAM is used in the TL-WDR3600/4300 starting with serial # 2141?

I have a TL-WDR3600 with serial number 2143, and it has 2 Zentel A3R12E40CBF-8E chips.
(total of 1GB? -> http://61.222.70.43/upload/product/datasheet_3_2013-09-17_13-12-07.2_Zentel-20130911)

comment:89 in reply to: ↑ 83 Changed 3 years ago by markit

Replying to nbd:

please try this patch: http://nbd.name/reboot-fix.patch

Tested it on a UniFi AP; it works there.

Whether it works on TP-Link I can't tell, as I have no affected TP-Link devices.

comment:90 Changed 3 years ago by jernej@…

CCing.

comment:91 Changed 3 years ago by nbd

  • Resolution set to fixed
  • Status changed from reopened to closed

fix committed in r43777, please reopen if the issue shows up again on other devices.

comment:92 Changed 3 years ago by jernej@…

Can this be backported to BB?

comment:93 Changed 3 years ago by nbd

Yes, once I get more test feedback on this.

comment:95 Changed 3 years ago by nbd

No, that's a different patch.

comment:96 Changed 3 years ago by mwarning

@nbd: A friend tested rebooting ten times on a WDR3600 v1.1 using r43778. It works fine. Can we expect this patch in BB soon? :)

comment:97 Changed 3 years ago by Deny

r43777 tested on 3600 and works fine after more than 100 reboots.

comment:98 Changed 3 years ago by nbd

backported in r44065, thanks for testing

comment:99 Changed 3 years ago by anonymous

Hi! A newbie here.... How can I install this on 14.07? I've just transitioned from DD-WRT to OpenWrt on 7 WDR3600s, but this bug is killing me :(

comment:101 Changed 3 years ago by mwarning

I think you need to either build a fresh 14.07 image yourself or use the developer builds:
http://downloads.openwrt.org/snapshots/trunk/
The maintainers may need to rebuild the release images.

comment:102 Changed 3 years ago by anonymous

Thanks a lot and sorry for the double post, my internet connection is not amazing right now.

The trunk images are "nightlies", not stable images, right? Has this patch already been applied?

Thanks again for the help!

comment:103 Changed 3 years ago by mwarning

yes and yes.

comment:104 Changed 3 years ago by anonymous

Does anyone have an image with the patch applied? Unfortunately I cannot compile it myself. I also don't want to install a trunk image, go through the elaborate process of installing all the missing packages, and probably deal with more instability than this bug causes. It would be highly appreciated :)

comment:105 Changed 3 years ago by anonymous

@nbd, I could be mistaken, but when I apply your patch or clone the newest BB, this seems to break extroot. Is anyone else experiencing this?

comment:107 Changed 2 years ago by 1cc13944@…

Yes, I set up a pivot overlay and now it hangs :(

comment:108 Changed 2 years ago by anonymous

Thanks, mwarning, for answering a different newbie, and big thanks to all contributors.
The /snapshots/trunk/ directory has an -il- version of the squashfs image and one without -il-.
My TP-Link N750/WDR4300 V1.7 -> RaspberryPi -> Arduino project will someday be 3 km away from my location, so the rebooting problem is a challenge I need to solve.
Kindly advise on the difference, or on the meaning of -il-.

Regards from the Kingdom of Bahrain,
Gero

comment:109 Changed 2 years ago by anonymous

I switched to chaos calmer and it's working fine fyi.

comment:110 Changed 2 years ago by leonardo.canducci@…

I'm a new OpenWrt user, so I apologize if I don't grok the OpenWrt workflow yet.
How do I get an image with the r44065 fix? Should I compile an entire image myself? What's the point of backporting if no new stable bugfix releases are built?

comment:111 Changed 18 months ago by MajkWood

The ZBT-WE826 is affected by the reboot problem too, on the latest fresh trunk. It freezes on every reboot. The patch is not working for me.

Maybe it is a different problem, and a new ticket should be created for it.

root@OpenWrt:/# echo s > /proc/sysrq-trigger
[ 21.953623] sysrq: SysRq : Emergency Sync
[ 21.957933] Emergency Sync complete
root@OpenWrt:/# echo u > /proc/sysrq-trigger
[ 22.532805] sysrq: SysRq : Emergency Remount R/O
[ 22.539529] Emergency Remount complete
root@OpenWrt:/# echo b > /proc/sysrq-trigger
[ 22.983407] sysrq: SysRq : Resetting

and then it freezes.

[edit]: The reboot command freezes too.

Last edited 18 months ago by MajkWood
