Opened 5 years ago

Closed 4 years ago

Last modified 22 months ago

#8753 closed defect (obsolete)

Low performance on board Livebox

Reported by: danitool <dgcbueu@…> Owned by: florian
Priority: highest Milestone: Barrier Breaker 14.07
Component: kernel Version: Trunk
Keywords: livebox performance oprofile Cc:

Description

OpenWrt performs very poorly on the Livebox (DV4210) board, bcm6348. Every active process raises CPU usage to 100%, which results in very bad performance.

I took wget as a reference, downloading a big archive over Ethernet to /dev/null. The original Inventel firmware (kernel 2.4) behaves well enough, with a transfer rate of 2.4 M/s, whereas with OpenWrt I only get 700 k/s while consuming 100% of the CPU, which slows down the entire system. Both transfers are done under the same conditions and without routing. This is not only a wget problem: LuCI and other apps are simply unusable too.

I ran an oprofile test after running wget on OpenWrt, though I don't have the skills to spot faults in the results.

I have been digging for a long time to find where the problem is, with no results. I tried some things, like disabling the "wait instruction" in the sources, without success. I also looked at the Inventel kernel sources and compared them with OpenWrt.
Perhaps other boards are affected, like the Homehub 1, which is almost identical to the Livebox.

Could you give some clues or modifications to try on this board?

Any ideas are welcome.

Attachments (2)

oprofile00.txt (29.9 KB) - added by danitool <dgcbueu@…> 5 years ago.
c-r4k.diff (472 bytes) - added by danitool <dgcbueu@…> 5 years ago.

Change History (20)

Changed 5 years ago by danitool <dgcbueu@…>

comment:1 Changed 5 years ago by danitool <dgcbueu@…>

I don't know if this matters, but these are some differences in the early boot log between Inventel's firmware and OpenWrt.

With the Inventel/Thomson firmware, kernel 2.6.12:

# cat /proc/kmsg
Linux version 2.6.12.6 (luyckxo@cplx208.edegem.eu.thmulti.com) (gcc version 3.4.2) #1 Thu Jul 19 11:20:13 CEST 2007
C0 config : 2147516544x 
CPU revision is: 00029107
mpi: No Card is in the PCMCIA slot
Determined physical RAM map:
 memory: 00fa0000 @ 00000000 (usable)
 memory: 00060000 @ 00fa0000 (reserved)
On node 0 totalpages: 4000
  DMA zone: 4000 pages, LIFO batch:1
  Normal zone: 0 pages, LIFO batch:1
  HighMem zone: 0 pages, LIFO batch:1
Built 1 zonelists
Kernel command line: boot_loader=RedBoot root=1F01
brcm mips: enabling icache and dcache...
Primary instruction cache 16kB, physically tagged, 2-way, linesize 16 bytes.
Primary data cache 8kB, 2-way, linesize 16 bytes.
Synthesized TLB refill handler (19 instructions).
Synthesized TLB load handler fastpath (31 instructions).
Synthesized TLB store handler fastpath (31 instructions).
Synthesized TLB modify handler fastpath (30 instructions).
PID hash table entries: 64 (order: 6, 1024 bytes)
Using 128.000 MHz high precision timer.
Dentry cache hash table entries: 4096 (order: 2, 16384 bytes)
Inode-cache hash table entries: 2048 (order: 1, 8192 bytes)
Memory: 13148k/16000k available (1829k kernel code, 2832k reserved, 238k data, 336k init, 0k highmem)
Calibrating delay loop... (HZ=200) 255.59 BogoMIPS (lpj=638976)
Mount-cache hash table entries: 512
Checking for 'wait' instruction...  unavailable.
NET: Registered protocol family 16
...................

Whereas in Openwrt:

root@OpenWrt:/# dmesg 
Linux version 2.6.35.9 (dani@tool) (gcc version 4.3.3 (GCC) ) #3 Thu Feb 3 19:02:59 CET 2011
Detected Broadcom 0x6348 CPU revision b0
CPU frequency is 256 MHz
16MB of RAM installed
registering 37 GPIOs
bootconsole [early0] enabled
CPU revision is: 00029107 (Broadcom BCM6348)
board_livebox: board name: Livebox
Determined physical RAM map:
memory: 01000000 @ 00000000 (usable)
Initrd not found or empty - disabling initrd
Zone PFN ranges:
   Normal   0x00000000 -> 0x00001000
Movable zone start PFN for each node
early_node_map[1] active PFN ranges
0: 0x00000000 -> 0x00001000
On node 0 totalpages: 4096
free_area_init_node: node 0, pgdat 802821a0, node_mem_map 802c9000
Normal zone: 32 pages used for memmap
Normal zone: 0 pages reserved
Normal zone: 4064 pages, LIFO batch:0
Built 1 zonelists in Zone order, mobility grouping off.  Total pages: 4064
Kernel command line:  root=/dev/mtdblock1 rootfstype=cramfs,squashfs noinitrd console=ttyS0,115200
PID hash table entries: 64 (order: -4, 256 bytes)
Dentry cache hash table entries: 2048 (order: 1, 8192 bytes)
Inode-cache hash table entries: 1024 (order: 0, 4096 bytes)
Primary instruction cache 16kB, VIPT, 2-way, linesize 16 bytes.
Primary data cache 8kB, 2-way, VIPT, no aliases, linesize 16 bytes
Memory: 13372k/16384k available (2134k kernel code, 3012k reserved, 371k data, 148k init, 0k highmem)
Hierarchical RCU implementation.
 RCU-based detection of stalled CPUs is disabled.
 Verbose stalled-CPUs detection is disabled.
NR_IRQS:128
Calibrating delay loop... 253.63 BogoMIPS (lpj=989184)
pid_max: default: 32768 minimum: 301
Mount-cache hash table entries: 512
NET: Registered protocol family 16

"Synthesized TLB refill handler" is not logged, or not done, in OpenWrt.
Also, the Inventel firmware reports:

Dentry cache hash table entries: 4096 (order: 2, 16384 bytes)
Inode-cache hash table entries: 2048 (order: 1, 8192 bytes)

in contrast to OpenWrt:

Dentry cache hash table entries: 2048 (order: 1, 8192 bytes)
Inode-cache hash table entries: 1024 (order: 0, 4096 bytes)

Another minor difference: "On node 0 totalpages: 4000" in the Inventel firmware, whereas in OpenWrt this value is 4096.
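The totalpages difference can be reproduced from the two RAM maps above. This is only a sketch of the arithmetic, assuming 4 KiB pages (which matches both logs); the helper name bytes_to_pages is made up for illustration.

```c
#include <stdint.h>

/* Assumption: 4 KiB pages, consistent with both boot logs. */
#define PAGE_SIZE_BYTES 4096u

/* Hypothetical helper: number of whole pages in a memory region. */
static inline uint32_t bytes_to_pages(uint32_t bytes)
{
	return bytes / PAGE_SIZE_BYTES;
}
```

bytes_to_pages(0x00fa0000) gives 4000 (the Inventel "usable" region), bytes_to_pages(0x00060000) gives 96 (the region it marks as reserved), and bytes_to_pages(0x01000000) gives 4096 (OpenWrt). So the gap is just the 384 KiB that the Inventel kernel reserves at the top of RAM, which is unlikely to explain the performance problem.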

comment:2 Changed 5 years ago by KanjiMonster

I don't see these in my boot logs either, so I don't think that they have anything to do with it.

What I find rather curious is:

brcm mips: enabling icache and dcache...

This could mean that the i/d-cache is turned off when Linux boots, and I don't know whether Linux initializes the caches on boot.
If this is the case, then it would definitely be a reason for low performance.

Can you check whether the part enabling the i/d-caches is included in the GPL sources for the Livebox? (It should probably be in arch/mips/brcm-boards/.)

comment:3 Changed 5 years ago by danitool <dgcbueu@…>

I found this code snippet in the incomplete Inventel-Thomson source code, located in c-r4k.c:

#if defined(CONFIG_MIPS_BRCM)
		printk("brcm mips: enabling icache and dcache...\n");
		/* Enable caches */
		write_c0_diag(read_c0_diag() | 0xC0000000);
#endif

This is not present in the official 2.6.12.6 kernel, so it was added by the Inventel turkeys.
The write_c0_diag macro is defined in mipsregs.h, which is already present in both the old unmodified kernel and the OpenWrt Linux kernel. It is located under the comment /* RM9000 PerfControl performance counter control register */.

Btw, there is no #include <asm/mipsregs.h> in the c-r4k.c file, nor any other include that refers to this function. Something might be missing.

comment:4 Changed 5 years ago by florian

  • Owner changed from developers to florian
  • Status changed from new to accepted

This makes perfect sense. The reason I could not see this on other boards is that the Inventel bootloader disables the I-cache and D-cache, while the other platforms running CFE do not.

Feel free to add a patch doing this in c-r4k.c when the LIVEBOX board is enabled.

Changed 5 years ago by danitool <dgcbueu@…>

comment:5 Changed 5 years ago by danitool <dgcbueu@…>

I made the patch, using the code as is. It works quite well: the board's performance improved dramatically. The time to boot OpenWrt dropped to 1/3, and wget now downloads at 1.8 M/s, pretty close to the expected rate.

comment:6 Changed 5 years ago by danitool <dgcbueu@…>

Would it be better to put this code in the board_livebox.c file?
This works for me as well as the patched c-r4k.c does:

/*
 * all boards
 */
static const struct board_info __initdata *bcm963xx_boards[] = {
#ifdef CONFIG_BCM63XX_CPU_6348
	&board_livebox
#endif
};

+static void __cpuinit enable_idcache(void)
+{
+#ifdef CONFIG_BCM63XX_CPU_6348
+	printk("board_livebox: enabling icache and dcache...\n");
+	/* Enable caches */
+	write_c0_diag(read_c0_diag() | 0xC0000000);
+#endif
+}

/*
 * early init callback
 */
void __init board_prom_init(void)
{
	u32 val;

	/* read base address of boot chip select (0) */
	val = bcm_mpi_readl(MPI_CSBASE_REG(0));
	val &= MPI_CSBASE_BASE_MASK;

	/* assume board is a Livebox */
	memcpy(&board, bcm963xx_boards[0], sizeof(board));
+	enable_idcache();
	/* setup pin multiplexing depending on board enabled device,
	 * this has to be done this early since PCI init is done
	 * inside arch_initcall */

comment:7 Changed 5 years ago by danitool <dgcbueu@…>

Looking at the sources of other bcm63xx boards that use the CFE bootloader, I noticed this code is also present. The usr9108 sources have this code in c-r4k.c:

#if defined(CONFIG_MIPS_BRCM)
	if (c->cputype == CPU_BCM6338 || c->cputype == CPU_BCM6345 || c->cputype == CPU_BCM6348) {
		printk("brcm mips: enabling icache and dcache...\n");
		/* Enable caches */
		write_c0_diag(read_c0_diag() | 0xC0000000);
	}
#endif

and the Comtrend 5361T has code identical to the Inventel board's:

#if defined(CONFIG_MIPS_BRCM)
		printk("brcm mips: enabling icache and dcache...\n");
		/* Enable caches */
		write_c0_diag(read_c0_diag() | 0xC0000000);
#endif

I own the last model of this router, and its early log also shows the message:

brcm mips: enabling icache and dcache...

Thus other boards may be affected by this issue too, not only the RedBoot ones.

comment:8 Changed 5 years ago by florian

As far as I can tell, only the RedBoot based boards are affected, this was fixed in CFE a while ago for other boards.

comment:9 Changed 5 years ago by danitool <dgcbueu@…>

I tried to use different code to enable caches.

As a result of my tests, I can say the code:

#if defined(CONFIG_MIPS_BRCM)
		printk("brcm mips: enabling icache and dcache...\n");
		/* Enable caches */
		write_c0_diag(read_c0_diag() | 0xC0000000);
#endif

and the code present in trunk/target/linux/brcm47xx/patches-2.6.37/150-cpu_fixes.patch

	case CPU_BMIPS3300:
		{
			u32 cm;
			cm = read_c0_diag();
			/* Enable icache */
			cm |= (1 << 31);
			/* Enable dcache */
			cm |= (1 << 30);
			write_c0_diag(cm);
		}

Both enable the caches on the Livebox board, and the effect seems to be exactly the same. So I don't know whether any other improvements can be taken from the brcm47xx code.
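That the two variants behave the same is expected, since 0xC0000000 is exactly bit 31 plus bit 30. A small userspace sketch of the bit arithmetic (not kernel code; the function and macro names are hypothetical, and the register value is passed in as a plain integer):

```c
#include <stdint.h>

/* Bit positions as used in the brcm47xx patch quoted above. */
#define DIAG_ICACHE_EN (UINT32_C(1) << 31)
#define DIAG_DCACHE_EN (UINT32_C(1) << 30)

/* Variant 1: a single OR with the literal mask, as in the Inventel code. */
static inline uint32_t enable_caches_mask(uint32_t diag)
{
	return diag | UINT32_C(0xC0000000);
}

/* Variant 2: two separate ORs, as in the brcm47xx patch. */
static inline uint32_t enable_caches_bits(uint32_t diag)
{
	diag |= DIAG_ICACHE_EN;
	diag |= DIAG_DCACHE_EN;
	return diag;
}
```

For any starting value, both functions return the same result, so any remaining performance gap has to come from settings other than these two bits.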

comment:10 Changed 5 years ago by danitool <dgcbueu@…>

I'm still struggling with the performance. Other boards with exactly the same CPU (bcm6348, but with CFE) perform much better. The Livebox is very slow when running LuCI, transferring files, etc.

So I presume enabling the dcache and icache wasn't enough.

I suspect RedBoot doesn't enable the readahead cache either.

I want to check whether this is true, but I don't know how to read the core registers from their memory locations (kseg3).

These are the registers for the readahead cache located at 0xff400000 (taken from usrobotics released sources):

#define MIPS_BASE       0xff400000
#define RAC_CR0         0x00
#define RAC_PWR         (1 << 31)
#define RAC_BRR_PF      (1 << 30)
#define RAC_FLH         (1 << 8)
#define RAC_DPF         (1 << 6)
#define RAC_NCH         (1 << 5)
#define RAC_C_INV       (1 << 4)
#define RAC_PF_D        (1 << 3)
#define RAC_PF_I        (1 << 2)
#define RAC_D           (1 << 1)
#define RAC_I           (1 << 0)

#define RAC_CR1         0x04
#define RAC_UPB_SHFT    16
#define RAC_LWB_SHFT    0

I only need to know how to read the memory at 0xff400000 for those registers (32 bits long) so I can printk them and check whether all is OK with the RAC.

Any help is welcome, thanks.
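Once a 32-bit RAC_CR0 value has been obtained, decoding it is plain bit testing. The sketch below is a userspace decoder, not kernel code: it takes the value as an argument and uses printf instead of printk. The bit definitions are copied from the list above; the table and the function name rac_cr0_print are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Bit definitions copied from the USRobotics sources quoted above. */
#define RAC_PWR    (UINT32_C(1) << 31)
#define RAC_BRR_PF (UINT32_C(1) << 30)
#define RAC_FLH    (UINT32_C(1) << 8)
#define RAC_DPF    (UINT32_C(1) << 6)
#define RAC_NCH    (UINT32_C(1) << 5)
#define RAC_C_INV  (UINT32_C(1) << 4)
#define RAC_PF_D   (UINT32_C(1) << 3)
#define RAC_PF_I   (UINT32_C(1) << 2)
#define RAC_D      (UINT32_C(1) << 1)
#define RAC_I      (UINT32_C(1) << 0)

struct rac_flag {
	uint32_t mask;
	const char *name;
};

static const struct rac_flag rac_flags[] = {
	{ RAC_PWR, "PWR" },   { RAC_BRR_PF, "BRR_PF" }, { RAC_FLH, "FLH" },
	{ RAC_DPF, "DPF" },   { RAC_NCH, "NCH" },       { RAC_C_INV, "C_INV" },
	{ RAC_PF_D, "PF_D" }, { RAC_PF_I, "PF_I" },     { RAC_D, "D" },
	{ RAC_I, "I" },
};

/* Hypothetical helper: print the names of all set flags, return how many. */
static inline int rac_cr0_print(uint32_t cr0)
{
	size_t i;
	int count = 0;

	printf("RAC_CR0 = 0x%08lx:", (unsigned long)cr0);
	for (i = 0; i < sizeof(rac_flags) / sizeof(rac_flags[0]); i++) {
		if (cr0 & rac_flags[i].mask) {
			printf(" %s", rac_flags[i].name);
			count++;
		}
	}
	printf("\n");
	return count;
}
```

For example, rac_cr0_print(RAC_I | RAC_PF_I | RAC_C_INV) lists three flags, and a value of 0 lists none, which would suggest the RAC was left unconfigured.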

comment:11 Changed 5 years ago by florian

You might also want to enable the RAC using something like this:

        volatile unsigned int *cr0, *cr1;

        cr0 = (volatile unsigned int *)(MIPS_BASE + RAC_CR0);
        cr1 = (volatile unsigned int *)(MIPS_BASE + RAC_CR1);

        *cr0 = RAC_I | RAC_PF_I | RAC_C_INV;
        *cr1 = (0x2000 << RAC_UPB_SHFT);

and see if it gives you better performance

comment:12 Changed 4 years ago by danitool <dgcbueu@…>

And yes, setting those RAC bits with that code gives better performance.

I was playing with some of the RAC options. If I set RAC_D and RAC_PF_D to 1, the system becomes unstable and causes kernel oopses, so I suppose the readahead data cache is not present in this CPU (only the instruction one), is it?

Also, when the RAC is configured, if I set bit 20 of CP0 REGISTER 22, SELECT 0 to 1, the kernel doesn't load.

Finally, I was playing with other registers: when I set bit 26 of CP0 REGISTER 22, SELECT 0 to 1, the performance increases again. I don't know what this bit activates.

It seems RedBoot doesn't configure the hardware as well as one would expect. I'll try to read the registers from another CFE-based bcm6348 board (as soon as a friend with that board has some spare time) and compare them to the Livebox DV4210. Once I have those values, I'll try to tune the registers again.

Another question: must the value shifted by RAC_UPB_SHFT be 0x2000, or can I set any value between 0x0000 and 0xFFFF?
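On the arithmetic side of that question: the two shift constants (16 and 0) suggest that RAC_CR1 holds two 16-bit fields, so any value from 0x0000 to 0xFFFF fits in the upper field without overflowing. Whether the hardware accepts every such value is a separate matter that this ticket does not answer. A sketch, assuming 16-bit fields (inferred only from the shifts) and a hypothetical helper name:

```c
#include <stdint.h>

/* Shift values copied from the register definitions in comment:10. */
#define RAC_UPB_SHFT 16
#define RAC_LWB_SHFT 0

/*
 * Hypothetical helper: pack an upper and a lower bound into one CR1 word.
 * Assumption: each field is 16 bits wide.
 */
static inline uint32_t rac_cr1_pack(uint16_t upper, uint16_t lower)
{
	return ((uint32_t)upper << RAC_UPB_SHFT) |
	       ((uint32_t)lower << RAC_LWB_SHFT);
}
```

rac_cr1_pack(0x2000, 0) gives 0x20000000, the same word as (0x2000 << RAC_UPB_SHFT) in the suggested code, and rac_cr1_pack(0xFFFF, 0) gives 0xFFFF0000, the largest value the upper field can hold.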

comment:13 Changed 4 years ago by danitool <dgcbueu@…>

I've got some register values from other boards with CFE. After several tests, it seems the relevant registers are these:

	/*REGISTER 16, SELECT 0*/
	write_c0_config(0x80008083);
	/*REGISTER 22, SELECT 0*/
	write_c0_diag(0xe3880000);

Setting bits 0, 1 and 2 of the config register (REGISTER 16, SELECT 0) to the value 011 dramatically increases performance. With this change, setting bit 26 of REGISTER 22, SELECT 0 to 1 apparently makes no difference.
I use this function to enable the RAC and set the coprocessor registers to the same state as on other boards with CFE:

static void __cpuinit set_cpuregs(void)
{
	volatile unsigned int *cr0, *cr1;

	cr0 = (volatile unsigned int *)(MIPS_BASE + RAC_CR0);
	cr1 = (volatile unsigned int *)(MIPS_BASE + RAC_CR1);

	printk("board_livebox: setting cpu registers...\n");

	/* RAC: set only the instruction-side bits (RAC_D/RAC_PF_D caused oopses) */
	*cr0 = RAC_I | RAC_PF_I;
	*cr1 = (0x200 << RAC_UPB_SHFT);
	printk("RAC changed: RAC_CR0 = 0x%08x, RAC_CR1 = 0x%08x\n", *cr0, *cr1);

	/* REGISTER 16, SELECT 0 */
	write_c0_config(0x80008083);
	/* REGISTER 22, SELECT 0 */
	write_c0_diag(0xe3880000);
}

Now the performance is very close to that of the CFE boards; still not perfect, but almost.
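The "bits 0, 1, 2 set to 011" observation can be read straight off the config value taken from the CFE boards. A minimal userspace sketch; the two constants carry the same names and values as in the Linux MIPS header asm/mipsregs.h, while the helper name config_k0 is hypothetical:

```c
#include <stdint.h>

/* Same names and values as in the Linux MIPS header asm/mipsregs.h. */
#define CONF_CM_CACHABLE_NONCOHERENT 3
#define CONF_CM_CMASK                7

/* Hypothetical helper: extract the kseg0 cache mode (bits 2:0) of Config. */
static inline unsigned int config_k0(uint32_t config)
{
	return (unsigned int)(config & CONF_CM_CMASK);
}
```

config_k0(0x80008083) is 3, which is CONF_CM_CACHABLE_NONCOHERENT, so the value written by write_c0_config() above selects the cacheable, noncoherent mode for kseg0.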

Also, there is a bit in the Livebox indicating a pending interrupt 0:
/*REGISTER 22, SELECT 0*/ bit 10
This bit is 1 on the Livebox, whereas on CFE boards it is always 0. I don't know whether it has some impact on performance or is negligible.
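The diag value taken from the CFE boards can be checked against the individual bits mentioned in this ticket. This is a userspace sketch of the bit test only; the helper name bit_is_set is hypothetical, and what each bit does in hardware is not documented here.

```c
#include <stdint.h>

/* Hypothetical helper: returns 1 if the given bit (0..31) is set, else 0. */
static inline int bit_is_set(uint32_t value, unsigned int bit)
{
	return (int)((value >> bit) & 1u);
}
```

For 0xe3880000, bits 31 and 30 (the icache and dcache enables) are set, while bits 26, 20 and 10 are all clear. That agrees with the observations above: CFE boards read bit 10 as 0, and they run well with bit 26 cleared.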

I'll resend the patch with this and other changes for this board in the other pending ticket.

comment:14 Changed 4 years ago by danitool <dgcbueu@…>

I've noticed that one of the problems I already solved is exactly the same problem the RouterStation had some time ago:

http://forum.ubnt.com/showpost.php?p=31628&postcount=16

	/*
	 * Some bootloaders set the 'Kseg0 coherency algorithm' to
	 * 'Cacheable, noncoherent, write-through, no write allocate'
	 * and this cause performance issues. Let's go and change it to
	 * 'Cacheable, noncoherent, write-back, write allocate'
	 */
	.macro	kernel_entry_setup
	mfc0	t0, CP0_CONFIG
	li	t1, ~CONF_CM_CMASK
	and	t0, t1
	ori	t0, CONF_CM_CACHABLE_NONCOHERENT
	mtc0	t0, CP0_CONFIG
	nop
	.endm

	.macro	smp_slave_setup
	.endm

And this was the major problem on my board: the write-back policy for kseg0. And the caches, of course.

I'm currently patching my bootloader with the previous fixes. I didn't find any difference in performance between enabling these features in the Linux kernel and modifying the bootloader.
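The kernel_entry_setup macro above is a read-modify-write of the Config register: clear the three cache-mode bits, then select the write-back mode. The same step can be modelled in plain C on an ordinary integer. This is a userspace sketch, not kernel code; in the kernel it would correspond to read_c0_config() and write_c0_config(). The function name is hypothetical, and the two constants carry the values from asm/mipsregs.h.

```c
#include <stdint.h>

/* Same names and values as in the Linux MIPS header asm/mipsregs.h. */
#define CONF_CM_CACHABLE_NONCOHERENT 3
#define CONF_CM_CMASK                7

/*
 * Hypothetical model of kernel_entry_setup: clear the kseg0 cache mode
 * field, then select "cacheable, noncoherent, write-back, write allocate".
 * All other Config bits are left untouched.
 */
static inline uint32_t config_set_k0_writeback(uint32_t config)
{
	config &= ~(uint32_t)CONF_CM_CMASK;
	config |= CONF_CM_CACHABLE_NONCOHERENT;
	return config;
}
```

Unlike writing the fixed constant 0x80008083, this form changes only bits 2:0, so it does not depend on the other Config bits having the same values as on the CFE boards.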

comment:15 Changed 4 years ago by florian

I would rather we avoid patching any bootloader, in case people intend to run an unmodified RedBoot with our kernel. This is not a big problem if we patch the kernel to enable the proper caching policy early in bcm63xx's board code.

comment:16 Changed 4 years ago by danitool <dgcbueu@…>

I don't know of any working way to boot OpenWrt with an unmodified bootloader.

The original RedBoot has three problems.

  • It only accepts encrypted DWB firmware images for flashing.
  • We cannot break into the command line, even with the serial console.
  • Finally, if we change the partitions, the bootloader refuses to load any image from the flash chip; it reads some bytes at the end of the partition to perform some kind of verification.

As a result, I would say that using a modified RedBoot is mandatory. I wouldn't expect anybody to find a way to bypass the security checks made by the unmodified RedBoot.

comment:17 Changed 4 years ago by florian

  • Resolution set to obsolete
  • Status changed from accepted to closed

Since you have updated RedBoot to enable RAC for your boards, we no longer need these patches.

comment:18 Changed 22 months ago by jow

  • Milestone changed from Attitude Adjustment 12.09 to Barrier Breaker 14.07

Milestone Attitude Adjustment 12.09 deleted
