Discussion:
[RFC] x86_64: Move initrd relocation to sysdeps_create_boot_params()
Chris McDermott
2010-03-10 00:01:46 UTC
Permalink
I have been working on bringing up an IBM x86 server in native EFI mode and
have encountered a problem where the initrd does not get properly mounted
by the kernel. I have tracked down the problem to an elilo issue in
sysdeps_create_boot_params() (from x86_64/system.c) where the code that
iniializes bp->s.initrd_start ends up truncating the actual initrd start
address.

In load_file(), the AllocatePages() EFI Boot Service is called to allocate
memory for the initrd. This invocation passes AllocateAnyPages as the
allocation request type, which informs the firmware to allocate any available
range of pages that will satisfy the memory request. On this particular
platform the starting address for the range of pages returned by the firmware
is > 4GB, which is perfectly legal given my interpretation of the UEFI spec.

sysdeps_create_boot_params() will then truncate initrd->start_addr to a 32-bit
address. Notice the "This will probably have to be changed" comment. :-)
start_kernel() (in x86_64/sysdeps.h) will relocate the initial ramdisk to an
address below 4GB, because that's what the elilo/kernel handoff expects.
However, by that point it's already too late.

So, what do folks think about moving the code that relocates the initrd to
INITRD_START from start_kernel() to sysdeps_create_boot_params()?

Another possible fix would be to have load_file() use the AllocateAddress
allocation type, which it appears to conditionally do now, but the code
looks suspect to me. Or, it could be that I just don't understand it.
A secondary unrelated problem I worry about is how we prevent the initrd
from being relocated on top of itself (if the firmware returns an
AllocateAnyPages that happens to span 50MB)?

Here's the current working patch I'm using that relocates the initrd in
sysdeps_create_boot_params(). In order to preserve existing behavior, I
actually implemented a patch that only relocates initrd's that start above
4GB. But, I don't see any reason why this couldn't be done in _all_ cases.



diff -pruN a/x86_64/sysdeps.h b/x86_64/sysdeps.h
--- a/x86_64/sysdeps.h 2010-03-09 14:54:10.000000000 -0800
+++ b/x86_64/sysdeps.h 2010-03-09 14:58:55.000000000 -0800
@@ -401,7 +401,7 @@ start_kernel(VOID *kentry, boot_params_t
MEMCPY(kernel_start, kernel_load_address, kernel_size);
}

- if (bp->s.initrd_start) {
+ if (bp->s.initrd_start && (bp->s.initrd_start != INITRD_START)) {
temp = bp->s.initrd_start;
MEMCPY(INITRD_START, temp , bp->s.initrd_size);
bp->s.initrd_start = INITRD_START;
diff -pruN a/x86_64/system.c b/x86_64/system.c
--- a/x86_64/system.c 2010-03-09 14:54:20.000000000 -0800
+++ b/x86_64/system.c 2010-03-09 15:17:10.000000000 -0800
@@ -557,8 +557,14 @@ sysdeps_create_boot_params(
bp->s.ramdisk_flags = 0;

if (initrd->start_addr && initrd->pgcnt) {
- /* %%TBD - This will probably have to be changed. */
- bp->s.initrd_start = (UINT32)(UINT64)initrd->start_addr;
+ /* If initrd was loaded at addr > 4GB, relocate now */
+ if ((UINT64)initrd->start_addr & 0xffffffff00000000) {
+ uint64_t temp = (UINT64)initrd->start_addr;
+ MEMCPY(INITRD_START, temp , initrd->size);
+ bp->s.initrd_start = INITRD_START;
+ } else
+ bp->s.initrd_start = (UINT32)(UINT64)initrd->start_addr;
+
bp->s.initrd_size = (UINT32)(initrd->size);
/*
* This is the RAMdisk root device for RedHat 2.2.x




thanks,
--
Chris McDermott <lcm-r/Jw6+rmf7HQT0dZR+***@public.gmane.org>
Raymund Will
2010-03-19 16:33:31 UTC
Permalink
Chris McDermott wrote on 2010-03-09T16:01:46 -0800:
[...]
Post by Chris McDermott
So, what do folks think about moving the code that relocates the initrd to
INITRD_START from start_kernel() to sysdeps_create_boot_params()?
Isn't that asking for trouble?
Given, that the firmware didn't hand out that memory, when it's been
kindly asked, I'd consider it dangerous to just ignore that and simply
overwrite it, especially way before ExitBootServices is called.
Post by Chris McDermott
Another possible fix would be to have load_file() use the AllocateAddress
allocation type, which it appears to conditionally do now, but the code
looks suspect to me.
And rightly so: it's buggy! The call to 'sysdeps_initrd_get_addr()'
is ignored twice, first when 'start_addr' is initialized as NULL and
second, when alloc_pages() is given a '0' even when 'start_addr' should
happen to be non-null!

[...]
Post by Chris McDermott
A secondary unrelated problem I worry about is how we prevent the initrd
from being relocated on top of itself (if the firmware returns an
AllocateAnyPages that happens to span 50MB)?
That's been taken care of by the patch to MEMCPY by Stuart Hayes!
(It simply adjusts the copy direction.)
Post by Chris McDermott
Here's the current working patch I'm using that relocates the initrd in
sysdeps_create_boot_params(). In order to preserve existing behavior, I
actually implemented a patch that only relocates initrd's that start above
4GB. But, I don't see any reason why this couldn't be done in _all_ cases.
[...]

See above. We should simply avoid allocating the initrd above the
kernel-specified 'initrd_addr_max' (which is significantly below 4GB ATM).
(BTW, this will be addressed by patch #2 in the series I'm about to send!)

Thanks,
--
Raymund Will rw-***@public.gmane.org
SUSE LINUX Products GmbH GF: Markus Rex HRB 16746 (AG Nuernberg)
Chris McDermott
2010-03-23 01:33:57 UTC
Permalink
Post by Raymund Will
[...]
Post by Chris McDermott
So, what do folks think about moving the code that relocates the initrd to
INITRD_START from start_kernel() to sysdeps_create_boot_params()?
Isn't that asking for trouble?
Given, that the firmware didn't hand out that memory, when it's been
kindly asked, I'd consider it dangerous to just ignore that and simply
overwrite it, especially way before ExitBootServices is called.
No. The firmware DID hand out the memory. It just handed out a chunk of memory
at an address (above 4GB) that elilo can't correctly manage.
Post by Raymund Will
Post by Chris McDermott
Another possible fix would be to have load_file() use the AllocateAddress
allocation type, which it appears to conditionally do now, but the code
looks suspect to me.
And rightly so: it's buggy! The call to 'sysdeps_initrd_get_addr()'
is ignored twice, first when 'start_addr' is initialized as NULL and
second, when alloc_pages() is given a '0' even when 'start_addr' should
happen to be non-null!
Yep. Agreed.
Post by Raymund Will
[...]
See above. We should simply avoid allocating the initrd above the
kernel-specified 'initrd_addr_max' (which is significantly below 4GB ATM).
(BTW, this will be addressed by patch #2 in the series I'm about to send!)
Yes, I agree this seems like the safest approach. Particularly since the
x64 elilo->kernel handoff restricts the initrd start address to < 4GB.


thanks,
--
Chris McDermott <lcm-***@public.gmane.org>
Loading...