====== Differences ====== This shows you the differences between two versions of the page.
Next revision | Previous revision | ||
soc:2010:peper:notes:usermode_explained [2010/06/05 15:10] peper created |
soc:2010:peper:notes:usermode_explained [2010/06/14 17:00] (current) peper |
||
---|---|---|---|
Line 39: | Line 39: | ||
==== Linking to stdlib (glibc) ==== | ==== Linking to stdlib (glibc) ==== | ||
+ | |||
+ | **UPDATE**: That approach has been moved to a separate ''linuxlibc'' ''PLATFORM'' and is available on the [[http://git.etherboot.org/?p=people/peper/gpxe.git;a=shortlog;h=refs/heads/linuxlibc|linuxlibc branch]]. | ||
Despite being non-trivial, forcing some compile flags to be disabled (namely ''-mrtd'' and ''-mregparm'' mentioned earlier) and having [[#the_other_problem_with_stdlib|some other problems]] linking to stdlib was still the quickest for prototyping. | Despite being non-trivial, forcing some compile flags to be disabled (namely ''-mrtd'' and ''-mregparm'' mentioned earlier) and having [[#the_other_problem_with_stdlib|some other problems]] linking to stdlib was still the quickest for prototyping. | ||
Line 85: | Line 87: | ||
} | } | ||
</code> | </code> | ||
+ | |||
+ | === Prefix === | ||
+ | |||
+ | stdlib's ''_start'' takes care of everything so the prefix code is empty. | ||
+ | |||
==== Being self-contained ==== | ==== Being self-contained ==== | ||
- | Work in progress. | + | To overcome the problems with linking to stdlib we need to implement some of its elementary features ourselves. |
+ | |||
+ | === Linker script === | ||
+ | |||
+ | A good read for starters is [[http://www.redhat.com/docs/manuals/enterprise/RHEL-4-Manual/gnu-linker/index.html|Using ld, the Gnu Linker]]. | ||
+ | With that background the currently used linker scripts (''arch/*/scripts/*.lds'') should make more sense. | ||
+ | |||
+ | As we are not going to be linking against stdlib, the linker script should be really simple. | ||
+ | In fact it turned out that there is already a simple enough linker script used for efi (''arch/x86/scripts/efi.lds'') that can be used more or less out of the box. | ||
+ | The only necessary modification is setting the start of the Text segment properly, because not every value works (you can try ''0x0'' and see :) | ||
+ | To see what the convention is, we can look at the default linker script, which ''ld'' prints when passed ''--verbose'', | ||
+ | while compiling a simple program in 32-bit and 64-bit mode. | ||
+ | |||
+ | <code> | ||
+ | $ gcc -m32 foo.c -o foo -Wl,--verbose | ||
+ | $ gcc -m64 foo.c -o foo -Wl,--verbose | ||
+ | </code> | ||
+ | |||
+ | From that we can gather that ''i386'' uses ''0x08048000'' and ''x86_64'' uses ''0x400000'' as the start address. | ||
+ | I haven't been able to find a good explanation of why these particular values are used, and many other values seem to work as well. | ||
+ | Another way of finding the specific values is to read the [[http://www.sco.com/developers/devspecs/abi386-4.pdf|i386 ABI]] (page 48) | ||
+ | and [[http://www.x86-64.org/documentation/abi.pdf|AMD64 ABI]] (page 26). | ||
+ | |||
+ | === Prefix (_start) === | ||
+ | |||
+ | ''_start'', being the default ''ENTRY'' point, is the very first thing that's executed when a new process receives control. | ||
+ | What we want to do in ''_start'' is the minimal work necessary to actually call our ''main()'' function. | ||
+ | |||
+ | To accomplish that we need to know 3 things: | ||
+ | * What's the state of things when ''_start'' is executed | ||
+ | * How to actually call ''main()'' | ||
+ | * What to do when ''main()'' returns | ||
+ | |||
+ | The state of the stack and registers at the time ''_start'' is executed is described in | ||
+ | [[http://www.sco.com/developers/devspecs/abi386-4.pdf|i386 ABI]] (page 54) | ||
+ | and [[http://www.x86-64.org/documentation/abi.pdf|AMD64 ABI]] (page 28). | ||
+ | |||
+ | The function calling convention is also described in the ABI docs: [[http://www.sco.com/developers/devspecs/abi386-4.pdf|i386 ABI]] (pages 36-38) | ||
+ | and [[http://www.x86-64.org/documentation/abi.pdf|AMD64 ABI]] (pages 15-23). A nice overview is [[http://www.agner.org/optimize/calling_conventions.pdf|calling conventions]]. | ||
+ | |||
+ | What we need to do after ''main()'' returns is to call the ''exit'' syscall. Details on that are in the next section. | ||
+ | |||
+ | To make use of all that information we first need to learn the GNU Assembler, though. | ||
+ | I haven't been able to find any really good docs on it, and certainly nothing resembling a tutorial. | ||
+ | Look at [[http://sig9.com/articles/att-syntax|quick syntax]], [[ftp://ftp.estec.esa.nl/pub/ws/wsd/erc32/doc/as.pdf|manual]] and [[http://tigcc.ticalc.org/doc/gnuasm.html|manual2]]. | ||
+ | |||
+ | The following simplified ''_start''s should make sense now: | ||
+ | |||
+ | ''arch/i386/prefix/linuxprefix.S'': | ||
+ | <code asm> | ||
+ | _start: | ||
+ | xorl %ebp, %ebp // ABI wants us to zero the base frame | ||
+ | |||
+ | popl %esi // save argc | ||
+ | movl %esp, %edi // save argv | ||
+ | |||
+ | pushl %edi // argv -> C arg2 | ||
+ | pushl %esi // argc -> C arg1 | ||
+ | |||
+ | call main | ||
+ | |||
+ | movl %eax, %ebx // rc -> syscall arg1 | ||
+ | movl $__NR_exit, %eax | ||
+ | int $0x80 | ||
+ | </code> | ||
+ | ''arch/x86_64/prefix/linuxprefix.S'': | ||
+ | <code asm> | ||
+ | _start: | ||
+ | xorq %rbp, %rbp // ABI wants us to zero the base frame | ||
+ | |||
+ | popq %rdi // argc -> C arg1 | ||
+ | movq %rsp, %rsi // argv -> C arg2 | ||
+ | |||
+ | call main | ||
+ | |||
+ | movq %rax, %rdi // rc -> syscall arg1 | ||
+ | movq $__NR_exit, %rax | ||
+ | syscall | ||
+ | </code> | ||
+ | |||
+ | === Syscalls === | ||
+ | |||
+ | To provide the necessary kernel API (functions declared in ''include/linux_api.h'') we need a way to perform syscalls. | ||
+ | |||
+ | A simple way of doing that is implementing our own ''int syscall(int number, ...);'' | ||
+ | as ''long linux_syscall(int number, ...);'' and using that as the building block. | ||
+ | |||
+ | The syscall calling convention differs a bit from the normal function calling convention on both ''i386'' and ''x86_64''. | ||
+ | The [[http://www.x86-64.org/documentation/abi.pdf|AMD64 ABI]] (pages 123-124) covers that for ''x86_64'' in an informative section. | ||
+ | For ''i386'' we can look at [[http://www.cin.ufpe.br/~if817/arquivos/asmtut/index.html#syscalls|i386 syscalls]]. | ||
+ | |||
+ | With that information we can implement our own ''syscall()''. | ||
+ | |||
+ | ''arch/i386/core/linux/linux_syscall.S'': | ||
+ | <code asm> | ||
+ | linux_syscall: | ||
+ | /* Save registers */ | ||
+ | pushl %ebx | ||
+ | pushl %esi | ||
+ | pushl %edi | ||
+ | pushl %ebp | ||
+ | |||
+ | movl 20(%esp), %eax // C arg1 -> syscall number | ||
+ | movl 24(%esp), %ebx // C arg2 -> syscall arg1 | ||
+ | movl 28(%esp), %ecx // C arg3 -> syscall arg2 | ||
+ | movl 32(%esp), %edx // C arg4 -> syscall arg3 | ||
+ | movl 36(%esp), %esi // C arg5 -> syscall arg4 | ||
+ | movl 40(%esp), %edi // C arg6 -> syscall arg5 | ||
+ | movl 44(%esp), %ebp // C arg7 -> syscall arg6 | ||
+ | |||
+ | int $0x80 | ||
+ | |||
+ | /* Restore registers */ | ||
+ | popl %ebp | ||
+ | popl %edi | ||
+ | popl %esi | ||
+ | popl %ebx | ||
+ | |||
+ | cmpl $-4095, %eax | ||
+ | jae 1f | ||
+ | ret | ||
+ | |||
+ | 1: | ||
+ | negl %eax | ||
+ | movl %eax, linux_errno | ||
+ | movl $-1, %eax | ||
+ | ret | ||
+ | </code> | ||
+ | |||
+ | ''arch/x86_64/core/linux/linux_syscall.S'': | ||
+ | <code asm> | ||
+ | linux_syscall: | ||
+ | movq %rdi, %rax // C arg1 -> syscall number | ||
+ | movq %rsi, %rdi // C arg2 -> syscall arg1 | ||
+ | movq %rdx, %rsi // C arg3 -> syscall arg2 | ||
+ | movq %rcx, %rdx // C arg4 -> syscall arg3 | ||
+ | movq %r8, %r10 // C arg5 -> syscall arg4 | ||
+ | movq %r9, %r8 // C arg6 -> syscall arg5 | ||
+ | movq 8(%rsp), %r9 // C arg7 -> syscall arg6 | ||
+ | |||
+ | syscall | ||
+ | |||
+ | cmpq $-4095, %rax | ||
+ | jae 1f | ||
+ | ret | ||
+ | |||
+ | 1: | ||
+ | negq %rax | ||
+ | movl %eax, linux_errno | ||
+ | movq $-1, %rax | ||
+ | ret | ||
+ | </code> | ||
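The error handling at the end of both ''linux_syscall'' versions can be restated in C. This is only a sketch for illustration: ''decode_syscall_ret'' is a hypothetical helper that is not part of the code above, and the ''linux_errno'' defined here is a local stand-in for the variable the assembly writes to. It shows that the unsigned comparison against ''-4095'' treats only raw return values in the range -4095..-1 as errors.

```c
/* Local stand-in for the linux_errno variable written by the assembly. */
static int linux_errno;

/*
 * Hypothetical helper mirroring the tail of linux_syscall:
 * the kernel reports errors as -errno, i.e. a raw value in [-4095, -1].
 */
static long decode_syscall_ret(long raw)
{
	/* Unsigned comparison, like `jae` after `cmp $-4095`. */
	if ((unsigned long)raw >= (unsigned long)-4095L) {
		linux_errno = (int)-raw;	/* corresponds to neg + store */
		return -1;
	}
	/* Anything else, including large pointer-like values, is a result. */
	return raw;
}
```

A raw value of ''-2'' therefore becomes a return of ''-1'' with ''linux_errno'' set to ''2'', while ''-4096'' is passed through unchanged as a valid result.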
+ | |||
+ | With that in place we can implement most of the functions as simple wrappers: | ||
+ | <code c> | ||
+ | void * linux_mmap(void *addr, size_t length, int prot, int flags, int fd, off_t offset) | ||
+ | { | ||
+ | return (void*)linux_syscall(__SYSCALL_mmap, addr, length, prot, flags, fd, offset); | ||
+ | } | ||
+ | |||
+ | void * linux_mremap(void * old_address, size_t old_size, size_t new_size, int flags) | ||
+ | { | ||
+ | return (void*)linux_syscall(__NR_mremap, old_address, old_size, new_size, flags); | ||
+ | } | ||
+ | </code> | ||
+ | Now you can see why our ''syscall()'' returns a ''long'' instead of an ''int''. Otherwise we wouldn't be able to return a pointer on ''x86_64''. | ||
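The point about ''long'' can be checked with a small C sketch. Both helpers below are hypothetical and exist only for illustration. They assume the usual Linux data models, where ''long'' is as wide as a pointer and ''int'' is 32 bits, and they rely on the compiler truncating on the narrowing conversion, as GCC does.

```c
#include <stdint.h>

/* Hypothetical: does an address survive being returned as an int? */
static int survives_int_return(uintptr_t addr)
{
	int as_int = (int)addr;		/* what an int-returning syscall() could carry */
	return (uintptr_t)(intptr_t)as_int == addr;
}

/* Hypothetical: does an address survive being returned as a long? */
static int survives_long_return(uintptr_t addr)
{
	long as_long = (long)addr;	/* what linux_syscall() carries */
	return (uintptr_t)as_long == addr;
}
```

On ''x86_64'' an address above 4 GiB, such as a typical ''mmap'' result, passes the second check but fails the first.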
===== Subsystems ===== | ===== Subsystems ===== |