====== Stefan Hajnoczi: GDB Remote Debugging ======

===== Week 4 =====

**Milestones:**

  * Get latest GDB stub work into mainline.
  * Modern bzImage prefix for gPXE.

==== Mon Jun 16 ====

**The ''gdbstub2'' branch is now ready for mainline review**. Diffs against gPXE ''master'' are [[http://etherboot.org/share/stefanha/gdbstub2.diff|here]]. Once it is merged I will update the documentation and encourage others to use GDB.

**gPXE needs modern bzImage support so that GRUB, lilo, and SYSLINUX can load it**. This is my next piece of work after the GDB stub. There is already code in etherboot to make a bzImage. The old code doesn't work by default on today's popular bootloaders since the Linux bzImage header it supplies is outdated. I am investigating what needs to be done for GRUB, lilo, SYSLINUX, etherboot, and gPXE to load a gPXE bzImage.

==== Tue Jun 17 ====

Git commit:

  * [[http://git.etherboot.org/?p=people/stefanha/gpxe.git;a=commit;h=bfd885802fd6af9938f2b703f6c48a9259cd7657|[bzImage] Make gpxe.lkrn a zImage 2.07]]

**I am trying out bootloaders on ''gpxe.lkrn'' images**. We were afraid that the outdated Linux zImage prefix no longer works with modern bootloaders. Here are results for unmodified gPXE (I have not yet attempted to implement bzImage):

  * **GRUB** boots ''gpxe.lkrn'' successfully. Here is a script to create a GRUB/gPXE boot floppy: <code>
#!/bin/sh
set -e

# Create a blank 1.44 MB floppy image and put a filesystem on it
dd if=/dev/zero of=grub.img bs=1024 count=1440
losetup /dev/loop0 grub.img
mkfs /dev/loop0
mount /dev/loop0 /mnt

# Install the GRUB stage files and a menu entry for gPXE
mkdir -p /mnt/boot/grub
cp /boot/grub/stage1 /boot/grub/stage2 /mnt/boot/grub/
cat >/mnt/boot/grub/menu.lst <<EOF
title=gPXE
root (fd0)
kernel /boot/gpxe.lkrn
EOF
cp bin/gpxe.lkrn /mnt/boot/
umount /mnt

# Write GRUB to the image's boot sector using the grub shell
grub --device-map=/dev/null <<EOF
device (fd0) /dev/loop0
root (fd0)
setup (fd0)
quit
EOF
losetup -d /dev/loop0
</code>
  * **SYSLINUX** boots ''gpxe.lkrn'' successfully.
Here is a script to create a boot floppy: <code>
#!/bin/sh
set -e

# Create a blank 1.44 MB FAT floppy image
dd if=/dev/zero of=syslinux.img bs=1024 count=1440
mkfs.msdos syslinux.img
mount -o loop syslinux.img /mnt

# Copy gPXE under a short name and make it the default boot target
cp bin/gpxe.lkrn /mnt/gpxe.zi
cat >/mnt/SYSLINUX.CFG <<EOF
default gpxe.zi
EOF
umount /mnt

# Install the SYSLINUX boot sector into the image
syslinux syslinux.img
</code>
  * **lilo** fails to boot ''gpxe.lkrn''. QEMU stops with a triple-fault. I still need to look into this. Here is a script to create a boot floppy: <code>
#!/bin/sh
set -e

# Create a blank 1.44 MB floppy image on a loop device
dd if=/dev/zero of=lilo.img bs=1024 count=1440
losetup /dev/loop0 lilo.img
mkfs /dev/loop0
mount /dev/loop0 /mnt
mkdir /mnt/etc /mnt/boot
cp bin/gpxe.lkrn /mnt/gpxe.zi

# Describe the loop device to lilo as BIOS drive 0x00 (first floppy)
cat >/mnt/etc/lilo.conf <<EOF
boot =/dev/loop0
disk =/dev/loop0
  bios =0x00
  # 1.44MB disk geometry
  sectors =18
  heads =2
  cylinders =80
install =/mnt/boot/boot.b
map =/mnt/boot/map
backup =/dev/null
image =/mnt/gpxe.zi
EOF

# Run lilo (installed under /tmp/lilo) with that configuration
/tmp/lilo/sbin/lilo -C /mnt/etc/lilo.conf
umount /mnt
losetup -d /dev/loop0
</code>
  * **gPXE** fails to boot ''gpxe.lkrn'' since it supports only the newer bzImage format, not the old zImage format. Testing was easy: <code>
qemu -bootp gpxe.lkrn -tftp bin bin/gpxe.usb
</code>
  * **Etherboot 5.4.3** boots ''gpxe.lkrn'' successfully. I used [[http://freshmeat.net/projects/wraplinux/|wraplinux]] to make an NBI file from ''gpxe.lkrn''.

**Updated ''lkrnprefix.S'' to zImage 2.07**. The image is still only a zImage since the non-real-mode code loads at 0x10000. A bzImage loads non-real-mode code at 0x100000, i.e. right after the 1 MB of low memory. Perhaps ''gpxe.lkrn'' can be a full bzImage, but I think that the A20 line will prevent us from accessing 0x100000.

  * **GRUB** boots successfully.
  * **Lilo** still fails. I need to investigate this; I am probably not using it properly.
  * **SYSLINUX** boots successfully.
  * **gPXE** boots successfully with a small patch to ''bzimage.c''. Need to discuss this with mcb30.
  * **Etherboot** boots successfully.

==== Wed Jun 18 ====

**Lilo still triple-faults when loading ''gpxe.lkrn''**.
I set up a virtual machine with [[http://damnsmalllinux.org/|Damn Small Linux]] to ensure a clean environment. The DSL kernel boots successfully while ''gpxe.lkrn'' fails. Here is the triple fault information from QEMU:

<code>
qemu: fatal: triple fault
EAX=60000000 EBX=0000fee8 ECX=00002900 EDX=00001d8a
ESI=0001ffff EDI=0000ff51 EBP=0000f9c4 ESP=0000f96e
EIP=0000074c EFL=00000002 [-------] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =0018 00000000 ffffffff 00cf9300
CS =0008 0000f600 0000ffff 00009b00
SS =0010 00090000 0000ffff 00009309
DS =0018 00000000 ffffffff 00cf9300
FS =0018 00000000 ffffffff 00cf9300
GS =0018 00000000 ffffffff 00cf9300
LDT=0000 00000000 0000ffff 00008000
TR =0000 00000000 00000000 00000000
GDT=     0009f99c 0000001f
IDT=     00000000 000003ff
CR0=60000011 CR2=00000000 CR3=00000000 CR4=00000000
CCS=00000000 CCD=0000f97e CCO=ADDB
FCW=037f FSW=0000 [ST=0] FTW=00 MXCSR=00001f80
FPR0=0000000000000000 0000 FPR1=0000000000000000 0000
FPR2=0000000000000000 0000 FPR3=0000000000000000 0000
FPR4=0000000000000000 0000 FPR5=0000000000000000 0000
FPR6=0000000000000000 0000 FPR7=0000000000000000 0000
XMM00=00000000000000000000000000000000 XMM01=00000000000000000000000000000000
XMM02=00000000000000000000000000000000 XMM03=00000000000000000000000000000000
XMM04=00000000000000000000000000000000 XMM05=00000000000000000000000000000000
XMM06=00000000000000000000000000000000 XMM07=00000000000000000000000000000000
Aborted
</code>

I don't see an obvious clue in the crash dump, so I'll wait until after speaking with mcb30 about bzImage. If we decide to go in a different direction, time spent debugging this would be wasted.

In the meantime I'll investigate real-mode GDB debugging. I already tried ''set architecture i8086'' for 16-bit disassembly. GDB still treats memory as a flat 32-bit space and will probably require some address translation inside the GDB stub.

Another thing on my mind is that loading ''gpxe.lkrn'' recursively fails.
That potentially means you cannot load another zImage after gPXE has been loaded from ''gpxe.lkrn''. <del>My theory is that gPXE has been loaded to the default zImage load address, i.e. 0x10000. If gPXE then tries to load another image there, it overwrites itself and crashes</del>. It looks unlikely that gPXE is overwriting itself because it relocates as high up as possible.

==== Thu Jun 19 ====

Git commit: [[http://git.etherboot.org/?p=people/stefanha/gpxe.git;a=commit;h=2c5b2a45b7c33b1cd45e99c29f0a833c458e2ecb|[b44] Create skeleton driver for Broadcom 4401 NIC]]

**Brought up ROM-o-matic for Etherboot top-of-git-tree**. I have been occasionally assisting mdc with his [[http://rom-o-matic.net/|ROM-o-matic.net]] online boot ROM generator. He recently enabled ROM-o-matic for gPXE top-of-git-tree. That way users can get ROMs for the latest development version of gPXE without having to set up a development environment and build from source. This is now also possible for Etherboot.

**Beginning work to port Linux b44 (Broadcom 4401) driver**. My laptop has a BCM4401-B0 NIC, which is currently not supported by gPXE. The idea is to port the Linux driver to gPXE. I am looking forward to learning more about network drivers and device driver development in general.

==== Fri Jun 20 ====

Git commit: [[http://git.etherboot.org/?p=people/stefanha/gpxe.git;a=commit;h=598d23c5b0ec778dcbaafe634569ca9f8f8d1854|[b44] Minimal TX path]]

**The b44 driver is transmitting Ethernet frames**. Thanks to Michael Decker's excellent [[:soc:2008:mdeck:notes:gpxe_driver_api|gPXE Driver API Documentation]] I got the skeleton for the driver working very quickly last night. This morning I started porting the Linux b44 driver code. After getting the initialization working (mainly by copy-paste) and reading the MAC address from the card, I decided to pursue the TX path. Getting transmit working early is useful since gPXE will attempt to do DHCP automatically and therefore needs to send packets.
Copy-pasting the Linux driver was not a good tactic since the Linux code is much more complex. Eventually I just focused on understanding how the hardware supports transmitting frames (there is no public documentation available!), and then implemented a simple TX path resembling the gPXE natsemi driver.

==== Sat Jun 21 ====

Git commit: [[http://git.etherboot.org/?p=people/stefanha/gpxe.git;a=commit;h=39144f971c261bcd92f5052059d444d742b0010f|[b44] Working RX path]]

**The b44 driver is receiving Ethernet frames**. I just booted PXELINUX and HTTP-booted Linux 2.6.25 on this card for the first time!

Getting the RX path working has been painful. I think some of the Linux driver code is misleading/incorrect. Luckily there are drivers for OpenBSD, FreeBSD, and Solaris. Those drivers might even be based on the Linux driver, but they do some things differently and it helps to compare them to each other.

My main issue with the RX path was a comment in the Linux driver claiming that the hardware writes a header structure 30 bytes //before// the DMA address of the I/O buffer. This is false. The Linux driver does offset the DMA address by 30 bytes, but it also offsets the I/O buffer by 30 bytes. In the end, it makes no difference and all that has happened is that 30 bytes of the I/O buffer have been wasted. The header structure gets written //to// the DMA address, not before it.

Next steps:

  * [b44] Implement TX error handling.
  * [bzImage] Debug Lilo triple-fault.
  * [GDB] Update [[:dev:gdbstub|GDB stub page]] and screencast when UDP code is merged into mainline. See [[http://grub.enbug.org/DebuggingWithGDB|GRUB GDB wiki page]] for inspiration.
  * [GDB] Real-mode remote debugging.
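
As a possible aid for the [bzImage] item above, here is a minimal sketch of a shell helper that reports which boot protocol version an image advertises and whether it is flagged as a zImage or a bzImage. The offsets come from the Linux x86 boot protocol (''HdrS'' magic at 0x202, version at 0x206, ''loadflags'' at 0x211 with bit 0 meaning "loaded high"). The function names ''byte_at'' and ''lkrn_info'' are made up for this example, and the script has not been run against a real ''gpxe.lkrn''.

```shell
#!/bin/sh
# Sketch only: inspect the Linux boot protocol header of a kernel image.
# byte_at and lkrn_info are hypothetical helper names, not part of gPXE.

# byte_at FILE OFFSET: print the byte at decimal OFFSET as a decimal number
byte_at() {
    od -An -tu1 -j "$2" -N 1 "$1" | tr -d ' \n'
}

# lkrn_info FILE: print protocol version and image type, or fail if no header
lkrn_info() {
    img=$1

    # The 4-byte magic "HdrS" lives at offset 0x202 (514)
    magic=$(dd if="$img" bs=1 skip=514 count=4 2>/dev/null)
    if [ "$magic" != "HdrS" ]; then
        echo "no HdrS magic: pre-2.00 boot protocol or not a kernel image"
        return 1
    fi

    # Version is little-endian at 0x206: minor byte first, then major byte
    minor=$(byte_at "$img" 518)
    major=$(byte_at "$img" 519)

    # loadflags at 0x211 (529); bit 0 set means the image loads at 0x100000
    flags=$(byte_at "$img" 529)
    if [ $((flags & 1)) -eq 1 ]; then
        kind="bzImage, protected-mode code loads at 0x100000"
    else
        kind="zImage, protected-mode code loads at 0x10000"
    fi

    printf 'boot protocol %d.%02d: %s\n' "$major" "$minor" "$kind"
}
```

After sourcing the script, ''lkrn_info bin/gpxe.lkrn'' should print a single line. If the image built from the updated ''lkrnprefix.S'' behaves as described under Tue Jun 17, I would expect it to report version 2.07 and the zImage load address.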