===== Week 7 [ Mon 5 Jul 2010 - Sun 11 Jul 2010 ] =====

==== Day 1 [ Mon 5 Jul 2010 ] ====

Git commit: [[http://git.etherboot.org/?p=people/andreif/gpxe.git;a=commit;h=959381fd2d516e50324047113bf7a5bd160d07e3|959381fd2d516e50324047113bf7a5bd160d07e3]]

Started working on the .transmit routine. As a brief aside, even though this driver hasn't been as much fun to develop as the old one ( it involved a lot more copy and paste ), it has definitely made up for that with a large number of theoretical concepts ( see my previous entries for more details ).

The first thing I ran into today was [[http://en.wikipedia.org/wiki/Large_segment_offload|Large segment offloading]]. LSO works by delegating packet segmentation to the NIC, thus relieving the CPU of the task of splitting large buffers into packets of the appropriate size. This saves CPU cycles and increases performance. I made the connection with a [[http://portal.acm.org/citation.cfm?id=1298483|paper]] I read earlier this year, also related to offloading. Anyway, since gPXE does not support LSO, the related code will be ignored, but it was an interesting concept to read about nevertheless.

After that I removed the old way of representing the circular buffer (using two pointers), cleaned up the dma-tx-related code, and got .transmit done. There isn't much to say about it: you just put the buffer's address into the descriptor, along with the size and the ownership bit. I suppose writing NVREG_TXRXCTL_KICK into the NvRegTxRxControl register makes the NIC re-evaluate the descriptors and send any new packets.

FIXME : forgot to fill the size.
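To make the .transmit description a bit more concrete, here is a minimal, standalone sketch of filling a TX descriptor. It is not the forcedeth code: the struct layout, the ring size, the ''DESC_VALID'' bit position, the length mask and the use of a single ring index are all assumptions made for illustration, and the register write that kicks the NIC is only mentioned in a comment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* All names and values below are hypothetical, chosen for illustration. */
#define TX_RING_SIZE  16u
#define DESC_VALID    (1u << 31)   /* ownership bit: set means the NIC owns the descriptor */
#define DESC_LEN_MASK 0x0000ffffu  /* low bits of flaglen carry the packet length */

struct tx_desc {
	uint32_t buf;      /* bus address of the packet data */
	uint32_t flaglen;  /* packet length combined with flag bits */
};

struct tx_ring {
	struct tx_desc desc[TX_RING_SIZE];
	unsigned int next; /* running count; next slot is next % TX_RING_SIZE */
};

/* Queue one packet. Returns 0 on success, -1 if the ring is full. */
static int tx_enqueue(struct tx_ring *ring, uint32_t bus_addr, size_t len)
{
	struct tx_desc *d = &ring->desc[ring->next % TX_RING_SIZE];

	assert(len > 0 && len <= DESC_LEN_MASK);

	/* The NIC still owns this descriptor, so there is no free slot. */
	if (d->flaglen & DESC_VALID)
		return -1;

	d->buf = bus_addr;
	/* Size and ownership bit go into the same field; leaving the size
	 * out here is exactly the kind of bug noted in the FIXME. */
	d->flaglen = (uint32_t)len | DESC_VALID;
	ring->next++;

	/* A real driver would now write the "kick" bit to the NIC's
	 * control register so that it re-reads the descriptors. */
	return 0;
}
```

In this sketch the ownership bit doubles as the "slot in use" marker, so no separate head and tail pointers are needed to detect a full ring.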
==== Day 2 [ Tue 6 Jul 2010 ] ====

Git commit: [[http://git.etherboot.org/?p=people/andreif/gpxe.git;a=commit;h=f75a876c5036ed50b81a6b7569eadf86c24474d7|f75a876c5036ed50b81a6b7569eadf86c24474d7]]

When I started working on the .poll routine today, I realised that the Linux driver follows the same general principles as the pcnet32 driver. I adapted the pcnet32 code to use the forcedeth descriptors and it turned out pretty well. The Linux driver does some advanced error checking, but I think that is just for reporting stats to userspace, so I'll skip it.

All I have to do now is clean up rx and re-implement the iobuf allocation routine so it can be used both at start up and afterwards, when refilling rx entries.

==== Day 3 [ Wed 7 Jul 2010 ] ====

Git commit: [[http://git.etherboot.org/?p=people/andreif/gpxe.git;a=commit;h=e230901ae95b2abe730f704e302b8b495d8ec344|e230901ae95b2abe730f704e302b8b495d8ec344]]

I started by fixing the .poll routine so that it now processes packets only if interrupts are "signalled" in the NvRegIrqStatus register. I also cleaned up rx and implemented a routine that refills the rx descriptors.

After that, I started testing the NIC. I managed to fix some bugs related to bad initializations, too many descriptors, and the fact that I did not call ''netdev_link_up()''. The problem I have right now is that ''alloc_iob()'' fails and thus the whole ''forcedeth_alloc_rx()'' routine fails. I think I may be overallocating memory somewhere or forgetting to free it. I can't think of any other reason for memory allocation to fail. I didn't get to dig into this too much, but I'm sure tomorrow will prove to be more fruitful.

In other news, Piotr reported a problem with the pcnet32 driver which, fortunately, was easy to fix. Thanks Piotr!
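The idea of one allocation routine that serves both at start up and when refilling rx entries can be sketched as below. This is only an illustration under assumptions: ''malloc()'' stands in for the real buffer allocator, and the ring layout, the sizes and the names ''rx_refill()'' and ''rx_take()'' are hypothetical, not taken from the driver.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical sizes, chosen for illustration only. */
#define RX_RING_SIZE 16u
#define RX_BUF_LEN   1536u

struct rx_ring {
	void *buf[RX_RING_SIZE]; /* NULL marks an empty slot */
};

/* Fill every empty slot with a fresh buffer. Returns the number of slots
 * filled. On allocation failure it stops early and leaves the remaining
 * slots empty, so a later call can simply try again. */
static unsigned int rx_refill(struct rx_ring *ring)
{
	unsigned int i, filled = 0;

	for (i = 0; i < RX_RING_SIZE; i++) {
		if (ring->buf[i] != NULL)
			continue; /* slot already has a buffer */
		ring->buf[i] = malloc(RX_BUF_LEN);
		if (ring->buf[i] == NULL)
			break;
		filled++;
	}
	return filled;
}

/* Remove a received buffer from its slot. Ownership passes to the caller,
 * who must hand it on or free it; the slot becomes eligible for refill. */
static void *rx_take(struct rx_ring *ring, unsigned int idx)
{
	void *b;

	assert(idx < RX_RING_SIZE);
	b = ring->buf[idx];
	ring->buf[idx] = NULL;
	return b;
}
```

Because ''rx_refill()'' only touches empty slots, calling it on a zeroed ring performs the initial allocation, and calling it again after ''rx_take()'' replaces just the buffers that were consumed.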
==== Day 4 [ Thu 8 Jul 2010 ] ====

Git commit: [[http://git.etherboot.org/?p=people/andreif/gpxe.git;a=commit;h=f4ae3fafb3291254e99c378f909599e8930c9431|f4ae3fafb3291254e99c378f909599e8930c9431]]

I did a lot of debugging today, which turned out to be a real PITA for the following reasons:
  * the router I am using causes link failures every now and then. I got tired of it so I'm back to USB sticks
  * I have no serial port, so more than a couple of DBG messages are difficult to handle ( Pause/Break ftw )

Bugs:
  * Too many RX rings ( there were 32, now set to 16 )
  * Forgot to call ''netdev_rx()'', so iobufs were never passed to the upper layers. This was the reason for ''alloc_iob()'' failing, since iobufs were never freed
  * I was using the wrong kind of descriptor in .transmit, so the flaglen field never got set. Thus, the NIC was not sending any packets
  * = instead of ==. I know some people prefer to write if ( 1 == var ) in order to avoid these issues, but IMHO it seriously affects readability.
  * Misinterpretation of the flag field ( I'm still not sure about NV_RX_AVAIL )
  * I wasn't setting up the descriptor rings' physical address correctly

Finding these bugs involved a lot of printf debugging. This took a lot of time, especially because I constantly had to limit the number of DBGs.

After all of this, a minor victory occurred: gPXE managed to send the first DHCP DISCOVER packet (actually two; I'm not sure if this comes from gPXE or if there is something wrong with the driver), _but_ the DHCP server was not replying at all. I forgot to mention that I was using Wireshark all this time to see if there was any traffic on the wire. I looked at the packet sent by gPXE and saw that it contained a BOOTP section. I wasn't sure if DHCP servers are automatically configured to reply to BOOTP packets, so I went back to my dhcp.conf file.

Extract from the .conf:
<code>
# Fixed IP addresses can also be specified for hosts.   These addresses
# should not also be listed as being available for dynamic assignment.
# Hosts for which fixed IP addresses have been specified can boot using
# BOOTP or DHCP.   Hosts for which no fixed address is specified can only
# be booted with DHCP, unless there is an address range on the subnet
# to which a BOOTP client is connected which has the dynamic-bootp flag
# set.
</code>

After adding a host configuration in the .conf, the server finally replied with a DHCP OFFER packet. Still, the NIC remained silent, so now there probably are issues in RX.

==== Day 5 [ Fri 9 Jul 2010 ] ====

me.away()

==== Day 6 [ Sat 10 Jul 2010 ] ====

Git commit: [[http://git.etherboot.org/?p=people/andreif/gpxe.git;a=commit;h=7b189803f76097d4b6b47bb5b90146259b7c8834|7b189803f76097d4b6b47bb5b90146259b7c8834]]

Yet Another Debugging Session. Managed to fix more bugs, but it is still not working properly.

Bugs:
  * When allocating rx descriptors, I was using the original descriptor format instead of the extended one
  * I was not sending received packets to the upper layers correctly. Used ''iob_put'' and fixed a bug where a NULL iobuf would be netdev_rx-ed
  * This one was the most difficult to fix. Last time, I left off with the server replying with a DHCP OFFER packet and the NIC remaining silent after that. I thought it was an RX issue. However, after finding and fixing some RX bugs, the problem remained. Running out of ideas, I tried sending another kind of packet to the NIC (i.e., not a DHCP OFFER). Lo and behold, an ARP packet was received properly by the NIC. The ARP packet was a broadcast, whereas the DHCP OFFER destination was a unicast address. Clearly, the NIC did unwanted filtering.
Digging through the registers, I set the NIC to promiscuous mode and it replied to the DHCP server.

{{:soc:2010:andreif:journal:forcedet-buggy1.png?1000|forcedeth driver first packets}}

  * Finally got NV_RX_AVAIL and NV_TX_VALID right

Eventually, the above process stops with a TX overflow error. The NIC is not sending out packets properly, even though the flaglen field is marked with NV_TX_VALID. I fiddled with the code some more and right now it is unreliable: sometimes it sends nothing at all, and sometimes it sends the first two DISCOVER packets but the server does not reply. Also, notice in the above picture that there are some malformed packets (the white ones). I've yet to figure out what is causing these issues.

==== Day 7 [ Sun 11 Jul 2010 ] ====

Git commit: [[http://git.etherboot.org/?p=people/andreif/gpxe.git;a=commit;h=d6dd69ea626d2033f51f18673947148b63530087|d6dd69ea626d2033f51f18673947148b63530087]]

The only significant improvement made today is that the driver now consistently transmits packets (eventually stopping with an overflow error).

Bugs:
  * I started by comparing the PHY register values with the old driver. One of the registers had a different value than expected, and this was because I forgot to initialize a local variable in the ''phy_init()'' routine.
  * I also forgot to pad small packets. Used ''iob_pad'' for this.
  * Comparing the flaglen field of tx packets, I noticed that the old driver has an additional bit set. Setting it didn't make any difference, but I am keeping it for now and will try to remove it later, after the driver works.

I got some good suggestions in today's meeting regarding debugging, so I'll try them out tomorrow.
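As an illustration of the padding fix from today's bug list: an Ethernet frame must be at least 60 bytes long before the 4-byte FCS is appended, so shorter frames have to be padded before they are handed to the NIC. The sketch below shows the idea on a plain byte buffer. It is not gPXE's ''iob_pad'' implementation; the function name ''pad_frame()'' and its signature are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Minimum Ethernet frame length, excluding the 4-byte FCS. */
#define MIN_FRAME_LEN 60u

/* Zero-pad the frame held in buf (total capacity cap bytes) up to the
 * minimum frame length. Returns the possibly increased frame length.
 * Hypothetical helper for illustration only. */
static size_t pad_frame(unsigned char *buf, size_t len, size_t cap)
{
	assert(cap >= MIN_FRAME_LEN && len <= cap);

	if (len < MIN_FRAME_LEN) {
		/* Zero the padding so stale buffer contents are not sent. */
		memset(buf + len, 0, MIN_FRAME_LEN - len);
		len = MIN_FRAME_LEN;
	}
	return len;
}
```

A 42-byte ARP frame, for example, would grow to 60 bytes, while frames that are already long enough are left untouched.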