====== Michael Decker: Driver Development ======

==== Week 7 ====

----

=== 9 July ===

A new branch, ''drivers6'', was created and merged with the mainline via ''git pull origin master'', which brought the GDB code into my tree. While experimenting with GDB, I saw a segfault reported just after the point where gPXE had been freezing during the second NIC boot. I ran a backtrace:

<file>
Program received signal SIGSEGV, Segmentation fault.
alloc_memblock (size=96, align=<value optimized out>) at include/gpxe/list.h:64
64              __list_add ( new, head, head->next );
(gdb) backtrace
#0  alloc_memblock (size=96, align=<value optimized out>) at include/gpxe/list.h:64
#1  0x00007cd1 in realloc (old_ptr=0x0, new_size=80) at core/malloc.c:265
#2  0x00007d2f in zalloc (size=96) at core/malloc.c:332
#3  0x0000814b in resolv (resolv=0x78a8, name=0xf "Ãë\t\017¾CÿèR", sa=0x33ad8) at core/resolv.c:260
#4  0x0000823b in xfer_open_named_socket (xfer=0x784c, semantics=208084, peer=0x33ad8, name=0x13356 "192.168.2.8", local=0x0) at core/resolv.c:389
#5  0x00005f64 in http_open_filter (xfer=0x12de8, uri=0x13324, default_port=80, filter=0) at net/tcp/http.c:501
#6  0x00012a70 in mtftp_uri_opener ()
#7  0x00012de8 in heap ()
#8  0x00005fc9 in http_open (xfer=0x60, uri=0xf) at net/tcp/http.c:527
#9  0x00000000 in ?? ()
</file>

I'm not sure why the segfault occurred, although I do see that the ''name'' parameter passed to ''resolv()'' in frame #3 is not valid. Marty recommended I install wireshark and take a look at what's happening. Additionally, testing at his end showed two different NICs failing iSCSI booting but passing HTTP booting. I haven't tried iSCSI booting yet, so I'll need to set this up to reproduce the errors he's seeing. In the meantime, analyzing wireshark output should show any problems with rx and tx during HTTP booting. I may also play with GDB a bit more to figure out what's going on, but for now I need to narrow the bug down to something more specific.
=== 10 July ===

This morning I installed wireshark and inspected the packets exchanged during HTTP boot. I found a number of duplicate transmissions (including duplicated TCP sequence numbers), which suggested that something was wrong in the transmit path. I added a few debug lines to ''ifec_tx_wake()'':

<file>
void ifec_tx_wake ( struct net_device *netdev )
{
	struct ifec_private *priv = netdev->priv;
	unsigned long ioaddr = priv->ioaddr;
	struct ifec_active *a = priv->active;
	struct ifec_tcb *tcb = a->tcb_head->next;

	/* For the special case of the first transmit, we issue a START. The
	 * card won't RESUME after the configure command.
	 */
	if ( a->configured ) {
		a->configured = 0;
		ifec_scb_cmd ( netdev, virt_to_bus ( tcb ), CUStart );
		ifec_scb_cmd_wait ( netdev );
		return;
	}

	/* if not suspended, and all other tcbs have suspend flag clear, do NOT clear
	 * the suspend flag. if you do, it will enter a bad state. we need a tcb with
	 * a suspend flag set in the tx ring at all times.
	 */

	/* Resume if suspended. */
	switch ( ( inw ( ioaddr + SCBStatus ) >> 6 ) & 0x3 ) {
	case 0:  /* Idle - We should not reach this state. */
		DBG ( "ifec_net_transmit: tx idle!\n" );
		ifec_scb_cmd ( netdev, virt_to_bus ( tcb ), CUStart );
		ifec_scb_cmd_wait ( netdev );
		break;
	case 1:  /* Suspended */
		DBG ( "s" ); //ifec_net_transmit: tx suspended : resume issued\n" );
		ifec_scb_cmd_wait ( netdev );
		outl ( 0, ioaddr + SCBPointer );
		a->tcb_head->command &= ~CmdSuspend;
		/* Immediately issue Resume command */
		outb ( CUResume, ioaddr + SCBCmd );
		ifec_scb_cmd_wait ( netdev );
		break;
	default: /* Active */
		DBG ( "a" );
		a->tcb_head->command &= ~CmdSuspend;
	}
}
</file>

This way I could see what state the Command Unit (CU) was in prior to each tx. Comparing this debug output with the wireshark capture, I found that every instance of an 'a' coincided with a duplicate packet transmission. The same packet being transmitted twice is odd, because the driver is set up to write into the next TCB in the tx ring on each transmit call.
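As a side note, the state test in the ''switch'' above can be pulled out into a small standalone helper, which makes the meaning of the 's' and 'a' markers easier to follow. The following is only a minimal sketch: the names ''cu_state'', ''cu_state_from_status()'' and ''cu_state_marker()'' are hypothetical and not part of the driver, and, like the ''default'' branch above, it treats every value other than 0 and 1 as "active".

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names; the driver itself switches on the raw field value. */
enum cu_state { CU_IDLE, CU_SUSPENDED, CU_ACTIVE };

/* Extract the two-bit CU state field (bits 7:6) from an SCB status word,
 * using the same shift and mask as ifec_tx_wake(). */
static enum cu_state cu_state_from_status ( uint16_t scb_status )
{
	switch ( ( scb_status >> 6 ) & 0x3 ) {
	case 0:  return CU_IDLE;
	case 1:  return CU_SUSPENDED;
	default: return CU_ACTIVE;   /* assumption: 2 and 3 both mean "active" */
	}
}

/* Map a state to a one-character debug marker. 's' and 'a' match the
 * journal's output; 'i' is an invented marker for the idle case. */
static char cu_state_marker ( enum cu_state state )
{
	switch ( state ) {
	case CU_IDLE:      return 'i';
	case CU_SUSPENDED: return 's';
	default:           return 'a';
	}
}

/* Usage example: decode a few hand-built status words. */
static void cu_state_selftest ( void )
{
	assert ( cu_state_from_status ( 0x0000 ) == CU_IDLE );
	assert ( cu_state_from_status ( 0x0040 ) == CU_SUSPENDED );
	assert ( cu_state_from_status ( 0x0080 ) == CU_ACTIVE );
	assert ( cu_state_from_status ( 0x00C0 ) == CU_ACTIVE );
	assert ( cu_state_marker ( CU_SUSPENDED ) == 's' );
}
```

Bits outside 7:6 (for example the interrupt bits in the upper byte) do not affect the result, so the helper can be fed the raw value read from ''SCBStatus''.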
I added a debug line in ''ifec_net_transmit()'':

<file>
static int ifec_net_transmit ( struct net_device *netdev,
                               struct io_buffer *iobuf )
{
	struct ifec_private *priv = netdev->priv;
	unsigned long ioaddr = priv->ioaddr;
	struct ifec_active *a = priv->active;
	struct ifec_tcb *tcb = a->tcb_head->next;
	unsigned short status;

	/* Wait for TCB to become available. */
	if ( tcb->status || tcb->iob ) {
		DBGP ( "TX overflow\n" );
		return -ENOBUFS;
	}

	status = inw ( ioaddr + SCBStatus );
	/* Acknowledge all of the current interrupt sources ASAP. */
	outw ( status & 0xfc00, ioaddr + SCBStatus );

	DBGIO ( "transmitting packet (%d bytes). status = %hX, cmd=%hX\n",
	        iob_len ( iobuf ), status, inw ( ioaddr + SCBCmd ) );
	DBGIO_HD ( iobuf->data, iob_len ( iobuf ) );

	tcb->command   = CmdSuspend | CmdTx | CmdTxFlex;
	tcb->count     = 0x01208000;
	tcb->tbd_addr0 = virt_to_bus ( iobuf->data );
	tcb->tbd_size0 = 0x3FFF & iob_len ( iobuf );
	tcb->iob       = iobuf;

	DBG ( "%i", tcb - a->tcbs );
	DBGIO ( "tcb: \n" );
	DBGIO_HD ( tcb, sizeof ( *tcb ) );

	ifec_tx_wake ( netdev );

	/* Append to end of ring. */
	a->tcb_head = tcb;

	return 0;
}
</file>

The line ''DBG ( "%i", tcb - a->tcbs );'' prints the index of the current TCB in the tx ring. The debug output showed proper circulation from 0 through 3 and back to 0, repeatedly. However, with this line in place, wireshark showed no duplicates at all! From this behavior, I concluded that the time taken to print the debug output at that point prevents the 'a' condition from ever occurring, which in turn prevents the duplication bug. The 'a' condition is the CU being in the active state, which happens when a transmit request arrives before the card has finished processing the previous tx. Thus, I have now nailed down at least //one// bug, and I can work on determining what's going wrong.
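The index printed by ''tcb - a->tcbs'' comes from plain pointer subtraction against the base of the TCB array. Below is a minimal sketch of that idea on a four-entry ring. The ''struct tcb'' here is a stripped-down stand-in, not the driver's ''struct ifec_tcb''; ''RING_LEN'', ''ring_init()'' and ''ring_index()'' are hypothetical names, and the ring length of 4 is an assumption based on the 0 through 3 output described above.

```c
#include <assert.h>
#include <stddef.h>

#define RING_LEN 4   /* assumption: the debug output cycled 0..3 */

/* Stand-in for a transmit control block; only the ring link is modelled. */
struct tcb {
	struct tcb *next;
};

/* Link n array elements into a circular ring: the last points to the first. */
static void ring_init ( struct tcb *tcbs, size_t n )
{
	size_t i;

	assert ( n > 0 );
	for ( i = 0 ; i < n ; i++ )
		tcbs[i].next = &tcbs[ ( i + 1 ) % n ];
}

/* Index of an element within its array, via pointer subtraction. */
static ptrdiff_t ring_index ( const struct tcb *base, const struct tcb *t )
{
	return t - base;
}
```

Starting with the head at the last element and repeatedly taking ''head->next'', as ''ifec_net_transmit()'' does, yields indices 0, 1, 2, 3, 0, and so on. Note that the result of pointer subtraction has type ''ptrdiff_t'', so portable code would cast it (or use a matching format specifier) before printing.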

