====== Differences ======
This shows you the differences between two versions of the page.

soc:2009:asdlkf:journal:week5 [2009/06/23 22:00]
asdlkf created
soc:2009:asdlkf:journal:week5 [2009/06/29 16:42] (current)
asdlkf
Line 16: Line 16:
 Unfortunately, today I came home and saw that my computer was frozen trying to enumerate its RAM. After some basic troubleshooting and memtest86, I found that my main desktop machine has some faulty RAM in it. Hopefully I'll be able to replace the RAM in store tomorrow... 4x 2GB sticks of Corsair Dominator 1066 isn't cheap...
  
-All I'll be able to get done today (well, the rest of today) is finish this journal post. However, tonight I initialized a complete build/test environment in my laptop and downloaded my git tree there. Hopefully this will allow me to continue to work and minimize my down time to the time it took me to setup this laptop.
 +All I'll be able to get done today (well, the rest of today) is finish this journal post. However, tonight I initialized a complete build/test environment on my laptop and downloaded my git tree there.
 +Hopefully this will allow me to continue working and limit my downtime to the time it took me to set up this laptop.
 + 
 +June 24: 
 + 
 +Everything is fixed! My computer(s) are back online and ready to go.  
 + 
 +Today I began by attempting to trace the source of the memory error I'm experiencing.\\ 
 +I started by inserting several DBGP messages to follow program flow. Unfortunately, after several hours, I've made no "actual" progress.
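
A marker of this kind can be made self-describing, so that each message names the function and line it came from instead of relying on hand-numbered strings. The sketch below is a minimal illustration in plain C, not gPXE code: `trace_format`, `TRACE_HERE` and `example_step` are hypothetical names, and it prints through standard `fprintf` to `stderr` rather than through gPXE's DBGP.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical helper: format a "function:line" marker into buf.
 * Returns the number of characters snprintf would have written. */
static int trace_format(char *buf, size_t len, const char *func, int line)
{
    return snprintf(buf, len, "%s:%d", func, line);
}

/* Hypothetical macro: print a marker for the current function and line.
 * stderr is unbuffered by default, so the marker is emitted immediately
 * even if the program stops right afterwards. */
#define TRACE_HERE()                                          \
    do {                                                      \
        char trace_buf_[128];                                 \
        trace_format(trace_buf_, sizeof(trace_buf_),          \
                     __func__, __LINE__);                     \
        fprintf(stderr, "%s\n", trace_buf_);                  \
    } while (0)

/* Example function instrumented at entry and just before return. */
static int example_step(int x)
{
    TRACE_HERE();
    x = x * 2;
    TRACE_HERE();
    return x;
}
```

Ending every marker with a newline also keeps consecutive messages from running together on one line.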
 + 
 +June 25: 
 + 
 +I have made some headway, but still have not resolved the memory issue. 
 + 
 +This is the output the code produces as it executes:\\ 
 +<​code>​ 
 +skge_probe - start 
 +     ​skge_probe - middle: addr 0xde258b10 irq 10 chip 0x0 rev 0 
 +     ​ll_addr[i]:​ 00:​21:​91:​91:​10:​6dskge_initialize - start 
 +skge_perform_software_reset() 
 +     ​initialize -> removing error bits 
 +skge_enable_test_mode - 1 
 +skge_enable_test_mode - 0 
 +chip id: 177 
 +chip id: 10 
 +chip id: 176 
 +chip id: 178 
 +     ​initialize -> chip id: MARV: 0xb1 
 +     ​initialize -> ram_size ​ : 65536 
 +     ​initialize -> ram_offset: 0 
 +     ​initialize -> wasn't genesis 
 +     ​initialize -> Clearing error bits 
 +     ​initialize -> Performing reset 
 +     ​initialize -> Stopping card 
 +     ​initialize -> Turning LED on 
 +     ​initialize -> Enabling arbiter 
 +     ​initialize -> Setting timeout init values 
 +     ​initialize -> Setting clock values 
 +skge_usecs2clk 
 +hwkhz 
 +     ​initialize -> Resetting each port 
 +yukon_reset start 
 +yukon_reset end 
 +skge initialize - end 
 +port: 0skge_probe - end - return 0 
 + 
 + 
 + 
 +gPXE 0.9.7+ -- Open Source Boot Firmware -- http://​etherboot.org 
 +Features: HTTP DNS TFTP AoE iSCSI bzImage COMBOOT ELF Multiboot PXE PXEXT 
 + 
 +skge_open 
 +skge net0: enabling interface 
 +skge_ring_alloc 1 
 +skge_ring_alloc 2 
 +skge_ring_alloc 3 - ring->​start = 97264 
 +skge_ring_alloc 3 - ring->​count = 6 
 +skge_ring_alloc 3 - vaddr = 0 
 +                skge_ring_alloc 3 - i = 0 
 +                skge_ring_alloc 3 - e = 97264 
 +                skge_ring_alloc 3 - d = 0 
 +                skge_ring_alloc 3 - i = 1 
 +                skge_ring_alloc 3 - e = 97284 
 +                skge_ring_alloc 3 - d = 32 
 +                skge_ring_alloc 3 - i = 2 
 +                skge_ring_alloc 3 - e = 97304 
 +                skge_ring_alloc 3 - d = 64 
 +                skge_ring_alloc 3 - i = 3 
 +                skge_ring_alloc 3 - e = 97324 
 +                skge_ring_alloc 3 - d = 96 
 +                skge_ring_alloc 3 - i = 4 
 +                skge_ring_alloc 3 - e = 97344 
 +                skge_ring_alloc 3 - d = 128 
 +                skge_ring_alloc 3 - i = 5 
 +                skge_ring_alloc 3 - e = 97364 
 +                skge_ring_alloc 3 - d = 160 
 +skge_ring_alloc 4 
 +skge_ring_alloc 5 
 +Function: skge_rx_fill - 
 +Function: skge_rx_fill - end 
 +here 0009skge_ring_alloc 1 
 +skge_ring_alloc 2 
 +skge_ring_alloc 3 - ring->​start = 97392 
 +skge_ring_alloc 3 - ring->​count = 6 
 +skge_ring_alloc 3 - vaddr = 192 
 +                skge_ring_alloc 3 - i = 0 
 +                skge_ring_alloc 3 - e = 97392 
 +                skge_ring_alloc 3 - d = 192 
 +                skge_ring_alloc 3 - i = 1 
 +                skge_ring_alloc 3 - e = 97412 
 +                skge_ring_alloc 3 - d = 224 
 +                skge_ring_alloc 3 - i = 2 
 +                skge_ring_alloc 3 - e = 97432 
 +                skge_ring_alloc 3 - d = 256 
 +                skge_ring_alloc 3 - i = 3 
 +                skge_ring_alloc 3 - e = 97452 
 +                skge_ring_alloc 3 - d = 288 
 +                skge_ring_alloc 3 - i = 4 
 +                skge_ring_alloc 3 - e = 97472 
 +                skge_ring_alloc 3 - d = 320 
 +                skge_ring_alloc 3 - i = 5 
 +                skge_ring_alloc 3 - e = 97492 
 +                skge_ring_alloc 3 - d = 352 
 +skge_ring_alloc 4 
 +skge_ring_alloc 5 
 +here 0011yukon_mac_init - Not Yukon Lite   - Not Yukon Lite   - Autoneg disabled - half duplex 
 +Function: yukon_init - start 
 +skge à: phy read timeout port 0 reg 0 val 0 
 +Function: yukon_init - end 
 +yukon_mac_init - endhere 0002 
 +adapter : 96796 
 +rxqaddr : 82636 
 +port    : 0 
 +ram_addr: -65281 
 +chunk: 8454017 
 + 
 +</​code>​ 
 + 
 +At this point (directly following the "chunk" output), execution halts and the system becomes non-responsive.
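
The last values in the log are easier to judge in hexadecimal. The sketch below assumes that `ram_addr` and `chunk` are 32-bit quantities that were printed with `%d`; the helper names `as_u32`, `show` and `show_logged_values` are made up for this illustration.

```c
#include <stdint.h>
#include <stdio.h>

/* Reinterpret a value that was printed with %d as the unsigned
 * 32-bit quantity it is assumed to have been. */
static uint32_t as_u32(int32_t printed)
{
    return (uint32_t)printed;
}

/* Print one logged value in signed, unsigned and hex form. */
static void show(const char *name, int32_t printed)
{
    uint32_t raw = as_u32(printed);
    printf("%-8s %11ld  %10lu  0x%08lx\n", name,
           (long)printed, (unsigned long)raw, (unsigned long)raw);
}

/* Values copied from the log above. */
static void show_logged_values(void)
{
    show("ram_addr", -65281);
    show("chunk", 8454017);
    show("ram_size", 65536);
}
```

Read this way, `ram_addr` is 0xFFFF00FF and `chunk` is 0x0080FF81, both far larger than the reported `ram_size` of 65536 (0x10000). That hints that these values may already have been wrong before `skge_ramset` was called, although the log alone does not prove it.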
 + 
 +I managed to narrow the halt down to a single point in the source code; however, I find it hard to believe that execution halts between two successive DBGP() statements.
 + 
 +Lines 637, 638, and 639 of skge.c: 
 + 
 +<​code>​ 
 + ​637 ​        ​DBGP("​chunk:​ %d\n",​chunk);​ 
 + ​638 ​        ​skge_ramset(adapter,​ rxqaddr[port],​ ram_addr, chunk); 
 + ​639 ​        ​DBGP("​here 0003\n"​);​ 
 + 
 +</​code>​ 
 + 
 +and, subsequently,​ the first few lines in the definition of skge_ramset:​ 
 +<​code>​ 
 + 
 + 415 static void skge_ramset(struct skge_adapter *hw, u16 q, u32 start, size_t len) { 
 + ​416 ​        u32 end; 
 + ​417 ​        ​DBGP("​skge_ramset - start"​);​ 
 + 
 +</​code>​ 
 + 
 + 
 +Thus, I'm very concerned by the fact that the output does *NOT* look like this... 
 +<​code>​ 
 +[...] 
 +chunk: 8454017
 +skge_ramset - start 
 +[...] 
 +</​code>​ 
 +After all, these are basically consecutive lines of execution (ignoring the function call boundary).
 + 
 + 
 +Also, I sent an email to gsoc-mentors-2009 today; its contents read (briefly) as:
 + 
 +<​code>​ 
 +when built as "make bin/skge.pxe DEBUG=skge:7", output is http://pxe.asdlkf.net/single.txt, and execution halts
 +when built as "make bin/skge--rtl8139.pxe DEBUG=skge:7", execution ... works?
 +</​code>​ 
 + 
 + 
 +June 26: <day taken off, friends over> 
 + 
 + 
 +June 27: 
 + 
 +Looking more closely into the output of the two commands run at the end of the 25th (the two different ways of building skge), the mentor email list pointed me towards "vaddr" being 0. This absolutely makes sense as a probable cause for a crash.
 +I will look more directly into following this variable in both versions of the executable as soon as my meeting is over today.
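
To see what a "vaddr" of 0 means for the ring layout, the address pattern from the June 25 log can be reproduced with plain arithmetic. This is only a sketch: the two strides are read off the log (consecutive "e" values differ by 20, consecutive "d" values by 32), not taken from skge.c, and `element_addr`, `desc_addr` and `print_ring` are hypothetical names.

```c
#include <stdint.h>
#include <stdio.h>

/* Strides read off the log, not taken from skge.c. */
#define ELEMENT_STRIDE 20u   /* step between successive "e" values */
#define DESC_STRIDE    32u   /* step between successive "d" values */

/* Address of the i-th ring element, given the ring start. */
static uint32_t element_addr(uint32_t start, uint32_t i)
{
    return start + i * ELEMENT_STRIDE;
}

/* Address of the i-th descriptor, given the descriptor base (vaddr). */
static uint32_t desc_addr(uint32_t vaddr, uint32_t i)
{
    return vaddr + i * DESC_STRIDE;
}

/* Re-create the "skge_ring_alloc 3" lines for one ring. */
static void print_ring(uint32_t start, uint32_t vaddr, uint32_t count)
{
    for (uint32_t i = 0; i < count; i++) {
        printf("skge_ring_alloc 3 - i = %lu\n", (unsigned long)i);
        printf("skge_ring_alloc 3 - e = %lu\n",
               (unsigned long)element_addr(start, i));
        printf("skge_ring_alloc 3 - d = %lu\n",
               (unsigned long)desc_addr(vaddr, i));
    }
}
```

Calling `print_ring(97264, 0, 6)` and `print_ring(97392, 192, 6)` gives the same i/e/d numbers as the two rings in the log. With a base of 0, the first ring's descriptors sit at addresses 0 through 160 and the second ring starts right behind them at 192, which is consistent with the descriptor memory never having been given a real address.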
 + 
 + 
 +... Later that day... 
 + 
 +So, vaddr was just a symptom. 
 + 
 +I spent about 90 minutes with MCB30 and AndyTim trying to diagnose what was going on; in the end, one specific line stood out:
 + 
 +<​code>​ 
 +       ​netdev = alloc_etherdev (sizeof (*adapter));​ 
 +</​code>​ 
 + 
 +Hmm...... It COULD have something to do with the fact that netdev is being allocated the size of *adapter... 
 + 
 + 
 +/GROAN 
 + 
 +Ok, so, better. Things appear to be running smoothly. 
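
The journal does not spell out the actual fix, so the following is only a sketch of the general class of bug hinted at here: a private area sized for one structure and then used as a different, larger one. Every name in it (`small_state`, `large_state`, `fake_netdev`, `fake_alloc`, `priv_fits`) is a hypothetical stand-in, not a gPXE or skge definition.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for two driver structures of different size. */
struct small_state { unsigned long regs; };
struct large_state { unsigned long regs; unsigned char rings[256]; };

/* Hypothetical device object: a header followed by priv_len bytes of
 * private data, loosely modelled on the alloc_etherdev(size) pattern. */
struct fake_netdev {
    size_t priv_len;
    unsigned char priv[];   /* flexible array member (C99) */
};

/* Allocate a zeroed device with priv_len bytes of private data. */
static struct fake_netdev *fake_alloc(size_t priv_len)
{
    struct fake_netdev *dev = calloc(1, sizeof(*dev) + priv_len);
    if (dev)
        dev->priv_len = priv_len;
    return dev;
}

/* Returns 1 if the private area can hold an object of `need` bytes. */
static int priv_fits(const struct fake_netdev *dev, size_t need)
{
    return dev->priv_len >= need;
}
```

A device allocated with `fake_alloc(sizeof(struct small_state))` fails `priv_fits(dev, sizeof(struct large_state))`. In real code without such a check, writes through a pointer to the larger type would run past the allocation and silently corrupt whatever lies behind it.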
 + 
 + 
 +June 28: Taken off, Helping a friend paint a wall. Then watching it dry. Then painting it again. 
  
 -- Chris
 +
