Brandeburg, Jesse
2009-04-06 17:36:06 UTC
Hi Jesper,
I have a 2.6.27.20 system in production; the e1000 drivers seem pretty
"noisy", although everything appears to work excellently.
Well, nice to hear it's working, but weird about the messages.
dmesg here: http://krogh.cc/~jesper/dmesg-ko-2.6.27.20.txt
[476197.380486] e1000: eth3: e1000_clean_tx_irq: Detected Tx Unit Hang
[476197.380488] Tx Queue <0>
[476197.380489] TDH <c>
[476197.380490] TDT <63>
[476197.380490] next_to_use <63>
[476197.380491] next_to_clean <b>
[476197.380491] buffer_info[next_to_clean]
[476197.380492] time_stamp <10717579a>
[476197.380492] next_to_watch <f>
[476197.380493] jiffies <107175a3e>
[476197.380494] next_to_watch.status <0>
The system has been up for 14 days, but the dmesg buffer has already
overflowed with these.
I looked at your dmesg and it appears that there is never a
NETDEV_WATCHDOG message, which would normally indicate that the driver
isn't resetting itself out of the problem. Does ethtool -S eth3 show any
tx_timeout_count ?
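For anyone reading along, here is a rough sketch of how the hex fields in the hang dump quoted above can be decoded. The helper names (`parse_hang_dump`, `summarize`) are made up for illustration, and the ring size of 256 is an assumption (it only matters if the tail has wrapped past the head). The age is reported in jiffies only, since the HZ value of this kernel isn't stated in the thread.

```python
import re

# Fields copied from the "Detected Tx Unit Hang" dump quoted above (all hex).
DUMP = """\
TDH <c>
TDT <63>
next_to_use <63>
next_to_clean <b>
time_stamp <10717579a>
next_to_watch <f>
jiffies <107175a3e>
"""


def parse_hang_dump(text):
    """Return {field: int} for every 'name <hex>' pair found in the text."""
    fields = {}
    for name, value in re.findall(r"(\w+)\s+<([0-9a-fA-F]+)>", text):
        fields[name] = int(value, 16)
    return fields


def summarize(fields, ring_size=256):
    """Return (descriptors between head and tail, age of oldest buffer in jiffies).

    ring_size is an assumption, not something the dump reports.
    """
    # TDH is the hardware head and TDT the tail; descriptors in between
    # have been handed to the hardware but not yet processed by it.
    pending = (fields["TDT"] - fields["TDH"]) % ring_size
    # How long the buffer at next_to_clean has been waiting, in jiffies.
    age_jiffies = fields["jiffies"] - fields["time_stamp"]
    return pending, age_jiffies


if __name__ == "__main__":
    pending, age = summarize(parse_hang_dump(DUMP))
    print(f"pending descriptors: {pending}, age: {age} jiffies")
```

For this dump that works out to 87 descriptors outstanding and a buffer that has been waiting 676 jiffies, which fits the hardware having stopped transmitting while the driver kept queueing.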
Configuration is a 4 x 1GbitE bond, all with Intel NICs:
06:01.0 Ethernet controller: Intel Corporation 82546EB Gigabit Ethernet
Controller (Copper) (rev 03)
06:01.1 Ethernet controller: Intel Corporation 82546EB Gigabit Ethernet
Controller (Copper) (rev 03)
06:02.0 Ethernet controller: Intel Corporation 82546EB Gigabit Ethernet
Controller (Copper) (rev 03)
06:02.1 Ethernet controller: Intel Corporation 82546EB Gigabit Ethernet
Controller (Copper) (rev 03)
Are you doing testing with the remote end of this link? I'm wondering if
something changed in the kernel that is causing remote link-down events to
not stop the tx queue (our hardware just completely stops in its tracks
w.r.t. tx when link goes down).