Discussion:
Fake TX hangs
Tal Abudi
2015-09-22 06:36:38 UTC
Hi All
My system is experiencing strange fake TX hangs. I'm running ixgbe
3.9.15 on a modified 2.6.18 Linux (with multi queue enabled, Per TX
queue lock).

ixgbe 0000:82:00.1: eth0: Fake Tx hang detected with timeout of 5 seconds
NETDEV WATCHDOG: eth0: transmit timed out
ixgbe 0000:82:00.1: eth0: Fake Tx hang detected with timeout of 10 seconds
NETDEV WATCHDOG: eth0: transmit timed out
ixgbe 0000:82:00.1: eth0: Fake Tx hang detected with timeout of 20 seconds

And this keeps going on.
The messages are from ixgbe_tx_timeout() which is invoked from
dev_watchdog() (sch_generic.c)

I instrumented the kernel and the ixgbe driver and found that
ixgbe_maybe_stop_tx() stops a Tx queue in ixgbe_xmit_frame_ring().

Please help me figure out where the queue is restarted or the device is restarted.
ethtool -S shows tx_restart_queue as 0.
I'm running a Spirent Avalanche test so it's quite consistent.
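
To make the stop/restart question concrete, here is a toy user-space model of one Tx descriptor ring. It is a sketch under stated assumptions, not ixgbe code: the class name, ring size, and wake threshold are hypothetical, and the only idea it shows is that a stopped queue is woken again (and a restart counter incremented) when completed descriptors are reclaimed.

```python
from dataclasses import dataclass


@dataclass
class TxRingModel:
    """Toy model of one Tx descriptor ring (hypothetical, not driver code)."""
    size: int = 8             # total descriptors (assumed value)
    wake_threshold: int = 4   # free descriptors required before waking (assumed value)
    in_flight: int = 0        # descriptors handed to the hardware, not yet completed
    stopped: bool = False
    restart_queue: int = 0    # stands in for the tx_restart_queue counter

    def unused(self) -> int:
        return self.size - self.in_flight

    def xmit(self, needed: int) -> bool:
        """Queue a frame needing `needed` descriptors; stop the queue if it does not fit."""
        if self.stopped or self.unused() < needed:
            self.stopped = True
            return False
        self.in_flight += needed
        return True

    def clean(self, completed: int) -> None:
        """Reclaim completed descriptors and wake a stopped queue if enough are free."""
        self.in_flight -= min(completed, self.in_flight)
        if self.stopped and self.unused() >= self.wake_threshold:
            self.stopped = False
            self.restart_queue += 1


ring = TxRingModel()
sent = [ring.xmit(2) for _ in range(5)]   # the fifth frame no longer fits
print(sent, ring.stopped, ring.restart_queue)
ring.clean(4)                             # completions arrive, the queue is woken
print(ring.stopped, ring.restart_queue)
```

In this model, a queue that is stopped and never sees a completion stays stopped with the restart counter at 0, which is the same shape as the tx_restart_queue: 0 reading reported above.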

Any leads ?
Thanks !
--
Best regards,
Tal Abudi

------------------------------------------------------------------------------
Skidmore, Donald C
2015-09-22 17:12:12 UTC
Hey Tal Abudi,

The Fake Tx hang message means that the stack is trying to reset the driver because it "thinks" we are hung, but the driver doesn't believe there is anything it can transmit. This is most often caused by excessive flow control or a faulty switch. What do your ethtool stats show? You might also want to test with FC disabled (assuming you have it enabled currently) to see if the messages go away.
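
The distinction described above can be sketched as a small model. This is an illustration under assumptions, not the driver's implementation: the function name and the boolean input are hypothetical, and the doubling of the timeout is inferred only from the 5, 10, 20 second progression in the posted log.

```python
from typing import List, Sequence


def watchdog_events(initial_timeout_s: int, ring_has_stuck_work: Sequence[bool]) -> List[str]:
    """Model successive watchdog timeouts.

    Each entry in `ring_has_stuck_work` says whether, at that timeout, a Tx ring
    really had pending work that was not completing (hypothetical input).
    """
    timeout = initial_timeout_s
    events = []
    for stuck in ring_has_stuck_work:
        if stuck:
            events.append("real Tx hang: reset the adapter")
        else:
            events.append(f"Fake Tx hang detected with timeout of {timeout} seconds")
            # Assumption inferred from the log: the timeout doubles after a fake hang.
            timeout *= 2
    return events


for line in watchdog_events(5, [False, False, False]):
    print(line)
```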

Thanks,
_______________________________________________
E1000-devel mailing list
https://lists.sourceforge.net/lists/listinfo/e1000-devel
To learn more about Intel® Ethernet, visit
http://communities.intel.com/community/wired
------------------------------------------------------------------------------
Tal Abudi
2015-10-07 15:43:38 UTC
Hi
It also happens on a different set of hardware, and Flow Control is off in
this scenario.
Any leads ?
Thanks !

NIC statistics:
rx_packets: 18830758
tx_packets: 37113
rx_bytes: 3524962567
tx_bytes: 2171019
rx_errors: 0
tx_errors: 0
rx_dropped: 0
tx_dropped: 0
multicast: 0
collisions: 0
rx_over_errors: 0
rx_crc_errors: 0
rx_frame_errors: 0
rx_fifo_errors: 0
rx_missed_errors: 0
tx_aborted_errors: 0
tx_carrier_errors: 0
tx_fifo_errors: 0
tx_heartbeat_errors: 0
rx_pkts_nic: 28183307
tx_pkts_nic: 38626
rx_bytes_nic: 5423242790
tx_bytes_nic: 2606640
lsc_int: 0
tx_busy: 0
non_eop_descs: 0
broadcast: 3735
rx_no_buffer_count: 0
tx_timeout_count: 0
tx_restart_queue: 0
rx_long_length_errors: 0
rx_short_length_errors: 0
tx_flow_control_xon: 0
rx_flow_control_xon: 0
tx_flow_control_xoff: 0
rx_flow_control_xoff: 0
rx_csum_offload_errors: 0
alloc_rx_page_failed: 0
alloc_rx_buff_failed: 10177
rx_no_dma_resources: 9352549
hw_rsc_aggregated: 0
hw_rsc_flushed: 0
fdir_match: 0
fdir_miss: 0
fdir_overflow: 0
tx_queue_0_packets: 11184
tx_queue_0_bytes: 625603
tx_queue_1_packets: 11517
tx_queue_1_bytes: 689431
tx_queue_2_packets: 14235
tx_queue_2_bytes: 847527
tx_queue_3_packets: 177
tx_queue_3_bytes: 8458
rx_queue_0_packets: 7040617
rx_queue_0_bytes: 1330091185
rx_queue_1_packets: 11790141
rx_queue_1_bytes: 2194871382
rx_queue_2_packets: 0
rx_queue_2_bytes: 0
rx_queue_3_packets: 0
rx_queue_3_bytes: 0
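
A dump like this is easier to sanity-check with a few lines of script. The sketch below assumes the `name: value` layout shown above; the helper names are hypothetical and the sample is a trimmed copy of the posted counters. It checks that the per-queue packet counters add up to the device totals.

```python
SAMPLE = """\
NIC statistics:
rx_packets: 18830758
tx_packets: 37113
tx_queue_0_packets: 11184
tx_queue_1_packets: 11517
tx_queue_2_packets: 14235
tx_queue_3_packets: 177
rx_queue_0_packets: 7040617
rx_queue_1_packets: 11790141
rx_queue_2_packets: 0
rx_queue_3_packets: 0
"""


def parse_ethtool_stats(text):
    """Parse 'name: value' lines into a dict, skipping headers and non-numeric lines."""
    stats = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if sep and value.strip().isdigit():
            stats[name.strip()] = int(value)
    return stats


def queue_total(stats, direction):
    """Sum the per-queue packet counters for 'rx' or 'tx'."""
    prefix, suffix = f"{direction}_queue_", "_packets"
    return sum(v for k, v in stats.items()
               if k.startswith(prefix) and k.endswith(suffix))


stats = parse_ethtool_stats(SAMPLE)
for direction in ("rx", "tx"):
    total = queue_total(stats, direction)
    print(direction, total, total == stats[f"{direction}_packets"])
```

For the posted numbers the per-queue sums match rx_packets and tx_packets, and all received traffic lands on Rx queues 0 and 1 while queues 2 and 3 stay at zero.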


--
Best regards,
Tal Abudi
Tantilov, Emil S
2015-10-07 16:32:49 UTC
Post by Tal Abudi
...
alloc_rx_buff_failed: 10177
rx_no_dma_resources: 9352549
It looks like you are running short of memory for Rx buffers while receiving more traffic than your bus can handle (the two are possibly related).
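
The size of the problem can be read off the posted counters with simple arithmetic. The snippet below uses only values from the statistics in this thread. Reading the gap between rx_pkts_nic and rx_packets as frames the hardware counted but the stack never received is an assumption, although the gap does equal rx_no_dma_resources exactly.

```python
# Values copied from the posted ethtool -S output.
rx_pkts_nic = 28183307          # frames counted by the NIC
rx_packets = 18830758           # frames delivered to the stack
rx_no_dma_resources = 9352549
alloc_rx_buff_failed = 10177

not_delivered = rx_pkts_nic - rx_packets
share = not_delivered / rx_pkts_nic

print("not delivered:", not_delivered)
print("equals rx_no_dma_resources:", not_delivered == rx_no_dma_resources)
print(f"share of frames seen by the NIC: {share:.1%}")
print("Rx buffer allocation failures:", alloc_rx_buff_failed)
```

Under that reading, about a third of the frames seen by the NIC never reached the stack.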

The "fake" check is basically there to make sure that the HW is not hung. If the HW is not hung, then whatever issues you have are most likely in your environment.

Thanks,
Emil
Tal Abudi
2015-10-07 17:25:11 UTC
Will check.
Thanks !


--
Best regards,
Tal Abudi