Discussion:
[PATCH] igb: don't reuse pages with pfmemalloc flag
(too old to reply)
Roman Gushchin
2014-10-22 13:50:19 UTC
Permalink
Incoming packet is dropped silently by sk_filter(), if the skb was
allocated from pfmemalloc reserves and the corresponding socket is
not marked with the SOCK_MEMALLOC flag.

Igb driver allocates pages for DMA with __skb_alloc_page(), which
calls alloc_pages_node() with the __GFP_MEMALLOC flag. So, in case
of OOM condition, igb can get pages with pfmemalloc flag set.

If an incoming packet hits the pfmemalloc page and is large enough
(small packets are copying into the memory, allocated with
netdev_alloc_skb_ip_align(), so they are not affected), it will be
dropped.

This behavior is ok under high memory pressure, but the problem is
that the igb driver reuses these mapped pages. So, packets are still
dropping even if all memory issues are gone and there is a plenty
of free memory.

In my case, some TCP sessions hang on a small percentage (< 0.1%)
of machines days after OOMs.

Fix this by avoiding reuse of such pages.

Signed-off-by: Roman Gushchin <***@yandex-team.ru>
---
drivers/net/ethernet/intel/igb/igb_main.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
index 0d4c897..6586392 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -6178,6 +6178,9 @@ static bool igb_can_reuse_rx_page(struct igb_rx_buffer *rx_buffer,
if (unlikely(page_to_nid(page) != numa_node_id()))
return false;

+ if (unlikely(page->pfmemalloc))
+ return false;
+
#if (PAGE_SIZE < 8192)
/* if we are only owner of page we can reuse it */
if (unlikely(page_count(page) != 1))
@@ -6245,7 +6248,8 @@ static bool igb_add_rx_frag(struct igb_ring *rx_ring,
memcpy(__skb_put(skb, size), va, ALIGN(size, sizeof(long)));

/* we can reuse buffer as-is, just make sure it is local */
- if (likely(page_to_nid(page) == numa_node_id()))
+ if (likely((page_to_nid(page) == numa_node_id()) &&
+ !page->pfmemalloc))
return true;

/* this page cannot be reused so discard it */
--
1.9.3
Eric Dumazet
2014-10-22 15:45:37 UTC
Permalink
Post by Roman Gushchin
Incoming packet is dropped silently by sk_filter(), if the skb was
allocated from pfmemalloc reserves and the corresponding socket is
not marked with the SOCK_MEMALLOC flag.
Igb driver allocates pages for DMA with __skb_alloc_page(), which
calls alloc_pages_node() with the __GFP_MEMALLOC flag. So, in case
of OOM condition, igb can get pages with pfmemalloc flag set.
If an incoming packet hits the pfmemalloc page and is large enough
(small packets are copying into the memory, allocated with
netdev_alloc_skb_ip_align(), so they are not affected), it will be
dropped.
This behavior is ok under high memory pressure, but the problem is
that the igb driver reuses these mapped pages. So, packets are still
dropping even if all memory issues are gone and there is a plenty
of free memory.
In my case, some TCP sessions hang on a small percentage (< 0.1%)
of machines days after OOMs.
Fix this by avoiding reuse of such pages.
---
Interesting...

It seems we also need to clear skb->pfmemalloc in napi_reuse_skb()
Roman Gushchin
2014-10-23 11:21:12 UTC
Permalink
Post by Eric Dumazet
Interesting...
It seems we also need to clear skb->pfmemalloc in napi_reuse_skb()
Sounds reasonable, but are you sure, that we can just drop skb->pfmemalloc flag in napi_reuse_skb()?
Post by Eric Dumazet
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Eric Dumazet
2014-10-23 13:30:30 UTC
Permalink
From: Eric Dumazet <***@google.com>

Do not reuse skb if it was pfmemalloc tainted, otherwise
future frame might be dropped anyway.

Signed-off-by: Eric Dumazet <***@google.com>
---
net/core/dev.c | 4 ++++
1 file changed, 4 insertions(+)

diff --git a/net/core/dev.c b/net/core/dev.c
index b793e3521a36..945bbd001359 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -4157,6 +4157,10 @@ EXPORT_SYMBOL(napi_gro_receive);

static void napi_reuse_skb(struct napi_struct *napi, struct sk_buff *skb)
{
+ if (unlikely(skb->pfmemalloc)) {
+ consume_skb(skb);
+ return;
+ }
__skb_pull(skb, skb_headlen(skb));
/* restore the reserve we had after netdev_alloc_skb_ip_align() */
skb_reserve(skb, NET_SKB_PAD + NET_IP_ALIGN - skb_headroom(skb));
Roman Gushchin
2014-10-23 13:49:30 UTC
Permalink
Signed-off-by: Roman Gushchin <***@yandex-team.ru>

--
@klamm
Post by Eric Dumazet
Do not reuse skb if it was pfmemalloc tainted, otherwise
future frame might be dropped anyway.
---
 net/core/dev.c |    4 ++++
 1 file changed, 4 insertions(+)
diff --git a/net/core/dev.c b/net/core/dev.c
index b793e3521a36..945bbd001359 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -4157,6 +4157,10 @@ EXPORT_SYMBOL(napi_gro_receive);
 static void napi_reuse_skb(struct napi_struct *napi, struct sk_buff *skb)
 {
+ if (unlikely(skb->pfmemalloc)) {
+ consume_skb(skb);
+ return;
+ }
         __skb_pull(skb, skb_headlen(skb));
         /* restore the reserve we had after netdev_alloc_skb_ip_align() */
         skb_reserve(skb, NET_SKB_PAD + NET_IP_ALIGN - skb_headroom(skb));
David Miller
2014-10-27 02:47:50 UTC
Permalink
From: Eric Dumazet <***@gmail.com>
Date: Thu, 23 Oct 2014 06:30:30 -0700
Post by Eric Dumazet
Do not reuse skb if it was pfmemalloc tainted, otherwise
future frame might be dropped anyway.
Applied, thanks Eric.

------------------------------------------------------------------------------
Jeff Kirsher
2014-10-22 18:30:30 UTC
Permalink
Post by Roman Gushchin
Incoming packet is dropped silently by sk_filter(), if the skb was
allocated from pfmemalloc reserves and the corresponding socket is
not marked with the SOCK_MEMALLOC flag.
Igb driver allocates pages for DMA with __skb_alloc_page(), which
calls alloc_pages_node() with the __GFP_MEMALLOC flag. So, in case
of OOM condition, igb can get pages with pfmemalloc flag set.
If an incoming packet hits the pfmemalloc page and is large enough
(small packets are copying into the memory, allocated with
netdev_alloc_skb_ip_align(), so they are not affected), it will be
dropped.
This behavior is ok under high memory pressure, but the problem is
that the igb driver reuses these mapped pages. So, packets are still
dropping even if all memory issues are gone and there is a plenty
of free memory.
In my case, some TCP sessions hang on a small percentage (< 0.1%)
of machines days after OOMs.
Fix this by avoiding reuse of such pages.
---
drivers/net/ethernet/intel/igb/igb_main.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
Thanks Roman, I have added you patch to my queue.
Roman Gushchin
2014-10-23 07:52:03 UTC
Permalink
Thank you!

Probably we should add it to stable trees too?

--
Regards,
Roman
Post by Jeff Kirsher
 Incoming packet is dropped silently by sk_filter(), if the skb was
 allocated from pfmemalloc reserves and the corresponding socket is
 not marked with the SOCK_MEMALLOC flag.
 Igb driver allocates pages for DMA with __skb_alloc_page(), which
 calls alloc_pages_node() with the __GFP_MEMALLOC flag. So, in case
 of OOM condition, igb can get pages with pfmemalloc flag set.
 If an incoming packet hits the pfmemalloc page and is large enough
 (small packets are copying into the memory, allocated with
 netdev_alloc_skb_ip_align(), so they are not affected), it will be
 dropped.
 This behavior is ok under high memory pressure, but the problem is
 that the igb driver reuses these mapped pages. So, packets are still
 dropping even if all memory issues are gone and there is a plenty
 of free memory.
 In my case, some TCP sessions hang on a small percentage (< 0.1%)
 of machines days after OOMs.
 Fix this by avoiding reuse of such pages.
 ---
  drivers/net/ethernet/intel/igb/igb_main.c | 6 +++++-
  1 file changed, 5 insertions(+), 1 deletion(-)
Thanks Roman, I have added you patch to my queue.
Loading...