Flink memory overuse problem

元始(Bob Hu)
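Hi all, I have a Flink job that keeps getting killed by the YARN cluster for using more memory than it is allowed, and I'm not sure how to track down the cause. The Flink version is 1.11.0 and the job is launched with:
bin/flink run -m yarn-cluster -yjm 2048m -ytm 8192m -ys 2 xxx.jar
It uses the RocksDB state backend, with taskmanager.memory.managed.fraction=0.6 and taskmanager.memory.jvm-overhead.fraction=0.2. Below are the TaskManager statistics shown in the Flink web UI at one point in time. Where could the extra memory be coming from? As far as I can tell, the job does not use any third-party jars that allocate much native memory, and my own code does not use native memory anywhere.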


Free Slots / All Slots: 0 / 2
CPU Cores: 24
Physical Memory: 251 GB
JVM Heap Size: 1.82 GB
Flink Managed Memory: 4.05 GB

Memory

JVM (Heap/Non-Heap)
Type       Committed   Used      Maximum
Heap       1.81 GB     1.13 GB   1.81 GB
Non-Heap   169 MB      160 MB    1.48 GB
Total      1.98 GB     1.29 GB   3.30 GB

Outside JVM
Type     Count    Used     Capacity
Direct   24,493   718 MB   718 MB
Mapped   0        0 B      0 B

Network
Memory Segments
Type        Count
Available   21,715
Total       22,118

Garbage Collection
Collector      Count   Time (ms)
PS_Scavenge    199     17,433
PS_MarkSweep   4       4,173
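The statistics above can be cross-checked against Flink 1.11's TaskManager memory model. The sketch below recomputes the breakdown for the question's settings (-ytm 8192m, which sets taskmanager.memory.process.size, plus managed fraction 0.6 and overhead fraction 0.2). It assumes Flink 1.11 default values for every option the question does not mention; those constants are assumptions, not values read from the poster's cluster, and the function is an approximation rather than Flink's own code.

```python
"""Approximate Flink 1.11 TaskManager memory breakdown (sketch, sizes in MB).

Every constant below is an ASSUMED Flink 1.11 default for an option the
question did not set; check them against your own flink-conf.yaml.
"""

METASPACE = 256            # taskmanager.memory.jvm-metaspace.size
FRAMEWORK_HEAP = 128       # taskmanager.memory.framework.heap.size
FRAMEWORK_OFF_HEAP = 128   # taskmanager.memory.framework.off-heap.size
TASK_OFF_HEAP = 0          # taskmanager.memory.task.off-heap.size
NETWORK_FRACTION, NETWORK_MIN, NETWORK_MAX = 0.1, 64, 1024  # taskmanager.memory.network.*
OVERHEAD_MIN, OVERHEAD_MAX = 192, 1024                      # taskmanager.memory.jvm-overhead.min/max
SEGMENT_KB = 32            # taskmanager.memory.segment-size


def clamp(value, lo, hi):
    return max(lo, min(value, hi))


def breakdown(process_mb, managed_fraction, overhead_fraction):
    # JVM overhead is a fraction of the whole process size, bounded by min/max.
    overhead = clamp(process_mb * overhead_fraction, OVERHEAD_MIN, OVERHEAD_MAX)
    # Total Flink memory = process size minus metaspace and JVM overhead.
    total_flink = process_mb - METASPACE - overhead
    # Network and managed memory are fractions of total Flink memory.
    network = clamp(total_flink * NETWORK_FRACTION, NETWORK_MIN, NETWORK_MAX)
    managed = total_flink * managed_fraction
    # Whatever is left over becomes task heap.
    task_heap = (total_flink - FRAMEWORK_HEAP - FRAMEWORK_OFF_HEAP
                 - TASK_OFF_HEAP - network - managed)
    return {
        "jvm_overhead": overhead,
        "total_flink": total_flink,
        "managed": managed,
        "network": network,
        "network_segments": int(network * 1024 // SEGMENT_KB),
        "heap_xmx": FRAMEWORK_HEAP + task_heap,
        "direct_limit": FRAMEWORK_OFF_HEAP + TASK_OFF_HEAP + network,
    }


if __name__ == "__main__":
    for name, value in breakdown(8192, 0.6, 0.2).items():
        print(f"{name:>18}: {value:,.1f}")
```

For the question's settings this yields about 4,147 MB of managed memory (about 4.05 GB) and 22,118 network segments, which agrees with the web UI, and about 1,946 MB for -Xmx (the UI's 1.82 GB is somewhat lower, probably because the JVM's reported maximum heap leaves out a survivor space). It also shows that jvm-overhead.fraction=0.2 ends up as 1,024 MB rather than 1,638 MB, because the overhead is capped by its maximum.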

Re: Flink memory overuse problem

hailongwang
Hi Bob,
 You could try setting 'state.backend.rocksdb.memory.fixed-per-slot' [1] and see whether it helps.
[1] https://ci.apache.org/projects/flink/flink-docs-release-1.11/ops/config.html#state-backend-rocksdb-memory-fixed-per-slot
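Why this option matters: by default all RocksDB instances in a slot share that slot's slice of managed memory, and 'state.backend.rocksdb.memory.fixed-per-slot', when set, overrides that with an explicit per-slot amount. A minimal sketch of this sizing rule, assuming managed memory is split evenly across slots and using the 4.05 GB / 2 slots from the question; the 1024 MB value is a hypothetical setting for illustration, not a recommendation.

```python
"""Per-slot RocksDB memory budget (sketch, sizes in MB)."""
from typing import Optional


def rocksdb_budget_per_slot(managed_mb: float, num_slots: int,
                            fixed_per_slot_mb: Optional[float] = None) -> float:
    """Memory shared by all RocksDB instances in one slot.

    Assumption: managed memory is divided evenly across slots, and
    'state.backend.rocksdb.memory.fixed-per-slot', when set, replaces that share.
    """
    if num_slots <= 0:
        raise ValueError("num_slots must be positive")
    if fixed_per_slot_mb is not None:
        return fixed_per_slot_mb
    return managed_mb / num_slots


def rocksdb_budget_total(managed_mb: float, num_slots: int,
                         fixed_per_slot_mb: Optional[float] = None) -> float:
    # Total memory RocksDB is budgeted on one TaskManager across all its slots.
    return num_slots * rocksdb_budget_per_slot(managed_mb, num_slots, fixed_per_slot_mb)


if __name__ == "__main__":
    managed_mb, slots = 4.05 * 1024, 2  # values from the question's web UI
    print("managed mode:", rocksdb_budget_per_slot(managed_mb, slots), "MB per slot")
    # 1024 MB is a hypothetical fixed-per-slot value used only for illustration.
    print("fixed-per-slot:", rocksdb_budget_per_slot(managed_mb, slots, 1024), "MB per slot")
```

With the question's numbers, the default mode gives each of the 2 slots roughly 2 GB for RocksDB; a fixed per-slot value makes that budget explicit and independent of the managed fraction.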


Best,
Hailong Wang




On 2020-11-08 10:50:29, "元始(Bob Hu)" <[hidden email]> wrote:

Re: Re: Flink memory overuse problem

Yun Tang
Hi

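You can give the process more headroom for memory that exceeds Flink's own accounting by increasing "taskmanager.memory.jvm-overhead.max" [1] and "taskmanager.memory.process.size" [2]. You can also watch the value of "state.backend.rocksdb.metrics.block-cache-pinned-usage" [3] to check whether the native memory used by RocksDB exceeds the managed memory.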


[1] https://ci.apache.org/projects/flink/flink-docs-stable/ops/config.html#taskmanager-memory-jvm-overhead-max
[2] https://ci.apache.org/projects/flink/flink-docs-stable/ops/config.html#taskmanager-memory-process-size
[3] https://ci.apache.org/projects/flink/flink-docs-release-1.11/ops/config.html#state-backend-rocksdb-metrics-block-cache-pinned-usage
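To see why the overhead maximum matters for the question's setup: the JVM overhead is the fraction times the process size, clamped to [jvm-overhead.min, jvm-overhead.max], so raising process.size alone may not add any overhead headroom once the cap is hit. The sketch below assumes the Flink 1.11 default bounds (192 MB / 1024 MB). It also includes a hypothetical helper that compares a block-cache-pinned-usage reading (the metric has to be enabled by setting that option to true; its value is assumed to be in bytes) against one slot's share of managed memory. The 2048 MB and 10240 MB inputs are illustrative values, not advice.

```python
"""JVM overhead what-if and a RocksDB pinned-usage check (sketch, sizes in MB)."""

BYTES_PER_MB = 1024 * 1024


def jvm_overhead(process_mb, fraction, min_mb=192, max_mb=1024):
    """Overhead reserved for native memory Flink does not track.

    Assumes the Flink 1.11 rule: fraction * process size, clamped to
    [taskmanager.memory.jvm-overhead.min, taskmanager.memory.jvm-overhead.max];
    192 MB / 1024 MB are assumed default bounds.
    """
    return max(min_mb, min(process_mb * fraction, max_mb))


def pinned_exceeds_slot_share(pinned_usage_bytes, managed_mb, num_slots):
    """Hypothetical helper: does the reported pinned block-cache usage alone
    exceed one slot's share of managed memory? Assumes managed memory is split
    evenly across slots and that the metric value is reported in bytes."""
    return pinned_usage_bytes / BYTES_PER_MB > managed_mb / num_slots


if __name__ == "__main__":
    # The question's configuration: 8192 MB process, fraction 0.2, default cap.
    print("current:         ", jvm_overhead(8192, 0.2))
    # Raising only process.size: 20% of 10240 MB is still capped at the maximum.
    print("process 10240 MB:", jvm_overhead(10240, 0.2))
    # Raising jvm-overhead.max as well lets the fraction take effect.
    print("max 2048 MB:     ", jvm_overhead(8192, 0.2, max_mb=2048))
```

Because managed memory is itself a fraction (0.6 here) of total Flink memory, a larger process.size also enlarges RocksDB's managed budget; the overhead cap is roughly what decides how much room remains for native allocations Flink does not account for.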
Best regards,
唐云

________________________________
From: hailongwang <[hidden email]>
Sent: Sunday, November 8, 2020 20:03
To: [hidden email] <[hidden email]>
Subject: Re: Flink memory overuse problem