Hi, I have a Flink job that keeps getting killed by the YARN cluster for exceeding its memory limit, and I don't know how to troubleshoot it. The Flink version is 1.11.0, and the launch command is:
bin/flink run -m yarn-cluster -yjm 2048m -ytm 8192m -ys 2 xxx.jar

The job uses the RocksDB state backend, and the configured options include taskmanager.memory.managed.fraction=0.6 and taskmanager.memory.jvm-overhead.fraction=0.2. Below are the TaskManager stats from the Flink web UI at one point in time. Where could the memory overuse be coming from? As far as I can tell, the job does not pull in any third-party jar that makes heavy native allocations, and my own code does not use native memory either.

Free Slots / All Slots: 0 / 2
CPU Cores: 24
Physical Memory: 251 GB
JVM Heap Size: 1.82 GB
Flink Managed Memory: 4.05 GB

Memory

JVM (Heap/Non-Heap)
Type     | Committed | Used    | Maximum
Heap     | 1.81 GB   | 1.13 GB | 1.81 GB
Non-Heap | 169 MB    | 160 MB  | 1.48 GB
Total    | 1.98 GB   | 1.29 GB | 3.30 GB

Outside JVM
Type   | Count  | Used   | Capacity
Direct | 24,493 | 718 MB | 718 MB
Mapped | 0      | 0 B    | 0 B

Network
Memory Segments
Type      | Count
Available | 21,715
Total     | 22,118

Garbage Collection
Collector    | Count | Time
PS_Scavenge  | 199   | 17,433
PS_MarkSweep | 4     | 4,173
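For reference, these UI numbers line up with how Flink 1.11 derives the TaskManager memory budgets. A rough breakdown, assuming defaults for every memory option not set explicitly (jvm-overhead.max=1g, jvm-metaspace.size=256m, network fraction 0.1 with 32kb segments, 128m each for framework heap and framework off-heap):

process size       = 8192 MiB                           (-ytm 8192m)
JVM overhead       = min(0.2 * 8192, 1024) = 1024 MiB   (the fraction is clamped by jvm-overhead.max)
JVM metaspace      = 256 MiB                            (default)
total Flink memory = 8192 - 1024 - 256 = 6912 MiB
managed memory     = 0.6 * 6912 = 4147 MiB ~ 4.05 GB    (matches the UI)
network memory     = 0.1 * 6912 = 691 MiB               (691.2 MiB / 32 KiB = 22,118 segments, matches the UI)
JVM heap           = 6912 - 4147 - 691 - 128 ~ 1946 MiB (framework + task heap; close to the ~1.8 GB the UI reports as maximum heap)

Since these budgets already fill the whole 8192 MiB container, the likely culprit is native memory growing past the 1 GiB JVM-overhead allowance; with the RocksDB backend that is typically RocksDB itself, whose managed-memory limit in 1.11 is enforced only on a best-effort basis.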
Hi Bob,
You can try setting the 'state.backend.rocksdb.memory.fixed-per-slot' option [1] and see whether it helps; a config sketch follows at the end of this message.

[1] https://ci.apache.org/projects/flink/flink-docs-release-1.11/ops/config.html#state-backend-rocksdb-memory-fixed-per-slot

Best,
Hailong Wang
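For illustration, a minimal flink-conf.yaml sketch. The 1800m figure is only a placeholder, not a recommendation; size it to your actual per-slot state:

# Give RocksDB a fixed amount of native memory per slot instead of letting it
# share the managed-memory budget; when set, this option overrides
# state.backend.rocksdb.memory.managed. With -ys 2, the two slots together
# would use at most 2 x 1800m here.
state.backend.rocksdb.memory.fixed-per-slot: 1800m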
Hi
You can increase "taskmanager.memory.jvm-overhead.max" [1] and "taskmanager.memory.process.size" [2] to leave more headroom for memory overuse. You can also watch the "state.backend.rocksdb.metrics.block-cache-pinned-usage" metric [3] to see whether the native memory used by RocksDB exceeds the managed memory. A config sketch follows at the end of this message.

[1] https://ci.apache.org/projects/flink/flink-docs-stable/ops/config.html#taskmanager-memory-jvm-overhead-max
[2] https://ci.apache.org/projects/flink/flink-docs-stable/ops/config.html#taskmanager-memory-process-size
[3] https://ci.apache.org/projects/flink/flink-docs-release-1.11/ops/config.html#state-backend-rocksdb-metrics-block-cache-pinned-usage

Best,
Yun Tang
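A sketch of what that could look like in flink-conf.yaml. The sizes below are placeholders, assuming the YARN container can be grown to roughly 9.5 GB:

# Enlarge the whole TaskManager process (equivalent to raising -ytm) ...
taskmanager.memory.process.size: 9728m
# ... and widen the overhead budget that absorbs native allocations
# beyond the heap, network, and managed-memory budgets.
taskmanager.memory.jvm-overhead.max: 2g
# Report RocksDB's pinned block-cache usage so it can be compared against
# the managed memory (enabling native metrics costs some performance).
state.backend.rocksdb.metrics.block-cache-pinned-usage: true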