Flink 1.11: TaskManager's actual memory usage far exceeds its configuration

Z-Z
Hi everyone, good morning. A question for the experts:
In a Flink session cluster running on Docker, the TaskManager memory configuration is:
taskmanager.memory.process.size: 5120m
taskmanager.memory.jvm-metaspace.size: 1024m

But outside the container, the TaskManager actually occupies more than 7.5 GB, even though the memory configuration it prints at startup is correct:
INFO  [] - Final TaskExecutor Memory configuration:
INFO  [] -   Total Process Memory:          5.000gb (5368709120 bytes)
INFO  [] -     Total Flink Memory:          3.500gb (3758096376 bytes)
INFO  [] -       Total JVM Heap Memory:     1.625gb (1744830433 bytes)
INFO  [] -         Framework:               128.000mb (134217728 bytes)
INFO  [] -         Task:                    1.500gb (1610612705 bytes)
INFO  [] -       Total Off-heap Memory:     1.875gb (2013265943 bytes)
INFO  [] -         Managed:                 1.400gb (1503238572 bytes)
INFO  [] -         Total JVM Direct Memory: 486.400mb (510027371 bytes)
INFO  [] -           Framework:             128.000mb (134217728 bytes)
INFO  [] -           Task:                  0 bytes
INFO  [] -           Network:               358.400mb (375809643 bytes)
INFO  [] -     JVM Metaspace:               1024.000mb (1073741824 bytes)
INFO  [] -     JVM Overhead:                512.000mb (536870920 bytes)
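The breakdown above can be reproduced from the two explicit settings. The following Python sketch assumes the Flink 1.11 defaults for every option this config does not set: framework heap and framework off-heap 128 MB each, task off-heap 0, managed fraction 0.4, network fraction 0.1 clamped to [64 MB, 1 GB], and JVM overhead fraction 0.1 clamped to [192 MB, 1 GB]. It works in MB, so it matches the printed MB/GB figures but not the exact byte counts, which differ slightly in the log (for example, 3758096376 bytes rather than exactly 3.5 GiB). `derive_tm_memory` is a hypothetical helper, not a Flink API.

```python
def clamp(value, low, high):
    """Limit value to the closed range [low, high]."""
    return min(max(value, low), high)


def derive_tm_memory(process_mb, metaspace_mb):
    """Split taskmanager.memory.process.size into components (all values in MB).

    Assumes Flink 1.11 defaults for every option not set explicitly.
    """
    framework_heap, framework_offheap, task_offheap = 128, 128, 0
    overhead = clamp(process_mb * 0.1, 192, 1024)        # JVM Overhead
    flink = process_mb - metaspace_mb - overhead         # Total Flink Memory
    managed = flink * 0.4                                # Managed
    network = clamp(flink * 0.1, 64, 1024)               # Network
    # Task heap gets whatever is left of Total Flink Memory.
    task_heap = flink - framework_heap - framework_offheap - task_offheap - managed - network
    direct = framework_offheap + task_offheap + network  # Total JVM Direct Memory
    return {
        "total_flink": flink,
        "heap": framework_heap + task_heap,              # Total JVM Heap Memory
        "task_heap": task_heap,
        "offheap": managed + direct,                     # Total Off-heap Memory
        "managed": managed,
        "direct": direct,
        "network": network,
        "metaspace": metaspace_mb,
        "overhead": overhead,
    }


for name, mb in derive_tm_memory(5120, 1024).items():
    print(f"{name:12s} {mb:9.1f} MB")
```

Every component except the JVM overhead is one of the pools Flink or the JVM caps, which is why the overhead line matters most when the process grows past its configured size.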

Taking a heap dump with jmap from JDK 1.8 fails with the following error, and adding -F does not help:
1: Unable to open socket file: target process not responding or HotSpot VM not loaded


How can I find out where this memory is actually being used?
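To put a number on the gap from the host side, one option (an assumption, not something proposed in this thread) is to read the TaskManager JVM's resident set size from Linux's `/proc/<pid>/status` and compare it with `taskmanager.memory.process.size`. A minimal Python sketch; `parse_vmrss_kb`, `read_rss_mb` and `overuse_mb` are hypothetical helper names, and the PID must be the host-side PID of the TaskManager process.

```python
import re

CONFIGURED_PROCESS_MB = 5120  # taskmanager.memory.process.size from the config above


def parse_vmrss_kb(status_text: str) -> int:
    """Extract the VmRSS value (in kB) from the contents of /proc/<pid>/status."""
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    if match is None:
        raise ValueError("no VmRSS line found")
    return int(match.group(1))


def read_rss_mb(pid: int) -> float:
    """Read the resident set size of a Linux process, in MiB."""
    with open(f"/proc/{pid}/status") as f:
        return parse_vmrss_kb(f.read()) / 1024


def overuse_mb(rss_mb: float, configured_mb: float = CONFIGURED_PROCESS_MB) -> float:
    """How far the measured RSS is above the configured process size (0 if below)."""
    return max(rss_mb - configured_mb, 0.0)
```

Usage would be along the lines of `overuse_mb(read_rss_mb(tm_pid))`, where `tm_pid` is the TaskManager's PID as seen from the host.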

Re: Flink 1.11: TaskManager's actual memory usage far exceeds its configuration

Z-Z
To add: I'm using RocksDB as the state backend.

------------------ Original message ------------------
From: "Z-Z" <[hidden email]>;
Sent: Thursday, September 10, 2020, 10:08 AM
To: "user-zh" <[hidden email]>;

Subject: Flink 1.11: TaskManager's actual memory usage far exceeds its configuration

Re: Flink 1.11: TaskManager's actual memory usage far exceeds its configuration

Xintong Song
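Flink cannot fully control every kind of memory overhead; this is inherent to how Java applications work.
- For memory the JVM can control, such as Java heap, direct memory, and metaspace, Flink sets JVM parameters that prevent it from exceeding its limits.
- For fixed-size buffer pools that Flink maintains itself, such as the network buffer pool and managed memory, Flink also strictly limits how much memory is allocated.
- For other overheads, such as JVM thread stacks and native methods in user code and third-party dependencies, Flink cannot limit how much memory they use; it can only set aside part of the total memory for them according to the configuration. If that reservation is too small, memory usage will exceed the limit.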
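Applied to the numbers in this thread, the reasoning above gives a rough lower bound on how much memory came from the third category. A Python sketch, assuming the 7.5 GB figure is the process's resident memory and that, as described above, the capped pools (heap, direct, metaspace, managed and network memory) never exceed their configured sizes; `min_uncapped_mb` is a hypothetical name.

```python
def min_uncapped_mb(observed_mb: float, process_mb: float, overhead_mb: float) -> float:
    """Lower bound on memory used outside the pools Flink caps.

    Everything in the process budget except JVM overhead is capped by JVM flags
    or by Flink's own pools, so at most (process_mb - overhead_mb) can come from
    them; the rest of the observed usage must be uncapped native memory.
    """
    capped_budget_mb = process_mb - overhead_mb
    return max(observed_mb - capped_budget_mb, 0.0)


# Figures from the thread: 5120m process size, 512 MB JVM overhead, >7.5 GB observed.
uncapped = min_uncapped_mb(observed_mb=7.5 * 1024, process_mb=5120, overhead_mb=512)
print(f"at least {uncapped:.0f} MB of uncapped native memory vs 512 MB reserved")
```

Under these assumptions the uncapped usage is several times the 512 MB that the default JVM overhead reserves, which points at native allocations rather than the heap.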

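If memory is overused, the cause can only be the third category above. Suggestions:
- Check whether 'state.backend.rocksdb.memory.managed' is set to true. When it is true (the default), RocksDB sizes its memory usage according to the managed memory size. If it is false, RocksDB's memory usage ignores the TaskManager memory configuration and may exceed it.
- Adjust 'taskmanager.memory.jvm-overhead.[min|max|fraction]' so that the TaskManager reserves more memory.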
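The JVM overhead in the log (512 MB) is consistent with it being 10% of total process memory clamped to [min, max]. A Python sketch of that rule, assuming the Flink 1.11 defaults fraction=0.1, min=192 MB, max=1 GB; the function names and the alternative values are illustrative only. It shows that raising the overhead without raising `taskmanager.memory.process.size` shrinks Total Flink Memory.

```python
def jvm_overhead_mb(process_mb, fraction=0.1, min_mb=192, max_mb=1024):
    """JVM overhead: a fraction of total process memory, clamped to [min_mb, max_mb]."""
    return min(max(process_mb * fraction, min_mb), max_mb)


def total_flink_mb(process_mb, metaspace_mb, overhead_mb):
    """What remains for heap, managed, network and other Flink memory."""
    return process_mb - metaspace_mb - overhead_mb


# Current setup from the thread: 5120m process size, 1024m metaspace, default overhead options.
current = jvm_overhead_mb(5120)
# Hypothetical tuning A: larger fraction and max, same process size.
tuned_same_size = jvm_overhead_mb(5120, fraction=0.3, max_mb=2048)
# Hypothetical tuning B: larger process size as well, to keep Total Flink Memory unchanged.
tuned_bigger = jvm_overhead_mb(6144, fraction=0.25, max_mb=2048)

print("current:", current, total_flink_mb(5120, 1024, current))
print("tuning A:", tuned_same_size, total_flink_mb(5120, 1024, tuned_same_size))
print("tuning B:", tuned_bigger, total_flink_mb(6144, 1024, tuned_bigger))
```

In this sketch, tuning A triples the reservation but leaves only 2560 MB of Flink memory, while tuning B keeps 3584 MB of Flink memory at the cost of 1 GB more container memory. Setting min and max to the same value would pin the overhead to exactly that size, since the clamp then ignores the fraction.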

You can also refer to the official documentation's advice on container memory overuse [1].

Thank you~

Xintong Song


[1] https://ci.apache.org/projects/flink/flink-docs-release-1.11/zh/ops/memory/mem_trouble.html#%E5%AE%B9%E5%99%A8container%E5%86%85%E5%AD%98%E8%B6%85%E7%94%A8

On Thu, Sep 10, 2020 at 12:54 PM Z-Z <[hidden email]> wrote:

> To add: I'm using RocksDB as the state backend.