Hi,
Hi everyone, I have a question. We have a real-time (streaming) job running on Flink 1.11.2 with the RocksDBStateBackend. Recently an alert fired because the container was killed for exceeding its physical memory limit. Our monitoring shows that TaskManager heap usage never exceeded its limit, direct memory usage stayed in a stable range below its limit, and no OOM error was reported. Does this mean the problem is in non-heap memory? The full error message is below:

2021-01-07 11:51:00,781 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph [] - Source: 银河SDK原始日志 (18/90) (51ac2f29df472d001ce9b4307636ac1c) switched from RUNNING to FAILED on org.apache.flink.runtime.jobmaster.slotpool.SingleLogicalSlot@1aad00fa.
java.lang.Exception: Container [pid=180362,containerID=container_e06_1603181034156_0137_01_000002] is running beyond physical memory limits. Current usage: 25.0 GB of 25 GB physical memory used; 28.3 GB of 52.5 GB virtual memory used. Killing container.
Dump of the process-tree for container_e06_1603181034156_0137_01_000002 :
|- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE
|- 180421 180362 180362 180362 (java) 258262921 59979106 30306209792 6553277 /usr/jdk64/jdk1.8.0_152/bin/java -XX:+UseSerialGC -Xmx11542724608 -Xms11542724608 -XX:MaxDirectMemorySize=1207959552 -XX:MaxMetaspaceSize=268435456 -Dlog.file=/mnt/ssd/8/yarn/log/application_1603181034156_0137/container_e06_1603181034156_0137_01_000002/taskmanager.log -Dlog4j.configuration=file:./log4j.properties -Dlog4j.configurationFile=file:./log4j.properties org.apache.flink.yarn.YarnTaskExecutorRunner -D taskmanager.memory.framework.off-heap.size=134217728b -D taskmanager.memory.network.max=1073741824b -D taskmanager.memory.network.min=1073741824b -D taskmanager.memory.framework.heap.size=134217728b -D taskmanager.memory.managed.size=12750684160b -D taskmanager.cpu.cores=1.0 -D taskmanager.memory.task.heap.size=11408506880b -D taskmanager.memory.task.off-heap.size=0b --configDir . -Djobmanager.rpc.address=flink-cm8.jd.163.org -Dweb.port=0 -Dweb.tmpdir=/tmp/flink-web-9197a884-03b9-4865-a0a0-0b6a1c295f2c -Djobmanager.rpc.port=33656 -Drest.address=flink-cm8.jd.163.org -Dsecurity.kerberos.login.keytab=/mnt/ssd/3/yarn/local/usercache/portal/appcache/application_1603181034156_0137/container_e06_1603181034156_0137_01_000001/krb5.keytab
|- 180362 180360 180362 180362 (bash) 0 2 116011008 353 /bin/bash -c /usr/jdk64/jdk1.8.0_152/bin/java -XX:+UseSerialGC -Xmx11542724608 -Xms11542724608 -XX:MaxDirectMemorySize=1207959552 -XX:MaxMetaspaceSize=268435456 -Dlog.file=/mnt/ssd/8/yarn/log/application_1603181034156_0137/container_e06_1603181034156_0137_01_000002/taskmanager.log -Dlog4j.configuration=file:./log4j.properties -Dlog4j.configurationFile=file:./log4j.properties org.apache.flink.yarn.YarnTaskExecutorRunner -D taskmanager.memory.framework.off-heap.size=134217728b -D taskmanager.memory.network.max=1073741824b -D taskmanager.memory.network.min=1073741824b -D taskmanager.memory.framework.heap.size=134217728b -D taskmanager.memory.managed.size=12750684160b -D taskmanager.cpu.cores=1.0 -D taskmanager.memory.task.heap.size=11408506880b -D taskmanager.memory.task.off-heap.size=0b --configDir . -Djobmanager.rpc.address='flink-cm8.jd.163.org' -Dweb.port='0' -Dweb.tmpdir='/tmp/flink-web-9197a884-03b9-4865-a0a0-0b6a1c295f2c' -Djobmanager.rpc.port='33656' -Drest.address='flink-cm8.jd.163.org' -Dsecurity.kerberos.login.keytab='/mnt/ssd/3/yarn/local/usercache/portal/appcache/application_1603181034156_0137/container_e06_1603181034156_0137_01_000001/krb5.keytab' 1> /mnt/ssd/8/yarn/log/application_1603181034156_0137/container_e06_1603181034156_0137_01_000002/taskmanager.out 2> /mnt/ssd/8/yarn/log/application_1603181034156_0137/container_e06_1603181034156_0137_01_000002/taskmanager.err
Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
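As a sanity check on the question, the Python sketch below recomputes the memory budget from the flags in the log. The byte values are copied from the log; it assumes that YARN's "25 GB" means 25 GiB and that RSSMEM_USAGE is counted in 4 KiB pages (typical on x86-64 Linux, but not stated in the log). -Xmx covers only framework and task heap, and -XX:MaxDirectMemorySize covers only framework/task off-heap plus network memory, so neither limit applies to the roughly 11.9 GiB of managed memory, which RocksDBStateBackend uses as native memory by default. After heap, direct, managed and metaspace are subtracted, only about 1 GiB of JVM overhead is left for everything else.

```python
GiB = 1024 ** 3

# Byte values copied from the -D dynamic properties and JVM flags in the log.
FLAGS = {
    "framework_heap": 134217728,      # taskmanager.memory.framework.heap.size
    "task_heap": 11408506880,         # taskmanager.memory.task.heap.size
    "framework_off_heap": 134217728,  # taskmanager.memory.framework.off-heap.size
    "task_off_heap": 0,               # taskmanager.memory.task.off-heap.size
    "network": 1073741824,            # taskmanager.memory.network.min/max
    "managed": 12750684160,           # taskmanager.memory.managed.size
    "metaspace": 268435456,           # -XX:MaxMetaspaceSize
}
XMX = 11542724608           # -Xmx
MAX_DIRECT = 1207959552     # -XX:MaxDirectMemorySize
CONTAINER_LIMIT = 25 * GiB  # assumption: YARN's "25 GB" means 25 GiB
RSS_PAGES = 6553277         # RSSMEM_USAGE(PAGES) of the java process
PAGE_SIZE = 4096            # assumption: 4 KiB pages


def budget(flags):
    """Group the Flink memory components into the limits the JVM and YARN see."""
    heap = flags["framework_heap"] + flags["task_heap"]
    direct = flags["framework_off_heap"] + flags["task_off_heap"] + flags["network"]
    # Whatever the container limit leaves after heap, direct, managed and
    # metaspace is the JVM overhead budget.
    overhead = CONTAINER_LIMIT - heap - direct - flags["managed"] - flags["metaspace"]
    return {"heap": heap, "direct": direct, "managed": flags["managed"],
            "metaspace": flags["metaspace"], "jvm_overhead": overhead}


if __name__ == "__main__":
    parts = budget(FLAGS)
    # The grouping must reproduce the JVM flags Flink actually passed.
    assert parts["heap"] == XMX and parts["direct"] == MAX_DIRECT
    for name, size in parts.items():
        print(f"{name:>12}: {size / GiB:6.3f} GiB")
    rss = RSS_PAGES * PAGE_SIZE
    print(f"{'rss':>12}: {rss / GiB:6.3f} GiB of {CONTAINER_LIMIT / GiB:.0f} GiB")
```

Under these assumptions the java process's RSS sits within about 1.3 MB of the container limit while heap and direct memory are capped well below it. One plausible reading is that native memory outside the heap and direct limits (for example RocksDB going beyond its managed budget, or other native allocations) grew past what the 1 GiB overhead can absorb.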
Hi,
This may be caused by off-heap memory overuse. You can refer to a recent article contributed to the Chinese community, 《详解 Flink 容器化环境下的 OOM Killed》 ("OOM Killed in Containerized Flink Environments, Explained"), and adjust your configuration accordingly. I'd suggest increasing the jvm-overhead settings first [1].

[1] https://mp.weixin.qq.com/s?__biz=MzU3Mzg4OTMyNQ==&mid=2247490197&idx=1&sn=b0893a9bf12fbcae76852a156302de95

Best,
唐云

________________________________
From: Yang Peng <[hidden email]>
Sent: Thursday, January 7, 2021 12:24
To: user-zh <[hidden email]>
Subject: Flink 1.11.2: real-time job fails with "is running beyond physical memory limits. Current usage: 25.0 GB of 25 GB physical memory used; 28.3 GB of 52.5 GB virtual memory used. Killing container"
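The suggestion to raise jvm-overhead can be made concrete with a simplified Python model of how Flink 1.11 splits `taskmanager.memory.process.size` into components. This is a sketch, not Flink's code: it assumes only the process size (25 GiB here) and the jvm-overhead options are configured, uses Flink's default values for the other options, and ignores Flink's validation and rounding rules. The managed fraction of 0.5 is an assumption inferred from the log, where managed memory is exactly half of total Flink memory (the default is 0.4, so the job presumably overrides `taskmanager.memory.managed.fraction` or sets the managed size directly). With these assumptions the model reproduces the managed, network and task heap sizes passed via -D in the log.

```python
MiB = 1024 ** 2
GiB = 1024 ** 3


def clamp(value, lo, hi):
    """Limit value to the closed range [lo, hi]."""
    return max(lo, min(value, hi))


def derive(process_size,
           overhead_fraction=0.1,         # taskmanager.memory.jvm-overhead.fraction
           overhead_min=192 * MiB,        # taskmanager.memory.jvm-overhead.min
           overhead_max=1 * GiB,          # taskmanager.memory.jvm-overhead.max
           metaspace=256 * MiB,           # taskmanager.memory.jvm-metaspace.size
           managed_fraction=0.5,          # assumption: inferred from the log (default 0.4)
           network_fraction=0.1,          # taskmanager.memory.network.fraction
           network_min=64 * MiB,          # taskmanager.memory.network.min
           network_max=1 * GiB,           # taskmanager.memory.network.max
           framework_heap=128 * MiB,      # taskmanager.memory.framework.heap.size
           framework_off_heap=128 * MiB,  # taskmanager.memory.framework.off-heap.size
           task_off_heap=0):              # taskmanager.memory.task.off-heap.size
    """Simplified split of the total process size into memory components."""
    # JVM overhead is a fraction of total process memory, clamped to [min, max].
    overhead = clamp(int(process_size * overhead_fraction), overhead_min, overhead_max)
    total_flink = process_size - metaspace - overhead
    # Managed and network memory are fractions of total Flink memory.
    managed = int(total_flink * managed_fraction)
    network = clamp(int(total_flink * network_fraction), network_min, network_max)
    # Task heap receives whatever is left of total Flink memory.
    task_heap = (total_flink - framework_heap - framework_off_heap
                 - task_off_heap - managed - network)
    return {"jvm_overhead": overhead, "managed": managed,
            "network": network, "task_heap": task_heap}


if __name__ == "__main__":
    scenarios = {
        "as in the log": {},
        "jvm-overhead raised to 2 GiB": {"overhead_min": 2 * GiB, "overhead_max": 2 * GiB},
    }
    for label, overrides in scenarios.items():
        parts = derive(25 * GiB, **overrides)
        summary = ", ".join(f"{k}={v / GiB:.2f} GiB" for k, v in parts.items())
        print(f"{label}: {summary}")
```

In this model, with the process size fixed at 25 GiB, the extra 1 GiB of overhead is taken out of total Flink memory, so managed memory and task heap both shrink. Keeping them unchanged would mean raising `taskmanager.memory.process.size` (and therefore the YARN container) by the same 1 GiB.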