Hi all, I have a checkpoint-related question.
Flink version 1.10.1. Individual checkpoints occasionally fail with:

java.lang.Exception: Could not perform checkpoint 1194 for operator Map (3/3).
    at org.apache.flink.streaming.runtime.tasks.StreamTask.triggerCheckpointOnBarrier(StreamTask.java:816)
    at org.apache.flink.streaming.runtime.io.CheckpointBarrierHandler.notifyCheckpoint(CheckpointBarrierHandler.java:86)
    at org.apache.flink.streaming.runtime.io.CheckpointBarrierTracker.processBarrier(CheckpointBarrierTracker.java:99)
    at org.apache.flink.streaming.runtime.io.CheckpointedInputGate.pollNext(CheckpointedInputGate.java:155)
    at org.apache.flink.streaming.runtime.io.StreamTaskNetworkInput.emitNext(StreamTaskNetworkInput.java:133)
    at org.apache.flink.streaming.runtime.io.StreamOneInputProcessor.processInput(StreamOneInputProcessor.java:69)
    at org.apache.flink.streaming.runtime.tasks.StreamTask.processInput(StreamTask.java:310)
    at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxLoop(MailboxProcessor.java:187)
    at org.apache.flink.streaming.runtime.tasks.StreamTask.runMailboxLoop(StreamTask.java:485)
    at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:469)
    at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:708)
    at org.apache.flink.runtime.taskmanager.Task.run(Task.java:533)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.NullPointerException
    at org.apache.flink.streaming.runtime.tasks.StreamTask$CheckpointingOperation.executeCheckpointing(StreamTask.java:1382)
    at org.apache.flink.streaming.runtime.tasks.StreamTask.checkpointState(StreamTask.java:974)
    at org.apache.flink.streaming.runtime.tasks.StreamTask.lambda$performCheckpoint$5(StreamTask.java:870)
    at org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$SynchronizedStreamTaskActionExecutor.runThrowing(StreamTaskActionExecutor.java:94)
    at org.apache.flink.streaming.runtime.tasks.StreamTask.performCheckpoint(StreamTask.java:843)
    at org.apache.flink.streaming.runtime.tasks.StreamTask.triggerCheckpointOnBarrier(StreamTask.java:803)
    ... 12 more

The vast majority of checkpoints complete normally, but a small fraction fail as shown above, and each failure also sends the job through suspending --> restart.

--
Sent from: http://apache-flink.147419.n8.nabble.com/
Hi

This looks like the same problem as FLINK-17479, which only shows up on specific JDK versions. You could consider upgrading your Flink version or switching to a different JDK version.

Best,
Congxian

Storm☀️ <[hidden email]> wrote on Sun, Sep 27, 2020 at 10:17 AM:
> [...]
Thanks.
I read that issue. The JDK that has the problem there is 1.8_060, while we are running 074. I will try upgrading the JDK to 251 in our test environment.
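Since the discussion turns on which JDK 8 update a job actually runs on (060, 074, 251), it can help to log the version from inside the JVM rather than rely on what is installed on the client machine. The following is a minimal sketch, not part of the original thread: the class name `JdkUpdateCheck` and the helper `updateNumber` are hypothetical, and the pattern only assumes the legacy JDK 8 format `1.8.0_<update>`.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class JdkUpdateCheck {
    // Legacy JDK 8 version strings look like "1.8.0_74" or "1.8.0_261-b12".
    private static final Pattern LEGACY_JDK8 = Pattern.compile("^1\\.8\\.0_(\\d+).*$");

    /** Returns the JDK 8 update number, or -1 if the string is not in the legacy JDK 8 format. */
    public static int updateNumber(String javaVersion) {
        Matcher m = LEGACY_JDK8.matcher(javaVersion);
        return m.matches() ? Integer.parseInt(m.group(1)) : -1;
    }

    public static void main(String[] args) {
        // "java.version" is a standard system property of the running JVM.
        String version = System.getProperty("java.version");
        System.out.println("java.version=" + version + ", JDK 8 update=" + updateNumber(version));
    }
}
```

On YARN the TaskManager JVM may differ from the one used to submit the job, so a check like this is most informative when it is logged from code running inside the task itself.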
In reply to this post by Congxian Qiu
On my side it is an old JDK 8 release, so this has nothing to do with JDK 261.

------------------ Original message ------------------
From: "user-zh" <[hidden email]>
Sent: Saturday, October 10, 2020, 9:03 AM
To: "user-zh" <[hidden email]>
Subject: Re: Flink 1.10.1 checkpoint failure problem

I tried upgrading the JDK to 261, and the error still occurs.
Hi, @Storm, which Flink version are you using, and what does the stack trace look like? Please reply here with the relevant information so we can work out together what the problem is.

Best,
Congxian

大森林 <[hidden email]> wrote on Sat, Oct 10, 2020 at 1:05 PM:
> [...]
Flink version: Flink 1.10.1
Deployment: Flink on YARN
Hadoop version: cdh5.15.2-2.6.0
Current status (Checkpoint Counts): Triggered: 9339, In Progress: 0, Completed: 8439, Failed: 900, Restored: 7
Error message:

java.lang.Exception: Could not perform checkpoint 1194 for operator Map (3/3).
    at org.apache.flink.streaming.runtime.tasks.StreamTask.triggerCheckpointOnBarrier(StreamTask.java:816)
    at org.apache.flink.streaming.runtime.io.CheckpointBarrierHandler.notifyCheckpoint(CheckpointBarrierHandler.java:86)
    at org.apache.flink.streaming.runtime.io.CheckpointBarrierTracker.processBarrier(CheckpointBarrierTracker.java:99)
    at org.apache.flink.streaming.runtime.io.CheckpointedInputGate.pollNext(CheckpointedInputGate.java:155)
    at org.apache.flink.streaming.runtime.io.StreamTaskNetworkInput.emitNext(StreamTaskNetworkInput.java:133)
    at org.apache.flink.streaming.runtime.io.StreamOneInputProcessor.processInput(StreamOneInputProcessor.java:69)
    at org.apache.flink.streaming.runtime.tasks.StreamTask.processInput(StreamTask.java:310)
    at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxLoop(MailboxProcessor.java:187)
    at org.apache.flink.streaming.runtime.tasks.StreamTask.runMailboxLoop(StreamTask.java:485)
    at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:469)
    at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:708)
    at org.apache.flink.runtime.taskmanager.Task.run(Task.java:533)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.NullPointerException
    at org.apache.flink.streaming.runtime.tasks.StreamTask$CheckpointingOperation.executeCheckpointing(StreamTask.java:1382)
    at org.apache.flink.streaming.runtime.tasks.StreamTask.checkpointState(StreamTask.java:974)
    at org.apache.flink.streaming.runtime.tasks.StreamTask.lambda$performCheckpoint$5(StreamTask.java:870)
    at org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$SynchronizedStreamTaskActionExecutor.runThrowing(StreamTaskActionExecutor.java:94)
    at org.apache.flink.streaming.runtime.tasks.StreamTask.performCheckpoint(StreamTask.java:843)
    at org.apache.flink.streaming.runtime.tasks.StreamTask.triggerCheckpointOnBarrier(StreamTask.java:803)
    ... 12 more

The same program runs on the 11.2 release with checkpoints completing normally every time.
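To put the reported checkpoint counters in perspective, the failure rate can be worked out directly: 900 failed out of 9339 triggered is roughly 9.6%. The sketch below is only an illustration of that arithmetic; the class and method names are hypothetical and nothing here calls into Flink.

```java
public class CheckpointStats {
    /** Percentage of triggered checkpoints that failed. */
    public static double failureRatePercent(long triggered, long failed) {
        if (triggered <= 0) {
            throw new IllegalArgumentException("triggered must be positive");
        }
        return 100.0 * failed / triggered;
    }

    public static void main(String[] args) {
        // Counters as reported in the post above.
        long triggered = 9339, inProgress = 0, completed = 8439, failed = 900;

        // Sanity check: every triggered checkpoint is in progress, completed, or failed.
        System.out.println("counters consistent: " + (inProgress + completed + failed == triggered));
        System.out.printf("failure rate: %.2f%%%n", failureRatePercent(triggered, failed));
    }
}
```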
We have also hit this problem in our 1.10 production environment, and there are several issues discussing it, for example:
https://issues.apache.org/jira/browse/FLINK-18196

Two approaches are mentioned there, and we have looked into both:
1. Switch to a different JDK version. We have not actually tried this, because it requires updating the JDK on the NodeManager, which is fairly costly.
2. Create a new CheckpointMetaData instance. After making this change, the problem has not appeared in production again, but the root cause is still unclear to us.

Hope this helps.

Best,
Hailong Wang

At 2020-10-13 18:04:11, "Storm☀️" <[hidden email]> wrote:
> [...]
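The second approach is described only as re-creating a CheckpointMetaData, and the thread does not show the actual patch. As a rough illustration of the general pattern (copying an object's fields into a fresh instance before a lambda captures it), here is a minimal sketch. Everything in it is an assumption: `MetaDataStandIn` is a hypothetical stand-in and not Flink's `CheckpointMetaData`, and the code neither reproduces the JDK-specific failure nor reflects how `StreamTask` is really structured.

```java
import java.util.function.Supplier;

public class DefensiveCopySketch {
    /** Hypothetical stand-in for checkpoint metadata; NOT the real Flink class. */
    public static final class MetaDataStandIn {
        private final long checkpointId;
        private final long timestamp;

        public MetaDataStandIn(long checkpointId, long timestamp) {
            this.checkpointId = checkpointId;
            this.timestamp = timestamp;
        }

        public long getCheckpointId() { return checkpointId; }
        public long getTimestamp() { return timestamp; }
    }

    /** Builds a deferred action that reads from a fresh copy, not the caller's reference. */
    public static Supplier<Long> deferredCheckpoint(MetaDataStandIn received) {
        // Copy the fields into a new object before the lambda captures it.
        final MetaDataStandIn copy =
                new MetaDataStandIn(received.getCheckpointId(), received.getTimestamp());
        return () -> copy.getCheckpointId();
    }

    public static void main(String[] args) {
        Supplier<Long> action = deferredCheckpoint(new MetaDataStandIn(1194L, 0L));
        System.out.println("checkpoint id seen by deferred action: " + action.get());
    }
}
```

Whether a copy like this addresses the underlying cause is exactly what the post above leaves open; it is reported only as something that made the symptom disappear.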
FYI, sharing an article that may be related [1].

[1] https://cloud.tencent.com/developer/news/564780

Best,
Congxian

Storm☀️ <[hidden email]> wrote on Thu, Oct 15, 2020 at 10:42 AM:
> Thank you very much.
> I will keep following this issue and report back here with any conclusions, for everyone's reference.