hi, I have run into this problem as well. What is the root cause?
------------------ Original Message ------------------
From: "chenkaibit" <[hidden email]>
Sent: Saturday, May 9, 2020, 12:09 PM
To: "user-zh" <[hidden email]>
Subject: Re:Re:Re: flink-1.10 checkpoint occasionally throws NullPointerException

Hi:
After adding some logging, I found that checkpointMetaData was null:
https://github.com/apache/flink/blob/release-1.10.0/flink-streaming-java/src/main/java/org/apache/flink/streaming/runtime/tasks/StreamTask.java#L1421
The test program reads from Kafka, runs a word count, and writes the result back to Kafka. The checkpoint configuration is:

| Checkpointing Mode | Exactly Once |
| Interval | 5s |
| Timeout | 10m 0s |
| Minimum Pause Between Checkpoints | 0ms |
| Maximum Concurrent Checkpoints | 1 |

The NPE is thrown consistently at checkpoint 5377.

The cause is still unclear, but after modifying some code (see https://github.com/yuchuanchen/flink/commit/e5122d9787be1fee9bce141887e0d70c9b0a4f19) the NPE no longer occurs.

At 2020-04-21 10:21:56, "chenkaibit" <[hidden email]> wrote:
>This does not reproduce reliably, but it has shown up in several jobs I tested on 1.10 recently, and there were no other errors when it was triggered. I have added some logging and will keep observing.
>
>At 2020-04-21 01:12:48, "Yun Tang" <[hidden email]> wrote:
>>Hi
>>
>>This NPE is a bit odd. From the executeCheckpointing method [1] alone it is hard to tell which variable, or which value held by a variable, is null.
>>One way to investigate is to enable DEBUG-level logging for org.apache.flink.streaming.runtime.tasks and use the debug output to narrow down which variable is null.
>>
>>When this exception occurs, is there anything abnormal in the logs of the affected task? What triggers the NPE, and does it reproduce reliably?
>>
>>[1] https://github.com/apache/flink/blob/aa4eb8f0c9ce74e6b92c3d9be5dc8e8cb536239d/flink-streaming-java/src/main/java/org/apache/flink/streaming/runtime/tasks/StreamTask.java#L1349
>>
>>Best regards,
>>Yun Tang
>>
>>________________________________
>>From: chenkaibit <[hidden email]>
>>Sent: Monday, April 20, 2020 18:39
>>To: [hidden email] <[hidden email]>
>>Subject: flink-1.10 checkpoint occasionally throws NullPointerException
>>
>>Has anyone run into this error? A NullPointerException is thrown in CheckpointingOperation.executeCheckpointing:
>>java.lang.Exception: Could not perform checkpoint 5505 for operator Source: KafkaTableSource(xxx) -> SourceConversion(table=[xxx, source: [KafkaTableSource(xxx)]], fields=[xxx]) -> Calc(select=[xxx) AS xxx]) -> SinkConversionToTuple2 -> Sink: Elasticsearch6UpsertTableSink(xxx) (1/1).
>>    at org.apache.flink.streaming.runtime.tasks.StreamTask.triggerCheckpoint(StreamTask.java:802)
>>    at org.apache.flink.streaming.runtime.tasks.StreamTask.lambda$triggerCheckpointAsync$3(StreamTask.java:777)
>>    at org.apache.flink.streaming.runtime.tasks.StreamTask$$Lambda$228/1024478318.call(Unknown Source)
>>    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>>    at org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$SynchronizedStreamTaskActionExecutor.run(StreamTaskActionExecutor.java:87)
>>    at org.apache.flink.streaming.runtime.tasks.mailbox.Mail.run(Mail.java:78)
>>    at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.processMail(MailboxProcessor.java:261)
>>    at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxLoop(MailboxProcessor.java:186)
>>    at org.apache.flink.streaming.runtime.tasks.StreamTask.runMailboxLoop(StreamTask.java:487)
>>    at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:470)
>>    at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:707)
>>    at org.apache.flink.runtime.taskmanager.Task.run(Task.java:532)
>>    at java.lang.Thread.run(Thread.java:745)
>>Caused by: java.lang.NullPointerException
>>    at org.apache.flink.streaming.runtime.tasks.StreamTask$CheckpointingOperation.executeCheckpointing(StreamTask.java:1411)
>>    at org.apache.flink.streaming.runtime.tasks.StreamTask.checkpointState(StreamTask.java:991)
>>    at org.apache.flink.streaming.runtime.tasks.StreamTask.lambda$performCheckpoint$5(StreamTask.java:887)
>>    at org.apache.flink.streaming.runtime.tasks.StreamTask$$Lambda$229/1010499540.run(Unknown Source)
>>    at org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$SynchronizedStreamTaskActionExecutor.runThrowing(StreamTaskActionExecutor.java:94)
>>    at org.apache.flink.streaming.runtime.tasks.StreamTask.performCheckpoint(StreamTask.java:860)
>>    at org.apache.flink.streaming.runtime.tasks.StreamTask.triggerCheckpoint(StreamTask.java:793)
>>    ... 12 more
Hi
This looks like a case where, on specific JDK versions, objects used in certain code patterns get reclaimed early. My guess is that it is related to GC. I previously came across a post that may be relevant [1].

[1] https://cloud.tencent.com/developer/news/564780

Best,
Congxian

蒋佳成(Jiacheng Jiang) <[hidden email]> wrote on Wed, Nov 4, 2020 at 11:33 AM:
> hi, I have run into this problem as well. What is the root cause?
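
The diagnosis step discussed in this thread (narrowing down which reference inside executeCheckpointing is null) can be sketched in plain Java. This is only an illustration under stated assumptions: the classes NullCheckSketch, CheckpointMetaData, and CheckpointingOperation below are hypothetical stand-ins, not Flink's StreamTask code and not the change in the linked commit. The idea is that explicit, named null checks turn an anonymous NPE into one that names the offending field, which is roughly what the added logging achieved.

```java
import java.util.Objects;

public class NullCheckSketch {

    /** Hypothetical stand-in for the metadata object reported as null in the thread. */
    static final class CheckpointMetaData {
        final long checkpointId;

        CheckpointMetaData(long checkpointId) {
            this.checkpointId = checkpointId;
        }
    }

    /** Hypothetical stand-in for an operation that dereferences several fields. */
    static final class CheckpointingOperation {
        private final String owner;
        private final CheckpointMetaData checkpointMetaData;

        CheckpointingOperation(String owner, CheckpointMetaData checkpointMetaData) {
            this.owner = owner;
            this.checkpointMetaData = checkpointMetaData;
        }

        String executeCheckpointing() {
            // Check each reference separately so the exception message names the
            // null field instead of only pointing at a line with several dereferences.
            Objects.requireNonNull(owner, "owner is null");
            Objects.requireNonNull(checkpointMetaData, "checkpointMetaData is null");
            return owner + " checkpoint " + checkpointMetaData.checkpointId;
        }
    }

    public static void main(String[] args) {
        CheckpointingOperation healthy =
                new CheckpointingOperation("wordcount-task", new CheckpointMetaData(5376L));
        System.out.println(healthy.executeCheckpointing());

        CheckpointingOperation broken = new CheckpointingOperation("wordcount-task", null);
        try {
            broken.executeCheckpointing();
        } catch (NullPointerException e) {
            // The message now identifies which reference was null.
            System.out.println("NPE: " + e.getMessage());
        }
    }
}
```

Checks like these only locate the null reference. They do not explain why it became null, which is the question that remains open in this thread.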