Has anyone run into this error? A NullPointerException is thrown in CheckpointingOperation.executeCheckpointing:
java.lang.Exception: Could not perform checkpoint 5505 for operator Source: KafkaTableSource(xxx) -> SourceConversion(table=[xxx, source: [KafkaTableSource(xxx)]], fields=[xxx]) -> Calc(select=[xxx) AS xxx]) -> SinkConversionToTuple2 -> Sink: Elasticsearch6UpsertTableSink(xxx) (1/1).
    at org.apache.flink.streaming.runtime.tasks.StreamTask.triggerCheckpoint(StreamTask.java:802)
    at org.apache.flink.streaming.runtime.tasks.StreamTask.lambda$triggerCheckpointAsync$3(StreamTask.java:777)
    at org.apache.flink.streaming.runtime.tasks.StreamTask$$Lambda$228/1024478318.call(Unknown Source)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$SynchronizedStreamTaskActionExecutor.run(StreamTaskActionExecutor.java:87)
    at org.apache.flink.streaming.runtime.tasks.mailbox.Mail.run(Mail.java:78)
    at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.processMail(MailboxProcessor.java:261)
    at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxLoop(MailboxProcessor.java:186)
    at org.apache.flink.streaming.runtime.tasks.StreamTask.runMailboxLoop(StreamTask.java:487)
    at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:470)
    at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:707)
    at org.apache.flink.runtime.taskmanager.Task.run(Task.java:532)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.NullPointerException
    at org.apache.flink.streaming.runtime.tasks.StreamTask$CheckpointingOperation.executeCheckpointing(StreamTask.java:1411)
    at org.apache.flink.streaming.runtime.tasks.StreamTask.checkpointState(StreamTask.java:991)
    at org.apache.flink.streaming.runtime.tasks.StreamTask.lambda$performCheckpoint$5(StreamTask.java:887)
    at org.apache.flink.streaming.runtime.tasks.StreamTask$$Lambda$229/1010499540.run(Unknown Source)
    at org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$SynchronizedStreamTaskActionExecutor.runThrowing(StreamTaskActionExecutor.java:94)
    at org.apache.flink.streaming.runtime.tasks.StreamTask.performCheckpoint(StreamTask.java:860)
    at org.apache.flink.streaming.runtime.tasks.StreamTask.triggerCheckpoint(StreamTask.java:793)
    ... 12 more
Hi
This NPE is a bit strange: from the executeCheckpointing method [1] alone, it is hard to tell which variable, or which variable's value, is null.
One way to investigate is to enable DEBUG-level logging for org.apache.flink.streaming.runtime.tasks and use the debug output to narrow down which variable is null.

When this exception occurs, is there anything abnormal in the logs of the affected task? What conditions trigger the NPE, and does it reproduce reliably?

[1] https://github.com/apache/flink/blob/aa4eb8f0c9ce74e6b92c3d9be5dc8e8cb536239d/flink-streaming-java/src/main/java/org/apache/flink/streaming/runtime/tasks/StreamTask.java#L1349

Best regards,
Yun Tang

________________________________
From: chenkaibit <[hidden email]>
Sent: Monday, April 20, 2020 18:39
To: [hidden email] <[hidden email]>
Subject: flink-1.10 checkpoint occasionally throws NullPointerException
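The DEBUG logging suggested above could be enabled with a single logger entry. This is only a sketch: it assumes the cluster uses the log4j 1.x properties setup that Flink 1.10 distributions ship by default, and that the file being edited is conf/log4j.properties; a different logging backend would need its own syntax.

```properties
# Assumption: log4j 1.x syntax, added to conf/log4j.properties on the TaskManagers.
# Raises only the StreamTask package to DEBUG so the rest of the log stays at its current level.
log4j.logger.org.apache.flink.streaming.runtime.tasks=DEBUG
```

Scoping the change to one package keeps the extra log volume manageable on a job that checkpoints every few seconds.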
It does not reproduce reliably, but it has shown up in several jobs I tested on 1.10 recently, and there were no other errors when it was triggered. I have added some logging and will keep watching.

(In reply to the message from "Yun Tang" <[hidden email]> of 2020-04-21 01:12:48.)
Hi:
After adding some logging, I found that checkpointMetaData was null:
https://github.com/apache/flink/blob/release-1.10.0/flink-streaming-java/src/main/java/org/apache/flink/streaming/runtime/tasks/StreamTask.java#L1421

The test program reads from Kafka, runs a word count, and writes the results to Kafka. The checkpoint configuration is:

| Checkpointing Mode | Exactly Once |
| Interval | 5s |
| Timeout | 10m 0s |
| Minimum Pause Between Checkpoints | 0ms |
| Maximum Concurrent Checkpoints | 1 |

The NPE is thrown consistently at checkpoint 5377.
The cause is still unclear, but after I changed part of the code (see https://github.com/yuchuanchen/flink/commit/e5122d9787be1fee9bce141887e0d70c9b0a4f19), the NPE no longer occurs.

(In reply to the message from "chenkaibit" <[hidden email]> of 2020-04-21 10:21:56.)
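For readers who want to narrow down a null reference the same way, here is a minimal, self-contained sketch of naming each suspect in its own null check. It is not the logging that was actually added and it is not Flink code: the class NullDiagnosisSketch, the MetaData stand-in, and the describe method are all hypothetical names used only for illustration.

```java
import java.util.Objects;

public class NullDiagnosisSketch {

    /** Hypothetical stand-in for a metadata object; NOT Flink's CheckpointMetaData. */
    public static final class MetaData {
        final long checkpointId;

        public MetaData(long checkpointId) {
            this.checkpointId = checkpointId;
        }
    }

    /** Hypothetical method that dereferences several references on nearby lines. */
    public static String describe(MetaData checkpointMetaData, String operatorName) {
        // One named check per reference: the NPE message then says which one was null,
        // instead of leaving only a line number to guess from.
        Objects.requireNonNull(checkpointMetaData, "checkpointMetaData is null");
        Objects.requireNonNull(operatorName, "operatorName is null");
        return "checkpoint " + checkpointMetaData.checkpointId + " for " + operatorName;
    }

    public static void main(String[] args) {
        System.out.println(describe(new MetaData(5377L), "Source"));
        try {
            describe(null, "Source");
        } catch (NullPointerException e) {
            // The message identifies the offending reference.
            System.out.println("NPE: " + e.getMessage());
        }
    }
}
```

A check like this only reports which reference is null; it does not explain why it became null, which is the open question in this thread.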
I don't think this change really helps. If checkpointMetaData is null, won't it still fail?

(In reply to the message from chenkaibit of 2020-05-09 12:09.)

faaron zheng
Email: [hidden email]