flink on yarn启动失败

classic Classic list List threaded Threaded
3 messages Options
Reply | Threaded
Open this post in threaded view
|

flink on yarn启动失败

magichuang
机器参数:三台  32C64G  centos  7.8,cdh集群在这上面先部署
flink版本:1.11.2,在三台集群上搭建的集群

hadoop集群是用cdh搭建的


启动命令:flink run -m yarn-cluster -ynm traffic -s 2 -p 2 -ytm 1024 -py traffic.py

程序使用pyflink开发的,从kafka读取数据,然后用滚动窗口聚合每分钟的数据在写入kafka




这个程序在local模式下是正常运行的,但是用per-job模式提交总是失败

测试官方例子  flink run -m yarn-cluster examples/batch/WordCount.jar   是可以输出结果的,所以想请教一下这个是yarn的问题还是程序的问题啊?




下面是主要报错信息

Caused by: java.util.concurrent.CompletionException: org.apache.flink.runtime.client.JobExecutionException: Could not instantiate JobManager.

at org.apache.flink.runtime.dispatcher.Dispatcher.lambda$createJobManagerRunner$6(Dispatcher.java:398) ~[flink-dist_2.12-1.11.2.jar:1.11.2]

at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590) ~[?:1.8.0_202]

at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:40) ~[flink-dist_2.12-1.11.2.jar:1.11.2]

at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(ForkJoinExecutorConfigurator.scala:44) ~[flink-dist_2.12-1.11.2.jar:1.11.2]

... 4 more

Caused by: org.apache.flink.runtime.client.JobExecutionException: Could not instantiate JobManager.

at org.apache.flink.runtime.dispatcher.Dispatcher.lambda$createJobManagerRunner$6(Dispatcher.java:398) ~[flink-dist_2.12-1.11.2.jar:1.11.2]

at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590) ~[?:1.8.0_202]

at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:40) ~[flink-dist_2.12-1.11.2.jar:1.11.2]

at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(ForkJoinExecutorConfigurator.scala:44) ~[flink-dist_2.12-1.11.2.jar:1.11.2]

... 4 more

Caused by: java.io.FileNotFoundException: Cannot find checkpoint or savepoint file/directory '2' on file system 'file'.

at org.apache.flink.runtime.state.filesystem.AbstractFsCheckpointStorage.resolveCheckpointPointer(AbstractFsCheckpointStorage.java:243) ~[flink-dist_2.12-1.11.2.jar:1.11.2]

at org.apache.flink.runtime.state.filesystem.AbstractFsCheckpointStorage.resolveCheckpoint(AbstractFsCheckpointStorage.java:110) ~[flink-dist_2.12-1.11.2.jar:1.11.2]

at org.apache.flink.runtime.checkpoint.CheckpointCoordinator.restoreSavepoint(CheckpointCoordinator.java:1394) ~[flink-dist_2.12-1.11.2.jar:1.11.2]

at org.apache.flink.runtime.scheduler.SchedulerBase.tryRestoreExecutionGraphFromSavepoint(SchedulerBase.java:300) ~[flink-dist_2.12-1.11.2.jar:1.11.2]

at org.apache.flink.runtime.scheduler.SchedulerBase.createAndRestoreExecutionGraph(SchedulerBase.java:253) ~[flink-dist_2.12-1.11.2.jar:1.11.2]

at org.apache.flink.runtime.scheduler.SchedulerBase.<init>(SchedulerBase.java:229) ~[flink-dist_2.12-1.11.2.jar:1.11.2]

at org.apache.flink.runtime.scheduler.DefaultScheduler.<init>(DefaultScheduler.java:119) ~[flink-dist_2.12-1.11.2.jar:1.11.2]

at org.apache.flink.runtime.scheduler.DefaultSchedulerFactory.createInstance(DefaultSchedulerFactory.java:103) ~[flink-dist_2.12-1.11.2.jar:1.11.2]

at org.apache.flink.runtime.jobmaster.JobMaster.createScheduler(JobMaster.java:284) ~[flink-dist_2.12-1.11.2.jar:1.11.2]

at org.apache.flink.runtime.jobmaster.JobMaster.<init>(JobMaster.java:272) ~[flink-dist_2.12-1.11.2.jar:1.11.2]

at org.apache.flink.runtime.jobmaster.factories.DefaultJobMasterServiceFactory.createJobMasterService(DefaultJobMasterServiceFactory.java:98) ~[flink-dist_2.12-1.11.2.jar:1.11.2]

at org.apache.flink.runtime.jobmaster.factories.DefaultJobMasterServiceFactory.createJobMasterService(DefaultJobMasterServiceFactory.java:40) ~[flink-dist_2.12-1.11.2.jar:1.11.2]

at org.apache.flink.runtime.jobmaster.JobManagerRunnerImpl.<init>(JobManagerRunnerImpl.java:140) ~[flink-dist_2.12-1.11.2.jar:1.11.2]

at org.apache.flink.runtime.dispatcher.DefaultJobManagerRunnerFactory.createJobManagerRunner(DefaultJobManagerRunnerFactory.java:84) ~[flink-dist_2.12-1.11.2.jar:1.11.2]

at org.apache.flink.runtime.dispatcher.Dispatcher.lambda$createJobManagerRunner$6(Dispatcher.java:388) ~[flink-dist_2.12-1.11.2.jar:1.11.2]

at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590) ~[?:1.8.0_202]

at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:40) ~[flink-dist_2.12-1.11.2.jar:1.11.2]

at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(ForkJoinExecutorConfigurator.scala:44) ~[flink-dist_2.12-1.11.2.jar:1.11.2]

... 4 more

2020-12-23 20:12:46,459 INFO org.apache.flink.runtime.blob.BlobServer [] - Stopped BLOB server at 0.0.0.0:16109







全部日志可以打开下面的链接:https://note.youdao.com/ynoteshare1/index.html?id=25f1af945e277057c2251e8f60d90f8a&type=note
加载可能慢一些,请稍等一会就出来了~













Best,

MagicHuang





Reply | Threaded
Open this post in threaded view
|

Re: flink on yarn启动失败

Yang Wang
你这个命令写的有点问题,flink run -m yarn-cluster -ynm traffic -s 2 -p 2 -ytm 1024 -py
traffic.py

应该是-ys,而不是-s
-s是从savepoints恢复,所以报错里面会有找不到savepoints目录


Best,
Yang

magichuang <[hidden email]> 于2020年12月23日周三 下午8:29写道:

> 机器参数:三台  32C64G  centos  7.8,cdh集群在这上面先部署
> flink版本:1.11.2,在三台集群上搭建的集群
>
> hadoop集群是用cdh搭建的
>
>
> 启动命令:flink run -m yarn-cluster -ynm traffic -s 2 -p 2 -ytm 1024 -py
> traffic.py
>
> 程序使用pyflink开发的,从kafka读取数据,然后用滚动窗口聚合每分钟的数据在写入kafka
>
>
>
>
> 这个程序在local模式下是正常运行的,但是用per-job模式提交总是失败
>
> 测试官方例子  flink run -m yarn-cluster examples/batch/WordCount.jar
>  是可以输出结果的,所以想请教一下这个是yarn的问题还是程序的问题啊?
>
>
>
>
> 下面是主要报错信息
>
> Caused by: java.util.concurrent.CompletionException:
> org.apache.flink.runtime.client.JobExecutionException: Could not
> instantiate JobManager.
>
> at
> org.apache.flink.runtime.dispatcher.Dispatcher.lambda$createJobManagerRunner$6(Dispatcher.java:398)
> ~[flink-dist_2.12-1.11.2.jar:1.11.2]
>
> at
> java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590)
> ~[?:1.8.0_202]
>
> at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:40)
> ~[flink-dist_2.12-1.11.2.jar:1.11.2]
>
> at
> akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(ForkJoinExecutorConfigurator.scala:44)
> ~[flink-dist_2.12-1.11.2.jar:1.11.2]
>
> ... 4 more
>
> Caused by: org.apache.flink.runtime.client.JobExecutionException: Could
> not instantiate JobManager.
>
> at
> org.apache.flink.runtime.dispatcher.Dispatcher.lambda$createJobManagerRunner$6(Dispatcher.java:398)
> ~[flink-dist_2.12-1.11.2.jar:1.11.2]
>
> at
> java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590)
> ~[?:1.8.0_202]
>
> at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:40)
> ~[flink-dist_2.12-1.11.2.jar:1.11.2]
>
> at
> akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(ForkJoinExecutorConfigurator.scala:44)
> ~[flink-dist_2.12-1.11.2.jar:1.11.2]
>
> ... 4 more
>
> Caused by: java.io.FileNotFoundException: Cannot find checkpoint or
> savepoint file/directory '2' on file system 'file'.
>
> at
> org.apache.flink.runtime.state.filesystem.AbstractFsCheckpointStorage.resolveCheckpointPointer(AbstractFsCheckpointStorage.java:243)
> ~[flink-dist_2.12-1.11.2.jar:1.11.2]
>
> at
> org.apache.flink.runtime.state.filesystem.AbstractFsCheckpointStorage.resolveCheckpoint(AbstractFsCheckpointStorage.java:110)
> ~[flink-dist_2.12-1.11.2.jar:1.11.2]
>
> at
> org.apache.flink.runtime.checkpoint.CheckpointCoordinator.restoreSavepoint(CheckpointCoordinator.java:1394)
> ~[flink-dist_2.12-1.11.2.jar:1.11.2]
>
> at
> org.apache.flink.runtime.scheduler.SchedulerBase.tryRestoreExecutionGraphFromSavepoint(SchedulerBase.java:300)
> ~[flink-dist_2.12-1.11.2.jar:1.11.2]
>
> at
> org.apache.flink.runtime.scheduler.SchedulerBase.createAndRestoreExecutionGraph(SchedulerBase.java:253)
> ~[flink-dist_2.12-1.11.2.jar:1.11.2]
>
> at
> org.apache.flink.runtime.scheduler.SchedulerBase.<init>(SchedulerBase.java:229)
> ~[flink-dist_2.12-1.11.2.jar:1.11.2]
>
> at
> org.apache.flink.runtime.scheduler.DefaultScheduler.<init>(DefaultScheduler.java:119)
> ~[flink-dist_2.12-1.11.2.jar:1.11.2]
>
> at
> org.apache.flink.runtime.scheduler.DefaultSchedulerFactory.createInstance(DefaultSchedulerFactory.java:103)
> ~[flink-dist_2.12-1.11.2.jar:1.11.2]
>
> at
> org.apache.flink.runtime.jobmaster.JobMaster.createScheduler(JobMaster.java:284)
> ~[flink-dist_2.12-1.11.2.jar:1.11.2]
>
> at org.apache.flink.runtime.jobmaster.JobMaster.<init>(JobMaster.java:272)
> ~[flink-dist_2.12-1.11.2.jar:1.11.2]
>
> at
> org.apache.flink.runtime.jobmaster.factories.DefaultJobMasterServiceFactory.createJobMasterService(DefaultJobMasterServiceFactory.java:98)
> ~[flink-dist_2.12-1.11.2.jar:1.11.2]
>
> at
> org.apache.flink.runtime.jobmaster.factories.DefaultJobMasterServiceFactory.createJobMasterService(DefaultJobMasterServiceFactory.java:40)
> ~[flink-dist_2.12-1.11.2.jar:1.11.2]
>
> at
> org.apache.flink.runtime.jobmaster.JobManagerRunnerImpl.<init>(JobManagerRunnerImpl.java:140)
> ~[flink-dist_2.12-1.11.2.jar:1.11.2]
>
> at
> org.apache.flink.runtime.dispatcher.DefaultJobManagerRunnerFactory.createJobManagerRunner(DefaultJobManagerRunnerFactory.java:84)
> ~[flink-dist_2.12-1.11.2.jar:1.11.2]
>
> at
> org.apache.flink.runtime.dispatcher.Dispatcher.lambda$createJobManagerRunner$6(Dispatcher.java:388)
> ~[flink-dist_2.12-1.11.2.jar:1.11.2]
>
> at
> java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590)
> ~[?:1.8.0_202]
>
> at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:40)
> ~[flink-dist_2.12-1.11.2.jar:1.11.2]
>
> at
> akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(ForkJoinExecutorConfigurator.scala:44)
> ~[flink-dist_2.12-1.11.2.jar:1.11.2]
>
> ... 4 more
>
> 2020-12-23 20:12:46,459 INFO org.apache.flink.runtime.blob.BlobServer [] -
> Stopped BLOB server at 0.0.0.0:16109
>
>
>
>
>
>
>
> 全部日志可以打开下面的链接:
> https://note.youdao.com/ynoteshare1/index.html?id=25f1af945e277057c2251e8f60d90f8a&type=note
> 加载可能慢一些,请稍等一会就出来了~
>
>
>
>
>
>
>
>
>
>
>
>
>
> Best,
>
> MagicHuang
>
>
>
>
>
>
Reply | Threaded
Open this post in threaded view
|

Re: Re: flink on yarn启动失败

magichuang
感谢感谢感谢!!!

原来是这样,以为solt 缩写就是-s了,,,感谢这位朋友的解答,已经可以提交了~


> ------------------ 原始邮件 ------------------
> 发 件 人:"Yang Wang" <[hidden email]>
> 发送时间:2020-12-24 11:01:46
> 收 件 人:user-zh <[hidden email]>
> 抄 送:
> 主 题:Re: flink on yarn启动失败
>
> 你这个命令写的有点问题,flink run -m yarn-cluster -ynm traffic -s 2 -p 2 -ytm 1024 -py
> traffic.py
>
> 应该是-ys,而不是-s
> -s是从savepoints恢复,所以报错里面会有找不到savepoints目录
>
>
> Best,
> Yang
>
> magichuang 于2020年12月23日周三 下午8:29写道:
>
> > 机器参数:三台 32C64G centos 7.8,cdh集群在这上面先部署
> > flink版本:1.11.2,在三台集群上搭建的集群
> >
> > hadoop集群是用cdh搭建的
> >
> >
> > 启动命令:flink run -m yarn-cluster -ynm traffic -s 2 -p 2 -ytm 1024 -py
> > traffic.py
> >
> > 程序使用pyflink开发的,从kafka读取数据,然后用滚动窗口聚合每分钟的数据在写入kafka
> >
> >
> >
> >
> > 这个程序在local模式下是正常运行的,但是用per-job模式提交总是失败
> >
> > 测试官方例子 flink run -m yarn-cluster examples/batch/WordCount.jar
> > 是可以输出结果的,所以想请教一下这个是yarn的问题还是程序的问题啊?
> >
> >
> >
> >
> > 下面是主要报错信息
> >
> > Caused by: java.util.concurrent.CompletionException:
> > org.apache.flink.runtime.client.JobExecutionException: Could not
> > instantiate JobManager.
> >
> > at
> > org.apache.flink.runtime.dispatcher.Dispatcher.lambda$createJobManagerRunner$6(Dispatcher.java:398)
> > ~[flink-dist_2.12-1.11.2.jar:1.11.2]
> >
> > at
> > java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590)
> > ~[?:1.8.0_202]
> >
> > at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:40)
> > ~[flink-dist_2.12-1.11.2.jar:1.11.2]
> >
> > at
> > akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(ForkJoinExecutorConfigurator.scala:44)
> > ~[flink-dist_2.12-1.11.2.jar:1.11.2]
> >
> > ... 4 more
> >
> > Caused by: org.apache.flink.runtime.client.JobExecutionException: Could
> > not instantiate JobManager.
> >
> > at
> > org.apache.flink.runtime.dispatcher.Dispatcher.lambda$createJobManagerRunner$6(Dispatcher.java:398)
> > ~[flink-dist_2.12-1.11.2.jar:1.11.2]
> >
> > at
> > java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590)
> > ~[?:1.8.0_202]
> >
> > at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:40)
> > ~[flink-dist_2.12-1.11.2.jar:1.11.2]
> >
> > at
> > akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(ForkJoinExecutorConfigurator.scala:44)
> > ~[flink-dist_2.12-1.11.2.jar:1.11.2]
> >
> > ... 4 more
> >
> > Caused by: java.io.FileNotFoundException: Cannot find checkpoint or
> > savepoint file/directory '2' on file system 'file'.
> >
> > at
> > org.apache.flink.runtime.state.filesystem.AbstractFsCheckpointStorage.resolveCheckpointPointer(AbstractFsCheckpointStorage.java:243)
> > ~[flink-dist_2.12-1.11.2.jar:1.11.2]
> >
> > at
> > org.apache.flink.runtime.state.filesystem.AbstractFsCheckpointStorage.resolveCheckpoint(AbstractFsCheckpointStorage.java:110)
> > ~[flink-dist_2.12-1.11.2.jar:1.11.2]
> >
> > at
> > org.apache.flink.runtime.checkpoint.CheckpointCoordinator.restoreSavepoint(CheckpointCoordinator.java:1394)
> > ~[flink-dist_2.12-1.11.2.jar:1.11.2]
> >
> > at
> > org.apache.flink.runtime.scheduler.SchedulerBase.tryRestoreExecutionGraphFromSavepoint(SchedulerBase.java:300)
> > ~[flink-dist_2.12-1.11.2.jar:1.11.2]
> >
> > at
> > org.apache.flink.runtime.scheduler.SchedulerBase.createAndRestoreExecutionGraph(SchedulerBase.java:253)
> > ~[flink-dist_2.12-1.11.2.jar:1.11.2]
> >
> > at
> > org.apache.flink.runtime.scheduler.SchedulerBase.(SchedulerBase.java:229)
> > ~[flink-dist_2.12-1.11.2.jar:1.11.2]
> >
> > at
> > org.apache.flink.runtime.scheduler.DefaultScheduler.(DefaultScheduler.java:119)
> > ~[flink-dist_2.12-1.11.2.jar:1.11.2]
> >
> > at
> > org.apache.flink.runtime.scheduler.DefaultSchedulerFactory.createInstance(DefaultSchedulerFactory.java:103)
> > ~[flink-dist_2.12-1.11.2.jar:1.11.2]
> >
> > at
> > org.apache.flink.runtime.jobmaster.JobMaster.createScheduler(JobMaster.java:284)
> > ~[flink-dist_2.12-1.11.2.jar:1.11.2]
> >
> > at org.apache.flink.runtime.jobmaster.JobMaster.(JobMaster.java:272)
> > ~[flink-dist_2.12-1.11.2.jar:1.11.2]
> >
> > at
> > org.apache.flink.runtime.jobmaster.factories.DefaultJobMasterServiceFactory.createJobMasterService(DefaultJobMasterServiceFactory.java:98)
> > ~[flink-dist_2.12-1.11.2.jar:1.11.2]
> >
> > at
> > org.apache.flink.runtime.jobmaster.factories.DefaultJobMasterServiceFactory.createJobMasterService(DefaultJobMasterServiceFactory.java:40)
> > ~[flink-dist_2.12-1.11.2.jar:1.11.2]
> >
> > at
> > org.apache.flink.runtime.jobmaster.JobManagerRunnerImpl.(JobManagerRunnerImpl.java:140)
> > ~[flink-dist_2.12-1.11.2.jar:1.11.2]
> >
> > at
> > org.apache.flink.runtime.dispatcher.DefaultJobManagerRunnerFactory.createJobManagerRunner(DefaultJobManagerRunnerFactory.java:84)
> > ~[flink-dist_2.12-1.11.2.jar:1.11.2]
> >
> > at
> > org.apache.flink.runtime.dispatcher.Dispatcher.lambda$createJobManagerRunner$6(Dispatcher.java:388)
> > ~[flink-dist_2.12-1.11.2.jar:1.11.2]
> >
> > at
> > java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590)
> > ~[?:1.8.0_202]
> >
> > at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:40)
> > ~[flink-dist_2.12-1.11.2.jar:1.11.2]
> >
> > at
> > akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(ForkJoinExecutorConfigurator.scala:44)
> > ~[flink-dist_2.12-1.11.2.jar:1.11.2]
> >
> > ... 4 more
> >
> > 2020-12-23 20:12:46,459 INFO org.apache.flink.runtime.blob.BlobServer [] -
> > Stopped BLOB server at 0.0.0.0:16109
> >
> >
> >
> >
> >
> >
> >
> > 全部日志可以打开下面的链接:
> > https://note.youdao.com/ynoteshare1/index.html?id=25f1af945e277057c2251e8f60d90f8a&type=note
> > 加载可能慢一些,请稍等一会就出来了~
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> > Best,
> >
> > MagicHuang
> >
> >
> >
> >
> >
> >



--

Best,

MagicHuang