Machine specs: three 32C/64G CentOS 7.8 nodes; a CDH cluster is deployed on them first.
Flink version: 1.11.2, with the Flink cluster set up on these three machines; the Hadoop cluster was built with CDH.

Launch command: flink run -m yarn-cluster -ynm traffic -s 2 -p 2 -ytm 1024 -py traffic.py

The program is developed with PyFlink: it reads data from Kafka, aggregates it per minute with a tumbling window, and writes the results back to Kafka.

The program runs fine in local mode, but submitting it in per-job mode always fails.
Testing the official example flink run -m yarn-cluster examples/batch/WordCount.jar does produce output, so I would like to ask whether this is a YARN problem or a problem with the program.

The main error output is below:

Caused by: java.util.concurrent.CompletionException: org.apache.flink.runtime.client.JobExecutionException: Could not instantiate JobManager.
    at org.apache.flink.runtime.dispatcher.Dispatcher.lambda$createJobManagerRunner$6(Dispatcher.java:398) ~[flink-dist_2.12-1.11.2.jar:1.11.2]
    at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590) ~[?:1.8.0_202]
    at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:40) ~[flink-dist_2.12-1.11.2.jar:1.11.2]
    at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(ForkJoinExecutorConfigurator.scala:44) ~[flink-dist_2.12-1.11.2.jar:1.11.2]
    ... 4 more
Caused by: org.apache.flink.runtime.client.JobExecutionException: Could not instantiate JobManager.
    at org.apache.flink.runtime.dispatcher.Dispatcher.lambda$createJobManagerRunner$6(Dispatcher.java:398) ~[flink-dist_2.12-1.11.2.jar:1.11.2]
    at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590) ~[?:1.8.0_202]
    at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:40) ~[flink-dist_2.12-1.11.2.jar:1.11.2]
    at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(ForkJoinExecutorConfigurator.scala:44) ~[flink-dist_2.12-1.11.2.jar:1.11.2]
    ... 4 more
Caused by: java.io.FileNotFoundException: Cannot find checkpoint or savepoint file/directory '2' on file system 'file'.
    at org.apache.flink.runtime.state.filesystem.AbstractFsCheckpointStorage.resolveCheckpointPointer(AbstractFsCheckpointStorage.java:243) ~[flink-dist_2.12-1.11.2.jar:1.11.2]
    at org.apache.flink.runtime.state.filesystem.AbstractFsCheckpointStorage.resolveCheckpoint(AbstractFsCheckpointStorage.java:110) ~[flink-dist_2.12-1.11.2.jar:1.11.2]
    at org.apache.flink.runtime.checkpoint.CheckpointCoordinator.restoreSavepoint(CheckpointCoordinator.java:1394) ~[flink-dist_2.12-1.11.2.jar:1.11.2]
    at org.apache.flink.runtime.scheduler.SchedulerBase.tryRestoreExecutionGraphFromSavepoint(SchedulerBase.java:300) ~[flink-dist_2.12-1.11.2.jar:1.11.2]
    at org.apache.flink.runtime.scheduler.SchedulerBase.createAndRestoreExecutionGraph(SchedulerBase.java:253) ~[flink-dist_2.12-1.11.2.jar:1.11.2]
    at org.apache.flink.runtime.scheduler.SchedulerBase.<init>(SchedulerBase.java:229) ~[flink-dist_2.12-1.11.2.jar:1.11.2]
    at org.apache.flink.runtime.scheduler.DefaultScheduler.<init>(DefaultScheduler.java:119) ~[flink-dist_2.12-1.11.2.jar:1.11.2]
    at org.apache.flink.runtime.scheduler.DefaultSchedulerFactory.createInstance(DefaultSchedulerFactory.java:103) ~[flink-dist_2.12-1.11.2.jar:1.11.2]
    at org.apache.flink.runtime.jobmaster.JobMaster.createScheduler(JobMaster.java:284) ~[flink-dist_2.12-1.11.2.jar:1.11.2]
    at org.apache.flink.runtime.jobmaster.JobMaster.<init>(JobMaster.java:272) ~[flink-dist_2.12-1.11.2.jar:1.11.2]
    at org.apache.flink.runtime.jobmaster.factories.DefaultJobMasterServiceFactory.createJobMasterService(DefaultJobMasterServiceFactory.java:98) ~[flink-dist_2.12-1.11.2.jar:1.11.2]
    at org.apache.flink.runtime.jobmaster.factories.DefaultJobMasterServiceFactory.createJobMasterService(DefaultJobMasterServiceFactory.java:40) ~[flink-dist_2.12-1.11.2.jar:1.11.2]
    at org.apache.flink.runtime.jobmaster.JobManagerRunnerImpl.<init>(JobManagerRunnerImpl.java:140) ~[flink-dist_2.12-1.11.2.jar:1.11.2]
    at org.apache.flink.runtime.dispatcher.DefaultJobManagerRunnerFactory.createJobManagerRunner(DefaultJobManagerRunnerFactory.java:84) ~[flink-dist_2.12-1.11.2.jar:1.11.2]
    at org.apache.flink.runtime.dispatcher.Dispatcher.lambda$createJobManagerRunner$6(Dispatcher.java:388) ~[flink-dist_2.12-1.11.2.jar:1.11.2]
    at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590) ~[?:1.8.0_202]
    at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:40) ~[flink-dist_2.12-1.11.2.jar:1.11.2]
    at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(ForkJoinExecutorConfigurator.scala:44) ~[flink-dist_2.12-1.11.2.jar:1.11.2]
    ... 4 more

2020-12-23 20:12:46,459 INFO org.apache.flink.runtime.blob.BlobServer [] - Stopped BLOB server at 0.0.0.0:16109

The full log can be opened at the link below:
https://note.youdao.com/ynoteshare1/index.html?id=25f1af945e277057c2251e8f60d90f8a&type=note
It may be slow to load; please wait a moment.

Best,

MagicHuang
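For context, here is a minimal PyFlink sketch of the kind of job described above (read from Kafka, aggregate per minute with a tumbling window, write back to Kafka). This is not the actual traffic.py: the topic names, field names, and broker address are made up for illustration, and the Kafka SQL connector jar is assumed to be available to the job.

from pyflink.datastream import StreamExecutionEnvironment
from pyflink.table import StreamTableEnvironment, EnvironmentSettings

env = StreamExecutionEnvironment.get_execution_environment()
t_env = StreamTableEnvironment.create(
    env,
    environment_settings=EnvironmentSettings.new_instance()
        .in_streaming_mode()
        .use_blink_planner()
        .build())

# Source: JSON records from Kafka, with an event-time column and watermark.
t_env.execute_sql("""
    CREATE TABLE traffic_src (
        ip STRING,
        bytes_sent BIGINT,
        ts TIMESTAMP(3),
        WATERMARK FOR ts AS ts - INTERVAL '5' SECOND
    ) WITH (
        'connector' = 'kafka',
        'topic' = 'traffic-in',
        'properties.bootstrap.servers' = 'kafka:9092',
        'properties.group.id' = 'traffic',
        'scan.startup.mode' = 'latest-offset',
        'format' = 'json'
    )
""")

# Sink: per-minute aggregates written back to Kafka.
t_env.execute_sql("""
    CREATE TABLE traffic_sink (
        ip STRING,
        window_end TIMESTAMP(3),
        total_bytes BIGINT
    ) WITH (
        'connector' = 'kafka',
        'topic' = 'traffic-out',
        'properties.bootstrap.servers' = 'kafka:9092',
        'format' = 'json'
    )
""")

# One-minute tumbling window aggregation; group windows emit append-only
# results, so a plain Kafka sink works. execute_sql() submits the INSERT
# job asynchronously; when running the script directly in local mode you
# may need to block on the returned TableResult before the process exits.
t_env.execute_sql("""
    INSERT INTO traffic_sink
    SELECT
        ip,
        TUMBLE_END(ts, INTERVAL '1' MINUTE) AS window_end,
        SUM(bytes_sent) AS total_bytes
    FROM traffic_src
    GROUP BY ip, TUMBLE(ts, INTERVAL '1' MINUTE)
""")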
There is a problem in your command: flink run -m yarn-cluster -ynm traffic -s 2 -p 2 -ytm 1024 -py traffic.py

It should be -ys, not -s. -s restores from a savepoint, which is why the error reports that the savepoint directory cannot be found.

Best,
Yang
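For reference, a sketch of the corrected command, plus the savepoint-restore form for contrast (the savepoint path below is only a hypothetical placeholder, not a real path from this job):

# -ys sets the number of slots per TaskManager, which is what was intended here:
flink run -m yarn-cluster -ynm traffic -ys 2 -p 2 -ytm 1024 -py traffic.py

# -s / --fromSavepoint restores state from an existing savepoint path, e.g.:
flink run -m yarn-cluster -ynm traffic -ys 2 -p 2 -ytm 1024 -s hdfs:///flink/savepoints/savepoint-xxxx -py traffic.py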
Thank you so much!!!
I see now: I had assumed -s was the abbreviation for slot. Thanks for the explanation; the job submits fine now.

Best,

MagicHuang