Flink 1.12 On Yarn 作业提交失败问题

classic Classic list List threaded Threaded
9 messages Options
Reply | Threaded
Open this post in threaded view
|

Flink 1.12 On Yarn 作业提交失败问题

LakeShen
Hi 社区,

          最近从 Flink 1.10 升级版本至 Flink 1.12,在提交作业到 Yarn 时,作业一直报错如下:


org.apache.flink.client.program.ProgramInvocationException: The main method
caused an error: Failed to execute sql

at
org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:365)

at
org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:218)

at org.apache.flink.client.ClientUtils.executeProgram(ClientUtils.java:114)

at
org.apache.flink.client.cli.CliFrontend.executeProgram(CliFrontend.java:812)

at org.apache.flink.client.cli.CliFrontend.run(CliFrontend.java:246)

at
org.apache.flink.client.cli.CliFrontend.parseAndRun(CliFrontend.java:1054)

at
org.apache.flink.client.cli.CliFrontend.lambda$main$10(CliFrontend.java:1132)

at java.security.AccessController.doPrivileged(Native Method)

at javax.security.auth.Subject.doAs(Subject.java:422)

at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1692)

at
org.apache.flink.runtime.security.contexts.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)

at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1132)

Caused by: org.apache.flink.table.api.TableException: Failed to execute sql

at
org.apache.flink.table.api.internal.TableEnvironmentImpl.executeInternal(TableEnvironmentImpl.java:699)

at
org.apache.flink.table.api.internal.TableEnvironmentImpl.executeOperation(TableEnvironmentImpl.java:767)

at
org.apache.flink.table.api.internal.TableEnvironmentImpl.executeSql(TableEnvironmentImpl.java:666)

at
com.youzan.bigdata.FlinkStreamSQLDDLJob.lambda$main$0(FlinkStreamSQLDDLJob.java:95)

at
java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1380)

at
java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:580)

at
com.youzan.bigdata.FlinkStreamSQLDDLJob.main(FlinkStreamSQLDDLJob.java:93)

at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)

at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)

at java.lang.reflect.Method.invoke(Method.java:498)

at
org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:348)

... 11 more

Caused by: org.apache.flink.client.deployment.ClusterDeploymentException:
Could not deploy Yarn job cluster.

at
org.apache.flink.yarn.YarnClusterDescriptor.deployJobCluster(YarnClusterDescriptor.java:481)

at
org.apache.flink.client.deployment.executors.AbstractJobClusterExecutor.execute(AbstractJobClusterExecutor.java:81)

at
org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.executeAsync(StreamExecutionEnvironment.java:1905)

at
org.apache.flink.client.program.StreamContextEnvironment.executeAsync(StreamContextEnvironment.java:135)

at
org.apache.flink.table.planner.delegation.ExecutorBase.executeAsync(ExecutorBase.java:55)

at
org.apache.flink.table.api.internal.TableEnvironmentImpl.executeInternal(TableEnvironmentImpl.java:681)

... 22 more

Caused by:
org.apache.flink.yarn.YarnClusterDescriptor$YarnDeploymentException: The
YARN application unexpectedly switched to state FAILED during deployment.
Diagnostics from YARN: Application application_1613992328588_4441 failed 2
times due to AM Container for appattempt_1613992328588_4441_000002 exited
with  exitCode: 1
Diagnostics: Exception from container-launch.
Container id: container_xxx
Exit code: 1
Stack trace: ExitCodeException exitCode=1:

at org.apache.hadoop.util.Shell.runCommand(Shell.java:575)

at org.apache.hadoop.util.Shell.run(Shell.java:478)

at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:766)

at
org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:212)

at
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)

at
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)

at java.util.concurrent.FutureTask.run(FutureTask.java:266)

at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)

at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)

at java.lang.Thread.run(Thread.java:748)


          相关信息如下:
          1. 我的 Flink 作业中没有 Hadoop 相关的依赖
          2. 提交作业的机器,以及 Hadoop 集群每台机器都有 HADOOP_CLASSPATH 环境变量
          3. Flink 作业提交到 Yarn 后,状态之后从 Accepted 到 FAILED 状态。

          希望有人帮我解惑,感谢

          Best,
          LakeShen
Reply | Threaded
Open this post in threaded view
|

Re:Flink 1.12 On Yarn 作业提交失败问题

Smile
HADOOP_CLASSPATH 看下环境变量的内容是什么,是否和 "hadoop classpath" 这个语句执行的结果一致?
根据 [1],Flink 1.11 开始,不再默认把 HDFS 相关的 jar 包打进 Flink 的包里面了,而是需要用户在执行时指定 HDFS 相关包路径,export HADOOP_CLASSPATH=`hadoop classpath` 这句话实际上的效果是执行 hadoop classpath 命令并将结果赋值给 HADOOP_CLASSPATH 这个系统变量。

另外控制变量的话你找个最简单的作业提交一下看看?
看上面的错误日志你提交的应该是个 SQL 作业,找个 DataStream 的 word count 提交看下报错信息是什么。

参考:
[1]. https://ci.apache.org/projects/flink/flink-docs-release-1.12/deployment/resource-providers/yarn.html
Reply | Threaded
Open this post in threaded view
|

回复:Flink 1.12 On Yarn 作业提交失败问题

凌战
In reply to this post by LakeShen
同提交作业到On Yarn集群,客户端的错误也是


org.apache.flink.yarn.YarnClusterDescriptor$YarnDeploymentException: The YARN application unexpectedly switched to state FAILED during deployment.
Diagnostics from YARN: Application application_1610671284452_0243 failed 10 times due to AM Container for appattempt_1610671284452_0243_000010 exited with  exitCode: 1
Failing this attempt.Diagnostics: [2021-02-23 18:51:00.021]Exception from container-launch.
Container id: container_e48_1610671284452_0243_10_000001
Exit code: 1


[2021-02-23 18:51:00.024]Container exited with a non-zero exit code 1. Error file: prelaunch.err.
Last 4096 bytes of prelaunch.err :


[2021-02-23 18:51:00.027]Container exited with a non-zero exit code 1. Error file: prelaunch.err.
Last 4096 bytes of prelaunch.err :


Yarn那边的日志显示:Could not find or load main class org.apache.flink.yarn.entrypoint.YarnJobClusterEntrypoint


不过我是Flink 1.12 的API,然后提交的集群还是Flink1.10.1的,不知道哪里的问题


| |
凌战
|
|
[hidden email]
|
签名由网易邮箱大师定制
在2021年2月23日 18:46,LakeShen<[hidden email]> 写道:
Hi 社区,

最近从 Flink 1.10 升级版本至 Flink 1.12,在提交作业到 Yarn 时,作业一直报错如下:


org.apache.flink.client.program.ProgramInvocationException: The main method
caused an error: Failed to execute sql

at
org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:365)

at
org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:218)

at org.apache.flink.client.ClientUtils.executeProgram(ClientUtils.java:114)

at
org.apache.flink.client.cli.CliFrontend.executeProgram(CliFrontend.java:812)

at org.apache.flink.client.cli.CliFrontend.run(CliFrontend.java:246)

at
org.apache.flink.client.cli.CliFrontend.parseAndRun(CliFrontend.java:1054)

at
org.apache.flink.client.cli.CliFrontend.lambda$main$10(CliFrontend.java:1132)

at java.security.AccessController.doPrivileged(Native Method)

at javax.security.auth.Subject.doAs(Subject.java:422)

at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1692)

at
org.apache.flink.runtime.security.contexts.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)

at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1132)

Caused by: org.apache.flink.table.api.TableException: Failed to execute sql

at
org.apache.flink.table.api.internal.TableEnvironmentImpl.executeInternal(TableEnvironmentImpl.java:699)

at
org.apache.flink.table.api.internal.TableEnvironmentImpl.executeOperation(TableEnvironmentImpl.java:767)

at
org.apache.flink.table.api.internal.TableEnvironmentImpl.executeSql(TableEnvironmentImpl.java:666)

at
com.youzan.bigdata.FlinkStreamSQLDDLJob.lambda$main$0(FlinkStreamSQLDDLJob.java:95)

at
java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1380)

at
java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:580)

at
com.youzan.bigdata.FlinkStreamSQLDDLJob.main(FlinkStreamSQLDDLJob.java:93)

at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)

at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)

at java.lang.reflect.Method.invoke(Method.java:498)

at
org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:348)

... 11 more

Caused by: org.apache.flink.client.deployment.ClusterDeploymentException:
Could not deploy Yarn job cluster.

at
org.apache.flink.yarn.YarnClusterDescriptor.deployJobCluster(YarnClusterDescriptor.java:481)

at
org.apache.flink.client.deployment.executors.AbstractJobClusterExecutor.execute(AbstractJobClusterExecutor.java:81)

at
org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.executeAsync(StreamExecutionEnvironment.java:1905)

at
org.apache.flink.client.program.StreamContextEnvironment.executeAsync(StreamContextEnvironment.java:135)

at
org.apache.flink.table.planner.delegation.ExecutorBase.executeAsync(ExecutorBase.java:55)

at
org.apache.flink.table.api.internal.TableEnvironmentImpl.executeInternal(TableEnvironmentImpl.java:681)

... 22 more

Caused by:
org.apache.flink.yarn.YarnClusterDescriptor$YarnDeploymentException: The
YARN application unexpectedly switched to state FAILED during deployment.
Diagnostics from YARN: Application application_1613992328588_4441 failed 2
times due to AM Container for appattempt_1613992328588_4441_000002 exited
with  exitCode: 1
Diagnostics: Exception from container-launch.
Container id: container_xxx
Exit code: 1
Stack trace: ExitCodeException exitCode=1:

at org.apache.hadoop.util.Shell.runCommand(Shell.java:575)

at org.apache.hadoop.util.Shell.run(Shell.java:478)

at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:766)

at
org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:212)

at
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)

at
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)

at java.util.concurrent.FutureTask.run(FutureTask.java:266)

at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)

at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)

at java.lang.Thread.run(Thread.java:748)


相关信息如下:
1. 我的 Flink 作业中没有 Hadoop 相关的依赖
2. 提交作业的机器,以及 Hadoop 集群每台机器都有 HADOOP_CLASSPATH 环境变量
3. Flink 作业提交到 Yarn 后,状态之后从 Accepted 到 FAILED 状态。

希望有人帮我解惑,感谢

Best,
LakeShen
Reply | Threaded
Open this post in threaded view
|

Re: Flink 1.12 On Yarn 作业提交失败问题

Smile
In reply to this post by LakeShen
// 上一次似乎没发成功,换个方式重发一次,若有打扰请见谅

HADOOP_CLASSPATH 看下环境变量的内容是什么,是否和 "hadoop classpath" 这个语句执行的结果一致?
根据 [1],Flink 1.11 开始,不再默认把 HDFS 相关的 jar 包打进 Flink 的包里面了,而是需要用户在执行时指定 HDFS
相关包路径,export HADOOP_CLASSPATH=`hadoop classpath` 这句话实际上的效果是执行 hadoop
classpath 命令并将结果赋值给 HADOOP_CLASSPATH 这个系统变量。

另外控制变量的话你找个最简单的作业提交一下看看?
看上面的错误日志你提交的应该是个 SQL 作业,找个 DataStream 的 word count 提交看下报错信息是什么。

参考:
[1].
https://ci.apache.org/projects/flink/flink-docs-release-1.12/deployment/resource-providers/yarn.html





--
Sent from: http://apache-flink.147419.n8.nabble.com/
Reply | Threaded
Open this post in threaded view
|

Re: Flink 1.12 On Yarn 作业提交失败问题

LakeShen
In reply to this post by 凌战
这个应该你的 flink 本地配置的目录要是 1.12 版本的,也就是 flink-dist 目录



凌战 <[hidden email]> 于2021年2月23日周二 下午7:33写道:

> 同提交作业到On Yarn集群,客户端的错误也是
>
>
> org.apache.flink.yarn.YarnClusterDescriptor$YarnDeploymentException: The
> YARN application unexpectedly switched to state FAILED during deployment.
> Diagnostics from YARN: Application application_1610671284452_0243 failed
> 10 times due to AM Container for appattempt_1610671284452_0243_000010
> exited with  exitCode: 1
> Failing this attempt.Diagnostics: [2021-02-23 18:51:00.021]Exception from
> container-launch.
> Container id: container_e48_1610671284452_0243_10_000001
> Exit code: 1
>
>
> [2021-02-23 18:51:00.024]Container exited with a non-zero exit code 1.
> Error file: prelaunch.err.
> Last 4096 bytes of prelaunch.err :
>
>
> [2021-02-23 18:51:00.027]Container exited with a non-zero exit code 1.
> Error file: prelaunch.err.
> Last 4096 bytes of prelaunch.err :
>
>
> Yarn那边的日志显示:Could not find or load main class
> org.apache.flink.yarn.entrypoint.YarnJobClusterEntrypoint
>
>
> 不过我是Flink 1.12 的API,然后提交的集群还是Flink1.10.1的,不知道哪里的问题
>
>
> | |
> 凌战
> |
> |
> [hidden email]
> |
> 签名由网易邮箱大师定制
> 在2021年2月23日 18:46,LakeShen<[hidden email]> 写道:
> Hi 社区,
>
> 最近从 Flink 1.10 升级版本至 Flink 1.12,在提交作业到 Yarn 时,作业一直报错如下:
>
>
> org.apache.flink.client.program.ProgramInvocationException: The main method
> caused an error: Failed to execute sql
>
> at
>
> org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:365)
>
> at
>
> org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:218)
>
> at org.apache.flink.client.ClientUtils.executeProgram(ClientUtils.java:114)
>
> at
>
> org.apache.flink.client.cli.CliFrontend.executeProgram(CliFrontend.java:812)
>
> at org.apache.flink.client.cli.CliFrontend.run(CliFrontend.java:246)
>
> at
> org.apache.flink.client.cli.CliFrontend.parseAndRun(CliFrontend.java:1054)
>
> at
>
> org.apache.flink.client.cli.CliFrontend.lambda$main$10(CliFrontend.java:1132)
>
> at java.security.AccessController.doPrivileged(Native Method)
>
> at javax.security.auth.Subject.doAs(Subject.java:422)
>
> at
>
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1692)
>
> at
>
> org.apache.flink.runtime.security.contexts.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)
>
> at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1132)
>
> Caused by: org.apache.flink.table.api.TableException: Failed to execute sql
>
> at
>
> org.apache.flink.table.api.internal.TableEnvironmentImpl.executeInternal(TableEnvironmentImpl.java:699)
>
> at
>
> org.apache.flink.table.api.internal.TableEnvironmentImpl.executeOperation(TableEnvironmentImpl.java:767)
>
> at
>
> org.apache.flink.table.api.internal.TableEnvironmentImpl.executeSql(TableEnvironmentImpl.java:666)
>
> at
>
> com.youzan.bigdata.FlinkStreamSQLDDLJob.lambda$main$0(FlinkStreamSQLDDLJob.java:95)
>
> at
>
> java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1380)
>
> at
> java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:580)
>
> at
> com.youzan.bigdata.FlinkStreamSQLDDLJob.main(FlinkStreamSQLDDLJob.java:93)
>
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>
> at
>
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>
> at
>
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>
> at java.lang.reflect.Method.invoke(Method.java:498)
>
> at
>
> org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:348)
>
> ... 11 more
>
> Caused by: org.apache.flink.client.deployment.ClusterDeploymentException:
> Could not deploy Yarn job cluster.
>
> at
>
> org.apache.flink.yarn.YarnClusterDescriptor.deployJobCluster(YarnClusterDescriptor.java:481)
>
> at
>
> org.apache.flink.client.deployment.executors.AbstractJobClusterExecutor.execute(AbstractJobClusterExecutor.java:81)
>
> at
>
> org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.executeAsync(StreamExecutionEnvironment.java:1905)
>
> at
>
> org.apache.flink.client.program.StreamContextEnvironment.executeAsync(StreamContextEnvironment.java:135)
>
> at
>
> org.apache.flink.table.planner.delegation.ExecutorBase.executeAsync(ExecutorBase.java:55)
>
> at
>
> org.apache.flink.table.api.internal.TableEnvironmentImpl.executeInternal(TableEnvironmentImpl.java:681)
>
> ... 22 more
>
> Caused by:
> org.apache.flink.yarn.YarnClusterDescriptor$YarnDeploymentException: The
> YARN application unexpectedly switched to state FAILED during deployment.
> Diagnostics from YARN: Application application_1613992328588_4441 failed 2
> times due to AM Container for appattempt_1613992328588_4441_000002 exited
> with  exitCode: 1
> Diagnostics: Exception from container-launch.
> Container id: container_xxx
> Exit code: 1
> Stack trace: ExitCodeException exitCode=1:
>
> at org.apache.hadoop.util.Shell.runCommand(Shell.java:575)
>
> at org.apache.hadoop.util.Shell.run(Shell.java:478)
>
> at
> org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:766)
>
> at
>
> org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:212)
>
> at
>
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
>
> at
>
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
>
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>
> at
>
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>
> at
>
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>
> at java.lang.Thread.run(Thread.java:748)
>
>
> 相关信息如下:
> 1. 我的 Flink 作业中没有 Hadoop 相关的依赖
> 2. 提交作业的机器,以及 Hadoop 集群每台机器都有 HADOOP_CLASSPATH 环境变量
> 3. Flink 作业提交到 Yarn 后,状态之后从 Accepted 到 FAILED 状态。
>
> 希望有人帮我解惑,感谢
>
> Best,
> LakeShen
>
Reply | Threaded
Open this post in threaded view
|

回复: Flink 1.12 On Yarn 作业提交失败问题

凌战
你是指提交时所依赖的flink-dist jar包需要是 1.12 版本吗,现在改成1.12 版本还是不行


| |
凌战
|
|
[hidden email]
|
签名由网易邮箱大师定制
在2021年2月23日 21:27,LakeShen<[hidden email]> 写道:
这个应该你的 flink 本地配置的目录要是 1.12 版本的,也就是 flink-dist 目录



凌战 <[hidden email]> 于2021年2月23日周二 下午7:33写道:

同提交作业到On Yarn集群,客户端的错误也是


org.apache.flink.yarn.YarnClusterDescriptor$YarnDeploymentException: The
YARN application unexpectedly switched to state FAILED during deployment.
Diagnostics from YARN: Application application_1610671284452_0243 failed
10 times due to AM Container for appattempt_1610671284452_0243_000010
exited with  exitCode: 1
Failing this attempt.Diagnostics: [2021-02-23 18:51:00.021]Exception from
container-launch.
Container id: container_e48_1610671284452_0243_10_000001
Exit code: 1


[2021-02-23 18:51:00.024]Container exited with a non-zero exit code 1.
Error file: prelaunch.err.
Last 4096 bytes of prelaunch.err :


[2021-02-23 18:51:00.027]Container exited with a non-zero exit code 1.
Error file: prelaunch.err.
Last 4096 bytes of prelaunch.err :


Yarn那边的日志显示:Could not find or load main class
org.apache.flink.yarn.entrypoint.YarnJobClusterEntrypoint


不过我是Flink 1.12 的API,然后提交的集群还是Flink1.10.1的,不知道哪里的问题


| |
凌战
|
|
[hidden email]
|
签名由网易邮箱大师定制
在2021年2月23日 18:46,LakeShen<[hidden email]> 写道:
Hi 社区,

最近从 Flink 1.10 升级版本至 Flink 1.12,在提交作业到 Yarn 时,作业一直报错如下:


org.apache.flink.client.program.ProgramInvocationException: The main method
caused an error: Failed to execute sql

at

org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:365)

at

org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:218)

at org.apache.flink.client.ClientUtils.executeProgram(ClientUtils.java:114)

at

org.apache.flink.client.cli.CliFrontend.executeProgram(CliFrontend.java:812)

at org.apache.flink.client.cli.CliFrontend.run(CliFrontend.java:246)

at
org.apache.flink.client.cli.CliFrontend.parseAndRun(CliFrontend.java:1054)

at

org.apache.flink.client.cli.CliFrontend.lambda$main$10(CliFrontend.java:1132)

at java.security.AccessController.doPrivileged(Native Method)

at javax.security.auth.Subject.doAs(Subject.java:422)

at

org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1692)

at

org.apache.flink.runtime.security.contexts.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)

at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1132)

Caused by: org.apache.flink.table.api.TableException: Failed to execute sql

at

org.apache.flink.table.api.internal.TableEnvironmentImpl.executeInternal(TableEnvironmentImpl.java:699)

at

org.apache.flink.table.api.internal.TableEnvironmentImpl.executeOperation(TableEnvironmentImpl.java:767)

at

org.apache.flink.table.api.internal.TableEnvironmentImpl.executeSql(TableEnvironmentImpl.java:666)

at

com.youzan.bigdata.FlinkStreamSQLDDLJob.lambda$main$0(FlinkStreamSQLDDLJob.java:95)

at

java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1380)

at
java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:580)

at
com.youzan.bigdata.FlinkStreamSQLDDLJob.main(FlinkStreamSQLDDLJob.java:93)

at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

at

sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)

at

sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)

at java.lang.reflect.Method.invoke(Method.java:498)

at

org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:348)

... 11 more

Caused by: org.apache.flink.client.deployment.ClusterDeploymentException:
Could not deploy Yarn job cluster.

at

org.apache.flink.yarn.YarnClusterDescriptor.deployJobCluster(YarnClusterDescriptor.java:481)

at

org.apache.flink.client.deployment.executors.AbstractJobClusterExecutor.execute(AbstractJobClusterExecutor.java:81)

at

org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.executeAsync(StreamExecutionEnvironment.java:1905)

at

org.apache.flink.client.program.StreamContextEnvironment.executeAsync(StreamContextEnvironment.java:135)

at

org.apache.flink.table.planner.delegation.ExecutorBase.executeAsync(ExecutorBase.java:55)

at

org.apache.flink.table.api.internal.TableEnvironmentImpl.executeInternal(TableEnvironmentImpl.java:681)

... 22 more

Caused by:
org.apache.flink.yarn.YarnClusterDescriptor$YarnDeploymentException: The
YARN application unexpectedly switched to state FAILED during deployment.
Diagnostics from YARN: Application application_1613992328588_4441 failed 2
times due to AM Container for appattempt_1613992328588_4441_000002 exited
with  exitCode: 1
Diagnostics: Exception from container-launch.
Container id: container_xxx
Exit code: 1
Stack trace: ExitCodeException exitCode=1:

at org.apache.hadoop.util.Shell.runCommand(Shell.java:575)

at org.apache.hadoop.util.Shell.run(Shell.java:478)

at
org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:766)

at

org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:212)

at

org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)

at

org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)

at java.util.concurrent.FutureTask.run(FutureTask.java:266)

at

java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)

at

java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)

at java.lang.Thread.run(Thread.java:748)


相关信息如下:
1. 我的 Flink 作业中没有 Hadoop 相关的依赖
2. 提交作业的机器,以及 Hadoop 集群每台机器都有 HADOOP_CLASSPATH 环境变量
3. Flink 作业提交到 Yarn 后,状态之后从 Accepted 到 FAILED 状态。

希望有人帮我解惑,感谢

Best,
LakeShen

Reply | Threaded
Open this post in threaded view
|

Re: Flink 1.12 On Yarn 作业提交失败问题

凌战
In reply to this post by LakeShen
你是指提交时所依赖的flink-dist jar包需要是 1.12 版本吗,现在改成1.12 版本还是不行

> 2021年2月23日 下午9:27,LakeShen <[hidden email]> 写道:
>
> 这个应该你的 flink 本地配置的目录要是 1.12 版本的,也就是 flink-dist 目录
>
>
>
> 凌战 <[hidden email]> 于2021年2月23日周二 下午7:33写道:
>
>> 同提交作业到On Yarn集群,客户端的错误也是
>>
>>
>> org.apache.flink.yarn.YarnClusterDescriptor$YarnDeploymentException: The
>> YARN application unexpectedly switched to state FAILED during deployment.
>> Diagnostics from YARN: Application application_1610671284452_0243 failed
>> 10 times due to AM Container for appattempt_1610671284452_0243_000010
>> exited with  exitCode: 1
>> Failing this attempt.Diagnostics: [2021-02-23 18:51:00.021]Exception from
>> container-launch.
>> Container id: container_e48_1610671284452_0243_10_000001
>> Exit code: 1
>>
>>
>> [2021-02-23 18:51:00.024]Container exited with a non-zero exit code 1.
>> Error file: prelaunch.err.
>> Last 4096 bytes of prelaunch.err :
>>
>>
>> [2021-02-23 18:51:00.027]Container exited with a non-zero exit code 1.
>> Error file: prelaunch.err.
>> Last 4096 bytes of prelaunch.err :
>>
>>
>> Yarn那边的日志显示:Could not find or load main class
>> org.apache.flink.yarn.entrypoint.YarnJobClusterEntrypoint
>>
>>
>> 不过我是Flink 1.12 的API,然后提交的集群还是Flink1.10.1的,不知道哪里的问题
>>
>>
>> | |
>> 凌战
>> |
>> |
>> [hidden email]
>> |
>> 签名由网易邮箱大师定制
>> 在2021年2月23日 18:46,LakeShen<[hidden email]> 写道:
>> Hi 社区,
>>
>> 最近从 Flink 1.10 升级版本至 Flink 1.12,在提交作业到 Yarn 时,作业一直报错如下:
>>
>>
>> org.apache.flink.client.program.ProgramInvocationException: The main method
>> caused an error: Failed to execute sql
>>
>> at
>>
>> org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:365)
>>
>> at
>>
>> org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:218)
>>
>> at org.apache.flink.client.ClientUtils.executeProgram(ClientUtils.java:114)
>>
>> at
>>
>> org.apache.flink.client.cli.CliFrontend.executeProgram(CliFrontend.java:812)
>>
>> at org.apache.flink.client.cli.CliFrontend.run(CliFrontend.java:246)
>>
>> at
>> org.apache.flink.client.cli.CliFrontend.parseAndRun(CliFrontend.java:1054)
>>
>> at
>>
>> org.apache.flink.client.cli.CliFrontend.lambda$main$10(CliFrontend.java:1132)
>>
>> at java.security.AccessController.doPrivileged(Native Method)
>>
>> at javax.security.auth.Subject.doAs(Subject.java:422)
>>
>> at
>>
>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1692)
>>
>> at
>>
>> org.apache.flink.runtime.security.contexts.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)
>>
>> at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1132)
>>
>> Caused by: org.apache.flink.table.api.TableException: Failed to execute sql
>>
>> at
>>
>> org.apache.flink.table.api.internal.TableEnvironmentImpl.executeInternal(TableEnvironmentImpl.java:699)
>>
>> at
>>
>> org.apache.flink.table.api.internal.TableEnvironmentImpl.executeOperation(TableEnvironmentImpl.java:767)
>>
>> at
>>
>> org.apache.flink.table.api.internal.TableEnvironmentImpl.executeSql(TableEnvironmentImpl.java:666)
>>
>> at
>>
>> com.youzan.bigdata.FlinkStreamSQLDDLJob.lambda$main$0(FlinkStreamSQLDDLJob.java:95)
>>
>> at
>>
>> java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1380)
>>
>> at
>> java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:580)
>>
>> at
>> com.youzan.bigdata.FlinkStreamSQLDDLJob.main(FlinkStreamSQLDDLJob.java:93)
>>
>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>
>> at
>>
>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>>
>> at
>>
>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>
>> at java.lang.reflect.Method.invoke(Method.java:498)
>>
>> at
>>
>> org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:348)
>>
>> ... 11 more
>>
>> Caused by: org.apache.flink.client.deployment.ClusterDeploymentException:
>> Could not deploy Yarn job cluster.
>>
>> at
>>
>> org.apache.flink.yarn.YarnClusterDescriptor.deployJobCluster(YarnClusterDescriptor.java:481)
>>
>> at
>>
>> org.apache.flink.client.deployment.executors.AbstractJobClusterExecutor.execute(AbstractJobClusterExecutor.java:81)
>>
>> at
>>
>> org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.executeAsync(StreamExecutionEnvironment.java:1905)
>>
>> at
>>
>> org.apache.flink.client.program.StreamContextEnvironment.executeAsync(StreamContextEnvironment.java:135)
>>
>> at
>>
>> org.apache.flink.table.planner.delegation.ExecutorBase.executeAsync(ExecutorBase.java:55)
>>
>> at
>>
>> org.apache.flink.table.api.internal.TableEnvironmentImpl.executeInternal(TableEnvironmentImpl.java:681)
>>
>> ... 22 more
>>
>> Caused by:
>> org.apache.flink.yarn.YarnClusterDescriptor$YarnDeploymentException: The
>> YARN application unexpectedly switched to state FAILED during deployment.
>> Diagnostics from YARN: Application application_1613992328588_4441 failed 2
>> times due to AM Container for appattempt_1613992328588_4441_000002 exited
>> with  exitCode: 1
>> Diagnostics: Exception from container-launch.
>> Container id: container_xxx
>> Exit code: 1
>> Stack trace: ExitCodeException exitCode=1:
>>
>> at org.apache.hadoop.util.Shell.runCommand(Shell.java:575)
>>
>> at org.apache.hadoop.util.Shell.run(Shell.java:478)
>>
>> at
>> org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:766)
>>
>> at
>>
>> org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:212)
>>
>> at
>>
>> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
>>
>> at
>>
>> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
>>
>> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>>
>> at
>>
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>>
>> at
>>
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>>
>> at java.lang.Thread.run(Thread.java:748)
>>
>>
>> 相关信息如下:
>> 1. 我的 Flink 作业中没有 Hadoop 相关的依赖
>> 2. 提交作业的机器,以及 Hadoop 集群每台机器都有 HADOOP_CLASSPATH 环境变量
>> 3. Flink 作业提交到 Yarn 后,状态之后从 Accepted 到 FAILED 状态。
>>
>> 希望有人帮我解惑,感谢
>>
>> Best,
>> LakeShen
>>

Reply | Threaded
Open this post in threaded view
|

回复: Flink 1.12 On Yarn 作业提交失败问题

马阳阳
可以试试在flink-conf.yaml里添加如下配置:
yarn.flink-dist-jar: /opt/flink-1.12/lib/flink-dist_2.11-1.12.0.jar
yarn.ship-files: /data/dfl2/lib


这个行为其实很奇怪,在我们的环境里,有的提交任务的机器不需要添加这个配置,有的不加这个配置就会造成那个main class找不到的问题。


Ps: 造成main class找不到的原因还可能是程序依赖的版本和部署的flink版本不一致,这种情况可能发生在flink依赖升级之后,部署的flink没有更新或者没有完全更新


| |
马阳阳
|
|
[hidden email]
|
签名由网易邮箱大师定制


在2021年02月23日 22:36,m183<[hidden email]> 写道:
你是指提交时所依赖的flink-dist jar包需要是 1.12 版本吗,现在改成1.12 版本还是不行

2021年2月23日 下午9:27,LakeShen <[hidden email]> 写道:

这个应该你的 flink 本地配置的目录要是 1.12 版本的,也就是 flink-dist 目录



凌战 <[hidden email]> 于2021年2月23日周二 下午7:33写道:

同提交作业到On Yarn集群,客户端的错误也是


org.apache.flink.yarn.YarnClusterDescriptor$YarnDeploymentException: The
YARN application unexpectedly switched to state FAILED during deployment.
Diagnostics from YARN: Application application_1610671284452_0243 failed
10 times due to AM Container for appattempt_1610671284452_0243_000010
exited with  exitCode: 1
Failing this attempt.Diagnostics: [2021-02-23 18:51:00.021]Exception from
container-launch.
Container id: container_e48_1610671284452_0243_10_000001
Exit code: 1


[2021-02-23 18:51:00.024]Container exited with a non-zero exit code 1.
Error file: prelaunch.err.
Last 4096 bytes of prelaunch.err :


[2021-02-23 18:51:00.027]Container exited with a non-zero exit code 1.
Error file: prelaunch.err.
Last 4096 bytes of prelaunch.err :


Yarn那边的日志显示:Could not find or load main class
org.apache.flink.yarn.entrypoint.YarnJobClusterEntrypoint


不过我是Flink 1.12 的API,然后提交的集群还是Flink1.10.1的,不知道哪里的问题


| |
凌战
|
|
[hidden email]
|
签名由网易邮箱大师定制
在2021年2月23日 18:46,LakeShen<[hidden email]> 写道:
Hi 社区,

最近从 Flink 1.10 升级版本至 Flink 1.12,在提交作业到 Yarn 时,作业一直报错如下:


org.apache.flink.client.program.ProgramInvocationException: The main method
caused an error: Failed to execute sql

at

org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:365)

at

org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:218)

at org.apache.flink.client.ClientUtils.executeProgram(ClientUtils.java:114)

at

org.apache.flink.client.cli.CliFrontend.executeProgram(CliFrontend.java:812)

at org.apache.flink.client.cli.CliFrontend.run(CliFrontend.java:246)

at
org.apache.flink.client.cli.CliFrontend.parseAndRun(CliFrontend.java:1054)

at

org.apache.flink.client.cli.CliFrontend.lambda$main$10(CliFrontend.java:1132)

at java.security.AccessController.doPrivileged(Native Method)

at javax.security.auth.Subject.doAs(Subject.java:422)

at

org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1692)

at

org.apache.flink.runtime.security.contexts.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)

at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1132)

Caused by: org.apache.flink.table.api.TableException: Failed to execute sql

at

org.apache.flink.table.api.internal.TableEnvironmentImpl.executeInternal(TableEnvironmentImpl.java:699)

at

org.apache.flink.table.api.internal.TableEnvironmentImpl.executeOperation(TableEnvironmentImpl.java:767)

at

org.apache.flink.table.api.internal.TableEnvironmentImpl.executeSql(TableEnvironmentImpl.java:666)

at

com.youzan.bigdata.FlinkStreamSQLDDLJob.lambda$main$0(FlinkStreamSQLDDLJob.java:95)

at

java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1380)

at
java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:580)

at
com.youzan.bigdata.FlinkStreamSQLDDLJob.main(FlinkStreamSQLDDLJob.java:93)

at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

at

sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)

at

sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)

at java.lang.reflect.Method.invoke(Method.java:498)

at

org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:348)

... 11 more

Caused by: org.apache.flink.client.deployment.ClusterDeploymentException:
Could not deploy Yarn job cluster.

at

org.apache.flink.yarn.YarnClusterDescriptor.deployJobCluster(YarnClusterDescriptor.java:481)

at

org.apache.flink.client.deployment.executors.AbstractJobClusterExecutor.execute(AbstractJobClusterExecutor.java:81)

at

org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.executeAsync(StreamExecutionEnvironment.java:1905)

at

org.apache.flink.client.program.StreamContextEnvironment.executeAsync(StreamContextEnvironment.java:135)

at

org.apache.flink.table.planner.delegation.ExecutorBase.executeAsync(ExecutorBase.java:55)

at

org.apache.flink.table.api.internal.TableEnvironmentImpl.executeInternal(TableEnvironmentImpl.java:681)

... 22 more

Caused by:
org.apache.flink.yarn.YarnClusterDescriptor$YarnDeploymentException: The
YARN application unexpectedly switched to state FAILED during deployment.
Diagnostics from YARN: Application application_1613992328588_4441 failed 2
times due to AM Container for appattempt_1613992328588_4441_000002 exited
with  exitCode: 1
Diagnostics: Exception from container-launch.
Container id: container_xxx
Exit code: 1
Stack trace: ExitCodeException exitCode=1:

at org.apache.hadoop.util.Shell.runCommand(Shell.java:575)

at org.apache.hadoop.util.Shell.run(Shell.java:478)

at
org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:766)

at

org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:212)

at

org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)

at

org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)

at java.util.concurrent.FutureTask.run(FutureTask.java:266)

at

java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)

at

java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)

at java.lang.Thread.run(Thread.java:748)


相关信息如下:
1. 我的 Flink 作业中没有 Hadoop 相关的依赖
2. 提交作业的机器,以及 Hadoop 集群每台机器都有 HADOOP_CLASSPATH 环境变量
3. Flink 作业提交到 Yarn 后,状态之后从 Accepted 到 FAILED 状态。

希望有人帮我解惑,感谢

Best,
LakeShen


Reply | Threaded
Open this post in threaded view
|

回复: Flink 1.12 On Yarn 作业提交失败问题

马阳阳
说明一下,yarn.ship-files这个配置的文件夹下需要包含flink-yarn的jar包,可以配置成flink home下的lib文件夹


| |
马阳阳
|
|
[hidden email]
|
签名由网易邮箱大师定制


在2021年02月25日 08:59,马阳阳<[hidden email]> 写道:
可以试试在flink-conf.yaml里添加如下配置:
yarn.flink-dist-jar: /opt/flink-1.12/lib/flink-dist_2.11-1.12.0.jar
yarn.ship-files: /data/dfl2/lib


这个行为其实很奇怪,在我们的环境里,有的提交任务的机器不需要添加这个配置,有的不加这个配置就会造成那个main class找不到的问题。


Ps: 造成main class找不到的原因还可能是程序依赖的版本和部署的flink版本不一致,这种情况可能发生在flink依赖升级之后,部署的flink没有更新或者没有完全更新


| |
马阳阳
|
|
[hidden email]
|
签名由网易邮箱大师定制


在2021年02月23日 22:36,m183<[hidden email]> 写道:
你是指提交时所依赖的flink-dist jar包需要是 1.12 版本吗,现在改成1.12 版本还是不行

2021年2月23日 下午9:27,LakeShen <[hidden email]> 写道:

这个应该你的 flink 本地配置的目录要是 1.12 版本的,也就是 flink-dist 目录



凌战 <[hidden email]> 于2021年2月23日周二 下午7:33写道:

同提交作业到On Yarn集群,客户端的错误也是


org.apache.flink.yarn.YarnClusterDescriptor$YarnDeploymentException: The
YARN application unexpectedly switched to state FAILED during deployment.
Diagnostics from YARN: Application application_1610671284452_0243 failed
10 times due to AM Container for appattempt_1610671284452_0243_000010
exited with  exitCode: 1
Failing this attempt.Diagnostics: [2021-02-23 18:51:00.021]Exception from
container-launch.
Container id: container_e48_1610671284452_0243_10_000001
Exit code: 1


[2021-02-23 18:51:00.024]Container exited with a non-zero exit code 1.
Error file: prelaunch.err.
Last 4096 bytes of prelaunch.err :


[2021-02-23 18:51:00.027]Container exited with a non-zero exit code 1.
Error file: prelaunch.err.
Last 4096 bytes of prelaunch.err :


Yarn那边的日志显示:Could not find or load main class
org.apache.flink.yarn.entrypoint.YarnJobClusterEntrypoint


不过我是Flink 1.12 的API,然后提交的集群还是Flink1.10.1的,不知道哪里的问题


| |
凌战
|
|
[hidden email]
|
签名由网易邮箱大师定制
在2021年2月23日 18:46,LakeShen<[hidden email]> 写道:
Hi 社区,

最近从 Flink 1.10 升级版本至 Flink 1.12,在提交作业到 Yarn 时,作业一直报错如下:


org.apache.flink.client.program.ProgramInvocationException: The main method
caused an error: Failed to execute sql

at

org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:365)

at

org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:218)

at org.apache.flink.client.ClientUtils.executeProgram(ClientUtils.java:114)

at

org.apache.flink.client.cli.CliFrontend.executeProgram(CliFrontend.java:812)

at org.apache.flink.client.cli.CliFrontend.run(CliFrontend.java:246)

at
org.apache.flink.client.cli.CliFrontend.parseAndRun(CliFrontend.java:1054)

at

org.apache.flink.client.cli.CliFrontend.lambda$main$10(CliFrontend.java:1132)

at java.security.AccessController.doPrivileged(Native Method)

at javax.security.auth.Subject.doAs(Subject.java:422)

at

org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1692)

at

org.apache.flink.runtime.security.contexts.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)

at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1132)

Caused by: org.apache.flink.table.api.TableException: Failed to execute sql

at

org.apache.flink.table.api.internal.TableEnvironmentImpl.executeInternal(TableEnvironmentImpl.java:699)

at

org.apache.flink.table.api.internal.TableEnvironmentImpl.executeOperation(TableEnvironmentImpl.java:767)

at

org.apache.flink.table.api.internal.TableEnvironmentImpl.executeSql(TableEnvironmentImpl.java:666)

at

com.youzan.bigdata.FlinkStreamSQLDDLJob.lambda$main$0(FlinkStreamSQLDDLJob.java:95)

at

java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1380)

at
java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:580)

at
com.youzan.bigdata.FlinkStreamSQLDDLJob.main(FlinkStreamSQLDDLJob.java:93)

at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

at

sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)

at

sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)

at java.lang.reflect.Method.invoke(Method.java:498)

at

org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:348)

... 11 more

Caused by: org.apache.flink.client.deployment.ClusterDeploymentException:
Could not deploy Yarn job cluster.

at

org.apache.flink.yarn.YarnClusterDescriptor.deployJobCluster(YarnClusterDescriptor.java:481)

at

org.apache.flink.client.deployment.executors.AbstractJobClusterExecutor.execute(AbstractJobClusterExecutor.java:81)

at

org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.executeAsync(StreamExecutionEnvironment.java:1905)

at

org.apache.flink.client.program.StreamContextEnvironment.executeAsync(StreamContextEnvironment.java:135)

at

org.apache.flink.table.planner.delegation.ExecutorBase.executeAsync(ExecutorBase.java:55)

at

org.apache.flink.table.api.internal.TableEnvironmentImpl.executeInternal(TableEnvironmentImpl.java:681)

... 22 more

Caused by:
org.apache.flink.yarn.YarnClusterDescriptor$YarnDeploymentException: The
YARN application unexpectedly switched to state FAILED during deployment.
Diagnostics from YARN: Application application_1613992328588_4441 failed 2
times due to AM Container for appattempt_1613992328588_4441_000002 exited
with  exitCode: 1
Diagnostics: Exception from container-launch.
Container id: container_xxx
Exit code: 1
Stack trace: ExitCodeException exitCode=1:

at org.apache.hadoop.util.Shell.runCommand(Shell.java:575)

at org.apache.hadoop.util.Shell.run(Shell.java:478)

at
org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:766)

at

org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:212)

at

org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)

at

org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)

at java.util.concurrent.FutureTask.run(FutureTask.java:266)

at

java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)

at

java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)

at java.lang.Thread.run(Thread.java:748)


相关信息如下:
1. 我的 Flink 作业中没有 Hadoop 相关的依赖
2. 提交作业的机器,以及 Hadoop 集群每台机器都有 HADOOP_CLASSPATH 环境变量
3. Flink 作业提交到 Yarn 后,状态之后从 Accepted 到 FAILED 状态。

希望有人帮我解惑,感谢

Best,
LakeShen