Apache Flink Chinese User Mailing List

Flink 1.9.1 error when starting Flink on YARN


xzw0223

Flink 1.9.1 error when starting Flink on YARN

Below is the error output from the startup log. The job was submitted in Flink on YARN mode.

[root@node01 flink-1.9.1]# bin/flink run -m yarn-cluster -yn 2 ./examples/batch/WordCount.jar

2020-09-06 14:30:00,803 INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at /0.0.0.0:8032
2020-09-06 14:30:00,938 INFO org.apache.flink.yarn.cli.FlinkYarnSessionCli - No path for the flink jar passed. Using the location of class org.apache.flink.yarn.YarnClusterDescriptor to locate the jar
2020-09-06 14:30:00,938 INFO org.apache.flink.yarn.cli.FlinkYarnSessionCli - No path for the flink jar passed. Using the location of class org.apache.flink.yarn.YarnClusterDescriptor to locate the jar
2020-09-06 14:30:00,947 INFO org.apache.flink.yarn.cli.FlinkYarnSessionCli - The argument yn is deprecated in will be ignored.
2020-09-06 14:30:00,947 INFO org.apache.flink.yarn.cli.FlinkYarnSessionCli - The argument yn is deprecated in will be ignored.
2020-09-06 14:30:01,105 INFO org.apache.hadoop.conf.Configuration - resource-types.xml not found
2020-09-06 14:30:01,105 INFO org.apache.hadoop.yarn.util.resource.ResourceUtils - Unable to find 'resource-types.xml'.
2020-09-06 14:30:01,136 INFO org.apache.flink.yarn.AbstractYarnClusterDescriptor - Cluster specification: ClusterSpecification{masterMemoryMB=1024, taskManagerMemoryMB=1024, numberTaskManagers=2, slotsPerTaskManager=2}
2020-09-06 14:30:01,193 WARN org.apache.flink.yarn.AbstractYarnClusterDescriptor - The file system scheme is 'file'. This indicates that the specified Hadoop configuration path is wrong and the system is using the default Hadoop configuration values.The Flink YARN client needs to store its files in a distributed file system
2020-09-06 14:30:01,196 WARN org.apache.flink.yarn.AbstractYarnClusterDescriptor - The configuration directory ('/export/servers/flink-1.9.1/conf') contains both LOG4J and Logback configuration files. Please delete or rename one of them.
2020-09-06 14:30:01,600 INFO org.apache.flink.yarn.AbstractYarnClusterDescriptor - Submitting application master application_1599371603539_0004
2020-09-06 14:30:01,635 INFO org.apache.hadoop.yarn.client.api.impl.YarnClientImpl - Submitted application application_1599371603539_0004
2020-09-06 14:30:01,635 INFO org.apache.flink.yarn.AbstractYarnClusterDescriptor - Waiting for the cluster to be allocated
2020-09-06 14:30:01,644 INFO org.apache.flink.yarn.AbstractYarnClusterDescriptor - Deploying cluster, current state ACCEPTED

------------------------------------------------------------
The program finished with the following exception:

org.apache.flink.client.deployment.ClusterDeploymentException: Couldn't deploy Yarn session cluster
    at org.apache.flink.yarn.AbstractYarnClusterDescriptor.deploySessionCluster(AbstractYarnClusterDescriptor.java:385)
    at org.apache.flink.client.cli.CliFrontend.runProgram(CliFrontend.java:251)
    at org.apache.flink.client.cli.CliFrontend.run(CliFrontend.java:205)
    at org.apache.flink.client.cli.CliFrontend.parseParameters(CliFrontend.java:1010)
    at org.apache.flink.client.cli.CliFrontend.lambda$main$10(CliFrontend.java:1083)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1875)
    at org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)
    at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1083)
Caused by: org.apache.flink.yarn.AbstractYarnClusterDescriptor$YarnDeploymentException: The YARN application unexpectedly switched to state FAILED during deployment.
Diagnostics from YARN: Application application_1599371603539_0004 failed 2 times due to AM Container for appattempt_1599371603539_0004_000002 exited with exitCode: 1
For more detailed output, check application tracking page:http://node01:8088/cluster/app/application_1599371603539_0004Then, click on links to logs of each attempt.
Diagnostics: Exception from container-launch.
Container id: container_1599371603539_0004_02_000001
Exit code: 1
Stack trace: ExitCodeException exitCode=1:
    at org.apache.hadoop.util.Shell.runCommand(Shell.java:585)
    at org.apache.hadoop.util.Shell.run(Shell.java:482)
    at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:776)
    at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:212)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)

Container exited with a non-zero exit code 1
Failing this attempt. Failing the application.
If log aggregation is enabled on your cluster, use this command to further investigate the issue:
yarn logs -applicationId application_1599371603539_0004
    at org.apache.flink.yarn.AbstractYarnClusterDescriptor.startAppMaster(AbstractYarnClusterDescriptor.java:1024)
    at org.apache.flink.yarn.AbstractYarnClusterDescriptor.deployInternal(AbstractYarnClusterDescriptor.java:507)
    at org.apache.flink.yarn.AbstractYarnClusterDescriptor.deploySessionCluster(AbstractYarnClusterDescriptor.java:378)
    ... 9 more
2020-09-06 14:30:06,357 INFO org.apache.flink.yarn.AbstractYarnClusterDescriptor - Cancelling deployment from Deployment Failure Hook
2020-09-06 14:30:06,357 INFO org.apache.flink.yarn.AbstractYarnClusterDescriptor - Killing YARN application
2020-09-06 14:30:06,378 INFO org.apache.hadoop.io.retry.RetryInvocationHandler - java.io.IOException: The client is stopped, while invoking ApplicationClientProtocolPBClientImpl.forceKillApplication over null. Trying to failover immediately.
2020-09-06 14:30:06,382 INFO org.apache.hadoop.io.retry.RetryInvocationHandler - java.io.IOException: The client is stopped, while invoking ApplicationClientProtocolPBClientImpl.forceKillApplication over null after 1 failover attempts. Trying to failover after sleeping for 27593ms.
2020-09-06 14:30:33,977 INFO org.apache.hadoop.io.retry.RetryInvocationHandler - java.io.IOException: The client is stopped, while invoking ApplicationClientProtocolPBClientImpl.forceKillApplication over null after 2 failover attempts. Trying to failover after sleeping for 39571ms.
2020-09-06 14:31:13,549 INFO org.apache.hadoop.io.retry.RetryInvocationHandler - java.io.IOException: The client is stopped, while invoking ApplicationClientProtocolPBClientImpl.forceKillApplication over null after 3 failover attempts. Trying to failover after sleeping for 26075ms.

Here is the error output I get from yarn logs -applicationId application_1599371603539_0004:

Container: container_1599371603539_0004_02_000001 on node01_35016
===================================================================
LogType:jobmanager.err
Log Upload Time:Sun Sep 06 14:30:07 +0800 2020
LogLength:589
Log Contents:
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/export/servers/hadoop-2.7.7/hadoopDatas/tempDatas/nm-local-dir/usercache/root/appcache/application_1599371603539_0004/filecache/10/slf4j-log4j12-1.7.15.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/export/servers/hadoop-2.7.7/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
End of LogType:jobmanager.err

LogType:jobmanager.out
Log Upload Time:Sun Sep 06 14:30:07 +0800 2020
LogLength:0
Log Contents:
End of LogType:jobmanager.out

I am using Hadoop 2.7.7. Could someone tell me what is going wrong here?
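
A note on the yarn logs output above: it contains only the jobmanager.err and jobmanager.out sections, and neither explains why the container exited with code 1 (the SLF4J lines are warnings). Flink on YARN normally writes the JobManager's main log to a separate jobmanager.log file, so that section, if it is present in the aggregated logs, is a more likely place to find the real exception. The following is a minimal sketch, not an official tool: extract_log_type is a hypothetical helper that relies only on the "LogType:<name>" and "End of LogType:<name>" marker lines visible above, and it assumes those markers appear on their own lines with no trailing whitespace.

```shell
# Hypothetical helper: read "yarn logs" output on stdin and print the body
# of one LogType section (everything between the "LogType:<name>" line and
# the matching "End of LogType:<name>" line).
extract_log_type() {
  awk -v name="$1" '
    $0 == ("LogType:" name)        { inside = 1; next }  # section starts
    $0 == ("End of LogType:" name) { inside = 0 }        # section ends
    inside                         { print }
  '
}

# Assumed usage (requires the yarn CLI and log aggregation to be enabled):
#   yarn logs -applicationId application_1599371603539_0004 \
#     | extract_log_type jobmanager.log
```

If the helper prints nothing for jobmanager.log, that section is not in the aggregated output, and the per-attempt log links on the application tracking page mentioned in the diagnostics are the other place to look.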

--
Sent from: http://apache-flink.147419.n8.nabble.com/

Yang Wang

Re: Flink 1.9.1 error when starting Flink on YARN

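Try removing the -yn 2 argument and see whether that helps. This option stopped having any effect a long time ago: TaskManagers are now requested and released dynamically.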

Best,
Yang
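
As a rough illustration of the suggestion above, the sketch below drops the -yn flag and its value from a command line before it is run. strip_yn is a hypothetical helper written for this example and is not part of Flink; it assumes that no argument contains embedded spaces.

```shell
# Hypothetical helper (not part of Flink): remove the deprecated
# "-yn <count>" pair from an argument list and print what is left.
strip_yn() {
  out=""
  while [ "$#" -gt 0 ]; do
    case "$1" in
      -yn)
        shift                              # drop the flag itself
        if [ "$#" -gt 0 ]; then shift; fi  # drop its value, if any
        ;;
      *)
        out="${out:+$out }$1"              # keep every other argument
        shift
        ;;
    esac
  done
  printf '%s\n' "$out"
}

# The command from the original post, with -yn 2 removed:
strip_yn bin/flink run -m yarn-cluster -yn 2 ./examples/batch/WordCount.jar
```

For the command in the original post this prints bin/flink run -m yarn-cluster ./examples/batch/WordCount.jar. The client log already reports that the yn argument is ignored, so removing it may not by itself resolve the container's exit code 1.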

On Mon, Sep 7, 2020 at 9:50 AM, xzw0223 <[hidden email]> wrote:

> Below is the error output from the startup log. The job was submitted in Flink on YARN mode.
>
> [root@node01 flink-1.9.1]# bin/flink run -m yarn-cluster -yn 2 ./examples/batch/WordCount.jar