Flink 1.10.1 cdh-hdfs启用HA时,flink job提交报错

classic Classic list List threaded Threaded
4 messages Options
Reply | Threaded
Open this post in threaded view
|

Flink 1.10.1 cdh-hdfs启用HA时,flink job提交报错

Storm☀️
各位老师,好。
Flink on  yarn 模式提交
hadoop:2.6.0
cdh 5.15.2
HADOOP_CONF_DIR=/etc/hadoop/conf

在cdh上开启hdfs的HA之后提交任务报错,不开启HA能正常提交任务。
启动方式:
/bin/flink run -m yarn-cluster -yt /yarn-conf -p 3 -ytm 2048 -ys 1 -ynm xxx
/jars/flink10.jar  xx
报错信息r如下:


2020-09-02 14:53:08,118 DEBUG org.apache.flink.yarn.YarnResourceManager  -
TM:remote keytab path obtained null
2020-09-02 14:53:08,119 DEBUG org.apache.flink.yarn.YarnResourceManager  -
TM:remote keytab principal obtained null
2020-09-02 14:53:08,119 DEBUG org.apache.flink.yarn.YarnResourceManager  -
TM:remote yarn conf path obtained null
2020-09-02 14:53:08,119 DEBUG org.apache.flink.yarn.YarnResourceManager  -
TM:remote krb5 path obtained null
2020-09-02 14:53:08,120 ERROR org.apache.flink.yarn.YarnResourceManager  -
Could not start TaskManager in container
container_1598944802155_0042_01_000006.
java.lang.IllegalArgumentException: java.net.UnknownHostException:
nameservice2
        at
org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:374)
        at
org.apache.hadoop.hdfs.NameNodeProxies.createNonHAProxy(NameNodeProxies.java:312)
        at
org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:178)
        at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:665)
        at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:601)
        at
org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:148)
        at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2596)
        at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:91)
        at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2630)
        at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2612)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:370)
        at org.apache.hadoop.fs.Path.getFileSystem(Path.java:296)
        at org.apache.flink.yarn.Utils.createTaskExecutorContext(Utils.java:469)
        at
org.apache.flink.yarn.YarnResourceManager.createTaskExecutorLaunchContext(YarnResourceManager.java:582)
        at
org.apache.flink.yarn.YarnResourceManager.startTaskExecutorInContainer(YarnResourceManager.java:384)
        at java.lang.Iterable.forEach(Iterable.java:75)
        at
org.apache.flink.yarn.YarnResourceManager.lambda$onContainersAllocated$1(YarnResourceManager.java:366)
        at
org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRunAsync(AkkaRpcActor.java:402)
        at
org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:195)
        at
org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.handleRpcMessage(FencedAkkaRpcActor.java:74)
        at
org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleMessage(AkkaRpcActor.java:152)
        at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:26)
        at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:21)
        at scala.PartialFunction.applyOrElse(PartialFunction.scala:123)
        at scala.PartialFunction.applyOrElse$(PartialFunction.scala:122)
        at akka.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:21)
        at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171)
        at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:172)
        at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:172)
        at akka.actor.Actor.aroundReceive(Actor.scala:517)
        at akka.actor.Actor.aroundReceive$(Actor.scala:515)
        at akka.actor.AbstractActor.aroundReceive(AbstractActor.scala:225)
        at akka.actor.ActorCell.receiveMessage(ActorCell.scala:592)
        at akka.actor.ActorCell.invoke(ActorCell.scala:561)
        at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:258)
        at akka.dispatch.Mailbox.run(Mailbox.scala:225)
        at akka.dispatch.Mailbox.exec(Mailbox.scala:235)
        at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
        at
akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
        at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
        at
akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
Caused by: java.net.UnknownHostException: nameservice2
        ... 41 more



--
Sent from: http://apache-flink.147419.n8.nabble.com/
Reply | Threaded
Open this post in threaded view
|

Re:Flink 1.10.1 cdh-hdfs启用HA时,flink job提交报错

huangli
你好, 应该是flink访问hdfs的时候,没有找到namespace&nbsp;
如果做flink任务提交的机器是CDH集群的网关节点,可以将flink-conf.yaml中hdfs有关的配置都配成类似: hdfs:///flink/ha/
去掉namespace和端口的配置,再尝试一下。








Best Regards


Huang Li
------------------&nbsp;Original&nbsp;------------------
From: &nbsp;"Storm☀️"<[hidden email]&gt;;
Date: &nbsp;Wed, Sep 2, 2020 03:36 PM
To: &nbsp;"user-zh"<[hidden email]&gt;;

Subject: &nbsp;Flink 1.10.1 cdh-hdfs启用HA时,flink job提交报错

&nbsp;

各位老师,好。
Flink on&nbsp; yarn 模式提交
hadoop:2.6.0
cdh 5.15.2
HADOOP_CONF_DIR=/etc/hadoop/conf

在cdh上开启hdfs的HA之后提交任务报错,不开启HA能正常提交任务。
启动方式:
/bin/flink run -m yarn-cluster -yt /yarn-conf -p 3 -ytm 2048 -ys 1 -ynm xxx
/jars/flink10.jar&nbsp; xx
报错信息r如下:


2020-09-02 14:53:08,118 DEBUG org.apache.flink.yarn.YarnResourceManager&nbsp; -
TM:remote keytab path obtained null
2020-09-02 14:53:08,119 DEBUG org.apache.flink.yarn.YarnResourceManager&nbsp; -
TM:remote keytab principal obtained null
2020-09-02 14:53:08,119 DEBUG org.apache.flink.yarn.YarnResourceManager&nbsp; -
TM:remote yarn conf path obtained null
2020-09-02 14:53:08,119 DEBUG org.apache.flink.yarn.YarnResourceManager&nbsp; -
TM:remote krb5 path obtained null
2020-09-02 14:53:08,120 ERROR org.apache.flink.yarn.YarnResourceManager&nbsp; -
Could not start TaskManager in container
container_1598944802155_0042_01_000006.
java.lang.IllegalArgumentException: java.net.UnknownHostException:
nameservice2
        at
org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:374)
        at
org.apache.hadoop.hdfs.NameNodeProxies.createNonHAProxy(NameNodeProxies.java:312)
        at
org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:178)
        at org.apache.hadoop.hdfs.DFSClient.<init&gt;(DFSClient.java:665)
        at org.apache.hadoop.hdfs.DFSClient.<init&gt;(DFSClient.java:601)
        at
org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:148)
        at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2596)
        at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:91)
        at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2630)
        at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2612)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:370)
        at org.apache.hadoop.fs.Path.getFileSystem(Path.java:296)
        at org.apache.flink.yarn.Utils.createTaskExecutorContext(Utils.java:469)
        at
org.apache.flink.yarn.YarnResourceManager.createTaskExecutorLaunchContext(YarnResourceManager.java:582)
        at
org.apache.flink.yarn.YarnResourceManager.startTaskExecutorInContainer(YarnResourceManager.java:384)
        at java.lang.Iterable.forEach(Iterable.java:75)
        at
org.apache.flink.yarn.YarnResourceManager.lambda$onContainersAllocated$1(YarnResourceManager.java:366)
        at
org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRunAsync(AkkaRpcActor.java:402)
        at
org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:195)
        at
org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.handleRpcMessage(FencedAkkaRpcActor.java:74)
        at
org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleMessage(AkkaRpcActor.java:152)
        at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:26)
        at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:21)
        at scala.PartialFunction.applyOrElse(PartialFunction.scala:123)
        at scala.PartialFunction.applyOrElse$(PartialFunction.scala:122)
        at akka.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:21)
        at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171)
        at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:172)
        at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:172)
        at akka.actor.Actor.aroundReceive(Actor.scala:517)
        at akka.actor.Actor.aroundReceive$(Actor.scala:515)
        at akka.actor.AbstractActor.aroundReceive(AbstractActor.scala:225)
        at akka.actor.ActorCell.receiveMessage(ActorCell.scala:592)
        at akka.actor.ActorCell.invoke(ActorCell.scala:561)
        at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:258)
        at akka.dispatch.Mailbox.run(Mailbox.scala:225)
        at akka.dispatch.Mailbox.exec(Mailbox.scala:235)
        at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
        at
akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
        at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
        at
akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
Caused by: java.net.UnknownHostException: nameservice2
        ... 41 more



--
Sent from: http://apache-flink.147419.n8.nabble.com/
Reply | Threaded
Open this post in threaded view
|

Re: Re:Flink 1.10.1 cdh-hdfs启用HA时,flink job提交报错

Storm☀️
现在的配置是这样的,没有添加namenode+ip;
jobmanager.archive.fs.dir: hdfs:///completed-jobs/

需要改成:
hdfs://nameservice2/completed-jobs/  这样的吗?


感觉是创建fs的时候错了。看到这部分异常:
createNonHAProxy




--
Sent from: http://apache-flink.147419.n8.nabble.com/
Reply | Threaded
Open this post in threaded view
|

Re: Flink 1.10.1 cdh-hdfs启用HA时,flink job提交报错

Storm☀️
In reply to this post by Storm☀️
问题找到了;
hdfs-site.xml配置文件冲突导致的。

原因:通过-yt上传了 外部集群的hdfs-site.xml文件。

flink10初始化taskmanager读取 hdfs-site.xml配置的时候被外部的hdfs-site.xml文件干扰。

此问题终结。



--
Sent from: http://apache-flink.147419.n8.nabble.com/