关于Fink Native K8S Session模式的两个问题

classic Classic list List threaded Threaded
2 messages Options
Reply | Threaded
Open this post in threaded view
|

关于Fink Native K8S Session模式的两个问题

Mori_Tang
首先,我使用的flink版本是1.11,K8S版本是v1.17。
启动的集群的脚本命令是:
./bin/kubernetes-session.sh  -Dkubernetes.cluster-id=flink-cluster
-Dkubernetes.jobmanager.service-account=flink
-Dtaskmanager.memory.process.size=4096m   -Dkubernetes.taskmanager.cpu=2
-Dtaskmanager.numberOfTaskSlots=4  -Dkubernetes.namespace=flink
-Dkubernetes.rest-service.exposed.type=NodePort
-Dkubernetes.container-start-command-template="%java% %classpath% %jvmmem%
%jvmopts% %logging% %class% %args%"  -Dakka.framesize=104857600b  
-Dkubernetes.container.image=flink:1.11.1

集群应该是启动成功,启动的部分日志如下所示:
2020-09-10 14:08:56,417 INFO
org.apache.flink.kubernetes.KubernetesClusterDescriptor      [] - Create
flink session cluster flink-cluster successfully, JobManager Web Interface:
http://10.10.102.20:30398

可以通过上述网址成功访问到UI界面。

接下来是我的两个问题:
1、关于提交的命令:
     仿照文档中给出的命令是:./bin/flink run -d -e kubernetes-session
-Dkubernetes.cluster-id=flink-cluster
examples/streaming/WindowJoin.jar。但据此命令提交后,会有如下报错:
org.apache.flink.client.program.ProgramInvocationException: The main method
caused an error:
org.apache.flink.client.deployment.ClusterRetrieveException: Could not get
the rest endpoint of flink-cluster

     而后我仿照非native的方法,以如下命令提交:
     ./bin/flink run -m 10.10.101.26:30398
./examples/streaming/WordCount.jar,似乎提交成功了。
     Q:文档中的命令要做什么额外的配置才能让它解析到endpoint of flink-cluster呢?以后面这个命令提交是可以的吗?
2、以第二种命令提交后报错。
     以第二种命令提交后,有如下报错:
Caused by:
org.apache.flink.runtime.jobmanager.scheduler.NoResourceAvailableException:
Could not allocate the required slot within slot request timeout. Please
make sure that the cluster has enough resources.
        at
org.apache.flink.runtime.scheduler.DefaultScheduler.maybeWrapWithNoResourceAvailableException(DefaultScheduler.java:441)
        ... 47 more
Caused by: java.util.concurrent.CompletionException:
java.util.concurrent.TimeoutException
        at
java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292)
        at
java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:308)
        at
java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:607)
        at
java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:591)
        ... 27 more
Caused by: java.util.concurrent.TimeoutException

Q:看起来似乎是集群资源不够,可是集群的资源应当是充足的,应该如何debug呢?




--
Sent from: http://apache-flink.147419.n8.nabble.com/
Reply | Threaded
Open this post in threaded view
|

Re: 关于Fink Native K8S Session模式的两个问题

Yang Wang
1. 没有找到rest endpoint的原因应该是你flink run提交任务的时候没有指定namespace
你用下面的命令应该就可以了
./bin/flink run -d -e kubernetes-session
-Dkubernetes.cluster-id=flink-cluster -Dkubernetes.namespace=flink
examples/streaming/WindowJoin.jar

2. 没有资源有可能是Flink的ResourceManager没有足够的权限向K8s申请Pod,确认你已经配置了正确的service
account[1]。
注意不同的namespace需要对应的service account是不一样的。
如果不是这个原因的话,可以打开JobManager的webui查看log或者使用kubectl -n blink logs <JM_POD>


[1].
https://ci.apache.org/projects/flink/flink-docs-master/ops/deployment/native_kubernetes.html#rbac


Best,
Yang

Mori_Tang <[hidden email]> 于2020年9月10日周四 下午4:10写道:

> 首先,我使用的flink版本是1.11,K8S版本是v1.17。
> 启动的集群的脚本命令是:
> ./bin/kubernetes-session.sh  -Dkubernetes.cluster-id=flink-cluster
> -Dkubernetes.jobmanager.service-account=flink
> -Dtaskmanager.memory.process.size=4096m   -Dkubernetes.taskmanager.cpu=2
> -Dtaskmanager.numberOfTaskSlots=4  -Dkubernetes.namespace=flink
> -Dkubernetes.rest-service.exposed.type=NodePort
> -Dkubernetes.container-start-command-template="%java% %classpath% %jvmmem%
> %jvmopts% %logging% %class% %args%"  -Dakka.framesize=104857600b
> -Dkubernetes.container.image=flink:1.11.1
>
> 集群应该是启动成功,启动的部分日志如下所示:
> 2020-09-10 14:08:56,417 INFO
> org.apache.flink.kubernetes.KubernetesClusterDescriptor      [] - Create
> flink session cluster flink-cluster successfully, JobManager Web Interface:
> http://10.10.102.20:30398
>
> 可以通过上述网址成功访问到UI界面。
>
> 接下来是我的两个问题:
> 1、关于提交的命令:
>      仿照文档中给出的命令是:./bin/flink run -d -e kubernetes-session
> -Dkubernetes.cluster-id=flink-cluster
> examples/streaming/WindowJoin.jar。但据此命令提交后,会有如下报错:
> org.apache.flink.client.program.ProgramInvocationException: The main method
> caused an error:
> org.apache.flink.client.deployment.ClusterRetrieveException: Could not get
> the rest endpoint of flink-cluster
>
>      而后我仿照非native的方法,以如下命令提交:
>      ./bin/flink run -m 10.10.101.26:30398
> ./examples/streaming/WordCount.jar,似乎提交成功了。
>      Q:文档中的命令要做什么额外的配置才能让它解析到endpoint of flink-cluster呢?以后面这个命令提交是可以的吗?
> 2、以第二种命令提交后报错。
>      以第二种命令提交后,有如下报错:
> Caused by:
> org.apache.flink.runtime.jobmanager.scheduler.NoResourceAvailableException:
> Could not allocate the required slot within slot request timeout. Please
> make sure that the cluster has enough resources.
>         at
>
> org.apache.flink.runtime.scheduler.DefaultScheduler.maybeWrapWithNoResourceAvailableException(DefaultScheduler.java:441)
>         ... 47 more
> Caused by: java.util.concurrent.CompletionException:
> java.util.concurrent.TimeoutException
>         at
>
> java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292)
>         at
>
> java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:308)
>         at
> java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:607)
>         at
>
> java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:591)
>         ... 27 more
> Caused by: java.util.concurrent.TimeoutException
>
> Q:看起来似乎是集群资源不够,可是集群的资源应当是充足的,应该如何debug呢?
>
>
>
>
> --
> Sent from: http://apache-flink.147419.n8.nabble.com/
>