首先,我使用的flink版本是1.11,K8S版本是v1.17。
启动的集群的脚本命令是: ./bin/kubernetes-session.sh -Dkubernetes.cluster-id=flink-cluster -Dkubernetes.jobmanager.service-account=flink -Dtaskmanager.memory.process.size=4096m -Dkubernetes.taskmanager.cpu=2 -Dtaskmanager.numberOfTaskSlots=4 -Dkubernetes.namespace=flink -Dkubernetes.rest-service.exposed.type=NodePort -Dkubernetes.container-start-command-template="%java% %classpath% %jvmmem% %jvmopts% %logging% %class% %args%" -Dakka.framesize=104857600b -Dkubernetes.container.image=flink:1.11.1 集群应该是启动成功,启动的部分日志如下所示: 2020-09-10 14:08:56,417 INFO org.apache.flink.kubernetes.KubernetesClusterDescriptor [] - Create flink session cluster flink-cluster successfully, JobManager Web Interface: http://10.10.102.20:30398 可以通过上述网址成功访问到UI界面。 接下来是我的两个问题: 1、关于提交的命令: 仿照文档中给出的命令是:./bin/flink run -d -e kubernetes-session -Dkubernetes.cluster-id=flink-cluster examples/streaming/WindowJoin.jar。但据此命令提交后,会有如下报错: org.apache.flink.client.program.ProgramInvocationException: The main method caused an error: org.apache.flink.client.deployment.ClusterRetrieveException: Could not get the rest endpoint of flink-cluster 而后我仿照非native的方法,以如下命令提交: ./bin/flink run -m 10.10.101.26:30398 ./examples/streaming/WordCount.jar,似乎提交成功了。 Q:文档中的命令要做什么额外的配置才能让它解析到endpoint of flink-cluster呢?以后面这个命令提交是可以的吗? 2、以第二种命令提交后报错。 以第二种命令提交后,有如下报错: Caused by: org.apache.flink.runtime.jobmanager.scheduler.NoResourceAvailableException: Could not allocate the required slot within slot request timeout. Please make sure that the cluster has enough resources. at org.apache.flink.runtime.scheduler.DefaultScheduler.maybeWrapWithNoResourceAvailableException(DefaultScheduler.java:441) ... 47 more Caused by: java.util.concurrent.CompletionException: java.util.concurrent.TimeoutException at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292) at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:308) at java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:607) at java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:591) ... 27 more Caused by: java.util.concurrent.TimeoutException Q:看起来似乎是集群资源不够,可是集群的资源应当是充足的,应该如何debug呢? -- Sent from: http://apache-flink.147419.n8.nabble.com/ |
1. 没有找到rest endpoint的原因应该是你flink run提交任务的时候没有指定namespace
你用下面的命令应该就可以了 ./bin/flink run -d -e kubernetes-session -Dkubernetes.cluster-id=flink-cluster -Dkubernetes.namespace=flink examples/streaming/WindowJoin.jar 2. 没有资源有可能是Flink的ResourceManager没有足够的权限向K8s申请Pod,确认你已经配置了正确的service account[1]。 注意不同的namespace需要对应的service account是不一样的。 如果不是这个原因的话,可以打开JobManager的webui查看log或者使用kubectl -n blink logs <JM_POD> [1]. https://ci.apache.org/projects/flink/flink-docs-master/ops/deployment/native_kubernetes.html#rbac Best, Yang Mori_Tang <[hidden email]> 于2020年9月10日周四 下午4:10写道: > 首先,我使用的flink版本是1.11,K8S版本是v1.17。 > 启动的集群的脚本命令是: > ./bin/kubernetes-session.sh -Dkubernetes.cluster-id=flink-cluster > -Dkubernetes.jobmanager.service-account=flink > -Dtaskmanager.memory.process.size=4096m -Dkubernetes.taskmanager.cpu=2 > -Dtaskmanager.numberOfTaskSlots=4 -Dkubernetes.namespace=flink > -Dkubernetes.rest-service.exposed.type=NodePort > -Dkubernetes.container-start-command-template="%java% %classpath% %jvmmem% > %jvmopts% %logging% %class% %args%" -Dakka.framesize=104857600b > -Dkubernetes.container.image=flink:1.11.1 > > 集群应该是启动成功,启动的部分日志如下所示: > 2020-09-10 14:08:56,417 INFO > org.apache.flink.kubernetes.KubernetesClusterDescriptor [] - Create > flink session cluster flink-cluster successfully, JobManager Web Interface: > http://10.10.102.20:30398 > > 可以通过上述网址成功访问到UI界面。 > > 接下来是我的两个问题: > 1、关于提交的命令: > 仿照文档中给出的命令是:./bin/flink run -d -e kubernetes-session > -Dkubernetes.cluster-id=flink-cluster > examples/streaming/WindowJoin.jar。但据此命令提交后,会有如下报错: > org.apache.flink.client.program.ProgramInvocationException: The main method > caused an error: > org.apache.flink.client.deployment.ClusterRetrieveException: Could not get > the rest endpoint of flink-cluster > > 而后我仿照非native的方法,以如下命令提交: > ./bin/flink run -m 10.10.101.26:30398 > ./examples/streaming/WordCount.jar,似乎提交成功了。 > Q:文档中的命令要做什么额外的配置才能让它解析到endpoint of flink-cluster呢?以后面这个命令提交是可以的吗? > 2、以第二种命令提交后报错。 > 以第二种命令提交后,有如下报错: > Caused by: > org.apache.flink.runtime.jobmanager.scheduler.NoResourceAvailableException: > Could not allocate the required slot within slot request timeout. Please > make sure that the cluster has enough resources. > at > > org.apache.flink.runtime.scheduler.DefaultScheduler.maybeWrapWithNoResourceAvailableException(DefaultScheduler.java:441) > ... 47 more > Caused by: java.util.concurrent.CompletionException: > java.util.concurrent.TimeoutException > at > > java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292) > at > > java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:308) > at > java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:607) > at > > java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:591) > ... 27 more > Caused by: java.util.concurrent.TimeoutException > > Q:看起来似乎是集群资源不够,可是集群的资源应当是充足的,应该如何debug呢? > > > > > -- > Sent from: http://apache-flink.147419.n8.nabble.com/ > |
Free forum by Nabble | Edit this page |