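This is my first attempt at deploying Flink on k8s. I am using Flink 1.12.0 with JDK 1.8.0_275 and Scala 2.12.12, on a single-node minikube environment installed on my Mac. These are the steps I followed: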

    git clone https://github.com/apache/flink-docker
    cd flink-docker/1.12/scala_2.12-java8-debian
    docker build --tag flink:1.12.0-scala_2.12-java8 .

    cd flink-1.12.0
    ./bin/kubernetes-session.sh \
      -Dkubernetes.container.image=flink:1.12.0-scala_2.12-java8 \
      -Dkubernetes.rest-service.exposed.type=NodePort \
      -Dtaskmanager.numberOfTaskSlots=2 \
      -Dkubernetes.cluster-id=flink-session-cluster

The output says the JM has started, but I cannot reach it from the browser:

    2020-12-27 22:08:12,387 INFO org.apache.flink.kubernetes.KubernetesClusterDescriptor [] - Create flink session cluster session001 successfully, JobManager Web Interface: http://192.168.99.100:8081

`kubectl get pods` shows the pod stuck in the ContainerCreating state:

    NAME                                               READY   STATUS              RESTARTS   AGE
    flink-session-cluster-858bd55dff-bzjk2             0/1     ContainerCreating   0          5m59s
    kubernetes-dashboard-1608509744-6bc8455756-mp47w   1/1     Running             0          6d14h

So I ran `kubectl describe pod flink-session-cluster-858bd55dff-bzjk2` to see the details. The result:

    Name:           flink-session-cluster-858bd55dff-bzjk2
    Namespace:      default
    Priority:       0
    Node:           minikube/192.168.99.100
    Start Time:     Sun, 27 Dec 2020 22:21:56 +0800
    Labels:         app=flink-session-cluster
                    component=jobmanager
                    pod-template-hash=858bd55dff
                    type=flink-native-kubernetes
    Annotations:    <none>
    Status:         Pending
    IP:             172.17.0.4
    IPs:
      IP:           172.17.0.4
    Controlled By:  ReplicaSet/flink-session-cluster-858bd55dff
    Containers:
      flink-job-manager:
        Container ID:
        Image:          flink:1.12.0-scala_2.12-java8
        Image ID:
        Ports:          8081/TCP, 6123/TCP, 6124/TCP
        Host Ports:     0/TCP, 0/TCP, 0/TCP
        Command:
          /docker-entrypoint.sh
        Args:
          native-k8s
          $JAVA_HOME/bin/java -classpath $FLINK_CLASSPATH -Xmx1073741824 -Xms1073741824 -XX:MaxMetaspaceSize=268435456 -Dlog.file=/opt/flink/log/jobmanager.log -Dlogback.configurationFile=file:/opt/flink/conf/logback-console.xml -Dlog4j.configuration=file:/opt/flink/conf/log4j-console.properties -Dlog4j.configurationFile=file:/opt/flink/conf/log4j-console.properties org.apache.flink.kubernetes.entrypoint.KubernetesSessionClusterEntrypoint -D jobmanager.memory.off-heap.size=134217728b -D jobmanager.memory.jvm-overhead.min=201326592b -D jobmanager.memory.jvm-metaspace.size=268435456b -D jobmanager.memory.heap.size=1073741824b -D jobmanager.memory.jvm-overhead.max=201326592b
        State:          Waiting
          Reason:       ImagePullBackOff
        Ready:          False
        Restart Count:  0
        Limits:
          cpu:     1
          memory:  1600Mi
        Requests:
          cpu:     1
          memory:  1600Mi
        Environment:
          _POD_IP_ADDRESS:   (v1:status.podIP)
          HADOOP_CONF_DIR:  /opt/hadoop/conf
        Mounts:
          /opt/flink/conf from flink-config-volume (rw)
          /opt/hadoop/conf from hadoop-config-volume (rw)
          /var/run/secrets/kubernetes.io/serviceaccount from default-token-s47ht (ro)
    Conditions:
      Type              Status
      Initialized       True
      Ready             False
      ContainersReady   False
      PodScheduled      True
    Volumes:
      hadoop-config-volume:
        Type:      ConfigMap (a volume populated by a ConfigMap)
        Name:      hadoop-config-flink-session-cluster
        Optional:  false
      flink-config-volume:
        Type:      ConfigMap (a volume populated by a ConfigMap)
        Name:      flink-config-flink-session-cluster
        Optional:  false
      default-token-s47ht:
        Type:        Secret (a volume populated by a Secret)
        SecretName:  default-token-s47ht
        Optional:    false
    QoS Class:       Guaranteed
    Node-Selectors:  <none>
    Tolerations:     node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                     node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
    Events:
      Type     Reason       Age                  From               Message
      ----     ------       ----                 ----               -------
      Normal   Scheduled    21m                  default-scheduler  Successfully assigned default/flink-session-cluster-858bd55dff-bzjk2 to minikube
      Warning  FailedMount  21m (x2 over 21m)    kubelet            MountVolume.SetUp failed for volume "flink-config-volume" : configmap "flink-config-flink-session-cluster" not found
      Warning  FailedMount  21m (x2 over 21m)    kubelet            MountVolume.SetUp failed for volume "hadoop-config-volume" : configmap "hadoop-config-flink-session-cluster" not found
      Normal   Pulling      13m (x4 over 21m)    kubelet            Pulling image "flink:1.12.0-scala_2.12-java8"
      Warning  Failed       13m (x4 over 15m)    kubelet            Failed to pull image "flink:1.12.0-scala_2.12-java8": rpc error: code = Unknown desc = Error response from daemon: manifest for flink:1.12.0-scala_2.12-java8 not found: manifest unknown: manifest unknown
      Normal   BackOff      13m (x5 over 15m)    kubelet            Back-off pulling image "flink:1.12.0-scala_2.12-java8"
      Warning  Failed       11m (x5 over 15m)    kubelet            Error: ErrImagePull
      Warning  Failed       100s (x53 over 15m)  kubelet            Error: ImagePullBackOff

At first I suspected the local image had not been built, so I checked with `docker images`:

    REPOSITORY   TAG                       IMAGE ID       CREATED        SIZE
    flink        1.12.0-scala_2.12-java8   f7dd9b9e020b   12 hours ago   642MB

The image clearly exists, which is what puzzles me. Why does pulling a local image fail? What is wrong here? And with minikube, how do I reach the dashboard of the Flink cluster running on k8s from a local browser?

This is my first time using k8s, so any pointers are appreciated. Thanks!

There are two problems in your overall process:

1. The image cannot be found

The cause is most likely related to minikube's driver setting. If you are using hyperkit or another VM-based driver, you need to `minikube ssh` into the VM and check whether the image actually exists there.

2. The JM link is not accessible

    2020-12-27 22:08:12,387 INFO org.apache.flink.kubernetes.KubernetesClusterDescriptor [] - Create flink session cluster session001 successfully, JobManager Web Interface: http://192.168.99.100:8081

My guess is that this log line was not printed by the command you posted: your command uses NodePort, and in that case the printed JM address should not be on port 8081. As long as you add kubernetes.rest-service.exposed.type=NodePort when submitting on minikube and the JM comes up, the printed JM address will be accessible.

You can also assemble the link by hand: get the API server address with `minikube ip`, use `kubectl get svc` to look up the NodePort of the rest svc that belongs to your Flink session cluster, and combine the two.

Best,
Yang

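The two checks suggested in this reply can be sketched as small shell helpers. This is only an illustration under stated assumptions: `image_in_list` and `rest_url` are hypothetical names introduced here, the service name `flink-session-cluster-rest` assumes that the rest service is named after the cluster-id with a `-rest` suffix, and the `minikube` and `kubectl` calls are shown as comments because they need a running cluster.

```shell
# Hypothetical helper: succeeds if the image reference given as $1 appears
# on stdin, one "repository:tag" per line. Carriage returns are stripped
# because output relayed through `minikube ssh` may contain them.
image_in_list() {
  tr -d '\r' | grep -Fxq -- "$1"
}

# Hypothetical helper: joins a node IP and a NodePort into the dashboard URL.
rest_url() {
  printf 'http://%s:%s\n' "$1" "$2"
}

# Usage against a running minikube (not executed here):
#
#   # 1. Is the image visible to the Docker daemon inside the VM?
#   minikube ssh -- docker images --format '{{.Repository}}:{{.Tag}}' \
#     | image_in_list flink:1.12.0-scala_2.12-java8 \
#     || echo "image is missing inside the minikube VM"
#
#   # If it is missing, one option is to point the local docker CLI at the
#   # VM's daemon and rebuild there:
#   eval "$(minikube docker-env)"
#   docker build --tag flink:1.12.0-scala_2.12-java8 .
#
#   # 2. Assemble the dashboard URL from the node IP and the NodePort.
#   #    The service name below is an assumption.
#   rest_url "$(minikube ip)" \
#     "$(kubectl get svc flink-session-cluster-rest -o jsonpath='{.spec.ports[0].nodePort}')"
```

If the image only shows up in the `docker images` output on the Mac and not inside the VM, that would be consistent with the `manifest unknown` error above: the kubelet in the VM does not find the tag locally and tries to pull it from a registry instead.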
In reply to this post by casel.chen
Today I switched to the latest officially released Flink image, version 1.11.3, and it does not start either.
This is my command:

    ./bin/kubernetes-session.sh \
      -Dkubernetes.cluster-id=rtdp \
      -Dtaskmanager.memory.process.size=4096m \
      -Dkubernetes.taskmanager.cpu=2 \
      -Dtaskmanager.numberOfTaskSlots=4 \
      -Dresourcemanager.taskmanager-timeout=3600000 \
      -Dkubernetes.container.image=flink:1.11.3-scala_2.12-java8 \
      -Dkubernetes.namespace=rtdp

    Events:
      Type     Reason          Age                From               Message
      ----     ------          ----               ----               -------
      Normal   Scheduled       88s                default-scheduler  Successfully assigned rtdp/rtdp-6d7794d65d-g6mb5 to cn-shanghai.192.168.16.130
      Warning  FailedMount     88s                kubelet            MountVolume.SetUp failed for volume "flink-config-volume" : configmap "flink-config-rtdp" not found
      Warning  FailedMount     88s                kubelet            MountVolume.SetUp failed for volume "hadoop-config-volume" : configmap "hadoop-config-rtdp" not found
      Normal   AllocIPSucceed  87s                terway-daemon      Alloc IP 192.168.32.25/22 for Pod
      Normal   Pulling         87s                kubelet            Pulling image "flink:1.11.3-scala_2.12-java8"
      Normal   Pulled          31s                kubelet            Successfully pulled image "flink:1.11.3-scala_2.12-java8"
      Normal   Created         18s (x2 over 26s)  kubelet            Created container flink-job-manager
      Normal   Started         18s (x2 over 26s)  kubelet            Started container flink-job-manager
      Normal   Pulled          18s                kubelet            Container image "flink:1.11.3-scala_2.12-java8" already present on machine
      Warning  BackOff         10s                kubelet            Back-off restarting failed container

Two ConfigMaps were not found here. Do they need to be created in advance? The official documentation does not mention this, or did I miss it?

https://ci.apache.org/projects/flink/flink-docs-release-1.11/ops/deployment/native_kubernetes.html#start-flink-session

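The ConfigMaps do not need to be created in advance. That Warning can be ignored; it is expected, because the deployment is created first and the ConfigMaps are created afterwards.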
You can follow the community documentation [1] to send the JM log to the console and take a look.

I suspect the cause is that you have not created a service account [2].

[1]. https://ci.apache.org/projects/flink/flink-docs-release-1.11/ops/deployment/native_kubernetes.html#log-files
[2]. https://ci.apache.org/projects/flink/flink-docs-release-1.11/ops/deployment/native_kubernetes.html#rbac

Best,
Yang

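As a rough sketch of the service-account suggestion, the helper below only prints the `kubectl` commands for a given namespace and account so they can be reviewed before being run. The function name `rbac_commands` and the binding name pattern `flink-role-binding-<account>` are assumptions made for illustration; the `edit` cluster role and the `kubernetes.jobmanager.service-account` option are the ones that appear elsewhere in this thread.

```shell
# Hypothetical helper: prints (does not run) the commands that create a
# service account in namespace $1 with name $2 and bind it to the
# built-in "edit" cluster role.
rbac_commands() {
  echo "kubectl create serviceaccount $2 -n $1"
  echo "kubectl create clusterrolebinding flink-role-binding-$2 --clusterrole=edit --serviceaccount=$1:$2"
}

# Usage (not executed here):
#
#   rbac_commands rtdp flink        # review the two commands, then run them
#
#   # Start the session with the new account:
#   ./bin/kubernetes-session.sh \
#     -Dkubernetes.cluster-id=rtdp \
#     -Dkubernetes.namespace=rtdp \
#     -Dkubernetes.jobmanager.service-account=flink \
#     -Dkubernetes.container.image=flink:1.11.3-scala_2.12-java8
#
#   # For a container that keeps restarting, the log of the previous
#   # attempt is usually the informative one:
#   kubectl logs <jobmanager-pod-name> -n rtdp --previous
```

Whether a missing service account really is the cause here can only be confirmed from the JobManager log; the commands above are just the mechanical part of testing that hypothesis.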
In reply to this post by Yang Wang
I set up MiniKube on a MacBook Pro, with VirtualBox installed. What are the correct steps to start Flink v1.11.3 on K8S?
These are the steps I tried:

    minikube start
    cd /Users/admin/dev/flink-1.11.3
    ./bin/kubernetes-session.sh

At this point the image being pulled is shown as flink:1.11.3-scala_2.12, not the official flink:1.11.3-scala_2.12-java8 listed on Docker Hub, so I ran the command again as:

    ./bin/kubernetes-session.sh \
      -Dkubernetes.cluster-id=my-flink-cluster \
      -Dkubernetes.container.image=flink:1.11.3-scala_2.12-java8

After waiting a while for the image to be pulled, get pods shows:

    SJ-DN0393:flink-1.11.3 admin$ kubectl get pods
    NAME                                               READY   STATUS             RESTARTS   AGE
    kubernetes-dashboard-1608509744-6bc8455756-mp47w   1/1     Running            3          10d
    my-flink-cluster-77c6f85879-9vcx8                  0/1     CrashLoopBackOff   5          29m

The describe pod command shows:

    Events:
      Type     Reason       Age                    From               Message
      ----     ------       ----                   ----               -------
      Normal   Scheduled    29m                    default-scheduler  Successfully assigned default/my-flink-cluster-77c6f85879-9vcx8 to minikube
      Warning  FailedMount  29m                    kubelet            MountVolume.SetUp failed for volume "flink-config-volume" : configmap "flink-config-my-flink-cluster" not found
      Warning  FailedMount  29m                    kubelet            MountVolume.SetUp failed for volume "hadoop-config-volume" : configmap "hadoop-config-my-flink-cluster" not found
      Normal   Pulling      29m                    kubelet            Pulling image "flink:1.11.3-scala_2.12-java8"
      Normal   Pulled       2m41s (x5 over 4m34s)  kubelet            Container image "flink:1.11.3-scala_2.12-java8" already present on machine
      Normal   Created      2m41s (x5 over 4m33s)  kubelet            Created container flink-job-manager
      Normal   Started      2m41s (x5 over 4m33s)  kubelet            Started container flink-job-manager
      Warning  BackOff      2m8s (x10 over 4m18s)  kubelet            Back-off restarting failed container

In reply to this post by Yang Wang
Environment: a MacBook Pro running a single-node minikube v1.15.1 with Kubernetes v1.19.4.

From the Flink v1.11.3 distribution I ran the following commands:

kubectl create namespace flink-session-cluster

kubectl create serviceaccount flink -n flink-session-cluster

kubectl create clusterrolebinding flink-role-binding-flink \
  --clusterrole=edit \
  --serviceaccount=flink-session-cluster:flink

./bin/kubernetes-session.sh \
  -Dkubernetes.namespace=flink-session-cluster \
  -Dkubernetes.jobmanager.service-account=flink \
  -Dkubernetes.cluster-id=session001 \
  -Dtaskmanager.memory.process.size=8192m \
  -Dkubernetes.taskmanager.cpu=1 \
  -Dtaskmanager.numberOfTaskSlots=4 \
  -Dresourcemanager.taskmanager-timeout=3600000

The console output shows the Flink web UI at http://192.168.64.2:8081 rather than at a five-digit port such as http://192.168.50.135:31753. What is wrong here? Is the host IP supposed to be the minikube IP? I cannot open http://192.168.64.2:8081 in my local browser.

2021-01-02 10:28:04,177 INFO org.apache.flink.runtime.util.config.memory.ProcessMemoryUtils [] - The derived from fraction jvm overhead memory (160.000mb (167772162 bytes)) is less than its min value 192.000mb (201326592 bytes), min value will be used instead
2021-01-02 10:28:04,907 INFO org.apache.flink.kubernetes.KubernetesClusterDescriptor [] - Create flink session cluster session001 successfully, JobManager Web Interface: http://192.168.64.2:8081

I checked the pods, service and deployment; all of them started normally and show green.

Next I submitted a job:

./bin/flink run -d \
  -e kubernetes-session \
  -Dkubernetes.namespace=flink-session-cluster \
  -Dkubernetes.cluster-id=session001 \
  examples/streaming/WindowJoin.jar

Using windowSize=2000, data rate=3
To customize example, use: WindowJoin [--windowSize <window-size-in-millis>] [--rate <elements-per-second>]
2021-01-02 10:21:48,658 INFO org.apache.flink.kubernetes.KubernetesClusterDescriptor [] - Retrieve flink cluster session001 successfully, JobManager Web Interface: http://10.106.136.236:8081

I can open http://10.106.136.236:8081 in my browser. It shows the job as running, but "available slots" shows 0, and the JobManager log contains the following error:

Caused by: org.apache.flink.runtime.jobmanager.scheduler.NoResourceAvailableException: Could not allocate the required slot within slot request timeout. Please make sure that the cluster has enough resources.
    at org.apache.flink.runtime.scheduler.DefaultScheduler.maybeWrapWithNoResourceAvailableException(DefaultScheduler.java:441) ~[flink-dist_2.12-1.11.3.jar:1.11.3]
    ... 47 more
Caused by: java.util.concurrent.CompletionException: java.util.concurrent.TimeoutException
    at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292) ~[?:1.8.0_275]
    at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:308) ~[?:1.8.0_275]
    at java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:607) ~[?:1.8.0_275]
    at java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:591) ~[?:1.8.0_275]
    ... 27 more
Caused by: java.util.concurrent.TimeoutException
    ... 25 more

Why does it report insufficient resources? Thanks for your help!
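The ProcessMemoryUtils INFO line in the post above reports a clamp: an overhead value derived from a fraction of the process memory fell below a minimum, so the minimum was used. A minimal sketch of that arithmetic follows. The 192 MiB minimum is taken from the log; the 1600 MiB total, the 10 % fraction and the 1024 MiB maximum are assumptions that are merely consistent with the 160 MiB figure in the log, and the function name is hypothetical.

```shell
#!/bin/sh
# Sketch of the clamp reported by the ProcessMemoryUtils log line.
# All sizes are in MiB. Only the 192 MiB minimum is stated in the thread;
# the total, the fraction and the maximum below are assumptions.

jvm_overhead_mib() {
  total_mib=$1      # total JobManager process memory (assumed 1600)
  fraction_pct=$2   # overhead fraction in percent (assumed 10)
  min_mib=$3        # lower bound (192 according to the log)
  max_mib=$4        # upper bound (assumed 1024)

  # Value derived from the fraction, using integer arithmetic.
  derived=$(( total_mib * fraction_pct / 100 ))

  # Clamp the derived value into [min_mib, max_mib].
  if [ "$derived" -lt "$min_mib" ]; then
    derived=$min_mib
  elif [ "$derived" -gt "$max_mib" ]; then
    derived=$max_mib
  fi
  echo "$derived"
}

# 10 % of 1600 MiB is 160 MiB, which is below the 192 MiB minimum.
jvm_overhead_mib 1600 10 192 1024
```

With these inputs the function prints 192, matching the "min value will be used instead" message. The line is informational and is not by itself a sign of a resource shortage.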
Native mode exposes the REST service as a LoadBalancer by default, which is why the client prints an address that you cannot reach.

You can add -Dkubernetes.rest-service.exposed.type=NodePort to expose it through a NodePort instead. The address printed by the Flink client will then be correct.

Alternatively, you can get the IP address with minikube ip, get the NodePort of the Flink cluster service you created with kubectl get svc, and combine the two.

As for the NoResourceAvailableException, check whether the TaskManager pod has been created but is stuck in Pending.
If it is, your minikube does not have enough resources: either give minikube more resources or reduce the resources requested by the JobManager and TaskManager pods.
If it is not, please post the complete JobManager log so that the problem is easier to track down.

Best,
Yang
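The two checks suggested in the reply can be sketched as a small script. The namespace and cluster id come from the thread; the service name session001-rest is an assumption and should be confirmed with kubectl get svc. The helper names flink_ui_url and pending_pods are hypothetical.

```shell
#!/bin/sh
# Sketch of the two checks from the reply: build the web UI address from
# the minikube IP and the NodePort, and list pods that are stuck in Pending.

NAMESPACE="flink-session-cluster"   # namespace used in the thread
REST_SERVICE="session001-rest"      # assumed name, verify with `kubectl get svc`

# Join a node IP ($1) and a NodePort ($2) into the web UI address.
flink_ui_url() {
  printf 'http://%s:%s\n' "$1" "$2"
}

# Read `kubectl get pods --no-headers` output on stdin and print the names
# of pods whose STATUS column (third field) is Pending.
pending_pods() {
  awk '$3 == "Pending" { print $1 }'
}

# Only talk to the cluster when both CLIs are installed.
if command -v minikube >/dev/null 2>&1 && command -v kubectl >/dev/null 2>&1; then
  node_ip=$(minikube ip)
  node_port=$(kubectl get svc "$REST_SERVICE" -n "$NAMESPACE" \
    -o jsonpath='{.spec.ports[0].nodePort}')
  flink_ui_url "$node_ip" "$node_port"

  # Any name printed here is a pod the scheduler could not place yet.
  kubectl get pods -n "$NAMESPACE" --no-headers | pending_pods
fi
```

If a TaskManager pod shows up as Pending, kubectl describe pod on that pod should state why it could not be scheduled. The session in this thread requests 8192m for each TaskManager, so that figure is worth comparing with the memory given to minikube.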