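While trying out Flink's Kubernetes application mode recently, I found that TaskManagers (TMs) are repeatedly requested and then terminated, and the REST service, which I configured as type LoadBalancer, never becomes ready, so I cannot open the Flink web UI. The k8s log is below. What is causing this? Is it because the resources I requested are too small?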
================= Startup parameters
"kubernetes.jobmanager.cpu": "0.1",
"kubernetes.taskmanager.cpu": "0.1",
"taskmanager.numberOfTaskSlots": "1",
"jobmanager.memory.process.size": "1024m",
"taskmanager.memory.process.size": "1024m",

================= k8s log
2021-04-05 09:55:14,777 INFO org.apache.flink.runtime.jobmaster.JobMaster [] - JobManager successfully registered at ResourceManager, leader id: 9903e058fb5ca6f418c78dafcad048f1.
2021-04-05 09:55:14,869 INFO org.apache.flink.runtime.resourcemanager.active.ActiveResourceManager [] - Registered job manager [hidden email]://flink@172.17.0.5:6123/user/rpc/jobmanager_2 for job 00000000000000000000000000000000.
2021-04-05 09:55:14,869 INFO org.apache.flink.runtime.resourcemanager.active.ActiveResourceManager [] - Registered job manager [hidden email]://flink@172.17.0.5:6123/user/rpc/jobmanager_2 for job 00000000000000000000000000000000.
2021-04-05 09:55:14,870 INFO org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Requesting new slot [SlotRequestId{3bcf44c03f742d211b5abcc9d0d35068}] and profile ResourceProfile{UNKNOWN} with allocation id 17bcd11a1d493155e3ed45cfd200be79 from resource manager.
2021-04-05 09:55:14,871 INFO org.apache.flink.runtime.resourcemanager.active.ActiveResourceManager [] - Registered job manager [hidden email]://flink@172.17.0.5:6123/user/rpc/jobmanager_2 for job 00000000000000000000000000000000.
2021-04-05 09:55:14,871 INFO org.apache.flink.runtime.resourcemanager.active.ActiveResourceManager [] - Request slot with profile ResourceProfile{UNKNOWN} for job 00000000000000000000000000000000 with allocation id 17bcd11a1d493155e3ed45cfd200be79.
2021-04-05 09:55:14,974 INFO org.apache.flink.runtime.resourcemanager.active.ActiveResourceManager [] - Requesting new worker with resource spec WorkerResourceSpec {cpuCores=0.1, taskHeapSize=25.600mb (26843542 bytes), taskOffHeapSize=0 bytes, networkMemSize=64.000mb (67108864 bytes), managedMemSize=230.400mb (241591914 bytes)}, current pending count: 1.
2021-04-05 09:55:15,272 INFO org.apache.flink.runtime.externalresource.ExternalResourceUtils [] - Enabled external resources: []
2021-04-05 09:55:18,570 INFO org.apache.flink.kubernetes.KubernetesResourceManagerDriver [] - Creating new TaskManager pod with name flink-k8s-native-application-cluster-taskmanager-1-1 and resource <1024,0.1>.
2021-04-05 09:55:22,669 INFO org.apache.flink.kubernetes.KubernetesResourceManagerDriver [] - Pod flink-k8s-native-application-cluster-taskmanager-1-1 is created.
2021-04-05 09:55:22,670 INFO org.apache.flink.kubernetes.KubernetesResourceManagerDriver [] - Received new TaskManager pod: flink-k8s-native-application-cluster-taskmanager-1-1
2021-04-05 09:55:22,770 INFO org.apache.flink.runtime.resourcemanager.active.ActiveResourceManager [] - Requested worker flink-k8s-native-application-cluster-taskmanager-1-1 with resource spec WorkerResourceSpec {cpuCores=0.1, taskHeapSize=25.600mb (26843542 bytes), taskOffHeapSize=0 bytes, networkMemSize=64.000mb (67108864 bytes), managedMemSize=230.400mb (241591914 bytes)}.
2021-04-05 09:56:35,494 INFO org.apache.flink.runtime.resourcemanager.active.ActiveResourceManager [] - Worker flink-k8s-native-application-cluster-taskmanager-1-1 with resource spec WorkerResourceSpec {cpuCores=0.1, taskHeapSize=25.600mb (26843542 bytes), taskOffHeapSize=0 bytes, networkMemSize=64.000mb (67108864 bytes), managedMemSize=230.400mb (241591914 bytes)} was requested in current attempt and has not registered. Current pending count after removing: 0.
2021-04-05 09:56:35,494 INFO org.apache.flink.runtime.resourcemanager.active.ActiveResourceManager [] - Worker flink-k8s-native-application-cluster-taskmanager-1-1 is terminated. Diagnostics: Pod terminated, container termination statuses: [flink-task-manager(exitCode=1, reason=Error, message=null)]
2021-04-05 09:56:35,495 INFO org.apache.flink.runtime.resourcemanager.active.ActiveResourceManager [] - Requesting new worker with resource spec WorkerResourceSpec {cpuCores=0.1, taskHeapSize=25.600mb (26843542 bytes), taskOffHeapSize=0 bytes, networkMemSize=64.000mb (67108864 bytes), managedMemSize=230.400mb (241591914 bytes)}, current pending count: 1.
2021-04-05 09:56:35,496 INFO org.apache.flink.runtime.externalresource.ExternalResourceUtils [] - Enabled external resources: []
2021-04-05 09:56:35,498 INFO org.apache.flink.kubernetes.KubernetesResourceManagerDriver [] - Creating new TaskManager pod with name flink-k8s-native-application-cluster-taskmanager-1-2 and resource <1024,0.1>.
2021-04-05 09:56:35,700 INFO org.apache.flink.kubernetes.KubernetesResourceManagerDriver [] - Pod flink-k8s-native-application-cluster-taskmanager-1-2 is created.
2021-04-05 09:56:35,811 INFO org.apache.flink.kubernetes.KubernetesResourceManagerDriver [] - Received new TaskManager pod: flink-k8s-native-application-cluster-taskmanager-1-2
2021-04-05 09:56:35,811 INFO org.apache.flink.runtime.resourcemanager.active.ActiveResourceManager [] - Requested worker flink-k8s-native-application-cluster-taskmanager-1-2 with resource spec WorkerResourceSpec {cpuCores=0.1, taskHeapSize=25.600mb (26843542 bytes), taskOffHeapSize=0 bytes, networkMemSize=64.000mb (67108864 bytes), managedMemSize=230.400mb (241591914 bytes)}.
2021-04-05 09:57:56,904 INFO org.apache.flink.runtime.resourcemanager.active.ActiveResourceManager [] - Worker flink-k8s-native-application-cluster-taskmanager-1-2 with resource spec WorkerResourceSpec {cpuCores=0.1, taskHeapSize=25.600mb (26843542 bytes), taskOffHeapSize=0 bytes, networkMemSize=64.000mb (67108864 bytes), managedMemSize=230.400mb (241591914 bytes)} was requested in current attempt and has not registered. Current pending count after removing: 0.
2021-04-05 09:57:56,997 INFO org.apache.flink.runtime.resourcemanager.active.ActiveResourceManager [] - Worker flink-k8s-native-application-cluster-taskmanager-1-2 is terminated. Diagnostics: Pod terminated, container termination statuses: [flink-task-manager(exitCode=1, reason=Error, message=null)]
2021-04-05 09:57:56,998 INFO org.apache.flink.runtime.resourcemanager.active.ActiveResourceManager [] - Requesting new worker with resource spec WorkerResourceSpec {cpuCores=0.1, taskHeapSize=25.600mb (26843542 bytes), taskOffHeapSize=0 bytes, networkMemSize=64.000mb (67108864 bytes), managedMemSize=230.400mb (241591914 bytes)}, current pending count: 1.
2021-04-05 09:57:57,099 INFO org.apache.flink.runtime.externalresource.ExternalResourceUtils [] - Enabled external resources: []
2021-04-05 09:57:57,199 INFO org.apache.flink.kubernetes.KubernetesResourceManagerDriver [] - Creating new TaskManager pod with name flink-k8s-native-application-cluster-taskmanager-1-3 and resource <1024,0.1>.
2021-04-05 09:57:57,800 INFO org.apache.flink.kubernetes.KubernetesResourceManagerDriver [] - Pod flink-k8s-native-application-cluster-taskmanager-1-3 is created.
2021-04-05 09:57:58,197 INFO org.apache.flink.kubernetes.KubernetesResourceManagerDriver [] - Received new TaskManager pod: flink-k8s-native-application-cluster-taskmanager-1-3
2021-04-05 09:57:58,198 INFO org.apache.flink.runtime.resourcemanager.active.ActiveResourceManager [] - Requested worker flink-k8s-native-application-cluster-taskmanager-1-3 with resource spec WorkerResourceSpec {cpuCores=0.1, taskHeapSize=25.600mb (26843542 bytes), taskOffHeapSize=0 bytes, networkMemSize=64.000mb (67108864 bytes), managedMemSize=230.400mb (241591914 bytes)}.
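On the "are the resources too small?" question, it may help to see where the taskHeapSize=25.600mb in the log comes from. The sketch below is a rough reconstruction of how a 1024m TaskManager process size could be split, assuming the default memory settings I believe Flink 1.12 ships with (256m metaspace, JVM overhead fraction 0.1 with a 192m minimum, 128m framework heap, 128m framework off-heap, network fraction 0.1 with a 64m minimum, managed fraction 0.4). The function name and the constants are my assumptions, not something taken from the post, so check them against the configuration reference for your version.

```python
def taskmanager_memory_mb(process_mb: float) -> dict:
    """Rough split of taskmanager.memory.process.size, in MB.

    All constants are ASSUMED Flink defaults (1.12-era); they are not
    stated in the thread and may differ in other versions.
    """
    metaspace = 256.0
    # JVM overhead: fraction 0.1, clamped to [192 MB, 1024 MB]
    jvm_overhead = min(max(0.1 * process_mb, 192.0), 1024.0)
    total_flink = process_mb - metaspace - jvm_overhead

    framework_heap = 128.0
    framework_off_heap = 128.0
    task_off_heap = 0.0
    # Network memory: fraction 0.1 of total Flink memory, at least 64 MB
    network = min(max(0.1 * total_flink, 64.0), 1024.0)
    managed = 0.4 * total_flink
    # Task heap is whatever is left over
    task_heap = (total_flink - framework_heap - framework_off_heap
                 - task_off_heap - network - managed)
    return {
        "total_flink": total_flink,
        "network": network,
        "managed": managed,
        "task_heap": task_heap,
    }


if __name__ == "__main__":
    for name, value in taskmanager_memory_mb(1024).items():
        print(f"{name}: {value:.1f} MB")
```

With 1024 as input, this reproduces the networkMemSize, managedMemSize, and taskHeapSize values printed in the WorkerResourceSpec lines, which suggests the job code is left with only about 25.6 MB of heap. That is tight, although the log alone does not prove it is why the container exits with code 1.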
Your CPU setting is very small, and K8s enforces that limit strictly.
I suspect the TM is starting very slowly, never manages to register, and fails on a timeout. You can check the TM log to confirm.

Also, judging from the log you posted, the REST endpoint should already have started successfully; you can reach it at <LoadBalancerIP:8081>.

Best,
Yang
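Before digging into the TM log, one quick check is how long each TaskManager pod survived between "Requested worker" and "is terminated" in the JobManager log from the original post. The following is a minimal sketch: the helper name `pod_lifetimes` is made up for illustration, and the sample lines are abridged copies of the posted log (the resource spec is cut off).

```python
import re
from datetime import datetime

# Abridged lines from the JobManager log in the original post.
LOG_EXCERPT = """\
2021-04-05 09:55:22,770 INFO org.apache.flink.runtime.resourcemanager.active.ActiveResourceManager [] - Requested worker flink-k8s-native-application-cluster-taskmanager-1-1 with resource spec
2021-04-05 09:56:35,494 INFO org.apache.flink.runtime.resourcemanager.active.ActiveResourceManager [] - Worker flink-k8s-native-application-cluster-taskmanager-1-1 is terminated.
2021-04-05 09:56:35,811 INFO org.apache.flink.runtime.resourcemanager.active.ActiveResourceManager [] - Requested worker flink-k8s-native-application-cluster-taskmanager-1-2 with resource spec
2021-04-05 09:57:56,997 INFO org.apache.flink.runtime.resourcemanager.active.ActiveResourceManager [] - Worker flink-k8s-native-application-cluster-taskmanager-1-2 is terminated.
"""

TIMESTAMP = r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})"
REQUESTED = re.compile(TIMESTAMP + r".*Requested worker (\S+) with resource spec")
TERMINATED = re.compile(TIMESTAMP + r".*Worker (\S+) is terminated")


def parse_ts(text: str) -> datetime:
    # Log timestamps use a comma before the milliseconds
    return datetime.strptime(text, "%Y-%m-%d %H:%M:%S,%f")


def pod_lifetimes(log_text: str) -> dict:
    """Map pod name -> seconds between request and termination."""
    requested = {}
    lifetimes = {}
    for line in log_text.splitlines():
        match = REQUESTED.search(line)
        if match:
            requested[match.group(2)] = parse_ts(match.group(1))
            continue
        match = TERMINATED.search(line)
        if match and match.group(2) in requested:
            ended = parse_ts(match.group(1))
            started = requested[match.group(2)]
            lifetimes[match.group(2)] = (ended - started).total_seconds()
    return lifetimes


if __name__ == "__main__":
    for pod, seconds in pod_lifetimes(LOG_EXCERPT).items():
        print(f"{pod}: {seconds:.3f} s")
```

For the two pods that terminated in the posted log, this gives roughly 73 s and 81 s. The exitCode=1 in the diagnostics means the container itself exited with an error, so the TM log is still the place to look for the actual cause.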