The JobManager restarts roughly every few tens of minutes. Could anyone share ideas on how to investigate? The error thrown is identical every time, and after running for a while a large number of ConfigMaps accumulate as well. A concrete instance of the error is below.
Error output:

2021-01-17 04:16:46,116 ERROR org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - Fatal error occurred in ResourceManager.
org.apache.flink.runtime.leaderretrieval.LeaderRetrievalException: Error while watching the ConfigMap test-flink-etl-42557c3f6325ffc876958430859178cd-jobmanager-leader
    at org.apache.flink.kubernetes.highavailability.KubernetesLeaderRetrievalDriver$ConfigMapCallbackHandlerImpl.handleFatalError(KubernetesLeaderRetrievalDriver.java:120) [flink-dist_2.11-1.12.1.jar:1.12.1]
    at org.apache.flink.kubernetes.kubeclient.resources.AbstractKubernetesWatcher.onClose(AbstractKubernetesWatcher.java:48) [flink-dist_2.11-1.12.1.jar:1.12.1]
    at io.fabric8.kubernetes.client.utils.WatcherToggle.onClose(WatcherToggle.java:56) [flink-dist_2.11-1.12.1.jar:1.12.1]
    at io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager.closeEvent(WatchConnectionManager.java:367) [flink-dist_2.11-1.12.1.jar:1.12.1]
    at io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager.access$700(WatchConnectionManager.java:50) [flink-dist_2.11-1.12.1.jar:1.12.1]
    at io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager$1.onMessage(WatchConnectionManager.java:259) [flink-dist_2.11-1.12.1.jar:1.12.1]
    at org.apache.flink.kubernetes.shaded.okhttp3.internal.ws.RealWebSocket.onReadMessage(RealWebSocket.java:323) [flink-dist_2.11-1.12.1.jar:1.12.1]
    at org.apache.flink.kubernetes.shaded.okhttp3.internal.ws.WebSocketReader.readMessageFrame(WebSocketReader.java:219) [flink-dist_2.11-1.12.1.jar:1.12.1]
    at org.apache.flink.kubernetes.shaded.okhttp3.internal.ws.WebSocketReader.processNextFrame(WebSocketReader.java:105) [flink-dist_2.11-1.12.1.jar:1.12.1]
    at org.apache.flink.kubernetes.shaded.okhttp3.internal.ws.RealWebSocket.loopReader(RealWebSocket.java:274) [flink-dist_2.11-1.12.1.jar:1.12.1]
    at org.apache.flink.kubernetes.shaded.okhttp3.internal.ws.RealWebSocket$2.onResponse(RealWebSocket.java:214) [flink-dist_2.11-1.12.1.jar:1.12.1]
    at org.apache.flink.kubernetes.shaded.okhttp3.RealCall$AsyncCall.execute(RealCall.java:206) [flink-dist_2.11-1.12.1.jar:1.12.1]
    at org.apache.flink.kubernetes.shaded.okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32) [flink-dist_2.11-1.12.1.jar:1.12.1]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_275]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_275]
    at java.lang.Thread.run(Thread.java:748) [?:1.8.0_275]
2021-01-17 04:16:46,117 ERROR org.apache.flink.runtime.entrypoint.ClusterEntrypoint [] - Fatal error occurred in the cluster entrypoint.
org.apache.flink.runtime.leaderretrieval.LeaderRetrievalException: Error while watching the ConfigMap test-flink-etl-42557c3f6325ffc876958430859178cd-jobmanager-leader
    at org.apache.flink.kubernetes.highavailability.KubernetesLeaderRetrievalDriver$ConfigMapCallbackHandlerImpl.handleFatalError(KubernetesLeaderRetrievalDriver.java:120) [flink-dist_2.11-1.12.1.jar:1.12.1]
    at org.apache.flink.kubernetes.kubeclient.resources.AbstractKubernetesWatcher.onClose(AbstractKubernetesWatcher.java:48) [flink-dist_2.11-1.12.1.jar:1.12.1]
    at io.fabric8.kubernetes.client.utils.WatcherToggle.onClose(WatcherToggle.java:56) [flink-dist_2.11-1.12.1.jar:1.12.1]
    at io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager.closeEvent(WatchConnectionManager.java:367) [flink-dist_2.11-1.12.1.jar:1.12.1]
    at io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager.access$700(WatchConnectionManager.java:50) [flink-dist_2.11-1.12.1.jar:1.12.1]
    at io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager$1.onMessage(WatchConnectionManager.java:259) [flink-dist_2.11-1.12.1.jar:1.12.1]
    at org.apache.flink.kubernetes.shaded.okhttp3.internal.ws.RealWebSocket.onReadMessage(RealWebSocket.java:323) [flink-dist_2.11-1.12.1.jar:1.12.1]
    at org.apache.flink.kubernetes.shaded.okhttp3.internal.ws.WebSocketReader.readMessageFrame(WebSocketReader.java:219) [flink-dist_2.11-1.12.1.jar:1.12.1]
    at org.apache.flink.kubernetes.shaded.okhttp3.internal.ws.WebSocketReader.processNextFrame(WebSocketReader.java:105) [flink-dist_2.11-1.12.1.jar:1.12.1]
    at org.apache.flink.kubernetes.shaded.okhttp3.internal.ws.RealWebSocket.loopReader(RealWebSocket.java:274) [flink-dist_2.11-1.12.1.jar:1.12.1]
    at org.apache.flink.kubernetes.shaded.okhttp3.internal.ws.RealWebSocket$2.onResponse(RealWebSocket.java:214) [flink-dist_2.11-1.12.1.jar:1.12.1]
    at org.apache.flink.kubernetes.shaded.okhttp3.RealCall$AsyncCall.execute(RealCall.java:206) [flink-dist_2.11-1.12.1.jar:1.12.1]
    at org.apache.flink.kubernetes.shaded.okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32) [flink-dist_2.11-1.12.1.jar:1.12.1]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_275]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_275]
    at java.lang.Thread.run(Thread.java:748) [?:1.8.0_275]
2021-01-17 04:16:46,164 INFO org.apache.flink.runtime.blob.BlobServer [] - Stopped BLOB server at 0.0.0.0:6124

After the jobmanager restarted, I checked and this ConfigMap exists: test-flink-etl-42557c3f6325ffc876958430859178cd-jobmanager-leader

[gum@docker-repos ~]$ kubectl -n gem-flink get cm test-flink-etl-42557c3f6325ffc876958430859178cd-jobmanager-leader -o yaml
apiVersion: v1
data:
  address: akka.tcp://flink@flink-jobmanager:6123/user/rpc/jobmanager_3
  sessionId: c0f99c65-af3c-4916-ae7c-c272e2987e31
kind: ConfigMap
metadata:
  annotations:
    control-plane.alpha.kubernetes.io/leader: '{"holderIdentity":"5fd98e66-8f6e-4871-b349-fd8760e9eb6b","leaseDuration":15.000000000,"acquireTime":"2021-01-17T03:43:12.444000Z","renewTime":"2021-01-17T03:51:52.460000Z","leaderTransitions":105}'
  creationTimestamp: "2021-01-17T03:43:12Z"
  labels:
    app: test-flink-etl
    configmap-type: high-availability
    type: flink-native-kubernetes
  name: test-flink-etl-42557c3f6325ffc876958430859178cd-jobmanager-leader
  namespace: gem-flink
  resourceVersion: "39527319"
  selfLink: /api/v1/namespaces/gem-flink/configmaps/test-flink-etl-42557c3f6325ffc876958430859178cd-jobmanager-leader
  uid: 70b979b5-b696-47b7-8eb8-558e8887f2c9
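(For tracking how quickly the leftover HA ConfigMaps pile up, the labels in the YAML above give a convenient selector; a sketch, with the namespace taken from this thread:

    # list all Flink HA ConfigMaps in the namespace, oldest first
    kubectl -n gem-flink get cm \
      -l 'configmap-type=high-availability,type=flink-native-kubernetes' \
      --sort-by=.metadata.creationTimestamp

Be careful not to hand-delete a ConfigMap that a live JobManager still uses, since that would break its leader election.)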
Search your logs to see whether there are any "too old resource version" errors.
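(For example, with a hypothetical pod name to substitute:

    kubectl -n gem-flink logs <jobmanager-pod> | grep -i 'too old resource version'

That message is what the fabric8 watcher surfaces when its watch falls behind the APIServer's event history, i.e. HTTP 410 Gone, which is one known trigger of this fatal-error path.)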
Also, test the network between the Pod and the APIServer to see whether the connection drops frequently.

Best,
Yang
I went through the earlier logs and did not find any "too old resource version" message. Several consecutive logs contain no other error either: just this one error, a restart, and then a fresh log begins.

The k8s cluster I am using does seem to have a fairly unstable network. How would I test the network between the Pod and the APIServer in a way that demonstrates the problem convincingly? ping? Or some other tool?
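(ping only exercises ICMP, which is often blocked and says little about the TCP/TLS path the watch actually uses. A minimal in-pod probe against the APIServer itself, using the standard Kubernetes service-account mount paths, which are defaults and not something from this thread, might look like:

    # run inside the JobManager pod; prints latency and HTTP status every 5s
    TOKEN=$(cat /var/run/secrets/kubernetes.io/serviceaccount/token)
    while true; do
      curl -sS -o /dev/null -w '%{time_total}s %{http_code}\n' \
        --cacert /var/run/secrets/kubernetes.io/serviceaccount/ca.crt \
        -H "Authorization: Bearer $TOKEN" \
        https://kubernetes.default.svc/healthz
      sleep 5
    done

Stalls or connection errors here would hit the same network path as the ConfigMap watch.)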
You can use iperf to test the network; you need to install it in the image ahead of time.
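(A minimal sketch of such a test, assuming iperf3 was baked into the image beforehand, e.g. `apt-get install -y iperf3` in the Dockerfile, and that the server end runs on a pod or node along the same network path as the APIServer:

    # server side
    iperf3 -s
    # client side, run from the JobManager pod: 60-second run, report every 5s
    iperf3 -c <server-pod-or-node-ip> -t 60 -i 5

Heavy retransmits or throughput dips during the run would support the unstable-network theory.)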
Also, you can turn on the debug log to check whether the Watch went through many reconnect retries without success before it finally failed.

Best,
Yang
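(For reference, one way to switch the JobManager to DEBUG with the stock Flink 1.12 image layout; a sketch, with the file name per the official distribution. In native Kubernetes mode the client ships this conf directory to the pods via a ConfigMap, so changing it on the submitting client before deployment should be enough:

    sed -i 's/^rootLogger.level = INFO/rootLogger.level = DEBUG/' \
      "$FLINK_HOME/conf/log4j-console.properties"
)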
Thanks! I turned on the DEBUG log. There is still only that one final ERROR, but before it there are quite a few log lines containing kubernetes.client.dsl.internal.WatchConnectionManager. I grepped a portion of them; can you read anything out of these?

job-debug-0118.log:2021-01-19 02:12:25,551 DEBUG io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager [] - WebSocket successfully opened
job-debug-0118.log:2021-01-19 02:12:25,646 DEBUG io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager [] - Connecting websocket ... io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager@2553d42c
job-debug-0118.log:2021-01-19 02:12:25,647 DEBUG io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager [] - WebSocket successfully opened
job-debug-0118.log:2021-01-19 02:12:30,128 DEBUG io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager [] - Connecting websocket ... io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager@5a9fa83e
job-debug-0118.log:2021-01-19 02:12:30,176 DEBUG io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager [] - WebSocket successfully opened
job-debug-0118.log:2021-01-19 02:12:39,028 DEBUG io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager [] - Force closing the watch io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager@2553d42c
job-debug-0118.log:2021-01-19 02:12:39,028 DEBUG io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager [] - Closing websocket org.apache.flink.kubernetes.shaded.okhttp3.internal.ws.RealWebSocket@15b15029
job-debug-0118.log:2021-01-19 02:12:39,030 DEBUG io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager [] - WebSocket close received. code: 1000, reason:
job-debug-0118.log:2021-01-19 02:12:39,030 DEBUG io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager [] - Ignoring onClose for already closed/closing websocket
job-debug-0118.log:2021-01-19 02:12:39,031 DEBUG io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager [] - Force closing the watch io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager@2cdbe5a0
job-debug-0118.log:2021-01-19 02:12:39,031 DEBUG io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager [] - Closing websocket org.apache.flink.kubernetes.shaded.okhttp3.internal.ws.RealWebSocket@1e3f5396
job-debug-0118.log:2021-01-19 02:12:39,033 DEBUG io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager [] - WebSocket close received. code: 1000, reason:
job-debug-0118.log:2021-01-19 02:12:39,033 DEBUG io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager [] - Ignoring onClose for already closed/closing websocket
job-debug-0118.log:2021-01-19 02:12:42,677 DEBUG io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager [] - Connecting websocket ... io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager@210aab4b
job-debug-0118.log:2021-01-19 02:12:42,678 DEBUG io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager [] - WebSocket successfully opened
job-debug-0118.log:2021-01-19 02:12:42,920 DEBUG io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager [] - Connecting websocket ... io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager@278d8398
job-debug-0118.log:2021-01-19 02:12:42,921 DEBUG io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager [] - WebSocket successfully opened
job-debug-0118.log:2021-01-19 02:12:45,130 DEBUG io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager [] - Connecting websocket ... io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager@4b318628
job-debug-0118.log:2021-01-19 02:12:45,132 DEBUG io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager [] - WebSocket successfully opened
job-debug-0118.log:2021-01-19 02:13:05,927 DEBUG io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager [] - Force closing the watch io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager@278d8398
job-debug-0118.log:2021-01-19 02:13:05,927 DEBUG io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager [] - Closing websocket org.apache.flink.kubernetes.shaded.okhttp3.internal.ws.RealWebSocket@69d1ebd2
job-debug-0118.log:2021-01-19 02:13:05,930 DEBUG io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager [] - WebSocket close received. code: 1000, reason:
job-debug-0118.log:2021-01-19 02:13:05,930 DEBUG io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager [] - Ignoring onClose for already closed/closing websocket
job-debug-0118.log:2021-01-19 02:13:05,940 DEBUG io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager [] - Force closing the watch io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager@210aab4b
job-debug-0118.log:2021-01-19 02:13:05,940 DEBUG io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager [] - Closing websocket org.apache.flink.kubernetes.shaded.okhttp3.internal.ws.RealWebSocket@3db9d8d8
job-debug-0118.log:2021-01-19 02:13:05,942 DEBUG io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager [] - WebSocket close received. code: 1000, reason:
job-debug-0118.log:2021-01-19 02:13:05,942 DEBUG io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager [] - Ignoring onClose for already closed/closing websocket
job-debug-0118.log:2021-01-19 02:13:08,378 DEBUG io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager [] - Connecting websocket ... io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager@4dcf905
job-debug-0118.log:2021-01-19 02:13:08,381 DEBUG io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager [] - WebSocket successfully opened
job-debug-0118.log:2021-01-19 02:13:08,471 DEBUG io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager [] - Connecting websocket ... io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager@428ca061
job-debug-0118.log:2021-01-19 02:13:08,472 DEBUG io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager [] - WebSocket successfully opened
job-debug-0118.log:2021-01-19 02:13:10,127 DEBUG io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager [] - Connecting websocket ... io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager@46b49e58
job-debug-0118.log:2021-01-19 02:13:10,128 DEBUG io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager [] - WebSocket successfully opened
job-debug-0118.log:2021-01-19 02:13:21,625 DEBUG io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager [] - Force closing the watch io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager@428ca061
job-debug-0118.log:2021-01-19 02:13:21,625 DEBUG io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager [] - Closing websocket org.apache.flink.kubernetes.shaded.okhttp3.internal.ws.RealWebSocket@14e16427
job-debug-0118.log:2021-01-19 02:13:21,627 DEBUG io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager [] - WebSocket close received. code: 1000, reason:
job-debug-0118.log:2021-01-19 02:13:21,627 DEBUG io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager [] - Ignoring onClose for already closed/closing websocket
job-debug-0118.log:2021-01-19 02:13:21,628 DEBUG io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager [] - Force closing the watch io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager@4dcf905
job-debug-0118.log:2021-01-19 02:13:21,628 DEBUG io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager [] - Closing websocket org.apache.flink.kubernetes.shaded.okhttp3.internal.ws.RealWebSocket@11708e54
job-debug-0118.log:2021-01-19 02:13:21,630 DEBUG io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager [] - WebSocket close received. code: 1000, reason:
job-debug-0118.log:2021-01-19 02:13:21,630 DEBUG io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager [] - Ignoring onClose for already closed/closing websocket
job-debug-0118.log:2021-01-19 02:13:25,680 DEBUG io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager [] - Connecting websocket ... io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager@3ba4abd7
job-debug-0118.log:2021-01-19 02:13:25,681 DEBUG io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager [] - WebSocket successfully opened
job-debug-0118.log:2021-01-19 02:13:25,908 DEBUG io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager [] - Connecting websocket ... io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager@23fe4bdd
job-debug-0118.log:2021-01-19 02:13:25,909 DEBUG io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager [] - WebSocket successfully opened
job-debug-0118.log:2021-01-19 02:13:30,128 DEBUG io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager [] - Connecting websocket ... io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager@5cf8bd92
job-debug-0118.log:2021-01-19 02:13:30,175 DEBUG io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager [] - WebSocket successfully opened
job-debug-0118.log:2021-01-19 02:13:46,104 DEBUG io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager [] - WebSocket close received. code: 1000, reason:
job-debug-0118.log:2021-01-19 02:13:46,105 DEBUG io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager [] - Submitting reconnect task to the executor
job-debug-0118.log:2021-01-19 02:13:46,113 DEBUG io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager [] - Scheduling reconnect task
job-debug-0118.log:2021-01-19 02:13:46,117 DEBUG io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager [] - Current reconnect backoff is 1000 milliseconds (T0)
job-debug-0118.log:2021-01-19 02:13:47,117 DEBUG io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager [] - Connecting websocket ... io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager@23f03575
job-debug-0118.log:2021-01-19 02:13:47,120 DEBUG io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager [] - WebSocket successfully opened
job-debug-0118.log:    at io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager.closeEvent(WatchConnectionManager.java:367) [flink-dist_2.11-1.12.1.jar:1.12.1]
job-debug-0118.log:    at io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager.access$700(WatchConnectionManager.java:50) [flink-dist_2.11-1.12.1.jar:1.12.1]
job-debug-0118.log:    at io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager$1.onMessage(WatchConnectionManager.java:259) [flink-dist_2.11-1.12.1.jar:1.12.1]
job-debug-0118.log:    at io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager.closeEvent(WatchConnectionManager.java:367) [flink-dist_2.11-1.12.1.jar:1.12.1]
job-debug-0118.log:    at io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager.access$700(WatchConnectionManager.java:50) [flink-dist_2.11-1.12.1.jar:1.12.1]
job-debug-0118.log:    at io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager$1.onMessage(WatchConnectionManager.java:259) [flink-dist_2.11-1.12.1.jar:1.12.1]

The final ERROR looks like this:

2021-01-19 02:13:47,094 DEBUG org.apache.flink.runtime.source.coordinator.SourceCoordinator [] - Handling event from subtask 406 of source Source: HiveSource-snmpprobe.p_snmp_ifXTable: RequestSplitEvent (host='172.0.37.8')
2021-01-19 02:13:47,094 INFO org.apache.flink.connector.file.src.impl.StaticFileSplitEnumerator [] - Subtask 406 (on host '172.0.37.8') is requesting a file source split
2021-01-19 02:13:47,094 INFO org.apache.flink.connector.file.src.impl.StaticFileSplitEnumerator [] - No more splits available for subtask 406
2021-01-19 02:13:47,097 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph [] - Source: HiveSource-snmpprobe.p_snmp_ifXTable (318/458) (710557b37a1e03f0f462ab5303842489) switched from RUNNING to FINISHED.
2021-01-19 02:13:47,097 DEBUG org.apache.flink.runtime.executiongraph.ExecutionGraph [] - Ignoring transition of vertex Source: HiveSource-snmpprobe.p_snmp_ifXTable (318/458) - execution #0 to FAILED while being FINISHED.
2021-01-19 02:13:47,097 DEBUG org.apache.flink.runtime.scheduler.SharedSlot [] - Remove logical slot (SlotRequestId{988c43b8a7b427ea962685f057438880}) for execution vertex (id 605b35e407e90cda15ad084365733fdd_317) from the physical slot (SlotRequestId{37b03b71035c9d8c564bb7c299ee9b3d})
2021-01-19 02:13:47,097 DEBUG org.apache.flink.runtime.scheduler.SharedSlot [] - Release shared slot externally (SlotRequestId{37b03b71035c9d8c564bb7c299ee9b3d})
2021-01-19 02:13:47,097 DEBUG org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Releasing slot [SlotRequestId{37b03b71035c9d8c564bb7c299ee9b3d}] because: Slot is being returned from SlotSharingExecutionSlotAllocator.
2021-01-19 02:13:47,097 DEBUG org.apache.flink.runtime.scheduler.SharedSlot [] - Release shared slot (SlotRequestId{37b03b71035c9d8c564bb7c299ee9b3d})
2021-01-19 02:13:47,097 DEBUG org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Fulfilling pending slot request [SlotRequestId{bb5a48db898111288c811359cc2d7f51}] with slot [385153f7c5efff54be584439258f7352]
2021-01-19 02:13:47,097 DEBUG org.apache.flink.runtime.scheduler.SharedSlot [] - Allocated logical slot (SlotRequestId{78f370c05403ab3d703a8d89c19d23c8}) for execution vertex (id 605b35e407e90cda15ad084365733fdd_419) from the physical slot (SlotRequestId{bb5a48db898111288c811359cc2d7f51})
2021-01-19 02:13:47,097 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph [] - Source: HiveSource-snmpprobe.p_snmp_ifXTable (420/458) (d04edd6e11b7cdc9e88c0ab6d756fed2) switched from SCHEDULED to DEPLOYING.
2021-01-19 02:13:47,097 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph [] - Deploying Source: HiveSource-snmpprobe.p_snmp_ifXTable (420/458) (attempt #0) with attempt id d04edd6e11b7cdc9e88c0ab6d756fed2 to 172.0.42.250:6122-5d505f @ 172-0-42-250.flink-taskmanager-query-state.gem-flink.svc.cluster.local (dataPort=40697) with allocation id 385153f7c5efff54be584439258f7352
2021-01-19 02:13:47,097 DEBUG org.apache.flink.runtime.resourcemanager.slotmanager.SlotManagerImpl [] - Cancel slot request 4da96bc97ef9b47ba7e408c78835d75a.
2021-01-19 02:13:47,100 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph [] - Source: HiveSource-snmpprobe.p_snmp_ifXTable (413/458) (2124587d1641d6cb05c05dfc742e8423) switched from DEPLOYING to RUNNING.
2021-01-19 02:13:47,100 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph [] - Source: HiveSource-snmpprobe.p_snmp_ifXTable (414/458) (4aa88cc1b61dc5b056ad59d373392c2f) switched from DEPLOYING to RUNNING.
2021-01-19 02:13:47,100 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph [] - Source: HiveSource-snmpprobe.p_snmp_ifXTable (415/458) (225d9a604e6b852f2ea6e87ebcf3107c) switched from DEPLOYING to RUNNING.
2021-01-19 02:13:47,112 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph [] - Source: HiveSource-snmpprobe.p_snmp_ifXTable (417/458) (c1a78898e76b3f1761cd5be1913dd24c) switched from DEPLOYING to RUNNING.
2021-01-19 02:13:47,112 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph [] - Source: HiveSource-snmpprobe.p_snmp_ifXTable (418/458) (6deb899dbd8bf373b349b980d1e78506) switched from DEPLOYING to RUNNING.
2021-01-19 02:13:47,113 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph [] - Source: HiveSource-snmpprobe.p_snmp_ifXTable (416/458) (5a99fa4a1d8bdbf93345a9b20ae1fa91) switched from DEPLOYING to RUNNING.
2021-01-19 02:13:47,117 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph [] - Source: HiveSource-snmpprobe.p_snmp_ifXTable (406/458) (d1f9edd1bfdd80eef6b32b8850020130) switched from RUNNING to FINISHED.
2021-01-19 02:13:47,117 DEBUG org.apache.flink.runtime.executiongraph.ExecutionGraph [] - Ignoring transition of vertex Source: HiveSource-snmpprobe.p_snmp_ifXTable (406/458) - execution #0 to FAILED while being FINISHED.
2021-01-19 02:13:47,117 DEBUG org.apache.flink.runtime.scheduler.SharedSlot [] - Remove logical slot (SlotRequestId{037efe676c5cec5fe6b549d3ebd5f72b}) for execution vertex (id 605b35e407e90cda15ad084365733fdd_405) from the physical slot (SlotRequestId{a840e61c33cb3f250cfb54652c87aa64})
2021-01-19 02:13:47,117 DEBUG org.apache.flink.runtime.scheduler.SharedSlot [] - Release shared slot externally (SlotRequestId{a840e61c33cb3f250cfb54652c87aa64})
2021-01-19 02:13:47,117 DEBUG org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Releasing slot [SlotRequestId{a840e61c33cb3f250cfb54652c87aa64}] because: Slot is being returned from SlotSharingExecutionSlotAllocator.
2021-01-19 02:13:47,117 DEBUG org.apache.flink.runtime.scheduler.SharedSlot [] - Release shared slot (SlotRequestId{a840e61c33cb3f250cfb54652c87aa64})
2021-01-19 02:13:47,117 DEBUG io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager [] - Connecting websocket ... io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager@23f03575
2021-01-19 02:13:47,117 DEBUG org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Fulfilling pending slot request [SlotRequestId{538f35a507cd0949bf547588eb436b49}] with slot [ebf6ebbb9abe3a9e6ccb56e235b00b53]
2021-01-19 02:13:47,117 DEBUG org.apache.flink.runtime.scheduler.SharedSlot [] - Allocated logical slot (SlotRequestId{a9bac8016853f9e86963b8ee11dea18f}) for execution vertex (id 605b35e407e90cda15ad084365733fdd_420) from the physical slot (SlotRequestId{538f35a507cd0949bf547588eb436b49})
2021-01-19 02:13:47,117 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph [] - Source: HiveSource-snmpprobe.p_snmp_ifXTable (421/458) (706d012e7a572e1e5786536df9ab3bbb) switched from SCHEDULED to DEPLOYING.
2021-01-19 02:13:47,117 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph [] - Deploying Source: HiveSource-snmpprobe.p_snmp_ifXTable (421/458) (attempt #0) with attempt id 706d012e7a572e1e5786536df9ab3bbb to 172.0.37.8:6122-694869 @ 172-0-37-8.flink-taskmanager-query-state.gem-flink.svc.cluster.local (dataPort=32959) with allocation id ebf6ebbb9abe3a9e6ccb56e235b00b53
2021-01-19 02:13:47,117 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph [] - Source: HiveSource-snmpprobe.p_snmp_ifXTable (407/458) (396cb3fdd115a31d8575407fa9ee6e07) switched from RUNNING to FINISHED.
2021-01-19 02:13:47,117 DEBUG org.apache.flink.runtime.executiongraph.ExecutionGraph [] - Ignoring transition of vertex Source: HiveSource-snmpprobe.p_snmp_ifXTable (407/458) - execution #0 to FAILED while being FINISHED.
2021-01-19 02:13:47,117 DEBUG org.apache.flink.runtime.scheduler.SharedSlot [] - Remove logical slot (SlotRequestId{88e116a3dd0a40ef692734548aac9682}) for execution vertex (id 605b35e407e90cda15ad084365733fdd_406) from the physical slot (SlotRequestId{9bb6a1762363d3996aded34c82abab54})
2021-01-19 02:13:47,117 DEBUG org.apache.flink.runtime.scheduler.SharedSlot [] - Release shared slot externally (SlotRequestId{9bb6a1762363d3996aded34c82abab54})
2021-01-19 02:13:47,117 DEBUG org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Releasing slot [SlotRequestId{9bb6a1762363d3996aded34c82abab54}] because: Slot is being returned from SlotSharingExecutionSlotAllocator.
2021-01-19 02:13:47,117 DEBUG org.apache.flink.runtime.scheduler.SharedSlot [] - Release shared slot (SlotRequestId{9bb6a1762363d3996aded34c82abab54})
2021-01-19 02:13:47,117 DEBUG org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Fulfilling pending slot request [SlotRequestId{88a7127cc4a86be0a962b9aa68d4feff}] with slot [a30c9937af2c6de7ab471086cc9268f5]
2021-01-19 02:13:47,117 DEBUG org.apache.flink.runtime.scheduler.SharedSlot [] - Allocated logical slot (SlotRequestId{da65200b5d50dcaaaff3f4373dd824c4}) for execution vertex (id 605b35e407e90cda15ad084365733fdd_421) from the physical slot (SlotRequestId{88a7127cc4a86be0a962b9aa68d4feff})
2021-01-19 02:13:47,117 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph [] - Source: HiveSource-snmpprobe.p_snmp_ifXTable (422/458) (f12be47a0d11892e411d1afcb928b55a) switched from SCHEDULED to DEPLOYING.
2021-01-19 02:13:47,117 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph [] - Deploying Source: HiveSource-snmpprobe.p_snmp_ifXTable (422/458) (attempt #0) with attempt id f12be47a0d11892e411d1afcb928b55a to 172.0.37.8:6122-694869 @ 172-0-37-8.flink-taskmanager-query-state.gem-flink.svc.cluster.local (dataPort=32959) with allocation id a30c9937af2c6de7ab471086cc9268f5
2021-01-19 02:13:47,117 DEBUG org.apache.flink.runtime.resourcemanager.slotmanager.SlotManagerImpl [] - Cancel slot request 0a1cfa83b0d664615e0b9e1f938d7dee.
2021-01-19 02:13:47,117 DEBUG org.apache.flink.runtime.resourcemanager.slotmanager.SlotManagerImpl [] - Cancel slot request c1d63c4ffdf6e17212c4ca6be4071850.
2021-01-19 02:13:47,120 DEBUG io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager [] - WebSocket successfully opened
2021-01-19 02:13:47,123 ERROR org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - Fatal error occurred in ResourceManager.
org.apache.flink.runtime.leaderretrieval.LeaderRetrievalException: Error while watching the ConfigMap test-flink-etl-cb1c647ea7488765fd3e8cc1dc691e46-jobmanager-leader
    at org.apache.flink.kubernetes.highavailability.KubernetesLeaderRetrievalDriver$ConfigMapCallbackHandlerImpl.handleFatalError(KubernetesLeaderRetrievalDriver.java:120) [flink-dist_2.11-1.12.1.jar:1.12.1]
    at org.apache.flink.kubernetes.kubeclient.resources.AbstractKubernetesWatcher.onClose(AbstractKubernetesWatcher.java:48) [flink-dist_2.11-1.12.1.jar:1.12.1]
    at io.fabric8.kubernetes.client.utils.WatcherToggle.onClose(WatcherToggle.java:56) [flink-dist_2.11-1.12.1.jar:1.12.1]
    at io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager.closeEvent(WatchConnectionManager.java:367) [flink-dist_2.11-1.12.1.jar:1.12.1]
    at io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager.access$700(WatchConnectionManager.java:50) [flink-dist_2.11-1.12.1.jar:1.12.1]
    at io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager$1.onMessage(WatchConnectionManager.java:259) [flink-dist_2.11-1.12.1.jar:1.12.1]
    at org.apache.flink.kubernetes.shaded.okhttp3.internal.ws.RealWebSocket.onReadMessage(RealWebSocket.java:323) [flink-dist_2.11-1.12.1.jar:1.12.1]
    at org.apache.flink.kubernetes.shaded.okhttp3.internal.ws.WebSocketReader.readMessageFrame(WebSocketReader.java:219) [flink-dist_2.11-1.12.1.jar:1.12.1]
    at org.apache.flink.kubernetes.shaded.okhttp3.internal.ws.WebSocketReader.processNextFrame(WebSocketReader.java:105) [flink-dist_2.11-1.12.1.jar:1.12.1]
    at org.apache.flink.kubernetes.shaded.okhttp3.internal.ws.RealWebSocket.loopReader(RealWebSocket.java:274) [flink-dist_2.11-1.12.1.jar:1.12.1]
    at org.apache.flink.kubernetes.shaded.okhttp3.internal.ws.RealWebSocket$2.onResponse(RealWebSocket.java:214) [flink-dist_2.11-1.12.1.jar:1.12.1]
    at org.apache.flink.kubernetes.shaded.okhttp3.RealCall$AsyncCall.execute(RealCall.java:206) [flink-dist_2.11-1.12.1.jar:1.12.1]
    at org.apache.flink.kubernetes.shaded.okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32) [flink-dist_2.11-1.12.1.jar:1.12.1]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_275]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_275]
    at java.lang.Thread.run(Thread.java:748) [?:1.8.0_275]
2021-01-19 02:13:47,124 ERROR org.apache.flink.runtime.entrypoint.ClusterEntrypoint [] - Fatal error occurred in the cluster entrypoint.
org.apache.flink.runtime.leaderretrieval.LeaderRetrievalException: Error while watching the ConfigMap test-flink-etl-cb1c647ea7488765fd3e8cc1dc691e46-jobmanager-leader
    at org.apache.flink.kubernetes.highavailability.KubernetesLeaderRetrievalDriver$ConfigMapCallbackHandlerImpl.handleFatalError(KubernetesLeaderRetrievalDriver.java:120) [flink-dist_2.11-1.12.1.jar:1.12.1]
    at org.apache.flink.kubernetes.kubeclient.resources.AbstractKubernetesWatcher.onClose(AbstractKubernetesWatcher.java:48) [flink-dist_2.11-1.12.1.jar:1.12.1]
    at io.fabric8.kubernetes.client.utils.WatcherToggle.onClose(WatcherToggle.java:56) [flink-dist_2.11-1.12.1.jar:1.12.1]
    at io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager.closeEvent(WatchConnectionManager.java:367) [flink-dist_2.11-1.12.1.jar:1.12.1]
    at io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager.access$700(WatchConnectionManager.java:50) [flink-dist_2.11-1.12.1.jar:1.12.1]
    at io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager$1.onMessage(WatchConnectionManager.java:259) [flink-dist_2.11-1.12.1.jar:1.12.1]
    at org.apache.flink.kubernetes.shaded.okhttp3.internal.ws.RealWebSocket.onReadMessage(RealWebSocket.java:323) [flink-dist_2.11-1.12.1.jar:1.12.1]
    at org.apache.flink.kubernetes.shaded.okhttp3.internal.ws.WebSocketReader.readMessageFrame(WebSocketReader.java:219) [flink-dist_2.11-1.12.1.jar:1.12.1]
    at org.apache.flink.kubernetes.shaded.okhttp3.internal.ws.WebSocketReader.processNextFrame(WebSocketReader.java:105) [flink-dist_2.11-1.12.1.jar:1.12.1]
    at org.apache.flink.kubernetes.shaded.okhttp3.internal.ws.RealWebSocket.loopReader(RealWebSocket.java:274) [flink-dist_2.11-1.12.1.jar:1.12.1]
    at org.apache.flink.kubernetes.shaded.okhttp3.internal.ws.RealWebSocket$2.onResponse(RealWebSocket.java:214) [flink-dist_2.11-1.12.1.jar:1.12.1]
    at org.apache.flink.kubernetes.shaded.okhttp3.RealCall$AsyncCall.execute(RealCall.java:206) [flink-dist_2.11-1.12.1.jar:1.12.1]
    at org.apache.flink.kubernetes.shaded.okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32) [flink-dist_2.11-1.12.1.jar:1.12.1]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_275]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_275]
    at java.lang.Thread.run(Thread.java:748) [?:1.8.0_275]
2021-01-19 02:13:47,125 DEBUG org.apache.flink.runtime.source.coordinator.SourceCoordinator [] - Handling event from subtask 365 of source Source: HiveSource-snmpprobe.p_snmp_ifXTable: ReaderRegistrationEvent[subtaskId = 365, location = 172.0.37.16)
2021-01-19 02:13:47,125 DEBUG org.apache.flink.runtime.source.coordinator.SourceCoordinator [] - Handling event from subtask 365 of source Source: HiveSource-snmpprobe.p_snmp_ifXTable: RequestSplitEvent (host='172.0.37.16')
2021-01-19 02:13:47,125 INFO org.apache.flink.connector.file.src.impl.StaticFileSplitEnumerator [] - Subtask 365 (on host '172.0.37.16') is requesting a file source split
2021-01-19 02:13:47,125 INFO org.apache.flink.connector.file.src.impl.StaticFileSplitEnumerator [] - No more splits available for subtask 365
2021-01-19 02:13:47,125 DEBUG org.apache.flink.runtime.source.coordinator.SourceCoordinator [] - Handling event from subtask 379 of source Source: HiveSource-snmpprobe.p_snmp_ifXTable: ReaderRegistrationEvent[subtaskId = 379, location = 172.0.37.16)
2021-01-19 02:13:47,125 DEBUG org.apache.flink.runtime.source.coordinator.SourceCoordinator [] - Handling event from subtask 379 of source Source: HiveSource-snmpprobe.p_snmp_ifXTable: RequestSplitEvent (host='172.0.37.16')
2021-01-19 02:13:47,125 INFO org.apache.flink.connector.file.src.impl.StaticFileSplitEnumerator [] - Subtask 379 (on host '172.0.37.16') is requesting a file source split
2021-01-19 02:13:47,125 INFO org.apache.flink.connector.file.src.impl.StaticFileSplitEnumerator [] - No more splits available for subtask 379
2021-01-19 02:13:47,130 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph [] - Source: HiveSource-snmpprobe.p_snmp_ifXTable (389/458) (b0d8b877b1911ffca609f818693b68ad) switched from DEPLOYING to RUNNING.
2021-01-19 02:13:47,131 INFO org.apache.flink.runtime.blob.BlobServer [] - Stopped BLOB server at 0.0.0.0:6124
2021-01-19 02:13:47,132 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph [] - Source: HiveSource-snmpprobe.p_snmp_ifXTable (421/458) (706d012e7a572e1e5786536df9ab3bbb) switched from DEPLOYING to RUNNING.
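(To hand over the complete DEBUG-level JobManager log rather than a grep excerpt, dumping the container log is usually enough; the pod name is hypothetical:

    kubectl -n gem-flink logs <jobmanager-pod> --timestamps > jobmanager-debug.log
    # if the pod has already restarted, the previous container's log may still exist
    kubectl -n gem-flink logs <jobmanager-pod> --previous > jobmanager-debug-prev.log
)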
It does look like there are a lot of "Connecting websocket" and "Scheduling reconnect task" log lines.
I still think the network between your Pod and the APIServer is not very stable. Also, if you can, please share the complete DEBUG-level JobManager log.

Best,
Yang
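(One more knob that may be worth an experiment, purely a hedged suggestion not confirmed anywhere in this thread: the fabric8 client reads standard system properties for its connection and request timeouts, and these can be passed through Flink's JVM options to make the watcher more tolerant of a flaky APIServer link, e.g. in flink-conf.yaml:

    env.java.opts: -Dkubernetes.connection.timeout=30000 -Dkubernetes.request.timeout=60000

Ultimately, though, if the network keeps closing the watch, only stabilizing the Pod-to-APIServer path, or moving to a Flink release with more forgiving watch handling, removes the restart loop.)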
> io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager@278d8398 > job-debug-0118.log:2021-01-19 02:12:42,921 DEBUG > io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager [] - > WebSocket successfully opened > job-debug-0118.log:2021-01-19 02:12:45,130 DEBUG > io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager [] - > Connecting websocket ... > io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager@4b318628 > job-debug-0118.log:2021-01-19 02:12:45,132 DEBUG > io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager [] - > WebSocket successfully opened > job-debug-0118.log:2021-01-19 02:13:05,927 DEBUG > io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager [] - Force > closing the watch > io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager@278d8398 > job-debug-0118.log:2021-01-19 02:13:05,927 DEBUG > io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager [] - > Closing websocket > org.apache.flink.kubernetes.shaded.okhttp3.internal.ws > .RealWebSocket@69d1ebd2 > job-debug-0118.log:2021-01-19 02:13:05,930 DEBUG > io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager [] - > WebSocket close received. code: 1000, reason: > job-debug-0118.log:2021-01-19 02:13:05,930 DEBUG > io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager [] - > Ignoring onClose for already closed/closing websocket > job-debug-0118.log:2021-01-19 02:13:05,940 DEBUG > io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager [] - Force > closing the watch > io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager@210aab4b > job-debug-0118.log:2021-01-19 02:13:05,940 DEBUG > io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager [] - > Closing websocket > org.apache.flink.kubernetes.shaded.okhttp3.internal.ws > .RealWebSocket@3db9d8d8 > job-debug-0118.log:2021-01-19 02:13:05,942 DEBUG > io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager [] - > WebSocket close received. code: 1000, reason: > job-debug-0118.log:2021-01-19 02:13:05,942 DEBUG > io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager [] - > Ignoring onClose for already closed/closing websocket > job-debug-0118.log:2021-01-19 02:13:08,378 DEBUG > io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager [] - > Connecting websocket ... > io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager@4dcf905 > job-debug-0118.log:2021-01-19 02:13:08,381 DEBUG > io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager [] - > WebSocket successfully opened > job-debug-0118.log:2021-01-19 02:13:08,471 DEBUG > io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager [] - > Connecting websocket ... > io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager@428ca061 > job-debug-0118.log:2021-01-19 02:13:08,472 DEBUG > io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager [] - > WebSocket successfully opened > job-debug-0118.log:2021-01-19 02:13:10,127 DEBUG > io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager [] - > Connecting websocket ... 
> io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager@46b49e58 > job-debug-0118.log:2021-01-19 02:13:10,128 DEBUG > io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager [] - > WebSocket successfully opened > job-debug-0118.log:2021-01-19 02:13:21,625 DEBUG > io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager [] - Force > closing the watch > io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager@428ca061 > job-debug-0118.log:2021-01-19 02:13:21,625 DEBUG > io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager [] - > Closing websocket > org.apache.flink.kubernetes.shaded.okhttp3.internal.ws > .RealWebSocket@14e16427 > job-debug-0118.log:2021-01-19 02:13:21,627 DEBUG > io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager [] - > WebSocket close received. code: 1000, reason: > job-debug-0118.log:2021-01-19 02:13:21,627 DEBUG > io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager [] - > Ignoring onClose for already closed/closing websocket > job-debug-0118.log:2021-01-19 02:13:21,628 DEBUG > io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager [] - Force > closing the watch > io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager@4dcf905 > job-debug-0118.log:2021-01-19 02:13:21,628 DEBUG > io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager [] - > Closing websocket > org.apache.flink.kubernetes.shaded.okhttp3.internal.ws > .RealWebSocket@11708e54 > job-debug-0118.log:2021-01-19 02:13:21,630 DEBUG > io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager [] - > WebSocket close received. code: 1000, reason: > job-debug-0118.log:2021-01-19 02:13:21,630 DEBUG > io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager [] - > Ignoring onClose for already closed/closing websocket > job-debug-0118.log:2021-01-19 02:13:25,680 DEBUG > io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager [] - > Connecting websocket ... > io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager@3ba4abd7 > job-debug-0118.log:2021-01-19 02:13:25,681 DEBUG > io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager [] - > WebSocket successfully opened > job-debug-0118.log:2021-01-19 02:13:25,908 DEBUG > io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager [] - > Connecting websocket ... > io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager@23fe4bdd > job-debug-0118.log:2021-01-19 02:13:25,909 DEBUG > io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager [] - > WebSocket successfully opened > job-debug-0118.log:2021-01-19 02:13:30,128 DEBUG > io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager [] - > Connecting websocket ... > io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager@5cf8bd92 > job-debug-0118.log:2021-01-19 02:13:30,175 DEBUG > io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager [] - > WebSocket successfully opened > job-debug-0118.log:2021-01-19 02:13:46,104 DEBUG > io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager [] - > WebSocket close received. 
code: 1000, reason: > job-debug-0118.log:2021-01-19 02:13:46,105 DEBUG > io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager [] - > Submitting reconnect task to the executor > job-debug-0118.log:2021-01-19 02:13:46,113 DEBUG > io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager [] - > Scheduling reconnect task > job-debug-0118.log:2021-01-19 02:13:46,117 DEBUG > io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager [] - > Current reconnect backoff is 1000 milliseconds (T0) > job-debug-0118.log:2021-01-19 02:13:47,117 DEBUG > io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager [] - > Connecting websocket ... > io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager@23f03575 > job-debug-0118.log:2021-01-19 02:13:47,120 DEBUG > io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager [] - > WebSocket successfully opened > job-debug-0118.log: at > > io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager.closeEvent(WatchConnectionManager.java:367) > [flink-dist_2.11-1.12.1.jar:1.12.1] > job-debug-0118.log: at > > io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager.access$700(WatchConnectionManager.java:50) > [flink-dist_2.11-1.12.1.jar:1.12.1] > job-debug-0118.log: at > > io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager$1.onMessage(WatchConnectionManager.java:259) > [flink-dist_2.11-1.12.1.jar:1.12.1] > job-debug-0118.log: at > > io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager.closeEvent(WatchConnectionManager.java:367) > [flink-dist_2.11-1.12.1.jar:1.12.1] > job-debug-0118.log: at > > io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager.access$700(WatchConnectionManager.java:50) > [flink-dist_2.11-1.12.1.jar:1.12.1] > job-debug-0118.log: at > > io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager$1.onMessage(WatchConnectionManager.java:259) > [flink-dist_2.11-1.12.1.jar:1.12.1] > > > 最后的ERROR是这样的 > > 2021-01-19 02:13:47,094 DEBUG > org.apache.flink.runtime.source.coordinator.SourceCoordinator [] - Handling > event from subtask 406 of source Source: > HiveSource-snmpprobe.p_snmp_ifXTable: RequestSplitEvent (host='172.0.37.8') > 2021-01-19 02:13:47,094 INFO > org.apache.flink.connector.file.src.impl.StaticFileSplitEnumerator [] - > Subtask 406 (on host '172.0.37.8') is requesting a file source split > 2021-01-19 02:13:47,094 INFO > org.apache.flink.connector.file.src.impl.StaticFileSplitEnumerator [] - No > more splits available for subtask 406 > 2021-01-19 02:13:47,097 INFO > org.apache.flink.runtime.executiongraph.ExecutionGraph [] - Source: > HiveSource-snmpprobe.p_snmp_ifXTable (318/458) > (710557b37a1e03f0f462ab5303842489) switched from RUNNING to FINISHED. > 2021-01-19 02:13:47,097 DEBUG > org.apache.flink.runtime.executiongraph.ExecutionGraph [] - Ignoring > transition of vertex Source: HiveSource-snmpprobe.p_snmp_ifXTable (318/458) > - execution #0 to FAILED while being FINISHED. 
> 2021-01-19 02:13:47,097 DEBUG > org.apache.flink.runtime.scheduler.SharedSlot > [] - Remove logical slot (SlotRequestId{988c43b8a7b427ea962685f057438880}) > for execution vertex (id 605b35e407e90cda15ad084365733fdd_317) from the > physical slot (SlotRequestId{37b03b71035c9d8c564bb7c299ee9b3d}) > 2021-01-19 02:13:47,097 DEBUG > org.apache.flink.runtime.scheduler.SharedSlot > [] - Release shared slot externally > (SlotRequestId{37b03b71035c9d8c564bb7c299ee9b3d}) > 2021-01-19 02:13:47,097 DEBUG > org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Releasing > slot [SlotRequestId{37b03b71035c9d8c564bb7c299ee9b3d}] because: Slot is > being returned from SlotSharingExecutionSlotAllocator. > 2021-01-19 02:13:47,097 DEBUG > org.apache.flink.runtime.scheduler.SharedSlot > [] - Release shared slot (SlotRequestId{37b03b71035c9d8c564bb7c299ee9b3d}) > 2021-01-19 02:13:47,097 DEBUG > org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - > Fulfilling > pending slot request [SlotRequestId{bb5a48db898111288c811359cc2d7f51}] with > slot [385153f7c5efff54be584439258f7352] > 2021-01-19 02:13:47,097 DEBUG > org.apache.flink.runtime.scheduler.SharedSlot > [] - Allocated logical slot > (SlotRequestId{78f370c05403ab3d703a8d89c19d23c8}) for execution vertex (id > 605b35e407e90cda15ad084365733fdd_419) from the physical slot > (SlotRequestId{bb5a48db898111288c811359cc2d7f51}) > 2021-01-19 02:13:47,097 INFO > org.apache.flink.runtime.executiongraph.ExecutionGraph [] - Source: > HiveSource-snmpprobe.p_snmp_ifXTable (420/458) > (d04edd6e11b7cdc9e88c0ab6d756fed2) switched from SCHEDULED to DEPLOYING. > 2021-01-19 02:13:47,097 INFO > org.apache.flink.runtime.executiongraph.ExecutionGraph [] - Deploying > Source: HiveSource-snmpprobe.p_snmp_ifXTable (420/458) (attempt #0) with > attempt id d04edd6e11b7cdc9e88c0ab6d756fed2 to 172.0.42.250:6122-5d505f @ > 172-0-42-250.flink-taskmanager-query-state.gem-flink.svc.cluster.local > (dataPort=40697) with allocation id 385153f7c5efff54be584439258f7352 > 2021-01-19 02:13:47,097 DEBUG > org.apache.flink.runtime.resourcemanager.slotmanager.SlotManagerImpl [] - > Cancel slot request 4da96bc97ef9b47ba7e408c78835d75a. > 2021-01-19 02:13:47,100 INFO > org.apache.flink.runtime.executiongraph.ExecutionGraph [] - Source: > HiveSource-snmpprobe.p_snmp_ifXTable (413/458) > (2124587d1641d6cb05c05dfc742e8423) switched from DEPLOYING to RUNNING. > 2021-01-19 02:13:47,100 INFO > org.apache.flink.runtime.executiongraph.ExecutionGraph [] - Source: > HiveSource-snmpprobe.p_snmp_ifXTable (414/458) > (4aa88cc1b61dc5b056ad59d373392c2f) switched from DEPLOYING to RUNNING. > 2021-01-19 02:13:47,100 INFO > org.apache.flink.runtime.executiongraph.ExecutionGraph [] - Source: > HiveSource-snmpprobe.p_snmp_ifXTable (415/458) > (225d9a604e6b852f2ea6e87ebcf3107c) switched from DEPLOYING to RUNNING. > 2021-01-19 02:13:47,112 INFO > org.apache.flink.runtime.executiongraph.ExecutionGraph [] - Source: > HiveSource-snmpprobe.p_snmp_ifXTable (417/458) > (c1a78898e76b3f1761cd5be1913dd24c) switched from DEPLOYING to RUNNING. > 2021-01-19 02:13:47,112 INFO > org.apache.flink.runtime.executiongraph.ExecutionGraph [] - Source: > HiveSource-snmpprobe.p_snmp_ifXTable (418/458) > (6deb899dbd8bf373b349b980d1e78506) switched from DEPLOYING to RUNNING. > 2021-01-19 02:13:47,113 INFO > org.apache.flink.runtime.executiongraph.ExecutionGraph [] - Source: > HiveSource-snmpprobe.p_snmp_ifXTable (416/458) > (5a99fa4a1d8bdbf93345a9b20ae1fa91) switched from DEPLOYING to RUNNING. 
> 2021-01-19 02:13:47,117 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph [] - Source: HiveSource-snmpprobe.p_snmp_ifXTable (406/458) (d1f9edd1bfdd80eef6b32b8850020130) switched from RUNNING to FINISHED.
> 2021-01-19 02:13:47,117 DEBUG org.apache.flink.runtime.executiongraph.ExecutionGraph [] - Ignoring transition of vertex Source: HiveSource-snmpprobe.p_snmp_ifXTable (406/458) - execution #0 to FAILED while being FINISHED.
> 2021-01-19 02:13:47,117 DEBUG org.apache.flink.runtime.scheduler.SharedSlot [] - Remove logical slot (SlotRequestId{037efe676c5cec5fe6b549d3ebd5f72b}) for execution vertex (id 605b35e407e90cda15ad084365733fdd_405) from the physical slot (SlotRequestId{a840e61c33cb3f250cfb54652c87aa64})
> 2021-01-19 02:13:47,117 DEBUG org.apache.flink.runtime.scheduler.SharedSlot [] - Release shared slot externally (SlotRequestId{a840e61c33cb3f250cfb54652c87aa64})
> 2021-01-19 02:13:47,117 DEBUG org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Releasing slot [SlotRequestId{a840e61c33cb3f250cfb54652c87aa64}] because: Slot is being returned from SlotSharingExecutionSlotAllocator.
> 2021-01-19 02:13:47,117 DEBUG org.apache.flink.runtime.scheduler.SharedSlot [] - Release shared slot (SlotRequestId{a840e61c33cb3f250cfb54652c87aa64})
> 2021-01-19 02:13:47,117 DEBUG io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager [] - Connecting websocket ... io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager@23f03575
> 2021-01-19 02:13:47,117 DEBUG org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Fulfilling pending slot request [SlotRequestId{538f35a507cd0949bf547588eb436b49}] with slot [ebf6ebbb9abe3a9e6ccb56e235b00b53]
> 2021-01-19 02:13:47,117 DEBUG org.apache.flink.runtime.scheduler.SharedSlot [] - Allocated logical slot (SlotRequestId{a9bac8016853f9e86963b8ee11dea18f}) for execution vertex (id 605b35e407e90cda15ad084365733fdd_420) from the physical slot (SlotRequestId{538f35a507cd0949bf547588eb436b49})
> 2021-01-19 02:13:47,117 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph [] - Source: HiveSource-snmpprobe.p_snmp_ifXTable (421/458) (706d012e7a572e1e5786536df9ab3bbb) switched from SCHEDULED to DEPLOYING.
> 2021-01-19 02:13:47,117 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph [] - Deploying Source: HiveSource-snmpprobe.p_snmp_ifXTable (421/458) (attempt #0) with attempt id 706d012e7a572e1e5786536df9ab3bbb to 172.0.37.8:6122-694869 @ 172-0-37-8.flink-taskmanager-query-state.gem-flink.svc.cluster.local (dataPort=32959) with allocation id ebf6ebbb9abe3a9e6ccb56e235b00b53
> 2021-01-19 02:13:47,117 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph [] - Source: HiveSource-snmpprobe.p_snmp_ifXTable (407/458) (396cb3fdd115a31d8575407fa9ee6e07) switched from RUNNING to FINISHED.
> 2021-01-19 02:13:47,117 DEBUG org.apache.flink.runtime.executiongraph.ExecutionGraph [] - Ignoring transition of vertex Source: HiveSource-snmpprobe.p_snmp_ifXTable (407/458) - execution #0 to FAILED while being FINISHED.
> 2021-01-19 02:13:47,117 DEBUG org.apache.flink.runtime.scheduler.SharedSlot [] - Remove logical slot (SlotRequestId{88e116a3dd0a40ef692734548aac9682}) for execution vertex (id 605b35e407e90cda15ad084365733fdd_406) from the physical slot (SlotRequestId{9bb6a1762363d3996aded34c82abab54})
> 2021-01-19 02:13:47,117 DEBUG org.apache.flink.runtime.scheduler.SharedSlot [] - Release shared slot externally (SlotRequestId{9bb6a1762363d3996aded34c82abab54})
> 2021-01-19 02:13:47,117 DEBUG org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Releasing slot [SlotRequestId{9bb6a1762363d3996aded34c82abab54}] because: Slot is being returned from SlotSharingExecutionSlotAllocator.
> 2021-01-19 02:13:47,117 DEBUG org.apache.flink.runtime.scheduler.SharedSlot [] - Release shared slot (SlotRequestId{9bb6a1762363d3996aded34c82abab54})
> 2021-01-19 02:13:47,117 DEBUG org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Fulfilling pending slot request [SlotRequestId{88a7127cc4a86be0a962b9aa68d4feff}] with slot [a30c9937af2c6de7ab471086cc9268f5]
> 2021-01-19 02:13:47,117 DEBUG org.apache.flink.runtime.scheduler.SharedSlot [] - Allocated logical slot (SlotRequestId{da65200b5d50dcaaaff3f4373dd824c4}) for execution vertex (id 605b35e407e90cda15ad084365733fdd_421) from the physical slot (SlotRequestId{88a7127cc4a86be0a962b9aa68d4feff})
> 2021-01-19 02:13:47,117 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph [] - Source: HiveSource-snmpprobe.p_snmp_ifXTable (422/458) (f12be47a0d11892e411d1afcb928b55a) switched from SCHEDULED to DEPLOYING.
> 2021-01-19 02:13:47,117 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph [] - Deploying Source: HiveSource-snmpprobe.p_snmp_ifXTable (422/458) (attempt #0) with attempt id f12be47a0d11892e411d1afcb928b55a to 172.0.37.8:6122-694869 @ 172-0-37-8.flink-taskmanager-query-state.gem-flink.svc.cluster.local (dataPort=32959) with allocation id a30c9937af2c6de7ab471086cc9268f5
> 2021-01-19 02:13:47,117 DEBUG org.apache.flink.runtime.resourcemanager.slotmanager.SlotManagerImpl [] - Cancel slot request 0a1cfa83b0d664615e0b9e1f938d7dee.
> 2021-01-19 02:13:47,117 DEBUG org.apache.flink.runtime.resourcemanager.slotmanager.SlotManagerImpl [] - Cancel slot request c1d63c4ffdf6e17212c4ca6be4071850.
> 2021-01-19 02:13:47,120 DEBUG io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager [] - WebSocket successfully opened
> 2021-01-19 02:13:47,123 ERROR org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - Fatal error occurred in ResourceManager.
> org.apache.flink.runtime.leaderretrieval.LeaderRetrievalException: Error while watching the ConfigMap test-flink-etl-cb1c647ea7488765fd3e8cc1dc691e46-jobmanager-leader
>     at org.apache.flink.kubernetes.highavailability.KubernetesLeaderRetrievalDriver$ConfigMapCallbackHandlerImpl.handleFatalError(KubernetesLeaderRetrievalDriver.java:120) [flink-dist_2.11-1.12.1.jar:1.12.1]
>     at org.apache.flink.kubernetes.kubeclient.resources.AbstractKubernetesWatcher.onClose(AbstractKubernetesWatcher.java:48) [flink-dist_2.11-1.12.1.jar:1.12.1]
>     at io.fabric8.kubernetes.client.utils.WatcherToggle.onClose(WatcherToggle.java:56) [flink-dist_2.11-1.12.1.jar:1.12.1]
>     at io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager.closeEvent(WatchConnectionManager.java:367) [flink-dist_2.11-1.12.1.jar:1.12.1]
>     at io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager.access$700(WatchConnectionManager.java:50) [flink-dist_2.11-1.12.1.jar:1.12.1]
>     at io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager$1.onMessage(WatchConnectionManager.java:259) [flink-dist_2.11-1.12.1.jar:1.12.1]
>     at org.apache.flink.kubernetes.shaded.okhttp3.internal.ws.RealWebSocket.onReadMessage(RealWebSocket.java:323) [flink-dist_2.11-1.12.1.jar:1.12.1]
>     at org.apache.flink.kubernetes.shaded.okhttp3.internal.ws.WebSocketReader.readMessageFrame(WebSocketReader.java:219) [flink-dist_2.11-1.12.1.jar:1.12.1]
>     at org.apache.flink.kubernetes.shaded.okhttp3.internal.ws.WebSocketReader.processNextFrame(WebSocketReader.java:105) [flink-dist_2.11-1.12.1.jar:1.12.1]
>     at org.apache.flink.kubernetes.shaded.okhttp3.internal.ws.RealWebSocket.loopReader(RealWebSocket.java:274) [flink-dist_2.11-1.12.1.jar:1.12.1]
>     at org.apache.flink.kubernetes.shaded.okhttp3.internal.ws.RealWebSocket$2.onResponse(RealWebSocket.java:214) [flink-dist_2.11-1.12.1.jar:1.12.1]
>     at org.apache.flink.kubernetes.shaded.okhttp3.RealCall$AsyncCall.execute(RealCall.java:206) [flink-dist_2.11-1.12.1.jar:1.12.1]
>     at org.apache.flink.kubernetes.shaded.okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32) [flink-dist_2.11-1.12.1.jar:1.12.1]
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_275]
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_275]
>     at java.lang.Thread.run(Thread.java:748) [?:1.8.0_275]
> 2021-01-19 02:13:47,124 ERROR org.apache.flink.runtime.entrypoint.ClusterEntrypoint [] - Fatal error occurred in the cluster entrypoint.
> org.apache.flink.runtime.leaderretrieval.LeaderRetrievalException: Error while watching the ConfigMap test-flink-etl-cb1c647ea7488765fd3e8cc1dc691e46-jobmanager-leader
>     (stack trace identical to the ResourceManager error above)
> 2021-01-19 02:13:47,125 DEBUG org.apache.flink.runtime.source.coordinator.SourceCoordinator [] - Handling event from subtask 365 of source Source: HiveSource-snmpprobe.p_snmp_ifXTable: ReaderRegistrationEvent[subtaskId = 365, location = 172.0.37.16)
> 2021-01-19 02:13:47,125 DEBUG org.apache.flink.runtime.source.coordinator.SourceCoordinator [] - Handling event from subtask 365 of source Source: HiveSource-snmpprobe.p_snmp_ifXTable: RequestSplitEvent (host='172.0.37.16')
> 2021-01-19 02:13:47,125 INFO org.apache.flink.connector.file.src.impl.StaticFileSplitEnumerator [] - Subtask 365 (on host '172.0.37.16') is requesting a file source split
> 2021-01-19 02:13:47,125 INFO org.apache.flink.connector.file.src.impl.StaticFileSplitEnumerator [] - No more splits available for subtask 365
> 2021-01-19 02:13:47,125 DEBUG org.apache.flink.runtime.source.coordinator.SourceCoordinator [] - Handling event from subtask 379 of source Source: HiveSource-snmpprobe.p_snmp_ifXTable: ReaderRegistrationEvent[subtaskId = 379, location = 172.0.37.16)
> 2021-01-19 02:13:47,125 DEBUG org.apache.flink.runtime.source.coordinator.SourceCoordinator [] - Handling event from subtask 379 of source Source: HiveSource-snmpprobe.p_snmp_ifXTable: RequestSplitEvent (host='172.0.37.16')
> 2021-01-19 02:13:47,125 INFO org.apache.flink.connector.file.src.impl.StaticFileSplitEnumerator [] - Subtask 379 (on host '172.0.37.16') is requesting a file source split
> 2021-01-19 02:13:47,125 INFO org.apache.flink.connector.file.src.impl.StaticFileSplitEnumerator [] - No more splits available for subtask 379
> 2021-01-19 02:13:47,130 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph [] - Source: HiveSource-snmpprobe.p_snmp_ifXTable (389/458) (b0d8b877b1911ffca609f818693b68ad) switched from DEPLOYING to RUNNING.
> 2021-01-19 02:13:47,131 INFO org.apache.flink.runtime.blob.BlobServer [] - Stopped BLOB server at 0.0.0.0:6124
> 2021-01-19 02:13:47,132 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph [] - Source: HiveSource-snmpprobe.p_snmp_ifXTable (421/458) (706d012e7a572e1e5786536df9ab3bbb) switched from DEPLOYING to RUNNING.
>
> --
> Sent from: http://apache-flink.147419.n8.nabble.com/
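What the DEBUG log above shows: the API server closes the watch with WebSocket status 1000 (a normal close, empty reason), fabric8's WatchConnectionManager schedules a reconnect with a 1000 ms backoff, and the new socket opens successfully. The fatal error comes from the close event itself: it is delivered to AbstractKubernetesWatcher.onClose with a non-null KubernetesClientException, and the KubernetesLeaderRetrievalDriver in 1.12.1 escalates that to a fatal LeaderRetrievalException, taking down the whole ClusterEntrypoint even though the watch had already recovered. Below is a minimal sketch of a more tolerant watcher callback, written against the fabric8 4.x Watcher interface that flink-dist 1.12.1 shades; it only illustrates the idea and is not Flink's actual implementation, and restartWatch()/handleFatalError() are hypothetical helpers:

import io.fabric8.kubernetes.api.model.ConfigMap;
import io.fabric8.kubernetes.client.KubernetesClientException;
import io.fabric8.kubernetes.client.Watcher;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

/**
 * Sketch only: a ConfigMap watcher whose onClose() does not treat every
 * close as fatal. This is NOT Flink's AbstractKubernetesWatcher.
 */
public class TolerantConfigMapWatcher implements Watcher<ConfigMap> {

    private static final Logger LOG =
            LoggerFactory.getLogger(TolerantConfigMapWatcher.class);

    @Override
    public void eventReceived(Action action, ConfigMap configMap) {
        // React to ADDED / MODIFIED / DELETED events, e.g. publish the
        // leader address parsed from the ConfigMap data.
        LOG.debug("Watch event {} on {}", action, configMap.getMetadata().getName());
    }

    @Override
    public void onClose(KubernetesClientException cause) {
        if (cause == null) {
            // Graceful close (e.g. WebSocket status 1000): nothing is
            // broken, so re-establish the watch instead of failing.
            LOG.info("Watch closed gracefully, re-watching.");
            restartWatch(); // hypothetical helper: re-creates the watch
            return;
        }
        if (cause.getCode() == 410) {
            // HTTP 410 GONE: the resourceVersion we watched from is too
            // old; re-list and re-watch from the latest version.
            LOG.warn("Resource version too old, re-watching from latest.", cause);
            restartWatch();
            return;
        }
        // Anything else is a genuine connectivity problem: escalate.
        handleFatalError(cause); // hypothetical helper: fail the process
    }

    private void restartWatch() { /* hypothetical */ }

    private void handleFatalError(Throwable cause) { /* hypothetical */ }
}

If that reading is right, two things are worth chasing: why the API server keeps closing the watch in the first place (an idle timeout on a proxy or load balancer between the JobManager and the API server is a common culprit), and whether a newer Flink patch release already stops treating a clean close as fatal. In the meantime the full log would help confirm this.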
Send it as an attachment, or upload it to third-party storage and share the link here.
macdoor <[hidden email]> wrote on Tue, Jan 19, 2021 at 12:44 PM:
> Sure. How should I send it to you?
>
> --
> Sent from: http://apache-flink.147419.n8.nabble.com/
https://pan.baidu.com/s/1GHdfeF2y8RUW_Htgdn4KbQ (extraction code: piaf)
-- Sent from: http://apache-flink.147419.n8.nabble.com/