2020-06-19 15:11:18,361 INFO org.apache.flink.client.cli.CliFrontend - Triggering savepoint for job e229c76e6a1b43142cb4272523102ed1.
2020-06-19 15:11:18,378 INFO org.apache.flink.client.cli.CliFrontend - Waiting for response...
2020-06-19 15:11:48,381 INFO org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService - Stopping ZooKeeperLeaderRetrievalService /leader/rest_server_lock.
2020-06-19 15:11:48,382 INFO org.apache.flink.shaded.curator.org.apache.curator.framework.imps.CuratorFrameworkImpl - backgroundOperationsLoop exiting
2020-06-19 15:11:48,385 INFO org.apache.flink.shaded.zookeeper.org.apache.zookeeper.ZooKeeper - Session: 0x172b776fac82479 closed
2020-06-19 15:11:48,385 INFO org.apache.flink.shaded.zookeeper.org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0x172b776fac82479
2020-06-19 15:11:48,385 ERROR org.apache.flink.client.cli.CliFrontend - Error while running the command.
org.apache.flink.util.FlinkException: Triggering a savepoint for the job e229c76e6a1b43142cb4272523102ed1 failed.
	at org.apache.flink.client.cli.CliFrontend.triggerSavepoint(CliFrontend.java:633)
	at org.apache.flink.client.cli.CliFrontend.lambda$savepoint$9(CliFrontend.java:611)
	at org.apache.flink.client.cli.CliFrontend.runClusterAction(CliFrontend.java:843)
	at org.apache.flink.client.cli.CliFrontend.savepoint(CliFrontend.java:608)
	at org.apache.flink.client.cli.CliFrontend.parseParameters(CliFrontend.java:910)
	at org.apache.flink.client.cli.CliFrontend.lambda$main$10(CliFrontend.java:968)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1875)
	at org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)
	at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:968)
Caused by: java.util.concurrent.TimeoutException
	at org.apache.flink.runtime.concurrent.FutureUtils$Timeout.run(FutureUtils.java:999)
	at org.apache.flink.runtime.concurrent.DirectExecutorService.execute(DirectExecutorService.java:211)
	at org.apache.flink.runtime.concurrent.FutureUtils.lambda$orTimeout$14(FutureUtils.java:427)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
Hi,

By "manual checkpoint" I assume you mean a savepoint. The stack trace shows a timeout, which may be because the savepoint is slow. You can check the JM (JobManager) log to see whether the savepoint took a long time to complete.

Also, could you describe your main use case for savepoints?
1. Why do you need savepoints?
2. In your scenario, could checkpoints replace savepoints?

Best,
Congxian

On Friday, June 19, 2020 at 3:25 PM, Zhou Zach <[hidden email]> wrote:
> 2020-06-19 15:11:18,361 INFO org.apache.flink.client.cli.CliFrontend - Triggering savepoint for job e229c76e6a1b43142cb4272523102ed1.
> 2020-06-19 15:11:18,378 INFO org.apache.flink.client.cli.CliFrontend - Waiting for response...
> 2020-06-19 15:11:48,381 INFO org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService - Stopping ZooKeeperLeaderRetrievalService /leader/rest_server_lock.
> 2020-06-19 15:11:48,382 INFO org.apache.flink.shaded.curator.org.apache.curator.framework.imps.CuratorFrameworkImpl - backgroundOperationsLoop exiting
> 2020-06-19 15:11:48,385 INFO org.apache.flink.shaded.zookeeper.org.apache.zookeeper.ZooKeeper - Session: 0x172b776fac82479 closed
> 2020-06-19 15:11:48,385 INFO org.apache.flink.shaded.zookeeper.org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0x172b776fac82479
> 2020-06-19 15:11:48,385 ERROR org.apache.flink.client.cli.CliFrontend - Error while running the command.
> org.apache.flink.util.FlinkException: Triggering a savepoint for the job e229c76e6a1b43142cb4272523102ed1 failed.
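One way to check the timeout reading of the stack trace is to measure the gap between the "Triggering savepoint" entry and the ERROR entry in the client log. The following is only a minimal sketch in Python; the helper name `elapsed_seconds` is made up for illustration, and the two timestamps are copied from the log in the original message.

```python
from datetime import datetime

# Timestamp layout used by the log lines in this thread, e.g. "2020-06-19 15:11:18,361".
# %f accepts the three millisecond digits and pads them to microseconds.
LOG_TIME_FORMAT = "%Y-%m-%d %H:%M:%S,%f"


def elapsed_seconds(start: str, end: str) -> float:
    """Return the number of seconds between two log timestamps."""
    t0 = datetime.strptime(start, LOG_TIME_FORMAT)
    t1 = datetime.strptime(end, LOG_TIME_FORMAT)
    return (t1 - t0).total_seconds()


# Timestamps copied from the client log quoted in this thread.
triggered = "2020-06-19 15:11:18,361"  # "Triggering savepoint for job ..."
failed = "2020-06-19 15:11:48,385"     # "Error while running the command."

print(f"{elapsed_seconds(triggered, failed):.3f} s")
```

The gap comes out at roughly 30 seconds, which looks more like the client giving up after a fixed wait than the savepoint itself failing. Whether the savepoint eventually completed on the cluster is something only the JM log can confirm.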