I ran into an error while taking a snapshot with flink savepoint, and I do not yet know what is causing it. Could anyone passing by take a look?
Flink version: 1.10.1

Command executed:

    flink savepoint a3a2e6c3a5a00bbe4c0c9e351dc58c47 hdfs://hadoopnamenodeHA/flink/flink-savepoints

Error output:

    org.apache.flink.util.FlinkException: Triggering a savepoint for the job a3a2e6c3a5a00bbe4c0c9e351dc58c47 failed.
        at org.apache.flink.client.cli.CliFrontend.triggerSavepoint(CliFrontend.java:631)
        at org.apache.flink.client.cli.CliFrontend.lambda$savepoint$9(CliFrontend.java:609)
        at org.apache.flink.client.cli.CliFrontend.runClusterAction(CliFrontend.java:841)
        at org.apache.flink.client.cli.CliFrontend.savepoint(CliFrontend.java:606)
        at org.apache.flink.client.cli.CliFrontend.parseParameters(CliFrontend.java:908)
        at org.apache.flink.client.cli.CliFrontend.lambda$main$10(CliFrontend.java:966)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1836)
        at org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)
        at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:966)
    Caused by: java.util.concurrent.TimeoutException
        at java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1771)
        at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1915)
        at org.apache.flink.client.cli.CliFrontend.triggerSavepoint(CliFrontend.java:625)
Hi,
This error only means that the savepoint did not complete within the allotted time, so the client's call to the Master timed out. To find the actual cause, you need to look at the JobMaster log.
PS: When a job keeps restarting or is under backpressure, savepoints generally fail.

Best,
Hailong Wang

(In reply to 张锴 <[hidden email]>, 2020-11-06 09:33:48)
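One way to act on this advice is to save the JobManager log to a file and search it for savepoint-related errors. The sketch below is only an illustration: the function name scan_jm_log and the search patterns are assumptions and do not come from this thread. How you obtain the log depends on the deployment; for a running job it is usually easiest to open the JobManager log in the Flink web UI, and on YARN with log aggregation enabled it can typically be fetched with yarn logs -applicationId <appId>.

```shell
# Print numbered lines from a JobManager log that mention savepoints,
# exceptions, or denied permissions (case-insensitive).
# $1: path to a log file you have already saved locally.
# The function name and the patterns are illustrative assumptions.
scan_jm_log() {
  if [ $# -ne 1 ]; then
    echo "usage: scan_jm_log <log-file>" >&2
    return 2
  fi
  grep -n -i -E 'savepoint|exception|permission denied' "$1"
}

# Example call (the path is a placeholder, not taken from the thread):
#   scan_jm_log /path/to/jobmanager.log
```

The client-side TimeoutException carries no cause of its own, so the matching lines in the JobManager log are usually where the real failure shows up.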
Restarts and backpressure both look normal.
I also increased the client-to-master timeout, but the problem is still there.

(In reply to hailongwang <[hidden email]>, Fri, 6 Nov 2020, 10:54)
Hi
Can you see any other exceptions in the client-side log or in the JM log?

Best,
Congxian

(In reply to 张锴 <[hidden email]>, Fri, 6 Nov 2020, 11:42 AM)
Hi,
Is your job running on YARN? If so, you need to specify -yid.

(In reply to Congxian Qiu <[hidden email]>, 6 Nov 2020, 1:31 PM)
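For reference, the -yid option takes the YARN application ID of the cluster the job runs in, in the documented form flink savepoint <jobId> [targetDirectory] -yid <yarnAppId>. A minimal wrapper might look like the sketch below; the function name trigger_savepoint_on_yarn and the FLINK_BIN override are hypothetical conveniences, not part of Flink.

```shell
# Trigger a savepoint for a job running on YARN.
# Arguments: <jobId> <targetDirectory> <yarnAppId>
# FLINK_BIN is a hypothetical override for the path to the flink launcher;
# if it is unset, "flink" is looked up on PATH.
trigger_savepoint_on_yarn() {
  if [ $# -ne 3 ]; then
    echo "usage: trigger_savepoint_on_yarn <jobId> <targetDir> <yarnAppId>" >&2
    return 2
  fi
  job_id=$1
  target_dir=$2
  yarn_app_id=$3
  "${FLINK_BIN:-flink}" savepoint "$job_id" "$target_dir" -yid "$yarn_app_id"
}

# Example call with the job ID and target directory from this thread.
# The application ID must be filled in by the user:
#   trigger_savepoint_on_yarn a3a2e6c3a5a00bbe4c0c9e351dc58c47 \
#     hdfs://hadoopnamenodeHA/flink/flink-savepoints <yarnAppId>
```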
It is already specified.

(In reply to admin <[hidden email]>, Fri, 6 Nov 2020, 3:17 PM)
In reply to this post by Congxian Qiu
Found it. The JM log showed that write permission was missing; after fixing that, it worked.
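The thread does not say which path or user was affected. Assuming the missing permission was on the HDFS savepoint directory, a quick write probe like the sketch below can confirm it before a savepoint is triggered. The function name probe_savepoint_dir, the probe file name, and the HDFS_BIN override are hypothetical; the hdfs dfs subcommands used (-ls -d, -touchz, -rm) are standard.

```shell
# Check that a savepoint target directory exists and is writable
# by the current user.
# $1: target directory, e.g. hdfs://hadoopnamenodeHA/flink/flink-savepoints
# HDFS_BIN is a hypothetical override for the path to the hdfs launcher.
probe_savepoint_dir() {
  if [ $# -ne 1 ]; then
    echo "usage: probe_savepoint_dir <dir>" >&2
    return 2
  fi
  dir=$1
  probe="$dir/.write-probe-$$"   # temporary empty file, removed afterwards

  # Show owner, group, and mode of the directory itself.
  "${HDFS_BIN:-hdfs}" dfs -ls -d "$dir" || return 1

  # Try to create and then delete an empty file in the directory.
  if "${HDFS_BIN:-hdfs}" dfs -touchz "$probe"; then
    "${HDFS_BIN:-hdfs}" dfs -rm "$probe"
    echo "writable: $dir"
  else
    echo "NOT writable: $dir" >&2
    return 1
  fi
}
```

The probe is only meaningful when run as the same user the Flink job runs as, since the savepoint files are written by the cluster processes rather than by the client.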