A plain source -> map -> filter -> sink test application.
Script used to trigger the savepoint:
${FLINK_HOME} stop -p ${TARGET_DIR} -d ${JOB_ID}

Full error message:
org.apache.flink.util.FlinkException: Could not stop with a savepoint job "81990282a4686ebda3d04041e3620776".
    at org.apache.flink.client.cli.CliFrontend.lambda$stop$5(CliFrontend.java:462)
    at org.apache.flink.client.cli.CliFrontend.runClusterAction(CliFrontend.java:843)
    at org.apache.flink.client.cli.CliFrontend.stop(CliFrontend.java:454)
    at org.apache.flink.client.cli.CliFrontend.parseParameters(CliFrontend.java:907)
    at org.apache.flink.client.cli.CliFrontend.lambda$main$10(CliFrontend.java:968)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1962)
    at org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)
    at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:968)
Caused by: java.util.concurrent.TimeoutException
    at java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1771)
    at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1915)
    at org.apache.flink.client.cli.CliFrontend.lambda$stop$5(CliFrontend.java:460)
    ... 9 more

Looking at the error, I suspected a permissions problem. I started the application as root, and the HDFS path holding the savepoint directory is also owned by root. Triggering a savepoint without stopping the application works fine. Digging further, it seems root runs into a permissions problem when stopping the Hadoop application, but I don't know how to fix it and am currently stuck.

-- 
Sent from: http://apache-flink.147419.n8.nabble.com/
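A note on reading the trace: the `Caused by: java.util.concurrent.TimeoutException` frames show the CLI waiting for the cluster's reply with a bounded `CompletableFuture.get(timeout, unit)` and giving up when nothing arrives in time. On its own, a timeout only means no reply came back; it is not direct evidence of an HDFS permission error. Below is a minimal Java sketch of that client-side pattern. `SavepointClient` and `StopWithSavepointSketch` are hypothetical names, not Flink's API, and the real client wraps the error in `FlinkException` rather than a plain `Exception`.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical stand-in for the cluster client; this is NOT Flink's ClusterClient API.
interface SavepointClient {
    // Completes with the savepoint path once the cluster answers the stop request.
    CompletableFuture<String> stopWithSavepoint(String jobId, boolean drain, String targetDir);
}

class StopWithSavepointSketch {
    // Same shape as the frames in the trace: a bounded get() on the reply future,
    // with a timeout (or a remote failure) wrapped into a "Could not stop" error.
    static String stop(SavepointClient client, String jobId, String targetDir, long timeoutMs)
            throws Exception {
        try {
            return client.stopWithSavepoint(jobId, true, targetDir) // true mirrors the -d flag
                    .get(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (TimeoutException | ExecutionException e) {
            throw new Exception("Could not stop with a savepoint job \"" + jobId + "\".", e);
        }
    }

    public static void main(String[] args) {
        // A client whose reply never arrives, e.g. because the server went away first.
        SavepointClient silent = (id, drain, dir) -> new CompletableFuture<>();
        try {
            stop(silent, "81990282a4686ebda3d04041e3620776", "/tmp/savepoints", 200);
        } catch (Exception e) {
            System.out.println(e.getMessage() + " Caused by: " + e.getCause().getClass().getName());
        }
    }
}
```

The point of the sketch is that the client cannot tell "the savepoint failed" apart from "the answer never reached me"; both end in the same wrapped exception.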
Hi
Check the JM log to see what caused this savepoint to fail. If the savepoint timed out, look at which task is slow to complete (a savepoint can take longer than a checkpoint).

Best,
Congxian

Robin Zhang <[hidden email]> wrote on Mon, Oct 19, 2020 at 3:42 PM:
> [...]
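Congxian's suggestion boils down to comparing each subtask's acknowledgement time with the timeout and looking at the worst offender. A small Java sketch of that comparison, assuming you have copied per-subtask durations (for example from the per-subtask checkpoint statistics in the Web UI) into a map. The subtask names, the numbers and the 60 s budget are hypothetical, not taken from this thread.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class SlowSubtaskSketch {
    // Returns the subtasks whose savepoint acknowledgement exceeded budgetMs,
    // slowest first, so the likely bottleneck is at the head of the list.
    static List<String> overBudget(Map<String, Long> ackMs, long budgetMs) {
        List<Map.Entry<String, Long>> slow = new ArrayList<>();
        for (Map.Entry<String, Long> e : ackMs.entrySet()) {
            if (e.getValue() > budgetMs) {
                slow.add(e);
            }
        }
        slow.sort((a, b) -> Long.compare(b.getValue(), a.getValue()));
        List<String> names = new ArrayList<>();
        for (Map.Entry<String, Long> e : slow) {
            names.add(e.getKey());
        }
        return names;
    }

    public static void main(String[] args) {
        // Hypothetical durations in milliseconds for the source -> map -> filter -> sink job.
        Map<String, Long> ackMs = new LinkedHashMap<>();
        ackMs.put("Source (1/2)", 150L);
        ackMs.put("Map (1/2)", 4_000L);
        ackMs.put("Filter (1/2)", 90L);
        ackMs.put("Sink (1/2)", 70_000L);
        System.out.println(overBudget(ackMs, 60_000L));
    }
}
```

An empty result would suggest the savepoint itself was not the slow part, which points the investigation back at the client/cluster interaction instead.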
Hi Robin Zhang
You have probably run into the problem reported in this issue: https://issues.apache.org/jira/browse/FLINK-16626 ; have a look at the issue description. All the best!

Robin Zhang <[hidden email]> wrote on Mon, Oct 19, 2020 at 3:42 PM:
> [...]
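As far as I understand FLINK-16626 (please check the issue itself for the authoritative root cause), on a per-job cluster the stop request can succeed on the cluster side, but the cluster shuts down as soon as the job finishes, before the REST reply reaches the CLI, so the client only sees a timeout. That would also fit the observation that a savepoint without stopping the job works fine. The Java sketch below simulates only that race with a thread and a `CompletableFuture`; `ShutdownRaceSketch` and its return strings are hypothetical and do not model Flink's REST stack.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicBoolean;

class ShutdownRaceSketch {
    // One simulated stop-with-savepoint round trip.
    // replyBeforeShutdown=false models the cluster exiting before it answers the client.
    static String runOnce(boolean replyBeforeShutdown, long clientTimeoutMs) throws Exception {
        CompletableFuture<String> reply = new CompletableFuture<>();
        AtomicBoolean savepointWritten = new AtomicBoolean(false);

        Thread cluster = new Thread(() -> {
            savepointWritten.set(true); // the savepoint itself succeeds either way
            if (replyBeforeShutdown) {
                reply.complete("hypothetical-savepoint-path");
            }
            // otherwise the "cluster" just exits and the reply is never sent
        });
        cluster.start();
        cluster.join();

        try {
            return "client got " + reply.get(clientTimeoutMs, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            return "client timed out, savepointWritten=" + savepointWritten.get();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(runOnce(true, 200));
        System.out.println(runOnce(false, 200));
    }
}
```

The second run is the interesting one: the work is done, yet the client reports a timeout, which matches a stack trace ending in `TimeoutException` with no failure logged on the JM side.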
Hi Congxian,
Thanks for the pointer. I checked, but the JM does not log anything about the failure; I can only see log entries for checkpoints completing normally.

Best,
Robin

Congxian Qiu wrote:
> [...]
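Since the JM log shows nothing about a failed savepoint, one way to check whether the stop actually produced a savepoint despite the client timeout is to look in the target directory. A minimal Java sketch, assuming Flink's usual layout of `savepoint-<first 6 characters of the job ID>-<random suffix>` directories containing a `_metadata` file, and assuming the target directory is reachable as a local or mounted path (for an `hdfs://` path you would list it with `hdfs dfs -ls` or Hadoop's `FileSystem` API instead). `SavepointDirCheck` is a hypothetical name.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

class SavepointDirCheck {
    // Lists savepoint directories for jobId under targetDir that contain a _metadata file.
    // Assumes the "savepoint-<first 6 chars of job id>-<suffix>" naming convention.
    static List<Path> completedSavepoints(Path targetDir, String jobId) throws IOException {
        String glob = "savepoint-" + jobId.substring(0, 6) + "-*";
        List<Path> found = new ArrayList<>();
        try (DirectoryStream<Path> dirs = Files.newDirectoryStream(targetDir, glob)) {
            for (Path d : dirs) {
                // A directory without _metadata is treated as incomplete.
                if (Files.isDirectory(d) && Files.exists(d.resolve("_metadata"))) {
                    found.add(d);
                }
            }
        }
        return found;
    }

    public static void main(String[] args) throws IOException {
        Path target = Paths.get(args.length > 0 ? args[0] : ".");
        System.out.println(completedSavepoints(target, "81990282a4686ebda3d04041e3620776"));
    }
}
```

If a complete savepoint directory shows up right after the timed-out `stop`, that supports the "reply lost on shutdown" explanation over a permissions problem.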
Hi zilong,
That was indeed the problem. Thanks for the help.

Best,
Robin

zilong xiao wrote:
> [...]