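I'm using version 1.9.1. Whenever the job hits an exception, for example a NullPointerException, and then tries to recover from a checkpoint, the restore always fails with an error saying an sst file cannot be found. The stack trace is as follows: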
2020-01-16 19:29:39
java.lang.Exception: Exception while creating StreamOperatorStateContext.
    at org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.streamOperatorStateContext(StreamTaskStateInitializerImpl.java:195)
    at org.apache.flink.streaming.api.operators.AbstractStreamOperator.initializeState(AbstractStreamOperator.java:253)
    at org.apache.flink.streaming.runtime.tasks.StreamTask.initializeState(StreamTask.java:881)
    at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:395)
    at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:705)
    at org.apache.flink.runtime.taskmanager.Task.run(Task.java:530)
    at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.flink.util.FlinkException: Could not restore keyed state backend for KeyedProcessOperator_8ea7af242b2bcc2d11daf69b5d588c4d_(31/32) from any of the 1 provided restore options.
    at org.apache.flink.streaming.api.operators.BackendRestorerProcedure.createAndRestore(BackendRestorerProcedure.java:135)
    at org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.keyedStatedBackend(StreamTaskStateInitializerImpl.java:307)
    at org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.streamOperatorStateContext(StreamTaskStateInitializerImpl.java:135)
    ... 6 more
Caused by: org.apache.flink.runtime.state.BackendBuildingException: Caught unexpected exception.
    at org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackendBuilder.build(RocksDBKeyedStateBackendBuilder.java:326)
    at org.apache.flink.contrib.streaming.state.RocksDBStateBackend.createKeyedStateBackend(RocksDBStateBackend.java:520)
    at org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.lambda$keyedStatedBackend$1(StreamTaskStateInitializerImpl.java:291)
    at org.apache.flink.streaming.api.operators.BackendRestorerProcedure.attemptCreateAndRestore(BackendRestorerProcedure.java:142)
    at org.apache.flink.streaming.api.operators.BackendRestorerProcedure.createAndRestore(BackendRestorerProcedure.java:121)
    ... 8 more
Caused by: java.nio.file.NoSuchFileException: /data/hadoop/tmp/nm-local-dir/usercache/www-data/appcache/application_1579002711906_0001/flink-io-bd910c0d-03c7-48ff-8712-4e7059bac574/job_bbe797c8fcdf7c362bed774435ae5f86_op_KeyedProcessOperator_8ea7af242b2bcc2d11daf69b5d588c4d__31_32__uuid_7259cf96-aa16-423e-a356-dcac0a7859f2/db/000019.sst -> /data/hadoop/tmp/nm-local-dir/usercache/www-data/appcache/application_1579002711906_0001/flink-io-bd910c0d-03c7-48ff-8712-4e7059bac574/job_bbe797c8fcdf7c362bed774435ae5f86_op_KeyedProcessOperator_8ea7af242b2bcc2d11daf69b5d588c4d__31_32__uuid_7259cf96-aa16-423e-a356-dcac0a7859f2/40e6dc65-7fac-41ae-b736-91c4ecd5e296/000019.sst
    at sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
    at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
    at sun.nio.fs.UnixFileSystemProvider.createLink(UnixFileSystemProvider.java:476)
    at java.nio.file.Files.createLink(Files.java:1086)
    at org.apache.flink.contrib.streaming.state.restore.RocksDBIncrementalRestoreOperation.restoreInstanceDirectoryFromPath(RocksDBIncrementalRestoreOperation.java:473)
    at org.apache.flink.contrib.streaming.state.restore.RocksDBIncrementalRestoreOperation.restoreFromLocalState(RocksDBIncrementalRestoreOperation.java:212)
    at org.apache.flink.contrib.streaming.state.restore.RocksDBIncrementalRestoreOperation.restoreFromRemoteState(RocksDBIncrementalRestoreOperation.java:188)
    at org.apache.flink.contrib.streaming.state.restore.RocksDBIncrementalRestoreOperation.restoreWithoutRescaling(RocksDBIncrementalRestoreOperation.java:162)
    at org.apache.flink.contrib.streaming.state.restore.RocksDBIncrementalRestoreOperation.restore(RocksDBIncrementalRestoreOperation.java:148)
    at org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackendBuilder.build(RocksDBKeyedStateBackendBuilder.java:270)
    ... 12 more

Judging from the error log, the checkpoint is first downloaded from the remote DFS to local disk and then linked, but the linking step fails because the file cannot be found. Could this be a permissions problem?
Hi,

Are you sure there is no other exception each time the job restores? A user previously ran into this because a different exception triggered the cancel-task logic, which cleaned up the locally downloaded files, so the hard-link step then failed with a no-such-file exception.

Best regards,
唐云 (Tang Yun)

________________________________
From: claylin <[hidden email]>
Sent: Thursday, January 16, 2020 22:00
To: user-zh <[hidden email]>
Subject: Failed to restore job from checkpoint
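
On the permissions question: the trace ends in `java.nio.file.Files.createLink`, which reports a missing file and a denied permission as different exception types (`NoSuchFileException` vs. `AccessDeniedException`), so a `NoSuchFileException` points at a file that is gone rather than at access rights. As far as I can tell, the message format is `link -> existing`, so the second path in the trace (the UUID-named directory) is the downloaded file that was expected to exist. Below is a minimal standalone sketch of that behaviour. It does not use any Flink code; the class name `HardLinkProbe`, the method `tryLink`, and the directory names are made up for illustration, and it assumes a local filesystem that supports hard links.

```java
import java.io.IOException;
import java.nio.file.AccessDeniedException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;

// Hypothetical demo class, not part of Flink.
public class HardLinkProbe {

    /** Tries to create a hard link `link` pointing at `existing` and reports how the attempt ended. */
    public static String tryLink(Path link, Path existing) {
        try {
            Files.createLink(link, existing);
            return "OK";
        } catch (NoSuchFileException e) {
            // The file to link to (or the parent directory of the link) does not exist.
            return "NO_SUCH_FILE";
        } catch (AccessDeniedException e) {
            // This is what a genuine permissions problem would look like.
            return "ACCESS_DENIED";
        } catch (IOException | UnsupportedOperationException e) {
            return "OTHER: " + e;
        }
    }

    public static void main(String[] args) throws IOException {
        // Stand-ins for the temporary download directory and the RocksDB instance directory.
        Path base = Files.createTempDirectory("restore-demo");
        Path downloadDir = Files.createDirectory(base.resolve("download"));
        Path dbDir = Files.createDirectory(base.resolve("db"));

        // Normal case: the downloaded file is still there when it is linked.
        Path sst = Files.write(downloadDir.resolve("000019.sst"), new byte[] {1, 2, 3});
        System.out.println(tryLink(dbDir.resolve("000019.sst"), sst));   // OK

        // Simulated cleanup: the downloaded file is deleted before the link step runs.
        Path cleaned = Files.write(downloadDir.resolve("000020.sst"), new byte[] {4});
        Files.delete(cleaned);
        System.out.println(tryLink(dbDir.resolve("000020.sst"), cleaned)); // NO_SUCH_FILE
    }
}
```

If the second case matches what you see, it is worth searching the TaskManager log just before the restore failure for an earlier exception or a task cancellation that could have removed the downloaded files.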