Unable to restore state from checkpoint

Unable to restore state from checkpoint

sun
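Hello, I have two questions.

1: Every time I restart the service, the chk- directories under the checkpoint directory start again from chk-1, chk-2, ... instead of continuing from the last number.

2: After restarting the service, the state data is not restored from the checkpoint.

My configuration is below. I am debugging locally on a single machine.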



final StreamExecutionEnvironment streamExecutionEnvironment = StreamExecutionEnvironment.createLocalEnvironmentWithWebUI(conf);

// StateBackend stateBackend = new RocksDBStateBackend("hdfs://10.100.51.101:9000/flink/checkpoints", true);
StateBackend stateBackend = new FsStateBackend("file:///flink/checkpoints");
// StateBackend stateBackend = new MemoryStateBackend();
streamExecutionEnvironment.setStateBackend(stateBackend);

streamExecutionEnvironment.enableCheckpointing(1000);
streamExecutionEnvironment.getCheckpointConfig().setCheckpointingMode(CheckpointingMode.EXACTLY_ONCE);
streamExecutionEnvironment.getCheckpointConfig().setMinPauseBetweenCheckpoints(500);
streamExecutionEnvironment.getCheckpointConfig().setCheckpointTimeout(60000);
streamExecutionEnvironment.getCheckpointConfig().setMaxConcurrentCheckpoints(1);
streamExecutionEnvironment.getCheckpointConfig()
        .enableExternalizedCheckpoints(CheckpointConfig.ExternalizedCheckpointCleanup.RETAIN_ON_CANCELLATION);
Re: Unable to restore state from checkpoint

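程龙

When you start the service again, you need to specify the checkpoint path to restore from. Here you have only specified the path where checkpoints are written.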
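
As a rough illustration of the advice above: a job that is started without a restore path gets a new job ID and counts its checkpoints from chk-1 again, so the path of a retained checkpoint has to be found and passed in explicitly. The sketch below uses only the Java standard library to pick the highest-numbered chk-N directory under one job's checkpoint directory. The class name `LatestCheckpoint` is hypothetical, and the layout `<checkpoint-root>/<job-id>/chk-<n>` is an assumption about how the checkpoint directory in the original post is organized.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Comparator;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Stream;

// Hypothetical helper, not part of Flink.
public class LatestCheckpoint {

    // Matches directory names such as "chk-12"; the digit cap keeps Long.parseLong safe.
    private static final Pattern CHK_DIR = Pattern.compile("chk-(\\d{1,18})");

    /** Returns N for a name like "chk-N", or -1 for any other name. */
    public static long checkpointNumber(String dirName) {
        Matcher m = CHK_DIR.matcher(dirName);
        return m.matches() ? Long.parseLong(m.group(1)) : -1L;
    }

    /** Returns the chk-N subdirectory of jobDir with the largest N, if there is one. */
    public static Optional<Path> latest(Path jobDir) throws IOException {
        try (Stream<Path> children = Files.list(jobDir)) {
            return children
                    .filter(Files::isDirectory)
                    .filter(p -> checkpointNumber(p.getFileName().toString()) >= 0)
                    // Compare numerically so that chk-10 ranks above chk-2.
                    .max(Comparator.comparingLong(
                            (Path p) -> checkpointNumber(p.getFileName().toString())));
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: LatestCheckpoint <checkpoint-root>/<job-id>");
            return;
        }
        Optional<Path> found = latest(Paths.get(args[0]));
        System.out.println(found.map(Path::toString).orElse("no chk-N directory found"));
    }
}
```

The resulting path could then be passed to `flink run -s <path>`. For a local environment, one option to try is setting it under the `execution.savepoint.path` key of the `Configuration` handed to `createLocalEnvironmentWithWebUI`; whether a local environment honors that key depends on the Flink version, so this needs to be verified against the version in use.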





On 2020-09-03 16:03:41, "sun" <[hidden email]> wrote:

> Hello, I have two questions.
> [...]
Re: Unable to restore state from checkpoint

sun
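What?

------------------ Original message ------------------
From: "user-zh" <[hidden email]>
Sent: Thursday, September 3, 2020, 4:10 PM
To: "user-zh" <[hidden email]>
Subject: Re: Unable to restore state from checkpoint

When you start the service again, you need to specify the checkpoint path to restore from. Here you have only specified the path where checkpoints are written.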





Re: Re: Unable to restore state from checkpoint

marble.zhong@coinflex.com.INVALID
In reply to this post by 程龙
/opt/flink/bin/flink run -d -s /opt/flink/savepoints -c com.xxx.flink.ohlc.kafka.OrderTickCandleView /home/service-ohlc-*-SNAPSHOT.jar

I already specify this directory when starting the job, but it fails with the following error:
Caused by: org.apache.flink.runtime.client.JobExecutionException: Could not instantiate JobManager.
        at org.apache.flink.runtime.dispatcher.Dispatcher.lambda$createJobManagerRunner$6(Dispatcher.java:398)
        at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1604)
        ... 6 more
Caused by: java.io.FileNotFoundException: Cannot find meta data file '_metadata' in directory '/opt/flink/savepoints'. Please try to load the checkpoint/savepoint directly from the metadata file instead of the directory.




--
Sent from: http://apache-flink.147419.n8.nabble.com/
Re: Re: Unable to restore state from checkpoint

Congxian Qiu
Hi
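   Judging from the error, what you specified is a directory, and that directory does not contain a _metadata file. It is therefore not a complete checkpoint/savepoint and cannot be used for restoring.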
Best,
Congxian
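
To make the `_metadata` requirement concrete, here is a minimal sketch, using only the Java standard library, that lists the direct subdirectories of a given directory that contain a `_metadata` file. Those are the candidates that could be passed to `-s` instead of the parent directory. The class name `RestorableDirs` and its methods are hypothetical helpers, not part of Flink, and the sketch only checks that the file exists, not that its contents are valid.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not part of Flink.
public class RestorableDirs {

    /** True if dir directly contains a regular file named _metadata. */
    public static boolean hasMetadata(Path dir) {
        return Files.isRegularFile(dir.resolve("_metadata"));
    }

    /** Lists the direct subdirectories of parent that contain _metadata, sorted by path. */
    public static List<Path> candidates(Path parent) throws IOException {
        try (Stream<Path> children = Files.list(parent)) {
            return children
                    .filter(Files::isDirectory)
                    .filter(RestorableDirs::hasMetadata)
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: RestorableDirs <directory>");
            return;
        }
        Path dir = Paths.get(args[0]);
        if (hasMetadata(dir)) {
            System.out.println(dir + " contains _metadata");
        } else {
            // The directory itself is not restorable; show subdirectories that are.
            candidates(dir).forEach(System.out::println);
        }
    }
}
```

Run against a directory such as the `/opt/flink/savepoints` from the error above, this would print either a confirmation or the subdirectories worth trying with `-s`. If it prints nothing, no complete checkpoint or savepoint was found one level down.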


On Tue, Oct 27, 2020 at 4:06 PM, [hidden email] <[hidden email]> wrote:

> /opt/flink/bin/flink run -d -s /opt/flink/savepoints -c com.xxx.flink.ohlc.kafka.OrderTickCandleView /home/service-ohlc-*-SNAPSHOT.jar
> [...]