使用BroadcastStream后checkpoint失效

classic Classic list List threaded Threaded
6 messages Options
Reply | Threaded
Open this post in threaded view
|

使用BroadcastStream后checkpoint失效

restart
问题:job在接入广播流后,checkpint失效。
描述:广播流的数据来源两个地方,一个是从mongo,一个是从kafka,数据进行union,同时指定Watermark,返回Watermark.MAX_WATERMARK(用于与主数据源connect后,窗口聚合水印更新),job部署后,来源mongo的数据源状态会变为FINISHED。网上有查过,说subtask
状态finished会导致checkpoint不触发,那如何既能满足数据源自定义(更像是DataSet),同时保证checkpoint正常触发呢



--
Sent from: http://apache-flink.147419.n8.nabble.com/
Reply | Threaded
Open this post in threaded view
|

Re: 使用BroadcastStream后checkpoint失效

Congxian Qiu
Hi
    我理解你的 BroadcastStream 也会定期的再次读取,然后更新对应的状态,这样的话,你的 source 可以一直在读取数据(run
函数),不退出即可。如果只希望读取一次话,是不是维表也可以满足你的需求呢?
Best,
Congxian


restart <[hidden email]> 于2020年11月3日周二 上午10:11写道:

> 问题:job在接入广播流后,checkpint失效。
>
> 描述:广播流的数据来源两个地方,一个是从mongo,一个是从kafka,数据进行union,同时指定Watermark,返回Watermark.MAX_WATERMARK(用于与主数据源connect后,窗口聚合水印更新),job部署后,来源mongo的数据源状态会变为FINISHED。网上有查过,说subtask
> 状态finished会导致checkpoint不触发,那如何既能满足数据源自定义(更像是DataSet),同时保证checkpoint正常触发呢
>
>
>
> --
> Sent from: http://apache-flink.147419.n8.nabble.com/
Reply | Threaded
Open this post in threaded view
|

Re: 使用BroadcastStream后checkpoint失效

nobleyd
In reply to this post by restart
finish为啥会导致ckpt不触发。

restart <[hidden email]> 于2020年11月3日周二 上午10:11写道:

> 问题:job在接入广播流后,checkpint失效。
>
> 描述:广播流的数据来源两个地方,一个是从mongo,一个是从kafka,数据进行union,同时指定Watermark,返回Watermark.MAX_WATERMARK(用于与主数据源connect后,窗口聚合水印更新),job部署后,来源mongo的数据源状态会变为FINISHED。网上有查过,说subtask
> 状态finished会导致checkpoint不触发,那如何既能满足数据源自定义(更像是DataSet),同时保证checkpoint正常触发呢
>
>
>
> --
> Sent from: http://apache-flink.147419.n8.nabble.com/
Reply | Threaded
Open this post in threaded view
|

Re: 使用BroadcastStream后checkpoint失效

JasonLee
In reply to this post by restart
hi    
checkpoint目前只能用在流数据上,你读取的mongo数据是一个有界的数据源,所以是不支持做checkpoint就会导致整个任务的checkpoint失败,你可以把读取mongo做一个定时读取,这样应该就可以完成checkpoint.



-----
Best Wishes
JasonLee
--
Sent from: http://apache-flink.147419.n8.nabble.com/
Best Wishes
JasonLee
Reply | Threaded
Open this post in threaded view
|

Re: 使用BroadcastStream后checkpoint失效

restart
In reply to this post by nobleyd
finish状态导致checkpoint不触发,我看了源码,在CheckpointCoordinator.triggerCheckpoint方法有判断:
<http://apache-flink.147419.n8.nabble.com/file/t1014/Dingtalk_20201104131452.jpg>




--
Sent from: http://apache-flink.147419.n8.nabble.com/
Reply | Threaded
Open this post in threaded view
|

Re: 使用BroadcastStream后checkpoint失效

restart
In reply to this post by JasonLee
感谢,改成定时确实流状态时RUNNING了



--
Sent from: http://apache-flink.147419.n8.nabble.com/