Hi Arvid,
Thank you for the clarification!
Best,
Eleanore
On Tue, Mar 10, 2020 at 12:32 PM Arvid Heise <
[hidden email]> wrote:
> Hi Eleanore,
>
> incremental checkpointing would be needed if you have a large state
> (GB-TB), but between two checkpoints only little changes happen (KB-MB).
>
> There are two reasons for large state: large user state or large operator
> state coming from joins, windows, or grouping. In the end, you will see the
> total size in the web ui. If it's small and checkpointing duration is low,
> there is absolutely no way to go incremental.
>
> On Tue, Mar 10, 2020 at 5:43 PM Eleanore Jin <
[hidden email]>
> wrote:
>
>> Hi All,
>>
>> I am using Apache Beam to construct the pipeline, and this pipeline is
>> running with Flink Runner.
>>
>> Both Source and Sink are Kafka topics, I have enabled Beam Exactly once
>> semantics.
>>
>> I believe how it works in beam is:
>> the messages will be cached and not processed by the
>> KafkaExactlyOnceSink, until the checkpoint completes and all the cached
>> messages are checkpointed, then it will start processing those messages.
>>
>> So is there any benefit to enable increment checkpointing when using
>> RocksDB as backend. Because I see the states as consumer offsets, and
>> cached messages in between checkpoints. Delta seems to be the complete new
>> checkpointed states.
>>
>> Thanks a lot!
>> Eleanore
>>
>