Behavior for flink job running on K8S failed after restart strategy exhausted

Behavior for flink job running on K8S failed after restart strategy exhausted

Eleanore Jin
Hi Experts,

I have a flink cluster (per job mode) running on kubernetes. The job is
configured with restart strategy

restart-strategy.fixed-delay.attempts: 3
restart-strategy.fixed-delay.delay: 10 s


So after 3 retries, the job is marked as FAILED and the pods stop running.
However, Kubernetes then restarts the job again because the number of
available replicas does not match the desired count.

What would you suggest for such a scenario? How should I configure the
Flink job running on K8s?

Thanks a lot!
Eleanore

Re: Behavior for flink job running on K8S failed after restart strategy exhausted

Till Rohrmann
Hi Eleanore,

how are you deploying Flink exactly? Are you using the application mode
with native K8s support to deploy a cluster [1] or are you manually
deploying a per-job mode [2]?

I believe the problem might be that we terminate the Flink process with a
non-zero exit code if the job reaches the ApplicationStatus.FAILED [3].

cc Yang Wang: have you observed a similar behavior when running Flink in
per-job mode on K8s?

[1]
https://ci.apache.org/projects/flink/flink-docs-release-1.11/ops/deployment/native_kubernetes.html#flink-kubernetes-application
[2]
https://ci.apache.org/projects/flink/flink-docs-release-1.11/ops/deployment/kubernetes.html#job-cluster-resource-definitions
[3]
https://github.com/apache/flink/blob/master/flink-runtime/src/main/java/org/apache/flink/runtime/clusterframework/ApplicationStatus.java#L32


Re: Behavior for flink job running on K8S failed after restart strategy exhausted

Eleanore Jin
Hi Till,

Thanks for the reply!

I manually deploy in per-job mode [1] and I am using Flink 1.8.2.
Specifically, I build a custom docker image into which I copied the app jar
(not an uber jar) and all its dependencies under /flink/lib.

So my question is: in this case, if the job is marked as FAILED, which
causes K8s to restart the pod, this does not seem to help at all. What are
the suggestions for such a scenario?

Thanks a lot!
Eleanore

[1]
https://ci.apache.org/projects/flink/flink-docs-release-1.8/ops/deployment/kubernetes.html#flink-job-cluster-on-kubernetes


Re: Behavior for flink job running on K8S failed after restart strategy exhausted

Yang Wang
Hi Eleanore,

I think you are using the K8s resource "Job" to deploy the jobmanager.
Please set .spec.template.spec.restartPolicy = "Never" and
spec.backoffLimit = 0. Refer to [1] for more information.

Then, when the jobmanager fails for any reason, the K8s Job will be marked
as failed, and K8s will not restart it again.

[1].
https://kubernetes.io/docs/concepts/workloads/controllers/job/#job-termination-and-cleanup
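
For example, a minimal sketch of such a Job spec could look like the
following (the image name and container args are just placeholders, not
taken from your setup):

    apiVersion: batch/v1
    kind: Job
    metadata:
      name: flink-jobmanager
    spec:
      backoffLimit: 0              # do not retry the pod after a failure
      template:
        spec:
          restartPolicy: Never     # never restart the container in place
          containers:
            - name: jobmanager
              image: my-flink-job:latest   # placeholder image
              args: ["jobmanager"]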


Best,
Yang


Re: Behavior for flink job running on K8S failed after restart strategy exhausted

Till Rohrmann
@Yang Wang <[hidden email]> I believe that we should rethink the
exit codes of Flink. In general you want K8s to restart a failed Flink
process, but a job that terminates in state FAILED is a valid termination
state from Flink's point of view, so the application should not return a
non-zero exit code in that case.

Cheers,
Till


Re: Behavior for flink job running on K8S failed after restart strategy exhausted

Yang Wang
@Till Rohrmann <[hidden email]> In native mode, when a Flink
application terminates in FAILED state, all the resources will be cleaned
up.

However, in standalone mode, I agree with you that we need to rethink the
exit code of Flink. When a job exhausts the restart strategy, we should
terminate the pod and not restart it again. After googling, it seems that
we cannot specify the restartPolicy based on the exit code [1]. So maybe we
need to return a zero exit code to avoid the restart by K8s.

[1].
https://stackoverflow.com/questions/48797297/is-it-possible-to-define-restartpolicy-based-on-container-exit-code

Best,
Yang


Re: Behavior for flink job running on K8S failed after restart strategy exhausted

Eleanore Jin
Hi Yang & Till,

Thanks for your prompt reply!

Yang, regarding your question, I am actually not using a K8s Job, as I put
my app.jar and its dependencies under flink's lib directory. I have 1 K8s
Deployment for the job manager, 1 K8s Deployment for the task manager, and
1 K8s Service for the job manager.

As you mentioned above, if the Flink job is marked as FAILED, it will cause
the job manager pod to be restarted, which is not the ideal behavior.

Do you suggest that I change the deployment strategy from a K8s Deployment
to a K8s Job? So that in case the Flink program exits with a non-zero code
(e.g. after the configured number of restarts is exhausted), the pod can be
marked as complete and the job is not restarted again?

Thanks a lot!
Eleanore


Re: Behavior for flink job running on K8S failed after restart strategy exhausted

Yang Wang
Hi Eleanore,

Yes, I suggest using a Job instead of a Deployment. It could be used to run
the jobmanager once and finish after a successful/failed completion.

However, using a Job still could not solve your problem completely. Just as
Till said, when a job exhausts the restart strategy, the jobmanager pod
will terminate with a non-zero exit code, which will cause K8s to restart
it again. Even though we could set the restartPolicy and backoffLimit, this
is not a clean and correct way to go. We should terminate the jobmanager
process with a zero exit code in such a situation.

@Till Rohrmann <[hidden email]> I just have one concern. Is this a
special case for K8s deployments? For standalone/YARN/Mesos, it seems that
terminating with a non-zero exit code is harmless.


Best,
Yang


Re: Behavior for flink job running on K8S failed after restart strategy exhausted

Till Rohrmann
Yes, for the other deployments it is not a problem. A reason why people
preferred non-zero exit codes in case of FAILED jobs is that this is easier
to monitor than having to look at the actual job result. Moreover, in the
YARN web UI the application shows up as failed, if I am not mistaken.
However, from a framework's perspective, a FAILED job does not mean that
Flink has failed and, hence, the return code could still be 0 in my opinion.

Cheers,
Till


Re: Behavior for flink job running on K8S failed after restart strategy exhausted

Yang Wang
Actually, the application status shown in the YARN web UI is not determined
by the jobmanager process exit code. Instead, we use
"resourceManagerClient.unregisterApplicationMaster" to control the final
status of the YARN application. So even if the jobmanager exits with a zero
code, it could still show a failed status in the YARN web UI.

I have created a ticket to track this improvement[1].

[1]. https://issues.apache.org/jira/browse/FLINK-18828


Best,
Yang



Re: Behavior for flink job running on K8S failed after restart strategy exhausted

Till Rohrmann
You are right Yang Wang.

Thanks for creating this issue.

Cheers,
Till


Re: Behavior for flink job running on K8S failed after restart strategy exhausted

Eleanore Jin
Hi Yang and Till,

Thanks a lot for the help! I have a similar question to what Till mentioned:
if we do not fail the Flink pods when the restart strategy is exhausted, it
might be hard to monitor such failures. Today I get alerts if the K8s pods
are restarted or in a crash loop, but if that is no longer the case, how can
we deal with the monitoring? In production, I have hundreds of small Flink
jobs running (2-8 TM pods) doing stateless processing, and it is really hard
for us to expose an ingress for each JM REST endpoint to periodically query
the job status of each Flink job.

Thanks a lot!
Eleanore


Re: Behavior for flink job running on K8S failed after restart strategy exhausted

Yang Wang
Hi Eleanore,

From my experience, exporting the Flink metrics to Prometheus via the
Prometheus metrics reporter[1] is a better approach, and it also makes
alerting easier to configure.
You could use the "fullRestarts" or "numRestarts" metric to monitor job
restarts. More metrics can be found here[2].
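
For illustration, a minimal sketch of what this could look like. The reporter
keys below come from the docs in [1]; the exported metric name and the alert
rule are assumptions and depend on the Flink version and the configured scope
formats:

# flink-conf.yaml: enable the Prometheus reporter (on Flink 1.8 the
# flink-metrics-prometheus jar also has to be copied from opt/ into lib/)
metrics.reporter.prom.class: org.apache.flink.metrics.prometheus.PrometheusReporter
metrics.reporter.prom.port: 9249

# Prometheus alerting rule (sketch): fire once the job has fully restarted
# at least once; tune the expression and the metric name to what your
# setup actually exposes (assumed here: flink_jobmanager_job_fullRestarts).
- alert: FlinkJobRestarting
  expr: flink_jobmanager_job_fullRestarts > 0
  for: 5m
  labels:
    severity: critical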

[1].
https://ci.apache.org/projects/flink/flink-docs-master/monitoring/metrics.html#prometheus-orgapacheflinkmetricsprometheusprometheusreporter
[2].
https://ci.apache.org/projects/flink/flink-docs-master/monitoring/metrics.html#availability

Best,
Yang

Eleanore Jin <[hidden email]> 于2020年8月5日周三 下午11:52写道:

> Hi Yang and Till,
>
> Thanks a lot for the help! I have a similar question to what Till mentioned:
> if we do not fail Flink pods when the restart strategy is exhausted, it
> might be hard to monitor such failures. Today I get alerts if the k8s pods
> are restarted or in crash loop, but if this will no longer be the case, how
> can we deal with the monitoring? In production, I have hundreds of small
> flink jobs running (2-8 TM pods) doing stateless processing, it is really
> hard for us to expose ingress for each JM rest endpoint to periodically
> query the job status for each flink job.
>
> Thanks a lot!
> Eleanore
>
> On Wed, Aug 5, 2020 at 4:56 AM Till Rohrmann <[hidden email]> wrote:
>
>> You are right Yang Wang.
>>
>> Thanks for creating this issue.
>>
>> Cheers,
>> Till
>>
>> On Wed, Aug 5, 2020 at 1:33 PM Yang Wang <[hidden email]> wrote:
>>
>>> Actually, the application status shown in the YARN web UI is not determined
>>> by the jobmanager process exit code.
>>> Instead, we use "resourceManagerClient.unregisterApplicationMaster" to
>>> control the final status of the YARN application.
>>> So even if the jobmanager exits with a zero code, it could still show a
>>> failed status in the YARN web UI.
>>>
>>> I have created a ticket to track this improvement[1].
>>>
>>> [1]. https://issues.apache.org/jira/browse/FLINK-18828
>>>
>>>
>>> Best,
>>> Yang
>>>
>>>
>>> Till Rohrmann <[hidden email]> 于2020年8月5日周三 下午3:56写道:
>>>
>>>> Yes for the other deployments it is not a problem. A reason why people
>>>> preferred non-zero exit codes in case of FAILED jobs is that this is easier
>>>> to monitor than having to take a look at the actual job result. Moreover,
>>>> in the YARN web UI the application shows as failed if I am not mistaken.
>>>> However, from a framework's perspective, a FAILED job does not mean that
>>>> Flink has failed and, hence, the return code could still be 0 in my opinion.
>>>>
>>>> Cheers,
>>>> Till
>>>>
>>>> On Wed, Aug 5, 2020 at 9:30 AM Yang Wang <[hidden email]> wrote:
>>>>
>>>>> Hi Eleanore,
>>>>>
>>>>> Yes, I suggest using a Job to replace the Deployment. It could be used
>>>>> to run the jobmanager once and finish after a successful/failed completion.
>>>>>
>>>>> However, using a Job still could not solve your problem completely. Just
>>>>> as Till said, when a job exhausts the restart strategy, the jobmanager
>>>>> pod will terminate with a non-zero exit code, which will cause K8s to
>>>>> restart it again. Even though we could set the restartPolicy and
>>>>> backoffLimit,
>>>>> this is not a clean and correct way to go. We should terminate the
>>>>> jobmanager process with a zero exit code in such a situation.
>>>>>
>>>>> @Till Rohrmann <[hidden email]> I just have one concern. Is it
>>>>> a special case for K8s deployment? For standalone/Yarn/Mesos, it seems that
>>>>> terminating with
>>>>> non-zero exit code is harmless.
>>>>>
>>>>>
>>>>> Best,
>>>>> Yang
>>>>>
>>>>> Eleanore Jin <[hidden email]> 于2020年8月4日周二 下午11:54写道:
>>>>>
>>>>>> Hi Yang & Till,
>>>>>>
>>>>>> Thanks for your prompt reply!
>>>>>>
>>>>>> Yang, regarding your question, I am actually not using k8s job, as I
>>>>>> put my app.jar and its dependencies under flink's lib directory. I have 1
>>>>>> k8s deployment for job manager, and 1 k8s deployment for task manager, and
>>>>>> 1 k8s service for job manager.
>>>>>>
>>>>>> As you mentioned above, if the flink job is marked as failed, it will
>>>>>> cause the job manager pod to be restarted, which is not the ideal
>>>>>> behavior.
>>>>>>
>>>>>> Do you suggest that I should change the deployment strategy from
>>>>>> using k8s deployment to k8s job? In case the flink program exit with
>>>>>> non-zero code (e.g. exhausted number of configured restart), pod can be
>>>>>> marked as complete hence not restarting the job again?
>>>>>>
>>>>>> Thanks a lot!
>>>>>> Eleanore
>>>>>>
>>>>>> On Tue, Aug 4, 2020 at 2:49 AM Yang Wang <[hidden email]>
>>>>>> wrote:
>>>>>>
>>>>>>> @Till Rohrmann <[hidden email]> In native mode, when a Flink
>>>>>>> application terminates with FAILED state, all the resources will be cleaned
>>>>>>> up.
>>>>>>>
>>>>>>> However, in standalone mode, I agree with you that we need to
>>>>>>> rethink the exit code of Flink. When a job exhausts the restart
>>>>>>> strategy, we should terminate the pod and not restart it again.
>>>>>>> After googling, it seems that we cannot specify the restartPolicy
>>>>>>> based on exit code[1]. So maybe we need to return a zero exit code
>>>>>>> to avoid K8s restarting it.
>>>>>>>
>>>>>>> [1].
>>>>>>> https://stackoverflow.com/questions/48797297/is-it-possible-to-define-restartpolicy-based-on-container-exit-code
>>>>>>>
>>>>>>> Best,
>>>>>>> Yang
>>>>>>>
>>>>>>> Till Rohrmann <[hidden email]> 于2020年8月4日周二 下午3:48写道:
>>>>>>>
>>>>>>>> @Yang Wang <[hidden email]> I believe that we should
>>>>>>>> rethink the exit codes of Flink. In general you want K8s to restart a
>>>>>>>> failed Flink process. Hence, an application which terminates in state
>>>>>>>> FAILED should not return a non-zero exit code because it is a valid
>>>>>>>> termination state.
>>>>>>>>
>>>>>>>> Cheers,
>>>>>>>> Till
>>>>>>>>
>>>>>>>> On Tue, Aug 4, 2020 at 8:55 AM Yang Wang <[hidden email]>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Hi Eleanore,
>>>>>>>>>
>>>>>>>>> I think you are using K8s resource "Job" to deploy the jobmanager.
>>>>>>>>> Please set .spec.template.spec.restartPolicy = "Never" and
>>>>>>>>> spec.backoffLimit = 0.
>>>>>>>>> Refer here[1] for more information.
>>>>>>>>>
>>>>>>>>> Then, when the jobmanager fails for any reason, the K8s
>>>>>>>>> job will be marked as failed, and K8s will not restart the job again.
>>>>>>>>>
>>>>>>>>> [1].
>>>>>>>>> https://kubernetes.io/docs/concepts/workloads/controllers/job/#job-termination-and-cleanup
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Best,
>>>>>>>>> Yang
>>>>>>>>>
>>>>>>>>> Eleanore Jin <[hidden email]> 于2020年8月4日周二 上午12:05写道:
>>>>>>>>>
>>>>>>>>>> Hi Till,
>>>>>>>>>>
>>>>>>>>>> Thanks for the reply!
>>>>>>>>>>
>>>>>>>>>> I manually deploy as per-job mode [1] and I am using Flink 1.8.2.
>>>>>>>>>> Specifically, I build a custom docker image, which I copied the app jar
>>>>>>>>>> (not uber jar) and all its dependencies under /flink/lib.
>>>>>>>>>>
>>>>>>>>>> So my question is more like, in this case, if the job is marked
>>>>>>>>>> as FAILED, which causes k8s to restart the pod, this seems not help at all,
>>>>>>>>>> what are the suggestions for such scenario?
>>>>>>>>>>
>>>>>>>>>> Thanks a lot!
>>>>>>>>>> Eleanore
>>>>>>>>>>
>>>>>>>>>> [1]
>>>>>>>>>> https://ci.apache.org/projects/flink/flink-docs-release-1.8/ops/deployment/kubernetes.html#flink-job-cluster-on-kubernetes
>>>>>>>>>>
>>>>>>>>>> On Mon, Aug 3, 2020 at 2:13 AM Till Rohrmann <
>>>>>>>>>> [hidden email]> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi Eleanore,
>>>>>>>>>>>
>>>>>>>>>>> how are you deploying Flink exactly? Are you using the
>>>>>>>>>>> application mode with native K8s support to deploy a cluster [1] or are you
>>>>>>>>>>> manually deploying a per-job mode [2]?
>>>>>>>>>>>
>>>>>>>>>>> I believe the problem might be that we terminate the Flink
>>>>>>>>>>> process with a non-zero exit code if the job reaches the
>>>>>>>>>>> ApplicationStatus.FAILED [3].
>>>>>>>>>>>
>>>>>>>>>>> cc Yang Wang have you observed a similar behavior when running
>>>>>>>>>>> Flink in per-job mode on K8s?
>>>>>>>>>>>
>>>>>>>>>>> [1]
>>>>>>>>>>> https://ci.apache.org/projects/flink/flink-docs-release-1.11/ops/deployment/native_kubernetes.html#flink-kubernetes-application
>>>>>>>>>>> [2]
>>>>>>>>>>> https://ci.apache.org/projects/flink/flink-docs-release-1.11/ops/deployment/kubernetes.html#job-cluster-resource-definitions
>>>>>>>>>>> [3]
>>>>>>>>>>> https://github.com/apache/flink/blob/master/flink-runtime/src/main/java/org/apache/flink/runtime/clusterframework/ApplicationStatus.java#L32
>>>>>>>>>>>
>>>>>>>>>>> On Fri, Jul 31, 2020 at 6:26 PM Eleanore Jin <
>>>>>>>>>>> [hidden email]> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi Experts,
>>>>>>>>>>>>
>>>>>>>>>>>> I have a flink cluster (per job mode) running on kubernetes.
>>>>>>>>>>>> The job is configured with restart strategy
>>>>>>>>>>>>
>>>>>>>>>>>> restart-strategy.fixed-delay.attempts: 3
>>>>>>>>>>>> restart-strategy.fixed-delay.delay: 10 s
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> So after 3 times retry, the job will be marked as FAILED, hence
>>>>>>>>>>>> the pods are not running. However, kubernetes will then restart the job
>>>>>>>>>>>> again as the available replicas do not match the desired one.
>>>>>>>>>>>>
>>>>>>>>>>>> I wonder what are the suggestions for such a scenario? How
>>>>>>>>>>>> should I configure the flink job running on k8s?
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks a lot!
>>>>>>>>>>>> Eleanore
>>>>>>>>>>>>
>>>>>>>>>>>
Reply | Threaded
Open this post in threaded view
|

Re: Behavior for flink job running on K8S failed after restart strategy exhausted

Eleanore Jin
Hi Yang,

Thanks a lot for the information!

Eleanore

On Thu, Aug 6, 2020 at 4:20 AM Yang Wang <[hidden email]> wrote:

> Hi Eleanore,
>
> From my experience, exporting the Flink metrics to Prometheus via the
> Prometheus metrics reporter[1] is a better approach, and it also makes
> alerting easier to configure.
> You could use the "fullRestarts" or "numRestarts" metric to monitor job
> restarts. More metrics can be found here[2].
>
> [1].
> https://ci.apache.org/projects/flink/flink-docs-master/monitoring/metrics.html#prometheus-orgapacheflinkmetricsprometheusprometheusreporter
> [2].
> https://ci.apache.org/projects/flink/flink-docs-master/monitoring/metrics.html#availability
>
> Best,
> Yang
>
> Eleanore Jin <[hidden email]> 于2020年8月5日周三 下午11:52写道:
>
>> Hi Yang and Till,
>>
>> Thanks a lot for the help! I have a similar question to what Till mentioned:
>> if we do not fail Flink pods when the restart strategy is exhausted, it
>> might be hard to monitor such failures. Today I get alerts if the k8s pods
>> are restarted or in crash loop, but if this will no longer be the case, how
>> can we deal with the monitoring? In production, I have hundreds of small
>> flink jobs running (2-8 TM pods) doing stateless processing, it is really
>> hard for us to expose ingress for each JM rest endpoint to periodically
>> query the job status for each flink job.
>>
>> Thanks a lot!
>> Eleanore
>>
>> On Wed, Aug 5, 2020 at 4:56 AM Till Rohrmann <[hidden email]>
>> wrote:
>>
>>> You are right Yang Wang.
>>>
>>> Thanks for creating this issue.
>>>
>>> Cheers,
>>> Till
>>>
>>> On Wed, Aug 5, 2020 at 1:33 PM Yang Wang <[hidden email]> wrote:
>>>
>>>> Actually, the application status shown in the YARN web UI is not determined
>>>> by the jobmanager process exit code.
>>>> Instead, we use "resourceManagerClient.unregisterApplicationMaster" to
>>>> control the final status of the YARN application.
>>>> So even if the jobmanager exits with a zero code, it could still show a
>>>> failed status in the YARN web UI.
>>>>
>>>> I have created a ticket to track this improvement[1].
>>>>
>>>> [1]. https://issues.apache.org/jira/browse/FLINK-18828
>>>>
>>>>
>>>> Best,
>>>> Yang
>>>>
>>>>
>>>> Till Rohrmann <[hidden email]> 于2020年8月5日周三 下午3:56写道:
>>>>
>>>>> Yes for the other deployments it is not a problem. A reason why people
>>>>> preferred non-zero exit codes in case of FAILED jobs is that this is easier
>>>>> to monitor than having to take a look at the actual job result. Moreover,
>>>>> in the YARN web UI the application shows as failed if I am not mistaken.
>>>>> However, from a framework's perspective, a FAILED job does not mean that
>>>>> Flink has failed and, hence, the return code could still be 0 in my opinion.
>>>>>
>>>>> Cheers,
>>>>> Till
>>>>>
>>>>> On Wed, Aug 5, 2020 at 9:30 AM Yang Wang <[hidden email]>
>>>>> wrote:
>>>>>
>>>>>> Hi Eleanore,
>>>>>>
>>>>>> Yes, I suggest using a Job to replace the Deployment. It could be used
>>>>>> to run the jobmanager once and finish after a successful/failed completion.
>>>>>>
>>>>>> However, using a Job still could not solve your problem completely.
>>>>>> Just as Till said, when a job exhausts the restart strategy, the jobmanager
>>>>>> pod will terminate with a non-zero exit code, which will cause K8s to
>>>>>> restart it again. Even though we could set the restartPolicy and
>>>>>> backoffLimit,
>>>>>> this is not a clean and correct way to go. We should terminate the
>>>>>> jobmanager process with a zero exit code in such a situation.
>>>>>>
>>>>>> @Till Rohrmann <[hidden email]> I just have one concern. Is it
>>>>>> a special case for K8s deployment? For standalone/Yarn/Mesos, it seems that
>>>>>> terminating with
>>>>>> non-zero exit code is harmless.
>>>>>>
>>>>>>
>>>>>> Best,
>>>>>> Yang
>>>>>>
>>>>>> Eleanore Jin <[hidden email]> 于2020年8月4日周二 下午11:54写道:
>>>>>>
>>>>>>> Hi Yang & Till,
>>>>>>>
>>>>>>> Thanks for your prompt reply!
>>>>>>>
>>>>>>> Yang, regarding your question, I am actually not using k8s job, as I
>>>>>>> put my app.jar and its dependencies under flink's lib directory. I have 1
>>>>>>> k8s deployment for job manager, and 1 k8s deployment for task manager, and
>>>>>>> 1 k8s service for job manager.
>>>>>>>
>>>>>>> As you mentioned above, if the flink job is marked as failed, it will
>>>>>>> cause the job manager pod to be restarted, which is not the ideal
>>>>>>> behavior.
>>>>>>>
>>>>>>> Do you suggest that I should change the deployment strategy from
>>>>>>> using k8s deployment to k8s job? In case the flink program exit with
>>>>>>> non-zero code (e.g. exhausted number of configured restart), pod can be
>>>>>>> marked as complete hence not restarting the job again?
>>>>>>>
>>>>>>> Thanks a lot!
>>>>>>> Eleanore
>>>>>>>
>>>>>>> On Tue, Aug 4, 2020 at 2:49 AM Yang Wang <[hidden email]>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> @Till Rohrmann <[hidden email]> In native mode, when a Flink
>>>>>>>> application terminates with FAILED state, all the resources will be cleaned
>>>>>>>> up.
>>>>>>>>
>>>>>>>> However, in standalone mode, I agree with you that we need to
>>>>>>>> rethink the exit code of Flink. When a job exhausts the restart
>>>>>>>> strategy, we should terminate the pod and not restart it again.
>>>>>>>> After googling, it seems that we cannot specify the restartPolicy
>>>>>>>> based on exit code[1]. So maybe we need to return a zero exit code
>>>>>>>> to avoid K8s restarting it.
>>>>>>>>
>>>>>>>> [1].
>>>>>>>> https://stackoverflow.com/questions/48797297/is-it-possible-to-define-restartpolicy-based-on-container-exit-code
>>>>>>>>
>>>>>>>> Best,
>>>>>>>> Yang
>>>>>>>>
>>>>>>>> Till Rohrmann <[hidden email]> 于2020年8月4日周二 下午3:48写道:
>>>>>>>>
>>>>>>>>> @Yang Wang <[hidden email]> I believe that we should
>>>>>>>>> rethink the exit codes of Flink. In general you want K8s to restart a
>>>>>>>>> failed Flink process. Hence, an application which terminates in state
>>>>>>>>> FAILED should not return a non-zero exit code because it is a valid
>>>>>>>>> termination state.
>>>>>>>>>
>>>>>>>>> Cheers,
>>>>>>>>> Till
>>>>>>>>>
>>>>>>>>> On Tue, Aug 4, 2020 at 8:55 AM Yang Wang <[hidden email]>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Hi Eleanore,
>>>>>>>>>>
>>>>>>>>>> I think you are using K8s resource "Job" to deploy the
>>>>>>>>>> jobmanager. Please set .spec.template.spec.restartPolicy = "Never" and
>>>>>>>>>> spec.backoffLimit = 0.
>>>>>>>>>> Refer here[1] for more information.
>>>>>>>>>>
>>>>>>>>>> Then, when the jobmanager fails for any reason, the K8s
>>>>>>>>>> job will be marked as failed, and K8s will not restart the job again.
>>>>>>>>>>
>>>>>>>>>> [1].
>>>>>>>>>> https://kubernetes.io/docs/concepts/workloads/controllers/job/#job-termination-and-cleanup
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Best,
>>>>>>>>>> Yang
>>>>>>>>>>
>>>>>>>>>> Eleanore Jin <[hidden email]> 于2020年8月4日周二 上午12:05写道:
>>>>>>>>>>
>>>>>>>>>>> Hi Till,
>>>>>>>>>>>
>>>>>>>>>>> Thanks for the reply!
>>>>>>>>>>>
>>>>>>>>>>> I manually deploy as per-job mode [1] and I am using Flink
>>>>>>>>>>> 1.8.2. Specifically, I build a custom docker image, which I copied the app
>>>>>>>>>>> jar (not uber jar) and all its dependencies under /flink/lib.
>>>>>>>>>>>
>>>>>>>>>>> So my question is more like, in this case, if the job is marked
>>>>>>>>>>> as FAILED, which causes k8s to restart the pod, this seems not help at all,
>>>>>>>>>>> what are the suggestions for such scenario?
>>>>>>>>>>>
>>>>>>>>>>> Thanks a lot!
>>>>>>>>>>> Eleanore
>>>>>>>>>>>
>>>>>>>>>>> [1]
>>>>>>>>>>> https://ci.apache.org/projects/flink/flink-docs-release-1.8/ops/deployment/kubernetes.html#flink-job-cluster-on-kubernetes
>>>>>>>>>>>
>>>>>>>>>>> On Mon, Aug 3, 2020 at 2:13 AM Till Rohrmann <
>>>>>>>>>>> [hidden email]> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi Eleanore,
>>>>>>>>>>>>
>>>>>>>>>>>> how are you deploying Flink exactly? Are you using the
>>>>>>>>>>>> application mode with native K8s support to deploy a cluster [1] or are you
>>>>>>>>>>>> manually deploying a per-job mode [2]?
>>>>>>>>>>>>
>>>>>>>>>>>> I believe the problem might be that we terminate the Flink
>>>>>>>>>>>> process with a non-zero exit code if the job reaches the
>>>>>>>>>>>> ApplicationStatus.FAILED [3].
>>>>>>>>>>>>
>>>>>>>>>>>> cc Yang Wang have you observed a similar behavior when running
>>>>>>>>>>>> Flink in per-job mode on K8s?
>>>>>>>>>>>>
>>>>>>>>>>>> [1]
>>>>>>>>>>>> https://ci.apache.org/projects/flink/flink-docs-release-1.11/ops/deployment/native_kubernetes.html#flink-kubernetes-application
>>>>>>>>>>>> [2]
>>>>>>>>>>>> https://ci.apache.org/projects/flink/flink-docs-release-1.11/ops/deployment/kubernetes.html#job-cluster-resource-definitions
>>>>>>>>>>>> [3]
>>>>>>>>>>>> https://github.com/apache/flink/blob/master/flink-runtime/src/main/java/org/apache/flink/runtime/clusterframework/ApplicationStatus.java#L32
>>>>>>>>>>>>
>>>>>>>>>>>> On Fri, Jul 31, 2020 at 6:26 PM Eleanore Jin <
>>>>>>>>>>>> [hidden email]> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi Experts,
>>>>>>>>>>>>>
>>>>>>>>>>>>> I have a flink cluster (per job mode) running on kubernetes.
>>>>>>>>>>>>> The job is configured with restart strategy
>>>>>>>>>>>>>
>>>>>>>>>>>>> restart-strategy.fixed-delay.attempts: 3
>>>>>>>>>>>>> restart-strategy.fixed-delay.delay: 10 s
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> So after 3 times retry, the job will be marked as FAILED,
>>>>>>>>>>>>> hence the pods are not running. However, kubernetes will then restart the
>>>>>>>>>>>>> job again as the available replicas do not match the desired one.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I wonder what are the suggestions for such a scenario? How
>>>>>>>>>>>>> should I configure the flink job running on k8s?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks a lot!
>>>>>>>>>>>>> Eleanore
>>>>>>>>>>>>>
>>>>>>>>>>>>