Choosing a monitoring metric for alerting when a Flink job goes down



bradyMk
CONTENTS DELETED
The author has deleted this message.

Re: Choosing a monitoring metric for alerting when a Flink job goes down

hechuan
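You can configure an alert on job restarts: when a Flink job fails, it automatically tries to restart.
If you run a fixed number of jobs, you can also configure an alert on the number of slots.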
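The two suggestions above can be sketched as alert predicates over scraped samples. This is a minimal Python sketch, not Flink or Grafana code. It assumes the restart count is available as a growing series (recent Flink versions expose a `numRestarts` job metric, which the Prometheus reporter would likely publish as `flink_jobmanager_job_numRestarts`; check your version), and that slot usage is derived from the JobManager's `taskSlotsTotal` and `taskSlotsAvailable` gauges. The helper names and the `expected_used` parameter are hypothetical.

```python
from typing import List, Tuple

# One scraped sample: (unix timestamp in seconds, metric value).
Sample = Tuple[float, float]


def counter_increase(samples: List[Sample], window_s: float, now: float) -> float:
    """Sum the growth of a counter over the samples in [now - window_s, now].

    A value lower than its predecessor is treated as a counter reset, so the
    new value itself counts as growth (loosely like PromQL's increase(),
    but without its extrapolation).
    """
    values = [v for t, v in samples if now - window_s <= t <= now]
    growth = 0.0
    for prev, cur in zip(values, values[1:]):
        growth += cur - prev if cur >= prev else cur
    return growth


def restart_alert(restart_samples: List[Sample], window_s: float, now: float) -> bool:
    """Fire if the job's restart counter grew at all inside the window."""
    return counter_increase(restart_samples, window_s, now) > 0


def slot_alert(slots_total: int, slots_available: int, expected_used: int) -> bool:
    """Fire if fewer slots are in use than a fixed set of jobs should occupy
    (expected_used is a hypothetical, deployment-specific constant)."""
    return slots_total - slots_available < expected_used


if __name__ == "__main__":
    restarts = [(0, 0), (30, 0), (60, 1)]  # hypothetical scrape history
    print(restart_alert(restarts, window_s=60, now=60))
    print(slot_alert(slots_total=8, slots_available=6, expected_used=4))
```

The slot check only makes sense when the job set is fixed, as the reply says: a crashed job releases its slots, so the number of used slots drops below what the fixed deployment should hold.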



On 2020-11-05 10:15:01, "bradyMk" <[hidden email]> wrote:

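>Hi everyone, I've built Flink monitoring on Grafana + Prometheus, and I'd like Grafana to send an alert as soon as a Flink job goes down, but I'm not sure which metric to monitor. I previously tried monitoring flink_jobmanager_job_uptime, with a rule that alerts when
>max_over_time(flink_jobmanager_job_uptime[1m]) - min_over_time(flink_jobmanager_job_uptime[1m])
>is less than or equal to 0. However, right after a job starts this rule produces false alarms. Is there a better approach?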
>
>
>
>-----
>Best Wishes
>--
>Sent from: http://apache-flink.147419.n8.nabble.com/
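The false alarm described in the question can be reproduced with a small Python sketch of the rule. Right after submission the 1-minute window may hold a single sample, or only samples where uptime is still 0 while the job is being deployed, so max minus min is 0 and the naive rule fires. The guarded version adds two hypothetical checks: a startup grace period (`grace_s`, measured from when the series was first seen) and a minimum of two samples in the window. It assumes uptime stays at 0 or stops growing while a job is down or restarting; a job that disappears from the metrics entirely is not caught by either version.

```python
from typing import List, Tuple

Sample = Tuple[float, float]  # (unix timestamp in seconds, uptime in ms)


def window_values(samples: List[Sample], window_s: float, now: float) -> List[float]:
    """Values scraped inside [now - window_s, now], like a PromQL range selector."""
    return [v for t, v in samples if now - window_s <= t <= now]


def naive_stalled(samples: List[Sample], window_s: float, now: float) -> bool:
    """The rule from the question: max_over_time - min_over_time <= 0."""
    values = window_values(samples, window_s, now)
    return bool(values) and max(values) - min(values) <= 0


def guarded_stalled(samples: List[Sample], window_s: float, now: float,
                    first_seen: float, grace_s: float) -> bool:
    """Same rule, suppressed during a startup grace period and when the
    window holds fewer than two samples to compare."""
    if now - first_seen < grace_s:
        return False  # the job was only just submitted; give it time
    values = window_values(samples, window_s, now)
    if len(values) < 2:
        return False  # one point cannot show whether uptime is growing
    return max(values) - min(values) <= 0


if __name__ == "__main__":
    startup = [(0, 0)]  # first scrape right after submission
    print(naive_stalled(startup, 60, 0))  # the false alarm
    print(guarded_stalled(startup, 60, 0, first_seen=0, grace_s=120))
    stuck = [(t, 0) for t in range(200, 261, 15)]  # uptime stuck at 0
    print(guarded_stalled(stuck, 60, 260, first_seen=0, grace_s=120))
```

A pending period on the alert rule (the `for` duration in Prometheus alerting rules, or the equivalent setting in Grafana alerts) can have a similar effect to the grace period, since a short stall during startup then has to persist before the alert fires.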

Re: Re: Choosing a monitoring metric for alerting when a Flink job goes down

bradyMk
CONTENTS DELETED
The author has deleted this message.