Recurring exception: resubmitting a job after stopping it causes the JobManager process to fail.


nobleyd
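As the title says: standalone, 1.12.
So far this does not look like a problem with stopping or starting the job itself. It looks more like these two operations put heavy load on the JM, or something along those lines. The error is as follows: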

2021-03-03 10:14:51,298 ERROR org.apache.flink.runtime.util.FatalExitExceptionHandler [] - FATAL: Thread 'cluster-io-thread-3' produced an uncaught exception. Stopping the process...

java.util.concurrent.RejectedExecutionException: Task java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask@422cfccb rejected from java.util.concurrent.ScheduledThreadPoolExecutor@6709b3f5[Terminated, pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 2304]
    at java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2063) ~[?:1.8.0_251]
    at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:830) ~[?:1.8.0_251]
    at java.util.concurrent.ScheduledThreadPoolExecutor.delayedExecute(ScheduledThreadPoolExecutor.java:326) ~[?:1.8.0_251]
    at java.util.concurrent.ScheduledThreadPoolExecutor.schedule(ScheduledThreadPoolExecutor.java:533) ~[?:1.8.0_251]
    at java.util.concurrent.ScheduledThreadPoolExecutor.execute(ScheduledThreadPoolExecutor.java:622) ~[?:1.8.0_251]
    at java.util.concurrent.Executors$DelegatedExecutorService.execute(Executors.java:668) ~[?:1.8.0_251]
    at org.apache.flink.runtime.concurrent.ScheduledExecutorServiceAdapter.execute(ScheduledExecutorServiceAdapter.java:62) ~[flink-dist_2.11-1.12.0.jar:1.12.0]
    at org.apache.flink.runtime.checkpoint.CheckpointCoordinator.scheduleTriggerRequest(CheckpointCoordinator.java:1152) ~[flink-dist_2.11-1.12.0.jar:1.12.0]
    at org.apache.flink.runtime.checkpoint.CheckpointsCleaner.lambda$cleanCheckpoint$0(CheckpointsCleaner.java:58) ~[flink-dist_2.11-1.12.0.jar:1.12.0]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_251]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_251]
    at java.lang.Thread.run(Thread.java:748) [?:1.8.0_251]
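
For reference, the JDK-level part of this trace can be reproduced in isolation. The sketch below is not Flink code and makes no claim about why the executor was already terminated in the JM. It only shows, under the assumption that a scheduled executor is shut down before a late caller submits to it, that `execute` throws `RejectedExecutionException` through the same `DelegatedExecutorService` -> `ScheduledThreadPoolExecutor` -> `AbortPolicy` path. The class and method names (`RejectedAfterShutdownDemo`, `isRejectedAfterShutdown`) are made up for illustration.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ScheduledExecutorService;

public class RejectedAfterShutdownDemo {

    /** Returns true if submitting to an already shut-down executor is rejected. */
    static boolean isRejectedAfterShutdown() {
        // Stand-in for a timer executor owned by some component (hypothetical).
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();

        // The owner shuts the executor down, e.g. because the job was stopped.
        timer.shutdownNow();

        try {
            // A late callback from another thread still tries to schedule work.
            timer.execute(() -> System.out.println("never runs"));
            return false;
        } catch (RejectedExecutionException e) {
            // The default AbortPolicy throws instead of silently dropping the task.
            System.out.println("rejected: " + e.getMessage());
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(isRejectedAfterShutdown());
    }
}
```

In the log above the exception is not caught on 'cluster-io-thread-3', so FatalExitExceptionHandler stops the whole process. In this sketch it is caught only to make the behavior observable.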

Re: Recurring exception: resubmitting a job after stopping it causes the JobManager process to fail.

nobleyd
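This problem is probably also related to another issue I reported earlier. In practice there is one more characteristic:
After the job is submitted, it stays in the initialize phase for a long time, and the web UI starts to hang with a spinner and cannot show the actual status. After a while it recovers (during this period one of the JM processes fails and is restarted automatically by our own script mechanism). That is one symptom. The other is that, after staying in the initialize phase for a long time and then recovering, several identical jobs in the initialize phase appear (as seen in the web UI), their number gradually drops to one, and in the end only one job runs successfully in the running state.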
