JobManager log anomaly


JobManager log anomaly

戴嘉诚
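Hi all,


I run Flink as a session cluster on YARN. This morning the JobManager exited on its own and YARN restarted it, which caused the jobs running in it to restart. However, the JobManager log shows no exceptions, and the JobManager had neither long full GCs nor frequent GCs. Below is the JobManager log.
At 06:44 the log records that a shutdown request was received, and then the JobManager simply stopped. What could have caused this?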

    2019-08-06 06:43:58,891 INFO  org.apache.flink.runtime.checkpoint.CheckpointCoordinator     - Completed checkpoint 7843 for job e49624208fe771c4c9527799fd46f2a3 (5645215 bytes in 801 ms).
    2019-08-06 06:43:59,336 INFO  org.apache.flink.runtime.checkpoint.CheckpointCoordinator     - Triggering checkpoint 7852 @ 1565045039321 for job a9a7464ead55474bea6f42ed8e5de60f.
    2019-08-06 06:44:00,971 INFO  org.apache.flink.runtime.checkpoint.CheckpointCoordinator     - Triggering checkpoint 7852 @ 1565045040957 for job 79788b218e684cb31c1ca0fcc641e89f.
    2019-08-06 06:44:01,357 INFO  org.apache.flink.runtime.checkpoint.CheckpointCoordinator     - Completed checkpoint 7852 for job a9a7464ead55474bea6f42ed8e5de60f (25870658 bytes in 1806 ms).
    2019-08-06 06:44:02,887 INFO  org.apache.flink.runtime.checkpoint.CheckpointCoordinator     - Completed checkpoint 7852 for job 79788b218e684cb31c1ca0fcc641e89f (29798945 bytes in 1849 ms).
    2019-08-06 06:44:05,101 INFO  org.apache.flink.runtime.checkpoint.CheckpointCoordinator     - Triggering checkpoint 7852 @ 1565045045092 for job 03f3a0bd53c21f90f70ea01916dc9f78.
    2019-08-06 06:44:06,547 INFO  org.apache.flink.runtime.checkpoint.CheckpointCoordinator     - Triggering checkpoint 7844 @ 1565045046522 for job 486a1949d75863f823013d87b509d228.
    2019-08-06 06:44:07,311 INFO  org.apache.flink.runtime.checkpoint.CheckpointCoordinator     - Completed checkpoint 7844 for job 486a1949d75863f823013d87b509d228 (62458942 bytes in 736 ms).
    2019-08-06 06:44:07,506 INFO  org.apache.flink.runtime.checkpoint.CheckpointCoordinator     - Completed checkpoint 7852 for job 03f3a0bd53c21f90f70ea01916dc9f78 (105565032 bytes in 2366 ms).
    2019-08-06 06:44:08,087 INFO  org.apache.flink.runtime.checkpoint.CheckpointCoordinator     - Triggering checkpoint 7853 @ 1565045048055 for job 32783d371464265ef536454055ae6182.
    2019-08-06 06:44:09,626 INFO  org.apache.flink.runtime.checkpoint.CheckpointCoordinator     - Checkpoint 7050 of job 4b542195824ff7b7cdf749543fd368cb expired before completing.
    2019-08-06 06:44:09,647 INFO  org.apache.flink.runtime.checkpoint.CheckpointCoordinator     - Triggering checkpoint 7051 @ 1565045049626 for job 4b542195824ff7b7cdf749543fd368cb.
    2019-08-06 06:44:12,006 INFO  org.apache.flink.runtime.checkpoint.CheckpointCoordinator     - Completed checkpoint 7853 for job 32783d371464265ef536454055ae6182 (299599482 bytes in 3912 ms).
    2019-08-06 06:44:12,972 INFO  org.apache.flink.runtime.checkpoint.CheckpointCoordinator     - Triggering checkpoint 7853 @ 1565045052962 for job 16db5afe9a8cd7c6278030d5dec4c80c.
    2019-08-06 06:44:13,109 INFO  org.apache.flink.runtime.checkpoint.CheckpointCoordinator     - Triggering checkpoint 7853 @ 1565045053080 for job 9c1394a2d2ff47c7852eff9f1f932535.
    2019-08-06 06:44:16,779 INFO  org.apache.flink.runtime.checkpoint.CheckpointCoordinator     - Completed checkpoint 7853 for job 16db5afe9a8cd7c6278030d5dec4c80c (152643149 bytes in 3666 ms).
    2019-08-06 06:44:18,598 INFO  org.apache.flink.runtime.checkpoint.CheckpointCoordinator     - Completed checkpoint 7828 for job 8df2b47f2a4c1ba0f7019ee5989f6e71 (837558245 bytes in 23472 ms).
    2019-08-06 06:44:19,193 INFO  org.apache.flink.runtime.checkpoint.CheckpointCoordinator     - Completed checkpoint 7853 for job 9c1394a2d2ff47c7852eff9f1f932535 (594628825 bytes in 6067 ms).
    2019-08-06 06:44:19,238 INFO  org.apache.flink.runtime.checkpoint.CheckpointCoordinator     - Completed checkpoint 5855 for job 108ce7f6f5f3e76b12fad9dbdbc8feba (45917615 bytes in 61819 ms).
    2019-08-06 06:44:19,248 INFO  org.apache.flink.runtime.checkpoint.CheckpointCoordinator     - Triggering checkpoint 5856 @ 1565045059238 for job 108ce7f6f5f3e76b12fad9dbdbc8feba.
    2019-08-06 06:44:22,092 INFO  org.apache.flink.runtime.checkpoint.CheckpointCoordinator     - Triggering checkpoint 7802 @ 1565045062084 for job 430689e0f202fcb29ce9d6403e6825f9.
    2019-08-06 06:44:22,838 INFO  org.apache.flink.runtime.checkpoint.CheckpointCoordinator     - Completed checkpoint 2940 for job fea51fd74006de69e265adc13e802229 (122562953 bytes in 174336 ms).
    2019-08-06 06:44:22,888 INFO  org.apache.flink.runtime.checkpoint.CheckpointCoordinator     - Triggering checkpoint 2941 @ 1565045062838 for job fea51fd74006de69e265adc13e802229.
    2019-08-06 06:44:24,348 INFO  org.apache.flink.runtime.checkpoint.CheckpointCoordinator     - Triggering checkpoint 613 @ 1565045064328 for job 5a75d77312f29c714af0a2994f0e8b1a.
    2019-08-06 06:44:25,327 INFO  org.apache.flink.runtime.checkpoint.CheckpointCoordinator     - Completed checkpoint 7802 for job 430689e0f202fcb29ce9d6403e6825f9 (358649788 bytes in 2788 ms).
    2019-08-06 06:44:25,769 INFO  org.apache.flink.runtime.checkpoint.CheckpointCoordinator     - Completed checkpoint 613 for job 5a75d77312f29c714af0a2994f0e8b1a (583594 bytes in 1341 ms).
    2019-08-06 06:44:27,547 INFO  org.apache.flink.runtime.checkpoint.CheckpointCoordinator     - Triggering checkpoint 7844 @ 1565045067534 for job fb32bbf35ed002961b9dfb1417799ae6.
    2019-08-06 06:44:28,738 INFO  org.apache.flink.runtime.checkpoint.CheckpointCoordinator     - Completed checkpoint 7844 for job fb32bbf35ed002961b9dfb1417799ae6 (11017757 bytes in 1178 ms).
    2019-08-06 06:44:37,576 INFO  org.apache.flink.runtime.checkpoint.CheckpointCoordinator     - Triggering checkpoint 7853 @ 1565045077573 for job d73c940cf0a996e12ecb93a146f93293.
    2019-08-06 06:44:38,167 INFO  org.apache.flink.runtime.checkpoint.CheckpointCoordinator     - Completed checkpoint 7853 for job d73c940cf0a996e12ecb93a146f93293 (123726 bytes in 562 ms).
    2019-08-06 06:44:45,957 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint         - RECEIVED SIGNAL 15: SIGTERM. Shutting down as requested.
    2019-08-06 06:44:45,957 INFO  org.apache.flink.runtime.blob.TransientBlobCache              - Shutting down BLOB cache
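A minimal Python sketch for checking a full JobManager log file the same way, assuming the file on disk has one entry per line. It finds the `RECEIVED SIGNAL` entry and counts entries per level, so that the absence of WARN/ERROR entries, as in the excerpt above, is easy to confirm. `find_shutdown_signal`, `count_by_level` and the script name are hypothetical helpers, not part of Flink.

```python
import re
from typing import Iterable, Optional, Tuple

# Leading "yyyy-MM-dd HH:mm:ss,SSS LEVEL" part of each JobManager log entry.
PREFIX_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) (\w+)")
# Text that ClusterEntrypoint logs when the process receives a signal.
SIGNAL_RE = re.compile(r"RECEIVED SIGNAL (\d+): (\w+)")


def find_shutdown_signal(lines: Iterable[str]) -> Optional[Tuple[str, int, str]]:
    """Return (timestamp, signal number, signal name) for the first
    'RECEIVED SIGNAL' entry, or None if the log has no such entry."""
    for line in lines:
        sig = SIGNAL_RE.search(line)
        if sig:
            prefix = PREFIX_RE.match(line)
            timestamp = prefix.group(1) if prefix else ""
            return timestamp, int(sig.group(1)), sig.group(2)
    return None


def count_by_level(lines: Iterable[str]) -> dict:
    """Count entries per log level (INFO, WARN, ERROR, ...)."""
    counts: dict = {}
    for line in lines:
        prefix = PREFIX_RE.match(line)
        if prefix:
            level = prefix.group(2)
            counts[level] = counts.get(level, 0) + 1
    return counts


if __name__ == "__main__":
    import sys

    # Usage (hypothetical script name): python scan_jm_log.py jobmanager.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as f:
        lines = f.readlines()
    print("shutdown signal:", find_shutdown_signal(lines))
    print("entries per level:", count_by_level(lines))
```

A log that contains only INFO entries and ends with a SIGTERM entry indicates the shutdown was requested from outside the JobManager process (for example by YARN or an operator), so the reason has to be looked for in the logs of whatever sent the signal.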

Re: JobManager log anomaly

Wong Victor
Hi,
  You could check the YARN logs on the node where the JobManager runs and search for why the corresponding container was killed.
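A rough Python sketch of that search, assuming you have the NodeManager log of the JobManager's host as a plain file (its location depends on the Hadoop installation) and know the JobManager's container ID. The container-ID pattern and the keyword list are assumptions; check them against the messages your Hadoop version actually writes. `kill_evidence` is a hypothetical helper.

```python
import re
from typing import Iterable, List

# Assumed YARN container ID shape:
# container_[e<epoch>_]<clusterTimestamp>_<appId>_<attempt>_<containerNum>
CONTAINER_ID_RE = re.compile(r"container_(?:e\d+_)?\d+_\d+_\d+_\d+")

# Hypothetical keywords; adjust to what your NodeManager actually logs.
DEFAULT_KEYWORDS = ("kill", "exit code", "beyond physical memory", "beyond virtual memory")


def is_container_id(text: str) -> bool:
    """True if `text` looks like a YARN container ID."""
    return CONTAINER_ID_RE.fullmatch(text) is not None


def kill_evidence(lines: Iterable[str], container_id: str,
                  keywords: Iterable[str] = DEFAULT_KEYWORDS) -> List[str]:
    """Return log lines that mention `container_id` and any keyword
    (case-insensitive), in their original order."""
    if not is_container_id(container_id):
        raise ValueError(f"not a YARN container id: {container_id!r}")
    kws = [k.lower() for k in keywords]
    hits = []
    for line in lines:
        if container_id in line and any(k in line.lower() for k in kws):
            hits.append(line.rstrip("\n"))
    return hits


if __name__ == "__main__":
    import sys

    # Usage (hypothetical script name):
    #   python kill_evidence.py <nodemanager log file> <container id>
    path, cid = sys.argv[1], sys.argv[2]
    with open(path, encoding="utf-8", errors="replace") as f:
        for hit in kill_evidence(f, cid):
            print(hit)
```

Matching on the container ID first keeps the output limited to the JobManager's container; widening the keyword list trades precision for recall.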

Regards

(In reply to 戴嘉诚, 2019/8/6, 11:40 AM.)


Re: JobManager log anomaly

Biao Liu
Hello,

> org.apache.flink.runtime.entrypoint.ClusterEntrypoint         - RECEIVED SIGNAL 15: SIGTERM. Shutting down as requested.

This means the process received signal 15 (SIGTERM) [1]. Wong is right: search the YARN NodeManager or YARN ResourceManager logs.

1. https://access.redhat.com/solutions/737033
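For reference, a small Python sketch of what that log line means: signal 15 is SIGTERM, the "please terminate" signal that `kill <pid>` sends by default, and a process killed by signal N is conventionally reported with exit status 128 + N (143 for SIGTERM, a value that commonly shows up in YARN container diagnostics). The `ShutdownFlag` class only imitates the shape of Flink's log message; it is not Flink's implementation. `signal.raise_signal` requires Python 3.8+.

```python
import logging
import signal

log = logging.getLogger("entrypoint-sketch")


def describe_signal(signum: int) -> str:
    """Map a signal number to its name, e.g. 15 -> 'SIGTERM' on Linux."""
    return signal.Signals(signum).name


def shell_exit_status(signum: int) -> int:
    """Exit status a shell reports for a process killed by `signum` (128 + N)."""
    return 128 + signum


class ShutdownFlag:
    """Records that a termination signal arrived so the main loop can stop cleanly."""

    def __init__(self) -> None:
        self.requested = False
        self.signum = None

    def handler(self, signum, frame) -> None:
        # Imitates the wording of Flink's log line; this is NOT Flink's code.
        log.info("RECEIVED SIGNAL %d: %s. Shutting down as requested.",
                 signum, describe_signal(signum))
        self.requested = True
        self.signum = signum


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    flag = ShutdownFlag()
    signal.signal(signal.SIGTERM, flag.handler)
    # Simulate what `kill <pid>` (or a resource manager) does: send SIGTERM.
    signal.raise_signal(signal.SIGTERM)
    print(flag.requested, flag.signum, shell_exit_status(signal.SIGTERM))
```

Because SIGTERM can be caught, the JobManager gets to log the signal and shut down its services, which is why the log ends cleanly instead of with a stack trace.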

Thanks,
Biao /'bɪ.aʊ/



(In reply to Wong Victor, Tue, Aug 6, 2019, 12:30 PM.)

Re: JobManager log anomaly

戴嘉诚
Hello,
        Thanks! I've found the cause.

(In reply to Biao Liu on user-zh, August 6, 2019, 13:55.)