the remote task manager was lost

guanxianchun
Flink version: flink-1.11
taskmanager memory: 8G
jobmanager memory: 2G
akka.ask.timeout:20s
akka.retry-gate-closed-for: 5000
client.timeout:600s

After running for a while, the job reports "the remote task manager was lost". The error log is as follows:
2020-10-28 00:25:30,608 INFO
org.apache.flink.runtime.checkpoint.CheckpointCoordinator    [] - Completed
checkpoint 411 for job 031e5f122711786fcc11ee6eb47291fa (2703770 bytes in
336 ms).
2020-10-28 00:27:30,273 INFO
org.apache.flink.runtime.checkpoint.CheckpointCoordinator    [] - Triggering
checkpoint 412 (type=CHECKPOINT) @ 1603816050239 for job
031e5f122711786fcc11ee6eb47291fa.
2020-10-28 00:27:30,776 INFO
org.apache.flink.runtime.checkpoint.CheckpointCoordinator    [] - Completed
checkpoint 412 for job 031e5f122711786fcc11ee6eb47291fa (3466688 bytes in
509 ms).
2020-10-28 00:29:30,246 INFO
org.apache.flink.runtime.checkpoint.CheckpointCoordinator    [] - Triggering
checkpoint 413 (type=CHECKPOINT) @ 1603816170239 for job
031e5f122711786fcc11ee6eb47291fa.
2020-10-28 00:29:30,597 INFO
org.apache.flink.runtime.checkpoint.CheckpointCoordinator    [] - Completed
checkpoint 413 for job 031e5f122711786fcc11ee6eb47291fa (2752681 bytes in
334 ms).
2020-10-28 00:29:47,353 WARN  akka.remote.ReliableDeliverySupervisor                      
[] - Association with remote system
[akka.tcp://[hidden email]:13912] has failed, address is now
gated for [5000] ms. Reason: [Disassociated]
2020-10-28 00:29:47,353 WARN  akka.remote.ReliableDeliverySupervisor                      
[] - Association with remote system
[akka.tcp://[hidden email]:31260] has failed, address is
now gated for [5000] ms. Reason: [Disassociated]
2020-10-28 00:29:47,377 INFO
org.apache.flink.runtime.executiongraph.ExecutionGraph       [] -
KeyedProcess -> async wait operator -> Map (1/3)
(f84731e57528b326ad15ddc17821d1b8) switched from RUNNING to FAILED on
org.apache.flink.runtime.jobmaster.slotpool.SingleLogicalSlot@538198b8.
org.apache.flink.runtime.io.network.netty.exception.RemoteTransportException:
Connection unexpectedly closed by remote task manager
'hadoop01.dev.test.cn/192.168.1.21:7527'. This might indicate that the
remote task manager was lost.
        at
org.apache.flink.runtime.io.network.netty.CreditBasedPartitionRequestClientHandler.channelInactive(CreditBasedPartitionRequestClientHandler.java:144)
~[flink-dist_2.11-1.11.1.jar:1.11.1]
        at
org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:257)
~[flink-dist_2.11-1.11.1.jar:1.11.1]
        at
org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:243)
~[flink-dist_2.11-1.11.1.jar:1.11.1]
        at
org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:236)
~[flink-dist_2.11-1.11.1.jar:1.11.1]
        at
org.apache.flink.shaded.netty4.io.netty.channel.ChannelInboundHandlerAdapter.channelInactive(ChannelInboundHandlerAdapter.java:81)
~[flink-dist_2.11-1.11.1.jar:1.11.1]
        at
org.apache.flink.runtime.io.network.netty.NettyMessageClientDecoderDelegate.channelInactive(NettyMessageClientDecoderDelegate.java:97)
~[flink-dist_2.11-1.11.1.jar:1.11.1]
        at
org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:257)
~[flink-dist_2.11-1.11.1.jar:1.11.1]
        at
org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:243)
~[flink-dist_2.11-1.11.1.jar:1.11.1]
        at
org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:236)
~[flink-dist_2.11-1.11.1.jar:1.11.1]
        at
org.apache.flink.shaded.netty4.io.netty.channel.DefaultChannelPipeline$HeadContext.channelInactive(DefaultChannelPipeline.java:1416)
~[flink-dist_2.11-1.11.1.jar:1.11.1]
        at
org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:257)
~[flink-dist_2.11-1.11.1.jar:1.11.1]
        at
org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:243)
~[flink-dist_2.11-1.11.1.jar:1.11.1]
        at
org.apache.flink.shaded.netty4.io.netty.channel.DefaultChannelPipeline.fireChannelInactive(DefaultChannelPipeline.java:912)
~[flink-dist_2.11-1.11.1.jar:1.11.1]
        at
org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannel$AbstractUnsafe$8.run(AbstractChannel.java:816)
~[flink-dist_2.11-1.11.1.jar:1.11.1]
        at
org.apache.flink.shaded.netty4.io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:163)
~[flink-dist_2.11-1.11.1.jar:1.11.1]
        at
org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:416)
~[flink-dist_2.11-1.11.1.jar:1.11.1]
        at
org.apache.flink.shaded.netty4.io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:331)
~[flink-dist_2.11-1.11.1.jar:1.11.1]
        at
org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:918)
~[flink-dist_2.11-1.11.1.jar:1.11.1]
        at
org.apache.flink.shaded.netty4.io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
~[flink-dist_2.11-1.11.1.jar:1.11.1]
        at java.lang.Thread.run(Thread.java:748) ~[?:1.8.0_131]
2020-10-28 00:29:47,442 INFO
org.apache.flink.runtime.executiongraph.failover.flip1.RestartPipelinedRegionFailoverStrategy
[] - Calculating tasks to restart to recover the failed task
abf129c3bc11e5b145c2f3103110a0b2_0.
2020-10-28 00:29:47,443 INFO
org.apache.flink.runtime.executiongraph.failover.flip1.RestartPipelinedRegionFailoverStrategy
[] - 19 tasks should be restarted to recover the failed task
abf129c3bc11e5b145c2f3103110a0b2_0.
2020-10-28 00:29:47,444 INFO
org.apache.flink.runtime.executiongraph.ExecutionGraph       [] - Job
static_order_gmv_by_paydate (031e5f122711786fcc11ee6eb47291fa) switched from
state RUNNING to RESTARTING.
2020-10-28 00:29:47,445 INFO
org.apache.flink.runtime.executiongraph.ExecutionGraph       [] - Sink:
Unnamed (1/1) (c9d8de20cf8d58d3cd5e9f2dfadd7b70) switched from RUNNING to
CANCELING.
2020-10-28 00:29:47,447 INFO
org.apache.flink.runtime.executiongraph.ExecutionGraph       [] - Source:
Custom Source -> Flat Map -> Timestamps/Watermarks (2/3)
(828066cde4cda22eb4756366eafac229) switched from RUNNING to CANCELING.
2020-10-28 00:29:47,447 INFO
org.apache.flink.runtime.executiongraph.ExecutionGraph       [] - Source:
Custom Source -> Flat Map -> Timestamps/Watermarks (1/3)
(ae5e40830a57bbd118db2f8ee86a00ae) switched from RUNNING to CANCELING.
2020-10-28 00:29:47,447 INFO
org.apache.flink.runtime.executiongraph.ExecutionGraph       [] -
KeyedProcess -> async wait operator -> Map (2/3)
(70eb6b6d5a363910f8fd808024d68b8a) switched from RUNNING to CANCELING.
2020-10-28 00:29:47,447 INFO
org.apache.flink.runtime.executiongraph.ExecutionGraph       [] -
KeyedProcess -> async wait operator -> Map (3/3)
(a42963633bf0a142c082ec0e424666b3) switched from RUNNING to CANCELING.
2020-10-28 00:29:47,448 INFO
org.apache.flink.runtime.executiongraph.ExecutionGraph       [] - Source:
Custom Source -> Flat Map -> Timestamps/Watermarks (3/3)
(591b6fa2ad487cc2fe91cb9ac5a0d19e) switched from RUNNING to CANCELING.
2020-10-28 00:29:47,448 INFO
org.apache.flink.runtime.executiongraph.ExecutionGraph       [] -
KeyedProcess (1/3) (9a35a07b539502ec2d23ec35d3d507db) switched from RUNNING
to CANCELING.
2020-10-28 00:29:47,448 INFO
org.apache.flink.runtime.executiongraph.ExecutionGraph       [] -
KeyedProcess (2/3) (82734fa6851b2dcd769b34f7d8d1afaa) switched from RUNNING
to CANCELING.
2020-10-28 00:29:47,448 INFO
org.apache.flink.runtime.executiongraph.ExecutionGraph       [] -
KeyedProcess (3/3) (f13b2ef5feba6b65ad276cf87bdf2218) switched from RUNNING
to CANCELING.
2020-10-28 00:29:47,448 INFO
org.apache.flink.runtime.executiongraph.ExecutionGraph       [] - Co-Flat
Map (2/3) (6442e15db194a591c32a821e18198686) switched from RUNNING to
CANCELING.
2020-10-28 00:29:47,448 INFO
org.apache.flink.runtime.executiongraph.ExecutionGraph       [] - Co-Flat
Map (1/3) (6961b6cff72d1c41d8345944d246b433) switched from RUNNING to
CANCELING.
2020-10-28 00:29:47,448 INFO
org.apache.flink.runtime.executiongraph.ExecutionGraph       [] - Co-Flat
Map (3/3) (41edc64886544d8a542b23074c99f614) switched from RUNNING to
CANCELING.
2020-10-28 00:29:47,448 INFO
org.apache.flink.runtime.executiongraph.ExecutionGraph       [] -
Window(TumblingEventTimeWindows(120000), EventTimeTrigger,
ViewAggregateFunction, ViewSumWindowFunction) (1/3)
(e9bd1a3fb4f3d0786831a439189e6240) switched from RUNNING to CANCELING.
2020-10-28 00:29:47,448 INFO
org.apache.flink.runtime.executiongraph.ExecutionGraph       [] -
KeyedProcess (1/3) (057233f7fa678b0a54e5c3d682caab24) switched from RUNNING
to CANCELING.
2020-10-28 00:29:47,448 INFO
org.apache.flink.runtime.executiongraph.ExecutionGraph       [] -
KeyedProcess (2/3) (11cde122ba8a22ef37269c8cd051e079) switched from RUNNING
to CANCELING.
2020-10-28 00:29:47,448 INFO
org.apache.flink.runtime.executiongraph.ExecutionGraph       [] -
Window(TumblingEventTimeWindows(120000), EventTimeTrigger,
ViewAggregateFunction, ViewSumWindowFunction) (3/3)
(40b1bb8ce62b6b2062dc68bd63c2f60a) switched from RUNNING to CANCELING.
2020-10-28 00:29:47,449 INFO
org.apache.flink.runtime.executiongraph.ExecutionGraph       [] -
Window(TumblingEventTimeWindows(120000), EventTimeTrigger,
ViewAggregateFunction, ViewSumWindowFunction) (2/3)
(88e1242700ba1d5a9cba5c466f51cac2) switched from RUNNING to CANCELING.
2020-10-28 00:29:47,449 INFO
org.apache.flink.runtime.executiongraph.ExecutionGraph       [] -
KeyedProcess (3/3) (e67f3c240663d5949872fa5988568e40) switched from RUNNING
to CANCELING.
2020-10-28 00:29:47,452 INFO
org.apache.flink.runtime.executiongraph.ExecutionGraph       [] -
Window(TumblingEventTimeWindows(120000), EventTimeTrigger,
ViewAggregateFunction, ViewSumWindowFunction) (1/3)
(e9bd1a3fb4f3d0786831a439189e6240) switched from CANCELING to CANCELED.
2020-10-28 00:29:47,452 INFO
org.apache.flink.runtime.executiongraph.ExecutionGraph       [] - Discarding
the results produced by task execution e9bd1a3fb4f3d0786831a439189e6240.
2020-10-28 00:29:47,457 INFO
org.apache.flink.runtime.executiongraph.ExecutionGraph       [] - Discarding
the results produced by task execution e9bd1a3fb4f3d0786831a439189e6240.
2020-10-28 00:29:47,459 INFO
org.apache.flink.runtime.executiongraph.ExecutionGraph       [] -
KeyedProcess (1/3) (057233f7fa678b0a54e5c3d682caab24) switched from
CANCELING to CANCELED.
2020-10-28 00:29:47,459 INFO
org.apache.flink.runtime.executiongraph.ExecutionGraph       [] - Discarding
the results produced by task execution 057233f7fa678b0a54e5c3d682caab24.
2020-10-28 00:29:47,460 INFO
org.apache.flink.runtime.executiongraph.ExecutionGraph       [] - Discarding
the results produced by task execution 057233f7fa678b0a54e5c3d682caab24.
2020-10-28 00:29:47,460 INFO
org.apache.flink.runtime.executiongraph.ExecutionGraph       [] - Co-Flat
Map (1/3) (6961b6cff72d1c41d8345944d246b433) switched from CANCELING to
CANCELED.
2020-10-28 00:29:47,460 INFO
org.apache.flink.runtime.executiongraph.ExecutionGraph       [] - Discarding
the results produced by task execution 6961b6cff72d1c41d8345944d246b433.
2020-10-28 00:29:47,460 INFO
org.apache.flink.runtime.executiongraph.ExecutionGraph       [] - Discarding
the results produced by task execution 6961b6cff72d1c41d8345944d246b433.
2020-10-28 00:29:47,461 INFO
org.apache.flink.runtime.executiongraph.ExecutionGraph       [] -
KeyedProcess (1/3) (9a35a07b539502ec2d23ec35d3d507db) switched from
CANCELING to CANCELED.
2020-10-28 00:29:47,461 INFO
org.apache.flink.runtime.executiongraph.ExecutionGraph       [] - Discarding
the results produced by task execution 9a35a07b539502ec2d23ec35d3d507db.
2020-10-28 00:29:47,461 INFO
org.apache.flink.runtime.executiongraph.ExecutionGraph       [] - Discarding
the results produced by task execution 9a35a07b539502ec2d23ec35d3d507db.
2020-10-28 00:29:47,517 INFO
org.apache.flink.runtime.executiongraph.ExecutionGraph       [] - Source:
Custom Source -> Flat Map -> Timestamps/Watermarks (3/3)
(591b6fa2ad487cc2fe91cb9ac5a0d19e) switched from CANCELING to CANCELED.
2020-10-28 00:29:47,566 INFO
org.apache.flink.runtime.executiongraph.ExecutionGraph       [] - Sink:
Unnamed (1/1) (c9d8de20cf8d58d3cd5e9f2dfadd7b70) switched from CANCELING to
CANCELED.
2020-10-28 00:29:47,567 INFO
org.apache.flink.runtime.executiongraph.ExecutionGraph       [] -
KeyedProcess (3/3) (f13b2ef5feba6b65ad276cf87bdf2218) switched from
CANCELING to CANCELED.
2020-10-28 00:29:47,568 INFO
org.apache.flink.runtime.executiongraph.ExecutionGraph       [] - Co-Flat
Map (3/3) (41edc64886544d8a542b23074c99f614) switched from CANCELING to
CANCELED.
2020-10-28 00:29:47,568 INFO
org.apache.flink.runtime.executiongraph.ExecutionGraph       [] -
KeyedProcess (3/3) (e67f3c240663d5949872fa5988568e40) switched from
CANCELING to CANCELED.
2020-10-28 00:29:47,569 INFO
org.apache.flink.runtime.executiongraph.ExecutionGraph       [] -
Window(TumblingEventTimeWindows(120000), EventTimeTrigger,
ViewAggregateFunction, ViewSumWindowFunction) (3/3)
(40b1bb8ce62b6b2062dc68bd63c2f60a) switched from CANCELING to CANCELED.
2020-10-28 00:29:47,570 INFO
org.apache.flink.runtime.executiongraph.ExecutionGraph       [] - Source:
Custom Source -> Flat Map -> Timestamps/Watermarks (1/3)
(ae5e40830a57bbd118db2f8ee86a00ae) switched from CANCELING to CANCELED.
2020-10-28 00:29:47,594 INFO
org.apache.flink.runtime.executiongraph.ExecutionGraph       [] -
KeyedProcess -> async wait operator -> Map (3/3)
(a42963633bf0a142c082ec0e424666b3) switched from CANCELING to CANCELED.
2020-10-28 00:29:50,845 INFO  org.apache.flink.yarn.YarnResourceManager                  
[] - Closing TaskExecutor connection
container_1591067037248_153639_01_000003 because: Container killed on
request. Exit code is 137
Container exited with a non-zero exit code 137
Killed by external signal

2020-10-28 00:29:50,846 INFO
org.apache.flink.runtime.executiongraph.ExecutionGraph       [] - Source:
Custom Source -> Flat Map -> Timestamps/Watermarks (2/3)
(828066cde4cda22eb4756366eafac229) switched from CANCELING to CANCELED.
2020-10-28 00:29:50,846 INFO
org.apache.flink.runtime.executiongraph.ExecutionGraph       [] - Discarding
the results produced by task execution 828066cde4cda22eb4756366eafac229.
2020-10-28 00:29:50,846 INFO
org.apache.flink.runtime.executiongraph.ExecutionGraph       [] - Discarding
the results produced by task execution 828066cde4cda22eb4756366eafac229.
2020-10-28 00:29:50,846 INFO
org.apache.flink.runtime.executiongraph.ExecutionGraph       [] -
KeyedProcess -> async wait operator -> Map (2/3)
(70eb6b6d5a363910f8fd808024d68b8a) switched from CANCELING to CANCELED.
2020-10-28 00:29:50,847 INFO
org.apache.flink.runtime.executiongraph.ExecutionGraph       [] - Discarding
the results produced by task execution 70eb6b6d5a363910f8fd808024d68b8a.
2020-10-28 00:29:50,847 INFO
org.apache.flink.runtime.executiongraph.ExecutionGraph       [] - Discarding
the results produced by task execution 70eb6b6d5a363910f8fd808024d68b8a.
2020-10-28 00:29:50,847 INFO
org.apache.flink.runtime.executiongraph.ExecutionGraph       [] -
KeyedProcess (2/3) (11cde122ba8a22ef37269c8cd051e079) switched from
CANCELING to CANCELED.
2020-10-28 00:29:50,847 INFO
org.apache.flink.runtime.executiongraph.ExecutionGraph       [] - Discarding
the results produced by task execution 11cde122ba8a22ef37269c8cd051e079.
2020-10-28 00:29:50,847 INFO
org.apache.flink.runtime.executiongraph.ExecutionGraph       [] - Discarding
the results produced by task execution 11cde122ba8a22ef37269c8cd051e079.
2020-10-28 00:29:50,847 INFO
org.apache.flink.runtime.executiongraph.ExecutionGraph       [] -
Window(TumblingEventTimeWindows(120000), EventTimeTrigger,
ViewAggregateFunction, ViewSumWindowFunction) (2/3)
(88e1242700ba1d5a9cba5c466f51cac2) switched from CANCELING to CANCELED.
2020-10-28 00:29:50,847 INFO
org.apache.flink.runtime.executiongraph.ExecutionGraph       [] - Discarding
the results produced by task execution 88e1242700ba1d5a9cba5c466f51cac2.
2020-10-28 00:29:50,847 INFO
org.apache.flink.runtime.executiongraph.ExecutionGraph       [] - Discarding
the results produced by task execution 88e1242700ba1d5a9cba5c466f51cac2.
2020-10-28 00:29:50,847 INFO
org.apache.flink.runtime.executiongraph.ExecutionGraph       [] - Co-Flat
Map (2/3) (6442e15db194a591c32a821e18198686) switched from CANCELING to
CANCELED.
2020-10-28 00:29:50,847 INFO
org.apache.flink.runtime.executiongraph.ExecutionGraph       [] - Discarding
the results produced by task execution 6442e15db194a591c32a821e18198686.
2020-10-28 00:29:50,847 INFO
org.apache.flink.runtime.executiongraph.ExecutionGraph       [] - Discarding
the results produced by task execution 6442e15db194a591c32a821e18198686.
2020-10-28 00:29:50,847 INFO
org.apache.flink.runtime.executiongraph.ExecutionGraph       [] -
KeyedProcess (2/3) (82734fa6851b2dcd769b34f7d8d1afaa) switched from
CANCELING to CANCELED.
2020-10-28 00:29:50,847 INFO
org.apache.flink.runtime.executiongraph.ExecutionGraph       [] - Discarding
the results produced by task execution 82734fa6851b2dcd769b34f7d8d1afaa.
2020-10-28 00:29:50,847 INFO
org.apache.flink.runtime.executiongraph.ExecutionGraph       [] - Discarding
the results produced by task execution 82734fa6851b2dcd769b34f7d8d1afaa.
2020-10-28 00:29:50,850 INFO
org.apache.flink.runtime.executiongraph.ExecutionGraph       [] - Job
static_order_gmv_by_paydate (031e5f122711786fcc11ee6eb47291fa) switched from
state RESTARTING to RUNNING.
2020-10-28 00:29:50,851 INFO
org.apache.flink.runtime.checkpoint.ZooKeeperCompletedCheckpointStore [] -
Recovering checkpoints from ZooKeeper.
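
A note on the "Exit code is 137" line in the log above: by the usual shell convention, an exit code above 128 means the process was terminated by signal (code - 128), so 137 corresponds to signal 9 (SIGKILL). That suggests the TaskManager container was killed from outside the JVM rather than failing on its own. A common cause is the container exceeding its memory limit, but this log alone does not confirm that. Below is a minimal Python sketch for pulling such codes out of a JobManager log and decoding them. The function names find_exit_code and describe_exit_code are hypothetical helpers, not part of Flink or YARN, and signal names are resolved on the machine running the script (assumed to be Linux).

```python
import re
import signal
from typing import Optional

# Matches the YARN diagnostic quoted above, e.g. "Exit code is 137".
EXIT_CODE_RE = re.compile(r"Exit code is (\d+)")


def describe_exit_code(code: int) -> str:
    """Describe an exit code using the shell's 128 + signal-number convention."""
    if code > 128:
        signum = code - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            # Signal number not known on this platform.
            name = f"signal {signum}"
        return f"terminated by {name}"
    return "exited normally" if code == 0 else f"exited with status {code}"


def find_exit_code(log_text: str) -> Optional[int]:
    """Return the first container exit code mentioned in the log text, if any."""
    match = EXIT_CODE_RE.search(log_text)
    return int(match.group(1)) if match else None


# Sample line taken from the log in this post.
sample = (
    "Closing TaskExecutor connection container_1591067037248_153639_01_000003 "
    "because: Container killed on request. Exit code is 137"
)
code = find_exit_code(sample)
if code is not None:
    print(code, describe_exit_code(code))
```

If the decoded signal is SIGKILL, the NodeManager log on the host that ran the container is usually the next place to look for the reason the container was killed.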





--
Sent from: http://apache-flink.147419.n8.nabble.com/

Re: the remote task manager was lost

nobleyd
I always allocate resources on the order of 80G or 100G...

On Wed, Oct 28, 2020 at 5:02 PM, guanxianchun <[hidden email]> wrote:

> Flink version: flink-1.11
> taskmanager memory: 8G
> jobmanager memory: 2G
> akka.ask.timeout:20s
> akka.retry-gate-closed-for: 5000
> client.timeout:600s
>
> After running for a while, the job reports "the remote task manager was lost". The error log is as follows:
> 2020-10-28 00:25:30,608 INFO
> org.apache.flink.runtime.checkpoint.CheckpointCoordinator    [] - Completed
> checkpoint 411 for job 031e5f122711786fcc11ee6eb47291fa (2703770 bytes in
> 336 ms).
> 2020-10-28 00:27:30,273 INFO
> org.apache.flink.runtime.checkpoint.CheckpointCoordinator    [] -
> Triggering
> checkpoint 412 (type=CHECKPOINT) @ 1603816050239 for job
> 031e5f122711786fcc11ee6eb47291fa.
> 2020-10-28 00:27:30,776 INFO
> org.apache.flink.runtime.checkpoint.CheckpointCoordinator    [] - Completed
> checkpoint 412 for job 031e5f122711786fcc11ee6eb47291fa (3466688 bytes in
> 509 ms).
> 2020-10-28 00:29:30,246 INFO
> org.apache.flink.runtime.checkpoint.CheckpointCoordinator    [] -
> Triggering
> checkpoint 413 (type=CHECKPOINT) @ 1603816170239 for job
> 031e5f122711786fcc11ee6eb47291fa.
> 2020-10-28 00:29:30,597 INFO
> org.apache.flink.runtime.checkpoint.CheckpointCoordinator    [] - Completed
> checkpoint 413 for job 031e5f122711786fcc11ee6eb47291fa (2752681 bytes in
> 334 ms).
> 2020-10-28 00:29:47,353 WARN  akka.remote.ReliableDeliverySupervisor
>
> [] - Association with remote system
> [akka.tcp://[hidden email]:13912] has failed, address is now
> gated for [5000] ms. Reason: [Disassociated]
> 2020-10-28 00:29:47,353 WARN  akka.remote.ReliableDeliverySupervisor
>
> [] - Association with remote system
> [akka.tcp://[hidden email]:31260] has failed, address
> is
> now gated for [5000] ms. Reason: [Disassociated]
> 2020-10-28 00:29:47,377 INFO
> org.apache.flink.runtime.executiongraph.ExecutionGraph       [] -
> KeyedProcess -> async wait operator -> Map (1/3)
> (f84731e57528b326ad15ddc17821d1b8) switched from RUNNING to FAILED on
> org.apache.flink.runtime.jobmaster.slotpool.SingleLogicalSlot@538198b8.
> org.apache.flink.runtime.io.network.netty.exception.RemoteTransportException:
> Connection unexpectedly closed by remote task manager
> 'hadoop01.dev.test.cn/192.168.1.21:7527'. This might indicate that the
> remote task manager was lost.
>         at
> org.apache.flink.runtime.io.network.netty.CreditBasedPartitionRequestClientHandler.channelInactive(CreditBasedPartitionRequestClientHandler.java:144)
> ~[flink-dist_2.11-1.11.1.jar:1.11.1]
>         at
>
> org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:257)
> ~[flink-dist_2.11-1.11.1.jar:1.11.1]
>         at
>
> org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:243)
> ~[flink-dist_2.11-1.11.1.jar:1.11.1]
>         at
>
> org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:236)
> ~[flink-dist_2.11-1.11.1.jar:1.11.1]
>         at
>
> org.apache.flink.shaded.netty4.io.netty.channel.ChannelInboundHandlerAdapter.channelInactive(ChannelInboundHandlerAdapter.java:81)
> ~[flink-dist_2.11-1.11.1.jar:1.11.1]
>         at
> org.apache.flink.runtime.io.network.netty.NettyMessageClientDecoderDelegate.channelInactive(NettyMessageClientDecoderDelegate.java:97)
> ~[flink-dist_2.11-1.11.1.jar:1.11.1]
>         at
>
> org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:257)
> ~[flink-dist_2.11-1.11.1.jar:1.11.1]
>         at
>
> org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:243)
> ~[flink-dist_2.11-1.11.1.jar:1.11.1]
>         at
>
> org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:236)
> ~[flink-dist_2.11-1.11.1.jar:1.11.1]
>         at
>
> org.apache.flink.shaded.netty4.io.netty.channel.DefaultChannelPipeline$HeadContext.channelInactive(DefaultChannelPipeline.java:1416)
> ~[flink-dist_2.11-1.11.1.jar:1.11.1]
>         at
>
> org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:257)
> ~[flink-dist_2.11-1.11.1.jar:1.11.1]
>         at
>
> org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:243)
> ~[flink-dist_2.11-1.11.1.jar:1.11.1]
>         at
>
> org.apache.flink.shaded.netty4.io.netty.channel.DefaultChannelPipeline.fireChannelInactive(DefaultChannelPipeline.java:912)
> ~[flink-dist_2.11-1.11.1.jar:1.11.1]
>         at
>
> org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannel$AbstractUnsafe$8.run(AbstractChannel.java:816)
> ~[flink-dist_2.11-1.11.1.jar:1.11.1]
>         at
>
> org.apache.flink.shaded.netty4.io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:163)
> ~[flink-dist_2.11-1.11.1.jar:1.11.1]
>         at
>
> org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:416)
> ~[flink-dist_2.11-1.11.1.jar:1.11.1]
>         at
>
> org.apache.flink.shaded.netty4.io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:331)
> ~[flink-dist_2.11-1.11.1.jar:1.11.1]
>         at
>
> org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:918)
> ~[flink-dist_2.11-1.11.1.jar:1.11.1]
>         at
>
> org.apache.flink.shaded.netty4.io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
> ~[flink-dist_2.11-1.11.1.jar:1.11.1]
>         at java.lang.Thread.run(Thread.java:748) ~[?:1.8.0_131]
> 2020-10-28 00:29:47,442 INFO
>
> org.apache.flink.runtime.executiongraph.failover.flip1.RestartPipelinedRegionFailoverStrategy
> [] - Calculating tasks to restart to recover the failed task
> abf129c3bc11e5b145c2f3103110a0b2_0.
> 2020-10-28 00:29:47,443 INFO
>
> org.apache.flink.runtime.executiongraph.failover.flip1.RestartPipelinedRegionFailoverStrategy
> [] - 19 tasks should be restarted to recover the failed task
> abf129c3bc11e5b145c2f3103110a0b2_0.
> 2020-10-28 00:29:47,444 INFO
> org.apache.flink.runtime.executiongraph.ExecutionGraph       [] - Job
> static_order_gmv_by_paydate (031e5f122711786fcc11ee6eb47291fa) switched
> from
> state RUNNING to RESTARTING.
> 2020-10-28 00:29:47,445 INFO
> org.apache.flink.runtime.executiongraph.ExecutionGraph       [] - Sink:
> Unnamed (1/1) (c9d8de20cf8d58d3cd5e9f2dfadd7b70) switched from RUNNING to
> CANCELING.
> 2020-10-28 00:29:47,447 INFO
> org.apache.flink.runtime.executiongraph.ExecutionGraph       [] - Source:
> Custom Source -> Flat Map -> Timestamps/Watermarks (2/3)
> (828066cde4cda22eb4756366eafac229) switched from RUNNING to CANCELING.
> 2020-10-28 00:29:47,447 INFO
> org.apache.flink.runtime.executiongraph.ExecutionGraph       [] - Source:
> Custom Source -> Flat Map -> Timestamps/Watermarks (1/3)
> (ae5e40830a57bbd118db2f8ee86a00ae) switched from RUNNING to CANCELING.
> 2020-10-28 00:29:47,447 INFO
> org.apache.flink.runtime.executiongraph.ExecutionGraph       [] -
> KeyedProcess -> async wait operator -> Map (2/3)
> (70eb6b6d5a363910f8fd808024d68b8a) switched from RUNNING to CANCELING.
> 2020-10-28 00:29:47,447 INFO
> org.apache.flink.runtime.executiongraph.ExecutionGraph       [] -
> KeyedProcess -> async wait operator -> Map (3/3)
> (a42963633bf0a142c082ec0e424666b3) switched from RUNNING to CANCELING.
> 2020-10-28 00:29:47,448 INFO
> org.apache.flink.runtime.executiongraph.ExecutionGraph       [] - Source:
> Custom Source -> Flat Map -> Timestamps/Watermarks (3/3)
> (591b6fa2ad487cc2fe91cb9ac5a0d19e) switched from RUNNING to CANCELING.
> 2020-10-28 00:29:47,448 INFO
> org.apache.flink.runtime.executiongraph.ExecutionGraph       [] -
> KeyedProcess (1/3) (9a35a07b539502ec2d23ec35d3d507db) switched from RUNNING
> to CANCELING.
> 2020-10-28 00:29:47,448 INFO
> org.apache.flink.runtime.executiongraph.ExecutionGraph       [] -
> KeyedProcess (2/3) (82734fa6851b2dcd769b34f7d8d1afaa) switched from RUNNING
> to CANCELING.
> 2020-10-28 00:29:47,448 INFO
> org.apache.flink.runtime.executiongraph.ExecutionGraph       [] -
> KeyedProcess (3/3) (f13b2ef5feba6b65ad276cf87bdf2218) switched from RUNNING
> to CANCELING.
> 2020-10-28 00:29:47,448 INFO
> org.apache.flink.runtime.executiongraph.ExecutionGraph       [] - Co-Flat
> Map (2/3) (6442e15db194a591c32a821e18198686) switched from RUNNING to
> CANCELING.
> 2020-10-28 00:29:47,448 INFO
> org.apache.flink.runtime.executiongraph.ExecutionGraph       [] - Co-Flat
> Map (1/3) (6961b6cff72d1c41d8345944d246b433) switched from RUNNING to
> CANCELING.
> 2020-10-28 00:29:47,448 INFO
> org.apache.flink.runtime.executiongraph.ExecutionGraph       [] - Co-Flat
> Map (3/3) (41edc64886544d8a542b23074c99f614) switched from RUNNING to
> CANCELING.
> 2020-10-28 00:29:47,448 INFO
> org.apache.flink.runtime.executiongraph.ExecutionGraph       [] -
> Window(TumblingEventTimeWindows(120000), EventTimeTrigger,
> ViewAggregateFunction, ViewSumWindowFunction) (1/3)
> (e9bd1a3fb4f3d0786831a439189e6240) switched from RUNNING to CANCELING.
> 2020-10-28 00:29:47,448 INFO
> org.apache.flink.runtime.executiongraph.ExecutionGraph       [] -
> KeyedProcess (1/3) (057233f7fa678b0a54e5c3d682caab24) switched from RUNNING
> to CANCELING.
> 2020-10-28 00:29:47,448 INFO
> org.apache.flink.runtime.executiongraph.ExecutionGraph       [] -
> KeyedProcess (2/3) (11cde122ba8a22ef37269c8cd051e079) switched from RUNNING
> to CANCELING.
> 2020-10-28 00:29:47,448 INFO
> org.apache.flink.runtime.executiongraph.ExecutionGraph       [] -
> Window(TumblingEventTimeWindows(120000), EventTimeTrigger,
> ViewAggregateFunction, ViewSumWindowFunction) (3/3)
> (40b1bb8ce62b6b2062dc68bd63c2f60a) switched from RUNNING to CANCELING.
> 2020-10-28 00:29:47,449 INFO
> org.apache.flink.runtime.executiongraph.ExecutionGraph       [] -
> Window(TumblingEventTimeWindows(120000), EventTimeTrigger,
> ViewAggregateFunction, ViewSumWindowFunction) (2/3)
> (88e1242700ba1d5a9cba5c466f51cac2) switched from RUNNING to CANCELING.
> 2020-10-28 00:29:47,449 INFO
> org.apache.flink.runtime.executiongraph.ExecutionGraph       [] -
> KeyedProcess (3/3) (e67f3c240663d5949872fa5988568e40) switched from RUNNING
> to CANCELING.
> 2020-10-28 00:29:47,452 INFO
> org.apache.flink.runtime.executiongraph.ExecutionGraph       [] -
> Window(TumblingEventTimeWindows(120000), EventTimeTrigger,
> ViewAggregateFunction, ViewSumWindowFunction) (1/3)
> (e9bd1a3fb4f3d0786831a439189e6240) switched from CANCELING to CANCELED.
> 2020-10-28 00:29:47,452 INFO
> org.apache.flink.runtime.executiongraph.ExecutionGraph       [] -
> Discarding
> the results produced by task execution e9bd1a3fb4f3d0786831a439189e6240.
> 2020-10-28 00:29:47,457 INFO
> org.apache.flink.runtime.executiongraph.ExecutionGraph       [] -
> Discarding
> the results produced by task execution e9bd1a3fb4f3d0786831a439189e6240.
> 2020-10-28 00:29:47,459 INFO
> org.apache.flink.runtime.executiongraph.ExecutionGraph       [] -
> KeyedProcess (1/3) (057233f7fa678b0a54e5c3d682caab24) switched from
> CANCELING to CANCELED.
> 2020-10-28 00:29:47,459 INFO
> org.apache.flink.runtime.executiongraph.ExecutionGraph       [] -
> Discarding
> the results produced by task execution 057233f7fa678b0a54e5c3d682caab24.
> 2020-10-28 00:29:47,460 INFO
> org.apache.flink.runtime.executiongraph.ExecutionGraph       [] -
> Discarding
> the results produced by task execution 057233f7fa678b0a54e5c3d682caab24.
> 2020-10-28 00:29:47,460 INFO
> org.apache.flink.runtime.executiongraph.ExecutionGraph       [] - Co-Flat
> Map (1/3) (6961b6cff72d1c41d8345944d246b433) switched from CANCELING to
> CANCELED.
> 2020-10-28 00:29:47,460 INFO
> org.apache.flink.runtime.executiongraph.ExecutionGraph       [] -
> Discarding
> the results produced by task execution 6961b6cff72d1c41d8345944d246b433.
> 2020-10-28 00:29:47,460 INFO
> org.apache.flink.runtime.executiongraph.ExecutionGraph       [] -
> Discarding
> the results produced by task execution 6961b6cff72d1c41d8345944d246b433.
> 2020-10-28 00:29:47,461 INFO
> org.apache.flink.runtime.executiongraph.ExecutionGraph       [] -
> KeyedProcess (1/3) (9a35a07b539502ec2d23ec35d3d507db) switched from
> CANCELING to CANCELED.
> 2020-10-28 00:29:47,461 INFO
> org.apache.flink.runtime.executiongraph.ExecutionGraph       [] -
> Discarding
> the results produced by task execution 9a35a07b539502ec2d23ec35d3d507db.
> 2020-10-28 00:29:47,461 INFO
> org.apache.flink.runtime.executiongraph.ExecutionGraph       [] -
> Discarding
> the results produced by task execution 9a35a07b539502ec2d23ec35d3d507db.
> 2020-10-28 00:29:47,517 INFO
> org.apache.flink.runtime.executiongraph.ExecutionGraph       [] - Source:
> Custom Source -> Flat Map -> Timestamps/Watermarks (3/3)
> (591b6fa2ad487cc2fe91cb9ac5a0d19e) switched from CANCELING to CANCELED.
> 2020-10-28 00:29:47,566 INFO
> org.apache.flink.runtime.executiongraph.ExecutionGraph       [] - Sink:
> Unnamed (1/1) (c9d8de20cf8d58d3cd5e9f2dfadd7b70) switched from CANCELING to
> CANCELED.
> 2020-10-28 00:29:47,567 INFO
> org.apache.flink.runtime.executiongraph.ExecutionGraph       [] -
> KeyedProcess (3/3) (f13b2ef5feba6b65ad276cf87bdf2218) switched from
> CANCELING to CANCELED.
> 2020-10-28 00:29:47,568 INFO
> org.apache.flink.runtime.executiongraph.ExecutionGraph       [] - Co-Flat
> Map (3/3) (41edc64886544d8a542b23074c99f614) switched from CANCELING to
> CANCELED.
> 2020-10-28 00:29:47,568 INFO
> org.apache.flink.runtime.executiongraph.ExecutionGraph       [] -
> KeyedProcess (3/3) (e67f3c240663d5949872fa5988568e40) switched from
> CANCELING to CANCELED.
> 2020-10-28 00:29:47,569 INFO
> org.apache.flink.runtime.executiongraph.ExecutionGraph       [] -
> Window(TumblingEventTimeWindows(120000), EventTimeTrigger,
> ViewAggregateFunction, ViewSumWindowFunction) (3/3)
> (40b1bb8ce62b6b2062dc68bd63c2f60a) switched from CANCELING to CANCELED.
> 2020-10-28 00:29:47,570 INFO
> org.apache.flink.runtime.executiongraph.ExecutionGraph       [] - Source:
> Custom Source -> Flat Map -> Timestamps/Watermarks (1/3)
> (ae5e40830a57bbd118db2f8ee86a00ae) switched from CANCELING to CANCELED.
> 2020-10-28 00:29:47,594 INFO
> org.apache.flink.runtime.executiongraph.ExecutionGraph       [] -
> KeyedProcess -> async wait operator -> Map (3/3)
> (a42963633bf0a142c082ec0e424666b3) switched from CANCELING to CANCELED.
> 2020-10-28 00:29:50,845 INFO  org.apache.flink.yarn.YarnResourceManager
> [] - Closing TaskExecutor connection
> container_1591067037248_153639_01_000003 because: Container killed on
> request. Exit code is 137
> Container exited with a non-zero exit code 137
> Killed by external signal
>
> 2020-10-28 00:29:50,846 INFO
> org.apache.flink.runtime.executiongraph.ExecutionGraph       [] - Source:
> Custom Source -> Flat Map -> Timestamps/Watermarks (2/3)
> (828066cde4cda22eb4756366eafac229) switched from CANCELING to CANCELED.
> 2020-10-28 00:29:50,846 INFO
> org.apache.flink.runtime.executiongraph.ExecutionGraph       [] -
> Discarding
> the results produced by task execution 828066cde4cda22eb4756366eafac229.
> 2020-10-28 00:29:50,846 INFO
> org.apache.flink.runtime.executiongraph.ExecutionGraph       [] -
> Discarding
> the results produced by task execution 828066cde4cda22eb4756366eafac229.
> 2020-10-28 00:29:50,846 INFO
> org.apache.flink.runtime.executiongraph.ExecutionGraph       [] -
> KeyedProcess -> async wait operator -> Map (2/3)
> (70eb6b6d5a363910f8fd808024d68b8a) switched from CANCELING to CANCELED.
> 2020-10-28 00:29:50,847 INFO
> org.apache.flink.runtime.executiongraph.ExecutionGraph       [] -
> Discarding
> the results produced by task execution 70eb6b6d5a363910f8fd808024d68b8a.
> 2020-10-28 00:29:50,847 INFO
> org.apache.flink.runtime.executiongraph.ExecutionGraph       [] -
> Discarding
> the results produced by task execution 70eb6b6d5a363910f8fd808024d68b8a.
> 2020-10-28 00:29:50,847 INFO
> org.apache.flink.runtime.executiongraph.ExecutionGraph       [] -
> KeyedProcess (2/3) (11cde122ba8a22ef37269c8cd051e079) switched from
> CANCELING to CANCELED.
> 2020-10-28 00:29:50,847 INFO
> org.apache.flink.runtime.executiongraph.ExecutionGraph       [] -
> Discarding
> the results produced by task execution 11cde122ba8a22ef37269c8cd051e079.
> 2020-10-28 00:29:50,847 INFO
> org.apache.flink.runtime.executiongraph.ExecutionGraph       [] -
> Discarding
> the results produced by task execution 11cde122ba8a22ef37269c8cd051e079.
> 2020-10-28 00:29:50,847 INFO
> org.apache.flink.runtime.executiongraph.ExecutionGraph       [] -
> Window(TumblingEventTimeWindows(120000), EventTimeTrigger,
> ViewAggregateFunction, ViewSumWindowFunction) (2/3)
> (88e1242700ba1d5a9cba5c466f51cac2) switched from CANCELING to CANCELED.
> 2020-10-28 00:29:50,847 INFO
> org.apache.flink.runtime.executiongraph.ExecutionGraph       [] -
> Discarding
> the results produced by task execution 88e1242700ba1d5a9cba5c466f51cac2.
> 2020-10-28 00:29:50,847 INFO
> org.apache.flink.runtime.executiongraph.ExecutionGraph       [] -
> Discarding
> the results produced by task execution 88e1242700ba1d5a9cba5c466f51cac2.
> 2020-10-28 00:29:50,847 INFO
> org.apache.flink.runtime.executiongraph.ExecutionGraph       [] - Co-Flat
> Map (2/3) (6442e15db194a591c32a821e18198686) switched from CANCELING to
> CANCELED.
> 2020-10-28 00:29:50,847 INFO
> org.apache.flink.runtime.executiongraph.ExecutionGraph       [] -
> Discarding
> the results produced by task execution 6442e15db194a591c32a821e18198686.
> 2020-10-28 00:29:50,847 INFO
> org.apache.flink.runtime.executiongraph.ExecutionGraph       [] -
> Discarding
> the results produced by task execution 6442e15db194a591c32a821e18198686.
> 2020-10-28 00:29:50,847 INFO
> org.apache.flink.runtime.executiongraph.ExecutionGraph       [] -
> KeyedProcess (2/3) (82734fa6851b2dcd769b34f7d8d1afaa) switched from
> CANCELING to CANCELED.
> 2020-10-28 00:29:50,847 INFO
> org.apache.flink.runtime.executiongraph.ExecutionGraph       [] -
> Discarding
> the results produced by task execution 82734fa6851b2dcd769b34f7d8d1afaa.
> 2020-10-28 00:29:50,847 INFO
> org.apache.flink.runtime.executiongraph.ExecutionGraph       [] -
> Discarding
> the results produced by task execution 82734fa6851b2dcd769b34f7d8d1afaa.
> 2020-10-28 00:29:50,850 INFO
> org.apache.flink.runtime.executiongraph.ExecutionGraph       [] - Job
> static_order_gmv_by_paydate (031e5f122711786fcc11ee6eb47291fa) switched
> from
> state RESTARTING to RUNNING.
> 2020-10-28 00:29:50,851 INFO
> org.apache.flink.runtime.checkpoint.ZooKeeperCompletedCheckpointStore [] -
> Recovering checkpoints from ZooKeeper.

Re: the remote task manager was lost

Congxian Qiu
You could look at the TM (TaskManager) log for the remote task and check whether it shows any exceptions.
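
As a starting point, the quoted JobManager log shows YARN closing container_1591067037248_153639_01_000003 with exit code 137. By the usual convention that an exit status above 128 means 128 plus a signal number, 137 corresponds to signal 9 (SIGKILL), so the TM process was killed from outside and may not have logged an exception of its own. A container exceeding its memory limit is one common cause, though the log shown here does not confirm it. If log aggregation is enabled, and assuming the application ID follows from the container ID, `yarn logs -applicationId application_1591067037248_153639 -containerId container_1591067037248_153639_01_000003` should retrieve that container's logs, and the NodeManager log on that host may say why the container was killed. The Python sketch below only decodes the exit code; `describe_exit_code` and `find_exit_codes` are hypothetical helper names, not part of Flink or YARN.

```python
import re
import signal

# Matches the YARN diagnostic text seen in the quoted log ("Exit code is 137").
EXIT_CODE_RE = re.compile(r"Exit code is (\d+)")


def find_exit_codes(log_text):
    """Return every container exit code mentioned in a chunk of log text."""
    return [int(m.group(1)) for m in EXIT_CODE_RE.finditer(log_text)]


def describe_exit_code(code):
    """Interpret an exit code using the common '128 + signal number' convention."""
    if code > 128:
        signum = code - 128
        try:
            # Resolve the signal name on this platform, e.g. 9 -> SIGKILL on Linux.
            name = signal.Signals(signum).name
        except ValueError:
            # The signal number is not defined on this platform.
            name = "signal %d" % signum
        return "killed by " + name
    if code == 0:
        return "exited normally"
    return "exited with error status %d" % code


# Sample line copied from the quoted JobManager log.
sample = (
    "Closing TaskExecutor connection container_1591067037248_153639_01_000003 "
    "because: Container killed on request. Exit code is 137"
)

for code in find_exit_codes(sample):
    print(code, describe_exit_code(code))
```

The script needs only the standard library, so it can be pointed at a saved copy of the JobManager log instead of the `sample` string.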

Best,
Congxian


赵一旦 <[hidden email]> wrote on Wed, Dec 2, 2020 at 6:17 PM:

> I always allocate resources on the order of 80G or 100G...
>
> guanxianchun <[hidden email]> wrote on Wed, Oct 28, 2020 at 5:02 PM:
>
> > Flink version: flink-1.11
> > taskmanager memory: 8G
> > jobmanager memory: 2G
> > akka.ask.timeout: 20s
> > akka.retry-gate-closed-for: 5000
> > client.timeout: 600s
> >
> > After running for a while, the job fails with "the remote task manager was lost". The error log is as follows:
> > 2020-10-28 00:25:30,608 INFO
> > org.apache.flink.runtime.checkpoint.CheckpointCoordinator    [] -
> > Completed
> > checkpoint 411 for job 031e5f122711786fcc11ee6eb47291fa (2703770 bytes in
> > 336 ms).
> > 2020-10-28 00:27:30,273 INFO
> > org.apache.flink.runtime.checkpoint.CheckpointCoordinator    [] -
> > Triggering
> > checkpoint 412 (type=CHECKPOINT) @ 1603816050239 for job
> > 031e5f122711786fcc11ee6eb47291fa.
> > 2020-10-28 00:27:30,776 INFO
> > org.apache.flink.runtime.checkpoint.CheckpointCoordinator    [] -
> > Completed
> > checkpoint 412 for job 031e5f122711786fcc11ee6eb47291fa (3466688 bytes in
> > 509 ms).
> > 2020-10-28 00:29:30,246 INFO
> > org.apache.flink.runtime.checkpoint.CheckpointCoordinator    [] -
> > Triggering
> > checkpoint 413 (type=CHECKPOINT) @ 1603816170239 for job
> > 031e5f122711786fcc11ee6eb47291fa.
> > 2020-10-28 00:29:30,597 INFO
> > org.apache.flink.runtime.checkpoint.CheckpointCoordinator    [] -
> > Completed
> > checkpoint 413 for job 031e5f122711786fcc11ee6eb47291fa (2752681 bytes in
> > 334 ms).
> > 2020-10-28 00:29:47,353 WARN  akka.remote.ReliableDeliverySupervisor
> > [] - Association with remote system
> > [akka.tcp://[hidden email]:13912] has failed, address is now
> > gated for [5000] ms. Reason: [Disassociated]
> > 2020-10-28 00:29:47,353 WARN  akka.remote.ReliableDeliverySupervisor
> > [] - Association with remote system
> > [akka.tcp://[hidden email]:31260] has failed, address is
> > now gated for [5000] ms. Reason: [Disassociated]
> > 2020-10-28 00:29:47,377 INFO
> > org.apache.flink.runtime.executiongraph.ExecutionGraph       [] -
> > KeyedProcess -> async wait operator -> Map (1/3)
> > (f84731e57528b326ad15ddc17821d1b8) switched from RUNNING to FAILED on
> > org.apache.flink.runtime.jobmaster.slotpool.SingleLogicalSlot@538198b8.
> > org.apache.flink.runtime.io.network.netty.exception.RemoteTransportException:
> > Connection unexpectedly closed by remote task manager
> > 'hadoop01.dev.test.cn/192.168.1.21:7527'. This might indicate that the
> > remote task manager was lost.
> >         at org.apache.flink.runtime.io.network.netty.CreditBasedPartitionRequestClientHandler.channelInactive(CreditBasedPartitionRequestClientHandler.java:144) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
> >         at org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:257) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
> >         at org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:243) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
> >         at org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:236) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
> >         at org.apache.flink.shaded.netty4.io.netty.channel.ChannelInboundHandlerAdapter.channelInactive(ChannelInboundHandlerAdapter.java:81) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
> >         at org.apache.flink.runtime.io.network.netty.NettyMessageClientDecoderDelegate.channelInactive(NettyMessageClientDecoderDelegate.java:97) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
> >         at org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:257) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
> >         at org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:243) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
> >         at org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:236) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
> >         at org.apache.flink.shaded.netty4.io.netty.channel.DefaultChannelPipeline$HeadContext.channelInactive(DefaultChannelPipeline.java:1416) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
> >         at org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:257) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
> >         at org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:243) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
> >         at org.apache.flink.shaded.netty4.io.netty.channel.DefaultChannelPipeline.fireChannelInactive(DefaultChannelPipeline.java:912) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
> >         at org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannel$AbstractUnsafe$8.run(AbstractChannel.java:816) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
> >         at org.apache.flink.shaded.netty4.io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:163) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
> >         at org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:416) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
> >         at org.apache.flink.shaded.netty4.io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:331) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
> >         at org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:918) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
> >         at org.apache.flink.shaded.netty4.io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
> >         at java.lang.Thread.run(Thread.java:748) ~[?:1.8.0_131]
> > 2020-10-28 00:29:47,442 INFO
> > org.apache.flink.runtime.executiongraph.failover.flip1.RestartPipelinedRegionFailoverStrategy
> > [] - Calculating tasks to restart to recover the failed task
> > abf129c3bc11e5b145c2f3103110a0b2_0.
> > 2020-10-28 00:29:47,443 INFO
> > org.apache.flink.runtime.executiongraph.failover.flip1.RestartPipelinedRegionFailoverStrategy
> > [] - 19 tasks should be restarted to recover the failed task
> > abf129c3bc11e5b145c2f3103110a0b2_0.
> > 2020-10-28 00:29:47,444 INFO
> > org.apache.flink.runtime.executiongraph.ExecutionGraph       [] - Job
> > static_order_gmv_by_paydate (031e5f122711786fcc11ee6eb47291fa) switched
> > from
> > state RUNNING to RESTARTING.
> > 2020-10-28 00:29:47,445 INFO
> > org.apache.flink.runtime.executiongraph.ExecutionGraph       [] - Sink:
> > Unnamed (1/1) (c9d8de20cf8d58d3cd5e9f2dfadd7b70) switched from RUNNING to
> > CANCELING.
> > 2020-10-28 00:29:47,447 INFO
> > org.apache.flink.runtime.executiongraph.ExecutionGraph       [] - Source:
> > Custom Source -> Flat Map -> Timestamps/Watermarks (2/3)
> > (828066cde4cda22eb4756366eafac229) switched from RUNNING to CANCELING.
> > 2020-10-28 00:29:47,447 INFO
> > org.apache.flink.runtime.executiongraph.ExecutionGraph       [] - Source:
> > Custom Source -> Flat Map -> Timestamps/Watermarks (1/3)
> > (ae5e40830a57bbd118db2f8ee86a00ae) switched from RUNNING to CANCELING.
> > 2020-10-28 00:29:47,447 INFO
> > org.apache.flink.runtime.executiongraph.ExecutionGraph       [] -
> > KeyedProcess -> async wait operator -> Map (2/3)
> > (70eb6b6d5a363910f8fd808024d68b8a) switched from RUNNING to CANCELING.
> > 2020-10-28 00:29:47,447 INFO
> > org.apache.flink.runtime.executiongraph.ExecutionGraph       [] -
> > KeyedProcess -> async wait operator -> Map (3/3)
> > (a42963633bf0a142c082ec0e424666b3) switched from RUNNING to CANCELING.
> > 2020-10-28 00:29:47,448 INFO
> > org.apache.flink.runtime.executiongraph.ExecutionGraph       [] - Source:
> > Custom Source -> Flat Map -> Timestamps/Watermarks (3/3)
> > (591b6fa2ad487cc2fe91cb9ac5a0d19e) switched from RUNNING to CANCELING.
> > 2020-10-28 00:29:47,448 INFO
> > org.apache.flink.runtime.executiongraph.ExecutionGraph       [] -
> > KeyedProcess (1/3) (9a35a07b539502ec2d23ec35d3d507db) switched from
> > RUNNING to CANCELING.
> > 2020-10-28 00:29:47,448 INFO
> > org.apache.flink.runtime.executiongraph.ExecutionGraph       [] -
> > KeyedProcess (2/3) (82734fa6851b2dcd769b34f7d8d1afaa) switched from
> > RUNNING to CANCELING.
> > 2020-10-28 00:29:47,448 INFO
> > org.apache.flink.runtime.executiongraph.ExecutionGraph       [] -
> > KeyedProcess (3/3) (f13b2ef5feba6b65ad276cf87bdf2218) switched from
> > RUNNING to CANCELING.
> > 2020-10-28 00:29:47,448 INFO
> > org.apache.flink.runtime.executiongraph.ExecutionGraph       [] - Co-Flat
> > Map (2/3) (6442e15db194a591c32a821e18198686) switched from RUNNING to
> > CANCELING.
> > 2020-10-28 00:29:47,448 INFO
> > org.apache.flink.runtime.executiongraph.ExecutionGraph       [] - Co-Flat
> > Map (1/3) (6961b6cff72d1c41d8345944d246b433) switched from RUNNING to
> > CANCELING.
> > 2020-10-28 00:29:47,448 INFO
> > org.apache.flink.runtime.executiongraph.ExecutionGraph       [] - Co-Flat
> > Map (3/3) (41edc64886544d8a542b23074c99f614) switched from RUNNING to
> > CANCELING.
> > 2020-10-28 00:29:47,448 INFO
> > org.apache.flink.runtime.executiongraph.ExecutionGraph       [] -
> > Window(TumblingEventTimeWindows(120000), EventTimeTrigger,
> > ViewAggregateFunction, ViewSumWindowFunction) (1/3)
> > (e9bd1a3fb4f3d0786831a439189e6240) switched from RUNNING to CANCELING.
> > 2020-10-28 00:29:47,448 INFO
> > org.apache.flink.runtime.executiongraph.ExecutionGraph       [] -
> > KeyedProcess (1/3) (057233f7fa678b0a54e5c3d682caab24) switched from
> > RUNNING to CANCELING.
> > 2020-10-28 00:29:47,448 INFO
> > org.apache.flink.runtime.executiongraph.ExecutionGraph       [] -
> > KeyedProcess (2/3) (11cde122ba8a22ef37269c8cd051e079) switched from
> > RUNNING to CANCELING.
> > 2020-10-28 00:29:47,448 INFO
> > org.apache.flink.runtime.executiongraph.ExecutionGraph       [] -
> > Window(TumblingEventTimeWindows(120000), EventTimeTrigger,
> > ViewAggregateFunction, ViewSumWindowFunction) (3/3)
> > (40b1bb8ce62b6b2062dc68bd63c2f60a) switched from RUNNING to CANCELING.
> > 2020-10-28 00:29:47,449 INFO
> > org.apache.flink.runtime.executiongraph.ExecutionGraph       [] -
> > Window(TumblingEventTimeWindows(120000), EventTimeTrigger,
> > ViewAggregateFunction, ViewSumWindowFunction) (2/3)
> > (88e1242700ba1d5a9cba5c466f51cac2) switched from RUNNING to CANCELING.
> > 2020-10-28 00:29:47,449 INFO
> > org.apache.flink.runtime.executiongraph.ExecutionGraph       [] -
> > KeyedProcess (3/3) (e67f3c240663d5949872fa5988568e40) switched from
> > RUNNING to CANCELING.
> > 2020-10-28 00:29:47,452 INFO
> > org.apache.flink.runtime.executiongraph.ExecutionGraph       [] -
> > Window(TumblingEventTimeWindows(120000), EventTimeTrigger,
> > ViewAggregateFunction, ViewSumWindowFunction) (1/3)
> > (e9bd1a3fb4f3d0786831a439189e6240) switched from CANCELING to CANCELED.
> > 2020-10-28 00:29:47,452 INFO
> > org.apache.flink.runtime.executiongraph.ExecutionGraph       [] -
> > Discarding
> > the results produced by task execution e9bd1a3fb4f3d0786831a439189e6240.
> > 2020-10-28 00:29:47,457 INFO
> > org.apache.flink.runtime.executiongraph.ExecutionGraph       [] -
> > Discarding
> > the results produced by task execution e9bd1a3fb4f3d0786831a439189e6240.
> > 2020-10-28 00:29:47,459 INFO
> > org.apache.flink.runtime.executiongraph.ExecutionGraph       [] -
> > KeyedProcess (1/3) (057233f7fa678b0a54e5c3d682caab24) switched from
> > CANCELING to CANCELED.
> > 2020-10-28 00:29:47,459 INFO
> > org.apache.flink.runtime.executiongraph.ExecutionGraph       [] -
> > Discarding
> > the results produced by task execution 057233f7fa678b0a54e5c3d682caab24.
> > 2020-10-28 00:29:47,460 INFO
> > org.apache.flink.runtime.executiongraph.ExecutionGraph       [] -
> > Discarding
> > the results produced by task execution 057233f7fa678b0a54e5c3d682caab24.
> > 2020-10-28 00:29:47,460 INFO
> > org.apache.flink.runtime.executiongraph.ExecutionGraph       [] - Co-Flat
> > Map (1/3) (6961b6cff72d1c41d8345944d246b433) switched from CANCELING to
> > CANCELED.
> > 2020-10-28 00:29:47,460 INFO
> > org.apache.flink.runtime.executiongraph.ExecutionGraph       [] -
> > Discarding
> > the results produced by task execution 6961b6cff72d1c41d8345944d246b433.
> > 2020-10-28 00:29:47,460 INFO
> > org.apache.flink.runtime.executiongraph.ExecutionGraph       [] -
> > Discarding
> > the results produced by task execution 6961b6cff72d1c41d8345944d246b433.
> > 2020-10-28 00:29:47,461 INFO
> > org.apache.flink.runtime.executiongraph.ExecutionGraph       [] -
> > KeyedProcess (1/3) (9a35a07b539502ec2d23ec35d3d507db) switched from
> > CANCELING to CANCELED.
> > 2020-10-28 00:29:47,461 INFO
> > org.apache.flink.runtime.executiongraph.ExecutionGraph       [] -
> > Discarding
> > the results produced by task execution 9a35a07b539502ec2d23ec35d3d507db.
> > 2020-10-28 00:29:47,461 INFO
> > org.apache.flink.runtime.executiongraph.ExecutionGraph       [] -
> > Discarding
> > the results produced by task execution 9a35a07b539502ec2d23ec35d3d507db.
> > 2020-10-28 00:29:47,517 INFO
> > org.apache.flink.runtime.executiongraph.ExecutionGraph       [] - Source:
> > Custom Source -> Flat Map -> Timestamps/Watermarks (3/3)
> > (591b6fa2ad487cc2fe91cb9ac5a0d19e) switched from CANCELING to CANCELED.
> > 2020-10-28 00:29:47,566 INFO
> > org.apache.flink.runtime.executiongraph.ExecutionGraph       [] - Sink:
> > Unnamed (1/1) (c9d8de20cf8d58d3cd5e9f2dfadd7b70) switched from CANCELING to
> > CANCELED.
> > 2020-10-28 00:29:47,567 INFO
> > org.apache.flink.runtime.executiongraph.ExecutionGraph       [] -
> > KeyedProcess (3/3) (f13b2ef5feba6b65ad276cf87bdf2218) switched from
> > CANCELING to CANCELED.
> > 2020-10-28 00:29:47,568 INFO
> > org.apache.flink.runtime.executiongraph.ExecutionGraph       [] - Co-Flat
> > Map (3/3) (41edc64886544d8a542b23074c99f614) switched from CANCELING to
> > CANCELED.
> > 2020-10-28 00:29:47,568 INFO
> > org.apache.flink.runtime.executiongraph.ExecutionGraph       [] -
> > KeyedProcess (3/3) (e67f3c240663d5949872fa5988568e40) switched from
> > CANCELING to CANCELED.
> > 2020-10-28 00:29:47,569 INFO
> > org.apache.flink.runtime.executiongraph.ExecutionGraph       [] -
> > Window(TumblingEventTimeWindows(120000), EventTimeTrigger,
> > ViewAggregateFunction, ViewSumWindowFunction) (3/3)
> > (40b1bb8ce62b6b2062dc68bd63c2f60a) switched from CANCELING to CANCELED.
> > 2020-10-28 00:29:47,570 INFO
> > org.apache.flink.runtime.executiongraph.ExecutionGraph       [] - Source:
> > Custom Source -> Flat Map -> Timestamps/Watermarks (1/3)
> > (ae5e40830a57bbd118db2f8ee86a00ae) switched from CANCELING to CANCELED.
> > 2020-10-28 00:29:47,594 INFO
> > org.apache.flink.runtime.executiongraph.ExecutionGraph       [] -
> > KeyedProcess -> async wait operator -> Map (3/3)
> > (a42963633bf0a142c082ec0e424666b3) switched from CANCELING to CANCELED.
> > 2020-10-28 00:29:50,845 INFO  org.apache.flink.yarn.YarnResourceManager
> > [] - Closing TaskExecutor connection
> > container_1591067037248_153639_01_000003 because: Container killed on
> > request. Exit code is 137
> > Container exited with a non-zero exit code 137
> > Killed by external signal
> >
> > 2020-10-28 00:29:50,846 INFO
> > org.apache.flink.runtime.executiongraph.ExecutionGraph       [] - Source:
> > Custom Source -> Flat Map -> Timestamps/Watermarks (2/3)
> > (828066cde4cda22eb4756366eafac229) switched from CANCELING to CANCELED.
> > 2020-10-28 00:29:50,846 INFO
> > org.apache.flink.runtime.executiongraph.ExecutionGraph       [] -
> > Discarding
> > the results produced by task execution 828066cde4cda22eb4756366eafac229.
> > 2020-10-28 00:29:50,846 INFO
> > org.apache.flink.runtime.executiongraph.ExecutionGraph       [] -
> > Discarding
> > the results produced by task execution 828066cde4cda22eb4756366eafac229.
> > 2020-10-28 00:29:50,846 INFO
> > org.apache.flink.runtime.executiongraph.ExecutionGraph       [] -
> > KeyedProcess -> async wait operator -> Map (2/3)
> > (70eb6b6d5a363910f8fd808024d68b8a) switched from CANCELING to CANCELED.
> > 2020-10-28 00:29:50,847 INFO
> > org.apache.flink.runtime.executiongraph.ExecutionGraph       [] -
> > Discarding
> > the results produced by task execution 70eb6b6d5a363910f8fd808024d68b8a.
> > 2020-10-28 00:29:50,847 INFO
> > org.apache.flink.runtime.executiongraph.ExecutionGraph       [] -
> > Discarding
> > the results produced by task execution 70eb6b6d5a363910f8fd808024d68b8a.
> > 2020-10-28 00:29:50,847 INFO
> > org.apache.flink.runtime.executiongraph.ExecutionGraph       [] -
> > KeyedProcess (2/3) (11cde122ba8a22ef37269c8cd051e079) switched from
> > CANCELING to CANCELED.
> > 2020-10-28 00:29:50,847 INFO
> > org.apache.flink.runtime.executiongraph.ExecutionGraph       [] -
> > Discarding
> > the results produced by task execution 11cde122ba8a22ef37269c8cd051e079.
> > 2020-10-28 00:29:50,847 INFO
> > org.apache.flink.runtime.executiongraph.ExecutionGraph       [] -
> > Discarding
> > the results produced by task execution 11cde122ba8a22ef37269c8cd051e079.
> > 2020-10-28 00:29:50,847 INFO
> > org.apache.flink.runtime.executiongraph.ExecutionGraph       [] -
> > Window(TumblingEventTimeWindows(120000), EventTimeTrigger,
> > ViewAggregateFunction, ViewSumWindowFunction) (2/3)
> > (88e1242700ba1d5a9cba5c466f51cac2) switched from CANCELING to CANCELED.
> > 2020-10-28 00:29:50,847 INFO
> > org.apache.flink.runtime.executiongraph.ExecutionGraph       [] -
> > Discarding
> > the results produced by task execution 88e1242700ba1d5a9cba5c466f51cac2.
> > 2020-10-28 00:29:50,847 INFO
> > org.apache.flink.runtime.executiongraph.ExecutionGraph       [] -
> > Discarding
> > the results produced by task execution 88e1242700ba1d5a9cba5c466f51cac2.
> > 2020-10-28 00:29:50,847 INFO
> > org.apache.flink.runtime.executiongraph.ExecutionGraph       [] - Co-Flat
> > Map (2/3) (6442e15db194a591c32a821e18198686) switched from CANCELING to
> > CANCELED.
> > 2020-10-28 00:29:50,847 INFO
> > org.apache.flink.runtime.executiongraph.ExecutionGraph       [] -
> > Discarding
> > the results produced by task execution 6442e15db194a591c32a821e18198686.
> > 2020-10-28 00:29:50,847 INFO
> > org.apache.flink.runtime.executiongraph.ExecutionGraph       [] -
> > Discarding
> > the results produced by task execution 6442e15db194a591c32a821e18198686.
> > 2020-10-28 00:29:50,847 INFO
> > org.apache.flink.runtime.executiongraph.ExecutionGraph       [] -
> > KeyedProcess (2/3) (82734fa6851b2dcd769b34f7d8d1afaa) switched from
> > CANCELING to CANCELED.
> > 2020-10-28 00:29:50,847 INFO
> > org.apache.flink.runtime.executiongraph.ExecutionGraph       [] -
> > Discarding
> > the results produced by task execution 82734fa6851b2dcd769b34f7d8d1afaa.
> > 2020-10-28 00:29:50,847 INFO
> > org.apache.flink.runtime.executiongraph.ExecutionGraph       [] -
> > Discarding
> > the results produced by task execution 82734fa6851b2dcd769b34f7d8d1afaa.
> > 2020-10-28 00:29:50,850 INFO
> > org.apache.flink.runtime.executiongraph.ExecutionGraph       [] - Job
> > static_order_gmv_by_paydate (031e5f122711786fcc11ee6eb47291fa) switched
> > from
> > state RESTARTING to RUNNING.
> > 2020-10-28 00:29:50,851 INFO
> > org.apache.flink.runtime.checkpoint.ZooKeeperCompletedCheckpointStore [] -
> > Recovering checkpoints from ZooKeeper.