怎么排查taskmanager频繁挂掉的原因?

classic Classic list List threaded Threaded
5 messages Options
Reply | Threaded
Open this post in threaded view
|

怎么排查taskmanager频繁挂掉的原因?

Jeff
hi all,
最近上线一批新任务后taskmanager频繁挂掉,很可能是OOM问题,操作系统日志里没找到相关记录,flink日志只找到如下部分,但还是不确定是什么原因,请问要怎么确定原因呢?




id, channel, rowtime) -> select: (appid, channel, rowtime, 1 AS $f3) b91d36766995398a9b0c9416ac1fb6bc.
2020-05-14 08:55:30,504 ERROR org.apache.flink.runtime.taskmanager.Task - Task did not exit gracefully within 180 + seconds.
2020-05-14 08:55:30,505 ERROR org.apache.flink.runtime.taskexecutor.TaskExecutor - Task did not exit gracefully within 180 + seconds.
2020-05-14 08:55:30,505 ERROR org.apache.flink.runtime.taskexecutor.TaskManagerRunner - Fatal error occurred while executing the TaskManager. Shutting it down...
2020-05-14 08:55:30,505 INFO org.apache.flink.runtime.taskexecutor.TaskExecutor - Stopping TaskExecutor akka.tcp://flink@172.24.150.24:48412/user/taskmanager_0.
2020-05-14 08:55:30,508 INFO org.apache.flink.runtime.taskexecutor.JobLeaderService - Stop job leader service.
2020-05-14 08:55:30,510 INFO org.apache.flink.runtime.state.TaskExecutorLocalStateStoresManager - Shutting down TaskExecutorLocalStateStoresManager.
2020-05-14 08:55:30,512 INFO org.apache.flink.runtime.io.disk.iomanager.IOManager - I/O manager removed spill file directory /tmp/flink-io-c882cdf4-840c-4c62-800e-0ca3226be20b
2020-05-14 08:55:30,512 INFO org.apache.flink.runtime.io.network.NetworkEnvironment - Shutting down the network environment and its components.
2020-05-14 08:55:30,514 INFO org.apache.flink.runtime.io.network.netty.NettyClient - Successful shutdown (took 2 ms).
2020-05-14 08:55:30,517 INFO org.apache.flink.runtime.io.network.netty.NettyServer - Successful shutdown (took 2 ms).
2020-05-14 08:55:30,545 INFO org.apache.flink.runtime.taskexecutor.JobLeaderService - Stop job leader service.
2020-05-14 08:55:30,546 INFO org.apache.flink.runtime.filecache.FileCache - removed file cache directory /tmp/flink-dist-cache-3e0d4351-d5aa-4358-a028-9a59f398d92f
2020-05-14 08:55:30,550 INFO org.apache.flink.runtime.taskexecutor.TaskExecutor - Stopped TaskExecutor akka.tcp://flink@172.24.150.24:48412/user/taskmanager_0.
2020-05-14 08:55:30,552 INFO org.apache.flink.runtime.blob.PermanentBlobCache - Shutting down BLOB cache
2020-05-14 08:55:30,554 INFO org.apache.flink.runtime.blob.TransientBlobCache - Shutting down BLOB cache
2020-05-14 08:55:30,563 INFO org.apache.flink.runtime.rpc.akka.AkkaRpcService - Stopping Akka RPC service.
2020-05-14 08:55:30,566 INFO akka.remote.RemoteActorRefProvider$RemotingTerminator - Shutting down remote daemon.
2020-05-14 08:55:30,567 INFO akka.remote.RemoteActorRefProvider$RemotingTerminator - Remote daemon shut down; proceeding with flushing remote transports.
2020-05-14 08:55:30,570 INFO akka.remote.RemoteActorRefProvider$RemotingTerminator - Shutting down remote daemon.
2020-05-14 08:55:30,571 INFO akka.remote.RemoteActorRefProvider$RemotingTerminator - Remote daemon shut down; proceeding with flushing remote transports.
2020-05-14 08:55:30,571 INFO org.apache.flink.runtime.taskmanager.Task - Source: KafkaTableSource switched from RUNNING to FAILED.
java.lang.RuntimeException: segment has been freed
at org.apache.flink.streaming.runtime.io.RecordWriterOutput.pushToRecordWriter(RecordWriterOutput.java:110)
at org.apache.flink.streaming.runtime.io.RecordWriterOutput.collect(RecordWriterOutput.java:89)
at org.apache.flink.streaming.runtime.io.RecordWriterOutput.collect(RecordWriterOutput.java:45)
at org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:718)
at org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:696)
at org.apache.flink.streaming.api.operators.TimestampedCollector.collect(TimestampedCollector.java:51)
at org.apache.flink.table.runtime.CRowWrappingCollector.collect(CRowWrappingCollector.scala:37)
at org.apache.flink.table.runtime.CRowWrappingCollector.collect(CRowWrappingCollector.scala:28)
at DataStreamCalcRule$2658.processElement(Unknown Source)
at org.apache.flink.table.runtime.CRowProcessRunner.processElement(CRowProcessRunner.scala:66)
at org.apache.flink.table.runtime.CRowProcessRunner.processElement(CRowProcessRunner.scala:35)
at org.apache.flink.streaming.api.operators.ProcessOperator.processElement(ProcessOperator.java:66)
at org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.pushToOperator(OperatorChain.java:579)
at org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:554)
at org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:534)
at org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:718)
at org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:696)
at org.apache.flink.streaming.api.operators.TimestampedCollector.collect(TimestampedCollector.java:51)
at org.apache.flink.table.runtime.CRowWrappingCollector.collect(CRowWrappingCollector.scala:37)
at org.apache.flink.table.runtime.CRowWrappingCollector.collect(CRowWrappingCollector.scala:28)
at DataStreamSourceConversion$2595.processElement(Unknown Source)
at org.apache.flink.table.runtime.CRowOutputProcessRunner.processElement(CRowOutputProcessRunner.scala:70)
at org.apache.flink.streaming.api.operators.ProcessOperator.processElement(ProcessOperator.java:66)
at org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.pushToOperator(OperatorChain.java:579)
at org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:554)
at org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:534)
at org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:718)
at org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:696)
at org.apache.flink.streaming.api.operators.StreamSourceContexts$NonTimestampContext.collect(StreamSourceContexts.java:104)
at org.apache.flink.streaming.api.operators.StreamSourceContexts$NonTimestampContext.collectWithTimestamp(StreamSourceContexts.java:111)
at org.apache.flink.streaming.connectors.kafka.internals.AbstractFetcher.emitRecordWithTimestamp(AbstractFetcher.java:398)
at org.apache.flink.streaming.connectors.kafka.internal.KafkaFetcher.emitRecord(KafkaFetcher.java:185)
at org.apache.flink.streaming.connectors.kafka.internal.KafkaFetcher.runFetchLoop(KafkaFetcher.java:150)
at org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumerBase.run(FlinkKafkaConsumerBase.java:711)
at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:93)
at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:57)
at org.apache.flink.streaming.runtime.tasks.SourceStreamTask.run(SourceStreamTask.java:97)
at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:302)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:711)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.IllegalStateException: segment has been freed
at org.apache.flink.core.memory.HybridMemorySegment.put(HybridMemorySegment.java:228)
at org.apache.flink.core.memory.HybridMemorySegment.put(HybridMemorySegment.java:381)
at org.apache.flink.runtime.io.network.buffer.BufferBuilder.append(BufferBuilder.java:85)
at org.apache.flink.runtime.io.network.api.serialization.SpanningRecordSerializer.copyToBufferBuilder(SpanningRecordSerializer.java:95)
at org.apache.flink.runtime.io.network.api.writer.RecordWriter.copyFromSerializerToTargetChannel(RecordWriter.java:178)
at org.apache.flink.runtime.io.network.api.writer.RecordWriter.emit(RecordWriter.java:162)
at org.apache.flink.runtime.io.network.api.writer.RecordWriter.emit(RecordWriter.java:128)
at org.apache.flink.streaming.runtime.io.RecordWriterOutput.pushToRecordWriter(RecordWriterOutput.java:107)
... 39 more
2020-05-14 08:55:30,572 INFO org.apache.flink.runtime.taskmanager.Task - Freeing task resources for Source: KafkaTableSource
2020-05-14 08:55:30,572 ERROR org.apache.flink.runtime.taskmanager.Task - FATAL - exception in resource cleanup of task Source: KafkaTableSource.
java.lang.IllegalStateException: Memory manager has been shut down.
at org.apache.flink.runtime.memory.MemoryManager.releaseAll(MemoryManager.java:480)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:828)
at java.lang.Thread.run(Thread.java:745)
2020-05-14 08:55:30,572 ERROR org.apache.flink.runtime.taskexecutor.TaskExecutor - FATAL - exception in resource cleanup of task Source: KafkaTableSource(.
java.lang.IllegalStateException: Memory manager has been shut down.
at org.apache.flink.runtime.memory.MemoryManager.releaseAll(MemoryManager.java:480)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:828)
at java.lang.Thread.run(Thread.java:745)
2020-05-14 08:55:30,572 ERROR org.apache.flink.runtime.taskexecutor.TaskManagerRunner - Fatal error occurred while executing the TaskManager. Shutting it down...
java.lang.IllegalStateException: Memory manager has been shut down.
at org.apache.flink.runtime.memory.MemoryManager.releaseAll(MemoryManager.java:480)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:828)
at java.lang.Thread.run(Thread.java:745)
2020-05-14 08:55:30,587 INFO akka.remote.RemoteActorRefProvider$RemotingTerminator - Remoting shut down.
2020-05-14 08:55:30,587 INFO akka.remote.RemoteActorRefProvider$RemotingTerminator - Remoting shut down.
2020-05-14 08:55:30,600 INFO org.apache.flink.runtime.rpc.akka.AkkaRpcService - Stopped Akka RPC service.
Reply | Threaded
Open this post in threaded view
|

回复:怎么排查taskmanager频繁挂掉的原因?

shao.hongxiao
你的是batch 模式吗




| |
邵红晓
|
|
邮箱:[hidden email]
|

签名由 网易邮箱大师 定制

在2020年05月15日 15:05,Jeff 写道:
hi all,
最近上线一批新任务后taskmanager频繁挂掉,很可能是OOM问题,操作系统日志里没找到相关记录,flink日志只找到如下部分,但还是不确定是什么原因,请问要怎么确定原因呢?




id, channel, rowtime) -> select: (appid, channel, rowtime, 1 AS $f3) b91d36766995398a9b0c9416ac1fb6bc.
2020-05-14 08:55:30,504 ERROR org.apache.flink.runtime.taskmanager.Task - Task did not exit gracefully within 180 + seconds.
2020-05-14 08:55:30,505 ERROR org.apache.flink.runtime.taskexecutor.TaskExecutor - Task did not exit gracefully within 180 + seconds.
2020-05-14 08:55:30,505 ERROR org.apache.flink.runtime.taskexecutor.TaskManagerRunner - Fatal error occurred while executing the TaskManager. Shutting it down...
2020-05-14 08:55:30,505 INFO org.apache.flink.runtime.taskexecutor.TaskExecutor - Stopping TaskExecutor akka.tcp://flink@172.24.150.24:48412/user/taskmanager_0.
2020-05-14 08:55:30,508 INFO org.apache.flink.runtime.taskexecutor.JobLeaderService - Stop job leader service.
2020-05-14 08:55:30,510 INFO org.apache.flink.runtime.state.TaskExecutorLocalStateStoresManager - Shutting down TaskExecutorLocalStateStoresManager.
2020-05-14 08:55:30,512 INFO org.apache.flink.runtime.io.disk.iomanager.IOManager - I/O manager removed spill file directory /tmp/flink-io-c882cdf4-840c-4c62-800e-0ca3226be20b
2020-05-14 08:55:30,512 INFO org.apache.flink.runtime.io.network.NetworkEnvironment - Shutting down the network environment and its components.
2020-05-14 08:55:30,514 INFO org.apache.flink.runtime.io.network.netty.NettyClient - Successful shutdown (took 2 ms).
2020-05-14 08:55:30,517 INFO org.apache.flink.runtime.io.network.netty.NettyServer - Successful shutdown (took 2 ms).
2020-05-14 08:55:30,545 INFO org.apache.flink.runtime.taskexecutor.JobLeaderService - Stop job leader service.
2020-05-14 08:55:30,546 INFO org.apache.flink.runtime.filecache.FileCache - removed file cache directory /tmp/flink-dist-cache-3e0d4351-d5aa-4358-a028-9a59f398d92f
2020-05-14 08:55:30,550 INFO org.apache.flink.runtime.taskexecutor.TaskExecutor - Stopped TaskExecutor akka.tcp://flink@172.24.150.24:48412/user/taskmanager_0.
2020-05-14 08:55:30,552 INFO org.apache.flink.runtime.blob.PermanentBlobCache - Shutting down BLOB cache
2020-05-14 08:55:30,554 INFO org.apache.flink.runtime.blob.TransientBlobCache - Shutting down BLOB cache
2020-05-14 08:55:30,563 INFO org.apache.flink.runtime.rpc.akka.AkkaRpcService - Stopping Akka RPC service.
2020-05-14 08:55:30,566 INFO akka.remote.RemoteActorRefProvider$RemotingTerminator - Shutting down remote daemon.
2020-05-14 08:55:30,567 INFO akka.remote.RemoteActorRefProvider$RemotingTerminator - Remote daemon shut down; proceeding with flushing remote transports.
2020-05-14 08:55:30,570 INFO akka.remote.RemoteActorRefProvider$RemotingTerminator - Shutting down remote daemon.
2020-05-14 08:55:30,571 INFO akka.remote.RemoteActorRefProvider$RemotingTerminator - Remote daemon shut down; proceeding with flushing remote transports.
2020-05-14 08:55:30,571 INFO org.apache.flink.runtime.taskmanager.Task - Source: KafkaTableSource switched from RUNNING to FAILED.
java.lang.RuntimeException: segment has been freed
at org.apache.flink.streaming.runtime.io.RecordWriterOutput.pushToRecordWriter(RecordWriterOutput.java:110)
at org.apache.flink.streaming.runtime.io.RecordWriterOutput.collect(RecordWriterOutput.java:89)
at org.apache.flink.streaming.runtime.io.RecordWriterOutput.collect(RecordWriterOutput.java:45)
at org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:718)
at org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:696)
at org.apache.flink.streaming.api.operators.TimestampedCollector.collect(TimestampedCollector.java:51)
at org.apache.flink.table.runtime.CRowWrappingCollector.collect(CRowWrappingCollector.scala:37)
at org.apache.flink.table.runtime.CRowWrappingCollector.collect(CRowWrappingCollector.scala:28)
at DataStreamCalcRule$2658.processElement(Unknown Source)
at org.apache.flink.table.runtime.CRowProcessRunner.processElement(CRowProcessRunner.scala:66)
at org.apache.flink.table.runtime.CRowProcessRunner.processElement(CRowProcessRunner.scala:35)
at org.apache.flink.streaming.api.operators.ProcessOperator.processElement(ProcessOperator.java:66)
at org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.pushToOperator(OperatorChain.java:579)
at org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:554)
at org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:534)
at org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:718)
at org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:696)
at org.apache.flink.streaming.api.operators.TimestampedCollector.collect(TimestampedCollector.java:51)
at org.apache.flink.table.runtime.CRowWrappingCollector.collect(CRowWrappingCollector.scala:37)
at org.apache.flink.table.runtime.CRowWrappingCollector.collect(CRowWrappingCollector.scala:28)
at DataStreamSourceConversion$2595.processElement(Unknown Source)
at org.apache.flink.table.runtime.CRowOutputProcessRunner.processElement(CRowOutputProcessRunner.scala:70)
at org.apache.flink.streaming.api.operators.ProcessOperator.processElement(ProcessOperator.java:66)
at org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.pushToOperator(OperatorChain.java:579)
at org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:554)
at org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:534)
at org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:718)
at org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:696)
at org.apache.flink.streaming.api.operators.StreamSourceContexts$NonTimestampContext.collect(StreamSourceContexts.java:104)
at org.apache.flink.streaming.api.operators.StreamSourceContexts$NonTimestampContext.collectWithTimestamp(StreamSourceContexts.java:111)
at org.apache.flink.streaming.connectors.kafka.internals.AbstractFetcher.emitRecordWithTimestamp(AbstractFetcher.java:398)
at org.apache.flink.streaming.connectors.kafka.internal.KafkaFetcher.emitRecord(KafkaFetcher.java:185)
at org.apache.flink.streaming.connectors.kafka.internal.KafkaFetcher.runFetchLoop(KafkaFetcher.java:150)
at org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumerBase.run(FlinkKafkaConsumerBase.java:711)
at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:93)
at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:57)
at org.apache.flink.streaming.runtime.tasks.SourceStreamTask.run(SourceStreamTask.java:97)
at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:302)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:711)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.IllegalStateException: segment has been freed
at org.apache.flink.core.memory.HybridMemorySegment.put(HybridMemorySegment.java:228)
at org.apache.flink.core.memory.HybridMemorySegment.put(HybridMemorySegment.java:381)
at org.apache.flink.runtime.io.network.buffer.BufferBuilder.append(BufferBuilder.java:85)
at org.apache.flink.runtime.io.network.api.serialization.SpanningRecordSerializer.copyToBufferBuilder(SpanningRecordSerializer.java:95)
at org.apache.flink.runtime.io.network.api.writer.RecordWriter.copyFromSerializerToTargetChannel(RecordWriter.java:178)
at org.apache.flink.runtime.io.network.api.writer.RecordWriter.emit(RecordWriter.java:162)
at org.apache.flink.runtime.io.network.api.writer.RecordWriter.emit(RecordWriter.java:128)
at org.apache.flink.streaming.runtime.io.RecordWriterOutput.pushToRecordWriter(RecordWriterOutput.java:107)
... 39 more
2020-05-14 08:55:30,572 INFO org.apache.flink.runtime.taskmanager.Task - Freeing task resources for Source: KafkaTableSource
2020-05-14 08:55:30,572 ERROR org.apache.flink.runtime.taskmanager.Task - FATAL - exception in resource cleanup of task Source: KafkaTableSource.
java.lang.IllegalStateException: Memory manager has been shut down.
at org.apache.flink.runtime.memory.MemoryManager.releaseAll(MemoryManager.java:480)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:828)
at java.lang.Thread.run(Thread.java:745)
2020-05-14 08:55:30,572 ERROR org.apache.flink.runtime.taskexecutor.TaskExecutor - FATAL - exception in resource cleanup of task Source: KafkaTableSource(.
java.lang.IllegalStateException: Memory manager has been shut down.
at org.apache.flink.runtime.memory.MemoryManager.releaseAll(MemoryManager.java:480)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:828)
at java.lang.Thread.run(Thread.java:745)
2020-05-14 08:55:30,572 ERROR org.apache.flink.runtime.taskexecutor.TaskManagerRunner - Fatal error occurred while executing the TaskManager. Shutting it down...
java.lang.IllegalStateException: Memory manager has been shut down.
at org.apache.flink.runtime.memory.MemoryManager.releaseAll(MemoryManager.java:480)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:828)
at java.lang.Thread.run(Thread.java:745)
2020-05-14 08:55:30,587 INFO akka.remote.RemoteActorRefProvider$RemotingTerminator - Remoting shut down.
2020-05-14 08:55:30,587 INFO akka.remote.RemoteActorRefProvider$RemotingTerminator - Remoting shut down.
2020-05-14 08:55:30,600 INFO org.apache.flink.runtime.rpc.akka.AkkaRpcService - Stopped Akka RPC service.
Reply | Threaded
Open this post in threaded view
|

回复:怎么排查taskmanager频繁挂掉的原因?

jimandlice
大佬 也看看我的问题呀




| |
jimandlice
|
|
邮箱:[hidden email]
|

Signature is customized by Netease Mail Master

在2020年05月15日 15:14,shao.hongxiao 写道:
你的是batch 模式吗




| |
邵红晓
|
|
邮箱:[hidden email]
|

签名由 网易邮箱大师 定制

在2020年05月15日 15:05,Jeff 写道:
hi all,
最近上线一批新任务后taskmanager频繁挂掉,很可能是OOM问题,操作系统日志里没找到相关记录,flink日志只找到如下部分,但还是不确定是什么原因,请问要怎么确定原因呢?




id, channel, rowtime) -> select: (appid, channel, rowtime, 1 AS $f3) b91d36766995398a9b0c9416ac1fb6bc.
2020-05-14 08:55:30,504 ERROR org.apache.flink.runtime.taskmanager.Task - Task did not exit gracefully within 180 + seconds.
2020-05-14 08:55:30,505 ERROR org.apache.flink.runtime.taskexecutor.TaskExecutor - Task did not exit gracefully within 180 + seconds.
2020-05-14 08:55:30,505 ERROR org.apache.flink.runtime.taskexecutor.TaskManagerRunner - Fatal error occurred while executing the TaskManager. Shutting it down...
2020-05-14 08:55:30,505 INFO org.apache.flink.runtime.taskexecutor.TaskExecutor - Stopping TaskExecutor akka.tcp://flink@172.24.150.24:48412/user/taskmanager_0.
2020-05-14 08:55:30,508 INFO org.apache.flink.runtime.taskexecutor.JobLeaderService - Stop job leader service.
2020-05-14 08:55:30,510 INFO org.apache.flink.runtime.state.TaskExecutorLocalStateStoresManager - Shutting down TaskExecutorLocalStateStoresManager.
2020-05-14 08:55:30,512 INFO org.apache.flink.runtime.io.disk.iomanager.IOManager - I/O manager removed spill file directory /tmp/flink-io-c882cdf4-840c-4c62-800e-0ca3226be20b
2020-05-14 08:55:30,512 INFO org.apache.flink.runtime.io.network.NetworkEnvironment - Shutting down the network environment and its components.
2020-05-14 08:55:30,514 INFO org.apache.flink.runtime.io.network.netty.NettyClient - Successful shutdown (took 2 ms).
2020-05-14 08:55:30,517 INFO org.apache.flink.runtime.io.network.netty.NettyServer - Successful shutdown (took 2 ms).
2020-05-14 08:55:30,545 INFO org.apache.flink.runtime.taskexecutor.JobLeaderService - Stop job leader service.
2020-05-14 08:55:30,546 INFO org.apache.flink.runtime.filecache.FileCache - removed file cache directory /tmp/flink-dist-cache-3e0d4351-d5aa-4358-a028-9a59f398d92f
2020-05-14 08:55:30,550 INFO org.apache.flink.runtime.taskexecutor.TaskExecutor - Stopped TaskExecutor akka.tcp://flink@172.24.150.24:48412/user/taskmanager_0.
2020-05-14 08:55:30,552 INFO org.apache.flink.runtime.blob.PermanentBlobCache - Shutting down BLOB cache
2020-05-14 08:55:30,554 INFO org.apache.flink.runtime.blob.TransientBlobCache - Shutting down BLOB cache
2020-05-14 08:55:30,563 INFO org.apache.flink.runtime.rpc.akka.AkkaRpcService - Stopping Akka RPC service.
2020-05-14 08:55:30,566 INFO akka.remote.RemoteActorRefProvider$RemotingTerminator - Shutting down remote daemon.
2020-05-14 08:55:30,567 INFO akka.remote.RemoteActorRefProvider$RemotingTerminator - Remote daemon shut down; proceeding with flushing remote transports.
2020-05-14 08:55:30,570 INFO akka.remote.RemoteActorRefProvider$RemotingTerminator - Shutting down remote daemon.
2020-05-14 08:55:30,571 INFO akka.remote.RemoteActorRefProvider$RemotingTerminator - Remote daemon shut down; proceeding with flushing remote transports.
2020-05-14 08:55:30,571 INFO org.apache.flink.runtime.taskmanager.Task - Source: KafkaTableSource switched from RUNNING to FAILED.
java.lang.RuntimeException: segment has been freed
at org.apache.flink.streaming.runtime.io.RecordWriterOutput.pushToRecordWriter(RecordWriterOutput.java:110)
at org.apache.flink.streaming.runtime.io.RecordWriterOutput.collect(RecordWriterOutput.java:89)
at org.apache.flink.streaming.runtime.io.RecordWriterOutput.collect(RecordWriterOutput.java:45)
at org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:718)
at org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:696)
at org.apache.flink.streaming.api.operators.TimestampedCollector.collect(TimestampedCollector.java:51)
at org.apache.flink.table.runtime.CRowWrappingCollector.collect(CRowWrappingCollector.scala:37)
at org.apache.flink.table.runtime.CRowWrappingCollector.collect(CRowWrappingCollector.scala:28)
at DataStreamCalcRule$2658.processElement(Unknown Source)
at org.apache.flink.table.runtime.CRowProcessRunner.processElement(CRowProcessRunner.scala:66)
at org.apache.flink.table.runtime.CRowProcessRunner.processElement(CRowProcessRunner.scala:35)
at org.apache.flink.streaming.api.operators.ProcessOperator.processElement(ProcessOperator.java:66)
at org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.pushToOperator(OperatorChain.java:579)
at org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:554)
at org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:534)
at org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:718)
at org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:696)
at org.apache.flink.streaming.api.operators.TimestampedCollector.collect(TimestampedCollector.java:51)
at org.apache.flink.table.runtime.CRowWrappingCollector.collect(CRowWrappingCollector.scala:37)
at org.apache.flink.table.runtime.CRowWrappingCollector.collect(CRowWrappingCollector.scala:28)
at DataStreamSourceConversion$2595.processElement(Unknown Source)
at org.apache.flink.table.runtime.CRowOutputProcessRunner.processElement(CRowOutputProcessRunner.scala:70)
at org.apache.flink.streaming.api.operators.ProcessOperator.processElement(ProcessOperator.java:66)
at org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.pushToOperator(OperatorChain.java:579)
at org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:554)
at org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:534)
at org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:718)
at org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:696)
at org.apache.flink.streaming.api.operators.StreamSourceContexts$NonTimestampContext.collect(StreamSourceContexts.java:104)
at org.apache.flink.streaming.api.operators.StreamSourceContexts$NonTimestampContext.collectWithTimestamp(StreamSourceContexts.java:111)
at org.apache.flink.streaming.connectors.kafka.internals.AbstractFetcher.emitRecordWithTimestamp(AbstractFetcher.java:398)
at org.apache.flink.streaming.connectors.kafka.internal.KafkaFetcher.emitRecord(KafkaFetcher.java:185)
at org.apache.flink.streaming.connectors.kafka.internal.KafkaFetcher.runFetchLoop(KafkaFetcher.java:150)
at org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumerBase.run(FlinkKafkaConsumerBase.java:711)
at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:93)
at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:57)
at org.apache.flink.streaming.runtime.tasks.SourceStreamTask.run(SourceStreamTask.java:97)
at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:302)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:711)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.IllegalStateException: segment has been freed
at org.apache.flink.core.memory.HybridMemorySegment.put(HybridMemorySegment.java:228)
at org.apache.flink.core.memory.HybridMemorySegment.put(HybridMemorySegment.java:381)
at org.apache.flink.runtime.io.network.buffer.BufferBuilder.append(BufferBuilder.java:85)
at org.apache.flink.runtime.io.network.api.serialization.SpanningRecordSerializer.copyToBufferBuilder(SpanningRecordSerializer.java:95)
at org.apache.flink.runtime.io.network.api.writer.RecordWriter.copyFromSerializerToTargetChannel(RecordWriter.java:178)
at org.apache.flink.runtime.io.network.api.writer.RecordWriter.emit(RecordWriter.java:162)
at org.apache.flink.runtime.io.network.api.writer.RecordWriter.emit(RecordWriter.java:128)
at org.apache.flink.streaming.runtime.io.RecordWriterOutput.pushToRecordWriter(RecordWriterOutput.java:107)
... 39 more
2020-05-14 08:55:30,572 INFO org.apache.flink.runtime.taskmanager.Task - Freeing task resources for Source: KafkaTableSource
2020-05-14 08:55:30,572 ERROR org.apache.flink.runtime.taskmanager.Task - FATAL - exception in resource cleanup of task Source: KafkaTableSource.
java.lang.IllegalStateException: Memory manager has been shut down.
at org.apache.flink.runtime.memory.MemoryManager.releaseAll(MemoryManager.java:480)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:828)
at java.lang.Thread.run(Thread.java:745)
2020-05-14 08:55:30,572 ERROR org.apache.flink.runtime.taskexecutor.TaskExecutor - FATAL - exception in resource cleanup of task Source: KafkaTableSource(.
java.lang.IllegalStateException: Memory manager has been shut down.
at org.apache.flink.runtime.memory.MemoryManager.releaseAll(MemoryManager.java:480)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:828)
at java.lang.Thread.run(Thread.java:745)
2020-05-14 08:55:30,572 ERROR org.apache.flink.runtime.taskexecutor.TaskManagerRunner - Fatal error occurred while executing the TaskManager. Shutting it down...
java.lang.IllegalStateException: Memory manager has been shut down.
at org.apache.flink.runtime.memory.MemoryManager.releaseAll(MemoryManager.java:480)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:828)
at java.lang.Thread.run(Thread.java:745)
2020-05-14 08:55:30,587 INFO akka.remote.RemoteActorRefProvider$RemotingTerminator - Remoting shut down.
2020-05-14 08:55:30,587 INFO akka.remote.RemoteActorRefProvider$RemotingTerminator - Remoting shut down.
2020-05-14 08:55:30,600 INFO org.apache.flink.runtime.rpc.akka.AkkaRpcService - Stopped Akka RPC service.
Reply | Threaded
Open this post in threaded view
|

Re:回复:怎么排查taskmanager频繁挂掉的原因?

Jeff
In reply to this post by shao.hongxiao



不是,是用per-job方式提交的














在 2020-05-15 14:14:20,"shao.hongxiao" <[hidden email]> 写道:

>你的是batch 模式吗
>
>
>
>
>| |
>邵红晓
>|
>|
>邮箱:[hidden email]
>|
>
>签名由 网易邮箱大师 定制
>
>在2020年05月15日 15:05,Jeff 写道:
>hi all,
>最近上线一批新任务后taskmanager频繁挂掉,很可能是OOM问题,操作系统日志里没找到相关记录,flink日志只找到如下部分,但还是不确定是什么原因,请问要怎么确定原因呢?
>
>
>
>
>id, channel, rowtime) -> select: (appid, channel, rowtime, 1 AS $f3) b91d36766995398a9b0c9416ac1fb6bc.
>2020-05-14 08:55:30,504 ERROR org.apache.flink.runtime.taskmanager.Task - Task did not exit gracefully within 180 + seconds.
>2020-05-14 08:55:30,505 ERROR org.apache.flink.runtime.taskexecutor.TaskExecutor - Task did not exit gracefully within 180 + seconds.
>2020-05-14 08:55:30,505 ERROR org.apache.flink.runtime.taskexecutor.TaskManagerRunner - Fatal error occurred while executing the TaskManager. Shutting it down...
>2020-05-14 08:55:30,505 INFO org.apache.flink.runtime.taskexecutor.TaskExecutor - Stopping TaskExecutor akka.tcp://flink@172.24.150.24:48412/user/taskmanager_0.
>2020-05-14 08:55:30,508 INFO org.apache.flink.runtime.taskexecutor.JobLeaderService - Stop job leader service.
>2020-05-14 08:55:30,510 INFO org.apache.flink.runtime.state.TaskExecutorLocalStateStoresManager - Shutting down TaskExecutorLocalStateStoresManager.
>2020-05-14 08:55:30,512 INFO org.apache.flink.runtime.io.disk.iomanager.IOManager - I/O manager removed spill file directory /tmp/flink-io-c882cdf4-840c-4c62-800e-0ca3226be20b
>2020-05-14 08:55:30,512 INFO org.apache.flink.runtime.io.network.NetworkEnvironment - Shutting down the network environment and its components.
>2020-05-14 08:55:30,514 INFO org.apache.flink.runtime.io.network.netty.NettyClient - Successful shutdown (took 2 ms).
>2020-05-14 08:55:30,517 INFO org.apache.flink.runtime.io.network.netty.NettyServer - Successful shutdown (took 2 ms).
>2020-05-14 08:55:30,545 INFO org.apache.flink.runtime.taskexecutor.JobLeaderService - Stop job leader service.
>2020-05-14 08:55:30,546 INFO org.apache.flink.runtime.filecache.FileCache - removed file cache directory /tmp/flink-dist-cache-3e0d4351-d5aa-4358-a028-9a59f398d92f
>2020-05-14 08:55:30,550 INFO org.apache.flink.runtime.taskexecutor.TaskExecutor - Stopped TaskExecutor akka.tcp://flink@172.24.150.24:48412/user/taskmanager_0.
>2020-05-14 08:55:30,552 INFO org.apache.flink.runtime.blob.PermanentBlobCache - Shutting down BLOB cache
>2020-05-14 08:55:30,554 INFO org.apache.flink.runtime.blob.TransientBlobCache - Shutting down BLOB cache
>2020-05-14 08:55:30,563 INFO org.apache.flink.runtime.rpc.akka.AkkaRpcService - Stopping Akka RPC service.
>2020-05-14 08:55:30,566 INFO akka.remote.RemoteActorRefProvider$RemotingTerminator - Shutting down remote daemon.
>2020-05-14 08:55:30,567 INFO akka.remote.RemoteActorRefProvider$RemotingTerminator - Remote daemon shut down; proceeding with flushing remote transports.
>2020-05-14 08:55:30,570 INFO akka.remote.RemoteActorRefProvider$RemotingTerminator - Shutting down remote daemon.
>2020-05-14 08:55:30,571 INFO akka.remote.RemoteActorRefProvider$RemotingTerminator - Remote daemon shut down; proceeding with flushing remote transports.
>2020-05-14 08:55:30,571 INFO org.apache.flink.runtime.taskmanager.Task - Source: KafkaTableSource switched from RUNNING to FAILED.
>java.lang.RuntimeException: segment has been freed
>at org.apache.flink.streaming.runtime.io.RecordWriterOutput.pushToRecordWriter(RecordWriterOutput.java:110)
>at org.apache.flink.streaming.runtime.io.RecordWriterOutput.collect(RecordWriterOutput.java:89)
>at org.apache.flink.streaming.runtime.io.RecordWriterOutput.collect(RecordWriterOutput.java:45)
>at org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:718)
>at org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:696)
>at org.apache.flink.streaming.api.operators.TimestampedCollector.collect(TimestampedCollector.java:51)
>at org.apache.flink.table.runtime.CRowWrappingCollector.collect(CRowWrappingCollector.scala:37)
>at org.apache.flink.table.runtime.CRowWrappingCollector.collect(CRowWrappingCollector.scala:28)
>at DataStreamCalcRule$2658.processElement(Unknown Source)
>at org.apache.flink.table.runtime.CRowProcessRunner.processElement(CRowProcessRunner.scala:66)
>at org.apache.flink.table.runtime.CRowProcessRunner.processElement(CRowProcessRunner.scala:35)
>at org.apache.flink.streaming.api.operators.ProcessOperator.processElement(ProcessOperator.java:66)
>at org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.pushToOperator(OperatorChain.java:579)
>at org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:554)
>at org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:534)
>at org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:718)
>at org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:696)
>at org.apache.flink.streaming.api.operators.TimestampedCollector.collect(TimestampedCollector.java:51)
>at org.apache.flink.table.runtime.CRowWrappingCollector.collect(CRowWrappingCollector.scala:37)
>at org.apache.flink.table.runtime.CRowWrappingCollector.collect(CRowWrappingCollector.scala:28)
>at DataStreamSourceConversion$2595.processElement(Unknown Source)
>at org.apache.flink.table.runtime.CRowOutputProcessRunner.processElement(CRowOutputProcessRunner.scala:70)
>at org.apache.flink.streaming.api.operators.ProcessOperator.processElement(ProcessOperator.java:66)
>at org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.pushToOperator(OperatorChain.java:579)
>at org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:554)
>at org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:534)
>at org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:718)
>at org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:696)
>at org.apache.flink.streaming.api.operators.StreamSourceContexts$NonTimestampContext.collect(StreamSourceContexts.java:104)
>at org.apache.flink.streaming.api.operators.StreamSourceContexts$NonTimestampContext.collectWithTimestamp(StreamSourceContexts.java:111)
>at org.apache.flink.streaming.connectors.kafka.internals.AbstractFetcher.emitRecordWithTimestamp(AbstractFetcher.java:398)
>at org.apache.flink.streaming.connectors.kafka.internal.KafkaFetcher.emitRecord(KafkaFetcher.java:185)
>at org.apache.flink.streaming.connectors.kafka.internal.KafkaFetcher.runFetchLoop(KafkaFetcher.java:150)
>at org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumerBase.run(FlinkKafkaConsumerBase.java:711)
>at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:93)
>at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:57)
>at org.apache.flink.streaming.runtime.tasks.SourceStreamTask.run(SourceStreamTask.java:97)
>at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:302)
>at org.apache.flink.runtime.taskmanager.Task.run(Task.java:711)
>at java.lang.Thread.run(Thread.java:745)
>Caused by: java.lang.IllegalStateException: segment has been freed
>at org.apache.flink.core.memory.HybridMemorySegment.put(HybridMemorySegment.java:228)
>at org.apache.flink.core.memory.HybridMemorySegment.put(HybridMemorySegment.java:381)
>at org.apache.flink.runtime.io.network.buffer.BufferBuilder.append(BufferBuilder.java:85)
>at org.apache.flink.runtime.io.network.api.serialization.SpanningRecordSerializer.copyToBufferBuilder(SpanningRecordSerializer.java:95)
>at org.apache.flink.runtime.io.network.api.writer.RecordWriter.copyFromSerializerToTargetChannel(RecordWriter.java:178)
>at org.apache.flink.runtime.io.network.api.writer.RecordWriter.emit(RecordWriter.java:162)
>at org.apache.flink.runtime.io.network.api.writer.RecordWriter.emit(RecordWriter.java:128)
>at org.apache.flink.streaming.runtime.io.RecordWriterOutput.pushToRecordWriter(RecordWriterOutput.java:107)
>... 39 more
>2020-05-14 08:55:30,572 INFO org.apache.flink.runtime.taskmanager.Task - Freeing task resources for Source: KafkaTableSource
>2020-05-14 08:55:30,572 ERROR org.apache.flink.runtime.taskmanager.Task - FATAL - exception in resource cleanup of task Source: KafkaTableSource.
>java.lang.IllegalStateException: Memory manager has been shut down.
>at org.apache.flink.runtime.memory.MemoryManager.releaseAll(MemoryManager.java:480)
>at org.apache.flink.runtime.taskmanager.Task.run(Task.java:828)
>at java.lang.Thread.run(Thread.java:745)
>2020-05-14 08:55:30,572 ERROR org.apache.flink.runtime.taskexecutor.TaskExecutor - FATAL - exception in resource cleanup of task Source: KafkaTableSource(.
>java.lang.IllegalStateException: Memory manager has been shut down.
>at org.apache.flink.runtime.memory.MemoryManager.releaseAll(MemoryManager.java:480)
>at org.apache.flink.runtime.taskmanager.Task.run(Task.java:828)
>at java.lang.Thread.run(Thread.java:745)
>2020-05-14 08:55:30,572 ERROR org.apache.flink.runtime.taskexecutor.TaskManagerRunner - Fatal error occurred while executing the TaskManager. Shutting it down...
>java.lang.IllegalStateException: Memory manager has been shut down.
>at org.apache.flink.runtime.memory.MemoryManager.releaseAll(MemoryManager.java:480)
>at org.apache.flink.runtime.taskmanager.Task.run(Task.java:828)
>at java.lang.Thread.run(Thread.java:745)
>2020-05-14 08:55:30,587 INFO akka.remote.RemoteActorRefProvider$RemotingTerminator - Remoting shut down.
>2020-05-14 08:55:30,587 INFO akka.remote.RemoteActorRefProvider$RemotingTerminator - Remoting shut down.
>2020-05-14 08:55:30,600 INFO org.apache.flink.runtime.rpc.akka.AkkaRpcService - Stopped Akka RPC service.
Reply | Threaded
Open this post in threaded view
|

Re: 回复:怎么排查taskmanager频繁挂掉的原因?

zhisheng
可以去 yarn 上找找 jobmanager 的日志,挂掉的作业,他的 jobmanager 日志应该还在的

Jeff <[hidden email]> 于2020年5月15日周五 下午3:28写道:

>
>
>
> 不是,是用per-job方式提交的
>
>
>
>
>
>
>
>
>
>
>
>
>
>
> 在 2020-05-15 14:14:20,"shao.hongxiao" <[hidden email]> 写道:
> >你的是batch 模式吗
> >
> >
> >
> >
> >| |
> >邵红晓
> >|
> >|
> >邮箱:[hidden email]
> >|
> >
> >签名由 网易邮箱大师 定制
> >
> >在2020年05月15日 15:05,Jeff 写道:
> >hi all,
>
> >最近上线一批新任务后taskmanager频繁挂掉,很可能是OOM问题,操作系统日志里没找到相关记录,flink日志只找到如下部分,但还是不确定是什么原因,请问要怎么确定原因呢?
> >
> >
> >
> >
> >id, channel, rowtime) -> select: (appid, channel, rowtime, 1 AS $f3)
> b91d36766995398a9b0c9416ac1fb6bc.
> >2020-05-14 08:55:30,504 ERROR org.apache.flink.runtime.taskmanager.Task -
> Task did not exit gracefully within 180 + seconds.
> >2020-05-14 08:55:30,505 ERROR
> org.apache.flink.runtime.taskexecutor.TaskExecutor - Task did not exit
> gracefully within 180 + seconds.
> >2020-05-14 08:55:30,505 ERROR
> org.apache.flink.runtime.taskexecutor.TaskManagerRunner - Fatal error
> occurred while executing the TaskManager. Shutting it down...
> >2020-05-14 08:55:30,505 INFO
> org.apache.flink.runtime.taskexecutor.TaskExecutor - Stopping TaskExecutor
> akka.tcp://flink@172.24.150.24:48412/user/taskmanager_0.
> >2020-05-14 08:55:30,508 INFO
> org.apache.flink.runtime.taskexecutor.JobLeaderService - Stop job leader
> service.
> >2020-05-14 08:55:30,510 INFO
> org.apache.flink.runtime.state.TaskExecutorLocalStateStoresManager -
> Shutting down TaskExecutorLocalStateStoresManager.
> >2020-05-14 08:55:30,512 INFO org.apache.flink.runtime.io.disk.iomanager.IOManager
> - I/O manager removed spill file directory
> /tmp/flink-io-c882cdf4-840c-4c62-800e-0ca3226be20b
> >2020-05-14 08:55:30,512 INFO org.apache.flink.runtime.io.network.NetworkEnvironment
> - Shutting down the network environment and its components.
> >2020-05-14 08:55:30,514 INFO org.apache.flink.runtime.io.network.netty.NettyClient
> - Successful shutdown (took 2 ms).
> >2020-05-14 08:55:30,517 INFO org.apache.flink.runtime.io.network.netty.NettyServer
> - Successful shutdown (took 2 ms).
> >2020-05-14 08:55:30,545 INFO
> org.apache.flink.runtime.taskexecutor.JobLeaderService - Stop job leader
> service.
> >2020-05-14 08:55:30,546 INFO org.apache.flink.runtime.filecache.FileCache
> - removed file cache directory
> /tmp/flink-dist-cache-3e0d4351-d5aa-4358-a028-9a59f398d92f
> >2020-05-14 08:55:30,550 INFO
> org.apache.flink.runtime.taskexecutor.TaskExecutor - Stopped TaskExecutor
> akka.tcp://flink@172.24.150.24:48412/user/taskmanager_0.
> >2020-05-14 08:55:30,552 INFO
> org.apache.flink.runtime.blob.PermanentBlobCache - Shutting down BLOB cache
> >2020-05-14 08:55:30,554 INFO
> org.apache.flink.runtime.blob.TransientBlobCache - Shutting down BLOB cache
> >2020-05-14 08:55:30,563 INFO
> org.apache.flink.runtime.rpc.akka.AkkaRpcService - Stopping Akka RPC
> service.
> >2020-05-14 08:55:30,566 INFO
> akka.remote.RemoteActorRefProvider$RemotingTerminator - Shutting down
> remote daemon.
> >2020-05-14 08:55:30,567 INFO
> akka.remote.RemoteActorRefProvider$RemotingTerminator - Remote daemon shut
> down; proceeding with flushing remote transports.
> >2020-05-14 08:55:30,570 INFO
> akka.remote.RemoteActorRefProvider$RemotingTerminator - Shutting down
> remote daemon.
> >2020-05-14 08:55:30,571 INFO
> akka.remote.RemoteActorRefProvider$RemotingTerminator - Remote daemon shut
> down; proceeding with flushing remote transports.
> >2020-05-14 08:55:30,571 INFO org.apache.flink.runtime.taskmanager.Task -
> Source: KafkaTableSource switched from RUNNING to FAILED.
> >java.lang.RuntimeException: segment has been freed
> >at
> org.apache.flink.streaming.runtime.io.RecordWriterOutput.pushToRecordWriter(RecordWriterOutput.java:110)
> >at
> org.apache.flink.streaming.runtime.io.RecordWriterOutput.collect(RecordWriterOutput.java:89)
> >at
> org.apache.flink.streaming.runtime.io.RecordWriterOutput.collect(RecordWriterOutput.java:45)
> >at
> org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:718)
> >at
> org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:696)
> >at
> org.apache.flink.streaming.api.operators.TimestampedCollector.collect(TimestampedCollector.java:51)
> >at
> org.apache.flink.table.runtime.CRowWrappingCollector.collect(CRowWrappingCollector.scala:37)
> >at
> org.apache.flink.table.runtime.CRowWrappingCollector.collect(CRowWrappingCollector.scala:28)
> >at DataStreamCalcRule$2658.processElement(Unknown Source)
> >at
> org.apache.flink.table.runtime.CRowProcessRunner.processElement(CRowProcessRunner.scala:66)
> >at
> org.apache.flink.table.runtime.CRowProcessRunner.processElement(CRowProcessRunner.scala:35)
> >at
> org.apache.flink.streaming.api.operators.ProcessOperator.processElement(ProcessOperator.java:66)
> >at
> org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.pushToOperator(OperatorChain.java:579)
> >at
> org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:554)
> >at
> org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:534)
> >at
> org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:718)
> >at
> org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:696)
> >at
> org.apache.flink.streaming.api.operators.TimestampedCollector.collect(TimestampedCollector.java:51)
> >at
> org.apache.flink.table.runtime.CRowWrappingCollector.collect(CRowWrappingCollector.scala:37)
> >at
> org.apache.flink.table.runtime.CRowWrappingCollector.collect(CRowWrappingCollector.scala:28)
> >at DataStreamSourceConversion$2595.processElement(Unknown Source)
> >at
> org.apache.flink.table.runtime.CRowOutputProcessRunner.processElement(CRowOutputProcessRunner.scala:70)
> >at
> org.apache.flink.streaming.api.operators.ProcessOperator.processElement(ProcessOperator.java:66)
> >at
> org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.pushToOperator(OperatorChain.java:579)
> >at
> org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:554)
> >at
> org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:534)
> >at
> org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:718)
> >at
> org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:696)
> >at
> org.apache.flink.streaming.api.operators.StreamSourceContexts$NonTimestampContext.collect(StreamSourceContexts.java:104)
> >at
> org.apache.flink.streaming.api.operators.StreamSourceContexts$NonTimestampContext.collectWithTimestamp(StreamSourceContexts.java:111)
> >at
> org.apache.flink.streaming.connectors.kafka.internals.AbstractFetcher.emitRecordWithTimestamp(AbstractFetcher.java:398)
> >at
> org.apache.flink.streaming.connectors.kafka.internal.KafkaFetcher.emitRecord(KafkaFetcher.java:185)
> >at
> org.apache.flink.streaming.connectors.kafka.internal.KafkaFetcher.runFetchLoop(KafkaFetcher.java:150)
> >at
> org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumerBase.run(FlinkKafkaConsumerBase.java:711)
> >at
> org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:93)
> >at
> org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:57)
> >at
> org.apache.flink.streaming.runtime.tasks.SourceStreamTask.run(SourceStreamTask.java:97)
> >at
> org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:302)
> >at org.apache.flink.runtime.taskmanager.Task.run(Task.java:711)
> >at java.lang.Thread.run(Thread.java:745)
> >Caused by: java.lang.IllegalStateException: segment has been freed
> >at
> org.apache.flink.core.memory.HybridMemorySegment.put(HybridMemorySegment.java:228)
> >at
> org.apache.flink.core.memory.HybridMemorySegment.put(HybridMemorySegment.java:381)
> >at org.apache.flink.runtime.io
> .network.buffer.BufferBuilder.append(BufferBuilder.java:85)
> >at org.apache.flink.runtime.io
> .network.api.serialization.SpanningRecordSerializer.copyToBufferBuilder(SpanningRecordSerializer.java:95)
> >at org.apache.flink.runtime.io
> .network.api.writer.RecordWriter.copyFromSerializerToTargetChannel(RecordWriter.java:178)
> >at org.apache.flink.runtime.io
> .network.api.writer.RecordWriter.emit(RecordWriter.java:162)
> >at org.apache.flink.runtime.io
> .network.api.writer.RecordWriter.emit(RecordWriter.java:128)
> >at
> org.apache.flink.streaming.runtime.io.RecordWriterOutput.pushToRecordWriter(RecordWriterOutput.java:107)
> >... 39 more
> >2020-05-14 08:55:30,572 INFO org.apache.flink.runtime.taskmanager.Task -
> Freeing task resources for Source: KafkaTableSource
> >2020-05-14 08:55:30,572 ERROR org.apache.flink.runtime.taskmanager.Task -
> FATAL - exception in resource cleanup of task Source: KafkaTableSource.
> >java.lang.IllegalStateException: Memory manager has been shut down.
> >at
> org.apache.flink.runtime.memory.MemoryManager.releaseAll(MemoryManager.java:480)
> >at org.apache.flink.runtime.taskmanager.Task.run(Task.java:828)
> >at java.lang.Thread.run(Thread.java:745)
> >2020-05-14 08:55:30,572 ERROR
> org.apache.flink.runtime.taskexecutor.TaskExecutor - FATAL - exception in
> resource cleanup of task Source: KafkaTableSource(.
> >java.lang.IllegalStateException: Memory manager has been shut down.
> >at
> org.apache.flink.runtime.memory.MemoryManager.releaseAll(MemoryManager.java:480)
> >at org.apache.flink.runtime.taskmanager.Task.run(Task.java:828)
> >at java.lang.Thread.run(Thread.java:745)
> >2020-05-14 08:55:30,572 ERROR
> org.apache.flink.runtime.taskexecutor.TaskManagerRunner - Fatal error
> occurred while executing the TaskManager. Shutting it down...
> >java.lang.IllegalStateException: Memory manager has been shut down.
> >at
> org.apache.flink.runtime.memory.MemoryManager.releaseAll(MemoryManager.java:480)
> >at org.apache.flink.runtime.taskmanager.Task.run(Task.java:828)
> >at java.lang.Thread.run(Thread.java:745)
> >2020-05-14 08:55:30,587 INFO
> akka.remote.RemoteActorRefProvider$RemotingTerminator - Remoting shut down.
> >2020-05-14 08:55:30,587 INFO
> akka.remote.RemoteActorRefProvider$RemotingTerminator - Remoting shut down.
> >2020-05-14 08:55:30,600 INFO
> org.apache.flink.runtime.rpc.akka.AkkaRpcService - Stopped Akka RPC service.
>