LocalBufferPool deadlock


LocalBufferPool deadlock

1548069580
Hi all,
I recently ran into a problem: with the upstream under backpressure, after the job has run for a while, data flow stalls both upstream and downstream. A jstack thread dump shows that the source operator is blocked, while the downstream operators are also seen waiting for data. The stack trace is as follows:
"Legacy Source Thread - Source: Custom Source (1/2)" #95 prio=5 os_prio=0 tid=0x00007fafa4018000 nid=0x57d waiting on condition [0x00007fb03d48a000]
   java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for <0x000000074afaf508> (a java.util.concurrent.CompletableFuture$Signaller)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
        at java.util.concurrent.CompletableFuture$Signaller.block(CompletableFuture.java:1693)
        at java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3323)
        at java.util.concurrent.CompletableFuture.waitingGet(CompletableFuture.java:1729)
        at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1895)
        at org.apache.flink.runtime.io.network.buffer.LocalBufferPool.requestMemorySegmentBlocking(LocalBufferPool.java:241)
        at org.apache.flink.runtime.io.network.buffer.LocalBufferPool.requestBufferBuilderBlocking(LocalBufferPool.java:210)
        at org.apache.flink.runtime.io.network.partition.ResultPartition.getBufferBuilder(ResultPartition.java:189)
        at org.apache.flink.runtime.io.network.api.writer.ChannelSelectorRecordWriter.requestNewBufferBuilder(ChannelSelectorRecordWriter.java:103)
        at org.apache.flink.runtime.io.network.api.writer.RecordWriter.copyFromSerializerToTargetChannel(RecordWriter.java:151)
        at org.apache.flink.runtime.io.network.api.writer.RecordWriter.emit(RecordWriter.java:122)
        at org.apache.flink.runtime.io.network.api.writer.ChannelSelectorRecordWriter.emit(ChannelSelectorRecordWriter.java:60)
        at org.apache.flink.streaming.runtime.io.RecordWriterOutput.pushToRecordWriter(RecordWriterOutput.java:107)
        at org.apache.flink.streaming.runtime.io.RecordWriterOutput.collect(RecordWriterOutput.java:89)
        at org.apache.flink.streaming.runtime.io.RecordWriterOutput.collect(RecordWriterOutput.java:45)
        at org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:730)
        at org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:708)
        at org.apache.flink.streaming.api.operators.StreamSourceContexts$NonTimestampContext.collect(StreamSourceContexts.java:104)
        - locked <0x000000074a5f6a98> (a java.lang.Object)
        at com.jd.bdp.flink.sink.jimdb.common.SourceTimeMillisMock.run(SourceTimeMillisMock.java:25)
        at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:100)
        at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:63)
        at org.apache.flink.streaming.runtime.tasks.SourceStreamTask$LegacySourceFunctionThread.run(SourceStreamTask.java:200)


My initial read is that the source is stuck acquiring a memory segment, but the Flink UI shows that Memory Segments Available is plentiful. Could you advise how to analyze and resolve this kind of situation? Thanks in advance.
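
For reference, the same check that jstack performs here can also be done from inside the TaskManager JVM. The snippet below is only an illustrative sketch (it is not part of Flink); it uses the standard ThreadMXBean API to list the threads that are currently parked in a LocalBufferPool frame, which is what the trace above shows for the legacy source thread:

import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Illustrative helper (not part of Flink): lists threads in the current JVM
// whose stack contains a LocalBufferPool frame, i.e. threads blocked while
// waiting for a network buffer, matching the jstack output above.
public class BufferWaitScanner {
    public static void main(String[] args) {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        // dumpAllThreads(true, true) also records locked monitors/synchronizers
        for (ThreadInfo info : threads.dumpAllThreads(true, true)) {
            for (StackTraceElement frame : info.getStackTrace()) {
                if (frame.getClassName().endsWith(".LocalBufferPool")) {
                    System.out.printf("%s [%s] blocked in %s.%s%n",
                            info.getThreadName(),
                            info.getThreadState(),
                            frame.getClassName(),
                            frame.getMethodName());
                    break;
                }
            }
        }
    }
}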
Re: LocalBufferPool deadlock

hailongwang
Hi,
This is most likely a downstream operator under pressure. You can use the inPoolUsage metric to find out which operator is the bottleneck, and then address that operator accordingly.
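
A minimal sketch of how that check could be scripted, assuming your Flink version exposes the buffers.inPoolUsage task metric through the JobManager REST API (the REST address, job id and vertex id are placeholders passed on the command line); an operator whose input pool usage stays close to 1.0 is usually the one to look at first:

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Illustrative sketch: queries the aggregated buffers.inPoolUsage metric of one
// job vertex via the Flink REST API. Values near 1.0 mean the vertex's input
// buffers are exhausted, i.e. it cannot keep up with its input.
public class InPoolUsageCheck {
    public static void main(String[] args) throws Exception {
        if (args.length != 3) {
            System.err.println("usage: InPoolUsageCheck <rest-url> <job-id> <vertex-id>");
            return;
        }
        URL url = new URL(args[0] + "/jobs/" + args[1] + "/vertices/" + args[2]
                + "/subtasks/metrics?get=buffers.inPoolUsage");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("GET");
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
            StringBuilder body = new StringBuilder();
            String line;
            while ((line = in.readLine()) != null) {
                body.append(line);
            }
            // The response is a small JSON array with min/max/avg/sum for the metric.
            System.out.println(body);
        } finally {
            conn.disconnect();
        }
    }
}

The same metric can also be watched in the web UI's task metrics tab; comparing it across operators shows where the backpressure chain starts.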

Best,
Hailong Wang