|
hi,all
最近用flink写入kafka,发现checkpoint失败特别多,基本50%的都失败了。
checkpoint时间间隔的30~60s 之间,没有大状态,基本就是维护offset的状态
希望能帮我看看 是什么原因导致的,能否降低一下 checkpoint的失败率
主要报错如下所示:
Caused by: org.apache.kafka.common.errors.ProducerFencedException: Producer attempted an operation with an old epoch. Either there is a newer producer with the same transactionalId, or the producer's transaction has been expired by the broker.
Caused by: org.apache.flink.util.FlinkRuntimeException: Committing one of transactions failed, logging first encountered failure
kafka日志如下:
org.apache.kafka.common.errors.ProducerFencedException: Producer's epoch is no longer valid. There is probably another producer with a newer epoch. 16744 (request epoch), 16745 (server epoch)
org.apache.kafka.common.errors.ProducerFencedException: Producer's epoch is no longer valid. There is probably another producer with a newer epoch. 16744 (request epoch), 16745 (server epoch)
org.apache.kafka.common.errors.ProducerFencedException: Producer's epoch is no longer valid. There is probably another producer with a newer epoch. 16744 (request epoch), 16745 (server epoch)
org.apache.kafka.common.errors.ProducerFencedException: Producer's epoch is no longer valid. There is probably another producer with a newer epoch. 16744 (request epoch), 16745 (server epoch)
hdfs暂时没发现异常
|