Hi
The error "could only be replicated to 0 nodes instead of minReplication (=1)" is caused by HDFS instability: HDFS was unable to replicate the data. It has nothing to do with Flink itself.
Best regards,
Yun Tang
________________________________
From: yanggang_it_job <[hidden email]>
Sent: Monday, June 1, 2020 15:30
To: [hidden email] <[hidden email]>
Subject: Discussion of checkpoint failures
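Recently, several jobs that use RocksDB as the state backend and HDFS as the remote file system have been failing frequently. The failures share the following characteristics:
1. The jobs had been running stably before the error, then suddenly started failing one day.
2. Whenever this error shows up, multiple jobs fail their checkpoints with the same error.
The error message is as follows: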
org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /user/stream/flink-checkpoints/19523bf083346eb80b409167e9b91b53/chk-43396/cef72b90-8492-4b09-8d1b-384b0ebe5768 could only be replicated to 0 nodes instead of minReplication (=1). There are 8 datanode(s) running and no node(s) are excluded in this operation.
at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1723)
I would appreciate it if you could take a look.
Thanks
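
The exception message quoted in this thread carries three numbers that help with triage: how many nodes accepted the block, the required minReplication, and how many DataNodes the NameNode considers running. Below is a minimal sketch, not part of the original thread, that extracts them from a log line. The names `ReplicationFailure` and `parse_replication_failure` are hypothetical, and the regular expression assumes the message wording shown above.

```python
import re
from typing import NamedTuple, Optional


class ReplicationFailure(NamedTuple):
    """Fields pulled out of the HDFS 'could only be replicated' message."""
    path: str
    replicated: int
    min_replication: int
    datanodes_running: int


# Assumption: the message has the same wording as the one quoted in this thread.
_PATTERN = re.compile(
    r"File (?P<path>\S+) could only be replicated to (?P<replicated>\d+) nodes "
    r"instead of minReplication \(=(?P<min_rep>\d+)\)\.\s+"
    r"There are (?P<running>\d+) datanode\(s\) running"
)


def parse_replication_failure(line: str) -> Optional[ReplicationFailure]:
    """Return the parsed fields, or None if the line is a different error."""
    m = _PATTERN.search(line)
    if m is None:
        return None
    return ReplicationFailure(
        path=m["path"],
        replicated=int(m["replicated"]),
        min_replication=int(m["min_rep"]),
        datanodes_running=int(m["running"]),
    )
```

In the message above, 8 DataNodes are running but 0 accepted the block. That combination suggests the DataNodes were reachable yet unable to take new blocks (for example, because of low disk space or heavy load), which is consistent with several jobs failing at the same time and is something to check on the HDFS side rather than in Flink.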