代码如下:
String path = "hdfs://HACluster/user/flink/test-1/2020-05-29--15/"; StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment(); FileInputFormat fileInputFormat = new TextInputFormat(new Path(path)); fileInputFormat.setNestedFileEnumeration(true); env.readFile(fileInputFormat, path).print(); env.execute();hdfs数据目录如下:/user/flink/test-1/2020-05-29--15/.part-0-0.inprogress.6c12fe72-5602-4458-b29f-c8c8b4a7b73b(有数据)/user/flink/test-1/2020-05-29--15/.part-1-0.inprogress.34b1d5ff-cf0d-4209-b409-21920b12327d(有数据)问题如下:flink无法获取到数据输出 |
Hi,kcz
inprogress阶段的文件是不可以被读取的,只有finished的文件才可以被读取[1],具体可以通过指定RollingPolicy 来指定文件什么时候由inprogress变为finished [1]https://ci.apache.org/projects/flink/flink-docs-release-1.10/dev/connectors/streamfile_sink.html#part-file-lifecycle [2]https://ci.apache.org/projects/flink/flink-docs-release-1.10/dev/connectors/streamfile_sink.html#rolling-policy Best Sun.Zhu | | Sun.Zhu | | [hidden email] | 签名由网易邮箱大师定制 在2020年06月2日 19:20,kcz<[hidden email]> 写道: 代码如下: String path = "hdfs://HACluster/user/flink/test-1/2020-05-29--15/"; StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment(); FileInputFormat fileInputFormat = new TextInputFormat(new Path(path)); fileInputFormat.setNestedFileEnumeration(true); env.readFile(fileInputFormat, path).print(); env.execute();hdfs数据目录如下:/user/flink/test-1/2020-05-29--15/.part-0-0.inprogress.6c12fe72-5602-4458-b29f-c8c8b4a7b73b(有数据)/user/flink/test-1/2020-05-29--15/.part-1-0.inprogress.34b1d5ff-cf0d-4209-b409-21920b12327d(有数据)问题如下:flink无法获取到数据输出 |
In reply to this post by kcz
你是否有添加hdfs-site.xml 和 core-site.xml到IDEA的resources目录下或者集群环境的conf目录下,你代码里用的是集群的namespace 发件人: kcz 发送时间: 2020-06-02 19:20 收件人: user-zh 主题: flink-1.10 读取hdfs目录下面所有文件,无输出 代码如下: String path = "hdfs://HACluster/user/flink/test-1/2020-05-29--15/"; StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment(); FileInputFormat fileInputFormat = new TextInputFormat(new Path(path)); fileInputFormat.setNestedFileEnumeration(true); env.readFile(fileInputFormat, path).print(); env.execute();hdfs数据目录如下:/user/flink/test-1/2020-05-29--15/.part-0-0.inprogress.6c12fe72-5602-4458-b29f-c8c8b4a7b73b(有数据)/user/flink/test-1/2020-05-29--15/.part-1-0.inprogress.34b1d5ff-cf0d-4209-b409-21920b12327d(有数据)问题如下:flink无法获取到数据输出 |
In reply to this post by admin
谢谢大佬,我看看
------------------ 原始邮件 ------------------ 发件人: Sun.Zhu <[hidden email]> 发送时间: 2020年6月2日 23:57 收件人: [hidden email] <[hidden email]> 抄送: user-zh <[hidden email]> 主题: 回复:flink-1.10 读取hdfs目录下面所有文件,无输出 Hi,kcz inprogress阶段的文件是不可以被读取的,只有finished的文件才可以被读取[1],具体可以通过指定RollingPolicy 来指定文件什么时候由inprogress变为finished [1]https://ci.apache.org/projects/flink/flink-docs-release-1.10/dev/connectors/streamfile_sink.html#part-file-lifecycle [2]https://ci.apache.org/projects/flink/flink-docs-release-1.10/dev/connectors/streamfile_sink.html#rolling-policy Best Sun.Zhu | | Sun.Zhu | | [hidden email] | 签名由网易邮箱大师定制 在2020年06月2日 19:20,kcz<[hidden email]> 写道: 代码如下: String path = "hdfs://HACluster/user/flink/test-1/2020-05-29--15/"; StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment(); FileInputFormat fileInputFormat = new TextInputFormat(new Path(path)); fileInputFormat.setNestedFileEnumeration(true); env.readFile(fileInputFormat, path).print(); env.execute();hdfs数据目录如下:/user/flink/test-1/2020-05-29--15/.part-0-0.inprogress.6c12fe72-5602-4458-b29f-c8c8b4a7b73b(有数据)/user/flink/test-1/2020-05-29--15/.part-1-0.inprogress.34b1d5ff-cf0d-4209-b409-21920b12327d(有数据)问题如下:flink无法获取到数据输出 |
Free forum by Nabble | Edit this page |