flink-1.10 读取hdfs目录下面所有文件,无输出

classic Classic list List threaded Threaded
4 messages Options
kcz
Reply | Threaded
Open this post in threaded view
|

flink-1.10 读取hdfs目录下面所有文件,无输出

kcz
代码如下:
String path = "hdfs://HACluster/user/flink/test-1/2020-05-29--15/";
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
FileInputFormat fileInputFormat = new TextInputFormat(new Path(path));
fileInputFormat.setNestedFileEnumeration(true);
env.readFile(fileInputFormat, path).print();
env.execute();hdfs数据目录如下:/user/flink/test-1/2020-05-29--15/.part-0-0.inprogress.6c12fe72-5602-4458-b29f-c8c8b4a7b73b(有数据)/user/flink/test-1/2020-05-29--15/.part-1-0.inprogress.34b1d5ff-cf0d-4209-b409-21920b12327d(有数据)问题如下:flink无法获取到数据输出
Reply | Threaded
Open this post in threaded view
|

回复:flink-1.10 读取hdfs目录下面所有文件,无输出

admin
Hi,kcz
inprogress阶段的文件是不可以被读取的,只有finished的文件才可以被读取[1],具体可以通过指定RollingPolicy
来指定文件什么时候由inprogress变为finished
[1]https://ci.apache.org/projects/flink/flink-docs-release-1.10/dev/connectors/streamfile_sink.html#part-file-lifecycle

[2]https://ci.apache.org/projects/flink/flink-docs-release-1.10/dev/connectors/streamfile_sink.html#rolling-policy


Best
Sun.Zhu
| |
Sun.Zhu
|
|
[hidden email]
|
签名由网易邮箱大师定制


在2020年06月2日 19:20,kcz<[hidden email]> 写道:
代码如下:
String path = "hdfs://HACluster/user/flink/test-1/2020-05-29--15/";
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
FileInputFormat fileInputFormat = new TextInputFormat(new Path(path));
fileInputFormat.setNestedFileEnumeration(true);
env.readFile(fileInputFormat, path).print();
env.execute();hdfs数据目录如下:/user/flink/test-1/2020-05-29--15/.part-0-0.inprogress.6c12fe72-5602-4458-b29f-c8c8b4a7b73b(有数据)/user/flink/test-1/2020-05-29--15/.part-1-0.inprogress.34b1d5ff-cf0d-4209-b409-21920b12327d(有数据)问题如下:flink无法获取到数据输出
Reply | Threaded
Open this post in threaded view
|

回复: flink-1.10 读取hdfs目录下面所有文件,无输出

wangweiguang@stevegame.cn
In reply to this post by kcz

你是否有添加hdfs-site.xml 和 core-site.xml到IDEA的resources目录下或者集群环境的conf目录下,你代码里用的是集群的namespace


 
发件人: kcz
发送时间: 2020-06-02 19:20
收件人: user-zh
主题: flink-1.10 读取hdfs目录下面所有文件,无输出
代码如下:
String path = "hdfs://HACluster/user/flink/test-1/2020-05-29--15/";
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
FileInputFormat fileInputFormat = new TextInputFormat(new Path(path));
fileInputFormat.setNestedFileEnumeration(true);
env.readFile(fileInputFormat, path).print();
env.execute();hdfs数据目录如下:/user/flink/test-1/2020-05-29--15/.part-0-0.inprogress.6c12fe72-5602-4458-b29f-c8c8b4a7b73b(有数据)/user/flink/test-1/2020-05-29--15/.part-1-0.inprogress.34b1d5ff-cf0d-4209-b409-21920b12327d(有数据)问题如下:flink无法获取到数据输出
kcz
Reply | Threaded
Open this post in threaded view
|

回复:flink-1.10 读取hdfs目录下面所有文件,无输出

kcz
In reply to this post by admin
谢谢大佬,我看看





------------------ 原始邮件 ------------------
发件人: Sun.Zhu <[hidden email]&gt;
发送时间: 2020年6月2日 23:57
收件人: [hidden email] <[hidden email]&gt;
抄送: user-zh <[hidden email]&gt;
主题: 回复:flink-1.10 读取hdfs目录下面所有文件,无输出



Hi,kcz
inprogress阶段的文件是不可以被读取的,只有finished的文件才可以被读取[1],具体可以通过指定RollingPolicy
来指定文件什么时候由inprogress变为finished
[1]https://ci.apache.org/projects/flink/flink-docs-release-1.10/dev/connectors/streamfile_sink.html#part-file-lifecycle

[2]https://ci.apache.org/projects/flink/flink-docs-release-1.10/dev/connectors/streamfile_sink.html#rolling-policy


Best
Sun.Zhu
| |
Sun.Zhu
|
|
[hidden email]
|
签名由网易邮箱大师定制


在2020年06月2日 19:20,kcz<[hidden email]&gt; 写道:
代码如下:
String path = "hdfs://HACluster/user/flink/test-1/2020-05-29--15/";
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
FileInputFormat fileInputFormat = new TextInputFormat(new Path(path));
fileInputFormat.setNestedFileEnumeration(true);
env.readFile(fileInputFormat, path).print();
env.execute();hdfs数据目录如下:/user/flink/test-1/2020-05-29--15/.part-0-0.inprogress.6c12fe72-5602-4458-b29f-c8c8b4a7b73b(有数据)/user/flink/test-1/2020-05-29--15/.part-1-0.inprogress.34b1d5ff-cf0d-4209-b409-21920b12327d(有数据)问题如下:flink无法获取到数据输出