How to read multiple paths or wildcard file patterns in Flink batch mode

How to read multiple paths or wildcard file patterns in Flink batch mode

无边
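Hi all,
     How can Flink, in batch mode, read files from multiple paths or files matched by a wildcard? For example:
           /abc/202004*/t1.data  should read every t1.data file from April 2020;
           /abc/20200401/t*.data should read every file whose name starts with t in the 2020-04-01 directory.
     Thanks!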

Re: How to read multiple paths or wildcard file patterns in Flink batch mode

Jingsong Li
Hi,

Are you using DataSet or SQL?

If it is DataSet or DataStream, how about filtering the files yourself first
and then calling FileInputFormat.setFilePaths?

Best,
Jingsong Lee
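
The "filter the files first" step suggested above could be sketched as follows. This is a minimal sketch that assumes the files live on a local (or locally mounted) file system reachable through java.nio; GlobPathExpander and its expand method are hypothetical names, not part of Flink. For HDFS or another remote file system, the listing would have to go through that file system's own client instead.

```java
import java.io.IOException;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical helper (not part of Flink): expands a glob under a local root directory. */
final class GlobPathExpander {

    private GlobPathExpander() {}

    /**
     * Returns the absolute paths of all regular files under root whose path
     * relative to root matches the glob, for example "202004*" + "/t1.data".
     * In java.nio glob syntax a single '*' does not cross directory boundaries.
     */
    static List<String> expand(String root, String glob, int maxDepth) throws IOException {
        Path base = Paths.get(root);
        PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:" + glob);
        try (Stream<Path> walk = Files.walk(base, maxDepth)) {
            return walk
                    .filter(Files::isRegularFile)                       // skip directories
                    .filter(p -> matcher.matches(base.relativize(p)))   // match relative path
                    .map(p -> p.toAbsolutePath().toString())
                    .sorted()
                    .collect(Collectors.toList());
        }
    }
}
```

With the two patterns from the question, the calls would look like GlobPathExpander.expand("/abc", "202004*/t1.data", 2) and GlobPathExpander.expand("/abc", "20200401/t*.data", 2). The resulting list can then be converted with toArray(new String[0]) and handed to FileInputFormat.setFilePaths(String...).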


Re: How to read multiple paths or wildcard file patterns in Flink batch mode

Jingsong Li
FYI:
I created an issue to add wildcard support.

Best,
Jingsong Lee


Re: How to read multiple paths or wildcard file patterns in Flink batch mode

Jark
Administrator
FYI: the issue number is FLINK-17397


Re: How to read multiple paths or wildcard file patterns in Flink batch mode

Jark
Administrator
Sorry, I pasted the wrong issue. The correct link is: https://issues.apache.org/jira/browse/FLINK-17398


Re: How to read multiple paths or wildcard file patterns in Flink batch mode

无边
In reply to this post by Jingsong Li
Thanks for the reply!
The application uses DataSet. I checked, and FileInputFormat is an abstract class; I see that its supportsMultiPaths method is marked as Deprecated:
/**
 * Override this method to supports multiple paths.
 * When this method will be removed, all FileInputFormats have to support multiple paths.
 *
 * @return True if the FileInputFormat supports multiple paths, false otherwise.
 *
 * @deprecated Will be removed for Flink 2.0.
 */
@Deprecated
public boolean supportsMultiPaths() {
   return false;
}

Re: How to read multiple paths or wildcard file patterns in Flink batch mode

Jingsong Li
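"all FileInputFormats have to support multiple paths"
If you have your own implementation, override supportsMultiPaths so that it returns true; almost all existing implementations already return true.

If you use DataStream, be careful: you cannot use StreamExecutionEnvironment.createInput (it does not support multiple paths). You need to explicitly use addSource(new InputFormatSourceFunction).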

Best,
Jingsong Lee
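
The advice in this reply could be wired up roughly as below. This is a sketch under stated assumptions, not a definitive implementation: it assumes a Flink 1.x classpath from around the time of this thread (the DataSet API and InputFormatSourceFunction must be available), it picks TextInputFormat as an example of a format that accepts multiple paths, and MultiPathReadSketch with its method names is hypothetical. The paths are expected to be concrete, already-resolved file paths, since wildcard support is what FLINK-17398 asks for.

```java
import org.apache.flink.api.common.typeinfo.BasicTypeInfo;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.io.TextInputFormat;
import org.apache.flink.core.fs.Path;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.source.InputFormatSourceFunction;

/** Hypothetical sketch class; requires Flink 1.x dependencies on the classpath. */
public class MultiPathReadSketch {

    /** Builds a text format over several already-resolved paths (at least one is required). */
    static TextInputFormat multiPathFormat(String... paths) {
        TextInputFormat format = new TextInputFormat(new Path(paths[0]));
        // Replaces the single path above with the full list of paths.
        format.setFilePaths(paths);
        return format;
    }

    /** DataSet variant: pass the multi-path format to createInput. */
    static void readWithDataSet(String... paths) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
        DataSet<String> lines =
                env.createInput(multiPathFormat(paths), BasicTypeInfo.STRING_TYPE_INFO);
        lines.print(); // print() triggers execution in the DataSet API
    }

    /** DataStream variant: wrap the format explicitly instead of calling createInput. */
    static void readWithDataStream(String... paths) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        DataStream<String> lines = env.addSource(
                new InputFormatSourceFunction<String>(
                        multiPathFormat(paths), BasicTypeInfo.STRING_TYPE_INFO),
                BasicTypeInfo.STRING_TYPE_INFO);
        lines.print();
        env.execute("multi-path read sketch");
    }
}
```

A custom FileInputFormat subclass would additionally need to override supportsMultiPaths to return true, as described above, before setFilePaths accepts more than one path.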
