Flink 1.11.1 on k8s: how to configure Hadoop

Flink 1.11.1 on k8s: how to configure Hadoop

hechuan
Hi,
I want to run Flink 1.11.1 on K8s, using the flink:1.11.1-scala_2.12 image. Following the official docs I deployed a session cluster, and the jobmanager and taskmanager both started successfully.
But submitting a job then fails with:
Caused by: org.apache.flink.core.fs.UnsupportedFileSystemSchemeException: Hadoop is not in the classpath/dependencies.
at org.apache.flink.core.fs.UnsupportedSchemeFactory.create(UnsupportedSchemeFactory.java:58)
at org.apache.flink.core.fs.FileSystem.getUnguardedFileSystem(FileSystem.java:487)


So the Hadoop dependency is missing. According to the docs, starting with 1.11 flink-shaded-hadoop-2-uber is no longer recommended; HADOOP_CLASSPATH is needed instead.
But I'm still not sure how to combine that with a K8s deployment; the docs are rather brief on this. Has anyone actually gotten this working and can share how?


Thx


Re: Flink 1.11.1 on k8s: how to configure Hadoop

caozhen
For reference, here is the Hadoop integration page for Flink 1.11.1:
https://ci.apache.org/projects/flink/flink-docs-release-1.11/ops/deployment/hadoop.html

The docs say flink-shaded-hadoop-2-uber is no longer provided, and offer two alternatives:

1. Recommended: load the Hadoop dependencies via HADOOP_CLASSPATH (see the sketch below).
2. Alternatively, put the Hadoop dependencies into the Flink client's lib directory.
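For option 1, a minimal sketch of the documented approach (assuming the hadoop CLI is available on the machine; on K8s this would have to happen inside the container image):

# Make the Hadoop jars visible to Flink before starting the client/cluster
export HADOOP_CLASSPATH=`hadoop classpath`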

With Flink 1.11.1 on YARN I used the second approach: I downloaded the hadoop-src package and copied the commonly used dependencies into the lib directory. (This may cause class conflicts with your main jar and takes some debugging.)

I don't think this approach is good; it only solves the problem for now. There really should still be a flink-shaded-hadoop package. I'm trying to build one myself, but some issues aren't fully resolved yet.




Re: Flink 1.11.1 on k8s: how to configure Hadoop

Meng Wang
The official image only contains Flink itself. If you need to connect to HDFS, you have to bake the Hadoop jars and configuration into the image.


--

Best,
Matt Wang



Re: Flink 1.11.1 on k8s: how to configure Hadoop

Yang Wang
Matt Wang is right.

Currently neither the Flink release binaries nor the official images ship flink-shaded-hadoop, so you need to add one more layer on top of the official image that puts flink-shaded-hadoop [1] into /opt/flink/lib:

FROM flink
COPY /path/of/flink-shaded-hadoop-2-uber-*.jar $FLINK_HOME/lib/
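To get a runnable image out of this, build and push it to your registry (a sketch; the image name and registry are placeholders):

docker build -t <your-registry>/flink:1.11.1-hadoop .
docker push <your-registry>/flink:1.11.1-hadoop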


[1].
https://mvnrepository.com/artifact/org.apache.flink/flink-shaded-hadoop-2-uber


Best,
Yang


Re: Re: Flink 1.11.1 on k8s: how to configure Hadoop

hechuan
Hi,
I downloaded flink-shaded-hadoop-2-uber-2.8.3-10.0.jar, put it under lib, and restarted the cluster, but submitting a job still fails:
Caused by: org.apache.flink.core.fs.UnsupportedFileSystemSchemeException: Could not find a file system implementation for scheme 'hdfs'. The scheme is not directly supported by Flink and no Hadoop file system to support this scheme could be loaded. For a full list of supported file systems, please see https://ci.apache.org/projects/flink/flink-docs-stable/ops/filesystems/.
at org.apache.flink.core.fs.FileSystem.getUnguardedFileSystem(FileSystem.java:491)
at org.apache.flink.core.fs.FileSystem.get(FileSystem.java:389)
at org.apache.flink.core.fs.Path.getFileSystem(Path.java:292)
at org.apache.flink.runtime.state.filesystem.FsCheckpointStorage.<init>(FsCheckpointStorage.java:64)
at org.apache.flink.runtime.state.filesystem.FsStateBackend.createCheckpointStorage(FsStateBackend.java:501)
at org.apache.flink.contrib.streaming.state.RocksDBStateBackend.createCheckpointStorage(RocksDBStateBackend.java:465)
at org.apache.flink.runtime.checkpoint.CheckpointCoordinator.<init>(CheckpointCoordinator.java:301)
... 22 more
Caused by: org.apache.flink.core.fs.UnsupportedFileSystemSchemeException: Hadoop is not in the classpath/dependencies.
at org.apache.flink.core.fs.UnsupportedSchemeFactory.create(UnsupportedSchemeFactory.java:58)
at org.apache.flink.core.fs.FileSystem.getUnguardedFileSystem(FileSystem.java:487)
... 28 more


lib contains these jars:
$ ls lib/
avro-1.8.2.jar
flink-avro-1.11.1-sql-jar.jar
flink-connector-jdbc_2.12-1.11.1.jar
flink-csv-1.11.1.jar
flink-dist_2.12-1.11.1.jar
flink-json-1.11.1.jar
flink-shaded-hadoop-2-uber-2.8.3-10.0.jar
flink-shaded-zookeeper-3.4.14.jar
flink-sql-connector-kafka_2.12-1.11.1.jar
flink-table_2.12-1.11.1.jar
flink-table-blink_2.12-1.11.1.jar
kafka-clients-2.5.0.jar
log4j-1.2-api-2.12.1.jar
log4j-api-2.12.1.jar
log4j-core-2.12.1.jar
log4j-slf4j-impl-2.12.1.jar
mysql-connector-java-5.1.49.jar




Re: Re: Flink 1.11.1 on k8s: how to configure Hadoop

Yang Wang
Did you build a new image yourself that puts flink-shaded-hadoop-2-uber-2.8.3-10.0.jar under lib?
If so, you shouldn't be seeing this problem.
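A quick way to verify the jar actually made it into the running pods (a sketch; the pod name is a placeholder):

kubectl exec <jobmanager-pod> -- ls /opt/flink/lib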

Best,
Yang


Re: Re: Re: Flink 1.11.1 on k8s: how to configure Hadoop

hechuan
Hi,
Right, I tried it again and this way works; I had made a mistake earlier. Thanks~
Thx


Re: Re: Re: Re: Flink 1.11.1 on k8s: how to configure Hadoop

yang
Could I ask how you rebuilt the image? Did you unpack the original jar and then repackage it?




Re: Re: Re: Re: Flink 1.11.1 on k8s: how to configure Hadoop

yang
In reply to this post by hechuan
Could I ask: to rebuild the image, do you unpack the original package and then repackage it?




Re: Re: Re: Re: Flink 1.11.1 on k8s: how to configure Hadoop

Yang Wang
You only need to base on the community image, add one more layer (copying in flink-shaded-hadoop), build the docker image, and push it to your docker registry.

For example, the Dockerfile could look like this:
FROM flink:1.11.1-scala_2.11
COPY flink-shaded-hadoop-2*.jar /opt/flink/lib/

Also, flink-shaded-hadoop can be downloaded from [1].

[1].
https://mvnrepository.com/artifact/org.apache.flink/flink-shaded-hadoop-2


Best,
Yang


Re: Re: Re: Re: Flink 1.11.1 on k8s: how to configure Hadoop

Dream-底限
Hi, you can build the image directly on one of the Hadoop nodes: package the needed Hadoop dependencies and Flink into the docker image together, then set a few environment variables and it works. Alternatively, if the nodes where your docker containers run already have Hadoop or Flink installed, you can mount them in from outside. We currently use the first approach.
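A minimal Dockerfile sketch of that first approach, assuming an unpacked Hadoop 2.8.3 distribution sits next to the Dockerfile (the paths and version are illustrative, not from this thread):

FROM flink:1.11.1-scala_2.12
# Assumption: an unpacked Hadoop distribution in the build context
COPY hadoop-2.8.3 /opt/hadoop
# Point Flink at the bundled Hadoop jars and configuration
ENV HADOOP_HOME=/opt/hadoop
ENV HADOOP_CONF_DIR=/opt/hadoop/etc/hadoop
ENV HADOOP_CLASSPATH=/opt/hadoop/share/hadoop/common/*:/opt/hadoop/share/hadoop/common/lib/*:/opt/hadoop/share/hadoop/hdfs/*:/opt/hadoop/share/hadoop/hdfs/lib/*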


Re: Re: Re: Re: Flink 1.11.1 on k8s: how to configure Hadoop

mts_geek
In reply to this post by hechuan
Hello, I'm hitting this problem too. Could you share exactly how you built the image?
In my Dockerfile I added:
COPY --chown=flink:flink jars/flink-shaded-hadoop-2-2.8.3-10.0.jar $FLINK_HOME/lib/

But when running a Flink 1.11 session cluster on k8s, the jobmanager starts, yet submitting a job fails: it says HadoopUtils could not be initialized,
even though flink-shaded-hadoop-2-2.8.3-10.0.jar definitely contains the HadoopUtils class.


Caused by: java.lang.NoClassDefFoundError: Could not initialize class
org.apache.flink.runtime.util.HadoopUtils

(screenshot: http://apache-flink.147419.n8.nabble.com/file/t1510/WX20210522-115421%402x.png)



