Apache Flink Chinese User Mailing List

Flink 1.10 on Yarn

hailong

Flink 1.10 on Yarn

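Hi,

Setup: one TaskManager with three slots, running three jobs.

While the three jobs were running, a NullPointerException occurred during checkpointing, which caused the jobs to restart continuously. Eventually the `Metaspace` filled up and the following error appeared: `java.lang.OutOfMemoryError: Metaspace. The metaspace out-of-memory error has occurred. This can mean two things: either the job requires a larger size of JVM metaspace to load classes or there is a class loading leak. In the first case 'taskmanager.memory.jvm-metaspace.size' configuration option should be increased. If the error persists (usually in cluster after several job (re-)submissions) then there is probably a class loading leak which has to be investigated and fixed. The task executor has to be shutdown...`

Some of the exception details are in the attachment.

Questions:
1. Why does a NullPointerException occur during checkpointing? (All three jobs read from the same Kafka topic, and the jobs run normally after being restored from a checkpoint, so the data should not be the problem.)
2. The logs show that the job can restart, so why does the problem persist after the automatic restart, causing the job to restart over and over?

Thanks!
Cloud attachment sent via NetEase Mail Master:
08-07error.txt (730.4 KB, expires 2020-08-22 11:37)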
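
The error text above separates two causes: a metaspace that is simply too small, and a class loading leak. A rough way to tell them apart is to log metaspace usage from inside the TaskManager JVM across restarts; usage that keeps climbing after every restart points toward a leak. The sketch below uses the standard `java.lang.management` API. The class name `MetaspaceProbe` is hypothetical, and matching the pool by the name "Metaspace" is an assumption that holds on HotSpot JVMs but may not on others.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;

// Hypothetical helper, not part of Flink.
public class MetaspaceProbe {

    /**
     * Returns the bytes currently used by the first memory pool whose name
     * contains "Metaspace" (assumed HotSpot naming), or -1 if no such pool
     * is found or its usage is unavailable.
     */
    static long metaspaceUsedBytes() {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getName().contains("Metaspace")) {
                MemoryUsage usage = pool.getUsage(); // null if the pool is not valid
                return usage == null ? -1L : usage.getUsed();
            }
        }
        return -1L;
    }

    public static void main(String[] args) {
        System.out.println("Metaspace used bytes: " + metaspaceUsedBytes());
    }
}
```

To be useful for this problem, `metaspaceUsedBytes()` would have to be called from code running in the TaskManager process (for example from a user function) and logged after each restart; that wiring is not shown here.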

Yangze Guo

Re: Flink 1.10 on Yarn

The log was not attached successfully. Is taskmanager.memory.jvm-metaspace.size currently set to its default value?

Best,
Yangze Guo

hailong

Re: Flink 1.10 on Yarn

Sorry, I attached the wrong file.

Yes, taskmanager.memory.jvm-metaspace.size is set to its default value.

Attachment: 08-071-error.txt (365K)

chenkai

Re:Re: Flink 1.10 on Yarn

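Hi xuhaiLong, judging from the log, the NullPointerException during checkpointing is a known issue; see the two JIRA tickets below for details.
Which JDK version are you using? So far, jdk8_40/jdk8_60 combined with flink-1.10 has been found to trigger the checkpoint NullPointerException. You could try upgrading the JDK.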
https://issues.apache.org/jira/browse/FLINK-18196
https://issues.apache.org/jira/browse/FLINK-17479
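
As a quick way to answer the JDK question above, the running JVM's version string can be checked against the two builds named in this message. This is only a sketch: the class `JdkUpdateCheck` and its methods are hypothetical, it assumes "jdk8_40/jdk8_60" refers to Java 8 update 40 and update 60 (version strings of the form `1.8.0_40`), and it makes no claim about which other builds might be affected.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Flink.
public class JdkUpdateCheck {

    // Matches Java 8 version strings such as "1.8.0_60" or "1.8.0_60-b27".
    private static final Pattern JAVA8 = Pattern.compile("^1\\.8\\.0_(\\d+).*$");

    /** Returns the Java 8 update number, or -1 if the string is not of the form 1.8.0_N. */
    static int java8Update(String version) {
        Matcher m = JAVA8.matcher(version);
        return m.matches() ? Integer.parseInt(m.group(1)) : -1;
    }

    /** True only for the two builds mentioned in the thread (assumed to be 8u40 and 8u60). */
    static boolean isReportedBuild(String version) {
        int update = java8Update(version);
        return update == 40 || update == 60;
    }

    public static void main(String[] args) {
        String version = System.getProperty("java.version");
        System.out.println(version + " -> build named in the thread: " + isReportedBuild(version));
    }
}
```

A `false` result does not mean the JVM is unaffected; the linked JIRA tickets remain the authority on which versions are involved.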

hailong

Re: Flink 1.10 on Yarn

Thanks for the reply! My problem was indeed caused by this bug.

Congxian Qiu

Re: Flink 1.10 on Yarn

Hi xuhaiLong,
Does this job always hit the NPE on this version? Also, did the problem ever occur on versions before 1.10?
Best,
Congxian

hailong

Re: Flink 1.10 on Yarn

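@Congxian Qiu Sorry, I only just saw this.

We previously used Flink 1.7 and never saw this problem. After upgrading to Flink 1.10 it always occurs, although the time it takes to appear varies.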
