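Hi,

Scenario: one TaskManager with three slots, running three jobs.

While the three jobs were running, a NullPointerException occurred during checkpointing, which caused the jobs to restart continuously. Eventually the `Metaspace` filled up and the following error appeared:

`java.lang.OutOfMemoryError: Metaspace. The metaspace out-of-memory error has occurred. This can mean two things: either the job requires a larger size of JVM metaspace to load classes or there is a class loading leak. In the first case 'taskmanager.memory.jvm-metaspace.size' configuration option should be increased. If the error persists (usually in cluster after several job (re-)submissions) then there is probably a class loading leak which has to be investigated and fixed. The task executor has to be shutdown... `

Part of the exception log is attached.

Questions:
1. Why does a NullPointerException occur during the checkpoint? (All three jobs read the same Kafka topic, and a job restored from a checkpoint runs normally, so it should not be a data problem.)
2. According to the log, the job can be restarted. Why does the problem persist after the automatic restart, so that the job keeps restarting?

Thanks!

Attachment: 08-07error.txt (730.4KB, expires 2020-08-22 11:37)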
The log was not attached successfully. Is taskmanager.memory.jvm-metaspace.size currently at its default value?
Best,
Yangze Guo
Sorry, I attached the wrong file.
Yes, taskmanager.memory.jvm-metaspace.size is at its default value.
Attachment: 08-071-error.txt (365K)
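The error message names two possible causes: a metaspace that is simply too small, or a class loading leak. One rough way to tell them apart is to watch metaspace usage and the class load/unload counters across job restarts. The sketch below is not from this thread and is not part of Flink. `MetaspaceSnapshot` is a hypothetical helper built only on the standard `java.lang.management` API, and it assumes a HotSpot JVM (Java 8 or later), where the memory pool is reported under the name "Metaspace". It only describes the JVM it runs in, so it would have to be called from code running inside the TaskManager process.

```java
import java.lang.management.ClassLoadingMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;

// Hypothetical helper (not a Flink class): reports metaspace usage and
// class-loading counters for the JVM it runs in.
class MetaspaceSnapshot {

    // Returns the used bytes of the "Metaspace" pool, or -1 if this JVM
    // does not expose a pool with that name (assumption: HotSpot naming).
    static long metaspaceUsedBytes() {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if ("Metaspace".equals(pool.getName())) {
                MemoryUsage usage = pool.getUsage();
                return usage == null ? -1L : usage.getUsed();
            }
        }
        return -1L;
    }

    // One-line summary suitable for periodic logging.
    static String describe() {
        ClassLoadingMXBean classes = ManagementFactory.getClassLoadingMXBean();
        return String.format(
                "metaspaceUsed=%d bytes, loadedClasses=%d, unloadedClasses=%d",
                metaspaceUsedBytes(),
                classes.getLoadedClassCount(),
                classes.getUnloadedClassCount());
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

If the loaded-class count keeps climbing after every restart while the unloaded count barely moves, a class loading leak is the more likely explanation. If usage is stable but close to the limit, increasing `taskmanager.memory.jvm-metaspace.size` is the first thing to try, as the error message suggests.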
Hi xuhaiLong, judging from the log, the checkpoint NullPointerException is a known issue. See the two JIRA tickets below for details.
Which JDK version are you using? So far the checkpoint NullPointerException has been observed with jdk8_40/jdk8_60 + flink-1.10, so you could try upgrading the JDK.
https://issues.apache.org/jira/browse/FLINK-18196
https://issues.apache.org/jira/browse/FLINK-17479
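To check quickly whether a JVM is one of the builds mentioned above, the `java.version` system property can be parsed. The following is a minimal sketch and not part of Flink. `JdkUpdateCheck` and its methods are hypothetical names, and it only flags the two update levels named in this thread (8u40 and 8u60). It makes no claim about which other builds are affected or safe.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not a Flink class): extracts the Java 8 update number
// and flags the two builds reported in this thread.
class JdkUpdateCheck {

    // Java 8 version strings look like "1.8.0_60" (optionally with a suffix).
    private static final Pattern JAVA8 = Pattern.compile("^1\\.8\\.0_(\\d+)");

    // Returns the update number, or -1 if the string is not "1.8.0_NN".
    static int java8Update(String version) {
        Matcher m = JAVA8.matcher(version);
        return m.find() ? Integer.parseInt(m.group(1)) : -1;
    }

    // True only for the update levels named in the thread: 8u40 and 8u60.
    static boolean reportedAffected(String version) {
        int update = java8Update(version);
        return update == 40 || update == 60;
    }

    public static void main(String[] args) {
        String version = System.getProperty("java.version");
        System.out.println(version
                + " -> update " + java8Update(version)
                + ", reported affected: " + reportedAffected(version));
    }
}
```

Running `main` with the same JDK that launches the TaskManager shows which update level is in use. The JIRA tickets above remain the authoritative source for which versions are affected.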
Thanks for the reply! In my case the problem was indeed caused by this bug.
Hi xuhaiLong
Does this job always hit the NPE on this version? Also, did the problem ever occur on versions before 1.10?

Best,
Congxian
@Congxian Qiu Sorry, I only just saw this.
We previously used Flink 1.7 and never saw this problem. After upgrading to Flink 1.10 the problem always occurs, although the time at which it appears varies.