Hi,

We are running Flink 1.11, deployed as a Kubernetes session cluster: 30 TaskManagers with 4 slots each, and a job parallelism of 120. When we submit a job, the JobManager logs a flood of "No hostname could be resolved for the IP address" warnings, the JM heartbeat times out, the submission fails, and the web UI hangs and becomes unresponsive.

Even a WordCount job at parallelism 1 prints a few of the "no hostname" warnings before submitting successfully; as soon as the parallelism goes up, the submission times out.

Partial logs:

2020-07-15 16:58:46,460 WARN org.apache.flink.runtime.taskmanager.TaskManagerLocation [] - No hostname could be resolved for the IP address 10.32.160.7, using IP address as host name. Local input split assignment (such as for HDFS files) may be impacted.
2020-07-15 16:58:46,460 WARN org.apache.flink.runtime.taskmanager.TaskManagerLocation [] - No hostname could be resolved for the IP address 10.44.224.7, using IP address as host name. Local input split assignment (such as for HDFS files) may be impacted.
2020-07-15 16:58:46,461 WARN org.apache.flink.runtime.taskmanager.TaskManagerLocation [] - No hostname could be resolved for the IP address 10.40.32.9, using IP address as host name. Local input split assignment (such as for HDFS files) may be impacted.

2020-07-15 16:59:10,236 INFO org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - The heartbeat of JobManager with id 69a0d460de468888a9f41c770d963c0a timed out.
2020-07-15 16:59:10,236 INFO org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - Disconnect job manager 00000000000000000000000000000000@akka.tcp://flink@flink-jobmanager:6123/user/rpc/jobmanager_2 for job e1554c737e37ed79688a15c746b6e9ef from the resource manager.

How should we deal with this?

Best!

a511955993
Hi, SmileSmile.

I have run into a similar host-resolution problem before; you could start troubleshooting from the pod/node network mapping on the Kubernetes side.

Hope this helps.

Best,
Roc Marshal
Hi Roc,

This did not happen on 1.10.1; it only appears with 1.11. How would you suggest investigating it?

a511955993
Hi,

If there are no exceptions and GC looks normal, it may be worth checking the pod logs, and the ZooKeeper logs if HA is enabled. I once saw a similar symptom on YARN that turned out to have a different root cause, which we found by reading the NodeManager and ZooKeeper logs.

Best,
Congxian
Hi Congxian,

This is a test environment without HA. All we currently see is the JobManager flooding "no hostname could be resolved" warnings, the JM losing contact, and the submission failing. Raising the JM memory to 10 GB (jobmanager.memory.process.size: 10240m) makes no difference.

Rolling back to 1.10 in the same environment, the problem does not occur and the warnings above are not printed.

Any other troubleshooting ideas?

Best!

a511955993
Hi,

I am not sure whether you can get the complete pod logs in a Kubernetes environment, similar to the NodeManager logs on YARN. If you can, reading the full log of the affected pod might turn something up.

Best,
Congxian
If the log keeps printing "No hostname could be resolved for the IP address", the cluster's CoreDNS is probably at fault: the reverse lookup from IP address to hostname is failing. You can start a busybox pod to verify whether that specific IP cannot be resolved; it may well be a CoreDNS problem.

Best,
Yang
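To make the failure mode concrete, the check described above is essentially a reverse DNS (PTR) lookup. Below is a minimal Python sketch of the fallback behavior the Flink warning describes; the helper name `resolve_or_ip` is ours, not a Flink API:

```python
import socket

def resolve_or_ip(ip_address):
    """Reverse-resolve an IP to a hostname, falling back to the IP itself.

    This mirrors the behavior behind the warning: when no PTR record exists
    (e.g. CoreDNS cannot reverse-resolve a pod IP), Flink logs "No hostname
    could be resolved ..." and uses the IP address as the host name.
    """
    try:
        hostname, _aliases, _addresses = socket.gethostbyaddr(ip_address)
        return hostname
    except (socket.herror, socket.gaierror, OSError):
        # No PTR record / DNS failure -> fall back to the bare IP string
        return ip_address

if __name__ == "__main__":
    # 127.0.0.1 normally reverse-resolves via /etc/hosts; a pod IP without
    # any DNS record would come back unchanged as the IP string.
    print(resolve_or_ip("127.0.0.1"))
```

Running `nslookup {pod_ip}` from a busybox pod exercises exactly this reverse path, so it is a quick way to tell a CoreDNS problem apart from a Flink problem.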
Hi, Yang Wang!

Great to hear from you; your reply helps a lot and points me toward the problem. Let me add some information that may help narrow down the root cause.

The addresses in the JM's "No hostname could be resolved for the IP address x.x.x.x" warnings are the internal pod IPs that Kubernetes assigned to the Flink pods, not the host machines' IPs. Where could the problem lie?

Best!

a511955993
What I meant is: while the Flink job is running, start a busybox pod in the cluster with the command below and run `nslookup {ip_address}` inside it to see whether the address resolves. If it does not, CoreDNS is most likely the problem.

kubectl run -i -t busybox --image=busybox --restart=Never

You should also confirm that the cluster's CoreDNS pods are healthy; they are usually deployed in the kube-system namespace.

Best,
Yang
Hi Yang Wang,

I just tested this in our test environment: the TaskManager IPs cannot be resolved with nslookup, while the JobManager's can. The difference between the two is whether a Service exists for the pod.

Workaround: I added taskmanager-query-state-service.yaml to the cluster (listed as optional in the docs [1]) and changed NodePort to ClusterIP. The "No hostname could be resolved for the IP address" warnings stopped, the job submits successfully, and the timeout is gone. Problem solved.

1. Given the above, is this manifest actually mandatory?
2. Among the 1.11 changes I see [FLINK-15911][FLINK-15154], which support configuring the locally bound network interface separately from the externally advertised address and port. Is it this change that makes the JM reverse-resolve a Service from the IP the TM reports?

Best!

[1] https://ci.apache.org/projects/flink/flink-docs-release-1.11/ops/deployment/kubernetes.html

a511955993
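For reference, the optional manifest mentioned above looks roughly like the following sketch, based on the Flink 1.11 Kubernetes session docs [1], with `type` changed from NodePort to ClusterIP as described; the selector labels are the docs' conventions and must match your TaskManager deployment:

```yaml
# taskmanager-query-state-service.yaml (sketch; adapt labels to your setup)
apiVersion: v1
kind: Service
metadata:
  name: flink-taskmanager-query-state
spec:
  type: ClusterIP        # changed from NodePort, per the workaround above
  ports:
  - name: query-state
    port: 6125
    targetPort: 6125
  selector:
    app: flink
    component: taskmanager
```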
很高兴你的问题解决了,但我觉得根本原因应该不是加上了taskmanager-query-state-service.yaml的关系。
我这边不创建这个服务也是正常的,而且nslookup {tm_ip_address}是可以正常反解析到hostname的。 注意这里不是解析hostname,而是通过ip地址来反解析进行验证 回答你说的两个问题: 1. 不是必须的,我这边验证不需要创建,集群也是可以正常运行任务的。Rest service的暴露方式是ClusterIP、NodePort、LoadBalancer都正常 2. 如果没有配置taskmanager.bind-host, [Flink-15911][Flink-15154]这两个JIRA并不会影响TM向RM注册时候的使用的地址 如果你想找到根本原因,那可能需要你这边提供JM/TM的完整log,这样方便分析 Best, Yang SmileSmile <[hidden email]> 于2020年7月23日周四 上午11:30写道: > > Hi Yang Wang > > 刚刚在测试环境测试了一下,taskManager没有办法nslookup出来,JM可以nslookup,这两者的差别在于是否有service。 > > 解决方案:我这边给集群加上了taskmanager-query-state-service.yaml(按照官网上是可选服务)。就不会刷No > hostname could be resolved for ip > address,将NodePort改为ClusterIp,作业就可以成功提交,不会出现time out的问题了,问题得到了解决。 > > > 1. 如果按照上面的情况,那么这个配置文件是必须配置的? > > 2. 在1.11的更新中,发现有 [Flink-15911][Flink-15154] > 支持分别配置用于本地监听绑定的网络接口和外部访问的地址和端口。是否是这块的改动, > 需要JM去通过TM上报的ip反向解析出service? > > > Bset! > > > [1] > https://ci.apache.org/projects/flink/flink-docs-release-1.11/ops/deployment/kubernetes.html > > a511955993 > 邮箱:[hidden email] > > <https://maas.mail.163.com/dashi-web-extend/html/proSignature.html?ftlId=1&name=a511955993&uid=a511955993%40163.com&iconUrl=https%3A%2F%2Fmail-online.nosdn.127.net%2Fqiyelogo%2FdefaultAvatar.png&items=%5B%22%E9%82%AE%E7%AE%B1%EF%BC%9Aa511955993%40163.com%22%5D> > > 签名由 网易邮箱大师 <https://mail.163.com/dashi/dlpro.html?from=mail88> 定制 > > On 07/23/2020 10:11, Yang Wang <[hidden email]> wrote: > 我的意思就是你在Flink任务运行的过程中,然后下面的命令在集群里面起一个busybox的pod, > 在里面执行 nslookup {ip_address},看看是否能够正常解析到。如果不能应该就是coredns的 > 问题了 > > kubectl run -i -t busybox --image=busybox --restart=Never > > 你需要确认下集群的coredns pod是否正常,一般是部署在kube-system这个namespace下的 > > > > Best, > Yang > > > SmileSmile <[hidden email]> 于2020年7月22日周三 下午7:57写道: > > > > > Hi,Yang Wang! > > > > 很开心可以收到你的回复,你的回复帮助很大,让我知道了问题的方向。我再补充些信息,希望可以帮我进一步判断一下问题根源。 > > > > 在JM报错的地方,No hostname could be resolved for ip address xxxxx > > ,报出来的ip是k8s分配给flink pod的内网ip,不是宿主机的ip。请问这个问题可能出在哪里呢 > > > > Best! 
> > > > > > a511955993 > > 邮箱:[hidden email] > > > > < > https://maas.mail.163.com/dashi-web-extend/html/proSignature.html?ftlId=1&name=a511955993&uid=a511955993%40163.com&iconUrl=https%3A%2F%2Fmail-online.nosdn.127.net%2Fqiyelogo%2FdefaultAvatar.png&items=%5B%22%E9%82%AE%E7%AE%B1%EF%BC%9Aa511955993%40163.com%22%5D> > > > > > 签名由 网易邮箱大师 <https://mail.163.com/dashi/dlpro.html?from=mail88> 定制 > > > > On 07/22/2020 18:18, Yang Wang <[hidden email]> wrote: > > 如果你的日志里面一直在刷No hostname could be resolved for the IP > address,应该是集群的coredns > > 有问题,由ip地址反查hostname查不到。你可以起一个busybox验证一下是不是这个ip就解析不了,有 > > 可能是coredns有问题 > > > > > > Best, > > Yang > > > > Congxian Qiu <[hidden email]> 于2020年7月21日周二 下午7:29写道: > > > > > Hi > > > 不确定 k8s 环境中能否看到 pod 的完整日志?类似 Yarn 的 NM 日志一样,如果有的话,可以尝试看一下这个 pod > > > 的完整日志有没有什么发现 > > > Best, > > > Congxian > > > > > > > > > SmileSmile <[hidden email]> 于2020年7月21日周二 下午3:19写道: > > > > > > > Hi,Congxian > > > > > > > > 因为是测试环境,没有配置HA,目前看到的信息,就是JM刷出来大量的no hostname could be > > > > resolved,jm失联,作业提交失败。 > > > > 将jm内存配置为10g也是一样的情况(jobmanager.memory.pprocesa.size:10240m)。 > > > > > > > > 在同一个环境将版本回退到1.10没有出现该问题,也不会刷如上报错。 > > > > > > > > > > > > 是否有其他排查思路? > > > > > > > > Best! 
> > > > a511955993
> > > >
> > > > On 07/16/2020 13:17, Congxian Qiu wrote:
> > > > Hi
> > > > If there are no exceptions and GC looks normal, it may be worth
> > > > checking the pod logs; if HA is enabled, you could also check the zk
> > > > logs. I once hit a similar symptom on Yarn that turned out to have
> > > > another cause, found through the NM and zk logs.
> > > >
> > > > Best,
> > > > Congxian
> > > >
> > > >
> > > > SmileSmile <[hidden email]> wrote on Wed, Jul 15, 2020 at 17:20:
> > > >
> > > > > Hi Roc
> > > > >
> > > > > This does not happen on 1.10.1; it only appears on 1.11. What
> > > > > would be the right way to investigate?
> > > > >
> > > > > a511955993
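The reverse-lookup check suggested above (nslookup inside a busybox pod) can also be sketched with the Python standard library. This is a minimal sketch, not part of the thread: the IP is a placeholder taken from the logs, and `socket.gethostbyaddr` performs the same PTR lookup that fails when the JM logs "No hostname could be resolved", in which case Flink falls back to the bare IP.

```python
import socket

def resolve_or_ip(ip: str) -> str:
    """Reverse-resolve an IP to a hostname; fall back to the IP itself.

    This roughly mirrors the fallback behind Flink's "No hostname could
    be resolved for the IP address ..., using IP address as host name"
    warning: if the PTR lookup fails, the IP is used instead.
    """
    try:
        hostname, _aliases, _addrs = socket.gethostbyaddr(ip)
        return hostname
    except OSError:  # no PTR record, or no resolver reachable
        return ip

# Placeholder pod IP from the thread; run inside the cluster, this prints
# a hostname if coredns answers PTR queries, or the bare IP if it doesn't.
print(resolve_or_ip("10.32.160.7"))
```

Running this from inside a pod (rather than from a node) exercises the same resolver configuration the Flink containers use.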
Hi Yang Wang
First, my environment: kubernetes 1.17.4, CNI: weave.

Points 1-3 below are my questions; point 4 is the JM log.

1. Without taskmanager-query-state-service.yaml, the reverse lookup indeed fails:

kubectl exec -it busybox2 -- /bin/sh
/ # nslookup 10.47.96.2
Server:    10.96.0.10
Address:   10.96.0.10:53

** server can't find 2.96.47.10.in-addr.arpa: NXDOMAIN

2. On the web UI's detail > subtasks > taskmanagers view, that line shows 172-20-0-50 on Flink 1.11, whereas 1.10 showed flink-taskmanager-7b5d6958b6-sfzlk:36459. What changed here? (This cluster runs both 1.10 and 1.11; 1.10 works fine, so if coredns were broken, shouldn't 1.10 be affected the same way?)

3. Does coredns need any special configuration? Forward resolution of domain names inside containers works; only reverse resolution fails when no Service exists. Is there anything that must be configured in coredns?

4. The JM log at the time of the timeout:

2020-07-23 13:53:00,228 INFO org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - ResourceManager akka.tcp://flink@flink-jobmanager:6123/user/rpc/resourcemanager_0 was granted leadership with fencing token 00000000000000000000000000000000
2020-07-23 13:53:00,232 INFO org.apache.flink.runtime.rpc.akka.AkkaRpcService [] - Starting RPC endpoint for org.apache.flink.runtime.dispatcher.StandaloneDispatcher at akka://flink/user/rpc/dispatcher_1 .
2020-07-23 13:53:00,233 INFO org.apache.flink.runtime.resourcemanager.slotmanager.SlotManagerImpl [] - Starting the SlotManager.
2020-07-23 13:53:03,472 INFO org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - Registering TaskManager with ResourceID 1f9ae0cd95a28943a73be26323588696 (akka.tcp://flink@10.34.128.9:6122/user/rpc/taskmanager_0) at ResourceManager
2020-07-23 13:53:03,777 INFO org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - Registering TaskManager with ResourceID cac09e751264e61615329c20713a84b4 (akka.tcp://flink@10.32.160.6:6122/user/rpc/taskmanager_0) at ResourceManager
2020-07-23 13:53:03,787 INFO org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - Registering TaskManager with ResourceID 93c72d01d09f9ae427c5fc980ed4c1e4 (akka.tcp://flink@10.39.0.8:6122/user/rpc/taskmanager_0) at ResourceManager
2020-07-23 13:53:04,044 INFO org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - Registering TaskManager with ResourceID 8adf2f8e81b77a16d5418a9e252c61e2 (akka.tcp://flink@10.38.64.7:6122/user/rpc/taskmanager_0) at ResourceManager
2020-07-23 13:53:04,099 INFO org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - Registering TaskManager with ResourceID 23e9d2358f6eb76b9ae718d879d4f330 (akka.tcp://flink@10.42.160.6:6122/user/rpc/taskmanager_0) at ResourceManager
2020-07-23 13:53:04,146 INFO org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - Registering TaskManager with ResourceID 092f8dee299e32df13db3111662b61f8 (akka.tcp://flink@10.33.192.14:6122/user/rpc/taskmanager_0) at ResourceManager
2020-07-23 13:55:44,220 INFO org.apache.flink.runtime.dispatcher.StandaloneDispatcher [] - Received JobGraph submission 99a030d0e3f428490a501c0132f27a56 (JobTest).
2020-07-23 13:55:44,222 INFO org.apache.flink.runtime.dispatcher.StandaloneDispatcher [] - Submitting job 99a030d0e3f428490a501c0132f27a56 (JobTest).
2020-07-23 13:55:44,251 INFO org.apache.flink.runtime.rpc.akka.AkkaRpcService [] - Starting RPC endpoint for org.apache.flink.runtime.jobmaster.JobMaster at akka://flink/user/rpc/jobmanager_2 .
2020-07-23 13:55:44,260 INFO org.apache.flink.runtime.jobmaster.JobMaster [] - Initializing job JobTest (99a030d0e3f428490a501c0132f27a56).
2020-07-23 13:55:44,278 INFO org.apache.flink.runtime.jobmaster.JobMaster [] - Using restart back off time strategy NoRestartBackoffTimeStrategy for JobTest (99a030d0e3f428490a501c0132f27a56).
2020-07-23 13:55:44,319 INFO org.apache.flink.runtime.jobmaster.JobMaster [] - Running initialization on master for job JobTest (99a030d0e3f428490a501c0132f27a56).
2020-07-23 13:55:44,319 INFO org.apache.flink.runtime.jobmaster.JobMaster [] - Successfully ran initialization on master in 0 ms.
2020-07-23 13:55:44,428 INFO org.apache.flink.runtime.scheduler.adapter.DefaultExecutionTopology [] - Built 1 pipelined regions in 25 ms
2020-07-23 13:55:44,437 INFO org.apache.flink.runtime.jobmaster.JobMaster [] - Loading state backend via factory org.apache.flink.contrib.streaming.state.RocksDBStateBackendFactory
2020-07-23 13:55:44,456 INFO org.apache.flink.contrib.streaming.state.RocksDBStateBackend [] - Using predefined options: DEFAULT.
2020-07-23 13:55:44,457 INFO org.apache.flink.contrib.streaming.state.RocksDBStateBackend [] - Using default options factory: DefaultConfigurableOptionsFactory{configuredOptions={}}.
2020-07-23 13:55:44,466 WARN org.apache.flink.runtime.util.HadoopUtils [] - Could not find Hadoop configuration via any of the supported methods (Flink configuration, environment variables).
2020-07-23 13:55:45,276 INFO org.apache.flink.runtime.jobmaster.JobMaster [] - Using failover strategy org.apache.flink.runtime.executiongraph.failover.flip1.RestartPipelinedRegionFailoverStrategy@72bd8533 for JobTest (99a030d0e3f428490a501c0132f27a56).
2020-07-23 13:55:45,280 INFO org.apache.flink.runtime.jobmaster.JobManagerRunnerImpl [] - JobManager runner for job JobTest (99a030d0e3f428490a501c0132f27a56) was granted leadership with session id 00000000-0000-0000-0000-000000000000 at akka.tcp://flink@flink-jobmanager:6123/user/rpc/jobmanager_2.
2020-07-23 13:55:45,286 INFO org.apache.flink.runtime.jobmaster.JobMaster [] - Starting scheduling with scheduling strategy [org.apache.flink.runtime.scheduler.strategy.EagerSchedulingStrategy]
2020-07-23 13:55:45,436 INFO org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Cannot serve slot request, no ResourceManager connected. Adding as pending request [SlotRequestId{e092b12b96b0a98bbf057e71b9705c23}]
2020-07-23 13:55:45,436 INFO org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Cannot serve slot request, no ResourceManager connected. Adding as pending request [SlotRequestId{4ad15f417716c9e07fca383990c0f52a}]
2020-07-23 13:55:45,436 INFO org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Cannot serve slot request, no ResourceManager connected. Adding as pending request [SlotRequestId{345fdb427a893b7fc3f4f040f93445d2}]
2020-07-23 13:55:45,437 INFO org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Cannot serve slot request, no ResourceManager connected. Adding as pending request [SlotRequestId{e559485ea7b0b7e17367816882538d90}]
2020-07-23 13:55:45,437 INFO org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Cannot serve slot request, no ResourceManager connected. Adding as pending request [SlotRequestId{7be8f6c1aedb27b04e7feae68078685c}]
2020-07-23 13:55:45,437 INFO org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Cannot serve slot request, no ResourceManager connected. Adding as pending request [SlotRequestId{582a86197884206652dff3aea2306bb3}]
2020-07-23 13:55:45,437 INFO org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Cannot serve slot request, no ResourceManager connected. Adding as pending request [SlotRequestId{0cc24260eda3af299a0b321feefaf2cb}]
2020-07-23 13:55:45,437 INFO org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Cannot serve slot request, no ResourceManager connected. Adding as pending request [SlotRequestId{240ca6f3d3b5ece6a98243ec8cadf616}]
2020-07-23 13:55:45,438 INFO org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Cannot serve slot request, no ResourceManager connected. Adding as pending request [SlotRequestId{c35033d598a517acc108424bb9f809fb}]
2020-07-23 13:55:45,438 INFO org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Cannot serve slot request, no ResourceManager connected. Adding as pending request [SlotRequestId{ad35013c3b532d4b4df1be62395ae0cf}]
2020-07-23 13:55:45,438 INFO org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Cannot serve slot request, no ResourceManager connected. Adding as pending request [SlotRequestId{c929bd5e8daf432d01fad1ece3daec1a}]
2020-07-23 13:55:45,487 INFO org.apache.flink.runtime.jobmaster.JobMaster [] - Connecting to ResourceManager akka.tcp://flink@flink-jobmanager:6123/user/rpc/resourcemanager_*(00000000000000000000000000000000)
2020-07-23 13:55:45,492 INFO org.apache.flink.runtime.jobmaster.JobMaster [] - Resolved ResourceManager address, beginning registration
2020-07-23 13:55:45,493 INFO org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - Registering job manager [hidden email]://flink@flink-jobmanager:6123/user/rpc/jobmanager_2 for job 99a030d0e3f428490a501c0132f27a56.
2020-07-23 13:55:45,499 INFO org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - Registered job manager [hidden email]://flink@flink-jobmanager:6123/user/rpc/jobmanager_2 for job 99a030d0e3f428490a501c0132f27a56.
2020-07-23 13:55:45,501 INFO org.apache.flink.runtime.jobmaster.JobMaster [] - JobManager successfully registered at ResourceManager, leader id: 00000000000000000000000000000000.
2020-07-23 13:55:45,501 INFO org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Requesting new slot [SlotRequestId{15fd2a9565c2b080748c1d1592b1cbbc}] and profile ResourceProfile{UNKNOWN} from resource manager.
2020-07-23 13:55:45,502 INFO org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - Request slot with profile ResourceProfile{UNKNOWN} for job 99a030d0e3f428490a501c0132f27a56 with allocation id d420d08bf2654d9ea76955c70db18b69.
2020-07-23 13:55:45,502 INFO org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Requesting new slot [SlotRequestId{8cd72cc16f0e319d915a9a096a1096d7}] and profile ResourceProfile{UNKNOWN} from resource manager.
2020-07-23 13:55:45,503 INFO org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Requesting new slot [SlotRequestId{e7e422409acebdb385014a9634af6a90}] and profile ResourceProfile{UNKNOWN} from resource manager.
2020-07-23 13:55:45,503 INFO org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Requesting new slot [SlotRequestId{cef1af73546ca1fc27ca7a3322e9e815}] and profile ResourceProfile{UNKNOWN} from resource manager.
2020-07-23 13:55:45,503 INFO org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Requesting new slot [SlotRequestId{108fe0b3086567ad79275eccef2fdaf8}] and profile ResourceProfile{UNKNOWN} from resource manager.
2020-07-23 13:55:45,503 INFO org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Requesting new slot [SlotRequestId{265e67985eab7a6dc08024e53bf2708d}] and profile ResourceProfile{UNKNOWN} from resource manager.
2020-07-23 13:55:45,503 INFO org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Requesting new slot [SlotRequestId{7087497a17c441f1a1d6fefcbc7cd0ea}] and profile ResourceProfile{UNKNOWN} from resource manager.
2020-07-23 13:55:45,503 INFO org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Requesting new slot [SlotRequestId{14ac08438e79c8db8d25d93b99d62725}] and profile ResourceProfile{UNKNOWN} from resource manager.
2020-07-23 13:55:45,514 INFO org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - Request slot with profile ResourceProfile{UNKNOWN} for job 99a030d0e3f428490a501c0132f27a56 with allocation id fce526bbe3e1be91caa3e4b536b20e35.
2020-07-23 13:55:45,514 INFO org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Requesting new slot [SlotRequestId{40c7abbb12514c405323b0569fb21647}] and profile ResourceProfile{UNKNOWN} from resource manager.
2020-07-23 13:55:45,514 INFO org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Requesting new slot [SlotRequestId{a4985a9647b65b30a571258b45c8f2ce}] and profile ResourceProfile{UNKNOWN} from resource manager.
2020-07-23 13:55:45,515 INFO org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Requesting new slot [SlotRequestId{c52a6eb2fa58050e71e7903590019fd1}] and profile ResourceProfile{UNKNOWN} from resource manager.
2020-07-23 13:55:45,517 INFO org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - Request slot with profile ResourceProfile{UNKNOWN} for job 99a030d0e3f428490a501c0132f27a56 with allocation id 18ac7ec802ebfcfed8c05ee9324a55a4.
2020-07-23 13:55:45,518 INFO org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - Request slot with profile ResourceProfile{UNKNOWN} for job 99a030d0e3f428490a501c0132f27a56 with allocation id 7ec76cbe689eb418b63599e90ade19be.
2020-07-23 13:55:45,518 INFO org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Requesting new slot [SlotRequestId{46d65692a8b5aad11b51f9a74a666a74}] and profile ResourceProfile{UNKNOWN} from resource manager.
2020-07-23 13:55:45,518 INFO org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Requesting new slot [SlotRequestId{3670bb4f345eedf941cc18e477ba1e9d}] and profile ResourceProfile{UNKNOWN} from resource manager.
2020-07-23 13:55:45,518 INFO org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Requesting new slot [SlotRequestId{4a12467d76b9e3df8bc3412c0be08e14}] and profile ResourceProfile{UNKNOWN} from resource manager.
2020-07-23 13:55:45,518 INFO org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Requesting new slot [SlotRequestId{e092b12b96b0a98bbf057e71b9705c23}] and profile ResourceProfile{UNKNOWN} from resource manager.
2020-07-23 13:55:45,518 INFO org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Requesting new slot [SlotRequestId{4ad15f417716c9e07fca383990c0f52a}] and profile ResourceProfile{UNKNOWN} from resource manager.
2020-07-23 13:55:45,518 INFO org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Requesting new slot [SlotRequestId{345fdb427a893b7fc3f4f040f93445d2}] and profile ResourceProfile{UNKNOWN} from resource manager.
2020-07-23 13:55:45,519 INFO org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Requesting new slot [SlotRequestId{e559485ea7b0b7e17367816882538d90}] and profile ResourceProfile{UNKNOWN} from resource manager.
2020-07-23 13:55:45,519 INFO org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - Request slot with profile ResourceProfile{UNKNOWN} for job 99a030d0e3f428490a501c0132f27a56 with allocation id b78837a29b4032924ac25be70ed21a3c.
2020-07-23 13:58:18,037 WARN org.apache.flink.runtime.taskmanager.TaskManagerLocation [] - No hostname could be resolved for the IP address 10.47.96.2, using IP address as host name. Local input split assignment (such as for HDFS files) may be impacted.
2020-07-23 13:58:22,192 WARN org.apache.flink.runtime.taskmanager.TaskManagerLocation [] - No hostname could be resolved for the IP address 10.34.64.14, using IP address as host name. Local input split assignment (such as for HDFS files) may be impacted.
2020-07-23 13:58:22,358 WARN org.apache.flink.runtime.taskmanager.TaskManagerLocation [] - No hostname could be resolved for the IP address 10.34.128.9, using IP address as host name. Local input split assignment (such as for HDFS files) may be impacted.
2020-07-23 13:58:24,562 WARN org.apache.flink.runtime.taskmanager.TaskManagerLocation [] - No hostname could be resolved for the IP address 10.32.160.6, using IP address as host name. Local input split assignment (such as for HDFS files) may be impacted.
2020-07-23 13:58:25,487 WARN org.apache.flink.runtime.taskmanager.TaskManagerLocation [] - No hostname could be resolved for the IP address 10.38.64.7, using IP address as host name. Local input split assignment (such as for HDFS files) may be impacted.
2020-07-23 13:58:27,636 WARN org.apache.flink.runtime.taskmanager.TaskManagerLocation [] - No hostname could be resolved for the IP address 10.42.160.6, using IP address as host name. Local input split assignment (such as for HDFS files) may be impacted.
2020-07-23 13:58:27,767 WARN org.apache.flink.runtime.taskmanager.TaskManagerLocation [] - No hostname could be resolved for the IP address 10.43.64.12, using IP address as host name. Local input split assignment (such as for HDFS files) may be impacted.
2020-07-23 13:58:29,651 INFO org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - The heartbeat of JobManager with id 456a18b6c404cb11a359718e16de1c6b timed out.
2020-07-23 13:58:29,651 INFO org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - Disconnect job manager [hidden email]://flink@flink-jobmanager:6123/user/rpc/jobmanager_2 for job 99a030d0e3f428490a501c0132f27a56 from the resource manager.
2020-07-23 13:58:29,854 WARN org.apache.flink.runtime.taskmanager.TaskManagerLocation [] - No hostname could be resolved for the IP address 10.39.0.8, using IP address as host name. Local input split assignment (such as for HDFS files) may be impacted.
2020-07-23 13:58:33,623 WARN org.apache.flink.runtime.taskmanager.TaskManagerLocation [] - No hostname could be resolved for the IP address 10.35.0.10, using IP address as host name. Local input split assignment (such as for HDFS files) may be impacted.
2020-07-23 13:58:35,756 WARN org.apache.flink.runtime.taskmanager.TaskManagerLocation [] - No hostname could be resolved for the IP address 10.36.32.8, using IP address as host name. Local input split assignment (such as for HDFS files) may be impacted.
2020-07-23 13:58:36,694 WARN org.apache.flink.runtime.taskmanager.TaskManagerLocation [] - No hostname could be resolved for the IP address 10.42.128.6, using IP address as host name. Local input split assignment (such as for HDFS files) may be impacted.
2020-07-23 14:01:17,814 INFO org.apache.flink.runtime.jobmaster.JobMaster [] - Close ResourceManager connection 83b1ff14900abfd54418e7fa3efb3f8a: The heartbeat of JobManager with id 456a18b6c404cb11a359718e16de1c6b timed out..
2020-07-23 14:01:17,815 INFO org.apache.flink.runtime.jobmaster.JobMaster [] - Connecting to ResourceManager akka.tcp://flink@flink-jobmanager:6123/user/rpc/resourcemanager_*(00000000000000000000000000000000)
2020-07-23 14:01:17,816 INFO org.apache.flink.runtime.jobmaster.JobMaster [] - Resolved ResourceManager address, beginning registration
2020-07-23 14:01:17,816 INFO org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - Registering job manager [hidden email]://flink@flink-jobmanager:6123/user/rpc/jobmanager_2 for job 99a030d0e3f428490a501c0132f27a56.
2020-07-23 14:01:17,836 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph [] - Source: host_relation -> Timestamps/Watermarks -> Map (1/1) (302ca9640e2d209a543d843f2996ccd2) switched from SCHEDULED to FAILED on not deployed.
org.apache.flink.runtime.jobmanager.scheduler.NoResourceAvailableException: Could not allocate the required slot within slot request timeout. Please make sure that the cluster has enough resources.
    at org.apache.flink.runtime.scheduler.DefaultScheduler.maybeWrapWithNoResourceAvailableException(DefaultScheduler.java:441) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
    at org.apache.flink.runtime.scheduler.DefaultScheduler.lambda$assignResourceOrHandleError$6(DefaultScheduler.java:422) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
    at java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:836) ~[?:1.8.0_242]
    at java.util.concurrent.CompletableFuture$UniHandle.tryFire(CompletableFuture.java:811) ~[?:1.8.0_242]
    at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488) ~[?:1.8.0_242]
    at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1990) ~[?:1.8.0_242]
    at org.apache.flink.runtime.jobmaster.slotpool.SchedulerImpl.lambda$internalAllocateSlot$0(SchedulerImpl.java:168) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
    at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:774) ~[?:1.8.0_242]
    at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:750) ~[?:1.8.0_242]
    at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488) ~[?:1.8.0_242]
    at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1990) ~[?:1.8.0_242]
    at org.apache.flink.runtime.jobmaster.slotpool.SlotSharingManager$SingleTaskSlot.release(SlotSharingManager.java:726) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
    at org.apache.flink.runtime.jobmaster.slotpool.SlotSharingManager$MultiTaskSlot.release(SlotSharingManager.java:537) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
    at org.apache.flink.runtime.jobmaster.slotpool.SlotSharingManager$MultiTaskSlot.lambda$new$0(SlotSharingManager.java:432) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
    at java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:836) ~[?:1.8.0_242]
    at java.util.concurrent.CompletableFuture$UniHandle.tryFire(CompletableFuture.java:811) ~[?:1.8.0_242]
    at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488) ~[?:1.8.0_242]
    at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1990) ~[?:1.8.0_242]
    at org.apache.flink.runtime.concurrent.FutureUtils.lambda$forwardTo$21(FutureUtils.java:1120) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
    at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:774) ~[?:1.8.0_242]
    at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:750) ~[?:1.8.0_242]
    at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488) ~[?:1.8.0_242]
    at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1990) ~[?:1.8.0_242]
    at org.apache.flink.runtime.concurrent.FutureUtils$Timeout.run(FutureUtils.java:1036) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
    at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRunAsync(AkkaRpcActor.java:402) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
    at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:195) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
    at org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.handleRpcMessage(FencedAkkaRpcActor.java:74) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
    at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleMessage(AkkaRpcActor.java:152) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
    at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:26) [flink-dist_2.11-1.11.1.jar:1.11.1]
    at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:21) [flink-dist_2.11-1.11.1.jar:1.11.1]
    at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123) [flink-dist_2.11-1.11.1.jar:1.11.1]
    at akka.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:21) [flink-dist_2.11-1.11.1.jar:1.11.1]
    at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:170) [flink-dist_2.11-1.11.1.jar:1.11.1]
    at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171) [flink-dist_2.11-1.11.1.jar:1.11.1]
    at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171) [flink-dist_2.11-1.11.1.jar:1.11.1]
    at akka.actor.Actor$class.aroundReceive(Actor.scala:517) [flink-dist_2.11-1.11.1.jar:1.11.1]
    at akka.actor.AbstractActor.aroundReceive(AbstractActor.scala:225) [flink-dist_2.11-1.11.1.jar:1.11.1]
    at akka.actor.ActorCell.receiveMessage(ActorCell.scala:592) [flink-dist_2.11-1.11.1.jar:1.11.1]
    at akka.actor.ActorCell.invoke(ActorCell.scala:561) [flink-dist_2.11-1.11.1.jar:1.11.1]
    at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:258) [flink-dist_2.11-1.11.1.jar:1.11.1]
    at akka.dispatch.Mailbox.run(Mailbox.scala:225) [flink-dist_2.11-1.11.1.jar:1.11.1]
    at akka.dispatch.Mailbox.exec(Mailbox.scala:235) [flink-dist_2.11-1.11.1.jar:1.11.1]
    at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) [flink-dist_2.11-1.11.1.jar:1.11.1]
    at akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) [flink-dist_2.11-1.11.1.jar:1.11.1]
    at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) [flink-dist_2.11-1.11.1.jar:1.11.1]
    at akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) [flink-dist_2.11-1.11.1.jar:1.11.1]
Caused by: java.util.concurrent.CompletionException: java.util.concurrent.TimeoutException
    at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292) ~[?:1.8.0_242]
    at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:308) ~[?:1.8.0_242]
    at java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:607) ~[?:1.8.0_242]
    at java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:591) ~[?:1.8.0_242]
    ... 25 more
Caused by: java.util.concurrent.TimeoutException
    ... 23 more
2020-07-23 14:01:17,848 INFO org.apache.flink.runtime.executiongraph.failover.flip1.RestartPipelinedRegionFailoverStrategy [] - Calculating tasks to restart to recover the failed task cbc357ccb763df2852fee8c4fc7d55f2_0.
2020-07-23 14:01:17,910 INFO org.apache.flink.runtime.executiongraph.failover.flip1.RestartPipelinedRegionFailoverStrategy [] - 902 tasks should be restarted to recover the failed task cbc357ccb763df2852fee8c4fc7d55f2_0.
2020-07-23 14:01:17,913 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph [] - Job JobTest (99a030d0e3f428490a501c0132f27a56) switched from state RUNNING to FAILING.
org.apache.flink.runtime.JobException: Recovery is suppressed by NoRestartBackoffTimeStrategy
    at org.apache.flink.runtime.executiongraph.failover.flip1.ExecutionFailureHandler.handleFailure(ExecutionFailureHandler.java:116) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
    at org.apache.flink.runtime.executiongraph.failover.flip1.ExecutionFailureHandler.getFailureHandlingResult(ExecutionFailureHandler.java:78) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
    at org.apache.flink.runtime.scheduler.DefaultScheduler.handleTaskFailure(DefaultScheduler.java:192) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
    at org.apache.flink.runtime.scheduler.DefaultScheduler.maybeHandleTaskFailure(DefaultScheduler.java:185) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
    at org.apache.flink.runtime.scheduler.DefaultScheduler.updateTaskExecutionStateInternal(DefaultScheduler.java:179) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
    at org.apache.flink.runtime.scheduler.SchedulerBase.updateTaskExecutionState(SchedulerBase.java:503) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
    at org.apache.flink.runtime.scheduler.UpdateSchedulerNgOnInternalFailuresListener.notifyTaskFailure(UpdateSchedulerNgOnInternalFailuresListener.java:49) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
    at org.apache.flink.runtime.executiongraph.ExecutionGraph.notifySchedulerNgAboutInternalTaskFailure(ExecutionGraph.java:1710) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
    at org.apache.flink.runtime.executiongraph.Execution.processFail(Execution.java:1287) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
    at org.apache.flink.runtime.executiongraph.Execution.processFail(Execution.java:1255) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
    at org.apache.flink.runtime.executiongraph.Execution.markFailed(Execution.java:1086) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
    at org.apache.flink.runtime.executiongraph.ExecutionVertex.markFailed(ExecutionVertex.java:748) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
    at org.apache.flink.runtime.scheduler.DefaultExecutionVertexOperations.markFailed(DefaultExecutionVertexOperations.java:41) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
    at org.apache.flink.runtime.scheduler.DefaultScheduler.handleTaskDeploymentFailure(DefaultScheduler.java:435) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
    at org.apache.flink.runtime.scheduler.DefaultScheduler.lambda$assignResourceOrHandleError$6(DefaultScheduler.java:422) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
    at java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:836) ~[?:1.8.0_242]
    at java.util.concurrent.CompletableFuture$UniHandle.tryFire(CompletableFuture.java:811) ~[?:1.8.0_242]
    at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488) ~[?:1.8.0_242]
    at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1990) ~[?:1.8.0_242]
    at org.apache.flink.runtime.jobmaster.slotpool.SchedulerImpl.lambda$internalAllocateSlot$0(SchedulerImpl.java:168) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
    at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:774) ~[?:1.8.0_242]
    at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:750) ~[?:1.8.0_242]
    at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488) ~[?:1.8.0_242]
    at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1990) ~[?:1.8.0_242]
    at org.apache.flink.runtime.jobmaster.slotpool.SlotSharingManager$SingleTaskSlot.release(SlotSharingManager.java:726) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
    at org.apache.flink.runtime.jobmaster.slotpool.SlotSharingManager$MultiTaskSlot.release(SlotSharingManager.java:537) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
    at org.apache.flink.runtime.jobmaster.slotpool.SlotSharingManager$MultiTaskSlot.lambda$new$0(SlotSharingManager.java:432) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
    at java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:836) ~[?:1.8.0_242]
    at java.util.concurrent.CompletableFuture$UniHandle.tryFire(CompletableFuture.java:811) ~[?:1.8.0_242]
    at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488) ~[?:1.8.0_242]
    at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1990) ~[?:1.8.0_242]
    at org.apache.flink.runtime.concurrent.FutureUtils.lambda$forwardTo$21(FutureUtils.java:1120) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
    at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:774) ~[?:1.8.0_242]
    at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:750) ~[?:1.8.0_242]
    at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488) ~[?:1.8.0_242]
    at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1990) ~[?:1.8.0_242]
    at org.apache.flink.runtime.concurrent.FutureUtils$Timeout.run(FutureUtils.java:1036) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
    at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRunAsync(AkkaRpcActor.java:402) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
    at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:195) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
    at org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.handleRpcMessage(FencedAkkaRpcActor.java:74) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
    at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleMessage(AkkaRpcActor.java:152) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
    at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:26) [flink-dist_2.11-1.11.1.jar:1.11.1]
    at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:21) [flink-dist_2.11-1.11.1.jar:1.11.1]
    at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123) [flink-dist_2.11-1.11.1.jar:1.11.1]
    at akka.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:21) [flink-dist_2.11-1.11.1.jar:1.11.1]
    at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:170) [flink-dist_2.11-1.11.1.jar:1.11.1]
    at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171) [flink-dist_2.11-1.11.1.jar:1.11.1]
    at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171) [flink-dist_2.11-1.11.1.jar:1.11.1]
    at akka.actor.Actor$class.aroundReceive(Actor.scala:517) [flink-dist_2.11-1.11.1.jar:1.11.1]
    at akka.actor.AbstractActor.aroundReceive(AbstractActor.scala:225) [flink-dist_2.11-1.11.1.jar:1.11.1]
    at akka.actor.ActorCell.receiveMessage(ActorCell.scala:592) [flink-dist_2.11-1.11.1.jar:1.11.1]
    at akka.actor.ActorCell.invoke(ActorCell.scala:561) [flink-dist_2.11-1.11.1.jar:1.11.1]
    at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:258) [flink-dist_2.11-1.11.1.jar:1.11.1]
    at akka.dispatch.Mailbox.run(Mailbox.scala:225) [flink-dist_2.11-1.11.1.jar:1.11.1]
    at akka.dispatch.Mailbox.exec(Mailbox.scala:235) [flink-dist_2.11-1.11.1.jar:1.11.1]
    at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) [flink-dist_2.11-1.11.1.jar:1.11.1]
    at akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) [flink-dist_2.11-1.11.1.jar:1.11.1]
    at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) [flink-dist_2.11-1.11.1.jar:1.11.1]
    at akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) [flink-dist_2.11-1.11.1.jar:1.11.1]
Caused by: org.apache.flink.runtime.jobmanager.scheduler.NoResourceAvailableException: Could not allocate the required slot within slot request timeout. Please make sure that the cluster has enough resources.
    at org.apache.flink.runtime.scheduler.DefaultScheduler.maybeWrapWithNoResourceAvailableException(DefaultScheduler.java:441) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
    ... 45 more
Caused by: java.util.concurrent.CompletionException: java.util.concurrent.TimeoutException
    at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292) ~[?:1.8.0_242]
    at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:308) ~[?:1.8.0_242]
    at java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:607) ~[?:1.8.0_242]
    at java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:591) ~[?:1.8.0_242]
    ... 25 more
Caused by: java.util.concurrent.TimeoutException
    ... 23 more
2020-07-23 14:01:18,109 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph [] - Discarding the results produced by task execution 1809eb912d69854f2babedeaf879df6a.
2020-07-23 14:01:18,110 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph [] - Job JobTest (99a030d0e3f428490a501c0132f27a56) switched from state FAILING to FAILED.
org.apache.flink.runtime.JobException: Recovery is suppressed by NoRestartBackoffTimeStrategy
    at org.apache.flink.runtime.executiongraph.failover.flip1.ExecutionFailureHandler.handleFailure(ExecutionFailureHandler.java:116) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
    at org.apache.flink.runtime.executiongraph.failover.flip1.ExecutionFailureHandler.getFailureHandlingResult(ExecutionFailureHandler.java:78) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
    at org.apache.flink.runtime.scheduler.DefaultScheduler.handleTaskFailure(DefaultScheduler.java:192) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
    at org.apache.flink.runtime.scheduler.DefaultScheduler.maybeHandleTaskFailure(DefaultScheduler.java:185) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
    at org.apache.flink.runtime.scheduler.DefaultScheduler.updateTaskExecutionStateInternal(DefaultScheduler.java:179) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
    at org.apache.flink.runtime.scheduler.SchedulerBase.updateTaskExecutionState(SchedulerBase.java:503) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
    at org.apache.flink.runtime.scheduler.UpdateSchedulerNgOnInternalFailuresListener.notifyTaskFailure(UpdateSchedulerNgOnInternalFailuresListener.java:49) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
    at org.apache.flink.runtime.executiongraph.ExecutionGraph.notifySchedulerNgAboutInternalTaskFailure(ExecutionGraph.java:1710) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
    at org.apache.flink.runtime.executiongraph.Execution.processFail(Execution.java:1287) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
    at org.apache.flink.runtime.executiongraph.Execution.processFail(Execution.java:1255) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
    at org.apache.flink.runtime.executiongraph.Execution.markFailed(Execution.java:1086) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
    at org.apache.flink.runtime.executiongraph.ExecutionVertex.markFailed(ExecutionVertex.java:748) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
    at org.apache.flink.runtime.scheduler.DefaultExecutionVertexOperations.markFailed(DefaultExecutionVertexOperations.java:41) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
    at org.apache.flink.runtime.scheduler.DefaultScheduler.handleTaskDeploymentFailure(DefaultScheduler.java:435) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
    at org.apache.flink.runtime.scheduler.DefaultScheduler.lambda$assignResourceOrHandleError$6(DefaultScheduler.java:422) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
    at java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:836) ~[?:1.8.0_242]
    at java.util.concurrent.CompletableFuture$UniHandle.tryFire(CompletableFuture.java:811) ~[?:1.8.0_242]
    at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488) ~[?:1.8.0_242]
    at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1990) ~[?:1.8.0_242]
    at org.apache.flink.runtime.jobmaster.slotpool.SchedulerImpl.lambda$internalAllocateSlot$0(SchedulerImpl.java:168) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
    at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:774) ~[?:1.8.0_242]
    at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:750) ~[?:1.8.0_242]
    at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488) ~[?:1.8.0_242]
    at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1990) ~[?:1.8.0_242]
    at org.apache.flink.runtime.jobmaster.slotpool.SlotSharingManager$SingleTaskSlot.release(SlotSharingManager.java:726) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
    at org.apache.flink.runtime.jobmaster.slotpool.SlotSharingManager$MultiTaskSlot.release(SlotSharingManager.java:537) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
    at org.apache.flink.runtime.jobmaster.slotpool.SlotSharingManager$MultiTaskSlot.lambda$new$0(SlotSharingManager.java:432) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
    at java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:836) ~[?:1.8.0_242]
    at java.util.concurrent.CompletableFuture$UniHandle.tryFire(CompletableFuture.java:811) ~[?:1.8.0_242]
    at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488) ~[?:1.8.0_242]
    at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1990) ~[?:1.8.0_242]
    at org.apache.flink.runtime.concurrent.FutureUtils.lambda$forwardTo$21(FutureUtils.java:1120) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
    at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:774) ~[?:1.8.0_242]
    at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:750) ~[?:1.8.0_242]
    at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488) ~[?:1.8.0_242]
    at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1990) ~[?:1.8.0_242]
    at org.apache.flink.runtime.concurrent.FutureUtils$Timeout.run(FutureUtils.java:1036) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
    at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRunAsync(AkkaRpcActor.java:402) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
    at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:195) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
    at org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.handleRpcMessage(FencedAkkaRpcActor.java:74) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
    at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleMessage(AkkaRpcActor.java:152) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
    at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:26) [flink-dist_2.11-1.11.1.jar:1.11.1]
    at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:21) [flink-dist_2.11-1.11.1.jar:1.11.1]
    at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123) [flink-dist_2.11-1.11.1.jar:1.11.1]
    at akka.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:21) [flink-dist_2.11-1.11.1.jar:1.11.1]
    at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:170) [flink-dist_2.11-1.11.1.jar:1.11.1]
    at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171) [flink-dist_2.11-1.11.1.jar:1.11.1]
    at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171) [flink-dist_2.11-1.11.1.jar:1.11.1]
    at akka.actor.Actor$class.aroundReceive(Actor.scala:517) [flink-dist_2.11-1.11.1.jar:1.11.1]
    at akka.actor.AbstractActor.aroundReceive(AbstractActor.scala:225) [flink-dist_2.11-1.11.1.jar:1.11.1]
    at akka.actor.ActorCell.receiveMessage(ActorCell.scala:592) [flink-dist_2.11-1.11.1.jar:1.11.1]
    at akka.actor.ActorCell.invoke(ActorCell.scala:561) [flink-dist_2.11-1.11.1.jar:1.11.1]
    at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:258) [flink-dist_2.11-1.11.1.jar:1.11.1]
    at akka.dispatch.Mailbox.run(Mailbox.scala:225) [flink-dist_2.11-1.11.1.jar:1.11.1]
    at akka.dispatch.Mailbox.exec(Mailbox.scala:235) [flink-dist_2.11-1.11.1.jar:1.11.1]
    at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) [flink-dist_2.11-1.11.1.jar:1.11.1]
    at akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) [flink-dist_2.11-1.11.1.jar:1.11.1]
    at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) [flink-dist_2.11-1.11.1.jar:1.11.1]
    at akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) [flink-dist_2.11-1.11.1.jar:1.11.1]
Caused by: org.apache.flink.runtime.jobmanager.scheduler.NoResourceAvailableException: Could not allocate the required slot within slot request timeout. Please make sure that the cluster has enough resources.
    at org.apache.flink.runtime.scheduler.DefaultScheduler.maybeWrapWithNoResourceAvailableException(DefaultScheduler.java:441) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
    ... 45 more
Caused by: java.util.concurrent.CompletionException: java.util.concurrent.TimeoutException
    at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292) ~[?:1.8.0_242]
    at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:308) ~[?:1.8.0_242]
    at java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:607) ~[?:1.8.0_242]
    at java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:591) ~[?:1.8.0_242]
    ... 25 more
Caused by: java.util.concurrent.TimeoutException
    ... 23 more
2020-07-23 14:01:18,114 INFO org.apache.flink.runtime.checkpoint.CheckpointCoordinator [] - Stopping checkpoint coordinator for job 99a030d0e3f428490a501c0132f27a56.
2020-07-23 14:01:18,117 INFO org.apache.flink.runtime.checkpoint.StandaloneCompletedCheckpointStore [] - Shutting down
2020-07-23 14:01:18,118 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph [] - Discarding the results produced by task execution 302ca9640e2d209a543d843f2996ccd2.
2020-07-23 14:01:18,120 INFO org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Pending slot request [SlotRequestId{15fd2a9565c2b080748c1d1592b1cbbc}] timed out.
2020-07-23 14:01:18,120 INFO org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Pending slot request [SlotRequestId{8cd72cc16f0e319d915a9a096a1096d7}] timed out.
2020-07-23 14:01:18,120 INFO org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Pending slot request [SlotRequestId{e7e422409acebdb385014a9634af6a90}] timed out.
2020-07-23 14:01:18,121 INFO org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Pending slot request [SlotRequestId{cef1af73546ca1fc27ca7a3322e9e815}] timed out.
2020-07-23 14:01:18,121 INFO org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Pending slot request [SlotRequestId{108fe0b3086567ad79275eccef2fdaf8}] timed out.
2020-07-23 14:01:18,121 INFO org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Pending slot request [SlotRequestId{265e67985eab7a6dc08024e53bf2708d}] timed out.
2020-07-23 14:01:18,122 INFO org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Pending slot request [SlotRequestId{7087497a17c441f1a1d6fefcbc7cd0ea}] timed out.
2020-07-23 14:01:18,122 INFO org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Pending slot request [
2020-07-23 14:01:18,151 INFO org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - Registering job manager 00000000000000000000000000000000@akka.tcp://flink@flink-jobmanager:6123/user/rpc/jobmanager_2 for job 99a030d0e3f428490a501c0132f27a56.
2020-07-23 14:01:18,157 INFO org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - Registered job manager 00000000000000000000000000000000@akka.tcp://flink@flink-jobmanager:6123/user/rpc/jobmanager_2 for job 99a030d0e3f428490a501c0132f27a56.
2020-07-23 14:01:18,157 INFO org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - Registered job manager 00000000000000000000000000000000@akka.tcp://flink@flink-jobmanager:6123/user/rpc/jobmanager_2 for job 99a030d0e3f428490a501c0132f27a56.
2020-07-23 14:01:18,157 INFO org.apache.flink.runtime.dispatcher.StandaloneDispatcher [] - Job 99a030d0e3f428490a501c0132f27a56 reached globally terminal state FAILED.
2020-07-23 14:01:18,162 INFO org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - Registered job manager 00000000000000000000000000000000@akka.tcp://flink@flink-jobmanager:6123/user/rpc/jobmanager_2 for job 99a030d0e3f428490a501c0132f27a56.
2020-07-23 14:01:18,162 INFO org.apache.flink.runtime.jobmaster.JobMaster [] - JobManager successfully registered at ResourceManager, leader id: 00000000000000000000000000000000.
2020-07-23 14:01:18,225 INFO org.apache.flink.runtime.jobmaster.JobMaster [] - Stopping the JobMaster for job JobTest(99a030d0e3f428490a501c0132f27a56).
2020-07-23 14:01:18,381 INFO org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Suspending SlotPool.
2020-07-23 14:01:18,382 INFO org.apache.flink.runtime.jobmaster.JobMaster [] - Close ResourceManager connection 83b1ff14900abfd54418e7fa3efb3f8a: JobManager is shutting down..
2020-07-23 14:01:18,382 INFO org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Stopping SlotPool.
2020-07-23 14:01:18,382 INFO org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - Disconnect job manager 00000000000000000000000000000000@akka.tcp://flink@flink-jobmanager:6123/user/rpc/jobmanager_2 for job 99a030d0e3f428490a501c0132f27a56 from the resource manager.

On 07/23/2020 13:26, Yang Wang wrote:

Glad your problem is solved, but I don't think the root cause is adding taskmanager-query-state-service.yaml. On my side the cluster works fine without creating that service, and `nslookup {tm_ip_address}` reverse-resolves to a hostname normally. Note that the check is not resolving a hostname, but verifying by reverse-resolving the IP address.

To answer your two questions:
1. It is not required. I verified here that the cluster runs jobs normally without creating it, whether the Rest service is exposed as ClusterIP, NodePort, or LoadBalancer.
2. If taskmanager.bind-host is not configured, the two JIRAs [Flink-15911][Flink-15154] do not affect the address the TM uses when registering with the RM.

If you want to find the root cause, you would need to provide the complete JM/TM logs so they can be analyzed.

Best,
Yang

SmileSmile <[hidden email]> wrote on Thu, Jul 23, 2020 at 11:30 AM:

>
> Hi Yang Wang
>
> I just tested this in our test environment: the TaskManager IPs cannot be resolved with nslookup, while the JM's IP can; the difference between the two is whether a service exists for them.
>
> Workaround: I added taskmanager-query-state-service.yaml to the cluster (per the official docs it is an optional service). The "No hostname could be resolved for ip address" flood stopped, and after changing NodePort to ClusterIP the job submits successfully without the timeout, so the problem is resolved.
>
> 1. Given the behavior above, is this manifest actually mandatory?
>
> 2. Among the 1.11 changes I noticed [Flink-15911][Flink-15154]: support for separately configuring the locally bound network interface and the externally advertised address and port. Is it this change that requires the JM to reverse-resolve a service from the IP reported by the TM?
>
> Best!
>
>
> [1]
> https://ci.apache.org/projects/flink/flink-docs-release-1.11/ops/deployment/kubernetes.html
>
> On 07/23/2020 10:11, Yang Wang <[hidden email]> wrote:
> What I meant is: while the Flink job is running, use the command below to start a busybox pod in the cluster and run `nslookup {ip_address}` inside it, to see whether the address resolves normally. If it does not, the problem is likely coredns.
>
> kubectl run -i -t busybox --image=busybox --restart=Never
>
> You also need to confirm that the cluster's coredns pods are healthy; they are usually deployed in the kube-system namespace.
>
> Best,
> Yang
>
> SmileSmile <[hidden email]> wrote on Wed, Jul 22, 2020 at 7:57 PM:
>
> >
> > Hi, Yang Wang!
> >
> > Very glad to receive your reply; it helped a lot and pointed me toward the problem. Let me add some information to help narrow down the root cause further.
> >
> > Where the JM logs the error "No hostname could be resolved for ip address xxxxx", the reported IP is the in-cluster IP that k8s assigned to the Flink pod, not the host machine's IP. Where could this problem come from?
> >
> > Best!
> >
> > On 07/22/2020 18:18, Yang Wang <[hidden email]> wrote:
> > If your log keeps flooding with "No hostname could be resolved for the IP address", the cluster's coredns probably has a problem: the reverse lookup from IP address to hostname fails. You can start a busybox pod to verify whether that IP really cannot be resolved; it may well be a coredns issue.
> >
> > Best,
> > Yang
> >
> > Congxian Qiu <[hidden email]> wrote on Tue, Jul 21, 2020 at 7:29 PM:
> >
> > > Hi
> > > I'm not sure whether the complete pod logs are visible in a k8s environment, similar to the NM logs on Yarn. If they are, you could check the pod's complete log for clues.
> > > Best,
> > > Congxian
> > >
> > > SmileSmile <[hidden email]> wrote on Tue, Jul 21, 2020 at 3:19 PM:
> > >
> > > > Hi, Congxian
> > > >
> > > > Since this is a test environment, HA is not configured. What I see so far is the JM flooding large numbers of "no hostname could be resolved" messages, the JM losing contact, and the job submission failing.
> > > > Configuring the JM memory as 10g makes no difference (jobmanager.memory.process.size: 10240m).
> > > >
> > > > Rolling back to 1.10 in the same environment, the problem does not occur and the errors above are not logged.
> > > >
> > > > Are there any other troubleshooting ideas?
> > > >
> > > > Best!
> > > >
> > > > On 07/16/2020 13:17, Congxian Qiu wrote:
> > > > Hi
> > > > If there are no exceptions and GC looks normal, perhaps check the pod's related logs; if HA is enabled, also check the zk logs. I once ran into a similar symptom on Yarn that turned out to have a different cause, which was found through the NM logs and zk logs.
> > > >
> > > > Best,
> > > > Congxian
> > > >
> > > > SmileSmile <[hidden email]> wrote on Wed, Jul 15, 2020 at 5:20 PM:
> > > >
> > > > > Hi Roc
> > > > >
> > > > > This symptom is absent in 1.10.1 and only appears in 1.11. How would you suggest investigating it?
> > > > >
> > > > > On 07/15/2020 17:16, Roc Marshal wrote:
> > > > > Hi, SmileSmile.
> > > > > I have run into a similar host-resolution problem before; you could start troubleshooting from the angle of the k8s pod network mapping.
> > > > > I hope this helps.
> > > > >
> > > > > Best,
> > > > > Roc Marshal
> > > > >
> > > > > On 2020-07-15 17:04:18, "SmileSmile" <[hidden email]> wrote:
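The reverse-lookup check discussed above can be made concrete. As a minimal sketch (the `ptr_name` helper below is a hypothetical name, not part of Flink or kubectl), this builds the `in-addr.arpa` PTR name that coredns must answer for a given pod IP; `nslookup <ip>` queries exactly this name behind the scenes, and a missing record is what produces NXDOMAIN and, in turn, Flink's "No hostname could be resolved" warning:

```shell
#!/bin/sh
# Hypothetical helper: build the reverse-lookup (PTR) name for an IPv4
# address. A reverse DNS query for 10.47.96.2 asks for the record named
# 2.96.47.10.in-addr.arpa; if coredns has no such record, nslookup
# reports NXDOMAIN, as seen in the thread above.
ptr_name() {
  IFS=. read -r o1 o2 o3 o4 <<EOF
$1
EOF
  echo "$o4.$o3.$o2.$o1.in-addr.arpa"
}

# Example with one of the pod IPs from the log:
ptr_name 10.47.96.2   # prints: 2.96.47.10.in-addr.arpa
```

From a busybox pod in the cluster, `nslookup 10.47.96.2` should then return a name for that pod IP; an NXDOMAIN answer means the cluster DNS simply has no PTR record for it.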
Looking at your job, the root cause of the failure is not "No hostname could be resolved". The cause of that WARNING can be discussed separately (if it really does not occur in 1.10); if you start a standalone cluster locally you will see the same WARNING, and it does not affect normal operation.

The failure is that the slot request timed out after 5 minutes. In the log you provided, the span from 2020-07-23 13:55:45,519 to 2020-07-23 13:58:18,037 is blank; you did not omit anything there, right? During that period the tasks should have started deploying. The log also shows the JM->RM heartbeat timing out, i.e. communication within the same process in the same Pod timed out, so I suspect the JM was stuck in full GC the whole time. Please confirm that.

Best,
Yang

SmileSmile <[hidden email]> wrote on Thu, Jul 23, 2020 at 2:43 PM:

> Hi Yang Wang
>
> First, my environment versions:
>
> kubernetes: 1.17.4. CNI: weave
>
> Points 1 to 3 are my questions; point 4 is the JM log.
>
> 1. After removing taskmanager-query-state-service.yaml, nslookup indeed fails:
>
> kubectl exec -it busybox2 -- /bin/sh
> / # nslookup 10.47.96.2
> Server: 10.96.0.10
> Address: 10.96.0.10:53
>
> ** server can't find 2.96.47.10.in-addr.arpa: NXDOMAIN
>
> 2. Flink 1.11 vs. Flink 1.10
>
> On the "detail subtasks taskmanagers xxx x" row, 1.11 now shows 172-20-0-50, while 1.10 showed flink-taskmanager-7b5d6958b6-sfzlk:36459. What changed here? (This cluster currently runs both 1.10 and 1.11, and 1.10 works fine; if coredns were broken, the 1.10 Flink should hit the same problem, right?)
>
> 3. Does coredns need any special configuration?
>
> Resolving domain names inside a container works fine; only reverse resolution fails when there is no service. Is there anything that needs to be configured in coredns?
>
> 4. The JM log at the time of the timeout:
>
> 2020-07-23 13:53:00,228 INFO org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - ResourceManager akka.tcp://flink@flink-jobmanager:6123/user/rpc/resourcemanager_0 was granted leadership with fencing token 00000000000000000000000000000000
> 2020-07-23 13:53:00,232 INFO org.apache.flink.runtime.rpc.akka.AkkaRpcService [] - Starting RPC endpoint for org.apache.flink.runtime.dispatcher.StandaloneDispatcher at akka://flink/user/rpc/dispatcher_1 .
> 2020-07-23 13:53:00,233 INFO org.apache.flink.runtime.resourcemanager.slotmanager.SlotManagerImpl [] - Starting the SlotManager.
> 2020-07-23 13:53:03,472 INFO > org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - > Registering TaskManager with ResourceID 1f9ae0cd95a28943a73be26323588696 > (akka.tcp://flink@10.34.128.9:6122/user/rpc/taskmanager_0) at > ResourceManager > 2020-07-23 13:53:03,777 INFO > org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - > Registering TaskManager with ResourceID cac09e751264e61615329c20713a84b4 > (akka.tcp://flink@10.32.160.6:6122/user/rpc/taskmanager_0) at > ResourceManager > 2020-07-23 13:53:03,787 INFO > org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - > Registering TaskManager with ResourceID 93c72d01d09f9ae427c5fc980ed4c1e4 > (akka.tcp://flink@10.39.0.8:6122/user/rpc/taskmanager_0) at > ResourceManager > 2020-07-23 13:53:04,044 INFO > org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - > Registering TaskManager with ResourceID 8adf2f8e81b77a16d5418a9e252c61e2 > (akka.tcp://flink@10.38.64.7:6122/user/rpc/taskmanager_0) at > ResourceManager > 2020-07-23 13:53:04,099 INFO > org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - > Registering TaskManager with ResourceID 23e9d2358f6eb76b9ae718d879d4f330 > (akka.tcp://flink@10.42.160.6:6122/user/rpc/taskmanager_0) at > ResourceManager > 2020-07-23 13:53:04,146 INFO > org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - > Registering TaskManager with ResourceID 092f8dee299e32df13db3111662b61f8 > (akka.tcp://flink@10.33.192.14:6122/user/rpc/taskmanager_0) at > ResourceManager > > > 2020-07-23 13:55:44,220 INFO > org.apache.flink.runtime.dispatcher.StandaloneDispatcher [] - Received > JobGraph submission 99a030d0e3f428490a501c0132f27a56 (JobTest). > 2020-07-23 13:55:44,222 INFO > org.apache.flink.runtime.dispatcher.StandaloneDispatcher [] - > Submitting job 99a030d0e3f428490a501c0132f27a56 (JobTest). 
> 2020-07-23 13:55:44,251 INFO > org.apache.flink.runtime.rpc.akka.AkkaRpcService [] - Starting > RPC endpoint for org.apache.flink.runtime.jobmaster.JobMaster at > akka://flink/user/rpc/jobmanager_2 . > 2020-07-23 13:55:44,260 INFO org.apache.flink.runtime.jobmaster.JobMaster > [] - Initializing job JobTest > (99a030d0e3f428490a501c0132f27a56). > 2020-07-23 13:55:44,278 INFO org.apache.flink.runtime.jobmaster.JobMaster > [] - Using restart back off time strategy > NoRestartBackoffTimeStrategy for JobTest (99a030d0e3f428490a501c0132f27a56). > 2020-07-23 13:55:44,319 INFO org.apache.flink.runtime.jobmaster.JobMaster > [] - Running initialization on master for job JobTest > (99a030d0e3f428490a501c0132f27a56). > 2020-07-23 13:55:44,319 INFO org.apache.flink.runtime.jobmaster.JobMaster > [] - Successfully ran initialization on master in 0 ms. > 2020-07-23 13:55:44,428 INFO > org.apache.flink.runtime.scheduler.adapter.DefaultExecutionTopology [] - > Built 1 pipelined regions in 25 ms > 2020-07-23 13:55:44,437 INFO org.apache.flink.runtime.jobmaster.JobMaster > [] - Loading state backend via factory > org.apache.flink.contrib.streaming.state.RocksDBStateBackendFactory > 2020-07-23 13:55:44,456 INFO > org.apache.flink.contrib.streaming.state.RocksDBStateBackend [] - Using > predefined options: DEFAULT. > 2020-07-23 13:55:44,457 INFO > org.apache.flink.contrib.streaming.state.RocksDBStateBackend [] - Using > default options factory: > DefaultConfigurableOptionsFactory{configuredOptions={}}. > 2020-07-23 13:55:44,466 WARN org.apache.flink.runtime.util.HadoopUtils > [] - Could not find Hadoop configuration via any of the > supported methods (Flink configuration, environment variables). > 2020-07-23 13:55:45,276 INFO org.apache.flink.runtime.jobmaster.JobMaster > [] - Using failover strategy > org.apache.flink.runtime.executiongraph.failover.flip1.RestartPipelinedRegionFailoverStrategy@72bd8533 > for JobTest (99a030d0e3f428490a501c0132f27a56). 
> 2020-07-23 13:55:45,280 INFO > org.apache.flink.runtime.jobmaster.JobManagerRunnerImpl [] - > JobManager runner for job JobTest (99a030d0e3f428490a501c0132f27a56) was > granted leadership with session id 00000000-0000-0000-0000-000000000000 at > akka.tcp://flink@flink-jobmanager:6123/user/rpc/jobmanager_2. > 2020-07-23 13:55:45,286 INFO org.apache.flink.runtime.jobmaster.JobMaster > [] - Starting scheduling with scheduling strategy > [org.apache.flink.runtime.scheduler.strategy.EagerSchedulingStrategy] > > > > 2020-07-23 13:55:45,436 INFO > org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Cannot > serve slot request, no ResourceManager connected. Adding as pending request > [SlotRequestId{e092b12b96b0a98bbf057e71b9705c23}] > 2020-07-23 13:55:45,436 INFO > org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Cannot > serve slot request, no ResourceManager connected. Adding as pending request > [SlotRequestId{4ad15f417716c9e07fca383990c0f52a}] > 2020-07-23 13:55:45,436 INFO > org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Cannot > serve slot request, no ResourceManager connected. Adding as pending request > [SlotRequestId{345fdb427a893b7fc3f4f040f93445d2}] > 2020-07-23 13:55:45,437 INFO > org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Cannot > serve slot request, no ResourceManager connected. Adding as pending request > [SlotRequestId{e559485ea7b0b7e17367816882538d90}] > 2020-07-23 13:55:45,437 INFO > org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Cannot > serve slot request, no ResourceManager connected. Adding as pending request > [SlotRequestId{7be8f6c1aedb27b04e7feae68078685c}] > 2020-07-23 13:55:45,437 INFO > org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Cannot > serve slot request, no ResourceManager connected. 
> Adding as pending request [SlotRequestId{582a86197884206652dff3aea2306bb3}]
> 2020-07-23 13:55:45,437 INFO  org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Cannot serve slot request, no ResourceManager connected. Adding as pending request [SlotRequestId{0cc24260eda3af299a0b321feefaf2cb}]
> 2020-07-23 13:55:45,437 INFO  org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Cannot serve slot request, no ResourceManager connected. Adding as pending request [SlotRequestId{240ca6f3d3b5ece6a98243ec8cadf616}]
> 2020-07-23 13:55:45,438 INFO  org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Cannot serve slot request, no ResourceManager connected. Adding as pending request [SlotRequestId{c35033d598a517acc108424bb9f809fb}]
> 2020-07-23 13:55:45,438 INFO  org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Cannot serve slot request, no ResourceManager connected. Adding as pending request [SlotRequestId{ad35013c3b532d4b4df1be62395ae0cf}]
> 2020-07-23 13:55:45,438 INFO  org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Cannot serve slot request, no ResourceManager connected. Adding as pending request [SlotRequestId{c929bd5e8daf432d01fad1ece3daec1a}]
> 2020-07-23 13:55:45,487 INFO  org.apache.flink.runtime.jobmaster.JobMaster [] - Connecting to ResourceManager akka.tcp://flink@flink-jobmanager:6123/user/rpc/resourcemanager_*(00000000000000000000000000000000)
> 2020-07-23 13:55:45,492 INFO  org.apache.flink.runtime.jobmaster.JobMaster [] - Resolved ResourceManager address, beginning registration
> 2020-07-23 13:55:45,493 INFO  org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - Registering job manager 00000000000000000000000000000000@akka.tcp://flink@flink-jobmanager:6123/user/rpc/jobmanager_2 for job 99a030d0e3f428490a501c0132f27a56.
> 2020-07-23 13:55:45,499 INFO  org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - Registered job manager 00000000000000000000000000000000@akka.tcp://flink@flink-jobmanager:6123/user/rpc/jobmanager_2 for job 99a030d0e3f428490a501c0132f27a56.
> 2020-07-23 13:55:45,501 INFO  org.apache.flink.runtime.jobmaster.JobMaster [] - JobManager successfully registered at ResourceManager, leader id: 00000000000000000000000000000000.
> 2020-07-23 13:55:45,501 INFO  org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Requesting new slot [SlotRequestId{15fd2a9565c2b080748c1d1592b1cbbc}] and profile ResourceProfile{UNKNOWN} from resource manager.
> 2020-07-23 13:55:45,502 INFO  org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - Request slot with profile ResourceProfile{UNKNOWN} for job 99a030d0e3f428490a501c0132f27a56 with allocation id d420d08bf2654d9ea76955c70db18b69.
> 2020-07-23 13:55:45,502 INFO  org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Requesting new slot [SlotRequestId{8cd72cc16f0e319d915a9a096a1096d7}] and profile ResourceProfile{UNKNOWN} from resource manager.
> 2020-07-23 13:55:45,503 INFO  org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Requesting new slot [SlotRequestId{e7e422409acebdb385014a9634af6a90}] and profile ResourceProfile{UNKNOWN} from resource manager.
> 2020-07-23 13:55:45,503 INFO  org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Requesting new slot [SlotRequestId{cef1af73546ca1fc27ca7a3322e9e815}] and profile ResourceProfile{UNKNOWN} from resource manager.
> 2020-07-23 13:55:45,503 INFO  org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Requesting new slot [SlotRequestId{108fe0b3086567ad79275eccef2fdaf8}] and profile ResourceProfile{UNKNOWN} from resource manager.
> 2020-07-23 13:55:45,503 INFO  org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Requesting new slot [SlotRequestId{265e67985eab7a6dc08024e53bf2708d}] and profile ResourceProfile{UNKNOWN} from resource manager.
> 2020-07-23 13:55:45,503 INFO  org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Requesting new slot [SlotRequestId{7087497a17c441f1a1d6fefcbc7cd0ea}] and profile ResourceProfile{UNKNOWN} from resource manager.
> 2020-07-23 13:55:45,503 INFO  org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Requesting new slot [SlotRequestId{14ac08438e79c8db8d25d93b99d62725}] and profile ResourceProfile{UNKNOWN} from resource manager.
>
> 2020-07-23 13:55:45,514 INFO  org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - Request slot with profile ResourceProfile{UNKNOWN} for job 99a030d0e3f428490a501c0132f27a56 with allocation id fce526bbe3e1be91caa3e4b536b20e35.
> 2020-07-23 13:55:45,514 INFO  org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Requesting new slot [SlotRequestId{40c7abbb12514c405323b0569fb21647}] and profile ResourceProfile{UNKNOWN} from resource manager.
> 2020-07-23 13:55:45,514 INFO  org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Requesting new slot [SlotRequestId{a4985a9647b65b30a571258b45c8f2ce}] and profile ResourceProfile{UNKNOWN} from resource manager.
> 2020-07-23 13:55:45,515 INFO  org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Requesting new slot [SlotRequestId{c52a6eb2fa58050e71e7903590019fd1}] and profile ResourceProfile{UNKNOWN} from resource manager.
>
> 2020-07-23 13:55:45,517 INFO  org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - Request slot with profile ResourceProfile{UNKNOWN} for job 99a030d0e3f428490a501c0132f27a56 with allocation id 18ac7ec802ebfcfed8c05ee9324a55a4.
> 2020-07-23 13:55:45,518 INFO  org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - Request slot with profile ResourceProfile{UNKNOWN} for job 99a030d0e3f428490a501c0132f27a56 with allocation id 7ec76cbe689eb418b63599e90ade19be.
> 2020-07-23 13:55:45,518 INFO  org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Requesting new slot [SlotRequestId{46d65692a8b5aad11b51f9a74a666a74}] and profile ResourceProfile{UNKNOWN} from resource manager.
> 2020-07-23 13:55:45,518 INFO  org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Requesting new slot [SlotRequestId{3670bb4f345eedf941cc18e477ba1e9d}] and profile ResourceProfile{UNKNOWN} from resource manager.
> 2020-07-23 13:55:45,518 INFO  org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Requesting new slot [SlotRequestId{4a12467d76b9e3df8bc3412c0be08e14}] and profile ResourceProfile{UNKNOWN} from resource manager.
> 2020-07-23 13:55:45,518 INFO  org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Requesting new slot [SlotRequestId{e092b12b96b0a98bbf057e71b9705c23}] and profile ResourceProfile{UNKNOWN} from resource manager.
> 2020-07-23 13:55:45,518 INFO  org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Requesting new slot [SlotRequestId{4ad15f417716c9e07fca383990c0f52a}] and profile ResourceProfile{UNKNOWN} from resource manager.
> 2020-07-23 13:55:45,518 INFO  org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Requesting new slot [SlotRequestId{345fdb427a893b7fc3f4f040f93445d2}] and profile ResourceProfile{UNKNOWN} from resource manager.
> 2020-07-23 13:55:45,519 INFO  org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Requesting new slot [SlotRequestId{e559485ea7b0b7e17367816882538d90}] and profile ResourceProfile{UNKNOWN} from resource manager.
> 2020-07-23 13:55:45,519 INFO  org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - Request slot with profile ResourceProfile{UNKNOWN} for job 99a030d0e3f428490a501c0132f27a56 with allocation id b78837a29b4032924ac25be70ed21a3c.
>
> 2020-07-23 13:58:18,037 WARN  org.apache.flink.runtime.taskmanager.TaskManagerLocation [] - No hostname could be resolved for the IP address 10.47.96.2, using IP address as host name. Local input split assignment (such as for HDFS files) may be impacted.
> 2020-07-23 13:58:22,192 WARN  org.apache.flink.runtime.taskmanager.TaskManagerLocation [] - No hostname could be resolved for the IP address 10.34.64.14, using IP address as host name. Local input split assignment (such as for HDFS files) may be impacted.
> 2020-07-23 13:58:22,358 WARN  org.apache.flink.runtime.taskmanager.TaskManagerLocation [] - No hostname could be resolved for the IP address 10.34.128.9, using IP address as host name. Local input split assignment (such as for HDFS files) may be impacted.
> 2020-07-23 13:58:24,562 WARN  org.apache.flink.runtime.taskmanager.TaskManagerLocation [] - No hostname could be resolved for the IP address 10.32.160.6, using IP address as host name. Local input split assignment (such as for HDFS files) may be impacted.
> 2020-07-23 13:58:25,487 WARN  org.apache.flink.runtime.taskmanager.TaskManagerLocation [] - No hostname could be resolved for the IP address 10.38.64.7, using IP address as host name. Local input split assignment (such as for HDFS files) may be impacted.
> 2020-07-23 13:58:27,636 WARN  org.apache.flink.runtime.taskmanager.TaskManagerLocation [] - No hostname could be resolved for the IP address 10.42.160.6, using IP address as host name. Local input split assignment (such as for HDFS files) may be impacted.
> 2020-07-23 13:58:27,767 WARN  org.apache.flink.runtime.taskmanager.TaskManagerLocation [] - No hostname could be resolved for the IP address 10.43.64.12, using IP address as host name. Local input split assignment (such as for HDFS files) may be impacted.
> 2020-07-23 13:58:29,651 INFO  org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - The heartbeat of JobManager with id 456a18b6c404cb11a359718e16de1c6b timed out.
> 2020-07-23 13:58:29,651 INFO  org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - Disconnect job manager 00000000000000000000000000000000@akka.tcp://flink@flink-jobmanager:6123/user/rpc/jobmanager_2 for job 99a030d0e3f428490a501c0132f27a56 from the resource manager.
> 2020-07-23 13:58:29,854 WARN  org.apache.flink.runtime.taskmanager.TaskManagerLocation [] - No hostname could be resolved for the IP address 10.39.0.8, using IP address as host name. Local input split assignment (such as for HDFS files) may be impacted.
> 2020-07-23 13:58:33,623 WARN  org.apache.flink.runtime.taskmanager.TaskManagerLocation [] - No hostname could be resolved for the IP address 10.35.0.10, using IP address as host name. Local input split assignment (such as for HDFS files) may be impacted.
> 2020-07-23 13:58:35,756 WARN  org.apache.flink.runtime.taskmanager.TaskManagerLocation [] - No hostname could be resolved for the IP address 10.36.32.8, using IP address as host name. Local input split assignment (such as for HDFS files) may be impacted.
> 2020-07-23 13:58:36,694 WARN  org.apache.flink.runtime.taskmanager.TaskManagerLocation [] - No hostname could be resolved for the IP address 10.42.128.6, using IP address as host name. Local input split assignment (such as for HDFS files) may be impacted.
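[Editor's note: the "No hostname could be resolved" warning is emitted when Flink reverse-resolves each TaskManager pod IP and the cluster DNS has no PTR record for it, so the code falls back to the raw IP. If the resolver is merely slow rather than failing fast, each lookup can block the JobManager's RPC threads, which is consistent with the heartbeat timeouts in the log above. A minimal Python sketch of the same lookup-with-fallback behavior, useful for testing reverse DNS from inside a pod; the function name is illustrative, not Flink's API:]

```python
import socket

def resolve_host(ip: str) -> str:
    """Reverse-resolve an IP to a hostname, falling back to the IP itself
    when no PTR record exists -- the same fallback the warning describes."""
    try:
        hostname, _aliases, _addrs = socket.gethostbyaddr(ip)
        return hostname
    except OSError:
        # No reverse record (or resolver failure): keep the IP as host name.
        return ip

# 203.0.113.1 is a reserved TEST-NET-3 address, so the reverse lookup
# fails and the function falls back to the IP string.
print(resolve_host("203.0.113.1"))
```

[If this call is slow (seconds per IP) inside a pod, the problem is the cluster's DNS configuration rather than Flink itself.]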
> 2020-07-23 14:01:17,814 INFO  org.apache.flink.runtime.jobmaster.JobMaster [] - Close ResourceManager connection 83b1ff14900abfd54418e7fa3efb3f8a: The heartbeat of JobManager with id 456a18b6c404cb11a359718e16de1c6b timed out..
> 2020-07-23 14:01:17,815 INFO  org.apache.flink.runtime.jobmaster.JobMaster [] - Connecting to ResourceManager akka.tcp://flink@flink-jobmanager:6123/user/rpc/resourcemanager_*(00000000000000000000000000000000)
> 2020-07-23 14:01:17,816 INFO  org.apache.flink.runtime.jobmaster.JobMaster [] - Resolved ResourceManager address, beginning registration
> 2020-07-23 14:01:17,816 INFO  org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - Registering job manager 00000000000000000000000000000000@akka.tcp://flink@flink-jobmanager:6123/user/rpc/jobmanager_2 for job 99a030d0e3f428490a501c0132f27a56.
> 2020-07-23 14:01:17,836 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph [] - Source: host_relation -> Timestamps/Watermarks -> Map (1/1) (302ca9640e2d209a543d843f2996ccd2) switched from SCHEDULED to FAILED on not deployed.
> org.apache.flink.runtime.jobmanager.scheduler.NoResourceAvailableException: Could not allocate the required slot within slot request timeout. Please make sure that the cluster has enough resources.
>     at org.apache.flink.runtime.scheduler.DefaultScheduler.maybeWrapWithNoResourceAvailableException(DefaultScheduler.java:441) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
>     at org.apache.flink.runtime.scheduler.DefaultScheduler.lambda$assignResourceOrHandleError$6(DefaultScheduler.java:422) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
>     at java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:836) ~[?:1.8.0_242]
>     at java.util.concurrent.CompletableFuture$UniHandle.tryFire(CompletableFuture.java:811) ~[?:1.8.0_242]
>     at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488) ~[?:1.8.0_242]
>     at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1990) ~[?:1.8.0_242]
>     at org.apache.flink.runtime.jobmaster.slotpool.SchedulerImpl.lambda$internalAllocateSlot$0(SchedulerImpl.java:168) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
>     at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:774) ~[?:1.8.0_242]
>     at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:750) ~[?:1.8.0_242]
>     at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488) ~[?:1.8.0_242]
>     at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1990) ~[?:1.8.0_242]
>     at org.apache.flink.runtime.jobmaster.slotpool.SlotSharingManager$SingleTaskSlot.release(SlotSharingManager.java:726) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
>     at org.apache.flink.runtime.jobmaster.slotpool.SlotSharingManager$MultiTaskSlot.release(SlotSharingManager.java:537) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
>     at org.apache.flink.runtime.jobmaster.slotpool.SlotSharingManager$MultiTaskSlot.lambda$new$0(SlotSharingManager.java:432) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
>     at java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:836) ~[?:1.8.0_242]
>     at java.util.concurrent.CompletableFuture$UniHandle.tryFire(CompletableFuture.java:811) ~[?:1.8.0_242]
>     at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488) ~[?:1.8.0_242]
>     at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1990) ~[?:1.8.0_242]
>     at org.apache.flink.runtime.concurrent.FutureUtils.lambda$forwardTo$21(FutureUtils.java:1120) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
>     at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:774) ~[?:1.8.0_242]
>     at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:750) ~[?:1.8.0_242]
>     at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488) ~[?:1.8.0_242]
>     at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1990) ~[?:1.8.0_242]
>     at org.apache.flink.runtime.concurrent.FutureUtils$Timeout.run(FutureUtils.java:1036) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
>     at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRunAsync(AkkaRpcActor.java:402) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
>     at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:195) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
>     at org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.handleRpcMessage(FencedAkkaRpcActor.java:74) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
>     at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleMessage(AkkaRpcActor.java:152) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
>     at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:26) [flink-dist_2.11-1.11.1.jar:1.11.1]
>     at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:21) [flink-dist_2.11-1.11.1.jar:1.11.1]
>     at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123) [flink-dist_2.11-1.11.1.jar:1.11.1]
>     at akka.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:21) [flink-dist_2.11-1.11.1.jar:1.11.1]
>     at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:170) [flink-dist_2.11-1.11.1.jar:1.11.1]
>     at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171) [flink-dist_2.11-1.11.1.jar:1.11.1]
>     at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171) [flink-dist_2.11-1.11.1.jar:1.11.1]
>     at akka.actor.Actor$class.aroundReceive(Actor.scala:517) [flink-dist_2.11-1.11.1.jar:1.11.1]
>     at akka.actor.AbstractActor.aroundReceive(AbstractActor.scala:225) [flink-dist_2.11-1.11.1.jar:1.11.1]
>     at akka.actor.ActorCell.receiveMessage(ActorCell.scala:592) [flink-dist_2.11-1.11.1.jar:1.11.1]
>     at akka.actor.ActorCell.invoke(ActorCell.scala:561) [flink-dist_2.11-1.11.1.jar:1.11.1]
>     at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:258) [flink-dist_2.11-1.11.1.jar:1.11.1]
>     at akka.dispatch.Mailbox.run(Mailbox.scala:225) [flink-dist_2.11-1.11.1.jar:1.11.1]
>     at akka.dispatch.Mailbox.exec(Mailbox.scala:235) [flink-dist_2.11-1.11.1.jar:1.11.1]
>     at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) [flink-dist_2.11-1.11.1.jar:1.11.1]
>     at akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) [flink-dist_2.11-1.11.1.jar:1.11.1]
>     at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) [flink-dist_2.11-1.11.1.jar:1.11.1]
>     at akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) [flink-dist_2.11-1.11.1.jar:1.11.1]
> Caused by: java.util.concurrent.CompletionException: java.util.concurrent.TimeoutException
>     at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292) ~[?:1.8.0_242]
>     at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:308) ~[?:1.8.0_242]
>     at java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:607) ~[?:1.8.0_242]
>     at java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:591) ~[?:1.8.0_242]
>     ... 25 more
> Caused by: java.util.concurrent.TimeoutException
>     ... 23 more
> 2020-07-23 14:01:17,848 INFO  org.apache.flink.runtime.executiongraph.failover.flip1.RestartPipelinedRegionFailoverStrategy [] - Calculating tasks to restart to recover the failed task cbc357ccb763df2852fee8c4fc7d55f2_0.
> 2020-07-23 14:01:17,910 INFO  org.apache.flink.runtime.executiongraph.failover.flip1.RestartPipelinedRegionFailoverStrategy [] - 902 tasks should be restarted to recover the failed task cbc357ccb763df2852fee8c4fc7d55f2_0.
> 2020-07-23 14:01:17,913 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph [] - Job JobTest (99a030d0e3f428490a501c0132f27a56) switched from state RUNNING to FAILING.
> org.apache.flink.runtime.JobException: Recovery is suppressed by NoRestartBackoffTimeStrategy
>     at org.apache.flink.runtime.executiongraph.failover.flip1.ExecutionFailureHandler.handleFailure(ExecutionFailureHandler.java:116) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
>     at org.apache.flink.runtime.executiongraph.failover.flip1.ExecutionFailureHandler.getFailureHandlingResult(ExecutionFailureHandler.java:78) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
>     at org.apache.flink.runtime.scheduler.DefaultScheduler.handleTaskFailure(DefaultScheduler.java:192) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
>     at org.apache.flink.runtime.scheduler.DefaultScheduler.maybeHandleTaskFailure(DefaultScheduler.java:185) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
>     at org.apache.flink.runtime.scheduler.DefaultScheduler.updateTaskExecutionStateInternal(DefaultScheduler.java:179) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
>     at org.apache.flink.runtime.scheduler.SchedulerBase.updateTaskExecutionState(SchedulerBase.java:503) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
>     at org.apache.flink.runtime.scheduler.UpdateSchedulerNgOnInternalFailuresListener.notifyTaskFailure(UpdateSchedulerNgOnInternalFailuresListener.java:49) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
>     at org.apache.flink.runtime.executiongraph.ExecutionGraph.notifySchedulerNgAboutInternalTaskFailure(ExecutionGraph.java:1710) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
>     at org.apache.flink.runtime.executiongraph.Execution.processFail(Execution.java:1287) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
>     at org.apache.flink.runtime.executiongraph.Execution.processFail(Execution.java:1255) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
>     at org.apache.flink.runtime.executiongraph.Execution.markFailed(Execution.java:1086) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
>     at org.apache.flink.runtime.executiongraph.ExecutionVertex.markFailed(ExecutionVertex.java:748) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
>     at org.apache.flink.runtime.scheduler.DefaultExecutionVertexOperations.markFailed(DefaultExecutionVertexOperations.java:41) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
>     at org.apache.flink.runtime.scheduler.DefaultScheduler.handleTaskDeploymentFailure(DefaultScheduler.java:435) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
>     at org.apache.flink.runtime.scheduler.DefaultScheduler.lambda$assignResourceOrHandleError$6(DefaultScheduler.java:422) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
>     at java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:836) ~[?:1.8.0_242]
>     at java.util.concurrent.CompletableFuture$UniHandle.tryFire(CompletableFuture.java:811) ~[?:1.8.0_242]
>     at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488) ~[?:1.8.0_242]
>     at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1990) ~[?:1.8.0_242]
>     at org.apache.flink.runtime.jobmaster.slotpool.SchedulerImpl.lambda$internalAllocateSlot$0(SchedulerImpl.java:168) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
>     at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:774) ~[?:1.8.0_242]
>     at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:750) ~[?:1.8.0_242]
>     at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488) ~[?:1.8.0_242]
>     at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1990) ~[?:1.8.0_242]
>     at org.apache.flink.runtime.jobmaster.slotpool.SlotSharingManager$SingleTaskSlot.release(SlotSharingManager.java:726) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
>     at org.apache.flink.runtime.jobmaster.slotpool.SlotSharingManager$MultiTaskSlot.release(SlotSharingManager.java:537) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
>     at org.apache.flink.runtime.jobmaster.slotpool.SlotSharingManager$MultiTaskSlot.lambda$new$0(SlotSharingManager.java:432) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
>     at java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:836) ~[?:1.8.0_242]
>     at java.util.concurrent.CompletableFuture$UniHandle.tryFire(CompletableFuture.java:811) ~[?:1.8.0_242]
>     at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488) ~[?:1.8.0_242]
>     at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1990) ~[?:1.8.0_242]
>     at org.apache.flink.runtime.concurrent.FutureUtils.lambda$forwardTo$21(FutureUtils.java:1120) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
>     at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:774) ~[?:1.8.0_242]
>     at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:750) ~[?:1.8.0_242]
>     at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488) ~[?:1.8.0_242]
>     at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1990) ~[?:1.8.0_242]
>     at org.apache.flink.runtime.concurrent.FutureUtils$Timeout.run(FutureUtils.java:1036) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
>     at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRunAsync(AkkaRpcActor.java:402) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
>     at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:195) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
>     at org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.handleRpcMessage(FencedAkkaRpcActor.java:74) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
>     at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleMessage(AkkaRpcActor.java:152) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
>     at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:26) [flink-dist_2.11-1.11.1.jar:1.11.1]
>     at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:21) [flink-dist_2.11-1.11.1.jar:1.11.1]
>     at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123) [flink-dist_2.11-1.11.1.jar:1.11.1]
>     at akka.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:21) [flink-dist_2.11-1.11.1.jar:1.11.1]
>     at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:170) [flink-dist_2.11-1.11.1.jar:1.11.1]
>     at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171) [flink-dist_2.11-1.11.1.jar:1.11.1]
>     at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171) [flink-dist_2.11-1.11.1.jar:1.11.1]
>     at akka.actor.Actor$class.aroundReceive(Actor.scala:517) [flink-dist_2.11-1.11.1.jar:1.11.1]
>     at akka.actor.AbstractActor.aroundReceive(AbstractActor.scala:225) [flink-dist_2.11-1.11.1.jar:1.11.1]
>     at akka.actor.ActorCell.receiveMessage(ActorCell.scala:592) [flink-dist_2.11-1.11.1.jar:1.11.1]
>     at akka.actor.ActorCell.invoke(ActorCell.scala:561) [flink-dist_2.11-1.11.1.jar:1.11.1]
>     at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:258) [flink-dist_2.11-1.11.1.jar:1.11.1]
>     at akka.dispatch.Mailbox.run(Mailbox.scala:225) [flink-dist_2.11-1.11.1.jar:1.11.1]
>     at akka.dispatch.Mailbox.exec(Mailbox.scala:235) [flink-dist_2.11-1.11.1.jar:1.11.1]
>     at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) [flink-dist_2.11-1.11.1.jar:1.11.1]
>     at akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) [flink-dist_2.11-1.11.1.jar:1.11.1]
>     at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) [flink-dist_2.11-1.11.1.jar:1.11.1]
>     at akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) [flink-dist_2.11-1.11.1.jar:1.11.1]
> Caused by: org.apache.flink.runtime.jobmanager.scheduler.NoResourceAvailableException: Could not allocate the required slot within slot request timeout. Please make sure that the cluster has enough resources.
>     at org.apache.flink.runtime.scheduler.DefaultScheduler.maybeWrapWithNoResourceAvailableException(DefaultScheduler.java:441) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
>     ... 45 more
> Caused by: java.util.concurrent.CompletionException: java.util.concurrent.TimeoutException
>     at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292) ~[?:1.8.0_242]
>     at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:308) ~[?:1.8.0_242]
>     at java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:607) ~[?:1.8.0_242]
>     at java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:591) ~[?:1.8.0_242]
>     ... 25 more
> Caused by: java.util.concurrent.TimeoutException
>     ... 23 more
> 2020-07-23 14:01:18,109 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph [] - Discarding the results produced by task execution 1809eb912d69854f2babedeaf879df6a.
> 2020-07-23 14:01:18,110 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph [] - Job JobTest (99a030d0e3f428490a501c0132f27a56) switched from state FAILING to FAILED.
> org.apache.flink.runtime.JobException: Recovery is suppressed by NoRestartBackoffTimeStrategy
>     at org.apache.flink.runtime.executiongraph.failover.flip1.ExecutionFailureHandler.handleFailure(ExecutionFailureHandler.java:116) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
>     at org.apache.flink.runtime.executiongraph.failover.flip1.ExecutionFailureHandler.getFailureHandlingResult(ExecutionFailureHandler.java:78) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
>     at org.apache.flink.runtime.scheduler.DefaultScheduler.handleTaskFailure(DefaultScheduler.java:192) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
>     at org.apache.flink.runtime.scheduler.DefaultScheduler.maybeHandleTaskFailure(DefaultScheduler.java:185) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
>     at org.apache.flink.runtime.scheduler.DefaultScheduler.updateTaskExecutionStateInternal(DefaultScheduler.java:179) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
>     at org.apache.flink.runtime.scheduler.SchedulerBase.updateTaskExecutionState(SchedulerBase.java:503) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
>     at org.apache.flink.runtime.scheduler.UpdateSchedulerNgOnInternalFailuresListener.notifyTaskFailure(UpdateSchedulerNgOnInternalFailuresListener.java:49) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
>     at org.apache.flink.runtime.executiongraph.ExecutionGraph.notifySchedulerNgAboutInternalTaskFailure(ExecutionGraph.java:1710) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
>     at org.apache.flink.runtime.executiongraph.Execution.processFail(Execution.java:1287) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
>     at org.apache.flink.runtime.executiongraph.Execution.processFail(Execution.java:1255) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
>     at org.apache.flink.runtime.executiongraph.Execution.markFailed(Execution.java:1086) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
>     at org.apache.flink.runtime.executiongraph.ExecutionVertex.markFailed(ExecutionVertex.java:748) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
>     at org.apache.flink.runtime.scheduler.DefaultExecutionVertexOperations.markFailed(DefaultExecutionVertexOperations.java:41) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
>     at org.apache.flink.runtime.scheduler.DefaultScheduler.handleTaskDeploymentFailure(DefaultScheduler.java:435) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
>     at org.apache.flink.runtime.scheduler.DefaultScheduler.lambda$assignResourceOrHandleError$6(DefaultScheduler.java:422) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
>     at java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:836) ~[?:1.8.0_242]
>     at java.util.concurrent.CompletableFuture$UniHandle.tryFire(CompletableFuture.java:811) ~[?:1.8.0_242]
>     at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488) ~[?:1.8.0_242]
>     at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1990) ~[?:1.8.0_242]
>     at org.apache.flink.runtime.jobmaster.slotpool.SchedulerImpl.lambda$internalAllocateSlot$0(SchedulerImpl.java:168) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
>     at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:774) ~[?:1.8.0_242]
>     at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:750) ~[?:1.8.0_242]
>     at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488) ~[?:1.8.0_242]
>     at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1990) ~[?:1.8.0_242]
>     at org.apache.flink.runtime.jobmaster.slotpool.SlotSharingManager$SingleTaskSlot.release(SlotSharingManager.java:726) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
>     at org.apache.flink.runtime.jobmaster.slotpool.SlotSharingManager$MultiTaskSlot.release(SlotSharingManager.java:537) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
>     at org.apache.flink.runtime.jobmaster.slotpool.SlotSharingManager$MultiTaskSlot.lambda$new$0(SlotSharingManager.java:432) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
>     at java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:836) ~[?:1.8.0_242]
>     at java.util.concurrent.CompletableFuture$UniHandle.tryFire(CompletableFuture.java:811) ~[?:1.8.0_242]
>     at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488) ~[?:1.8.0_242]
>     at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1990) ~[?:1.8.0_242]
>     at org.apache.flink.runtime.concurrent.FutureUtils.lambda$forwardTo$21(FutureUtils.java:1120) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
>     at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:774) ~[?:1.8.0_242]
>     at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:750) ~[?:1.8.0_242]
>     at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488) ~[?:1.8.0_242]
>     at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1990) ~[?:1.8.0_242]
>     at org.apache.flink.runtime.concurrent.FutureUtils$Timeout.run(FutureUtils.java:1036) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
>     at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRunAsync(AkkaRpcActor.java:402) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
>     at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:195) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
>     at org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.handleRpcMessage(FencedAkkaRpcActor.java:74) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
>     at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleMessage(AkkaRpcActor.java:152) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
>     at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:26) [flink-dist_2.11-1.11.1.jar:1.11.1]
>     at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:21) [flink-dist_2.11-1.11.1.jar:1.11.1]
>     at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123) [flink-dist_2.11-1.11.1.jar:1.11.1]
>     at akka.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:21) [flink-dist_2.11-1.11.1.jar:1.11.1]
>     at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:170) [flink-dist_2.11-1.11.1.jar:1.11.1]
>     at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171) [flink-dist_2.11-1.11.1.jar:1.11.1]
>     at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171) [flink-dist_2.11-1.11.1.jar:1.11.1]
>     at akka.actor.Actor$class.aroundReceive(Actor.scala:517) [flink-dist_2.11-1.11.1.jar:1.11.1]
>     at akka.actor.AbstractActor.aroundReceive(AbstractActor.scala:225) [flink-dist_2.11-1.11.1.jar:1.11.1]
>     at akka.actor.ActorCell.receiveMessage(ActorCell.scala:592) [flink-dist_2.11-1.11.1.jar:1.11.1]
>     at akka.actor.ActorCell.invoke(ActorCell.scala:561) [flink-dist_2.11-1.11.1.jar:1.11.1]
>     at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:258) [flink-dist_2.11-1.11.1.jar:1.11.1]
>     at akka.dispatch.Mailbox.run(Mailbox.scala:225) [flink-dist_2.11-1.11.1.jar:1.11.1]
>     at akka.dispatch.Mailbox.exec(Mailbox.scala:235) [flink-dist_2.11-1.11.1.jar:1.11.1]
>     at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) [flink-dist_2.11-1.11.1.jar:1.11.1]
>     at akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) [flink-dist_2.11-1.11.1.jar:1.11.1]
>     at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) [flink-dist_2.11-1.11.1.jar:1.11.1]
>     at akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) [flink-dist_2.11-1.11.1.jar:1.11.1]
> Caused by: org.apache.flink.runtime.jobmanager.scheduler.NoResourceAvailableException: Could not allocate the required slot within slot request timeout. Please make sure that the cluster has enough resources.
>     at org.apache.flink.runtime.scheduler.DefaultScheduler.maybeWrapWithNoResourceAvailableException(DefaultScheduler.java:441) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
>     ... 45 more
> Caused by: java.util.concurrent.CompletionException: java.util.concurrent.TimeoutException
>     at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292) ~[?:1.8.0_242]
>     at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:308) ~[?:1.8.0_242]
>     at java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:607) ~[?:1.8.0_242]
>     at java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:591) ~[?:1.8.0_242]
>     ... 25 more
> Caused by: java.util.concurrent.TimeoutException
>     ... 23 more
> 2020-07-23 14:01:18,114 INFO  org.apache.flink.runtime.checkpoint.CheckpointCoordinator [] - Stopping checkpoint coordinator for job 99a030d0e3f428490a501c0132f27a56.
> 2020-07-23 14:01:18,117 INFO  org.apache.flink.runtime.checkpoint.StandaloneCompletedCheckpointStore [] - Shutting down
> 2020-07-23 14:01:18,118 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph [] - Discarding the results produced by task execution 302ca9640e2d209a543d843f2996ccd2.
> 2020-07-23 14:01:18,120 INFO  org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Pending slot request [SlotRequestId{15fd2a9565c2b080748c1d1592b1cbbc}] timed out.
> 2020-07-23 14:01:18,120 INFO  org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Pending slot request [SlotRequestId{8cd72cc16f0e319d915a9a096a1096d7}] timed out.
> 2020-07-23 14:01:18,120 INFO  org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Pending slot request [SlotRequestId{e7e422409acebdb385014a9634af6a90}] timed out.
> 2020-07-23 14:01:18,121 INFO  org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Pending slot request [SlotRequestId{cef1af73546ca1fc27ca7a3322e9e815}] timed out.
> 2020-07-23 14:01:18,121 INFO  org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Pending slot request [SlotRequestId{108fe0b3086567ad79275eccef2fdaf8}] timed out.
> 2020-07-23 14:01:18,121 INFO  org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Pending slot request [SlotRequestId{265e67985eab7a6dc08024e53bf2708d}] timed out.
> 2020-07-23 14:01:18,122 INFO  org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Pending slot request [SlotRequestId{7087497a17c441f1a1d6fefcbc7cd0ea}] timed out.
> 2020-07-23 14:01:18,122 INFO  org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Pending slot request [
>
> 2020-07-23 14:01:18,151 INFO  org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - Registering job manager 00000000000000000000000000000000@akka.tcp://flink@flink-jobmanager:6123/user/rpc/jobmanager_2 for job 99a030d0e3f428490a501c0132f27a56.
> 2020-07-23 14:01:18,157 INFO  org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - Registered job manager 00000000000000000000000000000000@akka.tcp://flink@flink-jobmanager:6123/user/rpc/jobmanager_2 for job 99a030d0e3f428490a501c0132f27a56.
> 2020-07-23 14:01:18,157 INFO  org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - Registered job manager 00000000000000000000000000000000@akka.tcp://flink@flink-jobmanager:6123/user/rpc/jobmanager_2 for job 99a030d0e3f428490a501c0132f27a56.
> 2020-07-23 14:01:18,157 INFO  org.apache.flink.runtime.dispatcher.StandaloneDispatcher [] - Job 99a030d0e3f428490a501c0132f27a56 reached globally terminal state FAILED.
> 2020-07-23 14:01:18,162 INFO  org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - Registered job manager 00000000000000000000000000000000@akka.tcp://flink@flink-jobmanager:6123/user/rpc/jobmanager_2 for job 99a030d0e3f428490a501c0132f27a56.
> 2020-07-23 14:01:18,162 INFO  org.apache.flink.runtime.jobmaster.JobMaster [] - JobManager successfully registered at ResourceManager, leader id: 00000000000000000000000000000000.
> 2020-07-23 14:01:18,225 INFO org.apache.flink.runtime.jobmaster.JobMaster [] - Stopping the JobMaster for job JobTest(99a030d0e3f428490a501c0132f27a56).
> 2020-07-23 14:01:18,381 INFO org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Suspending SlotPool.
> 2020-07-23 14:01:18,382 INFO org.apache.flink.runtime.jobmaster.JobMaster [] - Close ResourceManager connection 83b1ff14900abfd54418e7fa3efb3f8a: JobManager is shutting down..
> 2020-07-23 14:01:18,382 INFO org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Stopping SlotPool.
> 2020-07-23 14:01:18,382 INFO org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - Disconnect job manager 00000000000000000000000000000000@akka.tcp://flink@flink-jobmanager:6123/user/rpc/jobmanager_2 for job 99a030d0e3f428490a501c0132f27a56 from the resource manager.
>
> On 07/23/2020 13:26, Yang Wang <[hidden email]> wrote:
> I am glad your problem is resolved, but I do not think the root cause is adding taskmanager-query-state-service.yaml. On my side the cluster works fine without creating that service, and nslookup {tm_ip_address} reverse-resolves to a hostname normally.
>
> Note that this is not resolving a hostname, but verifying by reverse-resolving the IP address.
>
> To answer your two questions:
> 1. It is not required. I verified that the cluster runs jobs normally without creating it, whether the rest service is exposed as ClusterIP, NodePort, or LoadBalancer.
> 2. If taskmanager.bind-host is not configured, the two JIRAs [FLINK-15911][FLINK-15154] do not affect the address the TM uses when registering with the RM.
>
> If you want to find the root cause, you would probably need to provide the complete JM/TM logs so they can be analyzed.
>
> Best,
> Yang
>
> SmileSmile <[hidden email]> wrote on Thu, Jul 23, 2020 at 11:30 AM:
> >
> > Hi Yang Wang
> >
> > I just tested in our test environment: the TaskManager IPs cannot be reverse-resolved with nslookup, while the JM's can. The difference between the two is whether a service exists.
> >
> > Workaround: I added taskmanager-query-state-service.yaml to the cluster (an optional service according to the docs [1]). The log then stops flooding with "No hostname could be resolved for ip address"; after changing NodePort to ClusterIP, the job can be submitted successfully and the timeout no longer occurs, so the problem is resolved.
> >
> > 1. Given the above, is this file actually mandatory?
> >
> > 2. Among the 1.11 changes I noticed [FLINK-15911][FLINK-15154], which support configuring the locally bound network interface separately from the externally accessible address and port. Is it this change that makes the JM reverse-resolve a service from the IP reported by the TM?
> >
> > Best!
> >
> > [1] https://ci.apache.org/projects/flink/flink-docs-release-1.11/ops/deployment/kubernetes.html
> >
> > On 07/23/2020 10:11, Yang Wang <[hidden email]> wrote:
> > What I meant is: while the Flink job is running, start a busybox pod in the cluster with the command below, and run nslookup {ip_address} inside it to see whether the address reverse-resolves normally. If it does not, it should be a coredns problem.
> >
> > kubectl run -i -t busybox --image=busybox --restart=Never
> >
> > You also need to confirm that the cluster's coredns pods are healthy; they are usually deployed in the kube-system namespace.
> >
> > Best,
> > Yang
> >
> > SmileSmile <[hidden email]> wrote on Wed, Jul 22, 2020 at 7:57 PM:
> > >
> > > Hi, Yang Wang!
> > >
> > > Very glad to receive your reply; it helped a lot and pointed me toward the problem. Let me add some information to help narrow down the root cause.
> > >
> > > Where the JM reports "No hostname could be resolved for ip address xxxxx", the reported IP is the pod-network IP that k8s assigned to the Flink pod, not the host machine's IP. Where could the problem be?
> > >
> > > Best!
> > > On 07/22/2020 18:18, Yang Wang <[hidden email]> wrote:
> > > If your log keeps flooding with "No hostname could be resolved for the IP address", the cluster's coredns probably has a problem: the reverse lookup from IP address to hostname fails. You can start a busybox pod to verify whether that IP really cannot be resolved; it may well be a coredns issue.
> > >
> > > Best,
> > > Yang
> > >
> > > Congxian Qiu <[hidden email]> wrote on Tue, Jul 21, 2020 at 7:29 PM:
> > > > Hi
> > > > I am not sure whether the complete pod logs are visible in a k8s environment, similar to the NM logs on Yarn. If they are, try checking this pod's complete log for anything interesting.
> > > > Best,
> > > > Congxian
> > > >
> > > > SmileSmile <[hidden email]> wrote on Tue, Jul 21, 2020 at 3:19 PM:
> > > > > Hi, Congxian
> > > > >
> > > > > Since this is a test environment, HA is not configured. All I can see so far is the JM flooding large amounts of "no hostname could be resolved", the JM losing contact, and the job submission failing. Setting the JM memory to 10g (jobmanager.memory.process.size: 10240m) makes no difference.
> > > > >
> > > > > Rolling back to 1.10 in the same environment, the problem does not occur and the errors above are not logged.
> > > > >
> > > > > Are there any other troubleshooting ideas?
> > > > >
> > > > > Best!
> > > > >
> > > > > On 07/16/2020 13:17, Congxian Qiu wrote:
> > > > > Hi
> > > > > If there are no exceptions and GC looks normal, perhaps look at the pod's logs, and at the zk logs if HA is enabled. I once hit a similar symptom in a Yarn environment that turned out to have a different cause, which was found through the NM and zk logs.
> > > > >
> > > > > Best,
> > > > > Congxian
> > > > >
> > > > > SmileSmile <[hidden email]> wrote on Wed, Jul 15, 2020 at 5:20 PM:
> > > > > > Hi Roc
> > > > > >
> > > > > > This symptom is absent in 1.10.1 and only appears in 1.11. How would you suggest investigating it?
> > > > > >
> > > > > > Best!
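The failing reverse lookup that the busybox/nslookup check above verifies can also be reproduced in a few lines. Below is a minimal sketch in Python, for illustration only: Flink itself does this in Java inside TaskManagerLocation via InetAddress.getCanonicalHostName, and `hostname_or_ip` is a hypothetical helper name, not a Flink API. It mimics the "fall back to the IP" behaviour behind the WARN messages:

```python
import socket

def hostname_or_ip(ip: str) -> str:
    """Reverse-resolve an IP to a hostname; fall back to the IP itself,
    mirroring the 'using IP address as host name' WARN in the JM log."""
    try:
        hostname, _aliases, _addresses = socket.gethostbyaddr(ip)
        return hostname
    except OSError:
        # No PTR record for this address (as for a bare pod IP with no
        # matching Service): keep using the IP string as the host name.
        return ip

if __name__ == "__main__":
    # 203.0.113.7 is a TEST-NET-3 documentation address with no PTR record,
    # so on a typical resolver this falls back to the IP itself.
    print(hostname_or_ip("203.0.113.7"))
```

When every TM pod IP takes the fallback branch, the JM emits one WARN line per TM, which matches the flood seen in these logs.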
Hi, Yang Wang

Because the log was too long, I removed some repeated content. I did suspect a JM GC problem at first, but adjusting the JM memory to 10g made no difference.

Best

On 07/27/2020 11:36, Yang Wang wrote:
Looking at this job, the root cause of the failure is not "No hostname could be resolved"; the reason for that WARNING can be discussed separately (if it does not occur in 1.10). A locally started standalone cluster logs the same WARNING, and it does not affect normal use.

The failure cause is that the slot requests timed out after 5 minutes. In the log you provided, the span from 2020-07-23 13:55:45,519 to 2020-07-23 13:58:18,037 is blank; nothing was omitted there, was it? During that period the tasks should have started deploying. The log also shows the JM->RM heartbeat timing out, that is, communication between endpoints of the same process in the same pod timed out, so I suspect the JM was in full GC the whole time. Please confirm this.

Best,
Yang

SmileSmile <[hidden email]> wrote on Thu, Jul 23, 2020 at 2:43 PM:
> Hi Yang Wang
>
> First, my environment versions:
>
> kubernetes: 1.17.4, CNI: weave
>
> Points 1 to 3 below are my remaining questions; point 4 is the JM log.
>
> 1. Without taskmanager-query-state-service.yaml, the reverse lookup indeed fails:
>
> kubectl exec -it busybox2 -- /bin/sh
> / # nslookup 10.47.96.2
> Server:    10.96.0.10
> Address:   10.96.0.10:53
>
> ** server can't find 2.96.47.10.in-addr.arpa: NXDOMAIN
>
> 2. Flink 1.11 vs. Flink 1.10:
>
> On the "detail subtasks taskmanagers" line of the web UI, 1.11 shows 172-20-0-50 while 1.10 shows flink-taskmanager-7b5d6958b6-sfzlk:36459. What changed here? (This cluster currently runs both 1.10 and 1.11, and 1.10 runs fine. If coredns were broken, the 1.10 Flink should show the same symptom, shouldn't it?)
>
> 3. Does coredns need any special configuration?
>
> Resolving domain names inside containers works normally; only reverse resolution fails when there is no service. Is there anything that needs to be configured in coredns?
>
> 4. The JM log around the timeout:
>
> 2020-07-23 13:53:00,228 INFO org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - ResourceManager akka.tcp://flink@flink-jobmanager:6123/user/rpc/resourcemanager_0 was granted leadership with fencing token 00000000000000000000000000000000
> 2020-07-23 13:53:00,232 INFO org.apache.flink.runtime.rpc.akka.AkkaRpcService [] - Starting RPC endpoint for org.apache.flink.runtime.dispatcher.StandaloneDispatcher at akka://flink/user/rpc/dispatcher_1 .
> 2020-07-23 13:53:00,233 INFO org.apache.flink.runtime.resourcemanager.slotmanager.SlotManagerImpl [] - Starting the SlotManager.
> 2020-07-23 13:53:03,472 INFO > org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - > Registering TaskManager with ResourceID 1f9ae0cd95a28943a73be26323588696 > (akka.tcp://flink@10.34.128.9:6122/user/rpc/taskmanager_0) at > ResourceManager > 2020-07-23 13:53:03,777 INFO > org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - > Registering TaskManager with ResourceID cac09e751264e61615329c20713a84b4 > (akka.tcp://flink@10.32.160.6:6122/user/rpc/taskmanager_0) at > ResourceManager > 2020-07-23 13:53:03,787 INFO > org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - > Registering TaskManager with ResourceID 93c72d01d09f9ae427c5fc980ed4c1e4 > (akka.tcp://flink@10.39.0.8:6122/user/rpc/taskmanager_0) at > ResourceManager > 2020-07-23 13:53:04,044 INFO > org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - > Registering TaskManager with ResourceID 8adf2f8e81b77a16d5418a9e252c61e2 > (akka.tcp://flink@10.38.64.7:6122/user/rpc/taskmanager_0) at > ResourceManager > 2020-07-23 13:53:04,099 INFO > org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - > Registering TaskManager with ResourceID 23e9d2358f6eb76b9ae718d879d4f330 > (akka.tcp://flink@10.42.160.6:6122/user/rpc/taskmanager_0) at > ResourceManager > 2020-07-23 13:53:04,146 INFO > org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - > Registering TaskManager with ResourceID 092f8dee299e32df13db3111662b61f8 > (akka.tcp://flink@10.33.192.14:6122/user/rpc/taskmanager_0) at > ResourceManager > > > 2020-07-23 13:55:44,220 INFO > org.apache.flink.runtime.dispatcher.StandaloneDispatcher [] - Received > JobGraph submission 99a030d0e3f428490a501c0132f27a56 (JobTest). > 2020-07-23 13:55:44,222 INFO > org.apache.flink.runtime.dispatcher.StandaloneDispatcher [] - > Submitting job 99a030d0e3f428490a501c0132f27a56 (JobTest). 
> 2020-07-23 13:55:44,251 INFO > org.apache.flink.runtime.rpc.akka.AkkaRpcService [] - Starting > RPC endpoint for org.apache.flink.runtime.jobmaster.JobMaster at > akka://flink/user/rpc/jobmanager_2 . > 2020-07-23 13:55:44,260 INFO org.apache.flink.runtime.jobmaster.JobMaster > [] - Initializing job JobTest > (99a030d0e3f428490a501c0132f27a56). > 2020-07-23 13:55:44,278 INFO org.apache.flink.runtime.jobmaster.JobMaster > [] - Using restart back off time strategy > NoRestartBackoffTimeStrategy for JobTest (99a030d0e3f428490a501c0132f27a56). > 2020-07-23 13:55:44,319 INFO org.apache.flink.runtime.jobmaster.JobMaster > [] - Running initialization on master for job JobTest > (99a030d0e3f428490a501c0132f27a56). > 2020-07-23 13:55:44,319 INFO org.apache.flink.runtime.jobmaster.JobMaster > [] - Successfully ran initialization on master in 0 ms. > 2020-07-23 13:55:44,428 INFO > org.apache.flink.runtime.scheduler.adapter.DefaultExecutionTopology [] - > Built 1 pipelined regions in 25 ms > 2020-07-23 13:55:44,437 INFO org.apache.flink.runtime.jobmaster.JobMaster > [] - Loading state backend via factory > org.apache.flink.contrib.streaming.state.RocksDBStateBackendFactory > 2020-07-23 13:55:44,456 INFO > org.apache.flink.contrib.streaming.state.RocksDBStateBackend [] - Using > predefined options: DEFAULT. > 2020-07-23 13:55:44,457 INFO > org.apache.flink.contrib.streaming.state.RocksDBStateBackend [] - Using > default options factory: > DefaultConfigurableOptionsFactory{configuredOptions={}}. > 2020-07-23 13:55:44,466 WARN org.apache.flink.runtime.util.HadoopUtils > [] - Could not find Hadoop configuration via any of the > supported methods (Flink configuration, environment variables). > 2020-07-23 13:55:45,276 INFO org.apache.flink.runtime.jobmaster.JobMaster > [] - Using failover strategy > org.apache.flink.runtime.executiongraph.failover.flip1.RestartPipelinedRegionFailoverStrategy@72bd8533 > for JobTest (99a030d0e3f428490a501c0132f27a56). 
> 2020-07-23 13:55:45,280 INFO > org.apache.flink.runtime.jobmaster.JobManagerRunnerImpl [] - > JobManager runner for job JobTest (99a030d0e3f428490a501c0132f27a56) was > granted leadership with session id 00000000-0000-0000-0000-000000000000 at > akka.tcp://flink@flink-jobmanager:6123/user/rpc/jobmanager_2. > 2020-07-23 13:55:45,286 INFO org.apache.flink.runtime.jobmaster.JobMaster > [] - Starting scheduling with scheduling strategy > [org.apache.flink.runtime.scheduler.strategy.EagerSchedulingStrategy] > > > > 2020-07-23 13:55:45,436 INFO > org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Cannot > serve slot request, no ResourceManager connected. Adding as pending request > [SlotRequestId{e092b12b96b0a98bbf057e71b9705c23}] > 2020-07-23 13:55:45,436 INFO > org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Cannot > serve slot request, no ResourceManager connected. Adding as pending request > [SlotRequestId{4ad15f417716c9e07fca383990c0f52a}] > 2020-07-23 13:55:45,436 INFO > org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Cannot > serve slot request, no ResourceManager connected. Adding as pending request > [SlotRequestId{345fdb427a893b7fc3f4f040f93445d2}] > 2020-07-23 13:55:45,437 INFO > org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Cannot > serve slot request, no ResourceManager connected. Adding as pending request > [SlotRequestId{e559485ea7b0b7e17367816882538d90}] > 2020-07-23 13:55:45,437 INFO > org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Cannot > serve slot request, no ResourceManager connected. Adding as pending request > [SlotRequestId{7be8f6c1aedb27b04e7feae68078685c}] > 2020-07-23 13:55:45,437 INFO > org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Cannot > serve slot request, no ResourceManager connected. 
Adding as pending request > [SlotRequestId{582a86197884206652dff3aea2306bb3}] > 2020-07-23 13:55:45,437 INFO > org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Cannot > serve slot request, no ResourceManager connected. Adding as pending request > [SlotRequestId{0cc24260eda3af299a0b321feefaf2cb}] > 2020-07-23 13:55:45,437 INFO > org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Cannot > serve slot request, no ResourceManager connected. Adding as pending request > [SlotRequestId{240ca6f3d3b5ece6a98243ec8cadf616}] > 2020-07-23 13:55:45,438 INFO > org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Cannot > serve slot request, no ResourceManager connected. Adding as pending request > [SlotRequestId{c35033d598a517acc108424bb9f809fb}] > 2020-07-23 13:55:45,438 INFO > org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Cannot > serve slot request, no ResourceManager connected. Adding as pending request > [SlotRequestId{ad35013c3b532d4b4df1be62395ae0cf}] > 2020-07-23 13:55:45,438 INFO > org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Cannot > serve slot request, no ResourceManager connected. Adding as pending request > [SlotRequestId{c929bd5e8daf432d01fad1ece3daec1a}] > 2020-07-23 13:55:45,487 INFO org.apache.flink.runtime.jobmaster.JobMaster > [] - Connecting to ResourceManager > akka.tcp://flink@flink-jobmanager > :6123/user/rpc/resourcemanager_*(00000000000000000000000000000000) > 2020-07-23 13:55:45,492 INFO org.apache.flink.runtime.jobmaster.JobMaster > [] - Resolved ResourceManager address, beginning > registration > 2020-07-23 13:55:45,493 INFO > org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - > Registering job manager 00000000000000000000000000000000 > @akka.tcp://flink@flink-jobmanager:6123/user/rpc/jobmanager_2 for job > 99a030d0e3f428490a501c0132f27a56. 
> 2020-07-23 13:55:45,499 INFO > org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - > Registered job manager 00000000000000000000000000000000 > @akka.tcp://flink@flink-jobmanager:6123/user/rpc/jobmanager_2 for job > 99a030d0e3f428490a501c0132f27a56. > 2020-07-23 13:55:45,501 INFO org.apache.flink.runtime.jobmaster.JobMaster > [] - JobManager successfully registered at ResourceManager, > leader id: 00000000000000000000000000000000. > 2020-07-23 13:55:45,501 INFO > org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - > Requesting new slot [SlotRequestId{15fd2a9565c2b080748c1d1592b1cbbc}] and > profile ResourceProfile{UNKNOWN} from resource manager. > 2020-07-23 13:55:45,502 INFO > org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - > Request slot with profile ResourceProfile{UNKNOWN} for job > 99a030d0e3f428490a501c0132f27a56 with allocation id > d420d08bf2654d9ea76955c70db18b69. > 2020-07-23 13:55:45,502 INFO > org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - > Requesting new slot [SlotRequestId{8cd72cc16f0e319d915a9a096a1096d7}] and > profile ResourceProfile{UNKNOWN} from resource manager. > 2020-07-23 13:55:45,503 INFO > org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - > Requesting new slot [SlotRequestId{e7e422409acebdb385014a9634af6a90}] and > profile ResourceProfile{UNKNOWN} from resource manager. > 2020-07-23 13:55:45,503 INFO > org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - > Requesting new slot [SlotRequestId{cef1af73546ca1fc27ca7a3322e9e815}] and > profile ResourceProfile{UNKNOWN} from resource manager. > 2020-07-23 13:55:45,503 INFO > org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - > Requesting new slot [SlotRequestId{108fe0b3086567ad79275eccef2fdaf8}] and > profile ResourceProfile{UNKNOWN} from resource manager. 
> 2020-07-23 13:55:45,503 INFO > org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - > Requesting new slot [SlotRequestId{265e67985eab7a6dc08024e53bf2708d}] and > profile ResourceProfile{UNKNOWN} from resource manager. > 2020-07-23 13:55:45,503 INFO > org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - > Requesting new slot [SlotRequestId{7087497a17c441f1a1d6fefcbc7cd0ea}] and > profile ResourceProfile{UNKNOWN} from resource manager. > 2020-07-23 13:55:45,503 INFO > org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - > Requesting new slot [SlotRequestId{14ac08438e79c8db8d25d93b99d62725}] and > profile ResourceProfile{UNKNOWN} from resource manager. > > 2020-07-23 13:55:45,514 INFO > org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - > Request slot with profile ResourceProfile{UNKNOWN} for job > 99a030d0e3f428490a501c0132f27a56 with allocation id > fce526bbe3e1be91caa3e4b536b20e35. > 2020-07-23 13:55:45,514 INFO > org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - > Requesting new slot [SlotRequestId{40c7abbb12514c405323b0569fb21647}] and > profile ResourceProfile{UNKNOWN} from resource manager. > 2020-07-23 13:55:45,514 INFO > org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - > Requesting new slot [SlotRequestId{a4985a9647b65b30a571258b45c8f2ce}] and > profile ResourceProfile{UNKNOWN} from resource manager. > 2020-07-23 13:55:45,515 INFO > org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - > Requesting new slot [SlotRequestId{c52a6eb2fa58050e71e7903590019fd1}] and > profile ResourceProfile{UNKNOWN} from resource manager. > > 2020-07-23 13:55:45,517 INFO > org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - > Request slot with profile ResourceProfile{UNKNOWN} for job > 99a030d0e3f428490a501c0132f27a56 with allocation id > 18ac7ec802ebfcfed8c05ee9324a55a4. 
> > 2020-07-23 13:55:45,518 INFO > org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - > Request slot with profile ResourceProfile{UNKNOWN} for job > 99a030d0e3f428490a501c0132f27a56 with allocation id > 7ec76cbe689eb418b63599e90ade19be. > 2020-07-23 13:55:45,518 INFO > org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - > Requesting new slot [SlotRequestId{46d65692a8b5aad11b51f9a74a666a74}] and > profile ResourceProfile{UNKNOWN} from resource manager. > 2020-07-23 13:55:45,518 INFO > org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - > Requesting new slot [SlotRequestId{3670bb4f345eedf941cc18e477ba1e9d}] and > profile ResourceProfile{UNKNOWN} from resource manager. > 2020-07-23 13:55:45,518 INFO > org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - > Requesting new slot [SlotRequestId{4a12467d76b9e3df8bc3412c0be08e14}] and > profile ResourceProfile{UNKNOWN} from resource manager. > 2020-07-23 13:55:45,518 INFO > org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - > Requesting new slot [SlotRequestId{e092b12b96b0a98bbf057e71b9705c23}] and > profile ResourceProfile{UNKNOWN} from resource manager. > 2020-07-23 13:55:45,518 INFO > org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - > Requesting new slot [SlotRequestId{4ad15f417716c9e07fca383990c0f52a}] and > profile ResourceProfile{UNKNOWN} from resource manager. > 2020-07-23 13:55:45,518 INFO > org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - > Requesting new slot [SlotRequestId{345fdb427a893b7fc3f4f040f93445d2}] and > profile ResourceProfile{UNKNOWN} from resource manager. > 2020-07-23 13:55:45,519 INFO > org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - > Requesting new slot [SlotRequestId{e559485ea7b0b7e17367816882538d90}] and > profile ResourceProfile{UNKNOWN} from resource manager. 
> 2020-07-23 13:55:45,519 INFO > org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - > Request slot with profile ResourceProfile{UNKNOWN} for job > 99a030d0e3f428490a501c0132f27a56 with allocation id > b78837a29b4032924ac25be70ed21a3c. > > > 2020-07-23 13:58:18,037 WARN > org.apache.flink.runtime.taskmanager.TaskManagerLocation [] - No > hostname could be resolved for the IP address 10.47.96.2, using IP address > as host name. Local input split assignment (such as for HDFS files) may be > impacted. > 2020-07-23 13:58:22,192 WARN > org.apache.flink.runtime.taskmanager.TaskManagerLocation [] - No > hostname could be resolved for the IP address 10.34.64.14, using IP address > as host name. Local input split assignment (such as for HDFS files) may be > impacted. > 2020-07-23 13:58:22,358 WARN > org.apache.flink.runtime.taskmanager.TaskManagerLocation [] - No > hostname could be resolved for the IP address 10.34.128.9, using IP address > as host name. Local input split assignment (such as for HDFS files) may be > impacted. > 2020-07-23 13:58:24,562 WARN > org.apache.flink.runtime.taskmanager.TaskManagerLocation [] - No > hostname could be resolved for the IP address 10.32.160.6, using IP address > as host name. Local input split assignment (such as for HDFS files) may be > impacted. > 2020-07-23 13:58:25,487 WARN > org.apache.flink.runtime.taskmanager.TaskManagerLocation [] - No > hostname could be resolved for the IP address 10.38.64.7, using IP address > as host name. Local input split assignment (such as for HDFS files) may be > impacted. > 2020-07-23 13:58:27,636 WARN > org.apache.flink.runtime.taskmanager.TaskManagerLocation [] - No > hostname could be resolved for the IP address 10.42.160.6, using IP address > as host name. Local input split assignment (such as for HDFS files) may be > impacted. 
> 2020-07-23 13:58:27,767 WARN > org.apache.flink.runtime.taskmanager.TaskManagerLocation [] - No > hostname could be resolved for the IP address 10.43.64.12, using IP address > as host name. Local input split assignment (such as for HDFS files) may be > impacted. > 2020-07-23 13:58:29,651 INFO > org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - > The heartbeat of JobManager with id 456a18b6c404cb11a359718e16de1c6b timed > out. > 2020-07-23 13:58:29,651 INFO > org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - > Disconnect job manager 00000000000000000000000000000000 > @akka.tcp://flink@flink-jobmanager:6123/user/rpc/jobmanager_2 for job > 99a030d0e3f428490a501c0132f27a56 from the resource manager. > 2020-07-23 13:58:29,854 WARN > org.apache.flink.runtime.taskmanager.TaskManagerLocation [] - No > hostname could be resolved for the IP address 10.39.0.8, using IP address > as host name. Local input split assignment (such as for HDFS files) may be > impacted. > 2020-07-23 13:58:33,623 WARN > org.apache.flink.runtime.taskmanager.TaskManagerLocation [] - No > hostname could be resolved for the IP address 10.35.0.10, using IP address > as host name. Local input split assignment (such as for HDFS files) may be > impacted. > 2020-07-23 13:58:35,756 WARN > org.apache.flink.runtime.taskmanager.TaskManagerLocation [] - No > hostname could be resolved for the IP address 10.36.32.8, using IP address > as host name. Local input split assignment (such as for HDFS files) may be > impacted. > 2020-07-23 13:58:36,694 WARN > org.apache.flink.runtime.taskmanager.TaskManagerLocation [] - No > hostname could be resolved for the IP address 10.42.128.6, using IP address > as host name. Local input split assignment (such as for HDFS files) may be > impacted. 
> > > 2020-07-23 14:01:17,814 INFO org.apache.flink.runtime.jobmaster.JobMaster > [] - Close ResourceManager connection > 83b1ff14900abfd54418e7fa3efb3f8a: The heartbeat of JobManager with id > 456a18b6c404cb11a359718e16de1c6b timed out.. > 2020-07-23 14:01:17,815 INFO org.apache.flink.runtime.jobmaster.JobMaster > [] - Connecting to ResourceManager > akka.tcp://flink@flink-jobmanager > :6123/user/rpc/resourcemanager_*(00000000000000000000000000000000) > 2020-07-23 14:01:17,816 INFO org.apache.flink.runtime.jobmaster.JobMaster > [] - Resolved ResourceManager address, beginning > registration > 2020-07-23 14:01:17,816 INFO > org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - > Registering job manager 00000000000000000000000000000000 > @akka.tcp://flink@flink-jobmanager:6123/user/rpc/jobmanager_2 for job > 99a030d0e3f428490a501c0132f27a56. > 2020-07-23 14:01:17,836 INFO > org.apache.flink.runtime.executiongraph.ExecutionGraph [] - Source: > host_relation -> Timestamps/Watermarks -> Map (1/1) > (302ca9640e2d209a543d843f2996ccd2) switched from SCHEDULED to FAILED on not > deployed. > org.apache.flink.runtime.jobmanager.scheduler.NoResourceAvailableException: > Could not allocate the required slot within slot request timeout. Please > make sure that the cluster has enough resources. 
>     at org.apache.flink.runtime.scheduler.DefaultScheduler.maybeWrapWithNoResourceAvailableException(DefaultScheduler.java:441) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
>     at org.apache.flink.runtime.scheduler.DefaultScheduler.lambda$assignResourceOrHandleError$6(DefaultScheduler.java:422) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
>     at java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:836) ~[?:1.8.0_242]
>     at java.util.concurrent.CompletableFuture$UniHandle.tryFire(CompletableFuture.java:811) ~[?:1.8.0_242]
>     at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488) ~[?:1.8.0_242]
>     at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1990) ~[?:1.8.0_242]
>     at org.apache.flink.runtime.jobmaster.slotpool.SchedulerImpl.lambda$internalAllocateSlot$0(SchedulerImpl.java:168) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
>     at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:774) ~[?:1.8.0_242]
>     at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:750) ~[?:1.8.0_242]
>     at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488) ~[?:1.8.0_242]
>     at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1990) ~[?:1.8.0_242]
>     at org.apache.flink.runtime.jobmaster.slotpool.SlotSharingManager$SingleTaskSlot.release(SlotSharingManager.java:726) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
>     at org.apache.flink.runtime.jobmaster.slotpool.SlotSharingManager$MultiTaskSlot.release(SlotSharingManager.java:537) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
>     at org.apache.flink.runtime.jobmaster.slotpool.SlotSharingManager$MultiTaskSlot.lambda$new$0(SlotSharingManager.java:432) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
>     at java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:836) ~[?:1.8.0_242]
>     at java.util.concurrent.CompletableFuture$UniHandle.tryFire(CompletableFuture.java:811) ~[?:1.8.0_242]
>     at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488) ~[?:1.8.0_242]
>     at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1990) ~[?:1.8.0_242]
>     at org.apache.flink.runtime.concurrent.FutureUtils.lambda$forwardTo$21(FutureUtils.java:1120) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
>     at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:774) ~[?:1.8.0_242]
>     at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:750) ~[?:1.8.0_242]
>     at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488) ~[?:1.8.0_242]
>     at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1990) ~[?:1.8.0_242]
>     at org.apache.flink.runtime.concurrent.FutureUtils$Timeout.run(FutureUtils.java:1036) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
>     at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRunAsync(AkkaRpcActor.java:402) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
>     at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:195) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
>     at org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.handleRpcMessage(FencedAkkaRpcActor.java:74) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
>     at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleMessage(AkkaRpcActor.java:152) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
>     at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:26) [flink-dist_2.11-1.11.1.jar:1.11.1]
>     at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:21) [flink-dist_2.11-1.11.1.jar:1.11.1]
>     at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123) [flink-dist_2.11-1.11.1.jar:1.11.1]
>     at akka.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:21) [flink-dist_2.11-1.11.1.jar:1.11.1]
>     at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:170) [flink-dist_2.11-1.11.1.jar:1.11.1]
>     at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171) [flink-dist_2.11-1.11.1.jar:1.11.1]
>     at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171) [flink-dist_2.11-1.11.1.jar:1.11.1]
>     at akka.actor.Actor$class.aroundReceive(Actor.scala:517) [flink-dist_2.11-1.11.1.jar:1.11.1]
>     at akka.actor.AbstractActor.aroundReceive(AbstractActor.scala:225) [flink-dist_2.11-1.11.1.jar:1.11.1]
>     at akka.actor.ActorCell.receiveMessage(ActorCell.scala:592) [flink-dist_2.11-1.11.1.jar:1.11.1]
>     at akka.actor.ActorCell.invoke(ActorCell.scala:561) [flink-dist_2.11-1.11.1.jar:1.11.1]
>     at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:258) [flink-dist_2.11-1.11.1.jar:1.11.1]
>     at akka.dispatch.Mailbox.run(Mailbox.scala:225) [flink-dist_2.11-1.11.1.jar:1.11.1]
>     at akka.dispatch.Mailbox.exec(Mailbox.scala:235) [flink-dist_2.11-1.11.1.jar:1.11.1]
>     at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) [flink-dist_2.11-1.11.1.jar:1.11.1]
>     at akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) [flink-dist_2.11-1.11.1.jar:1.11.1]
>     at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) [flink-dist_2.11-1.11.1.jar:1.11.1]
>     at akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) [flink-dist_2.11-1.11.1.jar:1.11.1]
> Caused by: java.util.concurrent.CompletionException: java.util.concurrent.TimeoutException
>     at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292) ~[?:1.8.0_242]
>     at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:308) ~[?:1.8.0_242]
>     at java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:607) ~[?:1.8.0_242]
>     at java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:591) ~[?:1.8.0_242]
>     ... 25 more
> Caused by: java.util.concurrent.TimeoutException
>     ... 23 more
> 2020-07-23 14:01:17,848 INFO  org.apache.flink.runtime.executiongraph.failover.flip1.RestartPipelinedRegionFailoverStrategy [] - Calculating tasks to restart to recover the failed task cbc357ccb763df2852fee8c4fc7d55f2_0.
> 2020-07-23 14:01:17,910 INFO  org.apache.flink.runtime.executiongraph.failover.flip1.RestartPipelinedRegionFailoverStrategy [] - 902 tasks should be restarted to recover the failed task cbc357ccb763df2852fee8c4fc7d55f2_0.
> 2020-07-23 14:01:17,913 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph [] - Job JobTest (99a030d0e3f428490a501c0132f27a56) switched from state RUNNING to FAILING.
> org.apache.flink.runtime.JobException: Recovery is suppressed by NoRestartBackoffTimeStrategy
>     at org.apache.flink.runtime.executiongraph.failover.flip1.ExecutionFailureHandler.handleFailure(ExecutionFailureHandler.java:116) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
>     at org.apache.flink.runtime.executiongraph.failover.flip1.ExecutionFailureHandler.getFailureHandlingResult(ExecutionFailureHandler.java:78) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
>     at org.apache.flink.runtime.scheduler.DefaultScheduler.handleTaskFailure(DefaultScheduler.java:192) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
>     at org.apache.flink.runtime.scheduler.DefaultScheduler.maybeHandleTaskFailure(DefaultScheduler.java:185) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
>     at org.apache.flink.runtime.scheduler.DefaultScheduler.updateTaskExecutionStateInternal(DefaultScheduler.java:179) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
>     at org.apache.flink.runtime.scheduler.SchedulerBase.updateTaskExecutionState(SchedulerBase.java:503) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
>     at org.apache.flink.runtime.scheduler.UpdateSchedulerNgOnInternalFailuresListener.notifyTaskFailure(UpdateSchedulerNgOnInternalFailuresListener.java:49) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
>     at org.apache.flink.runtime.executiongraph.ExecutionGraph.notifySchedulerNgAboutInternalTaskFailure(ExecutionGraph.java:1710) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
>     at org.apache.flink.runtime.executiongraph.Execution.processFail(Execution.java:1287) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
>     at org.apache.flink.runtime.executiongraph.Execution.processFail(Execution.java:1255) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
>     at org.apache.flink.runtime.executiongraph.Execution.markFailed(Execution.java:1086) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
>     at org.apache.flink.runtime.executiongraph.ExecutionVertex.markFailed(ExecutionVertex.java:748) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
>     at org.apache.flink.runtime.scheduler.DefaultExecutionVertexOperations.markFailed(DefaultExecutionVertexOperations.java:41) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
>     at org.apache.flink.runtime.scheduler.DefaultScheduler.handleTaskDeploymentFailure(DefaultScheduler.java:435) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
>     at org.apache.flink.runtime.scheduler.DefaultScheduler.lambda$assignResourceOrHandleError$6(DefaultScheduler.java:422) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
>     at java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:836) ~[?:1.8.0_242]
>     at java.util.concurrent.CompletableFuture$UniHandle.tryFire(CompletableFuture.java:811) ~[?:1.8.0_242]
>     at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488) ~[?:1.8.0_242]
>     at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1990) ~[?:1.8.0_242]
>     at org.apache.flink.runtime.jobmaster.slotpool.SchedulerImpl.lambda$internalAllocateSlot$0(SchedulerImpl.java:168) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
>     at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:774) ~[?:1.8.0_242]
>     at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:750) ~[?:1.8.0_242]
>     at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488) ~[?:1.8.0_242]
>     at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1990) ~[?:1.8.0_242]
>     at org.apache.flink.runtime.jobmaster.slotpool.SlotSharingManager$SingleTaskSlot.release(SlotSharingManager.java:726) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
>     at org.apache.flink.runtime.jobmaster.slotpool.SlotSharingManager$MultiTaskSlot.release(SlotSharingManager.java:537) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
>     at org.apache.flink.runtime.jobmaster.slotpool.SlotSharingManager$MultiTaskSlot.lambda$new$0(SlotSharingManager.java:432) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
>     at java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:836) ~[?:1.8.0_242]
>     at java.util.concurrent.CompletableFuture$UniHandle.tryFire(CompletableFuture.java:811) ~[?:1.8.0_242]
>     at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488) ~[?:1.8.0_242]
>     at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1990) ~[?:1.8.0_242]
>     at org.apache.flink.runtime.concurrent.FutureUtils.lambda$forwardTo$21(FutureUtils.java:1120) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
>     at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:774) ~[?:1.8.0_242]
>     at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:750) ~[?:1.8.0_242]
>     at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488) ~[?:1.8.0_242]
>     at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1990) ~[?:1.8.0_242]
>     at org.apache.flink.runtime.concurrent.FutureUtils$Timeout.run(FutureUtils.java:1036) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
>     at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRunAsync(AkkaRpcActor.java:402) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
>     at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:195) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
>     at org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.handleRpcMessage(FencedAkkaRpcActor.java:74) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
>     at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleMessage(AkkaRpcActor.java:152) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
>     at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:26) [flink-dist_2.11-1.11.1.jar:1.11.1]
>     at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:21) [flink-dist_2.11-1.11.1.jar:1.11.1]
>     at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123) [flink-dist_2.11-1.11.1.jar:1.11.1]
>     at akka.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:21) [flink-dist_2.11-1.11.1.jar:1.11.1]
>     at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:170) [flink-dist_2.11-1.11.1.jar:1.11.1]
>     at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171) [flink-dist_2.11-1.11.1.jar:1.11.1]
>     at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171) [flink-dist_2.11-1.11.1.jar:1.11.1]
>     at akka.actor.Actor$class.aroundReceive(Actor.scala:517) [flink-dist_2.11-1.11.1.jar:1.11.1]
>     at akka.actor.AbstractActor.aroundReceive(AbstractActor.scala:225) [flink-dist_2.11-1.11.1.jar:1.11.1]
>     at akka.actor.ActorCell.receiveMessage(ActorCell.scala:592) [flink-dist_2.11-1.11.1.jar:1.11.1]
>     at akka.actor.ActorCell.invoke(ActorCell.scala:561) [flink-dist_2.11-1.11.1.jar:1.11.1]
>     at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:258) [flink-dist_2.11-1.11.1.jar:1.11.1]
>     at akka.dispatch.Mailbox.run(Mailbox.scala:225) [flink-dist_2.11-1.11.1.jar:1.11.1]
>     at akka.dispatch.Mailbox.exec(Mailbox.scala:235) [flink-dist_2.11-1.11.1.jar:1.11.1]
>     at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) [flink-dist_2.11-1.11.1.jar:1.11.1]
>     at akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) [flink-dist_2.11-1.11.1.jar:1.11.1]
>     at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) [flink-dist_2.11-1.11.1.jar:1.11.1]
>     at akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) [flink-dist_2.11-1.11.1.jar:1.11.1]
> Caused by: org.apache.flink.runtime.jobmanager.scheduler.NoResourceAvailableException: Could not allocate the required slot within slot request timeout. Please make sure that the cluster has enough resources.
>     at org.apache.flink.runtime.scheduler.DefaultScheduler.maybeWrapWithNoResourceAvailableException(DefaultScheduler.java:441) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
>     ... 45 more
> Caused by: java.util.concurrent.CompletionException: java.util.concurrent.TimeoutException
>     at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292) ~[?:1.8.0_242]
>     at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:308) ~[?:1.8.0_242]
>     at java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:607) ~[?:1.8.0_242]
>     at java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:591) ~[?:1.8.0_242]
>     ... 25 more
> Caused by: java.util.concurrent.TimeoutException
>     ... 23 more
>
> 2020-07-23 14:01:18,109 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph [] - Discarding the results produced by task execution 1809eb912d69854f2babedeaf879df6a.
> 2020-07-23 14:01:18,110 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph [] - Job JobTest (99a030d0e3f428490a501c0132f27a56) switched from state FAILING to FAILED.
> org.apache.flink.runtime.JobException: Recovery is suppressed by NoRestartBackoffTimeStrategy
>     at org.apache.flink.runtime.executiongraph.failover.flip1.ExecutionFailureHandler.handleFailure(ExecutionFailureHandler.java:116) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
>     at org.apache.flink.runtime.executiongraph.failover.flip1.ExecutionFailureHandler.getFailureHandlingResult(ExecutionFailureHandler.java:78) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
>     at org.apache.flink.runtime.scheduler.DefaultScheduler.handleTaskFailure(DefaultScheduler.java:192) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
>     at org.apache.flink.runtime.scheduler.DefaultScheduler.maybeHandleTaskFailure(DefaultScheduler.java:185) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
>     at org.apache.flink.runtime.scheduler.DefaultScheduler.updateTaskExecutionStateInternal(DefaultScheduler.java:179) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
>     at org.apache.flink.runtime.scheduler.SchedulerBase.updateTaskExecutionState(SchedulerBase.java:503) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
>     at org.apache.flink.runtime.scheduler.UpdateSchedulerNgOnInternalFailuresListener.notifyTaskFailure(UpdateSchedulerNgOnInternalFailuresListener.java:49) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
>     at org.apache.flink.runtime.executiongraph.ExecutionGraph.notifySchedulerNgAboutInternalTaskFailure(ExecutionGraph.java:1710) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
>     at org.apache.flink.runtime.executiongraph.Execution.processFail(Execution.java:1287) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
>     at org.apache.flink.runtime.executiongraph.Execution.processFail(Execution.java:1255) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
>     at org.apache.flink.runtime.executiongraph.Execution.markFailed(Execution.java:1086) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
>     at org.apache.flink.runtime.executiongraph.ExecutionVertex.markFailed(ExecutionVertex.java:748) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
>     at org.apache.flink.runtime.scheduler.DefaultExecutionVertexOperations.markFailed(DefaultExecutionVertexOperations.java:41) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
>     at org.apache.flink.runtime.scheduler.DefaultScheduler.handleTaskDeploymentFailure(DefaultScheduler.java:435) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
>     at org.apache.flink.runtime.scheduler.DefaultScheduler.lambda$assignResourceOrHandleError$6(DefaultScheduler.java:422) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
>     at java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:836) ~[?:1.8.0_242]
>     at java.util.concurrent.CompletableFuture$UniHandle.tryFire(CompletableFuture.java:811) ~[?:1.8.0_242]
>     at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488) ~[?:1.8.0_242]
>     at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1990) ~[?:1.8.0_242]
>     at org.apache.flink.runtime.jobmaster.slotpool.SchedulerImpl.lambda$internalAllocateSlot$0(SchedulerImpl.java:168) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
>     at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:774) ~[?:1.8.0_242]
>     at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:750) ~[?:1.8.0_242]
>     at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488) ~[?:1.8.0_242]
>     at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1990) ~[?:1.8.0_242]
>     at org.apache.flink.runtime.jobmaster.slotpool.SlotSharingManager$SingleTaskSlot.release(SlotSharingManager.java:726) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
>     at org.apache.flink.runtime.jobmaster.slotpool.SlotSharingManager$MultiTaskSlot.release(SlotSharingManager.java:537) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
>     at org.apache.flink.runtime.jobmaster.slotpool.SlotSharingManager$MultiTaskSlot.lambda$new$0(SlotSharingManager.java:432) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
>     at java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:836) ~[?:1.8.0_242]
>     at java.util.concurrent.CompletableFuture$UniHandle.tryFire(CompletableFuture.java:811) ~[?:1.8.0_242]
>     at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488) ~[?:1.8.0_242]
>     at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1990) ~[?:1.8.0_242]
>     at org.apache.flink.runtime.concurrent.FutureUtils.lambda$forwardTo$21(FutureUtils.java:1120) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
>     at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:774) ~[?:1.8.0_242]
>     at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:750) ~[?:1.8.0_242]
>     at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488) ~[?:1.8.0_242]
>     at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1990) ~[?:1.8.0_242]
>     at org.apache.flink.runtime.concurrent.FutureUtils$Timeout.run(FutureUtils.java:1036) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
>     at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRunAsync(AkkaRpcActor.java:402) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
>     at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:195) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
>     at org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.handleRpcMessage(FencedAkkaRpcActor.java:74) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
>     at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleMessage(AkkaRpcActor.java:152) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
>     at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:26) [flink-dist_2.11-1.11.1.jar:1.11.1]
>     at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:21) [flink-dist_2.11-1.11.1.jar:1.11.1]
>     at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123) [flink-dist_2.11-1.11.1.jar:1.11.1]
>     at akka.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:21) [flink-dist_2.11-1.11.1.jar:1.11.1]
>     at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:170) [flink-dist_2.11-1.11.1.jar:1.11.1]
>     at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171) [flink-dist_2.11-1.11.1.jar:1.11.1]
>     at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171) [flink-dist_2.11-1.11.1.jar:1.11.1]
>     at akka.actor.Actor$class.aroundReceive(Actor.scala:517) [flink-dist_2.11-1.11.1.jar:1.11.1]
>     at akka.actor.AbstractActor.aroundReceive(AbstractActor.scala:225) [flink-dist_2.11-1.11.1.jar:1.11.1]
>     at akka.actor.ActorCell.receiveMessage(ActorCell.scala:592) [flink-dist_2.11-1.11.1.jar:1.11.1]
>     at akka.actor.ActorCell.invoke(ActorCell.scala:561) [flink-dist_2.11-1.11.1.jar:1.11.1]
>     at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:258) [flink-dist_2.11-1.11.1.jar:1.11.1]
>     at akka.dispatch.Mailbox.run(Mailbox.scala:225) [flink-dist_2.11-1.11.1.jar:1.11.1]
>     at akka.dispatch.Mailbox.exec(Mailbox.scala:235) [flink-dist_2.11-1.11.1.jar:1.11.1]
>     at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) [flink-dist_2.11-1.11.1.jar:1.11.1]
>     at akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) [flink-dist_2.11-1.11.1.jar:1.11.1]
>     at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) [flink-dist_2.11-1.11.1.jar:1.11.1]
>     at akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) [flink-dist_2.11-1.11.1.jar:1.11.1]
> Caused by: org.apache.flink.runtime.jobmanager.scheduler.NoResourceAvailableException: Could not allocate the required slot within slot request timeout. Please make sure that the cluster has enough resources.
>     at org.apache.flink.runtime.scheduler.DefaultScheduler.maybeWrapWithNoResourceAvailableException(DefaultScheduler.java:441) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
>     ... 45 more
> Caused by: java.util.concurrent.CompletionException: java.util.concurrent.TimeoutException
>     at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292) ~[?:1.8.0_242]
>     at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:308) ~[?:1.8.0_242]
>     at java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:607) ~[?:1.8.0_242]
>     at java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:591) ~[?:1.8.0_242]
>     ... 25 more
> Caused by: java.util.concurrent.TimeoutException
>     ... 23 more
> 2020-07-23 14:01:18,114 INFO  org.apache.flink.runtime.checkpoint.CheckpointCoordinator [] - Stopping checkpoint coordinator for job 99a030d0e3f428490a501c0132f27a56.
> 2020-07-23 14:01:18,117 INFO  org.apache.flink.runtime.checkpoint.StandaloneCompletedCheckpointStore [] - Shutting down
> 2020-07-23 14:01:18,118 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph [] - Discarding the results produced by task execution 302ca9640e2d209a543d843f2996ccd2.
> 2020-07-23 14:01:18,120 INFO  org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Pending slot request [SlotRequestId{15fd2a9565c2b080748c1d1592b1cbbc}] timed out.
> 2020-07-23 14:01:18,120 INFO  org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Pending slot request [SlotRequestId{8cd72cc16f0e319d915a9a096a1096d7}] timed out.
> 2020-07-23 14:01:18,120 INFO  org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Pending slot request [SlotRequestId{e7e422409acebdb385014a9634af6a90}] timed out.
> 2020-07-23 14:01:18,121 INFO  org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Pending slot request [SlotRequestId{cef1af73546ca1fc27ca7a3322e9e815}] timed out.
> 2020-07-23 14:01:18,121 INFO  org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Pending slot request [SlotRequestId{108fe0b3086567ad79275eccef2fdaf8}] timed out.
> 2020-07-23 14:01:18,121 INFO  org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Pending slot request [SlotRequestId{265e67985eab7a6dc08024e53bf2708d}] timed out.
> 2020-07-23 14:01:18,122 INFO  org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Pending slot request [SlotRequestId{7087497a17c441f1a1d6fefcbc7cd0ea}] timed out.
> 2020-07-23 14:01:18,122 INFO  org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Pending slot request [
>
> 2020-07-23 14:01:18,151 INFO  org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - Registering job manager 00000000000000000000000000000000@akka.tcp://flink@flink-jobmanager:6123/user/rpc/jobmanager_2 for job 99a030d0e3f428490a501c0132f27a56.
> 2020-07-23 14:01:18,157 INFO  org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - Registered job manager 00000000000000000000000000000000@akka.tcp://flink@flink-jobmanager:6123/user/rpc/jobmanager_2 for job 99a030d0e3f428490a501c0132f27a56.
> 2020-07-23 14:01:18,157 INFO  org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - Registered job manager 00000000000000000000000000000000@akka.tcp://flink@flink-jobmanager:6123/user/rpc/jobmanager_2 for job 99a030d0e3f428490a501c0132f27a56.
> 2020-07-23 14:01:18,157 INFO  org.apache.flink.runtime.dispatcher.StandaloneDispatcher [] - Job 99a030d0e3f428490a501c0132f27a56 reached globally terminal state FAILED.
> 2020-07-23 14:01:18,162 INFO  org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - Registered job manager 00000000000000000000000000000000@akka.tcp://flink@flink-jobmanager:6123/user/rpc/jobmanager_2 for job 99a030d0e3f428490a501c0132f27a56.
> 2020-07-23 14:01:18,162 INFO  org.apache.flink.runtime.jobmaster.JobMaster [] - JobManager successfully registered at ResourceManager, leader id: 00000000000000000000000000000000.
> 2020-07-23 14:01:18,225 INFO  org.apache.flink.runtime.jobmaster.JobMaster [] - Stopping the JobMaster for job JobTest(99a030d0e3f428490a501c0132f27a56).
> 2020-07-23 14:01:18,381 INFO  org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Suspending SlotPool.
> 2020-07-23 14:01:18,382 INFO  org.apache.flink.runtime.jobmaster.JobMaster [] - Close ResourceManager connection 83b1ff14900abfd54418e7fa3efb3f8a: JobManager is shutting down..
> 2020-07-23 14:01:18,382 INFO  org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Stopping SlotPool.
> 2020-07-23 14:01:18,382 INFO  org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - Disconnect job manager 00000000000000000000000000000000@akka.tcp://flink@flink-jobmanager:6123/user/rpc/jobmanager_2 for job 99a030d0e3f428490a501c0132f27a56 from the resource manager.
>
> a511955993
> 邮箱:[hidden email]
> 签名由 网易邮箱大师 定制
>
> On 07/23/2020 13:26, Yang Wang <[hidden email]> wrote:
> Glad your problem is solved, but I don't think the root cause is adding taskmanager-query-state-service.yaml. On my side the cluster works fine without creating that service, and `nslookup {tm_ip_address}` reverse-resolves to a hostname normally.
>
> Note that this is not resolving a hostname; the verification is done by reverse-resolving the IP address.
>
> To answer your two questions:
> 1. It is not required. I verified that the cluster runs jobs normally without creating it, whether the Rest service is exposed as ClusterIP, NodePort, or LoadBalancer.
> 2. If taskmanager.bind-host is not configured, the two JIRAs [FLINK-15911][FLINK-15154] do not affect the address the TM uses when registering with the RM.
>
> If you want to find the root cause, you probably need to provide the complete JM/TM logs so they can be analyzed.
>
> Best,
> Yang
>
> On Thu, Jul 23, 2020 at 11:30, SmileSmile <[hidden email]> wrote:
>
> > Hi Yang Wang,
> >
> > I just tested this in our test environment: the TaskManager IPs cannot be resolved with nslookup, while the JM's can; the difference between the two is whether a Service exists.
> >
> > Solution: I added taskmanager-query-state-service.yaml to the cluster (an optional service according to the official docs [1]). The "No hostname could be resolved for IP address" messages no longer flood the log; after changing NodePort to ClusterIP, the job is submitted successfully and no longer times out, so the problem is solved.
> >
> > 1. Given the above, is this configuration file mandatory?
> >
> > 2. Among the 1.11 changes I noticed [FLINK-15911][FLINK-15154], which support configuring the network interface used for the local bind separately from the externally accessible address and port. Is it this change that makes the JM reverse-resolve a service from the IP reported by the TM?
> >
> > Best!
> >
> > [1] https://ci.apache.org/projects/flink/flink-docs-release-1.11/ops/deployment/kubernetes.html
> >
> > On 07/23/2020 10:11, Yang Wang <[hidden email]> wrote:
> > What I mean is: while the Flink job is running, use the command below to start a busybox pod inside the cluster and run `nslookup {ip_address}` in it to see whether the address resolves normally. If it cannot be resolved, the problem is most likely coredns.
> >
> > kubectl run -i -t busybox --image=busybox --restart=Never
> >
> > You need to confirm that the cluster's coredns pods are healthy; they are usually deployed in the kube-system namespace.
> >
> > Best,
> > Yang
> >
> > On Wed, Jul 22, 2020 at 19:57, SmileSmile <[hidden email]> wrote:
> >
> > > Hi, Yang Wang!
> > >
> > > Very glad to receive your reply; it helped a lot and pointed me in the right direction. Let me add some information that may help narrow down the root cause.
> > >
> > > Where the JM reports "No hostname could be resolved for ip address xxxxx", the reported IP is the pod-network IP that k8s assigned to the Flink pod, not the host machine's IP. Where could the problem be?
> > >
> > > Best!
> > > > > > > > > a511955993 > > > 邮箱:[hidden email] > > > > > > < > > > https://maas.mail.163.com/dashi-web-extend/html/proSignature.html?ftlId=1&name=a511955993&uid=a511955993%40163.com&iconUrl=https%3A%2F%2Fmail-online.nosdn.127.net%2Fqiyelogo%2FdefaultAvatar.png&items=%5B%22%E9%82%AE%E7%AE%B1%EF%BC%9Aa511955993%40163.com%22%5D> > > > > > > > > > 签名由 网易邮箱大师 <https://mail.163.com/dashi/dlpro.html?from=mail88> 定制 > > > > > > On 07/22/2020 18:18, Yang Wang <[hidden email]> wrote: > > > 如果你的日志里面一直在刷No hostname could be resolved for the IP > > address,应该是集群的coredns > > > 有问题,由ip地址反查hostname查不到。你可以起一个busybox验证一下是不是这个ip就解析不了,有 > > > 可能是coredns有问题 > > > > > > > > > Best, > > > Yang > > > > > > Congxian Qiu <[hidden email]> 于2020年7月21日周二 下午7:29写道: > > > > > > > Hi > > > > 不确定 k8s 环境中能否看到 pod 的完整日志?类似 Yarn 的 NM 日志一样,如果有的话,可以尝试看一下这个 pod > > > > 的完整日志有没有什么发现 > > > > Best, > > > > Congxian > > > > > > > > > > > > SmileSmile <[hidden email]> 于2020年7月21日周二 下午3:19写道: > > > > > > > > > Hi,Congxian > > > > > > > > > > 因为是测试环境,没有配置HA,目前看到的信息,就是JM刷出来大量的no hostname could be > > > > > resolved,jm失联,作业提交失败。 > > > > > 将jm内存配置为10g也是一样的情况(jobmanager.memory.pprocesa.size:10240m)。 > > > > > > > > > > 在同一个环境将版本回退到1.10没有出现该问题,也不会刷如上报错。 > > > > > > > > > > > > > > > 是否有其他排查思路? > > > > > > > > > > Best! 
> > > >
> > > > On 07/16/2020 13:17, Congxian Qiu wrote:
> > > > Hi,
> > > > If there are no exceptions and GC looks normal, it may help to look at the pod logs, and at the zk logs if HA is enabled. I once ran into a similar symptom on Yarn that turned out to have a different cause, which we found through the NM and zk logs.
> > > >
> > > > Best,
> > > > Congxian
> > > >
> > > > SmileSmile <[hidden email]> wrote on Wed, Jul 15, 2020 at 17:20:
> > > >
> > > > > Hi Roc,
> > > > >
> > > > > This did not happen on 1.10.1; it only appears on 1.11. How would you suggest investigating it?
> > > > >
> > > > > On 07/15/2020 17:16, Roc Marshal wrote:
> > > > > Hi, SmileSmile.
> > > > > I have run into a similar host-resolution problem before; you could start by checking the pod network mapping in k8s. Hope this helps.
> > > > >
> > > > > Best.
> > > > > Roc Marshal
> > > > >
> > > > > On 2020-07-15 17:04:18, "SmileSmile" <[hidden email]> wrote:
> > > > > > [original message quoted in full above; trimmed]
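[Editor's sketch] The reverse lookup discussed in this thread can be reproduced outside of Flink. Below is a minimal Python sketch of what the JM effectively does for each TaskManager address (the function name is mine; Flink's TaskManagerLocation uses the JVM's InetAddress machinery, but the underlying DNS operation is the same PTR query that nslookup issues):

```python
import socket

def resolve_hostname(ip: str) -> str:
    """Return the PTR hostname for ip, falling back to the IP itself."""
    try:
        hostname, _aliases, _addresses = socket.gethostbyaddr(ip)
        return hostname
    except OSError:
        # No PTR record: this is the situation in which Flink logs
        # "No hostname could be resolved for the IP address ...,
        #  using IP address as host name."
        return ip

# The loopback address normally has a PTR record; a pod IP without a
# matching Service typically does not, so the fallback branch is taken.
print(resolve_hostname("127.0.0.1"))
```

If this falls back to the raw IP inside a busybox pod for your TM addresses, the problem is DNS (coredns / missing Service), not Flink.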
I suggest first configuring heartbeat.timeout to a larger value, and enabling the GC log
to see whether full GC happens frequently and how long each one lasts. From the log you have provided so far, even the in-process JM->RM heartbeat times out, so I still suspect this is GC-related.

env.java.opts.jobmanager: -Xloggc:<LOG_DIR>/jobmanager-gc.log -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=2 -XX:GCLogFileSize=512M

Best,
Yang

SmileSmile <[hidden email]> wrote on Mon, Jul 27, 2020 at 13:50:

> Hi, Yang Wang,
>
> The log was very long, so I removed some repeated content.
> I suspected JM GC at first as well, but raising the JM memory to 10g made no difference.
>
> Best
>
> On 07/27/2020 11:36, Yang Wang wrote:
> Looking at this job, the root cause of the failure is not "No hostname could be resolved"; the reason for that WARNING can be discussed separately (if it really does not occur on 1.10). If you start a local standalone cluster you will see the same WARNING; it does not affect normal operation.
>
> The failure cause is that the slot requests timed out after 5 minutes. In the log you provided, the span from 2020-07-23 13:55:45,519 to 2020-07-23 13:58:18,037 is blank — you did not omit anything there, right? That is the period in which the tasks should have started deploying. The log also shows the JM->RM heartbeat timing out, i.e. communication within the same process in the same pod timed out, so I suspect the JM was in full GC the whole time. Please confirm this.
>
> Best,
> Yang
>
> SmileSmile <[hidden email]> wrote on Thu, Jul 23, 2020 at 14:43:
>
> > Hi Yang Wang,
> >
> > First, my environment versions:
> >
> > kubernetes: 1.17.4. CNI: weave
> >
> > Items 1-3 below are my questions; item 4 is the JM log.
> >
> > 1. Without taskmanager-query-state-service.yaml, reverse nslookup indeed fails:
> >
> > kubectl exec -it busybox2 -- /bin/sh
> > / # nslookup 10.47.96.2
> > Server: 10.96.0.10
> > Address: 10.96.0.10:53
> >
> > ** server can't find 2.96.47.10.in-addr.arpa: NXDOMAIN
> >
> > 2. Flink 1.11 vs Flink 1.10:
> >
> > On the "detail subtasks taskmanagers xxx x" row, 1.11 shows 172-20-0-50 where 1.10 showed flink-taskmanager-7b5d6958b6-sfzlk:36459. What changed here? (This cluster currently runs both 1.10 and 1.11, and 1.10 works fine. If coredns were broken, Flink 1.10 should show the same problem, shouldn't it?)
> >
> > 3. Does coredns need any special configuration?
> >
> > Resolving domain names inside the containers works fine; only reverse resolution fails when there is no Service. Is there anything that needs to be configured in coredns?
> >
> > 4.
The JM log around the timeout is as follows:
> >
> > 2020-07-23 13:53:00,228 INFO org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - ResourceManager akka.tcp://flink@flink-jobmanager:6123/user/rpc/resourcemanager_0 was granted leadership with fencing token 00000000000000000000000000000000
> > 2020-07-23 13:53:00,232 INFO org.apache.flink.runtime.rpc.akka.AkkaRpcService [] - Starting RPC endpoint for org.apache.flink.runtime.dispatcher.StandaloneDispatcher at akka://flink/user/rpc/dispatcher_1 .
> > 2020-07-23 13:53:00,233 INFO org.apache.flink.runtime.resourcemanager.slotmanager.SlotManagerImpl [] - Starting the SlotManager.
> > 2020-07-23 13:53:03,472 INFO org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - Registering TaskManager with ResourceID 1f9ae0cd95a28943a73be26323588696 (akka.tcp://flink@10.34.128.9:6122/user/rpc/taskmanager_0) at ResourceManager
> > 2020-07-23 13:53:03,777 INFO org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - Registering TaskManager with ResourceID cac09e751264e61615329c20713a84b4 (akka.tcp://flink@10.32.160.6:6122/user/rpc/taskmanager_0) at ResourceManager
> > 2020-07-23 13:53:03,787 INFO org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - Registering TaskManager with ResourceID 93c72d01d09f9ae427c5fc980ed4c1e4 (akka.tcp://flink@10.39.0.8:6122/user/rpc/taskmanager_0) at ResourceManager
> > 2020-07-23 13:53:04,044 INFO org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - Registering TaskManager with ResourceID 8adf2f8e81b77a16d5418a9e252c61e2 (akka.tcp://flink@10.38.64.7:6122/user/rpc/taskmanager_0) at ResourceManager
> > 2020-07-23 13:53:04,099 INFO org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - Registering TaskManager with ResourceID 23e9d2358f6eb76b9ae718d879d4f330
(akka.tcp://flink@10.42.160.6:6122/user/rpc/taskmanager_0) at > > ResourceManager > > 2020-07-23 13:53:04,146 INFO > > org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - > > Registering TaskManager with ResourceID 092f8dee299e32df13db3111662b61f8 > > (akka.tcp://flink@10.33.192.14:6122/user/rpc/taskmanager_0) at > > ResourceManager > > > > > > 2020-07-23 13:55:44,220 INFO > > org.apache.flink.runtime.dispatcher.StandaloneDispatcher [] - > Received > > JobGraph submission 99a030d0e3f428490a501c0132f27a56 (JobTest). > > 2020-07-23 13:55:44,222 INFO > > org.apache.flink.runtime.dispatcher.StandaloneDispatcher [] - > > Submitting job 99a030d0e3f428490a501c0132f27a56 (JobTest). > > 2020-07-23 13:55:44,251 INFO > > org.apache.flink.runtime.rpc.akka.AkkaRpcService [] - > Starting > > RPC endpoint for org.apache.flink.runtime.jobmaster.JobMaster at > > akka://flink/user/rpc/jobmanager_2 . > > 2020-07-23 13:55:44,260 INFO > org.apache.flink.runtime.jobmaster.JobMaster > > [] - Initializing job JobTest > > (99a030d0e3f428490a501c0132f27a56). > > 2020-07-23 13:55:44,278 INFO > org.apache.flink.runtime.jobmaster.JobMaster > > [] - Using restart back off time strategy > > NoRestartBackoffTimeStrategy for JobTest > (99a030d0e3f428490a501c0132f27a56). > > 2020-07-23 13:55:44,319 INFO > org.apache.flink.runtime.jobmaster.JobMaster > > [] - Running initialization on master for job JobTest > > (99a030d0e3f428490a501c0132f27a56). > > 2020-07-23 13:55:44,319 INFO > org.apache.flink.runtime.jobmaster.JobMaster > > [] - Successfully ran initialization on master in 0 ms. 
> > 2020-07-23 13:55:44,428 INFO > > org.apache.flink.runtime.scheduler.adapter.DefaultExecutionTopology [] - > > Built 1 pipelined regions in 25 ms > > 2020-07-23 13:55:44,437 INFO > org.apache.flink.runtime.jobmaster.JobMaster > > [] - Loading state backend via factory > > org.apache.flink.contrib.streaming.state.RocksDBStateBackendFactory > > 2020-07-23 13:55:44,456 INFO > > org.apache.flink.contrib.streaming.state.RocksDBStateBackend [] - Using > > predefined options: DEFAULT. > > 2020-07-23 13:55:44,457 INFO > > org.apache.flink.contrib.streaming.state.RocksDBStateBackend [] - Using > > default options factory: > > DefaultConfigurableOptionsFactory{configuredOptions={}}. > > 2020-07-23 13:55:44,466 WARN org.apache.flink.runtime.util.HadoopUtils > > [] - Could not find Hadoop configuration via any of the > > supported methods (Flink configuration, environment variables). > > 2020-07-23 13:55:45,276 INFO > org.apache.flink.runtime.jobmaster.JobMaster > > [] - Using failover strategy > > > org.apache.flink.runtime.executiongraph.failover.flip1.RestartPipelinedRegionFailoverStrategy@72bd8533 > > for JobTest (99a030d0e3f428490a501c0132f27a56). > > 2020-07-23 13:55:45,280 INFO > > org.apache.flink.runtime.jobmaster.JobManagerRunnerImpl [] - > > JobManager runner for job JobTest (99a030d0e3f428490a501c0132f27a56) was > > granted leadership with session id 00000000-0000-0000-0000-000000000000 > at > > akka.tcp://flink@flink-jobmanager:6123/user/rpc/jobmanager_2. > > 2020-07-23 13:55:45,286 INFO > org.apache.flink.runtime.jobmaster.JobMaster > > [] - Starting scheduling with scheduling strategy > > [org.apache.flink.runtime.scheduler.strategy.EagerSchedulingStrategy] > > > > > > > > 2020-07-23 13:55:45,436 INFO > > org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Cannot > > serve slot request, no ResourceManager connected. 
Adding as pending > request > > [SlotRequestId{e092b12b96b0a98bbf057e71b9705c23}] > > 2020-07-23 13:55:45,436 INFO > > org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Cannot > > serve slot request, no ResourceManager connected. Adding as pending > request > > [SlotRequestId{4ad15f417716c9e07fca383990c0f52a}] > > 2020-07-23 13:55:45,436 INFO > > org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Cannot > > serve slot request, no ResourceManager connected. Adding as pending > request > > [SlotRequestId{345fdb427a893b7fc3f4f040f93445d2}] > > 2020-07-23 13:55:45,437 INFO > > org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Cannot > > serve slot request, no ResourceManager connected. Adding as pending > request > > [SlotRequestId{e559485ea7b0b7e17367816882538d90}] > > 2020-07-23 13:55:45,437 INFO > > org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Cannot > > serve slot request, no ResourceManager connected. Adding as pending > request > > [SlotRequestId{7be8f6c1aedb27b04e7feae68078685c}] > > 2020-07-23 13:55:45,437 INFO > > org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Cannot > > serve slot request, no ResourceManager connected. Adding as pending > request > > [SlotRequestId{582a86197884206652dff3aea2306bb3}] > > 2020-07-23 13:55:45,437 INFO > > org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Cannot > > serve slot request, no ResourceManager connected. Adding as pending > request > > [SlotRequestId{0cc24260eda3af299a0b321feefaf2cb}] > > 2020-07-23 13:55:45,437 INFO > > org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Cannot > > serve slot request, no ResourceManager connected. Adding as pending > request > > [SlotRequestId{240ca6f3d3b5ece6a98243ec8cadf616}] > > 2020-07-23 13:55:45,438 INFO > > org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Cannot > > serve slot request, no ResourceManager connected. 
Adding as pending > request > > [SlotRequestId{c35033d598a517acc108424bb9f809fb}] > > 2020-07-23 13:55:45,438 INFO > > org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Cannot > > serve slot request, no ResourceManager connected. Adding as pending > request > > [SlotRequestId{ad35013c3b532d4b4df1be62395ae0cf}] > > 2020-07-23 13:55:45,438 INFO > > org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Cannot > > serve slot request, no ResourceManager connected. Adding as pending > request > > [SlotRequestId{c929bd5e8daf432d01fad1ece3daec1a}] > > 2020-07-23 13:55:45,487 INFO > org.apache.flink.runtime.jobmaster.JobMaster > > [] - Connecting to ResourceManager > > akka.tcp://flink@flink-jobmanager > > :6123/user/rpc/resourcemanager_*(00000000000000000000000000000000) > > 2020-07-23 13:55:45,492 INFO > org.apache.flink.runtime.jobmaster.JobMaster > > [] - Resolved ResourceManager address, beginning > > registration > > 2020-07-23 13:55:45,493 INFO > > org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - > > Registering job manager 00000000000000000000000000000000 > > @akka.tcp://flink@flink-jobmanager:6123/user/rpc/jobmanager_2 for job > > 99a030d0e3f428490a501c0132f27a56. > > 2020-07-23 13:55:45,499 INFO > > org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - > > Registered job manager 00000000000000000000000000000000 > > @akka.tcp://flink@flink-jobmanager:6123/user/rpc/jobmanager_2 for job > > 99a030d0e3f428490a501c0132f27a56. > > 2020-07-23 13:55:45,501 INFO > org.apache.flink.runtime.jobmaster.JobMaster > > [] - JobManager successfully registered at > ResourceManager, > > leader id: 00000000000000000000000000000000. > > 2020-07-23 13:55:45,501 INFO > > org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - > > Requesting new slot [SlotRequestId{15fd2a9565c2b080748c1d1592b1cbbc}] and > > profile ResourceProfile{UNKNOWN} from resource manager. 
> > 2020-07-23 13:55:45,502 INFO > > org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - > > Request slot with profile ResourceProfile{UNKNOWN} for job > > 99a030d0e3f428490a501c0132f27a56 with allocation id > > d420d08bf2654d9ea76955c70db18b69. > > 2020-07-23 13:55:45,502 INFO > > org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - > > Requesting new slot [SlotRequestId{8cd72cc16f0e319d915a9a096a1096d7}] and > > profile ResourceProfile{UNKNOWN} from resource manager. > > 2020-07-23 13:55:45,503 INFO > > org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - > > Requesting new slot [SlotRequestId{e7e422409acebdb385014a9634af6a90}] and > > profile ResourceProfile{UNKNOWN} from resource manager. > > 2020-07-23 13:55:45,503 INFO > > org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - > > Requesting new slot [SlotRequestId{cef1af73546ca1fc27ca7a3322e9e815}] and > > profile ResourceProfile{UNKNOWN} from resource manager. > > 2020-07-23 13:55:45,503 INFO > > org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - > > Requesting new slot [SlotRequestId{108fe0b3086567ad79275eccef2fdaf8}] and > > profile ResourceProfile{UNKNOWN} from resource manager. > > 2020-07-23 13:55:45,503 INFO > > org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - > > Requesting new slot [SlotRequestId{265e67985eab7a6dc08024e53bf2708d}] and > > profile ResourceProfile{UNKNOWN} from resource manager. > > 2020-07-23 13:55:45,503 INFO > > org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - > > Requesting new slot [SlotRequestId{7087497a17c441f1a1d6fefcbc7cd0ea}] and > > profile ResourceProfile{UNKNOWN} from resource manager. > > 2020-07-23 13:55:45,503 INFO > > org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - > > Requesting new slot [SlotRequestId{14ac08438e79c8db8d25d93b99d62725}] and > > profile ResourceProfile{UNKNOWN} from resource manager. 
> > > > 2020-07-23 13:55:45,514 INFO > > org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - > > Request slot with profile ResourceProfile{UNKNOWN} for job > > 99a030d0e3f428490a501c0132f27a56 with allocation id > > fce526bbe3e1be91caa3e4b536b20e35. > > 2020-07-23 13:55:45,514 INFO > > org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - > > Requesting new slot [SlotRequestId{40c7abbb12514c405323b0569fb21647}] and > > profile ResourceProfile{UNKNOWN} from resource manager. > > 2020-07-23 13:55:45,514 INFO > > org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - > > Requesting new slot [SlotRequestId{a4985a9647b65b30a571258b45c8f2ce}] and > > profile ResourceProfile{UNKNOWN} from resource manager. > > 2020-07-23 13:55:45,515 INFO > > org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - > > Requesting new slot [SlotRequestId{c52a6eb2fa58050e71e7903590019fd1}] and > > profile ResourceProfile{UNKNOWN} from resource manager. > > > > 2020-07-23 13:55:45,517 INFO > > org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - > > Request slot with profile ResourceProfile{UNKNOWN} for job > > 99a030d0e3f428490a501c0132f27a56 with allocation id > > 18ac7ec802ebfcfed8c05ee9324a55a4. > > > > 2020-07-23 13:55:45,518 INFO > > org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - > > Request slot with profile ResourceProfile{UNKNOWN} for job > > 99a030d0e3f428490a501c0132f27a56 with allocation id > > 7ec76cbe689eb418b63599e90ade19be. > > 2020-07-23 13:55:45,518 INFO > > org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - > > Requesting new slot [SlotRequestId{46d65692a8b5aad11b51f9a74a666a74}] and > > profile ResourceProfile{UNKNOWN} from resource manager. > > 2020-07-23 13:55:45,518 INFO > > org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - > > Requesting new slot [SlotRequestId{3670bb4f345eedf941cc18e477ba1e9d}] and > > profile ResourceProfile{UNKNOWN} from resource manager. 
> > 2020-07-23 13:55:45,518 INFO > > org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - > > Requesting new slot [SlotRequestId{4a12467d76b9e3df8bc3412c0be08e14}] and > > profile ResourceProfile{UNKNOWN} from resource manager. > > 2020-07-23 13:55:45,518 INFO > > org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - > > Requesting new slot [SlotRequestId{e092b12b96b0a98bbf057e71b9705c23}] and > > profile ResourceProfile{UNKNOWN} from resource manager. > > 2020-07-23 13:55:45,518 INFO > > org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - > > Requesting new slot [SlotRequestId{4ad15f417716c9e07fca383990c0f52a}] and > > profile ResourceProfile{UNKNOWN} from resource manager. > > 2020-07-23 13:55:45,518 INFO > > org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - > > Requesting new slot [SlotRequestId{345fdb427a893b7fc3f4f040f93445d2}] and > > profile ResourceProfile{UNKNOWN} from resource manager. > > 2020-07-23 13:55:45,519 INFO > > org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - > > Requesting new slot [SlotRequestId{e559485ea7b0b7e17367816882538d90}] and > > profile ResourceProfile{UNKNOWN} from resource manager. > > 2020-07-23 13:55:45,519 INFO > > org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - > > Request slot with profile ResourceProfile{UNKNOWN} for job > > 99a030d0e3f428490a501c0132f27a56 with allocation id > > b78837a29b4032924ac25be70ed21a3c. > > > > > > 2020-07-23 13:58:18,037 WARN > > org.apache.flink.runtime.taskmanager.TaskManagerLocation [] - No > > hostname could be resolved for the IP address 10.47.96.2, using IP > address > > as host name. Local input split assignment (such as for HDFS files) may > be > > impacted. > > 2020-07-23 13:58:22,192 WARN > > org.apache.flink.runtime.taskmanager.TaskManagerLocation [] - No > > hostname could be resolved for the IP address 10.34.64.14, using IP > address > > as host name. 
Local input split assignment (such as for HDFS files) may > be > > impacted. > > 2020-07-23 13:58:22,358 WARN > > org.apache.flink.runtime.taskmanager.TaskManagerLocation [] - No > > hostname could be resolved for the IP address 10.34.128.9, using IP > address > > as host name. Local input split assignment (such as for HDFS files) may > be > > impacted. > > 2020-07-23 13:58:24,562 WARN > > org.apache.flink.runtime.taskmanager.TaskManagerLocation [] - No > > hostname could be resolved for the IP address 10.32.160.6, using IP > address > > as host name. Local input split assignment (such as for HDFS files) may > be > > impacted. > > 2020-07-23 13:58:25,487 WARN > > org.apache.flink.runtime.taskmanager.TaskManagerLocation [] - No > > hostname could be resolved for the IP address 10.38.64.7, using IP > address > > as host name. Local input split assignment (such as for HDFS files) may > be > > impacted. > > 2020-07-23 13:58:27,636 WARN > > org.apache.flink.runtime.taskmanager.TaskManagerLocation [] - No > > hostname could be resolved for the IP address 10.42.160.6, using IP > address > > as host name. Local input split assignment (such as for HDFS files) may > be > > impacted. > > 2020-07-23 13:58:27,767 WARN > > org.apache.flink.runtime.taskmanager.TaskManagerLocation [] - No > > hostname could be resolved for the IP address 10.43.64.12, using IP > address > > as host name. Local input split assignment (such as for HDFS files) may > be > > impacted. > > 2020-07-23 13:58:29,651 INFO > > org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - > > The heartbeat of JobManager with id 456a18b6c404cb11a359718e16de1c6b > timed > > out. > > 2020-07-23 13:58:29,651 INFO > > org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - > > Disconnect job manager 00000000000000000000000000000000 > > @akka.tcp://flink@flink-jobmanager:6123/user/rpc/jobmanager_2 for job > > 99a030d0e3f428490a501c0132f27a56 from the resource manager. 
> > 2020-07-23 13:58:29,854 WARN > > org.apache.flink.runtime.taskmanager.TaskManagerLocation [] - No > > hostname could be resolved for the IP address 10.39.0.8, using IP address > > as host name. Local input split assignment (such as for HDFS files) may > be > > impacted. > > 2020-07-23 13:58:33,623 WARN > > org.apache.flink.runtime.taskmanager.TaskManagerLocation [] - No > > hostname could be resolved for the IP address 10.35.0.10, using IP > address > > as host name. Local input split assignment (such as for HDFS files) may > be > > impacted. > > 2020-07-23 13:58:35,756 WARN > > org.apache.flink.runtime.taskmanager.TaskManagerLocation [] - No > > hostname could be resolved for the IP address 10.36.32.8, using IP > address > > as host name. Local input split assignment (such as for HDFS files) may > be > > impacted. > > 2020-07-23 13:58:36,694 WARN > > org.apache.flink.runtime.taskmanager.TaskManagerLocation [] - No > > hostname could be resolved for the IP address 10.42.128.6, using IP > address > > as host name. Local input split assignment (such as for HDFS files) may > be > > impacted. > > > > > > 2020-07-23 14:01:17,814 INFO > org.apache.flink.runtime.jobmaster.JobMaster > > [] - Close ResourceManager connection > > 83b1ff14900abfd54418e7fa3efb3f8a: The heartbeat of JobManager with id > > 456a18b6c404cb11a359718e16de1c6b timed out.. 
> > 2020-07-23 14:01:17,815 INFO > org.apache.flink.runtime.jobmaster.JobMaster > > [] - Connecting to ResourceManager > > akka.tcp://flink@flink-jobmanager > > :6123/user/rpc/resourcemanager_*(00000000000000000000000000000000) > > 2020-07-23 14:01:17,816 INFO > org.apache.flink.runtime.jobmaster.JobMaster > > [] - Resolved ResourceManager address, beginning > > registration > > 2020-07-23 14:01:17,816 INFO > > org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - > > Registering job manager 00000000000000000000000000000000 > > @akka.tcp://flink@flink-jobmanager:6123/user/rpc/jobmanager_2 for job > > 99a030d0e3f428490a501c0132f27a56. > > 2020-07-23 14:01:17,836 INFO > > org.apache.flink.runtime.executiongraph.ExecutionGraph [] - > Source: > > host_relation -> Timestamps/Watermarks -> Map (1/1) > > (302ca9640e2d209a543d843f2996ccd2) switched from SCHEDULED to FAILED on > not > > deployed. > > > org.apache.flink.runtime.jobmanager.scheduler.NoResourceAvailableException: > > Could not allocate the required slot within slot request timeout. Please > > make sure that the cluster has enough resources. 
> > at > > > org.apache.flink.runtime.scheduler.DefaultScheduler.maybeWrapWithNoResourceAvailableException(DefaultScheduler.java:441) > > ~[flink-dist_2.11-1.11.1.jar:1.11.1] > > at > > > org.apache.flink.runtime.scheduler.DefaultScheduler.lambda$assignResourceOrHandleError$6(DefaultScheduler.java:422) > > ~[flink-dist_2.11-1.11.1.jar:1.11.1] > > at > > > java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:836) > > ~[?:1.8.0_242] > > at > > > java.util.concurrent.CompletableFuture$UniHandle.tryFire(CompletableFuture.java:811) > > ~[?:1.8.0_242] > > at > > > java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488) > > ~[?:1.8.0_242] > > at > > > java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1990) > > ~[?:1.8.0_242] > > at > > > org.apache.flink.runtime.jobmaster.slotpool.SchedulerImpl.lambda$internalAllocateSlot$0(SchedulerImpl.java:168) > > ~[flink-dist_2.11-1.11.1.jar:1.11.1] > > at > > > java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:774) > > ~[?:1.8.0_242] > > at > > > java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:750) > > ~[?:1.8.0_242] > > at > > > java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488) > > ~[?:1.8.0_242] > > at > > > java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1990) > > ~[?:1.8.0_242] > > at > > > org.apache.flink.runtime.jobmaster.slotpool.SlotSharingManager$SingleTaskSlot.release(SlotSharingManager.java:726) > > ~[flink-dist_2.11-1.11.1.jar:1.11.1] > > at > > > org.apache.flink.runtime.jobmaster.slotpool.SlotSharingManager$MultiTaskSlot.release(SlotSharingManager.java:537) > > ~[flink-dist_2.11-1.11.1.jar:1.11.1] > > at > > > org.apache.flink.runtime.jobmaster.slotpool.SlotSharingManager$MultiTaskSlot.lambda$new$0(SlotSharingManager.java:432) > > ~[flink-dist_2.11-1.11.1.jar:1.11.1] > > at > > > 
java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:836) > > ~[?:1.8.0_242] > > at > > > java.util.concurrent.CompletableFuture$UniHandle.tryFire(CompletableFuture.java:811) > > ~[?:1.8.0_242] > > at > > > java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488) > > ~[?:1.8.0_242] > > at > > > java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1990) > > ~[?:1.8.0_242] > > at > > > org.apache.flink.runtime.concurrent.FutureUtils.lambda$forwardTo$21(FutureUtils.java:1120) > > ~[flink-dist_2.11-1.11.1.jar:1.11.1] > > at > > > java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:774) > > ~[?:1.8.0_242] > > at > > > java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:750) > > ~[?:1.8.0_242] > > at > > > java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488) > > ~[?:1.8.0_242] > > at > > > java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1990) > > ~[?:1.8.0_242] > > at > > > org.apache.flink.runtime.concurrent.FutureUtils$Timeout.run(FutureUtils.java:1036) > > ~[flink-dist_2.11-1.11.1.jar:1.11.1] > > at > > > org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRunAsync(AkkaRpcActor.java:402) > > ~[flink-dist_2.11-1.11.1.jar:1.11.1] > > at > > > org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:195) > > ~[flink-dist_2.11-1.11.1.jar:1.11.1] > > at > > > org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.handleRpcMessage(FencedAkkaRpcActor.java:74) > > ~[flink-dist_2.11-1.11.1.jar:1.11.1] > > at > > > org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleMessage(AkkaRpcActor.java:152) > > ~[flink-dist_2.11-1.11.1.jar:1.11.1] > > at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:26) > > [flink-dist_2.11-1.11.1.jar:1.11.1] > > at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:21) > > [flink-dist_2.11-1.11.1.jar:1.11.1] > > at > 
scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123) > > [flink-dist_2.11-1.11.1.jar:1.11.1] > > at > > akka.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:21) > > [flink-dist_2.11-1.11.1.jar:1.11.1] > > at > > scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:170) > > [flink-dist_2.11-1.11.1.jar:1.11.1] > > at > > scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171) > > [flink-dist_2.11-1.11.1.jar:1.11.1] > > at > > scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171) > > [flink-dist_2.11-1.11.1.jar:1.11.1] > > at akka.actor.Actor$class.aroundReceive(Actor.scala:517) > > [flink-dist_2.11-1.11.1.jar:1.11.1] > > at akka.actor.AbstractActor.aroundReceive(AbstractActor.scala:225) > > [flink-dist_2.11-1.11.1.jar:1.11.1] > > at akka.actor.ActorCell.receiveMessage(ActorCell.scala:592) > > [flink-dist_2.11-1.11.1.jar:1.11.1] > > at akka.actor.ActorCell.invoke(ActorCell.scala:561) > > [flink-dist_2.11-1.11.1.jar:1.11.1] > > at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:258) > > [flink-dist_2.11-1.11.1.jar:1.11.1] > > at akka.dispatch.Mailbox.run(Mailbox.scala:225) > > [flink-dist_2.11-1.11.1.jar:1.11.1] > > at akka.dispatch.Mailbox.exec(Mailbox.scala:235) > > [flink-dist_2.11-1.11.1.jar:1.11.1] > > at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) > > [flink-dist_2.11-1.11.1.jar:1.11.1] > > at > > > akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) > > [flink-dist_2.11-1.11.1.jar:1.11.1] > > at > > akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) > > [flink-dist_2.11-1.11.1.jar:1.11.1] > > at > > > akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) > > [flink-dist_2.11-1.11.1.jar:1.11.1] > > Caused by: java.util.concurrent.CompletionException: > > java.util.concurrent.TimeoutException > > at > > > java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292) > > ~[?:1.8.0_242] > > 
at > > > java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:308) > > ~[?:1.8.0_242] > > at > > > java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:607) > > ~[?:1.8.0_242] > > at > > > java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:591) > > ~[?:1.8.0_242] > > ... 25 more > > Caused by: java.util.concurrent.TimeoutException > > ... 23 more > > 2020-07-23 14:01:17,848 INFO > > > org.apache.flink.runtime.executiongraph.failover.flip1.RestartPipelinedRegionFailoverStrategy > > [] - Calculating tasks to restart to recover the failed task > > cbc357ccb763df2852fee8c4fc7d55f2_0. > > 2020-07-23 14:01:17,910 INFO > > > org.apache.flink.runtime.executiongraph.failover.flip1.RestartPipelinedRegionFailoverStrategy > > [] - 902 tasks should be restarted to recover the failed task > > cbc357ccb763df2852fee8c4fc7d55f2_0. > > 2020-07-23 14:01:17,913 INFO > > org.apache.flink.runtime.executiongraph.ExecutionGraph [] - Job > > JobTest (99a030d0e3f428490a501c0132f27a56) switched from state RUNNING to > > FAILING. 
org.apache.flink.runtime.JobException: Recovery is suppressed by NoRestartBackoffTimeStrategy
	at org.apache.flink.runtime.executiongraph.failover.flip1.ExecutionFailureHandler.handleFailure(ExecutionFailureHandler.java:116) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
	at org.apache.flink.runtime.executiongraph.failover.flip1.ExecutionFailureHandler.getFailureHandlingResult(ExecutionFailureHandler.java:78) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
	at org.apache.flink.runtime.scheduler.DefaultScheduler.handleTaskFailure(DefaultScheduler.java:192) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
	at org.apache.flink.runtime.scheduler.DefaultScheduler.maybeHandleTaskFailure(DefaultScheduler.java:185) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
	at org.apache.flink.runtime.scheduler.DefaultScheduler.updateTaskExecutionStateInternal(DefaultScheduler.java:179) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
	at org.apache.flink.runtime.scheduler.SchedulerBase.updateTaskExecutionState(SchedulerBase.java:503) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
	at org.apache.flink.runtime.scheduler.UpdateSchedulerNgOnInternalFailuresListener.notifyTaskFailure(UpdateSchedulerNgOnInternalFailuresListener.java:49) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
	at org.apache.flink.runtime.executiongraph.ExecutionGraph.notifySchedulerNgAboutInternalTaskFailure(ExecutionGraph.java:1710) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
	at org.apache.flink.runtime.executiongraph.Execution.processFail(Execution.java:1287) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
	at org.apache.flink.runtime.executiongraph.Execution.processFail(Execution.java:1255) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
	at org.apache.flink.runtime.executiongraph.Execution.markFailed(Execution.java:1086) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
	at org.apache.flink.runtime.executiongraph.ExecutionVertex.markFailed(ExecutionVertex.java:748) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
	at org.apache.flink.runtime.scheduler.DefaultExecutionVertexOperations.markFailed(DefaultExecutionVertexOperations.java:41) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
	at org.apache.flink.runtime.scheduler.DefaultScheduler.handleTaskDeploymentFailure(DefaultScheduler.java:435) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
	at org.apache.flink.runtime.scheduler.DefaultScheduler.lambda$assignResourceOrHandleError$6(DefaultScheduler.java:422) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
	at java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:836) ~[?:1.8.0_242]
	at java.util.concurrent.CompletableFuture$UniHandle.tryFire(CompletableFuture.java:811) ~[?:1.8.0_242]
	at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488) ~[?:1.8.0_242]
	at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1990) ~[?:1.8.0_242]
	at org.apache.flink.runtime.jobmaster.slotpool.SchedulerImpl.lambda$internalAllocateSlot$0(SchedulerImpl.java:168) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
	at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:774) ~[?:1.8.0_242]
	at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:750) ~[?:1.8.0_242]
	at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488) ~[?:1.8.0_242]
	at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1990) ~[?:1.8.0_242]
	at org.apache.flink.runtime.jobmaster.slotpool.SlotSharingManager$SingleTaskSlot.release(SlotSharingManager.java:726) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
	at org.apache.flink.runtime.jobmaster.slotpool.SlotSharingManager$MultiTaskSlot.release(SlotSharingManager.java:537) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
	at org.apache.flink.runtime.jobmaster.slotpool.SlotSharingManager$MultiTaskSlot.lambda$new$0(SlotSharingManager.java:432) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
	at java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:836) ~[?:1.8.0_242]
	at java.util.concurrent.CompletableFuture$UniHandle.tryFire(CompletableFuture.java:811) ~[?:1.8.0_242]
	at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488) ~[?:1.8.0_242]
	at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1990) ~[?:1.8.0_242]
	at org.apache.flink.runtime.concurrent.FutureUtils.lambda$forwardTo$21(FutureUtils.java:1120) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
	at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:774) ~[?:1.8.0_242]
	at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:750) ~[?:1.8.0_242]
	at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488) ~[?:1.8.0_242]
	at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1990) ~[?:1.8.0_242]
	at org.apache.flink.runtime.concurrent.FutureUtils$Timeout.run(FutureUtils.java:1036) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
	at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRunAsync(AkkaRpcActor.java:402) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
	at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:195) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
	at org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.handleRpcMessage(FencedAkkaRpcActor.java:74) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
	at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleMessage(AkkaRpcActor.java:152) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
	at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:26) [flink-dist_2.11-1.11.1.jar:1.11.1]
	at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:21) [flink-dist_2.11-1.11.1.jar:1.11.1]
	at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123) [flink-dist_2.11-1.11.1.jar:1.11.1]
	at akka.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:21) [flink-dist_2.11-1.11.1.jar:1.11.1]
	at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:170) [flink-dist_2.11-1.11.1.jar:1.11.1]
	at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171) [flink-dist_2.11-1.11.1.jar:1.11.1]
	at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171) [flink-dist_2.11-1.11.1.jar:1.11.1]
	at akka.actor.Actor$class.aroundReceive(Actor.scala:517) [flink-dist_2.11-1.11.1.jar:1.11.1]
	at akka.actor.AbstractActor.aroundReceive(AbstractActor.scala:225) [flink-dist_2.11-1.11.1.jar:1.11.1]
	at akka.actor.ActorCell.receiveMessage(ActorCell.scala:592) [flink-dist_2.11-1.11.1.jar:1.11.1]
	at akka.actor.ActorCell.invoke(ActorCell.scala:561) [flink-dist_2.11-1.11.1.jar:1.11.1]
	at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:258) [flink-dist_2.11-1.11.1.jar:1.11.1]
	at akka.dispatch.Mailbox.run(Mailbox.scala:225) [flink-dist_2.11-1.11.1.jar:1.11.1]
	at akka.dispatch.Mailbox.exec(Mailbox.scala:235) [flink-dist_2.11-1.11.1.jar:1.11.1]
	at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) [flink-dist_2.11-1.11.1.jar:1.11.1]
	at akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) [flink-dist_2.11-1.11.1.jar:1.11.1]
	at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) [flink-dist_2.11-1.11.1.jar:1.11.1]
	at akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) [flink-dist_2.11-1.11.1.jar:1.11.1]
Caused by: org.apache.flink.runtime.jobmanager.scheduler.NoResourceAvailableException: Could not allocate the required slot within slot request timeout. Please make sure that the cluster has enough resources.
	at org.apache.flink.runtime.scheduler.DefaultScheduler.maybeWrapWithNoResourceAvailableException(DefaultScheduler.java:441) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
	... 45 more
Caused by: java.util.concurrent.CompletionException: java.util.concurrent.TimeoutException
	at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292) ~[?:1.8.0_242]
	at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:308) ~[?:1.8.0_242]
	at java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:607) ~[?:1.8.0_242]
	at java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:591) ~[?:1.8.0_242]
	... 25 more
Caused by: java.util.concurrent.TimeoutException
	... 23 more

2020-07-23 14:01:18,109 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph [] - Discarding the results produced by task execution 1809eb912d69854f2babedeaf879df6a.
2020-07-23 14:01:18,110 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph [] - Job JobTest (99a030d0e3f428490a501c0132f27a56) switched from state FAILING to FAILED.
2020-07-23 14:01:18,114 INFO  org.apache.flink.runtime.checkpoint.CheckpointCoordinator [] - Stopping checkpoint coordinator for job 99a030d0e3f428490a501c0132f27a56.
2020-07-23 14:01:18,117 INFO  org.apache.flink.runtime.checkpoint.StandaloneCompletedCheckpointStore [] - Shutting down
2020-07-23 14:01:18,118 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph [] - Discarding the results produced by task execution 302ca9640e2d209a543d843f2996ccd2.
2020-07-23 14:01:18,120 INFO  org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Pending slot request [SlotRequestId{15fd2a9565c2b080748c1d1592b1cbbc}] timed out.
2020-07-23 14:01:18,120 INFO  org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Pending slot request [SlotRequestId{8cd72cc16f0e319d915a9a096a1096d7}] timed out.
2020-07-23 14:01:18,120 INFO  org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Pending slot request [SlotRequestId{e7e422409acebdb385014a9634af6a90}] timed out.
2020-07-23 14:01:18,121 INFO  org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Pending slot request [SlotRequestId{cef1af73546ca1fc27ca7a3322e9e815}] timed out.
2020-07-23 14:01:18,121 INFO  org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Pending slot request [SlotRequestId{108fe0b3086567ad79275eccef2fdaf8}] timed out.
2020-07-23 14:01:18,121 INFO  org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Pending slot request [SlotRequestId{265e67985eab7a6dc08024e53bf2708d}] timed out.
2020-07-23 14:01:18,122 INFO  org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Pending slot request [SlotRequestId{7087497a17c441f1a1d6fefcbc7cd0ea}] timed out.
2020-07-23 14:01:18,122 INFO  org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Pending slot request [

2020-07-23 14:01:18,151 INFO  org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - Registering job manager 00000000000000000000000000000000@akka.tcp://flink@flink-jobmanager:6123/user/rpc/jobmanager_2 for job 99a030d0e3f428490a501c0132f27a56.
2020-07-23 14:01:18,157 INFO  org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - Registered job manager 00000000000000000000000000000000@akka.tcp://flink@flink-jobmanager:6123/user/rpc/jobmanager_2 for job 99a030d0e3f428490a501c0132f27a56.
2020-07-23 14:01:18,157 INFO  org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - Registered job manager 00000000000000000000000000000000@akka.tcp://flink@flink-jobmanager:6123/user/rpc/jobmanager_2 for job 99a030d0e3f428490a501c0132f27a56.
2020-07-23 14:01:18,157 INFO  org.apache.flink.runtime.dispatcher.StandaloneDispatcher [] - Job 99a030d0e3f428490a501c0132f27a56 reached globally terminal state FAILED.
2020-07-23 14:01:18,162 INFO  org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - Registered job manager 00000000000000000000000000000000@akka.tcp://flink@flink-jobmanager:6123/user/rpc/jobmanager_2 for job 99a030d0e3f428490a501c0132f27a56.
2020-07-23 14:01:18,162 INFO  org.apache.flink.runtime.jobmaster.JobMaster [] - JobManager successfully registered at ResourceManager, leader id: 00000000000000000000000000000000.
2020-07-23 14:01:18,225 INFO  org.apache.flink.runtime.jobmaster.JobMaster [] - Stopping the JobMaster for job JobTest(99a030d0e3f428490a501c0132f27a56).
2020-07-23 14:01:18,381 INFO  org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Suspending SlotPool.
2020-07-23 14:01:18,382 INFO  org.apache.flink.runtime.jobmaster.JobMaster [] - Close ResourceManager connection 83b1ff14900abfd54418e7fa3efb3f8a: JobManager is shutting down..
2020-07-23 14:01:18,382 INFO  org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Stopping SlotPool.
2020-07-23 14:01:18,382 INFO  org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - Disconnect job manager 00000000000000000000000000000000@akka.tcp://flink@flink-jobmanager:6123/user/rpc/jobmanager_2 for job 99a030d0e3f428490a501c0132f27a56 from the resource manager.
On 07/23/2020 13:26, Yang Wang <[hidden email]> wrote:

Glad your problem is solved, but I don't think the root cause is adding taskmanager-query-state-service.yaml. My cluster works fine without creating that service, and `nslookup {tm_ip_address}` reverse-resolves to the hostname normally. Note that this verification is not resolving a hostname, but reverse-resolving from the IP address.

To answer your two questions:

1. It is not required. I verified that jobs run normally without creating it, regardless of whether the Rest service is exposed as ClusterIP, NodePort, or LoadBalancer.
2. If taskmanager.bind-host is not configured, the two JIRAs [FLINK-15911][FLINK-15154] do not affect the address the TM uses when registering with the RM.

If you want to find the root cause, you would probably need to provide the complete JM/TM logs so they can be analyzed.

Best,
Yang

SmileSmile <[hidden email]> wrote on Thu, Jul 23, 2020 at 11:30 AM:

Hi Yang Wang,

I just tested this in the test environment: the TaskManager IPs cannot be reverse-resolved with nslookup, while the JM's can; the difference between the two is whether a Service exists.

Workaround: I added taskmanager-query-state-service.yaml to the cluster (listed as an optional service in the docs [1]). The "No hostname could be resolved for ip address" messages stop, and after changing NodePort to ClusterIP the job is submitted successfully without the time out; the problem is resolved.

1. Given the above, is this manifest actually required?
2. Flink 1.11 includes [FLINK-15911][FLINK-15154], which support separately configuring the network interface used for local binding and the externally accessible address and port. Is this the change that makes the JM reverse-resolve the IP reported by the TM to a service?

Best!
[1] https://ci.apache.org/projects/flink/flink-docs-release-1.11/ops/deployment/kubernetes.html

On 07/23/2020 10:11, Yang Wang <[hidden email]> wrote:

What I mean is: while the Flink job is running, start a busybox pod in the cluster with the command below, then run `nslookup {ip_address}` inside it to see whether the address resolves normally. If it does not, the problem should be coredns.

kubectl run -i -t busybox --image=busybox --restart=Never

You also need to confirm that the cluster's coredns pods are healthy; they are usually deployed in the kube-system namespace.

Best,
Yang

SmileSmile <[hidden email]> wrote on Wed, Jul 22, 2020 at 7:57 PM:

Hi, Yang Wang!

Very happy to receive your reply. It helped a lot and told me which direction the problem lies in. Let me add some information that may help pin down the root cause further.

Where the JM reports "No hostname could be resolved for ip address xxxxx", the reported IP is the pod-internal IP that k8s assigned to the Flink pod, not the host machine's IP. Where could the problem be?

Best!
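The fallback being discussed is visible in the WARN line itself ("using IP address as host name"): when the reverse lookup fails, Flink keeps the raw IP. A minimal sketch of that reverse-lookup-with-fallback, written in Python purely for illustration (Flink's TaskManagerLocation does the equivalent in Java via InetAddress; the function name here is made up):

```python
import socket

def resolve_or_ip(ip: str) -> str:
    """Reverse-resolve an IP to a hostname; fall back to the IP itself,
    mirroring the 'using IP address as host name' behavior in the WARN log."""
    try:
        hostname, _aliases, _addresses = socket.gethostbyaddr(ip)
        return hostname
    except (socket.herror, socket.gaierror, OSError):
        # No PTR record (e.g. a bare pod IP with no Service): keep the IP.
        return ip

# A pod IP with no reverse record simply comes back unchanged.
print(resolve_or_ip("10.32.160.7"))
```

This is why the message is only a WARN: the cluster still runs with IPs as host names, at the cost of locality-aware input split assignment.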
On 07/22/2020 18:18, Yang Wang <[hidden email]> wrote:

If your log keeps printing "No hostname could be resolved for the IP address", the cluster's coredns probably has a problem: the reverse lookup from IP address back to hostname fails. You can start a busybox pod to verify whether that IP really cannot be resolved; coredns may be at fault.

Best,
Yang

Congxian Qiu <[hidden email]> wrote on Tue, Jul 21, 2020 at 7:29 PM:

Hi,

I'm not sure whether the complete pod logs are visible in a k8s environment, similar to the NM logs on Yarn. If they are, you could look through the pod's complete log for clues.

Best,
Congxian

SmileSmile <[hidden email]> wrote on Tue, Jul 21, 2020 at 3:19 PM:

Hi, Congxian,

Since this is a test environment, HA is not configured. All I can see so far is that the JM prints a large number of "no hostname could be resolved" messages, the JM loses contact, and job submission fails. Setting the JM memory to 10g (jobmanager.memory.process.size: 10240m) behaves exactly the same.

Rolling back to 1.10 in the same environment, the problem does not occur and none of these errors are printed.

Are there any other troubleshooting ideas?

Best!
On 07/16/2020 13:17, Congxian Qiu wrote:

Hi,

If there are no exceptions and GC looks normal, you could check the pod logs; if HA is enabled, also check the zk logs. I once saw a similar symptom on Yarn that turned out to have a different cause, which was found through the NM and zk logs.

Best,
Congxian

SmileSmile <[hidden email]> wrote on Wed, Jul 15, 2020 at 5:20 PM:

Hi Roc,

This does not happen on 1.10.1; it only appears in 1.11. How should I best investigate this?

On 07/15/2020 17:16, Roc Marshal wrote:

Hi, SmileSmile.

I have run into a similar host-resolution problem before; you could troubleshoot from the angle of the k8s pod network mapping. I hope this helps.

Best regards,
Roc Marshal

On 2020-07-15 17:04:18, SmileSmile <[hidden email]> wrote:

Hi,

Flink 1.11, deployed as a kubernetes session: 30 TMs with 4 slots each, job parallelism 120. When the job is submitted, a large number of "No hostname could be resolved for the IP address" messages appear, the JM times out, and submission fails; the web UI also hangs and stops responding.

With WordCount at parallelism 1, a few "no hostname" lines are printed and the job then submits normally; once the parallelism goes up, it times out.

Partial log:

2020-07-15 16:58:46,460 WARN  org.apache.flink.runtime.taskmanager.TaskManagerLocation [] - No hostname could be resolved for the IP address 10.32.160.7, using IP address as host name. Local input split assignment (such as for HDFS files) may be impacted.
2020-07-15 16:58:46,460 WARN  org.apache.flink.runtime.taskmanager.TaskManagerLocation [] - No hostname could be resolved for the IP address 10.44.224.7, using IP address as host name. Local input split assignment (such as for HDFS files) may be impacted.
2020-07-15 16:58:46,461 WARN  org.apache.flink.runtime.taskmanager.TaskManagerLocation [] - No hostname could be resolved for the IP address 10.40.32.9, using IP address as host name. Local input split assignment (such as for HDFS files) may be impacted.

2020-07-15 16:59:10,236 INFO  org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - The heartbeat of JobManager with id 69a0d460de468888a9f41c770d963c0a timed out.
2020-07-15 16:59:10,236 INFO  org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - Disconnect job manager 00000000000000000000000000000000@akka.tcp://flink@flink-jobmanager:6123/user/rpc/jobmanager_2 for job e1554c737e37ed79688a15c746b6e9ef from the resource manager.

How should I deal with this?

Best!
I ran into the same problem: the job could only be submitted normally after starting the taskmanager-query-state-service.yaml service. In addition, I was testing on a locally installed k8s cluster, so if this were a GC problem, starting the TM service or not should make no difference.
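For reference, the optional manifest being discussed looks roughly like the following, based on the Flink 1.11 Kubernetes session docs (the port and labels are the docs' defaults, not values confirmed in this thread; the workaround reported above changes the type from NodePort to ClusterIP):

```yaml
# taskmanager-query-state-service.yaml (sketch; values assumed from the Flink 1.11 docs)
apiVersion: v1
kind: Service
metadata:
  name: flink-taskmanager-query-state
spec:
  type: NodePort   # the workaround reported in this thread uses ClusterIP instead
  ports:
  - name: query-state
    port: 6125
    targetPort: 6125
  selector:
    app: flink
    component: taskmanager
```

The relevant side effect for this thread is that a Service selecting the TM pods gives the cluster DNS something to answer with when the JM reverse-resolves a TM pod IP.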
--
Best,
Matt Wang

On 07/27/2020 15:01, Yang Wang <[hidden email]> wrote:

I suggest first setting heartbeat.timeout to a larger value, and then writing out the gc log to see whether full GC happens frequently and how long each one lasts. From the log you have provided so far, even the in-process JM->RM heartbeat times out, so I still suspect this is GC-related.

env.java.opts.jobmanager: -Xloggc:<LOG_DIR>/jobmanager-gc.log -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=2 -XX:GCLogFileSize=512M

Best,
Yang

SmileSmile <[hidden email]> wrote on Mon, Jul 27, 2020 at 1:50 PM:

Hi, Yang Wang,

The log was too long, so I removed some of the repeated content. I did suspect JM GC at first, but setting the JM memory to 10g behaved exactly the same.

Best

On 07/27/2020 11:36, Yang Wang wrote:

Looking at this job, the root cause of the failure is not "No hostname could be resolved"; that WARNING can be discussed separately (if it really does not occur on 1.10). You can start a standalone cluster locally and see the same WARNING; it does not affect normal use.

The failure cause is that the slot request timed out after 5 minutes. In the log you provided, the span from 2020-07-23 13:55:45,519 to 2020-07-23 13:58:18,037 is blank; you didn't trim that section, did you? During that window the tasks should have started deploying. The log also shows the JM->RM heartbeat timing out, meaning communication within the same process in the same pod timed out, so I suspect the JM was stuck in full GC; please confirm this on your side.

Best,
Yang

SmileSmile <[hidden email]> wrote on Thu, Jul 23, 2020 at 2:43 PM:

Hi Yang Wang,

First, my environment versions: kubernetes 1.17.4, CNI: weave. Points 1-3 below are my questions; point 4 is the JM log.

1. Without taskmanager-query-state-service.yaml it indeed does not work. nslookup:

kubectl exec -it busybox2 -- /bin/sh
/ # nslookup 10.47.96.2
Server:    10.96.0.10
Address:   10.96.0.10:53

** server can't find 2.96.47.10.in-addr.arpa: NXDOMAIN

2. On the "detail subtasks taskmanagers" page, the host column on 1.11 shows 172-20-0-50, where 1.10 showed flink-taskmanager-7b5d6958b6-sfzlk:36459. What changed here? (This cluster runs both 1.10 and 1.11, and 1.10 runs fine. If coredns were broken, shouldn't 1.10 show the same problem?)

3. Does coredns need any special configuration? Resolving domain names inside the containers works; only reverse resolution fails when there is no service. Is there anything coredns needs configured?

4. JM log at the time of the time out:

2020-07-23 13:53:00,228 INFO  org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - ResourceManager akka.tcp://flink@flink-jobmanager:6123/user/rpc/resourcemanager_0 was granted leadership with fencing token 00000000000000000000000000000000
2020-07-23 13:53:00,232 INFO  org.apache.flink.runtime.rpc.akka.AkkaRpcService [] - Starting RPC endpoint for org.apache.flink.runtime.dispatcher.StandaloneDispatcher at akka://flink/user/rpc/dispatcher_1 .
2020-07-23 13:53:00,233 INFO  org.apache.flink.runtime.resourcemanager.slotmanager.SlotManagerImpl [] - Starting the SlotManager.
2020-07-23 13:53:03,472 INFO  org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - Registering TaskManager with ResourceID 1f9ae0cd95a28943a73be26323588696 (akka.tcp://flink@10.34.128.9:6122/user/rpc/taskmanager_0) at ResourceManager
2020-07-23 13:53:03,777 INFO  org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - Registering TaskManager with ResourceID cac09e751264e61615329c20713a84b4 (akka.tcp://flink@10.32.160.6:6122/user/rpc/taskmanager_0) at ResourceManager
2020-07-23 13:53:03,787 INFO  org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - Registering TaskManager with ResourceID 93c72d01d09f9ae427c5fc980ed4c1e4 (akka.tcp://flink@10.39.0.8:6122/user/rpc/taskmanager_0) at ResourceManager
2020-07-23 13:53:04,044 INFO  org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - Registering TaskManager with ResourceID 8adf2f8e81b77a16d5418a9e252c61e2 (akka.tcp://flink@10.38.64.7:6122/user/rpc/taskmanager_0) at ResourceManager
2020-07-23 13:53:04,099 INFO  org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - Registering TaskManager with ResourceID 23e9d2358f6eb76b9ae718d879d4f330 (akka.tcp://flink@10.42.160.6:6122/user/rpc/taskmanager_0) at ResourceManager
2020-07-23 13:53:04,146 INFO  org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - Registering TaskManager with ResourceID 092f8dee299e32df13db3111662b61f8 (akka.tcp://flink@10.33.192.14:6122/user/rpc/taskmanager_0) at ResourceManager
2020-07-23 13:55:44,220 INFO  org.apache.flink.runtime.dispatcher.StandaloneDispatcher [] - Received JobGraph submission 99a030d0e3f428490a501c0132f27a56 (JobTest).
2020-07-23 13:55:44,222 INFO  org.apache.flink.runtime.dispatcher.StandaloneDispatcher [] - Submitting job 99a030d0e3f428490a501c0132f27a56 (JobTest).
2020-07-23 13:55:44,251 INFO org.apache.flink.runtime.rpc.akka.AkkaRpcService [] - Starting RPC endpoint for org.apache.flink.runtime.jobmaster.JobMaster at akka://flink/user/rpc/jobmanager_2 . 2020-07-23 13:55:44,260 INFO org.apache.flink.runtime.jobmaster.JobMaster [] - Initializing job JobTest (99a030d0e3f428490a501c0132f27a56). 2020-07-23 13:55:44,278 INFO org.apache.flink.runtime.jobmaster.JobMaster [] - Using restart back off time strategy NoRestartBackoffTimeStrategy for JobTest (99a030d0e3f428490a501c0132f27a56). 2020-07-23 13:55:44,319 INFO org.apache.flink.runtime.jobmaster.JobMaster [] - Running initialization on master for job JobTest (99a030d0e3f428490a501c0132f27a56). 2020-07-23 13:55:44,319 INFO org.apache.flink.runtime.jobmaster.JobMaster [] - Successfully ran initialization on master in 0 ms. 2020-07-23 13:55:44,428 INFO org.apache.flink.runtime.scheduler.adapter.DefaultExecutionTopology [] - Built 1 pipelined regions in 25 ms 2020-07-23 13:55:44,437 INFO org.apache.flink.runtime.jobmaster.JobMaster [] - Loading state backend via factory org.apache.flink.contrib.streaming.state.RocksDBStateBackendFactory 2020-07-23 13:55:44,456 INFO org.apache.flink.contrib.streaming.state.RocksDBStateBackend [] - Using predefined options: DEFAULT. 2020-07-23 13:55:44,457 INFO org.apache.flink.contrib.streaming.state.RocksDBStateBackend [] - Using default options factory: DefaultConfigurableOptionsFactory{configuredOptions={}}. 2020-07-23 13:55:44,466 WARN org.apache.flink.runtime.util.HadoopUtils [] - Could not find Hadoop configuration via any of the supported methods (Flink configuration, environment variables). 2020-07-23 13:55:45,276 INFO org.apache.flink.runtime.jobmaster.JobMaster [] - Using failover strategy org.apache.flink.runtime.executiongraph.failover.flip1.RestartPipelinedRegionFailoverStrategy@72bd8533 for JobTest (99a030d0e3f428490a501c0132f27a56). 
2020-07-23 13:55:45,280 INFO org.apache.flink.runtime.jobmaster.JobManagerRunnerImpl [] - JobManager runner for job JobTest (99a030d0e3f428490a501c0132f27a56) was granted leadership with session id 00000000-0000-0000-0000-000000000000 at akka.tcp://flink@flink-jobmanager:6123/user/rpc/jobmanager_2. 2020-07-23 13:55:45,286 INFO org.apache.flink.runtime.jobmaster.JobMaster [] - Starting scheduling with scheduling strategy [org.apache.flink.runtime.scheduler.strategy.EagerSchedulingStrategy] 2020-07-23 13:55:45,436 INFO org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Cannot serve slot request, no ResourceManager connected. Adding as pending request [SlotRequestId{e092b12b96b0a98bbf057e71b9705c23}] 2020-07-23 13:55:45,436 INFO org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Cannot serve slot request, no ResourceManager connected. Adding as pending request [SlotRequestId{4ad15f417716c9e07fca383990c0f52a}] 2020-07-23 13:55:45,436 INFO org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Cannot serve slot request, no ResourceManager connected. Adding as pending request [SlotRequestId{345fdb427a893b7fc3f4f040f93445d2}] 2020-07-23 13:55:45,437 INFO org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Cannot serve slot request, no ResourceManager connected. Adding as pending request [SlotRequestId{e559485ea7b0b7e17367816882538d90}] 2020-07-23 13:55:45,437 INFO org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Cannot serve slot request, no ResourceManager connected. Adding as pending request [SlotRequestId{7be8f6c1aedb27b04e7feae68078685c}] 2020-07-23 13:55:45,437 INFO org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Cannot serve slot request, no ResourceManager connected. Adding as pending request [SlotRequestId{582a86197884206652dff3aea2306bb3}] 2020-07-23 13:55:45,437 INFO org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Cannot serve slot request, no ResourceManager connected. 
Adding as pending request [SlotRequestId{0cc24260eda3af299a0b321feefaf2cb}] 2020-07-23 13:55:45,437 INFO org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Cannot serve slot request, no ResourceManager connected. Adding as pending request [SlotRequestId{240ca6f3d3b5ece6a98243ec8cadf616}] 2020-07-23 13:55:45,438 INFO org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Cannot serve slot request, no ResourceManager connected. Adding as pending request [SlotRequestId{c35033d598a517acc108424bb9f809fb}] 2020-07-23 13:55:45,438 INFO org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Cannot serve slot request, no ResourceManager connected. Adding as pending request [SlotRequestId{ad35013c3b532d4b4df1be62395ae0cf}] 2020-07-23 13:55:45,438 INFO org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Cannot serve slot request, no ResourceManager connected. Adding as pending request [SlotRequestId{c929bd5e8daf432d01fad1ece3daec1a}] 2020-07-23 13:55:45,487 INFO org.apache.flink.runtime.jobmaster.JobMaster [] - Connecting to ResourceManager akka.tcp://flink@flink-jobmanager :6123/user/rpc/resourcemanager_*(00000000000000000000000000000000) 2020-07-23 13:55:45,492 INFO org.apache.flink.runtime.jobmaster.JobMaster [] - Resolved ResourceManager address, beginning registration 2020-07-23 13:55:45,493 INFO org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - Registering job manager 00000000000000000000000000000000 @akka.tcp://flink@flink-jobmanager:6123/user/rpc/jobmanager_2 for job 99a030d0e3f428490a501c0132f27a56. 2020-07-23 13:55:45,499 INFO org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - Registered job manager 00000000000000000000000000000000 @akka.tcp://flink@flink-jobmanager:6123/user/rpc/jobmanager_2 for job 99a030d0e3f428490a501c0132f27a56. 
2020-07-23 13:55:45,501 INFO org.apache.flink.runtime.jobmaster.JobMaster [] - JobManager successfully registered at ResourceManager, leader id: 00000000000000000000000000000000. 2020-07-23 13:55:45,501 INFO org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Requesting new slot [SlotRequestId{15fd2a9565c2b080748c1d1592b1cbbc}] and profile ResourceProfile{UNKNOWN} from resource manager. 2020-07-23 13:55:45,502 INFO org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - Request slot with profile ResourceProfile{UNKNOWN} for job 99a030d0e3f428490a501c0132f27a56 with allocation id d420d08bf2654d9ea76955c70db18b69. 2020-07-23 13:55:45,502 INFO org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Requesting new slot [SlotRequestId{8cd72cc16f0e319d915a9a096a1096d7}] and profile ResourceProfile{UNKNOWN} from resource manager. 2020-07-23 13:55:45,503 INFO org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Requesting new slot [SlotRequestId{e7e422409acebdb385014a9634af6a90}] and profile ResourceProfile{UNKNOWN} from resource manager. 2020-07-23 13:55:45,503 INFO org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Requesting new slot [SlotRequestId{cef1af73546ca1fc27ca7a3322e9e815}] and profile ResourceProfile{UNKNOWN} from resource manager. 2020-07-23 13:55:45,503 INFO org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Requesting new slot [SlotRequestId{108fe0b3086567ad79275eccef2fdaf8}] and profile ResourceProfile{UNKNOWN} from resource manager. 2020-07-23 13:55:45,503 INFO org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Requesting new slot [SlotRequestId{265e67985eab7a6dc08024e53bf2708d}] and profile ResourceProfile{UNKNOWN} from resource manager. 2020-07-23 13:55:45,503 INFO org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Requesting new slot [SlotRequestId{7087497a17c441f1a1d6fefcbc7cd0ea}] and profile ResourceProfile{UNKNOWN} from resource manager. 
2020-07-23 13:55:45,503 INFO org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Requesting new slot [SlotRequestId{14ac08438e79c8db8d25d93b99d62725}] and profile ResourceProfile{UNKNOWN} from resource manager. 2020-07-23 13:55:45,514 INFO org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - Request slot with profile ResourceProfile{UNKNOWN} for job 99a030d0e3f428490a501c0132f27a56 with allocation id fce526bbe3e1be91caa3e4b536b20e35. 2020-07-23 13:55:45,514 INFO org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Requesting new slot [SlotRequestId{40c7abbb12514c405323b0569fb21647}] and profile ResourceProfile{UNKNOWN} from resource manager. 2020-07-23 13:55:45,514 INFO org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Requesting new slot [SlotRequestId{a4985a9647b65b30a571258b45c8f2ce}] and profile ResourceProfile{UNKNOWN} from resource manager. 2020-07-23 13:55:45,515 INFO org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Requesting new slot [SlotRequestId{c52a6eb2fa58050e71e7903590019fd1}] and profile ResourceProfile{UNKNOWN} from resource manager. 2020-07-23 13:55:45,517 INFO org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - Request slot with profile ResourceProfile{UNKNOWN} for job 99a030d0e3f428490a501c0132f27a56 with allocation id 18ac7ec802ebfcfed8c05ee9324a55a4. 2020-07-23 13:55:45,518 INFO org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - Request slot with profile ResourceProfile{UNKNOWN} for job 99a030d0e3f428490a501c0132f27a56 with allocation id 7ec76cbe689eb418b63599e90ade19be. 2020-07-23 13:55:45,518 INFO org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Requesting new slot [SlotRequestId{46d65692a8b5aad11b51f9a74a666a74}] and profile ResourceProfile{UNKNOWN} from resource manager. 
2020-07-23 13:55:45,518 INFO org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Requesting new slot [SlotRequestId{3670bb4f345eedf941cc18e477ba1e9d}] and profile ResourceProfile{UNKNOWN} from resource manager. 2020-07-23 13:55:45,518 INFO org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Requesting new slot [SlotRequestId{4a12467d76b9e3df8bc3412c0be08e14}] and profile ResourceProfile{UNKNOWN} from resource manager. 2020-07-23 13:55:45,518 INFO org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Requesting new slot [SlotRequestId{e092b12b96b0a98bbf057e71b9705c23}] and profile ResourceProfile{UNKNOWN} from resource manager. 2020-07-23 13:55:45,518 INFO org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Requesting new slot [SlotRequestId{4ad15f417716c9e07fca383990c0f52a}] and profile ResourceProfile{UNKNOWN} from resource manager. 2020-07-23 13:55:45,518 INFO org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Requesting new slot [SlotRequestId{345fdb427a893b7fc3f4f040f93445d2}] and profile ResourceProfile{UNKNOWN} from resource manager. 2020-07-23 13:55:45,519 INFO org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Requesting new slot [SlotRequestId{e559485ea7b0b7e17367816882538d90}] and profile ResourceProfile{UNKNOWN} from resource manager. 2020-07-23 13:55:45,519 INFO org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - Request slot with profile ResourceProfile{UNKNOWN} for job 99a030d0e3f428490a501c0132f27a56 with allocation id b78837a29b4032924ac25be70ed21a3c. 2020-07-23 13:58:18,037 WARN org.apache.flink.runtime.taskmanager.TaskManagerLocation [] - No hostname could be resolved for the IP address 10.47.96.2, using IP address as host name. Local input split assignment (such as for HDFS files) may be impacted. 
2020-07-23 13:58:22,192 WARN org.apache.flink.runtime.taskmanager.TaskManagerLocation [] - No hostname could be resolved for the IP address 10.34.64.14, using IP address as host name. Local input split assignment (such as for HDFS files) may be impacted. 2020-07-23 13:58:22,358 WARN org.apache.flink.runtime.taskmanager.TaskManagerLocation [] - No hostname could be resolved for the IP address 10.34.128.9, using IP address as host name. Local input split assignment (such as for HDFS files) may be impacted. 2020-07-23 13:58:24,562 WARN org.apache.flink.runtime.taskmanager.TaskManagerLocation [] - No hostname could be resolved for the IP address 10.32.160.6, using IP address as host name. Local input split assignment (such as for HDFS files) may be impacted. 2020-07-23 13:58:25,487 WARN org.apache.flink.runtime.taskmanager.TaskManagerLocation [] - No hostname could be resolved for the IP address 10.38.64.7, using IP address as host name. Local input split assignment (such as for HDFS files) may be impacted. 2020-07-23 13:58:27,636 WARN org.apache.flink.runtime.taskmanager.TaskManagerLocation [] - No hostname could be resolved for the IP address 10.42.160.6, using IP address as host name. Local input split assignment (such as for HDFS files) may be impacted. 2020-07-23 13:58:27,767 WARN org.apache.flink.runtime.taskmanager.TaskManagerLocation [] - No hostname could be resolved for the IP address 10.43.64.12, using IP address as host name. Local input split assignment (such as for HDFS files) may be impacted. 2020-07-23 13:58:29,651 INFO org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - The heartbeat of JobManager with id 456a18b6c404cb11a359718e16de1c6b timed out. 2020-07-23 13:58:29,651 INFO org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - Disconnect job manager 00000000000000000000000000000000 @akka.tcp://flink@flink-jobmanager:6123/user/rpc/jobmanager_2 for job 99a030d0e3f428490a501c0132f27a56 from the resource manager. 
2020-07-23 13:58:29,854 WARN org.apache.flink.runtime.taskmanager.TaskManagerLocation [] - No hostname could be resolved for the IP address 10.39.0.8, using IP address as host name. Local input split assignment (such as for HDFS files) may be impacted. 2020-07-23 13:58:33,623 WARN org.apache.flink.runtime.taskmanager.TaskManagerLocation [] - No hostname could be resolved for the IP address 10.35.0.10, using IP address as host name. Local input split assignment (such as for HDFS files) may be impacted. 2020-07-23 13:58:35,756 WARN org.apache.flink.runtime.taskmanager.TaskManagerLocation [] - No hostname could be resolved for the IP address 10.36.32.8, using IP address as host name. Local input split assignment (such as for HDFS files) may be impacted. 2020-07-23 13:58:36,694 WARN org.apache.flink.runtime.taskmanager.TaskManagerLocation [] - No hostname could be resolved for the IP address 10.42.128.6, using IP address as host name. Local input split assignment (such as for HDFS files) may be impacted. 2020-07-23 14:01:17,814 INFO org.apache.flink.runtime.jobmaster.JobMaster [] - Close ResourceManager connection 83b1ff14900abfd54418e7fa3efb3f8a: The heartbeat of JobManager with id 456a18b6c404cb11a359718e16de1c6b timed out.. 2020-07-23 14:01:17,815 INFO org.apache.flink.runtime.jobmaster.JobMaster [] - Connecting to ResourceManager akka.tcp://flink@flink-jobmanager :6123/user/rpc/resourcemanager_*(00000000000000000000000000000000) 2020-07-23 14:01:17,816 INFO org.apache.flink.runtime.jobmaster.JobMaster [] - Resolved ResourceManager address, beginning registration 2020-07-23 14:01:17,816 INFO org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - Registering job manager 00000000000000000000000000000000 @akka.tcp://flink@flink-jobmanager:6123/user/rpc/jobmanager_2 for job 99a030d0e3f428490a501c0132f27a56. 
2020-07-23 14:01:17,836 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph [] - Source: host_relation -> Timestamps/Watermarks -> Map (1/1) (302ca9640e2d209a543d843f2996ccd2) switched from SCHEDULED to FAILED on not deployed. org.apache.flink.runtime.jobmanager.scheduler.NoResourceAvailableException: Could not allocate the required slot within slot request timeout. Please make sure that the cluster has enough resources. at org.apache.flink.runtime.scheduler.DefaultScheduler.maybeWrapWithNoResourceAvailableException(DefaultScheduler.java:441) ~[flink-dist_2.11-1.11.1.jar:1.11.1] at org.apache.flink.runtime.scheduler.DefaultScheduler.lambda$assignResourceOrHandleError$6(DefaultScheduler.java:422) ~[flink-dist_2.11-1.11.1.jar:1.11.1] at java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:836) ~[?:1.8.0_242] at java.util.concurrent.CompletableFuture$UniHandle.tryFire(CompletableFuture.java:811) ~[?:1.8.0_242] at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488) ~[?:1.8.0_242] at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1990) ~[?:1.8.0_242] at org.apache.flink.runtime.jobmaster.slotpool.SchedulerImpl.lambda$internalAllocateSlot$0(SchedulerImpl.java:168) ~[flink-dist_2.11-1.11.1.jar:1.11.1] at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:774) ~[?:1.8.0_242] at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:750) ~[?:1.8.0_242] at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488) ~[?:1.8.0_242] at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1990) ~[?:1.8.0_242] at org.apache.flink.runtime.jobmaster.slotpool.SlotSharingManager$SingleTaskSlot.release(SlotSharingManager.java:726) ~[flink-dist_2.11-1.11.1.jar:1.11.1] at org.apache.flink.runtime.jobmaster.slotpool.SlotSharingManager$MultiTaskSlot.release(SlotSharingManager.java:537) 
~[flink-dist_2.11-1.11.1.jar:1.11.1] at org.apache.flink.runtime.jobmaster.slotpool.SlotSharingManager$MultiTaskSlot.lambda$new$0(SlotSharingManager.java:432) ~[flink-dist_2.11-1.11.1.jar:1.11.1] at java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:836) ~[?:1.8.0_242] at java.util.concurrent.CompletableFuture$UniHandle.tryFire(CompletableFuture.java:811) ~[?:1.8.0_242] at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488) ~[?:1.8.0_242] at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1990) ~[?:1.8.0_242] at org.apache.flink.runtime.concurrent.FutureUtils.lambda$forwardTo$21(FutureUtils.java:1120) ~[flink-dist_2.11-1.11.1.jar:1.11.1] at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:774) ~[?:1.8.0_242] at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:750) ~[?:1.8.0_242] at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488) ~[?:1.8.0_242] at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1990) ~[?:1.8.0_242] at org.apache.flink.runtime.concurrent.FutureUtils$Timeout.run(FutureUtils.java:1036) ~[flink-dist_2.11-1.11.1.jar:1.11.1] at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRunAsync(AkkaRpcActor.java:402) ~[flink-dist_2.11-1.11.1.jar:1.11.1] at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:195) ~[flink-dist_2.11-1.11.1.jar:1.11.1] at org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.handleRpcMessage(FencedAkkaRpcActor.java:74) ~[flink-dist_2.11-1.11.1.jar:1.11.1] at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleMessage(AkkaRpcActor.java:152) ~[flink-dist_2.11-1.11.1.jar:1.11.1] at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:26) [flink-dist_2.11-1.11.1.jar:1.11.1] at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:21) [flink-dist_2.11-1.11.1.jar:1.11.1] at 
scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123) [flink-dist_2.11-1.11.1.jar:1.11.1] at akka.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:21) [flink-dist_2.11-1.11.1.jar:1.11.1] at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:170) [flink-dist_2.11-1.11.1.jar:1.11.1] at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171) [flink-dist_2.11-1.11.1.jar:1.11.1] at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171) [flink-dist_2.11-1.11.1.jar:1.11.1] at akka.actor.Actor$class.aroundReceive(Actor.scala:517) [flink-dist_2.11-1.11.1.jar:1.11.1] at akka.actor.AbstractActor.aroundReceive(AbstractActor.scala:225) [flink-dist_2.11-1.11.1.jar:1.11.1] at akka.actor.ActorCell.receiveMessage(ActorCell.scala:592) [flink-dist_2.11-1.11.1.jar:1.11.1] at akka.actor.ActorCell.invoke(ActorCell.scala:561) [flink-dist_2.11-1.11.1.jar:1.11.1] at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:258) [flink-dist_2.11-1.11.1.jar:1.11.1] at akka.dispatch.Mailbox.run(Mailbox.scala:225) [flink-dist_2.11-1.11.1.jar:1.11.1] at akka.dispatch.Mailbox.exec(Mailbox.scala:235) [flink-dist_2.11-1.11.1.jar:1.11.1] at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) [flink-dist_2.11-1.11.1.jar:1.11.1] at akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) [flink-dist_2.11-1.11.1.jar:1.11.1] at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) [flink-dist_2.11-1.11.1.jar:1.11.1] at akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) [flink-dist_2.11-1.11.1.jar:1.11.1] Caused by: java.util.concurrent.CompletionException: java.util.concurrent.TimeoutException at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292) ~[?:1.8.0_242] at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:308) ~[?:1.8.0_242] at java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:607) 
~[?:1.8.0_242] at java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:591) ~[?:1.8.0_242] ... 25 more Caused by: java.util.concurrent.TimeoutException ... 23 more 2020-07-23 14:01:17,848 INFO org.apache.flink.runtime.executiongraph.failover.flip1.RestartPipelinedRegionFailoverStrategy [] - Calculating tasks to restart to recover the failed task cbc357ccb763df2852fee8c4fc7d55f2_0. 2020-07-23 14:01:17,910 INFO org.apache.flink.runtime.executiongraph.failover.flip1.RestartPipelinedRegionFailoverStrategy [] - 902 tasks should be restarted to recover the failed task cbc357ccb763df2852fee8c4fc7d55f2_0. 2020-07-23 14:01:17,913 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph [] - Job JobTest (99a030d0e3f428490a501c0132f27a56) switched from state RUNNING to FAILING. org.apache.flink.runtime.JobException: Recovery is suppressed by NoRestartBackoffTimeStrategy at org.apache.flink.runtime.executiongraph.failover.flip1.ExecutionFailureHandler.handleFailure(ExecutionFailureHandler.java:116) ~[flink-dist_2.11-1.11.1.jar:1.11.1] at org.apache.flink.runtime.executiongraph.failover.flip1.ExecutionFailureHandler.getFailureHandlingResult(ExecutionFailureHandler.java:78) ~[flink-dist_2.11-1.11.1.jar:1.11.1] at org.apache.flink.runtime.scheduler.DefaultScheduler.handleTaskFailure(DefaultScheduler.java:192) ~[flink-dist_2.11-1.11.1.jar:1.11.1] at org.apache.flink.runtime.scheduler.DefaultScheduler.maybeHandleTaskFailure(DefaultScheduler.java:185) ~[flink-dist_2.11-1.11.1.jar:1.11.1] at org.apache.flink.runtime.scheduler.DefaultScheduler.updateTaskExecutionStateInternal(DefaultScheduler.java:179) ~[flink-dist_2.11-1.11.1.jar:1.11.1] at org.apache.flink.runtime.scheduler.SchedulerBase.updateTaskExecutionState(SchedulerBase.java:503) ~[flink-dist_2.11-1.11.1.jar:1.11.1] at org.apache.flink.runtime.scheduler.UpdateSchedulerNgOnInternalFailuresListener.notifyTaskFailure(UpdateSchedulerNgOnInternalFailuresListener.java:49) 
~[flink-dist_2.11-1.11.1.jar:1.11.1] at org.apache.flink.runtime.executiongraph.ExecutionGraph.notifySchedulerNgAboutInternalTaskFailure(ExecutionGraph.java:1710) ~[flink-dist_2.11-1.11.1.jar:1.11.1] at org.apache.flink.runtime.executiongraph.Execution.processFail(Execution.java:1287) ~[flink-dist_2.11-1.11.1.jar:1.11.1] at org.apache.flink.runtime.executiongraph.Execution.processFail(Execution.java:1255) ~[flink-dist_2.11-1.11.1.jar:1.11.1] at org.apache.flink.runtime.executiongraph.Execution.markFailed(Execution.java:1086) ~[flink-dist_2.11-1.11.1.jar:1.11.1] at org.apache.flink.runtime.executiongraph.ExecutionVertex.markFailed(ExecutionVertex.java:748) ~[flink-dist_2.11-1.11.1.jar:1.11.1] at org.apache.flink.runtime.scheduler.DefaultExecutionVertexOperations.markFailed(DefaultExecutionVertexOperations.java:41) ~[flink-dist_2.11-1.11.1.jar:1.11.1] at org.apache.flink.runtime.scheduler.DefaultScheduler.handleTaskDeploymentFailure(DefaultScheduler.java:435) ~[flink-dist_2.11-1.11.1.jar:1.11.1] at org.apache.flink.runtime.scheduler.DefaultScheduler.lambda$assignResourceOrHandleError$6(DefaultScheduler.java:422) ~[flink-dist_2.11-1.11.1.jar:1.11.1] at java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:836) ~[?:1.8.0_242] at java.util.concurrent.CompletableFuture$UniHandle.tryFire(CompletableFuture.java:811) ~[?:1.8.0_242] at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488) ~[?:1.8.0_242] at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1990) ~[?:1.8.0_242] at org.apache.flink.runtime.jobmaster.slotpool.SchedulerImpl.lambda$internalAllocateSlot$0(SchedulerImpl.java:168) ~[flink-dist_2.11-1.11.1.jar:1.11.1] at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:774) ~[?:1.8.0_242] at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:750) ~[?:1.8.0_242] at 
java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488) ~[?:1.8.0_242] at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1990) ~[?:1.8.0_242] at org.apache.flink.runtime.jobmaster.slotpool.SlotSharingManager$SingleTaskSlot.release(SlotSharingManager.java:726) ~[flink-dist_2.11-1.11.1.jar:1.11.1] at org.apache.flink.runtime.jobmaster.slotpool.SlotSharingManager$MultiTaskSlot.release(SlotSharingManager.java:537) ~[flink-dist_2.11-1.11.1.jar:1.11.1] at org.apache.flink.runtime.jobmaster.slotpool.SlotSharingManager$MultiTaskSlot.lambda$new$0(SlotSharingManager.java:432) ~[flink-dist_2.11-1.11.1.jar:1.11.1] at java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:836) ~[?:1.8.0_242] at java.util.concurrent.CompletableFuture$UniHandle.tryFire(CompletableFuture.java:811) ~[?:1.8.0_242] at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488) ~[?:1.8.0_242] at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1990) ~[?:1.8.0_242] at org.apache.flink.runtime.concurrent.FutureUtils.lambda$forwardTo$21(FutureUtils.java:1120) ~[flink-dist_2.11-1.11.1.jar:1.11.1] at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:774) ~[?:1.8.0_242] at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:750) ~[?:1.8.0_242] at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488) ~[?:1.8.0_242] at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1990) ~[?:1.8.0_242] at org.apache.flink.runtime.concurrent.FutureUtils$Timeout.run(FutureUtils.java:1036) ~[flink-dist_2.11-1.11.1.jar:1.11.1] at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRunAsync(AkkaRpcActor.java:402) ~[flink-dist_2.11-1.11.1.jar:1.11.1] at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:195) ~[flink-dist_2.11-1.11.1.jar:1.11.1] at 
org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.handleRpcMessage(FencedAkkaRpcActor.java:74) ~[flink-dist_2.11-1.11.1.jar:1.11.1] at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleMessage(AkkaRpcActor.java:152) ~[flink-dist_2.11-1.11.1.jar:1.11.1] at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:26) [flink-dist_2.11-1.11.1.jar:1.11.1] at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:21) [flink-dist_2.11-1.11.1.jar:1.11.1] at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123) [flink-dist_2.11-1.11.1.jar:1.11.1] at akka.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:21) [flink-dist_2.11-1.11.1.jar:1.11.1] at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:170) [flink-dist_2.11-1.11.1.jar:1.11.1] at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171) [flink-dist_2.11-1.11.1.jar:1.11.1] at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171) [flink-dist_2.11-1.11.1.jar:1.11.1] at akka.actor.Actor$class.aroundReceive(Actor.scala:517) [flink-dist_2.11-1.11.1.jar:1.11.1] at akka.actor.AbstractActor.aroundReceive(AbstractActor.scala:225) [flink-dist_2.11-1.11.1.jar:1.11.1] at akka.actor.ActorCell.receiveMessage(ActorCell.scala:592) [flink-dist_2.11-1.11.1.jar:1.11.1] at akka.actor.ActorCell.invoke(ActorCell.scala:561) [flink-dist_2.11-1.11.1.jar:1.11.1] at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:258) [flink-dist_2.11-1.11.1.jar:1.11.1] at akka.dispatch.Mailbox.run(Mailbox.scala:225) [flink-dist_2.11-1.11.1.jar:1.11.1] at akka.dispatch.Mailbox.exec(Mailbox.scala:235) [flink-dist_2.11-1.11.1.jar:1.11.1] at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) [flink-dist_2.11-1.11.1.jar:1.11.1] at akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) [flink-dist_2.11-1.11.1.jar:1.11.1] at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) [flink-dist_2.11-1.11.1.jar:1.11.1] at 
akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) [flink-dist_2.11-1.11.1.jar:1.11.1] Caused by: org.apache.flink.runtime.jobmanager.scheduler.NoResourceAvailableException: Could not allocate the required slot within slot request timeout. Please make sure that the cluster has enough resources. at org.apache.flink.runtime.scheduler.DefaultScheduler.maybeWrapWithNoResourceAvailableException(DefaultScheduler.java:441) ~[flink-dist_2.11-1.11.1.jar:1.11.1] ... 45 more Caused by: java.util.concurrent.CompletionException: java.util.concurrent.TimeoutException at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292) ~[?:1.8.0_242] at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:308) ~[?:1.8.0_242] at java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:607) ~[?:1.8.0_242] at java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:591) ~[?:1.8.0_242] ... 25 more Caused by: java.util.concurrent.TimeoutException ... 23 more 2020-07-23 14:01:18,109 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph [] - Discarding the results produced by task execution 1809eb912d69854f2babedeaf879df6a. 2020-07-23 14:01:18,110 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph [] - Job JobTest (99a030d0e3f428490a501c0132f27a56) switched from state FAILING to FAILED. 
org.apache.flink.runtime.JobException: Recovery is suppressed by NoRestartBackoffTimeStrategy
    at org.apache.flink.runtime.executiongraph.failover.flip1.ExecutionFailureHandler.handleFailure(ExecutionFailureHandler.java:116) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
    at org.apache.flink.runtime.executiongraph.failover.flip1.ExecutionFailureHandler.getFailureHandlingResult(ExecutionFailureHandler.java:78) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
    at org.apache.flink.runtime.scheduler.DefaultScheduler.handleTaskFailure(DefaultScheduler.java:192) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
    at org.apache.flink.runtime.scheduler.DefaultScheduler.maybeHandleTaskFailure(DefaultScheduler.java:185) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
    at org.apache.flink.runtime.scheduler.DefaultScheduler.updateTaskExecutionStateInternal(DefaultScheduler.java:179) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
    at org.apache.flink.runtime.scheduler.SchedulerBase.updateTaskExecutionState(SchedulerBase.java:503) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
    at org.apache.flink.runtime.scheduler.UpdateSchedulerNgOnInternalFailuresListener.notifyTaskFailure(UpdateSchedulerNgOnInternalFailuresListener.java:49) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
    at org.apache.flink.runtime.executiongraph.ExecutionGraph.notifySchedulerNgAboutInternalTaskFailure(ExecutionGraph.java:1710) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
    at org.apache.flink.runtime.executiongraph.Execution.processFail(Execution.java:1287) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
    at org.apache.flink.runtime.executiongraph.Execution.processFail(Execution.java:1255) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
    at org.apache.flink.runtime.executiongraph.Execution.markFailed(Execution.java:1086) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
    at org.apache.flink.runtime.executiongraph.ExecutionVertex.markFailed(ExecutionVertex.java:748) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
    at org.apache.flink.runtime.scheduler.DefaultExecutionVertexOperations.markFailed(DefaultExecutionVertexOperations.java:41) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
    at org.apache.flink.runtime.scheduler.DefaultScheduler.handleTaskDeploymentFailure(DefaultScheduler.java:435) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
    at org.apache.flink.runtime.scheduler.DefaultScheduler.lambda$assignResourceOrHandleError$6(DefaultScheduler.java:422) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
    at java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:836) ~[?:1.8.0_242]
    at java.util.concurrent.CompletableFuture$UniHandle.tryFire(CompletableFuture.java:811) ~[?:1.8.0_242]
    at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488) ~[?:1.8.0_242]
    at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1990) ~[?:1.8.0_242]
    at org.apache.flink.runtime.jobmaster.slotpool.SchedulerImpl.lambda$internalAllocateSlot$0(SchedulerImpl.java:168) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
    at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:774) ~[?:1.8.0_242]
    at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:750) ~[?:1.8.0_242]
    at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488) ~[?:1.8.0_242]
    at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1990) ~[?:1.8.0_242]
    at org.apache.flink.runtime.jobmaster.slotpool.SlotSharingManager$SingleTaskSlot.release(SlotSharingManager.java:726) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
    at org.apache.flink.runtime.jobmaster.slotpool.SlotSharingManager$MultiTaskSlot.release(SlotSharingManager.java:537) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
    at org.apache.flink.runtime.jobmaster.slotpool.SlotSharingManager$MultiTaskSlot.lambda$new$0(SlotSharingManager.java:432) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
    at java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:836) ~[?:1.8.0_242]
    at java.util.concurrent.CompletableFuture$UniHandle.tryFire(CompletableFuture.java:811) ~[?:1.8.0_242]
    at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488) ~[?:1.8.0_242]
    at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1990) ~[?:1.8.0_242]
    at org.apache.flink.runtime.concurrent.FutureUtils.lambda$forwardTo$21(FutureUtils.java:1120) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
    at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:774) ~[?:1.8.0_242]
    at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:750) ~[?:1.8.0_242]
    at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488) ~[?:1.8.0_242]
    at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1990) ~[?:1.8.0_242]
    at org.apache.flink.runtime.concurrent.FutureUtils$Timeout.run(FutureUtils.java:1036) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
    at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRunAsync(AkkaRpcActor.java:402) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
    at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:195) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
    at org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.handleRpcMessage(FencedAkkaRpcActor.java:74) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
    at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleMessage(AkkaRpcActor.java:152) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
    at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:26) [flink-dist_2.11-1.11.1.jar:1.11.1]
    at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:21) [flink-dist_2.11-1.11.1.jar:1.11.1]
    at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123) [flink-dist_2.11-1.11.1.jar:1.11.1]
    at akka.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:21) [flink-dist_2.11-1.11.1.jar:1.11.1]
    at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:170) [flink-dist_2.11-1.11.1.jar:1.11.1]
    at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171) [flink-dist_2.11-1.11.1.jar:1.11.1]
    at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171) [flink-dist_2.11-1.11.1.jar:1.11.1]
    at akka.actor.Actor$class.aroundReceive(Actor.scala:517) [flink-dist_2.11-1.11.1.jar:1.11.1]
    at akka.actor.AbstractActor.aroundReceive(AbstractActor.scala:225) [flink-dist_2.11-1.11.1.jar:1.11.1]
    at akka.actor.ActorCell.receiveMessage(ActorCell.scala:592) [flink-dist_2.11-1.11.1.jar:1.11.1]
    at akka.actor.ActorCell.invoke(ActorCell.scala:561) [flink-dist_2.11-1.11.1.jar:1.11.1]
    at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:258) [flink-dist_2.11-1.11.1.jar:1.11.1]
    at akka.dispatch.Mailbox.run(Mailbox.scala:225) [flink-dist_2.11-1.11.1.jar:1.11.1]
    at akka.dispatch.Mailbox.exec(Mailbox.scala:235) [flink-dist_2.11-1.11.1.jar:1.11.1]
    at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) [flink-dist_2.11-1.11.1.jar:1.11.1]
    at akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) [flink-dist_2.11-1.11.1.jar:1.11.1]
    at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) [flink-dist_2.11-1.11.1.jar:1.11.1]
    at akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) [flink-dist_2.11-1.11.1.jar:1.11.1]
Caused by: org.apache.flink.runtime.jobmanager.scheduler.NoResourceAvailableException: Could not allocate the required slot within slot request timeout. Please make sure that the cluster has enough resources.
    at org.apache.flink.runtime.scheduler.DefaultScheduler.maybeWrapWithNoResourceAvailableException(DefaultScheduler.java:441) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
    ... 45 more
Caused by: java.util.concurrent.CompletionException: java.util.concurrent.TimeoutException
    at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292) ~[?:1.8.0_242]
    at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:308) ~[?:1.8.0_242]
    at java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:607) ~[?:1.8.0_242]
    at java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:591) ~[?:1.8.0_242]
    ... 25 more
Caused by: java.util.concurrent.TimeoutException
    ... 23 more
2020-07-23 14:01:18,114 INFO org.apache.flink.runtime.checkpoint.CheckpointCoordinator [] - Stopping checkpoint coordinator for job 99a030d0e3f428490a501c0132f27a56.
2020-07-23 14:01:18,117 INFO org.apache.flink.runtime.checkpoint.StandaloneCompletedCheckpointStore [] - Shutting down
2020-07-23 14:01:18,118 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph [] - Discarding the results produced by task execution 302ca9640e2d209a543d843f2996ccd2.
2020-07-23 14:01:18,120 INFO org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Pending slot request [SlotRequestId{15fd2a9565c2b080748c1d1592b1cbbc}] timed out.
2020-07-23 14:01:18,120 INFO org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Pending slot request [SlotRequestId{8cd72cc16f0e319d915a9a096a1096d7}] timed out.
2020-07-23 14:01:18,120 INFO org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Pending slot request [SlotRequestId{e7e422409acebdb385014a9634af6a90}] timed out.
2020-07-23 14:01:18,121 INFO org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Pending slot request [SlotRequestId{cef1af73546ca1fc27ca7a3322e9e815}] timed out.
2020-07-23 14:01:18,121 INFO org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Pending slot request [SlotRequestId{108fe0b3086567ad79275eccef2fdaf8}] timed out.
2020-07-23 14:01:18,121 INFO org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Pending slot request [SlotRequestId{265e67985eab7a6dc08024e53bf2708d}] timed out.
2020-07-23 14:01:18,122 INFO org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Pending slot request [SlotRequestId{7087497a17c441f1a1d6fefcbc7cd0ea}] timed out.
2020-07-23 14:01:18,122 INFO org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Pending slot request [
2020-07-23 14:01:18,151 INFO org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - Registering job manager 00000000000000000000000000000000 @akka.tcp://flink@flink-jobmanager:6123/user/rpc/jobmanager_2 for job 99a030d0e3f428490a501c0132f27a56.
2020-07-23 14:01:18,157 INFO org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - Registered job manager 00000000000000000000000000000000 @akka.tcp://flink@flink-jobmanager:6123/user/rpc/jobmanager_2 for job 99a030d0e3f428490a501c0132f27a56.
2020-07-23 14:01:18,157 INFO org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - Registered job manager 00000000000000000000000000000000 @akka.tcp://flink@flink-jobmanager:6123/user/rpc/jobmanager_2 for job 99a030d0e3f428490a501c0132f27a56.
2020-07-23 14:01:18,157 INFO org.apache.flink.runtime.dispatcher.StandaloneDispatcher [] - Job 99a030d0e3f428490a501c0132f27a56 reached globally terminal state FAILED.
2020-07-23 14:01:18,162 INFO org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - Registered job manager 00000000000000000000000000000000 @akka.tcp://flink@flink-jobmanager:6123/user/rpc/jobmanager_2 for job 99a030d0e3f428490a501c0132f27a56.
2020-07-23 14:01:18,162 INFO org.apache.flink.runtime.jobmaster.JobMaster [] - JobManager successfully registered at ResourceManager, leader id: 00000000000000000000000000000000.
2020-07-23 14:01:18,225 INFO org.apache.flink.runtime.jobmaster.JobMaster [] - Stopping the JobMaster for job JobTest(99a030d0e3f428490a501c0132f27a56).
2020-07-23 14:01:18,381 INFO org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Suspending SlotPool.
2020-07-23 14:01:18,382 INFO org.apache.flink.runtime.jobmaster.JobMaster [] - Close ResourceManager connection 83b1ff14900abfd54418e7fa3efb3f8a: JobManager is shutting down..
2020-07-23 14:01:18,382 INFO org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Stopping SlotPool.
2020-07-23 14:01:18,382 INFO org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - Disconnect job manager 00000000000000000000000000000000 @akka.tcp://flink@flink-jobmanager:6123/user/rpc/jobmanager_2 for job 99a030d0e3f428490a501c0132f27a56 from the resource manager.

a511955993 | Email: [hidden email]

On 07/23/2020 13:26, Yang Wang <[hidden email]> wrote:
Glad to hear your problem is solved, but I don't think the root cause is adding taskmanager-query-state-service.yaml. Everything works on my side without creating that service, and `nslookup {tm_ip_address}` reverse-resolves to a hostname normally. Note that the check here is not resolving a hostname, but verifying by reverse-resolving the IP address.

To answer your two questions:
1. It is not required. I verified that the cluster runs jobs fine without creating it, and the rest service works whether it is exposed as ClusterIP, NodePort, or LoadBalancer.
2. If taskmanager.bind-host is not configured, the two JIRAs [FLINK-15911][FLINK-15154] do not affect the address the TM uses when registering with the RM.

To find the root cause, you would probably need to provide the complete JM/TM logs so they can be analyzed.

Best,
Yang

SmileSmile <[hidden email]> wrote on Thu, Jul 23, 2020 at 11:30:
Hi Yang Wang

I just tested this in the test environment: the TaskManager IPs cannot be resolved with nslookup, while the JM IP can; the difference between the two is whether a Service exists.

Workaround: I added taskmanager-query-state-service.yaml to the cluster (an optional service according to the official docs). The "No hostname could be resolved for IP address" warnings no longer flood the log, and after changing NodePort to ClusterIP the job submits successfully without timing out, so the problem is solved.

1. Given the above, is this configuration file actually required?
2.
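For readers following along, the optional TaskManager service being discussed looks roughly like the example in the Flink 1.11 Kubernetes session docs [1]; the name, ports, and labels below are taken from that documented example and may need adjusting to your deployment, and the workaround described above amounts to changing `type: NodePort` to `type: ClusterIP`:

```yaml
# taskmanager-query-state-service.yaml (sketch of the documented optional service)
apiVersion: v1
kind: Service
metadata:
  name: flink-taskmanager-query-state
spec:
  type: NodePort    # the workaround above changes this to ClusterIP
  ports:
  - name: query-state
    port: 6125
    targetPort: 6125
    nodePort: 30025
  selector:
    app: flink
    component: taskmanager
```

A Service with a selector also creates Endpoints objects for the matching pods, which appears consistent with the thread's observation that reverse lookup of the pod IPs only succeeds once such a Service exists.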
The 1.11 release notes mention [FLINK-15911][FLINK-15154], which add support for separately configuring the network interface used for local binding and the address and port used for external access. Is it this change that requires the JM to reverse-resolve a Service from the IP reported by the TM?

Best!

[1] https://ci.apache.org/projects/flink/flink-docs-release-1.11/ops/deployment/kubernetes.html

a511955993 | Email: [hidden email]

On 07/23/2020 10:11, Yang Wang <[hidden email]> wrote:
What I meant is: while the Flink job is running, use the command below to start a busybox pod in the cluster, then run `nslookup {ip_address}` inside it to see whether the address resolves normally. If it does not, the problem is likely CoreDNS.

kubectl run -i -t busybox --image=busybox --restart=Never

You also need to confirm that the cluster's CoreDNS pods are healthy; they are usually deployed in the kube-system namespace.

Best,
Yang

SmileSmile <[hidden email]> wrote on Wed, Jul 22, 2020 at 19:57:
Hi, Yang Wang!

Great to receive your reply; it helped a lot and pointed me toward the problem. Let me add some more information that may help pin down the root cause.

Where the JM reports the error "No hostname could be resolved for ip address xxxxx", the reported IP is the internal IP that Kubernetes assigned to the Flink pod, not the host machine's IP. Where might this problem come from?

Best!
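For background on where the WARN originates: Flink's TaskManagerLocation attempts a reverse (PTR) DNS lookup on the TaskManager's IP and, if no hostname can be resolved, falls back to using the bare IP as the host name. A minimal Python sketch of that fallback behavior (the helper name and the injectable `lookup` parameter are illustrative, not Flink's API):

```python
import socket

def taskmanager_host(ip, lookup=socket.gethostbyaddr):
    """Resolve a host name the way the JM log describes:
    try a reverse (PTR) lookup, fall back to the bare IP on failure."""
    try:
        hostname, _aliases, _addrs = lookup(ip)
        return hostname
    except OSError:
        # Corresponds to the WARN: "No hostname could be resolved for the
        # IP address ..., using IP address as host name."
        return ip

# Simulated lookups, standing in for a cluster with and without PTR records:
def with_ptr(ip):
    return ("flink-taskmanager-7b5d6958b6-sfzlk", [], [ip])

def without_ptr(ip):
    raise OSError(1, "Unknown host")  # what a missing PTR record surfaces as

print(taskmanager_host("10.32.160.7", with_ptr))     # flink-taskmanager-7b5d6958b6-sfzlk
print(taskmanager_host("10.32.160.7", without_ptr))  # 10.32.160.7
```

The busybox `nslookup {ip_address}` check suggested above exercises exactly this PTR path from inside the cluster.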
On 07/22/2020 18:18, Yang Wang <[hidden email]> wrote:
If your logs keep flooding with "No hostname could be resolved for the IP address", the cluster's CoreDNS probably has a problem: the reverse lookup from IP address to hostname is failing. You can start a busybox pod to verify whether that IP really cannot be resolved; it may well be a CoreDNS issue.

Best,
Yang

Congxian Qiu <[hidden email]> wrote on Tue, Jul 21, 2020 at 19:29:
Hi
I'm not sure whether a pod's complete logs are visible in a k8s environment, similar to Yarn's NM logs. If they are, try reading that pod's complete logs for anything notable.

Best,
Congxian

SmileSmile <[hidden email]> wrote on Tue, Jul 21, 2020 at 15:19:
Hi, Congxian
Since this is a test environment, HA is not configured. What I see so far is the JM flooding the log with "no hostname could be resolved", the JM losing contact, and job submission failing. Setting the JM memory to 10 GB makes no difference (jobmanager.memory.process.size: 10240m). Rolling back to 1.10 in the same environment, the problem does not occur and the warnings above are not printed. Are there any other troubleshooting ideas?

Best!

On 07/16/2020 13:17, Congxian Qiu wrote:
Hi
If there is no exception and GC looks normal, you could check the pod's logs, and if HA is enabled, the ZooKeeper logs as well. I once saw a similar symptom in a Yarn environment that turned out to have a different cause, which was found by reading the NM and zk logs.

Best,
Congxian

SmileSmile <[hidden email]> wrote on Wed, Jul 15, 2020 at 17:20:
Hi Roc
This symptom does not occur on 1.10.1 and only appears on 1.11. What would be an appropriate way to investigate it?
Hi,
We ran into the same problem: as parallelism increases, the JobManager stays stuck longer and longer, until all the TaskManagers are forced to time out. So far it does not look GC-related; the network is the stronger suspect.

On Fri, Jul 31, 2020 at 7:55 PM Matt Wang <[hidden email]> wrote:
> I hit the same problem: the job could only be submitted normally after starting the taskmanager-query-state-service.yaml service. I was testing on a locally installed k8s cluster; if this were a GC problem, whether or not the TM service is started should make no difference.
>
> --
>
> Best,
> Matt Wang
>
> On 07/27/2020 15:01, Yang Wang<[hidden email]> wrote:
> I suggest first setting heartbeat.timeout to a larger value, then printing the GC log to see whether full GC happens frequently and how long each one lasts. From the log you have provided so far, the JM->RM heartbeat times out even within the same process, so I still suspect this is GC-related.
>
> env.java.opts.jobmanager: -Xloggc:<LOG_DIR>/jobmanager-gc.log
> -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=2 -XX:GCLogFileSize=512M
>
> Best,
> Yang
>
> SmileSmile <[hidden email]> wrote on Mon, Jul 27, 2020 at 13:50:
> Hi, Yang Wang
>
> The log was too long, so I removed some repeated content.
> I suspected JM GC at first, but setting the JM memory to 10 GB made no difference.
>
> Best
>
> On 07/27/2020 11:36, Yang Wang wrote:
> Looking at this job, the root cause of the failure is not "No hostname could be resolved"; the cause of that WARNING can be discussed separately (if it does not exist in 1.10). You can start a standalone cluster locally and see the same WARNING; it does not affect normal use.
>
> The failure is that the slot request timed out after 5 minutes. In the log you provided, the span from 2020-07-23 13:55:45,519 to 2020-07-23 13:58:18,037 is blank; nothing was omitted there, right? That is the window in which the tasks should have started deploying. The log also shows the JM->RM heartbeat timing out, i.e. communication within the same process in the same pod timed out, so I suspect the JM was stuck in full GC the whole time; please confirm this.
>
> Best,
> Yang
>
> SmileSmile <[hidden email]> wrote on Thu, Jul 23, 2020 at 14:43:
> Hi Yang Wang
>
> First, my environment versions:
>
> kubernetes: 1.17.4. CNI: weave
>
> Items 1, 2, and 3 below are my questions; item 4 is the JM log.
>
> 1. After removing taskmanager-query-state-service.yaml, nslookup indeed fails:
>
> kubectl exec -it busybox2 -- /bin/sh
> / # nslookup 10.47.96.2
> Server: 10.96.0.10
> Address: 10.96.0.10:53
>
> ** server can't find 2.96.47.10.in-addr.arpa: NXDOMAIN
>
> 2. Flink 1.11 vs Flink 1.10:
>
> on the "detail subtasks taskmanagers" row of the web UI, 1.11 shows 172-20-0-50, while 1.10 showed flink-taskmanager-7b5d6958b6-sfzlk:36459. What changed here? (This cluster currently runs both 1.10 and 1.11, and 1.10 runs fine; if CoreDNS were broken, Flink 1.10 should show the same symptom, shouldn't it?)
>
> 3. Does CoreDNS need any special configuration?
>
> Resolving domain names inside containers works fine; only reverse resolution fails when there is no Service. Is there anything that needs to be configured in CoreDNS?
>
> 4.
The JM log at the time of the timeout is as follows:
>
> 2020-07-23 13:53:00,228 INFO org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - ResourceManager akka.tcp://flink@flink-jobmanager:6123/user/rpc/resourcemanager_0 was granted leadership with fencing token 00000000000000000000000000000000
> 2020-07-23 13:53:00,232 INFO org.apache.flink.runtime.rpc.akka.AkkaRpcService [] - Starting RPC endpoint for org.apache.flink.runtime.dispatcher.StandaloneDispatcher at akka://flink/user/rpc/dispatcher_1 .
> 2020-07-23 13:53:00,233 INFO org.apache.flink.runtime.resourcemanager.slotmanager.SlotManagerImpl [] - Starting the SlotManager.
> 2020-07-23 13:53:03,472 INFO org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - Registering TaskManager with ResourceID 1f9ae0cd95a28943a73be26323588696 (akka.tcp://flink@10.34.128.9:6122/user/rpc/taskmanager_0) at ResourceManager
> 2020-07-23 13:53:03,777 INFO org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - Registering TaskManager with ResourceID cac09e751264e61615329c20713a84b4 (akka.tcp://flink@10.32.160.6:6122/user/rpc/taskmanager_0) at ResourceManager
> 2020-07-23 13:53:03,787 INFO org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - Registering TaskManager with ResourceID 93c72d01d09f9ae427c5fc980ed4c1e4 (akka.tcp://flink@10.39.0.8:6122/user/rpc/taskmanager_0) at ResourceManager
> 2020-07-23 13:53:04,044 INFO org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - Registering TaskManager with ResourceID 8adf2f8e81b77a16d5418a9e252c61e2 (akka.tcp://flink@10.38.64.7:6122/user/rpc/taskmanager_0) at ResourceManager
> 2020-07-23 13:53:04,099 INFO org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - Registering TaskManager with ResourceID 23e9d2358f6eb76b9ae718d879d4f330 (akka.tcp://flink@10.42.160.6:6122/user/rpc/taskmanager_0) at ResourceManager
> 2020-07-23 13:53:04,146 INFO org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - Registering TaskManager with ResourceID 092f8dee299e32df13db3111662b61f8 (akka.tcp://flink@10.33.192.14:6122/user/rpc/taskmanager_0) at ResourceManager
>
> 2020-07-23 13:55:44,220 INFO org.apache.flink.runtime.dispatcher.StandaloneDispatcher [] - Received JobGraph submission 99a030d0e3f428490a501c0132f27a56 (JobTest).
> 2020-07-23 13:55:44,222 INFO org.apache.flink.runtime.dispatcher.StandaloneDispatcher [] - Submitting job 99a030d0e3f428490a501c0132f27a56 (JobTest).
> 2020-07-23 13:55:44,251 INFO org.apache.flink.runtime.rpc.akka.AkkaRpcService [] - Starting RPC endpoint for org.apache.flink.runtime.jobmaster.JobMaster at akka://flink/user/rpc/jobmanager_2 .
> 2020-07-23 13:55:44,260 INFO org.apache.flink.runtime.jobmaster.JobMaster [] - Initializing job JobTest (99a030d0e3f428490a501c0132f27a56).
> 2020-07-23 13:55:44,278 INFO org.apache.flink.runtime.jobmaster.JobMaster [] - Using restart back off time strategy NoRestartBackoffTimeStrategy for JobTest (99a030d0e3f428490a501c0132f27a56).
> 2020-07-23 13:55:44,319 INFO org.apache.flink.runtime.jobmaster.JobMaster [] - Running initialization on master for job JobTest (99a030d0e3f428490a501c0132f27a56).
> 2020-07-23 13:55:44,319 INFO org.apache.flink.runtime.jobmaster.JobMaster [] - Successfully ran initialization on master in 0 ms.
> 2020-07-23 13:55:44,428 INFO org.apache.flink.runtime.scheduler.adapter.DefaultExecutionTopology [] - Built 1 pipelined regions in 25 ms
> 2020-07-23 13:55:44,437 INFO org.apache.flink.runtime.jobmaster.JobMaster [] - Loading state backend via factory org.apache.flink.contrib.streaming.state.RocksDBStateBackendFactory
> 2020-07-23 13:55:44,456 INFO org.apache.flink.contrib.streaming.state.RocksDBStateBackend [] - Using predefined options: DEFAULT.
> 2020-07-23 13:55:44,457 INFO org.apache.flink.contrib.streaming.state.RocksDBStateBackend [] - Using default options factory: DefaultConfigurableOptionsFactory{configuredOptions={}}.
> 2020-07-23 13:55:44,466 WARN org.apache.flink.runtime.util.HadoopUtils [] - Could not find Hadoop configuration via any of the supported methods (Flink configuration, environment variables).
> 2020-07-23 13:55:45,276 INFO org.apache.flink.runtime.jobmaster.JobMaster [] - Using failover strategy org.apache.flink.runtime.executiongraph.failover.flip1.RestartPipelinedRegionFailoverStrategy@72bd8533 for JobTest (99a030d0e3f428490a501c0132f27a56).
> 2020-07-23 13:55:45,280 INFO org.apache.flink.runtime.jobmaster.JobManagerRunnerImpl [] - JobManager runner for job JobTest (99a030d0e3f428490a501c0132f27a56) was granted leadership with session id 00000000-0000-0000-0000-000000000000 at akka.tcp://flink@flink-jobmanager:6123/user/rpc/jobmanager_2.
> 2020-07-23 13:55:45,286 INFO org.apache.flink.runtime.jobmaster.JobMaster [] - Starting scheduling with scheduling strategy [org.apache.flink.runtime.scheduler.strategy.EagerSchedulingStrategy]
>
> 2020-07-23 13:55:45,436 INFO org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Cannot serve slot request, no ResourceManager connected. Adding as pending request [SlotRequestId{e092b12b96b0a98bbf057e71b9705c23}]
> 2020-07-23 13:55:45,436 INFO org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Cannot serve slot request, no ResourceManager connected. Adding as pending request [SlotRequestId{4ad15f417716c9e07fca383990c0f52a}]
> 2020-07-23 13:55:45,436 INFO org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Cannot serve slot request, no ResourceManager connected. Adding as pending request [SlotRequestId{345fdb427a893b7fc3f4f040f93445d2}]
> 2020-07-23 13:55:45,437 INFO org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Cannot serve slot request, no ResourceManager connected. Adding as pending request [SlotRequestId{e559485ea7b0b7e17367816882538d90}]
> 2020-07-23 13:55:45,437 INFO org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Cannot serve slot request, no ResourceManager connected. Adding as pending request [SlotRequestId{7be8f6c1aedb27b04e7feae68078685c}]
> 2020-07-23 13:55:45,437 INFO org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Cannot serve slot request, no ResourceManager connected. Adding as pending request [SlotRequestId{582a86197884206652dff3aea2306bb3}]
> 2020-07-23 13:55:45,437 INFO org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Cannot serve slot request, no ResourceManager connected. Adding as pending request [SlotRequestId{0cc24260eda3af299a0b321feefaf2cb}]
> 2020-07-23 13:55:45,437 INFO org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Cannot serve slot request, no ResourceManager connected. Adding as pending request [SlotRequestId{240ca6f3d3b5ece6a98243ec8cadf616}]
> 2020-07-23 13:55:45,438 INFO org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Cannot serve slot request, no ResourceManager connected. Adding as pending request [SlotRequestId{c35033d598a517acc108424bb9f809fb}]
> 2020-07-23 13:55:45,438 INFO org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Cannot serve slot request, no ResourceManager connected. Adding as pending request [SlotRequestId{ad35013c3b532d4b4df1be62395ae0cf}]
> 2020-07-23 13:55:45,438 INFO org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Cannot serve slot request, no ResourceManager connected. Adding as pending request [SlotRequestId{c929bd5e8daf432d01fad1ece3daec1a}]
> 2020-07-23 13:55:45,487 INFO org.apache.flink.runtime.jobmaster.JobMaster [] - Connecting to ResourceManager akka.tcp://flink@flink-jobmanager:6123/user/rpc/resourcemanager_*(00000000000000000000000000000000)
> 2020-07-23 13:55:45,492 INFO org.apache.flink.runtime.jobmaster.JobMaster [] - Resolved ResourceManager address, beginning registration
> 2020-07-23 13:55:45,493 INFO org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - Registering job manager 00000000000000000000000000000000 @akka.tcp://flink@flink-jobmanager:6123/user/rpc/jobmanager_2 for job 99a030d0e3f428490a501c0132f27a56.
> 2020-07-23 13:55:45,499 INFO org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - Registered job manager 00000000000000000000000000000000 @akka.tcp://flink@flink-jobmanager:6123/user/rpc/jobmanager_2 for job 99a030d0e3f428490a501c0132f27a56.
> 2020-07-23 13:55:45,501 INFO org.apache.flink.runtime.jobmaster.JobMaster [] - JobManager successfully registered at ResourceManager, leader id: 00000000000000000000000000000000.
> 2020-07-23 13:55:45,501 INFO org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Requesting new slot [SlotRequestId{15fd2a9565c2b080748c1d1592b1cbbc}] and profile ResourceProfile{UNKNOWN} from resource manager.
> 2020-07-23 13:55:45,502 INFO org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - Request slot with profile ResourceProfile{UNKNOWN} for job 99a030d0e3f428490a501c0132f27a56 with allocation id d420d08bf2654d9ea76955c70db18b69.
> 2020-07-23 13:55:45,502 INFO org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Requesting new slot [SlotRequestId{8cd72cc16f0e319d915a9a096a1096d7}] and profile ResourceProfile{UNKNOWN} from resource manager.
> 2020-07-23 13:55:45,503 INFO org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Requesting new slot [SlotRequestId{e7e422409acebdb385014a9634af6a90}] and profile ResourceProfile{UNKNOWN} from resource manager.
> 2020-07-23 13:55:45,503 INFO org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Requesting new slot [SlotRequestId{cef1af73546ca1fc27ca7a3322e9e815}] and profile ResourceProfile{UNKNOWN} from resource manager.
> 2020-07-23 13:55:45,503 INFO org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Requesting new slot [SlotRequestId{108fe0b3086567ad79275eccef2fdaf8}] and profile ResourceProfile{UNKNOWN} from resource manager.
> 2020-07-23 13:55:45,503 INFO org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Requesting new slot [SlotRequestId{265e67985eab7a6dc08024e53bf2708d}] and profile ResourceProfile{UNKNOWN} from resource manager.
> 2020-07-23 13:55:45,503 INFO org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Requesting new slot [SlotRequestId{7087497a17c441f1a1d6fefcbc7cd0ea}] and profile ResourceProfile{UNKNOWN} from resource manager.
> 2020-07-23 13:55:45,503 INFO org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Requesting new slot [SlotRequestId{14ac08438e79c8db8d25d93b99d62725}] and profile ResourceProfile{UNKNOWN} from resource manager.
>
> 2020-07-23 13:55:45,514 INFO org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - Request slot with profile ResourceProfile{UNKNOWN} for job 99a030d0e3f428490a501c0132f27a56 with allocation id fce526bbe3e1be91caa3e4b536b20e35.
> 2020-07-23 13:55:45,514 INFO org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Requesting new slot [SlotRequestId{40c7abbb12514c405323b0569fb21647}] and profile ResourceProfile{UNKNOWN} from resource manager.
> 2020-07-23 13:55:45,514 INFO org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Requesting new slot [SlotRequestId{a4985a9647b65b30a571258b45c8f2ce}] and profile ResourceProfile{UNKNOWN} from resource manager.
> 2020-07-23 13:55:45,515 INFO org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Requesting new slot [SlotRequestId{c52a6eb2fa58050e71e7903590019fd1}] and profile ResourceProfile{UNKNOWN} from resource manager.
>
> 2020-07-23 13:55:45,517 INFO org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - Request slot with profile ResourceProfile{UNKNOWN} for job 99a030d0e3f428490a501c0132f27a56 with allocation id 18ac7ec802ebfcfed8c05ee9324a55a4.
>
> 2020-07-23 13:55:45,518 INFO org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - Request slot with profile ResourceProfile{UNKNOWN} for job 99a030d0e3f428490a501c0132f27a56 with allocation id 7ec76cbe689eb418b63599e90ade19be.
> 2020-07-23 13:55:45,518 INFO org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Requesting new slot [SlotRequestId{46d65692a8b5aad11b51f9a74a666a74}] and profile ResourceProfile{UNKNOWN} from resource manager.
> 2020-07-23 13:55:45,518 INFO org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Requesting new slot [SlotRequestId{3670bb4f345eedf941cc18e477ba1e9d}] and profile ResourceProfile{UNKNOWN} from resource manager.
> 2020-07-23 13:55:45,518 INFO org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Requesting new slot [SlotRequestId{4a12467d76b9e3df8bc3412c0be08e14}] and profile ResourceProfile{UNKNOWN} from resource manager.
> 2020-07-23 13:55:45,518 INFO org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Requesting new slot [SlotRequestId{e092b12b96b0a98bbf057e71b9705c23}] and profile ResourceProfile{UNKNOWN} from resource manager.
> 2020-07-23 13:55:45,518 INFO org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Requesting new slot [SlotRequestId{4ad15f417716c9e07fca383990c0f52a}] and profile ResourceProfile{UNKNOWN} from resource manager.
> 2020-07-23 13:55:45,518 INFO org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Requesting new slot [SlotRequestId{345fdb427a893b7fc3f4f040f93445d2}] and profile ResourceProfile{UNKNOWN} from resource manager.
> 2020-07-23 13:55:45,519 INFO org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Requesting new slot [SlotRequestId{e559485ea7b0b7e17367816882538d90}] and profile ResourceProfile{UNKNOWN} from resource manager.
> 2020-07-23 13:55:45,519 INFO org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - Request slot with profile ResourceProfile{UNKNOWN} for job 99a030d0e3f428490a501c0132f27a56 with allocation id b78837a29b4032924ac25be70ed21a3c.
>
> 2020-07-23 13:58:18,037 WARN org.apache.flink.runtime.taskmanager.TaskManagerLocation [] - No hostname could be resolved for the IP address 10.47.96.2, using IP address as host name. Local input split assignment (such as for HDFS files) may be impacted.
> 2020-07-23 13:58:22,192 WARN org.apache.flink.runtime.taskmanager.TaskManagerLocation [] - No hostname could be resolved for the IP address 10.34.64.14, using IP address as host name. Local input split assignment (such as for HDFS files) may be impacted.
> 2020-07-23 13:58:22,358 WARN org.apache.flink.runtime.taskmanager.TaskManagerLocation [] - No hostname could be resolved for the IP address 10.34.128.9, using IP address as host name. Local input split assignment (such as for HDFS files) may be impacted.
> 2020-07-23 13:58:24,562 WARN org.apache.flink.runtime.taskmanager.TaskManagerLocation [] - No hostname could be resolved for the IP address 10.32.160.6, using IP address as host name. Local input split assignment (such as for HDFS files) may be impacted.
> 2020-07-23 13:58:25,487 WARN org.apache.flink.runtime.taskmanager.TaskManagerLocation [] - No hostname could be resolved for the IP address 10.38.64.7, using IP address as host name. Local input split assignment (such as for HDFS files) may be impacted.
> 2020-07-23 13:58:27,636 WARN org.apache.flink.runtime.taskmanager.TaskManagerLocation [] - No hostname could be resolved for the IP address 10.42.160.6, using IP address as host name. Local input split assignment (such as for HDFS files) may be impacted.
> 2020-07-23 13:58:27,767 WARN org.apache.flink.runtime.taskmanager.TaskManagerLocation [] - No hostname could be resolved for the IP address 10.43.64.12, using IP address as host name. Local input split assignment (such as for HDFS files) may be impacted.
> 2020-07-23 13:58:29,651 INFO org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - The heartbeat of JobManager with id 456a18b6c404cb11a359718e16de1c6b timed out.
> 2020-07-23 13:58:29,651 INFO org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - Disconnect job manager 00000000000000000000000000000000 @akka.tcp://flink@flink-jobmanager:6123/user/rpc/jobmanager_2 for job 99a030d0e3f428490a501c0132f27a56 from the resource manager.
> 2020-07-23 13:58:29,854 WARN org.apache.flink.runtime.taskmanager.TaskManagerLocation [] - No hostname could be resolved for the IP address 10.39.0.8, using IP address as host name. Local input split assignment (such as for HDFS files) may be impacted.
> 2020-07-23 13:58:33,623 WARN org.apache.flink.runtime.taskmanager.TaskManagerLocation [] - No hostname could be resolved for the IP address 10.35.0.10, using IP address as host name. Local input split assignment (such as for HDFS files) may be impacted.
> 2020-07-23 13:58:35,756 WARN org.apache.flink.runtime.taskmanager.TaskManagerLocation [] - No hostname could be resolved for the IP address 10.36.32.8, using IP address as host name. Local input split assignment (such as for HDFS files) may be impacted.
> 2020-07-23 13:58:36,694 WARN org.apache.flink.runtime.taskmanager.TaskManagerLocation [] - No hostname could be resolved for the IP address 10.42.128.6, using IP address as host name. Local input split assignment (such as for HDFS files) may be impacted.
>
> 2020-07-23 14:01:17,814 INFO org.apache.flink.runtime.jobmaster.JobMaster [] - Close ResourceManager connection 83b1ff14900abfd54418e7fa3efb3f8a: The heartbeat of JobManager with id 456a18b6c404cb11a359718e16de1c6b timed out..
> 2020-07-23 14:01:17,815 INFO org.apache.flink.runtime.jobmaster.JobMaster [] - Connecting to ResourceManager akka.tcp://flink@flink-jobmanager:6123/user/rpc/resourcemanager_*(00000000000000000000000000000000)
> 2020-07-23 14:01:17,816 INFO org.apache.flink.runtime.jobmaster.JobMaster [] - Resolved ResourceManager address, beginning registration
> 2020-07-23 14:01:17,816 INFO org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - Registering job manager 00000000000000000000000000000000 @akka.tcp://flink@flink-jobmanager:6123/user/rpc/jobmanager_2 for job 99a030d0e3f428490a501c0132f27a56.
> 2020-07-23 14:01:17,836 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph [] - Source: host_relation -> Timestamps/Watermarks -> Map (1/1) (302ca9640e2d209a543d843f2996ccd2) switched from SCHEDULED to FAILED on not deployed.
> org.apache.flink.runtime.jobmanager.scheduler.NoResourceAvailableException: Could not allocate the required slot within slot request timeout. Please make sure that the cluster has enough resources.
>     at org.apache.flink.runtime.scheduler.DefaultScheduler.maybeWrapWithNoResourceAvailableException(DefaultScheduler.java:441) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
>     at org.apache.flink.runtime.scheduler.DefaultScheduler.lambda$assignResourceOrHandleError$6(DefaultScheduler.java:422) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
>     at java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:836) ~[?:1.8.0_242]
>     at java.util.concurrent.CompletableFuture$UniHandle.tryFire(CompletableFuture.java:811) ~[?:1.8.0_242]
>     at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488) ~[?:1.8.0_242]
>     at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1990) ~[?:1.8.0_242]
>     at org.apache.flink.runtime.jobmaster.slotpool.SchedulerImpl.lambda$internalAllocateSlot$0(SchedulerImpl.java:168) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
>     at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:774) ~[?:1.8.0_242]
>     at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:750) ~[?:1.8.0_242]
>     at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488) ~[?:1.8.0_242]
>     at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1990) ~[?:1.8.0_242]
>     at org.apache.flink.runtime.jobmaster.slotpool.SlotSharingManager$SingleTaskSlot.release(SlotSharingManager.java:726) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
>     at org.apache.flink.runtime.jobmaster.slotpool.SlotSharingManager$MultiTaskSlot.release(SlotSharingManager.java:537) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
>     at org.apache.flink.runtime.jobmaster.slotpool.SlotSharingManager$MultiTaskSlot.lambda$new$0(SlotSharingManager.java:432) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
>     at java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:836) ~[?:1.8.0_242]
>     at java.util.concurrent.CompletableFuture$UniHandle.tryFire(CompletableFuture.java:811) ~[?:1.8.0_242]
>     at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488) ~[?:1.8.0_242]
>     at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1990) ~[?:1.8.0_242]
>     at org.apache.flink.runtime.concurrent.FutureUtils.lambda$forwardTo$21(FutureUtils.java:1120) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
>     at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:774) ~[?:1.8.0_242]
>     at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:750) ~[?:1.8.0_242]
>     at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488) ~[?:1.8.0_242]
>     at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1990) ~[?:1.8.0_242]
>     at org.apache.flink.runtime.concurrent.FutureUtils$Timeout.run(FutureUtils.java:1036) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
>     at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRunAsync(AkkaRpcActor.java:402) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
>     at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:195) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
>     at org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.handleRpcMessage(FencedAkkaRpcActor.java:74) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
>     at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleMessage(AkkaRpcActor.java:152) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
>     at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:26) [flink-dist_2.11-1.11.1.jar:1.11.1]
>     at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:21) [flink-dist_2.11-1.11.1.jar:1.11.1]
>     at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123) [flink-dist_2.11-1.11.1.jar:1.11.1]
>     at akka.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:21) [flink-dist_2.11-1.11.1.jar:1.11.1]
>     at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:170) [flink-dist_2.11-1.11.1.jar:1.11.1]
>     at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171) [flink-dist_2.11-1.11.1.jar:1.11.1]
>     at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171) [flink-dist_2.11-1.11.1.jar:1.11.1]
>     at akka.actor.Actor$class.aroundReceive(Actor.scala:517) [flink-dist_2.11-1.11.1.jar:1.11.1]
>     at akka.actor.AbstractActor.aroundReceive(AbstractActor.scala:225) [flink-dist_2.11-1.11.1.jar:1.11.1]
>     at akka.actor.ActorCell.receiveMessage(ActorCell.scala:592) [flink-dist_2.11-1.11.1.jar:1.11.1]
>     at akka.actor.ActorCell.invoke(ActorCell.scala:561) [flink-dist_2.11-1.11.1.jar:1.11.1]
>     at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:258) [flink-dist_2.11-1.11.1.jar:1.11.1]
>     at akka.dispatch.Mailbox.run(Mailbox.scala:225) [flink-dist_2.11-1.11.1.jar:1.11.1]
>     at akka.dispatch.Mailbox.exec(Mailbox.scala:235) [flink-dist_2.11-1.11.1.jar:1.11.1]
>     at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) [flink-dist_2.11-1.11.1.jar:1.11.1]
>     at akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) [flink-dist_2.11-1.11.1.jar:1.11.1]
>     at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) [flink-dist_2.11-1.11.1.jar:1.11.1]
>     at akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) [flink-dist_2.11-1.11.1.jar:1.11.1]
> Caused by: java.util.concurrent.CompletionException: java.util.concurrent.TimeoutException
>     at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292) ~[?:1.8.0_242]
>     at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:308) ~[?:1.8.0_242]
>     at java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:607) ~[?:1.8.0_242]
>     at java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:591) ~[?:1.8.0_242]
>     ... 25 more
> Caused by: java.util.concurrent.TimeoutException
>     ... 23 more
> 2020-07-23 14:01:17,848 INFO  org.apache.flink.runtime.executiongraph.failover.flip1.RestartPipelinedRegionFailoverStrategy [] - Calculating tasks to restart to recover the failed task cbc357ccb763df2852fee8c4fc7d55f2_0.
> 2020-07-23 14:01:17,910 INFO  org.apache.flink.runtime.executiongraph.failover.flip1.RestartPipelinedRegionFailoverStrategy [] - 902 tasks should be restarted to recover the failed task cbc357ccb763df2852fee8c4fc7d55f2_0.
> 2020-07-23 14:01:17,913 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph [] - Job JobTest (99a030d0e3f428490a501c0132f27a56) switched from state RUNNING to FAILING.
> org.apache.flink.runtime.JobException: Recovery is suppressed by NoRestartBackoffTimeStrategy
>     at org.apache.flink.runtime.executiongraph.failover.flip1.ExecutionFailureHandler.handleFailure(ExecutionFailureHandler.java:116) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
>     at org.apache.flink.runtime.executiongraph.failover.flip1.ExecutionFailureHandler.getFailureHandlingResult(ExecutionFailureHandler.java:78) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
>     at org.apache.flink.runtime.scheduler.DefaultScheduler.handleTaskFailure(DefaultScheduler.java:192) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
>     at org.apache.flink.runtime.scheduler.DefaultScheduler.maybeHandleTaskFailure(DefaultScheduler.java:185) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
>     at org.apache.flink.runtime.scheduler.DefaultScheduler.updateTaskExecutionStateInternal(DefaultScheduler.java:179) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
>     at org.apache.flink.runtime.scheduler.SchedulerBase.updateTaskExecutionState(SchedulerBase.java:503) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
>     at org.apache.flink.runtime.scheduler.UpdateSchedulerNgOnInternalFailuresListener.notifyTaskFailure(UpdateSchedulerNgOnInternalFailuresListener.java:49) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
>     at org.apache.flink.runtime.executiongraph.ExecutionGraph.notifySchedulerNgAboutInternalTaskFailure(ExecutionGraph.java:1710) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
>     at org.apache.flink.runtime.executiongraph.Execution.processFail(Execution.java:1287) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
>     at org.apache.flink.runtime.executiongraph.Execution.processFail(Execution.java:1255) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
>     at org.apache.flink.runtime.executiongraph.Execution.markFailed(Execution.java:1086) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
>     at org.apache.flink.runtime.executiongraph.ExecutionVertex.markFailed(ExecutionVertex.java:748) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
>     at org.apache.flink.runtime.scheduler.DefaultExecutionVertexOperations.markFailed(DefaultExecutionVertexOperations.java:41) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
>     at org.apache.flink.runtime.scheduler.DefaultScheduler.handleTaskDeploymentFailure(DefaultScheduler.java:435) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
>     at org.apache.flink.runtime.scheduler.DefaultScheduler.lambda$assignResourceOrHandleError$6(DefaultScheduler.java:422) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
>     at java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:836) ~[?:1.8.0_242]
>     at java.util.concurrent.CompletableFuture$UniHandle.tryFire(CompletableFuture.java:811) ~[?:1.8.0_242]
>     at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488) ~[?:1.8.0_242]
>     at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1990) ~[?:1.8.0_242]
>     at org.apache.flink.runtime.jobmaster.slotpool.SchedulerImpl.lambda$internalAllocateSlot$0(SchedulerImpl.java:168) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
>     at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:774) ~[?:1.8.0_242]
>     at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:750) ~[?:1.8.0_242]
>     at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488) ~[?:1.8.0_242]
>     at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1990) ~[?:1.8.0_242]
>     at org.apache.flink.runtime.jobmaster.slotpool.SlotSharingManager$SingleTaskSlot.release(SlotSharingManager.java:726) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
>     at org.apache.flink.runtime.jobmaster.slotpool.SlotSharingManager$MultiTaskSlot.release(SlotSharingManager.java:537) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
>     at org.apache.flink.runtime.jobmaster.slotpool.SlotSharingManager$MultiTaskSlot.lambda$new$0(SlotSharingManager.java:432) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
>     at java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:836) ~[?:1.8.0_242]
>     at java.util.concurrent.CompletableFuture$UniHandle.tryFire(CompletableFuture.java:811) ~[?:1.8.0_242]
>     at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488) ~[?:1.8.0_242]
>     at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1990) ~[?:1.8.0_242]
>     at org.apache.flink.runtime.concurrent.FutureUtils.lambda$forwardTo$21(FutureUtils.java:1120) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
>     at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:774) ~[?:1.8.0_242]
>     at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:750) ~[?:1.8.0_242]
>     at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488) ~[?:1.8.0_242]
>     at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1990) ~[?:1.8.0_242]
>     at org.apache.flink.runtime.concurrent.FutureUtils$Timeout.run(FutureUtils.java:1036) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
>     at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRunAsync(AkkaRpcActor.java:402) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
>     at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:195) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
>     at org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.handleRpcMessage(FencedAkkaRpcActor.java:74) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
>     at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleMessage(AkkaRpcActor.java:152) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
>     at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:26) [flink-dist_2.11-1.11.1.jar:1.11.1]
>     at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:21) [flink-dist_2.11-1.11.1.jar:1.11.1]
>     at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123) [flink-dist_2.11-1.11.1.jar:1.11.1]
>     at akka.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:21) [flink-dist_2.11-1.11.1.jar:1.11.1]
>     at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:170) [flink-dist_2.11-1.11.1.jar:1.11.1]
>     at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171) [flink-dist_2.11-1.11.1.jar:1.11.1]
>     at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171) [flink-dist_2.11-1.11.1.jar:1.11.1]
>     at akka.actor.Actor$class.aroundReceive(Actor.scala:517) [flink-dist_2.11-1.11.1.jar:1.11.1]
>     at akka.actor.AbstractActor.aroundReceive(AbstractActor.scala:225) [flink-dist_2.11-1.11.1.jar:1.11.1]
>     at akka.actor.ActorCell.receiveMessage(ActorCell.scala:592) [flink-dist_2.11-1.11.1.jar:1.11.1]
>     at akka.actor.ActorCell.invoke(ActorCell.scala:561) [flink-dist_2.11-1.11.1.jar:1.11.1]
>     at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:258) [flink-dist_2.11-1.11.1.jar:1.11.1]
>     at akka.dispatch.Mailbox.run(Mailbox.scala:225) [flink-dist_2.11-1.11.1.jar:1.11.1]
>     at akka.dispatch.Mailbox.exec(Mailbox.scala:235) [flink-dist_2.11-1.11.1.jar:1.11.1]
>     at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) [flink-dist_2.11-1.11.1.jar:1.11.1]
>     at akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) [flink-dist_2.11-1.11.1.jar:1.11.1]
>     at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) [flink-dist_2.11-1.11.1.jar:1.11.1]
>     at akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) [flink-dist_2.11-1.11.1.jar:1.11.1]
> Caused by: org.apache.flink.runtime.jobmanager.scheduler.NoResourceAvailableException: Could not allocate the required slot within slot request timeout. Please make sure that the cluster has enough resources.
>     at org.apache.flink.runtime.scheduler.DefaultScheduler.maybeWrapWithNoResourceAvailableException(DefaultScheduler.java:441) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
>     ... 45 more
> Caused by: java.util.concurrent.CompletionException: java.util.concurrent.TimeoutException
>     at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292) ~[?:1.8.0_242]
>     at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:308) ~[?:1.8.0_242]
>     at java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:607) ~[?:1.8.0_242]
>     at java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:591) ~[?:1.8.0_242]
>     ... 25 more
> Caused by: java.util.concurrent.TimeoutException
>     ... 23 more
>
> 2020-07-23 14:01:18,109 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph [] - Discarding the results produced by task execution 1809eb912d69854f2babedeaf879df6a.
> 2020-07-23 14:01:18,110 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph [] - Job JobTest (99a030d0e3f428490a501c0132f27a56) switched from state FAILING to FAILED.
> org.apache.flink.runtime.JobException: Recovery is suppressed by NoRestartBackoffTimeStrategy
>     at org.apache.flink.runtime.executiongraph.failover.flip1.ExecutionFailureHandler.handleFailure(ExecutionFailureHandler.java:116) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
>     at org.apache.flink.runtime.executiongraph.failover.flip1.ExecutionFailureHandler.getFailureHandlingResult(ExecutionFailureHandler.java:78) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
>     at org.apache.flink.runtime.scheduler.DefaultScheduler.handleTaskFailure(DefaultScheduler.java:192) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
>     at org.apache.flink.runtime.scheduler.DefaultScheduler.maybeHandleTaskFailure(DefaultScheduler.java:185) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
>     at org.apache.flink.runtime.scheduler.DefaultScheduler.updateTaskExecutionStateInternal(DefaultScheduler.java:179) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
>     at org.apache.flink.runtime.scheduler.SchedulerBase.updateTaskExecutionState(SchedulerBase.java:503) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
>     at org.apache.flink.runtime.scheduler.UpdateSchedulerNgOnInternalFailuresListener.notifyTaskFailure(UpdateSchedulerNgOnInternalFailuresListener.java:49) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
>     at org.apache.flink.runtime.executiongraph.ExecutionGraph.notifySchedulerNgAboutInternalTaskFailure(ExecutionGraph.java:1710) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
>     at org.apache.flink.runtime.executiongraph.Execution.processFail(Execution.java:1287) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
>     at org.apache.flink.runtime.executiongraph.Execution.processFail(Execution.java:1255) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
>     at org.apache.flink.runtime.executiongraph.Execution.markFailed(Execution.java:1086) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
>     at org.apache.flink.runtime.executiongraph.ExecutionVertex.markFailed(ExecutionVertex.java:748) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
>     at org.apache.flink.runtime.scheduler.DefaultExecutionVertexOperations.markFailed(DefaultExecutionVertexOperations.java:41) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
>     at org.apache.flink.runtime.scheduler.DefaultScheduler.handleTaskDeploymentFailure(DefaultScheduler.java:435) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
>     at org.apache.flink.runtime.scheduler.DefaultScheduler.lambda$assignResourceOrHandleError$6(DefaultScheduler.java:422) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
>     at java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:836) ~[?:1.8.0_242]
>     at java.util.concurrent.CompletableFuture$UniHandle.tryFire(CompletableFuture.java:811) ~[?:1.8.0_242]
>     at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488) ~[?:1.8.0_242]
>     at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1990) ~[?:1.8.0_242]
>     at org.apache.flink.runtime.jobmaster.slotpool.SchedulerImpl.lambda$internalAllocateSlot$0(SchedulerImpl.java:168) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
>     at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:774) ~[?:1.8.0_242]
>     at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:750) ~[?:1.8.0_242]
>     at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488) ~[?:1.8.0_242]
>     at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1990) ~[?:1.8.0_242]
>     at org.apache.flink.runtime.jobmaster.slotpool.SlotSharingManager$SingleTaskSlot.release(SlotSharingManager.java:726) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
>     at org.apache.flink.runtime.jobmaster.slotpool.SlotSharingManager$MultiTaskSlot.release(SlotSharingManager.java:537) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
>     at org.apache.flink.runtime.jobmaster.slotpool.SlotSharingManager$MultiTaskSlot.lambda$new$0(SlotSharingManager.java:432) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
>     at java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:836) ~[?:1.8.0_242]
>     at java.util.concurrent.CompletableFuture$UniHandle.tryFire(CompletableFuture.java:811) ~[?:1.8.0_242]
>     at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488) ~[?:1.8.0_242]
>     at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1990) ~[?:1.8.0_242]
>     at org.apache.flink.runtime.concurrent.FutureUtils.lambda$forwardTo$21(FutureUtils.java:1120) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
>     at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:774) ~[?:1.8.0_242]
>     at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:750) ~[?:1.8.0_242]
>     at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488) ~[?:1.8.0_242]
>     at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1990) ~[?:1.8.0_242]
>     at org.apache.flink.runtime.concurrent.FutureUtils$Timeout.run(FutureUtils.java:1036) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
>     at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRunAsync(AkkaRpcActor.java:402) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
>     at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:195) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
>     at org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.handleRpcMessage(FencedAkkaRpcActor.java:74) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
>     at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleMessage(AkkaRpcActor.java:152) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
>     at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:26) [flink-dist_2.11-1.11.1.jar:1.11.1]
>     at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:21) [flink-dist_2.11-1.11.1.jar:1.11.1]
>     at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123) [flink-dist_2.11-1.11.1.jar:1.11.1]
>     at akka.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:21) [flink-dist_2.11-1.11.1.jar:1.11.1]
>     at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:170) [flink-dist_2.11-1.11.1.jar:1.11.1]
>     at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171) [flink-dist_2.11-1.11.1.jar:1.11.1]
>     at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171) [flink-dist_2.11-1.11.1.jar:1.11.1]
>     at akka.actor.Actor$class.aroundReceive(Actor.scala:517) [flink-dist_2.11-1.11.1.jar:1.11.1]
>     at akka.actor.AbstractActor.aroundReceive(AbstractActor.scala:225) [flink-dist_2.11-1.11.1.jar:1.11.1]
>     at akka.actor.ActorCell.receiveMessage(ActorCell.scala:592) [flink-dist_2.11-1.11.1.jar:1.11.1]
>     at akka.actor.ActorCell.invoke(ActorCell.scala:561) [flink-dist_2.11-1.11.1.jar:1.11.1]
>     at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:258) [flink-dist_2.11-1.11.1.jar:1.11.1]
>     at akka.dispatch.Mailbox.run(Mailbox.scala:225) [flink-dist_2.11-1.11.1.jar:1.11.1]
>     at akka.dispatch.Mailbox.exec(Mailbox.scala:235) [flink-dist_2.11-1.11.1.jar:1.11.1]
>     at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) [flink-dist_2.11-1.11.1.jar:1.11.1]
>     at akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) [flink-dist_2.11-1.11.1.jar:1.11.1]
>     at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) [flink-dist_2.11-1.11.1.jar:1.11.1]
>     at akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) [flink-dist_2.11-1.11.1.jar:1.11.1]
> Caused by: org.apache.flink.runtime.jobmanager.scheduler.NoResourceAvailableException: Could not allocate the required slot within slot request timeout. Please make sure that the cluster has enough resources.
>     at org.apache.flink.runtime.scheduler.DefaultScheduler.maybeWrapWithNoResourceAvailableException(DefaultScheduler.java:441) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
>     ... 45 more
> Caused by: java.util.concurrent.CompletionException: java.util.concurrent.TimeoutException
>     at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292) ~[?:1.8.0_242]
>     at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:308) ~[?:1.8.0_242]
>     at java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:607) ~[?:1.8.0_242]
>     at java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:591) ~[?:1.8.0_242]
>     ... 25 more
> Caused by: java.util.concurrent.TimeoutException
>     ... 23 more
> 2020-07-23 14:01:18,114 INFO  org.apache.flink.runtime.checkpoint.CheckpointCoordinator [] - Stopping checkpoint coordinator for job 99a030d0e3f428490a501c0132f27a56.
> 2020-07-23 14:01:18,117 INFO  org.apache.flink.runtime.checkpoint.StandaloneCompletedCheckpointStore [] - Shutting down
> 2020-07-23 14:01:18,118 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph [] - Discarding the results produced by task execution 302ca9640e2d209a543d843f2996ccd2.
> 2020-07-23 14:01:18,120 INFO  org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Pending slot request [SlotRequestId{15fd2a9565c2b080748c1d1592b1cbbc}] timed out.
> 2020-07-23 14:01:18,120 INFO  org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Pending slot request [SlotRequestId{8cd72cc16f0e319d915a9a096a1096d7}] timed out.
> 2020-07-23 14:01:18,120 INFO  org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Pending slot request [SlotRequestId{e7e422409acebdb385014a9634af6a90}] timed out.
> 2020-07-23 14:01:18,121 INFO  org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Pending slot request [SlotRequestId{cef1af73546ca1fc27ca7a3322e9e815}] timed out.
> 2020-07-23 14:01:18,121 INFO  org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Pending slot request [SlotRequestId{108fe0b3086567ad79275eccef2fdaf8}] timed out.
> 2020-07-23 14:01:18,121 INFO  org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Pending slot request [SlotRequestId{265e67985eab7a6dc08024e53bf2708d}] timed out.
> 2020-07-23 14:01:18,122 INFO  org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Pending slot request [SlotRequestId{7087497a17c441f1a1d6fefcbc7cd0ea}] timed out.
> 2020-07-23 14:01:18,122 INFO  org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Pending slot request [
>
> 2020-07-23 14:01:18,151 INFO  org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - Registering job manager 00000000000000000000000000000000@akka.tcp://flink@flink-jobmanager:6123/user/rpc/jobmanager_2 for job 99a030d0e3f428490a501c0132f27a56.
> 2020-07-23 14:01:18,157 INFO  org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - Registered job manager 00000000000000000000000000000000@akka.tcp://flink@flink-jobmanager:6123/user/rpc/jobmanager_2 for job 99a030d0e3f428490a501c0132f27a56.
> 2020-07-23 14:01:18,157 INFO  org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - Registered job manager 00000000000000000000000000000000@akka.tcp://flink@flink-jobmanager:6123/user/rpc/jobmanager_2 for job 99a030d0e3f428490a501c0132f27a56.
> 2020-07-23 14:01:18,157 INFO  org.apache.flink.runtime.dispatcher.StandaloneDispatcher [] - Job 99a030d0e3f428490a501c0132f27a56 reached globally terminal state FAILED.
> 2020-07-23 14:01:18,162 INFO  org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - Registered job manager 00000000000000000000000000000000@akka.tcp://flink@flink-jobmanager:6123/user/rpc/jobmanager_2 for job 99a030d0e3f428490a501c0132f27a56.
> 2020-07-23 14:01:18,162 INFO  org.apache.flink.runtime.jobmaster.JobMaster [] - JobManager successfully registered at ResourceManager, leader id: 00000000000000000000000000000000.
> 2020-07-23 14:01:18,225 INFO  org.apache.flink.runtime.jobmaster.JobMaster [] - Stopping the JobMaster for job JobTest(99a030d0e3f428490a501c0132f27a56).
> 2020-07-23 14:01:18,381 INFO  org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Suspending SlotPool.
> 2020-07-23 14:01:18,382 INFO  org.apache.flink.runtime.jobmaster.JobMaster [] - Close ResourceManager connection 83b1ff14900abfd54418e7fa3efb3f8a: JobManager is shutting down..
> 2020-07-23 14:01:18,382 INFO  org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Stopping SlotPool.
> 2020-07-23 14:01:18,382 INFO  org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - Disconnect job manager 00000000000000000000000000000000@akka.tcp://flink@flink-jobmanager:6123/user/rpc/jobmanager_2 for job 99a030d0e3f428490a501c0132f27a56 from the resource manager.
> On 07/23/2020 13:26, Yang Wang <[hidden email]> wrote:
> Glad your problem is solved, but I don't think the root cause is adding taskmanager-query-state-service.yaml.
> I don't create that service on my side and everything works fine, and nslookup {tm_ip_address} reverse-resolves the IP to a hostname normally.
>
> Note that this is not about resolving a hostname, but about verifying via a reverse lookup of the IP address.
>
> To answer your two questions:
> 1. It is not required. I verified that jobs run normally on my cluster without creating it, whether the rest service is exposed as ClusterIP, NodePort, or LoadBalancer.
> 2. If taskmanager.bind-host is not configured, the two JIRAs [Flink-15911][Flink-15154] do not affect the address the TM uses when registering with the RM.
>
> If you want to find the root cause, you will probably need to provide the complete JM/TM logs so they can be analyzed.
>
> Best,
> Yang
>
> SmileSmile <[hidden email]> wrote on Thu, Jul 23, 2020, at 11:30 AM:
>
> Hi Yang Wang,
>
> I just tested this in our test environment: the TaskManager IPs cannot be resolved with nslookup, while the JM's can. The difference between the two is whether a service exists for them.
>
> Workaround: I added taskmanager-query-state-service.yaml to the cluster (an optional manifest according to the official docs). The "No hostname could be resolved for IP address" warnings stop, and after changing NodePort to ClusterIP the job can be submitted successfully without timing out, so the problem is resolved.
>
> 1. Given the situation above, is this manifest actually required?
> 2. Among the 1.11 changes I see [Flink-15911][Flink-15154], which support separately configuring the network interface used for local binding and the address/port used for external access. Is it this change that requires the JM to reverse-resolve a service from the IP reported by the TM?
>
> Best!
> [1]
> https://ci.apache.org/projects/flink/flink-docs-release-1.11/ops/deployment/kubernetes.html
>
> On 07/23/2020 10:11, Yang Wang <[hidden email]> wrote:
> What I mean is: while the Flink job is running, use the command below to start a busybox pod inside the cluster and run nslookup {ip_address} in it, to see whether the address resolves normally. If it does not, the problem should be coredns.
>
> kubectl run -i -t busybox --image=busybox --restart=Never
>
> You should also confirm that the cluster's coredns pods are healthy; they are usually deployed in the kube-system namespace.
>
> Best,
> Yang
>
> SmileSmile <[hidden email]> wrote on Wed, Jul 22, 2020, at 7:57 PM:
>
> Hi, Yang Wang!
>
> Very happy to get your reply; it helped a lot and showed me which direction to look in. Let me add some information in the hope that it helps pin down the root cause.
>
> Where the JM reports "No hostname could be resolved for IP address xxxxx", the reported IP is the internal IP that k8s assigned to the Flink pod, not the host machine's IP. Where might this problem come from?
>
> Best!
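The nslookup check above exercises the same reverse (PTR) lookup that triggers Flink's warning. As a rough illustration only (not Flink's actual implementation, which is Java code in TaskManagerLocation), the try-reverse-lookup-then-fall-back-to-the-IP behavior can be sketched in Python:

```python
import socket

def host_name_for(ip_address: str) -> str:
    """Try a reverse DNS (PTR) lookup for the given IP address.

    If the lookup fails, fall back to using the IP itself as the
    host name -- the same fallback described by the warning
    "No hostname could be resolved ... using IP address as host name".
    """
    try:
        return socket.gethostbyaddr(ip_address)[0]
    except (socket.herror, socket.gaierror):
        return ip_address

print(host_name_for("127.0.0.1"))  # usually "localhost" via /etc/hosts
```

Inside the cluster, hitting the fallback path for every TM pod IP means coredns serves no PTR record for those IPs, which is exactly what the busybox nslookup test checks.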
> On 07/22/2020 18:18, Yang Wang <[hidden email]> wrote:
> If your log keeps printing "No hostname could be resolved for the IP address", the cluster's coredns probably has a problem: the reverse lookup from IP address to hostname fails. You can start a busybox pod to verify whether that IP really cannot be resolved; coredns may be at fault.
>
> Best,
> Yang
>
> Congxian Qiu <[hidden email]> wrote on Tue, Jul 21, 2020, at 7:29 PM:
>
> Hi,
> I'm not sure whether you can see the complete pod logs in a k8s environment, similar to Yarn's NM logs? If you can, try reading the complete logs of this pod for anything suspicious.
> Best,
> Congxian
>
> SmileSmile <[hidden email]> wrote on Tue, Jul 21, 2020, at 3:19 PM:
>
> Hi, Congxian,
>
> Since this is a test environment, HA is not configured. All I can see so far is that the JM prints a flood of "no hostname could be resolved" warnings, the JM loses contact, and job submission fails.
> Setting the JM memory to 10g gives the same result (jobmanager.memory.process.size: 10240m).
>
> Rolling back to 1.10 in the same environment, the problem does not occur and none of the errors above are printed.
>
> Are there any other troubleshooting ideas?
>
> Best!
>
> On 07/16/2020 13:17, Congxian Qiu wrote:
> Hi,
> If there are no exceptions and GC looks normal, you could check the pod logs; if HA is enabled, also check the zk logs. I once saw a similar symptom in a Yarn environment that was caused by something else, and the cause was found through the NM logs and zk logs.
>
> Best,
> Congxian
>
> SmileSmile <[hidden email]> wrote on Wed, Jul 15, 2020, at 5:20 PM:
>
> Hi Roc,
>
> This does not happen on 1.10.1; it only appears on 1.11. How would you suggest investigating it?
>
> On 07/15/2020 17:16, Roc Marshal wrote:
> Hi, SmileSmile.
> I have personally run into a similar host-resolution problem before; you could investigate from the angle of the k8s pod/node network mapping.
> Hope this helps.
>
> Best wishes,
> Roc Marshal
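For reference, the optional taskmanager-query-state-service.yaml discussed in this thread looks roughly like the manifest below. This is a sketch based on the Flink 1.11 standalone-Kubernetes docs (link [1] above); the port numbers and selector labels here are assumptions to verify against those docs, and per the thread the type would be switched from NodePort to ClusterIP.

```yaml
# Sketch of the optional taskmanager-query-state-service.yaml
# (port/selector values assumed -- verify against the Flink 1.11 docs).
apiVersion: v1
kind: Service
metadata:
  name: flink-taskmanager-query-state
spec:
  type: NodePort   # the thread reports switching this to ClusterIP
  ports:
  - name: query-state
    port: 6125
    targetPort: 6125
  selector:
    app: flink
    component: taskmanager
```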