大家好:
环境一些作业出现下面异常,怀疑是Flink JobManager配置HA的问题,请问可能是什么问题? 异常信息: Internal server error., <Exception on server side:\norg.apache.flink.runtime.rpc.exceptions.FencingTokenException: Fencing token not set: Ignoring message LocalFencedMessage(93428232915e7c6e4947aaa5910341a8, LocalRpcInvocation(requestMultipleJobDetails(Time))) sent to akka.tcp://flink@cluster2-host1:40443/user/dispatcher because the fencing token is null. at org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.handleRpcMessage(FencedAkkaRpcActor.java:63) at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.onReceive(AkkaRpcActor.java:147) at org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.onReceive(FencedAkkaRpcActor.java:40) at akka.actor.UntypedActor$$anonfun$receive$1.applyOrElse(UntypedActor.scala:165) at akka.actor.Actor$class.aroundReceive(Actor.scala:502) at akka.actor.UntypedActor.aroundReceive(UntypedActor.scala:95) at akka.actor.ActorCell.receiveMessage(ActorCell.scala:526) at akka.actor.ActorCell.invoke(ActorCell.scala:495) at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:257) at akka.dispatch.Mailbox.run(Mailbox.scala:224) at akka.dispatch.Mailbox.exec(Mailbox.scala:234) at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) End of exception on server side> |
啥 flink 版本啊?1.10 Dispatcher 魔改之后应该不会 null 的
Best, tison. whirly <[hidden email]> 于2020年6月9日周二 下午8:58写道: > 大家好: > 环境一些作业出现下面异常,怀疑是Flink JobManager配置HA的问题,请问可能是什么问题? > > > 异常信息: > Internal server error., > <Exception on server > side:\norg.apache.flink.runtime.rpc.exceptions.FencingTokenException: > Fencing token not set: Ignoring message > LocalFencedMessage(93428232915e7c6e4947aaa5910341a8, > LocalRpcInvocation(requestMultipleJobDetails(Time))) > sent to akka.tcp://flink@cluster2-host1:40443/user/dispatcher because the > fencing token is null. > at > org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.handleRpcMessage(FencedAkkaRpcActor.java:63) > at > org.apache.flink.runtime.rpc.akka.AkkaRpcActor.onReceive(AkkaRpcActor.java:147) > at > org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.onReceive(FencedAkkaRpcActor.java:40) > at > akka.actor.UntypedActor$$anonfun$receive$1.applyOrElse(UntypedActor.scala:165) > at akka.actor.Actor$class.aroundReceive(Actor.scala:502) > at akka.actor.UntypedActor.aroundReceive(UntypedActor.scala:95) > at akka.actor.ActorCell.receiveMessage(ActorCell.scala:526) > at akka.actor.ActorCell.invoke(ActorCell.scala:495) > at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:257) > at akka.dispatch.Mailbox.run(Mailbox.scala:224) > at akka.dispatch.Mailbox.exec(Mailbox.scala:234) > at > scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) > at > scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) > at > scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) > at > scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) > End of exception on server side> |
你可以详细说一下场景,这个我想了一下应该是你选举窗口太长了
0. 某个时候,Dispatcher 选出了 Leader 并发布自己的地址 1. 某个组件向 Dispatcher 发了个消息,你这里前端点击之后后端 WebMonitor 给 Dispatcher 发 requestMultipleJobDetails 消息 2. Dispatcher 跟 zk 链接抖动,丢 leader 了。早期版本会把这个 fencing token 设置成 null 3. 1 里面的消息到达 Dispatcher,Dispatcher 走 fencing token 逻辑,看到是 null 4. 抛出此异常 如果稍后又选举成功,这里的异常应该是 fencing token mismatch 一类的 Best, tison. tison <[hidden email]> 于2020年6月9日周二 下午9:15写道: > 啥 flink 版本啊?1.10 Dispatcher 魔改之后应该不会 null 的 > > Best, > tison. > > > whirly <[hidden email]> 于2020年6月9日周二 下午8:58写道: > >> 大家好: >> 环境一些作业出现下面异常,怀疑是Flink JobManager配置HA的问题,请问可能是什么问题? >> >> >> 异常信息: >> Internal server error., >> <Exception on server >> side:\norg.apache.flink.runtime.rpc.exceptions.FencingTokenException: >> Fencing token not set: Ignoring message >> LocalFencedMessage(93428232915e7c6e4947aaa5910341a8, >> LocalRpcInvocation(requestMultipleJobDetails(Time))) >> sent to akka.tcp://flink@cluster2-host1:40443/user/dispatcher because >> the fencing token is null. >> at >> org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.handleRpcMessage(FencedAkkaRpcActor.java:63) >> at >> org.apache.flink.runtime.rpc.akka.AkkaRpcActor.onReceive(AkkaRpcActor.java:147) >> at >> org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.onReceive(FencedAkkaRpcActor.java:40) >> at >> akka.actor.UntypedActor$$anonfun$receive$1.applyOrElse(UntypedActor.scala:165) >> at akka.actor.Actor$class.aroundReceive(Actor.scala:502) >> at akka.actor.UntypedActor.aroundReceive(UntypedActor.scala:95) >> at akka.actor.ActorCell.receiveMessage(ActorCell.scala:526) >> at akka.actor.ActorCell.invoke(ActorCell.scala:495) >> at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:257) >> at akka.dispatch.Mailbox.run(Mailbox.scala:224) >> at akka.dispatch.Mailbox.exec(Mailbox.scala:234) >> at >> scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) >> at >> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) >> at >> scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) >> at >> scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) >> End of exception on server side> > > |
In reply to this post by tison
Flink 1.8
| | whirly | | 邮箱:[hidden email] | 签名由 网易邮箱大师 定制 在2020年06月09日 21:15,tison 写道: 啥 flink 版本啊?1.10 Dispatcher 魔改之后应该不会 null 的 Best, tison. whirly <[hidden email]> 于2020年6月9日周二 下午8:58写道: > 大家好: > 环境一些作业出现下面异常,怀疑是Flink JobManager配置HA的问题,请问可能是什么问题? > > > 异常信息: > Internal server error., > <Exception on server > side:\norg.apache.flink.runtime.rpc.exceptions.FencingTokenException: > Fencing token not set: Ignoring message > LocalFencedMessage(93428232915e7c6e4947aaa5910341a8, > LocalRpcInvocation(requestMultipleJobDetails(Time))) > sent to akka.tcp://flink@cluster2-host1:40443/user/dispatcher because the > fencing token is null. > at > org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.handleRpcMessage(FencedAkkaRpcActor.java:63) > at > org.apache.flink.runtime.rpc.akka.AkkaRpcActor.onReceive(AkkaRpcActor.java:147) > at > org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.onReceive(FencedAkkaRpcActor.java:40) > at > akka.actor.UntypedActor$$anonfun$receive$1.applyOrElse(UntypedActor.scala:165) > at akka.actor.Actor$class.aroundReceive(Actor.scala:502) > at akka.actor.UntypedActor.aroundReceive(UntypedActor.scala:95) > at akka.actor.ActorCell.receiveMessage(ActorCell.scala:526) > at akka.actor.ActorCell.invoke(ActorCell.scala:495) > at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:257) > at akka.dispatch.Mailbox.run(Mailbox.scala:224) > at akka.dispatch.Mailbox.exec(Mailbox.scala:234) > at > scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) > at > scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) > at > scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) > at > scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) > End of exception on server side> |
噢,那应该就是上面说的问题了
你的 Dispatcher 能被发现说明一开始选主和发布是 ok 的,你可以贴一下 HA 的配置,看看有没特别不靠谱的,然后去日志里找一下丢 leadership 的日志,一般来说前后会有一堆 zk 链接 ConnectionLoss 或者 SessionExpire 的日志 Best, tison. whirly <[hidden email]> 于2020年6月9日周二 下午9:23写道: > Flink 1.8 > > > > > | | > whirly > | > | > 邮箱:[hidden email] > | > > 签名由 网易邮箱大师 定制 > > 在2020年06月09日 21:15,tison 写道: > 啥 flink 版本啊?1.10 Dispatcher 魔改之后应该不会 null 的 > > Best, > tison. > > > whirly <[hidden email]> 于2020年6月9日周二 下午8:58写道: > > > 大家好: > > 环境一些作业出现下面异常,怀疑是Flink JobManager配置HA的问题,请问可能是什么问题? > > > > > > 异常信息: > > Internal server error., > > <Exception on server > > side:\norg.apache.flink.runtime.rpc.exceptions.FencingTokenException: > > Fencing token not set: Ignoring message > > LocalFencedMessage(93428232915e7c6e4947aaa5910341a8, > > LocalRpcInvocation(requestMultipleJobDetails(Time))) > > sent to akka.tcp://flink@cluster2-host1:40443/user/dispatcher because > the > > fencing token is null. > > at > > > org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.handleRpcMessage(FencedAkkaRpcActor.java:63) > > at > > > org.apache.flink.runtime.rpc.akka.AkkaRpcActor.onReceive(AkkaRpcActor.java:147) > > at > > > org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.onReceive(FencedAkkaRpcActor.java:40) > > at > > > akka.actor.UntypedActor$$anonfun$receive$1.applyOrElse(UntypedActor.scala:165) > > at akka.actor.Actor$class.aroundReceive(Actor.scala:502) > > at akka.actor.UntypedActor.aroundReceive(UntypedActor.scala:95) > > at akka.actor.ActorCell.receiveMessage(ActorCell.scala:526) > > at akka.actor.ActorCell.invoke(ActorCell.scala:495) > > at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:257) > > at akka.dispatch.Mailbox.run(Mailbox.scala:224) > > at akka.dispatch.Mailbox.exec(Mailbox.scala:234) > > at > > scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) > > at > > > scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) > > at > > scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) > > at > > > scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) > > End of exception on server side> > |
Free forum by Nabble | Edit this page |