Apache Flink Chinese User Mailing List

Checkpoint fails after adding a window function; error: Operation category WRITE is not supported in state standby

kingdomad

Checkpoint fails after adding a window function; error: Operation category WRITE is not supported in state standby

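I am using Flink 1.11.1 to consume from Kafka 0.10.1.1, then applying a window to deduplicate and aggregate. The job uses event time and the window size is 1 minute.
The program is structured roughly as follows:
kafkaStream.keyBy(<keyselector>).window(<windowassigner>).aggregate(new AverageAggregate());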
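
The post does not show AverageAggregate itself. As a rough sketch of what such a function might look like, the plain-Java example below keeps a {sum, count} accumulator, which is the per-key, per-window value a window operator holds while aggregating incrementally. SimpleAggregate is a hypothetical stand-in that only mirrors the method shapes of Flink's AggregateFunction (createAccumulator, add, getResult, merge) so the sketch runs without Flink on the classpath; the Long input type and the averageOf helper are also assumptions.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in: mirrors the method shapes of Flink's
// org.apache.flink.api.common.functions.AggregateFunction so this sketch
// compiles without Flink on the classpath.
interface SimpleAggregate<IN, ACC, OUT> {
    ACC createAccumulator();
    ACC add(IN value, ACC accumulator);
    OUT getResult(ACC accumulator);
    ACC merge(ACC a, ACC b);
}

// Assumed shape of AverageAggregate; the accumulator is {sum, count}.
class AverageAggregate implements SimpleAggregate<Long, long[], Double> {
    @Override
    public long[] createAccumulator() {
        return new long[] {0L, 0L};
    }

    @Override
    public long[] add(Long value, long[] acc) {
        // Return a fresh accumulator with the value folded in.
        return new long[] {acc[0] + value, acc[1] + 1};
    }

    @Override
    public Double getResult(long[] acc) {
        // An empty window yields 0.0 instead of dividing by zero.
        return acc[1] == 0 ? 0.0 : (double) acc[0] / acc[1];
    }

    @Override
    public long[] merge(long[] a, long[] b) {
        return new long[] {a[0] + b[0], a[1] + b[1]};
    }
}

class AverageAggregateSketch {
    // Folds the values of one window through the aggregate, one element at a time.
    static double averageOf(List<Long> windowValues) {
        AverageAggregate agg = new AverageAggregate();
        long[] acc = agg.createAccumulator();
        for (Long v : windowValues) {
            acc = agg.add(v, acc);
        }
        return agg.getResult(acc);
    }

    public static void main(String[] args) {
        System.out.println(averageOf(Arrays.asList(2L, 4L, 9L))); // prints 5.0
    }
}
```

In a real job the class would implement Flink's AggregateFunction directly and be passed to aggregate(...) as in the line above.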

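The job runs on YARN.
It runs, but checkpoints never complete. The TaskManager log shows the error below.
I checked, and those nodes are all in the normal RUNNING state. If I remove the window code and simply print kafkaStream, checkpoints complete normally. I cannot see any other problem in the logs and I am stumped. Any help would be appreciated.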

2020-11-18 13:30:52,475 INFO org.apache.kafka.common.utils.AppInfoParser [] - Kafka version : 0.10.2.2
2020-11-18 13:30:52,475 INFO org.apache.kafka.common.utils.AppInfoParser [] - Kafka commitId : cd80bc412b9b9701
2020-11-18 13:31:09,668 INFO org.apache.hadoop.io.retry.RetryInvocationHandler [] - org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException): Operation category WRITE is not supported in state standby. Visit https://s.apache.org/sbnn-error
    at org.apache.hadoop.hdfs.server.namenode.ha.StandbyState.checkOperation(StandbyState.java:88)
    at org.apache.hadoop.hdfs.server.namenode.NameNode$NameNodeHAContext.checkOperation(NameNode.java:1952)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkOperation(FSNamesystem.java:1423)
    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.create(NameNodeRpcServer.java:776)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.create(ClientNamenodeProtocolServerSideTranslatorPB.java:475)
    at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)
    at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:869)
    at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:815)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1685)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2675)
, while invoking ClientNamenodeProtocolTranslatorPB.create over xxx:8020 after 1 failover attempts. Trying to failover after sleeping for 864ms.

--

kingdomad

kingdomad

Re: Checkpoint fails after adding a window function; error: Operation category WRITE is not supported in state standby

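The problem seems to be solved.
With FlinkKafkaConsumer010 from flink-connector-kafka-0.10_2.12, checkpoints fail with this error.
After switching to FlinkKafkaConsumer from flink-connector-kafka-2.12, checkpoints complete normally with no error.
The behavior is the same whether CheckpointingMode is EXACTLY_ONCE or AT_LEAST_ONCE.
I do not yet know the cause.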
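
For readers who want to try the same switch, the dependency change might look like the fragment below. This is only a sketch that assumes a Maven build with Scala 2.12 and the Flink 1.11.1 version mentioned earlier in the thread; on Maven Central the universal connector's artifact ID is written with an underscore, flink-connector-kafka_2.12.

```xml
<!-- Assumption: Maven build, Scala 2.12, Flink 1.11.1 as stated in the thread. -->
<!-- Replaces the flink-connector-kafka-0.10_2.12 dependency, which provides FlinkKafkaConsumer010. -->
<dependency>
  <groupId>org.apache.flink</groupId>
  <artifactId>flink-connector-kafka_2.12</artifactId>
  <version>1.11.1</version>
</dependency>
```

With this dependency the source is constructed as FlinkKafkaConsumer instead of FlinkKafkaConsumer010; the rest of the pipeline stays unchanged.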

--

kingdomad

On 2020-11-18 17:19:29, "kingdomad" <[hidden email]> wrote:

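>I am using Flink 1.11.1 to consume from Kafka 0.10.1.1, then applying a window to deduplicate and aggregate. The job uses event time and the window size is 1 minute.
>The program is structured roughly as follows:
>kafkaStream.keyBy(<keyselector>).window(<windowassigner>).aggregate(new AverageAggregate());
>
>The job runs on YARN.
>It runs, but checkpoints never complete. The TaskManager log shows the error below.
>I checked, and those nodes are all in the normal RUNNING state. If I remove the window code and simply print kafkaStream, checkpoints complete normally. I cannot see any other problem in the logs and I am stumped. Any help would be appreciated.
>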
>2020-11-18 13:30:52,475 INFO org.apache.kafka.common.utils.AppInfoParser [] - Kafka version : 0.10.2.2
>2020-11-18 13:30:52,475 INFO org.apache.kafka.common.utils.AppInfoParser [] - Kafka commitId : cd80bc412b9b9701
>2020-11-18 13:31:09,668 INFO org.apache.hadoop.io.retry.RetryInvocationHandler [] - org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException): Operation category WRITE is not supported in state standby. Visit https://s.apache.org/sbnn-error
>    at org.apache.hadoop.hdfs.server.namenode.ha.StandbyState.checkOperation(StandbyState.java:88)
>    at org.apache.hadoop.hdfs.server.namenode.NameNode$NameNodeHAContext.checkOperation(NameNode.java:1952)
>    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkOperation(FSNamesystem.java:1423)
>    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.create(NameNodeRpcServer.java:776)
>    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.create(ClientNamenodeProtocolServerSideTranslatorPB.java:475)
>    at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523)
>    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)
>    at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:869)
>    at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:815)
>    at java.security.AccessController.doPrivileged(Native Method)
>    at javax.security.auth.Subject.doAs(Subject.java:422)
>    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1685)
>    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2675)
>, while invoking ClientNamenodeProtocolTranslatorPB.create over xxx:8020 after 1 failover attempts. Trying to failover after sleeping for 864ms.
>
>--
>kingdomad