Flink fails to commit offsets

Flink fails to commit offsets

孙福
Versions:
Flink version: flink-1.7.1
Kafka client version: flink-connector-kafka-0.11_2.12
Kafka server version: 2.0.0 or 1.0.1
Checkpointing is not enabled in Flink, the Kafka settings are all defaults, and offsets are stored in Kafka.


The flink-kafka-connector starts with the following parameters:
auto.commit.interval.ms = 5000
        auto.offset.reset = earliest
        bootstrap.servers = [ip:9092]
        check.crcs = true
        client.id =
        connections.max.idle.ms = 540000
        enable.auto.commit = true
        exclude.internal.topics = true
        fetch.max.bytes = 52428800
        fetch.max.wait.ms = 500
        fetch.min.bytes = 1
        group.id = test_user_action_sdb_java2
        heartbeat.interval.ms = 3000
        interceptor.classes = null
        internal.leave.group.on.close = true
        isolation.level = read_uncommitted
        key.deserializer = class org.apache.kafka.common.serialization.StringDeserializer
        max.partition.fetch.bytes = 1048576
        max.poll.interval.ms = 300000
        max.poll.records = 500
        metadata.max.age.ms = 300000
        metric.reporters = []
        metrics.num.samples = 2
        metrics.recording.level = INFO
        metrics.sample.window.ms = 30000
        partition.assignment.strategy = [class org.apache.kafka.clients.consumer.RangeAssignor]
        receive.buffer.bytes = 65536
        reconnect.backoff.max.ms = 1000
        reconnect.backoff.ms = 50
        request.timeout.ms = 305000
        retry.backoff.ms = 100
        sasl.jaas.config = null
        sasl.kerberos.kinit.cmd = /usr/bin/kinit
        sasl.kerberos.min.time.before.relogin = 60000
        sasl.kerberos.service.name = null
        sasl.kerberos.ticket.renew.jitter = 0.05
        sasl.kerberos.ticket.renew.window.factor = 0.8
        sasl.mechanism = GSSAPI
        security.protocol = PLAINTEXT
        send.buffer.bytes = 131072
        session.timeout.ms = 10000
        ssl.cipher.suites = null
        ssl.enabled.protocols = [TLSv1.2, TLSv1.1, TLSv1]
        ssl.endpoint.identification.algorithm = null
        ssl.key.password = null
        ssl.keymanager.algorithm = SunX509
        ssl.keystore.location = null
        ssl.keystore.password = null
        ssl.keystore.type = JKS
        ssl.protocol = TLS
        ssl.provider = null
        ssl.secure.random.implementation = null
        ssl.trustmanager.algorithm = PKIX
        ssl.truststore.location = null
        ssl.truststore.password = null
        ssl.truststore.type = JKS
        value.deserializer = class org.apache.kafka.common.serialization.StringDeserializer
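
A quick sanity check of the timeout values in the dump above, as a minimal Python sketch. The two rules it applies are assumptions taken from the Kafka consumer documentation rather than from this thread: heartbeat.interval.ms is typically kept at or below one third of session.timeout.ms, and older consumer clients expect request.timeout.ms to be larger than both session.timeout.ms and fetch.max.wait.ms. The function name check_timeouts is hypothetical.

```python
# Timeout values (milliseconds) copied from the configuration dump above.
config = {
    "heartbeat.interval.ms": 3000,
    "session.timeout.ms": 10000,
    "max.poll.interval.ms": 300000,
    "request.timeout.ms": 305000,
    "fetch.max.wait.ms": 500,
}


def check_timeouts(cfg):
    """Return a list of warnings; an empty list means every check passed."""
    warnings = []
    # Assumed rule of thumb: heartbeat interval <= 1/3 of the session timeout.
    if cfg["heartbeat.interval.ms"] * 3 > cfg["session.timeout.ms"]:
        warnings.append("heartbeat.interval.ms is more than 1/3 of session.timeout.ms")
    # Assumed rule: the request timeout must exceed the session timeout ...
    if cfg["request.timeout.ms"] <= cfg["session.timeout.ms"]:
        warnings.append("request.timeout.ms is not greater than session.timeout.ms")
    # ... and the maximum time a fetch may wait on the broker.
    if cfg["request.timeout.ms"] <= cfg["fetch.max.wait.ms"]:
        warnings.append("request.timeout.ms is not greater than fetch.max.wait.ms")
    return warnings


print(check_timeouts(config))  # prints []
```

Under these assumed rules the posted values are mutually consistent, so an inconsistent combination of timeouts is probably not the explanation.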


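Symptom:
After running normally for a while, offset commits suddenly stop succeeding for one partition. Consumption and processing continue normally; only the offset commit fails. The problem is hard to reproduce and shows up only occasionally.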


Exception output on the Flink side:
2019-06-03 13:30:56,827 INFO  org.apache.kafka.clients.consumer.internals.AbstractCoordinator  - Marking the coordinator 1.1.1.1:9092 (id: 2147483030 rack: null) dead for group flink-ad-realtime-useraction
2019-06-03 13:30:56,829 WARN  org.apache.kafka.clients.consumer.internals.ConsumerCoordinator  - Auto-commit of offsets {user_action-96=OffsetAndMetadata{offset=3208384842, metadata=''}, user_action-48=OffsetAndMetadata{offset=3204869414, metadata=''}, user_action-0=OffsetAndMetadata{offset=3208651598, metadata=''}, user_action-120=OffsetAndMetadata{offset=3208633960, metadata=''}, user_action-24=OffsetAndMetadata{offset=3205592887, metadata=''}, user_action-72=OffsetAndMetadata{offset=3209105919, metadata=''}} failed for group flink-ad-realtime-useraction: Offset commit failed with a retriable exception. You should retry committing offsets. The underlying error was: The request timed out.
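
To see which partitions a failed auto-commit covered, the WARN line can be parsed mechanically. This is a minimal Python sketch; the regular expression is an assumption based on the layout of this single log line and may need adjusting for other client versions, and the helper name parse_failed_commit is hypothetical.

```python
import re

# Message part of the WARN line above, split across literals for readability.
LOG_LINE = (
    "Auto-commit of offsets {"
    "user_action-96=OffsetAndMetadata{offset=3208384842, metadata=''}, "
    "user_action-48=OffsetAndMetadata{offset=3204869414, metadata=''}, "
    "user_action-0=OffsetAndMetadata{offset=3208651598, metadata=''}, "
    "user_action-120=OffsetAndMetadata{offset=3208633960, metadata=''}, "
    "user_action-24=OffsetAndMetadata{offset=3205592887, metadata=''}, "
    "user_action-72=OffsetAndMetadata{offset=3209105919, metadata=''}} "
    "failed for group flink-ad-realtime-useraction: "
    "Offset commit failed with a retriable exception. "
    "You should retry committing offsets. "
    "The underlying error was: The request timed out."
)

# Each entry looks like <topic>-<partition>=OffsetAndMetadata{offset=<n>, ...
ENTRY = re.compile(r"([\w.-]+)-(\d+)=OffsetAndMetadata\{offset=(\d+)")


def parse_failed_commit(line):
    """Map (topic, partition) -> offset for every entry in the log line."""
    return {(t, int(p)): int(o) for t, p, o in ENTRY.findall(line)}


offsets = parse_failed_commit(LOG_LINE)
for (topic, partition), offset in sorted(offsets.items()):
    print(f"{topic}-{partition}: {offset}")
```

Running the same parser over successive WARN lines would show whether the same set of partitions fails every time.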


Exception output on the Kafka broker side:
None


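We have been investigating this for a long time and still cannot tell what causes the offset commits to fail. Any help would be appreciated.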

Re: Flink fails to commit offsets

龚中强
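You fetch 500 messages in a single poll, and the consumer timed out while processing them, so the coordinator considered the consumer dead.

Fixes:
1. Lower max.poll.records as appropriate.
2. Increase the consumer timeout.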
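
The two suggestions above can be sketched as overrides applied to the posted settings. This is a minimal Python illustration only. The override values are hypothetical, not recommendations, and reading "consumer timeout" as max.poll.interval.ms is an assumption, since the reply does not name the property. The helper names apply_overrides and to_properties are also hypothetical.

```python
# Relevant values copied from the configuration dump in the original post.
current = {
    "max.poll.records": 500,
    "max.poll.interval.ms": 300000,
    "session.timeout.ms": 10000,
}

# Hypothetical overrides: fewer records per poll, longer poll interval.
overrides = {
    "max.poll.records": 100,
    "max.poll.interval.ms": 600000,
}


def apply_overrides(base, changes):
    """Return a new dict with changes applied; reject keys not in base."""
    unknown = set(changes) - set(base)
    if unknown:
        raise KeyError(f"unknown properties: {sorted(unknown)}")
    merged = dict(base)  # copy so the original settings stay untouched
    merged.update(changes)
    return merged


def to_properties(cfg):
    """Render settings as 'key = value' lines, sorted by key."""
    return "\n".join(f"{key} = {value}" for key, value in sorted(cfg.items()))


print(to_properties(apply_overrides(current, overrides)))
```

The rendered lines use the same `key = value` layout as the dump, so they can be compared against the consumer's startup log after a restart.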


------------------ Original message ------------------
From: "孙福" <[hidden email]>;
Sent: Friday, August 16, 2019, 5:27 PM
To: "user-zh" <[hidden email]>;

Subject: Flink fails to commit offsets

Re: Re: Flink fails to commit offsets

孙福


Hello,
max.poll.interval.ms = 300000, which is 5 minutes.
Five minutes is certainly enough for me to process 500 records.
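
The arithmetic behind this argument, as a small Python sketch: dividing max.poll.interval.ms by max.poll.records gives the average time available per record before the poll interval is exceeded. The function names and the 20 ms and 700 ms sample processing times are hypothetical.

```python
def per_record_budget_ms(max_poll_interval_ms, max_poll_records):
    """Average processing time allowed per record within one poll interval."""
    return max_poll_interval_ms / max_poll_records


def fits_in_poll_interval(avg_ms_per_record, max_poll_interval_ms, max_poll_records):
    """True if a full batch at the given average speed finishes in time."""
    return avg_ms_per_record * max_poll_records <= max_poll_interval_ms


# Values from the configuration dump: 300000 ms and 500 records.
budget = per_record_budget_ms(300000, 500)
print(budget)  # 600.0 ms per record

# Hypothetical measured averages for comparison.
print(fits_in_poll_interval(20, 300000, 500))   # True
print(fits_in_poll_interval(700, 300000, 500))  # False
```

With the posted settings a batch would exceed the interval only if records took more than 600 ms each on average.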

On 2019-08-16 17:34:16, "龚中强" <[hidden email]> wrote:

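>You fetch 500 messages in a single poll, and the consumer timed out while processing them, so the coordinator considered the consumer dead.
>
>Fixes:
>1. Lower max.poll.records as appropriate.
>2. Increase the consumer timeout.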