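Version: Flink 1.12.0
Environment: Native Kubernetes
Mode: Application Mode
Description:
When Flink runs on k8s in Native Kubernetes Application Mode with the filesystem state backend on OSS, the logs show failed requests to OSS.
When the code uses `source.setStartFromEarliest();`, the job consumes from the beginning after it starts and runs normally. Once it catches up to the latest position, the error below appears. The error goes away after a while, or after the job is restarted.
When the code uses `source.setStartFromLatest();`, the job starts consuming directly from the latest position, and this error does not appear.
Based on these observations, is there something wrong with my configuration or usage?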
Command:
./bin/flink run-application \
    --target kubernetes-application \
    -Dkubernetes.cluster-id=demo \
    -Dkubernetes.container.image=xx/xx/xx:2.0.16 \
    -Dstate.backend=filesystem \
    -Dstate.checkpoints.dir=oss://bucket/文件夹 \
    -Dfs.oss.endpoint=oss-cn-beijing-internal.aliyuncs.com \
    -Dfs.oss.accessKeyId=xx \
    -Dfs.oss.accessKeySecret=xx \
    local:///opt/flink/usrlib/my-flink-job.jar
Error log:
Normal TaskManager log after the pod is killed and restarted, or after some time has passed:
Files in OSS:
chk-10880 directory:
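For orientation when reading the OSS listing and the chk-10880 directory: Flink's filesystem checkpoint storage generally lays out a job's checkpoints under `state.checkpoints.dir` as `<dir>/<job-id>/chk-<n>/` (one directory for each checkpoint that is kept, holding the `_metadata` file), plus job-level `<dir>/<job-id>/shared/` and `<dir>/<job-id>/taskowned/` directories. The following is a minimal Java sketch of that layout; it only builds path strings and never talks to OSS. The class and method names are hypothetical, and the example values (`oss://xx/backend` and the all-zero job ID) are taken from the JobManager log later in this thread.

```java
import java.util.List;

/**
 * Hypothetical helper that builds the paths Flink's filesystem checkpoint
 * storage uses under state.checkpoints.dir (layout assumed for Flink 1.12):
 *   <dir>/<job-id>/chk-<n>/_metadata
 *   <dir>/<job-id>/shared/
 *   <dir>/<job-id>/taskowned/
 * Only string manipulation; no filesystem access.
 */
public class CheckpointLayout {

    // Drop a trailing '/' so segments are joined with exactly one separator.
    private static String trim(String dir) {
        return dir.endsWith("/") ? dir.substring(0, dir.length() - 1) : dir;
    }

    static String jobDir(String checkpointsDir, String jobId) {
        return trim(checkpointsDir) + "/" + jobId;
    }

    // Directory of a single checkpoint, e.g. .../chk-10880
    static String checkpointDir(String checkpointsDir, String jobId, long checkpointId) {
        return jobDir(checkpointsDir, jobId) + "/chk-" + checkpointId;
    }

    // Metadata file written for every completed checkpoint
    static String metadataFile(String checkpointsDir, String jobId, long checkpointId) {
        return checkpointDir(checkpointsDir, jobId, checkpointId) + "/_metadata";
    }

    // Job-level directories that are not tied to one checkpoint ID
    static List<String> jobLevelDirs(String checkpointsDir, String jobId) {
        String job = jobDir(checkpointsDir, jobId);
        return List.of(job + "/shared", job + "/taskowned");
    }

    public static void main(String[] args) {
        // Values copied from the JobManager log in this thread
        String dir = "oss://xx/backend";
        String jobId = "00000000000000000000000000000000";
        System.out.println(checkpointDir(dir, jobId, 10167));
        System.out.println(metadataFile(dir, jobId, 10880));
        System.out.println(jobLevelDirs(dir, jobId));
    }
}
```

With these inputs, `checkpointDir` yields `oss://xx/backend/00000000000000000000000000000000/chk-10167`, the same location the JobManager reports when restoring checkpoint 10167.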
2021-03-04 02:33:25,292 DEBUG org.apache.flink.runtime.rpc.akka.SupervisorActor [] - Starting FencedAkkaRpcActor with name jobmanager_2.
2021-03-04 02:33:25,304 INFO org.apache.flink.runtime.rpc.akka.AkkaRpcService [] - Starting RPC endpoint for org.apache.flink.runtime.jobmaster.JobMaster at akka://flink/user/rpc/jobmanager_2 .
2021-03-04 02:33:25,310 INFO org.apache.flink.runtime.jobmaster.JobMaster [] - Initializing job TransactionAndAccount (00000000000000000000000000000000).
2021-03-04 02:33:25,323 INFO org.apache.flink.runtime.jobmaster.JobMaster [] - Using restart back off time strategy FixedDelayRestartBackoffTimeStrategy(maxNumberRestartAttempts=2147483647, backoffTimeMS=1000) for TransactionAndAccount (00000000000000000000000000000000).
2021-03-04 02:33:25,380 INFO org.apache.flink.runtime.jobmaster.JobMaster [] - Running initialization on master for job TransactionAndAccount (00000000000000000000000000000000).
2021-03-04 02:33:25,380 INFO org.apache.flink.runtime.jobmaster.JobMaster [] - Successfully ran initialization on master in 0 ms.
2021-03-04 02:33:25,381 DEBUG org.apache.flink.runtime.jobmaster.JobMaster [] - Adding 2 vertices from job graph TransactionAndAccount (00000000000000000000000000000000).
2021-03-04 02:33:25,381 DEBUG org.apache.flink.runtime.executiongraph.ExecutionGraph [] - Attaching 2 topologically sorted vertices to existing job graph with 0 vertices and 0 intermediate results.
2021-03-04 02:33:25,389 DEBUG org.apache.flink.runtime.executiongraph.ExecutionGraph [] - Connecting ExecutionJobVertex cbc357ccb763df2852fee8c4fc7d55f2 (Source: Custom Source -> format to json -> Filter -> process timestamp range -> Timestamps/Watermarks) to 0 predecessors.
2021-03-04 02:33:25,389 DEBUG org.apache.flink.runtime.executiongraph.ExecutionGraph [] - Connecting ExecutionJobVertex 337adade1e207453ed3502e01d75fd03 (Window(TumblingEventTimeWindows(86400000), EventTimeTrigger, SumAggregator, PassThroughWindowFunction) -> Flat Map -> Sink: tidb) to 1 predecessors.
2021-03-04 02:33:25,389 DEBUG org.apache.flink.runtime.executiongraph.ExecutionGraph [] - Connecting input 0 of vertex 337adade1e207453ed3502e01d75fd03 (Window(TumblingEventTimeWindows(86400000), EventTimeTrigger, SumAggregator, PassThroughWindowFunction) -> Flat Map -> Sink: tidb) to intermediate result referenced via predecessor cbc357ccb763df2852fee8c4fc7d55f2 (Source: Custom Source -> format to json -> Filter -> process timestamp range -> Timestamps/Watermarks).
2021-03-04 02:33:25,395 INFO org.apache.flink.runtime.scheduler.adapter.DefaultExecutionTopology [] - Built 1 pipelined regions in 2 ms
2021-03-04 02:33:25,396 DEBUG org.apache.flink.runtime.jobmaster.JobMaster [] - Successfully created execution graph from job graph TransactionAndAccount (00000000000000000000000000000000).
2021-03-04 02:33:25,406 INFO org.apache.flink.runtime.jobmaster.JobMaster [] - Using job/cluster config to configure application-defined state backend: File State Backend (checkpoints: 'oss://xx/backend', savepoints: 'null', asynchronous: TRUE, fileStateThreshold: 20480)
2021-03-04 02:33:25,406 INFO org.apache.flink.runtime.jobmaster.JobMaster [] - Using application-defined state backend: File State Backend (checkpoints: 'oss://xx/backend', savepoints: 'null', asynchronous: TRUE, fileStateThreshold: 20480)
2021-03-04 02:33:25,419 INFO org.apache.flink.fs.osshadoop.shaded.com.aliyun.oss [] - [Server]Unable to execute HTTP request: Not Found
[ErrorCode]: NoSuchKey
[RequestId]: 604046F58B49C830320A1A53
[HostId]: null
2021-03-04 02:33:25,432 INFO org.apache.flink.fs.osshadoop.shaded.com.aliyun.oss [] - [Server]Unable to execute HTTP request: Not Found
[ErrorCode]: NoSuchKey
[RequestId]: 604046F58B49C830322A1A53
[HostId]: null
2021-03-04 02:33:25,442 INFO org.apache.flink.runtime.checkpoint.DefaultCompletedCheckpointStore [] - Recovering checkpoints from KubernetesStateHandleStore{configMapName='demo-00000000000000000000000000000000-jobmanager-leader'}.
2021-03-04 02:33:25,448 INFO org.apache.flink.runtime.checkpoint.DefaultCompletedCheckpointStore [] - Found 1 checkpoints in KubernetesStateHandleStore{configMapName='demo-00000000000000000000000000000000-jobmanager-leader'}.
2021-03-04 02:33:25,449 INFO org.apache.flink.runtime.checkpoint.DefaultCompletedCheckpointStore [] - Trying to fetch 1 checkpoints from storage.
2021-03-04 02:33:25,449 INFO org.apache.flink.runtime.checkpoint.DefaultCompletedCheckpointStore [] - Trying to retrieve checkpoint 10167.
2021-03-04 02:33:25,483 DEBUG org.apache.flink.runtime.checkpoint.CheckpointCoordinator [] - Status of the shared state registry of job 00000000000000000000000000000000 after restore: SharedStateRegistry{registeredStates={}}.
2021-03-04 02:33:25,483 INFO org.apache.flink.runtime.checkpoint.CheckpointCoordinator [] - Restoring job 00000000000000000000000000000000 from Checkpoint 10167 @ 1614825175716 for 00000000000000000000000000000000 located at oss://xx/backend/00000000000000000000000000000000/chk-10167.

I checked the JobManager log as well, and it shows the same NoSuchKey error.

On March 3, 2021, at 11:23 AM, 王 羽凡 <[hidden email]> wrote:
[...]
Error log:
2021-03-03 02:53:46,133 INFO org.apache.flink.streaming.connectors.pulsar.internal.PulsarMetadataReader [] - Committing offset 12701:1:-1:4 to topic TopicRange{topic=persistent://public/xx/xxxx, key-range=SerializableRange{range=[0, 65535]}}
2021-03-03 02:53:46,140 INFO org.apache.flink.streaming.connectors.pulsar.internal.PulsarMetadataReader [] - Successfully committed offset 12701:1:-1:4 to topic TopicRange{topic=persistent://public/xx/xxxx, key-range=SerializableRange{range=[0, 65535]}}
2021-03-03 02:53:50,899 INFO org.apache.flink.fs.osshadoop.shaded.com.aliyun.oss [] - [Server]Unable to execute HTTP request: Not Found
[ErrorCode]: NoSuchKey
[RequestId]: xx
[HostId]: null
2021-03-03 02:53:50,904 INFO org.apache.flink.fs.osshadoop.shaded.com.aliyun.oss [] - [Server]Unable to execute HTTP request: Not Found
[ErrorCode]: NoSuchKey
[RequestId]: xx
[HostId]: null
Normal TaskManager log after the pod is killed and restarted, or after some time has passed:
2021-03-03 03:18:21,602 INFO org.apache.flink.streaming.connectors.pulsar.internal.PulsarMetadataReader [] - Successfully committed offset 12716:7:-1:1 to topic TopicRange{topic=persistent://public/xx/xxxx, key-range=SerializableRange{range=[0, 65535]}}
2021-03-03 03:18:26,573 INFO org.apache.flink.streaming.connectors.pulsar.internal.PulsarMetadataReader [] - Committing offset 12716:7:-1:1 to topic TopicRange{topic=persistent://public/xx/xxxx, key-range=SerializableRange{range=[0, 65535]}}
2021-03-03 03:18:26,582 INFO org.apache.flink.streaming.connectors.pulsar.internal.PulsarMetadataReader [] - Successfully committed offset 12716:7:-1:1 to topic TopicRange{topic=persistent://public/xx/xxxx, key-range=SerializableRange{range=[0, 65535]}}
2021-03-03 03:18:31,571 INFO org.apache.flink.streaming.connectors.pulsar.internal.PulsarMetadataReader [] - Committing offset 12716:7:-1:1 to topic TopicRange{topic=persistent://public/xx/xxxx, key-range=SerializableRange{range=[0, 65535]}}
2021-03-03 03:18:31,580 INFO org.apache.flink.streaming.connectors.pulsar.internal.PulsarMetadataReader [] - Successfully committed offset 12716:7:-1:1 to topic TopicRange{topic=persistent://public/xx/xxxx, key-range=SerializableRange{range=[0, 65535]}}
2021-03-03 03:18:36,633 INFO org.apache.flink.streaming.connectors.pulsar.internal.PulsarMetadataReader [] - Committing offset 12716:7:-1:1 to topic TopicRange{topic=persistent://public/xx/xxxx, key-range=SerializableRange{range=[0, 65535]}}
2021-03-03 03:18:36,642 INFO org.apache.flink.streaming.connectors.pulsar.internal.PulsarMetadataReader [] - Successfully committed offset 12716:7:-1:1 to topic TopicRange{topic=persistent://public/xx/xxxx, key-range=SerializableRange{range=[0, 65535]}}
Files in OSS: <Pasted Graphic-1.png>
chk-10880 directory: <Pasted Graphic-2.png>
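The JobManager log reports the state backend as `File State Backend (... fileStateThreshold: 20480)`. That threshold (the `state.backend.fs.memory-threshold` option, 20 KB by default in Flink 1.12) roughly decides whether a state snapshot is written as its own file inside `chk-<n>` or inlined into the checkpoint's `_metadata` file, which affects what a listing such as chk-10880 contains. The Java sketch below is a simplified model under that assumption, using the documented rule "state up to this size is stored as part of the metadata"; the class, enum, and method names are hypothetical, and details such as write buffering and empty state are ignored.

```java
/**
 * Simplified, hypothetical model of where the filesystem state backend puts
 * a state snapshot, given fileStateThreshold = 20480 bytes from the log:
 *  - snapshots no larger than the threshold are inlined into chk-<n>/_metadata
 *  - larger snapshots become separate files inside chk-<n>/
 * Write buffering and empty state are deliberately not modeled.
 */
public class FileStateThreshold {

    // Value printed by the JobManager: fileStateThreshold: 20480 (20 KiB)
    static final long THRESHOLD_BYTES = 20480;

    enum Placement { INLINE_IN_METADATA, SEPARATE_FILE }

    static Placement placementFor(long snapshotBytes) {
        // "Up to" the threshold is treated as inclusive in this sketch
        return snapshotBytes <= THRESHOLD_BYTES
                ? Placement.INLINE_IN_METADATA
                : Placement.SEPARATE_FILE;
    }

    public static void main(String[] args) {
        long[] sizes = {512, 20480, 20481, 5_000_000};
        for (long size : sizes) {
            System.out.println(size + " bytes -> " + placementFor(size));
        }
    }
}
```

Under this model, if every subtask's state stays under 20 KB, a `chk-<n>` directory may contain nothing but `_metadata`.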