hi, all:
Could someone help me look into a problem? Many thanks!

Processing logic:
1. Use the DataStream API to read data from Kafka into a DataStream ds1.
2. Create a tableEnv and register a Hive catalog:
   tableEnv.registerCatalog(catalogName, catalog);
   tableEnv.useCatalog(catalogName);
3. Declare a table backed by ds1:
   Table sourcetable = tableEnv.fromDataStream(ds1);
   String souceTableName = "music_source";
   tableEnv.createTemporaryView(souceTableName, sourcetable);
4. Create a Hive table:

CREATE TABLE `dwd_music_copyright_test`(
  `url` string COMMENT 'url',
  `md5` string COMMENT 'md5',
  `utime` bigint COMMENT 'time',
  `title` string COMMENT 'song title',
  `singer` string COMMENT 'singer',
  `company` string COMMENT 'company',
  `level` int COMMENT 'confidence: 0 = title tokenization, 1 = result returned by acrcloud, 3 = manually labeled')
PARTITIONED BY (
  `dt` string,
  `hour` string)
ROW FORMAT SERDE
  'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
STORED AS INPUTFORMAT
  'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
LOCATION
  'hdfs://Ucluster/user/hive/warehouse/rt_dwd.db/dwd_music_copyright_test'
TBLPROPERTIES (
  'connector'='HiveCatalog',
  'partition.time-extractor.timestamp-pattern'='$dt $hour:00:00',
  'sink.partition-commit.delay'='1 min',
  'sink.partition-commit.policy.kind'='metastore,success-file',
  'sink.partition-commit.trigger'='partition-time',
  'sink.rolling-policy.check-interval'='30s',
  'sink.rolling-policy.rollover-interval'='1min',
  'sink.rolling-policy.file-size'='1MB');
5. Insert the data from the step-3 view into dwd_music_copyright_test (see the sketch below for how the five steps fit together).
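To make steps 1 to 5 concrete, a minimal, hypothetical Flink 1.11 job wiring them together might look like this. The Kafka topic, bootstrap servers, Hive conf directory, record parsing, and the dt/hour derivation in the INSERT are all placeholders; the thread does not show the real code for step 5, so treat this purely as an illustration.

import java.util.Properties;

import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.api.CheckpointingMode;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.bridge.java.StreamTableEnvironment;
import org.apache.flink.table.catalog.hive.HiveCatalog;

public class MusicCopyrightJob {

    // POJO mirroring the data columns of the Hive table (field names are assumptions).
    public static class MusicEvent {
        public String url;
        public String md5;
        public long utime;
        public String title;
        public String singer;
        public String company;
        public int level;
    }

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.enableCheckpointing(60_000, CheckpointingMode.EXACTLY_ONCE);

        EnvironmentSettings settings =
                EnvironmentSettings.newInstance().useBlinkPlanner().inStreamingMode().build();
        StreamTableEnvironment tableEnv = StreamTableEnvironment.create(env, settings);

        // Step 2: register and use the Hive catalog ("/etc/hive/conf" is a placeholder).
        String catalogName = "hive_catalog";
        HiveCatalog catalog = new HiveCatalog(catalogName, "rt_dwd", "/etc/hive/conf");
        tableEnv.registerCatalog(catalogName, catalog);
        tableEnv.useCatalog(catalogName);

        // Step 1: read from Kafka into ds1 (topic/servers are placeholders, parsing is stubbed out).
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "kafka-broker:9092");
        props.setProperty("group.id", "music-copyright");
        DataStream<MusicEvent> ds1 = env
                .addSource(new FlinkKafkaConsumer<>("music_topic", new SimpleStringSchema(), props))
                .map(MusicCopyrightJob::parse);

        // Step 3: declare a temporary view backed by ds1.
        Table sourcetable = tableEnv.fromDataStream(ds1);
        tableEnv.createTemporaryView("music_source", sourcetable);

        // Step 5: insert into the Hive table created in step 4. How dt/hour are really
        // derived is not shown in the thread; deriving them from utime (assumed to be
        // epoch seconds) is only an illustration.
        tableEnv.executeSql(
                "INSERT INTO rt_dwd.dwd_music_copyright_test "
                        + "SELECT url, md5, utime, title, singer, company, `level`, "
                        + "FROM_UNIXTIME(utime, 'yyyy-MM-dd') AS dt, "
                        + "FROM_UNIXTIME(utime, 'HH') AS `hour` "
                        + "FROM music_source");
    }

    private static MusicEvent parse(String value) {
        // Placeholder for the real deserialization of the Kafka payload.
        return new MusicEvent();
    }
}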
Environment:

flink: 1.11
kafka: 1.1.1
hadoop: 2.6.0
hive: 1.2.0

Problem:
After the job runs for a while, some partitions are not created in the Hive catalog. For example, the hour=02 and hour=03 partitions below are missing:

show partitions rt_dwd.dwd_music_copyright_test;

| dt=2020-08-29/hour=00 |
| dt=2020-08-29/hour=01 |
| dt=2020-08-29/hour=04 |
| dt=2020-08-29/hour=05 |
However, files were generated under the corresponding HDFS directories:

$ hadoop fs -du -h /user/hive/warehouse/rt_dwd.db/dwd_music_copyright_test/dt=2020-08-29/
4.5 K  13.4 K  /user/hive/warehouse/rt_dwd.db/dwd_music_copyright_test/dt=2020-08-29/hour=00
2.0 K  6.1 K   /user/hive/warehouse/rt_dwd.db/dwd_music_copyright_test/dt=2020-08-29/hour=01
1.7 K  5.1 K   /user/hive/warehouse/rt_dwd.db/dwd_music_copyright_test/dt=2020-08-29/hour=02
1.3 K  3.8 K   /user/hive/warehouse/rt_dwd.db/dwd_music_copyright_test/dt=2020-08-29/hour=03
3.1 K  9.2 K   /user/hive/warehouse/rt_dwd.db/dwd_music_copyright_test/dt=2020-08-29/hour=04
After manually running ADD PARTITION, the data in those directories can be read normally.

The Flink Web UI also shows that during the run some checkpoints failed at the StreamingFileCommitter (the screenshot failed to attach; a link is posted later in the thread).

Questions:
1. Does exactly-once only guarantee writing the sink files, and not updating the catalog?
2. If so, is there a way to work around this?
3. For EXACTLY_ONCE, is it necessary to set the Kafka parameters isolation.level=read_committed and enable.auto.commit=false? Or is EXACTLY_ONCE already guaranteed with the following settings?

streamEnv.getCheckpointConfig().setCheckpointingMode(CheckpointingMode.EXACTLY_ONCE);
tableEnv.getConfig().getConfiguration().set(ExecutionCheckpointingOptions.CHECKPOINTING_MODE, CheckpointingMode.EXACTLY_ONCE);
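For concreteness on question 3: if those two consumer properties were set, they would go into the Properties handed to the Kafka source, roughly as sketched below (topic, servers, and group id are placeholders). This only shows where the settings would live; it is not a claim about whether they are required for exactly-once.

import java.util.Properties;

import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;

public class KafkaSourceProps {
    public static FlinkKafkaConsumer<String> buildConsumer() {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "kafka-broker:9092"); // placeholder
        props.setProperty("group.id", "music-copyright");            // placeholder
        // The two consumer settings asked about in question 3:
        props.setProperty("isolation.level", "read_committed");
        props.setProperty("enable.auto.commit", "false");
        return new FlinkKafkaConsumer<>("music_topic", new SimpleStringSchema(), props);
    }
}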
The failure screenshot didn't come through... What exactly is the exception?
--
Best, Jingsong Lee
hi, Jingsong:
The image failed to send, so I uploaded it to an image host: https://s1.ax1x.com/2020/09/07/wn1CFg.png

Checkpoint failure log:

2020-09-04 17:17:59
org.apache.flink.util.FlinkRuntimeException: Exceeded checkpoint tolerable failure threshold.
    at org.apache.flink.runtime.checkpoint.CheckpointFailureManager.handleJobLevelCheckpointException(CheckpointFailureManager.java:66)
    at org.apache.flink.runtime.checkpoint.CheckpointCoordinator.abortPendingCheckpoint(CheckpointCoordinator.java:1626)
    at org.apache.flink.runtime.checkpoint.CheckpointCoordinator.abortPendingCheckpoint(CheckpointCoordinator.java:1603)
    at org.apache.flink.runtime.checkpoint.CheckpointCoordinator.access$600(CheckpointCoordinator.java:90)
    at org.apache.flink.runtime.checkpoint.CheckpointCoordinator$CheckpointCanceller.run(CheckpointCoordinator.java:1736)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
I still haven't found the cause. Could anyone help diagnose this?
Are these the only exception logs? Is there anything more detailed?
Around the time the checkpoint failed, there are also some INFO- and WARN-level entries in the TM logs:
2020-09-04 17:17:59,520 INFO org.apache.hadoop.io.retry.RetryInvocationHandler [] - Exception while invoking create of class ClientNamenodeProtocolTranslatorPB over uhadoop-op3raf-master2/10.42.52.202:8020 after 14 fail over attempts. Trying to fail over immediately.
java.io.IOException: java.lang.InterruptedException
    at org.apache.hadoop.ipc.Client.call(Client.java:1449) ~[music_copyright-1.0-SNAPSHOT-jar-with-dependencies.jar:?]
    at org.apache.hadoop.ipc.Client.call(Client.java:1401) ~[music_copyright-1.0-SNAPSHOT-jar-with-dependencies.jar:?]
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232) ~[music_copyright-1.0-SNAPSHOT-jar-with-dependencies.jar:?]
    at com.sun.proxy.$Proxy26.create(Unknown Source) ~[?:?]
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.create(ClientNamenodeProtocolTranslatorPB.java:295) ~[music_copyright-1.0-SNAPSHOT-jar-with-dependencies.jar:?]
    at sun.reflect.GeneratedMethodAccessor11.invoke(Unknown Source) ~[?:?]
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_144]
    at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_144]
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187) ~[music_copyright-1.0-SNAPSHOT-jar-with-dependencies.jar:?]
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102) ~[music_copyright-1.0-SNAPSHOT-jar-with-dependencies.jar:?]
    at com.sun.proxy.$Proxy27.create(Unknown Source) ~[?:?]
    at org.apache.hadoop.hdfs.DFSOutputStream.newStreamForCreate(DFSOutputStream.java:1721) ~[music_copyright-1.0-SNAPSHOT-jar-with-dependencies.jar:?]
    at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1657) ~[music_copyright-1.0-SNAPSHOT-jar-with-dependencies.jar:?]
    at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1582) ~[music_copyright-1.0-SNAPSHOT-jar-with-dependencies.jar:?]
    at org.apache.hadoop.hdfs.DistributedFileSystem$6.doCall(DistributedFileSystem.java:397) ~[music_copyright-1.0-SNAPSHOT-jar-with-dependencies.jar:?]
    at org.apache.hadoop.hdfs.DistributedFileSystem$6.doCall(DistributedFileSystem.java:393) ~[music_copyright-1.0-SNAPSHOT-jar-with-dependencies.jar:?]
    at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) ~[music_copyright-1.0-SNAPSHOT-jar-with-dependencies.jar:?]
    at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:393) ~[music_copyright-1.0-SNAPSHOT-jar-with-dependencies.jar:?]
    at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:337) ~[music_copyright-1.0-SNAPSHOT-jar-with-dependencies.jar:?]
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:908) ~[music_copyright-1.0-SNAPSHOT-jar-with-dependencies.jar:?]
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:889) ~[music_copyright-1.0-SNAPSHOT-jar-with-dependencies.jar:?]
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:786) ~[music_copyright-1.0-SNAPSHOT-jar-with-dependencies.jar:?]
    at org.apache.flink.hive.shaded.fs.hdfs.HadoopFileSystem.create(HadoopFileSystem.java:141) ~[flink-sql-connector-hive-1.2.2_2.11-1.11.0.jar:1.11.0]
    at org.apache.flink.hive.shaded.fs.hdfs.HadoopFileSystem.create(HadoopFileSystem.java:37) ~[flink-sql-connector-hive-1.2.2_2.11-1.11.0.jar:1.11.0]
    at org.apache.flink.table.filesystem.SuccessFileCommitPolicy.commit(SuccessFileCommitPolicy.java:45) ~[flink-table-blink_2.11-1.11.0.jar:1.11.0]
    at org.apache.flink.table.filesystem.stream.StreamingFileCommitter.commitPartitions(StreamingFileCommitter.java:167) ~[flink-table-blink_2.11-1.11.0.jar:1.11.0]
    at org.apache.flink.table.filesystem.stream.StreamingFileCommitter.processElement(StreamingFileCommitter.java:144) ~[flink-table-blink_2.11-1.11.0.jar:1.11.0]
    at org.apache.flink.streaming.runtime.tasks.OneInputStreamTask$StreamTaskNetworkOutput.emitRecord(OneInputStreamTask.java:161) ~[music_copyright-1.0-SNAPSHOT-jar-with-dependencies.jar:?]
    at org.apache.flink.streaming.runtime.io.StreamTaskNetworkInput.processElement(StreamTaskNetworkInput.java:178) ~[music_copyright-1.0-SNAPSHOT-jar-with-dependencies.jar:?]
    at org.apache.flink.streaming.runtime.io.StreamTaskNetworkInput.emitNext(StreamTaskNetworkInput.java:153) ~[music_copyright-1.0-SNAPSHOT-jar-with-dependencies.jar:?]
    at org.apache.flink.streaming.runtime.io.StreamOneInputProcessor.processInput(StreamOneInputProcessor.java:67) ~[music_copyright-1.0-SNAPSHOT-jar-with-dependencies.jar:?]
    at org.apache.flink.streaming.runtime.tasks.StreamTask.processInput(StreamTask.java:345) ~[music_copyright-1.0-SNAPSHOT-jar-with-dependencies.jar:?]
    at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxStep(MailboxProcessor.java:191) [music_copyright-1.0-SNAPSHOT-jar-with-dependencies.jar:?]
    at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxLoop(MailboxProcessor.java:181) [music_copyright-1.0-SNAPSHOT-jar-with-dependencies.jar:?]
    at org.apache.flink.streaming.runtime.tasks.StreamTask.runMailboxLoop(StreamTask.java:558) [music_copyright-1.0-SNAPSHOT-jar-with-dependencies.jar:?]
    at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:530) [music_copyright-1.0-SNAPSHOT-jar-with-dependencies.jar:?]
    at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:721) [music_copyright-1.0-SNAPSHOT-jar-with-dependencies.jar:?]
    at org.apache.flink.runtime.taskmanager.Task.run(Task.java:546) [music_copyright-1.0-SNAPSHOT-jar-with-dependencies.jar:?]
    at java.lang.Thread.run(Thread.java:748) [?:1.8.0_144]
Caused by: java.lang.InterruptedException
    at java.util.concurrent.FutureTask.awaitDone(FutureTask.java:404) ~[?:1.8.0_144]
    at java.util.concurrent.FutureTask.get(FutureTask.java:191) ~[?:1.8.0_144]
    at org.apache.hadoop.ipc.Client$Connection.sendRpcRequest(Client.java:1048) ~[music_copyright-1.0-SNAPSHOT-jar-with-dependencies.jar:?]
    at org.apache.hadoop.ipc.Client.call(Client.java:1443) ~[music_copyright-1.0-SNAPSHOT-jar-with-dependencies.jar:?]
    ... 38 more

2020-09-04 17:17:59,522 WARN org.apache.hadoop.io.retry.RetryInvocationHandler [] - Exception while invoking class org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.create over uhadoop-op3raf-master1/10.42.31.63:8020. Not retrying because failovers (15) exceeded maximum allowed (15)
java.io.IOException: Failed on local exception: java.nio.channels.ClosedByInterruptException; Host Details : local host is: "uhadoop-op3raf-core13/10.42.99.178"; destination host is: "uhadoop-op3raf-master1":8020;
    at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:772) ~[music_copyright-1.0-SNAPSHOT-jar-with-dependencies.jar:?]
    at org.apache.hadoop.ipc.Client.call(Client.java:1474) ~[music_copyright-1.0-SNAPSHOT-jar-with-dependencies.jar:?]
    at org.apache.hadoop.ipc.Client.call(Client.java:1401) ~[music_copyright-1.0-SNAPSHOT-jar-with-dependencies.jar:?]
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232) ~[music_copyright-1.0-SNAPSHOT-jar-with-dependencies.jar:?]
    at com.sun.proxy.$Proxy26.create(Unknown Source) ~[?:?]
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.create(ClientNamenodeProtocolTranslatorPB.java:295) ~[music_copyright-1.0-SNAPSHOT-jar-with-dependencies.jar:?]
    at sun.reflect.GeneratedMethodAccessor11.invoke(Unknown Source) ~[?:?]
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_144]
    at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_144]
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187) ~[music_copyright-1.0-SNAPSHOT-jar-with-dependencies.jar:?]
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102) ~[music_copyright-1.0-SNAPSHOT-jar-with-dependencies.jar:?]
    at com.sun.proxy.$Proxy27.create(Unknown Source) ~[?:?]
    at org.apache.hadoop.hdfs.DFSOutputStream.newStreamForCreate(DFSOutputStream.java:1721) ~[music_copyright-1.0-SNAPSHOT-jar-with-dependencies.jar:?]
    at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1657) ~[music_copyright-1.0-SNAPSHOT-jar-with-dependencies.jar:?]
    at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1582) ~[music_copyright-1.0-SNAPSHOT-jar-with-dependencies.jar:?]
    at org.apache.hadoop.hdfs.DistributedFileSystem$6.doCall(DistributedFileSystem.java:397) ~[music_copyright-1.0-SNAPSHOT-jar-with-dependencies.jar:?]
    at org.apache.hadoop.hdfs.DistributedFileSystem$6.doCall(DistributedFileSystem.java:393) ~[music_copyright-1.0-SNAPSHOT-jar-with-dependencies.jar:?]
    at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) ~[music_copyright-1.0-SNAPSHOT-jar-with-dependencies.jar:?]
    at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:393) ~[music_copyright-1.0-SNAPSHOT-jar-with-dependencies.jar:?]
    at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:337) ~[music_copyright-1.0-SNAPSHOT-jar-with-dependencies.jar:?]
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:908) ~[music_copyright-1.0-SNAPSHOT-jar-with-dependencies.jar:?]
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:889) ~[music_copyright-1.0-SNAPSHOT-jar-with-dependencies.jar:?]
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:786) ~[music_copyright-1.0-SNAPSHOT-jar-with-dependencies.jar:?]
    at org.apache.flink.hive.shaded.fs.hdfs.HadoopFileSystem.create(HadoopFileSystem.java:141) ~[flink-sql-connector-hive-1.2.2_2.11-1.11.0.jar:1.11.0]
    at org.apache.flink.hive.shaded.fs.hdfs.HadoopFileSystem.create(HadoopFileSystem.java:37) ~[flink-sql-connector-hive-1.2.2_2.11-1.11.0.jar:1.11.0]
    at org.apache.flink.table.filesystem.SuccessFileCommitPolicy.commit(SuccessFileCommitPolicy.java:45) ~[flink-table-blink_2.11-1.11.0.jar:1.11.0]
    at org.apache.flink.table.filesystem.stream.StreamingFileCommitter.commitPartitions(StreamingFileCommitter.java:167) ~[flink-table-blink_2.11-1.11.0.jar:1.11.0]
    at org.apache.flink.table.filesystem.stream.StreamingFileCommitter.processElement(StreamingFileCommitter.java:144) ~[flink-table-blink_2.11-1.11.0.jar:1.11.0]
    at org.apache.flink.streaming.runtime.tasks.OneInputStreamTask$StreamTaskNetworkOutput.emitRecord(OneInputStreamTask.java:161) ~[music_copyright-1.0-SNAPSHOT-jar-with-dependencies.jar:?]
    at org.apache.flink.streaming.runtime.io.StreamTaskNetworkInput.processElement(StreamTaskNetworkInput.java:178) ~[music_copyright-1.0-SNAPSHOT-jar-with-dependencies.jar:?]
    at org.apache.flink.streaming.runtime.io.StreamTaskNetworkInput.emitNext(StreamTaskNetworkInput.java:153) ~[music_copyright-1.0-SNAPSHOT-jar-with-dependencies.jar:?]
    at org.apache.flink.streaming.runtime.io.StreamOneInputProcessor.processInput(StreamOneInputProcessor.java:67) ~[music_copyright-1.0-SNAPSHOT-jar-with-dependencies.jar:?]
    at org.apache.flink.streaming.runtime.tasks.StreamTask.processInput(StreamTask.java:345) ~[music_copyright-1.0-SNAPSHOT-jar-with-dependencies.jar:?]
    at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxStep(MailboxProcessor.java:191) [music_copyright-1.0-SNAPSHOT-jar-with-dependencies.jar:?]
    at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxLoop(MailboxProcessor.java:181) [music_copyright-1.0-SNAPSHOT-jar-with-dependencies.jar:?]
    at org.apache.flink.streaming.runtime.tasks.StreamTask.runMailboxLoop(StreamTask.java:558) [music_copyright-1.0-SNAPSHOT-jar-with-dependencies.jar:?]
    at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:530) [music_copyright-1.0-SNAPSHOT-jar-with-dependencies.jar:?]
    at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:721) [music_copyright-1.0-SNAPSHOT-jar-with-dependencies.jar:?]
    at org.apache.flink.runtime.taskmanager.Task.run(Task.java:546) [music_copyright-1.0-SNAPSHOT-jar-with-dependencies.jar:?]
    at java.lang.Thread.run(Thread.java:748) [?:1.8.0_144]
Caused by: java.nio.channels.ClosedByInterruptException
    at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:202) ~[?:1.8.0_144]
    at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:659) ~[?:1.8.0_144]
    at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:192) ~[music_copyright-1.0-SNAPSHOT-jar-with-dependencies.jar:?]
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:530) ~[music_copyright-1.0-SNAPSHOT-jar-with-dependencies.jar:?]
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:494) ~[music_copyright-1.0-SNAPSHOT-jar-with-dependencies.jar:?]
    at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:609) ~[music_copyright-1.0-SNAPSHOT-jar-with-dependencies.jar:?]
    at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:707) ~[music_copyright-1.0-SNAPSHOT-jar-with-dependencies.jar:?]
    at org.apache.hadoop.ipc.Client$Connection.access$2800(Client.java:370) ~[music_copyright-1.0-SNAPSHOT-jar-with-dependencies.jar:?]
    at org.apache.hadoop.ipc.Client.getConnection(Client.java:1523) ~[music_copyright-1.0-SNAPSHOT-jar-with-dependencies.jar:?]
    at org.apache.hadoop.ipc.Client.call(Client.java:1440) ~[music_copyright-1.0-SNAPSHOT-jar-with-dependencies.jar:?]
    ... 38 more

In addition: the job has been run several times, and every run has some partitions that fail to be created; the partitions that fail differ from run to run.
Judging from the second batch of logs you provided, it looks like the problem is with your namenode.
The streaming file committer prints a log line like this before committing a partition:
LOG.info("Partition {} of table {} is ready to be committed", partSpec, tableIdentifier);

The partition commit policies print lines like these after a partition has been committed successfully:

LOG.info("Committed partition {} to metastore", partitionSpec);
LOG.info("Committed partition {} with success file", context.partitionSpec());

You can check for these log lines to see whether the process is getting stuck somewhere.

--
Best regards!
Rui Li
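Rui Li's reply refers to the partition commit policies. For extra visibility into the commit step, Flink 1.11 also supports a custom policy ('sink.partition-commit.policy.kind'='...,custom' together with 'sink.partition-commit.policy.class'). A hypothetical logging-only policy could look like the sketch below; the class and package names are made up, and it only adds a log line rather than changing commit behaviour.

package com.example.flink; // hypothetical package

import org.apache.flink.table.filesystem.PartitionCommitPolicy;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

// Configured together with the built-in policies, e.g. in the table properties:
//   'sink.partition-commit.policy.kind'='metastore,success-file,custom',
//   'sink.partition-commit.policy.class'='com.example.flink.LoggingCommitPolicy'
public class LoggingCommitPolicy implements PartitionCommitPolicy {

    private static final Logger LOG = LoggerFactory.getLogger(LoggingCommitPolicy.class);

    @Override
    public void commit(Context context) throws Exception {
        // Runs when a partition reaches the commit stage; logs which partition it was.
        LOG.info("Custom commit policy invoked for partition {}", context.partitionSpec());
    }
}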
hi, Rui Li:
如你所说,的确有类似日志,但是只有成功增加的分区的日志,没有失败分区的日志: 2020-09-04 17:17:10,548 INFO org.apache.flink.streaming.api.operators.AbstractStreamOperator [] - Partition {dt=2020-08-22, hour=18} of table `hive_catalog`.`rt_dwd`.`dwd_music_copyright_test` is ready to be committed 2020-09-04 17:17:10,716 INFO org.apache.flink.table.filesystem.MetastoreCommitPolicy [] - Committed partition {dt=2020-08-22, hour=18} to metastore 2020-09-04 17:17:10,720 INFO org.apache.flink.table.filesystem.SuccessFileCommitPolicy [] - Committed partition {dt=2020-08-22, hour=18} with success file 2020-09-04 17:17:19,652 INFO org.apache.flink.streaming.api.operators.AbstractStreamOperator [] - Partition {dt=2020-08-22, hour=19} of table `hive_catalog`.`rt_dwd`.`dwd_music_copyright_test` is ready to be committed 2020-09-04 17:17:19,820 INFO org.apache.flink.table.filesystem.MetastoreCommitPolicy [] - Committed partition {dt=2020-08-22, hour=19} to metastore 2020-09-04 17:17:19,824 INFO org.apache.flink.table.filesystem.SuccessFileCommitPolicy [] - Committed partition {dt=2020-08-22, hour=19} with success file 写hdfs的日志是都有的: 2020-09-04 17:16:04,100 INFO org.apache.hadoop.hive.ql.io.parquet.write.ParquetRecordWriterWrapper [] - creating real writer to write at hdfs://Ucluster/user/hive/warehouse/rt_dwd.db/dwd_music_copyright_test/dt=2020-08-22/hour=07/.part-b7d8f3c6-f1f3-40d4-a269-1ccf2c9a7720-0-1140.inprogress.1631ac6c-a07c-4ad7-86ff-cf0d4375d1de 2020-09-04 17:16:04,126 INFO org.apache.hadoop.hive.ql.io.parquet.write.ParquetRecordWriterWrapper [] - creating real writer to write at hdfs://Ucluster/user/hive/warehouse/rt_dwd.db/dwd_music_copyright_test/dt=2020-08-22/hour=19/.part-b7d8f3c6-f1f3-40d4-a269-1ccf2c9a7720-0-1142.inprogress.2700eded-5ed0-4794-8ee9-21721c0c2ffd ------------------ 原始邮件 ------------------ 发件人: "Rui Li" <[hidden email]>; 发送时间: 2020年9月8日(星期二) 中午12:09 收件人: "user-zh"<[hidden email]>;"夏帅"<[hidden email]>; 抄送: "MuChen"<[hidden email]>; 主题: Re: 回复:使用StreamingFileSink向hive metadata中增加分区部分失败 streaming file committer在提交分区之前会打印这样的日志:LOG.info("Partition {} of table {} is ready to be committed", partSpec, tableIdentifier); partition commit policy会在成功提交分区以后打印这样的日志: LOG.info("Committed partition {} to metastore", partitionSpec);LOG.info("Committed partition {} with success file", context.partitionSpec()); 可以检查一下这样的日志,看是不是卡在什么地方了 On Tue, Sep 8, 2020 at 11:02 AM 夏帅 <[hidden email]> wrote: 就第二次提供的日志看,好像是你的namenode出现的问题 ------------------------------------------------------------------ 发件人:MuChen <[hidden email]> 发送时间:2020年9月8日(星期二) 10:56 收件人:[hidden email] 夏帅 <[hidden email]>; user-zh <[hidden email]> 主 题:回复: 回复:使用StreamingFileSink向hive metadata中增加分区部分失败 在checkpoint失败的时间,tm上还有一些info和warn级别的日志: 2020-09-04 17:17:59,520 INFO org.apache.hadoop.io.retry.RetryInvocationHandler [] - Exception while invoking create of class ClientNamenodeProtocolTranslatorPB over uhadoop-op3raf-master2/10.42.52.202:8020 after 14 fail over attempts. Trying to fail over immediately. java.io.IOException: java.lang.InterruptedException at org.apache.hadoop.ipc.Client.call(Client.java:1449) ~[music_copyright-1.0-SNAPSHOT-jar-with-dependencies.jar:?] at org.apache.hadoop.ipc.Client.call(Client.java:1401) ~[music_copyright-1.0-SNAPSHOT-jar-with-dependencies.jar:?] at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232) ~[music_copyright-1.0-SNAPSHOT-jar-with-dependencies.jar:?] at com.sun.proxy.$Proxy26.create(Unknown Source) ~[?:?] 
Did the job ever fail over? Or does the job finish successfully, but some partitions are simply never committed?
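If you are not sure, one quick way to check (just a sketch -- it assumes you still have the JobManager log at hand and the default INFO log messages; the actual log file name depends on your deployment) is to grep it for restarts:

# any line like "Job ... switched from state RUNNING to RESTARTING/FAILING" means a failover happened
$ grep -E "switched from state RUNNING to (RESTARTING|FAILING)" jobmanager.log
# a restore from a checkpoint after a failover is logged by the CheckpointCoordinator
$ grep -c "Restoring job" jobmanager.log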
On Tue, Sep 8, 2020 at 5:20 PM MuChen <[hidden email]> wrote:

-- Best regards! Rui Li |
Also, please list the directories of the partitions that were never committed and check what state the files in them are in.
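For example, something like this (only a sketch: it reuses the warehouse path and one of the missing partitions from your first mail, and assumes the default success-file name _SUCCESS; adjust dt/hour to whichever partition is missing) would show whether the data files were ever finalized:

# finalized files are named part-<uid>-<n>; files still named .part-<uid>-<n>.inprogress.<uuid> were never committed by a completed checkpoint
$ hadoop fs -ls /user/hive/warehouse/rt_dwd.db/dwd_music_copyright_test/dt=2020-08-29/hour=02
# if the partition was committed with the success-file policy, a _SUCCESS marker should exist
$ hadoop fs -ls /user/hive/warehouse/rt_dwd.db/dwd_music_copyright_test/dt=2020-08-29/hour=02/_SUCCESS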
On Tue, Sep 8, 2020 at 9:19 PM Rui Li <[hidden email]> wrote:

-- Best regards! Rui Li |
Could you also send the SQL you use to insert into the Hive table?
On Tue, Sep 8, 2020 at 9:44 PM Rui Li <[hidden email]> wrote: > 另外也list一下没有提交的分区目录吧,看看里面的文件是什么状态 > > On Tue, Sep 8, 2020 at 9:19 PM Rui Li <[hidden email]> wrote: > > > 作业有发生failover么?还是说作业能成功结束但是某些partition始终没提交? > > > > On Tue, Sep 8, 2020 at 5:20 PM MuChen <[hidden email]> wrote: > > > >> hi, Rui Li: > >> 如你所说,的确有类似日志,但是只有成功增加的分区的日志,没有失败分区的日志: > >> 2020-09-04 17:17:10,548 INFO org.apache.flink.streaming.api.operators. > >> AbstractStreamOperator [] - Partition {dt=2020-08-22, hour=18} of table > >> `hive_catalog`.`rt_dwd`.`dwd_music_copyright_test` is ready to be > committed > >> 2020-09-04 17:17:10,716 INFO org.apache.flink.table.filesystem. > >> MetastoreCommitPolicy [] - Committed partition {dt=2020-08-22, hour=18} > >> to metastore > >> 2020-09-04 17:17:10,720 INFO org.apache.flink.table.filesystem. > >> SuccessFileCommitPolicy [] - Committed partition {dt=2020-08-22, > hour=18} > >> with success file > >> 2020-09-04 17:17:19,652 INFO org.apache.flink.streaming.api.operators. > >> AbstractStreamOperator [] - Partition {dt=2020-08-22, hour=19} of table > >> `hive_catalog`.`rt_dwd`.`dwd_music_copyright_test` is ready to be > committed > >> 2020-09-04 17:17:19,820 INFO org.apache.flink.table.filesystem. > >> MetastoreCommitPolicy [] - Committed partition {dt=2020-08-22, hour=19} > >> to metastore > >> 2020-09-04 17:17:19,824 INFO org.apache.flink.table.filesystem. > >> SuccessFileCommitPolicy [] - Committed partition {dt=2020-08-22, > hour=19} > >> with success file > >> > >> 写hdfs的日志是都有的: > >> 2020-09-04 17:16:04,100 INFO org.apache.hadoop.hive.ql.io > .parquet.write. > >> ParquetRecordWriterWrapper [] - creating real writer to write at hdfs:// > >> Ucluster/user/hive/warehouse/rt_dwd.db/dwd_music_copyright_test/dt=2020- > >> 08-22/hour=07/.part-b7d8f3c6-f1f3-40d4-a269-1ccf2c9a7720-0-1140 > >> .inprogress.1631ac6c-a07c-4ad7-86ff-cf0d4375d1de > >> 2020-09-04 17:16:04,126 INFO org.apache.hadoop.hive.ql.io > .parquet.write. 
--
Best, Jingsong Lee
The SQL that inserts into the table is as follows:

INSERT INTO rt_dwd.dwd_music_copyright_test
SELECT url, md5, utime, title, singer, company, level,
       from_unixtime(cast(utime/1000 as int), 'yyyy-MM-dd'),
       from_unixtime(cast(utime/1000 as int), 'HH')
FROM music_source;
hi, Rui Li:
The directories of the partitions that were not committed contain files in committed state (not in-progress files), for example:

/user/hive/warehouse/rt_dwd.db/dwd_music_copyright_test/dt=2020-08-19/hour=07/part-b7d8f3c6-f1f3-40d4-a269-1ccf2c9a7720-0-1031

After manually running add partition, the data can be queried normally.
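For reference, the manual workaround described above can be written as a Hive statement. This is only a sketch: the dt/hour values are taken from the example directory above and need to be adjusted for each missing partition.

-- Register a partition whose directory already exists on HDFS but is
-- missing from the metastore (partition values are just the example above).
ALTER TABLE rt_dwd.dwd_music_copyright_test
  ADD IF NOT EXISTS PARTITION (dt='2020-08-19', `hour`='07');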
Thanks a lot for the feedback. It does look like a real bug; I have created a JIRA issue to track it:
https://issues.apache.org/jira/browse/FLINK-19166
The fix will be included in the upcoming 1.11.2 release.

Best,
Jingsong
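A possible stopgap until the 1.11.2 upgrade (not something verified in this thread) is Hive's MSCK REPAIR TABLE, which scans the table location and registers partition directories that are missing from the metastore; note it will not create the _SUCCESS files that the success-file commit policy would normally write.

-- Scan the table location and add any dt=/hour= directories that the
-- metastore does not know about yet (does not write _SUCCESS marker files).
MSCK REPAIR TABLE rt_dwd.dwd_music_copyright_test;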
> >> ParquetRecordWriterWrapper [] - creating real writer to write at > hdfs:// > >> > Ucluster/user/hive/warehouse/rt_dwd.db/dwd_music_copyright_test/dt=2020- > >> 08-22/hour=19/.part-b7d8f3c6-f1f3-40d4-a269-1ccf2c9a7720-0-1142 > >> .inprogress.2700eded-5ed0-4794-8ee9-21721c0c2ffd > >> > >> ------------------ 原始邮件 ------------------ > >> *发件人:* "Rui Li" <[hidden email]>; > >> *发送时间:* 2020年9月8日(星期二) 中午12:09 > >> *收件人:* "user-zh"<[hidden email]>;"夏帅"< > [hidden email]>; > >> *抄送:* "MuChen"<[hidden email]>; > >> *主题:* Re: 回复:使用StreamingFileSink向hive metadata中增加分区部分失败 > >> > >> streaming file committer在提交分区之前会打印这样的日志: > >> > >> LOG.info("Partition {} of table {} is ready to be committed", > partSpec, tableIdentifier); > >> > >> partition commit policy会在成功提交分区以后打印这样的日志: > >> > >> LOG.info("Committed partition {} to metastore", partitionSpec); > >> > >> LOG.info("Committed partition {} with success file", > context.partitionSpec()); > >> > >> 可以检查一下这样的日志,看是不是卡在什么地方了 > >> > >> On Tue, Sep 8, 2020 at 11:02 AM 夏帅 <[hidden email]> > wrote: > >> > >>> 就第二次提供的日志看,好像是你的namenode出现的问题 > >>> > >>> > >>> > ------------------------------------------------------------------ > >>> 发件人:MuChen <[hidden email]> > >>> 发送时间:2020年9月8日(星期二) 10:56 > >>> 收件人:[hidden email] 夏帅 <[hidden email]>; > user-zh < > >>> [hidden email]> > >>> 主 题:回复: 回复:使用StreamingFileSink向hive metadata中增加分区部分失败 > >>> > >>> 在checkpoint失败的时间,tm上还有一些info和warn级别的日志: > >>> 2020-09-04 17:17:59,520 INFO > >>> org.apache.hadoop.io.retry.RetryInvocationHandler [] - > Exception while > >>> invoking create of class ClientNamenodeProtocolTranslatorPB > over > >>> uhadoop-op3raf-master2/10.42.52.202:8020 after 14 fail over > attempts. > >>> Trying to fail over immediately. > >>> java.io.IOException: java.lang.InterruptedException > >>> at > org.apache.hadoop.ipc.Client.call(Client.java:1449) > >>> ~[music_copyright-1.0-SNAPSHOT-jar-with-dependencies.jar:?] > >>> at > org.apache.hadoop.ipc.Client.call(Client.java:1401) > >>> ~[music_copyright-1.0-SNAPSHOT-jar-with-dependencies.jar:?] > >>> at > >>> > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232) > >>> ~[music_copyright-1.0-SNAPSHOT-jar-with-dependencies.jar:?] > >>> at > com.sun.proxy.$Proxy26.create(Unknown Source) ~[?:?] > >>> at > >>> > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.create(ClientNamenodeProtocolTranslatorPB.java:295) > >>> ~[music_copyright-1.0-SNAPSHOT-jar-with-dependencies.jar:?] > >>> at > sun.reflect.GeneratedMethodAccessor11.invoke(Unknown Source) > >>> ~[?:?] > >>> at > >>> > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > >>> ~[?:1.8.0_144] > >>> at > java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_144] > >>> at > >>> > org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187) > >>> ~[music_copyright-1.0-SNAPSHOT-jar-with-dependencies.jar:?] > >>> at > >>> > org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102) > >>> ~[music_copyright-1.0-SNAPSHOT-jar-with-dependencies.jar:?] > >>> at > com.sun.proxy.$Proxy27.create(Unknown Source) ~[?:?] > >>> at > >>> > org.apache.hadoop.hdfs.DFSOutputStream.newStreamForCreate(DFSOutputStream.java:1721) > >>> ~[music_copyright-1.0-SNAPSHOT-jar-with-dependencies.jar:?] > >>> at > org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1657) > >>> ~[music_copyright-1.0-SNAPSHOT-jar-with-dependencies.jar:?] 
> >>> at > org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1582) > >>> ~[music_copyright-1.0-SNAPSHOT-jar-with-dependencies.jar:?] > >>> at > >>> > org.apache.hadoop.hdfs.DistributedFileSystem$6.doCall(DistributedFileSystem.java:397) > >>> ~[music_copyright-1.0-SNAPSHOT-jar-with-dependencies.jar:?] > >>> at > >>> > org.apache.hadoop.hdfs.DistributedFileSystem$6.doCall(DistributedFileSystem.java:393) > >>> ~[music_copyright-1.0-SNAPSHOT-jar-with-dependencies.jar:?] > >>> at > >>> > org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) > >>> ~[music_copyright-1.0-SNAPSHOT-jar-with-dependencies.jar:?] > >>> at > >>> > org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:393) > >>> ~[music_copyright-1.0-SNAPSHOT-jar-with-dependencies.jar:?] > >>> at > >>> > org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:337) > >>> ~[music_copyright-1.0-SNAPSHOT-jar-with-dependencies.jar:?] > >>> at > org.apache.hadoop.fs.FileSystem.create(FileSystem.java:908) > >>> ~[music_copyright-1.0-SNAPSHOT-jar-with-dependencies.jar:?] > >>> at > org.apache.hadoop.fs.FileSystem.create(FileSystem.java:889) > >>> ~[music_copyright-1.0-SNAPSHOT-jar-with-dependencies.jar:?] > >>> at > org.apache.hadoop.fs.FileSystem.create(FileSystem.java:786) > >>> ~[music_copyright-1.0-SNAPSHOT-jar-with-dependencies.jar:?] > >>> at > >>> > org.apache.flink.hive.shaded.fs.hdfs.HadoopFileSystem.create(HadoopFileSystem.java:141) > >>> ~[flink-sql-connector-hive-1.2.2_2.11-1.11.0.jar:1.11.0] > >>> at > >>> > org.apache.flink.hive.shaded.fs.hdfs.HadoopFileSystem.create(HadoopFileSystem.java:37) > >>> ~[flink-sql-connector-hive-1.2.2_2.11-1.11.0.jar:1.11.0] > >>> at > >>> > org.apache.flink.table.filesystem.SuccessFileCommitPolicy.commit(SuccessFileCommitPolicy.java:45) > >>> ~[flink-table-blink_2.11-1.11.0.jar:1.11.0] > >>> at > >>> > org.apache.flink.table.filesystem.stream.StreamingFileCommitter.commitPartitions(StreamingFileCommitter.java:167) > >>> ~[flink-table-blink_2.11-1.11.0.jar:1.11.0] > >>> at > >>> > org.apache.flink.table.filesystem.stream.StreamingFileCommitter.processElement(StreamingFileCommitter.java:144) > >>> ~[flink-table-blink_2.11-1.11.0.jar:1.11.0] > >>> at > >>> > org.apache.flink.streaming.runtime.tasks.OneInputStreamTask$StreamTaskNetworkOutput.emitRecord(OneInputStreamTask.java:161) > >>> ~[music_copyright-1.0-SNAPSHOT-jar-with-dependencies.jar:?] > >>> at org.apache.flink.streaming. > runtime.io > .StreamTaskNetworkInput.processElement(StreamTaskNetworkInput.java:178) > >>> ~[music_copyright-1.0-SNAPSHOT-jar-with-dependencies.jar:?] > >>> at org.apache.flink.streaming. > runtime.io > .StreamTaskNetworkInput.emitNext(StreamTaskNetworkInput.java:153) > >>> ~[music_copyright-1.0-SNAPSHOT-jar-with-dependencies.jar:?] > >>> at org.apache.flink.streaming. > runtime.io > .StreamOneInputProcessor.processInput(StreamOneInputProcessor.java:67) > >>> ~[music_copyright-1.0-SNAPSHOT-jar-with-dependencies.jar:?] > >>> at > >>> > org.apache.flink.streaming.runtime.tasks.StreamTask.processInput(StreamTask.java:345) > >>> ~[music_copyright-1.0-SNAPSHOT-jar-with-dependencies.jar:?] > >>> at > >>> > org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxStep(MailboxProcessor.java:191) > >>> [music_copyright-1.0-SNAPSHOT-jar-with-dependencies.jar:?] 
> >>> at > >>> > org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxLoop(MailboxProcessor.java:181) > >>> [music_copyright-1.0-SNAPSHOT-jar-with-dependencies.jar:?] > >>> at > >>> > org.apache.flink.streaming.runtime.tasks.StreamTask.runMailboxLoop(StreamTask.java:558) > >>> [music_copyright-1.0-SNAPSHOT-jar-with-dependencies.jar:?] > >>> at > >>> > org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:530) > >>> [music_copyright-1.0-SNAPSHOT-jar-with-dependencies.jar:?] > >>> at > org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:721) > >>> [music_copyright-1.0-SNAPSHOT-jar-with-dependencies.jar:?] > >>> at > org.apache.flink.runtime.taskmanager.Task.run(Task.java:546) > >>> [music_copyright-1.0-SNAPSHOT-jar-with-dependencies.jar:?] > >>> at > java.lang.Thread.run(Thread.java:748) [?:1.8.0_144] > >>> Caused by: java.lang.InterruptedException > >>> at > java.util.concurrent.FutureTask.awaitDone(FutureTask.java:404) > >>> ~[?:1.8.0_144] > >>> at > java.util.concurrent.FutureTask.get(FutureTask.java:191) > >>> ~[?:1.8.0_144] > >>> at > >>> > org.apache.hadoop.ipc.Client$Connection.sendRpcRequest(Client.java:1048) > >>> ~[music_copyright-1.0-SNAPSHOT-jar-with-dependencies.jar:?] > >>> at > org.apache.hadoop.ipc.Client.call(Client.java:1443) > >>> ~[music_copyright-1.0-SNAPSHOT-jar-with-dependencies.jar:?] > >>> ... 38 more > >>> 2020-09-04 17:17:59,522 WARN > >>> org.apache.hadoop.io.retry.RetryInvocationHandler [] - > Exception while > >>> invoking class > >>> > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.create > >>> over uhadoop-op3raf-master1/10.42.31.63:8020. Not retrying > because > >>> failovers (15) exceeded maximum allowed (15) > >>> java.io.IOException: Failed on local exception: > >>> java.nio.channels.ClosedByInterruptException; Host Details : > local host is: > >>> "uhadoop-op3raf-core13/10.42.99.178"; destination host is: > >>> "uhadoop-op3raf-master1":8020; > >>> at org.apache.hadoop.net > .NetUtils.wrapException(NetUtils.java:772) > >>> ~[music_copyright-1.0-SNAPSHOT-jar-with-dependencies.jar:?] > >>> at > org.apache.hadoop.ipc.Client.call(Client.java:1474) > >>> ~[music_copyright-1.0-SNAPSHOT-jar-with-dependencies.jar:?] > >>> at > org.apache.hadoop.ipc.Client.call(Client.java:1401) > >>> ~[music_copyright-1.0-SNAPSHOT-jar-with-dependencies.jar:?] > >>> at > >>> > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232) > >>> ~[music_copyright-1.0-SNAPSHOT-jar-with-dependencies.jar:?] > >>> at > com.sun.proxy.$Proxy26.create(Unknown Source) ~[?:?] > >>> at > >>> > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.create(ClientNamenodeProtocolTranslatorPB.java:295) > >>> ~[music_copyright-1.0-SNAPSHOT-jar-with-dependencies.jar:?] > >>> at > sun.reflect.GeneratedMethodAccessor11.invoke(Unknown Source) > >>> ~[?:?] > >>> at > >>> > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > >>> ~[?:1.8.0_144] > >>> at > java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_144] > >>> at > >>> > org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187) > >>> ~[music_copyright-1.0-SNAPSHOT-jar-with-dependencies.jar:?] > >>> at > >>> > org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102) > >>> ~[music_copyright-1.0-SNAPSHOT-jar-with-dependencies.jar:?] > >>> at > com.sun.proxy.$Proxy27.create(Unknown Source) ~[?:?] 
> >>>     at org.apache.hadoop.hdfs.DFSOutputStream.newStreamForCreate(DFSOutputStream.java:1721) ~[music_copyright-1.0-SNAPSHOT-jar-with-dependencies.jar:?]
> >>>     at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1657) ~[music_copyright-1.0-SNAPSHOT-jar-with-dependencies.jar:?]
> >>>     at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1582) ~[music_copyright-1.0-SNAPSHOT-jar-with-dependencies.jar:?]
> >>>     at org.apache.hadoop.hdfs.DistributedFileSystem$6.doCall(DistributedFileSystem.java:397) ~[music_copyright-1.0-SNAPSHOT-jar-with-dependencies.jar:?]
> >>>     at org.apache.hadoop.hdfs.DistributedFileSystem$6.doCall(DistributedFileSystem.java:393) ~[music_copyright-1.0-SNAPSHOT-jar-with-dependencies.jar:?]
> >>>     at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) ~[music_copyright-1.0-SNAPSHOT-jar-with-dependencies.jar:?]
> >>>     at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:393) ~[music_copyright-1.0-SNAPSHOT-jar-with-dependencies.jar:?]
> >>>     at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:337) ~[music_copyright-1.0-SNAPSHOT-jar-with-dependencies.jar:?]
> >>>     at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:908) ~[music_copyright-1.0-SNAPSHOT-jar-with-dependencies.jar:?]
> >>>     at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:889) ~[music_copyright-1.0-SNAPSHOT-jar-with-dependencies.jar:?]
> >>>     at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:786) ~[music_copyright-1.0-SNAPSHOT-jar-with-dependencies.jar:?]
> >>>     at org.apache.flink.hive.shaded.fs.hdfs.HadoopFileSystem.create(HadoopFileSystem.java:141) ~[flink-sql-connector-hive-1.2.2_2.11-1.11.0.jar:1.11.0]
> >>>     at org.apache.flink.hive.shaded.fs.hdfs.HadoopFileSystem.create(HadoopFileSystem.java:37) ~[flink-sql-connector-hive-1.2.2_2.11-1.11.0.jar:1.11.0]
> >>>     at org.apache.flink.table.filesystem.SuccessFileCommitPolicy.commit(SuccessFileCommitPolicy.java:45) ~[flink-table-blink_2.11-1.11.0.jar:1.11.0]
> >>>     at org.apache.flink.table.filesystem.stream.StreamingFileCommitter.commitPartitions(StreamingFileCommitter.java:167) ~[flink-table-blink_2.11-1.11.0.jar:1.11.0]
> >>>     at org.apache.flink.table.filesystem.stream.StreamingFileCommitter.processElement(StreamingFileCommitter.java:144) ~[flink-table-blink_2.11-1.11.0.jar:1.11.0]
> >>>     at org.apache.flink.streaming.runtime.tasks.OneInputStreamTask$StreamTaskNetworkOutput.emitRecord(OneInputStreamTask.java:161) ~[music_copyright-1.0-SNAPSHOT-jar-with-dependencies.jar:?]
> >>>     at org.apache.flink.streaming.runtime.io.StreamTaskNetworkInput.processElement(StreamTaskNetworkInput.java:178) ~[music_copyright-1.0-SNAPSHOT-jar-with-dependencies.jar:?]
> >>>     at org.apache.flink.streaming.runtime.io.StreamTaskNetworkInput.emitNext(StreamTaskNetworkInput.java:153) ~[music_copyright-1.0-SNAPSHOT-jar-with-dependencies.jar:?]
> >>>     at org.apache.flink.streaming.runtime.io.StreamOneInputProcessor.processInput(StreamOneInputProcessor.java:67) ~[music_copyright-1.0-SNAPSHOT-jar-with-dependencies.jar:?]
> >>>     at org.apache.flink.streaming.runtime.tasks.StreamTask.processInput(StreamTask.java:345) ~[music_copyright-1.0-SNAPSHOT-jar-with-dependencies.jar:?]
> >>>     at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxStep(MailboxProcessor.java:191) [music_copyright-1.0-SNAPSHOT-jar-with-dependencies.jar:?]
> >>>     at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxLoop(MailboxProcessor.java:181) [music_copyright-1.0-SNAPSHOT-jar-with-dependencies.jar:?]
> >>>     at org.apache.flink.streaming.runtime.tasks.StreamTask.runMailboxLoop(StreamTask.java:558) [music_copyright-1.0-SNAPSHOT-jar-with-dependencies.jar:?]
> >>>     at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:530) [music_copyright-1.0-SNAPSHOT-jar-with-dependencies.jar:?]
> >>>     at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:721) [music_copyright-1.0-SNAPSHOT-jar-with-dependencies.jar:?]
> >>>     at org.apache.flink.runtime.taskmanager.Task.run(Task.java:546) [music_copyright-1.0-SNAPSHOT-jar-with-dependencies.jar:?]
> >>>     at java.lang.Thread.run(Thread.java:748) [?:1.8.0_144]
> >>> Caused by: java.nio.channels.ClosedByInterruptException
> >>>     at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:202) ~[?:1.8.0_144]
> >>>     at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:659) ~[?:1.8.0_144]
> >>>     at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:192) ~[music_copyright-1.0-SNAPSHOT-jar-with-dependencies.jar:?]
> >>>     at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:530) ~[music_copyright-1.0-SNAPSHOT-jar-with-dependencies.jar:?]
> >>>     at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:494) ~[music_copyright-1.0-SNAPSHOT-jar-with-dependencies.jar:?]
> >>>     at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:609) ~[music_copyright-1.0-SNAPSHOT-jar-with-dependencies.jar:?]
> >>>     at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:707) ~[music_copyright-1.0-SNAPSHOT-jar-with-dependencies.jar:?]
> >>>     at org.apache.hadoop.ipc.Client$Connection.access$2800(Client.java:370) ~[music_copyright-1.0-SNAPSHOT-jar-with-dependencies.jar:?]
> >>>     at org.apache.hadoop.ipc.Client.getConnection(Client.java:1523) ~[music_copyright-1.0-SNAPSHOT-jar-with-dependencies.jar:?]
> >>>     at org.apache.hadoop.ipc.Client.call(Client.java:1440) ~[music_copyright-1.0-SNAPSHOT-jar-with-dependencies.jar:?]
> >>>     ... 38 more
> >>>
> >>> Additional note: the job has been run several times; every run fails to create some of the partitions, and the partitions that fail differ from run to run.
> >>>
> >>> ------------------ Original message ------------------
> >>> From: "[hidden email] 夏帅" <[hidden email]>;
> >>> Sent: Tuesday, September 8, 2020, 10:47 AM
> >>> To: "user-zh" <[hidden email]>; "MuChen" <[hidden email]>;
> >>> Subject: Re: Using StreamingFileSink: some partitions fail to be added to the hive metadata
> >>>
> >>> Are these the only exception logs? Is there anything more detailed?
> >>
> >> --
> >> Best regards!
> >> Rui Li
> >
> > --
> > Best regards!
> > Rui Li
>
> --
> Best regards!
> Rui Li

--
Best,
Jingsong Lee
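For readers hitting the same symptom discussed above (data files present under the partition directories on HDFS, but the partitions missing from the Hive metastore because the commit step failed), the missing partitions can be registered by hand until the commit failure is resolved. The sketch below is in Hive SQL; my_db.my_table, the dt/hour partition columns, the partition values, and the warehouse path are all hypothetical placeholders, not the actual objects from this thread:

    -- Register one missing partition explicitly; IF NOT EXISTS keeps the statement idempotent.
    -- Table name, partition spec and location are placeholders.
    ALTER TABLE my_db.my_table ADD IF NOT EXISTS
      PARTITION (dt='2020-01-01', hour='00')
      LOCATION '/user/hive/warehouse/my_db.db/my_table/dt=2020-01-01/hour=00';

    -- Alternatively, let Hive scan the table location and register every partition
    -- directory that exists on HDFS but is absent from the metastore.
    MSCK REPAIR TABLE my_db.my_table;

Either statement only updates the metastore; it does not move or rewrite the data files the Flink job already produced.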