Here is what we are seeing:
1. A Flink job writes data in real time to the HDFS directory /warehouse/rt_ods/ed_cell_num_info/, partitioned by day.
2. We also run a scheduled task that creates the T-1 partition in Hive; for example, the run at 6:00 today creates yesterday's partition (a sketch of such a statement follows the sink DDL below).
3. The Flink job then fails, complaining that an inprogress file cannot be found:

```
org.apache.flink.streaming.runtime.tasks.AsynchronousException: Caught exception while processing timer.
	at org.apache.flink.streaming.runtime.tasks.StreamTask$StreamTaskAsyncExceptionHandler.handleAsyncException(StreamTask.java:1117)
	at org.apache.flink.streaming.runtime.tasks.StreamTask.handleAsyncException(StreamTask.java:1091)
	at org.apache.flink.streaming.runtime.tasks.StreamTask.invokeProcessingTimeCallback(StreamTask.java:1222)
	at org.apache.flink.streaming.runtime.tasks.StreamTask.lambda$null$16(StreamTask.java:1211)
	at org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$SynchronizedStreamTaskActionExecutor.runThrowing(StreamTaskActionExecutor.java:92)
	at org.apache.flink.streaming.runtime.tasks.mailbox.Mail.run(Mail.java:78)
	at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.processMail(MailboxProcessor.java:282)
	at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxStep(MailboxProcessor.java:190)
	at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxLoop(MailboxProcessor.java:181)
	at org.apache.flink.streaming.runtime.tasks.StreamTask.runMailboxLoop(StreamTask.java:566)
	at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:536)
	at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:721)
	at org.apache.flink.runtime.taskmanager.Task.run(Task.java:546)
	at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.flink.streaming.runtime.tasks.TimerException: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException): No lease on /warehouse/rt_ods/ed_cell_num_info/pt=20210311/.part-80f4307d-b910-42fa-8500-2c1226c5a879-0-57.inprogress.94c24d6c-e6f7-4387-b2e2-e667a44b23f6 (inode 2792145632): File does not exist. [Lease. Holder: DFSClient_NONMAPREDUCE_-1421245761_79, pending creates: 2]
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:3698)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFileInternal(FSNamesystem.java:3785)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFile(FSNamesystem.java:3755)
	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.complete(NameNodeRpcServer.java:745)
	at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.complete(AuthorizationProviderProxyClientProtocol.java:245)
	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.complete(ClientNamenodeProtocolServerSideTranslatorPB.java:540)
	--
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1917)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2211)
	... 12 common frames omitted
```

As the stack trace shows, the file that cannot be found is an inprogress file under the pt=20210311 partition.

The sink table DDL is:

```
CREATE TABLE T_ED_CELL_NUM_INFO_SINK (
    ....
    pt STRING
) PARTITIONED BY (pt) WITH (
    'connector' = 'filesystem',
    'path' = 'hdfs://xxxx/warehouse/rt_ods/ed_cell_num_info',
    'format' = 'csv',
    'csv.field-delimiter' = U&'\0009',
    'csv.disable-quote-character' = 'true',
    'sink.rolling-policy.file-size' = '200m',
    'sink.rolling-policy.rollover-interval' = '15min',
    'sink.rolling-policy.check-interval' = '1min'
);
```
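For context on step 2, the scheduled task issues something along the lines of the following HiveQL. This is only an illustrative sketch: the Hive table name rt_ods.ed_cell_num_info is assumed from the HDFS path, and the partition value is taken from the failing partition in the stack trace; the real task computes the T-1 date at runtime.

```
-- Illustrative sketch of the scheduled T-1 partition-creation task.
-- Assumed Hive table name rt_ods.ed_cell_num_info; the actual job
-- substitutes yesterday's date for the pt value shown here.
ALTER TABLE rt_ods.ed_cell_num_info
  ADD IF NOT EXISTS PARTITION (pt = '20210311')
  LOCATION '/warehouse/rt_ods/ed_cell_num_info/pt=20210311';
```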