Flink 1.10.0
Problem description: The source table has a field test_array_string ARRAY<VARCHAR>. Accessing an element with test_array_string[0] in a DML statement throws an ArrayIndexOutOfBoundsException. I also tested Array<Varchar>, and Array<int>, Array<bigint> and similar element types all fail with the same out-of-bounds error. Is this a bug in Flink 1.10?

SQL:

CREATE TABLE source (
  ……
  test_array_string ARRAY<VARCHAR>
) WITH (
  'connector.type' = 'kafka',
  'update-mode' = 'append',
  'format.type' = 'json'
  ……
);

CREATE TABLE sink (
  v_string STRING
) WITH (
  ……
);

INSERT INTO sink
SELECT test_array_string[0] AS v_string FROM source;

Sample Kafka record: {"id":1,"test_array_string":["ff"]}

Flink fails at runtime with the following error:

java.lang.ArrayIndexOutOfBoundsException: 33554432
    at org.apache.flink.table.runtime.util.SegmentsUtil.getByteMultiSegments(SegmentsUtil.java:598)
    at org.apache.flink.table.runtime.util.SegmentsUtil.getByte(SegmentsUtil.java:590)
    at org.apache.flink.table.runtime.util.SegmentsUtil.bitGet(SegmentsUtil.java:534)
    at org.apache.flink.table.dataformat.BinaryArray.isNullAt(BinaryArray.java:117)
    at StreamExecCalc$9.processElement(Unknown Source)
    at org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.pushToOperator(OperatorChain.java:641)
    at org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:616)
    at org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:596)
    at org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:730)
    at org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:708)
    at SourceConversion$1.processElement(Unknown Source)
    at org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.pushToOperator(OperatorChain.java:641)
    at org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:616)
    at org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:596)
    at org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:730)
    at org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:708)
    at org.apache.flink.streaming.api.operators.StreamSourceContexts$ManualWatermarkContext.processAndCollectWithTimestamp(StreamSourceContexts.java:310)
    at org.apache.flink.streaming.api.operators.StreamSourceContexts$WatermarkContext.collectWithTimestamp(StreamSourceContexts.java:409)
    at org.apache.flink.streaming.connectors.kafka.internals.AbstractFetcher.emitRecordWithTimestamp(AbstractFetcher.java:408)
    at org.apache.flink.streaming.connectors.kafka.internal.KafkaFetcher.emitRecord(KafkaFetcher.java:185)
    at org.apache.flink.streaming.connectors.kafka.internal.KafkaFetcher.runFetchLoop(KafkaFetcher.java:150)
    at org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumerBase.run(FlinkKafkaConsumerBase.java:715)
    at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:100)
    at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:63)
    at org.apache.flink.streaming.runtime.tasks.SourceStreamTask$LegacySourceFunctionThread.run(SourceStreamTask.java:196)

叶贤勋
[hidden email]
Hi,
In SQL, array indices start at 1, not 0, so indexing with 0 causes the out-of-bounds error. When accessing array elements, it is best to guard against out-of-range access, e.g. select arr[5] from T where CARDINALITY(arr) >= 5.

Best,
Leonard Xu

> On Jul 13, 2020, at 20:34, 叶贤勋 <[hidden email]> wrote:
>
> test_array_string[0]
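Applied to the tables from the original post, Leonard's advice would look roughly like this (a sketch; the table and field names come from the question above):

```sql
-- Flink SQL arrays are 1-based: test_array_string[1] is the first element.
-- The CARDINALITY guard skips rows whose array is too short to index safely.
INSERT INTO sink
SELECT test_array_string[1] AS v_string
FROM source
WHERE CARDINALITY(test_array_string) >= 1;
```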
Thanks for the explanation, Leonard. I just found the corresponding JIRA ticket as well [1].
[1] https://issues.apache.org/jira/browse/FLINK-17847

叶贤勋
[hidden email]