使用Flink Array Field Type

classic Classic list List threaded Threaded
3 messages Options
Reply | Threaded
Open this post in threaded view
|

使用Flink Array Field Type

叶贤勋
Flink 1.10.0
问题描述:source表中有个test_array_string ARRAY<VARCHAR>字段,在DML语句用test_array_string[0]获取数组中的值会报数组越界异常。另外测试过Array<Varchar>也是相同错误,Array<int>,Array<bigint>等类型也会报数组越界问题。
请问这是Flink1.10的bug吗?


SQL:
CREATETABLE source (
……
test_array_string ARRAY<VARCHAR>
) WITH (
'connector.type'='kafka',
'update-mode'='append',
'format.type'='json'
  ……
);


CREATETABLE sink(
v_string string
) WITH (
  ……
);


INSERTINTO
sink
SELECT
test_array_string[0] as v_string
from
source;


kafka样例数据:{"id":1,"test_array_string":["ff”]}


Flink 执行的时候报以下错误:
java.lang.ArrayIndexOutOfBoundsException: 33554432
    at org.apache.flink.table.runtime.util.SegmentsUtil.getByteMultiSegments(SegmentsUtil.java:598)
    at org.apache.flink.table.runtime.util.SegmentsUtil.getByte(SegmentsUtil.java:590)
    at org.apache.flink.table.runtime.util.SegmentsUtil.bitGet(SegmentsUtil.java:534)
    at org.apache.flink.table.dataformat.BinaryArray.isNullAt(BinaryArray.java:117)
    at StreamExecCalc$9.processElement(UnknownSource)
    at org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.pushToOperator(OperatorChain.java:641)
    at org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:616)
    at org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:596)
    at org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:730)
    at org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:708)
    at SourceConversion$1.processElement(UnknownSource)
    at org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.pushToOperator(OperatorChain.java:641)
    at org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:616)
    at org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:596)
    at org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:730)
    at org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:708)
    at org.apache.flink.streaming.api.operators.StreamSourceContexts$ManualWatermarkContext.processAndCollectWithTimestamp(StreamSourceContexts.java:310)
    at org.apache.flink.streaming.api.operators.StreamSourceContexts$WatermarkContext.collectWithTimestamp(StreamSourceContexts.java:409)
    at org.apache.flink.streaming.connectors.kafka.internals.AbstractFetcher.emitRecordWithTimestamp(AbstractFetcher.java:408)
    at org.apache.flink.streaming.connectors.kafka.internal.KafkaFetcher.emitRecord(KafkaFetcher.java:185)
    at org.apache.flink.streaming.connectors.kafka.internal.KafkaFetcher.runFetchLoop(KafkaFetcher.java:150)
    at org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumerBase.run(FlinkKafkaConsumerBase.java:715)
    at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:100)
    at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:63)
    at org.apache.flink.streaming.runtime.tasks.SourceStreamTask$LegacySourceFunctionThread.run(SourceStreamTask.java:196)


| |
叶贤勋
|
|
[hidden email]
|
签名由网易邮箱大师定制

Reply | Threaded
Open this post in threaded view
|

Re: 使用Flink Array Field Type

Leonard Xu
Hi,

SQL 中数据下标是从1开始的,不是从0,所以会有数组越界问题。建议使用数组时通过 select arr[5] from T where CARDINALITY(arr) >= 5 这种方式防止数组访问越界。


祝好,
Leonard Xu

> 在 2020年7月13日,20:34,叶贤勋 <[hidden email]> 写道:
>
> test_array_string[0]

Reply | Threaded
Open this post in threaded view
|

回复: 使用Flink Array Field Type

叶贤勋
谢谢 Leonard的解答。刚刚也看到了这个jira单[1]


[1] https://issues.apache.org/jira/browse/FLINK-17847
| |
叶贤勋
|
|
[hidden email]
|
签名由网易邮箱大师定制


在2020年07月13日 20:50,Leonard Xu<[hidden email]> 写道:
Hi,

SQL 中数据下标是从1开始的,不是从0,所以会有数组越界问题。建议使用数组时通过 select arr[5] from T where CARDINALITY(arr) >= 5 这种方式防止数组访问越界。


祝好,
Leonard Xu

在 2020年7月13日,20:34,叶贤勋 <[hidden email]> 写道:

test_array_string[0]