Hi all, on Flink 1.10, a batch SQL job passes syntax validation, but when the script actually runs it fails with the error below. The same script runs fine on Hive, so I don't understand why Flink throws an array-index-out-of-bounds error. What could the problem be?
The image doesn't show up.
Flink's built-in UDFs differ from Hive's UDFs; for some of them, indices start from 1.
If indexing started from 1, I would expect the exception at compile time, not only after the job is submitted and running.
Caused by: java.lang.ArrayIndexOutOfBoundsException: 22369621
	at org.apache.flink.table.runtime.util.SegmentsUtil.getByteMultiSegments(SegmentsUtil.java:598)
	at org.apache.flink.table.runtime.util.SegmentsUtil.getByte(SegmentsUtil.java:590)
	at org.apache.flink.table.runtime.util.SegmentsUtil.bitGet(SegmentsUtil.java:534)
	at org.apache.flink.table.dataformat.BinaryArray.isNullAt(BinaryArray.java:117)
	at BatchCalc$822.processElement(Unknown Source)
	at org.apache.flink.streaming.runtime.tasks.OperatorChain$ChainingOutput.pushToOperator(OperatorChain.java:550)
	at org.apache.flink.streaming.runtime.tasks.OperatorChain$ChainingOutput.collect(OperatorChain.java:527)
	at org.apache.flink.streaming.runtime.tasks.OperatorChain$ChainingOutput.collect(OperatorChain.java:487)
	at org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingBroadcastingOutputCollector.collect(OperatorChain.java:748)
	at org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingBroadcastingOutputCollector.collect(OperatorChain.java:734)
	at org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:730)
	at org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:708)
	at org.apache.flink.streaming.api.operators.StreamSourceContexts$NonTimestampContext.collect(StreamSourceContexts.java:104)
	at org.apache.flink.streaming.api.functions.source.InputFormatSourceFunction.run(InputFormatSourceFunction.java:93)
	at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:100)
	at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:63)
Array length is a runtime property; the compiler does not know it at compile time. Also, as far as I know there is currently no check that the index is even valid (for example, that it must be greater than 0). We have run into this kind of problem often as well.
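To illustrate the point above with a minimal sketch in plain Java (not Flink's generated code): an out-of-range array access compiles without any warning, because bounds are checked only when the access actually executes.

```java
public class RuntimeBoundsDemo {
    public static void main(String[] args) {
        int[] arr = {10, 20, 30};
        // Index computed at runtime, e.g. derived from input data.
        int idx = Integer.parseInt("5");

        // This compiles fine: array bounds are not a compile-time property.
        try {
            System.out.println(arr[idx]);
        } catch (ArrayIndexOutOfBoundsException e) {
            // The JVM raises the exception only at the moment of access.
            System.out.println("caught: " + e.getMessage());
        }
    }
}
```

The same holds for SQL-generated operators: the planner validates types and syntax, but an index that is out of range for the actual data can only surface at runtime.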
--
Benchao Li
School of Electronics Engineering and Computer Science, Peking University
Tel: +86-15650713730
Email: [hidden email]; [hidden email]
Hi, allanqinjy
Throwing an ArrayIndexOutOfBoundsException at runtime is not expected behavior; this looks like a bug. If you can reproduce it, could you share the SQL and data that reproduce the problem?

Best,
Leonard Xu
Hi, allanqinjy
Could you paste the query? While investigating another issue today I ran into this same problem and created an issue to track it [1]; I would like to check whether it has the same root cause.

Best,
Leonard

[1] https://issues.apache.org/jira/browse/FLINK-17847