Apache Flink 中文用户邮件列表

Flink1.12 批处理模式，分词统计时单词个数为1的单词不会被打印

Classic

List

Threaded

2 messages Options

xhyan0427

Flink1.12 批处理模式，分词统计时单词个数为1的单词不会被打印

代码：
val env = StreamExecutionEnvironment.getExecutionEnvironment

env.setRuntimeMode(RuntimeExecutionMode.BATCH) // 在DataStream
API上以批处理方式执行

// 本地测试文件
val inputStream =
env.readTextFile(getClass.getResource("/hello.txt").getPath)

// 分词统计，问题：批处理模式的时候，sum 为 1 的单词不会被打印
val resultStream = inputStream
.flatMap(_.split(","))
.filter(_.nonEmpty)
.map((_, 1))
.keyBy(_._1)
.sum(1)
resultStream.print()
env.execute("word count")

测试文件的数据内容：
hello,flink
hello,flink
hello,hive
hello,hive
hello,hbase
hello,hbase
hello,scala
hello,kafka
hello,kafka

测试结果：hello/flink/hive/hbase/kafka的和大于1，会打印出来；但是 scala的个数为1，不会被打印出来

--
Sent from: http://apache-flink.147419.n8.nabble.com/

Xintong Song

Re: Flink1.12 批处理模式，分词统计时单词个数为1的单词不会被打印

你用的应该是 1.12.0 版本吧。这是一个已知问题 [1]，升级到 1.12.1 有修复。

Thank you~

Xintong Song

[1] https://issues.apache.org/jira/browse/FLINK-20764

On Thu, Jan 28, 2021 at 4:55 PM xhyan0427 <[hidden email]> wrote:

> 代码：
> val env = StreamExecutionEnvironment.getExecutionEnvironment
>
> env.setRuntimeMode(RuntimeExecutionMode.BATCH) // 在DataStream
> API上以批处理方式执行
>
> // 本地测试文件
> val inputStream =
> env.readTextFile(getClass.getResource("/hello.txt").getPath)
>
> // 分词统计，问题：批处理模式的时候，sum 为 1 的单词不会被打印
> val resultStream = inputStream
> .flatMap(_.split(","))
> .filter(_.nonEmpty)
> .map((_, 1))
> .keyBy(_._1)
> .sum(1)
> resultStream.print()
> env.execute("word count")
>
> 测试文件的数据内容：
> hello,flink
> hello,flink
> hello,hive
> hello,hive
> hello,hbase
> hello,hbase
> hello,scala
> hello,kafka
> hello,kafka
>
>
> 测试结果：hello/flink/hive/hbase/kafka的和大于1，会打印出来；但是 scala的个数为1，不会被打印出来
>
>
>
> --
> Sent from: http://apache-flink.147419.n8.nabble.com/
>