flink-1.11 DDL kafka-to-hive issue


flink-1.11 DDL kafka-to-hive issue

kcz
hive-1.2.1
The checkpoints have completed successfully (I checked the checkpoint directory and there really is checkpoint data, and Kafka has data as well), but the Hive table has no data. Am I missing something?
String hiveSql = "CREATE  TABLE  stream_tmp.fs_table (\n" +
        "  host STRING,\n" +
        "  url STRING," +
        "  public_date STRING" +
        ") partitioned by (public_date string) " +
        "stored as PARQUET " +
        "TBLPROPERTIES (\n" +
        "  'sink.partition-commit.delay'='0 s',\n" +
        "  'sink.partition-commit.trigger'='partition-time',\n" +
        "  'sink.partition-commit.policy.kind'='metastore,success-file'" +
        ")";
tableEnv.executeSql(hiveSql);


tableEnv.executeSql("INSERT INTO  stream_tmp.fs_table SELECT host, url, DATE_FORMAT(public_date, 'yyyy-MM-dd') FROM stream_tmp.source_table");

Re: flink-1.11 DDL kafka-to-hive issue

Jark
Administrator
Have you tried configuring the rolling policy?
https://ci.apache.org/projects/flink/flink-docs-release-1.11/dev/table/connectors/filesystem.html#sink-rolling-policy-rollover-interval

Best,
Jark
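
For reference, a minimal sketch of how the rolling-policy options from the filesystem connector page linked above could be added to the DDL from the first message (the interval values are placeholders, not recommendations; the partition column is listed only in PARTITIONED BY here, since Hive does not allow it in the regular column list as well):

String hiveSql = "CREATE TABLE stream_tmp.fs_table (\n" +
        "  host STRING,\n" +
        "  url STRING\n" +
        ") PARTITIONED BY (public_date STRING) " +
        "STORED AS PARQUET " +
        "TBLPROPERTIES (\n" +
        // roll part files periodically instead of waiting for the default size/time thresholds
        "  'sink.rolling-policy.rollover-interval'='1 min',\n" +
        "  'sink.rolling-policy.check-interval'='1 min',\n" +
        "  'sink.partition-commit.delay'='0 s',\n" +
        "  'sink.partition-commit.trigger'='partition-time',\n" +
        "  'sink.partition-commit.policy.kind'='metastore,success-file'\n" +
        ")";
tableEnv.executeSql(hiveSql);

With a bulk format such as Parquet, in-progress files are only finalized and the partition only committed after a checkpoint completes, so the checkpoint interval also bounds how quickly data can become visible in Hive.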

On Tue, 21 Jul 2020 at 20:38, JasonLee <[hidden email]> wrote:

> hi
> Has the Hive table never had any data at all, or does data show up after a while?
>
>
> JasonLee
> Email: [hidden email]
>
> On 2020-07-21 19:09, kcz wrote:
> hive-1.2.1
> The checkpoints have completed successfully (the checkpoint directory does contain checkpoint data, and Kafka has data as well), but the Hive table has no data. Am I missing something?
> String hiveSql = "CREATE  TABLE  stream_tmp.fs_table (\n" +
>        "  host STRING,\n" +
>        "  url STRING," +
>        "  public_date STRING" +
>        ") partitioned by (public_date string) " +
>        "stored as PARQUET " +
>        "TBLPROPERTIES (\n" +
>        "  'sink.partition-commit.delay'='0 s',\n" +
>        "  'sink.partition-commit.trigger'='partition-time',\n" +
>        "  'sink.partition-commit.policy.kind'='metastore,success-file'" +
>        ")";
> tableEnv.executeSql(hiveSql);
>
>
> tableEnv.executeSql("INSERT INTO  stream_tmp.fs_table SELECT host, url,
> DATE_FORMAT(public_date, 'yyyy-MM-dd') FROM stream_tmp.source_table");

Re: flink-1.11 DDL kafka-to-hive issue

kcz
In reply to this post by kcz
There has never been any data, and I can't tell what is wrong. The Hive table does already exist. I tested writing to HDFS via DDL and that works fine.





------------------ Original message ------------------
From: JasonLee <[hidden email]>
Sent: 2020-07-21 20:39
To: user-zh <[hidden email]>
Subject: Re: flink-1.11 DDL kafka-to-hive issue



hi
Has the Hive table never had any data at all, or does data show up after a while?

JasonLee
Email: [hidden email]


Re: flink-1.11 DDL kafka-to-hive issue

Leonard Xu
Hi,

Was the Hive table created in Flink? If so, did you use the Hive dialect when creating it? You can refer to [1] for how to set it up.

Best
Leonard Xu
[1] https://ci.apache.org/projects/flink/flink-docs-master/dev/table/hive/hive_dialect.html#use-hive-dialect
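
As a minimal sketch of the dialect switch described in [1] (this assumes tableEnv already has a HiveCatalog registered and selected, and that hiveSql is the CREATE TABLE string from the first message):

import org.apache.flink.table.api.SqlDialect;

// Hive-style DDL (PARTITIONED BY ... STORED AS PARQUET ... TBLPROPERTIES) needs the Hive dialect
tableEnv.getConfig().setSqlDialect(SqlDialect.HIVE);
tableEnv.executeSql(hiveSql);
// switch back for regular Flink SQL, e.g. the Kafka source DDL and the INSERT
tableEnv.getConfig().setSqlDialect(SqlDialect.DEFAULT);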

> On 21 Jul 2020, at 22:57, kcz <[hidden email]> wrote:
>
> There has never been any data, and I can't tell what is wrong. The Hive table does already exist. I tested writing to HDFS via DDL and that works fine.


Re: flink-1.11 DDL kafka-to-hive issue

Jingsong Li
How is your source table defined? Are you sure the watermark is advancing? (You can check this in the Flink UI.)

Try removing 'sink.partition-commit.trigger'='partition-time' and see if that helps?

Best,
Jingsong
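
For context, a rough sketch of a Kafka source table with a watermark, which is what the 'partition-time' trigger needs in order to commit partitions (the topic, bootstrap servers, group id and format are made-up placeholders, and modeling public_date as TIMESTAMP(3) is an assumption based on the DATE_FORMAT call in the INSERT):

String sourceSql = "CREATE TABLE stream_tmp.source_table (\n" +
        "  host STRING,\n" +
        "  url STRING,\n" +
        "  public_date TIMESTAMP(3),\n" +
        // the watermark is what lets 'partition-time' commits fire
        "  WATERMARK FOR public_date AS public_date - INTERVAL '5' SECOND\n" +
        ") WITH (\n" +
        "  'connector' = 'kafka',\n" +
        "  'topic' = 'some_topic',\n" +
        "  'properties.bootstrap.servers' = 'localhost:9092',\n" +
        "  'properties.group.id' = 'test_group',\n" +
        "  'scan.startup.mode' = 'latest-offset',\n" +
        "  'format' = 'json'\n" +
        ")";
tableEnv.executeSql(sourceSql);

If there is no watermark, or it never advances, 'partition-time' commits never fire; removing the trigger property falls back to the default 'process-time' trigger, which commits based on wall-clock time instead.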

On Wed, Jul 22, 2020 at 12:02 AM Leonard Xu <[hidden email]> wrote:

> Hi,
>
> Was the Hive table created in Flink? If so, did you use the Hive dialect when creating it? You can refer to [1] for how to set it up.
>
> Best
> Leonard Xu
> [1]
> https://ci.apache.org/projects/flink/flink-docs-master/dev/table/hive/hive_dialect.html#use-hive-dialect

--
Best, Jingsong Lee