开发者好:
目前有此场景:算不同部门的总收入和所有部门的总收入,打算把两部分SQL union all ,但是实际情况发现 union all的时候会丢一部分数据,要么是各个部门的数据少了,要么是所有部门的总收入少了 如果把union all 的两段SQL 分别独立出来,插入同一张表,那么数据就是正常的,不知道是否是bug还是使用方法不对 原sql : insert into dws_XXXX select 0 as id ,cast (DATE_FORMAT(LOCALTIMESTAMP, 'yyyyMMddHH') as bigint) as ftime ,case when dept_name like '%XX%' then 'X1' when dept_name = 'xXX' then 'X2' else 'X3' end as paytype ,count(orderid) as paynum_h ,round(sum(amt)) as paymoney_h from dwd_XXX where write_time >=DATE_FORMAT(LOCALTIMESTAMP, 'yyyy-MM-dd') group by DATE_FORMAT(LOCALTIMESTAMP, 'yyyyMMddHH'), case when dept_name like '%XX%' then 'X1' when dept_name = 'xXX' then 'X2' else 'X3' end ; union all select 0 as id ,cast (DATE_FORMAT(LOCALTIMESTAMP, 'yyyyMMddHH') as int) as ftime ,'all' as paytype ,count(orderid) as paynum_h ,round(sum(amt)) as paymoney_h from dwd_XXX where write_time >=DATE_FORMAT(LOCALTIMESTAMP, 'yyyy-MM-dd') group by DATE_FORMAT(LOCALTIMESTAMP, 'yyyyMMddHH') ; |
你的union all上面,也就是insert into 的第一条select 末尾有 分号 ‘;’,, 这样的话我感觉第二条select是不会执行的。还有另一个问题是你把分号去掉,我感觉数据会被 覆盖的。因为我最近也有使用union all的场景。
我觉得你应该这样组装 你的sql : ```sql Insert into xxx Select d1, d2, count(1) From ( Select * from a Union all Select * from b, ) Group by d1, d2 ``` 发送自 Windows 10 版邮件<https://go.microsoft.com/fwlink/?LinkId=550986>应用 发件人: 夜思流年梦<mailto:[hidden email]> 发送时间: 2020年11月4日 18:21 收件人: [hidden email]<mailto:[hidden email]> 主题: union all 丢失部分数据 开发者好: 目前有此场景:算不同部门的总收入和所有部门的总收入,打算把两部分SQL union all ,但是实际情况发现 union all的时候会丢一部分数据,要么是各个部门的数据少了,要么是所有部门的总收入少了 如果把union all 的两段SQL 分别独立出来,插入同一张表,那么数据就是正常的,不知道是否是bug还是使用方法不对 原sql : insert into dws_XXXX select 0 as id ,cast (DATE_FORMAT(LOCALTIMESTAMP, 'yyyyMMddHH') as bigint) as ftime ,case when dept_name like '%XX%' then 'X1' when dept_name = 'xXX' then 'X2' else 'X3' end as paytype ,count(orderid) as paynum_h ,round(sum(amt)) as paymoney_h from dwd_XXX where write_time >=DATE_FORMAT(LOCALTIMESTAMP, 'yyyy-MM-dd') group by DATE_FORMAT(LOCALTIMESTAMP, 'yyyyMMddHH'), case when dept_name like '%XX%' then 'X1' when dept_name = 'xXX' then 'X2' else 'X3' end ; union all select 0 as id ,cast (DATE_FORMAT(LOCALTIMESTAMP, 'yyyyMMddHH') as int) as ftime ,'all' as paytype ,count(orderid) as paynum_h ,round(sum(amt)) as paymoney_h from dwd_XXX where write_time >=DATE_FORMAT(LOCALTIMESTAMP, 'yyyy-MM-dd') group by DATE_FORMAT(LOCALTIMESTAMP, 'yyyyMMddHH') ; |
In reply to this post by 夜思流年梦
Hi liaobiao,
你的 flink 版本是什么呢? 根据你的 SQL,如果是版本是 <= 1.10 的话,会根据 MetaDataHander 识别出你的 group by 后面的 key 作为 upsert key,这样就会产生覆盖的情况。 你看下结果是否是这种情况的? Best, Hailong Wang 在 2020-11-04 17:20:23,"夜思流年梦" <[hidden email]> 写道: >开发者好: > 目前有此场景:算不同部门的总收入和所有部门的总收入,打算把两部分SQL union all ,但是实际情况发现 union all的时候会丢一部分数据,要么是各个部门的数据少了,要么是所有部门的总收入少了 > 如果把union all 的两段SQL 分别独立出来,插入同一张表,那么数据就是正常的,不知道是否是bug还是使用方法不对 > > > > >原sql : > > >insert into dws_XXXX > > >select >0 as id >,cast (DATE_FORMAT(LOCALTIMESTAMP, 'yyyyMMddHH') as bigint) as ftime >,case >when dept_name like '%XX%' then 'X1' >when dept_name = 'xXX' then 'X2' >else 'X3' end as paytype >,count(orderid) as paynum_h >,round(sum(amt)) as paymoney_h >from dwd_XXX >where write_time >=DATE_FORMAT(LOCALTIMESTAMP, 'yyyy-MM-dd') >group by >DATE_FORMAT(LOCALTIMESTAMP, 'yyyyMMddHH'), >case >when dept_name like '%XX%' then 'X1' >when dept_name = 'xXX' then 'X2' >else 'X3' end ; > > > > >union all > > > > >select 0 as id >,cast (DATE_FORMAT(LOCALTIMESTAMP, 'yyyyMMddHH') as int) as ftime >,'all' as paytype >,count(orderid) as paynum_h >,round(sum(amt)) as paymoney_h >from dwd_XXX >where write_time >=DATE_FORMAT(LOCALTIMESTAMP, 'yyyy-MM-dd') >group by DATE_FORMAT(LOCALTIMESTAMP, 'yyyyMMddHH') ; |
In reply to this post by 史 正超-2
哦,不好意思,因为我把这条SQL是分成两段了,所以在恢复原SQL的时候没有把 分号去掉 ; 当时在union的时候,中间是不会有分号的,不然也提不上去 在 2020-11-05 10:00:01,"史 正超" <[hidden email]> 写道: >你的union all上面,也就是insert into 的第一条select 末尾有 分号 ‘;’,, 这样的话我感觉第二条select是不会执行的。还有另一个问题是你把分号去掉,我感觉数据会被 覆盖的。因为我最近也有使用union all的场景。 >我觉得你应该这样组装 你的sql : >```sql > >Insert into xxx >Select > d1, > d2, > count(1) >From ( > Select * from a > Union all > Select * from b, >) >Group by d1, d2 > >``` > >发送自 Windows 10 版邮件<https://go.microsoft.com/fwlink/?LinkId=550986>应用 > >发件人: 夜思流年梦<mailto:[hidden email]> >发送时间: 2020年11月4日 18:21 >收件人: [hidden email]<mailto:[hidden email]> >主题: union all 丢失部分数据 > >开发者好: > 目前有此场景:算不同部门的总收入和所有部门的总收入,打算把两部分SQL union all ,但是实际情况发现 union all的时候会丢一部分数据,要么是各个部门的数据少了,要么是所有部门的总收入少了 > 如果把union all 的两段SQL 分别独立出来,插入同一张表,那么数据就是正常的,不知道是否是bug还是使用方法不对 > > > > >原sql : > > >insert into dws_XXXX > > >select >0 as id >,cast (DATE_FORMAT(LOCALTIMESTAMP, 'yyyyMMddHH') as bigint) as ftime >,case >when dept_name like '%XX%' then 'X1' >when dept_name = 'xXX' then 'X2' >else 'X3' end as paytype >,count(orderid) as paynum_h >,round(sum(amt)) as paymoney_h >from dwd_XXX >where write_time >=DATE_FORMAT(LOCALTIMESTAMP, 'yyyy-MM-dd') >group by >DATE_FORMAT(LOCALTIMESTAMP, 'yyyyMMddHH'), >case >when dept_name like '%XX%' then 'X1' >when dept_name = 'xXX' then 'X2' >else 'X3' end ; > > > > >union all > > > > >select 0 as id >,cast (DATE_FORMAT(LOCALTIMESTAMP, 'yyyyMMddHH') as int) as ftime >,'all' as paytype >,count(orderid) as paynum_h >,round(sum(amt)) as paymoney_h >from dwd_XXX >where write_time >=DATE_FORMAT(LOCALTIMESTAMP, 'yyyy-MM-dd') >group by DATE_FORMAT(LOCALTIMESTAMP, 'yyyyMMddHH') ; > |
In reply to this post by hailongwang
flink 版本是1.11的版本了 在 2020-11-05 00:02:12,"hailongwang" <[hidden email]> 写道: >Hi liaobiao, > > >你的 flink 版本是什么呢? >根据你的 SQL,如果是版本是 <= 1.10 的话,会根据 MetaDataHander 识别出你的 group by 后面的 key 作为 upsert key,这样就会产生覆盖的情况。 >你看下结果是否是这种情况的? > > >Best, >Hailong Wang > > > > >在 2020-11-04 17:20:23,"夜思流年梦" <[hidden email]> 写道: >>开发者好: >> 目前有此场景:算不同部门的总收入和所有部门的总收入,打算把两部分SQL union all ,但是实际情况发现 union all的时候会丢一部分数据,要么是各个部门的数据少了,要么是所有部门的总收入少了 >> 如果把union all 的两段SQL 分别独立出来,插入同一张表,那么数据就是正常的,不知道是否是bug还是使用方法不对 >> >> >> >> >>原sql : >> >> >>insert into dws_XXXX >> >> >>select >>0 as id >>,cast (DATE_FORMAT(LOCALTIMESTAMP, 'yyyyMMddHH') as bigint) as ftime >>,case >>when dept_name like '%XX%' then 'X1' >>when dept_name = 'xXX' then 'X2' >>else 'X3' end as paytype >>,count(orderid) as paynum_h >>,round(sum(amt)) as paymoney_h >>from dwd_XXX >>where write_time >=DATE_FORMAT(LOCALTIMESTAMP, 'yyyy-MM-dd') >>group by >>DATE_FORMAT(LOCALTIMESTAMP, 'yyyyMMddHH'), >>case >>when dept_name like '%XX%' then 'X1' >>when dept_name = 'xXX' then 'X2' >>else 'X3' end ; >> >> >> >> >>union all >> >> >> >> >>select 0 as id >>,cast (DATE_FORMAT(LOCALTIMESTAMP, 'yyyyMMddHH') as int) as ftime >>,'all' as paytype >>,count(orderid) as paynum_h >>,round(sum(amt)) as paymoney_h >>from dwd_XXX >>where write_time >=DATE_FORMAT(LOCALTIMESTAMP, 'yyyy-MM-dd') >>group by DATE_FORMAT(LOCALTIMESTAMP, 'yyyyMMddHH') ; |
可以确认下 union all 之后的数据是不是根据 group by 的 key 相互覆盖的情况。 在 2020-11-05 13:24:20,"夜思流年梦" <[hidden email]> 写道: > > > > > > > > > >flink 版本是1.11的版本了 > > > > > > > > >在 2020-11-05 00:02:12,"hailongwang" <[hidden email]> 写道: >>Hi liaobiao, >> >> >>你的 flink 版本是什么呢? >>根据你的 SQL,如果是版本是 <= 1.10 的话,会根据 MetaDataHander 识别出你的 group by 后面的 key 作为 upsert key,这样就会产生覆盖的情况。 >>你看下结果是否是这种情况的? >> >> >>Best, >>Hailong Wang >> >> >> >> >>在 2020-11-04 17:20:23,"夜思流年梦" <[hidden email]> 写道: >>>开发者好: >>> 目前有此场景:算不同部门的总收入和所有部门的总收入,打算把两部分SQL union all ,但是实际情况发现 union all的时候会丢一部分数据,要么是各个部门的数据少了,要么是所有部门的总收入少了 >>> 如果把union all 的两段SQL 分别独立出来,插入同一张表,那么数据就是正常的,不知道是否是bug还是使用方法不对 >>> >>> >>> >>> >>>原sql : >>> >>> >>>insert into dws_XXXX >>> >>> >>>select >>>0 as id >>>,cast (DATE_FORMAT(LOCALTIMESTAMP, 'yyyyMMddHH') as bigint) as ftime >>>,case >>>when dept_name like '%XX%' then 'X1' >>>when dept_name = 'xXX' then 'X2' >>>else 'X3' end as paytype >>>,count(orderid) as paynum_h >>>,round(sum(amt)) as paymoney_h >>>from dwd_XXX >>>where write_time >=DATE_FORMAT(LOCALTIMESTAMP, 'yyyy-MM-dd') >>>group by >>>DATE_FORMAT(LOCALTIMESTAMP, 'yyyyMMddHH'), >>>case >>>when dept_name like '%XX%' then 'X1' >>>when dept_name = 'xXX' then 'X2' >>>else 'X3' end ; >>> >>> >>> >>> >>>union all >>> >>> >>> >>> >>>select 0 as id >>>,cast (DATE_FORMAT(LOCALTIMESTAMP, 'yyyyMMddHH') as int) as ftime >>>,'all' as paytype >>>,count(orderid) as paynum_h >>>,round(sum(amt)) as paymoney_h >>>from dwd_XXX >>>where write_time >=DATE_FORMAT(LOCALTIMESTAMP, 'yyyy-MM-dd') >>>group by DATE_FORMAT(LOCALTIMESTAMP, 'yyyyMMddHH') ; |
Free forum by Nabble | Edit this page |