union all 丢失部分数据

classic Classic list List threaded Threaded
6 messages Options
Reply | Threaded
Open this post in threaded view
|

union all 丢失部分数据

夜思流年梦
开发者好:
               目前有此场景:算不同部门的总收入和所有部门的总收入,打算把两部分SQL union all  ,但是实际情况发现 union all的时候会丢一部分数据,要么是各个部门的数据少了,要么是所有部门的总收入少了
               如果把union  all 的两段SQL 分别独立出来,插入同一张表,那么数据就是正常的,不知道是否是bug还是使用方法不对




原sql :


insert into dws_XXXX


select
0 as id
,cast (DATE_FORMAT(LOCALTIMESTAMP, 'yyyyMMddHH') as bigint) as ftime
,case
when dept_name like '%XX%' then 'X1'
when dept_name = 'xXX' then 'X2'
else 'X3' end as paytype
,count(orderid) as paynum_h
,round(sum(amt)) as paymoney_h
from dwd_XXX
where write_time >=DATE_FORMAT(LOCALTIMESTAMP, 'yyyy-MM-dd')
group by
DATE_FORMAT(LOCALTIMESTAMP, 'yyyyMMddHH'),
case
when dept_name like '%XX%' then 'X1'
when dept_name = 'xXX' then 'X2'
else 'X3' end ;




union all




select 0 as id
,cast (DATE_FORMAT(LOCALTIMESTAMP, 'yyyyMMddHH') as int) as ftime
,'all' as paytype
,count(orderid) as paynum_h  
,round(sum(amt)) as paymoney_h  
from dwd_XXX
where write_time >=DATE_FORMAT(LOCALTIMESTAMP, 'yyyy-MM-dd')
group by DATE_FORMAT(LOCALTIMESTAMP, 'yyyyMMddHH') ;
Reply | Threaded
Open this post in threaded view
|

回复: union all 丢失部分数据

史 正超-2
你的union all上面,也就是insert into 的第一条select 末尾有 分号 ‘;’,, 这样的话我感觉第二条select是不会执行的。还有另一个问题是你把分号去掉,我感觉数据会被 覆盖的。因为我最近也有使用union all的场景。
我觉得你应该这样组装 你的sql :
```sql

Insert into xxx
Select
       d1,
       d2,
       count(1)
From (
       Select * from a
       Union all
       Select * from b,
)
Group by d1, d2

```

发送自 Windows 10 版邮件<https://go.microsoft.com/fwlink/?LinkId=550986>应用

发件人: 夜思流年梦<mailto:[hidden email]>
发送时间: 2020年11月4日 18:21
收件人: [hidden email]<mailto:[hidden email]>
主题: union all 丢失部分数据

开发者好:
               目前有此场景:算不同部门的总收入和所有部门的总收入,打算把两部分SQL union all  ,但是实际情况发现 union all的时候会丢一部分数据,要么是各个部门的数据少了,要么是所有部门的总收入少了
               如果把union  all 的两段SQL 分别独立出来,插入同一张表,那么数据就是正常的,不知道是否是bug还是使用方法不对




原sql :


insert into dws_XXXX


select
0 as id
,cast (DATE_FORMAT(LOCALTIMESTAMP, 'yyyyMMddHH') as bigint) as ftime
,case
when dept_name like '%XX%' then 'X1'
when dept_name = 'xXX' then 'X2'
else 'X3' end as paytype
,count(orderid) as paynum_h
,round(sum(amt)) as paymoney_h
from dwd_XXX
where write_time >=DATE_FORMAT(LOCALTIMESTAMP, 'yyyy-MM-dd')
group by
DATE_FORMAT(LOCALTIMESTAMP, 'yyyyMMddHH'),
case
when dept_name like '%XX%' then 'X1'
when dept_name = 'xXX' then 'X2'
else 'X3' end ;




union all




select 0 as id
,cast (DATE_FORMAT(LOCALTIMESTAMP, 'yyyyMMddHH') as int) as ftime
,'all' as paytype
,count(orderid) as paynum_h
,round(sum(amt)) as paymoney_h
from dwd_XXX
where write_time >=DATE_FORMAT(LOCALTIMESTAMP, 'yyyy-MM-dd')
group by DATE_FORMAT(LOCALTIMESTAMP, 'yyyyMMddHH') ;

Reply | Threaded
Open this post in threaded view
|

Re:union all 丢失部分数据

hailongwang
In reply to this post by 夜思流年梦
Hi liaobiao,


你的 flink 版本是什么呢?
根据你的 SQL,如果是版本是 <= 1.10 的话,会根据 MetaDataHander 识别出你的 group by 后面的 key 作为 upsert key,这样就会产生覆盖的情况。
你看下结果是否是这种情况的?


Best,
Hailong Wang




在 2020-11-04 17:20:23,"夜思流年梦" <[hidden email]> 写道:

>开发者好:
>               目前有此场景:算不同部门的总收入和所有部门的总收入,打算把两部分SQL union all  ,但是实际情况发现 union all的时候会丢一部分数据,要么是各个部门的数据少了,要么是所有部门的总收入少了
>               如果把union  all 的两段SQL 分别独立出来,插入同一张表,那么数据就是正常的,不知道是否是bug还是使用方法不对
>
>
>
>
>原sql :
>
>
>insert into dws_XXXX
>
>
>select
>0 as id
>,cast (DATE_FORMAT(LOCALTIMESTAMP, 'yyyyMMddHH') as bigint) as ftime
>,case
>when dept_name like '%XX%' then 'X1'
>when dept_name = 'xXX' then 'X2'
>else 'X3' end as paytype
>,count(orderid) as paynum_h
>,round(sum(amt)) as paymoney_h
>from dwd_XXX
>where write_time >=DATE_FORMAT(LOCALTIMESTAMP, 'yyyy-MM-dd')
>group by
>DATE_FORMAT(LOCALTIMESTAMP, 'yyyyMMddHH'),
>case
>when dept_name like '%XX%' then 'X1'
>when dept_name = 'xXX' then 'X2'
>else 'X3' end ;
>
>
>
>
>union all
>
>
>
>
>select 0 as id
>,cast (DATE_FORMAT(LOCALTIMESTAMP, 'yyyyMMddHH') as int) as ftime
>,'all' as paytype
>,count(orderid) as paynum_h  
>,round(sum(amt)) as paymoney_h  
>from dwd_XXX
>where write_time >=DATE_FORMAT(LOCALTIMESTAMP, 'yyyy-MM-dd')
>group by DATE_FORMAT(LOCALTIMESTAMP, 'yyyyMMddHH') ;
Reply | Threaded
Open this post in threaded view
|

Re:回复: union all 丢失部分数据

夜思流年梦
In reply to this post by 史 正超-2






哦,不好意思,因为我把这条SQL是分成两段了,所以在恢复原SQL的时候没有把 分号去掉 ;


当时在union的时候,中间是不会有分号的,不然也提不上去











在 2020-11-05 10:00:01,"史 正超" <[hidden email]> 写道:

>你的union all上面,也就是insert into 的第一条select 末尾有 分号 ‘;’,, 这样的话我感觉第二条select是不会执行的。还有另一个问题是你把分号去掉,我感觉数据会被 覆盖的。因为我最近也有使用union all的场景。
>我觉得你应该这样组装 你的sql :
>```sql
>
>Insert into xxx
>Select
>       d1,
>       d2,
>       count(1)
>From (
>       Select * from a
>       Union all
>       Select * from b,
>)
>Group by d1, d2
>
>```
>
>发送自 Windows 10 版邮件<https://go.microsoft.com/fwlink/?LinkId=550986>应用
>
>发件人: 夜思流年梦<mailto:[hidden email]>
>发送时间: 2020年11月4日 18:21
>收件人: [hidden email]<mailto:[hidden email]>
>主题: union all 丢失部分数据
>
>开发者好:
>               目前有此场景:算不同部门的总收入和所有部门的总收入,打算把两部分SQL union all  ,但是实际情况发现 union all的时候会丢一部分数据,要么是各个部门的数据少了,要么是所有部门的总收入少了
>               如果把union  all 的两段SQL 分别独立出来,插入同一张表,那么数据就是正常的,不知道是否是bug还是使用方法不对
>
>
>
>
>原sql :
>
>
>insert into dws_XXXX
>
>
>select
>0 as id
>,cast (DATE_FORMAT(LOCALTIMESTAMP, 'yyyyMMddHH') as bigint) as ftime
>,case
>when dept_name like '%XX%' then 'X1'
>when dept_name = 'xXX' then 'X2'
>else 'X3' end as paytype
>,count(orderid) as paynum_h
>,round(sum(amt)) as paymoney_h
>from dwd_XXX
>where write_time >=DATE_FORMAT(LOCALTIMESTAMP, 'yyyy-MM-dd')
>group by
>DATE_FORMAT(LOCALTIMESTAMP, 'yyyyMMddHH'),
>case
>when dept_name like '%XX%' then 'X1'
>when dept_name = 'xXX' then 'X2'
>else 'X3' end ;
>
>
>
>
>union all
>
>
>
>
>select 0 as id
>,cast (DATE_FORMAT(LOCALTIMESTAMP, 'yyyyMMddHH') as int) as ftime
>,'all' as paytype
>,count(orderid) as paynum_h
>,round(sum(amt)) as paymoney_h
>from dwd_XXX
>where write_time >=DATE_FORMAT(LOCALTIMESTAMP, 'yyyy-MM-dd')
>group by DATE_FORMAT(LOCALTIMESTAMP, 'yyyyMMddHH') ;
>
Reply | Threaded
Open this post in threaded view
|

Re:Re:union all 丢失部分数据

夜思流年梦
In reply to this post by hailongwang









flink 版本是1.11的版本了








在 2020-11-05 00:02:12,"hailongwang" <[hidden email]> 写道:

>Hi liaobiao,
>
>
>你的 flink 版本是什么呢?
>根据你的 SQL,如果是版本是 <= 1.10 的话,会根据 MetaDataHander 识别出你的 group by 后面的 key 作为 upsert key,这样就会产生覆盖的情况。
>你看下结果是否是这种情况的?
>
>
>Best,
>Hailong Wang
>
>
>
>
>在 2020-11-04 17:20:23,"夜思流年梦" <[hidden email]> 写道:
>>开发者好:
>>               目前有此场景:算不同部门的总收入和所有部门的总收入,打算把两部分SQL union all  ,但是实际情况发现 union all的时候会丢一部分数据,要么是各个部门的数据少了,要么是所有部门的总收入少了
>>               如果把union  all 的两段SQL 分别独立出来,插入同一张表,那么数据就是正常的,不知道是否是bug还是使用方法不对
>>
>>
>>
>>
>>原sql :
>>
>>
>>insert into dws_XXXX
>>
>>
>>select
>>0 as id
>>,cast (DATE_FORMAT(LOCALTIMESTAMP, 'yyyyMMddHH') as bigint) as ftime
>>,case
>>when dept_name like '%XX%' then 'X1'
>>when dept_name = 'xXX' then 'X2'
>>else 'X3' end as paytype
>>,count(orderid) as paynum_h
>>,round(sum(amt)) as paymoney_h
>>from dwd_XXX
>>where write_time >=DATE_FORMAT(LOCALTIMESTAMP, 'yyyy-MM-dd')
>>group by
>>DATE_FORMAT(LOCALTIMESTAMP, 'yyyyMMddHH'),
>>case
>>when dept_name like '%XX%' then 'X1'
>>when dept_name = 'xXX' then 'X2'
>>else 'X3' end ;
>>
>>
>>
>>
>>union all
>>
>>
>>
>>
>>select 0 as id
>>,cast (DATE_FORMAT(LOCALTIMESTAMP, 'yyyyMMddHH') as int) as ftime
>>,'all' as paytype
>>,count(orderid) as paynum_h  
>>,round(sum(amt)) as paymoney_h  
>>from dwd_XXX
>>where write_time >=DATE_FORMAT(LOCALTIMESTAMP, 'yyyy-MM-dd')
>>group by DATE_FORMAT(LOCALTIMESTAMP, 'yyyyMMddHH') ;
Reply | Threaded
Open this post in threaded view
|

Re:Re:Re:union all 丢失部分数据

hailongwang


可以确认下 union all 之后的数据是不是根据 group by 的 key 相互覆盖的情况。

在 2020-11-05 13:24:20,"夜思流年梦" <[hidden email]> 写道:

>
>
>
>
>
>
>
>
>
>flink 版本是1.11的版本了
>
>
>
>
>
>
>
>
>在 2020-11-05 00:02:12,"hailongwang" <[hidden email]> 写道:
>>Hi liaobiao,
>>
>>
>>你的 flink 版本是什么呢?
>>根据你的 SQL,如果是版本是 <= 1.10 的话,会根据 MetaDataHander 识别出你的 group by 后面的 key 作为 upsert key,这样就会产生覆盖的情况。
>>你看下结果是否是这种情况的?
>>
>>
>>Best,
>>Hailong Wang
>>
>>
>>
>>
>>在 2020-11-04 17:20:23,"夜思流年梦" <[hidden email]> 写道:
>>>开发者好:
>>>               目前有此场景:算不同部门的总收入和所有部门的总收入,打算把两部分SQL union all  ,但是实际情况发现 union all的时候会丢一部分数据,要么是各个部门的数据少了,要么是所有部门的总收入少了
>>>               如果把union  all 的两段SQL 分别独立出来,插入同一张表,那么数据就是正常的,不知道是否是bug还是使用方法不对
>>>
>>>
>>>
>>>
>>>原sql :
>>>
>>>
>>>insert into dws_XXXX
>>>
>>>
>>>select
>>>0 as id
>>>,cast (DATE_FORMAT(LOCALTIMESTAMP, 'yyyyMMddHH') as bigint) as ftime
>>>,case
>>>when dept_name like '%XX%' then 'X1'
>>>when dept_name = 'xXX' then 'X2'
>>>else 'X3' end as paytype
>>>,count(orderid) as paynum_h
>>>,round(sum(amt)) as paymoney_h
>>>from dwd_XXX
>>>where write_time >=DATE_FORMAT(LOCALTIMESTAMP, 'yyyy-MM-dd')
>>>group by
>>>DATE_FORMAT(LOCALTIMESTAMP, 'yyyyMMddHH'),
>>>case
>>>when dept_name like '%XX%' then 'X1'
>>>when dept_name = 'xXX' then 'X2'
>>>else 'X3' end ;
>>>
>>>
>>>
>>>
>>>union all
>>>
>>>
>>>
>>>
>>>select 0 as id
>>>,cast (DATE_FORMAT(LOCALTIMESTAMP, 'yyyyMMddHH') as int) as ftime
>>>,'all' as paytype
>>>,count(orderid) as paynum_h  
>>>,round(sum(amt)) as paymoney_h  
>>>from dwd_XXX
>>>where write_time >=DATE_FORMAT(LOCALTIMESTAMP, 'yyyy-MM-dd')
>>>group by DATE_FORMAT(LOCALTIMESTAMP, 'yyyyMMddHH') ;