SQL interval join 问题

classic Classic list List threaded Threaded
3 messages Options
Mic
Reply | Threaded
Open this post in threaded view
|

SQL interval join 问题

Mic
现有 SQL 语句如下:
create table source1(
  id varchar PRIMARY KEY,
  a varchar,
  proctime AS PROCTIME()
) with (
    'connector' = 'kafka'
    ...
);
create table source2(
  id varchar PRIMARY KEY,
  a varchar,
  proctime AS PROCTIME()
) with (
    'connector' = 'kafka'
    ...
);
select
  case
    when s1.id is not null then s1.id
    else s2.id
  end as ids,
  s1.a, s2.b
from source1 as s1 full outer join source2 as s2 on s1.id = s2.id where s1.proctime between s2.proctime - INTERVAL '5' SECOND and s2.proctime + INTERVAL '5' SECOND;


最后的 join 语句预期是 如果两个source的消息<id, a>, <id, b>先后到达时间超过 10 秒,则输出<id, a>, <id, b> 两条消息。


目前的观察结果是,如果两条消息<id, a>, <id, b>先后到达时间超过10 秒,输出为:<id, a>, <id, a, b>
为何超过 10 秒后,仍然会输出 <id, a, b> ?
Mic
Reply | Threaded
Open this post in threaded view
|

Re:SQL interval join 问题

Mic
搜了一下,目前是有一个 issue 看起来相关,https://issues.apache.org/jira/browse/FLINK-18996<br/><br/>不知道处理进度如何?
在 2020-10-19 15:03:54,"Mic" <[hidden email]> 写道:

>现有 SQL 语句如下:
>create table source1(
>  id varchar PRIMARY KEY,
>  a varchar,
>  proctime AS PROCTIME()
>) with (
>    'connector' = 'kafka'
>    ...
>);
>create table source2(
>  id varchar PRIMARY KEY,
>  a varchar,
>  proctime AS PROCTIME()
>) with (
>    'connector' = 'kafka'
>    ...
>);
>select
>  case
>    when s1.id is not null then s1.id
>    else s2.id
>  end as ids,
>  s1.a, s2.b
>from source1 as s1 full outer join source2 as s2 on s1.id = s2.id where s1.proctime between s2.proctime - INTERVAL '5' SECOND and s2.proctime + INTERVAL '5' SECOND;
>
>
>最后的 join 语句预期是 如果两个source的消息<id, a>, <id, b>先后到达时间超过 10 秒,则输出<id, a>, <id, b> 两条消息。
>
>
>目前的观察结果是,如果两条消息<id, a>, <id, b>先后到达时间超过10 秒,输出为:<id, a>, <id, a, b>
>为何超过 10 秒后,仍然会输出 <id, a, b> ?
Reply | Threaded
Open this post in threaded view
|

Re: SQL interval join 问题

Benchao Li-2
Hi Mic,

感谢关注这个issue,这个issue当前还在讨论中。
我认为问题已经定位清楚了,抄送了其他的committer同学进一步讨论确认。

Mic <[hidden email]> 于2020年10月19日周一 下午3:51写道:

> 搜了一下,目前是有一个 issue 看起来相关,https://issues.apache.org/jira/browse/FLINK-18996
> <br/><br/>不知道处理进度如何?
> 在 2020-10-19 15:03:54,"Mic" <[hidden email]> 写道:
> >现有 SQL 语句如下:
> >create table source1(
> >  id varchar PRIMARY KEY,
> >  a varchar,
> >  proctime AS PROCTIME()
> >) with (
> >    'connector' = 'kafka'
> >    ...
> >);
> >create table source2(
> >  id varchar PRIMARY KEY,
> >  a varchar,
> >  proctime AS PROCTIME()
> >) with (
> >    'connector' = 'kafka'
> >    ...
> >);
> >select
> >  case
> >    when s1.id is not null then s1.id
> >    else s2.id
> >  end as ids,
> >  s1.a, s2.b
> >from source1 as s1 full outer join source2 as s2 on s1.id = s2.id where
> s1.proctime between s2.proctime - INTERVAL '5' SECOND and s2.proctime +
> INTERVAL '5' SECOND;
> >
> >
> >最后的 join 语句预期是 如果两个source的消息<id, a>, <id, b>先后到达时间超过 10 秒,则输出<id, a>,
> <id, b> 两条消息。
> >
> >
> >目前的观察结果是,如果两条消息<id, a>, <id, b>先后到达时间超过10 秒,输出为:<id, a>, <id, a, b>
> >为何超过 10 秒后,仍然会输出 <id, a, b> ?
>


--

Best,
Benchao Li