[DISCUSS] FLIP-162: Consistent Flink SQL time function behavior

classic Classic list List threaded Threaded
1 message Options
Reply | Threaded
Open this post in threaded view
|

[DISCUSS] FLIP-162: Consistent Flink SQL time function behavior

Leonard Xu
Hello, everyone

I want to start the discussion of FLIP-162: Consistent Flink SQL time function behavior[1].
We’ve some initial discussion of several problematic functions in dev mail list[2], and I think it's the right time to resolve them by a FLIP.  
 
Currently some time function behaviors are wired to user, user can not get local date/time/timestamp in their local time zone for time functions:
CURRENT_DATE
CURRENT_TIME
CURRENT_TIMESTAMP
NOW()
PROCTIME()
Assume user's clock time is '2021-01-20 07:52:52.270' in Beijing time(UTC+8), currently the unexpected values are returned when user SELECT above functions in Flink SQL client

Flink SQL> SELECT NOW(), PROCTIME(), CURRENT_TIMESTAMP, CURRENT_DATE, CURRENT_TIME;
+-------------------------+-------------------------+-------------------------+--------------+--------------+
|                  NOW()   |              PROCTIME() |       CURRENT_TIMESTAMP | CURRENT_DATE | CURRENT_TIME |
+-------------------------+-------------------------+-------------------------+--------------+--------------+
| 2021-01-19T23:52:52.270 | 2021-01-19T23:52:52.270 | 2021-01-19T23:52:52.270 |   2021-01-19 | 23:52:52.270 |
+-------------------------+-------------------------+-------------------------+--------------+--------------+

Besides, the window with interval one day width based on PROCTIME() can not collect correct data that belongs to the date '2021-01-20', because some data was assigned to window '2021-01-19' due to the PROCTIME() does not return local TIMESTAMP as user expected.

These problems come from these time-related functions like PROCTIME(), NOW(), CURRENT_DATE, CURRENT_TIME and CURRENT_TIMESTAMP are returning time values based on UTC+0 time zone, this is an incorrect behavior from my investigation[3].
I Invested all Flink time-related functions and compared with other DB vendors like Pg,Presto, Hive, Spark, Snowflake, this topic will lead to a comparison of the three types, i.e.
 TIMESTAMP/TIMESTAMP WITHOUT TIME ZONE
 TIMESTAMP WITH LOCAL TIME ZONE
 TIMESTAMP WITH TIME ZONE
In order to better understand above three types, I wrote a document[4] to help understand them better. You will found the behavior of them is same with in Hadoop ecosystem from the document.The document is detailed and pretty long, it’s necessary to make the semantics clear(You can focus on the FLIP and skip the document).

In one word, to correct the behavior of above functions, we can change the function return type or function return value. Both of them are valid because SQL:2011 does not specify the function return type, and every SQL engine vendor has its own implementation. For example the CURRENT_TIMESTAMP function in the document[3], Spark, Presto, Snowflake have different behaviors.

I tend to only change the return value for these problematic functions and introduce an option for compatibility consideration, the detailed proposal can be found in FLIP-162[1].  
After corrected these function, user can get their expected return values as following:

Flink SQL> SELECT NOW(), PROCTIME(), CURRENT_TIMESTAMP, CURRENT_DATE, CURRENT_TIME;
+-------------------------+-------------------------+-------------------------+--------------+--------------+
|                  NOW()   |              PROCTIME() |       CURRENT_TIMESTAMP | CURRENT_DATE | CURRENT_TIME |
+-------------------------+-------------------------+-------------------------+--------------+--------------+
| 2021-01-20T07:52:52.270 | 2021-01-20T07:52:52.270 | 2021-01-20T07:52:52.270 |   2021-01-20 | 07:52:52.270 |
+-------------------------+-------------------------+-------------------------+--------------+--------------+

Looking forward to your feedback.

Best,
Leonard

[1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-162%3A+Consistent+Flink+SQL+time+function+behavior
[2] http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Correct-time-related-function-behavior-in-Flink-SQL-tc47989.html 
[3] https://docs.google.com/spreadsheets/d/1T178krh9xG-WbVpN7mRVJ8bzFnaSJx3l-eg1EWZe_X4/edit?usp=sharing
[4] https://docs.google.com/document/d/1iY3eatV8LBjmF0gWh2JYrQR0FlTadsSeuCsksOVp_iA/edit?usp=sharing