|
Hi,Community:
目前sql-client和sql-gateway只能支持单条SQL statement,这样就没法利用multiple insert的优化。如下:
INSERT INTO RubberOrders SELECT product, amount FROM Orders WHERE product LIKE '%Rubber%'
INSERT INTO GlassOrders SELECT product, amount FROM Orders WHERE product LIKE '%Glass%'
两条语句会分别生成两个Job,对Orders表读了两次。如果Orders表是Kafka,消费两次的成本是很大的浪费,是否可以优化下?
一种方案是,在SQL 语法层面上支持,如Hive的multiple insert:
FROM Orders INSERT INTO RubberOrders SELECT product, amount WHERE product LIKE '%Rubber%' INSERT INTO GlassOrders SELECT product, amount WHERE product LIKE '%Glass%'
另一种方案是,让sql-client/sql-gateway支持multiple SQL statement,这样即使语法不支持,两条INSERT语句也可以被优化成一个Job,只scan一次Orders表。
不知道社区后续是否有计划针对这个feature优化?
附:目前Table API 支持multiple insert的优化
// run multiple INSERT queries on the registered source table and emit the result to registered sink tables
StatementSet stmtSet = tEnv.createStatementSet();
// only single INSERT query can be accepted by `addInsertSql` method
stmtSet.addInsertSql(
"INSERT INTO RubberOrders SELECT product, amount FROM Orders WHERE product LIKE '%Rubber%'");
stmtSet.addInsertSql(
"INSERT INTO GlassOrders SELECT product, amount FROM Orders WHERE product LIKE '%Glass%'");
// execute all statements together
TableResult tableResult2 = stmtSet.execute();
|