GROUP BY caluse optimization works on the premise that if all columns in the GROUP BY clause are included in an index, you can use the index upon executing a query, so you don't execute a separate sorting job. The columns in the GROUP BY clause must exist in front side of the column forming the index.
SELECT * FROM tab WHERE col1 > 0 GROUP BY col1,col2
You can use the index if the column condition is a constant although the column consisting of the idex doesn't exist in the GROUP BY clause.
SELECT * FROM tab WHERE col2=val GROUP BY col1,col3
If there is any index that consists of tab(col1, col2, col3) in the above example, use the index for optimizing GROUP BY.
Row sorting by GROUP BY is not required, because you can get the result as the requested order on the index scan.
If the index consisting of the GROUP BY column and the first column of the index is NOT NULL, even though there is no WHERE clause, the GROUP BY optimization will be applied.
GROUP BY optimization is applied only when MIN() or MAX() are used in an aggregate function, and to use the two aggregate functions together, an identical column must be used.
CREATE INDEX ON T(a, b, c);
SELECT a, MIN(b), c, MAX(b) FROM T WHERE a > 18 GROUP BY a, b;
CREATE TABLE tab (i INT, j INT, k INT);
CREATE INDEX ON tab (j,k);
INSERT INTO tab VALUES (1,2,3),(6,4,2),(3,4,1),(5,2,1),(1,5,5),(2,6,6),(3,5,4);
The following example shows that indexes consisting of tab(j,k) are used and no separate sorting process is required because GROUP BY is executed by j and k columns.
SELECT i,j,k FROM tab WHERE j > 0 GROUP BY j,k;
-- the selection from the query plan dump shows that the index i_tab_j_k was used and sorting was not necessary
-- (/* ---> skip GROUP BY */)
Query plan:
iscan
class: tab node[0]
index: i_tab_j_k term[0]
sort: 2 asc, 3 asc
cost: fixed 0(0.0/0.0) var 1(0.0/1.0) card 0
Query stmt:
select tab.i, tab.j, tab.k from tab tab where ((tab.j> ?:0 )) group by tab.j, tab.k
/* ---> skip GROUP BY */
i j k
5 2 1
1 2 3
3 4 1
6 4 2
3 5 4
1 5 5
2 6 6
The following example shows that an index consisting of tab(j,k) is used and no separate sorting process is required while GROUP BY is executed by j and k columns, no condition exists for j, and j column has NOT NULL attribute.
ALTER TABLE tab CHANGE COLUMN j j INT NOT NULL;
SELECT * FROM tab GROUP BY j,k;
-- the selection from the query plan dump shows that the index i_tab_j_k was used (since j has the NOT NULL constraint )
-- and sorting was not necessary (/* ---> skip GROUP BY */)
Query plan:
iscan
class: tab node[0]
index: i_tab_j_k
sort: 2 asc, 3 asc
cost: fixed 0(0.0/0.0) var 1(0.0/1.0) card 0
Query stmt: select tab.i, tab.j, tab.k from tab tab group by tab.j, tab.k
/* ---> skip GROUP BY */
=== <Result of SELECT Command in Line 1> ===
i j k
=======================================
5 2 1
1 2 3
3 4 1
6 4 2
3 5 4
1 5 5
2 6 6