Query Folding is the ability to generate a single query statement from your Power Query transformations. It allows transformations being performed within Power Query to be pushed back to the database. Many data sources support the concept of query folding including relational databases like SQL Server.
There are a number of transformations in Power Query which support query folding including:
- Removing columns;
- Filtering rows;
- Groupings;
- Renaming columns.
Query folding may occur for an entire Power Query query, or for a subset of its steps. When query folding cannot be achieved the Power Query mashup engine must compensate by processing data transformations itself. This process can involve retrieving source query results. If the source results are large, this can be very resource intensive and slow.
In some cases users will create their Power BI models in Power Query from one source query. They may not have access to the underlying database to create their own SQL objects and get left with having to write a SQL Statement themselves. This process might look like this.
When a SQL query command is used to retrieve the data query folding will not be available to us out of the box.
This can cause the Power Query refresh to become incredibly slow and resource intensive. If we look at an example in Power BI Desktop of this scenario; right clicking on a step we can see the View Native Query option is disabled. This indicates that query folding is disabled.
If we were to replace the SQL Command with a connection to a table or view in the database then the situation is different. Power Query now has the option to perform query folding.
The View Native Query option is now enabled. If we open it we can see the query that has been sent back to the database to be run. We created a de-duplication step. We can see the engine has translated this into the DISTINCT SQL keyword.
This also happens with a view. In the below we have performed the same transformations against a view and we get the same query folding action.
We can think of query folding doing the below in these instances.
If we create model tables by performing Power Query transformations on the same data, but one with a SQL statement command, one against a table, and one against a view we can see the table and view load in quickly, both with 15 rows. The one using the SQL statement is still busy loading all the rows from the statement. It could not perform query folding and is much slower to process as it is loading the entire dataset into memory before performing the transformations.
Let’s take a look at another example. Here our model is being created in Power Query from a single SQL statement. The model has been created using a series of reference commands in Power Query to build up a set of dimensions and fact from the base query. Since we are referencing a single SQL statement query folding is not in play.
If we check the activity on our database we can see the exact same statement being 6 times. This is incredibly inefficient process. If we had query folding we could see 6 different queries running which were extracting much smaller datasets in most instances.
Query folding is a great tool to improve performance within Power Query. Be careful when it is not being used as large datasets can cause significant performance bottlenecks.