Pascal Bellerose on 15 Oct 2021 14:40:44
This is coming from past experience using RStudio and practically any other programming language.
Make it so that when I read the data source in the first step, the data is cached and that cache is used for subsequent steps instead of downloading a new preview every time.
For example, if I set data profiling to run on the full dataset, it should keep a cached copy of the dataset in memory so it is faster to process.
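For what it's worth, the closest thing M offers today, as far as I know, is Table.Buffer, which pins a table in memory, but only for the duration of a single evaluation; it doesn't survive clicking between steps in the editor, which is exactly the gap this idea is about. A rough sketch (the file path is just a placeholder):

let
    // Read the CSV and promote the first row to headers
    Source = Csv.Document(File.Contents("c:\myfile.csv")),
    Promoted = Table.PromoteHeaders(Source),
    // Table.Buffer keeps the table in memory for this evaluation only
    Buffered = Table.Buffer(Promoted),
    // Downstream steps work off the in-memory copy
    Top10 = Table.FirstN(Buffered, 10)
in
    Top10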
That kind of caching is exactly how I would do it in RStudio (or any other R programming interface).
I would load the source on the first line, then transform it by executing any of the steps in the script.
example:
library(readr)
dset <- read_csv("c:/myfile.csv")
head(dset, 10)
This stores the data read from "myfile.csv" in a data frame named "dset".
I can later refer to this object and view a preview of its contents using the head() function.
I can even tell it how many records I want in the preview.
This is very effective even when reading large files (10M lines).
I think the whole issue comes from the fact that Power Query M applies changes to data iteratively: it has to read from the source and re-apply all the steps every time I click on a step, and that's what makes it so damn slow.
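To illustrate, a query is just a chain of steps like the sketch below (the step names and the [Amount] column are made up for the example); as I understand it, selecting any step in the editor re-evaluates the whole chain from Source, because nothing in between is cached:

let
    // Every click on a step below re-reads the file from here
    Source = Csv.Document(File.Contents("c:\myfile.csv")),
    Promoted = Table.PromoteHeaders(Source),
    // Selecting this step first re-runs Source and Promoted
    Filtered = Table.SelectRows(Promoted, each [Amount] > 0)
in
    Filtered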
This is why I'm taking the time to submit this idea.