Power BI - Optimising memory: how to split columns

In this blog series, we’ll explore some easy ways to reduce the size of a Power BI model.

Our previous blog posts:

The impact of duplicates in a column

In our previous blog posts, we showed you how to resize the model to some extent with a few easy fixes.  Removing unneeded columns is an important step in this process, because Power BI mainly works with columns.

Sometimes, though, you need to do the exact opposite: instead of reducing the number of columns, it can be more efficient to split one column into several. At first glance, this seems to contradict the last blog post, but it doesn’t.

Let’s take another look at the result in our last post:

In that post, we mainly focused on the total size and didn’t pay much attention to the size of each individual column.  The screenshot above shows that the total size of all columns is about 16.8 MB, of which 14 MB are squeezed into a single column: SalesDate.

To understand this, we need to look at the first value in PivotTable: the ‘cardinality’ or amount of unique values in this column. You’ll see a correlation between the number of unique values and the size of a column.  In the background, Power BI records the number of unique values in most columns in a table.  In other words: the more duplicate values, the smaller the underlying table, and the less space the relevant column takes up.

Even though the field is named SalesDate, it contains two values: the date and the time of a sale.

If it contains data from the last ten years, this theoretically means that the column contains 10 * 365 (# days in a year) * 86400 (# years * # days per year * # seconds per day) different values. If we were to split this column into two (SalesDate and SalesTime), these two columns would have a much smaller impact because they’d contain more duplicate values. 
Additionally, you’d need to consider whether sales time is a useful metric at all.  If not, you could just remove this column.  If you do need to keep track of sales time, you could ask yourself if you really need to track it down to seconds.  If it’d be sufficient to know if a sale took place between 10-11 am or between 11-12 am, this column will only contain 24 unique values.

If we use Power Query to split these data, the result will look like this:

If you check the memory in Excel now, you’ll be amazed by the result: the size of our data model has been reduced from 16.8 MB to a mere 4 MB.

The SalesID column

In our last post, we discussed removing unneeded columns. I should’ve removed the SalesID column then, but I kept it around for this post.

The screenshot shows that the size of the SalesID column is about 1.7 MB out of a total of 3.9 MB.  This is because SalesID is the primary key in this table - in other words: every value in the SalesID column is unique. This makes its cardinality equal to the number of rows in the entire table.
Ask yourself: “What do I need SalesID for in my model?”  If you can’t come up with an adequate answer to this question, it’s best to remove this column from the model.

Share this blogpost
Categories: Power BI
Tags: Power BI

Also interesting for you

Leave a reply

Your email address will not be published. Required fields are marked.

Virtual workshops Office 365

Are you exploring Office 365 and do you have any questions about the software that need answering? Book one of our virtual workshops today. We’ll help you and your co-workers get up and running with Teams, Power BI, Power Automate, Project Online Professional, and other tools.

Discover the Office 365 workshops

Online Coaching: you've got questions, we have the answers

Do you have urgent questions about Excel, Project, Power BI or Adobe?  Our Office experts will gladly help you through online coaching!

Bring your knowledge and skills to a higher level

Subscribe for our monthly newsletter and receive:

  • Exclusive tips & tricks
  • Information about our courses
  • Trends on courses
  • Invitations to seminars and events

Most popular courses