30 Days to Success in Power BI: Day Three General Housekeeping
Welcome back to day three of our series on Success in Power BI! With 30 days of Power BI learning, we should be able to be successful with our use of the tool. Did you forget where we left off on day two? If so, here is the link to refresh your memory.
Today we are going to do some general housekeeping to get ready to use the data we loaded. This will help us understand Power BI desktop a little better and get us ready to create some amazing visualizations.
If you remember yesterday we clicked on the data icon on the left toolbar and then we could view our data set that we loaded from the web source. To avoid any confusion when or if we load more data, let’s first change that name of the table. We are going to rename Table 2 to Standard Batting, as it was labeled on the website.
Now we need to do a cursory check of the data. It is important to make sure the data loaded is the data that we want to report off of and also to ensure that there were no errors from some questionable web programming.
If you look at Figure 2, you’ll see we have two rows at the top for years 1952 and 1953. Those are for the minor leagues. We do not want those included in our visualizations as we are comparing Hank’s professional career and it is a generally accepted practice to include only major league statistics in that comparison due to the breadth and level of competition.
So now if you look at Figure 3, this is what we saw on the web page for Hank Aaron’s major league batting statistics and what we thought we were loading originally. The website, Baseball-Reference.com, is using some programming to hide minor league stats (as you can see a button called Show Minors). We might not have noticed this, but its an import step to note when loading data from web sources.
So we need to remove those rows so that we can get an accurate professional representation of Aaron’s performance. Sounds easy, right?
Not so fast. When we click somewhere in the grid, it highlights a column, and when we right-click to choose delete (as shown in Figure 4), it appears that the column will be deleted and not the row. Trust me, it will be the column deleted. So how do we delete the row?
Let’s click on the Edit Queries icon on the Home toolbar (or ribbon) at the top of the screen. This will load the data into a Query Editor window (giving you two active windows). We can highlight rows here but we still cannot delete them. However, if we imagine this is Excel, then we can use the filter ability to remove our the two rows. Click on the down arrow at the column header for Year. Now deselect 1952 and 1953, as shown in Figure 5. Select Ok to make it happen captain.
I want to point out here in Figure 6, that the rows are gone. But, I also want to point out the section on the right of the screen called Applied Steps. At this point, we have not committed any changes yet. However, this is a list of steps that we’ve done thus far in our manipulations. We can change that list or step through that list and watch the data change, if we choose.
So, if you click on the Changed Type step, you can see the two rows are there again. You can also click the delete icon (X marks the spot on the right of the step) to remove the step. Thus, if we decided to leave those two years in there and cancel out the changes that we made, we could then click the X and delete the Filtered Rows step. We could also do several other things here in the data if we needed to before applying the changes.
In this case, we did not really delete rows as they are filtered out just like in Excel, but they are removed from our visual data set that we will report off of. It is essentially smoke and mirrors as shown in Figure 7. This is a great stopping point. See you on day four!