Welcome back to day nine of our thirty-day series on achieving Success in Power BI! Have you forgotten where we left off from day eight? If so, here is the link to refresh your memory.
Yesterday we made an assertion about Hank Aaron’s on-base percentage (OBP) remaining fairly flat over his career that could have been caused by more walks as a result of him being a home run hitter. This is a common strategy in baseball where the pitcher will walk a home run hitter to prevent a home run when there are runners on base. We should test that theory using our data! Change your visual to the Line and clustered column chart in the Visualizations pane. Your visual should appear like Figure 1.
Now remove the batting average (BA) field from the Column values to avoid confusion as we want to see if walks (BB) correlate with on-base percentages (OB) and home runs (HR) for Hank Aaron. First off, let’s change the title to something a little more reflective of our comparison. Then, add the BB field to the Line values as depicted by the black arrow in Figure 2. Also, notice that a trend line was added for these in a dashed version of the color used to depict the walk line.
Now we will add the home run (HR) field to the Line values area as shown with the red arrow in Figure 3. It is interesting to note that according to the trend line, Aaron’s walks (BB) increased as he got older yet the home runs (HR) decreased, as we would have expected with a declining physical stature. It also doesn’t appear that there is a correlation to walks (BB) versus home runs (HR), at least not a strong one.
Lucky for us that in 1955 Major League Baseball began tracking intentional walks (IBB) as those were more related to a home run hitter’s prowess. Regular walks could be from poor pitching and were not necessarily intended to pitch around a good hitter. The distinction is important to show correlation in this theory. Also, keep in mind 1955 was Aaron’s second year so we will not see data for his rookie year.
However, there still does not seem to be a correlation as some years like at age 32 he had 44 home runs and only 15 intentional walks and his on-base percentage was lower than most other years despite being a strong home run performance year. Likewise two years later his home run total dropped to 29 but his intentional walks spiked to 23, the highest in his career. The trend for intentional walks does follow the home run trend downward.
Does this data give us an accurate analysis? If we were presenting this to the board with a couple of visualizations that we’ve created, would they make an informed decision? Would you? Stay tuned!
Welcome back to day eight of our 30 day series on Success in Power BI! Have you forgotten where we left off from day seven? If so, here is the link to refresh your memory. Let’s remove the Max and Min lines that we added on day seven and add some different analytics to our visual. You should be starting like Figure 1.
In the Analytics pane, add a Trend Line as shown in Figure 2. Notice that it inserts a dashed black line, but even more interesting is that it defaults a setting called Combine Series to On. In essence, we are seeing a trend line for both the on-base percentage and the batting average. It is as we would expect with a player aging, his performance trends downward. Change the default for Combine Series to Off and let’s see how it changes our story.
Now we have TWO trend lines as shown in Figure 3, one for OBP and one for BA. In addition, they are defaulted to the colors of the original lines but these are dashed for delineation: Automagically!!! Now when we compare those two sets of trend lines (Figures 2 and 3) you should see a different slope. The on-base percentage was more flat and consistent while the batting average dropped more rapidly than the combined line. That is fascinating and it shows that even as he aged he was still consistently able to get on base. Being a home run hitter, some of that was attributed to more walks as pitchers would rather give up an intentional walk than a home run especially with runners on base. Let’s look at that in a future day. Changing this setting altered our story and gave us more insight than a combined trend line. It may not always.
What else can we do as far as analytics are concerned? Well…we can also add Average Lines, Median Lines, and Percentile Lines. Give them all a try to see what the differences are. But, a click or two and you have added instant analytics in your visuals. Pretty nifty, huh? Stay tuned for more Power BI!
Welcome back to day seven of our series on Success in Power BI! With 30 days of Power BI learning, we should be able to be successful with our use of the desktop tool. Have you forgotten where we left off from day six? If so, here is the link to refresh your memory.
What more can we do with our visualization? What if we do not like the Line Chart visual? If you select the Area Chart visual you can quickly change your visual to the graph shown in Figure 1.
Try some other visuals. Better yet, try them all. Personally, I do not think any of the other visuals work with this set of selected values. For example, in Figure 2 we see a Clustered Column Chart which to me isn’t as easy to see the natural trend line but it works. I think the original Area Chart works best in this scenario. Let’s switch back to that visual moving forward.
Speaking of trend lines, let’s add some simple analytics such as a Min and Max line. We can easily add a Min line for the BA and the OBP. Select the third icon in the pane below the Visualizations called Analytics as shown in Figure 3 with the red arrow. Select Min Line and add two Min Lines and configure them as shown with the blue arrows. We need the colors and alignment to show in our area chart. For batting average, I used a bright yellow line and for on-base percentage a red. I then used a black font and moved it to the right and under the line. If the colors are hard to see you could switch back to a regular line chart to prevent the color clash.
Now we can do the same to add in some Max Lines using straight lines with the same respective color for either OBP or BA as showing in Figure 4. I changed these to be solid lines to avoid confusion. The visual is sort of busy but it works, right? Maybe the color choices are not perfect and those can always be tweaked. But we’ve added some base analytics to our visual. We are rock stars, right? Stay tuned for more.
Welcome back to day six of our series on Success in Power BI! With 30 days of Power BI learning, we should be able to be successful with our use of the desktop tool. Have you forgotten where we left off from day five? If so, here is the link to refresh your memory.
Before we get started from where we left off, let me show you a nifty little feature in Power BI Desktop. If you select the Data / Drill tab on the ribbon menu and select the See Data button, you can see all of the data points mapped including the Tooltips data points just below the graph as seen in Figure 1.
Let’s go through some definitions to help your understanding of baseball terminology in statistics:
At Bats (AB): a statistic that counts a plate appearance only if the batter produces a hit, an out, an error, or a fielder’s choice.
Base on Balls or Walk (BB): when a pitcher throws four balls and the batter is subsequently awarded first base.
Batting Average (BA): a statistic that is defined by the number of hits divided by the number of at bats.
Hit (H): a statistic where the batter safely reaches first base without an error or fielder’s choice.
Hit By Pitch (HBP): this occurs when a pitcher hits the batter with the ball and he is awarded first base.
On-Base Percentage (OBP): a statistics defined by the sum of hits, walks, and hit by pitches divided by the sum of at bats, walks or base on balls, hit by pitches, and sacrifice fly balls. In other words: OBP = (H + BB + HBP) / (AB + BB + HBP + SF)
We went through that exercise to help you understand the difference between batting average and on-base percentage. These are two of the main statistical comparison points for offense in baseball. Therefore,we will now add the OBP field to the graph. We can drag that from the Fields pane on the left to the Values slot in just to the right of there as shown in Figure 2. Theoretically, OBP should always be greater than BA because batters can get on base for more than just a hit, as shown above in the definitions.
Let’s make some formatting changes to clean up our addition. First, let’s change the title to reflect our added data as sown in Figure 3 with the blue arrow. Now, we can change the OBP line to green as shown with the green arrow so that in stands out in comparison to the blue of the BA. Now let’s add a legend, place it in the bottom center, increase the font size, and color it red as shown with the red arrow. We are fast becoming pros at Power BI. Did we miss anything? Take a good look.
Did you see it? The data labels are showing some crazy decimal places. Let’s tidy that up too! Go back into the dataset and change the formatting from Auto to three decimal places for OBP as shown in Figure 4. We should go ahead and do the same for SLG and OPS while we are in there in case we want to graph them later. We did this in day five for batting average.
Here in Figure 5 we see the finished product. A visualization that we can be proud of. Stay tuned tomorrow to see where we go next!
Welcome back to day five of our series on Success in Power BI! With 30 days of Power BI learning, we should be able to be successful with our use of the desktop tool. Have you forgotten where we left off from day four? If so, here is the link to refresh your memory.
So for today let’s tidy up our visualization to make it a little more appealing because a polished visual is needed to give to executives, right? The first thing we are going to do is add some tooltips. These are designated fields of data that will show when you hover over data-points on the graph. Our data tells a story, so let’s add some details to tell it properly.
In Figure 1, you can see by the red arrow that we’ve dragged over the Year and the First Awards fields from the dataset. When we highlight over a data-point (as shown in the green arrow) we will get a tooltip that shows the age first (39) then the Batting Average (BA 0.30). Now, we can also see that the Year was 1973 and we can see some awards (AS, MVP-12). The original website designates these as an All Star selection year and he also finished twelfth in the Most Valuable Player award voting that particular year. That data may not be as useful in this case but it is a great addition to see that he was 39 in 1973. It helps to explain our insight a little better.
Now let’s cleanup the data-points for the batting averages as baseball generally sees this statistics in the thousandths places, such as batting 0.300 which is pronounced three hundred. As you can see in the Figure 1 Tooltip, it shows that he batted 0.30 which is really 0.301 and that extra one makes all the difference in the world in baseball.
In order to get the tooltips to show three places, we need to open the dataset again as shown in Figure 2. We highlight the BA column and go to the Formatting area on the Modeling ribbon. There is a decimal control there (see red arrow in Figure 2), press up until you get to three. Notice at the blue arrow we see the average of 0.301. We can now see if we switch back to the report view and hover over the same spot, we will now see 0.301 as shown in Figure 3. But the Y-axis is still showing only two decimal places. We should correct that for consistency sake.
In the Visualizations Pane, we select the Format tool (Figure 4, blue arrow that looks like a paint roller). This brings up a formatting menu where we can change many aspects of the visualization. In this case we want to increase the font on the X-axis to 14 (Figure 4, orange arrow). Then we will do the same for the Y-axis (Figure 4, red arrow). However, we will also change the scale with the starting point being 0 and the ending point being 0.360. This will give our chart some depth and make it more linear in scale. We will also change the decimal field to three so that our axis matches the accuracy of our tooltips.
Let’s also change the color of the line graph as I really don’t like the default colors in Power BI. It reminds me too much of the cyan from the limited color scale available on my 1980’s IBM PC. If you select the Data Colors control and then change the Batting Average data to Blue by entering the #0000FF hexadecimal code or selecting blue from the custom palette as shown by the blue arrow in Figure 5. Right below there we will see the Data labels control by the red arrow. We can set that to On and now see the data points with actual labels. This feels more presentable to me.
Are we done yet? Not yet. This visualization needs a catchy title as you will notice in Figure 5 in the upper left hand corner a small font with a couple of words defaulted there. Let’s change that to something more visually stimulating for the end-user. If we keep scrolling down on our Formatting area, we can see the Title control. Select that control and change the title, give it a blue color, and center it as shown in Figure 6 near the blue arrow. We’ve now completed our first visualization and it looks most impressive. Good job. Stay tuned to day six!
Welcome back to day four of our series on Success in Power BI! With 30 days of Power BI learning, we should be able to be successful with our use of the desktop tool. Have you forgotten where we left off from day three? If so, here is the link to refresh your memory.
So let’s create a visualization today in Power BI. For clarity’s sake, the Power BI desktop consists of reports, visualizations, and datasets. We’ve already built a dataset in the previous couple of days. Now, there is a difference between reports and visualizations. A visualization might be a pie chart or a map. A report, however, can contain one or many visualizations in order to present your data insight. Therefore it might contain a pie chart and a map to present the data cohesively in order to understand the insight we are presenting. Make sense?
So let’s get started. If you look at Figure 1, you’ll see the red arrow pointing to the available visualizations. You’ll also see where I clicked on the ellipses icon in the bottom right area and brought up a menu to import visuals (another name for visualizations). I am mentioning this to let you know that you can download custom visualizations!! Some are pretty fun and create memorable visuals for your intended audience. You can download these at https://app.powerbi.com/visuals/ and install them on your Power BI Desktop installation. For our purposes, however, we will stick with the standard visuals for now until we get up and running.
The first step to creating a visualization on our blank canvas is to choose one from the Visualizations area (highlighted in Figure 1) and click on it. We are going to select the Line Chart visual which is highlight in Figure 2 (first on the left of the second row from the top). This will drop a blank line chart visual onto your page as shown in Figure 2 on our white page.
Now you can see the dark area (called Fields and Filter pane) below the visualizations has changed between Figure 1 and Figure 2, depending upon the visual that was selected. This is probably the most difficult part of Power BI, in my opinion, as these titles do not seem very intuitive to the lay person (nor me the data professional, lol).
For this line chart, we want to see Hank Aaron’s batting average by his Age as players’ performance tends to decline as they age (as do all of us eventually). So if you look to the far right, you can see our data set fields. From here we can drag the batting average (depicted as BA) field over to the Values slot and then the Age field to the Axis slot. We will now see hit batting average on the Y axis and the age across the X axis as shown in Figure 3.
Obviously, we cannot see this visual so let’s grab the bottom corner and drag it across the page as shown in Figure 4. Not sure if you have the data in the right slots? Try different configurations to see if this is the insight you are trying to communicate. I tried reversing these and it made no sense in this situation. We really want to see an increase or decrease as the player ages.
So now if you look at Figure 5, this is what we saw on the original web page for Hank Aaron’s major league batting statistics from Baseball-Reference.com. We have now taken our batting dataset and turned it into a workable visual. It’s not pretty. Yet. This is a great stopping point. See you on day five!
Welcome back to day three of our series on Success in Power BI! With 30 days of Power BI learning, we should be able to be successful with our use of the tool. Did you forget where we left off on day two? If so, here is the link to refresh your memory.
Today we are going to do some general housekeeping to get ready to use the data we loaded. This will help us understand Power BI desktop a little better and get us ready to create some amazing visualizations.
If you remember yesterday we clicked on the data icon on the left toolbar and then we could view our data set that we loaded from the web source. To avoid any confusion when or if we load more data, let’s first change that name of the table. We are going to rename Table 2 to Standard Batting, as it was labeled on the website.
Now we need to do a cursory check of the data. It is important to make sure the data loaded is the data that we want to report off of and also to ensure that there were no errors from some questionable web programming.
If you look at Figure 2, you’ll see we have two rows at the top for years 1952 and 1953. Those are for the minor leagues. We do not want those included in our visualizations as we are comparing Hank’s professional career and it is a generally accepted practice to include only major league statistics in that comparison due to the breadth and level of competition.
So now if you look at Figure 3, this is what we saw on the web page for Hank Aaron’s major league batting statistics and what we thought we were loading originally. The website, Baseball-Reference.com, is using some programming to hide minor league stats (as you can see a button called Show Minors). We might not have noticed this, but its an import step to note when loading data from web sources.
So we need to remove those rows so that we can get an accurate professional representation of Aaron’s performance. Sounds easy, right?
Not so fast. When we click somewhere in the grid, it highlights a column, and when we right-click to choose delete (as shown in Figure 4), it appears that the column will be deleted and not the row. Trust me, it will be the column deleted. So how do we delete the row?
Let’s click on the Edit Queries icon on the Home toolbar (or ribbon) at the top of the screen. This will load the data into a Query Editor window (giving you two active windows). We can highlight rows here but we still cannot delete them. However, if we imagine this is Excel, then we can use the filter ability to remove our the two rows. Click on the down arrow at the column header for Year. Now deselect 1952 and 1953, as shown in Figure 5. Select Ok to make it happen captain.
I want to point out here in Figure 6, that the rows are gone. But, I also want to point out the section on the right of the screen called Applied Steps. At this point, we have not committed any changes yet. However, this is a list of steps that we’ve done thus far in our manipulations. We can change that list or step through that list and watch the data change, if we choose.
So, if you click on the Changed Type step, you can see the two rows are there again. You can also click the delete icon (X marks the spot on the right of the step) to remove the step. Thus, if we decided to leave those two years in there and cancel out the changes that we made, we could then click the X and delete the Filtered Rows step. We could also do several other things here in the data if we needed to before applying the changes.
In this case, we did not really delete rows as they are filtered out just like in Excel, but they are removed from our visual data set that we will report off of. It is essentially smoke and mirrors as shown in Figure 7. This is a great stopping point. See you on day four!
Welcome back to day two of our series on Power BI! Are you sore from yesterday’s heavy lifting? No? Good. Did you forget where we left off? Here is the link to refresh your memory.
Today we are going to start with loading some data from a web data source. There are a ton of great data sources out there but I chose Baseball-Reference.com because of the wealth of information and statistics available there and I personally love baseball. It doesn’t matter if you are a baseball fan or not as this is just a demonstration. Feel free to find a different site for your favorite sport or activity.
Hank Aaron is generally considered to be one of the greatest baseball players of all time so we will grab some of his statistics and break them down in Power BI. With my company, Innovative Architects, being based in Atlanta, we’re pretty fond of Mr. Aaron here at the office. So let’s see where this takes us for this adventure.
Let’s start with opening Power BI. We are greeted each time with this great modal dialog box. It is a great spring board. We can open a previous project or start a new one by clicking on Get Data.
- Select Get Data to get started on our journey.
- Select Other to get to the Web data source option and select Connect.
- Select Web. Type in http://www.baseball-reference.com/players/a/aaronha01-bat.shtml and select Ok. This page will give us some lifetime stats for Hank Aaron so we can do some visualizations.
- We are now connecting to the data into Power BI.
- Now the fun part: figuring out which data that we want to load. Look at all of these tables. Which one is ours?
- Let me show you a trick. Click on the Web View tab around the middle of the screen. It will show our web page data as it appears normally. However, it will also label the tables so that we know which one we want to load. Snazzy, huh?
- From here we can click on the Table 2 check box near the Standard Batting title for the table. That was much easier, huh? Select the Load button and Power BI will now start to load the data into the tool.
- When that completes, we will actually have data loaded into Power BI to begin our work. If you look at table 2 on the right, you can see all of the columns that we saw on the web page.
- If we click on the data icon in the middle of the three icons on the right hand side of the screen, we can actually see the data. The other two show the report and the relationships between tables. We will get into that later on.
- Let’s Save the File as Hank Aaron and pick up again on Day Three. Good job! We are data visualization gods, right?
In the SQL Server community, I’ve seen quite a few 30 day blogs. I love the format with the idea being that you read the blog once a day for a month to master a new skill. Let’s go on adventure to learn PowerBI!
PowerBI is the latest Microsoft Business Intelligence tool. This tool, however, is considered a self-service tool in that it lets users create the data visualizations that they want to see instead of waiting for your in-house report writers to create a report for you. Excited yet? Don’t worry, you will be. It is a fun tool. Let’s get started.
Recently, I completed my first PowerBI project for a client of ours at Innovative Architects. It only took a couple of days to deliver multiple dashboards and they were beautiful if I do say so myself. This project inspired me to blog about getting started with PowerBI.
So day one needs to be getting PowerBI installed and ready to go. The first thing we do is to download the software at https://powerbi.microsoft.com/en-us/get-started/ using an email address. This is important because PowerBI is getting monthly updates and the tool is adding features every month. You really want to know about them as they are released.
- Let’s start the installer and get this party going!
- Strait forward so far, right? Click Next.
- Check the acceptance box and then click Next.
- Choose an installation folder by clicking Change or click Next to continue with the default location.
- Click Install to begin.
- Click Finish to end the installation. Wow, that was easy!
I told you it would be fun, right? We have now installed PowerBI Desktop. We’re should probably take the rest of today off. See you tomorrow!
Read the full series here:
- Day Two: Loading a Web Data Source
- Day Three: General Housekeeping
- Day Four: Our First Visualization
- Day Five: Making It Pretty
- Day Six: Adding More Data
- Day Seven: Adding Simple Analytics
- Day Eight: Adding More Analytics
- Day Nine: Line and Clustered Column Chart
- Day Ten: Add New Calculated Columns
- Day Eleven: Adding a Second Visual
- Day Twelve: Adding a Slicer
- Day Thirteen:
- Day Fourteen:
- Day Fifteen:
Having moved from Tampa a couple of years ago, I always wondered why Atlanta only had one SQL Saturday per year. While Tampa has two SQL Saturdays, one regular event and one specifically for Business Intelligence (BI). Atlanta, however, is the big daddy of SQL Saturdays (ok, ok, technically we are the second biggest SQL Saturday in the world but only by a few people and only because our venue will not physically hold anymore people thanks to that darned Fire Marshall and his silly rules about people cramming into a room). Thus, it seemed only natural that Atlanta would also have a BI event with such a vibrant BI user community here.
In my consulting role at Innovative Architects, I’ve been doing quite a bit of BI projects and I absolutely love having irons in both fires. In addition, there were a lot of people here that were interested in helping with an additional SQL Saturday. We just needed a push to get it off of the ground. Enter my co-worker Damu Venkatesan (t) who said “OK, let’s do this now!” A few months later and we have moved to the wait list for registrations, we have almost sold out our sponsorship slots, and we have an amazing line-up of speakers. Maybe I am biased, but I’ve been to at least thirty SQL Saturdays in my career and I think this is as good of a schedule as I’ve ever seen. After the schedule was announced, the registrations filled up at a frantic pace.
We also had a few big names in the community submit pre-conference training abstracts for Friday, January 8th. We ultimately decided on having two sessions for our first year even though we had several great submissions. We finally decided upon SSIS Design Patterns and BIML: A Day of Intelligent Data Integration by Andy Leonard (b|t) for our first session. Our second session, Microsoft BI In a Day, is being presented by Microsoft employees Patrick Leblanc (b|t) and Adam Saxton (t) who is also known in the community as Guy In A Cube (yt|t).
Below are some PowerBI visualizations, because this is a BI SQL Saturday! Enjoy!