2024 Week 27 | Power BI: Scatter plot showing the path of selected data points

Introduction

The challenge for this week is to first build a scatter plot of fertility vs life expectancy of different countries. However, this will not just be for a single year, but will contain information across many years, making it possible to slide the year and see how all the bubbles move. The key challenge though is how to bring it to the next level and show the path that any selected data point will travel when we move the year slider.

In the example below, selecting data points (ctrl select for multiple points) will show the relative paths.

Requirements

Getting the data

To get the data, follow these steps

  1. go to https://www.gapminder.org/data/
  2. In the “select indicator” search box type “Fertility rate, total” and select the only remaining indicator
  3. Click on the XLSX button next to “Download as”. Save the file as “Fertility Rate.xlsx” 
  4. Clicking on the title “Fertility rate, total (Births per woman)” will open the search box again
  5. Type “Life Expectancy, at birth”
  6. Download the XLSX file, save as “Life Expectancy.xlsx” 

 

Modeling the data

Historical Data (Fact)

  1. On Power BI Desktop select Excel File as data source and find the “Fertility Rate.xlsx” file
  2. Name the query “Fertility Rate”
  3. Select the only sheet of the file and choose “transform data” 
  4. Promote headers
  5. Select “country” field and unpivot other columns
  6. In a separate query repeat steps for “Life Expectancy.xlsx”. Call the query “Life Expectancy”
  7. Join “Fertility Rate” and “Life Expectancy” queries as a new query. Join by country name and year. Name this query “Historical Data”
  8. Expand “Life Expectancy” column and keep only the Life expectancy column.
  9. Rename “country” to “Country” (do the right thing)
  10. Create a new column from examples called “Year-Country”, enter 1960-Afghanistan for the first row, 1961-Afghanistan for the second row and it should already pick it up. 
  11. Disable load on “Fertility rate and Life Expectancy” 

 

Country (Dimension)

  1. Right click on the “Historical Data” query and select “Relationship”. 
  2. Select the country column, right click on the header and select “Remove other columns” 
  3. Remove the duplicates
  4. Rename the query to “Country”
  5. “Close and Apply” Power Query

Year (Dimension)

  1. In Report view, go to modeling tab and create a Numeric parameter, whole numbers from 1960 to 2022. Call the parameter “Year” (keep the check on “add slicer to the report” option enabled)

Relationships

  1. Create a relationship between “Country” in “Country” table and “Country” in “Historical Data” table
  2. Create a relationship between “Year” in the “Year” table and “Year” in the “Historical Data” table. 

 

Measures

  1. Create “Avg Fertility Rate” measure as AVERAGE of the Fertility Rate field
  2. Create “Avg Life Expectancy measure as AVERAGE of the Life Expectancy field.
  3. Create a “Size” measure equal to 1
  4. Create a “Color” measure equal to “#CCCCCC77” for a light gray with transparency

 

Building the main scatter plot

  1. Add a scatter plot to the canvas
  2. Drag “Country” field from “Country” table into the “Values” well
  3. Drag “Avg Fertility Rate” measure into the into the “X-Axis” well
  4. Drag “Avg Life Expectancy” measure into the “Y-Axis” well
  5. Drag “Size” measure into the “Size” well
  6. Go to Format > Markers > Size and select -25
  7. Go to Format > Markers > Color and select conditional formatting by field value, choose “Color” measure. 
  8. Hard-code min and max values for the Y-Axis to 0 and 100
  9. Hard-code min and max values for the X-Axis to 0 and 10
  10. Go to Size and Style and remove the background
  11. Disable the title 

Building the scatter plot to show the selected point path

  1. Duplicate the scatter plot
  2. Create a measure “Path Size” that stores the selected year in a variable, then the year on the Historic Data if no filter from the year table existed. In case both values match, the value should be equal to the “Size” measure. Otherwise the same value but divided by 100.
  3. Using the same logic to capture the selected and datapoint year, create a “Patch color” measure that returns a dark red if the year of the data point is greater or equal to the selected year and a light gray otherwise. 
  4. Replace the field “Country” on the “Values” well with “Year-Country” from the “Historical Data” table
  5. Replace “Size” in the “Size” well with “Path Size” measure
  6. Go to Format > Markers > Color and use conditional formatting by field value, use  “Path Color” measure.
  7. We don’t want the year to filter this chart, so create a calculation group called “Model” with a calculation item called “Remove year filter” that removes the filter coming from the Year table, but only for the “Avg Fertlity Rate” and “Avg Life Expectancy” measures. Apply the calculation item at the visual level
  8. We only want the values to be shown for the selected data points. Create a calculation group called “Visibility” with a calculation item called “Only if Country is Selected”  that checks if “Country” ISFILTERED. If so, leave values unaffected, otherwise return blank. Apply the calculation item at the visual level. 
  9. Go to Format > Markers > Color and choose a dark red. 
  10. Overlap both scatter charts to the exact same size. Make them both as large as possible, with the main chart on top and the secondary one below. Leave enough space just for the year slider. 

Dataset

Share

After you finish your workout, share on Twitter using the hashtags #WOW2024 and #PowerBI, and tag @MMarie, @shan_gsd, @KerryKolosko, @AgulloBernat. Also make sure to fill out the Submission Tracker so that we can count you as a participant this week in order to track our participation throughout the year.

Solution

Scroll to Top