Introduction
This week for Workout Wednesday we will be trying our hand at non-linear regression with the Deneb custom visual.
With Vega and Vega-Lite we can draw a non-linear line of best fit over our data points and use the statistical API to return the coefficient of determination (R²).
Requirements
Power Query
- Obtain the dataset from the Data Stories Gallery
- Model and load data
Power BI Desktop
LINEAR REGRESSION
- Import Deneb Custom Visual found in AppSource
- Place fields in Field Well
- From the ellipsis of the custom visual header, select edit visual
- Visit Regression | Vega-Lite for guidance
- Create a new Vega-Lite Specification
- Create a scatter chart of mark type = point with ‘power’ and ‘shot put distance’ encoded on the x and y axes respectively
- Layer a line with a linear regression transform
- Layer a text mark showing the coefficient of determination (R²)
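The steps above can be sketched as a single layered Vega-Lite specification. This is a minimal sketch rather than the official solution. It assumes that Deneb exposes the Field Well data under the name dataset, and that the fields are named Power and Shot putt distance as in the snippets in this post (adjust them to match your own field names). The regression and on values follow the orientation used in those snippets. In Vega-Lite, regression names the dependent field and on names the independent field, so swap the two to fit distance as a function of power.

```json
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "description": "Sketch only: field names and the 'dataset' name are assumptions",
  "data": {"name": "dataset"},
  "layer": [
    {
      "mark": {"type": "point", "filled": true},
      "encoding": {
        "x": {"field": "Power", "type": "quantitative"},
        "y": {"field": "Shot putt distance", "type": "quantitative"}
      }
    },
    {
      "mark": {"type": "line", "color": "firebrick"},
      "transform": [
        {"regression": "Power", "on": "Shot putt distance"}
      ],
      "encoding": {
        "x": {"field": "Power", "type": "quantitative"},
        "y": {"field": "Shot putt distance", "type": "quantitative"}
      }
    },
    {
      "transform": [
        {"regression": "Power", "on": "Shot putt distance", "params": true},
        {"calculate": "'R²: ' + format(datum.rSquared, '.2f')", "as": "R2"}
      ],
      "mark": {
        "type": "text",
        "color": "firebrick",
        "x": "width",
        "align": "right",
        "y": -5
      },
      "encoding": {"text": {"field": "R2", "type": "nominal"}}
    }
  ]
}
```

The regression method defaults to linear when no method is given, which is why the transform here has no method property.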
NON-LINEAR REGRESSION
- Repeat the Linear Regression
- Change the regression method to quadratic for the line:
  "transform": [
    {
      "regression": "Power",
      "on": "Shot putt distance",
      "method": "quad"
    }
  ],
- Change the regression method to quadratic for the text:
  "transform": [
    {
      "regression": "Power",
      "on": "Shot putt distance",
      "params": true,
      "method": "quad"
    },
    {"calculate": "'R²: ' + format(datum.rSquared, '.2f')", "as": "R2"}
  ],
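With params set to true, the regression transform returns the fitted coefficients in a coef array alongside rSquared, so a text layer can also display the fitted equation. The layer below is a sketch under two assumptions: that the quadratic coefficients are ordered from the constant term upward, and that the output field name equation is purely illustrative. Here d stands for Shot putt distance, because that is the on field in the snippets above.

```json
{
  "description": "Sketch only: 'equation' is an illustrative field name",
  "transform": [
    {
      "regression": "Power",
      "on": "Shot putt distance",
      "method": "quad",
      "params": true
    },
    {
      "calculate": "'Power = ' + format(datum.coef[0], '.2f') + ' + ' + format(datum.coef[1], '.2f') + '·d + ' + format(datum.coef[2], '.3f') + '·d²'",
      "as": "equation"
    }
  ],
  "mark": {"type": "text", "x": "width", "align": "right", "y": 10},
  "encoding": {"text": {"field": "equation", "type": "nominal"}}
}
```

A negative coefficient will render as "+ -0.50", which is cosmetic but worth tidying if the label is shown to end users.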
ADDITIONAL NON-LINEAR REGRESSION
- Repeat the Linear Regression
- Try adjusting the extent of the lines
- Try out other regression types such as polynomial with orders 2, 3 and 4
- Polynomial Regression: The Only Introduction You’ll Need | by Aden Haussmann | Towards Data Science
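For the two experiments above, the line layer might look like the following sketch. The method value poly is paired with an order property, and extent sets the [min, max] range of the on field over which the trend line is drawn. The extent values here are placeholders rather than values taken from the dataset, and the field names are again assumed to match the earlier snippets.

```json
{
  "description": "Sketch only: extent values are placeholders",
  "mark": {"type": "line", "color": "firebrick"},
  "transform": [
    {
      "regression": "Power",
      "on": "Shot putt distance",
      "method": "poly",
      "order": 3,
      "extent": [10, 25]
    }
  ],
  "encoding": {
    "x": {"field": "Power", "type": "quantitative"},
    "y": {"field": "Shot putt distance", "type": "quantitative"}
  }
}
```

Change order to 2 or 4 to compare fits. An order of 2 should give the same curve as the quad method.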
ADVANCED
1. Try recreating these regression graphs with Vega
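As a starting point for the Vega version, the sketch below derives a second data source with Vega's regression transform and draws it as a line over the points. It makes the same assumptions as before (data named dataset, fields named Power and Shot putt distance) and keeps the orientation of the Vega-Lite snippets, so x is the independent field and y the fitted field. Scale and mark names are illustrative, and the R² text mark is left out to keep it short.

```json
{
  "$schema": "https://vega.github.io/schema/vega/v5.json",
  "description": "Sketch only: names and fields are assumptions",
  "data": [
    {"name": "dataset"},
    {
      "name": "trend",
      "source": "dataset",
      "transform": [
        {
          "type": "regression",
          "method": "quad",
          "x": "Shot putt distance",
          "y": "Power"
        }
      ]
    }
  ],
  "scales": [
    {
      "name": "xscale",
      "type": "linear",
      "domain": {"data": "dataset", "field": "Power"},
      "range": "width",
      "zero": false
    },
    {
      "name": "yscale",
      "type": "linear",
      "domain": {"data": "dataset", "field": "Shot putt distance"},
      "range": "height",
      "zero": false
    }
  ],
  "axes": [
    {"orient": "bottom", "scale": "xscale", "title": "Power"},
    {"orient": "left", "scale": "yscale", "title": "Shot putt distance"}
  ],
  "marks": [
    {
      "type": "symbol",
      "from": {"data": "dataset"},
      "encode": {
        "enter": {
          "x": {"scale": "xscale", "field": "Power"},
          "y": {"scale": "yscale", "field": "Shot putt distance"}
        }
      }
    },
    {
      "type": "line",
      "from": {"data": "trend"},
      "encode": {
        "enter": {
          "x": {"scale": "xscale", "field": "Power"},
          "y": {"scale": "yscale", "field": "Shot putt distance"},
          "stroke": {"value": "firebrick"}
        }
      }
    }
  ]
}
```

With no as property, the transform writes its output to fields named after the x and y inputs, so the line mark can reuse the same field names as the points. Outside Deneb you would also need to set width and height.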
Dataset
The dataset this week can be viewed here
Share
After you finish your workout, share on Twitter using the hashtags #WOW2023 and #PowerBI, and tag @MMarie, @shan_gsd, @KerryKolosko. Also make sure to fill out the Submission Tracker so that we can count you as a participant this week in order to track our participation throughout the year.
Solution
Solution File available for download via Data Stories Gallery
Hi,
I think the regression should be the other way around, as power is the cause (independent variable) and shot put distance is the dependent variable?
I followed the original research paper, which has it this way around.