Probability Will Only Break Your Heart, or, Trust the Process, Doubt the Procedure: NBA playoff win chances
Settling a bet with SQL data-nesting & spicy statistical takes (Bayesian vs frequentist decision analysis)

Daniel McNichol
May 28

This is a parable about simple, straightforward questions of fact, & how they often devolve into complex matters of data processing, analysis & decision-making under fragile epistemic limits, in the real world.

The 126,314-row CSV file (17 MB) contains entries for every NBA (& ABA?) game between 1946-7 & 2014-5, with 2 rows for each game (1 from the perspective of each team), as well as an is_playoffs binary indicator field & a game_location indicator field (home or away), among many others.

This data will more than suffice, but as with any real-world data analysis, substantial preprocessing & data preparation is necessary, informed by relevant domain knowledge.

So I uploaded the file into my cloud data warehouse of choice: Google BigQuery, which boasts not only outstanding performance on massive datasets, but also my favorite “standard” SQL engine due to its support for nested data structures, DDL & DML (to say nothing of next-gen innovations like BigQuery ML & deep integration w/ Google Cloud Platform).

... W_W, W_L, etc.)
series_wins & series_losses: total number of wins & losses in the playoff series by the team of interest
series_result: indicator of the series result, determined by comparing series_wins & series_losses
series_home_court: indicator of whether the team of interest was home or away for the first 2 games

This was accomplished via the following query, taking advantage of some of my favorite features of BigQuery standard SQL mentioned above (particularly array functions).

The BigQuery output table is here, & looks something like this:

Note the series_results_array column, which nests
results for the entire playoff series inside of a single cell in each row.

Table output here, sample:

Yet there remains row data extraneous to the conditions of our original bet:

each series contains duplicate entries, one from each team’s perspective
we’re only interested in the away team’s perspective
we’re only interested in series where the first 2 games were split, in some order: “W_L” or “L_W”

This is resolved by a few statements in a WHERE clause (which could also simply be applied during any subsequent aggregation query or analysis step, but is done here for clarity):

Table output here, sample:

This brings us to the moment of truth… kinda.

Data analysis

Counts, summary stats / descriptive statistics

Beginning with the cleanest period (post-2002), we can start with simple counts & percentages, conventionally represented in a contingency table:

Output:

So at first blush, independence prevails: away teams winning then losing the first 2 games have an essentially identical series win % to teams losing then winning: 39%.

The filters are set to relevant configurations for the question at hand, but feel free to modify & explore on your own!

Epilogue: flus, flukes & tears

Property of the Philadelphia Inquirer – Charles Fox, staff photographer

As for our heroes, they indeed won game 2 in enemy territory, bringing the tied series home & going on to win game 3 by blowout.
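For anyone who wants to poke at the logic without a BigQuery account, the nesting, filtering & counting steps described in this post can be sketched in plain Python. This is a toy illustration under stated assumptions, not the original query: the tuple layout, the helper names (make_series, nest_series, split_away_series, contingency), the first_two key, the "home"/"away" & "W"/"L" codes, the 2-2-1-1-1 venue pattern & the five made-up series are all hypothetical. Only the ideas come from the post: 2 rows per game, a nested results array per series, & a split-first-2-games filter from the away team's perspective.

```python
from collections import Counter, defaultdict


def make_series(series_id, favored, underdog, underdog_results):
    """Expand one toy series into 2 rows per game (1 per team), like the CSV."""
    rows = []
    for game_no, result in enumerate(underdog_results, start=1):
        # Assumed 2-2-1-1-1 format: games 1, 2, 5, 7 at the favored team's arena.
        at_favored = game_no in (1, 2, 5, 7)
        flipped = "L" if result == "W" else "W"
        rows.append((series_id, underdog, game_no,
                     "away" if at_favored else "home", result))
        rows.append((series_id, favored, game_no,
                     "home" if at_favored else "away", flipped))
    return rows


# Made-up series, written from the underdog's (away team's) perspective.
GAMES = (
    make_series("s1", "A", "B", "WLWWLW")
    + make_series("s2", "C", "D", "LWLLWL")
    + make_series("s3", "E", "F", "LLLL")
    + make_series("s4", "G", "H", "WLLLWL")
    + make_series("s5", "I", "J", "LWWLWW")
)


def nest_series(games):
    """Collapse game rows into one record per (series, team), nesting the results."""
    grouped = defaultdict(list)
    for series_id, team, game_no, location, result in games:
        grouped[(series_id, team)].append((game_no, location, result))
    records = []
    for (series_id, team), rows in grouped.items():
        rows.sort()  # order by game number
        results = [result for _, _, result in rows]
        wins, losses = results.count("W"), results.count("L")
        records.append({
            "series_id": series_id,
            "team": team,
            "series_results_array": results,
            "first_two": "_".join(results[:2]),  # e.g. "W_L"
            "series_wins": wins,
            "series_losses": losses,
            "series_result": "W" if wins > losses else "L",
            # Assumption: game 1's venue stands in for the first 2 games.
            "series_home_court": rows[0][1],
        })
    return records


def split_away_series(records):
    """Keep only the away team's view of series whose first 2 games were split."""
    return [r for r in records
            if r["series_home_court"] == "away"
            and r["first_two"] in ("W_L", "L_W")]


def contingency(records):
    """Count series wins & losses for each ordering of the split."""
    counts = Counter((r["first_two"], r["series_result"]) for r in records)
    table = {}
    for order in ("W_L", "L_W"):
        won, lost = counts[(order, "W")], counts[(order, "L")]
        total = won + lost
        table[order] = {"won": won, "lost": lost,
                        "win_pct": won / total if total else None}
    return table


kept = split_away_series(nest_series(GAMES))
for order, row in contingency(kept).items():
    print(order, row)
```

The toy numbers say nothing about the real 39% figure; the point is only that filtering on series_home_court also removes the duplicate per-series rows, since the home team's mirror-image record drops out with the same condition.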