I am going to talk about rugby analytics. The focus is not rugby as a sport but probabilistic programming, and how you can use it to build a predictive model for an interesting problem.
Who am I?
I am a data analytics professional based in Luxembourg.
Contents: Probabilistic programming applied to rugby
Standard xkcd cartoon
Sports commentary
How can statistics help with sports?
-fundamentally, a rugby match is an event that can be simulated
-how do we generate a model to predict the outcome of a tournament?
-how do we quantify the uncertainty in our model?

What influenced me on this?
Quantopian talk
What's wrong with statistics
Models should not be built for mathematical convenience, but to accurately model the data.
What is Bayesian statistics?
It assumes that we have a prior belief about the world. Bayes' rule is a formula for updating those beliefs after observing data.
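As a minimal sketch of this kind of belief update (not taken from the talk), consider a Beta prior on a team's probability of winning a match, updated with observed wins and losses. The function names and the 7-3 record are hypothetical and chosen purely for illustration.

```python
def update_beta(prior_a, prior_b, wins, losses):
    """Conjugate Beta-Binomial update: returns the posterior (a, b)."""
    return prior_a + wins, prior_b + losses


def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)


# Flat prior: Beta(1, 1) puts equal weight on every win probability.
prior = (1.0, 1.0)

# Hypothetical data: 7 wins and 3 losses.
posterior = update_beta(*prior, wins=7, losses=3)

print("prior mean:", beta_mean(*prior))
print("posterior mean:", beta_mean(*posterior))
```

The prior mean of 0.5 moves towards the observed win rate once the data arrive; with more matches the data dominate the prior.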
Bayesian rugby
Based on an original paper by Baio and Blangiardo
What Zalando did
They used it for automatic weight estimations for items
So why Bayesians?
Probabilistic programming is a new paradigm. I will compare it with black-box machine learning, as done with scikit-learn.
Blackbox machine learning
Predictions based on a blackbox
Limitations of machine learning
The models being black boxes is in itself a limitation: they are hard to explain to customers.
Probabilistic programming
Open-box models. Black-box inference engine.
Probabilistic programming - what's the big deal?
We are able to use data and our prior beliefs to generate a model. Generating a model is extremely powerful, and we can tell a story.
Six Nations rugby
Motivation
Your estimate of the strength of a team depends on your estimates of others' strengths.
Results from previous years
Preparing model for PyMC
What do we want to infer?
We want to infer the team strengths. We want to infer latent parameters. Probabilistic programming allows us to get these latent parameters.
MCMC samples
What do we want?
We want to quantify the uncertainty, to use this to generate a model, and we want answers as distributions and not point estimates.
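To illustrate what "answers as distributions" can look like in practice, here is a small sketch that summarises a set of samples with a mean and a 95% interval. The samples are drawn from a normal distribution purely as a stand-in for an MCMC trace; the parameter values and the helper `credible_interval` are assumptions for this example, not output from the talk's model.

```python
import random
import statistics

rng = random.Random(0)

# Stand-in for MCMC samples of one latent parameter (hypothetical values).
samples = [rng.gauss(0.5, 0.1) for _ in range(5000)]


def credible_interval(samples, mass=0.95):
    """Equal-tailed interval containing roughly `mass` of the samples."""
    ordered = sorted(samples)
    tail = (1.0 - mass) / 2.0
    lower = ordered[int(tail * len(ordered))]
    upper = ordered[int((1.0 - tail) * len(ordered)) - 1]
    return lower, upper


point_estimate = statistics.mean(samples)
lower, upper = credible_interval(samples)
print(f"mean={point_estimate:.3f}, 95% interval=({lower:.3f}, {upper:.3f})")
```

The point estimate alone hides how wide the interval is; reporting both is the difference between a single number and a distribution.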
What assumptions do we know?
A finite number of teams. We have data from last year, and sports scoring is modeled as a Poisson distribution.
The model
Home advantage is taken into account.
Key assumption: home effect is an advantage in sports. Bayesian models allow you to incorporate these beliefs into your model.
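A rough sketch of how a home effect can enter a Poisson scoring model is given below. It assumes a log-linear form in the spirit of Baio and Blangiardo, with an attack and a defence strength per team plus an intercept; the exact parameterisation used in the talk may differ, and all names and numbers here are hypothetical.

```python
import math


def scoring_rates(home, away, attack, defence, home_adv, intercept):
    """Expected points for the home and away side under a log-linear Poisson model."""
    # Only the home side receives the home-advantage term.
    log_home = intercept + home_adv + attack[home] + defence[away]
    log_away = intercept + attack[away] + defence[home]
    return math.exp(log_home), math.exp(log_away)


# Hypothetical log-scale strengths, not values inferred in the talk.
attack = {"Ireland": 0.10, "Italy": -0.20}
defence = {"Ireland": -0.15, "Italy": 0.25}

home_rate, away_rate = scoring_rates(
    "Ireland", "Italy", attack, defence, home_adv=0.2, intercept=3.0
)
print(f"expected points: home {home_rate:.1f}, away {away_rate:.1f}")
```

In a full model these strengths would be random variables with priors, and the observed scores would be Poisson-distributed around the two rates.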

Digression: why the flat priors were picked
It made no statistical difference
A prior distribution is non-informative if the prior is flat relative to the likelihood function
Often in Bayesian modelling it doesn't matter what your priors are. Even bad guesses will give enough information to find an interesting answer.
Let us run the model
Diagnostics
The plot indicates that the model converges.
The home advantage is worth about 0.55 points.

Simulating a season
We are going to simulate 1000 seasons. In most simulated seasons, the model predicted that Ireland would win 4 games. We can also see how many points they score.
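The simulation step can be sketched as follows, using only the Python standard library. This is a simplification under stated assumptions: the strengths are fixed, hypothetical numbers rather than posterior samples, the first-named team in each fixture is treated as the home side, and the Poisson sampler is Knuth's textbook algorithm rather than anything from PyMC.

```python
import math
import random
from collections import Counter
from itertools import combinations

TEAMS = ["England", "France", "Ireland", "Italy", "Scotland", "Wales"]

# Hypothetical log-scale parameters, NOT results from the talk.
INTERCEPT = 3.0
HOME_ADV = 0.2
ATTACK = {"England": 0.10, "France": 0.00, "Ireland": 0.15,
          "Italy": -0.30, "Scotland": -0.10, "Wales": 0.15}
DEFENCE = {"England": -0.10, "France": 0.00, "Ireland": -0.15,
           "Italy": 0.30, "Scotland": 0.10, "Wales": -0.15}


def poisson(rate, rng):
    """Draw one Poisson variate with Knuth's multiplication algorithm."""
    limit = math.exp(-rate)
    count, product = 0, 1.0
    while True:
        product *= rng.random()
        if product <= limit:
            return count
        count += 1


def simulate_season(fixtures, rng):
    """Play every fixture once and return the number of wins per team."""
    wins = Counter({team: 0 for team in TEAMS})
    for home, away in fixtures:
        home_rate = math.exp(INTERCEPT + HOME_ADV + ATTACK[home] + DEFENCE[away])
        away_rate = math.exp(INTERCEPT + ATTACK[away] + DEFENCE[home])
        home_pts, away_pts = poisson(home_rate, rng), poisson(away_rate, rng)
        if home_pts > away_pts:
            wins[home] += 1
        elif away_pts > home_pts:
            wins[away] += 1  # draws award no win to either side
    return wins


rng = random.Random(42)
fixtures = list(combinations(TEAMS, 2))  # 15 matches, each pair meets once

# Distribution of Ireland's win count over 1000 simulated seasons.
ireland_wins = Counter(
    simulate_season(fixtures, rng)["Ireland"] for _ in range(1000)
)
for n_wins in sorted(ireland_wins):
    print(n_wins, "wins:", ireland_wins[n_wins], "seasons")
```

A fully Bayesian version would draw a fresh set of strengths from the posterior for each simulated season, so that parameter uncertainty carries through to the predicted win counts.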
What happened in reality?
Shrinkage: Fundamentally all models are wrong, but some are useful.
What are the predictions of the model?
Let us look at the winning team on average.
What actually happened
We need to investigate like scientists.
Ireland won the Six Nations.
Conclusion
Learn more
Probabilistic Programming and Bayesian Methods for Hackers
Doing Bayesian Data Analysis

Questions
Can we sell this tool as a compliance tool for FIFA?
Never going to happen. Maybe.
Why don't you use nested sampling instead of MCMC?
The PyMC3 project has a few more samplers. You can submit a pull request to add it yourself.
PyMC2 vs PyMC3. Can you tell us why you chose PyMC2?
At the time I did it, the documentation for PyMC3 was not that good. I will probably port it to PyMC3 at some point.