Data clustering for fitting parameters of a Markov chain model of multi-game playoff series

We propose a Markov chain model of a best-of-7 playoff series that involves game-to-game dependence on the current status of the series. To create a relatively parsimonious model, we seek to group transition probabilities of the Markov chain into clusters of similar game-winning frequency. To do so, we formulate a binary optimization problem to minimize several measures of cluster dissimilarity. We apply these techniques to Major League Baseball (MLB) data and test the goodness of fit to historical playoff outcomes. These state-dependent Markov models improve significantly on probability models based solely on home-away game dependence. It turns out that a better two-parameter model ignores where the games are played and instead focuses simply on, for each possible series status, whether or not the team with home-field advantage in the series has been the historical favorite - the more likely winner - in the next game of the series.
© Copyright 2008 Journal of Quantitative Analysis in Sports. de Gruyter. All rights reserved.
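
As a rough illustration of the kind of model the abstract describes, the following Python sketch treats the series status (wins so far for each team) as the state of a Markov chain and computes the probability that the team with home-field advantage wins a best-of-7 series. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the state encoding, and the idea of passing in a set of "favored" states with two probabilities are all hypothetical, and no fitted values from the paper are used.

```python
from functools import lru_cache

WINS_NEEDED = 4  # a best-of-7 series ends when one team reaches 4 wins


def series_win_probability(p_next):
    """Probability that team A (assumed: the team with home-field
    advantage) wins the series.

    p_next maps a series status (a, b), meaning A has a wins and the
    opponent has b wins, to the probability that A wins the next game.
    """

    @lru_cache(maxsize=None)
    def win_from(a, b):
        if a == WINS_NEEDED:  # absorbing state: A has won the series
            return 1.0
        if b == WINS_NEEDED:  # absorbing state: A has lost the series
            return 0.0
        p = p_next[(a, b)]
        return p * win_from(a + 1, b) + (1.0 - p) * win_from(a, b + 1)

    return win_from(0, 0)


def all_states():
    """All non-terminal series statuses (a, b)."""
    return [(a, b) for a in range(WINS_NEEDED) for b in range(WINS_NEEDED)]


def constant_model(p):
    """One-parameter baseline: the same game-win probability in every state."""
    return {state: p for state in all_states()}


def two_cluster_model(favored_states, p_high, p_low):
    """Two-parameter model: each state belongs to one of two clusters.

    favored_states is a hypothetical set of statuses in which team A is
    treated as the favorite in the next game; it is an input here, not
    something this sketch estimates from data.
    """
    return {
        state: (p_high if state in favored_states else p_low)
        for state in all_states()
    }
```

For example, `series_win_probability(two_cluster_model({(0, 0), (1, 0)}, 0.55, 0.48))` evaluates a made-up two-cluster assignment. Choosing which states go in which cluster is the part the abstract addresses with a binary optimization problem, and that step is not reproduced here.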

Subjects: sports game analysis statistics competition theory baseball
Notations: technical and natural sciences sport games
Published in: Journal of Quantitative Analysis in Sports
Published: 2008
Volume: 4
Issue: 1
Document types: article
Language: English
Level: advanced