“Sees spin well and has decent power—let’s see how he does in the 3-hole.”
“We need speed in the leadoff spot to take the extra base and put pressure on the defense.”
There is value in having a good intuition and using qualitative information when building a lineup. Maybe this intuition comes from paying close attention to how the most advanced major league teams form their lineup, or perhaps it comes as a result of years of coaching experience.
But the real value comes from combining this with analytics.
With so many possible outcomes over the course of a baseball game, how can you be sure your intuition is correct? Suppose you want your fastest hitter (Hitter A) at leadoff so that he can score from first base if one of hitters 2-4 drives the ball into the gap for a double. But what if Hitter A doesn’t get on base as frequently as Hitter B, even though Hitter B is significantly slower?
A coach can ask an endless number of questions like this, and intuition alone doesn’t give a quantitative answer to any of them.
Quantitative Lineup Optimization
Instead of basing this important decision on instinct, we are excited to announce a collaboration between Driveline and Seqnzr, a new advanced analytics software that leverages simulation modeling to optimize your lineup.
Seqnzr is a Python-based baseball simulation that runs thousands of games (with models specific to the college or professional level), enabling you to test which sequence maximizes run production. Across initial tests, Seqnzr gives a 5-10% chance of scoring an extra run each game. Check out our informational web page at www.seqnzr.com for a high-level overview, drop us a note here, or click on “Simulation Log In” to register and test it out for free.
Below, we’ll review the methodology in more detail, explain how the model is tuned and backtested, give applications in lineup optimization and player valuation, and open the conversation for future enhancements.
First, here is a quick preview of the web application.
How does the simulation work?
The user feeds the model a player’s projected statistics for the upcoming game. Think of the input stats as a forecast. These could be from fall, pre-season, the previous season, or a combination of stats with adjustments. Seqnzr converts these to percent likelihoods of different outcomes.
Using a random number generator, it “rolls the dice”. If a hitter has 100 plate appearances with three home runs, the model assumes that that hitter will hit a home run 3% of the time. There are numerous different outcomes for each plate appearance, such as single, long single (runners advance two bases), fly ball, ground ball, etc.
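As a rough illustration of this "roll the dice" step, here is a minimal Python sketch. It is not Seqnzr's actual code: the function names, the stat line beyond the 3 HR in 100 PA from the text, and the choice to lump everything else into a generic "out" are all assumptions made for the example.

```python
import random

def outcome_probabilities(counts, plate_appearances):
    """Convert counting stats into per-plate-appearance probabilities."""
    probs = {name: n / plate_appearances for name, n in counts.items()}
    # Assumption: anything not listed is treated as a generic out.
    probs["out"] = 1.0 - sum(probs.values())
    return probs

def roll_plate_appearance(probs, rng):
    """Draw one outcome at random, weighted by its probability."""
    outcomes = list(probs)
    weights = [probs[o] for o in outcomes]
    return rng.choices(outcomes, weights=weights, k=1)[0]

# 3 HR in 100 PA comes from the text; the other counts are hypothetical.
hitter = outcome_probabilities(
    {"home_run": 3, "double": 5, "single": 18, "walk": 10}, 100
)

rng = random.Random(42)  # fixed seed so the sketch is repeatable
sample = [roll_plate_appearance(hitter, rng) for _ in range(10_000)]
hr_share = sample.count("home_run") / len(sample)
print(f"P(HR) = {hitter['home_run']:.3f}, simulated share = {hr_share:.3f}")
```

Over enough simulated plate appearances, the share of home runs should settle close to the 3% input rate.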
For fly ball vs. ground ball rates, the model uses the strong linear relationship between extra base hit (XBH) % and fly ball (FB) % based on 2019 MLB data (i.e. more XBH = more FB). Within those ground ball and fly ball outcomes, the percentage at which the runner advances, holds, or is doubled up is based on the speed factors of the players involved. The simulation of baseball plays isn’t limited to direct batter-pitcher interactions — there are scenarios built on top for stolen bases, pickoffs, and wild pitches/passed balls.
The code runs through a lineup plate appearance by plate appearance, inning by inning, game by game, modeling what happens at each juncture given which hitter is batting and which runners are on base. The code is about as lengthy and as complicated as you might expect, but it is thoroughly tested and provides a robust, accurate representation of a baseball game.
Simulating thousands of games allows us to model every possible situation we were trying to conceptualize intuitively earlier. Remember Hitter A? We can feed Hitter A’s speed and statistics into the program and observe projected runs scored with him in different positions in the lineup.
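To make the idea concrete, here is a heavily simplified Python sketch of comparing two batting orders by simulation. It is an illustration under stated assumptions, not the Seqnzr model: every runner advances exactly as many bases as the batter, there are no speed factors, steals, double plays, or errors, and the `make_hitter` rates for the "star" and "weak" hitters are invented for the example.

```python
import random

OUTCOMES = ("out", "walk", "single", "double", "triple", "home_run")
BASES_GAINED = {"single": 1, "double": 2, "triple": 3, "home_run": 4}

def make_hitter(walk, single, double, triple, home_run):
    """Build per-PA outcome rates; the remainder is a generic out."""
    rates = {"walk": walk, "single": single, "double": double,
             "triple": triple, "home_run": home_run}
    rates["out"] = 1.0 - sum(rates.values())
    return rates

def advance(bases, outcome):
    """Return (new_bases, runs); bases is (first, second, third) as bools."""
    first, second, third = bases
    if outcome == "walk":
        # Only forced runners move up on a walk.
        runs = 1 if (first and second and third) else 0
        return (True, second or first, third or (first and second)), runs
    n = BASES_GAINED[outcome]
    # Position 0 is the batter; assumption: everyone advances n bases.
    positions = [0] + [i + 1 for i, occupied in enumerate(bases) if occupied]
    moved = [p + n for p in positions]
    runs = sum(1 for p in moved if p >= 4)
    return tuple(b in moved for b in (1, 2, 3)), runs

def simulate_game(lineup, rng, innings=9):
    """Play one game; the batting order carries over between innings."""
    runs, batter = 0, 0
    for _ in range(innings):
        outs, bases = 0, (False, False, False)
        while outs < 3:
            rates = lineup[batter % len(lineup)]
            weights = [rates[o] for o in OUTCOMES]
            outcome = rng.choices(OUTCOMES, weights=weights)[0]
            if outcome == "out":
                outs += 1
            else:
                bases, scored = advance(bases, outcome)
                runs += scored
            batter += 1
    return runs

def average_runs(lineup, games=2000, seed=0):
    """Average runs per game over many simulated games."""
    rng = random.Random(seed)
    return sum(simulate_game(lineup, rng) for _ in range(games)) / games

# Hypothetical hitters: one star and eight weaker bats.
star = make_hitter(0.15, 0.18, 0.07, 0.01, 0.09)
weak = make_hitter(0.06, 0.14, 0.03, 0.00, 0.01)
star_first = [star] + [weak] * 8
star_last = [weak] * 8 + [star]
print(f"star leading off: {average_runs(star_first):.2f} runs/game")
print(f"star batting 9th: {average_runs(star_last):.2f} runs/game")
```

Even in this toy version, the only thing that changes between the two runs is the batting order, which is the comparison the full simulation makes with far more detailed play modeling.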
The model isn’t trying to tell us exactly how many runs a lineup will score in a given game. There are too many factors, such as opposing pitcher, weather, ballpark, etc. to predict this without extensive data.
The purpose of the model is to predict which lineup is expected to score more runs in the long term. For example, the model could output one lineup averaging 4.51 runs per game and another averaging 4.61 runs per game. The lineup with the higher expected value is the more productive sequence of hitters.
What’s the value of just 0.10 runs per game?
Think of this as a 10% chance of scoring one more run each game. Over the course of a 60 game season (assuming decent input stats early in the season), you’d score roughly six extra runs.
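The arithmetic behind that estimate, written out as a short Python snippet using the example averages from above (the variable names are just for illustration):

```python
# Example averages from the text: two lineups, 0.10 runs per game apart.
runs_per_game_a = 4.51
runs_per_game_b = 4.61
games = 60

edge_per_game = runs_per_game_b - runs_per_game_a
extra_runs = edge_per_game * games
print(f"{edge_per_game:.2f} runs/game x {games} games = {extra_runs:.1f} extra runs")
```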
There are few, if any, analytics products that can quantify that sort of benefit. Maybe one of those runs comes in a game you would’ve otherwise lost by one. As important as those intuitive conversations are, their real value comes in combination with analytics and data-driven decisions.
How will the benefit change from team to team?
The more variability in player ability, the more benefit in finding the optimal lineup.
If you have a lineup with Barry Bonds, Mark McGwire, and seven high school hitters, it definitely matters where Bonds and McGwire hit. If you have nine identical hitters, optimization obviously doesn’t have any benefit.
The quality of input statistics has an impact as well.
Stats aren’t going to be perfect early in the season, but we encourage coaches to use Seqnzr to test hypotheses and build a better analytical framework for lineup optimization, rather than just guessing. All that requires is feeding potential stats into the model, changing the lineup, and pressing “Run”.
For the model to be effective, it needs to be representative of a baseball game.
What’s the best way to test that? If we were to look at just one team and simulate a 60 game season, what value would we expect the simulation to output?
Suppose a team actually scored 4.6 runs per game over their 60 game season. If we plug their players’ stats from the end of the season into the model, there are a few issues with validation.
- The simulation models a 9 inning game. In how many games did the offense actually bat in all nine innings? To fix this, let’s look at Runs per Plate Appearance.
- In what percent of games did players play? How do we choose those nine starting players for the model?
- If this team played the whole season over again with the exact same conditions, how many runs would they score? This is a challenging concept. There’s inherent randomness in a season, as with our model. Over a small sample size of 60 games, that number is bound to change.
To backtest, we need a larger sample size. Instead of looking at just one team, we’ll aggregate stats across several conferences over an entire season, take the average player in each lineup spot, and compare actual to predicted runs per plate appearance over a large number of games in the simulation.
This removes a majority of the randomness and answers our initial question: How representative is the model of a baseball game?
Here are the aggregate hitter stats across the SEC, ACC, PAC-12, and Big Ten:
We’re going to feed the model these statistics — there’s no need to normalize for one hitter as the model converts these to percent likelihood of outcomes.
For example, 2,409 home runs in 106,461 plate appearances (AB + HBP + BB + Sacrifice ABs) will result in a ~2.3% chance of a home run in each plate appearance.
Across those conferences, the actual runs per plate appearance is 0.1551. Using these input stats and parameters, our model predicts a strikingly similar 0.1555, a relative error of about 0.3%. Put another way, the model’s accuracy on this particular test is about 99.7%.
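For readers who want to check the numbers, here is a short Python snippet that reproduces both calculations from the figures quoted above. The variable names are illustrative only.

```python
# Aggregate figures quoted in the text.
home_runs, plate_appearances = 2409, 106461
hr_rate = home_runs / plate_appearances

actual_r_per_pa = 0.1551      # observed runs per plate appearance
predicted_r_per_pa = 0.1555   # simulated runs per plate appearance
relative_error = abs(predicted_r_per_pa - actual_r_per_pa) / actual_r_per_pa

print(f"HR rate per PA: {hr_rate:.2%}")
print(f"Relative error: {relative_error:.2%}")
```

The home run rate comes out just under 2.3%, and the gap between simulated and actual runs per plate appearance is a bit under 0.3% of the actual value.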
What about outcomes not given by these inputs?
As mentioned earlier, the model provides parameters, which might be open for tuning by advanced users. Due to the complexity, we recommend keeping these as given. These parameters are pulled from existing data and adjusted to fit the level of play.
For example, there’s roughly the same rate of sacrifice flies in the simulation as across college baseball in 2019, and the rate of simulated double plays is fitted as well. More straightforward outcomes like wild pitch / passed ball (wp_pb_ratio) are also fit to league stats.
2019 College Baseball Season
At the MLB level, there are dedicated teams of analysts responsible for quantitatively answering these sorts of questions.
However, this advanced modeling capability just doesn’t exist at the college level yet, either due to the lack of continuity of statistical analysts, funding, or difficulty building a scalable platform.
Let’s take a look at Vanderbilt’s lineup in game three of the 2019 College World Series and see how it differs from what Seqnzr would have suggested.
To do this, we need to make several assumptions about which stats we’re feeding the model. Here are three of the main factors:
- How will each player perform given the pitching matchup? Not just the starting pitcher, but the bullpen later in the game.
- Should we factor in hot/cold streaks? There’s interesting, yet perhaps inconclusive, research based on a Markov chain model that supports this theory. This is a judgment call, but as a former pitcher, I’m convinced that players are streaky.
- Injury Influence: If a player is struggling with a wrist injury, for example, this probably indicates he won’t perform as well as he would have if he had been healthy (and is also an argument for the existence of streakiness).
First, we’ll run the simulations with their current season stats without any adjustments and check several reasonable lineup combinations.
Expected runs per 9 inning game, by lineup:
- Actual: 8.25
- Bad: 8.19
- Good: 8.31
Seqnzr tells us that by using Lineup 3 (the “Good” lineup above) instead of Vanderbilt’s chosen lineup, Vanderbilt would have increased its expected value by 0.06 runs, giving them an extra 6% chance (!!!) of scoring another run over a 9 inning game. So although Bleday isn’t a typical leadoff hitter, the model suggests he’s such an exceptional hitter that there’s value in trying to get him an extra plate appearance, even if it’s less likely that runners are on base. Take a look at where Kris Bryant and Ronald Acuna Jr. hit, for example.
Now let’s add an assumption for R/L matchups and assume Vanderbilt knew they had a good chance to face multiple right-handed pitchers.
Comparing same-handed and opposite-handed matchup data from the 2019 MLB season, hitters in the favorable (opposite-handed) matchup see roughly a 5% increase in HR, a 10% increase in 2B, a 15% increase in BB, and a 10% decrease in strikeouts, so we’ll apply a similar adjustment to their players’ stats.
Since Bleday has a favorable L/R matchup, we’ll increase his HR by 2.5% (half of 5%) from 26 to 26*(1.025) ~ 27. This is a very conservative matchup adjustment, and users are encouraged to test using their own data/findings.
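Here is a minimal Python sketch of that adjustment. The percentages are the ones quoted above; applying the same half-strength scaling to doubles, walks, and strikeouts is an assumption of this example, as are the function and dictionary names.

```python
# Approximate 2019 MLB matchup effects quoted in the text.
FULL_ADJUSTMENT = {"HR": 0.05, "2B": 0.10, "BB": 0.15, "SO": -0.10}

def adjust_for_matchup(stats, scale=0.5):
    """Scale counting stats for a favorable matchup.

    scale=0.5 applies half of the MLB effect, the conservative choice
    used for the HR example in the text (assumed here for all stats).
    """
    adjusted = dict(stats)
    for stat, change in FULL_ADJUSTMENT.items():
        if stat in adjusted:
            adjusted[stat] = stats[stat] * (1 + scale * change)
    return adjusted

bleday = {"HR": 26}  # only the HR total is given in the text
adjusted = adjust_for_matchup(bleday)
print(f"Adjusted HR: {adjusted['HR']:.2f} (rounds to {round(adjusted['HR'])})")
```

Passing a different `scale` is one way to test a more or less aggressive matchup assumption against your own data.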
Quick note: It wouldn’t make sense to compare results across simulations with different input stats. We expect different results from Adjusted Lineup 1 (below) and Non-Adjusted Lineup 1 (above).
Expected runs per 9 inning game, by lineup (adjusted stats):
- Actual: 8.3
- Bad: 8.26
- Good: 8.36
As expected, we find similar results in relative effectiveness. It’s important to recognize the combination of art and science that comes with advanced analytics tools such as this.
Sometimes the lineup chosen isn’t the exact optimal lineup Seqnzr recommends, but it’s close. Maybe the coach recognizes a hitter has been struggling recently, needs to see better pitches, and tries to hide that hitter in the seven spot.
But when it comes to Game 3 of the College World Series? Any advantage to score more runs should be taken.
As Seqnzr continues to evolve, there are a number of enhancements in the backlog. The capabilities that offer the most insight to relative lineup effectiveness are at the top of that list.
- Fatigue Factor: As a pitcher goes deeper into an inning, his effectiveness will likely decrease, whether from a dip in velocity or from pitching with runners on base. This gives more value to the long inning, which an optimally sequenced lineup is more likely to produce. Thus, the variation in results from a mediocre lineup to an optimal lineup increases. That 0.10 number from earlier could be closer to 0.15.
- Protection Factor: As the best hitter in the lineup moves into different positions, does this affect the quality of pitches and the success of the hitter in front of him? More importantly, does adding this feature change which lineup is optimal?
On top of enhancements to the simulation logic, there are many user interface enhancements to make using the web application easier. These could include:
- Automated web scraping to pull players’ most up-to-date stats
- Standard adjustments based on league average research for RHP/LHP matchups
Outside of the current simulation, there is also an extension to model sacrifice bunt success, which we’ve built out but haven’t included in the web application.
Instead of simply looking at a run expectancy matrix from MLB data, extrapolating to college baseball, and concluding it doesn’t pay off to bunt in just about every situation, we can use the simulation to model effectiveness.
We could open the simulation parameters to allow the user to specify the exact situation, the percentage likelihood of a successful bunt or fielding error, and then model how the inning proceeds.
Because not every hitter in a college lineup is as effective as an MLB hitter at avoiding a double play or advancing the runner, our initial analysis indicates that there’s slightly more value in weaker hitters bunting than traditional thought might tell us.
After several years of developing Seqnzr, I’m excited to officially launch it and announce the collaboration with Driveline Baseball. We believe Seqnzr will create an immediate impact on how data is leveraged in lineup creation.
Alex and the team at Driveline are scoping out a consulting service to offer their expertise and educate teams on how best to leverage the simulation.
If you’re interested in a demo, don’t hesitate to reach out to Brian McAfee, co-founder of Seqnzr (email@example.com), or Alex Caravan, Quantitative Analyst at Driveline (firstname.lastname@example.org).
Written by Alex Caravan and Brian McAfee