The survey measures I trust the most relate to market share. However, sometimes you need to know what percent of consumers engaged in a particular behavior or had a certain need over a given period of time. This is a “cumulative penetration” measure, and consumers are bad at recalling it because of telescoping and imperfect memory. So how can you estimate penetration analytically, as an alternative approach that can also be used to logic-check survey answers? I am going to give you two math-based hacks, the first in this blog.

## “Independent event” probability estimation

If you know my probability of doing something on a given occasion, you can estimate the probability I will do it at least once over n occasions. Let’s say I have a 20% probability of buying a given brand each time I buy the category; it’s in my consideration set but not my favorite. Furthermore, let’s say I buy the category 6 times per year. The expected probability that I buy the brand at least once is 1 - (0.8)^6, or about 74%. This is how the binomial formula works: it treats each purchase as an independent trial, so the chance of never buying the brand is 0.8 multiplied by itself 6 times.
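To make the arithmetic concrete, here is a minimal sketch of the at-least-once calculation. The function name `prob_at_least_once` is my own, for illustration only, and it assumes every trial is independent with the same probability.

```python
def prob_at_least_once(p: float, n: int) -> float:
    """Chance of at least one success in n independent trials,
    each with success probability p (one minus the chance of zero successes)."""
    return 1.0 - (1.0 - p) ** n


# The example from the text: 20% brand probability, 6 category purchases per year.
print(round(prob_at_least_once(0.20, 6), 3))  # 0.738
```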

If you want to know the incidence of ALL consumers buying a brand at least once, you need to know how purchase probabilities are distributed: what percent of consumers have each given probability of buying that brand on a category purchase. Luckily, that is quite easy to estimate.

A beta distribution depicts the percent of category buyers who have a particular probability of choosing your brand given a category purchase. The two parameters are alpha and beta. Alpha divided by the sum of alpha + beta is the market share. The sum of alpha + beta is a shape parameter that reflects loyalty. If you have an estimate of the brand’s Markov repeat rate, you can directly solve for the two parameters. You can get this from numerous data sources, but from a survey, use constant sum questions to simulate a repeat rate. Expect alpha + beta to be in the 1-2 range.

With one equation for share and one equation for repeat rate, you have two equations and two unknowns. Solving them gives you the parameters and therefore the distribution (easily operationalized with a built-in function in Excel).
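Here is a hedged sketch of that two-equations, two-unknowns step. It rests on an assumption of mine about what “repeat rate” means: the probability that someone who bought the brand on one category purchase buys it again on the next. Under a beta distribution that works out to (alpha + 1) / (alpha + beta + 1). If your repeat rate is defined differently, the second equation changes. The function name is hypothetical.

```python
def beta_params_from_share_and_repeat(share: float, repeat_rate: float):
    """Solve for (alpha, beta) given:
         share       = alpha / (alpha + beta)
         repeat_rate = (alpha + 1) / (alpha + beta + 1)   # assumed definition
    Writing s = alpha + beta, the second equation rearranges to
         s = (1 - repeat_rate) / (repeat_rate - share).
    """
    if not 0.0 < share < repeat_rate < 1.0:
        # A repeat rate at or below share leaves no positive solution for s.
        raise ValueError("need 0 < share < repeat_rate < 1")
    s = (1.0 - repeat_rate) / (repeat_rate - share)
    return share * s, (1.0 - share) * s


# Illustrative inputs only: 20% share, 60% repeat rate.
alpha, beta = beta_params_from_share_and_repeat(0.20, 0.60)
print(round(alpha, 3), round(beta, 3), round(alpha + beta, 3))
```

With these made-up inputs, alpha + beta lands at 1, the low end of the 1-2 range mentioned above.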

If you also know the average category purchase cycle, and therefore roughly how many category purchases a buyer makes in the period, you can simulate cumulative penetration very closely.
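As a rough sketch of that simulation, the snippet below averages the at-least-once probability over a beta distribution of brand probabilities. It makes a simplifying assumption that is mine, not part of the method as described: every buyer makes exactly the same number of category purchases, `n`, in the period. For a whole number of purchases, the average chance of never buying the brand has a closed form, the product of (beta + k) / (alpha + beta + k) for k from 0 to n - 1. The function name and the alpha and beta values are illustrative.

```python
def beta_penetration(alpha: float, beta: float, n: int) -> float:
    """Share of category buyers who buy the brand at least once in n category
    purchases, when each buyer's brand probability p is drawn from
    Beta(alpha, beta). Assumes every buyer makes exactly n purchases."""
    never_buy = 1.0  # running value of E[(1 - p)^n]
    for k in range(n):
        never_buy *= (beta + k) / (alpha + beta + k)
    return 1.0 - never_buy


# Illustrative parameters: 20% share with alpha + beta = 1, 6 purchases per year.
print(round(beta_penetration(0.2, 0.8, 6), 3))  # 0.408
```

Notice how far this is from the 74% you would get if every consumer had the same 20% probability. Heterogeneity concentrates purchases among loyal buyers, so fewer people buy the brand at least once.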

There is a related probability distribution called the NBD-Dirichlet (the Dirichlet can be thought of as a multivariate version of a beta; NBD is the negative binomial distribution). Putting together NBD and Dirichlet gives a histogram of the number of purchases consumers make of different brands, given Dirichlet heterogeneity. That will give you the estimated penetration for all brands in the category. One cautionary note is that the Dirichlet model assumes there is no market structure. I don’t prefer it for that reason, as I always find market structure, where some brands compete more closely with each other than they do with brands outside that competitive subset.

You can estimate a beta distribution within need states as well. Suppose you want to know what percent drink Coca-Cola for breakfast over 6 months. Or what percent drink Coca-Cola when they are driving around and stop in the convenience store while fueling up. Or what percent buy carbonated drinks at a 7-11 style convenience store vs. an enriched water vs. fruit juice. Or what percent watch a streaming service after midnight during the week (vs. no TV, or linear, or DVDs). All of this can now be estimated mathematically by using the beta distribution along with a few simple survey answers that are easier for a respondent to recall.

In this way, researchers can more accurately spot opportunities for brand growth by need state.

In the next blog in the series, I will show you a different cool way to estimate penetration that does not even require knowing the market share of a brand in a given need state situation. This other approach is based on Markov matrices.