Why Are Incremental Response Models Difficult to Develop?
Posted by Hongjie Wang on Tuesday, March 15th, 2011
As the Chief Scientist at Fulcrum working with clients from various industries, I have experienced, first-hand, the accelerated demand for more sophisticated and diversified analytical solutions from our clients. A perfect example is the popularity of incremental response modeling, something we began championing nearly ten years ago. These days, Fulcrum gets more requests and projects for incremental response modeling than for traditional response modeling. Companies are rightly adopting incremental metrics to drive their marketing operations.
At many of our banking clients, for example, instead of repeating the traditional approach of estimating attrition and value-at-risk, they are more likely to focus on identifying customers for whom marketing programs are more likely to mitigate the value-at-risk. This trend is encouraging. More marketers have recognized an important principle in modeling – if you want to use the model to optimize your marketing, your model has to address or include the decision variables that will be operationalized in the ensuing programs.
Fulcrum has devoted considerable effort to developing incremental modeling methodologies. As a result, I know first-hand that incremental models are much more difficult to develop than traditional response models. Many clients have tried incremental modeling internally and found the models extremely difficult to develop and validate. I will use this blog to provide some high-level insights on the challenges and suggest ways to overcome them.
To provide a context for our discussion, let’s consider the following campaign report.
| Recency segment (months) | Test (mailed sample) | Control (non-mailed sample) | Purchase rate (test) | Purchase rate (control) | Difference |
| --- | --- | --- | --- | --- | --- |
| 1 – 3 | 9,152 | 9,146 | 9.45% | 8.76% | 0.69% |
| 4 – 6 | 9,256 | 9,116 | 5.66% | 5.30% | 0.36% |
| 7 – 9 | 9,311 | 9,299 | 4.15% | 3.98% | 0.17% |
| 10 – 12 | 11,741 | 11,913 | 4.90% | 4.60% | 0.30% |
| 13 – 15 | 6,696 | 6,848 | 2.84% | 2.72% | 0.12% |
| 16 – 18 | 6,060 | 6,087 | 2.52% | 2.50% | 0.03% |
| 19 – 21 | 6,222 | 6,304 | 2.15% | 1.98% | 0.17% |
| 22 – 24 | 8,566 | 8,647 | 2.52% | 2.07% | 0.45% |
| 25 – 27 | 4,652 | 4,696 | 1.57% | 1.66% | -0.09% |
| 28 – 30 | 4,225 | 4,220 | 2.01% | 1.28% | 0.73% |
| 31 – 33 | 5,269 | 5,155 | 1.56% | 1.34% | 0.22% |
| 34 – 36 | 7,092 | 6,916 | 1.80% | 1.71% | 0.10% |
In this case, a specialty retailer divided its customer base randomly into test and control groups. The test group was mailed. The response/purchase data were collected and summarized by pre-campaign customer recency segments.
- Unlike the traditional response, value or attrition metric, incremental metrics are not directly observable at the customer level for a particular campaign. If you analyze a catalog mailing campaign, you know who the responders are to your campaign. But in our case, we do not know who the incremental responders are. Take the recency 1-3 month segment as an example. We know the incremental response rate is 0.69%. Therefore, if you mailed 1,000 customers, you would have gained net 7 incremental purchasers. But who are these 7 purchasers? Can we identify them within the larger set of responders? The answer is no. This is a big deal. It means that we will not have a clearly defined dependent variable that we can use for profiling analysis and regression modeling. [To be strictly correct, it is actually possible to derive or estimate individual-level incremental metrics. For example, in our work for grocery retailers, we often use promotional and price sensitivity as part of the basis for segmentation. But such measurements require data from multiple campaigns where a particular customer was in some campaigns and not others. In other words, a customer serves as her own control and by using some longitudinal panel modeling techniques, we can derive individual level promotional elasticity. But this is not what we are dealing with here.]
- The overall effect of mailing is usually very small. You surely noticed that none of the cells in our report has an incremental response rate that is higher than 1%. While we masked the data, we did not distort its pattern and magnitude. What the report shows actually represents the norm we have experienced over many client projects. I remember two years ago, I attended a presentation at NCDM. One of the vendors and its client were co-presenting incremental modeling. The room was absolutely packed. The overall spread between the mailed and control was over 10%! I was baffled with disbelief. First of all, it is very rare to see such overall difference. I have never seen such overall lift in all the incrementality-related projects, and we have worked on many such projects from a wide array of industries. Secondly, if one has such a nice spread, then, presumably the baseline (purchase rate from control) would be very low. Therefore, the incremental response rate essentially becomes the traditional response rate. And finally, if you have 10% overall difference, why bother building another model, unless your offer is extremely costly or your margin is very low?
- The challenge that a very low overall incremental response imposes is obvious. If your overall incremental lift is 0.5%, and you want to use a model to identify a subset of the customer base, say 20% of them, with a 2% incremental lift, this usually implies that the bottom deciles will have a fairly large negative incremental lift, which is difficult to justify. In essence, this would mean that marketing actually causes some people not to buy. Furthermore, building a response model based on campaign data with a 0.5% response rate is no trivial matter, let alone doing this in the context of incremental modeling. Talk about finding a needle in a haystack! Ever wonder why you cannot validate your incremental model? Maybe it is because the model is spurious. Whenever the measurement of interest is smaller than the acceptable error rate in such areas as database householding algorithms, one should pause.
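The arithmetic behind the last point can be sketched in a few lines of Python. The function name `remaining_lift` is hypothetical and exists only for illustration; it computes the average incremental lift left over for everyone outside the targeted group, given the overall lift and the lift claimed for the targeted share.

```python
def remaining_lift(overall_lift, top_share, top_lift):
    """Average incremental lift among customers outside the targeted group.

    overall_lift: incremental lift across the whole base (as a fraction)
    top_share:    share of the base the model selects (between 0 and 1)
    top_lift:     incremental lift claimed for that selected share
    """
    if not 0.0 < top_share < 1.0:
        raise ValueError("top_share must be strictly between 0 and 1")
    # The overall lift is the share-weighted average of both groups,
    # so solve that identity for the lift of the non-selected group.
    return (overall_lift - top_share * top_lift) / (1.0 - top_share)


# Figures from the text: 0.5% overall, top 20% at a 2% lift.
rest_20 = remaining_lift(0.005, 0.20, 0.02)
# A slightly more ambitious claim: top 30% at a 2% lift.
rest_30 = remaining_lift(0.005, 0.30, 0.02)
```

With the figures in the text, the remaining 80% of customers are left with an average lift of 0.125%, so any further ranking within that remainder pushes the lowest deciles toward or below zero. Claiming the same 2% lift for the top 30% already forces the remainder to be negative on average.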
The greatest difficulty in building an incremental response model lies in the lack of observed heterogeneity. In some sense, statistical modeling is about understanding the variation among customers, systematically decomposing that variation, and attributing it to other tractable constructs. The resulting model is then used for formulating various inferences and predictions. Yet we know that often we cannot observe such heterogeneity, or at least not the variation that is truly associated with behavior. This is a concern in all marketing analysis, but a particular challenge when it comes to incrementality.
With this in mind, let’s use the data from the report to show the key difference between a response model and an incremental response model. We assume for the time being that the only variable we have is recency. Keep in mind that recency is one of the most important variables in most predictive models in marketing. Response modeling first. In this case, the variation of interest is the differing propensity among customers to respond to a marketing offer. There are at least three sources for the variation:
- Customers in the same recency segment might be different in other ways. This is called within-segment variation.
- Customers from different segments have different propensities. This is called between-segment variation.
- Noise in the data, such as measurement errors or sample imbalance may be creating the appearance of variation.
The first source, within-segment variation, is of no use to us, since we cannot explain it using other observable variables. The last source, that of some error in the data, should be controlled to ensure the validity of the models. If we have reason to suspect such flaws, it is best to find a more reliable data set.
The second source, between-segment variation, is of primary interest, since we can then use the systematic relationship between recency and the variation of response propensity to target customers with the highest propensities. Building the model is nothing but translating recency (or other variables) into a more sophisticated functional form. We can use a simple graph to show the various sources of variation.
For each recency segment, we plot the response rate and its confidence interval (the vertical bars). These confidence intervals are a measurement of the variation within each of the segments. We notice that, clearly, the biggest variations are between recency segments, rather than within segments. We further verified our intuition by developing a random effects model that systematically decomposed these variations into different sources. The model showed that 94% of the total variation in the data is due to between-segment differences. The relationship between recency and response rates is simple and tractable. That is why a response model is very easy to build. In this case, including recency without any transformation would result in a model that rank-orders customers almost perfectly.
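For readers who want to reproduce this kind of check, here is a minimal Python sketch using the test-group figures from the report. It is not the random effects model described above: as a stand-in, it assumes simple Wald intervals for each segment and uses the I-squared statistic from meta-analysis as a rough measure of the share of variation that lies between segments, so it should not be expected to reproduce the 94% figure. The function names are hypothetical.

```python
import math

# Test-group sizes and purchase rates copied from the campaign report
# (rates converted from percentages to fractions).
TEST_N = [9152, 9256, 9311, 11741, 6696, 6060,
          6222, 8566, 4652, 4225, 5269, 7092]
TEST_RATE = [0.0945, 0.0566, 0.0415, 0.0490, 0.0284, 0.0252,
             0.0215, 0.0252, 0.0157, 0.0201, 0.0156, 0.0180]


def wald_interval(rate, n, z=1.96):
    """Approximate 95% confidence interval for a proportion."""
    se = math.sqrt(rate * (1.0 - rate) / n)
    return rate - z * se, rate + z * se


def between_segment_share(rates, sizes):
    """I-squared: share of variation across segments beyond sampling error."""
    variances = [p * (1.0 - p) / n for p, n in zip(rates, sizes)]
    weights = [1.0 / v for v in variances]
    pooled = sum(w * p for w, p in zip(weights, rates)) / sum(weights)
    # Cochran's Q: weighted squared deviations from the pooled rate.
    q = sum(w * (p - pooled) ** 2 for w, p in zip(weights, rates))
    df = len(rates) - 1
    return max(0.0, (q - df) / q) if q > 0 else 0.0


intervals = [wald_interval(p, n) for p, n in zip(TEST_RATE, TEST_N)]
share = between_segment_share(TEST_RATE, TEST_N)

for (low, high), rate in zip(intervals, TEST_RATE):
    print(f"rate={rate:.4f}  interval=({low:.4f}, {high:.4f})")
print(f"between-segment share (I-squared): {share:.3f}")
```

The point of the sketch is the contrast in scale: each segment's interval is narrow compared with the gaps between the most recent and the least recent segments.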
Now, let’s use the same data set for incremental response modeling. In this case, I use the difference between test and control groups as the measurement of interest. We could equally use log odds or relative risks; from the standpoint of statistical inference, they are largely equivalent. The distribution of the incremental response is assumed to be normal. This is justified given the sample sizes we have in our data.
Immediately, we notice some sharp contrasts. First of all, the confidence intervals at each recency segment are very large relative to the effect, because the difference between test and control is very small. Again, I want to remind you that such variations are not useful for us, since we cannot explain them. Furthermore, and this is the core of the problem, there is little, if any, variation between segments. Fitting the same random effects meta-study model, we concluded that the variation was dominated by within-segment heterogeneity, which we cannot take advantage of because it is unaccounted for. And the only source of variation that could be useful for prediction purposes, namely the cross-segment difference, is not statistically significant.
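The same kind of check can be sketched for the incremental metric. The Python below assumes a normal approximation for the difference of two proportions, as stated above, and uses Cochran's Q as a simple heterogeneity test across segments; this is a common meta-analysis device and only an approximation of the random effects model described in the article. Lifts are recomputed from the rounded purchase rates in the report, and the function names are hypothetical.

```python
import math

# Sample sizes and purchase rates from the campaign report (rates as fractions).
TEST_N = [9152, 9256, 9311, 11741, 6696, 6060,
          6222, 8566, 4652, 4225, 5269, 7092]
CTRL_N = [9146, 9116, 9299, 11913, 6848, 6087,
          6304, 8647, 4696, 4220, 5155, 6916]
TEST_RATE = [0.0945, 0.0566, 0.0415, 0.0490, 0.0284, 0.0252,
             0.0215, 0.0252, 0.0157, 0.0201, 0.0156, 0.0180]
CTRL_RATE = [0.0876, 0.0530, 0.0398, 0.0460, 0.0272, 0.0250,
             0.0198, 0.0207, 0.0166, 0.0128, 0.0134, 0.0171]


def lift_with_se(p_test, n_test, p_ctrl, n_ctrl):
    """Incremental lift (test minus control) and its standard error."""
    lift = p_test - p_ctrl
    se = math.sqrt(p_test * (1 - p_test) / n_test
                   + p_ctrl * (1 - p_ctrl) / n_ctrl)
    return lift, se


def cochran_q(effects, std_errors):
    """Inverse-variance pooled effect, Cochran's Q, and degrees of freedom."""
    weights = [1.0 / se ** 2 for se in std_errors]
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, effects))
    return pooled, q, len(effects) - 1


results = [lift_with_se(pt, nt, pc, nc)
           for pt, nt, pc, nc in zip(TEST_RATE, TEST_N, CTRL_RATE, CTRL_N)]
lifts = [lift for lift, _ in results]
std_errors = [se for _, se in results]
pooled, q, df = cochran_q(lifts, std_errors)

# Count segments whose approximate 95% interval includes zero.
covers_zero = sum(1 for lift, se in results if abs(lift) <= 1.96 * se)
print(f"pooled lift={pooled:.4%}  Q={q:.2f} on {df} df  "
      f"segments with interval covering zero: {covers_zero}")
```

Under the null hypothesis that every segment shares the same lift, Q follows approximately a chi-square distribution with 11 degrees of freedom, whose 5% critical value is about 19.68. A Q well below that value is consistent with the conclusion that the cross-segment difference is not statistically significant.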
Imagine that you are building a model to identify high-risk credit card customers, but discover that the difference between high-risk and low-risk customers is much smaller than the differences within each of these groups. Whatever you come up with is likely to be dominated by data noise or random chance. When the noise-to-effect ratio is high, the resulting model is very likely to be unstable.
Admittedly, our example assumes that we only have recency as a variable; generally, we would have other variables at our disposal to contribute to population heterogeneity. On the other hand, recency is one of the most productive variables in database marketing, and the “recency effect” is often so strong that it dominates many other variables. It could be that, even with other variables introduced, recency might explain a sufficient amount of the observed variation, and we might still end up with similar outcomes.
So, what can be done? What are the remedies? First of all, I definitely recommend performing some analysis similar to what we have shown here before embarking on an incremental response modeling project. In addition to recency, you can use existing response model scores to understand whether there is enough tractable and stable heterogeneity in the data that is related to incrementality. There will be occasions when common sense should lead you to conclude that there is not enough overall incremental lift, or no significant variation in incremental lift among customers, and that building a model is not advised. Rather, the focus should be on improving the offer and creative to make the direct marketing viable. Once the viability issue is addressed, we can then focus on efficiency by developing a model.
From a methodological standpoint, one possibility is to consider using data from multiple campaigns to derive a “non-campaign-specific” promotional elasticity measure. This can be done using panel modeling (random effects or Bayesian approaches, among other possibilities). Such elasticity can be derived at the customer level. The drawback of this approach is that it may not optimize any particular campaign and is more suitable for a holistic, long-term communication plan. But I would argue that developing an incremental response model specifically for a holiday mailing, using only prior holiday campaign data, does not necessarily inspire much confidence either.
Finally, consider more advanced modeling techniques. The standard approach for an incremental response model is to build two equations, one each for the test and control groups, or to build one equation that factors in some interactions. When the effect of mailing is relatively small compared with the other noise in the data, it is more desirable to impose more structure.
For example, you could use empirical Bayes-based shrinkage estimates. You could make some assumptions and accept a tradeoff between introducing bias and using the data more efficiently. This is analogous to the sample size requirement for direct marketing tests: if you have a limited sample size, you can increase the power of the test by adopting a more restricted set of hypotheses. Instead of using a traditional logit model (one equation with interaction, or two separate models for test and control) where you are not making any a priori assumption about which group has a higher response rate, you could impose a functional form in your regression so that the test is always better than control. In other words, you impose a non-negative effect constraint.
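As a rough illustration of the shrinkage idea, the sketch below applies a simple empirical Bayes estimator to the segment-level lifts from the report. The specific choices are assumptions made for illustration, not the method used in our projects: the between-segment variance is estimated with the DerSimonian-Laird method of moments, each segment's lift is treated as approximately normal, and the function names are hypothetical.

```python
# Sample sizes and purchase rates from the campaign report (rates as fractions).
TEST_N = [9152, 9256, 9311, 11741, 6696, 6060,
          6222, 8566, 4652, 4225, 5269, 7092]
CTRL_N = [9146, 9116, 9299, 11913, 6848, 6087,
          6304, 8647, 4696, 4220, 5155, 6916]
TEST_RATE = [0.0945, 0.0566, 0.0415, 0.0490, 0.0284, 0.0252,
             0.0215, 0.0252, 0.0157, 0.0201, 0.0156, 0.0180]
CTRL_RATE = [0.0876, 0.0530, 0.0398, 0.0460, 0.0272, 0.0250,
             0.0198, 0.0207, 0.0166, 0.0128, 0.0134, 0.0171]


def segment_effects():
    """Observed lift (test minus control) and its sampling variance per segment."""
    effects, variances = [], []
    for pt, nt, pc, nc in zip(TEST_RATE, TEST_N, CTRL_RATE, CTRL_N):
        effects.append(pt - pc)
        variances.append(pt * (1 - pt) / nt + pc * (1 - pc) / nc)
    return effects, variances


def empirical_bayes_shrinkage(effects, variances):
    """Shrink each segment's lift toward the pooled lift.

    Returns the pooled lift, the estimated between-segment variance (tau2),
    and the list of shrunken segment lifts.
    """
    weights = [1.0 / v for v in variances]
    total_w = sum(weights)
    pooled = sum(w * e for w, e in zip(weights, effects)) / total_w
    # DerSimonian-Laird moment estimate of the between-segment variance.
    q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, effects))
    c = total_w - sum(w * w for w in weights) / total_w
    tau2 = max(0.0, (q - (len(effects) - 1)) / c)
    # Shrinkage factor tau2 / (tau2 + v): noisy segments move furthest.
    shrunk = [pooled + tau2 / (tau2 + v) * (e - pooled)
              for e, v in zip(effects, variances)]
    return pooled, tau2, shrunk


effects, variances = segment_effects()
pooled, tau2, shrunk = empirical_bayes_shrinkage(effects, variances)
for observed, estimate in zip(effects, shrunk):
    print(f"observed={observed:+.4%}  shrunken={estimate:+.4%}")
```

The design choice here is the tradeoff described above: each shrunken estimate is biased toward the pooled lift, but it borrows strength from all segments. When the estimated between-segment variance is at or near zero, every segment is pulled essentially all the way to the pooled value, which is the estimator's way of saying the data do not support segment-specific lifts.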
Similarly, you can use latent class regression to introduce so-called “structural heterogeneity” constraints, where you assume that there are two latent segments, A and B. For segment A, the incremental effect of mailing is zero; for segment B, it is strictly positive. Such latent segmentation will take out a lot of the noise in the data and allow you to focus on segment B and come up with a further prioritization in terms of the relative degree of incrementality.
There is no simple answer to which method is more likely to yield useful results, so one is advised to be creative in exploring potential methodologies. At the same time, it is important to recognize that limitations in the data may mean that it is not possible to identify segments where there is incrementality. And finally, we may find that the segment that really does exhibit incremental behavior is so small that it is not useful for the business; even if we market to this group, the absolute sales generated may be too inconsequential to be worth the effort.
If, on the other hand, we do find a strong population exhibiting incremental behavior with a model that is stable and reliable, then we have a tool that can drive considerable value for the organization. In many cases, it provides a more profitable method of identifying customers for promotion than traditional response models, which tend to over-reward those already likely to respond.


What is a conference interval
As to incremental responders, could you not – if you had one more variable apart from recency – estimate an individual response probability (from a previous campaign) and then the incrementals are those that deviate from a low probability?
Dirk
Dirk, thank you for the question (and the proofreading).
The framework you suggest is, in fact, one of the approaches that we employ in solving such problems. What the article tried to convey is that there are technical difficulties that arise, if we do not “customize” this framework in a rather creative way.
Here is the essence of the problem. I randomly pick two customers A_c and B_c from the test and control group. Suppose that, for some magical reason, I found two exactly identical counterparts, A_t and B_t. There are three types of differences of interest:
1. The difference between A_c and B_c (and therefore A_t and B_t): This is your traditional response model, or (in general) the customers’ preference variations under control or test conditions.
2. The difference between A_c and A_t (and therefore B_c and B_t): This is your marketing effect.
3. The difference between (A_t – A_c) and (B_t – B_c): This is the marketing effect variation among customers.
Number 3 is what we need to model. Usually, Number 1 is huge and Number 2 is tiny. It is this disproportionate ratio of effect to “noise” that makes the incremental model difficult to develop. If Number 2 is big enough, then Number 3 is relatively easy to derive and estimate. But in almost all cases, Number 2 is small.
It is like asking for which types of children a government education policy worked better, when there is no evidence the policy worked overall. Ironically, one of the reasons one cannot ascertain the overall effectiveness of the policy is that children are very different to begin with (i.e., Number 1 is big).
So overall, your suggestion works, but it is important to control for the considerable amount of noise in the data.
Thanks again. Hope this helps clarify things.
Hello,
I just came across this article. Thank You very much for such practical pointers.
I have 2 questions:
1. Did You mean to say two customer A_t and B_t from Test population and their counterparts A_c and B_c from control?
2. I was trying to minimize the “noise” by segmenting customers by their buying patterns. It didn’t work well. I was thinking about applying “propensity score matching” methodology to try to pick counterpart one by one. Do You think it is going to work, have You tried anything like that? Propensity score (from what I read – never did it myself though) should reduce all variables potentially affecting outcome to one single score. Splitting by that score should produce segments that have “same” distributions of underlying variables.
Thank You
On point 1: You’re right. As written, it says “test and control”, but the next sentences indicate that it should be “control.”
On point 2: If I understand your question, the approach you suggest is to use propensity score matching to find comparable groups. It’s an interesting approach, and while Rosenbaum and Rubin’s original paper (Biometrika, 1983) was geared more toward medical treatment, many of the techniques in direct marketing grew out of biological and medical approaches. There are a number of advanced techniques that could be employed.
Another way of understanding the problem of incremental response models is to say the following… First, in a traditional response model, we’re interested in the difference between treated and control groups, and between responders and non-responders among the groups. In gross terms, we want to see a higher response rate among the treated group than among non-treated. Assuming we have a well-randomized and relatively large population for each, we can ascribe the difference to the effect of the mailing.
With an incremental approach, our goal is to select only customers that require the stimulus of treatment in order to respond. In fact, if we were successful, many customers whom we might otherwise treat using a traditional response model might now go untreated, because we expect them to respond even absent a communication. So among all likely responders, we need to be able to distinguish the “incremental” and “non-incremental” customers. And that is really difficult to do.
BTW, you mentioned that you tried segmenting customers by “buying patterns.” The challenge there is that purchasing is a stochastic process, which by itself can be very challenging to model, because the data are always right-censored. For example, we employ a stochastic modeling approach that tries to estimate the likelihood that a customer is still a customer, because attrition is silent in retail (we call this a customer equity model). We developed it as a way around some of the problems with defining buying patterns (e.g. large purchase – infrequent, small purchase – frequent, etc.) because such definitions wind up not capturing the significant degree of heterogeneity in the underlying transactional behavior. BTW, “recency” is definitely part of the stochastic process, and as Hongjie shows in his analysis, recency does not explain variation well in an incremental modeling problem, although it might with other co-variates.
Thanks for the comment.
David King