Privacy-Robust Incrementality Estimation in Cookieless Settings via Uplift Modeling: Reproducible Evidence from the Hillstrom E-Mail Experiment
DOI:
https://doi.org/10.51903/jtie.v5i1.468
Keywords:
Cookieless Measurement, Incrementality, Uplift Modeling, Heterogeneous Treatment Effects, Differential Privacy
Abstract
Measuring advertising incrementality without user-level identifiers is increasingly constrained by platform policies and privacy regulations. In cookieless environments, practitioners often observe only aggregated or weak signals (e.g., cohort-level conversion counts) and must still estimate the causal lift of an intervention while quantifying uncertainty. This paper studies cookieless incrementality evaluation through the lens of uplift and individual treatment effect (ITE) modeling under explicit privacy constraints. We run a full experimental evaluation on the MineThatData (Hillstrom) E-Mail Analytics Challenge dataset (64,000 customers in a randomized controlled experiment with three arms). We cast the task as a binary treatment problem (sending any e-mail campaign versus sending none) and compare six ITE estimators (S-, T-, X-, R-, and doubly robust learners, plus transformed-outcome regression) against cohort-only estimators that emulate cookieless measurement. The cohort estimator uses only aggregated counts and a Bayesian beta–binomial model to shrink noisy rates, and we evaluate its robustness under k-anonymity thresholds and Laplace-noised differentially private aggregates. On held-out test data, the best ID-level model (T-learner with logistic regression) achieves a Qini coefficient of 6.675 and improves the estimated policy conversion rate when targeting the top 20% of customers by predicted uplift. Cohort-only estimation retains a weaker and more variable signal; its point estimate is sensitive to privacy constraints, and its uncertainty intervals reach 0.892 empirical coverage against a nominal 95% level in cohort-level validation.
The results demonstrate that (i) causal lift is estimable without identifiers when randomized experimentation is available, (ii) doubly robust estimators provide strong performance and fast scoring, and (iii) privacy-preserving aggregation introduces an accuracy–privacy trade-off that can be quantified and monitored using bootstrap and Bayesian uncertainty.
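As a minimal illustration of the cohort-only estimator described in the abstract, the following Python sketch combines the three ingredients named there: suppression of cohorts below a k-anonymity threshold, Laplace noise on aggregated conversion counts, and beta–binomial shrinkage of the resulting rates. It is not the authors' implementation. The function names, the Beta(1, 1) prior, the threshold `k_min=50`, the choice to treat cohort sizes as public and add noise only to conversion counts, and the example counts are all assumptions made for illustration; none of the numbers come from the Hillstrom data.

```python
import random


def laplace_noise(scale, rng):
    """Draw Laplace(0, scale) noise.

    The difference of two i.i.d. exponential variables with mean `scale`
    follows a Laplace distribution with that scale.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(count, epsilon, rng):
    """Laplace mechanism for a counting query (sensitivity assumed to be 1)."""
    return count + laplace_noise(1.0 / epsilon, rng)


def shrunk_rate(conversions, n, alpha, beta):
    """Posterior mean of a conversion rate under a Beta(alpha, beta) prior."""
    return (conversions + alpha) / (n + alpha + beta)


def cohort_lift(cohorts, k_min=50, epsilon=None, alpha=1.0, beta=1.0, seed=0):
    """Estimate per-cohort lift from aggregated counts only.

    `cohorts` maps a cohort name to a tuple
    (n_treated, conversions_treated, n_control, conversions_control).
    Cohorts with fewer than `k_min` users in either arm are suppressed.
    If `epsilon` is given, conversion counts are noised before shrinkage;
    cohort sizes are assumed public in this sketch.
    """
    rng = random.Random(seed)
    lifts = {}
    for name, (n_t, y_t, n_c, y_c) in cohorts.items():
        if min(n_t, n_c) < k_min:
            continue  # k-anonymity style suppression of small cohorts
        if epsilon is not None:
            # Clamping to [0, n] is post-processing and keeps the DP guarantee.
            y_t = min(max(private_count(y_t, epsilon, rng), 0.0), n_t)
            y_c = min(max(private_count(y_c, epsilon, rng), 0.0), n_c)
        rate_t = shrunk_rate(y_t, n_t, alpha, beta)
        rate_c = shrunk_rate(y_c, n_c, alpha, beta)
        lifts[name] = rate_t - rate_c
    return lifts


if __name__ == "__main__":
    # Hypothetical cohort counts, invented for this example.
    example = {
        "recent_buyers": (1000, 150, 500, 50),
        "tiny_cohort": (30, 5, 20, 1),  # suppressed when k_min=50
    }
    print(cohort_lift(example))               # exact aggregates
    print(cohort_lift(example, epsilon=1.0))  # Laplace-noised aggregates
```

Comparing the two calls for a range of `epsilon` and `k_min` values gives a rough picture of the accuracy–privacy trade-off discussed above: smaller `epsilon` and larger `k_min` both protect more and estimate less.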
License
Copyright (c) 2026 Jingwen Bai, Haozhe Wang, Qiyou Wu, Boning Zhang

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

