Uncertainty-Aware Late Fusion for 3D Perception (Confidence Calibration + Fusion Rule Learning)

Authors

Qi Xin, Management Information Systems, University of Pittsburgh, PA, USA

DOI:

https://doi.org/10.51903/jtie.v4i1.485

Keywords:

3D perception, late fusion, confidence calibration, uncertainty estimation, LiDAR-camera fusion

Abstract

Late fusion remains attractive for multi-sensor 3D perception because it preserves independent sensor pipelines, enables modular upgrades, and supports rigorous ablation experiments. This paper presents an uncertainty-aware late-fusion framework that combines per-modality confidence calibration with fusion rule learning. We evaluate the framework on a PandaSet-style LiDAR+camera subset comprising 10 multi-frame sequences and 2,200 synchronized frames, with 49,549 annotated 3D objects across the Car, Pedestrian, and Cyclist classes. The framework calibrates LiDAR and camera confidence using temperature scaling and isotonic regression, estimates uncertainty-conditioned localization variance, and fuses associated candidates using several fixed rules (max, mean, product/odds, and Dempster-Shafer) as well as a learned rule (logistic regression trained on association features). On the test split, isotonic calibration reduces LiDAR Expected Calibration Error (ECE) from 0.260 to 0.006 and Negative Log-Likelihood (NLL) from 0.410 to 0.110, and it similarly improves camera confidence quality. Although mean Average Precision (mAP) remains similar to a LiDAR-only baseline in this controlled setting, calibrated late fusion provides substantially better decision reliability at fixed confidence thresholds and maintains conservative high-precision behavior under camera dropout. These results support an engineering conclusion: confidence calibration is the highest-leverage upgrade for late fusion in safety-critical stacks, and fusion rule choice can be tuned to downstream risk preferences.
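The abstract names the calibration metric and the fixed fusion rules but does not give their formulas. The following is a minimal Python sketch of how they are commonly defined, not the paper's implementation. The function names, the equal-width binning for ECE, the `prior` term in the odds rule, and the `reliability` discount in the Dempster-Shafer rule are all assumptions introduced here for illustration.

```python
import math

def expected_calibration_error(confidences, labels, n_bins=10):
    """Equal-width-bin ECE: weighted mean of |accuracy - mean confidence| per bin."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(confidences, labels):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls in the last bin
        bins[idx].append((p, y))
    total = len(confidences)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_conf = sum(p for p, _ in members) / len(members)
        accuracy = sum(y for _, y in members) / len(members)
        ece += (len(members) / total) * abs(accuracy - mean_conf)
    return ece

def fuse_max(p_lidar, p_cam):
    return max(p_lidar, p_cam)

def fuse_mean(p_lidar, p_cam):
    return 0.5 * (p_lidar + p_cam)

def fuse_odds(p_lidar, p_cam, prior=0.5, eps=1e-6):
    """Product-of-odds rule; assumes the sensors are conditionally independent."""
    def logit(p):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        return math.log(p / (1.0 - p))
    z = logit(p_lidar) + logit(p_cam) - logit(prior)
    return 1.0 / (1.0 + math.exp(-z))

def fuse_dempster_shafer(p_lidar, p_cam, reliability=0.9):
    """Dempster's rule on the frame {object, background}.

    Each sensor assigns mass r*p to 'object', r*(1-p) to 'background',
    and the remaining 1-r to the whole frame (ignorance). The discount r
    is a hypothetical parameter, not taken from the paper.
    """
    def masses(p):
        return reliability * p, reliability * (1.0 - p), 1.0 - reliability
    o1, b1, t1 = masses(p_lidar)
    o2, b2, t2 = masses(p_cam)
    conflict = o1 * b2 + b1 * o2
    if conflict >= 1.0:
        raise ValueError("total conflict: Dempster's rule is undefined")
    # Belief in 'object' after normalizing out the conflicting mass.
    return (o1 * o2 + o1 * t2 + t1 * o2) / (1.0 - conflict)

if __name__ == "__main__":
    p_l, p_c = 0.8, 0.7  # calibrated confidences for one associated pair
    for rule in (fuse_max, fuse_mean, fuse_odds, fuse_dempster_shafer):
        print(rule.__name__, round(rule(p_l, p_c), 4))
```

Under these assumptions the rules rank by aggressiveness: mean is the most conservative when sensors disagree, max trusts the more confident sensor, and the odds and Dempster-Shafer rules reinforce agreement. That ordering is one way to read the abstract's remark that rule choice can follow downstream risk preferences. All of these rules treat their inputs as probabilities, which is why calibration has to come first.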

