Uncertainty-Aware Late Fusion for 3D Perception (Confidence Calibration + Fusion Rule Learning)
DOI:
https://doi.org/10.51903/jtie.v4i1.485
Keywords:
3D perception, late fusion, confidence calibration, uncertainty estimation, LiDAR-camera fusion
Abstract
Late fusion remains attractive for multi-sensor 3D perception because it preserves independent sensor pipelines, enables modular upgrades, and supports rigorous ablation experiments. This paper presents an uncertainty-aware late-fusion framework that combines per-modality confidence calibration with a learned fusion rule. We evaluate it on a PandaSet-style LiDAR+camera subset comprising 10 multi-frame sequences and 2,200 synchronized frames, with 49,549 annotated 3D objects across the Car, Pedestrian, and Cyclist classes. The framework calibrates LiDAR and camera confidence using temperature scaling and isotonic regression, estimates uncertainty-conditioned localization variance, and fuses associated candidates using several fixed rules (max, mean, product/odds, and Dempster-Shafer) as well as a learned rule (logistic regression trained on association features). On the test split, isotonic calibration reduces LiDAR Expected Calibration Error (ECE) from 0.260 to 0.006 and Negative Log-Likelihood (NLL) from 0.410 to 0.110, and it similarly improves camera confidence quality. Although mean Average Precision (mAP) remains similar to a LiDAR-only baseline in this controlled setting, calibrated late fusion provides substantially better decision reliability at fixed confidence thresholds and maintains conservative high-precision behavior under camera dropout. These results support an engineering conclusion: confidence calibration is the highest-leverage upgrade for late fusion in safety-critical stacks, and fusion rule choice can be tuned to downstream risk preferences.
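The calibration and fusion steps named in the abstract can be illustrated with a minimal NumPy sketch. It is not the paper's implementation: the function names, the grid search used to fit the temperature, the equal-width ECE binning, and the `reliability` discount in the Dempster-Shafer rule are all assumptions made for illustration. Isotonic regression and the learned logistic-regression rule are omitted here.

```python
import numpy as np

EPS = 1e-6  # keeps probabilities away from 0 and 1 (assumed value)

def expected_calibration_error(conf, correct, n_bins=10):
    """ECE with equal-width bins: weighted mean of |accuracy - confidence|."""
    conf = np.asarray(conf, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    idx = np.digitize(conf, edges[1:-1])  # bin index in 0 .. n_bins - 1
    ece = 0.0
    for b in range(n_bins):
        mask = idx == b
        if mask.any():
            ece += mask.mean() * abs(correct[mask].mean() - conf[mask].mean())
    return ece

def temperature_scale(p, T):
    """Divide the logit of each score by T; T > 1 softens overconfident scores."""
    p = np.clip(np.asarray(p, dtype=float), EPS, 1 - EPS)
    logit = np.log(p / (1 - p))
    return 1.0 / (1.0 + np.exp(-logit / T))

def fit_temperature(p, correct):
    """Pick T by grid search on validation NLL (a stand-in for a real optimizer)."""
    correct = np.asarray(correct, dtype=float)

    def nll(T):
        q = np.clip(temperature_scale(p, T), EPS, 1 - EPS)
        return -np.mean(correct * np.log(q) + (1 - correct) * np.log(1 - q))

    return float(min(np.linspace(0.25, 5.0, 96), key=nll))

def fuse(p_lidar, p_cam, rule="mean", reliability=0.9):
    """Fuse two calibrated confidences for one associated candidate pair."""
    p1 = np.clip(p_lidar, EPS, 1 - EPS)
    p2 = np.clip(p_cam, EPS, 1 - EPS)
    if rule == "max":
        return np.maximum(p1, p2)
    if rule == "mean":
        return 0.5 * (p1 + p2)
    if rule == "odds":
        # Product of odds: treats the sensors as independent with a uniform prior.
        odds = (p1 / (1 - p1)) * (p2 / (1 - p2))
        return odds / (1 + odds)
    if rule == "ds":
        # Dempster's rule on {object, background}; each source keeps
        # mass (1 - reliability) on "unknown". Returns belief in "object".
        a1, b1, t1 = reliability * p1, reliability * (1 - p1), 1 - reliability
        a2, b2, t2 = reliability * p2, reliability * (1 - p2), 1 - reliability
        conflict = a1 * b2 + b1 * a2
        return (a1 * a2 + a1 * t2 + t1 * a2) / (1 - conflict)
    raise ValueError(f"unknown rule: {rule}")
```

For a candidate scored 0.8 by LiDAR and 0.6 by the camera, `fuse(0.8, 0.6, "mean")` gives 0.7 and `fuse(0.8, 0.6, "odds")` gives about 0.857. With `reliability=1.0` the Dempster-Shafer rule reduces to the odds product, so the discount is what distinguishes the two in this sketch.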
License
Copyright (c) 2025 Qi Xin

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

