Layout-Aware Progressive PDF Rendering: AI Prioritization of PDF Slices to Reduce Time-to-Functional-First-Frame on FUNSD

Heyu Wang; Yuxuan Ren; Xiaohan Chang

doi:10.51903/jtie.v4i2.523

Authors

Heyu Wang, Computer Science, University of Southern California, CA, USA
Yuxuan Ren, Chemical Engineering & Data Science, University of Washington, WA, USA
Xiaohan Chang, Computer Science, University of Connecticut, CT, USA

DOI:

https://doi.org/10.51903/jtie.v4i2.523

Keywords:

PDF rendering, progressive rendering, document AI, FUNSD, tile ranking

Abstract

Progressive PDF rendering is attractive because users rarely need every visible pixel at once; they need the semantically useful parts of the current viewport early enough to read and interact. This paper studies whether layout-aware AI can prioritize PDF slices more effectively than geometric or density-based heuristics. We reconstruct vector PDFs from the official FUNSD form annotations and evaluate a tile scheduler that predicts tile utility from inexpensive layout and preview features before high-resolution rendering begins. The empirical study covers 26 reconstructed documents from the FUNSD test split that were fully processed in the present environment, four viewport scenarios, and measured clip-render timings for all visible tiles. The main configuration uses an 8×10 grid and a random-forest regressor trained with page-level 5-fold GroupKFold. We compare the learned scheduler with row-major visible-first, center-first, ink-density, text-density, and hand-tuned layout heuristics, as well as with full-page rendering and an oracle upper bound. The proposed model achieves a TTFF-90 (time to functional first frame) of 14.21 ms, compared with 15.18 ms for the best non-AI heuristic, 20.48 ms for full-page rendering, and 24.09 ms for row-major rendering. It also achieves a Utility@20ms of 0.941, an AUC@25ms of 0.730, an NDCG@10 of 0.963, and a Recall@10 of 0.969. The results show that slice rendering is not inherently beneficial: in the main 8×10 setting, the summed cost of rendering all visible tiles is 28.80 ms, which exceeds the full-page cost of 20.48 ms, so scheduling quality determines whether slicing improves or harms TTFF. A coarser 6×8 grid reduces the AI scheduler's TTFF-90 to 10.58 ms, while the densest pages favor a full-page fallback. Paired Wilcoxon signed-rank tests over the page-scenario cases yield p < .001 for the TTFF-90 improvements of the proposed model over every non-AI baseline; however, these tests should be interpreted as case-level rather than document-level evidence.
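
The following minimal Python sketch illustrates why the order of tile rendering, rather than slicing by itself, decides whether tiling beats a full-page render. It rests on two assumptions that the abstract does not confirm: that TTFF-90 is the elapsed render time at which 90% of the visible utility has been drawn, and that tiles are rendered sequentially. All tile costs, utilities, and the full-page cost are hypothetical toy values, not measurements from the study, and the ranked order uses the true utilities (an oracle-style ordering) in place of the learned model's predictions.

```python
from typing import Sequence


def time_to_utility_fraction(
    costs_ms: Sequence[float],
    utilities: Sequence[float],
    order: Sequence[int],
    fraction: float = 0.9,
) -> float:
    """Elapsed time until cumulative utility reaches `fraction` of the total.

    Assumed definition of TTFF-90 (for fraction=0.9); tiles render one
    after another in the given `order`.
    """
    target = fraction * sum(utilities)
    elapsed = 0.0
    gained = 0.0
    for idx in order:
        elapsed += costs_ms[idx]
        gained += utilities[idx]
        if gained >= target:
            return elapsed
    return elapsed


# Hypothetical toy viewport with six tiles (not data from the paper).
costs_ms = [3.0, 4.0, 5.0, 5.0, 4.0, 3.0]   # per-tile clip-render cost
utilities = [0, 4, 4, 0, 50, 42]            # per-tile usefulness to the reader
full_page_ms = 18.0                          # hypothetical single full-page render

# Row-major order renders tiles in index order.
row_major = list(range(len(costs_ms)))
# Utility-ranked order renders the most useful tiles first (oracle-style).
ranked = sorted(row_major, key=lambda i: utilities[i], reverse=True)

ttff_row_major = time_to_utility_fraction(costs_ms, utilities, row_major)
ttff_ranked = time_to_utility_fraction(costs_ms, utilities, ranked)

print(f"sum of tile costs: {sum(costs_ms):.1f} ms")
print(f"full page:         {full_page_ms:.1f} ms")
print(f"row-major TTFF-90: {ttff_row_major:.1f} ms")
print(f"ranked TTFF-90:    {ttff_ranked:.1f} ms")
```

In this toy case the summed tile cost exceeds the full-page cost, so a poor ordering is slower than rendering the whole page, while a good ordering reaches the 90% threshold after only two tiles. This mirrors, qualitatively only, the relationship the abstract reports between row-major, full-page, and learned scheduling.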
