Section: Archaeology
Topic: Archaeology, Computer sciences, Engineering
A multimodal approach to heritage preservation in the context of climate change
Corresponding author(s): Roqui, David (roquidavid@yahoo.fr)
10.24072/pcjournal.723 - Peer Community Journal, Volume 6 (2026), article no. e48
Peer reviewed and recommended by PCI
Cultural heritage sites face accelerating degradation due to climate change, yet traditional monitoring relies on unimodal analysis (visual inspection or environmental sensors alone), which fails to capture the complex interplay between environmental stressors and material deterioration. We propose a lightweight multimodal architecture that fuses sensor data (temperature, humidity) with visual imagery to predict degradation severity at heritage sites. Our approach adapts PerceiverIO with two key innovations: (1) simplified encoders with a 64-dimensional latent space that prevent overfitting on small datasets (37 training samples, expanded to 555 by data augmentation; 13 validation samples; 13 test samples), and (2) an Adaptive Barlow Twins loss that encourages modality complementarity rather than redundancy. On data from Strasbourg Cathedral, our model achieves 76.9% accuracy and a 77.0% weighted-F1 score on the test set, a 43% improvement over standard multimodal architectures (VisualBERT, Transformer) and 25% over vanilla PerceiverIO. Ablation studies show that the sensor-only model reaches 61.5% accuracy and the image-only model 46.2%, both below the fused model, confirming successful multimodal synergy. A systematic hyperparameter study identifies an optimal moderate correlation target (τ = 0.3) that balances alignment and complementarity, achieving 69.2% accuracy compared with the other τ values tested (53.8% for τ = 0.1, 0.5 and 0.7; 61.5% for τ = 0.9). This work demonstrates that architectural simplicity combined with contrastive regularization enables effective multimodal learning in data-scarce heritage monitoring contexts, providing a foundation for AI-driven conservation decision support systems.
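The fusion idea in the abstract can be illustrated with a minimal sketch of Perceiver-style cross-attention in plain Python: sensor tokens and image tokens are concatenated into one input array, and a small fixed set of latent vectors attends to it, so the output size depends on the number of latents rather than on the number of input tokens. This is not the authors' implementation: learned query/key/value projections, multiple heads, latent self-attention and the output decoder are all omitted, and the token counts, the number of latents and the names `cross_attend`, `fuse` and `random_tokens` are hypothetical. Only the 64-dimensional latent width comes from the abstract.

```python
import math
import random
from typing import List

Matrix = List[List[float]]  # rows = tokens (or latents), columns = feature dimensions


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of attention scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(latents: Matrix, inputs: Matrix) -> Matrix:
    """Single-head scaled dot-product cross-attention: each latent queries all input tokens.

    Projections are omitted (identity), so latents and tokens must share one width.
    """
    d = len(latents[0])
    scale = 1.0 / math.sqrt(d)
    fused = []
    for q in latents:
        weights = softmax([scale * sum(qi * ki for qi, ki in zip(q, k)) for k in inputs])
        # Each latent becomes an attention-weighted average of the input tokens.
        fused.append([sum(w * tok[j] for w, tok in zip(weights, inputs)) for j in range(d)])
    return fused


def fuse(sensor_tokens: Matrix, image_tokens: Matrix, latents: Matrix) -> Matrix:
    """Hypothetical fusion step: concatenate both modalities, then compress into the latent array."""
    return cross_attend(latents, sensor_tokens + image_tokens)


def random_tokens(n: int, d: int) -> Matrix:
    """Stand-in embeddings; a real model would produce these with modality-specific encoders."""
    return [[random.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]


random.seed(0)
D = 64          # latent width stated in the abstract
N_LATENTS = 8   # hypothetical number of latent vectors
sensor = random_tokens(24, D)  # hypothetical: 24 embedded temperature/humidity readings
image = random_tokens(49, D)   # hypothetical: 7 x 7 embedded image patches
fused_latents = fuse(sensor, image, random_tokens(N_LATENTS, D))
print(len(fused_latents), len(fused_latents[0]))  # → 8 64
```

Changing the number of sensor or image tokens leaves the output shape unchanged, which is the property that lets a Perceiver-style model take both modalities through a single small bottleneck.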
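The abstract does not give the exact form of the Adaptive Barlow Twins loss. As an assumption for illustration only, the sketch below keeps the standard Barlow Twins structure (standardize each embedding dimension over the batch, compute the cross-correlation matrix C between sensor and image embeddings, penalize off-diagonal entries with a weight lambda) but replaces the usual on-diagonal target of 1 with the correlation target tau. The standard loss is sum_i (1 - C_ii)^2 + lambda * sum_{i != j} C_ij^2; the assumed variant is sum_i (C_ii - tau)^2 + lambda * sum_{i != j} C_ij^2. The function names and the default lambda are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]  # rows = samples in the batch, columns = embedding dimensions


def standardize_columns(z: Matrix, eps: float = 1e-8) -> Matrix:
    """Zero-mean, unit-variance normalization of each embedding dimension over the batch."""
    n, d = len(z), len(z[0])
    out = [[0.0] * d for _ in range(n)]
    for j in range(d):
        col = [z[b][j] for b in range(n)]
        mean = sum(col) / n
        std = math.sqrt(sum((x - mean) ** 2 for x in col) / n)
        for b in range(n):
            out[b][j] = (col[b] - mean) / (std + eps)
    return out


def cross_correlation(z_a: Matrix, z_b: Matrix) -> Matrix:
    """D x D matrix C; C[i][j] = batch correlation of dim i of z_a with dim j of z_b (equal widths)."""
    a, b = standardize_columns(z_a), standardize_columns(z_b)
    n, d = len(a), len(a[0])
    return [[sum(a[k][i] * b[k][j] for k in range(n)) / n for j in range(d)] for i in range(d)]


def adaptive_barlow_twins_loss(z_sensor: Matrix, z_image: Matrix,
                               tau: float = 0.3, lam: float = 5e-3) -> float:
    """Assumed form: pull diagonal correlations toward tau, push off-diagonal ones toward 0."""
    c = cross_correlation(z_sensor, z_image)
    d = len(c)
    on_diag = sum((c[i][i] - tau) ** 2 for i in range(d))
    off_diag = sum(c[i][j] ** 2 for i in range(d) for j in range(d) if i != j)
    return on_diag + lam * off_diag


# Hypothetical batch of 4 samples with 2-D embeddings per modality (not the study's data).
z_sensor = [[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]]
z_image = [[0.9, 1.2], [1.1, -0.8], [-1.0, 0.7], [-0.8, -1.1]]
for tau in (0.1, 0.3, 0.9):
    print(tau, round(adaptive_barlow_twins_loss(z_sensor, z_image, tau=tau), 4))
```

With tau = 1 the diagonal term reduces to the original Barlow Twins invariance term, which drives the two modalities toward perfectly correlated embeddings; a moderate target such as tau = 0.3 penalizes both too little and too much agreement, which is one way to read "complementarity rather than redundancy".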
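The weighted-F1 score reported alongside accuracy averages the per-class F1 scores, weighting each class by its number of true samples (its support), so on a small, possibly imbalanced test set it can differ from plain accuracy. The minimal pure-Python sketch below follows the same definition as scikit-learn's `f1_score(..., average="weighted")`; the severity labels in the usage lines are hypothetical, since the abstract does not list the classes.

```python
from collections import Counter
from typing import Hashable, Sequence


def weighted_f1(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Support-weighted mean of per-class F1 scores; classes absent from y_true get weight 0."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    support = Counter(y_true)  # number of true samples per class
    weighted_sum = 0.0
    for label, n_true in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        n_pred = sum(1 for p in y_pred if p == label)
        precision = tp / n_pred if n_pred else 0.0  # 0 when the class is never predicted
        recall = tp / n_true
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        weighted_sum += n_true * f1
    return weighted_sum / len(y_true)


# Hypothetical severity labels for four samples (not the study's data or classes).
y_true = ["low", "low", "high", "high"]
y_pred = ["low", "high", "high", "high"]
print(round(weighted_f1(y_true, y_pred), 4))  # → 0.7333
```

In this toy case accuracy is 3/4 = 0.75 while weighted F1 is about 0.733, because the over-predicted "high" class loses precision.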
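With only 13 test samples, every accuracy measured on that split is a multiple of 1/13, about 7.7 percentage points per sample. The short check below uses only the split sizes and percentages stated in the abstract and recovers the underlying counts: 76.9% is 10/13 correct, 61.5% is 8/13, 46.2% is 6/13, 69.2% is 9/13 and 53.8% is 7/13, while 555 augmented training samples correspond to 15 examples per original sample. It assumes the τ study was evaluated on one of the two 13-sample splits; the helper names are illustrative, not taken from the authors' code.

```python
# Split sizes stated in the abstract.
N_TRAIN, N_TRAIN_AUGMENTED, N_VAL, N_TEST = 37, 555, 13, 13


def augmentation_factor(n_original: int, n_augmented: int) -> int:
    """Training examples per original sample after augmentation (requires an exact multiple)."""
    if n_augmented % n_original:
        raise ValueError("augmented size is not a multiple of the original size")
    return n_augmented // n_original


def correct_count(accuracy_pct: float, n: int) -> int:
    """Integer number of correct predictions k such that 100 * k / n rounds to accuracy_pct."""
    k = round(accuracy_pct / 100 * n)
    # Reject values that no integer count out of n can produce at one-decimal precision.
    if round(100 * k / n, 1) != accuracy_pct:
        raise ValueError(f"{accuracy_pct}% is not k/{n} for any integer k")
    return k


reported = {
    "fused model (test)": 76.9,
    "sensor-only": 61.5,
    "image-only": 46.2,
    "tau = 0.3": 69.2,          # assumes a 13-sample evaluation split
    "tau = 0.1, 0.5, 0.7": 53.8,
    "tau = 0.9": 61.5,
}
for name, acc in reported.items():
    print(f"{name}: {correct_count(acc, N_TEST)}/{N_TEST} correct")
print("augmentation factor:", augmentation_factor(N_TRAIN, N_TRAIN_AUGMENTED))  # → 15
print("one sample is worth", round(100 / N_TEST, 1), "points")  # → 7.7
```

At this granularity, a single extra correct or incorrect prediction moves accuracy by about 7.7 points, which is worth keeping in mind when comparing the ablation and τ results.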
Type: Research article
Keywords: Multimodal learning, Machine learning, Cultural heritage, Climate change
Authors: Roqui, David (1, 2, 3); Cormier, Adèle (3, 4, 2); Grozavu, Nistor (1); Bourges, Ann (3)
License: CC-BY 4.0
Roqui, D.; Cormier, A.; Grozavu, N.; Bourges, A. A multimodal approach to heritage preservation in the context of climate change. Peer Community Journal, Volume 6 (2026), article no. e48. https://doi.org/10.24072/pcjournal.723
@article{10_24072_pcjournal_723,
author = {Roqui, David and Cormier, Ad\`ele and Grozavu, Nistor and Bourges, Ann},
title = {A multimodal approach to heritage preservation in the context of climate change},
journal = {Peer Community Journal},
eid = {e48},
year = {2026},
publisher = {Peer Community In},
volume = {6},
doi = {10.24072/pcjournal.723},
language = {en},
url = {https://peercommunityjournal.org/articles/10.24072/pcjournal.723/}
}
TY  - JOUR
AU  - Roqui, David
AU  - Cormier, Adèle
AU  - Grozavu, Nistor
AU  - Bourges, Ann
TI  - A multimodal approach to heritage preservation in the context of climate change
JO  - Peer Community Journal
PY  - 2026
VL  - 6
PB  - Peer Community In
UR  - https://peercommunityjournal.org/articles/10.24072/pcjournal.723/
DO  - 10.24072/pcjournal.723
LA  - en
ID  - 10_24072_pcjournal_723
ER  -
%0 Journal Article
%A Roqui, David
%A Cormier, Adèle
%A Grozavu, Nistor
%A Bourges, Ann
%T A multimodal approach to heritage preservation in the context of climate change
%J Peer Community Journal
%] e48
%D 2026
%V 6
%I Peer Community In
%U https://peercommunityjournal.org/articles/10.24072/pcjournal.723/
%R 10.24072/pcjournal.723
%G en
%F 10_24072_pcjournal_723
PCI peer reviews and recommendation, and links to data, scripts, code and supplementary information: 10.24072/pci.archaeo.100708
Conflict of interest of the recommender and peer reviewers:
The recommender in charge of the evaluation of the article and the reviewers declared that they have no conflict of interest (as defined in the code of conduct of PCI) with the authors or with the content of the article.
[1] Convolutional Networks for Images, Speech, and Time Series, The Handbook of Brain Theory and Neural Networks, MIT Press, Cambridge, MA, 1995, pp. 255-258
[2] An Automatic Survey System for Paved and Unpaved Road Classification and Road Anomaly Detection using Smartphone Sensor, arXiv preprint arXiv:2007.13389, 2020
[3] UNITER: Universal Image-Text Representation Learning, Proceedings of the European Conference on Computer Vision (ECCV) 2020 (Lecture Notes in Computer Science), Volume 12375, Springer, Cham, 2020, pp. 104-120
[4] A Comprehensive Study of Weathering Mechanisms and Predictive Modeling for Heritage Stone Conservation, arXiv preprint arXiv:2511.13343, 2025
[5] Automatic Crack Classification and Segmentation on Masonry Surfaces Using Convolutional Neural Networks and Transfer Learning, Automation in Construction, Volume 125 (2021), p. 103606
[6] An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale, Proceedings of the 9th International Conference on Learning Representations (ICLR), 2021
[7] CAN: Creative Adversarial Networks, Generating “Art” by Learning About Styles and Deviating from Style Norms, arXiv preprint arXiv:1706.07068, 2017
[8] Domain-Adversarial Training of Neural Networks, Journal of Machine Learning Research, Volume 17 (2016) no. 59, pp. 1-35
[9] 3D Reconstruction and Semantic Segmentation of Heritage Buildings Using Point Clouds and BIM, The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XLII-2/W15, ISPRS, 2019, pp. 467-474
[10] Learning Both Weights and Connections for Efficient Neural Network, Advances in Neural Information Processing Systems 28 (NeurIPS), 2015, pp. 1135-1143
[11] A multimodal approach to heritage preservation in the context of climate change, Peer Community in Archaeology, 2026, no. 100708
[12] Distilling the Knowledge in a Neural Network, arXiv preprint arXiv:1503.02531, 2015
[13] Long Short-Term Memory, Neural Computation, Volume 9 (1997) no. 8, pp. 1735-1780
[14] Perceiver IO: A General Architecture for Structured Inputs and Outputs, Proceedings of the 10th International Conference on Learning Representations (ICLR), 2022
[15] Perceiver: General Perception with Iterative Attention, Proceedings of the 38th International Conference on Machine Learning (ICML) (Proceedings of Machine Learning Research), Volume 139, PMLR, 2021, pp. 4651-4664
[16] Large-Scale Video Classification with Convolutional Neural Networks, 2014 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, Columbus, OH, USA, 2014, pp. 1725-1732
[17] VisualBERT: A Simple and Performant Baseline for Vision and Language, arXiv preprint arXiv:1908.03557, 2019
[18] Decoupled Weight Decay Regularization, Proceedings of the 7th International Conference on Learning Representations (ICLR), 2019
[19] Multimodal Deep Learning, Proceedings of the 28th International Conference on Machine Learning (ICML), 2011
[20] A Review of Affective Computing: From Unimodal Analysis to Multimodal Fusion, Information Fusion, Volume 37 (2017), pp. 98-125
[21] Learning Transferable Visual Models from Natural Language Supervision, Proceedings of the 38th International Conference on Machine Learning (ICML) (Proceedings of Machine Learning Research), Volume 139, PMLR, 2021, pp. 8748-8763
[22] Code and data for “A Multimodal Approach to Heritage Preservation in the Context of Climate Change”, Zenodo, 2025, https://doi.org/10.5281/zenodo.19053096
[23] Grad-CAM: Visual Explanations from Deep Networks via Gradient-Based Localization, Proceedings of the IEEE International Conference on Computer Vision (ICCV), 2017
[24] Understanding Machine Learning: From Theory to Algorithms, Cambridge University Press, 2014
[25] FLAVA: A Foundational Language and Vision Alignment Model, Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022, pp. 15638-15650
[26] Prototypical Networks for Few-shot Learning, Advances in Neural Information Processing Systems 30 (NeurIPS), 2017, pp. 4077-4087
[27] Well-Read Students Learn Better: On the Importance of Pre-training Compact Models, arXiv preprint arXiv:1908.08962v2, 2019
[28] Tensor2Tensor for Neural Machine Translation, arXiv preprint arXiv:1803.07416, 2018
[29] Le glossaire illustré ICOMOS-ISCS sur les formes d'altération de la pierre : un outil précieux pour les constats d'état de la statuaire des parcs, jardins et cimetières [The ICOMOS-ISCS illustrated glossary on stone deterioration patterns: a valuable tool for condition reports on statuary in parks, gardens and cemeteries], Pierre 2011: Conservation de la pierre dans les parcs, jardins et cimetières [Stone 2011: stone conservation in parks, gardens and cemeteries], conference proceedings, SFIIC (Section Française de l'Institut International de Conservation), Paris, 2011
[30] Barlow Twins: Self-Supervised Learning via Redundancy Reduction, Proceedings of the 38th International Conference on Machine Learning (ICML) (Proceedings of Machine Learning Research), Volume 139, PMLR, 2021, pp. 12310-12320