Section: Archaeology
Topic: Archaeology, Computer sciences, Engineering

A multimodal approach to heritage preservation in the context of climate change

Corresponding author(s): Roqui, David (roquidavid@yahoo.fr)

10.24072/pcjournal.723 - Peer Community Journal, Volume 6 (2026), article no. e48

Peer reviewed and recommended by PCI

Cultural heritage sites face accelerating degradation due to climate change, yet traditional monitoring relies on unimodal analysis (visual inspection or environmental sensors alone), which fails to capture the complex interplay between environmental stressors and material deterioration. We propose a lightweight multimodal architecture that fuses sensor data (temperature, humidity) with visual imagery to predict degradation severity at heritage sites. Our approach adapts PerceiverIO with two key innovations: (1) simplified encoders (64-dimensional latent space) that prevent overfitting on small datasets (37 training samples, expanded to 555 with data augmentation; 13 validation samples; 13 test samples), and (2) an Adaptive Barlow Twins loss that encourages modality complementarity rather than redundancy. On data from Strasbourg Cathedral, our model achieves 76.9% accuracy and a 77.0% weighted F1 score on the test set, a 43% improvement over standard multimodal architectures (VisualBERT, Transformer) and 25% over vanilla PerceiverIO. Ablation studies show that the sensor-only model achieves 61.5% accuracy while the image-only model reaches 46.2%, confirming successful multimodal synergy. A systematic hyperparameter study identifies an optimal moderate correlation target (τ = 0.3) that balances alignment and complementarity, achieving 69.2% accuracy compared with 53.8% for τ = 0.1, 0.5, and 0.7 and 61.5% for τ = 0.9. This work demonstrates that architectural simplicity combined with contrastive regularization enables effective multimodal learning in data-scarce heritage monitoring contexts, providing a foundation for AI-driven conservation decision support systems.
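The abstract does not spell out the Adaptive Barlow Twins loss, so the following is only a minimal sketch of one plausible reading: a Barlow Twins-style objective in which the diagonal of the cross-correlation matrix between sensor and image embeddings is pulled toward a target τ instead of 1. The function names, the `off_diag_weight` default, and the toy embeddings are assumptions made for illustration and are not taken from the article or its code.

```python
import math


def standardize(column):
    """Scale one feature column to zero mean and unit variance."""
    n = len(column)
    mean = sum(column) / n
    var = sum((x - mean) ** 2 for x in column) / n
    std = math.sqrt(var) or 1.0  # guard against a constant feature
    return [(x - mean) / std for x in column]


def cross_correlation(za, zb):
    """Return the D x D cross-correlation matrix of two embedding batches.

    za, zb: lists of N embeddings, each a list of D floats.
    Entry [i][j] is the correlation between feature i of za and feature j of zb.
    """
    n, d = len(za), len(za[0])
    cols_a = [standardize([row[i] for row in za]) for i in range(d)]
    cols_b = [standardize([row[j] for row in zb]) for j in range(d)]
    return [
        [sum(a * b for a, b in zip(cols_a[i], cols_b[j])) / n for j in range(d)]
        for i in range(d)
    ]


def target_correlation_loss(za, zb, tau=0.3, off_diag_weight=5e-3):
    """Hypothetical Barlow Twins variant with a correlation target tau.

    Diagonal terms are pulled toward tau (tau = 1 recovers the usual
    Barlow Twins invariance term); off-diagonal terms are pushed toward 0.
    """
    c = cross_correlation(za, zb)
    d = len(c)
    on_diag = sum((c[i][i] - tau) ** 2 for i in range(d))
    off_diag = sum(c[i][j] ** 2 for i in range(d) for j in range(d) if i != j)
    return on_diag + off_diag_weight * off_diag


# Toy batch: 4 samples with 2-dimensional embeddings per modality (made-up numbers).
sensor_emb = [[0.2, 1.0], [0.4, 0.1], [0.9, 0.7], [0.1, 0.3]]
image_emb = [[0.3, 0.8], [0.5, 0.2], [0.7, 0.9], [0.2, 0.1]]
print(target_correlation_loss(sensor_emb, image_emb, tau=0.3))
```

Under this reading, a moderate τ penalizes embeddings that are either fully aligned or fully decorrelated across modalities, which matches the abstract's description of balancing alignment and complementarity. A real implementation would operate on tensors inside the training loop rather than on Python lists.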

DOI: 10.24072/pcjournal.723
Type: Research article
Keywords: Multimodal learning, Machine learning, Cultural heritage, Climate change

Roqui, David 1,2,3; Cormier, Adèle 3,4,2; Grozavu, Nistor 1; Bourges, Ann 3

1 Laboratoire ETIS, UMR 8051, CY Cergy Paris Université - ENSEA - CNRS, Cergy-Pontoise, France
2 Fondation des sciences du patrimoine (FSP), Paris, France
3 Centre de recherche et de restauration des musées de France (C2RMF), UMR 8247 IRCP-PCMTH, Paris, France
4 Epitopos, Paris, France
License: CC-BY 4.0
Copyrights: The authors retain unrestricted copyrights and publishing rights
Roqui, D.; Cormier, A.; Grozavu, N.; Bourges, A. A multimodal approach to heritage preservation in the context of climate change. Peer Community Journal, Volume 6 (2026), article no. e48. https://doi.org/10.24072/pcjournal.723
@article{10_24072_pcjournal_723,
     author = {Roqui, David and Cormier, Ad\`ele and Grozavu, Nistor and Bourges, Ann},
     title = {A multimodal approach to heritage preservation in the context of climate change},
     journal = {Peer Community Journal},
     eid = {e48},
     year = {2026},
     publisher = {Peer Community In},
     volume = {6},
     doi = {10.24072/pcjournal.723},
     language = {en},
     url = {https://peercommunityjournal.org/articles/10.24072/pcjournal.723/}
}
TY  - JOUR
AU  - Roqui, David
AU  - Cormier, Adèle
AU  - Grozavu, Nistor
AU  - Bourges, Ann
TI  - A multimodal approach to heritage preservation in the context of climate change
JO  - Peer Community Journal
PY  - 2026
VL  - 6
PB  - Peer Community In
UR  - https://peercommunityjournal.org/articles/10.24072/pcjournal.723/
DO  - 10.24072/pcjournal.723
LA  - en
ID  - 10_24072_pcjournal_723
ER  - 
%0 Journal Article
%A Roqui, David
%A Cormier, Adèle
%A Grozavu, Nistor
%A Bourges, Ann
%T A multimodal approach to heritage preservation in the context of climate change
%J Peer Community Journal
%] e48
%D 2026
%V 6
%I Peer Community In
%U https://peercommunityjournal.org/articles/10.24072/pcjournal.723/
%R 10.24072/pcjournal.723
%G en
%F 10_24072_pcjournal_723

PCI peer reviews and recommendation, and links to data, scripts, code and supplementary information: 10.24072/pci.archaeo.100708

Conflict of interest of the recommender and peer reviewers:
The recommender in charge of the evaluation of the article and the reviewers declared that they have no conflict of interest (as defined in the code of conduct of PCI) with the authors or with the content of the article.

[1] Bengio, Y.; LeCun, Y. Convolutional Networks for Images, Speech, and Time Series, The Handbook of Brain Theory and Neural Networks, MIT Press, Cambridge, MA, 1995, pp. 255-258

[2] Cabral, F. S.; Pinto, M.; Mouzinho, F.; Fukai, H.; Tamura, S. An Automatic Survey System for Paved and Unpaved Road Classification and Road Anomaly Detection using Smartphone Sensor, arXiv, 2020 no. arXiv:2007.13389

[3] Chen, Y.-C.; Li, L.; Yu, L.; El Kholy, A.; Ahmed, F.; Gan, Z.; Cheng, Y.; Liu, J. UNITER: Universal Image-Text Representation Learning, Proceedings of the European Conference on Computer Vision (ECCV) 2020 (Lecture Notes in Computer Science), Volume 12375, Springer, Cham, 2020, pp. 104-120

[4] Cormier, A.; Roqui, D.; Bourges, A.; Grozavu, N. A Comprehensive Study of Weathering Mechanisms and Predictive Modeling for Heritage Stone Conservation, arXiv, 2025 no. arXiv:2511.13343

[5] Dais, D.; Bal, İ. E.; Smyrou, E.; Sarhosis, V. Automatic Crack Classification and Segmentation on Masonry Surfaces Using Convolutional Neural Networks and Transfer Learning, Automation in Construction, Volume 125 (2021), p. 103606

[6] Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; Uszkoreit, J.; Houlsby, N. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale, Proceedings of the 9th International Conference on Learning Representations (ICLR), 2021

[7] Elgammal, A.; Liu, B.; Elhoseiny, M.; Mazzone, M. CAN: Creative Adversarial Networks, Generating “Art” by Learning About Styles and Deviating from Style Norms, arXiv, 2017 no. arXiv:1706.07068

[8] Ganin, Y.; Ustinova, E.; Ajakan, H.; Germain, P.; Larochelle, H.; Laviolette, F.; Marchand, M.; Lempitsky, V. Domain-Adversarial Training of Neural Networks, Journal of Machine Learning Research, Volume 17 (2016) no. 59, pp. 1-35

[9] Grilli, E.; Remondino, F. 3D Reconstruction and Semantic Segmentation of Heritage Buildings Using Point Clouds and BIM, The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XLII-2/W15, ISPRS, 2019, pp. 467-474

[10] Han, S.; Pool, J.; Tran, J.; Dally, W. J. Learning Both Weights and Connections for Efficient Neural Network, Advances in Neural Information Processing Systems 28 (NeurIPS), 2015, pp. 1135-1143

[11] Hein, A. A multimodal approach to heritage preservation in the context of climate change, Peer Community in Archaeology, 2026 no. 100708

[12] Hinton, G.; Vinyals, O.; Dean, J. Distilling the Knowledge in a Neural Network, arXiv, 2015 no. arXiv:1503.02531

[13] Hochreiter, S.; Schmidhuber, J. Long Short-Term Memory, Neural Computation, Volume 9 (1997) no. 8, pp. 1735-1780

[14] Jaegle, A.; Borgeaud, S.; Alayrac, J.-B.; Doersch, C.; Ionescu, C.; Ding, D.; Koppula, S.; Zoran, D.; Brock, A.; Shelhamer, E.; Hénaff, O.; Botvinick, M. M.; Zisserman, A.; Vinyals, O.; Carreira, J. Perceiver IO: A General Architecture for Structured Inputs and Outputs, Proceedings of the 10th International Conference on Learning Representations (ICLR), 2022

[15] Jaegle, A.; Gimeno, F.; Brock, A.; Zisserman, A.; Vinyals, O.; Carreira, J. Perceiver: General Perception with Iterative Attention, Proceedings of the 38th International Conference on Machine Learning (ICML) (Proceedings of Machine Learning Research), Volume 139, PMLR, 2021, pp. 4651-4664

[16] Karpathy, A.; Toderici, G.; Shetty, S.; Leung, T.; Sukthankar, R.; Fei-Fei, L. Large-Scale Video Classification with Convolutional Neural Networks, 2014 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, Columbus, OH, USA, 2014, pp. 1725-1732

[17] Li, L. H.; Yatskar, M.; Yin, D.; Hsieh, C.-J.; Chang, K.-W. VisualBERT: A Simple and Performant Baseline for Vision and Language, arXiv, 2019 no. arXiv:1908.03557

[18] Loshchilov, I.; Hutter, F. Decoupled Weight Decay Regularization, Proceedings of the 7th International Conference on Learning Representations (ICLR), 2019

[19] Ngiam, J.; Khosla, A.; Kim, M.; Nam, J.; Lee, H.; Ng, A. Y. Multimodal Deep Learning, Proceedings of the 28th International Conference on Machine Learning (ICML), 2011

[20] Poria, S.; Cambria, E.; Bajpai, R.; Hussain, A. A Review of Affective Computing: From Unimodal Analysis to Multimodal Fusion, Information Fusion, Volume 37 (2017), pp. 98-125

[21] Radford, A.; Kim, J. W.; Hallacy, C.; Ramesh, A.; Goh, G.; Agarwal, S.; Sastry, G.; Askell, A.; Mishkin, P.; Clark, J.; Krueger, G.; Sutskever, I. Learning Transferable Visual Models from Natural Language Supervision, Proceedings of the 38th International Conference on Machine Learning (ICML) (Proceedings of Machine Learning Research), Volume 139, PMLR, 2021, pp. 8748-8763

[22] Roqui, D. Code and data for “A Multimodal Approach to Heritage Preservation in the Context of Climate Change”, https://doi.org/10.5281/zenodo.19053096 (Zenodo), 2025 no. 19053096

[23] Selvaraju, R. R.; Cogswell, M.; Das, A.; Vedantam, R.; Parikh, D.; Batra, D. Grad-CAM: Visual Explanations from Deep Networks via Gradient-Based Localization, Proceedings of the IEEE International Conference on Computer Vision (ICCV), 2017

[24] Shalev-Shwartz, S.; Ben-David, S. Understanding Machine Learning: From Theory to Algorithms, Cambridge University Press, 2014

[25] Singh, A.; Hu, R.; Goswami, V.; Couairon, G.; Galuba, W.; Rohrbach, M.; Kiela, D. FLAVA: A Foundational Language and Vision Alignment Model, Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022, pp. 15638-15650

[26] Snell, J.; Swersky, K.; Zemel, R. Prototypical Networks for Few-shot Learning, Advances in Neural Information Processing Systems 30 (NeurIPS), 2017, pp. 4077-4087

[27] Turc, I.; Chang, M.-W.; Lee, K.; Toutanova, K. Well-Read Students Learn Better: On the Importance of Pre-training Compact Models, arXiv, 2019 no. arXiv:1908.08962v2

[28] Vaswani, A.; Bengio, S.; Brevdo, E.; Chollet, F.; Gomez, A. N.; Gouws, S.; Jones, L.; Kaiser, Ł.; Kalchbrenner, N.; Parmar, N.; Sepassi, R.; Shazeer, N.; Uszkoreit, J. Tensor2Tensor for Neural Machine Translation, arXiv, 2018 no. arXiv:1803.07416

[29] Vergès-Belmin, V.; Vallet, J.-M.; Bromblet, P. Le glossaire illustré ICOMOS-ISCS sur les formes d'altération de la pierre : un outil précieux pour les constats d'état de la statuaire des parcs, jardins et cimetières, Pierre 2011 — Conservation de la pierre dans les parcs, jardins et cimetières, Actes du colloque, SFIIC (Section Française de l'Institut International de Conservation), Paris, 2011

[30] Zbontar, J.; Jing, L.; Misra, I.; LeCun, Y.; Deny, S. Barlow Twins: Self-Supervised Learning via Redundancy Reduction, Proceedings of the 38th International Conference on Machine Learning (ICML) (Proceedings of Machine Learning Research), Volume 139, PMLR, 2021, pp. 12310-12320
