Project of semantic web of PARES authorities: extraction and initial analysis
Abstract
This research describes the types of authority records in the Spanish Archives Portal (PARES), quantifying them and the ratios of relationships among them in order to outline an initial graph of this part of PARES. To this end, web scraping methods were used to compile all authority records for processing and analysis. The collected data show that person and family authorities are the most prominent, followed by institutions and concepts, which reflects the importance of individuals and family relationships in the historical and archival context. Associative relationships between individuals and institutions also stand out, suggesting the complexity of social and organizational interactions in the past. A strong interconnection is observed between places and individuals, as well as between places and other entities such as institutions and norms, which underscores the importance of geolocation and geographic context for understanding the historical and cultural heritage represented in PARES. Family relationships appear in a balanced proportion, indicating a rich representation of social and family life. By contrast, the proportion of associative relationships with information sources is low, suggesting a need to expand the documentation and references used in descriptive records.