BUA fellowship: Metadata flows in the context of research data – Research Group Information Management @ Humboldt-Universität zu Berlin

In my PhD research, I investigate how metadata for research data are generated and maintained. I started by analyzing how metadata records of the DOI registration agency DataCite change over time (Strecker 2024a, 2024b). Recently, I was awarded a grant by the Berlin University Alliance (BUA) through the BUA Fellowship Program of Objective 3, Advancing Research Quality and Value. This grant will allow me to visit Stefanie Haustein, associate professor at the School of Information Studies at the University of Ottawa and co-director of the ScholCommLab. During my stay, I will investigate metadata flows in the context of research data. In particular, I will examine the transfer of metadata from local to global contexts.

Metadata flows in the context of research data

Research data are typically deposited in research data repositories - infrastructures that specialize in handling this resource type. Metadata describing the datasets are also generated and maintained at these infrastructures. Numerous standards - metadata schemas - are used to describe research data (Asok et al. 2024; Koshoffer et al. 2018). Even within a discipline, the use of metadata schemas is not uniform (Mayernik and Liapich 2022). This variance arises because research data repositories create metadata for different use cases. On the one hand, metadata are needed to describe research data in a general and homogeneous way - for example, to enable data discovery in large search services. On the other hand, researchers need subject-specific metadata in order to be able to make an informed decision about whether they want to reuse a dataset (De Vries et al. 2022; Doran, Edmond, and Nugent-Folan 2021; Sostek et al. 2024). In order to cover this range of requirements, many research data repositories use several metadata schemas in parallel, e.g. the general DataCite metadata schema for DOI registration and a specialized schema addressing the needs of local users (Habermann 2023; Lee and Jeng 2019; Wu et al. 2023).

In research data repositories, metadata are often initially generated based on a locally implemented, specific metadata schema. Metadata are then mapped to a general schema to meet global requirements such as DOI registration or harvesting (Habermann 2023). During these mapping processes, however, information from the original comprehensive description can be lost (Radio et al. 2017; Taylor et al. 2022). As a result, metadata lose completeness when they are transferred from local to global contexts (Mayernik and Liapich 2022).
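To make the loss during mapping concrete, the following is a minimal sketch of a crosswalk from a local schema to a general one. It is an illustration only, not the mapping used by any particular repository: the local field names and the example record are invented, and only the target names (`titles`, `creators`, `publicationYear`, `subjects`) are taken from the top-level properties of the DataCite metadata schema. Values are copied unchanged; a real crosswalk would also restructure them.

```python
# Hypothetical crosswalk: local field name -> DataCite property name.
# The local field names are invented for illustration; the targets are
# top-level property names from the DataCite metadata schema.
CROSSWALK = {
    "title": "titles",
    "authors": "creators",
    "year": "publicationYear",
    "keywords": "subjects",
}


def map_to_datacite(local_record: dict) -> tuple[dict, list[str]]:
    """Map a local record and report the local fields that were dropped."""
    mapped = {}
    dropped = []
    for field, value in local_record.items():
        target = CROSSWALK.get(field)
        if target is None:
            dropped.append(field)  # no mapping defined: information is lost
        else:
            mapped[target] = value
    return mapped, sorted(dropped)


# Invented example record with two subject-specific fields.
local_record = {
    "title": "Soil moisture measurements",
    "authors": ["Doe, Jane"],
    "year": 2023,
    "keywords": ["soil", "hydrology"],
    "instrument": "TDR probe",
    "sampling_depth_cm": 30,
}

mapped, dropped = map_to_datacite(local_record)
print(dropped)
```

In this sketch, the subject-specific fields `instrument` and `sampling_depth_cm` have no target in the crosswalk, so they never reach the general record, even though they exist in the local description.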

During the fellowship, I will investigate metadata flows from research data repositories to DataCite. The aim is to determine whether information that is missing from DataCite metadata is also missing in local contexts, or whether it is available locally but not transmitted.
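One way to frame this comparison is sketched below. This is a simplified assumption about how such a check could look, not the method of the planned study: the function `classify_gaps`, the table `LOCAL_SOURCE`, the local field names and the example records are all hypothetical. Only the DataCite property names (`subjects`, `descriptions`, `rightsList`, `fundingReferences`) come from the DataCite metadata schema.

```python
# Hypothetical table: DataCite property -> local field that could supply it.
# The local field names are invented for illustration.
LOCAL_SOURCE = {
    "subjects": "keywords",
    "descriptions": "abstract",
    "rightsList": "license",
    "fundingReferences": "funder",
}


def is_empty(value) -> bool:
    """Treat None, empty strings and empty containers as missing."""
    return value is None or value == "" or value == [] or value == {}


def classify_gaps(local_record: dict, datacite_record: dict) -> dict:
    """Label each DataCite property by where its information is missing."""
    result = {}
    for prop, local_field in LOCAL_SOURCE.items():
        if not is_empty(datacite_record.get(prop)):
            result[prop] = "present in DataCite"
        elif is_empty(local_record.get(local_field)):
            result[prop] = "missing locally"
        else:
            result[prop] = "available locally, not transmitted"
    return result


# Invented example: the same dataset as described locally and at DataCite.
local_record = {
    "keywords": ["soil"],
    "abstract": "Hourly measurements.",
    "license": "CC BY 4.0",
    "funder": "",
}
datacite_record = {
    "subjects": [{"subject": "soil"}],
    "descriptions": [],
    "rightsList": [],
}

gaps = classify_gaps(local_record, datacite_record)
for prop, status in gaps.items():
    print(f"{prop}: {status}")
```

Properties labelled "available locally, not transmitted" are the ones where a better mapping could improve the DataCite record. Properties labelled "missing locally" would require changes to local metadata creation instead.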

Why this matters

Complete metadata are necessary to convey the context of research data. This is relevant for many interactions with research data, including discovery, reuse and citation. Bibliometric research in particular is hindered by incomplete metadata, as is the development of metrics that are intended to reflect the impact of research data (Ninkov et al. 2021). This research will assess the potential of an approach to improve metadata quality - the optimization of mappings to the DataCite metadata schema.

Further information about our research group can be found on our official website.

This text – excluding quotes and otherwise labelled parts – is licensed under the CC BY 4.0 DEED.

References

Asok, Kavya, Sushree Snigdha Dandpat, Dinesh K. Gupta, and Prashant Shrivastava. 2024. “Common Metadata Framework for Research Data Repository: Necessity to Support Open Science.” Journal of Library Metadata 24: 1–13. https://doi.org/10.1080/19386389.2024.2329370.

De Vries, Jerry, Vyacheslav Tykhonov, Andrea Scharnhorst, Eko Indarto, Femmy Admiraal, and Mike Priddy. 2022. “Flexible Metadata Schemes for Research Data Repositories. The Common Framework in Dataverse and the CMDI Use Case.” In, 168–80. https://doi.org/10.3384/ecp18915.

Doran, Michelle, Jennifer Edmond, and Georgina Nugent-Folan. 2021. “Reconciling the Cultural Complexity of Research Data: Can We Make Data Interdisciplinary Without Hiding Disciplinary Knowledge.” http://www.tara.tcd.ie/handle/2262/83156.

Habermann, Ted. 2023. “Connecting Repositories to the Global Research Community: A Re-Curation Process.” Journal of eScience Librarianship 12 (3): e739. https://doi.org/10.7191/jeslib.739.

Koshoffer, Amy, Amy E. Neeser, Linda Newman, and Lisa R. Johnston. 2018. “Giving Datasets Context: A Comparison Study of Institutional Repositories That Apply Varying Degrees of Curation.” International Journal of Digital Curation 13 (1): 15–34. https://doi.org/10.2218/ijdc.v13i1.632.

Lee, Jian‐Sin, and Wei Jeng. 2019. “The Landscape of Archived Studies in a Social Science Data Infrastructure: Investigating the ICPSR Metadata Records.” Proceedings of the Association for Information Science and Technology 56 (1): 147–56. https://doi.org/10.1002/pra2.62.

Mayernik, Matthew S., and Yauheniya Liapich. 2022. “The Role of Metadata and Vocabulary Standards in Enabling Scientific Data Interoperability: A Study of Earth System Science Data Facilities.” Journal of eScience Librarianship 11 (2). https://doi.org/10.7191/jeslib.619.

Ninkov, Anton Boudreau, Kathleen Gregory, Isabella Peters, and Stefanie Haustein. 2021. “Datasets on DataCite - an Initial Bibliometric Investigation.” In. Leuven, Belgium (Virtual). https://doi.org/10.5281/zenodo.4730857.

Radio, Erik, Fernando Rios, Jeffrey C. Oliver, Benjamin Hickson, and Niamh Wallace. 2017. “Manifestations of Metadata Structures in Research Datasets and Their Ontic Implications.” Journal of Library Metadata 17 (3-4): 161–82. https://doi.org/10.1080/19386389.2018.1439278.

Sostek, Katrina, Daniel Russell, Nitesh Goyal, Tarfah Alrashed, Stella Dugall, and Natasha Noy. 2024. “Discovering Datasets on the Web Scale: Challenges and Recommendations for Google Dataset Search.” Harvard Data Science Review Special Issue 4. https://doi.org/10.1162/99608f92.4c3e11ca.

Strecker, Dorothea. 2024a. “Changes in DataCite DOI Metadata for Research Data.” Zenodo. https://doi.org/10.5281/zenodo.14274240.

———. 2024b. “How Permanent Are Metadata for Research Data? Understanding Changes in DataCite DOI Metadata.” arXiv. https://doi.org/10.48550/arxiv.2412.05128.

Taylor, Shawna, Sarah Wright, Mikala R. Narlock, and Ted Habermann. 2022. “Think Globally, Act Locally: The Importance of Elevating Data Repository Metadata to the Global Infrastructure.” In. https://hdl.handle.net/11299/228001.

Wu, Mingfang, Stephen M. Richard, Chantelle Verhey, Leyla Jael Castro, Baptiste Cecconi, and Nick Juty. 2023. “An Analysis of Crosswalks from Research Data Schemas to Schema.org.” Data Intelligence 5 (1): 100–121. https://doi.org/10.1162/dint_a_00186.

Citation

BibTeX citation:

@online{strecker2025,
  author = {Strecker, Dorothea},
  title = {BUA Fellowship: {Metadata} Flows in the Context of Research
    Data},
  date = {2025-07-18},
  url = {https://doi.org/10.59350/wwwj7-4cm07},
  langid = {en}
}

For attribution, please cite this work as:

Strecker, Dorothea. 2025. “BUA Fellowship: Metadata Flows in the Context of Research Data.” July 18, 2025. https://doi.org/10.59350/wwwj7-4cm07.