Federating scientific communities for Open Discovery: the LUMEN Data Mesh framework

short talk × friday × 13.30-15.00

Julien Homo

Foxcub
Paris, France

Kévin Darty

Foxcub
Paris, France

Thomas Klebel

Know Center Research GmbH
Graz, Austria

Yann Le Franc

e‐Science Data Factory
Montpellier, France

Luca De Santis

Net7
Pisa, Italy

PRESENTATION

Open science is transforming scholarly communication by promoting openness, transparency, and reuse. Yet to fully realize its benefits, the scholarly ecosystem must overcome fragmentation and disciplinary silos that hinder the discoverability, quality, and reusability of research outputs. Infrastructures that are either overly centralized or rigidly standardized often fail to reflect the diversity of scientific practices and data types, especially across smaller or specialized communities. Monolithic repositories, large-scale aggregators, or top-down metadata catalogs may aggregate content, but rarely support meaningful cross-domain interoperability or flexible reuse that respects domain-specific formats, vocabularies, and workflows.

In the LUMEN project [1], we introduce the LUMEN Data Mesh, a solution that operationalizes the Data Mesh paradigm [2] for open science. Rather than imposing a central infrastructure or fixed data model, LUMEN establishes a federated, community-governed network in which research communities retain autonomy over their data platforms while exposing their outputs – datasets, software, publications, semantic artefacts, and author profiles – as standardized, reusable Data Products. These products are described and governed through Data Contracts, which ensure consistent structure, semantics, quality, and access conditions. The model aligns with the FAIR principles and EOSC recommendations while remaining lightweight and adaptable: legacy infrastructures can onboard incrementally by publishing Data Contracts and exposing standard interfaces such as APIs, SPARQL endpoints, or harvesting protocols.
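To make the role of a Data Contract concrete, the following minimal Python sketch checks the metadata of a Data Product against a contract. The `DataContract` class, its field names, and the example values are hypothetical illustrations; they are not taken from the LUMEN specification or from any contract standard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataContract:
    """Hypothetical, simplified contract: not the LUMEN or ODCS schema."""
    product_type: str                  # e.g. "dataset", "software", "publication"
    required_fields: frozenset[str]    # structure the product metadata must expose
    allowed_licenses: frozenset[str]   # access conditions accepted by the contract


def contract_violations(contract: DataContract, metadata: dict) -> list[str]:
    """Return human-readable violations; an empty list means the product conforms."""
    problems = []
    if metadata.get("type") != contract.product_type:
        problems.append(f"expected type {contract.product_type!r}")
    for name in sorted(contract.required_fields - set(metadata)):
        problems.append(f"missing field {name!r}")
    if metadata.get("license") not in contract.allowed_licenses:
        problems.append("license not permitted by contract")
    return problems
```

In such a setup, a community platform could run a check of this kind locally before publishing a product, so that structural and licensing problems surface before the product reaches the federation.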

The LUMEN architecture is built on three integrated layers. At the base lies the Federated Communities Ecosystem, where each discipline (e.g., Social Sciences, Earth System, Mathematics, Molecular Dynamics) operates its own platform and retains ownership of its data and curation practices. Communities define their own discovery environments, apply domain-specific metadata schemas, and publish FAIR-aligned Data Products without abandoning local workflows. These nodes connect to the mesh by meeting shared eligibility criteria, which include publishing Data Contracts that comply with the Open Data Contract Standard (ODCS) [3] and exposing standard interfaces (REST APIs, OAI-PMH, SPARQL endpoints, etc.).
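The two eligibility criteria named above can be sketched as a simple check. This is an illustration under stated assumptions: the `CommunityNode` class and `is_eligible` function are hypothetical, and treating the three interfaces listed in the text as a closed set is a simplification, since the text leaves the list open.

```python
from dataclasses import dataclass, field

# Interface labels come from the text; treating them as a closed set is an assumption.
STANDARD_INTERFACES = {"REST", "OAI-PMH", "SPARQL"}


@dataclass
class CommunityNode:
    """Hypothetical description of a community platform applying to join the mesh."""
    name: str
    contract_urls: list[str] = field(default_factory=list)  # published Data Contracts
    interfaces: set[str] = field(default_factory=set)       # exposed access interfaces


def is_eligible(node: CommunityNode) -> bool:
    """A node qualifies if it publishes a contract and exposes a standard interface."""
    has_contract = len(node.contract_urls) > 0
    has_interface = bool(node.interfaces & STANDARD_INTERFACES)
    return has_contract and has_interface
```

A real onboarding process would also validate the content of each contract; this sketch only checks that a contract and an interface are present.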

At the second layer, the Shared Data Platform offers cross-domain services such as a White Label discovery platform inspired by GoTriple [4], a FAIR Semantic Artefact Management Space, a Meta-Search engine, an AI-powered chatbot for research assistance, and metrics dashboards. These tools promote semantic interoperability, metadata harmonization, and intelligent knowledge discovery across the federation, while reducing the technical burden on individual communities.

The third layer, Federated Governance, orchestrates the rules of the ecosystem. Composed of representatives from each community, it validates Data Contracts, defines federation policies, and ensures compliance with FAIR, open licensing, and EOSC service expectations.

This federated design departs from traditional interoperability models. It builds on existing initiatives such as the EOSC Interoperability Framework (EOSC-IF), the RDA Metadata Working Group, and SKG-IF [5], but goes further by applying their principles across community-owned platforms. Instead of standardizing every dataset under a single schema or repository, LUMEN enables semantic decoupling with alignment: communities keep their own ontologies and converge through a shared metamodel and mappings. This balance between autonomy and alignment allows innovation at the local level while preserving global coherence. It fosters diversity in representation, yet ensures discoverability and reuse through rich, machine-actionable metadata.
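The idea of converging through a shared metamodel and mappings can be illustrated with a small Python sketch. Everything in it is an assumption made for illustration: the shared field names, the two community schemas, and the `to_shared` function do not come from the LUMEN metamodel.

```python
# Hypothetical shared metamodel: the three field names are illustrative only.
SHARED_FIELDS = ("title", "creator", "subject")

# Each community declares how its own schema maps onto the shared fields.
MAPPINGS = {
    "social-sciences": {"dc:title": "title", "dc:creator": "creator", "dc:subject": "subject"},
    "earth-system": {"name": "title", "principal_investigator": "creator", "keywords": "subject"},
}


def to_shared(community: str, record: dict) -> dict:
    """Project a community record onto the shared metamodel, keeping the original intact."""
    mapping = MAPPINGS[community]
    shared = {target: record[source] for source, target in mapping.items() if source in record}
    # Fields with no mapped value are reported as None rather than guessed.
    return {name: shared.get(name) for name in SHARED_FIELDS}
```

The local record is never modified, and domain-specific fields without a mapping simply stay in the community platform. This mirrors the balance described above: local schemas remain free to evolve, and only the mapping has to be kept aligned with the shared metamodel.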

Beyond its architecture, the LUMEN Data Mesh offers a concrete response to persistent challenges in scholarly communication. By formalizing Data Products through Data Contracts, it sets clear expectations on structure, semantics, availability, and curation. These contracts operationalize the FAIR principles [6], enabling a shift from ad-hoc metadata to machine-actionable, reusable outputs. Shared validation mechanisms and metadata harmonization foster semantic coherence and cross-domain quality assurance, while federated discovery ensures that all outputs are indexed in a mesh-wide catalog enriched by semantic services, which increases their visibility and reuse across disciplines.

At its core, LUMEN is not a platform but a participatory framework for federated infrastructure. Communities retain control over their services while benefiting from shared tools and governance. Although sustainability depends on continued engagement and resources, LUMEN mitigates this dependency through flexible onboarding, shared components, and distributed maintenance. It offers a resilient, inclusive model that lowers technical barriers, supports multilingual practices, and fosters equity. By combining decentralized ownership with shared protocols, and by aligning with institutional strategies for reproducibility and open science training [7], the LUMEN Data Mesh lays the foundation for a scalable, interoperable, and collaborative open science infrastructure.

keywords

Data Contracts; Data Mesh; Data Products; EOSC Integration; FAIR Data; Federated Governance

References

1. European Commission. (2024). Linked User-driven Multidisciplinary Exploration Network (LUMEN) [Horizon Europe project]. CORDIS. https://doi.org/10.3030/101187940

2. Supramanian, A. V. (2025). Data Mesh Architecture: Revolutionizing Enterprise Data Management through Decentralization. Int. J. of Sci. Res. in CSEIT, 11(2), 63–71. https://doi.org/10.32628/CSEIT251112387

3. Bitol Project. (2023). Open Data Contract Standard (ODCS). The Linux Foundation. https://github.com/bitol-oss/opendatacontract-standard

4. De Santis, L. (2024). FAIR as a Journey: Lessons Learned from Building the GoTriple Discovery Platform for Social Sciences and Humanities. Publications, 12, 26. https://doi.org/10.3390/publications12030026

5. Baglioni, M., Pavone, G., Mannocci, A., et al. (2025). Towards the interoperability of scholarly repository registries. International Journal on Digital Libraries, 26, 2. https://doi.org/10.1007/s00799-025-00414-y

6. Mons, B., Neylon, C., Velterop, J., Dumontier, M., da Silva Santos, L. O. B., & Wilkinson, M. D. (2017). Cloudy, increasingly FAIR; revisiting the FAIR Data guiding principles for the European Open Science Cloud. Information Services and Use, 37(1), 49–56. https://doi.org/10.3233/ISU-170824

7. Kohrs, F. E., Auer, S., Bannach-Brown, A., et al. (2023). Eleven strategies for making reproducible research and open science training the norm at research institutions. eLife, 12, e89736. https://doi.org/10.7554/eLife.89736