MRS (Metadata Registry System)

Overview

The SLICES research infrastructure will construct one of Europe’s most advanced scientific instruments in the field of digital sciences, fostering cutting-edge research and scientific data sharing in line with Europe’s open science policy. Metadata is a key enabler of open science: it facilitates the repeatability and reproducibility of experimental results by providing the information needed to understand, evaluate, and replicate a study’s methods, data, and findings, which enhances the integrity of scientific research. Carefully crafted metadata must adhere to open science principles, such as FAIR, to facilitate human- and machine-readability and machine-actionability, and to ensure compliance with legal and ethical regulations. However, due to the multi-disciplinary nature of the research that SLICES aims to support, opting for a single metadata standard presents several important challenges, particularly in usability, performance, and interoperability. Similarly, unifying several established, domain-specific metadata schemas and vocabularies is cumbersome and reduces the efficiency and effectiveness of resource discovery, access, and (re)use.

SLICES aims to overcome these challenges and allow its users and its ecosystem of platforms and systems to uniformly find, access, and use any digital object, such as data, services, tools, and software. To accomplish this, SLICES defines a hierarchical metadata profile scheme, combining domain-agnostic metadata for easy and uniform discovery of digital objects by both humans and machines and type- and domain-specific information for enhancing machine-actionability, allowing internal and external services to access more complex information about the object and take appropriate actions.

The Metadata Registry System (MRS) is an essential component of the SLICES Research Infrastructure, enabling efficient management and discovery of digital objects while ensuring findability, accessibility, interoperability, and reusability (FAIR principles). MRS plays a crucial role in the SLICES ecosystem by centralizing metadata for datasets, publications, software, tools, and internal resources.

This documentation provides a comprehensive look into MRS’s architecture, core components, and deployment strategies. Key sections include:

  • SLICES Metadata Models: Describes the types, versions, and structures of digital objects, such as datasets, publications, and services, utilized within MRS, supporting extensibility and adaptability over time.

  • Metadata Registry System (MRS): Acts as a centralized platform for managing and retrieving metadata on digital objects, with a focus on interoperability within and outside the SLICES network.

  • MRS Backend: An ASP.NET Core application that provides a RESTful API for metadata operations, supports authentication through SLICES AAI, and ensures versioned API compatibility.

  • MRS Web Portal: A user-friendly Angular-based SPA that enables users to interact with MRS via search, management, and reporting features, fully integrated with SLICES AAI for secure access.

  • MRS Deployment: Options for MRS deployment include local testing environments and production-like setups with reverse proxies, supporting scalability and flexibility across different infrastructure setups.

Metadata Models

SLICES aims to facilitate the management and sharing of the research data intuitively and safely, respecting the creators’ rights, while ensuring compliance with Open Science and FAIR principles. The SLICES metadata model is structured in a hierarchical manner, enabling flexibility and applicability across various research domains.

Design Objectives

  • Flexible Hierarchical Metadata Model (Domain-agnostic and Domain-specific): Since there is no “one-size-fits-all” metadata standard to address all current and future requirements, SLICES considers the utilization of a hierarchical model consisting of compulsory metadata attributes that are domain-agnostic and can describe any digital object (e.g., data, services) ensuring that it conforms to FAIR principles and beyond. Where appropriate, SLICES supports additional optional metadata attributes accompanied by their metadata model to further enhance the description of the object. Consequently, the model comprehensively describes data objects and supports a plethora of functions to query and retrieve them. The model is flexible and open, allowing extensions with new attributes or new types/categories as well as with new additional hierarchy levels.

  • Hybrid Metadata Production (human and machine-generated): SLICES streamlines metadata production using appropriate tools that can automatically generate/extract metadata (e.g., date of creation). Manual human-based metadata creation is supported (e.g., description, keywords) to enable user-defined metadata. Furthermore, predefined input validation workflows are employed to improve the quality of data.

  • Wide-ranging Interoperability: Metadata should include attributes to support different levels of interoperability: semantic, which allows internal and external systems to discover and understand what the underlying object is; legal, which describes the restrictions on the data; and technical, which enables systems to communicate effectively using appropriate catalogs and services. Furthermore, although metadata are stored in the SFDO core metadata format, SLICES must ensure interoperability with other systems (e.g., EOSC, aggregators) by supporting the transformation of the data to other metadata formats (e.g., Dublin Core, OpenAIRE). As such, SLICES-RI provides appropriate interoperability services that enable researchers/practitioners, content providers, funders, and research administrators to collaborate, or to use an existing platform to do so.

  • Long-term Reusability: It is expected that, over the lifetime of the SLICES project, the metadata model will evolve due to internal factors (e.g., support for new use cases) or external factors (e.g., the introduction of new standards or communication protocols). The metadata model must incorporate appropriate elements to describe the metadata model itself, including specific attributes such as the metadata version, and utilize services from which systems or users can obtain this information. Vocabularies and standards utilized by specific attributes should also be versioned. This will also enable mappings between different versions of the same metadata records, further enhancing reusability and interoperability.

  • Enhanced Discovery: Discovery should be further supported with descriptors created both manually (e.g., keywords) and automatically. Automatic descriptors may come from automatic metadata extraction (e.g., creation date, file size) or from built-in data analysis functions (e.g., term document frequency).

  • Metadata Quality Assurance: Metadata quality is important for ensuring reusability and interoperability with other infrastructures/platforms and applications that may want to consume data from SLICES. A Metadata Quality Assurance Framework should be put in place as part of the Data Governance Group with appropriate metrics to assess the quality of metadata, its FAIRness, and other objectives.

  • Metadata Governance: The members of the Data Governance Group should have the knowledge and authority to make decisions on how metadata are maintained, what format is being utilized, and how changes are authorized and audited.

Model Structure

SLICES implements a flexible hierarchical metadata model consisting of three levels, as illustrated in the figure below. The first level consists of compulsory domain-agnostic information that can describe any digital object (e.g., data, services, publications, tools), ensuring that it conforms to FAIR principles and beyond. It includes basic information, such as the object’s identification, description, and resource type. Management information, such as the version and the metadata profile used, is also included.

Diagram of Metadata Hierarchical Model

SLICES hierarchical metadata model

Additionally, SLICES employs a second level of compulsory metadata attributes that are type-specific to enhance machine-actionability for commonly used types of digital objects, such as data, services, and software. For example, a dataset may have a start and end date, while a facility may have an address.

Finally, the third level incorporates optional domain-specific attributes to further enhance interoperability for specific communities. For example, the SigMF standard is designed to record signal (e.g. wireless radio transmissions) data and includes some predefined metadata attributes, such as core:sample_rate, core:datatype and core:hw (hardware that was used for signal recording). This information is exposed as Level 3 SFDO attributes to facilitate enhanced discoverability across different such datasets.
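The three levels can be pictured as a small class hierarchy. The sketch below is illustrative only: the class names, the Level 1 and Level 2 field names, and the sample values are assumptions rather than the actual SFDO schema; only the SigMF attribute names come from the text above.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class DigitalObject:
    """Level 1: compulsory, domain-agnostic attributes (hypothetical names)."""
    identifier: str
    name: str
    description: str
    resource_type: str
    version: str
    metadata_profile: str
    # Level 3: optional domain-specific attributes, keyed by their own vocabulary.
    domain_attributes: dict = field(default_factory=dict)


@dataclass
class Dataset(DigitalObject):
    """Level 2: attributes specific to the dataset type (hypothetical names)."""
    # Defaults exist only to satisfy dataclass field ordering;
    # __post_init__ makes the attributes compulsory.
    start_date: Optional[str] = None
    end_date: Optional[str] = None

    def __post_init__(self):
        if self.start_date is None or self.end_date is None:
            raise ValueError("a dataset requires start_date and end_date")


recording = Dataset(
    identifier="example-0001",
    name="Example radio capture",
    description="Hypothetical signal recording.",
    resource_type="dataset",
    version="1.0",
    metadata_profile="sfdo-dataset",
    start_date="2024-01-01",
    end_date="2024-01-02",
    domain_attributes={
        "core:sample_rate": 1_000_000,
        "core:datatype": "cf32_le",
        "core:hw": "example SDR",
    },
)
```

Keeping Level 3 as an open mapping means a new community vocabulary can be added without changing the Level 1 or Level 2 classes.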

Metadata Catalogue

The SLICES metadata catalogue is the result of the analysis performed in the previous sections and of crosswalks between some of the most predominant metadata standards, such as RDA metadata IG, EOSC minimum metadata set, Dublin Core, DataCite 4.3, and DCAT 2.0.

The catalogue is composed of the following field categories:

  • Primary Information: Important information that is used to identify and describe the resource, such as internal and external identifiers

  • Management Information: Information related to the management of the SFDO, including the versioning and metadata profile used to interpret it

  • Access Information: The manner and mode in which the SFDO can be accessed

  • Classification Information: Classification of the SFDO in specific scientific domains

  • Publication Information: Specific publication information, such as the date of submission and acceptance

  • Financial Information: Information related to the financial aspects of access for the SFDO

  • Support Information: Support information, such as helpdesk URLs, manuals, terms of use, etc.

  • User Information: Information related to registered users. The information access mode is by default private and only identifiers are used

  • Attribution Information: Information related to the funding instruments (e.g., body, project) of the SFDO

  • Maturity Information: Fields used specifically with services and software that describe important information such as standards, open source technologies, and certifications

  • Spatio-Temporal Information: Location and time descriptors. Locations use geographic coordinates when applicable

  • Geographic Information: Specific geographic coordinate attributes, i.e., latitude, longitude, and altitude

  • Language Information: Language descriptors

  • Dataset Information: Specific dataset descriptors, such as format and size

  • Rights and terms of access: Legal information related to the rights of access, such as licenses, copyrights, etc.

  • Software: Software-related information, such as repository, documentation, and programming language used to develop the software.

Metadata Registry System (MRS)

The Metadata Registry System (MRS) is designed to support the findability, accessibility, interoperability, and reusability of digital objects across the SLICES ecosystem.

The main objectives of MRS are outlined below:

  • Findability of all digital objects: Ensures that systems and users can easily locate digital objects.

  • Implements SLICES FAIR Digital Object (SFDO): Supports various types of digital objects:

    • Common: Datasets, Publications, Software, Tools

    • Internal: Nodes, Sites, Provider, Monitoring

    • Meta: Metadata models, versioning

  • Query Engine: Provides a discovery engine for digital objects within the SLICES infrastructure.

  • Access and Management: A comprehensive API for CRUD operations and advanced functionality, secured by SLICES authentication and authorization.

MRS Architecture

The MRS provides access and management services for SFDOs using three main components, which are described below:

  • Repository: Metadata is stored in a dedicated repository implemented as a PostgreSQL database.

  • Backend: A backend system exposes the repository as a REST API, providing authentication, authorization, backward compatibility, maintenance, and other essential functions.

  • Web Portal: A web portal facilitates user interaction with the MRS.

Metadata Registry System (MRS) Architecture


Discovery

The MRS offers a versatile discovery engine to enable various search capabilities, accommodating both simple and advanced needs:

  • Simple search: Supports match, multi-match, and pattern match queries.

  • Advanced search: Allows multiple conditions and filters, including fuzzy search, synonyms, and other refinements.

  • Metadata-based search: Enables filtering and finding digital objects by specific types or metadata criteria.

Search Options: Additionally, search results can be sorted, advanced-sorted, paginated, and limited by number to support user-defined queries effectively.
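As a rough illustration of the sorting, pagination, and limiting options, the following sketch applies them to an in-memory list. The function name, parameter names, and record fields are hypothetical; the actual engine performs these steps on the server side.

```python
def page_of_results(results, sort_key, page=1, page_size=10, descending=False):
    """Sort a list of metadata dicts and return a single page of it."""
    ordered = sorted(results, key=lambda r: r[sort_key], reverse=descending)
    start = (page - 1) * page_size
    return ordered[start:start + page_size]


# Hypothetical search hits.
objects = [
    {"name": "c", "byte_size": 300},
    {"name": "a", "byte_size": 100},
    {"name": "b", "byte_size": 200},
]

first_page = page_of_results(objects, "name", page=1, page_size=2)
second_page = page_of_results(objects, "name", page=2, page_size=2)
```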

Interoperability

The MRS enhances interoperability by allowing stored metadata to be compatible with external platforms and systems:

  • Converters: Built-in converters enable seamless interoperability with platforms like EOSC and support future expansion to other systems.

  • Export and Import: Selected metadata can be exported to the EOSC catalogue (e.g., services, datasets) and may be expanded to other platforms. Additionally, the MRS supports importing metadata, facilitating cross-platform integration.

  • Mapping Maintenance: To support interoperability, the system maintains mappings between the SLICES SFDO model and other models, with support for exporting to standard formats like Dublin Core, DataCite, and RDA.

  • Automated Updates: The backend forwards updates to the EOSC API automatically while omitting updates to non-EOSC metadata.
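A mapping of this kind can be thought of as a lookup table applied to each record. The sketch below is a minimal illustration, not the built-in converter: the SFDO attribute names on the left are assumptions, while the names on the right are standard Dublin Core elements.

```python
# Hypothetical SFDO attribute names mapped to Dublin Core element names.
SFDO_TO_DUBLIN_CORE = {
    "name": "title",
    "description": "description",
    "identifier": "identifier",
    "resource_type": "type",
    "keywords": "subject",
    "license": "rights",
    "language": "language",
}


def to_dublin_core(sfdo: dict) -> dict:
    """Map the attributes that have a Dublin Core counterpart; ignore the rest."""
    return {
        dc_element: sfdo[attribute]
        for attribute, dc_element in SFDO_TO_DUBLIN_CORE.items()
        if attribute in sfdo
    }


record = {
    "name": "Example capture",
    "keywords": ["5G", "radio"],
    "byte_size": 1024,  # no Dublin Core counterpart in this table, so it is dropped
}
exported = to_dublin_core(record)
```

Keeping the mapping as data rather than code makes it straightforward to maintain one table per target format.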

Reusability of Digital Objects

MRS enhances reusability by supporting type/domain-specific metadata and automatic metadata production, improving both machine-actionability and interoperability:

  • Includes type/domain-specific attributes for more precise metadata.

  • Automates metadata production for streamlined data handling and reuse.

Implementation

This section provides information about the implementation of MRS.

Database

MRS uses a dedicated repository for storing the metadata models and the digital objects. An important requirement for metadata persistence was to use an open-source solution. PostgreSQL was selected due to its extensive feature set, reliable track record, and widespread familiarity. However, this component can be replaced in the future if new requirements arise, e.g., scaling or replication. For example, given the nature of MRS storage and access patterns, the storage could be switched to a NoSQL document or graph database engine.

SFDOs are stored using a Table per Hierarchy (TPH) method, i.e., SFDOs of different Level 2 types share the same database table. To accommodate this, Level 2 attributes (columns) are set as nullable, even if they are required. As the database is only ever accessed by the backend (see next section), the correct schema is still enforced before data is persisted. If the database were accessed directly by multiple systems, check constraints could be used to enforce non-nullability based on the discriminator column.
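The check-constraint idea can be tried out in a few lines. The sketch below uses SQLite from the Python standard library instead of PostgreSQL so that it runs without a server; the table and column names are hypothetical and do not reflect the actual MRS schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE digital_objects (
        id            INTEGER PRIMARY KEY,
        discriminator TEXT NOT NULL,   -- Level 2 type of the row
        name          TEXT NOT NULL,   -- Level 1 attribute, always required
        byte_size     INTEGER,         -- dataset-only attribute, nullable
        repository    TEXT,            -- software-only attribute, nullable
        CHECK (discriminator <> 'dataset'  OR byte_size  IS NOT NULL),
        CHECK (discriminator <> 'software' OR repository IS NOT NULL)
    )
    """
)

# A complete dataset row is accepted.
conn.execute(
    "INSERT INTO digital_objects (discriminator, name, byte_size) VALUES (?, ?, ?)",
    ("dataset", "Example capture", 1024),
)


def insert_rejected(sql, params):
    """Return True if the database refuses the row."""
    try:
        conn.execute(sql, params)
    except sqlite3.IntegrityError:
        return True
    return False


# A dataset row without byte_size violates the first CHECK constraint.
missing_size_rejected = insert_rejected(
    "INSERT INTO digital_objects (discriminator, name) VALUES (?, ?)",
    ("dataset", "Broken capture"),
)
```

Each constraint reads as "either the row is not of this type, or the type's required column is present", which is how nullable shared columns can still be made compulsory per type.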

Backend

The backend is the core of the system, implemented using ASP.NET Core Web API and Entity Framework Core, alongside several widely adopted open-source libraries such as NSwag for OpenAPI specification generation, Asp.Versioning for API versioning, and QueryKit for advanced search capabilities. This structure lowers the barrier for contribution and ensures robust functionality.

SFDO and its attributes are defined as C# classes, with an abstract class for Level 1 and specific classes for each Level 2 type inheriting from this abstract class. The backend exposes metadata as a REST API, documented by an automatically generated OpenAPI 3.0 specification, facilitating interaction with the MRS across various programming languages and frameworks.

MRS REST APIs.


All APIs are versioned to maintain backward compatibility, with versions specified in the URL path. This design allows for seamless integration with existing clients, even as metadata evolves. The backend handles any required translation, allowing the underlying storage format to be decoupled from any client interaction. This allows for scenarios such as:

  • A new attribute is added. Older clients do not receive this attribute. When older clients create a new SFDO, the new attribute is set to its default value (e.g. null) or a predefined placeholder (e.g. ‘N/A’). When older clients apply a differential update, the attribute’s value is not changed.

  • A new Level 2 type is added. Clients using older versions either cannot see SFDOs of the new type or receive a generic ‘other’ type.

  • A single-value item is promoted to an array. Older clients receive only the first element (or null if there are no elements).
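The first and third scenarios amount to a small down-conversion step applied before a response is sent to an older client. The following is a minimal sketch under that reading; the function name, its parameters, and the sample attributes are hypothetical, and the real backend performs this translation in C#.

```python
def to_older_version(record: dict, added_attributes: set, promoted_to_array: set) -> dict:
    """Return the view of a stored record that an older API version would expose."""
    older = {}
    for key, value in record.items():
        if key in added_attributes:
            continue  # older clients never see attributes added later
        if key in promoted_to_array:
            value = value[0] if value else None  # first element, or null if empty
        older[key] = value
    return older


current = {
    "name": "Example",
    "creators": ["Ada", "Grace"],
    "embargo_date": "2030-01-01",
}
legacy = to_older_version(
    current,
    added_attributes={"embargo_date"},
    promoted_to_array={"creators"},
)
```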

MRS Versioned Models Diagram.

MRS Versioning

Any change to the metadata is accompanied by an increase in the metadata/API version number. While this approach may appear to add a lot of overhead for MRS maintainers, metadata is not expected to change often, and backward compatibility of a core service is very important for ensuring the longevity of the platform. Currently, the versioning scheme uses the ‘v{major}.{minor}’ format. From v1.0 onwards, minor versions are restricted to only adding new attributes and SFDO types. The API paths for SFDO CRUD (Create, Read, Update, Delete) operations are as follows:

  • Create: POST /v[apiVersion]/digital-objects, Body: attribute values

  • Read: GET /v[apiVersion]/digital-objects/id

  • Update: PUT /v[apiVersion]/digital-objects/id, Body: attribute values

  • Delete: DELETE /v[apiVersion]/digital-objects/id

For instance, to read an SFDO with ID 13 using metadata version 0.1, the request would be: GET /v0.1/digital-objects/13.
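The path scheme can be captured in a small helper that a client might use to build requests. This is a sketch: the function name and the validation rule are assumptions, while the paths and HTTP methods follow the list above.

```python
import re

# Matches the '{major}.{minor}' part of the version, e.g. "0.1" or "1.0".
VERSION_PATTERN = re.compile(r"^\d+\.\d+$")


def crud_request(operation: str, api_version: str, object_id=None):
    """Return the (HTTP method, path) pair for an SFDO CRUD operation."""
    if not VERSION_PATTERN.match(api_version):
        raise ValueError(f"expected '<major>.<minor>', got {api_version!r}")
    collection = f"/v{api_version}/digital-objects"
    if operation == "create":
        return ("POST", collection)
    if object_id is None:
        raise ValueError(f"operation {operation!r} requires an object id")
    methods = {"read": "GET", "update": "PUT", "delete": "DELETE"}
    return (methods[operation], f"{collection}/{object_id}")


read_call = crud_request("read", "0.1", 13)
```

Here `read_call` holds the method and path of the example request given above.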

While the backend is designed to be as independent as possible, authentication is delegated to the SLICES-global Authentication and Authorization Infrastructure (AAI), as mentioned above, to ensure uniform user access across the entire infrastructure.

Web Portal

The web portal provides a user-friendly interface for accessing MRS functionalities, including searching for and managing digital objects (if permissions allow) and accessing reporting features such as dashboards. Accessible to all registered users, the portal enhances usability across the platform.

Implemented as a single-page application (SPA) using Angular 16, the web portal communicates with the backend using RESTful JSON APIs, consistent with other client applications. User authentication and authorization are managed through the global SLICES Authentication and Authorization Infrastructure (AAI), where users obtain an access token (JWT) to be included as a header in backend API calls.

Simple search allows quick filtering by the SFDO name alone and is designed primarily for quick user-led discovery through the web portal.

Web Portal Discovery Search

MRS Discovery Search

Advanced search provides a dedicated query language for arbitrary per-field filtering of SFDOs. Table 4 provides examples of various query expressions, including:

  • Identifier _= “https://doi.org” - Identifier starts with “https://doi.org”

  • ByteSize > 1000 - Size is larger than 1000 bytes

  • CreatedAt > 2022-07-01 - SFDO is created later than 2022-07-01

  • Creators.FirstName @=* “John” (case sensitive) - First name of any creator contains “John”

  • Keywords ^$ “5G” - Keywords contains a “5G” entry (case sensitive)

Furthermore, the query expressions can be combined using and/or operators.
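Combining expressions is plain string composition on the client side. The sketch below makes several assumptions that the text does not confirm: that the logical operators are spelled `&&` and `||`, that parentheses may be used for grouping, that the comparison and collection operators are written as shown, and that the expression is sent in a query parameter named `filters`.

```python
from urllib.parse import urlencode


def combine(expressions, operator="&&"):
    """Join query expressions with a logical operator, parenthesising each one.

    The operator spelling ("&&" / "||") is an assumption.
    """
    return f" {operator} ".join(f"({expression})" for expression in expressions)


query = combine(['ByteSize > 1000', 'Keywords ^$ "5G"'])

# Hypothetical parameter name; urlencode percent-escapes the expression for a URL.
query_string = urlencode({"filters": query})
```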

Web Portal Advanced Search

MRS Advanced Search

Web Portal Top Menu

MRS Top Menu

MRS provides visibility into its operations using a reporting dashboard, presenting usage and monitoring information, such as the number of stored SFDOs and types.

Web Portal Dashboard

MRS Dashboard

Web Portal Results

MRS Results

Web Portal Management

MRS Management

Deployment

Terminology

  • MRS: Metadata Registry System

  • Service: Individual applications within the MRS, corresponding to Docker Compose services

  • OIDC: OpenID Connect

  • IdP: Identity Provider

  • SSO: Single Sign-On

  • SPA: Single Page Application

  • HA: High Availability

An MRS deployment consists of two main components: the storage service (database) and the backend, which manipulates the data in the database. Together, these two components allow authenticated systems to interact with MRS. A sample portal is also included to make the system usable by human users.

The MRS manages metadata for digital objects in the SLICES-RI ecosystem and consists of:

  • Database: PostgreSQL for metadata storage

  • Backend: ASP.NET Core application for data manipulation

  • Portal: Angular-based SPA for user access

Additional infrastructure includes an OIDC-based IdP/SSO with Keycloak, a load balancer/reverse proxy for SSL termination and routing, and optional pgAdmin 4 for database management in development setups.

MRS is designed with flexibility and adaptability in mind, allowing for deployment and integration across a wide range of existing infrastructures. This versatility positions MRS as an ideal solution for managing metadata across diverse systems within the SLICES ecosystem.

MRS Standalone Implementation

The standalone implementation of MRS operates with a backend that communicates directly with the database. User authentication is managed through the AAI, with the backend and portal verifying access. Users access services via the system’s ingress points, ensuring secure and direct access to data and services.

MRS Standalone Implementation Diagram.

MRS Standalone Implementation

Preliminary Preparation

If using Docker Desktop for Mac, initialize the system by running the following commands:

chmod +x containers/postgres/init/init-keycloak.sh
chmod +x containers/postgres/init/init-mrs.sh

Source: Stack Overflow

Methodologies

MRS supports flexible deployment configurations, from HA/load-balanced setups to CDN-served portals. For ease of deployment, two Docker Compose configurations are provided:

Option 1: Localhost Ports

Intended for quick tests, this option deploys services on distinct ports without a reverse proxy. It is found in the localhost-ports directory.

Option 2: Domains

This production-like setup deploys services on dedicated domains (e.g., backend.mrs.slices-ri.eu). Requirements include:

  • A pre-existing Traefik reverse proxy with Docker provider support

  • HTTPS with certificates configured in Traefik

This option is located in the domains directory.

Other Options: K8s and More

The Docker Compose configurations provide baseline examples for deploying MRS across various environments. Traefik, a leading cloud-native reverse proxy, is preferred for Docker Compose as it dynamically configures based on container labels. Reference configurations are available in traefik/compose.yml, with further guidance in the Traefik documentation.

Note: HTTPS certificates must be provided by infrastructure management; otherwise, Traefik will generate a self-signed certificate.

Docker Compose Configuration

Environment variables for container configuration are provided via .env files. A template (.env.dist) is included with default values and explanations.
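Before starting the stack, it can help to confirm that the variables referenced in this guide have been filled in. The checker below is a hedged sketch: it handles only simple KEY=VALUE lines, the sample values are made up, and it is not part of the MRS repository.

```python
def parse_env(text: str) -> dict:
    """Parse simple KEY=VALUE lines, skipping blank lines and comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def missing_variables(text: str, required) -> list:
    """Return the required variable names that are absent or empty."""
    values = parse_env(text)
    return [name for name in required if not values.get(name)]


# Hypothetical file contents for illustration.
sample = """
# Portal host and Keycloak admin password
MRS_C_PORTAL_HOST=portal.mrs.example.org
MRS_C_KEYCLOAK_ADMIN_PASSWORD=
"""

missing = missing_variables(
    sample, ["MRS_C_PORTAL_HOST", "MRS_C_KEYCLOAK_ADMIN_PASSWORD"]
)
```

In practice the text would be read from the `.env` file, e.g. with `open(".env").read()`.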

Post-Deployment Actions

Once the stack is running, set up a user in Keycloak:

  • Log in to the Keycloak admin panel with admin as the username and the password set in [MRS_C_KEYCLOAK_ADMIN_PASSWORD] in .env.

  • Switch to the slices realm.

  • Navigate to Users in the sidebar, click New User, and complete the form.

  • Under Credentials, set a password and uncheck “temporary password” if desired.

Access the system by navigating to the URL set in MRS_C_PORTAL_HOST.

5G Blueprint MRS Integration Architecture

This section illustrates the integration of MRS within the 5G Blueprint, demonstrating its interoperability with other networked components.

5G Blueprint MRS Integration Architecture Diagram.

5G Blueprint MRS Integration Architecture

MRS operates as a component of the Data Management Infrastructure (DMI). DMI orchestrates both MRS and the storage service to store any digital object created during experimentation. The high-level architecture of MRS is depicted below.

MRS-DMI High-level Architecture Diagram.

MRS-DMI High-level Architecture

The API for creating content involves a request with the metadata model, which returns a pre-signed content URL. Content can be uploaded using a standard HTTP PUT request (e.g., via curl). The 5G-BP “provisioner” makes two calls: one to the DMI to get a URL (according to the PID) and another to the Storage Service to upload the file at that URL, avoiding the need to embed large files in the first call.
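The second call of this flow, the upload itself, is an ordinary HTTP PUT. The sketch below only constructs the request with the Python standard library and does not send it; the URL is a placeholder standing in for whatever pre-signed URL the first call returns.

```python
import urllib.request


def build_upload_request(presigned_url: str, payload: bytes) -> urllib.request.Request:
    """Prepare the PUT request that uploads content to a pre-signed URL."""
    return urllib.request.Request(presigned_url, data=payload, method="PUT")


# Placeholder URL; a real one is returned by the DMI for the object's PID.
upload = build_upload_request(
    "https://storage.example.org/upload/abc?signature=placeholder",
    b"sample bytes",
)

# Sending is left out of the sketch:
# with urllib.request.urlopen(upload) as response:
#     print(response.status)
```

Because the metadata call and the content upload are separate, the first request stays small regardless of the file size.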

MRS API Creation.


For API retrieval, a request is made with the SFDO ID, and the response includes the metadata along with a pre-signed download URL, which can be used to download the file via a standard HTTP GET request (e.g., via curl).

MRS API Retrieval.


Authentication and authorization are managed via Bearer JWTs issued by the SLICES OpenID Connect Secure Token Service (STS). Currently, the authorization layer operates in a binary mode, allowing authenticated users to perform mutating operations while unauthenticated users are restricted. Future updates will inherit authorization from the SLICES AAI, enabling more granular access control based on user project participation.
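The binary mode can be summarised as a single decision function. The sketch below rests on two assumptions not stated in the text: that unauthenticated users retain read-only access, and that token validation is delegated to a caller-supplied function (real validation would verify the JWT signature and claims against the STS).

```python
# Assumption: read-only HTTP methods stay open to unauthenticated users.
SAFE_METHODS = {"GET", "HEAD", "OPTIONS"}


def bearer_token(headers: dict):
    """Extract the token from an 'Authorization: Bearer <token>' header, if any."""
    value = headers.get("Authorization", "")
    scheme, _, token = value.partition(" ")
    return token if scheme.lower() == "bearer" and token else None


def is_allowed(method: str, headers: dict, token_is_valid) -> bool:
    """Binary authorization: mutating requests require a valid bearer token."""
    if method.upper() in SAFE_METHODS:
        return True
    token = bearer_token(headers)
    return token is not None and token_is_valid(token)


# Hypothetical validator standing in for real JWT verification.
def demo_validator(token: str) -> bool:
    return token == "good-token"


anonymous_read = is_allowed("GET", {}, demo_validator)
anonymous_write = is_allowed("POST", {}, demo_validator)
authenticated_write = is_allowed(
    "POST", {"Authorization": "Bearer good-token"}, demo_validator
)
```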