Open Data Floods

— Metadata and RDF Assertion

In our implementation, metadata is expressed using the Resource Description Framework (RDF), a W3C standard that structures information as a set of triples—subject, predicate, and object. This model enables precise, machine-readable assertions about datasets, including their title, publisher, license, themes, spatial and temporal coverage, and distribution formats. RDF metadata assertions facilitate linking and federating datasets across different institutions and domains, thereby supporting the principles of Linked Open Data and the Semantic Web.
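The sketch below makes the triple model concrete. It writes a few assertions about one dataset as (subject, predicate, object) tuples and serializes them as N-Triples, using only the Python standard library. The dataset IRI, the publisher IRI and the title are hypothetical placeholders. The dcat: and dcterms: namespace IRIs are the published W3C and DCMI ones. A production pipeline would more likely use an RDF library, and the escaping here covers only backslashes, quotes and newlines.

```python
from typing import List, Optional, Tuple

# Published namespace IRIs for the vocabularies used below.
DCAT = "http://www.w3.org/ns/dcat#"
DCTERMS = "http://purl.org/dc/terms/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def iri(value: str) -> str:
    """Render an IRI term in N-Triples syntax."""
    return f"<{value}>"


def literal(value: str, lang: Optional[str] = None) -> str:
    """Render a string literal, escaping backslashes, quotes, and newlines."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"@{lang}' if lang else f'"{escaped}"'


def to_ntriples(triples: List[Tuple[str, str, str]]) -> str:
    """Write one 'subject predicate object .' statement per line."""
    return "".join(f"{s} {p} {o} .\n" for s, p, o in triples)


# Hypothetical identifiers and title, for illustration only.
dataset = iri("https://example.org/odf/dataset/s2id")
triples = [
    (dataset, iri(RDF_TYPE), iri(DCAT + "Dataset")),
    (dataset, iri(DCTERMS + "title"), literal("S2ID disaster records", "en")),
    (dataset, iri(DCTERMS + "publisher"), iri("https://example.org/odf/agent/publisher")),
]

print(to_ntriples(triples), end="")
```

Each output line is a complete, self-describing statement. This property is what lets metadata from different institutions be merged by concatenation and later linked.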

Considering the Brazilian context and national open data policies, especially those set out on the Plataforma de Dados Abertos (dados.gov.br), we implemented DCAT Version 3. DCAT-BR, a specialization of DCAT, is the Brazilian vocabulary and standard for describing public sector datasets and linked data.

DCAT, or Data Catalog Vocabulary, is an RDF vocabulary developed by the W3C (World Wide Web Consortium) to facilitate interoperability between data catalogs published on the Web. DCAT provides a standard model and vocabulary for describing datasets and data services, enabling people and machines to find, access, and use them more effectively.

People wade through an area flooded by heavy rains in Porto Alegre, Rio Grande do Sul state, Brazil, May 6, 2024.

This approach is fully aligned with Brazil’s legislative framework for transparency and open government, particularly:

Lei nº 12.527/2011 (Lei de Acesso à Informação, the Access to Information Law): guarantees citizens' access to public information;
Decreto nº 8.777/2016: establishes the Política de Dados Abertos do Poder Executivo federal (Open Data Policy of the Federal Executive Branch), encouraging the proactive release of government data;
Estratégia de Governo Digital (EGD, Digital Government Strategy): promotes data interoperability and digital innovation in public administration.

By adopting RDF-based metadata assertions and adhering to DCAT Version 3, we make our datasets semantically interoperable with both national and international frameworks.

— Use of Ontologies and Linked Data

DCAT (Data Catalog Vocabulary)

DCAT plays a central role in describing datasets and catalogs on the Web. Its key properties are:
- dcat:dataset, which associates datasets with a catalog;
- dcat:theme, which classifies datasets by subject (e.g., Population and Society, Civil Protection, Environment);
- dcat:distribution, which links each dataset to its available distributions, such as CSV, JSON or RDF files.
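The catalog, dataset and distribution relationships described above can be sketched in Turtle. The snippet below is a minimal writer that uses only the standard library; it is not the project's actual serializer. The ex: namespace, the resource names, the theme IRI and the download URL are all hypothetical. The distribution's format is stated with dcat:mediaType and an IANA media-type IRI, which is one common DCAT 3 pattern.

```python
from typing import Dict, List, Tuple

PREFIXES = {
    "dcat": "http://www.w3.org/ns/dcat#",
    "dcterms": "http://purl.org/dc/terms/",
    "ex": "https://example.org/odf/",  # hypothetical project namespace
}


def to_turtle(graph: Dict[str, List[Tuple[str, str]]]) -> str:
    """Serialize {subject: [(predicate, object), ...]} as Turtle, one block per subject."""
    lines = [f"@prefix {prefix}: <{ns}> ." for prefix, ns in PREFIXES.items()]
    for subject, pairs in graph.items():
        body = " ;\n    ".join(f"{pred} {obj}" for pred, obj in pairs)
        lines.append(f"\n{subject} {body} .")
    return "\n".join(lines) + "\n"


# All ex: resources, the title, and the download URL are hypothetical.
graph = {
    # dcat:dataset links the catalog to each dataset it lists.
    "ex:catalog": [("a", "dcat:Catalog"), ("dcat:dataset", "ex:inmet-rainfall")],
    # dcat:theme classifies the dataset; dcat:distribution points to a concrete file.
    "ex:inmet-rainfall": [
        ("a", "dcat:Dataset"),
        ("dcterms:title", '"INMET rainfall observations"@en'),
        ("dcat:theme", "ex:theme-environment"),
        ("dcat:distribution", "ex:inmet-rainfall-csv"),
    ],
    "ex:inmet-rainfall-csv": [
        ("a", "dcat:Distribution"),
        ("dcat:downloadURL", "<https://example.org/downloads/inmet-rainfall.csv>"),
        ("dcat:mediaType", "<https://www.iana.org/assignments/media-types/text/csv>"),
    ],
}

print(to_turtle(graph), end="")
```

The format is modelled on the distribution rather than on the dataset. Because of this, a single dataset can offer CSV, JSON and RDF files side by side, each with its own download URL.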

DCTERMS (Dublin Core Terms)

DCTERMS provides essential metadata elements such as dcterms:title (the dataset title), dcterms:description and dcterms:accessRights. dcterms:accessRights states whether a dataset is public, restricted or confidential. Together, these elements support alignment with Lei nº 12.527/2011 (Lei de Acesso à Informação) and other open government data policies in Brazil.
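A catalog can check these DCTERMS fields mechanically before publishing. The sketch below is a hypothetical pre-publication check over a flat record. The plain labels public, restricted and confidential come from the text above. A real catalog would normally point dcterms:accessRights at IRIs from a controlled vocabulary, which this document does not specify.

```python
from typing import Dict, List

REQUIRED = ("dcterms:title", "dcterms:description", "dcterms:accessRights")
# Plain labels taken from the text; the actual access-rights IRIs are an open assumption.
ACCESS_LEVELS = {"public", "restricted", "confidential"}


def check_dcterms(record: Dict[str, str]) -> List[str]:
    """Return a list of problems; an empty list means the record passes."""
    problems = [f"missing {key}" for key in REQUIRED if not record.get(key, "").strip()]
    access = record.get("dcterms:accessRights", "").strip()
    if access and access not in ACCESS_LEVELS:
        problems.append(f"unknown access level: {access}")
    return problems


# Hypothetical records, for illustration only.
print(check_dcterms({
    "dcterms:title": "S2ID disaster records",
    "dcterms:description": "Placeholder description.",
    "dcterms:accessRights": "public",
}))
print(check_dcterms({"dcterms:title": "Draft dataset", "dcterms:accessRights": "secret"}))
```

The first record passes with an empty list. The second record is reported as missing its description and as using an access level outside the allowed set.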

ADMS (Asset Description Metadata Schema)

ADMS complements DCAT by enabling the description of data assets, services and public sector information, playing a vital role in the management and cataloging of government data under initiatives

CC (Creative Commons)

Creative Commons vocabularies (e.g., cc:license) specify the licensing terms of each dataset. This makes data usage rights transparent and encourages responsible reuse, in line with Brazil’s open data commitments.
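The helper below sketches how a license assertion might be attached. It maps a short license key to its Creative Commons deed IRI and emits a cc:license triple in N-Triples. This document does not state which license each dataset carries. The two licenses, their key names and the dataset IRI are therefore illustrative assumptions. An unknown key raises an error instead of silently publishing a dataset without usage terms.

```python
from typing import Dict

CC_LICENSE = "http://creativecommons.org/ns#license"
# Illustrative subset; the keys are hypothetical shorthand, the IRIs are CC deed URLs.
LICENSES: Dict[str, str] = {
    "CC-BY-4.0": "https://creativecommons.org/licenses/by/4.0/",
    "CC0-1.0": "https://creativecommons.org/publicdomain/zero/1.0/",
}


def license_triple(dataset_iri: str, license_key: str) -> str:
    """Return one N-Triples line asserting cc:license for the dataset."""
    if license_key not in LICENSES:
        raise ValueError(f"no license IRI registered for {license_key!r}")
    return f"<{dataset_iri}> <{CC_LICENSE}> <{LICENSES[license_key]}> .\n"


# Hypothetical dataset IRI.
print(license_triple("https://example.org/odf/dataset/inmet-rainfall", "CC-BY-4.0"), end="")
```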

PROV (Provenance Ontology) and FOAF (Friend of a Friend)

PROV and FOAF are used to describe data provenance and the agents (individuals or institutions) responsible for generating and curating the data. This supports traceability, accountability and long-term data stewardship.
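Provenance links become useful when they can be followed. The sketch below does three things:
- stores a few PROV and FOAF statements as prefixed-name triples;
- walks prov:wasDerivedFrom breadth-first to list every upstream source of a derived dataset;
- reports which agent each source is attributed to.

The resource names (ex:flood-risk-index and the others) are hypothetical. Only the prov:, foaf: and rdf: terms are real vocabulary terms.

```python
from collections import deque
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]

# Hypothetical provenance graph built from real PROV-O and FOAF terms.
GRAPH: Set[Triple] = {
    ("ex:flood-risk-index", "prov:wasDerivedFrom", "ex:inmet-rainfall"),
    ("ex:flood-risk-index", "prov:wasDerivedFrom", "ex:s2id-records"),
    ("ex:inmet-rainfall", "prov:wasDerivedFrom", "ex:inmet-station-raw"),
    ("ex:inmet-rainfall", "prov:wasAttributedTo", "ex:inmet"),
    ("ex:inmet", "rdf:type", "foaf:Organization"),
    ("ex:inmet", "foaf:name", '"INMET"'),
}


def objects(subject: str, predicate: str) -> List[str]:
    """All objects o of (subject, predicate, o), sorted for deterministic output."""
    return sorted(o for s, p, o in GRAPH if s == subject and p == predicate)


def lineage(dataset: str) -> List[str]:
    """Breadth-first walk over prov:wasDerivedFrom; each source is listed once, so cycles terminate."""
    seen: Set[str] = set()
    order: List[str] = []
    queue = deque([dataset])
    while queue:
        for source in objects(queue.popleft(), "prov:wasDerivedFrom"):
            if source not in seen:
                seen.add(source)
                order.append(source)
                queue.append(source)
    return order


for source in lineage("ex:flood-risk-index"):
    print(source, "attributed to:", objects(source, "prov:wasAttributedTo"))
```

Sources without an attribution print an empty list. In a real catalog, that gap is exactly the kind of stewardship issue this traceability is meant to surface.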

SKOS (Simple Knowledge Organization System)

SKOS structures the controlled vocabularies and taxonomies used to tag and classify datasets, improving semantic interoperability and searchability.
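One way SKOS improves searchability is query expansion along skos:broader. A dataset tagged with a narrow concept should also be found when someone searches for a broader one. The sketch below computes the transitive closure of skos:broader, which SKOS names skos:broaderTransitive, over a small hypothetical concept scheme. The concept names and labels are assumptions, not the project's actual taxonomy.

```python
from typing import Dict, List

# Hypothetical scheme: concept -> its direct skos:broader concepts.
BROADER: Dict[str, List[str]] = {
    "ex:urban-flooding": ["ex:flooding"],
    "ex:riverine-flooding": ["ex:flooding"],
    "ex:flooding": ["ex:civil-protection"],
    "ex:civil-protection": [],
}
# Hypothetical skos:prefLabel values for display.
PREF_LABEL = {
    "ex:urban-flooding": "Urban flooding",
    "ex:riverine-flooding": "Riverine flooding",
    "ex:flooding": "Flooding",
    "ex:civil-protection": "Civil Protection",
}


def broader_transitive(concept: str) -> List[str]:
    """All ancestors reachable through skos:broader, each listed once."""
    result: List[str] = []
    stack = list(BROADER.get(concept, []))
    while stack:
        current = stack.pop()
        if current not in result:
            result.append(current)
            stack.extend(BROADER.get(current, []))
    return result


def matches(dataset_tags: List[str], query: str) -> bool:
    """True if any tag equals the query concept or has it as an ancestor."""
    return any(tag == query or query in broader_transitive(tag) for tag in dataset_tags)


print([PREF_LABEL[c] for c in broader_transitive("ex:urban-flooding")])
print(matches(["ex:urban-flooding"], "ex:civil-protection"))
```

Expansion runs only upward. A search for "Urban flooding" does not return a dataset tagged only with the broader "Flooding" concept.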

— Dataset Catalog Serialization

To improve interoperability and semantic accessibility, all datasets used in this project were described using RDF (Resource Description Framework). We followed standard vocabularies, such as the W3C's DCAT and the DCMI's Dublin Core Terms, to create machine-readable metadata for each dataset and catalog.

Two Turtle (.ttl) files were created:

serialization_catalog.ttl – describes the overall dataset catalog, including title, publisher, license, and theme.
serialization_datasets.ttl – provides metadata for individual datasets (S2ID, INMET, IBGE, MapBiomas, etc.), including distribution URLs, formats, and temporal/spatial coverage.
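The function below is a rough sketch of how serialization_datasets.ttl could be generated. It turns a list of plain records into DCAT 3 Turtle:
- temporal coverage as a dcterms:PeriodOfTime carrying dcat:startDate and dcat:endDate;
- spatial coverage as dcterms:spatial;
- a distribution with its download URL and media type.

The record fields, ex: names, dates and URLs are placeholders, not the project's real metadata. Titles are escaped only for quotes and backslashes.

```python
import tempfile
from pathlib import Path
from typing import Dict, List

HEADER = """@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix ex: <https://example.org/odf/> .
"""


def dataset_block(rec: Dict[str, str]) -> str:
    """Render one dcat:Dataset with temporal, spatial, and distribution metadata."""
    title = rec["title"].replace("\\", "\\\\").replace('"', '\\"')
    return (
        f"\nex:{rec['id']} a dcat:Dataset ;\n"
        f'    dcterms:title "{title}" ;\n'
        f"    dcterms:spatial ex:{rec['spatial']} ;\n"
        f"    dcterms:temporal [ a dcterms:PeriodOfTime ;\n"
        f'        dcat:startDate "{rec["start"]}"^^xsd:date ;\n'
        f'        dcat:endDate "{rec["end"]}"^^xsd:date ] ;\n'
        f"    dcat:distribution [ a dcat:Distribution ;\n"
        f"        dcat:downloadURL <{rec['url']}> ;\n"
        f"        dcat:mediaType <https://www.iana.org/assignments/media-types/{rec['media']}> ] .\n"
    )


def write_datasets(records: List[Dict[str, str]], out_dir: Path) -> Path:
    """Write all records to out_dir/serialization_datasets.ttl and return the path."""
    path = out_dir / "serialization_datasets.ttl"
    path.write_text(HEADER + "".join(dataset_block(r) for r in records), encoding="utf-8")
    return path


# Placeholder records; real IDs, dates, and URLs would come from the project.
RECORDS = [
    {"id": "s2id-records", "title": "S2ID disaster records", "spatial": "rio-grande-do-sul",
     "start": "2024-04-01", "end": "2024-06-30", "url": "https://example.org/downloads/s2id.csv",
     "media": "text/csv"},
    {"id": "inmet-rainfall", "title": "INMET rainfall observations", "spatial": "porto-alegre",
     "start": "2024-04-01", "end": "2024-06-30", "url": "https://example.org/downloads/inmet.json",
     "media": "application/json"},
]

with tempfile.TemporaryDirectory() as tmp:
    print(write_datasets(RECORDS, Path(tmp)).read_text(encoding="utf-8"))
```

A file for serialization_catalog.ttl could be produced the same way. It would hold a single dcat:Catalog block that lists these datasets with dcat:dataset.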

These serializations enable integration into open data portals, querying with SPARQL and visualization with RDF tools.
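The sketch below shows what a SPARQL query over these files computes. It evaluates a basic graph pattern, the core of a SPARQL WHERE clause, by joining triple patterns one at a time against an in-memory set of triples. It is not a SPARQL engine: there is no parsing, no FILTER or OPTIONAL, and no prefix resolution. A real deployment would load the .ttl files into an RDF store or library. The graph contents and ex: names are hypothetical, and the pattern corresponds to the query in the leading comment.

```python
from typing import Dict, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]

# Roughly equivalent SPARQL ("a" abbreviates rdf:type):
#   SELECT ?dataset ?title WHERE {
#     ?dataset a dcat:Dataset ;
#              dcat:theme ex:theme-civil-protection ;
#              dcterms:title ?title .
#   }
QUERY: List[Triple] = [
    ("?dataset", "rdf:type", "dcat:Dataset"),
    ("?dataset", "dcat:theme", "ex:theme-civil-protection"),
    ("?dataset", "dcterms:title", "?title"),
]

# Hypothetical graph contents.
GRAPH: Set[Triple] = {
    ("ex:s2id-records", "rdf:type", "dcat:Dataset"),
    ("ex:s2id-records", "dcat:theme", "ex:theme-civil-protection"),
    ("ex:s2id-records", "dcterms:title", '"S2ID disaster records"'),
    ("ex:inmet-rainfall", "rdf:type", "dcat:Dataset"),
    ("ex:inmet-rainfall", "dcat:theme", "ex:theme-environment"),
    ("ex:inmet-rainfall", "dcterms:title", '"INMET rainfall observations"'),
}


def match(pattern: Triple, triple: Triple, binding: Binding) -> Optional[Binding]:
    """Extend the binding so the pattern equals the triple, or return None on conflict."""
    extended = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            if extended.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return extended


def evaluate(patterns: List[Triple], graph: Set[Triple]) -> List[Binding]:
    """Join triple patterns left to right, keeping only consistent bindings."""
    solutions: List[Binding] = [{}]
    for pattern in patterns:
        next_solutions: List[Binding] = []
        for binding in solutions:
            for triple in sorted(graph):
                extended = match(pattern, triple, binding)
                if extended is not None:
                    next_solutions.append(extended)
        solutions = next_solutions
    return solutions


for row in evaluate(QUERY, GRAPH):
    print(row["?dataset"], row["?title"])
```

Shared variables such as ?dataset are what join the patterns together. The INMET dataset is dropped at the second pattern because its theme does not match.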

Catalog Serialization
Datasets Serialization

All project information, including the documentation, is available in the Open Data Floods GitHub repository.