The quality assessment was conducted using four key dimensions: Accuracy, Coherence, Completeness, and Timeliness.
1. Accuracy (Syntactic and Semantic)
- Disaster Records (S2ID) (D1): Data is self-reported by municipalities, which introduces variability in accuracy due to local reporting capabilities and technical standards.
- Rainfall Data (INMET) (D2): Rainfall data is collected from calibrated meteorological stations, which supports a high degree of measurement reliability.
- Population Estimates (IBGE) (D3): Official 2024 population projections based on the 2022 census; the data are syntactically and semantically consistent.
- Urban Expansion (MapBiomas) (D4): Derived from satellite imagery with validated classification processes. Urban land cover classes were extracted consistently.
- Deforestation (MapBiomas) (D5): Follows the same rigorous image classification process as D4, with a strong accuracy track record across Brazilian biomes.
- Civil Defense (Transparency Portal) (D6): Based on official data from the Portal da Transparência (Transparency Portal), the website where the Brazilian government publishes its public spending.
2. Coherence
- Municipality names and state identifiers were standardized to ensure interoperability between datasets.
- Rainfall (INMET) and disaster records (S2ID) were temporally and spatially aligned and showed expected patterns in high-risk regions.
- Urban expansion and forest loss trends (MapBiomas) complemented each other and reflected known environmental transitions.
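The standardization and alignment steps above can be sketched in Python as follows. This is a minimal illustration, not the pipeline actually used. The helper names (`normalize_name`, `municipality_key`, `monthly_rainfall`, `align`) are hypothetical. The sketch also assumes two things: that each INMET station reading has already been assigned to a municipality, and that monthly aggregation is the alignment granularity.

```python
import unicodedata
from collections import defaultdict
from datetime import date
from typing import Optional


def normalize_name(name: str) -> str:
    """Remove accents, collapse whitespace and upper-case a municipality name."""
    decomposed = unicodedata.normalize("NFKD", name)
    without_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(without_accents.upper().split())


def municipality_key(name: str, uf: str) -> str:
    """Hypothetical join key: normalized name plus state code (UF).

    The UF is included because different states can have municipalities
    with the same name.
    """
    return f"{normalize_name(name)}|{uf.strip().upper()}"


def monthly_rainfall(readings: list[tuple[str, date, float]]) -> dict[tuple[str, int, int], float]:
    """Sum daily rainfall (mm) per (municipality key, year, month).

    Assumption: each reading is already mapped to a municipality key.
    """
    totals: dict[tuple[str, int, int], float] = defaultdict(float)
    for key, day, mm in readings:
        totals[(key, day.year, day.month)] += mm
    return dict(totals)


def align(
    disasters: list[tuple[str, date]],
    totals: dict[tuple[str, int, int], float],
) -> list[tuple[str, date, Optional[float]]]:
    """Attach the same-month rainfall total to each disaster record (None if absent)."""
    return [(key, day, totals.get((key, day.year, day.month))) for key, day in disasters]
```

For example, `municipality_key(" Petrópolis ", "rj")` yields `"PETROPOLIS|RJ"`. Accented and unaccented spellings from different sources therefore resolve to the same join key.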
3. Completeness
- S2ID (D1): Underreporting is a known issue in certain municipalities. The dataset required cleaning to handle encoding issues and numeric inconsistencies.
- INMET (D2): Complete for 2024, though metadata extraction for each weather station required manual merging.
- IBGE (D3): Fully complete for 2018. No updated census values are available at the municipal level for more recent years.
- MapBiomas (D4 & D5): Complete and consistent at the national level, covering 1985–2022 with annual updates and no missing years.
- Portal da Transparência (D6): Complete and consistent at the national level, covering 2014–2024 with no missing years.
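A small Python sketch can illustrate the S2ID cleaning described for D1 and the "no missing years" checks for D4 to D6. The source files have not been inspected for this, so the sketch makes two assumptions. First, legacy exports are Windows-1252 encoded. Second, numeric fields use Brazilian separators, with a dot for thousands and a comma for decimals. The function names are hypothetical.

```python
from typing import Iterable, Optional


def decode_text(raw: bytes) -> str:
    """Decode bytes as UTF-8, falling back to Windows-1252 (assumed legacy encoding)."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Bytes undefined in cp1252 become U+FFFD instead of raising.
        return raw.decode("cp1252", errors="replace")


def parse_br_number(text: str) -> Optional[float]:
    """Parse a pt-BR formatted number such as '10.000' or '1.234,5'.

    Assumption: every value uses '.' for thousands and ',' for decimals,
    so '3.5' would be read as 35. Blank or non-numeric entries return None.
    """
    cleaned = text.strip()
    if not cleaned:
        return None
    cleaned = cleaned.replace(".", "").replace(",", ".")
    try:
        return float(cleaned)
    except ValueError:
        return None


def missing_years(years: Iterable[int], start: int, end: int) -> list[int]:
    """Return the years in [start, end] (inclusive) that have no records."""
    present = set(years)
    return [year for year in range(start, end + 1) if year not in present]
```

As a usage example, `missing_years([2014, 2015, 2017], 2014, 2017)` returns `[2016]`. An empty result over a dataset's full range is what a "no missing years" claim requires.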
Some inconsistencies were noted during the validation of demographic and disaster data. In a few municipalities, the sum of people affected by multiple disaster events slightly exceeded the total projected population. This could be explained by:
- Duplicate representation: Individuals may be counted more than once if affected by multiple events throughout the year.
- Rounding and approximation: Reported figures like "10,000" or "6,000" suggest estimates rather than exact counts.
- Population projections: IBGE estimates may not be fully up to date or precise, especially in smaller municipalities. In some cases, these discrepancies push the share of the population affected above 100%.
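The population check described above can be sketched in Python. This is a hypothetical illustration, not the validation code used in the assessment. It assumes that, for each municipality, affected counts are summed across all events of the year and divided by the IBGE projection. Under that assumption, a resident hit by two events is counted twice, and the share can exceed 100%.

```python
def affected_share(affected_by_event: list[int], population: int) -> float:
    """Percentage of the projected population affected, summed over all events.

    The result can exceed 100 because the same person may be counted in
    several events (duplicate representation).
    """
    if population <= 0:
        raise ValueError("population must be positive")
    return 100.0 * sum(affected_by_event) / population


def flag_over_100(rows: list[tuple[str, list[int], int]]) -> list[str]:
    """Return municipalities whose summed affected count exceeds their population."""
    return [
        name
        for name, events, population in rows
        if affected_share(events, population) > 100.0
    ]


# Hypothetical values: two rounded event reports in a municipality of 15,000.
rows = [("Municipality A", [10_000, 6_000], 15_000), ("Municipality B", [100], 1_000)]
print(flag_over_100(rows))  # ['Municipality A']
```

This approach flags municipalities rather than capping the share at 100%. The official figures stay untouched, and the municipalities that need a closer look are still surfaced.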
Despite these challenges, using official sources like IBGE remains preferable to assuming unknown values, as it ensures transparency and consistency in comparative analysis.
4. Timeliness
- S2ID: Includes records up to 2024. Reporting delays may limit visibility of recent disaster impacts.
- INMET: Fully up-to-date with daily records through 2024.
- IBGE: Timeliness is limited by the date of the last census (2018), affecting population-based indicators.
- MapBiomas: The latest available version (Collection 9) covers data up to 2022, with new collections generally released mid-year.
- Portal da Transparência: Includes general data from 2014 to 2025.
5. Summary Table – Data Quality Dimensions
| ID | Dataset | Accuracy | Coherence | Completeness | Timeliness |
|---|---|---|---|---|---|
| D1 | S2ID – Disaster Records | Medium | Medium | Medium | Medium |
| D2 | INMET – Rainfall Data | High | High | High | High |
| D3 | IBGE – Population Data (2018) | High | High | High | Low |
| D4 | MapBiomas – Urban Expansion | High | High | High | Medium |
| D5 | MapBiomas – Deforestation | High | High | High | Medium |
| D6 | Transparency Portal – Civil Defense | High | High | High | High |
All datasets were obtained through official Brazilian open data platforms and are used in accordance with the country's open data reuse policy. Any pre-processing performed preserved the datasets' original structure, and all transformations are documented to support reproducibility and transparency.