Documenting Data

Documenting research data is essential for making it FAIR (Findable, Accessible, Interoperable, and Reusable). Proper documentation makes it easier for both the original researchers and others to analyze, share, or replicate the findings.

Without clear documentation, future users may misinterpret the data, for example by misreading units or variable codes. Well-documented data also supports collaboration and sharing across research teams, because standardized documentation makes it easier to integrate datasets into broader research contexts.

Another key benefit of documenting data is preserving it for long-term use. As technology, software, and methods evolve, clear documentation ensures that data remains accessible and useful years after it is collected. This is particularly important in large-scale research projects, where multiple datasets may be integrated.

Some Considerations for Documenting Data

  • Vocabularies and Ontologies: Use structured vocabularies, standardized ontologies (e.g., the Gene Ontology (GO) for gene products), and community reporting standards (e.g., MIAME, Minimum Information About a Microarray Experiment) to ensure consistency and compatibility across research projects.
  • Data Schemas: Define a clear structure or schema for your data, outlining how the data is organized and how the fields relate to each other.
  • Metadata: Include detailed metadata to describe the context, content, and structure of the data (e.g., date of collection, methods used, units of measurement).
  • Data Filenames: Use consistent and descriptive filenames that indicate the contents, version, and date, helping users quickly understand what the file contains.
  • File Formats: Ensure the data is stored in widely accepted formats (e.g., CSV, JSON, XML) to facilitate long-term accessibility and reusability.
  • Versioning: Keep track of changes to the data or its documentation through version control, ensuring that prior versions of the dataset remain available if needed.
  • Provenance: Document the data’s origin, including how, when, and by whom it was collected, as well as any processing steps applied to it.
  • Data Licensing: Provide clear information about how others can use, share, or modify the data, including any applicable licensing terms.
  • Data Quality Standards: Indicate any quality control measures applied to the data to ensure accuracy and reliability.
  • Data Annotations: Include any relevant annotations or notes that can provide additional insight into the dataset or clarify complex aspects.
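
As an illustration of how several of these considerations can fit together, here is a minimal Python sketch, not a prescribed tool. It records a data schema with types and units, checks a CSV file against that schema, tests a filename convention, and assembles a metadata record (version, license, provenance) that could be saved as a JSON sidecar file next to the data. The field names, the filename pattern, the sample values, and the sidecar layout are all hypothetical choices made for this example; a real project would adapt them or follow a community standard.

```python
import csv
import io
import json
import re
from datetime import date

# Hypothetical data dictionary (schema): each field has a type, a unit, and a description.
SCHEMA = {
    "plot_id": {"type": "str", "unit": None, "description": "Field plot identifier"},
    "sample_date": {"type": "date", "unit": None, "description": "Date of collection (ISO 8601)"},
    "yield_kg_ha": {"type": "float", "unit": "kg/ha", "description": "Grain yield"},
}

# Hypothetical filename convention: project_contents_vNN_YYYY-MM-DD.csv
FILENAME_PATTERN = re.compile(r"^[a-z0-9]+_[a-z0-9-]+_v\d{2}_\d{4}-\d{2}-\d{2}\.csv$")


def check_value(value, field_type):
    """Return True if a CSV cell can be parsed as the declared type."""
    try:
        if field_type == "float":
            float(value)
        elif field_type == "date":
            date.fromisoformat(value)
        return True  # "str" fields accept any value
    except (ValueError, TypeError):  # TypeError covers missing cells (None)
        return False


def validate_rows(csv_text, schema):
    """Check the header and each cell against the schema; return a list of problems."""
    reader = csv.DictReader(io.StringIO(csv_text))
    if reader.fieldnames != list(schema):
        return [f"header mismatch: {reader.fieldnames}"]
    problems = []
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        for name, spec in schema.items():
            if not check_value(row[name], spec["type"]):
                problems.append(f"line {line_no}: {name}={row[name]!r} is not {spec['type']}")
    return problems


def build_metadata(filename, schema, version, license_id, collected_by, processing_steps):
    """Assemble a metadata record (schema, version, license, provenance) for a sidecar file."""
    return {
        "file": filename,
        "version": version,
        "license": license_id,
        "provenance": {"collected_by": collected_by, "processing_steps": processing_steps},
        "schema": schema,
    }


if __name__ == "__main__":
    # Made-up example data containing one deliberately invalid yield value.
    data = "plot_id,sample_date,yield_kg_ha\nP01,2024-08-15,5210.5\nP02,2024-08-16,n/a\n"
    name = "trial_wheat-yield_v01_2024-09-01.csv"
    print(bool(FILENAME_PATTERN.match(name)))  # True
    print(validate_rows(data, SCHEMA))  # ["line 3: yield_kg_ha='n/a' is not float"]
    meta = build_metadata(name, SCHEMA, "1.0.0", "CC-BY-4.0", "example collector",
                          ["removed duplicate rows"])
    print(json.dumps(meta, indent=2))  # contents of a hypothetical sidecar JSON file
```

Keeping the schema, license, and provenance in a plain JSON file alongside a CSV follows the advice above to use widely accepted formats, and the sidecar can be versioned together with the data it describes.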

  • written by Carly Huitema
