Research Data Management Toolkit: Data collection

Best practices in Research Data Management promote research integrity and collaborative opportunities. A Data Management Plan helps ensure data security, accessibility and validation of results.

Data formats

Before commencing your research project you will need to decide how you and your team will work with your data. Decisions need to be made at the outset and documented in your data management plan about:

  • Data formats (consider preservation, as many projects will be required to store their data for at least 7 years)
  • Software, especially if specialist software/equipment is needed to view or use the data
  • Folder and file structure and version control
  • Quality assurance processes

Metadata is an essential part of your project documentation.

Research Data Preservation Formats

Researchers need to ensure that their research data is secure and retrievable for long term use. The longevity of research data formats and software versions must be accounted for at the data storage stage.

When selecting file formats consideration must be given to:

  • data analysis methods;
  • discipline-related standards;
  • software and hardware compatibility and longevity during the data retention period; and
  • whether proprietary or open source software is preferred.

During data collection and analysis, researchers may select specific data formats. Conversion of data into standard interchangeable formats may be necessary for preservation purposes. As future access and reuse of data may be affected by proprietary formats, it is advisable to use open formats such as Rich Text Format (RTF) or Open Document Format (ODF) for preservation purposes. The Library of Congress Recommended Formats Statement (2021-2022) includes recommendations for datasets and associated metadata.

Researchers should document all data capture and storage formats as well as any analysis software used, including version numbers. Because software may change or become unavailable, it is also advisable to store a copy of the software along with the data.
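One way to keep this documentation alongside the data is a small machine-readable manifest. The following is a minimal sketch in Python, not a prescribed method: the function name `build_format_manifest`, the field names, the file names and the analysis package "ExampleStats" are all hypothetical and chosen only for illustration.

```python
import json
from datetime import date


def build_format_manifest(files, software):
    """Return a dict recording each file's format and the software used."""
    return {
        "created": date.today().isoformat(),
        "files": [
            {"name": name, "format": fmt, "role": role}
            for name, fmt, role in files
        ],
        "software": [
            {"name": name, "version": version} for name, version in software
        ],
    }


manifest = build_format_manifest(
    files=[
        # Hypothetical file names and formats, for illustration only.
        ("survey_responses.csv", "CSV (UTF-8)", "captured data"),
        ("interview_notes.odt", "OpenDocument Text", "preservation copy"),
    ],
    software=[("ExampleStats", "4.2")],  # hypothetical analysis package
)

# Serialise to JSON so the manifest can be stored next to the data.
manifest_json = json.dumps(manifest, indent=2)
print(manifest_json)
```

Saving the resulting JSON text in the same folder as the data means the format and software record travels with the files when they are archived.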

Folder and file structure and version control

Establishing a logical, consistent way to organise your folders and the files within them will make it much easier to manage your data throughout the project and to describe your project in future publications. There are many ways of doing this, such as by date or by project. The University of Cambridge has guidance on organising your data, with suggestions for file names, versioning, documentation and metadata, as well as managing emails. Some commonly used techniques include:

  • Folders and subfolders - used to group similar information together in one place. Be mindful of how many folders 'deep' your system is, to avoid having the same information stored in two different places in your structure. Hierarchical systems are recommended, with short but meaningful and consistent names. Naming folders according to the area of work rather than individual people makes it easier if roles or personnel change during the project.
  • Separate ongoing and completed work - completed documents can be archived in a separate folder making it easier to access current work.
  • File names need to be consistent. If dates, punctuation and version numbers are used, the group needs to decide how they will work, preferably at the outset of the project. New versions may be documented by adding v1.0 to the first version; a minor revision would then be v1.1. Annotations to draft documents may be tracked by adding the person's initials to the end of the filename.

ANDS site: https://www.ands.org.au/working-with-data/data-management/data-versioning
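A naming convention is easier to keep consistent when the names are generated rather than typed by hand. The sketch below shows one possible convention, assuming a date, a short description, a major.minor version number and optional initials joined by underscores. The pattern and the helper names `build_filename` and `next_version` are illustrative assumptions, not a scheme required by this guide or by ANDS.

```python
import re
from datetime import date


def build_filename(description, major, minor, ext, initials=None, on=None):
    """Build a name such as 2024-03-05_interview-notes_v1.1_AB.odt."""
    on = on or date.today()
    # Lowercase the description and collapse other characters into hyphens.
    slug = re.sub(r"[^a-z0-9]+", "-", description.lower()).strip("-")
    parts = [on.isoformat(), slug, f"v{major}.{minor}"]
    if initials:
        # Initials go last, marking who annotated the draft.
        parts.append(initials.upper())
    return "_".join(parts) + "." + ext.lstrip(".")


def next_version(major, minor, major_change=False):
    """Return the next (major, minor) pair for a revision."""
    return (major + 1, 0) if major_change else (major, minor + 1)


name = build_filename(
    "Interview notes", 1, 1, "odt", initials="ab", on=date(2024, 3, 5)
)
print(name)
```

Putting the ISO-style date first means that an alphabetical listing of the folder is also a chronological one.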

Quality assurance processes

Quality assurance comprises the steps taken during the data collection process to ensure the data is complete and of high quality. As explained by the USGS (US Geological Survey), the key steps are:

  • Quality by design - planning how your data is recorded, manipulated and stored
  • Domain management and reference data - where appropriate, using tools which control the values that can be entered, such as lookup tables and drop-down boxes.

Quality control is the subsequent process of checking that the data meets overall quality goals and criteria. Quality control should be conducted regularly throughout the project and may lead to improvements in quality assurance processes.
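Both ideas can be illustrated with a short script: a lookup table restricts the values that are accepted, and a checking pass reports records that fail the quality criteria. This is a minimal sketch only. The field names (`site_id`, `species`, `count`), the allowed values and the rules are hypothetical examples, not USGS requirements.

```python
# Hypothetical lookup table of permitted values for the "species" field.
ALLOWED_SPECIES = {"quokka", "numbat", "bilby"}


def check_record(record):
    """Return a list of quality problems found in one record (empty if none)."""
    problems = []
    # Completeness: every required field must be present and non-empty.
    for field in ("site_id", "species", "count"):
        if record.get(field) in (None, ""):
            problems.append(f"missing {field}")
    # Domain control: species must come from the lookup table.
    species = record.get("species")
    if species and species not in ALLOWED_SPECIES:
        problems.append(f"species not in lookup table: {species}")
    # Range check: counts must be non-negative integers.
    count = record.get("count")
    if count not in (None, "") and (not isinstance(count, int) or count < 0):
        problems.append("count must be a non-negative integer")
    return problems


records = [
    {"site_id": "A1", "species": "quokka", "count": 4},
    {"site_id": "A2", "species": "kwokka", "count": -1},
    {"site_id": "", "species": "bilby", "count": 2},
]

# Quality control pass: map each record's position to its list of problems.
report = {i: check_record(r) for i, r in enumerate(records)}
for i, problems in report.items():
    print(i, problems or "ok")
```

Running a check like this at regular intervals, rather than once at the end, makes it easier to trace a recurring problem back to the data entry step that causes it.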

Metadata

Metadata can be explained simply as 'data about data'. Metadata describes the related research data and details its location to enable efficient retrieval and reuse throughout the research data lifecycle.

It is an essential component of research data management and should incorporate file naming and organisation protocols which are used by all researchers working on a project.

Metadata:

  • enables effective organisation of research data;
  • facilitates discovery;
  • facilitates research data sharing;
  • provides digital identifiers for the research data; and
  • supports archiving and preservation.

Correct storage of documentation and metadata is just as important as the storage of the research data itself, as the metadata provides a descriptive meaning to raw research data. Researchers are encouraged to use the same guidelines to store all documentation and metadata as those used for research data storage and backup.

Many disciplines have their own ways of structuring metadata, known as schemas. A schema lists the information you need to provide about your data and how that information should be structured. Some examples include:

Discipline: Metadata Standard

  • General: Dublin Core; Metadata Object Description Schema (MODS); Metadata Encoding and Transmission Standard (METS); Data Standards for Western Australian Government
  • Humanities: Text Encoding Initiative; Visual Resources Association Core; Functional Requirements for Bibliographic Records (FRBR)
  • Social Sciences: Data Documentation Initiative (DDI)
  • Sciences: CSMD-CCLRC Core Scientific Metadata Model; Darwin Core; Ecological Metadata Language (EML)
  • Geospatial: Content Standard for Digital Geospatial Metadata (CSDGM)

There are several tools available to make creating and managing metadata easier, as listed by Stanford Libraries.
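To give a sense of what a schema-based record looks like, the sketch below builds a small Dublin Core description using only the Python standard library. The namespace URI is the standard Dublin Core elements namespace. The wrapping `metadata` element, the function name `dublin_core_record` and all of the field values are assumptions made for illustration, and a repository may expect a different wrapper or additional elements.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def dublin_core_record(fields):
    """Build a <metadata> element with one dc:* child per (name, value) pair."""
    root = ET.Element("metadata")  # wrapper element name is an assumption
    for name, value in fields:
        child = ET.SubElement(root, f"{{{DC_NS}}}{name}")
        child.text = value
    return root


# Hypothetical values describing an example dataset.
record = dublin_core_record([
    ("title", "Example survey responses"),
    ("creator", "Example, Researcher"),
    ("date", "2024-03-05"),
    ("format", "text/csv"),
    ("rights", "CC BY-SA 4.0"),
])

xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
```

Each pair in the list becomes one element, so adding further Dublin Core terms such as `identifier` or `description` only requires another entry in the list.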

CONTENT LICENCE

 Except for logos, Canva designs or where otherwise indicated, content in this guide is licensed under a Creative Commons Attribution-ShareAlike 4.0 International Licence.