Research Data Management Toolkit: Collect

Best practices in Research Data Management promote research integrity and collaborative opportunities. A Research Data Management Plan ensures data security, accessibility and validation of results.

Data formats

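Before commencing your research project you will need to decide how you and your team will work with your data. Decisions about the following need to be made at the outset and documented in your research data management plan:

  • data formats;
  • folder and file structure and version control; and
  • quality assurance processes.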

Metadata is an essential part of your project documentation.

Research Data Preservation Formats

Researchers need to ensure that their research data is secure and retrievable for long term use. The longevity of research data formats and software versions must be accounted for at the data storage stage.

When selecting file formats consideration must be given to:

  • data analysis methods;
  • discipline-related standards;
  • software and hardware compatibility and longevity during the data retention period; and
  • any preference for proprietary software over open source software.

During data collection and analysis, researchers may select specific data formats. Conversion of data into standard interchangeable formats may be necessary for preservation purposes. As future access and reuse of data may be affected by proprietary formats, it is advisable to use open formats such as Rich Text Format (RTF) or Open Document Format (ODF) for preservation purposes. The Library of Congress Recommended Formats Statement (2022-2023) includes recommendations for datasets and associated metadata.

Researchers should document all data capture and storage formats as well as any analysis software used. In the event of future software changes it is advisable to also store a copy of the software along with the data.
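
The documentation step above could be sketched as a small script that writes a manifest alongside the data. This is a minimal illustration only: the function names, the manifest layout and the idea of supplying software versions as a dictionary are all assumptions, not part of any standard or of this guide.

```python
import json
import platform
from datetime import date
from pathlib import Path


def build_format_manifest(data_dir, software):
    """Summarise the file formats under data_dir and the analysis software used.

    `software` is a hypothetical mapping of name -> version supplied by the
    researcher, e.g. {"R": "4.3.1"}.
    """
    formats = {}
    for path in sorted(Path(data_dir).rglob("*")):
        if path.is_file():
            ext = path.suffix.lower() or "(no extension)"
            formats[ext] = formats.get(ext, 0) + 1
    return {
        "recorded_on": date.today().isoformat(),
        "operating_system": platform.system(),
        "python_version": platform.python_version(),
        "file_formats": formats,        # extension -> number of files
        "analysis_software": software,  # name -> version
    }


def write_manifest(manifest, target):
    """Store the manifest as human-readable JSON next to the data."""
    Path(target).write_text(json.dumps(manifest, indent=2), encoding="utf-8")
```

JSON is used here because it is an open, plain-text format, so the manifest itself should remain readable for the life of the retention period.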

Folder and file structure and version control

Establishing a logical, consistent way to organise your folders and the files within them will make it much easier to manage your data throughout the project and to describe your project in future publications. There are many ways of doing this, such as by date or project. The University of Cambridge has guidance on organising your data, suggestions for file names, versioning, documentation and metadata, as well as managing emails. Some commonly used techniques include:

  • Folders and subfolders - used to group similar information together in one place. Be mindful of how many folders 'deep' your system is, to avoid having the same information stored in two different places in your structure. Hierarchical systems are recommended, with short but meaningful and consistent names. Naming folders according to the area of work rather than individual people makes it easier if roles or personnel change during the project.
  • Separate ongoing and completed work - completed documents can be archived in a separate folder making it easier to access current work.
  • File names need to be consistent. If dates, punctuation and version numbers are used, the group needs to decide how they will work, preferably at the outset of the project. New versions may be documented by adding v1.0 to the first version; a minor revision would then be v1.1. Annotations to draft documents may be tracked by adding the person's initials to the end of the filename.
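
One way a group might enforce an agreed naming convention is to generate file names with a small helper rather than typing them by hand. The sketch below assumes a hypothetical pattern of `date_description_vMAJOR.MINOR_INITIALS.ext`; the pattern, the function names and the parameters are illustrative choices, not a recommendation from this guide.

```python
import re
from datetime import date


def build_filename(description, version, on=None, initials=None, ext="csv"):
    """Build a file name under an assumed date_description_version convention.

    `version` is a (major, minor) tuple; `initials` marks an annotated draft.
    """
    major, minor = version
    day = (on or date.today()).isoformat()  # ISO 8601 dates sort chronologically
    # Lower-case the description and replace runs of other characters with hyphens.
    slug = re.sub(r"[^a-z0-9]+", "-", description.lower()).strip("-")
    parts = [day, slug, f"v{major}.{minor}"]
    if initials:
        parts.append(initials.upper())
    return "_".join(parts) + "." + ext


def bump(version, major=False):
    """Return the next version: a minor revision by default, or a major one."""
    current_major, current_minor = version
    if major:
        return (current_major + 1, 0)
    return (current_major, current_minor + 1)
```

For example, `build_filename("Survey responses", bump((1, 0)), on=date(2024, 3, 1), initials="ab")` gives `2024-03-01_survey-responses_v1.1_AB.csv`.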

The ARDC's Data Versioning page provides guidelines, tools and further information.

Quality assurance processes

Quality assurance comprises the steps taken during the data collection process to ensure the data is complete and of high quality. As explained by the USGS (US Geological Survey), the key steps are:

  • Quality by design - planning how your data is recorded, manipulated and stored
  • Domain management and reference data - where appropriate, using tools which control the values that can be entered, such as lookup tables and drop-down boxes.

Quality control is the subsequent process of checking that the data meets overall quality goals and criteria. Quality control should be conducted regularly throughout the project and may lead to improvements in quality assurance processes.
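
The two ideas above can be combined in a simple check: a lookup table restricts the values a field may take (domain management), and a periodic pass over the collected records reports anything that falls outside it (quality control). The field names and allowed values below are hypothetical examples, and the code is a sketch rather than a complete validation tool.

```python
# Hypothetical lookup table: field name -> values that may be entered.
ALLOWED_VALUES = {
    "site": {"north", "south", "east"},
    "sample_type": {"soil", "water"},
}
# Hypothetical list of fields that must never be left empty.
REQUIRED_FIELDS = ("site", "sample_type", "collected_on")


def check_record(record):
    """Return a list of problems found in one record (empty if none)."""
    problems = []
    for field in REQUIRED_FIELDS:
        value = record.get(field)
        if value is None or not str(value).strip():
            problems.append(f"missing value for '{field}'")
    for field, allowed in ALLOWED_VALUES.items():
        value = record.get(field)
        if value and value not in allowed:
            problems.append(f"'{value}' is not an allowed value for '{field}'")
    return problems


def quality_report(records):
    """Map row index -> problems, listing only rows with at least one problem."""
    report = {}
    for index, record in enumerate(records):
        problems = check_record(record)
        if problems:
            report[index] = problems
    return report
```

Running `quality_report` regularly and reviewing which problems recur is one way the results of quality control could feed back into the data entry process.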


Metadata

Metadata can be explained simply as 'data about data'. Metadata facilitates the discovery, identification, organisation, sharing and interoperability of research data. Having rich metadata will help to maximise exposure, reuse and citation of your research findings. Plan to create rich metadata from the start of your project. This will ensure the quality of the metadata and save you time in the long run. Incorporate file naming and organisation protocols which are to be used by all researchers working on a project.

Correct storage of documentation and metadata is just as important as the storage of the research data itself, as the metadata provides a descriptive meaning to raw research data. Researchers are encouraged to use the same guidelines to store all documentation and metadata as those used for research data storage and backup.

Many disciplines have their own way of structuring metadata, known as schemas. They list the information you need to provide about your data and how the information should be structured. Some examples include:

  • Dublin Core
  • Metadata Object Description Schema (MODS)
  • Metadata Encoding and Transmission Standard (METS)
  • Data Standards for Western Australian Government
  • Text Encoding Initiative
  • Visual Resources Association Core
  • Functional Requirements for Bibliographic Records (FRBR)
  • Social sciences - Data Documentation Initiative (DDI)
  • CSMD-CCLRC Core Scientific Metadata Model
  • Darwin Core
  • Ecological Metadata Language (EML)
  • Geospatial - Content Standard for Digital Geospatial Metadata (CSDGM)

The Australian Research Data Commons (ARDC) Metadata Guide is a recommended resource when planning for the creation, linking and storage of metadata.

There are also several tools available to make creating and managing metadata easier, as listed by Stanford Libraries.
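
As an illustration of schema-based metadata stored alongside the data, the sketch below builds a record from the fifteen Dublin Core elements and writes it as a "sidecar" file next to a data file. The JSON layout, the `dc:` key prefix, the `.metadata.json` suffix and the function names are assumptions made for this example; they are not prescribed by Dublin Core or by any of the resources mentioned in this guide.

```python
import json
from pathlib import Path

# The fifteen elements of the Dublin Core Metadata Element Set.
DUBLIN_CORE_ELEMENTS = (
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
)


def make_record(**fields):
    """Build a metadata record, rejecting names that are not Dublin Core elements."""
    unknown = set(fields) - set(DUBLIN_CORE_ELEMENTS)
    if unknown:
        raise ValueError(f"not Dublin Core elements: {sorted(unknown)}")
    # The "dc:" prefix is an assumed convention for this sketch.
    return {f"dc:{name}": value for name, value in fields.items()}


def write_sidecar(data_file, record):
    """Write the record next to the data file and return the sidecar path."""
    sidecar = Path(str(data_file) + ".metadata.json")
    sidecar.write_text(
        json.dumps(record, indent=2, ensure_ascii=False), encoding="utf-8"
    )
    return sidecar
```

Keeping the sidecar in the same folder as the data file means it is covered by the same storage and backup arrangements as the data it describes.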


 Except for logos, Canva designs or where otherwise indicated, content in this guide is licensed under a Creative Commons Attribution-ShareAlike 4.0 International Licence.