Before commencing your research project you will need to decide how you and your team will work with your data. Decisions need to be made at the outset and documented in your research data management plan about:
Metadata is an essential part of your project documentation.
Researchers need to ensure that their research data is secure and retrievable for long-term use. When planning data storage, researchers must account for how long the chosen data formats and software versions will remain usable.
When selecting file formats, consideration must be given to:
During data collection and analysis, researchers may choose specific data formats. For preservation, data may need to be converted into standard, interchangeable formats. Because proprietary formats can limit future access to and reuse of the data, it is advisable to preserve data in open formats such as Rich Text Format (RTF) or Open Document Format (ODF). The Library of Congress Recommended Formats Statement (2022-2023) includes recommendations for datasets and associated metadata.
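As a minimal sketch of converting data into an open, plain-text format, the Python example below writes tabular records to CSV using only the standard library. The sample records and the file name `observations.csv` are hypothetical, and CSV is used here only as one familiar example of an open format, not as a specific recommendation from the Library of Congress statement.

```python
import csv
from pathlib import Path


def export_to_csv(records, path):
    """Write a list of dicts to a UTF-8 CSV file and return its path.

    Assumes every record shares the same keys; the first record's keys
    become the header row.
    """
    path = Path(path)
    fieldnames = list(records[0].keys())
    # newline="" lets the csv module control line endings itself
    with path.open("w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(records)
    return path


if __name__ == "__main__":
    # Hypothetical example data, e.g. exported from a proprietary spreadsheet
    sample = [
        {"site": "A", "date": "2023-03-01", "temperature_c": 21.4},
        {"site": "B", "date": "2023-03-01", "temperature_c": 19.8},
    ]
    out = export_to_csv(sample, "observations.csv")
    print(out.read_text(encoding="utf-8"))
```

Plain-text formats like this can be opened by many programs, although any formatting, formulas or macros in the original file are not carried over and may need to be documented separately.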
Researchers should document all data capture and storage formats, as well as any analysis software used. Because future software changes may make the data difficult to open, it is also advisable to store a copy of the software along with the data.
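One way to make this documentation routine is to capture it automatically. The sketch below, which assumes the analysis is done in Python, records the Python version, operating system, data formats and installed package versions in a JSON file. The function names, the file name `software_environment.json` and the example package names are hypothetical; substitute the formats and packages your project actually uses.

```python
import json
import platform
from datetime import datetime, timezone
from importlib import metadata


def package_versions(names):
    """Look up installed versions of the named Python packages."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = "not installed"
    return versions


def describe_environment(data_formats, package_names=()):
    """Collect an illustrative record of the formats and software used."""
    return {
        "recorded_at": datetime.now(timezone.utc).isoformat(),
        "python_version": platform.python_version(),
        "operating_system": platform.platform(),
        "data_formats": list(data_formats),
        "analysis_packages": package_versions(package_names),
    }


if __name__ == "__main__":
    # Hypothetical formats and packages; replace with your project's own
    record = describe_environment(["CSV (UTF-8)", "ODS"], ["numpy", "pandas"])
    with open("software_environment.json", "w", encoding="utf-8") as fh:
        json.dump(record, fh, indent=2)
    print(json.dumps(record, indent=2))
```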
Establishing a logical, consistent way to organise your folders and the files within them will make it much easier to manage your data throughout the project and to describe your project in future publications. There are many ways of doing this, such as organising by date or by project. The University of Cambridge has guidance on organising your data, suggestions for file names, versioning, documentation and metadata, as well as managing emails. Some commonly used techniques include:
The ARDC's Data Versioning page provides guidelines, tools and further information.
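A file naming convention is easiest to follow when it is applied the same way every time. The Python sketch below assumes one hypothetical convention, `YYYYMMDD_project_description_vNN.ext`, and shows how a name can be built consistently and how the next version number can be worked out from existing files. The convention and function names are illustrative only, not taken from the Cambridge or ARDC guidance.

```python
import re
from datetime import date


def slug(text):
    """Lower-case text and replace any run of other characters with one hyphen."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def make_file_name(project, description, version, extension, when=None):
    """Build a name such as 20230301_soil-survey_site-a-readings_v02.csv."""
    when = when or date.today()
    ext = extension.lstrip(".")
    return f"{when:%Y%m%d}_{slug(project)}_{slug(description)}_v{version:02d}.{ext}"


def next_version(existing_names, stem):
    """Return the next version number among files named '<stem>_vNN.<ext>'."""
    pattern = re.compile(re.escape(stem) + r"_v(\d+)\.")
    found = [int(m.group(1)) for name in existing_names if (m := pattern.match(name))]
    return max(found, default=0) + 1


if __name__ == "__main__":
    name = make_file_name("Soil Survey", "Site A readings", 2, "csv", date(2023, 3, 1))
    print(name)  # 20230301_soil-survey_site-a-readings_v02.csv
```

Putting the date first in `YYYYMMDD` form means that an alphabetical file listing is also a chronological one, and zero-padding the version number keeps `v10` sorting after `v09`.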
Quality assurance comprises the steps taken during data collection to ensure that the data is complete and of high quality. As explained by the US Geological Survey (USGS), the key steps are:
Quality control is the subsequent process of checking that the data meets overall quality goals and criteria. Quality control should be conducted regularly throughout the project and may lead to improvements in quality assurance processes.
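Some quality control checks can be automated and rerun whenever new data arrives. The sketch below assumes tabular records held as Python dictionaries and reports missing required values, non-numeric entries and values outside plausible limits. The field names and the limits of -40 to 60 °C are hypothetical; real criteria should come from your project's own quality goals.

```python
def quality_check(records, required, ranges):
    """Return a list of human-readable problems found in `records`.

    required: field names that must be present and non-empty
    ranges:   {field: (low, high)} inclusive plausibility limits
    """
    problems = []
    for i, row in enumerate(records, start=1):
        for field in required:
            if row.get(field) in (None, ""):
                problems.append(f"row {i}: missing {field}")
        for field, (low, high) in ranges.items():
            value = row.get(field)
            if value in (None, ""):
                continue  # a missing required value is reported above
            try:
                number = float(value)
            except (TypeError, ValueError):
                problems.append(f"row {i}: {field}={value!r} is not numeric")
                continue
            if not low <= number <= high:
                problems.append(f"row {i}: {field}={number} outside [{low}, {high}]")
    return problems


if __name__ == "__main__":
    # Hypothetical readings with deliberate errors
    readings = [
        {"site": "A", "temperature_c": "21.4"},
        {"site": "", "temperature_c": "85"},
        {"site": "C", "temperature_c": "n/a"},
    ]
    for problem in quality_check(readings, ["site", "temperature_c"], {"temperature_c": (-40, 60)}):
        print(problem)
```

Recording which checks were run, and what they found, is itself useful project documentation.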
Metadata can be explained simply as 'data about data'. Metadata facilitates the discovery, identification, organisation, sharing and interoperability of research data. Rich metadata helps to maximise the exposure, reuse and citation of your research findings. Plan to create rich metadata from the start of your project; this will improve the quality of the metadata and save you time in the long run. Adopt file naming and organisation protocols that all researchers working on the project will use.
Correct storage of documentation and metadata is just as important as storage of the research data itself, because the metadata gives meaning to the raw research data. Researchers are encouraged to store and back up all documentation and metadata following the same guidelines they use for their research data.
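A simple way to keep metadata with the data it describes is to save it in a small "sidecar" file in the same folder, so that both are stored and backed up together. The sketch below writes and reads such a file as JSON. The `.metadata.json` naming pattern and the field names in the example record are hypothetical and do not follow any formal metadata schema; a discipline standard such as those listed below would normally define the fields.

```python
import json
from pathlib import Path


def sidecar_path(data_file):
    """Map readings.csv to readings.metadata.json in the same folder."""
    data_file = Path(data_file)
    return data_file.with_name(data_file.stem + ".metadata.json")


def write_metadata(data_file, metadata):
    """Save the metadata dict as UTF-8 JSON next to the data file."""
    path = sidecar_path(data_file)
    path.write_text(json.dumps(metadata, indent=2, ensure_ascii=False), encoding="utf-8")
    return path


def read_metadata(data_file):
    """Load the sidecar metadata for a data file."""
    return json.loads(sidecar_path(data_file).read_text(encoding="utf-8"))


if __name__ == "__main__":
    record = {  # illustrative fields only, not a formal schema
        "title": "Soil temperature readings, Site A",
        "creator": "Example Research Team",
        "date_created": "2023-03-01",
        "description": "Hourly soil temperature in degrees Celsius.",
        "file_format": "text/csv",
    }
    print(write_metadata("readings.csv", record))
```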
Many disciplines have their own ways of structuring metadata, known as schemas. A schema lists the information you need to provide about your data and how that information should be structured. Some examples include:
| Discipline | Metadata Standard |
|---|---|
| General | Metadata Object Description Schema (MODS) |
| Humanities | |
| Social Sciences | Data Documentation Initiative (DDI) |
| Sciences | |
| Geospatial | Content Standard for Digital Geospatial Metadata (CSDGM) |
The Australian Research Data Commons (ARDC) Metadata Guide is a recommended resource when planning for the creation, linking and storage of metadata.
There are also several tools available to make creating and managing metadata easier, as listed by Stanford Libraries.
Except for logos, Canva designs, AI generated images or where otherwise indicated, content in this guide is licensed under a Creative Commons Attribution-ShareAlike 4.0 International Licence.