Metadata can be explained simply as 'data about data'. Metadata describes the related research data and details its location to enable efficient retrieval and reuse throughout the research data lifecycle.
It is an essential component of research data management and should incorporate file naming and organisation protocols which are used by all researchers working on a project.
Correct storage of documentation and metadata is just as important as the storage of the research data itself, as the metadata provides a descriptive meaning to raw research data. Researchers are encouraged to use the same guidelines to store all documentation and metadata as those used for research data storage and backup.
Researchers should develop an organised electronic filing system where everyone involved in data collection, analysis, reuse and storage understands the file naming protocols. The use of folders must follow the agreed standard where each project, assay method, laboratory experiment or sample group is logically placed in a hierarchical order.
File naming guide
|Naming Standard||Description of Naming Standard|
|Numbering Standards||Specification of digit numbers to ensure a consecutive listing of files|
|Date Standards||Specification of date formats to ensure a consecutive listing of files e.g. 'YYYY-MM-DD'|
|Punctuation Standards||Do not use any punctuation or spaces except for underscores or hyphens to partition words. The period sign should only precede the file extension e.g. ‘project_101_sample_001.xls’ or ‘project-101-sample-001.xls’|
|Vocabulary Standards||Maintain disciplinary standards in vocabulary, language and abbreviations e.g. ‘project-101-pcr-sample-001’ or ‘project-101-microarray-sample-101’|
|File Version Numbering||Label the file versions in numerical terms e.g. '1.0, 1.1, 1.2 etc.'|
|File Version Description||Complete file naming appropriately through the use of descriptive terms at the end of the document name e.g. 'draft_1, draft_2, final_1 etc.'|
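The standards in the table can be combined into a single naming routine. The following is a minimal Python sketch rather than a prescribed UWA convention. The function name build_file_name, the field order, the use of underscores between fields and the 'v1-0' form of the version label (chosen so that the only period precedes the file extension) are all assumptions made for illustration.

```python
import re
from datetime import date


def build_file_name(project, method, sample, version, extension, created=None):
    """Assemble a file name from the components described in the naming guide."""
    fields = []
    if created is not None:
        fields.append(created.isoformat())          # date standard: YYYY-MM-DD
    fields.append(f"project-{project:03d}")         # numbering standard: fixed-width digits
    fields.append(method.lower())                   # vocabulary standard: agreed abbreviation
    fields.append(f"sample-{sample:03d}")
    fields.append("v" + version.replace(".", "-"))  # assumption: no period inside the name
    name = "_".join(fields)
    # punctuation standard: only letters, digits, underscores and hyphens
    if not re.fullmatch(r"[A-Za-z0-9_-]+", name):
        raise ValueError(f"illegal character in file name: {name!r}")
    return f"{name}.{extension}"


print(build_file_name(101, "PCR", 1, "1.0", "xls", date(2024, 3, 5)))
# 2024-03-05_project-101_pcr_sample-001_v1-0.xls
```

Zero-padded numbers and ISO dates sort correctly in an ordinary alphabetical file listing, which is why both are fixed-width here.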
To rename a large number of files at once, see the Microsoft Support guide 'Rename Multiple Files in Windows XP with Windows Explorer'.
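Bulk renaming can also be scripted. The sketch below is one possible approach in Python and is not taken from the Microsoft guide. The function names normalise_name and rename_all are hypothetical, and the rule applied (replace spaces and punctuation in the name with underscores, leave the extension untouched) is an assumption based on the punctuation standard above. By default it only reports the planned changes, so the result can be checked before any file is renamed.

```python
import re
from pathlib import Path


def normalise_name(file_name):
    """Replace spaces and punctuation in the stem with underscores; keep the extension."""
    path = Path(file_name)
    stem = re.sub(r"[^A-Za-z0-9_-]+", "_", path.stem).strip("_")
    return stem + path.suffix


def rename_all(folder, dry_run=True):
    """Return (old, new) name pairs for a folder; rename the files only if dry_run is False."""
    planned = []
    targets = set()
    for path in sorted(Path(folder).iterdir()):
        # skip sub-folders and hidden files such as .gitignore
        if not path.is_file() or path.name.startswith("."):
            continue
        target = path.with_name(normalise_name(path.name))
        if target == path:
            continue
        # refuse to overwrite an existing file or another planned target
        if target.exists() or target.name in targets:
            raise FileExistsError(f"cannot rename {path.name}: {target.name} is already taken")
        targets.add(target.name)
        planned.append((path.name, target.name))
        if not dry_run:
            path.rename(target)
    return planned


print(normalise_name("Project 101 sample.001.xls"))
# Project_101_sample_001.xls
```

Calling rename_all on a folder first with the default dry_run=True, and then with dry_run=False once the reported pairs look right, avoids accidental renames of working data.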
Throughout the research data lifecycle, multiple versions of documents or files can be created, and mechanisms must be put in place to distinguish between the different versions.
Version control management can be achieved through:
Version control ensures maintenance of a master file which documents all versions and all changes that are made to the research data.
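As an illustration of such a master file, the sketch below appends one row per version to a CSV log. It is a simplified example under stated assumptions and does not replace a dedicated version control tool. The function names next_version and log_version, the column order and the sample values are all hypothetical.

```python
import csv
import io
from datetime import date


def next_version(current, major=False):
    """Return the next label, e.g. '1.1' -> '1.2', or '2.0' for a major change."""
    major_part, minor_part = (int(part) for part in current.split("."))
    if major:
        return f"{major_part + 1}.0"
    return f"{major_part}.{minor_part + 1}"


def log_version(log_file, file_name, version, author, change, when=None):
    """Append one row describing a new version to an open CSV log."""
    when = when or date.today()
    csv.writer(log_file).writerow([when.isoformat(), file_name, version, author, change])


# An in-memory buffer stands in for the master log file in this example.
log = io.StringIO()
version = next_version("1.0")
log_version(log, "project-101-pcr-sample-001", version, "A. Researcher",
            "Removed outliers", date(2024, 3, 5))
print(log.getvalue())
```

In practice the log would be opened in append mode, for example open("version_log.csv", "a", newline=""), and stored alongside the data it describes.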
Researchers must also consider the longevity of the software/hardware required to create and analyse research data. It is important to include documentation relating to software/hardware requirements in the Research Data Management Plan. It may be necessary to include a copy of the software version including any related metadata together with the research data (depending on software licensing conditions).
There are numerous free or commercial storage tools available online which can aid researchers with version control management.
The Pawsey Supercomputing Centre facilitates the uptake of supercomputing, large scale data storage and visualisation in Western Australia. It is an unincorporated joint venture between CSIRO, Curtin University, Edith Cowan University, Murdoch University and The University of Western Australia and is supported by the Western Australian Government. For details of services see https://pawsey.org.au/.
UWA is also contributing datasets to the Australian Data Archive (ADA) for further analysis by researchers. ADA Data Access information and forms are available. Those interested in contributing their research to ADA can find information on how to do this on the ADA website.
Through the use of metadata standards, computer software is able to recall and combine metadata from several sources.
The most commonly used descriptive standard is Dublin Core, as it is flexible across disciplines and data formats (including non-digital). It includes elements such as Title, Creator, Subject, Date and Type. The UWA Profiles and Research Repository uses Dublin Core and MARC as its metadata standards.
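To show what a Dublin Core description can look like in machine-readable form, the sketch below builds a small XML record with Python's standard library. The namespace URI is that of the Dublin Core element set; the wrapper element 'metadata', the function name dublin_core_record and all field values are invented for illustration and do not reflect how the UWA repository stores its records.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set, version 1.1
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def dublin_core_record(fields):
    """Build a flat XML record with one dc: element per (name, value) pair."""
    root = ET.Element("metadata")  # hypothetical wrapper element
    for name, value in fields:
        element = ET.SubElement(root, f"{{{DC_NS}}}{name}")
        element.text = value
    return root


# Illustrative values only
record = dublin_core_record([
    ("title", "PCR results for project 101"),
    ("creator", "A. Researcher"),
    ("subject", "polymerase chain reaction"),
    ("date", "2024-03-05"),
    ("type", "Dataset"),
])
print(ET.tostring(record, encoding="unicode"))
```

Because every element carries the same standard namespace, software that understands Dublin Core can read the title or creator from this record without knowing anything else about the project.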
The Registry Interchange Format – Collections and Services (RIF-CS) schema is used to describe collections, parties, activities and services related to research data collections. UWA uses the RIF-CS schema to describe local research data collections which are then harvested into Research Data Australia (RDA). For more information about RIF-CS, please refer to the Australian National Data Service (ANDS) website.
An identifier is a label or reference number given to a data object and is integral to research data documentation and metadata. It is the responsibility of the researcher to ensure that the location information of the research data is kept current. Identifiers should be both persistent and unique. A unique identifier - such as a Uniform Resource Locator (URL) - which is not persistent may result in a broken link if the dataset is relocated. Persistent identifiers (PIDs) are kept current or redirected over specified time periods by the host. Digital Object Identifiers (DOIs), Persistent Uniform Resource Locators (PURLs) and the Handle System can each embed an identifier in a URL, which also ensures the PID is kept up to date.
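For example, a DOI is turned into a resolvable link by appending it to the https://doi.org/ resolver address, which redirects to wherever the dataset currently lives. The sketch below normalises the common ways a DOI is written into that form. The function name doi_to_url and the DOI used are hypothetical, and the check that a DOI begins with '10.' is only a basic sanity test, not full validation.

```python
from urllib.parse import quote

RESOLVER = "https://doi.org/"


def doi_to_url(doi):
    """Return the resolver URL for a DOI given as a bare DOI, 'doi:...' or a doi.org link."""
    doi = doi.strip()
    for prefix in ("https://doi.org/", "http://doi.org/", "http://dx.doi.org/", "doi:"):
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    if not doi.startswith("10."):
        raise ValueError(f"not a DOI: {doi!r}")
    # percent-encode any characters that are not safe in a URL path
    return RESOLVER + quote(doi, safe="/")


# Hypothetical DOI, for illustration only
print(doi_to_url("doi:10.1234/example-dataset-001"))
# https://doi.org/10.1234/example-dataset-001
```

Recording the DOI rather than the landing-page address in metadata means the reference stays valid even if the hosting repository reorganises its URLs.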
Metadata should include an explanatory description of the research data incorporating:
UWA’s Research Data Management Plan is available to researchers for download to aid in the creation of metadata descriptions for their research data.
Discipline-specific metadata standards
Social Sciences Data
2.6.1 Keep clear and accurate records of the research methods and data sources, including any approvals granted, during and after the research process.
Good quality documentation ensures that research data is:
Depending on the research discipline, documentation will have different requirements but as a general rule should include comprehensive metadata.
Edina Data Centre has developed a video which describes metadata and its benefits.