
Text and Data Mining

A guide for UWA staff and students on text and data mining

Resources

Not all resources permit text and data mining, and some require you to obtain permission before you proceed. Publishers may also charge a fee to text mine their resources.

Below you will find information on text and data mining for resources the UWA Library subscribes to. Please check each resource's text and data mining terms and conditions before proceeding. For further information, email staffsupport-lib@uwa.edu.au (UWA staff) or hdrsupport-lib@uwa.edu.au (HDR students).

Data Source Description Further information and access
Adam Matthew A data mining API is available by request.  An offline copy of data can also be provided on a hard drive for secure storage and analysis.  Please contact your faculty librarian team for further information. Adam Matthew Data Mining/ Text mining statement.
Brill ebooks, journals, databases and primary sources Text and data mining is permitted for UWA users without written permission. Please contact the Library or your faculty librarian team for further information on access.
British Online Archives Text and data mining of the licensed content is permitted without written permission, for legitimate academic research and other non-commercial educational purposes.  
De Gruyter De Gruyter will review applications for text and data mining on request. Please contact the Library for further information.
Gale: Primary Sources UWA users are permitted to use content from Gale Primary Sources for text and data mining. Data Mining, Textual Analytics, the Digital Humanities and Gale.
IEEE Xplore Register for an Xplore Metadata API key to access IEEE ebooks, journals, conference papers and some standards. To register for the API key, visit IEEE Xplore Interactive Documentation.
JSTOR Data for Research Provides a dataset to researchers for use in research and teaching.  Data available includes metadata, n-grams, and word counts for most articles and book chapters at no cost to the researcher and may include data for up to 25,000 documents (JSTOR, 2020). JSTOR Data for Research website
Knowledge Unlatched All content is published open access under various Creative Commons licences (usually CC BY). As long as the TDM process adheres to the applicable licence, text and data mining is permitted without further restrictions.  
Newsbank Text and data mining is not permitted without permission. Researchers seeking to undertake text mining on Newsbank content must first obtain the express written consent of Newsbank. For further information, please contact the Library.
Readex The publisher will consider text and data mining requests on a case-by-case basis.  Please note, there may be a charge and the publisher will provide this information on request.  For further information, please contact the Library.
Sage Journals Online UWA users are permitted to extract or use information contained in the Licensed Materials for educational, scientific or research purposes. SAGE text and data mining policy
ScienceDirect UWA users are permitted to access text and data mining services for non-commercial research purposes via an API. To request access to the API, visit Elsevier's Text and Data Mining FAQs.
Scopus UWA users are permitted to access text and data mining services for non-commercial research purposes via an API. To request access to the API, visit Elsevier's Text and Data Mining FAQs.

The following open sources are available on the web and permit text and data mining. 

Data Source Description Further information and access
arXiv An open-access repository of electronic preprints in the fields of mathematics, physics, astronomy, electrical engineering and computer science. arXiv provides bulk metadata and abstract access and the arXiv API. Visit the arXiv Bulk Data Access page for more information.
CORE Collection of open access research papers. Access to raw data for text mining.
Europe PMC An open science platform providing access to life science publications and preprints. Visit the Europe PMC Developer resources page to get access to the RESTful API and bulk downloads tools.
Google Books Contains an index of full-text books digitised by Google.   
HathiTrust Digital Library Contains over 17 million digitised resources for scholarly research. Visit Data Availability and APIs for information on bulk download options.
PLOS Open access publisher in the fields of science and medicine. PLOS provides several options for accessing its data via its Text and Data Mining page. See the API Display Policy for terms and conditions.
Project Gutenberg Contains a library of free ebooks. See the Project Gutenberg License and Permissions pages for information on what you can do with the data.
Wikidata Contains free multilingual open data that can be read and edited by both humans and machines (Wikidata.org). Please see the Wikidata: Data access page for bulk access to Wikidata content using the API.
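
As an illustration of what API access to one of these open sources can look like, the following is a minimal Python sketch for the arXiv API listed above. It assumes the public query endpoint at http://export.arxiv.org/api/query, which returns results as an Atom XML feed. The function names and the example search term are hypothetical and chosen only for illustration. Check arXiv's own API documentation and terms of use before sending repeated or bulk requests.

```python
from urllib.parse import urlencode
from urllib.request import urlopen
import xml.etree.ElementTree as ET

# Assumed public endpoint of the arXiv API (see the arXiv API documentation).
ARXIV_API = "http://export.arxiv.org/api/query"
# arXiv returns results as an Atom feed, so elements live in the Atom namespace.
ATOM_NS = {"atom": "http://www.w3.org/2005/Atom"}


def build_arxiv_query_url(search_query, start=0, max_results=10):
    """Build a query URL from a search string and paging parameters."""
    params = {
        "search_query": search_query,
        "start": start,
        "max_results": max_results,
    }
    return f"{ARXIV_API}?{urlencode(params)}"


def parse_entries(atom_xml):
    """Extract the title and abstract of each entry in an Atom feed."""
    root = ET.fromstring(atom_xml)
    entries = []
    for entry in root.findall("atom:entry", ATOM_NS):
        title = entry.findtext("atom:title", default="", namespaces=ATOM_NS)
        summary = entry.findtext("atom:summary", default="", namespaces=ATOM_NS)
        # Collapse line breaks and repeated spaces inside the text.
        entries.append({
            "title": " ".join(title.split()),
            "summary": " ".join(summary.split()),
        })
    return entries


def fetch_entries(search_query, max_results=10):
    """Query the API over the network and return the parsed entries."""
    url = build_arxiv_query_url(search_query, max_results=max_results)
    with urlopen(url, timeout=30) as response:
        return parse_entries(response.read())
```

Calling fetch_entries("all:text mining", max_results=5) would request up to five matching records and return their titles and abstracts as a list of dictionaries, ready for further text analysis.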

 

CONTENT LICENCE

 Except for logos, Canva designs, AI generated images or where otherwise indicated, content in this guide is licensed under a Creative Commons Attribution-ShareAlike 4.0 International Licence.