The following open data sources are available on the web and permit text and data mining.
| Data Source | Description | Further information and access |
|---|---|---|
| arXiv | An open-access repository of electronic preprints in the fields of mathematics, physics, astronomy, electrical engineering and computer science. | arXiv provides bulk access to metadata and abstracts, as well as the arXiv API. Visit the arXiv Bulk Data Access page for more information. |
| CORE | A collection of open access research papers. | Provides access to raw data for text mining. |
| Europe PMC | An open science platform providing access to life science publications and preprints. | Visit the Europe PMC Developer resources page to get access to the RESTful API and bulk download tools. |
| Google Books | Contains an index of full-text books digitised by Google. | |
| HathiTrust Digital Library | Contains over 17 million digitised resources for scholarly research. | Visit the Data Availability and APIs page for information on bulk download options. |
| PLOS | Open access publisher in the fields of science and medicine. | PLOS provides several options for accessing its data via its Text and Data Mining page. See the API Display Policy for terms and conditions. |
| Project Gutenberg | Contains a library of free ebooks. | See the Project Gutenberg License and Permissions pages for information on what you can do with the data. |
| Wikidata | Contains free, multilingual open data that can be read and edited by both humans and machines (Wikidata.org). | See the Wikidata:Data access page for bulk access to Wikidata content using the API. |
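
Several of the sources above expose an API alongside bulk downloads. As an illustration only, the following is a minimal Python sketch of how a query to the arXiv API might be built and how its Atom response could be parsed. It assumes the public endpoint is `http://export.arxiv.org/api/query` with the parameters `search_query`, `start` and `max_results`; the function names and the sample feed are hypothetical and written for this example. Check each provider's own documentation and terms of use before harvesting any data.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Assumed endpoint of the public arXiv API; confirm it on the arXiv API pages.
ARXIV_API = "http://export.arxiv.org/api/query"
# The arXiv API returns results as an Atom feed, so entries live in this namespace.
ATOM_NS = {"atom": "http://www.w3.org/2005/Atom"}


def build_arxiv_query(search_query, start=0, max_results=10):
    """Return a query URL for the arXiv API (hypothetical helper)."""
    params = {
        "search_query": search_query,
        "start": start,
        "max_results": max_results,
    }
    return ARXIV_API + "?" + urlencode(params)


def parse_atom_entries(feed_xml):
    """Extract the title and summary (abstract) of each entry in an Atom feed."""
    root = ET.fromstring(feed_xml)
    entries = []
    for entry in root.findall("atom:entry", ATOM_NS):
        title = entry.findtext("atom:title", default="", namespaces=ATOM_NS)
        summary = entry.findtext("atom:summary", default="", namespaces=ATOM_NS)
        # Collapse the line breaks and indentation that feeds often contain.
        entries.append({
            "title": " ".join(title.split()),
            "summary": " ".join(summary.split()),
        })
    return entries


# Hand-written placeholder feed for demonstration; not a real arXiv response.
SAMPLE_FEED = """<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <title>An example
      preprint title</title>
    <summary>A short placeholder abstract.</summary>
  </entry>
</feed>"""

if __name__ == "__main__":
    print(build_arxiv_query("all:electron", max_results=5))
    for item in parse_atom_entries(SAMPLE_FEED):
        print(item["title"], "|", item["summary"])
```

To fetch real results, the URL returned by `build_arxiv_query` could be passed to `urllib.request.urlopen` and the response body handed to `parse_atom_entries`. Keeping URL construction and parsing separate from the network call makes the sketch easy to test offline and to adapt to other Atom-based services.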