Artificial Intelligence (AI): Evaluating Gen AI

Evaluating Generative AI

Whilst Gen AI can be a useful tool to support your studies, it is not without limitations. There are many types of Gen AI and each comes with its own advantages and disadvantages. Gen AI is a tool like any other, so it is important to select the right tool for the job. By evaluating the outputs produced, you can identify inaccuracies, misinformation, disinformation, and bias.

Before you begin:

  • Check you are permitted to use Gen AI in your studies or research
  • Read the information in the Using Gen AI Tools tab

This page of the guide aims to help you:

  • Think critically about Gen AI tools and outputs
  • Assess suitability and appropriateness

How to Evaluate Gen AI

A wooden desk cluttered with books and stationery. Sitting in the middle of the desk is a laptop. On the laptop screen is an image of a cartoon-like blue robot on a black background. Fig 1. Image generated using Microsoft Edge's Copilot, powered by DALL-E, from the prompt "a desk with an AI device on it".

A good place to start is to apply the same principles you would use when assessing any other source of information or tool. There are a number of tests that can help you evaluate Gen AI.

Tests to evaluate Gen AI:

  • The ROBOT test prompts you to analyse AI with thoughtful questions
  • A SWOT analysis is a framework to assess suitability

Tests to evaluate outputs:

  • The EVERY test guides you through five steps to follow when using generative AI to produce outputs
  • The CRAAP test can help you evaluate any source of information

ROBOT Test

The ROBOT test is comprehensive, and was designed to help you think critically about AI. It was created by Amanda Wheatley and Sandy Hervieux from McGill University. You can find more resources on the LibrAIry blog.

Choose a Gen AI tool and try to answer the questions in each section of the test.

 

Reliability

  • How reliable is the information available about the AI technology? 
  • If it’s not produced by the party responsible for the AI, what are the author’s credentials? Bias? 
  • If it is produced by the party responsible for the AI, how much information are they making available?  
  • Is information only partially available due to trade secrets? 
  • How biased is the information that they produce? 

 

Tips: Try to find information about the generative AI tool on the website of the company, organisation or individual who made it. Is the information about the tool easy to locate, and is it clear? Additionally, consider whether the generative AI tool has been designed to produce reliable information or content, and try to fact-check the output.

Objective

  • What is the goal or objective of the use of AI? 
  • What is the goal of sharing information about it? 
    • To inform? 
    • To convince? 
    • To find financial support? 

 

Tip: Consider the purpose the AI tool was made for. Does this align with how you are using it? For example, ChatGPT was made by a company called OpenAI. What is the purpose of this company, and what is its mission? What did it intend users to do with the tool it created?

Bias

  • What could create bias in the AI technology? 
  • Are there ethical issues associated with this? 
  • Are bias or ethical issues acknowledged? 
    • By the source of information? 
    • By the party responsible for the AI? 
    • By its users?

 

Tips: Read the Ethical Considerations page of this guide for more information on bias and other ethical issues.

Search the library's catalogue, OneSearch, with the following search string, and use filters to limit the results:

  • Bias AND "Generative AI"

Ownership

  • Who is the owner or developer of the AI technology? 
  • Who is responsible for it? 
    • Is it a private company? 
    • The government? 
    • A think tank or research group? 
  • Who has access to it? 
  • Who can use it? 

Tips: Consider the motivations of the owner. How might these impact the technology?

 

Type

  • Which subtype of AI is it? 
  • Is the technology theoretical or applied? 
  • What kind of information system does it rely on? 
  • Does it rely on human intervention? 

Tips: These questions may be applied to AI in the broader sense, as well as more specifically to generative AI. For example, if you want to use AI to search for information, it might be best to find a type of generative AI that uses a search engine and can give you links to the resources used in the content it generates.

 

Attribution:

Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

To cite in APA: Hervieux, S., & Wheatley, A. (2020). The ROBOT test [Evaluation tool]. The LibrAIry. https://thelibrairy.wordpress.com/2020/03/11/the-robot-test

 

 

SWOT Analysis

  • Will using the tool improve my work?
  • Is the output generated immediately?
  • Does the output relate to what I am working on?
  • Will over-reliance compromise my own learning, understanding, critical thinking skills, creativity or work ethic?
  • Will I incur a fee if I want to use the tool?
  • How effective is the tool? Are there known risks and pitfalls associated with use? Are there any privacy concerns?
  • Will using the tool improve my work and enhance my learning?
  • In what way can the tool assist my learning?
  • Will failing to fully disclose my use of AI result in academic misconduct?
  • Do I really know if the output is factual and unbiased?
  • Will using this tool remove or hinder my learning opportunities?

EVERY Evaluation Tool

The EVERY tool provides guidance on how to use AI responsibly with 5 steps to follow.

  • Try the EVERY tool on the AI for Education website.

Is it CRAAP?

Is it CRAAP? can be used to evaluate information resources. Similar principles may be applied to Gen AI outputs. Remember to always fact-check outputs and follow up on topics by finding academic sources for your assignments.

Currency

  • How old is the generative AI tool?
    • Are there revised or updated versions that are more relevant?
    • How old is the dataset the tool was trained on? How does this impact the output?

Relevance

  • Is the information fit for purpose?
    • Has the prompt produced relevant content?
  • Who is the intended audience of the generative AI tool?
    • What is the tool intended for, and is this reflected in the output?
  • Is the information at an appropriate level?
    • Can you verify the output with more relevant scholarly sources?
  • Would you be comfortable citing this as a source in your work?

Accuracy

  • Is the output accurate?
    • Is it supported by evidence?
    • If there are any citations, do they actually exist?
    • Has the tool hallucinated?
  • Where does the information come from?
  • Has the information been reviewed or refereed?
  • Can you verify any of the information in another source or from personal knowledge?
  • Does the output seem unbiased and free of emotion?
  • Are there spelling, grammar or typographical errors?

Authority

  • Who created the generative AI tool?
    • Do you know the source of the output that was generated?

Purpose

  • What is the purpose of the tool and how is this evident in the output?
    • What was the generative AI made to do?
  • Does the organisation or company responsible for creating the tool make its purpose or mission clear?
    • Has the tool been made to generate verifiable facts, or does its output read as opinion, propaganda or advertising?
  • Are there political, social, ideological, cultural, religious, institutional or personal biases present?

CONTENT LICENCE

 Except for logos, Canva designs, AI generated images or where otherwise indicated, content in this guide is licensed under a Creative Commons Attribution-ShareAlike 4.0 International Licence.