It’s May 2025: The test development process is changing fast, particularly when it comes to content creation.
Generative artificial intelligence (GenAI) is altering traditional methods at a number of test publishing companies, allowing for faster and more efficient creation of test content across different industries, including education and corporate training. In this blog post, we’ll take a look at how GenAI is, or can be, applied to creating the main content of test items, developing supplementary materials and refining the process of reviewing test items. We’ll also look at the critical role of human quality assurance when leveraging AI tools.
GenAI can be a powerful tool for producing test content. Its speed and its capacity to produce multiple choice questions in large volumes are already widely acknowledged. GenAI is not limited to multiple choice questions, though; it can create many other types of test items. Less standard item types typically require more detailed prompts and examples, while less structured content, such as essay prompts or reading passages, is an area where GenAI can be particularly useful. The data used to train a model, however, can significantly affect the quality and potential biases of the test items it produces. There is also a risk of generating content that is outdated, overly simplistic or even inaccurate, and GenAI does not automatically indicate how confident it is in its output. GenAI’s productivity gains are tempting for many leaders, but adopting it is not like flipping a switch: incorporating GenAI into test development requires careful and thoughtful process calibration, which must include quality assurance.
Traditionally, subject matter experts provide the reasons why correct answers are correct and incorrect answers are incorrect. When GenAI creates new multiple choice questions, it can also be asked to generate these rationales, which can help subject matter experts during review and highlight potential issues with the generated items. For certain types of assessments, the rationales provided to test takers are a critical part of the test material.
GenAI can also automate certain aspects of the process of reviewing test items. It can evaluate items to ensure they follow established rules, meet psychometric quality standards and adequately cover the intended content. At the same time, it’s crucial to have human experts verify the analyses performed by GenAI and to establish clear policies for its use in the item review process.
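For readers who want a concrete picture of checking items against established rules, here is a minimal sketch in Python. It is not a GenAI tool and not a description of any particular publisher's process; it only shows the kind of simple, deterministic check that might run alongside an AI review before items go to human experts. The `Item` structure, the `check_item` function and the specific rules (at least three options, no duplicate options, no "all of the above" style options, one rationale per option) are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    """A hypothetical multiple choice item with one rationale per option."""
    stem: str
    options: list[str]
    answer_index: int
    rationales: list[str] = field(default_factory=list)


# Illustrative rule only; real item-writing guidelines vary by program.
BANNED_OPTIONS = {"all of the above", "none of the above"}


def check_item(item: Item) -> list[str]:
    """Return a list of rule violations; an empty list means no issues found."""
    issues = []
    if not item.stem.strip():
        issues.append("empty stem")
    if len(item.options) < 3:
        issues.append("fewer than 3 options")
    if not 0 <= item.answer_index < len(item.options):
        issues.append("answer key out of range")
    normalized = [option.strip().lower() for option in item.options]
    if len(set(normalized)) != len(normalized):
        issues.append("duplicate options")
    if any(option in BANNED_OPTIONS for option in normalized):
        issues.append("uses a banned option")
    if len(item.rationales) != len(item.options):
        issues.append("missing rationale for one or more options")
    return issues


if __name__ == "__main__":
    draft = Item(
        stem="Which planet is closest to the sun?",
        options=["Venus", "Mercury", "None of the above"],
        answer_index=1,
    )
    # Anything flagged here would be routed to a human reviewer.
    print(check_item(draft))
```

A check like this can only catch mechanical problems. Whether an item is accurate, fair and appropriate for its audience still has to be judged by a person.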
While GenAI presents opportunities for increased efficiency and new capabilities in how test content is created and reviewed, the expertise and oversight of humans remain essential. And that’s where we can help.
Responsive Translation has long partnered with test publishing companies to ensure high-quality, reliable testing content, and we now also offer AI quality assurance services for testing content in English and other languages. We provide a vital layer of human review, with a range of services designed to help ensure accuracy, clarity, appropriateness and efficiency while keeping bias firmly at bay. For more information, please get in touch at [email protected] or request a custom proposal here.