AI Content Leveling Tool
Altering the reading level of reference content with an LLM
Domain: AI, Differentiation, Education, K-12, Databases
My Role: Research, requirement definition, design, testing
Project Time: 6 months
This AI leveling tool supports teachers' differentiation needs by allowing them to change the Lexile level of Gale's reference content. The Lexile score is an educator-standard measure of text complexity, and Gale's content skews toward the high end. Teachers need the ability to offer the same text at a variety of reading levels, covering the same topics with lower language complexity.
Problem Space
With text transformation falling squarely within the capabilities of an LLM, Gale contracted with an Amazon team to build a proof of concept for Social Studies and English Language Arts content. The Amazon team delivered a set of prompts able to both raise and lower the Lexile level of targeted Gale texts. Claude 3 Sonnet, Claude 3.5 Sonnet, and Claude 3 Opus were tested; Opus performed best, but Claude 3.5 Sonnet offered the best value. Once the concept was proven technically viable, I led an effort to better understand how teachers interacted with existing tools in the space while simultaneously designing a testing sandbox where subject matter experts could further validate the performance of our PoC models.
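For illustration only, here is a minimal sketch of what a leveling call might look like using the Anthropic Python SDK. The prompt wording, target level, and function name are placeholders, not the actual prompts the Amazon team delivered:

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment


def level_text(source_text: str, target_lexile: int) -> str:
    """Rewrite source_text at roughly the requested Lexile level.

    Placeholder prompt; the production prompts were far more elaborate.
    """
    response = client.messages.create(
        model="claude-3-5-sonnet-20240620",
        max_tokens=4096,
        messages=[{
            "role": "user",
            "content": (
                f"Rewrite the following article at approximately a "
                f"{target_lexile}L Lexile level. Preserve every fact and "
                f"the full topic coverage; change only the language "
                f"complexity.\n\n{source_text}"
            ),
        }],
    )
    return response.content[0].text
```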
Problem #1
Gale’s encyclopedia content is too academic
Encyclopedic topic overviews are extremely useful, but only when their content matches a student's reading level
Problem #2
Teachers need differentiated reading levels
To accommodate a range of student reading abilities, teachers need content at several reading levels that covers the same topics
Performance Testing
I pushed for the creation of a testing sandbox where teachers could evaluate the prompt outputs on a curated set of documents. I designed an experience where teachers were paid to evaluate various attributes of documents transformed into several different Lexile levels. The insights gained from the process were critical for tweaking the prompt, identifying its shortcomings, and building a basic understanding of how teachers evaluate AI-leveled content.
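As a rough illustration of the kind of structured feedback such a sandbox might collect, here is a sketch in Python. The rubric fields are assumptions for illustration, not the actual instrument:

```python
from dataclasses import dataclass


@dataclass
class LevelingEvaluation:
    """One teacher's rating of one leveled document.

    Field names are illustrative; the real rubric covered attributes
    such as factual fidelity and fit to the target reading level.
    """
    document_id: str
    source_lexile: int        # Lexile score of the original article
    target_lexile: int        # level the prompt was asked to produce
    accuracy_rating: int      # 1-5: facts preserved from the source
    level_fit_rating: int     # 1-5: reads at the intended Lexile level
    comments: str = ""


# Hypothetical example: a teacher flags a 610L rewrite that drifted too simple
review = LevelingEvaluation(
    document_id="gale-ss-0042",
    source_lexile=1210,
    target_lexile=610,
    accuracy_rating=5,
    level_fit_rating=3,
    comments="Reads closer to 400L; key vocabulary was over-simplified.",
)
```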
Design and Usability Testing
I collaborated with another UX designer on my team to create and test several prototype workflows. We uncovered several critical usability issues the team hadn't anticipated, as well as some new user requirements. Working under significant time pressure, we adapted the designs to accommodate shortcomings of the model. Multiple rounds of usability testing with interactive Axure prototypes revealed a minimum viable feature set and important roadmap items for the future, including in-app editing of leveled documents.
Usability Testing Insight #1
Teachers want to edit leveled documents in-app
Teacher use cases frequently require making formatting changes or excerpting the leveled text. Because the future of the platform is assigning documents directly to students, our documents need this additional flexibility.
Usability Testing Insight #2
LLM performance caused a big design shift
A combination of long documents, prompt complexity, and the lack of a dedicated model pipeline created situations where users had to wait roughly five minutes for their document. Switching to progressively revealing the text as it was generated significantly reduced user friction.
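One plausible way to implement that progressive reveal, sketched with the Anthropic SDK's streaming interface; the UI plumbing is omitted and the function name is illustrative:

```python
import anthropic

client = anthropic.Anthropic()


def stream_leveled_text(source_text: str, target_lexile: int):
    """Yield text chunks as the model produces them, so the UI can
    render the leveled document progressively rather than blocking
    until the full generation finishes."""
    with client.messages.stream(
        model="claude-3-5-sonnet-20240620",
        max_tokens=4096,
        messages=[{
            "role": "user",
            "content": (
                f"Rewrite the following article at approximately a "
                f"{target_lexile}L Lexile level.\n\n{source_text}"
            ),
        }],
    ) as stream:
        for chunk in stream.text_stream:
            yield chunk  # push each chunk to the browser, e.g. over SSE
```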