Why we've upgraded the AI behind Starlight's lesson analysis

Adam Sturdee
May 7

We've moved Starlight's lesson transcript analysis onto a stronger model this week.

The decision didn't come from a benchmark table. It came from sitting down with two sets of reports, generated from the same classroom transcripts, and asking a simple question: which one would a teacher actually find useful?

That's the test that matters to us, and it's a harder test than it sounds.

What lesson transcript analysis actually involves

A classroom transcript is not a tidy document. It has overlapping voices, half-finished sentences, instructions cutting across discussion, behaviour reminders, jokes, silences, hesitant answers, and the occasional moment where a student says something genuinely revealing and the teacher has a fraction of a second to decide what to do with it.

Reading that, and producing a meaningful coaching report from it, is harder than reading a polished lesson plan and commenting on it.

A weaker model can manage the surface layer. It can identify the topic, name the activities, and recognise the broad pedagogical moves: questioning, modelling, retrieval practice, behaviour management. That's the part teachers don't really need help with, because they were there.

The part teachers do need help with is harder to see in the moment. Did the question I asked actually open up thinking, or did it just check whether the answer was there? Did I give that hesitant student long enough? Was my explanation tight, or did I drift? Did the way I responded to a wrong answer keep the door open or quietly close it?

Those are the calls a coaching report needs to make well. They are also the calls that separate a useful AI tool from a generic one.

What we noticed in testing

Across a wide range of real lesson transcripts, the new model is noticeably stronger on the judgements that matter for coaching.

It picks up subtler patterns: the difference between rehearsal questioning and genuine inquiry, the way a teacher's pacing shifts when the room is engaged versus when it's drifting, the moments where a student is on the edge of understanding and a small adjustment from the teacher would tip them over.

It handles ambiguity more honestly. Audio is rarely perfect, and some lessons have moments where it's genuinely unclear what was said or what a student meant. The new model is more willing to flag that uncertainty rather than guess, which we'd much rather see than confident-sounding fiction.

The next steps it suggests sit closer to coaching practice. Instead of telling a teacher to "increase student discussion", it's more likely to suggest something specific: a short rehearsal phase before cold calling, more deliberate wait time after a higher-order question, a planned revisit of a key idea later in the lesson.

It also holds the whole lesson in view. Long transcripts can defeat weaker models, which start to lose coherence the further into the lesson they get. The newer one stays in the lesson from start to finish, and the report reflects that.

Why we keep testing models

The biggest risk for any AI coaching platform isn't that the feedback is wrong. It's that it feels generic.

If a teacher reads their report and it could have been written about almost any lesson, the report stops being useful. They don't come back the following week. The platform becomes a curiosity rather than part of the professional rhythm.

That's why we test new models against real lesson transcripts whenever they are released, and why choosing a model isn't something we do once and walk away from. The technology is moving quickly, and the best option today may not be the best option in six months.
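For readers curious about the mechanics, a side-by-side comparison like this can be sketched in a few lines. This is a minimal illustration and not Starlight's actual tooling: the function names, the "current" and "candidate" labels, and the idea of recording a reviewer's pick as a position (0 or 1) are all assumptions made for the example. The point it shows is that the reviewer sees two reports per transcript in shuffled order, so the preference is recorded without knowing which model wrote which.

```python
import random
from collections import Counter


def blind_pairs(current_reports, candidate_reports, seed=0):
    """Pair up reports per transcript and shuffle their order.

    Both arguments are dicts mapping a transcript id to report text.
    All names here are hypothetical, for illustration only.
    """
    rng = random.Random(seed)
    pairs = []
    for transcript_id in sorted(current_reports):
        entries = [
            ("current", current_reports[transcript_id]),
            ("candidate", candidate_reports[transcript_id]),
        ]
        rng.shuffle(entries)  # hide which model produced which report
        pairs.append((transcript_id, entries))
    return pairs


def tally(pairs, choices):
    """Count preferences per model.

    choices maps a transcript id to the position (0 or 1) of the
    report the reviewer found more useful.
    """
    wins = Counter()
    for transcript_id, entries in pairs:
        label, _report = entries[choices[transcript_id]]
        wins[label] += 1
    return wins
```

The reviewer only ever sees the report text in each pair; the labels are looked up afterwards when the picks are tallied.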

We don't ask schools to track this. They shouldn't have to. Comparing models, reading release notes, and deciding which one handles a Year 8 science lesson or a sixth form seminar best is our job, not theirs. Schools get the result, which is a platform that quietly keeps getting better.

Cost, and where it sits in the decision

Running a stronger model costs more per lesson. We've absorbed that.

Cost matters in any sustainable business, and we don't pretend otherwise. But on the core task, which is reading a classroom transcript and turning it into something a teacher will actually use, we're not optimising for the cheapest answer. We're optimising for the best.

Lighter workflows, such as quick summaries, tagging, and metadata extraction, will continue to use models that are well suited to those tasks and cost less to run. That's good engineering rather than a compromise. The point is that whole-lesson analysis sits in a different category, and we treat it that way.
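As a rough picture of what that split looks like in practice, here is a minimal sketch of routing tasks to model tiers. It is an illustration under stated assumptions, not a description of Starlight's configuration: the task names are taken from the paragraph above, while the tier labels "strong-model" and "light-model" and the function pick_model are hypothetical placeholders.

```python
from enum import Enum


class Task(Enum):
    WHOLE_LESSON_ANALYSIS = "whole_lesson_analysis"
    QUICK_SUMMARY = "quick_summary"
    TAGGING = "tagging"
    METADATA_EXTRACTION = "metadata_extraction"


# Hypothetical tier labels; placeholders, not real model identifiers.
STRONG_TIER = "strong-model"
LIGHT_TIER = "light-model"

# Whole-lesson analysis gets the stronger tier; lighter workflows
# use a tier that suits them and costs less to run.
MODEL_FOR_TASK = {
    Task.WHOLE_LESSON_ANALYSIS: STRONG_TIER,
    Task.QUICK_SUMMARY: LIGHT_TIER,
    Task.TAGGING: LIGHT_TIER,
    Task.METADATA_EXTRACTION: LIGHT_TIER,
}


def pick_model(task: Task) -> str:
    """Return the model tier configured for the given task."""
    return MODEL_FOR_TASK[task]
```

Keeping the mapping in one place means that swapping the model behind whole-lesson analysis is a one-line change, which fits with re-testing as new models arrive.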

What teachers and schools should notice

The change is focused specifically on lesson transcript analysis. Over the next few uploads, teachers should find that the reports feel sharper.

The strengths identified should sit more closely against what actually happened in the room. The next steps should feel more like the kind of thing a thoughtful coach might say after watching the lesson, rather than advice that could apply to anyone. Where the transcript leaves something genuinely unclear, the report should say so rather than overreach.

What hasn't changed is the principle behind the report. Starlight is built around teacher autonomy. The report is private. The reflection is the teacher's. The next step is the teacher's. The model's role is to read the lesson carefully and offer a clearer view of what happened, so that the professional doing the work has better material to think with.

That's what we're working towards, and it's why we'll keep doing this. The model behind the platform will keep changing. The commitment to using the best one for this particular job won't.

If you'd like to see what a Starlight report looks like for a lesson in your own school, you can book a demo at https://starlightmentor.com/demo-request.

Spark Insight with Starlight, and experience a coaching platform that keeps getting sharper.

🎥 Subscribe to our channel here: https://www.youtube.com/@Star21-ai

🌐 Read more on our blog: www.coaching.software

💡 Explore the platform: www.starlightmentor.com

🐦 Follow us on X: @star21starlight

The Insight Engine is written by Adam Sturdee, co-founder of Starlight, the UK’s first AI-powered coaching platform, and a senior leader with responsibility for teaching, learning and coaching. This blog is part of a wider mission to support educators through meaningful reflection, not performance metrics. It documents the journey of building Starlight from the ground up, and explores how AI, when shaped with care, can reduce workload, surface insight, and help teachers think more deeply about their practice. Rooted in the belief that growth should be private, professional, and purposeful, The Insight Engine offers ideas and stories that put insight—not judgment—at the centre of development.

🔗 Connect with me on LinkedIn: https://www.linkedin.com/in/adam-sturdee-b0695b35a/
