Blending Human Judgement and AI Insight: A Smarter Coaching Model
- Adam Sturdee

There is a temptation, when a new coaching tool arrives in a school, to treat it like a shortcut.
Upload a lesson. Receive a report. Tick a box. Move on.
But one school I spoke to recently is doing something far more interesting. They are using Starlight as a mirror, not a measure. And in doing so, they are quietly building a mentoring programme that blends human professional judgement with AI insight in a way that feels calm, safe, and genuinely developmental.
I am sharing their approach here anonymously because it deserves to be copied.
They started with a single question
At the centre of their programme is a simple enquiry:
What does Starlight notice that we do not, and what do we notice that Starlight cannot?
That question changes everything.
It moves the conversation away from whether the tool is “right” or “wrong”, and towards something more useful: comparison, interpretation, and professional learning.
In other words, the report becomes a prompt for dialogue, not a verdict on practice.
They used peer observation to reduce pressure
The school anticipated something that many leaders underestimate.
If you launch coaching as a formal observation programme, you often trigger defensiveness. People brace themselves. They perform. Or they avoid.
So instead, they reframed the entire process.
Staff were paired deliberately, not by hierarchy, but by trust.
Pairs were chosen based on comfort and existing working relationships. In many cases, staff observed someone outside their department, precisely to reduce the feeling of judgement or subject-specific scrutiny.
The message was clear: this is not accountability. It is a shared exploration of feedback.
That small design decision made participation feel safer, which in turn increased buy-in.
They removed the prompts on purpose
This was the most intelligent move.
Rather than giving observers a tightly structured checklist, the school provided a blank observation proforma with no pre-set headings like “questioning” or “adaptive teaching”.
Why?
Because they wanted to find out what teachers naturally attend to when they watch each other teach.
That creates valuable information at a whole-school level.
If thirty observations happen and almost nobody comments on behaviour, that tells you something about your culture. If most observers comment on explanation clarity, or pace, or challenge, that tells you where professional attention is already focused.
It is a kind of internal diagnostic, generated by teachers themselves, not imposed from above.
They used Starlight as the third voice in the room
Each pair ran a simple comparison process:
One teacher recorded a lesson and uploaded it to Starlight.
A colleague observed at least thirty minutes of the same lesson.
After the lesson, they compared what the human observed with what Starlight surfaced.
They did not treat either as superior.
Instead, they treated both as partial.
The observer may notice non-verbal cues, low-level behaviour, or a pupil disengaging at the back. Starlight may notice patterns in questioning, teacher talk ratio, tone shifts, or missed opportunities for metacognitive modelling.
The coaching conversation lives in the gap between those two perspectives.
That is where the learning sits.
They built psychological safety in public
During their staff training session, they did something leaders sometimes shy away from.
They printed a small number of Starlight reports and placed them on tables for discussion.
Not to judge the teacher, but to critique the feedback.
Staff explored questions like:
What in the transcript might have led the model to say this?
Does this align with what we would see in the room?
Where might the model be limited?
What parts feel helpful, and what parts need interpretation?
The impact was powerful.
It normalised the idea that AI feedback can be questioned, examined, and used selectively.
It also quietly reassured staff that this is not a surveillance tool. It is a reflective tool.
And it modelled something important: you are allowed to disagree with the report.
They treated AI limitations as a feature, not a problem
One of the most mature aspects of their approach was how openly they acknowledged the limits of transcript-based analysis.
They were clear that Starlight does not have eyes in the room. It cannot see a pupil’s facial expression, a hand half raised, or a subtle moment of confusion.
But rather than using that as a reason to dismiss the tool, they used it to sharpen professional judgement.
They framed it like this:
Starlight can show you patterns in language, structure, and discourse. Humans can see context, relationships, and the full complexity of behaviour and learning. When you put them together, you get better coaching than either can provide alone.
That mindset avoids both extremes: blind trust in AI, or blanket rejection.
They are using the data to co-build a bespoke template
This is where their programme becomes strategic.
They are not rushing to write a bespoke Starlight template based on leadership assumptions.
Instead, they are collecting a body of observation comparisons across the school first.
When those documents come in, they will analyse the themes:
What do teachers most frequently notice in each other’s practice?
What does Starlight tend to highlight consistently?
Where do the two align, and where do they diverge?
What teaching priorities are emerging organically from the staff body?
Only then will they co-design a bespoke template that reflects both the school’s priorities and the staff’s professional attention.
That matters, because ownership changes everything.
When teachers feel they helped build the lens, they are more likely to look through it.
The hidden win: recording becomes a reflective act
One of the most subtle benefits they reported is this:
Teachers begin to teach differently the moment they choose to record.
Not in a performative way. In a mindful way.
Putting the lanyard on, pressing record, and uploading a lesson is a small act of deliberate practice. It signals: I am paying attention to what I say and how I say it.
Over time, those small moments accumulate into habit.
And habit becomes culture.
What leaders can steal from this approach
If you are building a mentoring programme that blends human coaching and AI feedback, here are the moves worth copying:
Pair staff by trust, not hierarchy.
Keep the purpose developmental and explicitly separate from QA.
Remove tight observation prompts to reveal what teachers naturally attend to.
Compare human observations with Starlight reports and coach in the gap.
Make reports discussable and open to critique, not beyond question.
Name AI limitations openly without becoming defensive.
Use early data to co-build a bespoke template, rather than imposing one.
Treat recording as a reflective ritual, not a data collection step.
This is what it looks like when a school uses AI with maturity.
Not as a replacement for professional judgement, and not as a threat to it.
As a tool that helps teachers notice more, together.
Spark insight with Starlight. Engineer progress through reflection.
🎥 Subscribe to our channel here: https://www.youtube.com/@Star21-ai
🌐 Read more on our blog: www.coaching.software
💡 Explore the platform: www.starlightmentor.com
🐦 Follow us on X: @star21starlight
The Insight Engine is written by Adam Sturdee, co-founder of Starlight—the UK’s first AI-powered coaching platform—and Assistant Headteacher at St Augustine’s Catholic College. This blog is part of a wider mission to support educators through meaningful reflection, not performance metrics. It documents the journey of building Starlight from the ground up, and explores how AI, when shaped with care, can reduce workload, surface insight, and help teachers think more deeply about their practice. Rooted in the belief that growth should be private, professional, and purposeful, The Insight Engine offers ideas and stories that put insight—not judgment—at the centre of development.