AI Coaching and Teacher Reflection: What We Took to BERA TEAN 2026
- Adam Sturdee
- May 22

Insights from building the UK's teacher-first transcript-based lesson analysis platform.
This week we presented our research at the BERA Teacher Education and Action Network (TEAN) Conference at Sheffield Hallam University. The talk drew together two years of work in classrooms, a year of structured pilot research, and an extended deployment now reaching teachers across the UK, Europe and Asia. This post shares the substance of what we put forward, and what we are taking away.
The coaching paradox
Coaching works. A meta-analysis of sixty causal studies by Kraft, Blazar and Hogan (2018) reports an improvement of +0.49 SD in instructional practice and +0.18 SD in pupil achievement from teacher coaching. These are among the strongest effect sizes in professional development.
The trouble is that coaching does not scale.
On a typical UK union allowance of no more than three hours of classroom observation per teacher each year, roughly 0.3% of lessons are seen by another professional and result in meaningful feedback. As Assistant Headteacher for Teaching and Learning, I have lived inside that constraint. A coaching programme of sixty-plus teachers, one observation per term, and even that was a best case. Cover. Pressing demands. Workload. The teachers we most needed to see were the ones we most often did not get to.
The ambition was never the problem. The capacity was.
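As a rough check on the 0.3% figure above, here is a minimal sketch of the arithmetic. Only the three-hour allowance comes from this post. The timetable of 25 taught hours a week over 39 weeks is an illustrative assumption, and the function name is hypothetical.

```python
def observed_share(observed_hours: float,
                   taught_hours_per_week: float,
                   teaching_weeks: int) -> float:
    """Fraction of a teacher's annual taught time that is observed."""
    total_taught_hours = taught_hours_per_week * teaching_weeks
    return observed_hours / total_taught_hours


# Three hours of observation is the allowance quoted in the post.
# 25 hours a week and 39 weeks are assumed values, not data from the study.
share = observed_share(observed_hours=3,
                       taught_hours_per_week=25,
                       teaching_weeks=39)
print(f"Observed share of taught time: {share:.2%}")
```

A lighter or heavier timetable moves the result, but under most plausible assumptions it stays well below 1%.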
What teachers deserve
The premise of the work is straightforward. Every teacher deserves what every professional athlete demands: regular, high-quality feedback. Drawing on the EEF's *Effective Professional Development* guidance, we frame quality as four interlocking conditions:
- Specific – tied to observed practice
- Timely – soon enough to act on
- Actionable – a small number of next steps
- Regular – frequent enough to build momentum
We call this STAR feedback. It is the standard we ask Starlight to meet on every report.
Transcript-Based Lesson Analysis: a sixty-year lineage
The approach we have built sits in a long tradition. Flanders began systematic coding of classroom talk in the 1960s. Sinclair and Coulthard identified the Initiation-Response-Feedback (IRF) sequence in 1975, showing that dialogic intention is most exposed in the actual record, not in memory. Cazden and Schön in the 1980s, and Mercer and Alexander through the 1990s and 2000s, converged on a single insight: the gap between what teachers intend and what teachers do is largely invisible without a record of the language used in the room.
Japan's lesson study tradition has carried this forward for decades, building whole-school improvement on non-evaluative analysis of classroom dialogue. More recently, Hennessy, Vrikki and colleagues translated this into the T-SEDA framework, equipping practitioners to inquire into their own talk.
The transcript is not a judgement tool. It is a thinking tool. It makes visible the ratio of teacher to student talk, the kinds of questions asked, wait time after questions, patterns of affirmation and redirection, moments of silence, sentiment and tone. And, increasingly, it gives us the building blocks for the next lesson's resources.
What has changed is the labour cost. Until very recently, producing a usable transcript of a full lesson and a structured developmental report was the work of hours. Speech-to-text and large language models now close that gap. A whole-lesson transcript and a structured coaching report can sit in a teacher's inbox within minutes of the bell.
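To make two of the measures named above concrete, the sketch below computes teacher talk share and wait time after questions from a timestamped transcript. It is an illustration only and does not describe how Starlight works. The `Segment` structure, the speaker labels, the sample lesson and the rule that a question is any teacher turn ending in "?" are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    speaker: str   # "teacher" or "student" (hypothetical labels)
    start: float   # seconds from the start of the recording
    end: float
    text: str


def teacher_talk_share(segments: list[Segment]) -> float:
    """Teacher speaking time as a fraction of all speaking time."""
    total = sum(s.end - s.start for s in segments)
    teacher = sum(s.end - s.start for s in segments if s.speaker == "teacher")
    return teacher / total if total else 0.0


def wait_times(segments: list[Segment]) -> list[float]:
    """Silence, in seconds, between each teacher question and the next turn."""
    gaps = []
    for current, following in zip(segments, segments[1:]):
        # Simplifying assumption: a question is a teacher turn ending in "?".
        if current.speaker == "teacher" and current.text.rstrip().endswith("?"):
            gaps.append(round(following.start - current.end, 2))
    return gaps


# Invented sample data for illustration only.
lesson = [
    Segment("teacher", 0.0, 12.0, "Yesterday we looked at erosion. What causes it?"),
    Segment("student", 13.5, 17.5, "Water wearing the rock away."),
    Segment("teacher", 18.0, 24.0, "Good. Why does that happen faster on a bend?"),
    Segment("student", 27.0, 33.0, "The water moves faster on the outside."),
]

print(f"Teacher talk share: {teacher_talk_share(lesson):.0%}")
print(f"Wait times after questions (s): {wait_times(lesson)}")
```

Real transcripts are messier than this. Overlapping speech, speaker attribution errors and questions that do not end in a question mark would all need handling before numbers like these could be trusted.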
Methodology
The study is positioned as insider practitioner research, following Cochran-Smith and Lytle (2009). My closeness to the work is treated as a source of insight rather than a problem to be controlled for.
Phase 1 – Structured pilot, April to July 2025.
Four UK secondary schools across state, academy and independent sectors. 81 teachers across subjects and career stages. 469 lesson recordings. 1,036 AI-generated reports. Voluntary participation. No use in performance management.
Phase 2 – Extended deployment, July 2025 to May 2026.
Twenty schools and organisations. Hundreds of registered users. 245 user-submitted ratings. Transcription in seventy languages, including dual-instruction settings. Live in schools across the UK, Europe and Asia.
Findings: engagement and uptake
Of invited staff across pilot schools, 79% became active users. Teachers averaged 5.8 recordings each over the pilot window, with a long tail ranging from one to more than twenty.
The headline from the engagement data is not the technology. It is the local leadership. Adoption tracks closely with the quality of internal coaching cultures and the credibility of the people championing the work on the ground.
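The per-teacher average can be checked against the Phase 1 counts reported in the Methodology section. This is a quick sketch and assumes the 81 pilot teachers are the denominator for the 5.8 figure. The reports-per-recording ratio is derived here from the published counts and is not a number stated in the study.

```python
# Phase 1 counts as reported in the post.
PILOT_TEACHERS = 81
LESSON_RECORDINGS = 469
AI_REPORTS = 1036

# Assumes the average is taken over all 81 pilot teachers.
recordings_per_teacher = LESSON_RECORDINGS / PILOT_TEACHERS

# Derived ratio, not reported in the post.
reports_per_recording = AI_REPORTS / LESSON_RECORDINGS

print(f"Recordings per teacher: {recordings_per_teacher:.1f}")
print(f"Reports per recording:  {reports_per_recording:.1f}")
```

A mean of this kind hides the long tail described above, so a median or a full distribution would show more about how usage was spread across teachers.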
Findings: satisfaction and perceived accuracy
Pilot teachers rated the reports 4.2 out of 5 on average across 82 ratings. More than 80% were 4 or 5 stars. In the extended phase, with 245 ratings drawn from a much wider user base, the average rose to 4.3.
Among the coaching team at the lead pilot school, 100% rated the AI feedback as accurate or very accurate, and 83% said they wanted the platform to play an important role in next year's coaching programme.
What teachers value most
Five themes emerged consistently from aggregated, anonymised staff feedback.
Psychological safety.
Private, confidential, non-judgemental. Feedback without exposure. Coaching, not evaluation.
Strengths first.
Teachers were repeatedly surprised to be affirmed. Naming what works builds the foundation for growth.
Shifts in thinking.
Beyond next-step tips, the reports reshape how teachers interpret their own practice. The distinction is the one between advice and coaching.
Fits busy schools.
Flexible, easy, low-burden. Used on the teacher's terms. Never another compliance task.
Emotional support.
Pride, reassurance, encouragement. Rare in CPD, and perhaps the most important signal of all.
A few lines from teachers stayed with me.
"It doesn't know me, so the feedback feels fair."
"The most useful feedback about my teaching I have ever had."
"Recording felt less invasive than in-person observation."
And from a Deputy Headteacher: "I can't see how we would coach our staff without it."
Perhaps the most honest of all, from a pilot teacher:
"It makes me feel better about myself and like a good teacher on the days when I feel absolutely sodding awful because I'm busy doing a thousand other things."
Not just our finding
The international research is converging on the same picture. Stanford's Tutor CoPilot randomised controlled trial (Wang et al., 2024) found a 4 percentage-point gain in student mastery overall and a 9 percentage-point gain for students working with weaker tutors. AI augmented, rather than replaced, the human in the loop.
Demszky and colleagues (2025) ran a randomised controlled trial of automated discourse feedback in K-12 classrooms and reported a 20% increase in teachers' use of focusing questions. The effect was dose-dependent. Consistency over time mattered more than intensity.
A separate Harvard RCT (Kestin et al., 2025) found that students working with a research-based AI tutor learned more in less time than peers in an active-learning classroom. Pedagogy-grounded design was decisive.
What comes next: minimally invasive professional learning
Over twenty years ago, Sugata Mitra cut a hole in a wall in Delhi and asked what sounded at the time like a naive question: what can children learn without adult supervision?
We are asking a parallel question. How far can teachers go on their own, when given high-quality reflective tools, before the marginal value of external coaching reasserts itself?
The next phase of research will explore this through a paired study across a government school and a private school. Impact will be evaluated using validated instruments, including the Teacher Sense of Efficacy Scale developed by Tschannen-Moran and Woolfolk Hoy, alongside the Hennessy et al. T-SEDA framework.
Invitation to ITE partners
We are inviting initial teacher education providers to help test this work in real teacher education settings. Free pilot access is available for partners willing to explore how whole-lesson transcripts, AI coaching reports and structured feedback can support reflection, mentoring and professional growth for trainee teachers.
If you lead an ITE programme and would like to work with us, we would be very pleased to hear from you.
Three questions we left the room with
We closed the session with three questions for the field, and we close this post with them too.
1. How should AI-generated reflection sit alongside human coaching in initial teacher training and the Early Career Framework (ECF)?
2. Where is the boundary between formative AI feedback and the surveillance of teacher practice?
3. What does responsible scaling of transcript-informed coaching look like across schools and trusts?
These are not rhetorical. The answers will shape what this technology becomes.
If you'd like to see what a Starlight report looks like for a lesson in your own school, you can book a demo at https://starlightmentor.com/demo-request.
Spark Insight with Starlight, and give trainees sharper feedback from day one.
The Insight Engine is written by Adam Sturdee, co-founder of Starlight, the UK’s first AI-powered coaching platform, and a senior leader with responsibility for teaching, learning and coaching. This blog is part of a wider mission to support educators through meaningful reflection, not performance metrics. It documents the journey of building Starlight from the ground up, and explores how AI, when shaped with care, can reduce workload, surface insight, and help teachers think more deeply about their practice. Rooted in the belief that growth should be private, professional, and purposeful, The Insight Engine offers ideas and stories that put insight—not judgment—at the centre of development.
🔗 Connect with me on LinkedIn: https://www.linkedin.com/in/adam-sturdee-b0695b35a/


