
What Two Landmark Studies Tell Us About AI Coaching (and Why It Matters for Starlight)

  • Adam Sturdee
  • Jan 14
  • 4 min read

Over the past year, two rigorous, peer-reviewed studies have quietly changed what we can say with confidence about AI coaching in schools.


Not opinion pieces. Not vendor case studies. Randomised controlled trials, run in real classrooms, with real teachers.


Together, they answer a question many school leaders and teachers are rightly asking:

Does AI-generated feedback actually improve teaching practice?

The answer, based on the best evidence we now have, is yes.


The Stanford Study: AI That Scales Expertise, Not Judgement


The first study, led by researchers at Stanford, examined a tool called Tutor CoPilot, designed to support live tutoring through real-time language suggestions.


You can read the full study here: https://doi.org/10.26300/81nh-8262


In a large randomised trial involving over seven hundred tutors and one thousand students, the researchers found that:


  • Students supported by tutors using AI coaching were more likely to master lesson objectives

  • The impact was strongest for less experienced practitioners

  • The mechanism of improvement was not content knowledge, but changes in language


Tutors using the tool:


  • Asked more probing questions

  • Used less generic praise

  • Avoided giving answers too quickly


Crucially, the AI did not replace tutors or direct them. It offered multiple possible responses, preserved tutor agency, and made expert thinking visible in the moment.

The headline insight from Stanford is this:


AI coaching works best when it amplifies expert practice and leaves human judgement intact.


That principle sits at the heart of Starlight.


The Harvard Study: Automated Feedback Works in Real Classrooms


The second study, led by Harvard researchers in partnership with TeachFX, tested automated feedback with mathematics and science teachers in brick-and-mortar classrooms.



This matters because, until now, most strong evidence for AI coaching came from online or tutoring contexts.


In this trial:


  • Teachers received automated feedback centred on a single practice: focusing questions

  • Teachers who received the feedback increased their use of those questions by around twenty percent


The feedback worked in this case because it was narrow.


It did not try to improve everything. It did not offer a general judgement on teaching quality. It focused on one high-leverage move and returned to it repeatedly.


The study also surfaced important realities:


  • Teachers engaged most when feedback was easy to access

  • Trust, accuracy, and time were bigger barriers than technology

  • Teachers valued reflection more than prescription


The key insight from Harvard is this:


AI coaching changes practice when it is precise, repeated, and reflective.


What These Studies Mean for Starlight


Taken together, these studies strongly validate the direction Starlight is taking.

They tell us that effective AI coaching is:


  • Language-focused, not metric-driven

  • Specific, not general

  • Repeated over time, not one-off

  • Private, reflective, and teacher-owned

  • Designed to prompt thinking, not enforce compliance


This is why Starlight is built around:


  • Transcript-based analysis

  • Coaching templates rather than universal scores

  • “You said this → you could try this” feedback

  • Optional use, not mandated evaluation


The research also explains why we are cautious about over-engineering dashboards, ratings, or surveillance-style metrics. None of the evidence suggests that those are what drive improvement. Language does.


What This Means for Leaders and Teachers Building Templates


One of the most important implications of both studies is for schools using Starlight’s template system.


The evidence is very clear:


The more focused the template, the greater the impact.


Effective templates tend to:


  • Target one practice at a time

  • Use concrete examples from the transcript

  • Offer alternative phrasing rather than abstract advice

  • Encourage experimentation over perfection


Templates that try to do everything tend to do very little.

This is why we actively encourage departments, coaches, and teachers to:


  • Build and share their own focused templates

  • Reuse the same template across multiple lessons

  • Treat AI feedback as a mirror, not a verdict


The power of AI coaching does not come from novelty. It comes from noticing patterns, again and again.


Where We Are Taking This Next


These studies are not the end of the story. They are the foundation.


At Starlight, we are using this evidence to:


  • Refine our questioning and feedback templates

  • Improve how we surface missed opportunities in transcripts

  • Make our emailed reports even clearer and lower-friction

  • Support schools in building template cultures grounded in trust and professional reflection


Most importantly, they reinforce a simple belief:


Great coaching does not judge. It helps people see.


Spark Insight with Starlight today and build evidence-informed coaching that scales.


🎥 Subscribe to our channel here: https://www.youtube.com/@Star21-ai

🌐 Read more on our blog: www.coaching.software

💡 Explore the platform: www.starlightmentor.com

🐦 Follow us on X: @star21starlight


The Insight Engine is written by Adam Sturdee, co-founder of Starlight—the UK’s first AI-powered coaching platform—and Assistant Headteacher at St Augustine’s Catholic College. This blog is part of a wider mission to support educators through meaningful reflection, not performance metrics. It documents the journey of building Starlight from the ground up, and explores how AI, when shaped with care, can reduce workload, surface insight, and help teachers think more deeply about their practice. Rooted in the belief that growth should be private, professional, and purposeful, The Insight Engine offers ideas and stories that put insight—not judgment—at the centre of development.

