
Key Findings

  • Key Finding 1

    There is strong evidence that PD programs can have medium-sized positive effects on student achievement at a low cost per student.

    As a case study in what PD can achieve, consider the My Teaching Partner (MTP) program. The results from three successive RCTs of MTP in the U.S. suggest a medium-sized impact on student learning of approximately 0.18 standard deviations (s.d.). This finding provides strong evidence that certain PD programs can improve teaching and learning. The MTP program comes at a high absolute cost of approximately $6,000 per teacher but can be considered “low” cost given that this per-teacher figure amounts to at most $228 per affected student.

  • Key Finding 2

    Evaluated PD programs have positive average effects on student achievement across a range of subjects via improvements in teachers’ knowledge and teaching practices.

    PD improves student test scores by an average of 0.05–0.23 s.d., and there is a positive association between how much a PD program changes teachers’ practice and how much it improves student learning. However, in this literature, differences in research methods appear to considerably influence effect sizes, which should be kept in mind when interpreting the results of individual studies.

  • Key Finding 3

    The PD in which teachers typically participate (average-quality PD) is not associated with any increase in student test scores.

    PD programs that have been rigorously evaluated (key finding #2) have likely been selected for evaluation because they have shown some signs of promise. While the available evidence has some important limitations, research that uses representative samples of teachers suggests that the PD in which teachers typically participate is not associated with any increase in student test scores. The wide variation in the impact of different PD programs (key findings #1–3) shows the importance of being able to reliably differentiate more from less effective PD (key findings #4–5) and to carefully support implementation (key findings #6–7).

  • Key Finding 4

    Instructional coaching has medium-sized positive effects on student achievement, but individual coaches differ widely in effectiveness, and it is not clear what differentiates better coaching.

    Instructional coaching involves an expert providing individualized support to a teacher. It typically involves multiple observation and feedback cycles over time, with a focus on the deliberate development of specific teaching skills. PD that incorporates instructional coaching increases student achievement by 0.10–0.18 s.d. Moreover, the same PD content (e.g., literacy instruction) is more effective when it is delivered via instructional coaching. However, individual coaches appear to differ widely in effectiveness, and it is not yet clear what differentiates more from less effective coaches.

  • Key Finding 5

    Research has begun to successfully differentiate the characteristics of more and less effective teacher PD, but further research is required.

    In 2009, one researcher argued that the field had reached a consensus on five “core features” of effective PD (e.g., a focus on subject matter content and an extended duration). However, subsequent empirical research has cast doubt on these features. Recently, researchers have suggested a different framework consisting of 14 causal mechanisms of effective PD (e.g., modelling and rehearsal) and have shown that PD that incorporates more of these mechanisms tends to be more effective. This framework fits the data better than prior frameworks do, but further experimental research is needed to directly validate it.

  • Key Finding 6

    The contexts of and the conditions for PD influence the way in which it affects teachers and students.

    PD policy implementation is situated in the broader educational ecosystem. Adequate resources (e.g., funding, facilities and time) are essential for successful PD implementation. Policymakers and educational leaders should assess and fortify the infrastructure for instructional improvement to raise the quality of PD.

  • Key Finding 7

    When it is intentionally designed, PD can promote instructional reform.

    When it is intentionally designed, PD can equip teachers with the tools and knowledge to meet new standards and improve student outcomes. Scholars have noted that aligning PD with high-quality instructional materials, assessment systems, and reform messages helps teachers integrate new standards and policies into their teaching. PD also helps teachers navigate policy changes and instructional demands.

Introduction

Teachers vary considerably in their ability to improve student outcomes.[1] This variation prompts the question of whether—and how—professional development (PD) can help less effective teachers develop their skills and become more effective. The prospect of such improvement has motivated considerable investment in PD worldwide. Across the 36 countries that participated in the Teaching and Learning International Survey (TALIS) 2018, secondary teachers reported spending an average of 10.5 days per year on PD.[2] An important strand of literature has sought to evaluate whether the costs of this investment are outweighed by the benefits in terms of improved teaching and learning. Indeed, hundreds of randomized controlled trials (RCTs) have now been published on this topic.[3] This research reveals that the effects of PD vary widely, suggesting that some types of PD are more effective than others. Accordingly, researchers are now working to identify the most effective types of PD and to understand the components of and the conditions for effective PD.

A closely related literature has explored the role of PD policy in creating reform-oriented change. Motivated by persistent opportunity gaps for underserved and historically marginalized students, policymakers have turned to PD to build individual and collective capacity, with the goal of driving sustained change in districts, schools, and, ultimately, classrooms. Related policies often include mandates or guidelines specifying both the “what” and the “how” of PD (e.g., content, format, and quantity). PD policies are frequently coupled with resources—including funding, time, and space—seeking to enable such change. In addition, PD policies delineate roles and responsibilities, such as who will participate and who will plan, facilitate, and monitor the outcomes of PD, sometimes creating new roles such as instructional coaches and professional learning directors. Research in this area, which often uses qualitative methods, has traced the diverse ways in which PD policy succeeds and fails in creating change. This diversity suggests a second likely reason that PD varies in effectiveness: implementation.

The two strands of literature described above are complementary in that the first provides evidence about the most effective types of PD, while the latter provides insights around the broader policies and organizational conditions necessary to implement PD. In this chapter, we synthesize this body of evidence. We begin by reviewing the quantitative evidence on the benefits and costs of different types of PD. To maximize interpretability for policymakers, impact and costs are expressed relative to other interventions in the education literature[4] and relative to annual average progress in mathematics for grade 5–6 (11-year-old) students.[5] Figure 1 summarizes the results, revealing sizable differences in the impact of different types of PD. We then turn to exploring the implications of these findings for those looking to design policies that can create sustained change in practice. The wide variations in the impact of different types of PD place a greater onus on policymakers to select more effective types of PD and to then make substantive investments in the resources (e.g., personnel) and complementary policies (e.g., accountability and standards) necessary to support coherent implementation. Taken together, the evidence on effective PD can support efforts to provide access to better-quality teaching for diverse students and communities. 

 

Evidence

Key Finding #1: There is strong evidence that PD programs can have medium-sized positive effects on student achievement at a low cost per student.

This section focuses on the MTP PD program because it is the only such program to have shown positive results across three separate RCTs. Hence, it represents a valuable case study in the impact that PD can achieve. MTP is built around the Classroom Assessment Scoring System-Secondary (CLASS-S) framework, which consists of three domains: emotional support for students, classroom organization (behavior management), and effective instruction.[24] The program begins with a workshop in which teachers are introduced to the CLASS-S framework and the accompanying video library of good practice models. It then proceeds with cycles of individualized coaching across a year or more. Each cycle begins with the teacher recording a lesson and then sending it to the coach for review. The coach extracts one clip showing good practice (as defined by CLASS-S) and another showing an area for improvement. The coach subsequently sends these clips back to the teacher, accompanied by an explanation and prompts for teacher reflection. The teacher and coach then discuss the clips and the teacher’s responses and develop a plan for what the teacher will do differently next time.

In the first RCT of MTP, 78 secondary-school teachers in the U.S. were randomly assigned either to participate in 10–12 cycles[25] of MTP coaching over a year or to a control group that continued with business as usual.[26] In the year following the intervention, students of teachers in the treatment group scored 0.22 s.d. higher on the state end-of-year tests than students of teachers in the control group did.[27] This improvement in test scores was mediated by improvements in teachers’ classroom practice, as measured by the CLASS-S framework. This 0.22 s.d. effect on test scores is considered “large” relative to others in the education literature[28] and is equivalent to approximately half the progress normally made by a grade 5–6 student across a year.[29] The findings from this RCT were then replicated in a follow-up study by the same team of researchers.[30] This study randomly assigned 86 secondary-school teachers in the U.S. either to 5–6 cycles of MTP coaching per year over two years or to a business-as-usual control group. At the end of the second year, students of teachers in the treatment group scored 0.48 s.d. higher on the state tests than students of teachers in the control group did. This effect size is considered “large” relative to others in the education literature[31] and exceeds the progress normally made by a grade 5–6 student across a whole year.[32]

The replicated positive findings from these two studies provide strong evidence for the effectiveness of MTP. However, both studies were conducted by researchers involved in developing the MTP program, and neither of them was preregistered, which leaves some room for doubt as to whether the positive findings could be explained by flexibility in the way the data were analyzed.[33] These limitations have recently been addressed by a third, independent, and preregistered RCT. The researchers adapted the program for use in elementary schools and then randomly assigned fourth- or fifth-grade teachers in 107 schools in the U.S. to the MTP program or to a business-as-usual control group. The teachers in the MTP group were further randomly assigned to participate in either five cycles or eight cycles of coaching across a single academic year. At the end of the intervention year, students of teachers in the five-cycle treatment group scored 0.08 s.d. higher in English on the state end-of-year tests than students of teachers in the control group did, rising to 0.11 s.d. among novice teachers.[34] This effect size is considered a “medium-sized effect” relative to others in the education literature[35] and is equivalent to the progress normally made by a grade 5–6 student across one-fifth of a school year.[36] Interestingly, the five-cycle treatment group also had better results than the eight-cycle treatment group did, suggesting that eight coaching cycles in a year may be too many.[37]
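The conversions between effect sizes and learning time used above can be reproduced with a simple rule of thumb. The sketch below assumes annual progress of roughly 0.4 s.d. per school year for grade 5–6 students; this 0.4 figure is an assumption inferred from the conversions cited, not a number stated directly here.

```python
# Convert a test-score effect (in s.d.) into years of typical progress.
# ANNUAL_PROGRESS_SD = 0.4 is an assumed benchmark for grade 5-6 students,
# inferred from the chapter's conversions rather than stated in it.
ANNUAL_PROGRESS_SD = 0.4

def years_of_progress(effect_sd: float) -> float:
    """Express an effect size as a fraction of a typical school year's gain."""
    return effect_sd / ANNUAL_PROGRESS_SD

print(round(years_of_progress(0.22), 2))  # ~0.55: about half a year (first RCT)
print(round(years_of_progress(0.48), 2))  # ~1.2: more than a full year (replication)
print(round(years_of_progress(0.08), 2))  # ~0.2: about one-fifth of a year (third RCT)
```

Under this assumed benchmark, the 0.22, 0.48, and 0.08 s.d. effects map onto roughly half a year, more than a year, and one-fifth of a year of typical progress, respectively.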

Taken together, the studies reviewed here provide very strong evidence that MTP is effective at improving teaching and learning. In doing so, they demonstrate that PD can improve student outcomes. The five-cycle version of the program comes at a cost of approximately $6,095 (£4,691) per teacher, i.e., $228 (£175) per student in 2022 prices.[38] This figure is considered a “low” cost per student relative to other education interventions,[39] with MTP being more cost-effective than either class size reductions or individualized tutoring.[40]
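As a rough check on these figures, the per-teacher and per-student costs are linked by class size. The sketch below assumes each MTP teacher reaches about 27 students; that class-size figure is an assumption implied by dividing the reported costs, not a number stated directly.

```python
# Back-of-envelope cost arithmetic for the five-cycle MTP program (2022 prices).
# The per-teacher cost is reported above; STUDENTS_PER_TEACHER = 27 is an
# assumption implied by the reported per-student figure.
COST_PER_TEACHER_USD = 6095
STUDENTS_PER_TEACHER = 27

cost_per_student = COST_PER_TEACHER_USD / STUDENTS_PER_TEACHER
print(round(cost_per_student))  # ~226, close to the reported $228 per student
```

This illustrates why a program that looks expensive per teacher can still be "low" cost per student: the per-teacher outlay is spread across every student that teacher serves.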

Key finding #2: Evaluated PD programs have positive average effects on student achievement across a range of subjects via improvements in teachers’ knowledge and teaching practices.

This section takes a more representative view of the impact of evaluated PD programs by looking at evidence from systematic reviews of the literature. A recent meta-analysis summarizes the findings from all RCT evaluations of PD programs reporting effects on student test scores published between 2002 and 2020.[41] Across 104 such studies, the authors report an average effect size of 0.05 s.d. This effect size is considered “small” relative to others in the education literature,[42] equivalent to approximately one-eighth of the progress normally made by a grade 5–6 student across a year.[43] An important caveat here is that this effect size drops to zero among the subset of 26 studies in which the analysis was preregistered. One plausible explanation for this discrepancy is that the flexibility around how researchers analyze the data in studies that are not preregistered may result in inflated effect sizes. However, a supplementary p-curve analysis does not support this explanation, leaving the reason for this disparity in findings somewhat unclear. The authors identify sizable heterogeneity in effects across programs, suggesting that the design, content (see key findings #4 and #5) and implementation (see key finding #6) of PD may influence its impact. Almost all of this evidence comes from the U.S. or the United Kingdom and from the core subjects of English, mathematics, and science.

One upside of the concentration of studies in these three core subjects is that it allows meta-analyses to be conducted separately for each subject. For example, one such study looking at the literature on PD for reading teachers in K–8 classrooms found 28 causal studies, with an average effect of 0.18 s.d.[44] Another meta-analysis looked at PD focused on literacy teaching, finding 17 causal studies, with an average effect of 0.23 s.d.[45] These effects are considered “medium-to-large” relative to others in the education literature[46] and equate to approximately half the progress normally made by a grade 5–6 student in mathematics across a year.[47] A third subject-specific meta-analysis looked at PD focused on science, technology, engineering, and mathematics (STEM) teaching. The researchers identified 95 causal studies, with an average impact of 0.21 s.d.[48] A complementary meta-analysis looked at the subset of 37 (of these 95) studies that collected outcome measures relating to both teacher knowledge and instruction and student achievement. It found that PD had positive average effects on teacher knowledge and that programs with larger effects on teacher practice also had larger effects on student achievement.[49]

Notably, the average effects in the cross-subject meta-analysis[50] are smaller than those in the subject-specific meta-analyses[51] cited above. This difference is plausibly because the cross-subject meta-analysis excludes quasi-experimental studies and studies that use researcher-designed tests as outcome measures, both of which are known to be associated with higher effect sizes.[52] Indeed, the average effect size in the STEM meta-analysis drops from 0.21 s.d. to 0.08 s.d. when the analysis is restricted to studies that use state standardized test outcome measures. This decrease suggests that (reasonable) methodological differences across studies have a sizable influence on effect sizes in this literature, which should be kept in mind when interpreting the results of individual studies. These caveats notwithstanding, systematic reviews of the literature suggest that evaluated PD likely has a positive average effect on student achievement.

Key finding #3: The PD in which teachers typically participate (average-quality PD) is not associated with any increase in student test scores.

The evidence reviewed under key finding #2 above is representative of PD programs that have been rigorously evaluated. Notably, however, programs have likely been selected for evaluation because they had been judged to be particularly promising. Once selected for evaluation, these programs are also likely to be well resourced. Hence, it is plausible that PD that has been rigorously evaluated is more effective than the PD in which teachers typically participate. Understanding the effectiveness of the latter category is challenging because doing so requires studies that both 1) use a large and representative sample of teachers and 2) are able to distinguish the causal impact of the PD on test scores from preexisting differences in the test scores of students whose teachers participate in the PD. The evidence reviewed above largely comes from intervention studies run by researchers, which usually achieve 2) through random assignment but consequently tend to use a small sample of teachers who have given their consent to participate, hence ruling out 1). To the best of our knowledge, only two studies have come anywhere close to fulfilling both criteria at once. This section reviews each in turn.

The first study analyzes an administrative dataset that contains millions of teachers and is broadly representative of the state school system in Florida.[53] This dataset includes measures of the number of hours of PD in which teachers have participated in each of the last five years, as well as their students’ achievement on annual standardized tests. The researchers drop school districts in which students appear to be systematically assigned to teachers based on their achievement. They also show empirically that teachers’ participation in PD does not seem to depend on their prior performance. Hence, the study provides some reassurance that the researchers can distinguish the causal impact of PD from preexisting differences in the outcomes of teachers who tend to participate in PD. The researchers find that participating in 50 hours of PD per year (the sample average), compared to zero hours, is associated with a very small (0.006 s.d.) increase in student test scores. This relationship is so substantively small that it is best interpreted as no association at all.

The second study analyzes Trends in International Mathematics and Science Study (TIMSS) data, which contain approximately 30,000 teachers (in 2015 and 2019) who are broadly representative of fourth- and eighth-grade teachers across the 66 participating school systems.[54] This dataset includes measures of the number of hours of mathematics PD and science PD in which teachers have participated across the last two years, as well as their students’ achievement on the TIMSS large-scale assessments in mathematics and science. The researchers look at the difference between students’ mathematics and science results and whether this difference relates to the amount of PD that their mathematics teacher has participated in relative to the amount of PD that their science teacher has participated in. This within-student, between-subjects approach provides some reassurance that differences in student test scores across teachers with different levels of PD participation do not simply reflect the assignment of certain types of students to teachers who are more (or less) effective in a given subject. The researchers find that increased participation in PD is associated with a very small (-0.02 s.d.) change in student test scores. As with the figure above, this change is so substantively small that it is best interpreted as no relationship at all.
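A small simulation can illustrate why the within-student, between-subjects comparison is informative. The sketch below uses entirely made-up numbers: it constructs students whose mathematics teachers' PD hours are correlated with student ability, shows that a naive between-student regression then finds a spurious positive "effect" of PD, and shows that differencing each student's mathematics and science outcomes removes the bias.

```python
import random

random.seed(0)
n = 5000

# Hypothetical simulation of the within-student, between-subjects design.
# All parameters are illustrative, not taken from the TIMSS study itself.
students = []
for _ in range(n):
    ability = random.gauss(0, 1)            # unobserved student ability
    # Suppose higher-ability students tend to get maths teachers with more PD,
    # which would bias a naive between-student comparison upward.
    math_pd = 20 + 10 * ability + random.gauss(0, 5)
    sci_pd = 20 + random.gauss(0, 5)
    true_effect = 0.0                       # assume typical PD has no real effect
    math_score = ability + true_effect * math_pd + random.gauss(0, 0.5)
    sci_score = ability + true_effect * sci_pd + random.gauss(0, 0.5)
    students.append((math_pd, sci_pd, math_score, sci_score))

def ols_slope(xs, ys):
    """Simple least-squares slope of y on x."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var = sum((x - mx) ** 2 for x in xs)
    return cov / var

# Naive: regress maths scores on maths PD hours across students (biased upward).
naive = ols_slope([s[0] for s in students], [s[2] for s in students])

# Within-student difference cancels the shared ability term.
fe = ols_slope([s[0] - s[1] for s in students], [s[2] - s[3] for s in students])

print(round(naive, 3), round(fe, 3))  # naive is spuriously positive; fe is near zero
```

Because each student's ability affects their mathematics and science scores equally, subtracting one subject's score from the other removes it, leaving only the differential exposure to PD; this is the logic that lends the TIMSS estimate its credibility.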

The evidence on the impact of the PD in which teachers typically participate is sparse and, inevitably, has some important methodological limitations. Nonetheless, the evidence available suggests that typical PD has essentially zero relationship with student achievement. The wide variation in the impact of different PD programs highlighted in key findings #1–3 (see Figure 1) shows the importance of being able to reliably differentiate more from less effective PD.

  • Figure 1

    Impact of different types of teacher professional development on student test scores

 

Key finding #4: Instructional coaching has medium-sized positive effects on student achievement, but individual coaches differ widely in effectiveness, and it is not clear what differentiates better coaching.

Instructional coaching is an individualized form of teacher PD that can also function as a means of organizational improvement. Coaching was originally defined as “an observation and feedback cycle” targeting instruction.[55] Over time, the theory and practice of instructional coaching have been elaborated, and contemporary definitions typically involve an instructional expert providing individualized support to a teacher, structured around multiple observation and feedback cycles over time and focused on the deliberate development of specific teaching skills and a deeper understanding of curricular materials.[56] The MTP program (key finding #1) is an example of an instructional coaching program. MTP includes multiple cycles of classroom observation and feedback across the course of a year or more. The coaches are experienced teachers with specialist training who work on a one-on-one basis with program participants, focusing on improving specific elements of the CLASS teaching framework.

A meta-analysis of PD programs incorporating instructional coaching identified 31 causal studies, with an average effect on student achievement of 0.18 s.d.[57] This effect size is considered a “medium-sized effect” relative to others in the education literature[58] and is equivalent to the progress normally made by a grade 5–6 student across half a year.[59] However, this average effect drops to 0.10 s.d. among larger “effectiveness” trials and to 0.12 s.d. in studies that use state standardized tests as outcome measures.[60] These decreases suggest that the methodological choices made by researchers have a considerable influence on effect sizes in this literature. In addition to student achievement, the researchers looked at teachers’ classroom practice, as captured by classroom observation rubrics. Across 43 causal studies, they found that instructional coaching has an average effect of 0.49 s.d. on teachers’ classroom practice. In the subset of 23 studies that included both types of outcome measures, they found that the more a coaching program changed teachers’ practice, the more it improved student achievement (correlation 0.37). Pairing coaching with conventional, workshop-format group PD activities was associated with a 0.31 s.d. larger effect size in terms of changes in teacher instruction and a 0.12 s.d. larger effect size in terms of changes in student achievement.[61] In short, the design and enactment of coaching as a mode of PD influence its effectiveness.

Most PD programs combine some content (e.g., behavior management and reading instruction) with a format (e.g., instructional coaching and lesson study). This combination raises the question of the extent to which the effects observed in the instructional coaching literature are driven primarily by the format (instructional coaching) or by the content. Perhaps instructional coaching programs are effective because they happen to include high-quality content rather than because they use the instructional coaching format. Recent studies have provided evidence on this matter by holding the content of PD fixed while randomly varying the format. One team of researchers investigated this issue using classroom simulator experiments in which trainee teachers in the U.S. were given PD focused on managing student behavior. The teachers were then randomly assigned to receive either structured self-reflection or instructional coaching. The teachers in the coaching group improved their practice, whereas those in the reflection group did not.[62] Another team of researchers conducted an RCT in which teachers in 180 primary schools in South Africa were given PD on reading instruction.[63] The teachers were randomly assigned to receive this PD either through a traditional off-site training seminar or through instructional coaching. The instructional coaching group showed larger and more sustained improvements in teaching practice and student achievement. Taken together, these studies suggest that instructional coaching plays an independent causal role, making the same PD content more effective.

One observational study conducted in the U.S. suggests that individual coaches vary widely in terms of their contribution to improvements in teachers’ practice, as captured by a classroom observation rubric.[64] A 1 s.d. increase in coach effectiveness is associated with a 0.36–0.43 s.d. improvement in teachers’ practice, which is comparable to the average impact of coaching on teachers’ practice (0.49 s.d.). This finding suggests that which coach a teacher works with can matter as much as the average benefit of coaching itself. While this finding is striking, it is arguably not surprising given that instructional coaching relies (by definition) on the expertise of a single coach. Understanding what differentiates more from less effective coaching is therefore crucial to enhancing the reliability of instructional coaching programs. However, research in this area remains in the early stages of development and has yet to yield clear findings on what characterizes more effective coaching.[65] Notably, studies of coaching effectiveness have been conducted in different contexts, have applied varied techniques, and have taken place within different policy waves.

Key finding #5: Research has begun to successfully differentiate the characteristics of more and less effective teacher PD, but further research is required.

It is clear from the evidence above that there is considerable variation in effects across different types of PD. Hence, researchers need to develop frameworks that can help differentiate more effective PD to support the decisions of those designing and commissioning it.[66] In an influential 2009 paper, one researcher argued that the field had reached a consensus on five “core features” of effective PD: 1) a focus on subject matter content, 2) active learning opportunities for teachers, 3) coherence with teachers’ prior knowledge and beliefs, 4) an extended duration, and 5) collective participation.[67] This apparent consensus has made its way into policy documents and official guidance on PD in the U.S., the United Kingdom, and Europe.[68] However, subsequent meta-analyses have cast doubt on several of these features. Focusing on subject matter content (as opposed to generic instructional strategies) is not associated with a higher impact on test scores.[69] Likewise, incorporating collaborative group (as opposed to purely individual) activities is not consistently associated with a higher impact on test scores.[70] Similarly, having a longer duration (more hours) is not associated with a higher impact on test scores,[71] and experimental evidence suggests that more than five coaching cycles per year may actually reduce the benefits of coaching.[72]

More recently, some researchers have suggested that instead of focusing on the surface-level features of PD, such as the number of hours, the field would benefit from a mechanistic account that explains how the design of PD influences teaching practice.[73] Building on this suggestion, one research team synthesized causal evidence from cognitive science, behavioral science, and the literature on expertise to suggest a list of 14 such mechanisms.[74] These mechanisms are organized into four categories based on whether they primarily develop teachers’ insights (I) about teaching, motivate (M) changes in practice, develop teaching techniques (T), or help to embed new practices (P) in teachers’ repertoire. Returning to the running example of MTP, this program can be characterized via this framework as employing the manage cognitive load and revisit prior learning mechanisms to develop insight (I), the goal setting and positive reinforcement mechanisms to motivate (M) changes in teaching practice, the instruction, modelling and feedback mechanisms to develop teaching techniques (T), and the self-monitoring mechanism to embed these changes in practice (P).

Across 104 experimental studies, the meta-analysis shows that the number of such mechanisms embedded in the design of PD is positively correlated with the effect of that PD on test scores. The researchers also find that PD programs that incorporate at least one mechanism in all four of the IMTP categories have approximately three times the effect (0.14 s.d.) of PD programs that do not (0.05 s.d.). However, this difference is not statistically significant, partly due to the smaller number of programs in the former category. While IMTP theory shows a better fit to the data than the theory underlying the five “core features” does, further empirical research is needed to directly test IMTP theory. For example, recent research using a classroom simulator experiment in England has shown that modelling, one of the technique (T) mechanisms, causally improves teachers’ use of evidence-based practices compared to an active control condition.[75] Further studies of this sort that test fine-grained aspects of the design of PD, including the mechanisms in the IMTP framework, would further help to distinguish higher-quality PD.[76]
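To make the IMTP characterization of MTP concrete, the sketch below encodes the mechanisms listed above with their category tags and checks whether a program covers all four categories. The mechanism list follows the MTP example in the text; the data structure and function names are illustrative, not part of the published framework.

```python
# Mechanisms of the MTP program tagged with their IMTP category, following the
# running example in the text; the scoring function itself is an illustrative sketch.
MTP_MECHANISMS = {
    "manage cognitive load": "I",
    "revisit prior learning": "I",
    "goal setting": "M",
    "positive reinforcement": "M",
    "instruction": "T",
    "modelling": "T",
    "feedback": "T",
    "self-monitoring": "P",
}

def imtp_profile(mechanisms):
    """Count mechanisms per IMTP category and flag full four-category coverage."""
    counts = {c: 0 for c in "IMTP"}
    for category in mechanisms.values():
        counts[category] += 1
    balanced = all(counts[c] >= 1 for c in "IMTP")
    return counts, balanced

counts, balanced = imtp_profile(MTP_MECHANISMS)
print(counts, balanced)  # MTP covers all four IMTP categories
```

By this accounting, MTP embeds eight of the fourteen mechanisms and spans all four categories, consistent with the meta-analytic finding that such "balanced" designs tend to have larger effects.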

Key finding #6: The contexts of and the conditions for PD influence the way in which it affects teachers and students.

The outcomes of PD policies, programs, and practices are shaped by the context of and infrastructure for instructional improvement. This infrastructure includes curricular frameworks, assessment systems, and leadership.[77] Intersecting policies (e.g., high-quality instructional materials, formative assessment systems, and educator evaluation) shape both the content and format of PD and the way educators respond to particular PD opportunities. Hence, policies and other structures matter for increasing the effectiveness of PD policies and practices.

Recent scholarship reveals how conditions, including resources, capacity, priorities, and pressures, influence the direction and depth of PD implementation. A complex set of resources enables PD policy implementation, contributing to positive outcomes for schools and students.[78] Resources include funding, facilities, and time, which help support the systems and routines needed for implementation.[79] These resources matter for the direction and extent of educational improvement efforts.[80] For example, the availability of skilled instructional coaches is critical for scaling up coaching-based PD systems (see key finding #5). Scholars have also documented the association between funding and PD outcomes, showing how funding makes it possible to contract with intermediary organizations that facilitate PD and to hire instructional coaches.[81]

PD programs have financial implications, so in addition to ascertaining the resources necessary for designing and instituting PD, it is important to calculate the return on investment of PD.[82] One scholar encouraged researchers to test the cost-effectiveness of PD approaches.[83] Another scholar developed a method for tracking and describing the costs associated with PD; however, there is limited current research on the costs of recent PD initiatives.[84] Yet another scholar conducted a cost analysis of the components of an instructional coaching model in one district,[85] encouraging researchers to utilize a cost framework that assesses total investments in PD.
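As a minimal sketch of the return-on-investment arithmetic these scholars call for, the per-student cost and cost-effectiveness of a PD program can be computed from figures reported elsewhere in this chapter (MTP’s roughly $6,000 per-teacher cost, roughly $228 per-student cost, and approximately 0.18 s.d. effect on test scores). The function names and the effect-per-$1,000 metric below are illustrative conveniences, not drawn from any cited cost framework:

```python
# Illustrative cost-effectiveness arithmetic using the approximate MTP
# figures cited in this chapter. Helper names are ours, not from any
# published cost framework.

def cost_per_student(cost_per_teacher: float, students_per_teacher: float) -> float:
    """Spread a per-teacher PD cost across the students that teacher reaches."""
    return cost_per_teacher / students_per_teacher


def effect_per_thousand_dollars(effect_sd: float, cost_per_student_usd: float) -> float:
    """Standard deviations of achievement gained per $1,000 spent per student."""
    return effect_sd / (cost_per_student_usd / 1000.0)


mtp_cost_teacher = 6000.0          # approximate per-teacher cost of MTP
mtp_students = 6000.0 / 228.0      # students per teacher implied by the $228 figure (~26)

per_student = cost_per_student(mtp_cost_teacher, mtp_students)   # ~ $228 per student
ratio = effect_per_thousand_dollars(0.18, per_student)           # ~ 0.79 s.d. per $1,000

print(round(per_student), round(ratio, 2))
```

Comparing such ratios across programs is one simple way to operationalize the total-investment perspective that these cost analyses recommend.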

Leaders shape facets of PD policy implementation, and educational leaders can create positive conditions for PD. More concretely, district and school leaders play key roles in forming and sustaining a robust ecosystem for professional learning so that educators have the time and space to make sense of new ideas about improvement efforts and to gain knowledge and skills.[86] Turning to coaching as one mode of PD, system leaders operationalize and manage its facets.[87] Specifically, system leaders devote time to defining and legitimizing instructional coaching, often with the goal of clarifying priorities for coaches and reducing teacher resistance.[88] The way in which district leaders define and manage coaching ultimately influences the work of coaches and the outcomes of instructional coaching.[89] Additionally, research has demonstrated that coaches supervised by district leaders, compared to school-based coaches, devoted more time to engaging with teachers on instructional issues (as opposed to generic support or quasi-administrative tasks).[90] These results regarding coaching models and the control of coaches have implications for the degree to which coaching advances key priorities.

Coherence matters for PD policy implementation because the alignment of ideas and resources enables learning and change.[91] Scholars and practitioners bemoan the persistent fragmentation of the U.S. education system and the blizzard of reforms that can overwhelm leaders and teachers. Hence, PD implementation benefits from a sustained, focused approach that is interconnected with other systems and priorities.[92] Coherence supports leaders’ and teachers’ responses to PD because it improves the conditions for sensemaking, learning, and change.

Scholars have also examined how coherence matters for the enactment of PD. One researcher documented several benefits when districts integrated professional learning communities (PLCs), coaching, and teacher leadership while launching a mathematics reform.[93] Existing research indicates that teachers are more likely to apply concepts from PD when those concepts match the priorities of district and school leaders.[94] This finding underscores the importance of leaders fostering coherence that encourages the uptake of ideas from PD and ultimately contributes to necessary change in classroom practice.

Key finding #7: When intentionally designed, PD can promote instructional reform.

PD plays a crucial role in promoting instructional reform by providing teachers with the necessary tools, knowledge, and support to respond to new standards and curricula and to improve teaching and student-level outcomes, including standardized test scores.[95] Hands-on, reform-oriented PD, which entails practical engagement through coaching and mentorship, prepares teachers more effectively for classroom practice and curriculum implementation than one-shot trainings or workshops do.[96] Notably, reform-oriented PD enables teachers to translate new instructional strategies into their teaching and to improve student learning outcomes.[97] By emphasizing practical application, PD can ensure that teachers are well prepared to implement new curricula and instructional practices in their classrooms, which can substantially improve and align teaching methods.[98] Ultimately, these PD models aim to support teacher learning and student-level outcomes (e.g., test scores and student engagement).

Additionally, scholars have found that PD supports instructional reform by aligning professional learning with curricular materials and other strands of accountability-era reforms (e.g., educator evaluation systems and formative assessment systems).[99] This alignment ensures that teachers are not only knowledgeable about new standards but also capable of integrating these standards into their teaching practices while meeting the requirements of their district’s instructional initiatives.[100] Effective PD programs target instructional leaders, contributing to cascading effects that can yield improvements in student learning outcomes.[101] This type of PD can also produce long-term benefits, including students’ college attendance, indicating that initial gains in teacher effectiveness endure.[102]

Finally, PD promotes instructional reform by addressing the complexities and uncertainties of new policies and programs and the steps educators must take to change their practices.[103] Waves of policy seek to motivate teachers to shift practices to match particular guidelines.[104] PD can support teachers in navigating these shifting expectations by fostering a deeper understanding of reforms and providing strategies for incorporating them into their classrooms. Scholars argue that PD is essential if teachers are to sustainably incorporate new forms of instruction rather than reverting to traditional practices.[105] By enabling teachers to navigate instructional reforms, PD can contribute to lasting improvements in student outcomes.

Conclusion

Over the last decade, research has bolstered our understanding of the benefits of teacher PD. It is now clear that certain PD programs and formats improve teaching and learning, as captured by scores on standardized tests. However, the impact of PD varies widely. This variation is likely explained by differences in the design of PD and in the availability and coherence of contextual supports for implementing PD. Research has now begun to identify indicators of high-quality PD that can help policymakers and school leaders direct investment toward the most effective programs. At the level of overall PD design, instructional coaching is a promising approach, showing medium-sized average effects on student test scores. In addition, research focused on finer-grained aspects of PD design suggests that PD is likely to be more effective when it incorporates certain components, such as modelling of evidence-based teaching practices. Qualitative research has shown that adequate and consistent funding is necessary to develop expert teacher educators, such as instructional coaches, with the skills to deliver effective PD. Moreover, state policymakers should exercise caution when mandating statewide PD, as this form of PD is rarely contextualized to the local needs of districts and schools. Allowing district and school leaders to design or select PD that aligns with their local context can make PD a more meaningful and efficacious learning experience.

 

Endnotes and references

 


[1] Chetty, R., J. Friedman, and J. Rockoff. 2014. Measuring the Impacts of Teachers II: Teacher Value-Added and Student Outcomes in Adulthood. American Economic Review 104(9): 2633–2679. https://www.nber.org/system/files/working_papers/w19424/w19424.pdf; James, J., and S. Loeb. 2021. Value-Added Estimates of Teacher Effectiveness: Measurement, Uses, and Limitations. In Oxford Research Encyclopedia of Economics and Finance. https://oxfordre.com/economics/display/10.1093/acrefore/9780190625979.001.0001/acrefore-9780190625979-e-647.

[2] Sellen, P. 2016. Teacher Workload and Professional Development in England’s Secondary Schools: Insights from TALIS. Education Policy Institute. https://epi.org.uk/wp-content/uploads/2018/01/TeacherWorkload_EPI.pdf.

 [3] Sims, S., H. Fletcher-Wood, A. O’Mara-Eves, S. Cottingham, C. Stansfield, J. Goodrich, ... and J. Anders. 2023. Effective Teacher Professional Development: New Theory and a Meta-Analytic Test. Review of Educational Research. https://doi.org/10.3102/00346543231217480.

 [4] Kraft, M. A. 2020. Interpreting Effect Sizes of Education Interventions. Educational Researcher 49(4): 241–253. https://scholar.harvard.edu/sites/scholar.harvard.edu/files/mkraft/files/kraft_2019_effect_sizes.pdf.

 [5] Hill, C. J., H. S. Bloom, A. R. Black, and M. W. Lipsey. 2008. Empirical Benchmarks for Interpreting Effect Sizes in Research. Child Development Perspectives 2(3): 172–177. https://www.mdrc.org/sites/default/files/full_84.pdf.

 [6] Kraft (2020).

 [7] This result is based on an original random effects meta-analysis of the following three studies (see Figure 1):

Allen, J. P., R. C. Pianta, A. Gregory, A. Y. Mikami, and J. Lun. 2011. An Interaction-based Approach to Enhancing Secondary School Instruction and Student Achievement. Science 333(6045): 1034–1037. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3387786/pdf/nihms386733.pdf; Allen, J. P., C. A. Hafen, A. C. Gregory, A. Y. Mikami, and R. Pianta. 2015. Enhancing Secondary School Instruction and Student Achievement: Replication and Extension of the My Teaching Partner-Secondary Intervention. Journal of Research on Educational Effectiveness 8(4): 475–489. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5323067/pdf/nihms847421.pdf; Clark, M., J. Max, S. James-Burdumy, S. Robles, M. McCullough, P. Burkander, and S. Malick. 2022. Study of Teacher Coaching based on Classroom Videos: Impacts on Student Achievement and Teachers' Practices (NCEE-2022-006a). National Center for Education Evaluation and Regional Assistance. https://files.eric.ed.gov/fulltext/ED619739.pdf.

 [8] Relative to other interventions in the education literature. Kraft (2020).

 [9] This is the total cost for the five-cycle group ($664,454, reported in Table B.19 in the Appendix) divided by the number of teachers in the five-cycle group (N=111, Table B.5 in the Appendix). Clark et al. (2022). https://www.mathematica.org/-/media/publications/pdfs/education/2022/teacher-video-report-appendix.pdf.

[10] Basma, B., and R. Savage. 2018. Teacher Professional Development and Student Literacy Growth: A Systematic Review and Meta-Analysis. Educational Psychology Review 30(2): 457–481. https://link.springer.com/article/10.1007/s10648-017-9416-4; Didion, L., T. R. Toste, and M. J. Filderman. 2020. Teacher Professional Development and Student Reading Achievement: A Meta-Analytic Review of the Effects. Journal of Research on Educational Effectiveness 13(1): 29–66. https://doi.org/10.1080/19345747.2019.1670884; Lynch, K., H. C. Hill, K. E. Gonzalez, and C. Pollard. 2019. Strengthening the Research Base That Informs STEM Instructional Improvement Efforts: A Meta-Analysis. Educational Evaluation and Policy Analysis 41(3): 260–293. https://scholar.harvard.edu/files/kathleenlynch/files/stem_professional_development_meta-analysis_.pdf; Sims et al. (2023).

 [11] Harris, D. N., and T. R. Sass. 2011. Teacher Training, Teacher Quality and Student Achievement. Journal of Public Economics 95(7–8): 798–812. https://files.eric.ed.gov/fulltext/ED509656.pdf; Kirsten, N., J. Lindvall, A. Ryve, and J. E. Gustafsson. 2023. How Effective is the Professional Development in Which Teachers Typically Participate? Quasi-Experimental Analyses of Effects on Student Achievement based on TIMSS 2003–2019. Teaching and Teacher Education 132: 104242. https://doi.org/10.1016/j.tate.2023.104242.

[12] Kraft, M. A., D. Blazar, and D. Hogan. 2018. The Effect of Teacher Coaching on Instruction and Achievement: A Meta-Analysis of the Causal Evidence. Review of Educational Research 88(4): 547–588. https://scholar.harvard.edu/sites/scholar.harvard.edu/files/mkraft/files/kraft_blazar_hogan_2018_teacher_coaching.pdf.

[13] Cilliers, J., B. Fleisch, C. Prinsloo, and S. Taylor. 2020. How to Improve Teaching Practice?: An Experimental Comparison of Centralized Training and In-Classroom Coaching. Journal of Human Resources 55(3): 926–962. https://riseprogramme.org/sites/default/files/2020-11/RISE_WP-024_Cilliers_TeachingPractice.pdf; Cilliers, J., B. Fleisch, J. Kotze, M. Mohohlwane, and S. Taylor. 2022. The Challenge of Sustaining Effective Teaching: Spillovers, Fade-Out, and the Cost-Effectiveness of Teacher Development Programs. Economics of Education Review 87: 102215. https://doi.org/10.1016/j.econedurev.2021.102215; Cohen, J., V. Wong, A. Krishnamachari, and R. Berlin. 2020. Teacher Coaching in a Simulated Environment. Educational Evaluation and Policy Analysis 42(2), 208–231. https://doi.org/10.3102/0162373720906217; Cohen, J., V. C. Wong, A. Krishnamachari, and S. Erickson. 2024. Experimental Evidence on the Robustness of Coaching Supports in Teacher Education. Educational Researcher 53(1): 19–35. https://edworkingpapers.com/sites/default/files/ai21-468.pdf.

[14] Blazar, D., D. McNamara, and G. Blue. 2023. Instructional Coaching Personnel and Program Scalability. Education Finance and Policy 1–32. https://files.eric.ed.gov/fulltext/ED616777.pdf.

[15] Desimone, L. M. 2009. Improving Impact Studies of Teachers’ Professional Development: Toward Better Conceptualizations and Measures. Educational Researcher 38(3): 181–199. https://isidore.udayton.edu/access/content/group/48d85ee6-68d7-4a63-ac4e-db6c0e01d494/EDT650/readings/Desimone_Laura_M.pdf.

[16] Sims, S., and H. Fletcher-Wood. 2021. Identifying the Characteristics of Effective Teacher Professional Development: A Critical Review. School Effectiveness and School Improvement 32(1): 47–63. https://doi.org/10.1080/09243453.2020.1772841.

[17] Sims et al. (2023).

[18] Allen et al. (2011); Allen et al. (2015); Clark et al. (2022).

[19] Kraft, Blazar, and Hogan (2018).

[20] Sims et al. (2023).

[21] Ibid.

[22] Ibid.

[23] Harris and Sass (2011); Kirsten et al. (2023).

[24] Gregory, A., E. Ruzek, C. A. Hafen, A. Y. Mikami, J. P. Allen, and R. C. Pianta. 2017. My Teaching Partner-Secondary: A Video-based Coaching Model. Theory into Practice 56(1): 38–45. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5571870/pdf/nihms844807.pdf

[25] Cycles per year are taken from p. 477 of Allen et al. (2015).

[26] Allen et al. (2011).

[27] Ibid.

[28] Kraft (2020).

[29] Hill et al. (2008).

[30] Allen et al. (2015).

[31] Kraft (2020).

 [32] Hill et al. (2008).

 [33] Wolf, R., J. Morrison, A. Inns, R. Slavin, and K. Risman. 2020. Average Effect Sizes in Developer-Commissioned and Independent Evaluations. Journal of Research on Educational Effectiveness 13(2): 428–447.

 [34] See Appendix Table C.1 and Appendix B Table C.2 in Clark et al. (2022). https://www.mathematica.org/-/media/publications/pdfs/education/2022/teacher-video-report-appendix.pdf.

 [35] Kraft (2020).

[36] Hill et al. (2008).

 [38] This is the total cost for the five-cycle group ($664,454, reported in Table B.19 in the Appendix) divided by the number of teachers or students in the five-cycle group (N=111, Table B.5 in the Appendix). Ibid.

 [39] Kraft (2020).

 [40] On class size reductions, see Exhibit B20 in the Appendix of Clark et al. (2022). https://www.mathematica.org/-/media/publications/pdfs/education/2022/teacher-video-report-appendix.pdf. On individualized tutoring, Kraft cites estimates of 0.23 s.d. for a cost of $2,500 per student per year; see Kraft (2020). For similar estimated benefits from a meta-analysis, see Nickow, A., P. Oreopoulos, and V. Quan. 2024. The Promise of Tutoring for PreK–12 Learning: A Systematic Review and Meta-Analysis of the Experimental Evidence. American Educational Research Journal 61(1): 74–107.

[41] Sims et al. (2023).

 [42] Kraft (2020).

 [43] Hill et al. (2008).

 [44] Didion, Toste, and Filderman (2020).

 [45] Basma and Savage (2018).

 [46] Kraft (2020).

 [47] Hill et al. (2008).

 [48] Lynch et al. (2019).

 [49] Gonzalez, K., K. Lynch, and H. C. Hill. 2022. A Meta-Analysis of the Experimental Evidence Linking STEM Classroom Interventions to Teacher Knowledge, Classroom Instruction, and Student Achievement (EdWorkingPaper: 22-515). Annenberg Institute at Brown University. https://doi.org/10.26300/d9kc-4264.

 [50] Sims et al. (2023).

 [51] Didion, Toste, and Filderman (2020); Basma and Savage (2018); Lynch et al. (2019).

 [52] Cheung, A. C., and R. E. Slavin. 2016. How Methodological Features Affect Effect Sizes in Education. Educational Researcher 45(5): 283–292. https://doi.org/10.3102/0013189X16656615; Wolf, B., and E. Harbatkin. 2023. Making Sense of Effect Sizes: Systematic Differences in Intervention Effect Sizes by Outcome Measure Type. Journal of Research on Educational Effectiveness 16(1): 134–161. https://doi.org/10.1080/19345747.2022.2071364.

 [53] Harris and Sass (2011).

 [54] Kirsten et al. (2023).

 [55] Joyce, B. R., and B. Showers. 1981. Transfer of Training: The Contribution of “Coaching”. Journal of Education 163(2): 163–172. https://files.eric.ed.gov/fulltext/ED231035.pdf.

 [56] Kraft, Blazar, and Hogan (2018).

 [57] Ibid.

 [58] Kraft (2020).

 [59] Hill et al. (2008).

[60] See p. 572 for the effect size from effectiveness trials and p. 562 for the effect size on state standardized tests.

 [62] Cohen et al. (2020); Cohen et al. (2024).

 [63] Cilliers et al. (2020); Cilliers et al. (2022).

 [64] Blazar, McNamara, and Blue (2023).

 [65] Boguslav, A. 2023. Capturing Instructional Practice at Scale: Conceptualizing and Describing the Professional Practice of Teachers and Coaches (Doctoral Thesis). University of Virginia. https://libraetd.lib.virginia.edu/public_view/f4752h79r

 [66] Hill, H. C., M. Beisiegel, and R. Jacob. 2013. Professional Development Research: Consensus, Crossroads, and Challenges. Educational Researcher 42(9): 476–487. https://doi.org/10.3102/0013189X13512674.

 [67] Desimone (2009).

[68] Caena, F. 2011. Literature Review: Teachers’ Core Competences: Requirements and Development. European Commission. https://ec.europa.eu/assets/eac/education/experts-groups/2011-2013/teacher/teacher-competences_en.pdf; Desimone (2009); Department for Education. 2016. Standard for Teachers’ Professional Development. https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/537031/160712_-_PD_Expert_Group_Guidance.pdf.

 [69] Didion, Toste, and Filderman (2020); Kraft, Blazar, and Hogan (2018); Lynch et al. (2019).

 [70] Didion, Toste, and Filderman (2020); Kraft, Blazar, and Hogan (2018); Lynch et al. (2019).

 [71] Basma and Savage (2018); Didion, Toste, and Filderman (2020); Kraft, Blazar, and Hogan (2018); Lynch et al. (2019).

 [73] Sims and Fletcher-Wood (2021).

 [74] Sims et al. (2023).

 [75] Sims, S., H. Fletcher-Wood, T. Godfrey-Faussett, P. Mccrea, and S. Meliss. 2023b. Modelling Evidence-based Practice in Initial Teacher Training: Causal Effects on Teachers’ Skills, Knowledge and Self-Efficacy (No. 23-09). UCL Centre for Education Policy and Equalising Opportunities. https://repec-cepeo.ucl.ac.uk/cepeow/cepeowp23-09.pdf.

 [76] For other examples of research programs focused on finer-grained aspects of PD design, see Hill, H. C., J. P. Papay, N. Schwartz, S. Johnson, E. Freitag, K. Donohue, ... and B. Williamson-Zerwic. 2021. Improving Teacher Professional Learning at Scale. Research Partnership for Professional Learning; Australian Education Research Organisation. What We Don’t Know (But Want to Learn) about Professional Learning.   

[77] Hopkins, Megan, et al. 2013. Infrastructure Redesign and Instructional Reform in Mathematics: Formal Structure and Teacher Leadership. The Elementary School Journal 114(2): 200–224. https://doi.org/10.1086/671935.

[78] Grubb, W. Norton. 2009. The Money Myth: School Resources, Outcomes, and Equity. New York: Russell Sage Foundation; Jackson, C. Kirabo, and Claire L. Mackevicius. 2024. What Impacts Can We Expect from School Spending Policy? Evidence from Evaluations in the United States. American Economic Journal: Applied Economics 16(1): 412–446. https://doi.org/10.1257/app.20220279.

[79] Jackson and Mackevicius (2024).

[80] Ibid.; Grubb, W. Norton, and Rebecca Allen. 2011. Rethinking School Funding, Resources, Incentives, and Outcomes. Journal of Educational Change 12(1): 121–130. https://doi.org/10.1007/s10833-010-9146-6.

[81] Knight, David S. 2012. Assessing the Cost of Instructional Coaching. Journal of Education Finance 38(1): 52–80. https://www.jstor.org/stable/23259121.

[82] Foster, John M., Eugenia F. Toma, and Suzanne P. Troske. 2013. Does Teacher Professional Development Improve Math and Science Outcomes and Is It Cost Effective? Journal of Education Finance 39(3): 255–273. https://www.jstor.org/stable/23354866.

[83] Ibid.

[84] Miles, Karen, et al. 2003. Inside the Black Box of School District Spending on Professional Development: Lessons from Comparing Five Urban Districts. Journal of Education Finance 30(1): 1–26. https://www.jstor.org/stable/40704218.

[85] Ibid.

[86] Woulfin, Sarah L. 2021. Leaders Play Key Roles in the Professional Learning Ecosystem. The Learning Professional 42(5): 62–64. https://www.proquest.com/openview/129cd88b9a7ed9543f95f0df13ce2c3c/1?pq-origsite=gscholar&cbl=47961.

[87] Ibid.; Kane, Britnie Delinger, and Brooks Rosenquist. 2019. Relationships between Instructional Coaches’ Time Use and District- and School-Level Policies and Expectations. American Educational Research Journal 56(5): 1718–1768. https://doi.org/10.3102/0002831219826580.

[88] Kane and Rosenquist (2019).

[89] Ibid.

[90] Ibid.

[91] Bryk, Anthony S., et al. 2015. Learning to Improve: How America’s Schools Can Get Better at Getting Better. Cambridge, Massachusetts: Harvard Education Press.

[92] Ibid.

[93] Ibid.

[94] Ibid.

[95] Audisio, Anna, et al. 2024. Does Teacher Professional Development Improve Student Learning? Evidence from Leading Educators’ Fellowship Model. Annenberg Institute at Brown University. https://doi.org/10.26300/ah2f-z471; Coburn, Cynthia E., Heather C. Hill, and James P. Spillane. 2016. Alignment and Accountability in Policy Design and Implementation. Educational Researcher 45(4): 243–251. https://doi.org/10.3102/0013189x16651080; Penuel, William R., et al. 2007. What Makes Professional Development Effective? Strategies That Foster Curriculum Implementation. American Educational Research Journal 44(4): 921–958. https://doi.org/10.3102/0002831207308221.

[96] Penuel et al. (2007).

[97] Ibid.

[98] Ibid.

[99] Ibid.; Allen, Carrie D., and William R. Penuel. 2014. Studying Teachers’ Sensemaking to Investigate Teachers’ Responses to Professional Development Focused on New Standards. Journal of Teacher Education 66(2): 136–149. https://doi.org/10.1177/0022487114560646.

[100] Allen and Penuel (2014).

[101] Ibid.

[102] Ibid.

[103] Ibid.; Kaufman, Julia Heath, and Mary Kay Stein. 2009. Teacher Learning Opportunities in a Shifting Policy Environment for Instruction. Educational Policy 24(4): 563–601. https://doi.org/10.1177/0895904809335106.

[104] Kaufman and Stein (2009).

[105] Ibid.

Suggested Citation

Lizárraga, Lizeth, Sam Sims, and Sarah L. Woulfin (2025). "Teacher Professional Development: Costs, Benefits and Policy," in Live Handbook of Education Policy Research, Douglas Harris (ed.), Association for Education Finance and Policy, viewed 11/06/2025, https://livehandbook.org/k-12-education/workforce-teachers/teacher-development/.
