A review of the use of a systematic observation method in coaching research between 1997 and 2016

ABSTRACT Systematic observation has been one of the most widely employed methods in coaching research. Kahan’s review of studies that used this method between 1975 and 1997 highlighted the key trends in this research and offered methodological guidance for researchers wishing to use the method. The purpose of this review was to provide an update on the use of a systematic observation method in coaching research and to assess the extent to which the calls made by Kahan have been addressed. While in some respects this field of study has progressed (e.g., through the introduction of qualitative methods), researchers adopting this method have failed to attend to many of the issues Kahan raised. For this method to continue to make a positive contribution to the coaching research literature, researchers need to reflect more critically on how and why they are employing it. At present, some of the decisions made by researchers who have conducted work in this area are not justified with a rationale. It is our intention that this review will serve as guidance for researchers and practitioners, and for editors and reviewers of journals when attempting to assess the quality of this type of work.


Introduction
A previous review of studies using systematic observation methods in coaching (Kahan, 1999) included 56 studies that had used this method to observe coaching behaviour during the period 1975-1997. Moreover, a review of coaching science research from 1970 to 2001 (Gilbert & Trudel, 2004) revealed that the study of coaching behaviour was the main area under investigation, with 13.1% of all studies included in this review using a systematic observation method. Given these figures, it is clear that the coaching research community sees systematic observation as a valuable tool in developing a greater understanding of what coaches do in practice and competition.
In his review, Kahan (1999) raised some concerns with research that had employed systematic observation: 1) studies were mostly conducted from a positivistic perspective and so rarely considered the contextual factors that impacted coaches' behaviour; 2) studies had been conducted in a small number of sports (i.e., basketball, football and soccer), and mostly in a youth sport context; 3) few studies observed coaches' behaviour within training and competition; 4) sample sizes were small and often not randomly sampled; and 5) conclusions about coaches' behaviour were based on a limited number of observations that only produced a "snapshot" of those coaches' practices. On a positive note, Kahan (1999) suggested that systematic observation in coaching has revealed a lot about what coaches do, although judgments about the appropriateness of this behaviour could not be made because of limited knowledge of factors related to athletes (i.e., their learning needs and motivations for participating) and of the context in which they participated (i.e., what coaches were attempting to achieve related to the context).
In the period since Kahan's review, the use of systematic observation has remained popular amongst coaching researchers, and has continued to evolve as the field of coaching has become more established. However, the extent to which this evidence has contributed towards the development of coaching practice, especially within the confines of the specific contexts in which "coaching" takes place (Lyle, 2002), is unclear. Therefore, the purpose of this paper is to present an updated review of research into the use of a systematic observation method to record coaching behaviour, and consider how this line of research has moved forward since Kahan's review. First, we overview how we identified studies to be included in the review.

Identification of studies
Coaching studies using a systematic observation method were searched for using a 3-phase approach (Harvey & Jarrett, 2014). Phase 1 involved searching the EBSCOhost database. Specific databases searched were Academic Search Complete, Educational Research Complete, ERIC, PsycArticles, PsycBooks, PsycInfo and SPORTDiscus with Full Text. Original search terms followed those of Kahan (1999), which were systematic observation AND coaching AND behaviour. Closely related terms, and those used in studies that were known to have used a systematic method, such as coach, athlete and learning, were also included in searches to ensure all relevant articles that met the inclusion criteria were identified. Database searches stopped once a saturation point had been reached, which was when no new articles were found.
Phase 2 expanded the search beyond the databases to identify other studies that met the inclusion criteria: published after 1997; empirical; peer reviewed; written in English; with coaches as participants; and using a category-based, systematic observation instrument to observe coaching behaviour directed towards players. This extended search was achieved by reading the reference lists of articles identified in phase 1, as well as emailing researchers who were known to conduct coaching research using a systematic observation method. Finally, colleagues directed the authors to any other studies that had not been identified through any other means. Any studies that did not meet the inclusion criteria were removed. These were studies that were theoretical in nature, that aimed to validate a systematic observation instrument, or that focused on teachers rather than coaches.
To ensure reliability, a 3-step process, as outlined and implemented by Gilbert and Trudel (2004) and LaVoi and Dutove (2012), was followed. First, all members of the research team agreed to the criteria for article inclusion. Once it was agreed that the study should be included, it was allocated to a member of the research team to read and code. Second, the first and second authors drew upon the experience and expertise of the third author, who had been trained in and published similar work, for guidance on coding articles included for review. Finally, the first and second authors coded 25% of the articles (n = 6) independently from a random sample of articles. Inter-coder reliability was 96%, with the one disagreement discussed until consensus was obtained.
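Simple percentage agreement, a common way of expressing inter-coder reliability, divides the number of coding decisions on which both coders agree by the total number of decisions. The sketch below is illustrative only: the formula used in this review and the number of coding decisions compared are not stated here, so the function name and the 25 hypothetical decisions are assumptions, chosen because a single disagreement in 25 decisions yields 96%.

```python
def percent_agreement(coder_a, coder_b):
    """Return simple percentage agreement between two coders' decisions."""
    if len(coder_a) != len(coder_b):
        raise ValueError("Both coders must code the same items")
    if not coder_a:
        raise ValueError("No coding decisions supplied")
    # Count the positions at which the two coders made the same decision.
    agreements = sum(a == b for a, b in zip(coder_a, coder_b))
    return 100.0 * agreements / len(coder_a)


# Hypothetical data (not taken from the review): 25 coding decisions
# with exactly one disagreement between the two coders.
coder_a = ["interval"] * 24 + ["event"]
coder_b = ["interval"] * 25

agreement = percent_agreement(coder_a, coder_b)
print(f"{agreement:.0f}% agreement")
```

Percentage agreement does not correct for agreement expected by chance, so a chance-corrected statistic may be preferable where categories are few.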

Summary of studies from 1997 to 2016
Twenty-six studies on the use of a systematic observation method in coaching were identified in the current review and are presented in Tables 1-10. To document the information from each study, a coding system was designed. Initial categories of this coding system were informed by: 1) previous reviews of coaching behaviour to allow comparisons to be made (Kahan, 1999; Trudel, Côté, & Bernard, 1996) and 2) the authors' experiences of conducting similar reviews in coaching. For each study, the following categories were coded: a) sports, b) countries, c) coaching context, d) systematic observation instrument, e) additional methods, f) number of total observations per coach, g) observation frequencies across studies, h) method of recording, i) reliability procedure. Coding information for each of these categories resulted in these being combined into 4 broader themes: 1) instrument development and technology; 2) coder training, reliability and procedural issues; 3) research questions and paradigm shift; 4) research context.
Table notes: The total equals 29 because some studies employed more than 1 systematic observation method. Inter-reliability 20 (5); intra-reliability 18 (7). The total equals 25 because 1 study employed a consensus-building technique. Incalculable: studies in which a set datum value across coaches was reported for observation frequency. This number equals 29 because some studies used 2 methods of recording. The total equals 27 because 1 study employed more than 1 additional method.

Instrument development and technology
Researchers employed a range of systematic observation instruments. The most common was the Arizona State University Observation Instrument (ASUOI), which was used in 9 studies. However, 4 of these studies used a modified or adapted version, rather than Lacy and Darst's (1984)

Coder training, reliability and procedural issues
Coder training, and intra- and inter-observer reliability scores, if specified, were recorded for each study. Seventeen studies indicated that coders had been trained in using the systematic observation instrument employed, with 1 study (Becker & Wrisberg, 2008) stating that consensus training had taken place. Seven studies failed to report whether any coder training had taken place. Furthermore, 17 studies provided intra-observer reliability scores while 7 did not, and 20 studies provided inter-observer reliability scores whereas 5 did not. The number of coaches observed differed between studies depending on each study's purpose and nature. However, not all coaches within the same study were observed the same number of times. For example, in the Harvey, Cushion, Cope and Muir (2013) study, the 3 coaches were each observed a different number of times. In these instances, we grouped coaches together and reported the mean number of observations per study. In 6 studies, coaches were observed only once; in 7 studies, coaches were observed for an average of between 2 and 4 times; in 6 studies, coaches were observed for an average of between 5 and 7 times; in 3 studies, coaches were observed for an average of between 8 and 10 times; and in only 1 study was a coach observed on more than 10 occasions. Two studies did not report the number of times coaches were observed, while 1 study was incalculable.
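The grouping of studies by mean number of observations per coach can be sketched as follows. This is a minimal illustration rather than the procedure actually used: the function name, the example counts and the treatment of fractional means (assigned to the next band up) are all assumptions, since how non-whole means were handled is not stated.

```python
from statistics import mean


def observation_band(observations_per_coach):
    """Classify a study by its mean number of observations per coach."""
    if not observations_per_coach:
        return "not reported"
    avg = mean(observations_per_coach)
    # Assumption: a fractional mean falls into the next band up,
    # e.g. a mean of 4.3 is placed in the "5-7" band.
    if avg <= 1:
        return "1"
    if avg <= 4:
        return "2-4"
    if avg <= 7:
        return "5-7"
    if avg <= 10:
        return "8-10"
    return ">10"


# Hypothetical study in which 3 coaches were each observed a different
# number of times (these counts are invented for illustration).
print(observation_band([4, 6, 8]))
```

Reporting the band alongside the raw per-coach counts would let readers see how unevenly observations were distributed within a study.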
Reviewing the number of minutes coaches were systematically observed was difficult because studies reported observation time in different ways: for example, as a total number of hours or an average number of hours, and with coaches' hours collated either as a group or as individuals. For example, Pereira, Mesquita and Graça (2009) reported the average session length for all 28 coaches as 87 min and the total minutes for all coaches as 2430. Vinson et al. (2016) reported that each case study was systematically observed for approximately 4 h, up to 2 h with 2 instruments. Owing to these differences and the challenges in presenting data in a consistent format, we did not report the length of observations.
Finally, the method of recording behaviour was coded. An interval recording method was used in 10 studies, a time-sampled event method in 7 studies, an event method in 5 studies and a time-sampled method in 3 studies, with 5 studies failing to report this information.
In addition, 19 studies reported that they had videoed sessions to allow for post-observation coding, while 7 studies had coded behaviour live.
Accompanying a systematic observation instrument, a number of studies used an additional method(s) in an attempt to reveal a further aspect of the coach's practice. Fourteen studies employed some form of interview, predominantly to find out the underpinning reasons why coaches used certain behaviours; 4 studies used time-use analysis to find out how long coaches engaged athletes in different practice activities (i.e., technical, phase of play, small-sided game); 2 studies used observational field notes to uncover descriptive information related to the coach-athlete relationship and coaching context (i.e., how coaches communicated and how athletes seemed to receive this information); and the modified expectancy rating scale was employed once in order to measure coach expectations of athletes. Eight of the 26 studies did not use any additional methods.
Table notes: The total equals 30 because some studies used a systematic observation method to investigate coaches' behaviour in more than 1 sport. This total equals 30 because some studies used a systematic observation method across more than one context.

Research context
With respect to the type of sport studied, 29 were team based, which included 12 in football, 5 each in volleyball and basketball, 2 in handball and 1 each in rugby union, synchronised swimming, wheelchair basketball, field hockey and American football.
Only 1 study investigated the behaviours of a coach working in an individual sport, which was golf.
There has been an increase in the range of geographical locations of systematic observation research in coaching. Results from this review indicate that it is now the UK where most of this research is being completed, with 13 studies conducted there during the review period. In addition, 3 studies each were undertaken in Portugal, the USA and Spain, 2 in Canada, and 1 each in Australia and Greece.
Based on Trudel and Gilbert's (2006) conceptualisation of coaching contexts, 6 studies had been conducted in a recreational context, 12 in a developmental context and 12 in an elite context.¹ Six studies observed coaches' behaviour during matches/games, with 18 during training. Only 2 studies observed coaches during both matches/games and training.

Discussion
The purpose of this paper was to review studies that had used a systematic observation method to investigate coaching behaviour, and to consider the extent to which this area of research has developed since Kahan's (1999) review. The use of systematic observation to identify coaches' behaviour has continued to receive substantial research interest and has undoubtedly provided important insights that have added to the body of sports coaching knowledge (Cushion, 2013). However, some of the problematic trends identified by Kahan (1999) still exist and will be overviewed in this section. The discussion will be presented under the 4 broader themes: 1) instrument development and technology, 2) coder training, reliability and procedural issues, 3) research questions and paradigm shift, 4) research context.

Instrument development and technology
Systematic observation instruments have been developed in line with advancements in technology. This has led to some movement away from the instruments that Kahan (1999) reported as being the most employed. These were the CBRF (Langsdorf, 1979), the CBAS (Smith, Smoll, & Curtis, 1978) and, most common, the ASUOI (Lacy & Darst, 1984). Since Kahan's review, the CBRF and CBAS, in particular, have been employed sparingly by researchers undertaking systematic observation work; the ASUOI remains popular, although its use has reduced as other systems have been validated and transposed onto a digital software platform. Several instruments have been developed based on existing instruments, including the RCABI, the CAFIAS and the CAICS, as it was claimed that existing instruments did not enable the purpose of these studies to be met. Perhaps the most notable of the "newer" systematic observation instruments, however, is the Coach Analysis and Intervention System (CAIS). It has been argued that the CAIS is a more sophisticated systematic observation method than its predecessors, as it provides a greater breakdown of coaching behaviours that better reflect those used by coaches, enables multilevel coding (i.e., coding more than one behaviour at once), and allows researchers to code secondary behaviours (e.g., recipient, timing, content) and to code behaviours as a function of the practice form in which they occurred (Harvey et al., 2013).
Regardless of the systematic observation instrument employed, there has been a trend towards adapting or modifying the chosen instrument. For example, Bloom, Crumpton and Anderson (1999) and Zetou, Amprasi, Michalopoulou and Aggelousis (2011) used a revised version of the CBRF, while Ford, Yates and Williams (2010) and Smith and Cushion (2006), among others, used a modified version of the ASUOI. This suggests that these instruments were not appropriate in enabling researchers to gather data that satisfied their research questions. This could be explained by the dated nature of these instruments and their inability to reflect current thinking in coaching. However, a similar situation exists with the use of the CAIS. As evidenced in the results, a modified version of this instrument (e.g., Guzman & Calpe-Gomez, 2012; Partington & Cushion, 2012, 2013) has been used more often than the validated version as presented by Cushion et al. (2012). Furthermore, only 1 study (Harvey et al., 2013) reported secondary behaviours. Finally, while researchers have made claims that they have used the CAIS, the primary behaviours were different to those noted by Cushion et al. (2012). It could be argued then that where this has occurred, the CAIS was in fact not the instrument employed.
¹ A recreational context is characterised by a limited focus on competition, low intensity and commitment, formal organisation but irregular and local involvement. A developmental context is characterised by a more formal competition structure, and the requirement for a greater commitment from players than exists in participatory sport. Players are also often selected through some form of talent identification. Finally, an elite context is characterised by intensive preparation and involvement from players, highly structured and formalised competition, and coaches who work with the same group of players in a full-time capacity (Trudel & Gilbert, 2006, pp. 520-522).
Due to the range of systematic observation instruments used, it is difficult to assess what coaches do in different sports and contexts. Different instruments include different behaviours, which are defined differently, making it challenging for readers to interpret systematic observation data. For example, the RCABI as used by Hall, Gray and Sproule (2015) defined praise as: "Non-specific praise given during the activity (e.g., 'Excellent', clapping)" while the CAIS used by Harvey et al. (2013) defined the same behaviour as: "Positive or supportive verbal statements or non-verbal gestures which demonstrate the coach's general satisfaction or pleasure to a player(s) that DO NOT specifically aim to improve the player(s) performance at the next skill attempt". While it is appreciated that no one systematic observation method can be all encompassing and suit the purpose of every study, there is a critical need to use more common language when defining behaviours. Although we do not advocate the use of one instrument over another, researchers do need to consider the instrument they are using and offer a suitable rationale for employing it. For example, if a modified or adapted version of an instrument is being employed, why is this? Or, if a less sophisticated system is adopted over one that is more complex, what is the rationale for continuing with a system that, it has been argued, does not best capture coaches' behaviour? In the studies reviewed that used modified or adapted versions of an instrument, researchers offered limited rationale for why the full version was not appropriate.

Research questions and paradigm shift
Early coaching research was conducted, interpreted and discussed through a positivistic lens (Gilbert & Trudel, 2004), as attempts were made to demonstrate a causal relationship between coach behaviour and athlete response (Kahan, 1999). While the coach occupies a position of centrality and considerable influence on athletes' sporting performances (Cushion, 2010), it is now well appreciated that coaching is a social process with many factors influencing athlete learning (Cushion, 2013). It has been suggested that using systematic observation as an isolated method cannot appreciate the social contextual factors that can impact coaches' behaviours (Potrac, Jones, & Cushion, 2007). To investigate the socio-contextual elements of coaching, different research questions needed asking, which has resulted in the use of additional methods. Consequently, since Kahan's (1999) review, this area of study has seen the emergence of mixed methodologies where qualitative methods have been used in conjunction with a systematic observation method.
The purpose of using qualitative methods, mainly in the form of interviews, is that they enable researchers to gain an understanding of how and why coaches use certain behaviours and practice forms/activities (Partington, Cushion, & Harvey, 2014; Potrac, Jones, & Armour, 2002; Potrac et al., 2007; Smith & Cushion, 2006). Indeed, it has been suggested that to make changes to "what" coaches do, there must be an understanding of "why" they do it (Potrac et al., 2007). Interviews have been employed mostly with the coaches studied by the researchers, but in some cases with coaches' athletes (Webster, Hunt, & LeFleche, 2013) or with key stakeholders (i.e., parents) (Vinson et al., 2016) in order to investigate their perceptions of the coach's behaviour. Another qualitative method used has been field note recordings in an attempt to examine coaching practice in greater detail (Stodter & Cushion, 2014; Vinson et al., 2016); however, this method has been used sparingly and is in need of greater research focus.
In attempts to understand why coaches use particular pedagogical strategies, scholars have drawn on and introduced sociological theory and theoretical concepts, and related these to coaching. For example, Potrac et al. (2002) interpreted their data through Goffman's (1959) concepts of "social role", "power" and "presentation of the self", while Potrac et al. (2007) used French and Raven's (1959) work on power to offer explanations of why coaches used certain behaviours at the expense of others. Furthermore, Harvey et al. (2013) used Bruner's (1999) notion of "folk pedagogies" when interpreting why the coaches in their study may have coached in particular ways. While this work is much welcomed and has furthered understanding of coaches' practice, it seems that the theories drawn upon have not been well developed in coaching. In other words, scholars have tended to introduce many different theoretical concepts without, arguably, exploring these in any great depth. What appears to be needed is the development of existing theories and concepts used in coaching when theorising practice, before the introduction of different theories. For example, the work of sociological theorists such as Erving Goffman has been used in studies where systematic observation has been the predominant data generation method (Partington & Cushion, 2012; Potrac et al., 2002), yet these have often been one-off studies conducted with a particular coach or group of coaches in a particular context. What we are advocating is that researchers build upon this work and, thus, develop an increased understanding of how these theories can explain coaching practice.
Alongside a systematic observation method, other quantitative methods have been used. The most common has been time-use analysis, which measures the amount of time a coach engages their athletes in different practice forms and activities. Although this method was used prior to the period of this review (e.g., Lacy & Martin, 1994), it has received more attention in recent years. Most studies that have used a time-use analysis method have examined the time coaches spend engaging athletes in "training" or "playing" form. These data have provided evidence pertaining to how coaches are structuring their practice and whether they are engaging athletes in the most meaningful and relevant activities for their development. Harvey et al. (2013) went one step further and also recorded the time spent in "other" form. This was any time when players were physically inactive. Findings from this study showed athletes spent considerable periods of time in this practice form. Given this, it raises the question of how periods of physical inactivity were coded in other studies that used a time-use analysis method. As with issues related to coach behaviour definitions, there is a need for greater consistency in how researchers are using systematic observation instruments and accompanying methods in order to gather data that are most reflective of a coach's practice.
Besides time-use analysis, other quantitative methods have been used. Becker and Wrisberg (2008) used a modified expectancy rating scale in order to measure whether coaches gave different types of feedback to athletes they regarded as either high expectancy (HE) or low expectancy (LE). As with the use of field notes, this method has been used sparingly, making specific conclusions and recommendations difficult in this current review.
The use of additional methods alongside systematic observation is a welcome development, and something researchers should give serious consideration to when designing studies using systematic observation. These additional methods provide further insights into the nuances of the coaching context and how it influences coaching behaviour. Although systematic observation is considered one of the most appropriate means to identify what coaches do, coaches' behaviour cannot be contextualised without a knowledge and understanding of why or how coaches employ certain behaviours (Cushion, 2010; van der Mars, 1989). This is important as it gives a sense of what coaches were trying to achieve and what factors informed their practice, and gives details pertaining to the interactions between coach and athlete (Cope, Partington, Cushion, & Harvey, 2016; Groom, Nelson, & Cushion, 2012), as well as other key stakeholders (i.e., administrators, parents).

Coder training, reliability and procedural issues
Recently, Ayers and Blankenship (2015) gave a presentation at the Physical Education Teacher Education Conference in Atlanta titled, Where Have All the Systematic Observation Instruments Gone? While their main argument was based on the reduction in utilisation of these instruments in teacher education programmes in the USA, the issue of training individuals in systematic observation procedures, and of where these instruments appear in coach education and development programmes, as well as in doctoral student programmes, is one worth considering. We would argue from our own experiences that this reduction in the utilisation of these instruments in teacher education, and lack of use in coach education and development, may be due to a lack of researchers who clearly understand and are trained in behavioural analysis techniques during undergraduate, masters and doctoral level programmes. This issue may be due to the fact that a range of methodologies to examine coaching practice has developed, as argued in the previous section. However, we would argue that gaining an in-depth understanding of what coaches do, and how this changes over time, requires some form of behavioural analysis assessment. Consequently, this raises additional issues about offering quality coder training, where this appears in coach education and development and in doctoral programmes, as well as the need to follow strict training procedures.
Ensuring the credibility of data is essential when employing a systematic observation method (McKenzie & van der Mars, 2015). McKenzie and van der Mars (2015) consider that data can only be credible if coders have been through a process of proper training, and reliability checks are conducted throughout data collection and analysis. Indeed, McKenzie and van der Mars (2015) offer a coder training protocol to follow. However, while 17 of the 26 studies stated that coders had been trained, it is unclear from this review the extent to which studies have followed this coder training protocol, or something similar. As such, data presented where coder training and reliability have not been reported should be read with caution, as there is no means of determining whether these data are representative of what coaches actually do.
Prolonged observations of coaches during the different phases of a season are another mechanism by which to ensure data are representative of coaches' behaviour. Kahan (1999) was highly critical of single observations of coaches, suggesting that conclusions could not be drawn from such "snapshots". Unfortunately, similar issues exist despite continued calls for work of a more season-long and/or longitudinal nature (Harvey et al., 2013; Kahan, 1999). The general pattern is that studies with smaller sample sizes often observe coaches for longer, and those with larger sample sizes conduct fewer observations. As Kahan (1999) acknowledged, researchers face a decision of whether they choose a larger sample size and thus limit the number of observations, or choose a smaller sample size and increase the number of observations. There comes a point, however, when a minimum number of observations is required if data are to be representative of what coaches do, which Brewer and Jones (2002) suggest to be 3 coaching sessions of 90 min per coach. The problem with single observations or limited time spent observing is that coaches may act or behave in certain ways to satisfy the observation period (Partington & Cushion, 2012). Equally, due to the contextual and situational nature of coaching, a single observation cannot be deemed an example of how a coach behaves, and should be avoided.
Another issue is that without long periods of time in the field conducting observations, it is impossible to undertake intervention-based studies. With the exception of studies by Stodter and Cushion (2014) and Partington, Cushion, Cope and Harvey (2015), no other studies in the current review investigated changes in coaches' behaviour, which means that little is known about how to most effectively do this (More & Franks, 1996). Therefore, while descriptive examinations of practice provide information of what coaches are doing and are therefore essential (Potrac et al., 2007), if an understanding is to be developed regarding the impact of different learning interventions on coaches' behaviour and practice, then intervention studies are a necessity.
Given the time-consuming nature of collecting and analysing systematic observation data, it is unsurprising that few studies have moved beyond a small number of observations, or carried out seasonal or longitudinal interventions. This issue has somewhat not been helped by the introduction of more sophisticated and complex systematic instruments. Consequently, while these systems are welcomed for providing a greater level of information regarding a coach's behaviour and practice, the trade-off is that coder training, analysis of data and achieving the required level of reliability is a more onerous and challenging process. Yet, if researchers want to investigate such things as how coaches' behaviour changes over the course of season or during different phases of a season, more seasonal/longitudinal work is required using a systematic observation instrument that appropriately captures what coaches are doing, rather than utilising a system that is perhaps easier to use and more convenient. The depth to which systematic observation data can be collected is dependent on whether live or post-observation coding takes place. Coinciding with developments in systematic observation instruments is the use of video to record coaching sessions. Although there are advantages to this, such as the ability to code primary and secondary behaviours and conduct post-observation reliability tests, there are also feasibility issues that need consideration. For example, the more complex the instrument, the more challenging it is to reliably capture all information as the coaching is happening. As such, if researchers wanted to use a system such as CAIS, they would have no choice but to code post event, unless they used a modified or adapted version like Partington and colleagues (2012,2013,2014).
Coding post event opens up the possibility to use a timesampled event method, which is coding each behaviour every time it occurs (van der Mars, 1989). However, the method of coding researchers decide to employ is not so much the issue as them offering an explanation of what they mean by this method of coding. In most cases, researchers state what method of coding they have employed but fail to tell readers what this method is. Researchers need to address this issue and offer greater clarity over the method of coding used.
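Why the choice of recording method needs to be explained can be illustrated with a small sketch in which the same coded session yields different counts under two methods. The definitions used here are conventional ones and are an assumption: event recording tallies every occurrence of a behaviour, while partial-interval recording notes only whether a behaviour occurred at least once in each interval. The reviewed studies may have used these terms differently, and the timeline, function names and 10 s interval are hypothetical.

```python
from collections import Counter

# Hypothetical coded timeline: (seconds from session start, behaviour).
events = [(3, "instruction"), (8, "praise"), (12, "instruction"),
          (14, "instruction"), (27, "silence"), (31, "praise")]


def event_recording(events):
    """Tally every occurrence of each behaviour."""
    return Counter(label for _, label in events)


def partial_interval_recording(events, interval_s, session_s):
    """Count, per behaviour, the intervals in which it occurred at least once."""
    n_intervals = -(-session_s // interval_s)  # ceiling division
    seen = set()
    for t, label in events:
        if 0 <= t < session_s:
            # A behaviour repeated within one interval is recorded only once.
            seen.add((t // interval_s, label))
    return Counter(label for _, label in seen), n_intervals


event_counts = event_recording(events)
interval_counts, n_intervals = partial_interval_recording(events, 10, 40)

# "instruction" occurs 3 times but in only 2 of the 4 intervals.
print(event_counts["instruction"], interval_counts["instruction"], n_intervals)
```

Because the two methods return different figures for the same session, frequencies from studies that do not define their recording method cannot be compared with confidence.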
A final issue with the use of video is that there are increased ethical constraints that require consideration. While this should not be a determining factor in whether researchers use video, pragmatically it could be problematic when observing coaches of children, and in certain sports (e.g., swimming and gymnastics). Also, if the coaching takes place in an area accessed by other people, as most recreational children's sessions do, this leads to the further ethical issue of making sure anyone who could appear in the video is aware and consenting. While this is a challenge, it could be overcome by making clear to participants that techniques such as pixelating faces and clothing will help ensure anonymity.

Research context
Where systematic observation research in coaching has developed, it is through systematically observing coaches across different contextual domains (e.g., participation, development and performance). Kahan (1999) reported that during the period of his review, there had been a predominant focus on studies that had systematically observed the behaviours of coaches in youth sport contexts. Although youth sport stretches across the participation and development domains, the current review highlighted that more research was being conducted in the performance domain. This is a positive sign and is something researchers need to continue doing if a detailed understanding of what coaches do in different contextual domains is to be gained.
Our analysis did, however, identify two primary gaps in the current literature on systematic observation. First, the observation of coaches during training, rather than games, continues to dominate systematic observation research. Although there are fewer games than practices, coaches' behaviours have been identified as being different under the two conditions; hence, practice behaviours cannot be assumed to be the same as coaches' in-game behaviours (Cushion, 2010; Trudel et al., 1996). Therefore, a complete picture of coaching behaviour, including the potential relationships between practice and in-competition behaviours, has not yet been established, especially across the range of contexts where coaching takes place. Certainly, coaches have much more control over their behaviour within their own practice environments. In contrast, coaches' behaviour may be more reactive in competition, where coaches make decisions in response to a continually changing environment and face more circumstances that are beyond their control.
Second, following on from an argument made by Kahan (1999), there remains a limited understanding of coaches' behaviour across a variety of sports, countries and coaching populations. Although it could be claimed that there is much systematic observation data to draw on to provide evidence of what coaches do, these data have been generated mainly from male coaches, in a limited number of sports and in a select number of countries. We similarly found that it is mostly the same sports that are still receiving the majority of the research attention in this area. This is not to criticise the research we reviewed, as it has been most helpful in providing an in-depth understanding of what coaches in these sports are doing. However, a number of "gaps" remain in systematic observation research with respect to context. For example, research with female coaches at all levels is critically needed. Moreover, research on the behaviour and practice of coaches who work in a disability domain is urgently needed, as is research on the role that assistant coaches play in training and competition (Hall et al., 2015; Gilbert & Trudel, 2004).

Conclusion
This review has shown that while systematic observation continues to advance knowledge and understanding of what coaches do, there are many areas, as highlighted in the discussion section, that require further research attention. Without wishing to repeat these here, we do urge researchers to take a more critical approach when adopting a systematic observation method. This includes offering a clearer rationale for the systematic observation instrument being employed, considering the number of observations for each coach and reflecting on the use of a multiple, mixed-methods approach. We hope that this review has brought some of these issues to light, and offers greater clarity for researchers and practitioners wanting to employ this method in future work. Furthermore, we hope this review acts as a useful guide for editors and reviewers who are responsible for making judgments about the quality of this type of work.

Disclosure statement
No potential conflict of interest was reported by the authors.