Towards an understanding of how appraisal of doctors produces its effects: a realist review

CONTEXT Revalidation was launched in the UK to provide assurances to the public that doctors are up to date and fit to practise. Appraisal is a fundamental component of revalidation. Approximately 150 000 doctors are appraised annually, costing an estimated £97 million over 10 years. There is little understanding of the theory of how and why appraisal is supposed to produce its effects. We used a realist review of the literature to explore these issues because realist reviews generate context-mechanism-outcome (CMO) configurations, resulting in the creation of theories of how and why appraisal of doctors produces its effects.
METHODS A programme theory of appraisal was created by convening stakeholders in appraisal and searching a database of research on appraisal of doctors. Supplementary searches provided literature on theories identified in the programme theory. Relevant sections of texts relating to the programme theory were extracted from included articles, coded in NVivo and synthesised using realist logic of analysis. A classification tool categorised the included articles' contributions to programme theory.
RESULTS One hundred and twenty-five articles were included. Three mechanisms were identified: dissonance, denial and self-affirmation. The dissonance mechanism is most likely to cause outcomes of reflection and insight. Important contexts for the dissonance mechanism include the appraiser being highly skilled, the appraisee's working environment being supportive and the appraisee having the right attitude. The denial mechanism is more likely to be enacted if the opposite of these contexts occurs and could lead to game-playing behaviour. A skilled appraiser was also important in triggering the self-affirmation mechanism, resulting in reflection and insight. The contexts, mechanisms and outcomes identified were, however, limited by a lack of evidence that could have enabled further refinement of the CMO configurations.
CONCLUSION This review makes a significant contribution to our understanding of appraisal by identifying different ways in which appraisal of doctors produces its effects. Further research will focus on testing the CMO configurations.

INTRODUCTION
Medical appraisal is an educational intervention traditionally based around a formative, developmental meeting between two professionals, supported by information gathered from the full scope of practice. 1 It became a contractual requirement for all UK general practitioners (GPs) and consultants in the National Health Service (NHS) in the early 2000s, 2,3 and approximately 150 000 doctors in practice are now appraised each year. 4 The appraisal process was originally intended to give feedback on a doctor's performance, map a doctor's progress and outline further development requirements, 5 and was not designed as an assessment of competence that a doctor either passed or failed. 6
With the introduction of revalidation by the General Medical Council (GMC) in 2012, the importance of medical appraisal was elevated. Revalidation is based on a doctor's participation in five annual appraisals and aims to provide objective assurance to the public that a doctor is up to date and fit to practise. 7 Revalidation requires every doctor's licence to practise to be reviewed every 5 years by a responsible officer (RO). 8 The RO is typically the most senior doctor in the organisation, who makes his or her judgement using the appraisal output summary. If an RO has any concerns about a doctor's fitness to practise, he or she is legally obliged to refer the doctor to the GMC for further investigation. The Department of Health estimated that revalidation would cost £97 million over 10 years. 9 The interlinking of appraisal and revalidation, and the potential impact of revalidation on the appraisal process, has provoked much debate.
This has been compounded by disparate views on the goal of revalidation: either to detect 'bad apples', which necessitates a summative approach and low threshold criteria, or to create a system in which all doctors improve, which necessitates evolving standards and a developmental model. 10 However, developing a workforce is not the same as making sure that it is safe, and confusion over the dual purpose of revalidation may have an unintended impact on the educational value of appraisal. 11 Revalidation is also claimed to have affected the appraisal process in other ways, including increased time costs, 12 collusion 13 and superficial engagement in appraisal by uninterested appraisers and appraisees (game-playing), 14 all of which might reduce any regulatory control that appraisal was intended to deliver.
Before we can fully understand the potential impact of revalidation on the appraisal process, it is important to understand how appraisal of doctors is intended to produce its effects; that is, to identify the causal mechanisms at work in the appraisal process. Despite the increased importance of appraisal to the medical profession, and the change in the nature of medical appraisal, there is little understanding of the theory of how and why appraisal is supposed to improve doctors' performance. Scallan et al.'s scoping review 15 of the literature on appraisal found that, to date, the literature has mainly focused on: 'engagement' from the perspective of the doctor being appraised; exploration of different models of appraisal (internal versus external appraisal, peer and practice appraisal, and appraisal for revalidation); the benefits of appraisal; GPs' perceptions of the problematic link between appraisal and performance assessment or revalidation; administrative and management issues; and developing appraisers' skills.
Identifying the causal mechanisms in the appraisal process will contribute to the theory of how appraisal improves a doctor's performance, and may then make it possible to understand the impact of revalidation on the appraisal process. Hitherto, no research of this type has been conducted. Such work will also have practical implications, in that an increased understanding of 'how and why appraisal works' will aid decision making about modifying and implementing appraisal processes at a local level. Internationally, the UK is unique in that certification of doctors relies heavily on appraisal. 16 Some countries use a form of doctor appraisal, 17 but it is not linked to certification. However, appraisal relies on a number of processes (e.g. reflection, feedback on performance and developing insight) that are commonly used when regulators and professional bodies in other jurisdictions assess doctors' fitness to practise. This paper therefore attempts not only to show how and why these processes work in the UK context, but also to point out where the findings may be 'externally valid' for other systems around the world.

Research questions
The purpose of this review was to understand how and why appraisal of doctors produces its effects. The review questions were:
1. What are the mechanisms by which appraisal of doctors is believed to result in its intended outcomes?
2. What are the important contexts that determine whether the different mechanisms produce their intended outcomes?
3. In what circumstances is appraisal likely to be effective?
We define an effective appraisal as one that results in a positive change in a doctor's practice that may ultimately benefit patients.

Realist review
A realist review approach was used to address the review questions. Realist review is a theory-orientated, explanatory approach to evidence synthesis. The literature is interrogated to develop and refine the theories that underpin the intervention being studied (in this review, appraisal), in order to explain what works, for whom, in what circumstances and in what respects. 18 A central part of a realist review is the development of a programme theory, which is an 'abstracted description', usually including a diagram, that delineates the key functions, strategies or activities of an intervention, the intended outcomes of the intervention and the mechanisms that contribute to particular outcomes. 19 A realist synthesis takes a generative approach to causation: "to infer a causal outcome (O) between two events (x and y), one needs to understand the underlying mechanism (M) that connects them and the context (C) in which the relationship occurs". 20 Our review followed Pawson's five practical stages for conducting realist reviews. 18 A key feature of the realist review method is that it is iterative, and the process frequently necessitates going back and forth between the different steps as the programme theory evolves.
Step 1: locate existing theories. In order to develop an initial programme theory of appraisal, we needed to locate existing theories. This was achieved in two ways. First, we consulted key stakeholders in appraisal, including doctors who are appraised (appraisees), appraisers, academic experts on appraisal, the GMC, ROs and human resources personnel. Formal meetings were held with the stakeholders at 3-4-month intervals throughout the review and took the form of facilitated discussions centred on the evolving programme theory. Stakeholders were identified through their involvement in previous research conducted by JA, MB and NB. Second, we searched the literature to find existing theories on how and why appraisal is supposed to work. These theories formed the foundation of the preliminary programme theory, which was then tested with data from studies included in the review.
Step 2: search strategy. We undertook two searches. The search for data to develop, confirm, refute or refine ('test') aspects of the programme theory focused on well-established bibliographies generated from research on appraisal and revalidation conducted by the Collaboration for the Advancement of Medical Education Research and Assessment (CAMERA) research team. [21][22][23][24][25][26] The CAMERA research team has conducted a programme of research on appraisal and revalidation since 2009.
This search strategy differed from the strategy outlined in our protocol, 27 which involved searching electronic databases (e.g. EMBASE), but was adopted because our piloting of searches indicated that it was an effective way of locating the relevant literature. All references relating to appraisal and revalidation were compiled into a single library using EndNote X7.4 (Thomson Reuters Corporation, Toronto, ON, Canada). The library consisted of 463 references, including articles from peer-reviewed journals, books, reports and websites. The references of included papers were also searched for relevant articles. Searches of the CAMERA database were performed in April 2015 and updated in May 2016, and the database was kept up to date throughout. Members of the stakeholder group and review team were also canvassed to identify relevant literature.
The supplementary searches were purposive and were undertaken when we identified that we needed more data on specific theories regarding different aspects of the programme theory of how appraisal is supposed to work (e.g. how feedback on performance is meant to change behaviour, the process of developing insight, the relationship between insight and reflection, behaviour change theories, experiential learning theories and self-affirmation theories). Medline and Google Scholar were searched using topic keywords. Once relevant references were found, backwards and forwards citation searching techniques were used to identify further relevant papers. Relevant literature was also identified by members of the review team and stakeholder group.
Step 3: study selection criteria and procedures. Documents used to 'test' the programme theory were selected based on relevance (i.e. does the source contain evidence or data that we can use to develop or 'test' aspects of the programme theory?). Documents were not excluded because of their type, and we included editorials, opinion pieces, commentaries, process evaluations, qualitative research, programme manuals and systematic reviews. The references in the database were then screened using a preliminary set of inclusion and exclusion criteria, which were deliberately broad. Inclusion criteria:
Aspect of appraisal intervention: all documents that contain information on the appraisal meeting between the appraisee and appraiser
Study design: all study designs
Types of settings: documents relating to healthcare settings
Types of participants: all documents about doctors (in any specialty and at any career stage)
Outcome measures: all appraisal-related outcome measures.
Studies focusing only on the supporting information gathered by doctors for the appraisal meeting, the summary of the appraisal meeting, doctors' personal development plans (PDPs) and the appraisers' statements, or just revalidation, were excluded. Studies about medical students were also excluded.  A randomly selected sample of 10% of the identified articles was evaluated by two review authors (NB and MB) using the inclusion and exclusion criteria. The remaining 90% were screened by one reviewer (NB). Any discrepancies were discussed between NB and MB until agreement was reached. NB held the casting vote.
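The screening split described above can be sketched in a few lines of Python. This is a minimal illustration, not the procedure the review team used: the helper names (split_for_screening, discrepancies), the placeholder reference IDs, the fixed seed and the choice to round the 10% sample up are all assumptions made for the example. It draws a random 10% sample for dual screening, leaves the remaining references for a single reviewer, and lists the references on which two reviewers' decisions differ so that they can be discussed.

```python
import math
import random


def split_for_screening(reference_ids, dual_fraction=0.10, seed=None):
    """Split references into a dual-screened random sample and a single-screened remainder.

    Hypothetical helper: rounding the sample size up and seeding the generator are
    assumptions for illustration, not details reported by the review.
    """
    rng = random.Random(seed)
    n_dual = math.ceil(len(reference_ids) * dual_fraction)  # round up so at least 10% are dual-screened
    dual = set(rng.sample(reference_ids, n_dual))
    single = [ref for ref in reference_ids if ref not in dual]
    return sorted(dual), single


def discrepancies(decisions_a, decisions_b):
    """Return reference IDs where two reviewers' include/exclude decisions differ."""
    return sorted(ref for ref in decisions_a if decisions_a[ref] != decisions_b.get(ref))


# 463 references, identified here by placeholder numbers rather than real citations.
references = list(range(1, 464))
dual_sample, single_screened = split_for_screening(references, seed=2015)
print(len(dual_sample), len(single_screened))  # 47 416

# Hypothetical decisions by two reviewers on three references (True = include).
reviewer_1 = {101: True, 102: False, 103: True}
reviewer_2 = {101: True, 102: True, 103: True}
print(discrepancies(reviewer_1, reviewer_2))  # [102]
```

In the review itself, disagreements were resolved by discussion, with NB holding the casting vote; the sketch only identifies which references would need that discussion.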
Step 4: extracting and organising data. A hybrid tool was used to classify the included articles with regard to how much they were likely to contribute to developing the programme theory (Table 1). This existing tool categorised sources as conceptually rich (thick) or thin (weaker) in their explanatory power. [28][29][30] Again, a random sample of 10% of the included articles was independently assessed by two reviewers (NB and MB) and any discrepancies were discussed until agreement was reached.
In a traditional systematic review, data extraction is typically carried out using a standardised form; the realist review approach, by contrast, synthesises information through note-taking and annotation. Full texts of the included articles were imported into NVivo 10 (QSR International Pty Ltd, Doncaster, Victoria, Australia). Sections of text that we interpreted as relating to one or more aspects of the programme theory were then coded, first by conceptual 'themes' and later in the analysis by context, mechanism or outcome.
Step 5: data synthesis. The data coded in NVivo were synthesised using a realist logic of analysis. More details can be found in our review protocol. 27 Data from included documents were used to 'test' and refine each part of the preliminary programme theory. For each outcome identified by the programme theory, we searched for data to support inferences about the likely causal mechanisms and the contexts in which those mechanisms might be triggered. The final realist programme theory is explained using narrative synthesis, text and figures. Findings were reported according to the RAMESES publication standards for realist syntheses. 19

Summary of studies
Of the 463 references identified in the CAMERA database, 394 were excluded based on title and abstract. Sixty-nine references were read in full, and a further 27 citations were identified for inclusion through citation tracking, yielding 96 articles on appraisal of doctors that were used to develop and 'test' the initial programme theory (Fig. S1). The 96 included articles were published in 30 different journals, with the largest numbers appearing in Education for Primary Care and the British Journal of General Practice (Table 2). Most studies were carried out in UK primary care, and the articles addressed a variety of topics in relation to appraisal. Most articles were opinion pieces or editorials, and most were classified as having a thinner description.
For programme theory development, a further 29 papers were identified that contained theories that contributed to an understanding of how appraisal is supposed to work. The contents of these were synthesised, which helped the development and refinement of different aspects of our initial programme theory.
A full list of the 125 articles is available from the corresponding author by request.
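The article counts reported above can be checked with simple arithmetic. The Python sketch below assumes, as the reported totals imply, that all 69 references read in full were included; the variable names are illustrative only.

```python
# Counts as reported in the summary of studies.
identified = 463                  # references in the CAMERA database
excluded_on_title_abstract = 394  # excluded at title and abstract screening
citation_tracking = 27            # further citations found by citation tracking
theory_papers = 29                # papers from the supplementary theory searches

read_in_full = identified - excluded_on_title_abstract  # 69
appraisal_articles = read_in_full + citation_tracking   # 96
total_included = appraisal_articles + theory_papers     # 125

print(read_in_full, appraisal_articles, total_included)  # 69 96 125
```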

Programme theory development: from appraisal intervention strategy to realist programme theory
The included articles were used to create an outline of the appraisal intervention. From this outline of the appraisal process, the realist programme theory was iteratively developed in the following steps.
Step 1: appraisee collects and reflects on supporting information; completes the appraisal form.
Step 2: appraiser reviews the appraisal form and supporting information; prepares for the appraisal meeting.
Step 3: appraisee and appraiser discuss the appraisal form and supporting information in the appraisal meeting.
Step 4: appraiser provides feedback or challenge.
Step 5: appraiser and appraisee agree on a personal development plan (PDP).
Our programme theory (Fig. 1) concentrates on what happens between Step 4 (the appraiser providing feedback) and Step 5 (agreement of an objective and action in the PDP) of the appraisal intervention strategy. Between these two steps we believe there are two potential intended intermediate outcomes (i.e. reflection and insight) 31 and one unintended outcome, game-playing. 32 These intermediate outcomes may lead to a further intermediate outcome: the appraisee including an objective in his or her PDP (Fig. 1). There are also a number of further outcomes after this, including the appraisee carrying out what is in his or her PDP, followed by behaviour change and then improved performance. Our realist review specifically focuses on the outcomes of reflection, insight and game-playing.

Context-mechanism-outcome configurations
For the outcomes within our initial programme theory (Fig. 1), we developed realist explanations for how the different outcomes (reflection, insight and game-playing) came about. Our exploratory realist explanations are expressed in the form of context-mechanism-outcome (CMO) configurations. In other words, for each outcome we propose which mechanism could have caused it and the contexts in which the mechanism is likely to be triggered.
Figure 1 Programme theory of appraisal of doctors
However, a major caveat of this realist review was the limited amount of rich data we were able to find in the included documents with which to develop, confirm, refute and refine the programme theory.
Although there were sufficient data in the literature to support the importance of certain contexts and outcomes, the data on the relationships between them and on their mechanisms were much more limited. This meant that we were unable to fully and reliably configure the CMOs. Even where evidence was available, the thin nature of much of the evidence limited the depth of the explanations we could develop.
The following examples of data extracts from the included articles demonstrate the differences between the rich and thin data. An example of data with a rich description was: The value of appraisals also depends on whether the appraiser is skilled [context] in conducting appraisals and is supportive, focused on the future, and participative. Badly conducted appraisals depress rather than improve performance [outcome]. 33 An example of data with a thin description was: appraisal has a tendency to operate superficially within medicine, with doctors engaging in creative "game playing" [outcome] toward its procedural requirements. 34 Thus the CMO configurations we present in the following sections are supported by data where possible but had to be supplemented by the content expertise of the review team and stakeholder group.

Dissonance
This CMO configuration includes the mechanism of dissonance, which we inferred from the included sources to be necessary in order to achieve the desired appraisal outcomes of reflection and insight. 22,31,32,[35][36][37][38][39] Cognitive dissonance is a well-recognised concept that refers to a situation in which a person holds conflicting attitudes, beliefs or behaviours. This produces a feeling of discomfort, which leads the person to alter his or her attitudes, beliefs or behaviours in order to reduce the discomfort and restore balance.
In medical appraisal, the concept of dissonance has the properties of a mechanism. The feedback from the appraiser is used as a strategy to create a situation or context that is likely to trigger dissonance [40][41][42][43][44][45] (mechanism) leading to reflection (an intermediate outcome), which may then eventually lead to an 'a-ha' moment 46 or insight (outcome). Accordingly, the appraisee will seek to resolve this dissonance in some way. They can accept the data presented to them and re-evaluate their performance (insight or the 'a-ha' moment), or they can seek to resolve the dissonance in another way (e.g. when the denial mechanism is activated [see below]).
The skills of the appraiser (context) were reported as being crucial to an effective appraisal. 14,33,34,[47][48][49][50][51][52] The success of appraisal partially depends on the quality of the feedback provided by the appraiser. 50 Thus if the appraiser is highly skilled (context) and able to effectively deliver feedback (context), this triggers the dissonance mechanism, which is more likely to lead to the appraisee developing insight (outcome).
If the appraisee has the 'right' characteristics and approaches appraisal proactively (context), the appraisal is more likely to be effective. [52][53][54] Incongruent feedback is more likely to trigger dissonance and result in reflection or increased insight (outcome) and ultimately behaviour change (outcome). 44 The same argument could be made for the context of the time available to prepare for the appraisal.
Our findings regarding the behaviour of the dissonance mechanism are consistent with Conlon's adaptation of Kolb's learning cycle, which depicts appraisal as a formalised way of guiding a professional through the learning cycle. 38 Reflection is the connection between experience and the generation of ideas, which then leads to behaviour change. Appraisal works on the reflection part of the learning cycle by forcing appraisees to reflect and then providing feedback on the reflections, which leads to appraisees developing insight and changing their behaviours.

Denial
This CMO configuration includes the mechanism of denial, 40,43 which we inferred from the included data and which leads to the less desirable outcome of game-playing, or treating the appraisal process like a tick-box exercise (Fig. 1). 32,34,55,56 The denial mechanism interacts with the dissonance mechanism: one way of reducing cognitive dissonance is to deny the feedback or the validity of the data being presented. 43 When appraisees treat the appraisal like a tick-box exercise, 32,34,55,56 they are essentially going through the process of appraisal without actually benefiting from the feedback intervention strategy. Besides changing behaviour, a doctor can reduce his or her commitment to the goal (context). Such a reduction would reduce dissonance by making the discrepant behaviour less relevant to the clinician and easier to deny.
We infer that there are several contexts in which denial (mechanism) is likely to lead to game-playing or tick-boxing behaviour (outcome). These contexts are the opposite of those that trigger the dissonance mechanism and include the appraiser being unskilled, 13 the appraisee not having the right approach to appraisal and the appraisee not having enough time to prepare for appraisal. 14,35,[57][58][59] Our findings on the behaviour of the denial mechanism are consistent with Payne and Hysong's physician feedback model (i.e. non-acceptance [or denial] of feedback tends to result in no behaviour modification). 60

Self-affirmation
This CMO configuration includes the mechanism of self-affirmation, which we infer leads to the appraisee maintaining or improving performance (outcome) (Fig. 1). There was some evidence to support this mechanism in the literature, where appraisees reported valuing confirmation that their performance was satisfactory, recognition of their achievements and identification of their strengths. 12,36,38,50,58 In psychology, self-affirmation is understood as an act that demonstrates one's adequacy, based on the 'premise that people are motivated to maintain the perceived worth and integrity of the self'. 61 On any given day there are numerous events that are understood as relevant to the self in one way or another, and these help people to repeatedly recharge their sense of adequacy. A self-affirmation is an event that validates a person's adequacy (e.g. positive feedback on a skill). 61 Well-timed affirmations can improve performance, education, health and relationship outcomes. 62 In medical appraisal, self-affirmation acts as a mechanism. If an appraisee's performance is strong in a particular area of clinical practice, a skilled appraiser (context) who identifies these strengths (context) can provide positive feedback confirming that the appraisee is performing well (context). This leads to self-affirmation (mechanism) of the appraisee's clinical practice and to the appraisee reflecting on his or her performance (outcome), thus maintaining or improving performance (outcome). This mechanism is particularly important when doctors are working in relative isolation.
Another context likely to influence whether the self-affirmation mechanism is triggered relates to the relationship between the appraiser and the appraisee, and in particular whether the appraisee respects and trusts the appraiser. If an appraiser is trusted and respected 54 by appraisees (context), they are more likely to trust and appreciate the positive feedback they are given. If appraisees do not trust or respect their appraiser (context), the feedback is likely to lead to denial (mechanism), which leads to game-playing (outcome).
Our findings on the behaviour of the self-affirmation mechanism are congruent with Cohen's cycle of adaptive potential, in which a positive feedback loop between the self and the social system can, over time, promote adaptive outcomes. 62 Path (a) in Figure 1 of that paper 62 is particularly relevant (i.e. as a result of being self-affirmed, the person achieves more adaptive outcomes, e.g. better performance). Similarly, part of Payne and Hysong's physician feedback model depicts a comparable outcome (i.e. when doctors receive favourable feedback, they feel a sense of pride and try to ensure that performance is maintained). 60

Statement of principal findings
This realist review sought to explore the mechanisms and contextual factors in the appraisal of doctors. We developed a programme theory to explain how appraisal of doctors is supposed to produce its effects. This review included 125 articles, from which data were extracted and synthesised to create CMO configurations centring on three mechanisms: (1) dissonance, (2) denial, and (3) self-affirmation. We propose that it is through these mechanisms that appraisees reach both desirable and undesirable outcomes from appraisal. Contexts in which the dissonance mechanism is more likely to be triggered include the appraiser being highly skilled, the appraisee's working environment being supportive of appraisal and the appraisee having the right attitude towards appraisal. The dissonance mechanism is most likely to result in reflection and insight, which are more likely to lead to the appraisee changing his or her behaviour. The denial mechanism is more likely to be enacted if the opposite of these contexts occurs (i.e. the appraiser is unskilled, the doctor's environment is not supportive of appraisal and the appraisee lacks the 'right' attitude towards appraisal). The denial mechanism is more likely to trigger the outcome of game-playing, or treating appraisal like a tick-box exercise. Finally, if an appraisee's performance is strong in a particular area of clinical practice, a skilled appraiser who identifies these strengths and provides positive feedback confirming that the appraisee is performing well provides a self-affirmation that leads to the outcomes of reflection and insight. A trusting relationship between the appraisee and appraiser was also an important context for the self-affirmation mechanism, and arguably for the dissonance mechanism too.
Although we have identified three different mechanisms in the appraisal process, it would be reasonable to hypothesise that all three mechanisms (dissonance, denial and self-affirmation) could be present in one appraisal encounter in any combination. In an appraisal meeting a number of aspects of a doctor's performance are discussed, with the potential for each triggering a different mechanism and thus a different outcome. The structure of the programme theory allows for this complexity.
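The idea that one appraisal meeting can trigger different mechanisms for different aspects of practice can be illustrated with a small data model. The Python sketch below is a deliberately simplified, hypothetical encoding: the CMO class, the paraphrased context labels, the configurations_for helper and the example aspects ('prescribing', 'teaching') are all assumptions made for illustration. In particular, its all-contexts-present matching rule is not part of the programme theory, which treats contexts as making a mechanism more or less likely rather than triggering it deterministically.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CMO:
    """One context-mechanism-outcome configuration (hypothetical field names)."""
    contexts: tuple[str, ...]
    mechanism: str
    outcomes: tuple[str, ...]


# Configurations paraphrased from the principal findings; labels are illustrative.
CONFIGURATIONS = (
    CMO(("skilled appraiser", "supportive working environment", "appraisee has the right attitude"),
        "dissonance", ("reflection", "insight")),
    CMO(("unskilled appraiser", "unsupportive working environment", "appraisee lacks the right attitude"),
        "denial", ("game-playing",)),
    CMO(("strong performance in an area", "skilled appraiser", "appraisee trusts the appraiser"),
        "self-affirmation", ("reflection", "insight")),
)


def configurations_for(aspect_contexts: set[str]) -> list[CMO]:
    """Return configurations whose contexts are all present for one discussed aspect.

    Toy matching rule assumed for illustration only.
    """
    return [cmo for cmo in CONFIGURATIONS if set(cmo.contexts) <= aspect_contexts]


# Hypothetical meeting: each aspect of practice discussed has its own set of contexts.
meeting = {
    "prescribing": {"skilled appraiser", "supportive working environment",
                    "appraisee has the right attitude"},
    "teaching": {"strong performance in an area", "skilled appraiser",
                 "appraisee trusts the appraiser"},
}
for aspect, contexts in meeting.items():
    print(aspect, [cmo.mechanism for cmo in configurations_for(contexts)])
```

Keeping each configuration as a separate record mirrors the point that the programme theory allows several mechanisms, and therefore several outcomes, within a single appraisal encounter.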

Strengths and limitations of the review
The main strength of this review was the use of realist methodology. The knowledge created by this method constitutes an empirical contribution to the existing body of literature, as previous research has not focused on identifying contexts or mechanisms that result in a successful appraisal. The included literature on the appraisal of doctors was exhaustive and was supplemented by literature from a variety of other areas to support our programme theory. Although many of the concepts, such as cognitive dissonance and self-affirmation, are not new, this realist review brings disparate bodies of literature together and links them to appraisal. Furthermore, although this review focuses on an intervention implemented in the UK, that intervention relies on a number of commonly used processes (e.g. feedback on performance), and thus our findings are likely to be 'externally valid' for similar systems in place around the world. The review also has local practical implications, in that an increased understanding of 'how and why appraisal works' will aid decision making around modifying and implementing appraisal processes at a local level.
To reduce errors in the screening, data extraction and quality assessment of sources, a proportion (10%) of each of these tasks was carried out independently by two reviewers. The RAMESES quality standards for realist syntheses were followed throughout the review process. Input on the evolving programme theory and emerging findings was provided throughout the review by key stakeholders in appraisal.
The main limitation of this review relates to the lack of good-quality data in the included studies with which to fully configure our CMOs. As a result, the explanatory power of this review was limited by the absence of rich data. Empirical data on the appraisal of doctors that contain elements of contexts and outcomes, and that can be used to make inferences about mechanisms, are required. The literature from other sectors in which appraisal has been implemented, including nursing and higher education, could have been explored; however, this was not the focus of the review.

Recommendations for further research
Findings from this realist review highlight the need for further research to support, refute or refine our proposed programme theory of how appraisal of doctors is supposed to produce its effects. Realist evaluation methods would be most suited to this endeavour. Although a realist review depends on secondary data (i.e. published literature), a realist evaluation would gather primary data (e.g. through interviews with appraisers and appraisees) to 'test' the programme theory of how appraisal produces its effects. This type of research would generate findings to explain what occurs during appraisal meetings to instigate behaviour change and how contextual factors impact upon appraisal outcomes.
Once we fully understand how appraisal of doctors produces its effects, the next step would be to understand how revalidation produces its effects. Again, realist methods would be best suited to answering this research question. Revalidation is a separate intervention from appraisal, although appraisal is part of the intervention strategy of revalidation. There were limited data in the literature we reviewed to support the idea that revalidation is an important context affecting appraisal, probably because there has been limited published research to date evaluating revalidation. Anecdotal evidence combined with ongoing empirical research should help to fill this gap in the near future. 21

CONCLUSION
This review is an important first step in understanding how appraisal of doctors produces its effects. It makes a significant contribution to the literature by identifying dissonance, denial and self-affirmation as three causal mechanisms at work in the appraisal process. The review also identified appraisers' skills, appraisees' characteristics, the environment in which the appraisee works and the relationship between the appraisee and appraiser as key contexts that are crucial to an effective appraisal.
Contributors: JA conceptualised the study. NB led the design and drafting of the review protocol, which was critically reviewed by JA, MB, MP, CC and GW. CC scoped and designed the search strategy. NB carried out the review. MB helped with screening, data extraction and quality assessment. Methodological advice was given by MP and GW. NB wrote the first draft of this paper. JA, MB, MP, CC and GW critically reviewed it and provided comments to improve the manuscript. All authors have read and approved the final manuscript.