The Birth Satisfaction Scale – Revised (BSS-R): should the subscale scores or the total score be used?

ABSTRACT Objective and background: The 10-item Birth Satisfaction Scale – Revised (BSS-R) is increasingly being used internationally as the instrument of choice for the assessment of birth satisfaction. There remains uncertainty over the most appropriate way to score the instrument: subscale scores, an overall total score, or both. The current study sought to clarify this issue by examining the measurement characteristics of the United States version of the BSS-R in a large data set. Methods: Secondary analysis of a data matrix from a large-sample US BSS-R validation study (N = 2116) using structural equation modelling. Results: A bi-factor model revealed an excellent fit to data (χ2(df = 25) = 208.21, p < 0.001, CFI = 0.98, RMSEA = 0.06, SRMR = 0.04), demonstrating relative independence of the BSS-R quality of care subscale, whereas the women’s attributes and stress experienced during childbearing subscales were more plausibly explained by a general factor of experience of childbirth. Conclusion: Consistent with the recommendations of the original BSS-R validation study, the current investigation found robust empirical evidence to support the use of both the subscale scoring system and the total score. Researchers and clinicians can therefore select either approach (or both) with confidence.


Introduction
Birth satisfaction has been placed centrally within the main domains of interest and relevance for birth outcomes globally (The International Consortium for Health Outcomes Measurement [ICHOM], 2017). Given the established relationship between birth satisfaction and key clinical parameters, including delivery type (Fleming et al., 2016), accurate, valid and reliable assessment of birth satisfaction is essential. The Birth Satisfaction Scale – Revised (BSS-R; Hollins Martin & Martin, 2014) has become established as the 'gold standard' instrument for the assessment of birth satisfaction, a status conferred by international expert consensus (ICHOM, 2017), validation studies (Barbosa-Leiker, Fleming, Hollins Martin, & Martin, 2015; Fleming et al., 2016; Goncu Serhatlioglu, Karahan, Hollins Martin, & Martin, 2018; Hollins Martin & Martin, 2014; Jefford, Hollins Martin, & Martin, 2018; Martin et al., 2017) and clinical investigations highlighting the veracity and applicability of the measure (Hinic, 2016, 2017).
A topical issue since the original development of the BSS-R (Hollins Martin & Martin, 2014) has been which scoring approach to adopt. The BSS-R assesses three domains: (i) stress experienced during childbearing, (ii) women's attributes and (iii) quality of care. In their original paper, Hollins Martin and Martin (2014) recommended that subscale scores representing these domains can be used, and the developers offer a scoring algorithm to facilitate this. They also suggested that the total 10-item score is permissible as an overall index of birth satisfaction, or the experience of childbearing. Consequently, users of the BSS-R are left with essentially three choices for scoring: (i) subscales, (ii) total score or (iii) subscales and total score.
Although the original study established a three-factor correlated model as the measurement model of the tool, and thus the rubric for the three derived subscales, a second-order model with experience of childbearing as the second-order factor was also evaluated (Hollins Martin & Martin, 2014). Importantly, support for a second-order model would suggest that the variance in the three domains is explained by the second-order factor, and would therefore imply a (statistical) preference for the use of a total score. However, no evidence was found for the superiority of this higher-order model over the three-factor correlated model, which remains the foundation of the established multidimensional scoring system of the BSS-R (Hollins Martin & Martin, 2014).
Given that the BSS-R is being increasingly used world-wide and is globally endorsed for clinical outcome assessment (ICHOM, 2017), addressing the scoring preferences of the BSS-R empirically is both prudent and relevant.
The lack of a definitive preference for either model may be accounted for by inherent limitations of the higher-order model and its specification (Chen, West, & Sousa, 2006), which would explain the small difference in model fit observed when the two models were compared. An alternative is the bi-factor model, which offers a number of advantages over higher-order models and, to some degree, over correlated-factor models (Chen et al., 2006). The bi-factor model tests fit to data by examining how much of the variance in all of the items of the scale is explained by a general factor and, within the same model specification, how much of the remaining variance in the items is explained by domain-specific factors (Chen et al., 2006).
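To make this variance partition concrete, the following Python sketch decomposes each item's standardised variance in a bi-factor model with orthogonal factors into general, domain-specific and residual parts. It also computes the explained common variance (ECV), a summary index often reported alongside bi-factor models but not reported in this study. The item names and loadings are hypothetical and are not BSS-R estimates.

```python
from dataclasses import dataclass


@dataclass
class ItemLoadings:
    """Standardised loadings for one item in a bi-factor model (hypothetical values)."""
    name: str
    general: float   # loading on the general factor
    specific: float  # loading on the item's domain-specific factor


def decompose(item: ItemLoadings) -> dict[str, float]:
    # Orthogonal factors mean the components add: 1 = g^2 + s^2 + residual.
    g2 = item.general ** 2
    s2 = item.specific ** 2
    residual = 1.0 - g2 - s2
    if residual < 0:
        raise ValueError(f"{item.name}: loadings imply a negative residual variance")
    return {"general": g2, "specific": s2, "residual": residual}


def explained_common_variance(items: list[ItemLoadings]) -> float:
    # ECV: share of the common (non-residual) variance due to the general factor.
    g = sum(i.general ** 2 for i in items)
    s = sum(i.specific ** 2 for i in items)
    return g / (g + s)


if __name__ == "__main__":
    # Hypothetical loadings for illustration only; not BSS-R estimates.
    items = [ItemLoadings("item_a", 0.70, 0.10), ItemLoadings("item_b", 0.30, 0.60)]
    for item in items:
        print(item.name, decompose(item))
    print("ECV:", round(explained_common_variance(items), 3))
```

With these illustrative loadings, item_a is dominated by the general factor and item_b by its domain-specific factor, and the ECV across both items is about 0.61.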
The current investigation sought to generate an empirically driven recommendation regarding the scoring of the tool. The objectives of the current study were as follows.
(1) To compare the three-factor correlated model of the BSS-R with an alternative bi-factor model.
(2) To determine whether a general factor of 'experience of childbearing' explains the variance in the items of the BSS-R.
(3) To determine whether domain-specific factors of 'stress experienced during childbearing', 'women's attributes' and 'quality of care' explain the variance in the items of the BSS-R.
(4) Based on the findings from objectives 1-3, to recommend an empirically derived scoring approach to the BSS-R.

Method
The study was a secondary analysis of the data matrix from the US large-sample BSS-R study (Fleming et al., 2016; Martin et al., 2017).

Participants
Participant characteristics (N = 2116) are described in detail in Fleming et al. (2016) and Martin et al. (2017), with pertinent details of study design, demographics, clinical details and ethical approval summarised.

Measures
The BSS-R (Hollins Martin & Martin, 2014) is scored on a five-point Likert-type scale with responses ranging from 1, strongly agree; 2, agree; 3, neither agree nor disagree; 4, disagree; to 5, strongly disagree, with reverse scoring of four items. The three BSS-R subscales are stress experienced during childbearing (four items), e.g. 'I found giving birth a distressing experience', quality of care (four items), e.g. 'The delivery room was clean and hygienic', and women's attributes (two items), e.g. 'I felt out of control during my birth experience'.
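The three scoring options (subscales, total, or both) can be illustrated with a minimal Python sketch that reverse-scores the relevant items and returns subscale sums and a total. This is not the developers' published scoring algorithm: the function name is hypothetical, and the item-to-subscale key and reverse-keyed items in the example are placeholders (only the assignment of items 4 and 8 to women's attributes comes from this paper). The actual key should be taken from Hollins Martin and Martin (2014).

```python
def score_bss_r(responses: dict[int, int],
                subscales: dict[str, list[int]],
                reverse_items: set[int],
                low: int = 1,
                high: int = 5) -> dict[str, int]:
    """Return each subscale sum plus a total over all keyed items (hypothetical helper)."""
    recoded: dict[int, int] = {}
    for item, value in responses.items():
        if not low <= value <= high:
            raise ValueError(f"item {item}: response {value} is outside {low}-{high}")
        # Reverse-keyed items are flipped so that all items run in the same direction.
        recoded[item] = low + high - value if item in reverse_items else value

    scores = {name: sum(recoded[i] for i in items) for name, items in subscales.items()}
    keyed = {i for items in subscales.values() for i in items}
    scores["total"] = sum(recoded[i] for i in keyed)
    return scores


if __name__ == "__main__":
    # PLACEHOLDER key: only items 4 and 8 (women's attributes) come from the paper.
    key = {
        "stress": [1, 2, 7, 9],
        "quality_of_care": [3, 5, 6, 10],
        "womens_attributes": [4, 8],
    }
    reverse = {2, 4, 7, 8}  # hypothetical set of four reverse-keyed items
    answers = {item: 2 for item in range(1, 11)}
    print(score_bss_r(answers, key, reverse))
```

With every response coded 2 and this placeholder key, the sketch returns subscale sums of 12, 8 and 8 and a total of 28. Both scoring systems are derived from the same recoded items, so reporting subscales, the total or both is a downstream choice.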

Statistical analysis
The objectives of the study were addressed using a structural equation modelling (SEM) approach to model evaluation (Byrne, 2010), specifically confirmatory factor analysis (CFA). The tridimensional measurement model of the BSS-R specified by Hollins Martin and Martin (2014), comprising correlated factors of stress experienced during childbearing, women's attributes and quality of care, was compared with an alternative bi-factor model comprising a general factor and the three domain-specific factors previously outlined. Model fit indices (Bentler & Bonett, 1980) used to evaluate fit included the comparative fit index (CFI; Bentler, 1990), the root mean square error of approximation (RMSEA; Steiger & Lind, 1980) and the standardised root mean square residual (SRMR; Hu & Bentler, 1999). Within the bi-factor model, the relationships between the domain-specific factors and the general factor are specified as orthogonal, because an essential characteristic of the model is to identify the additional contribution of each domain-specific factor beyond that of the general factor.
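As a consistency check, the RMSEA values reported in this paper can be recomputed from χ2, df and N. The Python sketch below assumes the commonly used maximum likelihood formulas, with N - 1 in the RMSEA denominator; software packages differ in such details, so results should be treated as approximate. The CFI function is included for completeness, but it cannot be checked against this paper because the baseline (independence) model χ2 is not reported.

```python
import math


def rmsea(chi2: float, df: int, n: int) -> float:
    """Root mean square error of approximation from a model chi-square."""
    # Misfit beyond what df alone would produce, scaled per degree of freedom and per case.
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


def cfi(chi2_model: float, df_model: int, chi2_base: float, df_base: int) -> float:
    """Comparative fit index relative to an independence (baseline) model."""
    d_model = max(chi2_model - df_model, 0.0)
    d_base = max(chi2_base - df_base, d_model)
    return 1.0 if d_base == 0 else 1.0 - d_model / d_base


if __name__ == "__main__":
    # Bi-factor model as reported in the abstract: chi2(25) = 208.21, N = 2116.
    print(round(rmsea(208.21, 25, 2116), 2))
```

For the bi-factor model this gives an RMSEA of 0.06 to two decimal places, matching the abstract; the single-factor models reported in the results (RMSEA = 0.21, 0.06 and 0.05) are reproduced in the same way.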

Preparation of the data covariance matrix
A correlation matrix of the 10 BSS-R items from the US large-sample data set was obtained from the instrument developers (Hollins Martin & Martin, 2014), together with the standard deviation of each item. Knowing the standard deviations of the BSS-R items allows the correlation matrix to be converted into a covariance matrix. The sample size of the US large-sample data set (N = 2116) is then entered into the analysis alongside the covariance matrix, which provides an accurate approximation of the original data set for the purposes of SEM without the need to access the raw data.
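The conversion described above multiplies each correlation by the two corresponding standard deviations, that is, S = DRD with D = diag(sd). A minimal Python sketch follows; the function name is illustrative and the 2 x 2 values are made up rather than taken from the BSS-R matrix.

```python
def cor_to_cov(r: list[list[float]], sd: list[float]) -> list[list[float]]:
    """Convert a k x k correlation matrix and k standard deviations to a covariance matrix."""
    k = len(sd)
    if len(r) != k or any(len(row) != k for row in r):
        raise ValueError("correlation matrix must be k x k to match the k standard deviations")
    # cov_ij = r_ij * sd_i * sd_j; the diagonal becomes each item's variance.
    return [[r[i][j] * sd[i] * sd[j] for j in range(k)] for i in range(k)]


if __name__ == "__main__":
    R = [[1.0, 0.5], [0.5, 1.0]]
    print(cor_to_cov(R, [2.0, 3.0]))  # prints [[4.0, 3.0], [3.0, 9.0]]
```

The sample size is not part of the matrix itself; SEM software takes N as a separate input so that χ2 and the fit indices can be computed.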

and 6.
An a posteriori single-factor CFA was therefore conducted to examine the model fit of these nine items. This revealed a poor fit to data (χ2(df = 27) = 2546.51, p < 0.001, CFI = 0.72, RMSEA = 0.21, SRMR = 0.10). A further a posteriori single-factor CFA comprising the stress experienced during childbearing and women's attributes items was conducted, and an excellent fit to data was observed (χ2(df = 9) = 77.15, p < 0.001, CFI = 0.99, RMSEA = 0.06, SRMR = 0.02). The quality of care items, evaluated within a single-factor CFA, also revealed an excellent fit to data (χ2(df = 2) = 12.35, p = 0.002, CFI = 0.99, RMSEA = 0.05, SRMR = 0.01). A two-factor correlated model comprising the quality of care items and a combined factor of women's attributes and stress experienced during childbearing was evaluated, and an excellent fit to data was observed.
Figure 1. Bi-factor model illustrating item-factor loadings as a function of domain-specific factors and the general factor. All item-factor loadings are standardised. Double-headed arrows to the immediate right of items represent residual variances. Double-headed arrows on factors represent factor variances.
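The degrees of freedom reported above follow from counting the unique variances and covariances among the p items, p(p + 1)/2, and subtracting the number of free parameters. The Python sketch below assumes factor variances fixed to 1 (so every loading is free), no mean structure and no equality constraints; the function name is hypothetical, and this naive count does not check whether each model is locally identified.

```python
def cfa_df(n_items: int, n_loadings: int, n_factor_covs: int = 0) -> int:
    """Naive degrees of freedom for a CFA with factor variances fixed to 1."""
    moments = n_items * (n_items + 1) // 2          # unique variances and covariances
    free = n_loadings + n_items + n_factor_covs     # loadings + residual variances + factor covariances
    return moments - free


if __name__ == "__main__":
    print("9-item single factor:", cfa_df(9, 9))                      # reported df = 27
    print("6-item single factor:", cfa_df(6, 6))                      # reported df = 9
    print("4-item quality of care:", cfa_df(4, 4))                    # reported df = 2
    print("10-item bi-factor:", cfa_df(10, 10 + 10))                  # reported df = 25
    print("3-factor correlated:", cfa_df(10, 10, n_factor_covs=3))    # not reported in this section
```

Under these assumptions the counts reproduce the reported df of 27, 9 and 2 for the single-factor models and 25 for the bi-factor model (10 general plus 10 domain-specific loadings); the three-factor correlated model comes out at 32.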

Discussion
The findings from this re-analysis of the US large-sample BSS-R data set offer some useful insights into the scoring approaches that can be taken with the BSS-R. The bi-factor model was shown to offer a similar fit to data to that of the three-factor correlated model that circumscribes the measurement model of the BSS-R. The variance of the two items comprising the women's attributes factor was explained by the general factor, with the domain-specific factor of women's attributes contributing little explanatory variance to BSS-R items 4 and 8 once the influence of the general factor was taken into account. A similar pattern was observed for the stress experienced during childbearing subscale. This contrasts with the quality of care subscale, which demonstrated independence as a domain-specific factor once the impact of the general factor was taken into account. The findings therefore indicate that the BSS-R domain-specific subscales are valuable and independent measures of discrete aspects of birth satisfaction, particularly the quality of care subscale. The finding that the general factor accounted for a significant proportion of the variance in nine of the BSS-R items also supports the valid use of the BSS-R total score. The contribution of the general factor to the variance of most BSS-R items clearly indicates that the overall score measures a coherent theme of satisfaction with the experience of childbearing, as suggested by Hollins Martin and Martin (2014) in their original instrument development study and in their evaluation of a higher-order model reflecting this notion. The relative independence of the quality of care subscale also indicates that this four-item subscale could be used separately from the rest of the BSS-R as a robust, short and valid standalone self-report measure of care quality, should one be required.
A limitation of the study was the use of a covariance matrix to approximate the actual data; however, in the specific context of SEM, where models are estimated from the covariance matrix, this approximation closely reproduces the results obtained from the raw data and is fit for this purpose. A strength of the study is the additional insight into the BSS-R gained using bi-factor modelling, an approach that has rarely been used in reproductive psychology but that offers valuable potential for understanding the measurement characteristics of tools used in the area.

Notes
1. As would be anticipated, the model fit of the three-factor correlated model is essentially identical to that reported by Martin et al. (2017), with the trivial difference in χ2 (256.82 vs. 255.15) explained by rounding of the correlation matrix and the individual BSS-R item standard deviations to two decimal places.