Development of a novel risk prediction and risk stratification score for polycystic ovary syndrome

The aim of this study was to develop a simple phenotypic algorithm that can capture the underlying clinical and hormonal abnormalities to help in the diagnosis and risk stratification of polycystic ovary syndrome (PCOS).

three criteria and can be assessed by using a variety of assays to test for relevant biomarkers in serum and/or saliva, including serum levels of total testosterone (TT), free testosterone (free T), androstenedione and dehydroepiandrosterone sulphate (DHEAS), or by calculating available indices such as the free androgen index (FAI). This plethora of available androgen biomarkers and indices, combined with the limited guidance on cut-offs indicative of androgen excess in the current PCOS guidelines, 3,4 contributes to uncertainty in both diagnosis and risk stratification.
FAI is commonly used to define hyperandrogenaemia in the diagnosis of PCOS. However, recent data 6 show that FAI is not a reliable indicator of free T when the sex hormone-binding globulin (SHBG) concentration is low, and hence it can misclassify women who are being investigated for PCOS. Clinical hyperandrogenaemia, characterized by the presence of hirsutism, is recommended as a substitute for biochemical hyperandrogenaemia in the current guidelines, but it can often be unreliable owing to wide interobserver and ethnic variation. 7 While the focus has been placed upon biochemical and clinical hyperandrogenaemia for the diagnosis of PCOS, recent data from our group 8 and others 9 have shown that elevated levels of anti-Müllerian hormone (AMH), a surrogate measure of follicle count on ultrasound, can be an important supplement to the hormonal parameters used in the diagnosis of PCOS. PCOS is a diagnosis of exclusion, and the diagnosis is often challenging because the syndrome presents as a spectrum of clinical features and metabolic abnormalities rather than as a single unified entity. The aim of this study was to use relevant biochemical markers and quantifiable clinical features to derive a risk score that can capture the entire PCOS disease spectrum. This simple risk score has the potential to assist in the diagnosis, severity prediction and risk stratification of women with PCOS.

| Study population
This was a cross-sectional study involving 111 well-characterized women with PCOS and 67 women without PCOS who presented sequentially and prospectively at the Department of Academic Diabetes, Endocrinology and Metabolism. All patients gave written informed consent. This study was approved by the Newcastle & North Tyneside Ethics Committee (ISRCTN70196169) and was conducted in accordance with the Declaration of Helsinki and local regulations. The diagnosis of PCOS was based on at least two of the three diagnostic criteria of the Rotterdam consensus, namely clinical and biochemical evidence of hyperandrogenism (Ferriman-Gallwey score >8; free androgen index >4, total testosterone >1.5 nmol/L), oligomenorrhoea or amenorrhoea, and polycystic ovaries on transvaginal ultrasound. Nonclassical 21-hydroxylase deficiency, hyperprolactinaemia, Cushing's disease and androgen-secreting tumours were excluded by appropriate tests. The study and study measurements are described in detail in our previous publication. 8 In summary, we measured body mass index (BMI) (kg/m²), waist circumference (cm), hip circumference (cm), AMH (pmol/L), salivary testosterone (pmol/L), total testosterone (nmol/L), salivary androstenedione (pmol/L), serum androstenedione (nmol/L), SHBG (nmol/L), FAI (%), follicle-stimulating hormone (FSH) (IU/L), luteinizing hormone (LH) (IU/L), fasting glucose (mmol/L), 2-hour glucose (mmol/L) and insulin (μIU/mL) according to established protocols in women with PCOS and controls. We also ascertained oral contraceptive use and history of menstrual irregularity/amenorrhoea. All of the control women had regular periods, no clinical or biochemical hyperandrogenism, no polycystic ovaries on ultrasound and no significant background medical history, and none were on any medications, including oral contraceptive pills or over-the-counter medications.

| Study measurements
Blood samples were centrifuged within 5 minutes of collection and were stored frozen at −80 °C pending analysis. All study measure- we used the Beckman Coulter Access automated immunoassay from Beckman Coulter, as studies have shown good correlation between the Gen II and Elecsys assays and the new Access AMH assay. 10 17-hydroxyprogesterone (17-OHP) was measured in an early morning sample and, if on the higher side of the nomogram, congenital adrenal hyperplasia was excluded with an ACTH stimulation test. The free androgen index (FAI) was calculated as total testosterone × 100/SHBG.
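As a small illustration of the FAI formula stated above, the calculation can be sketched as follows. The function name and the example values are hypothetical and are not study data; both inputs are assumed to be in nmol/L, as in the measurement list.

```python
def free_androgen_index(total_testosterone_nmol_l: float, shbg_nmol_l: float) -> float:
    """Return FAI (%) = total testosterone x 100 / SHBG, with both inputs in nmol/L."""
    if shbg_nmol_l <= 0:
        raise ValueError("SHBG must be a positive concentration")
    return total_testosterone_nmol_l * 100.0 / shbg_nmol_l


# Illustrative values only (hypothetical, not taken from the study)
example_fai = free_androgen_index(2.0, 40.0)
print(f"FAI = {example_fai:.1f}%")
```

Because SHBG is the denominator, a low SHBG concentration inflates the index, which is consistent with the concern raised in the introduction about the reliability of FAI when SHBG is low.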

| Collection and handling of saliva samples
Saliva collection and the salivary androgen measurement methodology have been detailed previously. 7 In brief, participants were asked to spit or drool directly into a 4 mL sealable polystyrene tube and to provide at least 3 mL of saliva. Unstimulated saliva samples were used to avoid any assay interference. The "passive drool" technique was used for the collection of saliva rather than the "salivette" method. Salivary testosterone (salT) and salivary androstenedione (salA) were measured by LC-MS/MS analysis performed using a Waters Acquity UPLC system coupled to a Waters Xevo TQS mass spectrometer, giving a lower limit of quantification of 5 pmol/L for salT and 6.25 pmol/L for salA, with inter- and intra-assay coefficients of variation of <4% and <7.5%, respectively.

| Statistical analysis
Study variables that were not normally distributed were log transformed. After the log transformation, we imputed the missing values using the iterative imputation method missForest. 11 missForest is an implementation of the random forest algorithm. It is a nonparametric imputation method that builds a random forest model for each variable and then uses the model to predict the missing values of that variable from the observed values. To compare androgen levels between PCOS cases and controls, univariate comparative analyses were performed using nonparametric Mann-Whitney tests on the imputed data sets. Means (standard deviations) or medians (interquartile ranges) were used to summarize continuous variables as appropriate, while proportions and frequencies were used to summarize categorical variables.
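To make the univariate comparison concrete, the following is a minimal sketch of the Mann-Whitney U statistic using midranks for ties. It is an illustration only: the software used for this test is not stated in the text, the function name is hypothetical, and the sketch computes the statistic without a P-value.

```python
from typing import Sequence


def mann_whitney_u(x: Sequence[float], y: Sequence[float]) -> float:
    """Return U for sample x: the number of (x, y) pairs with x > y, counting ties as 0.5."""
    if not x or not y:
        raise ValueError("both samples must be non-empty")
    # Pool the two samples, remembering which group each value came from (0 = x, 1 = y)
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    rank_sum_x = 0.0
    i = 0
    while i < len(pooled):
        # Find the run of tied values starting at position i
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        # Tied values occupy ranks i+1..j and all receive the average of those ranks
        midrank = (i + 1 + j) / 2.0
        n_x_in_run = sum(1 for k in range(i, j) if pooled[k][1] == 0)
        rank_sum_x += midrank * n_x_in_run
        i = j
    n_x = len(x)
    return rank_sum_x - n_x * (n_x + 1) / 2.0


# Hypothetical values for illustration (not study data)
u_stat = mann_whitney_u([3.0, 4.0, 5.0], [1.0, 2.0])
print(u_stat)
```

Dividing U by the product of the two sample sizes gives the proportion of case-control pairs in which the first group has the higher value, which is often easier to interpret than U itself.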

| Risk prediction
In logistic regression models, if the sample size is small or if a predictor is strongly associated with one of the possible outcomes, the estimated coefficients may be biased. To overcome this issue, we used a logistic regression model with Firth's bias-adjusted estimates. The basic idea of Firth's logistic regression (Firth 1993) is to introduce a more effective score function by adding a term that counteracts the first-order term from the asymptotic expansion of the bias of the maximum likelihood estimate; this term goes to zero as the sample size increases. 12 Model selection with Firth's bias adjustments was done using the R package "logistf". 12 Firstly, we included all the relevant variables in a model, such as age, BMI, waist circumference, menstrual irregularity (yes/no), use of oral contraceptives (yes/no), serum testosterone, salivary testosterone, serum androstenedione, salivary androstenedione, oestradiol, SHBG, DHEAS, LH, FSH, prolactin, 17-OHP, FAI and AMH levels. We did not include menstrual disturbances in the model as it is extremely difficult to quantify the extent, duration and severity of menstrual disturbances, and simply entering a yes/no variable can lead to model overfitting.
Next, we used backward selection in logistf in R to identify the best model from a set of candidate predictor variables, selecting predictors based on a P-value cut-off of 0.05. Variable selection in logistf is performed by repeatedly calling the add-one or drop-one methods for logistf and is based on the penalized likelihood ratio test. To assess the stability of the model thus obtained, we compared this stepwise model, based on P-values, with a model obtained using forward selection.
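For reference, the penalization described above is commonly written as below. This is the standard textbook form of Firth's method for logistic regression rather than a formula taken from this paper, and the symbols (the coefficient vector \beta, fitted probabilities \pi_i, design matrix X with entries x_{ir}, weight matrix W and hat-matrix diagonals h_i) are introduced here for illustration.

```latex
% Penalized log-likelihood: the usual log-likelihood plus half the
% log-determinant of the Fisher information matrix
\[
  \ell^{*}(\beta) = \ell(\beta) + \tfrac{1}{2}\log\det I(\beta),
  \qquad I(\beta) = X^{\top} W X,
  \qquad W = \operatorname{diag}\{\pi_i(1-\pi_i)\}
\]

% Modified score equations; h_i is the i-th diagonal element of
% H = W^{1/2} X (X^{\top} W X)^{-1} X^{\top} W^{1/2}
\[
  U^{*}_{r}(\beta) = \sum_{i=1}^{n}
    \Bigl\{\, y_i - \pi_i + h_i\bigl(\tfrac{1}{2} - \pi_i\bigr) \Bigr\}\, x_{ir} = 0,
  \qquad r = 1,\dots,p
\]
```

The penalty grows much more slowly with sample size than the log-likelihood does, so its relative influence on the estimates vanishes in large samples while still giving finite estimates when a predictor separates the outcomes.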
As the apparent predictive performance (performance in the development cohort) usually overestimates the performance in other patients, owing to overfitting and peculiarities in the development cohort, 13 we internally validated the model through bootstrapping
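The general idea of bootstrap internal validation can be sketched as follows. This is a generic optimism-correction scheme applied to Somers' D, under stated assumptions: the number of resamples, the function names and the `fit` callback are hypothetical, and the sketch is not the authors' code.

```python
import random
from typing import Callable, List, Sequence

Row = Sequence[float]
Scorer = Callable[[Row], float]  # maps one predictor row to a risk score


def somers_d(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Somers' D for a binary outcome: (concordant - discordant) / case-control pairs."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    net = sum((c > k) - (c < k) for c in cases for k in controls)
    return net / (len(cases) * len(controls))


def optimism_corrected_d(
    rows: List[Row],
    labels: List[int],
    fit: Callable[[List[Row], List[int]], Scorer],
    n_boot: int = 200,  # assumed value; not stated in the text
    seed: int = 0,
) -> float:
    """Apparent Somers' D minus the average bootstrap optimism."""
    rng = random.Random(seed)
    model = fit(rows, labels)
    apparent = somers_d([model(r) for r in rows], labels)
    n = len(rows)
    optimism: List[float] = []
    while len(optimism) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        b_rows = [rows[i] for i in idx]
        b_labels = [labels[i] for i in idx]
        if len(set(b_labels)) < 2:
            continue  # resample contained only cases or only controls; draw again
        b_model = fit(b_rows, b_labels)
        d_boot = somers_d([b_model(r) for r in b_rows], b_labels)  # on the resample
        d_orig = somers_d([b_model(r) for r in rows], labels)      # on the original data
        optimism.append(d_boot - d_orig)
    return apparent - sum(optimism) / len(optimism)
```

Here `fit` stands in for the whole model-building procedure, including variable selection, so that each resample repeats every data-driven step. For a binary outcome, Somers' D equals 2 × AUC − 1.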

| RESULTS
The anthropometric and hormonal characteristics of women with PCOS and controls from the Hull UK PCOS biobank are shown in Table 1. Women with PCOS were younger (P = 0.01) had higher BMI  (Table S1). Bootstrap estimates of several discrimination indices to quantify the model are presented in Table S2. The optimism-corrected estimate of Somers' D was 0.81 (Table S2)  lower HDL-C levels (P = 0.02), as compared to those with a low-risk score (Table 3). We have constructed a mobile phone application for easy use of this risk score in clinical settings (Figure S1).

| DISCUSSION
The diagnosis of PCOS is often challenging given the wide range of hormonal markers and derived indices used to measure hyperandrogenism and the variation in clinical presentation. We developed and internally validated a simple four-variable model (ie, FAI, 17-OHP, AMH and waist circumference) for predicting the risk of having PCOS in clinical settings. This model showed good discrimination and good calibration. Each of the four variables in our model has been previously associated with PCOS. 6,9,15,16 In line with the differential diagnosis of conditions causing hyperandrogenism in females, in this study we measured 17-OHP levels to rule out a potential diagnosis of nonclassical congenital adrenal hyperplasia (NCCAH), which is another disorder of hyperandrogenism.

[Table 1 column headings: PCOS (n = 67); Control (n = 111); P-value *; values given as median (IQR)]

controls, 15,16 with the levels being highest in those with severe phenotype of PCOS. 15 Interestingly, a subgroup of PCOS patients with exaggerated 17-OHP response to GnRH agonist presented with severe hyperandrogenaemia, glucose-stimulated β-cell insulin secretion, and worse insulin resistance. 20 The excess 17-OHP in patients with PCOS is thought to be the result of excess stimulation of theca interna cells by luteinizing hormone (LH). 15 In this study, for the first time, we showed that 17-OHP is independently associated with PCOS after adjustment for FAI, AMH and waist circumference. However, the discriminatory capacity of 17-OHP to detect PCOS was small, and 17-OHP can be excluded from the model if it is not readily available.
We also show that AMH was independently associated with PCOS diagnosis after adjustment for FAI, waist circumference and 17-OHP. AMH is produced in the granulosa cells of the preantral and small antral follicles. It appears to inhibit the action of FSH on aromatase and therefore contributes to the development of a single follicle for ovulation. 21 AMH is elevated in PCOS because of the increased number of small antral follicles and the increased secretion of AMH per follicle. 22 We have recently shown that those with raised AMH have an up to 4-fold increased risk of having PCOS. 8 It has also been suggested that serum AMH reflects ovarian size in PCOS patients and can be used as a surrogate for transvaginal ultrasound in the diagnosis of PCOS. 9 The associations of FAI and waist circumference with PCOS are well documented in the literature. 6,23 Waist circumference, a measure of central adiposity, is a marker of severity of PCOS and has been suggested to be a better surrogate of glucose and lipid metabolism in PCOS than the disease status per se. 23 Menstrual dysfunction is a common symptom in PCOS and is a consequence of anovulation.
Ovulatory dysfunction can also be seen in women who have regular menstrual cycles, 24,25 and as a result menstrual history alone is insufficient for defining PCOS. The prevalence of nonspecific menstrual dysfunction is high in women, especially in the adolescent population, where it can be as high as 30% at 1 year post-menarche. 26 It is difficult to identify true anovulation-related menstrual dysfunction, and many women are already on oral contraceptive pills, which makes it difficult to ascertain the history of menstrual dysfunction. Hence, we decided not to include this variable in our model.
In this study, we showed that those with a high-risk score derived from a model, which included waist circumference, FAI, AMH

| CONCLUSIONS
In summary, we have developed a simple model consisting of FAI, 17-OHP, AMH and waist circumference for risk prediction and risk stratification in PCOS; all of these variables have previously been associated with PCOS. This model will have to be externally validated in populations of different ethnicities before widespread clinical application.

CONFLICT OF INTEREST
Nothing to declare.