The selection function of the RAVE survey

We characterize the selection function of RAVE using 2MASS as our underlying population, which we assume represents all stars which could have potentially been observed. We evaluate the completeness fraction as a function of position, magnitude, and color in two ways: first, on a field-by-field basis, and second, in equal-size areas on the sky. Then, we consider the effect of the RAVE stellar parameter pipeline on the final resulting catalogue, which in principle limits the parameter space over which our selection function is valid. Our final selection function is the product of the completeness fraction and the selection function of the pipeline. We then test if the application of the selection function introduces biases in the derived parameters. To do this, we compare a parent mock catalogue generated using Galaxia with a mock-RAVE catalogue where the selection function of RAVE has been applied. We conclude that for stars brighter than I = 12, between $4000 \rm K<T_{\rm eff}<8000 \rm K$ and $0.5<\rm{log}\,g<5.0$, RAVE is kinematically and chemically unbiased with respect to expectations from Galaxia.


INTRODUCTION
In any statistical analysis, it is fundamental to understand the relation between the objects for which data were obtained, and the underlying population from which the sample was drawn. This relation is called the selection function of the sample. Without this knowledge, it is difficult to accurately infer the general properties of a population.
Many large-scale astronomical surveys of Milky Way stars with data releases currently or soon available make some effort to characterize their selection function. The explicit quantification of the selection function of a stellar survey has been demonstrated by Schönrich & Binney (2009) for the Geneva-Copenhagen survey (GCS; Nordström et al. 2004), Bovy et al. (2012) for a subsample of the Sloan Extension for Galactic Understanding and Exploration survey (Yanny et al. 2009), Nidever et al. (2014) for the APO Galactic Evolution Experiment (Majewski et al. 2015) and Stonkutė et al. (2016) for the Gaia-ESO survey (Gilmore et al. 2012). A number of factors such as changes to the observing strategy, limitations due to instrumentation, or including different input catalogues can all affect the final resulting catalogue, so it is crucial to consider each of these aspects when characterizing the selection function.
In this paper, we present a study of the selection function of the RAdial Velocity Experiment (RAVE) survey based on its most recent data release (DR5; Kunder et al. 2017), to facilitate the wider and more robust use of this publicly available catalogue. This survey was among the first surveys in Galactic astronomy with the explicit purpose of producing a homogeneous and well-defined data set. To achieve this goal, the initial target selection was purely based on the apparent I-band magnitudes of the stars.
Based on the simplicity of the selection function, a number of recent studies using RAVE data, reviewed in Kordopatis (2015), assumed the RAVE survey to be a kinematically unbiased sample to investigate models of our Galaxy. In particular, Sharma et al. (2014) briefly addressed the selection function with respect to ensuring that their subsample was unbiased, by mimicking the target selection of RAVE directly using Monte Carlo realizations of their Galaxy models. However, here we aim to characterize the selection function of all stars available in DR5.
We present a short overview on the RAVE survey in Section 2, summarizing the history of the survey with respect to the target selection and observing strategy. Our reduced sample for evaluating the selection function is described in Section 3. In Section 4, we present our results for two different ways of evaluating the selection function: field-by-field and by HEALPIX pixel. Then in Section 4.3, we incorporate the effects of the spectral analysis pipeline on the final catalogue. In Section 5, we present the method for generating our mock-RAVE catalogue, and compare it to a sample of RAVE DR5 stars. We then test for biases due to the selection function of RAVE, by comparing our mock-RAVE catalogue with a parent GALAXIA sample. Finally, we discuss the implications of these findings and our conclusions in Section 6.

THE RAVE SURVEY
RAVE is a large-scale spectroscopic stellar survey of the Southern hemisphere conducted using the 6 degree Field (6dF) multiobject spectrograph on the 1.2-m UK Schmidt Telescope at the Siding Spring Observatory in Australia, and completed in 2013. A general description of the project can be found in the data release papers (DR1, Steinmetz et al. 2006; DR2, Zwitter et al. 2008; DR3, Siebert et al. 2011; DR4, Kordopatis et al. 2013a) as well as in the most recent data release paper (DR5; Kunder et al. 2017). We show the distribution of targets available in RAVE DR5 in Fig. 1.
The spectra were taken in the Ca II-triplet region (8410-8795 Å) with an effective spectral resolution of R ≈ 7500. The strong calcium absorption lines allow a robust determination of the line-of-sight velocities via the Doppler effect even with low signal-to-noise ratio (SNR) (10 pixel$^{-1}$). This region was explicitly chosen to coincide with the spectral range of Gaia's Radial Velocity Spectrometer (Prusti 2012; Bailer-Jones et al. 2013; Recio-Blanco et al. 2016). While Gaia will release radial velocity and stellar parameters in forthcoming data releases, at present Gaia offers only position and magnitude information for approximately a billion stars (Gaia Collaboration 2016). The Tycho-Gaia astrometric solution (TGAS; Michalik, Lindegren & Hobbs 2015) provides parallax and proper motion data for ∼2 million stars that were observed by Tycho-2 (Høg et al. 2000). As RAVE contains 215 590 unique TGAS stars, it offers a unique advantage of providing stellar parameters for stars with improved parallax and proper motion data from TGAS.

Input catalogue
When observations for the RAVE survey started in 2003, there was no comprehensive photometric infrared survey available to serve as an input catalogue. Instead, approximate I-band magnitudes were calculated from the Tycho-2 catalogue and the SuperCOSMOS Sky Survey (SSS; Hambly et al. 2001), and used to construct an initial input catalogue of ∼300 000 stars. In 2005 May, the DENIS catalogue (Epchtein et al. 1999) became available that provided Gunn I-band photometry; however, it did not provide sufficient sky coverage to serve as the sole basis for the input catalogue. RAVE DR1, DR2 and DR3 were sourced from the original input catalogue (Kordopatis et al. 2013a).
The fourth data release, DR4 (Kordopatis et al. 2013a), incorporated observations drawn from a new input catalogue, using DENIS DR3 (DENIS Consortium 2005) as the basis, which had been cross-matched with the 2MASS point source catalogue (Skrutskie et al. 2006). The new input catalogue also extended the RAVE footprint to include lower Galactic latitudes ($5^\circ < |b| < 25^\circ$), where a colour cut using 2MASS photometry ($J - K > 0.5$ mag) was applied to preferentially select giants (Kordopatis et al. 2013a). This input catalogue is also used for the most recent data release (DR5; Kunder et al. 2017).

Target selection and observing strategy
Here, we summarize the target selection and observing strategy described in the first data release (DR1; Steinmetz et al. 2006), as the selection function of a survey depends explicitly on how the observations are conducted.
From the input catalogue described in the previous section, 400 targets were selected for a given field of view. This selection was then split into two field files consisting of 200 stars each, to allow for two separate pointings. The 6dF instrument, used to conduct RAVE observations, consists of three fibre plates with 150 fibres each. These fibres were assigned to science targets according to a field configuration algorithm developed for the 2dF spectrograph (Lewis et al. 2002). However, for various reasons such as inaccessible areas on the fibre plate and fibre breakage, on average approximately 90 science fibres were allocated per pointing. Each observation consisted of a minimum of 3 (average 5) exposures, which were then stacked to improve the SNR per pointing. Fig. 2 shows the distribution of fibres placed on science targets present in DR5 for all fields in the master list of RAVE field centres (see Section 4.1).
During the first year and a half of observations, no blocking filter was used on the spectrograph, so spectra were contaminated with second-order diffraction (i.e. flux from the ∼4200-4400 Å wavelength range entered the primary wavelength range). Therefore, in DR5 the automated stellar parameter pipeline does not give stellar parameters for observations made before 2004 April 6.
A problem with fibre cross-talk due to bright (I ∼ 9) stars adjacent to fainter stars was also identified in the period before DR1, and corrected for in the first iteration of the data reduction pipeline (Steinmetz et al. 2006). Therefore, in 2006 March, the observing strategy was modified to observe stars only in a given magnitude bin for each pointing. These magnitude bins are illustrated in Fig. 3 as vertical dashed lines. In addition to reduced fibre cross-talk, this change in the observing strategy had the added benefit of optimizing exposure times (e.g. bright fields could be observed in nominal conditions, while faint fields were preferentially observed when conditions were excellent), increasing the SNR per spectrum, and therefore resulting in more accurate stellar parameters. For fields in which interlopers or stars with variable brightness affected the fibres despite the magnitude selection, assessment and data reduction was conducted on a case-by-case basis to minimize the probability that problematic stars would enter the final catalogue.

Survey footprint
A simple footprint was imposed for observations: pointings were restricted to the Southern hemisphere and $|b| > 25^\circ$. RAVE generally avoided regions on the sky with large extinction, i.e. close to the Galactic disc and towards the bulge. The primary reason for avoiding low Galactic latitudes was to prevent multiple stars entering a fibre, which had a spatial extent of 7 arcsec on the sky. Exceptions were a number of calibration fields around $|b| = 0^\circ$ and several targeted observations of open clusters in the Galactic plane. In addition, there are a few fields in regions at the northern side of the bulge that originate from an interim input catalogue. We exclude these fields when evaluating the completeness of RAVE, as the target selection in these fields differed from the general selection procedure.
In addition, we note the impact of utilizing DENIS DR3 as an input catalogue. The DENIS survey was observed in strips of $30^\circ$ in declination and 12 arcmin in right ascension, with an overlap of 2 arcmin between consecutive strips. This observing pattern is embedded in the formulation of the selection function as a function of position (equation 1), and therefore is considered when evaluating both the completeness and the selection function. Fig. 1 shows the adopted survey footprint for this study, which differs from the original footprint used for observations, as well as the distribution of individual stars in DR5.

RAVE data release 5
The latest public data release, DR5, contains information from 520 781 measurements of 457 588 individual stars. The distribution on the sky of these stars can be found in Fig. 1. In addition to obtaining precise line-of-sight velocities $V_{\rm los}$ (typical uncertainties ∼2 km s$^{-1}$), RAVE DR5 provides several other stellar parameters derived from the spectra: effective temperature ($T_{\rm eff}$), surface gravity (log g), an overall metallicity ([M/H]) and individual abundances for six elements: magnesium, aluminium, silicon, titanium, iron and nickel.
Line-of-sight distances for RAVE stars have been estimated using a number of methods, including red-clump giants (e.g. Siebert et al. 2008; Veltz et al. 2008; Williams et al. 2013), isochrone fitting (e.g. Breddels et al. 2010; Zwitter et al. 2010) and a robust Bayesian analysis method described in Burnett & Binney (2010). RAVE DR5 provides distances derived using the method described in Binney et al. (2014), where stellar parameters, along with known positions, are used to derive spectrophotometric distance estimates for a large fraction of the stars in the survey.
In addition, Matijevič et al. (2012) performed a morphological classification of the spectra to allow for the identification of spectroscopic binaries and other peculiar stars in the catalogue. All targets in DR5 were also cross-matched with a number of other data sets: Tycho-2 (Høg et al. 2000), UCAC4 (Zacharias et al. 2013), PPMXL (Roeser, Demleitner & Schilbach 2010), 2MASS (Skrutskie et al. 2006), WISE (Wright et al. 2010), APASS (Munari et al. 2014) and Gaia DR1 (Gaia Collaboration 2016) to provide additional information such as proper motions, as well as apparent magnitudes in other filter passbands.

CATALOGUE DESCRIPTION AND QUALITY FLAGS
The RAVE survey was designed to have as simple a selection function as possible to ensure that any biases could be accurately quantified. The initial target selection was based only on the apparent I-band magnitude ($9 < I < 12$) and sky position. An I-band selection was chosen as the most appropriate for efficient use of the spectral range of the 6dF instrument. In Fig. 3, we show the distribution of I-band magnitudes in RAVE DR5. This distribution extends past the initial apparent magnitude limits due to uncertainties in the SSS photometry used for the first input catalogue (see fig. 4 of Steinmetz et al. 2006). During 2006, the angular footprint was expanded to include regions close to the Galactic disc and bulge (Galactic latitude $5^\circ < |b| < 25^\circ$) as a result of the new input catalogue (see Section 2.1), and in these new regions a colour criterion ($J - K_s \geq 0.5$) was imposed to select for cool giant stars over more prevalent dwarfs (Kordopatis et al. 2013a). We can thus assume that the probability, S, of a star being observed by the RAVE survey is
$$S \propto S_{\rm select}(\alpha, \delta, I, J - K_s), \qquad (1)$$
with α and δ denoting the equatorial coordinates of stars in a given region on the sky, within the defined footprint (see Fig. 1). Due to its complex history and owing to observational constraints and actual atmospheric conditions on the respective day, the input catalogue for RAVE carries some inhomogeneity, and it is therefore not straightforward to construct a valid parent sample from this variety of data sets. However, one data set in particular, 2MASS, offers complete coverage of both the survey area and the magnitude range of RAVE. Therefore, we adopt the 2MASS photometry in order to compare our RAVE targets with as homogeneous a sample as possible.
2MASS provides accurate J, H and $K_s$ photometry for nearly all RAVE targets and, equally important, also for all other stars that could have potentially entered the input catalogue. Unfortunately, 2MASS does not provide I-band photometry, which is needed to evaluate the magnitude selection of RAVE. We therefore compute an approximate I magnitude, $I_{\rm 2MASS}$, from the 2MASS J and $K_s$ magnitudes (equation 2). Equation (2) is derived by a direct comparison of 2MASS J and $K_s$ magnitudes with DENIS I magnitudes. This transformation is determined by a polynomial fit in $I - J$ versus $J - K_s$, and is an evolution of equation (24) in Zwitter et al. (2008), with an improved fit for very cool stars. The distribution of $I_{\rm 2MASS}$ magnitudes for RAVE DR5 is shown in Fig. 3. Here, we find a significant number of RAVE stars that have $I_{\rm 2MASS} < 9$. This is because both DENIS and SuperCOSMOS saturate around $I_{\rm DENIS} \sim 9$, and the conversion of their cross-matched 2MASS magnitudes gives magnitudes brighter than $I_{\rm 2MASS} \sim 9$. In addition, there are a number of other factors that also have an influence on the final selection function, which we will describe in the following sections.

RAVE quality criteria
To assess the completeness $S_{\rm select}$ (equation 4), we remove fields that were reprocessed during the course of data reduction (indicated in DR5 with either 'a', 'b' or 'c' appended to the RAVE_OBS_ID). After removing these stars, we are left with a sample of 518 079 entries in DR5, corresponding to 455 626 individual spectra.

2MASS quality criteria
We compute an $I_{\rm 2MASS}$ value (equation 2) for each 2MASS star and remove spurious measurements from the data. Our requirements for a 'valid' measurement are given in Table 1.

Field by field
We first consider the selection function of RAVE on a field-by-field basis, in order to account for changes in the observing strategy as a function of time.
First, the observation date and position for each individual pointing are identified from a master list of RAVE field centres. From this list, we include only those fields that have stars parametrized and published in DR5. An excerpt of the resulting completeness fraction on a field-by-field basis can be found in Table 2. The completeness fraction for a field centred on (α, δ) is given by
$$S_{\rm select}({\rm field}_{\alpha,\delta}) = \frac{\sum\sum N_{\rm RAVE}({\rm field}_{\alpha,\delta}, I, J - K_s)}{\sum\sum N_{\rm 2MASS}({\rm field}_{\alpha,\delta}, I, J - K_s)}, \qquad (3)$$
where $N_{\rm RAVE}$ and $N_{\rm 2MASS}$ are the numbers of RAVE and 2MASS stars in that field, and the double sum is over a given $I_{\rm 2MASS}$ range and the total $J - K_s$ range in that field. It is important to note that there exists substantial overlap between RAVE pointings, and therefore, it is not appropriate to combine the data given in Table 2 to construct a selection function for the entire RAVE survey. To construct such a survey-wide selection function, we must consider the completeness of RAVE for equal, discrete areas on the sky. We do note, however, that on scales below the size of the field plate (28.3 deg$^2$), we expect inhomogeneities due to certain technical constraints with fibre positioning on the field plates used for RAVE observations (see fig. 3 of Steinmetz et al. 2006).
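The field-by-field completeness described above can be illustrated with a short sketch. It is not the procedure used to build Table 2: the function names and the input layout (tuples of right ascension, declination and $I_{\rm 2MASS}$) are hypothetical, the field is assumed to be circular so that its radius follows from the quoted 28.3 deg$^2$ area, and the sum over colour is taken implicitly by counting all stars regardless of $J - K_s$.

```python
import math
from collections import Counter

FIELD_AREA_DEG2 = 28.3  # field-plate area quoted in the text
# Assumption: a circular field, so the radius follows from the area (about 3 deg).
FIELD_RADIUS_DEG = math.sqrt(FIELD_AREA_DEG2 / math.pi)


def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (haversine formula)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    a = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(a))))


def field_completeness(rave, twomass, ra0, dec0, bin_width=0.1,
                       radius=FIELD_RADIUS_DEG):
    """Return {magnitude_bin_index: N_RAVE / N_2MASS} for one field.

    `rave` and `twomass` are iterables of (ra_deg, dec_deg, i_2mass) tuples.
    Bin index b covers magnitudes [b * bin_width, (b + 1) * bin_width).
    """
    def counts(stars):
        c = Counter()
        for ra, dec, mag in stars:
            if angular_separation_deg(ra, dec, ra0, dec0) <= radius:
                # Rounding first guards against cases like 9.1 / 0.1 = 90.999...
                c[math.floor(round(mag / bin_width, 9))] += 1
        return c

    n_rave, n_2mass = counts(rave), counts(twomass)
    # Bins without 2MASS stars are skipped, since the fraction is undefined there.
    return {b: n_rave[b] / n for b, n in n_2mass.items()}
```

Because pointings overlap, a star can fall in several fields, which is why fractions computed this way should not be combined across fields.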

Equal area on the sky (HEALPIX)
To construct our parent RAVE sample for considering equal areas on the sky, we first remove all repeat observations and keep for each star only the observation with the highest SNR. This is in contrast to Section 4.1, where we do not remove duplicates. Here, the goal is not to conserve the temporal information, but to accurately reconstruct the sky coverage and completeness of RAVE, so any given star is counted only once, even if it was observed multiple times. In addition, for the rest of the study, we will only consider stars within the adopted footprint (Fig. 1). This excludes ∼7000 stars available in RAVE DR5. These specific stars are documented in the RAVE DR5 catalogue with FootPrint_Flag. We then divide the sky into equal area pixels using the HEALPIX algorithm (Górski et al. 2005). As described in the previous section, using the RAVE fields directly would cause additional complications for certain applications because some fields are overlapping. We use 12 288 pixels for the whole sky (NSIDE = 32), which results in a pixel area of 3.36 deg$^2$, much smaller than the size of a RAVE field (28.3 deg$^2$). We note that we use the 'nested' scheme and equatorial coordinates (α, δ) to determine the corresponding pixel ID for any given star. We count the number of RAVE stars, $N_{\rm RAVE}$, in each pixel (centred on α and δ) as a function of $I_{\rm 2MASS}$ in 0.1 mag bins. To estimate the completeness, we follow the same procedure for all stars in our 2MASS sample to obtain $N_{\rm 2MASS}$ and then compute
$$S_{\rm select}({\rm pixel}_{\alpha,\delta}) = \frac{\sum\sum N_{\rm RAVE}({\rm pixel}_{\alpha,\delta}, I, J - K_s)}{\sum\sum N_{\rm 2MASS}({\rm pixel}_{\alpha,\delta}, I, J - K_s)}, \qquad (4)$$
where the double sum is over a given $I_{\rm 2MASS}$ range and the total $J - K_s$ range in that pixel. Table 3 gives an excerpt of the completeness fraction for HEALPIX pixels, in bins of 0.1 mag width. Full versions of Tables 2 and 3 are available as part of the online-only materials, and also via the RAVE website. The resulting completeness as a function of magnitude and sky position has already been shown in the fourth RAVE data release paper (fig. 3 of Kordopatis et al. 2013a), and we replicate it here for DR5 in Fig. 4. Overall, as in DR4, we find that the completeness is highly anisotropic on the sky for any given magnitude bin, and drops off significantly for fainter magnitudes.
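As a rough illustration of the pixel-based completeness, the sketch below counts stars per (pixel, magnitude bin) and takes the ratio of RAVE to 2MASS counts. It assumes that nested-scheme pixel IDs have already been computed for each star, for example with a HEALPix library call such as `healpy.ang2pix(NSIDE, ra, dec, nest=True, lonlat=True)`; the function name and input layout are hypothetical. The constants also reproduce the pixel count and pixel area quoted in the text.

```python
import math
from collections import Counter

NSIDE = 32
NPIX = 12 * NSIDE ** 2                        # 12 288 pixels over the full sky
FULL_SKY_DEG2 = 4 * math.pi * (180 / math.pi) ** 2
PIXEL_AREA_DEG2 = FULL_SKY_DEG2 / NPIX        # about 3.36 deg^2


def pixel_completeness(rave, twomass, bin_width=0.1):
    """Return {(pixel_id, magnitude_bin_index): N_RAVE / N_2MASS}.

    `rave` and `twomass` are iterables of (pixel_id, i_2mass) pairs, with
    repeat observations already removed so each star is counted once.
    Colour is summed over implicitly by counting all stars in a bin.
    """
    def counts(stars):
        # Rounding first guards against cases like 9.1 / 0.1 = 90.999...
        return Counter((pix, math.floor(round(mag / bin_width, 9)))
                       for pix, mag in stars)

    n_rave, n_2mass = counts(rave), counts(twomass)
    # Only cells containing 2MASS stars get a value; a Counter returns 0
    # for cells with no RAVE stars, giving a completeness of 0.0.
    return {key: n_rave[key] / n for key, n in n_2mass.items()}
```

Since the pixels do not overlap, values from different pixels can be combined, unlike the field-by-field fractions.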

Impact of the analysis pipeline
Until now we have only investigated effects that originate from the RAVE target selection. However, when considering certain applications, there is another important issue: namely, the effects of the automated pipelines. RAVE DR5 contains output from a number of pipelines that provide additional information for observed stars. As described in Section 2, in addition to line-of-sight velocities, RAVE provides estimates of stellar parameters such as effective temperature, surface gravity, elemental abundances, as well as distance and age estimates.
The stellar parameter pipeline yields reliable results only in a restricted region in stellar parameter space (Kordopatis et al. 2013a). We explicitly implement this by using only stars with
$$4000\,{\rm K} < T_{\rm eff} < 8000\,{\rm K}, \qquad 0.5 < \log g < 5. \qquad (5)$$
These limits are based on the range of parameters for the spectra used for the learning grid of the analysis pipeline (Kordopatis et al. 2011, 2013a), and they also exclude unphysical or highly unlikely combinations of derived parameters.
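A minimal sketch of how these limits act as a selection term is given below. It treats the pipeline selection as a weight of one inside the quoted $T_{\rm eff}$ and log g limits and zero outside, and combines it with the completeness fraction as a product, as stated in the abstract. The function names are hypothetical, and any further quality criteria applied in the full analysis are not represented here.

```python
def s_pipeline(teff, logg):
    """Weight 1.0 inside the adopted parameter limits, 0.0 outside.

    Limits as quoted in the text: 4000 K < Teff < 8000 K, 0.5 < log g < 5.
    """
    inside = 4000.0 < teff < 8000.0 and 0.5 < logg < 5.0
    return 1.0 if inside else 0.0


def total_selection(s_select, teff, logg):
    """Complete selection function as the product of both terms."""
    return s_select * s_pipeline(teff, logg)
```

For example, a star in a cell with completeness 0.4 keeps that weight if its parameters lie inside the limits, and receives a weight of zero otherwise.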
These restrictions have to be taken into account when comparing observed data with specific Galaxy models. They can be expressed as an additional selection function, $S_{\rm pipeline}$ (equation 6), and hence the complete selection function S is
$$S = S_{\rm select} \times S_{\rm pipeline}. \qquad (7)$$
We give examples of this effect in Figs 5 and 6, for the selection function evaluated with HEALPIX pixels, and field by field, respectively. Fig. 5 shows the distribution of the number of stars satisfying these criteria that have derived parameters (stellar parameters, distance and chemical abundances) available in RAVE DR5 as a function of $I_{\rm 2MASS}$ magnitude (left-hand and middle panels represent $S_{\rm pipeline}$, see equation 6), as well as the completeness fraction of these parameters in RAVE with respect to 2MASS (the right-hand panel represents the complete selection function, see equation 7). We find that the number of stars having a given parameter in DR5 varies as a function of magnitude, with the brightest magnitude bin ($9 < I_{\rm 2MASS} < 10$) having the highest number of stars with stellar parameters, distances and chemical abundances. When we consider the relative fraction of stars with a given parameter (using radial velocity as a baseline, as all stars satisfying the quality criteria have radial velocity measurements), we find that stellar parameters are derived for all stars with radial velocities, while distances are derived for ∼80 per cent of these stars. The relative fraction of stars with chemical abundance estimates is calculated for stars that have all six element abundances derived from the chemical abundance pipeline (Boeche et al. 2011). We find that ∼40-60 per cent of stars brighter than 10th magnitude have chemical abundance information available in DR5. Finally, when we consider the completeness of a given derived parameter in RAVE with respect to 2MASS, we find that stars in the brightest magnitude bin ($9 < I_{\rm 2MASS} < 10$) have the highest completeness. This panel represents the complete selection function (see equation 7).

Figure 5. Left: histogram of stellar parameters, chemical abundances and distance measurements in RAVE DR5 that satisfy the quality criteria and parameter limits given in Section 4.3 as a function of magnitude ($S_{\rm pipeline}$, equation 6). Stars with stellar parameters are indicated in orange, distances in red and chemical abundances in green. Observed magnitude bins are indicated with dashed lines. Middle: relative fraction of stars with derived parameters as a function of magnitude. We use radial velocity as a baseline for comparison, as all stars satisfying the criteria given in Section 4.3 have radial velocity measurements. As all stars with radial velocities in this sample also have stellar parameters, the completeness of stellar parameters is 100 per cent. Right: completeness fraction of derived parameters, relative to the number of 2MASS stars, as a function of magnitude. This represents the complete selection function with respect to 2MASS (see equation 7).

Figure 6. Distribution of RAVE stars (open circles) and 2MASS (grey) stars on the sky, for a given RAVE pointing. Orange shaded circles indicate that a given RAVE star has spectral parameters from the spectral parameter pipeline ($T_{\rm eff}$, log g, [M/H]), red squares indicate stars that have distance estimates from the distance pipeline and green squares indicate stars that have all abundance measurements from the chemical abundance pipeline.
In Fig. 6, we characterize the completeness fraction of derived parameters for a typical RAVE pointing. RAVE stars are shown in black, purple and orange, with the underlying 2MASS parent sample shown in grey. For this particular pointing, we find that all stars have estimated stellar parameters, ∼90 per cent have distances and ∼10 per cent have chemical abundance estimates.

COMPARISON WITH A GALACTIC MODEL
We now explore the potential influence of the selection function with respect to inducing biases in the stellar parameter distributions of our RAVE DR5 stars compared with what we expect from models of the Galaxy. For this comparison, we utilize the stellar population synthesis code GALAXIA (Sharma et al. 2011).
GALAXIA is a tool that uses a given Galactic model to conduct synthetic observations, generating a catalogue that imitates any given survey of the Milky Way. Here, we use the default provided in GALAXIA, a modified version of the Besançon model (Robin et al. 2003). Details on the extent of these modifications can be found in Sharma et al. (2011). The Besançon model within the GALAXIA framework has been found to agree quite well with Besançon star counts (Sharma et al. 2011). The input parameters for GALAXIA are very simple, and correspond well to our adopted form of RAVE's selection function (equation 1).
The catalogue may be generated for a given circular area on the sky, as well as for the whole sky. In order to compare these mock observations with our two methods of characterizing the selection function of RAVE, we generate two catalogues: one on a field-by-field basis and one full-sky, which is then divided into HEALPIX pixels. For each of these catalogues, we allow GALAXIA to generate stars with apparent I-band magnitude $0 < I < 13$, and no colour restriction. We then perturb the output from GALAXIA with a simple noise model to imitate observational uncertainties present in RAVE, and apply the RAVE selection function. We refer to this modified catalogue as our 'mock-RAVE' catalogue. The mock-RAVE catalogue can then be compared to our parent GALAXIA sample (where the RAVE selection function has not been applied) to evaluate the effect that the selection function has on fundamental distributions such as kinematics and chemistry.

Applying uncertainties to generate a mock-RAVE catalogue
GALAXIA provides stellar parameters and magnitudes with infinite precision and accuracy. This does not reflect our observational data, where each of the derived parameters has intrinsic uncertainties associated with its measurements. In order to facilitate an accurate comparison between the mock catalogue and real RAVE data, we perturb J, $K_s$, $T_{\rm eff}$, log g and [Fe/H] available in our GALAXIA catalogue based on the uncertainty distributions of 2MASS magnitudes and RAVE stellar parameters before applying the selection function of RAVE. We then apply the selection function of RAVE using both methods described in Section 4: field by field and HEALPIX pixels. In addition to scattering the GALAXIA distributions with our simple noise model, we slightly modify the metallicity distribution of the thick disc and the halo of our GALAXIA output for better agreement with observations.

2MASS apparent magnitude uncertainties
First, we modify the output GALAXIA 2MASS J and $K_s$ magnitudes by a simple noise model, derived from the observational uncertainties in 2MASS. To do this, we characterize the observational uncertainty for a given 0.1 mag bin as a function of magnitude. We model the distribution of uncertainties in each bin as a Gaussian, and draw from this Gaussian to obtain an 'observational uncertainty' on our GALAXIA output. Typical 2MASS J magnitude uncertainties are of the order of 0.025 mag. From the modified J and $K_s$ values, we obtain an $I_{\rm 2MASS}$ for each GALAXIA star using equation (2).
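One way to read this noise model is sketched below. Reported magnitude uncertainties are grouped in 0.1 mag bins, each bin is summarized by a Gaussian (mean and standard deviation), and an uncertainty is drawn from that Gaussian for each model star. How the drawn uncertainty is then applied is not spelled out in the text: the sketch assumes it is used as the width of a zero-mean Gaussian scatter on the magnitude. All function names are hypothetical.

```python
import numpy as np


def _bin_index(mags, bin_width):
    # Rounding first guards against cases like 9.1 / 0.1 = 90.999...
    return np.floor(np.round(np.asarray(mags) / bin_width, 9)).astype(int)


def fit_uncertainty_model(mags, mag_errs, bin_width=0.1):
    """Return {bin_index: (mean, std)} of the reported errors per magnitude bin."""
    mag_errs = np.asarray(mag_errs, dtype=float)
    idx = _bin_index(mags, bin_width)
    return {int(b): (mag_errs[idx == b].mean(), mag_errs[idx == b].std())
            for b in np.unique(idx)}


def perturb_magnitudes(model_mags, model, rng, bin_width=0.1):
    """Scatter noise-free model magnitudes with bin-dependent Gaussian errors."""
    out = np.asarray(model_mags, dtype=float).copy()
    for i, b in enumerate(_bin_index(out, bin_width)):
        if int(b) not in model:      # no observed stars in this bin: leave as is
            continue
        mu, sd = model[int(b)]
        sigma = abs(rng.normal(mu, sd))    # drawn 'observational uncertainty'
        out[i] += rng.normal(0.0, sigma)   # assumption: zero-mean Gaussian scatter
    return out
```

In use, the model would be fitted once to the 2MASS catalogue errors for J and once for $K_s$, and the perturbed magnitudes would then be passed to the $I_{\rm 2MASS}$ transformation.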

Applying RAVE-like uncertainties to stellar parameters
In order to compare the stellar parameters available in this mock catalogue with those derived from the RAVE DR5 stellar parameter pipeline, we must first modify the output from GALAXIA with the uncertainty distributions of RAVE stellar parameters. The RAVE DR5 stellar parameter pipeline provides individual uncertainties for each star, and we can use the distribution of these uncertainties to modify our initial GALAXIA catalogue by RAVE-like uncertainties, similar to the process used in the previous section, but in a higher dimensional space due to correlations between the uncertainties.
In Fig. 7, we show the correlation of uncertainties as a function of position in different planes of stellar parameters. Here, we colour code the mean uncertainty as a function of the stellar parameters in $T_{\rm eff}$-log g and $T_{\rm eff}$-[M/H] space. The highest uncertainties are found primarily in hot, giant stars in the $T_{\rm eff}$-log g plane and metal-poor stars in the [M/H]-log g plane (see also table 4, Kunder et al. 2017). However, comparing these regions to the density contours, we find that these regions are sparsely populated, and therefore should not significantly affect the mean uncertainty. The abrupt jumps, visible at e.g. $T_{\rm eff} \sim 5000$ K and [M/H] ∼ −0.7, result from discrete coverage of the stellar parameter space by model atmospheres that are compared to the observed spectra by the pipeline. We find that the majority of RAVE stars have similar uncertainties in spectral parameters, with $\sigma(T_{\rm eff}) \sim 50$-75 K, $\sigma(\log g) \sim 0.1$-0.2 dex and $\sigma([{\rm M/H}]) \sim 0.1$ dex.
In addition to an anisotropic distribution of uncertainties in $T_{\rm eff}$-log g and $T_{\rm eff}$-[M/H] space, it has been well documented that these uncertainties in the derived atmospheric parameters are also correlated (see fig. 6 of Kordopatis et al. 2011 and fig. 23 of Kordopatis et al. 2013a). Due to these correlations, it is not sufficient to simply model the uncertainties as individual Gaussians and draw from them. Instead, we consider the distribution of uncertainties to have the form of a multivariate Gaussian, and estimate the covariance between uncertainties in $T_{\rm eff}$, log g and [M/H]. We then draw from this multivariate Gaussian to obtain uncertainties for these three parameters simultaneously. Note that in this way we can introduce only the internal uncertainties of the analysis pipeline, but not systematic shifts coming from inaccuracies of the stellar atmosphere models.
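The multivariate draw can be sketched as follows, under stated assumptions. The mean vector and covariance matrix are estimated from the catalogue uncertainties in ($T_{\rm eff}$, log g, [M/H]), and uncertainties for model stars are drawn jointly from the resulting three-dimensional Gaussian. The final step, in which each drawn uncertainty sets the width of a zero-mean Gaussian scatter on the corresponding parameter, is an assumption of this sketch, as are the function names.

```python
import numpy as np


def fit_uncertainty_gaussian(obs_errs):
    """Mean vector and covariance matrix of catalogue uncertainties.

    `obs_errs` has shape (N, 3): uncertainties in (Teff, log g, [M/H]).
    """
    obs_errs = np.asarray(obs_errs, dtype=float)
    return obs_errs.mean(axis=0), np.cov(obs_errs, rowvar=False)


def draw_uncertainties(mean, cov, n, rng):
    """Draw n correlated uncertainty triplets from the multivariate Gaussian."""
    # abs(): a Gaussian can occasionally return small negative widths.
    return np.abs(rng.multivariate_normal(mean, cov, size=n))


def perturb_parameters(params, sigmas, rng):
    """Scatter noise-free parameters, shape (n, 3), by the drawn uncertainties."""
    # Assumption: zero-mean Gaussian scatter with per-star, per-parameter width.
    return np.asarray(params, dtype=float) + rng.normal(0.0, sigmas)
```

As noted in the text, a scheme of this kind reproduces only internal uncertainties, not systematic offsets from the atmosphere models.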
Finally, we apply $S_{\rm pipeline}$ by setting weights to zero for all stars that do not fulfil the criteria given in equation (5). We refer to the result as the mock-RAVE catalogue. The effect of this step is model dependent as, for example, the number of supersolar metallicity stars varies between different Galaxy models. Using the version of the Besançon model in GALAXIA, we find that approximately 9 per cent of stars fall outside of our $T_{\rm eff}$ and log g limits.
The effect of applying these observational uncertainties as well as $S_{\rm pipeline}$ is shown in Fig. 8. The top row shows 2D histograms of stellar parameters for our GALAXIA sample (without the application of $S_{\rm pipeline}$). RAVE-like uncertainties and the selection function are applied to obtain the panels in the middle row. Our RAVE sample is shown on the bottom row. Overall, we find good agreement in the distribution of these stellar parameters between the observations and the mock-RAVE catalogue.

Impact of the selection function
We now turn to the implications of the observed stellar populations due to the selection function of RAVE. While RAVE targets within the footprint were selected on purely photometric grounds, it remains to be seen if changes to the observing strategy as well as the applied colour cut at low latitudes have induced biases in the observed characteristics of the sample. In order to test if RAVE is a kinematically unbiased survey, we compare the Galactocentric cylindrical velocity distributions of the parent GALAXIA sample with those of the mock-RAVE catalogue. We also examine potential biases in the metallicity distribution of the sample, as abundance measurements are highly correlated with other derived values, such as effective temperature and surface gravity, as well as external characteristics such as kinematics. Hence, biases in either velocity or metallicity are potentially harmful if undetected, for both chemical evolution and dynamical modelling.
We take a uniformly selected subsample of our full GALAXIA catalogue in the footprint of RAVE as our expected 'parent' sample (i.e. what we consider to be the 'truth' for the purpose of this exercise), and compare it to our mock-RAVE catalogue. Any considerable deviations between the two distributions may indicate a bias in RAVE due to the selection function. We note that for this exercise, we do not apply RAVE-like uncertainties to the velocities or metallicities in our mock-RAVE catalogue (i.e. here we use the true GALAXIA output). In addition to a GALAXIA subsample limited to I < 13, we also investigate the effects of limiting our GALAXIA subsample to I < 12, as it has been shown in Fig. 3 that RAVE is not complete at I 2MASS = 13. Quantitatively, in order to characterize the skewness of each distribution, we compute quartile values (Q 1 , Q 2 , Q 3 ), which represent the 25th, 50th and 75th percentiles, respectively.
We investigate these potential biases in three subsamples: giants (log g < 3.5), the main-sequence region (log g > 4.0, T eff < 5500 K) and the turnoff region (log g > 3.5, 5500 K < T eff < 7000 K). The boundaries of these subsamples have been determined from the T eff -log g plane of our parent GALAXIA sample (see the top row of Fig. 8). For these comparisons, we also consider the distance |z| from the Galactic plane by dividing our subsamples into three bins of height above the plane. The size of these bins varies between our subsamples, as these populations probe different distance distributions.
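The subsample definitions above translate directly into boolean cuts. The sketch below is illustrative only: the function names and labels are hypothetical, and the |z| bin edges are placeholders except for the 0.30 < |z| < 0.60 kpc bin quoted for the turnoff region.

```python
import numpy as np

def classify_subsample(teff, logg):
    """Label each star as 'giant', 'main_sequence', 'turnoff' or 'other'."""
    teff = np.asarray(teff, dtype=float)
    logg = np.asarray(logg, dtype=float)
    labels = np.full(teff.shape, "other", dtype=object)
    labels[logg < 3.5] = "giant"
    labels[(logg > 4.0) & (teff < 5500.0)] = "main_sequence"
    labels[(logg > 3.5) & (teff > 5500.0) & (teff < 7000.0)] = "turnoff"
    return labels

def height_bin(z_kpc, edges):
    """Index of the |z| bin for each star; len(edges) - 1 means beyond the last bin."""
    return np.digitize(np.abs(z_kpc), edges) - 1

# Placeholder bin edges in kpc for the turnoff sample; only the last bin
# (0.30-0.60 kpc) is taken from the text, the inner edge is hypothetical.
TURNOFF_Z_EDGES = np.array([0.0, 0.15, 0.30, 0.60])

labels = classify_subsample(np.array([4800.0, 5000.0, 6000.0, 7500.0]),
                            np.array([2.5, 4.5, 4.2, 4.2]))
bins = height_bin(np.array([0.05, -0.2, 0.45, 0.8]), TURNOFF_Z_EDGES)
```

The three cuts are mutually exclusive, so the order in which the labels are assigned does not matter. The fourth example star (T eff = 7500 K, log g = 4.2) falls in none of the three subsamples and keeps the label 'other'.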

Velocity distribution comparison
We first examine the effect of our selection function on distributions of the cylindrical Galactocentric velocity components in our mock-RAVE catalogue. Our results are shown in Figs 9-11, with the GALAXIA distribution shown as dashed black curves, and the mock-RAVE catalogue shown in green. A GALAXIA distribution limited to I < 12 is shown as solid black curves. Quartile values are given in each panel.
For our giant and main-sequence region samples (Figs 9 and 10), we find nearly identical distributions for all distance bins when comparing our mock-RAVE catalogue with the respective parent GALAXIA distributions. We consider the distributions to agree if all three quartiles agree within 5 km s⁻¹. Using this criterion, we confirm that the selection function does not impose kinematic biases for these populations as a whole. We note that when we consider only low-latitude fields (5° < |b| < 25°), the colour criterion that was imposed to select preferentially for giants (see Section 3.1.1) results in a small bias in age. Further comparisons with the model have shown that this age bias does not introduce a significant kinematic bias; however, we urge some caution when considering the velocity distributions for these low-latitude fields.
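The quartile criterion can be written compactly as below. This is a sketch under the assumption that both samples are unweighted lists of values (a weighted catalogue would need weighted percentiles); the function names are hypothetical, and the tolerance is left as a parameter so that the same test can be reused for other quantities.

```python
import numpy as np

def quartiles(values):
    """Return (Q1, Q2, Q3): the 25th, 50th and 75th percentiles."""
    return np.percentile(values, [25, 50, 75])

def distributions_agree(parent, mock, tolerance):
    """True if all three quartiles differ by no more than `tolerance`."""
    diff = np.abs(quartiles(parent) - quartiles(mock))
    return bool(np.all(diff <= tolerance)), diff

# Deterministic toy velocities in km/s, for illustration only.
v_parent = np.linspace(0.0, 400.0, 401)
v_mock_close = v_parent + 3.0   # shifted by 3 km/s: within the criterion
v_mock_far = v_parent + 8.0     # shifted by 8 km/s: outside the criterion

agree_close, diff_close = distributions_agree(v_parent, v_mock_close, tolerance=5.0)
agree_far, diff_far = distributions_agree(v_parent, v_mock_far, tolerance=5.0)
```

Because the test uses only three percentiles, it is insensitive to small differences in the far tails of the distributions, which is why tail discrepancies can coexist with a passed criterion.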
We also find good agreement in most height bins for each velocity component of our turnoff region sample (Fig. 11). However, for the most distant bin (0.30 < |z| < 0.60 kpc), there is a slight difference between the distributions in the low-V φ tail. Specifically, the application of the selection function leads to an underrepresentation of stars with V φ ≲ 150 km s⁻¹ in our mock-RAVE catalogue. The bias is present in all velocity components, but it is most clearly seen in V φ , as the velocity distribution functions for the thin disc, thick disc and halo do not have the same mean for this component.
Figure 8. [...] The top row shows these 2D histograms for our GALAXIA sample with the RAVE selection function applied. In the middle row, we show our GALAXIA sample that has had both the RAVE selection function and RAVE-like uncertainties applied. The bottom row shows our RAVE DR5 sample. The colour scale is lognormalized.
The difference that we find can be explained by the difference in magnitude distributions between our two samples: our parent GALAXIA sample extends to I 2MASS ∼ 13 (see Section 5), whereas our mock-RAVE sample follows the I-magnitude distribution of RAVE (see Fig. 3), by the definition of the selection function. As a consequence, there are relatively few stars observed in RAVE with 12 < I < 13 compared to those present in our parent GALAXIA sample. By having a larger fraction of stars at fainter magnitudes, the parent GALAXIA sample probes more of the thick disc and halo compared to our mock-RAVE sample. This effect is also reflected in differences that we see between the metallicity distributions (see Section 5.2.2 and Fig. 12). This discrepancy is small (and indeed disappears if we limit our parent GALAXIA sample to I 2MASS < 12), and overall the distributions meet our criterion (all three quartiles agree within 5 km s⁻¹), so we consider the turnoff region stars to also be kinematically unbiased.
Similar tests were done for a sample of hot dwarf stars (log g > 3.5, T eff > 7000 K), but are not shown here. As with our turnoff region sample, we find our sample of hot dwarfs to also be unbiased for I < 12.

Metallicity distribution comparison
Next, we examine the metallicity distributions of the GALAXIA samples and our mock-RAVE catalogue. The metallicity distributions for each subsample in different slices in distance |z| from the Galactic plane are shown in Fig. 12. Here, we consider the distributions to agree if all three quartiles agree within 0.1 dex.
For giants (left column of Fig. 12) and stars in the main-sequence region (middle column of Fig. 12), we find very good agreement between the GALAXIA and mock-RAVE metallicity distributions for all distance bins. For stars in the main-sequence region and the most distant bin (0.20 < |z| < 0.30 kpc), we find that in our mock-RAVE sample the metal-poor tail of the metallicity distribution is slightly underrepresented, compared to the GALAXIA sample. However, this difference can be explained by small number statistics, as our mock-RAVE sample would need only one star below [M/H] ∼ −0.6 to reconcile the difference between the two distributions. Again, despite this small discrepancy, the quartile values satisfy our criterion, and therefore, we consider our main-sequence region sample to be chemically unbiased. We conclude that for giants and stars in the main-sequence region, our metallicity distribution is minimally affected by our selection function.
Figure 9. Distributions of Galactocentric cylindrical velocity components for samples of giant stars (log g < 3.5) at different distances from the Galactic plane as indicated in the panels. The green histograms show the velocity distributions in the mock-RAVE catalogue, while the black dashed curves show the distributions for our parent GALAXIA subsample of giants. Solid black curves show the distribution for a parent GALAXIA sample limited to I < 12. Quartile values (Q 1 , Q 2 , Q 3 ) for both distributions are given in each panel, which represent the 25th, 50th and 75th percentiles, respectively. The sample sizes (N) of the distributions are shown in green and black, representing the mock-RAVE sample and the parent GALAXIA sample limited to I < 12, respectively. The y-axis is plotted on a logarithmic scale.
Figure 10. Same as Fig. 9 but for the main-sequence region (log g > 4.0, T eff < 5500 K) sample.
Figure 11. Same as Fig. 9 but for the turnoff region (log g > 3.5, 5500 K < T eff < 7000 K) sample.
Similarly, for the turnoff region sample (the right column of Fig. 12), we find good agreement for the two closest distance bins, with differences between the two distributions found only in the furthest distance bin (0.30 < |z| < 0.60 kpc). For this bin, we find that our criterion is barely met, with Q 1 differing by ∼0.1 dex. This discrepancy between the two distributions is explained by the difference in magnitude limits as described in Section 5.2.1. That is, as our parent GALAXIA sample includes a larger fraction of faint (12 < I < 13) stars compared to our mock-RAVE sample, it probes a larger volume, and therefore more of the thick disc and halo. This effect is less prominent for our giant sample, as the relative fractional increase of thick disc and halo stars is much less for giants, compared to our dwarf sample. We conclude that our turnoff region sample is unbiased for I 2MASS < 12. As with the velocity comparisons, we also test the [M/H] distributions for a sample of hot dwarf stars (log g > 3.5, T eff > 7000 K), and find them to also be chemically unbiased.

DISCUSSION AND CONCLUSIONS
We have described in detail how to evaluate the selection function S of the RAVE survey in two different ways: field by field and by HEALPIX pixel. In addition, we discussed the uncertainty distributions of RAVE DR5 and illustrated that these uncertainties depend heavily on the position in stellar parameter space. We then generated a mock-RAVE catalogue by applying the detailed selection function to the GALAXIA model output and perturbing the raw output with RAVE-like uncertainties.
To investigate whether RAVE is a kinematically and chemically unbiased survey, we tested the impact of S on the resulting velocity and metallicity distributions using a modified version of the Besançon model available in the GALAXIA framework. The velocity and metallicity distributions of our mock-RAVE catalogue were compared with the distributions of the underlying GALAXIA populations. We find that, for I < 12, our selection function does not intrinsically induce biases in the kinematics or chemistry of stars within the stellar parameter space covered in RAVE (4000 K < T eff < 8000 K and 0.5 < log g < 5.0), with respect to expectations from the Besançon model available in GALAXIA. We do find some small biases when we consider a parent sample extending to I = 13; however, it has been shown that the completeness of RAVE falls off at fainter magnitudes (due to the magnitude limit imposed by the input catalogues), and therefore our conclusion stands for the magnitude range where we consider RAVE to provide a representative sample of stars (9 < I < 12). Under these criteria, and within this parameter space, RAVE stars provide unbiased samples in terms of kinematics and metallicities that are well suited for kinematic modelling without taking into account the detailed selection function via volume corrections.
For our giant and main-sequence region samples, we find good agreement between the parent GALAXIA sample and our mock-RAVE catalogue. We find similar trends for our sample of turnoff region stars, with small differences in the velocity distributions for the most distant stars, and in the metal-poor tail of the [M/H] distribution. However, we attribute these differences to the fact that our GALAXIA sample includes a larger number of stars at fainter magnitudes than our mock-RAVE catalogue. The parent GALAXIA sample therefore probes a larger volume than our mock-RAVE catalogue, and consequently more of the thick disc and halo populations. As we are able to account for the source of these differences, we consider our turnoff region sample to also be kinematically and chemically unbiased for I 2MASS < 12.
Recently, a number of studies have used RAVE data, and in particular subsamples of giant stars, for kinematic modelling (e.g. Kordopatis et al. 2013b; Williams et al. 2013; Bienaymé et al. 2014; Binney et al. 2014; Minchev et al. 2014; Piffl et al. 2014). Here, we confirm that the giant stars in RAVE can indeed be used as an unbiased sample. Piffl et al. (2014) fitted a full dynamical model of the Milky Way to the kinematics of the RAVE giants. They then tested whether the resulting model would also correctly predict the kinematics of a sample of hot dwarf stars from RAVE and found a number of discrepancies. Their conclusion was that the thick disc distribution function in their model was too simplistic. However, Binney et al. (2014) also found that a similar dynamical model fitted to data from the GCS (Nordström et al. 2004) could reproduce the RAVE hot dwarf kinematics, but did not fit the RAVE giants. Since the GCS has a selection function that is different from that of the RAVE dwarfs, this implies that taking into account a more complicated volume correction for the hot dwarfs will not be enough to completely reconcile them with the model of Piffl et al. (2014). Hence a more complex distribution function for the thick disc, as argued for by the authors, still seems necessary.
We also illustrate that the quantified RAVE selection function can be used to generate mock-RAVE surveys from stellar population synthesis models, and in combination with code frameworks like GALAXIA, it can serve as a powerful tool to test Galaxy models against the RAVE data. The two versions of the RAVE selection function produced by this study (field by field and by HEALPIX pixel) will be made publicly available on the RAVE web site (https://www.rave-survey.org).
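As an illustration of how a completeness table of this kind might be applied to model output, the sketch below keeps each model star with a probability equal to the completeness fraction of its sky cell and magnitude bin. It rests on several assumptions: the completeness is stored as a two-dimensional array indexed by cell ID and 0.1 mag bin, the colour dependence is ignored, and the function name, argument names and array layout are hypothetical rather than the format of the published tables.

```python
import numpy as np

def select_mock_stars(cell_id, i_mag, completeness, mag_min, bin_width=0.1, seed=0):
    """Return a boolean mask of model stars retained in the mock survey.

    cell_id      : integer index of the field or pixel containing each star
    i_mag        : I magnitude of each star
    completeness : array of shape (n_cells, n_mag_bins) with values in [0, 1]
    mag_min      : bright edge of the first magnitude bin
    """
    rng = np.random.default_rng(seed)
    cell_id = np.asarray(cell_id)
    i_mag = np.asarray(i_mag, dtype=float)

    # Magnitude bin of each star; stars outside the tabulated range get zero probability.
    mag_bin = np.floor((i_mag - mag_min) / bin_width).astype(int)
    in_range = (mag_bin >= 0) & (mag_bin < completeness.shape[1])

    prob = np.zeros(i_mag.shape)
    prob[in_range] = completeness[cell_id[in_range], mag_bin[in_range]]

    # Keep each star with probability equal to its completeness fraction.
    return rng.random(i_mag.shape) < prob

# Toy table: cell 0 fully complete, cell 1 never observed, bins covering 9 < I < 12.
toy_completeness = np.vstack([np.ones(30), np.zeros(30)])
keep = select_mock_stars(cell_id=[0, 1, 0], i_mag=[10.05, 10.05, 12.5],
                         completeness=toy_completeness, mag_min=9.0)
```

In this toy case only the first star is retained: the second lies in an unobserved cell and the third is fainter than the tabulated magnitude range.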

SUPPORTING INFORMATION
Supplementary data are available at MNRAS online.
Table 2. Completeness fraction of RAVE on a field-by-field basis, for 0.1 mag width bins.
Table 3. Completeness fraction of RAVE on a pixel-by-pixel basis, for 0.1 mag width bins. Here, the nested scheme is used to determine a given pixel ID.