Detecting host-parasitoid interactions in an invasive Lepidopteran using nested tagging DNA-metabarcoding

Determining the host-parasitoid interactions and parasitism rates for invasive species entering novel environments is an important first step in assessing potential routes for biocontrol and integrated pest management. Conventional insect rearing techniques followed by taxonomic identification are widely used to obtain such data, but this can be time consuming and prone to biases. Here we present a Next Generation Sequencing approach for use in ecological studies which allows for individual level metadata tracking of large numbers of invertebrate samples through the use of hierarchically organised molecular identification tags. We demonstrate its utility using a sample data set examining both species identity and levels of parasitism in late larval stages of the Oak Processionary Moth ( Thaumetopoea processionea - Linn. 1758), an invasive species recently established in the UK. Overall we find that there are two main species exploiting the late larval stages of Oak Processionary Moth in the UK with the main parasitoid ( Carcelia iliaca - Ratzeburg, 1840) parasitising 45.7% of caterpillars, while a rare secondary parasitoid ( Compsilura conccinata - Meigen, 1824) was also detected in 0.4% of caterpillars. Using this approach on all life stages of the Oak Processionary Moth may demonstrate additional parasitoid diversity. We discuss the wider potential of nested tagging DNA-metabarcoding for constructing large, highly-resolved species interaction networks. identical to the designed sequence. Forward sequencing was less successful and could not produce a strong Sanger trace (probably due to the primer length being suboptimal for Sanger sequencing). Alignments of the poor quality sequence and the reverse primer suggest there may be two deletions within the bridge sequence leading to poor library formation in PCR2. Further studies conducted subsequently were performed using newly synthesised PCR positive primers and sequencing was successful. We also recommend that primers for future studies should be synthesised at the highest possible quality standard to ensure accuracy of synthesis. We tested the ability of a NGS nested metabarcoding design to produce individual-level data for a large number of caterpillar samples (>1000) in a single sequencing run. We achieved a high level of PCR and sequencing success and found an average of 11,000x coverage for each PCR well before sequence filtering, allowing us to adopt a high stringency for sequence quality. The depth of coverage found in our experiment allowed us to distinguish multiple unique sequences in each well, representing the host, parasitoids, and (potentially) any other species interacting with the moths such as parasitic fungi or intracellular parasites. Thaumetopoea processionea caterpillars were parasitised by two parasitoid species already known from the literature. Carcelia iliaca was found to parasitise almost half of all caterpillars while the other, Compsilura concinnata , was only detected in four caterpillars.


Introduction
Invasive species are a growing global threat and pose a major risk to both natural and cultivated ecosystems with detrimental effects including direct competition for resources (Peck et al. 2014), predation on native species (Boland 2004) and even disruption of intentionally released biocontrol agents (Schooler et al. 2011). Economically, it is estimated that invasive species have a total global cost of at least US$ 70 billion annually (Bradshaw et al. 2016). In Europe, there are over 1590 nonnative invasive arthropod species (estimate as of Roques 2010) and the rate at which species are establishing is increasing with 'an average of 10.9 species per year for the period 1950-1974 to an estimated 19.6 species per year for 2000-2008' (Roques 2010;Roques et al. 2016). The UK has the third largest non-native species burden in Europe, with more than 502 arthropod species and 1376 higher plants (Roy et al. 2014). Determining how to deal with invasive species is critically important for both ecological and financial reasons. Although good biosecurity is likely to be far cheaper than control or eradication of established species (Vazquez-Prokopec et al. 2010), this is not always achieved. Thus, an understanding of the interactions between invasive species and native species within an invaded range is essential for quantifying the impacts on communities (Roy et al. 2009;Hesketh et al. 2010) as well as developing practical management approaches, such as biocontrol.
Ecological network modelling provides a framework within which these questions can be addressed but, typically requires well sampled networks that are laborious to create with traditional observational approaches (Evans et al. 2016). Molecular tools and in particular, modern sequencing technologies, provide a way to collect this data on a large scale by standardising and automating much of the effort required to detect interactions (Handley et al. 2011).
The Oak Processionary Moth (Thaumetopoea processionea, hereafter referred to as OPM) is historically considered a native of the warmer parts of southern and central Europe with a more sporadic presence in western Europe (Groenen & Meurisse 2012) but since the 1970s the frequency of outbreaks in north-western Europe (especially Belgium, the Netherlands and Germany) has increased dramatically with multiple intense outbreaks where the gregarious larvae reach population densities of thousands of individuals per host tree (deciduous Quercus spp.) (Stigter et al. 1997).
Population densities on this scale are capable of defoliating large areas of oak forest (Wagenhoff & Veit 2011). In addition to commercial forestry concerns, the caterpillars are also a serious public

Accepted Article
This article is protected by copyright. All rights reserved.
health risk due to the presence of urticating hairs containing thaumetopoein, a strong allergen unique to OPM and related moth species (Lamy et al. 1986).
The Oak Processionary Moth arrived in the UK in 2006 and has spread throughout Greater London but has yet to establish beyond this area (Mindlin et al. 2012). This is in part due to a UK Government control program that involves both manual nest removal and insecticide spraying using Bacillus thuringiensis (Bt) and even though the government funded program does not cover the complete UK range of the moth, the costs are still high at around £1.2 million in 2016 (Forest Research 2017). With costs this high, it is desirable to find an alternative to the current control method that is both financially viable and minimises any adverse ecological impact. Natural enemies of OPM have been investigated in Europe and at least 30 egg, larval or pupal parasitoids are known (Zwakhals 2005;Sobczyk 2014;Roques 2014;Sands et al. 2015), but often these records are collected on an ad hoc basis and other than one incidence of a single parasitoid being collected in London [Richmond Park: Carcelia iliaca (Sands et al. 2015)], nothing is known about parasitoids of OPM and their infection rates in the UK. Understanding which parasitoids utilise OPM and the parasitism rates for each parasitoid is essential for assessing the potential use of these species for Integrated Pest Management (IPM). Species interactions have traditionally been detected using either observational data (e.g. Maglianesi et al. 2014), microscopic analysis of collected specimens (e.g. gut contents Otte & Joern 1976;Hyslop 1980; or pollen on pollinators Lopezaraiza-Mikel et al. 2007) or the rearing of organisms to identify plant-herbivore and host-parasitoid interactions (Pocock et al. 2012), but these approaches are labour intensive and there are often significant taxonomic hurdles to overcome, both in terms of the knowledge base required to accurately perform identifications and the presence of cryptic species (e.g. Smith et al. 2007Smith et al. , 2008Kaartinen et al. 2010). In addition to this, studying the parasitoids of OPM and other species with urticating hairs can make laboratory rearing impractical and there is evidence that more traditional rearing methods can underestimate parasitism rates (Day 1994). Thus a better method is required to understand host-parasitoid interactions, the vital first step in assessing the potential for biocontrol methods.
The advent of molecular biological tools has allowed unprecedented opportunities to determine hitherto difficult to observe species interactions. Most of the work to date has focussed on studies using PCR diagnostic approaches where a primer pair specific to a single species is used to amplify only that species within a more complex DNA mixture and then visualise a band on an

Accepted Article
This article is protected by copyright. All rights reserved. agarose gel. This is typically applied to known sets of species by using suites of primer pairs that each produce bands of different lengths that can then be separated by gel or capillary electrophoresis (e.g. aphid -parasitoid interactions Traugott et al. 2008). These approaches are extremely targeted and require extensive a priori knowledge regarding the interacting species as specific primers must be designed for each target species.
Massively parallel 'next generation' sequencing (NGS) is now a commonly used tool in diverse areas of ecology and has the advantage of being able to separate mixtures of DNA from multiple species into their constituent components. One commonly applied approach is 'community metabarcoding' where a bulk DNA sample from one environment is PCR amplified for a standard barcode locus, sequenced, and taxa comprising the community identified bioinformatically (e.g. Taberlet et al. 2012;Yu et al. 2012). Bulk sampling and sequencing a few complex samples results in community rather than individual level data. While this is useful for the detection of species (e.g. Dejean et al. 2012) or characterisation of whole communities across treatments or time (e.g. Yu et al. 2012;Giguet-Covex et al. 2014), this is not necessarily appropriate for detecting species interactions (but see Leray et al. 2013 for a dietary analysis using this approach). The ideal species interaction detection method would involve the ability to sequence a wide variety of organisms in complex mixtures (e.g. extracted DNA containing both host and parasitoid) while retaining individual sample level metadata so that semi and fully quantitative networks can be created sensu Hrček and Godfray (2015).
The use of unique MID tags (Molecular Identification tags, 8-mer oligonucleotide sequences) added to the PCR primers is a well-tested strategy for sample tracking in multiplexed samples with NGS approaches (Binladen et al. 2007). Eight forward MID tags are typically matched with twelve reverse MID tags to give 96 unique tag combinations. Multiple sets of primers like this can increase the number of samples used but large numbers of unique MID-labelled primers can be expensive and complex to organise in a laboratory environment, making it unusual to have more than four sets of primer combinations in a single experiment (384 samples although see Campbell et al. (2015) for an example of highly multiplexed SNP genotyping). Sequencing 384 individual insects per sequencing run using a next generation sequencing approach to detect species interactions is possible, but due to the cost per sample it is unlikely to be feasible for the thousands of samples required when building well sampled ecological networks (Evans et al. 2016).

Accepted Article
This article is protected by copyright. All rights reserved. Shokralla et al. (2015) and Cruaud et al. (2017) provided a potential solution to this problem by utilising a nested barcoding approach involving two PCR steps followed by sequencing on the In this study we have two interlinked objectives: 1. To present a simplified version of nestedmetabarcoding methods for determining OPM-parasitoid interactions using a single PCR locus for a large number of samples, including improvements to control cross contamination; and 2. Demonstrate the utility of nested metabarcoding for detecting species interactions by determining the parasitoid identities and rates for one recently established population of OPM in the UK using a reproducible pipeline for creating host-parasitoid networks (metaBEAT 0.97.7 https://github.com/HullUnibioinformatics/metaBEAT) with a downloadable working environment conveniently packaged in Docker (Docker Inc. 2017) and GitHub (GitHub Inc. 2017). By combining these objectives, we discuss how the ability to link metadata to individuals opens many new avenues for research including the ability to create larger more highly-resolved ecological networks for habitat management and restoration (Evans et al. 2016).

The nested metabarcoding approach
We employed a modification to the standard Illumina 16S bacterial metabarcoding protocol (Illumina 2011). In the original protocol two rounds of PCR were used to: firstly to isolate, and amplify the gene region of interest (PCR1) and; secondly, to add a set of molecular identification tags (MID tags) and the Illumina MiSeq adapter sequences (PCR2). Our modifications to the protocol include adding (1) an additional set of MIDs in PCR1 to further increase the resolution of sample identification, and (2) adding a sequencing heterogeneity spacer to improve MiSeq performance (Fadrosh et al. 2014).

Accepted Article
This article is protected by copyright. All rights reserved.
Each MID tag was composed of a unique 8-nucleotide sequence allowing them to be bioinformatically linked back to the individual sample. We included MIDs in both the forward and reverse primers with twelve forward tags and eight reverse tags, to give 96 unique combinations of sample tags that can be arranged on a single plate (See Fig. 1 for general primer design). We also included specific MID tags for positives and negatives as this helped track contamination and mistagging through illegal tag combinations. A plate of PCRs with these tagged primers was carried out with each PCR well being given a unique combination of tags. The PCR products were then pooled into a separate pre-library for each plate of samples (PCR1, Fig2). The pre-library was then used as a template for a second round of PCR which added the adapters necessary for Illumina sequencing. This reaction also added two additional MID tags that uniquely identify the plate (PCR2, Fig2). These tagged pre-libraries could then be purified, pooled and sequenced on a single Illumina MiSeq run.

Sampling and laboratory protocols
For this study 1012 OPM caterpillars (4th to 6th instar) were extracted from 26 nests (silk structures created by communally living caterpillars) collected from various locations in Croydon, London, UK in July 2014 (full collection data is available in Table S1 in the Github repository). Nests were frozen whole at -20°C for at least 48 hours to kill the caterpillars before the nest was opened up and individual caterpillars removed. Whole caterpillars were placed in deep well plates with a single 5mm stainless steel ball bearing per well and 300 μl of digestion buffer one (20mM EDTA, 120mM NaCl and 50mM Tris). Mechanical lysis was then performed by shaking in a Qiagen TissueLyser II for 2 x 2 minutes at 30Hz. The caterpillar slurry was centrifuged to remove tissue residue from lids and reduce the possibility of cross contamination. To each sample, 270 μl of digestion buffer two (20mM EDTA, 120mM NaCl, 50mM Tris and 2% SDS) plus 30 μl of 10 mg/ml Proteinase K solution were added. The plates were then mixed by repeated inversion and digested overnight at 37 o C. After enzymatic lysis, 10 μl of the digestion supernatant was then used as the starting material for a 70 μl HotSHOT DNA extraction (Truett et al. 2000) which was diluted 1/100 for PCR amplification.
A 313 bp fragment of the Cytochrome C Oxidase subunit I barcode region (coxI) was amplified using primers based on mICOIintF and jgHCO2198 modified from Leray (2013) to include standard Illumina MIDs and bridge sequences (see Fig. 1 and Table S2 in the GitHub repository

Accepted Article
This article is protected by copyright. All rights reserved.
associated with this manuscript) along with a variable length sequencing heterogeneity spacer as in Fadrosh (2014 in 20 μl reactions using a high fidelity Taq mastermix (MyFi Mix Bioline), 1 μl of template DNA and each primer (final concentration -0.5 μM). Extra cycles were required as long primers are known to cause a lag in PCR amplification (Schnell et al. 2015). In order to prevent cross contamination between wells, all PCRs were performed in individually capped PCR strips and all wells were sealed using mineral oil. In addition to this, oil was placed in the PCR well before all other reagents and the PCR master mix was mixed with primers and template DNA under oil to prevent cross contamination.
An example output from a poorer performing run not employing these methods can be seen in Supplementary information appendix 1.
PCRs were checked on a gel to gauge success rates and 10 μl of each product from a plate was pooled together (without quantification) to produce each pre-library, resulting in eleven separate pre-libraries. Two aliquots of each pre-library were gel purified to remove remaining primers using

Accepted Article
This article is protected by copyright. All rights reserved.
was quantified using a Qubit 3.0 before being added directly to the pre-library during pooling. The PCR positive volume added to each pre-library was calculated so that we were adding 1/94th the total DNA (96 samples minus two negatives) of each pre-library as PCR positive. All samples were sequenced (including positives and negatives) even when no band was present as PCR products may still exist below gel detectable levels.

Bioinformatic processing of Illumina MiSeq output
Processing of Illumina data from raw sequences to taxonomic assignment was performed using a

Accepted Article
This article is protected by copyright. All rights reserved.

Assignment of thresholds for data processing
We

PCR and sequencing success rates
Overall we had an apparent PCR success rate of 96.3% (i.e. 96.3% of sample wells produced a visible band on a gel). Analysis of clustering thresholds revealed that no negatives contained detectable reads or clusters above approximately 36 reads (Figs. S7A and S8A in appendix 3), and that per-well read depth and clusters retained in the sample wells became stable at approximately 56 reads (Figs. S7B and S8B in appendix 3). The mistagging analysis revealed that the single largest cluster created by any illegal tag combination was 66 reads (Fig. S9 in appendix 3). Based on these results, we believe that performing our final analysis using a conservative minimum cluster size of 67 reads resulted in the effective exclusion of errors from the dataset.

Accepted Article
This article is protected by copyright. All rights reserved.
From a single MiSeq v2 Illumina run we produced a total of 12,112,538 untrimmed sequences and retained 11,078,131 after quality trimming (91.5%). For the 1,012 moth samples, read depth per well ranged from 0 -63,182 reads before quality trimming (mean = 11,840.1, sd = 9,309.6) and from 0 -57,336 reads after quality trimming (mean = 10,870.9, sd = 8,613.8) (Fig. 3). Overall we had a sequencing success of 94.5% (percentage of sample wells for which reads were retained after data processing), eight out of eleven DNA positives sequenced successfully, but none of the PCR positives were successful. For the failed DNA positives, one produced clusters that were just below our minimum cluster coverage so was excluded, while the remaining two produced no reads at all. The PCR positives produced raw reads, but they were of poor quality leading to very few reads merging and all clusters being excluded (Fig. 4). The DNA positives that failed started with low raw read counts despite having distinct bands on agarose gels. This suggests that a pooling error led to underrepresentation of these wells in the final pooled pre-library which probably resulted in dropout.
The PCR positives also produced strong consistent bands on a gel prior to being added to the prelibraries, but because we quantified how much PCR positive we should add, we are less inclined to believe that a simple pooling error was responsible. A primer synthesis error in either the bridge sequence or the sequencing primer binding site for one or both of the PCR positive primers would result in complete sequencing failure. A sequence error in the bridge sequence would result in poor library formation for this sample during PCR2 while an error in the sequencing primer site would result in unreliable sequencing signal and subsequent filtering. To try and resolve this, Sanger sequencing of the PCR1 PCR positive product was undertaken. Reverse sequencing revealed that the forward primer was identical to the designed sequence. Forward sequencing was less successful and could not produce a strong Sanger trace (probably due to the primer length being suboptimal for Sanger sequencing). Alignments of the poor quality sequence and the reverse primer suggest there may be two deletions within the bridge sequence leading to poor library formation in PCR2. Further studies conducted subsequently were performed using newly synthesised PCR positive primers and sequencing was successful. We also recommend that primers for future studies should be synthesised at the highest possible quality standard to ensure accuracy of synthesis.

Accepted Article
This article is protected by copyright. All rights reserved.

Species identifications and parasitism levels in OPM
Most sequences were assigned by BLAST to either OPM or its known parasitoid fly Carcelia iliaca (Sands et al. 2015) with an additional rare parasitoid Compsilura concinnata (a parasitoid normally associated with Gypsy moth -Lymantria dispar - Fig. 5A including the entomopathogenic ascomycete fungus (Beauveria bassiana), but given the more common use of ITS as a fungal barcoding locus (Seifert 2009) and the probable inefficient amplification of fungal coxI when using primers designed for invertebrates, we pooled all fungal hits into one identification and did not consider them further. A small subset of reads was left unassigned by the metaBEAT pipeline. Manual BLAST searches of these sequences through the NCBI website revealed that these were either: (1) Sequences that did not meet the BLAST search criteria (95% similar across at least 90% of the sequence length) due to gaps in database composition; or (2) sequences where a lowest common ancestor could not be assigned due to database error.
Scenario (1) generally occurs when BLAST identifications are either; all fungal but none are close enough to assign (i.e. probably genuine fungal sequences but from groups poorly represented in Genbank for coxI sequences) or dipteran sequences with stop codons in all reading frames suggesting that these are Carcelia iliaca NuMts (as defined in Lopez et al. 1994). Scenario (2) occurs when the lowest common ancestor algorithm fails because the top 10% of BLAST hits are a mixture of unrelated sequences probably due to the misidentification of sequences in Genbank (e.g. fungal sequences from dipteran specimens labelled as dipteran sequences).

Accepted Article
This article is protected by copyright. All rights reserved.

Evaluating nested metabarcoding for determining Lepidopteranparasitoid interactions
We tested the ability of a NGS nested metabarcoding design to produce individual-level data for a large number of caterpillar samples (>1000) in a single sequencing run. We achieved a high level of PCR and sequencing success and found an average of 11,000x coverage for each PCR well before sequence filtering, allowing us to adopt a high stringency for sequence quality. The depth of coverage found in our experiment allowed us to distinguish multiple unique sequences in each well, representing the host, parasitoids, and (potentially) any other species interacting with the moths such as parasitic fungi or intracellular parasites. Thaumetopoea processionea caterpillars were parasitised by two parasitoid species already known from the literature. Carcelia iliaca was found to parasitise almost half of all caterpillars while the other, Compsilura concinnata, was only detected in four caterpillars.
In addition to tachinid parasitoids, we detected a range of fungal sequences including the entomopathogenic fungus Beauveria bassiana. However, before making assessments of B. bassiana infection rates, it would be much better to use fungi specific primers that target the ITS region more commonly used for fungal barcoding (Seifert 2009). Arthropod coxI primers are likely to be inefficient at amplifying fungal DNA and the fungal reference libraries are much more complete for ITS. While this would require investment in a set of tagged fungal ITS primers in addition to the general arthropod primers used here, multiple loci may not automatically mean multiple sequencing runs (e.g. Cruaud et al. 2017), so the overall cost may not increase considerably, something that is not the case with Sanger approaches. Thus, our approach leads easily to a much more complete understanding of the ecological interactions than standard Sanger barcoding approaches (cf. Wirta et al. 2014;Derocles et al. 2015). The relative costs of NGS and Sanger sequencing vary with the scale of the experiment. Commercial UK prices for Sanger sequences in both directions are approximately 1/150th the cost of an Illumina MiSeq run at time of writing so for small numbers of individuals and a single barcode locus Sanger sequencing may be much more cost effective. As the quantity of data required increases, however, NGS has the potential to be considerably cheaper, since the costs of a

Accepted Article
This article is protected by copyright. All rights reserved.
single NGS run are largely fixed, irrespective of how many individuals are included. Although our experiment could have even been performed using Sanger sequencing (through the use of order level primers or cloning), for our experiment we estimate that the costs are at least 1/2 that of the equivalent Sanger experiment even when buying a set of tagged primers sufficient to cover multiple experiments and assuming that no cloning and extra sequencing is required for the Sanger approach (see appendix 4). In reality the cost savings are likely to be much greater.

Limitations and improvements of the nested metabarcoding approach
Our first attempt at using this method (Supplementary information appendix 1) revealed that it can be highly sensitive to cross contamination between both sample and control wells (see figures S1B, S2B, S4 and S5A; appendix 1 figures are intended for direct comparison with the main text figures). It was suspected that a number of pathways may have contributed to the contamination. First, manual puncturing of caterpillars may be releasing bodily fluids of both host and parasitoid into the air around the DNA extraction plate. We improved this by mechanically lysing caterpillars in closed tubes to contain any aerosols or debris. Second, it was suspected that contaminating aerosols may be moving beneath the commonly used sealing film on a standard 96 well PCR plate during the hottest stages of the PCR cycle. To mitigate this we moved from using 96 well plates to strips of tubes with individual lids and a mineral oil vapour barrier above the PCR mastermix. This improved the quality of the results dramatically allowing us to have much greater confidence that our results are representative of true parasitism rates. In addition to the improvements already implemented here, we would recommend quantification of PCR products using a plate reader and the use of robotic liquid handlers to accurately pool equimolar samples into each pre-library prior to PCR2 as this would likely help control for potential sequencing dropout as possibly seen in our DNA positives.
There is considerable variation in the proportions of reads in each sample attributable to OPM and its parasitoids and it may be the case that this represents true variation in the proportions of each sample composed of OPM or parasitoid tissue but we consider this to be an unreliable approach at present. Some authors have attempted to relate read depth to biomass or numbers of individuals both for PCR based metabarcoding (e.g. Elbrecht & Leese 2015;Thomas et al. 2016) and PCR free metabarcoding (e.g. Tang et al. 2015). Attempting to measure sample sizes or biomass from read depth presents a number of challenges. First, PCR based approaches can be biased by variation in

Accepted Article
This article is protected by copyright. All rights reserved.
amplification efficiency across different taxa (for example, variation in primer binding affinities across different taxa or base composition variation affecting enzyme efficiency). PCR free approaches to metabarcoding attempt to circumvent this by removing the PCR step and all the associated biases completely (e.g. Tang et al. 2015). In theory, read depth should then correlate with copy number for a given locus, but in reality we have little knowledge, for most species of how sequenceable DNA availability is affected by extraction method and more importantly, how read depth then correlates with biomass or numbers of individuals across different life stages. PCR free metabarcoding is further constrained as much of the read depth which could be used for sequencing additional specimens is used for sequencing additional areas of genome that are not necessary for identification. While much of the variation in proportions of OPM and parasitoid reads in our samples are likely to be attributable to relative proportions of host and parasitoid tissue, we feel that it would be necessary to perform extensive calibration (as in Thomas et al. 2014;and Elbrecht & Leese 2015) to make any concrete conclusions surrounding this. Nevertheless, our approach allows us to use presence/absence data across a large number of individual specimens to produce quantitative frequency data that can be analysed with standard statistical tests at the same time as reducing over-sequencing of any single individual.

OPM, its parasitoids and IPM
Other parasitoids species known to attack OPM in its native range were not detected. Their absence in our data set may be due to our samples being almost exclusively late instar caterpillars, whereas many of the parasitoids recorded in the literature are egg or early instar parasitoids that emerge before nest formation or are pupal parasitoids (Sobczyk 2014 in the UK), and much of the parasitoid community associated with OPM in its native range may simply not be present in the UK, or even if present as a taxonomic species, the local race may be an ecologically adapted cryptic species with different host preferences (e.g. Smith et al. 2007). More thorough sampling of all OPM life stages and those of other insect herbivores in the wider forest environment would allow us to assess this. In addition a study examining a series of OPM populations sampled across the native range may provide further insights into potential biocontrol agents for future introduction.
In order to better understand the role of parasitoids in mediating OPM numbers, it is also desirable to consider the place of both OPM and its parasitoids in the broader ecological networks of which they are members, as both direct and indirect interactions with other species in the wider network affect probability of hosts and parasitoids of interest interacting (Hrček & Godfray 2015;Evans et al. 2016). This is especially important when considering introducing a new biocontrol agent to an area. By knowing both the alternative hosts of confirmed OPM parasitoids and any previously reported parasitoids, forest managers could design specific planting regimes to enhance parasitoid control of OPM (as suggested in Evans et al. 2016). An example of this can be seen with C.
concinnata. Evidence from the North American use of C. concinnata as an introduced biocontrol agent for gypsy and brown-tail moths suggests that this species also has a very wide host range (Strazanac et al. 2001;Elkinton & Boettner 2012), and that it is generally ineffective at preventing the spread of the two main target species because of low parasitism levels. This species was also detected at very low levels in UK OPM, but whether these were accidental parasitism events caused by adult females misinterpreting oviposition cues or the first steps in the host range expansion of the UK race of C. concinnata is unclear. Understanding both how this will change over time and the competitive effects of other hosts vs OPM for this species is crucial to its evaluation as an OPM biocontrol agent in the UK.

Nested metabarcoding and ecological networks
Ecological sciences are appreciating more than ever the power of incorporating ecological networks rather than simple species lists into monitoring approaches. The ability to start disentangling species

Accepted Article
This article is protected by copyright. All rights reserved.
interactions has potential to revolutionise habitat management, habitat restoration, conservation and IPM, but in order to do this there is a need for large well sampled ecological networks (Evans et al. 2016). Building such networks requires large sample sizes of individual level rather than community data, and so have previously been little assisted by NGS. Nested metabarcoding can fill this gap and although applied here to parasitised individuals, we anticipate that the sequencing approach demonstrated could be applied in exactly the same way to a range of study systems where it is desirable to sequence numerous samples each containing a restricted number of species, e.g. for detecting pollen on pollinators (current work in prep); identifying recent meals on mouthparts of insect herbivores, or describing interactions between arbuscular mycorrhizal fungi and different plant species. In addition, the networks produced are explicitly linked to sequence data for all the individuals included. This facilitates an evolutionary approach to examining community assembly and for the investigation of broader coevolutionary patterns.
This approach could also be used to process more complex communities in environmental or medical samples, soil mesofauna, bulk insect samples, or any other complex community while still keeping the number of MID tags required at reasonable levels. Should the read number be insufficient for a given experiment, the same samples could be loaded onto a sequencer with higher throughput (e.g. Illumina HiSeq rather than MiSeq) to address this issue, as long as the paired-end nature of the sequences can be maintained. For taxonomic groups that require longer barcodes for accurate identification, emerging technologies such as nanopore sequencing and the PacBio SMRT sequencing may ultimately prove useful.

Conclusions
Here we demonstrate a highly successful approach to detecting species interactions using a single MiSeq sequencing run. We have shown that a significant proportion of over 1000 OPM caterpillars were parasitised by either Carcelia iliaca or Compsilura concinnata. The costs are highly favourable compared to undertaking the same study using Sanger based approaches. Scaling this approach would allow for the construction of large, highly-resolved ecological networks of use in a range of applications including conservation and land management, but the sequence based nature of the data generated also allows for the construction of phylogenetically-structured networks that enables many

Accepted Article
This article is protected by copyright. All rights reserved.
fundamental community dynamics and co-evolutionary questions to be explored. These network and evolutionary based approaches will be of increasing importance as we attempt to quantify functional changes in ecological networks with climate change, habitat modification, and species loss.