Linking modern pollen accumulation rates to biomass: Quantitative vegetation reconstruction in the western Klamath Mountains, NW California, USA

Quantitative reconstructions of vegetation abundance from sediment-derived pollen systems provide unique insights into past ecological conditions. Recently, the use of pollen accumulation rates (PAR, grains cm−2 year−1) has shown promise as a bioproxy for plant abundance. However, successfully reconstructing region-specific vegetation dynamics using PAR requires that accurate assessments of pollen deposition processes be quantitatively linked to spatially-explicit measures of plant abundance. Our study addressed these methodological challenges. Modern PAR and vegetation data were obtained from seven lakes in the western Klamath Mountains, California. To determine how to best calibrate our PAR-biomass model, we first calculated the spatial area of vegetation where vegetation composition and patterning is recorded by changes in the pollen signal using two metrics. These metrics were an assemblage-level relevant source area of pollen (aRSAP) derived from extended R-value analysis (sensu Sugita, 1993) and a taxon-specific relevant source area of pollen (tRSAP) derived from PAR regression (sensu Jackson, 1990). To the best of our knowledge, aRSAP and tRSAP have not been directly compared. We found that the tRSAP estimated a smaller area for some taxa (e.g. a circular area with a 225 m radius for Pinus) than the aRSAP (a circular area with a 625 m radius). We fit linear models to relate PAR values from modern lake sediments with empirical, distance-weighted estimates of aboveground live biomass (AGLdw) for both the aRSAP and tRSAP distances. In both cases, we found that the PARs of major tree taxa – Pseudotsuga, Pinus, Notholithocarpus, and TCT (Taxodiaceae, Cupressaceae, and Taxaceae families) – were statistically significant and reasonably precise estimators of contemporary AGLdw. However, predictions weighted by the distance defined by aRSAP tended to be more precise. The relative root-mean squared error for the aRSAP biomass estimates was 9% compared to 12% for tRSAP. Our results demonstrate that calibrated PAR-biomass relationships provide a robust method to infer changes in past plant biomass.


Introduction
Quantitative reconstruction of past plant abundance has been an important goal in paleoecology since the field's inception (Von Post, 1918) and a major research frontier spanning decades (Davis and Deevey, 1964;Davis et al., 1984;Hicks, 2001;Likens and Davis, 1975;Seppä et al., 2009;Matthias and Giesecke, 2014;Marquer et al., 2014). The research community lacks a complete understanding of how the pollen signal reflects plant population parameters (e.g. biomass), and therefore past population change (Davis et al., 1984;Fagerlind 1952;Prentice, 1988;Seppä et al., 2009). Developing methods to quantitatively reconstruct past plant populations would aid climate science and restoration ecology. In climate science, for example, quantitative reconstructions of past plant populations would allow better understanding of long-term ecosystem dynamics (Gaillard et al., 2000) and provide past analogues to test complex climate models that account for the effects of landcover on the climate system (Gaillard et al., 2010). Restoration ecology would benefit from an improved understanding of the impact of disturbances (natural and anthropogenic) on landscapes and ecosystems (Broström et al., 1998;Crawford et al., 2015) and from the increased participation by paleo-ecologists in the debates of modern restoration ecology (Hellman et al., 2009;Swetnam et al., 1999).
Palynologists often use pollen percentage data in pollenvegetation models to reconstruct landcover and understand past plant populations, but this approach does not provide separate reconstructions for each taxon's population change (Davis, 1963;Prentice, 1988). Relative changes in abundance of species have been inferred form Bayesian hierarchical spatio-temporal pollenvegetation models (Dawson et al., 2019). In contrast, pollen accumulation rates (PAR) -a measure of the rate of pollen deposition at the sediment surface per unit area during a given time period (e.g. grains cm −2 year −1 , Davis and Deevey, 1964) -depend on the abundance of the plant taxa producing that pollen type around the collection site. That is, the PAR for each taxon is independent of all other taxa. PAR allows results from different regions to be directly compared, irrespective of other taxa in the investigations (Giesecke and Fontana 2008;Hicks and Hyvärinen, 1999). PAR has been used to reconstruct not only landcover, but also population dynamics and plant biomass (Matthias and Giesecke, 2014;Seppä et al., 2009;Theuerkauf et al., 2012). For example, PAR has been used to reconstruct past population growth rates (Bennett, 1983(Bennett, , 1986Giesecke, 2005;MacDonald, 1993) and to reconstruct Holocene biomass records in at least two areas: the Finnish boreal zone (Seppä et al., 2009) and a sub-alpine forest in Utah (Morris et al., 2015).
PAR is not a simple reflection of vegetation abundance because the pollen signal is a distance-weighted measure of taxa abundance in the surrounding vegetation, responding to the structure of the plant community as well as species abundance (Jackson, 1990). Modern PAR values must be quantitatively correlated with modern plant population data from the lake surroundings in order to parameterize the PAR-population relationship before fossil PAR records can be interpreted in terms of past plant population change. This correlation step requires accurate vegetation data from forest inventories (Matthias and Giesecke, 2014;Seppä et al., 2009), careful field surveys (Bunting et al., 2013), or well-resolved spatial imagery coupled with ground-truthing (Han et al., 2017) that encompasses the relevant source area of pollen ("RSAP," discussed below and in section "Spatial area represented by the pollen record," sensu Sugita, 1993). Previous work has shown a linear relationship between PAR and distance-weighted biomass across a range of lake sites in northeastern Germany (Matthias and Giesecke, 2014).
In addition to quantitative vegetation data, reliable PAR data require a robust chronology of the pollen system being studied. Ideally, a sedimentary core for PAR data collection has two features: it is obtained from an undisturbed lake environment where sediment accumulates evenly over time, and the resulting sediment is dated at high resolution. Where lakes are found to have stable sedimentary conditions, reliable PAR datasets can be obtained (e.g. Hyvärinen 1975;Ritchie, 1969;Seppä and Hicks, 2006) though there may still be channel funneling. The recent development of Bayesian tools has improved the construction of chronologies from isotopic data such as 210 Pb activity measurements, giving more reliable measures of uncertainty (Aquino-López et al., 2018).
Lastly, all sedimentary basins have a relevant source area of pollen (RSAP), which is sometimes referred to as the "pollenshed" of the basin (sensu Sugita, 1993). The basic premise is that vegetation within a certain distance of the basin corresponds to the quantity and type of pollen deposited at the site. With distance from the lake shore, correlations between plant abundance and pollen loading are expected to improve, then approach an asymptote at some distance because source vegetation of pollen far from the basin should have much less influence on the pollen representation than vegetation closer to the basin. Estimating the RSAP is a key step for quantitative calibration because it provides information about the spatial extent of any subsequent vegetation reconstruction (Bunting et al., 2004;Hellman et al., 2009;Sugita et al., 1999). To our knowledge, the distinction between an assemblage-level RSAP (aRSAP, Sugita, 1993) and taxa-specific RSAP (tRSAP, sensu Jackson, 1990; Matthias and Giesecke, 2014) has not yet been drawn within the same basin. Comparing these estimates provides insight about how pollen assemblages "sense" vegetation, which is critical to the extraction of vegetation information from pollen data.
Given the methodological challenges, the application of calibrated PAR-biomass transfer functions to any ecosystem is not routine. This paper develops PAR-biomass models using short cores from seven small lakes in the western Klamath Mountains, California, and follows the general approach used in previous studies (e.g. Matthias and Giesecke, 2014;Seppä et al., 2009) whilst critically evaluating each step in the process. The Klamath bioregion contains numerous small lakes and is an area where Holocene-length paleoecological records have already provided a portrait of ecological change (Figure 1). We measured modern PAR from lake sediments and acquired vegetation abundance data to achieve three goals: (1) to understand the spatial relationship between pollen flux in small lakes and surrounding vegetation cover, through modeling of the aRSAP and tRSAP, (2) to calibrate a PAR-biomass model using distance-weighted biomass for major tree taxa, and (3) to assess the potential of this model to reconstruct past changes in assemblage-wide biomass from the region.

Background
Below, we describe the study area's physical features (section "Study Area"), our pollen-vegetation modeling approach (section "Pollen-vegetation models"), and the methodology used to estimate aRSAP and tRSAP (section "Spatial area represented by the pollen record").

Study area
The Klamath bioregion, a physically and floristically diverse area in northwestern California (Cheng, 2004;Whittaker, 1960), contains hundreds of small lakes. Many lakes are found at high elevations and are glacial in origin, but there are also landslide-created lakes at low-and mid-elevations in the western portion of the region (Wahrhaftig and Birman, 1965). The landscape has deep catchments and steep mountains (Irwin, 1981), and the climate is Mediterranean, consisting of cool, wet winters and warm, dry summers (Skinner et al., 2006). Prior to 20th century fire suppression, the landscape had a mixed-severity fire regime characterized by mostly small, low-intensity, frequent fires, and infrequent large burns of mixed-severity (Taylor and Skinner, 2003).
We selected seven small lakes in the Six Rivers National Forest with small basins, minimal stream inputs, and shallow slopes (Table 1, Figure 1). Vegetation around the lakes is representative of the diverse mixed conifer forest of the Klamath bioregion (Hudiburg et al., 2009) although the dominant overstory varies at each lake site. Holocene-length pollen records (percentage and PAR) already exist for three of the seven lakes and suggest that the modern forest structure and composition have been relatively stable for the last 2000 years (Wanket, 2002) but also imply a 3000-year historic high of Douglas-fir in the contemporary forest (Crawford et al., 2015).

Pollen-vegetation models
Linear pollen-vegetation models (PVMs) have a long history of use in palynology (Andersen, 1970;Bunting and Middleton, 2005;Bunting et al., 2013;Davis, 1963Davis, , 2000Prentice, 1985Prentice, , 1988Sugita, 1993Sugita, , 1994. PVMs use the relationship between pollen assemblages and vegetation to infer past vegetation composition or structure from fossil pollen data. The main strengths of linear PVMs are: 1) they provide the means to reconstruct vegetation from landscapes with no modern analogue; 2) they have been widely tested against empirical data in quantitative reconstruction research (e.g. Andersen, 1970;Bunting and Hjelle, 2010;Davis, 1963;Prentice, 1985;Sugita, 1993); and 3) they have been successfully validated in at least one region (southern Sweden, Hellman et al., 2008aHellman et al., , 2008bSugita, 2007aSugita, , 2007b. In this work, we used a version of Sutton's original PVM model (Sutton, 1947(Sutton, , 1953 inverted by Prentice (1985) and modified by Sugita (1994) for lake environments. This model's form -called Prentice-Sugita-Sutton -assumes that pollen could land anywhere on the lake surface and would be perfectly mixed in the water column before being deposited on the lakebed. The Prentice-Sugita-Sutton model also assumes that pollen transport is largely via wind above the canopy and gravity beneath the canopy, and that the sampling basin is circular with uniform wind in every direction (Sugita, 1994, full list of assumptions in the supplement). Under this approach, we (a) divide the vegetation into rings, (b) distance-weight each ring, and (c) compare the PAR from the basin with the summed distance weighting from one or more rings, working out from the edge of the basin. This model calculates the total pollen influx from each source across the whole lake. Its simplest linear form is: where, Y ik = pollen influx for a taxon i at site k α i = pollen productivity of taxon i ψ ik = the distance-weighted plant abundance (DWPA) of taxon i around site k with the weighting term reflecting the pollen dispersal of taxon i (weighting term calculation shown in equation (3)).
DWPA (ψ ik ) is defined as: where, R = the radius of the canopy opening in which the sample site is located X ik (z) = the plant abundance measure consisting of the contribution of taxon i to the pollen assemblage formed at site k from plants located distance z from sampling location k, and g i (z) is the distance weighting term for taxon i at distance z from any sampling location.
and, z = distance γ = a coefficient of 0.125 (Prentice, 1985) v g = approximated by v s (fall speed, m s −1 ) C z = the vertical diffusion coefficient (m 1/8 ) n = a dimensionless turbulence parameter equal to 2 u = windspeed (m s −1 ), set equal to 3. Note that C z and n depend on atmospheric stability. The equation (2) can be re-written as a sum with two addends: (1) the unique contribution of the vegetation close to site k where ζ is the pollen source area for site k and (2) the long-distance pollen transport ("background pollen," which is uniform beyond ζ), giving: Which can be written as

Spatial area represented by the pollen record
We estimated the spatial extent of our sites' source area of pollen in two ways (definitions in Table 2). We calculated the standard assemblage-specific metric -the relevant source area of pollen (aRSAP) -which is defined as the area beyond which the correlation between pollen and vegetation does not improve (Sugita, 1993). Estimates of aRSAP can be extracted from extended R-value (ERV) analysis using pollen percentage data (Parsons and Prentice, 1981). ERV analysis is the process of solving n equations for 2n unknowns in order to extract the parameter estimates, where ERV sub-models 1, 2 and 3 are the underlying vegetation-pollen relationship models. The three sub-models define background pollen differently (Sugita, 1994). Whereas models 1 and 2 use pollen data and vegetation percentages (Parsons and Prentice, 1981), model 3 uses pollen percentages and plant abundance data in absolute units (e.g. biomass per area) (Sugita, 1994). Using the maximum likelihood method, ERV models iteratively fit the relationship between pollen and vegetation percentages (Bunting and Hjelle, 2010). Maximum likelihood function scores measure the goodness of fit between pollen percentages and distance-weighted plant abundance. The aRSAP can be estimated from visual inspection of the likelihood function score plotted against distance; it is the point at which scores approach an asymptote Sugita et al., 1999).
We then calculated a taxon-specific metric of the relevant source area (tRSAP) to compare to the aRSAP. We call the tRSAP the distance beyond which the correlation between PAR and DWPA summed to that distance does not improve (Jackson, 1990). We fit a linear equation (equation 5) for each individual taxon because both y and ψ are measured in independent terms. We again used the ring source model, which converts the integral into a summation. That is, we summed the value for each of the rings and g i (z) includes ring area in this formulation. As with aRSAP, tRSAP can be estimated from visual inspection of the R 2 value against the distance from the lake shore (m) (Matthias and Giesecke, 2014).

Methods
Fitting PAR-biomass relationships requires a number of steps shown in a flowchart ( Figure 2) with numbers matching the following sections.

Lake selection and core sampling
We used the following criteria to determine suitable lake sites: small size (radius approximately 100 m), no permanent outflow, simple basin, and core length greater than 25 cm. A total of 10 such lakes were identified from topographical maps and satellite imagery as promising, but each were assessed in the field. Out of this collection, seven lakes were viable and selected for 210 Pb dating. During the summer of 2018, short cores (~50 cm) of 7 cm diameter were taken from each lake's center using either a gravity corer (Ogaromtoc, Fish Lakes) or a piston corer (all other lakes). The sediment-water interface was immobilized by sodium polyacrylate for transport. Cores were later split and sectioned in the laboratory.

Sediment dating, age-depth model, and sediment lithology
We used lead-210 ( 210 Pb; 22.3 year half-life) to assign ages to sediment deposited in the last 150 years. Surface bulk sediments from 0 cm to a maximum of 45 cm were taken from each core and dried to 105°C (see Supplemental Tables S1-S7, available online). 210 Pb activity was determined by alpha spectrometry (see SI for complete dating methodology). We used the Bayesianbased Plum software to develop age models from excess (unsupported) 210 Pb data (Aquino-López et al., 2018). Plum uses a self-adjusting Markov Chain Monte Carlo (MCMC) algorithm called the t-walk (Christen and Fox, 2010). Plum uses millions of MCMC iterations to model the accumulation of sediment, using a gamma autoregressive semiparametric age-depth function (Blaauw and Christen, 2011). This algorithm results in a probability envelope around the mean age model. Supported 210 Pb activities were determined from the direct measurements of 226 Ra by gamma-ray spectrometry.

Pollen analysis
Pollen samples -one from each lake site -were extracted from 0.63 cm 3 of wet sediment from the top 0.5 cm of each core and were processed according to standard pollen preparation procedures (Faegri and Iversen, 1989) but modified to include two steps: (1) sieving with 5-and 153-µm mesh under vacuum and (2) swirling, with the less dense fractions retained. These steps draw on Doher's palynomorph methodology and current United States Geological Survey procedures (Doher, 1980). One Lycopodium spore tracer tablet containing 20,848 spores was added to each sample to calculate pollen concentration (Faegri and Iversen, 1989;Stockmarr, 1971). Acetolysis and sieving steps were repeated for samples containing high amounts of organic material. Pollen samples were mounted in silicone oil and examined at 500× magnification. At least 400 terrestrial grains per sample were counted and identified using the UC Berkeley Museum of Paleontology modern pollen reference collection, as well as pollen atlases (Halbritter et al., 2018;Knapp, 1969).  Sugita (1994) as the "smallest area within which reliable estimates of parameter values and asymptotic r 2 or likelihood function scores can be obtained." The definition was refined as the "distance from a pollen deposition point beyond which the relationship between vegetation composition and pollen assemblage does not improve" (Bunting et al., 2004, with Sugita). Estimates are derived for the overall assemblage from extended R-value analysis (Parsons and Prentice, 1981) through inspection of the likelihood function score plot. RSAP varies depending on which taxa and which sites are included in the analysis, thus is dependent on the assemblage chosen for analysis. aRSAP Identical to the standard RSAP, but with the addition of an "a" to denote that it is an assemblage-specific metric, in contrast to the tRSAP. tRSAP The RSAP concept can be extended to single taxa where pollen taxa are measured independently (e.g. PAR values rather than percentage values). In this situation, we define a taxon-specific Relevant Source Area of Pollen, the tRSAP, as the distance beyond which the correlation between PAR (Y) and distance-weighted plant abundance (ψ) summed to that distance for a single taxon does not improve (Jackson, 1990). Seven wind-pollinated taxa were identified at all sites: Pinus, Pseudotsuga, Quercus, Notholithocarpus, Alnus, TCT (Taxodiaceae, Cupressaceae, and Taxaceae families), and Abies. The corresponding plant taxa from the study area were sugar pine, Jeffrey pine, ponderosa pine (Pinus); Douglas-fir (Pseudotsuga); California black oak, canyon live oak (Quercus); tanoak, golden chinquapin (Notholithocarpus); white alder (Alnus); Port-Orford-cedar, incense-cedar (TCT); white fir, red fir (Abies). we only encountered Port-Orford-cedar and incense-cedar in the vegetation survey at the study sites and assume all TCT originating within the surveyed vegetation area came from these species. Counts of Pinus, Quercus, Notholithocarpus and Abies reflect all the pollen grains from their respective genera (i.e. we report total Pinus which likely contained sugar pine, Jeffrey pine, and ponderosa pine grains). Pseudotsuga and Alnus counts represent the species Pseudotsuga menziesii (Douglas-fir) and Alnus rhombifolia (white alder). Other wind-pollinated tree pollen present in trace amounts includes willow (Salix), buckthorn (Rhamnaceae), hazel (Corylus) and silk tassel (Garrya). This group of "other hardwoods" accounted for only 0.35% of the woody species. Given their rarity, we omitted them from the determination of pollen source area and subsequent PAR-biomass modeling.

PAR determination
Pollen concentrations (grains cm −3 ) and PAR (grains cm −2 year −1 ) were determined using the Lycopodium marker grains, the number of pollen grains counted for each taxon, the volume of the pollen sample (e.g. 0.63 cm 3 ) (Stockmarr, 1971), and the sediment accumulation rate (Davis and Deevey, 1964), which differed by lake site and was determined by the Plum age model in increments of 0.5 cm (see SI for equations used).

Forest inventories
We used cruising prisms (wedges of glass with a known size/ angle) to determine the basal area of the dominant pollen-producing taxa within 750 m from each lake's shoreline (United States Department of Agriculture, Forest Service, 2000). The prism method employs variable plot radius sampling at the stand level. Transects in eight directions (N, S, E, W, NE, NW, SE, SW) from the lake shore were sampled. The basal area of live trees was measured using the prisms every 50 m along the transects, following Han et al. (2017) (Figure 3).
We used aboveground live tree biomass (AGL) as the specific measure of abundance that is distance weighted. To estimate AGL from basal area measurements, we developed species-specific allometric equations using contemporary data from the US Forest Service Forest Inventory and Analysis program (FIA). From the FIA plots inventoried in Six Rivers National Forest between 2001 and 2017 (FIADB, 2020), we calculated plot-level basal area (m 2 ha −1 ) for every species in the plot and linked it to the estimate of plot-level aboveground live biomass (Mg ha −1 ) for each species (n = 3428 plot-by-species observations). AGL was estimated using the regional model of tree biomass (Zhou and Hemstrom, 2009). For every species, we predicted AGL as a function of basal area using a linear log-log (natural) equation (sensu Knight et al., 2020). Specifically, where ln(AGL ij ) is the natural log of aboveground live biomass for species i in plot j, ln(Basal Area ij ) is the natural log of tree basal area for species i in plot j, β 0i is the intercept for species i, and β 1i is the slope coefficient for species i. For the six most abundant species that accounted for 90% of the basal area, fits ranged from a low of 0.85 for sugar pine to a high of 0.97 for Port-Orfordcedar (Supplemental Table S8, available online). With these equations and field measurements of species basal area, we calculated the AGL of each species in the prism sample.

ERV analysis and estimation of aRSAP
The aRSAP values were extracted from conventional ERV analysis using model 3. We used PolERV from the software suite HUMPOL (Bunting and Middleton, 2005) which has the same core code (erv-v6.exe and polsim-v3.exe) as other ERV software, e.g. POLLSCAPE (Sugita, 1994). In order to meet the requirement that the number of sites is at least twice the number of taxa used in ERV analysis (Soepboer et al. 2007), we analyzed sub-sets of three taxa across the seven sites using the same reference taxon (TCT) every time. For example, one such subset combination was TCT, Pseudotsuga, and Pinus. We selected TCT as the reference taxon (i.e. specified that TCT has a relative pollen productivity of 1.0) for several reasons. First, a scatter plot of TCT pollen values and unscaled distance weighted plant abundance is positive and linear (Supplemental Figure S1, available online). Second, TCT has an estimated relative pollen productivity in the middle of the dataset upon ERV analysis with all seven taxa. Lastly, TCT is represented in pollen data at all sites (unlike Abies, Alnus, or Quercus), and is present in vegetation close to the coring site. aRSAP was estimated by plotting the likelihood scores for each distance across all taxa combinations and pooling the results.

Distance weighting and estimation of tRSAP
AGL results were first averaged by the number of plots in each concentric ring and then each ring was weighted using the Prentice-Sugita weighting under stable conditions, which affect parameters C z and n (equations (3) and (3a)). We assumed stable atmospheric conditions because simulation experiments comparing unstable and stable models demonstrate little difference in estimated aRSAP and pollen productivity (Broström et al., 2004). The pollen-specific fall speeds (m s −1 ) of Abies, Alnus, Pinus, Pseudotsuga, and Quercus have been determined in previous work (Supplemental Table S9, available online). For TCT, Stoke's Law (Gregory, 1973) was used to calculate fall speed using the average grain size of each taxon and weighted by relative abundance of the contributing species Port-Orford-cedar and incensecedar (both Cupressaceae family). For subprolate grain Notholithocarpus, major and minor axes were measured from reference slides in UC Berkeley's collections, and then Stoke's Law with Falck's (1927) correction was used. Lastly, we determined the coefficient of determination (R 2 ) of the linear model predicted from AGL dw at distance z ( AGL dw z ) as a function of PAR. The R 2 between PAR and summed AGL dw for each ring distance was plotted against the distance. The tRSAP occurs where the line reaches an asymptote.

PAR-biomass transfer equations
We developed transfer equations to predict taxon-specific contributions to the distance-weighted AGL (AGL dw ) as a function of taxon-specific PAR. Although biomass "predicts" pollen accumulation rates in a functional sense, our aim was to apply calibrated transfer functions to predict biomass in the past. Consequently, we fitted regression lines with PAR values as the independent variable. This reasoning has been used for needle accumulation rate as a predictor of Holocene-era basal area (Blarquez et al., 2011). In this analysis, each lake represented a sample with the source area defined by either aRSAP or tRSAP. We included seven pollen-producing taxa, namely Pseudotsuga, Pinus, Notholithocarpus, TCT, Alnus, Quercus, and Abies, that collectively account for greater than 99% of the pollen-producing trees present in the surrounding landscape. Using the assigned source area distances, we calculated AGL dw for the taxa present at each lake and regressed it against PAR using linear models (see Figures 8 and 9). Specifically, we evaluated three model forms: a linear model with an intercept term and slope term, a linear model with only a slope term, and segmented linear model with one breakpoint. In the linear models with intercepts, the intercept represents the "background" pollen component; because we treated PAR values as the independent variable, these intercepts are negative. So, we included an origin-forced model as an option because negativeintercept models are not biologically meaningful for biomass reconstruction given that very low PAR values would yield negative biomass. We included the segmented model to potentially capture threshold responses in the relationship between AGL and PAR (Muggeo, 2008). We ranked the models by the Akaike Information Criterion for small samples (AICc) in order to compare performance. AICc imposes a stronger penalty on model complexity than AIC and was chosen in order to avoid fitting models which were overly complex given the size of the dataset (Burnham and Anderson, 2002).
To evaluate the uncertainty introduced by the PAR transfer functions, the AGL dw predicted from PAR at each lake (predicted AGL dw ) was compared to the AGL dw calculated from the observed AGL dw . Error was propagated using a resampling method (Crowley et al., 1992). Specifically, we estimated the error in predicted AGL dw for each iteration as a random sample from a normal distribution with the mean equal to zero and the standard deviation equal to the standard error of the regression estimate (SEE) for each taxon. Results were based on 10,000 iterations and reported as means and standard errors of the predicted AGL dw for each lake. Bias between the predicted and observed AGL dw was calculated as:

Chronology
The seven lakes' chronologies were established using at least 20 210 Pb measurements at each site (see Supplemental Table S3-S9, available online for exact number of samples for each core). Blue Lake is shown as an example (Figure 4). The chronologies for Fish, North Twin, Ogaromtoc, Onion, Red Mountain, and South Twin lakes followed the same procedure (Supplemental Figure  S2, available online). The lakes are characterized by rapid sedimentation rates, with rates in the upper sediments in the range of 0.14-0.33 cm year −1 (3-7 year cm −1 ). Therefore, surface samples (upper 0.5 cm) contain pollen from 2018 (collection date) to 2011 at the oldest. Core lithology results are provided in the supplement (Supplemental Figures S3 and S4, available online).

PAR
A group of highly abundant tree taxa contained Pseudotsuga, Pinus, Notholithocarpus, and TCT, which were reflected in high (>2000 grains cm −2 year −1 ) PAR values in most samples ( Figure 5, Supplemental Table S10, available online). For example, Pseudotsuga values were above 5000 grains cm −2 year −1 at all sites except Onion Lake. The highest overall PAR value was Pinus at Onion Lake which exceeded 10,000 grains cm −2 year −1 . High PAR values reflect the Douglas-fir and pine-dominant composition of Six Rivers National Forest. Onion Lake is the only lake situated in the True Fir alliance zone and, unsurprisingly, the Abies PAR value was the highest compared to all other sites (5000 grains cm −2 year −1 ). PAR values for Notholithocarpus and TCT varied across sites and were between 1000 and 4000 grains cm −2 year −1 .
The group of less abundant arboreal taxa included Alnus, Abies, and Quercus which were present in most samples with PARs of less than 2000 grains cm −2 year −1 ( Figure 5). Alnus values were generally around 1000 grains cm −2 year −1 , although values above 2000 grains cm −2 year −1 were observed at Ogaromtoc and Fish Lakes. Abies values were low (<1000 grains cm −2 year −1 ) at all sites except North Twin and Onion Lakes. Although pollen from Alnus, Abies, and Quercus were found at all sites, the taxa themselves were not recorded from the transect sampling at several lakes. This could be due to low abundance such that they were not captured in the survey or due to their absence in the sedimentary basin in which case their PAR contributions are background deposition. Pollen from the "other hardwood" category (defined as willows, buckthorn, hazel, and silk tassel) was detected in trace amounts (<100 grains cm −2 year −1 ).

aRSAP and tRSAP
Using the sub-setting approach for the aRSAP calculation, a coherent pattern was exhibited in the likelihood function scores from model 3. The values were high at short distances and then decreased rapidly until 175 m where they begin to flatten out. For all taxa combinations, we inferred via visual inspection that the curves reached their asymptotes at a distance of 625 m and thus the aRSAP of these lakes is 625 m from the lake shore. The likelihood function scores in relation to the distance from the lake shore are shown for one of the three sub-set examples: TCT, Pseudotsuga, and Pinus ( Figure 6).
Based on tRSAP calculations for the four dominant tree taxa, maximum R 2 values were reached before the maximum distance surveyed (750 m) from the shoreline (Figure 7). The R 2 values for Pinus and TCT were high (>0.75) at only 25 m from the shore and stabilized around 225 m, the tRSAP. The R 2 values for Pseudotsuga and Notholithocarpus continued to improve for some distance from the lake shore. For Pseudotsuga, the tRSAP was 625 m; for Notholithocarpus, it was 525 m. Sample sizes were insufficient  Lake sites are abbreviated: OGA, Ogaromtoc lake; FSH, Fish lake; STW, South Twin Lake; NTW, North Twin lake; ONO, Onion lake; RED, Red Mountain lake; BLU, Blue lake. For the data's tabular form, see Supplemental Table S10, available online.
to estimate tRSAP values for the minor taxa. For these taxa, we used the aRSAP value in AGL dw calculations (i.e. 625 m).

Transfer functions: PAR to AGL dw
PAR was a statistically meaningful and reasonably precise estimator of contemporary AGL dw for most of the pollen taxa present (Figures 8 and 9). Based on the aRSAP distances, the linear model without intercept was the best performing model (∆AIC c > 4.0) for the four most abundant taxa (Figure 8). For these taxa, the nointercept regressions were not only significantly better than the null model (p < 0.001) but also explained most of the variation. R 2 ranged from 0.87 for TCT to 0.96 for Pseudotsuga (Supplemental Table S11, available online). The model results for the three less abundant species (i.e. Alnus, Abies and Quercus) were more complex (Figure 9). Based on ∆AIC c , the segmented regression model best fit the Alnus and Abies data. However, both species were rare and found in abundance at only one lake ( Figure 5). The existence of this one abundant point exerts extraordinary leverage in the segmented regression. To avoid relying on a single point in these two transfer functions, we used the second-best regression model. For Alnus, it was a linear model; for Abies, it was a linear model without intercept (Supplemental Table S11, available online). For Quercus, none of the regression models were superior to the null (Figure 9, Supplemental Table S11, available online), so we used the mean and standard error to predict Quercus contribution to AGL dw estimates for each lake. We recalculated the biomass transfer functions using the tRSAP weighted AGL dw estimates for all taxa. Both the functional forms and fits were similar to aRSAP-based results (Supplemental Table  S12, available online). However, the coefficients varied with changes in the source area distance.
The transfer functions based on aRSAP distances provide robust means to estimate contemporary AGL from PAR (Table 3). The coefficient of variation (COV) in predicted AGL dw ranged from 13% to17% for six lakes with Ogaromtoc being the exception with a COV = 24%. The standard error of the estimate varied little among lakes and averaged 32 Mg ha −1 . In terms of accuracy the relative root mean squared error (rRMSE) between predicted and observed AGL dw was 9.2%. There was a small tendency for predicted AGL dw to overestimate the observed. The mean bias was 4.8% with two lakes, Red Mountain and South Twin, contributing the most to the positive bias ( Table 3). The predictions of AGL weighted using tRSAP distances (Table 4) tended to less accurate (rRMSE = 12.7%) and more biased (10.1%).

Source areas of pollen: aRSAP and tRSAP
Calibration of pollen-vegetation relationships is only effective when the scale of the vegetation sampling is close to or exceeds the scale of the relevant source area of pollen (Bunting et al., 2004). Therefore, being able to specify the source area of pollen in a given basin is an important step towards quantitative reconstruction of past vegetation (Hellman et al., 2009;Sugita et al., 1999). A primary aim of this work was to understand the spatial extent represented by the pollen assemblage. We addressed this aim by determining the assemblage-level relevant source area of pollen (aRSAP) obtained from pollen percentage data and ERV analysis and comparing those estimates with the taxon-specific source area of pollen (tRSAP) for four main taxa. Both metrics estimate the extent of vegetation that requires surveying for a subsequent reconstruction step but are seldom compared. aRSAP values have been estimated for lakes similar in size to those presented here (i.e. 100 m radius), in different settings including simulated landscapes. Reported aRSAP values have  ranged from: 300 m in a simulation of a hemlock-hardwood forest in the US (Sugita, 1994), to 800-1000 m in a simulation of spruce forest in Sweden (Sugita et al., 1999), to 1700 m in varying landcover types in Denmark (Nielsen and Sugita, 2005), to 1500-2000 m in semi-boreal forests of Estonia (Poska et al., 2011), and to 2200 m in the upper Tibetan Plateau (Wang and Herzschuh, 2011). Within this list, all aRSAP estimates were derived from Prentice-Sugita-Sutton distance-weighted models and are thus comparable to our estimate. Our aRSAP value of 625 m falls in the range (300-2200 m), though on the small end.
The aRSAP is unique to a given set of lakes and is sensitive to numerous factors such as lake size and basin shape (Sugita, 1993), vegetation patch size (Broström et al., 2005;Sugita, 1994), vegetation patterns (Bunting et al., 2004), and taxa spatial distribution (Hellman et al., 2009). For example, aRSAP values tend to increase with landscape openness defined as the extent of the vegetation cover in the sedimentary basin. For example, the aRSAP for small ponds in a closed forest was simulated to be 300 m (Sugita, 1994) and empirically verified by Calcote (1995), whereas the aRSAP for small ponds in an open Swedish landscape was 1000 m Figure 9. The relationship between distance-weighted aboveground live biomass (AGL dw ) and pollen accumulation rate (PAR) for Quercus and Alnus at the seven lake sites in the Klamath Mountains. The line represents the intercept of the null model. The relevant source area of pollen was defined as a circle with a radius of 625 m from the centroid of the lake. For details on the linear model, see Supplemental Table S11, available online. Figure 8. The relationship between distance-weighted aboveground live biomass (AGL DW ) and pollen accumulation rate (PAR) for five of the pollen taxa present at the seven lake sites in the Klamath Mountains. Lines represent linear regressions forced through the origin. The relevant source area of pollen (aRSAP) was defined as a circle with a radius of 625 m from the centroid of the lake. Note that the scales change for each pollen taxa. For summaries of the linear models, see Supplemental Table S11, available online. (Note: Although biomass "predicts" pollen accumulation rates in a functional sense, our aim is to eventually apply calibrated transfer functions to predict biomass in the past; thus, we fitted regression lines with PAR values as the independent variable.) (Sugita et al., 1999). However, expectations based on landscape openness can be complicated by vegetation heterogeneity. Higher vegetation diversity and complex spatial distribution of taxa are associated with larger aRSAPs (Hellman et al., 2009). The presence of rare taxa in a landscape can also increase the aRSAP, other factors being held constant (Bunting et al., 2004). For example, Commerford et al. (2013) observed the effect of rare taxa empirically: small lakes in a "very open" grassland in Kansas had a large aRSAP of 1060 m, which they attributed to scattered tree taxa in the tallgrass prairie.
The contemporary forests around our lake sites are dense, closed, and heavily dominated by Douglas-fir (Skinner et al., 2018). Taxa like black oak (Quercus), white alder (Alnus), and white fir (Abies) are present but not common. These rare taxa in the area contributed little to the overall biomass (2.3%) but make the landscape more heterogeneous. This heterogeneity can result in a larger aRSAP than if there were no rare taxa present. All else being equal, longer distances from each sampling site are required to get a sufficient cover of all taxa within the landscape to reach the regional average. These greater distances produce larger aRSAP estimates (Hellman et al., 2009). tRSAPs have been estimated at small lakes and ponds. For example, the tRSAP for Pinus was 200 m from the lakeshore in northeastern Germany (Matthias and Giesecke, 2014), and other tRSAP values in that study ranged from 50 m (Quercus) to 300 m (Fagus) to 1000 m (Betula). Jackson (1990) found small tRSAP estimates from ponds in New York: Acer (<20 m), Betula (>1000 m), Fagus (>1000 m), Picea (<100 m), Quercus (>1000 m) and Pinus/Tsuga (< 500 m). In this study, the tRSAP value for Pinus was 225 m, a near match to Matthias and Giesecke (2014) and comparable to Jackson (1990). The other tRSAP values in this study ranged from 225 m (TCT) to 525 m (Notholithocarpus) to 625 m (Pseudotsuga). Like Matthias and Giesecke's results, the tRSAPs are inconsistent with expectations based solely on the respective fall speeds of the taxa. For example, Pinus has one of the assemblage's lowest fall speeds and was expected to travel longer distances and have a large tRSAP; in fact, it had one of the shortest tRSAPs.
Unexpectedly small source areas of highly dispersible taxa have been observed in simulated landscapes (e.g. Betula, Sugita, 1994) and have been attributed to vegetation patterning. The estimated RSAP reflects the minimum distance at which the regional vegetation composition is attained. For example, if Betula is uniformly spread in a forest, the regional distribution signal of Betula will be captured closer to the sampling point than in a forest where Betula is heterogeneously spread across the forest. In this case, tRSAP reflects the distance at which regional vegetation composition is reached, instead of being predominantly controlled by the taxon's pollen dispersal ability and depositional properties (Sugita, 1994). The vegetation patterns in the Klamath area are complex and heterogeneous (Skinner et al., 2018). Within the sampling area, Douglas-fir (Pseudotsuga) is the dominant species with large amounts of tanoak (Notholithocarpus) at most lake sites, with pines (Pinus) intermixed and some cedars (TCT). The small "patches" of pines and cedars within a Douglas-fir dominant overstory could effectively shrink the tRSAPs of Pinus and TCT, following the logic presented in Sugita (1994). Thus, the finding of relatively small tRSAPs for Pinus and TCT, despite their fall speeds, and relatively large tRSAPs for Pseudotsuga and Notholithocarpus, aligns with the study area's vegetation patterning.
Our estimated aRSAP (625 m) and tRSAP values (all 625 m or less) suggest consistent, though not identical, interpretations of the source area of pollen. Both estimates indicate that the pollen record "senses" a local view of about the same area of the surrounding vegetation. Given that vegetation surveying must meet or exceed the scale of the relevant source area of pollen for quantitative reconstruction (Bunting et al., 2004), vastly different aRSAP and tRSAP estimates would potentially be consequential. If, for example, we had estimated an aRSAP << tRSAP, it would Table 3. A comparison of observed to predicted distance-weighted aboveground live biomass (AGL dw ) for each lake site using the assemblagelevel relevant source area pollen (aRSAP) estimates. Predicted AGL dw is the mean from 10,000 resampling iterations; Standard Error is the standard deviation of the 10,000 samples; COV is the coefficient of variation (Standard Error/Predicted AGL dw ); Bias is the percent difference between predicted and observed AGL dw . Predicted AGL dw is the mean from 10,000 resampling iterations; Standard Error is the standard deviation of the 10,000 samples; COV is the coefficient of variation (Standard Error/Predicted AGL dw ); Bias is the percent difference between predicted and observed AGL dw . imply that our assemblage-level view was in some way blind to taxa in the assemblage, and thus missing important landscape patterning or other features of the area from which pollen originated.
On the other hand, if we had estimated an aRSAP >> tRSAP, it would imply the subsequent reconstruction represents a larger area than is potentially being recorded by the pollen system. This consistency between the aRSAP and tRSAP results was reflected in the similarity of the AGL dw reconstructions (Tables 3  and 4). On average, observed AGL dw for each lake was 10.1 Mg ha −1 (5.2%) larger using aRSAP with the differences ranging from 2.6 Mg ha −1 (1.1%) larger at North Twin Lake to 22.2 Mg ha −1 (10.3%) larger at Onion Lake. The differences in terms of predictive ability were equally modest with aRSAP estimates producing somewhat more accurate and less biased results.

The potential of calibrated PAR as a bioproxy
Establishing the relationship between contemporary biomass and modern PAR values is contingent upon obtaining accurate sedimentation rates in cores. We are confident in our estimated sedimentation rates for two key reasons. First, we used a state of the art, robust Bayesian model to develop age models from 210 Pb dates (Aquino-López et al., 2018). Our results showed low uncertainty in the modeled ages in all cores, particularly in the top 20 cm. Second, we were able to compare our upper sedimentation rates representing the last decade to estimates from two of the same lakes (Ogaromtoc and Fish lakes) that were collected in 2008 and 2009 (Crawford et al., 2015). We found similar sedimentation rates in the upper sediments: 2.0-4.0 mm year −1 compared to 2.0-3.3 mm year −1 . Our modern PAR values are also in agreement with PAR values from the youngest sediments in Crawford et al. (2015).
The ultimate goal of this research was to assess whether PAR be used to predict distance-weighted biomass for major tree taxa in the Klamath area, and therefore generate models suitable for reconstruction of past biomass dynamics. The fact that contemporary pollen influx is a reasonably precise predictor of contemporary distance-weighted AGL at these sites suggests that PAR can be used to infer changes in plant biomass for these sites. But even with apparently statistically sound modern models, it may not be reasonable to apply the models for reconstruction in all contexts.
In an ideal situation, the calibration dataset would include sites with a wide range of population sizes of the main taxa to allow any time in the fossil record to be reconstructed. Our model had less skill in estimating low levels of forest biomass because we were unable to find lake sites that met our selection criteria and supported sparse forest cover. Other modern quantitative vegetation reconstruction models have been restricted at the upper end of the calibration. Trees growing in dense forest stands produce less pollen than an exposed tree in a field, which suggests that increased forest density could result in reduced net pollen productivity (Andersen, 1970;Faegri and Iverson, 1989;Feldman et al., 1999). For example, Blarquez et al. (2011) found that the relationship between needle accumulation rate and forest basal area tended to saturate above 40 m 2 ha −1 for conifer-dominated sites. However, despite the high biomass-density of the contemporary forest at our sites (Knight et al., 2020), there was no evidence of saturation in the PAR-biomass functions for the major taxa. Even at the maximum PAR values, the biomass values increase at pace following the log-linear fits (Figure 8).
Long-term PAR records from lakes in the area provide insight into time periods where our calibrated models will be able to capture past conditions. Comparable taxa-specific PAR values from lake sites in the region were only available for Pinus, and they suggest time periods of agreement with our Pinus PAR measurements and our total Pinus PAR-AGL model, which covers a range between 1500 and 11,000 grains cm −2 year −1 . For example, Briles et al. (2008) reported Pinus PAR between 2000 and 8000 grains cm −2 year −1 at Sanger Lake in the western Klamath Mountains over 15,000 years BP. Likewise, a 3000-year PAR record from Fish Lake (a lake also examined in this study) shows agreement with our total Pinus PAR range during some time periods. Fish Lake's record shows temporal variability in Pinus PAR values between 2000 and 9000 grains cm −2 year −1 during the last two hundred years (Crawford et al., 2015), which falls within our calibration. Lastly, total PAR values measured at eight lakes in the Klamath area since 15,000 BP range from 2000 to 15,000 grains cm −2 year −1 (Briles et al., 2011) and are similar in size to lakes in this study and have a dense surrounding forest, although they are located in the white-fir vegetation zone.
In addition to selecting a range of forest conditions, researchers undertaking similar efforts will need to consider the number of lakes needed for statistical soundness for the calibration. The seven lakes presented here appear to have been sufficient to build robust models in terms of low coefficient of variation (Table 3), but it may be difficult in other locations to find enough suitable lakes using consistent selection criteria. If reconstructions of continuous Holocene-length biomass records are sought, using a high number of lakes has the downside of great expense (from isotopic dating) and labor (from pollen counting), unless accurate automatic classification systems become widespread (Sevillano et al., 2020).
The calibration step we undertook required modern biomass data, which may be difficult to obtain empirically for a large number of lakes or in settings with challenging topography. For example, transects in this study ran 750 m from the shoreline, but steep topography and scree slopes occasionally prevented a complete survey. Because we studied small lakes and needed finely resolved biomass data, sparse inventory data with large geographic extent (e.g. FIA data) were not an appropriate substitute for field surveys. However, FIA data provided essential information regarding the basal area to biomass relationships for the common tree species in the region (Supplemental Table S8, available online).

Limitations of PAR and PVMs
Our results show the utility of calibrated PAR-AGL models for this study, and we have provided a robust process for including uncertainty in PAR-AGL models. However, PAR itself may vary in ways that reduce its value for pollen-based reconstructions in all landscapes. For instance, net pollen deposition can vary spatially and temporally if sediment focusing or pollen redeposition occurs. While studies investigating PAR from modern sedimentary records did not find that redeposition and sediment focusing affected PAR (Giesecke and Fontana, 2008;Seppä and Hicks, 2006), other studies have documented the influence of these factors on PAR (Davis et al., 1984;Matthias and Giesecke, 2014;Odgaard, 1993). Additionally, between-lake differences in PAR values can arise from differences in pollen taphonomy due to basin size or stream inflow (Davis, 1967). Pollen monitoring studies have illustrated another known issue with PAR: the amount of pollen produced can change year to year and is related to the weather conditions of the preceding year (Hicks, 2006). Lastly, one study has implied that PAR may depend on the net primary production of the pollen-producing taxa as well as overall plant biomass (Matthias and Giesecke, 2014). Without long-term pollen monitoring studies across different biomes and accompanying detailed biomass data, true data validation will not be possible.
Pollen transport in mountain environments has been studied in Europe through the European Pollen Monitoring Programme, but, to our knowledge, has not been studied in the mountains of western North America outside of the present work. Several pollen monitoring studies with transects running through multiple vegetation zones in mountainous areas tend to show that pollen from lower forest zones is quite abundant in upper zones, and this effect appears more pronounced when high altitude zones have lower productivity (e.g. the Rila Mountains in Bulgaria, Tonkov et al., 2001). Unlike mountain transect studies, our sites are all within one vegetation zone, therefore reducing the significance of these effects, and we are not studying tree line position. The Douglas-fir dominated conifer forest in the Klamath Mountain is a relatively high productivity zone, and such zones typically show less of an "uphill" effect that impacts tree-line pollen assemblages (e.g. Swiss Alps tree line study, Sjögren et al., 2008).
All PVMs, including PAR-biomass transfer functions, are based on assumptions that may not hold in a changing landscape. It must be assumed, for example, that taphonomic processes filtering pollen in lake sediments are constant over time and among lakes, unless taphonomic biases are precisely quantified (Allison and Bottjer, 2011). Using our method, quantitative biomass reconstruction would also assume that the relevant source area of pollen is constant over time. We estimated the aRSAP of our seven sites as well as the tRSAP of abundant taxa, but these may apply to a landscape arrangement which is unique in the last 3000 years. The present-day high PAR values of Pseudotsuga are not replicated in the fossil pollen record at any other time in three millennia (Crawford et al., 2015), suggesting that the dominance of shade tolerant Pseudotsuga is also not found elsewhere in this time period. Deep-time reconstructions from lakes in this study have shown large changes in vegetation composition due to climate, Native land-use, fire disturbances, and, in the last century, fire suppression. In response, we anticipate that the relevant source areas of pollen will expand and contract over time. Because the spatial patterns of past vegetation are usually unknown, it is difficult to estimate past relevant pollen sources areas. However, the Multiple Scenario Approach (MSA, Bunting and Middleton, 2009) offers insight on this issue. Under MSA, hypothetical landscapes are created via rules for plant placement and environmental parameters, and then pollen assemblages are simulated and compared to known pollen signals to identify probable past vegetation mosaics. Another experimental method to estimate past relevant pollen source area has been explored through modeling (Hellman et al., 2009) where regional vegetation composition and available pollen productivity estimates are available for multiple sites (Sugita, 2007b). Hellman et al.'s (2009) simulations suggest relatively robust aRSAP estimates of 1000-2500 m for small lakes under hypothetical landscapes from southern Sweden where natural and anthropogenic disturbances have occurred during the Holocene. Such simulations provide a means to test the potential robustness of aRSAP in the Klamath area.

Conclusion
Although methodologically challenging, calibrating PAR-biomass models is an important step towards quantitative reconstruction of past vegetation. Our calibration steps included estimating the spatial extent represented by the pollen system, comparing two estimates of the RSAP, and evaluating PAR-AGL models. We found comparable aRSAP and tRSAP estimates that aligned with expectations given the modern forest's dense, closed conditions. We also demonstrated that PARs of major tree taxa derived from lake sediments are linearly related to distance-weighted AGL, and our PAR-AGL dw models accurately reconstruct modern lake-surrounding biomass. According to PAR values from local and regional lakes sites, our modern models are broad enough to capture a range of forest structures over the last 15,000 years BP. We therefore conclude that our results prove the utility of calibrated PAR-AGL models for quantitative reconstruction of past vegetation.