GIS for Fun: My Digitized BBS Route

The North American Breeding Bird Survey (BBS) was meant to be a stratified random sample of the continent’s terrestrial bird populations, so we could monitor the pulse of populations and trends. When I went to the American Ornithologists’ U/COS conference (back when it was still called that) in 2015, I presented in a symposium dedicated to BBS. The opening talk was about the history of BBS, which helped give me some historical context about the data set. It added a dimension to my understanding of the time periods of ecological research. The data set was developed before landscape ecology really took hold in the U.S., and along with it an explicit awareness of spatial analysis.

The spatial data we have for the BBS is what I believe to be digitized route paths from old highlighted paper route maps. As such, it’s rife with errors: I’ve found routes in the layer of many variable lengths, but some egregiously too long. Being that it’s a volunteer effort, it’s at times hard to get clarification on what’s going on here. I’ve done and continue to do analyses based on these route paths, so their accuracy matters to me. Side note: if the United States Geological Survey – Patuxent would have me, I’d be happy to travel the country in something like the “Google street view car” and help digitize these routes during BBS season! Just putting that out there.

I decided to pull up my route in GIS to see what calculated stops would look like, by first placing stops every 0.5 mi. (probably the more correct method) and then placing 50 equally spaced stops along the route. Since the route is a little long, the 50 stops end up probably being more spread out than they should be. Also, as route runners know, it’s not an exact science, in that you might need to move a stop up to 0.1 mi. for safety concerns.

Anyway, when I synced this up to road data, I found that the routes were significantly off from the latest road path data, probably because the latest version is > 4 years old now. The road data I’m using is quite intensive, and I’m trying to figure out what to do next. How will I line up the paths, and will this little experiment even matter in the long run?

My 1st Breeding Bird Survey Route!

It feels appropriate that I made my 1st contribution to the BBS today, as a form of conservation patriotism! It’s one of the longest-running and most extensive citizen science efforts in the world, and I was happy to kick off my Independence Day contributing to this data set. The timing also comes on the heels of the death of its founder earlier this year, so I feel like I’m carrying the torch, along with thousands of other volunteers nationwide. (Not to mention, my participation was long overdue because my entire dissertation was based on it.)

Bayesian Hierarchical Model

These days, the Bayesian hierarchical modeling framework is more-or-less standard for analyzing BBS data. I wrote a collection of posts about gathering BBS and remote sensing data, summarizing and importing into R, and thereby getting the data frame in shape for an analysis. From there, I would clean up the work space such that only the data frame for the analysis is left.

rm(list= ls()[!(ls() %in% c('vars'))])

Here’s an  example of a formula (“test”), and how to fit it using the package R-INLA. In this model, in keeping with the example that I linked to that got us here, “MN” (mean % tree cover) is our variable of interest, and the rest of the variables are nuisance effects. In this model, year and rteobs are random effects (hyper-parameters), while yearfix, firstyear and MN are fixed effects (parameters).

test = abundance ~ f(Year,model="iid") + yearfix + f(rteobs, model="iid") + firstyear + MN
library(INLA)
model = inla(formula=test, data=vars, family="gpoisson", control.compute = list(waic=TRUE), verbose=TRUE)

The generalized Poisson distribution is used to incorporate over-dispersion, and I chose to calculate Watanabe-Akaike Information Criterion (WAIC) in case I was to compare this to another model. To examine the output once the model has stopped running…

summary(model)
names(model)

The latter shows all that was generated in this model run. One I find useful is summary.fitted.values, which are the modeled values of the response variable. If you want to predict values, you can add or replace response variable values in your data frame with NA.

Master Post: BBS 10-Stop Data

This one may not be as step-by-step, because the GIS layer of BBS routes I have may not have been published yet. Nonetheless, here’s what I did.

  1. Cleaning up the spatial data and splitting routes
  2. Assigning segment order once the routes are split
  3. Summarizing spatial covariates by segment
  4. Determining which segments can be considered the same
  5. Importing BBS and spatial data into R at the segment level

The final step looks similar to the route-level tutorial, but with pieceID being the variable to match on. Set “bird” to the AOU code of your species of interest.

dataperspecies <- subset(BBS[which(BBS$Aou == bird),]) 
vars$abundance <- dataperspecies$count[match(vars$pieceID, dataperspecies$pieceID)]
vars[is.na(vars)] <- 0

assess <- data.table(vars)
whattodrop <- assess[,list(abundance=sum(abundance)),by='rteno']
droprtes <- whattodrop$rteno[whattodrop$abundance==0]
perspp <- vars[!(vars$rteno %in% droprtes),]
perspp$Year <- as.integer(perspp$Year)
perspp$yearfix <- perspp$Year
perspp <- perspp[perspp$RunType > 0,]
#vars <- vars[vars$RPID < 102]
perspp <- perspp[c("Year","MN","firstyear","rteno","ObsN","abundance")]

Here is the full script: bbs_10stop_species

A Thought on Downscaled BBS

The BBS was designed to capture avian populations in natural areas, so if the land surrounding a portion of the route was developed, historically the route path was altered and given a new number. The problem is, that new numerical designation was not standardized (though often, it’s 100 + the original route number).

In wanting to maintain spatio-temporal continuity, I thought that the portions of the routes that were still spatially congruent should be considered the same segment. This serves my random effect of location, such that an observation isn’t considered to come from a different location just because the route number is different (if the “old” and “new” segments spatially coincide). I use “coincide” loosely, in that I considered original and +100 route segments to be the same if they intersected. You can measure the degree of overlap and incorporate that if you choose. I made a shape file of intersecting duplicated segments, and imported it into R to adjust the location designations.

repeats <- read.dbf("intersecting_duplicated_segment.dbf")
repeats$new_rteno <- repeats$rteno - 100
repeats$new_route <- paste(repeats$new_rteno,repeats$segment,sep="_")

Now, the newer segments have the same numbering as the original routes. I match the new segments with the original route numbers to the BBS data, and then replace the new route segment numbers with the original numbering.

BBS$new_route<-repeats[match(BBS$route_piece, repeats$route),"new_route"]
BBS$new_rteno<-repeats[match(BBS$route_piece, repeats$route),"new_rteno"]
BBS[!is.na(new_route)]$route_piece <- BBS[!is.na(new_route)]$new_route
BBS[!is.na(new_route)]$rteno <- as.integer(BBS[!is.na(new_route)]$new_rteno)

The new route segments that intersect the original routes now have the old route numbering, and thus counts from these segments appear in the code to to have come from the same routes as the original.

Here’s the full script: bbs_10stop

Down-scaling BBS

As my several recent posts have mentioned, I’m working with BBS spatial data (i.e. route paths and start points). I wanted to “downscale” by splitting routes into smaller segments and looking at data sub-route level. Specifically, I needed to split routes into fifths. I needed to know what order the segments were in from the start of the route to match the finer-scale data.

Here’s how to do it…

  1. Break up the accounted for routes (or maybe even raw data set?) into “good” and “bad” routes depending on if the start is nearest a segment for a matching route
  2. Work with “good starts” and “good routes” (the easiest!)
    1. identify the segments closest to the start point as the first segment of the route
      1. run near on the start points to find out which segment in the master file is closest
      2. use the near_FID to join with the FID of the route segment layer
      3. export the resultant start segments into their own layer (number them 1)
      4. remove join
    2. delete the start segments from the master file of all the broken up segments of the route (select by location: identical to)
    3. use select by location (touches the boundary of) to find the next closest line segment to the start in the master file, and number them 2
    4. repeat 2-4 for the remaining segments, and number them accordingly
  3. Work with bad starts: once the good routes are out of the data set, is the bad point now closest to a remaining matching segment?
    1. break up those newly matched lines
    2. put leftover points into yet another file
  4. Look at remaining points to see if any more lines can be put to use

Given the numbering scheme recommended in the steps above, your segments should be numbered 1-5 on each route. I buffered those segments and averaged tree cover within  each segment buffer. In order to match my segment numbering with the rest of my R code, I renamed the segments.

vcf_subroute <- read.dbf("landcover/bbs_segment_buffers.dbf")
vcf_subroute$interval <- paste("count",vcf_subroute$segment,"0",sep="")

The remainder of the code is similar to that described in the route-level MODIS post, just with the added level of the interval.

vcf_subroute <- vcf_subroute[,c("rteno","interval",grep("[[:digit:]]", names(vcf_subroute), perl=TRUE, value=TRUE))]
vcf_subroute <- melt(vcf_subroute,id=c("rteno","interval"))
vcf_subroute$variable <- as.character(vcf_subroute$variable)
vcf_subroute$Year <- str_extract(vcf_subroute$variable,"[[:digit:]]+")
vcf_subroute$var <- sapply(strsplit(vcf_subroute$variable, "[[:digit:]]+"),"[[",2)
vcf_subroute$variable <- NULL
vcf_subroute <- dcast(vcf_subroute,rteno + interval ~ var,fun.aggregate=mean)
vcf_subroute$rteno <- as.character(vcf_subroute$rteno)

The full script, to this point: bbs10vcf

Cleaning Up BBS Routes

When I downloaded the BBS route shape file (apologies for not linking, I’m not sure if it’s publicly available) I had some work to do. I wrote down the steps I used in cleaning up the layer, and then splitting into fifths.

  1. One line per route. Great! set aside. (Still a note of caution, some BBS routes are too long. When I asked about this, I was told they were digitization errors but weren’t immediately fixable.)
  2. More than one line per route: I needed to only be sure to dissolve segments with matching route identifiers that touched. This is where iterative spatial joins and dissolves came in…
    1. spatial join: what is touching each segment?
    2. do the route identifiers match? if so, keep those…get rid of the ones that don’t
    3. join field to get the target ID’s, which end up making “connectivity groups,” to dissolve by
  3. repeat steps in #2 until there’s only one poly-line per connectable (zero distance) route parts
  4. this leaves disconnected fragments of things that don’t touch (decide what to do with those on a case-by-case basis)

Now, you might want to look at the lengths of the resulting routes and examine ones that are suspiciously long, and decide what to do with them in your analysis.

  1. depending on the geometry of the lines, the nice “split line” tool may not work as expected. I made a topology to figure out where splits had gaps. ones that had gaps were removed from the set
  2. once split, was there mysteriously more or less than 5 segments per route? if so, toss
  3. for the good splits, I matched available good starts to determine order of the segments
  4. I salvaged the remaining “split-able” lines
Adventures in fixing the rest: I tried to be ambitious and work with the problem lines, to little avail. I used “trim lines” to get rid of the short spurs, and that seemed to have worked. Yet, it just didn’t fix all of them.

“Master Post” of BBS Tutorial

I made a series of posts of how to get BBS data and import it into R, as well as an example of predictor data for a model, and how to get that into R. Here I’ll collect it all into a step-by-step by linking to those posts.

  1. How to get BBS data
  2. Importing BBS data into R
  3. Getting predictor data (in my case, remote sensing imagery)
  4. Summarizing images for BBS routes
  5. Making a data frame for a single species analysis (zero-filling)

Here’s the total script at the end of all the posts: bbs_species

To get started, pick a Bird Conservation Region of interest, and a species. Take the BCR specification line out of comment (remove the #) and specify the number(s) of the BCR(s) within the empty vector. Set the “bird” variable to be the AOU code of your chosen species from the species list (if you choose from  that link, remember to remove the leading zero). For instance, to look at eastern wood-pewees in the Blue Ridge…

BBS <- BBS[BBS$BCR %in% c(28), ]
bird <- 4610

Another thought is how to handle nuisance effects. I consider route and observer to be linked, so that I concatenate them for a random effect.

vars$rteobs <- paste(vars$rteno,vars$ObsN,sep="_")

You now have a data frame of BBS data for a single species matched up with remotely sensed data (i.e. ready for an analysis). Half the battle is the data acquisition, cleaning and shaping! 🙂

Now try an example analysis.

BBS Data for a Single Species

BBS data is essentially aggregated point count data, that in its raw form has a row for each species observed at each route in each year. This means that if a species is not recorded in a  given year, it just doesn’t have a record. I need to add those zeros back in, such that if a species is missing from a route, it needs a row with a zero. For my purposes, it makes the most sense to consider routes only where a given species has been recorded at some point in the history of the route (because I’m supposing its presence/absence depends on some annual variation in a predictor variable).

Leading up to this post, we’ve been shaping and reshaping BBS data and predictor variables, and the last post left off with matching up some remote sensing data to BBS data. The variable “bird” is to be set to an AOU number corresponding to the species list we imported prior.

dataperspecies <- subset(BBS[which(BBS$Aou == bird),]) 
vars$abundance <- dataperspecies$count[match(vars$rteno, dataperspecies$rteno)]
vars[is.na(vars)] <- 0

This is to determine total counts of the species on routes, and to toss routes where the species has never occurred.

assess <- data.table(vars)
whattodrop <- assess[,list(abundance=sum(abundance)),by='rteno']
droprtes <- whattodrop$rteno[whattodrop$abundance==0]
vars <- vars[!(vars$rteno %in% droprtes),]
vars <- vars[vars$RunType > 0,]
vars <- vars[vars$RPID < 102]

Last but not least, keep only the columns that will be used in your analysis to minimize space used in the R work space. For my models, in addition to the response and predictor variables, I keep the route number, observer, and whether or not it’s the observer’s first year for consideration as nuisance effects.

Here’s the full script: bbs_species