RGEE on Ubuntu with your Conda environment

  1. Create a Google Earth Engine account: get to the place of where you can login to the coding console. If you use the package to help you get setup with an account, it will not take you through the final step(s). I went through this process while creating a new account, and though there is currently a bug with authentication, this was causing me more headaches than I was realizing.
  2. Setup a conda environment (I used miniconda):
    • create and activate your environment
    • install python
    • conda install numpy
    • conda install earthengine-api==0.1.370
  3. Install the following dependencies if you don’t already have them. You also might run into a bear of a problem with protobuf, protoc, libprotoc etc. If so, activate your conda environment and roll back protobuf to pip install protobuf==3.20.3
    • gdal-bin
    • libgdal-dev
    • libudunits2-dev
  4. Edit your ~/.Renviron file to include the following (or create it if it doesn’t exist):
    RETICULATE_PYTHON="path to your conda env python install" RETICULATE_PYTHON_ENV="path to your conda env"
  5. Install reticulate R package
  6. Install rgee R package
    • use install_ee_set_pyenv() with your environment settings as above
    • ee_Authenticate()

This should get you to the point of having rgee run off your own custom Python environment, with the downgraded version of the Earth Engine API required to authenticate at the time of this writing. A workaround for the bug in initializing GEE is starting your script as follows (bypassing ee_Initialize()):
ee$Initialize(project='your project name')

You will also have .Renviron entries for Earth Engine now. This final step is one I’m unsure of: if you haven’t manually created a legacy asset folder, you may from here be able to run ee_Initialize() and create one. I had created one, and so I got caught in a loop of it saying it already existed while wanting me to create one. So, in short, I’ve never really been able to successfully run ee_Initialize() and the package author is at a point of not even wanting to maintain it anymore. The problem here is that you don’t get this file generated: .config/earthengine/rgee_sessioninfo.txt

So, you can make your own structured as so:

“user” “drive_cre” “gcs_cre”

The path to your credential file is the one you created to authenticate earth engine, somewhere in that folder. (Save a copy elsewhere to to move back into that path, because I’m not sure the file persists between sessions.)

Remote Sensing for Birders

Or: Birding in the era of climate change

In my previous job as a researcher, I modeled hydrology of wetlands, including the playa lakes region. Visiting Kansas this weekend was my first time getting to spend time in that region since that project concluded, but sadly, I got to see the phenomenon we were testing: what drought on the landscape looks like. So, I embarked on a quest to find water with tools that are by and large accessible to birders.

I continue my evangelism of Google Earth Engine, which has somewhat quietly changed the game of remote sensing, though maybe more loudly in today’s research landscape: I was an early adopter when it first came out and swooned over the potential. Now, I imagine it has become a much more popular research tool, but I want the floodgates to open for hobbyists too. It is a truly remarkable game changer and uprooted how we always did things (e.g. during my PhD) to process and analyze remote sensing (e.g. satellite imagery) data. These days, you can pull from a library of images straight into your coding environment and mapping output (I still get goosebumps typing that). You can query a compendium of publicly available satellite images, write your code to analyze it, and display the output in one browser tab…oh, and did I mention the excellent documentation and gobs of available coding examples?

The Quarry

On this mission, I was specifically looking for Clark’s grebe (Aechmophorus clarkii): though their habitat requirements are flexible in migration, they tend to like big wetlands or lakes with deep, persistent water. These proclivities governed what I was able to do to identify suitable water bodies.

The Solution

I devised a script (below) to leverage Sentinel imagery to calculate normalized difference water index (NDWI) which is a well-known indicator of water on the landscape. You can copy the whole script right into your code editor and run it as-is, but I’ll walk through in general what it does in the remainder of the post. I also co-opted a fair number of code chunks from the Google Earth Engine examples, so you’ll find some of these functions (e.g. the cloud mask function) elsewhere in their documentation. I commented out some map layers as well, but you can play with adding them back in.


First, I filtered the imagery down to the state of Kansas to make the analysis relevant and manageable. Given that playas are filled by rainwater and the problem was drought, I could be a little more focused in my image gathering: I looked at the 2 weeks prior to my arrival and kept images with < 30% cloud cover. (If you were looking at a rainier area or season, you would have to expand your time window and possibly adjust your cloud filter threshold to make cloudless composites of your area.) After masking out cloudy pixels, I calculated NDWI for each image (taken at a specific time and over a specific footprint).

I took the median over time for all the footprints to create one composite image. Of course, how you average and filter needs to be considered as you vary your time window: the longer your time window, the more possibility for e.g. a signal relevant to a given time period to be swamped out by conditions across the time span. I chose a threshold of only keeping pixels greater than or equal to 0.1 NDWI because this becomes fairly reliable for representing open water. I was also looking for a species that likes open and deeper water, so a conservative threshold represented suitable water bodies.

There’s an optional resource in this example that needs to be requested if you want to use it (i.e. is not freely downloadable): spatial range map files for birds of the world. If your data request is approved, you can pull out the files for your species of interest. I uploaded the shape file for my species of interest as an asset (clark) and imported it into my script by clicking the arrow that appears when you hover over it in the Assets tab. I just used it to display the range boundaries on the map. I commented out all of those lines in the script below, so if you don’t have a range map shape file (i.e. a polygon) to put in its place, just keep the script below as is:

var table = ee.FeatureCollection("TIGER/2018/States"),
    //clark = ee.FeatureCollection("users/gorzo/clark"),
    Sentinel = ee.ImageCollection("COPERNICUS/S2_HARMONIZED");

 * Function to mask clouds using the Sentinel-2 QA band
 * @param {ee.Image} image Sentinel-2 image
 * @return {ee.Image} cloud masked Sentinel-2 image
function maskS2clouds(image) {
  var qa = image.select('QA60');

  // Bits 10 and 11 are clouds and cirrus, respectively.
  var cloudBitMask = 1 << 10;
  var cirrusBitMask = 1 << 11;

  // Both flags should be set to zero, indicating clear conditions.
  var mask = qa.bitwiseAnd(cloudBitMask).eq(0)

  return image.updateMask(mask).divide(10000);
var spring2023 = spatialFiltered.filterDate('2023-04-01','2023-04-15')
                  // Pre-filter to get less cloudy granules.
                  .filter(ee.Filter.lt('CLOUDY_PIXEL_PERCENTAGE', 30))

var rgbVis = {
  min: 0.0,
  max: 0.3,
  bands: ['B4', 'B3', 'B2'],

/* Map.addLayer(spring2023.median(), rgbVis, 'RGB');
var empty = ee.Image().byte();
var outline = empty.paint({
  featureCollection: clark,
  color: 1,
  width: 3
Map.addLayer(outline, {palette: 'purple'}, 'AOI'); */

// Function to calculate and add an NDWI band
var addNDWI = function(image) {
return image.addBands(image.normalizedDifference(['B3', 'B8']));

// Add NDWI band to image collection
var spring2023 = spring2023.map(addNDWI);

var pre_NDWI = spring2023.select(['nd'],['NDWI']);
var NDWImg = pre_NDWI.median();
var ndwiMasked = NDWImg.gte(0.1);

ndwiMasked = ndwiMasked.updateMask(ndwiMasked.neq(0));

var vectors = ndwiMasked.reduceToVectors({
  geometry: ks,
  crs: NDWImg.projection(),
  scale: 234,
  geometryType: 'centroid',
  eightConnected: true,
Map.addLayer(ndwiMasked, {palette:['ff0000']});

  collection: vectors,
  fileFormat: 'KML'

In the example map visualization left in the code, water bodies identified by this analysis are in red, which helped me pick them out as I was poring over my laptop in power saving screen mode on the go (yes, I was fiddling with this on the plane…and Southwest Wi-Fi had enough juice for me to play!). So, how do you then find these in the real world, assuming you don’t have your laptop propped on the dashboard of your rental car (I would never…)? I turned my raster layer (i.e. grid data, in this case the pixels of an image) into a vector layer (commonly known as shapes). The vectors variable in the code block is the output of a function that connects those water pixels into features by an 8 neighbor rule, and in this case, outputs the geographic centers of those features. At the end of the script, I commented the code out to save a KML of those points to your Google drive, but you could export them into a variety of formats. From there, you can get creative with optimizing a route between them (maybe that will yet be the subject of a later post).

Herein lies a bit of a limitation: that’s a costly operation, so you can only run something like that at a given scale, and in my case I found the magic number to be 166. So, features below that size are not identified (zoom into the map to see red water bodies not marked by a corresponding dot). That was sort of OK with me though, because the species I was looking for likes bigger water bodies. It would be limiting for looking for smaller wetlands though.

So how did it go? Sadly in all of my trip prep, I started this endeavor a little late and had to dust off the cobwebs in transit. I ended up trading off some geek out time for exploration. But, here was an intriguing looking wetland I found on the output image, and stopped at on my way back (for what it’s worth, too small to earn a point on the statewide version of the analysis I ran).

Maybe in keeping with the small size, no Clark’s grebes (though their stopover habitat requirements are much more flexible in migration) but plenty of cool waterfowl piled up at this little wetland. Now, armed with some tools locked and loaded, I’m ready for my next trip out west!

GDAL Rasterize

In this example, I had a shape file I wanted to use as a mask. However, the features were those I wanted to keep. In other words, I wanted to keep all the pixels that had values (i.e. not “no data”) that intersected with the shapes, and set those outside the shapes to 0. Then, I needed to set the no data value to 0, and only keep “clumps” of pixels w/values that were 88 or more.

gdal_rasterize -i -burn 0 mask.shp input.tif
gdal_translate -a_nodata 0 input.tif output.tif
gdal_sieve.py -st 88 -8 output.tif

Voila! (The mask can also be a geo package, with the layer specified using the -l option.)

Ratified Raster Brick

This is a script that takes selected fields (selected_fields) from a shape file and turns each into a raster. If the field is character-based categories, it ratifies the raster. At the end, you have a list of rasters. You can then bind these into a brick or perform other list-wise operations on them.

# The input file geodatabase
fgdb <- "land.gdb"

# Read the feature class
fc <- st_read(dsn=fgdb,layer="all_land")
forest <- read.csv("IDs_to_keep.csv")
world.map <- fc[fc$gridcode %in% forest$gridcode,]

ext <- extent(world.map)
gridsize <- 30
r <- raster(ext, res=gridsize)

selected_fields <- c("Own", "Cover", "Secondary")

test <- lapply(selected_fields, function(x) {
   myraster <- fasterize(world.map, r, field=x)
   names(myraster) <- x
   if(is.factor(world.map[,x][[1]])) {
      field <- x
      myraster <- ratify(myraster)
      rat <- levels(myraster)[[1]]
      rat[[field]] <- levels(world.map[,x][[1]])[rat$ID]
   levels(myraster) <- rat}

Weekend GIS Fun: Part 3

I had been referencing stops from the original north shore geology field trip guide, until I found an updated version of the guide in a book (embedded below p.132-144)!

In this chapter, he lists the military grid reference system (MGRS) coordinates (Green 2011) so I entered them into a spreadsheet. I then downloaded it as a comma separated values (CSV) sheet to convert it to a spatial object. The main highlight of this script is how I dealt with different MGRS zones for points in the same data frame:

plot_locations_HARV <- read.csv("geology.csv")
co <- st_read(dsn="counties_in_minnesota", layer="mn_county_boundaries")
pts <- lapply(1:nrow(plot_locations_HARV),function(x){
   point <- plot_locations_HARV[x,]
   epsg <- ifelse(as.integer(gsub("[^[:digit:]]","",point$utm))==15,2027,2028)
   pt <- st_as_sf(point, coords = c("E", "N"))
   this_crs <- st_crs(paste0("+init=epsg:",epsg))
   st_crs(pt) <- this_crs
   pt_project <- st_transform(pt, st_crs(co))
df <- do.call(rbind, pts)
st_write(df, dsn="geology", layer="geology", driver= "ESRI Shapefile")

I assigned the numeric part of the MGRS zone as the Universal Transverse Mercator (UTM) zone. From there, I did my best assigning a coordinate reference system (CRS) to the points accordingly. I ultimately wanted it in the same projection as the MN county layer.

I’ll resume my efforts to “ground truth” the locations as soon as the conditions are right, and to add some additional points corresponding to details in the stop descriptions. If next weekend doesn’t allow the ice to thaw, that will probably mean in the spring!

Literature Cited

Green, John C., et al. “The North Shore Volcanic Group: Mesoproterozoic plateau volcanic rocks of the Midcontinent Rift system in northeastern Minnesota.” Archean to Anthropocene: Field Guides to the Geology of the Mid-Continent of North America: Geological Society of America Field Guide 24 (2011): 121-146.

Latest Error

I’m getting this error after a 137

/panfs/roc/msisoft/R/3.6.0/lib64/R/bin/BATCH: line 60: 30473 Killed ${R_HOME}/bin/R -f ${in} ${opts} ${R_BATCH_OPTIONS} > ${out} 2>&1

It ends on this line…

fr <- sapply(1:nrow(forest),function(y) fasterize(forest[y,], cr[[1]]))

Since smaller memory processes run without error, I can only assume that storing a vector of larger cropped rasters is the problem. So, onto a velox-based solution…

Best Practices for Big Polygon Data Sets

Some common, useful formats for storing this type of information…

  • CSV
  • GeoJSON/TopoJSON
  • GML
  • KML
  • MySQL/SQLite/SpatiaLite/PostGIS
  • NetCDF
  • SVG
  • VRT
  • ESRI shape file
  • ESRI feature class

It can also be advantageous to abstract data in memory to work with it. I had never heard of GMT or VPF until now but I’m curious about them! A colleague suggested GPKG so I have been looking into that format. I am also reading about the INTERLIS and OGC standards.

I’ve seen advice that database formats are better than file formats for spatial data, and this fits well with another project I’m working on. So, I think I will invest in learning about spatial databases.

GDAL Warp Adventure

I’m working with a colleague on processing a raster, in order to define patches by connectivity rules. The input raster was a habitat suitability model, where…

  • 1 = suitable habitat
  • 0 = unsuitable habitat
  • 255 = background “no data” value

So, we needed to ultimately make the “1s” into patches. To do this, we needed to set all of the other values to “no data.” It looked something like this…

gdalwarp -srcnodata 255 -dstnodata 0 input.tif output.tif
gdal_translate -a_nodata 0 output.tif unary.tif

In the 1st line, we changed all the 255’s to 0’s. In the 2nd line, we changed all the 0’s to “no data.” So, the only data value left in the file was 1.