Best Approach for Large Area Zonal Stats

Posted on October 28, 2019October 31, 2019 by Jess

The best solution I found was to chop up the state by county, crop the raster brick to that extent, and import it as a velox object. Then, I used velox extract to get the values.

GDAL Warp Adventure

Posted on October 11, 2019October 25, 2019 by Jess

I’m working with a colleague on processing a raster, in order to define patches by connectivity rules. The input raster was a habitat suitability model, where…

1 = suitable habitat
0 = unsuitable habitat
255 = background “no data” value

So, we needed to ultimately make the “1s” into patches. To do this, we needed to set all of the other values to “no data.” It looked something like this…

gdalwarp -srcnodata 255 -dstnodata 0 input.tif output.tif
gdal_translate -a_nodata 0 output.tif unary.tif

In the 1st line, we changed all the 255’s to 0’s. In the 2nd line, we changed all the 0’s to “no data.” So, the only data value left in the file was 1.

GDAL Polygonize in Action!

Posted on September 3, 2019September 6, 2019 by Jess

#!/bin/bash
for VARIABLE in ./test_export/*.tif; do
   gdal_translate -of GTiff -a_nodata 0 "$VARIABLE" "PPR_no/$(basename "$VARIABLE" .tif).tif"
   gdal_sieve.py -st 2 -8 "PPR_no/$(basename "$VARIABLE" .tif).tif"
   gdal_polygonize.py -8 "PPR_no/$(basename "$VARIABLE" .tif).tif" -f "ESRI Shapefile" "shapefile/$(basename "$VARIABLE" .tif).shp"
   rm "PPR_no/$(basename "$VARIABLE" .tif).tif"
   ogr2ogr -f "ESRI Shapefile" -progress "shapefiles/$(basename "$VARIABLE" .tif).shp" "shapefile/$(basename "$VARIABLE" .tif).shp" -t_srs "EPSG:5070"
done

listofmodisdays=`ls shapefiles/*.shp | cut -d/ -f2 | cut -d- -f1 | uniq`

for i in $listofmodisdays; do
   file="./final/$i.shp"
   for y in $(ls shapefiles/$i*.shp); do
      if [ -f "$file" ]; then
         ogr2ogr -f "ESRI Shapefile" -update -append $file $y -nln $i
      else
         ogr2ogr -f "ESRI Shapefile" $file $y
      fi
   done
done

Parallel Raster Extraction

Posted on August 16, 2019August 20, 2019 by Jess

This is my job submission script for the raster extraction.

#!/bin/bash -l
#PBS -l walltime=48:00:00,nodes=1:ppn=24,mem=61gb
#PBS -m abe
#PBS -M myaddress@email.com
module load gdal/2.0.1
module load proj
module load R
module load ompi
module load udunits
module load geos

export OMP_NUM_THREADS=1
mpirun -v -hostfile $PBS_NODEFILE -np 1 R --slave CMD BATCH ./extract.R /dev/stdout

This is the R script.

library(raster)
library(rgdal)
library(Rmpi)
library(sf)
all <- stack("all.tif")
forest <- st_read(dsn="./forest",layer="forest_simple")

beginCluster(n=23,type="MPI")
raster_cells <- extract(all,forest)
endCluster()

save(raster_cells,file="raster_vals.RData")

R Markdown

Posted on August 15, 2019August 16, 2019 by Jess

https://bookdown.org/yihui/rmarkdown/word-document.html

Writing a MS-Word document using R (with as little overhead as possible)

https://warwick.ac.uk/fac/sci/wdsi/events/vacationschool2016/for-participants/materials/knitr.pdf

Tables from R into Word

https://rdrr.io/github/crsh/papaja/man/apa_table.html

Efficient Zonal Statistics

Posted on July 29, 2019July 31, 2019 by Jess

https://dl.acm.org/citation.cfm?id=3274985

https://pdfs.semanticscholar.org/3425/a9ba03ba65cc9e9e0a0c1b73a8359e6772ac.pdf

https://www.perrygeo.com/

http://brian-mcgill-4.ums.maine.edu/postgis_zonal.pdf

http://www-cs.ccny.cuny.edu/~jzhang/papers/zonalstat_tr.pdf

http://www.guru-gis.net/efficient-zonal-statistics-using-r-and-gdal/

https://robbymarrotte.weebly.com/blog/faster-zonal-statistics

Super-computing?

Posted on June 10, 2019June 10, 2019 by Jess

Here’s an interesting note: just to try a parallel run with multiple nodes, I requested 9 nodes x 24 cores for 48 hours and got it (maybe because everyone is clearing out their jobs before the shutdown). However, I am just running with a small subset; it scheduled last night and is still running! It finished faster single-core. So, this might be a window into whatever is happening with the parallel runs.

Then, even running parallel w/in a single node, I’m getting that MPI job killed error again, but interestingly only for 2 of the 3 jobs…once again, points to finnicky hardware or connections. My colleague was able to do the same task and not get any errors.

How do I Classify This?

Posted on May 29, 2019June 1, 2019 by Jess

I’m going to have an output raster brick, and I have to classify it into 2 classes based on thresholds for multiple criteria.

https://pro.arcgis.com/en/pro-app/help/mapping/layer-properties/data-classification-methods.htm

Super-computing

Posted on May 28, 2019June 8, 2019 by Jess

So apparently…

“…Bootstrapping and Markov Chain Monte Carlo, can be [sped] up significantly by using several connected computers in parallel.” – snow simplified

Looks like I missed the boat on model fitting. Currently, I’m trying to use a MPI cluster implemented via snow (technically, clusterR). Here’s my latest error:

Child job 2 terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
mpirun noticed that process rank 21 with PID 0 on node cn0195 exited on signal 9 (Killed).

IMO this points back my personal suspect: the memory error.

Predicting Big Raster Stacks w/HPC

Posted on May 7, 2019May 16, 2019 by Jess

Here’s the latest of what I’m working on…

Fitting models to predict FIA metrics from summarized LiDAR variables
1. separate models for each NPC system
2. boral
Validating those models
Using those models to predict the values of FIA metrics across the state

The last objective, given the fine scale of summarized LiDAR data and the broad scale of analysis, is going to be computationally intensive. So, I’m trying to make that work as best I can by using the HPC system.