python mask netcdf data using shapefile

2024/10/3 12:25:37

I am using the following packages:

import pandas as pd
import numpy as np
import xarray as xr
import geopandas as gpd

I have the following objects storing data:

print(precip_da)
Out[]:
<xarray.DataArray 'precip' (time: 13665, latitude: 200, longitude: 220)>
[601260000 values with dtype=float32]
Coordinates:
  * longitude  (longitude) float32 35.024994 35.074997 35.125 35.175003 ...
  * latitude   (latitude) float32 5.0249977 5.074997 5.125 5.174999 ...
  * time       (time) datetime64[ns] 1981-01-01 1981-01-02 1981-01-03 ...
Attributes:
    standard_name:       convective precipitation rate
    long_name:           Climate Hazards group InfraRed Precipitation with St...
    units:               mm/day
    time_step:           day
    geostatial_lat_min:  -50.0
    geostatial_lat_max:  50.0
    geostatial_lon_min:  -180.0
    geostatial_lon_max:  180.0

This looks as follows:

precip_da.mean(dim="time").plot()

[Figure: mean precipitation over NE Ethiopia]

I have my shapefile as a geopandas.GeoDataFrame, which represents a polygon.

awash = gpd.read_file(shp_dir)
awash
Out[]:
   OID_         Name      FolderPath  SymbolID  AltMode Base  Clamped Extruded  Snippet PopupInfo Shape_Leng  Shape_Area  geometry
0     0  Awash_Basin Awash_Basin.kml         0        0  0.0       -1        0     None      None  30.180944    9.411263  POLYGON Z ((41.78939511000004 11.5539922500000...

Which looks as follows:

awash.plot()

[Figure: region shapefile stored as geopandas.GeoDataFrame]

Plotted one on top of the other, they look like this:

ax = awash.plot(alpha=0.2, color='black')
precip_da.mean(dim="time").plot(ax=ax,zorder=-1)

[Figure: Awash region superimposed on precipitation data]

My question is: how do I mask the xarray.DataArray by checking whether its lat-lon points lie INSIDE the polygon stored in the geopandas.GeoDataFrame?

So I want ONLY the precipitation values (mm/day) that fall INSIDE that shapefile.

I want to do something like the following:

masked_precip = precip_da.within(awash)

OR

masked_precip = precip_da.loc[precip_da.isin(awash)]
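xarray.DataArray has no spatial `within` test (its `isin` checks element membership, not geometry), but the idea above can be expressed as a boolean point-in-polygon mask that is then applied with `.where`. A minimal sketch, assuming a single polygon without holes and using a hand-written even-odd ray-casting test (swapped in here so the example needs only NumPy); `points_in_polygon` and the toy grid are hypothetical, not part of any library:

```python
import numpy as np


def points_in_polygon(x, y, poly_x, poly_y):
    """Hypothetical helper: even-odd ray-casting test, vectorised over points.

    x, y           : arrays of point coordinates (same shape)
    poly_x, poly_y : polygon vertex coordinates (one ring, holes ignored)
    Returns a boolean array, True where the point lies inside the polygon.
    Points exactly on an edge may fall either way.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    inside = np.zeros(x.shape, dtype=bool)
    n = len(poly_x)
    j = n - 1
    for i in range(n):
        xi, yi = poly_x[i], poly_y[i]
        xj, yj = poly_x[j], poly_y[j]
        # Does a horizontal ray from (x, y) towards +x cross edge (j, i)?
        crosses = (yi > y) != (yj > y)
        # x-coordinate of the crossing; only used where `crosses` is True
        with np.errstate(divide="ignore", invalid="ignore"):
            x_cross = (xj - xi) * (y - yi) / (yj - yi) + xi
        # Each crossing to the right of the point flips inside/outside
        inside ^= crosses & (x < x_cross)
        j = i
    return inside


# Made-up grid of cell centres and a square "basin" from (2, 1) to (6, 4)
lon = np.arange(0.5, 10, 1.0)
lat = np.arange(0.5, 5, 1.0)
lon2d, lat2d = np.meshgrid(lon, lat)  # shape (len(lat), len(lon))
mask = points_in_polygon(lon2d, lat2d, [2, 6, 6, 2], [1, 1, 4, 4])
print(int(mask.sum()), "cells inside")
```

For the real data, the ring could come from `awash.geometry.iloc[0].exterior.xy`, the grid from `np.meshgrid(precip_da.longitude, precip_da.latitude)`, and the mask would be wrapped as an `xr.DataArray` with dims `("latitude", "longitude")` and the same coordinates before calling `precip_da.where(mask_da)`. With shapely 2.0, `shapely.contains_xy(polygon, lon2d, lat2d)` does the same test in compiled code and also respects holes.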

EDIT 1

I have thought about using the rasterio.mask module, but I don't know what format the input data needs to be in. It sounds as if it does exactly the right thing:

"Creates a masked or filled array using input shapes. Pixels are masked or set to nodata outside the input shapes"

Reposted from GIS Stack Exchange here

Answer

This is the current working solution, taken from this gist: Stephan Hoyer's answer to a GitHub issue on the xarray project.

On top of the packages above, both affine and rasterio are required:

from rasterio import features
from affine import Affine


def transform_from_latlon(lat, lon):
    """Input 1-D arrays of lat / lon and output an Affine transformation.

    Assumes evenly spaced cell-centre coordinates. rasterio places the
    transform origin at the outer corner of pixel (0, 0), so the origin is
    shifted back by half a cell; without the shift the mask is displaced by
    half a grid cell.
    """
    lat = np.asarray(lat)
    lon = np.asarray(lon)
    dlon = lon[1] - lon[0]
    dlat = lat[1] - lat[0]
    trans = Affine.translation(lon[0] - dlon / 2, lat[0] - dlat / 2)
    scale = Affine.scale(dlon, dlat)
    return trans * scale


def rasterize(shapes, coords, latitude='latitude', longitude='longitude',
              fill=np.nan, **kwargs):
    """Rasterize a list of (geometry, fill_value) tuples onto the given
    xarray coordinates. This only works for 1-D, evenly spaced latitude and
    longitude arrays.

    usage:
    -----
    1. read shapefile to geopandas.GeoDataFrame
       `states = gpd.read_file(shp_dir + shp_file)`
    2. encode the different shapes that capture those lat-lons as different
       numbers, i.e. 0.0, 1.0 ..., and otherwise np.nan
       `shapes = zip(states.geometry, range(len(states)))`
    3. assign this to a new coord in your original xarray.DataArray
       `ds['states'] = rasterize(shapes, ds.coords, longitude='X', latitude='Y')`

    arguments:
    ---------
    :shapes: iterable of (geometry, value) pairs
    :coords: coordinates of the target grid (e.g. `ds.coords`)
    :**kwargs (dict): passed to `rasterio.features.rasterize`

    internals:
    ---------
    :transform (affine.Affine): maps (col, row) pixel coordinates to (lon, lat)
    :raster (numpy.ndarray): output of rasterio.features.rasterize, holding
        each shape's value inside it and np.nan outside the shapes

    returns:
    -------
    :(xr.DataArray): DataArray with `values` of nan for points outside the
        shapefile and dims/coords (latitude, longitude).
    """
    transform = transform_from_latlon(coords[latitude], coords[longitude])
    out_shape = (len(coords[latitude]), len(coords[longitude]))
    raster = features.rasterize(shapes, out_shape=out_shape,
                                fill=fill, transform=transform,
                                dtype=float, **kwargs)
    spatial_coords = {latitude: coords[latitude], longitude: coords[longitude]}
    return xr.DataArray(raster, coords=spatial_coords, dims=(latitude, longitude))


def add_shape_coord_from_data_array(xr_da, shp_path, coord_name):
    """Create a new coord for xr_da indicating whether or not each point is
    inside the shapefile.

    The new coord, `coord_name`, holds the integer id of the polygon each
    grid cell falls in (NaN outside) and is used to subset xr_da for
    plotting / analysis.

    Usage:
    -----
    precip_da = add_shape_coord_from_data_array(precip_da, "awash.shp", "awash")
    awash_da = precip_da.where(precip_da.awash == 0, other=np.nan)
    """
    # 1. read in shapefile
    shp_gpd = gpd.read_file(shp_path)

    # 2. create a list of tuples (shapely.geometry, id)
    #    this allows for many different polygons within a .shp file (e.g. States of US)
    shapes = [(shape, n) for n, shape in enumerate(shp_gpd.geometry)]

    # 3. create a new coord in the xr_da which will be set to the id in `shapes`
    xr_da[coord_name] = rasterize(shapes, xr_da.coords,
                                  longitude='longitude', latitude='latitude')
    return xr_da

It can be implemented as follows:

precip_da = add_shape_coord_from_data_array(precip_da, shp_dir, "awash")
awash_da = precip_da.where(precip_da.awash==0, other=np.nan)
awash_da.mean(dim="time").plot()
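Once the region coordinate exists, a daily basin-average series follows from the masked array. A minimal sketch on a made-up 4-day, 3 x 4 grid, assuming the coordinate holds 0 inside the basin and NaN outside (as the rasterize step produces); the cos(latitude) weighting is an added assumption to account for cells shrinking away from the equator (over 5 to 15°N it changes the result only slightly). With several polygons in the shapefile, `precip_da.where(precip_da.awash.notnull())` would keep points inside any of them.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Made-up stand-in for precip_da: 4 days on a 3 x 4 grid, all 1.0 mm/day
time = pd.date_range("1981-01-01", periods=4, freq="D")
lat = np.array([5.025, 5.075, 5.125])
lon = np.array([35.025, 35.075, 35.125, 35.175])
precip = xr.DataArray(
    np.ones((4, 3, 4), dtype="float32"),
    coords={"time": time, "latitude": lat, "longitude": lon},
    dims=("time", "latitude", "longitude"),
)

# Hypothetical region coordinate: 0 inside the basin, NaN outside
region = np.full((3, 4), np.nan)
region[1:, 1:3] = 0
precip = precip.assign_coords(awash=(("latitude", "longitude"), region))

# Keep only points inside the basin (others become NaN)
masked = precip.where(precip.awash == 0)

# Area-weighted spatial mean per day; NaN cells are skipped and the
# weights renormalised over the remaining cells
weights = np.cos(np.deg2rad(masked.latitude))
series = masked.weighted(weights).mean(dim=["latitude", "longitude"])
print(series.values)
```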

[Figure: the mean rainfall just in the Awash Basin of Ethiopia]

https://en.xdnf.cn/q/70722.html
