Brief demonstration of ncompare: to compare the structure, groups, variables, and attributes of two netCDF files
Installation instructions for ncompare can be found in either of these locations:
ncompare's command line arguments, provided by the --help description
✍️ Syntax Note: Commands preceded by an exclamation point, "!" (which is needed to run shell commands in a Jupyter notebook), can also be run from a terminal.
In a shell/terminal, omit the exclamation point.
! ncompare --help
usage: ncompare [-h] [-v COMPARISON_VAR_NAME] [-g COMPARISON_VAR_GROUP]
                [--only-diffs] [--file-text FILE_TEXT] [--file-csv FILE_CSV]
                [--file-xlsx FILE_XLSX] [--no-color] [--show-attributes]
                [--show-chunks]
                [--column-widths COLUMN_WIDTHS COLUMN_WIDTHS COLUMN_WIDTHS]
                [--version]
                nc_a nc_b

Compare the variables contained within two different NetCDF datasets

positional arguments:
  nc_a                  First NetCDF file
  nc_b                  First NetCDF file

options:
  -h, --help            show this help message and exit
  -v COMPARISON_VAR_NAME, --comparison_var_name COMPARISON_VAR_NAME
                        Comparison variable name
  -g COMPARISON_VAR_GROUP, --comparison_var_group COMPARISON_VAR_GROUP
                        Comparison variable group
  --only-diffs          Only display variables and attributes that are different
  --file-text FILE_TEXT
                        A text file to which the output will be written.
  --file-csv FILE_CSV   A csv (comma separated values) file to which the output will be written.
  --file-xlsx FILE_XLSX
                        An Excel file to which the output will be written.
  --no-color            Turn off all colorized output
  --show-attributes     Include variable attributes in comparison
  --show-chunks         Include chunk sizes in the table that compares variables
  --column-widths COLUMN_WIDTHS COLUMN_WIDTHS COLUMN_WIDTHS
                        Width, in number of characters, of the three columns in the comparison report
  --version             Show the current version.
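The report-writing flags listed in the help text can be combined in a single call. The following is a minimal sketch and not part of ncompare itself: `build_ncompare_command` is a hypothetical helper, and `a.nc`, `b.nc`, and `report.csv` are placeholder names. It only assembles an argument list from flags that appear in the help text above; it does not run ncompare.

```python
import shlex


def build_ncompare_command(nc_a, nc_b, only_diffs=False, text_report=None, csv_report=None):
    """Assemble an ncompare argument list from flags shown in the --help text.

    Hypothetical helper for illustration only; it does not execute anything.
    """
    cmd = ["ncompare"]
    if only_diffs:
        cmd.append("--only-diffs")
    if text_report is not None:
        cmd += ["--file-text", str(text_report)]
    if csv_report is not None:
        cmd += ["--file-csv", str(csv_report)]
    # The two positional arguments (nc_a, nc_b) go last.
    cmd += [str(nc_a), str(nc_b)]
    return cmd


cmd = build_ncompare_command("a.nc", "b.nc", only_diffs=True, csv_report="report.csv")
print(shlex.join(cmd))  # ncompare --only-diffs --file-csv report.csv a.nc b.nc
```

Passing the resulting list to `subprocess.run(cmd, check=True)` would execute the comparison, assuming ncompare is installed and on the PATH.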
Example 1: Two netCDF files with the same groups, variables, and attributes
Data files are first defined. The examples here rely on three files hosted by NOAA's National Centers for Environmental Information (NCEI): two from (a) the Global Precipitation Climatology Project (GPCP) Climate Data Record (CDR), Monthly V2.3, and one from (b) the Climate Data Record (CDR) of Precipitation Estimation from Remotely Sensed Information using Artificial Neural Networks (PERSIANN-CDR), Version 1 Revision 1, a daily quasi-global precipitation product. The files are accessible via this GPCP catalog and this PERSIANN catalog:
- https://www.ncei.noaa.gov/thredds/catalog/cdr/gpcp_final/2023/catalog.html?dataset=cdr_gpcp_final/2023/gpcp_v02r03_monthly_d202301_c20230411.nc
- https://www.ncei.noaa.gov/thredds/catalog/cdr/gpcp_final/2023/catalog.html?dataset=cdr_gpcp_final/2023/gpcp_v02r03_monthly_d202302_c20230505.nc
- https://www.ncei.noaa.gov/thredds/fileServer/cdr/persiann/2023/PERSIANN-CDR_v01r01_20230419_c20231030.nc
from pathlib import Path
file_urls = [
    "https://www.ncei.noaa.gov/thredds/fileServer/cdr/gpcp_final/2023/gpcp_v02r03_monthly_d202301_c20230411.nc",
    "https://www.ncei.noaa.gov/thredds/fileServer/cdr/gpcp_final/2023/gpcp_v02r03_monthly_d202302_c20230505.nc",
    "https://www.ncei.noaa.gov/thredds/fileServer/cdr/persiann/2023/PERSIANN-CDR_v01r01_20230419_c20231030.nc",
]
file_names = [Path(url).name for url in file_urls]
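`Path(url).name` works here because each URL ends in the file name and carries no query string. A slightly more defensive sketch, offered as an alternative rather than a required change, parses the URL first so that any query string or fragment is ignored. The helper name `filename_from_url` is hypothetical.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse


def filename_from_url(url):
    """Return the last path component of a URL, ignoring any query or fragment."""
    return PurePosixPath(urlparse(url).path).name


url = (
    "https://www.ncei.noaa.gov/thredds/fileServer/cdr/gpcp_final/2023/"
    "gpcp_v02r03_monthly_d202301_c20230411.nc"
)
print(filename_from_url(url))                  # gpcp_v02r03_monthly_d202301_c20230411.nc
print(filename_from_url(url + "?download=1"))  # same name; the query string is dropped
```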
To download these files (e.g., for the first time running this notebook), run the following:
import requests

for url, filename in zip(file_urls, file_names):
    r = requests.get(url, allow_redirects=True)
    r.raise_for_status()  # stop on an HTTP error instead of saving an error page
    with open(filename, "wb") as f:
        f.write(r.content)
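When the notebook is re-run, the files do not need to be fetched again. The following is a minimal sketch of one way to skip files that are already present, using only the standard library. `download_if_missing` is a hypothetical helper, not something the notebook or ncompare provides, and it does not clean up a partially written file if a download is interrupted.

```python
import shutil
from pathlib import Path
from urllib.request import urlopen


def download_if_missing(url, destination, opener=urlopen):
    """Download `url` to `destination` unless that file already exists.

    Returns True if a download happened and False if the file was already present.
    `opener` is injectable so the function can be exercised without network access.
    """
    destination = Path(destination)
    if destination.exists():
        return False
    with opener(url) as response, open(destination, "wb") as out_file:
        # Stream the response in chunks rather than loading it all into memory.
        shutil.copyfileobj(response, out_file)
    return True
```

With the lists defined earlier, it could be called as `download_if_missing(url, filename)` inside the same `for url, filename in zip(file_urls, file_names):` loop.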
Next, we pass the two file paths to ncompare; any differences would be printed in red. In this case, there are no differences; therefore, all of the variables are printed in black.
✍️ Syntax Note: the curly brackets, "{" and "}", that follow are simply a way to substitute Python variables into a shell command.
In a shell/terminal, one can just write out the full arguments, separated by spaces.
For example, the following command would be run at the terminal as ncompare --column-widths 33 26 26 gpcp_v02r03_monthly_d202301_c20230411.nc gpcp_v02r03_monthly_d202302_c20230505.nc
✍️ ncompare Options Note: the --column-widths 33 26 26 arguments are optional; they are used here to shrink the columns from their default widths to a size that fits better in the GitHub notebook renderer.
! ncompare --column-widths 33 26 26 {file_names[0]} {file_names[1]}
File A: gpcp_v02r03_monthly_d202301_c20230411.nc
File B: gpcp_v02r03_monthly_d202302_c20230505.nc

Root-level Dimensions:
Are all items the same? ---> True.
[('latitude', 72), ('longitude', 144), ('nv', 2), ('time', 1)]

Root-level Groups:
Are all items the same? ---> True. (No items exist.)

No variable group selected for comparison. Skipping..

All variables:
                                  File A                     File B
                    All Variables
                                - -------------------------- --------------------------
                        GROUP #00 -------------------------/ -------------------------/
          num variables in group: 8                          8
                                - -------------------------- --------------------------
              -----VARIABLE-----: lat_bounds                 lat_bounds
                           dtype: float32                    float32
                           shape: (72, 2)                    (72, 2)
              -----VARIABLE-----: latitude                   latitude
                           dtype: float32                    float32
                           shape: (72,)                      (72,)
              -----VARIABLE-----: lon_bounds                 lon_bounds
                           dtype: float32                    float32
                           shape: (144, 2)                   (144, 2)
              -----VARIABLE-----: longitude                  longitude
                           dtype: float32                    float32
                           shape: (144,)                     (144,)
              -----VARIABLE-----: precip                     precip
                           dtype: float32                    float32
                           shape: (1, 72, 144)               (1, 72, 144)
              -----VARIABLE-----: precip_error               precip_error
                           dtype: float32                    float32
                           shape: (1, 72, 144)               (1, 72, 144)
              -----VARIABLE-----: time                       time
                           dtype: float32                    float32
                           shape: (1,)                       (1,)
              -----VARIABLE-----: time_bounds                time_bounds
                           dtype: float32                    float32
                           shape: (1, 2)                     (1, 2)
                                - -------------------------- --------------------------
    Total number of shared items: 8                          8
Total number of non-shared items: 0                          0

Done.
Example 2: Two netCDF files with different groups, variables, and attributes
! ncompare --column-widths 33 30 30 {file_names[0]} {file_names[2]}
File A: gpcp_v02r03_monthly_d202301_c20230411.nc
File B: PERSIANN-CDR_v01r01_20230419_c20231030.nc

Root-level Dimensions:
/usr/local/Caskroom/miniconda/base/envs/ncompare-jupyter-example/lib/python3.12/site-packages/xarray/conventions.py:428: SerializationWarning: variable 'precipitation' has multiple fill values {-9999.0, -1.0}, decoding all values to NaN.
  new_vars[k] = decode_cf_variable(
Are all items the same? ---> False. (2 items are shared, out of 6 total.)
Which items are different?
                                  File A                         File B
                              #00 ------------------------------ ------------------('lat', 480)
                              #01 --------------('latitude', 72) ------------------------------
                              #02 ------------------------------ -----------------('lon', 1440)
                              #03 ------------('longitude', 144) ------------------------------
                              #04 ---------------------('nv', 2) ---------------------('nv', 2)
                              #05 -------------------('time', 1) -------------------('time', 1)
      Number of non-shared items: 2                              2

Root-level Groups:
Are all items the same? ---> True. (No items exist.)

No variable group selected for comparison. Skipping..

All variables:
                                  File A                         File B
                    All Variables
                                - ------------------------------ ------------------------------
                        GROUP #00 -----------------------------/ -----------------------------/
          num variables in group: 8                              6
                                - ------------------------------ ------------------------------
              -----VARIABLE-----:                                lat
                           dtype:                                float32
                           shape:                                (480,)
              -----VARIABLE-----:                                lat_bnds
                           dtype:                                float32
                           shape:                                (480, 2)
              -----VARIABLE-----: lat_bounds
                           dtype: float32
                           shape: (72, 2)
              -----VARIABLE-----: latitude
                           dtype: float32
                           shape: (72,)
              -----VARIABLE-----:                                lon
                           dtype:                                float32
                           shape:                                (1440,)
              -----VARIABLE-----:                                lon_bnds
                           dtype:                                float32
                           shape:                                (1440, 2)
              -----VARIABLE-----: lon_bounds
                           dtype: float32
                           shape: (144, 2)
              -----VARIABLE-----: longitude
                           dtype: float32
                           shape: (144,)
              -----VARIABLE-----: precip
                           dtype: float32
                           shape: (1, 72, 144)
              -----VARIABLE-----: precip_error
                           dtype: float32
                           shape: (1, 72, 144)
              -----VARIABLE-----:                                precipitation
                           dtype:                                float32
                           shape:                                (1, 1440, 480)
              -----VARIABLE-----: time                           time
                           dtype: float32                        int32
                           shape: (1,)                           (1,)
              -----VARIABLE-----: time_bounds
                           dtype: float32
                           shape: (1, 2)
                                - ------------------------------ ------------------------------
    Total number of shared items: 1                              1
Total number of non-shared items: 7                              5

Done.
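The "shared" and "non-shared" counts in the dimension report can be reproduced with plain set operations. This is only an illustration of the idea, using the dimension lists copied from the outputs above; it is not ncompare's actual implementation.

```python
# Root-level dimensions copied from the ncompare output above.
dims_a = {("latitude", 72), ("longitude", 144), ("nv", 2), ("time", 1)}  # GPCP file
dims_b = {("lat", 480), ("lon", 1440), ("nv", 2), ("time", 1)}          # PERSIANN-CDR file

shared = dims_a & dims_b      # (name, size) pairs present in both files
all_items = dims_a | dims_b   # every distinct (name, size) pair
only_a = dims_a - dims_b      # pairs found only in File A
only_b = dims_b - dims_a      # pairs found only in File B

print(f"{len(shared)} items are shared, out of {len(all_items)} total.")
print("Only in File A:", sorted(only_a))
print("Only in File B:", sorted(only_b))
```

The first printed line matches the "2 items are shared, out of 6 total." message in the report, and each file contributes two non-shared dimensions.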
More file details can be examined by using the --show-attributes and --show-chunks options
! ncompare --show-attributes --show-chunks --column-widths 33 30 30 {file_names[0]} {file_names[2]}
File A: gpcp_v02r03_monthly_d202301_c20230411.nc
File B: PERSIANN-CDR_v01r01_20230419_c20231030.nc

Root-level Dimensions:
/usr/local/Caskroom/miniconda/base/envs/ncompare-jupyter-example/lib/python3.12/site-packages/xarray/conventions.py:428: SerializationWarning: variable 'precipitation' has multiple fill values {-9999.0, -1.0}, decoding all values to NaN.
  new_vars[k] = decode_cf_variable(
Are all items the same? ---> False. (2 items are shared, out of 6 total.)
Which items are different?
                                  File A                         File B
                              #00 ------------------------------ ------------------('lat', 480)
                              #01 --------------('latitude', 72) ------------------------------
                              #02 ------------------------------ -----------------('lon', 1440)
                              #03 ------------('longitude', 144) ------------------------------
                              #04 ---------------------('nv', 2) ---------------------('nv', 2)
                              #05 -------------------('time', 1) -------------------('time', 1)
      Number of non-shared items: 2                              2

Root-level Groups:
Are all items the same? ---> True. (No items exist.)

No variable group selected for comparison. Skipping..

All variables:
                                  File A                         File B
                    All Variables
                                - ------------------------------ ------------------------------
                        GROUP #00 -----------------------------/ -----------------------------/
          num variables in group: 8                              6
                                - ------------------------------ ------------------------------
              -----VARIABLE-----:                                lat
                           dtype:                                float32
                           shape:                                (480,)
                       chunksize:                                contiguous
                          bounds:                                lat_bnds
                       long_name:                                latitude
                   standard_name:                                latitude
                           units:                                degrees_north
                       valid_max:                                60.0
                       valid_min:                                -60.0
              -----VARIABLE-----:                                lat_bnds
                           dtype:                                float32
                           shape:                                (480, 2)
                       chunksize:                                contiguous
              -----VARIABLE-----: lat_bounds
                           dtype: float32
                           shape: (72, 2)
                       chunksize: contiguous
                         comment: latitude values at the north and south bounds of each pixel.
              -----VARIABLE-----: latitude
                           dtype: float32
                           shape: (72,)
                       chunksize: contiguous
                            axis: Y
                          bounds: lat_bounds
                       long_name: Latitude
                   standard_name: latitude
                           units: degrees_north
                     valid_range: [-90.0, 90.0, ...]
              -----VARIABLE-----:                                lon
                           dtype:                                float32
                           shape:                                (1440,)
                       chunksize:                                contiguous
                          bounds:                                lon_bnds
                       long_name:                                longitude
                   standard_name:                                longitude
                           units:                                degrees_east
                       valid_max:                                360.0
                       valid_min:                                0.0
              -----VARIABLE-----:                                lon_bnds
                           dtype:                                float32
                           shape:                                (1440, 2)
                       chunksize:                                contiguous
              -----VARIABLE-----: lon_bounds
                           dtype: float32
                           shape: (144, 2)
                       chunksize: contiguous
                         comment: longitude values at the west and east bounds of each pixel.
              -----VARIABLE-----: longitude
                           dtype: float32
                           shape: (144,)
                       chunksize: contiguous
                            axis: X
                          bounds: lon_bounds
                       long_name: Longitude
                   standard_name: longitude
                           units: degrees_east
                     valid_range: [0.0, 360.0, ...]
              -----VARIABLE-----: precip
                           dtype: float32
                           shape: (1, 72, 144)
                       chunksize: contiguous
                    cell_methods: area: mean time: mean
                     coordinates: time latitude longitude
                       long_name: NOAA Climate Data Record (CDR) of GPCP Monthly Satellite-Gauge Combined Precipitation
                   missing_value: -9999.0
                   standard_name: precipitation amount
                           units: mm/day
                     valid_range: [0.0, 100.0, ...]
              -----VARIABLE-----: precip_error
                           dtype: float32
                           shape: (1, 72, 144)
                       chunksize: contiguous
                     coordinates: time latitude longitude
                       long_name: NOAA CDR of GPCP Satellite-Gauge Combined Precipitation Error
                   missing_value: -9999.0
                           units: mm/day
                     valid_range: [0.0, 100.0, ...]
              -----VARIABLE-----:                                precipitation
                           dtype:                                float32
                           shape:                                (1, 1440, 480)
                       chunksize:                                [1, 1440, 480]
                      _FillValue:                                -1.0
                     cell_method:                                sum
                       long_name:                                NOAA Climate Data Record of PERSIANN-CDR daily precipitation
                   missing_value:                                -9999.0
                   standard_name:                                precipitation_amount
                           units:                                mm
                       valid_max:                                999999.0
                       valid_min:                                0.0
              -----VARIABLE-----: time                           time
                           dtype: float32                        int32
                           shape: (1,)                           (1,)
                       chunksize: contiguous                     contiguous
                            axis: T
                          bounds: time_bounds
                        calendar: Gregorian
                       long_name: time                           time
                   standard_name: time                           time
                           units: days since 1970-01-01 00:00:00 0:00  days since 1979-01-01 0:0:0
              -----VARIABLE-----: time_bounds
                           dtype: float32
                           shape: (1, 2)
                       chunksize: contiguous
                         comment: time bounds for each time value
                                - ------------------------------ ------------------------------
    Total number of shared items: 1                              1
Total number of non-shared items: 7                              5

Done.
END of Notebook.