Metadata-Version: 2.4
Name: cwrf-bias-correction
Version: 0.1.0
Summary: Reproducible CWRF bias-correction, hindcast evaluation, and summary heatmap workflows.
Author: CWRF Bias Correction Contributors
License: MIT
Project-URL: Homepage, https://example.invalid/cwrf-bias-correction
Project-URL: Documentation, https://example.invalid/cwrf-bias-correction/docs
Keywords: climate,bias-correction,CWRF,xarray,xsdba
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Atmospheric Science
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: cartopy
Requires-Dist: cftime
Requires-Dist: dask
Requires-Dist: h5netcdf
Requires-Dist: joblib
Requires-Dist: matplotlib
Requires-Dist: netCDF4
Requires-Dist: numpy
Requires-Dist: pandas
Requires-Dist: PyYAML
Requires-Dist: scipy
Requires-Dist: shapely
Requires-Dist: threadpoolctl
Requires-Dist: xarray
Requires-Dist: xsdba
Provides-Extra: dev
Requires-Dist: pytest; extra == "dev"
Requires-Dist: pytest-cov; extra == "dev"
Requires-Dist: ruff; extra == "dev"
Dynamic: license-file

# CWRF Bias Correction

Python package and unified CLI for CWRF bias-correction research workflows. The
package preserves the legacy scientific algorithms and filenames while adding
YAML configuration, metadata, tests, and reproducible command entry points.

## Installation

```bash
conda env create -f environment.yml
conda activate cwrf-bias-correction
pip install -e .
```

For development:

```bash
pip install -e ".[dev]"
pytest
```

## Hindcast Case 1: All Variables

This is the full hindcast-evaluation workflow: bias-correct with each individual
method, calculate `ENSEMBLE_MEAN`, generate evaluation maps, then summarize the
metrics as heatmaps.

Step 1, bias correction:

```bash
cwrf-bc run-hindcast-test \
  --hindcast-root /data/public/CWRF/combined \
  --obs-root /data/public/regrid_daily \
  --out-root /data/public/CWRF/bias_corrected_hindcast_all \
  --cache-root /tmp/cwrf_bc_cache_all \
  --landmask-file ~/us_landmask_output.nc \
  --soil-reference-root /data/sunchao/ERA_land/SOIL_DAYMEAN \
  --vars PRAVG AT2M T2MAX T2MIN ASWDNS AXSMTG \
  --methods EQM QDM DQM LOCI Scaling VARI ExtremeValues \
  --test-year-start 2011 \
  --test-year-end 2024 \
  --n-jobs 64
```

Step 2, ensemble:

```bash
cwrf-bc ensemble --config configs/ensemble_hindcast.yaml
```

Step 3, evaluation maps:

```bash
cwrf-bc evaluate-hindcast \
  --hindcast-root /data/public/CWRF/combined \
  --corrected-root /data/public/CWRF/bias_corrected_hindcast_all \
  --obs-root /data/public/regrid_daily \
  --soil-reference-root /data/sunchao/ERA_land/SOIL_DAYMEAN \
  --landmask-file ~/us_landmask_output.nc \
  --out-root /data/public/CWRF/bias_corrected_hindcast_all/evaluation_maps \
  --vars PRAVG AT2M T2MAX T2MIN ASWDNS AXSMTG \
  --init-months 2 3 4 5 6 7 8 \
  --years 2011 2024 \
  --train-year-start 1982 \
  --train-year-end 2009 \
  --max-lead 6 \
  --methods RAW ENSEMBLE_MEAN EQM QDM DQM LOCI Scaling VARI ExtremeValues \
  --precip-indicators MEAN SDII CDD Rainydays R95p R99p Rx1day Rx5day \
  --rxxp-fortran-lib "$CWRF_PRECIP_EXTREMES_LIB" \
  --n-workers 64 \
  --overwrite
```

Step 4, heatmaps:

```bash
cwrf-bc summarize-heatmaps \
  --metrics-root /data/public/CWRF/bias_corrected_hindcast_all/evaluation_maps \
  --landmask-file ~/us_landmask_output.nc \
  --croptype-file croptype.nc \
  --out-root /data/public/CWRF/bias_corrected_hindcast_all/evaluation_maps/summary_heatmaps \
  --init-months 2 3 4 5 6 7 8 \
  --leads 1 2 3 4 5 6 \
  --metrics corr rmse bias nrmse mae mape median_ae mean_tweedie_deviance d2_tweedie_score mean_gamma_deviance \
  --nonprecip-targets AT2M T2MAX T2MIN ASWDNS AXSMTG \
  --precip-targets PRAVG PRAVG_SDII PRAVG_CDD PRAVG_Rainydays PRAVG_R95p PRAVG_R99p PRAVG_Rx1day PRAVG_Rx5day \
  --methods RAW ENSEMBLE_MEAN EQM QDM DQM LOCI Scaling VARI ExtremeValues \
  --drop-empty-targets \
  --n-workers 64 \
  --overwrite
```
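The four steps depend on each other's outputs, so a driver script should stop at the first failure. The following is a minimal sketch, not part of the package: `run_steps` is a hypothetical helper that runs each command in order with `subprocess.run(..., check=True)`, which raises `CalledProcessError` on a non-zero exit code.

```python
import subprocess


def run_steps(steps, runner=subprocess.run):
    """Run each argv list in order; stop at the first failing command.

    `runner` is injectable so the sequencing can be checked without
    actually invoking the CLI.
    """
    for argv in steps:
        # check=True makes subprocess.run raise on a non-zero exit code,
        # so later steps never run against incomplete outputs.
        runner(argv, check=True)


if __name__ == "__main__":
    # Only the ensemble step is spelled out here; the other three steps
    # take the same argv form as the shell commands in this section.
    run_steps([
        ["cwrf-bc", "ensemble", "--config", "configs/ensemble_hindcast.yaml"],
    ])
```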

## Hindcast Case 2: Scaling Plus Precipitation Extremes

Use this case when you want `Scaling` for temperature, radiation, and soil
moisture, plus `LOCI` and `ExtremeValues` for precipitation-related variables.
In this version, `Scaling` is not an active `PRAVG` method.

Step 1, bias correction:

```bash
cwrf-bc run-hindcast-test \
  --hindcast-root /data/public/CWRF/combined \
  --obs-root /data/public/regrid_daily \
  --out-root /data/public/CWRF/bias_corrected_hindcast_scaling_loci_extremes \
  --cache-root /tmp/cwrf_bc_cache_scaling_loci_extremes \
  --landmask-file ~/us_landmask_output.nc \
  --soil-reference-root /data/sunchao/ERA_land/SOIL_DAYMEAN \
  --vars PRAVG AT2M T2MAX T2MIN ASWDNS AXSMTG \
  --methods Scaling LOCI ExtremeValues \
  --test-year-start 2011 \
  --test-year-end 2024 \
  --n-jobs 64
```

Step 2, ensemble. Use a config with variable-specific source methods:

```yaml
mode: hindcast
input_root: /data/public/CWRF/bias_corrected_hindcast_scaling_loci_extremes
output_root: /data/public/CWRF/bias_corrected_hindcast_scaling_loci_extremes
variables:
  PRAVG:
    methods: [LOCI, ExtremeValues]
  AT2M:
    methods: [Scaling]
  T2MAX:
    methods: [Scaling]
  T2MIN:
    methods: [Scaling]
  ASWDNS:
    methods: [Scaling]
  AXSMTG:
    methods: [Scaling]
init_months: [2, 3, 4, 5, 6, 7, 8]
years: [2011, 2024]
overwrite: false
strict_missing: false
```

Run it:

```bash
cwrf-bc ensemble --config ensemble_scaling_loci_extremes.yaml
```

Step 3, evaluation maps:

```bash
cwrf-bc evaluate-hindcast \
  --hindcast-root /data/public/CWRF/combined \
  --corrected-root /data/public/CWRF/bias_corrected_hindcast_scaling_loci_extremes \
  --obs-root /data/public/regrid_daily \
  --soil-reference-root /data/sunchao/ERA_land/SOIL_DAYMEAN \
  --landmask-file ~/us_landmask_output.nc \
  --out-root /data/public/CWRF/bias_corrected_hindcast_scaling_loci_extremes/evaluation_maps \
  --vars PRAVG AT2M T2MAX T2MIN ASWDNS AXSMTG \
  --init-months 2 3 4 5 6 7 8 \
  --years 2011 2024 \
  --train-year-start 1982 \
  --train-year-end 2009 \
  --max-lead 6 \
  --methods RAW ENSEMBLE_MEAN Scaling LOCI ExtremeValues \
  --precip-indicators MEAN SDII CDD Rainydays R95p R99p Rx1day Rx5day \
  --rxxp-fortran-lib "$CWRF_PRECIP_EXTREMES_LIB" \
  --n-workers 64 \
  --overwrite
```

Step 4, heatmaps:

```bash
cwrf-bc summarize-heatmaps \
  --metrics-root /data/public/CWRF/bias_corrected_hindcast_scaling_loci_extremes/evaluation_maps \
  --landmask-file ~/us_landmask_output.nc \
  --croptype-file croptype.nc \
  --out-root /data/public/CWRF/bias_corrected_hindcast_scaling_loci_extremes/evaluation_maps/summary_heatmaps \
  --init-months 2 3 4 5 6 7 8 \
  --leads 1 2 3 4 5 6 \
  --metrics corr rmse bias nrmse mae mape median_ae mean_tweedie_deviance d2_tweedie_score mean_gamma_deviance \
  --nonprecip-targets AT2M T2MAX T2MIN ASWDNS AXSMTG \
  --precip-targets PRAVG PRAVG_SDII PRAVG_CDD PRAVG_Rainydays PRAVG_R95p PRAVG_R99p PRAVG_Rx1day PRAVG_Rx5day \
  --methods RAW ENSEMBLE_MEAN Scaling LOCI ExtremeValues \
  --drop-empty-targets \
  --n-workers 64 \
  --overwrite
```

## Operational Run: DAWN2

Operational production uses the same two-step correction workflow, followed by
mapping: first generate individual method files, then generate `ENSEMBLE_MEAN`,
and finally produce anomaly and likelihood maps.

Step 1, bias correction:

```bash
cwrf-bc run-operational \
  20260501 \
  --operational-root /data/pub/CWRF/Operational/daily \
  --operational-out-root /data/pub/CWRF/Operational/daily_bias_corrected \
  --hindcast-root /data/pub/CWRF/combined \
  --obs-root /data/pub/obs_grid_met/regrid_daily \
  --cache-root /tmp/cwrf_bc_cache_operational \
  --landmask-file ~/us_landmask_output.nc \
  --soil-reference-root /data/sunchao/ERA_land/SOIL_DAYMEAN \
  --vars PRAVG T2MAX T2MIN ASWDNS \
  --methods LOCI Scaling ExtremeValues \
  --n-jobs 62 \
  --overwrite
```

Step 2, ensemble:

```bash
cwrf-bc ensemble --config configs/ensemble_operational.yaml
```

Step 3, anomaly and likelihood maps:

```bash
cwrf-bc plot-operational-anomaly \
  --forecast-date 20260501 \
  --vars PRAVG T2MAX T2MIN ASWDNS \
  --sources corrected \
  --corrected-methods ENSEMBLE_MEAN \
  --corrected-root /data/pub/CWRF/Operational/daily_bias_corrected \
  --climatology-dir /data/pub/CWRF/hindcast/climatology \
  --out-root /data/pub/CWRF/Operational/anomaly_maps \
  --grid-file wrf_lambert_latlon_grid.nc \
  --logo scripts/operational/DAWN_new_acro.png
```

## CFSv2 Hindcast Input

CFSv2 is supported as another model input source when its NetCDF files use the
same daily variable structure as the existing CWRF files. The CFSv2 archive is
expected under:

```text
/data/sunchao/GCM/CFSv2/YYYY/YYYYMM/YYYYMMDD/YYYYMMDD_VARIABLE_daily.nc
```

Example:

```text
/data/sunchao/GCM/CFSv2/2000/200002/20000205/20000205_PRAVG_daily.nc
```
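To check an archive against this layout, the expected path can be built from an initialization date and a variable name. The helper below is an illustrative sketch; `cfsv2_daily_path` is a hypothetical name, not a function exported by the package.

```python
from pathlib import PurePosixPath


def cfsv2_daily_path(root, init_date, variable):
    """Build <root>/YYYY/YYYYMM/YYYYMMDD/YYYYMMDD_VARIABLE_daily.nc."""
    if len(init_date) != 8 or not init_date.isdigit():
        raise ValueError(f"init_date must be YYYYMMDD, got {init_date!r}")
    year, year_month = init_date[:4], init_date[:6]
    filename = f"{init_date}_{variable}_daily.nc"
    return PurePosixPath(root) / year / year_month / init_date / filename


if __name__ == "__main__":
    print(cfsv2_daily_path("/data/sunchao/GCM/CFSv2", "20000205", "PRAVG"))
```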

Run CFSv2 hindcast bias correction with:

```bash
cwrf-bc run-hindcast-test --config configs/cfsv2_hindcast.yaml
```

Equivalent command:

```bash
cwrf-bc run-hindcast-test \
  --input-source CFSv2 \
  --model-root /data/sunchao/GCM/CFSv2 \
  --obs-root /data/public/regrid_daily \
  --out-root /data/sunchao/GCM/CFSv2/bias_corrected \
  --cache-root /tmp/cwrf_bc_cache_cfsv2 \
  --landmask-file ~/us_landmask_output.nc \
  --vars ASWDNS AT2M AXTSS PRAVG T2MAX T2MIN xsmtg \
  --methods EQM QDM DQM LOCI Scaling VARI ExtremeValues \
  --test-year-start 2012 \
  --test-year-end 2024 \
  --train-year-start 1982 \
  --train-year-end 2009 \
  --soil-model-file-suffix _xsmtg_daily.nc \
  --soil-model-variable xsmtg \
  --n-jobs 64
```

`--model-root` is an alias for the model input root. For hindcast-test runs it
overrides `--hindcast-root`; for operational runs it overrides
`--operational-root`. The CFSv2 archive currently holds 1982-2024 data under a
single root. Future operational CFSv2 initialization dates can be placed under
the same `YYYY/YYYYMM/YYYYMMDD` layout and processed with the same workflow.
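The override rule for `--model-root` can be stated as a small function. This is a sketch of the rule as described above, not the package's own argument handling; `effective_input_root` and its parameters are hypothetical names.

```python
def effective_input_root(command, model_root=None,
                         hindcast_root=None, operational_root=None):
    """Return the model input root that a run would read from.

    --model-root wins when given; otherwise the command-specific root
    is used.
    """
    if command not in ("run-hindcast-test", "run-operational"):
        raise ValueError(f"unsupported command: {command!r}")
    if model_root is not None:
        return model_root
    if command == "run-hindcast-test":
        return hindcast_root
    return operational_root


if __name__ == "__main__":
    print(effective_input_root(
        "run-hindcast-test",
        model_root="/data/sunchao/GCM/CFSv2",
        hindcast_root="/data/public/CWRF/combined",
    ))
```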

## Output Layout Details

Hindcast input cases are discovered under:

```text
<hindcast-root>/<YYYY>/<YYYYMM>/<YYYYMMDD>/
```
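A quick way to see which cases a root would yield is to walk the three directory levels and keep only folders whose names agree with their parents. The sketch below assumes that agreement is required; the package's actual discovery logic may differ, and `discover_cases` is a hypothetical helper.

```python
import re
from pathlib import Path

CASE_RE = re.compile(r"^\d{8}$")


def discover_cases(hindcast_root):
    """Return sorted YYYYMMDD case names found under <root>/YYYY/YYYYMM/."""
    root = Path(hindcast_root)
    pattern = "[0-9]" * 4 + "/" + "[0-9]" * 6 + "/*"
    cases = []
    for day_dir in root.glob(pattern):
        name = day_dir.name
        if not (day_dir.is_dir() and CASE_RE.match(name)):
            continue
        # Assumption: the folder name must agree with its YYYYMM and
        # YYYY parents, e.g. 2015/201505/20150516.
        month_dir, year_dir = day_dir.parent, day_dir.parent.parent
        if month_dir.name == name[:6] and year_dir.name == name[:4]:
            cases.append(name)
    return sorted(cases)
```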

For a single input case `2015/201505/20150516`, the main bias-correction step
writes only individual method files:

```text
<out-root>/
|-- run_metadata.json
|-- run_config.yaml                         # only when --config is used
`-- 2015/
    `-- 201505/
        `-- 20150516/
            `-- bias_methods/
                |-- EQM/
                |-- QDM/
                |-- DQM/
                |-- LOCI/                   # precipitation only: PRAVG
                |-- Scaling/                # non-precipitation and AXSMTG
                |-- VARI/                   # non-precipitation and AXSMTG
                `-- ExtremeValues/          # precipitation only: PRAVG
```

The separate ensemble step adds:

```text
<out-root>/2015/201505/20150516/bias_methods/ENSEMBLE_MEAN/
|-- 20150516_<member>_<VAR>_daily.nc
`-- metadata/
    `-- 20150516_<member>_<VAR>_daily.ensemble_metadata.json
```

Evaluation outputs are grouped by target and initialization month:

```text
<evaluation-out-root>/
|-- PRAVG/init_month_05/PRAVG_initmon_05_metrics.nc
|-- PRAVG/init_month_05/PRAVG_initmon_05_skill_metrics.nc
|-- PRAVG_R95p/init_month_05/PRAVG_R95p_initmon_05_metrics.nc
|-- AT2M/init_month_05/AT2M_initmon_05_metrics.nc
`-- AXSMTG_010/init_month_05/AXSMTG_010_initmon_05_metrics.nc
```
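The file names in this tree follow one pattern, so downstream scripts can build them instead of globbing. The helper below is a sketch inferred from the example paths; `metrics_relpath` is a hypothetical name.

```python
def metrics_relpath(target, init_month, skill=False):
    """Relative path of a metrics file under <evaluation-out-root>.

    `target` is a variable or indicator target such as "PRAVG",
    "PRAVG_R95p", or "AXSMTG_010"; `init_month` is 1-12.
    """
    if not 1 <= init_month <= 12:
        raise ValueError(f"init_month must be 1-12, got {init_month}")
    mm = f"{init_month:02d}"
    kind = "skill_metrics" if skill else "metrics"
    return f"{target}/init_month_{mm}/{target}_initmon_{mm}_{kind}.nc"


if __name__ == "__main__":
    print(metrics_relpath("PRAVG_R95p", 5))
```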

Operational likelihood-map outputs are grouped by source label:

```text
<out-root>/
`-- ENSEMBLE_MEAN/
    |-- anomaly/
    |   `-- 20260501_PRAVG_anomaly_CWRF_operational.nc
    `-- probability/
        |-- data/
        |   |-- 20260501_PRAVG_below_probability_CWRF_operational.nc
        |   |-- 20260501_PRAVG_normal_probability_CWRF_operational.nc
        |   `-- 20260501_PRAVG_above_probability_CWRF_operational.nc
        `-- figures/
            |-- 20260501_PRAVG_0_MJJ_terciles_CONUS.png
            |-- 20260501_PRAVG_1_JJA_terciles_CONUS.png
            |-- 20260501_PRAVG_2_JAS_terciles_CONUS.png
            `-- 20260501_PRAVG_3_ASO_terciles_CONUS.png
```
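The figure names pair a lead index with a three-month season label. Judging only from the example file names above, lead `0` for a May initialization is `MJJ`, and each later lead shifts the window by one month. The sketch below reproduces that pattern; `season_label` and `tercile_figure_name` are hypothetical helpers, and the rule is an inference rather than documented behavior.

```python
MONTH_INITIALS = "JFMAMJJASOND"


def season_label(init_month, lead):
    """Three-month label starting `lead` months after `init_month` (1-12)."""
    start = (init_month - 1 + lead) % 12
    # Wrap around December so that, e.g., a window starting in December
    # reads "DJF".
    return "".join(MONTH_INITIALS[(start + k) % 12] for k in range(3))


def tercile_figure_name(forecast_date, variable, lead):
    """File name following the pattern shown in the figures/ folder."""
    init_month = int(forecast_date[4:6])
    label = season_label(init_month, lead)
    return f"{forecast_date}_{variable}_{lead}_{label}_terciles_CONUS.png"
```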

## Ensemble Configuration

`ENSEMBLE_MEAN` is a post-processing product. It is not generated by
`run-hindcast-test` or `run-operational`.

Config fields:

- `mode`: `operational` or `hindcast`.
- `input_root`: root containing existing `bias_methods/<METHOD>/` files.
- `output_root`: root for `bias_methods/ENSEMBLE_MEAN/`; it can equal
  `input_root`.
- `variables`: mapping of variable name to an explicit `methods` list.
- `dates`: operational `YYYYMMDD` dates.
- `years`: hindcast start/end year, for example `[2012, 2024]`.
- `init_months`: hindcast initialization months to process.
- `overwrite`: rewrite existing `ENSEMBLE_MEAN` files.
- `strict_missing`: fail when any requested method/file is missing. When false,
  continue with available files and record missing inputs in metadata.
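The effect of `strict_missing` can be illustrated with a short sketch. It assumes each method's file sits at `bias_methods/<METHOD>/<filename>` with the same file name across methods, which is an assumption for illustration; `collect_method_files` is a hypothetical helper, not the package's implementation.

```python
from pathlib import Path


def collect_method_files(case_dir, methods, filename, strict_missing=False):
    """Split configured methods into those with a file and those without.

    Returns (found, missing): a dict of method -> path, and a list of
    method names whose file was not found.
    """
    found, missing = {}, []
    for method in methods:
        candidate = Path(case_dir) / "bias_methods" / method / filename
        if candidate.is_file():
            found[method] = candidate
        else:
            missing.append(method)
    if missing and strict_missing:
        # Strict mode: refuse to average over a partial set of methods.
        raise FileNotFoundError(f"missing inputs for methods: {missing}")
    # Non-strict mode: the caller averages `found` and records `missing`
    # in the metadata.
    return found, missing
```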

Each `ENSEMBLE_MEAN` NetCDF has global attributes including
`bias_correction_method = "ENSEMBLE_MEAN"`, `ensemble_source_methods`, and
`ensemble_metadata_file`. The JSON metadata records the output file, variable,
initialization month, year/date, configured methods, input files used, missing
methods/files, timestamp, package version, git commit when available, and config
file.

## Variables and Methods

Supported bias-correction variables:

```text
PRAVG AT2M T2MAX T2MIN ASWDNS AXSMTG
```

Supported correction methods:

```text
EQM QDM DQM LOCI Scaling VARI ExtremeValues
```

Evaluation-only methods:

```text
RAW ENSEMBLE_MEAN
```

Precipitation indicators:

```text
MEAN SDII CDD Rainydays R95p R99p Rx1day Rx5day
```

When all methods are requested, the workflow keeps only valid methods per
variable:

- `PRAVG`: `EQM`, `QDM`, `DQM`, `LOCI`, `ExtremeValues`
- `AT2M`, `T2MAX`, `T2MIN`, `ASWDNS`: `EQM`, `QDM`, `DQM`, `Scaling`, `VARI`
- `AXSMTG`: `EQM`, `QDM`, `DQM`, `Scaling`, `VARI`
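The per-variable filtering listed above amounts to a lookup table. The sketch below encodes that table directly; `VALID_METHODS` and `valid_methods` are hypothetical names, and the package may store this mapping differently.

```python
NONPRECIP_METHODS = ("EQM", "QDM", "DQM", "Scaling", "VARI")

VALID_METHODS = {
    "PRAVG": ("EQM", "QDM", "DQM", "LOCI", "ExtremeValues"),
    "AT2M": NONPRECIP_METHODS,
    "T2MAX": NONPRECIP_METHODS,
    "T2MIN": NONPRECIP_METHODS,
    "ASWDNS": NONPRECIP_METHODS,
    "AXSMTG": NONPRECIP_METHODS,
}


def valid_methods(variable, requested):
    """Keep only the requested methods that apply to `variable`.

    The order of `requested` is preserved.
    """
    allowed = VALID_METHODS[variable]
    return [method for method in requested if method in allowed]


if __name__ == "__main__":
    everything = ["EQM", "QDM", "DQM", "LOCI", "Scaling", "VARI", "ExtremeValues"]
    for name in VALID_METHODS:
        print(name, valid_methods(name, everything))
```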

Soil moisture uses grouped CLM-like model output from
`<init_date>_axsmtg_daily.nc`. Bias correction converts selected model layers
from `kg m-2` storage to volumetric `m3 m-3`, corrects them against ERA5-Land
soil water, clips corrected values, then converts back to the original storage
units and file structure.
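The unit conversion follows from the density of liquid water: a layer holding `S` kg m-2 over a thickness of `dz` metres has a volumetric content of `S / (1000 * dz)` m3 m-3. The sketch below shows the round trip for one layer. The layer thickness and the clipping bounds of 0 and 1 are illustrative assumptions; the package's actual layer selection and clip limits are not given in this README.

```python
RHO_WATER = 1000.0  # kg m-3, density of liquid water


def storage_to_volumetric(storage_kg_m2, layer_thickness_m):
    """Convert layer water storage (kg m-2) to volumetric content (m3 m-3)."""
    return storage_kg_m2 / (RHO_WATER * layer_thickness_m)


def volumetric_to_storage(theta_m3_m3, layer_thickness_m):
    """Inverse of storage_to_volumetric."""
    return theta_m3_m3 * RHO_WATER * layer_thickness_m


def clip(value, lower=0.0, upper=1.0):
    """Clamp a corrected value to [lower, upper] (bounds are assumptions)."""
    return max(lower, min(upper, value))


if __name__ == "__main__":
    dz = 0.1                                    # hypothetical 10 cm layer
    theta = storage_to_volumetric(30.0, dz)     # to volumetric units
    corrected = clip(theta + 0.05)              # stand-in for the correction
    print(volumetric_to_storage(corrected, dz)) # back to kg m-2
```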

## Data Layout Assumptions

- Hindcast roots contain `YYYY/YYYYMM/YYYYMMDD` case folders.
- Observation files match `OBS_<VAR>_<START>_<END>.nc`.
- Landmask files contain `reg_mask`, with land selected by `reg_mask > 0`.
- Heatmap Cornbelt summaries use only grid cells where `CROPTYPE == 1` and the
  land mask is true (`reg_mask > 0`).
- Existing individual-method output filenames are preserved.
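Observation files can be checked against the `OBS_<VAR>_<START>_<END>.nc` pattern before a run. The sketch below assumes `<VAR>` is alphanumeric and that `<START>` and `<END>` are digit strings such as years or dates; the README does not specify their format, and `parse_obs_filename` and the example file name are hypothetical.

```python
import re

OBS_RE = re.compile(
    r"^OBS_(?P<var>[A-Za-z0-9]+)_(?P<start>\d+)_(?P<end>\d+)\.nc$"
)


def parse_obs_filename(filename):
    """Return (variable, start, end) or None if the name does not match."""
    match = OBS_RE.match(filename)
    if match is None:
        return None
    return match.group("var"), match.group("start"), match.group("end")


if __name__ == "__main__":
    # Hypothetical file name used only to exercise the pattern.
    print(parse_obs_filename("OBS_PRAVG_1982_2024.nc"))
```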

Each `cwrf-bc` command writes `run_metadata.json` to the command output
directory unless `--no-metadata` is supplied. If a YAML config is used, the
metadata writer also copies it to `run_config.yaml`.

## Fortran R95p/R99p Library

The evaluator can use `libcwrf_precip_extremes.so` for R95p/R99p. Build it with:

```bash
gfortran -O3 -fPIC -shared fortran/cwrf_precip_extremes.f90 -o libcwrf_precip_extremes.so
export CWRF_PRECIP_EXTREMES_LIB="$PWD/libcwrf_precip_extremes.so"
```

You can also pass `--rxxp-fortran-lib /path/to/libcwrf_precip_extremes.so`.
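A wrapper script can resolve the library path before calling the evaluator. The sketch below assumes an explicit value takes precedence over the `CWRF_PRECIP_EXTREMES_LIB` environment variable; that precedence is an assumption, and `resolve_precip_extremes_lib` is a hypothetical helper.

```python
import os
from pathlib import Path


def resolve_precip_extremes_lib(cli_value=None, env=None):
    """Return the shared-library path to use, or None if unavailable.

    Assumption: an explicit value wins over the environment variable.
    """
    env = os.environ if env is None else env
    candidate = cli_value or env.get("CWRF_PRECIP_EXTREMES_LIB")
    if not candidate:
        return None
    path = Path(candidate).expanduser()
    # Only report a path that exists; loading it (for example with
    # ctypes.CDLL) is left to the caller.
    return path if path.is_file() else None
```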
