Changelog
openeo-geopyspark-driver 0.66.0a3.dev20250430+2660
In progress: 0.66.0
- apply_neighborhood/apply_dimension: support changing band names via apply_metadata (#1155)
- StacApiWorkspace: support arbitrary paths in merge; the last part of a path becomes the collection ID in the STAC API (#1074)
- Fix compatibility with Shapely 2 (#1161)
- Change default use_zk_job_registry config to False (#632, #863, #1165)
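As a minimal sketch of the StacApiWorkspace entry above (the last part of a merge path becomes the collection ID), the derivation could look as follows. The helper name `collection_id_from_merge` is hypothetical and not part of the driver; the actual implementation may validate or normalize paths differently.

```python
from pathlib import PurePosixPath


def collection_id_from_merge(merge: str) -> str:
    """Return the last path component, used here as the STAC collection ID.

    Hypothetical helper for illustration only.
    """
    name = PurePosixPath(merge).name
    if not name:
        raise ValueError(f"cannot derive a collection ID from {merge!r}")
    return name


print(collection_id_from_merge("path/to/my_collection"))  # my_collection
```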
0.65.0
- sar_backscatter: soft-errors can now be a fraction, making it possible to tolerate a certain failure percentage, provided as a number between 0 and 1. (#443)
- save_result: write GeoTiff assets with valid tile size; override with tile_size format option. (#1083)
- sar_backscatter support: automatically use backend-specific coefficient default even if there is no explicit sar_backscatter in the process graph (Open-EO/openeo-python-driver#376)
- Add capabilities_extras config to easily extend capabilities document (Open-EO/openeo-python-driver#384)
- Add CalrissianS3Result.generate_presigned_url() (#937)
- Restore COG layout for GTiff output format (Open-EO/openeo-geotrellis-extensions#393)
- Add CalrissianS3Result.download() (#937)
- Support additional Sentinel 3 collections (eu-cdse/openeo-cdse-infra#380)
- Add CalrissianS3Result.generate_public_url() (#937)
- Replace ConfigParams.layer_catalog_metadata_files with GpsBackendConfig.layer_catalog_files (#285, #1084)
- StacApiWorkspace: log body of error response from STAC API for better root cause analysis (#1116)
- export_workspace: add "derived_from" links to STAC Collection (#1050)
- Calrissian integration: avoid unnecessary pulls of alpine image (#1132)
- Calrissian integration: refactor config to a CalrissianConfig sub-config (#1009)
- save_result: support zarr format (experimental)
- save_result: allow non-string values in GTiff file_metadata (#1142)
- Add udp_registry_zookeeper_client_reuse config for KazooClient reuse in ZooKeeperUserDefinedProcessRepository (#1037)
- GpsBackendConfig: be more forgiving about unknown config keys to better support use cases that involve backward/forward incompatible configurations (Open-EO/openeo-python-driver#322)
- Improved API alignment between DoubleJobRegistry and JobRegistryInterface/ElasticJobRegistry (#863, #1123)
- Add use_new_feature_extent_intersection_2 option to load_collection: use new intersection code to work with products crossing the antimeridian. (#1072)
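To illustrate the fractional soft-errors option from this release, here is a hedged sketch of how a failure fraction between 0 and 1 could be checked. The function `within_soft_error_budget` is hypothetical, and whether the backend treats the threshold as inclusive is an assumption.

```python
def within_soft_error_budget(failed: int, total: int, soft_errors: float) -> bool:
    """Return True when the observed failure fraction is tolerated.

    Hypothetical helper; the inclusive comparison is an assumption.
    """
    if not 0.0 <= soft_errors <= 1.0:
        raise ValueError("soft-errors must be a number between 0 and 1")
    if total == 0:
        return True  # nothing was attempted, so nothing failed
    return failed / total <= soft_errors


# "soft-errors" is the job option name used elsewhere in this changelog.
job_options = {"soft-errors": 0.25}
print(within_soft_error_budget(2, 10, job_options["soft-errors"]))  # True
print(within_soft_error_budget(3, 10, job_options["soft-errors"]))  # False
```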
0.64.1
- load_custom_processes: allow specifying path directly
0.63.0
- Add bbox and geometry to exported STAC items pertaining to GeoTiff assets of a spatial data cube (eu-cdse/openeo-cdse-infra#418)
- Support UDF dependency extraction from remote process definitions (URL based UDPs) (#1063)
- StacApiWorkspace: improve STAC requests resilience (#1073)
- StacApiWorkspace: reject unsupported characters in merge (eu-cdse/openeo-cdse-infra#415)
- load_stac: fix delay in driver due to expensive item/asset processing (#1081)
0.62.0
- load_stac: improve STAC requests resilience (#818)
- Extract demo CWL content to package resources (#1057)
- Disable EJR health check from batch job context (#1066)
- load_stac: support empty data cubes (#1049)
- Remove proof-of-concept CWL processes (now in openeo-geotrellis-kubernetes) from generic openeo-geopyspark-driver (#1057/#1038)
0.61.0
- python-memory: make job option available on YARN, add a default config
- load_stac: optimize resolution with regard to requested bands (experimental) (#1043)
- load_stac: apply offset (experimental) (#1051)
- Deprecate non-standard "logging-threshold" job option in favor of standardized "log_level" job creation parameter (#909)
- Fail fast on UDF dependency installation failure (#1048)
- load_stac: avoid OOM on global low-res assets (#1055)
0.60.1
- load_stac: restore spatial dimensions of data cube loaded from STAC Collection that lacks cube:dimensions. (#1036)
- sar_backscatter: report soft-errors fraction in usage metrics
- apply_neighborhood: for t='P1D', add date to dataarray.attrs with key 't'
0.60.0
- Make environment variables to be passed from web app driver to batch job driver configurable (#867)
- load_collection/load_stac: support parameters in properties (Open-EO/openeo-python-driver#327)
0.59.0
- load_stac: cube creation is now cached, just like load_collection (#993)
- logs: Provide a performance summary at the end of a batch job.
- StacApiWorkspace: support filepath_per_band (#867)
- load_stac: use STAC API Filter Extension to prevent driver OOM (#979)
0.58.1
- Fix spatial and temporal extents of exported STAC Collection (#867)
0.58.0
- Improve "App not found" logs to avoid red herrings in root cause analysis (eu-cdse/openeo-cdse-infra#147)
- Avoid pixel shift when source data is not aligned in load_stac. (#648)
- Fix when outputting an empty vector cube to GeoParquet (#987)
- Fix for outputting a vector cube to legacy timeseries.json format. (#342)
- Add gdalinfo json files next to tiff files. (openeo-geotrellis-extensions#352)
- Support exporting assets for different collections to different paths (#867)
- load_collection: bugfix in 'global extent' computation, increases performance (#334)
0.57.0
- Initial support for S3 profiles and tokens during batch job execution (#969)
0.56.0
- Initial support for CWL based processes with Calrissian on a Kubernetes cluster (#936)
0.55.0
- Support file_metadata format option to set file-specific metadata on GTiff output assets (#970)
- load_collection for Sentinel-3 Level2 data: avoid data corruption in specific case (#755)
0.54.0
- export_workspace: experimental support for merging into STAC API (#867)
0.53.1
- export_workspace: fix KeyError: 'alternate' upon merging into existing STAC collection (#677)
0.53.0
- export_workspace: experimental support for merging STAC Collections (#677)
- Add support for orgId in ETL resource/cost reporting (#671)
- load_stac: Align output pixels with source pixels if source data is UTM and has an offset. (#648)
0.52.0
- Throw error when trying to use unsupported target_dimension in aggregate_spatial (#951)
- Allow specifying region name for an ObjectStorageWorkspace (#955)
- Better print ApiException. (#962)
- Include job title by default in user job listings (#963)
- Support pagination of user job listings (#959/Open-EO/openeo-python-driver#332)
0.51.0
- load_stac: omit datetime parameter from STAC API item search request if no temporal_extent specified (#950)
0.50.1
- Fix reduce_dimension of bands for GeoTIFF output in batch job (#943)
0.50.0
- Fix type of ZLEVEL option for GTiff format
- Add filepath_per_band to save_result options. (#877)
- Allow pointing to custom processes with OPENEO_CUSTOM_PROCESSES env var (related to #936)
0.49.1
- Py4j log level is now always 'WARN' to avoid spurious messages.
- Fix removal of multiple original job result assets (#883)
- Fix using asset_per_band with large extents giving partial tiff files (Open-EO/openeo-geotrellis-extensions#329)
0.49.0
- Fix load_stac for collections from stac.terrascope.be (#862)
- Point href of job result asset to workspace URI if original was removed (#883)
0.48.2
- Fix resample_spatial of Sentinel-3 data cube (#920)
0.48.0
- Expose filename_prefix format option for netCDF output assets (#876)
- Make sure OPENEO_BACKEND_CONFIG env var is set in K8s executors
0.47.0
- Support bands_metadata format option to set band-specific scale, offset and other metadata on GTiff output assets (Open-EO/openeo-geotrellis-extensions#317)
0.46.0
- Automatic Python UDF dependency handling: add option to work with ZIP archive instead of full tree in job work folder, to improve performance/stability in contexts where large file trees under the job work folder are not ideal, e.g. FUSE-mounted S3 storage (#845, docs)
0.45.0
- Experimental support for removal of originals of assets exported to workspace (#883)
- A rounding bug was fixed in a downstream library that in specific cases leads to a change in the number of pixel rows/columns in the output. We mainly observe this when the input bounding box is not well aligned to the pixel grid of the Copernicus data. #297
- Fixed an issue where jobs using asset_per_band sometimes returned empty tiff files. (Open-EO/openeo-geotrellis-extensions#329)
0.44.1
- Stream assets from object storage to prevent batch job driver pod from being OOMKilled (eu-cdse/openeo-cdse-infra#278)
0.44.0
- Job tracker: only consider jobs updated in last 2 weeks (#902)
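The two-week window above can be pictured with a small filter. This is only a sketch: the job record layout (a dict with an "updated" timestamp) and the function name `recently_updated` are assumptions, not the job tracker's actual data model.

```python
from datetime import datetime, timedelta, timezone


def recently_updated(jobs, now=None, max_age=timedelta(weeks=2)):
    """Keep only jobs whose (assumed) 'updated' timestamp is within max_age of now."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - max_age
    return [job for job in jobs if job["updated"] >= cutoff]


now = datetime(2024, 6, 30, tzinfo=timezone.utc)
jobs = [
    {"id": "j-old", "updated": datetime(2024, 6, 1, tzinfo=timezone.utc)},
    {"id": "j-new", "updated": datetime(2024, 6, 25, tzinfo=timezone.utc)},
]
print([job["id"] for job in recently_updated(jobs, now=now)])  # ['j-new']
```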
0.43.0
- Support exporting objects to object storage workspace (eu-cdse/openeo-cdse-infra#278)
0.42.0
- Job tracker (still based on DoubleJobRegistry): only consider last 2 weeks of "trackable" jobs (#902)
0.41.0
- quantiles, when used in apply_dimension, was corrected to use the interpolation method that is prescribed by the openEO process definition.
- return STAC Items with valid date/time for time series job results (#852)
- filter_labels now also supported for collections that use sar_backscatter and use the internal Orfeo toolbox based method to compute backscatter on the fly. For example: SENTINEL1_GRD (Open-EO/openeo-geotrellis-extensions#320)
- support mask assets in load_stac (#874)
- align DataCubeParameters with load_collection (#812)
- apply/apply_dimension(dimension='bands'): nodata tiles were removed as an optimization, but this could lead to unexpected results depending on subsequent steps. They are now replaced with a memory efficient implementation. (WorldCereal issue: https://github.com/WorldCereal/worldcereal-classification/issues/141)
- load_collection with an excessive extent (temporal or spatial) will now be blocked to avoid excessive resource usage. This check can be disabled with job_options.do_extent_check=False (#815)
- Mixing bands with signed and unsigned data types could lead to negative values being misrepresented. This is now fixed by using the correct data type for the output.
- Logging output is being reduced to focus on most relevant messages from a user perspective.
- Support multiple export_workspace processes (eu-cdse/openeo-cdse-infra#264)
- Fix export_workspace process not executed in process graph with multiple save_result processes (eu-cdse/openeo-cdse-infra#264)
0.40.1
- Fix load_stac of GeoTiff batch job results by returning compliant GeoJSON in "geometry" of STAC Items. #854
0.40.0
- a new 'python-memory' option allows to more explicitly limit memory usage for UDF's, sar_backscatter and Sentinel3 data loading. The executor-memoryOverhead option can be reduced or removed when using the new python-memory option.
- The default processing chunk size can now be configured for backends. If not set, the default may be determined automatically. We observe that a lower default, like 128 pixels, allows running jobs with less memory. (Open-EO/openeo-geotrellis-extensions#311)
- aggregate_spatial: trying to use the probabilities argument in a single 'quantiles' reducer was throwing an error. (#821)
- sar_backscatter: when a target resolution is provided via resample_spatial, it is now immediately taken into account for computing backscatter, reducing memory usage.
- the temporary folder which is created for aggregate_spatial now contains a timestamp to aid cleanup scripts.
- apply_neighborhood: support applying UDF on cubes without a time dimension
- Add "separate_asset_per_band" to save_result options. Currently, for TIFF only.
- load_stac: presence of eo:bands is no longer a hard requirement; band name defaults to asset key #762.
- Optionally sleep after automatic UDF dependency install (config udf_dependencies_sleep_after_install) (eu-cdse/openeo-cdse-infra#112)
0.39.0
- Correctly apply the method parameter in resample_spatial and resample_cube_spatial, when downsampling to lower resolution, and the sampling is not applied at load time. (Open-EO/openeo-geotrellis-extensions#303)
- Use band names as column name in GeoParquet output (#723)
- Prevent nightly cleaner from failing a job tracker run (eu-cdse/openeo-cdse-infra#166)
- Sentinelhub collections handle non zero nodata better (openeo-geotrellis-extensions#300)
- Support allow_empty_cubes job option (#649)
- Cross-backend jobs: support running main job on backends that lack async_task infrastructure. (#786)
- Support save_result processes in arbitrary subtrees in the process graph i.e. those not necessarily contributing to the final result (#424)
0.38.6
- Increase default resource usage configs for sync processing (eu-cdse/openeo-cdse-infra#158)
0.38.5
- Fix EJR configuration in batch jobs on YARN (#792)
- Improvements to DriverVectorCube support in apply_polygon (Open-EO/openeo-python-driver#287, Open-EO/openeo-python-driver#288, Open-EO/openeo-python-driver#291, #801)
0.38.4
- Fix load_stac from unsigned job results URL in batch jobs (#792)
0.38.3
- Automatically include declared/installed UDF dependencies in PYTHONPATH on YARN deploys (#237)
0.38.2
- Automatically include declared/installed UDF dependencies in PYTHONPATH on K8s deploys (#237)
- Fewer pixels will become nodata in special cases (Open-EO/openeo-geotrellis-extensions#280)
0.38.1
- fix load_stac from MS Planetary Computer STAC API (#784)
0.38.0
- Initial, experimental support for automatically installing declared UDF dependencies (#237)
0.37.2
- load_stac: incorporate STAC Item geometry (#778)
0.37.1
- load_stac from STAC API: fix CRS and resolution of output assets (#781)
0.37.0
- Support additional options in FreeIpaClient.user_add() (eu-cdse/openeo-cdse-infra#56)
0.36.0
- Job tracker: skip jobs where application id can't be found (instead of giving status "error") to be less destructive in distributed contexts with partial replication (related to eu-cdse/openeo-cdse-infra#141)
0.35.0
- Add config zk_job_registry_max_specification_size to set a limit on the size of the process graph items when registering a new batch job with ZkJobRegistry. Jobs with a process graph that is too large will be partially stored in the registry: most metadata will be available, but use cases that try to get the process graph itself will fail with a JobNotFound-like error. This is intended to be combined with ElasticJobRegistry through DoubleJobRegistry to allow ElasticJobRegistry to act as fallback for jobs that are too large for ZkJobRegistry. (related to #498, eu-cdse/openeo-cdse-infra#141)
0.34.0
- load_stac from unsigned job results URL: fix CRS and resolution of output assets (#669)
- Align job registry implementations to omit "process" and "job_options" in user job listings (related to #498, eu-cdse/openeo-cdse-infra#141)
0.33.1
- array_create bugfix to support mixed data types (#287)
0.33.0
- Support correlation ID in job tracker logs (#707)
0.32.0
- Always enable allow_dynamic_etl_api from synchronous processing (drop feature flag) (#531, eu-cdse/openeo-cdse-infra#114)
0.31.1
- Initial support for job_options handling in OpenEoBackendImplementation.request_costs() (#531, eu-cdse/openeo-cdse-infra#114)
0.31.0
- vector_to_raster now returns a valid raster cube (Open-EO/openeo-python-driver#273)
- aggregate_spatial can now be used as input for vector_to_raster (#663)
- raster_to_vector now returns a valid vector cube (Open-EO/openeo-python-driver#276)
- raster_to_vector now includes the pixel values of the output polygons (#578)
- raster_to_vector now adds an id to each polygon including band name, date, and index (#578)
- resample_spatial now has better performance when resampling to a high resolution (#265)
- support property filters for STAC based collections (#460)
- performance improvement when reading from object storage (#740)
- support Microsoft Planetary Computer in load_stac (#760)
0.30.2
- load_stac: fix filtering by Item properties
- Fix common cause of 'TopologyException' errors in vector processing
- sample_by_feature will now use the 'feature_id_property' setting for naming generated assets. #722
0.30.1
- Reinstate ejr_credentials_vault_path config option.
0.30.0
- Remove deprecated and unused ejr_credentials_vault_path config option.
0.29.0
- aggregate_temporal(_period) Performance improvement to use number of bands in metadata rather than computing the band count.
- Added initial implementation of FreeIpaClient (eu-cdse/openeo-cdse-infra#56)
0.28.2
- Mask process now also works when the mask is in a different projection, so resample_cube_spatial is needed in fewer cases.
- Improve resilience by retrying ETL API requests (#720).
0.28.1
- Support excluding Sentinel Hub processing units from usage reporting (openeo-cdse-infra#37).
0.28.0
- Export to JSON is now more robust, supports datetime objects returned by dimension_labels, and will default to the string representation.
- GDAL upgraded to 3.8.4 and Orfeo Toolbox to 8.1.2. This mainly reduces the volume of bytes read from object storage by GDAL. (#571)
- Size of incoming requests is now limited to 2MB by default (Open-EO/openeo-python-driver#254)
- load_stac: support loading multiple netCDF items with a time dimension, as produced with 'sample_by_feature' option
- In batch result STAC metadata proj:shape is fixed to be in Y-X order, as prescribed by the standard. (#693)
- Copy batch job output assets to a workspace with the export_workspace process (#676).
- Support vector cubes loaded from load_url in "sample_by_feature" feature (#700)
- Keep polygons and multipolygons sorted when calling aggregate_spatial (#60)
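For the proj:shape fix in this release, the STAC Projection Extension defines the shape as the number of pixels in Y and then X direction. A minimal sketch, with a hypothetical helper name and a raster size chosen only for illustration:

```python
def projection_shape(width: int, height: int) -> dict:
    """Build the proj:shape property in Y-X order (rows first, then columns).

    Hypothetical helper for illustration only.
    """
    return {"proj:shape": [height, width]}


# A raster that is 1024 pixels wide and 512 pixels high.
print(projection_shape(width=1024, height=512))  # {'proj:shape': [512, 1024]}
```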
0.27.1
- Add timeout to requests towards ETL API to unblock JobTracker (#690).
0.27.0
- Expose "bbox" and "geometry" for spatial STAC Item with netCDF assets (#646)
0.26.2
- MultiEtlApiConfig: don't fail-fast on missing env vars for credentials extraction, just skip with warnings for now
0.26.1
Bugfix
- fix load_stac from unsigned job results URL in batch job (#644)
0.26.0
- Introduce MultiEtlApiConfig to support multiple ETL API configurations (#531)
0.25.0
- The default for the soft-errors job option is now set to 0.1 and made configurable at backend level. This value better recognizes the fact that many EO archives have corrupt files that otherwise break jobs #617.
- Support GeoParquet output format for aggregate_spatial (#623)
Improved datatype conversion
A rather big improvement in this release is the handling of datatypes. OpenEO does not have very explicit rules when it comes to datatypes, and in this implementation, in most cases the datatype was simply preserved, like in most programming languages.
For most users, this resulted in unexpected behaviour, for instance when dividing integer datatypes, or subtracting two unsigned 8 bit numbers, and expecting to get negative values.
This implementation will now try to use wider datatypes when necessary. For instance by switching to floating point when performing a division. This change makes writing formulas more intuitive, and should save time debugging issues.
When there is still a need to get a smaller datatype, users can use the 'linear_scale_range' process. This process for instance will convert to 8 bit unsigned integers if the target range uses integer values and fits in the [0,255] range.
Relevant issues: #225, #581, #601
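The difference between preserving and widening datatypes can be shown with plain Python arithmetic. This is a sketch under stated assumptions: the modulo emulates unsigned 8-bit wraparound, and `linear_scale_range` below is a simplified stand-in that follows the openEO process definition (clip to the input range, then map linearly); it does not reproduce the backend's datatype selection.

```python
def subtract_uint8_preserved(a: int, b: int) -> int:
    """Subtract as unsigned 8-bit arithmetic would: the result wraps modulo 256."""
    return (a - b) % 256


def subtract_widened(a: int, b: int) -> int:
    """Subtract in a wider signed type, so negative results survive."""
    return a - b


def linear_scale_range(x, input_min, input_max, output_min=0.0, output_max=1.0):
    """Clip x to the input range, then map it linearly onto the output range."""
    x = min(max(x, input_min), input_max)
    scale = (output_max - output_min) / (input_max - input_min)
    return (x - input_min) * scale + output_min


print(subtract_uint8_preserved(10, 20))  # 246, the unexpected result
print(subtract_widened(10, 20))          # -10, what most users expect
print(7 // 2, 7 / 2)                     # 3 3.5, integer vs floating point division
# Rounding to an integer is done here only to show a value that fits in [0, 255].
print(round(linear_scale_range(0.2, 0.0, 1.0, 0, 255)))  # 51
```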
0.24.0
- Start using DynamicEtlApiJobCostCalculator in job tracker. Effective ETL API selection strategy is to be configured through EtlApiConfig
Bugfix
- added max_processing_area_pixels custom option to sar_backscatter, avoiding going out of memory when processing too large chunks
0.23.1
Bugfix
- Requests towards Job Registry Elastic API are unreliable; reconsider ZK as primary data store.
0.23.0
Added
- Support disabling ZkJobRegistry (#632)
0.22.3
Bugfix
- Restore batch job result metadata; this reverts the Zookeeper fix introduced in 0.22.2
0.22.2
Bugfix
- Prevent Zookeeper from blocking requests (https://github.com/Open-EO/openeo-geopyspark-driver/pull/639)
0.22.1
Bugfix
- Prevent usage duplication in ETL API (#41)
0.22.0
Added
- Added config use_zk_job_registry to disable ZkJobRegistry usage
Bugfix
- apply_neighborhood: fix error if overlap is null/None (#519)
0.21.5
- Initial implementation of DynamicEtlApiJobCostCalculator and added caching feature to get_etl_api() (#531)
0.21.4
- Support for reading GeoPackage vector data
- move legacy-vs-dynamic ETL selection logic to get_etl_api() (#531)
0.21.3
- job tracker: move app state mapping to CostDetails construction time
0.21.2
- job tracker: pass job_options to JobCostsCalculator through CostDetails (related to #531)
0.21.1
- job tracker: do job info iteration in streaming fashion (instead of loading all job info in memory at once)
0.21.0
- Initial support for dynamic ETL API configuration (#531)
0.20.1
- finetune zookeeper_set.py script and concurrent_pod_limit logic
0.20.0
- Introduce GpsBackendConfig.zookeeper_hosts and GpsBackendConfig.zookeeper_root_path
0.19.4
- eliminate use_etl_api arg from GeoPySparkBackendImplementation in favor of use_etl_api_on_sync_processing config field
- Upgrade GDAL to 3.8.1 and Orfeo Toolbox to 8.1.2 https://github.com/Open-EO/openeo-geopyspark-driver/issues/571
- Performance improvement for apply_dimension with target='bands' (https://github.com/Open-EO/openeo-geotrellis-extensions/issues/235 , https://github.com/Open-EO/openeo-geopyspark-driver/issues/595 )
0.19.3
Feature
- Experimental support for filter_labels: only works with catalog based collections, and when used close to the load_collection call #559
- Support for UDF signature that works directly on XArray DataArray, avoiding the need for openEO specific wrapper class.
- Support filtering on tileId with a wildcard. https://github.com/Open-EO/openeo-opensearch-client/issues/25
Bugfix
- Error fixed when doing aggregate_temporal + merge_cubes https://github.com/Open-EO/openeo-geotrellis-extensions/issues/201
- Avoid 'out-of-memory' errors when writing large netCDF files. Allows files of >600MB without custom memory settings. #199
- netCDF output will generate a more useful warning in case of a mismatch with cube band metadata
0.19.2
0.18.0a1
Removed
- Remove old "v1" job_tracker script (#545)
0.17.0a1
Feature
- date_difference, date_replace_component: support for these two experimental processes https://github.com/Open-EO/openeo-geopyspark-driver/issues/515
- array_apply: provide access to the 'label' and 'index' parameter https://github.com/Open-EO/openeo-geotrellis-extensions/issues/205
Bugfix
- Fix batch job deletion in EJR code path (https://github.com/Open-EO/openeo-python-driver/issues/163, https://github.com/Open-EO/openeo-geopyspark-driver/issues/523, https://github.com/Open-EO/openeo-geopyspark-driver/issues/498)
2023-09-18 (0.9.5a1)
Important change: time intervals are now left closed. Workflows that are sensitive to exact time intervals may need to be updated.
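A left-closed interval includes its start and excludes its end. The sketch below shows that membership test; the function name `in_left_closed_interval` is hypothetical and only illustrates the semantics.

```python
from datetime import datetime


def in_left_closed_interval(t: datetime, start: datetime, end: datetime) -> bool:
    """Left-closed interval test: start is included, end is excluded."""
    return start <= t < end


start, end = datetime(2023, 1, 1), datetime(2023, 2, 1)
print(in_left_closed_interval(datetime(2023, 1, 1), start, end))  # True
print(in_left_closed_interval(datetime(2023, 2, 1), start, end))  # False
```

Workflows that relied on the end date being included can extend the end of the interval to keep the same observations.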
Feature
- load_stac support, allowing to load STAC collections that conform to the mainstream approach for organizing metadata in STAC. (https://github.com/Open-EO/openeo-geopyspark-driver/issues/402)
- First support for UDF's that change resolution, in Python-Jep runtime. (https://github.com/Open-EO/openeo-geotrellis-extensions/pull/197)
- Improved support for running UDF's on vector cubes.
- Support load_geojson and load_url processes to create vector cubes. (https://github.com/Open-EO/openeo-python-driver/issues/211)
- The 'partial' query parameter is now supported when requesting job results, load_stac supports loading unfinished results. (https://github.com/Open-EO/openeo-geopyspark-driver/issues/489)
- Support new (experimental) vector_to_raster process, allowing to combine data from a vector source with EO data. (https://github.com/Open-EO/openeo-geopyspark-driver/issues/423)
Bugfix
- Fixed numerical rounding errors in datacubes that use epsg:4326, which could lead to pixel shift or missing lines of data.
- Fixed a deadlock, causing the backend to 'hang' on requests. (https://github.com/Open-EO/openeo-python-driver/issues/209)
- Time intervals are now left-closed (https://github.com/Open-EO/openeo-geopyspark-driver/issues/34)
- Fixed automatic selection of polarization for SENTINEL1_CARD4L. (https://github.com/Open-EO/openeo-geopyspark-driver/issues/473)
- Reduced memory usage in specific case of apply_neighborhood on a smaller chunk size. (https://github.com/Open-EO/openeo-geotrellis-extensions/issues/191)
- Fixed error with apply_neighborhood on Sentinelhub backed layers. (https://github.com/Open-EO/openeo-geopyspark-driver/issues/434)
- Fix in evaluation of 'mask' process, improving performance. (https://github.com/Open-EO/openeo-python-driver/issues/161)
- Fix wrong datacube organization for sparse cubes in aggregate_temporal(_period) (https://github.com/Open-EO/openeo-geotrellis-extensions/issues/209)
Changed
- Improved performance for small (synchronous) requests. (https://github.com/Open-EO/openeo-geotrellis-extensions/issues/186)
2023-07-30 (0.9.5a1)
Feature
- array_element: Support band selection by label (https://github.com/Open-EO/openeo-geopyspark-driver/issues/43)
- apply_neighborhood: Support applying function over time intervals (https://github.com/Open-EO/openeo-geopyspark-driver/issues/415)
2023-06-30 (0.9.5a1)
Feature
- Add statistics to asset metadata (#391)
- Add array_apply (openeo-geotrellis-extensions/#154)
Changed
- Add filtering on log_level when retrieving logs from Elasticsearch. (Open-EO/openeo-python-driver#170)
- Prevent running UDFs in Spark driver process (#404)
2023-03-30 (0.9.5a1)
Bugfix
- Fix "Permission denied" issue with run_udf usage on vector data cube (#367)
- Fix: Extent in STAC result metadata should be lat lon (#321)
- Single row/line results with SentinelHub (#375)
- Fix: Creodias: Download asset from object storage (S3) before extracting projection metadata (#403)
Changed
- /validation now detects if the amount of pixels that will be processed is too large (#320)
- Add projection extension metadata to batch job results (openeo-geotrellis-extensions/#72)
2023-03-08 (0.9.3a1)
Build 20230307-1166, with components: openeo-geopyspark-0.9.3a1.dev20230307+1073, openeo_driver-0.37.0a1.dev20230307+441, openeo-0.15.0, geotrellis-extensions-static-2.3.0_2.12-SNAPSHOT-5822561
Feature
- Add "filename_prefix" to format_options.
Bugfix
- Area returned by job metadata is now calculated using WGS84 ellipsoid (https://github.com/Open-EO/openeo-python-driver/issues/144)
2023-02-27 (0.7.0a1)
Build 20230221-1118
Note: this deploy was rolled back to previous build 20230117-966 the same day.
- GeoParquet support to allow loading large vector files
- Improved specific log messages
- Better support for multiple filter_spatial processes in same process graph (https://github.com/Open-EO/openeo-geopyspark-driver/issues/147)
- Bugfix for sampling sentinelhub based collections (https://github.com/Open-EO/openeo-geopyspark-driver/issues/279)
- vector_buffer: Throw an error when a negative buffer size results in invalid geometries (https://github.com/Open-EO/openeo-python-driver/issues/164)
- batch jobs now also report usage of credits (https://github.com/Open-EO/openeo-geopyspark-driver/issues/272)
- non-utm collections should now have a better alignment to the original rasters, if the process graph does not apply an explicit resampling (https://github.com/Open-EO/openeo-geotrellis-extensions/issues/69)
2023-02-07 (0.6.7a1)
Build 20230117-966
- Added initial support for the inspect process. It can be used on datacubes and in callbacks.
- The size of a single chunk is now automatically increased for larger jobs, to improve IO performance.
- resample_cube_spatial is no longer needed in all cases when using merge_cubes or mask
- Better detection of duplicate products in source catalogs
- The 'if' process will no longer evaluate the branch that is not accepted https://github.com/Open-EO/openeo-python-driver/issues/109
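The 'if' change above amounts to lazy branch evaluation. As a rough sketch, branches can be modelled as zero-argument callables so that only the selected one runs; this models the idea only and is not the driver's process graph evaluator.

```python
def if_process(value, accept, reject=None):
    """Evaluate only the selected branch; branches are zero-argument callables."""
    branch = accept if value is True else reject
    return branch() if callable(branch) else branch


calls = []


def expensive_branch():
    # Records that it ran, so we can verify it is skipped when not selected.
    calls.append("evaluated")
    return "expensive"


print(if_process(True, lambda: "cheap", expensive_branch))  # cheap
print(calls)  # []
```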
2023-01-20 (0.6.7a1)
- Changed: Getting a job's logs now leaves out log lines that have no loglevel or a level that is not supported. Open-EO/openeo-python-driver#160
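A hedged sketch of the log filtering described above: entries without a level, or with a level outside the set the openEO API defines (error, warning, info, debug), are dropped. The entry layout (a dict with a "level" key) and the function name `filter_logs` are assumptions for illustration.

```python
SUPPORTED_LEVELS = {"error", "warning", "info", "debug"}


def filter_logs(entries):
    """Drop log entries whose level is missing or not supported."""
    return [entry for entry in entries if entry.get("level") in SUPPORTED_LEVELS]


logs = [
    {"id": "1", "level": "info", "message": "job started"},
    {"id": "2", "message": "no level at all"},
    {"id": "3", "level": "trace", "message": "unsupported level"},
]
print([entry["id"] for entry in filter_logs(logs)])  # ['1']
```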
2022-11-28 (0.6.3a1)
- Added an experimental job option 'udf-dependency-archives' to pass on archives of UDF dependencies
2022-10-27 (0.6.3a1)
- Reprojection is performed at load time whenever possible, by pushing down parameters from resample_spatial and resample_cube_spatial
- PROBA-V collections can now be loaded at original resolution
- Overlap between original products is now handled based on the footprint in STAC/Opensearch metadata
- Logging for synchronous jobs is now more complete
- First prototype for running vector data UDF's on Spark
- Bugfix: allow large (multiple GB) CSV output
- Try to avoid going out of memory by reducing default partition size
2022-09-21 (0.6.3a1)
- Expose logging from UDF's
- Feature id's from GeoJSON are used to name timeseries in netCDF export
- NetCDF's are now cropped to provided extent
- Support remote STAC collections
- Sentinelhub usage is now recorded for batch jobs
- "task-cpus" job option to control number of cpu's for a single Spark task. Mostly relevant for UDF's that use multi-threaded libraries such as Tensorflow.
- New processes:
- array_find
- exp
2022-05-04 (0.6.3a1)
- Enable JSON logging from batch_job.py (and inject user_id/job_id)
- New processes:
- predict_catboost (not-standard)
- predict_random_forest
- fit_class_random_forest
- array_interpolate_linear
- Faster sar_backscatter both for large areas and sparse sampling
- STAC metadata for random forest models
- Colormap support in PNG's
- Support custom Sentinelhub collections, e.g. PlanetScope data
- 'soft-errors' job option to allow failure of individual Sentinelhub requests
2022-04-07 (0.6.2a1)
- EP-4012: implement collection source selection based on product availability (e.g. collection "SENTINEL2_L2A" "forwards" to "TERRASCOPE_S2_TOC_V2" when possible, but falls back to "SENTINEL2_L2A_SENTINELHUB" when there are missing products for the selected spatiotemporal extent).
2021-11-17
Feature
- Support load_result
- Allow raster masks to filter a collection before loading any data
- Caching of Sentinelhub data
- Streaming writing of netCDF files
- Support filter_spatial
- Support first and last processes
- Jep based UDF implementation
2021-07-14
Changed
- Add support for openeo.udf based UDFs and keep backward compatibility with openeo_udf based UDFs (EP-3856, #78, #93)
2021-04-08
Feature
- Add support for (multiple) default OIDC clients (for EGI Check-in OIDC provider) (EP-3700, Open-EO/openeo-api#366)
2021-03-30
Feature
- Add support for Sentinelhub layers on different endpoints (e.g. Landsat-8, MODIS)
- In batch jobs, write one geotiff per date as opposed to reducing all dates into a single pixel
- Improved CARD4L metadata generation for atmospheric_correction
2021-03-12
- Fix support for UDPs in batch jobs (EP-3754)
- Fix support for custom processes in batch jobs (EP-3771)
2021-01-26
Feature
Add an experimental resolution_merge for Sentinel-2 based on the implementation in FORCE.
Support reading Copernicus Global Land NetCDF files.
Support the Sentinelhub batch process API to generate Sentinel-1 backscatter data.
The atmospheric_correction process can now apply iCor on SentinelHub layers.
2021-01-25
Feature
- Add implementation of on-the-fly Sentinel1 Backscatter (Sigma0) calculation using Orfeo Toolbox on Creodias (EP-3612)
2020-12-06
Performance
Performance improvement for requests with small spatial extents. The backend was loading too much tile metadata.
2020-11-11
Feature
Support the "if" process: https://processes.openeo.org/#if
Major performance improvements for SentinelHub layers. The UTM projection is now used by default when processing these layers. The datatype is no longer set to float by default.
2020-10-28
Internal
Refactored internal process graph parsing: first to a dry-run processing to extract information that can help loading initial data sources. (EP-3509)
2020-10-14
Feature
Support "PNG" output format (non-indexed only).
2020-10-06
Performance improvement
Geotiff (GTiff) output format is faster to generate, and works for larger areas.
Compatibility
Copernicus projections stored in UTM are now also processed and returned in UTM, as opposed to web mercator. This affects processing parameters that depend on a specific projection, like the size of pixels in map units.
This change also improves memory usage and performance for a number of layers.
openeo-python-driver 0.134.0a4.dev20250429+1119
In progress: 0.134.0
- Introduce asset_url option to allow backend implementations to have custom code for retrieving assets. Default behavior remains unchanged.
- Improve data cube dimension detection in load_stac dry-run (#394)
- Download asset: return NoSuchKey error as 404 Not Found response (Open-EO/openeo-geopyspark-driver#1149)
- Preserve original non-spatial dimensions in resample_cube_spatial dry run (#397)
- Fix compatibility with Shapely 2 (#158)
0.133.0
- Add namespace option to non_standard_process
- Improve API alignment between JobRegistryInterface/ElasticJobRegistry and DoubleJobRegistry (Open-EO/openeo-geopyspark-driver#863, Open-EO/openeo-geopyspark-driver#1123)
- export_workspace: merge "derived_from" links of STAC Collections (Open-EO/openeo-geopyspark-driver#1050)
- Eliminate usage of deprecated datetime.utcnow() (#389)
- Add Content-Range header when streaming job result content from S3 buckets to support byte range downloads
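To illustrate the byte range support above: a response to a request with a Range header carries a Content-Range header of the form bytes start-end/total. A minimal sketch of building that value, assuming a single bytes=start-end range (possibly open-ended). The helper name content_range is hypothetical and this is not the driver's actual code.

```python
import re
from typing import Optional


def content_range(range_header: str, total_size: int) -> Optional[str]:
    """Build a Content-Range value for a single "bytes=start-end" request range.

    Returns None when the header is not a simple, satisfiable byte range.
    Suffix ranges ("bytes=-500") and multi-range requests are not handled here.
    """
    match = re.fullmatch(r"bytes=(\d+)-(\d*)", range_header.strip())
    if not match:
        return None
    start = int(match.group(1))
    # An open-ended range ("bytes=100-") runs to the last byte.
    end = int(match.group(2)) if match.group(2) else total_size - 1
    # Clamp the end to the last available byte.
    end = min(end, total_size - 1)
    if start > end:
        return None
    return f"bytes {start}-{end}/{total_size}"
```

For a 1000-byte asset, a request for bytes=0-99 would yield bytes 0-99/1000.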
0.132.0
- EvalEnv: add openeo_api_version field to replace vague version (#382)
0.131.1
- custom_process_from_process_graph: add option to hide process from public process listing
0.131.0
- ProcessRegistry: add allow_override mode (related to #376)
0.130.0
- Allow customization of GET /process_graphs response. Added UserDefinedProcesses.list_for_user() to replace now deprecated UserDefinedProcesses.get_for_user() (for Open-EO/openeo-aggregator#125)
- Allow customization of GET /collections response. Added AbstractCollectionCatalog.get_collections_listing() to eventually replace AbstractCollectionCatalog.get_all_metadata() (for Open-EO/openeo-aggregator#122)
- Allow customization of GET /processes response (for Open-EO/openeo-aggregator#123)
0.129.0
- array_apply: sub-process should now work on all supported processes (Open-EO/openeo-geopyspark-driver#1064)
- Prevent access to non-public UDPs through URL guessing.
0.128.0
- load_collection/load_stac: spatial_extent requires (Multi)Polygon geometries (Open-EO/openeo-geopyspark-driver#996)
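As an illustration of this constraint, a GeoJSON spatial_extent could be checked along the following lines. This is a sketch only: the function name check_polygonal and the use of ValueError are assumptions, bounding-box extents are not covered, and unwrapping Feature and FeatureCollection objects may or may not match what the backend actually accepts.

```python
def check_polygonal(geojson: dict) -> None:
    """Raise ValueError unless every geometry in `geojson` is a Polygon or MultiPolygon."""
    kind = geojson.get("type")
    if kind == "FeatureCollection":
        # Check each feature of the collection in turn.
        for feature in geojson.get("features", []):
            check_polygonal(feature)
    elif kind == "Feature":
        # A feature wraps a single geometry (which may be missing).
        check_polygonal(geojson.get("geometry") or {})
    elif kind not in {"Polygon", "MultiPolygon"}:
        raise ValueError(f"Unsupported geometry type for spatial_extent: {kind!r}")


square = {
    "type": "Polygon",
    "coordinates": [[[5.0, 51.0], [5.1, 51.0], [5.1, 51.1], [5.0, 51.1], [5.0, 51.0]]],
}
check_polygonal(square)  # passes silently
```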
0.127.0
- Add simple_job_progress_estimation config for simple job progress estimation (Open-EO/openeo-geopyspark-driver#772)
- OpenEoBackendConfig: be more forgiving about unknown config keys to better support use cases that involve backward/forward incompatible configurations (#322)
0.126.0
- Add STAC collections conformance class (#195)
- update openeo_driver/specs/openeo-api/1.x submodule to tag 1.2.0 (#195)
- Extract job option defaults from UDPs and remote process descriptions (#366, Process Parameter Extension)
0.125.0
- Add log level to batch job logs response (#195)
0.124.0
- Better argument validation in resample_spatial/resample_cube_spatial (related to Open-EO/openeo-python-client#690)
- Improve resample_spatial/resample_cube_spatial metadata tracking in dry-run (#348)
- load_collection/load_stac: support parameters in properties (#327)
0.123.0
- Add time resolution to date prefix of generate_unique_id()
- Add target version of openEO processes to GET /processes (#352, Open-EO/openeo-api#549)
0.122.0
- load_collection: more consistent cube extent handling when a buffer is applied. (#334)
- load_collection: collapse multiple load_collection calls into a single one in cases with buffers. (#336)
- export_workspace: fix KeyError: 'alternate' upon merging into existing STAC collection (Open-EO/openeo-geopyspark-driver#677)
- Support custom default in FlaskRequestCorrelationIdLogging.get_request_id()
0.121.0
- export_workspace: experimental support for merging STAC Collections (Open-EO/openeo-geopyspark-driver#677)
0.120.0
- mask: also apply at load time when resample_spatial is used
- NDVI process: correctly handle band dimension as part of dry run
- Introduce support for user job pagination (#332)
0.119.0
- load_stac: allow omitting datetime parameter from STAC API item search request if no temporal_extent specified (Open-EO/openeo-geopyspark-driver#950)
0.118.0
- Add openeo_driver.config.load.exec_py_file (related to Open-EO/openeo-geopyspark-driver#936)
0.116.0
- Propagate alternate hrefs of job result assets (Open-EO/openeo-geopyspark-driver#883)
- Ensure that a top level UDF can return a DriverVectorCube. Previously it only returned a JSONResult (#323)
0.115.0
- Support pointing href of job result asset to workspace URI (Open-EO/openeo-geopyspark-driver#883)
- Fix saving DriverVectorCube to GeoParquet (#300)
0.114.0
- Support removing original assets exported to workspace (Open-EO/openeo-geopyspark-driver#883)
0.113.0
- Add max_updated_ago to JobRegistryInterface.list_active_jobs API (Open-EO/openeo-geopyspark-driver#902)
0.112.0
- Support exporting objects to object storage workspace (eu-cdse/openeo-cdse-infra#278)
- Move ObjectStorageWorkspace implementation to openeo-geopyspark-driver (eu-cdse/openeo-cdse-infra#278)
0.111.1
- Remove JobRegistryInterface.list_trackable_jobs API (Open-EO/openeo-geopyspark-driver#902)
0.111.0
- Add has_application_id argument to JobRegistryInterface.list_active_jobs in preparation to eliminate list_trackable_jobs (Open-EO/openeo-geopyspark-driver#902)
0.110.0
- Add max_age support to ElasticJobRegistry.list_trackable_jobs (Open-EO/openeo-geopyspark-driver#902)
0.109.0
- Support multiple export_workspace processes (eu-cdse/openeo-cdse-infra#264)
- Fix export_workspace process not executed in process graph with multiple save_result processes (eu-cdse/openeo-cdse-infra#264)
- Restore deterministic evaluation of process graph with multiple end nodes
0.108.0
- Added support for apply_vectorcube UDF signature in run_udf_code (Open-EO/openeo-geopyspark-driver#881, https://github.com/Open-EO/openeo-geopyspark-driver/issues/881)
0.107.8
- add check_config_definition helper to check definition of OpenEoBackendConfig based configs
0.107.7
- return STAC Items with valid date/time for time series job results (Open-EO/openeo-geopyspark-driver#852)
0.107.6
- support passing the output of raster_to_vector to aggregate_spatial during dry run (EU-GRASSLAND-WATCH/EUGW#7)
- support vector_to_raster of geometries not in EPSG:4326 (EU-GRASSLAND-WATCH/EUGW#7)
0.107.5
- Return compliant GeoJSON from DriverVectorCube#get_bounding_box_geojson (Open-EO/openeo-geopyspark-driver#854)
0.107.4
- Don't require a final_result entry in the EvalEnv in convert_node (openeo-aggregator#151)
0.107.3
- Support save_result processes in arbitrary subtrees in the process graph, i.e. those not necessarily contributing to the final result (Open-EO/openeo-geopyspark-driver#424)
0.107.2
- Fix default level of inspect process (defaults to info) (Open-EO/openeo-geopyspark-driver#424)
- apply_polygon: add support for geometries argument (in addition to legacy, but still supported polygons) (Open-EO/openeo-processes#511)
0.107.1
- Update to "remote-process-definition" extension (originally called "remote-udp") (#297, Open-EO/openeo-api#540)
0.107.0
- evaluate_process_from_url: drop support for URL guessing from folder-like URL (#297)
- evaluate_process_from_url: align with new (and experimental) "remote-udp" extension (#297)
0.106.0
- Add API to define conformance classes to OpenEoBackendImplementation
0.105.0
- Require at least werkzeug>=3.0.3 (#281)
0.104.0
- Expose CSV/GeoParquet output assets as STAC items (Open-EO/openeo-geopyspark-driver#787)
0.103.2
- Start warning about deprecated evaluate_process_from_url usage (eu-cdse/openeo-cdse-infra#167)
0.103.0, 0.103.1
- Add helper for finding changelog path
0.102.2
- Support DriverVectorCube in apply_polygon (#287)
0.102.0
- Emit "in" operator (Open-EO/openeo-opensearch-client#32, Open-EO/openeo-geopyspark-driver#776)
0.101.0
- Add simple enum AUTHENTICATION_METHOD for User.internal_auth_data.get("authentication_method") values
0.100.0
- Rename BatchJobLoggingFilter to the more generally applicable GlobalExtraLoggingFilter
0.99.0
- Support job_options in synchronous processing (experimental) (related to Open-EO/openeo-geopyspark-driver#531, eu-cdse/openeo-cdse-infra#114)
0.98.0
- Add job_options argument to OpenEoBackendImplementation.request_costs() API. It's optional and unused for now, but allows openeo-geopyspark-driver to adapt already. (related to Open-EO/openeo-geopyspark-driver#531, eu-cdse/openeo-cdse-infra#114)
0.97.0
- Remove deprecated and now unused user_id argument from OpenEoBackendImplementation.request_costs() (cleanup related to Open-EO/openeo-geopyspark-driver#531)
0.96.2
- Decreased default ttl in ClientCredentialsAccessTokenHelper to 5 minutes
0.96.1
- Fix delete in EJR CLI app
0.96.0
- Add rudimentary multi-project changelog support
0.95.2
- Automatically add job_id and user_id to all logs during job start handling (#214, eu-cdse/openeo-cdse-infra#56)
0.95.1
- Enable ExtraLoggingFilter by default from get_logging_config (#214)
0.95.0
0.94.2
- Fix dry run flow for aggregate_spatial, run_udf, and vector_to_raster (#276).
0.94.1
- Improve resilience by retrying EJR search requests (Open-EO/openeo-geopyspark-driver#720).
0.93.0
- For client credentials: use OIDC "sub" identifier as user_id instead of config based mapping to be compatible with ETL API reporting requirements (Open-EO/openeo-geopyspark-driver#708)
0.92.0
- Reinstate the werkzeug<3 constraint. Apparently too many deployments are stuck with a very low Flask version, which is not compatible with Werkzeug 3 (#243). Pinning this down in openeo-python-driver is unfortunately the most feasible solution for now.
0.91.0
- Support export_workspace process and DiskWorkspace implementation (Open-EO/openeo-geopyspark-driver#676)
0.90.1
- Fix picking up flask_settings from OpenEoBackendConfig. This introduces/enables a default maximum request size (MAX_CONTENT_LENGTH) of 2MB (#254)
0.90.0
- Drop werkzeug<3 constraint (#243)
0.89.0
- Bump Werkzeug dependency to at least 2.3.8 (but below 3.0.0) for security issue (#243)
0.88.0
- job metadata: remove un-official "file:nodata" field (Open-EO/openeo-geopyspark-driver#588)
0.86.0
- Eliminate need to subclass ConfigGetter
0.85.0
- Expose mapping of job status to partial job status (Open-EO/openeo-geopyspark-driver#644)
0.84.0
- Support GeoParquet output format for aggregate_spatial (Open-EO/openeo-geopyspark-driver#623)
0.83.0
- Add Processing.verify_for_synchronous_processing API (#248)
0.82.0
- Support EJR replacing ZkJobRegistry
0.81.0
- ~~Block sync request with too large extent. Use batch-job instead for those. (Open-EO/openeo-geopyspark-driver#616)~~
0.80.0
- Add User argument to GpsBatchJobs.create_job()
0.79.0
- Disable basic auth support by default (#90)
0.78.0
- OpenEoBackendConfig: make showing stack trace on _load configurable
0.77.4
- Flag /openeo/1.2 API version as production ready (#195)
0.77.2
- fixup "polygons" argument of "apply_polygon" (#229)
0.76.1
- Attempt to workaround issue with in-place process graph modification and process registry process spec JSON (re)encoding (Open-EO/openeo-geopyspark-driver#567)
0.76.0
- Add OpenEoBackendConfig.deploy_env
0.75.0
- Move enable_basic_auth/enable_oidc_auth to OpenEoBackendConfig
0.73.0
- add ClientCredentials.from_credentials_string()
0.72.3
- Improve request id logging when log collection failed (Open-EO/openeo-geopyspark-driver#546)
0.72.2
- use yymmdd prefix in job/req ids for now
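A date-prefixed identifier of this kind could be generated roughly as follows. The function name make_prefixed_id, the "j" prefix and the overall layout of the ID are assumptions for illustration; the driver's actual ID generation may differ.

```python
import uuid
from datetime import datetime, timezone
from typing import Optional


def make_prefixed_id(prefix: str = "j", now: Optional[datetime] = None) -> str:
    """Return an id like "j-230517<32 hex chars>" with a yymmdd date prefix (UTC)."""
    now = now or datetime.now(timezone.utc)
    date_part = now.strftime("%y%m%d")
    # A random UUID keeps ids unique within the same day.
    return f"{prefix}-{date_part}{uuid.uuid4().hex}"


example_id = make_prefixed_id("j", now=datetime(2023, 5, 17, tzinfo=timezone.utc))
```

Putting the date first makes IDs roughly sortable by creation day, which helps when scanning logs.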
0.72.1
- Add access_token introspection result (when enabled) to User.internal_auth_data
0.72.0
- Start returning "OpenEO-Costs-experimental" header on synchronous processing responses
- Extract client credentials access token fetch logic from ElasticJobRegistry into ClientCredentialsAccessTokenHelper to make it reusable (e.g. for ETL API as well) (Open-EO/openeo-geopyspark-driver#531)
0.71.0
- OpenEoBackendImplementation.request_costs(): add support for passing User object (related to Open-EO/openeo-geopyspark-driver#531)
0.70.0
- Initial support for openeo-processes v2.0, when requesting version 1.2 of the openEO API (#195)
- Drop support for 0.4 version of openeo-processes (#47)
0.69.1
- Add backoff to ensure EJR deletion (#163)
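The general idea of retrying a deletion with backoff can be sketched as below. This is a generic pattern under stated assumptions, not the EJR client's actual logic: the function name, the delete and still_exists callables and the delay values are all hypothetical.

```python
import time
from typing import Callable, Iterable


def delete_with_backoff(
    delete: Callable[[], None],
    still_exists: Callable[[], bool],
    delays: Iterable[float] = (0.1, 0.5, 1.0, 5.0),
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call `delete` until `still_exists` reports False, waiting longer after each attempt.

    Returns True when the item is confirmed gone, False when all attempts are used up.
    """
    for delay in delays:
        delete()
        if not still_exists():
            return True
        # Wait before the next attempt; the delays grow with each retry.
        sleep(delay)
    return not still_exists()
```

Passing sleep as a parameter keeps the helper easy to test without real waiting.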
0.69.0
- Support job deletion in EJR (#163)