weiji14/foss4g2025

:alembic: Benchmark nvTIFF CUDA GPU-based decoding

#4 (Merged)
Comparing bench/cudacogreader (81202a8) with main (43809ea)
CodSpeed Performance Gauge: -23%
Regressions: 1
Untouched: 1

Benchmarks

Failed

2_async-tiff_CPU_threads=4[Sentinel-2 TCI]
benches/read_geotiff.rs::benches::criterion_benchmark::read_geotiff
Status: Regression
CodSpeed Performance Gauge: -23%
Base: 12.4 s, Head: 16.1 s
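
The -23% gauge value is consistent with dividing the base time by the head time and subtracting one. The sketch below works through that arithmetic for the two timings reported here. The function name `gauge_percent` is hypothetical, and the formula is inferred from the numbers on this page, not taken from CodSpeed's documentation.

```rust
/// Relative change in speed, in percent, from base and head wall-clock times.
/// Negative values mean the head commit is slower than the base.
/// Assumption: formula inferred from the figures in this report.
fn gauge_percent(base_secs: f64, head_secs: f64) -> f64 {
    (base_secs / head_secs - 1.0) * 100.0
}

/// Round to the nearest whole percent, as the report displays it.
fn gauge_rounded(base_secs: f64, head_secs: f64) -> i64 {
    gauge_percent(base_secs, head_secs).round() as i64
}
```

With the timings above, `gauge_rounded(12.4, 16.1)` evaluates to -23 for the async-tiff benchmark, and `gauge_rounded(7.9, 7.9)` evaluates to 0 for the unchanged gdal benchmark.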

Passed

1_gdal_CPU_threads=4[Sentinel-2 TCI]
benches/read_geotiff.rs::benches::criterion_benchmark::read_geotiff
CodSpeed Performance Gauge: 0%
Base: 7.9 s, Head: 7.9 s

Commits

Base
main
43809ea
-0.15%
:alembic: Benchmark nvTIFF CUDA GPU-based decoding
Run benchmarks reading the LZW-compressed GeoTIFF to CUDA GPU memory via DLPack, using cog3pio's CudaCogReader, which uses bindings to the nvTIFF library.
f8ab974
2 days ago
by weiji14
-14.84%
:truck: Perform host to device copy for CPU benchmarks
When the 'cuda' feature flag is enabled, copy decoded bytes from host (CPU) to device (GPU) to allow a fair comparison with the nvTIFF benchmark, where data resides in CUDA memory. Well, not exactly fair since nvTIFF is winning, but this is still needed. Note that async-tiff's decoded byte length seems longer than expected, not sure why... Added some extra docs and links to the main README.md too.
dea2b8e
2 days ago
by weiji14
-8.06%
:recycle: Collapse async-tiff tile decode into single flat_map_iter call
No need for separate `.flat_map` and `.map` calls. Apparently Bytes can be coerced into u8 directly. Still need to figure out if there's a more efficient way of multi-threaded decoding to raw bytes though.
81202a8
2 days ago
by weiji14
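
The refactor described in commit 81202a8 can be illustrated with a minimal sketch. It is not the benchmark's actual code: it uses standard-library iterators instead of a parallel `flat_map_iter`, a hypothetical `Tile` struct holding a `Vec<u8>` instead of `Bytes`, and made-up function names. It only shows that a separate `.flat_map` followed by `.map` can be folded into one `flat_map` call that yields `u8` values directly.

```rust
/// Hypothetical stand-in for one decoded tile (the real code uses Bytes).
struct Tile {
    bytes: Vec<u8>,
}

/// Two-step version: flatten to byte references, then map each to an owned u8.
fn flatten_two_step(tiles: &[Tile]) -> Vec<u8> {
    tiles
        .iter()
        .flat_map(|tile| tile.bytes.iter())
        .map(|byte| *byte)
        .collect()
}

/// Collapsed version: a single flat_map that already yields owned u8 values.
fn flatten_one_step(tiles: &[Tile]) -> Vec<u8> {
    tiles
        .iter()
        .flat_map(|tile| tile.bytes.iter().copied())
        .collect()
}
```

Both functions return the same bytes in the same tile order; the collapsed form just removes one adapter from the chain.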