Optimize `save_report_lines` a bit
This avoids some intermediate allocations, in particular for the `flag_map` case, and removes one further allocation.
In the other cases, it is unfortunately not possible to avoid further intermediate allocations.
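
As a rough illustration of the kind of change meant here (not the actual `save_report_lines` code), the sketch below contrasts building a temporary `String` per entry with writing straight into one output buffer. The function names and the `(name, hits)` entry shape are hypothetical and chosen only for this example.

```rust
use std::fmt::Write;

// Hypothetical baseline: allocates one String per entry, a Vec to hold
// them, and a final joined String.
fn format_entries_allocating(entries: &[(&str, u32)]) -> String {
    let parts: Vec<String> = entries
        .iter()
        .map(|(name, hits)| format!("{name}={hits}"))
        .collect();
    parts.join(",")
}

// Hypothetical optimized variant: formats each entry directly into the
// caller's buffer, so no per-entry String or temporary Vec is created.
fn format_entries_into(entries: &[(&str, u32)], out: &mut String) {
    for (i, (name, hits)) in entries.iter().enumerate() {
        if i > 0 {
            out.push(',');
        }
        // Writing into a String cannot fail, so the Result is ignored.
        let _ = write!(out, "{name}={hits}");
    }
}
```

Both functions produce the same text; the second one only grows the buffer it is given, which also lets a caller reuse that buffer across calls.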
Migrate benchmarks to criterion and set up `codspeed`
This migrates the pyreport benchmarks to `criterion` via the codspeed compatibility layer.
Additionally, this creates a CI job that runs the benchmarks within the codspeed runner and uploads the results.