mluttikh
xml2arrow

Performance History

Latest Results

perf: skip attribute decode/unescape for entity-free values

Attribute values were unconditionally run through decode_and_unescape_value, costing a per-attribute decode + unescape even for plain UTF-8 values with no entity references.

Mirror the Event::Text path: append the raw attribute bytes directly and only fall back to full decode/unescape when an entity ('&') is present. Utf8 fields are validated once at row finalization anyway, and numeric fields parse straight from bytes.

Also mark parse_attributes #[inline(never)] so the added check does not bloat the shared handle_event dispatch loop (kept the hot Start/Text/End path compact and avoided a code-layout regression).

Measured -3.2% to -4.2% on the attribute-heavy parse_small/parse_medium benches (both buffered and zero-copy, p<0.01).
perf/skip-attribute-decode
9 days ago
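The fast path described in this commit can be sketched roughly as follows. This is an illustration only, not the xml2arrow code: `append_attr_value` and `unescape_slow` are hypothetical names, and the slow path here handles just the five predefined XML entities instead of calling quick-xml.

```rust
/// Hypothetical stand-in for a full decode + unescape step. It handles only
/// the five predefined XML entities and copies everything else unchanged.
fn unescape_slow(raw: &[u8], out: &mut Vec<u8>) {
    const ENTITIES: [(&str, u8); 5] = [
        ("&amp;", b'&'),
        ("&lt;", b'<'),
        ("&gt;", b'>'),
        ("&quot;", b'"'),
        ("&apos;", b'\''),
    ];
    let mut i = 0;
    'outer: while i < raw.len() {
        if raw[i] == b'&' {
            for &(name, ch) in ENTITIES.iter() {
                if raw[i..].starts_with(name.as_bytes()) {
                    out.push(ch);
                    i += name.len();
                    continue 'outer;
                }
            }
        }
        // Not a recognized entity: copy the byte as-is.
        out.push(raw[i]);
        i += 1;
    }
}

/// Appends an attribute value to `out`. Values without '&' cannot contain an
/// entity reference, so their raw bytes are appended directly.
/// Returns true when the fast path was taken.
fn append_attr_value(raw: &[u8], out: &mut Vec<u8>) -> bool {
    if raw.contains(&b'&') {
        unescape_slow(raw, out);
        false
    } else {
        out.extend_from_slice(raw);
        true
    }
}

fn main() {
    let mut buf = Vec::new();
    let fast = append_attr_value(b"plain-value", &mut buf);
    println!("fast path: {}, bytes: {:?}", fast, String::from_utf8_lossy(&buf));

    buf.clear();
    let fast = append_attr_value(b"a &amp; b", &mut buf);
    println!("fast path: {}, bytes: {:?}", fast, String::from_utf8_lossy(&buf));
}
```

The single byte scan for '&' is cheap compared with an unconditional decode and unescape, which is presumably where the measured gain on attribute-heavy inputs comes from.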
chore: Release v0.17
release_v0_17
10 days ago
perf: add reusable Parser to amortize path-trie compilation (#83)

* perf: add opt-in validate_closing_tags to skip end-tag checks

  Add ParserOptions::validate_closing_tags (defaults to true, preserving prior behavior). Setting it false disables quick-xml's per-end-tag name validation, since PathTracker already enforces nesting via depth tracking.

  Measured ~2-6% throughput gain across parse_small, parse_medium, and parse_wide_fanout (all p<0.01). The trade-off is that opening/closing-tag mismatches are no longer rejected, so the fast path is opt-in for trusted inputs only.

  Also factor the shared reader configuration into configure_reader() so both entry points apply ParserOptions identically.

* perf: add opt-in validate_attributes to skip duplicate-attribute check

  Add ParserOptions::validate_attributes (defaults to true, preserving current behavior). When false, the attribute iterator runs with quick-xml's duplicate-key detection disabled via with_checks(false).

  Beyond skipping an O(n^2) scan of an element's attributes, this removes a heap allocation quick-xml otherwise makes per attribute-bearing element (it records each seen key's byte range in a Vec). On attribute-heavy documents that allocation dominates the check's cost.

  Measured on the attribute-heavy benches (clean A/B, only the config bool flipped):

    parse_small  / buffered   -6.5% (p=0.00)
    parse_small  / zero_copy  -6.2% (p=0.00)
    parse_medium / buffered   -4.9% (p=0.00)
    parse_medium / zero_copy  -7.1% (p=0.00)

  The trade-off mirrors validate_closing_tags: a duplicated attribute is no longer rejected. Because field values accumulate by appending, a duplicate's values are concatenated rather than reported as an error, so the fast path is opt-in for trusted inputs only.

  Covered by test_validate_attributes_false_still_parses_attributes and test_validate_attributes_false_tolerates_duplicate_attribute.

* perf: add reusable Parser to amortize path-trie compilation

  parse_xml/parse_xml_slice rebuild the PathRegistry trie and re-validate the Config on every call, a fixed cost (~8.5us here) paid before any XML is read. On large documents it is amortized to nothing, but on small ones it dominates: a measured ~33% of total parse time on a 2KB document.

  Introduce a `Parser` type that compiles the config + path registry once and exposes parse()/parse_slice() so callers processing many documents with one schema pay that cost a single time. The existing free functions become thin wrappers over a throwaway Parser, so the public API is purely additive and behavior is unchanged.

  Internally the per-parse converter now borrows the registry from the Parser rather than owning it, and builds only the fresh Arrow builders each parse.

  Adds a parse_tiny benchmark (with reused-Parser variants) to guard the setup cost, and an integration test proving no state leaks between documents parsed through one Parser.

* docs: document reusable Parser for many-document workloads

* style: rustfmt run_parse signature
main
10 days ago
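The compile-once, parse-many pattern from this commit can be illustrated with a toy model. Everything below is an assumption made for illustration: the `PathRegistry` here is a plain map and not a trie, the input is a made-up "path=value" line format and not XML, and `parse_once` is a hypothetical name. None of it reflects the actual xml2arrow signatures.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a compiled path registry:
/// maps a field path to its output column index.
struct PathRegistry {
    columns: HashMap<String, usize>,
}

impl PathRegistry {
    /// Models the fixed setup cost that is worth paying once per schema.
    fn compile(paths: &[&str]) -> Self {
        let columns = paths
            .iter()
            .enumerate()
            .map(|(i, p)| (p.to_string(), i))
            .collect();
        PathRegistry { columns }
    }
}

/// Toy reusable parser: owns the registry and builds fresh output per parse.
struct Parser {
    registry: PathRegistry,
}

impl Parser {
    fn new(paths: &[&str]) -> Self {
        Parser { registry: PathRegistry::compile(paths) }
    }

    /// Parses the toy "path=value" format into one row. The row is created
    /// inside this call, so nothing carries over between documents.
    fn parse(&self, doc: &str) -> Vec<Option<String>> {
        let mut row = vec![None; self.registry.columns.len()];
        for line in doc.lines() {
            if let Some((path, value)) = line.split_once('=') {
                if let Some(&col) = self.registry.columns.get(path) {
                    row[col] = Some(value.to_string());
                }
            }
        }
        row
    }
}

/// One-shot wrapper that builds a throwaway Parser on every call.
fn parse_once(paths: &[&str], doc: &str) -> Vec<Option<String>> {
    Parser::new(paths).parse(doc)
}

fn main() {
    let paths = ["/a/b", "/a/c"];

    // Reused parser: the registry is compiled a single time.
    let parser = Parser::new(&paths);
    for doc in ["/a/b=1\n/a/c=2", "/a/c=3"] {
        println!("{:?}", parser.parse(doc));
    }

    // One-shot call: same result, but the setup cost is paid again.
    println!("{:?}", parse_once(&paths, "/a/b=4"));
}
```

Keeping the one-shot function as a thin wrapper over a throwaway `Parser` means both entry points share one code path, which matches the commit's description of an additive API with unchanged behavior.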
style: rustfmt run_parse signature
perf/reuse-parser
10 days ago

Latest Branches

CodSpeed Performance Gauge
+4%
perf: skip attribute decode/unescape for entity-free values #85
9 days ago
587540a
perf/skip-attribute-decode
CodSpeed Performance Gauge
0%
chore: Release v0.17 #84
10 days ago
f9d475f
release_v0_17