perf: add reusable Parser to amortize path-trie compilation (#83)
* perf: add opt-in validate_closing_tags to skip end-tag checks
Add ParserOptions::validate_closing_tags (defaults to true, preserving
prior behavior). Setting it false disables quick-xml's per-end-tag name
validation, since PathTracker already enforces nesting via depth
tracking. Measured ~2-6% throughput gain across parse_small,
parse_medium, and parse_wide_fanout (all p<0.01).
The trade-off is that opening/closing-tag mismatches are no longer
rejected, so the fast path is opt-in for trusted inputs only.
Also factor the shared reader configuration into configure_reader() so
both entry points apply ParserOptions identically.
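The difference between depth tracking and end-tag name validation can be illustrated with a small, self-contained sketch. It is not the crate's PathTracker or quick-xml's check: `Event`, `check_depth_only`, and `check_names` are hypothetical names used only for illustration, and the events stand in for whatever the reader would emit.

```rust
/// Toy tag events; a stand-in for what an XML reader would emit (hypothetical).
#[derive(Debug, Clone, Copy)]
pub enum Event<'a> {
    Start(&'a str),
    End(&'a str),
}

/// Depth-only tracking: every End pops one level, whatever its name.
/// Fails only when the input closes more than it opens, or leaves elements open.
pub fn check_depth_only(events: &[Event<'_>]) -> Result<(), String> {
    let mut depth: usize = 0;
    for &ev in events {
        match ev {
            Event::Start(_) => depth += 1,
            Event::End(name) => {
                depth = depth
                    .checked_sub(1)
                    .ok_or_else(|| format!("unexpected </{name}>"))?;
            }
        }
    }
    if depth == 0 {
        Ok(())
    } else {
        Err(format!("{depth} unclosed element(s)"))
    }
}

/// Name validation: keep a stack of open names and compare on every End.
pub fn check_names<'a>(events: &[Event<'a>]) -> Result<(), String> {
    let mut open: Vec<&'a str> = Vec::new();
    for &ev in events {
        match ev {
            Event::Start(name) => open.push(name),
            Event::End(name) => match open.pop() {
                Some(expected) if expected == name => {}
                Some(expected) => {
                    return Err(format!("expected </{expected}>, found </{name}>"));
                }
                None => return Err(format!("unexpected </{name}>")),
            },
        }
    }
    if open.is_empty() {
        Ok(())
    } else {
        Err(format!("{} unclosed element(s)", open.len()))
    }
}
```

Given the sequence `<a><b></a></b>`, `check_names` reports a mismatch while `check_depth_only` accepts it, which is the kind of input the fast path no longer rejects.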
* perf: add opt-in validate_attributes to skip duplicate-attribute check
Add ParserOptions::validate_attributes (defaults to true, preserving
current behavior). When false, the attribute iterator runs with
quick-xml's duplicate-key detection disabled via with_checks(false).
Beyond skipping an O(n^2) scan of an element's attributes, this removes
a heap allocation quick-xml otherwise makes per attribute-bearing
element (it records each seen key's byte range in a Vec). On
attribute-heavy documents that allocation dominates the check's cost.
Measured on the attribute-heavy benches (clean A/B, only the config
bool flipped):
    parse_small  / buffered   -6.5% (p=0.00)
    parse_small  / zero_copy  -6.2% (p=0.00)
    parse_medium / buffered   -4.9% (p=0.00)
    parse_medium / zero_copy  -7.1% (p=0.00)
The trade-off mirrors validate_closing_tags: a duplicated attribute is
no longer rejected. Because field values accumulate by appending, a
duplicate's values are concatenated rather than reported as an error,
so the fast path is opt-in for trusted inputs only.
Covered by test_validate_attributes_false_still_parses_attributes and
test_validate_attributes_false_tolerates_duplicate_attribute.
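A rough sketch of the trade-off, under stated assumptions: `first_duplicate` and `collect_fields` are hypothetical helpers, not quick-xml's or this crate's code. The sketch stores key slices where the commit says quick-xml records byte ranges, and it concatenates duplicate values with no separator, which is an assumption about how appending behaves.

```rust
/// Quadratic duplicate check: compare each key against every key seen so far.
/// `seen` models the per-element allocation described above (hypothetical helper).
pub fn first_duplicate<'a>(attrs: &[(&'a str, &'a str)]) -> Option<&'a str> {
    let mut seen: Vec<&'a str> = Vec::new();
    for &(key, _) in attrs {
        if seen.contains(&key) {
            return Some(key);
        }
        seen.push(key);
    }
    None
}

/// Accumulate attribute values per field by appending.
/// With `validate == false` the duplicate check is skipped, so a repeated key
/// has its values concatenated instead of producing an error.
pub fn collect_fields(
    attrs: &[(&str, &str)],
    validate: bool,
) -> Result<Vec<(String, String)>, String> {
    if validate {
        if let Some(key) = first_duplicate(attrs) {
            return Err(format!("duplicate attribute `{key}`"));
        }
    }
    let mut fields: Vec<(String, String)> = Vec::new();
    for &(key, value) in attrs {
        match fields.iter().position(|(k, _)| k.as_str() == key) {
            Some(i) => fields[i].1.push_str(value),
            None => fields.push((key.to_string(), value.to_string())),
        }
    }
    Ok(fields)
}
```

For an element such as `<row id="1" name="a" id="2"/>`, the validating path returns an error, while the non-validating path in this sketch yields `id = "12"`.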
* perf: add reusable Parser to amortize path-trie compilation
parse_xml/parse_xml_slice rebuild the PathRegistry trie and re-validate
the Config on every call — a fixed cost (~8.5us here) paid before any XML
is read. On large documents it is amortized to nothing, but on small ones
it dominates: a measured ~33% of total parse time on a 2KB document.
Introduce a `Parser` type that compiles the config + path registry once
and exposes parse()/parse_slice() so callers processing many documents
with one schema pay that cost a single time. The existing free functions
become thin wrappers over a throwaway Parser, so the public API is purely
additive and behavior is unchanged.
Internally the per-parse converter now borrows the registry from the
Parser rather than owning it, and builds only the fresh Arrow builders
each parse. Adds a parse_tiny benchmark (with reused-Parser variants) to
guard the setup cost, and an integration test proving no state leaks
between documents parsed through one Parser.
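The compile-once shape can be sketched as follows. This is a toy model, not the crate's API: `Config`, `PathRegistry`, and `Parser` here are simplified stand-ins with assumed fields and signatures, a "document" is just a list of paths, `parse_once` is a hypothetical name for the free-function wrapper, and the per-parse state is a plain count vector rather than Arrow builders.

```rust
use std::collections::HashMap;

/// Simplified stand-in for the crate's Config: the paths to extract (assumed shape).
pub struct Config {
    pub paths: Vec<String>,
}

/// Simplified stand-in for the compiled path registry.
pub struct PathRegistry {
    index: HashMap<String, usize>,
}

impl PathRegistry {
    /// The fixed setup cost: validate the config and build the lookup structure.
    fn compile(config: &Config) -> Result<Self, String> {
        let mut index = HashMap::new();
        for (i, path) in config.paths.iter().enumerate() {
            if path.is_empty() {
                return Err("empty path".to_string());
            }
            if index.insert(path.clone(), i).is_some() {
                return Err(format!("duplicate path `{path}`"));
            }
        }
        Ok(Self { index })
    }
}

/// Owns the compiled registry so it can be reused across documents.
pub struct Parser {
    registry: PathRegistry,
}

impl Parser {
    pub fn new(config: &Config) -> Result<Self, String> {
        Ok(Self {
            registry: PathRegistry::compile(config)?,
        })
    }

    /// Per-parse state is built fresh and only borrows the registry,
    /// so nothing carries over from one document to the next.
    pub fn parse(&self, doc: &[&str]) -> Vec<usize> {
        let mut counts = vec![0; self.registry.index.len()];
        for path in doc {
            if let Some(&i) = self.registry.index.get(*path) {
                counts[i] += 1;
            }
        }
        counts
    }
}

/// Free-function wrapper: builds a throwaway Parser, paying the setup cost per call.
pub fn parse_once(doc: &[&str], config: &Config) -> Result<Vec<usize>, String> {
    Ok(Parser::new(config)?.parse(doc))
}
```

A caller processing many documents would build one `Parser` and call `parse` in a loop, and the wrapper remains available for one-off use with identical results.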
* docs: document reusable Parser for many-document workloads
* style: rustfmt run_parse signature