perf(linter/plugins): lazy deserialize tokens and comments (#20474)
Performance improvement to tokens and comments APIs.
## The problem
Previously, all tokens and comments methods would deserialize *all* tokens/comments into an array of `Token` / `Comment` / `Token | Comment` objects, and then binary search through those arrays to find the token(s) / comment(s) they're looking for.
This had two major disadvantages:
1. Files typically contain *a lot* of tokens (even more than the number of AST nodes). Deserializing them all is very costly (up to 30% of total Oxlint runtime when run with only a JS rule which just calls a tokens-related method).
2. The binary searches these methods perform are quite expensive. Even in TurboFan-optimized code, accessing `token.start` involves getting a pointer to the `Token` object from the `tokens` array, an "is this object a `Token`?" safety check, and then reading the `start` field from the `Token` - all just to access a single `u32`, and that happens over and over.
## This PR's solution
Solve both these problems by making tokens and comments methods read `start` / `end` offsets directly from the buffers which contain the tokens/comments data.
This data is tightly packed in memory, and strongly typed (read from `Uint32Array`s), so getting `start` / `end` of a token requires no indirection and no type checks.
More importantly, it removes the need to deserialize all tokens / comments upfront. The desired token(s) are located, touching only the buffer, and then *only* the ones which need to be returned to rule code are deserialized into JS objects.
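As a rough illustration, such a search can run directly over the raw `Uint32Array`, stepping by the record stride. This is a sketch under assumed layout (16-byte / 4-word records with `start` in word 0 and `end` in word 1); the names and layout here are illustrative, not oxc's actual internals:

```typescript
// Sketch only: the real buffer layout is defined by oxc's serializer.
// Here we assume each token record is 16 bytes (4 u32 words), with
// `start` in word 0 and `end` in word 1.
const WORDS_PER_TOKEN = 4;

// Binary-search the raw buffer for the first token with `start >= offset`,
// without materializing any `Token` objects.
function firstTokenIndexAt(buf: Uint32Array, tokenCount: number, offset: number): number {
  let lo = 0;
  let hi = tokenCount;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    // Reading `start` is a single typed-array load - no pointer chase, no type check
    if (buf[mid * WORDS_PER_TOKEN] < offset) lo = mid + 1;
    else hi = mid;
  }
  return lo;
}

// Example: three tokens with `start` offsets 0, 5, and 12
const buf = new Uint32Array([0, 3, 0, 0, 5, 8, 0, 0, 12, 15, 0, 0]);
firstTokenIndexAt(buf, 3, 6); // → 2 (the token starting at 12)
```

Only after the index is found does a token need to be deserialized into a JS object, and only if it's actually returned to rule code.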
If a rule accesses `ast.tokens`, `ast.comments`, or `sourceCode.tokensAndComments` then all tokens / comments need to be deserialized, as they're all returned to the rule as an array - but that's unavoidable. This PR doesn't make that any cheaper, but it doesn't make it measurably more costly either.
But where no rule requires the full array of tokens / comments, and they only use token/comment search methods (e.g. `getFirstToken`, `getCommentsBefore`), a great deal of work will be saved. This covers the vast majority of rules.
## Implementation details
The main complication is the `includeComments` option to tokens methods. When `true`, search needs to be over a combined set of both tokens and comments.
When the `includeComments: true` option is passed to a tokens method, a buffer is created containing data about all tokens and comments, interleaved in source-code order. This buffer can then be used for binary searches in tokens methods.
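A minimal sketch of that interleaving merge, under the same assumed layout (16-byte / 4-word records with `start` in word 0). The real buffer would also need to record whether each entry is a token or a comment, which is omitted here:

```typescript
// Illustrative record layout: 4 u32 words per token/comment, `start` in word 0
const WORDS_PER_ITEM = 4;

// Merge two buffers (each already sorted by `start`) into one buffer
// ordered by `start`, copying whole records as we go.
function mergeByStart(tokens: Uint32Array, comments: Uint32Array): Uint32Array {
  const out = new Uint32Array(tokens.length + comments.length);
  let t = 0, c = 0, o = 0;
  while (t < tokens.length || c < comments.length) {
    // Take from whichever buffer has the lower `start` next
    const takeToken =
      c >= comments.length || (t < tokens.length && tokens[t] <= comments[c]);
    const src = takeToken ? tokens : comments;
    const i = takeToken ? t : c;
    out.set(src.subarray(i, i + WORDS_PER_ITEM), o);
    if (takeToken) t += WORDS_PER_ITEM;
    else c += WORDS_PER_ITEM;
    o += WORDS_PER_ITEM;
  }
  return out;
}

// Tokens starting at offsets 0 and 10, one comment starting at 5:
const merged = mergeByStart(
  new Uint32Array([0, 3, 0, 0, 10, 12, 0, 0]),
  new Uint32Array([5, 8, 0, 0]),
);
// merged records are now ordered by `start`: 0, 5, 10
```

Because the output has the same record shape as the inputs, the same binary-search code can run over it unchanged.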
Whether each token / comment has already been deserialized is tracked by a "deserialized" flag in the tokens/comments buffers. Each token / comment in the buffer is 16 bytes, and this flag lives in byte 15. For tokens, this byte is always already 0 in the buffer when it arrives from the Rust side. For comments, we manually set `comment.content = CommentContent::None;` for every comment on the Rust side; `comment.content` is positioned at byte 15 in the `Comment` struct, and `CommentContent::None` is stored as 0.
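The flag bookkeeping can be sketched with a `Uint8Array` view over the same buffer (a sketch only; function names and the record layout are illustrative, though the "flag in byte 15, zero-initialized from Rust" detail is from the description above):

```typescript
// Assumed layout: 16-byte records, one-byte "deserialized" flag at byte 15
const BYTES_PER_ITEM = 16;
const FLAG_BYTE = 15;

// A Uint8Array view over the same ArrayBuffer gives byte-level access
function isDeserialized(bytes: Uint8Array, index: number): boolean {
  return bytes[index * BYTES_PER_ITEM + FLAG_BYTE] !== 0;
}

function markDeserialized(bytes: Uint8Array, index: number): void {
  bytes[index * BYTES_PER_ITEM + FLAG_BYTE] = 1;
}

// Two records, all bytes zero-initialized (as they arrive from Rust)
const bytes = new Uint8Array(2 * BYTES_PER_ITEM);
markDeserialized(bytes, 0);
isDeserialized(bytes, 0); // → true
isDeserialized(bytes, 1); // → false
```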
## Possible future improvements
### SoA storage
Binary search operates only on `start` field of tokens / comments, which are 16 bytes apart in the buffer. It would be more efficient if tokens were stored in struct-of-arrays (SoA) style so all `start` values were tightly packed together. This would reduce CPU cache misses in the hot loops of binary searches.
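To illustrate the layout difference (not a proposed implementation - in the SoA design, the Rust side would emit this packed array directly rather than JS copying it out):

```typescript
// Current AoS-style layout: `start` values sit 4 u32 words (16 bytes) apart
const WORDS_PER_TOKEN = 4;

// SoA-style: all `start` values packed contiguously, so binary-search
// hot loops touch far fewer cache lines
function extractStarts(buf: Uint32Array): Uint32Array {
  const n = buf.length / WORDS_PER_TOKEN;
  const starts = new Uint32Array(n);
  for (let i = 0; i < n; i++) starts[i] = buf[i * WORDS_PER_TOKEN];
  return starts;
}

const starts = extractStarts(
  new Uint32Array([0, 3, 0, 0, 5, 8, 0, 0, 12, 15, 0, 0]),
);
// starts is [0, 5, 12]: one cache line now holds 16 `start` values instead of 4
```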
### Pre-compute tokens-and-comments buffer on Rust side
The buffer containing tokens and comments, required to support `includeComments: true`, is currently generated on the JS side (but lazily). We could move that to the Rust side, which would be faster. However, it might be redundant work in many cases, because the buffer is only required if a rule uses `includeComments: true`.
We could alternatively keep the laziness optimization by calling back into Rust to build the buffer on demand - but JS-Rust calls have a cost too. Maybe communicating via `Atomics` would be faster than an actual function call?
If we had a way to share buffers with WASM, the optimal solution might be to generate the buffer lazily (as now) but in WASM, which would be faster for this kind of pure number-crunching, without the overhead of calling into Rust.