feat(util-charset): decode MLLP wire bytes as UTF-8, fatally (#659) (#660)
* test(mllp): regression — server silently corrupts non-UTF-8 feeds (MSH-18 ignored)
The server decodes every inbound payload as UTF-8 with a non-fatal
TextDecoder (serve.ts) and never reads the HL7v2 character set declared in
MSH-18. A legacy single-byte feed (ISO 8859/1 / Windows-1252) is therefore
silently corrupted: a non-UTF-8 byte (0xE9 = 'é' in a patient name) is
replaced with U+FFFD and the mangled message is parsed and routed with no
error raised.
This characterization test drives a real serve() loop with raw Latin-1 bytes
and pins the current behavior: the decoded body loses "José" and gains the
U+FFFD replacement character, while the server still answers AA. When MSH-18
handling lands, the assertions must flip.
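The corruption described above can be reproduced outside the server. The following is a minimal sketch using only the standard `TextDecoder`; the variable names are illustrative and are not taken from serve.ts or the test.

```typescript
// "José" encoded as ISO 8859-1: 'é' is the single byte 0xE9,
// which is not a valid UTF-8 sequence on its own.
const latin1Jose = new Uint8Array([0x4a, 0x6f, 0x73, 0xe9]);

// Non-fatal decoder (the default): the invalid byte becomes U+FFFD, no error.
const lossy = new TextDecoder("utf-8").decode(latin1Jose);

// Fatal decoder: the same bytes throw a TypeError instead of being replaced.
let fatalError: unknown;
try {
  new TextDecoder("utf-8", { fatal: true }).decode(latin1Jose);
} catch (err) {
  fatalError = err;
}

console.log(lossy === "Jos\uFFFD");
console.log(fatalError instanceof TypeError);
```

The only difference between the two calls is the `fatal` option, which is the distinction the characterization test pins.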
Refs #659.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* feat(util-charset): decode MLLP wire bytes as UTF-8, fatally (#659)
Add @glion/util-charset, a zero-dependency, runtime-agnostic UTF-8 codec for
HL7v2 wire bytes, and route the MLLP server and client through it so a non-UTF-8
feed fails loudly instead of being silently corrupted to U+FFFD.
- @glion/util-charset: `decodeBytes` (fatal UTF-8 decode; strips a UTF-8 BOM) and
`encodeBytes`; `IncompatibleCharsetError` (code `INCOMPATIBLE_CHARSET`), thrown
on a non-UTF-8 byte-order mark or otherwise-invalid UTF-8.
- mllp server: decode payloads via `decodeBytes` inside the per-message try, so a
non-UTF-8 message surfaces through `onError` instead of being decoded to U+FFFD
and acknowledged as valid.
- mllp-client: decode ACKs via `decodeBytes`; a non-UTF-8 ACK rejects with
`PARSE_FAILED`, carrying the `IncompatibleCharsetError` on `cause`.
- Flip the #659 regression test to assert the loud rejection.
Honouring MSH-18 / BOM-declared charsets and emitting non-UTF-8 for legacy
receivers are deferred to #662; dependency/SBOM governance to #661.
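As a rough illustration of the codec described in this commit, the sketch below builds `decodeBytes` and `encodeBytes` on the standard `TextDecoder` and `TextEncoder`. It is an assumption about the general shape, not the package's actual source: only the names `decodeBytes`, `encodeBytes`, `IncompatibleCharsetError`, and the `INCOMPATIBLE_CHARSET` code come from the message above; the constructor signature and error text are hypothetical.

```typescript
// Hypothetical stand-in: only the class name and `code` value come from the commit message.
class IncompatibleCharsetError extends Error {
  readonly code = "INCOMPATIBLE_CHARSET";
  readonly cause: unknown;
  constructor(message: string, cause?: unknown) {
    super(message);
    this.name = "IncompatibleCharsetError";
    this.cause = cause;
  }
}

// Fatal UTF-8 decode. The standard TextDecoder strips a leading UTF-8 BOM
// unless `ignoreBOM: true` is passed, so no manual BOM handling is done here.
function decodeBytes(bytes: Uint8Array): string {
  try {
    return new TextDecoder("utf-8", { fatal: true }).decode(bytes);
  } catch (err) {
    // A UTF-16/UTF-32 BOM is itself invalid UTF-8, so it also lands here.
    throw new IncompatibleCharsetError("payload is not valid UTF-8", err);
  }
}

// UTF-8 encode; TextEncoder always produces UTF-8.
function encodeBytes(text: string): Uint8Array {
  return new TextEncoder().encode(text);
}
```

Under these assumptions, `decodeBytes(encodeBytes("José"))` round-trips, while a Latin-1 `0xE9` byte or a UTF-16 BOM throws instead of decoding to U+FFFD.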
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* chore: install hunk-review skill
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* refactor(util-charset): extract BOM check into a guard; cover all branches
- Extract the non-UTF-8 BOM detect-and-throw into a `rejectNonUtf8Bom` guard so
`decodeBytes` reads as guard + decode
- Test all four non-UTF-8 BOMs (UTF-16LE/BE, UTF-32LE/BE); util-charset is now at 100% coverage
- mllp server: drop the catch-site error normalization that duplicated
`reportError`'s; widen `getMessageInfo` to accept `unknown`
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* test(util-charset): cross-check UTF-8 and BOM constants against iconv-lite
- Add a dev-only iconv-lite oracle test: UTF-8 encode/decode round-trips and the
non-UTF-8 BOM bytes (UTF-16/32 LE/BE) are validated against an independent
implementation, not only against themselves
- Cite the Unicode FAQ for the BOM byte sequences and document why UTF-32LE is
tested before UTF-16LE (shared `FF FE` prefix)
- iconv-lite stays a devDependency; the package remains dependency-free
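The ordering concern noted above can be sketched as follows. This is an illustrative example rather than the package's code: `NON_UTF8_BOMS` and `detectNonUtf8Bom` are hypothetical names, and the byte sequences are the standard Unicode BOM values.

```typescript
// Standard BOM byte sequences. Order matters: the UTF-32LE BOM (FF FE 00 00)
// begins with the UTF-16LE BOM (FF FE), so UTF-32LE must be checked first.
const NON_UTF8_BOMS: ReadonlyArray<readonly [string, readonly number[]]> = [
  ["UTF-32LE", [0xff, 0xfe, 0x00, 0x00]],
  ["UTF-32BE", [0x00, 0x00, 0xfe, 0xff]],
  ["UTF-16LE", [0xff, 0xfe]],
  ["UTF-16BE", [0xfe, 0xff]],
];

// Returns the name of the first matching non-UTF-8 BOM, or undefined if none matches.
function detectNonUtf8Bom(bytes: Uint8Array): string | undefined {
  for (const [name, bom] of NON_UTF8_BOMS) {
    if (bytes.length >= bom.length && bom.every((b, i) => bytes[i] === b)) {
      return name;
    }
  }
  return undefined;
}
```

If the UTF-16LE entry came first, a UTF-32LE payload would be reported as UTF-16LE, because its first two bytes match.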
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* refactor(util-charset): rename to CharsetError; split into error/decode/encode modules
- Rename IncompatibleCharsetError → CharsetError; drop the redundant exported
INCOMPATIBLE_CHARSET const (inlined into the error's `code`)
- Move the non-UTF-8 BOM table and message-building onto CharsetError as a
private static plus a `CharsetError.forBytes(bytes, cause)` factory
- decodeBytes relies on the fatal decoder as the single gate; the BOM table only
enriches the error message on the failure path
- Split src into error.ts / decode.ts / encode.ts with index.ts as the public
barrel; split tests to match
- Strengthen BOM tests to assert the CharsetError type and code, not just the message
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* fix(mllp): surface inbound decode failures as MllpServerError, not the codec's error
Don't leak @glion/util-charset's CharsetError across the MLLP package boundaries —
each package surfaces only its own error vocabulary, with the CharsetError on `cause`.
- Server: translate a decode CharsetError into MllpServerError (new code
INCOMPATIBLE_CHARSET) before onError; the CharsetError rides on `cause`. Handler
and lifecycle errors still pass through raw (the consumer owns those).
- Client: a non-UTF-8 ACK rejects with MllpClientError(PARSE_FAILED), CharsetError
on `cause` (this reverts the dedicated client error code to keep the client's
vocabulary minimal).
- Consumers of @glion/mllp / @glion/mllp-client branch on those packages' own
errors and never import @glion/util-charset.
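The boundary translation described above might look roughly like the sketch below. The class bodies, the `toServerError` helper, and the error text are hypothetical stand-ins; only the names `CharsetError`, `MllpServerError`, the `INCOMPATIBLE_CHARSET` code, and the use of `cause` come from this commit message.

```typescript
// Hypothetical stand-in for the codec's error type.
class CharsetError extends Error {
  readonly code = "INCOMPATIBLE_CHARSET";
}

// Hypothetical stand-in for the server package's own error type.
class MllpServerError extends Error {
  readonly code: string;
  readonly cause: unknown;
  constructor(code: string, message: string, cause?: unknown) {
    super(message);
    this.name = "MllpServerError";
    this.code = code;
    this.cause = cause;
  }
}

// Translate codec failures at the package boundary and keep the original
// error on `cause`; anything else (handler, lifecycle) passes through raw.
function toServerError(err: unknown): unknown {
  if (err instanceof CharsetError) {
    return new MllpServerError(
      "INCOMPATIBLE_CHARSET",
      "inbound payload is not valid UTF-8",
      err,
    );
  }
  return err;
}
```

A consumer's `onError` callback could then branch on the server error's `code` alone and inspect `cause` only when it needs the codec-level detail.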
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>