langchain-ai
langchain
Latest Results
feat(langchain): apply lookback withholding to cumulative tool-call args

Corridor flagged a partial-exposure window on the streaming tool-call args path: the in-place redaction caught complete PII patterns, but intermediate cumulative states (`{"to": "alice@`, `{"to": "alice@example`, …) were forwarded unredacted before the final chunk triggered a match.

Extend the same lookback semantics already used for `text-delta` and `reasoning-delta` to the cumulative tool-call args path: detect on the full cumulative args, then emit only `args[:len(args) - lookback]` on each chunk. The trailing `stream_lookback` characters are withheld because they might be the start of a partial PII match that completes in a future cumulative delta. The withheld tail surfaces at `content-block-finish`, where `_finalize_block` redacts the parsed args dict.

For args ≤ `stream_lookback` (the typical case for tool calls), this withholds the entire args during streaming, and the redacted dict appears only at finalize. For args > `stream_lookback`, the safe prefix streams incrementally as the cumulative state grows.

Residual exposure note: PII appearing more than `stream_lookback` chars from the cumulative tail in a delta where the pattern hasn't yet completed can still surface in the emit prefix. This is the same shape as "PII longer than `stream_lookback`" on the text path. The `content-block-finish` snapshot redaction remains the backstop.

`block` strategy unconditionally emits empty args during streaming and lets `_finalize_block` empty the args dict on PII detection; `after_model` raises shortly after.
nh/pii-middleware-stream-redaction
2 hours ago
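The withholding rule from the cumulative tool-call args commit above can be sketched in a few lines. This is a minimal illustration, not the middleware's code: `EMAIL`, `redact`, and `safe_prefix` are hypothetical names, and the email regex is a stand-in detector chosen for the example.

```python
import re

# Illustrative detector only; the real middleware's detectors are not shown in the commit.
EMAIL = re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")


def redact(text: str) -> str:
    """Replace every complete email match with a placeholder."""
    return EMAIL.sub("[REDACTED_EMAIL]", text)


def safe_prefix(cumulative_args: str, lookback: int) -> str:
    """Detect on the full cumulative string, then withhold the trailing `lookback` chars."""
    redacted = redact(cumulative_args)
    keep = max(len(redacted) - lookback, 0)  # clamp so short args emit nothing
    return redacted[:keep]


full_args = '{"to": "alice@example.com", "body": "see you at the meeting tomorrow morning"}'

# Simulate cumulative deltas: each state is a longer prefix of the final args string.
emitted = [safe_prefix(full_args[:i], lookback=32) for i in range(1, len(full_args) + 1)]
```

With a 32-character lookback, every cumulative state in which the email is still incomplete is shorter than the lookback, so nothing is emitted until the pattern can be detected and replaced.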
fix(langchain): walk `AIMessage.tool_calls` in `_redact_value`

Corridor flagged that `_redact_value`'s `BaseMessage` branch returned the message unchanged when `.content` was empty or non-string. For tool-calling responses (where empty content + populated `tool_calls` is the common shape), this meant `values`-channel state snapshots containing such a message could leak the raw `tool_calls[*].args` on the wire.

The legacy `(AIMessage, metadata)` payload path masked the bug incidentally, because it mutates `tool_calls` in place before the values event fires. On the v3 streaming path, however, the state's AIMessage is assembled by langgraph without that mutation, leaving the values snapshot to leak.

Extend the `BaseMessage` branch to also walk `AIMessage.tool_calls`: each tool call's `args` dict recurses through `_redact_value`, and if either content or any tool call's args changed, return a single fresh `model_copy(update=...)` carrying both updates. The original message stays intact for the state-level enforcers.

Adds two regression tests covering the empty-content-with-tool-call case and the combined content + tool_calls case.
nh/pii-middleware-stream-redaction
2 hours ago
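The recursion described in the `_redact_value` commit above can be illustrated with stand-in types. This sketch assumes a plain dataclass (`FakeAIMessage`, hypothetical) in place of `AIMessage`, and uses `dataclasses.replace` where the commit uses `model_copy(update=...)`; the email regex is likewise only an example detector.

```python
import re
from dataclasses import dataclass, field, replace
from typing import Any

EMAIL = re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")  # illustrative detector


@dataclass
class FakeAIMessage:
    """Hypothetical stand-in for AIMessage: text content plus tool calls."""
    content: str = ""
    tool_calls: list = field(default_factory=list)


def redact_value(value: Any) -> Any:
    """Recursively redact strings inside dicts, lists, and message stand-ins."""
    if isinstance(value, str):
        return EMAIL.sub("[REDACTED_EMAIL]", value)
    if isinstance(value, dict):
        return {k: redact_value(v) for k, v in value.items()}
    if isinstance(value, list):
        return [redact_value(v) for v in value]
    if isinstance(value, FakeAIMessage):
        new_content = redact_value(value.content)
        # Walk each tool call's args even when content is empty.
        new_calls = [{**tc, "args": redact_value(tc["args"])} for tc in value.tool_calls]
        if new_content == value.content and new_calls == value.tool_calls:
            return value  # nothing changed: return the original object
        # One fresh copy carrying both updates; the original stays intact.
        return replace(value, content=new_content, tool_calls=new_calls)
    return value


msg = FakeAIMessage(
    content="",
    tool_calls=[{"name": "send_email", "args": {"to": "alice@example.com"}, "id": "1"}],
)
scrubbed = redact_value(msg)
```

Returning the original object when nothing changed keeps the no-PII path allocation-free, and building new dicts rather than mutating leaves the source message available to whatever enforces policy on graph state.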
docs(langchain): document tool-call and tools-channel coverage on `PIIMiddleware`
nh/pii-middleware-stream-redaction
2 hours ago
style(langchain): ruff format on `PIIMiddleware`
nh/pii-middleware-stream-redaction
3 hours ago
fix(langchain): close `block` strategy bypass on `PIIMiddleware` stream

`PIIMiddleware(..., strategy="block", apply_to_output=True)` installed no stream transformer, so raw `content-block-delta` events flowed unfiltered to consumers iterating `astream_events` / `run.messages`. By the time `after_model` raised `PIIDetectionError`, the consumer had already seen the PII on the wire. The previous reasoning (that raising mid-stream would leave projections torn) was correct in principle but left the streaming surface unprotected.

Install a buffering transformer for `block` instead. It withholds every delta from the consumer (empty `delta["text"]`) and runs detection once on the assembled block at `content-block-finish`. If PII is present the finalize content is zeroed too; `after_model` then raises on the original state message to terminate the run. If no PII, the finalize event carries the full text and the consumer sees the message all at once. No mid-stream raises, no leaked PII.

For the Python-3.10 legacy `(BaseMessage, metadata)` event shape, returning `False` from `process()` only drops the event from the main log; `MessagesTransformer` still aprocesses it before that and projects the live AIMessage. Replacing the event's `data` tuple with an empty-content copy keeps the `run.messages` projection clean while leaving the original message in graph state for `after_model` to raise on.

Adds:
- a unit test that streams PII under `block` and asserts every delta and the finalize content are empty
- a unit test that streams clean text under `block` and asserts the finalize content carries the full text
- an end-to-end test under `create_agent` that confirms no PII characters reach the consumer and `PIIDetectionError` is raised
nh/pii-middleware-stream-redaction
3 hours ago
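The buffering behavior described for the `block` strategy above can be sketched as a generator. This is an assumption-laden illustration: events are modeled as simple `(event_type, text)` pairs rather than the real protocol dicts, and `stream_with_block` and `has_email` are hypothetical names.

```python
import re
from typing import Callable, Iterable, Iterator, Tuple

EMAIL = re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")  # illustrative detector


def has_email(text: str) -> bool:
    return EMAIL.search(text) is not None


def stream_with_block(
    deltas: Iterable[str], contains_pii: Callable[[str], bool]
) -> Iterator[Tuple[str, str]]:
    """Withhold every delta, then run detection once on the assembled block."""
    buffer = []
    for delta in deltas:
        buffer.append(delta)
        yield ("content-block-delta", "")  # consumer sees an empty delta
    full = "".join(buffer)
    # Finalize: zero the content if PII is present, otherwise release it all at once.
    yield ("content-block-finish", "" if contains_pii(full) else full)


pii_events = list(stream_with_block(["Mail ", "alice@", "example.com"], has_email))
clean_events = list(stream_with_block(["All ", "clear"], has_email))
```

In this sketch nothing raises inside the stream; raising is left to a later step, which matches the commit's description of `after_model` terminating the run.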
fix(langchain): detect PII on the full accumulated buffer in `_PIIStreamTransformer`

Corridor finding c479d3d1 (CWE-200, high). The previous logic split `combined` at `safe_end = len(combined) - lookback` and ran detection only on the prefix about to be emitted. When `combined` grew past `lookback` and PII sat near the start, the split landed inside the match: the truncated prefix in `safe_text` failed the detector's `\b...\b` anchored regex, so no redaction ran and the partial PII leaked on the wire. The suffix sat in the held tail and emerged in subsequent deltas, also raw. A live consumer concatenating deltas reconstructed the full PII even though the finalize snapshot was later correctly redacted.

Run detection on the full `combined` buffer first, redact in place, then split at `len(combined) - lookback`. Any complete PII fully contained within the accumulated buffer is now guaranteed redacted before any character lands on the wire. PII extending beyond `combined` (split across more deltas than the lookback can hold) still relies on the finalize snapshot; that's the documented trade-off of the `stream_lookback` cap.

Adds a regression test that reproduces the straddle case Corridor described: a 30-char email at position 0 of a 50-char delta with `lookback=32`. The old logic emitted `"alice@longerdomain"` on the wire; the new logic redacts before emission.
nh/pii-middleware-stream-redaction
4 hours ago
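The straddle case from the full-buffer detection commit above can be reproduced with two small functions. This is a sketch of the two orderings only: `emit_old` and `emit_new` are hypothetical names, the regex is an example detector, and the 50-character input is constructed here to match the sizes the commit describes.

```python
import re
from typing import Tuple

EMAIL = re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")  # illustrative detector


def redact(text: str) -> str:
    return EMAIL.sub("[REDACTED_EMAIL]", text)


def emit_old(combined: str, lookback: int) -> Tuple[str, str]:
    """Old ordering: split first, then detect only on the prefix being emitted."""
    safe_end = max(len(combined) - lookback, 0)
    return redact(combined[:safe_end]), combined[safe_end:]


def emit_new(combined: str, lookback: int) -> Tuple[str, str]:
    """New ordering: detect on the full buffer, redact, then split."""
    redacted = redact(combined)
    safe_end = max(len(redacted) - lookback, 0)
    return redacted[:safe_end], redacted[safe_end:]


# A 30-char email at position 0 of a 50-char delta, with lookback=32.
delta = "alice@longerdomain.example.com is my address, ok!!"

old_emitted, old_held = emit_old(delta, lookback=32)
new_emitted, new_held = emit_new(delta, lookback=32)
print(old_emitted)  # alice@longerdomain
```

In the old ordering the split lands at character 18, inside the email, so the prefix no longer matches the detector and goes out raw. In the new ordering the email is replaced before the split, so neither the emitted prefix nor the held tail contains it.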
fix(langchain): redact PII on legacy `(BaseMessage, metadata)` events too

CI on Python 3.10 surfaced that the langgraph→langchain integration can emit the legacy `(BaseMessage, metadata)` shape on `messages` stream events instead of the v3 `content-block-delta` protocol dicts. This happens when `_astream` falls back to `_agenerate` and the full message is delivered as a single event. The previous transformer code only handled the v3 dict shape, so streamed redaction silently no-op'd on that path and the unredacted message reached the consumer.

Branch on the payload shape: when it's a `BaseMessage`, scrub `.content` in place; when it's a v3 protocol-event dict, run the existing lookback machinery. Either way the consumer sees redacted text.

Updated `test_main_event_log_carries_redacted_deltas` to assert the security claim against whatever surface the runtime emits (v3 deltas on 3.14, legacy message payloads on 3.10) instead of tying the assertion to a specific event shape that varies by environment.
nh/pii-middleware-stream-redaction
4 hours ago
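The shape-based branching in the legacy-events commit above might look roughly like the following. Everything here is illustrative: `FakeMessage` stands in for `BaseMessage`, the dict layout (`"event"` and `"delta"` keys) is an assumed shape rather than the documented protocol, and `on_delta` is a hypothetical hook where the lookback logic would run.

```python
import re
from dataclasses import dataclass
from typing import Any, Callable

EMAIL = re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")  # illustrative detector


def redact(text: str) -> str:
    return EMAIL.sub("[REDACTED_EMAIL]", text)


@dataclass
class FakeMessage:
    """Hypothetical stand-in for BaseMessage."""
    content: str


def scrub_payload(payload: Any, on_delta: Callable[[str], str]) -> Any:
    """Dispatch on payload shape so both event styles reach the consumer redacted."""
    # Legacy shape: a (message, metadata) tuple carrying the whole message at once.
    if isinstance(payload, tuple) and isinstance(payload[0], FakeMessage):
        message, _metadata = payload
        if isinstance(message.content, str):
            message.content = redact(message.content)  # scrub in place
        return payload
    # Assumed v3 shape: a protocol-event dict with a text delta.
    if isinstance(payload, dict) and payload.get("event") == "content-block-delta":
        payload["delta"]["text"] = on_delta(payload["delta"]["text"])
        return payload
    return payload  # unknown shapes pass through untouched


legacy = scrub_payload((FakeMessage("mail alice@example.com"), {}), on_delta=redact)
v3 = scrub_payload(
    {"event": "content-block-delta", "delta": {"text": "partial"}},
    on_delta=lambda text: "",  # stand-in for "withhold until the lookback allows it"
)
```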
refactor(langchain): drop whitespace-boundary flush from `PIIMiddleware` stream transformer

The optimization was gated by an `is_builtin` heuristic that incorrectly classified `credit_card` as whitespace-safe. Its detector regex matches `\d{4}[\s-]?\d{4}[\s-]?\d{4}[\s-]?\d{4}`, so a stream like `["Card: 5425 ", "2334 3010 9903"]` flushed each whitespace-separated group to the wire before detection saw the full card, leaking the PAN on the very PII type users most want streamed redaction to protect.

Rather than introduce a per-detector `whitespace_safe` property (which would need conjunction across rules and an ergonomic story for custom detectors), drop whitespace flushing entirely. The transformer now always holds `stream_lookback` characters in the buffer, which is bulletproof against any whitespace-permissive detector regex. Users trading some first-token latency for in-flight redaction can tune `stream_lookback` down to recover throughput on short-pattern detectors.

Adds a regression test that streams a space-separated card across delta boundaries and asserts no digit groups reach the wire.
nh/pii-middleware-stream-redaction
5 hours ago
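The credit-card leak from the whitespace-flush commit above is easy to reproduce in isolation. The sketch below uses the regex quoted in the commit and the same two-delta stream; the two streaming functions are simplified, hypothetical models of "flush at whitespace" and "always hold `lookback` characters", not the transformer's actual code.

```python
import re
from typing import List

# Detector regex as quoted in the commit message.
CARD = re.compile(r"\d{4}[\s-]?\d{4}[\s-]?\d{4}[\s-]?\d{4}")


def redact(text: str) -> str:
    return CARD.sub("[REDACTED_CARD]", text)


def stream_whitespace_flush(deltas: List[str]) -> List[str]:
    """Model of the dropped optimization: flush everything up to the last whitespace."""
    buffer, out = "", []
    for delta in deltas:
        buffer += delta
        cut = max(buffer.rfind(" "), buffer.rfind("\n")) + 1
        out.append(redact(buffer[:cut]))  # detection only sees the flushed piece
        buffer = buffer[cut:]
    out.append(redact(buffer))  # final flush
    return out


def stream_fixed_lookback(deltas: List[str], lookback: int = 32) -> List[str]:
    """Model of the replacement: always hold the trailing `lookback` characters."""
    buffer, out = "", []
    for delta in deltas:
        buffer = redact(buffer + delta)  # detect on everything held so far
        cut = max(len(buffer) - lookback, 0)
        out.append(buffer[:cut])
        buffer = buffer[cut:]
    out.append(buffer)  # final flush at the end of the block
    return out


deltas = ["Card: 5425 ", "2334 3010 9903"]
leaky = "".join(stream_whitespace_flush(deltas))
held = "".join(stream_fixed_lookback(deltas))
print(leaky)  # Card: 5425 2334 3010 9903
print(held)   # Card: [REDACTED_CARD]
```

Because the card pattern permits whitespace between digit groups, each group looks like a safe flush point on its own. Holding a fixed number of trailing characters avoids reasoning about which detectors tolerate whitespace.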
Latest Branches
CodSpeed Performance Gauge
0%
feat(langchain): redact streamed PII in flight on `PIIMiddleware`
#37616
2 hours ago
8b20e6f
nh/pii-middleware-stream-redaction
CodSpeed Performance Gauge
0%
chore(infra): bump `langchain-tests` floor to 1.1.9
#37610
6 hours ago
d53385c
mdrxy/bump-standard-floor
CodSpeed Performance Gauge
0%
release(standard-tests): 1.1.9
#37609
6 hours ago
ccf06cc
mdrxy/release-standard-tests-1-1-9
© 2026 CodSpeed Technology