Avatar for the langchain-ai user
langchain-ai
langchain
BlogDocsChangelog

Performance History

Latest Results

fix(ci): close shell-injection vector in middleware evals workflow Addresses the Corridor security review on this PR. GitHub textually expands `${{ inputs.* }}` / `${{ matrix.* }}` expressions before the shell runs, so splicing those expressions into the `run:` script body lets a value containing `'` break out of the string literal and execute arbitrary commands in a job that has every provider API key and `LANGSMITH_API_KEY` in scope. The fix is the canonical mitigation: - Every value derived from `inputs.*` or `matrix.*` is now passed via `env:` instead of spliced into the script source. GitHub fills env vars at job start; the values never appear in the script text and bash treats them as data via `"$VAR"`. - The script body invokes `pytest` directly rather than going through `make evals`. The Makefile target's `$(MODEL)` / `$(PYTEST_EXTRA)` are Make's textual expansion (the second issue Corridor flagged) — bypassing `make` from the workflow keeps user-controlled values out of the Make layer entirely. The Makefile target itself is unchanged and remains the supported local-run path; running it locally is safe because the operator controls their own inputs. - `pytest` args are built as a bash array (`PYTEST_ARGS+=(...)`) so word-splitting is handled by bash, not by the script source. - `set -euo pipefail` so any earlier failure halts the job before the pytest invocation. Realistic exposure of the pre-fix code: workflow_dispatch requires repository write access, so the realistic attacker was a malicious insider or a compromised maintainer account, not an external actor. Mitigating anyway because secrets-in-scope is the wrong default.
nh/todo-middleware-loop-contract
14 hours ago
fix(langchain): emit final answer after the final `write_todos` call in `TodoListMiddleware` Appends a short "Finishing a task" section to `WRITE_TODOS_SYSTEM_PROMPT` that tells the model to call `write_todos(completed)` in one turn and then deliver its substantive final answer in the next turn, after the tool result is returned. The existing prompt content is preserved unchanged. Why: with Anthropic models (Sonnet 4.6 in particular), the model tends to put a substantive final answer in the same turn as the final `write_todos` call that marks every todo `completed`. The agent loop then forces one more model turn after the tool result, and that loop-terminating message ends up being either empty or a content-free recap ("All tasks complete! ✅"). Downstream consumers that read the last `AIMessage` to extract the agent's answer see the wrap-up, not the real answer. The new section is structural guidance, not Anthropic-specific: it restores the natural agent-loop contract that the last message is the substantive answer. Validated under the eval suite added in this PR: Sonnet 4.6, baseline tier (n=1 per task): test_density_rank_lands_in_final_message PASS test_population_compare_lands_in_final_message PASS test_rank_with_unknown_lookup_lands_in_final_message PASS test_trivial_plan_skips_write_todos PASS Sonnet 4.6, hillclimb tier (n=1 per task): test_design_api_lands_in_final_message PASS test_density_cairo_lands_in_final_message PASS Under the prior prompt, `density-rank` and `population-compare` failed 0/8 trials at n=8 (the final `AIMessage` was a 21-character "All tasks complete!" wrap with the substantive ranking living in the previous turn). With this change the failure mode does not reproduce. GPT-5 is unaffected: it does not exhibit the wasted-turn pattern under either prompt. The added guidance is structural, so it is a no-op for models that already place their final answer in the loop-terminating message. AI-agent involvement is disclosed below. --- Co-developed with Claude (Anthropic).
nh/todo-middleware-loop-contract
14 hours ago
feat(langchain): redact streamed PII in flight on `PIIMiddleware` (#37616) `PIIMiddleware` previously scrubbed detected PII only at the state level via its `after_model` / `before_model` hooks. Consumers reading the live stream — `astream_events(version="v3")` or `run.messages` / `run.tool_calls` / `run.values` — saw the raw model text, the raw tool-call args, the raw tool outputs, and the raw state snapshots until the run finished and the canonical conversation history was written. This change registers a stream transformer ahead of `MessagesTransformer` that redacts every wire surface of an agent run. The transformer holds a sliding lookback buffer (default 128 characters) per `(run_id, content-block index)` so PII patterns that straddle delta boundaries are caught before the safe prefix is released downstream. Anything older than the lookback is run through the configured detector and emitted; the trailing tail stays buffered until a later delta extends it past the cap or the block finishes. `_finalize_block` always re-runs detection over the full block snapshot so the finalized content lands fully redacted even when the in-flight buffer never released a tail (short responses, or PII arriving in the final delta). The `block` strategy is now supported on the streaming path via a buffering mode that withholds every delta until the block resolves — clean blocks release the full text at finalize, PII-bearing blocks zero the wire and let `after_model` / `apply_to_tool_results` raise `PIIDetectionError` on the original state message. Activation is gated on `apply_to_output=True`, matching the existing post-hoc semantics. The middleware's transformer factory is cloned by `StreamMux._make_child` into every subgraph scope, so attaching `PIIMiddleware` at the outer agent also redacts streamed deltas from sub-agents invoked inside tools. ## Tool-call and tools-channel coverage The transformer covers every wire surface of an agent run, not just AI message text: - **Streamed AI text deltas** (`content-block-delta` of type `text-delta`) — lookback machinery, redacted in place. - **Streamed tool-call args** (`content-block-delta` with `tool_call_chunk` / `server_tool_call_chunk` fields) — each delta carries the full cumulative args string; detection runs on the field directly and redacts in place. Verified empirically against `_compat_bridge.py` and the consumer-side `_merge_block_delta_into_store` snapshot-replace semantics. - **Finalized tool-call blocks** (`content-block-finish` with `tool_call` / `server_tool_call` / `invalid_tool_call`) — `args` dict walked recursively and each string leaf redacted. - **Tool execution events on the `tools` channel** — `tool-started.input`, `tool-output-delta`, `tool-finished.output`, `tool-error.message` all run through detection. String deltas use the same lookback machinery as text-deltas keyed by `tool_call_id`; structured payloads walk recursively. - **State snapshots on the `values` channel** — message lists are walked and each message's `.content` is redacted on a fresh copy. Graph state itself stays intact for the state-level enforcer (`apply_to_tool_results` via `before_model`) to act on independently. - **Legacy `(BaseMessage, metadata)` payloads** on the `messages` channel (Python 3.10 path, where `langgraph`'s `ASYNCIO_ACCEPTS_CONTEXT = sys.version_info >= (3, 11)` falls back to a code path that doesn't propagate the streaming callback into the chat model) — `.content` and `AIMessage.tool_calls[*].args` are scrubbed. For `block`, the event's `data` tuple is replaced with an empty-content copy so the original message stays in state for `after_model` to raise on. ## Worth a careful look - `_PIIStreamTransformer._mutate_text_delta` — lookback partition. Anything older than `lookback` characters is released after redaction; the tail stays buffered. Bulletproof against whitespace-permissive detectors (notably `credit_card`, whose regex matches across spaces). - `_PIIStreamTransformer._mutate_tool_call_chunk_delta` — direct in-place redaction of the cumulative args string. No buffer; the wire shape is cumulative-snapshot, the consumer-side merge is replace-not-append. - `_PIIStreamTransformer._mutate_legacy_payload` — the dual path: mutate-in-place for non-`block` (idempotent with `after_model`), replace-with-empty-copy for `block` (keeps original in graph state for `after_model` to raise on). - `_PIIStreamTransformer._redact_value` — the recursive walker. `BaseMessage` branch returns a fresh `.content`-redacted copy via `model_copy(update=...)` — never mutates in place — so tool-output payloads that wrap a `ToolMessage` and message lists in state snapshots flow through cleanly. - The new `transformers` attribute on `PIIMiddleware`: this is what makes `create_agent` pick the factory up. Multiple `PIIMiddleware` instances each register one transformer; ordering is preserved within the `before_builtins` lane. ## Compatibility Bumps `langgraph` to `>=1.2.1` for the `before_builtins` opt-in on `StreamTransformer`.
master
18 hours ago
fix: lazyload numpy, simsimd fix numpy lint
wkpark:lazy_load_numpy_transformers
20 hours ago

Latest Branches

CodSpeed Performance Gauge
0%
fix(langchain): land final answer in last AIMessage for `TodoListMiddleware`#37643
14 hours ago
8587b09
nh/todo-middleware-loop-contract
CodSpeed Performance Gauge
0%
feat(langchain): redact streamed PII in flight on `PIIMiddleware`#37616
18 hours ago
0a6b801
nh/pii-middleware-stream-redaction
CodSpeed Performance Gauge
+11%
3 months ago
a34d883
wkpark:lazy_load_numpy_transformers
© 2026 CodSpeed Technology
Home Terms Privacy Docs