langchain-ai
langchain

Performance History

Latest Results

revert(langchain): remove the eval framework from this PR

The real-model eval framework, test file, GH Actions workflow, Makefile target, and pyproject markers added earlier in this branch are removed. The actual eval coverage for `TodoListMiddleware` is being moved to `langchain-ai/deepagents`'s existing `libs/evals` suite: same `TrajectoryScorer` API, but with the tests run against `create_agent` + `TodoListMiddleware` (not `create_deep_agent`) so they still probe bare langchain middleware behavior.

Why move it: the framework here was a near-duplicate of deepagents' `libs/evals/tests/evals/utils.py` (a slimmed copy, but architecturally identical). The motivation for the duplication was the theory that deepagents' `BASE_AGENT_PROMPT` masked the wasted-turn bug, so running through `create_deep_agent` wouldn't expose it. That theory turned out to be wrong: `create_deep_agent` on Sonnet 4.6 against our task shapes without the langchain prompt fix produces the exact failure mode (final text `"All tasks complete! ✅"` on density_rank, empty string on population_compare, `"All tasks complete!"` on rank_with_unknown). The deepagents framework is a sensitive substrate; it just needs new test files that drive `create_agent + TodoListMiddleware` instead of `create_deep_agent`.

Removing here:
- `libs/langchain_v1/tests/evals/` (whole tree)
- `.github/workflows/middleware_evals.yml`
- `evals` target in `libs/langchain_v1/Makefile`
- `eval_category`/`eval_tier`/`langsmith` markers in `pyproject.toml`

Remaining in this PR:
- `WRITE_TODOS_SYSTEM_PROMPT` / `WRITE_TODOS_TOOL_DESCRIPTION` / `_write_todos` tool-message changes in `libs/langchain_v1/langchain/agents/middleware/todo.py` (the actual fix this PR exists to land)
- Corresponding unit test updates in `tests/unit_tests/agents/middleware/implementations/test_todo.py`.

The companion `deepagents` PR (link to follow) will land the real-model evals using deepagents' existing infrastructure.
nh/todo-middleware-loop-contract
22 minutes ago
Merge branch 'master' into mmk/store_cached_generation
keenborder786:mmk/store_cached_generation
19 hours ago
Merge branch 'master' into mmk/reasoning_strip
keenborder786:mmk/reasoning_strip
19 hours ago
Merge branch 'master' into mmk/bound_parameters
keenborder786:mmk/bound_parameters
19 hours ago
Merge branch 'master' into mmk/over_load_variant_middlewares
keenborder786:mmk/over_load_variant_middlewares
19 hours ago
fix(langchain): preserve structured pii redaction in state hooks
Alexxigang:fix/pii-state-hook-redaction
2 days ago
fix(ci): close shell-injection vector in middleware evals workflow

Addresses the Corridor security review on this PR. GitHub textually expands `${{ inputs.* }}` / `${{ matrix.* }}` expressions before the shell runs, so splicing those expressions into the `run:` script body lets a value containing `'` break out of the string literal and execute arbitrary commands in a job that has every provider API key and `LANGSMITH_API_KEY` in scope.

The fix is the canonical mitigation:
- Every value derived from `inputs.*` or `matrix.*` is now passed via `env:` instead of spliced into the script source. GitHub fills env vars at job start; the values never appear in the script text and bash treats them as data via `"$VAR"`.
- The script body invokes `pytest` directly rather than going through `make evals`. The Makefile target's `$(MODEL)` / `$(PYTEST_EXTRA)` are Make's textual expansion (the second issue Corridor flagged); bypassing `make` from the workflow keeps user-controlled values out of the Make layer entirely. The Makefile target itself is unchanged and remains the supported local-run path; running it locally is safe because the operator controls their own inputs.
- `pytest` args are built as a bash array (`PYTEST_ARGS+=(...)`) so word-splitting is handled by bash, not by the script source.
- `set -euo pipefail` so any earlier failure halts the job before the pytest invocation.

Realistic exposure of the pre-fix code: workflow_dispatch requires repository write access, so the realistic attacker was a malicious insider or a compromised maintainer account, not an external actor. Mitigating anyway because secrets-in-scope is the wrong default.
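
The difference between splicing a value into script source and passing it as data can be sketched in a few lines of Python. This is only an illustration, not the workflow's code: `splice_into_script`, `build_argv`, and the `hostile` value are hypothetical names, and `shlex.split` stands in for shell word parsing (a real shell would also treat `;` as a command separator, which `shlex.split` does not model).

```python
import shlex


def splice_into_script(value: str) -> str:
    """Mimic textual expansion: the value becomes part of the script source."""
    return f"pytest -k '{value}'"


def build_argv(value: str) -> list[str]:
    """Mimic the env/array approach: the value is one argv element and is
    never re-parsed as script text."""
    return ["pytest", "-k", value]


benign = "todo"
# Hypothetical hostile input: the leading quote closes the string literal early.
hostile = "x'; echo INJECTED; echo '"

# Spliced into source, the benign value parses as a single argument...
print(shlex.split(splice_into_script(benign)))

# ...but the hostile value is parsed as several words, including `echo`,
# because its text is now indistinguishable from the script around it.
print(shlex.split(splice_into_script(hostile)))

# Passed as data, the hostile value stays exactly one argument.
print(build_argv(hostile))
```

The second approach mirrors the idea behind `env:` plus a bash array: the untrusted string reaches the command as a single opaque value, whatever characters it contains.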
nh/todo-middleware-loop-contract
4 days ago

Latest Branches

CodSpeed Performance Gauge
0%
fix(langchain): land final answer in last AIMessage for `TodoListMiddleware` #37643
25 minutes ago
ec46fd4
nh/todo-middleware-loop-contract
CodSpeed Performance Gauge
+1%
19 hours ago
a01e1bc
keenborder786:mmk/store_cached_generation
CodSpeed Performance Gauge
+1%
19 hours ago
1ad3061
keenborder786:mmk/reasoning_strip