ambershen:optimize/llm-sync-generate-parallelization - Branch - langchain-ai/langchain - CodSpeed


feat: parallelize sync generate method for improved LLM throughput

Comparing ambershen:optimize/llm-sync-generate-parallelization (e85d221) with master (525d5c0)

-24%

Regressions: 1

Untouched: 12

Skipped: 21

Unknown Walltime execution environment detected

Using the Walltime instrument on standard Hosted Runners will lead to inconsistent data. For the most accurate results, we recommend using CodSpeed Macro Runners: bare-metal machines fine-tuned for performance measurement consistency.

Benchmarks

Skipped (21)

| Status | Benchmark | File | Change | Base | Head |
|---|---|---|---|---|---|
| Failed (regression) | test_async_callbacks_in_sync | libs/core/tests/benchmarks/test_async_callbacks.py | -24% | 18.4 ms | 24.3 ms |
| Passed | test_import_time[Document] | libs/core/tests/benchmarks/test_imports.py | -5% | 174.9 ms | 183.6 ms |
| Passed | test_import_time[InMemoryVectorStore] | libs/core/tests/benchmarks/test_imports.py | -5% | 559.9 ms | 589.4 ms |
| Passed | test_import_time[RunnableLambda] | libs/core/tests/benchmarks/test_imports.py | -5% | 447.5 ms | 471.9 ms |
| Passed | test_import_time[InMemoryRateLimiter] | libs/core/tests/benchmarks/test_imports.py | -6% | 160.3 ms | 170.4 ms |
| Passed | test_import_time[Runnable] | libs/core/tests/benchmarks/test_imports.py | -6% | 444.2 ms | 472.7 ms |
| Passed | test_import_time[ChatPromptTemplate] | libs/core/tests/benchmarks/test_imports.py | -6% | 534.9 ms | 570 ms |
| Passed | test_import_time[LangChainTracer] | libs/core/tests/benchmarks/test_imports.py | -6% | 395.2 ms | 421.3 ms |
| Passed | test_import_time[BaseChatModel] | libs/core/tests/benchmarks/test_imports.py | -6% | 468.8 ms | 500.9 ms |
| Passed | test_import_time[CallbackManager] | libs/core/tests/benchmarks/test_imports.py | -8% | 407.1 ms | 442.9 ms |
| Passed | test_import_time[HumanMessage] | libs/core/tests/benchmarks/test_imports.py | -9% | 236.9 ms | 260.4 ms |
| Passed | test_import_time[PydanticOutputParser] | libs/core/tests/benchmarks/test_imports.py | -9% | 465.9 ms | 512.2 ms |
| Passed | test_import_time[tool] | libs/core/tests/benchmarks/test_imports.py | -9% | 451.6 ms | 497.3 ms |
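The report does not say how its percentages are derived. As a rough check, the rounded figures listed above are consistent with a relative change of (base - head) / head, assuming the first time in each pair is the base (master) measurement and the second is the head (branch) measurement. That reading is an inference from the numbers, not a documented CodSpeed formula, and `relative_change` is a hypothetical helper name.

```python
def relative_change(base_ms: float, head_ms: float) -> float:
    """Return (base - head) / head as a percentage.

    Assumption: negative values mean the head measurement is slower than base.
    """
    return (base_ms - head_ms) / head_ms * 100.0


# Time pairs copied from the report: (benchmark, base ms, head ms).
pairs = [
    ("test_async_callbacks_in_sync", 18.4, 24.3),
    ("test_import_time[CallbackManager]", 407.1, 442.9),
    ("test_import_time[tool]", 451.6, 497.3),
]

for name, base_ms, head_ms in pairs:
    print(f"{name}: {relative_change(base_ms, head_ms):.2f}%")
```

The displayed times are themselves rounded, so recomputed values can differ slightly from the report's more precise figures such as -24.16%.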

Commits


Base: master (525d5c0)

-24.16%

feat: parallelize sync generate method for improved LLM throughput

- Replace sequential loop with thread-pool executor mapping for multi-input processing
- Preserve ordering, callback behavior, and error propagation
- Add fast path for single input to avoid unnecessary overhead
- Use get_executor_for_config context manager for proper resource management

This optimization improves throughput when processing multiple prompts without breaking existing functionality or changing the API.
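The pattern described in the commit message can be sketched roughly as follows. This is not the code from the branch: it uses the standard library's `ThreadPoolExecutor` as a stand-in for the executor obtained through `get_executor_for_config`, and `generate_all`, `generate_one`, and `max_workers` are hypothetical names chosen for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence


def generate_all(
    prompts: Sequence[str],
    generate_one: Callable[[str], str],
    max_workers: int = 4,
) -> List[str]:
    """Apply generate_one to every prompt and return results in input order."""
    if not prompts:
        return []
    # Fast path: a single prompt gains nothing from a thread pool.
    if len(prompts) == 1:
        return [generate_one(prompts[0])]
    # The context manager shuts the pool down on exit, even if a call raises.
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # Executor.map yields results in input order; an exception raised in a
        # worker is re-raised here when its result is consumed.
        return list(executor.map(generate_one, prompts))
```

Threads mainly help when each call spends its time waiting on I/O, such as a network request to a model provider. For short, CPU-bound calls the pool's scheduling overhead can outweigh the gain, which may be relevant to the regression reported for test_async_callbacks_in_sync.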

e85d221, 3 hours ago, by ambershen

© 2025 CodSpeed Technology
