langchain-ai/langchain

feat: parallelize sync generate method for improved LLM throughput

#34043
Comparing ambershen:optimize/llm-sync-generate-parallelization (e85d221) with master (525d5c0)
Overall: -24% · Regressions: 1 · Untouched: 12 · Skipped: 21

Benchmarks

Skipped (21)

test_create_chat_prompt_init_time (libs/partners/prompty/tests/unit_tests/test_standard.py): 311.9 µs*
test_exa_retriever_init_time (libs/partners/exa/tests/unit_tests/test_standard.py): 325.3 µs*
test_qdrant_vectorstore_init_time (libs/partners/qdrant/tests/unit_tests/test_standard.py): 224.2 ms*
test_chroma_init_time (libs/partners/chroma/tests/unit_tests/test_standard.py): 57.2 ms*
test_init_time (libs/partners/deepseek/tests/unit_tests/test_chat_models.py::TestChatDeepSeekUnit): 1.6 s*
test_init_time (libs/partners/perplexity/tests/unit_tests/test_chat_models_standard.py::TestPerplexityStandard): 837.5 ms*
test_init_time (libs/partners/ollama/tests/unit_tests/test_chat_models.py::TestChatOllama): 1.6 s*
test_init_time (libs/partners/xai/tests/unit_tests/test_chat_models_standard.py::TestXAIStandard): 3.3 s*
test_init_time (libs/partners/fireworks/tests/unit_tests/test_standard.py::TestFireworksStandard): 6.6 s*
test_init_time (libs/partners/mistralai/tests/unit_tests/test_standard.py::TestMistralStandard): 9.1 ms*
test_nomic_embeddings_init_time (libs/partners/nomic/tests/unit_tests/test_standard.py): 1.5 ms*
test_init_time (libs/partners/groq/tests/unit_tests/test_standard.py::TestGroqStandard): 1.6 s*
test_stream_time (libs/partners/openai/tests/integration_tests/chat_models/test_responses_standard.py::TestOpenAIResponses): 857.3 ms*
test_init_time (libs/partners/openai/tests/unit_tests/chat_models/test_responses_standard.py::TestOpenAIResponses): 12.2 ms*
test_stream_time (libs/partners/openai/tests/integration_tests/chat_models/test_responses_standard.py::TestOpenAIStandard): 1.2 s*
test_init_time (libs/partners/openai/tests/unit_tests/chat_models/test_base_standard.py::TestOpenAIStandard): 12.1 ms*
test_stream_time (libs/partners/openai/tests/integration_tests/chat_models/test_base_standard.py::TestOpenAIStandard): 1.2 s*
test_init_time (libs/partners/openai/tests/unit_tests/chat_models/test_azure_standard.py::TestOpenAIStandard): 1.7 s*
test_init_time (libs/partners/anthropic/tests/unit_tests/test_standard.py::TestAnthropicStandard): 763.3 µs*
test_stream_time (libs/partners/anthropic/tests/integration_tests/test_standard.py::TestAnthropicStandard): 34.7 ms*
test_init_time_with_client (libs/partners/anthropic/tests/unit_tests/test_standard.py): 2.2 ms*

Failed (1)

test_async_callbacks_in_sync (libs/core/tests/benchmarks/test_async_callbacks.py): Regression, -24% (18.4 ms → 24.3 ms)

Passed (12)

test_import_time[Document] (libs/core/tests/benchmarks/test_imports.py): -5% (174.9 ms → 183.6 ms)
test_import_time[InMemoryVectorStore] (libs/core/tests/benchmarks/test_imports.py): -5% (559.9 ms → 589.4 ms)
test_import_time[RunnableLambda] (libs/core/tests/benchmarks/test_imports.py): -5% (447.5 ms → 471.9 ms)
test_import_time[InMemoryRateLimiter] (libs/core/tests/benchmarks/test_imports.py): -6% (160.3 ms → 170.4 ms)
test_import_time[Runnable] (libs/core/tests/benchmarks/test_imports.py): -6% (444.2 ms → 472.7 ms)
test_import_time[ChatPromptTemplate] (libs/core/tests/benchmarks/test_imports.py): -6% (534.9 ms → 570 ms)
test_import_time[LangChainTracer] (libs/core/tests/benchmarks/test_imports.py): -6% (395.2 ms → 421.3 ms)
test_import_time[BaseChatModel] (libs/core/tests/benchmarks/test_imports.py): -6% (468.8 ms → 500.9 ms)
test_import_time[CallbackManager] (libs/core/tests/benchmarks/test_imports.py): -8% (407.1 ms → 442.9 ms)
test_import_time[HumanMessage] (libs/core/tests/benchmarks/test_imports.py): -9% (236.9 ms → 260.4 ms)
test_import_time[PydanticOutputParser] (libs/core/tests/benchmarks/test_imports.py): -9% (465.9 ms → 512.2 ms)
test_import_time[tool] (libs/core/tests/benchmarks/test_imports.py): -9% (451.6 ms → 497.3 ms)

Commits

Base: master (525d5c0)

e85d221 (-24.16%), 2 days ago, by ambershen:
feat: parallelize sync generate method for improved LLM throughput

- Replace sequential loop with thread-pool executor mapping for multi-input processing
- Preserve ordering, callback behavior, and error propagation
- Add fast path for single input to avoid unnecessary overhead
- Use get_executor_for_config context manager for proper resource management

This optimization improves throughput when processing multiple prompts without breaking existing functionality or changing the API.
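
For orientation, the sketch below illustrates the pattern the commit message describes: a fast path for a single input, and a thread-pool map over the inputs otherwise. It is a hedged approximation, not the actual diff; the real change sits in LangChain's sync generate path and uses get_executor_for_config, while the generate_batch/generate_one names and the plain ThreadPoolExecutor here are stand-ins so the example is self-contained.

```python
# Hedged sketch of the approach described in commit e85d221 (not the actual
# langchain diff). The real code obtains its executor via
# langchain_core.runnables.config.get_executor_for_config; a plain
# ThreadPoolExecutor stands in here so the example runs on its own.
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def generate_batch(prompts: List[str], generate_one: Callable[[str], str]) -> List[str]:
    """Apply generate_one to each prompt, concurrently when there are several."""
    # Fast path: a single prompt is handled inline, avoiding executor overhead.
    if len(prompts) == 1:
        return [generate_one(prompts[0])]
    # Multi-prompt path: map the work over a thread pool. Executor.map yields
    # results in input order and re-raises the first worker exception when the
    # results are consumed, matching the "preserve ordering ... and error
    # propagation" notes in the commit message.
    with ThreadPoolExecutor() as executor:
        return list(executor.map(generate_one, prompts))


if __name__ == "__main__":
    # Toy stand-in for an LLM call; callbacks and run configs are omitted.
    print(generate_batch(["first prompt", "second prompt"], str.upper))
    # ['FIRST PROMPT', 'SECOND PROMPT']
```

The key property of this pattern is that Executor.map keeps results in input order and surfaces worker exceptions to the caller, so batching several prompts can run concurrently without changing the observable behavior of the sequential loop it replaces.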