evergreen-paper - Branch - moltis-org/moltis

fix(agents): detect and break tool-call reflex loops (#658)

#664 Merged

Comparing evergreen-paper (158085a) with main (c3da499)

-1%

Untouched: 39

Skipped: 5

Benchmarks

44 total

namespaced_model_id                                crates/benchmarks/benches/boot.rs  +4%  2.6 µs    2.5 µs
vision_support_lookup[unknown-model-xyz]           crates/benchmarks/benches/boot.rs  +1%  2.6 µs    2.6 µs
values_to_chat_messages[50]                        crates/benchmarks/benches/boot.rs       62.3 µs   62.1 µs
config_default_construction                        crates/benchmarks/benches/boot.rs       33.4 µs   33.3 µs
sanitize_tool_result[10000]                        crates/benchmarks/benches/boot.rs       159 µs    158.9 µs
vision_support_lookup[gpt-4o]                      crates/benchmarks/benches/boot.rs       2.4 µs    2.4 µs
vision_support_lookup[mistral-large-latest]        crates/benchmarks/benches/boot.rs       2.5 µs    2.5 µs
context_window_lookup[mistral-large-latest]        crates/benchmarks/benches/boot.rs       2.6 µs    2.6 µs
context_window_lookup[unknown-model-xyz]           crates/benchmarks/benches/boot.rs       2.7 µs    2.7 µs
sanitize_tool_result[1000000]                      crates/benchmarks/benches/boot.rs       15.1 ms   15.1 ms
vision_support_lookup[gemini-2.0-flash]            crates/benchmarks/benches/boot.rs       2.5 µs    2.5 µs
session_key_to_filename[default]                   crates/benchmarks/benches/boot.rs       653.3 ns  653.3 ns
tool_result_to_content_vision[1000000]             crates/benchmarks/benches/boot.rs       22.3 ms   22.3 ms
tool_result_to_content_vision[100000]              crates/benchmarks/benches/boot.rs       2.2 ms    2.3 ms
sanitize_tool_result[100000]                       crates/benchmarks/benches/boot.rs       1.5 ms    1.5 ms
vision_support_lookup[claude-sonnet-4-5-20250929]  crates/benchmarks/benches/boot.rs       2.6 µs    2.6 µs
context_window_lookup[claude-sonnet-4-5-20250929]  crates/benchmarks/benches/boot.rs       2.6 µs    2.6 µs
tool_result_to_content_vision[10000]               crates/benchmarks/benches/boot.rs       240.7 µs  240.9 µs
vision_support_lookup[gpt-5]                       crates/benchmarks/benches/boot.rs       2.4 µs    2.4 µs
values_to_chat_messages[500]                       crates/benchmarks/benches/boot.rs       505.7 µs  506.5 µs
config_template_generation                         crates/benchmarks/benches/boot.rs       108.3 µs  108.8 µs
values_to_chat_messages[2000]                      crates/benchmarks/benches/boot.rs  -1%  2 ms      2 ms
config_load_toml                                   crates/benchmarks/benches/boot.rs  -1%  1.9 ms    1.9 ms
full_config_boot_path                              crates/benchmarks/benches/boot.rs  -1%  4.2 ms    4.2 ms
config_validate_toml                               crates/benchmarks/benches/boot.rs  -1%  2.4 ms    2.4 ms

Commits


Base: main (c3da499)

-0.96%   829db4c  fix(agents): detect and break tool-call reflex loops (#658), 10 days ago, by penso
-0.06%   cf39c1a  fix(agents): address Greptile review feedback on #658, 10 days ago, by penso
-43.13%  93dba9e  fix(agents): loop detector handles mixed-outcome batches correctly (#658), 9 days ago, by penso
+43.29%  158085a  fix(agents): treat success=false without error field as failure (#658), 9 days ago, by penso
