Summary
Tree-of-Thought (ToT) search is a promising approach for improving the reasoning of large language models (LLMs). Deploying it in practice, however, raises an under-explored question: how do different search strategies behave under varying compute budgets, model sizes, and problem difficulties? This study systematically evaluates two representative ToT methods, DPTS (a Monte Carlo tree search-based approach) and SSDP (a semantic deduplication-based approach), across two mathematical reasoning benchmarks (Math500 and GSM8K), two model scales (Llama-3B and Llama-8B), and four token budgets (3k–10k).
Key Findings
Our analysis reveals that the two methods exhibit limitations that pull in opposite directions:
- DPTS suffers from a cold-start bottleneck at low budgets: it requires sufficient exploration before its value estimates become reliable, making it a poor fit for resource-constrained settings—despite strong scaling behavior at higher budgets.
- SSDP, on the other hand, reaches candidate solutions efficiently but is prone to frontier depletion: its aggressive node merging permanently discards unexplored paths, leaving it unable to improve regardless of how much budget remains.
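The two failure modes above can be illustrated with a toy sketch. This is not the DPTS or SSDP implementation: the Bernoulli rollout model, the integer "nodes", and the `n % 4` merge key are all assumptions chosen only to make each effect visible in a few lines. The first function shows how noisy a Monte Carlo value estimate is when it rests on very few rollouts (the cold-start problem). The second shows how merging children into already-seen buckets can empty the frontier long before the budget is spent (frontier depletion).

```python
import random


def value_estimate_error(true_value: float, n_rollouts: int,
                         trials: int = 2000, seed: int = 0) -> float:
    """Mean absolute error of a value estimate averaged over n_rollouts.

    Each rollout is modeled as a Bernoulli draw (an assumption), so the
    estimate is just the fraction of successful rollouts.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        wins = sum(rng.random() < true_value for _ in range(n_rollouts))
        total += abs(wins / n_rollouts - true_value)
    return total / trials


def expand_with_merging(frontier, children_of, key_of, seen):
    """Expand every frontier node once, dropping children whose key was seen.

    A dropped child is never revisited, which is what makes the pruning
    permanent in this toy model.
    """
    next_frontier = []
    for node in frontier:
        for child in children_of(node):
            key = key_of(child)
            if key in seen:
                continue  # merged into an existing node; this path is lost
            seen.add(key)
            next_frontier.append(child)
    return next_frontier


def steps_until_depleted(max_steps: int) -> int:
    """Count expansion steps until the toy frontier is empty."""
    frontier, seen = [1], {1 % 4}
    for step in range(1, max_steps + 1):
        frontier = expand_with_merging(
            frontier,
            children_of=lambda n: (2 * n, 2 * n + 1),  # toy binary tree
            key_of=lambda n: n % 4,                    # hypothetical coarse merge key
            seen=seen,
        )
        if not frontier:
            return step
    return max_steps


if __name__ == "__main__":
    for n in (2, 8, 64):
        print(n, "rollouts -> mean abs error", round(value_estimate_error(0.6, n), 3))
    print("frontier empty after", steps_until_depleted(100), "steps")
```

In this toy setting the estimate error shrinks as rollouts accumulate, while the merged frontier empties after a handful of steps no matter how large `max_steps` is. The specific numbers say nothing about the benchmarks in the study.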
Implications
Together, these findings suggest that neither a fixed exploration strategy nor a fixed pruning strategy is sufficient across the compute continuum. As of 2026, LLM-based reasoning agents are increasingly deployed in scientific, industrial, and edge-computing environments whose resource constraints vary widely, which makes adaptive search strategies increasingly necessary.
We argue that effective search for scientific reasoning agents requires strategies that can dynamically adjust their behavior based on search progress and available resources, rather than relying on static budgets or monolithic heuristics.
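One way to picture such an adaptive strategy is a small controller that inspects search progress and remaining budget before each step. The sketch below is purely hypothetical: the `SearchState` fields, the mode names, and every threshold are assumptions made for illustration, and none of them come from the study.

```python
from dataclasses import dataclass


@dataclass
class SearchState:
    tokens_used: int
    token_budget: int   # assumed to be positive
    frontier_size: int  # number of nodes still open for expansion
    min_visits: int     # fewest rollouts behind any frontier value estimate


def choose_mode(state: SearchState,
                warmup_visits: int = 4,
                low_frontier: int = 2) -> str:
    """Pick a search behavior from progress and remaining budget.

    All thresholds are illustrative defaults, not tuned values.
    """
    remaining = 1.0 - state.tokens_used / state.token_budget

    # Frontier nearly depleted while budget remains: merge less aggressively.
    if state.frontier_size <= low_frontier and remaining > 0.25:
        return "relax_merging"

    if state.min_visits < warmup_visits:
        # Value estimates are still unreliable (cold start).
        if remaining < 0.5:
            return "exploit_greedy"  # too little budget left to warm up
        return "explore"

    return "exploit_value"  # estimates are warm enough to trust


if __name__ == "__main__":
    print(choose_mode(SearchState(1000, 10000, frontier_size=1, min_visits=10)))
    print(choose_mode(SearchState(2500, 3000, frontier_size=8, min_visits=1)))
```

The point of the sketch is the shape of the decision, not the thresholds: the same controller reacts to a thin frontier by loosening pruning and to unreliable value estimates by either exploring or falling back to a greedy choice, depending on how much budget is left.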
Metadata
- Comments: Flexscience'26: ACM HPDC workshop
- Subjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
- Cite as: arXiv:2606.20599 [cs.AI]
- DOI: https://doi.org/10.48550/arXiv.2606.20599
