Beyond Parallel Sampling: Diverse Query Initialization for Agentic Search

Category: LLMs

Posted: 2026-06-17

Tags: agentic search, test-time scaling, diverse initialization, query diversity, multi-hop QA, DivInit, parallel sampling, LLM reasoning, breadth scaling

Authors: Sidhaarth Murali, João Coelho, Jingjie Ning, João Magalhães, Bruno Martins, Chenyan Xiong

Submitted: 15 June 2026

Subjects: Artificial Intelligence (cs.AI); Information Retrieval (cs.IR)

Comments: 15 pages, 8 figures; under review at EMNLP 2026

Code: GitHub Repository

arXiv ID: 2606.17209

Abstract

Test-time scaling for agentic search typically focuses on increasing either search depth (more turns and tokens per trajectory) or breadth (more parallel rollouts). In this work, we examine breadth scaling and identify a fundamental limitation: standard parallel sampling yields diminishing returns due to query redundancy at the first turn. When models issue similar initial queries across multiple rollouts, the resulting threads retrieve overlapping evidence, and subsequent turns are conditioned on this shared information.

To address this issue, we introduce DivInit, a training-free intervention at the first turn. Instead of sampling k first queries independently, DivInit draws n candidate queries from a single model call, selects k < n diverse seeds among them, and executes those seeds as parallel trajectories. This simple modification ensures broader coverage of the search space from the outset.

Across five open-weight models and eight benchmarks, DivInit consistently outperforms standard parallel sampling, delivering average gains of 5–7 points on multi-hop question-answering tasks under matched compute budgets. These results suggest that query diversity, not just parallelism, is a critical axis for scaling agentic search as language models grow more capable.

1. Introduction

As of 2026, agentic search systems have become central to multi-step reasoning and information retrieval tasks. The dominant approach to improving performance at test time has been to scale either the depth of search (more reasoning turns and generated tokens per trajectory) or the breadth (executing multiple independent trajectories in parallel). While depth scaling has received considerable attention, we focus on breadth scaling and uncover a key inefficiency: naive parallelism leads to redundant first-turn queries.

When multiple parallel trajectories begin with similar queries, they retrieve overlapping evidence sets. Consequently, each trajectory's subsequent reasoning steps are conditioned on largely the same information, negating the benefits of breadth. This redundancy is especially problematic in multi-hop QA tasks, where diverse evidence coverage is critical.

2. The Redundancy Problem in Parallel Sampling

Standard parallel sampling for agentic search proceeds by: (1) generating k independent first queries from the model, (2) executing k separate search trajectories, and (3) aggregating results. Our analysis reveals that the k queries are often near-duplicates, particularly for well-defined tasks. This arises because greedy or top-k sampling from the model's output distribution tends to produce semantically similar queries.
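To make "near-duplicate" concrete, the sketch below scores a set of first-turn queries by their mean pairwise token overlap. It is an illustration only: the Jaccard measure and the function names (`token_set`, `jaccard`, `mean_pairwise_similarity`) are assumptions introduced here, not the redundancy metric used in the paper.

```python
from itertools import combinations


def token_set(query: str) -> frozenset[str]:
    """Lowercase, whitespace-tokenized view of a query (a deliberately crude proxy)."""
    return frozenset(query.lower().split())


def jaccard(a: frozenset[str], b: frozenset[str]) -> float:
    """Jaccard similarity of two token sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def mean_pairwise_similarity(queries: list[str]) -> float:
    """Average Jaccard similarity over all unordered query pairs.

    Values near 1.0 mean the k first-turn queries are close to duplicates.
    Returns 0.0 when fewer than two queries are given.
    """
    sets = [token_set(q) for q in queries]
    pairs = list(combinations(sets, 2))
    if not pairs:
        return 0.0
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)
```

A token-level measure misses paraphrases, so an embedding-based similarity would likely be a better fit in practice; the token version is used here only because it needs no external model.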

We quantify this redundancy empirically and show that it leads to linear scaling of evidence overlap with k, rather than the sublinear scaling expected from diverse sampling. The consequence is diminishing returns on performance as k increases.
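One simple way to see the diminishing returns is to track how many distinct documents have been retrieved as rollouts are added. The sketch below assumes each rollout's retrieved evidence is available as a set of document IDs; the function name `coverage_curve` and this coverage measure are hypothetical stand-ins, not the paper's own analysis.

```python
def coverage_curve(evidence_per_rollout: list[set[str]]) -> list[int]:
    """Cumulative count of distinct document IDs after each added rollout.

    A curve that flattens quickly indicates that later rollouts mostly
    re-retrieve evidence already seen by earlier ones.
    """
    seen: set[str] = set()
    curve: list[int] = []
    for docs in evidence_per_rollout:
        seen |= docs  # add this rollout's documents to the running union
        curve.append(len(seen))
    return curve
```

Comparing this curve for standard parallel sampling and for a diversified start, at the same k, gives a rough picture of how much new evidence each extra rollout contributes.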

3. DivInit: Diverse Query Initialization

DivInit is a training-free modification to the first step of breadth-scaled agentic search. The procedure is straightforward:

(1) Generate n candidate first queries from the model in a single call.
(2) Select a subset of k < n queries that maximizes diversity, measured by embedding-space distance or semantic coverage.
(3) Execute k parallel trajectories starting from these diverse queries.
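The selection step can be sketched as a greedy max-min (farthest-point) pick over query embeddings. This is a minimal illustration under stated assumptions: the summary does not specify the selection algorithm, so the greedy rule, the cosine distance, the choice of the first candidate as the starting seed, and the name `select_diverse` are all assumptions made here. Embeddings are taken as plain lists of floats produced by whatever encoder the pipeline already has.

```python
import math


def cosine_distance(u: list[float], v: list[float]) -> float:
    """1 - cosine similarity; zero vectors are treated as maximally uninformative."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 1.0
    return 1.0 - dot / (norm_u * norm_v)


def select_diverse(embeddings: list[list[float]], k: int) -> list[int]:
    """Return indices of k candidates chosen by greedy max-min selection.

    Starts from candidate 0 (an arbitrary assumption), then repeatedly adds
    the candidate whose distance to its nearest already-selected seed is largest.
    """
    n = len(embeddings)
    if not 0 < k <= n:
        raise ValueError("k must satisfy 0 < k <= number of candidates")
    selected = [0]
    # min_dist[i] = distance from candidate i to its closest selected seed
    min_dist = [cosine_distance(embeddings[0], e) for e in embeddings]
    while len(selected) < k:
        remaining = (i for i in range(n) if i not in selected)
        nxt = max(remaining, key=lambda i: min_dist[i])
        selected.append(nxt)
        for i in range(n):
            d = cosine_distance(embeddings[nxt], embeddings[i])
            min_dist[i] = min(min_dist[i], d)
    return selected
```

For example, with four candidate embeddings where the second is nearly identical to the first, asking for k = 3 skips the near-duplicate and keeps the three mutually distant candidates. The k selected indices would then seed the k parallel trajectories.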

Because DivInit operates only on the first turn and requires no additional model training or fine-tuning, it integrates seamlessly into existing agentic search pipelines. The diversity selection step adds minimal computational overhead (a few milliseconds per query set).

4. Experimental Results

We evaluate DivInit against standard parallel sampling across five open-weight models (including Llama-3, Mistral, and Gemma variants) on eight benchmarks spanning multi-hop QA, fact verification, and open-domain reasoning tasks. Key findings:

Consistent improvements: DivInit outperforms standard parallel sampling on all models and benchmarks.
Average gains of 5–7 points on multi-hop QA tasks at matched compute budgets.
Larger benefits for harder tasks: Gains are most pronounced on tasks requiring synthesis of disparate information.
Robust to model size: DivInit benefits small and large models alike.

5. Implications for 2026 Agentic Search

As language models continue to improve in 2026, the bottleneck in agentic search increasingly shifts from raw model capability to search strategy. Our results underscore that breadth scaling must be paired with diversity to be effective. DivInit provides a simple, compute-efficient solution that can be layered on top of any existing parallel sampling framework.

6. Conclusion

We identify query redundancy as a key limitation of standard parallel sampling for agentic search and propose DivInit, a lightweight intervention that diversifies first-turn queries. Empirical results across diverse models and benchmarks show consistent gains, establishing query diversity as an essential consideration for test-time scaling in agentic systems.
