Auto-FL-Research: Agentic Search for Federated Learning Algorithms

LLMs · 2026-07-03

Overview

Federated learning (FL) research is characterized by numerous small yet consequential algorithmic choices: optimizer variants, server aggregation rules, local training schedules, normalization and regularization techniques, and model architectures. These decisions are expensive to explore manually and difficult to compare fairly, particularly when candidate modifications alter the FL training or evaluation path. This paper presents Auto-FL-Research (AFR), a constrained coding-agent workflow designed for systematic FL algorithmic recipe search.

Methodology

AFR employs a multi-agent framework in which agents propose and implement candidate algorithms. A candidate may modify several dimensions of the FL recipe:

Server aggregation rules
Client update schedules
Local objective functions
Registered model variants
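
To make the first dimension concrete, here is a minimal sketch of a server aggregation rule: the standard sample-size-weighted average (FedAvg-style). The source does not say which baseline rule AFR starts from, so treat this as an illustrative example of the kind of function a candidate might rewrite. The names `Params` and `weighted_average` are hypothetical.

```python
from typing import Dict, List

# Hypothetical representation: parameter name -> flat list of weights.
Params = Dict[str, List[float]]


def weighted_average(client_params: List[Params], client_sizes: List[int]) -> Params:
    """Average client parameters, weighting each client by its sample count."""
    if not client_params or len(client_params) != len(client_sizes):
        raise ValueError("need one sample count per client update")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total sample count must be positive")

    aggregated: Params = {}
    for name, reference in client_params[0].items():
        # Weighted mean of each coordinate across all clients.
        aggregated[name] = [
            sum(params[name][i] * size for params, size in zip(client_params, client_sizes)) / total
            for i in range(len(reference))
        ]
    return aggregated
```

A candidate that changes the aggregation rule would replace the weighting logic here (for example, with a robust or momentum-based variant) while leaving the client-side code untouched.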

Task profiles constrain the mutation surface, compute budget, communication contract, and final model evaluation procedure. Each campaign documents candidate scores, runtime, edited source files, generated artifacts, and failure status, enabling comprehensive reproducibility analysis.
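
The profile and campaign record described above could be represented roughly as follows. This is a sketch under assumptions: every field name, and the `surface_violations` helper, is hypothetical and chosen only to mirror the items listed in the text, not taken from the AFR codebase.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class TaskProfile:
    """Hypothetical task profile; fields mirror the constraints named in the text."""
    name: str
    editable_files: Tuple[str, ...]  # mutation surface
    max_rounds: int                  # compute budget
    max_bytes_per_round: int         # communication contract
    eval_metric: str                 # final model evaluation procedure


@dataclass
class CandidateRecord:
    """Hypothetical per-candidate log entry within a campaign."""
    candidate_id: str
    score: Optional[float]           # None if the run failed before evaluation
    runtime_s: float
    edited_files: List[str]
    artifacts: List[str] = field(default_factory=list)
    failed: bool = False


def surface_violations(profile: TaskProfile, record: CandidateRecord) -> List[str]:
    """Return files the candidate edited that lie outside the allowed mutation surface."""
    allowed = set(profile.editable_files)
    return [path for path in record.edited_files if path not in allowed]
```

Checking edited files against the profile is one simple way to flag candidates that touched the evaluation path, which the overview identifies as a threat to fair comparison.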

Experimental Setup

AFR was evaluated on two benchmark families:

Five healthcare cross-silo FL tasks from the FLamby benchmark suite
Grouped-client profiles for the five fixed LEAF datasets, plus the LEAF synthetic task

All experiments used five-seed repeat evaluations to assess statistical reliability.
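
A repeat evaluation of this kind can be summarized along the following lines. The sketch assumes that higher scores are better and that baseline and candidate are run on the same seeds; the source does not specify which statistics AFR reports, so the function names and the choice of mean, sample standard deviation, and per-seed win count are illustrative.

```python
import statistics
from typing import List, Tuple


def summarize_seeds(scores: List[float]) -> Tuple[float, float]:
    """Return (mean, sample standard deviation) of per-seed scores."""
    if len(scores) < 2:
        raise ValueError("need at least two seeds to estimate spread")
    return statistics.mean(scores), statistics.stdev(scores)


def paired_comparison(baseline: List[float], candidate: List[float]) -> Tuple[float, int]:
    """Compare candidate to baseline seed by seed.

    Returns the mean per-seed difference and the number of seeds on which
    the candidate scored strictly higher (assumes higher is better).
    """
    if len(baseline) != len(candidate):
        raise ValueError("baseline and candidate must use the same seeds")
    diffs = [c - b for b, c in zip(baseline, candidate)]
    wins = sum(1 for d in diffs if d > 0)
    return statistics.mean(diffs), wins
```

A candidate that wins on only a minority of seeds, or whose mean difference is small relative to the spread, is the kind of seed-sensitive outcome that a single-run comparison would miss.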

Key Results

Positive findings: AFR produced performance gains on four of the five FLamby tasks and five of the six LEAF profiles.
Identified limitations: The evaluation also revealed seed-sensitive outcomes and search-selected failure cases.
Control experiments: Compared against matched same-budget controls, several gains corresponded to genuine FL-recipe changes. However, some improvements were also recovered by fixed-surface scalar controls, and others did not hold up under repeat or held-out evaluation.

Contributions and Implications

These mixed outcomes represent a core contribution of the work: they demonstrate a methodology for separating agent-generated candidates into three distinct categories:

Repeated FL mechanisms — algorithmic changes that reliably improve performance
Fixed-surface tuning effects — improvements attributable to hyperparameter optimization within the existing algorithmic surface
Selected single-run artifacts — apparent gains that do not replicate under controlled conditions
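
The three categories above suggest a simple decision procedure, sketched here. The inputs and their ordering are assumptions: the source does not give the paper's exact criteria, so this only illustrates how a repeat-evaluation check and a scalar-control check could be combined. The names `Category` and `categorize` are hypothetical.

```python
from enum import Enum


class Category(Enum):
    REPEATED_FL_MECHANISM = "repeated FL mechanism"
    FIXED_SURFACE_TUNING = "fixed-surface tuning effect"
    SINGLE_RUN_ARTIFACT = "selected single-run artifact"


def categorize(replicates: bool, recovered_by_scalar_control: bool) -> Category:
    """Assign a candidate to one of the three categories.

    replicates: the gain survives repeat or held-out evaluation.
    recovered_by_scalar_control: a same-budget control that tunes only
        scalar settings on the fixed surface reaches a comparable gain.
    """
    if not replicates:
        # The gain vanished under repetition, so it was likely selected by the search.
        return Category.SINGLE_RUN_ARTIFACT
    if recovered_by_scalar_control:
        # The gain is real but does not require an algorithmic change.
        return Category.FIXED_SURFACE_TUNING
    return Category.REPEATED_FL_MECHANISM
```

Checking replication first reflects the assumption that a gain which does not repeat should not be attributed to either tuning or mechanism.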

This framework matters as FL research matures and the community increasingly needs evaluation protocols that can distinguish genuine algorithmic advances from statistical artifacts or search-induced overfitting.

via ArXiv AI
