Title: RIFT-Bench: Dynamic Red-teaming for Agentic AI Systems
Abstract:
Agentic AI systems built on large language models (LLMs) are rapidly evolving into autonomous decision-making entities, introducing novel attack vectors that extend beyond traditional LLM vulnerabilities. As of 2026, these systems are increasingly deployed across critical sectors, amplifying the urgency for robust and generalizable security evaluations. Existing security benchmarks are often tailored to specific implementations or domains, hindering unified comparisons across heterogeneous agentic architectures.
To address this gap, we present RIFT-Bench, a graph-representation-driven methodology for dynamic red-teaming that supports unified evaluations across diverse agentic frameworks. RIFT-Bench introduces a novel hierarchical representation and operates in two automated phases:
- Discovery: Extracts the structural composition of the target system.
- Scanning: Deploys adaptive adversarial attacks and generates a comprehensive security report.
RIFT-Bench directly evaluates the system under test, leveraging a broad set of dynamically adaptable adversarial probes that target multiple attack vectors and objectives. We validate the proposed pipeline across 45 agentic systems spanning a wide range of implementations, demonstrating its effectiveness and generalizability across heterogeneous architectures. Beyond systems-level attacks, RIFT-Bench also supports direct evaluation of mitigation strategies. These capabilities establish RIFT-Bench as a scalable foundation for security evaluation of agentic AI systems.
Keywords: Agentic AI Security, Red-teaming, Adversarial Attacks, LLM Security, Dynamic Evaluation, Graph-based Methodologies
via ArXiv AI
