Research Sharing System

From Benchmark Scores to Deployment Readiness: A Journal-scale Evaluation Framework for Autonomous Software Development Agents

Author : Partha Sarathi Samal, Suresh Kumar Palus, and Sai Kiran Padmam

Affiliation : Independent Researcher

Country : USA

Category : Software Engineering & Security

Volume, Issue, Month, Year : 17, 2, March, 2026

Abstract :

Autonomous software engineering agents are progressing from code assistants to systems that can plan and execute multi-step repository changes. This shift requires evaluation methods that move beyond one-shot pass rate reporting. Many current studies still provide limited visibility into repeatability, failure recovery, and operational efficiency under realistic constraints. This journal article presents an expanded DevAgentBench and DevAgentEval methodology designed for deployment-oriented assessment. The benchmark covers bug fixing, test generation, refactoring, code review assistance, and long-horizon feature work. We organize analysis into three metric layers: task-level success and correctness, robustness under perturbation, and business-aligned operational efficiency. We also formalize a nine-category failure-mode taxonomy linked to trace-level evidence and remediation guidance. Baseline experiments across agent patterns and model families show that rankings are sensitive to context reduction, tool-output noise, transient execution failures, and tighter resource budgets. These findings indicate that average success rates alone are insufficient for production decisions. We therefore recommend condition-aware reporting, repeated-run variance estimation, and reproducible artifact release as minimum standards for autonomous software-agent benchmarking.
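
As a rough illustration of the condition-aware reporting and repeated-run variance estimation recommended in the abstract, the sketch below summarizes success rates per evaluation condition across repeated runs. It is a minimal sketch under stated assumptions, not the authors' DevAgentEval implementation: the function name `summarize_runs`, the condition labels, and the outcome data are all hypothetical.

```python
from statistics import mean, stdev


def summarize_runs(results):
    """Summarize repeated-run success rates per condition.

    `results` maps a condition name to a list of runs; each run is a list
    of 0/1 task outcomes (1 = task solved). This layout is an assumption
    made for illustration only.
    """
    summary = {}
    for condition, runs in results.items():
        # Success rate of each individual run under this condition.
        rates = [sum(run) / len(run) for run in runs]
        # Sample standard deviation needs at least two runs.
        spread = stdev(rates) if len(rates) > 1 else 0.0
        summary[condition] = {
            "mean": mean(rates),
            "stdev": spread,
            "runs": len(rates),
        }
    return summary


# Hypothetical outcomes: three repeated runs of four tasks per condition.
example = {
    "baseline": [[1, 1, 0, 1], [1, 1, 1, 1], [1, 0, 0, 1]],
    "reduced_context": [[1, 0, 0, 1], [0, 0, 0, 1], [1, 1, 0, 0]],
}

if __name__ == "__main__":
    for condition, stats in summarize_runs(example).items():
        print(condition, stats)
```

Reporting the spread next to the mean for each condition makes it visible when two agents with similar average success rates differ in repeatability, or when a ranking changes under a perturbed condition.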

Keyword : Autonomous software agents, agentic AI, software engineering benchmarks, repository-scale evaluation, reliability analysis, robustness testing, failure taxonomy, bug fixing, test generation

Journal/ Proceedings Name : International Journal of Software Engineering & Applications (IJSEA)

URL : https://aircconline.com/ijsea/V17N2/17226ijsea01.pdf

Posted 15-05-2026 on 03:38:41 AEDT
