From Benchmark Scores to Deployment Readiness: A Journal-scale Evaluation Framework for Autonomous Software Development Agents

Author :  Partha Sarathi Samal, Suresh Kumar Palus and Sai Kiran Padmam

Affiliation :  Independent Researcher

Country :  USA

Category :  Software Engineering & Security

Volume, Issue, Month, Year :  17, 2, March, 2026

Abstract :


Autonomous software engineering agents are progressing from code assistants to systems that can plan and execute multi-step repository changes. This shift requires evaluation methods that move beyond one-shot pass-rate reporting. Many current studies still provide limited visibility into repeatability, failure recovery, and operational efficiency under realistic constraints. This journal article presents an expanded DevAgentBench benchmark and DevAgentEval methodology designed for deployment-oriented assessment. The benchmark covers bug fixing, test generation, refactoring, code review assistance, and long-horizon feature work. We organize the analysis into three metric layers: task-level success and correctness, robustness under perturbation, and business-aligned operational efficiency. We also formalize a nine-category failure-mode taxonomy linked to trace-level evidence and remediation guidance. Baseline experiments across agent patterns and model families show that rankings are sensitive to context reduction, tool-output noise, transient execution failures, and tighter resource budgets. These findings indicate that average success rates alone are insufficient for production decisions. We therefore recommend condition-aware reporting, repeated-run variance estimation, and reproducible artifact release as minimum standards for autonomous software-agent benchmarking.
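The abstract recommends condition-aware reporting and repeated-run variance estimation rather than a single average success rate. The minimal Python sketch below illustrates that idea under stated assumptions: the run records, the condition names (`baseline`, `tool_noise`), and the helpers `summarize` and `drop_vs_reference` are hypothetical and are not taken from the DevAgentBench or DevAgentEval artifacts. It groups per-run pass rates by evaluation condition, reports the mean and the sample standard deviation across repeated runs, and computes the drop relative to an unperturbed reference condition.

```python
from collections import defaultdict
from statistics import mean, stdev

# Hypothetical run records: (condition, run_id, tasks_passed, tasks_total).
# Condition names and counts are illustrative assumptions, not published data.
RUNS = [
    ("baseline", 1, 42, 60),
    ("baseline", 2, 45, 60),
    ("baseline", 3, 40, 60),
    ("tool_noise", 1, 33, 60),
    ("tool_noise", 2, 29, 60),
    ("tool_noise", 3, 35, 60),
]


def summarize(runs):
    """Return {condition: (mean pass rate, sample std dev, number of runs)}."""
    rates = defaultdict(list)
    for condition, _run_id, passed, total in runs:
        rates[condition].append(passed / total)
    summary = {}
    for condition, values in rates.items():
        # Sample standard deviation needs at least two runs; report 0.0 otherwise.
        spread = stdev(values) if len(values) > 1 else 0.0
        summary[condition] = (mean(values), spread, len(values))
    return summary


def drop_vs_reference(summary, reference="baseline"):
    """Mean pass-rate drop of each condition relative to the reference condition."""
    ref_mean = summary[reference][0]
    return {c: ref_mean - m for c, (m, _s, _n) in summary.items() if c != reference}


if __name__ == "__main__":
    table = summarize(RUNS)
    for condition, (m, s, n) in sorted(table.items()):
        print(f"{condition:<12} mean={m:.3f} sd={s:.3f} runs={n}")
    for condition, drop in drop_vs_reference(table).items():
        print(f"{condition:<12} drop_vs_baseline={drop:.3f}")
```

Reporting the spread and the per-condition drop alongside the mean makes it visible when two agents with similar average success rates differ in stability or in sensitivity to a perturbation.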

Keyword :  Autonomous software agents, agentic AI, software engineering benchmarks, repository-scale evaluation, reliability analysis, robustness testing, failure taxonomy, bug fixing, test generation

Journal/Proceedings Name :  International Journal of Software Engineering & Applications (IJSEA)

URL :  https://aircconline.com/ijsea/V17N2/17226ijsea01.pdf

User Name : austin
Posted on 15-05-2026 at 03:38:41 AEDT



Related Research Work

  • Eunicert: Ethereum Based Digital Certificate Verification System
  • Machine Learning In Network Security Using Knime Analytics
  • Semantic Intelligence In Test Automation: Context-driven Adaptation Through Natural Language Understanding And Machine Learning
  • Which Approach Of Evolution For A Service Of Document Units Recommendation?
