Analyst memo

Research | 1 source | Developing

ITBench-AA: New Benchmark Shows Challenges for AI in Enterprise Tasks

IBM and Artificial Analysis have launched ITBench-AA, a benchmark on which leading AI models perform poorly on Site Reliability Engineering (SRE) tasks, highlighting how difficult agentic IT tasks remain.

Published May 28, 2026, 4:16 AM
Updated May 28, 2026, 4:16 AM

What happened

IBM and Artificial Analysis introduced ITBench-AA, a benchmark built around Site Reliability Engineering (SRE) tasks. Leading AI models performed below 50% on these tasks, which marks the benchmark as a challenging one.

Why it matters

The findings underline the complexity of agentic IT tasks such as Kubernetes incident response, and they point to the current limits of AI in enterprise IT operations.

Who is affected

The benchmark is directly relevant to AI developers and to enterprises that use AI for IT operations. It may also affect decision-makers who set enterprise IT strategy.

Risks / uncertainty

The subpar performance raises concerns about whether AI is ready to handle complex IT incidents. If enterprises rely too heavily on AI for this work, they risk mismanaging their technical infrastructure.