

venturebeat
LangSmith Engine closes the agent debugging loop automatically — but multi-model enterprises still need a neutral layer

Enterprises building and deploying agents have a problem: it’s taking their engineers too long to find out that an agent made a mistake, and the mistake keeps repeating, especially when no human reviews every step. LangSmith, the monitoring and evaluation platform from LangChain, launched a new capability in public beta that could make that issue more manageable.

LangSmith Engine automates the entire chain by detecting production failures, diagnosing root causes against the live codebase, drafting a fix and preventing regression. It does this in a single automated pass.

LangSmith Engine gives AI engineers a faster path to triage, but it launches into a crowded field: Anthropic, OpenAI and Google are all pulling observability and evaluation into their own platforms.

LangSmith Engine looks at failures

LangChain said in a blog post that the typical agent development cycle starts by tracing the agent to understand what it’s doing, followed by identifying gaps, making changes to the prompts and tools, and creating ground-truth datasets. Developers then run experiments and check for regressions before shipping the agent.

The problem is that customers often run into issues when the trace review doesn’t surface faulty patterns, repeated errors become hard to see, and there’s no targeted evaluator to catch the same problem when it recurs in production.

LangSmith Engine works by monitoring production traces for several signal types, “explicit errors, online evaluator failures, trace anomalies, negative user feedback and unusual behaviors like user asking questions the agent wasn’t built to answer,” according to the blog post.

Engine will then read the live codebase, find the culprit and draft a pull request before proposing a custom evaluator for that specific failure pattern. The human comes in at the approval step. Engine is built on top of LangSmith’s existing tracing and evaluation infrastructure and also works with an enterprise’s evaluator results.

Unlike observability tools such as Weights & Biases, Arize Phoenix and Honeyhive, LangSmith Engine handles the entire chain automatically (detecting the failure, diagnosing root cause, drafting a fix) and brings the human in only at the approval step.

Model providers bringing evaluators in-platform

While LangSmith identified this evaluation loop as a need for many enterprises, Engine comes at a time when the larger providers are beginning to offer observability tools within their platforms. This means enterprises may choose to use an end-to-end platform rather than add LangSmith Engine onto their existing workflows.

Anthropic's Claude Managed Agents brings together agentic deployment, evaluation and orchestration in a single suite. OpenAI's Frontier offers a similar end-to-end platform for building, governing and evaluating enterprise agents, though both have faced questions from enterprises wary of committing to a single vendor.

However, practitioners point out that not everyone wants to bring evaluations and observability fully into one platform.

Leigh Coney, founder and principal consultant at Workwise Solutions, told VentureBeat that third-party observability is the default for many enterprises.

“One fund I work with runs Claude for analysis and GPT for a separate workflow. If observability lives inside each provider's tooling, you now have two systems that can't talk to each other. Your compliance team can't produce a unified audit trail,” he said. “So third-party observability is surviving because multi-model is already the default in enterprise, and somebody has to sit across providers.”

Jessica Arredondo Murphy, CEO and co-founder of True Fit, said independent platforms like LangSmith have to prove to enterprises that they can “answer the long-term question of whether they become the cross-model operating layer for quality and reliability.”

“Enterprises are not consolidating onto the first-party model provider tooling as quickly as the model providers would prefer. What I see is a pragmatic split: teams will use first-party tooling for fast onboarding and early-stage debugging, but as soon as they care about production reliability, governance, and long-term flexibility, they tend to introduce a more neutral layer for observability and evaluation,” she said.

LangSmith Engine is available now in public beta. Teams can connect a tracing project, optionally connect their repo, and Engine will begin surfacing issues from production traces automatically.

