
AI Agents for Enterprises

  • Writer: Selador Team
  • May 29
  • 11 min read

If you find this interesting and want to learn more about how GenAI can unlock new possibilities for your enterprise, reach out and let's chat!


AI agents trigger a lot of feelings inside an enterprise: excitement, dread, suspicion, apathy. Depending on your role, your familiarity with AI, and how your week is going, you probably feel something. And honestly, you’re probably right to feel that way. The field is changing fast and is still pretty messy.


We might not be able to change how you feel about AI agents, but we can help you understand them better and explain what there is to like and dislike about them as they stand today.


The World Before AI Agents


In order to understand AI agents better, let’s do a quick rundown of what the enterprise landscape looks like without them.


How is information stored?

In databases, data lakes, file systems, CRMs, and spreadsheets. Structured data lived in structured places, and unstructured data lived anywhere and everywhere.


How do we search for information?

We used keyword search, filters, and structured queries. Whether through tools like Google or Elasticsearch, most enterprise systems had a search box with not much brains behind it (can’t believe we can actually call Google Search dumb at this point).


How do systems talk to each other?

They spoke in well-defined, rigid protocols: REST APIs, SQL queries, SDKs, message queues, XML schemas, JSON contracts and on and on. Each system had its own language, and both the sender and receiver needed to speak it fluently.


Can machines reason like humans?

Not really. They relied on rule-based systems or early machine learning models (e.g., decision trees, basic neural networks) that followed explicit instructions or statistical patterns.


That’s where large language models (LLMs) come in. They can sift through mountains of structured and unstructured data to pull out insights, answer questions, and help us find or summarize what matters.


They can also understand natural language and generate the kind of structured output that not only humans but other systems can understand. They don’t just speak English; they also speak SQL, Python, and REST.
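To make that concrete, here’s a minimal sketch using the OpenAI Python SDK. The model name and the ticket-counting task are just illustrative: we ask for JSON instead of prose, and the output can feed straight into another system.

```python
import json
from openai import OpenAI  # assumes the OpenAI Python SDK is installed

client = OpenAI()

# Ask for machine-readable output instead of prose.
resp = client.chat.completions.create(
    model="gpt-4o",  # illustrative model name
    response_format={"type": "json_object"},
    messages=[{
        "role": "user",
        "content": "Return JSON with keys 'sql' and 'explanation': "
                   "count open tickets per team from a table named 'tickets'.",
    }],
)

result = json.loads(resp.choices[0].message.content)
print(result["sql"])  # a SQL query another system can execute directly
```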


Mental Model Shift


With the building blocks above, you might already start to see the shift that’s underway. We’ve essentially created a bridge from human language to all the strange, structured languages machines use to talk to one another. Not only that, we’ve also created a bridge that lets different kinds of systems talk to each other.


And that changes everything.


We’re heading toward a world where instead of asking, “Do you have an API for that?” clients will ask, “Do you have an agent for that?” or, “Can my agent integrate with your tools?” or even, “Can my agent call your agent?”


That is a huge paradigm shift. And to support it, our systems will evolve, the communication protocols for those systems will evolve, the authorization around those protocols will evolve, and we will evolve, building a stronger, better, faster world.


But building that world is hard, especially for enterprises.


Enterprises are enterprises because they do some things very well, and for most of them, AI isn’t one of those things. Their systems weren’t built for AI. Their data isn’t organized for AI. Their authentication models were not designed for AI. Their IT teams certainly don’t have the tools (or the time) to handle AI.


It’s okay. We’ll get there.


(Unless you work in finance. You’re still using COBOL. Good luck.)


What agents can do


They can read your documents: Have 10 years of compliance documentation, legacy architecture diagrams, manuals and vendor contracts? No human is going to read all of it before making a decision. But an agent can do it pretty easily and give key insights to the human to inform their decision. 


They can facilitate communication between systems: This is pretty powerful. A single API call to your ticketing system might take in 100 different parameters and return 1,000 tickets, each packed with detailed metadata. An LLM-powered agent can intelligently make this API call for you based on your needs, parse the response, extract what matters, summarize trends, and even flag anomalies.
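As a rough sketch of that pattern (the ticketing endpoint, its parameters, and the model name below are all hypothetical), the agent makes the call, then hands the raw response to an LLM to distill:

```python
import json

import requests
from openai import OpenAI

client = OpenAI()
TICKETS_URL = "https://tickets.example.com/api/v2/search"  # hypothetical endpoint

# Step 1: the agent chooses parameters based on what the user asked for.
params = {"status": "open", "priority": "high", "updated_since": "2025-05-01"}
tickets = requests.get(TICKETS_URL, params=params, timeout=30).json()

# Step 2: the LLM parses the raw response and surfaces what matters.
summary = client.chat.completions.create(
    model="gpt-4o",  # illustrative
    messages=[{
        "role": "user",
        "content": "Summarize trends and flag anomalies in these tickets:\n"
                   + json.dumps(tickets)[:20000],  # naive truncation to fit the context window
    }],
)
print(summary.choices[0].message.content)
```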


They can take action: Agents can not only facilitate communication between systems but also, if given the power, take actions on those systems. For example, a finance agent might reconcile transactions across five banking systems, flag discrepancies based on past anomalies, and automatically create tickets for the correct teams.


They can click buttons for you: If you’re brave enough, you can even employ UI-based agents. These can control a browser, navigate mobile UIs, understand dashboards, extract insights, run tests, and submit forms.
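If you’re curious what the mechanics look like, here’s a minimal sketch with Playwright (the URL and selectors are made up). A UI agent would decide which of these actions to take, but the low-level moves look like this:

```python
from playwright.sync_api import sync_playwright  # pip install playwright

# The kind of UI-level action an agent might execute: log in and submit a form.
with sync_playwright() as p:
    browser = p.chromium.launch(headless=True)
    page = browser.new_page()
    page.goto("https://dashboard.example.com/login")  # hypothetical URL and selectors
    page.fill("#username", "agent-bot")
    page.fill("#password", "********")
    page.click("button[type=submit]")
    page.screenshot(path="after_login.png")  # keep evidence for auditing
    browser.close()
```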


What agents can’t do


Agents can’t independently solve world hunger, make nuclear fusion a practical reality, or resolve the P vs NP problem.


In other words: hard problems are still hard problems.


Agents don’t invent new capabilities out of thin air. They’re only as good as the tools they’re allowed to use and the knowledge they’re given access to. If your systems can’t do something today, an agent probably won’t be able to either. It’ll just do the existing things faster, more consistently, and at a much greater scale.


But that’s the point. Agents free up your team to work on the real challenges. The strategic, high-impact, human-level stuff that is more important than sifting through ticket queues or trying to make the most magical spreadsheet.


They’re not here to replace human thinking. They’re here to give your people the time to think.


The easy parts


As of May 2025, some parts of building agents have been figured out and made easy. Many popular tools now have LLM-friendly abstractions (e.g., LangChain, Bedrock, OpenAI functions) that make this setup quick. For example, it’s relatively straightforward to:


  • Set up an agent that can interact with a single system and perform atomic tasks like “Get my tickets from Asana” or “What’s on my calendar tomorrow?” (see the sketch after this list)

  • Connect to a knowledge base: whether it’s a wiki, a set of PDFs, or structured docs, there are now standardized ways to get this done.

  • Handle simple local workflows where all the context and actions live within one environment. Think: “Summarize my last five meetings” or “Draft a reply to this customer ticket.”
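To show how little code the first bullet takes these days, here’s a sketch using OpenAI’s function-calling API (the calendar tool is a stub, and the model name is illustrative):

```python
import json
from openai import OpenAI

client = OpenAI()

def get_calendar(day: str) -> list:
    """Stub standing in for a real calendar API call."""
    return [{"time": "10:00", "title": "Weekly sync"}]

tools = [{
    "type": "function",
    "function": {
        "name": "get_calendar",
        "description": "List calendar events for a given day (YYYY-MM-DD).",
        "parameters": {
            "type": "object",
            "properties": {"day": {"type": "string"}},
            "required": ["day"],
        },
    },
}]

resp = client.chat.completions.create(
    model="gpt-4o",  # illustrative
    messages=[{"role": "user", "content": "What's on my calendar tomorrow?"}],
    tools=tools,
)

# The model decides the tool is needed and supplies structured arguments.
call = resp.choices[0].message.tool_calls[0]
print(get_calendar(**json.loads(call.function.arguments)))
```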


The Hard Parts


Beyond building a basic agent, the next steps start to get really complicated really fast:


Complex workflows involving multiple systems

Most real-world processes span multiple tools and data sources. Think: “If the issue affects more than three clients, escalate it, notify support and recommend a fix.” This isn’t just calling a single API; it requires context tracking, complex reasoning, and coordination across systems that were not built to work together.
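Stripped of all the hard parts, the shape of that workflow might look like the sketch below. Every helper here is a hypothetical stub, each one wrapping a different system:

```python
# Hypothetical stubs, each wrapping a different enterprise system.
def affected_clients(issue_id: str) -> int:
    return 5  # would really query the CRM

def escalate(issue_id: str) -> None:
    print(f"escalated {issue_id}")  # would really update the ticketing system

def recommend_fix(issue_id: str) -> str:
    return "Roll back config v42"  # would really consult a knowledge base / LLM

def notify_support(issue_id: str, note: str) -> None:
    print(f"notify support about {issue_id}: {note}")

def handle_issue(issue_id: str) -> None:
    # The agent has to carry context across systems never built to cooperate.
    count = affected_clients(issue_id)
    if count > 3:
        escalate(issue_id)
        fix = recommend_fix(issue_id)
        notify_support(issue_id, f"{count} clients affected. Suggested fix: {fix}")

handle_issue("ISSUE-101")
```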


Mutating actions and UX around them

Reading from a system is safe. Writing to it can be very risky. Agents that create, update, or delete records need guardrails, validations, confirmations, auditing, and a user experience that builds trust. No one wants an agent that silently mass-deletes customer records because it misunderstood a prompt.
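One simple guardrail pattern is to wrap every destructive tool so it can’t run without explicit human sign-off. A sketch (the CRM client is a stand-in, not a real SDK):

```python
class CRMClient:
    """Stand-in for a real CRM SDK."""
    def delete(self, ids):
        print(f"deleted {ids}")

class ConfirmationRequired(Exception):
    pass

crm = CRMClient()

def guarded_delete(record_ids: list, confirmed: bool = False) -> None:
    """Destructive tool wrapped so the agent can never act silently."""
    if len(record_ids) > 10:
        raise ConfirmationRequired(
            f"Refusing bulk delete of {len(record_ids)} records without human review."
        )
    if not confirmed:
        raise ConfirmationRequired("A human must confirm before any delete.")
    crm.delete(record_ids)  # only reached after explicit sign-off
```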


Scaling your agents

One agent serving one team might be easy enough. One agent serving an entire enterprise, across roles, permissions, departments, tools, and workflows, is a different beast. Now you’re dealing with latency bottlenecks, access management, varying guardrail requirements, compliance issues, prompt injections, quota limits, and retrieval inconsistencies.


Remote workflows and distributed context

Some workflows can’t happen in one place, and some can’t even count on the user being present. Users need to be able to trigger a workflow, go take a nap, and come back to find the work well done. A manufacturing agent, for example, might need to coordinate inputs from a technician’s mobile device, factory sensors, and a backend ERP system. Keeping context consistent across platforms and users becomes a real engineering challenge.


Some workflows might need to be triggered automatically, e.g., an email arriving in a certain folder kicks off a workflow that researches its contents and has that research ready by the time the user opens the message. Managing things like user authentication becomes challenging in these cases.


Setting sane guardrails for your agents

You want your agent to be useful but not reckless. Setting limits on what it can say, what it can do, and who it can do it for is a non-trivial task. Prompt injections, overly confident hallucinations, and authorization leaks are very real risks in enterprise settings.


Monitoring and debugging your agents

When a regular app breaks, you check logs. When an agent breaks, you get a weird paragraph and no idea why it hallucinated your CEO’s phone number. Observability for LLM-based systems is still in its infancy, and debugging becomes part psychology, part guesswork.
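While the tooling matures, even something as simple as writing every agent step to a structured trace log helps enormously. A minimal sketch:

```python
import json
import time
import uuid

def log_step(trace_id: str, step: str, payload: dict) -> None:
    """Append one agent step to a JSONL trace so failures can be replayed."""
    record = {"trace_id": trace_id, "ts": time.time(), "step": step, **payload}
    with open("agent_trace.jsonl", "a") as f:
        f.write(json.dumps(record, default=str) + "\n")

trace_id = str(uuid.uuid4())
log_step(trace_id, "llm_call", {"prompt": "Summarize ticket 123", "model": "gpt-4o"})
log_step(trace_id, "tool_call", {"tool": "get_ticket", "args": {"id": 123}})
log_step(trace_id, "llm_response", {"output": "...", "tokens": 412})
```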


Tool discovery and orchestration

Giving your agent access to 10 tools is easy. Giving it 50 tools is hard. And getting it to figure out which one to use, when to use it, and in what order is even harder. Especially when the tools have overlapping functionality or require precise input formatting. 


The Even Harder Parts


Now we’re really in the weeds. These are the problems that make even the most seasoned engineers sweat (not us though, reach out for a chat). They don’t show up in cool product demos, but absolutely show up in production.


Authentication & Authorization

This has always been hard. Now it’s harder. Imagine a workflow that spans five different systems: Dashboards, Asana, CloudWatch, Salesforce, and your internal finance tool. You ask the agent to pull data from all of them to generate a weekly report.


Each system has its own auth flow. Your agent now needs to handle five different tokens, scopes, and expiry models, and will most definitely run into your IT team’s custom Single Sign-On layer while it’s at it.
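A sketch of what that juggling looks like (the fetchers are hypothetical; each real one would hide its own OAuth or SSO dance):

```python
import time

class TokenManager:
    """Caches one token per downstream system and refreshes it before expiry."""

    def __init__(self, fetchers: dict):
        # system name -> callable returning (token, ttl_seconds)
        self._fetchers = fetchers
        self._cache: dict = {}

    def token_for(self, system: str) -> str:
        _, expires_at = self._cache.get(system, (None, 0.0))
        if time.time() >= expires_at - 60:  # refresh a minute early
            token, ttl = self._fetchers[system]()  # each system's own auth flow
            self._cache[system] = (token, time.time() + ttl)
        return self._cache[system][0]

# One fetcher per system; real ones would call OAuth endpoints.
mgr = TokenManager({"asana": lambda: ("asana-token", 3600)})
print(mgr.token_for("asana"))
```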


Building a unified, seamless, secure auth model for agents is still a dark art.


Quality


Let’s say your agent manages to gather all the data. Producing a correct and useful report is still another level of challenge. In enterprise settings, a wrong number or hallucinated insight doesn’t just break trust; it can lead to real business consequences. And if people keep blaming the agent for mistakes, the tool will pretty soon be banned from the enterprise.


You need evaluation harnesses, feedback loops, and tight grounding protocols to make sure output isn’t just plausible but right.


Building Expert Agents


It’s easy to make an agent that can book a meeting. It’s hard to make one that understands how to write a risk analysis memo for an infrastructure team, build a threat model document for your new service, or summarize customer churn patterns from raw telemetry.


Expertise doesn’t come from fine-tuning alone; it requires curated knowledge, strong workflows, and systems designed to reason like a domain expert. The higher your quality bar, the more the cracks begin to show.


Agent Collaboration


Once you build the best agent you possibly can, your problems still aren’t finished. You’ll soon need multiple agents to work together. Now you’ve got state sharing, context passing, tool arbitration, and goal alignment to manage.


These agents will probably need to work across multiple systems, and that introduces the beautiful world of inter-agent authentication.


Doing all this securely, with role-based permissions and audit logs, is where even the best engineers start to quietly back out of the room.


How to Solve for These Problems


Yes, these problems are hard, but we, as an industry, are starting to figure out how to solve them. The AI ecosystem is beginning to rally around shared standards that could help enterprises tame the chaos. Let’s discuss some of these below.


MCP: Model Context Protocol


MCP attempts to solve the problems of tool discovery, input/output mapping, and authentication for agents. Think of it as a formal handshake between AI agents and the tools they need to use. It’s a protocol that lets systems describe how they should be used as tools by agents: what methods they expose, what authentication is required, and what structured inputs and outputs to expect. Instead of an agent reverse-engineering your tool’s API, it reads an MCP spec that says:


  • "Here’s the endpoint"

  • "Here’s how you authenticate"

  • "Here’s what inputs I expect and what outputs I give back"


This flips the script: tools must now present themselves in a way agents can understand. 
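Here’s roughly what exposing a system through MCP looks like with the official Python SDK. The ticket tool is a stub, and since the SDK is still evolving, check its docs for the current API:

```python
from mcp.server.fastmcp import FastMCP  # official MCP Python SDK: pip install mcp

# Describe your system as a tool that any MCP-aware agent can discover.
mcp = FastMCP("ticket-system")

@mcp.tool()
def search_tickets(status: str, limit: int = 10) -> list:
    """Search tickets by status. Typed inputs and outputs tell agents
    exactly what to send and what they'll get back."""
    return [{"id": 1, "status": status, "title": "Example ticket"}][:limit]

if __name__ == "__main__":
    mcp.run()  # serves the tool, its schema, and the handshake to clients
```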


A2A: Agent-to-Agent Protocol


If MCP is about agents using tools, A2A is about agents collaborating with each other.

Imagine your finance agent needs to call the procurement agent, who then talks to the legal agent for contract review. Without a shared protocol, these interactions would be clunky at best and impossible at worst.


A2A solves this by:


  • Standardizing how agents are discovered

  • Defining how they negotiate tasks, manage state, and share progress

  • Supporting long-running, multi-modal, async interactions (think streaming results, push updates, multi-turn conversations)


Agents don’t need to expose internal logic or memory; they just declare what they’re capable of, and A2A handles the communication.
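Discovery starts with an “Agent Card,” a small JSON document an agent publishes at a well-known URL. Sketched here as a Python dict; the field names follow the draft A2A spec and may change, and the endpoint is hypothetical:

```python
# Rough sketch of an A2A Agent Card; field names may drift as the spec evolves.
agent_card = {
    "name": "finance-agent",
    "description": "Reconciles transactions and flags discrepancies.",
    "url": "https://agents.example.com/finance",  # hypothetical endpoint
    "version": "1.0.0",
    "capabilities": {"streaming": True, "pushNotifications": False},
    "skills": [{
        "id": "reconcile",
        "name": "Reconcile transactions",
        "description": "Cross-checks transactions across banking systems.",
    }],
}
```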


With the help of A2A and MCP, we can separate the creation, management, and ownership of agents and their connections to tools across the enterprise. This will naturally evolve to follow Conway’s law: the agent architecture will end up mirroring the structure of your enterprise.




Authentication & Authorization


If your agent talks to five systems, it needs five secure ways to access them. The industry standard here is delegation: agents authenticate on behalf of the user using OAuth2 or OIDC, inheriting their permissions without hardcoding credentials.


These standards, however, don’t support all agent use cases out of the box, since they were designed for applications performing actions on behalf of a present, logged-in human. What we’re looking for is autonomous agents performing actions on systems on a human’s behalf, often without the human in the loop. That requires additional layers on top of these protocols, or a redesign of the standards themselves.
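One such layer is OAuth2 Token Exchange (RFC 8693), where the agent trades the user’s token for a narrower token aimed at one downstream system. A sketch; the identity provider URL, client credentials, and audience are all illustrative:

```python
import requests

user_access_token = "eyJ..."  # obtained from the user's login session (placeholder)

resp = requests.post(
    "https://idp.example.com/oauth2/token",  # hypothetical identity provider
    data={
        "grant_type": "urn:ietf:params:oauth:grant-type:token-exchange",
        "subject_token": user_access_token,            # the human's token
        "subject_token_type": "urn:ietf:params:oauth:token-type:access_token",
        "audience": "https://salesforce.example.com",  # the one system to reach
        "scope": "reports:read",                       # narrowest scope that works
    },
    auth=("agent-client-id", "agent-client-secret"),   # the agent's own identity
    timeout=30,
)
agent_token = resp.json()["access_token"]
```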


Tools like Okta's Auth for GenAI are now starting to support secure, on-behalf-of access flows designed for AI


Quality & Hallucination Control


To ensure answers come from real data and not the model’s imagination, one pattern to follow is to ground responses with Retrieval-Augmented Generation (RAG) or CAG. You’ll also want to layer in source citations and structured output constraints, plus observability tools like LangSmith, TruLens, and Vectara’s hallucination classifier.


Guardrails frameworks like Guardrails AI and NVIDIA NeMo Guardrails can block or reroute answers if confidence is low or context is missing. This stuff doesn't work like magic, but it makes your agent’s outputs defensible and improvable over time.
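To make RAG-style grounding concrete, here’s a deliberately tiny sketch: a toy keyword retriever standing in for a real vector store, with the model told to answer only from the retrieved sources (model name illustrative):

```python
from openai import OpenAI

client = OpenAI()

# Toy corpus and keyword retriever; production systems use a vector store.
DOCS = {
    "refund-policy.md": "Refunds are processed within 14 days of the request.",
    "sla.md": "P1 incidents must be acknowledged within 15 minutes.",
}

def retrieve(question: str, k: int = 2) -> list:
    words = question.lower().split()
    scored = [(sum(w in text.lower() for w in words), name, text)
              for name, text in DOCS.items()]
    return [(name, text) for _, name, text in sorted(scored, reverse=True)[:k]]

question = "How fast must we acknowledge a P1 incident?"
context = "\n\n".join(f"[{name}]\n{text}" for name, text in retrieve(question))

resp = client.chat.completions.create(
    model="gpt-4o",  # illustrative
    messages=[{"role": "user", "content":
        f"Answer ONLY from these sources and cite them:\n{context}\n\nQ: {question}"}],
)
print(resp.choices[0].message.content)
```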


Building Expert Agents


To build expert agents, you need to combine the right model, the right data, the right people, the right process, and the right oversight (yeah, it’s a lot). That’s how you go from “chatbot” to “domain expert that people can actually rely on.”


1. The right model

You need an LLM that matches your domain’s complexity and risk. For example, you might use a large foundation model like Claude for a general co-pilot, or fine-tune an open-source model for a healthcare or finance use case. In high-stakes environments, tighter control over model weights, behavior, and privacy often justifies building on smaller or private models.


2. The right data

An expert agent is only as good as the knowledge and context it has access to. This means indexing internal docs, connecting to tools like Jira or GitHub, and feeding in architecture diagrams and historical decisions. For example, a compliance agent that reviews internal policies must be grounded in the latest regulatory PDFs, audit logs, and exception cases specific to your org.


3. The right people

You can’t make a truly expert agent without expert humans. For example, in order to build an agent that will do software security reviews, you’ll need to bring in your best security engineers to detail their workflow, their tools, their knowledge sources, the nuances and the challenges in their work. They then need to be involved in evaluating, improving and testing the agents.


4. The right process

Agents improve through structured feedback like reinforcement learning with human feedback (RLHF), thumbs-up/down signals, or even lightweight retraining on correction logs. Even without full-blown RLHF, tracking how users correct the agent’s suggestions lets you evolve it from co-pilot to trusted partner.
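Even the lightweight version of that process starts with simply writing feedback down. A sketch of a correction log that can later seed evaluation sets or retraining:

```python
import json
import time

def record_feedback(run_id: str, suggestion: str, verdict: str,
                    correction: str = None) -> None:
    """Log thumbs-up/down and human corrections for later evaluation or retraining."""
    with open("feedback.jsonl", "a") as f:
        f.write(json.dumps({
            "run_id": run_id,
            "ts": time.time(),
            "suggestion": suggestion,
            "verdict": verdict,        # "up" or "down"
            "correction": correction,  # what the human did instead, if anything
        }) + "\n")

record_feedback("run-42", "Close ticket as duplicate", "down",
                correction="Link to incident INC-7 instead of closing.")
```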


5. The right oversight

Enterprise-grade agents require audit trails, versioning, and safety checks, especially when they’re touching production systems or offering recommendations in regulated environments.


For instance, if your IT agent suggests firewall rule changes, you’ll want a human-in-the-loop review, full change logs, and rollback capabilities before it acts.


Conclusion


Yes, building with AI agents isn’t easy and the challenges are real. But we strongly believe there is enough value in AI agents that we should put in the time and effort to solve them. Future generations (and more importantly, of course, stakeholders) will thank us for it.


If your enterprise is on this journey, or even thinking about it, we’d love to talk. This is the kind of hard, interesting work we’re built for.



 

 
 
 
