AI agents are digital entities that can act autonomously (without the intervention of humans) for individuals or organizations.They're also called 'agents' in short and can be both softwares or embodied in for example physical robots.An agent typically has a defined role that resonates with professions in the human social construct, like an AI Sales agent, AI Lawyer, AI Customer Support Rep, or an AI HR agent, but they can also have a more general or more specific scope.
Most AI agents are currently focussed on automation of white collar jobs, since those are the jobs where AI, and specifically agents driven by Large Language Models (LLMs), have the most capabilities (acting based on data with reasoning and collaboration skills).
Introduction to AI agents
AI agents are also called autonomous agents, agents or sometimes AI workers or digital workers.An important distinction from AI agents with assistants or co-pilots is that they are a lot more autonomous in terms of decision making and acting, manifesting itself as an entity that can act independently (with agency) rather than an enhancement of current human labour and workflows.
Agents are fundamentally different than tools because they have a high level of autonomy
Key capabilities of AI agents
As described, AI agents are different from the software tools you use. They are autonomously acting entities and have some key capabilities that make them different from software tools.
These are the key capabilities of AI agents:
- Plan: the ability to plan actions and determine dependencies and responsibilities
- Act: the ability to execute on tasks and sequences of task autonomously (without human intervention) while making use of tools
- Reflect: the ability to reflect based on actions and outcomes and determine how to improve
- Collaborate: the ability to work together with humans and other AIs by communicating and aligning goals, planning, responsibilities and requirements
Agents are able to plan, act, reflect and collaborate because they are capable of reasoning (understanding, calculating, problem-solving) and communication (understanding and generating language and other modalities like vision and sound).
The difference between tools, GenAI and agents
On the surface, tools look the same as AI agents. But they are fundamentally different in terms of capabilities and how they are used, or better, collaborated with.
Agents can reason and have a high level of automation, which creates a level of autonomy. Because of that high level of autonomy they behave similar to humans, very differently from how software tools are 'used' instead of collaborated with.
Software tools have a low level of automation and low level of reasoning, so a very low level of autonomy. Generative AI has reasoning capability so is already more autonomous than a software tool, but can only execute one task at a time. Robotic Process Automation (RPA) is used by enterprises to execute a series of tasks (a process) in an automated way but is 'dumb' and therefore not autonomous.
- Agents have a very high level of automation and reasoning capabilities
- Generative AI: good at reasoning with 'single-shot prompting' but needs explicit instruction and can't collaborate with other AIs
- Software tools: can support in mainly individual independent tasks, but need a human to operate it and executes without reasoning ability
- Robotic Process Automation: can execute sequences of tasks automatically, but can't reason and needs a human to explicitly define and develop the steps and sequences that it has to execute
- Agents: can reason and use reasoning to make decisions and plans, can also execute on these plans, making them autonomous
Read more here about how agents are changing the economy.
How can AI agents be autonomous?
AI agents can be autonomous because they make use of different technology than for example software tools do, they use language models instead of rules based systems which allow them to reason. Language models are powering the capabilities of AI agents because they allow for generalization instead of having to be programmed in a specific way to do one specific task. This opens up a world of possibilities because with language models an agent is not dependent on a very specific workflow anymore, it can answer and action based on almost any given input guided by their main purpose.
Agents iteratively prompt these models to make use of the intelligence of the models and chain together several prompts, databases and tools to achieve their goals, possibly in collaboration with other agents.
The agent's universe is one of context, configuration, skills, training and prompts, which make up the agent's 'awareness' and abilities.
These are the dimensions that agents are aware of:
- Context: the agent can have knowledge (databases and search) and can be specialized (with fine tuning to focus the agent on a specific domain)
- Configuration: the agent has a defined role and identity (defined with system prompts) and has certain defined behavior (determined with settings)
- Skills: the agent uses and even generates tools (with code generation and execution and API calls) and uses its know-how to utilize those (with the right parameters)
- Training: the agent has general knowledge (from the model pre-training) and has a model of the world (determined by model weights)
- Prompt: the agent gets direct instruction by a user (based on the user's goal) and can get specific input (like file uploads and multimodal inputs)
In summary, AI agents are different in capabilities, with a level of autonomy that's fundamentally different from tools and more simplistic systems in how they operate in our world.