Agentic Spectrum

Agentic Spectrum

Tags
Published
June 30, 2024
notion image
 

Intro

AI agents are self-driving computer programs.
And just like self-driving cars, it’s useful to consider agents as a spectrum of autonomy: the agentic spectrum. Agents can be characterized along this spectrum by answering one simple question:
Who is deciding how to construct and traverse the execution graph of an AI agent?
The execution graph or control flow of a program is a graph of possible actions the program may take over time. Most computer programs today derive their execution graphs from source code written by one or more human programmers. This is a very deterministic and relatively easy to debug world, where even primitives such as random number utilities are pseudorandom and can be controlled deterministically via seeds.
Agents, on the other hand, use an AI model such as an LLM to decide the control flow of an application. The extent that the control flow of a program is determined ahead of time by a human programmer versus being dynamically determined by an LLM at runtime is the deciding factor in the level of autonomy of an AI agent.
 

Why is this important?

Reliability and generality are the two key factors holding back AI agents from more widespread adoption. By understanding the agentic spectrum, AI engineers and agentic authors can make better decisions on how to create more reliable agents today.
A question I ask myself a lot when thinking about different L5 agent demos is how these use cases could be reframed from the perspective of less autonomous yet more reliable L2/L3 agents.
 

Spectrum

  • L1 agents are traditional computer programs and deterministic workflows.
    • The execution graph is controlled by static code which is most often written by a human programmer.
    • L1 agents may invoke LLMs or other AI models, but they do not rely on these models to determine the program’s control flow.
    • L1 agents may be autonomous in that they don’t require user interaction at runtime (like a background workflow script), but they are not autonomous from the perspective of the human programmers who created them. E.g., the human programmer is still driving the bus.
  • L2 agents use LLMs selectively to decide how to handle key points in the program’s control flow.
    • Today, this often boils down to deciding which tool to invoke based on a set of tools which have been carefully curated by a human programmer.
    • The most common example of L2 agents today is invoking an LLM with access to tools in a while loop.
    • The majority of the program’s control flow still resides outside of the LLM’s purview and is controlled by a human programmer.
  • L3 agents are defined by an execution graph that is constructed statically by an expert human programmer which is handed off to an LLM to determine how to traverse this graph at runtime.
    • The defining characteristic of L3 agents is that they tend to be hierarchical or recursive in nature, but the entire execution graph has been carefully crafted by a human programmer ahead of time.
    • L3 agents are commonly composed of L1/L2 agents with some higher-order agent orchestrating the execution of these sub-agents.
  • L4 agents add some dynamic actions to the execution graph which are not known a priori by the human programmer.
    • The majority of the control flow may still controlled by code written by a human programmer, but for the first time, L4 agents introduce the ability for an LLM to create novel, dynamic actions at runtime.
    • If an agent creates and executes dynamic code at runtime, it is at least an L4 agent.
  • L5 agents are fully autonomous programs where the entire execution graph is constructed and traversed on-the-fly.
 

Insights

  • You can reduce L4/L5 agents to the dynamic construction of an execution graph plus the ability to execute that graph reliably – e.g., L3 agents.
  • Reliable, general L5 agents are functionally equivalent to AGI.
  • The same spectrum can be applied to embodied agents aka robots.
 

Further Reading