Agentic AI is already starting to plan holidays, what will it do next?

UST's chief AI architect on the emergence of autonomous AI in business

One of the less-remarked-upon impacts of GenAI is the way it has abstracted away the input process, at a stroke lowering the barrier to entry for advanced technology.

"Natural language has become the lingua franca for interacting with technology," said Dr Adnan Masood, chief AI architect at business transformation company UST.

"It's not programming languages like C# and C++, it's prompts."

Nowadays even the most technically challenged or unimaginative of users can produce sophisticated text, images and video using a keyboard, or even their voice.

So why stop there? The next step is for the LLM to start giving instructions to other applications or agents to act on the user's behalf. This requires the model to properly comprehend the instructions.

"People think that progress in generative AI is all about generation, but it's not. It's also about comprehension," Masood told Computing.

Comprehension should not be confused with intelligence ("they are still stochastic parrots, just predicting the next token"), but improvements in the way models process input does open up a new set of possibilities, including agentic AI.

Agentic AI systems go beyond simply responding to inputs. They have a degree of independence and are capable of taking actions in the physical or virtual world. This requires interpreting and comprehending goals, planning a course of action, and executing tasks.

"When you have agents combined with large language models, you can start applying complex goals through natural language and take autonomous actions based on that to achieve those goals," said Masood.

This brings us to large action models (LAMs), which can translate human intentions into action. LAMs sit between the LLM and the agents, "autonomously handling complex tasks with end-to-end planning, and triggering actions based on that."
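To make that idea concrete, here is a minimal sketch of where such a layer might sit. The `plan_actions` planner and the toy agents are hypothetical stand-ins for real components, not any vendor's API: a natural-language goal is decomposed into structured steps, and each step is routed to an agent.

```python
# Minimal sketch of a LAM-style orchestration layer. The planner and
# agents below are hypothetical stand-ins, not a real product API.
from typing import Callable

# Agents the LAM can delegate to, keyed by capability.
AGENTS: dict[str, Callable[[dict], str]] = {
    "search_flights": lambda step: f"found flights to {step['destination']}",
    "book_hotel": lambda step: f"held a room in {step['destination']}",
}

def plan_actions(goal: str) -> list[dict]:
    """Stand-in for the LAM's planner: in practice an LLM would
    translate the natural-language goal into these structured steps."""
    return [
        {"capability": "search_flights", "destination": "Lisbon"},
        {"capability": "book_hotel", "destination": "Lisbon"},
    ]

def run(goal: str) -> None:
    for step in plan_actions(goal):         # end-to-end plan first...
        agent = AGENTS[step["capability"]]  # ...then each step goes to an agent
        print(agent(step))

run("Plan me a long weekend in Lisbon")
```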

But it's one thing to have a recommender select the next song on a Spotify playlist, quite another for a medical device to take decisions autonomously, or for a financial bot to perform a transaction or manage an investment portfolio on behalf of its owner. The risks are clear.

As we are all aware, LLMs make things up. This is a feature, not a bug: it's a function of the creativity that LLMs require to produce new content rather than regurgitating chunks of text. However, hallucinations (Masood dislikes the term because it implies sentience) can be mitigated by guardrails using a range of techniques, such as citation and attribution, and ad hoc and post hoc processing.
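One common shape for that kind of post hoc processing is a check run on an answer before it is released. The sketch below is illustrative only; `extract_claims` and the `SOURCES` set are hypothetical placeholders for what would be a far richer attribution pipeline.

```python
# Sketch of a post-hoc guardrail: only release an answer if every claim
# in it can be attributed to a known source. `extract_claims` and
# SOURCES are hypothetical placeholders.
SOURCES = {"Flights to Lisbon take about three hours from London."}

def extract_claims(answer: str) -> list[str]:
    """Stand-in for a claim extractor (often another model call)."""
    return [s.strip() + "." for s in answer.split(".") if s.strip()]

def release_or_flag(answer: str) -> str:
    unsupported = [c for c in extract_claims(answer) if c not in SOURCES]
    if unsupported:
        return f"Flagged for review, no citation for: {unsupported}"
    return answer  # every claim attributable, safe to release

print(release_or_flag("Flights to Lisbon take about three hours from London."))
print(release_or_flag("Lisbon has a space elevator."))
```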

Another dimension

But for LAMs we need protections in another dimension. Instead of checking that the input is permissible and the output text or image is appropriate for the input, developers need to put limits on the range of possible actions.

"LAMs are powerful because they can act and build actions on the fly," Masood said. "So we can say, 'these are the actions you can perform, and you have already pre-thought of those actions'."

The big challenge is how to permit enough flexibility of input and input modality (text, video, audio, etc.) to allow the LAM to be creative in its response, while ensuring it is not prompted to act harmfully in the context in which it is being used.

One approach is to classify actions based on impact, with guardrails built around a risk matrix of authorised and unauthorised actions. For high-risk actions there should always be a "human in the loop" to press the OK button, while low-risk actions may be performed autonomously.
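A minimal sketch of that gating logic, assuming an illustrative risk matrix and toy action names rather than any production taxonomy, might look like this:

```python
# Sketch of impact-based gating: high-risk actions wait for human
# approval, low-risk actions run autonomously. The risk matrix and
# action names are illustrative, not a production taxonomy.
RISK_MATRIX = {
    "suggest_song": "low",
    "build_itinerary": "low",
    "execute_trade": "high",
    "deny_claim": "high",
}

def execute(action: str) -> None:
    risk = RISK_MATRIX.get(action, "high")  # unknown actions default to high
    if risk == "high":
        # Human in the loop: a reviewer presses the OK button.
        if input(f"Approve '{action}'? [y/N] ").lower() != "y":
            print(f"'{action}' blocked by reviewer")
            return
    print(f"'{action}' executed ({risk} risk)")

execute("build_itinerary")  # runs autonomously
execute("execute_trade")    # pauses for approval
```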

Another approach is to have the LAM refer to a repository of allowed actions, analogous to the use of retrieval augmented generation (RAG) to restrict and contextualise an LLM's response by forcing it to refer to a specific set of documents.
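Sketched in code, with an illustrative repository, the idea is that the model may only select from actions retrieved for its current context:

```python
# Sketch of an allowed-action repository: the model may only pick from
# actions retrieved for the current context, much as RAG restricts an
# answer to a retrieved set of documents. Contents are illustrative.
ACTION_REPOSITORY = {
    "travel": ["search_flights", "book_hotel", "build_itinerary"],
    "hiring": ["check_github_profile", "check_linkedin_profile"],
}

def choose_action(context: str, proposed: str) -> str:
    allowed = ACTION_REPOSITORY.get(context, [])
    if proposed not in allowed:
        raise PermissionError(
            f"'{proposed}' is not in the approved set for '{context}'")
    return proposed

print(choose_action("travel", "book_hotel"))  # permitted
try:
    choose_action("travel", "execute_trade")  # never retrieved, so refused
except PermissionError as err:
    print(err)
```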

And models can also be fine-tuned using techniques like chain-of-thought reasoning, a step-by-step process that seeks to predict what the impact of an action will be. "If that action is found to impact high-risk areas such as health or finance it will classify it as an action that cannot be performed."
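A toy version of that screening step might look like the following, with `llm_complete` a hypothetical placeholder for a call to a fine-tuned model:

```python
# Sketch of chain-of-thought impact screening: ask the model to reason
# step by step about an action's downstream impact, then block it when
# the predicted impact touches a high-risk domain. `llm_complete` is a
# hypothetical stand-in for a call to a fine-tuned model.
HIGH_RISK_DOMAINS = {"health", "finance"}

def llm_complete(prompt: str) -> str:
    """Placeholder for the model's step-by-step reasoning."""
    return "Step 1: moves the user's money. Step 2: affected domain: finance."

def screen_action(action: str) -> bool:
    reasoning = llm_complete(
        f"Think step by step: what is the impact of '{action}' "
        f"and which domain does it affect?")
    impacted = {d for d in HIGH_RISK_DOMAINS if d in reasoning}
    if impacted:
        print(f"'{action}' blocked: predicted impact on {sorted(impacted)}")
        return False
    return True

screen_action("move savings into an index fund")
```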

A high bar

Thus far, LAMs and agentic AI are largely confined to R&D, out of the public eye, and with the human firmly in the loop.

"Everyone's worried about having the second shoe drop," said Masood. "Any failure of security or reliability is going to cause a big uproar. There are so many bad news stories about AI in the media. The people I work with in finance and healthcare are very cautious, they want to keep it in pilot phase."

However, in less contentious areas such as holiday planning, things are already becoming commercially viable, according to Masood, with agents exploring the search space to create itineraries and packages. LAMs can also inspect an applicant's CV to identify required verifications, then send agents off to check GitHub or LinkedIn profiles, for example, and send out emails for reference checks, all without human intervention.
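That hiring workflow could be sketched roughly as follows; the extraction rules and checker agents here are hypothetical stand-ins for what a real system would do:

```python
# Sketch of the CV-screening workflow described above: identify which
# verifications a CV calls for, then hand each check to its own agent.
# The extraction rules and checker agents are hypothetical stand-ins.
def required_verifications(cv_text: str) -> list[str]:
    """Stand-in for the LAM's analysis of the CV."""
    checks = []
    if "github.com" in cv_text:
        checks.append("github")
    if "linkedin.com" in cv_text:
        checks.append("linkedin")
    checks.append("references")  # always follow up references
    return checks

VERIFIERS = {
    "github": lambda: "GitHub profile checked",
    "linkedin": lambda: "LinkedIn profile checked",
    "references": lambda: "reference-check emails sent",
}

for check in required_verifications("... github.com/example ..."):
    print(VERIFIERS[check]())  # each verification dispatched to an agent
```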

Masood foresees valuable use cases for manufacturing, retail, finance - and especially healthcare: "Including getting claims data information from multiple different places and taking autonomous action, but the bars are really high for actionability."

The new EU AI Act has a lot to say about automated decision making being subject to human oversight, and this and other regulations will limit the range of actions that AI can permissibly undertake, such as denying someone a claim or a loan.

"For approvals, yes; for denials, no, that's a huge bar."