Interview: What is meaning-based computing?
Mike Lynch of Autonomy argues that firms must take steps to organise their unstructured data
Lynch: Meaning-based computing will improve regulatory compliance
IT Week: Autonomy has started to talk about itself as a provider of meaning-based computing [MBC]. As its chief executive, how do you define this concept?
Mike Lynch: Meaning-based computing is the ability for a machine to act on the basis of what something means – something being text like email and documents, but also PDFs, voice over IP and other types of content. The reason it is important is that information is broken into two distinct groups: structured stuff that goes in relational databases and all the unstructured stuff, which, while it is exploding in terms of usage, doesn’t fit into IT infrastructures very well.
Why is unstructured data important to organisations?
More and more of the information in companies – estimated to be 85 percent – is unstructured and it is often the most interesting information... [Accessing this information] gives you the opportunity to better leverage your information assets. But there is also a really big negative in the form of compliance problems caused by unstructured data and the need, in the case of litigation, to find all this data.
How can MBC systems help?
The problem with unstructured data is computers haven’t been able to do anything with it apart from move it around. The way we’ve got used to working with it is that we retrieve it, which is why people like search, then a human being looks at it and does the work. In contrast, with structured information the point of IT is to automate, so if you are a bank, the database with the account information spots if someone goes overdrawn and then sends them a letter, with no human being involved in the process at all. The aim of MBC is to enable companies to do a similar thing with unstructured information.
What types of technology fall under the banner of MBC?
It is a broad church. There are lots of methods that go into MBC, from speech recognition to text understanding, but the whole point is to produce platforms that can go that step further. This is a big change [in the way IT works] as it is really moving the data back to what humans want. We started with human data, then IT came along and we took all the rich information and boiled it down to database tables. Now we are going back and computers are catching up with what humans can do. We are just at the beginning of this movement, but in a few years' time you will see unstructured information used and processed all over the place.
How will this happen?
Obviously [MBC] is currently dominated by enterprise search, where [Autonomy is] very strong and the biggest and fastest-growing company in the market. But the big story here is about doing more than search... A lot of it is about the core IT benefit of automating tasks that would have been done manually. For example, lots of companies employ hundreds of people who read incoming emails and work out where to forward them to be answered. With an MBC system that understands those emails you can get rid of 90 percent of those people and still get the same accuracy when routing them.
That is quite a specific example, but if MBC is to challenge keyword search as a means of accessing unstructured data it will need to reach a wider number of users. How will it do that?
One technology that is quite radical is implicit query. This means that rather than stopping your work, going to a search engine and making up a query, the system can read what is on your screen at any time, be it an email or a web page or whatever, and if you press one key it understands what is on the screen and brings you related information. That's a nice example of how I think search will look very different as MBC evolves.
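As a rough illustration of the "press one key" idea, the sketch below treats whatever text is on screen as the query and ranks a small document collection by bag-of-words cosine similarity. This is a deliberately simple, swapped-in technique, not Autonomy's method; the function names and sample documents are hypothetical.

```python
import math
import re
from collections import Counter


def term_vector(text: str) -> Counter:
    """Lower-case word counts: a crude stand-in for real text understanding."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors (0.0 if either is empty)."""
    dot = sum(count * b[term] for term, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def implicit_query(screen_text: str, documents: dict[str, str], top_k: int = 3) -> list[str]:
    """Use the on-screen text itself as the query; return titles of the most similar documents."""
    query = term_vector(screen_text)
    scored = [(cosine(query, term_vector(body)), title) for title, body in documents.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [title for score, title in scored[:top_k] if score > 0]


docs = {
    "Q3 supplier contract": "supplier contract renewal terms for the third quarter",
    "Office party": "the office party is on friday evening",
    "Supplier dispute notes": "notes on the dispute with our supplier over contract terms",
}
print(implicit_query("Can we renegotiate the supplier contract terms?", docs, top_k=2))
```

The user never types a query: the email they are reading is the query, which is the distinction Lynch draws from conventional search.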
So you would have search available to users at any time?
Absolutely. A related technology is hyperlinking: just as a newspaper's web story links to similar stories, the system provides links to related internal and external information based on what you are working on. So you call up an email and there are links to everything the business has on the subject discussed in it. Again, that is not straight search, but it is one of the most useful ways of getting to information inside an enterprise. Other new technologies that will change the way users work are tools such as smart or active folders. They look like normal folders, but they do the filing themselves. For example, if you set up a folder on Autonomy, all documents about the company will be filed there automatically.
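A smart folder can be pictured as a standing query: the folder carries a profile of its subject and files any incoming document that resembles it. The sketch below assumes a simple word-overlap (cosine) similarity and a hypothetical threshold of 0.2; the `ActiveFolder` class and its methods are illustrative names, not a product API.

```python
import math
import re
from collections import Counter


def vec(text: str) -> Counter:
    """Lower-case word counts for a piece of text."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity of two word-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class ActiveFolder:
    """Hypothetical folder defined by a profile text; it files documents that resemble the profile."""

    def __init__(self, name: str, profile: str, threshold: float = 0.2):
        self.name = name
        self.profile = vec(profile)
        self.threshold = threshold
        self.contents: list[str] = []

    def offer(self, doc_id: str, text: str) -> bool:
        """Inspect a new document and file it here if it is similar enough to the profile."""
        if similarity(self.profile, vec(text)) >= self.threshold:
            self.contents.append(doc_id)
            return True
        return False


folder = ActiveFolder("Autonomy", "autonomy enterprise search software company")
folder.offer("a", "Autonomy reports growth in enterprise search")
folder.offer("b", "Lunch menu for the canteen this week")
print(folder.contents)  # ['a']
```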
MBC systems can’t guarantee they will always understand the meaning of content. How do they cope with false positives and negatives?
Part of the secret is not to be too ambitious. Coming back to the email allocation example, you do not get rid of 100 percent of the people as there are some emails that are too difficult for the computer to understand because they have things like sarcasm in them.
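The "don't be too ambitious" approach to the email-routing example can be sketched as confidence-based routing: the system forwards an email automatically only when its best guess is clear enough, and hands everything else to a person. The departments, cue words, scoring rule and 0.6 threshold below are all hypothetical; a real MBC system would use a statistical model of meaning rather than hand-written cue words.

```python
import re

# Hypothetical departments and cue words: a stand-in for a trained model.
DEPARTMENT_CUES = {
    "billing": {"invoice", "refund", "charge", "payment"},
    "technical": {"error", "crash", "login", "password"},
    "sales": {"quote", "pricing", "upgrade", "order"},
}


def route_email(body: str, threshold: float = 0.6) -> str:
    """Return a department name, or 'human_review' when the evidence is weak or ambiguous."""
    words = set(re.findall(r"[a-z]+", body.lower()))
    hits = {dept: len(words & cues) for dept, cues in DEPARTMENT_CUES.items()}
    total = sum(hits.values())
    if total == 0:
        return "human_review"  # nothing recognisable: let a person decide
    best = max(hits, key=hits.get)
    confidence = hits[best] / total  # share of the evidence pointing at the best department
    return best if confidence >= threshold else "human_review"


print(route_email("I was charged twice, please refund my payment"))  # billing
print(route_email("Great, just great. Another invoice error."))      # human_review
```

Raising the threshold sends more mail to people and fewer to the wrong department; the trade-off is the one Lynch describes when he says you do not get rid of 100 percent of the staff.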
But some people will still be concerned about false readings undermining the value of the system. How do you tackle that?
It is a perception issue. The thing is, people assume that when people do the task no email ever gets sent to the wrong department, which is not the case. The issue is to make sure [the MBC system] is as accurate as a human being… [Both MBC] technology and people are going to make mistakes, but the question is which will make more mistakes. A human being using a keyword technology will often miss a lot more than MBC systems.
How can this technology help firms comply with regulations?
At the moment [compliance initiatives handling unstructured data] look for keywords. For example, if you are an investment bank handling Ford you set up a keyword tracker to pick up any documents with the word Ford, because no one except the approved analyst is allowed to say anything about the company for fear of an SEC investigation. The trouble is that if a secretary writes an email saying "come out of the airport and look for a blue Ford", that will fire off an alert and you've got a false positive. An MBC system understands that this particular email is not relevant and that, even though it mentions a Ford, it has nothing to do with stocks and shares, while one that says "our analyst is about to downgrade Ford, better sell" is highly relevant. That is one of the big differentiators MBC has against keyword search and it makes it a very powerful compliance tool.
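The Ford example can be made concrete. The first function below is the plain keyword tracker described in the interview; the second is a deliberately crude, hypothetical stand-in for context awareness that raises an alert only when the watched name appears within a few words of financial vocabulary. The `FINANCE_TERMS` list and the five-word window are assumptions for illustration; real MBC systems model meaning statistically rather than with a hand-written word list.

```python
import re

WATCHED = "ford"
# Hypothetical words suggesting a message is about securities.
FINANCE_TERMS = {"analyst", "downgrade", "upgrade", "sell", "buy", "shares", "stock", "earnings"}


def keyword_alert(message: str) -> bool:
    """Keyword tracker: fire whenever the watched word appears at all."""
    return WATCHED in re.findall(r"[a-z]+", message.lower())


def context_alert(message: str, window: int = 5) -> bool:
    """Fire only if a finance term appears within `window` words of the watched word."""
    words = re.findall(r"[a-z]+", message.lower())
    for i, word in enumerate(words):
        if word == WATCHED:
            nearby = words[max(0, i - window): i + window + 1]
            if FINANCE_TERMS & set(nearby):
                return True
    return False


airport = "Come out of the airport and look for a blue Ford"
tip = "Our analyst is about to downgrade Ford, better sell"
print(keyword_alert(airport), context_alert(airport))  # True False
print(keyword_alert(tip), context_alert(tip))          # True True
```

The keyword tracker flags both messages; only the context check separates the car from the stock.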
About Mike Lynch
Dr Mike Lynch OBE is founder and chief executive of content management and enterprise search specialist Autonomy.
Since its founding in 1996, Autonomy has grown into a global business, and last year it completed the $500m acquisition of former competitor Verity.
Lynch has received the Electrical Engineer's medal for outstanding achievement and the Confederation of British Industry's Entrepreneur of the Year award.