NLP AI: Back to the Future of CLM

By Rick Ralston – CEO of Contract Logix

It was as easy as 1, 2, 3. You picked a category, typed in your keyword, and hit search. Miraculously, product pricing and availability would appear on the screen right before your eyes – all made possible by natural language processing (NLP).

NLP has come a long way from old-school rules-based data extraction. Back in the late 1900s at BottomDollar, we extracted data in real time from online merchants to deliver aggregated prices, reviews, shipping costs, and product availability to the viewer. Okay, it was 1999, but it is fun to say “back in the late 1900s.”

It was all done with NLP, but as I said, it was rules-based. For example, can you tell the difference between a social security number and a phone number? Of course you can because the format is xxx-xx-xxxx vs. xxx-xxx-xxxx. There are two digits in the middle, vs. three. How about a credit card number? Is it xxxx-xxxx-xxxx-xxxx or could it be xxxx-xxxxxx-xxxxx? Well, it depends on the card type. It’s starting to get a little more difficult, right?
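To make the format rules above concrete, here is a minimal sketch in Python. The rule names and sample values are made up for illustration, and the patterns only check the shapes quoted above, not whether a number is actually valid.

```python
import re

# Hand-written format rules, one per data type. The patterns mirror the
# formats quoted in the text; the rule names are hypothetical.
PATTERNS = {
    "ssn": re.compile(r"\d{3}-\d{2}-\d{4}"),
    "phone": re.compile(r"\d{3}-\d{3}-\d{4}"),
    "card_4_4_4_4": re.compile(r"\d{4}-\d{4}-\d{4}-\d{4}"),
    "card_4_6_5": re.compile(r"\d{4}-\d{6}-\d{5}"),
}


def classify(value: str) -> str:
    """Return the name of the first rule whose format matches the whole string."""
    for name, pattern in PATTERNS.items():
        if pattern.fullmatch(value):
            return name
    return "unknown"


# Made-up sample values, one per rule plus one that matches nothing.
for sample in ["123-45-6789", "555-867-5309", "1234-567890-12345", "12-34"]:
    print(sample, "->", classify(sample))
```

Every new format means another hand-written pattern, which is exactly where the difficulty starts.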

Typical tracking numbers for major shipping carriers vary widely, too. FedEx tracking numbers are 12-14 digits, while UPS can total 18 characters. Building the rules for this type of analysis can become complex with varied calculations and rules nested within rules. In addition, a human manually codes the rules and decision trees using imperative programming so that the software knows what data to extract. It’s being told how to do something, not what we want it to do.
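As a rough illustration of rules nested within rules, the sketch below guesses a carrier from a tracking number. The function name and return labels are hypothetical, the length rules come from the paragraph above, and the “1Z” prefix check is an added assumption about a common UPS format rather than a complete carrier specification.

```python
def guess_carrier(tracking: str) -> str:
    """Guess a carrier from length and character class alone (illustrative only)."""
    code = tracking.replace(" ", "").upper()
    if code.isascii() and code.isdigit():
        # Rule from the text: FedEx tracking numbers are 12-14 digits.
        if 12 <= len(code) <= 14:
            return "fedex"
        return "unknown"
    if code.isascii() and code.isalnum():
        # Rule from the text: UPS tracking numbers can total 18 characters.
        if len(code) == 18:
            # Nested rule (assumption): many UPS numbers begin with "1Z".
            if code.startswith("1Z"):
                return "ups"
            return "possibly ups"
        return "unknown"
    return "unknown"


print(guess_carrier("1234 5678 9012"))      # made-up 12-digit number
print(guess_carrier("1Z999AA10123456784"))  # made-up 18-character number
```

Notice that the code spells out how to decide, branch by branch. Each new carrier or format change means another branch for a person to write and maintain.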

The NLP Evolution to AI

Converting words into decisions started off with this simple rules-based NLP. But, in today’s world, it’s hard to sort out the simple from the complex and the truth from fiction. Rules-based NLP now requires machine learning (ML) and ultimately deep learning (DL) to be effective.

And, what about the broader context of Artificial Intelligence (AI)? Is it too complex to be useful? Is it too scary?

Well, we all use it now. Speaking into a home device, “Hey Google,” has become natural. The software is processing our natural language. Just the other day, I was watching a demo of a text-to-type app. It was super-fast and incredibly accurate. It also gave me pause when it played back something in the tester’s own voice that had been typed, not dictated. The technology was both impressive and creepy at the same time, which brings me back to the beginning, back to 1999.

Long before the acronyms ML, DL, and AI became commonplace, there was rules-based NLP. It was not a subset of anything, and nothing was a subset of it. It was a lonely planet.

Then, data analysts became data scientists and the NLP started doing some very cool things. It became NLP AI with the help of ML and DL.

Figure 1: Evolution from Rules-Based NLP to NLP AI

From the outside looking in, we weren’t sure what it was. We could understand how to extract data, like back at BottomDollar. We were learning how consistent data would give us consistent results. We even started learning how to interpret inconsistent data (i.e. ML). We thought with enough data, we could learn anything. We thought we could identify what people would buy before they bought it and help stock shelves in grocery stores. We thought we could predict the stock market. What couldn’t we learn? What couldn’t this technology do (i.e. DL)?

The AI solar system was starting to take shape, and something bigger was glowing in the background, pulling the framework all together.

Figure 2: NLP AI as Subset of AI

The shapes keep changing in size and in how they relate to one another, but is it useful? Can we dissect it all and make it actionable? What about in the field of contract lifecycle management (CLM)?

Absolutely! Let’s start with something simple like data extraction, which can save anyone involved in contract management a lot of time. Let’s say you have a PDF of a contract executed on third-party paper or an existing executed contract that you want to manage. Instead of manually typing in all the contract metadata like company names and addresses, effective and expiration dates, and signatory names, the CLM software does it for you using NLP AI.

In the past, this was done using techniques like rules and regular expressions, but it was very inflexible and had to be constantly updated to adapt to changes in contracts. Today, NLP AI, which is a subset of AI, “reads” a contract and uses a technique called named entity recognition (NER) to identify the parties involved and extract metadata from the contract without having rules or inflexible pre-defined text searches. It focuses on what data should be extracted from a contract vs. how to extract it.

How NLP AI Works

At Contract Logix, our data extraction, made possible by NLP AI, begins with document processing and optical character recognition (OCR) technologies to be able to read both digital files and scanned images. We call this the “Capture” phase of AI in CLM. Using OCR, all contract data and language such as effective and expiration dates, organization names, terms, signatories, values, etc. is automatically indexed and searchable within the software without requiring the user to input manual keyword tags.

Once the documents are converted into text, the NLP AI will “read” the document and begin extracting the key information. We call this the “Predict & Extract” phase of AI in CLM because the NLP AI is predicting what data to extract from the contract. The first step is done using a processing pipeline, which tokenizes the text, tags the tokens with parts of speech, parses the sentences, identifies the entities within the document, and then finds the custom attributes associated with them.
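The staged shape of such a pipeline can be sketched in a few lines of Python. This is a toy, not our production code: every stage below is a crude hand-written stand-in for what a trained statistical model does, the parsing and custom-attribute stages are left out, and the names (Token, COMPANY_SUFFIXES, the tag labels, the sample sentence) are all hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass
class Token:
    text: str
    tag: str = ""  # coarse label filled in by tag()


# Hypothetical suffix list; a trained model would learn such cues from data.
COMPANY_SUFFIXES = {"Inc", "LLC", "Corp", "Ltd"}


def tokenize(text: str) -> list[Token]:
    """Stage 1: split text into alphanumeric tokens and single punctuation marks."""
    return [Token(t) for t in re.findall(r"[A-Za-z0-9]+|[^\sA-Za-z0-9]", text)]


def tag(tokens: list[Token]) -> list[Token]:
    """Stage 2: assign a crude tag based only on how each token is written."""
    for tok in tokens:
        if tok.text[0].isupper():
            tok.tag = "PROPN"  # capitalized, so treated as a proper noun
        elif tok.text.isdigit():
            tok.tag = "NUM"
        elif tok.text.isalpha():
            tok.tag = "WORD"
        else:
            tok.tag = "X"  # punctuation and anything else
    return tokens


def find_entities(tokens: list[Token]) -> list[tuple[str, str]]:
    """Stage 3: group runs of PROPN tokens; label a run ORG if it ends in a suffix."""
    entities, run = [], []
    for tok in tokens + [Token("", "END")]:  # sentinel flushes the final run
        if tok.tag == "PROPN":
            run.append(tok.text)
            continue
        if run:
            label = "ORG" if run[-1] in COMPANY_SUFFIXES else "OTHER"
            entities.append((" ".join(run), label))
            run = []
    return entities


sentence = "This Agreement is between Acme Widgets Inc. and Globex LLC."
print(find_entities(tag(tokenize(sentence))))
```

Each stage consumes the previous stage’s output, which is the point of the sketch. A real pipeline keeps that shape but replaces the hand-written checks with learned models.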

For our NLP AI to do this, it must first be trained on a large corpus of business contracts, which allows it to build statistical and weighted neural network models. These models allow it to understand the difference between a word like Amazon being used as a reference to a rainforest vs. being used as a company name, and to determine what metadata is associated with it (e.g. its address and whether it is an internal party or external party to the contract). These AI predictions are then validated by the customer in an extraction training and validation user interface, which eliminates any errors that might arise when new contracts are imported into the CLM software.

This may all sound complicated, but the result is an incredibly accurate, fast, and efficient solution for contract management professionals to get contracts entered into CLM software without the need for manual data entry.


Data extraction and NLP have come a very long way since 1999. Rules-based NLP has evolved into a subset of AI with the ability to capture, predict, and extract complex data in an extremely accurate fashion thanks to technology like ML. As in-house legal, procurement, sales, and other contract management-related functions continue to digitize the way they handle legal agreements using CLM software, one thing is certain: NLP AI is their friend and is here to stay. The question remains: what’s the next phase for AI in CLM? Stay tuned for our answer to that.

Editor’s note: This article was originally published on LinkedIn.