Natural language processing (NLP) is already all around us. Every time your smartphone keyboard corrects a typo, every time you hit “translate” on a tweet or a post, every time you ask Siri for the weather – that’s NLP in action. But the true potential of NLP lies not in consumer tech but in big data and artificial intelligence.

To create meaningful AI you need data, and lots of it. Fortunately, humans are creating more and more data every day – by one widely cited estimate, 90% of the data in existence today was created in the last two years alone. The problem? Much of that data is a mess: unstructured snippets of text, audio and video in countless languages and formats that only humans can readily make sense of.

Enter NLP.

“We need to have a robust way to take this data in its raw form and translate it into a structured representation that computers can understand,” says Regina Barzilay, Delta Electronics Professor of Electrical Engineering and Computer Science at MIT. Only then can we unlock the true power of big data.

Barzilay heads a research group at MIT’s Computer Science and Artificial Intelligence Laboratory and is considered a leading expert in the NLP world; this year she won a MacArthur Fellowship (often referred to as the “genius grant”) for her work in the field. “I got into this field by chance,” she explains. “I started in ’96, ’97, when [modern NLP techniques] simply didn’t exist. It was uncharted territory.”

Barzilay recently made waves with her research on cancer drug prescription practices. A breast cancer survivor herself, she says she was struck by how little of the available data the industry actually uses.

“In the U.S., all the decisions as to whether or not you should get a certain drug are essentially based on clinical trials,” she explains. “The problem is, only 3% of patients participate in these trials – the data of the other 97% simply isn’t used.” The remaining data is often locked away in prescriptions and medical records that don’t follow any consistent format.

“In order to personalize treatments effectively, you need to see way more than 3%,” she continues. “You need to be able to continuously monitor the data over years to see how it changes, rather than just say ‘this clinical trial is done, it’s over, here are the recommendations.’”

This is where Barzilay’s NLP work comes into play. “How can you take this raw data, which you cannot really use directly, and translate it into a structured database so that machines can then identify important correlations?”

Barzilay has applied a similar approach to mammogram data, using machine learning to predict breast cancer risk with a high degree of accuracy and thereby reduce the need for unnecessary surgeries.

However, she says, a major problem with applying complex, non-linear models to large amounts of data is that while the machines are learning, humans are not. The neural models that generate solutions to such problems give no rationale for their predictions, leaving doctors, researchers and clients scratching their heads over exactly which piece of data drove a given decision.

“As these models become part of our daily lives, it is not enough just to say, ‘Here is the prediction.’ We really need to understand the ‘why’ aspect,” she explains. “We do a lot of work on how to make this process transparent.”

Barzilay’s work isn’t limited to the medical field. Her team has worked with the Rakuten Institute of Technology in Boston on methods to automatically gather and categorize product information from e-commerce websites, and has also explored NLP applications for the machine analysis of product reviews. Machine translation has been another focus for Barzilay, who co-developed a computer system that automatically deciphered much of the ancient language Ugaritic.

But across Barzilay’s research, one theme remains constant: how do we take the enormous amounts of data that humans produce and make it available for machines to learn from?

“The way in which we define whether someone is intelligent is not just if they remember a lot,” she explains. “What makes somebody intelligent is their ability to learn new things … their ability to capture new things on the fly and connect it to previous knowledge.”

Thanks to Barzilay and her team, NLP may well become the key that unlocks true artificial intelligence.


Read more posts from the Rakuten Technology Conference here.