The US is a cultural and linguistic melting pot. How you describe your soft drink says a lot about who you are and where you’re from. So what is it: Pop, Soda or Coke?
The site popvssoda.com, which Alan McConchie describes as his 15 minutes of fame, contains an interactive data visualisation exploring the different ways soft drinks are described across the US. Vox described it as the ‘great American soft drink debate’.
So what did Alan’s visualisation show?
Along the north-eastern and south-western coasts, Soda is the winner. Move towards the Gulf of Mexico, and Coke is on top. Make your way through the Midwest, and you’ll be ordering a Pop.
This simple visualisation perfectly illustrates how individuals and groups can express the same idea in a variety of different ways.
It also serves to highlight the unique challenge that such inconsistencies present to data scientists and software engineers.
What does that mean for organisations sifting through large volumes of documents to identify meaningful insights?
Take the great American soft drink debate.
Because of regional linguistic variation, a keyword search for ‘Soda’ wouldn’t necessarily surface documents that say ‘Pop’ or ‘Coke’, even though all three terms refer to the same thing. How do you connect these ideas and link them together?
Understanding intent and meaning
Semantic search allows users to explore text data using natural language. The technique can be applied to any group of digitised documents – internal records, meeting minutes, customer correspondence, news articles, scientific reports.
Our semantic search model takes the search query and converts it to a numeric representation that captures its meaning and context. The search engine then finds the documents in the dataset with a similar numeric representation and, by extension, a similar meaning.
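To make the idea concrete, here is a minimal sketch of ranking documents by the similarity of their numeric representations. It is an illustration only, not our production model: the three-dimensional vectors in TOY_EMBEDDINGS are hand-written assumptions (a real system would obtain them from a trained language model), and the names embed, cosine and search are hypothetical helpers.

```python
import math

# Hypothetical, hand-written vectors for illustration only.
# A real system would get these from a trained language model.
TOY_EMBEDDINGS = {
    "soda":    [0.90, 0.10, 0.05],
    "pop":     [0.85, 0.15, 0.10],
    "coke":    [0.88, 0.05, 0.12],
    "invoice": [0.05, 0.90, 0.20],
    "weather": [0.10, 0.15, 0.95],
}


def embed(text):
    """Average the vectors of the known words in the text."""
    vectors = [TOY_EMBEDDINGS[w] for w in text.lower().split() if w in TOY_EMBEDDINGS]
    if not vectors:
        return [0.0, 0.0, 0.0]
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(query, documents, top_k=2):
    """Return the top_k documents whose vectors are closest to the query's."""
    query_vector = embed(query)
    ranked = sorted(documents, key=lambda d: cosine(query_vector, embed(d)), reverse=True)
    return ranked[:top_k]


documents = ["a cold pop", "ordered a coke", "unpaid invoice", "weather forecast"]
results = search("soda", documents)
```

With these toy vectors, a search for ‘soda’ ranks the ‘pop’ and ‘coke’ documents highest even though neither contains the word ‘soda’, because their vectors point in nearly the same direction as the query’s.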
Keyword v Semantic Search
For a recent project, we harvested hundreds of thousands of news articles about risks to the food supply chain. A typical search query for this dataset would be ‘fraud’.
Traditional keyword search would return results like these:
Search Query: Fraud
1. …reported fraud in connection to a…
2. …there have been some frauds recorded…
3. …to be treated as fraudulent or deceptive…
For the same query, our Semantic Search engine would return results like these:
Search Query: Fraud
1. …reported fraud in connection to a…
2. …was known to be using false identities to facilitate…
3. …conspiracy in connection to a scheme to conceal…
Semantic Search goes beyond keywords to the idea behind the query and its context. Results 2 and 3 above never mention the word ‘fraud’, yet both describe fraudulent activity, so they surface anyway. A keyword search would have missed them.
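The limitation of keyword matching can be shown in a few lines. The sketch below runs a simple case-insensitive substring match over the example snippets quoted above. The function name keyword_search is a hypothetical helper, and real keyword engines typically add tokenisation and stemming, so treat this as an approximation of the behaviour rather than a description of any particular search product.

```python
# The example snippets quoted in this article, elisions included.
snippets = [
    "…reported fraud in connection to a…",
    "…there have been some frauds recorded…",
    "…to be treated as fraudulent or deceptive…",
    "…was known to be using false identities to facilitate…",
    "…conspiracy in connection to a scheme to conceal…",
]


def keyword_search(query, documents):
    """Return documents that contain the query as a case-insensitive substring."""
    needle = query.lower()
    return [doc for doc in documents if needle in doc.lower()]


hits = keyword_search("Fraud", snippets)
missed = [doc for doc in snippets if doc not in hits]

for doc in missed:
    print("Not found by keyword:", doc)
```

The two snippets about false identities and conspiracy end up in missed: they are relevant to the query, but they share no characters with it for a keyword match to find.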
A common problem is that documents may contain words and phrases that carry a particular meaning within a specific domain. To account for these differences, we use a process known as fine-tuning.
Fine-tuning works by taking a model that has been trained on documents from a more general domain, such as Wikipedia, and using Transfer Learning to adapt it to the specific language of the user’s documents. This helps the model interpret domain-specific terms as the user intends, so that it can identify and filter results based on the user’s specific requirements.
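The core idea of fine-tuning, starting from existing parameters and nudging them with a small amount of domain data, can be sketched with a toy example. Everything here is an assumption made for illustration: the ‘pretrained’ vectors are hand-written, the training signal is a single pair of terms that the domain treats as equivalent, and the update rule is plain gradient descent on the squared distance between the pair. Fine-tuning a real language model adjusts far more parameters and uses far more data.

```python
import math

# Hypothetical "pretrained" vectors: in general text, "pop" sits near "music".
pretrained = {
    "pop":   [0.9, 0.1, 0.0],
    "music": [1.0, 0.0, 0.1],
    "soda":  [0.0, 0.2, 0.9],
}


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def fine_tune(vectors, similar_pairs, learning_rate=0.1, epochs=20):
    """Pull each pair of domain-equivalent terms together by gradient descent."""
    tuned = {word: list(vec) for word, vec in vectors.items()}  # copy, keep originals
    for _ in range(epochs):
        for a, b in similar_pairs:
            va, vb = tuned[a], tuned[b]
            for i in range(len(va)):
                # Gradient of 0.5 * (va[i] - vb[i])**2 is +diff for va, -diff for vb.
                diff = va[i] - vb[i]
                va[i] -= learning_rate * diff
                vb[i] += learning_rate * diff
    return tuned


# Domain knowledge (assumed): in soft-drink documents, "pop" means "soda".
tuned = fine_tune(pretrained, similar_pairs=[("pop", "soda")])

before = cosine(pretrained["pop"], pretrained["soda"])
after = cosine(tuned["pop"], tuned["soda"])
print(f"pop/soda similarity before: {before:.2f}, after: {after:.2f}")
```

In this toy setup the similarity between ‘pop’ and ‘soda’ rises from close to zero to close to one, while the original general-domain vectors are left untouched for comparison.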
Semantic search is a powerful tool that enables organisations to understand their documents more comprehensively and to surface insights that might otherwise have been missed.
Semantic Search is just one of the ways in which data science is helping organisations to transform how they operate. In our Innovate UK White Paper, we looked at how Natural Language Processing has helped the Applications and Assessment Team at Innovate UK respond to some of the unique challenges they have faced as a result of the COVID-19 crisis.
Find out more
Our Semantic Search solutions add context to data in a way that enables users to quickly and easily extract the information that’s valuable and important to them. To find out more about us and how our experience with Semantic Search, Transfer Learning and Natural Language Processing might help your organisation, contact us using the form below.