The 3 Golden Principles To Maximize The Value From Unstructured Data — Maru Group

May 11, 2021

By Ged Parton, Chief Executive Officer

When sharing the Maru Text Analytics capability with clients, one of the first questions we are often asked is ‘What level of accuracy can I achieve?’

People keen to use automated verbatim analysis tools often tell us that they have been promised 100% accuracy from this sort of technology in the past, only to find that it is never delivered.

Accuracy is frequently dependent on a number of different factors. Using Maru’s Text Analytics, maximum accuracy is achievable but it needs to be considered in a framework of best practice golden rules for unstructured data analysis.

There are better methods than trying to achieve 100% accuracy in coding

Trying to reach 100% accuracy should not be the sole objective of text analytics users — while adjusting and tailoring rules and categories is key for achieving a high level of accuracy, too much interference could actually lead to reduced insight and action from the data set.

Instead, an accuracy level of 85–95% is much more realistic and achievable.

Even the most disciplined human could never understand, analyze and categorize every comment accurately every time. Typos, misspellings, multiple word meanings, language trends and variability of sentence structure are all everyday occurrences that affect how textual data is treated.

That said, there are proven methods to maximize the value in unstructured data.

The ideal Discovery phase should prioritize speed of analysis above all other criteria

Discovery is all about speed — it’s key to understand what people are saying quickly in order to act, especially before your competitors do. It’s for this reason that accuracy is less important for Discovery purposes — as long as the deployed software consistently looks at each comment in the same way, it will quickly highlight prominent words and phrases.
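
As a rough illustration of what "looking at each comment in the same way" can mean in practice, here is a minimal Python sketch that counts words and word pairs across a handful of comments. It is not Maru's engine: the comments, the stop-word list and the function names are all invented for this example.

```python
import re
from collections import Counter

# Hypothetical comments, invented for illustration only.
COMMENTS = [
    "The delivery was late and the driver was rude",
    "Late delivery again, very disappointed",
    "Great service, the delivery was on time",
]

# A tiny stop-word list; an assumption, not a real product setting.
STOP_WORDS = {"the", "and", "was", "a", "on", "very"}


def tokenize(comment):
    """Lower-case the comment and keep alphabetic tokens that are not stop words."""
    return [w for w in re.findall(r"[a-z']+", comment.lower()) if w not in STOP_WORDS]


def prominent_terms(comments, top_n=3):
    """Count words and neighboring token pairs, applying the same steps to every comment."""
    counts = Counter()
    for comment in comments:
        tokens = tokenize(comment)
        counts.update(tokens)
        # Pairs are formed after stop-word removal, so "delivery was late" yields "delivery late".
        counts.update(" ".join(pair) for pair in zip(tokens, tokens[1:]))
    return counts.most_common(top_n)


print(prominent_terms(COMMENTS, top_n=2))
```

Because every comment goes through identical steps, the ranking is repeatable even though no individual comment has been carefully coded, which is the property that matters for Discovery.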

Maru’s software enables the Discovery phase to follow two distinct and vital routes.

Firstly, we have worked closely with academic experts in machine learning to create a unique engine and capability that we call Topic Modelling. This uses the power of AI to rapidly identify the stories within large volumes of unstructured text to automatically build a knowledge base of the topics that customers are talking about.

This AI capability allows the data to speak for itself. It removes any pre-existing bias in code frames and ensures that the customer’s voice is heard in all its purity. Appropriately deployed, this AI helps Maru clients stay ahead by quickly spotting emerging themes and pain points in feedback.

At the same time, Maru’s software also enables the insight professional to build rule-based code frames using a flexible classification model, with complete data coverage rather than restricted sampling of comments. This second route empowers the user to draw on their hard-earned existing knowledge.

We also believe in the power of sentiment, so Maru’s software analyzes the strength of feeling in comments to ensure that the narratives which are most valuable and most strongly felt are surfaced.

Customization is key for successful code frames and the ability to continuously iterate is an essential need

We often find that our clients have two objectives for their unstructured data analysis: as well as discovering emerging themes, they want to categorize topics into quantifiable data.

Quantifying what customers are saying is less about speed and more about accuracy. Here the first step on the accuracy mission is to define your text analytics objective and understand how much accuracy will play a part.

Every source of textual data is different; this means that one size does not fit all when it comes to accuracy. Accordingly, Maru has created software which encourages bespoke solutions for our clients.

Whether analyzing customer service data with social media comments or simply looking at inputs from across different areas of the enterprise, tailoring defined rules and parameters is key to achieving high levels of accuracy.

Standard rule sets are a great starting point, but for an optimum program, software which encourages continuous improvement (kaizen, if you will) is key. This flexibility to evolve rules, which is baked into Maru’s software, enables the enterprise to tweak those standard rules continuously and so drive towards ever higher levels of accuracy.

The pursuit of accuracy requires broad categories which risk missing crucial insight

The categories, and more importantly how many of them there are, have a big impact on the level of achievable accuracy. There’s often a trade-off when trying to achieve perfection. Users need to decide which enterprise objective matters more here: an extremely high level of accuracy or granular category detail.

Generally speaking, the broader the rules for a defined category net, the higher the level of accuracy that will be achieved.

In contrast, while a granular range of categories might provide a more specific assessment of customers’ views, the more categories there are for comments to be categorized against, the higher the chance that a comment might end up in the wrong one.
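
The trade-off can be made concrete with a small Python sketch that scores the same set of coded comments twice: once against granular categories and once after rolling them up into broad ones. The category names, the mapping and the ten coded pairs are invented for illustration and do not come from a real program.

```python
# Hypothetical mapping from granular categories to broad ones.
BROAD = {
    "phone agent": "staff",
    "engineer": "staff",
    "store staff": "staff",
    "price": "cost",
    "billing": "cost",
}

# Invented pairs of (human-coded category, automatically assigned category).
CODED = [
    ("phone agent", "phone agent"),
    ("engineer", "phone agent"),    # wrong granular category, right broad one
    ("engineer", "engineer"),
    ("store staff", "store staff"),
    ("store staff", "engineer"),    # wrong granular category, right broad one
    ("price", "price"),
    ("price", "billing"),           # wrong granular category, right broad one
    ("billing", "billing"),
    ("billing", "billing"),
    ("engineer", "billing"),        # wrong at both levels
]


def accuracy(pairs):
    """Share of comments where the assigned category matches the human code."""
    return sum(1 for truth, assigned in pairs if truth == assigned) / len(pairs)


def roll_up(pairs):
    """Replace each granular category with its broad parent."""
    return [(BROAD[truth], BROAD[assigned]) for truth, assigned in pairs]


print(f"granular accuracy: {accuracy(CODED):.0%}")
print(f"broad accuracy:    {accuracy(roll_up(CODED)):.0%}")
```

In this toy data, six of the ten comments are coded correctly at the granular level and nine of ten at the broad level. Rolling up can never lower the score, because a comment that was right at the granular level is still right at the broad level, while mistakes between sibling categories disappear. What is lost is the detail about which kind of staff or cost the customer meant.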

For example, take the comments below about staff from one of our clients’ CX programs. It’s easy to detect in them what department or staff level each customer is talking about.

But customer comments do not always give you the context required.

This particular comment doesn’t confirm whether Pete was a phone agent or an engineer; it does, however, fall into the broader category of staff. The association of a name and the term ‘phone’ together could nevertheless result in an ambiguous ‘phone agent comment’ categorization.

Of course, rules can be adjusted, for instance to exclude the scenario in the example above, but an attempt to cover every possible sentence structure is unlikely to succeed, so the required trade-off should be discussed and priorities defined.
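
To show how a name-plus-‘phone’ rule behaves, and why exclusions only go so far, here is a minimal Python sketch of such a rule. It is an assumption about how a rule of this kind might look, not Maru's rule syntax; the staff names, the exclusion terms and the sample comments are invented, since the original customer comment is not reproduced here.

```python
import re

# Hypothetical staff first names; an assumption for illustration.
STAFF_NAMES = {"pete", "sara"}


def words(comment):
    """Return the set of lower-cased alphabetic words in a comment."""
    return set(re.findall(r"[a-z]+", comment.lower()))


def categorize(comment, exclude_terms=frozenset()):
    """Assign 'phone agent' when a staff name and 'phone' co-occur,
    unless an exclusion term is present; otherwise fall back to 'staff' or 'other'."""
    tokens = words(comment)
    if not tokens & STAFF_NAMES:
        return "other"
    if "phone" in tokens and not tokens & set(exclude_terms):
        return "phone agent"
    return "staff"


# Pete may well be an engineer here, but the basic rule files him under phone agents.
print(categorize("Pete fixed my phone line in no time"))
# An exclusion sends this phrasing back to the broad 'staff' category.
print(categorize("Pete fixed my phone line in no time", exclude_terms={"fixed", "line"}))
# A different phrasing slips past the same exclusion.
print(categorize("Pete got my phone working again", exclude_terms={"fixed", "line"}))
```

The third call is the point of the sketch: each exclusion handles the wording it was written for, and the next customer simply phrases things differently. Deciding up front whether ‘staff’ is precise enough avoids chasing every variation.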

Iteration of code frames and a focus on real time interventions maximize the value in unstructured data

A change in language trends, the input of new data and changes to businesses will all influence the level of accuracy you can achieve when automating verbatim analysis.

In order to maximize a tool’s accuracy, it’s important to see where it might be going wrong and quickly and easily put it right.

A primary goal of Maru’s Text Analytics tool is to give users control of accuracy levels so that they can put right any wrongs quickly and easily. Users have full control and visibility over criteria or, better yet, can draw on our team of in-house research experts, who will manage the whole process for them.

Originally published at https://www.marugroup.net on May 11, 2021.