NLP to increase diversity in text analysis

Felicia Ziparo, Methods Analytics

Natural language processing (NLP) is a branch of data science that enables automated processes to analyse and extract meaningful insights from human language. It can be used to supplement manual processing, drawing out insights that might otherwise be missed and cutting the manual work that adds no value. Done well, this can reduce the cost of many operations while improving the quality of the outcomes.

Our white paper, Gaining Greater Insights from Public Consultations with Data Science & NLP, explores how data science techniques can be applied to public consultations, with a particular focus on helping humans get more information out of a time-intensive task while reducing bias in the analysis. In practice, techniques such as Topic Modelling and Named Entity Recognition can extract the main themes present in the text while highlighting the context in which they appear. Named Entity Linking then connects recognised entities to a knowledge graph to acquire additional information such as definitions, aliases and conceptual categories; this gives entities context by creating connections and associations, while accounting for permutations and synonyms.

This process can also reduce bias: contributions from individuals who use less frequent terms or keywords are still considered in the analysis. Going further, we could learn from opinions and text that would normally be excluded because they fall below a set frequency threshold, making sure they are addressed where relevant. This would ensure more voices are heard, generating fairer and more in-depth results than has previously been possible with traditional technologies and techniques.
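The white paper does not prescribe particular tooling, but as a rough illustration of the Named Entity Recognition step, a minimal sketch using the open-source spaCy library might look like this (the pipeline name and sample responses are placeholders, not the consultation data itself):

```python
# Minimal sketch: named entity recognition over consultation responses,
# using the open-source spaCy library. The model and texts are illustrative.
import spacy

nlp = spacy.load("en_core_web_sm")  # small English pipeline with an NER component

responses = [
    "The council should work with the NHS to improve access in Leeds.",
    "Funding for local libraries in Bradford has fallen since 2019.",
]

for text in responses:
    doc = nlp(text)
    # Each entity is reported with its surrounding sentence,
    # so themes keep the context in which they were used.
    for ent in doc.ents:
        print(ent.text, ent.label_, "->", ent.sent.text)
```

Counting the entities that recur across responses, even when respondents phrase them differently, is what then allows less frequent but relevant contributions to be surfaced rather than discarded.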

The keywords, organisations and people the public cite, along with the sentiment expressed in responses to open questions, can vary across demographic, economic and geographic groups. By knowing the data better, it is possible to account for any underrepresentation when building a model and to test the algorithm, checking that minority groups are not adversely affected. Standardising the text and reducing human bias can help increase diversity in consultation responses while improving policies and government replies.
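As a minimal sketch of that kind of check, assuming responses have already been scored for sentiment and tagged with a demographic attribute (the column names, groups and scores below are hypothetical):

```python
# Rough sketch: compare group sizes and average sentiment before trusting
# aggregate results. Column names and values are hypothetical placeholders.
import pandas as pd

responses = pd.DataFrame(
    {
        "age_band": ["18-24", "18-24", "65+", "65+"],
        "sentiment": [0.6, 0.4, -0.2, -0.1],  # scores from any sentiment model
    }
)

# Small or skewed groups flag possible underrepresentation
# that a single aggregate figure would hide.
summary = responses.groupby("age_band")["sentiment"].agg(["count", "mean"])
print(summary)
```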

NLP has also been used to reduce bias in other fields, for example by matching skills for the recruitment process in the US. While this topic can be controversial, the aim is to use NLP to standardise the skills described in CVs and match people from different backgrounds to open positions. Testing the algorithm is a crucial part of this process, as is making sure the tool is not used to make automated decisions about candidates.
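One common way to approach such matching, shown here purely as an illustrative sketch rather than the specific US system referred to above, is to compare TF-IDF representations of CV text and a job description:

```python
# Illustrative sketch: TF-IDF vectors plus cosine similarity for matching
# CV text to a job description. Texts are placeholders; the ranking is
# meant to support, not replace, a human reviewer's decision.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

job_description = "Data engineer with Python, SQL and cloud experience"
cvs = [
    "Five years building data pipelines in Python and SQL on AWS",
    "Retail manager with strong customer service background",
]

vectorizer = TfidfVectorizer()
matrix = vectorizer.fit_transform([job_description] + cvs)

# Similarity of each CV to the job description.
scores = cosine_similarity(matrix[0:1], matrix[1:]).ravel()
for cv, score in zip(cvs, scores):
    print(f"{score:.2f}  {cv}")
```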

As the above example shows, ethics plays an important role in the field of AI. I am a great advocate of AI being used to augment human processes rather than replace them, allowing for greater scrutiny of feedback, not less.

Felicia Ziparo is Lead Data Scientist at Methods Analytics and a finalist in the Team Leader of the Year category at the upcoming Women in Technology Excellence Awards.