Big data facing growing scrutiny to mitigate ‘bad data’ risks

Big data is coming under growing scrutiny in certain quarters, as bad data could have an adverse effect on the machine learning tools that rely on it.

By Charles Gubert

Financial services firms have embraced big data analytics, leveraging it alongside artificial intelligence (AI) and machine learning to identify investment opportunities, detect erroneous behaviour or emerging risks, reduce operational inefficiencies and even predict behavioural patterns or the future needs of clients. While the application of big data has yielded some positive results, notably reducing settlement fails and mitigating fraud, its wider use is coming under growing scrutiny in certain quarters.

Big data analytics can help organisations identify trends, but in some instances its findings are spurious. The proliferation of big data is indisputable, but a large proportion of it is fake or manipulated, suggesting that the insights drawn from vast data lakes may be misleading.

Furthermore, an Accenture study of 800 bankers found that more than half of their organisations appeared not to be doing enough to validate the authenticity of their data, even though 80% of respondents based their most critical strategic decisions on that very data.

Mitigating bad data risks will require firms to maintain solid oversight of where the information they process originates, and then to cross-check the resulting insights against other data variables and metrics to confirm their accuracy. Institutions have also been advised to conduct more in-depth due diligence on data providers to check that their sources are credible and legitimate, particularly in light of the GDPR (General Data Protection Regulation) and growing concerns about privacy rights.

“Having privileged access to data is of strategic importance – and transparency is important here too. Larger institutions – with sizeable budgets and in-depth capabilities – typically have more robust processes when verifying where data comes from. However, the proliferation of data – which is often derived from new, alternative sources – has forced organisations to up their game in terms of checking the accuracy and validity of data, as well as the reliability of the data providers,” said Matthias Voelkel, a partner at McKinsey.

Leading academics have also voiced doubts about using machine learning tools to analyse big data sets. One academic, speaking at the American Association for the Advancement of Science, warned that machine learning tools combing through big data sets were producing unreliable results, because the technology was detecting patterns that held only in those specific data sets rather than in the wider world. The academic added that such mistakes by AI tools had gone unnoticed until subsequent analyses of different data sets produced conflicting results.

“The application of machine learning in financial services in general – and capital markets and securities services more specifically – has an ever growing number of positive use cases, principally around data extraction and cleansing, scanning and structuring documentation, and answering customer queries, all of which will bring efficiencies as well as better service. While there is a lot of potential upside to adopting machine learning more widely, firms do need to be sure that the data they input into machine learning applications is accurate and its origins can be traced,” continued Voelkel.

In financial markets, bad data can be dangerous. If a traditional asset manager or private equity firm bases an investment decision on poorly constructed AI-driven insights derived from sloppy data, losses could follow. At larger trading houses, those losses might even have systemic implications. In response, institutions are becoming increasingly hesitant to rely too heavily on big data tools for investment analytics, and a growing number of firms now stress the importance of human intermediation in some of these processes.