
Our 3 Key Takeaways
Banking is a data-driven business. However, the actual realization of AI use cases lags far behind the potential of the available methods and data.
The reasons often lie in the technical foundations of data storage and use, the organizational framework of responsibilities and structures, and the regulatory requirements for data use and storage - in short, in the tension between technology, organization and regulation.
Suitable methods and frameworks can ease this tension to a certain extent and thus enable AI implementations. This blog post discusses them along with a number of best practices.
Do you know to what extent your written regulations comply with the various regulatory requirements (e.g. MaRisk or DORA) at this very moment? AI methods can determine this - continuously, in near real time, and ideally presented clearly in a dashboard. However, such implementations are often a long way off, especially in banking, because the foundations for realizing such use cases have not yet been laid. This complexity needs to be navigated.
But let's start at the beginning. Analytics and artificial intelligence comprise a broad spectrum of methods for extracting information from data and thus enabling data-driven decisions, optimization and automation. Some of these methods have been used in banking for decades (e.g. statistical models), while others have so far received far more media attention than banking adoption (e.g. large language models). The potential of these technologies for product sales and optimization, for savings in the back office and for compliance with regulatory requirements is undisputed. However, implementing these methods in a banking environment means operating in a field of tension between organization, technology and regulation, which - if ignored - can lead to uncomfortable scope restrictions or budget expansions. It is not uncommon for the mere creation of a training and test data set to require considerable coordination effort, and whether information can be delivered to a customer advisor in real time often depends on the performance of a fragmented data landscape and its data pipelines. So let's discuss which framework conditions need to be considered to enable a successful implementation, and how to take them into account during planning.

First of all, use cases - why do we do it?
Before we come to the field of tension itself, let's first clarify which areas of banking benefit most from analytics and/or AI. The following list is not exhaustive, but covers the typical areas:
Optimized customer experience - the analysis of customer behavior to optimize offers and interactions. Exemplary use cases are:
Personalized product offers/nudging - e.g. analysis of transaction and interaction behavior to predict product interest. If the model achieves sufficient predictive accuracy and the decision threshold is set sufficiently high, offers relevant to the customer can be displayed at the right time without being perceived as annoying.
Customer interaction/language interfaces - since generative models have reached a corresponding quality, text generation (image generation is not in focus here) has become particularly interesting for banking applications. Text-based interaction between customer and advisor, or between customer and documents, can now be designed 24/7 in an almost human-like fashion. The required models (LLMs) were initially available only in proprietary form (e.g. GPT), but are increasingly also available as open source in comparable quality (e.g. Llama).
Regulatory and reporting - regulatory activities represent a cost item that often leads to high, sometimes avoidable, event-triggered costs (e.g. audits). A few examples:
Compliance screening - the announcement of an audit often implies a short-term analysis phase followed by high-pressure remediation of findings. This leads to expensive short-term consulting engagements and obstructs operational activities. Text-based AI methods (e.g. LLMs) in suitable architectures (e.g. RAG) can perform such analysis tasks - partly automated, partly in tandem with a compliance employee - continuously and in near real time. Appropriately structured compliance documents are a necessary basis for such a system.
Compliance look-up - a system as described above allows not only complete screenings of the compliance corpus, but also event-driven queries. For example, it can be asked whether specific points are addressed, or whether they are covered completely.
Transaction analysis - the analysis of transactions to identify anomalies and fraudulent patterns. A few examples:
Fraud detection - analysis of series of transactions to identify fraud patterns. Where previously mainly expert/rule-based systems were used, nowadays more modern methods (e.g. deep learning) identify fraud patterns before the damage occurs.
Anti-money laundering - analogous to the fraud detection use case as pattern recognition in transactional data.
Credit risk - the prediction of risk-relevant KPIs that must be determined both for regulatory reasons and for lending decisions. Every bank already maintains its own set of models.
Operational efficiency - the data-driven automation of operational processes saves costs and enables continuous, immediate service. A brief excerpt of use cases:
(Partially) automated credit decisions - no surprises here. Based on the credit scoring, some applications (green cases) can be accepted automatically, others are rejected automatically (red cases), and the rest are forwarded to a clerk for manual review (yellow cases).
Robotic Process Automation - no surprises here either. Repetitive processes are executed automatically by software. However, it often makes more sense to simplify a process first rather than automate it as-is.
Customer loyalty - the analysis of customer data to predict customer behavior in order to react at an early stage. Typical use cases are:
Churn analysis - identification of customers who intend to switch banks, in order to persuade them to stay with dedicated offers. This saves costs, as offers are targeted and expensive new-customer acquisition is reduced.
Customer engagement - scoring of customer engagement and willingness to participate in initiatives. An identified customer group can be addressed with suitable products.
Investment and trading - we mention this area for the sake of completeness. However, the competence for these methods clearly lies with the trading houses, which is why we will not discuss them further.
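Of the use cases above, the (partially) automated credit decision is the easiest to sketch in code. The following minimal Python sketch shows the green/yellow/red routing; the score scale, thresholds and application IDs are illustrative assumptions, not a recommendation.

```python
def route_credit_application(score: float,
                             accept_threshold: float = 0.8,
                             reject_threshold: float = 0.4) -> str:
    """Route an application based on a credit score in [0, 1].

    Thresholds are illustrative; in practice they are calibrated
    against the bank's risk appetite and regulatory requirements.
    """
    if score >= accept_threshold:
        return "green"   # accept automatically
    if score < reject_threshold:
        return "red"     # reject automatically
    return "yellow"      # forward to a clerk for manual review

# Hypothetical scored applications
applications = {"A-101": 0.92, "A-102": 0.55, "A-103": 0.17}
decisions = {app_id: route_credit_application(s)
             for app_id, s in applications.items()}
print(decisions)  # {'A-101': 'green', 'A-102': 'yellow', 'A-103': 'red'}
```

The value of such a system lies less in this trivial routing logic than in the quality of the underlying scoring model and in choosing thresholds so that the automated green and red decisions are defensible.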
The aim is to implement the aforementioned use cases. However, especially in the banking environment, it often turns out that legacy databases (technical), unclear functional responsibilities (organizational) or regulation of how certain data types may be handled (regulatory) pose unexpected challenges for implementation.
Technology - where and in what state is my data?
IT systems in banks have often grown historically. They are frequently monolithic and not fully integrated, with data stored in isolated silos. Business areas such as credit, payment transactions or securities trading use separate systems without seamless integration. The direct technical consequence of this fragmentation is redundant data and a complex interface landscape.
A mostly fragmented data landscape, in which master, transaction and risk position data, for example, are distributed across separate systems, makes it difficult - if not impossible - to analyze the data comprehensively. Some systems work batch-based (e.g. consolidation of the data basis in the nightly run), so real-time analyses on that data are restricted. The same applies to the high latency of ETL processes from legacy databases to analytical data storage, which is why banks often do not run analyses on real-time data. In addition, integrating systems into a landscape that has grown over decades is technically and professionally complex.
The typical approach to central consolidation and harmonization of data is to set up analytical data management. Depending on the structure and type of data, there are various approaches to choose from, with more modern ones (e.g. the Data Lakehouse) offering almost all the advantages. On the way to real-time data analysis, the choice often falls on event-driven architectures (e.g. Eventstore or Kafka) to tap data streams from legacy systems and make them available. Depending on the use case and the capabilities of the databases, however, data can also be extracted in a batch-driven process. Given sufficient modularity of the system landscape, data virtualization can also be considered: data points remain in their source systems and are only queried at runtime, connecting and harmonizing distributed data sources without a complete physical migration. Technical tooling can also improve the organization of data and provide traceability. Frameworks such as Data Mesh or a Medallion Architecture organize the business enrichment of the data and document it technically in order to preserve its long-term value and business context.
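To make the event-driven consolidation concrete, here is a deliberately reduced Python sketch: events from two legacy sources (master data and transactions; all source and field names are invented for illustration) are merged into one harmonized customer view. A real implementation would consume the events from a streaming platform such as Kafka instead of an in-memory list.

```python
from collections import defaultdict

# Simulated event stream from two separate legacy systems.
# In production these events would arrive continuously via
# a broker (e.g. Kafka) rather than as a finite list.
events = [
    {"source": "master", "customer": "C1", "payload": {"name": "A. Smith"}},
    {"source": "transactions", "customer": "C1", "payload": {"amount": -120.0}},
    {"source": "transactions", "customer": "C1", "payload": {"amount": 50.0}},
]

# Harmonized per-customer view built up incrementally from events
customer_view = defaultdict(lambda: {"master": {}, "transactions": []})

for event in events:  # in a streaming setup, this loop never terminates
    view = customer_view[event["customer"]]
    if event["source"] == "master":
        view["master"].update(event["payload"])   # latest master data wins
    else:
        view["transactions"].append(event["payload"])

print(customer_view["C1"]["master"]["name"])     # A. Smith
print(len(customer_view["C1"]["transactions"]))  # 2
```

The point of the sketch is the shape of the solution: each source system only emits events, and the consolidated view is derived downstream, so no source needs to know about any other.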
However, as is often the case, the devil is in the detail. The complexity lies in the combination and integration of the legacy and target landscape. Solutions are often case-by-case decisions, whereby the overall picture must be kept in mind.
Organization - what is my data and who is responsible for it?
It is not only the data that is fragmented - often the organization is as well. Knowledge concentrated in single individuals (“head monopolies”) and unclear functional responsibilities lead to a loss of business context knowledge and declining data quality. In short, the data loses its value until it finally loses its usefulness. This loss of value is difficult to quantify and is often only noticed once it has already occurred.
Data organization approaches define processes for handling data and thus address some of the deficits mentioned; we discuss a selection below.
Data governance - already established in most companies today (and even required by regulation, e.g. BCBS 239). It consists of the definition of responsibilities for certain data as well as an organizational structure in which these responsibilities are harmonized and tracked.
Data lifecycle management - the lifecycle of a data point from creation to deletion is defined by the organization. This prevents a data swamp from emerging, as the time of deletion is defined - and it also ensures, often for regulatory reasons, that data is available in a certain form at a certain point in time.
Data lineage - the traceability of the path and transformations of a data point through the systems. Technical tools are introduced and organizational framework conditions are created for this. It becomes particularly valuable when source data or transformations turn out to be incorrect, as all resulting data points can then be traced.
Data provenance - the origin and history of data are traced, ensuring the integrity and authenticity of the data.
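The data lineage idea in particular lends itself to a small illustration. The following Python sketch records every transformation a value passes through; the class, the field name and the step name are purely hypothetical, and real lineage tooling (e.g. in a data catalog) typically operates at dataset rather than value level.

```python
class TracedValue:
    """A value that carries a record of its origin and transformations."""

    def __init__(self, value, origin):
        self.value = value
        self.lineage = [origin]  # ordered: source first, then each step

    def transform(self, func, step_name):
        """Apply func and append the step to the lineage record."""
        result = TracedValue(func(self.value), self.lineage[0])
        result.lineage = self.lineage + [step_name]
        return result

# Hypothetical field from a core banking system
raw = TracedValue(1999.99, "core_banking.loans.exposure")
eur = raw.transform(lambda v: round(v), "round_to_euro")

print(eur.value)    # 2000
print(eur.lineage)  # ['core_banking.loans.exposure', 'round_to_euro']
```

If the rounding step later turned out to be wrong, the lineage record would identify every downstream value that has to be recomputed - which is exactly the scenario described above.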
When introducing such a data organization, it is particularly important to embed it in the overarching organization, as established structures (such as those often found in banks) are particularly resistant to change. A successful introduction, however, embeds data handling in daily processes and forms the basis for a data-driven organization.
Regulation - what can I do with the data?
The more sensitive data points are, the higher the requirements for handling them. With personal and transactional data, the banking industry therefore operates at the demanding end of the requirements spectrum. Challenges such as data retention obligations versus the right to erasure (GDPR) must also be resolved technically. The task is to reconcile regulatory compliance with operational efficiency and technological innovation. Increasing demands on data processing and security require continuous adaptation to new legal requirements as well as investments in modern IT infrastructures and data protection measures. One promising approach worth mentioning here is a more efficient representation of compliance documents as data points whose structural dependencies are stored technically, enabling more efficient retrieval and application.
However, there is considerable room for improvement not only in the handling of data, but also in the handling of regulations themselves. The work of a CISO or a compliance employee is characterized by text-based analyses that can be supported by modern technologies (e.g. LLM agents). As part of a target/actual comparison, for example, the current version of the company's written regulations (sfO) is checked for internal consistency and for consistency with the regulatory basis (e.g. DORA). This is a highly time-consuming task that, firstly, ties up specialist staff, secondly, involves a considerable number of consultants and, thirdly, is usually event-driven and takes place at short notice (e.g. triggered by an announced audit instead of running continuously). On the one hand, this means considerable financial outlay at certain points; on the other, the current compliance status cannot be determined with certainty between events. In addition to regulatory uncertainty, this also results in efficiency losses in day-to-day operations.
Structured, central storage of compliance documents (e.g. as graphs), combined with modern text processing methods (LLMs) in suitable architectures (RAG/agents) with inherent technical logic (rule systems and workflows), can partially automate this work and perform it continuously as required. The specialist works in tandem with the system and is thus relieved, maintains a continuous overview and can distribute the workload better.
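At the core of such a system sits a retrieval step over the compliance corpus. The following heavily reduced Python sketch scores paragraphs of the written regulations against a query by simple word overlap; the paragraph IDs and texts are invented, and a production system would use embeddings and an LLM on top of a structured (e.g. graph-based) document store rather than this toy scoring.

```python
# Hypothetical sfO paragraphs (IDs and wording invented for illustration)
paragraphs = {
    "sfO-4.2": "Incidents must be reported to the ICT risk function within 24 hours.",
    "sfO-7.1": "Access rights are reviewed by the data owner every quarter.",
}

def retrieve(query: str, docs: dict, top_k: int = 1) -> list:
    """Return the IDs of the top_k paragraphs by naive word overlap."""
    query_words = set(query.lower().split())
    scored = sorted(
        docs.items(),
        key=lambda kv: len(query_words & set(kv[1].lower().split())),
        reverse=True,
    )
    return [doc_id for doc_id, _ in scored[:top_k]]

print(retrieve("How fast must incidents be reported?", paragraphs))
# ['sfO-4.2']
```

In the full architecture, the retrieved paragraphs would be passed to an LLM together with the regulatory requirement being checked, so the specialist sees both the answer and the exact sfO passages it is grounded in.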
In summary - how is data usage in banking realized?
The analytics/AI use cases discussed enable data-supported decisions, optimization and automation in banking, and thus open up potential cost savings and new service offerings for both the customer and the bank itself. However, banks often lack the necessary prerequisites, as their data storage and processing, their processes and their regulatory framework conditions significantly increase the complexity of implementation. The challenges posed by the tension between technology, organization and regulation need to be navigated. This tension reappears in project/program planning (e.g. business case, timeline), in communication (e.g. committees, management communication), and in complexity reduction (e.g. best practices, project integration). If you have encountered similar challenges or need a second opinion, feel free to get in touch.
About the author(s)