The Lithuanian Actuarial Society invites actuaries and other interested parties to the online seminar "Clustering-based optimization in fraud detection classifier training", which will take place on 20 October 2022 at 5 pm. The seminar will be given via Teams by Dalia Breskuviene (Data Scientist at Danske Bank, PhD student). Participants will be granted 1.5 hours of CPD points.
Fraud detection is an essential problem in the banking industry. Fraud causes financial losses and can do massive harm to the reputation of financial institutions, which makes it a prevalent and influential area of applied research. The goal is to train a classifier that assigns each transaction to one of two classes: fraudulent or regular. Fraudulent transactions are rare events, so the data are highly imbalanced, and training a classifier on such data raises issues that remain unsolved. Given a data set of transactions, we suggest splitting the classification process into several parts. The training data set is clustered, and a separate sub-classifier is trained on each cluster. We chose XGBoost as the transaction classifier. At test time, the decision is made by the sub-classifier whose training-cluster center is closest to the point being classified.
In our case, the appropriate evaluation criterion is the F1 score: as the harmonic mean of precision and recall, it is not inflated by the large number of correctly classified regular transactions. For the experimental evaluation of the suggested strategy, we use the synthetic credit card transaction database (https://data.world/ealtman/synthetic-credit-card-transactions), which simulates transactions of credit card users living in the United States. The experiments show a significant increase in the F1 score compared with the case without clustering.
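For readers who would like a concrete picture before the seminar, the routing step and the F1 criterion described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the speaker's implementation: the cluster centers, the two features (amount, hour of day), the toy data, and the simple threshold rules standing in for trained XGBoost sub-classifiers are all hypothetical, and Euclidean distance is assumed because the abstract does not name a distance or a clustering algorithm.

```python
import math

def nearest_center(point, centers):
    """Return the index of the cluster center closest to `point`.
    Euclidean distance is an assumption made for this sketch."""
    return min(range(len(centers)), key=lambda i: math.dist(point, centers[i]))

def predict(point, centers, sub_classifiers):
    """Route `point` to the sub-classifier of its nearest cluster center."""
    return sub_classifiers[nearest_center(point, centers)](point)

def f1_score(y_true, y_pred):
    """F1 for the fraud class (label 1): harmonic mean of precision and recall."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Hypothetical cluster centers in a two-feature space (amount, hour of day).
centers = [(20.0, 12.0), (900.0, 3.0)]

# Hypothetical stand-ins for the trained sub-classifiers (one per cluster).
sub_classifiers = [
    lambda x: 1 if x[0] > 100.0 else 0,  # cluster 0: flag unusually large amounts
    lambda x: 1 if x[1] < 5.0 else 0,    # cluster 1: flag night-time transactions
]

# Toy test set: 1 = fraudulent, 0 = regular.
test_points = [(15.0, 13.0), (30.0, 11.0), (950.0, 2.0), (880.0, 14.0),
               (120.0, 12.0), (60.0, 12.0), (940.0, 4.0)]
y_true = [0, 0, 1, 0, 1, 1, 0]

y_pred = [predict(x, centers, sub_classifiers) for x in test_points]
print("predictions:", y_pred)
print("F1:", f1_score(y_true, y_pred))
```

In this toy run, one fraudulent transaction is missed and one regular transaction is flagged, so precision and recall are both 2/3 and so is F1. With features on very different scales, as here, the features would normally be standardized before clustering and distance computation.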