Statistical models - Should we trust them for detecting fraud?

Statistical models are ubiquitous, from predicting the next wind direction in the Atlantic to guessing whether you are about to buy a Beatles tune from Amazon. They have also been widely used to predict fraud among mobile subscribers, with the objective of stopping fraud as soon as it occurs. However, more often than not, legitimate customers look suspicious in the eyes of these models, which raises the question: should such models be used at all, in particular for making decisions about “anything beyond the trivial”?

To illustrate, a legitimate customer may suddenly get the opportunity to travel and make a lot of calls to share the experience before, during and after the trip. This is completely normal for the customer, albeit a little unusual. For the model, however, the behaviour has deviated so far from the customer's average usage pattern, or from the average pattern of the group to which the customer has an affinity, that it categorises the customer as suspicious or even fraudulent. Depending on the thresholds set by the Analyst, the customer may be subjected to partial service disconnection, asked for an additional deposit or asked to resubmit proof of identity. Note that none of this is resolved in a split second, so the customer is inconvenienced to a great extent before things return to normal, if they ever do. One could even argue that such fraud detection systems, instead of saving money, can ironically cause great loss to the operator, as the frustrated customer may take his or her business elsewhere.
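The threshold rule described above can be sketched in a few lines. This is only a minimal illustration, assuming the model scores a customer by how many standard deviations today's usage lies above that customer's own historical mean. The function name `is_flagged`, the threshold of 3.0 and the call counts are all hypothetical and are not taken from any real fraud management system.

```python
from statistics import mean, stdev

def is_flagged(history, today, threshold=3.0):
    """Return True if today's usage lies more than `threshold` standard
    deviations above the customer's historical mean (hypothetical rule)."""
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        # No variation in the history: any change at all counts as a deviation.
        return today != mu
    z = (today - mu) / sigma
    return z > threshold

# Hypothetical daily call counts for a legitimate customer.
history = [4, 6, 5, 7, 5, 6, 4, 5, 6, 5]

print(is_flagged(history, 6))   # an ordinary day
print(is_flagged(history, 40))  # a travel day with many calls
```

The rule sees only the size of the deviation, not the reason for it, so the travelling customer and a genuine fraudster with the same call volume are treated identically.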

Several research papers have questioned the use of statistical models for predicting complex human behaviour and cast doubt on whether such techniques should be used for making decisions about “anything beyond the trivial”. One landmark paper, published in the May 2007 issue of the British Journal of Psychiatry by forensic psychologists Stephen D. Hart, Christine Michie and David J. Cooke, argues that predictive statistical models are best avoided or used with great caution. The article “Forecasting human behaviour carries big risks” by Christine Evans-Pughe in the Guardian succinctly summarises the flaws in using statistical models to predict complex human behaviour (http://www.guardian.co.uk/technology/2007/jul/19/guardianweeklytechnologysection.it).

Our own experience with statistical models is a mixed bag. While some models with a wide array of reliable variables and parameters closely predicted group behaviour, they fell woefully short of predicting individual behaviour with any accuracy or confidence. This method of fraud prediction, i.e. using pure behaviour models, had a false positive rate of over 80%, wasting valuable Analyst time. By relying on hard links between individuals and groups (e.g. customers using hot-listed handsets, call fingerprints etc.), we were able to build a far more reliable model for predicting suspicious behaviour (false positives down to about 45%) than pure predictive algorithms based simply on subscriber affinity to groups marked as high risk.

Some common issues stand out in our experience of using pure statistical models. The first is that it is difficult to create reliable profiles or states of fraudsters for use in predictive models. For example, operators classify many kinds of customers as fraud. However, only a small proportion of them are technically fraudsters, i.e. people who sign up with intent to defraud. The rest might just be people who are frustrated with the service and don’t pay their bills, or customers whose circumstances have changed so drastically that they have abandoned the service without trace. Such cases should generally be classified as bad debt but are dropped into the fraud pot.

The second issue is that most predictive models assume that call behaviour is normally distributed (read: a bell curve), which is rarely the case. Think about it. Do you use your phone in exactly the same way day after day and month after month? Finally, poorly built rules, limited historical data, and changing customer behaviours and technologies also render such detection systems ineffective.
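The effect of the bell-curve assumption can be shown with a small simulation. This is a sketch under stated assumptions: the log-normal distribution is used here only as a stand-in for skewed usage, not as a claim about how real subscribers behave, and the helper `upper_tail_rate` and all parameters are hypothetical. It compares how often a "mean plus three standard deviations" cutoff fires on bell-shaped data and on skewed data.

```python
import random
from statistics import mean, stdev

def upper_tail_rate(values, k=3.0):
    """Fraction of observations above mean + k standard deviations."""
    cutoff = mean(values) + k * stdev(values)
    return sum(v > cutoff for v in values) / len(values)

rng = random.Random(0)  # fixed seed so the run is repeatable
n = 20_000

# Bell-shaped usage: what the model assumes.
normal_usage = [rng.gauss(10.0, 2.0) for _ in range(n)]
# Skewed usage: mostly light days with occasional very heavy ones (assumed shape).
skewed_usage = [rng.lognormvariate(0.0, 1.0) for _ in range(n)]

normal_rate = upper_tail_rate(normal_usage)
skewed_rate = upper_tail_rate(skewed_usage)

print(f"flagged under bell curve: {normal_rate:.4%}")
print(f"flagged under skewed data: {skewed_rate:.4%}")
```

Under a true bell curve the cutoff flags roughly one observation in a thousand. On skewed data the same cutoff flags many more, and every one of those extra flags is legitimate usage.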

Statistical models are important insofar as they help human beings look through the clutter to see what is essential for making decisions. They bring a rare scientific rigour to the task. However, they shouldn’t be the ‘be-all and end-all’ of finding solutions to complex human problems.
