Your AI credit models are fine, but their training data is problematic


AI in lending promises faster decisions and broader access to credit, but it often perpetuates existing inequities. Be wary: Your might not be.

Don't believe me? Let's look at a few instances. First, car loans: it has been reported that women were more likely than their male counterparts to be favored for loan originations, even when controlling for other financial factors. With mortgages, we see a very similar story: an analysis of how creditworthiness is determined found that Black applicants were more likely to be denied than their white counterparts. And it's not just race or color. It extends to age, postal codes and even the college you attended.

At the end of the day, lenders are looking for deterministic factors to underwrite products -- and that's what's going on here. I know all too well. I ran the product for the data science and decisioning team at OnDeck Capital, and we looked at every data point we could get our hands on. And I mean it. Got a bad Yelp rating? It was accounted for in our model. Your Foursquare check-ins were down? Oh, we knew. We even considered factors like seasonality in cash flow and how businesses in your neighborhood were doing. Our machine learning, or ML, models were designed to process thousands of data points to make lending decisions in seconds.

But I'm here to give you an alternative narrative. I think your AI models are fine (for now), but your data is fundamentally flawed. The issue isn't in the algorithms themselves, but in the historical data we're feeding them. You see, models are trained on datasets that literally go back decades. So, if a certain group has historically been denied loans at higher rates, ML models will implicitly associate that group with "high risk." The model doesn't know it's being unfair; it's simply learning from the patterns we've provided.
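To make that mechanism concrete, here is a minimal sketch in Python. Everything in it is a hypothetical assumption for illustration: the postal codes, the counts, and the deliberately naive frequency "model" are all invented, and no real lender's data or method is implied. The point is only that a model trained on past approval decisions reproduces them, even though it is never told who lives where.

```python
from collections import defaultdict

# Hypothetical history of (postal_code, was_approved) pairs. The counts are
# invented: one area was approved 90% of the time, the other only 40%.
HISTORY = (
    [("10001", 1)] * 90 + [("10001", 0)] * 10
    + [("20002", 1)] * 40 + [("20002", 0)] * 60
)


def fit_approval_rates(records):
    """Learn the historical approval rate for each postal code."""
    counts = defaultdict(lambda: [0, 0])  # postal_code -> [approved, total]
    for postal_code, was_approved in records:
        counts[postal_code][0] += was_approved
        counts[postal_code][1] += 1
    return {code: approved / total for code, (approved, total) in counts.items()}


def predict_approval(model, postal_code, threshold=0.5):
    """Approve only if past approvals for this postal code meet the threshold."""
    return model.get(postal_code, 0.0) >= threshold


model = fit_approval_rates(HISTORY)
for code in ("10001", "20002"):
    print(code, model[code], predict_approval(model, code))
```

Nothing in this toy model mentions race, age or income, yet every applicant from the second postal code is declined because of how earlier applicants there were treated. Real models are far more sophisticated, but a proxy feature can carry historical patterns forward in the same way.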

The problem is exacerbated by what we in the industry call "thin files" -- credit reports with limited history. This disproportionately affects young adults and recent immigrants -- arguably the two groups most in need of access to credit. The alternative is to take on loans to build up credit, often ones people cannot afford and on unfavorable terms, creating a Catch-22 situation that can trap people in a cycle of debt.

The impact of thin files on creditworthiness is staggering. By one estimate, banks in the U.K. could be denying loans to 80% of adults with thin credit files, many of them low-risk customers. Traditional lending models typically deem these applications high risk and would often, in theory, auto-decline them through "hard cuts," a process where applications are eliminated for failing specific criteria deemed necessary for approval. If those criteria are missing, models typically disregard any other relevant financial information, even when it offers more exhaustive data points, effectively shutting individuals out of credit.
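As a rough illustration of how a hard cut plays out, consider the Python sketch below. The field names, the 24-month history requirement, the 36% debt-to-income limit and the rent-payment check are all hypothetical values chosen for the example, not criteria from any actual lender.

```python
from dataclasses import dataclass
from typing import Optional

MIN_HISTORY_MONTHS = 24      # hypothetical hard-cut criterion
MAX_DEBT_TO_INCOME = 0.36    # hypothetical affordability limit


@dataclass
class Application:
    credit_history_months: Optional[int]  # None means no bureau history at all
    monthly_income: float
    monthly_debt_payments: float
    on_time_rent_payments: int            # out of the last 12 months


def debt_to_income(app: Application) -> float:
    """Share of income already committed to debt; infinite if no income."""
    if app.monthly_income <= 0:
        return float("inf")
    return app.monthly_debt_payments / app.monthly_income


def is_thin_file(app: Application) -> bool:
    months = app.credit_history_months
    return months is None or months < MIN_HISTORY_MONTHS


def hard_cut_decision(app: Application) -> str:
    """Thin files are declined before any other information is considered."""
    if is_thin_file(app):
        return "decline"
    return "approve" if debt_to_income(app) <= MAX_DEBT_TO_INCOME else "decline"


def holistic_decision(app: Application) -> str:
    """Thin files are judged on the other data available instead."""
    affordable = debt_to_income(app) <= MAX_DEBT_TO_INCOME
    if is_thin_file(app):
        if affordable and app.on_time_rent_payments >= 11:
            return "approve"
        return "refer"  # send to manual review rather than auto-decline
    return "approve" if affordable else "decline"


newcomer = Application(credit_history_months=6, monthly_income=4000.0,
                       monthly_debt_payments=800.0, on_time_rent_payments=12)
print(hard_cut_decision(newcomer), holistic_decision(newcomer))
```

The same applicant, with a 20% debt-to-income ratio and a perfect rent record, is declined by the hard cut and approved once the rest of the file is actually read.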

So how do we solve this? I'd argue the future of our lending models needs to account for a more holistic picture to determine creditworthiness. We need to diversify our data sources, implement rigorous back-testing for biases and make our models as transparent as possible. Transparency should also extend to the consumer, by allowing them to understand the factors influencing their creditworthiness.
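What might back-testing for bias look like in practice? One simple starting point, sketched below in Python, is to replay past decisions and compare approval rates across groups. The group labels and counts are invented for the example, and the 0.8 cutoff borrows the "four-fifths" rule of thumb purely as an illustrative assumption; a real review would use more than one metric and proper legal guidance.

```python
from collections import defaultdict

FLAG_BELOW = 0.8  # illustrative cutoff, borrowed from the "four-fifths" rule of thumb


def approval_rates(decisions):
    """decisions: iterable of (group, approved) pairs -> approval rate per group."""
    totals = defaultdict(int)
    approved = defaultdict(int)
    for group, was_approved in decisions:
        totals[group] += 1
        approved[group] += 1 if was_approved else 0
    return {group: approved[group] / totals[group] for group in totals}


def impact_ratios(rates):
    """Each group's approval rate relative to the best-treated group."""
    highest = max(rates.values())
    if highest == 0:
        return {group: 1.0 for group in rates}  # nobody approved: no disparity
    return {group: rate / highest for group, rate in rates.items()}


def flagged_groups(decisions, cutoff=FLAG_BELOW):
    """Groups whose relative approval rate falls below the cutoff."""
    ratios = impact_ratios(approval_rates(decisions))
    return sorted(group for group, ratio in ratios.items() if ratio < cutoff)


# Hypothetical back-test sample: group A approved 8 of 10, group B 5 of 10.
sample = ([("A", True)] * 8 + [("A", False)] * 2
          + [("B", True)] * 5 + [("B", False)] * 5)
print(approval_rates(sample), flagged_groups(sample))
```

A flag like this doesn't prove the model is unfair, but it tells you where to look, and publishing the check is one small step toward the transparency argued for above.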

It shouldn't just be about smarter algorithms. It should be about smarter, fairer and more complete data. And on some level, it's about ensuring algorithmic accountability -- and the ethical application of AI and ML in products that have a broad-reaching impact on society.
