Best practices for fusing internal and external data to enhance credit decisioning for SMB lending with Stripe’s Yaakov Erlichman
- Learn strategic and tactical best practices for maximizing the fusion of internal and external data to enhance credit decisioning and risk management for SMB lending.
- Stripe's Yaakov Erlichman, head of capital and SMB risk, covers both conceptual and implementation considerations for reconciling data sources tailored to the unique needs of the SMB market.
As a part of Tearsheet Talks: Lending x Credit x Data, Stripe's Yaakov Erlichman, Head of Capital and Small Business Risk, provides a practical framework for integrating proprietary and third-party data sources in small and medium business lending to drive effective underwriting. He discusses recommendations for sequencing data introduction, balancing model predictive power with applicant friction reduction, optimizing the mix of internal and external data, and managing model degradation.
Yaakov is a seasoned risk management professional responsible for leading the credit strategy at Stripe. He joined the company four years ago to build out the risk infrastructure for Stripe Capital. Prior to Stripe, Yaakov was part of the risk teams at Kabbage and American Express.
Prioritizing internal data to launch a lending product
Yaakov Erlichman, Stripe: I think that one needs to really think about what data they have access to. That's the most important thing. Money is very fungible and in the SMB lending space, there are a lot of lenders out there but those lenders are offering products that are all very similar. So, to differentiate yourself in that market, you really need to use data. Data is ultimately what's going to differentiate because you're going to be able to offer better pricing, approve more customers, and just create a better user experience. Users are going to apply, they're already going to be pre-approved, because we know so much information about these users, and it's a really good experience for them.
If you're a standalone lender, it's actually quite difficult, because you're not going to be able to differentiate yourself against anybody else -- it's really just going to be the same data sources that everyone has access to, whether from the commercial bureaus or the consumer bureaus. So you really have to think about how you want to differentiate yourself. And it's actually quite difficult.
The Stripe perspective is that if you are an embedded lender, you should really be leaning in, in the early days, to your own data, because that's really going to differentiate you from anyone else out there. You have something about your customers that no one else knows. So let's say you're going to be a little bit inward looking here, you're a payment processor. And you can see that a user is really consistent in their transaction volume. Lean into that. If they're a coffee shop on the block, and you can see every single day that they have the same number of transactions, that's a pretty good signal as to the riskiness of that business. And you might want to exclude any external signals.
I remember a case that we had: a boat rental place in the middle of Vermont. They were so happy that we were able to offer them funding because we didn't look at their bureau score, we didn't see their 500 FICO score, we just saw their consistent transaction volume. And I think that that really is a differentiating factor, so lean into that internal data, especially when you launch.
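One way to put a number on "consistent transaction volume" is the coefficient of variation (standard deviation divided by mean) of daily transaction counts. The following is a minimal sketch, not a description of Stripe's models: the function name `volume_consistency` and both sample series are hypothetical, and a real signal would need far more history and context.

```python
from statistics import mean, pstdev


def volume_consistency(daily_counts):
    """Coefficient of variation (std / mean) of daily transaction counts.

    Lower values mean steadier volume. Returns None when the average
    volume is zero, since the ratio is undefined in that case.
    """
    avg = mean(daily_counts)
    if avg == 0:
        return None
    return pstdev(daily_counts) / avg


# Hypothetical data: a steady coffee shop vs. a business with erratic volume.
coffee_shop = [118, 121, 120, 119, 122, 120, 120]
erratic = [5, 240, 0, 90, 310, 12, 43]

steady_cv = volume_consistency(coffee_shop)
erratic_cv = volume_consistency(erratic)

print(f"coffee shop CV: {steady_cv:.3f}")
print(f"erratic business CV: {erratic_cv:.3f}")
```

A low value only says that the volume the lender can see is steady; it says nothing about volume processed elsewhere.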
Balancing between internal and external data
It's actually really important because if you happen to be an embedded lender, and I'm probably telling you something that's very obvious, you have to be cautious about how you leverage internal data. We'll use the payment processor example and you're saying, Oh, look at this transaction volume, it's really, really consistent. You might be only picking up a small signal on that business, right? You might only be seeing 5% to 10% of their volume -- maybe they use another payment processor for volume that is very volatile. Or maybe they're really heavily into cash and wires. That would be a different signal.
So there is a careful balance that you need to strike as you evolve over time, to ensure that you're really understanding that business. The key to all of this data is to fully understand the health of a business. Whether the data is internal or external, the business is a business. The business just wants you to understand that. And unlike consumer, where things are quite clean and everyone has access to the consumer bureaus and most of your trades get reported to them, commercial is still the Wild Wild West -- there's nobody who's picked up all the signals and put them into one repository. So it gives you an opportunity to differentiate yourself in that space, if you are able to pick up the right signals and weight those signals in an effective way. But it's critical that you understand that, because if you're really leaning in on internal data, you could make the wrong decision because you're not fully picking up all the signals.
For the example of the boat rental business, our secret sauce here is that we didn't look at consumer bureau as part of that decisioning logic. We built out our models on our internal data. And again, there was some caution that you're not necessarily seeing the whole picture. We were really confident in the models that we built against our internal data -- we really did a good job of backtesting them and having a really strong data science team that was rigorous in their approach in how we treated that internal data. And again, when we launched, we launched with only internal data in this case. But over time, we evolved to say that you know what, maybe we could expand our market share if we got additional information -- we eventually evolved, similar to what everyone does as a standalone lender, to take in banking data, financial data, and other data sources to supplement our internal data.
User experience and embedding lending
It's a huge component and really a differentiating factor. Again, money is fungible, but the user experience is a critical part of the journey. Using internal data is a great way to differentiate yourself, because you can come up with much stronger offers to your customer or your prospects. We can go to a user and say, we're 95% sure and even higher that you're approved, basically, for this offer because we are so confident in our models, just click here to take the money. There's no application, there's no information that we're asking to gather from you. And that is a great experience that we can provide.
But, again, that experience can only be provided to some users. Talking to other lenders out there, you can get to a point where you're sourcing data in such a confident way that you can provide that frictionless experience to a user to really say to them, Hey, we know so much about you, before you even show up at our door, here's an offer or here's money, you just click here to accept it. That is really an awesome experience for a customer.
The right balance between heuristics and models
Initially, you're going to have to really lean into heuristics. You're not going to have the data that's going to allow you to build the really rigorous gradient boost models or logistic regression models or even the new modeling techniques that are out there. You're really going to just have to use some industry experience and talk to folks that are really experts in this domain area and really understand okay, what are the parameters that we can put into place soon? What are the basic rules that we can put into place ourselves to keep us out of trouble?
If you're going to be using FICO, what FICO threshold are we going to be comfortable with? And the other thing I would say is to balance that against how much you want to learn. One of my previous employers wanted to know how far into the credit box they could go and still make money. So like, how low of a FICO threshold can you go and still be profitable? And in order to do that, you have to test. You can put some parameters in place. And you can say, Okay, this is how we're going to operate. But you also have a hypothesis like, How much am I willing to invest? And think about it as an investment -- it's really money you're going to lose, but how much money are you really going to invest to be able to get the data that you need to be able to build your models?
That's what the heuristics will allow you to do -- to really build out your data sets that will allow you to build your models. Most companies are not going to be in a position to get the right data on day zero, or negative day 10 or negative day 100, to build out the models that will allow them to launch on day zero. So you really are going to have to use heuristics and think about those heuristics as an entryway into buying data that will allow you to build out the models, which is ultimately your objective. Your objective should be to transition from heuristics into models, because models are much more powerful than individual rules.
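The idea of launching on simple rules while setting aside a capped "investment" to learn below your comfort threshold can be sketched as follows. This is an illustrative sketch only: the class names, the FICO floors, the time-in-business rule, and the budget figure are all hypothetical assumptions, not thresholds mentioned in the talk.

```python
from dataclasses import dataclass


@dataclass
class Applicant:
    fico: int
    months_in_business: int
    requested_amount: float


@dataclass
class HeuristicPolicy:
    # All values below are hypothetical placeholders for illustration.
    core_fico_floor: int = 660         # comfortable approval threshold
    test_fico_floor: int = 600         # lower band approved only to learn
    min_months_in_business: int = 12
    test_budget: float = 50_000.0      # total exposure accepted in the test band
    test_exposure: float = 0.0         # running total lent in the test band

    def decide(self, a: Applicant) -> str:
        """Return 'approve', 'approve_test', or 'decline'."""
        if a.months_in_business < self.min_months_in_business:
            return "decline"
        if a.fico >= self.core_fico_floor:
            return "approve"
        in_test_band = a.fico >= self.test_fico_floor
        within_budget = self.test_exposure + a.requested_amount <= self.test_budget
        if in_test_band and within_budget:
            # Spend part of the learning budget on this applicant.
            self.test_exposure += a.requested_amount
            return "approve_test"
        return "decline"


policy = HeuristicPolicy()
applicants = [
    Applicant(fico=700, months_in_business=24, requested_amount=20_000),
    Applicant(fico=620, months_in_business=24, requested_amount=30_000),
    Applicant(fico=610, months_in_business=36, requested_amount=30_000),
    Applicant(fico=720, months_in_business=6, requested_amount=10_000),
]
decisions = [policy.decide(a) for a in applicants]
print(decisions)
```

Tagging test-band approvals separately keeps their eventual performance distinguishable, so that data can later feed a model rather than disappear into the core book.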
Resources needed to build lending models
You're going to need a strong credit risk team that's going to be able to do this process. Somebody who really understands the domain area and is an expert in lending, whatever that lending might be. It could be consumer or commercial; a lot of those skills are transferable across different verticals. But you need somebody who's able to create that structure, and I think you're really going to need a strong data science team. The successes that we've been able to achieve have really been driven on the backs of our data science team. I think it's also important to call out that having the label of data scientist doesn't automatically mean someone is going to enable your success. I've seen a lot of challenges where companies hire data scientists that are experts in the advertising space, in marketing, or in other areas.
There's one key differentiating factor that I see: time to learn is very important. When you're a Google or Facebook engineer, and you launch a new algorithm on the site, you can get your results instantaneously -- is this going to lift? On Amazon, is the customer going to buy more products, buy more dresses, or buy more shirts based upon how you adjust the screen or adjust the algorithm or whatever it is?
In lending, your time to learn is months, if not years. You deploy, you have a one-year working capital term loan, it could take you 12, 15, 18 months until you actually see performance. A lot of data scientists don't appreciate that time to learn. And a lot of product managers don't think about it as well, which is another important call-out. We think of it like a roller coaster: you launch the car down, you have no brakes, no controls, you're just going to let that go, you don't know where it's going to end up. Having that frame of mind is critical as you think about financial service products.
The tie between products and data
I think that you really can lean into the product side to differentiate yourself. You can really take the data that you're able to understand or accumulate about your users and transform that into the product. So for example, let's think about like the cash flows of the user. You can really tailor your products around the risk strategy. So first, you could have a seasonal business, right? Our favorite is our landscapers that are very busy from, let's say, April to October. And you could offer them a product. You could say, I can offer them a product during their busy season. Or you can offer them a product during the low season to help sustain them. That is a differentiating factor.
A lot of lenders can do that if they are strategic about how they shape a product to really meet the needs of their customers. Do we want to help that user during their offseason? Or do we want to help them during their busy season? There are reasons to do both. But even having those insights to inform the product strategy is incredibly powerful. And you can message that, as well: hey, we know that you're going into a low season, or hey, we know that you need to buy inventory and ramp up your business before you go into your busy season. We want to offer you some money to be able to do that. That is hugely differentiating, if you're able to get to that level.
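As a rough illustration of spotting a seasonal pattern like the landscaper example, one simple approach is to compare each month's revenue with the average month. This is a sketch under stated assumptions: the function name `busy_months`, the threshold of 1.0, and the revenue figures are hypothetical, and a production approach would use several years of history.

```python
def busy_months(monthly_revenue, threshold=1.0):
    """Return the months whose revenue exceeds `threshold` times the average month.

    `monthly_revenue` maps month number (1-12) to revenue for that month.
    """
    avg = sum(monthly_revenue.values()) / len(monthly_revenue)
    if avg == 0:
        return []
    return [month for month, rev in monthly_revenue.items() if rev / avg > threshold]


# Hypothetical landscaper revenue in thousands of dollars, January to December.
landscaper = {
    1: 2, 2: 2, 3: 4, 4: 12, 5: 15, 6: 16,
    7: 16, 8: 15, 9: 13, 10: 11, 11: 4, 12: 2,
}

peak = busy_months(landscaper)
print(peak)  # April through October for this sample data
```

Whether an offer is timed ahead of the busy months or during the slow ones remains the product decision described above; the calculation only identifies where each season falls.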
Anyone is able to do that, if you're thinking about that to inform product strategy. I think that a lot of product managers just think about risk as like a checkbox: can I approve the user, yes or no? And it's very narrow-minded. I think that if you expand it, risk can really be a product enabler, to say, how can we scale our products using sound risk strategy? And that's how we operate here -- risk is really a driver of product growth. Because we are thinking through not just can we keep ourselves out of trouble, but like, how do we optimize our returns? How do we optimize our conversions, using risk as a differentiating factor? And data is what ultimately drives that. That's ultimately what this comes down to.
Selecting data vendors
I'm not going to specifically call out individual vendors. But I think there are only a handful of very unique large vendors that are out there. Actually, I will call out a couple: you have the DNBs, the Lexises, the Equifaxes of the world that are really good at what they do. They have a very broad understanding of a lot of users that are out there. And I think it's worthwhile to have conversations with them to see what they offer.
I will caveat, though, in the commercial space, it is really challenging to use a commercial bureau in an effective way because they don't necessarily have access to the granular information. A lot of the information they have is distilled -- it's a little bit rough around the edges, like they're taking information from various ad hoc trade lines and stuff. They don't really paint a complete picture of a user.
That being said, there are some very effective scores that are out there. DNB has a couple of scores that are quite good. I would say the CCS score that they offer is quite good. It's also quite expensive, but the value is there. If you're starting from scratch, I think those are very effective.
I think over time, though, what I've seen is that the Alloys of the world where you're able to take multiple data sources, and they provide a lot of rigorous balancing and underwriting of the data, could also be a really good way to lean into multiple data sources, rather than you having to set up relationships with a lot of vendors.
I haven't seen a lot of very effective newcomers into this space, which has been unfortunate. I don't see a lot of new entrants that are offering differentiating products on the commercial side. There were a couple of companies that were trying to do it, but it's a very difficult thing to do.
Before I forget, I will say SBFE, if you're not familiar with that, the Small Business Financial Exchange, is kind of a secret sauce. It's a very effective consortium of commercial lenders. They span probably 500 lenders and I don't know if a lot of folks know about this. It means you have to sign up for the consortium and participate in it. They are a great source of information, probably one of the best sources of information, to really understand the risk of a user because commercial trade lines get reported to them. And they've done a really good job of managing that information and effectively disseminating it to their membership.
Frequency of measuring performance
You have to constantly be evaluating your model performance. Models degrade pretty fast. So I would say you should be looking at your models every three to six months, and constantly determining whether they're still performing as expected. And you should be constantly looking at what additional features you can update in your models -- there are constantly new developments, whether they're internal signals or external signals, that will really inform how your models are performing.
One of the things that we added in the last year or two is just the recession signals, basically looking at how the overall economy is performing. When you initially launch, it's probably not relevant. If you have a small portfolio, that's probably not going to be meaningful, but as you start to scale, you could really start to see meaningful differentiation from external signals like that. I would say you should be evaluating what features you could be adding every quarter and then you should just be monitoring your models probably every three to six months and deploying new versions of those models, at least twice a year.
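One common way to check whether a model's inputs or scores have drifted between reviews is the population stability index (PSI), which compares the distribution of scores at build time with a recent window. PSI is not mentioned in the talk; it is swapped in here as a widely used monitoring statistic. The function name, bin edges, and score samples below are hypothetical.

```python
import math


def psi(expected, actual, edges, eps=1e-6):
    """Population stability index between two non-empty score samples.

    `edges` are ascending interior bin boundaries; a value v falls in the
    bin indexed by the number of edges it is greater than or equal to.
    Empty bins are floored at `eps` to avoid taking log(0).
    """
    def shares(values):
        counts = [0] * (len(edges) + 1)
        for v in values:
            counts[sum(v >= e for e in edges)] += 1
        return [max(c / len(values), eps) for c in counts]

    exp_shares, act_shares = shares(expected), shares(actual)
    return sum((a - e) * math.log(a / e) for e, a in zip(exp_shares, act_shares))


# Hypothetical model scores at build time vs. a recent quarter.
edges = [0.25, 0.5, 0.75]
baseline = [0.1] * 5 + [0.3] * 5 + [0.6] * 5 + [0.9] * 5
recent = [0.1] * 2 + [0.3] * 3 + [0.6] * 5 + [0.9] * 10

drift = psi(baseline, recent, edges)
print(f"PSI: {drift:.3f}")
```

A frequently cited rule of thumb treats a PSI above roughly 0.25 as a material shift worth investigating, though any cutoff is a judgment call. Because loan outcomes take many months to mature, a distribution check like this can flag change well before realized loss performance is available.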