I build statistical models for banks which help assess the risk of a loan. Effectively, my models will get converted into the grades (A, B, C, D, etc.) mentioned in the article. The strategies (second chance, family guy, safe haven) are generally consistent with experiences from the portfolios of most financial institutions.
However, I am skeptical (prove me wrong) of the statement in the article - "Lenders get a return on their investment that is typically much better than traditional Certificate of Deposit or Saving Accounts". In finance terms, I will be surprised if they have a higher RAROC [1] as compared to large banks. If they really do, then congratulations (you will put banks out of business in a few years)??
> In finance terms, I will be surprised if they have a higher RAROC [1] as compared to large banks. If they really do, then congratulations (you will put banks out of business in a few years)??
Or, more likely, just drive down bank profit margins.
You are correct, a comparison with CD/SA is an incorrect comparison that doesn't take into consideration the default premium and liquidity premium included in Lenders' return. CD/SA is as close to risk free return (with FDIC insurance) as you can get while Lending Club loans are unsecured loans (highest risk fixed-income investment).
Banks haven't proven too skilled in estimating risk (see 2007). The average Lending Club lender is surely worse than the banks, but the best ones are surely better.
Net it all out and hopefully we get a Darwinian marketplace for lenders, lenders who don't need acres of employees preparing powerpoints for each other in class A office space.
Of course, the true measure of estimating risk is hidden until a crisis occurs. Interesting to see where it all ends up.
> Banks haven't proven too skilled in estimating risk (see 2007).
Arguably, they demonstrated that you can accept a lot of risk and then if things go wrong the Fed will bail you and your creditors out. Not _quite_ the same thing as simply underestimating risk.
I don't know much about the banking industry but savings accounts and CDs have always felt scammy to me. They're marketed to the rubes that have no idea what they're doing so they can get away with not being competitive with other financial instruments.
The reason why savings accounts and CDs have such a low interest rate is because they are far less risky than other financial instruments.
If you compare them to other assets with a similar risk profile and payment structure, such as short dated treasury notes and annuities, you will find that the rates are at least competitive.
The employment length is really bugging me. I've always selected people with a few years at their current job, leaning towards higher, because it feels safe, but this says that <2 years of experience is better than longer! I wonder if they're "newer" so more likely to stay around and not be pushed out, or if the rates are much higher compared to a marginal increase in risk. I'm leaning toward the latter. It looks like income has the same effect for home loans; <50k has a much higher return simply because they get a huge rating hit.
My other big hit is 3 versus 5-year terms. Anyone here care to comment? I like the 36 months because it feels more liquid and when I started I wasn't sure LendingClub was going to be around for a decade or more. Beginning to think I should reconsider that stance.
For students going to a particular university, math SAT scores are inversely correlated with verbal. If students had a higher math SAT and a higher verbal SAT, they'd be at a better school (and worse math + worse verbal = worse school).
For debtors inside a certain grade, it looks like employment history and other creditworthiness metrics are inversely correlated. So I suspect that it's less employment length being an anti-signal, but rather within the grade people with short employment length have compensatory advantages to stay in that grade.
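To make the selection effect above concrete, here is a minimal simulation sketch. It assumes two independent, made-up creditworthiness factors and a hypothetical grading rule that assigns a grade from their sum; none of this is Lending Club's actual model.

```python
import random


def correlation(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def simulate(n=20000, seed=0):
    """Return (overall correlation, correlation within one 'grade')."""
    rng = random.Random(seed)
    # Two independent, hypothetical factors (standardized scores).
    employment = [rng.gauss(0, 1) for _ in range(n)]
    other = [rng.gauss(0, 1) for _ in range(n)]
    # Hypothetical grading rule: a grade is a narrow band of the total score.
    band = [(e, o) for e, o in zip(employment, other) if -0.25 <= e + o <= 0.25]
    overall = correlation(employment, other)
    within = correlation([e for e, _ in band], [o for _, o in band])
    return overall, within


if __name__ == "__main__":
    overall, within = simulate()
    print(f"overall: {overall:.2f}, within grade: {within:.2f}")
```

Across all borrowers the two factors are unrelated by construction, but inside a single band they come out strongly negatively correlated, which is the compensation effect described above.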
Their current offering document is here[1], but I don't see much mention of specifics. There's some detail on mapping to grades on p42 of the Aug 22 doc, as well as interest rates charged for each risk category.
If you only focus on return, you will draw erroneous conclusions. The upper limit on return is set by the interest rate. An A grade loan that carries an interest rate of 5% will never give you a return of 10%, while a D grade loan carrying an 18% interest rate has some chance of giving you a 10% return, assuming you hold loans to maturity. If you only consider returns, your findings will always be biased toward high interest (supposedly high risk) loans. Similarly, comparing returns across vintages will generate erroneous conclusions.
You need to take into consideration Interest Rate and Default/Loss Rate while considering the validity of the relationship.
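A rough sketch of that point, under a deliberately crude one-year model that is my own simplification: a fraction `loss_rate` of principal is lost outright and the rest pays the full coupon. The function names are made up; the 5% and 18% rates are the ones from the example above.

```python
def net_return(interest_rate, loss_rate):
    """One-year net return: surviving principal earns interest, the rest is lost."""
    return (1 - loss_rate) * interest_rate - loss_rate


def max_loss_for_target(interest_rate, target):
    """Largest loss rate that still reaches `target`, or None if unreachable."""
    if interest_rate < target:
        return None  # even with zero losses the coupon caps the return
    # Solve (1 - l) * r - l = t for l.
    return (interest_rate - target) / (1 + interest_rate)


if __name__ == "__main__":
    print(max_loss_for_target(0.05, 0.10))  # A grade: the target is out of reach
    print(max_loss_for_target(0.18, 0.10))  # D grade: reachable below this loss rate
```

Under this toy model the A grade loan can never reach 10% whatever its loss rate, while the D grade loan reaches it only if losses stay under the break-even rate. That is why the interest rate and the loss rate have to be read together.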
The Employment Length on its own is a poor indicator (statistically insignificant). You need to at least combine employment length with credit age (when the first credit line was opened) and Income to improve predictability.
There is no vintage of Lending Club 5-year term loans that has fully matured. The first 5-year term loan was issued in early 2010 (IIRC, May) so the first vintage is just coming up to full maturity.
You need to determine whether a 5-year term loan compensates you, relative to a 3-year term loan, for potential changes in inflation, higher default risk, and longer maturity.
> if the rates are much higher compared to a marginal increase in risk
Exactly. Same thing with public records -- having a public record could very well mean you have a higher default rate, but what is important is that LC punishes it more than they should, so it's a value investment.
We invest via a model (not filters) and the whole idea behind the model isn't to find what criteria makes someone less likely to default in absolute terms, but what makes one D2 loan less likely to default than another D2 loan.
I'm no expert here but current job duration may be counter-intuitive. We usually think that the longer the better but it may also correlate with inability to find another job, so in fact during a recession, these kinds of people may actually be more at risk than people used to job-hop.
It probably needs to be controlled for social class and job type but I think this could explain the phenomenon.
> but this says that <2 years of experience is better than longer
When you say better, do you mean risk or return? First of all, you are looking at the variable in a "univariate" sense, i.e., the relationship of the default rate or return to the categories of this variable. But their internal model is multivariate - there may be other factors influencing risk or return which are not obvious in the univariate dimension. It also depends on the power of this variable in predicting risk. And finally, lower risk may mean lower return - you just need to find the efficient frontier :) - http://www.investopedia.com/terms/e/efficientfrontier.asp.
The code to process the Lending Club data is written in Python. I got the full history of all payments made on the platform and pre-processed it into something light enough to be explored with a good user experience.
For the viz, yes, I used DC.js. I can open-source the .js if you guys want it.
Hi Clement, very neat dataviz! How did you get the full history of all payments? Is it available in the open? What is 100mdeep, btw? Just a blogging site you are bootstrapping, or any plans to make a living out of it? (We met a few months ago @ ToucanToco...)
Nice image! I'm a big fan of Guillaume Nery, the French free diver (4 world records). Actually, constant weight divers are allowed to use fins (monofins most of the time), although they use them only to get to ~30m deep. After that point, the pressure is such that the volume of the body decreases and Archimedes' thrust is no longer sufficient to compensate for the weight... so the diver can descend without any movement... It's very impressive to see (http://www.liveleak.com/view?i=d4a_1244462128).
Funny to read about this a few days after Guillaume Nery's accident: a line was set to the wrong depth, causing the diver to dive to -139m/-456ft instead of the -129m/-423ft that he had announced the night before...
Thanks Clement for this beautifully simple dc.js dataviz. How long did you play with it before finding the pearl? Do you think there are yet other pearls to find in your tool?
There are many pearls to find. It depends on your 'set of preferences'. Do you want return? Do you want low risk? Do you care about deploying a lot of money, or not that much?
But I'd say that, for starters, anything in the 8%-9% range is a very good deal these days in this environment.
Very useful tool. Gives valuable insight on how to select filters in portfolio construction. If "the Pearl" were an existing product, I would definitely invest in it.
Look for the Pearl!
You can definitely get 8% interest over the long run with good liquidity on your deployed cash. To me, it's totally worth it for money that you don't need in the really short term.
I've been using it for about a month and it's one of the better LendingClub auto-investors out there. By far the best interface, but the fee is a little steep, though it only takes effect after investing 10k through them. I've rolled my own auto-investor in the past and used the one on NSRPlatform (free for accounts <20k). I'm planning to switch over to LendingRobot 100% when I can't use NSRPlatform for free anymore.
Thanks! Any recommendations for someone looking to get into this space, especially in terms of things you wish you had known starting off? (I'm looking at this as a "long shot" diversification of a portion of my savings, allocating a comparatively very small amount and going entirely hands off.)
I have all my LendingClub funds in a Roth IRA so I don't have to deal with any of the tax loss/gains stuff. I've heard the tax accounting can get tricky in normal accounts, so I highly recommend putting your LendingClub funds in a tax-advantaged account.
As for filtering and stuff, you can do your own underwriting (loan risk analysis) with the data LendingClub provides[1], or use https://www.nsrplatform.com, which has a nice GUI tool to explore the data with your own filters. LendingClub has a JSON API, so you can write an order executor for yourself. (Here are the remnants of the one I was working on: https://github.com/gtremper/LoanInvestor. P2P-Picks was a 3rd party underwriter that isn't available anymore.) I've noticed that the D and E loans tend to be the best balance of risk and return.
Also, be aware that you'll need to continuously buy new notes as payments come in to your account, otherwise you'll build up cash rather quickly. That's why these auto-investing services are so useful. It's best to buy only $25 (the minimum) per loan so you can spread your risk among as many notes as possible.
I operate an online crowd-lending analytics and automation platform, PeerCube (https://www.peercube.com). I have been analyzing both Lending Club and Prosper data for my institutional clients for almost 4 years now. While OP made a good first attempt at analyzing the data, the analysis suffers from two major shortcomings that I normally see from people getting started with data analysis.
1. Domain Knowledge: Novice analysts tend to put the data in a blender and see what comes out first instead of building some preliminary knowledge and intuition about the domain. This is quite evident in OP's analysis and finding about annual income. A person familiar with the domain will ask the question "Why would a borrower with a high annual income borrow a small amount at a high interest rate?" This right away will raise flags about the risks of lending to such borrowers. OP will benefit from reading some of the publications (books, research) on credit scoring and modeling before diving deep into analyzing Lending Club data.
2. Data Exploration: Not spending enough time exploring the data can lead to erroneous conclusions like the second chance strategy. When Lending Club started issuing loans to borrowers with delinquencies and public records has a big impact on returns, as newer loans are not aged enough to have sufficient defaults.
> Watch for your average return (expected return), consistency of returns through time (risk), while making sure there is enough supply (liquidity) on the platform to deploy your strategy.
Time is not Risk. You need to find a proper measure for risk. Also consider the shape of the return distribution: negative kurtosis, and frequent small positive returns alongside a few large negative returns.
> I considered that investors deploy and re-invest their money continuously on the platform and therefore own a portfolio with different ‘vintages’ of loans. The ROI that are computed reflect this, as they are average returns across vintages.
Re-consider this argument of "average return across vintages" being representative of investor returns. Tip: look at loan volume across vintages as well as typical re-investment pattern of a typical investor.
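To illustrate the tip, here is a small sketch comparing a simple average across vintages with a volume-weighted one. The vintage figures are made-up placeholders, not Lending Club data, and the function names are my own.

```python
# Hypothetical vintages: (year, ROI, dollars issued). Numbers are invented.
VINTAGES = [
    (2008, 0.02, 1_000_000),
    (2010, 0.06, 10_000_000),
    (2012, 0.08, 50_000_000),
]


def simple_average(vintages):
    """Every vintage counts equally, regardless of how much was issued."""
    return sum(roi for _, roi, _ in vintages) / len(vintages)


def volume_weighted_average(vintages):
    """Each vintage counts in proportion to the dollars issued in it."""
    total = sum(volume for _, _, volume in vintages)
    return sum(roi * volume for _, roi, volume in vintages) / total


if __name__ == "__main__":
    print(f"simple:          {simple_average(VINTAGES):.4f}")
    print(f"volume-weighted: {volume_weighted_average(VINTAGES):.4f}")
```

When issuance grows quickly, the two averages diverge, and which one is closer to a given investor's experience depends on when and how that investor actually deployed and re-invested money.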
> Please also note than due to the low issuance volume in the early days of the platform, the returns computed for the pre-2010 period are much less reliable than the post-2010 returns.
Please don't do this. The data between 2006 and 2010 is the most valuable due to the business cycle we were in at that time. The data since 2010 tells us nothing about how loans might perform in the future when the business cycle is not as good as it has been in the last few years.
OP will really benefit from re-evaluating his findings with a critical eye. I suggest gaining some domain knowledge and spending a lot of time just exploring the data before drawing definite conclusions, focusing on distributions, correlations and statistical significance.
A bit more courtesy would have been welcome. You sound very condescending. And I hope you talk to your clients in a different way!
Let me address your methodology comments nonetheless, which are for the most part unfounded.
* I don't have any finding about annual income. I don't think it is mentioned anywhere in my conclusions.
* "delinquencies and public records has a big impact on returns as newer loans are not aged enough": because I average across vintages, and because I don't average based on volume on the platform, I account for the aging bias.
* "Time is not risk [..] kurtosis etc.": I don't say that time is risk. I suggest that the reader look at the return series through time, essentially at the volatility of the returns (without using the word volatility, to keep the content accessible to a novice reader). I essentially encourage the reader to visually assess his Sharpe ratio, which is a good universal risk measure.
* "reconsider average across vintage": averaging across vintage is a first approximation. I acknowledge the fact that a better methodology would be to take a weighted average that matches the amortization profile of a loan.
* I maintain that any statistic you compute for 2006, 2007 or 2008 is less reliable (statistically). Yes, it is an important period to have because of the crisis, and this is why I put it on the chart. However, you can't compute very reliable returns when you have a dozen loans to average across.
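For readers who prefer a number to a visual check, here is a minimal sketch of the Sharpe ratio mentioned above: mean excess return divided by the sample standard deviation of returns. The two return series and the 1% risk-free rate are invented for illustration.

```python
from statistics import mean, stdev


def sharpe_ratio(returns, risk_free_rate=0.0):
    """Mean excess return per unit of volatility (sample standard deviation)."""
    excess = [r - risk_free_rate for r in returns]
    volatility = stdev(excess)
    if volatility == 0:
        raise ValueError("returns have zero volatility; Sharpe ratio is undefined")
    return mean(excess) / volatility


# Hypothetical yearly returns for two strategies with the same average return.
STEADY = [0.08, 0.09, 0.08, 0.09]
VOLATILE = [0.20, -0.03, 0.15, 0.02]

if __name__ == "__main__":
    print(f"steady:   {sharpe_ratio(STEADY, 0.01):.2f}")
    print(f"volatile: {sharpe_ratio(VOLATILE, 0.01):.2f}")
```

Both series average the same return, but the steadier one scores a much higher ratio. With only a handful of observations per series, the estimate itself is noisy, which is the same small-sample caveat raised about the early vintages.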
Anyway, I am happy to exchange with you by PM on methodology if you would like to continue the discussion.
Sorry for coming across as condescending. This was not my intention. I was just trying to guide you in the right direction, as you came across as someone who is just getting started with data analysis.
Once again, I will stress, you need to reconsider your methodology if you want to learn.
[1] https://en.wikipedia.org/wiki/Risk-adjusted_return_on_capita...