| I operate an online crowd-lending analytics and automation platform, PeerCube (https://www.peercube.com). I have been analyzing both Lending Club and Prosper data for my institutional clients for almost 4 years now. While OP made a good first attempt at analyzing the data, the analysis suffers from two major shortcomings that I normally see from people getting started with data analysis. 1. Domain Knowledge: Novice analysts tend to put the data in a blender and see what comes out first instead of building some preliminary knowledge and intuition about the domain. This is quite evident in OP's analysis and finding about annual income. A person familiar with the domain will ask the question "Why would a borrower with a high annual income borrow a small loan amount at a high interest rate?" This right away will raise flags about the risks of lending to such borrowers. OP will benefit from reading some of the publications (books, research) on credit scoring and modeling before deep diving into analyzing Lending Club data. 2. Data Exploration: Not spending enough time exploring the data can lead to erroneous conclusions like the "second chance" strategy. The date when Lending Club started issuing loans to borrowers with delinquencies and public records has a big impact on returns, as newer loans are not aged enough to have sufficient defaults. > Watch for your average return (expected return), consistency of returns through time (risk), while making sure there is enough supply (liquidity) on the platform to deploy your strategy. Time is not Risk. You need to find a proper measure for risk. Also consider the negative kurtosis and the nature of the return distribution: frequent low positive returns but a few high negative returns. > I considered that investors deploy and re-invest their money continuously on the platform and therefore own a portfolio with different ‘vintages’ of loans. The ROI that are computed reflect this, as they are average returns across vintages. 
Re-consider this argument of "average return across vintages" being representative of investor returns. Tip: look at loan volume across vintages as well as the re-investment pattern of a typical investor. > Please also note that due to the low issuance volume in the early days of the platform, the returns computed for the pre-2010 period are much less reliable than the post-2010 returns. Please don't do this. The data between 2006 and 2010 is the most valuable due to the business cycle we were in at that time. The data since 2010 tells nothing about how loans might perform in the future when the business cycle is not as good as it has been in the last few years. OP will really benefit from re-evaluating his findings with a critical eye. I would suggest gaining some domain knowledge and spending a lot of time just exploring the data before drawing definite conclusions, focusing on distributions, correlations and statistical significance. |
Let me address your methodology comments nonetheless, which are for the most part unfounded.
* I don't have any finding about annual income. I don't think it is mentioned anywhere in my conclusions.
* "delinquencies and public records has a big impact on returns as newer loans are not aged enough": because I average across vintages, and because I don't weight that average by the volume issued on the platform, I account for the aging bias. Recent, high-volume vintages that have not yet had time to default do not dominate the result.
* "Time is not risk [..] kurtosis etc.": I don't say that time is risk. I suggest that the reader look at the return series through time, which essentially means looking at the volatility of the returns (without using the word "volatility", to keep the content accessible to a novice reader). In effect I encourage the reader to visually assess the Sharpe ratio, which is a good universal risk measure.
* "reconsider average across vintage": averaging across vintages is a first approximation. I acknowledge that a better methodology would be to take a weighted average that matches the amortization profile of a loan.
* I maintain that any statistic you compute for 2006, 2007 or 2008 is statistically less reliable. Yes, it is an important period to have because of the crisis, and this is why I put it on the chart. However, you can't compute very reliable returns when you have only a dozen loans to average across.
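
To make the averaging and sample-size points above a bit more concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: the per-vintage figures are made up (they are not Lending Club data), the function names are hypothetical, and the 20% per-loan return dispersion used for the standard error is an arbitrary assumption. It compares an equal-weighted average across vintages with a volume-weighted one, computes a simple return-to-volatility ratio in the spirit of a Sharpe ratio, and shows how the standard error of an average return grows when only a handful of loans are available.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical per-vintage figures, made up purely for illustration:
# (vintage year, number of loans issued, average annual return)
VINTAGES = [
    (2008, 12, -0.02),
    (2009, 400, 0.04),
    (2010, 2000, 0.06),
    (2011, 8000, 0.07),
]


def equal_weighted_return(rows):
    """Average return across vintages, each vintage counting once."""
    return mean(ret for _, _, ret in rows)


def volume_weighted_return(rows):
    """Average return across vintages, weighted by loans issued."""
    total_loans = sum(n for _, n, _ in rows)
    return sum(n * ret for _, n, ret in rows) / total_loans


def return_to_volatility(rows, risk_free=0.0):
    """Sharpe-style ratio: mean excess return over the sample
    standard deviation of the per-vintage returns."""
    returns = [ret for _, _, ret in rows]
    return (mean(returns) - risk_free) / stdev(returns)


def standard_error(per_loan_stdev, n_loans):
    """Standard error of an average return over n_loans loans,
    assuming independent loans with the same dispersion."""
    return per_loan_stdev / sqrt(n_loans)


if __name__ == "__main__":
    print("equal-weighted :", round(equal_weighted_return(VINTAGES), 4))
    print("volume-weighted:", round(volume_weighted_return(VINTAGES), 4))
    print("return/vol     :", round(return_to_volatility(VINTAGES), 2))
    # Assumed 20% dispersion of individual loan returns (arbitrary).
    for _, n, _ in VINTAGES:
        print(n, "loans -> standard error", round(standard_error(0.20, n), 4))
```

With these made-up numbers, the volume-weighted average is pulled toward the large recent vintages, while the equal-weighted average gives the small early vintages the same say. The standard error for a vintage of a dozen loans is also many times larger than for a vintage of several thousand, which is the sense in which the early-period figures are less reliable.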
Anyway, I am happy to discuss methodology with you by PM if you would like to continue the discussion.