Lending Club Data Analysis

LendingClub is a US peer-to-peer lending company headquartered in San Francisco, California, and has helped over 2.5 million customers simplify their finances in the last 10 years. LendingClub improves the loan process for borrows by offering a fast and easy online application. For investors, the company offers historical returns of 3 – 8% and anyone can invest with as little as $1000 [1].

Because LendingClub relies heavily on technology to evaluate their borrowers, getting an accurate risk analysis for each applicant requires systems which can quickly assess the applications, and upon approval, offer these loans to interested investors at a given interest rate. Of the $7.9 billion dollars loaned in 2018, $233 million was written off as defaulted loans. While this may seem insignificant at 2.9%, this does represent risks and losses which investors and the company would prefer to avoid. In order to mitigate risk, lending companies traditionally apply a fitting interest rate to each loan. For example, loans for a home or a car may have lower interest rates because the risk is reduced due to directly related collateral. In another example, someone with a poor credit history or having declared bankruptcy may have a higher interest rate due to the inherent risk of history repeating.

LendingClub provides an anonymized data set [2] of all their current and completed loans available for download on their website. Our goal was to use the data set to try and understand which data points may contribute to the interest rate designated to the loans. We reviewed the data set of 107,000+ observations (Appendix 1) and the accompanying data dictionary (Appendix 2) of the ~120 data categories included for each loan, and decided that we wanted to build a model which included 7 categories, resulting in 8 independent variables due to dummy variables, to try and predict if there is any relationship between these independent variables and the interest rate on the loan. This relationship could be described by the basic model Equation (1) below:

Int_rate = b0 – b1(loan_amount) + b2(funded_amount) – b3(annual_income) – b4(rent) – b5(last_payment_amount) + b6(debt_to_income_ratio) + b7(open_accounts) – b8(total_accounts) – b9(mortgage) (1)

Table 1: Independent variables include

b1 Loan Amount The listed amount of the loan applied for by the borrower. If at some point in time, the credit department reduces the loan amount, then it will be reflected in this value.
b2 Funded Amount The total amount committed to that loan by investors.
b3 Annual Income The self-reported annual income provided by the borrower during registration.
b4   b9 Home Ownership Mortgage, Renter or Own
b5 Last Payment Amount The amount of the last payment made by the borrower
b6 Debt to income ratio A ratio calculated using the borrower’s total monthly debt payments on the total debt obligations, excluding mortgage and the requested LC loan, divided by the borrower’s self-reported monthly income.
b7 Open accounts The number of open credit lines in the borrower’s credit file.
b8 Total Accounts The total number of credit lines currently in the borrower’s credit file.

Note: Home ownership was divided into to two dummy variables, mortgage and rent. Therefore, if both values are zero, the observation is for an individual who owns property outright.

Table 2: Results Summary

Analysis of results

            In initially looking at the overall results, we can take away a few points. After running our regression analysis on the data set, we found we had an R2 of 0.0308 using the stated independent variables. Traditionally, this is not a strong R2 value for cross sectional data. In general, the acceptable to excellent range for this type of data would be from 0.3 to 0.7. However, we can take a deeper look at the results to see if we can deduct any other information. Next, we can take a look at the F-test. The F-test tells us if the independent variables, as a group, explain a statistically significant share of variation in the dependent variable. Our results included an F (calculated value) and the test is shown below:

F-TEST:

Null hypothesis H0: R2 = 0

Alternative hypothesis HA: R2 > 0

The null can be stated alternatively as the model has explanatory power; the alternative is then that the model has no explanatory power.

If the value of  F Calculated is greater than, or equal to the F Critical (From F Table) 1.88 (9 DoF x ∞ DoF) we can reject H0, if the value of F Calculated is less than F Critical we fail to reject H0.

In looking at the results, we find a calculated F value of 379 which is greater than the table value of 1.88. Therefore, we can reject H0. This implies that the model has explanatory power; however, we must look at several other factors before validating the model.

T-TEST:

Our next step is to look at each independent variable and its relationship to interest rate. This can be validated through a T-test, as shown below. These calculated t values shown in Table 3 are compared to the threshold value of 2.262.

Table 3: Estimate regression coefficients of interest rate relating to various variables

Null hypothesis H0: βi = 0

Alternative Hypothesis HA: βi ≠ 0

β = coefficient of variable being checked/Std Error of coefficient

If the absolute value of t Calculated is greater than or equal to t Critical (From the t Table) 2.262 (9 DoF x 0.05) we can reject H0, if the absolute value of t Calculated is less than the t Critical we fail to reject H0.

Comparing the t Critical value to each independent variable, we can see that we would reject H0 for annual income, home ownership status (rent/mortgage), last payment amount, debt to income ratio and both open and total accounts. On the other hand, we would fail to reject H0 for the variables loan amount and funded amount. These are calculated at the 95% confidence level.

P-TEST:

A third test that can be used to validate the model is the P test.

Null hypothesis H0: βi = 0

Alternative Hypothesis HA: βi ≠ 0

If the absolute value of the P value is less than or equal to 0.05, we can reject H0, if the P value is greater than 0.05, which is the P critical value, we fail to reject H0.

In reviewing the P values in Table 3, we can see that they all fall below the threshold of 0.05, or a confidence level equal to, or above 95%. This implies that all variables are relevant, contradicting some of our results from the T test shown above.

ELASTICITIES:

Elasticities were calculated in order to determine the magnitude of effect of each independent variable on the dependent variable. Elasticity is defined as the percent change in the dependent variable as a result of a percent change in the independent variable.  The elasticities and formula are shown below in Table 4.

Table 4: Elasticities

            According to Table 4 above, the loan amount and funded amount have the greatest impact on the interest rate. The greatest impact is determined through the largest absolute value. The anomaly in this is that the loan amount variable has a negative elasticity and funded amount has a positive elasticity. This does not make sense given that most of the observations have the same value for these variables. It would make sense that the loan amount has a negative elasticity because a large loan amount generally lean towards something that contains equity, such as a car or house. These types of loans typically have lower interest rates and lower risk due to the availability of collateral. Alternatively, an account such as a credit card generally has a small balance and a high interest rate due to high risk and no collateral. Other interesting trends to note here would be the elasticities associated with open accounts and total accounts. A negative elasticity on open accounts is intuitive because if a person has many open accounts (large amounts of debt), risk is increased. Alternatively, a person with a large number of total accounts is assumed to have a lot of credit history and may be assumed to be less of a risk, therefore showing a negative elasticity.

Conclusion

            In conclusion, the model described above has some characteristics that are intuitive given the input variables used; however the overall model is not great. The R2 value is very low for cross-sectional data despite having passed the F-test. All of the variables pass the P-test but a majority of them do not pass the T-test. The model could be improved if other variables were available to test. Things that might impact the interest rate could be credit score or the purpose of the loan. We did have access to credit “grade” which we believe is related to credit score, but these variables showed colinearity when included in the model. Additionally, the data for “purpose of the loan” was available in the data set, yet the inputs were not uniform. In order to include this in the model, we would need to adjust the observation values for each of the 107,000 data points.  Overall, we have a valid model with plenty of room for improvement.

Appendix 1

Screenshot of example data set to be used.

Appendix 2

Included below is a sample of the data set, along with the metadata, explaining the fields, and the descriptions.

Field Description
acceptD The date which the borrower accepted  the offer
accNowDelinq The number of accounts on which the borrower is now delinquent.
accOpenPast24Mths Number of trades opened in past 24 months.
addrState The state provided by the borrower in the loan application
all_util Balance to credit limit on all trades
annual_inc_joint The combined self-reported annual income provided by the co-borrowers during registration
annualInc The self-reported annual income provided by the borrower during registration.
application_type Indicates whether the loan is an individual application or a joint application with two co-borrowers
avg_cur_bal Average current balance of all accounts
bcOpenToBuy Total open to buy on revolving bankcards.
bcUtil Ratio of total current balance to high credit/credit limit for all bankcard accounts.
chargeoff_within_12_mths Number of charge-offs within 12 months
collections_12_mths_ex_med Number of collections in 12 months excluding medical collections
creditPullD The date LC pulled credit for this loan
delinq2Yrs The number of 30+ days past-due incidences of delinquency in the borrower’s credit file for the past 2 years
delinqAmnt The past-due amount owed for the accounts on which the borrower is now delinquent.
desc Loan description provided by the borrower
dti A ratio calculated using the borrower’s total monthly debt payments on the total debt obligations, excluding mortgage and the requested LC loan, divided by the borrower’s self-reported monthly income.
dti_joint A ratio calculated using the co-borrowers’ total monthly payments on the total debt obligations, excluding mortgages and the requested LC loan, divided by the co-borrowers’ combined self-reported monthly income
earliestCrLine The date the borrower’s earliest reported credit line was opened
effective_int_rate The effective interest rate is equal to the interest rate on a Note reduced by Lending Club’s estimate of the impact of uncollected interest prior to charge off.
emp_title The job title supplied by the Borrower when applying for the loan.*
empLength Employment length in years. Possible values are between 0 and 10 where 0 means less than one year and 10 means ten or more years.
expD The date the listing will expire
expDefaultRate The expected default rate of the loan.
ficoRangeHigh The upper boundary range the borrower’s FICO at loan origination belongs to.
ficoRangeLow The lower boundary range the borrower’s FICO at loan origination belongs to.
fundedAmnt The total amount committed to that loan at that point in time.
grade LC assigned loan grade
homeOwnership The home ownership status provided by the borrower during registration. Our values are: RENT, OWN, MORTGAGE, OTHER.
id A unique LC assigned ID for the loan listing.
il_util Ratio of total current balance to high credit/credit limit on all install acct
ils_exp_d wholeloan platform expiration date
initialListStatus The initial listing status of the loan. Possible values are – W, F
inq_fi Number of personal finance inquiries
inq_last_12m Number of credit inquiries in past 12 months
inqLast6Mths The number of inquiries in past 6 months (excluding auto and mortgage inquiries)
installment The monthly payment owed by the borrower if the loan originates.
intRate Interest Rate on the loan
isIncV Indicates if income was verified by LC, not verified, or if the income source was verified
listD The date which the borrower’s application was listed on the platform.
loanAmnt The listed amount of the loan applied for by the borrower. If at some point in time, the credit department reduces the loan amount, then it will be reflected in this value.
max_bal_bc Maximum current balance owed on all revolving accounts
memberId A unique LC assigned Id for the borrower member.
mo_sin_old_rev_tl_op Months since oldest revolving account opened
mo_sin_rcnt_rev_tl_op Months since most recent revolving account opened
mo_sin_rcnt_tl Months since most recent account opened
mortAcc Number of mortgage accounts.
msa Metropolitan Statistical Area of the borrower.
mths_since_last_major_derog Months since most recent 90-day or worse rating
mths_since_oldest_il_open Months since oldest bank installment account opened
mths_since_rcnt_il Months since most recent installment accounts opened
mthsSinceLastDelinq The number of months since the borrower’s last delinquency.
mthsSinceLastRecord The number of months since the last public record.
mthsSinceMostRecentInq Months since most recent inquiry.
mthsSinceRecentBc Months since most recent bankcard account opened.
mthsSinceRecentLoanDelinq Months since most recent personal finance delinquency.
mthsSinceRecentRevolDelinq Months since most recent revolving delinquency.
num_accts_ever_120_pd Number of accounts ever 120 or more days past due
num_actv_bc_tl Number of currently active bankcard accounts
num_actv_rev_tl Number of currently active revolving trades
num_bc_sats Number of satisfactory bankcard accounts
num_bc_tl Number of bankcard accounts
num_il_tl Number of installment accounts
num_op_rev_tl Number of open revolving accounts
num_rev_accts Number of revolving accounts
num_rev_tl_bal_gt_0 Number of revolving trades with balance >0
num_sats Number of satisfactory accounts
num_tl_120dpd_2m Number of accounts currently 120 days past due (updated in past 2 months)
num_tl_30dpd Number of accounts currently 30 days past due (updated in past 2 months)
num_tl_90g_dpd_24m Number of accounts 90 or more days past due in last 24 months
num_tl_op_past_12m Number of accounts opened in past 12 months
open_acc_6m Number of open trades in last 6 months
open_il_12m Number of installment accounts opened in past 12 months
open_il_24m Number of installment accounts opened in past 24 months
open_act_il Number of currently active installment trades
open_rv_12m Number of revolving trades opened in past 12 months
open_rv_24m Number of revolving trades opened in past 24 months
openAcc The number of open credit lines in the borrower’s credit file.
pct_tl_nvr_dlq Percent of trades never delinquent
percentBcGt75 Percentage of all bankcard accounts > 75% of limit.
pub_rec_bankruptcies Number of public record bankruptcies
pubRec Number of derogatory public records
purpose A category provided by the borrower for the loan request.
reviewStatus The status of the loan during the listing period. Values: APPROVED, NOT_APPROVED.
reviewStatusD The date the loan application was reviewed by LC
revolBal Total credit revolving balance
revolUtil Revolving line utilization rate, or the amount of credit the borrower is using relative to all available revolving credit.
serviceFeeRate Service fee rate paid by the investor for this loan.
subGrade LC assigned loan subgrade
tax_liens Number of tax liens
term The number of payments on the loan. Values are in months and can be either 36 or 60.
title The loan title provided by the borrower
tot_coll_amt Total collection amounts ever owed
tot_cur_bal Total current balance of all accounts
tot_hi_cred_lim Total high credit/credit limit
total_bal_il Total current balance of all installment accounts
total_cu_tl Number of finance trades
total_il_high_credit_limit Total installment high credit/credit limit
total_rev_hi_lim  Total revolving high credit/credit limit
totalAcc The total number of credit lines currently in the borrower’s credit file
totalBalExMort Total credit balance excluding mortgage
totalBcLimit Total bankcard high credit/credit limit
url URL for the LC page with listing data.
verified_status_joint Indicates if the co-borrowers’ joint income was verified by LC, not verified, or if the income source was verified
zip_code The first 3 numbers of the zip code provided by the borrower in the loan application.
revol_bal_joint  Sum of revolving credit balance of the co-borrowers, net of duplicate balances
sec_app_fico_range_low  FICO range (high) for the secondary applicant
sec_app_fico_range_high  FICO range (low) for the secondary applicant
sec_app_earliest_cr_line  Earliest credit line at time of application for the secondary applicant
sec_app_inq_last_6mths  Credit inquiries in the last 6 months at time of application for the secondary applicant
sec_app_mort_acc  Number of mortgage accounts at time of application for the secondary applicant
sec_app_open_acc  Number of open trades at time of application for the secondary applicant
sec_app_revol_util  Ratio of total current balance to high credit/credit limit for all revolving accounts
sec_app_open_act_il  Number of currently active installment trades at time of application for the secondary applicant
sec_app_num_rev_accts  Number of revolving accounts at time of application for the secondary applicant
sec_app_chargeoff_within_12_mths  Number of charge-offs within last 12 months at time of application for the secondary applicant
sec_app_collections_12_mths_ex_med  Number of collections within last 12 months excluding medical collections at time of application for the secondary applicant
sec_app_mths_since_last_major_derog  Months since most recent 90-day or worse rating at time of application for the secondary applicant
disbursement_method The method by which the borrower receives their loan. Possible values are: CASH, DIRECT_PAY