Lending Club Data Analysis

LendingClub is a US peer-to-peer lending company headquartered in San Francisco, California, that has helped over 2.5 million customers simplify their finances in the last 10 years. LendingClub improves the loan process for borrowers by offering a fast and easy online application. For investors, the company offers historical returns of 3 to 8%, and anyone can invest with as little as $1000 ^[1].

Because LendingClub relies heavily on technology to evaluate its borrowers, getting an accurate risk analysis for each applicant requires systems that can quickly assess applications and, upon approval, offer the loans to interested investors at a given interest rate. Of the $7.9 billion loaned in 2018, $233 million was written off as defaulted loans. While this may seem insignificant at 2.9%, it represents risks and losses that investors and the company would prefer to avoid. To mitigate risk, lending companies traditionally apply an interest rate that fits each loan. For example, loans for a home or a car may have lower interest rates because the risk is reduced by directly related collateral. Conversely, someone with a poor credit history or a declared bankruptcy may be charged a higher interest rate because of the inherent risk of history repeating.

LendingClub provides an anonymized data set ^[2] of all its current and completed loans, available for download on its website. Our goal was to use the data set to understand which data points may contribute to the interest rate assigned to each loan. We reviewed the data set of 107,000+ observations (Appendix 1) and the accompanying data dictionary (Appendix 2) of the ~120 data categories included for each loan. We decided to build a model from 8 of these categories, resulting in 9 independent variables because home ownership is split into two dummy variables, to test whether there is any relationship between these independent variables and the interest rate on the loan. This relationship could be described by the basic model Equation (1) below:

Int_rate = b₀ – b₁(loan_amount) + b₂(funded_amount) – b₃(annual_income) – b₄(rent) – b₅(last_payment_amount) + b₆(debt_to_income_ratio) + b₇(open_accounts) – b₈(total_accounts) – b₉(mortgage)     (1)

Table 1: Independent variables

Coefficient	Variable	Description
b₁	Loan Amount	The listed amount of the loan applied for by the borrower. If at some point in time, the credit department reduces the loan amount, then it will be reflected in this value.
b₂	Funded Amount	The total amount committed to that loan by investors.
b₃	Annual Income	The self-reported annual income provided by the borrower during registration.
b₄, b₉	Home Ownership	Mortgage, Rent or Own, encoded as two dummy variables: rent (b₄) and mortgage (b₉).
b₅	Last Payment Amount	The amount of the last payment made by the borrower
b₆	Debt to income ratio	A ratio calculated using the borrower’s total monthly debt payments on the total debt obligations, excluding mortgage and the requested LC loan, divided by the borrower’s self-reported monthly income.
b₇	Open accounts	The number of open credit lines in the borrower’s credit file.
b₈	Total Accounts	The total number of credit lines currently in the borrower’s credit file.

Note: Home ownership was divided into two dummy variables, mortgage and rent. Therefore, if both values are zero, the observation is for an individual who owns property outright.
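The dummy-variable encoding described in the note can be sketched as follows. This is a minimal illustration, not the preprocessing actually used for the report: the function name `encode_home_ownership` is hypothetical, and because the report does not say how other labels (such as OTHER in the data dictionary) were handled, the sketch simply rejects them.

```python
from typing import Dict, List


def encode_home_ownership(status: str) -> Dict[str, int]:
    """Map a home-ownership label to two 0/1 dummy variables.

    OWN is the baseline category, so it is encoded as rent=0, mortgage=0.
    Any other label raises an error (an assumption made for this sketch).
    """
    label = status.strip().upper()
    if label not in {"RENT", "MORTGAGE", "OWN"}:
        raise ValueError(f"unexpected home ownership value: {status!r}")
    return {"rent": int(label == "RENT"), "mortgage": int(label == "MORTGAGE")}


# Three illustrative labels, one per category.
rows: List[str] = ["RENT", "MORTGAGE", "OWN"]
encoded = [encode_home_ownership(r) for r in rows]
```

Using one fewer dummy than there are categories avoids perfect collinearity with the intercept b₀, which is why outright ownership has no coefficient of its own in Equation (1).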

Table 2: Results Summary

Analysis of results

Looking first at the overall results, we can take away a few points. After running our regression analysis on the data set, we found an R² of 0.0308 using the stated independent variables. This is not a strong R² value for cross-sectional data; in general, the acceptable-to-excellent range for this type of data is 0.3 to 0.7. However, we can take a deeper look at the results to see whether we can deduce any other information. Next, we look at the F-test, which tells us whether the independent variables, as a group, explain a statistically significant share of the variation in the dependent variable. Our results included a calculated F value, and the test is shown below:
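For reference, R² is one minus the ratio of the residual sum of squares to the total sum of squares. The sketch below shows that calculation on a few made-up numbers; the toy values are purely illustrative and are not taken from the LendingClub data.

```python
from typing import Sequence


def r_squared(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Return 1 - SS_res / SS_tot for paired actual and predicted values."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Hypothetical toy data (not from the report).
toy_actual = [1.0, 2.0, 3.0, 4.0]
toy_predicted = [1.1, 1.9, 3.2, 3.8]
toy_r2 = r_squared(toy_actual, toy_predicted)
```

An R² of 0.0308 therefore means the fitted values account for only about 3% of the variation in interest rate around its mean.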

F-TEST:

Null hypothesis H0: R² = 0

Alternative hypothesis HA: R² > 0

The null can be stated alternatively as the model has no explanatory power; the alternative is then that the model has explanatory power.

If the value of F Calculated is greater than or equal to F Critical (from the F table), 1.88 (9 DoF x ∞ DoF), we can reject H0; if the value of F Calculated is less than F Critical, we fail to reject H0.

In looking at the results, we find a calculated F value of 379 which is greater than the table value of 1.88. Therefore, we can reject H0. This implies that the model has explanatory power; however, we must look at several other factors before validating the model.
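As a rough consistency check, the overall F statistic can be recovered from R², the number of observations and the number of regressors. The sketch below assumes exactly 107,000 observations (the report only says 107,000+), so the result is approximate; it comes out near 378, in line with the reported value of 379.

```python
def f_statistic(r2: float, n_obs: int, n_regressors: int) -> float:
    """Overall F statistic: (R^2 / k) / ((1 - R^2) / (n - k - 1))."""
    explained = r2 / n_regressors
    unexplained = (1.0 - r2) / (n_obs - n_regressors - 1)
    return explained / unexplained


# n_obs = 107_000 is an assumption; R^2 and the 9 regressors come from the report.
f_calc = f_statistic(r2=0.0308, n_obs=107_000, n_regressors=9)

F_CRITICAL = 1.88  # table value quoted in the report for (9, infinity) DoF
reject_h0 = f_calc >= F_CRITICAL
```

The formula also shows why a very low R² can still pass the F-test: with roughly 107,000 observations, the denominator becomes tiny, so even a small explained share produces a large F.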

T-TEST:

Our next step is to look at each independent variable and its relationship to interest rate. This can be validated through a T-test, as shown below. These calculated t values shown in Table 3 are compared to the threshold value of 2.262.

Table 3: Estimated regression coefficients of interest rate relating to various variables

Null hypothesis H0: βi = 0

Alternative Hypothesis HA: βi ≠ 0

t = coefficient of variable being checked / Std Error of coefficient

If the absolute value of t Calculated is greater than or equal to t Critical (from the t table), 2.262 (9 DoF x 0.05), we can reject H0; if the absolute value of t Calculated is less than t Critical, we fail to reject H0.

Comparing the t Critical value to each independent variable, we can see that we would reject H0 for annual income, home ownership status (rent/mortgage), last payment amount, debt to income ratio and both open and total accounts. On the other hand, we would fail to reject H0 for the variables loan amount and funded amount. These are calculated at the 95% confidence level.
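The decision rule above can be sketched in a few lines. The coefficient and standard error below are hypothetical, since Table 3 is not reproduced here. The sketch also computes a large-sample critical value from the standard normal distribution, on the assumption that the residual degrees of freedom (observations minus regressors minus one) are very large for a data set of this size; this is shown for comparison with the 2.262 threshold used in the report.

```python
from statistics import NormalDist


def t_statistic(coefficient: float, std_error: float) -> float:
    """Calculated t value: estimated coefficient divided by its standard error."""
    return coefficient / std_error


def large_sample_t_critical(alpha: float = 0.05) -> float:
    """Two-tailed critical value using the normal approximation to the t distribution."""
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)


def rejects_null(t_calc: float, t_crit: float) -> bool:
    """Reject H0 when |t calculated| is at least the critical value."""
    return abs(t_calc) >= t_crit


# Hypothetical coefficient and standard error, for illustration only.
example_t = t_statistic(coefficient=-0.5, std_error=0.2)

report_critical = 2.262                        # 9 DoF, as used in the report
large_n_critical = large_sample_t_critical()   # about 1.96
```

A calculated t value between roughly 1.96 and 2.262 in absolute terms would be rejected under the large-sample threshold but not under the 9-DoF threshold, so the choice of degrees of freedom matters for borderline variables.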

P-TEST:

A third test that can be used to validate the model is the P test.

Null hypothesis H0: βi = 0

Alternative Hypothesis HA: βi ≠ 0

If the P value is less than or equal to the significance level of 0.05, we can reject H0; if the P value is greater than 0.05, we fail to reject H0.

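In reviewing the P values in Table 3, we can see that they all fall below the 0.05 threshold, meaning each coefficient is significant at a confidence level of 95% or higher. This implies that all variables are relevant, contradicting some of our results from the T test shown above. The discrepancy likely comes from the critical value used in the T test: 2.262 corresponds to 9 degrees of freedom, whereas the residual degrees of freedom for a regression on 107,000+ observations are far larger, for which the two-tailed 5% critical value is about 1.96.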
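The link between a calculated t value and its P value can be sketched as below. The sketch uses the standard normal distribution as a large-sample approximation to the t distribution, which is an assumption on our part and not necessarily how the regression software computed the values in Table 3.

```python
from statistics import NormalDist


def two_sided_p_value(t_calc: float) -> float:
    """Two-sided P value for a t statistic under the normal approximation."""
    return 2.0 * (1.0 - NormalDist().cdf(abs(t_calc)))


# Illustrative t values (not taken from Table 3).
p_borderline = two_sided_p_value(1.96)  # very close to 0.05
p_between = two_sided_p_value(2.1)      # below 0.05, yet |t| < 2.262
p_weak = two_sided_p_value(1.5)         # above 0.05
```

When the same degrees of freedom are used for both, the T test and the P test are two views of the same comparison and always lead to the same decision.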

ELASTICITIES:

Elasticities were calculated in order to determine the magnitude of the effect of each independent variable on the dependent variable. Elasticity is defined as the percent change in the dependent variable that results from a one percent change in the independent variable. The elasticities and formula are shown below in Table 4.

Table 4: Elasticities

According to Table 4 above, the loan amount and funded amount have the greatest impact on the interest rate, where the greatest impact is determined by the largest absolute value. The anomaly is that the loan amount variable has a negative elasticity while funded amount has a positive elasticity. This does not make sense given that most of the observations have the same value for these two variables. It would make sense for the loan amount to have a negative elasticity because large loan amounts generally lean toward something that contains equity, such as a car or house. These types of loans typically have lower interest rates and lower risk due to the availability of collateral. Alternatively, an account such as a credit card generally has a small balance and a high interest rate due to high risk and no collateral.

Other interesting trends to note are the elasticities associated with open accounts and total accounts. A positive elasticity on open accounts is intuitive because if a person has many open accounts (large amounts of debt), risk is increased. Alternatively, a person with a large number of total accounts is assumed to have a lot of credit history and may be considered less of a risk, therefore showing a negative elasticity.
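The formula from Table 4 is not reproduced in this text. One common way to obtain elasticities from a linear regression is the point elasticity at the sample means, the coefficient multiplied by the ratio of the mean of the independent variable to the mean of the dependent variable. The sketch below assumes that definition, which may differ from the one actually used, and the numbers are hypothetical.

```python
from statistics import mean
from typing import Sequence


def elasticity_at_means(
    coefficient: float, x_values: Sequence[float], y_values: Sequence[float]
) -> float:
    """Point elasticity at the means: b_i * mean(x_i) / mean(y)."""
    return coefficient * mean(x_values) / mean(y_values)


# Hypothetical coefficient and observations, for illustration only.
toy_elasticity = elasticity_at_means(
    coefficient=0.5,
    x_values=[10.0, 20.0, 30.0],
    y_values=[8.0, 10.0, 12.0],
)
```

Under this definition the elasticity carries the same sign as the coefficient whenever both means are positive, so the signs in Equation (1) carry over to the elasticities.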

Conclusion

In conclusion, the model described above has some characteristics that are intuitive given the input variables used; however, its overall explanatory power is weak. The R² value is very low for cross-sectional data despite the model having passed the F-test. All of the variables pass the P-test, but two of them (loan amount and funded amount) do not pass the T-test. The model could be improved if other variables were included. Factors that might impact the interest rate include credit score and the purpose of the loan. We did have access to credit “grade”, which we believe is related to credit score, but these variables showed collinearity when included in the model. Additionally, the data for “purpose of the loan” was available in the data set, yet the inputs were not uniform. To include this in the model, we would need to standardize the values for each of the 107,000+ observations. Overall, we have a valid model with plenty of room for improvement.
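One simple way to screen for the kind of collinearity mentioned above is to compute the pairwise correlation between two candidate regressors before adding both to the model. The sketch below is an illustration with made-up loan and funded amounts, not the check performed for this report; a correlation close to 1 suggests the two variables carry nearly the same information.

```python
from math import sqrt
from typing import Sequence


def pearson_correlation(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equally long sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical values in which funded amount nearly always equals loan amount.
loan_amount = [10000.0, 5000.0, 20000.0, 15000.0]
funded_amount = [10000.0, 5000.0, 19500.0, 15000.0]
correlation = pearson_correlation(loan_amount, funded_amount)
```

When two regressors are this strongly correlated, their individual coefficients become unstable and can take opposite signs, which is one possible reading of the loan amount and funded amount results.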

Appendix 1

Screenshot of example data set to be used.

Appendix 2

Included below is a sample of the data set, along with the metadata explaining the fields and their descriptions.

Field	Description
acceptD	The date which the borrower accepted the offer
accNowDelinq	The number of accounts on which the borrower is now delinquent.
accOpenPast24Mths	Number of trades opened in past 24 months.
addrState	The state provided by the borrower in the loan application
all_util	Balance to credit limit on all trades
annual_inc_joint	The combined self-reported annual income provided by the co-borrowers during registration
annualInc	The self-reported annual income provided by the borrower during registration.
application_type	Indicates whether the loan is an individual application or a joint application with two co-borrowers
avg_cur_bal	Average current balance of all accounts
bcOpenToBuy	Total open to buy on revolving bankcards.
bcUtil	Ratio of total current balance to high credit/credit limit for all bankcard accounts.
chargeoff_within_12_mths	Number of charge-offs within 12 months
collections_12_mths_ex_med	Number of collections in 12 months excluding medical collections
creditPullD	The date LC pulled credit for this loan
delinq2Yrs	The number of 30+ days past-due incidences of delinquency in the borrower’s credit file for the past 2 years
delinqAmnt	The past-due amount owed for the accounts on which the borrower is now delinquent.
desc	Loan description provided by the borrower
dti	A ratio calculated using the borrower’s total monthly debt payments on the total debt obligations, excluding mortgage and the requested LC loan, divided by the borrower’s self-reported monthly income.
dti_joint	A ratio calculated using the co-borrowers’ total monthly payments on the total debt obligations, excluding mortgages and the requested LC loan, divided by the co-borrowers’ combined self-reported monthly income
earliestCrLine	The date the borrower’s earliest reported credit line was opened
effective_int_rate	The effective interest rate is equal to the interest rate on a Note reduced by Lending Club’s estimate of the impact of uncollected interest prior to charge off.
emp_title	The job title supplied by the Borrower when applying for the loan.
empLength	Employment length in years. Possible values are between 0 and 10 where 0 means less than one year and 10 means ten or more years.
expD	The date the listing will expire
expDefaultRate	The expected default rate of the loan.
ficoRangeHigh	The upper boundary range the borrower’s FICO at loan origination belongs to.
ficoRangeLow	The lower boundary range the borrower’s FICO at loan origination belongs to.
fundedAmnt	The total amount committed to that loan at that point in time.
grade	LC assigned loan grade
homeOwnership	The home ownership status provided by the borrower during registration. Our values are: RENT, OWN, MORTGAGE, OTHER.
id	A unique LC assigned ID for the loan listing.
il_util	Ratio of total current balance to high credit/credit limit on all install acct
ils_exp_d	wholeloan platform expiration date
initialListStatus	The initial listing status of the loan. Possible values are – W, F
inq_fi	Number of personal finance inquiries
inq_last_12m	Number of credit inquiries in past 12 months
inqLast6Mths	The number of inquiries in past 6 months (excluding auto and mortgage inquiries)
installment	The monthly payment owed by the borrower if the loan originates.
intRate	Interest Rate on the loan
isIncV	Indicates if income was verified by LC, not verified, or if the income source was verified
listD	The date which the borrower’s application was listed on the platform.
loanAmnt	The listed amount of the loan applied for by the borrower. If at some point in time, the credit department reduces the loan amount, then it will be reflected in this value.
max_bal_bc	Maximum current balance owed on all revolving accounts
memberId	A unique LC assigned Id for the borrower member.
mo_sin_old_rev_tl_op	Months since oldest revolving account opened
mo_sin_rcnt_rev_tl_op	Months since most recent revolving account opened
mo_sin_rcnt_tl	Months since most recent account opened
mortAcc	Number of mortgage accounts.
msa	Metropolitan Statistical Area of the borrower.
mths_since_last_major_derog	Months since most recent 90-day or worse rating
mths_since_oldest_il_open	Months since oldest bank installment account opened
mths_since_rcnt_il	Months since most recent installment accounts opened
mthsSinceLastDelinq	The number of months since the borrower’s last delinquency.
mthsSinceLastRecord	The number of months since the last public record.
mthsSinceMostRecentInq	Months since most recent inquiry.
mthsSinceRecentBc	Months since most recent bankcard account opened.
mthsSinceRecentLoanDelinq	Months since most recent personal finance delinquency.
mthsSinceRecentRevolDelinq	Months since most recent revolving delinquency.
num_accts_ever_120_pd	Number of accounts ever 120 or more days past due
num_actv_bc_tl	Number of currently active bankcard accounts
num_actv_rev_tl	Number of currently active revolving trades
num_bc_sats	Number of satisfactory bankcard accounts
num_bc_tl	Number of bankcard accounts
num_il_tl	Number of installment accounts
num_op_rev_tl	Number of open revolving accounts
num_rev_accts	Number of revolving accounts
num_rev_tl_bal_gt_0	Number of revolving trades with balance >0
num_sats	Number of satisfactory accounts
num_tl_120dpd_2m	Number of accounts currently 120 days past due (updated in past 2 months)
num_tl_30dpd	Number of accounts currently 30 days past due (updated in past 2 months)
num_tl_90g_dpd_24m	Number of accounts 90 or more days past due in last 24 months
num_tl_op_past_12m	Number of accounts opened in past 12 months
open_acc_6m	Number of open trades in last 6 months
open_il_12m	Number of installment accounts opened in past 12 months
open_il_24m	Number of installment accounts opened in past 24 months
open_act_il	Number of currently active installment trades
open_rv_12m	Number of revolving trades opened in past 12 months
open_rv_24m	Number of revolving trades opened in past 24 months
openAcc	The number of open credit lines in the borrower’s credit file.
pct_tl_nvr_dlq	Percent of trades never delinquent
percentBcGt75	Percentage of all bankcard accounts > 75% of limit.
pub_rec_bankruptcies	Number of public record bankruptcies
pubRec	Number of derogatory public records
purpose	A category provided by the borrower for the loan request.
reviewStatus	The status of the loan during the listing period. Values: APPROVED, NOT_APPROVED.
reviewStatusD	The date the loan application was reviewed by LC
revolBal	Total credit revolving balance
revolUtil	Revolving line utilization rate, or the amount of credit the borrower is using relative to all available revolving credit.
serviceFeeRate	Service fee rate paid by the investor for this loan.
subGrade	LC assigned loan subgrade
tax_liens	Number of tax liens
term	The number of payments on the loan. Values are in months and can be either 36 or 60.
title	The loan title provided by the borrower
tot_coll_amt	Total collection amounts ever owed
tot_cur_bal	Total current balance of all accounts
tot_hi_cred_lim	Total high credit/credit limit
total_bal_il	Total current balance of all installment accounts
total_cu_tl	Number of finance trades
total_il_high_credit_limit	Total installment high credit/credit limit
total_rev_hi_lim	Total revolving high credit/credit limit
totalAcc	The total number of credit lines currently in the borrower’s credit file
totalBalExMort	Total credit balance excluding mortgage
totalBcLimit	Total bankcard high credit/credit limit
url	URL for the LC page with listing data.
verified_status_joint	Indicates if the co-borrowers’ joint income was verified by LC, not verified, or if the income source was verified
zip_code	The first 3 numbers of the zip code provided by the borrower in the loan application.
revol_bal_joint	Sum of revolving credit balance of the co-borrowers, net of duplicate balances
sec_app_fico_range_low	FICO range (low) for the secondary applicant
sec_app_fico_range_high	FICO range (high) for the secondary applicant
sec_app_earliest_cr_line	Earliest credit line at time of application for the secondary applicant
sec_app_inq_last_6mths	Credit inquiries in the last 6 months at time of application for the secondary applicant
sec_app_mort_acc	Number of mortgage accounts at time of application for the secondary applicant
sec_app_open_acc	Number of open trades at time of application for the secondary applicant
sec_app_revol_util	Ratio of total current balance to high credit/credit limit for all revolving accounts
sec_app_open_act_il	Number of currently active installment trades at time of application for the secondary applicant
sec_app_num_rev_accts	Number of revolving accounts at time of application for the secondary applicant
sec_app_chargeoff_within_12_mths	Number of charge-offs within last 12 months at time of application for the secondary applicant
sec_app_collections_12_mths_ex_med	Number of collections within last 12 months excluding medical collections at time of application for the secondary applicant
sec_app_mths_since_last_major_derog	Months since most recent 90-day or worse rating at time of application for the secondary applicant
disbursement_method	The method by which the borrower receives their loan. Possible values are: CASH, DIRECT_PAY