Supervised Learning: Loan Default Prediction

The Data Science supervised learning demonstration predicts loan defaults using Lending Club data. It uses several files that together demonstrate the functionality of the Spectrum™ Technology Platform Data Science Solution in Enterprise Designer.

Spectrum_DataScience_Supervised_Learning.zip includes the following files:
  • Spectrum_DataScience_Supervised_Learning.pdf—Documentation that walks you through how to build and use the single categorizer dataflow, the scoring dataflow, and all supporting files.
  • Data.zip—The required input files, test files, and training files for each of the included dataflows.
    • loan.csv
    • LoanStats_2016Q1.csv
    • LoanStats_2016Q2.csv
    • LoanStats_2016Q3.csv
    • testData.txt
    • testDataCollege.txt
    • testDataStable.txt
    • testDataThankful.txt
    • trainData.txt
    • trainDataCollege.txt
    • trainDataStable.txt
    • trainDataThankful.txt
    • training.xml
    • trainingCollege.xml
    • trainingStable.xml
    • trainingThanks.xml
  • Lending_Club_Demo_DF_(V12.1).zip—The dataflows for Spectrum™ Technology Platform 12.1.
    • LendingClub_2007_2016Q12_v121_MultipleCategorizers.df
    • LendingClub_2007_2016Q1Q2_v121_SingleCategorizer.df
    • LendingClub_2016Q3_v121_SingleCategorizer_Scoring.df
  • Lending_Club_Demo_DF_(V12.2).zip—The dataflows for Spectrum™ Technology Platform 12.2.
    • LendingClub_2007_2016Q12_v122_MultipleCategorizers.df
    • LendingClub_2007_2016Q1Q2_v122_SingleCategorizer.df
    • LendingClub_2016Q3_v122_SingleCategorizer_Scoring.df
  • ReadMe.txt—High-level descriptions and instructions for the previously mentioned files.

You can create your own dataflow by following the step-by-step instructions in the documentation, or you can use the included dataflows as references to confirm what the completed stages, and the dataflows as a whole, should look like.
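
Before opening the dataflows, it can help to confirm that the download is complete. The following is a minimal sketch, not part of the Spectrum™ Technology Platform itself, that checks an archive for the top-level files listed above. The helper name missing_entries and the archive path in the usage note are hypothetical, and the internal folder layout of the zip file is an assumption, so the check compares base file names only.

```python
import zipfile

# Top-level entries listed above for Spectrum_DataScience_Supervised_Learning.zip.
EXPECTED_ENTRIES = [
    "Spectrum_DataScience_Supervised_Learning.pdf",
    "Data.zip",
    "Lending_Club_Demo_DF_(V12.1).zip",
    "Lending_Club_Demo_DF_(V12.2).zip",
    "ReadMe.txt",
]


def missing_entries(archive_path, expected=EXPECTED_ENTRIES):
    """Return the expected file names that are absent from the archive.

    Hypothetical helper for illustration only.
    """
    with zipfile.ZipFile(archive_path) as archive:
        # Compare on base names so that an enclosing folder inside the zip
        # (an assumption; the layout is not documented here) does not matter.
        present = {name.rsplit("/", 1)[-1] for name in archive.namelist()}
    # Preserve the order of the expected list in the result.
    return [name for name in expected if name not in present]
```

For example, missing_entries("Spectrum_DataScience_Supervised_Learning.zip") would return an empty list if every listed file is present, assuming the archive sits in the current working directory. The same approach could be applied to Data.zip with the input, test, and training file names listed above.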