Analysis of German Credit Data
Data mining is a critical step in knowledge discovery involving theories, methodologies, and tools for revealing patterns in data. It is important to understand the rationale behind the methods so that the tools and methods fit both the data and the objective of pattern recognition. Several tools may be available for a given dataset.
When a bank receives a loan application, it has to decide, based on the applicant’s profile, whether to approve the loan or not. Two types of risk are associated with the bank’s decision:
- If the applicant is a good credit risk, i.e. is likely to repay the loan, then not approving the loan to the person results in a loss of business to the bank
- If the applicant is a bad credit risk, i.e. is not likely to repay the loan, then approving the loan to the person results in a financial loss to the bank
Objective of Analysis:
Minimization of risk and maximization of profit on behalf of the bank.
To minimize loss from the bank’s perspective, the bank needs a decision rule for which applicants to approve and which to decline. Loan managers consider an applicant’s demographic and socio-economic profile before deciding on his/her loan application.
The German Credit Data contains data on 20 variables and the classification of whether an applicant is considered a Good or a Bad credit risk for 1000 loan applicants. A predictive model developed on this data is expected to guide a bank manager in deciding whether to approve a loan to a prospective applicant based on his/her profile.
Data Files for this case:
- German Credit data - german_credit.csv
- Training dataset - Training50.csv
- Test dataset - Test.csv
The following analytical approaches are taken:
- Logistic regression: The response is binary (Good credit risk or Bad) and several predictors are available.
- Discriminant Analysis
- Tree-based method and Random Forest
Sample R code for Reading a .csv file
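German.Credit <- read.csv("C:/Users/sbasu/Desktop/Stat_508/German Credit", header = TRUE, sep = ",") # straight quotes are required; the result is stored so that later code can refer to German.Credit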
GCD.1 - Exploratory Data Analysis (EDA) and Data Pre-processing
Before getting into any sophisticated analysis, the first step is to do an EDA and data cleaning. Since both categorical and continuous variables are included in the data set, appropriate tables and summary statistics are provided.
Sample R code for creating marginal proportional tables
margin.table(prop.table(table(Duration.in.Current.address, Most.valuable.available.asset, Concurrent.Credits,No.of.Credits.at.this.Bank,Occupation,No.of.dependents,Telephone, Foreign.Worker)),1)
margin.table(prop.table(table(Duration.in.Current.address, Most.valuable.available.asset, Concurrent.Credits,No.of.Credits.at.this.Bank,Occupation,No.of.dependents,Telephone, Foreign.Worker)),2)
margin.table(prop.table(table(Duration.in.Current.address, Most.valuable.available.asset, Concurrent.Credits,No.of.Credits.at.this.Bank,Occupation,No.of.dependents,Telephone, Foreign.Worker)),3)
margin.table(prop.table(table(Duration.in.Current.address, Most.valuable.available.asset, Concurrent.Credits,No.of.Credits.at.this.Bank,Occupation,No.of.dependents,Telephone, Foreign.Worker)),4)
margin.table(prop.table(table(Duration.in.Current.address, Most.valuable.available.asset, Concurrent.Credits,No.of.Credits.at.this.Bank,Occupation,No.of.dependents,Telephone, Foreign.Worker)),5)
margin.table(prop.table(table(Duration.in.Current.address, Most.valuable.available.asset, Concurrent.Credits,No.of.Credits.at.this.Bank,Occupation,No.of.dependents,Telephone, Foreign.Worker)),6)
margin.table(prop.table(table(Duration.in.Current.address, Most.valuable.available.asset, Concurrent.Credits,No.of.Credits.at.this.Bank,Occupation,No.of.dependents,Telephone, Foreign.Worker)),7)
margin.table(prop.table(table(Duration.in.Current.address, Most.valuable.available.asset, Concurrent.Credits,No.of.Credits.at.this.Bank,Occupation,No.of.dependents,Telephone, Foreign.Worker)),8)
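Each of the eight calls above rebuilds the same eight-way table before extracting one margin. Provided there are no missing values, the same one-way proportions can be obtained by tabulating each column on its own. The following is a minimal sketch on a small made-up data frame; `toy` and its values are assumptions for illustration only, not the German Credit data.

```r
# Toy stand-in for the real data; the column names follow the course data set,
# but the values are invented for illustration.
toy <- data.frame(
  Telephone      = c(1, 2, 1, 1, 2, 1, 1, 2),
  Foreign.Worker = c(1, 1, 1, 2, 1, 1, 1, 1),
  Occupation     = c(3, 3, 2, 4, 3, 1, 3, 2)
)

# One-way proportion table for every column in a single pass
one.way <- lapply(toy, function(x) prop.table(table(x)))

one.way$Telephone   # proportions of each Telephone level
```

The result is a named list with one proportion table per variable, so a single level with very few observations is easy to spot.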
Proportions of applicants belonging to each classification of a categorical variable are shown in the following table (below). The pink shadings indicate that these levels have too few observations and the levels are merged for final analysis.
Predictor (Categorical) | Levels and Proportions | | | | |
---|---|---|---|---|---|
Account Balance | No Account | None | Below 200 DM | 200 DM or Above | |
(%) | 27.4% | 26.9% | 6.3% | 39.4% | |
Payment Status | Delayed | Other Credits | Paid Up | No Problem with Current Credits | Previous Credits Paid |
(%) | 4.0% | 4.9% | 53.0% | 8.8% | 29.3% |
Savings/ Stock Value | None | Below 100 DM | [100, 500) | [500, 1000) | Above 1000 |
(%) | 60.3% | 10.3% | 6.3% | 4.8% | 18.3% |
Length of Current Employment | Unemployed | <1 Year | [1, 4) | [4, 7) | Above 7 |
(%) | 6.2% | 17.2% | 33.9% | 17.4% | 25.3% |
Installments % | Above 35% | (25%, 35%) | [20%, 25%) | Below 20% | |
(%) | 13.6% | 23.1% | 15.7% | 47.6% | |
Occupation | Unemployed, unskilled | Unskilled Permanent Resident | Skilled | Executive | |
(%) | 2.2% | 20.0% | 63.0% | 14.8% | |
Sex and Marital Status | Male, Divorced | Male, Single | Male, Married/Widowed | Female | |
(%) | 5.0% | 31.0% | 54.8% | 9.2% | |
Duration in Current Address | <1 Year | [1, 4) | [4, 7) | Above 7 | |
(%) | 13.0% | 30.8% | 14.9% | 41.3% | |
Type of Apartment | Free | Rented | Owned | | |
(%) | 17.9% | 71.4% | 10.7% | | |
Most Valuable Asset | None | Car | Life Insurance | Real Estate | |
(%) | 28.2% | 23.2% | 33.2% | 15.4% | |
No. of credits at Bank | 1 | 2 or 3 | 4 or 5 | Above 6 | |
(%) | 63.3% | 33.3% | 2.8% | 0.6% | |
Guarantor | None | Co-applicant | Guarantor | | |
(%) | 90.7% | 4.1% | 5.2% | | |
Concurrent Credits | Other Banks | Dept. Store | None | | |
(%) | 13.9% | 4.7% | 81.4% | | |
No. of Dependents | 3 or More | Less than 3 | | | |
(%) | 84.5% | 15.5% | | | |
Telephone | Yes | No | | | |
(%) | 40.4% | 59.6% | | | |
Foreign Worker | Yes | No | | | |
(%) | 3.7% | 96.3% | | | |
Purpose of Credit | |||||||||
---|---|---|---|---|---|---|---|---|---|
New Car | Used Car | Furniture | Radio/TV | Appliances | Repair | Vacation | Retraining | Business | Other |
10.3% | 18.1% | 28% | 1.2% | 2.2% | 5.0% | 0.9% | 9.7% | 1.2% | 23.4% |
Since most of the predictors are categorical with several levels, the full cross-classification of all variables will lead to zero observations in many cells. Hence we need to reduce the table size. For details of variable names and classification see Appendix 1.
Depending on the cell proportions given in the one-way table above, two or more cells are merged for several categorical predictors. The final classification for the predictors that may potentially influence Creditability is presented below:
- Account Balance: No account (1), None (No balance) (2), Some Balance (3)
- Payment Status: Some Problems (1), Paid Up (2), No Problems (in this bank) (3)
- Savings/Stock Value: None, Below 100 DM, [100, 1000) DM, Above 1000 DM
- Employment Length: Below 1 year (including unemployed), [1, 4), [4, 7), Above 7
- Sex/Marital Status: Male Divorced/Single, Male Married/Widowed, Female
- No of Credits at this bank: 1, More than 1
- Guarantor: None, Yes
- Concurrent Credits: Other Banks or Dept Stores, None
- ForeignWorker variable may be dropped from the study
- Purpose of Credit: New car, Used car, Home Related, Other
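Merging levels as listed above amounts to recoding each categorical predictor. The following is a minimal sketch for Account Balance, assuming the coding given in the appendix (1 = no account, 2 = no balance, 3 and 4 = some balance); the values in the vector are invented for illustration.

```r
# Hypothetical coded values of Account.Balance (1 to 4), invented for illustration
Account.Balance <- c(1, 2, 3, 4, 4, 2, 1, 4, 3, 4)

# Merge the two "some balance" levels (3 and 4) into a single level 3
Account.Balance.merged <- ifelse(Account.Balance >= 3, 3, Account.Balance)

# Store as a factor so that modelling functions treat it as categorical
Account.Balance.merged <- factor(Account.Balance.merged,
                                 levels = c(1, 2, 3),
                                 labels = c("No account", "None", "Some balance"))

table(Account.Balance.merged)
```

The same pattern (recode, then convert to a factor) could be applied to the other predictors in the list.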
Cross-tabulation of the 9 predictors as defined above with Creditability is shown below. The proportions shown in the cells are column proportions and so are the marginal proportions. For example, 30% of 1000 applicants have no account and another 30% have no balance while 40% have some balance in their account. Among those who have no account 135 are found to be Creditable and 139 are found to be Non-Creditable. In the group with no balance in their account, 40% were found to be Non-Creditable whereas in the group having some balance only 1% are found to be Non-Creditable.
Sample R code for creating K1 x K2 contingency table.
library(gmodels) # CrossTable() is provided by the gmodels package
CrossTable(Creditability, Account.Balance, digits=1, prop.r=F, prop.t=F, prop.chisq=F, chisq=T)
CrossTable(Creditability, Payment.Status.of.Previous.Credit, digits=1, prop.r=F, prop.t=F, prop.chisq=F, chisq=T)
CrossTable(Creditability, Purpose, digits=1, prop.r=F, prop.t=F, prop.chisq=F, chisq=T)
Summary for the continuous variables:
Sample R code for Descriptive Statistics.
attach(German.Credit) # If the data frame is attached then the column names may be directly called
summary(Duration.of.Credit..month.) # Summary statistics are printed for this variable
brksCredit <- seq(0, 80, 10) # Bins for a nice looking histogram
hist(Duration.of.Credit..month., breaks=brksCredit, xlab = "Credit Month", ylab = "Frequency", main = " ", cex=0.4) # produces nice looking histogram
boxplot(Duration.of.Credit..month., bty="n",xlab = "Credit Month", cex=0.4) # For boxplot
Predictors (Continuous) | Min | Q1 | Median | Q3 | Max | Mean | SD |
---|---|---|---|---|---|---|---|
Duration of Credit (Month) | 4 | 12 | 18 | 24 | 72 | 20.9 | 12.06 |
Amount of Credit (DM) | 250 | 1366 | 2320 | 3972 | 18420 | 3271 | 2822.75 |
Age (of Applicant) | 19 | 27 | 33 | 42 | 75 | 35.54 | 11.35 |
Distribution of the continuous variables:
All three variables show marked positive skewness. Boxplots bear this out even more clearly.
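Skewness can also be checked numerically. The sketch below uses the moment-based coefficient (third central moment divided by the 1.5 power of the second); the helper `skewness` and the sample values are assumptions made for illustration and are not taken from the data set.

```r
# Moment-based sample skewness: third central moment over
# the 1.5 power of the second central moment
skewness <- function(x) {
  x <- x[!is.na(x)]
  d <- x - mean(x)
  mean(d^3) / mean(d^2)^1.5
}

# Made-up right-skewed durations (months), for illustration only
duration <- c(6, 12, 12, 18, 24, 24, 36, 48, 72)
skewness(duration)   # a positive value indicates a long right tail
```

A value near zero would indicate a roughly symmetric distribution.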
In preparation of predictors to use in building a logistic regression model, we consider bivariate association of the response (Creditability) with the categorical predictors.
GCD.2 - Towards Building a Logistic Regression Model
Since the number of predictors in this problem is not very high, it is possible to look into the dependency of the response (Creditability) on each of them individually. The following table summarizes the chi-square p-values for each contingency table. Note that among the sample of size 1000, 700 were Creditable and 300 Non-Creditable. This classification is based on the Bank’s opinion on the actual applicants.
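One possible way to collect such chi-square p-values is to loop over the predictor names and call base R's `chisq.test()` on each two-way table. The sketch below runs on simulated data; the data frame `toy`, the seed, and the success probabilities are assumptions for illustration only.

```r
set.seed(1)  # arbitrary seed so that the simulated example is repeatable
n <- 400

# Simulated stand-in data: one predictor related to the response, one not
toy <- data.frame(
  Account.Balance = sample(1:3, n, replace = TRUE),
  Telephone       = sample(1:2, n, replace = TRUE)
)
p.good <- c(0.5, 0.7, 0.9)[toy$Account.Balance]
toy$Creditability <- rbinom(n, 1, p.good)

# Chi-square test of independence between each predictor and the response
predictors <- c("Account.Balance", "Telephone")
p.values <- sapply(predictors, function(v) {
  chisq.test(table(toy[[v]], toy$Creditability))$p.value
})
round(p.values, 4)
```

Predictors with small p-values would be the candidates for the logistic regression model.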
Only significant predictors are to be included in the logistic regression model. Since there are 1000 observations, a 50:50 cross-validation scheme is tried:
Model Building with 50:50 Cross-validation
Sample R code for 50:50 cross-validation data creation
indexes = sample(1:nrow(German.Credit), size=0.5*nrow(German.Credit)) # Random sample of 50% of row numbers created
Train50 <- German.Credit[indexes,] # Training data contains created indices
Test50 <- German.Credit[-indexes,] # Test data contains the rest
# Any proportion other than 0.5 may be used above to construct Training and Test data of other sizes
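Because `sample()` is random, the split changes on every run unless a seed is fixed. The following sketch shows a repeatable split with an adjustable training fraction; the seed value, the data frame `toy`, and the name `train.frac` are assumptions for illustration.

```r
set.seed(508)  # arbitrary seed, chosen only for illustration
toy <- data.frame(id = 1:1000)   # stand-in for German.Credit

train.frac <- 0.5                # change to, e.g., 0.7 for a 70:30 split
indexes <- sample(seq_len(nrow(toy)), size = floor(train.frac * nrow(toy)))
Train <- toy[indexes, , drop = FALSE]    # rows drawn for training
Test  <- toy[-indexes, , drop = FALSE]   # all remaining rows

c(train = nrow(Train), test = nrow(Test))
```

No row appears in both subsets, since the Test data are defined as the complement of the sampled indices.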
1000 observations are randomly partitioned into two equal sized subsets – Training and Test data. A logistic model is fit to the Training set.
Results are given below; shaded rows indicate variables not significant at the 10% level.
Sample R code for Logistic Model building with Training data and assessing for Test data
library(gmodels) # for CrossTable()
library(ROCR) # for prediction() and performance()
LogisticModel50 <- glm(Creditability ~ Account.Balance + Payment.Status.of.Previous.Credit + Purpose + Value.Savings.Stocks + Length.of.current.employment + Sex...Marital.Status + Most.valuable.available.asset + Type.of.apartment + Concurrent.Credits + Duration.of.Credit..month.+ Credit.Amount + Age..years., family=binomial, data = Train50)
LogisticModel50final <- glm(Creditability ~ Account.Balance + Payment.Status.of.Previous.Credit + Purpose + Length.of.current.employment + Sex...Marital.Status, family=binomial, data = Train50)
fit50 <- fitted.values(LogisticModel50final) # fitted probabilities on the Training data; the final model is assumed here
Threshold50 <- ifelse(fit50 >= 0.5, 1, 0) # classify as 1 when the fitted probability is at least 0.5
CrossTable(Train50$Creditability, Threshold50, digits=1, prop.r=F, prop.t=F, prop.chisq=F, chisq=F)
pred <- prediction(fit50, Train50$Creditability) # ROCR object required by performance()
perf <- performance(pred, "tpr", "fpr")
plot(perf)
R output:
Null deviance: 598.536 on 499 degrees of freedom
Residual deviance: 464.01 on 477 degrees of freedom
AIC: 510.01
After removing the nonsignificant variables, a second logistic regression is fit to the data.
R output:
Null deviance: 598.53 on 499 degrees of freedom
Residual deviance: 472.12 on 483 degrees of freedom
AIC: 506.12
One more variable needs to be removed to arrive at a model in which all predictors are significant at the 10% level.
R output:
Null deviance: 598.53 on 499 degrees of freedom
Residual deviance: 474.67 on 484 degrees of freedom
AIC: 506.67
This model is recommended as the final model based on the Training Data. Final performance of a model is evaluated by considering the classification power. Following are a few tables defined at different thresholds of classification.
The following figure shows the performance of the classifier through ROC curve.
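Classification tables at several thresholds and the area under the ROC curve can be sketched in base R as follows. The data are simulated, and the helpers `class.table` and `auc` are hypothetical names introduced here; the AUC uses the rank (Mann-Whitney) formula rather than the ROCR package used in the sample code.

```r
set.seed(2)  # arbitrary seed for a repeatable simulation
n <- 500
x <- rnorm(n)
y <- rbinom(n, 1, plogis(1 + 1.5 * x))   # simulated 0/1 response
fit <- glm(y ~ x, family = binomial)
p.hat <- fitted(fit)                     # fitted probabilities

# Classification table at a given probability threshold
class.table <- function(actual, prob, threshold) {
  predicted <- factor(as.integer(prob >= threshold), levels = c(0, 1))
  table(Actual = factor(actual, levels = c(0, 1)), Predicted = predicted)
}

# Accuracy at a few thresholds
for (th in c(0.3, 0.5, 0.7)) {
  tab <- class.table(y, p.hat, th)
  cat("threshold", th, "accuracy", sum(diag(tab)) / sum(tab), "\n")
}

# Area under the ROC curve via the rank (Mann-Whitney) formula
auc <- function(actual, prob) {
  r  <- rank(prob)
  n1 <- sum(actual == 1)
  n0 <- sum(actual == 0)
  (sum(r[actual == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}
auc(y, p.hat)
```

Lowering the threshold classifies more applicants as 1, which trades false negatives for false positives; the AUC summarizes performance over all thresholds at once.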
GCD.3 - Applying Discriminant Analysis
For discriminant analysis, not all the predictors are used. Only the continuous and the ordinal variables are used, since for nominal variables there is no concept of group means and the linear discriminants would be difficult to interpret. The predictors are assumed to have a multivariate normal distribution.
Sample R code for Discriminant Analysis
library(MASS)
ldafit <- lda(Creditability ~ Value.Savings.Stocks + Length.of.current.employment + Duration.of.Credit..month.+ Credit.Amount + Age..years., data = Train50)
ldafit
plot(ldafit)
lda.pred <- predict(ldafit, newdata=Test50) # newdata (not data) is needed to predict on the Test set
ldaclass <- lda.pred$class
table(ldaclass, Test50$Creditability)
qdafit <- qda(Creditability ~ Value.Savings.Stocks + Length.of.current.employment + Duration.of.Credit..month.+ Credit.Amount + Age..years., data = Train50)
qdafit
qda.pred <- predict(qdafit, newdata=Test50) # newdata (not data) is needed to predict on the Test set
qdaclass <- qda.pred$class
table(qdaclass, Test50$Creditability)
Prior probability was taken as observed in the Training sample:
71.4% Creditable and 28.6% Non-creditable
Linear Discriminant Analysis
Quadratic Discriminant Analysis
Neither logistic regression nor discriminant analysis performs well for this data. One reason DA may not do well is that most of the predictors are categorical, and nominal predictors are not used in this analysis.
GCD.4 - Applying Tree-Based Methods
Sample R code for Tree method
library(tree)
Train50_tree <- tree(Creditability ~ Account.Balance+Duration.of.Credit..month.+Payment.Status.of.Previous.Credit+Purpose+Credit.Amount+Value.Savings.Stocks+Length.of.current.employment+Instalment.per.cent+Sex...Marital.Status+Guarantors+Duration.in.Current.address+Most.valuable.available.asset+Age..years.+Concurrent.Credits+Type.of.apartment+No.of.Credits.at.this.Bank+Occupation+No.of.dependents+Telephone, data=Train50, method="class")
summary(Train50_tree)
plot(Train50_tree)
text(Train50_tree, pretty=0,cex=0.6)
Test50_pred <- predict(Train50_tree, Test50, type="class")
table(Test50_pred, Test50$Creditability)
Train50_prune8 <- prune.misclass(Train50_tree, best=8)
Test50_prune8_pred <- predict(Train50_prune8, Test50, type="class")
table(Test50_prune8_pred, Test50$Creditability)
Both categorical and continuous predictors are used for binary classification. Using rpart() from the rpart library, the following tree is obtained without any pruning.
R output:
n= 500
node), split, n, loss, yval, (yprob)
* denotes terminal node
1) root 500 143 1 (0.28600000 0.71400000)
2) Account.Balance=1,2 261 110 1 (0.42145594 0.57854406)
4) Duration.of.Credit..month.>=13 165 79 0 (0.52121212 0.47878788)
8) Value.Savings.Stocks< 1.5 111 43 0 (0.61261261 0.38738739)
16) Purpose=4 45 9 0 (0.80000000 0.20000000)
32) Duration.in.Current.address>=1.5 38 4 0 (0.89473684 0.10526316) *
33) Duration.in.Current.address< 1.5 7 2 1 (0.28571429 0.71428571) *
17) Purpose=1,2,3 66 32 1 (0.48484848 0.51515152)
34) Duration.of.Credit..month.>=33 26 7 0 (0.73076923 0.26923077) *
35) Duration.of.Credit..month.< 33 40 13 1 (0.32500000 0.67500000)
70) No.of.Credits.at.this.Bank< 1.5 28 12 1 (0.42857143 0.57142857)
140) Instalment.per.cent>=2.5 17 7 0 (0.58823529 0.41176471) *
141) Instalment.per.cent< 2.5 11 2 1 (0.18181818 0.81818182) *
71) No.of.Credits.at.this.Bank>=1.5 12 1 1 (0.08333333 0.91666667) *
9) Value.Savings.Stocks>=1.5 54 18 1 (0.33333333 0.66666667)
18) Length.of.current.employment< 2.5 32 15 1 (0.46875000 0.53125000)
36) Type.of.apartment=1 10 2 0 (0.80000000 0.20000000) *
37) Type.of.apartment=2,3 22 7 1 (0.31818182 0.68181818) *
19) Length.of.current.employment>=2.5 22 3 1 (0.13636364 0.86363636) *
5) Duration.of.Credit..month.< 13 96 24 1 (0.25000000 0.75000000)
10) Payment.Status.of.Previous.Credit=1 7 2 0 (0.71428571 0.28571429) *
11) Payment.Status.of.Previous.Credit=2,3 89 19 1 (0.21348315 0.78651685) *
3) Account.Balance=3 239 33 1 (0.13807531 0.86192469)
6) Purpose=4 72 18 1 (0.25000000 0.75000000)
12) Concurrent.Credits< 1.5 11 4 0 (0.63636364 0.36363636) *
13) Concurrent.Credits>=1.5 61 11 1 (0.18032787 0.81967213) *
7) Purpose=1,2,3 167 15 1 (0.08982036 0.91017964) *
Applying the procedure on Test data, classification probability shows improvement.
The CP table is as follows:
The following is the result of pruning the above tree at a cross-validated classification error rate of 90%.
n= 500
node), split, n, loss, yval, (yprob)
* denotes terminal node
1) root 500 143 1 (0.2860000 0.7140000)
2) Account.Balance=1,2 261 110 1 (0.4214559 0.5785441)
4) Duration.of.Credit..month.>=13 165 79 0 (0.5212121 0.4787879)
8) Value.Savings.Stocks< 1.5 111 43 0 (0.6126126 0.3873874)
16) Purpose=4 45 9 0 (0.8000000 0.2000000)
32) Duration.in.Current.address>=1.5 38 4 0 (0.8947368 0.1052632) *
33) Duration.in.Current.address< 1.5 7 2 1 (0.2857143 0.7142857) *
17) Purpose=1,2,3 66 32 1 (0.4848485 0.5151515)
34) Duration.of.Credit..month.>=33 26 7 0 (0.7307692 0.2692308) *
35) Duration.of.Credit..month.< 33 40 13 1 (0.3250000 0.6750000) *
9) Value.Savings.Stocks>=1.5 54 18 1 (0.3333333 0.6666667)
18) Length.of.current.employment< 2.5 32 15 1 (0.4687500 0.5312500)
36) Type.of.apartment=1 10 2 0 (0.8000000 0.2000000) *
37) Type.of.apartment=2,3 22 7 1 (0.3181818 0.6818182) *
19) Length.of.current.employment>=2.5 22 3 1 (0.1363636 0.8636364) *
5) Duration.of.Credit..month.< 13 96 24 1 (0.2500000 0.7500000)
10) Payment.Status.of.Previous.Credit=1 7 2 0 (0.7142857 0.2857143) *
11) Payment.Status.of.Previous.Credit=2,3 89 19 1 (0.2134831 0.7865169) *
3) Account.Balance=3 239 33 1 (0.1380753 0.8619247) *
There is also a minor improvement in accuracy.
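The sample code above uses the tree package, whereas the printed output is in rpart format. As a hedged sketch of how the CP table and pruning could be obtained with rpart, the following runs on simulated data. The data frame `toy` and its coefficients are invented, and pruning at the smallest cross-validated error is one common convention, not necessarily the rule used for the output shown above.

```r
library(rpart)  # recommended package that ships with standard R installations

set.seed(3)  # arbitrary seed; rpart's cross-validation is random
n <- 500
toy <- data.frame(
  Account.Balance = factor(sample(1:3, n, replace = TRUE)),
  Duration        = round(runif(n, 4, 72))
)
p.good <- plogis(-0.5 + 0.8 * as.integer(toy$Account.Balance) - 0.02 * toy$Duration)
toy$Creditability <- factor(rbinom(n, 1, p.good))

# Grow a deliberately large tree, then inspect the complexity-parameter table
big.tree <- rpart(Creditability ~ ., data = toy, method = "class",
                  control = rpart.control(cp = 0.001))
cp.tab <- big.tree$cptable
print(cp.tab)

# Prune at the cp value with the smallest cross-validated error (xerror)
best.cp <- cp.tab[which.min(cp.tab[, "xerror"]), "CP"]
pruned  <- prune(big.tree, cp = best.cp)

# Accuracy of the pruned tree on the simulated data
pred <- predict(pruned, toy, type = "class")
mean(pred == toy$Creditability)
```

In practice the accuracy would be computed on held-out Test data rather than on the data used to grow the tree.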
Conclusion: For this data set, the tree-based method seems to work better than logistic regression or discriminant analysis.
GCD.5 - Random Forest
Sample R code for Random Forest
library(randomForest)
rf50 <- randomForest(Creditability ~., data = Train50, ntree=200, importance=T, proximity=T)
plot(rf50, main="")
rf50
Test50_rf_pred <- predict(rf50, Test50, type="class")
table(Test50_rf_pred, Test50$Creditability)
importance(rf50)
varImpPlot(rf50, main="", cex=0.8)
A random forest fit to the Training data with all predictors and no manual variable selection, using ntree = 200, leads to the following error plot:
The importance of the predictors is given in the following dotplot.
The fitted forest gives rise to the following classification table:
With a judicious choice of the more important predictors, further improvement in accuracy is possible. But as the improvement is slight, no attempt is made to refit the forest on a selected subset of predictors.
GCD.6 - Cost-Profit Consideration
Ultimately, these statistical decisions must be translated into profit considerations for the bank. Let us assume that a correct decision by the bank results in a 35% profit at the end of 5 years. A correct decision here means that the bank predicts an application to be good, or creditworthy, and it actually turns out to be creditworthy. When the opposite is true, i.e. the bank predicts the application to be good but it turns out to be bad credit, the loss is 100%. If the bank predicts an application to be non-creditworthy, then the loan facility is not extended to that applicant and the bank does not incur any loss (opportunity loss is not considered here). The cost matrix, therefore, is as follows:
Out of 1000 applicants, 70% are creditworthy. A loan manager without any model would incur [0.7*0.35 + 0.3*(-1)] = -0.055, i.e. a loss of 0.055 per unit lent. If the average loan amount is 3200 DM (approximately), then the loss per applicant is 176 DM and the total loss over 1000 applicants is 176,000 DM.
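The arithmetic can be checked directly, and the same payoffs can be applied to any classification table. In the sketch below the helper `unit.profit.from.table` and the counts in `tab` are hypothetical and invented purely to show the calculation; they are not results from the models above.

```r
# Baseline from the text: approve every applicant, 70% of whom are creditworthy
p.good       <- 0.70
profit.good  <- 0.35    # gain per unit lent to a good applicant
loss.bad     <- -1.00   # loss per unit lent to a bad applicant
avg.loan     <- 3200    # approximate average loan amount in DM
n.applicants <- 1000

unit.profit   <- p.good * profit.good + (1 - p.good) * loss.bad
per.applicant <- unit.profit * avg.loan
total         <- per.applicant * n.applicants
c(unit = unit.profit, per.applicant = per.applicant, total = total)

# Per-applicant unit profit of a classifier from its 2 x 2 table
# (rows = actual, columns = predicted; rejected applicants contribute 0)
unit.profit.from.table <- function(tab) {
  (profit.good * tab["good", "good"] + loss.bad * tab["bad", "good"]) / sum(tab)
}

# Hypothetical counts, invented purely to show the calculation
tab <- matrix(c(300, 50, 60, 90), nrow = 2,
              dimnames = list(actual = c("good", "bad"),
                              predicted = c("good", "bad")))
unit.profit.from.table(tab)
```

A classifier is worthwhile under this cost structure only if its unit profit exceeds that of the approve-everyone baseline.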
Logistic regression model performance:
Tree-based classification and random forest show a per unit profit; other methods are not doing well.
GCD - Appendix - Description of Dataset
Variable | Description | Categories | Score | Rel. frequency (%), good | Rel. frequency (%), bad |
---|---|---|---|---|---|
kredit | Creditability: | | | | |
laufkont | Balance of current account | no balance or debit | 2 | 35.00 | 23.43 |
| | 0 <= ... < 200 DM | 3 | 4.67 | 7.00 |
| | ... >= 200 DM or checking account for at least 1 year | 4 | 15.33 | 49.71 |
| | no running account | 1 | 45.00 | 19.86 |
laufzeit | Duration in months (metric) | | | | |
dlaufzeit | Duration in months (categorized) | <=6 | 10 | 3.00 | 10.43 |
| | 6 < ... <= 12 | 9 | 22.33 | 30.00 |
| | 12 < ... <= 18 | 8 | 18.67 | 18.71 |
| | 18 < ... <= 24 | 7 | 22.00 | 22.57 |
| | 24 < ... <= 30 | 6 | 6.33 | 5.43 |
| | 30 < ... <= 36 | 5 | 12.67 | 6.86 |
| | 36 < ... <= 42 | 4 | 1.67 | 1.71 |
| | 42 < ... <= 48 | 3 | 10.67 | 3.14 |
| | 48 < ... <= 54 | 2 | 0.33 | 0.14 |
| | > 54 | 1 | 2.33 | 1.00 |
moral | Payment of previous credits | no previous credits / paid back all previous credits | 2 | 56.33 | 51.57 |
| | paid back previous credits at this bank | 4 | 16.67 | 34.71 |
| | no problems with current credits at this bank | 3 | 9.33 | 8.57 |
| | hesitant payment of previous credits | 0 | 8.33 | 2.14 |
| | problematic running account / there are further credits running but at other banks | 1 | 9.33 | 3.00 |
verw | Purpose of credit | new car | 1 | 5.67 | 12.29 |
| | used car | 2 | 19.33 | 17.57 |
| | items of furniture | 3 | 20.67 | 31.14 |
| | radio / television | 4 | 1.33 | 1.14 |
| | household appliances | 5 | 2.67 | 2.00 |
| | repair | 6 | 7.33 | 4.00 |
| | education | 7 | 0.00 | 0.00 |
| | vacation | 8 | 0.33 | 1.14 |
| | retraining | 9 | 11.33 | 9.00 |
| | business | 10 | 1.67 | 1.00 |
| | other | 0 | 29.67 | 20.71 |
Hoehe | Amount of credit in "Deutsche Mark" (metric) | | | | |
dhoehe | Amount of credit in DM (categorized) | <=500 | 10 | 1.00 | 2.14 |
| | 500 < ... <= 1000 | 9 | 11.33 | 9.14 |
| | 1000 < ... <= 1500 | 8 | 17.00 | 19.86 |
| | 1500 < ... <= 2500 | 7 | 19.67 | 24.57 |
| | 2500 < ... <= 5000 | 6 | 25.00 | 28.57 |
| | 5000 < ... <= 7500 | 5 | 11.33 | 9.71 |
| | 7500 < ... <= 10000 | 4 | 6.67 | 3.71 |
| | 10000 < ... <= 15000 | 3 | 7.00 | 2.00 |
| | 15000 < ... <= 20000 | 2 | 1.00 | 0.29 |
| | > 20000 | 1 | 0.00 | 0.00 |
sparkont | Value of savings or stocks | < 100,- DM | 2 | 11.33 | 9.86 |
| | 100,- <= ... < 500,- DM | 3 | 3.67 | 7.43 |
| | 500,- <= ... < 1000,- DM | 4 | 2.00 | 6.00 |
| | >= 1000,- DM | 5 | 10.67 | 21.57 |
| | not available / no savings | 1 | 72.33 | 55.14 |
beszeit | Has been employed by current employer for | unemployed | 1 | 7.67 | 5.57 |
| | <= 1 year | 2 | 23.33 | 14.57 |
| | 1 <= ... < 4 years | 3 | 34.67 | 33.57 |
| | 4 <= ... < 7 years | 4 | 13.00 | 19.29 |
| | >= 7 years | 5 | 21.33 | 27.00 |
rate | Instalment in % of available income | >= 35 | 1 | 11.33 | 14.57 |
| | 25 <= ... < 35 | 2 | 20.67 | 24.14 |
| | 20 <= ... < 25 | 3 | 15.00 | 16.00 |
| | < 20 | 4 | 53.00 | 45.29 |
famges | Marital Status / Sex | male: divorced / living apart | 1 | 6.67 | 4.29 |
| | male: single | 2 | 36.33 | 28.72 |
| | male: married / widowed | 3 | 48.67 | 57.43 |
| | female: | 4 | 8.33 | 9.57 |
buerge | Further debtors / Guarantors | none | 1 | 90.67 | 90.71 |
| | Co-Applicant | 2 | 6.00 | 3.29 |
| | Guarantor | 3 | 3.33 | 6.00 |
wohnzeit | Living in current household for | < 1 year | 1 | 12.00 | 13.43 |
| | 1 <= ... < 4 years | 2 | 32.33 | 30.14 |
| | 4 <= ... < 7 years | 3 | 14.33 | 15.14 |
| | >= 7 years | 4 | 41.33 | 41.29 |
verm | Most valuable available assets | Ownership of house or land | 4 | 22.33 | 12.43 |
| | Savings contract with a building society / Life insurance | 3 | 34.00 | 32.86 |
| | Car / Other | 2 | 23.67 | 23.00 |
| | not available / no assets | 1 | 20.00 | 31.71 |
alter | Age in years (metric) | | | | |
dalter | Age in years (categorized) | 0 <= ... <= 25 | 1 | 26.67 | 15.71 |
| | 26 <= ... <= 39 | 2 | 47.33 | 52.72 |
| | 40 <= ... <= 59 | 3 | 21.67 | 26.14 |
| | 60 <= ... <= 64 | 5 | 2.33 | 3.00 |
| | >= 65 | 4 | 2.00 | 2.43 |
weitkred | Further running credits | at other banks | 1 | 19.00 | 11.71 |
| | at department store or mail order house | 2 | 6.33 | 4.00 |
| | no further running credits | 3 | 74.67 | 84.29 |
wohn | Type of apartment | rented flat | 2 | 62.00 | 75.43 |
| | owner-occupied flat | 3 | 14.67 | 9.14 |
| | free apartment | 1 | 23.33 | 15.57 |
bishkred | Number of previous credits at this bank (including the running one) | one | 1 | 66.67 | 61.86 |
| | two or three | 2 | 30.67 | 34.43 |
| | four or five | 3 | 2.00 | 3.14 |
| | six or more | 4 | 0.67 | 0.57 |
beruf | Occupation | unemployed / unskilled with no permanent residence | 1 | 2.33 | 2.14 |
| | unskilled with permanent residence | 2 | 18.67 | 20.57 |
| | skilled worker / skilled employee / minor civil servant | 3 | 62.00 | 63.43 |
| | executive / self-employed / higher civil servant | 4 | 17.00 | 13.86 |
pers | Number of persons entitled to maintenance | 0 to 2 | 2 | 84.67 | 84.43 |
| | 3 and more | 1 | 15.33 | 15.57 |
telef | Telephone | no | 1 | 62.33 | 58.43 |
| | yes | 2 | 37.67 | 41.57 |
gastarb | Foreign worker | yes | 1 | 1.33 | 4.71 |
| | no | 2 | 98.67 | 95.29 |
Data and additional description may be found here.