Challenge 1: Credit Scoring
Submission File Format
The submission file should include the following for each address in the validation dataset:
Predicted probabilities of liquidation
Predicted labels
All feature values used to make the prediction
The predictions should be formatted as follows (and submitted via Spectral CLI):
Note: Depending on the model architecture, some models predict logits instead of probabilities. Please ensure that your models output probabilities directly (rather than logits that must then be converted into probabilities through torch.nn.Sigmoid() or a similar function) so that they remain compatible with our zkML setup.
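A quick sanity check before submitting is to confirm that the raw model outputs actually fall in the probability range. The sketch below is a heuristic only, and the function name `looks_like_probabilities` is hypothetical (not part of the Spectral CLI). Any finite value outside [0, 1] means the model is still emitting logits. Passing the check does not prove the outputs are calibrated probabilities.

```python
import math
from typing import Iterable


def looks_like_probabilities(outputs: Iterable[float]) -> bool:
    """Heuristic check that raw model outputs are probabilities, not logits.

    Every value must be a finite number in [0, 1]. Logits are unbounded, so
    any value outside that range means the model still emits logits.
    """
    values = list(outputs)
    return len(values) > 0 and all(
        math.isfinite(v) and 0.0 <= v <= 1.0 for v in values
    )


# Example usage with hypothetical model outputs
print(looks_like_probabilities([0.02, 0.57, 0.91]))  # True
print(looks_like_probabilities([-2.3, 0.4, 3.1]))    # False: these look like logits
```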
Model Validation Criteria
All submitted models will be evaluated against the weighted average of the following seven model validation metrics:
Area Under the Receiver Operating Characteristic Curve (AUC/AUROC)
Area Under the Precision-Recall Curve (PR-AUC)
Recall Score
F1 Score
Brier Score (since the lower the Brier Score, the better, we use 1 - Brier Score to score models)
Kolmogorov-Smirnov Statistic (KS Statistic)
Predicted Probability Densities (the difference between the median predicted probabilities for the two labels)
These metrics are calculated from the predictions (probabilities and labels) that the modeler submits for the validation dataset.
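The sketch below computes the seven metrics from true labels, predicted probabilities, and predicted labels using only the Python standard library, so each definition is visible. Several details are assumptions rather than confirmed scoring rules: PR-AUC is taken as step-wise average precision (a trapezoidal area would differ slightly), the KS Statistic is the two-sample statistic between the predicted probabilities of the two classes, and the Predicted Probability Densities value is assumed to be the median for label 1 minus the median for label 0. The function names and the 0.5 cut-off used to derive labels in the example are hypothetical. In practice, scikit-learn's `roc_auc_score`, `average_precision_score`, `recall_score`, `f1_score`, and `brier_score_loss`, plus `scipy.stats.ks_2samp`, compute the corresponding quantities more efficiently.

```python
import bisect
import statistics
from typing import Dict, Sequence


def _roc_auc(pos: Sequence[float], neg: Sequence[float]) -> float:
    # Probability that a random positive outscores a random negative
    # (ties count as half). O(n_pos * n_neg): fine for illustration only.
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


def _average_precision(y: Sequence[int], p: Sequence[float]) -> float:
    # Step-wise PR-AUC: sum of (R_n - R_{n-1}) * P_n over distinct
    # thresholds, scanning from the highest predicted probability down.
    pairs = sorted(zip(p, y), key=lambda t: -t[0])
    n_pos = sum(y)
    tp = fp = 0
    prev_recall = ap = 0.0
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        while i < len(pairs) and pairs[i][0] == threshold:
            tp += pairs[i][1]
            fp += 1 - pairs[i][1]
            i += 1
        recall = tp / n_pos
        ap += (recall - prev_recall) * tp / (tp + fp)
        prev_recall = recall
    return ap


def _ks_statistic(pos: Sequence[float], neg: Sequence[float]) -> float:
    # Largest gap between the empirical CDFs of the two classes' probabilities.
    pos_sorted, neg_sorted = sorted(pos), sorted(neg)
    return max(
        abs(bisect.bisect_right(pos_sorted, t) / len(pos_sorted)
            - bisect.bisect_right(neg_sorted, t) / len(neg_sorted))
        for t in set(pos) | set(neg)
    )


def validation_metrics(y_true, y_prob, y_pred) -> Dict[str, float]:
    """Compute the seven metrics; y_true/y_pred are 0/1, y_prob is in [0, 1]."""
    pos = [p for t, p in zip(y_true, y_prob) if t == 1]
    neg = [p for t, p in zip(y_true, y_prob) if t == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present in y_true")
    tp = sum(1 for t, l in zip(y_true, y_pred) if t == 1 and l == 1)
    fp = sum(1 for t, l in zip(y_true, y_pred) if t == 0 and l == 1)
    fn = sum(1 for t, l in zip(y_true, y_pred) if t == 1 and l == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn)  # tp + fn > 0 because positives exist
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    brier = sum((p - t) ** 2 for t, p in zip(y_true, y_prob)) / len(y_true)
    return {
        "AUC": _roc_auc(pos, neg),
        "PR-AUC": _average_precision(y_true, y_prob),
        "Recall Score": recall,
        "F1 Score": f1,
        "1 - Brier Score": 1.0 - brier,
        "KS Statistic": _ks_statistic(pos, neg),
        # Assumed sign: median for label 1 minus median for label 0
        "Predicted Probability Densities": statistics.median(pos) - statistics.median(neg),
    }


# Example usage with toy data; labels come from a hypothetical 0.5 cut-off
y_true = [0, 0, 1, 1, 0, 1]
y_prob = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9]
y_pred = [1 if p >= 0.5 else 0 for p in y_prob]
for name, value in validation_metrics(y_true, y_prob, y_pred).items():
    print(f"{name}: {value:.4f}")
```

Keeping the dictionary keys identical to the metric names in the table makes it straightforward to feed the result into a scoring step.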
The respective weights and knock-out thresholds for each of the above metrics are:
Metric | Knock-Out Threshold (%) | Weight (%) |
AUC | 75 | 16 |
PR-AUC | 75 | 16 |
Recall Score | 55 | 16 |
1 - Brier Score | 80 | 16 |
KS Statistic | 45 | 16 |
Predicted Probability Densities | 30 | 10 |
F1 Score | 65 | 10 |
Additional Details:
The overall Model Score (a number between 0 and 100, inclusive) is the weighted average of all seven metrics using their respective weights (akin to Excel’s SUMPRODUCT function).
The Knock-Out Thresholds are the minimum required value for each metric: any model for which any of the seven metrics falls below its knock-out threshold is automatically discarded, irrespective of its overall Model Score.
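The scoring rule described above can be sketched as follows. This assumes each metric value is reported on a 0 to 1 scale, so the percentage thresholds are divided by 100 before comparison and the percentage weights turn a SUMPRODUCT directly into a 0 to 100 score. The name `model_score` and the choice to return `None` for a knocked-out model are hypothetical illustrations, not part of any official tooling.

```python
from typing import Dict, Optional, Tuple

# (knock-out threshold %, weight %) per metric, copied from the table above
CRITERIA: Dict[str, Tuple[float, float]] = {
    "AUC": (75, 16),
    "PR-AUC": (75, 16),
    "Recall Score": (55, 16),
    "1 - Brier Score": (80, 16),
    "KS Statistic": (45, 16),
    "Predicted Probability Densities": (30, 10),
    "F1 Score": (65, 10),
}


def model_score(metrics: Dict[str, float]) -> Optional[float]:
    """Return the 0-100 Model Score, or None if any metric is knocked out.

    Assumes every metric value is on a 0-1 scale.
    """
    for name, (threshold, _) in CRITERIA.items():
        if metrics[name] < threshold / 100:
            return None  # discarded regardless of the weighted score
    # SUMPRODUCT of metric values and percentage weights (weights sum to 100)
    return sum(metrics[name] * weight for name, (_, weight) in CRITERIA.items())


# Example usage with hypothetical metric values
example = {
    "AUC": 0.85, "PR-AUC": 0.80, "Recall Score": 0.70, "1 - Brier Score": 0.90,
    "KS Statistic": 0.60, "Predicted Probability Densities": 0.40, "F1 Score": 0.72,
}
print(round(model_score(example), 2))                      # 72.8
print(model_score({**example, "Recall Score": 0.50}))      # None: below 55%
```

Checking every knock-out threshold before computing the weighted sum mirrors the rule that a single failing metric discards the model, however high its Model Score would otherwise be.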