Scoring Methodology
How are MACRO scores computed? What is the general framework around on-chain credit scoring?
Although the methodology behind MACRO scores draws heavily on credit scoring concepts that traditional finance has used for decades, there are some vital differences between the two.
Because the Ethereum blockchain is a public ledger whose transaction history is openly accessible, we are able to collect and extract a broad and deep dataset. This dataset comprises all on-chain borrowing transactions and behavior recorded on several leading DeFi protocols since their inception (including Compound, Aave, and MakerDAO), together with additional data points related to an address's wallet history and transactions not necessarily executed on DeFi platforms.
From this raw dataset we engineer several new features (the input variables fed to the model), guided by our collective experience, judgment, and intuition. Most features are numerical, while some are bucketed into discrete categories.
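To make this concrete, the sketch below aggregates a few per-address numerical features from a table of borrow events and buckets one of them into discrete categories. It is a minimal illustration, not the actual MACRO feature set: the `borrows` table, its column names, the three features, and the bucket cut points are all hypothetical, and pandas is assumed to be available.

```python
import pandas as pd

# Hypothetical raw borrow events (one row per borrow transaction).
borrows = pd.DataFrame({
    "address": ["0xa", "0xa", "0xb", "0xc", "0xc", "0xc"],
    "protocol": ["compound", "aave", "aave", "makerdao", "aave", "compound"],
    "amount_usd": [1_000.0, 2_500.0, 300.0, 50_000.0, 7_000.0, 1_200.0],
})

# Numerical per-address features aggregated from the raw events.
features = borrows.groupby("address").agg(
    n_borrows=("amount_usd", "count"),
    total_borrowed_usd=("amount_usd", "sum"),
    n_protocols=("protocol", "nunique"),
)

# Bucket one numerical feature into discrete categories (hypothetical cut points).
features["size_bucket"] = pd.cut(
    features["total_borrowed_usd"],
    bins=[0.0, 1_000.0, 10_000.0, float("inf")],
    labels=["small", "medium", "large"],
)

print(features)
```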
One of the most critical aspects of any credit scoring methodology is defining the target label, that is, what we are trying to predict. This is also one of the key differences from conventional credit scoring. Because on-chain borrowing behavior is fluid (a single payment can repay multiple loans, there are no fixed repayment dates, etc.), we adopt several different definitions of the target label.
For instance, one of our hybrid target labels indicates whether, within a predefined time window after the borrowing date, a borrower was liquidated or their health factor dropped below a certain threshold. In other words, our methodology can model the propensity of a borrower either to be liquidated or to see their health factor drop below some threshold in the future.
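A minimal sketch of how such a hybrid label could be computed for a single loan follows. It assumes the label is 1 ("bad") if either event occurs inside the window. The `Loan` structure, the 90-day window, and the 1.2 health-factor threshold are hypothetical placeholders, since the actual window and threshold are not stated here.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import List, Optional, Tuple

WINDOW = timedelta(days=90)  # hypothetical observation window after borrowing
HF_THRESHOLD = 1.2           # hypothetical health-factor threshold


@dataclass
class Loan:
    borrowed_at: datetime
    liquidated_at: Optional[datetime] = None
    # (timestamp, health factor) observations of the borrower's position
    health_factors: List[Tuple[datetime, float]] = field(default_factory=list)


def hybrid_label(loan: Loan, window: timedelta = WINDOW,
                 threshold: float = HF_THRESHOLD) -> int:
    """Return 1 ("bad") if, within `window` after borrowing, the borrower was
    liquidated or the health factor fell below `threshold`; otherwise 0 ("good")."""
    start, end = loan.borrowed_at, loan.borrowed_at + window
    liquidated = (loan.liquidated_at is not None
                  and start <= loan.liquidated_at <= end)
    hf_breach = any(start <= ts <= end and hf < threshold
                    for ts, hf in loan.health_factors)
    return int(liquidated or hf_breach)


t0 = datetime(2021, 1, 1)
print(hybrid_label(Loan(t0, health_factors=[(t0 + timedelta(days=10), 1.8)])))    # 0
print(hybrid_label(Loan(t0, liquidated_at=t0 + timedelta(days=30))))              # 1
print(hybrid_label(Loan(t0, health_factors=[(t0 + timedelta(days=5), 1.1)])))     # 1
print(hybrid_label(Loan(t0, health_factors=[(t0 + timedelta(days=120), 1.05)])))  # 0, outside window
```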
To feed the machine learning algorithm only relevant, high-quality, and highly predictive features, we run several iterations of feature selection using the following approaches:
- Correlation Evaluation: Among features that are highly correlated with one another (measured with the Pearson correlation coefficient and the variance inflation factor, VIF), the redundant ones are excluded. This ensures that no two features carry the same information, which would otherwise hurt the interpretability of our final model.
- Variance Analysis: Features with very low variance across observations add little or no predictive power to a classification algorithm and are therefore excluded.
- Feature Importance Derivation: Several well-established feature importance metrics are utilized to identify and rank the features based on their relative importance observed during model training.
Combining the above approaches lets us shortlist the features best suited to risk assessment without compromising the model's performance.
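The correlation and variance filters can be sketched with NumPy alone, as below. The ordering (variance filter first, then a greedy pairwise Pearson filter, then iterative VIF pruning) is an illustrative assumption, the thresholds `var_min`, `corr_max`, and `vif_max` are hypothetical, and the model-based importance ranking is left out because it depends on the chosen learning algorithm.

```python
import numpy as np


def vif(X: np.ndarray, i: int) -> float:
    """Variance inflation factor of column i: 1 / (1 - R^2) from regressing it on the others."""
    y = X[:, i]
    A = np.column_stack([np.ones(len(y)), np.delete(X, i, axis=1)])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    r2 = 1.0 - np.var(y - A @ coef) / np.var(y)
    return np.inf if r2 >= 1.0 else 1.0 / (1.0 - r2)


def select_features(X, names, var_min=1e-3, corr_max=0.9, vif_max=5.0):
    # 1. Variance analysis: drop (near-)constant columns.
    kept = [j for j in range(X.shape[1]) if np.var(X[:, j]) > var_min]

    # 2. Pairwise Pearson filter: keep a column only if it is not highly
    #    correlated with any column already kept (greedy, in column order).
    selected = []
    for j in kept:
        if all(abs(np.corrcoef(X[:, j], X[:, k])[0, 1]) < corr_max for k in selected):
            selected.append(j)

    # 3. VIF filter: repeatedly drop the column with the largest VIF above vif_max.
    while len(selected) > 1:
        vifs = [vif(X[:, selected], i) for i in range(len(selected))]
        worst = int(np.argmax(vifs))
        if vifs[worst] <= vif_max:
            break
        selected.pop(worst)

    return [names[j] for j in selected]


# Synthetic example data.
rng = np.random.default_rng(0)
n = 500
a = rng.normal(size=n)
b = 2 * a + rng.normal(scale=0.01, size=n)   # near-duplicate of a
c = rng.normal(size=n)
d = np.full(n, 3.0)                          # constant column
e = a + c + rng.normal(scale=0.05, size=n)   # near-linear combination of a and c
X = np.column_stack([a, b, c, d, e])
print(select_features(X, ["a", "b", "c", "d", "e"]))  # ['a', 'c']
```

On this synthetic data, the constant column `d` fails the variance filter and `b` fails the pairwise correlation filter. `e` passes the pairwise filter (its correlation with `a` or `c` alone is only about 0.7) but is removed by the VIF step, which is the kind of multicollinearity that pairwise correlations miss.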
We use several techniques to confirm that our final model will hold up in production:
- Traditional validation metrics: e.g., recall, F1 score, the area under the receiver operating characteristic curve (AUROC), and the Gini index
- Backtesting and stress testing: to ensure our model performs well across different time periods and regions of the input space
- Distribution check: to ensure that our scores exhibit appropriate discriminatory power across score categories, i.e., the proportion of "good" observations should increase progressively as we move from the low end of the score range to the high end
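The metrics and the distribution check can be sketched in plain NumPy, as follows. AUROC is computed here through its Mann-Whitney interpretation (the probability that a randomly chosen "good" observation outscores a randomly chosen "bad" one), and the Gini index uses the standard relation Gini = 2 * AUROC - 1. The toy labels and scores, the 600 cut-off used for recall and F1, and the bucket edges are hypothetical.

```python
import numpy as np


def auroc(y_good, scores) -> float:
    """AUROC via Mann-Whitney: P(random "good" scores higher than random "bad"), ties count 1/2."""
    y_good = np.asarray(y_good, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    diff = scores[y_good][:, None] - scores[~y_good][None, :]
    return float(np.mean(diff > 0) + 0.5 * np.mean(diff == 0))


def gini(y_good, scores) -> float:
    """Gini index (accuracy ratio) derived from AUROC."""
    return 2.0 * auroc(y_good, scores) - 1.0


def recall_f1(y_true, y_pred):
    """Recall and F1 for the positive class, guarding against empty denominators."""
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    tp = int(np.sum(y_true & y_pred))
    fp = int(np.sum(~y_true & y_pred))
    fn = int(np.sum(y_true & ~y_pred))
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return recall, f1


def good_rate_by_bucket(y_good, scores, edges):
    """Share of "good" observations in each non-empty score bucket, low to high."""
    y_good = np.asarray(y_good, dtype=float)
    buckets = np.digitize(scores, edges)  # bucket indices 0 .. len(edges)
    return [float(y_good[buckets == b].mean())
            for b in range(len(edges) + 1) if np.any(buckets == b)]


def is_monotonic(rates) -> bool:
    """True if the good rate never decreases from one bucket to the next."""
    return all(lo <= hi for lo, hi in zip(rates, rates[1:]))


# Toy example: 1 = "good" (not liquidated), scores on the 300-850 scale.
y = [0, 0, 1, 0, 1, 1, 1, 1]
s = [320, 410, 450, 520, 610, 700, 780, 820]
print(auroc(y, s), gini(y, s))                 # 0.9333..., 0.8666...
print(recall_f1(y, [v >= 600 for v in s]))     # (0.8, 0.888...)
rates = good_rate_by_bucket(y, s, edges=[500, 700])
print(rates, is_monotonic(rates))              # [0.333..., 0.5, 1.0] True
```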
Our model predicts the probability that a borrower will not be liquidated. After certain transformations, these predicted probabilities are scaled to the final score range of 300 to 850, and the resulting scores are used to assess prospective borrowers on Spectral Finance.
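The exact transformations are not described here, so the following is only a minimal sketch assuming the simplest option: a linear map of the predicted probability onto the 300 to 850 range. The function name is hypothetical. Traditional scorecards often scale log-odds instead (e.g., a fixed number of points to double the odds), which spreads scores differently but needs extra handling to stay within fixed bounds.

```python
SCORE_MIN, SCORE_MAX = 300, 850


def probability_to_score(p_good: float) -> int:
    """Hypothetical linear map from P(not liquidated) in [0, 1] onto [300, 850]."""
    if not 0.0 <= p_good <= 1.0:
        raise ValueError("p_good must be a probability in [0, 1]")
    return round(SCORE_MIN + (SCORE_MAX - SCORE_MIN) * p_good)


for p in (0.0, 0.5, 0.9, 1.0):
    print(p, probability_to_score(p))  # scores: 300, 575, 795, 850
```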
Given DeFi's rapid pace of development, high data velocity, and relatively nascent stage compared to traditional finance, our scoring methodology will not remain static. Instead, it will evolve continuously through periodic statistical validation of its stability, robustness, and predictive power. Model recalibration or redevelopment will be a regular activity whenever the evaluation results warrant it. We also intend to incorporate additional data sources or data points as they become available, but only if they improve the predictive power of our scoring methodology while preserving its robustness and other characteristics.
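Score stability is commonly monitored in credit scoring with the Population Stability Index (PSI), which compares the score distribution at development time with a more recent one. The text does not name a specific stability metric, so PSI is an assumed example here; the bin edges, the `eps` floor for empty bins, and the sample scores are hypothetical.

```python
import numpy as np


def population_stability_index(expected, actual, edges, eps=1e-6) -> float:
    """PSI = sum_i (a_i - e_i) * ln(a_i / e_i) over score bins, where e_i and a_i
    are the shares of the reference and recent samples falling in bin i."""
    e_counts, _ = np.histogram(expected, bins=edges)
    a_counts, _ = np.histogram(actual, bins=edges)
    # Floor empty bins at eps so the logarithm stays finite.
    e_pct = np.clip(e_counts / e_counts.sum(), eps, None)
    a_pct = np.clip(a_counts / a_counts.sum(), eps, None)
    return float(np.sum((a_pct - e_pct) * np.log(a_pct / e_pct)))


edges = [300, 500, 700, 850]                      # hypothetical score bins
reference = [310, 420, 550, 640, 760, 830]        # scores at development time
recent_same = [330, 480, 510, 690, 720, 850]      # same share in every bin
recent_shifted = [560, 650, 700, 760, 800, 840]   # mass moved to higher bins

print(population_stability_index(reference, recent_same, edges))     # 0.0
print(population_stability_index(reference, recent_shifted, edges))  # large (> 0.25)
```

A frequently quoted rule of thumb treats a PSI below 0.1 as stable and above 0.25 as a significant shift, which could serve as one trigger for the recalibration described above.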