Hetero Feature Binning
Feature binning, or data binning, is a data pre-processing technique. It can be used, for example, to reduce the effects of minor observation errors and to calculate information values.
Currently, we provide quantile binning and bucket binning methods. The quantile binning approach relies on a special data structure described in this paper. Feel free to check out the detailed algorithm in the paper.
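To make the difference between the two methods concrete, here is a minimal sketch in plain Python. It assumes that bucket binning means equal-width bins and that bins are right-closed. It computes quantile split points from the fully sorted data, which is a simplification and not the quantile summary structure from the paper. The function names are hypothetical and are not part of this module's API.

```python
from bisect import bisect_left


def bucket_split_points(values, bin_num):
    """Equal-width split points: bin_num - 1 interior boundaries."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bin_num
    return [lo + width * i for i in range(1, bin_num)]


def quantile_split_points(values, bin_num):
    """Equal-frequency split points read from the fully sorted data.

    Exact sorting is used only for illustration; a quantile summary
    approximates these points without holding all data in memory.
    """
    ordered = sorted(values)
    n = len(ordered)
    points = []
    for i in range(1, bin_num):
        candidate = ordered[min(n - 1, (i * n) // bin_num)]
        # Skip duplicates so that split points stay strictly increasing.
        if not points or candidate > points[-1]:
            points.append(candidate)
    return points


def bin_index(value, split_points):
    """Index of the right-closed bin that contains value."""
    return bisect_left(split_points, value)


values = [1, 2, 2, 3, 4, 5, 8, 13, 21, 100]
bucket_points = bucket_split_points(values, 4)
quantile_points = quantile_split_points(values, 4)
bucket_bins = [bin_index(v, bucket_points) for v in values]
quantile_bins = [bin_index(v, quantile_points) for v in values]
```

On this skewed sample, the equal-width split points put nine of the ten values into the first bin, while the quantile split points spread the values across all four bins.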
The following figure illustrates how the federated IV and WOE values are calculated.
As the figure shows, party B, which holds the data labels, encrypts its labels with additive homomorphic encryption and sends them to party A. A computes the encrypted label sum of each bin and sends it back. B can then calculate WOE and IV based on the returned information.
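The exchange can be sketched with a toy Paillier scheme, one well-known additive homomorphic cryptosystem. This is an illustration only: the primes are far too small to be secure, the helper names are hypothetical, and the assumption that A also reports each bin's sample count is ours, not something stated above.

```python
import math
import random


def keygen(p=1789, q=1861):
    """Toy Paillier key pair. These primes are illustrative, not secure."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)
    mu = pow(lam, -1, n)  # valid because the generator is fixed to n + 1
    return n, (lam, mu)


def encrypt(n, m):
    n2 = n * n
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = random.randrange(1, n)
    return (1 + m * n) % n2 * pow(r, n, n2) % n2


def decrypt(n, priv, c):
    lam, mu = priv
    return (pow(c, lam, n * n) - 1) // n * mu % n


def add_encrypted(n, c1, c2):
    """Multiplying ciphertexts adds the underlying plaintexts."""
    return c1 * c2 % (n * n)


# Party B holds the labels; party A holds one feature's bin index per sample.
labels = [1, 0, 1, 1, 0, 0, 1, 0]
bins_at_a = [0, 0, 1, 1, 1, 2, 2, 2]

# B encrypts every label and sends the ciphertexts to A.
n, private_key = keygen()
encrypted_labels = [encrypt(n, y) for y in labels]

# A sums the ciphertexts per bin without ever seeing a label.
encrypted_sums, sample_counts = {}, {}
for b, c in zip(bins_at_a, encrypted_labels):
    encrypted_sums[b] = add_encrypted(n, encrypted_sums[b], c) if b in encrypted_sums else c
    sample_counts[b] = sample_counts.get(b, 0) + 1

# B decrypts the per-bin sums to obtain event and non-event counts.
event_counts = {b: decrypt(n, private_key, c) for b, c in encrypted_sums.items()}
non_event_counts = {b: sample_counts[b] - event_counts[b] for b in sample_counts}
```

Because A only handles ciphertexts, it learns nothing about individual labels, and B only learns aggregated counts per bin rather than A's feature values.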
The multiple-host case is similar to the single-host case. The guest sends its encrypted label information to all hosts, and each host calculates its statistics and sends them back.
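Whether the statistics come from one host or several, the guest's last step is the same: turn each bin's event and non-event counts into WOE and IV. The sketch below uses one common convention, WOE = ln(event rate / non-event rate). The sign convention, the `adjustment` smoothing for empty classes, and the host names are assumptions made for illustration.

```python
import math


def woe_iv(bin_counts, adjustment=0.5):
    """Return (per-bin WOE list, total IV) for one feature.

    bin_counts is a list of (event_count, non_event_count) per bin.
    Assumes the data contains at least one event and one non-event.
    """
    event_total = sum(e for e, _ in bin_counts)
    non_event_total = sum(ne for _, ne in bin_counts)
    woes, iv = [], 0.0
    for e, ne in bin_counts:
        if e == 0 or ne == 0:
            # Smooth the bin so that the logarithm stays finite.
            e, ne = e + adjustment, ne + adjustment
        event_rate = e / event_total
        non_event_rate = ne / non_event_total
        woe = math.log(event_rate / non_event_rate)
        woes.append(woe)
        iv += (event_rate - non_event_rate) * woe
    return woes, iv


# Decrypted per-bin counts returned by two hypothetical hosts.
host_counts = {
    "host_0": [(1, 1), (2, 1), (1, 2)],
    "host_1": [(3, 1), (1, 3)],
}
results = {host: woe_iv(counts) for host, counts in host_counts.items()}
```

Each host's feature is processed independently, so adding more hosts only adds more entries to the loop on the guest side.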
Features
- Support Quantile Binning based on quantile summary algorithm.
- Support Bucket Binning.
- Support calculating WOE and IV values.
- Support transforming data into bin indexes or WOE values (guest only).
- Support multiple-host binning.
- Support asymmetric binning methods on Host & Guest sides.
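As a rough illustration of the transform feature, the sketch below maps a column either to bin indexes or to the WOE value of each bin. The `transform` helper, the split points, and the WOE values are made up for the example and do not reflect this module's actual interface.

```python
from bisect import bisect_left


def transform(column, split_points, woe_values=None):
    """Replace each value by its bin index, or by its bin's WOE if given."""
    indexes = [bisect_left(split_points, v) for v in column]
    if woe_values is None:
        return indexes
    return [woe_values[i] for i in indexes]


split_points = [2, 5, 13]            # three split points define four bins
woe_values = [-0.4, 0.0, 0.3, 0.7]   # hypothetical WOE value per bin
column = [1, 5, 6, 40]

as_indexes = transform(column, split_points)
as_woe = transform(column, split_points, woe_values)
```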
The table below lists supported features with links to examples:
Cases | Scenario |
---|---|
Input Data with Categorical Features | bucket binning, quantile binning |
Output Data Transformed | bin index, woe value (guest only) |
Skip Metrics Calculation | multi_host |