
Hetero Feature Selection

Feature selection is the process of selecting a subset of features for model construction. A well-chosen subset can improve model performance.

In this version, we provide several filter methods for feature selection. Note that the module works in a cascade manner: the features selected by filter A are the input to the next filter B. Users should therefore pay attention to the order in which filters are listed in the filter_methods param of the job configuration.
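
The cascade behaviour can be illustrated with a minimal sketch in plain Python. The two filter functions and the sample data are hypothetical stand-ins, not FATE APIs; the point is only that each filter sees the survivors of the previous one, so a fraction-based filter counts against the remaining columns.

```python
import math


def drop_constant(columns):
    """Hypothetical filter A: drop columns whose values are all identical."""
    return {name: vals for name, vals in columns.items() if len(set(vals)) > 1}


def keep_top_fraction_by_spread(columns, fraction):
    """Hypothetical filter B: keep floor(fraction * n) columns with the largest max - min."""
    n_keep = math.floor(fraction * len(columns))
    ranked = sorted(
        columns,
        key=lambda name: max(columns[name]) - min(columns[name]),
        reverse=True,
    )
    return {name: columns[name] for name in ranked[:n_keep]}


def run_cascade(columns, filters):
    """Apply filters in list order; each one only sees what the previous one kept."""
    for apply_filter in filters:
        columns = apply_filter(columns)
    return columns


data = {"x0": [1, 1, 1], "x1": [0, 5, 9], "x2": [2, 3, 4], "x3": [7, 7, 8]}
result = run_cascade(
    data,
    [drop_constant, lambda cols: keep_top_fraction_by_spread(cols, 0.5)],
)
print(sorted(result))
```

Here the second filter works on the three columns left by the first one, so it keeps floor(0.5 * 3) = 1 column rather than floor(0.5 * 4) = 2.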

Features

The table below lists the available input models and their corresponding filter methods (as parameters in the configuration):

| Isometric Model | Filter Method |
| --- | --- |
| None | manually, percentage_value |
| Binning | iv_filter(threshold), iv_filter(top_k), iv_filter(top_percentile) |
| Statistic | statistic_filter |
| Pearson | correlation_filter (with 'iv' metric & binning model), vif_filter |
| SBT | hetero_sbt_filter, hetero_fast_sbt_filter |
| PSI | psi_filter |

Most of the filter methods above share the same set of configurable parameters. The table below lists their acceptable parameter values.

| Filter Method | Parameter Name | metrics | filter_type | take_high |
| --- | --- | --- | --- | --- |
| IV Filter | iv_param | "iv" | "threshold", "top_k", "top_percentile" | True |
| Statistic Filter | statistic_param | "max", "min", "mean", "median", "stddev", "variance", "coefficient_of_variance", "skewness", "kurtosis", "missing_count", "missing_ratio", quantile (e.g. "95%") | "threshold", "top_k", "top_percentile" | True/False |
| PSI Filter | psi_param | "psi" | "threshold", "top_k", "top_percentile" | False |
| VIF Filter | vif_param | "vif" | "threshold", "top_k", "top_percentile" | False |
| Hetero/Homo/HeteroFast SBT Filter | sbt_param | "feature_importance" | "threshold", "top_k", "top_percentile" | True |

  1. unique_value: Filter out a column if all values in the feature are the same.

  2. iv_filter: Use iv as the criterion to select features. Three modes are supported: threshold value, top-k and top-percentile.

      • threshold value: Filter out columns whose iv is smaller than the threshold. A different threshold can be set for each party.
      • top-k: Sort features by iv in descending order and take the top k features.
      • top-percentile: Sort features by iv in descending order and take the top percentile.

      Besides, a multi-class iv filter is available if multi-class iv has been calculated by an upstream component. Please note that as many ivs are calculated per feature as there are labels, since one-vs-rest is used for multi-class cases. There are three mechanisms to merge them:

      • "min": take the minimum iv among all results.
      • "max": take the maximum iv among all results.
      • "average": take the average among all results.

      After merging, each column has a single iv, so the three modes mentioned above can be used to select features.

  3. statistic_filter: Use statistic values calculated by the DataStatistic component. Coefficient of variance, missing value, percentile value etc. are supported. You can pick the columns with higher or lower statistic values as needed.

  4. psi_filter: Take the PSI component as the input isometric model, then use its psi value as the selection criterion.

  5. hetero_sbt_filter/homo_sbt_filter/hetero_fast_sbt_filter: Take the secureboost component as the input isometric model and use feature importance as the selection criterion.

  6. manually: Indicate the features that need to be filtered.

  7. percentage_value: Filter the columns that have a value that exceeds a certain percentage.
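
As a rough illustration of the threshold, top-k and top-percentile modes, the following is a minimal sketch in plain Python. The function `select_by_metric` and the sample scores are hypothetical and are not part of FATE; keeping values equal to the threshold and ignoring ties are simplifying assumptions of this sketch.

```python
import math


def select_by_metric(scores, filter_type="threshold", threshold=1, take_high=True):
    """Return the feature names kept by one metric-based filter (illustrative only).

    scores: dict mapping feature name -> metric value (e.g. iv or psi).
    """
    # Best features first: descending when taking high values, ascending otherwise.
    ranked = sorted(scores, key=scores.get, reverse=take_high)
    if filter_type == "threshold":
        if take_high:
            # Assumption: values equal to the threshold are kept.
            return [f for f in ranked if scores[f] >= threshold]
        return [f for f in ranked if scores[f] <= threshold]
    if filter_type == "top_k":
        return ranked[:threshold]
    if filter_type == "top_percentile":
        return ranked[:math.floor(threshold * len(ranked))]
    raise ValueError('filter_type should be "threshold", "top_k" or "top_percentile"')


iv = {"age": 0.30, "income": 0.12, "zip": 0.02, "gender": 0.08}
print(select_by_metric(iv, "threshold", 0.1))
print(select_by_metric(iv, "top_k", 3))
print(select_by_metric(iv, "top_percentile", 0.5))
```

Setting `take_high=False` reverses the ranking, which corresponds to filters such as psi where lower values are preferred.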

Besides, we support multi-host federated feature selection for iv filters. Hosts encode their feature names and send the ids of the features involved in feature selection. The guest applies the iv filter logic to decide whether each feature is kept, then sends the result back to the hosts. The hosts decode the feature ids back to feature names and obtain the selection results.

Figure 4: Multi-Host Selection Principle
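
The message flow described above can be sketched in plain Python as follows. All function names, the id format and the way the guest obtains iv values for encoded host features are assumptions made for illustration; the real protocol in FATE involves transfer variables and the upstream binning model that this sketch leaves out.

```python
def host_encode(feature_names):
    """Host side: map opaque ids to real names; only the ids leave the host."""
    return {f"host_feature_{i}": name for i, name in enumerate(feature_names)}


def guest_select(iv_by_id, threshold):
    """Guest side: apply threshold-style iv logic on encoded ids only.

    iv_by_id is assumed to hold the iv values the guest has for the encoded
    host features (for example from an upstream binning step).
    """
    return {fid: iv >= threshold for fid, iv in iv_by_id.items()}


def host_decode(decisions, id_to_name):
    """Host side: translate the guest's per-id decisions back to real names."""
    return {id_to_name[fid]: kept for fid, kept in decisions.items()}


id_to_name = host_encode(["salary", "city"])
decisions = guest_select({"host_feature_0": 0.2, "host_feature_1": 0.01}, threshold=0.1)
selection = host_decode(decisions, id_to_name)
print(selection)
```

The guest never handles the host's real feature names in this flow; it only sees the encoded ids.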

Param

feature_selection_param

Attributes

deprecated_param_list = ['iv_value_param', 'iv_percentile_param', 'iv_top_k_param', 'variance_coe_param', 'unique_param', 'outlier_param'] module-attribute

Classes

UniqueValueParam(eps=1e-05)

Bases: BaseParam

Use the difference between max-value and min-value to judge.

Parameters:

Name Type Description Default
eps float

A column is filtered out if the difference between its maximum and minimum values is smaller than eps.

1e-05
Source code in python/federatedml/param/feature_selection_param.py
def __init__(self, eps=1e-5):
    self.eps = eps
Attributes
eps = eps instance-attribute
Functions
check()
Source code in python/federatedml/param/feature_selection_param.py
def check(self):
    descr = "Unique value param's"
    self.check_positive_number(self.eps, descr)
    return True
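
The max-minus-min rule described for UniqueValueParam can be sketched as below. The helper `is_unique_value_column` is a hypothetical name used for illustration and is not a FATE function.

```python
def is_unique_value_column(values, eps=1e-5):
    """True if the column is treated as constant: max - min is smaller than eps."""
    return (max(values) - min(values)) < eps


print(is_unique_value_column([3.0, 3.0, 3.0]))
print(is_unique_value_column([3.0, 3.1, 2.9]))
```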
IVValueSelectionParam(value_threshold=0.0, host_thresholds=None, local_only=False)

Bases: BaseParam

Use information values to select features.

Parameters:

Name Type Description Default
value_threshold

Used if iv_value_thres method is used in feature selection.

0.0
host_thresholds

Set thresholds for the different hosts. If None, the same threshold as the guest is used. If provided, the order should match the order of the host ids.

None
Source code in python/federatedml/param/feature_selection_param.py
def __init__(self, value_threshold=0.0, host_thresholds=None, local_only=False):
    super().__init__()
    self.value_threshold = value_threshold
    self.host_thresholds = host_thresholds
    self.local_only = local_only
Attributes
value_threshold = value_threshold instance-attribute
host_thresholds = host_thresholds instance-attribute
local_only = local_only instance-attribute
Functions
check()
Source code in python/federatedml/param/feature_selection_param.py
def check(self):
    if not isinstance(self.value_threshold, (float, int)):
        raise ValueError("IV selection param's value_threshold should be float or int")

    if self.host_thresholds is not None:
        if not isinstance(self.host_thresholds, list):
            raise ValueError("IV selection param's host_threshold should be list or None")

    if not isinstance(self.local_only, bool):
        raise ValueError("IV selection param's local_only should be bool")

    return True
IVPercentileSelectionParam(percentile_threshold=1.0, local_only=False)

Bases: BaseParam

Use information values to select features.

Parameters:

Name Type Description Default
percentile_threshold

Percentile threshold for the iv_percentile method; 0 <= percentile_threshold <= 1.0.

1.0
Source code in python/federatedml/param/feature_selection_param.py
def __init__(self, percentile_threshold=1.0, local_only=False):
    super().__init__()
    self.percentile_threshold = percentile_threshold
    self.local_only = local_only
Attributes
percentile_threshold = percentile_threshold instance-attribute
local_only = local_only instance-attribute
Functions
check()
Source code in python/federatedml/param/feature_selection_param.py
def check(self):
    descr = "IV selection param's"
    if self.percentile_threshold != 0 and self.percentile_threshold != 1:
        self.check_decimal_float(self.percentile_threshold, descr)
    self.check_boolean(self.local_only, descr)
    return True
IVTopKParam(k=10, local_only=False)

Bases: BaseParam

Use information values to select features.

Parameters:

Name Type Description Default
k

Number of features with the highest iv to keep; must be a positive integer.

10
Source code in python/federatedml/param/feature_selection_param.py
def __init__(self, k=10, local_only=False):
    super().__init__()
    self.k = k
    self.local_only = local_only
Attributes
k = k instance-attribute
local_only = local_only instance-attribute
Functions
check()
Source code in python/federatedml/param/feature_selection_param.py
def check(self):
    descr = "IV selection param's"
    self.check_positive_integer(self.k, descr)
    self.check_boolean(self.local_only, descr)
    return True
VarianceOfCoeSelectionParam(value_threshold=1.0)

Bases: BaseParam

Use coefficient of variation to select features. When judging, the absolute value will be used.

Parameters:

Name Type Description Default
value_threshold

Used if the coefficient_of_variation_value_thres method is used in feature selection. Columns whose coefficient of variation is smaller than the threshold are filtered out.

1.0
Source code in python/federatedml/param/feature_selection_param.py
def __init__(self, value_threshold=1.0):
    self.value_threshold = value_threshold
Attributes
value_threshold = value_threshold instance-attribute
Functions
check()
Source code in python/federatedml/param/feature_selection_param.py
def check(self):
    descr = "Coff of Variances param's"
    self.check_positive_number(self.value_threshold, descr)
    return True
OutlierColsSelectionParam(percentile=1.0, upper_threshold=1.0)

Bases: BaseParam

Given a percentile and a threshold, check whether the value at that percentile point is larger than the threshold. Columns for which it is larger are filtered out.

Parameters:

Name Type Description Default
percentile

The percentile point to compare.

1.0
upper_threshold

Upper threshold for the value at the given percentile; columns whose value at that percentile exceeds it are filtered out.

1.0
Source code in python/federatedml/param/feature_selection_param.py
def __init__(self, percentile=1.0, upper_threshold=1.0):
    self.percentile = percentile
    self.upper_threshold = upper_threshold
Attributes
percentile = percentile instance-attribute
upper_threshold = upper_threshold instance-attribute
Functions
check()
Source code in python/federatedml/param/feature_selection_param.py
def check(self):
    descr = "Outlier Filter param's"
    self.check_decimal_float(self.percentile, descr)
    self.check_defined_type(self.upper_threshold, descr, ['float', 'int'])
    return True
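
The outlier rule (compare the value at a percentile point with an upper threshold) can be sketched as follows. The helpers are hypothetical, and the nearest-rank quantile definition used here is an assumption; the quantile computation in the actual component may differ.

```python
import math


def value_at_percentile(values, percentile):
    """Nearest-rank quantile: one of several common definitions (an assumption here)."""
    ordered = sorted(values)
    idx = max(0, math.ceil(percentile * len(ordered)) - 1)
    return ordered[idx]


def is_outlier_column(values, percentile=1.0, upper_threshold=1.0):
    """True if the value at the given percentile is larger than upper_threshold."""
    return value_at_percentile(values, percentile) > upper_threshold


column = [1, 2, 3, 4, 100]
print(is_outlier_column(column, percentile=1.0, upper_threshold=50))
print(is_outlier_column(column, percentile=0.5, upper_threshold=50))
```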
CommonFilterParam(metrics, filter_type='threshold', take_high=True, threshold=1, host_thresholds=None, select_federated=True)

Bases: BaseParam

All of the following parameters can be set with a single value or a list of values. A single value means that only one metric is used to filter, while a list means that multiple metrics are used.

Please note that if any of the following values is set as a list, all list-type values must have the same length; otherwise an error will be raised. If any parameter is a list, metrics must also be a list.

Parameters:

Name Type Description Default
metrics

Indicate what metrics are used in this filter

required
filter_type

Should be one of "threshold", "top_k" or "top_percentile"

'threshold'
take_high

Whether to keep the features with the highest values (True) or the lowest values (False) when filtering.

True
threshold

If filter type is "threshold", this is the threshold value. If it is "top_k", this is the k value. If it is "top_percentile", this is the percentile threshold.

1
host_thresholds

Set thresholds for the different hosts. If None, the same threshold as the guest is used. If provided, the order should match the order of the host ids.

None
select_federated

Whether to select features jointly with the other parties (federated) or based on local variables only.

True
Source code in python/federatedml/param/feature_selection_param.py
def __init__(self, metrics, filter_type='threshold', take_high=True, threshold=1,
             host_thresholds=None, select_federated=True):
    super().__init__()
    self.metrics = metrics
    self.filter_type = filter_type
    self.take_high = take_high
    self.threshold = threshold
    self.host_thresholds = host_thresholds
    self.select_federated = select_federated
Attributes
metrics = metrics instance-attribute
filter_type = filter_type instance-attribute
take_high = take_high instance-attribute
threshold = threshold instance-attribute
host_thresholds = host_thresholds instance-attribute
select_federated = select_federated instance-attribute
Functions
check()
Source code in python/federatedml/param/feature_selection_param.py
def check(self):
    self._convert_to_list(param_names=["filter_type", "take_high",
                                       "threshold", "select_federated"])

    for v in self.filter_type:
        if v not in ["threshold", "top_k", "top_percentile"]:
            raise ValueError('filter_type should be one of '
                             '"threshold", "top_k", "top_percentile"')

    descr = "hetero feature selection param's"
    for v in self.take_high:
        self.check_boolean(v, descr)

    for idx, v in enumerate(self.threshold):
        if self.filter_type[idx] == "threshold":
            if not isinstance(v, (float, int)):
                raise ValueError(descr + f"{v} should be a float or int")
        elif self.filter_type[idx] == 'top_k':
            self.check_positive_integer(v, descr)
        else:
            if not (v == 0 or v == 1):
                self.check_decimal_float(v, descr)

    if self.host_thresholds is not None:
        if not isinstance(self.host_thresholds, list):
            self.host_thresholds = [self.host_thresholds]
            # raise ValueError("selection param's host_thresholds should be list or None")

    assert isinstance(self.select_federated, list)
    for v in self.select_federated:
        self.check_boolean(v, descr)
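
To make the single-value versus list convention concrete, here is a hedged sketch of a multi-metric statistic_param written as a plain Python dict, together with a small length check. The dict layout mirrors the constructor arguments shown above; `check_same_length` is a hypothetical helper and not the validation that FATE itself performs.

```python
# Two metrics are filtered in sequence, so every list-valued entry has two elements.
statistic_param = {
    "metrics": ["skewness", "missing_ratio"],
    "filter_type": ["threshold", "top_k"],
    "take_high": [True, False],
    "threshold": [0.5, 10],
}


def check_same_length(param):
    """Raise ValueError if list-valued entries do not all share the same length."""
    lengths = {name: len(value) for name, value in param.items() if isinstance(value, list)}
    if len(set(lengths.values())) > 1:
        raise ValueError(f"list-type parameters must have the same length, got {lengths}")
    return True


print(check_same_length(statistic_param))
```

With this configuration the first metric keeps features by a skewness threshold and the second keeps the 10 features with the lowest missing ratio.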
IVFilterParam(filter_type='threshold', threshold=1, host_thresholds=None, select_federated=True, mul_class_merge_type='average')

Bases: CommonFilterParam

Parameters:

Name Type Description Default
mul_class_merge_type

Indicate how to merge multi-class iv results. Support "average", "min" and "max".

'average'
Source code in python/federatedml/param/feature_selection_param.py
def __init__(self, filter_type='threshold', threshold=1,
             host_thresholds=None, select_federated=True, mul_class_merge_type="average"):
    super().__init__(metrics='iv', filter_type=filter_type, take_high=True, threshold=threshold,
                     host_thresholds=host_thresholds, select_federated=select_federated)
    self.mul_class_merge_type = mul_class_merge_type
Attributes
mul_class_merge_type = mul_class_merge_type instance-attribute
Functions
check()
Source code in python/federatedml/param/feature_selection_param.py
def check(self):
    super(IVFilterParam, self).check()
    self._convert_to_list(param_names=["mul_class_merge_type"])
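
The effect of mul_class_merge_type can be illustrated with a short sketch. The function `merge_multi_class_iv` is a hypothetical helper: it reduces the one-vs-rest iv values of a single feature to one number, after which the usual threshold, top_k or top_percentile logic applies.

```python
def merge_multi_class_iv(iv_per_label, merge_type="average"):
    """Merge the one-vs-rest iv values of one feature into a single iv."""
    if merge_type == "min":
        return min(iv_per_label)
    if merge_type == "max":
        return max(iv_per_label)
    if merge_type == "average":
        return sum(iv_per_label) / len(iv_per_label)
    raise ValueError('merge_type should be "average", "min" or "max"')


# One iv per label for a three-class problem (sample values for illustration).
ivs = [0.25, 0.5, 0.75]
for merge_type in ("min", "max", "average"):
    print(merge_type, merge_multi_class_iv(ivs, merge_type))
```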
CorrelationFilterParam(sort_metric='iv', threshold=0.1, select_federated=True)

Bases: BaseParam

This filter follows these rules:

1. Sort all the columns from high to low based on a specific metric, e.g. iv.
2. Traverse the sorted columns. If there exist other columns whose absolute correlation with the current column is larger than the threshold, those columns are filtered out.

Parameters:

Name Type Description Default
sort_metric

Specify which metric is used to sort features.

'iv'
threshold

Correlation threshold

0.1
select_federated

Whether to select features jointly with the other parties (federated) or based on local variables only.

True
Source code in python/federatedml/param/feature_selection_param.py
def __init__(self, sort_metric='iv', threshold=0.1, select_federated=True):
    super().__init__()
    self.sort_metric = sort_metric
    self.threshold = threshold
    self.select_federated = select_federated
Attributes
sort_metric = sort_metric instance-attribute
threshold = threshold instance-attribute
select_federated = select_federated instance-attribute
Functions
check()
Source code in python/federatedml/param/feature_selection_param.py
def check(self):
    descr = "Correlation Filter param's"

    self.sort_metric = self.sort_metric.lower()
    support_metrics = ['iv']
    if self.sort_metric not in support_metrics:
        raise ValueError(f"sort_metric in Correlation Filter should be one of {support_metrics}")

    self.check_positive_number(self.threshold, descr)
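
The two rules of the correlation filter can be sketched as a greedy pass over the iv-sorted columns. This is a local, plain-Python illustration under stated assumptions: `correlation_filter` is a hypothetical function, correlations are supplied as a dict keyed by unordered column pairs, and missing pairs are treated as uncorrelated.

```python
def correlation_filter(iv, corr, threshold=0.1):
    """Keep high-iv columns and drop lower-ranked columns strongly correlated with them.

    iv: dict column -> iv value.
    corr: dict frozenset({col_a, col_b}) -> correlation coefficient.
    """
    ranked = sorted(iv, key=iv.get, reverse=True)
    removed = set()
    kept = []
    for i, col in enumerate(ranked):
        if col in removed:
            continue
        kept.append(col)
        # Filter every lower-ranked column whose |correlation| with col exceeds the threshold.
        for other in ranked[i + 1:]:
            if abs(corr.get(frozenset((col, other)), 0.0)) > threshold:
                removed.add(other)
    return kept


iv = {"a": 0.5, "b": 0.4, "c": 0.3}
corr = {
    frozenset(("a", "b")): 0.9,
    frozenset(("a", "c")): 0.05,
    frozenset(("b", "c")): -0.8,
}
print(correlation_filter(iv, corr, threshold=0.1))
```

In this example "b" is dropped because of its correlation with the higher-iv column "a"; its own correlation with "c" then no longer matters.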
PercentageValueParam(upper_pct=1.0)

Bases: BaseParam

Filter the columns that have a value that exceeds a certain percentage.

Parameters:

Name Type Description Default
upper_pct

The upper percentage threshold for filtering, upper_pct should not be less than 0.1.

1.0
Source code in python/federatedml/param/feature_selection_param.py
def __init__(self, upper_pct=1.0):
    super().__init__()
    self.upper_pct = upper_pct
Attributes
upper_pct = upper_pct instance-attribute
Functions
check()
Source code in python/federatedml/param/feature_selection_param.py
def check(self):
    descr = "Percentage Filter param's"
    if self.upper_pct not in [0, 1]:
        self.check_decimal_float(self.upper_pct, descr)
    if self.upper_pct < consts.PERCENTAGE_VALUE_LIMIT:
        raise ValueError(descr + f" {self.upper_pct} not supported,"
                                 f" should not be smaller than {consts.PERCENTAGE_VALUE_LIMIT}")
    return True
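
One way to read the percentage value rule is that a column is filtered when its most frequent value accounts for more than upper_pct of the rows. The sketch below follows that reading; `exceeds_percentage` is a hypothetical helper, and the strict comparison at the boundary is an assumption rather than confirmed library behaviour.

```python
from collections import Counter


def exceeds_percentage(values, upper_pct=1.0):
    """True if the share of the most frequent value is larger than upper_pct."""
    most_common_count = Counter(values).most_common(1)[0][1]
    return most_common_count / len(values) > upper_pct


print(exceeds_percentage([0, 0, 0, 0, 1], upper_pct=0.7))
print(exceeds_percentage([0, 1, 2, 3, 4], upper_pct=0.7))
```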
ManuallyFilterParam(filter_out_indexes=None, filter_out_names=None, left_col_indexes=None, left_col_names=None)

Bases: BaseParam

Specify the columns that need to be filtered. If a specified column exists, it is filtered directly; otherwise it is ignored.

Both the filter_out and left parameters work only for this specific filter. For instance, if you set some columns to be left in this filter but those columns are filtered by other filters, they will NOT be left in the final result.

Please note that (left_col_indexes & left_col_names) cannot be used together with (filter_out_indexes & filter_out_names).

Parameters:

Name Type Description Default
filter_out_indexes

Specify the indexes of the columns to be filtered out. Note that the columns specified by filter_out_indexes and filter_out_names will be combined.

None
filter_out_names list of string

Specify the names of the columns to be filtered out. Note that the columns specified by filter_out_indexes and filter_out_names will be combined.

None
left_col_indexes

Specify the indexes of the columns to be left. Note that the columns specified by left_col_indexes and left_col_names will be combined.

None
left_col_names

Specify the names of the columns to be left. Note that the columns specified by left_col_indexes and left_col_names will be combined.

None
Source code in python/federatedml/param/feature_selection_param.py
def __init__(self, filter_out_indexes=None, filter_out_names=None, left_col_indexes=None,
             left_col_names=None):
    super().__init__()
    self.filter_out_indexes = filter_out_indexes
    self.filter_out_names = filter_out_names
    self.left_col_indexes = left_col_indexes
    self.left_col_names = left_col_names
Attributes
filter_out_indexes = filter_out_indexes instance-attribute
filter_out_names = filter_out_names instance-attribute
left_col_indexes = left_col_indexes instance-attribute
left_col_names = left_col_names instance-attribute
Functions
check()
Source code in python/federatedml/param/feature_selection_param.py
def check(self):
    descr = "Manually Filter param's"
    self.check_defined_type(self.filter_out_indexes, descr, ['list', 'NoneType'])
    self.check_defined_type(self.filter_out_names, descr, ['list', 'NoneType'])
    self.check_defined_type(self.left_col_indexes, descr, ['list', 'NoneType'])
    self.check_defined_type(self.left_col_names, descr, ['list', 'NoneType'])

    if (self.filter_out_indexes or self.filter_out_names) is not None and \
            (self.left_col_names or self.left_col_indexes) is not None:
        raise ValueError("(left_col_indexes & left_col_names) cannot use with"
                         " (filter_out_indexes & filter_out_names) simultaneously")
    return True
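
The semantics of the manual filter (indexes and names are combined, unknown columns are ignored, and the two parameter groups are mutually exclusive) can be sketched as follows. `manually_filter` is a hypothetical function operating on a plain header list, not the FATE implementation.

```python
def manually_filter(header, filter_out_indexes=None, filter_out_names=None,
                    left_col_indexes=None, left_col_names=None):
    """Return the columns of header that survive the manual filter (illustrative only)."""
    use_filter_out = filter_out_indexes is not None or filter_out_names is not None
    use_left = left_col_indexes is not None or left_col_names is not None
    if use_filter_out and use_left:
        raise ValueError("left_col_* cannot be used together with filter_out_*")

    if use_left:
        # Combine indexes and names; anything not present in header is ignored.
        keep = {header[i] for i in (left_col_indexes or []) if 0 <= i < len(header)}
        keep |= set(left_col_names or [])
        return [col for col in header if col in keep]

    drop = {header[i] for i in (filter_out_indexes or []) if 0 <= i < len(header)}
    drop |= set(filter_out_names or [])
    return [col for col in header if col not in drop]


header = ["x0", "x1", "x2", "x3"]
print(manually_filter(header, filter_out_indexes=[0], filter_out_names=["x2", "missing"]))
print(manually_filter(header, left_col_indexes=[1], left_col_names=["x3"]))
```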
FeatureSelectionParam(select_col_indexes=-1, select_names=None, filter_methods=None, unique_param=UniqueValueParam(), iv_value_param=IVValueSelectionParam(), iv_percentile_param=IVPercentileSelectionParam(), iv_top_k_param=IVTopKParam(), variance_coe_param=VarianceOfCoeSelectionParam(), outlier_param=OutlierColsSelectionParam(), manually_param=ManuallyFilterParam(), percentage_value_param=PercentageValueParam(), iv_param=IVFilterParam(), statistic_param=CommonFilterParam(metrics=consts.MEAN), psi_param=CommonFilterParam(metrics=consts.PSI, take_high=False), vif_param=CommonFilterParam(metrics=consts.VIF, threshold=5.0, take_high=False), sbt_param=CommonFilterParam(metrics=consts.FEATURE_IMPORTANCE), correlation_param=CorrelationFilterParam(), use_anonymous=False, need_run=True)

Bases: BaseParam

Define the feature selection parameters.

Parameters:

Name Type Description Default
select_col_indexes

Specify which columns need to be calculated. -1 represents all columns. Note that the columns specified by select_col_indexes and select_names will be combined.

-1
select_names list of string

Specify which columns need to be calculated. Each element in the list represents a column name in the header. Note that the columns specified by select_col_indexes and select_names will be combined.

None
filter_methods

List of filter methods to apply, e.g. "manually", "iv_filter", "statistic_filter", "psi_filter", "hetero_sbt_filter", "homo_sbt_filter", "hetero_fast_sbt_filter", "percentage_value", "vif_filter", "correlation_filter"; default: ["manually"]. The following methods will be deprecated in a future version: "unique_value", "iv_value_thres", "iv_percentile", "coefficient_of_variation_value_thres", "outlier_cols". Specifies the filter methods used in feature selection. The filters are applied in the order given in this list. Please note that if a percentile method is used after another filter method, the percentile refers to the ratio of the remaining features. E.g. if you have 10 features at the beginning and 8 remain after the first filter method, then asking for the top 80% highest-iv features selects floor(0.8 * 8) = 6 features rather than floor(0.8 * 10) = 8.

None
unique_param

Filter out a column if all values in the feature are the same.

UniqueValueParam()
iv_value_param

Use information value to filter columns. If this method is set, a float threshold needs to be provided. Columns whose iv is smaller than the threshold are filtered out. Will be deprecated in the future.

IVValueSelectionParam()
iv_percentile_param

Use information value to filter columns. If this method is set, a float ratio threshold needs to be provided. Pick the floor(ratio * feature_num) features with the highest iv. If multiple features around the threshold have the same iv, all of those columns will be kept. Will be deprecated in the future.

IVPercentileSelectionParam()
variance_coe_param

Use coefficient of variation to judge whether a column is filtered or not. Will be deprecated in the future.

VarianceOfCoeSelectionParam()
outlier_param

Filter out columns whose value at a certain percentile is larger than a threshold. Will be deprecated in the future.

OutlierColsSelectionParam()
percentage_value_param

Filter the columns that have a value that exceeds a certain percentage.

PercentageValueParam()
iv_param

Set how to filter based on iv. It supports take-high mode only. All of "threshold", "top_k" and "top_percentile" are accepted. See CommonFilterParam for more details. To use this filter, the hetero-feature-binning module has to be provided.

IVFilterParam()
statistic_param

Set how to filter based on statistic values. All of "threshold", "top_k" and "top_percentile" are accepted. See CommonFilterParam for more details. To use this filter, the data_statistic module has to be provided.

CommonFilterParam(metrics=consts.MEAN)
psi_param

Set how to filter based on psi values. All of "threshold", "top_k" and "top_percentile" are accepted. Its take_high property should be False so that features with lower psi are chosen. See CommonFilterParam for more details. To use this filter, the PSI module has to be provided.

CommonFilterParam(metrics=consts.PSI, take_high=False)
use_anonymous

Whether to interpret 'select_names' as anonymous names.

False
need_run

Indicates whether this module needs to be run.

True
Source code in python/federatedml/param/feature_selection_param.py
def __init__(self, select_col_indexes=-1, select_names=None, filter_methods=None,
             unique_param=UniqueValueParam(),
             iv_value_param=IVValueSelectionParam(),
             iv_percentile_param=IVPercentileSelectionParam(),
             iv_top_k_param=IVTopKParam(),
             variance_coe_param=VarianceOfCoeSelectionParam(),
             outlier_param=OutlierColsSelectionParam(),
             manually_param=ManuallyFilterParam(),
             percentage_value_param=PercentageValueParam(),
             iv_param=IVFilterParam(),
             statistic_param=CommonFilterParam(metrics=consts.MEAN),
             psi_param=CommonFilterParam(metrics=consts.PSI,
                                         take_high=False),
             vif_param=CommonFilterParam(metrics=consts.VIF,
                                         threshold=5.0,
                                         take_high=False),
             sbt_param=CommonFilterParam(metrics=consts.FEATURE_IMPORTANCE),
             correlation_param=CorrelationFilterParam(),
             use_anonymous=False,
             need_run=True
             ):
    super(FeatureSelectionParam, self).__init__()
    self.correlation_param = correlation_param
    self.vif_param = vif_param
    self.select_col_indexes = select_col_indexes
    if select_names is None:
        self.select_names = []
    else:
        self.select_names = select_names
    if filter_methods is None:
        self.filter_methods = [consts.MANUALLY_FILTER]
    else:
        self.filter_methods = filter_methods

    # deprecate in the future
    self.unique_param = copy.deepcopy(unique_param)
    self.iv_value_param = copy.deepcopy(iv_value_param)
    self.iv_percentile_param = copy.deepcopy(iv_percentile_param)
    self.iv_top_k_param = copy.deepcopy(iv_top_k_param)
    self.variance_coe_param = copy.deepcopy(variance_coe_param)
    self.outlier_param = copy.deepcopy(outlier_param)
    self.percentage_value_param = copy.deepcopy(percentage_value_param)

    self.manually_param = copy.deepcopy(manually_param)
    self.iv_param = copy.deepcopy(iv_param)
    self.statistic_param = copy.deepcopy(statistic_param)
    self.psi_param = copy.deepcopy(psi_param)
    self.sbt_param = copy.deepcopy(sbt_param)
    self.need_run = need_run
    self.use_anonymous = use_anonymous
Attributes
correlation_param = correlation_param instance-attribute
vif_param = vif_param instance-attribute
select_col_indexes = select_col_indexes instance-attribute
select_names = [] instance-attribute
filter_methods = [consts.MANUALLY_FILTER] instance-attribute
unique_param = copy.deepcopy(unique_param) instance-attribute
iv_value_param = copy.deepcopy(iv_value_param) instance-attribute
iv_percentile_param = copy.deepcopy(iv_percentile_param) instance-attribute
iv_top_k_param = copy.deepcopy(iv_top_k_param) instance-attribute
variance_coe_param = copy.deepcopy(variance_coe_param) instance-attribute
outlier_param = copy.deepcopy(outlier_param) instance-attribute
percentage_value_param = copy.deepcopy(percentage_value_param) instance-attribute
manually_param = copy.deepcopy(manually_param) instance-attribute
iv_param = copy.deepcopy(iv_param) instance-attribute
statistic_param = copy.deepcopy(statistic_param) instance-attribute
psi_param = copy.deepcopy(psi_param) instance-attribute
sbt_param = copy.deepcopy(sbt_param) instance-attribute
need_run = need_run instance-attribute
use_anonymous = use_anonymous instance-attribute
Functions
check()
Source code in python/federatedml/param/feature_selection_param.py
def check(self):
    descr = "hetero feature selection param's"

    self.check_defined_type(self.filter_methods, descr, ['list'])

    for idx, method in enumerate(self.filter_methods):
        method = method.lower()
        self.check_valid_value(method, descr, [consts.UNIQUE_VALUE, consts.IV_VALUE_THRES, consts.IV_PERCENTILE,
                                               consts.COEFFICIENT_OF_VARIATION_VALUE_THRES, consts.OUTLIER_COLS,
                                               consts.MANUALLY_FILTER, consts.PERCENTAGE_VALUE,
                                               consts.IV_FILTER, consts.STATISTIC_FILTER, consts.IV_TOP_K,
                                               consts.PSI_FILTER, consts.HETERO_SBT_FILTER,
                                               consts.HOMO_SBT_FILTER, consts.HETERO_FAST_SBT_FILTER,
                                               consts.VIF_FILTER, consts.CORRELATION_FILTER])

        self.filter_methods[idx] = method

    self.check_defined_type(self.select_col_indexes, descr, ['list', 'int'])

    self.unique_param.check()
    self.iv_value_param.check()
    self.iv_percentile_param.check()
    self.iv_top_k_param.check()
    self.variance_coe_param.check()
    self.outlier_param.check()
    self.manually_param.check()
    self.percentage_value_param.check()

    self.iv_param.check()
    for th in self.iv_param.take_high:
        if not th:
            raise ValueError("Iv filter should take higher iv features")
    for m in self.iv_param.metrics:
        if m != consts.IV:
            raise ValueError("For iv filter, metrics should be 'iv'")

    self.statistic_param.check()
    self.psi_param.check()
    for th in self.psi_param.take_high:
        if th:
            raise ValueError("PSI filter should take lower psi features")
    for m in self.psi_param.metrics:
        if m != consts.PSI:
            raise ValueError("For psi filter, metrics should be 'psi'")

    self.sbt_param.check()
    for th in self.sbt_param.take_high:
        if not th:
            raise ValueError("SBT filter should take higher feature_importance features")
    for m in self.sbt_param.metrics:
        if m != consts.FEATURE_IMPORTANCE:
            raise ValueError("For SBT filter, metrics should be 'feature_importance'")

    self.vif_param.check()
    for m in self.vif_param.metrics:
        if m != consts.VIF:
            raise ValueError("For VIF filter, metrics should be 'vif'")

    self.correlation_param.check()
    self.check_boolean(self.use_anonymous, f"{descr} use_anonymous")

    self._warn_to_deprecate_param("iv_value_param", descr, "iv_param")
    self._warn_to_deprecate_param("iv_percentile_param", descr, "iv_param")
    self._warn_to_deprecate_param("iv_top_k_param", descr, "iv_param")
    self._warn_to_deprecate_param("variance_coe_param", descr, "statistic_param")
    self._warn_to_deprecate_param("unique_param", descr, "statistic_param")
    self._warn_to_deprecate_param("outlier_param", descr, "statistic_param")
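
Putting the pieces together, a component configuration might look like the dict below. This is a hedged sketch: the key names follow the constructor arguments documented above, the column name "x1" and the threshold values are made up, and `normalize_filter_methods` is a hypothetical helper that only imitates the lowercase-and-validate step of check() using the non-deprecated method names listed on this page.

```python
SUPPORTED_METHODS = {
    "manually", "percentage_value", "iv_filter", "statistic_filter", "psi_filter",
    "hetero_sbt_filter", "homo_sbt_filter", "hetero_fast_sbt_filter",
    "vif_filter", "correlation_filter",
}


def normalize_filter_methods(filter_methods):
    """Lowercase each method name and reject names outside SUPPORTED_METHODS."""
    normalized = []
    for method in filter_methods:
        method = method.lower()
        if method not in SUPPORTED_METHODS:
            raise ValueError(f"unsupported filter method: {method}")
        normalized.append(method)
    return normalized


# Filters run in list order: manual removal first, then iv, then a statistic filter.
feature_selection_conf = {
    "select_col_indexes": -1,
    "filter_methods": ["manually", "IV_Filter", "statistic_filter"],
    "manually_param": {"filter_out_names": ["x1"]},
    "iv_param": {"filter_type": "threshold", "threshold": 0.1},
    "statistic_param": {
        "metrics": "skewness",
        "filter_type": "top_percentile",
        "threshold": 0.8,
        "take_high": False,
    },
}
feature_selection_conf["filter_methods"] = normalize_filter_methods(
    feature_selection_conf["filter_methods"]
)
print(feature_selection_conf["filter_methods"])
```

Because the statistic filter runs last, its top_percentile threshold of 0.8 applies to the features that remain after the manual and iv filters.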

Functions


Last updated: 2022-08-16