
Federated Machine Learning

The Federatedml module provides federated implementations of many common machine learning algorithms. All modules are developed in a decoupled, modular way to make them easy to extend. Specifically, we provide:

  1. Federated statistics: including private set intersection, union, Pearson correlation, PSI (population stability index), etc.
  2. Federated information retrieval: OT-based PIR (SIR).
  3. Federated feature engineering: including federated sampling, federated feature binning, federated feature selection, etc.
  4. Federated machine learning algorithms: including horizontal (homo) and vertical (hetero) federated LR, GBDT, DNN, transfer learning, unsupervised learning, vertical semi-supervised learning, etc.
  5. Model evaluation: binary classification, multi-class classification, regression, and clustering evaluation, as well as comparison between federated and local models.
  6. Secure protocols: a variety of secure protocols for more secure multi-party interactive computation.

Algorithm List

| Algorithm | Module Name | Description | Data Input | Data Output | Model Input | Model Output |
| --- | --- | --- | --- | --- | --- | --- |
| DataTransform | DataTransform | Transforms raw data into Instance objects. | Table, values are raw data | Transformed Table, values are Data Instance objects | | DataTransform model |
| Intersect | Intersection | Computes the intersection of two parties' datasets without leaking any information about the non-intersecting data. Mainly used in hetero (vertical) tasks. | Table | Intersection of the two parties' Tables | | Intersect model |
| Federated Sampling | FederatedSample | Samples data federatedly so that the data distribution becomes balanced across parties. Supports both standalone and cluster deployments. | Table | Sampled data; both random and stratified sampling are supported | | |
| Feature Scale | FeatureScale | Feature normalization and standardization. | Table, values are Instance | Transformed Table | | Transformation coefficients, e.g. min/max, mean/standard deviation |
| Hetero Feature Binning | HeteroFeatureBinning | Bins the input data, computes each column's IV and WOE, and transforms the data using the binning information. | Table, with label y on guest and without label y on host | Transformed Table | | IV/WOE, split points, event counts, non-event counts, etc. per column |
| Homo Feature Binning | HomoFeatureBinning | Computes equal-frequency binning in the homo (horizontal) scenario. | Table | Transformed Table | | Split points per column |
| OneHot Encoder | OneHotEncoder | Transforms a column into one-hot format. | Table, values are Instance | Transformed Table with new column names | | Mapping from original column names and feature values to new column names |
| Hetero Feature Selection | HeteroFeatureSelection | Provides multiple types of filters; each filter selects columns according to user configuration. | Table, values are Instance | Transformed Table with new header and filtered data instances | A hetero_binning model is required when IV filters are used | Whether each column is kept |
| Union | Union | Merges multiple Tables into one. | Tables | Merged Table | | |
| Hetero-LR | HeteroLR | Builds a hetero (vertical) logistic regression model across multiple parties. | Table, values are Instance | | | Logistic regression model, consisting of the model itself and its parameters |
| Local Baseline | LocalBaseline | Runs a sklearn logistic regression model with local data. | Table, values are Instance | | | |
| Hetero-LinR | HeteroLinR | Builds a hetero linear regression model across multiple parties. | Table, values are Instance | | | Linear regression model, consisting of the model itself and its parameters |
| Hetero-Poisson | HeteroPoisson | Builds a hetero Poisson regression model across multiple parties. | Table, values are Instance | | | Poisson regression model, consisting of the model itself and its parameters |
| Homo-LR | HomoLR | Builds a homo (horizontal) logistic regression model across multiple parties. | Table, values are Instance | | | Logistic regression model, consisting of the model itself and its parameters |
| Homo-NN | HomoNN | Builds a homo neural network model across multiple parties. | Table, values are Instance | | | Neural network model, consisting of the model itself and its parameters |
| Hetero Secure Boosting | HeteroSecureBoost | Builds a hetero SecureBoost model across multiple parties. | Table, values are Instance | | | SecureBoost model, consisting of the model itself and its parameters |
| Hetero Fast Secure Boosting | HeteroFastSecureBoost | Builds tree models quickly in layered/mix mode. | Table, values are Instance | Table, values are Instance | | FastSecureBoost model |
| Evaluation | Evaluation | Outputs model evaluation metrics for the user. | Table(s), values are Instance | | | |
| Hetero Pearson | HeteroPearson | Computes Pearson correlation coefficients of features from different parties. | Table, values are Instance | | | |
| Hetero-NN | HeteroNN | Builds a hetero neural network model. | Table, values are Instance | | | Hetero neural network model |
| Homo Secure Boosting | HomoSecureBoost | Builds a homo SecureBoost model across multiple parties. | Table, values are Instance | | | SecureBoost model, consisting of the model itself and its parameters |
| Homo OneHot Encoder | HomoOneHotEncoder | Transforms a column into one-hot format. | Table, values are Instance | Transformed Table with new column names | | Mapping from original column names and feature values to new column names |
| Hetero Data Split | HeteroDataSplit | Splits the input dataset into three sub-datasets by user-defined ratios or sample counts. | Table, values are Instance | 3 Tables | | |
| Homo Data Split | HomoDataSplit | Splits the input dataset into three sub-datasets by user-defined ratios or sample counts. | Table, values are Instance | 3 Tables | | |
| Column Expand | ColumnExpand | Appends columns of arbitrary constant values to the original Table. | Table, values are raw data | Transformed Table with new columns and column names | | Column Expand model |
| Secure Information Retrieval | SecureInformationRetrieval | Securely retrieves target values via oblivious transfer. | Table, values are Instance | Table, values are the retrieved values | | |
| Hetero Federated Transfer Learning | FTL | Builds a federated transfer learning model between two parties. | Table, values are Instance | | | FTL neural network model parameters, etc. |
| PSI | PSI | Computes the PSI value between the features of two Tables. | Table, values are Instance | | | PSI results |
| Hetero KMeans | HeteroKMeans | Builds a K-means model. | Table, values are Instance | Table, values are Instance; the Arbiter outputs 2 Tables | | Hetero KMeans model |
| Data Statistics | DataStatistics | Computes statistics over the data, including mean, max/min, median, etc. | Table, values are Instance | Table | | Statistic Result |
| Scorecard | Scorecard | Transforms binary classification prediction scores into credit scores. | Table, values are binary prediction results | Table, values are the transformed credit scores | | |
| Sample Weight | SampleWeight | Weights input instances according to user settings. | Table, values are Instance | Table, values are weighted Instances | | SampleWeight model |
| Feldman Verifiable Sum | FeldmanVerifiableSum | Sums private data from multiple parties without exposing it. | Table, values are the addends | Table, values are the sum | | |
| Feature Imputation | FeatureImputation | Fills missing feature values with a specified method or value. | Table, values are Instance | Table, values are imputed Instances | | FeatureImputation model |
| Label Transform | LabelTransform | Transforms label values of input data and prediction results. | Table, values are Instance or prediction results | Table, values are Instances or prediction results with transformed labels | | LabelTransform model |
| Hetero SSHE Logistic Regression | HeteroSSHELR | Builds a two-party hetero logistic regression model without a trusted third party. | Table, values are Instance | Table, values are Instance | | SSHE LR model |
| Hetero SSHE Linear Regression | HeteroSSHELinR | Builds a two-party hetero linear regression model without a trusted third party. | Table, values are Instance | Table, values are Instance | | SSHE LinR model |
| Positive Unlabeled Learning | PositiveUnlabeled | Builds a positive-unlabeled learning (PU learning) model. | Table, values are Instance | Table, values are Instance | | |

Secure Protocol

Algorithm Parameters

param

Attributes

__all__ = ['BoostingParam', 'ObjectiveParam', 'DecisionTreeParam', 'CrossValidationParam', 'DataSplitParam', 'DataIOParam', 'DataTransformParam', 'EncryptParam', 'EncryptedModeCalculatorParam', 'FeatureBinningParam', 'FeatureSelectionParam', 'FTLParam', 'HeteroNNParam', 'HomoNNParam', 'HomoOneHotParam', 'InitParam', 'IntersectParam', 'EncodeParam', 'RSAParam', 'LinearParam', 'LocalBaselineParam', 'LogisticParam', 'OneVsRestParam', 'PearsonParam', 'PoissonParam', 'PositiveUnlabeledParam', 'PredictParam', 'PSIParam', 'SampleParam', 'ScaleParam', 'SecureAddExampleParam', 'StochasticQuasiNewtonParam', 'StatisticsParam', 'StepwiseParam', 'UnionParam', 'ColumnExpandParam', 'KmeansParam', 'ScorecardParam', 'SecureInformationRetrievalParam', 'SampleWeightParam', 'FeldmanVerifiableSumParam', 'EvaluateParam'] module-attribute

Classes

PSIParam(max_bin_num=20, need_run=True, dense_missing_val=None, binning_error=consts.DEFAULT_RELATIVE_ERROR)

Bases: BaseParam

Source code in python/federatedml/param/psi_param.py
```python
def __init__(self, max_bin_num=20, need_run=True, dense_missing_val=None,
             binning_error=consts.DEFAULT_RELATIVE_ERROR):
    super(PSIParam, self).__init__()
    self.max_bin_num = max_bin_num
    self.need_run = need_run
    self.dense_missing_val = dense_missing_val
    self.binning_error = binning_error
```
Attributes
max_bin_num = max_bin_num instance-attribute
need_run = need_run instance-attribute
dense_missing_val = dense_missing_val instance-attribute
binning_error = binning_error instance-attribute
Functions
check()
Source code in python/federatedml/param/psi_param.py
```python
def check(self):
    assert isinstance(self.max_bin_num, int) and self.max_bin_num > 0, 'max bin must be an integer larger than 0'
    assert isinstance(self.need_run, bool)

    if self.dense_missing_val is not None:
        assert isinstance(self.dense_missing_val, str) or isinstance(self.dense_missing_val, int) or \
            isinstance(self.dense_missing_val, float), \
            'missing value type {} not supported'.format(type(self.dense_missing_val))

    self.check_decimal_float(self.binning_error, "psi's param")
```
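A minimal usage sketch of the param API above (assumes FATE's `federatedml` package is installed; the values are illustrative only):

```python
from federatedml.param import PSIParam

# Bin each feature into at most 30 buckets and treat -1 as the missing value
param = PSIParam(max_bin_num=30, dense_missing_val=-1)
param.check()  # raises AssertionError if a setting is invalid, e.g. max_bin_num <= 0
```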
HomoOneHotParam(transform_col_indexes=-1, transform_col_names=None, need_run=True, need_alignment=True)

Bases: BaseParam

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| transform_col_indexes | | Specify which columns need to be calculated; -1 represents all columns. | -1 |
| need_run | | Indicate if this module needs to be run. | True |
| need_alignment | | Indicate whether alignment of features is turned on. | True |
Source code in python/federatedml/param/homo_onehot_encoder_param.py
```python
def __init__(self, transform_col_indexes=-1, transform_col_names=None, need_run=True, need_alignment=True):
    super(HomoOneHotParam, self).__init__()
    self.transform_col_indexes = transform_col_indexes
    self.transform_col_names = transform_col_names
    self.need_run = need_run
    self.need_alignment = need_alignment
```
Attributes
transform_col_indexes = transform_col_indexes instance-attribute
transform_col_names = transform_col_names instance-attribute
need_run = need_run instance-attribute
need_alignment = need_alignment instance-attribute
Functions
check()
Source code in python/federatedml/param/homo_onehot_encoder_param.py
```python
def check(self):
    descr = "One-hot encoder with alignment param's"
    self.check_defined_type(self.transform_col_indexes, descr, ['list', 'int'])
    self.check_boolean(self.need_run, descr)
    self.check_boolean(self.need_alignment, descr)

    self.transform_col_names = [] if self.transform_col_names is None else self.transform_col_names
    return True
```
DataIOParam(input_format='dense', delimitor=',', data_type='float64', exclusive_data_type=None, tag_with_value=False, tag_value_delimitor=':', missing_fill=False, default_value=0, missing_fill_method=None, missing_impute=None, outlier_replace=False, outlier_replace_method=None, outlier_impute=None, outlier_replace_value=0, with_label=False, label_name='y', label_type='int', output_format='dense', need_run=True)

Bases: BaseParam

Define the dataio parameters used in federated ml.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| input_format | | See the "DataIO" section of federatedml/util/README.md. Dense input data should be set to "dense", svm-light input data to "sparse", and tag or tag:value input data to "tag". | 'dense' |
| delimitor | str | The delimiter of the data input. | ',' |
| data_type | | The data type of the data input. | 'float64' |
| exclusive_data_type | dict | Keys are column names and values are data types; used to specify special data types for some features. | None |
| tag_with_value | | Used when input_format is 'tag'. If True, the input column format should be tag[delimitor]value; otherwise tag only. | False |
| tag_value_delimitor | | Used when input_format is 'tag' and tag_with_value is True; the delimiter inside a tag[delimitor]value column value. | ':' |
| missing_fill | bool | Whether to fill missing values; only True/False accepted. | False |
| default_value | None or object or list | The value used to replace missing values. If None, the default defined in federatedml/feature/imputer.py is used; if a single object, missing values are filled with that object; if a list, its length should equal the feature dimension of the input data, and a column's missing values are replaced by the element at the same position in the list. | 0 |
| missing_fill_method | | The method used to replace missing values. | None |
| missing_impute | | Defines which values are considered missing; list elements can be of any type, or auto-generated when the value is None. | None |
| outlier_replace | | Whether to replace outlier values; only True/False accepted. | False |
| outlier_replace_method | | The method used to replace outlier values. | None |
| outlier_impute | | Defines which values are regarded as outliers; list elements can be of any type. | None |
| outlier_replace_value | None or object or list | The value used to replace outliers; same semantics as default_value. | 0 |
| with_label | bool | True if the input data contains a label, False otherwise. | False |
| label_name | str | Name of the column where the label is located; only used with dense input format. | 'y' |
| label_type | | Used when with_label is True. | 'int' |
| output_format | | Output format. | 'dense' |
Source code in python/federatedml/param/dataio_param.py
```python
def __init__(self, input_format="dense", delimitor=',', data_type='float64',
             exclusive_data_type=None,
             tag_with_value=False, tag_value_delimitor=":",
             missing_fill=False, default_value=0, missing_fill_method=None,
             missing_impute=None, outlier_replace=False, outlier_replace_method=None,
             outlier_impute=None, outlier_replace_value=0,
             with_label=False, label_name='y',
             label_type='int', output_format='dense', need_run=True):
    self.input_format = input_format
    self.delimitor = delimitor
    self.data_type = data_type
    self.exclusive_data_type = exclusive_data_type
    self.tag_with_value = tag_with_value
    self.tag_value_delimitor = tag_value_delimitor
    self.missing_fill = missing_fill
    self.default_value = default_value
    self.missing_fill_method = missing_fill_method
    self.missing_impute = missing_impute
    self.outlier_replace = outlier_replace
    self.outlier_replace_method = outlier_replace_method
    self.outlier_impute = outlier_impute
    self.outlier_replace_value = outlier_replace_value
    self.with_label = with_label
    self.label_name = label_name
    self.label_type = label_type
    self.output_format = output_format
    self.need_run = need_run
```
Attributes
input_format = input_format instance-attribute
delimitor = delimitor instance-attribute
data_type = data_type instance-attribute
exclusive_data_type = exclusive_data_type instance-attribute
tag_with_value = tag_with_value instance-attribute
tag_value_delimitor = tag_value_delimitor instance-attribute
missing_fill = missing_fill instance-attribute
default_value = default_value instance-attribute
missing_fill_method = missing_fill_method instance-attribute
missing_impute = missing_impute instance-attribute
outlier_replace = outlier_replace instance-attribute
outlier_replace_method = outlier_replace_method instance-attribute
outlier_impute = outlier_impute instance-attribute
outlier_replace_value = outlier_replace_value instance-attribute
with_label = with_label instance-attribute
label_name = label_name instance-attribute
label_type = label_type instance-attribute
output_format = output_format instance-attribute
need_run = need_run instance-attribute
Functions
check()
Source code in python/federatedml/param/dataio_param.py
```python
def check(self):

    descr = "dataio param's"

    self.input_format = self.check_and_change_lower(self.input_format,
                                                    ["dense", "sparse", "tag"],
                                                    descr)

    self.output_format = self.check_and_change_lower(self.output_format,
                                                     ["dense", "sparse"],
                                                     descr)

    self.data_type = self.check_and_change_lower(self.data_type,
                                                 ["int", "int64", "float", "float64", "str", "long"],
                                                 descr)

    if type(self.missing_fill).__name__ != 'bool':
        raise ValueError("dataio param's missing_fill {} not supported".format(self.missing_fill))

    if self.missing_fill_method is not None:
        self.missing_fill_method = self.check_and_change_lower(self.missing_fill_method,
                                                               ['min', 'max', 'mean', 'designated'],
                                                               descr)

    if self.outlier_replace_method is not None:
        self.outlier_replace_method = self.check_and_change_lower(self.outlier_replace_method,
                                                                  ['min', 'max', 'mean', 'designated'],
                                                                  descr)

    if type(self.with_label).__name__ != 'bool':
        raise ValueError("dataio param's with_label {} not supported".format(self.with_label))

    if self.with_label:
        if not isinstance(self.label_name, str):
            raise ValueError("dataio param's label_name {} should be str".format(self.label_name))

        self.label_type = self.check_and_change_lower(self.label_type,
                                                      ["int", "int64", "float", "float64", "str", "long"],
                                                      descr)

    if self.exclusive_data_type is not None and not isinstance(self.exclusive_data_type, dict):
        raise ValueError("exclusive_data_type should be None or a dict")

    return True
```
DataTransformParam(input_format='dense', delimitor=',', data_type='float64', exclusive_data_type=None, tag_with_value=False, tag_value_delimitor=':', missing_fill=False, default_value=0, missing_fill_method=None, missing_impute=None, outlier_replace=False, outlier_replace_method=None, outlier_impute=None, outlier_replace_value=0, with_label=False, label_name='y', label_type='int', output_format='dense', need_run=True, with_match_id=False, match_id_name='', match_id_index=0)

Bases: BaseParam

Define the data transform parameters used in federated ml.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| input_format | | See the "DataTransform" section of federatedml/util/README.md. Dense input data should be set to "dense", svm-light input data to "sparse", and tag or tag:value input data to "tag". Note: in FATE >= 1.9.0, this param can be used in the meta of uploading/binding data. | 'dense' |
| delimitor | str | The delimiter of the data input. | ',' |
| data_type | str | One of {'float64', 'float', 'int', 'int64', 'str', 'long'}; the data type of the data input. | 'float64' |
| exclusive_data_type | dict | Keys are column names and values are data types; used to specify special data types for some features. | None |
| tag_with_value | | Used when input_format is 'tag'. If True, the input column format should be tag[delimitor]value; otherwise tag only. | False |
| tag_value_delimitor | | Used when input_format is 'tag' and tag_with_value is True; the delimiter inside a tag[delimitor]value column value. | ':' |
| missing_fill | bool | Whether to fill missing values; only True/False accepted. | False |
| default_value | None or object or list | The value used to replace missing values. If None, the default defined in federatedml/feature/imputer.py is used; if a single object, missing values are filled with that object; if a list, its length should equal the feature dimension of the input data, and a column's missing values are replaced by the element at the same position in the list. | 0 |
| missing_fill_method | | The method used to replace missing values; one of [None, 'min', 'max', 'mean', 'designated']. | None |
| missing_impute | | Defines which values are considered missing; list elements can be of any type, or auto-generated when the value is None. | None |
| outlier_replace | | Whether to replace outlier values; only True/False accepted. | False |
| outlier_replace_method | | The method used to replace outlier values; one of [None, 'min', 'max', 'mean', 'designated']. | None |
| outlier_impute | | Defines which values are regarded as outliers; list elements can be of any type. | None |
| outlier_replace_value | | The value used to replace outliers; same semantics as default_value. | 0 |
| with_label | bool | True if the input data contains a label, False otherwise. Note: in FATE >= 1.9.0, this param can be used in the meta of uploading/binding data. | False |
| label_name | str | Name of the column where the label is located; only used with dense input format. | 'y' |
| label_type | str | One of 'int', 'int64', 'float', 'float64', 'long', 'str'; used when with_label is True. | 'int' |
| output_format | | Output format. | 'dense' |
| with_match_id | | True if the dataset has a match_id. Note: in FATE >= 1.9.0, this param can be used in the meta of uploading/binding data. | False |
| match_id_name | | Valid when input_format is "dense" and multiple columns are considered match ids; the name of the match_id to use in the current job. Note: in FATE >= 1.9.0, this param can be used in the meta of uploading/binding data. | '' |
| match_id_index | | Valid when input_format is "tag" or "sparse" and multiple columns are considered match ids; the index of the match_id. This param only works when the data meta has been set at uploading/binding. | 0 |
Source code in python/federatedml/param/data_transform_param.py
```python
def __init__(self, input_format="dense", delimitor=',', data_type='float64',
             exclusive_data_type=None,
             tag_with_value=False, tag_value_delimitor=":",
             missing_fill=False, default_value=0, missing_fill_method=None,
             missing_impute=None, outlier_replace=False, outlier_replace_method=None,
             outlier_impute=None, outlier_replace_value=0,
             with_label=False, label_name='y',
             label_type='int', output_format='dense', need_run=True,
             with_match_id=False, match_id_name='', match_id_index=0):
    self.input_format = input_format
    self.delimitor = delimitor
    self.data_type = data_type
    self.exclusive_data_type = exclusive_data_type
    self.tag_with_value = tag_with_value
    self.tag_value_delimitor = tag_value_delimitor
    self.missing_fill = missing_fill
    self.default_value = default_value
    self.missing_fill_method = missing_fill_method
    self.missing_impute = missing_impute
    self.outlier_replace = outlier_replace
    self.outlier_replace_method = outlier_replace_method
    self.outlier_impute = outlier_impute
    self.outlier_replace_value = outlier_replace_value
    self.with_label = with_label
    self.label_name = label_name
    self.label_type = label_type
    self.output_format = output_format
    self.need_run = need_run
    self.with_match_id = with_match_id
    self.match_id_name = match_id_name
    self.match_id_index = match_id_index
```
Attributes
input_format = input_format instance-attribute
delimitor = delimitor instance-attribute
data_type = data_type instance-attribute
exclusive_data_type = exclusive_data_type instance-attribute
tag_with_value = tag_with_value instance-attribute
tag_value_delimitor = tag_value_delimitor instance-attribute
missing_fill = missing_fill instance-attribute
default_value = default_value instance-attribute
missing_fill_method = missing_fill_method instance-attribute
missing_impute = missing_impute instance-attribute
outlier_replace = outlier_replace instance-attribute
outlier_replace_method = outlier_replace_method instance-attribute
outlier_impute = outlier_impute instance-attribute
outlier_replace_value = outlier_replace_value instance-attribute
with_label = with_label instance-attribute
label_name = label_name instance-attribute
label_type = label_type instance-attribute
output_format = output_format instance-attribute
need_run = need_run instance-attribute
with_match_id = with_match_id instance-attribute
match_id_name = match_id_name instance-attribute
match_id_index = match_id_index instance-attribute
Functions
check()
Source code in python/federatedml/param/data_transform_param.py
```python
def check(self):

    descr = "data_transform param's"

    self.input_format = self.check_and_change_lower(self.input_format,
                                                    ["dense", "sparse", "tag"],
                                                    descr)

    self.output_format = self.check_and_change_lower(self.output_format,
                                                     ["dense", "sparse"],
                                                     descr)

    self.data_type = self.check_and_change_lower(self.data_type,
                                                 ["int", "int64", "float", "float64", "str", "long"],
                                                 descr)

    if type(self.missing_fill).__name__ != 'bool':
        raise ValueError("data_transform param's missing_fill {} not supported".format(self.missing_fill))

    if self.missing_fill_method is not None:
        self.missing_fill_method = self.check_and_change_lower(self.missing_fill_method,
                                                               ['min', 'max', 'mean', 'designated'],
                                                               descr)

    if self.outlier_replace_method is not None:
        self.outlier_replace_method = self.check_and_change_lower(self.outlier_replace_method,
                                                                  ['min', 'max', 'mean', 'designated'],
                                                                  descr)

    if type(self.with_label).__name__ != 'bool':
        raise ValueError("data_transform param's with_label {} not supported".format(self.with_label))

    if self.with_label:
        if not isinstance(self.label_name, str):
            raise ValueError("data transform param's label_name {} should be str".format(self.label_name))

        self.label_type = self.check_and_change_lower(self.label_type,
                                                      ["int", "int64", "float", "float64", "str", "long"],
                                                      descr)

    if self.exclusive_data_type is not None and not isinstance(self.exclusive_data_type, dict):
        raise ValueError("exclusive_data_type should be None or a dict")

    if not isinstance(self.with_match_id, bool):
        raise ValueError("with_match_id should be boolean variable, but {} find".format(self.with_match_id))

    if not isinstance(self.match_id_index, int) or self.match_id_index < 0:
        raise ValueError("match_id_index should be non negative integer")

    if self.match_id_name is not None and not isinstance(self.match_id_name, str):
        raise ValueError("match_id_name should be str")

    return True
```
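A minimal sketch (assuming FATE >= 1.9.0 with the `federatedml` package installed) of dense input that carries both a label column and a match id column; the column names are illustrative:

```python
from federatedml.param import DataTransformParam

# Dense input: use column "id" as the match id and column "y" as the label
param = DataTransformParam(input_format="dense",
                           with_match_id=True, match_id_name="id",
                           with_label=True, label_name="y", label_type="int")
param.check()
```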
FeldmanVerifiableSumParam(sum_cols=None, q_n=6)

Bases: BaseParam

Define which columns to sum.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| sum_cols | list of int | Specify which columns to sum. If None, every column is summed. | None |
| q_n | int | Number of significant decimal digits; a positive integer less than or equal to 16. If the data type is float, the maximum number of significant digits is 16, and the total of integer and decimal significant digits should not exceed 16. | 6 |
Source code in python/federatedml/param/feldman_verifiable_sum_param.py
```python
def __init__(self, sum_cols=None, q_n=6):
    self.sum_cols = sum_cols
    self.q_n = q_n
```
Attributes
sum_cols = sum_cols instance-attribute
q_n = q_n instance-attribute
Functions
check()
Source code in python/federatedml/param/feldman_verifiable_sum_param.py
```python
def check(self):
    self.sum_cols = [] if self.sum_cols is None else self.sum_cols
    if isinstance(self.sum_cols, list):
        for idx in self.sum_cols:
            if not isinstance(idx, int):
                raise ValueError(f"type mismatch, column_indexes with element {idx}(type is {type(idx)})")

    if not isinstance(self.q_n, int):
        raise ValueError(f"Init param's q_n {self.q_n} not supported, should be int type (type is {type(self.q_n)})")

    if self.q_n < 0:
        raise ValueError(f"param's q_n {self.q_n} not supported, should be non-negative int value")
    elif self.q_n > 16:
        raise ValueError(f"param's q_n {self.q_n} not supported, should be less than or equal to 16")
```
InitParam(init_method='random_uniform', init_const=1, fit_intercept=True, random_seed=None)

Bases: BaseParam

Initialize Parameters used in initializing a model.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| init_method | | Initialization method. | 'random_uniform' |
| init_const | int or float | Required when init_method is 'const'; specifies the constant. | 1 |
| fit_intercept | bool | Whether to initialize the intercept. | True |
Source code in python/federatedml/param/init_model_param.py
```python
def __init__(self, init_method='random_uniform', init_const=1, fit_intercept=True, random_seed=None):
    super().__init__()
    self.init_method = init_method
    self.init_const = init_const
    self.fit_intercept = fit_intercept
    self.random_seed = random_seed
```
Attributes
init_method = init_method instance-attribute
init_const = init_const instance-attribute
fit_intercept = fit_intercept instance-attribute
random_seed = random_seed instance-attribute
Functions
check()
Source code in python/federatedml/param/init_model_param.py
```python
def check(self):
    if type(self.init_method).__name__ != "str":
        raise ValueError(
            "Init param's init_method {} not supported, should be str type".format(self.init_method))
    else:
        self.init_method = self.init_method.lower()
        if self.init_method not in ['random_uniform', 'random_normal', 'ones', 'zeros', 'const']:
            raise ValueError(
                "Init param's init_method {} not supported, init_method should in 'random_uniform',"
                " 'random_normal' 'ones', 'zeros' or 'const'".format(self.init_method))

    if type(self.init_const).__name__ not in ['int', 'float']:
        raise ValueError(
            "Init param's init_const {} not supported, should be int or float type".format(self.init_const))

    if type(self.fit_intercept).__name__ != 'bool':
        raise ValueError(
            "Init param's fit_intercept {} not supported, should be bool type".format(self.fit_intercept))

    if self.random_seed is not None:
        if type(self.random_seed).__name__ != 'int':
            raise ValueError(
                "Init param's random_seed {} not supported, should be int type".format(self.random_seed))

    return True
```
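A short sketch (assuming the `federatedml` package is available) of constant initialization with a fixed seed:

```python
from federatedml.param import InitParam

# Initialize every weight to the constant 0.5, fit an intercept, fix the seed
param = InitParam(init_method="const", init_const=0.5,
                  fit_intercept=True, random_seed=42)
param.check()  # init_method is lower-cased and validated against the allowed set
```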
SecureAddExampleParam(seed=None, partition=1, data_num=1000)

Bases: BaseParam

Source code in python/federatedml/param/secure_add_example_param.py
```python
def __init__(self, seed=None, partition=1, data_num=1000):
    self.seed = seed
    self.partition = partition
    self.data_num = data_num
```
Attributes
seed = seed instance-attribute
partition = partition instance-attribute
data_num = data_num instance-attribute
Functions
check()
Source code in python/federatedml/param/secure_add_example_param.py
```python
def check(self):
    if self.seed is not None and type(self.seed).__name__ != "int":
        raise ValueError("random seed should be None or integers")

    if type(self.partition).__name__ != "int" or self.partition < 1:
        raise ValueError("partition should be an integer larger than 0")

    if type(self.data_num).__name__ != "int" or self.data_num < 1:
        raise ValueError("data_num should be an integer larger than 0")
```
StochasticQuasiNewtonParam(update_interval_L=3, memory_M=5, sample_size=5000, random_seed=None)

Bases: BaseParam

Parameters used for stochastic quasi-newton method.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| update_interval_L | int | Number of iterations between Hessian matrix updates. | 3 |
| memory_M | int | Stack size of curvature information, i.e. y_k and s_k in the paper. | 5 |
| sample_size | int | Sample size of the data used to update the Hessian matrix. | 5000 |
Source code in python/federatedml/param/sqn_param.py
```python
def __init__(self, update_interval_L=3, memory_M=5, sample_size=5000, random_seed=None):
    super().__init__()
    self.update_interval_L = update_interval_L
    self.memory_M = memory_M
    self.sample_size = sample_size
    self.random_seed = random_seed
```
Attributes
update_interval_L = update_interval_L instance-attribute
memory_M = memory_M instance-attribute
sample_size = sample_size instance-attribute
random_seed = random_seed instance-attribute
Functions
check()
Source code in python/federatedml/param/sqn_param.py
```python
def check(self):
    descr = "hetero sqn param's"
    self.check_positive_integer(self.update_interval_L, descr)
    self.check_positive_integer(self.memory_M, descr)
    self.check_positive_integer(self.sample_size, descr)
    if self.random_seed is not None:
        self.check_positive_integer(self.random_seed, descr)
    return True
```
EncryptParam(method=consts.PAILLIER, key_length=1024)

Bases: BaseParam

Define the encryption method used in federated ml.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| method | | If 'Paillier', Paillier encryption is used for federated ml. To use the non-encrypted version in HomoLR, set this to None. For details of Paillier encryption, see the paper referenced in the README. | 'Paillier' |
| key_length | int | Length of the key used by this encryption method. | 1024 |
Source code in python/federatedml/param/encrypt_param.py
```python
def __init__(self, method=consts.PAILLIER, key_length=1024):
    super(EncryptParam, self).__init__()
    self.method = method
    self.key_length = key_length
```
Attributes
method = method instance-attribute
key_length = key_length instance-attribute
Functions
check()
Source code in python/federatedml/param/encrypt_param.py
```python
def check(self):
    if self.method is not None and type(self.method).__name__ != "str":
        raise ValueError(
            "encrypt_param's method {} not supported, should be str type".format(
                self.method))
    elif self.method is None:
        pass
    else:
        user_input = self.method.lower()
        if user_input == "paillier":
            self.method = consts.PAILLIER
        elif user_input == consts.ITERATIVEAFFINE.lower() or user_input == consts.RANDOM_ITERATIVEAFFINE:
            LOGGER.warning('Iterative Affine and Random Iterative Affine are not supported in version>=1.7.1 '
                           'due to safety concerns, encrypt method will be reset to Paillier')
            self.method = consts.PAILLIER
        elif user_input == "ipcl":
            self.method = consts.PAILLIER_IPCL
        else:
            raise ValueError(
                "encrypt_param's method {} not supported".format(user_input))

    if type(self.key_length).__name__ != "int":
        raise ValueError(
            "encrypt_param's key_length {} not supported, should be int type".format(self.key_length))
    elif self.key_length <= 0:
        raise ValueError(
            "encrypt_param's key_length must be greater or equal to 1")

    LOGGER.debug("Finish encrypt parameter check!")
    return True
```
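A hedged sketch (assuming the `federatedml` package is available); note that check() normalizes the method name, so any casing of "paillier" is accepted:

```python
from federatedml.param import EncryptParam

# Paillier encryption with a 2048-bit key
param = EncryptParam(method="Paillier", key_length=2048)
param.check()  # "Paillier" is normalized to consts.PAILLIER
```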
EncryptedModeCalculatorParam(mode='strict', re_encrypted_rate=1)

Bases: BaseParam

Define the encrypted_mode_calculator parameters.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| mode | | Encrypted mode. | 'strict' |
| re_encrypted_rate | | A number in [0, 1]; used when mode is 'balance'. | 1 |
Source code in python/federatedml/param/encrypted_mode_calculation_param.py
```python
def __init__(self, mode="strict", re_encrypted_rate=1):
    self.mode = mode
    self.re_encrypted_rate = re_encrypted_rate
```
Attributes
mode = mode instance-attribute
re_encrypted_rate = re_encrypted_rate instance-attribute
Functions
check()
Source code in python/federatedml/param/encrypted_mode_calculation_param.py
```python
def check(self):
    descr = "encrypted_mode_calculator param"
    self.mode = self.check_and_change_lower(self.mode,
                                            ["strict", "fast", "balance", "confusion_opt", "confusion_opt_balance"],
                                            descr)

    if self.mode != "strict":
        LOGGER.warning("encrypted_mode_calculator will be removed in a later version; "
                       "in the current version it can still be used, but only strict mode is supported, "
                       "and other modes will be reset to strict for compatibility")
        self.mode = "strict"

    return True
```
EvaluateParam(eval_type='binary', pos_label=1, need_run=True, metrics=None, run_clustering_arbiter_metric=False, unfold_multi_result=False)

Bases: BaseParam

Define the evaluation method for binary/multi-class classification and regression.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| eval_type | | Supports 'binary' for HomoLR, HeteroLR and SecureBoost, and 'regression' for SecureBoost; 'multi' is not supported in this version. | 'binary' |
| unfold_multi_result | bool | Unfold the multi-class result into several one-vs-rest binary classification results. | False |
| pos_label | int or float or str | Specify the positive label; the type depends on the data's labels. Only effective for 'binary'. | 1 |
| need_run | | Indicate if this module needs to be run. | True |
Source code in python/federatedml/param/evaluation_param.py
```python
def __init__(self, eval_type="binary", pos_label=1, need_run=True, metrics=None,
             run_clustering_arbiter_metric=False, unfold_multi_result=False):
    super().__init__()
    self.eval_type = eval_type
    self.pos_label = pos_label
    self.need_run = need_run
    self.metrics = metrics
    self.unfold_multi_result = unfold_multi_result
    self.run_clustering_arbiter_metric = run_clustering_arbiter_metric

    self.default_metrics = {
        consts.BINARY: consts.ALL_BINARY_METRICS,
        consts.MULTY: consts.ALL_MULTI_METRICS,
        consts.REGRESSION: consts.ALL_REGRESSION_METRICS,
        consts.CLUSTERING: consts.ALL_CLUSTER_METRICS
    }

    self.allowed_metrics = {
        consts.BINARY: consts.ALL_BINARY_METRICS,
        consts.MULTY: consts.ALL_MULTI_METRICS,
        consts.REGRESSION: consts.ALL_REGRESSION_METRICS,
        consts.CLUSTERING: consts.ALL_CLUSTER_METRICS
    }
```
Attributes
eval_type = eval_type instance-attribute
pos_label = pos_label instance-attribute
need_run = need_run instance-attribute
metrics = metrics instance-attribute
unfold_multi_result = unfold_multi_result instance-attribute
run_clustering_arbiter_metric = run_clustering_arbiter_metric instance-attribute
default_metrics = {consts.BINARY: consts.ALL_BINARY_METRICS, consts.MULTY: consts.ALL_MULTI_METRICS, consts.REGRESSION: consts.ALL_REGRESSION_METRICS, consts.CLUSTERING: consts.ALL_CLUSTER_METRICS} instance-attribute
allowed_metrics = {consts.BINARY: consts.ALL_BINARY_METRICS, consts.MULTY: consts.ALL_MULTI_METRICS, consts.REGRESSION: consts.ALL_REGRESSION_METRICS, consts.CLUSTERING: consts.ALL_CLUSTER_METRICS} instance-attribute
Functions
check()
Source code in python/federatedml/param/evaluation_param.py
```python
def check(self):

    descr = "evaluate param's "
    self.eval_type = self.check_and_change_lower(self.eval_type,
                                                 [consts.BINARY, consts.MULTY, consts.REGRESSION,
                                                  consts.CLUSTERING],
                                                 descr)

    if type(self.pos_label).__name__ not in ["str", "float", "int"]:
        raise ValueError(
            "evaluate param's pos_label {} not supported, should be str or float or int type".format(
                self.pos_label))

    if type(self.need_run).__name__ != "bool":
        raise ValueError(
            "evaluate param's need_run {} not supported, should be bool".format(
                self.need_run))

    if self.metrics is None or len(self.metrics) == 0:
        self.metrics = self.default_metrics[self.eval_type]
        LOGGER.warning('use default metric {} for eval type {}'.format(self.metrics, self.eval_type))

    self.check_boolean(self.unfold_multi_result, 'multi_result_unfold')

    self.metrics = self._check_valid_metric(self.metrics)

    return True
```
check_single_value_default_metric()
Source code in python/federatedml/param/evaluation_param.py
```python
def check_single_value_default_metric(self):
    self._use_single_value_default_metrics()

    # in validation strategy, psi f1-score and confusion-mat pr-quantile are not supported in cur version
    if self.metrics is None or len(self.metrics) == 0:
        self.metrics = self.default_metrics[self.eval_type]
        LOGGER.warning('use default metric {} for eval type {}'.format(self.metrics, self.eval_type))

    ban_metric = [consts.PSI, consts.F1_SCORE, consts.CONFUSION_MAT, consts.QUANTILE_PR]
    for metric in self.metrics:
        if metric in ban_metric:
            self.metrics.remove(metric)
    self.check()
```
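A small sketch (assuming the `federatedml` package is available) showing the metric fallback described above: with metrics=None, check() fills in the default metric list for the evaluation type:

```python
from federatedml.param import EvaluateParam

param = EvaluateParam(eval_type="binary", pos_label=1)
param.check()
print(param.metrics)  # populated with all default binary metrics by check()
```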
KmeansParam(k=5, max_iter=300, tol=0.001, random_stat=None)

Bases: BaseParam

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| k | int | Number of centroids to generate; should be larger than 1 and less than 100 in this version. | 5 |
| max_iter | int | Maximum number of iterations of the hetero-k-means algorithm. | 300 |
| tol | float | Convergence tolerance. | 0.001 |
| random_stat | None or int | Random seed. | None |
Source code in python/federatedml/param/hetero_kmeans_param.py
```python
def __init__(self, k=5, max_iter=300, tol=0.001, random_stat=None):
    super(KmeansParam, self).__init__()
    self.k = k
    self.max_iter = max_iter
    self.tol = tol
    self.random_stat = random_stat
```
Attributes
k = k instance-attribute
max_iter = max_iter instance-attribute
tol = tol instance-attribute
random_stat = random_stat instance-attribute
Functions
check()
Source code in python/federatedml/param/hetero_kmeans_param.py
```python
def check(self):
    descr = "Kmeans_param's "

    if not isinstance(self.k, int):
        raise ValueError(
            descr + "k {} not supported, should be int type".format(self.k))
    elif self.k <= 1:
        raise ValueError(
            descr + "k {} not supported, should be larger than 1".format(self.k))
    elif self.k > 100:
        raise ValueError(
            descr + "k {} not supported, should be less than 100 in this version".format(self.k))

    if not isinstance(self.max_iter, int):
        raise ValueError(
            descr + "max_iter {} not supported, should be int type".format(self.max_iter))
    elif self.max_iter <= 0:
        raise ValueError(
            descr + "max_iter {} not supported, should be larger than 0".format(self.max_iter))

    if not isinstance(self.tol, (float, int)):
        raise ValueError(
            descr + "tol {} not supported, should be float type".format(self.tol))
    elif self.tol < 0:
        raise ValueError(
            descr + "tol {} not supported, should be larger than or equal to 0".format(self.tol))

    if self.random_stat is not None:
        if not isinstance(self.random_stat, int):
            raise ValueError(descr + "random_stat {} not supported, should be int type".format(self.random_stat))
        elif self.random_stat < 0:
            raise ValueError(
                descr + "random_stat {} not supported, should be larger than/equal to 0".format(self.random_stat))
```
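A minimal sketch (assuming the `federatedml` package is available); recall that k must be larger than 1 and less than 100 in this version:

```python
from federatedml.param import KmeansParam

param = KmeansParam(k=5, max_iter=300, tol=0.001, random_stat=0)
param.check()  # rejects e.g. k=1 or k=200
```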
PearsonParam(column_names=None, column_indexes=None, cross_parties=True, need_run=True, use_mix_rand=False, calc_local_vif=True)

Bases: BaseParam

Parameters for Pearson correlation.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| column_names | list of string | List of column names. | None |
| column_indexes | list of int | List of column indexes. | None |
| cross_parties | bool | If True, calculate the correlation of columns from both parties. | True |
| need_run | bool | Set to False to skip this party. | True |
| use_mix_rand | bool | Mix system random and pseudo random for quicker calculation. | False |
| calc_local_vif | bool | Calculate VIF for local columns. | True |
Source code in python/federatedml/param/pearson_param.py
```python
def __init__(
    self,
    column_names=None,
    column_indexes=None,
    cross_parties=True,
    need_run=True,
    use_mix_rand=False,
    calc_local_vif=True,
):
    super().__init__()
    self.column_names = column_names
    self.column_indexes = column_indexes
    self.cross_parties = cross_parties
    self.need_run = need_run
    self.use_mix_rand = use_mix_rand
    self.calc_local_vif = calc_local_vif
```
Attributes
column_names = column_names instance-attribute
column_indexes = column_indexes instance-attribute
cross_parties = cross_parties instance-attribute
need_run = need_run instance-attribute
use_mix_rand = use_mix_rand instance-attribute
calc_local_vif = calc_local_vif instance-attribute
Functions
check()
Source code in python/federatedml/param/pearson_param.py
```python
def check(self):
    if not isinstance(self.use_mix_rand, bool):
        raise ValueError(
            f"use_mix_rand accept bool type only, {type(self.use_mix_rand)} got"
        )
    if self.cross_parties and (not self.need_run):
        raise ValueError(
            f"need_run should be True(which is default) when cross_parties is True."
        )

    self.column_indexes = [] if self.column_indexes is None else self.column_indexes
    self.column_names = [] if self.column_names is None else self.column_names
    if not isinstance(self.column_names, list):
        raise ValueError(
            f"type mismatch, column_names with type {type(self.column_names)}"
        )
    for name in self.column_names:
        if not isinstance(name, str):
            raise ValueError(
                f"type mismatch, column_names with element {name}(type is {type(name)})"
            )

    if isinstance(self.column_indexes, list):
        for idx in self.column_indexes:
            if not isinstance(idx, int):
                raise ValueError(
                    f"type mismatch, column_indexes with element {idx}(type is {type(idx)})"
                )

    if isinstance(self.column_indexes, int) and self.column_indexes != -1:
        raise ValueError(
            f"column_indexes with type int and value {self.column_indexes}(only -1 allowed)"
        )

    if self.need_run:
        if isinstance(self.column_indexes, list) and isinstance(
            self.column_names, list
        ):
            if len(self.column_indexes) == 0 and len(self.column_names) == 0:
                raise ValueError(f"provide at least one column")
```
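A short sketch (assuming the `federatedml` package is available); at least one column name or index must be provided when need_run is True:

```python
from federatedml.param import PearsonParam

# Correlate the named columns across both parties and compute local VIF
param = PearsonParam(column_names=["x0", "x1"], cross_parties=True,
                     calc_local_vif=True)
param.check()
```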
PositiveUnlabeledParam(strategy='probability', threshold=0.9)

Bases: BaseParam

Parameters used for positive unlabeled.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| strategy | {"probability", "quantity", "proportion", "distribution"} | The strategy for converting unlabeled values. | 'probability' |
| threshold | int or float | The threshold of the labeling strategy. | 0.9 |

Source code in python/federatedml/param/positive_unlabeled_param.py
```python
def __init__(self, strategy="probability", threshold=0.9):
    super(PositiveUnlabeledParam, self).__init__()
    self.strategy = strategy
    self.threshold = threshold
```
Attributes
strategy = strategy instance-attribute
threshold = threshold instance-attribute
Functions
check()
Source code in python/federatedml/param/positive_unlabeled_param.py
```python
def check(self):
    base_descr = "Positive Unlabeled Param's "
    float_descr = "Probability or Proportion Strategy Param's "
    int_descr = "Quantity Strategy Param's "
    numeric_descr = "Distribution Strategy Param's "

    self.check_valid_value(self.strategy, base_descr,
                           [consts.PROBABILITY, consts.QUANTITY, consts.PROPORTION, consts.DISTRIBUTION])

    self.check_defined_type(self.threshold, base_descr, [consts.INT, consts.FLOAT])

    if self.strategy == consts.PROBABILITY or self.strategy == consts.PROPORTION:
        self.check_decimal_float(self.threshold, float_descr)

    if self.strategy == consts.QUANTITY:
        self.check_positive_integer(self.threshold, int_descr)

    if self.strategy == consts.DISTRIBUTION:
        self.check_positive_number(self.threshold, numeric_descr)

    return True
```
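A brief sketch (assuming the `federatedml` package is available); the valid threshold range depends on the strategy, as enforced by check():

```python
from federatedml.param import PositiveUnlabeledParam

# "quantity" requires a positive integer threshold;
# "probability" and "proportion" require a float in (0, 1)
param = PositiveUnlabeledParam(strategy="quantity", threshold=3000)
param.check()
```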
SampleParam(mode='random', method='downsample', fractions=None, random_state=None, task_type='hetero', need_run=True)

Bases: BaseParam

Define the sample method

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| mode | | Specify the sample mode to use. | 'random' |
| method | | Specify the sample method, 'upsample' or 'downsample'. | 'downsample' |
| fractions | None or float or list | If mode is 'random', a float greater than 0; otherwise a list of [label_i, sample_rate_i] pairs, e.g. [[0, 0.5], [1, 0.8], [2, 0.3]]. | None |
| random_state | int, RandomState instance or None | Random state. | None |
| need_run | bool | Indicate if this module needs to be run. | True |

Source code in python/federatedml/param/sample_param.py
```python
def __init__(self, mode="random", method="downsample", fractions=None,
             random_state=None, task_type="hetero", need_run=True):
    self.mode = mode
    self.method = method
    self.fractions = fractions
    self.random_state = random_state
    self.task_type = task_type
    self.need_run = need_run
```
Attributes
mode = mode instance-attribute
method = method instance-attribute
fractions = fractions instance-attribute
random_state = random_state instance-attribute
task_type = task_type instance-attribute
need_run = need_run instance-attribute
Functions
check()
Source code in python/federatedml/param/sample_param.py
```python
def check(self):
    descr = "sample param"
    self.mode = self.check_and_change_lower(self.mode,
                                            ["random", "stratified", "exact_by_weight"],
                                            descr)

    self.method = self.check_and_change_lower(self.method,
                                              ["upsample", "downsample"],
                                              descr)

    if self.mode == "stratified" and self.fractions is not None:
        if not isinstance(self.fractions, list):
            raise ValueError("fractions of sample param when using stratified should be list")
        for ele in self.fractions:
            if not isinstance(ele, collections.Container) or len(ele) != 2:
                raise ValueError(
                    "element in fractions of sample param using stratified should be a pair like [label_i, rate_i]")

    return True
```
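A minimal sketch (assuming the `federatedml` package is available) of stratified sampling, reusing the pair format from the table above:

```python
from federatedml.param import SampleParam

# Stratified downsampling: keep 50% of label 0 and 80% of label 1
param = SampleParam(mode="stratified", method="downsample",
                    fractions=[[0, 0.5], [1, 0.8]])
param.check()
```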
ScaleParam(method='standard_scale', mode='normal', scale_col_indexes=-1, scale_names=None, feat_upper=None, feat_lower=None, with_mean=True, with_std=True, need_run=True)

Bases: BaseParam

Define the feature scale parameters.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| method | | Like the scalers in sklearn; currently supports "min_max_scale" and "standard_scale", with more methods to come. If set to None, no scaling is performed. | "standard_scale" |
| mode | | In "normal" mode, feat_upper and feat_lower are plain values such as "10" or "3.1"; in "cap" mode they must be between 0 and 1 and are interpreted as column percentiles. | "normal" |
| feat_upper | int or float or list of int or float | Upper limit for the column(s). If a list is used, mode must be "normal" and the list length should equal the number of features to scale. Scaled values larger than feat_upper are set to feat_upper. | None |
| feat_lower | | Lower limit for the column(s). If a list is used, mode must be "normal" and the list length should equal the number of features to scale. Scaled values less than feat_lower are set to feat_lower. | None |
| scale_col_indexes | | Columns whose index appears in scale_col_indexes are scaled; all other columns are left unchanged. | -1 |
| scale_names | list of string | Specify which columns to scale by name; each element is a column name in the header. | None (treated as []) |
| with_mean | bool | Used for "standard_scale". | True |
| with_std | bool | Used for "standard_scale". The standard scale of column x is computed as z = (x - u) / s, where u is the column mean and s is the column standard deviation. If with_mean is False, u is 0; if with_std is False, s is 1. | True |
| need_run | bool | Indicate if this module needs to be run. | True |
Source code in python/federatedml/param/scale_param.py
```python
def __init__(
        self,
        method="standard_scale",
        mode="normal",
        scale_col_indexes=-1,
        scale_names=None,
        feat_upper=None,
        feat_lower=None,
        with_mean=True,
        with_std=True,
        need_run=True):
    super().__init__()
    self.scale_names = [] if scale_names is None else scale_names

    self.method = method
    self.mode = mode
    self.feat_upper = feat_upper
    # LOGGER.debug("self.feat_upper:{}, type:{}".format(self.feat_upper, type(self.feat_upper)))
    self.feat_lower = feat_lower
    self.scale_col_indexes = scale_col_indexes

    self.with_mean = with_mean
    self.with_std = with_std

    self.need_run = need_run
```
Attributes
scale_names = [] if scale_names is None else scale_names instance-attribute
method = method instance-attribute
mode = mode instance-attribute
feat_upper = feat_upper instance-attribute
feat_lower = feat_lower instance-attribute
scale_col_indexes = scale_col_indexes instance-attribute
with_mean = with_mean instance-attribute
with_std = with_std instance-attribute
need_run = need_run instance-attribute
Functions
check()
Source code in python/federatedml/param/scale_param.py
```python
def check(self):
    if self.method is not None:
        descr = "scale param's method"
        self.method = self.check_and_change_lower(self.method,
                                                  [consts.MINMAXSCALE, consts.STANDARDSCALE],
                                                  descr)

    descr = "scale param's mode"
    self.mode = self.check_and_change_lower(self.mode,
                                            [consts.NORMAL, consts.CAP],
                                            descr)
    # LOGGER.debug("self.feat_upper:{}, type:{}".format(self.feat_upper, type(self.feat_upper)))
    # if type(self.feat_upper).__name__ not in ["float", "int"]:
    #     raise ValueError("scale param's feat_upper {} not supported, should be float or int".format(
    #         self.feat_upper))

    if self.scale_col_indexes != -1 and not isinstance(self.scale_col_indexes, list):
        raise ValueError("scale_col_indexes should be -1 or a list")

    if self.scale_names is None:
        self.scale_names = []
    if not isinstance(self.scale_names, list):
        raise ValueError("scale_names should be a list of string")
    else:
        for e in self.scale_names:
            if not isinstance(e, str):
                raise ValueError("scale_names should be a list of string")

    self.check_boolean(self.with_mean, "scale_param with_mean")
    self.check_boolean(self.with_std, "scale_param with_std")
    self.check_boolean(self.need_run, "scale_param need_run")

    LOGGER.debug("Finish scale parameter check!")
    return True
```
DataSplitParam(random_state=None, test_size=None, train_size=None, validate_size=None, stratified=False, shuffle=True, split_points=None, need_run=True)

Bases: BaseParam

Define the parameters used in data split.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| random_state | None or int | Random state for shuffling. | None |
| test_size | float, int or None | Test set size; a float specifies a fraction of the input dataset, an int an exact number of instances. | None |
| train_size | float, int or None | Train set size; a float specifies a fraction of the input dataset, an int an exact number of instances. | None |
| validate_size | float, int or None | Validate set size; a float specifies a fraction of the input dataset, an int an exact number of instances. | None |
| stratified | bool | Whether sampling should be stratified according to the label value. | False |
| shuffle | bool | Whether to shuffle before splitting. | True |
| split_points | None or list | Point(s) by which continuous label values are bucketed into bins for stratified split, e.g. [0.2] for two bins or [0.1, 1, 3] for four bins. | None |
| need_run | | Specify whether to run data split. | True |
Source code in python/federatedml/param/data_split_param.py
```python
def __init__(self, random_state=None, test_size=None, train_size=None, validate_size=None, stratified=False,
             shuffle=True, split_points=None, need_run=True):
    super(DataSplitParam, self).__init__()
    self.random_state = random_state
    self.test_size = test_size
    self.train_size = train_size
    self.validate_size = validate_size
    self.stratified = stratified
    self.shuffle = shuffle
    self.split_points = split_points
    self.need_run = need_run
```
Attributes
random_state = random_state instance-attribute
test_size = test_size instance-attribute
train_size = train_size instance-attribute
validate_size = validate_size instance-attribute
stratified = stratified instance-attribute
shuffle = shuffle instance-attribute
split_points = split_points instance-attribute
need_run = need_run instance-attribute
Functions
check()
Source code in python/federatedml/param/data_split_param.py
def check(self):
    model_param_descr = "data split param's "
    if self.random_state is not None:
        if not isinstance(self.random_state, int):
            raise ValueError(f"{model_param_descr} random state should be int type")
        BaseParam.check_nonnegative_number(self.random_state, f"{model_param_descr} random_state ")

    if self.test_size is not None:
        BaseParam.check_nonnegative_number(self.test_size, f"{model_param_descr} test_size ")
        if isinstance(self.test_size, float):
            BaseParam.check_decimal_float(self.test_size, f"{model_param_descr} test_size ")
    if self.train_size is not None:
        BaseParam.check_nonnegative_number(self.train_size, f"{model_param_descr} train_size ")
        if isinstance(self.train_size, float):
            BaseParam.check_decimal_float(self.train_size, f"{model_param_descr} train_size ")
    if self.validate_size is not None:
        BaseParam.check_nonnegative_number(self.validate_size, f"{model_param_descr} validate_size ")
        if isinstance(self.validate_size, float):
            BaseParam.check_decimal_float(self.validate_size, f"{model_param_descr} validate_size ")
    # use default size values if none given
    if self.test_size is None and self.train_size is None and self.validate_size is None:
        self.test_size = 0.0
        self.train_size = 0.8
        self.validate_size = 0.2

    BaseParam.check_boolean(self.stratified, f"{model_param_descr} stratified ")
    BaseParam.check_boolean(self.shuffle, f"{model_param_descr} shuffle ")
    BaseParam.check_boolean(self.need_run, f"{model_param_descr} need run ")

    if self.split_points is not None:
        if not isinstance(self.split_points, list):
            raise ValueError(f"{model_param_descr} split_points should be list type")

    LOGGER.debug("Finish data_split parameter check!")
    return True
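As an illustrative sketch (import path taken from the source file shown above), the default-size behavior of check() can be observed directly: when all three sizes are None, it fills in train/validate/test = 0.8/0.2/0.0.

from federatedml.param.data_split_param import DataSplitParam

param = DataSplitParam(shuffle=True, stratified=False)
param.check()
# defaults applied by check() when no sizes are given:
print(param.train_size, param.validate_size, param.test_size)  # 0.8 0.2 0.0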
OneVsRestParam(need_one_vs_rest=False, has_arbiter=True)

Bases: BaseParam

Define the one_vs_rest parameters.

Parameters:

Name Type Description Default
has_arbiter

Some algorithms have no arbiter, for instance SecureBoost in FATE; for these algorithms, it should be set to False.

True
Source code in python/federatedml/param/one_vs_rest_param.py
def __init__(self, need_one_vs_rest=False, has_arbiter=True):
    super().__init__()
    self.need_one_vs_rest = need_one_vs_rest
    self.has_arbiter = has_arbiter
Attributes
need_one_vs_rest = need_one_vs_rest instance-attribute
has_arbiter = has_arbiter instance-attribute
Functions
check()
Source code in python/federatedml/param/one_vs_rest_param.py
def check(self):
    if type(self.has_arbiter).__name__ != "bool":
        raise ValueError(
            "one_vs_rest param's has_arbiter {} not supported, should be bool type".format(
                self.has_arbiter))

    LOGGER.debug("Finish one_vs_rest parameter check!")
    return True
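A short construction sketch (import path from the source file above); per the description, has_arbiter should be disabled for arbiter-free algorithms such as SecureBoost:

from federatedml.param.one_vs_rest_param import OneVsRestParam

# enable one-vs-rest for an arbiter-free algorithm
param = OneVsRestParam(need_one_vs_rest=True, has_arbiter=False)
param.check()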
SampleWeightParam(class_weight=None, sample_weight_name=None, normalize=False, need_run=True)

Bases: BaseParam

Define sample weight parameters

Parameters:

Name Type Description Default
class_weight str or dict, or None, default None

class weight dictionary or class weight computation mode; as a string it only accepts 'balanced'. If a dict is provided, keys should be class labels and weights will not be normalized, e.g.: {'0': 1, '1': 2}. If both class_weight and sample_weight_name are None, the original input data is returned.

None
sample_weight_name str

name of the column which specifies sample weight; if both class_weight and sample_weight_name are None, the original input data is returned

None
normalize bool, default False

whether to normalize sample weight extracted from sample_weight_name column

False
need_run bool, default True

whether to run this module or not

True
Source code in python/federatedml/param/sample_weight_param.py
def __init__(self, class_weight=None, sample_weight_name=None, normalize=False, need_run=True):
    self.class_weight = class_weight
    self.sample_weight_name = sample_weight_name
    self.normalize = normalize
    self.need_run = need_run
Attributes
class_weight = class_weight instance-attribute
sample_weight_name = sample_weight_name instance-attribute
normalize = normalize instance-attribute
need_run = need_run instance-attribute
Functions
check()
Source code in python/federatedml/param/sample_weight_param.py
def check(self):

    descr = "sample weight param's"

    if self.class_weight:
        if not isinstance(self.class_weight, str) and not isinstance(self.class_weight, dict):
            raise ValueError(f"{descr} class_weight must be str, dict, or None.")
        if isinstance(self.class_weight, str):
            self.class_weight = self.check_and_change_lower(self.class_weight,
                                                            [consts.BALANCED],
                                                            f"{descr} class_weight")
        if isinstance(self.class_weight, dict):
            for k, v in self.class_weight.items():
                if v < 0:
                    LOGGER.warning(f"Negative value {v} provided for class {k} as class_weight.")

    if self.sample_weight_name:
        self.check_string(self.sample_weight_name, f"{descr} sample_weight_name")

    self.check_boolean(self.need_run, f"{descr} need_run")

    self.check_boolean(self.normalize, f"{descr} normalize")

    return True
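Two construction sketches covering the accepted class_weight types (import path from the source file above):

from federatedml.param.sample_weight_param import SampleWeightParam

# compute balanced class weights from the label distribution
param = SampleWeightParam(class_weight="balanced")
param.check()

# explicit per-class weights, e.g. up-weight label '1'
param = SampleWeightParam(class_weight={"0": 1, "1": 2})
param.check()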
StepwiseParam(score_name='AIC', mode=consts.HETERO, role=consts.GUEST, direction='both', max_step=10, nvmin=2, nvmax=None, need_stepwise=False)

Bases: BaseParam

Define stepwise params

Parameters:

Name Type Description Default
score_name

Specify which model selection criterion to be used

'AIC'
mode

Indicate what mode is current task

consts.HETERO
role

Indicate what role is current party

consts.GUEST
direction

Indicate which direction to go for stepwise. 'forward' means forward selection; 'backward' means elimination; 'both' means possible models of both directions are examined at each step.

'both'
max_step

Specify total number of steps to run before forced stop.

10
nvmin

Specify the min subset size of final model, cannot be lower than 2. When nvmin > 2, the final model size may be smaller than nvmin due to max_step limit.

2
nvmax

Specify the max subset size of final model, 2 <= nvmin <= nvmax. The final model size may be larger than nvmax due to max_step limit.

None
need_stepwise

Indicate if this module needed to be run

False
Source code in python/federatedml/param/stepwise_param.py
def __init__(self, score_name="AIC", mode=consts.HETERO, role=consts.GUEST, direction="both",
             max_step=10, nvmin=2, nvmax=None, need_stepwise=False):
    super(StepwiseParam, self).__init__()
    self.score_name = score_name
    self.mode = mode
    self.role = role
    self.direction = direction
    self.max_step = max_step
    self.nvmin = nvmin
    self.nvmax = nvmax
    self.need_stepwise = need_stepwise
Attributes
score_name = score_name instance-attribute
mode = mode instance-attribute
role = role instance-attribute
direction = direction instance-attribute
max_step = max_step instance-attribute
nvmin = nvmin instance-attribute
nvmax = nvmax instance-attribute
need_stepwise = need_stepwise instance-attribute
Functions
check()
Source code in python/federatedml/param/stepwise_param.py
def check(self):
    model_param_descr = "stepwise param's"
    self.score_name = self.check_and_change_lower(self.score_name, ["aic", "bic"], model_param_descr)
    self.check_valid_value(self.mode, model_param_descr, valid_values=[consts.HOMO, consts.HETERO])
    self.check_valid_value(self.role, model_param_descr, valid_values=[consts.HOST, consts.GUEST, consts.ARBITER])
    self.direction = self.check_and_change_lower(self.direction, ["forward", "backward", "both"], model_param_descr)
    self.check_positive_integer(self.max_step, model_param_descr)
    self.check_positive_integer(self.nvmin, model_param_descr)
    if self.nvmin < 2:
        raise ValueError(model_param_descr + " nvmin must be no less than 2.")
    if self.nvmax is not None:
        self.check_positive_integer(self.nvmax, model_param_descr)
        if self.nvmin > self.nvmax:
            raise ValueError(model_param_descr + " nvmax must be greater than nvmin.")
    self.check_boolean(self.need_stepwise, model_param_descr)
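A sketch of the subset-size constraints enforced by check() (import path from the source file above); nvmin must be at least 2 and no larger than nvmax:

from federatedml.param.stepwise_param import StepwiseParam

param = StepwiseParam(score_name="AIC", direction="backward",
                      max_step=10, nvmin=2, nvmax=6, need_stepwise=True)
param.check()  # lower-cases score_name/direction, validates 2 <= nvmin <= nvmax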
UnionParam(need_run=True, allow_missing=False, keep_duplicate=False)

Bases: BaseParam

Define the union method for combining multiple dTables and keeping entries with the same id

Parameters:

Name Type Description Default
need_run

Indicate if this module needed to be run

True
allow_missing

Whether to allow mismatch between feature length and header length in the result. Note that empty tables will always be skipped regardless of this param setting.

False
keep_duplicate

Whether to keep entries with duplicated keys. If set to True, a new id will be generated for duplicated entry in the format {id}_{table_name}.

False
Source code in python/federatedml/param/union_param.py
def __init__(self, need_run=True, allow_missing=False, keep_duplicate=False):
    super().__init__()
    self.need_run = need_run
    self.allow_missing = allow_missing
    self.keep_duplicate = keep_duplicate
Attributes
need_run = need_run instance-attribute
allow_missing = allow_missing instance-attribute
keep_duplicate = keep_duplicate instance-attribute
Functions
check()
Source code in python/federatedml/param/union_param.py
def check(self):
    descr = "union param's "

    if type(self.need_run).__name__ != "bool":
        raise ValueError(
            descr + "need_run {} not supported, should be bool".format(
                self.need_run))

    if type(self.allow_missing).__name__ != "bool":
        raise ValueError(
            descr + "allow_missing {} not supported, should be bool".format(
                self.allow_missing))

    if type(self.keep_duplicate).__name__ != "bool":
        raise ValueError(
            descr + "keep_duplicate {} not supported, should be bool".format(
                self.keep_duplicate))

    LOGGER.info("Finish union parameter check!")
    return True
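A minimal sketch (import path from the source file above):

from federatedml.param.union_param import UnionParam

# keep duplicated keys; duplicates are re-keyed as {id}_{table_name}
param = UnionParam(allow_missing=False, keep_duplicate=True)
param.check()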
ColumnExpandParam(append_header=None, method='manual', fill_value=consts.FLOAT_ZERO, need_run=True)

Bases: BaseParam

Define method used for expanding column

Parameters:

Name Type Description Default
append_header None or str or List[str], default

Name(s) for appended feature(s). If None is given, module outputs the original input value without any operation.

None
method str, default

If method is 'manual', use user-specified fill_value to fill in new features.

'manual'
fill_value int or float or str or List[int] or List[float] or List[str], default

Used for filling expanded feature columns. If given a list, length of the list must match that of append_header

consts.FLOAT_ZERO
need_run

Indicate if this module needed to be run.

True
Source code in python/federatedml/param/column_expand_param.py
def __init__(self, append_header=None, method="manual",
             fill_value=consts.FLOAT_ZERO, need_run=True):
    super(ColumnExpandParam, self).__init__()
    self.append_header = append_header
    self.method = method
    self.fill_value = fill_value
    self.need_run = need_run
Attributes
append_header = append_header instance-attribute
method = method instance-attribute
fill_value = fill_value instance-attribute
need_run = need_run instance-attribute
Functions
check()
Source code in python/federatedml/param/column_expand_param.py
def check(self):
    descr = "column_expand param's "
    if not isinstance(self.method, str):
        raise ValueError(f"{descr}method {self.method} not supported, should be str type")
    else:
        user_input = self.method.lower()
        if user_input == "manual":
            self.method = consts.MANUAL
        else:
            raise ValueError(f"{descr} method {user_input} not supported")

    BaseParam.check_boolean(self.need_run, descr=descr)

    self.append_header = [] if self.append_header is None else self.append_header
    if not isinstance(self.append_header, list):
        raise ValueError(f"{descr} append_header must be None or list of str. "
                         f"Received {type(self.append_header)} instead.")
    for feature_name in self.append_header:
        BaseParam.check_string(feature_name, descr + "append_header values")

    if isinstance(self.fill_value, list):
        if len(self.append_header) != len(self.fill_value):
            raise ValueError(
                f"{descr} `fill value` is set to be list, "
                f"and param `append_header` must also be list of the same length.")
    else:
        self.fill_value = [self.fill_value]
    for value in self.fill_value:
        if type(value).__name__ not in ["float", "int", "long", "str"]:
            raise ValueError(
                f"{descr} fill value(s) must be float, int, or str. Received type {type(value)} instead.")

    LOGGER.debug("Finish column expand parameter check!")
    return True
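A sketch showing the length constraint between append_header and fill_value when both are lists (import path from the source file above):

from federatedml.param.column_expand_param import ColumnExpandParam

# append two constant-valued columns; the two lists must match in length
param = ColumnExpandParam(append_header=["x_new_0", "x_new_1"],
                          method="manual", fill_value=[0.0, 1.0])
param.check()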
CrossValidationParam(n_splits=5, mode=consts.HETERO, role=consts.GUEST, shuffle=True, random_seed=1, need_cv=False, output_fold_history=True, history_value_type='score')

Bases: BaseParam

Define cross validation params

Parameters:

Name Type Description Default
n_splits

Specify how many splits used in KFold

5
mode

Indicate what mode is current task

consts.HETERO
role

Indicate what role is current party

consts.GUEST
shuffle

Define whether do shuffle before KFold or not.

True
random_seed

Specify the random seed for numpy shuffle

1
need_cv

Indicate if this module needed to be run

False
output_fold_history

Indicate whether to output a table of the ids used by each fold; otherwise the original input data is returned. Returned ids are formatted as: {original_id}#fold{fold_num}#{train/validate}

True
history_value_type

Indicate whether to include the original instance or the predict score in the output fold history; only effective when output_fold_history is set to True

'score'
Source code in python/federatedml/param/cross_validation_param.py
def __init__(self, n_splits=5, mode=consts.HETERO, role=consts.GUEST, shuffle=True, random_seed=1,
             need_cv=False, output_fold_history=True, history_value_type="score"):
    super(CrossValidationParam, self).__init__()
    self.n_splits = n_splits
    self.mode = mode
    self.role = role
    self.shuffle = shuffle
    self.random_seed = random_seed
    # self.evaluate_param = copy.deepcopy(evaluate_param)
    self.need_cv = need_cv
    self.output_fold_history = output_fold_history
    self.history_value_type = history_value_type
Attributes
n_splits = n_splits instance-attribute
mode = mode instance-attribute
role = role instance-attribute
shuffle = shuffle instance-attribute
random_seed = random_seed instance-attribute
need_cv = need_cv instance-attribute
output_fold_history = output_fold_history instance-attribute
history_value_type = history_value_type instance-attribute
Functions
check()
Source code in python/federatedml/param/cross_validation_param.py
def check(self):
    model_param_descr = "cross validation param's "
    self.check_positive_integer(self.n_splits, model_param_descr)
    self.check_valid_value(self.mode, model_param_descr, valid_values=[consts.HOMO, consts.HETERO])
    self.check_valid_value(self.role, model_param_descr, valid_values=[consts.HOST, consts.GUEST, consts.ARBITER])
    self.check_boolean(self.shuffle, model_param_descr)
    self.check_boolean(self.output_fold_history, model_param_descr)
    self.history_value_type = self.check_and_change_lower(
        self.history_value_type, ["instance", "score"], model_param_descr)
    if self.random_seed is not None:
        self.check_positive_integer(self.random_seed, model_param_descr)
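A minimal sketch (import path from the source file above):

from federatedml.param.cross_validation_param import CrossValidationParam

# 5-fold CV, keeping per-fold prediction scores in the fold history
param = CrossValidationParam(n_splits=5, shuffle=True, random_seed=1,
                             need_cv=True, output_fold_history=True,
                             history_value_type="score")
param.check()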
ScorecardParam(method='credit', offset=500, factor=20, factor_base=2, upper_limit_ratio=3, lower_limit_value=0, need_run=True)

Bases: BaseParam

Define method used for transforming prediction score to credit score

Parameters:

Name Type Description Default
method

score method, currently only supports "credit"

"credit"
offset int or float, default

score baseline

500
factor int or float, default

scoring step, when odds double, result score increases by this factor

20
factor_base int or float, default

factor base, value ln(factor_base) is used for calculating result score

2
upper_limit_ratio int or float, default

upper bound for odds, credit score upper bound is upper_limit_ratio * offset

3
lower_limit_value int or float, default

lower bound for result score

0
need_run bool, default

Indicate if this module needs to be run.

True
Source code in python/federatedml/param/scorecard_param.py
def __init__(
        self,
        method="credit",
        offset=500,
        factor=20,
        factor_base=2,
        upper_limit_ratio=3,
        lower_limit_value=0,
        need_run=True):
    super(ScorecardParam, self).__init__()
    self.method = method
    self.offset = offset
    self.factor = factor
    self.factor_base = factor_base
    self.upper_limit_ratio = upper_limit_ratio
    self.lower_limit_value = lower_limit_value
    self.need_run = need_run
Attributes
method = method instance-attribute
offset = offset instance-attribute
factor = factor instance-attribute
factor_base = factor_base instance-attribute
upper_limit_ratio = upper_limit_ratio instance-attribute
lower_limit_value = lower_limit_value instance-attribute
need_run = need_run instance-attribute
Functions
check()
Source code in python/federatedml/param/scorecard_param.py
def check(self):
    descr = "scorecard param"
    if not isinstance(self.method, str):
        raise ValueError(f"{descr}method {self.method} not supported, should be str type")
    else:
        user_input = self.method.lower()
        if user_input == "credit":
            self.method = consts.CREDIT
        else:
            raise ValueError(f"{descr} method {user_input} not supported")

    if type(self.offset).__name__ not in ["int", "long", "float"]:
        raise ValueError(f"{descr} offset must be numeric,"
                         f"received {type(self.offset)} instead.")

    if type(self.factor).__name__ not in ["int", "long", "float"]:
        raise ValueError(f"{descr} factor must be numeric,"
                         f"received {type(self.factor)} instead.")

    if type(self.factor_base).__name__ not in ["int", "long", "float"]:
        raise ValueError(f"{descr} factor_base must be numeric,"
                         f"received {type(self.factor_base)} instead.")

    if type(self.upper_limit_ratio).__name__ not in ["int", "long", "float"]:
        raise ValueError(f"{descr} upper_limit_ratio must be numeric,"
                         f"received {type(self.upper_limit_ratio)} instead.")

    if type(self.lower_limit_value).__name__ not in ["int", "long", "float"]:
        raise ValueError(f"{descr} lower_limit_value must be numeric,"
                         f"received {type(self.lower_limit_value)} instead.")

    BaseParam.check_boolean(self.need_run, descr=descr + "need_run ")

    LOGGER.debug("Finish Scorecard parameter check!")
    return True
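A construction sketch (import path from the source file above); with the defaults below, credit scores fall in [lower_limit_value, upper_limit_ratio * offset] = [0, 1500]:

from federatedml.param.scorecard_param import ScorecardParam

param = ScorecardParam(method="credit", offset=500, factor=20, factor_base=2,
                       upper_limit_ratio=3, lower_limit_value=0)
param.check()  # upper bound = 3 * 500 = 1500, lower bound = 0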
LocalBaselineParam(model_name='LogisticRegression', model_opts=None, predict_param=PredictParam(), need_run=True)

Bases: BaseParam

Define the local baseline model param

Parameters:

Name Type Description Default
model_name str

sklearn model used to train the baseline model

'LogisticRegression'
model_opts dict or none, default None

Param to be used as input into baseline model

None
predict_param PredictParam object, default

predict param

PredictParam()
need_run

Indicate if this module needed to be run

True
Source code in python/federatedml/param/local_baseline_param.py
def __init__(self, model_name="LogisticRegression", model_opts=None, predict_param=PredictParam(), need_run=True):
    super(LocalBaselineParam, self).__init__()
    self.model_name = model_name
    self.model_opts = model_opts
    self.predict_param = copy.deepcopy(predict_param)
    self.need_run = need_run
Attributes
model_name = model_name instance-attribute
model_opts = model_opts instance-attribute
predict_param = copy.deepcopy(predict_param) instance-attribute
need_run = need_run instance-attribute
Functions
check()
Source code in python/federatedml/param/local_baseline_param.py
def check(self):
    descr = "local baseline param"

    self.model_name = self.check_and_change_lower(self.model_name,
                                                  ["logisticregression"],
                                                  descr)
    self.check_boolean(self.need_run, descr)
    if self.model_opts is not None:
        if not isinstance(self.model_opts, dict):
            raise ValueError(descr + " model_opts must be None or dict.")
    if self.model_opts is None:
        self.model_opts = {}
    self.predict_param.check()

    return True
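A sketch passing options through model_opts (import path from the source file above); the keys are assumed to be forwarded to sklearn's LogisticRegression per the description:

from federatedml.param.local_baseline_param import LocalBaselineParam

param = LocalBaselineParam(model_name="LogisticRegression",
                           model_opts={"penalty": "l2", "C": 1.0, "max_iter": 100})
param.check()  # model_name is lower-cased; model_opts must be a dict or None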
PredictParam(threshold=0.5)

Bases: BaseParam

Define the predict method of HomoLR, HeteroLR, SecureBoosting

Parameters:

Name Type Description Default
threshold

The threshold used to separate the positive and negative classes. Normally, it should be in (0, 1)

0.5
Source code in python/federatedml/param/predict_param.py
def __init__(self, threshold=0.5):
    self.threshold = threshold
Attributes
threshold = threshold instance-attribute
Functions
check()
Source code in python/federatedml/param/predict_param.py
def check(self):

    if type(self.threshold).__name__ not in ["float", "int"]:
        raise ValueError("predict param's predict_param {} not supported, should be float or int".format(
            self.threshold))

    LOGGER.debug("Finish predict parameter check!")
    return True
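A minimal sketch (import path from the source file above):

from federatedml.param.predict_param import PredictParam

# raise the positive-class cutoff above the 0.5 default
param = PredictParam(threshold=0.6)
param.check()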
SecureInformationRetrievalParam(security_level=0.5, oblivious_transfer_protocol=consts.OT_HAUCK, commutative_encryption=consts.CE_PH, non_committing_encryption=consts.AES, key_size=consts.DEFAULT_KEY_LENGTH, dh_params=DHParam(), raw_retrieval=False, target_cols=None)

Bases: BaseParam

Parameters:

Name Type Description Default
security_level

security level; should be set to a value in [0, 1]. security_level = 0.0 means raw data retrieval

0.5
oblivious_transfer_protocol

OT type, only supports OT_Hauck

consts.OT_HAUCK
commutative_encryption

the commutative encryption scheme used

"CommutativeEncryptionPohligHellman"
non_committing_encryption

the non-committing encryption scheme used

"aes"
dh_params

params for Pohlig-Hellman Encryption

DHParam()
key_size

the key length of the commutative cipher; note that this param will be deprecated in the future, please specify key_length in DHParam instead.

consts.DEFAULT_KEY_LENGTH
raw_retrieval

perform raw retrieval if raw_retrieval is True

False
target_cols

target cols to retrieve; any values not retrieved will be marked as "unretrieved". If target_cols is None, the label will be retrieved, matching the behavior of previous versions. Default None

None
Source code in python/federatedml/param/sir_param.py
def __init__(self, security_level=0.5,
             oblivious_transfer_protocol=consts.OT_HAUCK,
             commutative_encryption=consts.CE_PH,
             non_committing_encryption=consts.AES,
             key_size=consts.DEFAULT_KEY_LENGTH,
             dh_params=DHParam(),
             raw_retrieval=False,
             target_cols=None):
    super(SecureInformationRetrievalParam, self).__init__()
    self.security_level = security_level
    self.oblivious_transfer_protocol = oblivious_transfer_protocol
    self.commutative_encryption = commutative_encryption
    self.non_committing_encryption = non_committing_encryption
    self.dh_params = dh_params
    self.key_size = key_size
    self.raw_retrieval = raw_retrieval
    self.target_cols = target_cols
Attributes
security_level = security_level instance-attribute
oblivious_transfer_protocol = oblivious_transfer_protocol instance-attribute
commutative_encryption = commutative_encryption instance-attribute
non_committing_encryption = non_committing_encryption instance-attribute
dh_params = dh_params instance-attribute
key_size = key_size instance-attribute
raw_retrieval = raw_retrieval instance-attribute
target_cols = target_cols instance-attribute
Functions
check()
Source code in python/federatedml/param/sir_param.py
def check(self):
    descr = "secure information retrieval param's "
    self.check_decimal_float(self.security_level, descr + "security_level")
    self.oblivious_transfer_protocol = self.check_and_change_lower(self.oblivious_transfer_protocol,
                                                                   [consts.OT_HAUCK.lower()],
                                                                   descr + "oblivious_transfer_protocol")
    self.commutative_encryption = self.check_and_change_lower(self.commutative_encryption,
                                                              [consts.CE_PH.lower()],
                                                              descr + "commutative_encryption")
    self.non_committing_encryption = self.check_and_change_lower(self.non_committing_encryption,
                                                                 [consts.AES.lower()],
                                                                 descr + "non_committing_encryption")
    if self._warn_to_deprecate_param("key_size", descr, "dh_param's key_length"):
        self.dh_params.key_length = self.key_size
    self.dh_params.check()
    if self._warn_to_deprecate_param("raw_retrieval", descr, "dh_param's security_level = 0"):
        self.check_boolean(self.raw_retrieval, descr)

    self.target_cols = [] if self.target_cols is None else self.target_cols
    if not isinstance(self.target_cols, list):
        self.target_cols = [self.target_cols]
    for col in self.target_cols:
        self.check_string(col, descr + "target_cols")
    if len(self.target_cols) == 0:
        LOGGER.warning(f"Both 'target_cols' and 'target_indexes' are empty. Label will be retrieved.")
StatisticsParam(statistics='summary', column_names=None, column_indexes=-1, need_run=True, abnormal_list=None, quantile_error=consts.DEFAULT_RELATIVE_ERROR, bias=True)

Bases: BaseParam

Define statistics params

Parameters:

Name Type Description Default
statistics

Specify the statistic types to be computed. "summary" represents list: [consts.SUM, consts.MEAN, consts.STANDARD_DEVIATION, consts.MEDIAN, consts.MIN, consts.MAX, consts.MISSING_COUNT, consts.SKEWNESS, consts.KURTOSIS]

'summary'
column_names

Specify columns to be used for statistic computation by column names in header

None
column_indexes

Specify columns to be used for statistic computation by column order in header; -1 indicates computing statistics over all columns

-1
bias

If False, the calculations of skewness and kurtosis are corrected for statistical bias.

True
need_run

Indicate whether to run this module

True
Source code in python/federatedml/param/statistics_param.py
def __init__(self, statistics="summary", column_names=None,
             column_indexes=-1, need_run=True, abnormal_list=None,
             quantile_error=consts.DEFAULT_RELATIVE_ERROR, bias=True):
    super().__init__()
    self.statistics = statistics
    self.column_names = column_names
    self.column_indexes = column_indexes
    self.abnormal_list = abnormal_list
    self.need_run = need_run
    self.quantile_error = quantile_error
    self.bias = bias
Attributes
LEGAL_STAT = [consts.COUNT, consts.SUM, consts.MEAN, consts.STANDARD_DEVIATION, consts.MEDIAN, consts.MIN, consts.MAX, consts.VARIANCE, consts.COEFFICIENT_OF_VARIATION, consts.MISSING_COUNT, consts.MISSING_RATIO, consts.SKEWNESS, consts.KURTOSIS] instance-attribute class-attribute
BASIC_STAT = [consts.SUM, consts.MEAN, consts.STANDARD_DEVIATION, consts.MEDIAN, consts.MIN, consts.MAX, consts.MISSING_RATIO, consts.MISSING_COUNT, consts.SKEWNESS, consts.KURTOSIS, consts.COEFFICIENT_OF_VARIATION] instance-attribute class-attribute
LEGAL_QUANTILE = re.compile('^(100)|([1-9]?[0-9])%$') instance-attribute class-attribute
statistics = statistics instance-attribute
column_names = column_names instance-attribute
column_indexes = column_indexes instance-attribute
abnormal_list = abnormal_list instance-attribute
need_run = need_run instance-attribute
quantile_error = quantile_error instance-attribute
bias = bias instance-attribute
Functions
find_stat_name_match(stat_name) staticmethod
Source code in python/federatedml/param/statistics_param.py
@staticmethod
def find_stat_name_match(stat_name):
    if stat_name in StatisticsParam.LEGAL_STAT or StatisticsParam.LEGAL_QUANTILE.match(stat_name):
        return True
    return False
check()
Source code in python/federatedml/param/statistics_param.py
def check(self):
    model_param_descr = "Statistics's param statistics"
    BaseParam.check_boolean(self.need_run, model_param_descr)
    statistics = copy.copy(self.BASIC_STAT)
    if not isinstance(self.statistics, list):
        if self.statistics in [consts.SUMMARY]:
            self.statistics = statistics
        else:
            if self.statistics not in statistics:
                statistics.append(self.statistics)
            self.statistics = statistics
    else:
        for s in self.statistics:
            if s not in statistics:
                statistics.append(s)
        self.statistics = statistics

    for stat_name in self.statistics:
        match_found = StatisticsParam.find_stat_name_match(stat_name)
        if not match_found:
            raise ValueError(f"Illegal statistics name provided: {stat_name}.")

    self.column_names = [] if self.column_names is None else self.column_names
    self.column_indexes = [] if self.column_indexes is None else self.column_indexes
    self.abnormal_list = [] if self.abnormal_list is None else self.abnormal_list
    model_param_descr = "Statistics's param column_names"
    if not isinstance(self.column_names, list):
        raise ValueError(f"column_names should be list of string.")
    for col_name in self.column_names:
        BaseParam.check_string(col_name, model_param_descr)

    model_param_descr = "Statistics's param column_indexes"
    if not isinstance(self.column_indexes, list) and self.column_indexes != -1:
        raise ValueError(f"column_indexes should be list of int or -1.")
    if self.column_indexes != -1:
        for col_index in self.column_indexes:
            if not isinstance(col_index, int):
                raise ValueError(f"{model_param_descr} should be int or list of int")
            if col_index < -consts.FLOAT_ZERO:
                raise ValueError(f"{model_param_descr} should be non-negative int value(s)")

    if not isinstance(self.abnormal_list, list):
        raise ValueError(f"abnormal_list should be list of int or string.")

    self.check_decimal_float(self.quantile_error, "Statistics's param quantile_error ")
    self.check_boolean(self.bias, "Statistics's param bias ")
    return True
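A sketch adding a quantile statistic (import path from the source file above); per LEGAL_QUANTILE, percentile names such as "95%" are accepted, and check() merges user entries with BASIC_STAT:

from federatedml.param.statistics_param import StatisticsParam

param = StatisticsParam(statistics=["95%"], column_indexes=-1)
param.check()
# param.statistics now holds BASIC_STAT plus the extra "95%" entry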
EncodeParam(salt='', encode_method='none', base64=False)

Bases: BaseParam

Define the hash method for raw intersect method

Parameters:

Name Type Description Default
salt

salt to append to the src id before hashing (str = str + salt); defaults to the empty string

''
encode_method

the hash method applied to the src id; supports md5, sha1, sha224, sha256, sha384, sha512, sm3; defaults to 'none'

'none'
base64

if True, the hash result is base64-encoded; defaults to False

False
Source code in python/federatedml/param/intersect_param.py
def __init__(self, salt='', encode_method='none', base64=False):
    super().__init__()
    self.salt = salt
    self.encode_method = encode_method
    self.base64 = base64
Attributes
salt = salt instance-attribute
encode_method = encode_method instance-attribute
base64 = base64 instance-attribute
Functions
check()
Source code in python/federatedml/param/intersect_param.py
def check(self):
    if type(self.salt).__name__ != "str":
        raise ValueError(
            "encode param's salt {} not supported, should be str type".format(
                self.salt))

    descr = "encode param's "

    self.encode_method = self.check_and_change_lower(self.encode_method,
                                                     ["none", consts.MD5, consts.SHA1, consts.SHA224,
                                                      consts.SHA256, consts.SHA384, consts.SHA512,
                                                      consts.SM3],
                                                     descr)

    if type(self.base64).__name__ != "bool":
        raise ValueError(
            "hash param's base64 {} not supported, should be bool type".format(self.base64))

    LOGGER.debug("Finish EncodeParam check!")
    LOGGER.warning(f"'EncodeParam' will be replaced by 'RAWParam' in future release."
                   f"Please do not rely on current param naming in application.")
    return True
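A minimal sketch (import path from the source file above):

from federatedml.param.intersect_param import EncodeParam

# salt raw ids, hash with sha256, then base64-encode the digest
param = EncodeParam(salt="my_salt", encode_method="sha256", base64=True)
param.check()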
PoissonParam(penalty='L2', tol=0.0001, alpha=1.0, optimizer='rmsprop', batch_size=-1, learning_rate=0.01, init_param=InitParam(), max_iter=20, early_stop='diff', exposure_colname=None, encrypt_param=EncryptParam(), encrypted_mode_calculator_param=EncryptedModeCalculatorParam(), cv_param=CrossValidationParam(), stepwise_param=StepwiseParam(), decay=1, decay_sqrt=True, validation_freqs=None, early_stopping_rounds=None, metrics=None, use_first_metric_only=False, floating_point_precision=23, callback_param=CallbackParam())

Bases: LinearModelParam

Parameters used for Poisson Regression.

Parameters:

Name Type Description Default
penalty

Penalty method used in Poisson. Please note that, when using encrypted version in HeteroPoisson, 'L1' is not supported.

'L2'
tol float, default

The tolerance of convergence

0.0001
alpha float, default

Regularization strength coefficient.

1.0
optimizer

Optimize method

'rmsprop'
batch_size int, default

Batch size when updating model. -1 means use all data in a batch. i.e. Not to use mini-batch strategy.

-1
learning_rate float, default

Learning rate

0.01
max_iter int, default

The maximum iteration for training.

20
init_param

Init param method object.

InitParam()
early_stop str, 'weight_diff', 'diff' or 'abs', default

Method used to judge convergence. a) diff: Use difference of loss between two iterations to judge whether converge. b) weight_diff: Use difference between weights of two consecutive iterations c) abs: Use the absolute value of loss to judge whether converge. i.e. if loss < eps, it is converged.

'diff'
exposure_colname

Name of optional exposure variable in dTable.

None
encrypt_param

encrypt param

EncryptParam()
encrypted_mode_calculator_param

encrypted mode calculator param

EncryptedModeCalculatorParam()
cv_param

cv param

CrossValidationParam()
stepwise_param

stepwise param

StepwiseParam()
decay

Decay rate for learning rate. learning rate will follow the following decay schedule. lr = lr0/(1+decay*t) if decay_sqrt is False. If decay_sqrt is True, lr = lr0 / sqrt(1+decay*t) where t is the iter number.

1
decay_sqrt

lr = lr0/(1+decay*t) if decay_sqrt is False, otherwise, lr = lr0 / sqrt(1+decay*t)

True
validation_freqs

validation frequency during training, required when using early stopping. The default value is None; 1 is suggested. You can set it to a number larger than 1 in order to speed up training by skipping validation rounds. When it is larger than 1, a number that evenly divides "max_iter" is recommended; otherwise, you will miss the validation scores of the last training iteration.

None
early_stopping_rounds

If positive number specified, at every specified training rounds, program checks for early stopping criteria. Validation_freqs must also be set when using early stopping.

None
metrics

Specify which metrics to be used when performing evaluation during training process. If metrics have not improved within early_stopping rounds, training stops before convergence. If set as empty, default metrics will be used. For regression tasks, default metrics are ['root_mean_squared_error', 'mean_absolute_error']

None
use_first_metric_only

Indicate whether to use the first metric in metrics as the only criterion for early stopping judgement.

False
floating_point_precision

if not None, use floating_point_precision-bit to speed up calculation, e.g.: convert an x to round(x * 2**floating_point_precision) during Paillier operation, divide the result by 2**floating_point_precision in the end.

23
callback_param

callback param

CallbackParam()
Source code in python/federatedml/param/poisson_regression_param.py
def __init__(self, penalty='L2',
             tol=1e-4, alpha=1.0, optimizer='rmsprop',
             batch_size=-1, learning_rate=0.01, init_param=InitParam(),
             max_iter=20, early_stop='diff',
             exposure_colname=None,
             encrypt_param=EncryptParam(),
             encrypted_mode_calculator_param=EncryptedModeCalculatorParam(),
             cv_param=CrossValidationParam(), stepwise_param=StepwiseParam(),
             decay=1, decay_sqrt=True,
             validation_freqs=None, early_stopping_rounds=None, metrics=None, use_first_metric_only=False,
             floating_point_precision=23, callback_param=CallbackParam()):
    super(PoissonParam, self).__init__(penalty=penalty, tol=tol, alpha=alpha, optimizer=optimizer,
                                       batch_size=batch_size, learning_rate=learning_rate,
                                       init_param=init_param, max_iter=max_iter,
                                       early_stop=early_stop, cv_param=cv_param, decay=decay,
                                       decay_sqrt=decay_sqrt, validation_freqs=validation_freqs,
                                       early_stopping_rounds=early_stopping_rounds, metrics=metrics,
                                       floating_point_precision=floating_point_precision,
                                       encrypt_param=encrypt_param,
                                       use_first_metric_only=use_first_metric_only,
                                       stepwise_param=stepwise_param,
                                       callback_param=callback_param)
    self.encrypted_mode_calculator_param = copy.deepcopy(encrypted_mode_calculator_param)
    self.exposure_colname = exposure_colname
Attributes
encrypted_mode_calculator_param = copy.deepcopy(encrypted_mode_calculator_param) instance-attribute
exposure_colname = exposure_colname instance-attribute
Functions
check()
Source code in python/federatedml/param/poisson_regression_param.py
def check(self):
    descr = "poisson_regression_param's "
    super(PoissonParam, self).check()
    if self.encrypt_param.method != consts.PAILLIER:
        raise ValueError(
            descr + "encrypt method supports 'Paillier' only")
    if self.optimizer not in ['sgd', 'rmsprop', 'adam', 'adagrad']:
        raise ValueError(
            descr + "optimizer not supported, optimizer should be"
                    " 'sgd', 'rmsprop', 'adam', or 'adagrad'")
    if self.exposure_colname is not None:
        if type(self.exposure_colname).__name__ != "str":
            raise ValueError(
                descr + "exposure_colname {} not supported, should be string type".format(self.exposure_colname))
    self.encrypted_mode_calculator_param.check()
    return True
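A sketch with an exposure column (import path from the source file above); note that check() restricts the encrypt method to Paillier and the optimizer to sgd/rmsprop/adam/adagrad:

from federatedml.param.poisson_regression_param import PoissonParam

param = PoissonParam(penalty="L2", optimizer="rmsprop", max_iter=20,
                     exposure_colname="exposure")
param.check()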
LinearParam(penalty='L2', tol=0.0001, alpha=1.0, optimizer='sgd', batch_size=-1, learning_rate=0.01, init_param=InitParam(), max_iter=20, early_stop='diff', encrypt_param=EncryptParam(), sqn_param=StochasticQuasiNewtonParam(), encrypted_mode_calculator_param=EncryptedModeCalculatorParam(), cv_param=CrossValidationParam(), decay=1, decay_sqrt=True, validation_freqs=None, early_stopping_rounds=None, stepwise_param=StepwiseParam(), metrics=None, use_first_metric_only=False, floating_point_precision=23, callback_param=CallbackParam())

Bases: LinearModelParam

Parameters used for Linear Regression.

Parameters:

Name Type Description Default
penalty

Penalty method used in LinR. Please note that, when using encrypted version in HeteroLinR, 'L1' is not supported. When using Homo-LR, 'L1' is not supported

'L2' or 'L1'
tol float, default

The tolerance of convergence

0.0001
alpha float, default

Regularization strength coefficient.

1.0
optimizer

Optimize method

'sgd'
batch_size int, default

Batch size when updating model. -1 means use all data in a batch. i.e. Not to use mini-batch strategy.

-1
learning_rate float, default

Learning rate

0.01
max_iter int, default

The maximum iteration for training.

20
init_param

Init param method object.

InitParam()
early_stop

Method used to judge convergence. a) diff: Use difference of loss between two iterations to judge whether converge. b) abs: Use the absolute value of loss to judge whether converge. i.e. if loss < tol, it is converged. c) weight_diff: Use difference between weights of two consecutive iterations

'diff'
encrypt_param

encrypt param

EncryptParam()
encrypted_mode_calculator_param

encrypted mode calculator param

EncryptedModeCalculatorParam()
cv_param

cv param

CrossValidationParam()
decay

Decay rate for learning rate. learning rate will follow the following decay schedule. lr = lr0/(1+decay*t) if decay_sqrt is False. If decay_sqrt is True, lr = lr0 / sqrt(1+decay*t) where t is the iter number.

1
decay_sqrt

lr = lr0/(1+decay*t) if decay_sqrt is False, otherwise, lr = lr0 / sqrt(1+decay*t)

True
validation_freqs

validation frequency during training, required when using early stopping. The default value is None; 1 is suggested. You can set it to a number larger than 1 in order to speed up training by skipping validation rounds. When it is larger than 1, a number that evenly divides "max_iter" is recommended; otherwise, you will miss the validation scores of the last training iteration.

None
early_stopping_rounds

If positive number specified, at every specified training rounds, program checks for early stopping criteria. Validation_freqs must also be set when using early stopping.

None
metrics

Specify which metrics to be used when performing evaluation during training process. If metrics have not improved within early_stopping rounds, training stops before convergence. If set as empty, default metrics will be used. For regression tasks, default metrics are ['root_mean_squared_error', 'mean_absolute_error']

None
use_first_metric_only

Indicate whether to use the first metric in metrics as the only criterion for early stopping judgement.

False
floating_point_precision

if not None, use floating_point_precision-bit to speed up calculation, e.g.: convert an x to round(x * 2**floating_point_precision) during Paillier operation, divide the result by 2**floating_point_precision in the end.

23
callback_param

callback param

CallbackParam()
Source code in python/federatedml/param/linear_regression_param.py
def __init__(self, penalty='L2',
             tol=1e-4, alpha=1.0, optimizer='sgd',
             batch_size=-1, learning_rate=0.01, init_param=InitParam(),
             max_iter=20, early_stop='diff',
             encrypt_param=EncryptParam(), sqn_param=StochasticQuasiNewtonParam(),
             encrypted_mode_calculator_param=EncryptedModeCalculatorParam(),
             cv_param=CrossValidationParam(), decay=1, decay_sqrt=True, validation_freqs=None,
             early_stopping_rounds=None, stepwise_param=StepwiseParam(), metrics=None, use_first_metric_only=False,
             floating_point_precision=23, callback_param=CallbackParam()):
    super(LinearParam, self).__init__(penalty=penalty, tol=tol, alpha=alpha, optimizer=optimizer,
                                      batch_size=batch_size, learning_rate=learning_rate,
                                      init_param=init_param, max_iter=max_iter, early_stop=early_stop,
                                      encrypt_param=encrypt_param, cv_param=cv_param, decay=decay,
                                      decay_sqrt=decay_sqrt, validation_freqs=validation_freqs,
                                      early_stopping_rounds=early_stopping_rounds,
                                      stepwise_param=stepwise_param, metrics=metrics,
                                      use_first_metric_only=use_first_metric_only,
                                      floating_point_precision=floating_point_precision,
                                      callback_param=callback_param)
    self.sqn_param = copy.deepcopy(sqn_param)
    self.encrypted_mode_calculator_param = copy.deepcopy(encrypted_mode_calculator_param)
Attributes
sqn_param = copy.deepcopy(sqn_param) instance-attribute
encrypted_mode_calculator_param = copy.deepcopy(encrypted_mode_calculator_param) instance-attribute
Functions
check()
Source code in python/federatedml/param/linear_regression_param.py
def check(self):
    descr = "linear_regression_param's "
    super(LinearParam, self).check()
    if self.optimizer not in ['sgd', 'rmsprop', 'adam', 'adagrad', 'sqn']:
        raise ValueError(
            descr + "optimizer not supported, optimizer should be"
                    " 'sgd', 'rmsprop', 'adam', 'sqn' or 'adagrad'")
    self.sqn_param.check()
    if self.encrypt_param.method != consts.PAILLIER:
        raise ValueError(
            descr + "encrypt method supports 'Paillier' only")
    return True
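A minimal sketch (import path from the source file above):

from federatedml.param.linear_regression_param import LinearParam

# stochastic quasi-Newton optimizer; sqn_param is validated by its own check()
param = LinearParam(penalty="L2", optimizer="sqn", max_iter=50)
param.check()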
LogisticParam(penalty='L2', tol=0.0001, alpha=1.0, optimizer='rmsprop', batch_size=-1, shuffle=True, batch_strategy='full', masked_rate=5, learning_rate=0.01, init_param=InitParam(), max_iter=100, early_stop='diff', encrypt_param=EncryptParam(), predict_param=PredictParam(), cv_param=CrossValidationParam(), decay=1, decay_sqrt=True, multi_class='ovr', validation_freqs=None, early_stopping_rounds=None, stepwise_param=StepwiseParam(), floating_point_precision=23, metrics=None, use_first_metric_only=False, callback_param=CallbackParam())

Bases: LinearModelParam

Parameters used for Logistic Regression both for Homo mode or Hetero mode.

Parameters:

Name Type Description Default
penalty

Penalty method used in LR. Please note that, when using encrypted version in HomoLR, 'L1' is not supported.

'L2'
tol float, default

The tolerance of convergence

0.0001
alpha float, default

Regularization strength coefficient.

1.0
optimizer

Optimize method.

'rmsprop'
batch_strategy str

Strategy to generate batch data. a) full: use the full data to generate batch data; batch_nums per iteration is ceil(data_size / batch_size). b) random: select data randomly from the full data; batch_num will be 1 every iteration.

'full'
batch_size int, default

Batch size when updating model. -1 means use all data in a batch. i.e. Not to use mini-batch strategy.

-1
shuffle bool, default

Works only in hetero logistic regression; batch data will be shuffled in every iteration.

True
masked_rate

Use masked data to enhance security of hetero logistic regression

5
learning_rate float, default

Learning rate

0.01
max_iter int, default

The maximum iteration for training.

100
early_stop

Method used to judge converge or not. a) diff: Use difference of loss between two iterations to judge whether converge. b) weight_diff: Use difference between weights of two consecutive iterations c) abs: Use the absolute value of loss to judge whether converge. i.e. if loss < eps, it is converged.

Please note that for the hetero-lr multi-host situation, this parameter supports "weight_diff" only. In homo-lr, weight_diff is not supported

'diff'
decay

Decay rate for learning rate. learning rate will follow the following decay schedule. lr = lr0/(1+decay*t) if decay_sqrt is False. If decay_sqrt is True, lr = lr0 / sqrt(1+decay*t) where t is the iter number.

1
decay_sqrt

lr = lr0/(1+decay*t) if decay_sqrt is False, otherwise, lr = lr0 / sqrt(1+decay*t)

True
encrypt_param

encrypt param

EncryptParam()
predict_param

predict param

PredictParam()
callback_param

callback param

CallbackParam()
cv_param

cv param

CrossValidationParam()
multi_class

If it is a multi_class task, indicate what strategy to use. Currently, support 'ovr' short for one_vs_rest only.

'ovr'
validation_freqs

validation frequency during training.

None
early_stopping_rounds

Will stop training if one metric doesn't improve in the last early_stopping_rounds rounds

None
metrics

Indicate when executing evaluation during train process, which metrics will be used. If set as empty, default metrics for specific task type will be used. As for binary classification, default metrics are ['auc', 'ks']

None
use_first_metric_only

Indicate whether to use the first metric only for early stopping judgement.

False
floating_point_precision

if not None, use floating_point_precision-bit to speed up calculation, e.g.: convert an x to round(x * 2**floating_point_precision) during Paillier operation, divide the result by 2**floating_point_precision in the end.

23
Source code in python/federatedml/param/logistic_regression_param.py
def __init__(self, penalty='L2',
             tol=1e-4, alpha=1.0, optimizer='rmsprop',
             batch_size=-1, shuffle=True, batch_strategy="full", masked_rate=5,
             learning_rate=0.01, init_param=InitParam(),
             max_iter=100, early_stop='diff', encrypt_param=EncryptParam(),
             predict_param=PredictParam(), cv_param=CrossValidationParam(),
             decay=1, decay_sqrt=True,
             multi_class='ovr', validation_freqs=None, early_stopping_rounds=None,
             stepwise_param=StepwiseParam(), floating_point_precision=23,
             metrics=None,
             use_first_metric_only=False,
             callback_param=CallbackParam()
             ):
    super(LogisticParam, self).__init__()
    self.penalty = penalty
    self.tol = tol
    self.alpha = alpha
    self.optimizer = optimizer
    self.batch_size = batch_size
    self.learning_rate = learning_rate
    self.init_param = copy.deepcopy(init_param)
    self.max_iter = max_iter
    self.early_stop = early_stop
    self.encrypt_param = encrypt_param
    self.shuffle = shuffle
    self.batch_strategy = batch_strategy
    self.masked_rate = masked_rate
    self.predict_param = copy.deepcopy(predict_param)
    self.cv_param = copy.deepcopy(cv_param)
    self.decay = decay
    self.decay_sqrt = decay_sqrt
    self.multi_class = multi_class
    self.validation_freqs = validation_freqs
    self.stepwise_param = copy.deepcopy(stepwise_param)
    self.early_stopping_rounds = early_stopping_rounds
    self.metrics = metrics or []
    self.use_first_metric_only = use_first_metric_only
    self.floating_point_precision = floating_point_precision
    self.callback_param = copy.deepcopy(callback_param)
Attributes
penalty = penalty instance-attribute
tol = tol instance-attribute
alpha = alpha instance-attribute
optimizer = optimizer instance-attribute
batch_size = batch_size instance-attribute
learning_rate = learning_rate instance-attribute
init_param = copy.deepcopy(init_param) instance-attribute
max_iter = max_iter instance-attribute
early_stop = early_stop instance-attribute
encrypt_param = encrypt_param instance-attribute
shuffle = shuffle instance-attribute
batch_strategy = batch_strategy instance-attribute
masked_rate = masked_rate instance-attribute
predict_param = copy.deepcopy(predict_param) instance-attribute
cv_param = copy.deepcopy(cv_param) instance-attribute
decay = decay instance-attribute
decay_sqrt = decay_sqrt instance-attribute
multi_class = multi_class instance-attribute
validation_freqs = validation_freqs instance-attribute
stepwise_param = copy.deepcopy(stepwise_param) instance-attribute
early_stopping_rounds = early_stopping_rounds instance-attribute
metrics = metrics or [] instance-attribute
use_first_metric_only = use_first_metric_only instance-attribute
floating_point_precision = floating_point_precision instance-attribute
callback_param = copy.deepcopy(callback_param) instance-attribute
Functions
check()
Source code in python/federatedml/param/logistic_regression_param.py
def check(self):
    descr = "logistic_param's"
    super(LogisticParam, self).check()
    self.predict_param.check()
    if self.encrypt_param.method not in [consts.PAILLIER, consts.PAILLIER_IPCL, None]:
        raise ValueError(
            "logistic_param's encrypted method support 'Paillier' or None only")
    self.multi_class = self.check_and_change_lower(
        self.multi_class, ["ovr"], f"{descr}")
    if not isinstance(self.masked_rate, (float, int)) or self.masked_rate < 0:
        raise ValueError(
            "masked rate should be non-negative numeric number")
    if not isinstance(self.batch_strategy, str) or self.batch_strategy.lower() not in ["full", "random"]:
        raise ValueError("batch strategy should be full or random")
    self.batch_strategy = self.batch_strategy.lower()
    if not isinstance(self.shuffle, bool):
        raise ValueError("shuffle should be boolean type")
    return True
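A sketch using the random batch strategy described above, where batch_num is 1 per iteration (import path from the source file above):

from federatedml.param.logistic_regression_param import LogisticParam

param = LogisticParam(penalty="L2", optimizer="rmsprop", batch_size=320,
                      batch_strategy="random", masked_rate=5, max_iter=100)
param.check()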
ObjectiveParam(objective='cross_entropy', params=None)

Bases: BaseParam

Define objective parameters that used in federated ml.

Parameters:

Name Type Description Default
objective

None in host's config; should be str in guest's config. When task_type is classification, only 'cross_entropy' is supported; the other 6 types are supported in regression tasks

None
params None or list

should be a non-empty list when objective is 'tweedie', 'fair', or 'huber'. When objective is 'fair' or 'huber', the first element of the list should be a float larger than 0.0; when objective is 'tweedie', the first element should be a float in [1.0, 2.0)

None
Source code in python/federatedml/param/boosting_param.py
def __init__(self, objective='cross_entropy', params=None):
    self.objective = objective
    self.params = params
Attributes
objective = objective instance-attribute
params = params instance-attribute
Functions
check(task_type=None)
Source code in python/federatedml/param/boosting_param.py
def check(self, task_type=None):
    if self.objective is None:
        return True

    descr = "objective param's"

    LOGGER.debug('check objective {}'.format(self.objective))

    if task_type not in [consts.CLASSIFICATION, consts.REGRESSION]:
        self.objective = self.check_and_change_lower(self.objective,
                                                     ["cross_entropy", "lse", "lae", "huber", "fair",
                                                      "log_cosh", "tweedie"],
                                                     descr)

    if task_type == consts.CLASSIFICATION:
        if self.objective != "cross_entropy":
            raise ValueError("objective param's objective {} not supported".format(self.objective))

    elif task_type == consts.REGRESSION:
        self.objective = self.check_and_change_lower(self.objective,
                                                     ["lse", "lae", "huber", "fair", "log_cosh", "tweedie"],
                                                     descr)

        params = self.params
        if self.objective in ["huber", "fair", "tweedie"]:
            if type(params).__name__ != 'list' or len(params) < 1:
                raise ValueError(
                    "objective param's params {} not supported, should be non-empty list".format(params))

            if type(params[0]).__name__ not in ["float", "int", "long"]:
                raise ValueError("objective param's params[0] {} not supported".format(self.params[0]))

            if self.objective == 'tweedie':
                if params[0] < 1 or params[0] >= 2:
                    raise ValueError("in tweedie regression, objective params[0] should be within [1, 2)")

            if self.objective in ('fair', 'huber'):
                if params[0] <= 0.0:
                    raise ValueError("in {} regression, objective params[0] should be greater than 0.0".format(
                        self.objective))
    return True
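To make the params constraint concrete, a short sketch (assuming a FATE environment where federatedml and its consts module are importable):

from federatedml.param.boosting_param import ObjectiveParam
from federatedml.util import consts

# Tweedie regression: the variance power params[0] must lie in [1.0, 2.0).
obj = ObjectiveParam(objective="tweedie", params=[1.5])
obj.check(task_type=consts.REGRESSION)

# Huber regression: params[0] must be greater than 0.0;
# ObjectiveParam(objective="huber", params=[0]).check(consts.REGRESSION) would raise ValueError.
obj = ObjectiveParam(objective="huber", params=[1.35])
obj.check(task_type=consts.REGRESSION)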
FTLParam(alpha=1, tol=1e-06, n_iter_no_change=False, validation_freqs=None, optimizer={'optimizer': 'Adam', 'learning_rate': 0.01}, nn_define={}, epochs=1, intersect_param=IntersectParam(consts.RSA), config_type='keras', batch_size=-1, encrypte_param=EncryptParam(), encrypted_mode_calculator_param=EncryptedModeCalculatorParam(mode='confusion_opt'), predict_param=PredictParam(), mode='plain', communication_efficient=False, local_round=5, callback_param=CallbackParam())

Bases: BaseParam

Parameters:

Name Type Description Default
alpha float

a loss coefficient defined in the paper; it defines the importance of the alignment loss

1
tol float

loss tolerance

1e-06
n_iter_no_change bool

check loss convergence or not

False
validation_freqs None or positive integer or container object in python

Whether to do validation during training. If None, no validation is done during training; if a positive integer, data is validated every validation_freqs epochs; if a container object, data is validated whenever the current epoch is in the container, e.g. validation_freqs = [10, 15] validates at epochs 10 and 15. The default value is None; 1 is suggested. You can set it to a number larger than 1 in order to speed up training by skipping validation rounds. When it is larger than 1, a divisor of "epochs" is recommended, otherwise you will miss the validation scores of the last training epoch.

None
optimizer str or dict

optimizer method, accepts the following types: 1. a string, one of "Adadelta", "Adagrad", "Adam", "Adamax", "Nadam", "RMSprop", "SGD" 2. a dict, with a required key-value pair keyed by "optimizer" and optional key-value pairs such as the learning rate. Defaults to {'optimizer': 'Adam', 'learning_rate': 0.01}

{'optimizer': 'Adam', 'learning_rate': 0.01}
nn_define dict

a dict representing the structure of the neural network; it can be output by tf-keras

{}
epochs int

number of epochs

1
intersect_param

define the intersect method

IntersectParam(consts.RSA)
config_type

config type; only 'keras' is supported

'keras'
batch_size int

batch size when computing the transformed feature embedding; -1 means use the full data.

-1
encrypte_param

encrypted param

EncryptParam()
encrypted_mode_calculator_param

encrypted mode calculator param:

EncryptedModeCalculatorParam(mode='confusion_opt')
predict_param

predict param

PredictParam()
mode

{'plain', 'encrypted'}. plain: no encryption is used and data is exchanged in plaintext; encrypted: use Paillier to encrypt gradients

'plain'
communication_efficient

whether to use the communication-efficient strategy. When enabled, the FTL model updates gradients over several local rounds using intermediate data

False
local_round

local update rounds when using the communication-efficient strategy

5
Source code in python/federatedml/param/ftl_param.py
def __init__(self, alpha=1, tol=0.000001,
             n_iter_no_change=False, validation_freqs=None, optimizer={'optimizer': 'Adam', 'learning_rate': 0.01},
             nn_define={}, epochs=1, intersect_param=IntersectParam(consts.RSA), config_type='keras', batch_size=-1,
             encrypte_param=EncryptParam(),
             encrypted_mode_calculator_param=EncryptedModeCalculatorParam(mode="confusion_opt"),
             predict_param=PredictParam(), mode='plain', communication_efficient=False,
             local_round=5, callback_param=CallbackParam()):
    """
    Parameters
    ----------
    alpha : float
        a loss coefficient defined in paper, it defines the importance of alignment loss
    tol : float
        loss tolerance
    n_iter_no_change : bool
        check loss convergence or not
    validation_freqs : None or positive integer or container object in python
        Whether to do validation during training.
        If None, no validation is done during training;
        if a positive integer, data is validated every validation_freqs epochs;
        if a container object, data is validated whenever the current epoch is in the container,
        e.g. validation_freqs = [10, 15] validates at epochs 10 and 15.
        The default value is None; 1 is suggested. You can set it to a number larger than 1 in order to
        speed up training by skipping validation rounds. When it is larger than 1, a divisor of
        "epochs" is recommended, otherwise you will miss the validation scores
        of the last training epoch.
    optimizer : str or dict
        optimizer method, accepts the following types:
        1. a string, one of "Adadelta", "Adagrad", "Adam", "Adamax", "Nadam", "RMSprop", "SGD"
        2. a dict, with a required key-value pair keyed by "optimizer",
            with optional key-value pairs such as the learning rate.
        defaults to {'optimizer': 'Adam', 'learning_rate': 0.01}
    nn_define : dict
        a dict represents the structure of neural network, it can be output by tf-keras
    epochs : int
        epochs num
    intersect_param
        define the intersect method
    config_type : {'keras'}
        config type
    batch_size : int
        batch size when computing transformed feature embedding, -1 use full data.
    encrypte_param
        encrypted param
    encrypted_mode_calculator_param
        encrypted mode calculator param:
    predict_param
        predict param
    mode: {"plain", "encrypted"}
        plain: will not use any encrypt algorithms, data exchanged in plaintext
        encrypted: use paillier to encrypt gradients
    communication_efficient: bool
        will use communication efficient or not. when communication efficient is enabled, FTL model will
        update gradients by several local rounds using intermediate data
    local_round: int
        local update round when using communication efficient
    """

    super(FTLParam, self).__init__()
    self.alpha = alpha
    self.tol = tol
    self.n_iter_no_change = n_iter_no_change
    self.validation_freqs = validation_freqs
    self.optimizer = optimizer
    self.nn_define = nn_define
    self.epochs = epochs
    self.intersect_param = copy.deepcopy(intersect_param)
    self.config_type = config_type
    self.batch_size = batch_size
    self.encrypted_mode_calculator_param = copy.deepcopy(encrypted_mode_calculator_param)
    self.encrypt_param = copy.deepcopy(encrypte_param)
    self.predict_param = copy.deepcopy(predict_param)
    self.mode = mode
    self.communication_efficient = communication_efficient
    self.local_round = local_round
    self.callback_param = copy.deepcopy(callback_param)
Attributes
alpha = alpha instance-attribute
tol = tol instance-attribute
n_iter_no_change = n_iter_no_change instance-attribute
validation_freqs = validation_freqs instance-attribute
optimizer = optimizer instance-attribute
nn_define = nn_define instance-attribute
epochs = epochs instance-attribute
intersect_param = copy.deepcopy(intersect_param) instance-attribute
config_type = config_type instance-attribute
batch_size = batch_size instance-attribute
encrypted_mode_calculator_param = copy.deepcopy(encrypted_mode_calculator_param) instance-attribute
encrypt_param = copy.deepcopy(encrypte_param) instance-attribute
predict_param = copy.deepcopy(predict_param) instance-attribute
mode = mode instance-attribute
communication_efficient = communication_efficient instance-attribute
local_round = local_round instance-attribute
callback_param = copy.deepcopy(callback_param) instance-attribute
Functions
check()
Source code in python/federatedml/param/ftl_param.py
def check(self):
    self.intersect_param.check()
    self.encrypt_param.check()
    self.encrypted_mode_calculator_param.check()

    self.optimizer = self._parse_optimizer(self.optimizer)

    supported_config_type = ["keras"]
    if self.config_type not in supported_config_type:
        raise ValueError(f"config_type should be one of {supported_config_type}")

    if not isinstance(self.tol, (int, float)):
        raise ValueError("tol should be numeric")

    if not isinstance(self.epochs, int) or self.epochs <= 0:
        raise ValueError("epochs should be a positive integer")

    if self.nn_define and not isinstance(self.nn_define, dict):
        raise ValueError("nn_define should be a dict defining the structure of neural network")

    if self.batch_size != -1:
        if not isinstance(self.batch_size, int) \
                or self.batch_size < consts.MIN_BATCH_SIZE:
            raise ValueError(
                "batch_size {} not supported, should be larger than 10 or -1 to represent all data".format(self.batch_size))

    for p in deprecated_param_list:
        # if self._warn_to_deprecate_param(p, "", ""):
        if self._deprecated_params_set.get(p):
            if "callback_param" in self.get_user_feeded():
                raise ValueError(f"{p} and callback param should not be set simultaneously,"
                                 f"{self._deprecated_params_set}, {self.get_user_feeded()}")
            else:
                self.callback_param.callbacks = ["PerformanceEvaluate"]
            break

    descr = "ftl's"

    if self._warn_to_deprecate_param("validation_freqs", descr, "callback_param's 'validation_freqs'"):
        self.callback_param.validation_freqs = self.validation_freqs

    if self._warn_to_deprecate_param("metrics", descr, "callback_param's 'metrics'"):
        self.callback_param.metrics = self.metrics

    if self.validation_freqs is None:
        pass
    elif isinstance(self.validation_freqs, int):
        if self.validation_freqs < 1:
            raise ValueError("validation_freqs should be larger than 0 when it's integer")
    elif not isinstance(self.validation_freqs, collections.abc.Container):
        raise ValueError("validation_freqs should be None or positive integer or container")

    assert isinstance(self.communication_efficient, bool), 'communication efficient must be a boolean'
    assert self.mode in [
        'encrypted', 'plain'], 'mode options: encrpyted or plain, but {} is offered'.format(
        self.mode)

    self.check_positive_integer(self.epochs, 'epochs')
    self.check_positive_number(self.alpha, 'alpha')
    self.check_positive_integer(self.local_round, 'local round')
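A configuration sketch combining the encrypted mode with the communication-efficient strategy described above (assuming a FATE environment; values are illustrative):

from federatedml.param.ftl_param import FTLParam

ftl_param = FTLParam(
    alpha=1,                       # weight of the alignment loss
    epochs=10,
    batch_size=-1,                 # -1: use the full data in each batch
    mode="encrypted",              # gradients are protected with Paillier
    communication_efficient=True,  # update gradients over several local rounds
    local_round=5,                 # local rounds per communication
    optimizer={"optimizer": "Adam", "learning_rate": 0.01},
)
ftl_param.check()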
HomoNNParam(trainer=TrainerParam(), dataset=DatasetParam(), torch_seed=100, nn_define=None, loss=None, optimizer=None, ds_config=None)

Bases: BaseParam

Source code in python/federatedml/param/homo_nn_param.py
def __init__(self,
             trainer: TrainerParam = TrainerParam(),
             dataset: DatasetParam = DatasetParam(),
             torch_seed: int = 100,
             nn_define: dict = None,
             loss: dict = None,
             optimizer: dict = None,
             ds_config: dict = None
             ):

    super(HomoNNParam, self).__init__()
    self.trainer = trainer
    self.dataset = dataset
    self.torch_seed = torch_seed
    self.nn_define = nn_define
    self.loss = loss
    self.optimizer = optimizer
    self.ds_config = ds_config
Attributes
trainer = trainer instance-attribute
dataset = dataset instance-attribute
torch_seed = torch_seed instance-attribute
nn_define = nn_define instance-attribute
loss = loss instance-attribute
optimizer = optimizer instance-attribute
ds_config = ds_config instance-attribute
Functions
check()
Source code in python/federatedml/param/homo_nn_param.py
def check(self):

    assert isinstance(self.trainer, TrainerParam), 'trainer must be a TrainerParam()'
    assert isinstance(self.dataset, DatasetParam), 'dataset must be a DatasetParam()'

    self.trainer.check()
    self.dataset.check()

    # torch seed >= 0
    if isinstance(self.torch_seed, int):
        assert self.torch_seed >= 0, 'torch seed should be an int >=0'
    else:
        raise ValueError('torch seed should be an int >=0')

    if self.nn_define is not None:
        assert isinstance(self.nn_define, dict), 'nn define should be a dict defining model structures'
    if self.loss is not None:
        assert isinstance(self.loss, dict), 'loss parameter should be a loss config dict'
    if self.optimizer is not None:
        assert isinstance(self.optimizer, dict), 'optimizer parameter should be a config dict'
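A minimal sketch (assuming a FATE environment, and that TrainerParam and DatasetParam are importable from the same module, as the signature above suggests; the trainer and dataset names are illustrative and must match components available in your deployment):

from federatedml.param.homo_nn_param import HomoNNParam, TrainerParam, DatasetParam

nn_param = HomoNNParam(
    trainer=TrainerParam(trainer_name="fedavg_trainer", epochs=10, batch_size=32),
    dataset=DatasetParam(dataset_name="table"),
    torch_seed=42,  # must be an int >= 0
)
nn_param.check()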
DecisionTreeParam(criterion_method='xgboost', criterion_params=[0.1, 0], max_depth=3, min_sample_split=2, min_impurity_split=0.001, min_leaf_node=1, max_split_nodes=consts.MAX_SPLIT_NODES, feature_importance_type='split', n_iter_no_change=True, tol=0.001, min_child_weight=0, use_missing=False, zero_as_missing=False, deterministic=False)

Bases: BaseParam

Define decision tree parameters that are used in federated ml.

Parameters:

Name Type Description Default
criterion_method

the criterion function to use

"xgboost"
criterion_params

should be non-empty and its elements float numbers. If a list is offered, the first element is the l2 regularization value and the second is the l1 regularization value; if a dict is offered, make sure it contains the keys 'l1' and 'l2'. The l1, l2 regularization values are non-negative floats. default: [0.1, 0] or {'l1': 0, 'l2': 0.1}

[0.1, 0]
max_depth

the max depth of a decision tree, default: 3

3
min_sample_split

least quantity of nodes to split, default: 2

2
min_impurity_split

least gain a single split needs to reach, default: 1e-3

0.001
min_child_weight

sum of hessian needed in child nodes. default is 0

0
min_leaf_node

when the number of samples is no more than min_leaf_node, the node becomes a leaf, default: 1

1
max_split_nodes

no more than max_split_nodes will be used to find splits in parallel in a batch, for memory consideration. default is 65536

consts.MAX_SPLIT_NODES
feature_importance_type

if 'split', feature importances are calculated by feature split times; if 'gain', by feature split gain. default: 'split'. Due to a safety concern, the training strategy of Hetero-SBT was adjusted in FATE-1.8: when running Hetero-SBT, this parameter is now ignored. In Hetero-SBT of FATE-1.8, the guest side computes split and gain of local features and receives anonymous feature importance results from hosts, while hosts compute split importance of local features.

'split'
use_missing

use missing value in training process or not.

False
zero_as_missing

regard 0 as missing value or not, will be use only if use_missing=True, default: False

False
deterministic

ensure stability when computing histograms. Set this to true to ensure stable results when using the same data and the same parameters, at the cost of slower computation.

False
Source code in python/federatedml/param/boosting_param.py
def __init__(self, criterion_method="xgboost", criterion_params=[0.1, 0], max_depth=3,
             min_sample_split=2, min_impurity_split=1e-3, min_leaf_node=1,
             max_split_nodes=consts.MAX_SPLIT_NODES, feature_importance_type='split',
             n_iter_no_change=True, tol=0.001, min_child_weight=0,
             use_missing=False, zero_as_missing=False, deterministic=False):

    super(DecisionTreeParam, self).__init__()

    self.criterion_method = criterion_method
    self.criterion_params = criterion_params
    self.max_depth = max_depth
    self.min_sample_split = min_sample_split
    self.min_impurity_split = min_impurity_split
    self.min_leaf_node = min_leaf_node
    self.min_child_weight = min_child_weight
    self.max_split_nodes = max_split_nodes
    self.feature_importance_type = feature_importance_type
    self.n_iter_no_change = n_iter_no_change
    self.tol = tol
    self.use_missing = use_missing
    self.zero_as_missing = zero_as_missing
    self.deterministic = deterministic
Attributes
criterion_method = criterion_method instance-attribute
criterion_params = criterion_params instance-attribute
max_depth = max_depth instance-attribute
min_sample_split = min_sample_split instance-attribute
min_impurity_split = min_impurity_split instance-attribute
min_leaf_node = min_leaf_node instance-attribute
min_child_weight = min_child_weight instance-attribute
max_split_nodes = max_split_nodes instance-attribute
feature_importance_type = feature_importance_type instance-attribute
n_iter_no_change = n_iter_no_change instance-attribute
tol = tol instance-attribute
use_missing = use_missing instance-attribute
zero_as_missing = zero_as_missing instance-attribute
deterministic = deterministic instance-attribute
Functions
check()
Source code in python/federatedml/param/boosting_param.py
def check(self):
    descr = "decision tree param"

    self.criterion_method = self.check_and_change_lower(self.criterion_method,
                                                        ["xgboost"],
                                                        descr)

    if len(self.criterion_params) == 0:
        raise ValueError("decision tree param's criterion_params should be non-empty")

    if isinstance(self.criterion_params, list):
        assert len(self.criterion_params) == 2, 'length of criterion_param should be 2: l1, l2 regularization ' \
                                                'values are needed'
        self.check_nonnegative_number(self.criterion_params[0], 'l2 reg value')
        self.check_nonnegative_number(self.criterion_params[1], 'l1 reg value')

    elif isinstance(self.criterion_params, dict):
        assert 'l1' in self.criterion_params and 'l2' in self.criterion_params, 'l1 and l2 keys are needed in ' \
                                                                                'criterion_params dict'
        self.criterion_params = [self.criterion_params['l2'], self.criterion_params['l1']]
    else:
        raise ValueError('criterion_params should be a dict or a list contains l1, l2 reg value')

    if type(self.max_depth).__name__ not in ["int", "long"]:
        raise ValueError("decision tree param's max_depth {} not supported, should be integer".format(
            self.max_depth))

    if self.max_depth < 1:
        raise ValueError("decision tree param's max_depth should be positive integer, no less than 1")

    if type(self.min_sample_split).__name__ not in ["int", "long"]:
        raise ValueError("decision tree param's min_sample_split {} not supported, should be integer".format(
            self.min_sample_split))

    if type(self.min_impurity_split).__name__ not in ["int", "long", "float"]:
        raise ValueError("decision tree param's min_impurity_split {} not supported, should be numeric".format(
            self.min_impurity_split))

    if type(self.min_leaf_node).__name__ not in ["int", "long"]:
        raise ValueError("decision tree param's min_leaf_node {} not supported, should be integer".format(
            self.min_leaf_node))

    if type(self.max_split_nodes).__name__ not in ["int", "long"] or self.max_split_nodes < 1:
        raise ValueError("decision tree param's max_split_nodes {} not supported, "
                         "should be positive integer between 1 and {}".format(self.max_split_nodes,
                                                                              consts.MAX_SPLIT_NODES))

    if type(self.n_iter_no_change).__name__ != "bool":
        raise ValueError("decision tree param's n_iter_no_change {} not supported, should be bool type".format(
            self.n_iter_no_change))

    if type(self.tol).__name__ not in ["float", "int", "long"]:
        raise ValueError("decision tree param's tol {} not supported, should be numeric".format(self.tol))

    self.feature_importance_type = self.check_and_change_lower(self.feature_importance_type,
                                                               ["split", "gain"],
                                                               descr)
    self.check_nonnegative_number(self.min_child_weight, 'min_child_weight')
    self.check_boolean(self.deterministic, 'deterministic')

    return True
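As the check above shows, criterion_params may be given as a list [l2, l1] or as a dict with 'l1' and 'l2' keys, which check() normalizes back to the list form. A short sketch (assuming a FATE environment):

from federatedml.param.boosting_param import DecisionTreeParam

# List form: first element is the l2 regularization value, second is l1.
tree_param = DecisionTreeParam(criterion_params=[0.1, 0], max_depth=5)
tree_param.check()

# Dict form: normalized by check() to [l2, l1].
tree_param = DecisionTreeParam(criterion_params={"l1": 0, "l2": 0.1})
tree_param.check()
print(tree_param.criterion_params)  # [0.1, 0]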
RSAParam(salt='', hash_method='sha256', final_hash_method='sha256', split_calculation=False, random_base_fraction=None, key_length=consts.DEFAULT_KEY_LENGTH, random_bit=DEFAULT_RANDOM_BIT)

Bases: BaseParam

Specify parameters for RSA intersect method

Parameters:

Name Type Description Default
salt

salt to be appended to the source id before hashing, i.e. id = id + salt; default ''

''
hash_method

the hash method of the src id; supports sha256, sha384, sha512, sm3; default sha256

'sha256'
final_hash_method

the hash method of the result data string; supports md5, sha1, sha224, sha256, sha384, sha512, sm3; default sha256

'sha256'
split_calculation

if True, Host & Guest split operations for faster performance; recommended on large data sets

False
random_base_fraction

if not None, generate (fraction * public key id count) of r for encryption and reuse the generated r; note that a value greater than 0.99 will be taken as 1, and a value less than 0.01 will be rounded up to 0.01

None
key_length

value >= 1024, bit count of rsa key, default 1024

consts.DEFAULT_KEY_LENGTH
random_bit

it defines the size of the blinding factor in the rsa algorithm, default 128

DEFAULT_RANDOM_BIT
Source code in python/federatedml/param/intersect_param.py
def __init__(self, salt='', hash_method='sha256', final_hash_method='sha256',
             split_calculation=False, random_base_fraction=None, key_length=consts.DEFAULT_KEY_LENGTH,
             random_bit=DEFAULT_RANDOM_BIT):
    super().__init__()
    self.salt = salt
    self.hash_method = hash_method
    self.final_hash_method = final_hash_method
    self.split_calculation = split_calculation
    self.random_base_fraction = random_base_fraction
    self.key_length = key_length
    self.random_bit = random_bit
Attributes
salt = salt instance-attribute
hash_method = hash_method instance-attribute
final_hash_method = final_hash_method instance-attribute
split_calculation = split_calculation instance-attribute
random_base_fraction = random_base_fraction instance-attribute
key_length = key_length instance-attribute
random_bit = random_bit instance-attribute
Functions
check()
Source code in python/federatedml/param/intersect_param.py
def check(self):
    descr = "rsa param's "
    self.check_string(self.salt, f"{descr}salt")

    self.hash_method = self.check_and_change_lower(self.hash_method,
                                                   [consts.SHA256, consts.SHA384, consts.SHA512, consts.SM3],
                                                   f"{descr}hash_method")

    self.final_hash_method = self.check_and_change_lower(self.final_hash_method,
                                                         [consts.MD5, consts.SHA1, consts.SHA224,
                                                          consts.SHA256, consts.SHA384, consts.SHA512,
                                                          consts.SM3],
                                                         f"{descr}final_hash_method")

    self.check_boolean(self.split_calculation, f"{descr}split_calculation")

    if self.random_base_fraction:
        self.check_positive_number(self.random_base_fraction, descr)
        self.check_decimal_float(self.random_base_fraction, f"{descr}random_base_fraction")

    self.check_positive_integer(self.key_length, f"{descr}key_length")
    if self.key_length < 1024:
        raise ValueError(f"key length must be >= 1024")
    self.check_positive_integer(self.random_bit, f"{descr}random_bit")

    LOGGER.debug("Finish RSAParam parameter check!")
    return True
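A sketch of an RSA intersect configuration tuned for a large data set per the notes above (assuming a FATE environment; values are illustrative):

from federatedml.param.intersect_param import RSAParam

rsa_params = RSAParam(
    salt="42",               # appended to each source id before hashing
    hash_method="sha256",
    final_hash_method="sha256",
    split_calculation=True,  # Host & Guest split operations, recommended on large data sets
    key_length=2048,         # must be >= 1024
    random_bit=128,          # size of the RSA blinding factor
)
rsa_params.check()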
FeatureBinningParam(method=consts.QUANTILE, compress_thres=consts.DEFAULT_COMPRESS_THRESHOLD, head_size=consts.DEFAULT_HEAD_SIZE, error=consts.DEFAULT_RELATIVE_ERROR, bin_num=consts.G_BIN_NUM, bin_indexes=-1, bin_names=None, adjustment_factor=0.5, transform_param=TransformParam(), local_only=False, category_indexes=None, category_names=None, need_run=True, skip_static=False)

Bases: BaseParam

Define the feature binning method

Parameters:

Name Type Description Default
method str, quantile

Binning method.

consts.QUANTILE
compress_thres

When the number of saved summaries exceeds this threshold, the compress function is called

consts.DEFAULT_COMPRESS_THRESHOLD
head_size

The buffer size to store inserted observations. When the head list reaches this buffer size, the QuantileSummaries object starts to generate summary (or stats) and insert it into its sampled list.

consts.DEFAULT_HEAD_SIZE
error

The error tolerance of binning. The final split point comes from the original data, and the rank of this value is close to the exact rank. More precisely, floor((p - 2 * error) * N) <= rank(x) <= ceil((p + 2 * error) * N), where p is the quantile as a float and N is the total number of data instances.

consts.DEFAULT_RELATIVE_ERROR
bin_num

The max bin number for binning

consts.G_BIN_NUM
bin_indexes list of int or int, default

Specify which columns need to be binned. -1 represents all columns. If you need to indicate specific columns, provide a list of header indexes instead of -1. Note that columns specified by bin_indexes and bin_names will be combined.

-1
bin_names list of string, default

Specify which columns need to be calculated. Each element in the list represents a column name in the header. Note that columns specified by bin_indexes and bin_names will be combined.

None
adjustment_factor float, default

the adjustment factor when calculating WOE. This is useful when there is no event or non-event in a bin. Please note that this parameter will NOT take effect for setting in host.

0.5
category_indexes list of int or int, default

Specify which columns are category features. -1 represents all columns. A list of int indicates a set of such features. For category features, bin_obj will take their original values as split_points and treat them as already binned. If this is not what you expect, please do NOT include them in this parameter. The number of categories should not exceed the bin_num set above. Note that columns specified by category_indexes and category_names will be combined.

None
category_names list of string, default

Use column names to specify category features. Each element in the list represents a column name in the header. Note that columns specified by category_indexes and category_names will be combined.

None
local_only bool, default

Whether to provide the binning method to the guest party only. If true, the host party will do nothing. Warning: this parameter will be deprecated in a future version.

False
transform_param

Define how to transfer the binned data.

TransformParam()
need_run

Indicate whether this module needs to be run

True
skip_static

If true, binning will not calculate iv, woe etc. In this case, optimal-binning will not be supported.

False
Source code in python/federatedml/param/feature_binning_param.py
def __init__(self, method=consts.QUANTILE,
             compress_thres=consts.DEFAULT_COMPRESS_THRESHOLD,
             head_size=consts.DEFAULT_HEAD_SIZE,
             error=consts.DEFAULT_RELATIVE_ERROR,
             bin_num=consts.G_BIN_NUM, bin_indexes=-1, bin_names=None, adjustment_factor=0.5,
             transform_param=TransformParam(),
             local_only=False,
             category_indexes=None, category_names=None,
             need_run=True, skip_static=False):
    super(FeatureBinningParam, self).__init__()
    self.method = method
    self.compress_thres = compress_thres
    self.head_size = head_size
    self.error = error
    self.adjustment_factor = adjustment_factor
    self.bin_num = bin_num
    self.bin_indexes = bin_indexes
    self.bin_names = bin_names
    self.category_indexes = category_indexes
    self.category_names = category_names
    self.transform_param = copy.deepcopy(transform_param)
    self.need_run = need_run
    self.skip_static = skip_static
    self.local_only = local_only
Attributes
method = method instance-attribute
compress_thres = compress_thres instance-attribute
head_size = head_size instance-attribute
error = error instance-attribute
adjustment_factor = adjustment_factor instance-attribute
bin_num = bin_num instance-attribute
bin_indexes = bin_indexes instance-attribute
bin_names = bin_names instance-attribute
category_indexes = category_indexes instance-attribute
category_names = category_names instance-attribute
transform_param = copy.deepcopy(transform_param) instance-attribute
need_run = need_run instance-attribute
skip_static = skip_static instance-attribute
local_only = local_only instance-attribute
Functions
check()
Source code in python/federatedml/param/feature_binning_param.py
def check(self):
    descr = "Binning param's"
    self.check_string(self.method, descr)
    self.method = self.method.lower()
    self.check_positive_integer(self.compress_thres, descr)
    self.check_positive_integer(self.head_size, descr)
    self.check_decimal_float(self.error, descr)
    self.check_positive_integer(self.bin_num, descr)
    if self.bin_indexes != -1:
        self.check_defined_type(self.bin_indexes, descr, ['list', 'RepeatedScalarContainer', "NoneType"])
    self.check_defined_type(self.bin_names, descr, ['list', "NoneType"])
    self.check_defined_type(self.category_indexes, descr, ['list', "NoneType"])
    self.check_defined_type(self.category_names, descr, ['list', "NoneType"])
    self.check_open_unit_interval(self.adjustment_factor, descr)
    self.check_boolean(self.local_only, descr)
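For intuition on the error parameter: with error = 0.001 and N = 100000 samples, a split point for quantile p = 0.5 is guaranteed a rank between floor((0.5 - 0.002) * 100000) = 49800 and ceil((0.5 + 0.002) * 100000) = 50200. A configuration sketch (assuming a FATE environment; values are illustrative):

from federatedml.param.feature_binning_param import FeatureBinningParam

bin_param = FeatureBinningParam(
    method="quantile",
    bin_num=32,
    error=0.001,            # rank of each split point stays within 2 * error * N of the exact quantile
    bin_indexes=-1,         # bin all columns; specific indexes and names would be combined
    adjustment_factor=0.5,  # WOE adjustment when a bin has no event or non-event
)
bin_param.check()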
HeteroNNParam(task_type='classification', bottom_nn_define=None, top_nn_define=None, interactive_layer_define=None, interactive_layer_lr=0.9, config_type='pytorch', optimizer='SGD', loss=None, epochs=100, batch_size=-1, early_stop='diff', tol=1e-05, seed=100, encrypt_param=EncryptParam(), encrypted_mode_calculator_param=EncryptedModeCalculatorParam(), predict_param=PredictParam(), cv_param=CrossValidationParam(), validation_freqs=None, early_stopping_rounds=None, metrics=None, use_first_metric_only=True, selector_param=SelectorParam(), floating_point_precision=23, callback_param=CallbackParam(), coae_param=CoAEConfuserParam(), dataset=DatasetParam())

Bases: BaseParam

Parameters used for Hetero Neural Network.

Parameters:

Name Type Description Default
task_type 'classification'
bottom_nn_define None
interactive_layer_define None
interactive_layer_lr 0.9
top_nn_define None
optimizer
  1. a string, one of "Adadelta", "Adagrad", "Adam", "Adamax", "Nadam", "RMSprop", "SGD"
  2. a dict, with a required key-value pair keyed by "optimizer", with optional key-value pairs such as learning rate. defaults to "SGD".
'SGD'
loss None
epochs 100
batch_size int

batch size when updating model. -1 means use all data in a batch, i.e. do not use the mini-batch strategy. Defaults to -1.

-1
early_stop str

Method used to judge convergence; only 'diff' is accepted in this version: use the difference of loss between two iterations to judge whether training has converged.

'diff'
floating_point_precision

e.g. an x is converted to round(x * 2**floating_point_precision) during Paillier operations, and the result is divided by 2**floating_point_precision at the end.

23
callback_param CallbackParam()
Source code in python/federatedml/param/hetero_nn_param.py
def __init__(self,
             task_type='classification',
             bottom_nn_define=None,
             top_nn_define=None,
             interactive_layer_define=None,
             interactive_layer_lr=0.9,
             config_type='pytorch',
             optimizer='SGD',
             loss=None,
             epochs=100,
             batch_size=-1,
             early_stop="diff",
             tol=1e-5,
             seed=100,
             encrypt_param=EncryptParam(),
             encrypted_mode_calculator_param=EncryptedModeCalculatorParam(),
             predict_param=PredictParam(),
             cv_param=CrossValidationParam(),
             validation_freqs=None,
             early_stopping_rounds=None,
             metrics=None,
             use_first_metric_only=True,
             selector_param=SelectorParam(),
             floating_point_precision=23,
             callback_param=CallbackParam(),
             coae_param=CoAEConfuserParam(),
             dataset=DatasetParam()
             ):

    super(HeteroNNParam, self).__init__()

    self.task_type = task_type
    self.bottom_nn_define = bottom_nn_define
    self.interactive_layer_define = interactive_layer_define
    self.interactive_layer_lr = interactive_layer_lr
    self.top_nn_define = top_nn_define
    self.batch_size = batch_size
    self.epochs = epochs
    self.early_stop = early_stop
    self.tol = tol
    self.optimizer = optimizer
    self.loss = loss
    self.validation_freqs = validation_freqs
    self.early_stopping_rounds = early_stopping_rounds
    self.metrics = metrics or []
    self.use_first_metric_only = use_first_metric_only
    self.encrypt_param = copy.deepcopy(encrypt_param)
    self.encrypted_model_calculator_param = encrypted_mode_calculator_param
    self.predict_param = copy.deepcopy(predict_param)
    self.cv_param = copy.deepcopy(cv_param)
    self.selector_param = selector_param
    self.floating_point_precision = floating_point_precision
    self.callback_param = copy.deepcopy(callback_param)
    self.coae_param = coae_param
    self.dataset = dataset
    self.seed = seed
    self.config_type = 'pytorch'  # pytorch only
Attributes
task_type = task_type instance-attribute
bottom_nn_define = bottom_nn_define instance-attribute
interactive_layer_define = interactive_layer_define instance-attribute
interactive_layer_lr = interactive_layer_lr instance-attribute
top_nn_define = top_nn_define instance-attribute
batch_size = batch_size instance-attribute
epochs = epochs instance-attribute
early_stop = early_stop instance-attribute
tol = tol instance-attribute
optimizer = optimizer instance-attribute
loss = loss instance-attribute
validation_freqs = validation_freqs instance-attribute
early_stopping_rounds = early_stopping_rounds instance-attribute
metrics = metrics or [] instance-attribute
use_first_metric_only = use_first_metric_only instance-attribute
encrypt_param = copy.deepcopy(encrypt_param) instance-attribute
encrypted_model_calculator_param = encrypted_mode_calculator_param instance-attribute
predict_param = copy.deepcopy(predict_param) instance-attribute
cv_param = copy.deepcopy(cv_param) instance-attribute
selector_param = selector_param instance-attribute
floating_point_precision = floating_point_precision instance-attribute
callback_param = copy.deepcopy(callback_param) instance-attribute
coae_param = coae_param instance-attribute
dataset = dataset instance-attribute
seed = seed instance-attribute
config_type = 'pytorch' instance-attribute
Functions
check()
Source code in python/federatedml/param/hetero_nn_param.py
def check(self):

    assert isinstance(self.dataset, DatasetParam), 'dataset must be a DatasetParam()'

    self.dataset.check()

    self.check_positive_integer(self.seed, 'seed')

    if self.task_type not in ["classification", "regression"]:
        raise ValueError("task_type should be classification or regression")

    if not isinstance(self.tol, (int, float)):
        raise ValueError("tol should be numeric")

    if not isinstance(self.epochs, int) or self.epochs <= 0:
        raise ValueError("epochs should be a positive integer")

    if self.bottom_nn_define and not isinstance(self.bottom_nn_define, dict):
        raise ValueError("bottom_nn_define should be a dict defining the structure of neural network")

    if self.top_nn_define and not isinstance(self.top_nn_define, dict):
        raise ValueError("top_nn_define should be a dict defining the structure of neural network")

    if self.interactive_layer_define is not None and not isinstance(self.interactive_layer_define, dict):
        raise ValueError(
            "the interactive_layer_define should be a dict defining the structure of interactive layer")

    if self.batch_size != -1:
        if not isinstance(self.batch_size, int) \
                or self.batch_size < consts.MIN_BATCH_SIZE:
            raise ValueError(
                "batch_size {} not supported, should be larger than 10 or -1 to represent all data".format(self.batch_size))

    if self.early_stop != "diff":
        raise ValueError("early stop should be diff in this version")

    if self.metrics is not None and not isinstance(self.metrics, list):
        raise ValueError("metrics should be a list")

    if self.floating_point_precision is not None and \
            (not isinstance(self.floating_point_precision, int) or
             self.floating_point_precision < 0 or self.floating_point_precision > 63):
        raise ValueError("floating point precision should be None or an integer between 0 and 63")

    self.encrypt_param.check()
    self.encrypted_model_calculator_param.check()
    self.predict_param.check()
    self.selector_param.check()
    self.coae_param.check()

    descr = "hetero nn param's "

    for p in ["early_stopping_rounds", "validation_freqs",
              "use_first_metric_only"]:
        if self._deprecated_params_set.get(p):
            if "callback_param" in self.get_user_feeded():
                raise ValueError(f"{p} and callback param should not be set simultaneously,"
                                 f"{self._deprecated_params_set}, {self.get_user_feeded()}")
            else:
                self.callback_param.callbacks = ["PerformanceEvaluate"]
            break

    if self._warn_to_deprecate_param("validation_freqs", descr, "callback_param's 'validation_freqs'"):
        self.callback_param.validation_freqs = self.validation_freqs

    if self._warn_to_deprecate_param("early_stopping_rounds", descr, "callback_param's 'early_stopping_rounds'"):
        self.callback_param.early_stopping_rounds = self.early_stopping_rounds

    if self._warn_to_deprecate_param("metrics", descr, "callback_param's 'metrics'"):
        if self.metrics:
            self.callback_param.metrics = self.metrics

    if self._warn_to_deprecate_param("use_first_metric_only", descr, "callback_param's 'use_first_metric_only'"):
        self.callback_param.use_first_metric_only = self.use_first_metric_only
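For the floating_point_precision encoding: with the default 23, a value x = 0.1 enters the Paillier computation as round(0.1 * 2**23) = 838861 and the result is divided by 2**23 afterwards. A construction sketch (assuming a FATE environment; network definitions are omitted for brevity):

from federatedml.param.hetero_nn_param import HeteroNNParam

hetero_nn_param = HeteroNNParam(
    task_type="classification",
    epochs=50,
    batch_size=-1,                # use all data per batch
    early_stop="diff",            # only 'diff' is accepted in this version
    tol=1e-5,
    floating_point_precision=23,  # must be None or an integer in [0, 63]
)
hetero_nn_param.check()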
BoostingParam(task_type=consts.CLASSIFICATION, objective_param=ObjectiveParam(), learning_rate=0.3, num_trees=5, subsample_feature_rate=1, n_iter_no_change=True, tol=0.0001, bin_num=32, predict_param=PredictParam(), cv_param=CrossValidationParam(), validation_freqs=None, metrics=None, random_seed=100, binning_error=consts.DEFAULT_RELATIVE_ERROR)

Bases: BaseParam

Basic parameter for Boosting Algorithms

Parameters:

Name Type Description Default
task_type

task type

'classification'
objective_param ObjectiveParam Object, default

objective param

ObjectiveParam()
learning_rate float, int or long

the learning rate of secure boost. default: 0.3

0.3
num_trees int or float

the max number of boosting rounds. default: 5

5
subsample_feature_rate float

a float-number in [0, 1], default: 1.0

1
n_iter_no_change bool

when True, the tree building process stops once the residual error is less than tol. default: True

True
bin_num

bin number used in quantile binning. default: 32

32
validation_freqs

Whether to do validation during training. If None, no validation is done during training; if a positive integer, data is validated every validation_freqs epochs; if a container object, data is validated whenever the current epoch is in the container, e.g. validation_freqs = [10, 15] validates at epochs 10 and 15. Default: None

None
Source code in python/federatedml/param/boosting_param.py
def __init__(self, task_type=consts.CLASSIFICATION,
             objective_param=ObjectiveParam(),
             learning_rate=0.3, num_trees=5, subsample_feature_rate=1, n_iter_no_change=True,
             tol=0.0001, bin_num=32,
             predict_param=PredictParam(), cv_param=CrossValidationParam(),
             validation_freqs=None, metrics=None, random_seed=100,
             binning_error=consts.DEFAULT_RELATIVE_ERROR):

    super(BoostingParam, self).__init__()

    self.task_type = task_type
    self.objective_param = copy.deepcopy(objective_param)
    self.learning_rate = learning_rate
    self.num_trees = num_trees
    self.subsample_feature_rate = subsample_feature_rate
    self.n_iter_no_change = n_iter_no_change
    self.tol = tol
    self.bin_num = bin_num
    self.predict_param = copy.deepcopy(predict_param)
    self.cv_param = copy.deepcopy(cv_param)
    self.validation_freqs = validation_freqs
    self.metrics = metrics
    self.random_seed = random_seed
    self.binning_error = binning_error
Attributes
task_type = task_type instance-attribute
objective_param = copy.deepcopy(objective_param) instance-attribute
learning_rate = learning_rate instance-attribute
num_trees = num_trees instance-attribute
subsample_feature_rate = subsample_feature_rate instance-attribute
n_iter_no_change = n_iter_no_change instance-attribute
tol = tol instance-attribute
bin_num = bin_num instance-attribute
predict_param = copy.deepcopy(predict_param) instance-attribute
cv_param = copy.deepcopy(cv_param) instance-attribute
validation_freqs = validation_freqs instance-attribute
metrics = metrics instance-attribute
random_seed = random_seed instance-attribute
binning_error = binning_error instance-attribute
Functions
check()
Source code in python/federatedml/param/boosting_param.py
def check(self):

    descr = "boosting tree param's"

    if self.task_type not in [consts.CLASSIFICATION, consts.REGRESSION]:
        raise ValueError("boosting_core tree param's task_type {} not supported, should be {} or {}".format(
            self.task_type, consts.CLASSIFICATION, consts.REGRESSION))

    self.objective_param.check(self.task_type)

    if type(self.learning_rate).__name__ not in ["float", "int", "long"]:
        raise ValueError("boosting_core tree param's learning_rate {} not supported, should be numeric".format(
            self.learning_rate))

    if type(self.subsample_feature_rate).__name__ not in ["float", "int", "long"] or \
            self.subsample_feature_rate < 0 or self.subsample_feature_rate > 1:
        raise ValueError(
            "boosting_core tree param's subsample_feature_rate should be a numeric number between 0 and 1")

    if type(self.n_iter_no_change).__name__ != "bool":
        raise ValueError("boosting_core tree param's n_iter_no_change {} not supported, should be bool type".format(
            self.n_iter_no_change))

    if type(self.tol).__name__ not in ["float", "int", "long"]:
        raise ValueError("boosting_core tree param's tol {} not supported, should be numeric".format(self.tol))

    if type(self.bin_num).__name__ not in ["int", "long"] or self.bin_num < 2:
        raise ValueError(
            "boosting_core tree param's bin_num {} not supported, should be positive integer greater than 1".format(
                self.bin_num))

    if self.validation_freqs is None:
        pass
    elif isinstance(self.validation_freqs, int):
        if self.validation_freqs < 1:
            raise ValueError("validation_freqs should be larger than 0 when it's integer")
    elif not isinstance(self.validation_freqs, collections.abc.Container):
        raise ValueError("validation_freqs should be None or positive integer or container")

    if self.metrics is not None and not isinstance(self.metrics, list):
        raise ValueError("metrics should be a list")

    if self.random_seed is not None:
        assert isinstance(self.random_seed, int) and self.random_seed >= 0, 'random seed must be an integer >= 0'

    self.check_decimal_float(self.binning_error, descr)

    return True
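A sketch tying BoostingParam to the ObjectiveParam shown earlier (assuming a FATE environment; values are illustrative):

from federatedml.param.boosting_param import BoostingParam, ObjectiveParam
from federatedml.util import consts

boost_param = BoostingParam(
    task_type=consts.REGRESSION,
    objective_param=ObjectiveParam(objective="huber", params=[1.35]),
    learning_rate=0.3,
    num_trees=50,
    subsample_feature_rate=0.8,  # must lie in [0, 1]
    bin_num=32,                  # must be an integer greater than 1
)
boost_param.check()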
IntersectParam(intersect_method=consts.RSA, random_bit=DEFAULT_RANDOM_BIT, sync_intersect_ids=True, join_role=consts.GUEST, only_output_key=False, with_encode=False, encode_params=EncodeParam(), raw_params=RAWParam(), rsa_params=RSAParam(), dh_params=DHParam(), ecdh_params=ECDHParam(), join_method=consts.INNER_JOIN, new_sample_id=False, sample_id_generator=consts.GUEST, intersect_cache_param=IntersectCache(), run_cache=False, cardinality_only=False, sync_cardinality=False, cardinality_method=consts.ECDH, run_preprocess=False, intersect_preprocess_params=IntersectPreProcessParam(), repeated_id_process=False, repeated_id_owner=consts.GUEST, with_sample_id=False, allow_info_share=False, info_owner=consts.GUEST)

Bases: BaseParam

Define the intersect method

Parameters:

Name Type Description Default
intersect_method str

it supports 'rsa', 'raw', 'dh', 'ecdh'; default 'rsa'

consts.RSA
random_bit

it defines the size of the blinding factor in the rsa algorithm, default 128; note that this param will be deprecated in a future version, please use random_bit in RSAParam instead

DEFAULT_RANDOM_BIT
sync_intersect_ids

For rsa, True means guest or host will send intersect results to the other party, and False means they will not; for raw, True means the role specified by "join_role" will send intersect results and the others will receive them. Default: True.

True
join_role

the role who joins ids; supports "guest" and "host" only and is effective only for raw. If "guest", the host will send its ids to guest and the intersection of ids is found on guest; if "host", the guest will send its ids to host. Default: "guest". Note this param will be deprecated in a future version, please use 'join_role' in raw_params instead.

consts.GUEST
only_output_key bool

if false, the results of intersection will include both keys and values from the input data; if true, they will include only keys from the input data, with values empty or filled with a uniform string such as "intersect_id"

False
with_encode

if True, a hash method is used for intersect ids; effective for the raw method only. Note that this param will be deprecated in a future version, please use 'use_hash' in raw_params. Currently, if this param is set to True, the specification in 'encode_params' is taken instead of 'raw_params'.

False
encode_params

effective only when with_encode is True; this param will be deprecated in a future version, use 'raw_params' instead

EncodeParam()
raw_params

this param is deprecated

RAWParam()
rsa_params

effective for rsa method only, this param is deprecated

RSAParam()
dh_params

effective for dh method only

DHParam()
ecdh_params

effective for ecdh method only

ECDHParam()
join_method

if 'left_join', participants will all include sample_id_generator's (imputed) ids in output, default 'inner_join'

consts.INNER_JOIN
new_sample_id bool

whether to generate new id for sample_id_generator's ids, only effective when join_method is 'left_join' or when input data are instance with match id, default False

False
sample_id_generator

role whose ids are to be kept, effective only when join_method is 'left_join' or when input data are instance with match id, default 'guest'

consts.GUEST
intersect_cache_param

specification for cache generation; with ver 1.7 and above, this param is ignored.

IntersectCache()
run_cache bool

whether to store Host's encrypted ids, only valid when intersect method is 'rsa', 'dh', 'ecdh', default False

False
cardinality_only bool

whether to output estimated intersection count (cardinality); if sync_cardinality is True, the cardinality count is synced with host(s)

False
cardinality_method

specify which intersect method to use for counting cardinality, default "ecdh"; note that "rsa" produces an estimated cardinality, while the "dh" and "ecdh" methods output exact cardinality; they only support single-host tasks

consts.ECDH
sync_cardinality bool

whether to sync cardinality with all participants, default False, only effective when cardinality_only set to True

False
run_preprocess bool

whether to run the preprocess procedure, default False

False
intersect_preprocess_params

used for preprocessing and cardinality_only mode

IntersectPreProcessParam()
repeated_id_process

if true, intersection will process ids which can be repeated; in ver 1.7 and above, repeated id process is automatically applied to data with instance id and this param is ignored

False
repeated_id_owner

which role has the repeated id; in ver 1.7 and above, this param is ignored

consts.GUEST
allow_info_share bool

in ver 1.7 and above, this param is ignored

False
info_owner

in ver 1.7 and above, this param is ignored

consts.GUEST
with_sample_id

data with sample id or not, default False; in ver 1.7 and above, this param is ignored

False
Source code in python/federatedml/param/intersect_param.py
def __init__(self, intersect_method: str = consts.RSA, random_bit=DEFAULT_RANDOM_BIT, sync_intersect_ids=True,
             join_role=consts.GUEST, only_output_key: bool = False,
             with_encode=False, encode_params=EncodeParam(),
             raw_params=RAWParam(), rsa_params=RSAParam(), dh_params=DHParam(), ecdh_params=ECDHParam(),
             join_method=consts.INNER_JOIN, new_sample_id: bool = False, sample_id_generator=consts.GUEST,
             intersect_cache_param=IntersectCache(), run_cache: bool = False,
             cardinality_only: bool = False, sync_cardinality: bool = False, cardinality_method=consts.ECDH,
             run_preprocess: bool = False,
             intersect_preprocess_params=IntersectPreProcessParam(),
             repeated_id_process=False, repeated_id_owner=consts.GUEST,
             with_sample_id=False, allow_info_share: bool = False, info_owner=consts.GUEST):
    super().__init__()
    self.intersect_method = intersect_method
    self.random_bit = random_bit
    self.sync_intersect_ids = sync_intersect_ids
    self.join_role = join_role
    self.with_encode = with_encode
    self.encode_params = copy.deepcopy(encode_params)
    self.raw_params = copy.deepcopy(raw_params)
    self.rsa_params = copy.deepcopy(rsa_params)
    self.only_output_key = only_output_key
    self.sample_id_generator = sample_id_generator
    self.intersect_cache_param = copy.deepcopy(intersect_cache_param)
    self.run_cache = run_cache
    self.repeated_id_process = repeated_id_process
    self.repeated_id_owner = repeated_id_owner
    self.allow_info_share = allow_info_share
    self.info_owner = info_owner
    self.with_sample_id = with_sample_id
    self.join_method = join_method
    self.new_sample_id = new_sample_id
    self.dh_params = copy.deepcopy(dh_params)
    self.cardinality_only = cardinality_only
    self.sync_cardinality = sync_cardinality
    self.cardinality_method = cardinality_method
    self.run_preprocess = run_preprocess
    self.intersect_preprocess_params = copy.deepcopy(intersect_preprocess_params)
    self.ecdh_params = copy.deepcopy(ecdh_params)
Attributes
intersect_method = intersect_method instance-attribute
random_bit = random_bit instance-attribute
sync_intersect_ids = sync_intersect_ids instance-attribute
join_role = join_role instance-attribute
with_encode = with_encode instance-attribute
encode_params = copy.deepcopy(encode_params) instance-attribute
raw_params = copy.deepcopy(raw_params) instance-attribute
rsa_params = copy.deepcopy(rsa_params) instance-attribute
only_output_key = only_output_key instance-attribute
sample_id_generator = sample_id_generator instance-attribute
intersect_cache_param = copy.deepcopy(intersect_cache_param) instance-attribute
run_cache = run_cache instance-attribute
repeated_id_process = repeated_id_process instance-attribute
repeated_id_owner = repeated_id_owner instance-attribute
allow_info_share = allow_info_share instance-attribute
info_owner = info_owner instance-attribute
with_sample_id = with_sample_id instance-attribute
join_method = join_method instance-attribute
new_sample_id = new_sample_id instance-attribute
dh_params = copy.deepcopy(dh_params) instance-attribute
cardinality_only = cardinality_only instance-attribute
sync_cardinality = sync_cardinality instance-attribute
cardinality_method = cardinality_method instance-attribute
run_preprocess = run_preprocess instance-attribute
intersect_preprocess_params = copy.deepcopy(intersect_preprocess_params) instance-attribute
ecdh_params = copy.deepcopy(ecdh_params) instance-attribute
Functions
check()
Source code in python/federatedml/param/intersect_param.py
def check(self):
    descr = "intersect param's "

    if self.intersect_method.lower() == consts.RAW.lower():
        self.intersect_method = consts.ECDH
        LOGGER.warning("Raw intersect method is not supported, it will be replaced by ECDH")

    self.intersect_method = self.check_and_change_lower(self.intersect_method,
                                                        [consts.RSA, consts.RAW, consts.DH, consts.ECDH],
                                                        f"{descr}intersect_method")

    if self._warn_to_deprecate_param("random_bit", descr, "rsa_params' 'random_bit'"):
        if "rsa_params.random_bit" in self.get_user_feeded():
            raise ValueError(f"random_bit and rsa_params.random_bit should not be set simultaneously")
        self.rsa_params.random_bit = self.random_bit

    self.check_boolean(self.sync_intersect_ids, f"{descr}intersect_ids")

    if self._warn_to_deprecate_param("encode_param", "", ""):
        if "raw_params" in self.get_user_feeded():
            raise ValueError(f"encode_param and raw_params should not be set simultaneously")

    if self._warn_to_deprecate_param("join_role", descr, "raw_params' 'join_role'"):
        if "raw_params.join_role" in self.get_user_feeded():
            raise ValueError(f"join_role and raw_params.join_role should not be set simultaneously")
        self.raw_params.join_role = self.join_role

    self.check_boolean(self.only_output_key, f"{descr}only_output_key")

    self.join_method = self.check_and_change_lower(self.join_method, [consts.INNER_JOIN, consts.LEFT_JOIN],
                                                   f"{descr}join_method")
    self.check_boolean(self.new_sample_id, f"{descr}new_sample_id")
    self.sample_id_generator = self.check_and_change_lower(self.sample_id_generator,
                                                           [consts.GUEST, consts.HOST],
                                                           f"{descr}sample_id_generator")

    if self.join_method == consts.LEFT_JOIN:
        if not self.sync_intersect_ids:
            raise ValueError(f"Cannot perform left join without sync intersect ids")

    self.check_boolean(self.run_cache, f"{descr} run_cache")

    if self._warn_to_deprecate_param("encode_params", descr, "raw_params") or \
            self._warn_to_deprecate_param("with_encode", descr, "raw_params' 'use_hash'"):
        # self.encode_params.check()
        if "with_encode" in self.get_user_feeded() and "raw_params.use_hash" in self.get_user_feeded():
            raise ValueError(f"'raw_params' and 'encode_params' should not be set simultaneously.")
        if "raw_params" in self.get_user_feeded() and "encode_params" in self.get_user_feeded():
            raise ValueError(f"'raw_params' and 'encode_params' should not be set simultaneously.")
        LOGGER.warning(f"Param values from 'encode_params' will override 'raw_params' settings.")
        self.raw_params.use_hash = self.with_encode
        self.raw_params.hash_method = self.encode_params.encode_method
        self.raw_params.salt = self.encode_params.salt
        self.raw_params.base64 = self.encode_params.base64

    self.raw_params.check()
    self.rsa_params.check()
    self.dh_params.check()
    self.ecdh_params.check()
    self.check_boolean(self.cardinality_only, f"{descr}cardinality_only")
    self.check_boolean(self.sync_cardinality, f"{descr}sync_cardinality")
    self.check_boolean(self.run_preprocess, f"{descr}run_preprocess")
    self.intersect_preprocess_params.check()
    if self.cardinality_only:
        if self.cardinality_method not in [consts.RSA, consts.DH, consts.ECDH]:
            raise ValueError(f"cardinality-only mode only support rsa, dh, ecdh.")
        if self.cardinality_method == consts.RSA and self.rsa_params.split_calculation:
            raise ValueError(f"cardinality-only mode only supports unified calculation.")
    if self.run_preprocess:
        if self.intersect_preprocess_params.false_positive_rate < 0.01:
            raise ValueError(f"for preprocessing ids, false_positive_rate must be no less than 0.01")
        if self.cardinality_only:
            raise ValueError(f"cardinality_only mode cannot run preprocessing.")
    if self.run_cache:
        if self.intersect_method not in [consts.RSA, consts.DH, consts.ECDH]:
            raise ValueError(f"Only rsa, dh, or ecdh method supports cache.")
        if self.intersect_method == consts.RSA and self.rsa_params.split_calculation:
            raise ValueError(f"RSA split_calculation does not support cache.")
        if self.cardinality_only:
            raise ValueError(f"cache is not available for cardinality_only mode.")
        if self.run_preprocess:
            raise ValueError(f"Preprocessing does not support cache.")

    deprecated_param_list = ["repeated_id_process", "repeated_id_owner", "intersect_cache_param",
                             "allow_info_share", "info_owner", "with_sample_id"]
    for param in deprecated_param_list:
        self._warn_deprecated_param(param, descr)

    LOGGER.debug("Finish intersect parameter check!")
    return True
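
The check() above both normalizes values (lower-casing `intersect_method` and `join_method`) and enforces compatibility rules: caching requires rsa/dh/ecdh, RSA split calculation excludes caching, and cardinality-only mode excludes both preprocessing and caching. A minimal sketch of how these rules surface to a caller, assuming `federatedml` is importable and that `check()` can be driven directly outside a full FATE job; parameter values are illustrative:

```python
# Minimal sketch (assumes a FATE install exposing federatedml and default
# sub-parameters, e.g. rsa_params with unified calculation); values illustrative.
from federatedml.param.intersect_param import IntersectParam

param = IntersectParam(
    intersect_method="rsa",   # rsa, dh or ecdh ("raw" is deprecated)
    sync_intersect_ids=True,  # share the intersect ids with the other party
    run_cache=True,           # allowed: rsa/dh/ecdh support caching
)
param.check()                 # normalizes casing; raises ValueError on conflicts

# Cardinality-only mode and caching are mutually exclusive:
bad = IntersectParam(intersect_method="rsa", cardinality_method="rsa",
                     cardinality_only=True, run_cache=True)
try:
    bad.check()
except ValueError as err:
    print(err)                # check() rejects this combination
```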
FeatureSelectionParam(select_col_indexes=-1, select_names=None, filter_methods=None, unique_param=UniqueValueParam(), iv_value_param=IVValueSelectionParam(), iv_percentile_param=IVPercentileSelectionParam(), iv_top_k_param=IVTopKParam(), variance_coe_param=VarianceOfCoeSelectionParam(), outlier_param=OutlierColsSelectionParam(), manually_param=ManuallyFilterParam(), percentage_value_param=PercentageValueParam(), iv_param=IVFilterParam(), statistic_param=CommonFilterParam(metrics=consts.MEAN), psi_param=CommonFilterParam(metrics=consts.PSI, take_high=False), vif_param=CommonFilterParam(metrics=consts.VIF, threshold=5.0, take_high=False), sbt_param=CommonFilterParam(metrics=consts.FEATURE_IMPORTANCE), correlation_param=CorrelationFilterParam(), use_anonymous=False, need_run=True)

Bases: BaseParam

Define the feature selection parameters.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| select_col_indexes | list or int | Specify which columns need to be calculated; -1 represents all columns. Note that columns specified by select_col_indexes and select_names will be combined. | -1 |
| select_names | list of string | Specify which columns need to be calculated; each element in the list represents a column name in the header. Note that columns specified by select_col_indexes and select_names will be combined. | None |
| filter_methods | list | Specify the filter methods used in feature selection; accepted values include "manually", "iv_filter", "statistic_filter", "psi_filter", "hetero_sbt_filter", "homo_sbt_filter", "hetero_fast_sbt_filter", "percentage_value", "vif_filter" and "correlation_filter" (default: ["manually"]). The following methods will be deprecated in a future version: "unique_value", "iv_value_thres", "iv_percentile", "coefficient_of_variation_value_thres", "outlier_cols". Filters are applied in the order given in this list. Note that if a percentile method is used after another filter, the percentile refers to the ratio of the remaining features: starting from 10 features, if 8 remain after the first filter and you then request the top 80% highest-iv features, floor(0.8 * 8) = 6 features are chosen, not 8. | None |
| unique_param | | Filter a column if all values of the feature are the same. | UniqueValueParam() |
| iv_value_param | | Use information value to filter columns. If this method is set, a float threshold must be provided; columns whose iv is smaller than the threshold are filtered out. Will be deprecated in the future. | IVValueSelectionParam() |
| iv_percentile_param | | Use information value to filter columns. If this method is set, a float ratio threshold must be provided; the floor(ratio * feature_num) features with the highest iv are kept. If several features around the threshold share the same iv, all of them are kept. Will be deprecated in the future. | IVPercentileSelectionParam() |
| variance_coe_param | | Use the coefficient of variation to decide whether a column is filtered. Will be deprecated in the future. | VarianceOfCoeSelectionParam() |
| outlier_param | | Filter columns whose value at a given percentile is larger than a threshold. Will be deprecated in the future. | OutlierColsSelectionParam() |
| percentage_value_param | | Filter columns in which a single value accounts for more than a given percentage of rows. | PercentageValueParam() |
| iv_param | | Set how to filter based on iv. Only take-high mode is supported; all of "threshold", "top_k" and "top_percentile" are accepted. See CommonFilterParam for details. To use this filter, the hetero-feature-binning module has to be provided. | IVFilterParam() |
| statistic_param | | Set how to filter based on statistic values. All of "threshold", "top_k" and "top_percentile" are accepted. See CommonFilterParam for details. To use this filter, the data_statistic module has to be provided. | CommonFilterParam(metrics=consts.MEAN) |
| psi_param | | Set how to filter based on psi values. All of "threshold", "top_k" and "top_percentile" are accepted; its take_high property should be False so that lower-psi features are kept. See CommonFilterParam for details. To use this filter, the data_statistic module has to be provided. | CommonFilterParam(metrics=consts.PSI, take_high=False) |
| use_anonymous | bool | Whether to interpret 'select_names' as anonymous names. | False |
| need_run | bool | Indicate whether this module needs to be run. | True |
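
Because filters run in the order given by `filter_methods`, and percentile-style thresholds are computed over the features that survive earlier filters, the list order is part of the configuration. A minimal sketch of building and validating the parameter object, assuming `federatedml` is importable; the chosen methods are illustrative:

```python
# Minimal sketch (assumes federatedml is importable); method choice illustrative.
from federatedml.param.feature_selection_param import FeatureSelectionParam

param = FeatureSelectionParam(
    select_col_indexes=-1,                     # consider every column
    filter_methods=["manually", "iv_filter"],  # filters applied in this order
)
param.check()  # validates method names and each sub-parameter object
```

Note that check() only validates the parameter object itself; when the component actually runs, iv_filter additionally requires a hetero-feature-binning model as model input.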
Source code in python/federatedml/param/feature_selection_param.py
def __init__(self, select_col_indexes=-1, select_names=None, filter_methods=None,
             unique_param=UniqueValueParam(),
             iv_value_param=IVValueSelectionParam(),
             iv_percentile_param=IVPercentileSelectionParam(),
             iv_top_k_param=IVTopKParam(),
             variance_coe_param=VarianceOfCoeSelectionParam(),
             outlier_param=OutlierColsSelectionParam(),
             manually_param=ManuallyFilterParam(),
             percentage_value_param=PercentageValueParam(),
             iv_param=IVFilterParam(),
             statistic_param=CommonFilterParam(metrics=consts.MEAN),
             psi_param=CommonFilterParam(metrics=consts.PSI,
                                         take_high=False),
             vif_param=CommonFilterParam(metrics=consts.VIF,
                                         threshold=5.0,
                                         take_high=False),
             sbt_param=CommonFilterParam(metrics=consts.FEATURE_IMPORTANCE),
             correlation_param=CorrelationFilterParam(),
             use_anonymous=False,
             need_run=True
             ):
    super(FeatureSelectionParam, self).__init__()
    self.correlation_param = correlation_param
    self.vif_param = vif_param
    self.select_col_indexes = select_col_indexes
    if select_names is None:
        self.select_names = []
    else:
        self.select_names = select_names
    if filter_methods is None:
        self.filter_methods = [consts.MANUALLY_FILTER]
    else:
        self.filter_methods = filter_methods

    # deprecate in the future
    self.unique_param = copy.deepcopy(unique_param)
    self.iv_value_param = copy.deepcopy(iv_value_param)
    self.iv_percentile_param = copy.deepcopy(iv_percentile_param)
    self.iv_top_k_param = copy.deepcopy(iv_top_k_param)
    self.variance_coe_param = copy.deepcopy(variance_coe_param)
    self.outlier_param = copy.deepcopy(outlier_param)
    self.percentage_value_param = copy.deepcopy(percentage_value_param)

    self.manually_param = copy.deepcopy(manually_param)
    self.iv_param = copy.deepcopy(iv_param)
    self.statistic_param = copy.deepcopy(statistic_param)
    self.psi_param = copy.deepcopy(psi_param)
    self.sbt_param = copy.deepcopy(sbt_param)
    self.need_run = need_run
    self.use_anonymous = use_anonymous
Attributes
correlation_param = correlation_param instance-attribute
vif_param = vif_param instance-attribute
select_col_indexes = select_col_indexes instance-attribute
select_names = [] instance-attribute
filter_methods = [consts.MANUALLY_FILTER] instance-attribute
unique_param = copy.deepcopy(unique_param) instance-attribute
iv_value_param = copy.deepcopy(iv_value_param) instance-attribute
iv_percentile_param = copy.deepcopy(iv_percentile_param) instance-attribute
iv_top_k_param = copy.deepcopy(iv_top_k_param) instance-attribute
variance_coe_param = copy.deepcopy(variance_coe_param) instance-attribute
outlier_param = copy.deepcopy(outlier_param) instance-attribute
percentage_value_param = copy.deepcopy(percentage_value_param) instance-attribute
manually_param = copy.deepcopy(manually_param) instance-attribute
iv_param = copy.deepcopy(iv_param) instance-attribute
statistic_param = copy.deepcopy(statistic_param) instance-attribute
psi_param = copy.deepcopy(psi_param) instance-attribute
sbt_param = copy.deepcopy(sbt_param) instance-attribute
need_run = need_run instance-attribute
use_anonymous = use_anonymous instance-attribute
Functions
check()
Source code in python/federatedml/param/feature_selection_param.py
def check(self):
    descr = "hetero feature selection param's"

    self.check_defined_type(self.filter_methods, descr, ['list'])

    for idx, method in enumerate(self.filter_methods):
        method = method.lower()
        self.check_valid_value(method, descr, [consts.UNIQUE_VALUE, consts.IV_VALUE_THRES, consts.IV_PERCENTILE,
                                               consts.COEFFICIENT_OF_VARIATION_VALUE_THRES, consts.OUTLIER_COLS,
                                               consts.MANUALLY_FILTER, consts.PERCENTAGE_VALUE,
                                               consts.IV_FILTER, consts.STATISTIC_FILTER, consts.IV_TOP_K,
                                               consts.PSI_FILTER, consts.HETERO_SBT_FILTER,
                                               consts.HOMO_SBT_FILTER, consts.HETERO_FAST_SBT_FILTER,
                                               consts.VIF_FILTER, consts.CORRELATION_FILTER])

        self.filter_methods[idx] = method

    self.check_defined_type(self.select_col_indexes, descr, ['list', 'int'])

    self.unique_param.check()
    self.iv_value_param.check()
    self.iv_percentile_param.check()
    self.iv_top_k_param.check()
    self.variance_coe_param.check()
    self.outlier_param.check()
    self.manually_param.check()
    self.percentage_value_param.check()

    self.iv_param.check()
    for th in self.iv_param.take_high:
        if not th:
            raise ValueError("Iv filter should take higher iv features")
    for m in self.iv_param.metrics:
        if m != consts.IV:
            raise ValueError("For iv filter, metrics should be 'iv'")

    self.statistic_param.check()
    self.psi_param.check()
    for th in self.psi_param.take_high:
        if th:
            raise ValueError("PSI filter should take lower psi features")
    for m in self.psi_param.metrics:
        if m != consts.PSI:
            raise ValueError("For psi filter, metrics should be 'psi'")

    self.sbt_param.check()
    for th in self.sbt_param.take_high:
        if not th:
            raise ValueError("SBT filter should take higher feature_importance features")
    for m in self.sbt_param.metrics:
        if m != consts.FEATURE_IMPORTANCE:
            raise ValueError("For SBT filter, metrics should be 'feature_importance'")

    self.vif_param.check()
    for m in self.vif_param.metrics:
        if m != consts.VIF:
            raise ValueError("For VIF filter, metrics should be 'vif'")

    self.correlation_param.check()
    self.check_boolean(self.use_anonymous, f"{descr} use_anonymous")

    self._warn_to_deprecate_param("iv_value_param", descr, "iv_param")
    self._warn_to_deprecate_param("iv_percentile_param", descr, "iv_param")
    self._warn_to_deprecate_param("iv_top_k_param", descr, "iv_param")
    self._warn_to_deprecate_param("variance_coe_param", descr, "statistic_param")
    self._warn_to_deprecate_param("unique_param", descr, "statistic_param")
    self._warn_to_deprecate_param("outlier_param", descr, "statistic_param")

Last updated: 2023-05-31