Federated Machine Learning¶

[中文]

FederatedML includes implementation of many common machine learning algorithms on federated learning. All modules are developed in a decoupling modular approach to enhance scalability. Specifically, we provide:

Federated Statistic: PSI, Union, Pearson Correlation, etc.
Federated Feature Engineering: Feature Sampling, Feature Binning, Feature Selection, etc.
Federated Machine Learning Algorithms: LR, GBDT, DNN, TransferLearning, which support Heterogeneous and Homogeneous styles.
Model Evaluation: Binary | Multiclass | Regression | Clustering Evaluation, Local vs Federated Comparison.
Secure Protocol: Provides multiple security protocols for secure multi-party computing and interaction between participants.

Algorithm List¶

Algorithm¶
Algorithm	Module Name	Description	Data Input	Data Output	Model Input	Model Output
Reader	Reader	This component loads and transforms data from storage engine so that data is compatible with FATE computing engine	Original Data	Transformed Data
DataIO	DataIO	This component transforms user-uploaded date into Instance object(deprecate in FATe-v1.7, use DataTransform instead).	Table, values are raw data.	Transformed Table, values are data instance defined here		DataIO Model
DataTransform	DataTransform	This component transforms user-uploaded date into Instance object.	Table, values are raw data.	Transformed Table, values are data instance defined here		DataTransform Model
Intersect	Intersection	Compute intersect data set of multiple parties without leakage of difference set information. Mainly used in hetero scenario task.	Table.	Table with only common instance keys.		Intersect Model
Federated Sampling	FederatedSample	Federated Sampling data so that its distribution become balance in each party.This module supports standalone and federated versions.	Table	Table of sampled data; both random and stratified sampling methods are supported.
Feature Scale	FeatureScale	module for feature scaling and standardization.	Table，values are instances.	Transformed Table.	Transform factors like min/max, mean/std.
Hetero Feature Binning	Hetero Feature Binning	With binning input data, calculates each column’s iv and woe and transform data according to the binned information.	Table, values are instances.	Transformed Table.		iv/woe, split points, event count, non-event count etc. of each column.
Homo Feature Binning	Homo Feature Binning	Calculate quantile binning through multiple parties	Table	Transformed Table		Split points of each column
OneHot Encoder	OneHotEncoder	Transfer a column into one-hot format.	Table, values are instances.	Transformed Table with new header.		Feature-name mapping between original header and new header.
Hetero Feature Selection	HeteroFeatureSelection	Provide 5 types of filters. Each filters can select columns according to user config	Table	Transformed Table with new header and filtered data instance.	If iv filters used, hetero_binning model is needed.	Whether each column is filtered.
Union	Union	Combine multiple data tables into one.	Tables.	Table with combined values from input Tables.
Hetero-LR	HeteroLR	Build hetero logistic regression model through multiple parties.	Table, values are instances	Table, values are instances.		Logistic Regression Model, consists of model-meta and model-param.
Local Baseline	LocalBaseline	Wrapper that runs sklearn(scikit-learn) Logistic Regression model with local data.	Table, values are instances.	Table, values are instances.
Hetero-LinR	HeteroLinR	Build hetero linear regression model through multiple parties.	Table, values are instances.	Table, values are instances.		Linear Regression Model, consists of model-meta and model-param.
Hetero-Poisson	HeteroPoisson	Build hetero poisson regression model through multiple parties.	Table, values are instances.	Table, values are instances.		Poisson Regression Model, consists of model-meta and model-param.
Homo-LR	HomoLR	Build homo logistic regression model through multiple parties.	Table, values are instances.	Table, values are instances.		Logistic Regression Model, consists of model-meta and model-param.
Homo-NN	HomoNN	Build homo neural network model through multiple parties.	Table, values are instances.	Table, values are instances.		Neural Network Model, consists of model-meta and model-param.
Hetero Secure Boosting	HeteroSecureBoost	Build hetero secure boosting model through multiple parties	Table, values are instances.	Table, values are instances.		SecureBoost Model, consists of model-meta and model-param.
Hetero Fast Secure Boosting	HeteroFastSecureBoost	Build hetero secure boosting model through multiple parties in layered/mix manners.	Table, values are instances.	Table, values are instances.		FastSecureBoost Model, consists of model-meta and model-param.
Hetero Secure Boost Feature Transformer	SBT Feature Transformer	This component can encode sample using Hetero SBT leaf indices.	Table, values are instances.	Table, values are instances.		SBT Transformer Model
Evaluation	Evaluation	Output the model evaluation metrics for user.	Table(s), values are instances.
Hetero Pearson	HeteroPearson	Calculate hetero correlation of features from different parties.	Table, values are instances.
Hetero-NN	HeteroNN	Build hetero neural network model.	Table, values are instances.	Table, values are instances.		Hetero Neural Network Model, consists of model-meta and model-param.
Homo Secure Boosting	HomoSecureBoost	Build homo secure boosting model through multiple parties	Table, values are instances.	Table, values are instances.		SecureBoost Model, consists of model-meta and model-param.
Homo OneHot Encoder	HomoOneHotEncoder	Build homo onehot encoder model through multiple parties.	Table, values are instances.	Transformed Table with new header.		Feature-name mapping between original header and new header.
Data Split	Data Split	Split one data table into 3 tables by given ratio or count	Table, values are instances.	3 Tables, values are instance.
Column Expand	Column Expand	Add arbitrary number of columns with user-provided values.	Table, values are raw data.	Transformed Table with added column(s) and new header.		Column Expand Model
Secure Information Retrieval	Secure Information Retrieval	Securely retrieves information from host through oblivious transfer	Table, values are instance	Table, values are instance
Hetero Federated Transfer Learning	Hetero FTL	Build Hetero FTL Model Between 2 party	Table, values are instance			Hetero FTL Model
Hetero KMeans	Hetero KMeans	Build Hetero KMeans model through multiple parties	Table, values are instance	Table, values are instance; Arbier outputs 2 Tables		Hetero KMeans Model
PSI	PSI module	Compute PSI value of features between two table	Table, values are instance			PSI Results
Data Statistics	Data Statistics	This component will do some statistical work on the data, including statistical mean, maximum and minimum, median, etc.	Table, values are instance	Table		Statistic Result
Scorecard	Scorecard	Scale predict score to credit score by given scaling parameters	Table, values are predict score	Table, values are score results
Sample Weight	Sample Weight	Assign weight to instances according to user-specified parameters	Table, values are instance	Table, values are weighted instance
Feldman Verifiable Sum	Feldman Verifiable Sum	This component will sum multiple privacy values without exposing data	Table, values are addend	Table, values are sum results

Secure Protocol¶

Params¶

Classes:

`BoostingParam`([task_type, objective_param, …])	Basic parameter for Boosting Algorithms
`ColumnExpandParam`([append_header, method, …])	Define method used for expanding column
`CrossValidationParam`([n_splits, mode, role, …])	Define cross validation params
`DataIOParam`([input_format, delimitor, …])	Define dataio parameters that used in federated ml.
`DataSplitParam`([random_state, test_size, …])	Define data split param that used in data split.
`DataTransformParam`([input_format, …])	Define data transform parameters that used in federated ml.
`DecisionTreeParam`([criterion_method, …])	Define decision tree parameters that used in federated ml.
`EncodeParam`([salt, encode_method, base64])	Define the hash method for raw intersect method
`EncryptParam`([method, key_length])	Define encryption method that used in federated ml.
`EncryptedModeCalculatorParam`([mode, …])	Define the encrypted_mode_calulator parameters.
`FeatureBinningParam`([method, …])	Define the feature binning method
`FeatureSelectionParam`([select_col_indexes, …])	Define the feature selection parameters.
`HeteroNNParam`([task_type, config_type, …])	Parameters used for Hetero Neural Network.
`HomoNNParam`(api_version, secure_aggregate, …)	Parameters used for Homo Neural Network.
`HomoOneHotParam`([transform_col_indexes, …])	param transform_col_indexes Specify which columns need to calculated. -1 represent for all columns.
`InitParam`([init_method, init_const, …])	Initialize Parameters used in initializing a model.
`IntersectParam`(intersect_method[, …])	Define the intersect method
`KmeansParam`([k, max_iter, tol, random_stat])	Parameters used for K-means.
`LinearParam`([penalty, tol, alpha, …])	Parameters used for Linear Regression.
`LocalBaselineParam`([model_name, model_opts, …])	Define the local baseline model param
`LogisticParam`([penalty, tol, alpha, …])	Parameters used for Logistic Regression both for Homo mode or Hetero mode.
`ObjectiveParam`([objective, params])	Define objective parameters that used in federated ml.
`OneVsRestParam`([need_one_vs_rest, has_arbiter])	Define the one_vs_rest parameters.
`PoissonParam`([penalty, tol, alpha, …])	Parameters used for Poisson Regression.
`PredictParam`([threshold])	Define the predict method of HomoLR, HeteroLR, SecureBoosting
`RSAParam`([salt, hash_method, …])	Define the hash method for RSA intersect method
`RsaParam`([rsa_key_n, rsa_key_e, rsa_key_d, …])	Define the sample method
`SampleParam`([mode, method, fractions, …])	Define the sample method
`SampleWeightParam`([class_weight, …])	Define sample weight parameters
`ScaleParam`([method, mode, …])	Define the feature scale parameters.
`ScorecardParam`([method, offset, factor, …])	Define method used for transforming prediction score to credit score
`SecureInformationRetrievalParam`([…])	security_level: float [0, 1]; if security_level == 0, then do raw data retrieval oblivious_transfer_protocol: OT type, only supports consts.OT_HAUCK commutative_encryption: the commutative encryption scheme used, only supports consts.CE_PH non_committing_encryption: the non-committing encryption scheme used, only supports consts.AES key_size: int >= 768, the key length of the commutative cipher raw_retrieval: bool, perform raw retrieval if raw_retrieval
`StatisticsParam`([statistics, column_names, …])	Define statistics params
`StepwiseParam`([score_name, mode, role, …])	Define stepwise params
`StochasticQuasiNewtonParam`([…])	Parameters used for stochastic quasi-newton method.
`UnionParam`([need_run, allow_missing, …])	Define the union method for combining multiple dTables and keep entries with the same id

class BoostingParam(task_type='classification', objective_param=<federatedml.param.boosting_param.ObjectiveParam object>, learning_rate=0.3, num_trees=5, subsample_feature_rate=1, n_iter_no_change=True, tol=0.0001, bin_num=32, predict_param=<federatedml.param.predict_param.PredictParam object>, cv_param=<federatedml.param.cross_validation_param.CrossValidationParam object>, validation_freqs=None, metrics=None, random_seed=100, binning_error=0.0001)¶

Basic parameter for Boosting Algorithms

Parameters

task_type (str, accepted 'classification', 'regression' only, default: 'classification') –
objective_param (ObjectiveParam Object, default: ObjectiveParam()) –
learning_rate (float, accepted float, int or long only, the learning rate of secure boost. default: 0.3) –
num_trees (int, accepted int, float only, the max number of boosting round. default: 5) –
subsample_feature_rate (float, a float-number in [0, 1], default: 1.0) –
n_iter_no_change (bool,) – when True and residual error less than tol, tree building process will stop. default: True
bin_num (int, positive integer greater than 1, bin number use in quantile. default: 32) –
validation_freqs (None or positive integer or container object in python. Do validation in training process or Not.) –
if equals None, will not do validation in train process; if equals positive integer, will validate data every validation_freqs epochs passes; if container object in python, will validate data if epochs belong to this container.

e.g. validation_freqs = [10, 15], will validate data when epoch equals to 10 and 15.

Default: None

class ColumnExpandParam(append_header=None, method='manual', fill_value=1e-08, need_run=True)¶

Define method used for expanding column

Parameters

append_header (None, str, List[str] default: None) – Name(s) for appended feature(s). If None is given, module outputs the original input value without any operation.
method (str, default: 'manual') – If method is ‘manual’, use user-specified fill_value to fill in new features.
fill_value (int, float, str, List[int], List[float], List[str] default: 1e-8) – Used for filling expanded feature columns. If given a list, length of the list must match that of append_header
need_run (bool, default: True) – Indicate if this module needed to be run.

class CrossValidationParam(n_splits=5, mode='hetero', role='guest', shuffle=True, random_seed=1, need_cv=False, output_fold_history=True, history_value_type='score')¶

Define cross validation params

Parameters

n_splits (int, default: 5) – Specify how many splits used in KFold
mode (str, default: 'Hetero') – Indicate what mode is current task
role (str, default: 'Guest') – Indicate what role is current party
shuffle (bool, default: True) – Define whether do shuffle before KFold or not.
random_seed (int, default: 1) – Specify the random seed for numpy shuffle
need_cv (bool, default False) – Indicate if this module needed to be run
output_fold_history (bool, default True) – Indicate whether to output table of ids used by each fold, else return original input data returned ids are formatted as: {original_id}#fold{fold_num}#{train/validate}
history_value_type (str, default score, choose between {'instance', 'score'}) – Indicate whether to include original instance or predict score in the output fold history, only effective when output_fold_history set to True

class DataIOParam(input_format='dense', delimitor=',', data_type='float64', exclusive_data_type=None, tag_with_value=False, tag_value_delimitor=':', missing_fill=False, default_value=0, missing_fill_method=None, missing_impute=None, outlier_replace=False, outlier_replace_method=None, outlier_impute=None, outlier_replace_value=0, with_label=False, label_name='y', label_type='int', output_format='dense', need_run=True)¶

Define dataio parameters that used in federated ml.

Parameters

input_format (str, accepted 'dense','sparse' 'tag' only in this version. default: 'dense'.) –
please have a look at this tutorial at “DataIO” section of federatedml/util/README.md. Formally,

dense input format data should be set to “dense”, svm-light input format data should be set to “sparse”, tag or tag:value input format data should be set to “tag”.
delimitor (str, the delimitor of data input, default: ',') –
data_type (str, the data type of data input, accepted 'float','float64','int','int64','str','long') – “default: “float64”
exclusive_data_type (dict, the key of dict is col_name, the value is data_type, use to specified special data type) – of some features.
tag_with_value (bool, use if input_format is 'tag', if tag_with_value is True,) – input column data format should be tag[delimitor]value, otherwise is tag only
tag_value_delimitor (str, use if input_format is 'tag' and 'tag_with_value' is True,) – delimitor of tag[delimitor]value column value.
missing_fill (bool, need to fill missing value or not, accepted only True/False, default: False) –
default_value (None or single object type or list, the value to replace missing value.) –
if None, it will use default value define in federatedml/feature/imputer.py, if single object, will fill missing value with this object, if list, it’s length should be the sample of input data’ feature dimension,

means that if some column happens to have missing values, it will replace it the value by element in the identical position of this list.

default: None
missing_fill_method (None or str, the method to replace missing value, should be one of [None, 'min', 'max', 'mean', 'designated'], default: None) –
missing_impute (None or list, element of list can be any type, or auto generated if value is None, define which values to be consider as missing, default: None) –
outlier_replace (bool, need to replace outlier value or not, accepted only True/False, default: True) –
outlier_replace_method (None or str, the method to replace missing value, should be one of [None, 'min', 'max', 'mean', 'designated'], default: None) –
outlier_impute (None or list, element of list can be any type, which values should be regard as missing value, default: None) –
outlier_replace_value (None or single object type or list, the value to replace outlier.) –
if None, it will use default value define in federatedml/feature/imputer.py, if single object, will replace outlier with this object, if list, it’s length should be the sample of input data’ feature dimension,

means that if some column happens to have outliers, it will replace it the value by element in the identical position of this list.

default: None
with_label (bool, True if input data consist of label, False otherwise. default: 'false') –
label_name (str, column_name of the column where label locates, only use in dense-inputformat. default: 'y') –
label_type (object, accepted 'int','int64','float','float64','long','str' only,) – use when with_label is True. default: ‘false’
output_format (str, accepted 'dense','sparse' only in this version. default: 'dense') –

class DataSplitParam(random_state=None, test_size=None, train_size=None, validate_size=None, stratified=False, shuffle=True, split_points=None, need_run=True)¶

Define data split param that used in data split.

Parameters

random_state (None, int, default: None) – Specify the random state for shuffle.
test_size (None, float, int, default: 0.0) – Specify test data set size. float value specifies fraction of input data set, int value specifies exact number of data instances
train_size (None, float, int, default: 0.8) – Specify train data set size. float value specifies fraction of input data set, int value specifies exact number of data instances
validate_size (None, float, int, default: 0.2) – Specify validate data set size. float value specifies fraction of input data set, int value specifies exact number of data instances
stratified (boolean, default: False) – Define whether sampling should be stratified, according to label value.
shuffle (boolean, default : True) – Define whether do shuffle before splitting or not.
split_points (None, list, default : None) – Specify the point(s) by which continuous label values are bucketed into bins for stratified split. eg.[0.2] for two bins or [0.1, 1, 3] for 4 bins
need_run (bool, default: True) – Specify whether to run data split

class DataTransformParam(input_format='dense', delimitor=',', data_type='float64', exclusive_data_type=None, tag_with_value=False, tag_value_delimitor=':', missing_fill=False, default_value=0, missing_fill_method=None, missing_impute=None, outlier_replace=False, outlier_replace_method=None, outlier_impute=None, outlier_replace_value=0, with_label=False, label_name='y', label_type='int', output_format='dense', need_run=True)¶

Define data transform parameters that used in federated ml.

Parameters

input_format (str, accepted 'dense','sparse' 'tag' only in this version. default: 'dense'.) –
please have a look at this tutorial at “DataTransform” section of federatedml/util/README.md. Formally,

dense input format data should be set to “dense”, svm-light input format data should be set to “sparse”, tag or tag:value input format data should be set to “tag”.
delimitor (str, the delimitor of data input, default: ',') –
data_type (str, the data type of data input, accepted 'float','float64','int','int64','str','long') – “default: “float64”
exclusive_data_type (dict, the key of dict is col_name, the value is data_type, use to specified special data type) – of some features.
tag_with_value (bool, use if input_format is 'tag', if tag_with_value is True,) – input column data format should be tag[delimitor]value, otherwise is tag only
tag_value_delimitor (str, use if input_format is 'tag' and 'tag_with_value' is True,) – delimitor of tag[delimitor]value column value.
missing_fill (bool, need to fill missing value or not, accepted only True/False, default: False) –
default_value (None or single object type or list, the value to replace missing value.) –
if None, it will use default value define in federatedml/feature/imputer.py, if single object, will fill missing value with this object, if list, it’s length should be the sample of input data’ feature dimension,

means that if some column happens to have missing values, it will replace it the value by element in the identical position of this list.

default: None
missing_fill_method (None or str, the method to replace missing value, should be one of [None, 'min', 'max', 'mean', 'designated'], default: None) –
missing_impute (None or list, element of list can be any type, or auto generated if value is None, define which values to be consider as missing, default: None) –
outlier_replace (bool, need to replace outlier value or not, accepted only True/False, default: True) –
outlier_replace_method (None or str, the method to replace missing value, should be one of [None, 'min', 'max', 'mean', 'designated'], default: None) –
outlier_impute (None or list, element of list can be any type, which values should be regard as missing value, default: None) –
outlier_replace_value (None or single object type or list, the value to replace outlier.) –
if None, it will use default value define in federatedml/feature/imputer.py, if single object, will replace outlier with this object, if list, it’s length should be the sample of input data’ feature dimension,

means that if some column happens to have outliers, it will replace it the value by element in the identical position of this list.

default: None
with_label (bool, True if input data consist of label, False otherwise. default: 'false') –
label_name (str, column_name of the column where label locates, only use in dense-inputformat. default: 'y') –
label_type (object, accepted 'int','int64','float','float64','long','str' only,) – use when with_label is True. default: ‘false’
output_format (str, accepted 'dense','sparse' only in this version. default: 'dense') –

class DecisionTreeParam(criterion_method='xgboost', criterion_params=[0.1, 0], max_depth=3, min_sample_split=2, min_impurity_split=0.001, min_leaf_node=1, max_split_nodes=65536, feature_importance_type='split', n_iter_no_change=True, tol=0.001, min_child_weight=0, use_missing=False, zero_as_missing=False, deterministic=False)¶

Define decision tree parameters that used in federated ml.

Parameters

criterion_method (str, accepted "xgboost" only, the criterion function to use, default: 'xgboost') –
criterion_params (list or dict, should be non empty and elements are float-numbers,) – if a list is offered, the first one is l2 regularization value, and the second one is l1 regularization value. if a dict is offered, make sure it contains key ‘l1’, and ‘l2’. l1, l2 regularization values are non-negative floats. default: [0.1, 0] or {‘l1’:0, ‘l2’:0,1}
max_depth (int, positive integer, the max depth of a decision tree, default: 3) –
min_sample_split (int, least quantity of nodes to split, default: 2) –
min_impurity_split (float, least gain of a single split need to reach, default: 1e-3) –
min_child_weight (float, sum of hessian needed in child nodes. default is 0) –
min_leaf_node (int, when samples no more than min_leaf_node, it becomes a leave, default: 1) –
max_split_nodes (int, positive integer, we will use no more than max_split_nodes to) – parallel finding their splits in a batch, for memory consideration. default is 65536
feature_importance_type (str, support 'split', 'gain' only.) – if is ‘split’, feature_importances calculate by feature split times, if is ‘gain’, feature_importances calculate by feature split gain. default: ‘split’
use_missing (bool, accepted True, False only, use missing value in training process or not. default: False) –
zero_as_missing (bool, accepted True, False only, regard 0 as missing value or not,) – will be use only if use_missing=True, default: False
deterministic (bool, ensure stability when computing histogram. Set this to true to ensure stable result when using) – same data and same parameter. But it may slow down computation.

class EncodeParam(salt='', encode_method='none', base64=False)¶

Define the hash method for raw intersect method

Parameters

salt (the src data string will be str = str + salt, default by empty string) –
encode_method (str, the hash method of src data string, it support md5, sha1, sha224, sha256, sha384, sha512, sm3, default by None) –
base64 (bool, if True, the result of hash will be changed to base64, default by False) –

class EncryptParam(method='Paillier', key_length=1024)¶

Define encryption method that used in federated ml.

Parameters

method (str, default: 'Paillier') – If method is ‘Paillier’, Paillier encryption will be used for federated ml. To use non-encryption version in HomoLR, set this to None. For detail of Paillier encryption, please check out the paper mentioned in README file. Accepted values: {‘Paillier’, ‘IterativeAffine’, ‘Random_IterativeAffine’}
key_length (int, default: 1024) – Used to specify the length of key in this encryption method.

class EncryptedModeCalculatorParam(mode='strict', re_encrypted_rate=1)¶

Define the encrypted_mode_calulator parameters.

Parameters

mode (str, support 'strict', 'fast', 'balance', 'confusion_opt', ' only, default: strict) –
re_encrypted_rate (float or int, numeric number in [0, 1], use when mode equals to 'balance, default: 1) –

class FeatureBinningParam(method='quantile', compress_thres=10000, head_size=10000, error=0.0001, bin_num=10, bin_indexes=-1, bin_names=None, adjustment_factor=0.5, transform_param=<federatedml.param.feature_binning_param.TransformParam object>, local_only=False, category_indexes=None, category_names=None, need_run=True, skip_static=False)¶

Define the feature binning method

Parameters

method (str, 'quantile'， 'bucket' or 'optimal', default: 'quantile') – Binning method.
compress_thres (int, default: 10000) – When the number of saved summaries exceed this threshold, it will call its compress function
head_size (int, default: 10000) – The buffer size to store inserted observations. When head list reach this buffer size, the QuantileSummaries object start to generate summary(or stats) and insert into its sampled list.
error (float, 0 <= error < 1 default: 0.001) – The error of tolerance of binning. The final split point comes from original data, and the rank of this value is close to the exact rank. More precisely, floor((p - 2 * error) * N) <= rank(x) <= ceil((p + 2 * error) * N) where p is the quantile in float, and N is total number of data.
bin_num (int, bin_num > 0, default: 10) – The max bin number for binning
bin_indexes (list of int or int, default: -1) – Specify which columns need to be binned. -1 represent for all columns. If you need to indicate specific cols, provide a list of header index instead of -1.
bin_names (list of string, default: []) – Specify which columns need to calculated. Each element in the list represent for a column name in header.
adjustment_factor (float, default: 0.5) – the adjustment factor when calculating WOE. This is useful when there is no event or non-event in a bin. Please note that this parameter will NOT take effect for setting in host.
category_indexes (list of int or int, default: []) –
Specify which columns are category features. -1 represent for all columns. List of int indicate a set of such features. For category features, bin_obj will take its original values as split_points and treat them as have been binned. If this is not what you expect, please do NOT put it into this parameters.

The number of categories should not exceed bin_num set above.
category_names (list of string, default: []) – Use column names to specify category features. Each element in the list represent for a column name in header.
local_only (bool, default: False) – Whether just provide binning method to guest party. If true, host party will do nothing. Warnings: This parameter will be deprecated in future version.
transform_param (TransformParam) – Define how to transfer the binned data.
need_run (bool, default True) – Indicate if this module needed to be run
skip_static (bool, default False) – If true, binning will not calculate iv, woe etc. In this case, optimal-binning will not be supported.

class FeatureSelectionParam(select_col_indexes=-1, select_names=None, filter_methods=None, unique_param=<federatedml.param.feature_selection_param.UniqueValueParam object>, iv_value_param=<federatedml.param.feature_selection_param.IVValueSelectionParam object>, iv_percentile_param=<federatedml.param.feature_selection_param.IVPercentileSelectionParam object>, iv_top_k_param=<federatedml.param.feature_selection_param.IVTopKParam object>, variance_coe_param=<federatedml.param.feature_selection_param.VarianceOfCoeSelectionParam object>, outlier_param=<federatedml.param.feature_selection_param.OutlierColsSelectionParam object>, manually_param=<federatedml.param.feature_selection_param.ManuallyFilterParam object>, percentage_value_param=<federatedml.param.feature_selection_param.PercentageValueParam object>, iv_param=<federatedml.param.feature_selection_param.CommonFilterParam object>, statistic_param=<federatedml.param.feature_selection_param.CommonFilterParam object>, psi_param=<federatedml.param.feature_selection_param.CommonFilterParam object>, vif_param=<federatedml.param.feature_selection_param.CommonFilterParam object>, sbt_param=<federatedml.param.feature_selection_param.CommonFilterParam object>, correlation_param=<federatedml.param.feature_selection_param.CorrelationFilterParam object>, need_run=True)¶

Define the feature selection parameters.

Parameters

select_col_indexes (list or int, default: -1) – Specify which columns need to calculated. -1 represent for all columns.
select_names (list of string, default: []) – Specify which columns need to calculated. Each element in the list represent for a column name in header.
filter_methods (list, ["manually", "iv_filter", "statistic_filter",) –

“psi_filter”, “hetero_sbt_filter”, “homo_sbt_filter”,
”hetero_fast_sbt_filter”, “percentage_value”, “vif_filter”, “correlation_filter”],

default: [“manually”]

The following methods will be deprecated in future version: “unique_value”, “iv_value_thres”, “iv_percentile”, “coefficient_of_variation_value_thres”, “outlier_cols”

Specify the filter methods used in feature selection. The orders of filter used is depended on this list. Please be notified that, if a percentile method is used after some certain filter method, the percentile represent for the ratio of rest features.

e.g. If you have 10 features at the beginning. After first filter method, you have 8 rest. Then, you want top 80% highest iv feature. Here, we will choose floor(0.8 * 8) = 6 features instead of 8.
unique_param (filter the columns if all values in this feature is the same) –
iv_value_param (Use information value to filter columns. If this method is set, a float threshold need to be provided.) – Filter those columns whose iv is smaller than threshold. Will be deprecated in the future.
iv_percentile_param (Use information value to filter columns. If this method is set, a float ratio threshold) – need to be provided. Pick floor(ratio * feature_num) features with higher iv. If multiple features around the threshold are same, all those columns will be keep. Will be deprecated in the future.
variance_coe_param (Use coefficient of variation to judge whether filtered or not.) – Will be deprecated in the future.
outlier_param (Filter columns whose certain percentile value is larger than a threshold.) – Will be deprecated in the future.
percentage_value_param (Filter the columns that have a value that exceeds a certain percentage.) –
iv_param (Setting how to filter base on iv. It support take high mode only. All of "threshold",) – “top_k” and “top_percentile” are accepted. Check more details in CommonFilterParam. To use this filter, hetero-feature-binning module has to be provided.
statistic_param (Setting how to filter base on statistic values. All of "threshold",) – “top_k” and “top_percentile” are accepted. Check more details in CommonFilterParam. To use this filter, data_statistic module has to be provided.
psi_param (Setting how to filter base on psi values. All of "threshold",) – “top_k” and “top_percentile” are accepted. Its take_high properties should be False to choose lower psi features. Check more details in CommonFilterParam. To use this filter, data_statistic module has to be provided.
need_run (bool, default True) – Indicate if this module needed to be run

class HeteroNNParam(task_type='classification', config_type='keras', bottom_nn_define=None, top_nn_define=None, interactive_layer_define=None, interactive_layer_lr=0.9, optimizer='SGD', loss=None, epochs=100, batch_size=-1, early_stop='diff', tol=1e-05, encrypt_param=<federatedml.param.encrypt_param.EncryptParam object>, encrypted_mode_calculator_param=<federatedml.param.encrypted_mode_calculation_param.EncryptedModeCalculatorParam object>, predict_param=<federatedml.param.predict_param.PredictParam object>, cv_param=<federatedml.param.cross_validation_param.CrossValidationParam object>, validation_freqs=None, early_stopping_rounds=None, metrics=None, use_first_metric_only=True, selector_param=<federatedml.param.hetero_nn_param.SelectorParam object>, floating_point_precision=23, drop_out_keep_rate=1.0)¶

Parameters used for Hetero Neural Network.

Parameters

task_type – str, task type of hetero nn model, one of ‘classification’, ‘regression’.
config_type – str, accept “keras” only.
bottom_nn_define – a dict represents the structure of bottom neural network.
interactive_layer_define – a dict represents the structure of interactive layer.
interactive_layer_lr – float, the learning rate of interactive layer.
top_nn_define – a dict represents the structure of top neural network.
optimizer –
optimizer method, accept following types: 1. a string, one of “Adadelta”, “Adagrad”, “Adam”, “Adamax”, “Nadam”, “RMSprop”, “SGD” 2. a dict, with a required key-value pair keyed by “optimizer”,

with optional key-value pairs such as learning rate.

defaults to “SGD”
loss – str, a string to define loss function used
early_stopping_rounds – int, default: None
stop training if one metric doesn’t improve in last early_stopping_round rounds (Will) –
metrics – list, default: None Indicate when executing evaluation during train process, which metrics will be used. If not set, default metrics for specific task type will be used. As for binary classification, default metrics are [‘auc’, ‘ks’], for regression tasks, default metrics are [‘root_mean_squared_error’, ‘mean_absolute_error’], [ACCURACY, PRECISION, RECALL] for multi-classification task
use_first_metric_only – bool, default: False Indicate whether to use the first metric in metrics as the only criterion for early stopping judgement.
epochs – int, the maximum iteration for aggregation in training.
batch_size – int, batch size when updating model. -1 means use all data in a batch. i.e. Not to use mini-batch strategy. defaults to -1.
early_stop –
str, accept ‘diff’ only in this version, default: ‘diff’ Method used to judge converge or not.
1. diff： Use difference of loss between two iterations to judge whether converge.
validation_freqs –
None or positive integer or container object in python. Do validation in training process or Not. if equals None, will not do validation in train process; if equals positive integer, will validate data every validation_freqs epochs passes; if container object in python, will validate data if epochs belong to this container.

e.g. validation_freqs = [10, 15], will validate data when epoch equals to 10 and 15.

Default: None The default value is None, 1 is suggested. You can set it to a number larger than 1 in order to speed up training by skipping validation rounds. When it is larger than 1, a number which is divisible by “epochs” is recommended, otherwise, you will miss the validation scores of last training epoch.
floating_point_precision –
None or integer, if not None, means use floating_point_precision-bit to speed up calculation, e.g.: convert an x to round(x * 2**floating_point_precision) during Paillier operation, divide

the result by 2**floating_point_precision in the end.
drop_out_keep_rate – float, should betweend 0 and 1, if not equals to 1.0, will enabled drop out

class HomoNNParam(api_version: int = 0, secure_aggregate: bool = True, aggregate_every_n_epoch: int = 1, config_type: str = 'nn', nn_define: Optional[dict] = None, optimizer: Union[str, dict, types.SimpleNamespace] = 'SGD', loss: Optional[str] = None, metrics: Optional[Union[str, list]] = None, max_iter: int = 100, batch_size: int = -1, early_stop: Union[str, dict, types.SimpleNamespace] = 'diff', encode_label: bool = False, predict_param=<federatedml.param.predict_param.PredictParam object>, cv_param=<federatedml.param.cross_validation_param.CrossValidationParam object>)¶

Parameters used for Homo Neural Network.

Parameters

Args –

secure_aggregate: enable secure aggregation or not, defaults to True. aggregate_every_n_epoch: aggregate model every n epoch, defaults to 1. config_type: one of “nn”, “keras”, “tf” nn_define: a dict represents the structure of neural network. optimizer: optimizer method, accept following types:

a string, one of “Adadelta”, “Adagrad”, “Adam”, “Adamax”, “Nadam”, “RMSprop”, “SGD”

a dict, with a required key-value pair keyed by “optimizer”,
with optional key-value pairs such as learning rate.

defaults to “SGD”

loss: a string metrics: max_iter: the maximum iteration for aggregation in training. batch_size : batch size when updating model.

-1 means use all data in a batch. i.e. Not to use mini-batch strategy. defaults to -1.

early_stopstr, ‘diff’, ‘weight_diff’ or ‘abs’, default: ‘diff’

Method used to judge converge or not.

diff： Use difference of loss between two iterations to judge whether converge.
weight_diff: Use difference between weights of two consecutive iterations
abs: Use the absolute value of loss to judge whether converge. i.e. if loss < eps, it is converged.

encode_label : encode label to one_hot.

class HomoOneHotParam(transform_col_indexes=- 1, transform_col_names=None, need_run=True, need_alignment=True)¶

Parameters

transform_col_indexes (list or int, default: -1) – Specify which columns need to calculated. -1 represent for all columns.
need_run (bool, default True) – Indicate if this module needed to be run
need_alignment (bool, default True) – Indicated whether alignment of features is turned on

class InitParam(init_method='random_uniform', init_const=1, fit_intercept=True, random_seed=None)¶

Initialize Parameters used in initializing a model.

Parameters

init_method (str, 'random_uniform', 'random_normal', 'ones', 'zeros' or 'const'. default: 'random_uniform') – Initial method.
init_const (int or float, default: 1) – Required when init_method is ‘const’. Specify the constant.
fit_intercept (bool, default: True) – Whether to initialize the intercept or not.

class IntersectParam(intersect_method: str = 'raw', random_bit=128, sync_intersect_ids=True, join_role='guest', with_encode=False, only_output_key=False, encode_params=<federatedml.param.intersect_param.EncodeParam object>, rsa_params=<federatedml.param.intersect_param.RSAParam object>, intersect_cache_param=<federatedml.param.intersect_param.IntersectCache object>, repeated_id_process=False, repeated_id_owner='guest', with_sample_id=False, allow_info_share: bool = False, info_owner='guest')¶

Define the intersect method

Parameters

intersect_method (str, it supports 'rsa' and 'raw', default by 'raw') –
random_bit (positive int, it will define the encrypt length of rsa algorithm. It effective only for intersect_method is rsa) –
sync_intersect_ids (bool. In rsa, 'synchronize_intersect_ids' is True means guest or host will send intersect results to the others, and False will not.) – while in raw, ‘synchronize_intersect_ids’ is True means the role of “join_role” will send intersect results and the others will get them. Default by True.
join_role (str, role who joins ids, supports "guest" and "host" only and effective only for raw. If it is "guest", the host will send its ids to guest and find the intersection of) – ids in guest; if it is “host”, the guest will send its ids to host. Default by “guest”.
with_encode (bool, if True, it will use hash method for intersect ids. Effective only for "raw".) –
encode_params (EncodeParam, it effective only for with_encode is True) –
rsa_params (RSAParam, effective for rsa method only) –
only_output_key (bool, if false, the results of intersection will include key and value which from input data; if true, it will just include key from input) – data and the value will be empty or some useless character like “intersect_id”
repeated_id_process (bool, if true, intersection will process the ids which can be repeatable) –
repeated_id_owner (str, which role has the repeated ids) –
with_sample_id (bool, data with sample id or not, default False; set this param to True may lead to unexpected behavior) –

class KmeansParam(k=5, max_iter=300, tol=0.001, random_stat=None)¶

kint, should be larger than 1 and less than 100 in this version, default 5.: The number of the centroids to generate.
max_iterint, default 300.: Maximum number of iterations of the hetero-k-means algorithm to run.

tol : float, default 0.001. random_stat : random seed

class LinearParam(penalty='L2', tol=0.0001, alpha=1.0, optimizer='sgd', batch_size=-1, learning_rate=0.01, init_param=<federatedml.param.init_model_param.InitParam object>, max_iter=20, early_stop='diff', predict_param=<federatedml.param.predict_param.PredictParam object>, encrypt_param=<federatedml.param.encrypt_param.EncryptParam object>, sqn_param=<federatedml.param.sqn_param.StochasticQuasiNewtonParam object>, encrypted_mode_calculator_param=<federatedml.param.encrypted_mode_calculation_param.EncryptedModeCalculatorParam object>, cv_param=<federatedml.param.cross_validation_param.CrossValidationParam object>, decay=1, decay_sqrt=True, validation_freqs=None, early_stopping_rounds=None, stepwise_param=<federatedml.param.stepwise_param.StepwiseParam object>, metrics=None, use_first_metric_only=False, floating_point_precision=23)¶

Parameters used for Linear Regression.

Parameters

penalty (str, 'L1' or 'L2'. default: 'L2') – Penalty method used in LinR. Please note that, when using encrypted version in HeteroLinR, ‘L1’ is not supported.
tol (float, default: 1e-4) – The tolerance of convergence
alpha (float, default: 1.0) – Regularization strength coefficient.
optimizer (str, 'sgd', 'rmsprop', 'adam', 'sqn', or 'adagrad', default: 'sgd') – Optimize method
batch_size (int, default: -1) – Batch size when updating model. -1 means use all data in a batch. i.e. Not to use mini-batch strategy.
learning_rate (float, default: 0.01) – Learning rate
max_iter (int, default: 20) – The maximum iteration for training.
init_param (InitParam object, default: default InitParam object) – Init param method object.
early_stop (str, 'diff' or 'abs' or 'weight_dff', default: 'diff') –
Method used to judge convergence.
1. diff： Use difference of loss between two iterations to judge whether converge.
2. abs: Use the absolute value of loss to judge whether converge. i.e. if loss < tol, it is converged.
3. weight_diff: Use difference between weights of two consecutive iterations
predict_param (PredictParam object, default: default PredictParam object) –
encrypt_param (EncryptParam object, default: default EncryptParam object) –
encrypted_mode_calculator_param (EncryptedModeCalculatorParam object, default: default EncryptedModeCalculatorParam object) –
cv_param (CrossValidationParam object, default: default CrossValidationParam object) –
decay (int or float, default: 1) – Decay rate for learning rate. learning rate will follow the following decay schedule. lr = lr0/(1+decay*t) if decay_sqrt is False. If decay_sqrt is True, lr = lr0 / sqrt(1+decay*t) where t is the iter number.
decay_sqrt (Bool, default: True) – lr = lr0/(1+decay*t) if decay_sqrt is False, otherwise, lr = lr0 / sqrt(1+decay*t)
validation_freqs (int, list, tuple, set, or None) – validation frequency during training, required when using early stopping. The default value is None, 1 is suggested. You can set it to a number larger than 1 in order to speed up training by skipping validation rounds. When it is larger than 1, a number which is divisible by “max_iter” is recommended, otherwise, you will miss the validation scores of the last training iteration.
early_stopping_rounds (int, default: None) – If positive number specified, at every specified training rounds, program checks for early stopping criteria. Validation_freqs must also be set when using early stopping.
metrics (list or None, default: None) – Specify which metrics to be used when performing evaluation during training process. If metrics have not improved at early_stopping rounds, trianing stops before convergence. If set as empty, default metrics will be used. For regression tasks, default metrics are [‘root_mean_squared_error’, ‘mean_absolute_error’]
use_first_metric_only (bool, default: False) – Indicate whether to use the first metric in metrics as the only criterion for early stopping judgement.
floating_point_precision (None or integer, if not None, use floating_point_precision-bit to speed up calculation,) –

e.g.: convert an x to round(x * 2**floating_point_precision) during Paillier operation, divide
the result by 2**floating_point_precision in the end.

class LocalBaselineParam(model_name='LogisticRegression', model_opts=None, predict_param=<federatedml.param.predict_param.PredictParam object>, need_run=True)¶

Define the local baseline model param

Parameters

model_name (str, sklearn model used to train on baseline model) –
model_opts (dict or none, default None) – Param to be used as input into baseline model
predict_param (PredictParam object, default: default PredictParam object) –
need_run (bool, default True) – Indicate if this module needed to be run

class LogisticParam(penalty='L2', tol=0.0001, alpha=1.0, optimizer='rmsprop', batch_size=-1, learning_rate=0.01, init_param=<federatedml.param.init_model_param.InitParam object>, max_iter=100, early_stop='diff', encrypt_param=<federatedml.param.encrypt_param.EncryptParam object>, predict_param=<federatedml.param.predict_param.PredictParam object>, cv_param=<federatedml.param.cross_validation_param.CrossValidationParam object>, decay=1, decay_sqrt=True, multi_class='ovr', validation_freqs=None, early_stopping_rounds=None, stepwise_param=<federatedml.param.stepwise_param.StepwiseParam object>, floating_point_precision=23, metrics=None, use_first_metric_only=False)¶

Parameters used for Logistic Regression both for Homo mode or Hetero mode.

Parameters

penalty (str, 'L1', 'L2' or None. default: 'L2') – Penalty method used in LR. Please note that, when using encrypted version in HomoLR, ‘L1’ is not supported.
tol (float, default: 1e-4) – The tolerance of convergence
alpha (float, default: 1.0) – Regularization strength coefficient.
optimizer (str, 'sgd', 'rmsprop', 'adam', 'nesterov_momentum_sgd', 'sqn' or 'adagrad', default: 'rmsprop') – Optimize method, if ‘sqn’ has been set, sqn_param will take effect. Currently, ‘sqn’ support hetero mode only.
batch_size (int, default: -1) – Batch size when updating model. -1 means use all data in a batch. i.e. Not to use mini-batch strategy.
learning_rate (float, default: 0.01) – Learning rate
max_iter (int, default: 100) – The maximum iteration for training.
early_stop (str, 'diff', 'weight_diff' or 'abs', default: 'diff') –
Method used to judge converge or not.
1. diff： Use difference of loss between two iterations to judge whether converge.
2. weight_diff: Use difference between weights of two consecutive iterations
3. abs: Use the absolute value of loss to judge whether converge. i.e. if loss < eps, it is converged.
Please note that for hetero-lr multi-host situation, this parameter support “weight_diff” only.
decay (int or float, default: 1) – Decay rate for learning rate. learning rate will follow the following decay schedule. lr = lr0/(1+decay*t) if decay_sqrt is False. If decay_sqrt is True, lr = lr0 / sqrt(1+decay*t) where t is the iter number.
decay_sqrt (Bool, default: True) – lr = lr0/(1+decay*t) if decay_sqrt is False, otherwise, lr = lr0 / sqrt(1+decay*t)
encrypt_param (EncryptParam object, default: default EncryptParam object) –
predict_param (PredictParam object, default: default PredictParam object) –
cv_param (CrossValidationParam object, default: default CrossValidationParam object) –
multi_class (str, 'ovr', default: 'ovr') – If it is a multi_class task, indicate what strategy to use. Currently, support ‘ovr’ short for one_vs_rest only.
validation_freqs (int, list, tuple, set, or None) – validation frequency during training.
early_stopping_rounds (int, default: None) – Will stop training if one metric doesn’t improve in last early_stopping_round rounds
metrics (list or None, default: None) – Indicate when executing evaluation during train process, which metrics will be used. If set as empty, default metrics for specific task type will be used. As for binary classification, default metrics are [‘auc’, ‘ks’]
use_first_metric_only (bool, default: False) – Indicate whether use the first metric only for early stopping judgement.
floating_point_precision (None or integer, if not None, use floating_point_precision-bit to speed up calculation,) –

e.g.: convert an x to round(x * 2**floating_point_precision) during Paillier operation, divide
the result by 2**floating_point_precision in the end.

class ObjectiveParam(objective='cross_entropy', params=None)¶

Define objective parameters that used in federated ml.

Parameters

objective (None or str, accepted None,'cross_entropy','lse','lae','log_cosh','tweedie','fair','huber' only,) – None in host’s config, should be str in guest’config. when task_type is classification, only support cross_entropy, other 6 types support in regression task. default: None
params (None or list, should be non empty list when objective is 'tweedie','fair','huber',) – first element of list shoulf be a float-number large than 0.0 when objective is ‘fair’,’huber’, first element of list should be a float-number in [1.0, 2.0) when objective is ‘tweedie’

class OneVsRestParam(need_one_vs_rest=False, has_arbiter=True)¶

Define the one_vs_rest parameters.

Parameters: has_arbiter (bool. For some algorithm, may not has arbiter, for instances, secureboost of FATE,) – for these algorithms, it should be set to false. default true

class PoissonParam(penalty='L2', tol=0.0001, alpha=1.0, optimizer='rmsprop', batch_size=-1, learning_rate=0.01, init_param=<federatedml.param.init_model_param.InitParam object>, max_iter=20, early_stop='diff', exposure_colname=None, predict_param=<federatedml.param.predict_param.PredictParam object>, encrypt_param=<federatedml.param.encrypt_param.EncryptParam object>, encrypted_mode_calculator_param=<federatedml.param.encrypted_mode_calculation_param.EncryptedModeCalculatorParam object>, cv_param=<federatedml.param.cross_validation_param.CrossValidationParam object>, stepwise_param=<federatedml.param.stepwise_param.StepwiseParam object>, decay=1, decay_sqrt=True, validation_freqs=None, early_stopping_rounds=None, metrics=None, use_first_metric_only=False, floating_point_precision=23)¶

Parameters used for Poisson Regression.

Parameters

penalty (str, 'L1' or 'L2'. default: 'L2') – Penalty method used in Poisson. Please note that, when using encrypted version in HeteroPoisson, ‘L1’ is not supported.
tol (float, default: 1e-4) – The tolerance of convergence
alpha (float, default: 1.0) – Regularization strength coefficient.
optimizer (str, 'sgd', 'rmsprop', 'adam' or 'adagrad', default: 'rmsprop') – Optimize method
batch_size (int, default: -1) – Batch size when updating model. -1 means use all data in a batch. i.e. Not to use mini-batch strategy.
learning_rate (float, default: 0.01) – Learning rate
max_iter (int, default: 20) – The maximum iteration for training.
init_param (InitParam object, default: default InitParam object) – Init param method object.
early_stop (str, 'weight_diff', 'diff' or 'abs', default: 'diff') –
Method used to judge convergence.
1. diff： Use difference of loss between two iterations to judge whether converge.
2. weight_diff: Use difference between weights of two consecutive iterations
3. abs: Use the absolute value of loss to judge whether converge. i.e. if loss < eps, it is converged.
exposure_colname (str or None, default: None) – Name of optional exposure variable in dTable.
predict_param (PredictParam object, default: default PredictParam object) –
encrypt_param (EncryptParam object, default: default EncryptParam object) –
encrypted_mode_calculator_param (EncryptedModeCalculatorParam object, default: default EncryptedModeCalculatorParam object) –
cv_param (CrossValidationParam object, default: default CrossValidationParam object) –
stepwise_param (StepwiseParam object, default: default StepwiseParam object) –
decay (int or float, default: 1) – Decay rate for learning rate. learning rate will follow the following decay schedule. lr = lr0/(1+decay*t) if decay_sqrt is False. If decay_sqrt is True, lr = lr0 / sqrt(1+decay*t) where t is the iter number.
decay_sqrt (Bool, default: True) – lr = lr0/(1+decay*t) if decay_sqrt is False, otherwise, lr = lr0 / sqrt(1+decay*t)
validation_freqs (int, list, tuple, set, or None) – validation frequency during training, required when using early stopping. The default value is None, 1 is suggested. You can set it to a number larger than 1 in order to speed up training by skipping validation rounds. When it is larger than 1, a number which is divisible by “max_iter” is recommended, otherwise, you will miss the validation scores of the last training iteration.
early_stopping_rounds (int, default: None) – If positive number specified, at every specified training rounds, program checks for early stopping criteria. Validation_freqs must also be set when using early stopping.
metrics (list or None, default: None) – Specify which metrics to be used when performing evaluation during training process. If metrics have not improved at early_stopping rounds, trianing stops before convergence. If set as empty, default metrics will be used. For regression tasks, default metrics are [‘root_mean_squared_error’, ‘mean_absolute_error’]
use_first_metric_only (bool, default: False) – Indicate whether to use the first metric in metrics as the only criterion for early stopping judgement.
floating_point_precision (None or integer, if not None, use floating_point_precision-bit to speed up calculation,) –

e.g.: convert an x to round(x * 2**floating_point_precision) during Paillier operation, divide
the result by 2**floating_point_precision in the end.

class PredictParam(threshold=0.5)¶

Define the predict method of HomoLR, HeteroLR, SecureBoosting

Parameters: threshold (float or int, The threshold use to separate positive and negative class. Normally, it should be (0,1)) –

class RSAParam(salt='', hash_method='sha256', final_hash_method='sha256', split_calculation=False, random_base_fraction=None, key_length=1024)¶

Define the hash method for RSA intersect method

Parameters

salt (the src data string will be str = str + salt, default '') –
hash_method (str, the hash method of src data string, it support sha256, sha384, sha512, sm3, default sha256) –
final_hash_method (str, the hash method of result data string, it support md5, sha1, sha224, sha256, sha384, sha512, sm3, default sha256) –
split_calculation (bool, if True, Host & Guest split operations for faster performance, recommended on large data set) –
random_base_fraction (positive float, if not None, generate (fraction * public key id count) of r for encryption and reuse generated r;) – note that value greater than 0.99 will be taken as 1, and value less than 0.01 will be rounded up to 0.01
key_length (positive int, bit count of rsa key, default 1024) –

class RsaParam(rsa_key_n=None, rsa_key_e=None, rsa_key_d=None, save_out_table_namespace=None, save_out_table_name=None)¶

Define the sample method

Parameters

rsa_key_n (integer, RSA modulus, default: None) –
rsa_key_e (integer, RSA public exponent, default: None) –
rsa_key_d (integer, RSA private exponent, default: None) –
save_out_table_namespace (str, namespace of dtable where stores the output data. default: None) –
save_out_table_name (str, name of dtable where stores the output data. default: None) –

class SampleParam(mode='random', method='downsample', fractions=None, random_state=None, task_type='hetero', need_run=True)¶

Define the sample method

Parameters

mode (str, accepted 'random','stratified'' only in this version, specify sample to use, default: 'random') –
method (str, accepted 'downsample','upsample' only in this version. default: 'downsample') –
fractions (None or float or list, if mode equals to random, it should be a float number greater than 0,) – otherwise a list of elements of pairs like [label_i, sample_rate_i], e.g. [[0, 0.5], [1, 0.8], [2, 0.3]]. default: None
random_state (int, RandomState instance or None, default: None) –
need_run (bool, default True) – Indicate if this module needed to be run

class SampleWeightParam(class_weight=None, sample_weight_name=None, normalize=False, need_run=True)¶

Define sample weight parameters

Parameters

class_weight (str or dict, default None) – class weight dictionary or class weight computation mode, string value only accepts ‘balanced’; If dict provided, key should be class(label), and weight will not be normalize, e.g.: {‘0’: 1, ‘1’: 2} If both class_weight and sample_weight_name are None, return original input data.
sample_weight_name (str, name of column which specifies sample weight.) – feature name of sample weight; if both class_weight and sample_weight_name are None, return original input data
normalize (bool, default False) – whether to normalize sample weight extracted from sample_weight_name column
need_run (bool, default True) – whether to run this module or not

class ScaleParam(method='standard_scale', mode='normal', scale_col_indexes=- 1, scale_names=None, feat_upper=None, feat_lower=None, with_mean=True, with_std=True, need_run=True)¶

Define the feature scale parameters.

Parameters

method (str, like scale in sklearn, now it support "min_max_scale" and "standard_scale", and will support other scale method soon.) – Default standard_scale, which will do nothing for scale
mode (str, the mode support "normal" and "cap". for mode is "normal", the feat_upper and feat_lower is the normal value like "10" or "3.1" and for "cap", feat_upper and) – feature_lower will between 0 and 1, which means the percentile of the column. Default “normal”
feat_upper (int or float or list of int or float, the upper limit in the column.) – If use list, mode must be “normal”, and list length should equal to the number of features to scale. If the scaled value is larger than feat_upper, it will be set to feat_upper. Default None.
feat_lower (int or float or list of int or float, the lower limit in the column.) – If use list, mode must be “normal”, and list length should equal to the number of features to scale. If the scaled value is less than feat_lower, it will be set to feat_lower. Default None.
scale_col_indexes (list,the idx of column in scale_column_idx will be scaled, while the idx of column is not in, it will not be scaled.) –
scale_names (list of string, default: []Specify which columns need to scaled. Each element in the list represent for a column name in header.) –
with_mean (bool, used for "standard_scale". Default True.) –
with_std (bool, used for "standard_scale". Default True.) – The standard scale of column x is calculated as : z = (x - u) / s, where u is the mean of the column and s is the standard deviation of the column. if with_mean is False, u will be 0, and if with_std is False, s will be 1.
need_run (bool, default True) – Indicate if this module needed to be run

class ScorecardParam(method='credit', offset=500, factor=20, factor_base=2, upper_limit_ratio=3, lower_limit_value=0, need_run=True)¶

Define method used for transforming prediction score to credit score

Parameters

method (str, default: 'credit') – score method, currently only supports “credit”
offset (int or float, default: 500) – score baseline
factor (int or float, default: 20) – scoring step, when odds double, result score increases by this factor
factor_base (int or float, default: 2) – factor base, value ln(factor_base) is used for calculating result score
upper_limit_ratio (int or float, default: 3) – upper bound for odds, credit score upper bound is upper_limit_ratio * offset
lower_limit_value (int or float, default: 0) – lower bound for result score
need_run (bool, default: True) – Indicate if this module needs to be run.

class SecureInformationRetrievalParam(security_level=0.5, oblivious_transfer_protocol='OT_Hauck', commutative_encryption='CommutativeEncryptionPohligHellman', non_committing_encryption='aes', key_size=1024, raw_retrieval=False)¶: security_level: float [0, 1]; if security_level == 0, then do raw data retrieval oblivious_transfer_protocol: OT type, only supports consts.OT_HAUCK commutative_encryption: the commutative encryption scheme used, only supports consts.CE_PH non_committing_encryption: the non-committing encryption scheme used, only supports consts.AES key_size: int >= 768, the key length of the commutative cipher raw_retrieval: bool, perform raw retrieval if raw_retrieval

class StatisticsParam(statistics='summary', column_names=None, column_indexes=- 1, need_run=True, abnormal_list=None, quantile_error=0.0001, bias=True)¶

Define statistics params

Parameters

statistics (list, string, default "summary") –
Specify the statistic types to be computed. “summary” represents list: [consts.SUM, consts.MEAN, consts.STANDARD_DEVIATION,

consts.MEDIAN, consts.MIN, consts.MAX, consts.MISSING_COUNT, consts.SKEWNESS, consts.KURTOSIS]
column_names (list of string, default []) – Specify columns to be used for statistic computation by column names in header
column_indexes (list of int, default -1) – Specify columns to be used for statistic computation by column order in header -1 indicates to compute statistics over all columns
bias (bool, default: True) – If False, the calculations of skewness and kurtosis are corrected for statistical bias.
need_run (bool, default True) – Indicate whether to run this modules

class StepwiseParam(score_name='AIC', mode='hetero', role='guest', direction='both', max_step=10, nvmin=2, nvmax=None, need_stepwise=False)¶

Define stepwise params

Parameters

score_name (str, default: 'AIC') – Specify which model selection criterion to be used, choose ‘aic’ or ‘bic’
mode (str, default: 'Hetero') – Indicate what mode is current task
role (str, default: 'Guest') – Indicate what role is current party
direction (str, default: 'both') – Indicate which direction to go for stepwise. ‘forward’ means forward selection; ‘backward’ means elimination; ‘both’ means possible models of both directions are examined at each step.
max_step (int, default: '10') – Specify total number of steps to run before forced stop.
nvmin (int, default: '2') – Specify the min subset size of final model, cannot be lower than 2. When nvmin > 2, the final model size may be smaller than nvmin due to max_step limit.
nvmax (int, default: None) – Specify the max subset size of final model, 2 <= nvmin <= nvmax. The final model size may be larger than nvmax due to max_step limit.
need_stepwise (bool, default False) – Indicate if this module needed to be run

class StochasticQuasiNewtonParam(update_interval_L=3, memory_M=5, sample_size=5000, random_seed=None)¶

Parameters used for stochastic quasi-newton method.

Parameters

update_interval_L (int, default: 3) – Set how many iteration to update hess matrix
memory_M (int, default: 5) – Stack size of curvature information, i.e. y_k and s_k in the paper.
sample_size (int, default: 5000) – Sample size of data that used to update Hess matrix

class UnionParam(need_run=True, allow_missing=False, keep_duplicate=False)¶

Define the union method for combining multiple dTables and keep entries with the same id

Parameters

need_run (bool, default True) – Indicate if this module needed to be run
allow_missing (bool, default False) – Whether allow mismatch between feature length and header length in the result. Note that empty tables will always be skipped regardless of this param setting.
keep_duplicate (bool, default False) – Whether to keep entries with duplicated keys. If set to True, a new id will be generated for duplicated entry in the format {id}_{table_name}.