Params

Classes:

`BoostingParam`([task_type, objective_param, …])	Basic parameter for Boosting Algorithms
`ObjectiveParam`([objective, params])	Define objective parameters used in federated ml.
`DecisionTreeParam`([criterion_method, …])	Define decision tree parameters used in federated ml.
`CrossValidationParam`([n_splits, mode, role, …])	Define cross validation params
`DataSplitParam`([random_state, test_size, …])	Define data split param used in data split.
`DataIOParam`([input_format, delimitor, …])	Define dataio parameters used in federated ml.
`EncryptParam`([method, key_length])	Define the encryption method used in federated ml.
`EncryptedModeCalculatorParam`([mode, …])	Define the encrypted_mode_calculator parameters.
`FeatureBinningParam`([method, …])	Define the feature binning method
`FeatureSelectionParam`([select_col_indexes, …])	Define the feature selection parameters.
`HeteroNNParam`([task_type, config_type, …])	Parameters used for Hetero Neural Network.
`HomoNNParam`(secure_aggregate, …[, …])	Parameters used for Homo Neural Network.
`HomoOneHotParam`([transform_col_indexes, …])	transform_col_indexes: specify which columns need to be calculated; -1 represents all columns.
`InitParam`([init_method, init_const, …])	Parameters used in initializing a model.
`IntersectParam`(intersect_method[, …])	Define the intersect method
`LinearParam`([penalty, tol, alpha, …])	Parameters used for Linear Regression.
`LocalBaselineParam`([model_name, model_opts, …])	Define the local baseline model param
`LogisticParam`([penalty, tol, alpha, …])	Parameters used for Logistic Regression both for Homo mode or Hetero mode.
`OneVsRestParam`([need_one_vs_rest, has_arbiter])	Define the one_vs_rest parameters.
`PoissonParam`([penalty, tol, alpha, …])	Parameters used for Poisson Regression.
`PredictParam`([threshold])	Define the predict method of HomoLR, HeteroLR, SecureBoosting
`RsaParam`([rsa_key_n, rsa_key_e, rsa_key_d, …])	Define the RSA parameters
`SampleParam`([mode, method, fractions, …])	Define the sample method
`ScaleParam`([method, mode, …])	Define the feature scale parameters.
`StochasticQuasiNewtonParam`([…])	Parameters used for stochastic quasi-newton method.
`StatisticsParam`([statistics, column_names, …])	Define statistics params
`StepwiseParam`([score_name, mode, role, …])	Define stepwise params
`UnionParam`([need_run, allow_missing, …])	Define the union method for combining multiple dTables and keep entries with the same id

class BoostingParam(task_type='classification', objective_param=<federatedml.param.boosting_param.ObjectiveParam object>, learning_rate=0.3, num_trees=5, subsample_feature_rate=1, n_iter_no_change=True, tol=0.0001, bin_num=32, predict_param=<federatedml.param.predict_param.PredictParam object>, cv_param=<federatedml.param.cross_validation_param.CrossValidationParam object>, validation_freqs=None, metrics=None, subsample_random_seed=None, binning_error=0.001)

Basic parameter for Boosting Algorithms

Parameters

task_type (str, accepted 'classification', 'regression' only, default: 'classification') –
objective_param (ObjectiveParam Object, default: ObjectiveParam()) –
learning_rate (float, accepted float, int or long only, the learning rate of secure boost. default: 0.3) –
num_trees (int, accepted int, float only, the max number of boosting rounds. default: 5) –
subsample_feature_rate (float, a float-number in [0, 1], default: 1) –
n_iter_no_change (bool) – when True and the residual error is less than tol, the tree building process will stop. default: True
bin_num (int, positive integer greater than 1, bin number used in quantile. default: 32) –
validation_freqs (None, positive integer, or Python container object. Whether to do validation during the training process.) –
If None, no validation is done during training; if a positive integer, data is validated every validation_freqs epochs; if a container object, data is validated whenever the epoch belongs to this container.

e.g. validation_freqs = [10, 15] will validate data when the epoch equals 10 and 15.

Default: None
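
The three accepted forms of validation_freqs can be illustrated with a minimal sketch. The helper name `should_validate` is hypothetical and not part of federatedml, and counting epochs from 1 is an assumption made here for illustration.

```python
def should_validate(epoch, validation_freqs):
    """Decide whether validation runs at `epoch` (hypothetical helper).

    Assumption: epochs are counted from 1, so an integer frequency of 5
    triggers validation at epochs 5, 10, 15, and so on.
    """
    if validation_freqs is None:
        # None: never validate during training.
        return False
    if isinstance(validation_freqs, int):
        # Positive integer: validate every `validation_freqs` epochs.
        return epoch % validation_freqs == 0
    # Container (list, tuple, set): validate when the epoch is a member.
    return epoch in validation_freqs


if __name__ == "__main__":
    for epoch in (5, 10, 15):
        print(epoch, should_validate(epoch, [10, 15]))
```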

class ObjectiveParam(objective='cross_entropy', params=None)

Define objective parameters used in federated ml.

Parameters

objective (None or str, accepted None,'cross_entropy','lse','lae','log_cosh','tweedie','fair','huber' only,) – None in the host’s config, should be a str in the guest’s config. When task_type is classification, only cross_entropy is supported; the other 6 types are supported in regression tasks. default: ‘cross_entropy’
params (None or list, should be a non-empty list when objective is 'tweedie','fair','huber',) – the first element of the list should be a float-number larger than 0.0 when objective is ‘fair’ or ‘huber’; the first element of the list should be a float-number in [1.0, 2.0) when objective is ‘tweedie’
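
The constraints above can be written down as a small check. This is only a sketch of the documented rules; `check_objective` is a hypothetical helper, not the validation routine federatedml actually runs.

```python
# The six objectives the text lists for regression tasks.
REGRESSION_OBJECTIVES = {"lse", "lae", "log_cosh", "tweedie", "fair", "huber"}


def check_objective(task_type, objective, params=None):
    """Validate an (objective, params) pair against the documented rules.

    Hypothetical helper for illustration; raises ValueError on violation.
    """
    if task_type == "classification":
        if objective != "cross_entropy":
            raise ValueError("classification supports only 'cross_entropy'")
        return True
    if objective not in REGRESSION_OBJECTIVES:
        raise ValueError("unsupported regression objective: %r" % (objective,))
    if objective in ("tweedie", "fair", "huber"):
        # These objectives need a non-empty params list.
        if not isinstance(params, list) or not params:
            raise ValueError("params must be a non-empty list")
        first = params[0]
        if objective == "tweedie":
            if not 1.0 <= first < 2.0:
                raise ValueError("tweedie needs params[0] in [1.0, 2.0)")
        elif not first > 0.0:
            raise ValueError("fair/huber need params[0] > 0.0")
    return True


if __name__ == "__main__":
    print(check_objective("regression", "tweedie", [1.5]))
```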

class DecisionTreeParam(criterion_method='xgboost', criterion_params=[0.1], max_depth=3, min_sample_split=2, min_imputiry_split=0.001, min_leaf_node=1, max_split_nodes=65536, feature_importance_type='split', n_iter_no_change=True, tol=0.001, use_missing=False, zero_as_missing=False)

Define decision tree parameters used in federated ml.

Parameters

criterion_method (str, accepted "xgboost" only, the criterion function to use, default: 'xgboost') –
criterion_params (list, should be non-empty and its first element should be a float-number, default: [0.1].) –
max_depth (int, positive integer, the max depth of a decision tree, default: 3) –
min_sample_split (int, least quantity of nodes to split, default: 2) –
min_impurity_split (float, least gain a single split needs to reach, default: 1e-3) –
min_leaf_node (int, when a node has no more than min_leaf_node samples, it becomes a leaf, default: 1) –
max_split_nodes (int, positive integer) – no more than max_split_nodes nodes will have their splits found in parallel in one batch, for memory consideration. default: 65536
feature_importance_type (str, support 'split', 'gain' only.) – if ‘split’, feature importances are calculated by feature split times; if ‘gain’, feature importances are calculated by feature split gain. default: ‘split’
use_missing (bool, accepted True, False only, use missing value in training process or not. default: False) –
zero_as_missing (bool, accepted True, False only, regard 0 as missing value or not,) – will be used only if use_missing=True, default: False
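
The difference between the two feature_importance_type values can be shown with a short sketch. The input format, a list of (feature name, split gain) pairs with one pair per split in the trained trees, is an assumption made for illustration and is not how federatedml stores its trees.

```python
from collections import defaultdict


def feature_importance(splits, importance_type="split"):
    """Aggregate feature importance from (feature, gain) split records.

    Hypothetical helper: 'split' counts how often a feature was used to
    split, 'gain' sums the gain of those splits.
    """
    if importance_type not in ("split", "gain"):
        raise ValueError("importance_type must be 'split' or 'gain'")
    scores = defaultdict(float)
    for feature, gain in splits:
        scores[feature] += 1.0 if importance_type == "split" else gain
    return dict(scores)


if __name__ == "__main__":
    recorded = [("x0", 0.5), ("x1", 0.2), ("x0", 0.25)]
    print(feature_importance(recorded, "split"))
    print(feature_importance(recorded, "gain"))
```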

class CrossValidationParam(n_splits=5, mode='hetero', role='guest', shuffle=True, random_seed=1, need_cv=False)

Define cross validation params

Parameters

n_splits (int, default: 5) – Specify how many splits used in KFold
mode (str, default: 'hetero') – Indicate the mode of the current task
role (str, default: 'guest') – Indicate the role of the current party
shuffle (bool, default: True) – Define whether to shuffle before KFold or not.
random_seed (int, default: 1) – Specify the random seed for numpy shuffle
need_cv (bool, default: False) – Indicate if this module needs to be run
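
How n_splits, shuffle and random_seed interact can be sketched as follows. The text says the real shuffle uses numpy; this illustration substitutes the standard library's `random.Random` to stay dependency-free, and `k_fold_indices` is a hypothetical name.

```python
import random


def k_fold_indices(n_samples, n_splits=5, shuffle=True, random_seed=1):
    """Yield (train_indices, test_indices) for each of n_splits folds.

    Sketch only: a seeded stdlib shuffle stands in for the numpy shuffle
    mentioned in the parameter description.
    """
    indices = list(range(n_samples))
    if shuffle:
        # A fixed seed makes the fold assignment reproducible.
        random.Random(random_seed).shuffle(indices)
    # Deal the indices round-robin into n_splits folds.
    folds = [indices[i::n_splits] for i in range(n_splits)]
    for i in range(n_splits):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


if __name__ == "__main__":
    for train, test in k_fold_indices(10, n_splits=5):
        print(sorted(test))
```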

class DataSplitParam(random_state=None, test_size=None, train_size=None, validate_size=None, stratified=False, shuffle=True, split_points=None, need_run=True)

Define data split param used in data split.

Parameters

random_state (None, int, default: None) – Specify the random state for shuffle.
test_size (None, float, int, default: 0.0) – Specify test data set size. float value specifies fraction of input data set, int value specifies exact number of data instances
train_size (None, float, int, default: 0.8) – Specify train data set size. float value specifies fraction of input data set, int value specifies exact number of data instances
validate_size (None, float, int, default: 0.2) – Specify validate data set size. float value specifies fraction of input data set, int value specifies exact number of data instances
stratified (boolean, default: False) – Define whether sampling should be stratified, according to label value.
shuffle (boolean, default: True) – Define whether to shuffle before splitting or not.
split_points (None, list, default: None) – Specify the point(s) by which continuous label values are bucketed into bins for stratified split, e.g. [0.2] for two bins or [0.1, 1, 3] for 4 bins
need_run (bool, default: True) – Specify whether to run data split
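
The bucketing that split_points describes can be sketched with the standard `bisect` module. Sending a label equal to a split point to the upper bin is an assumption of this sketch; the source does not say how boundaries are handled. `bucket_labels` is a hypothetical helper.

```python
from bisect import bisect_right


def bucket_labels(labels, split_points):
    """Map continuous label values to bin indices using split points.

    k split points produce k + 1 bins, numbered 0..k. Boundary handling
    (value == split point goes to the upper bin) is an assumption.
    """
    points = sorted(split_points)
    return [bisect_right(points, value) for value in labels]


if __name__ == "__main__":
    # One split point gives two bins; three split points give four bins.
    print(bucket_labels([0.1, 0.5], [0.2]))
    print(bucket_labels([0.05, 0.5, 2, 10], [0.1, 1, 3]))
```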

class DataIOParam(input_format='dense', delimitor=',', data_type='float64', exclusive_data_type=None, tag_with_value=False, tag_value_delimitor=':', missing_fill=False, default_value=0, missing_fill_method=None, missing_impute=None, outlier_replace=False, outlier_replace_method=None, outlier_impute=None, outlier_replace_value=0, with_label=False, label_name='y', label_type='int', output_format='dense', need_run=True)

Define dataio parameters used in federated ml.

Parameters

input_format (str, accepted 'dense', 'sparse', 'tag' only in this version. default: 'dense'.) –
Please have a look at the “DataIO” section of the tutorial in federatedml/util/README.md. Formally,

dense input format data should be set to “dense”, svm-light input format data should be set to “sparse”, tag or tag:value input format data should be set to “tag”.
delimitor (str, the delimiter of data input, default: ',') –
data_type (str, the data type of data input, accepted 'float','float64','int','int64','str','long') – default: “float64”
exclusive_data_type (dict, the key of dict is col_name, the value is data_type, used to specify a special data type) – for some features.
tag_with_value (bool, used if input_format is 'tag'; if tag_with_value is True,) – input column data format should be tag[delimitor]value, otherwise it is tag only
tag_value_delimitor (str, used if input_format is 'tag' and 'tag_with_value' is True,) – delimiter of the tag[delimitor]value column value.
missing_fill (bool, need to fill missing value or not, accepted only True/False, default: False) –
default_value (None or single object type or list, the value to replace missing value.) –
if None, it will use the default value defined in federatedml/feature/imputer.py; if a single object, it will fill missing values with this object; if a list, its length should be the same as the input data’s feature dimension, which means that if some column happens to have missing values, they will be replaced by the element in the identical position of this list.

default: 0
missing_fill_method (None or str, the method to replace missing value, should be one of [None, 'min', 'max', 'mean', 'designated'], default: None) –
missing_impute (None or list, element of list can be any type, or auto generated if value is None, defines which values are to be considered missing, default: None) –
outlier_replace (bool, need to replace outlier value or not, accepted only True/False, default: False) –
outlier_replace_method (None or str, the method to replace outlier value, should be one of [None, 'min', 'max', 'mean', 'designated'], default: None) –
outlier_impute (None or list, element of list can be any type, which values should be regarded as outliers, default: None) –
outlier_replace_value (None or single object type or list, the value to replace outlier.) –
if None, it will use the default value defined in federatedml/feature/imputer.py; if a single object, it will replace outliers with this object; if a list, its length should be the same as the input data’s feature dimension, which means that if some column happens to have outliers, they will be replaced by the element in the identical position of this list.

default: 0
with_label (bool, True if input data consist of label, False otherwise. default: False) –
label_name (str, column_name of the column where label locates, only used in dense input format. default: 'y') –
label_type (object, accepted 'int','int64','float','float64','long','str' only,) – used when with_label is True. default: ‘int’
output_format (str, accepted 'dense','sparse' only in this version. default: 'dense') –
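
The interplay of missing_impute and default_value can be sketched for a single dense row. `fill_missing` is a hypothetical helper written for illustration; it does not reproduce federatedml/feature/imputer.py, and it covers only the single-object and list forms of default_value.

```python
def fill_missing(row, missing_impute, default_value):
    """Replace values listed in `missing_impute` within one feature row.

    If `default_value` is a list, the replacement is taken from the same
    column position; otherwise the single object is used for every column.
    """
    filled = []
    for position, value in enumerate(row):
        if value in missing_impute:
            if isinstance(default_value, list):
                filled.append(default_value[position])
            else:
                filled.append(default_value)
        else:
            filled.append(value)
    return filled


if __name__ == "__main__":
    print(fill_missing([1.0, None, "NA"], [None, "NA"], 0))
    print(fill_missing([None, 2.0, None], [None], [9, 8, 7]))
```

The same shape of logic applies to outlier_impute and outlier_replace_value, with outliers in place of missing values.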

class EncryptParam(method='Paillier', key_length=1024)

Define the encryption method used in federated ml.

Parameters

method (str, default: 'Paillier') – If method is ‘Paillier’, Paillier encryption will be used for federated ml. To use non-encryption version in HomoLR, set this to None. For detail of Paillier encryption, please check out the paper mentioned in README file. Accepted values: {‘Paillier’, ‘IterativeAffine’, ‘Random_IterativeAffine’}
key_length (int, default: 1024) – Used to specify the length of key in this encryption method.

class EncryptedModeCalculatorParam(mode='strict', re_encrypted_rate=1)

Define the encrypted_mode_calculator parameters.

Parameters

mode (str, support 'strict', 'fast', 'balance', 'confusion_opt' only, default: 'strict') –
re_encrypted_rate (float or int, numeric number in [0, 1], used when mode equals 'balance', default: 1) –

class FeatureBinningParam(method='quantile', compress_thres=10000, head_size=10000, error=0.001, bin_num=10, bin_indexes=-1, bin_names=None, adjustment_factor=0.5, transform_param=<federatedml.param.feature_binning_param.TransformParam object>, optimal_binning_param=<federatedml.param.feature_binning_param.OptimalBinningParam object>, local_only=False, category_indexes=None, category_names=None, need_run=True, skip_static=False)

Define the feature binning method

Parameters

method (str, 'quantile', 'bucket' or 'optimal', default: 'quantile') – Binning method.
compress_thres (int, default: 10000) – When the number of saved summaries exceeds this threshold, it will call its compress function
head_size (int, default: 10000) – The buffer size to store inserted observations. When the head list reaches this buffer size, the QuantileSummaries object starts to generate a summary (or stats) and inserts it into its sampled list.
error (float, 0 <= error < 1 default: 0.001) – The error of tolerance of binning. The final split point comes from original data, and the rank of this value is close to the exact rank. More precisely, floor((p - 2 * error) * N) <= rank(x) <= ceil((p + 2 * error) * N) where p is the quantile in float, and N is total number of data.
bin_num (int, bin_num > 0, default: 10) – The max bin number for binning
bin_indexes (list of int or int, default: -1) – Specify which columns need to be binned. -1 represents all columns. If you need to indicate specific cols, provide a list of header indexes instead of -1.
bin_names (list of string, default: []) – Specify which columns need to be calculated. Each element in the list represents a column name in the header.
adjustment_factor (float, default: 0.5) – the adjustment factor when calculating WOE. This is useful when there is no event or non-event in a bin. Please note that this parameter will NOT take effect for setting in host.
category_indexes (list of int or int, default: []) –
Specify which columns are category features. -1 represents all columns. A list of int indicates a set of such features. For category features, bin_obj will take their original values as split_points and treat them as already binned. If this is not what you expect, please do NOT put them into this parameter.

The number of categories should not exceed bin_num set above.
category_names (list of string, default: []) – Use column names to specify category features. Each element in the list represents a column name in the header.
local_only (bool, default: False) – Whether to provide the binning method to the guest party only. If true, the host party will do nothing.
transform_param (TransformParam) – Define how to transfer the binned data.
need_run (bool, default True) – Indicate if this module needs to be run
skip_static (bool, default False) – If true, binning will not calculate iv, woe etc. In this case, optimal-binning will not be supported.
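
The rank guarantee stated for the error parameter, floor((p - 2 * error) * N) <= rank(x) <= ceil((p + 2 * error) * N), can be evaluated directly. The helper name `rank_bounds` is hypothetical; it only computes the two bounds from that formula.

```python
import math


def rank_bounds(p, error, n):
    """Return the (lower, upper) rank bounds for quantile p over n values.

    Direct evaluation of the bound quoted in the `error` description.
    """
    lower = math.floor((p - 2 * error) * n)
    upper = math.ceil((p + 2 * error) * n)
    return lower, upper


if __name__ == "__main__":
    # Median of 10000 values with the default error of 0.001.
    print(rank_bounds(0.5, 0.001, 10000))
```

With the default error of 0.001, the reported median of 10000 values sits within roughly 20 ranks of the exact median on either side.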

class FeatureSelectionParam(select_col_indexes=-1, select_names=None, filter_methods=None, unique_param=<federatedml.param.feature_selection_param.UniqueValueParam object>, iv_value_param=<federatedml.param.feature_selection_param.IVValueSelectionParam object>, iv_percentile_param=<federatedml.param.feature_selection_param.IVPercentileSelectionParam object>, iv_top_k_param=<federatedml.param.feature_selection_param.IVTopKParam object>, variance_coe_param=<federatedml.param.feature_selection_param.VarianceOfCoeSelectionParam object>, outlier_param=<federatedml.param.feature_selection_param.OutlierColsSelectionParam object>, manually_param=<federatedml.param.feature_selection_param.ManuallyFilterParam object>, percentage_value_param=<federatedml.param.feature_selection_param.PercentageValueParam object>, iv_param=<federatedml.param.feature_selection_param.CommonFilterParam object>, statistic_param=<federatedml.param.feature_selection_param.CommonFilterParam object>, psi_param=<federatedml.param.feature_selection_param.CommonFilterParam object>, sbt_param=<federatedml.param.feature_selection_param.CommonFilterParam object>, need_run=True)

Define the feature selection parameters.

Parameters

select_col_indexes (list or int, default: -1) – Specify which columns need to be calculated. -1 represents all columns.
select_names (list of string, default: []) – Specify which columns need to be calculated. Each element in the list represents a column name in the header.
filter_methods (list, ["manually", "iv_filter", "statistic_filter", "psi_filter", "hetero_sbt_filter", "homo_sbt_filter", "hetero_fast_sbt_filter", "percentage_value"], default: ["manually"]) –

The following methods will be deprecated in future version: “unique_value”, “iv_value_thres”, “iv_percentile”, “coefficient_of_variation_value_thres”, “outlier_cols”

Specify the filter methods used in feature selection. Filters are applied in the order given in this list. Please note that if a percentile method is used after another filter method, the percentile refers to the ratio of the remaining features.

e.g. If you have 10 features at the beginning and 8 remain after the first filter method, then asking for the top 80% highest-iv features will choose floor(0.8 * 8) = 6 features instead of 8.
unique_param (filter the columns if all values in this feature are the same) –
iv_value_param (Use information value to filter columns. If this method is set, a float threshold needs to be provided.) – Filter those columns whose iv is smaller than threshold. Will be deprecated in the future.
iv_percentile_param (Use information value to filter columns. If this method is set, a float ratio threshold) – needs to be provided. Pick floor(ratio * feature_num) features with higher iv. If multiple features around the threshold are the same, all those columns will be kept. Will be deprecated in the future.
variance_coe_param (Use coefficient of variation to judge whether filtered or not.) – Will be deprecated in the future.
outlier_param (Filter columns whose certain percentile value is larger than a threshold.) – Will be deprecated in the future.
percentage_value_param (Filter the columns that have a value that exceeds a certain percentage.) –
iv_param (Setting how to filter based on iv. It supports take high mode only. All of "threshold",) – “top_k” and “top_percentile” are accepted. Check more details in CommonFilterParam. To use this filter, hetero-feature-binning module has to be provided.
statistic_param (Setting how to filter based on statistic values. All of "threshold",) – “top_k” and “top_percentile” are accepted. Check more details in CommonFilterParam. To use this filter, data_statistic module has to be provided.
psi_param (Setting how to filter based on psi values. All of "threshold",) – “top_k” and “top_percentile” are accepted. Its take_high property should be False to choose lower psi features. Check more details in CommonFilterParam. To use this filter, data_statistic module has to be provided.
need_run (bool, default True) – Indicate if this module needs to be run
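
The floor(ratio * feature_num) rule from the filter_methods description can be sketched as below. `top_percentile` is a hypothetical helper, and it deliberately ignores the tie handling described for iv_percentile_param (features tied at the threshold are all kept), which is a simplification.

```python
import math


def top_percentile(iv_by_feature, ratio):
    """Keep the floor(ratio * n) features with the highest iv.

    `iv_by_feature` maps feature name to its information value. The ratio
    applies to the features that are still present, not the original set.
    """
    keep = math.floor(ratio * len(iv_by_feature))
    ranked = sorted(iv_by_feature, key=iv_by_feature.get, reverse=True)
    return ranked[:keep]


if __name__ == "__main__":
    # Eight features remain after an earlier filter; keep the top 80%.
    remaining = {"f%d" % i: i / 10 for i in range(8)}
    print(top_percentile(remaining, 0.8))
```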

class HeteroNNParam(task_type='classification', config_type='keras', bottom_nn_define=None, top_nn_define=None, interactive_layer_define=None, interactive_layer_lr=0.9, optimizer='SGD', loss=None, epochs=100, batch_size=-1, early_stop='diff', tol=1e-05, encrypt_param=<federatedml.param.encrypt_param.EncryptParam object>, encrypted_mode_calculator_param=<federatedml.param.encrypted_mode_calculation_param.EncryptedModeCalculatorParam object>, predict_param=<federatedml.param.predict_param.PredictParam object>, cv_param=<federatedml.param.cross_validation_param.CrossValidationParam object>, validation_freqs=None, early_stopping_rounds=None, metrics=None, use_first_metric_only=True)

Parameters used for Hetero Neural Network.

Parameters

task_type – str, task type of hetero nn model, one of ‘classification’, ‘regression’.
config_type – str, accept “keras” only.
bottom_nn_define – a dict represents the structure of bottom neural network.
interactive_layer_define – a dict represents the structure of interactive layer.
interactive_layer_lr – float, the learning rate of interactive layer.
top_nn_define – a dict represents the structure of top neural network.
optimizer –
optimizer method, accepts the following types: 1. a string, one of “Adadelta”, “Adagrad”, “Adam”, “Adamax”, “Nadam”, “RMSprop”, “SGD”; 2. a dict, with a required key-value pair keyed by “optimizer” and optional key-value pairs such as learning rate.

defaults to “SGD”
loss – str, a string to define loss function used
early_stopping_rounds – int, default: None. Will stop training if one metric doesn’t improve in the last early_stopping_rounds rounds.
metrics – list, default: None. Indicates which metrics will be used when executing evaluation during the training process. If not set, default metrics for the specific task type will be used: [‘auc’, ‘ks’] for binary classification, [‘root_mean_squared_error’, ‘mean_absolute_error’] for regression tasks, and [ACCURACY, PRECISION, RECALL] for multi-classification tasks.
use_first_metric_only – bool, default: True. Indicate whether to use the first metric in metrics as the only criterion for early stopping judgement.
epochs – int, the maximum iteration for aggregation in training.
batch_size – int, batch size when updating model. -1 means use all data in a batch. i.e. Not to use mini-batch strategy. defaults to -1.
early_stop –
str, accept ‘diff’ only in this version, default: ‘diff’. Method used to judge convergence.
1. diff: Use the difference of loss between two iterations to judge whether the model has converged.
validation_freqs –
None, positive integer, or Python container object. Whether to do validation during the training process. If None, no validation is done during training; if a positive integer, data is validated every validation_freqs epochs; if a container object, data is validated whenever the epoch belongs to this container.

e.g. validation_freqs = [10, 15] will validate data when the epoch equals 10 and 15.

Default: None. 1 is suggested. You can set it to a number larger than 1 in order to speed up training by skipping validation rounds. When it is larger than 1, a number that divides “epochs” is recommended; otherwise, you will miss the validation scores of the last training epoch.
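
The early_stopping_rounds rule ("stop training if one metric doesn't improve in the last rounds") can be sketched for a single metric. This is an illustration only: `should_stop_early` is a hypothetical helper, and treating a higher metric value as better (as with auc) is an assumption.

```python
def should_stop_early(history, early_stopping_rounds):
    """Return True if the metric has not improved in the last N rounds.

    `history` holds one metric value per validation round, oldest first.
    Assumption: higher is better.
    """
    if early_stopping_rounds is None:
        return False
    if len(history) <= early_stopping_rounds:
        # Not enough rounds yet to compare against an earlier best.
        return False
    best_before = max(history[:-early_stopping_rounds])
    best_recent = max(history[-early_stopping_rounds:])
    return best_recent <= best_before


if __name__ == "__main__":
    print(should_stop_early([0.70, 0.80, 0.79, 0.78], 2))
    print(should_stop_early([0.70, 0.80, 0.81], 2))
```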

class HomoNNParam(secure_aggregate: bool = True, aggregate_every_n_epoch: int = 1, config_type: str = 'nn', nn_define: dict = None, optimizer: Union[str, dict, types.SimpleNamespace] = 'SGD', loss: str = None, metrics: Union[str, list] = None, max_iter: int = 100, batch_size: int = -1, early_stop: Union[str, dict, types.SimpleNamespace] = 'diff', encode_label: bool = False, predict_param=<federatedml.param.predict_param.PredictParam object>, cv_param=<federatedml.param.cross_validation_param.CrossValidationParam object>)

Parameters used for Homo Neural Network.

Parameters

secure_aggregate – enable secure aggregation or not, defaults to True.
aggregate_every_n_epoch – aggregate model every n epoch, defaults to 1.
config_type – one of “nn”, “keras”, “tf”
nn_define – a dict represents the structure of neural network.
optimizer –
optimizer method, accepts the following types: 1. a string, one of “Adadelta”, “Adagrad”, “Adam”, “Adamax”, “Nadam”, “RMSprop”, “SGD”; 2. a dict, with a required key-value pair keyed by “optimizer” and optional key-value pairs such as learning rate.

defaults to “SGD”
loss – a string
metrics –
max_iter – the maximum iteration for aggregation in training.
batch_size – batch size when updating model. -1 means use all data in a batch. i.e. Not to use mini-batch strategy. defaults to -1.
early_stop –
str, ‘diff’, ‘weight_diff’ or ‘abs’, default: ‘diff’. Method used to judge convergence.
1. diff: Use difference of loss between two iterations to judge whether converge.
2. weight_diff: Use difference between weights of two consecutive iterations
3. abs: Use the absolute value of loss to judge whether converge. i.e. if loss < eps, it is converged.
encode_label – encode label to one_hot.
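
The two accepted shapes of the optimizer argument (a plain string, or a dict with a required "optimizer" key) can be normalized as in the sketch below. `normalize_optimizer` is a hypothetical helper, not a federatedml function, and the "learning_rate" key used in the usage lines is an assumed spelling of the "learning rate" option the text mentions.

```python
SUPPORTED_OPTIMIZERS = {
    "Adadelta", "Adagrad", "Adam", "Adamax", "Nadam", "RMSprop", "SGD",
}


def normalize_optimizer(spec="SGD"):
    """Split an optimizer spec into (name, extra keyword arguments)."""
    if isinstance(spec, str):
        name, kwargs = spec, {}
    elif isinstance(spec, dict):
        if "optimizer" not in spec:
            raise ValueError("dict spec needs an 'optimizer' key")
        kwargs = dict(spec)  # copy so the caller's dict is untouched
        name = kwargs.pop("optimizer")
    else:
        raise TypeError("optimizer must be a str or a dict")
    if name not in SUPPORTED_OPTIMIZERS:
        raise ValueError("unsupported optimizer: %r" % (name,))
    return name, kwargs


if __name__ == "__main__":
    print(normalize_optimizer("Adam"))
    # "learning_rate" is an assumed key name for illustration.
    print(normalize_optimizer({"optimizer": "SGD", "learning_rate": 0.05}))
```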

class HomoOneHotParam(transform_col_indexes=-1, transform_col_names=None, need_run=True, need_alignment=True)

Parameters

transform_col_indexes (list or int, default: -1) – Specify which columns need to be calculated. -1 represents all columns.
need_run (bool, default True) – Indicate if this module needs to be run
need_alignment (bool, default True) – Indicate whether alignment of features is turned on

class InitParam(init_method='random_uniform', init_const=1, fit_intercept=True, random_seed=None)

Parameters used in initializing a model.

Parameters

init_method (str, 'random_uniform', 'random_normal', 'ones', 'zeros' or 'const'. default: 'random_uniform') – Initial method.
init_const (int or float, default: 1) – Required when init_method is ‘const’. Specify the constant.
fit_intercept (bool, default: True) – Whether to initialize the intercept or not.
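
The listed init_method values can be illustrated with a standard-library sketch. `init_weights` is hypothetical; the sampling ranges (uniform on [0, 1), standard normal) and appending the intercept as one extra element are assumptions made here, not documented federatedml behavior.

```python
import random


def init_weights(n_features, init_method="random_uniform", init_const=1,
                 fit_intercept=True, random_seed=None):
    """Build an initial weight list for a linear model (sketch only)."""
    rng = random.Random(random_seed)
    # Assumption: the intercept is stored as one extra trailing weight.
    size = n_features + (1 if fit_intercept else 0)
    if init_method == "random_uniform":
        return [rng.random() for _ in range(size)]
    if init_method == "random_normal":
        return [rng.gauss(0.0, 1.0) for _ in range(size)]
    if init_method == "ones":
        return [1.0] * size
    if init_method == "zeros":
        return [0.0] * size
    if init_method == "const":
        # init_const is only consulted for the 'const' method.
        return [float(init_const)] * size
    raise ValueError("unknown init_method: %r" % (init_method,))


if __name__ == "__main__":
    print(init_weights(3, "const", init_const=2, fit_intercept=False))
```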

class IntersectParam(intersect_method: str = 'raw', random_bit=128, sync_intersect_ids=True, join_role='guest', with_encode=False, only_output_key=False, encode_params=<federatedml.param.intersect_param.EncodeParam object>, intersect_cache_param=<federatedml.param.intersect_param.IntersectCache object>, repeated_id_process=False, repeated_id_owner='guest', allow_info_share: bool = False, info_owner='guest')

Define the intersect method

Parameters

intersect_method (str, it supports 'rsa' and 'raw', default: 'raw') –
random_bit (positive int, it defines the encryption length of the rsa algorithm. It is effective only when intersect_method is 'rsa') –
sync_intersect_ids (bool. In rsa, 'synchronize_intersect_ids' being True means guest or host will send intersect results to the others, and False means it will not.) – In raw, ‘synchronize_intersect_ids’ being True means the role of “join_role” will send intersect results and the others will get them. Default: True.
join_role (str, it supports "guest" and "host" only and is effective only for raw. If it is "guest", the host will send its ids to guest and find the intersection of) – ids in guest; if it is “host”, the guest will send its ids. Default: “guest”.
with_encode (bool, if True, it will use the encode method for intersect ids. It is effective only for "raw".) –
encode_params (EncodeParam, it is effective only when with_encode is True) –
only_output_key (bool, if false, the results of intersection will include the key and value from the input data; if true, it will include only the key from the input) – data and the value will be empty or some useless character like “intersect_id”
repeated_id_process (bool, if true, intersection will process ids that can be repeated) –
repeated_id_owner (str, which role has the repeated ids) –
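
The roles of join_role and sync_intersect_ids in the 'raw' method can be sketched in a single process. This is only an illustration of who ends up holding the result: `raw_intersect` is a hypothetical helper, there is no network transfer, and encoding (with_encode) is left out.

```python
def raw_intersect(guest_ids, host_ids, join_role="guest",
                  sync_intersect_ids=True):
    """Simulate a raw intersection and report what each role receives.

    The party named by `join_role` receives the other party's ids and
    computes the intersection. The other party gets the result only when
    `sync_intersect_ids` is True; otherwise it holds None.
    """
    if join_role not in ("guest", "host"):
        raise ValueError("join_role must be 'guest' or 'host'")
    common = sorted(set(guest_ids) & set(host_ids))
    other_role = "host" if join_role == "guest" else "guest"
    return {
        join_role: common,
        other_role: common if sync_intersect_ids else None,
    }


if __name__ == "__main__":
    print(raw_intersect(["a", "b", "c"], ["b", "c", "d"]))
```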

class LinearParam(penalty='L2', tol=0.0001, alpha=1.0, optimizer='sgd', batch_size=-1, learning_rate=0.01, init_param=<federatedml.param.init_model_param.InitParam object>, max_iter=20, early_stop='diff', predict_param=<federatedml.param.predict_param.PredictParam object>, encrypt_param=<federatedml.param.encrypt_param.EncryptParam object>, sqn_param=<federatedml.param.sqn_param.StochasticQuasiNewtonParam object>, encrypted_mode_calculator_param=<federatedml.param.encrypted_mode_calculation_param.EncryptedModeCalculatorParam object>, cv_param=<federatedml.param.cross_validation_param.CrossValidationParam object>, decay=1, decay_sqrt=True, validation_freqs=None, early_stopping_rounds=None, stepwise_param=<federatedml.param.stepwise_param.StepwiseParam object>, metrics=None, use_first_metric_only=False)

Parameters used for Linear Regression.

Parameters

penalty (str, 'L1' or 'L2'. default: 'L2') – Penalty method used in LinR. Please note that, when using encrypted version in HeteroLinR, ‘L1’ is not supported.
tol (float, default: 1e-4) – The tolerance of convergence
alpha (float, default: 1.0) – Regularization strength coefficient.
optimizer (str, 'sgd', 'rmsprop', 'adam', 'sqn', or 'adagrad', default: 'sgd') – Optimize method
batch_size (int, default: -1) – Batch size when updating model. -1 means use all data in a batch. i.e. Not to use mini-batch strategy.
learning_rate (float, default: 0.01) – Learning rate
max_iter (int, default: 20) – The maximum iteration for training.
init_param (InitParam object, default: default InitParam object) – Init param method object.
early_stop (str, 'diff' or 'abs' or 'weight_diff', default: 'diff') –
Method used to judge convergence.
1. diff: Use difference of loss between two iterations to judge whether converge.
2. abs: Use the absolute value of loss to judge whether converge. i.e. if loss < tol, it is converged.
3. weight_diff: Use difference between weights of two consecutive iterations
predict_param (PredictParam object, default: default PredictParam object) –
encrypt_param (EncryptParam object, default: default EncryptParam object) –
encrypted_mode_calculator_param (EncryptedModeCalculatorParam object, default: default EncryptedModeCalculatorParam object) –
cv_param (CrossValidationParam object, default: default CrossValidationParam object) –
decay (int or float, default: 1) – Decay rate for learning rate. learning rate will follow the following decay schedule. lr = lr0/(1+decay*t) if decay_sqrt is False. If decay_sqrt is True, lr = lr0 / sqrt(1+decay*t) where t is the iter number.
decay_sqrt (Bool, default: True) – lr = lr0/(1+decay*t) if decay_sqrt is False, otherwise, lr = lr0 / sqrt(1+decay*t)
validation_freqs (int, list, tuple, set, or None) – validation frequency during training, required when using early stopping. The default value is None, 1 is suggested. You can set it to a number larger than 1 in order to speed up training by skipping validation rounds. When it is larger than 1, a number that divides “max_iter” is recommended; otherwise, you will miss the validation scores of the last training iteration.
early_stopping_rounds (int, default: None) – If positive number specified, at every specified training rounds, program checks for early stopping criteria. Validation_freqs must also be set when using early stopping.
metrics (list or None, default: None) – Specify which metrics to be used when performing evaluation during training process. If metrics have not improved at early_stopping rounds, training stops before convergence. If set as empty, default metrics will be used. For regression tasks, default metrics are [‘root_mean_squared_error’, ‘mean_absolute_error’]
use_first_metric_only (bool, default: False) – Indicate whether to use the first metric in metrics as the only criterion for early stopping judgement.
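
The learning rate schedule given for decay and decay_sqrt can be evaluated directly from the two formulas. The function name `decayed_learning_rate` is hypothetical; the arithmetic follows the text exactly.

```python
import math


def decayed_learning_rate(lr0, t, decay=1, decay_sqrt=True):
    """Learning rate at iteration t, following the documented schedule.

    lr = lr0 / sqrt(1 + decay * t) if decay_sqrt is True,
    lr = lr0 / (1 + decay * t) otherwise.
    """
    if decay_sqrt:
        return lr0 / math.sqrt(1 + decay * t)
    return lr0 / (1 + decay * t)


if __name__ == "__main__":
    for t in (0, 3, 8):
        print(t,
              decayed_learning_rate(0.01, t),
              decayed_learning_rate(0.01, t, decay_sqrt=False))
```

With decay_sqrt=True the rate shrinks more slowly, which is why the two schedules diverge as t grows.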

class LocalBaselineParam(model_name='LogisticRegression', model_opts=None, predict_param=<federatedml.param.predict_param.PredictParam object>, need_run=True)

Define the local baseline model param

Parameters

model_name (str, sklearn model used to train on baseline model) –
model_opts (dict or None, default None) – Param to be used as input into baseline model
predict_param (PredictParam object, default: default PredictParam object) –
need_run (bool, default True) – Indicate if this module needed to be run

class LogisticParam(penalty='L2', tol=0.0001, alpha=1.0, optimizer='rmsprop', batch_size=-1, learning_rate=0.01, init_param=<federatedml.param.init_model_param.InitParam object>, max_iter=100, early_stop='diff', encrypt_param=<federatedml.param.encrypt_param.EncryptParam object>, predict_param=<federatedml.param.predict_param.PredictParam object>, cv_param=<federatedml.param.cross_validation_param.CrossValidationParam object>, decay=1, decay_sqrt=True, multi_class='ovr', validation_freqs=None, early_stopping_rounds=None, stepwise_param=<federatedml.param.stepwise_param.StepwiseParam object>, metrics=None, use_first_metric_only=False)

Parameters used for Logistic Regression both for Homo mode or Hetero mode.

Parameters

penalty (str, 'L1', 'L2' or None. default: 'L2') – Penalty method used in LR. Please note that, when using encrypted version in HomoLR, ‘L1’ is not supported.
tol (float, default: 1e-4) – The tolerance of convergence
alpha (float, default: 1.0) – Regularization strength coefficient.
optimizer (str, 'sgd', 'rmsprop', 'adam', 'nesterov_momentum_sgd', 'sqn' or 'adagrad', default: 'rmsprop') – Optimization method. If ‘sqn’ is set, sqn_param takes effect. Currently, ‘sqn’ supports hetero mode only.
batch_size (int, default: -1) – Batch size when updating the model. -1 means using all data in one batch, i.e. no mini-batch strategy.
learning_rate (float, default: 0.01) – Learning rate
max_iter (int, default: 100) – The maximum iteration for training.
early_stop (str, 'diff', 'weight_diff' or 'abs', default: 'diff') –
Method used to judge convergence.
1. diff: Use the difference of loss between two iterations to judge convergence.
2. weight_diff: Use the difference between the weights of two consecutive iterations.
3. abs: Use the absolute value of loss to judge convergence, i.e. if loss < eps, training has converged.
Please note that in the hetero-lr multi-host situation, this parameter supports “weight_diff” only.
decay (int or float, default: 1) – Decay rate for the learning rate. The learning rate follows this decay schedule: lr = lr0 / (1 + decay*t) if decay_sqrt is False; if decay_sqrt is True, lr = lr0 / sqrt(1 + decay*t), where t is the iteration number.
decay_sqrt (bool, default: True) – lr = lr0 / (1 + decay*t) if decay_sqrt is False; otherwise, lr = lr0 / sqrt(1 + decay*t).
encrypt_param (EncryptParam object, default: default EncryptParam object) –
predict_param (PredictParam object, default: default PredictParam object) –
cv_param (CrossValidationParam object, default: default CrossValidationParam object) –
multi_class (str, 'ovr', default: 'ovr') – For a multi-class task, indicate which strategy to use. Currently, only ‘ovr’ (short for one_vs_rest) is supported.
validation_freqs (int, list, tuple, set, or None) – Validation frequency during training.
early_stopping_rounds (int, default: None) – Training stops if one metric has not improved in the last early_stopping_rounds rounds.
metrics (list or None, default: None) – Specify which metrics to use when performing evaluation during training. If left empty, the default metrics for the specific task type are used. For binary classification, the default metrics are [‘auc’, ‘ks’].
use_first_metric_only (bool, default: False) – Indicate whether to use only the first metric for the early stopping judgement.
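The learning-rate schedule given for decay and decay_sqrt can be written out as a small function. This is a minimal sketch of the two formulas as stated, not the library's code; the function name decayed_learning_rate is hypothetical.

```python
import math


def decayed_learning_rate(lr0, decay, t, decay_sqrt=True):
    """Learning rate at iteration t for initial rate lr0.

    decay_sqrt=True : lr = lr0 / sqrt(1 + decay * t)
    decay_sqrt=False: lr = lr0 / (1 + decay * t)
    """
    if decay_sqrt:
        return lr0 / math.sqrt(1 + decay * t)
    return lr0 / (1 + decay * t)


# With the defaults learning_rate=0.01 and decay=1, compare both schedules.
for t in (0, 3, 8):
    print(t, decayed_learning_rate(0.01, 1, t, True),
          decayed_learning_rate(0.01, 1, t, False))
```

The square-root variant shrinks the learning rate more slowly, so later iterations still take comparatively large steps.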

class OneVsRestParam(need_one_vs_rest=False, has_arbiter=True)¶

Define the one_vs_rest parameters.

Parameters: has_arbiter (bool, default: True) – Some algorithms may not have an arbiter, for instance SecureBoost in FATE; for these algorithms, it should be set to False.

class PoissonParam(penalty='L2', tol=0.0001, alpha=1.0, optimizer='rmsprop', batch_size=-1, learning_rate=0.01, init_param=<federatedml.param.init_model_param.InitParam object>, max_iter=20, early_stop='diff', exposure_colname=None, predict_param=<federatedml.param.predict_param.PredictParam object>, encrypt_param=<federatedml.param.encrypt_param.EncryptParam object>, encrypted_mode_calculator_param=<federatedml.param.encrypted_mode_calculation_param.EncryptedModeCalculatorParam object>, cv_param=<federatedml.param.cross_validation_param.CrossValidationParam object>, stepwise_param=<federatedml.param.stepwise_param.StepwiseParam object>, decay=1, decay_sqrt=True, validation_freqs=None, early_stopping_rounds=None, metrics=None, use_first_metric_only=False)¶

Parameters used for Poisson Regression.

Parameters

penalty (str, 'L1' or 'L2'. default: 'L2') – Penalty method used in Poisson. Please note that, when using encrypted version in HeteroPoisson, ‘L1’ is not supported.
tol (float, default: 1e-4) – The tolerance of convergence
alpha (float, default: 1.0) – Regularization strength coefficient.
optimizer (str, 'sgd', 'rmsprop', 'adam' or 'adagrad', default: 'rmsprop') – Optimization method
batch_size (int, default: -1) – Batch size when updating the model. -1 means using all data in one batch, i.e. no mini-batch strategy.
learning_rate (float, default: 0.01) – Learning rate
max_iter (int, default: 20) – The maximum iteration for training.
init_param (InitParam object, default: default InitParam object) – Init param method object.
early_stop (str, 'weight_diff', 'diff' or 'abs', default: 'diff') –
Method used to judge convergence.
1. diff: Use the difference of loss between two iterations to judge convergence.
2. weight_diff: Use the difference between the weights of two consecutive iterations.
3. abs: Use the absolute value of loss to judge convergence, i.e. if loss < eps, training has converged.
exposure_colname (str or None, default: None) – Name of optional exposure variable in dTable.
predict_param (PredictParam object, default: default PredictParam object) –
encrypt_param (EncryptParam object, default: default EncryptParam object) –
encrypted_mode_calculator_param (EncryptedModeCalculatorParam object, default: default EncryptedModeCalculatorParam object) –
cv_param (CrossValidationParam object, default: default CrossValidationParam object) –
stepwise_param (StepwiseParam object, default: default StepwiseParam object) –
decay (int or float, default: 1) – Decay rate for the learning rate. The learning rate follows this decay schedule: lr = lr0 / (1 + decay*t) if decay_sqrt is False; if decay_sqrt is True, lr = lr0 / sqrt(1 + decay*t), where t is the iteration number.
decay_sqrt (bool, default: True) – lr = lr0 / (1 + decay*t) if decay_sqrt is False; otherwise, lr = lr0 / sqrt(1 + decay*t).
validation_freqs (int, list, tuple, set, or None) – Validation frequency during training; required when using early stopping. The default value is None; 1 is suggested. Set it to a number larger than 1 to speed up training by skipping validation rounds. When it is larger than 1, a number that divides “max_iter” evenly is recommended; otherwise, the validation scores of the last training iteration will be missed.
early_stopping_rounds (int, default: None) – If a positive number is specified, the program checks the early stopping criteria at every specified number of training rounds. validation_freqs must also be set when using early stopping.
metrics (list or None, default: None) – Specify which metrics to use when performing evaluation during training. If the metrics have not improved within early_stopping_rounds rounds, training stops before convergence. If left empty, default metrics are used. For regression tasks, the default metrics are [‘root_mean_squared_error’, ‘mean_absolute_error’].
use_first_metric_only (bool, default: False) – Indicate whether to use the first metric in metrics as the only criterion for early stopping judgement.
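The three early_stop options listed above can be compared side by side in a short sketch. It is illustrative only: the function name has_converged is hypothetical, tol is assumed to play the role of eps, and the Euclidean norm used for weight_diff is an assumption, since the text does not say how the weight difference is measured.

```python
import math


def has_converged(early_stop, tol, loss=None, prev_loss=None,
                  weights=None, prev_weights=None):
    """Check one of the three convergence criteria against tolerance tol."""
    if early_stop == "diff":
        # Difference of loss between two iterations.
        return abs(loss - prev_loss) < tol
    if early_stop == "abs":
        # Absolute value of the current loss.
        return abs(loss) < tol
    if early_stop == "weight_diff":
        # Assumption: Euclidean norm of the change in weights.
        change = math.sqrt(sum((w - p) ** 2
                               for w, p in zip(weights, prev_weights)))
        return change < tol
    raise ValueError(f"unknown early_stop method: {early_stop}")


print(has_converged("diff", 1e-4, loss=0.50001, prev_loss=0.5))
print(has_converged("abs", 1e-4, loss=0.5))
print(has_converged("weight_diff", 1e-4,
                    weights=[1.0, 2.0], prev_weights=[1.0, 2.1]))
```

Note that 'abs' only triggers when the loss itself is close to zero, whereas 'diff' and 'weight_diff' trigger when training stops making progress.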

class PredictParam(threshold=0.5)¶

Define the predict method of HomoLR, HeteroLR, SecureBoosting

Parameters: threshold (float or int, default: 0.5) – The threshold used to separate the positive and negative classes. Normally, it should be in (0, 1).
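As a minimal sketch of how such a threshold separates the two classes, the hypothetical helper classify below maps predicted probabilities to labels. Whether the comparison is strict is an assumption here; the text above does not specify how a probability exactly equal to the threshold is handled.

```python
def classify(probabilities, threshold=0.5):
    """Label a probability 1 (positive) if it exceeds threshold, else 0."""
    # Assumption: strict comparison, so a score equal to threshold is negative.
    return [1 if p > threshold else 0 for p in probabilities]


print(classify([0.2, 0.5, 0.9]))        # [0, 0, 1]
print(classify([0.2, 0.5, 0.9], 0.1))   # [1, 1, 1]
```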

class RsaParam(rsa_key_n=None, rsa_key_e=None, rsa_key_d=None, save_out_table_namespace=None, save_out_table_name=None)¶

Define the RSA parameters

Parameters

rsa_key_n (int, default: None) – RSA modulus.
rsa_key_e (int, default: None) – RSA public exponent.
rsa_key_d (int, default: None) – RSA private exponent.
save_out_table_namespace (str, default: None) – Namespace of the dTable that stores the output data.
save_out_table_name (str, default: None) – Name of the dTable that stores the output data.
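The relationship between the modulus, public exponent, and private exponent can be shown with textbook RSA on toy numbers. The primes below are illustrative only and far too small for real use; this sketch shows the arithmetic behind the three key fields, not how the library generates or consumes them.

```python
# Toy primes chosen for illustration (assumption; real keys use large primes).
p, q = 61, 53
rsa_key_n = p * q                      # RSA modulus
phi = (p - 1) * (q - 1)                # Euler's totient of n
rsa_key_e = 17                         # public exponent, coprime with phi
rsa_key_d = pow(rsa_key_e, -1, phi)    # private exponent (Python 3.8+)

message = 65
cipher = pow(message, rsa_key_e, rsa_key_n)     # encrypt with (e, n)
recovered = pow(cipher, rsa_key_d, rsa_key_n)   # decrypt with (d, n)

print(rsa_key_n, rsa_key_e, rsa_key_d)
print(recovered == message)
```

The private exponent is the modular inverse of the public exponent, which is why raising to e and then to d modulo n returns the original value.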

class SampleParam(mode='random', method='downsample', fractions=None, random_state=None, task_type='hetero', need_run=True)¶

Define the sample method

Parameters

mode (str, default: 'random') – Sampling mode to use. Only 'random' and 'stratified' are accepted in this version.
method (str, default: 'downsample') – Only 'downsample' and 'upsample' are accepted in this version.
fractions (None, float, or list, default: None) – If mode is 'random', it should be a float greater than 0; otherwise, a list of pairs like [label_i, sample_rate_i], e.g. [[0, 0.5], [1, 0.8], [2, 0.3]].
random_state (int, RandomState instance or None, default: None) –
need_run (bool, default True) – Indicate whether this module needs to be run
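The two shapes of fractions can be illustrated with a small downsampling sketch. It is not the library's sampler: the function name downsample, the (id, label) row layout, and the per-row Bernoulli draw are all assumptions made for illustration.

```python
import random


def downsample(rows, fractions, mode="random", random_state=None):
    """Keep each (id, label) row with the rate implied by fractions.

    mode="random":     fractions is a single float applied to every row.
    mode="stratified": fractions is a list of [label, sample_rate] pairs.
    """
    rng = random.Random(random_state)
    rates = None if mode == "random" else {label: rate for label, rate in fractions}
    kept = []
    for row_id, label in rows:
        rate = fractions if rates is None else rates[label]
        if rng.random() < rate:  # Bernoulli draw per row (assumption)
            kept.append((row_id, label))
    return kept


rows = [("a", 0), ("b", 1), ("c", 0), ("d", 1)]
print(downsample(rows, 0.5, mode="random", random_state=7))
print(downsample(rows, [[0, 1.0], [1, 0.0]], mode="stratified"))
```

In stratified mode each label gets its own rate, so a rare class can be kept in full while a frequent class is thinned out.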

class ScaleParam(method=None, mode='normal', scale_col_indexes=- 1, scale_names=None, feat_upper=None, feat_lower=None, with_mean=True, with_std=True, need_run=True)¶

Define the feature scale parameters.

Parameters

method (str, default: None) – Scaling method, like scale in sklearn. Currently supports "min_max_scale" and "standard_scale"; other scaling methods will be supported later. With the default None, no scaling is performed.
mode (str, default: "normal") – Supports "normal" and "cap". In "normal" mode, feat_upper and feat_lower are ordinary values like "10" or "3.1"; in "cap" mode, feat_upper and feat_lower are between 0 and 1 and denote percentiles of the column.
feat_upper (int or float, default: None) – The upper limit in the column. If the scaled value is larger than feat_upper, it is set to feat_upper.
feat_lower (int or float, default: None) – The lower limit in the column. If the scaled value is less than feat_lower, it is set to feat_lower.
scale_col_indexes (list, default: -1) – Indexes of the columns to be scaled; columns whose indexes are not listed are not scaled.
scale_names (list of string, default: []) – Specify which columns need to be scaled. Each element in the list represents a column name in the header.
with_mean (bool, default: True) – Used for "standard_scale".
with_std (bool, default: True) – Used for "standard_scale". The standard scale of column x is calculated as z = (x - u) / s, where u is the mean of the column and s is the standard deviation of the column. If with_mean is False, u will be 0, and if with_std is False, s will be 1.
need_run (bool, default True) – Indicate whether this module needs to be run
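The "standard_scale" formula z = (x - u) / s and the effect of with_mean and with_std can be sketched in plain Python. The function name standard_scale is hypothetical, and using the population standard deviation (dividing by n) and guarding against a zero deviation are assumptions, not statements about the library.

```python
import math


def standard_scale(column, with_mean=True, with_std=True):
    """Scale a column as z = (x - u) / s.

    u is the column mean (0 if with_mean is False);
    s is the column standard deviation (1 if with_std is False).
    """
    n = len(column)
    mean = sum(column) / n
    u = mean if with_mean else 0.0
    if with_std:
        # Assumption: population standard deviation (divide by n).
        s = math.sqrt(sum((x - mean) ** 2 for x in column) / n)
        if s == 0:
            s = 1.0  # avoid division by zero for constant columns
    else:
        s = 1.0
    return [(x - u) / s for x in column]


print(standard_scale([1.0, 2.0, 3.0]))
print(standard_scale([1.0, 2.0, 3.0], with_std=False))   # [-1.0, 0.0, 1.0]
print(standard_scale([1.0, 2.0, 3.0], with_mean=False, with_std=False))
```

With both flags set to False the column is returned unchanged, which matches the description of u = 0 and s = 1.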

class StochasticQuasiNewtonParam(update_interval_L=3, memory_M=5, sample_size=5000, random_seed=None)¶

Parameters used for stochastic quasi-newton method.

Parameters

update_interval_L (int, default: 3) – Number of iterations between updates of the Hessian matrix
memory_M (int, default: 5) – Stack size of curvature information, i.e. y_k and s_k in the paper.
sample_size (int, default: 5000) – Sample size of the data used to update the Hessian matrix

class StatisticsParam(statistics='summary', column_names=None, column_indexes=- 1, need_run=True, abnormal_list=None, quantile_error=0.001, bias=True)¶

Define statistics params

Parameters

statistics (list or string, default "summary") – Specify the statistic types to be computed. “summary” represents the list [consts.SUM, consts.MEAN, consts.STANDARD_DEVIATION, consts.MEDIAN, consts.MIN, consts.MAX, consts.MISSING_COUNT, consts.SKEWNESS, consts.KURTOSIS].
column_names (list of string, default []) – Specify columns to be used for statistic computation by column names in header
column_indexes (list of int, default -1) – Specify columns to be used for statistic computation by column order in header. -1 indicates that statistics are computed over all columns.
bias (bool, default: True) – If False, the calculations of skewness and kurtosis are corrected for statistical bias.
need_run (bool, default True) – Indicate whether to run this module
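The effect of the bias flag can be illustrated for skewness. This sketch assumes the correction is the standard adjusted Fisher-Pearson coefficient, which is what a bias=False option conventionally means; the text above does not state the exact formula, and the function name skewness is hypothetical.

```python
import math


def skewness(values, bias=True):
    """Sample skewness; bias=False applies the adjusted Fisher-Pearson correction."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n   # second central moment
    m3 = sum((x - mean) ** 3 for x in values) / n   # third central moment
    g1 = m3 / m2 ** 1.5                             # biased estimate
    if bias:
        return g1
    # Assumed correction factor; requires n > 2.
    return math.sqrt(n * (n - 1)) / (n - 2) * g1


data = [1.0, 2.0, 3.0, 10.0]
print(skewness(data, bias=True))
print(skewness(data, bias=False))
```

For small samples the corrected value is noticeably larger in magnitude; the two estimates converge as the sample grows.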

class StepwiseParam(score_name='AIC', mode='hetero', role='guest', direction='both', max_step=10, nvmin=2, nvmax=None, need_stepwise=False)¶

Define stepwise params

Parameters

score_name (str, default: 'AIC') – Specify which model selection criterion to be used
mode (str, default: 'hetero') – Indicate the mode of the current task
role (str, default: 'guest') – Indicate the role of the current party
direction (str, default: 'both') – Indicate which direction to go for stepwise. ‘forward’ means forward selection; ‘backward’ means backward elimination; ‘both’ means possible models of both directions are examined at each step.
max_step (int, default: 10) – Specify the total number of steps to run before a forced stop.
nvmin (int, default: 2) – Specify the min subset size of the final model; cannot be lower than 2. When nvmin > 2, the final model size may be smaller than nvmin due to the max_step limit.
nvmax (int, default: None) – Specify the max subset size of the final model, 2 <= nvmin <= nvmax. The final model size may be larger than nvmax due to the max_step limit.
need_stepwise (bool, default False) – Indicate whether this module needs to be run

class UnionParam(need_run=True, allow_missing=False, keep_duplicate=False)¶

Define the union method for combining multiple dTables and keep entries with the same id

Parameters

need_run (bool, default True) – Indicate whether this module needs to be run
allow_missing (bool, default False) – Whether to allow a mismatch between feature length and header length in the result. Note that empty tables will always be skipped regardless of this param setting.
keep_duplicate (bool, default False) – Whether to keep entries with duplicated keys. If set to True, a new id will be generated for duplicated entry in the format {id}_{table_name}.
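The keep_duplicate behaviour can be sketched with plain dictionaries standing in for dTables. The function name union_tables is hypothetical, and keeping the first-seen entry when keep_duplicate is False is an assumption; only the {id}_{table_name} format for renamed duplicates comes from the description above.

```python
def union_tables(tables, keep_duplicate=False):
    """Combine tables given as {table_name: {id: value}} into one dict."""
    merged = {}
    for table_name, table in tables.items():
        for key, value in table.items():
            if key not in merged:
                merged[key] = value
            elif keep_duplicate:
                # Duplicated entry gets a new id: {id}_{table_name}.
                merged[f"{key}_{table_name}"] = value
            # Otherwise the later duplicate is dropped (assumption).
    return merged


tables = {"t1": {"a": 1, "b": 2}, "t2": {"b": 20, "c": 3}}
print(union_tables(tables))
# {'a': 1, 'b': 2, 'c': 3}
print(union_tables(tables, keep_duplicate=True))
# {'a': 1, 'b': 2, 'b_t2': 20, 'c': 3}
```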