Federated Machine Learning

[中文]

FederatedML includes implementation of many common machine learning algorithms on federated learning. All modules are developed in a decoupling modular approach to enhance scalability. Specifically, we provide:

  1. Federated Statistic: PSI, Union, Pearson Correlation, etc.

  2. Federated Feature Engineering: Feature Sampling, Feature Binning, Feature Selection, etc.

  3. Federated Machine Learning Algorithms: LR, GBDT, DNN, TransferLearning, which support Heterogeneous and Homogeneous styles.

  4. Model Evaluation: Binary | Multiclass | Regression | Clustering Evaluation, Local vs Federated Comparison.

  5. Secure Protocol: Provides multiple security protocols for secure multi-party computing and interaction between participants.

federatedml structure

Algorithm List

Algorithm

Algorithm

Module Name

Description

Data Input

Data Output

Model Input

Model Output

Reader

Reader

This component loads and transforms data from storage engine so that data is compatible with FATE computing engine

Original Data

Transformed Data

DataIO

DataIO

This component transforms user-uploaded date into Instance object.

Table, values are raw data.

Transformed Table, values are data instance defined here

DataIO Model

Intersect

Intersection

Compute intersect data set of multiple parties without leakage of difference set information. Mainly used in hetero scenario task.

Table.

Table with only common instance keys.

Intersect Model

Federated Sampling

FederatedSample

Federated Sampling data so that its distribution become balance in each party.This module supports standalone and federated versions.

Table

Table of sampled data; both random and stratified sampling methods are supported.

Feature Scale

FeatureScale

module for feature scaling and standardization.

Table,values are instances.

Transformed Table.

Transform factors like min/max, mean/std.

Hetero Feature Binning

Hetero Feature Binning

With binning input data, calculates each column’s iv and woe and transform data according to the binned information.

Table, values are instances.

Transformed Table.

iv/woe, split points, event count, non-event count etc. of each column.

OneHot Encoder

OneHotEncoder

Transfer a column into one-hot format.

Table, values are instances.

Transformed Table with new header.

Feature-name mapping between original header and new header.

Hetero Feature Selection

HeteroFeatureSelection

Provide 5 types of filters. Each filters can select columns according to user config

Table

Transformed Table with new header and filtered data instance.

If iv filters used, hetero_binning model is needed.

Whether each column is filtered.

Union

Union

Combine multiple data tables into one.

Tables.

Table with combined values from input Tables.

Hetero-LR

HeteroLR

Build hetero logistic regression model through multiple parties.

Table, values are instances

Table, values are instances.

Logistic Regression Model, consists of model-meta and model-param.

Local Baseline

LocalBaseline

Wrapper that runs sklearn(scikit-learn) Logistic Regression model with local data.

Table, values are instances.

Table, values are instances.

Hetero-LinR

HeteroLinR

Build hetero linear regression model through multiple parties.

Table, values are instances.

Table, values are instances.

Linear Regression Model, consists of model-meta and model-param.

Hetero-Poisson

HeteroPoisson

Build hetero poisson regression model through multiple parties.

Table, values are instances.

Table, values are instances.

Poisson Regression Model, consists of model-meta and model-param.

Homo-LR

HomoLR

Build homo logistic regression model through multiple parties.

Table, values are instances.

Table, values are instances.

Logistic Regression Model, consists of model-meta and model-param.

Homo-NN

HomoNN

Build homo neural network model through multiple parties.

Table, values are instances.

Table, values are instances.

Neural Network Model, consists of model-meta and model-param.

Hetero Secure Boosting

HeteroSecureBoost

Build hetero secure boosting model through multiple parties

Table, values are instances.

Table, values are instances.

SecureBoost Model, consists of model-meta and model-param.

Hetero Fast Secure Boosting

HeteroFastSecureBoost

Build hetero secure boosting model through multiple parties in layered/mix manners.

Table, values are instances.

Table, values are instances.

FastSecureBoost Model, consists of model-meta and model-param.

Evaluation

Evaluation

Output the model evaluation metrics for user.

Table(s), values are instances.

Hetero Pearson

HeteroPearson

Calculate hetero correlation of features from different parties.

Table, values are instances.

Hetero-NN

HeteroNN

Build hetero neural network model.

Table, values are instances.

Table, values are instances.

Hetero Neural Network Model, consists of model-meta and model-param.

Homo Secure Boosting

HomoSecureBoost

Build homo secure boosting model through multiple parties

Table, values are instances.

Table, values are instances.

SecureBoost Model, consists of model-meta and model-param.

Homo OneHot Encoder

HomoOneHotEncoder

Build homo onehot encoder model through multiple parties.

Table, values are instances.

Transformed Table with new header.

Feature-name mapping between original header and new header.

Data Split

Data Split

Split one data table into 3 tables by given ratio or count

Table, values are instances.

3 Tables, values are instance.

Column Expand

Column Expand

Add arbitrary number of columns with user-provided values.

Table, values are raw data.

Transformed Table with added column(s) and new header.

Column Expand Model

Secure Information Retrieval

Secure Information Retrieval

Securely retrieves information from host through oblivious transfer

Table, values are instance

Table, values are instance

Hetero Federated Transfer Learning

Hetero FTL

Build Hetero FTL Model Between 2 party

Table, values are instance

Hetero FTL Model

Hetero KMeans

Hetero KMeans

Build Hetero KMeans model through multiple parties

Table, values are instance

Table, values are instance; Arbier outputs 2 Tables

Hetero KMeans Model

PSI

PSI module

Compute PSI value of features between two table

Table, values are instance

PSI Results

Data Statistics

Data Statistics

This component will do some statistical work on the data, including statistical mean, maximum and minimum, median, etc.

Table, values are instance

Table

Statistic Result

Scorecard

Scorecard

Scale predict score to credit score by given scaling parameters

Table, values are predict score

Table, values are score results

Params

Classes:

BoostingParam([task_type, objective_param, …])

Basic parameter for Boosting Algorithms

CrossValidationParam([n_splits, mode, role, …])

Define cross validation params

DataIOParam([input_format, delimitor, …])

Define dataio parameters that used in federated ml.

DataSplitParam([random_state, test_size, …])

Define data split param that used in data split.

DecisionTreeParam([criterion_method, …])

Define decision tree parameters that used in federated ml.

EncryptParam([method, key_length])

Define encryption method that used in federated ml.

EncryptedModeCalculatorParam([mode, …])

Define the encrypted_mode_calulator parameters.

FeatureBinningParam([method, …])

Define the feature binning method

FeatureSelectionParam([select_col_indexes, …])

Define the feature selection parameters.

HeteroNNParam([task_type, config_type, …])

Parameters used for Homo Neural Network.

HomoNNParam(secure_aggregate, …[, …])

Parameters used for Homo Neural Network.

HomoOneHotParam([transform_col_indexes, …])

param transform_col_indexes

Specify which columns need to calculated. -1 represent for all columns.

InitParam([init_method, init_const, …])

Initialize Parameters used in initializing a model.

IntersectParam(intersect_method[, …])

Define the intersect method

LinearParam([penalty, tol, alpha, …])

Parameters used for Linear Regression.

LocalBaselineParam([model_name, model_opts, …])

Define the local baseline model param

LogisticParam([penalty, tol, alpha, …])

Parameters used for Logistic Regression both for Homo mode or Hetero mode.

ObjectiveParam([objective, params])

Define objective parameters that used in federated ml.

OneVsRestParam([need_one_vs_rest, has_arbiter])

Define the one_vs_rest parameters.

PoissonParam([penalty, tol, alpha, …])

Parameters used for Poisson Regression.

PredictParam([threshold])

Define the predict method of HomoLR, HeteroLR, SecureBoosting

RsaParam([rsa_key_n, rsa_key_e, rsa_key_d, …])

Define the sample method

SampleParam([mode, method, fractions, …])

Define the sample method

ScaleParam([method, mode, …])

Define the feature scale parameters.

StatisticsParam([statistics, column_names, …])

Define statistics params

StepwiseParam([score_name, mode, role, …])

Define stepwise params

StochasticQuasiNewtonParam([…])

Parameters used for stochastic quasi-newton method.

UnionParam([need_run, allow_missing, …])

Define the union method for combining multiple dTables and keep entries with the same id

class BoostingParam(task_type='classification', objective_param=<federatedml.param.boosting_param.ObjectiveParam object>, learning_rate=0.3, num_trees=5, subsample_feature_rate=1, n_iter_no_change=True, tol=0.0001, bin_num=32, predict_param=<federatedml.param.predict_param.PredictParam object>, cv_param=<federatedml.param.cross_validation_param.CrossValidationParam object>, validation_freqs=None, metrics=None, subsample_random_seed=None, binning_error=0.001)

Basic parameter for Boosting Algorithms

Parameters
  • task_type (str, accepted 'classification', 'regression' only, default: 'classification') –

  • objective_param (ObjectiveParam Object, default: ObjectiveParam()) –

  • learning_rate (float, accepted float, int or long only, the learning rate of secure boost. default: 0.3) –

  • num_trees (int, accepted int, float only, the max number of boosting round. default: 5) –

  • subsample_feature_rate (float, a float-number in [0, 1], default: 0.8) –

  • n_iter_no_change (bool,) – when True and residual error less than tol, tree building process will stop. default: True

  • bin_num (int, positive integer greater than 1, bin number use in quantile. default: 32) –

  • validation_freqs (None or positive integer or container object in python. Do validation in training process or Not.) –

    if equals None, will not do validation in train process; if equals positive integer, will validate data every validation_freqs epochs passes; if container object in python, will validate data if epochs belong to this container.

    e.g. validation_freqs = [10, 15], will validate data when epoch equals to 10 and 15.

    Default: None

class CrossValidationParam(n_splits=5, mode='hetero', role='guest', shuffle=True, random_seed=1, need_cv=False)

Define cross validation params

Parameters
  • n_splits (int, default: 5) – Specify how many splits used in KFold

  • mode (str, default: 'Hetero') – Indicate what mode is current task

  • role (str, default: 'Guest') – Indicate what role is current party

  • shuffle (bool, default: True) – Define whether do shuffle before KFold or not.

  • random_seed (int, default: 1) – Specify the random seed for numpy shuffle

  • need_cv (bool, default True) – Indicate if this module needed to be run

class DataIOParam(input_format='dense', delimitor=',', data_type='float64', exclusive_data_type=None, tag_with_value=False, tag_value_delimitor=':', missing_fill=False, default_value=0, missing_fill_method=None, missing_impute=None, outlier_replace=False, outlier_replace_method=None, outlier_impute=None, outlier_replace_value=0, with_label=False, label_name='y', label_type='int', output_format='dense', need_run=True)

Define dataio parameters that used in federated ml.

Parameters
  • input_format (str, accepted 'dense','sparse' 'tag' only in this version. default: 'dense'.) –

    please have a look at this tutorial at “DataIO” section of federatedml/util/README.md. Formally,

    dense input format data should be set to “dense”, svm-light input format data should be set to “sparse”, tag or tag:value input format data should be set to “tag”.

  • delimitor (str, the delimitor of data input, default: ',') –

  • data_type (str, the data type of data input, accepted 'float','float64','int','int64','str','long') – “default: “float64”

  • exclusive_data_type (dict, the key of dict is col_name, the value is data_type, use to specified special data type) – of some features.

  • tag_with_value (bool, use if input_format is 'tag', if tag_with_value is True,) – input column data format should be tag[delimitor]value, otherwise is tag only

  • tag_value_delimitor (str, use if input_format is 'tag' and 'tag_with_value' is True,) – delimitor of tag[delimitor]value column value.

  • missing_fill (bool, need to fill missing value or not, accepted only True/False, default: False) –

  • default_value (None or single object type or list, the value to replace missing value.) –

    if None, it will use default value define in federatedml/feature/imputer.py, if single object, will fill missing value with this object, if list, it’s length should be the sample of input data’ feature dimension,

    means that if some column happens to have missing values, it will replace it the value by element in the identical position of this list.

    default: None

  • missing_fill_method (None or str, the method to replace missing value, should be one of [None, 'min', 'max', 'mean', 'designated'], default: None) –

  • missing_impute (None or list, element of list can be any type, or auto generated if value is None, define which values to be consider as missing, default: None) –

  • outlier_replace (bool, need to replace outlier value or not, accepted only True/False, default: True) –

  • outlier_replace_method (None or str, the method to replace missing value, should be one of [None, 'min', 'max', 'mean', 'designated'], default: None) –

  • outlier_impute (None or list, element of list can be any type, which values should be regard as missing value, default: None) –

  • outlier_replace_value (None or single object type or list, the value to replace outlier.) –

    if None, it will use default value define in federatedml/feature/imputer.py, if single object, will replace outlier with this object, if list, it’s length should be the sample of input data’ feature dimension,

    means that if some column happens to have outliers, it will replace it the value by element in the identical position of this list.

    default: None

  • with_label (bool, True if input data consist of label, False otherwise. default: 'false') –

  • label_name (str, column_name of the column where label locates, only use in dense-inputformat. default: 'y') –

  • label_type (object, accepted 'int','int64','float','float64','long','str' only,) – use when with_label is True. default: ‘false’

  • output_format (str, accepted 'dense','sparse' only in this version. default: 'dense') –

class DataSplitParam(random_state=None, test_size=None, train_size=None, validate_size=None, stratified=False, shuffle=True, split_points=None, need_run=True)

Define data split param that used in data split.

Parameters
  • random_state (None, int, default: None) – Specify the random state for shuffle.

  • test_size (None, float, int, default: 0.0) – Specify test data set size. float value specifies fraction of input data set, int value specifies exact number of data instances

  • train_size (None, float, int, default: 0.8) – Specify train data set size. float value specifies fraction of input data set, int value specifies exact number of data instances

  • validate_size (None, float, int, default: 0.2) – Specify validate data set size. float value specifies fraction of input data set, int value specifies exact number of data instances

  • stratified (boolean, default: False) – Define whether sampling should be stratified, according to label value.

  • shuffle (boolean, default : True) – Define whether do shuffle before splitting or not.

  • split_points (None, list, default : None) – Specify the point(s) by which continuous label values are bucketed into bins for stratified split. eg.[0.2] for two bins or [0.1, 1, 3] for 4 bins

  • need_run (bool, default: True) – Specify whether to run data split

class DecisionTreeParam(criterion_method='xgboost', criterion_params=[0.1], max_depth=3, min_sample_split=2, min_imputiry_split=0.001, min_leaf_node=1, max_split_nodes=65536, feature_importance_type='split', n_iter_no_change=True, tol=0.001, use_missing=False, zero_as_missing=False)

Define decision tree parameters that used in federated ml.

Parameters
  • criterion_method (str, accepted "xgboost" only, the criterion function to use, default: 'xgboost') –

  • criterion_params (list, should be non empty and first element is float-number, default: 0.1.) –

  • max_depth (int, positive integer, the max depth of a decision tree, default: 3) –

  • min_sample_split (int, least quantity of nodes to split, default: 2) –

  • min_impurity_split (float, least gain of a single split need to reach, default: 1e-3) –

  • min_leaf_node (int, when samples no more than min_leaf_node, it becomes a leave, default: 1) –

  • max_split_nodes (int, positive integer, we will use no more than max_split_nodes to) – parallel finding their splits in a batch, for memory consideration. default is 65536

  • feature_importance_type (str, support 'split', 'gain' only.) – if is ‘split’, feature_importances calculate by feature split times, if is ‘gain’, feature_importances calculate by feature split gain. default: ‘split’

  • use_missing (bool, accepted True, False only, use missing value in training process or not. default: False) –

  • zero_as_missing (bool, accepted True, False only, regard 0 as missing value or not,) – will be use only if use_missing=True, default: False

class EncryptParam(method='Paillier', key_length=1024)

Define encryption method that used in federated ml.

Parameters
  • method (str, default: 'Paillier') – If method is ‘Paillier’, Paillier encryption will be used for federated ml. To use non-encryption version in HomoLR, set this to None. For detail of Paillier encryption, please check out the paper mentioned in README file. Accepted values: {‘Paillier’, ‘IterativeAffine’, ‘Random_IterativeAffine’}

  • key_length (int, default: 1024) – Used to specify the length of key in this encryption method.

class EncryptedModeCalculatorParam(mode='strict', re_encrypted_rate=1)

Define the encrypted_mode_calulator parameters.

Parameters
  • mode (str, support 'strict', 'fast', 'balance', 'confusion_opt', ' only, default: strict) –

  • re_encrypted_rate (float or int, numeric number in [0, 1], use when mode equals to 'balance, default: 1) –

class FeatureBinningParam(method='quantile', compress_thres=10000, head_size=10000, error=0.001, bin_num=10, bin_indexes=-1, bin_names=None, adjustment_factor=0.5, transform_param=<federatedml.param.feature_binning_param.TransformParam object>, optimal_binning_param=<federatedml.param.feature_binning_param.OptimalBinningParam object>, local_only=False, category_indexes=None, category_names=None, need_run=True, skip_static=False)

Define the feature binning method

Parameters
  • method (str, 'quantile', 'bucket' or 'optimal', default: 'quantile') – Binning method.

  • compress_thres (int, default: 10000) – When the number of saved summaries exceed this threshold, it will call its compress function

  • head_size (int, default: 10000) – The buffer size to store inserted observations. When head list reach this buffer size, the QuantileSummaries object start to generate summary(or stats) and insert into its sampled list.

  • error (float, 0 <= error < 1 default: 0.001) – The error of tolerance of binning. The final split point comes from original data, and the rank of this value is close to the exact rank. More precisely, floor((p - 2 * error) * N) <= rank(x) <= ceil((p + 2 * error) * N) where p is the quantile in float, and N is total number of data.

  • bin_num (int, bin_num > 0, default: 10) – The max bin number for binning

  • bin_indexes (list of int or int, default: -1) – Specify which columns need to be binned. -1 represent for all columns. If you need to indicate specific cols, provide a list of header index instead of -1.

  • bin_names (list of string, default: []) – Specify which columns need to calculated. Each element in the list represent for a column name in header.

  • adjustment_factor (float, default: 0.5) – the adjustment factor when calculating WOE. This is useful when there is no event or non-event in a bin. Please note that this parameter will NOT take effect for setting in host.

  • category_indexes (list of int or int, default: []) –

    Specify which columns are category features. -1 represent for all columns. List of int indicate a set of such features. For category features, bin_obj will take its original values as split_points and treat them as have been binned. If this is not what you expect, please do NOT put it into this parameters.

    The number of categories should not exceed bin_num set above.

  • category_names (list of string, default: []) – Use column names to specify category features. Each element in the list represent for a column name in header.

  • local_only (bool, default: False) – Whether just provide binning method to guest party. If true, host party will do nothing.

  • transform_param (TransformParam) – Define how to transfer the binned data.

  • need_run (bool, default True) – Indicate if this module needed to be run

  • skip_static (bool, default False) – If true, binning will not calculate iv, woe etc. In this case, optimal-binning will not be supported.

class FeatureSelectionParam(select_col_indexes=-1, select_names=None, filter_methods=None, unique_param=<federatedml.param.feature_selection_param.UniqueValueParam object>, iv_value_param=<federatedml.param.feature_selection_param.IVValueSelectionParam object>, iv_percentile_param=<federatedml.param.feature_selection_param.IVPercentileSelectionParam object>, iv_top_k_param=<federatedml.param.feature_selection_param.IVTopKParam object>, variance_coe_param=<federatedml.param.feature_selection_param.VarianceOfCoeSelectionParam object>, outlier_param=<federatedml.param.feature_selection_param.OutlierColsSelectionParam object>, manually_param=<federatedml.param.feature_selection_param.ManuallyFilterParam object>, percentage_value_param=<federatedml.param.feature_selection_param.PercentageValueParam object>, iv_param=<federatedml.param.feature_selection_param.CommonFilterParam object>, statistic_param=<federatedml.param.feature_selection_param.CommonFilterParam object>, psi_param=<federatedml.param.feature_selection_param.CommonFilterParam object>, sbt_param=<federatedml.param.feature_selection_param.CommonFilterParam object>, need_run=True)

Define the feature selection parameters.

Parameters
  • select_col_indexes (list or int, default: -1) – Specify which columns need to calculated. -1 represent for all columns.

  • select_names (list of string, default: []) – Specify which columns need to calculated. Each element in the list represent for a column name in header.

  • filter_methods (list, ["manually", "iv_filter", "statistic_filter",) –

    “psi_filter”, “hetero_sbt_filter”, “homo_sbt_filter”,

    ”hetero_fast_sbt_filter”, “percentage_value”],

    default: [“manually”]

    The following methods will be deprecated in future version: “unique_value”, “iv_value_thres”, “iv_percentile”, “coefficient_of_variation_value_thres”, “outlier_cols”

    Specify the filter methods used in feature selection. The orders of filter used is depended on this list. Please be notified that, if a percentile method is used after some certain filter method, the percentile represent for the ratio of rest features.

    e.g. If you have 10 features at the beginning. After first filter method, you have 8 rest. Then, you want top 80% highest iv feature. Here, we will choose floor(0.8 * 8) = 6 features instead of 8.

  • unique_param (filter the columns if all values in this feature is the same) –

  • iv_value_param (Use information value to filter columns. If this method is set, a float threshold need to be provided.) – Filter those columns whose iv is smaller than threshold. Will be deprecated in the future.

  • iv_percentile_param (Use information value to filter columns. If this method is set, a float ratio threshold) – need to be provided. Pick floor(ratio * feature_num) features with higher iv. If multiple features around the threshold are same, all those columns will be keep. Will be deprecated in the future.

  • variance_coe_param (Use coefficient of variation to judge whether filtered or not.) – Will be deprecated in the future.

  • outlier_param (Filter columns whose certain percentile value is larger than a threshold.) – Will be deprecated in the future.

  • percentage_value_param (Filter the columns that have a value that exceeds a certain percentage.) –

  • iv_param (Setting how to filter base on iv. It support take high mode only. All of "threshold",) – “top_k” and “top_percentile” are accepted. Check more details in CommonFilterParam. To use this filter, hetero-feature-binning module has to be provided.

  • statistic_param (Setting how to filter base on statistic values. All of "threshold",) – “top_k” and “top_percentile” are accepted. Check more details in CommonFilterParam. To use this filter, data_statistic module has to be provided.

  • psi_param (Setting how to filter base on psi values. All of "threshold",) – “top_k” and “top_percentile” are accepted. Its take_high properties should be False to choose lower psi features. Check more details in CommonFilterParam. To use this filter, data_statistic module has to be provided.

  • need_run (bool, default True) – Indicate if this module needed to be run

class HeteroNNParam(task_type='classification', config_type='keras', bottom_nn_define=None, top_nn_define=None, interactive_layer_define=None, interactive_layer_lr=0.9, optimizer='SGD', loss=None, epochs=100, batch_size=-1, early_stop='diff', tol=1e-05, encrypt_param=<federatedml.param.encrypt_param.EncryptParam object>, encrypted_mode_calculator_param=<federatedml.param.encrypted_mode_calculation_param.EncryptedModeCalculatorParam object>, predict_param=<federatedml.param.predict_param.PredictParam object>, cv_param=<federatedml.param.cross_validation_param.CrossValidationParam object>, validation_freqs=None, early_stopping_rounds=None, metrics=None, use_first_metric_only=True)

Parameters used for Homo Neural Network.

Parameters
  • task_type – str, task type of hetero nn model, one of ‘classification’, ‘regression’.

  • config_type – str, accept “keras” only.

  • bottom_nn_define – a dict represents the structure of bottom neural network.

  • interactive_layer_define – a dict represents the structure of interactive layer.

  • interactive_layer_lr – float, the learning rate of interactive layer.

  • top_nn_define – a dict represents the structure of top neural network.

  • optimizer

    optimizer method, accept following types: 1. a string, one of “Adadelta”, “Adagrad”, “Adam”, “Adamax”, “Nadam”, “RMSprop”, “SGD” 2. a dict, with a required key-value pair keyed by “optimizer”,

    with optional key-value pairs such as learning rate.

    defaults to “SGD”

  • loss – str, a string to define loss function used

  • early_stopping_rounds – int, default: None

  • stop training if one metric doesn’t improve in last early_stopping_round rounds (Will) –

  • metrics – list, default: None Indicate when executing evaluation during train process, which metrics will be used. If not set, default metrics for specific task type will be used. As for binary classification, default metrics are [‘auc’, ‘ks’], for regression tasks, default metrics are [‘root_mean_squared_error’, ‘mean_absolute_error’], [ACCURACY, PRECISION, RECALL] for multi-classification task

  • use_first_metric_only – bool, default: False Indicate whether to use the first metric in metrics as the only criterion for early stopping judgement.

  • epochs – int, the maximum iteration for aggregation in training.

  • batch_size – int, batch size when updating model. -1 means use all data in a batch. i.e. Not to use mini-batch strategy. defaults to -1.

  • early_stop

    str, accept ‘diff’ only in this version, default: ‘diff’ Method used to judge converge or not.

    1. diff: Use difference of loss between two iterations to judge whether converge.

  • validation_freqs

    None or positive integer or container object in python. Do validation in training process or Not. if equals None, will not do validation in train process; if equals positive integer, will validate data every validation_freqs epochs passes; if container object in python, will validate data if epochs belong to this container.

    e.g. validation_freqs = [10, 15], will validate data when epoch equals to 10 and 15.

    Default: None The default value is None, 1 is suggested. You can set it to a number larger than 1 in order to speed up training by skipping validation rounds. When it is larger than 1, a number which is divisible by “epochs” is recommended, otherwise, you will miss the validation scores of last training epoch.

class HomoNNParam(secure_aggregate: bool = True, aggregate_every_n_epoch: int = 1, config_type: str = 'nn', nn_define: Optional[dict] = None, optimizer: Union[str, dict, types.SimpleNamespace] = 'SGD', loss: Optional[str] = None, metrics: Optional[Union[str, list]] = None, max_iter: int = 100, batch_size: int = -1, early_stop: Union[str, dict, types.SimpleNamespace] = 'diff', encode_label: bool = False, predict_param=<federatedml.param.predict_param.PredictParam object>, cv_param=<federatedml.param.cross_validation_param.CrossValidationParam object>)

Parameters used for Homo Neural Network.

Parameters

Args

secure_aggregate: enable secure aggregation or not, defaults to True. aggregate_every_n_epoch: aggregate model every n epoch, defaults to 1. config_type: one of “nn”, “keras”, “tf” nn_define: a dict represents the structure of neural network. optimizer: optimizer method, accept following types:

  1. a string, one of “Adadelta”, “Adagrad”, “Adam”, “Adamax”, “Nadam”, “RMSprop”, “SGD”

  2. a dict, with a required key-value pair keyed by “optimizer”,

    with optional key-value pairs such as learning rate.

defaults to “SGD”

loss: a string metrics: max_iter: the maximum iteration for aggregation in training. batch_size : batch size when updating model.

-1 means use all data in a batch. i.e. Not to use mini-batch strategy. defaults to -1.

early_stopstr, ‘diff’, ‘weight_diff’ or ‘abs’, default: ‘diff’
Method used to judge converge or not.
  1. diff: Use difference of loss between two iterations to judge whether converge.

  2. weight_diff: Use difference between weights of two consecutive iterations

  3. abs: Use the absolute value of loss to judge whether converge. i.e. if loss < eps, it is converged.

encode_label : encode label to one_hot.

class HomoOneHotParam(transform_col_indexes=- 1, transform_col_names=None, need_run=True, need_alignment=True)
Parameters
  • transform_col_indexes (list or int, default: -1) – Specify which columns need to calculated. -1 represent for all columns.

  • need_run (bool, default True) – Indicate if this module needed to be run

  • need_alignment (bool, default True) – Indicated whether alignment of features is turned on

class InitParam(init_method='random_uniform', init_const=1, fit_intercept=True, random_seed=None)

Initialize Parameters used in initializing a model.

Parameters
  • init_method (str, 'random_uniform', 'random_normal', 'ones', 'zeros' or 'const'. default: 'random_uniform') – Initial method.

  • init_const (int or float, default: 1) – Required when init_method is ‘const’. Specify the constant.

  • fit_intercept (bool, default: True) – Whether to initialize the intercept or not.

class IntersectParam(intersect_method: str = 'raw', random_bit=128, sync_intersect_ids=True, join_role='guest', with_encode=False, only_output_key=False, encode_params=<federatedml.param.intersect_param.EncodeParam object>, intersect_cache_param=<federatedml.param.intersect_param.IntersectCache object>, repeated_id_process=False, repeated_id_owner='guest', allow_info_share: bool = False, info_owner='guest')

Define the intersect method

Parameters
  • intersect_method (str, it supports 'rsa' and 'raw', default by 'raw') –

  • random_bit (positive int, it will define the encrypt length of rsa algorithm. It effective only for intersect_method is rsa) –

  • sync_intersect_ids (bool. In rsa, 'synchronize_intersect_ids' is True means guest or host will send intersect results to the others, and False will not.) – while in raw, ‘synchronize_intersect_ids’ is True means the role of “join_role” will send intersect results and the others will get them. Default by True.

  • join_role (str, it supports "guest" and "host" only and effective only for raw. If it is "guest", the host will send its ids to guest and find the intersection of) – ids in guest; if it is “host”, the guest will send its ids. Default by “guest”.

  • with_encode (bool, if True, it will use encode method for intersect ids. It effective only for "raw".) –

  • encode_params (EncodeParam, it effective only for with_encode is True) –

  • only_output_key (bool, if false, the results of intersection will include key and value which from input data; if true, it will just include key from input) – data and the value will be empty or some useless character like “intersect_id”

  • repeated_id_process (bool, if true, intersection will process the ids which can be repeatable) –

  • repeated_id_owner (str, which role has the repeated ids) –

class LinearParam(penalty='L2', tol=0.0001, alpha=1.0, optimizer='sgd', batch_size=-1, learning_rate=0.01, init_param=<federatedml.param.init_model_param.InitParam object>, max_iter=20, early_stop='diff', predict_param=<federatedml.param.predict_param.PredictParam object>, encrypt_param=<federatedml.param.encrypt_param.EncryptParam object>, sqn_param=<federatedml.param.sqn_param.StochasticQuasiNewtonParam object>, encrypted_mode_calculator_param=<federatedml.param.encrypted_mode_calculation_param.EncryptedModeCalculatorParam object>, cv_param=<federatedml.param.cross_validation_param.CrossValidationParam object>, decay=1, decay_sqrt=True, validation_freqs=None, early_stopping_rounds=None, stepwise_param=<federatedml.param.stepwise_param.StepwiseParam object>, metrics=None, use_first_metric_only=False)

Parameters used for Linear Regression.

Parameters
  • penalty (str, 'L1' or 'L2'. default: 'L2') – Penalty method used in LinR. Please note that, when using encrypted version in HeteroLinR, ‘L1’ is not supported.

  • tol (float, default: 1e-4) – The tolerance of convergence

  • alpha (float, default: 1.0) – Regularization strength coefficient.

  • optimizer (str, 'sgd', 'rmsprop', 'adam', 'sqn', or 'adagrad', default: 'sgd') – Optimize method

  • batch_size (int, default: -1) – Batch size when updating model. -1 means use all data in a batch. i.e. Not to use mini-batch strategy.

  • learning_rate (float, default: 0.01) – Learning rate

  • max_iter (int, default: 20) – The maximum iteration for training.

  • init_param (InitParam object, default: default InitParam object) – Init param method object.

  • early_stop (str, 'diff' or 'abs' or 'weight_dff', default: 'diff') –

    Method used to judge convergence.
    1. diff: Use difference of loss between two iterations to judge whether converge.

    2. abs: Use the absolute value of loss to judge whether converge. i.e. if loss < tol, it is converged.

    3. weight_diff: Use difference between weights of two consecutive iterations

  • predict_param (PredictParam object, default: default PredictParam object) –

  • encrypt_param (EncryptParam object, default: default EncryptParam object) –

  • encrypted_mode_calculator_param (EncryptedModeCalculatorParam object, default: default EncryptedModeCalculatorParam object) –

  • cv_param (CrossValidationParam object, default: default CrossValidationParam object) –

  • decay (int or float, default: 1) – Decay rate for learning rate. learning rate will follow the following decay schedule. lr = lr0/(1+decay*t) if decay_sqrt is False. If decay_sqrt is True, lr = lr0 / sqrt(1+decay*t) where t is the iter number.

  • decay_sqrt (Bool, default: True) – lr = lr0/(1+decay*t) if decay_sqrt is False, otherwise, lr = lr0 / sqrt(1+decay*t)

  • validation_freqs (int, list, tuple, set, or None) – validation frequency during training, required when using early stopping. The default value is None, 1 is suggested. You can set it to a number larger than 1 in order to speed up training by skipping validation rounds. When it is larger than 1, a number which is divisible by “max_iter” is recommended, otherwise, you will miss the validation scores of the last training iteration.

  • early_stopping_rounds (int, default: None) – If positive number specified, at every specified training rounds, program checks for early stopping criteria. Validation_freqs must also be set when using early stopping.

  • metrics (list or None, default: None) – Specify which metrics to be used when performing evaluation during training process. If metrics have not improved at early_stopping rounds, trianing stops before convergence. If set as empty, default metrics will be used. For regression tasks, default metrics are [‘root_mean_squared_error’, ‘mean_absolute_error’]

  • use_first_metric_only (bool, default: False) – Indicate whether to use the first metric in metrics as the only criterion for early stopping judgement.

class LocalBaselineParam(model_name='LogisticRegression', model_opts=None, predict_param=<federatedml.param.predict_param.PredictParam object>, need_run=True)

Define the local baseline model param

Parameters
  • model_name (str, sklearn model used to train on baseline model) –

  • model_opts (dict or none, default None) – Param to be used as input into baseline model

  • predict_param (PredictParam object, default: default PredictParam object) –

  • need_run (bool, default True) – Indicate if this module needed to be run

class LogisticParam(penalty='L2', tol=0.0001, alpha=1.0, optimizer='rmsprop', batch_size=-1, learning_rate=0.01, init_param=<federatedml.param.init_model_param.InitParam object>, max_iter=100, early_stop='diff', encrypt_param=<federatedml.param.encrypt_param.EncryptParam object>, predict_param=<federatedml.param.predict_param.PredictParam object>, cv_param=<federatedml.param.cross_validation_param.CrossValidationParam object>, decay=1, decay_sqrt=True, multi_class='ovr', validation_freqs=None, early_stopping_rounds=None, stepwise_param=<federatedml.param.stepwise_param.StepwiseParam object>, metrics=None, use_first_metric_only=False)

Parameters used for Logistic Regression both for Homo mode or Hetero mode.

Parameters
  • penalty (str, 'L1', 'L2' or None. default: 'L2') – Penalty method used in LR. Please note that, when using encrypted version in HomoLR, ‘L1’ is not supported.

  • tol (float, default: 1e-4) – The tolerance of convergence

  • alpha (float, default: 1.0) – Regularization strength coefficient.

  • optimizer (str, 'sgd', 'rmsprop', 'adam', 'nesterov_momentum_sgd', 'sqn' or 'adagrad', default: 'rmsprop') – Optimize method, if ‘sqn’ has been set, sqn_param will take effect. Currently, ‘sqn’ support hetero mode only.

  • batch_size (int, default: -1) – Batch size when updating model. -1 means use all data in a batch. i.e. Not to use mini-batch strategy.

  • learning_rate (float, default: 0.01) – Learning rate

  • max_iter (int, default: 100) – The maximum iteration for training.

  • early_stop (str, 'diff', 'weight_diff' or 'abs', default: 'diff') –

    Method used to judge converge or not.
    1. diff: Use difference of loss between two iterations to judge whether converge.

    2. weight_diff: Use difference between weights of two consecutive iterations

    3. abs: Use the absolute value of loss to judge whether converge. i.e. if loss < eps, it is converged.

    Please note that for hetero-lr multi-host situation, this parameter support “weight_diff” only.

  • decay (int or float, default: 1) – Decay rate for learning rate. learning rate will follow the following decay schedule. lr = lr0/(1+decay*t) if decay_sqrt is False. If decay_sqrt is True, lr = lr0 / sqrt(1+decay*t) where t is the iter number.

  • decay_sqrt (Bool, default: True) – lr = lr0/(1+decay*t) if decay_sqrt is False, otherwise, lr = lr0 / sqrt(1+decay*t)

  • encrypt_param (EncryptParam object, default: default EncryptParam object) –

  • predict_param (PredictParam object, default: default PredictParam object) –

  • cv_param (CrossValidationParam object, default: default CrossValidationParam object) –

  • multi_class (str, 'ovr', default: 'ovr') – If it is a multi_class task, indicate what strategy to use. Currently, support ‘ovr’ short for one_vs_rest only.

  • validation_freqs (int, list, tuple, set, or None) – validation frequency during training.

  • early_stopping_rounds (int, default: None) – Will stop training if one metric doesn’t improve in last early_stopping_round rounds

  • metrics (list or None, default: None) – Indicate when executing evaluation during train process, which metrics will be used. If set as empty, default metrics for specific task type will be used. As for binary classification, default metrics are [‘auc’, ‘ks’]

  • use_first_metric_only (bool, default: False) – Indicate whether use the first metric only for early stopping judgement.

class ObjectiveParam(objective='cross_entropy', params=None)

Define objective parameters that used in federated ml.

Parameters
  • objective (None or str, accepted None,'cross_entropy','lse','lae','log_cosh','tweedie','fair','huber' only,) – None in host’s config, should be str in guest’config. when task_type is classification, only support cross_entropy, other 6 types support in regression task. default: None

  • params (None or list, should be non empty list when objective is 'tweedie','fair','huber',) – first element of list shoulf be a float-number large than 0.0 when objective is ‘fair’,’huber’, first element of list should be a float-number in [1.0, 2.0) when objective is ‘tweedie’

class OneVsRestParam(need_one_vs_rest=False, has_arbiter=True)

Define the one_vs_rest parameters.

Parameters

has_arbiter (bool. For some algorithm, may not has arbiter, for instances, secureboost of FATE,) – for these algorithms, it should be set to false. default true

class PoissonParam(penalty='L2', tol=0.0001, alpha=1.0, optimizer='rmsprop', batch_size=-1, learning_rate=0.01, init_param=<federatedml.param.init_model_param.InitParam object>, max_iter=20, early_stop='diff', exposure_colname=None, predict_param=<federatedml.param.predict_param.PredictParam object>, encrypt_param=<federatedml.param.encrypt_param.EncryptParam object>, encrypted_mode_calculator_param=<federatedml.param.encrypted_mode_calculation_param.EncryptedModeCalculatorParam object>, cv_param=<federatedml.param.cross_validation_param.CrossValidationParam object>, stepwise_param=<federatedml.param.stepwise_param.StepwiseParam object>, decay=1, decay_sqrt=True, validation_freqs=None, early_stopping_rounds=None, metrics=None, use_first_metric_only=False)

Parameters used for Poisson Regression.

Parameters
  • penalty (str, 'L1' or 'L2'. default: 'L2') – Penalty method used in Poisson. Please note that, when using encrypted version in HeteroPoisson, ‘L1’ is not supported.

  • tol (float, default: 1e-4) – The tolerance of convergence

  • alpha (float, default: 1.0) – Regularization strength coefficient.

  • optimizer (str, 'sgd', 'rmsprop', 'adam' or 'adagrad', default: 'rmsprop') – Optimize method

  • batch_size (int, default: -1) – Batch size when updating model. -1 means use all data in a batch. i.e. Not to use mini-batch strategy.

  • learning_rate (float, default: 0.01) – Learning rate

  • max_iter (int, default: 20) – The maximum iteration for training.

  • init_param (InitParam object, default: default InitParam object) – Init param method object.

  • early_stop (str, 'weight_diff', 'diff' or 'abs', default: 'diff') –

    Method used to judge convergence.
    1. diff: Use difference of loss between two iterations to judge whether converge.

    2. weight_diff: Use difference between weights of two consecutive iterations

    3. abs: Use the absolute value of loss to judge whether converge. i.e. if loss < eps, it is converged.

  • exposure_colname (str or None, default: None) – Name of optional exposure variable in dTable.

  • predict_param (PredictParam object, default: default PredictParam object) –

  • encrypt_param (EncryptParam object, default: default EncryptParam object) –

  • encrypted_mode_calculator_param (EncryptedModeCalculatorParam object, default: default EncryptedModeCalculatorParam object) –

  • cv_param (CrossValidationParam object, default: default CrossValidationParam object) –

  • stepwise_param (StepwiseParam object, default: default StepwiseParam object) –

  • decay (int or float, default: 1) – Decay rate for learning rate. learning rate will follow the following decay schedule. lr = lr0/(1+decay*t) if decay_sqrt is False. If decay_sqrt is True, lr = lr0 / sqrt(1+decay*t) where t is the iter number.

  • decay_sqrt (Bool, default: True) – lr = lr0/(1+decay*t) if decay_sqrt is False, otherwise, lr = lr0 / sqrt(1+decay*t)

  • validation_freqs (int, list, tuple, set, or None) – validation frequency during training, required when using early stopping. The default value is None, 1 is suggested. You can set it to a number larger than 1 in order to speed up training by skipping validation rounds. When it is larger than 1, a number which is divisible by “max_iter” is recommended, otherwise, you will miss the validation scores of the last training iteration.

  • early_stopping_rounds (int, default: None) – If positive number specified, at every specified training rounds, program checks for early stopping criteria. Validation_freqs must also be set when using early stopping.

  • metrics (list or None, default: None) – Specify which metrics to be used when performing evaluation during training process. If metrics have not improved at early_stopping rounds, trianing stops before convergence. If set as empty, default metrics will be used. For regression tasks, default metrics are [‘root_mean_squared_error’, ‘mean_absolute_error’]

  • use_first_metric_only (bool, default: False) – Indicate whether to use the first metric in metrics as the only criterion for early stopping judgement.

class PredictParam(threshold=0.5)

Define the predict method of HomoLR, HeteroLR, SecureBoosting

Parameters

threshold (float or int, The threshold use to separate positive and negative class. Normally, it should be (0,1)) –

class RsaParam(rsa_key_n=None, rsa_key_e=None, rsa_key_d=None, save_out_table_namespace=None, save_out_table_name=None)

Define the sample method

Parameters
  • rsa_key_n (integer, RSA modulus, default: None) –

  • rsa_key_e (integer, RSA public exponent, default: None) –

  • rsa_key_d (integer, RSA private exponent, default: None) –

  • save_out_table_namespace (str, namespace of dtable where stores the output data. default: None) –

  • save_out_table_name (str, name of dtable where stores the output data. default: None) –

class SampleParam(mode='random', method='downsample', fractions=None, random_state=None, task_type='hetero', need_run=True)

Define the sample method

Parameters
  • mode (str, accepted 'random','stratified'' only in this version, specify sample to use, default: 'random') –

  • method (str, accepted 'downsample','upsample' only in this version. default: 'downsample') –

  • fractions (None or float or list, if mode equals to random, it should be a float number greater than 0,) – otherwise a list of elements of pairs like [label_i, sample_rate_i], e.g. [[0, 0.5], [1, 0.8], [2, 0.3]]. default: None

  • random_state (int, RandomState instance or None, default: None) –

  • need_run (bool, default True) – Indicate if this module needed to be run

class ScaleParam(method=None, mode='normal', scale_col_indexes=- 1, scale_names=None, feat_upper=None, feat_lower=None, with_mean=True, with_std=True, need_run=True)

Define the feature scale parameters.

Parameters
  • method (str, like scale in sklearn, now it support "min_max_scale" and "standard_scale", and will support other scale method soon.) – Default None, which will do nothing for scale

  • mode (str, the mode support "normal" and "cap". for mode is "normal", the feat_upper and feat_lower is the normal value like "10" or "3.1" and for "cap", feat_upper and) – feature_lower will between 0 and 1, which means the percentile of the column. Default “normal”

  • feat_upper (int or float, the upper limit in the column. If the scaled value is larger than feat_upper, it will be set to feat_upper. Default None.) –

  • feat_lower (int or float, the lower limit in the column. If the scaled value is less than feat_lower, it will be set to feat_lower. Default None.) –

  • scale_col_indexes (list,the idx of column in scale_column_idx will be scaled, while the idx of column is not in, it will not be scaled.) –

  • scale_names (list of string, default: []Specify which columns need to scaled. Each element in the list represent for a column name in header.) –

  • with_mean (bool, used for "standard_scale". Default False.) –

  • with_std (bool, used for "standard_scale". Default False.) – The standard scale of column x is calculated as : z = (x - u) / s, where u is the mean of the column and s is the standard deviation of the column. if with_mean is False, u will be 0, and if with_std is False, s will be 1.

  • need_run (bool, default True) – Indicate if this module needed to be run

class StatisticsParam(statistics='summary', column_names=None, column_indexes=- 1, need_run=True, abnormal_list=None, quantile_error=0.001, bias=True)

Define statistics params

Parameters
  • statistics (list, string, default "summary") –

    Specify the statistic types to be computed. “summary” represents list: [consts.SUM, consts.MEAN, consts.STANDARD_DEVIATION,

    consts.MEDIAN, consts.MIN, consts.MAX, consts.MISSING_COUNT, consts.SKEWNESS, consts.KURTOSIS]

  • column_names (list of string, default []) – Specify columns to be used for statistic computation by column names in header

  • column_indexes (list of int, default -1) – Specify columns to be used for statistic computation by column order in header -1 indicates to compute statistics over all columns

  • bias (bool, default: True) – If False, the calculations of skewness and kurtosis are corrected for statistical bias.

  • need_run (bool, default True) – Indicate whether to run this modules

class StepwiseParam(score_name='AIC', mode='hetero', role='guest', direction='both', max_step=10, nvmin=2, nvmax=None, need_stepwise=False)

Define stepwise params

Parameters
  • score_name (str, default: 'AIC') – Specify which model selection criterion to be used

  • mode (str, default: 'Hetero') – Indicate what mode is current task

  • role (str, default: 'Guest') – Indicate what role is current party

  • direction (str, default: 'both') – Indicate which direction to go for stepwise. ‘forward’ means forward selection; ‘backward’ means elimination; ‘both’ means possible models of both directions are examined at each step.

  • max_step (int, default: '10') – Specify total number of steps to run before forced stop.

  • nvmin (int, default: '2') – Specify the min subset size of final model, cannot be lower than 2. When nvmin > 2, the final model size may be smaller than nvmin due to max_step limit.

  • nvmax (int, default: None) – Specify the max subset size of final model, 2 <= nvmin <= nvmax. The final model size may be larger than nvmax due to max_step limit.

  • need_stepwise (bool, default False) – Indicate if this module needed to be run

class StochasticQuasiNewtonParam(update_interval_L=3, memory_M=5, sample_size=5000, random_seed=None)

Parameters used for stochastic quasi-newton method.

Parameters
  • update_interval_L (int, default: 3) – Set how many iteration to update hess matrix

  • memory_M (int, default: 5) – Stack size of curvature information, i.e. y_k and s_k in the paper.

  • sample_size (int, default: 5000) – Sample size of data that used to update Hess matrix

class UnionParam(need_run=True, allow_missing=False, keep_duplicate=False)

Define the union method for combining multiple dTables and keep entries with the same id

Parameters
  • need_run (bool, default True) – Indicate if this module needed to be run

  • allow_missing (bool, default False) – Whether allow mismatch between feature length and header length in the result. Note that empty tables will always be skipped regardless of this param setting.

  • keep_duplicate (bool, default False) – Whether to keep entries with duplicated keys. If set to True, a new id will be generated for duplicated entry in the format {id}_{table_name}.