Params

Classes:

BoostingParam([task_type, objective_param, …])

Basic parameter for Boosting Algorithms

ObjectiveParam([objective, params])

Define objective parameters that used in federated ml.

DecisionTreeParam([criterion_method, …])

Define decision tree parameters that used in federated ml.

CrossValidationParam([n_splits, mode, role, …])

Define cross validation params

DataSplitParam([random_state, test_size, …])

Define data split param that used in data split.

DataIOParam([input_format, delimitor, …])

Define dataio parameters that used in federated ml.

DataTransformParam([input_format, …])

Define data transform parameters that used in federated ml.

EncryptParam([method, key_length])

Define encryption method that used in federated ml.

EncryptedModeCalculatorParam([mode, …])

Define the encrypted_mode_calulator parameters.

FeatureBinningParam([method, …])

Define the feature binning method

FeatureSelectionParam([select_col_indexes, …])

Define the feature selection parameters.

HeteroNNParam([task_type, config_type, …])

Parameters used for Hetero Neural Network.

HomoNNParam(api_version, secure_aggregate, …)

Parameters used for Homo Neural Network.

HomoOneHotParam([transform_col_indexes, …])

param transform_col_indexes

Specify which columns need to calculated. -1 represent for all columns.

InitParam([init_method, init_const, …])

Initialize Parameters used in initializing a model.

IntersectParam(intersect_method[, …])

Define the intersect method

EncodeParam([salt, encode_method, base64])

Define the hash method for raw intersect method

RSAParam([salt, hash_method, …])

Define the hash method for RSA intersect method

LinearParam([penalty, tol, alpha, …])

Parameters used for Linear Regression.

LocalBaselineParam([model_name, model_opts, …])

Define the local baseline model param

LogisticParam([penalty, tol, alpha, …])

Parameters used for Logistic Regression both for Homo mode or Hetero mode.

OneVsRestParam([need_one_vs_rest, has_arbiter])

Define the one_vs_rest parameters.

PoissonParam([penalty, tol, alpha, …])

Parameters used for Poisson Regression.

PredictParam([threshold])

Define the predict method of HomoLR, HeteroLR, SecureBoosting

RsaParam([rsa_key_n, rsa_key_e, rsa_key_d, …])

Define the sample method

SampleParam([mode, method, fractions, …])

Define the sample method

ScaleParam([method, mode, …])

Define the feature scale parameters.

StochasticQuasiNewtonParam([…])

Parameters used for stochastic quasi-newton method.

StatisticsParam([statistics, column_names, …])

Define statistics params

StepwiseParam([score_name, mode, role, …])

Define stepwise params

UnionParam([need_run, allow_missing, …])

Define the union method for combining multiple dTables and keep entries with the same id

ColumnExpandParam([append_header, method, …])

Define method used for expanding column

KmeansParam([k, max_iter, tol, random_stat])

Parameters used for K-means.

ScorecardParam([method, offset, factor, …])

Define method used for transforming prediction score to credit score

SecureInformationRetrievalParam([…])

security_level: float [0, 1]; if security_level == 0, then do raw data retrieval oblivious_transfer_protocol: OT type, only supports consts.OT_HAUCK commutative_encryption: the commutative encryption scheme used, only supports consts.CE_PH non_committing_encryption: the non-committing encryption scheme used, only supports consts.AES key_size: int >= 768, the key length of the commutative cipher raw_retrieval: bool, perform raw retrieval if raw_retrieval

SampleWeightParam([class_weight, …])

Define sample weight parameters

class BoostingParam(task_type='classification', objective_param=<federatedml.param.boosting_param.ObjectiveParam object>, learning_rate=0.3, num_trees=5, subsample_feature_rate=1, n_iter_no_change=True, tol=0.0001, bin_num=32, predict_param=<federatedml.param.predict_param.PredictParam object>, cv_param=<federatedml.param.cross_validation_param.CrossValidationParam object>, validation_freqs=None, metrics=None, random_seed=100, binning_error=0.0001)

Basic parameter for Boosting Algorithms

Parameters
  • task_type (str, accepted 'classification', 'regression' only, default: 'classification') –

  • objective_param (ObjectiveParam Object, default: ObjectiveParam()) –

  • learning_rate (float, accepted float, int or long only, the learning rate of secure boost. default: 0.3) –

  • num_trees (int, accepted int, float only, the max number of boosting round. default: 5) –

  • subsample_feature_rate (float, a float-number in [0, 1], default: 1.0) –

  • n_iter_no_change (bool,) – when True and residual error less than tol, tree building process will stop. default: True

  • bin_num (int, positive integer greater than 1, bin number use in quantile. default: 32) –

  • validation_freqs (None or positive integer or container object in python. Do validation in training process or Not.) –

    if equals None, will not do validation in train process; if equals positive integer, will validate data every validation_freqs epochs passes; if container object in python, will validate data if epochs belong to this container.

    e.g. validation_freqs = [10, 15], will validate data when epoch equals to 10 and 15.

    Default: None

class ObjectiveParam(objective='cross_entropy', params=None)

Define objective parameters that used in federated ml.

Parameters
  • objective (None or str, accepted None,'cross_entropy','lse','lae','log_cosh','tweedie','fair','huber' only,) – None in host’s config, should be str in guest’config. when task_type is classification, only support cross_entropy, other 6 types support in regression task. default: None

  • params (None or list, should be non empty list when objective is 'tweedie','fair','huber',) – first element of list shoulf be a float-number large than 0.0 when objective is ‘fair’,’huber’, first element of list should be a float-number in [1.0, 2.0) when objective is ‘tweedie’

class DecisionTreeParam(criterion_method='xgboost', criterion_params=[0.1, 0], max_depth=3, min_sample_split=2, min_impurity_split=0.001, min_leaf_node=1, max_split_nodes=65536, feature_importance_type='split', n_iter_no_change=True, tol=0.001, min_child_weight=0, use_missing=False, zero_as_missing=False, deterministic=False)

Define decision tree parameters that used in federated ml.

Parameters
  • criterion_method (str, accepted "xgboost" only, the criterion function to use, default: 'xgboost') –

  • criterion_params (list or dict, should be non empty and elements are float-numbers,) – if a list is offered, the first one is l2 regularization value, and the second one is l1 regularization value. if a dict is offered, make sure it contains key ‘l1’, and ‘l2’. l1, l2 regularization values are non-negative floats. default: [0.1, 0] or {‘l1’:0, ‘l2’:0,1}

  • max_depth (int, positive integer, the max depth of a decision tree, default: 3) –

  • min_sample_split (int, least quantity of nodes to split, default: 2) –

  • min_impurity_split (float, least gain of a single split need to reach, default: 1e-3) –

  • min_child_weight (float, sum of hessian needed in child nodes. default is 0) –

  • min_leaf_node (int, when samples no more than min_leaf_node, it becomes a leave, default: 1) –

  • max_split_nodes (int, positive integer, we will use no more than max_split_nodes to) – parallel finding their splits in a batch, for memory consideration. default is 65536

  • feature_importance_type (str, support 'split', 'gain' only.) – if is ‘split’, feature_importances calculate by feature split times, if is ‘gain’, feature_importances calculate by feature split gain. default: ‘split’

  • use_missing (bool, accepted True, False only, use missing value in training process or not. default: False) –

  • zero_as_missing (bool, accepted True, False only, regard 0 as missing value or not,) – will be use only if use_missing=True, default: False

  • deterministic (bool, ensure stability when computing histogram. Set this to true to ensure stable result when using) – same data and same parameter. But it may slow down computation.

class CrossValidationParam(n_splits=5, mode='hetero', role='guest', shuffle=True, random_seed=1, need_cv=False, output_fold_history=True, history_value_type='score')

Define cross validation params

Parameters
  • n_splits (int, default: 5) – Specify how many splits used in KFold

  • mode (str, default: 'Hetero') – Indicate what mode is current task

  • role (str, default: 'Guest') – Indicate what role is current party

  • shuffle (bool, default: True) – Define whether do shuffle before KFold or not.

  • random_seed (int, default: 1) – Specify the random seed for numpy shuffle

  • need_cv (bool, default False) – Indicate if this module needed to be run

  • output_fold_history (bool, default True) – Indicate whether to output table of ids used by each fold, else return original input data returned ids are formatted as: {original_id}#fold{fold_num}#{train/validate}

  • history_value_type (str, default score, choose between {'instance', 'score'}) – Indicate whether to include original instance or predict score in the output fold history, only effective when output_fold_history set to True

class DataSplitParam(random_state=None, test_size=None, train_size=None, validate_size=None, stratified=False, shuffle=True, split_points=None, need_run=True)

Define data split param that used in data split.

Parameters
  • random_state (None, int, default: None) – Specify the random state for shuffle.

  • test_size (None, float, int, default: 0.0) – Specify test data set size. float value specifies fraction of input data set, int value specifies exact number of data instances

  • train_size (None, float, int, default: 0.8) – Specify train data set size. float value specifies fraction of input data set, int value specifies exact number of data instances

  • validate_size (None, float, int, default: 0.2) – Specify validate data set size. float value specifies fraction of input data set, int value specifies exact number of data instances

  • stratified (boolean, default: False) – Define whether sampling should be stratified, according to label value.

  • shuffle (boolean, default : True) – Define whether do shuffle before splitting or not.

  • split_points (None, list, default : None) – Specify the point(s) by which continuous label values are bucketed into bins for stratified split. eg.[0.2] for two bins or [0.1, 1, 3] for 4 bins

  • need_run (bool, default: True) – Specify whether to run data split

class DataIOParam(input_format='dense', delimitor=',', data_type='float64', exclusive_data_type=None, tag_with_value=False, tag_value_delimitor=':', missing_fill=False, default_value=0, missing_fill_method=None, missing_impute=None, outlier_replace=False, outlier_replace_method=None, outlier_impute=None, outlier_replace_value=0, with_label=False, label_name='y', label_type='int', output_format='dense', need_run=True)

Define dataio parameters that used in federated ml.

Parameters
  • input_format (str, accepted 'dense','sparse' 'tag' only in this version. default: 'dense'.) –

    please have a look at this tutorial at “DataIO” section of federatedml/util/README.md. Formally,

    dense input format data should be set to “dense”, svm-light input format data should be set to “sparse”, tag or tag:value input format data should be set to “tag”.

  • delimitor (str, the delimitor of data input, default: ',') –

  • data_type (str, the data type of data input, accepted 'float','float64','int','int64','str','long') – “default: “float64”

  • exclusive_data_type (dict, the key of dict is col_name, the value is data_type, use to specified special data type) – of some features.

  • tag_with_value (bool, use if input_format is 'tag', if tag_with_value is True,) – input column data format should be tag[delimitor]value, otherwise is tag only

  • tag_value_delimitor (str, use if input_format is 'tag' and 'tag_with_value' is True,) – delimitor of tag[delimitor]value column value.

  • missing_fill (bool, need to fill missing value or not, accepted only True/False, default: False) –

  • default_value (None or single object type or list, the value to replace missing value.) –

    if None, it will use default value define in federatedml/feature/imputer.py, if single object, will fill missing value with this object, if list, it’s length should be the sample of input data’ feature dimension,

    means that if some column happens to have missing values, it will replace it the value by element in the identical position of this list.

    default: None

  • missing_fill_method (None or str, the method to replace missing value, should be one of [None, 'min', 'max', 'mean', 'designated'], default: None) –

  • missing_impute (None or list, element of list can be any type, or auto generated if value is None, define which values to be consider as missing, default: None) –

  • outlier_replace (bool, need to replace outlier value or not, accepted only True/False, default: True) –

  • outlier_replace_method (None or str, the method to replace missing value, should be one of [None, 'min', 'max', 'mean', 'designated'], default: None) –

  • outlier_impute (None or list, element of list can be any type, which values should be regard as missing value, default: None) –

  • outlier_replace_value (None or single object type or list, the value to replace outlier.) –

    if None, it will use default value define in federatedml/feature/imputer.py, if single object, will replace outlier with this object, if list, it’s length should be the sample of input data’ feature dimension,

    means that if some column happens to have outliers, it will replace it the value by element in the identical position of this list.

    default: None

  • with_label (bool, True if input data consist of label, False otherwise. default: 'false') –

  • label_name (str, column_name of the column where label locates, only use in dense-inputformat. default: 'y') –

  • label_type (object, accepted 'int','int64','float','float64','long','str' only,) – use when with_label is True. default: ‘false’

  • output_format (str, accepted 'dense','sparse' only in this version. default: 'dense') –

class DataTransformParam(input_format='dense', delimitor=',', data_type='float64', exclusive_data_type=None, tag_with_value=False, tag_value_delimitor=':', missing_fill=False, default_value=0, missing_fill_method=None, missing_impute=None, outlier_replace=False, outlier_replace_method=None, outlier_impute=None, outlier_replace_value=0, with_label=False, label_name='y', label_type='int', output_format='dense', need_run=True)

Define data transform parameters that used in federated ml.

Parameters
  • input_format (str, accepted 'dense','sparse' 'tag' only in this version. default: 'dense'.) –

    please have a look at this tutorial at “DataTransform” section of federatedml/util/README.md. Formally,

    dense input format data should be set to “dense”, svm-light input format data should be set to “sparse”, tag or tag:value input format data should be set to “tag”.

  • delimitor (str, the delimitor of data input, default: ',') –

  • data_type (str, the data type of data input, accepted 'float','float64','int','int64','str','long') – “default: “float64”

  • exclusive_data_type (dict, the key of dict is col_name, the value is data_type, use to specified special data type) – of some features.

  • tag_with_value (bool, use if input_format is 'tag', if tag_with_value is True,) – input column data format should be tag[delimitor]value, otherwise is tag only

  • tag_value_delimitor (str, use if input_format is 'tag' and 'tag_with_value' is True,) – delimitor of tag[delimitor]value column value.

  • missing_fill (bool, need to fill missing value or not, accepted only True/False, default: False) –

  • default_value (None or single object type or list, the value to replace missing value.) –

    if None, it will use default value define in federatedml/feature/imputer.py, if single object, will fill missing value with this object, if list, it’s length should be the sample of input data’ feature dimension,

    means that if some column happens to have missing values, it will replace it the value by element in the identical position of this list.

    default: None

  • missing_fill_method (None or str, the method to replace missing value, should be one of [None, 'min', 'max', 'mean', 'designated'], default: None) –

  • missing_impute (None or list, element of list can be any type, or auto generated if value is None, define which values to be consider as missing, default: None) –

  • outlier_replace (bool, need to replace outlier value or not, accepted only True/False, default: True) –

  • outlier_replace_method (None or str, the method to replace missing value, should be one of [None, 'min', 'max', 'mean', 'designated'], default: None) –

  • outlier_impute (None or list, element of list can be any type, which values should be regard as missing value, default: None) –

  • outlier_replace_value (None or single object type or list, the value to replace outlier.) –

    if None, it will use default value define in federatedml/feature/imputer.py, if single object, will replace outlier with this object, if list, it’s length should be the sample of input data’ feature dimension,

    means that if some column happens to have outliers, it will replace it the value by element in the identical position of this list.

    default: None

  • with_label (bool, True if input data consist of label, False otherwise. default: 'false') –

  • label_name (str, column_name of the column where label locates, only use in dense-inputformat. default: 'y') –

  • label_type (object, accepted 'int','int64','float','float64','long','str' only,) – use when with_label is True. default: ‘false’

  • output_format (str, accepted 'dense','sparse' only in this version. default: 'dense') –

class EncryptParam(method='Paillier', key_length=1024)

Define encryption method that used in federated ml.

Parameters
  • method (str, default: 'Paillier') – If method is ‘Paillier’, Paillier encryption will be used for federated ml. To use non-encryption version in HomoLR, set this to None. For detail of Paillier encryption, please check out the paper mentioned in README file. Accepted values: {‘Paillier’, ‘IterativeAffine’, ‘Random_IterativeAffine’}

  • key_length (int, default: 1024) – Used to specify the length of key in this encryption method.

class EncryptedModeCalculatorParam(mode='strict', re_encrypted_rate=1)

Define the encrypted_mode_calulator parameters.

Parameters
  • mode (str, support 'strict', 'fast', 'balance', 'confusion_opt', ' only, default: strict) –

  • re_encrypted_rate (float or int, numeric number in [0, 1], use when mode equals to 'balance, default: 1) –

class FeatureBinningParam(method='quantile', compress_thres=10000, head_size=10000, error=0.0001, bin_num=10, bin_indexes=-1, bin_names=None, adjustment_factor=0.5, transform_param=<federatedml.param.feature_binning_param.TransformParam object>, local_only=False, category_indexes=None, category_names=None, need_run=True, skip_static=False)

Define the feature binning method

Parameters
  • method (str, 'quantile', 'bucket' or 'optimal', default: 'quantile') – Binning method.

  • compress_thres (int, default: 10000) – When the number of saved summaries exceed this threshold, it will call its compress function

  • head_size (int, default: 10000) – The buffer size to store inserted observations. When head list reach this buffer size, the QuantileSummaries object start to generate summary(or stats) and insert into its sampled list.

  • error (float, 0 <= error < 1 default: 0.001) – The error of tolerance of binning. The final split point comes from original data, and the rank of this value is close to the exact rank. More precisely, floor((p - 2 * error) * N) <= rank(x) <= ceil((p + 2 * error) * N) where p is the quantile in float, and N is total number of data.

  • bin_num (int, bin_num > 0, default: 10) – The max bin number for binning

  • bin_indexes (list of int or int, default: -1) – Specify which columns need to be binned. -1 represent for all columns. If you need to indicate specific cols, provide a list of header index instead of -1.

  • bin_names (list of string, default: []) – Specify which columns need to calculated. Each element in the list represent for a column name in header.

  • adjustment_factor (float, default: 0.5) – the adjustment factor when calculating WOE. This is useful when there is no event or non-event in a bin. Please note that this parameter will NOT take effect for setting in host.

  • category_indexes (list of int or int, default: []) –

    Specify which columns are category features. -1 represent for all columns. List of int indicate a set of such features. For category features, bin_obj will take its original values as split_points and treat them as have been binned. If this is not what you expect, please do NOT put it into this parameters.

    The number of categories should not exceed bin_num set above.

  • category_names (list of string, default: []) – Use column names to specify category features. Each element in the list represent for a column name in header.

  • local_only (bool, default: False) – Whether just provide binning method to guest party. If true, host party will do nothing. Warnings: This parameter will be deprecated in future version.

  • transform_param (TransformParam) – Define how to transfer the binned data.

  • need_run (bool, default True) – Indicate if this module needed to be run

  • skip_static (bool, default False) – If true, binning will not calculate iv, woe etc. In this case, optimal-binning will not be supported.

class FeatureSelectionParam(select_col_indexes=-1, select_names=None, filter_methods=None, unique_param=<federatedml.param.feature_selection_param.UniqueValueParam object>, iv_value_param=<federatedml.param.feature_selection_param.IVValueSelectionParam object>, iv_percentile_param=<federatedml.param.feature_selection_param.IVPercentileSelectionParam object>, iv_top_k_param=<federatedml.param.feature_selection_param.IVTopKParam object>, variance_coe_param=<federatedml.param.feature_selection_param.VarianceOfCoeSelectionParam object>, outlier_param=<federatedml.param.feature_selection_param.OutlierColsSelectionParam object>, manually_param=<federatedml.param.feature_selection_param.ManuallyFilterParam object>, percentage_value_param=<federatedml.param.feature_selection_param.PercentageValueParam object>, iv_param=<federatedml.param.feature_selection_param.CommonFilterParam object>, statistic_param=<federatedml.param.feature_selection_param.CommonFilterParam object>, psi_param=<federatedml.param.feature_selection_param.CommonFilterParam object>, vif_param=<federatedml.param.feature_selection_param.CommonFilterParam object>, sbt_param=<federatedml.param.feature_selection_param.CommonFilterParam object>, correlation_param=<federatedml.param.feature_selection_param.CorrelationFilterParam object>, need_run=True)

Define the feature selection parameters.

Parameters
  • select_col_indexes (list or int, default: -1) – Specify which columns need to calculated. -1 represent for all columns.

  • select_names (list of string, default: []) – Specify which columns need to calculated. Each element in the list represent for a column name in header.

  • filter_methods (list, ["manually", "iv_filter", "statistic_filter",) –

    “psi_filter”, “hetero_sbt_filter”, “homo_sbt_filter”,

    ”hetero_fast_sbt_filter”, “percentage_value”, “vif_filter”, “correlation_filter”],

    default: [“manually”]

    The following methods will be deprecated in future version: “unique_value”, “iv_value_thres”, “iv_percentile”, “coefficient_of_variation_value_thres”, “outlier_cols”

    Specify the filter methods used in feature selection. The orders of filter used is depended on this list. Please be notified that, if a percentile method is used after some certain filter method, the percentile represent for the ratio of rest features.

    e.g. If you have 10 features at the beginning. After first filter method, you have 8 rest. Then, you want top 80% highest iv feature. Here, we will choose floor(0.8 * 8) = 6 features instead of 8.

  • unique_param (filter the columns if all values in this feature is the same) –

  • iv_value_param (Use information value to filter columns. If this method is set, a float threshold need to be provided.) – Filter those columns whose iv is smaller than threshold. Will be deprecated in the future.

  • iv_percentile_param (Use information value to filter columns. If this method is set, a float ratio threshold) – need to be provided. Pick floor(ratio * feature_num) features with higher iv. If multiple features around the threshold are same, all those columns will be keep. Will be deprecated in the future.

  • variance_coe_param (Use coefficient of variation to judge whether filtered or not.) – Will be deprecated in the future.

  • outlier_param (Filter columns whose certain percentile value is larger than a threshold.) – Will be deprecated in the future.

  • percentage_value_param (Filter the columns that have a value that exceeds a certain percentage.) –

  • iv_param (Setting how to filter base on iv. It support take high mode only. All of "threshold",) – “top_k” and “top_percentile” are accepted. Check more details in CommonFilterParam. To use this filter, hetero-feature-binning module has to be provided.

  • statistic_param (Setting how to filter base on statistic values. All of "threshold",) – “top_k” and “top_percentile” are accepted. Check more details in CommonFilterParam. To use this filter, data_statistic module has to be provided.

  • psi_param (Setting how to filter base on psi values. All of "threshold",) – “top_k” and “top_percentile” are accepted. Its take_high properties should be False to choose lower psi features. Check more details in CommonFilterParam. To use this filter, data_statistic module has to be provided.

  • need_run (bool, default True) – Indicate if this module needed to be run

class HeteroNNParam(task_type='classification', config_type='keras', bottom_nn_define=None, top_nn_define=None, interactive_layer_define=None, interactive_layer_lr=0.9, optimizer='SGD', loss=None, epochs=100, batch_size=-1, early_stop='diff', tol=1e-05, encrypt_param=<federatedml.param.encrypt_param.EncryptParam object>, encrypted_mode_calculator_param=<federatedml.param.encrypted_mode_calculation_param.EncryptedModeCalculatorParam object>, predict_param=<federatedml.param.predict_param.PredictParam object>, cv_param=<federatedml.param.cross_validation_param.CrossValidationParam object>, validation_freqs=None, early_stopping_rounds=None, metrics=None, use_first_metric_only=True, selector_param=<federatedml.param.hetero_nn_param.SelectorParam object>, floating_point_precision=23, drop_out_keep_rate=1.0)

Parameters used for Hetero Neural Network.

Parameters
  • task_type – str, task type of hetero nn model, one of ‘classification’, ‘regression’.

  • config_type – str, accept “keras” only.

  • bottom_nn_define – a dict represents the structure of bottom neural network.

  • interactive_layer_define – a dict represents the structure of interactive layer.

  • interactive_layer_lr – float, the learning rate of interactive layer.

  • top_nn_define – a dict represents the structure of top neural network.

  • optimizer

    optimizer method, accept following types: 1. a string, one of “Adadelta”, “Adagrad”, “Adam”, “Adamax”, “Nadam”, “RMSprop”, “SGD” 2. a dict, with a required key-value pair keyed by “optimizer”,

    with optional key-value pairs such as learning rate.

    defaults to “SGD”

  • loss – str, a string to define loss function used

  • early_stopping_rounds – int, default: None

  • rounds (Will stop training if one metric doesn’t improve in last early_stopping_round) –

  • metrics – list, default: None Indicate when executing evaluation during train process, which metrics will be used. If not set, default metrics for specific task type will be used. As for binary classification, default metrics are [‘auc’, ‘ks’], for regression tasks, default metrics are [‘root_mean_squared_error’, ‘mean_absolute_error’], [ACCURACY, PRECISION, RECALL] for multi-classification task

  • use_first_metric_only – bool, default: False Indicate whether to use the first metric in metrics as the only criterion for early stopping judgement.

  • epochs – int, the maximum iteration for aggregation in training.

  • batch_size – int, batch size when updating model. -1 means use all data in a batch. i.e. Not to use mini-batch strategy. defaults to -1.

  • early_stop

    str, accept ‘diff’ only in this version, default: ‘diff’ Method used to judge converge or not.

    1. diff: Use difference of loss between two iterations to judge whether converge.

  • validation_freqs

    None or positive integer or container object in python. Do validation in training process or Not. if equals None, will not do validation in train process; if equals positive integer, will validate data every validation_freqs epochs passes; if container object in python, will validate data if epochs belong to this container.

    e.g. validation_freqs = [10, 15], will validate data when epoch equals to 10 and 15.

    Default: None The default value is None, 1 is suggested. You can set it to a number larger than 1 in order to speed up training by skipping validation rounds. When it is larger than 1, a number which is divisible by “epochs” is recommended, otherwise, you will miss the validation scores of last training epoch.

  • floating_point_precision

    None or integer, if not None, means use floating_point_precision-bit to speed up calculation, e.g.: convert an x to round(x * 2**floating_point_precision) during Paillier operation, divide

    the result by 2**floating_point_precision in the end.

  • drop_out_keep_rate – float, should betweend 0 and 1, if not equals to 1.0, will enabled drop out

class HomoNNParam(api_version: int = 0, secure_aggregate: bool = True, aggregate_every_n_epoch: int = 1, config_type: str = 'nn', nn_define: Optional[dict] = None, optimizer: Union[str, dict, types.SimpleNamespace] = 'SGD', loss: Optional[str] = None, metrics: Optional[Union[str, list]] = None, max_iter: int = 100, batch_size: int = -1, early_stop: Union[str, dict, types.SimpleNamespace] = 'diff', encode_label: bool = False, predict_param=<federatedml.param.predict_param.PredictParam object>, cv_param=<federatedml.param.cross_validation_param.CrossValidationParam object>)

Parameters used for Homo Neural Network.

Parameters

Args

secure_aggregate: enable secure aggregation or not, defaults to True. aggregate_every_n_epoch: aggregate model every n epoch, defaults to 1. config_type: one of “nn”, “keras”, “tf” nn_define: a dict represents the structure of neural network. optimizer: optimizer method, accept following types:

  1. a string, one of “Adadelta”, “Adagrad”, “Adam”, “Adamax”, “Nadam”, “RMSprop”, “SGD”

  2. a dict, with a required key-value pair keyed by “optimizer”,

    with optional key-value pairs such as learning rate.

defaults to “SGD”

loss: a string metrics: max_iter: the maximum iteration for aggregation in training. batch_size : batch size when updating model.

-1 means use all data in a batch. i.e. Not to use mini-batch strategy. defaults to -1.

early_stopstr, ‘diff’, ‘weight_diff’ or ‘abs’, default: ‘diff’
Method used to judge converge or not.
  1. diff: Use difference of loss between two iterations to judge whether converge.

  2. weight_diff: Use difference between weights of two consecutive iterations

  3. abs: Use the absolute value of loss to judge whether converge. i.e. if loss < eps, it is converged.

encode_label : encode label to one_hot.

class HomoOneHotParam(transform_col_indexes=- 1, transform_col_names=None, need_run=True, need_alignment=True)
Parameters
  • transform_col_indexes (list or int, default: -1) – Specify which columns need to calculated. -1 represent for all columns.

  • need_run (bool, default True) – Indicate if this module needed to be run

  • need_alignment (bool, default True) – Indicated whether alignment of features is turned on

class InitParam(init_method='random_uniform', init_const=1, fit_intercept=True, random_seed=None)

Initialize Parameters used in initializing a model.

Parameters
  • init_method (str, 'random_uniform', 'random_normal', 'ones', 'zeros' or 'const'. default: 'random_uniform') – Initial method.

  • init_const (int or float, default: 1) – Required when init_method is ‘const’. Specify the constant.

  • fit_intercept (bool, default: True) – Whether to initialize the intercept or not.

class IntersectParam(intersect_method: str = 'raw', random_bit=128, sync_intersect_ids=True, join_role='guest', with_encode=False, only_output_key=False, encode_params=<federatedml.param.intersect_param.EncodeParam object>, rsa_params=<federatedml.param.intersect_param.RSAParam object>, intersect_cache_param=<federatedml.param.intersect_param.IntersectCache object>, repeated_id_process=False, repeated_id_owner='guest', with_sample_id=False, allow_info_share: bool = False, info_owner='guest')

Define the intersect method

Parameters
  • intersect_method (str, it supports 'rsa' and 'raw', default by 'raw') –

  • random_bit (positive int, it will define the encrypt length of rsa algorithm. It effective only for intersect_method is rsa) –

  • sync_intersect_ids (bool. In rsa, 'synchronize_intersect_ids' is True means guest or host will send intersect results to the others, and False will not.) – while in raw, ‘synchronize_intersect_ids’ is True means the role of “join_role” will send intersect results and the others will get them. Default by True.

  • join_role (str, role who joins ids, supports "guest" and "host" only and effective only for raw. If it is "guest", the host will send its ids to guest and find the intersection of) – ids in guest; if it is “host”, the guest will send its ids to host. Default by “guest”.

  • with_encode (bool, if True, it will use hash method for intersect ids. Effective only for "raw".) –

  • encode_params (EncodeParam, it effective only for with_encode is True) –

  • rsa_params (RSAParam, effective for rsa method only) –

  • only_output_key (bool, if false, the results of intersection will include key and value which from input data; if true, it will just include key from input) – data and the value will be empty or some useless character like “intersect_id”

  • repeated_id_process (bool, if true, intersection will process the ids which can be repeatable) –

  • repeated_id_owner (str, which role has the repeated ids) –

  • with_sample_id (bool, data with sample id or not, default False; set this param to True may lead to unexpected behavior) –

class EncodeParam(salt='', encode_method='none', base64=False)

Define the hash method for raw intersect method

Parameters
  • salt (the src data string will be str = str + salt, default by empty string) –

  • encode_method (str, the hash method of src data string, it support md5, sha1, sha224, sha256, sha384, sha512, sm3, default by None) –

  • base64 (bool, if True, the result of hash will be changed to base64, default by False) –

class RSAParam(salt='', hash_method='sha256', final_hash_method='sha256', split_calculation=False, random_base_fraction=None, key_length=1024)

Define the hash method for RSA intersect method

Parameters
  • salt (the src data string will be str = str + salt, default '') –

  • hash_method (str, the hash method of src data string, it support sha256, sha384, sha512, sm3, default sha256) –

  • final_hash_method (str, the hash method of result data string, it support md5, sha1, sha224, sha256, sha384, sha512, sm3, default sha256) –

  • split_calculation (bool, if True, Host & Guest split operations for faster performance, recommended on large data set) –

  • random_base_fraction (positive float, if not None, generate (fraction * public key id count) of r for encryption and reuse generated r;) – note that value greater than 0.99 will be taken as 1, and value less than 0.01 will be rounded up to 0.01

  • key_length (positive int, bit count of rsa key, default 1024) –

class LinearParam(penalty='L2', tol=0.0001, alpha=1.0, optimizer='sgd', batch_size=-1, learning_rate=0.01, init_param=<federatedml.param.init_model_param.InitParam object>, max_iter=20, early_stop='diff', predict_param=<federatedml.param.predict_param.PredictParam object>, encrypt_param=<federatedml.param.encrypt_param.EncryptParam object>, sqn_param=<federatedml.param.sqn_param.StochasticQuasiNewtonParam object>, encrypted_mode_calculator_param=<federatedml.param.encrypted_mode_calculation_param.EncryptedModeCalculatorParam object>, cv_param=<federatedml.param.cross_validation_param.CrossValidationParam object>, decay=1, decay_sqrt=True, validation_freqs=None, early_stopping_rounds=None, stepwise_param=<federatedml.param.stepwise_param.StepwiseParam object>, metrics=None, use_first_metric_only=False, floating_point_precision=23)

Parameters used for Linear Regression.

Parameters
  • penalty (str, 'L1' or 'L2'. default: 'L2') – Penalty method used in LinR. Please note that, when using encrypted version in HeteroLinR, ‘L1’ is not supported.

  • tol (float, default: 1e-4) – The tolerance of convergence

  • alpha (float, default: 1.0) – Regularization strength coefficient.

  • optimizer (str, 'sgd', 'rmsprop', 'adam', 'sqn', or 'adagrad', default: 'sgd') – Optimize method

  • batch_size (int, default: -1) – Batch size when updating model. -1 means use all data in a batch. i.e. Not to use mini-batch strategy.

  • learning_rate (float, default: 0.01) – Learning rate

  • max_iter (int, default: 20) – The maximum iteration for training.

  • init_param (InitParam object, default: default InitParam object) – Init param method object.

  • early_stop (str, 'diff' or 'abs' or 'weight_dff', default: 'diff') –

    Method used to judge convergence.
    1. diff: Use difference of loss between two iterations to judge whether converge.

    2. abs: Use the absolute value of loss to judge whether converge. i.e. if loss < tol, it is converged.

    3. weight_diff: Use difference between weights of two consecutive iterations

  • predict_param (PredictParam object, default: default PredictParam object) –

  • encrypt_param (EncryptParam object, default: default EncryptParam object) –

  • encrypted_mode_calculator_param (EncryptedModeCalculatorParam object, default: default EncryptedModeCalculatorParam object) –

  • cv_param (CrossValidationParam object, default: default CrossValidationParam object) –

  • decay (int or float, default: 1) – Decay rate for learning rate. learning rate will follow the following decay schedule. lr = lr0/(1+decay*t) if decay_sqrt is False. If decay_sqrt is True, lr = lr0 / sqrt(1+decay*t) where t is the iter number.

  • decay_sqrt (Bool, default: True) – lr = lr0/(1+decay*t) if decay_sqrt is False, otherwise, lr = lr0 / sqrt(1+decay*t)

  • validation_freqs (int, list, tuple, set, or None) – validation frequency during training, required when using early stopping. The default value is None, 1 is suggested. You can set it to a number larger than 1 in order to speed up training by skipping validation rounds. When it is larger than 1, a number which is divisible by “max_iter” is recommended, otherwise, you will miss the validation scores of the last training iteration.

  • early_stopping_rounds (int, default: None) – If positive number specified, at every specified training rounds, program checks for early stopping criteria. Validation_freqs must also be set when using early stopping.

  • metrics (list or None, default: None) – Specify which metrics to be used when performing evaluation during training process. If metrics have not improved at early_stopping rounds, trianing stops before convergence. If set as empty, default metrics will be used. For regression tasks, default metrics are [‘root_mean_squared_error’, ‘mean_absolute_error’]

  • use_first_metric_only (bool, default: False) – Indicate whether to use the first metric in metrics as the only criterion for early stopping judgement.

  • floating_point_precision (None or integer, if not None, use floating_point_precision-bit to speed up calculation,) –

    e.g.: convert an x to round(x * 2**floating_point_precision) during Paillier operation, divide

    the result by 2**floating_point_precision in the end.

class LocalBaselineParam(model_name='LogisticRegression', model_opts=None, predict_param=<federatedml.param.predict_param.PredictParam object>, need_run=True)

Define the local baseline model param

Parameters
  • model_name (str, sklearn model used to train on baseline model) –

  • model_opts (dict or none, default None) – Param to be used as input into baseline model

  • predict_param (PredictParam object, default: default PredictParam object) –

  • need_run (bool, default True) – Indicate if this module needed to be run

class LogisticParam(penalty='L2', tol=0.0001, alpha=1.0, optimizer='rmsprop', batch_size=-1, learning_rate=0.01, init_param=<federatedml.param.init_model_param.InitParam object>, max_iter=100, early_stop='diff', encrypt_param=<federatedml.param.encrypt_param.EncryptParam object>, predict_param=<federatedml.param.predict_param.PredictParam object>, cv_param=<federatedml.param.cross_validation_param.CrossValidationParam object>, decay=1, decay_sqrt=True, multi_class='ovr', validation_freqs=None, early_stopping_rounds=None, stepwise_param=<federatedml.param.stepwise_param.StepwiseParam object>, floating_point_precision=23, metrics=None, use_first_metric_only=False)

Parameters used for Logistic Regression both for Homo mode or Hetero mode.

Parameters
  • penalty (str, 'L1', 'L2' or None. default: 'L2') – Penalty method used in LR. Please note that, when using encrypted version in HomoLR, ‘L1’ is not supported.

  • tol (float, default: 1e-4) – The tolerance of convergence

  • alpha (float, default: 1.0) – Regularization strength coefficient.

  • optimizer (str, 'sgd', 'rmsprop', 'adam', 'nesterov_momentum_sgd', 'sqn' or 'adagrad', default: 'rmsprop') – Optimize method, if ‘sqn’ has been set, sqn_param will take effect. Currently, ‘sqn’ support hetero mode only.

  • batch_size (int, default: -1) – Batch size when updating model. -1 means use all data in a batch. i.e. Not to use mini-batch strategy.

  • learning_rate (float, default: 0.01) – Learning rate

  • max_iter (int, default: 100) – The maximum iteration for training.

  • early_stop (str, 'diff', 'weight_diff' or 'abs', default: 'diff') –

    Method used to judge converge or not.
    1. diff: Use difference of loss between two iterations to judge whether converge.

    2. weight_diff: Use difference between weights of two consecutive iterations

    3. abs: Use the absolute value of loss to judge whether converge. i.e. if loss < eps, it is converged.

    Please note that for hetero-lr multi-host situation, this parameter support “weight_diff” only.

  • decay (int or float, default: 1) – Decay rate for learning rate. learning rate will follow the following decay schedule. lr = lr0/(1+decay*t) if decay_sqrt is False. If decay_sqrt is True, lr = lr0 / sqrt(1+decay*t) where t is the iter number.

  • decay_sqrt (Bool, default: True) – lr = lr0/(1+decay*t) if decay_sqrt is False, otherwise, lr = lr0 / sqrt(1+decay*t)

  • encrypt_param (EncryptParam object, default: default EncryptParam object) –

  • predict_param (PredictParam object, default: default PredictParam object) –

  • cv_param (CrossValidationParam object, default: default CrossValidationParam object) –

  • multi_class (str, 'ovr', default: 'ovr') – If it is a multi_class task, indicate what strategy to use. Currently, support ‘ovr’ short for one_vs_rest only.

  • validation_freqs (int, list, tuple, set, or None) – validation frequency during training.

  • early_stopping_rounds (int, default: None) – Will stop training if one metric doesn’t improve in last early_stopping_round rounds

  • metrics (list or None, default: None) – Indicate when executing evaluation during train process, which metrics will be used. If set as empty, default metrics for specific task type will be used. As for binary classification, default metrics are [‘auc’, ‘ks’]

  • use_first_metric_only (bool, default: False) – Indicate whether use the first metric only for early stopping judgement.

  • floating_point_precision (None or integer, if not None, use floating_point_precision-bit to speed up calculation,) –

    e.g.: convert an x to round(x * 2**floating_point_precision) during Paillier operation, divide

    the result by 2**floating_point_precision in the end.

class OneVsRestParam(need_one_vs_rest=False, has_arbiter=True)

Define the one_vs_rest parameters.

Parameters

has_arbiter (bool. For some algorithm, may not has arbiter, for instances, secureboost of FATE,) – for these algorithms, it should be set to false. default true

class PoissonParam(penalty='L2', tol=0.0001, alpha=1.0, optimizer='rmsprop', batch_size=-1, learning_rate=0.01, init_param=<federatedml.param.init_model_param.InitParam object>, max_iter=20, early_stop='diff', exposure_colname=None, predict_param=<federatedml.param.predict_param.PredictParam object>, encrypt_param=<federatedml.param.encrypt_param.EncryptParam object>, encrypted_mode_calculator_param=<federatedml.param.encrypted_mode_calculation_param.EncryptedModeCalculatorParam object>, cv_param=<federatedml.param.cross_validation_param.CrossValidationParam object>, stepwise_param=<federatedml.param.stepwise_param.StepwiseParam object>, decay=1, decay_sqrt=True, validation_freqs=None, early_stopping_rounds=None, metrics=None, use_first_metric_only=False, floating_point_precision=23)

Parameters used for Poisson Regression.

Parameters
  • penalty (str, 'L1' or 'L2'. default: 'L2') – Penalty method used in Poisson. Please note that, when using encrypted version in HeteroPoisson, ‘L1’ is not supported.

  • tol (float, default: 1e-4) – The tolerance of convergence

  • alpha (float, default: 1.0) – Regularization strength coefficient.

  • optimizer (str, 'sgd', 'rmsprop', 'adam' or 'adagrad', default: 'rmsprop') – Optimize method

  • batch_size (int, default: -1) – Batch size when updating model. -1 means use all data in a batch. i.e. Not to use mini-batch strategy.

  • learning_rate (float, default: 0.01) – Learning rate

  • max_iter (int, default: 20) – The maximum iteration for training.

  • init_param (InitParam object, default: default InitParam object) – Init param method object.

  • early_stop (str, 'weight_diff', 'diff' or 'abs', default: 'diff') –

    Method used to judge convergence.
    1. diff: Use difference of loss between two iterations to judge whether converge.

    2. weight_diff: Use difference between weights of two consecutive iterations

    3. abs: Use the absolute value of loss to judge whether converge. i.e. if loss < eps, it is converged.

  • exposure_colname (str or None, default: None) – Name of optional exposure variable in dTable.

  • predict_param (PredictParam object, default: default PredictParam object) –

  • encrypt_param (EncryptParam object, default: default EncryptParam object) –

  • encrypted_mode_calculator_param (EncryptedModeCalculatorParam object, default: default EncryptedModeCalculatorParam object) –

  • cv_param (CrossValidationParam object, default: default CrossValidationParam object) –

  • stepwise_param (StepwiseParam object, default: default StepwiseParam object) –

  • decay (int or float, default: 1) – Decay rate for learning rate. learning rate will follow the following decay schedule. lr = lr0/(1+decay*t) if decay_sqrt is False. If decay_sqrt is True, lr = lr0 / sqrt(1+decay*t) where t is the iter number.

  • decay_sqrt (Bool, default: True) – lr = lr0/(1+decay*t) if decay_sqrt is False, otherwise, lr = lr0 / sqrt(1+decay*t)

  • validation_freqs (int, list, tuple, set, or None) – validation frequency during training, required when using early stopping. The default value is None, 1 is suggested. You can set it to a number larger than 1 in order to speed up training by skipping validation rounds. When it is larger than 1, a number which is divisible by “max_iter” is recommended, otherwise, you will miss the validation scores of the last training iteration.

  • early_stopping_rounds (int, default: None) – If positive number specified, at every specified training rounds, program checks for early stopping criteria. Validation_freqs must also be set when using early stopping.

  • metrics (list or None, default: None) – Specify which metrics to be used when performing evaluation during training process. If metrics have not improved at early_stopping rounds, trianing stops before convergence. If set as empty, default metrics will be used. For regression tasks, default metrics are [‘root_mean_squared_error’, ‘mean_absolute_error’]

  • use_first_metric_only (bool, default: False) – Indicate whether to use the first metric in metrics as the only criterion for early stopping judgement.

  • floating_point_precision (None or integer, if not None, use floating_point_precision-bit to speed up calculation,) –

    e.g.: convert an x to round(x * 2**floating_point_precision) during Paillier operation, divide

    the result by 2**floating_point_precision in the end.

class PredictParam(threshold=0.5)

Define the predict method of HomoLR, HeteroLR, SecureBoosting

Parameters

threshold (float or int, The threshold use to separate positive and negative class. Normally, it should be (0,1)) –

class RsaParam(rsa_key_n=None, rsa_key_e=None, rsa_key_d=None, save_out_table_namespace=None, save_out_table_name=None)

Define the sample method

Parameters
  • rsa_key_n (integer, RSA modulus, default: None) –

  • rsa_key_e (integer, RSA public exponent, default: None) –

  • rsa_key_d (integer, RSA private exponent, default: None) –

  • save_out_table_namespace (str, namespace of dtable where stores the output data. default: None) –

  • save_out_table_name (str, name of dtable where stores the output data. default: None) –

class SampleParam(mode='random', method='downsample', fractions=None, random_state=None, task_type='hetero', need_run=True)

Define the sample method

Parameters
  • mode (str, accepted 'random','stratified'' only in this version, specify sample to use, default: 'random') –

  • method (str, accepted 'downsample','upsample' only in this version. default: 'downsample') –

  • fractions (None or float or list, if mode equals to random, it should be a float number greater than 0,) – otherwise a list of elements of pairs like [label_i, sample_rate_i], e.g. [[0, 0.5], [1, 0.8], [2, 0.3]]. default: None

  • random_state (int, RandomState instance or None, default: None) –

  • need_run (bool, default True) – Indicate if this module needed to be run

class ScaleParam(method='standard_scale', mode='normal', scale_col_indexes=- 1, scale_names=None, feat_upper=None, feat_lower=None, with_mean=True, with_std=True, need_run=True)

Define the feature scale parameters.

Parameters
  • method (str, like scale in sklearn, now it support "min_max_scale" and "standard_scale", and will support other scale method soon.) – Default standard_scale, which will do nothing for scale

  • mode (str, the mode support "normal" and "cap". for mode is "normal", the feat_upper and feat_lower is the normal value like "10" or "3.1" and for "cap", feat_upper and) – feature_lower will between 0 and 1, which means the percentile of the column. Default “normal”

  • feat_upper (int or float or list of int or float, the upper limit in the column.) – If use list, mode must be “normal”, and list length should equal to the number of features to scale. If the scaled value is larger than feat_upper, it will be set to feat_upper. Default None.

  • feat_lower (int or float or list of int or float, the lower limit in the column.) – If use list, mode must be “normal”, and list length should equal to the number of features to scale. If the scaled value is less than feat_lower, it will be set to feat_lower. Default None.

  • scale_col_indexes (list,the idx of column in scale_column_idx will be scaled, while the idx of column is not in, it will not be scaled.) –

  • scale_names (list of string, default: []Specify which columns need to scaled. Each element in the list represent for a column name in header.) –

  • with_mean (bool, used for "standard_scale". Default True.) –

  • with_std (bool, used for "standard_scale". Default True.) – The standard scale of column x is calculated as : z = (x - u) / s, where u is the mean of the column and s is the standard deviation of the column. if with_mean is False, u will be 0, and if with_std is False, s will be 1.

  • need_run (bool, default True) – Indicate if this module needed to be run

class StochasticQuasiNewtonParam(update_interval_L=3, memory_M=5, sample_size=5000, random_seed=None)

Parameters used for stochastic quasi-newton method.

Parameters
  • update_interval_L (int, default: 3) – Set how many iteration to update hess matrix

  • memory_M (int, default: 5) – Stack size of curvature information, i.e. y_k and s_k in the paper.

  • sample_size (int, default: 5000) – Sample size of data that used to update Hess matrix

class StatisticsParam(statistics='summary', column_names=None, column_indexes=- 1, need_run=True, abnormal_list=None, quantile_error=0.0001, bias=True)

Define statistics params

Parameters
  • statistics (list, string, default "summary") –

    Specify the statistic types to be computed. “summary” represents list: [consts.SUM, consts.MEAN, consts.STANDARD_DEVIATION,

    consts.MEDIAN, consts.MIN, consts.MAX, consts.MISSING_COUNT, consts.SKEWNESS, consts.KURTOSIS]

  • column_names (list of string, default []) – Specify columns to be used for statistic computation by column names in header

  • column_indexes (list of int, default -1) – Specify columns to be used for statistic computation by column order in header -1 indicates to compute statistics over all columns

  • bias (bool, default: True) – If False, the calculations of skewness and kurtosis are corrected for statistical bias.

  • need_run (bool, default True) – Indicate whether to run this modules

class StepwiseParam(score_name='AIC', mode='hetero', role='guest', direction='both', max_step=10, nvmin=2, nvmax=None, need_stepwise=False)

Define stepwise params

Parameters
  • score_name (str, default: 'AIC') – Specify which model selection criterion to be used, choose ‘aic’ or ‘bic’

  • mode (str, default: 'Hetero') – Indicate what mode is current task

  • role (str, default: 'Guest') – Indicate what role is current party

  • direction (str, default: 'both') – Indicate which direction to go for stepwise. ‘forward’ means forward selection; ‘backward’ means elimination; ‘both’ means possible models of both directions are examined at each step.

  • max_step (int, default: '10') – Specify total number of steps to run before forced stop.

  • nvmin (int, default: '2') – Specify the min subset size of final model, cannot be lower than 2. When nvmin > 2, the final model size may be smaller than nvmin due to max_step limit.

  • nvmax (int, default: None) – Specify the max subset size of final model, 2 <= nvmin <= nvmax. The final model size may be larger than nvmax due to max_step limit.

  • need_stepwise (bool, default False) – Indicate if this module needed to be run

class UnionParam(need_run=True, allow_missing=False, keep_duplicate=False)

Define the union method for combining multiple dTables and keep entries with the same id

Parameters
  • need_run (bool, default True) – Indicate if this module needed to be run

  • allow_missing (bool, default False) – Whether allow mismatch between feature length and header length in the result. Note that empty tables will always be skipped regardless of this param setting.

  • keep_duplicate (bool, default False) – Whether to keep entries with duplicated keys. If set to True, a new id will be generated for duplicated entry in the format {id}_{table_name}.

class ColumnExpandParam(append_header=None, method='manual', fill_value=1e-08, need_run=True)

Define method used for expanding column

Parameters
  • append_header (None, str, List[str] default: None) – Name(s) for appended feature(s). If None is given, module outputs the original input value without any operation.

  • method (str, default: 'manual') – If method is ‘manual’, use user-specified fill_value to fill in new features.

  • fill_value (int, float, str, List[int], List[float], List[str] default: 1e-8) – Used for filling expanded feature columns. If given a list, length of the list must match that of append_header

  • need_run (bool, default: True) – Indicate if this module needed to be run.

class KmeansParam(k=5, max_iter=300, tol=0.001, random_stat=None)
kint, should be larger than 1 and less than 100 in this version, default 5.

The number of the centroids to generate.

max_iterint, default 300.

Maximum number of iterations of the hetero-k-means algorithm to run.

tol : float, default 0.001. random_stat : random seed

class ScorecardParam(method='credit', offset=500, factor=20, factor_base=2, upper_limit_ratio=3, lower_limit_value=0, need_run=True)

Define method used for transforming prediction score to credit score

Parameters
  • method (str, default: 'credit') – score method, currently only supports “credit”

  • offset (int or float, default: 500) – score baseline

  • factor (int or float, default: 20) – scoring step, when odds double, result score increases by this factor

  • factor_base (int or float, default: 2) – factor base, value ln(factor_base) is used for calculating result score

  • upper_limit_ratio (int or float, default: 3) – upper bound for odds, credit score upper bound is upper_limit_ratio * offset

  • lower_limit_value (int or float, default: 0) – lower bound for result score

  • need_run (bool, default: True) – Indicate if this module needs to be run.

class SecureInformationRetrievalParam(security_level=0.5, oblivious_transfer_protocol='OT_Hauck', commutative_encryption='CommutativeEncryptionPohligHellman', non_committing_encryption='aes', key_size=1024, raw_retrieval=False)

security_level: float [0, 1]; if security_level == 0, then do raw data retrieval oblivious_transfer_protocol: OT type, only supports consts.OT_HAUCK commutative_encryption: the commutative encryption scheme used, only supports consts.CE_PH non_committing_encryption: the non-committing encryption scheme used, only supports consts.AES key_size: int >= 768, the key length of the commutative cipher raw_retrieval: bool, perform raw retrieval if raw_retrieval

class SampleWeightParam(class_weight=None, sample_weight_name=None, normalize=False, need_run=True)

Define sample weight parameters

Parameters
  • class_weight (str or dict, default None) – class weight dictionary or class weight computation mode, string value only accepts ‘balanced’; If dict provided, key should be class(label), and weight will not be normalize, e.g.: {‘0’: 1, ‘1’: 2} If both class_weight and sample_weight_name are None, return original input data.

  • sample_weight_name (str, name of column which specifies sample weight.) – feature name of sample weight; if both class_weight and sample_weight_name are None, return original input data

  • normalize (bool, default False) – whether to normalize sample weight extracted from sample_weight_name column

  • need_run (bool, default True) – whether to run this module or not