Federated Machine Learning

FederatedML implements many common machine learning algorithms for federated learning. All modules are developed in a decoupled, modular way to improve scalability. Specifically, we provide:

Federated Statistic: PSI, Union, Pearson Correlation, etc.
Federated Information Retrieval: PIR (SIR) based on OT (oblivious transfer)
Federated Feature Engineering: Feature Sampling, Feature Binning, Feature Selection, etc.
Federated Machine Learning Algorithms: LR, GBDT, DNN, transfer learning, and unsupervised learning, supporting both heterogeneous and homogeneous styles.
Model Evaluation: binary, multiclass, regression, and clustering evaluation; local vs. federated comparison.
Secure Protocol: multiple security protocols for secure multi-party computation and interaction between participants.

Algorithm List

Algorithm	Module Name	Description	Data Input	Data Output	Model Input	Model Output
DataIO	DataIO	This component transforms user-uploaded data into Instance objects (deprecated in FATE v1.7; use DataTransform instead).	Table, values are raw data.	Transformed Table, values are data instance defined here		DataIO Model
DataTransform	DataTransform	This component transforms user-uploaded data into Instance object.	Table, values are raw data.	Transformed Table, values are data instance defined here		DataTransform Model
Intersect	Intersection	Compute the intersection of multiple parties' data sets without leaking information about the difference set. Mainly used in hetero scenario tasks.	Table.	Table with only common instance keys.		Intersect Model
Federated Sampling	FederatedSample	Sample data so that its distribution becomes balanced in each party. This module supports standalone and federated versions.	Table	Table of sampled data; both random and stratified sampling methods are supported.
Feature Scale	FeatureScale	Module for feature scaling and standardization.	Table, values are instances.	Transformed Table.		Transform factors like min/max, mean/std.
Hetero Feature Binning	HeteroFeatureBinning	Bins the input data, calculates each column's IV and WOE, and transforms the data according to the binning information.	Table, values are instances.	Transformed Table.		iv/woe, split points, event count, non-event count etc. of each column.
Homo Feature Binning	HomoFeatureBinning	Calculate quantile binning through multiple parties	Table	Transformed Table		Split points of each column
OneHot Encoder	OneHotEncoder	Transfer a column into one-hot format.	Table, values are instances.	Transformed Table with new header.		Feature-name mapping between original header and new header.
Hetero Feature Selection	HeteroFeatureSelection	Provides 5 types of filters. Each filter can select columns according to user config.	Table	Transformed Table with new header and filtered data instance.	If iv filters used, hetero_binning model is needed.	Whether each column is filtered.
Union	Union	Combine multiple data tables into one.	Tables.	Table with combined values from input Tables.
Hetero-LR	HeteroLR	Build hetero logistic regression model through multiple parties.	Table, values are instances	Table, values are instances.		Logistic Regression Model, consists of model-meta and model-param.
Local Baseline	LocalBaseline	Wrapper that runs sklearn(scikit-learn) Logistic Regression model with local data.	Table, values are instances.	Table, values are instances.
Hetero-LinR	HeteroLinR	Build hetero linear regression model through multiple parties.	Table, values are instances.	Table, values are instances.		Linear Regression Model, consists of model-meta and model-param.
Hetero-Poisson	HeteroPoisson	Build hetero poisson regression model through multiple parties.	Table, values are instances.	Table, values are instances.		Poisson Regression Model, consists of model-meta and model-param.
Homo-LR	HomoLR	Build homo logistic regression model through multiple parties.	Table, values are instances.	Table, values are instances.		Logistic Regression Model, consists of model-meta and model-param.
Homo-NN	HomoNN	Build homo neural network model through multiple parties.	Table, values are instances.	Table, values are instances.		Neural Network Model, consists of model-meta and model-param.
Hetero Secure Boosting	HeteroSecureBoost	Build hetero secure boosting model through multiple parties	Table, values are instances.	Table, values are instances.		SecureBoost Model, consists of model-meta and model-param.
Hetero Fast Secure Boosting	HeteroFastSecureBoost	Build hetero secure boosting model through multiple parties in layered/mix manners.	Table, values are instances.	Table, values are instances.		FastSecureBoost Model, consists of model-meta and model-param.
Hetero Secure Boost Feature Transformer	SBTFeatureTransformer	This component can encode sample using Hetero SBT leaf indices.	Table, values are instances.	Table, values are instances.		SBT Transformer Model
Evaluation	Evaluation	Output the model evaluation metrics for user.	Table(s), values are instances.
Hetero Pearson	HeteroPearson	Calculate hetero correlation of features from different parties.	Table, values are instances.
Hetero-NN	HeteroNN	Build hetero neural network model.	Table, values are instances.	Table, values are instances.		Hetero Neural Network Model, consists of model-meta and model-param.
Homo Secure Boosting	HomoSecureBoost	Build homo secure boosting model through multiple parties	Table, values are instances.	Table, values are instances.		SecureBoost Model, consists of model-meta and model-param.
Homo OneHot Encoder	HomoOneHotEncoder	Build homo onehot encoder model through multiple parties.	Table, values are instances.	Transformed Table with new header.		Feature-name mapping between original header and new header.
Hetero Data Split	HeteroDataSplit	Split one data table into 3 tables by given ratio or count.	Table, values are instances.	3 Tables, values are instances.
Homo Data Split	HomoDataSplit	Split one data table into 3 tables by given ratio or count.	Table, values are instances.	3 Tables, values are instances.
Column Expand	ColumnExpand	Add arbitrary number of columns with user-provided values.	Table, values are raw data.	Transformed Table with added column(s) and new header.		Column Expand Model
Secure Information Retrieval	SecureInformationRetrieval	Securely retrieves information from host through oblivious transfer.	Table, values are instances.	Table, values are instances.
Hetero Federated Transfer Learning	FTL	Build hetero FTL model between 2 parties.	Table, values are instances.			Hetero FTL Model
Hetero KMeans	HeteroKMeans	Build hetero KMeans model through multiple parties.	Table, values are instances.	Table, values are instances; Arbiter outputs 2 Tables.		Hetero KMeans Model
PSI	PSI	Compute PSI value of features between two tables.	Table, values are instances.			PSI Results
Data Statistics	DataStatistics	Computes statistics on the data, including mean, maximum, minimum, median, etc.	Table, values are instances.	Table		Statistic Result
Scorecard	Scorecard	Scale predict score to credit score by given scaling parameters.	Table, values are predict scores.	Table, values are score results.
Sample Weight	SampleWeight	Assign weight to instances according to user-specified parameters.	Table, values are instances.	Table, values are weighted instances.		SampleWeight Model
Feldman Verifiable Sum	FeldmanVerifiableSum	Sums multiple private values without exposing the data.	Table, values to sum.	Table, values are sum results.
Feature Imputation	FeatureImputation	Imputes missing features using arbitrary methods/values.	Table, values are instances.	Table, values with missing features filled.		FeatureImputation Model
Label Transform	LabelTransform	Replaces label values of input data instances and predict results.	Table, values are instances or prediction results.	Table, values with transformed label values.		LabelTransform Model
Hetero SSHE Logistic Regression	HeteroSSHELR	Build hetero logistic regression model without arbiter.	Table, values are instances.	Table, values are instances.		SSHE LR Model

Secure Protocol

Params

`param` `special`

`all` `special`

Modules

`base_param`

BaseParam

Source code in federatedml/param/base_param.py

class BaseParam(metaclass=_StaticDefaultMeta):
    def __init__(self):
        pass

    def set_name(self, name: str):
        self._name = name
        return self

    def check(self):
        raise NotImplementedError("Parameter object should implement check")

    @classmethod
    def _get_or_init_deprecated_params_set(cls):
        if not hasattr(cls, _DEPRECATED_PARAMS):
            setattr(cls, _DEPRECATED_PARAMS, set())
        return getattr(cls, _DEPRECATED_PARAMS)

    def _get_or_init_feeded_deprecated_params_set(self, conf=None):
        if not hasattr(self, _FEEDED_DEPRECATED_PARAMS):
            if conf is None:
                setattr(self, _FEEDED_DEPRECATED_PARAMS, set())
            else:
                setattr(
                    self,
                    _FEEDED_DEPRECATED_PARAMS,
                    set(conf[_FEEDED_DEPRECATED_PARAMS]),
                )
        return getattr(self, _FEEDED_DEPRECATED_PARAMS)

    def _get_or_init_user_feeded_params_set(self, conf=None):
        if not hasattr(self, _USER_FEEDED_PARAMS):
            if conf is None:
                setattr(self, _USER_FEEDED_PARAMS, set())
            else:
                setattr(self, _USER_FEEDED_PARAMS, set(conf[_USER_FEEDED_PARAMS]))
        return getattr(self, _USER_FEEDED_PARAMS)

    def get_user_feeded(self):
        return self._get_or_init_user_feeded_params_set()

    def get_feeded_deprecated_params(self):
        return self._get_or_init_feeded_deprecated_params_set()

    @property
    def _deprecated_params_set(self):
        return {name: True for name in self.get_feeded_deprecated_params()}

    def as_dict(self):
        def _recursive_convert_obj_to_dict(obj):
            ret_dict = {}
            for attr_name in list(obj.__dict__):
                # get attr
                attr = getattr(obj, attr_name)
                if attr and type(attr).__name__ not in dir(builtins):
                    ret_dict[attr_name] = _recursive_convert_obj_to_dict(attr)
                else:
                    ret_dict[attr_name] = attr

            return ret_dict

        return _recursive_convert_obj_to_dict(self)

    def update(self, conf, allow_redundant=False):
        update_from_raw_conf = conf.get(_IS_RAW_CONF, True)
        if update_from_raw_conf:
            deprecated_params_set = self._get_or_init_deprecated_params_set()
            feeded_deprecated_params_set = (
                self._get_or_init_feeded_deprecated_params_set()
            )
            user_feeded_params_set = self._get_or_init_user_feeded_params_set()
            setattr(self, _IS_RAW_CONF, False)
        else:
            feeded_deprecated_params_set = (
                self._get_or_init_feeded_deprecated_params_set(conf)
            )
            user_feeded_params_set = self._get_or_init_user_feeded_params_set(conf)

        def _recursive_update_param(param, config, depth, prefix):
            if depth > consts.PARAM_MAXDEPTH:
                raise ValueError("Param define nesting too deep!!!, can not parse it")

            inst_variables = param.__dict__
            redundant_attrs = []
            for config_key, config_value in config.items():
                # redundant attr
                if config_key not in inst_variables:
                    if not update_from_raw_conf and config_key.startswith("_"):
                        setattr(param, config_key, config_value)
                    else:
                        redundant_attrs.append(config_key)
                    continue

                full_config_key = f"{prefix}{config_key}"

                if update_from_raw_conf:
                    # add user feeded params
                    user_feeded_params_set.add(full_config_key)

                    # update user feeded deprecated param set
                    if full_config_key in deprecated_params_set:
                        feeded_deprecated_params_set.add(full_config_key)

                # supported attr
                attr = getattr(param, config_key)
                if type(attr).__name__ in dir(builtins) or attr is None:
                    setattr(param, config_key, config_value)

                else:
                    # recursive set obj attr
                    sub_params = _recursive_update_param(
                        attr, config_value, depth + 1, prefix=f"{prefix}{config_key}."
                    )
                    setattr(param, config_key, sub_params)

            if not allow_redundant and redundant_attrs:
                raise ValueError(
                    f"cpn `{getattr(self, '_name', type(self))}` has redundant parameters: `{redundant_attrs}`"
                )

            return param

        return _recursive_update_param(param=self, config=conf, depth=0, prefix="")

    def extract_not_builtin(self):
        def _get_not_builtin_types(obj):
            ret_dict = {}
            for variable in obj.__dict__:
                attr = getattr(obj, variable)
                if attr and type(attr).__name__ not in dir(builtins):
                    ret_dict[variable] = _get_not_builtin_types(attr)

            return ret_dict

        return _get_not_builtin_types(self)

    def validate(self):
        self.builtin_types = dir(builtins)
        self.func = {
            "ge": self._greater_equal_than,
            "le": self._less_equal_than,
            "in": self._in,
            "not_in": self._not_in,
            "range": self._range,
        }
        home_dir = os.path.abspath(os.path.dirname(os.path.realpath(__file__)))
        param_validation_path_prefix = home_dir + "/param_validation/"

        param_name = type(self).__name__
        param_validation_path = "/".join(
            [param_validation_path_prefix, param_name + ".json"]
        )

        validation_json = None

        try:
            with open(param_validation_path, "r") as fin:
                validation_json = json.loads(fin.read())
        except (OSError, ValueError):
            # no readable validation file for this param class: skip validation
            return

        self._validate_param(self, validation_json)

    def _validate_param(self, param_obj, validation_json):
        default_section = type(param_obj).__name__
        var_list = param_obj.__dict__

        for variable in var_list:
            attr = getattr(param_obj, variable)

            if type(attr).__name__ in self.builtin_types or attr is None:
                if variable not in validation_json:
                    continue

                validation_dict = validation_json[default_section][variable]
                value = getattr(param_obj, variable)
                value_legal = False

                for op_type in validation_dict:
                    if self.func[op_type](value, validation_dict[op_type]):
                        value_legal = True
                        break

                if not value_legal:
                    raise ValueError(
                        "Please check runtime conf, {} = {} does not match user-parameter restriction".format(
                            variable, value
                        )
                    )

            elif variable in validation_json:
                self._validate_param(attr, validation_json)

    @staticmethod
    def check_string(param, descr):
        if type(param).__name__ not in ["str"]:
            raise ValueError(
                descr + " {} not supported, should be string type".format(param)
            )

    @staticmethod
    def check_positive_integer(param, descr):
        if type(param).__name__ not in ["int", "long"] or param <= 0:
            raise ValueError(
                descr + " {} not supported, should be positive integer".format(param)
            )

    @staticmethod
    def check_positive_number(param, descr):
        if type(param).__name__ not in ["float", "int", "long"] or param <= 0:
            raise ValueError(
                descr + " {} not supported, should be positive numeric".format(param)
            )

    @staticmethod
    def check_nonnegative_number(param, descr):
        if type(param).__name__ not in ["float", "int", "long"] or param < 0:
            raise ValueError(
                descr
                + " {} not supported, should be non-negative numeric".format(param)
            )

    @staticmethod
    def check_decimal_float(param, descr):
        if type(param).__name__ not in ["float", "int"] or param < 0 or param > 1:
            raise ValueError(
                descr
                + " {} not supported, should be a float number in range [0, 1]".format(
                    param
                )
            )

    @staticmethod
    def check_boolean(param, descr):
        if type(param).__name__ != "bool":
            raise ValueError(
                descr + " {} not supported, should be bool type".format(param)
            )

    @staticmethod
    def check_open_unit_interval(param, descr):
        if type(param).__name__ not in ["float"] or param <= 0 or param >= 1:
            raise ValueError(
                descr + " should be a numeric number between 0 and 1 exclusively"
            )

    @staticmethod
    def check_valid_value(param, descr, valid_values):
        if param not in valid_values:
            raise ValueError(
                descr
                + " {} is not supported, it should be in {}".format(param, valid_values)
            )

    @staticmethod
    def check_defined_type(param, descr, types):
        if type(param).__name__ not in types:
            raise ValueError(
                descr + " {} not supported, should be one of {}".format(param, types)
            )

    @staticmethod
    def check_and_change_lower(param, valid_list, descr=""):
        if type(param).__name__ != "str":
            raise ValueError(
                descr
                + " {} not supported, should be one of {}".format(param, valid_list)
            )

        lower_param = param.lower()
        if lower_param in valid_list:
            return lower_param
        else:
            raise ValueError(
                descr
                + " {} not supported, should be one of {}".format(param, valid_list)
            )

    @staticmethod
    def _greater_equal_than(value, limit):
        return value >= limit - consts.FLOAT_ZERO

    @staticmethod
    def _less_equal_than(value, limit):
        return value <= limit + consts.FLOAT_ZERO

    @staticmethod
    def _range(value, ranges):
        in_range = False
        for left_limit, right_limit in ranges:
            if (
                left_limit - consts.FLOAT_ZERO
                <= value
                <= right_limit + consts.FLOAT_ZERO
            ):
                in_range = True
                break

        return in_range

    @staticmethod
    def _in(value, right_value_list):
        return value in right_value_list

    @staticmethod
    def _not_in(value, wrong_value_list):
        return value not in wrong_value_list

    def _warn_deprecated_param(self, param_name, descr):
        if self._deprecated_params_set.get(param_name):
            LOGGER.warning(
                f"{descr} {param_name} is deprecated and ignored in this version."
            )

    def _warn_to_deprecate_param(self, param_name, descr, new_param):
        if self._deprecated_params_set.get(param_name):
            LOGGER.warning(
                f"{descr} {param_name} will be deprecated in future release; "
                f"please use {new_param} instead."
            )
            return True
        return False

__init__(self) special

Source code in federatedml/param/base_param.py

def __init__(self):
    pass

set_name(self, name)

Source code in federatedml/param/base_param.py

def set_name(self, name: str):
    self._name = name
    return self

check(self)

Source code in federatedml/param/base_param.py

def check(self):
    raise NotImplementedError("Parameter object should implement check")

get_user_feeded(self)

Source code in federatedml/param/base_param.py

def get_user_feeded(self):
    return self._get_or_init_user_feeded_params_set()

get_feeded_deprecated_params(self)

Source code in federatedml/param/base_param.py

def get_feeded_deprecated_params(self):
    return self._get_or_init_feeded_deprecated_params_set()

as_dict(self)

Source code in federatedml/param/base_param.py

def as_dict(self):
    def _recursive_convert_obj_to_dict(obj):
        ret_dict = {}
        for attr_name in list(obj.__dict__):
            # get attr
            attr = getattr(obj, attr_name)
            if attr and type(attr).__name__ not in dir(builtins):
                ret_dict[attr_name] = _recursive_convert_obj_to_dict(attr)
            else:
                ret_dict[attr_name] = attr

        return ret_dict

    return _recursive_convert_obj_to_dict(self)
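
The recursive conversion performed by `as_dict` can be sketched in isolation as follows. This is a minimal illustration, not FATE code: `to_dict`, `TreeOptions`, and `BoostOptions` are hypothetical stand-ins. Attributes that are truthy and whose type name is not a builtin are treated as nested parameter objects and descended into; everything else is copied unchanged.

```python
import builtins


def to_dict(obj):
    """Recursively convert an object's instance attributes to a plain dict."""
    result = {}
    for name in list(obj.__dict__):
        value = getattr(obj, name)
        # Truthy values of non-builtin types are treated as nested parameter objects.
        if value and type(value).__name__ not in dir(builtins):
            result[name] = to_dict(value)
        else:
            result[name] = value
    return result


class TreeOptions:  # hypothetical nested parameter object
    def __init__(self):
        self.max_depth = 3
        self.criterion = "gini"


class BoostOptions:  # hypothetical top-level parameter object
    def __init__(self):
        self.num_trees = 5
        self.tree = TreeOptions()


converted = to_dict(BoostOptions())
print(converted)
```

Because the test is on the type name, a user-defined class that happens to share a name with a builtin would be copied as-is rather than descended into.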

update(self, conf, allow_redundant=False)

Source code in federatedml/param/base_param.py

def update(self, conf, allow_redundant=False):
    update_from_raw_conf = conf.get(_IS_RAW_CONF, True)
    if update_from_raw_conf:
        deprecated_params_set = self._get_or_init_deprecated_params_set()
        feeded_deprecated_params_set = (
            self._get_or_init_feeded_deprecated_params_set()
        )
        user_feeded_params_set = self._get_or_init_user_feeded_params_set()
        setattr(self, _IS_RAW_CONF, False)
    else:
        feeded_deprecated_params_set = (
            self._get_or_init_feeded_deprecated_params_set(conf)
        )
        user_feeded_params_set = self._get_or_init_user_feeded_params_set(conf)

    def _recursive_update_param(param, config, depth, prefix):
        if depth > consts.PARAM_MAXDEPTH:
            raise ValueError("Param define nesting too deep!!!, can not parse it")

        inst_variables = param.__dict__
        redundant_attrs = []
        for config_key, config_value in config.items():
            # redundant attr
            if config_key not in inst_variables:
                if not update_from_raw_conf and config_key.startswith("_"):
                    setattr(param, config_key, config_value)
                else:
                    redundant_attrs.append(config_key)
                continue

            full_config_key = f"{prefix}{config_key}"

            if update_from_raw_conf:
                # add user feeded params
                user_feeded_params_set.add(full_config_key)

                # update user feeded deprecated param set
                if full_config_key in deprecated_params_set:
                    feeded_deprecated_params_set.add(full_config_key)

            # supported attr
            attr = getattr(param, config_key)
            if type(attr).__name__ in dir(builtins) or attr is None:
                setattr(param, config_key, config_value)

            else:
                # recursive set obj attr
                sub_params = _recursive_update_param(
                    attr, config_value, depth + 1, prefix=f"{prefix}{config_key}."
                )
                setattr(param, config_key, sub_params)

        if not allow_redundant and redundant_attrs:
            raise ValueError(
                f"cpn `{getattr(self, '_name', type(self))}` has redundant parameters: `{redundant_attrs}`"
            )

        return param

    return _recursive_update_param(param=self, config=conf, depth=0, prefix="")
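
The core of `update` is a recursive walk that overwrites builtin-typed attributes, descends into nested parameter objects, records every user-supplied key as a dotted path, and rejects unknown keys. The sketch below reproduces only that core under stated assumptions: `update_params`, `ModelOptions`, `EncryptOptions`, and `MAX_DEPTH` are hypothetical names, and the deprecated-parameter bookkeeping and the non-raw-conf branch are omitted.

```python
import builtins

MAX_DEPTH = 5  # assumed stand-in for consts.PARAM_MAXDEPTH


def update_params(param, config, fed_keys, depth=0, prefix=""):
    """Apply a nested config dict to a parameter object in place."""
    if depth > MAX_DEPTH:
        raise ValueError("parameter nesting too deep")

    redundant = []
    for key, value in config.items():
        if key not in param.__dict__:
            redundant.append(key)  # key the parameter object does not define
            continue

        fed_keys.add(f"{prefix}{key}")  # remember what the user supplied
        current = getattr(param, key)
        if current is None or type(current).__name__ in dir(builtins):
            setattr(param, key, value)  # plain value: overwrite
        else:
            # nested parameter object: recurse with an extended dotted prefix
            update_params(current, value, fed_keys, depth + 1, f"{prefix}{key}.")

    if redundant:
        raise ValueError(f"redundant parameters: {redundant}")
    return param


class EncryptOptions:  # hypothetical nested parameter object
    def __init__(self):
        self.method = "Paillier"
        self.key_length = 1024


class ModelOptions:  # hypothetical top-level parameter object
    def __init__(self):
        self.learning_rate = 0.1
        self.encrypt = EncryptOptions()


fed = set()
model = update_params(
    ModelOptions(),
    {"learning_rate": 0.3, "encrypt": {"key_length": 2048}},
    fed,
)
print(model.learning_rate, model.encrypt.key_length, sorted(fed))
```

Attributes not mentioned in the config keep their defaults, which is why `encrypt.method` is untouched here.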

extract_not_builtin(self)

Source code in federatedml/param/base_param.py

def extract_not_builtin(self):
    def _get_not_builtin_types(obj):
        ret_dict = {}
        for variable in obj.__dict__:
            attr = getattr(obj, variable)
            if attr and type(attr).__name__ not in dir(builtins):
                ret_dict[variable] = _get_not_builtin_types(attr)

        return ret_dict

    return _get_not_builtin_types(self)

validate(self)

Source code in federatedml/param/base_param.py

def validate(self):
    self.builtin_types = dir(builtins)
    self.func = {
        "ge": self._greater_equal_than,
        "le": self._less_equal_than,
        "in": self._in,
        "not_in": self._not_in,
        "range": self._range,
    }
    home_dir = os.path.abspath(os.path.dirname(os.path.realpath(__file__)))
    param_validation_path_prefix = home_dir + "/param_validation/"

    param_name = type(self).__name__
    param_validation_path = "/".join(
        [param_validation_path_prefix, param_name + ".json"]
    )

    validation_json = None

    try:
        with open(param_validation_path, "r") as fin:
            validation_json = json.loads(fin.read())
    except (OSError, ValueError):
        # no readable validation file for this param class: skip validation
        return

    self._validate_param(self, validation_json)
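
The operator table built in `validate` maps rule names to comparison helpers, and a value is accepted as soon as one rule passes. A minimal sketch of that dispatch follows. `CHECKS` and `value_is_legal` are hypothetical names, and the tolerance `FLOAT_ZERO = 1e-8` is an assumption, since the value of `consts.FLOAT_ZERO` is not shown on this page.

```python
FLOAT_ZERO = 1e-8  # assumed tolerance; the real consts.FLOAT_ZERO is not shown here

# Rule name -> predicate(value, limit), mirroring the "ge"/"le"/"in"/"not_in"/"range" table.
CHECKS = {
    "ge": lambda v, limit: v >= limit - FLOAT_ZERO,
    "le": lambda v, limit: v <= limit + FLOAT_ZERO,
    "in": lambda v, allowed: v in allowed,
    "not_in": lambda v, banned: v not in banned,
    "range": lambda v, ranges: any(
        lo - FLOAT_ZERO <= v <= hi + FLOAT_ZERO for lo, hi in ranges
    ),
}


def value_is_legal(value, rules):
    """Return True if the value satisfies at least one of the given rules."""
    return any(CHECKS[op](value, limit) for op, limit in rules.items())


print(value_is_legal(0.5, {"range": [[0, 1]]}))
print(value_is_legal(-1, {"ge": 0}))
```

Note that the rules are combined with "or": a value that fails `in` but passes `ge` is still accepted.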

check_string(param, descr) staticmethod

Source code in federatedml/param/base_param.py

@staticmethod
def check_string(param, descr):
    if type(param).__name__ not in ["str"]:
        raise ValueError(
            descr + " {} not supported, should be string type".format(param)
        )

check_positive_integer(param, descr) staticmethod

Source code in federatedml/param/base_param.py

@staticmethod
def check_positive_integer(param, descr):
    if type(param).__name__ not in ["int", "long"] or param <= 0:
        raise ValueError(
            descr + " {} not supported, should be positive integer".format(param)
        )

check_positive_number(param, descr) staticmethod

Source code in federatedml/param/base_param.py

@staticmethod
def check_positive_number(param, descr):
    if type(param).__name__ not in ["float", "int", "long"] or param <= 0:
        raise ValueError(
            descr + " {} not supported, should be positive numeric".format(param)
        )

check_nonnegative_number(param, descr) staticmethod

Source code in federatedml/param/base_param.py

@staticmethod
def check_nonnegative_number(param, descr):
    if type(param).__name__ not in ["float", "int", "long"] or param < 0:
        raise ValueError(
            descr
            + " {} not supported, should be non-negative numeric".format(param)
        )

check_decimal_float(param, descr) staticmethod

Source code in federatedml/param/base_param.py

@staticmethod
def check_decimal_float(param, descr):
    if type(param).__name__ not in ["float", "int"] or param < 0 or param > 1:
        raise ValueError(
            descr
            + " {} not supported, should be a float number in range [0, 1]".format(
                param
            )
        )

check_boolean(param, descr) staticmethod

Source code in federatedml/param/base_param.py

@staticmethod
def check_boolean(param, descr):
    if type(param).__name__ != "bool":
        raise ValueError(
            descr + " {} not supported, should be bool type".format(param)
        )

check_open_unit_interval(param, descr) staticmethod

Source code in federatedml/param/base_param.py

@staticmethod
def check_open_unit_interval(param, descr):
    if type(param).__name__ not in ["float"] or param <= 0 or param >= 1:
        raise ValueError(
            descr + " should be a numeric number between 0 and 1 exclusively"
        )

check_valid_value(param, descr, valid_values) staticmethod

Source code in federatedml/param/base_param.py

@staticmethod
def check_valid_value(param, descr, valid_values):
    if param not in valid_values:
        raise ValueError(
            descr
            + " {} is not supported, it should be in {}".format(param, valid_values)
        )

check_defined_type(param, descr, types) staticmethod

Source code in federatedml/param/base_param.py

@staticmethod
def check_defined_type(param, descr, types):
    if type(param).__name__ not in types:
        raise ValueError(
            descr + " {} not supported, should be one of {}".format(param, types)
        )

check_and_change_lower(param, valid_list, descr='') staticmethod

Source code in federatedml/param/base_param.py

@staticmethod
def check_and_change_lower(param, valid_list, descr=""):
    if type(param).__name__ != "str":
        raise ValueError(
            descr
            + " {} not supported, should be one of {}".format(param, valid_list)
        )

    lower_param = param.lower()
    if lower_param in valid_list:
        return lower_param
    else:
        raise ValueError(
            descr
            + " {} not supported, should be one of {}".format(param, valid_list)
        )

deprecated_param(*names)

Source code in federatedml/param/base_param.py

def deprecated_param(*names):
    def _decorator(cls: "BaseParam"):
        deprecated = cls._get_or_init_deprecated_params_set()
        for name in names:
            deprecated.add(name)
        return cls

    return _decorator

`boosting_param`

hetero_deprecated_param_list

homo_deprecated_param_list

Classes

ObjectiveParam (BaseParam)

Define objective parameters that are used in federated ml.

Parameters:

Name	Type	Description	Default
`objective`	`{None, 'cross_entropy', 'lse', 'lae', 'log_cosh', 'tweedie', 'fair', 'huber'}`	None in the host's config; should be a str in the guest's config. When task_type is classification, only 'cross_entropy' is supported; the other 6 types are supported in regression tasks.	`'cross_entropy'`
`params`	`None or list`	Should be a non-empty list when objective is 'tweedie', 'fair', or 'huber'. The first element should be a float greater than 0.0 when objective is 'fair' or 'huber', and a float in [1.0, 2.0) when objective is 'tweedie'.	`None`
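
The constraints in the table above can be summarized as a small standalone predicate. This is a sketch of the documented rules, not the library's `check` method: `objective_is_valid` is a hypothetical helper, and the strings `"classification"` and `"regression"` are assumed stand-ins for `consts.CLASSIFICATION` and `consts.REGRESSION`.

```python
REGRESSION_OBJECTIVES = {"lse", "lae", "log_cosh", "tweedie", "fair", "huber"}
NEEDS_PARAMS = {"tweedie", "fair", "huber"}


def objective_is_valid(objective, params=None, task_type="regression"):
    """Return True if (objective, params) satisfies the documented constraints."""
    if objective is None:  # host side: nothing to check
        return True
    if task_type == "classification":
        return objective == "cross_entropy"
    if objective not in REGRESSION_OBJECTIVES:
        return False
    if objective not in NEEDS_PARAMS:
        return True

    # 'tweedie', 'fair', and 'huber' need a non-empty list with a numeric first element
    if not isinstance(params, list) or not params:
        return False
    first = params[0]
    if isinstance(first, bool) or not isinstance(first, (int, float)):
        return False
    if objective == "tweedie":
        return 1.0 <= first < 2.0
    return first > 0.0  # 'fair' and 'huber'


print(objective_is_valid("tweedie", [1.5]))
print(objective_is_valid("huber", [0.0]))
```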

Source code in federatedml/param/boosting_param.py

class ObjectiveParam(BaseParam):
    """
    Define objective parameters that are used in federated ml.

    Parameters
    ----------
    objective : {None, 'cross_entropy', 'lse', 'lae', 'log_cosh', 'tweedie', 'fair', 'huber'}
        None in the host's config; should be a str in the guest's config.
        When task_type is classification, only 'cross_entropy' is supported;
        the other 6 types are supported in regression tasks.

    params : None or list
        Should be a non-empty list when objective is 'tweedie', 'fair', or 'huber'.
        The first element should be a float greater than 0.0 when objective is 'fair' or 'huber',
        and a float in [1.0, 2.0) when objective is 'tweedie'.
    """

    def __init__(self, objective='cross_entropy', params=None):
        self.objective = objective
        self.params = params

    def check(self, task_type=None):
        if self.objective is None:
            return True

        descr = "objective param's"

        LOGGER.debug('check objective {}'.format(self.objective))

        if task_type not in [consts.CLASSIFICATION, consts.REGRESSION]:
            self.objective = self.check_and_change_lower(
                self.objective,
                ["cross_entropy", "lse", "lae", "huber", "fair", "log_cosh", "tweedie"],
                descr)

        if task_type == consts.CLASSIFICATION:
            if self.objective != "cross_entropy":
                raise ValueError("objective param's objective {} not supported".format(self.objective))

        elif task_type == consts.REGRESSION:
            self.objective = self.check_and_change_lower(
                self.objective,
                ["lse", "lae", "huber", "fair", "log_cosh", "tweedie"],
                descr)

            params = self.params
            if self.objective in ["huber", "fair", "tweedie"]:
                if type(params).__name__ != 'list' or len(params) < 1:
                    raise ValueError(
                        "objective param's params {} not supported, should be non-empty list".format(params))

                if type(params[0]).__name__ not in ["float", "int", "long"]:
                    raise ValueError("objective param's params[0] {} not supported".format(self.params[0]))

                if self.objective == 'tweedie':
                    if params[0] < 1 or params[0] >= 2:
                        raise ValueError("in tweedie regression, objective params[0] should be in [1, 2)")

                # 'fair' and 'huber' both require a strictly positive first parameter
                if self.objective in ('fair', 'huber'):
                    if params[0] <= 0.0:
                        raise ValueError("in {} regression, objective params[0] should be greater than 0.0".format(
                            self.objective))
        return True

__init__(self, objective='cross_entropy', params=None) special ¶

Source code in federatedml/param/boosting_param.py

def __init__(self, objective='cross_entropy', params=None):
    self.objective = objective
    self.params = params

check(self, task_type=None) ¶

Source code in federatedml/param/boosting_param.py

def check(self, task_type=None):
    if self.objective is None:
        return True

    descr = "objective param's"

    LOGGER.debug('check objective {}'.format(self.objective))

    if task_type not in [consts.CLASSIFICATION, consts.REGRESSION]:
        self.objective = self.check_and_change_lower(self.objective,
                                               ["cross_entropy", "lse", "lae", "huber", "fair",
                                                "log_cosh", "tweedie"],
                                                   descr)

    if task_type == consts.CLASSIFICATION:
        if self.objective != "cross_entropy":
            raise ValueError("objective param's objective {} not supported".format(self.objective))

    elif task_type == consts.REGRESSION:
        self.objective = self.check_and_change_lower(self.objective,
                                                           ["lse", "lae", "huber", "fair", "log_cosh", "tweedie"],
                                                           descr)

        params = self.params
        if self.objective in ["huber", "fair", "tweedie"]:
            if type(params).__name__ != 'list' or len(params) < 1:
                raise ValueError(
                    "objective param's params {} not supported, should be non-empty list".format(params))

            if type(params[0]).__name__ not in ["float", "int", "long"]:
                raise ValueError("objective param's params[0] {} not supported".format(self.params[0]))

            if self.objective == 'tweedie':
                if params[0] < 1 or params[0] >= 2:
                    raise ValueError("in tweedie regression, objective params[0] should be in [1, 2)")

            # 'fair' and 'huber' both require a strictly positive first parameter
            if self.objective in ('fair', 'huber'):
                if params[0] <= 0.0:
                    raise ValueError("in {} regression, objective params[0] should be greater than 0.0".format(
                        self.objective))
    return True
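
The rules enforced by `check` can be condensed into a standalone sketch. `validate_objective` below is a hypothetical helper, not part of federatedml, and the two string constants are assumed stand-ins for `consts.CLASSIFICATION` and `consts.REGRESSION`. It only mirrors the validation logic listed above and skips the branch for other task types.

```python
CLASSIFICATION = "classification"  # assumed stand-in for consts.CLASSIFICATION
REGRESSION = "regression"          # assumed stand-in for consts.REGRESSION
REGRESSION_OBJECTIVES = ["lse", "lae", "huber", "fair", "log_cosh", "tweedie"]


def validate_objective(objective, params=None, task_type=CLASSIFICATION):
    """Hypothetical helper mirroring the rules in ObjectiveParam.check."""
    if task_type == CLASSIFICATION:
        # Classification only accepts cross entropy.
        if objective != "cross_entropy":
            raise ValueError("objective {} not supported for classification".format(objective))
        return objective

    if task_type != REGRESSION:
        raise ValueError("task_type {} is not handled by this sketch".format(task_type))

    objective = objective.lower()
    if objective not in REGRESSION_OBJECTIVES:
        raise ValueError("objective {} not supported for regression".format(objective))

    if objective in ("huber", "fair", "tweedie"):
        if not isinstance(params, list) or len(params) < 1:
            raise ValueError("params should be a non-empty list")
        first = params[0]
        # bool is a subclass of int, so it is rejected explicitly.
        if isinstance(first, bool) or not isinstance(first, (int, float)):
            raise ValueError("params[0] should be numeric")
        if objective == "tweedie" and not 1 <= first < 2:
            raise ValueError("tweedie needs params[0] in [1, 2)")
        if objective in ("huber", "fair") and first <= 0:
            raise ValueError("{} needs params[0] greater than 0".format(objective))
    return objective


print(validate_objective("cross_entropy"))               # cross_entropy
print(validate_objective("Tweedie", [1.5], REGRESSION))  # tweedie
```

Objectives that take no extra parameter, such as `lse`, pass with `params=None`.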


DecisionTreeParam (BaseParam) ¶

Defines the decision tree parameters used in federated ML.

Parameters:

Name	Type	Description	Default
`criterion_method`	`{"xgboost"}, default: "xgboost"`	the criterion function to use	`'xgboost'`
`criterion_params`	`list or dict`	should be non-empty with float elements. If a list is given, the first element is the l2 regularization value and the second is the l1 regularization value. If a dict is given, it must contain the keys 'l1' and 'l2'. Both regularization values are non-negative floats. default: [0.1, 0] or {'l1': 0, 'l2': 0.1}	`[0.1, 0]`
`max_depth`	`positive integer`	the max depth of a decision tree, default: 3	`3`
`min_sample_split`	`int`	minimum number of samples a node must hold to be split, default: 2	`2`
`min_impurity_split`	`float`	minimum gain a single split must reach, default: 1e-3	`0.001`
`min_child_weight`	`float`	minimum sum of hessians required in a child node, default: 0	`0`
`min_leaf_node`	`int`	a node holding no more than min_leaf_node samples becomes a leaf, default: 1	`1`
`max_split_nodes`	`positive integer`	at most max_split_nodes nodes are processed in parallel per batch when finding splits, to limit memory use. default: 65536	`65536`
`feature_importance_type`	`{'split', 'gain'}`	if 'split', feature importances are computed from the number of times a feature is used to split; if 'gain', from the gain of those splits. default: 'split'	`'split'`
`use_missing`	`bool`	whether to use missing values in the training process. default: False	`False`
`zero_as_missing`	`bool`	whether to treat 0 as a missing value; only used if use_missing=True. default: False	`False`
`deterministic`	`bool`	ensures stability when computing histograms. Set this to True to get the same result from the same data and parameters, at the cost of possibly slower computation.	`False`
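
As a minimal sketch of the two accepted forms of `criterion_params`, the hypothetical helper below (not part of federatedml) converts either form to the `[l2, l1]` list order described in the table. The exact-length requirement and the non-negativity check on dict values are assumptions of this sketch.

```python
def normalize_criterion_params(criterion_params):
    """Hypothetical helper: return [l2, l1] from a list or a dict."""
    if isinstance(criterion_params, dict):
        if "l1" not in criterion_params or "l2" not in criterion_params:
            raise ValueError("criterion_params dict needs both 'l1' and 'l2' keys")
        values = [criterion_params["l2"], criterion_params["l1"]]
    elif isinstance(criterion_params, list):
        if len(criterion_params) != 2:
            raise ValueError("criterion_params list needs exactly two values: l2, l1")
        values = list(criterion_params)
    else:
        raise ValueError("criterion_params should be a list or a dict")

    for value in values:
        # Regularization values must be non-negative numbers (bool is excluded).
        if isinstance(value, bool) or not isinstance(value, (int, float)) or value < 0:
            raise ValueError("regularization value {} should be a non-negative number".format(value))
    return values


print(normalize_criterion_params([0.1, 0]))               # [0.1, 0]
print(normalize_criterion_params({"l1": 0, "l2": 0.1}))   # [0.1, 0]
```

Both calls describe the same setting: an l2 value of 0.1 and an l1 value of 0.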

Source code in federatedml/param/boosting_param.py

class DecisionTreeParam(BaseParam):
    """
    Defines the decision tree parameters used in federated ML.

    Parameters
    ----------
    criterion_method : {"xgboost"}, default: "xgboost"
        the criterion function to use

    criterion_params: list or dict
        should be non-empty with float elements.
        If a list is given, the first element is the l2 regularization value and the second is
        the l1 regularization value.
        If a dict is given, it must contain the keys 'l1' and 'l2'.
        Both regularization values are non-negative floats.
        default: [0.1, 0] or {'l1': 0, 'l2': 0.1}

    max_depth: positive integer
        the max depth of a decision tree, default: 3

    min_sample_split: int
        minimum number of samples a node must hold to be split, default: 2

    min_impurity_split: float
        minimum gain a single split must reach, default: 1e-3

    min_child_weight: float
        minimum sum of hessians required in a child node, default: 0

    min_leaf_node: int
        a node holding no more than min_leaf_node samples becomes a leaf, default: 1

    max_split_nodes: positive integer
        at most max_split_nodes nodes are processed in parallel per batch
        when finding splits, to limit memory use. default: 65536

    feature_importance_type: {'split', 'gain'}
        if 'split', feature importances are computed from the number of times a feature is used to split;
        if 'gain', from the gain of those splits.
        default: 'split'

    use_missing: bool
        whether to use missing values in the training process. default: False

    zero_as_missing: bool
        whether to treat 0 as a missing value;
        only used if use_missing=True. default: False

    deterministic: bool
        ensures stability when computing histograms. Set this to True to get the same result from
        the same data and parameters, at the cost of possibly slower computation.

    """

    def __init__(self, criterion_method="xgboost", criterion_params=[0.1, 0], max_depth=3,
                 min_sample_split=2, min_impurity_split=1e-3, min_leaf_node=1,
                 max_split_nodes=consts.MAX_SPLIT_NODES, feature_importance_type="split",
                 n_iter_no_change=True, tol=0.001, min_child_weight=0,
                 use_missing=False, zero_as_missing=False, deterministic=False):

        super(DecisionTreeParam, self).__init__()

        self.criterion_method = criterion_method
        self.criterion_params = criterion_params
        self.max_depth = max_depth
        self.min_sample_split = min_sample_split
        self.min_impurity_split = min_impurity_split
        self.min_leaf_node = min_leaf_node
        self.min_child_weight = min_child_weight
        self.max_split_nodes = max_split_nodes
        self.feature_importance_type = feature_importance_type
        self.n_iter_no_change = n_iter_no_change
        self.tol = tol
        self.use_missing = use_missing
        self.zero_as_missing = zero_as_missing
        self.deterministic = deterministic

    def check(self):
        descr = "decision tree param"

        self.criterion_method = self.check_and_change_lower(self.criterion_method,
                                                             ["xgboost"],
                                                             descr)

        if len(self.criterion_params) == 0:
            raise ValueError("decision tree param's criterion_params should be non-empty")

        if type(self.criterion_params) == list:
            assert len(self.criterion_params) == 2, 'length of criterion_param should be 2: l1, l2 regularization ' \
                                                    'values are needed'
            self.check_nonnegative_number(self.criterion_params[0], 'l2 reg value')
            self.check_nonnegative_number(self.criterion_params[1], 'l1 reg value')

        elif type(self.criterion_params) == dict:
            assert 'l1' in self.criterion_params and 'l2' in self.criterion_params, 'l1 and l2 keys are needed in ' \
                                                                                    'criterion_params dict'
            self.criterion_params = [self.criterion_params['l2'], self.criterion_params['l1']]
        else:
            raise ValueError('criterion_params should be a dict or a list contains l1, l2 reg value')

        if type(self.max_depth).__name__ not in ["int", "long"]:
            raise ValueError("decision tree param's max_depth {} not supported, should be integer".format(
                self.max_depth))

        if self.max_depth < 1:
            raise ValueError("decision tree param's max_depth should be positive integer, no less than 1")

        if type(self.min_sample_split).__name__ not in ["int", "long"]:
            raise ValueError("decision tree param's min_sample_split {} not supported, should be integer".format(
                self.min_sample_split))

        if type(self.min_impurity_split).__name__ not in ["int", "long", "float"]:
            raise ValueError("decision tree param's min_impurity_split {} not supported, should be numeric".format(
                self.min_impurity_split))

        if type(self.min_leaf_node).__name__ not in ["int", "long"]:
            raise ValueError("decision tree param's min_leaf_node {} not supported, should be integer".format(
                self.min_leaf_node))

        if type(self.max_split_nodes).__name__ not in ["int", "long"] or self.max_split_nodes < 1:
            # adjacent literals are joined first, so format() fills both placeholders
            raise ValueError("decision tree param's max_split_nodes {} not supported, "
                             "should be positive integer between 1 and {}".format(self.max_split_nodes,
                                                                                  consts.MAX_SPLIT_NODES))

        if type(self.n_iter_no_change).__name__ != "bool":
            raise ValueError("decision tree param's n_iter_no_change {} not supported, should be bool type".format(
                self.n_iter_no_change))

        if type(self.tol).__name__ not in ["float", "int", "long"]:
            raise ValueError("decision tree param's tol {} not supported, should be numeric".format(self.tol))

        self.feature_importance_type = self.check_and_change_lower(self.feature_importance_type,
                                                                    ["split", "gain"],
                                                                    descr)

        self.check_nonnegative_number(self.min_child_weight, 'min_child_weight')
        self.check_boolean(self.deterministic, 'deterministic')

        return True

__init__(self, criterion_method='xgboost', criterion_params=[0.1, 0], max_depth=3, min_sample_split=2, min_impurity_split=0.001, min_leaf_node=1, max_split_nodes=65536, feature_importance_type='split', n_iter_no_change=True, tol=0.001, min_child_weight=0, use_missing=False, zero_as_missing=False, deterministic=False) special ¶

Source code in federatedml/param/boosting_param.py

def __init__(self, criterion_method="xgboost", criterion_params=[0.1, 0], max_depth=3,
             min_sample_split=2, min_impurity_split=1e-3, min_leaf_node=1,
             max_split_nodes=consts.MAX_SPLIT_NODES, feature_importance_type="split",
             n_iter_no_change=True, tol=0.001, min_child_weight=0,
             use_missing=False, zero_as_missing=False, deterministic=False):

    super(DecisionTreeParam, self).__init__()

    self.criterion_method = criterion_method
    self.criterion_params = criterion_params
    self.max_depth = max_depth
    self.min_sample_split = min_sample_split
    self.min_impurity_split = min_impurity_split
    self.min_leaf_node = min_leaf_node
    self.min_child_weight = min_child_weight
    self.max_split_nodes = max_split_nodes
    self.feature_importance_type = feature_importance_type
    self.n_iter_no_change = n_iter_no_change
    self.tol = tol
    self.use_missing = use_missing
    self.zero_as_missing = zero_as_missing
    self.deterministic = deterministic

check(self) ¶

Source code in federatedml/param/boosting_param.py

def check(self):
    descr = "decision tree param"

    self.criterion_method = self.check_and_change_lower(self.criterion_method,
                                                         ["xgboost"],
                                                         descr)

    if len(self.criterion_params) == 0:
        raise ValueError("decision tree param's criterion_params should be non-empty")

    if type(self.criterion_params) == list:
        assert len(self.criterion_params) == 2, 'length of criterion_param should be 2: l1, l2 regularization ' \
                                                'values are needed'
        self.check_nonnegative_number(self.criterion_params[0], 'l2 reg value')
        self.check_nonnegative_number(self.criterion_params[1], 'l1 reg value')

    elif type(self.criterion_params) == dict:
        assert 'l1' in self.criterion_params and 'l2' in self.criterion_params, 'l1 and l2 keys are needed in ' \
                                                                                'criterion_params dict'
        self.criterion_params = [self.criterion_params['l2'], self.criterion_params['l1']]
    else:
        raise ValueError('criterion_params should be a dict or a list contains l1, l2 reg value')

    if type(self.max_depth).__name__ not in ["int", "long"]:
        raise ValueError("decision tree param's max_depth {} not supported, should be integer".format(
            self.max_depth))

    if self.max_depth < 1:
        raise ValueError("decision tree param's max_depth should be positive integer, no less than 1")

    if type(self.min_sample_split).__name__ not in ["int", "long"]:
        raise ValueError("decision tree param's min_sample_split {} not supported, should be integer".format(
            self.min_sample_split))

    if type(self.min_impurity_split).__name__ not in ["int", "long", "float"]:
        raise ValueError("decision tree param's min_impurity_split {} not supported, should be numeric".format(
            self.min_impurity_split))

    if type(self.min_leaf_node).__name__ not in ["int", "long"]:
        raise ValueError("decision tree param's min_leaf_node {} not supported, should be integer".format(
            self.min_leaf_node))

    if type(self.max_split_nodes).__name__ not in ["int", "long"] or self.max_split_nodes < 1:
        # adjacent literals are joined first, so format() fills both placeholders
        raise ValueError("decision tree param's max_split_nodes {} not supported, "
                         "should be positive integer between 1 and {}".format(self.max_split_nodes,
                                                                              consts.MAX_SPLIT_NODES))

    if type(self.n_iter_no_change).__name__ != "bool":
        raise ValueError("decision tree param's n_iter_no_change {} not supported, should be bool type".format(
            self.n_iter_no_change))

    if type(self.tol).__name__ not in ["float", "int", "long"]:
        raise ValueError("decision tree param's tol {} not supported, should be numeric".format(self.tol))

    self.feature_importance_type = self.check_and_change_lower(self.feature_importance_type,
                                                                ["split", "gain"],
                                                                descr)

    self.check_nonnegative_number(self.min_child_weight, 'min_child_weight')
    self.check_boolean(self.deterministic, 'deterministic')

    return True


BoostingParam (BaseParam) ¶

Basic parameters for boosting algorithms.

Parameters:

Name	Type	Description	Default
`task_type`	`{'classification', 'regression'}, default: 'classification'`	task type	`'classification'`
`objective_param`	`ObjectiveParam Object, default: ObjectiveParam()`	objective param	`ObjectiveParam()`
`learning_rate`	`float, int or long`	the learning rate of secure boost. default: 0.3	`0.3`
`num_trees`	`int or float`	the maximum number of boosting rounds. default: 5	`5`
`subsample_feature_rate`	`float`	a float-number in [0, 1], default: 1.0	`1`
`n_iter_no_change`	`bool`	when True, tree building stops once the residual error is less than tol. default: True	`True`
`bin_num`	`positive integer greater than 1`	number of bins used in quantile binning. default: 32	`32`
`validation_freqs`	`None or positive integer or container object in python`	Whether to run validation during training. If None, no validation is run; if a positive integer, validation runs every validation_freqs epochs; if a Python container, validation runs when the epoch belongs to the container, e.g. validation_freqs = [10, 15] validates at epochs 10 and 15. Default: None	`None`
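
The three forms of `validation_freqs` can be illustrated with a small schedule function. This is a sketch only: `should_validate` is a hypothetical name, and it assumes epochs are numbered from 1, which the table does not state.

```python
from collections.abc import Container


def should_validate(epoch, validation_freqs):
    """Hypothetical helper: decide whether to validate at a 1-based epoch."""
    if validation_freqs is None:
        return False  # validation is disabled
    if isinstance(validation_freqs, int):
        return epoch % validation_freqs == 0  # every validation_freqs epochs
    if isinstance(validation_freqs, Container):
        return epoch in validation_freqs  # only at the listed epochs
    raise ValueError("validation_freqs should be None, a positive integer or a container")


print([e for e in range(1, 21) if should_validate(e, 5)])         # [5, 10, 15, 20]
print([e for e in range(1, 21) if should_validate(e, [10, 15])])  # [10, 15]
print([e for e in range(1, 21) if should_validate(e, None)])      # []
```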

Source code in federatedml/param/boosting_param.py

class BoostingParam(BaseParam):
    """
    Basic parameter for Boosting Algorithms

    Parameters
    ----------
    task_type : {'classification', 'regression'}, default: 'classification'
        task type

    objective_param : ObjectiveParam Object, default: ObjectiveParam()
        objective param

    learning_rate : float, int or long
        the learning rate of secure boost. default: 0.3

    num_trees : int or float
        the maximum number of boosting rounds. default: 5

    subsample_feature_rate : float
        a float-number in [0, 1], default: 1.0

    n_iter_no_change : bool
        when True, tree building stops once the residual error is less than tol. default: True

    bin_num: positive integer greater than 1
        number of bins used in quantile binning. default: 32

    validation_freqs: None or positive integer or container object in python
        Whether to run validation during training.
        If None, no validation is run;
        if a positive integer, validation runs every validation_freqs epochs;
        if a Python container, validation runs when the epoch belongs to the container,
        e.g. validation_freqs = [10, 15] validates at epochs 10 and 15.
        Default: None
    """

    def __init__(self,  task_type=consts.CLASSIFICATION,
                 objective_param=ObjectiveParam(),
                 learning_rate=0.3, num_trees=5, subsample_feature_rate=1, n_iter_no_change=True,
                 tol=0.0001, bin_num=32,
                 predict_param=PredictParam(), cv_param=CrossValidationParam(),
                 validation_freqs=None, metrics=None, random_seed=100,
                 binning_error=consts.DEFAULT_RELATIVE_ERROR):

        super(BoostingParam, self).__init__()

        self.task_type = task_type
        self.objective_param = copy.deepcopy(objective_param)
        self.learning_rate = learning_rate
        self.num_trees = num_trees
        self.subsample_feature_rate = subsample_feature_rate
        self.n_iter_no_change = n_iter_no_change
        self.tol = tol
        self.bin_num = bin_num
        self.predict_param = copy.deepcopy(predict_param)
        self.cv_param = copy.deepcopy(cv_param)
        self.validation_freqs = validation_freqs
        self.metrics = metrics
        self.random_seed = random_seed
        self.binning_error = binning_error

    def check(self):

        descr = "boosting tree param's"

        if self.task_type not in [consts.CLASSIFICATION, consts.REGRESSION]:
            raise ValueError("boosting_core tree param's task_type {} not supported, should be {} or {}".format(
                self.task_type, consts.CLASSIFICATION, consts.REGRESSION))

        self.objective_param.check(self.task_type)

        if type(self.learning_rate).__name__ not in ["float", "int", "long"]:
            raise ValueError("boosting_core tree param's learning_rate {} not supported, should be numeric".format(
                self.learning_rate))

        if type(self.subsample_feature_rate).__name__ not in ["float", "int", "long"] or \
                self.subsample_feature_rate < 0 or self.subsample_feature_rate > 1:
            raise ValueError("boosting_core tree param's subsample_feature_rate should be a numeric number between 0 and 1")

        if type(self.n_iter_no_change).__name__ != "bool":
            raise ValueError("boosting_core tree param's n_iter_no_change {} not supported, should be bool type".format(
                self.n_iter_no_change))

        if type(self.tol).__name__ not in ["float", "int", "long"]:
            raise ValueError("boosting_core tree param's tol {} not supported, should be numeric".format(self.tol))

        if type(self.bin_num).__name__ not in ["int", "long"] or self.bin_num < 2:
            raise ValueError(
                "boosting_core tree param's bin_num {} not supported, should be positive integer greater than 1".format(
                    self.bin_num))

        if self.validation_freqs is None:
            pass
        elif isinstance(self.validation_freqs, int):
            if self.validation_freqs < 1:
                raise ValueError("validation_freqs should be larger than 0 when it's integer")
        elif not isinstance(self.validation_freqs, collections.Container):
            raise ValueError("validation_freqs should be None or positive integer or container")

        if self.metrics is not None and not isinstance(self.metrics, list):
            raise ValueError("metrics should be a list")

        if self.random_seed is not None:
            assert type(self.random_seed) == int and self.random_seed >= 0, 'random seed must be an integer >= 0'

        self.check_decimal_float(self.binning_error, descr)

        return True

__init__(self, task_type='classification', objective_param=ObjectiveParam(), learning_rate=0.3, num_trees=5, subsample_feature_rate=1, n_iter_no_change=True, tol=0.0001, bin_num=32, predict_param=PredictParam(), cv_param=CrossValidationParam(), validation_freqs=None, metrics=None, random_seed=100, binning_error=0.0001) special ¶

Source code in federatedml/param/boosting_param.py

def __init__(self,  task_type=consts.CLASSIFICATION,
             objective_param=ObjectiveParam(),
             learning_rate=0.3, num_trees=5, subsample_feature_rate=1, n_iter_no_change=True,
             tol=0.0001, bin_num=32,
             predict_param=PredictParam(), cv_param=CrossValidationParam(),
             validation_freqs=None, metrics=None, random_seed=100,
             binning_error=consts.DEFAULT_RELATIVE_ERROR):

    super(BoostingParam, self).__init__()

    self.task_type = task_type
    self.objective_param = copy.deepcopy(objective_param)
    self.learning_rate = learning_rate
    self.num_trees = num_trees
    self.subsample_feature_rate = subsample_feature_rate
    self.n_iter_no_change = n_iter_no_change
    self.tol = tol
    self.bin_num = bin_num
    self.predict_param = copy.deepcopy(predict_param)
    self.cv_param = copy.deepcopy(cv_param)
    self.validation_freqs = validation_freqs
    self.metrics = metrics
    self.random_seed = random_seed
    self.binning_error = binning_error

check(self) ¶

Source code in federatedml/param/boosting_param.py

def check(self):

    descr = "boosting tree param's"

    if self.task_type not in [consts.CLASSIFICATION, consts.REGRESSION]:
        raise ValueError("boosting_core tree param's task_type {} not supported, should be {} or {}".format(
            self.task_type, consts.CLASSIFICATION, consts.REGRESSION))

    self.objective_param.check(self.task_type)

    if type(self.learning_rate).__name__ not in ["float", "int", "long"]:
        raise ValueError("boosting_core tree param's learning_rate {} not supported, should be numeric".format(
            self.learning_rate))

    if type(self.subsample_feature_rate).__name__ not in ["float", "int", "long"] or \
            self.subsample_feature_rate < 0 or self.subsample_feature_rate > 1:
        raise ValueError("boosting_core tree param's subsample_feature_rate should be a numeric number between 0 and 1")

    if type(self.n_iter_no_change).__name__ != "bool":
        raise ValueError("boosting_core tree param's n_iter_no_change {} not supported, should be bool type".format(
            self.n_iter_no_change))

    if type(self.tol).__name__ not in ["float", "int", "long"]:
        raise ValueError("boosting_core tree param's tol {} not supported, should be numeric".format(self.tol))

    if type(self.bin_num).__name__ not in ["int", "long"] or self.bin_num < 2:
        raise ValueError(
            "boosting_core tree param's bin_num {} not supported, should be positive integer greater than 1".format(
                self.bin_num))

    if self.validation_freqs is None:
        pass
    elif isinstance(self.validation_freqs, int):
        if self.validation_freqs < 1:
            raise ValueError("validation_freqs should be larger than 0 when it's integer")
    elif not isinstance(self.validation_freqs, collections.Container):
        raise ValueError("validation_freqs should be None or positive integer or container")

    if self.metrics is not None and not isinstance(self.metrics, list):
        raise ValueError("metrics should be a list")

    if self.random_seed is not None:
        assert type(self.random_seed) == int and self.random_seed >= 0, 'random seed must be an integer >= 0'

    self.check_decimal_float(self.binning_error, descr)

    return True
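
The `validation_freqs` branch above tests against `collections.Container`. That alias was removed from the `collections` module in Python 3.10, where the class is only available as `collections.abc.Container`. A minimal sketch of the same check written against `collections.abc` follows; `check_validation_freqs` is a hypothetical standalone function, not the library's method.

```python
import collections.abc


def check_validation_freqs(validation_freqs):
    """Hypothetical standalone version of the validation_freqs check."""
    if validation_freqs is None:
        return True
    if isinstance(validation_freqs, int):
        if validation_freqs < 1:
            raise ValueError("validation_freqs should be larger than 0 when it's integer")
        return True
    # collections.abc.Container covers lists, tuples, sets, dicts, ranges, etc.
    if not isinstance(validation_freqs, collections.abc.Container):
        raise ValueError("validation_freqs should be None or positive integer or container")
    return True


print(check_validation_freqs(None))      # True
print(check_validation_freqs(3))         # True
print(check_validation_freqs([10, 15]))  # True
```

A float such as `1.5` is neither an integer nor a container, so it is rejected.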


HeteroBoostingParam (BoostingParam) ¶

Parameters:

Name	Type	Description	Default
`encrypt_param`	`EncryptParam object`	encryption method used in secure boost, default: EncryptParam()	`EncryptParam()`
`encrypted_mode_calculator_param`	`EncryptedModeCalculatorParam object`	the calculation mode used in secureboost, default: EncryptedModeCalculatorParam()	`EncryptedModeCalculatorParam()`

Source code in federatedml/param/boosting_param.py

class HeteroBoostingParam(BoostingParam):

    """
    Parameters
    ----------
    encrypt_param : EncryptParam object
        encryption method used in secure boost, default: EncryptParam()

    encrypted_mode_calculator_param: EncryptedModeCalculatorParam object
        the calculation mode used in secureboost,
        default: EncryptedModeCalculatorParam()
    """

    def __init__(self, task_type=consts.CLASSIFICATION,
                 objective_param=ObjectiveParam(),
                 learning_rate=0.3, num_trees=5, subsample_feature_rate=1, n_iter_no_change=True,
                 tol=0.0001, encrypt_param=EncryptParam(),
                 bin_num=32,
                 encrypted_mode_calculator_param=EncryptedModeCalculatorParam(),
                 predict_param=PredictParam(), cv_param=CrossValidationParam(),
                 validation_freqs=None, early_stopping_rounds=None, metrics=None, use_first_metric_only=False,
                 random_seed=100, binning_error=consts.DEFAULT_RELATIVE_ERROR):

        super(HeteroBoostingParam, self).__init__(task_type, objective_param, learning_rate, num_trees,
                                                  subsample_feature_rate, n_iter_no_change, tol, bin_num,
                                                  predict_param, cv_param, validation_freqs, metrics=metrics,
                                                  random_seed=random_seed,
                                                  binning_error=binning_error)

        self.encrypt_param = copy.deepcopy(encrypt_param)
        self.encrypted_mode_calculator_param = copy.deepcopy(encrypted_mode_calculator_param)
        self.early_stopping_rounds = early_stopping_rounds
        self.use_first_metric_only = use_first_metric_only

    def check(self):

        super(HeteroBoostingParam, self).check()
        self.encrypted_mode_calculator_param.check()
        self.encrypt_param.check()

        if self.early_stopping_rounds is None:
            pass
        elif isinstance(self.early_stopping_rounds, int):
            if self.early_stopping_rounds < 1:
                raise ValueError("early stopping rounds should be larger than 0 when it's integer")
            if self.validation_freqs is None:
                raise ValueError("validation freqs must be set when early stopping is enabled")

        if not isinstance(self.use_first_metric_only, bool):
            raise ValueError("use_first_metric_only should be a boolean")

        return True

__init__(self, task_type='classification', objective_param=ObjectiveParam(), learning_rate=0.3, num_trees=5, subsample_feature_rate=1, n_iter_no_change=True, tol=0.0001, encrypt_param=EncryptParam(), bin_num=32, encrypted_mode_calculator_param=EncryptedModeCalculatorParam(), predict_param=PredictParam(), cv_param=CrossValidationParam(), validation_freqs=None, early_stopping_rounds=None, metrics=None, use_first_metric_only=False, random_seed=100, binning_error=0.0001) special ¶

Source code in federatedml/param/boosting_param.py

def __init__(self, task_type=consts.CLASSIFICATION,
             objective_param=ObjectiveParam(),
             learning_rate=0.3, num_trees=5, subsample_feature_rate=1, n_iter_no_change=True,
             tol=0.0001, encrypt_param=EncryptParam(),
             bin_num=32,
             encrypted_mode_calculator_param=EncryptedModeCalculatorParam(),
             predict_param=PredictParam(), cv_param=CrossValidationParam(),
             validation_freqs=None, early_stopping_rounds=None, metrics=None, use_first_metric_only=False,
             random_seed=100, binning_error=consts.DEFAULT_RELATIVE_ERROR):

    super(HeteroBoostingParam, self).__init__(task_type, objective_param, learning_rate, num_trees,
                                              subsample_feature_rate, n_iter_no_change, tol, bin_num,
                                              predict_param, cv_param, validation_freqs, metrics=metrics,
                                              random_seed=random_seed,
                                              binning_error=binning_error)

    self.encrypt_param = copy.deepcopy(encrypt_param)
    self.encrypted_mode_calculator_param = copy.deepcopy(encrypted_mode_calculator_param)
    self.early_stopping_rounds = early_stopping_rounds
    self.use_first_metric_only = use_first_metric_only

check(self) ¶

Source code in federatedml/param/boosting_param.py

def check(self):

    super(HeteroBoostingParam, self).check()
    self.encrypted_mode_calculator_param.check()
    self.encrypt_param.check()

    if self.early_stopping_rounds is None:
        pass
    elif isinstance(self.early_stopping_rounds, int):
        if self.early_stopping_rounds < 1:
            raise ValueError("early stopping rounds should be larger than 0 when it's integer")
        if self.validation_freqs is None:
            raise ValueError("validation freqs must be set when early stopping is enabled")

    if not isinstance(self.use_first_metric_only, bool):
        raise ValueError("use_first_metric_only should be a boolean")

    return True
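
The early-stopping rule in `check()` can be restated as a standalone function. This is a minimal sketch rather than FATE code: `check_early_stopping` and `is_valid` are hypothetical names, and only the two parameters involved are modelled. One detail is visible in the listing above: a value that is neither `None` nor an `int` (for example a float) passes this check without raising.

```python
def check_early_stopping(early_stopping_rounds, validation_freqs):
    """Hypothetical standalone restatement of the early-stopping rule in check()."""
    if early_stopping_rounds is None:
        return True  # early stopping disabled, nothing to validate
    if isinstance(early_stopping_rounds, int):
        if early_stopping_rounds < 1:
            raise ValueError("early stopping rounds should be larger than 0 when it's integer")
        if validation_freqs is None:
            raise ValueError("validation freqs must be set when early stopping is enabled")
    # Any other type (e.g. a float) falls through without an error,
    # mirroring the if/elif structure of the method above.
    return True


def is_valid(early_stopping_rounds, validation_freqs):
    """Convenience wrapper (hypothetical): True if the check passes, False if it raises."""
    try:
        return check_early_stopping(early_stopping_rounds, validation_freqs)
    except ValueError:
        return False
```

For example, `is_valid(3, 1)` passes, while `is_valid(3, None)` fails because early stopping needs validation rounds to compare against.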


HeteroSecureBoostParam (HeteroBoostingParam) ¶

Defines the boosting tree parameters used in federated ML.

Parameters:

Name	Type	Description	Default
`task_type`	`{'classification', 'regression'}, default: 'classification'`	task type	`'classification'`
`tree_param`	`DecisionTreeParam`	tree param	`DecisionTreeParam()`
`objective_param`	`ObjectiveParam Object, default: ObjectiveParam()`	objective param	`ObjectiveParam()`
`learning_rate`	`float, int or long`	the learning rate of secure boost. default: 0.3	`0.3`
`num_trees`	`int or float`	the max number of trees to build. default: 5	`5`
`subsample_feature_rate`	`float`	a float-number in [0, 1], default: 1.0	`1.0`
`random_seed`	`int`	seed that controls all random functions	`100`
`n_iter_no_change`	`bool`	when True, tree building stops once the residual error is less than tol. default: True	`True`
`encrypt_param`	`EncryptParam object`	encryption method used in secure boost, default: EncryptParam(); this parameter applies only to hetero-secureboost	`EncryptParam()`
`bin_num`	`positive integer greater than 1`	number of bins used in quantile binning. default: 32	`32`
`encrypted_mode_calculator_param`	`EncryptedModeCalculatorParam object`	the calculation mode used in secureboost, default: EncryptedModeCalculatorParam(); only for hetero-secureboost	`EncryptedModeCalculatorParam()`
`use_missing`	`bool`	whether to use missing values in the training process. default: False	`False`
`zero_as_missing`	`bool`	whether to treat 0 as a missing value; only used if use_missing=True. default: False	`False`
`validation_freqs`	`None or positive integer or container object in python`	Whether to run validation during training. If None, no validation is performed. If a positive integer, validation runs every validation_freqs epochs. If a container object, validation runs when the epoch belongs to the container, e.g. validation_freqs = [10, 15] validates at epochs 10 and 15. The default is None; 1 is suggested. A value larger than 1 speeds up training by skipping validation rounds; in that case a value that evenly divides "num_trees" is recommended, otherwise the validation scores of the last training iteration will be missed.	`None`
`early_stopping_rounds`	`integer larger than 0`	training stops if one metric on one validation data set does not improve in the last early_stopping_rounds rounds. Requires validation_freqs to be set; early stopping is checked at every validation epoch.	`None`
`metrics`	`list, default: []`	Specifies which metrics are used for evaluation during training. If empty, default metrics are used. For regression tasks, the default metrics are ['root_mean_squared_error', 'mean_absolute_error']. For binary-classification tasks, the default metrics are ['auc', 'ks']. For multi-classification tasks, the default metrics are ['accuracy', 'precision', 'recall'].	`None`
`use_first_metric_only`	`bool`	use only the first metric for early stopping	`False`
`complete_secure`	`bool`	if True, build the first tree using only guest features	`False`
`sparse_optimization`	`bool`	Available when the encryption method is 'iterativeAffine'. An optimized mode for high-dimensional, sparse data.	`False`
`run_goss`	`bool`	activate Gradient-based One-Side Sampling, which selects large gradient and small gradient samples using top_rate and other_rate.	`False`
`top_rate`	`float`	the retain ratio of large gradient data, used when run_goss is True	`0.2`
`other_rate`	`float`	the retain ratio of small gradient data, used when run_goss is True	`0.1`
`cipher_compress_error`	`{None}`	This param is now abandoned	`None`
`cipher_compress`	`bool`	default is True, use cipher compressing to reduce computation cost and transfer cost	`True`
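
The effect of `validation_freqs` on which epochs get validated can be sketched as follows. This is an illustration under stated assumptions, not FATE's scheduling code: `validated_epochs` is a hypothetical helper, and epochs are assumed to be counted from 1 to `num_trees`.

```python
def validated_epochs(num_trees, validation_freqs):
    """Return the epochs at which validation would run (hypothetical helper).

    Assumption: epochs are numbered 1..num_trees.
    """
    if validation_freqs is None:
        return []  # validation disabled
    epochs = range(1, num_trees + 1)
    if isinstance(validation_freqs, int):
        # validate every `validation_freqs` epochs
        return [e for e in epochs if e % validation_freqs == 0]
    # container case: validate only at the listed epochs
    return [e for e in epochs if e in validation_freqs]
```

Under these assumptions, `validated_epochs(5, 2)` yields epochs 2 and 4, so the fifth and final tree is never validated. That is the situation the description warns about when the frequency does not evenly divide `num_trees`.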

Source code in federatedml/param/boosting_param.py

class HeteroSecureBoostParam(HeteroBoostingParam):
    """
    Defines the boosting tree parameters used in federated ML.

    Parameters
    ----------
    task_type : {'classification', 'regression'}, default: 'classification'
        task type

    tree_param : DecisionTreeParam Object, default: DecisionTreeParam()
        tree param

    objective_param : ObjectiveParam Object, default: ObjectiveParam()
        objective param

    learning_rate : float, int or long
        the learning rate of secure boost. default: 0.3

    num_trees : int or float
        the max number of trees to build. default: 5

    subsample_feature_rate : float
        a float-number in [0, 1], default: 1.0

    random_seed: int
        seed that controls all random functions

    n_iter_no_change : bool
        when True, tree building stops once the residual error is less than tol. default: True

    encrypt_param : EncryptParam object
        encryption method used in secure boost, default: EncryptParam(); this parameter
        applies only to hetero-secureboost

    bin_num: positive integer greater than 1
        number of bins used in quantile binning. default: 32

    encrypted_mode_calculator_param: EncryptedModeCalculatorParam object
        the calculation mode use in secureboost, default: EncryptedModeCalculatorParam(), only for hetero-secureboost

    use_missing: bool
        use missing value in training process or not. default: False

    zero_as_missing: bool
        regard 0 as missing value or not, will be use only if use_missing=True, default: False

    validation_freqs: None or positive integer or container object in python
        Whether to run validation during training.
        If None, no validation is performed;
        if a positive integer, validation runs every validation_freqs epochs;
        if a container object, validation runs when the epoch belongs to the container,
        e.g. validation_freqs = [10, 15] validates at epochs 10 and 15.
        The default is None; 1 is suggested. A value larger than 1 speeds up training by
        skipping validation rounds; in that case a value that evenly divides "num_trees"
        is recommended, otherwise the validation scores of the last training iteration
        will be missed.

    early_stopping_rounds: integer larger than 0
        training stops if one metric on one validation data set
        does not improve in the last early_stopping_rounds rounds.
        Requires validation_freqs to be set; early stopping is checked at every validation epoch.

    metrics: list, default: []
        Specifies which metrics are used for evaluation during training.
        If empty, default metrics are used. For regression tasks, the default metrics are
        ['root_mean_squared_error', 'mean_absolute_error']. For binary-classification tasks, the default
        metrics are ['auc', 'ks']. For multi-classification tasks, the default metrics are
        ['accuracy', 'precision', 'recall'].

    use_first_metric_only: bool
        use only the first metric for early stopping

    complete_secure: bool
        if True, build the first tree using only guest features

    sparse_optimization: bool
        Available when encrypted method is 'iterativeAffine'
        An optimized mode for high-dimension, sparse data.

    run_goss: bool
        activate Gradient-based One-Side Sampling, which selects large gradient and small
        gradient samples using top_rate and other_rate.

    top_rate: float
        the retain ratio of large gradient data, used when run_goss is True

    other_rate: float
        the retain ratio of small gradient data, used when run_goss is True

    cipher_compress_error: {None}
        This param is now abandoned

    cipher_compress: bool
        default is True, use cipher compressing to reduce computation cost and transfer cost

    """

    def __init__(self, tree_param: DecisionTreeParam = DecisionTreeParam(), task_type=consts.CLASSIFICATION,
                 objective_param=ObjectiveParam(),
                 learning_rate=0.3, num_trees=5, subsample_feature_rate=1.0, n_iter_no_change=True,
                 tol=0.0001, encrypt_param=EncryptParam(),
                 bin_num=32,
                 encrypted_mode_calculator_param=EncryptedModeCalculatorParam(),
                 predict_param=PredictParam(), cv_param=CrossValidationParam(),
                 validation_freqs=None, early_stopping_rounds=None, use_missing=False, zero_as_missing=False,
                 complete_secure=False, metrics=None, use_first_metric_only=False, random_seed=100,
                 binning_error=consts.DEFAULT_RELATIVE_ERROR,
                 sparse_optimization=False, run_goss=False, top_rate=0.2, other_rate=0.1,
                 cipher_compress_error=None, cipher_compress=True, new_ver=True,
                 callback_param=CallbackParam()):

        super(HeteroSecureBoostParam, self).__init__(task_type, objective_param, learning_rate, num_trees,
                                                     subsample_feature_rate, n_iter_no_change, tol, encrypt_param,
                                                     bin_num, encrypted_mode_calculator_param, predict_param, cv_param,
                                                     validation_freqs, early_stopping_rounds, metrics=metrics,
                                                     use_first_metric_only=use_first_metric_only,
                                                     random_seed=random_seed,
                                                     binning_error=binning_error)

        self.tree_param = copy.deepcopy(tree_param)
        self.zero_as_missing = zero_as_missing
        self.use_missing = use_missing
        self.complete_secure = complete_secure
        self.sparse_optimization = sparse_optimization
        self.run_goss = run_goss
        self.top_rate = top_rate
        self.other_rate = other_rate
        self.cipher_compress_error = cipher_compress_error
        self.cipher_compress = cipher_compress
        self.new_ver = new_ver
        self.callback_param = copy.deepcopy(callback_param)

    def check(self):

        super(HeteroSecureBoostParam, self).check()
        self.tree_param.check()
        if type(self.use_missing) != bool:
            raise ValueError('use missing should be bool type')
        if type(self.zero_as_missing) != bool:
            raise ValueError('zero as missing should be bool type')
        self.check_boolean(self.complete_secure, 'complete_secure')
        self.check_boolean(self.sparse_optimization, 'sparse optimization')
        self.check_boolean(self.run_goss, 'run goss')
        self.check_decimal_float(self.top_rate, 'top rate')
        self.check_decimal_float(self.other_rate, 'other rate')
        self.check_positive_number(self.other_rate, 'other_rate')
        self.check_positive_number(self.top_rate, 'top_rate')
        self.check_boolean(self.new_ver, 'code version switcher')
        self.check_boolean(self.cipher_compress, 'cipher compress')

        for p in ["early_stopping_rounds", "validation_freqs", "metrics",
                  "use_first_metric_only"]:
            # if self._warn_to_deprecate_param(p, "", ""):
            if self._deprecated_params_set.get(p):
                if "callback_param" in self.get_user_feeded():
                    raise ValueError(f"{p} and callback param should not be set simultaneously，"
                                     f"{self._deprecated_params_set}, {self.get_user_feeded()}")
                else:
                    self.callback_param.callbacks = ["PerformanceEvaluate"]
                break

        descr = "boosting_param's"

        if self._warn_to_deprecate_param("validation_freqs", descr, "callback_param's 'validation_freqs'"):
            self.callback_param.validation_freqs = self.validation_freqs

        if self._warn_to_deprecate_param("early_stopping_rounds", descr, "callback_param's 'early_stopping_rounds'"):
            self.callback_param.early_stopping_rounds = self.early_stopping_rounds

        if self._warn_to_deprecate_param("metrics", descr, "callback_param's 'metrics'"):
            self.callback_param.metrics = self.metrics

        if self._warn_to_deprecate_param("use_first_metric_only", descr, "callback_param's 'use_first_metric_only'"):
            self.callback_param.use_first_metric_only = self.use_first_metric_only

        if self.top_rate + self.other_rate >= 1:
            raise ValueError('sum of top rate and other rate should be smaller than 1')

        if self.sparse_optimization and self.cipher_compress:
            raise ValueError('cipher compress is not supported in sparse optimization mode')

        return True
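
The deprecation handling in `check()` moves the legacy top-level settings into `callback_param`. The sketch below restates that flow in isolation. It is not FATE code: `CallbackSketch` is a hypothetical stand-in for `CallbackParam`, `migrate` is a hypothetical function, and a plain dict of user-supplied values stands in for `_deprecated_params_set` and `get_user_feeded()`.

```python
from dataclasses import dataclass, field

# Legacy parameter names that check() forwards to callback_param.
DEPRECATED = ["early_stopping_rounds", "validation_freqs", "metrics", "use_first_metric_only"]


@dataclass
class CallbackSketch:
    """Hypothetical stand-in for CallbackParam; only the fields check() touches."""
    callbacks: list = field(default_factory=list)
    validation_freqs: object = None
    early_stopping_rounds: object = None
    metrics: object = None
    use_first_metric_only: bool = False


def migrate(user_params):
    """Map explicitly set legacy parameters onto a callback object.

    `user_params` is a dict of the values the user set explicitly
    (an assumed representation for this sketch).
    """
    used = [p for p in DEPRECATED if p in user_params]
    if used and "callback_param" in user_params:
        # Mixing the legacy parameters with callback_param is rejected.
        raise ValueError(f"{used[0]} and callback param should not be set simultaneously")
    if "callback_param" in user_params:
        return user_params["callback_param"]
    callback = CallbackSketch()
    if used:
        callback.callbacks = ["PerformanceEvaluate"]
    for name in used:
        setattr(callback, name, user_params[name])
    return callback
```

With this sketch, `migrate({"validation_freqs": 1, "metrics": ["auc"]})` returns a callback object whose `callbacks` list is `["PerformanceEvaluate"]` and whose `validation_freqs` and `metrics` carry the legacy values.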

__init__(self, tree_param=DecisionTreeParam(), task_type='classification', objective_param=ObjectiveParam(), learning_rate=0.3, num_trees=5, subsample_feature_rate=1.0, n_iter_no_change=True, tol=0.0001, encrypt_param=EncryptParam(), bin_num=32, encrypted_mode_calculator_param=EncryptedModeCalculatorParam(), predict_param=PredictParam(), cv_param=CrossValidationParam(), validation_freqs=None, early_stopping_rounds=None, use_missing=False, zero_as_missing=False, complete_secure=False, metrics=None, use_first_metric_only=False, random_seed=100, binning_error=0.0001, sparse_optimization=False, run_goss=False, top_rate=0.2, other_rate=0.1, cipher_compress_error=None, cipher_compress=True, new_ver=True, callback_param=CallbackParam())

special ¶

Source code in federatedml/param/boosting_param.py

def __init__(self, tree_param: DecisionTreeParam = DecisionTreeParam(), task_type=consts.CLASSIFICATION,
             objective_param=ObjectiveParam(),
             learning_rate=0.3, num_trees=5, subsample_feature_rate=1.0, n_iter_no_change=True,
             tol=0.0001, encrypt_param=EncryptParam(),
             bin_num=32,
             encrypted_mode_calculator_param=EncryptedModeCalculatorParam(),
             predict_param=PredictParam(), cv_param=CrossValidationParam(),
             validation_freqs=None, early_stopping_rounds=None, use_missing=False, zero_as_missing=False,
             complete_secure=False, metrics=None, use_first_metric_only=False, random_seed=100,
             binning_error=consts.DEFAULT_RELATIVE_ERROR,
             sparse_optimization=False, run_goss=False, top_rate=0.2, other_rate=0.1,
             cipher_compress_error=None, cipher_compress=True, new_ver=True,
             callback_param=CallbackParam()):

    super(HeteroSecureBoostParam, self).__init__(task_type, objective_param, learning_rate, num_trees,
                                                 subsample_feature_rate, n_iter_no_change, tol, encrypt_param,
                                                 bin_num, encrypted_mode_calculator_param, predict_param, cv_param,
                                                 validation_freqs, early_stopping_rounds, metrics=metrics,
                                                 use_first_metric_only=use_first_metric_only,
                                                 random_seed=random_seed,
                                                 binning_error=binning_error)

    self.tree_param = copy.deepcopy(tree_param)
    self.zero_as_missing = zero_as_missing
    self.use_missing = use_missing
    self.complete_secure = complete_secure
    self.sparse_optimization = sparse_optimization
    self.run_goss = run_goss
    self.top_rate = top_rate
    self.other_rate = other_rate
    self.cipher_compress_error = cipher_compress_error
    self.cipher_compress = cipher_compress
    self.new_ver = new_ver
    self.callback_param = copy.deepcopy(callback_param)

check(self) ¶

Source code in federatedml/param/boosting_param.py

def check(self):

    super(HeteroSecureBoostParam, self).check()
    self.tree_param.check()
    if type(self.use_missing) != bool:
        raise ValueError('use missing should be bool type')
    if type(self.zero_as_missing) != bool:
        raise ValueError('zero as missing should be bool type')
    self.check_boolean(self.complete_secure, 'complete_secure')
    self.check_boolean(self.sparse_optimization, 'sparse optimization')
    self.check_boolean(self.run_goss, 'run goss')
    self.check_decimal_float(self.top_rate, 'top rate')
    self.check_decimal_float(self.other_rate, 'other rate')
    self.check_positive_number(self.other_rate, 'other_rate')
    self.check_positive_number(self.top_rate, 'top_rate')
    self.check_boolean(self.new_ver, 'code version switcher')
    self.check_boolean(self.cipher_compress, 'cipher compress')

    for p in ["early_stopping_rounds", "validation_freqs", "metrics",
              "use_first_metric_only"]:
        # if self._warn_to_deprecate_param(p, "", ""):
        if self._deprecated_params_set.get(p):
            if "callback_param" in self.get_user_feeded():
                raise ValueError(f"{p} and callback param should not be set simultaneously，"
                                 f"{self._deprecated_params_set}, {self.get_user_feeded()}")
            else:
                self.callback_param.callbacks = ["PerformanceEvaluate"]
            break

    descr = "boosting_param's"

    if self._warn_to_deprecate_param("validation_freqs", descr, "callback_param's 'validation_freqs'"):
        self.callback_param.validation_freqs = self.validation_freqs

    if self._warn_to_deprecate_param("early_stopping_rounds", descr, "callback_param's 'early_stopping_rounds'"):
        self.callback_param.early_stopping_rounds = self.early_stopping_rounds

    if self._warn_to_deprecate_param("metrics", descr, "callback_param's 'metrics'"):
        self.callback_param.metrics = self.metrics

    if self._warn_to_deprecate_param("use_first_metric_only", descr, "callback_param's 'use_first_metric_only'"):
        self.callback_param.use_first_metric_only = self.use_first_metric_only

    if self.top_rate + self.other_rate >= 1:
        raise ValueError('sum of top rate and other rate should be smaller than 1')

    if self.sparse_optimization and self.cipher_compress:
        raise ValueError('cipher compress is not supported in sparse optimization mode')

    return True
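
The `top_rate` and `other_rate` constraint checked above comes from how Gradient-based One-Side Sampling (GOSS) works. The following is a minimal sketch of GOSS as described for LightGBM, using only the standard library. `goss_sample` is a hypothetical function, and FATE's own implementation may differ in details such as tie-breaking and rounding.

```python
import random


def goss_sample(gradients, top_rate=0.2, other_rate=0.1, seed=100):
    """Return {sample_index: weight} for the samples kept by GOSS (sketch)."""
    if top_rate <= 0 or other_rate <= 0 or top_rate + other_rate >= 1:
        raise ValueError("sum of top rate and other rate should be smaller than 1")
    n = len(gradients)
    top_n = int(n * top_rate)
    other_n = int(n * other_rate)
    # Rank samples by absolute gradient, largest first.
    order = sorted(range(n), key=lambda i: abs(gradients[i]), reverse=True)
    top_idx, rest = order[:top_n], order[top_n:]
    # Randomly keep a fraction of the small-gradient samples.
    other_idx = random.Random(seed).sample(rest, other_n)
    # Up-weight the sampled small-gradient data to compensate for the subsampling.
    small_weight = (1 - top_rate) / other_rate
    weights = {i: 1.0 for i in top_idx}
    weights.update({i: small_weight for i in other_idx})
    return weights
```

With the defaults, 100 samples are reduced to 30: the 20 with the largest gradients at weight 1, plus 10 randomly chosen others at weight (1 - 0.2) / 0.1 = 8. If `top_rate + other_rate` reached 1 there would be nothing left to subsample, which is why `check()` rejects that case.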


HeteroFastSecureBoostParam (HeteroSecureBoostParam) ¶

Source code in federatedml/param/boosting_param.py

class HeteroFastSecureBoostParam(HeteroSecureBoostParam):

    def __init__(self, tree_param: DecisionTreeParam = DecisionTreeParam(), task_type=consts.CLASSIFICATION,
                 objective_param=ObjectiveParam(),
                 learning_rate=0.3, num_trees=5, subsample_feature_rate=1, n_iter_no_change=True,
                 tol=0.0001, encrypt_param=EncryptParam(),
                 bin_num=32,
                 encrypted_mode_calculator_param=EncryptedModeCalculatorParam(),
                 predict_param=PredictParam(), cv_param=CrossValidationParam(),
                 validation_freqs=None, early_stopping_rounds=None, use_missing=False, zero_as_missing=False,
                 complete_secure=False, tree_num_per_party=1, guest_depth=1, host_depth=1, work_mode='mix', metrics=None,
                 sparse_optimization=False, random_seed=100, binning_error=consts.DEFAULT_RELATIVE_ERROR,
                 cipher_compress_error=None, new_ver=True, run_goss=False, top_rate=0.2, other_rate=0.1,
                 cipher_compress=True, callback_param=CallbackParam()):

        """
        Parameters
        ----------
        work_mode: {"mix", "layered"}
            mix: alternate between guest and host features to build trees. For example, the first 'tree_num_per_party' trees use guest features,
                 the next 'tree_num_per_party' trees use host features, and so on
            layered: supports only 2 parties; in layered mode, the first 'host_depth' layers use host features,
                     and the next 'guest_depth' layers use only guest features
        tree_num_per_party: int
            the parties take turns building 'tree_num_per_party' trees each until the max tree number is reached; valid when work_mode is mix
        guest_depth: int
            guest builds the last guest_depth layers of a decision tree using guest features; valid when work_mode is layered
        host_depth: int
            host builds the first host_depth layers of a decision tree using host features; valid when work_mode is layered

        """

        super(HeteroFastSecureBoostParam, self).__init__(tree_param, task_type, objective_param, learning_rate,
                                                         num_trees, subsample_feature_rate, n_iter_no_change, tol,
                                                         encrypt_param, bin_num, encrypted_mode_calculator_param,
                                                         predict_param, cv_param, validation_freqs, early_stopping_rounds,
                                                         use_missing, zero_as_missing, complete_secure, metrics=metrics,
                                                         random_seed=random_seed,
                                                         sparse_optimization=sparse_optimization,
                                                         binning_error=binning_error,
                                                         cipher_compress_error=cipher_compress_error,
                                                         new_ver=new_ver,
                                                         cipher_compress=cipher_compress,
                                                         run_goss=run_goss, top_rate=top_rate, other_rate=other_rate,
                                                         )

        self.tree_num_per_party = tree_num_per_party
        self.guest_depth = guest_depth
        self.host_depth = host_depth
        self.work_mode = work_mode
        self.callback_param = copy.deepcopy(callback_param)

    def check(self):

        super(HeteroFastSecureBoostParam, self).check()
        if type(self.guest_depth).__name__ not in ["int", "long"] or self.guest_depth <= 0:
            raise ValueError("guest_depth should be larger than 0")
        if type(self.host_depth).__name__ not in ["int", "long"] or self.host_depth <= 0:
            raise ValueError("host_depth should be larger than 0")
        if type(self.tree_num_per_party).__name__ not in ["int", "long"] or self.tree_num_per_party <= 0:
            raise ValueError("tree_num_per_party should be larger than 0")

        work_modes = [consts.MIX_TREE, consts.LAYERED_TREE]
        if self.work_mode not in work_modes:
            raise ValueError('only work_modes: {} are supported, input work mode is {}'.
                             format(work_modes, self.work_mode))

        return True

Methods¶

__init__(self, tree_param=DecisionTreeParam(), task_type='classification', objective_param=ObjectiveParam(), learning_rate=0.3, num_trees=5, subsample_feature_rate=1, n_iter_no_change=True, tol=0.0001, encrypt_param=EncryptParam(), bin_num=32, encrypted_mode_calculator_param=EncryptedModeCalculatorParam(), predict_param=PredictParam(), cv_param=CrossValidationParam(), validation_freqs=None, early_stopping_rounds=None, use_missing=False, zero_as_missing=False, complete_secure=False, tree_num_per_party=1, guest_depth=1, host_depth=1, work_mode='mix', metrics=None, sparse_optimization=False, random_seed=100, binning_error=0.0001, cipher_compress_error=None, new_ver=True, run_goss=False, top_rate=0.2, other_rate=0.1, cipher_compress=True, callback_param=CallbackParam())

special ¶

Parameters:

Name	Type	Description	Default
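`work_mode`	`{"mix", "layered"}`	mix: alternate between guest and host features to build trees; for example, the first 'tree_num_per_party' trees use guest features, the next 'tree_num_per_party' trees use host features, and so on. layered: supports only 2 parties; the first 'host_depth' layers use host features, and the next 'guest_depth' layers use only guest features	`'mix'`
`tree_num_per_party`	`int`	the parties take turns building 'tree_num_per_party' trees each until the max tree number is reached; valid when work_mode is mix	`1`
`guest_depth`	`int`	guest builds the last guest_depth layers of a decision tree using guest features; valid when work_mode is layered	`1`
`host_depth`	`int`	host builds the first host_depth layers of a decision tree using host features; valid when work_mode is layered	`1`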
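
The two work modes can be pictured with a small scheduling sketch. The helpers `tree_owner` and `layer_owner` are hypothetical and not part of FATE. Tree and layer indices are assumed to start at 0, and in mix mode the parties are assumed to rotate in the order guest, host.

```python
def tree_owner(tree_idx, tree_num_per_party, parties=("guest", "host")):
    """mix mode: which party's features are used for tree number `tree_idx` (0-based)."""
    return parties[(tree_idx // tree_num_per_party) % len(parties)]


def layer_owner(depth, host_depth, guest_depth):
    """layered mode: which party's features are used at tree layer `depth` (0-based)."""
    if depth < host_depth:
        return "host"  # the first host_depth layers use host features
    if depth < host_depth + guest_depth:
        return "guest"  # the following guest_depth layers use guest features
    raise ValueError("depth exceeds host_depth + guest_depth")
```

For instance, with `tree_num_per_party=2` the first six trees are assigned guest, guest, host, host, guest, guest. With `host_depth=2` and `guest_depth=1`, layers 0 and 1 belong to the host and layer 2 to the guest.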

Source code in federatedml/param/boosting_param.py

def __init__(self, tree_param: DecisionTreeParam = DecisionTreeParam(), task_type=consts.CLASSIFICATION,
             objective_param=ObjectiveParam(),
             learning_rate=0.3, num_trees=5, subsample_feature_rate=1, n_iter_no_change=True,
             tol=0.0001, encrypt_param=EncryptParam(),
             bin_num=32,
             encrypted_mode_calculator_param=EncryptedModeCalculatorParam(),
             predict_param=PredictParam(), cv_param=CrossValidationParam(),
             validation_freqs=None, early_stopping_rounds=None, use_missing=False, zero_as_missing=False,
             complete_secure=False, tree_num_per_party=1, guest_depth=1, host_depth=1, work_mode='mix', metrics=None,
             sparse_optimization=False, random_seed=100, binning_error=consts.DEFAULT_RELATIVE_ERROR,
             cipher_compress_error=None, new_ver=True, run_goss=False, top_rate=0.2, other_rate=0.1,
             cipher_compress=True, callback_param=CallbackParam()):

    """
    Parameters
    ----------
    work_mode: {"mix", "layered"}
        mix: alternate between guest and host features to build trees. For example, the first 'tree_num_per_party' trees use guest features,
             the next 'tree_num_per_party' trees use host features, and so on
        layered: supports only 2 parties; in layered mode, the first 'host_depth' layers use host features,
                 and the next 'guest_depth' layers use only guest features
    tree_num_per_party: int
        the parties take turns building 'tree_num_per_party' trees each until the max tree number is reached; valid when work_mode is mix
    guest_depth: int
        guest builds the last guest_depth layers of a decision tree using guest features; valid when work_mode is layered
    host_depth: int
        host builds the first host_depth layers of a decision tree using host features; valid when work_mode is layered

    """

    super(HeteroFastSecureBoostParam, self).__init__(tree_param, task_type, objective_param, learning_rate,
                                                     num_trees, subsample_feature_rate, n_iter_no_change, tol,
                                                     encrypt_param, bin_num, encrypted_mode_calculator_param,
                                                     predict_param, cv_param, validation_freqs, early_stopping_rounds,
                                                     use_missing, zero_as_missing, complete_secure, metrics=metrics,
                                                     random_seed=random_seed,
                                                     sparse_optimization=sparse_optimization,
                                                     binning_error=binning_error,
                                                     cipher_compress_error=cipher_compress_error,
                                                     new_ver=new_ver,
                                                     cipher_compress=cipher_compress,
                                                     run_goss=run_goss, top_rate=top_rate, other_rate=other_rate,
                                                     )

    self.tree_num_per_party = tree_num_per_party
    self.guest_depth = guest_depth
    self.host_depth = host_depth
    self.work_mode = work_mode
    self.callback_param = copy.deepcopy(callback_param)

check(self) ¶

Source code in federatedml/param/boosting_param.py

def check(self):

    super(HeteroFastSecureBoostParam, self).check()
    if type(self.guest_depth).__name__ not in ["int", "long"] or self.guest_depth <= 0:
        raise ValueError("guest_depth should be larger than 0")
    if type(self.host_depth).__name__ not in ["int", "long"] or self.host_depth <= 0:
        raise ValueError("host_depth should be larger than 0")
    if type(self.tree_num_per_party).__name__ not in ["int", "long"] or self.tree_num_per_party <= 0:
        raise ValueError("tree_num_per_party should be larger than 0")

    work_modes = [consts.MIX_TREE, consts.LAYERED_TREE]
    if self.work_mode not in work_modes:
        raise ValueError('only work_modes: {} are supported, input work mode is {}'.
                         format(work_modes, self.work_mode))

    return True
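
The depth and tree-count checks compare `type(x).__name__` against `"int"` and `"long"` rather than using `isinstance`. A practical consequence is that booleans and floats are rejected, even though `isinstance(True, int)` is true in Python. The sketch below isolates that test; `is_positive_int` is a hypothetical helper name.

```python
def is_positive_int(value):
    """Mirror the type-name test used for guest_depth, host_depth and tree_num_per_party."""
    # type(True).__name__ is "bool" and type(1.0).__name__ is "float",
    # so both are rejected before the value comparison is evaluated.
    return type(value).__name__ in ["int", "long"] and value > 0
```

So `is_positive_int(1)` is accepted, whereas `True`, `1.0` and `0` are all rejected.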


HomoSecureBoostParam (BoostingParam) ¶

Parameters:

Name	Type	Description	Default
`backend`	`{'distributed', 'memory'}`	decides which backend to use when computing histograms for homo-sbt	`'distributed'`

Source code in federatedml/param/boosting_param.py

class HomoSecureBoostParam(BoostingParam):

    """
    Parameters
    ----------
    backend: {'distributed', 'memory'}
        decides which backend to use when computing histograms for homo-sbt
    """

    def __init__(self, tree_param: DecisionTreeParam = DecisionTreeParam(), task_type=consts.CLASSIFICATION,
                 objective_param=ObjectiveParam(),
                 learning_rate=0.3, num_trees=5, subsample_feature_rate=1, n_iter_no_change=True,
                 tol=0.0001, bin_num=32, predict_param=PredictParam(), cv_param=CrossValidationParam(),
                 validation_freqs=None, use_missing=False, zero_as_missing=False, random_seed=100,
                 binning_error=consts.DEFAULT_RELATIVE_ERROR, backend=consts.DISTRIBUTED_BACKEND,
                 callback_param=CallbackParam()):
        super(HomoSecureBoostParam, self).__init__(task_type=task_type,
                                                   objective_param=objective_param,
                                                   learning_rate=learning_rate,
                                                   num_trees=num_trees,
                                                   subsample_feature_rate=subsample_feature_rate,
                                                   n_iter_no_change=n_iter_no_change,
                                                   tol=tol,
                                                   bin_num=bin_num,
                                                   predict_param=predict_param,
                                                   cv_param=cv_param,
                                                   validation_freqs=validation_freqs,
                                                   random_seed=random_seed,
                                                   binning_error=binning_error
                                                   )
        self.use_missing = use_missing
        self.zero_as_missing = zero_as_missing
        self.tree_param = copy.deepcopy(tree_param)
        self.backend = backend
        self.callback_param = copy.deepcopy(callback_param)

    def check(self):

        super(HomoSecureBoostParam, self).check()
        self.tree_param.check()
        if type(self.use_missing) != bool:
            raise ValueError('use missing should be bool type')
        if type(self.zero_as_missing) != bool:
            raise ValueError('zero as missing should be bool type')
        if self.backend not in [consts.MEMORY_BACKEND, consts.DISTRIBUTED_BACKEND]:
            raise ValueError('unsupported backend')

        for p in ["validation_freqs", "metrics"]:
            # if self._warn_to_deprecate_param(p, "", ""):
            if self._deprecated_params_set.get(p):
                if "callback_param" in self.get_user_feeded():
                    raise ValueError(f"{p} and callback param should not be set simultaneously, "
                                     f"{self._deprecated_params_set}, {self.get_user_feeded()}")
                else:
                    self.callback_param.callbacks = ["PerformanceEvaluate"]
                break

        descr = "boosting_param's"

        if self._warn_to_deprecate_param("validation_freqs", descr, "callback_param's 'validation_freqs'"):
            self.callback_param.validation_freqs = self.validation_freqs

        if self._warn_to_deprecate_param("metrics", descr, "callback_param's 'metrics'"):
            self.callback_param.metrics = self.metrics

        return True

__init__(self, tree_param=DecisionTreeParam(), task_type='classification', objective_param=ObjectiveParam(), learning_rate=0.3, num_trees=5, subsample_feature_rate=1, n_iter_no_change=True, tol=0.0001, bin_num=32, predict_param=PredictParam(), cv_param=CrossValidationParam(), validation_freqs=None, use_missing=False, zero_as_missing=False, random_seed=100, binning_error=0.0001, backend='distributed', callback_param=CallbackParam())

special ¶

Source code in federatedml/param/boosting_param.py

def __init__(self, tree_param: DecisionTreeParam = DecisionTreeParam(), task_type=consts.CLASSIFICATION,
             objective_param=ObjectiveParam(),
             learning_rate=0.3, num_trees=5, subsample_feature_rate=1, n_iter_no_change=True,
             tol=0.0001, bin_num=32, predict_param=PredictParam(), cv_param=CrossValidationParam(),
             validation_freqs=None, use_missing=False, zero_as_missing=False, random_seed=100,
             binning_error=consts.DEFAULT_RELATIVE_ERROR, backend=consts.DISTRIBUTED_BACKEND,
             callback_param=CallbackParam()):
    super(HomoSecureBoostParam, self).__init__(task_type=task_type,
                                               objective_param=objective_param,
                                               learning_rate=learning_rate,
                                               num_trees=num_trees,
                                               subsample_feature_rate=subsample_feature_rate,
                                               n_iter_no_change=n_iter_no_change,
                                               tol=tol,
                                               bin_num=bin_num,
                                               predict_param=predict_param,
                                               cv_param=cv_param,
                                               validation_freqs=validation_freqs,
                                               random_seed=random_seed,
                                               binning_error=binning_error
                                               )
    self.use_missing = use_missing
    self.zero_as_missing = zero_as_missing
    self.tree_param = copy.deepcopy(tree_param)
    self.backend = backend
    self.callback_param = copy.deepcopy(callback_param)

check(self) ¶

Source code in federatedml/param/boosting_param.py

def check(self):

    super(HomoSecureBoostParam, self).check()
    self.tree_param.check()
    if type(self.use_missing) != bool:
        raise ValueError('use missing should be bool type')
    if type(self.zero_as_missing) != bool:
        raise ValueError('zero as missing should be bool type')
    if self.backend not in [consts.MEMORY_BACKEND, consts.DISTRIBUTED_BACKEND]:
        raise ValueError('unsupported backend')

    for p in ["validation_freqs", "metrics"]:
        # if self._warn_to_deprecate_param(p, "", ""):
        if self._deprecated_params_set.get(p):
            if "callback_param" in self.get_user_feeded():
                raise ValueError(f"{p} and callback param should not be set simultaneously, "
                                 f"{self._deprecated_params_set}, {self.get_user_feeded()}")
            else:
                self.callback_param.callbacks = ["PerformanceEvaluate"]
            break

    descr = "boosting_param's"

    if self._warn_to_deprecate_param("validation_freqs", descr, "callback_param's 'validation_freqs'"):
        self.callback_param.validation_freqs = self.validation_freqs

    if self._warn_to_deprecate_param("metrics", descr, "callback_param's 'metrics'"):
        self.callback_param.metrics = self.metrics

    return True
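
The deprecation handling in check() can be sketched in isolation. The snippet below is a minimal stand-alone illustration, not the FATE implementation: the function name `migrate_deprecated` and the `user_set` argument (standing in for the parameters the user explicitly supplied) are hypothetical, and the callback settings are modelled as a plain dict.

```python
def migrate_deprecated(user_set, validation_freqs=None, metrics=None):
    """Build a callback config from legacy top-level settings.

    user_set: names of the parameters the user explicitly supplied
    (a hypothetical stand-in for the bookkeeping done by the param class).
    """
    legacy = [p for p in ("validation_freqs", "metrics") if p in user_set]

    # Legacy settings and callback_param are mutually exclusive.
    if legacy and "callback_param" in user_set:
        raise ValueError(
            f"{legacy[0]} and callback param should not be set simultaneously"
        )

    callback = {"callbacks": [], "validation_freqs": None, "metrics": []}
    if legacy:
        # Any legacy setting switches on performance evaluation.
        callback["callbacks"] = ["PerformanceEvaluate"]
    if "validation_freqs" in user_set:
        callback["validation_freqs"] = validation_freqs
    if "metrics" in user_set:
        callback["metrics"] = metrics
    return callback
```

For example, `migrate_deprecated({"validation_freqs"}, validation_freqs=2)` yields a config with `callbacks` set to `["PerformanceEvaluate"]` and `validation_freqs` set to 2, while passing both `"metrics"` and `"callback_param"` in `user_set` raises a ValueError.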

`callback_param` ¶

Classes¶


CallbackParam            (BaseParam)

¶

Defines the callback methods used in federated ML.

Parameters:

Name	Type	Description	Default
`callbacks`	`list, default: []`	Callback functions to run during the training process. Accepted values: {'EarlyStopping', 'ModelCheckpoint', 'PerformanceEvaluate'}	`None`
`validation_freqs`	`{None, int, list, tuple, set}`	Validation frequency during training.	`None`
`early_stopping_rounds`	`None or int`	Training stops if one metric doesn't improve in the last early_stopping_rounds rounds. When set, validation_freqs must also be set.	`None`
`metrics`	`None, or list`	Metrics to use when evaluating during training. If empty, the default metrics for the task type are used. For binary classification, the default metrics are ['auc', 'ks']	`None`
`use_first_metric_only`	`bool, default: False`	Whether to use only the first metric for the early stopping decision.	`False`
`save_freq`	`int, default: 1`	The callbacks save the model every save_freq epochs	`1`
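
As an illustration of how these fields fit together, the sketch below shows a possible callback configuration as a plain dict and a stand-alone validator that mirrors the rules in CallbackParam.check(). The function name `check_callback_config` is hypothetical and the dict layout is an assumption; it is not the FATE API.

```python
def check_callback_config(cfg):
    """Validate a callback config dict with the same rules as CallbackParam.check()."""
    rounds = cfg.get("early_stopping_rounds")
    if isinstance(rounds, int):
        if rounds < 1:
            raise ValueError("early stopping rounds should be larger than 0")
        # Early stopping needs validation results to compare against.
        if cfg.get("validation_freqs") is None:
            raise ValueError("validation freqs must be set when early stopping is enabled")

    metrics = cfg.get("metrics")
    if metrics is not None and not isinstance(metrics, list):
        raise ValueError("metrics should be a list")

    if not isinstance(cfg.get("use_first_metric_only", False), bool):
        raise ValueError("use_first_metric_only should be a boolean")
    return True


# Example config: validate every epoch, stop after 3 rounds without improvement.
callback_param = {
    "callbacks": ["EarlyStopping", "PerformanceEvaluate"],
    "validation_freqs": 1,
    "early_stopping_rounds": 3,
    "metrics": ["auc", "ks"],
    "use_first_metric_only": True,
    "save_freq": 1,
}
check_callback_config(callback_param)
```

Removing `validation_freqs` from this config while keeping `early_stopping_rounds` would make the validator raise, which is the main interaction to keep in mind.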

Source code in federatedml/param/callback_param.py

class CallbackParam(BaseParam):
    """
    Define callback method that used in federated ml.

    Parameters
    ----------
    callbacks : list, default: []
        Callback functions to run during the training process.
        Accepted values: {'EarlyStopping', 'ModelCheckpoint', 'PerformanceEvaluate'}

    validation_freqs: {None, int, list, tuple, set}
        Validation frequency during training.

    early_stopping_rounds: None or int
        Training stops if one metric doesn't improve in the last early_stopping_rounds rounds

    metrics: None, or list
        Metrics to use when evaluating during training. If empty, the default metrics for the
        task type are used. For binary classification, the default metrics are
        ['auc', 'ks']

    use_first_metric_only: bool, default: False
        Whether to use only the first metric for the early stopping decision.

    save_freq: int, default: 1
        The callbacks save the model every save_freq epochs


    """

    def __init__(self, callbacks=None, validation_freqs=None, early_stopping_rounds=None,
                 metrics=None, use_first_metric_only=False, save_freq=1):
        super(CallbackParam, self).__init__()
        self.callbacks = callbacks or []
        self.validation_freqs = validation_freqs
        self.early_stopping_rounds = early_stopping_rounds
        self.metrics = metrics or []
        self.use_first_metric_only = use_first_metric_only
        self.save_freq = save_freq

    def check(self):

        if self.early_stopping_rounds is None:
            pass
        elif isinstance(self.early_stopping_rounds, int):
            if self.early_stopping_rounds < 1:
                raise ValueError("early stopping rounds should be larger than 0 when it's integer")
            if self.validation_freqs is None:
                raise ValueError("validation freqs must be set when early stopping is enabled")

        if self.metrics is not None and not isinstance(self.metrics, list):
            raise ValueError("metrics should be a list")

        if not isinstance(self.use_first_metric_only, bool):
            raise ValueError("use_first_metric_only should be a boolean")

        return True

__init__(self, callbacks=None, validation_freqs=None, early_stopping_rounds=None, metrics=None, use_first_metric_only=False, save_freq=1)

special ¶

Source code in federatedml/param/callback_param.py

def __init__(self, callbacks=None, validation_freqs=None, early_stopping_rounds=None,
             metrics=None, use_first_metric_only=False, save_freq=1):
    super(CallbackParam, self).__init__()
    self.callbacks = callbacks or []
    self.validation_freqs = validation_freqs
    self.early_stopping_rounds = early_stopping_rounds
    self.metrics = metrics or []
    self.use_first_metric_only = use_first_metric_only
    self.save_freq = save_freq

check(self) ¶

Source code in federatedml/param/callback_param.py

def check(self):

    if self.early_stopping_rounds is None:
        pass
    elif isinstance(self.early_stopping_rounds, int):
        if self.early_stopping_rounds < 1:
            raise ValueError("early stopping rounds should be larger than 0 when it's integer")
        if self.validation_freqs is None:
            raise ValueError("validation freqs must be set when early stopping is enabled")

    if self.metrics is not None and not isinstance(self.metrics, list):
        raise ValueError("metrics should be a list")

    if not isinstance(self.use_first_metric_only, bool):
        raise ValueError("use_first_metric_only should be a boolean")

    return True

`column_expand_param` ¶

Classes¶


ColumnExpandParam            (BaseParam)

¶

Defines the method used for expanding columns.

Parameters:

Name	Type	Description	Default
`append_header`	`None or List[str], default: None`	Name(s) for the appended feature(s). If None is given, the module outputs the original input values without any operation.	`None`
`method`	`str, default: 'manual'`	If method is 'manual', the user-specified `fill_value` is used to fill in the new features. 'manual' is the only value accepted by check().	`'manual'`
`fill_value`	`int or float or str or List[int] or List[float] or List[str], default: 1e-8`	Used for filling the expanded feature columns. If given a list, the length of the list must match that of `append_header`.	`1e-08`
`need_run`	`bool, default: True`	Whether this module should be run.	`True`
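
The relationship between `append_header` and `fill_value` can be sketched as follows. This is a minimal stand-alone illustration, not the component's code: `expand_row` is a hypothetical helper, and repeating a single `fill_value` across all new columns is an assumption about how a scalar is applied.

```python
def expand_row(row, append_header=None, fill_value=1e-8):
    """Append one value per new column name to a row of feature values."""
    headers = [] if append_header is None else list(append_header)

    if isinstance(fill_value, list):
        # A list must supply exactly one value per appended column.
        if len(fill_value) != len(headers):
            raise ValueError("fill_value list must match append_header in length")
        values = list(fill_value)
    else:
        # Assumption: a scalar is repeated for every appended column.
        values = [fill_value] * len(headers)

    return list(row) + values
```

With `append_header=None` the row is returned unchanged, matching the "without any operation" behaviour described in the table.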

Source code in federatedml/param/column_expand_param.py

class ColumnExpandParam(BaseParam):
    """
    Define method used for expanding column

    Parameters
    ----------

    append_header : None or str or List[str], default: None
        Name(s) for appended feature(s). If None is given, module outputs the original input value without any operation.

    method : str, default: 'manual'
        If method is 'manual', use user-specified `fill_value` to fill in new features.

    fill_value : int or float or str or List[int] or List[float] or List[str], default: 1e-8
        Used for filling expanded feature columns. If given a list, length of the list must match that of `append_header`

    need_run: bool, default: True
        Indicate if this module needed to be run.

    """

    def __init__(self, append_header=None, method="manual",
                 fill_value=consts.FLOAT_ZERO, need_run=True):
        super(ColumnExpandParam, self).__init__()
        self.append_header = [] if append_header is None else append_header
        self.method = method
        self.fill_value = fill_value
        self.need_run = need_run

    def check(self):
        descr = "column_expand param's "
        if not isinstance(self.method, str):
            raise ValueError(f"{descr}method {self.method} not supported, should be str type")
        else:
            user_input = self.method.lower()
            if user_input == "manual":
                self.method = consts.MANUAL
            else:
                raise ValueError(f"{descr} method {user_input} not supported")

        BaseParam.check_boolean(self.need_run, descr=descr)

        if not isinstance(self.append_header, list):
            raise ValueError(f"{descr} append_header must be None or list of str. "
                             f"Received {type(self.append_header)} instead.")
        for feature_name in self.append_header:
            BaseParam.check_string(feature_name, descr+"append_header values")

        if isinstance(self.fill_value, list):
            if len(self.append_header) != len(self.fill_value):
                raise ValueError(
                    f"{descr} `fill value` is set to be list, "
                    f"and param `append_header` must also be list of the same length.")
        else:
            self.fill_value = [self.fill_value]
        for value in self.fill_value:
            if type(value).__name__ not in ["float", "int", "long", "str"]:
                raise ValueError(
                    f"{descr} fill value(s) must be float, int, or str. Received type {type(value)} instead.")

        LOGGER.debug("Finish column expand parameter check!")
        return True

__init__(self, append_header=None, method='manual', fill_value=1e-08, need_run=True) special ¶

Source code in federatedml/param/column_expand_param.py

def __init__(self, append_header=None, method="manual",
             fill_value=consts.FLOAT_ZERO, need_run=True):
    super(ColumnExpandParam, self).__init__()
    self.append_header = [] if append_header is None else append_header
    self.method = method
    self.fill_value = fill_value
    self.need_run = need_run

check(self) ¶

Source code in federatedml/param/column_expand_param.py

def check(self):
    descr = "column_expand param's "
    if not isinstance(self.method, str):
        raise ValueError(f"{descr}method {self.method} not supported, should be str type")
    else:
        user_input = self.method.lower()
        if user_input == "manual":
            self.method = consts.MANUAL
        else:
            raise ValueError(f"{descr} method {user_input} not supported")

    BaseParam.check_boolean(self.need_run, descr=descr)

    if not isinstance(self.append_header, list):
        raise ValueError(f"{descr} append_header must be None or list of str. "
                         f"Received {type(self.append_header)} instead.")
    for feature_name in self.append_header:
        BaseParam.check_string(feature_name, descr+"append_header values")

    if isinstance(self.fill_value, list):
        if len(self.append_header) != len(self.fill_value):
            raise ValueError(
                f"{descr} `fill value` is set to be list, "
                f"and param `append_header` must also be list of the same length.")
    else:
        self.fill_value = [self.fill_value]
    for value in self.fill_value:
        if type(value).__name__ not in ["float", "int", "long", "str"]:
            raise ValueError(
                f"{descr} fill value(s) must be float, int, or str. Received type {type(value)} instead.")

    LOGGER.debug("Finish column expand parameter check!")
    return True

`cross_validation_param` ¶

Classes¶


CrossValidationParam            (BaseParam)

¶

Defines the cross-validation parameters.

Parameters:

Name	Type	Description	Default
`n_splits`	`int, default: 5`	Number of splits used in KFold.	`5`
`mode`	`{'hetero', 'homo'}, default: 'hetero'`	Mode of the current task.	`'hetero'`
`role`	`{'guest', 'host', 'arbiter'}, default: 'guest'`	Role of the current party.	`'guest'`
`shuffle`	`bool, default: True`	Whether to shuffle before KFold.	`True`
`random_seed`	`int, default: 1`	Random seed for the numpy shuffle.	`1`
`need_cv`	`bool, default: False`	Whether this module should be run.	`False`
`output_fold_history`	`bool, default: True`	Whether to output a table of the ids used by each fold; otherwise the original input data is returned. Returned ids are formatted as: {original_id}#fold{fold_num}#{train/validate}	`True`
`history_value_type`	`{'score', 'instance'}, default: 'score'`	Whether the output fold history includes the original instance or the predicted score; only effective when output_fold_history is set to True.	`'score'`
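
To make the fold-history id format concrete, here is a minimal sketch that splits ids into folds and labels each id as `{original_id}#fold{fold_num}#{train/validate}`. It is an illustration only: `kfold_ids` is a hypothetical helper, it uses Python's `random` module instead of numpy, and 0-based fold numbering is an assumption.

```python
import random


def kfold_ids(ids, n_splits=5, shuffle=True, random_seed=1):
    """Yield fold-history ids for a simple K-fold split."""
    ids = list(ids)
    if shuffle:
        # Seeded shuffle so that repeated runs give the same folds.
        random.Random(random_seed).shuffle(ids)

    # Fold k holds every n_splits-th id starting at position k.
    folds = [ids[k::n_splits] for k in range(n_splits)]
    for fold_num, validate in enumerate(folds):
        held_out = set(validate)
        for original_id in ids:
            part = "validate" if original_id in held_out else "train"
            yield f"{original_id}#fold{fold_num}#{part}"


history = list(kfold_ids(["a", "b", "c", "d"], n_splits=2, shuffle=False))
print(history[:2])  # ['a#fold0#validate', 'b#fold0#train']
```

Each original id appears once per fold, so the history contains `n_splits` entries for every input instance.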

Source code in federatedml/param/cross_validation_param.py

class CrossValidationParam(BaseParam):
    """
    Define cross validation params

    Parameters
    ----------
    n_splits: int, default: 5
        Specify how many splits used in KFold

    mode: {'hetero', 'homo'}, default: 'hetero'
        Indicate what mode is current task

    role: {'guest', 'host', 'arbiter'}, default: 'guest'
        Indicate what role is current party

    shuffle: bool, default: True
        Define whether do shuffle before KFold or not.

    random_seed: int, default: 1
        Specify the random seed for numpy shuffle

    need_cv: bool, default False
        Indicate if this module needed to be run

    output_fold_history: bool, default True
        Indicate whether to output table of ids used by each fold, else return original input data
        returned ids are formatted as: {original_id}#fold{fold_num}#{train/validate}

    history_value_type: {'score', 'instance'}, default score
        Indicate whether to include original instance or predict score in the output fold history,
        only effective when output_fold_history set to True

    """

    def __init__(self, n_splits=5, mode=consts.HETERO, role=consts.GUEST, shuffle=True, random_seed=1,
                 need_cv=False, output_fold_history=True, history_value_type="score"):
        super(CrossValidationParam, self).__init__()
        self.n_splits = n_splits
        self.mode = mode
        self.role = role
        self.shuffle = shuffle
        self.random_seed = random_seed
        # self.evaluate_param = copy.deepcopy(evaluate_param)
        self.need_cv = need_cv
        self.output_fold_history = output_fold_history
        self.history_value_type = history_value_type

    def check(self):
        model_param_descr = "cross validation param's "
        self.check_positive_integer(self.n_splits, model_param_descr)
        self.check_valid_value(self.mode, model_param_descr, valid_values=[consts.HOMO, consts.HETERO])
        self.check_valid_value(self.role, model_param_descr, valid_values=[consts.HOST, consts.GUEST, consts.ARBITER])
        self.check_boolean(self.shuffle, model_param_descr)
        self.check_boolean(self.output_fold_history, model_param_descr)
        self.history_value_type = self.check_and_change_lower(self.history_value_type, ["instance", "score"], model_param_descr)
        if self.random_seed is not None:
            self.check_positive_integer(self.random_seed, model_param_descr)

__init__(self, n_splits=5, mode='hetero', role='guest', shuffle=True, random_seed=1, need_cv=False, output_fold_history=True, history_value_type='score')

special ¶

Source code in federatedml/param/cross_validation_param.py

def __init__(self, n_splits=5, mode=consts.HETERO, role=consts.GUEST, shuffle=True, random_seed=1,
             need_cv=False, output_fold_history=True, history_value_type="score"):
    super(CrossValidationParam, self).__init__()
    self.n_splits = n_splits
    self.mode = mode
    self.role = role
    self.shuffle = shuffle
    self.random_seed = random_seed
    # self.evaluate_param = copy.deepcopy(evaluate_param)
    self.need_cv = need_cv
    self.output_fold_history = output_fold_history
    self.history_value_type = history_value_type

check(self) ¶

Source code in federatedml/param/cross_validation_param.py

def check(self):
    model_param_descr = "cross validation param's "
    self.check_positive_integer(self.n_splits, model_param_descr)
    self.check_valid_value(self.mode, model_param_descr, valid_values=[consts.HOMO, consts.HETERO])
    self.check_valid_value(self.role, model_param_descr, valid_values=[consts.HOST, consts.GUEST, consts.ARBITER])
    self.check_boolean(self.shuffle, model_param_descr)
    self.check_boolean(self.output_fold_history, model_param_descr)
    self.history_value_type = self.check_and_change_lower(self.history_value_type, ["instance", "score"], model_param_descr)
    if self.random_seed is not None:
        self.check_positive_integer(self.random_seed, model_param_descr)

`data_split_param` ¶

Classes¶


DataSplitParam            (BaseParam)

¶

Defines the parameters used in data split.

Parameters:

Name	Type	Description	Default
`random_state`	`None or int, default: None`	Random state for the shuffle.	`None`
`test_size`	`float or int or None, default: 0.0`	Size of the test data set. A float value specifies a fraction of the input data set; an int value specifies the exact number of data instances. Set to 0.0 by check() when all three sizes are None.	`None`
`train_size`	`float or int or None, default: 0.8`	Size of the train data set. A float value specifies a fraction of the input data set; an int value specifies the exact number of data instances. Set to 0.8 by check() when all three sizes are None.	`None`
`validate_size`	`float or int or None, default: 0.2`	Size of the validate data set. A float value specifies a fraction of the input data set; an int value specifies the exact number of data instances. Set to 0.2 by check() when all three sizes are None.	`None`
`stratified`	`bool, default: False`	Whether sampling should be stratified according to the label value.	`False`
`shuffle`	`bool, default: True`	Whether to shuffle before splitting.	`True`
`split_points`	`None or list, default: None`	Point(s) by which continuous label values are bucketed into bins for a stratified split, e.g. [0.2] for two bins or [0.1, 1, 3] for four bins.	`None`
`need_run`	`bool, default: True`	Whether to run data split.	`True`
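
Two behaviours from this table lend themselves to a small sketch: the fallback to 0.0/0.8/0.2 when no size is given, and the way `split_points` turn continuous labels into bins. The helpers `resolve_sizes` and `label_bucket` are hypothetical, and the handling of a label that falls exactly on a split point is an assumption.

```python
from bisect import bisect_right


def resolve_sizes(test_size=None, train_size=None, validate_size=None):
    """Apply the documented defaults only when all three sizes are None."""
    if test_size is None and train_size is None and validate_size is None:
        return {"test_size": 0.0, "train_size": 0.8, "validate_size": 0.2}
    return {
        "test_size": test_size,
        "train_size": train_size,
        "validate_size": validate_size,
    }


def label_bucket(label, split_points):
    """Return the bin index of a continuous label; n split points give n + 1 bins."""
    # Assumption: a label equal to a split point goes into the upper bin.
    return bisect_right(sorted(split_points), label)


print(resolve_sizes())
# {'test_size': 0.0, 'train_size': 0.8, 'validate_size': 0.2}
print([label_bucket(y, [0.1, 1, 3]) for y in (0.0, 0.5, 2, 5)])
# [0, 1, 2, 3]
```

Note that supplying any one size disables the fallback entirely; the remaining sizes stay None in this sketch.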

Source code in federatedml/param/data_split_param.py

class DataSplitParam(BaseParam):
    """
    Define data split param that used in data split.

    Parameters
    ----------
    random_state : None or int, default: None
        Specify the random state for shuffle.

    test_size : float or int or None, default: 0.0
        Specify test data set size.
        float value specifies fraction of input data set, int value specifies exact number of data instances

    train_size : float or int or None, default: 0.8
        Specify train data set size.
        float value specifies fraction of input data set, int value specifies exact number of data instances

    validate_size : float or int or None, default: 0.2
        Specify validate data set size.
        float value specifies fraction of input data set, int value specifies exact number of data instances

    stratified : bool, default: False
        Define whether sampling should be stratified, according to label value.

    shuffle : bool, default: True
        Define whether do shuffle before splitting or not.

    split_points : None or list, default : None
        Specify the point(s) by which continuous label values are bucketed into bins for stratified split.
        eg.[0.2] for two bins or [0.1, 1, 3] for 4 bins

    need_run: bool, default: True
        Specify whether to run data split

    """

    def __init__(self, random_state=None, test_size=None, train_size=None, validate_size=None, stratified=False,
                 shuffle=True, split_points=None, need_run=True):
        super(DataSplitParam, self).__init__()
        self.random_state = random_state
        self.test_size = test_size
        self.train_size = train_size
        self.validate_size = validate_size
        self.stratified = stratified
        self.shuffle = shuffle
        self.split_points = split_points
        self.need_run = need_run

    def check(self):
        model_param_descr = "data split param's "
        if self.random_state is not None:
            if not isinstance(self.random_state, int):
                raise ValueError(f"{model_param_descr} random state should be int type")
            BaseParam.check_nonnegative_number(self.random_state, f"{model_param_descr} random_state ")

        if self.test_size is not None:
            BaseParam.check_nonnegative_number(self.test_size, f"{model_param_descr} test_size ")
            if isinstance(self.test_size, float):
                BaseParam.check_decimal_float(self.test_size, f"{model_param_descr} test_size ")
        if self.train_size is not None:
            BaseParam.check_nonnegative_number(self.train_size, f"{model_param_descr} train_size ")
            if isinstance(self.train_size, float):
                BaseParam.check_decimal_float(self.train_size, f"{model_param_descr} train_size ")
        if self.validate_size is not None:
            BaseParam.check_nonnegative_number(self.validate_size, f"{model_param_descr} validate_size ")
            if isinstance(self.validate_size, float):
                BaseParam.check_decimal_float(self.validate_size, f"{model_param_descr} validate_size ")
        # use default size values if none given
        if self.test_size is None and self.train_size is None and self.validate_size is None:
            self.test_size = 0.0
            self.train_size = 0.8
            self.validate_size = 0.2

        BaseParam.check_boolean(self.stratified, f"{model_param_descr} stratified ")
        BaseParam.check_boolean(self.shuffle, f"{model_param_descr} shuffle ")
        BaseParam.check_boolean(self.need_run, f"{model_param_descr} need run ")

        if self.split_points is not None:
            if not isinstance(self.split_points, list):
                raise ValueError(f"{model_param_descr} split_points should be list type")

        LOGGER.debug("Finish data_split parameter check!")
        return True

__init__(self, random_state=None, test_size=None, train_size=None, validate_size=None, stratified=False, shuffle=True, split_points=None, need_run=True)

special ¶

Source code in federatedml/param/data_split_param.py

def __init__(self, random_state=None, test_size=None, train_size=None, validate_size=None, stratified=False,
             shuffle=True, split_points=None, need_run=True):
    super(DataSplitParam, self).__init__()
    self.random_state = random_state
    self.test_size = test_size
    self.train_size = train_size
    self.validate_size = validate_size
    self.stratified = stratified
    self.shuffle = shuffle
    self.split_points = split_points
    self.need_run = need_run

check(self) ¶

Source code in federatedml/param/data_split_param.py

def check(self):
    model_param_descr = "data split param's "
    if self.random_state is not None:
        if not isinstance(self.random_state, int):
            raise ValueError(f"{model_param_descr} random state should be int type")
        BaseParam.check_nonnegative_number(self.random_state, f"{model_param_descr} random_state ")

    if self.test_size is not None:
        BaseParam.check_nonnegative_number(self.test_size, f"{model_param_descr} test_size ")
        if isinstance(self.test_size, float):
            BaseParam.check_decimal_float(self.test_size, f"{model_param_descr} test_size ")
    if self.train_size is not None:
        BaseParam.check_nonnegative_number(self.train_size, f"{model_param_descr} train_size ")
        if isinstance(self.train_size, float):
            BaseParam.check_decimal_float(self.train_size, f"{model_param_descr} train_size ")
    if self.validate_size is not None:
        BaseParam.check_nonnegative_number(self.validate_size, f"{model_param_descr} validate_size ")
        if isinstance(self.validate_size, float):
            BaseParam.check_decimal_float(self.validate_size, f"{model_param_descr} validate_size ")
    # use default size values if none given
    if self.test_size is None and self.train_size is None and self.validate_size is None:
        self.test_size = 0.0
        self.train_size = 0.8
        self.validate_size = 0.2

    BaseParam.check_boolean(self.stratified, f"{model_param_descr} stratified ")
    BaseParam.check_boolean(self.shuffle, f"{model_param_descr} shuffle ")
    BaseParam.check_boolean(self.need_run, f"{model_param_descr} need run ")

    if self.split_points is not None:
        if not isinstance(self.split_points, list):
            raise ValueError(f"{model_param_descr} split_points should be list type")

    LOGGER.debug("Finish data_split parameter check!")
    return True

`data_transform_param` ¶

Classes¶


DataTransformParam            (BaseParam)

¶

Defines the data transform parameters used in federated ML.

Parameters:

Name	Type	Description	Default
`input_format`	`{'dense', 'sparse', 'tag'}`	See the "DataTransform" section of federatedml/util/README.md. Use "dense" for dense input data, "sparse" for svm-light input data, and "tag" for tag or tag:value input data.	`'dense'`
`delimitor`	`str`	Delimiter of the input data, default: ','	`','`
`data_type`	`{'float64','float','int','int64','str','long'}`	Data type of the input data.	`'float64'`
`exclusive_data_type`	`dict`	Maps col_name to data_type; used to specify a special data type for some features.	`None`
`tag_with_value`	`bool`	Used when input_format is 'tag'. If True, the input column data format should be tag[delimitor]value; otherwise it is tag only.	`False`
`tag_value_delimitor`	`str`	Used when input_format is 'tag' and tag_with_value is True; the delimiter inside each tag[delimitor]value column value.	`':'`
`missing_fill`	`bool`	Whether to fill missing values; accepts only True/False, default: False	`False`
`default_value`	`None or object or list`	Value used to replace missing values. If None, the default value defined in federatedml/feature/imputer.py is used; if a single object, missing values are filled with that object; if a list, its length should equal the feature dimension of the input data, and a missing value in a column is replaced by the element at the same position in the list.	`0`
`missing_fill_method`	`None or str`	Method used to replace missing values; should be one of [None, 'min', 'max', 'mean', 'designated']	`None`
`missing_impute`	`None or list`	Values to be considered missing; list elements can be of any type. Auto-generated if None.	`None`
`outlier_replace`	`bool`	Whether to replace outlier values; accepts only True/False, default: False	`False`
`outlier_replace_method`	`None or str`	Method used to replace outlier values; should be one of [None, 'min', 'max', 'mean', 'designated']	`None`
`outlier_impute`	`None or list`	Values to be regarded as outliers; list elements can be of any type.	`None`
`outlier_replace_value`	`None or object or list`	Value used to replace outliers. If None, the default value defined in federatedml/feature/imputer.py is used; if a single object, outliers are replaced with that object; if a list, its length should equal the feature dimension of the input data, and an outlier in a column is replaced by the element at the same position in the list.	`0`
`with_label`	`bool`	True if the input data contains a label, False otherwise. default: False	`False`
`label_name`	`str`	Name of the column holding the label; only used with the dense input format. default: 'y'	`'y'`
`label_type`	`{'int','int64','float','float64','long','str'}`	Used when with_label is True.	`'int'`
`output_format`	`{'dense', 'sparse'}`	Output format.	`'dense'`
`with_match_id`	`bool`	True if the dataset has a match_id, default: False	`False`
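
The effect of `tag_with_value` and `tag_value_delimitor` on 'tag' input can be illustrated with a small stand-alone parser. This is a sketch under stated assumptions rather than the DataTransform implementation: `parse_tag_row` is a hypothetical helper, a bare tag is assumed to map to the value 1, and tag values are assumed to be numeric.

```python
def parse_tag_row(items, tag_with_value=False, tag_value_delimitor=":"):
    """Turn the tag items of one row into a {tag: value} mapping."""
    features = {}
    for item in items:
        if tag_with_value:
            # Each item looks like tag[delimitor]value, e.g. "a:0.5".
            tag, value = item.split(tag_value_delimitor, 1)
            features[tag] = float(value)
        else:
            # Assumption: a bare tag marks the feature as present.
            features[item] = 1
    return features


# One input line: sample id first, then tag items separated by `delimitor`.
line = "id1,a:0.5,c:2"
sample_id, *items = line.split(",")
print(sample_id, parse_tag_row(items, tag_with_value=True))
# id1 {'a': 0.5, 'c': 2.0}
```

With `tag_with_value=False`, the same helper applied to `["a", "c"]` gives `{'a': 1, 'c': 1}`.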

Source code in federatedml/param/data_transform_param.py

class DataTransformParam(BaseParam):
    """
    Define data transform parameters that used in federated ml.

    Parameters
    ----------
    input_format : {'dense', 'sparse', 'tag'}
        see the "DataTransform" section of federatedml/util/README.md.
        Formally,
            dense input format data should be set to "dense",
            svm-light input format data should be set to "sparse",
            tag or tag:value input format data should be set to "tag".

    delimitor : str
        the delimiter of the input data, default: ','

    data_type : {'float64', 'float', 'int', 'int64', 'str', 'long'}
        the data type of the input data

    exclusive_data_type : dict
        the key of the dict is col_name and the value is data_type; used to specify
        a special data type for some features.

    tag_with_value: bool
        used if input_format is 'tag'. If tag_with_value is True,
        each input column should have the form tag[delimitor]value; otherwise it is the tag only

    tag_value_delimitor: str
        used if input_format is 'tag' and tag_with_value is True;
        the delimiter between tag and value in each tag[delimitor]value column.

    missing_fill : bool
        whether to fill missing values; only True/False is accepted, default: False

    default_value : None or object or list
        the value used to replace missing values.
        if None, the default value defined in federatedml/feature/imputer.py is used,
        if a single object, every missing value is filled with this object,
        if a list, its length must equal the feature dimension of the input data,
        and a missing value in a column is replaced by the list element
        at the same position.

    missing_fill_method: None or str
        the method used to replace missing values, one of [None, 'min', 'max', 'mean', 'designated']

    missing_impute: None or list
        the values to be treated as missing; list elements can be of any type.
        If None, the list is generated automatically

    outlier_replace: bool
        whether to replace outlier values; only True/False is accepted, default: False

    outlier_replace_method: None or str
        the method used to replace outlier values, one of [None, 'min', 'max', 'mean', 'designated']

    outlier_impute: None or list
        the values to be treated as outliers; list elements can be of any type

    outlier_replace_value: None or object or list
        the value used to replace outliers.
        if None, the default value defined in federatedml/feature/imputer.py is used,
        if a single object, every outlier is replaced with this object,
        if a list, its length must equal the feature dimension of the input data,
        and an outlier in a column is replaced by the list element
        at the same position.

    with_label : bool
        True if the input data contains a label column, False otherwise. default: False

    label_name : str
        name of the column that holds the label; used only with the dense input format. default: 'y'

    label_type : {'int','int64','float','float64','long','str'}
        use when with_label is True

    output_format : {'dense', 'sparse'}
        output format

    with_match_id: bool
        True if dataset has match_id, default: False

    """
    def __init__(self, input_format="dense", delimitor=',', data_type='float64',
                 exclusive_data_type=None,
                 tag_with_value=False, tag_value_delimitor=":",
                 missing_fill=False, default_value=0, missing_fill_method=None,
                 missing_impute=None, outlier_replace=False, outlier_replace_method=None,
                 outlier_impute=None, outlier_replace_value=0,
                 with_label=False, label_name='y',
                 label_type='int', output_format='dense', need_run=True,
                 with_match_id=False):
        self.input_format = input_format
        self.delimitor = delimitor
        self.data_type = data_type
        self.exclusive_data_type = exclusive_data_type
        self.tag_with_value = tag_with_value
        self.tag_value_delimitor = tag_value_delimitor
        self.missing_fill = missing_fill
        self.default_value = default_value
        self.missing_fill_method = missing_fill_method
        self.missing_impute = missing_impute
        self.outlier_replace = outlier_replace
        self.outlier_replace_method = outlier_replace_method
        self.outlier_impute = outlier_impute
        self.outlier_replace_value = outlier_replace_value
        self.with_label = with_label
        self.label_name = label_name
        self.label_type = label_type
        self.output_format = output_format
        self.need_run = need_run
        self.with_match_id = with_match_id

    def check(self):

        descr = "data_transform param's"

        self.input_format = self.check_and_change_lower(self.input_format,
                                                        ["dense", "sparse", "tag"],
                                                        descr)

        self.output_format = self.check_and_change_lower(self.output_format,
                                                         ["dense", "sparse"],
                                                         descr)

        self.data_type = self.check_and_change_lower(self.data_type,
                                                     ["int", "int64", "float", "float64", "str", "long"],
                                                     descr)

        if type(self.missing_fill).__name__ != 'bool':
            raise ValueError("data_transform param's missing_fill {} not supported".format(self.missing_fill))

        if self.missing_fill_method is not None:
            self.missing_fill_method = self.check_and_change_lower(self.missing_fill_method,
                                                                   ['min', 'max', 'mean', 'designated'],
                                                                   descr)

        if self.outlier_replace_method is not None:
            self.outlier_replace_method = self.check_and_change_lower(self.outlier_replace_method,
                                                                      ['min', 'max', 'mean', 'designated'],
                                                                      descr)

        if type(self.with_label).__name__ != 'bool':
            raise ValueError("data_transform param's with_label {} not supported".format(self.with_label))

        if self.with_label:
            if not isinstance(self.label_name, str):
                raise ValueError("data transform param's label_name {} should be str".format(self.label_name))

            self.label_type = self.check_and_change_lower(self.label_type,
                                                          ["int", "int64", "float", "float64", "str", "long"],
                                                          descr)

        if self.exclusive_data_type is not None and not isinstance(self.exclusive_data_type, dict):
            raise ValueError("exclusive_data_type is should be None or a dict")

        if not isinstance(self.with_match_id, bool):
            raise ValueError("with_match_id should be boolean variable, but {} find".format(self.with_match_id))

        return True

__init__(self, input_format='dense', delimitor=',', data_type='float64', exclusive_data_type=None, tag_with_value=False, tag_value_delimitor=':', missing_fill=False, default_value=0, missing_fill_method=None, missing_impute=None, outlier_replace=False, outlier_replace_method=None, outlier_impute=None, outlier_replace_value=0, with_label=False, label_name='y', label_type='int', output_format='dense', need_run=True, with_match_id=False) special ¶

Source code in federatedml/param/data_transform_param.py

def __init__(self, input_format="dense", delimitor=',', data_type='float64',
             exclusive_data_type=None,
             tag_with_value=False, tag_value_delimitor=":",
             missing_fill=False, default_value=0, missing_fill_method=None,
             missing_impute=None, outlier_replace=False, outlier_replace_method=None,
             outlier_impute=None, outlier_replace_value=0,
             with_label=False, label_name='y',
             label_type='int', output_format='dense', need_run=True,
             with_match_id=False):
    self.input_format = input_format
    self.delimitor = delimitor
    self.data_type = data_type
    self.exclusive_data_type = exclusive_data_type
    self.tag_with_value = tag_with_value
    self.tag_value_delimitor = tag_value_delimitor
    self.missing_fill = missing_fill
    self.default_value = default_value
    self.missing_fill_method = missing_fill_method
    self.missing_impute = missing_impute
    self.outlier_replace = outlier_replace
    self.outlier_replace_method = outlier_replace_method
    self.outlier_impute = outlier_impute
    self.outlier_replace_value = outlier_replace_value
    self.with_label = with_label
    self.label_name = label_name
    self.label_type = label_type
    self.output_format = output_format
    self.need_run = need_run
    self.with_match_id = with_match_id

check(self) ¶

Source code in federatedml/param/data_transform_param.py

def check(self):

    descr = "data_transform param's"

    self.input_format = self.check_and_change_lower(self.input_format,
                                                    ["dense", "sparse", "tag"],
                                                    descr)

    self.output_format = self.check_and_change_lower(self.output_format,
                                                     ["dense", "sparse"],
                                                     descr)

    self.data_type = self.check_and_change_lower(self.data_type,
                                                 ["int", "int64", "float", "float64", "str", "long"],
                                                 descr)

    if type(self.missing_fill).__name__ != 'bool':
        raise ValueError("data_transform param's missing_fill {} not supported".format(self.missing_fill))

    if self.missing_fill_method is not None:
        self.missing_fill_method = self.check_and_change_lower(self.missing_fill_method,
                                                               ['min', 'max', 'mean', 'designated'],
                                                               descr)

    if self.outlier_replace_method is not None:
        self.outlier_replace_method = self.check_and_change_lower(self.outlier_replace_method,
                                                                  ['min', 'max', 'mean', 'designated'],
                                                                  descr)

    if type(self.with_label).__name__ != 'bool':
        raise ValueError("data_transform param's with_label {} not supported".format(self.with_label))

    if self.with_label:
        if not isinstance(self.label_name, str):
            raise ValueError("data transform param's label_name {} should be str".format(self.label_name))

        self.label_type = self.check_and_change_lower(self.label_type,
                                                      ["int", "int64", "float", "float64", "str", "long"],
                                                      descr)

    if self.exclusive_data_type is not None and not isinstance(self.exclusive_data_type, dict):
        raise ValueError("exclusive_data_type is should be None or a dict")

    if not isinstance(self.with_match_id, bool):
        raise ValueError("with_match_id should be boolean variable, but {} find".format(self.with_match_id))

    return True
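
Most of the string options in `check` go through `check_and_change_lower`, whose source is not shown on this page. The sketch below is a stand-in written from how it is called above, so its exact behaviour and error wording are assumptions: it is taken to lower-case a string option and reject anything outside the allowed list.

```python
def check_and_change_lower(param, valid_list, descr=""):
    """Stand-in for BaseParam.check_and_change_lower (assumed behaviour)."""
    if not isinstance(param, str):
        raise ValueError(
            "{} {} not supported, should be one of {}".format(descr, param, valid_list))
    lowered = param.lower()
    if lowered not in valid_list:
        raise ValueError(
            "{} {} not supported, should be one of {}".format(descr, param, valid_list))
    return lowered


# Mixed-case input is normalised to the lower-case option name.
normalised = check_and_change_lower("Dense", ["dense", "sparse", "tag"],
                                    "data_transform param's")
```

Under this reading, passing `input_format="Dense"` is accepted and stored as `'dense'`, while an unknown value such as `"csv"` raises `ValueError`.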

`dataio_param` ¶

Classes¶


DataIOParam (BaseParam) ¶

Define the DataIO parameters used in federated ML.

Parameters:

Name	Type	Description	Default
`input_format`	`{'dense', 'sparse', 'tag'}`	See the "DataIO" section of federatedml/util/README.md. Dense input format data should be set to "dense", svm-light input format data should be set to "sparse", and tag or tag:value input format data should be set to "tag".	`'dense'`
`delimitor`	`str`	The delimiter of the input data. Default: ','.	`','`
`data_type`	`{'float64', 'float', 'int', 'int64', 'str', 'long'}`	The data type of the input data.	`'float64'`
`exclusive_data_type`	`dict`	The key of the dict is col_name and the value is data_type; used to specify a special data type for some features.	`None`
`tag_with_value`	`bool`	Used if input_format is 'tag'. If tag_with_value is True, each input column should have the form tag[delimitor]value; otherwise it is the tag only.	`False`
`tag_value_delimitor`	`str`	Used if input_format is 'tag' and tag_with_value is True; the delimiter between tag and value in each tag[delimitor]value column.	`':'`
`missing_fill`	`bool`	Whether to fill missing values; only True/False is accepted. Default: False.	`False`
`default_value`	`None or object or list`	The value used to replace missing values. If None, the default value defined in federatedml/feature/imputer.py is used. If a single object, every missing value is filled with it. If a list, its length must equal the feature dimension of the input data, and a missing value in a column is replaced by the list element at the same position.	`0`
`missing_fill_method`	`{None, 'min', 'max', 'mean', 'designated'}`	The method used to replace missing values.	`None`
`missing_impute`	`None or list`	The values to be treated as missing; list elements can be of any type. If None, the list is generated automatically.	`None`
`outlier_replace`	`bool`	Whether to replace outlier values; only True/False is accepted. Default: False.	`False`
`outlier_replace_method`	`{None, 'min', 'max', 'mean', 'designated'}`	The method used to replace outlier values.	`None`
`outlier_impute`	`None or list`	The values to be treated as outliers; list elements can be of any type. Default: None.	`None`
`outlier_replace_value`	`None or object or list`	The value used to replace outliers. If None, the default value defined in federatedml/feature/imputer.py is used. If a single object, every outlier is replaced with it. If a list, its length must equal the feature dimension of the input data, and an outlier in a column is replaced by the list element at the same position.	`0`
`with_label`	`bool`	True if the input data contains a label column, False otherwise. Default: False.	`False`
`label_name`	`str`	Name of the column that holds the label; used only with the dense input format. Default: 'y'.	`'y'`
`label_type`	`{'int', 'int64', 'float', 'float64', 'long', 'str'}`	Used when with_label is True.	`'int'`
`output_format`	`{'dense', 'sparse'}`	output format	`'dense'`
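
To make the interplay of `delimitor`, `tag_with_value` and `tag_value_delimitor` concrete, here is a minimal sketch of reading one line in the 'tag' input format. `parse_tag_line` is a hypothetical helper, not part of FATE; it assumes that tag-only fields are recorded as `1` and that values are numeric.

```python
def parse_tag_line(line, delimitor=",", tag_with_value=False, tag_value_delimitor=":"):
    """Parse one 'tag' format line into a {tag: value} dict (hypothetical helper)."""
    features = {}
    for field in line.split(delimitor):
        if not field:
            continue  # skip empty fields such as a trailing delimiter
        if tag_with_value:
            # Each field looks like tag<tag_value_delimitor>value.
            tag, value = field.split(tag_value_delimitor, 1)
            features[tag] = float(value)
        else:
            # Tag only: presence is recorded as 1 (an assumption of this sketch).
            features[field] = 1
    return features


with_values = parse_tag_line("a:0.5,c:2", tag_with_value=True)
tags_only = parse_tag_line("a,c")
```

Here `with_values` maps `a` to `0.5` and `c` to `2.0`, while `tags_only` maps both tags to `1`.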

Source code in federatedml/param/dataio_param.py

class DataIOParam(BaseParam):
    """
    Define the DataIO parameters used in federated ML.

    Parameters
    ----------
    input_format : {'dense', 'sparse', 'tag'}
        see the "DataIO" section of federatedml/util/README.md.
        Formally,
            dense input format data should be set to "dense",
            svm-light input format data should be set to "sparse",
            tag or tag:value input format data should be set to "tag".

    delimitor : str
        the delimiter of the input data, default: ','

    data_type : {'float64', 'float', 'int', 'int64', 'str', 'long'}
        the data type of the input data

    exclusive_data_type : dict
        the key of the dict is col_name and the value is data_type; used to specify
        a special data type for some features.

    tag_with_value: bool
        used if input_format is 'tag'. If tag_with_value is True,
        each input column should have the form tag[delimitor]value; otherwise it is the tag only

    tag_value_delimitor: str
        used if input_format is 'tag' and tag_with_value is True;
        the delimiter between tag and value in each tag[delimitor]value column.

    missing_fill : bool
        whether to fill missing values; only True/False is accepted, default: False

    default_value : None or object or list
        the value used to replace missing values.
            if None, the default value defined in federatedml/feature/imputer.py is used,
            if a single object, every missing value is filled with this object,
            if a list, its length must equal the feature dimension of the input data,
                and a missing value in a column is replaced by the list element
                at the same position.

    missing_fill_method : {None, 'min', 'max', 'mean', 'designated'}
        the method used to replace missing values

    missing_impute: None or list
        the values to be treated as missing; list elements can be of any type.
        If None, the list is generated automatically

    outlier_replace: bool
        whether to replace outlier values; only True/False is accepted, default: False

    outlier_replace_method : {None, 'min', 'max', 'mean', 'designated'}
        the method used to replace outlier values

    outlier_impute: None or list
        the values to be treated as outliers; list elements can be of any type, default: None

    outlier_replace_value : None or object or list
        the value used to replace outliers.
            if None, the default value defined in federatedml/feature/imputer.py is used,
            if a single object, every outlier is replaced with this object,
            if a list, its length must equal the feature dimension of the input data,
                and an outlier in a column is replaced by the list element
                at the same position.

    with_label : bool
        True if the input data contains a label column, False otherwise. default: False

    label_name : str
        name of the column that holds the label; used only with the dense input format. default: 'y'

    label_type : {'int', 'int64', 'float', 'float64', 'long', 'str'}
        use when with_label is True.

    output_format : {'dense', 'sparse'}
        output format

    """

    def __init__(self, input_format="dense", delimitor=',', data_type='float64',
                 exclusive_data_type=None,
                 tag_with_value=False, tag_value_delimitor=":",
                 missing_fill=False, default_value=0, missing_fill_method=None,
                 missing_impute=None, outlier_replace=False, outlier_replace_method=None,
                 outlier_impute=None, outlier_replace_value=0,
                 with_label=False, label_name='y',
                 label_type='int', output_format='dense', need_run=True):
        self.input_format = input_format
        self.delimitor = delimitor
        self.data_type = data_type
        self.exclusive_data_type = exclusive_data_type
        self.tag_with_value = tag_with_value
        self.tag_value_delimitor = tag_value_delimitor
        self.missing_fill = missing_fill
        self.default_value = default_value
        self.missing_fill_method = missing_fill_method
        self.missing_impute = missing_impute
        self.outlier_replace = outlier_replace
        self.outlier_replace_method = outlier_replace_method
        self.outlier_impute = outlier_impute
        self.outlier_replace_value = outlier_replace_value
        self.with_label = with_label
        self.label_name = label_name
        self.label_type = label_type
        self.output_format = output_format
        self.need_run = need_run

    def check(self):

        descr = "dataio param's"

        self.input_format = self.check_and_change_lower(self.input_format,
                                                        ["dense", "sparse", "tag"],
                                                        descr)

        self.output_format = self.check_and_change_lower(self.output_format,
                                                         ["dense", "sparse"],
                                                         descr)

        self.data_type = self.check_and_change_lower(self.data_type,
                                                     ["int", "int64", "float", "float64", "str", "long"],
                                                     descr)

        if type(self.missing_fill).__name__ != 'bool':
            raise ValueError("dataio param's missing_fill {} not supported".format(self.missing_fill))

        if self.missing_fill_method is not None:
            self.missing_fill_method = self.check_and_change_lower(self.missing_fill_method,
                                                                   ['min', 'max', 'mean', 'designated'],
                                                                   descr)

        if self.outlier_replace_method is not None:
            self.outlier_replace_method = self.check_and_change_lower(self.outlier_replace_method,
                                                                      ['min', 'max', 'mean', 'designated'],
                                                                      descr)

        if type(self.with_label).__name__ != 'bool':
            raise ValueError("dataio param's with_label {} not supported".format(self.with_label))

        if self.with_label:
            if not isinstance(self.label_name, str):
                raise ValueError("dataio param's label_name {} should be str".format(self.label_name))

            self.label_type = self.check_and_change_lower(self.label_type,
                                                          ["int", "int64", "float", "float64", "str", "long"],
                                                          descr)

        if self.exclusive_data_type is not None and not isinstance(self.exclusive_data_type, dict):
            raise ValueError("exclusive_data_type is should be None or a dict")

        return True

__init__(self, input_format='dense', delimitor=',', data_type='float64', exclusive_data_type=None, tag_with_value=False, tag_value_delimitor=':', missing_fill=False, default_value=0, missing_fill_method=None, missing_impute=None, outlier_replace=False, outlier_replace_method=None, outlier_impute=None, outlier_replace_value=0, with_label=False, label_name='y', label_type='int', output_format='dense', need_run=True) special ¶

Source code in federatedml/param/dataio_param.py

def __init__(self, input_format="dense", delimitor=',', data_type='float64',
             exclusive_data_type=None,
             tag_with_value=False, tag_value_delimitor=":",
             missing_fill=False, default_value=0, missing_fill_method=None,
             missing_impute=None, outlier_replace=False, outlier_replace_method=None,
             outlier_impute=None, outlier_replace_value=0,
             with_label=False, label_name='y',
             label_type='int', output_format='dense', need_run=True):
    self.input_format = input_format
    self.delimitor = delimitor
    self.data_type = data_type
    self.exclusive_data_type = exclusive_data_type
    self.tag_with_value = tag_with_value
    self.tag_value_delimitor = tag_value_delimitor
    self.missing_fill = missing_fill
    self.default_value = default_value
    self.missing_fill_method = missing_fill_method
    self.missing_impute = missing_impute
    self.outlier_replace = outlier_replace
    self.outlier_replace_method = outlier_replace_method
    self.outlier_impute = outlier_impute
    self.outlier_replace_value = outlier_replace_value
    self.with_label = with_label
    self.label_name = label_name
    self.label_type = label_type
    self.output_format = output_format
    self.need_run = need_run

check(self) ¶

Source code in federatedml/param/dataio_param.py

def check(self):

    descr = "dataio param's"

    self.input_format = self.check_and_change_lower(self.input_format,
                                                    ["dense", "sparse", "tag"],
                                                    descr)

    self.output_format = self.check_and_change_lower(self.output_format,
                                                     ["dense", "sparse"],
                                                     descr)

    self.data_type = self.check_and_change_lower(self.data_type,
                                                 ["int", "int64", "float", "float64", "str", "long"],
                                                 descr)

    if type(self.missing_fill).__name__ != 'bool':
        raise ValueError("dataio param's missing_fill {} not supported".format(self.missing_fill))

    if self.missing_fill_method is not None:
        self.missing_fill_method = self.check_and_change_lower(self.missing_fill_method,
                                                               ['min', 'max', 'mean', 'designated'],
                                                               descr)

    if self.outlier_replace_method is not None:
        self.outlier_replace_method = self.check_and_change_lower(self.outlier_replace_method,
                                                                  ['min', 'max', 'mean', 'designated'],
                                                                  descr)

    if type(self.with_label).__name__ != 'bool':
        raise ValueError("dataio param's with_label {} not supported".format(self.with_label))

    if self.with_label:
        if not isinstance(self.label_name, str):
            raise ValueError("dataio param's label_name {} should be str".format(self.label_name))

        self.label_type = self.check_and_change_lower(self.label_type,
                                                      ["int", "int64", "float", "float64", "str", "long"],
                                                      descr)

    if self.exclusive_data_type is not None and not isinstance(self.exclusive_data_type, dict):
        raise ValueError("exclusive_data_type is should be None or a dict")

    return True
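
The `outlier_*` parameters validated above can be pictured with a single-column sketch. This is not the logic in federatedml/feature/imputer.py: `replace_outliers` is a hypothetical helper, and it assumes that 'min', 'max' and 'mean' are computed over the values that are not listed in `outlier_impute`, and that a method of None behaves like 'designated'.

```python
from statistics import mean


def replace_outliers(column, outlier_impute, method="designated", outlier_replace_value=0):
    """Replace listed outlier values in one column (hypothetical helper)."""
    # Statistics are taken over the values that are not flagged as outliers.
    kept = [v for v in column if v not in outlier_impute]
    if method is None or method == "designated":
        fill = outlier_replace_value
    elif method == "min":
        fill = min(kept)
    elif method == "max":
        fill = max(kept)
    elif method == "mean":
        fill = mean(kept)
    else:
        raise ValueError("unsupported outlier_replace_method: {}".format(method))
    return [fill if v in outlier_impute else v for v in column]


column = [1.0, 2.0, 9999.0, 3.0]
by_mean = replace_outliers(column, outlier_impute=[9999.0], method="mean")
by_value = replace_outliers(column, outlier_impute=[9999.0], outlier_replace_value=0)
```

In this example `by_mean` replaces `9999.0` with the mean of the remaining values, `2.0`, and `by_value` replaces it with the designated value `0`.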

`encrypt_param` ¶

Classes¶


EncryptParam (BaseParam) ¶

Define the encryption method used in federated ML.

Parameters:

Name	Type	Description	Default
`method`	`{'Paillier', 'IterativeAffine', 'RandomIterativeAffine'}`	If method is 'Paillier', Paillier encryption is used for federated ML. To use the non-encrypted version in HomoLR, set this to None. For details of Paillier encryption, see the paper mentioned in the README file.	`'Paillier'`
`key_length`	`int`	The key length, in bits, for this encryption method. Must be a positive integer.	`1024`

Source code in federatedml/param/encrypt_param.py

class EncryptParam(BaseParam):
    """
    Define the encryption method used in federated ML.

    Parameters
    ----------
    method : {'Paillier', 'IterativeAffine', 'RandomIterativeAffine'}
        If method is 'Paillier', Paillier encryption is used for federated ML.
        To use the non-encrypted version in HomoLR, set this to None.
        For details of Paillier encryption, see the paper mentioned in the README file.

    key_length : int, default: 1024
        The key length for this encryption method; must be a positive integer.

    """

    def __init__(self, method=consts.PAILLIER, key_length=1024):
        super(EncryptParam, self).__init__()
        self.method = method
        self.key_length = key_length

    def check(self):
        if self.method is not None and type(self.method).__name__ != "str":
            raise ValueError(
                "encrypt_param's method {} not supported, should be str type".format(
                    self.method))
        elif self.method is None:
            pass
        else:
            user_input = self.method.lower()
            if user_input == "paillier":
                self.method = consts.PAILLIER
            elif user_input == "iterativeaffine":
                self.method = consts.ITERATIVEAFFINE
            elif user_input == "randomiterativeaffine":
                self.method = consts.RANDOM_ITERATIVEAFFINE
            else:
                raise ValueError(
                    "encrypt_param's method {} not supported".format(user_input))

        if type(self.key_length).__name__ != "int":
            raise ValueError(
                "encrypt_param's key_length {} not supported, should be int type".format(self.key_length))
        elif self.key_length <= 0:
            raise ValueError(
                "encrypt_param's key_length must be greater or equal to 1")

        LOGGER.debug("Finish encrypt parameter check!")
        return True

__init__(self, method='Paillier', key_length=1024) special ¶

Source code in federatedml/param/encrypt_param.py

def __init__(self, method=consts.PAILLIER, key_length=1024):
    super(EncryptParam, self).__init__()
    self.method = method
    self.key_length = key_length

check(self) ¶

Source code in federatedml/param/encrypt_param.py

def check(self):
    if self.method is not None and type(self.method).__name__ != "str":
        raise ValueError(
            "encrypt_param's method {} not supported, should be str type".format(
                self.method))
    elif self.method is None:
        pass
    else:
        user_input = self.method.lower()
        if user_input == "paillier":
            self.method = consts.PAILLIER
        elif user_input == "iterativeaffine":
            self.method = consts.ITERATIVEAFFINE
        elif user_input == "randomiterativeaffine":
            self.method = consts.RANDOM_ITERATIVEAFFINE
        else:
            raise ValueError(
                "encrypt_param's method {} not supported".format(user_input))

    if type(self.key_length).__name__ != "int":
        raise ValueError(
            "encrypt_param's key_length {} not supported, should be int type".format(self.key_length))
    elif self.key_length <= 0:
        raise ValueError(
            "encrypt_param's key_length must be greater or equal to 1")

    LOGGER.debug("Finish encrypt parameter check!")
    return True
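
Two details of this `check` are easy to miss: the method name is matched case-insensitively and then replaced by a constant, and `key_length` is tested with `type(...).__name__`, which rejects `True` even though `bool` is a subclass of `int`. The sketch below reproduces both outside FATE. The three module-level constants are stand-ins for the `consts` values; `'Paillier'` matches the documented default, and the other two spellings are assumed from the docstring.

```python
# Stand-ins for consts.PAILLIER, consts.ITERATIVEAFFINE, consts.RANDOM_ITERATIVEAFFINE.
PAILLIER = "Paillier"
ITERATIVEAFFINE = "IterativeAffine"
RANDOM_ITERATIVEAFFINE = "RandomIterativeAffine"

_METHODS = {
    "paillier": PAILLIER,
    "iterativeaffine": ITERATIVEAFFINE,
    "randomiterativeaffine": RANDOM_ITERATIVEAFFINE,
}


def normalize_method(method):
    """Map a user-supplied method name to its canonical constant; None passes through."""
    if method is None:
        return None
    if type(method).__name__ != "str":
        raise ValueError("method {} not supported, should be str type".format(method))
    key = method.lower()
    if key not in _METHODS:
        raise ValueError("method {} not supported".format(key))
    return _METHODS[key]


def check_key_length(key_length):
    """Accept only plain ints greater than zero, mirroring the name-based type test."""
    if type(key_length).__name__ != "int":
        raise ValueError("key_length {} not supported, should be int type".format(key_length))
    if key_length <= 0:
        raise ValueError("key_length must be greater than or equal to 1")
    return key_length


canonical = normalize_method("PAILLIER")
bits = check_key_length(2048)
```

Because `type(True).__name__` is `'bool'`, `check_key_length(True)` raises `ValueError`, whereas an `isinstance(key_length, int)` test would have let it through.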

`encrypted_mode_calculation_param` ¶

Classes¶


EncryptedModeCalculatorParam (BaseParam) ¶

Define the encrypted_mode_calculator parameters.

Parameters:

Name	Type	Description	Default
`mode`	`{'strict', 'fast', 'balance', 'confusion_opt', 'confusion_opt_balance'}`	The encrypted mode. Default: 'strict'.	`'strict'`
`re_encrypted_rate`	`float or int`	A number in [0, 1]; used when mode is 'balance' or 'confusion_opt_balance'. Default: 1.	`1`

Source code in federatedml/param/encrypted_mode_calculation_param.py

class EncryptedModeCalculatorParam(BaseParam):
    """
    Define the encrypted_mode_calculator parameters.

    Parameters
    ----------
    mode: {'strict', 'fast', 'balance', 'confusion_opt', 'confusion_opt_balance'}
        encrypted mode, default: strict

    re_encrypted_rate: float or int
        a number in [0, 1], used when mode is 'balance' or 'confusion_opt_balance', default: 1
    """

    def __init__(self, mode="strict", re_encrypted_rate=1):
        self.mode = mode
        self.re_encrypted_rate = re_encrypted_rate

    def check(self):
        descr = "encrypted_mode_calculator param"
        self.mode = self.check_and_change_lower(self.mode,
                                                ["strict", "fast", "balance", "confusion_opt", "confusion_opt_balance"],
                                                descr)

        if self.mode in ["balance", "confusion_opt_balance"]:
            if type(self.re_encrypted_rate).__name__ not in ["int", "long", "float"]:
                raise ValueError("re_encrypted_rate should be a numeric number")

            if not 0.0 <= self.re_encrypted_rate <= 1:
                raise ValueError("re_encrypted_rate should  in [0, 1]")

        return True

__init__(self, mode='strict', re_encrypted_rate=1) special ¶

Source code in federatedml/param/encrypted_mode_calculation_param.py

def __init__(self, mode="strict", re_encrypted_rate=1):
    self.mode = mode
    self.re_encrypted_rate = re_encrypted_rate

check(self) ¶

Source code in federatedml/param/encrypted_mode_calculation_param.py

def check(self):
    descr = "encrypted_mode_calculator param"
    self.mode = self.check_and_change_lower(self.mode,
                                            ["strict", "fast", "balance", "confusion_opt", "confusion_opt_balance"],
                                            descr)

    if self.mode in ["balance", "confusion_opt_balance"]:
        if type(self.re_encrypted_rate).__name__ not in ["int", "long", "float"]:
            raise ValueError("re_encrypted_rate should be a numeric number")

        if not 0.0 <= self.re_encrypted_rate <= 1:
            raise ValueError("re_encrypted_rate should  in [0, 1]")

    return True
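
The validation above can be mirrored in a standalone sketch that does not import FATE. The function name `validate_encrypted_mode` is hypothetical, and the lower-casing step assumes that `check_and_change_lower` lower-cases the value before comparing it with the allowed list.

```python
ALLOWED_MODES = ["strict", "fast", "balance", "confusion_opt", "confusion_opt_balance"]


def validate_encrypted_mode(mode="strict", re_encrypted_rate=1):
    """Hypothetical stand-in for EncryptedModeCalculatorParam.check()."""
    # Assumption: check_and_change_lower lower-cases, then checks membership.
    mode = mode.lower()
    if mode not in ALLOWED_MODES:
        raise ValueError("mode {} is not supported".format(mode))

    # The rate is validated only for the two "balance" modes.
    if mode in ("balance", "confusion_opt_balance"):
        # bool is excluded because its type name is "bool", not "int".
        is_number = isinstance(re_encrypted_rate, (int, float)) and not isinstance(re_encrypted_rate, bool)
        if not is_number:
            raise ValueError("re_encrypted_rate should be a numeric number")
        if not 0.0 <= re_encrypted_rate <= 1:
            raise ValueError("re_encrypted_rate should be in [0, 1]")
    return mode, re_encrypted_rate
```

In this sketch, as in the listing, an out-of-range rate passes silently when the mode is 'strict', 'fast' or 'confusion_opt', because the rate is checked only for the balance modes.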

`evaluation_param` ¶

Classes¶


EvaluateParam (BaseParam) ¶

Define the evaluation method of binary/multiple classification and regression

Parameters:

Name	Type	Description	Default
`eval_type`	`{'binary', 'regression', 'multi'}`	Supports 'binary' for HomoLR, HeteroLR and Secureboosting, and 'regression' for Secureboosting. 'multi' is not supported in this version.	`'binary'`
`unfold_multi_result`	`bool`	Unfold a multi-class result into several one-vs-rest binary classification results.	`False`
`pos_label`	`int or float or str`	The positive label; its type depends on the labels in the data. Effective only when eval_type is 'binary'.	`1`
`need_run`	`bool, default True`	Indicates whether this module needs to be run.	`True`

Source code in federatedml/param/evaluation_param.py

class EvaluateParam(BaseParam):
    """
    Define the evaluation method of binary/multiple classification and regression

    Parameters
    ----------
    eval_type : {'binary', 'regression', 'multi'}
        support 'binary' for HomoLR, HeteroLR and Secureboosting,
        support 'regression' for Secureboosting,
        'multi' is not supported in this version

    unfold_multi_result : bool
        unfold multi result and get several one-vs-rest binary classification results

    pos_label : int or float or str
        specify positive label type, depend on the data's label. this parameter effective only for 'binary'

    need_run: bool, default True
        Indicate if this module needed to be run
    """

    def __init__(self, eval_type="binary", pos_label=1, need_run=True, metrics=None,
                 run_clustering_arbiter_metric=False, unfold_multi_result=False):
        super().__init__()
        self.eval_type = eval_type
        self.pos_label = pos_label
        self.need_run = need_run
        self.metrics = metrics
        self.unfold_multi_result = unfold_multi_result
        self.run_clustering_arbiter_metric = run_clustering_arbiter_metric

        self.default_metrics = {
            consts.BINARY: consts.ALL_BINARY_METRICS,
            consts.MULTY: consts.ALL_MULTI_METRICS,
            consts.REGRESSION: consts.ALL_REGRESSION_METRICS,
            consts.CLUSTERING: consts.ALL_CLUSTER_METRICS
        }

        self.allowed_metrics = {
            consts.BINARY: consts.ALL_BINARY_METRICS,
            consts.MULTY: consts.ALL_MULTI_METRICS,
            consts.REGRESSION: consts.ALL_REGRESSION_METRICS,
            consts.CLUSTERING: consts.ALL_CLUSTER_METRICS
        }

    def _use_single_value_default_metrics(self):

        self.default_metrics = {
            consts.BINARY: consts.DEFAULT_BINARY_METRIC,
            consts.MULTY: consts.DEFAULT_MULTI_METRIC,
            consts.REGRESSION: consts.DEFAULT_REGRESSION_METRIC,
            consts.CLUSTERING: consts.DEFAULT_CLUSTER_METRIC
        }

    def _check_valid_metric(self, metrics_list):

        metric_list = consts.ALL_METRIC_NAME
        alias_name: dict = consts.ALIAS

        full_name_list = []

        metrics_list = [str.lower(i) for i in metrics_list]

        for metric in metrics_list:

            if metric in metric_list:
                if metric not in full_name_list:
                    full_name_list.append(metric)
                continue

            valid_flag = False
            for alias, full_name in alias_name.items():
                if metric in alias:
                    if full_name not in full_name_list:
                        full_name_list.append(full_name)
                    valid_flag = True
                    break

            if not valid_flag:
                raise ValueError('metric {} is not supported'.format(metric))

        allowed_metrics = self.allowed_metrics[self.eval_type]

        for m in full_name_list:
            if m not in allowed_metrics:
                raise ValueError('metric {} is not used for {} task'.format(m, self.eval_type))

        if consts.RECALL in full_name_list and consts.PRECISION not in full_name_list:
            full_name_list.append(consts.PRECISION)

        if consts.RECALL not in full_name_list and consts.PRECISION in full_name_list:
            full_name_list.append(consts.RECALL)

        return full_name_list

    def check(self):

        descr = "evaluate param's "
        self.eval_type = self.check_and_change_lower(self.eval_type,
                                                       [consts.BINARY, consts.MULTY, consts.REGRESSION,
                                                        consts.CLUSTERING],
                                                       descr)

        if type(self.pos_label).__name__ not in ["str", "float", "int"]:
            raise ValueError(
                "evaluate param's pos_label {} not supported, should be str or float or int type".format(
                    self.pos_label))

        if type(self.need_run).__name__ != "bool":
            raise ValueError(
                "evaluate param's need_run {} not supported, should be bool".format(
                    self.need_run))

        if self.metrics is None or len(self.metrics) == 0:
            self.metrics = self.default_metrics[self.eval_type]
            LOGGER.warning('use default metric {} for eval type {}'.format(self.metrics, self.eval_type)) 

        self.check_boolean(self.unfold_multi_result, 'multi_result_unfold')

        self.metrics = self._check_valid_metric(self.metrics)

        LOGGER.info("Finish evaluation parameter check!")

        return True

    def check_single_value_default_metric(self):
        self._use_single_value_default_metrics()

        # in validation strategy, psi f1-score and confusion-mat pr-quantile are not supported in cur version
        if self.metrics is None or len(self.metrics) == 0:
            self.metrics = self.default_metrics[self.eval_type]
            LOGGER.warning('use default metric {} for eval type {}'.format(self.metrics, self.eval_type))

        ban_metric = [consts.PSI, consts.F1_SCORE, consts.CONFUSION_MAT, consts.QUANTILE_PR]
        for metric in self.metrics:
            if metric in ban_metric:
                self.metrics.remove(metric)
        self.check()

__init__(self, eval_type='binary', pos_label=1, need_run=True, metrics=None, run_clustering_arbiter_metric=False, unfold_multi_result=False) special ¶

Source code in federatedml/param/evaluation_param.py

def __init__(self, eval_type="binary", pos_label=1, need_run=True, metrics=None,
             run_clustering_arbiter_metric=False, unfold_multi_result=False):
    super().__init__()
    self.eval_type = eval_type
    self.pos_label = pos_label
    self.need_run = need_run
    self.metrics = metrics
    self.unfold_multi_result = unfold_multi_result
    self.run_clustering_arbiter_metric = run_clustering_arbiter_metric

    self.default_metrics = {
        consts.BINARY: consts.ALL_BINARY_METRICS,
        consts.MULTY: consts.ALL_MULTI_METRICS,
        consts.REGRESSION: consts.ALL_REGRESSION_METRICS,
        consts.CLUSTERING: consts.ALL_CLUSTER_METRICS
    }

    self.allowed_metrics = {
        consts.BINARY: consts.ALL_BINARY_METRICS,
        consts.MULTY: consts.ALL_MULTI_METRICS,
        consts.REGRESSION: consts.ALL_REGRESSION_METRICS,
        consts.CLUSTERING: consts.ALL_CLUSTER_METRICS
    }

check(self) ¶

Source code in federatedml/param/evaluation_param.py

def check(self):

    descr = "evaluate param's "
    self.eval_type = self.check_and_change_lower(self.eval_type,
                                                   [consts.BINARY, consts.MULTY, consts.REGRESSION,
                                                    consts.CLUSTERING],
                                                   descr)

    if type(self.pos_label).__name__ not in ["str", "float", "int"]:
        raise ValueError(
            "evaluate param's pos_label {} not supported, should be str or float or int type".format(
                self.pos_label))

    if type(self.need_run).__name__ != "bool":
        raise ValueError(
            "evaluate param's need_run {} not supported, should be bool".format(
                self.need_run))

    if self.metrics is None or len(self.metrics) == 0:
        self.metrics = self.default_metrics[self.eval_type]
        LOGGER.warning('use default metric {} for eval type {}'.format(self.metrics, self.eval_type)) 

    self.check_boolean(self.unfold_multi_result, 'multi_result_unfold')

    self.metrics = self._check_valid_metric(self.metrics)

    LOGGER.info("Finish evaluation parameter check!")

    return True
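
`check()` delegates metric validation to `_check_valid_metric`, shown in the class listing above. The following standalone sketch traces the same steps: lower-casing, alias resolution, de-duplication, the allowed-metric check, and pairing precision with recall. The metric names and the alias table here are hypothetical placeholders; the real values live in FATE's `consts` module and are not shown in this section.

```python
# Hypothetical metric tables, for illustration only.
ALL_METRIC_NAME = ["auc", "ks", "accuracy", "precision", "recall"]
ALIAS = {("acc", "accuracy_score"): "accuracy"}  # alias names -> full name


def resolve_metrics(metrics, allowed):
    """Sketch of the resolution steps performed by _check_valid_metric."""
    resolved = []
    for metric in (m.lower() for m in metrics):
        if metric in ALL_METRIC_NAME:
            full_name = metric
        else:
            # Look the name up among the alias groups.
            full_name = next((full for names, full in ALIAS.items() if metric in names), None)
            if full_name is None:
                raise ValueError("metric {} is not supported".format(metric))
        if full_name not in resolved:
            resolved.append(full_name)

    for name in resolved:
        if name not in allowed:
            raise ValueError("metric {} is not used for this task".format(name))

    # Precision and recall are always reported together.
    if "recall" in resolved and "precision" not in resolved:
        resolved.append("precision")
    if "precision" in resolved and "recall" not in resolved:
        resolved.append("recall")
    return resolved


print(resolve_metrics(["AUC", "acc", "recall"], allowed=ALL_METRIC_NAME))
```

Requesting only `recall` therefore also yields `precision`, and vice versa.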

check_single_value_default_metric(self) ¶

Source code in federatedml/param/evaluation_param.py

def check_single_value_default_metric(self):
    self._use_single_value_default_metrics()

    # in validation strategy, psi f1-score and confusion-mat pr-quantile are not supported in cur version
    if self.metrics is None or len(self.metrics) == 0:
        self.metrics = self.default_metrics[self.eval_type]
        LOGGER.warning('use default metric {} for eval type {}'.format(self.metrics, self.eval_type))

    ban_metric = [consts.PSI, consts.F1_SCORE, consts.CONFUSION_MAT, consts.QUANTILE_PR]
    for metric in self.metrics:
        if metric in ban_metric:
            self.metrics.remove(metric)
    self.check()
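
One behaviour worth keeping in mind when reading the loop above: in Python, removing items from a list while iterating over that same list can skip the element that follows each removal. The sketch below reproduces the pattern with hypothetical metric names (the actual `consts` values are not shown in this section) and contrasts it with building a new list.

```python
banned = {"psi", "f1_score"}  # hypothetical names for illustration

# Same pattern as the loop above: remove while iterating.
in_place = ["psi", "f1_score", "auc"]
for metric in in_place:
    if metric in banned:
        in_place.remove(metric)  # shifts later items left; the next one is skipped

# Alternative: build a filtered copy and leave the input list untouched.
filtered = [m for m in ["psi", "f1_score", "auc"] if m not in banned]

print(in_place)  # "f1_score" survives because it was skipped
print(filtered)
```

Building a new list also avoids mutating the input, which matters if `self.metrics` refers to a shared default list.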

`feature_binning_param` ¶

Classes¶


TransformParam (BaseParam) ¶

Define how to transform the columns.

Parameters:

Name	Type	Description	Default
`transform_cols`	`list of column index, default: -1`	Specify which columns need to be transformed. If None, no columns are transformed. If -1, the same columns as `cols` in the binning module are used.	`-1`
`transform_names`	`list of string, default: None`	Specify which columns need to be transformed, by name. Each element in the list is a column name in the header.	`None`
`transform_type`	`{'bin_num', 'woe', None}`	Specify what replaces the original values in these columns. 1. bin_num: replace each feature value with the index of the bin it belongs to. 2. woe: valid for the guest party only; replace each value with its WOE value. 3. None: nothing is replaced.	`'bin_num'`

Source code in federatedml/param/feature_binning_param.py

class TransformParam(BaseParam):
    """
    Define how to transfer the cols

    Parameters
    ----------
    transform_cols : list of column index, default: -1
        Specify which columns need to be transform. If column index is None, None of columns will be transformed.
        If it is -1, it will use same columns as cols in binning module.

    transform_names: list of string, default: []
        Specify which columns need to calculated. Each element in the list represent for a column name in header.


    transform_type: {'bin_num', 'woe', None}
        Specify which value these columns going to replace.
         1. bin_num: Transfer original feature value to bin index in which this value belongs to.
         2. woe: This is valid for guest party only. It will replace original value to its woe value
         3. None: nothing will be replaced.
    """

    def __init__(self, transform_cols=-1, transform_names=None, transform_type="bin_num"):
        super(TransformParam, self).__init__()
        self.transform_cols = transform_cols
        self.transform_names = transform_names
        self.transform_type = transform_type

    def check(self):
        descr = "Transform Param's "
        if self.transform_cols is not None and self.transform_cols != -1:
            self.check_defined_type(self.transform_cols, descr, ['list'])
        self.check_defined_type(self.transform_names, descr, ['list', "NoneType"])
        if self.transform_names is not None:
            for name in self.transform_names:
                if not isinstance(name, str):
                    raise ValueError("Elements in transform_names should be string type")
        self.check_valid_value(self.transform_type, descr, ['bin_num', 'woe', None])

__init__(self, transform_cols=-1, transform_names=None, transform_type='bin_num') special ¶

Source code in federatedml/param/feature_binning_param.py

def __init__(self, transform_cols=-1, transform_names=None, transform_type="bin_num"):
    super(TransformParam, self).__init__()
    self.transform_cols = transform_cols
    self.transform_names = transform_names
    self.transform_type = transform_type

check(self) ¶

Source code in federatedml/param/feature_binning_param.py

def check(self):
    descr = "Transform Param's "
    if self.transform_cols is not None and self.transform_cols != -1:
        self.check_defined_type(self.transform_cols, descr, ['list'])
    self.check_defined_type(self.transform_names, descr, ['list', "NoneType"])
    if self.transform_names is not None:
        for name in self.transform_names:
            if not isinstance(name, str):
                raise ValueError("Elements in transform_names should be string type")
    self.check_valid_value(self.transform_type, descr, ['bin_num', 'woe', None])
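
To make the `bin_num` transform type concrete, here is a minimal sketch of mapping a feature value to a bin index from a sorted list of split points. The helper name `to_bin_index` is hypothetical. The interval convention (bin `i` covers values up to and including `split_points[i]`) and the clipping of values above the last split point are assumptions, not something this section confirms about FATE's implementation.

```python
import bisect


def to_bin_index(value, split_points):
    """Return the index of the bin that `value` falls into.

    Assumes `split_points` is sorted ascending and that bin i covers
    (split_points[i - 1], split_points[i]].
    """
    index = bisect.bisect_left(split_points, value)
    # Assumption: values beyond the last split point go into the last bin.
    return min(index, len(split_points) - 1)


split_points = [10, 20, 30]
print([to_bin_index(v, split_points) for v in (5, 10, 15, 30, 35)])
```

With `transform_type='bin_num'`, each original value in the selected columns would be replaced by an index of this kind.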


OptimalBinningParam (BaseParam) ¶

Parameters for optimal binning.

Parameters:

Name	Type	Description	Default
`metric_method`	`str, default: "iv"`	The metric used by the algorithm. Supports iv, gini, ks and chi-square.	`'iv'`
`min_bin_pct`	`float, default: 0.05`	The minimum percentage of each bucket, given as a fraction (0.05 means 5%).	`0.05`
`max_bin_pct`	`float, default: 1.0`	The maximum percentage of each bucket, given as a fraction (1.0 means 100%).	`1.0`
`init_bin_nums`	`int, default: 1000`	Number of bins at initialization.	`1000`
`mixture`	`bool, default: True`	Whether each bucket must contain both event and non-event records.	`True`
`init_bucket_method`	`str, default: 'quantile'`	Method used to build the initial buckets. Accepts 'quantile' and 'bucket'.	`'quantile'`

Source code in federatedml/param/feature_binning_param.py

class OptimalBinningParam(BaseParam):
    """
    Indicate optimal binning params

    Parameters
    ----------
    metric_method: str, default: "iv"
        The algorithm metric method. Support iv, gini, ks, chi-square


    min_bin_pct: float, default: 0.05
        The minimum percentage of each bucket

    max_bin_pct: float, default: 1.0
        The maximum percentage of each bucket

    init_bin_nums: int, default 1000
        Number of bins when initialize

    mixture: bool, default: True
        Whether each bucket need event and non-event records

    init_bucket_method: str default: quantile
        Init bucket methods. Accept quantile and bucket.

    """

    def __init__(self, metric_method='iv', min_bin_pct=0.05, max_bin_pct=1.0,
                 init_bin_nums=1000, mixture=True, init_bucket_method='quantile'):
        super().__init__()
        self.init_bucket_method = init_bucket_method
        self.metric_method = metric_method
        self.max_bin = None
        self.mixture = mixture
        self.max_bin_pct = max_bin_pct
        self.min_bin_pct = min_bin_pct
        self.init_bin_nums = init_bin_nums
        self.adjustment_factor = None

    def check(self):
        descr = "hetero binning's optimal binning param's"
        self.check_string(self.metric_method, descr)

        self.metric_method = self.metric_method.lower()
        if self.metric_method in ['chi_square', 'chi-square']:
            self.metric_method = 'chi_square'
        self.check_valid_value(self.metric_method, descr, ['iv', 'gini', 'chi_square', 'ks'])
        self.check_positive_integer(self.init_bin_nums, descr)

        self.init_bucket_method = self.init_bucket_method.lower()
        self.check_valid_value(self.init_bucket_method, descr, ['quantile', 'bucket'])

        if self.max_bin_pct not in [1, 0]:
            self.check_decimal_float(self.max_bin_pct, descr)
        if self.min_bin_pct not in [1, 0]:
            self.check_decimal_float(self.min_bin_pct, descr)
        if self.min_bin_pct > self.max_bin_pct:
            raise ValueError("Optimal binning's min_bin_pct should less or equal than max_bin_pct")

        self.check_boolean(self.mixture, descr)
        self.check_positive_integer(self.init_bin_nums, descr)

__init__(self, metric_method='iv', min_bin_pct=0.05, max_bin_pct=1.0, init_bin_nums=1000, mixture=True, init_bucket_method='quantile') special ¶

Source code in federatedml/param/feature_binning_param.py

def __init__(self, metric_method='iv', min_bin_pct=0.05, max_bin_pct=1.0,
             init_bin_nums=1000, mixture=True, init_bucket_method='quantile'):
    super().__init__()
    self.init_bucket_method = init_bucket_method
    self.metric_method = metric_method
    self.max_bin = None
    self.mixture = mixture
    self.max_bin_pct = max_bin_pct
    self.min_bin_pct = min_bin_pct
    self.init_bin_nums = init_bin_nums
    self.adjustment_factor = None

check(self) ¶

Source code in federatedml/param/feature_binning_param.py

def check(self):
    descr = "hetero binning's optimal binning param's"
    self.check_string(self.metric_method, descr)

    self.metric_method = self.metric_method.lower()
    if self.metric_method in ['chi_square', 'chi-square']:
        self.metric_method = 'chi_square'
    self.check_valid_value(self.metric_method, descr, ['iv', 'gini', 'chi_square', 'ks'])
    self.check_positive_integer(self.init_bin_nums, descr)

    self.init_bucket_method = self.init_bucket_method.lower()
    self.check_valid_value(self.init_bucket_method, descr, ['quantile', 'bucket'])

    if self.max_bin_pct not in [1, 0]:
        self.check_decimal_float(self.max_bin_pct, descr)
    if self.min_bin_pct not in [1, 0]:
        self.check_decimal_float(self.min_bin_pct, descr)
    if self.min_bin_pct > self.max_bin_pct:
        raise ValueError("Optimal binning's min_bin_pct should less or equal than max_bin_pct")

    self.check_boolean(self.mixture, descr)
    self.check_positive_integer(self.init_bin_nums, descr)
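
Two of the checks above are easy to overlook: `metric_method` is lower-cased and both spellings of chi-square are folded into `'chi_square'`, and `min_bin_pct` may not exceed `max_bin_pct`. A standalone sketch of just those two rules, with a hypothetical function name and without the type checks done by the `BaseParam` helpers:

```python
VALID_METRICS = ["iv", "gini", "chi_square", "ks"]


def normalize_optimal_binning(metric_method="iv", min_bin_pct=0.05, max_bin_pct=1.0):
    """Hypothetical helper mirroring two rules of OptimalBinningParam.check()."""
    name = metric_method.lower()
    # 'chi-square' and 'chi_square' are treated as the same metric.
    if name in ("chi_square", "chi-square"):
        name = "chi_square"
    if name not in VALID_METRICS:
        raise ValueError("metric_method {} is not supported".format(metric_method))

    if min_bin_pct > max_bin_pct:
        raise ValueError("min_bin_pct should be less than or equal to max_bin_pct")
    return name, min_bin_pct, max_bin_pct


print(normalize_optimal_binning("Chi-Square"))
```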


FeatureBinningParam (BaseParam) ¶

Define the feature binning method

Parameters:

Name	Type	Description	Default
`method`	`str, 'quantile', 'bucket' or 'optimal', default: 'quantile'`	Binning method.	`'quantile'`
`compress_thres`	`int, default: 10000`	When the number of saved summaries exceeds this threshold, the compress function is called.	`10000`
`head_size`	`int, default: 10000`	The buffer size for inserted observations. When the head list reaches this size, the QuantileSummaries object starts to generate summaries (or stats) and inserts them into its sampled list.	`10000`
`error`	`float, 0 <= error < 1, default: 0.0001`	The error tolerance of binning. The final split point comes from the original data, and its rank is close to the exact rank. More precisely, floor((p - 2 * error) * N) <= rank(x) <= ceil((p + 2 * error) * N), where p is the quantile as a float and N is the total number of records.	`0.0001`
`bin_num`	`int, bin_num > 0, default: 10`	The maximum number of bins.	`10`
`bin_indexes`	`list of int or int, default: -1`	Specify which columns need to be binned. -1 means all columns. To select specific columns, provide a list of header indexes instead of -1.	`-1`
`bin_names`	`list of string, default: None`	Specify which columns need to be binned, by name. Each element in the list is a column name in the header.	`None`
`adjustment_factor`	`float, default: 0.5`	The adjustment factor used when calculating WOE. This is useful when a bin contains no event or no non-event records. Note that this parameter does NOT take effect when set on the host.	`0.5`
`category_indexes`	`list of int or int, default: None`	Specify which columns are categorical features. -1 means all columns; a list of int selects a set of such features. For categorical features, bin_obj takes the original values as split_points and treats them as already binned. If this is not what you expect, do NOT list the column in this parameter. The number of categories should not exceed bin_num set above.	`None`
`category_names`	`list of string, default: None`	Use column names to specify categorical features. Each element in the list is a column name in the header.	`None`
`local_only`	`bool, default: False`	Whether to provide the binning method to the guest party only. If true, the host party does nothing. Warning: this parameter will be deprecated in a future version.	`False`
`transform_param`	`TransformParam`	Define how to transform the binned data.	`TransformParam()`
`need_run`	`bool, default True`	Indicates whether this module needs to be run.	`True`
`skip_static`	`bool, default False`	If true, binning will not calculate iv, woe, etc. In this case, optimal binning is not supported.	`False`

Source code in federatedml/param/feature_binning_param.py

class FeatureBinningParam(BaseParam):
    """
    Define the feature binning method

    Parameters
    ----------
    method : str, 'quantile', 'bucket' or 'optimal', default: 'quantile'
        Binning method.

    compress_thres: int, default: 10000
        When the number of saved summaries exceed this threshold, it will call its compress function

    head_size: int, default: 10000
        The buffer size to store inserted observations. When head list reach this buffer size, the
        QuantileSummaries object start to generate summary(or stats) and insert into its sampled list.

    error: float, 0 <= error < 1, default: 0.0001
        The error of tolerance of binning. The final split point comes from original data, and the rank
        of this value is close to the exact rank. More precisely,
        floor((p - 2 * error) * N) <= rank(x) <= ceil((p + 2 * error) * N)
        where p is the quantile in float, and N is total number of data.

    bin_num: int, bin_num > 0, default: 10
        The max bin number for binning

    bin_indexes : list of int or int, default: -1
        Specify which columns need to be binned. -1 represent for all columns. If you need to indicate specific
        cols, provide a list of header index instead of -1.

    bin_names : list of string, default: []
        Specify which columns need to calculated. Each element in the list represent for a column name in header.

    adjustment_factor : float, default: 0.5
        the adjustment factor when calculating WOE. This is useful when there is no event or non-event in
        a bin. Please note that this parameter will NOT take effect for setting in host.

    category_indexes : list of int or int, default: []
        Specify which columns are category features. -1 represent for all columns. List of int indicate a set of
        such features. For category features, bin_obj will take its original values as split_points and treat them
        as have been binned. If this is not what you expect, please do NOT put it into this parameters.

        The number of categories should not exceed bin_num set above.

    category_names : list of string, default: []
        Use column names to specify category features. Each element in the list represent for a column name in header.

    local_only : bool, default: False
        Whether just provide binning method to guest party. If true, host party will do nothing.
        Warnings: This parameter will be deprecated in future version.

    transform_param: TransformParam
        Define how to transfer the binned data.

    need_run: bool, default True
        Indicate if this module needed to be run

    skip_static: bool, default False
        If true, binning will not calculate iv, woe etc. In this case, optimal-binning
        will not be supported.

    """

    def __init__(self, method=consts.QUANTILE,
                 compress_thres=consts.DEFAULT_COMPRESS_THRESHOLD,
                 head_size=consts.DEFAULT_HEAD_SIZE,
                 error=consts.DEFAULT_RELATIVE_ERROR,
                 bin_num=consts.G_BIN_NUM, bin_indexes=-1, bin_names=None, adjustment_factor=0.5,
                 transform_param=TransformParam(),
                 local_only=False,
                 category_indexes=None, category_names=None,
                 need_run=True, skip_static=False):
        super(FeatureBinningParam, self).__init__()
        self.method = method
        self.compress_thres = compress_thres
        self.head_size = head_size
        self.error = error
        self.adjustment_factor = adjustment_factor
        self.bin_num = bin_num
        self.bin_indexes = bin_indexes
        self.bin_names = bin_names
        self.category_indexes = category_indexes
        self.category_names = category_names
        self.transform_param = copy.deepcopy(transform_param)
        self.need_run = need_run
        self.skip_static = skip_static
        self.local_only = local_only

    def check(self):
        descr = "Binning param's"
        self.check_string(self.method, descr)
        self.method = self.method.lower()
        self.check_positive_integer(self.compress_thres, descr)
        self.check_positive_integer(self.head_size, descr)
        self.check_decimal_float(self.error, descr)
        self.check_positive_integer(self.bin_num, descr)
        if self.bin_indexes != -1:
            self.check_defined_type(self.bin_indexes, descr, ['list', 'RepeatedScalarContainer', "NoneType"])
        self.check_defined_type(self.bin_names, descr, ['list', "NoneType"])
        self.check_defined_type(self.category_indexes, descr, ['list', "NoneType"])
        self.check_defined_type(self.category_names, descr, ['list', "NoneType"])
        self.check_open_unit_interval(self.adjustment_factor, descr)
        self.check_boolean(self.local_only, descr)

__init__(self, method='quantile', compress_thres=10000, head_size=10000, error=0.0001, bin_num=10, bin_indexes=-1, bin_names=None, adjustment_factor=0.5, transform_param=TransformParam(), local_only=False, category_indexes=None, category_names=None, need_run=True, skip_static=False) special ¶

Source code in federatedml/param/feature_binning_param.py

def __init__(self, method=consts.QUANTILE,
             compress_thres=consts.DEFAULT_COMPRESS_THRESHOLD,
             head_size=consts.DEFAULT_HEAD_SIZE,
             error=consts.DEFAULT_RELATIVE_ERROR,
             bin_num=consts.G_BIN_NUM, bin_indexes=-1, bin_names=None, adjustment_factor=0.5,
             transform_param=TransformParam(),
             local_only=False,
             category_indexes=None, category_names=None,
             need_run=True, skip_static=False):
    super(FeatureBinningParam, self).__init__()
    self.method = method
    self.compress_thres = compress_thres
    self.head_size = head_size
    self.error = error
    self.adjustment_factor = adjustment_factor
    self.bin_num = bin_num
    self.bin_indexes = bin_indexes
    self.bin_names = bin_names
    self.category_indexes = category_indexes
    self.category_names = category_names
    self.transform_param = copy.deepcopy(transform_param)
    self.need_run = need_run
    self.skip_static = skip_static
    self.local_only = local_only

check(self) ¶

Source code in federatedml/param/feature_binning_param.py

def check(self):
    descr = "Binning param's"
    self.check_string(self.method, descr)
    self.method = self.method.lower()
    self.check_positive_integer(self.compress_thres, descr)
    self.check_positive_integer(self.head_size, descr)
    self.check_decimal_float(self.error, descr)
    self.check_positive_integer(self.bin_num, descr)
    if self.bin_indexes != -1:
        self.check_defined_type(self.bin_indexes, descr, ['list', 'RepeatedScalarContainer', "NoneType"])
    self.check_defined_type(self.bin_names, descr, ['list', "NoneType"])
    self.check_defined_type(self.category_indexes, descr, ['list', "NoneType"])
    self.check_defined_type(self.category_names, descr, ['list', "NoneType"])
    self.check_open_unit_interval(self.adjustment_factor, descr)
    self.check_boolean(self.local_only, descr)
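
The rank guarantee quoted for the `error` parameter, floor((p - 2 * error) * N) <= rank(x) <= ceil((p + 2 * error) * N), can be worked through numerically. The helper name `rank_bounds` is hypothetical; it only evaluates the inequality and does not run any quantile sketch. Exact fractions are used so that floating-point rounding does not shift the floor and ceiling by one.

```python
import math
from fractions import Fraction


def rank_bounds(p, error, n):
    """Return the (lowest, highest) rank allowed for the split point at quantile p."""
    # Convert through str() so 0.0001 becomes exactly 1/10000.
    p, error = Fraction(str(p)), Fraction(str(error))
    lower = math.floor((p - 2 * error) * n)
    upper = math.ceil((p + 2 * error) * n)
    return lower, upper


# Median of 100,000 records with the default error of 0.0001.
print(rank_bounds(0.5, 0.0001, 100000))  # (49980, 50020)
```

So with the default tolerance, the split point chosen for the median of 100,000 records has a rank within 20 positions of the exact median rank.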


HeteroFeatureBinningParam (FeatureBinningParam) ¶

Source code in federatedml/param/feature_binning_param.py

class HeteroFeatureBinningParam(FeatureBinningParam):
    def __init__(self, method=consts.QUANTILE,
                 compress_thres=consts.DEFAULT_COMPRESS_THRESHOLD,
                 head_size=consts.DEFAULT_HEAD_SIZE,
                 error=consts.DEFAULT_RELATIVE_ERROR,
                 bin_num=consts.G_BIN_NUM, bin_indexes=-1, bin_names=None, adjustment_factor=0.5,
                 transform_param=TransformParam(), optimal_binning_param=OptimalBinningParam(),
                 local_only=False, category_indexes=None, category_names=None,
                 encrypt_param=EncryptParam(),
                 need_run=True, skip_static=False):
        super(HeteroFeatureBinningParam, self).__init__(method=method, compress_thres=compress_thres,
                                                        head_size=head_size, error=error,
                                                        bin_num=bin_num, bin_indexes=bin_indexes,
                                                        bin_names=bin_names, adjustment_factor=adjustment_factor,
                                                        transform_param=transform_param,
                                                        category_indexes=category_indexes,
                                                        category_names=category_names,
                                                        need_run=need_run, local_only=local_only,
                                                        skip_static=skip_static)
        self.optimal_binning_param = copy.deepcopy(optimal_binning_param)
        self.encrypt_param = encrypt_param

    def check(self):
        descr = "Hetero Binning param's"
        super(HeteroFeatureBinningParam, self).check()
        self.check_valid_value(self.method, descr, [consts.QUANTILE, consts.BUCKET, consts.OPTIMAL])
        self.optimal_binning_param.check()
        self.encrypt_param.check()
        if self.encrypt_param.method != consts.PAILLIER:
            raise ValueError("Feature Binning support Paillier encrypt method only.")
        if self.skip_static and self.method == consts.OPTIMAL:
            raise ValueError("When skip_static, optimal binning is not supported.")
        self.transform_param.check()
        if self.skip_static and self.transform_param.transform_type == 'woe':
            raise ValueError("To use woe transform, skip_static should set as False")

__init__(self, method='quantile', compress_thres=10000, head_size=10000, error=0.0001, bin_num=10, bin_indexes=-1, bin_names=None, adjustment_factor=0.5, transform_param=TransformParam(), optimal_binning_param=OptimalBinningParam(), local_only=False, category_indexes=None, category_names=None, encrypt_param=EncryptParam(), need_run=True, skip_static=False) special ¶

Source code in federatedml/param/feature_binning_param.py

def __init__(self, method=consts.QUANTILE,
             compress_thres=consts.DEFAULT_COMPRESS_THRESHOLD,
             head_size=consts.DEFAULT_HEAD_SIZE,
             error=consts.DEFAULT_RELATIVE_ERROR,
             bin_num=consts.G_BIN_NUM, bin_indexes=-1, bin_names=None, adjustment_factor=0.5,
             transform_param=TransformParam(), optimal_binning_param=OptimalBinningParam(),
             local_only=False, category_indexes=None, category_names=None,
             encrypt_param=EncryptParam(),
             need_run=True, skip_static=False):
    super(HeteroFeatureBinningParam, self).__init__(method=method, compress_thres=compress_thres,
                                                    head_size=head_size, error=error,
                                                    bin_num=bin_num, bin_indexes=bin_indexes,
                                                    bin_names=bin_names, adjustment_factor=adjustment_factor,
                                                    transform_param=transform_param,
                                                    category_indexes=category_indexes,
                                                    category_names=category_names,
                                                    need_run=need_run, local_only=local_only,
                                                    skip_static=skip_static)
    self.optimal_binning_param = copy.deepcopy(optimal_binning_param)
    self.encrypt_param = encrypt_param

check(self) ¶

Source code in federatedml/param/feature_binning_param.py

def check(self):
    descr = "Hetero Binning param's"
    super(HeteroFeatureBinningParam, self).check()
    self.check_valid_value(self.method, descr, [consts.QUANTILE, consts.BUCKET, consts.OPTIMAL])
    self.optimal_binning_param.check()
    self.encrypt_param.check()
    if self.encrypt_param.method != consts.PAILLIER:
        raise ValueError("Feature Binning support Paillier encrypt method only.")
    if self.skip_static and self.method == consts.OPTIMAL:
        raise ValueError("When skip_static, optimal binning is not supported.")
    self.transform_param.check()
    if self.skip_static and self.transform_param.transform_type == 'woe':
        raise ValueError("To use woe transform, skip_static should set as False")
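
The cross-parameter rules enforced by this check can be restated as a small standalone sketch. It does not import FATE: `validate_hetero_binning` and `is_valid` are hypothetical names, and the literal strings used for the binning methods and for the Paillier method are assumed stand-ins for the corresponding `consts` values.

```python
def validate_hetero_binning(method="quantile", encrypt_method="Paillier",
                            skip_static=False, transform_type=None):
    """Standalone restatement of the cross-parameter rules; not the FATE API."""
    # Assumed string values of consts.QUANTILE / consts.BUCKET / consts.OPTIMAL.
    if method not in ("quantile", "bucket", "optimal"):
        raise ValueError(f"unsupported binning method: {method}")
    # Assumed string value of consts.PAILLIER.
    if encrypt_method != "Paillier":
        raise ValueError("Feature Binning support Paillier encrypt method only.")
    # Optimal binning needs the statistics that skip_static would skip.
    if skip_static and method == "optimal":
        raise ValueError("When skip_static, optimal binning is not supported.")
    # The woe transform also needs those statistics.
    if skip_static and transform_type == "woe":
        raise ValueError("To use woe transform, skip_static should set as False")
    return True


def is_valid(**kwargs):
    """Return True if the combination passes, False if it raises ValueError."""
    try:
        return validate_hetero_binning(**kwargs)
    except ValueError:
        return False


print(is_valid())                                         # defaults
print(is_valid(skip_static=True))                         # quantile without statistics
print(is_valid(skip_static=True, method="optimal"))       # rejected
print(is_valid(skip_static=True, transform_type="woe"))   # rejected
```

In this sketch, `skip_static=True` is only compatible with quantile or bucket binning and with a transform type other than `'woe'`.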


HomoFeatureBinningParam            (FeatureBinningParam)

¶

Source code in federatedml/param/feature_binning_param.py

class HomoFeatureBinningParam(FeatureBinningParam):
    def __init__(self, method=consts.VIRTUAL_SUMMARY,
                 compress_thres=consts.DEFAULT_COMPRESS_THRESHOLD,
                 head_size=consts.DEFAULT_HEAD_SIZE,
                 error=consts.DEFAULT_RELATIVE_ERROR,
                 sample_bins=100,
                 bin_num=consts.G_BIN_NUM, bin_indexes=-1, bin_names=None, adjustment_factor=0.5,
                 transform_param=TransformParam(),
                 category_indexes=None, category_names=None,
                 need_run=True, skip_static=False, max_iter=100):
        super(HomoFeatureBinningParam, self).__init__(method=method, compress_thres=compress_thres,
                                                      head_size=head_size, error=error,
                                                      bin_num=bin_num, bin_indexes=bin_indexes,
                                                      bin_names=bin_names, adjustment_factor=adjustment_factor,
                                                      transform_param=transform_param,
                                                      category_indexes=category_indexes, category_names=category_names,
                                                      need_run=need_run,
                                                      skip_static=skip_static)
        self.sample_bins = sample_bins
        self.max_iter = max_iter

    def check(self):
        descr = "homo binning param's"
        super(HomoFeatureBinningParam, self).check()
        self.check_string(self.method, descr)
        self.method = self.method.lower()
        self.check_valid_value(self.method, descr, [consts.VIRTUAL_SUMMARY, consts.RECURSIVE_QUERY])
        self.check_positive_integer(self.max_iter, descr)
        if self.max_iter > 100:
            raise ValueError("Max iter is not allowed exceed 100")

__init__(self, method='virtual_summary', compress_thres=10000, head_size=10000, error=0.0001, sample_bins=100, bin_num=10, bin_indexes=-1, bin_names=None, adjustment_factor=0.5, transform_param=TransformParam(), category_indexes=None, category_names=None, need_run=True, skip_static=False, max_iter=100)

special ¶

Source code in federatedml/param/feature_binning_param.py

def __init__(self, method=consts.VIRTUAL_SUMMARY,
             compress_thres=consts.DEFAULT_COMPRESS_THRESHOLD,
             head_size=consts.DEFAULT_HEAD_SIZE,
             error=consts.DEFAULT_RELATIVE_ERROR,
             sample_bins=100,
             bin_num=consts.G_BIN_NUM, bin_indexes=-1, bin_names=None, adjustment_factor=0.5,
             transform_param=TransformParam(),
             category_indexes=None, category_names=None,
             need_run=True, skip_static=False, max_iter=100):
    super(HomoFeatureBinningParam, self).__init__(method=method, compress_thres=compress_thres,
                                                  head_size=head_size, error=error,
                                                  bin_num=bin_num, bin_indexes=bin_indexes,
                                                  bin_names=bin_names, adjustment_factor=adjustment_factor,
                                                  transform_param=transform_param,
                                                  category_indexes=category_indexes, category_names=category_names,
                                                  need_run=need_run,
                                                  skip_static=skip_static)
    self.sample_bins = sample_bins
    self.max_iter = max_iter

check(self) ¶

Source code in federatedml/param/feature_binning_param.py

def check(self):
    descr = "homo binning param's"
    super(HomoFeatureBinningParam, self).check()
    self.check_string(self.method, descr)
    self.method = self.method.lower()
    self.check_valid_value(self.method, descr, [consts.VIRTUAL_SUMMARY, consts.RECURSIVE_QUERY])
    self.check_positive_integer(self.max_iter, descr)
    if self.max_iter > 100:
        raise ValueError("Max iter is not allowed exceed 100")

`feature_imputation_param` ¶

Classes¶


FeatureImputationParam            (BaseParam)

¶

Define feature imputation parameters

Parameters:

Name	Type	Description	Default
`default_value`	`None or single object type or list`	The value used to replace missing values. If None, the default value defined in federatedml/feature/imputer.py is used. If a single object, every missing value is filled with that object. If a list, its length must equal the feature dimension of the input data; a missing value in a column is replaced by the list element at the same position.	`0`
`missing_fill_method`	`[None, 'min', 'max', 'mean', 'designated']`	The method used to replace missing values.	`None`
`col_missing_fill_method`	`None or dict of (column name, missing_fill_method) pairs`	Specifies the method used to replace missing values for each column. Any column not listed falls back to missing_fill_method; if missing_fill_method is None, unlisted columns are not imputed.	`None`
`missing_impute`	`None or list`	Defines which values are considered missing. List elements can be of any type; if None, the list is generated automatically.	`None`
`need_run`	`bool, default True`	need run or not	`True`
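
The interaction between `missing_fill_method` and `col_missing_fill_method` can be illustrated with a minimal sketch. `resolve_fill_methods` is a hypothetical helper written for this page, not part of federatedml; it only mirrors the fallback rule described in the table and the lower-casing done in `check`.

```python
from typing import Dict, List, Optional

ALLOWED_METHODS = {"min", "max", "mean", "designated"}


def resolve_fill_methods(
    columns: List[str],
    missing_fill_method: Optional[str] = None,
    col_missing_fill_method: Optional[Dict[str, str]] = None,
) -> Dict[str, Optional[str]]:
    """Hypothetical helper: map each column to its effective fill method."""
    per_column = {k: v.lower() for k, v in (col_missing_fill_method or {}).items()}
    default = missing_fill_method.lower() if missing_fill_method is not None else None
    resolved = {}
    for col in columns:
        # A per-column entry wins; otherwise fall back to the global method.
        method = per_column.get(col, default)
        if method is not None and method not in ALLOWED_METHODS:
            raise ValueError(f"unsupported fill method for {col}: {method}")
        resolved[col] = method  # None means the column is not imputed
    return resolved


methods = resolve_fill_methods(["x0", "x1", "x2"], "mean", {"x1": "Designated"})
print(methods)
```

Here `x1` uses its own method while `x0` and `x2` fall back to `'mean'`. With `missing_fill_method=None`, columns that are not listed resolve to `None` and are left untouched.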

Source code in federatedml/param/feature_imputation_param.py

class FeatureImputationParam(BaseParam):
    """
    Define feature imputation parameters

    Parameters
    ----------

    default_value : None or single object type or list
        The value used to replace missing values.
        If None, the default value defined in federatedml/feature/imputer.py is used.
        If a single object, every missing value is filled with that object.
        If a list, its length must equal the feature dimension of the input data;
            a missing value in a column is replaced by the list element at the
            same position.

    missing_fill_method : [None, 'min', 'max', 'mean', 'designated']
        the method to replace missing value

    col_missing_fill_method: None or dict of (column name, missing_fill_method) pairs
        specifies method to replace missing value for each column;
        any column not specified will take missing_fill_method,
        if missing_fill_method is None, unspecified column will not be imputed;

    missing_impute : None or list
        Defines which values are considered missing. List elements can be of any type; if None, the list is generated automatically. Default: None

    need_run: bool, default True
        need run or not

    """

    def __init__(self, default_value=0, missing_fill_method=None, col_missing_fill_method=None,
                 missing_impute=None, need_run=True):
        self.default_value = default_value
        self.missing_fill_method = missing_fill_method
        self.col_missing_fill_method = col_missing_fill_method
        self.missing_impute = missing_impute
        self.need_run = need_run

    def check(self):

        descr = "feature imputation param's "

        self.check_boolean(self.need_run, descr+"need_run")

        if self.missing_fill_method is not None:
            self.missing_fill_method = self.check_and_change_lower(self.missing_fill_method,
                                                                   ['min', 'max', 'mean', 'designated'],
                                                                   f"{descr}missing_fill_method ")
        if self.col_missing_fill_method:
            if not isinstance(self.col_missing_fill_method, dict):
                raise ValueError(f"{descr}col_missing_fill_method should be a dict")
            for k, v in self.col_missing_fill_method.items():
                if not isinstance(k, str):
                    raise ValueError(f"{descr}col_missing_fill_method should contain str key(s) only")
                v = self.check_and_change_lower(v,
                                                ['min', 'max', 'mean', 'designated'],
                                                f"per column method specified in {descr} col_missing_fill_method dict")
                self.col_missing_fill_method[k] = v
        if self.missing_impute:
            if not isinstance(self.missing_impute, list):
                raise ValueError(f"{descr}missing_impute must be None or list.")

        return True

__init__(self, default_value=0, missing_fill_method=None, col_missing_fill_method=None, missing_impute=None, need_run=True)

special ¶

Source code in federatedml/param/feature_imputation_param.py

def __init__(self, default_value=0, missing_fill_method=None, col_missing_fill_method=None,
             missing_impute=None, need_run=True):
    self.default_value = default_value
    self.missing_fill_method = missing_fill_method
    self.col_missing_fill_method = col_missing_fill_method
    self.missing_impute = missing_impute
    self.need_run = need_run

check(self) ¶

Source code in federatedml/param/feature_imputation_param.py

def check(self):

    descr = "feature imputation param's "

    self.check_boolean(self.need_run, descr+"need_run")

    if self.missing_fill_method is not None:
        self.missing_fill_method = self.check_and_change_lower(self.missing_fill_method,
                                                               ['min', 'max', 'mean', 'designated'],
                                                               f"{descr}missing_fill_method ")
    if self.col_missing_fill_method:
        if not isinstance(self.col_missing_fill_method, dict):
            raise ValueError(f"{descr}col_missing_fill_method should be a dict")
        for k, v in self.col_missing_fill_method.items():
            if not isinstance(k, str):
                raise ValueError(f"{descr}col_missing_fill_method should contain str key(s) only")
            v = self.check_and_change_lower(v,
                                            ['min', 'max', 'mean', 'designated'],
                                            f"per column method specified in {descr} col_missing_fill_method dict")
            self.col_missing_fill_method[k] = v
    if self.missing_impute:
        if not isinstance(self.missing_impute, list):
            raise ValueError(f"{descr}missing_impute must be None or list.")

    return True

`feature_selection_param` ¶

deprecated_param_list ¶

Classes¶


UniqueValueParam            (BaseParam)

¶

Uses the difference between a column's maximum and minimum values to decide whether the column is filtered.

Parameters:

Name	Type	Description	Default
`eps`	`float, default: 1e-5`	The column(s) will be filtered if its difference is smaller than eps.	`1e-05`
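
As a rough illustration of the rule, the sketch below keeps only columns whose value range is at least `eps`. `unique_value_filter` is a hypothetical standalone function operating on plain Python lists; the actual filter in federatedml works on distributed tables.

```python
from typing import Dict, List


def unique_value_filter(columns: Dict[str, List[float]], eps: float = 1e-5) -> List[str]:
    """Return the names of columns that are kept, i.e. those with max - min >= eps."""
    return [
        name
        for name, values in columns.items()
        if max(values) - min(values) >= eps
    ]


kept = unique_value_filter(
    {"constant": [3.0, 3.0, 3.0], "varying": [0.1, 0.5, 0.9]}
)
print(kept)
```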

Source code in federatedml/param/feature_selection_param.py

class UniqueValueParam(BaseParam):
    """
    Use the difference between max-value and min-value to judge.

    Parameters
    ----------
    eps : float, default: 1e-5
        The column(s) will be filtered if its difference is smaller than eps.
    """

    def __init__(self, eps=1e-5):
        self.eps = eps

    def check(self):
        descr = "Unique value param's"
        self.check_positive_number(self.eps, descr)
        return True

__init__(self, eps=1e-05) special ¶

Source code in federatedml/param/feature_selection_param.py

def __init__(self, eps=1e-5):
    self.eps = eps

check(self) ¶

Source code in federatedml/param/feature_selection_param.py

def check(self):
    descr = "Unique value param's"
    self.check_positive_number(self.eps, descr)
    return True


IVValueSelectionParam            (BaseParam)

¶

Use information values to select features.

Parameters:

Name	Type	Description	Default
`value_threshold`	`float, default: 0.0`	IV threshold applied when the iv_value_thres method is used in feature selection.	`0.0`
`host_thresholds`	`List of float or None, default: None`	Thresholds for the individual hosts. If None, every host uses the same threshold as the guest. If provided, the order must match the order of the host ids.	`None`
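
The fallback rule for `host_thresholds` can be sketched as follows. `resolve_host_thresholds` and the host ids are hypothetical, and the length check is an assumption added for the illustration; `check` itself only verifies that `host_thresholds` is a list or None.

```python
from typing import Dict, List, Optional


def resolve_host_thresholds(
    host_ids: List[int],
    value_threshold: float = 0.0,
    host_thresholds: Optional[List[float]] = None,
) -> Dict[int, float]:
    """Hypothetical helper: pair each host id with the IV threshold it uses."""
    if host_thresholds is None:
        # No per-host setting: every host reuses the guest threshold.
        return {host_id: value_threshold for host_id in host_ids}
    if len(host_thresholds) != len(host_ids):
        # Assumption for this sketch: one threshold per host is required.
        raise ValueError("host_thresholds must have one entry per host")
    # Thresholds are matched to hosts by position.
    return dict(zip(host_ids, host_thresholds))


print(resolve_host_thresholds([9999, 10000], value_threshold=0.1))
print(resolve_host_thresholds([9999, 10000], 0.1, host_thresholds=[0.2, 0.3]))
```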

Source code in federatedml/param/feature_selection_param.py

class IVValueSelectionParam(BaseParam):
    """
    Use information values to select features.

    Parameters
    ----------
    value_threshold: float, default: 0.0
        Used if iv_value_thres method is used in feature selection.

    host_thresholds: List of float or None, default: None
        Set threshold for different host. If None, use same threshold as guest. If provided, the order should map with
        the host id setting.

    """

    def __init__(self, value_threshold=0.0, host_thresholds=None, local_only=False):
        super().__init__()
        self.value_threshold = value_threshold
        self.host_thresholds = host_thresholds
        self.local_only = local_only

    def check(self):
        if not isinstance(self.value_threshold, (float, int)):
            raise ValueError("IV selection param's value_threshold should be float or int")

        if self.host_thresholds is not None:
            if not isinstance(self.host_thresholds, list):
                raise ValueError("IV selection param's host_threshold should be list or None")

        if not isinstance(self.local_only, bool):
            raise ValueError("IV selection param's local_only should be bool")

        return True

__init__(self, value_threshold=0.0, host_thresholds=None, local_only=False) special ¶

Source code in federatedml/param/feature_selection_param.py

def __init__(self, value_threshold=0.0, host_thresholds=None, local_only=False):
    super().__init__()
    self.value_threshold = value_threshold
    self.host_thresholds = host_thresholds
    self.local_only = local_only

check(self) ¶

Source code in federatedml/param/feature_selection_param.py

def check(self):
    if not isinstance(self.value_threshold, (float, int)):
        raise ValueError("IV selection param's value_threshold should be float or int")

    if self.host_thresholds is not None:
        if not isinstance(self.host_thresholds, list):
            raise ValueError("IV selection param's host_threshold should be list or None")

    if not isinstance(self.local_only, bool):
        raise ValueError("IV selection param's local_only should be bool")

    return True


IVPercentileSelectionParam            (BaseParam)

¶

Use information values to select features.

Parameters:

Name	Type	Description	Default
`percentile_threshold`	`float`	0 <= percentile_threshold <= 1.0, default: 1.0, Percentile threshold for iv_percentile method	`1.0`

Source code in federatedml/param/feature_selection_param.py

class IVPercentileSelectionParam(BaseParam):
    """
    Use information values to select features.

    Parameters
    ----------
    percentile_threshold: float
        0 <= percentile_threshold <= 1.0, default: 1.0, Percentile threshold for iv_percentile method
    """

    def __init__(self, percentile_threshold=1.0, local_only=False):
        super().__init__()
        self.percentile_threshold = percentile_threshold
        self.local_only = local_only

    def check(self):
        descr = "IV selection param's"
        # Range-check every value except the exact endpoints 0 and 1.
        if self.percentile_threshold != 0 and self.percentile_threshold != 1:
            self.check_decimal_float(self.percentile_threshold, descr)
        self.check_boolean(self.local_only, descr)
        return True

__init__(self, percentile_threshold=1.0, local_only=False) special ¶

Source code in federatedml/param/feature_selection_param.py

def __init__(self, percentile_threshold=1.0, local_only=False):
    super().__init__()
    self.percentile_threshold = percentile_threshold
    self.local_only = local_only

check(self) ¶

Source code in federatedml/param/feature_selection_param.py

def check(self):
    descr = "IV selection param's"
    # Range-check every value except the exact endpoints 0 and 1.
    if self.percentile_threshold != 0 and self.percentile_threshold != 1:
        self.check_decimal_float(self.percentile_threshold, descr)
    self.check_boolean(self.local_only, descr)
    return True


IVTopKParam            (BaseParam)

¶

Use information values to select features.

Parameters:

Name	Type	Description	Default
`k`	`int`	Should be greater than 0, default: 10. Number of features with the highest IV to keep.	`10`
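
A minimal sketch of top-k selection by information value, assuming the IV of each feature is already available as a plain dict. `select_top_k_by_iv` is a hypothetical function, and how the real filter breaks ties is not specified here.

```python
from typing import Dict, List


def select_top_k_by_iv(iv_values: Dict[str, float], k: int = 10) -> List[str]:
    """Return the names of the k features with the highest IV, best first."""
    if not isinstance(k, int) or k <= 0:
        raise ValueError("k should be a positive integer")
    ranked = sorted(iv_values.items(), key=lambda item: item[1], reverse=True)
    return [name for name, _ in ranked[:k]]


selected = select_top_k_by_iv({"a": 0.3, "b": 0.05, "c": 0.7}, k=2)
print(selected)
```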

Source code in federatedml/param/feature_selection_param.py

class IVTopKParam(BaseParam):
    """
    Use information values to select features.

    Parameters
    ----------
    k: int
        Should be greater than 0, default: 10. Number of features with the highest IV to keep.
    """

    def __init__(self, k=10, local_only=False):
        super().__init__()
        self.k = k
        self.local_only = local_only

    def check(self):
        descr = "IV selection param's"
        self.check_positive_integer(self.k, descr)
        self.check_boolean(self.local_only, descr)
        return True

__init__(self, k=10, local_only=False) special ¶

Source code in federatedml/param/feature_selection_param.py

def __init__(self, k=10, local_only=False):
    super().__init__()
    self.k = k
    self.local_only = local_only

check(self) ¶

Source code in federatedml/param/feature_selection_param.py

def check(self):
    descr = "IV selection param's"
    self.check_positive_integer(self.k, descr)
    self.check_boolean(self.local_only, descr)
    return True


VarianceOfCoeSelectionParam            (BaseParam)

¶

Use coefficient of variation to select features. When judging, the absolute value will be used.

Parameters:

Name	Type	Description	Default
`value_threshold`	`float, default: 1.0`	Used if coefficient_of_variation_value_thres method is used in feature selection. Filters out columns whose coefficient of variation is smaller than the threshold.	`1.0`
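
The coefficient of variation is the standard deviation divided by the mean. The sketch below is a standalone illustration, not the federatedml implementation: using the population standard deviation and treating a zero mean as infinite variation are both assumptions made for the example.

```python
import statistics
from typing import Dict, List


def coefficient_of_variation(values: List[float]) -> float:
    """Absolute value of std / mean (population std is an assumption here)."""
    mean = statistics.mean(values)
    if mean == 0:
        # Assumption for this sketch: a zero mean counts as infinite variation.
        return float("inf")
    return abs(statistics.pstdev(values) / mean)


def variance_of_coe_filter(columns: Dict[str, List[float]],
                           value_threshold: float = 1.0) -> List[str]:
    """Keep columns whose coefficient of variation is at least the threshold."""
    return [
        name
        for name, values in columns.items()
        if coefficient_of_variation(values) >= value_threshold
    ]


kept = variance_of_coe_filter(
    {"spread": [1.0, 2.0, 3.0, 4.0, 5.0], "flat": [10.0, 10.1, 9.9]},
    value_threshold=0.1,
)
print(kept)
```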

Source code in federatedml/param/feature_selection_param.py

class VarianceOfCoeSelectionParam(BaseParam):
    """
    Use coefficient of variation to select features. When judging, the absolute value will be used.

    Parameters
    ----------
    value_threshold: float, default: 1.0
        Used if coefficient_of_variation_value_thres method is used in feature selection. Filters out
        columns whose coefficient of variation is smaller than the threshold.

    """

    def __init__(self, value_threshold=1.0):
        self.value_threshold = value_threshold

    def check(self):
        descr = "Coff of Variances param's"
        self.check_positive_number(self.value_threshold, descr)
        return True

__init__(self, value_threshold=1.0) special ¶

Source code in federatedml/param/feature_selection_param.py

def __init__(self, value_threshold=1.0):
    self.value_threshold = value_threshold

check(self) ¶

Source code in federatedml/param/feature_selection_param.py

def check(self):
    descr = "Coff of Variances param's"
    self.check_positive_number(self.value_threshold, descr)
    return True


OutlierColsSelectionParam            (BaseParam)

¶

Given a percentile and a threshold, checks whether a column's value at that percentile is larger than the threshold. Columns for which it is larger are filtered out.

Parameters:

Name	Type	Description	Default
`percentile`	`float, [0., 1.] default: 1.0`	The percentile points to compare.	`1.0`
`upper_threshold`	`float, default: 1.0`	The threshold that the value at the given percentile is compared with; columns whose value exceeds it are filtered out.	`1.0`
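
A minimal sketch of the outlier rule on plain Python lists. `percentile_value` and `outlier_cols_filter` are hypothetical names, and the nearest-rank quantile used here is an assumption; the real component may estimate quantiles differently.

```python
import math
from typing import Dict, List


def percentile_value(values: List[float], percentile: float) -> float:
    """Nearest-rank quantile (an assumption made for this sketch)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(percentile * len(ordered)))
    return ordered[rank - 1]


def outlier_cols_filter(columns: Dict[str, List[float]],
                        percentile: float = 1.0,
                        upper_threshold: float = 1.0) -> List[str]:
    """Keep columns whose value at `percentile` does not exceed the threshold."""
    return [
        name
        for name, values in columns.items()
        if percentile_value(values, percentile) <= upper_threshold
    ]


data = {"bounded": [0.1, 0.4, 0.9], "spiky": [0.2, 0.3, 50.0]}
kept_at_max = outlier_cols_filter(data, percentile=1.0, upper_threshold=1.0)
kept_at_median = outlier_cols_filter(data, percentile=0.5, upper_threshold=1.0)
print(kept_at_max, kept_at_median)
```

With `percentile=1.0` the maximum is compared, so the column containing 50.0 is dropped; with `percentile=0.5` only the median is compared and both columns are kept.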

Source code in federatedml/param/feature_selection_param.py

class OutlierColsSelectionParam(BaseParam):
    """
    Given a percentile and a threshold, checks whether a column's value at that percentile is larger than the threshold. Columns for which it is larger are filtered out.

    Parameters
    ----------
    percentile: float, [0., 1.] default: 1.0
        The percentile points to compare.

    upper_threshold: float, default: 1.0
        The threshold that the value at the given percentile is compared with; columns whose value exceeds it are filtered out.

    """

    def __init__(self, percentile=1.0, upper_threshold=1.0):
        self.percentile = percentile
        self.upper_threshold = upper_threshold

    def check(self):
        descr = "Outlier Filter param's"
        self.check_decimal_float(self.percentile, descr)
        self.check_defined_type(self.upper_threshold, descr, ['float', 'int'])
        return True

__init__(self, percentile=1.0, upper_threshold=1.0) special ¶

Source code in federatedml/param/feature_selection_param.py

def __init__(self, percentile=1.0, upper_threshold=1.0):
    self.percentile = percentile
    self.upper_threshold = upper_threshold

check(self) ¶

Source code in federatedml/param/feature_selection_param.py

def check(self):
    descr = "Outlier Filter param's"
    self.check_decimal_float(self.percentile, descr)
    self.check_defined_type(self.upper_threshold, descr, ['float', 'int'])
    return True


CommonFilterParam            (BaseParam)

¶

All of the following parameters can be set to a single value or to a list of values.

A single value means that only one metric is used for filtering, while a list means that multiple metrics are used.

Note that if any of the following parameters is set as a list, metrics must also be a list, and every list must have the same length as metrics. Otherwise, an error is raised.

Parameters:

Name	Type	Description	Default
`metrics`	`str or list, default: depends on the specific filter`	Indicates which metrics are used in this filter.	required
`filter_type`	`str, default: threshold`	Should be one of "threshold", "top_k" or "top_percentile".	`'threshold'`
`take_high`	`bool, default: True`	Whether to keep the highest values when filtering.	`True`
`threshold`	`float or int, default: 1`	If filter type is "threshold", this is the threshold value. If it is "top_k", this is the k value. If it is "top_percentile", this is the percentile threshold.	`1`
`host_thresholds`	`List of float or List of List of float or None, default: None`	Thresholds for the individual hosts. If None, every host uses the same threshold as the guest. If provided, the order must match the order of the host ids.	`None`
`select_federated`	`bool, default: True`	Whether to select features jointly with the other parties or based on local variables only.	`True`
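
The single-value versus list rule can be illustrated with a standalone sketch that mirrors the normalisation performed by `_convert_to_list`. `broadcast_filter_params` is a hypothetical function, and `"metric_b"` is a placeholder metric name used only for the example.

```python
from typing import Any, Dict, List, Tuple, Union


def broadcast_filter_params(
    metrics: Union[str, List[str]], **params: Any
) -> Tuple[List[str], Dict[str, List[Any]]]:
    """Normalise metrics and every other parameter to lists of equal length."""
    if not isinstance(metrics, list):
        # Single metric: no other parameter may be a list.
        for name, value in params.items():
            if isinstance(value, list):
                raise ValueError(f"{name} should not be a list when metrics is not a list")
        return [metrics], {name: [value] for name, value in params.items()}

    expected_length = len(metrics)
    normalised = {}
    for name, value in params.items():
        if isinstance(value, list):
            if len(value) != expected_length:
                raise ValueError(f"{name} should have the same length as metrics")
            normalised[name] = value
        else:
            # A scalar is repeated once per metric.
            normalised[name] = [value] * expected_length
    return metrics, normalised


metrics, params = broadcast_filter_params(
    ["iv", "metric_b"], filter_type="threshold", threshold=[0.1, 0.5]
)
print(metrics, params)
```

The scalar `filter_type` is repeated for both metrics, while the `threshold` list is used as given because its length already matches.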

Source code in federatedml/param/feature_selection_param.py

class CommonFilterParam(BaseParam):
    """
    All of the following parameters can be set to a single value or to a list of values.
    A single value means that only one metric is used for filtering, while
    a list means that multiple metrics are used.

    Note that if any of the following parameters is set as a list, metrics must also
    be a list, and every list must have the same length as metrics. Otherwise, an
    error is raised.

    Parameters
    ----------
    metrics: str or list, default: depends on the specific filter
        Indicate what metrics are used in this filter

    filter_type: str, default: threshold
        Should be one of "threshold", "top_k" or "top_percentile"

    take_high: bool, default: True
        When filtering, taking highest values or not.

    threshold: float or int, default: 1
        If filter type is threshold, this is the threshold value.
        If it is "top_k", this is the k value.
        If it is top_percentile, this is the percentile threshold.

    host_thresholds: List of float or List of List of float or None, default: None
        Set threshold for different host. If None, use same threshold as guest. If provided, the order should map with
        the host id setting.

    select_federated: bool, default: True
        Whether select federated with other parties or based on local variables
    """

    def __init__(self, metrics, filter_type='threshold', take_high=True, threshold=1,
                 host_thresholds=None, select_federated=True):
        super().__init__()
        self.metrics = metrics
        self.filter_type = filter_type
        self.take_high = take_high
        self.threshold = threshold
        self.host_thresholds = host_thresholds
        self.select_federated = select_federated

    def check(self):
        self._convert_to_list(param_names=["filter_type", "take_high",
                                           "threshold", "select_federated"])

        for v in self.filter_type:
            if v not in ["threshold", "top_k", "top_percentile"]:
                raise ValueError('filter_type should be one of '
                                 '"threshold", "top_k", "top_percentile"')

        descr = "hetero feature selection param's"
        for v in self.take_high:
            self.check_boolean(v, descr)

        for idx, v in enumerate(self.threshold):
            if self.filter_type[idx] == "threshold":
                if not isinstance(v, (float, int)):
                    raise ValueError(descr + f"{v} should be a float or int")
            elif self.filter_type[idx] == 'top_k':
                self.check_positive_integer(v, descr)
            else:
                if not (v == 0 or v == 1):
                    self.check_decimal_float(v, descr)

        if self.host_thresholds is not None:
            if not isinstance(self.host_thresholds, list):
                raise ValueError("IV selection param's host_threshold should be list or None")

        assert isinstance(self.select_federated, list)
        for v in self.select_federated:
            self.check_boolean(v, descr)

    def _convert_to_list(self, param_names):
        if not isinstance(self.metrics, list):
            for value_name in param_names:
                v = getattr(self, value_name)
                if isinstance(v, list):
                    raise ValueError(f"{value_name}: {v} should not be a list when "
                                     f"metrics: {self.metrics} is not a list")
                setattr(self, value_name, [v])
            setattr(self, "metrics", [self.metrics])
        else:
            expected_length = len(self.metrics)
            for value_name in param_names:
                v = getattr(self, value_name)
                if isinstance(v, list):
                    if len(v) != expected_length:
                        raise ValueError(f"The parameter {v} should have same length "
                                         f"with metrics")
                else:
                    new_v = [v] * expected_length
                    setattr(self, value_name, new_v)

__init__(self, metrics, filter_type='threshold', take_high=True, threshold=1, host_thresholds=None, select_federated=True)

special ¶

Source code in federatedml/param/feature_selection_param.py

def __init__(self, metrics, filter_type='threshold', take_high=True, threshold=1,
             host_thresholds=None, select_federated=True):
    super().__init__()
    self.metrics = metrics
    self.filter_type = filter_type
    self.take_high = take_high
    self.threshold = threshold
    self.host_thresholds = host_thresholds
    self.select_federated = select_federated

check(self) ¶

Source code in federatedml/param/feature_selection_param.py

def check(self):
    self._convert_to_list(param_names=["filter_type", "take_high",
                                       "threshold", "select_federated"])

    for v in self.filter_type:
        if v not in ["threshold", "top_k", "top_percentile"]:
            raise ValueError('filter_type should be one of '
                             '"threshold", "top_k", "top_percentile"')

    descr = "hetero feature selection param's"
    for v in self.take_high:
        self.check_boolean(v, descr)

    for idx, v in enumerate(self.threshold):
        if self.filter_type[idx] == "threshold":
            if not isinstance(v, (float, int)):
                raise ValueError(descr + f"{v} should be a float or int")
        elif self.filter_type[idx] == 'top_k':
            self.check_positive_integer(v, descr)
        else:
            if not (v == 0 or v == 1):
                self.check_decimal_float(v, descr)

    if self.host_thresholds is not None:
        if not isinstance(self.host_thresholds, list):
            raise ValueError("IV selection param's host_threshold should be list or None")

    assert isinstance(self.select_federated, list)
    for v in self.select_federated:
        self.check_boolean(v, descr)


IVFilterParam            (CommonFilterParam)

¶

Parameters:

Name	Type	Description	Default
`mul_class_merge_type`	`str or list, default: "average"`	Indicates how to merge multi-class IV results. Supports "average", "min" and "max".	`'average'`
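
A minimal sketch of the three merge types, assuming a feature's per-class IV values are available as a plain list. `merge_multi_class_iv` is a hypothetical function and not the federatedml implementation.

```python
from typing import List


def merge_multi_class_iv(per_class_iv: List[float], merge_type: str = "average") -> float:
    """Collapse one feature's per-class IV values into a single number."""
    if merge_type == "average":
        return sum(per_class_iv) / len(per_class_iv)
    if merge_type == "min":
        return min(per_class_iv)
    if merge_type == "max":
        return max(per_class_iv)
    raise ValueError('merge_type should be one of "average", "min", "max"')


ivs = [0.2, 0.4, 0.9]
print(merge_multi_class_iv(ivs, "average"))
print(merge_multi_class_iv(ivs, "min"))
print(merge_multi_class_iv(ivs, "max"))
```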

Source code in federatedml/param/feature_selection_param.py

class IVFilterParam(CommonFilterParam):
    """
    Parameters
    ----------
    mul_class_merge_type: str or list, default: "average"
        Indicate how to merge multi-class iv results. Support "average", "min" and "max".

    """

    def __init__(self, filter_type='threshold', threshold=1,
                 host_thresholds=None, select_federated=True, mul_class_merge_type="average"):
        super().__init__(metrics='iv', filter_type=filter_type, take_high=True, threshold=threshold,
                         host_thresholds=host_thresholds, select_federated=select_federated)
        self.mul_class_merge_type = mul_class_merge_type

    def check(self):
        super(IVFilterParam, self).check()
        self._convert_to_list(param_names=["mul_class_merge_type"])

__init__(self, filter_type='threshold', threshold=1, host_thresholds=None, select_federated=True, mul_class_merge_type='average')

special ¶

Source code in federatedml/param/feature_selection_param.py

def __init__(self, filter_type='threshold', threshold=1,
             host_thresholds=None, select_federated=True, mul_class_merge_type="average"):
    super().__init__(metrics='iv', filter_type=filter_type, take_high=True, threshold=threshold,
                     host_thresholds=host_thresholds, select_federated=select_federated)
    self.mul_class_merge_type = mul_class_merge_type

check(self) ¶

Source code in federatedml/param/feature_selection_param.py

def check(self):
    super(IVFilterParam, self).check()
    self._convert_to_list(param_names=["mul_class_merge_type"])


CorrelationFilterParam            (BaseParam)

¶

This filter follows these rules:

1. Sort all the columns from high to low based on a specific metric, e.g. iv.
2. Traverse each sorted column. If other columns exist whose absolute correlation with it is larger than the threshold, those columns are filtered out.

Parameters:

Name	Type	Description	Default
`sort_metric`	`str, default: iv`	Specify which metric to be used to sort features.	`'iv'`
`threshold`	`float or int, default: 0.1`	Correlation threshold	`0.1`
`select_federated`	`bool, default: True`	Whether to select in a federated manner with other parties or based on local variables only	`True`

Source code in federatedml/param/feature_selection_param.py

class CorrelationFilterParam(BaseParam):
    """
    This filter follows these rules:
        1. Sort all the columns from high to low based on a specific metric, e.g. iv.
        2. Traverse each sorted column. If there exist other columns whose absolute
            correlation with it is larger than the threshold, those columns are filtered out.

    Parameters
    ----------
    sort_metric: str, default: iv
        Specify which metric to be used to sort features.

    threshold: float or int, default: 0.1
        Correlation threshold

    select_federated: bool, default: True
        Whether select federated with other parties or based on local variables
    """

    def __init__(self, sort_metric='iv', threshold=0.1, select_federated=True):
        super().__init__()
        self.sort_metric = sort_metric
        self.threshold = threshold
        self.select_federated = select_federated

    def check(self):
        descr = "Correlation Filter param's"

        self.sort_metric = self.sort_metric.lower()
        support_metrics = ['iv']
        if self.sort_metric not in support_metrics:
            raise ValueError(f"sort_metric in Correlation Filter should be one of {support_metrics}")

        self.check_positive_number(self.threshold, descr)

__init__(self, sort_metric='iv', threshold=0.1, select_federated=True) special ¶

Source code in federatedml/param/feature_selection_param.py

def __init__(self, sort_metric='iv', threshold=0.1, select_federated=True):
    super().__init__()
    self.sort_metric = sort_metric
    self.threshold = threshold
    self.select_federated = select_federated

check(self) ¶

Source code in federatedml/param/feature_selection_param.py

def check(self):
    descr = "Correlation Filter param's"

    self.sort_metric = self.sort_metric.lower()
    support_metrics = ['iv']
    if self.sort_metric not in support_metrics:
        raise ValueError(f"sort_metric in Correlation Filter should be one of {support_metrics}")

    self.check_positive_number(self.threshold, descr)


PercentageValueParam            (BaseParam)

¶

Filter the columns that have a value that exceeds a certain percentage.

Parameters:

Name	Type	Description	Default
`upper_pct`	`float, [0.1, 1.], default: 1.0`	The upper percentage threshold for filtering, upper_pct should not be less than 0.1.	`1.0`
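
One plausible reading of this filter is sketched below: a column is dropped when its most frequent value accounts for more than `upper_pct` of the rows. The helper name `percentage_value_filter`, the dict-of-lists input, and the strict comparison are assumptions for illustration only.

```python
from collections import Counter


def percentage_value_filter(columns, upper_pct=1.0):
    """Return names of columns whose dominant value share does not exceed upper_pct."""
    kept = []
    for name, values in columns.items():
        if not values:
            # An empty column has no dominant value; keep it in this sketch.
            kept.append(name)
            continue
        top_count = Counter(values).most_common(1)[0][1]
        if top_count / len(values) > upper_pct:
            continue  # a single value dominates the column: filter it out
        kept.append(name)
    return kept


data = {
    "x": [1, 1, 1, 2],  # value 1 covers 75% of the rows
    "y": [1, 2, 3, 4],  # each value covers 25% of the rows
}
kept_columns = percentage_value_filter(data, upper_pct=0.7)
```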

Source code in federatedml/param/feature_selection_param.py

class PercentageValueParam(BaseParam):
    """
    Filter the columns that have a value that exceeds a certain percentage.

    Parameters
    ----------
    upper_pct: float, [0.1, 1.], default: 1.0
        The upper percentage threshold for filtering, upper_pct should not be less than 0.1.

    """

    def __init__(self, upper_pct=1.0):
        super().__init__()
        self.upper_pct = upper_pct

    def check(self):
        descr = "Percentage Filter param's"
        if self.upper_pct not in [0, 1]:
            self.check_decimal_float(self.upper_pct, descr)
        if self.upper_pct < consts.PERCENTAGE_VALUE_LIMIT:
            raise ValueError(descr + f" {self.upper_pct} not supported,"
                                     f" should not be smaller than {consts.PERCENTAGE_VALUE_LIMIT}")
        return True

__init__(self, upper_pct=1.0) special ¶

Source code in federatedml/param/feature_selection_param.py

def __init__(self, upper_pct=1.0):
    super().__init__()
    self.upper_pct = upper_pct

check(self) ¶

Source code in federatedml/param/feature_selection_param.py

def check(self):
    descr = "Percentage Filter param's"
    if self.upper_pct not in [0, 1]:
        self.check_decimal_float(self.upper_pct, descr)
    if self.upper_pct < consts.PERCENTAGE_VALUE_LIMIT:
        raise ValueError(descr + f" {self.upper_pct} not supported,"
                                 f" should not be smaller than {consts.PERCENTAGE_VALUE_LIMIT}")
    return True


ManuallyFilterParam            (BaseParam)

¶

Specify columns that need to be filtered out. A listed column that exists is filtered directly; one that does not exist is ignored.

The filter_out and left parameters only apply to this specific filter. For instance, if you set some columns to be left in this filter but those columns are filtered out by other filters, they will NOT be left in the final result.

Please note that (left_col_indexes & left_col_names) cannot be used together with (filter_out_indexes & filter_out_names).

Parameters:

Name	Type	Description	Default
`filter_out_indexes`	`list of int, default: None`	Specify columns' indexes to be filtered out	`None`
`filter_out_names`	`list of string, default: None`	Specify columns' names to be filtered out	`None`
`left_col_indexes`	`list of int, default: None`	Specify the indexes of the columns to be left (kept)	`None`
`left_col_names`	`list of string, default: None`	Specify the names of the columns to be left (kept)	`None`
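
The behaviour described above can be sketched on a plain list of column names. The helper `manually_filter` is hypothetical and handles names only; it illustrates the mutual exclusivity of the two parameter groups and the fact that unknown names are ignored.

```python
def manually_filter(header, filter_out_names=None, left_col_names=None):
    """Apply a manual filter to a list of column names (hypothetical helper)."""
    if filter_out_names is not None and left_col_names is not None:
        raise ValueError("left_col_names cannot be used with filter_out_names simultaneously")
    if left_col_names is not None:
        # Keep only the requested columns; names missing from the header are ignored.
        wanted = set(left_col_names)
        return [col for col in header if col in wanted]
    # Drop the requested columns; names missing from the header are ignored.
    unwanted = set(filter_out_names or [])
    return [col for col in header if col not in unwanted]


header = ["x0", "x1", "x2", "x3"]
after_drop = manually_filter(header, filter_out_names=["x1", "not_a_column"])
after_keep = manually_filter(header, left_col_names=["x0", "x3"])
```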

Source code in federatedml/param/feature_selection_param.py

class ManuallyFilterParam(BaseParam):
    """
    Specified columns that need to be filtered. If exist, it will be filtered directly, otherwise, ignore it.

    Both Filter_out or left parameters only works for this specific filter. For instances, if you set some columns left
    in this filter but those columns are filtered by other filters, those columns will NOT left in final.

    Please note that (left_col_indexes & left_col_names) cannot use with (filter_out_indexes & filter_out_names) simultaneously.

    Parameters
    ----------
    filter_out_indexes: list of int, default: None
        Specify columns' indexes to be filtered out

    filter_out_names : list of string, default: None
        Specify columns' names to be filtered out

    left_col_indexes: list of int, default: None
        Specify left_col_index

    left_col_names: list of string, default: None
        Specify left col names


    """

    def __init__(self, filter_out_indexes=None, filter_out_names=None, left_col_indexes=None,
                 left_col_names=None):
        super().__init__()
        self.filter_out_indexes = filter_out_indexes
        self.filter_out_names = filter_out_names
        self.left_col_indexes = left_col_indexes
        self.left_col_names = left_col_names

    def check(self):
        descr = "Manually Filter param's"
        self.check_defined_type(self.filter_out_indexes, descr, ['list', 'NoneType'])
        self.check_defined_type(self.filter_out_names, descr, ['list', 'NoneType'])
        self.check_defined_type(self.left_col_indexes, descr, ['list', 'NoneType'])
        self.check_defined_type(self.left_col_names, descr, ['list', 'NoneType'])

        if (self.filter_out_indexes or self.filter_out_names) is not None and \
                (self.left_col_names or self.left_col_indexes) is not None:
            raise ValueError("(left_col_indexes & left_col_names) cannot use with"
                             " (filter_out_indexes & filter_out_names) simultaneously")
        return True

__init__(self, filter_out_indexes=None, filter_out_names=None, left_col_indexes=None, left_col_names=None) special ¶

Source code in federatedml/param/feature_selection_param.py

def __init__(self, filter_out_indexes=None, filter_out_names=None, left_col_indexes=None,
             left_col_names=None):
    super().__init__()
    self.filter_out_indexes = filter_out_indexes
    self.filter_out_names = filter_out_names
    self.left_col_indexes = left_col_indexes
    self.left_col_names = left_col_names

check(self) ¶

Source code in federatedml/param/feature_selection_param.py

def check(self):
    descr = "Manually Filter param's"
    self.check_defined_type(self.filter_out_indexes, descr, ['list', 'NoneType'])
    self.check_defined_type(self.filter_out_names, descr, ['list', 'NoneType'])
    self.check_defined_type(self.left_col_indexes, descr, ['list', 'NoneType'])
    self.check_defined_type(self.left_col_names, descr, ['list', 'NoneType'])

    if (self.filter_out_indexes or self.filter_out_names) is not None and \
            (self.left_col_names or self.left_col_indexes) is not None:
        raise ValueError("(left_col_indexes & left_col_names) cannot use with"
                         " (filter_out_indexes & filter_out_names) simultaneously")
    return True


FeatureSelectionParam            (BaseParam)

¶

Define the feature selection parameters.

Parameters:

Name	Type	Description	Default
`select_col_indexes`	`list or int, default: -1`	Specify which columns need to be calculated. -1 represents all columns.	`-1`
`select_names`	`list of string, default: []`	Specify which columns need to be calculated. Each element in the list represents a column name in the header.	`None`
`filter_methods`	`list of ["manually", "iv_filter", "statistic_filter", "psi_filter", "hetero_sbt_filter", "homo_sbt_filter", "hetero_fast_sbt_filter", "percentage_value", "vif_filter", "correlation_filter"], default: ["manually"]`	Specify the filter methods used in feature selection. Filters are applied in the order given in this list. Note that if a percentile method is used after another filter method, the percentile is the ratio of the remaining features. E.g., if you have 10 features at the beginning and 8 remain after the first filter, then asking for the top 80% highest-iv features selects floor(0.8 * 8) = 6 features instead of 8. The following methods will be deprecated in a future version: "unique_value", "iv_value_thres", "iv_percentile", "coefficient_of_variation_value_thres", "outlier_cols".	`None`
`unique_param`	`UniqueValueParam`	Filter the columns if all values in this feature are the same.	`UniqueValueParam()`
`iv_value_param`	`IVValueSelectionParam`	Use information value to filter columns. If this method is set, a float threshold needs to be provided. Columns whose iv is smaller than the threshold are filtered out. Will be deprecated in the future.	`IVValueSelectionParam()`
`iv_percentile_param`	`IVPercentileSelectionParam`	Use information value to filter columns. If this method is set, a float ratio threshold needs to be provided. Pick floor(ratio * feature_num) features with higher iv. If multiple features around the threshold have the same iv, all of those columns will be kept. Will be deprecated in the future.	`IVPercentileSelectionParam()`
`variance_coe_param`	`VarianceOfCoeSelectionParam`	Use coefficient of variation to judge whether a column is filtered or not. Will be deprecated in the future.	`VarianceOfCoeSelectionParam()`
`outlier_param`	`OutlierColsSelectionParam`	Filter columns whose certain percentile value is larger than a threshold. Will be deprecated in the future.	`OutlierColsSelectionParam()`
`percentage_value_param`	`PercentageValueParam`	Filter the columns that have a value that exceeds a certain percentage.	`PercentageValueParam()`
`iv_param`	`IVFilterParam`	Set how to filter based on iv. It supports take-high mode only. All of "threshold", "top_k" and "top_percentile" are accepted. Check more details in CommonFilterParam. To use this filter, the hetero-feature-binning module has to be provided.	`IVFilterParam()`
`statistic_param`	`CommonFilterParam`	Set how to filter based on statistic values. All of "threshold", "top_k" and "top_percentile" are accepted. Check more details in CommonFilterParam. To use this filter, the data_statistic module has to be provided.	`CommonFilterParam(metrics=consts.MEAN)`
`psi_param`	`CommonFilterParam`	Set how to filter based on psi values. All of "threshold", "top_k" and "top_percentile" are accepted. Its take_high property should be False to choose lower psi features. Check more details in CommonFilterParam. To use this filter, the data_statistic module has to be provided.	`CommonFilterParam(metrics=consts.PSI, take_high=False)`
`need_run`	`bool, default True`	Indicate whether this module needs to be run	`True`
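
The percentile note for `filter_methods` can be checked with a small worked example. This is only a sketch: the helper `keep_top_percentile` and the feature names are hypothetical, and ties around the cut-off are not handled here.

```python
import math


def keep_top_percentile(scores, ratio):
    """Keep the floor(ratio * n) highest-scoring features among those still remaining."""
    k = math.floor(ratio * len(scores))
    ranked = sorted(scores, key=scores.get, reverse=True)
    return {name: scores[name] for name in ranked[:k]}


# Start with 10 features and made-up iv values.
iv = {f"x{i}": 0.1 * (i + 1) for i in range(10)}

# Pretend the first filter in the list removed two features, leaving 8.
after_first = {name: value for name, value in iv.items() if name not in {"x0", "x1"}}

# The percentile applies to the 8 remaining features: floor(0.8 * 8) = 6.
after_second = keep_top_percentile(after_first, 0.8)
```

Applying the same ratio to the original 10 features would keep 8, which is why the order of entries in `filter_methods` matters.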

Source code in federatedml/param/feature_selection_param.py

class FeatureSelectionParam(BaseParam):
    """
    Define the feature selection parameters.

    Parameters
    ----------
    select_col_indexes: list or int, default: -1
        Specify which columns need to calculated. -1 represent for all columns.

    select_names : list of string, default: []
        Specify which columns need to calculated. Each element in the list represent for a column name in header.

    filter_methods: list of ["manually", "iv_filter", "statistic_filter", "psi_filter", "hetero_sbt_filter", "homo_sbt_filter", "hetero_fast_sbt_filter", "percentage_value", "vif_filter", "correlation_filter"], default: ["manually"]
        The following methods will be deprecated in future version:
        "unique_value", "iv_value_thres", "iv_percentile",
        "coefficient_of_variation_value_thres", "outlier_cols"

        Specify the filter methods used in feature selection. The orders of filter used is depended on this list.
        Please be notified that, if a percentile method is used after some certain filter method,
        the percentile represent for the ratio of rest features.

        e.g. If you have 10 features at the beginning. After first filter method, you have 8 rest. Then, you want
        top 80% highest iv feature. Here, we will choose floor(0.8 * 8) = 6 features instead of 8.

    unique_param: UniqueValueParam
        filter the columns if all values in this feature is the same

    iv_value_param: IVValueSelectionParam
        Use information value to filter columns. If this method is set, a float threshold need to be provided.
        Filter those columns whose iv is smaller than threshold. Will be deprecated in the future.

    iv_percentile_param: IVPercentileSelectionParam
        Use information value to filter columns. If this method is set, a float ratio threshold
        need to be provided. Pick floor(ratio * feature_num) features with higher iv. If multiple features around
        the threshold are same, all those columns will be keep. Will be deprecated in the future.

    variance_coe_param: VarianceOfCoeSelectionParam
        Use coefficient of variation to judge whether filtered or not.
        Will be deprecated in the future.

    outlier_param: OutlierColsSelectionParam
        Filter columns whose certain percentile value is larger than a threshold.
        Will be deprecated in the future.

    percentage_value_param: PercentageValueParam
        Filter the columns that have a value that exceeds a certain percentage.

    iv_param: IVFilterParam
        Setting how to filter base on iv. It support take high mode only. All of "threshold",
        "top_k" and "top_percentile" are accepted. Check more details in CommonFilterParam. To
        use this filter, hetero-feature-binning module has to be provided.

    statistic_param: CommonFilterParam
        Setting how to filter base on statistic values. All of "threshold",
        "top_k" and "top_percentile" are accepted. Check more details in CommonFilterParam.
        To use this filter, data_statistic module has to be provided.

    psi_param: CommonFilterParam
        Setting how to filter base on psi values. All of "threshold",
        "top_k" and "top_percentile" are accepted. Its take_high properties should be False
        to choose lower psi features. Check more details in CommonFilterParam.
        To use this filter, data_statistic module has to be provided.

    need_run: bool, default True
        Indicate if this module needed to be run

    """

    def __init__(self, select_col_indexes=-1, select_names=None, filter_methods=None,
                 unique_param=UniqueValueParam(),
                 iv_value_param=IVValueSelectionParam(),
                 iv_percentile_param=IVPercentileSelectionParam(),
                 iv_top_k_param=IVTopKParam(),
                 variance_coe_param=VarianceOfCoeSelectionParam(),
                 outlier_param=OutlierColsSelectionParam(),
                 manually_param=ManuallyFilterParam(),
                 percentage_value_param=PercentageValueParam(),
                 iv_param=IVFilterParam(),
                 statistic_param=CommonFilterParam(metrics=consts.MEAN),
                 psi_param=CommonFilterParam(metrics=consts.PSI,
                                             take_high=False),
                 vif_param=CommonFilterParam(metrics=consts.VIF,
                                             threshold=5.0,
                                             take_high=False),
                 sbt_param=CommonFilterParam(metrics=consts.FEATURE_IMPORTANCE),
                 correlation_param=CorrelationFilterParam(),
                 need_run=True
                 ):
        super(FeatureSelectionParam, self).__init__()
        self.correlation_param = correlation_param
        self.vif_param = vif_param
        self.select_col_indexes = select_col_indexes
        if select_names is None:
            self.select_names = []
        else:
            self.select_names = select_names
        if filter_methods is None:
            self.filter_methods = [consts.MANUALLY_FILTER]
        else:
            self.filter_methods = filter_methods

        # deprecate in the future
        self.unique_param = copy.deepcopy(unique_param)
        self.iv_value_param = copy.deepcopy(iv_value_param)
        self.iv_percentile_param = copy.deepcopy(iv_percentile_param)
        self.iv_top_k_param = copy.deepcopy(iv_top_k_param)
        self.variance_coe_param = copy.deepcopy(variance_coe_param)
        self.outlier_param = copy.deepcopy(outlier_param)
        self.percentage_value_param = copy.deepcopy(percentage_value_param)

        self.manually_param = copy.deepcopy(manually_param)
        self.iv_param = copy.deepcopy(iv_param)
        self.statistic_param = copy.deepcopy(statistic_param)
        self.psi_param = copy.deepcopy(psi_param)
        self.sbt_param = copy.deepcopy(sbt_param)
        self.need_run = need_run

    def check(self):
        descr = "hetero feature selection param's"

        self.check_defined_type(self.filter_methods, descr, ['list'])

        for idx, method in enumerate(self.filter_methods):
            method = method.lower()
            self.check_valid_value(method, descr, [consts.UNIQUE_VALUE, consts.IV_VALUE_THRES, consts.IV_PERCENTILE,
                                                   consts.COEFFICIENT_OF_VARIATION_VALUE_THRES, consts.OUTLIER_COLS,
                                                   consts.MANUALLY_FILTER, consts.PERCENTAGE_VALUE,
                                                   consts.IV_FILTER, consts.STATISTIC_FILTER, consts.IV_TOP_K,
                                                   consts.PSI_FILTER, consts.HETERO_SBT_FILTER,
                                                   consts.HOMO_SBT_FILTER, consts.HETERO_FAST_SBT_FILTER,
                                                   consts.VIF_FILTER, consts.CORRELATION_FILTER])

            self.filter_methods[idx] = method

        self.check_defined_type(self.select_col_indexes, descr, ['list', 'int'])

        self.unique_param.check()
        self.iv_value_param.check()
        self.iv_percentile_param.check()
        self.iv_top_k_param.check()
        self.variance_coe_param.check()
        self.outlier_param.check()
        self.manually_param.check()
        self.percentage_value_param.check()

        self.iv_param.check()
        for th in self.iv_param.take_high:
            if not th:
                raise ValueError("Iv filter should take higher iv features")
        for m in self.iv_param.metrics:
            if m != consts.IV:
                raise ValueError("For iv filter, metrics should be 'iv'")

        self.statistic_param.check()
        self.psi_param.check()
        for th in self.psi_param.take_high:
            if th:
                raise ValueError("PSI filter should take lower psi features")
        for m in self.psi_param.metrics:
            if m != consts.PSI:
                raise ValueError("For psi filter, metrics should be 'psi'")

        self.sbt_param.check()
        for th in self.sbt_param.take_high:
            if not th:
                raise ValueError("SBT filter should take higher feature_importance features")
        for m in self.sbt_param.metrics:
            if m != consts.FEATURE_IMPORTANCE:
                raise ValueError("For SBT filter, metrics should be 'feature_importance'")

        self.vif_param.check()
        for m in self.vif_param.metrics:
            if m != consts.VIF:
                raise ValueError("For VIF filter, metrics should be 'vif'")

        self.correlation_param.check()

        self._warn_to_deprecate_param("iv_value_param", descr, "iv_param")
        self._warn_to_deprecate_param("iv_percentile_param", descr, "iv_param")
        self._warn_to_deprecate_param("iv_top_k_param", descr, "iv_param")
        self._warn_to_deprecate_param("variance_coe_param", descr, "statistic_param")
        self._warn_to_deprecate_param("unique_param", descr, "statistic_param")
        self._warn_to_deprecate_param("outlier_param", descr, "statistic_param")

__init__(self, select_col_indexes=-1, select_names=None, filter_methods=None, unique_param=UniqueValueParam(), iv_value_param=IVValueSelectionParam(), iv_percentile_param=IVPercentileSelectionParam(), iv_top_k_param=IVTopKParam(), variance_coe_param=VarianceOfCoeSelectionParam(), outlier_param=OutlierColsSelectionParam(), manually_param=ManuallyFilterParam(), percentage_value_param=PercentageValueParam(), iv_param=IVFilterParam(), statistic_param=CommonFilterParam(metrics=consts.MEAN), psi_param=CommonFilterParam(metrics=consts.PSI, take_high=False), vif_param=CommonFilterParam(metrics=consts.VIF, threshold=5.0, take_high=False), sbt_param=CommonFilterParam(metrics=consts.FEATURE_IMPORTANCE), correlation_param=CorrelationFilterParam(), need_run=True) special ¶

Source code in federatedml/param/feature_selection_param.py

def __init__(self, select_col_indexes=-1, select_names=None, filter_methods=None,
             unique_param=UniqueValueParam(),
             iv_value_param=IVValueSelectionParam(),
             iv_percentile_param=IVPercentileSelectionParam(),
             iv_top_k_param=IVTopKParam(),
             variance_coe_param=VarianceOfCoeSelectionParam(),
             outlier_param=OutlierColsSelectionParam(),
             manually_param=ManuallyFilterParam(),
             percentage_value_param=PercentageValueParam(),
             iv_param=IVFilterParam(),
             statistic_param=CommonFilterParam(metrics=consts.MEAN),
             psi_param=CommonFilterParam(metrics=consts.PSI,
                                         take_high=False),
             vif_param=CommonFilterParam(metrics=consts.VIF,
                                         threshold=5.0,
                                         take_high=False),
             sbt_param=CommonFilterParam(metrics=consts.FEATURE_IMPORTANCE),
             correlation_param=CorrelationFilterParam(),
             need_run=True
             ):
    super(FeatureSelectionParam, self).__init__()
    self.correlation_param = correlation_param
    self.vif_param = vif_param
    self.select_col_indexes = select_col_indexes
    if select_names is None:
        self.select_names = []
    else:
        self.select_names = select_names
    if filter_methods is None:
        self.filter_methods = [consts.MANUALLY_FILTER]
    else:
        self.filter_methods = filter_methods

    # deprecate in the future
    self.unique_param = copy.deepcopy(unique_param)
    self.iv_value_param = copy.deepcopy(iv_value_param)
    self.iv_percentile_param = copy.deepcopy(iv_percentile_param)
    self.iv_top_k_param = copy.deepcopy(iv_top_k_param)
    self.variance_coe_param = copy.deepcopy(variance_coe_param)
    self.outlier_param = copy.deepcopy(outlier_param)
    self.percentage_value_param = copy.deepcopy(percentage_value_param)

    self.manually_param = copy.deepcopy(manually_param)
    self.iv_param = copy.deepcopy(iv_param)
    self.statistic_param = copy.deepcopy(statistic_param)
    self.psi_param = copy.deepcopy(psi_param)
    self.sbt_param = copy.deepcopy(sbt_param)
    self.need_run = need_run

check(self) ¶

Source code in federatedml/param/feature_selection_param.py

def check(self):
    descr = "hetero feature selection param's"

    self.check_defined_type(self.filter_methods, descr, ['list'])

    for idx, method in enumerate(self.filter_methods):
        method = method.lower()
        self.check_valid_value(method, descr, [consts.UNIQUE_VALUE, consts.IV_VALUE_THRES, consts.IV_PERCENTILE,
                                               consts.COEFFICIENT_OF_VARIATION_VALUE_THRES, consts.OUTLIER_COLS,
                                               consts.MANUALLY_FILTER, consts.PERCENTAGE_VALUE,
                                               consts.IV_FILTER, consts.STATISTIC_FILTER, consts.IV_TOP_K,
                                               consts.PSI_FILTER, consts.HETERO_SBT_FILTER,
                                               consts.HOMO_SBT_FILTER, consts.HETERO_FAST_SBT_FILTER,
                                               consts.VIF_FILTER, consts.CORRELATION_FILTER])

        self.filter_methods[idx] = method

    self.check_defined_type(self.select_col_indexes, descr, ['list', 'int'])

    self.unique_param.check()
    self.iv_value_param.check()
    self.iv_percentile_param.check()
    self.iv_top_k_param.check()
    self.variance_coe_param.check()
    self.outlier_param.check()
    self.manually_param.check()
    self.percentage_value_param.check()

    self.iv_param.check()
    for th in self.iv_param.take_high:
        if not th:
            raise ValueError("Iv filter should take higher iv features")
    for m in self.iv_param.metrics:
        if m != consts.IV:
            raise ValueError("For iv filter, metrics should be 'iv'")

    self.statistic_param.check()
    self.psi_param.check()
    for th in self.psi_param.take_high:
        if th:
            raise ValueError("PSI filter should take lower psi features")
    for m in self.psi_param.metrics:
        if m != consts.PSI:
            raise ValueError("For psi filter, metrics should be 'psi'")

    self.sbt_param.check()
    for th in self.sbt_param.take_high:
        if not th:
            raise ValueError("SBT filter should take higher feature_importance features")
    for m in self.sbt_param.metrics:
        if m != consts.FEATURE_IMPORTANCE:
            raise ValueError("For SBT filter, metrics should be 'feature_importance'")

    self.vif_param.check()
    for m in self.vif_param.metrics:
        if m != consts.VIF:
            raise ValueError("For VIF filter, metrics should be 'vif'")

    self.correlation_param.check()

    self._warn_to_deprecate_param("iv_value_param", descr, "iv_param")
    self._warn_to_deprecate_param("iv_percentile_param", descr, "iv_param")
    self._warn_to_deprecate_param("iv_top_k_param", descr, "iv_param")
    self._warn_to_deprecate_param("variance_coe_param", descr, "statistic_param")
    self._warn_to_deprecate_param("unique_param", descr, "statistic_param")
    self._warn_to_deprecate_param("outlier_param", descr, "statistic_param")

`feldman_verifiable_sum_param` ¶

Classes¶


FeldmanVerifiableSumParam            (BaseParam)

¶

Define how to transfer the cols

Parameters:

Name	Type	Description	Default
`sum_cols`	`list of column index, default: None`	Specify which columns need to be summed. If None, every column will be summed.	`None`
`q_n`	`int, positive integer less than or equal to 16, default: 6`	q_n is the number of significant decimal digits. If the data type is float, the maximum number of significant digits is 16. The sum of integer digits and significant decimal digits should be less than or equal to 16.	`6`
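
To make the role of `q_n` concrete, the sketch below treats it as a fixed-point precision: each float is scaled by 10 ** q_n and rounded to an integer before summation, then scaled back afterwards. Whether FATE encodes values in exactly this way is an assumption; the helpers `encode` and `decode` are hypothetical and only illustrate why integer plus decimal digits are bounded by float precision.

```python
def encode(value, q_n=6):
    """Scale a float to an integer, keeping q_n decimal digits (hypothetical helper)."""
    if not isinstance(q_n, int) or not 0 <= q_n <= 16:
        raise ValueError(f"q_n {q_n} not supported, should be an int in [0, 16]")
    return round(value * 10 ** q_n)


def decode(encoded, q_n=6):
    """Scale an encoded integer (or a sum of encoded integers) back to a float."""
    return encoded / 10 ** q_n


values = [1.234567, 2.5, 0.000001]
encoded_sum = sum(encode(v) for v in values)  # integer arithmetic, no float drift
total = decode(encoded_sum)
```

Digits beyond `q_n` decimals are lost in the encoding, so a larger `q_n` preserves more precision at the cost of leaving fewer digits for the integer part.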

Source code in federatedml/param/feldman_verifiable_sum_param.py

class FeldmanVerifiableSumParam(BaseParam):
    """
    Define how to transfer the cols

    Parameters
    ----------
    sum_cols : list of column index, default: None
        Specify which columns need to be sum. If column index is None, each of columns will be sum.

    q_n : int, positive integer less than or equal to 16, default: 6
        q_n is the number of significant decimal digit, If the data type is a float, 
        the maximum significant digit is 16. The sum of integer and significant decimal digits should 
        be less than or equal to 16.
    """
    def __init__(self, sum_cols=None, q_n=6):
        self.sum_cols = sum_cols
        if sum_cols is None:
            self.sum_cols = []

        self.q_n = q_n

    def check(self):
        if isinstance(self.sum_cols, list):
            for idx in self.sum_cols:
                if not isinstance(idx, int):
                    raise ValueError(f"type mismatch, column_indexes with element {idx}(type is {type(idx)})")

        if not isinstance(self.q_n, int):
            raise ValueError(f"Init param's q_n {self.q_n} not supported, should be int type, type is {type(self.q_n)}")

        if self.q_n < 0:
            raise ValueError(f"param's q_n {self.q_n} not supported, should be non-negative int value")
        elif self.q_n > 16:
            raise ValueError(f"param's q_n {self.q_n} not supported, should be less than or equal to 16")

__init__(self, sum_cols=None, q_n=6) special ¶

Source code in federatedml/param/feldman_verifiable_sum_param.py

def __init__(self, sum_cols=None, q_n=6):
    self.sum_cols = sum_cols
    if sum_cols is None:
        self.sum_cols = []

    self.q_n = q_n

check(self) ¶

Source code in federatedml/param/feldman_verifiable_sum_param.py

def check(self):
    if isinstance(self.sum_cols, list):
        for idx in self.sum_cols:
            if not isinstance(idx, int):
                raise ValueError(f"type mismatch, column_indexes with element {idx}(type is {type(idx)})")

    if not isinstance(self.q_n, int):
        raise ValueError(f"Init param's q_n {self.q_n} not supported, should be int type, type is {type(self.q_n)}")

    if self.q_n < 0:
        raise ValueError(f"param's q_n {self.q_n} not supported, should be non-negative int value")
    elif self.q_n > 16:
        raise ValueError(f"param's q_n {self.q_n} not supported, should be less than or equal to 16")

`ftl_param` ¶

deprecated_param_list ¶

Classes¶


FTLParam            (BaseParam)

¶

Source code in federatedml/param/ftl_param.py

class FTLParam(BaseParam):

    def __init__(self, alpha=1, tol=0.000001,
                 n_iter_no_change=False, validation_freqs=None, optimizer={'optimizer': 'Adam', 'learning_rate': 0.01},
                 nn_define={}, epochs=1
                 , intersect_param=IntersectParam(consts.RSA), config_type='keras', batch_size=-1,
                 encrypte_param=EncryptParam(),
                 encrypted_mode_calculator_param=EncryptedModeCalculatorParam(mode="confusion_opt"),
                 predict_param=PredictParam(), mode='plain', communication_efficient=False,
                 local_round=5, callback_param=CallbackParam()):
        """
        Parameters
        ----------
        alpha : float
            a loss coefficient defined in paper, it defines the importance of alignment loss
        tol : float
            loss tolerance
        n_iter_no_change : bool
            check loss convergence or not
        validation_freqs : None or positive integer or container object in python
            Do validation in training process or Not.
            if equals None, will not do validation in train process;
            if equals positive integer, will validate data every validation_freqs epochs passes;
            if container object in python, will validate data if epochs belong to this container.
            e.g. validation_freqs = [10, 15], will validate data when epoch equals to 10 and 15.
            The default value is None, 1 is suggested. You can set it to a number larger than 1 in order to
            speed up training by skipping validation rounds. When it is larger than 1, a number which is
            divisible by "epochs" is recommended, otherwise, you will miss the validation scores
            of last training epoch.
        optimizer : str or dict
            optimizer method, accept following types:
            1. a string, one of "Adadelta", "Adagrad", "Adam", "Adamax", "Nadam", "RMSprop", "SGD"
            2. a dict, with a required key-value pair keyed by "optimizer",
                with optional key-value pairs such as learning rate.
            defaults to {'optimizer': 'Adam', 'learning_rate': 0.01}
        nn_define : dict
            a dict represents the structure of neural network, it can be output by tf-keras
        epochs : int
            epochs num
        intersect_param
            define the intersect method
        config_type : {'keras'}
            config type
        batch_size : int
            batch size when computing transformed feature embedding, -1 use full data.
        encrypte_param
            encrypted param
        encrypted_mode_calculator_param
            encrypted mode calculator param:
        predict_param
            predict param
        mode: {"plain", "encrypted"}
            plain: will not use any encrypt algorithms, data exchanged in plaintext
            encrypted: use paillier to encrypt gradients
        communication_efficient: bool
            will use communication efficient or not. when communication efficient is enabled, FTL model will
            update gradients by several local rounds using intermediate data
        local_round: int
            local update round when using communication efficient
        """

        super(FTLParam, self).__init__()
        self.alpha = alpha
        self.tol = tol
        self.n_iter_no_change = n_iter_no_change
        self.validation_freqs = validation_freqs
        self.optimizer = optimizer
        self.nn_define = nn_define
        self.epochs = epochs
        self.intersect_param = copy.deepcopy(intersect_param)
        self.config_type = config_type
        self.batch_size = batch_size
        self.encrypted_mode_calculator_param = copy.deepcopy(encrypted_mode_calculator_param)
        self.encrypt_param = copy.deepcopy(encrypte_param)
        self.predict_param = copy.deepcopy(predict_param)
        self.mode = mode
        self.communication_efficient = communication_efficient
        self.local_round = local_round
        self.callback_param = copy.deepcopy(callback_param)

    def check(self):
        self.intersect_param.check()
        self.encrypt_param.check()
        self.encrypted_mode_calculator_param.check()

        self.optimizer = self._parse_optimizer(self.optimizer)

        supported_config_type = ["keras"]
        if self.config_type not in supported_config_type:
            raise ValueError(f"config_type should be one of {supported_config_type}")

        if not isinstance(self.tol, (int, float)):
            raise ValueError("tol should be numeric")

        if not isinstance(self.epochs, int) or self.epochs <= 0:
            raise ValueError("epochs should be a positive integer")

        if self.nn_define and not isinstance(self.nn_define, dict):
            raise ValueError("nn_define should be a dict defining the structure of neural network")

        if self.batch_size != -1:
            if not isinstance(self.batch_size, int) \
                    or self.batch_size < consts.MIN_BATCH_SIZE:
                raise ValueError(
                    "batch_size {} not supported, should be no less than {} or -1 (use all data)".format(
                        self.batch_size, consts.MIN_BATCH_SIZE))

        for p in deprecated_param_list:
            # if self._warn_to_deprecate_param(p, "", ""):
            if self._deprecated_params_set.get(p):
                if "callback_param" in self.get_user_feeded():
                    raise ValueError(f"{p} and callback param should not be set simultaneously, "
                                     f"{self._deprecated_params_set}, {self.get_user_feeded()}")
                else:
                    self.callback_param.callbacks = ["PerformanceEvaluate"]
                break

        descr = "ftl's"

        if self._warn_to_deprecate_param("validation_freqs", descr, "callback_param's 'validation_freqs'"):
            self.callback_param.validation_freqs = self.validation_freqs

        if self._warn_to_deprecate_param("metrics", descr, "callback_param's 'metrics'"):
            self.callback_param.metrics = self.metrics

        if self.validation_freqs is None:
            pass
        elif isinstance(self.validation_freqs, int):
            if self.validation_freqs < 1:
                raise ValueError("validation_freqs should be larger than 0 when it's integer")
        # collections.Container was removed in Python 3.10; this needs `import collections.abc`
        elif not isinstance(self.validation_freqs, collections.abc.Container):
            raise ValueError("validation_freqs should be None or positive integer or container")

        assert type(self.communication_efficient) is bool, 'communication efficient must be a boolean'
        assert self.mode in ['encrypted', 'plain'], 'mode options: encrypted or plain, but {} is offered'.format(self.mode)

        self.check_positive_integer(self.epochs, 'epochs')
        self.check_positive_number(self.alpha, 'alpha')
        self.check_positive_integer(self.local_round, 'local round')

    @staticmethod
    def _parse_optimizer(opt):
        """
        Examples:

            1. "optimize": "SGD"
            2. "optimize": {
                "optimizer": "SGD",
                "learning_rate": 0.05
            }
        """

        kwargs = {}
        if isinstance(opt, str):
            return SimpleNamespace(optimizer=opt, kwargs=kwargs)
        elif isinstance(opt, dict):
            optimizer = opt.get("optimizer", kwargs)
            if not optimizer:
                raise ValueError(f"optimizer config: {opt} invalid")
            kwargs = {k: v for k, v in opt.items() if k != "optimizer"}
            return SimpleNamespace(optimizer=optimizer, kwargs=kwargs)
        else:
            raise ValueError(f"invalid type for optimize: {type(opt)}")
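The two accepted forms of `optimizer` can be illustrated without FATE. The sketch below is a stand-alone copy of the parsing logic above; the free function name `parse_optimizer` is an assumption made for this example, while the real method is the static `_parse_optimizer`.

```python
from types import SimpleNamespace


def parse_optimizer(opt):
    """Stand-alone copy of the parsing logic shown above, for illustration only."""
    if isinstance(opt, str):
        # a bare name carries no extra keyword arguments
        return SimpleNamespace(optimizer=opt, kwargs={})
    if isinstance(opt, dict):
        name = opt.get("optimizer")
        if not name:
            raise ValueError(f"optimizer config: {opt} invalid")
        # everything except the "optimizer" key is passed on as kwargs
        kwargs = {k: v for k, v in opt.items() if k != "optimizer"}
        return SimpleNamespace(optimizer=name, kwargs=kwargs)
    raise ValueError(f"invalid type for optimizer: {type(opt)}")


from_str = parse_optimizer("SGD")
from_dict = parse_optimizer({"optimizer": "Adam", "learning_rate": 0.01})
print(from_str.optimizer, from_str.kwargs)
print(from_dict.optimizer, from_dict.kwargs)
```

A dict without the `"optimizer"` key, such as `{"learning_rate": 0.1}`, is rejected with a `ValueError`.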

Methods¶

__init__(self, alpha=1, tol=1e-06, n_iter_no_change=False, validation_freqs=None, optimizer={'optimizer': 'Adam', 'learning_rate': 0.01}, nn_define={}, epochs=1, intersect_param=IntersectParam(consts.RSA), config_type='keras', batch_size=-1, encrypte_param=EncryptParam(), encrypted_mode_calculator_param=EncryptedModeCalculatorParam(mode="confusion_opt"), predict_param=PredictParam(), mode='plain', communication_efficient=False, local_round=5, callback_param=CallbackParam())

special ¶

Parameters:

Name	Type	Description	Default
`alpha`	`float`	a loss coefficient defined in paper, it defines the importance of alignment loss	`1`
`tol`	`float`	loss tolerance	`1e-06`
`n_iter_no_change`	`bool`	check loss convergence or not	`False`
`validation_freqs`	`None or positive integer or container object in python`	Do validation in training process or Not. if equals None, will not do validation in train process; if equals positive integer, will validate data every validation_freqs epochs passes; if container object in python, will validate data if epochs belong to this container. e.g. validation_freqs = [10, 15], will validate data when epoch equals to 10 and 15. The default value is None, 1 is suggested. You can set it to a number larger than 1 in order to speed up training by skipping validation rounds. When it is larger than 1, a number which is divisible by "epochs" is recommended, otherwise, you will miss the validation scores of last training epoch.	`None`
`optimizer`	`str or dict`	optimizer method, accept following types: 1. a string, one of "Adadelta", "Adagrad", "Adam", "Adamax", "Nadam", "RMSprop", "SGD" 2. a dict, with a required key-value pair keyed by "optimizer", with optional key-value pairs such as learning rate.	`{'optimizer': 'Adam', 'learning_rate': 0.01}`
`nn_define`	`dict`	a dict represents the structure of neural network, it can be output by tf-keras	`{}`
`epochs`	`int`	epochs num	`1`
`intersect_param`	`IntersectParam`	define the intersect method	`IntersectParam(consts.RSA)`
`config_type`	`{'keras'}`	config type	`'keras'`
`batch_size`	`int`	batch size when computing transformed feature embedding, -1 use full data.	`-1`
`encrypte_param`	`EncryptParam`	encrypted param	`EncryptParam()`
`encrypted_mode_calculator_param`	`EncryptedModeCalculatorParam`	encrypted mode calculator param	`EncryptedModeCalculatorParam(mode="confusion_opt")`
`predict_param`	`PredictParam`	predict param	`PredictParam()`
`mode`	`{"plain", "encrypted"}`	plain: will not use any encrypt algorithms, data exchanged in plaintext; encrypted: use paillier to encrypt gradients	`'plain'`
`communication_efficient`	`bool`	will use communication efficient or not. when communication efficient is enabled, FTL model will update gradients by several local rounds using intermediate data	`False`
`local_round`	`int`	local update round when using communication efficient	`5`
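The three accepted forms of `validation_freqs` can be sketched as a small schedule function. This is an illustration only: `should_validate` and `validated_epochs` are hypothetical helpers, and counting epochs from 1 is an assumption rather than something this page states.

```python
from collections.abc import Container


def should_validate(epoch, validation_freqs):
    """Return True if validation runs after the given epoch.

    Assumption: epochs are counted from 1, so validation_freqs=2
    validates after epochs 2, 4, 6, ...
    """
    if validation_freqs is None:
        return False                           # no validation during training
    if isinstance(validation_freqs, int):
        return epoch % validation_freqs == 0   # every validation_freqs epochs
    if isinstance(validation_freqs, Container):
        return epoch in validation_freqs       # only the listed epochs
    raise ValueError("validation_freqs should be None, a positive integer or a container")


def validated_epochs(epochs, validation_freqs):
    """List the epochs at which validation would run."""
    return [e for e in range(1, epochs + 1) if should_validate(e, validation_freqs)]


print(validated_epochs(10, 3))          # [3, 6, 9]
print(validated_epochs(10, 5))          # [5, 10]
print(validated_epochs(20, [10, 15]))   # [10, 15]
```

With `epochs=10`, a frequency of 3 never validates the final epoch, while a frequency of 5 does. This is why a value that divides `epochs` evenly is recommended.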

Source code in federatedml/param/ftl_param.py

def __init__(self, alpha=1, tol=0.000001,
             n_iter_no_change=False, validation_freqs=None, optimizer={'optimizer': 'Adam', 'learning_rate': 0.01},
             nn_define={}, epochs=1
             , intersect_param=IntersectParam(consts.RSA), config_type='keras', batch_size=-1,
             encrypte_param=EncryptParam(),
             encrypted_mode_calculator_param=EncryptedModeCalculatorParam(mode="confusion_opt"),
             predict_param=PredictParam(), mode='plain', communication_efficient=False,
             local_round=5, callback_param=CallbackParam()):
    """
    Parameters
    ----------
    alpha : float
        a loss coefficient defined in paper, it defines the importance of alignment loss
    tol : float
        loss tolerance
    n_iter_no_change : bool
        check loss convergence or not
    validation_freqs : None or positive integer or container object in python
        Do validation in training process or Not.
        if equals None, will not do validation in train process;
        if equals positive integer, will validate data every validation_freqs epochs passes;
        if container object in python, will validate data if epochs belong to this container.
        e.g. validation_freqs = [10, 15], will validate data when epoch equals to 10 and 15.
        The default value is None, 1 is suggested. You can set it to a number larger than 1 in order to
        speed up training by skipping validation rounds. When it is larger than 1, a number which is
        divisible by "epochs" is recommended, otherwise, you will miss the validation scores
        of last training epoch.
    optimizer : str or dict
        optimizer method, accept following types:
        1. a string, one of "Adadelta", "Adagrad", "Adam", "Adamax", "Nadam", "RMSprop", "SGD"
        2. a dict, with a required key-value pair keyed by "optimizer",
            with optional key-value pairs such as learning rate.
        defaults to {'optimizer': 'Adam', 'learning_rate': 0.01}
    nn_define : dict
        a dict represents the structure of neural network, it can be output by tf-keras
    epochs : int
        epochs num
    intersect_param
        define the intersect method
    config_type : {'keras'}
        config type
    batch_size : int
        batch size when computing transformed feature embedding, -1 use full data.
    encrypte_param
        encrypted param
    encrypted_mode_calculator_param
        encrypted mode calculator param:
    predict_param
        predict param
    mode: {"plain", "encrypted"}
        plain: will not use any encrypt algorithms, data exchanged in plaintext
        encrypted: use paillier to encrypt gradients
    communication_efficient: bool
        will use communication efficient or not. when communication efficient is enabled, FTL model will
        update gradients by several local rounds using intermediate data
    local_round: int
        local update round when using communication efficient
    """

    super(FTLParam, self).__init__()
    self.alpha = alpha
    self.tol = tol
    self.n_iter_no_change = n_iter_no_change
    self.validation_freqs = validation_freqs
    self.optimizer = optimizer
    self.nn_define = nn_define
    self.epochs = epochs
    self.intersect_param = copy.deepcopy(intersect_param)
    self.config_type = config_type
    self.batch_size = batch_size
    self.encrypted_mode_calculator_param = copy.deepcopy(encrypted_mode_calculator_param)
    self.encrypt_param = copy.deepcopy(encrypte_param)
    self.predict_param = copy.deepcopy(predict_param)
    self.mode = mode
    self.communication_efficient = communication_efficient
    self.local_round = local_round
    self.callback_param = copy.deepcopy(callback_param)

check(self) ¶

Source code in federatedml/param/ftl_param.py

def check(self):
    self.intersect_param.check()
    self.encrypt_param.check()
    self.encrypted_mode_calculator_param.check()

    self.optimizer = self._parse_optimizer(self.optimizer)

    supported_config_type = ["keras"]
    if self.config_type not in supported_config_type:
        raise ValueError(f"config_type should be one of {supported_config_type}")

    if not isinstance(self.tol, (int, float)):
        raise ValueError("tol should be numeric")

    if not isinstance(self.epochs, int) or self.epochs <= 0:
        raise ValueError("epochs should be a positive integer")

    if self.nn_define and not isinstance(self.nn_define, dict):
        raise ValueError("nn_define should be a dict defining the structure of neural network")

    if self.batch_size != -1:
        if not isinstance(self.batch_size, int) \
                or self.batch_size < consts.MIN_BATCH_SIZE:
            raise ValueError(
                "batch_size {} not supported, should be no less than {} or -1 (use all data)".format(
                    self.batch_size, consts.MIN_BATCH_SIZE))

    for p in deprecated_param_list:
        # if self._warn_to_deprecate_param(p, "", ""):
        if self._deprecated_params_set.get(p):
            if "callback_param" in self.get_user_feeded():
                raise ValueError(f"{p} and callback param should not be set simultaneously, "
                                 f"{self._deprecated_params_set}, {self.get_user_feeded()}")
            else:
                self.callback_param.callbacks = ["PerformanceEvaluate"]
            break

    descr = "ftl's"

    if self._warn_to_deprecate_param("validation_freqs", descr, "callback_param's 'validation_freqs'"):
        self.callback_param.validation_freqs = self.validation_freqs

    if self._warn_to_deprecate_param("metrics", descr, "callback_param's 'metrics'"):
        self.callback_param.metrics = self.metrics

    if self.validation_freqs is None:
        pass
    elif isinstance(self.validation_freqs, int):
        if self.validation_freqs < 1:
            raise ValueError("validation_freqs should be larger than 0 when it's integer")
    # collections.Container was removed in Python 3.10; this needs `import collections.abc`
    elif not isinstance(self.validation_freqs, collections.abc.Container):
        raise ValueError("validation_freqs should be None or positive integer or container")

    assert type(self.communication_efficient) is bool, 'communication efficient must be a boolean'
    assert self.mode in ['encrypted', 'plain'], 'mode options: encrypted or plain, but {} is offered'.format(self.mode)

    self.check_positive_integer(self.epochs, 'epochs')
    self.check_positive_number(self.alpha, 'alpha')
    self.check_positive_integer(self.local_round, 'local round')

`hetero_kmeans_param` ¶

Classes¶


KmeansParam            (BaseParam)

¶

Parameters used for K-means.

Parameters:

Name	Type	Description	Default
`k`	`int`	The number of centroids to generate. Should be larger than 1 and no larger than 100 in this version.	`5`
`max_iter`	`int`	Maximum number of iterations of the hetero-k-means algorithm to run.	`300`
`tol`	`float`	Convergence tolerance.	`0.001`
`random_stat`	`None or int`	Random seed.	`None`

Source code in federatedml/param/hetero_kmeans_param.py

class KmeansParam(BaseParam):
    """
    Parameters used for K-means.
    ----------
    k : int, default 5
        The number of the centroids to generate.
        should be larger than 1 and no larger than 100 in this version
    max_iter : int, default 300.
        Maximum number of iterations of the hetero-k-means algorithm to run.
    tol : float, default 0.001.
        tol
    random_stat : None or int
        random seed
    """

    def __init__(self, k=5, max_iter=300, tol=0.001, random_stat=None):
        super(KmeansParam, self).__init__()
        self.k = k
        self.max_iter = max_iter
        self.tol = tol
        self.random_stat = random_stat

    def check(self):
        descr = "Kmeans_param's "

        if not isinstance(self.k, int):
            raise ValueError(
                descr + "k {} not supported, should be int type".format(self.k))
        elif self.k <= 1:
            raise ValueError(
                descr + "k {} not supported, should be larger than 1".format(self.k))
        elif self.k > 100:
            raise ValueError(
                descr + "k {} not supported, should be no larger than 100 in this version".format(self.k))

        if not isinstance(self.max_iter, int):
            raise ValueError(
                descr + "max_iter {} not supported, should be int type".format(self.max_iter))
        elif self.max_iter <= 0:
            raise ValueError(
                descr + "max_iter {} not supported, should be larger than 0".format(self.max_iter))

        if not isinstance(self.tol, (float, int)):
            raise ValueError(
                descr + "tol {} not supported, should be float type".format(self.tol))
        elif self.tol < 0:
            raise ValueError(
                descr + "tol {} not supported, should be larger than or equal to 0".format(self.tol))

        if self.random_stat is not None:
            if not isinstance(self.random_stat, int):
                raise ValueError(descr + "random_stat {} not supported, should be int type".format(self.random_stat))
            elif self.random_stat < 0:
                raise ValueError(
                    descr + "random_stat {} not supported, should be larger than/equal to 0".format(self.random_stat))

__init__(self, k=5, max_iter=300, tol=0.001, random_stat=None) special ¶

Source code in federatedml/param/hetero_kmeans_param.py

def __init__(self, k=5, max_iter=300, tol=0.001, random_stat=None):
    super(KmeansParam, self).__init__()
    self.k = k
    self.max_iter = max_iter
    self.tol = tol
    self.random_stat = random_stat

check(self) ¶

Source code in federatedml/param/hetero_kmeans_param.py

def check(self):
    descr = "Kmeans_param's "

    if not isinstance(self.k, int):
        raise ValueError(
            descr + "k {} not supported, should be int type".format(self.k))
    elif self.k <= 1:
        raise ValueError(
            descr + "k {} not supported, should be larger than 1".format(self.k))
    elif self.k > 100:
        raise ValueError(
            descr + "k {} not supported, should be no larger than 100 in this version".format(self.k))

    if not isinstance(self.max_iter, int):
        raise ValueError(
            descr + "max_iter {} not supported, should be int type".format(self.max_iter))
    elif self.max_iter <= 0:
        raise ValueError(
            descr + "max_iter {} not supported, should be larger than 0".format(self.max_iter))

    if not isinstance(self.tol, (float, int)):
        raise ValueError(
            descr + "tol {} not supported, should be float type".format(self.tol))
    elif self.tol < 0:
        raise ValueError(
            descr + "tol {} not supported, should be larger than or equal to 0".format(self.tol))

    if self.random_stat is not None:
        if not isinstance(self.random_stat, int):
            raise ValueError(descr + "random_stat {} not supported, should be int type".format(self.random_stat))
        elif self.random_stat < 0:
            raise ValueError(
                descr + "random_stat {} not supported, should be larger than/equal to 0".format(self.random_stat))
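To show what the four K-means parameters typically control, here is a minimal sketch of a plain, single-party, one-dimensional k-means loop. It is not the hetero-k-means protocol and is not taken from FATE. The function `kmeans_1d` is hypothetical, and treating `tol` as a bound on the largest centroid shift is an assumption.

```python
import random


def kmeans_1d(points, k=5, max_iter=300, tol=0.001, random_stat=None):
    """Plain, non-federated 1-D k-means; NOT the hetero-k-means protocol."""
    rng = random.Random(random_stat)          # random_stat seeds centroid initialisation
    centroids = rng.sample(points, k)         # k initial centroids drawn from the data
    for iteration in range(1, max_iter + 1):  # never run more than max_iter rounds
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        new_centroids = [
            sum(c) / len(c) if c else centroids[i]  # keep an empty cluster's centroid
            for i, c in enumerate(clusters)
        ]
        shift = max(abs(a - b) for a, b in zip(centroids, new_centroids))
        centroids = new_centroids
        if shift <= tol:                      # assumed meaning of tol: stop once centroids settle
            break
    return sorted(centroids), iteration


points = [0.0, 0.1, 0.2, 10.0, 10.1, 10.2]
centroids, rounds = kmeans_1d(points, k=2, random_stat=0)
print(centroids, rounds)
```

On this toy data the two centroids settle near 0.1 and 10.1 well before `max_iter` is reached, and fixing `random_stat` makes the run repeatable.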

`hetero_nn_param` ¶

Classes¶


SelectorParam

¶

Parameters used for selective back propagation in Hetero Neural Network.

Parameters:

Name	Type	Description	Default
`method`	`None or str`	back propagation select method, accept "relative" only	`None`
`selective_size`	`int`	deque size to use, store the most recent selective_size historical loss	`1024`
`beta`	`int`	sample whose selective probability >= power(np.random, beta) will be selected	`1`
`min_prob`	`Numeric`	selective probability is max(min_prob, rank_rate)	`0`

Source code in federatedml/param/hetero_nn_param.py

class SelectorParam(object):
    """
    Parameters used for selective back propagation in Hetero Neural Network.

    Args:
        method: None or str
            back propagation select method, accept "relative" only, default: None
        selective_size: int
            deque size to use, store the most recent selective_size historical loss, default: 1024
        beta: int
            sample whose selective probability >= power(np.random, beta) will be selected
        min_prob: Numeric
            selective probability is max(min_prob, rank_rate)

    """
    def __init__(self, method=None, beta=1, selective_size=consts.SELECTIVE_SIZE, min_prob=0, random_state=None):
        self.method = method
        self.selective_size = selective_size
        self.beta = beta
        self.min_prob = min_prob
        self.random_state = random_state

    def check(self):
        if self.method is not None and self.method not in ["relative"]:
            raise ValueError('selective method should be None or "relative"')

        if not isinstance(self.selective_size, int) or self.selective_size <= 0:
            raise ValueError("selective size should be a positive integer")

        if not isinstance(self.beta, int):
            raise ValueError("beta should be integer")

        if not isinstance(self.min_prob, (float, int)):
            raise ValueError("min_prob should be numeric")

__init__(self, method=None, beta=1, selective_size=1024, min_prob=0, random_state=None) special ¶

Source code in federatedml/param/hetero_nn_param.py

def __init__(self, method=None, beta=1, selective_size=consts.SELECTIVE_SIZE, min_prob=0, random_state=None):
    self.method = method
    self.selective_size = selective_size
    self.beta = beta
    self.min_prob = min_prob
    self.random_state = random_state

check(self) ¶

Source code in federatedml/param/hetero_nn_param.py

def check(self):
    if self.method is not None and self.method not in ["relative"]:
        raise ValueError('selective method should be None or "relative"')

    if not isinstance(self.selective_size, int) or self.selective_size <= 0:
        raise ValueError("selective size should be a positive integer")

    if not isinstance(self.beta, int):
        raise ValueError("beta should be integer")

    if not isinstance(self.min_prob, (float, int)):
        raise ValueError("min_prob should be numeric")
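The selection rule described in the `SelectorParam` docstring can be sketched as follows. This is a guess at the behaviour, not FATE's implementation. The class `RelativeSelector` is hypothetical, and `rank_rate` is assumed here to be the share of stored losses that do not exceed the current one.

```python
import random
from collections import deque


class RelativeSelector:
    """Illustrative only: a guess at the "relative" rule from the docstring above."""

    def __init__(self, beta=1, selective_size=1024, min_prob=0, random_state=None):
        self.beta = beta
        self.min_prob = min_prob
        self.history = deque(maxlen=selective_size)  # most recent losses only
        self.rng = random.Random(random_state)

    def select(self, loss):
        self.history.append(loss)
        # Assumed definition of rank_rate: share of stored losses not above this one.
        rank_rate = sum(1 for h in self.history if h <= loss) / len(self.history)
        prob = max(self.min_prob, rank_rate)
        # The sample is kept when prob >= u ** beta, with u uniform on [0, 1).
        return prob >= self.rng.random() ** self.beta


selector = RelativeSelector(beta=1, selective_size=4, random_state=0)
decisions = [selector.select(loss) for loss in [0.9, 0.2, 0.5, 0.1, 1.3]]
print(decisions, list(selector.history))
```

Under this reading, a sample whose loss is the largest in the window has `rank_rate` equal to 1 and is always kept, while low-loss samples are kept less often unless `min_prob` raises their probability.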


HeteroNNParam            (BaseParam)

¶

Parameters used for Hetero Neural Network.

Parameters:

Name	Type	Description	Default
`task_type`	`str`	task type of hetero nn model, one of 'classification', 'regression'.	`'classification'`
`config_type`	`str`	accept "keras" only.	`'keras'`
`bottom_nn_define`	`dict`	a dict represents the structure of bottom neural network.	`None`
`interactive_layer_define`	`dict`	a dict represents the structure of interactive layer.	`None`
`interactive_layer_lr`	`float`	the learning rate of interactive layer.	`0.9`
`top_nn_define`	`dict`	a dict represents the structure of top neural network.	`None`
`optimizer`	`str or dict`	optimizer method, accept following types: 1. a string, one of "Adadelta", "Adagrad", "Adam", "Adamax", "Nadam", "RMSprop", "SGD" 2. a dict, with a required key-value pair keyed by "optimizer", with optional key-value pairs such as learning rate.	`'SGD'`
`loss`	`str`	a string to define loss function used	`None`
`epochs`	`int`	the maximum iteration for aggregation in training.	`100`
`batch_size`	`int`	batch size when updating model. -1 means use all data in a batch, i.e. not to use mini-batch strategy.	`-1`
`early_stop`	`str`	accept 'diff' only in this version. Method used to judge convergence. a) diff: use difference of loss between two iterations to judge whether the model has converged.	`'diff'`
`floating_point_precision`	`None or int`	if not None, means use floating_point_precision-bit to speed up calculation, e.g.: convert an x to round(x * 2**floating_point_precision) during Paillier operation, divide the result by 2**floating_point_precision in the end.	`23`
`drop_out_keep_rate`	`float`	should be between 0 and 1; if not equal to 1.0, dropout is enabled	`1.0`
`callback_param`	`CallbackParam`	CallbackParam object	`CallbackParam()`
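The fixed-point conversion behind `floating_point_precision` can be sketched in a few lines. The helper names `encode_fixed_point` and `decode_fixed_point` are hypothetical, and the example uses ordinary integers in place of Paillier ciphertexts, so it only shows the scaling arithmetic.

```python
def encode_fixed_point(x, floating_point_precision=23):
    """Scale a float to an integer: round(x * 2**floating_point_precision)."""
    return round(x * 2 ** floating_point_precision)


def decode_fixed_point(value, floating_point_precision=23):
    """Undo the scaling by dividing by 2**floating_point_precision."""
    return value / 2 ** floating_point_precision


xs = [0.1, 0.25, -1.5]
# Add the scaled integers (plain ints stand in for encrypted values here).
encoded_sum = sum(encode_fixed_point(x) for x in xs)
approx = decode_fixed_point(encoded_sum)
print(encoded_sum, approx, sum(xs))
```

Each encoded value is off by at most half a unit of `2**-floating_point_precision`, so the decoded sum stays very close to the floating-point sum. A smaller precision trades accuracy for smaller integers.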

Source code in federatedml/param/hetero_nn_param.py

class HeteroNNParam(BaseParam):
    """
    Parameters used for Hetero Neural Network.

    Args:
        task_type: str, task type of hetero nn model, one of 'classification', 'regression'.
        config_type: str, accept "keras" only.
        bottom_nn_define: a dict represents the structure of bottom neural network.
        interactive_layer_define: a dict represents the structure of interactive layer.
        interactive_layer_lr: float, the learning rate of interactive layer.
        top_nn_define: a dict represents the structure of top neural network.
        optimizer: optimizer method, accept following types:
            1. a string, one of "Adadelta", "Adagrad", "Adam", "Adamax", "Nadam", "RMSprop", "SGD"
            2. a dict, with a required key-value pair keyed by "optimizer",
                with optional key-value pairs such as learning rate.
            defaults to "SGD"
        loss:  str, a string to define loss function used
        epochs: int, the maximum iteration for aggregation in training.
        batch_size : int, batch size when updating model.
            -1 means use all data in a batch. i.e. Not to use mini-batch strategy.
            defaults to -1.
        early_stop : str, accept 'diff' only in this version, default: 'diff'
            Method used to judge converge or not.
                a) diff: Use difference of loss between two iterations to judge whether converge.
        floating_point_precision: None or integer, if not None, means use floating_point_precision-bit to speed up calculation,
                                   e.g.: convert an x to round(x * 2**floating_point_precision) during Paillier operation, divide
                                          the result by 2**floating_point_precision in the end.
        drop_out_keep_rate: float, should be between 0 and 1; if not equal to 1.0, dropout is enabled
        callback_param: CallbackParam object
    """

    def __init__(self,
                 task_type='classification',
                 config_type="keras",
                 bottom_nn_define=None,
                 top_nn_define=None,
                 interactive_layer_define=None,
                 interactive_layer_lr=0.9,
                 optimizer='SGD',
                 loss=None,
                 epochs=100,
                 batch_size=-1,
                 early_stop="diff",
                 tol=1e-5,
                 encrypt_param=EncryptParam(),
                 encrypted_mode_calculator_param=EncryptedModeCalculatorParam(mode="confusion_opt"),
                 predict_param=PredictParam(),
                 cv_param=CrossValidationParam(),
                 validation_freqs=None,
                 early_stopping_rounds=None,
                 metrics=None,
                 use_first_metric_only=True,
                 selector_param=SelectorParam(),
                 floating_point_precision=23,
                 drop_out_keep_rate=1.0,
                 callback_param=CallbackParam()):
        super(HeteroNNParam, self).__init__()

        self.task_type = task_type
        self.config_type = config_type
        self.bottom_nn_define = bottom_nn_define
        self.interactive_layer_define = interactive_layer_define
        self.interactive_layer_lr = interactive_layer_lr
        self.top_nn_define = top_nn_define
        self.batch_size = batch_size
        self.epochs = epochs
        self.early_stop = early_stop
        self.tol = tol
        self.optimizer = optimizer
        self.loss = loss
        self.validation_freqs = validation_freqs
        self.early_stopping_rounds = early_stopping_rounds
        self.metrics = metrics or []
        self.use_first_metric_only = use_first_metric_only

        self.encrypt_param = copy.deepcopy(encrypt_param)
        self.encrypted_model_calculator_param = encrypted_mode_calculator_param
        self.predict_param = copy.deepcopy(predict_param)
        self.cv_param = copy.deepcopy(cv_param)

        self.selector_param = selector_param
        self.floating_point_precision = floating_point_precision

        self.drop_out_keep_rate = drop_out_keep_rate

        self.callback_param = copy.deepcopy(callback_param)

    def check(self):
        self.optimizer = self._parse_optimizer(self.optimizer)
        supported_config_type = ["keras"]

        if self.task_type not in ["classification", "regression"]:
            raise ValueError("task_type should be classification or regression")

        if self.config_type not in supported_config_type:
            raise ValueError(f"config_type should be one of {supported_config_type}")

        if not isinstance(self.tol, (int, float)):
            raise ValueError("tol should be numeric")

        if not isinstance(self.epochs, int) or self.epochs <= 0:
            raise ValueError("epochs should be a positive integer")

        if self.bottom_nn_define and not isinstance(self.bottom_nn_define, dict):
            raise ValueError("bottom_nn_define should be a dict defining the structure of neural network")

        if self.top_nn_define and not isinstance(self.top_nn_define, dict):
            raise ValueError("top_nn_define should be a dict defining the structure of neural network")

        if self.interactive_layer_define is not None and not isinstance(self.interactive_layer_define, dict):
            raise ValueError(
                "the interactive_layer_define should be a dict defining the structure of interactive layer")

        if self.batch_size != -1:
            if not isinstance(self.batch_size, int) \
                    or self.batch_size < consts.MIN_BATCH_SIZE:
                raise ValueError(
                    "batch_size {} not supported, should be at least {} or -1 (use all data)".format(
                        self.batch_size, consts.MIN_BATCH_SIZE))

        if self.early_stop != "diff":
            raise ValueError("early stop should be diff in this version")

        if self.metrics is not None and not isinstance(self.metrics, list):
            raise ValueError("metrics should be a list")

        if self.floating_point_precision is not None and \
                (not isinstance(self.floating_point_precision, int) or\
                 self.floating_point_precision < 0 or self.floating_point_precision > 63):
            raise ValueError("floating point precision should be null or an integer between 0 and 63")

        if not isinstance(self.drop_out_keep_rate, (float, int)) or self.drop_out_keep_rate < 0.0 or \
                self.drop_out_keep_rate > 1.0:
            raise ValueError("drop_out_keep_rate should be in range [0.0, 1.0]")

        self.encrypt_param.check()
        self.encrypted_model_calculator_param.check()
        self.predict_param.check()
        self.selector_param.check()

        descr = "hetero nn param's "

        for p in ["early_stopping_rounds", "validation_freqs",
                  "use_first_metric_only"]:
            if self._deprecated_params_set.get(p):
                if "callback_param" in self.get_user_feeded():
                    raise ValueError(f"{p} and callback param should not be set simultaneously, "
                                     f"{self._deprecated_params_set}, {self.get_user_feeded()}")
                else:
                    self.callback_param.callbacks = ["PerformanceEvaluate"]
                break

        if self._warn_to_deprecate_param("validation_freqs", descr, "callback_param's 'validation_freqs'"):
            self.callback_param.validation_freqs = self.validation_freqs

        if self._warn_to_deprecate_param("early_stopping_rounds", descr, "callback_param's 'early_stopping_rounds'"):
            self.callback_param.early_stopping_rounds = self.early_stopping_rounds

        if self._warn_to_deprecate_param("metrics", descr, "callback_param's 'metrics'"):
            if self.metrics:
                self.callback_param.metrics = self.metrics

        if self._warn_to_deprecate_param("use_first_metric_only", descr, "callback_param's 'use_first_metric_only'"):
            self.callback_param.use_first_metric_only = self.use_first_metric_only

    @staticmethod
    def _parse_optimizer(opt):
        """
        Examples:

            1. "optimizer": "SGD"
            2. "optimizer": {
                "optimizer": "SGD",
                "learning_rate": 0.05
            }
        """

        kwargs = {}
        if isinstance(opt, str):
            return SimpleNamespace(optimizer=opt, kwargs=kwargs)
        elif isinstance(opt, dict):
            optimizer = opt.get("optimizer", kwargs)
            if not optimizer:
                raise ValueError(f"optimizer config: {opt} invalid")
            kwargs = {k: v for k, v in opt.items() if k != "optimizer"}
            return SimpleNamespace(optimizer=optimizer, kwargs=kwargs)
        elif opt is None:
            return None
        else:
            raise ValueError(f"invalid type for optimize: {type(opt)}")
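
To make the accepted `optimizer` shapes concrete, the sketch below re-implements the normalization done by `_parse_optimizer` as a standalone function. The name `parse_optimizer` is hypothetical and only the standard library is used; FATE itself is not imported, so treat this as an illustration of the logic shown above rather than the library's API.

```python
from types import SimpleNamespace


def parse_optimizer(opt):
    """Standalone sketch (hypothetical name) of the normalization in `_parse_optimizer`."""
    if opt is None:
        return None
    if isinstance(opt, str):
        # A bare optimizer name carries no extra keyword arguments.
        return SimpleNamespace(optimizer=opt, kwargs={})
    if isinstance(opt, dict):
        name = opt.get("optimizer")
        if not name:
            raise ValueError(f"optimizer config: {opt} invalid")
        # Every key except "optimizer" is passed through as a keyword argument.
        kwargs = {k: v for k, v in opt.items() if k != "optimizer"}
        return SimpleNamespace(optimizer=name, kwargs=kwargs)
    raise ValueError(f"invalid type for optimizer: {type(opt)}")


by_name = parse_optimizer("SGD")
by_dict = parse_optimizer({"optimizer": "SGD", "learning_rate": 0.05})
print(by_name.optimizer, by_name.kwargs)
print(by_dict.optimizer, by_dict.kwargs)
```

Both forms end up as an object with an `optimizer` name and a `kwargs` dict, which is why the rest of the parameter class can treat them uniformly.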

__init__(self, task_type='classification', config_type='keras', bottom_nn_define=None, top_nn_define=None, interactive_layer_define=None, interactive_layer_lr=0.9, optimizer='SGD', loss=None, epochs=100, batch_size=-1, early_stop='diff', tol=1e-05, encrypt_param=EncryptParam(), encrypted_mode_calculator_param=EncryptedModeCalculatorParam(mode='confusion_opt'), predict_param=PredictParam(), cv_param=CrossValidationParam(), validation_freqs=None, early_stopping_rounds=None, metrics=None, use_first_metric_only=True, selector_param=SelectorParam(), floating_point_precision=23, drop_out_keep_rate=1.0, callback_param=CallbackParam())

special ¶

Source code in federatedml/param/hetero_nn_param.py

def __init__(self,
             task_type='classification',
             config_type="keras",
             bottom_nn_define=None,
             top_nn_define=None,
             interactive_layer_define=None,
             interactive_layer_lr=0.9,
             optimizer='SGD',
             loss=None,
             epochs=100,
             batch_size=-1,
             early_stop="diff",
             tol=1e-5,
             encrypt_param=EncryptParam(),
             encrypted_mode_calculator_param=EncryptedModeCalculatorParam(mode="confusion_opt"),
             predict_param=PredictParam(),
             cv_param=CrossValidationParam(),
             validation_freqs=None,
             early_stopping_rounds=None,
             metrics=None,
             use_first_metric_only=True,
             selector_param=SelectorParam(),
             floating_point_precision=23,
             drop_out_keep_rate=1.0,
             callback_param=CallbackParam()):
    super(HeteroNNParam, self).__init__()

    self.task_type = task_type
    self.config_type = config_type
    self.bottom_nn_define = bottom_nn_define
    self.interactive_layer_define = interactive_layer_define
    self.interactive_layer_lr = interactive_layer_lr
    self.top_nn_define = top_nn_define
    self.batch_size = batch_size
    self.epochs = epochs
    self.early_stop = early_stop
    self.tol = tol
    self.optimizer = optimizer
    self.loss = loss
    self.validation_freqs = validation_freqs
    self.early_stopping_rounds = early_stopping_rounds
    self.metrics = metrics or []
    self.use_first_metric_only = use_first_metric_only

    self.encrypt_param = copy.deepcopy(encrypt_param)
    self.encrypted_model_calculator_param = encrypted_mode_calculator_param
    self.predict_param = copy.deepcopy(predict_param)
    self.cv_param = copy.deepcopy(cv_param)

    self.selector_param = selector_param
    self.floating_point_precision = floating_point_precision

    self.drop_out_keep_rate = drop_out_keep_rate

    self.callback_param = copy.deepcopy(callback_param)

check(self) ¶

Source code in federatedml/param/hetero_nn_param.py

def check(self):
    self.optimizer = self._parse_optimizer(self.optimizer)
    supported_config_type = ["keras"]

    if self.task_type not in ["classification", "regression"]:
        raise ValueError("task_type should be classification or regression")

    if self.config_type not in supported_config_type:
        raise ValueError(f"config_type should be one of {supported_config_type}")

    if not isinstance(self.tol, (int, float)):
        raise ValueError("tol should be numeric")

    if not isinstance(self.epochs, int) or self.epochs <= 0:
        raise ValueError("epochs should be a positive integer")

    if self.bottom_nn_define and not isinstance(self.bottom_nn_define, dict):
        raise ValueError("bottom_nn_define should be a dict defining the structure of neural network")

    if self.top_nn_define and not isinstance(self.top_nn_define, dict):
        raise ValueError("top_nn_define should be a dict defining the structure of neural network")

    if self.interactive_layer_define is not None and not isinstance(self.interactive_layer_define, dict):
        raise ValueError(
            "the interactive_layer_define should be a dict defining the structure of interactive layer")

    if self.batch_size != -1:
        if not isinstance(self.batch_size, int) \
                or self.batch_size < consts.MIN_BATCH_SIZE:
            raise ValueError(
                "batch_size {} not supported, should be at least {} or -1 (use all data)".format(
                    self.batch_size, consts.MIN_BATCH_SIZE))

    if self.early_stop != "diff":
        raise ValueError("early stop should be diff in this version")

    if self.metrics is not None and not isinstance(self.metrics, list):
        raise ValueError("metrics should be a list")

    if self.floating_point_precision is not None and \
            (not isinstance(self.floating_point_precision, int) or\
             self.floating_point_precision < 0 or self.floating_point_precision > 63):
        raise ValueError("floating point precision should be null or an integer between 0 and 63")

    if not isinstance(self.drop_out_keep_rate, (float, int)) or self.drop_out_keep_rate < 0.0 or \
            self.drop_out_keep_rate > 1.0:
        raise ValueError("drop_out_keep_rate should be in range [0.0, 1.0]")

    self.encrypt_param.check()
    self.encrypted_model_calculator_param.check()
    self.predict_param.check()
    self.selector_param.check()

    descr = "hetero nn param's "

    for p in ["early_stopping_rounds", "validation_freqs",
              "use_first_metric_only"]:
        if self._deprecated_params_set.get(p):
            if "callback_param" in self.get_user_feeded():
                raise ValueError(f"{p} and callback param should not be set simultaneously, "
                                 f"{self._deprecated_params_set}, {self.get_user_feeded()}")
            else:
                self.callback_param.callbacks = ["PerformanceEvaluate"]
            break

    if self._warn_to_deprecate_param("validation_freqs", descr, "callback_param's 'validation_freqs'"):
        self.callback_param.validation_freqs = self.validation_freqs

    if self._warn_to_deprecate_param("early_stopping_rounds", descr, "callback_param's 'early_stopping_rounds'"):
        self.callback_param.early_stopping_rounds = self.early_stopping_rounds

    if self._warn_to_deprecate_param("metrics", descr, "callback_param's 'metrics'"):
        if self.metrics:
            self.callback_param.metrics = self.metrics

    if self._warn_to_deprecate_param("use_first_metric_only", descr, "callback_param's 'use_first_metric_only'"):
        self.callback_param.use_first_metric_only = self.use_first_metric_only
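
The last part of `check` forwards deprecated top-level settings to `callback_param`. The following is a minimal standalone sketch of that flow, not FATE code: `forward_deprecated` and its arguments are hypothetical names, and it assumes that `_warn_to_deprecate_param` returns true exactly when the user explicitly set the deprecated parameter (that helper is not shown in this section).

```python
from types import SimpleNamespace

# Deprecated top-level names that check() copies onto callback_param.
DEPRECATED = ("validation_freqs", "early_stopping_rounds", "metrics", "use_first_metric_only")
# Names that additionally switch on the "PerformanceEvaluate" callback.
TRIGGERS = ("early_stopping_rounds", "validation_freqs", "use_first_metric_only")


def forward_deprecated(user_set, callback_param, callback_param_set_by_user):
    """Copy explicitly set deprecated values (dict name -> value) onto callback_param."""
    if any(name in user_set for name in TRIGGERS):
        if callback_param_set_by_user:
            raise ValueError("deprecated parameters and callback_param should not be set simultaneously")
        callback_param.callbacks = ["PerformanceEvaluate"]
    for name in DEPRECATED:
        if name not in user_set:
            continue
        if name == "metrics" and not user_set[name]:
            continue  # an empty metrics list is not forwarded
        setattr(callback_param, name, user_set[name])
    return callback_param


cb = SimpleNamespace(callbacks=[], validation_freqs=None, early_stopping_rounds=None,
                     metrics=None, use_first_metric_only=False)
forward_deprecated({"validation_freqs": 2, "early_stopping_rounds": 5}, cb,
                   callback_param_set_by_user=False)
print(cb.callbacks, cb.validation_freqs, cb.early_stopping_rounds)
```

In practice this suggests setting these values on `callback_param` directly, since mixing both styles is rejected.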

`hetero_sshe_lr_param` ¶

Classes¶


LogisticRegressionParam (BaseParam) ¶

Parameters used for Hetero SSHE Logistic Regression

Parameters:

Name	Type	Description	Default
`penalty`	`str, 'L1', 'L2' or None. default: None`	Penalty method used in LR. If it is not None, weights must be reconstructed every iteration.	`None`
`tol`	`float, default: 1e-4`	The tolerance of convergence	`0.0001`
`alpha`	`float, default: 1.0`	Regularization strength coefficient.	`1.0`
`optimizer`	`str, 'sgd', 'rmsprop', 'adam', 'nesterov_momentum_sgd', or 'adagrad', default: 'sgd'`	Optimize method	`'sgd'`
`batch_size`	`int, default: -1`	Batch size when updating model. -1 means use all data in a batch. i.e. Not to use mini-batch strategy.	`-1`
`learning_rate`	`float, default: 0.01`	Learning rate	`0.01`
`max_iter`	`int, default: 100`	The maximum iteration for training.	`100`
`early_stop`	`str, 'diff', 'weight_diff' or 'abs', default: 'diff'`	Method used to judge convergence. a) diff: use the difference of loss between two iterations. b) weight_diff: use the difference between the weights of two consecutive iterations. c) abs: use the absolute value of loss, i.e. if loss < eps, training has converged.	`'diff'`
`decay`	`int or float, default: 1`	Decay rate for the learning rate. The learning rate follows the schedule lr = lr0 / (1 + decay * t) if decay_sqrt is False, and lr = lr0 / sqrt(1 + decay * t) if decay_sqrt is True, where t is the iteration number.	`1`
`decay_sqrt`	`bool, default: True`	lr = lr0 / (1 + decay * t) if decay_sqrt is False, otherwise lr = lr0 / sqrt(1 + decay * t)	`True`
`encrypt_param`	`EncryptParam object, default: default EncryptParam object`	encrypt param	`EncryptParam()`
`predict_param`	`PredictParam object, default: default PredictParam object`	predict param	`PredictParam()`
`cv_param`	`CrossValidationParam object, default: default CrossValidationParam object`	cv param	`CrossValidationParam()`
`multi_class`	`str, 'ovr', default: 'ovr'`	Strategy used for multi-class tasks. Currently only 'ovr' (one_vs_rest) is supported.	`'ovr'`
`reveal_strategy`	`str, "respectively", "encrypted_reveal_in_host", default: "respectively"`	"respectively": guest and host each reveal only their own part of the weights. "encrypted_reveal_in_host": the host's weights are revealed in encrypted mode, and the guest's in normal mode.	`'respectively'`
`reveal_every_iter`	`bool, default: True`	Whether to reconstruct model weights every iteration. If so, regularization is available, and performance is better as well because the algorithm process is simplified.	`True`
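
The `decay` and `decay_sqrt` rows describe a learning-rate schedule that is easy to check numerically. The sketch below evaluates it with a hypothetical helper, `decayed_learning_rate`; whether FATE counts `t` from 0 or from 1 is not stated in this section, so the starting index here is an assumption.

```python
import math


def decayed_learning_rate(lr0, decay, t, decay_sqrt=True):
    """Learning rate at iteration t for the schedule in the parameter table."""
    denom = 1 + decay * t
    # decay_sqrt=True slows the decay by taking the square root of the denominator.
    return lr0 / math.sqrt(denom) if decay_sqrt else lr0 / denom


for t in (0, 1, 3):
    with_sqrt = decayed_learning_rate(0.01, 1, t)
    without_sqrt = decayed_learning_rate(0.01, 1, t, decay_sqrt=False)
    print(t, with_sqrt, without_sqrt)
```

With the defaults (`decay=1`, `decay_sqrt=True`), the rate shrinks more gently than with the plain `1 / (1 + decay * t)` schedule.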

Source code in federatedml/param/hetero_sshe_lr_param.py

class LogisticRegressionParam(BaseParam):
    """
    Parameters used for Hetero SSHE Logistic Regression

    Parameters
    ----------
    penalty : str, 'L1', 'L2' or None. default: None
        Penalty method used in LR. If it is not None, weights must be reconstructed every iteration.

    tol : float, default: 1e-4
        The tolerance of convergence

    alpha : float, default: 1.0
        Regularization strength coefficient.

    optimizer : str, 'sgd', 'rmsprop', 'adam', 'nesterov_momentum_sgd', or 'adagrad', default: 'sgd'
        Optimize method

    batch_size : int, default: -1
        Batch size when updating model. -1 means use all data in a batch. i.e. Not to use mini-batch strategy.

    learning_rate : float, default: 0.01
        Learning rate

    max_iter : int, default: 100
        The maximum iteration for training.

    early_stop : str, 'diff', 'weight_diff' or 'abs', default: 'diff'
        Method used to judge convergence.
            a) diff: use the difference of loss between two iterations.
            b) weight_diff: use the difference between the weights of two consecutive iterations.
            c) abs: use the absolute value of loss, i.e. if loss < eps, training has converged.

    decay: int or float, default: 1
        Decay rate for learning rate. learning rate will follow the following decay schedule.
        lr = lr0/(1+decay*t) if decay_sqrt is False. If decay_sqrt is True, lr = lr0 / sqrt(1+decay*t)
        where t is the iter number.

    decay_sqrt: Bool, default: True
        lr = lr0/(1+decay*t) if decay_sqrt is False, otherwise, lr = lr0 / sqrt(1+decay*t)

    encrypt_param: EncryptParam object, default: default EncryptParam object
        encrypt param

    predict_param: PredictParam object, default: default PredictParam object
        predict param

    cv_param: CrossValidationParam object, default: default CrossValidationParam object
        cv param

    multi_class: str, 'ovr', default: 'ovr'
        Strategy used for multi-class tasks. Currently only 'ovr' (one_vs_rest) is supported.

    reveal_strategy: str, "respectively", "encrypted_reveal_in_host", default: "respectively"
        "respectively": guest and host each reveal only their own part of the weights.
        "encrypted_reveal_in_host": the host's weights are revealed in encrypted mode, and the guest's in normal mode.

    reveal_every_iter: bool, default: True
        Whether to reconstruct model weights every iteration. If so, regularization is available.
        Performance is better as well because the algorithm process is simplified.

    """

    def __init__(self, penalty=None,
                 tol=1e-4, alpha=1.0, optimizer='sgd',
                 batch_size=-1, learning_rate=0.01, init_param=InitParam(),
                 max_iter=100, early_stop='diff', encrypt_param=EncryptParam(),
                 predict_param=PredictParam(), cv_param=CrossValidationParam(),
                 decay=1, decay_sqrt=True,
                 multi_class='ovr', use_mix_rand=True,
                 reveal_strategy="respectively",
                 reveal_every_iter=True,
                 callback_param=CallbackParam(),
                 encrypted_mode_calculator_param=EncryptedModeCalculatorParam()
                 ):
        super(LogisticRegressionParam, self).__init__()
        self.penalty = penalty
        self.tol = tol
        self.alpha = alpha
        self.optimizer = optimizer
        self.batch_size = batch_size
        self.learning_rate = learning_rate
        self.init_param = copy.deepcopy(init_param)
        self.max_iter = max_iter
        self.early_stop = early_stop
        self.encrypt_param = encrypt_param
        self.predict_param = copy.deepcopy(predict_param)
        self.decay = decay
        self.decay_sqrt = decay_sqrt
        self.multi_class = multi_class
        self.use_mix_rand = use_mix_rand
        self.reveal_strategy = reveal_strategy
        self.reveal_every_iter = reveal_every_iter
        self.callback_param = copy.deepcopy(callback_param)
        self.cv_param = copy.deepcopy(cv_param)
        self.encrypted_mode_calculator_param = copy.deepcopy(encrypted_mode_calculator_param)

    def check(self):
        descr = "logistic_param's"

        if self.penalty is None:
            pass
        elif type(self.penalty).__name__ != "str":
            raise ValueError(
                "logistic_param's penalty {} not supported, should be str type".format(self.penalty))
        else:
            self.penalty = self.penalty.upper()
            if self.penalty not in [consts.L1_PENALTY, consts.L2_PENALTY]:
                raise ValueError(
                    "logistic_param's penalty not supported, penalty should be 'L1', 'L2' or None")
            if not self.reveal_every_iter:
                if self.penalty not in [consts.L2_PENALTY]:
                    raise ValueError(
                        "penalty should be 'L2' or None when reveal_every_iter is False"
                    )

        if not isinstance(self.tol, (int, float)):
            raise ValueError(
                "logistic_param's tol {} not supported, should be float type".format(self.tol))

        if type(self.alpha).__name__ not in ["float", 'int']:
            raise ValueError(
                "logistic_param's alpha {} not supported, should be float or int type".format(self.alpha))

        if type(self.optimizer).__name__ != "str":
            raise ValueError(
                "logistic_param's optimizer {} not supported, should be str type".format(self.optimizer))
        else:
            self.optimizer = self.optimizer.lower()
            if self.reveal_every_iter:
                if self.optimizer not in ['sgd', 'rmsprop', 'adam', 'adagrad', 'nesterov_momentum_sgd']:
                    raise ValueError(
                        "When reveal_every_iter is True, "
                        "sshe logistic_param's optimizer not supported, optimizer should be"
                        " 'sgd', 'rmsprop', 'adam', 'nesterov_momentum_sgd', or 'adagrad'")
            else:
                if self.optimizer not in ['sgd', 'nesterov_momentum_sgd']:
                    raise ValueError("When reveal_every_iter is False, "
                                     "sshe logistic_param's optimizer not supported, optimizer should be"
                                     " 'sgd', 'nesterov_momentum_sgd'")

        if self.batch_size != -1:
            if type(self.batch_size).__name__ not in ["int"] \
                    or self.batch_size < consts.MIN_BATCH_SIZE:
                raise ValueError(descr + " batch_size {} not supported, should be at least {} or "
                                         "-1 to use all data".format(self.batch_size, consts.MIN_BATCH_SIZE))

        if not isinstance(self.learning_rate, (float, int)):
            raise ValueError(
                "logistic_param's learning_rate {} not supported, should be float or int type".format(
                    self.learning_rate))

        self.init_param.check()

        if type(self.max_iter).__name__ != "int":
            raise ValueError(
                "logistic_param's max_iter {} not supported, should be int type".format(self.max_iter))
        elif self.max_iter <= 0:
            raise ValueError(
                "logistic_param's max_iter must be greater or equal to 1")

        if type(self.early_stop).__name__ != "str":
            raise ValueError(
                "logistic_param's early_stop {} not supported, should be str type".format(
                    self.early_stop))
        else:
            self.early_stop = self.early_stop.lower()
            if self.early_stop not in ['diff', 'abs', 'weight_diff']:
                raise ValueError(
                    "logistic_param's early_stop not supported, converge_func should be"
                    " 'diff', 'weight_diff' or 'abs'")

        self.encrypt_param.check()
        self.predict_param.check()
        if self.encrypt_param.method not in [consts.PAILLIER, None]:
            raise ValueError(
                "logistic_param's encrypted method support 'Paillier' or None only")

        if type(self.decay).__name__ not in ["int", 'float']:
            raise ValueError(
                "logistic_param's decay {} not supported, should be 'int' or 'float'".format(
                    self.decay))

        if type(self.decay_sqrt).__name__ not in ['bool']:
            raise ValueError(
                "logistic_param's decay_sqrt {} not supported, should be 'bool'".format(
                    self.decay_sqrt))

        if self.callback_param.validation_freqs is not None:
            if type(self.callback_param.validation_freqs).__name__ not in ["int", "list", "tuple", "set"]:
                raise ValueError(
                    "validation strategy param's validate_freqs's type not supported ,"
                    " should be int or list or tuple or set"
                )
            if type(self.callback_param.validation_freqs).__name__ == "int" and \
                    self.callback_param.validation_freqs <= 0:
                raise ValueError("validation strategy param's validate_freqs should greater than 0")
            if self.reveal_every_iter is False:
                raise ValueError(f"When reveal_every_iter is False, validation every iter"
                                 f" is not supported.")

        if self.callback_param.early_stopping_rounds is None:
            pass
        elif isinstance(self.callback_param.early_stopping_rounds, int):
            if self.callback_param.early_stopping_rounds < 1:
                raise ValueError("early stopping rounds should be larger than 0 when it's integer")
            if self.callback_param.validation_freqs is None:
                raise ValueError("validation freqs must be set when early stopping is enabled")

        if self.callback_param.metrics is not None and \
                not isinstance(self.callback_param.metrics, list):
            raise ValueError("metrics should be a list")

        if not isinstance(self.callback_param.use_first_metric_only, bool):
            raise ValueError("use_first_metric_only should be a boolean")

        self.reveal_strategy = self.reveal_strategy.lower()
        self.check_valid_value(self.reveal_strategy, descr, ["respectively", "encrypted_reveal_in_host"])

        if self.reveal_strategy == "encrypted_reveal_in_host" and self.reveal_every_iter:
            raise PermissionError("reveal strategy: encrypted_reveal_in_host mode is not allowed to reveal every iter.")
        self.check_boolean(self.reveal_every_iter, descr)
        self.callback_param.check()
        self.cv_param.check()
        return True
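
Several of the checks above depend on `reveal_every_iter`. The sketch below collects those cross-parameter rules in one standalone function so they can be read together. `reveal_conflicts` is a hypothetical helper, not part of FATE, and it assumes `consts.L2_PENALTY` equals the string `'L2'`, as the error messages suggest.

```python
ALL_OPTIMIZERS = {"sgd", "rmsprop", "adam", "adagrad", "nesterov_momentum_sgd"}
NO_REVEAL_OPTIMIZERS = {"sgd", "nesterov_momentum_sgd"}


def reveal_conflicts(optimizer="sgd", penalty=None, reveal_strategy="respectively",
                     reveal_every_iter=True, validation_freqs=None):
    """Return a list of conflicts; an empty list means the combination passes these rules."""
    conflicts = []
    optimizer = optimizer.lower()
    # Assumption: the L2 penalty constant is the string "L2".
    penalty = penalty.upper() if isinstance(penalty, str) else penalty
    allowed = ALL_OPTIMIZERS if reveal_every_iter else NO_REVEAL_OPTIMIZERS
    if optimizer not in allowed:
        conflicts.append(f"optimizer {optimizer!r} not allowed when reveal_every_iter={reveal_every_iter}")
    if not reveal_every_iter:
        if penalty is not None and penalty != "L2":
            conflicts.append("only 'L2' or no penalty is allowed when reveal_every_iter is False")
        if validation_freqs is not None:
            conflicts.append("validation_freqs is not supported when reveal_every_iter is False")
    elif reveal_strategy.lower() == "encrypted_reveal_in_host":
        conflicts.append("encrypted_reveal_in_host cannot be combined with reveal_every_iter=True")
    return conflicts


print(reveal_conflicts())
print(reveal_conflicts(optimizer="adam", penalty="L1", reveal_every_iter=False))
```

Unlike `check`, which raises on the first violation, this sketch returns every conflict at once, which can be convenient when reviewing a job configuration.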

__init__(self, penalty=None, tol=0.0001, alpha=1.0, optimizer='sgd', batch_size=-1, learning_rate=0.01, init_param=InitParam(), max_iter=100, early_stop='diff', encrypt_param=EncryptParam(), predict_param=PredictParam(), cv_param=CrossValidationParam(), decay=1, decay_sqrt=True, multi_class='ovr', use_mix_rand=True, reveal_strategy='respectively', reveal_every_iter=True, callback_param=CallbackParam(), encrypted_mode_calculator_param=EncryptedModeCalculatorParam())

special ¶

Source code in federatedml/param/hetero_sshe_lr_param.py

def __init__(self, penalty=None,
             tol=1e-4, alpha=1.0, optimizer='sgd',
             batch_size=-1, learning_rate=0.01, init_param=InitParam(),
             max_iter=100, early_stop='diff', encrypt_param=EncryptParam(),
             predict_param=PredictParam(), cv_param=CrossValidationParam(),
             decay=1, decay_sqrt=True,
             multi_class='ovr', use_mix_rand=True,
             reveal_strategy="respectively",
             reveal_every_iter=True,
             callback_param=CallbackParam(),
             encrypted_mode_calculator_param=EncryptedModeCalculatorParam()
             ):
    super(LogisticRegressionParam, self).__init__()
    self.penalty = penalty
    self.tol = tol
    self.alpha = alpha
    self.optimizer = optimizer
    self.batch_size = batch_size
    self.learning_rate = learning_rate
    self.init_param = copy.deepcopy(init_param)
    self.max_iter = max_iter
    self.early_stop = early_stop
    self.encrypt_param = encrypt_param
    self.predict_param = copy.deepcopy(predict_param)
    self.decay = decay
    self.decay_sqrt = decay_sqrt
    self.multi_class = multi_class
    self.use_mix_rand = use_mix_rand
    self.reveal_strategy = reveal_strategy
    self.reveal_every_iter = reveal_every_iter
    self.callback_param = copy.deepcopy(callback_param)
    self.cv_param = copy.deepcopy(cv_param)
    self.encrypted_mode_calculator_param = copy.deepcopy(encrypted_mode_calculator_param)

check(self) ¶

Source code in federatedml/param/hetero_sshe_lr_param.py

def check(self):
    descr = "logistic_param's"

    if self.penalty is None:
        pass
    elif type(self.penalty).__name__ != "str":
        raise ValueError(
            "logistic_param's penalty {} not supported, should be str type".format(self.penalty))
    else:
        self.penalty = self.penalty.upper()
        if self.penalty not in [consts.L1_PENALTY, consts.L2_PENALTY]:
            raise ValueError(
                "logistic_param's penalty not supported, penalty should be 'L1', 'L2' or None")
        if not self.reveal_every_iter:
            if self.penalty not in [consts.L2_PENALTY]:
                raise ValueError(
                    "penalty should be 'L2' or None when reveal_every_iter is False"
                )

    if not isinstance(self.tol, (int, float)):
        raise ValueError(
            "logistic_param's tol {} not supported, should be float type".format(self.tol))

    if type(self.alpha).__name__ not in ["float", 'int']:
        raise ValueError(
            "logistic_param's alpha {} not supported, should be float or int type".format(self.alpha))

    if type(self.optimizer).__name__ != "str":
        raise ValueError(
            "logistic_param's optimizer {} not supported, should be str type".format(self.optimizer))
    else:
        self.optimizer = self.optimizer.lower()
        if self.reveal_every_iter:
            if self.optimizer not in ['sgd', 'rmsprop', 'adam', 'adagrad', 'nesterov_momentum_sgd']:
                raise ValueError(
                    "When reveal_every_iter is True, "
                    "sshe logistic_param's optimizer not supported, optimizer should be"
                    " 'sgd', 'rmsprop', 'adam', 'nesterov_momentum_sgd', or 'adagrad'")
        else:
            if self.optimizer not in ['sgd', 'nesterov_momentum_sgd']:
                raise ValueError("When reveal_every_iter is False, "
                                 "sshe logistic_param's optimizer not supported, optimizer should be"
                                 " 'sgd', 'nesterov_momentum_sgd'")

    if self.batch_size != -1:
        if type(self.batch_size).__name__ not in ["int"] \
                or self.batch_size < consts.MIN_BATCH_SIZE:
            raise ValueError(descr + " batch_size {} not supported, should be at least {} or "
                                     "-1 to use all data".format(self.batch_size, consts.MIN_BATCH_SIZE))

    if not isinstance(self.learning_rate, (float, int)):
        raise ValueError(
            "logistic_param's learning_rate {} not supported, should be float or int type".format(
                self.learning_rate))

    self.init_param.check()

    if type(self.max_iter).__name__ != "int":
        raise ValueError(
            "logistic_param's max_iter {} not supported, should be int type".format(self.max_iter))
    elif self.max_iter <= 0:
        raise ValueError(
            "logistic_param's max_iter must be greater or equal to 1")

    if type(self.early_stop).__name__ != "str":
        raise ValueError(
            "logistic_param's early_stop {} not supported, should be str type".format(
                self.early_stop))
    else:
        self.early_stop = self.early_stop.lower()
        if self.early_stop not in ['diff', 'abs', 'weight_diff']:
            raise ValueError(
                "logistic_param's early_stop not supported, converge_func should be"
                " 'diff', 'weight_diff' or 'abs'")

    self.encrypt_param.check()
    self.predict_param.check()
    if self.encrypt_param.method not in [consts.PAILLIER, None]:
        raise ValueError(
            "logistic_param's encrypted method support 'Paillier' or None only")

    if type(self.decay).__name__ not in ["int", 'float']:
        raise ValueError(
            "logistic_param's decay {} not supported, should be 'int' or 'float'".format(
                self.decay))

    if type(self.decay_sqrt).__name__ not in ['bool']:
        raise ValueError(
            "logistic_param's decay_sqrt {} not supported, should be 'bool'".format(
                self.decay_sqrt))

    if self.callback_param.validation_freqs is not None:
        if type(self.callback_param.validation_freqs).__name__ not in ["int", "list", "tuple", "set"]:
            raise ValueError(
                "validation strategy param's validate_freqs's type not supported ,"
                " should be int or list or tuple or set"
            )
        if type(self.callback_param.validation_freqs).__name__ == "int" and \
                self.callback_param.validation_freqs <= 0:
            raise ValueError("validation strategy param's validate_freqs should greater than 0")
        if self.reveal_every_iter is False:
            raise ValueError(f"When reveal_every_iter is False, validation every iter"
                             f" is not supported.")

    if self.callback_param.early_stopping_rounds is None:
        pass
    elif isinstance(self.callback_param.early_stopping_rounds, int):
        if self.callback_param.early_stopping_rounds < 1:
            raise ValueError("early stopping rounds should be larger than 0 when it's integer")
        if self.callback_param.validation_freqs is None:
            raise ValueError("validation freqs must be set when early stopping is enabled")

    if self.callback_param.metrics is not None and \
            not isinstance(self.callback_param.metrics, list):
        raise ValueError("metrics should be a list")

    if not isinstance(self.callback_param.use_first_metric_only, bool):
        raise ValueError("use_first_metric_only should be a boolean")

    self.reveal_strategy = self.reveal_strategy.lower()
    self.check_valid_value(self.reveal_strategy, descr, ["respectively", "encrypted_reveal_in_host"])

    if self.reveal_strategy == "encrypted_reveal_in_host" and self.reveal_every_iter:
        raise PermissionError("reveal strategy: encrypted_reveal_in_host mode is not allowed to reveal every iter.")
    self.check_boolean(self.reveal_every_iter, descr)
    self.callback_param.check()
    self.cv_param.check()
    return True

`homo_nn_param` ¶

Classes¶


HomoNNParam (BaseParam) ¶

Parameters used for Homo Neural Network.

Parameters:

Name	Type	Description	Default
`secure_aggregate`	`bool`	Whether to enable secure aggregation.	`True`
`aggregate_every_n_epoch`	`int`	Aggregate the model every n epochs.	`1`
`config_type`	`str`	Config type; `check()` accepts "nn", "keras" or "pytorch".	`'nn'`
`nn_define`	`dict`	A dict describing the structure of the neural network.	`None`
`optimizer`	`Union[str, dict, types.SimpleNamespace]`	Optimizer method. Accepts either (1) a string, one of "Adadelta", "Adagrad", "Adam", "Adamax", "Nadam", "RMSprop", "SGD", or (2) a dict with a required key-value pair keyed by "optimizer" and optional key-value pairs such as the learning rate.	`'SGD'`
`loss`	`str`	Loss function.	`None`
`metrics`	`Union[str, list]`	Metric name or list of metric names.	`None`
`max_iter`	`int`	Maximum number of aggregation iterations in training.	`100`
`batch_size`	`int`	Batch size used when updating the model. -1 means all data is used in one batch, i.e. no mini-batch strategy.	`-1`
`early_stop`	`Union[str, dict, types.SimpleNamespace]`	Method used to judge convergence. a) diff: use the difference of loss between two iterations; b) weight_diff: use the difference between the weights of two consecutive iterations; c) abs: use the absolute value of loss, i.e. if loss < eps, it is converged.	`'diff'`
`encode_label`	`bool`	Whether to encode the label as one-hot.	`False`
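
The three `early_stop` criteria can be illustrated with a small standalone function. This is a minimal sketch, not FATE's implementation: the name `has_converged`, its arguments, and the choice of the Euclidean norm for `weight_diff` are assumptions made for illustration.

```python
import math


def has_converged(method, eps, loss=None, prev_loss=None, weights=None, prev_weights=None):
    """Hypothetical helper mirroring the three documented early_stop criteria."""
    if method == "diff":
        # Converged when the loss changed by less than eps between two iterations.
        return abs(loss - prev_loss) < eps
    if method == "weight_diff":
        # Converged when the weights moved by less than eps.
        # Measuring the move with the Euclidean norm is an assumption.
        distance = math.sqrt(sum((w - p) ** 2 for w, p in zip(weights, prev_weights)))
        return distance < eps
    if method == "abs":
        # Converged when the absolute value of the loss is below eps.
        return abs(loss) < eps
    raise ValueError(f"unsupported early_stop method: {method}")
```

With `eps=1e-3`, for example, `"diff"` reports convergence once two consecutive losses differ by less than 0.001, regardless of how large the loss itself still is.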

Source code in federatedml/param/homo_nn_param.py

class HomoNNParam(BaseParam):
    """
    Parameters used for Homo Neural Network.

    Parameters
    ----------
    secure_aggregate : bool
        enable secure aggregation or not, defaults to True.
    aggregate_every_n_epoch : int
        aggregate model every n epoch, defaults to 1.
    config_type : {"nn", "keras", "pytorch"}
        config type
    nn_define : dict
        a dict represents the structure of neural network.
    optimizer : str or dict
        optimizer method, accept following types:
        1. a string, one of "Adadelta", "Adagrad", "Adam", "Adamax", "Nadam", "RMSprop", "SGD"
        2. a dict, with a required key-value pair keyed by "optimizer",
            with optional key-value pairs such as learning rate.
        defaults to "SGD"
    loss : str
        loss
    metrics: str or list of str
        metrics
    max_iter: int
        the maximum iteration for aggregation in training.
    batch_size : int
        batch size when updating model.
        -1 means use all data in a batch. i.e. Not to use mini-batch strategy.
        defaults to -1.
    early_stop : {'diff', 'weight_diff', 'abs'}
        Method used to judge convergence.
            a) diff: Use the difference of loss between two iterations to judge convergence.
            b) weight_diff: Use the difference between the weights of two consecutive iterations.
            c) abs: Use the absolute value of loss to judge convergence, i.e. if loss < eps, it is converged.
    encode_label : bool
        encode label to one_hot.
    """

    def __init__(
        self,
        api_version: int = 0,
        secure_aggregate: bool = True,
        aggregate_every_n_epoch: int = 1,
        config_type: str = "nn",
        nn_define: dict = None,
        optimizer: typing.Union[str, dict, SimpleNamespace] = "SGD",
        loss: str = None,
        metrics: typing.Union[str, list] = None,
        max_iter: int = 100,
        batch_size: int = -1,
        early_stop: typing.Union[str, dict, SimpleNamespace] = "diff",
        encode_label: bool = False,
        predict_param=PredictParam(),
        cv_param=CrossValidationParam(),
        callback_param=CallbackParam(),
    ):
        super(HomoNNParam, self).__init__()

        self.api_version = api_version

        self.secure_aggregate = secure_aggregate
        self.aggregate_every_n_epoch = aggregate_every_n_epoch

        self.config_type = config_type
        self.nn_define = nn_define or []
        self.encode_label = encode_label

        self.batch_size = batch_size
        self.max_iter = max_iter
        self.early_stop = early_stop
        self.metrics = metrics
        self.optimizer = optimizer
        self.loss = loss

        self.predict_param = copy.deepcopy(predict_param)
        self.cv_param = copy.deepcopy(cv_param)
        self.callback_param = copy.deepcopy(callback_param)

    def check(self):
        supported_config_type = ["nn", "keras", "pytorch"]
        if self.config_type not in supported_config_type:
            raise ValueError(f"config_type should be one of {supported_config_type}")
        self.early_stop = _parse_early_stop(self.early_stop)
        self.metrics = _parse_metrics(self.metrics)
        self.optimizer = _parse_optimizer(self.optimizer)

    def generate_pb(self):
        from federatedml.protobuf.generated import nn_model_meta_pb2

        pb = nn_model_meta_pb2.HomoNNParam()
        pb.secure_aggregate = self.secure_aggregate
        pb.encode_label = self.encode_label
        pb.aggregate_every_n_epoch = self.aggregate_every_n_epoch
        pb.config_type = self.config_type

        if self.config_type == "nn":
            for layer in self.nn_define:
                pb.nn_define.append(json.dumps(layer))
        elif self.config_type == "keras":
            pb.nn_define.append(json.dumps(self.nn_define))
        elif self.config_type == "pytorch":
            for layer in self.nn_define:
                pb.nn_define.append(json.dumps(layer))

        pb.batch_size = self.batch_size
        pb.max_iter = self.max_iter

        pb.early_stop.early_stop = self.early_stop.converge_func
        pb.early_stop.eps = self.early_stop.eps

        for metric in self.metrics:
            pb.metrics.append(metric)

        pb.optimizer.optimizer = self.optimizer.optimizer
        pb.optimizer.args = json.dumps(self.optimizer.kwargs)
        pb.loss = self.loss
        return pb

    def restore_from_pb(self, pb, is_warm_start_mode: bool = False):
        self.secure_aggregate = pb.secure_aggregate
        self.encode_label = pb.encode_label
        self.aggregate_every_n_epoch = pb.aggregate_every_n_epoch
        self.config_type = pb.config_type

        if self.config_type == "nn":
            for layer in pb.nn_define:
                self.nn_define.append(json.loads(layer))
        elif self.config_type == "keras":
            self.nn_define = json.loads(pb.nn_define[0])
        elif self.config_type == "pytorch":
            for layer in pb.nn_define:
                self.nn_define.append(json.loads(layer))
        else:
            raise ValueError(f"{self.config_type} is not supported")

        self.batch_size = pb.batch_size
        if not is_warm_start_mode:
            self.max_iter = pb.max_iter
            self.optimizer = _parse_optimizer(
                dict(optimizer=pb.optimizer.optimizer, **json.loads(pb.optimizer.args))
            )
        self.early_stop = _parse_early_stop(
            dict(early_stop=pb.early_stop.early_stop, eps=pb.early_stop.eps)
        )
        self.metrics = list(pb.metrics)
        self.loss = pb.loss
        return pb

__init__(self, api_version=0, secure_aggregate=True, aggregate_every_n_epoch=1, config_type='nn', nn_define=None, optimizer='SGD', loss=None, metrics=None, max_iter=100, batch_size=-1, early_stop='diff', encode_label=False, predict_param=PredictParam(), cv_param=CrossValidationParam(), callback_param=CallbackParam()) special ¶

Source code in federatedml/param/homo_nn_param.py

def __init__(
    self,
    api_version: int = 0,
    secure_aggregate: bool = True,
    aggregate_every_n_epoch: int = 1,
    config_type: str = "nn",
    nn_define: dict = None,
    optimizer: typing.Union[str, dict, SimpleNamespace] = "SGD",
    loss: str = None,
    metrics: typing.Union[str, list] = None,
    max_iter: int = 100,
    batch_size: int = -1,
    early_stop: typing.Union[str, dict, SimpleNamespace] = "diff",
    encode_label: bool = False,
    predict_param=PredictParam(),
    cv_param=CrossValidationParam(),
    callback_param=CallbackParam(),
):
    super(HomoNNParam, self).__init__()

    self.api_version = api_version

    self.secure_aggregate = secure_aggregate
    self.aggregate_every_n_epoch = aggregate_every_n_epoch

    self.config_type = config_type
    self.nn_define = nn_define or []
    self.encode_label = encode_label

    self.batch_size = batch_size
    self.max_iter = max_iter
    self.early_stop = early_stop
    self.metrics = metrics
    self.optimizer = optimizer
    self.loss = loss

    self.predict_param = copy.deepcopy(predict_param)
    self.cv_param = copy.deepcopy(cv_param)
    self.callback_param = copy.deepcopy(callback_param)

check(self) ¶

Source code in federatedml/param/homo_nn_param.py

def check(self):
    supported_config_type = ["nn", "keras", "pytorch"]
    if self.config_type not in supported_config_type:
        raise ValueError(f"config_type should be one of {supported_config_type}")
    self.early_stop = _parse_early_stop(self.early_stop)
    self.metrics = _parse_metrics(self.metrics)
    self.optimizer = _parse_optimizer(self.optimizer)
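
`check()` replaces `optimizer` with a parsed object, and `generate_pb` later reads its `optimizer` and `kwargs` attributes. The helper `_parse_optimizer` is not shown on this page, so the following is only a sketch of one normalization consistent with that usage. The name `normalize_optimizer` and its error handling are assumptions.

```python
from types import SimpleNamespace


def normalize_optimizer(spec):
    """Hypothetical stand-in: turn a str or dict spec into an object with .optimizer and .kwargs."""
    if isinstance(spec, SimpleNamespace):
        # Already parsed, return unchanged.
        return spec
    if isinstance(spec, str):
        # A bare name carries no extra arguments.
        return SimpleNamespace(optimizer=spec, kwargs={})
    if isinstance(spec, dict):
        if not spec.get("optimizer"):
            raise ValueError(f"optimizer config {spec} needs an 'optimizer' key")
        # Everything except the name is treated as an optimizer argument.
        kwargs = {key: value for key, value in spec.items() if key != "optimizer"}
        return SimpleNamespace(optimizer=spec["optimizer"], kwargs=kwargs)
    raise ValueError(f"unsupported optimizer spec: {spec!r}")
```

A dict such as `{"optimizer": "Adam", "learning_rate": 0.01}` would then yield `optimizer == "Adam"` and `kwargs == {"learning_rate": 0.01}`, which is the same flat shape that `restore_from_pb` rebuilds from the protobuf fields.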

generate_pb(self) ¶

Source code in federatedml/param/homo_nn_param.py

def generate_pb(self):
    from federatedml.protobuf.generated import nn_model_meta_pb2

    pb = nn_model_meta_pb2.HomoNNParam()
    pb.secure_aggregate = self.secure_aggregate
    pb.encode_label = self.encode_label
    pb.aggregate_every_n_epoch = self.aggregate_every_n_epoch
    pb.config_type = self.config_type

    if self.config_type == "nn":
        for layer in self.nn_define:
            pb.nn_define.append(json.dumps(layer))
    elif self.config_type == "keras":
        pb.nn_define.append(json.dumps(self.nn_define))
    elif self.config_type == "pytorch":
        for layer in self.nn_define:
            pb.nn_define.append(json.dumps(layer))

    pb.batch_size = self.batch_size
    pb.max_iter = self.max_iter

    pb.early_stop.early_stop = self.early_stop.converge_func
    pb.early_stop.eps = self.early_stop.eps

    for metric in self.metrics:
        pb.metrics.append(metric)

    pb.optimizer.optimizer = self.optimizer.optimizer
    pb.optimizer.args = json.dumps(self.optimizer.kwargs)
    pb.loss = self.loss
    return pb

restore_from_pb(self, pb, is_warm_start_mode=False) ¶

Source code in federatedml/param/homo_nn_param.py

def restore_from_pb(self, pb, is_warm_start_mode: bool = False):
    self.secure_aggregate = pb.secure_aggregate
    self.encode_label = pb.encode_label
    self.aggregate_every_n_epoch = pb.aggregate_every_n_epoch
    self.config_type = pb.config_type

    if self.config_type == "nn":
        for layer in pb.nn_define:
            self.nn_define.append(json.loads(layer))
    elif self.config_type == "keras":
        self.nn_define = json.loads(pb.nn_define[0])
    elif self.config_type == "pytorch":
        for layer in pb.nn_define:
            self.nn_define.append(json.loads(layer))
    else:
        raise ValueError(f"{self.config_type} is not supported")

    self.batch_size = pb.batch_size
    if not is_warm_start_mode:
        self.max_iter = pb.max_iter
        self.optimizer = _parse_optimizer(
            dict(optimizer=pb.optimizer.optimizer, **json.loads(pb.optimizer.args))
        )
    self.early_stop = _parse_early_stop(
        dict(early_stop=pb.early_stop.early_stop, eps=pb.early_stop.eps)
    )
    self.metrics = list(pb.metrics)
    self.loss = pb.loss
    return pb
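
The way `nn_define` is stored depends on `config_type`: for "nn" and "pytorch" each layer becomes its own JSON string, while for "keras" the whole definition is stored as a single JSON string. The sketch below reproduces that round trip with a plain Python list standing in for the repeated protobuf field. The function names and the example layer dicts are hypothetical, and unlike `generate_pb` the sketch raises on an unknown `config_type` in both directions.

```python
import json


def dump_nn_define(config_type, nn_define):
    """Serialize nn_define into a list of JSON strings (stand-in for pb.nn_define)."""
    if config_type in ("nn", "pytorch"):
        # One JSON string per layer.
        return [json.dumps(layer) for layer in nn_define]
    if config_type == "keras":
        # The whole definition as a single entry.
        return [json.dumps(nn_define)]
    raise ValueError(f"{config_type} is not supported")


def load_nn_define(config_type, entries):
    """Inverse of dump_nn_define."""
    if config_type in ("nn", "pytorch"):
        return [json.loads(entry) for entry in entries]
    if config_type == "keras":
        return json.loads(entries[0])
    raise ValueError(f"{config_type} is not supported")


# Hypothetical layer definitions, used only to exercise the round trip.
layers = [{"layer": "Dense", "units": 4}, {"layer": "Dense", "units": 1}]
restored = load_nn_define("nn", dump_nn_define("nn", layers))
```

In the source above, `restore_from_pb` appends to `self.nn_define` for "nn" and "pytorch" instead of replacing it, so restoring into a parameter object that already holds layers would extend the existing list.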

`homo_onehot_encoder_param` ¶

Classes¶


HomoOneHotParam (BaseParam) ¶

Parameters:

Name	Type	Description	Default
`transform_col_indexes`	`list or int`	Specifies which columns need to be calculated. -1 represents all columns.	`-1`
`need_run`	`bool`	Indicates whether this module needs to be run.	`True`
`need_alignment`	`bool`	Indicates whether alignment of features is turned on.	`True`
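
`need_alignment` is described only briefly above. One plausible reading, sketched below, is that parties holding the same features may observe different category values, so the one-hot columns are aligned to the union of those values before encoding. The function names, the `column_value` naming scheme and the union rule are all assumptions, not taken from the FATE source.

```python
def aligned_header(column, *party_values):
    """Union the category values seen by every party for one column (assumed rule)."""
    merged = set()
    for values in party_values:
        merged.update(values)
    # Sort by string form so every party derives the same column order.
    return [f"{column}_{value}" for value in sorted(merged, key=str)]


def one_hot(column, value, header):
    """Encode a single value against the aligned header."""
    return [1 if name == f"{column}_{value}" else 0 for name in header]


header = aligned_header("color", ["red", "blue"], ["blue", "green"])
encoded = one_hot("color", "green", header)
```

Under this reading, a party that never saw "green" locally would still emit a `color_green` column, so the encoded vectors have the same width on every party.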

Source code in federatedml/param/homo_onehot_encoder_param.py

class HomoOneHotParam(BaseParam):
    """

    Parameters
    ----------

    transform_col_indexes: list or int, default: -1
        Specify which columns need to be calculated. -1 represents all columns.

    need_run: bool, default True
        Indicate whether this module needs to be run.

    need_alignment: bool, default True
        Indicate whether alignment of features is turned on.

    """

    def __init__(self, transform_col_indexes=-1, transform_col_names=None, need_run=True, need_alignment=True):
        super(HomoOneHotParam, self).__init__()
        if transform_col_names is None:
            transform_col_names = []
        self.transform_col_indexes = transform_col_indexes
        self.transform_col_names = transform_col_names
        self.need_run = need_run
        self.need_alignment = need_alignment

    def check(self):
        descr = "One-hot encoder with alignment param's"
        self.check_defined_type(self.transform_col_indexes, descr, ['list', 'int'])
        self.check_boolean(self.need_run, descr)
        self.check_boolean(self.need_alignment, descr)
        return True

__init__(self, transform_col_indexes=-1, transform_col_names=None, need_run=True, need_alignment=True) special ¶

Source code in federatedml/param/homo_onehot_encoder_param.py

def __init__(self, transform_col_indexes=-1, transform_col_names=None, need_run=True, need_alignment=True):
    super(HomoOneHotParam, self).__init__()
    if transform_col_names is None:
        transform_col_names = []
    self.transform_col_indexes = transform_col_indexes
    self.transform_col_names = transform_col_names
    self.need_run = need_run
    self.need_alignment = need_alignment

check(self) ¶

Source code in federatedml/param/homo_onehot_encoder_param.py

def check(self):
    descr = "One-hot encoder with alignment param's"
    self.check_defined_type(self.transform_col_indexes, descr, ['list', 'int'])
    self.check_boolean(self.need_run, descr)
    self.check_boolean(self.need_alignment, descr)
    return True

`init_model_param` ¶

Classes¶


InitParam (BaseParam) ¶

Parameters used to initialize a model.

Parameters:

Name	Type	Description	Default
`init_method`	`{'random_uniform', 'random_normal', 'ones', 'zeros', 'const'}`	Initialization method.	`'random_uniform'`
`init_const`	`int or float`	Required when init_method is 'const'. Specifies the constant.	`1`
`fit_intercept`	`bool`	Whether to initialize the intercept or not.	`True`
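
To make the five `init_method` values concrete, here is a minimal sketch using only the standard library. It is not FATE's initializer: the function name `init_weights`, the sampling ranges (uniform on [0, 1), standard normal), and the choice to give the intercept one extra slot filled by the same method are assumptions.

```python
import random


def init_weights(n_features, init_method="random_uniform", init_const=1,
                 fit_intercept=True, random_seed=None):
    """Hypothetical initializer covering the documented init_method values."""
    rng = random.Random(random_seed)  # a fixed seed gives reproducible values
    # Assumption: the intercept is stored as one extra trailing weight.
    size = n_features + 1 if fit_intercept else n_features
    method = init_method.lower()
    if method == "random_uniform":
        return [rng.random() for _ in range(size)]
    if method == "random_normal":
        return [rng.gauss(0.0, 1.0) for _ in range(size)]
    if method == "ones":
        return [1.0] * size
    if method == "zeros":
        return [0.0] * size
    if method == "const":
        return [float(init_const)] * size
    raise ValueError(f"init_method {init_method} not supported")
```

Passing the same `random_seed` twice returns identical weights for the random methods, which is the usual reason to expose a seed on an initializer.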

Source code in federatedml/param/init_model_param.py

class InitParam(BaseParam):
    """
    Initialize Parameters used in initializing a model.

    Parameters
    ----------
    init_method : {'random_uniform', 'random_normal', 'ones', 'zeros' or 'const'}
        Initial method.

    init_const : int or float, default: 1
        Required when init_method is 'const'. Specify the constant.

    fit_intercept : bool, default: True
        Whether to initialize the intercept or not.

    """

    def __init__(self, init_method='random_uniform', init_const=1, fit_intercept=True, random_seed=None):
        super().__init__()
        self.init_method = init_method
        self.init_const = init_const
        self.fit_intercept = fit_intercept
        self.random_seed = random_seed

    def check(self):
        if type(self.init_method).__name__ != "str":
            raise ValueError(
                "Init param's init_method {} not supported, should be str type".format(self.init_method))
        else:
            self.init_method = self.init_method.lower()
            if self.init_method not in ['random_uniform', 'random_normal', 'ones', 'zeros', 'const']:
                raise ValueError(
                    "Init param's init_method {} not supported, init_method should be one of 'random_uniform',"
                    " 'random_normal', 'ones', 'zeros' or 'const'".format(self.init_method))

        if type(self.init_const).__name__ not in ['int', 'float']:
            raise ValueError(
                "Init param's init_const {} not supported, should be int or float type".format(self.init_const))

        if type(self.fit_intercept).__name__ != 'bool':
            raise ValueError(
                "Init param's fit_intercept {} not supported, should be bool type".format(self.fit_intercept))

        if self.random_seed is not None:
            if type(self.random_seed).__name__ != 'int':
                raise ValueError(
                    "Init param's random_seed {} not supported, should be int type".format(self.random_seed))

        return True

__init__(self, init_method='random_uniform', init_const=1, fit_intercept=True, random_seed=None) special ¶

Source code in federatedml/param/init_model_param.py

def __init__(self, init_method='random_uniform', init_const=1, fit_intercept=True, random_seed=None):
    super().__init__()
    self.init_method = init_method
    self.init_const = init_const
    self.fit_intercept = fit_intercept
    self.random_seed = random_seed

check(self) ¶

Source code in federatedml/param/init_model_param.py

def check(self):
    if type(self.init_method).__name__ != "str":
        raise ValueError(
            "Init param's init_method {} not supported, should be str type".format(self.init_method))
    else:
        self.init_method = self.init_method.lower()
        if self.init_method not in ['random_uniform', 'random_normal', 'ones', 'zeros', 'const']:
            raise ValueError(
                "Init param's init_method {} not supported, init_method should be one of 'random_uniform',"
                " 'random_normal', 'ones', 'zeros' or 'const'".format(self.init_method))

    if type(self.init_const).__name__ not in ['int', 'float']:
        raise ValueError(
            "Init param's init_const {} not supported, should be int or float type".format(self.init_const))

    if type(self.fit_intercept).__name__ != 'bool':
        raise ValueError(
            "Init param's fit_intercept {} not supported, should be bool type".format(self.fit_intercept))

    if self.random_seed is not None:
        if type(self.random_seed).__name__ != 'int':
            raise ValueError(
                "Init param's random_seed {} not supported, should be int type".format(self.random_seed))

    return True

`intersect_param` ¶

DEFAULT_RANDOM_BIT ¶

Classes¶


EncodeParam (BaseParam) ¶

Define the hash method for raw intersect method

Parameters:

Name	Type	Description	Default
`salt`	`str`	Salt appended to the source data string, i.e. str = str + salt. Defaults to the empty string.	`''`
`encode_method`	`{"none", "md5", "sha1", "sha224", "sha256", "sha384", "sha512", "sm3"}`	Hash method applied to the source data string; supports md5, sha1, sha224, sha256, sha384, sha512 and sm3. Defaults to 'none'.	`'none'`
`base64`	`bool`	If True, the hash result is converted to base64. Defaults to False.	`False`
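
The interplay of `salt`, `encode_method` and `base64` can be sketched with `hashlib`. This is an illustration rather than FATE's hashing code: the function name `encode_id`, passing ids through untouched when the method is "none", and base64-encoding the raw digest bytes are assumptions. Availability of "sm3" depends on the local OpenSSL build, so the sketch checks `hashlib.algorithms_available` first.

```python
import base64
import hashlib


def encode_id(raw_id, salt="", encode_method="none", use_base64=False):
    """Hypothetical helper: hash (raw_id + salt) and return hex or base64 text."""
    method = encode_method.lower()
    if method == "none":
        # Assumption: without a hash method the id is left as it is.
        return raw_id
    if method not in hashlib.algorithms_available:
        raise ValueError(f"hash method {method} is not available in this Python build")
    # The salt is appended to the source string: str = str + salt.
    digest = hashlib.new(method, (raw_id + salt).encode("utf-8")).digest()
    if use_base64:
        return base64.b64encode(digest).decode("ascii")
    return digest.hex()
```

Because the salt is simply appended, `encode_id("ab", salt="c", encode_method="sha256")` and `encode_id("abc", encode_method="sha256")` produce the same value, so both parties must agree on the salt for their hashed ids to match.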

Source code in federatedml/param/intersect_param.py

class EncodeParam(BaseParam):
    """
    Define the hash method for raw intersect method

    Parameters
    ----------
    salt: str
        salt appended to the src data string, i.e. str = str + salt; defaults to empty string

    encode_method: {"none", "md5", "sha1", "sha224", "sha256", "sha384", "sha512", "sm3"}
        the hash method of src data string, supports md5, sha1, sha224, sha256, sha384, sha512, sm3; defaults to 'none'

    base64: bool
        if True, the result of hash will be converted to base64; defaults to False
    """

    def __init__(self, salt='', encode_method='none', base64=False):
        super().__init__()
        self.salt = salt
        self.encode_method = encode_method
        self.base64 = base64

    def check(self):
        if type(self.salt).__name__ != "str":
            raise ValueError(
                "encode param's salt {} not supported, should be str type".format(
                    self.salt))

        descr = "encode param's "

        self.encode_method = self.check_and_change_lower(self.encode_method,
                                                         ["none", consts.MD5, consts.SHA1, consts.SHA224,
                                                          consts.SHA256, consts.SHA384, consts.SHA512,
                                                          consts.SM3],
                                                         descr)

        if type(self.base64).__name__ != "bool":
            raise ValueError(
                "hash param's base64 {} not supported, should be bool type".format(self.base64))

        LOGGER.debug("Finish EncodeParam check!")
        LOGGER.warning("'EncodeParam' will be replaced by 'RAWParam' in future release. "
                       "Please do not rely on current param naming in application.")
        return True

__init__(self, salt='', encode_method='none', base64=False) special ¶

Source code in federatedml/param/intersect_param.py

def __init__(self, salt='', encode_method='none', base64=False):
    super().__init__()
    self.salt = salt
    self.encode_method = encode_method
    self.base64 = base64

check(self) ¶

Source code in federatedml/param/intersect_param.py

def check(self):
    if type(self.salt).__name__ != "str":
        raise ValueError(
            "encode param's salt {} not supported, should be str type".format(
                self.salt))

    descr = "encode param's "

    self.encode_method = self.check_and_change_lower(self.encode_method,
                                                     ["none", consts.MD5, consts.SHA1, consts.SHA224,
                                                      consts.SHA256, consts.SHA384, consts.SHA512,
                                                      consts.SM3],
                                                     descr)

    if type(self.base64).__name__ != "bool":
        raise ValueError(
            "hash param's base64 {} not supported, should be bool type".format(self.base64))

    LOGGER.debug("Finish EncodeParam check!")
    LOGGER.warning("'EncodeParam' will be replaced by 'RAWParam' in future release. "
                   "Please do not rely on current param naming in application.")
    return True


RAWParam (BaseParam) ¶

Specify parameters for raw intersect method

Parameters:

Name	Type	Description	Default
`use_hash`	`bool`	Whether to hash ids for raw intersect.	`False`
`salt`	`str`	Salt appended to the source data string, i.e. str = str + salt. Defaults to the empty string.	`''`
`hash_method`	`str`	Hash method applied to the source data string; supports md5, sha1, sha224, sha256, sha384, sha512 and sm3. Defaults to 'none'.	`'none'`
`base64`	`bool`	If True, the hash result is converted to base64. Defaults to False.	`False`
`join_role`	`{"guest", "host"}`	Role that joins the ids; only "guest" and "host" are supported, and the setting is effective only for raw. If it is "guest", the host sends its ids to the guest, which finds the intersection; if it is "host", the guest sends its ids to the host. Defaults to "guest".	`'guest'`
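
The effect of `join_role` and `use_hash` on the data flow can be sketched as follows. This is a simplified single-process illustration, not the FATE protocol: the function names are hypothetical, SHA-256 is hard-coded for brevity, and the step in which the joining role shares the result with the other party is omitted.

```python
import hashlib


def _tag(raw_id, salt, use_hash):
    """Value that represents an id during comparison (hashed or plain)."""
    if not use_hash:
        return raw_id
    return hashlib.sha256((raw_id + salt).encode("utf-8")).hexdigest()


def raw_intersect(guest_ids, host_ids, join_role="guest", use_hash=False, salt=""):
    """Hypothetical sketch: the non-joining role sends its ids, the joining role intersects."""
    role = join_role.lower()
    if role == "guest":
        sender_ids, joiner_ids = host_ids, guest_ids
    elif role == "host":
        sender_ids, joiner_ids = guest_ids, host_ids
    else:
        raise ValueError("join_role must be 'guest' or 'host'")
    # What the sender transmits: plain ids, or their hashes when use_hash is True.
    received = {_tag(item, salt, use_hash) for item in sender_ids}
    # The joiner tags its own ids the same way and keeps the matches.
    matched = sorted(item for item in joiner_ids if _tag(item, salt, use_hash) in received)
    return role, matched
```

Either role yields the same intersection; the setting only decides which party receives the other's ids (or hashes) and performs the comparison.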

Source code in federatedml/param/intersect_param.py

class RAWParam(BaseParam):
    """
    Specify parameters for raw intersect method

    Parameters
    ----------
    use_hash: bool
        whether to hash ids for raw intersect

    salt: str
        salt appended to the src data string, i.e. str = str + salt; defaults to empty string

    hash_method: str
        the hash method of src data string, supports md5, sha1, sha224, sha256, sha384, sha512, sm3; defaults to 'none'

    base64: bool
        if True, the result of hash will be converted to base64; defaults to False

    join_role: {"guest", "host"}
        role who joins ids, supports "guest" and "host" only and effective only for raw.
        If it is "guest", the host will send its ids to guest and find the intersection of
        ids in guest; if it is "host", the guest will send its ids to host. Defaults to "guest".
    """

    def __init__(self, use_hash=False, salt='', hash_method='none', base64=False, join_role=consts.GUEST):
        super().__init__()
        self.use_hash = use_hash
        self.salt = salt
        self.hash_method = hash_method
        self.base64 = base64
        self.join_role = join_role

    def check(self):
        descr = "raw param's "

        self.check_boolean(self.use_hash, f"{descr}use_hash")
        self.check_string(self.salt, f"{descr}salt")

        self.hash_method = self.check_and_change_lower(self.hash_method,
                                                       ["none", consts.MD5, consts.SHA1, consts.SHA224,
                                                        consts.SHA256, consts.SHA384, consts.SHA512,
                                                        consts.SM3],
                                                       f"{descr}hash_method")

        self.check_boolean(self.base64, f"{descr}base64")
        self.join_role = self.check_and_change_lower(self.join_role, [consts.GUEST, consts.HOST], f"{descr}join_role")

        LOGGER.debug("Finish RAWParam check!")
        return True

__init__(self, use_hash=False, salt='', hash_method='none', base64=False, join_role='guest') special ¶

Source code in federatedml/param/intersect_param.py

def __init__(self, use_hash=False, salt='', hash_method='none', base64=False, join_role=consts.GUEST):
    super().__init__()
    self.use_hash = use_hash
    self.salt = salt
    self.hash_method = hash_method
    self.base64 = base64
    self.join_role = join_role

check(self) ¶

Source code in federatedml/param/intersect_param.py

def check(self):
    descr = "raw param's "

    self.check_boolean(self.use_hash, f"{descr}use_hash")
    self.check_string(self.salt, f"{descr}salt")

    self.hash_method = self.check_and_change_lower(self.hash_method,
                                                   ["none", consts.MD5, consts.SHA1, consts.SHA224,
                                                    consts.SHA256, consts.SHA384, consts.SHA512,
                                                    consts.SM3],
                                                   f"{descr}hash_method")

    self.check_boolean(self.base64, f"{descr}base64")
    self.join_role = self.check_and_change_lower(self.join_role, [consts.GUEST, consts.HOST], f"{descr}join_role")

    LOGGER.debug("Finish RAWParam check!")
    return True


RSAParam (BaseParam) ¶

Specify parameters for RSA intersect method

Parameters:

Name	Type	Description	Default
`salt`	`str`	Salt appended to the source data string, i.e. str = str + salt. Defaults to ''.	`''`
`hash_method`	`str`	Hash method applied to the source data string; supports sha256, sha384, sha512 and sm3. Defaults to sha256.	`'sha256'`
`final_hash_method`	`str`	Hash method applied to the result data string; supports md5, sha1, sha224, sha256, sha384, sha512 and sm3. Defaults to sha256.	`'sha256'`
`split_calculation`	`bool`	If True, host and guest split the operations for faster performance; recommended for large data sets.	`False`
`random_base_fraction`	`positive float`	If not None, generate (fraction * public key id count) values of r for encryption and reuse the generated r. Values greater than 0.99 are taken as 1, and values less than 0.01 are rounded up to 0.01.	`None`
`key_length`	`int`	Bit count of the RSA key; must be >= 1024. Defaults to 1024.	`1024`
`random_bit`	`positive int`	Defines the size of the blinding factor in the RSA algorithm. Defaults to 128.	`128`
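
The roles of `hash_method`, `final_hash_method` and `random_bit` are easier to follow in a toy RSA blind-signature intersection. The sketch below runs both parties in one process and is not FATE's implementation: the assignment of the private key to the host, the function names, and the tiny modulus built from two Mersenne primes are assumptions, and the key is far too small for real use (the documented minimum is 1024 bits).

```python
import hashlib
import math
import secrets

# Toy RSA key held by the host (assumption). Real keys must be >= 1024 bits.
P, Q = 2**61 - 1, 2**89 - 1          # two Mersenne primes
N = P * Q
E = 65537
D = pow(E, -1, (P - 1) * (Q - 1))    # private exponent (Python 3.8+)


def hash_to_int(raw_id, salt="", hash_method="sha256"):
    """First hash: map (id + salt) to an integer modulo N."""
    digest = hashlib.new(hash_method, (raw_id + salt).encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % N


def final_hash(signed_value, final_hash_method="sha256"):
    """Final hash applied to the signed value before comparison."""
    return hashlib.new(final_hash_method, str(signed_value).encode("utf-8")).hexdigest()


def rsa_intersect(guest_ids, host_ids, random_bit=128):
    # Host signs its own hashed ids and publishes only their final hashes.
    host_tokens = {final_hash(pow(hash_to_int(i), D, N)) for i in host_ids}
    common = set()
    for raw_id in guest_ids:
        # Guest draws a blinding factor r of at most random_bit bits, coprime to N.
        while True:
            r = secrets.randbits(random_bit)
            if r > 1 and math.gcd(r, N) == 1:
                break
        blinded = hash_to_int(raw_id) * pow(r, E, N) % N   # guest -> host
        signed = pow(blinded, D, N)                        # host signs blindly
        unblinded = signed * pow(r, -1, N) % N             # equals hash^D mod N
        if final_hash(unblinded) in host_tokens:
            common.add(raw_id)
    return common
```

Since `(h * r^E)^D = h^D * r (mod N)`, multiplying by the inverse of `r` recovers the host's signature on `h` without the host ever seeing `h` itself; the final hash keeps the host's signed values from being reused directly.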

Source code in federatedml/param/intersect_param.py

class RSAParam(BaseParam):
    """
    Specify parameters for RSA intersect method

    Parameters
    ----------
    salt: str
        the src data string will be str = str + salt, default ''

    hash_method: str
        the hash method of src data string, support sha256, sha384, sha512, sm3, default sha256

    final_hash_method: str
        the hash method of result data string, support md5, sha1, sha224, sha256, sha384, sha512, sm3, default sha256

    split_calculation: bool
        if True, Host & Guest split operations for faster performance, recommended on large data set

    random_base_fraction: positive float
        if not None, generate (fraction * public key id count) of r for encryption and reuse generated r;
        note that value greater than 0.99 will be taken as 1, and value less than 0.01 will be rounded up to 0.01

    key_length: int
        value >= 1024, bit count of rsa key, default 1024

    random_bit: positive int
        it will define the size of blinding factor in rsa algorithm, default 128

    """

    def __init__(self, salt='', hash_method='sha256',  final_hash_method='sha256',
                 split_calculation=False, random_base_fraction=None, key_length=consts.DEFAULT_KEY_LENGTH,
                 random_bit=DEFAULT_RANDOM_BIT):
        super().__init__()
        self.salt = salt
        self.hash_method = hash_method
        self.final_hash_method = final_hash_method
        self.split_calculation = split_calculation
        self.random_base_fraction = random_base_fraction
        self.key_length = key_length
        self.random_bit = random_bit

    def check(self):
        descr = "rsa param's "
        self.check_string(self.salt, f"{descr}salt")

        self.hash_method = self.check_and_change_lower(self.hash_method,
                                                       [consts.SHA256, consts.SHA384, consts.SHA512, consts.SM3],
                                                       f"{descr}hash_method")

        self.final_hash_method = self.check_and_change_lower(self.final_hash_method,
                                                             [consts.MD5, consts.SHA1, consts.SHA224,
                                                              consts.SHA256, consts.SHA384, consts.SHA512,
                                                              consts.SM3],
                                                             f"{descr}final_hash_method")

        self.check_boolean(self.split_calculation, f"{descr}split_calculation")

        if self.random_base_fraction:
            self.check_positive_number(self.random_base_fraction, f"{descr}random_base_fraction")
            self.check_decimal_float(self.random_base_fraction, f"{descr}random_base_fraction")

        self.check_positive_integer(self.key_length, f"{descr}key_length")
        if self.key_length < 1024:
            raise ValueError(f"key length must be >= 1024")
        self.check_positive_integer(self.random_bit, f"{descr}random_bit")

        LOGGER.debug("Finish RSAParam parameter check!")
        return True

__init__(self, salt='', hash_method='sha256', final_hash_method='sha256', split_calculation=False, random_base_fraction=None, key_length=1024, random_bit=128) special ¶

Source code in federatedml/param/intersect_param.py

def __init__(self, salt='', hash_method='sha256',  final_hash_method='sha256',
             split_calculation=False, random_base_fraction=None, key_length=consts.DEFAULT_KEY_LENGTH,
             random_bit=DEFAULT_RANDOM_BIT):
    super().__init__()
    self.salt = salt
    self.hash_method = hash_method
    self.final_hash_method = final_hash_method
    self.split_calculation = split_calculation
    self.random_base_fraction = random_base_fraction
    self.key_length = key_length
    self.random_bit = random_bit

check(self) ¶

Source code in federatedml/param/intersect_param.py

def check(self):
    descr = "rsa param's "
    self.check_string(self.salt, f"{descr}salt")

    self.hash_method = self.check_and_change_lower(self.hash_method,
                                                   [consts.SHA256, consts.SHA384, consts.SHA512, consts.SM3],
                                                   f"{descr}hash_method")

    self.final_hash_method = self.check_and_change_lower(self.final_hash_method,
                                                         [consts.MD5, consts.SHA1, consts.SHA224,
                                                          consts.SHA256, consts.SHA384, consts.SHA512,
                                                          consts.SM3],
                                                         f"{descr}final_hash_method")

    self.check_boolean(self.split_calculation, f"{descr}split_calculation")

    if self.random_base_fraction:
        self.check_positive_number(self.random_base_fraction, descr)
        self.check_decimal_float(self.random_base_fraction, f"{descr}random_base_fraction")

    self.check_positive_integer(self.key_length, f"{descr}key_length")
    if self.key_length < 1024:
        raise ValueError(f"key length must be >= 1024")
    self.check_positive_integer(self.random_bit, f"{descr}random_bit")

    LOGGER.debug("Finish RSAParam parameter check!")
    return True


DHParam (BaseParam) ¶

Define the hash method for DH intersect method

Parameters:

Name	Type	Description	Default
`salt`	`str`	salt appended to the source id string before hashing (str = str + salt), default ''	`''`
`hash_method`	`str`	hash method applied to the source id string; supports none, md5, sha1, sha224, sha256, sha384, sha512, sm3; default sha256	`'sha256'`
`key_length`	`int, value >= 1024`	the key length of the commutative cipher p, default 1024	`1024`
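
As a rough illustration of how `salt` and `hash_method` interact, the sketch below appends the salt to an id and hashes the result with Python's standard `hashlib`. The helper name `hash_id` is hypothetical and not part of FATE. Several details are assumptions: the hex output encoding, the handling of `'none'` (here the salted string is returned unhashed), and the omission of `sm3`, which `hashlib` does not guarantee to provide.

```python
import hashlib

# Hash names from the DHParam description that hashlib always provides.
SUPPORTED = {"md5", "sha1", "sha224", "sha256", "sha384", "sha512"}


def hash_id(raw_id: str, salt: str = "", hash_method: str = "sha256") -> str:
    """Hypothetical helper: append the salt, then hash the salted id."""
    method = hash_method.lower()  # the param check lower-cases the method name
    salted = raw_id + salt  # "str = str + salt" from the parameter description
    if method == "none":
        # Assumption: with no hashing, the salted string is passed through.
        return salted
    if method not in SUPPORTED:
        raise ValueError(f"unsupported hash_method: {hash_method}")
    return hashlib.new(method, salted.encode("utf-8")).hexdigest()


digest = hash_id("13800000000", salt="my_salt", hash_method="SHA256")
```

Because every party must derive identical digests for matching ids, all participants would need to agree on the same `salt` and `hash_method`.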

Source code in federatedml/param/intersect_param.py

class DHParam(BaseParam):
    """
    Define the hash method for DH intersect method

    Parameters
    ----------
    salt: str
        the src data string will be str = str + salt, default ''

    hash_method: str
        the hash method of src data string, supports none, md5, sha1, sha224, sha256, sha384, sha512, sm3, default sha256

    key_length: int, value >= 1024
        the key length of the commutative cipher p, default 1024

    """

    def __init__(self, salt='', hash_method='sha256', key_length=consts.DEFAULT_KEY_LENGTH):
        super().__init__()
        self.salt = salt
        self.hash_method = hash_method
        self.key_length = key_length

    def check(self):
        descr = "dh param's "
        self.check_string(self.salt, f"{descr}salt")

        self.hash_method = self.check_and_change_lower(self.hash_method,
                                                       ["none", consts.MD5, consts.SHA1, consts.SHA224,
                                                        consts.SHA256, consts.SHA384, consts.SHA512,
                                                        consts.SM3],
                                                       f"{descr}hash_method")

        self.check_positive_integer(self.key_length, f"{descr}key_length")
        if self.key_length < 1024:
            raise ValueError(f"key length must be >= 1024")

        LOGGER.debug("Finish DHParam parameter check!")
        return True

__init__(self, salt='', hash_method='sha256', key_length=1024) special ¶

Source code in federatedml/param/intersect_param.py

def __init__(self, salt='', hash_method='sha256', key_length=consts.DEFAULT_KEY_LENGTH):
    super().__init__()
    self.salt = salt
    self.hash_method = hash_method
    self.key_length = key_length

check(self) ¶

Source code in federatedml/param/intersect_param.py

def check(self):
    descr = "dh param's "
    self.check_string(self.salt, f"{descr}salt")

    self.hash_method = self.check_and_change_lower(self.hash_method,
                                                   ["none", consts.MD5, consts.SHA1, consts.SHA224,
                                                    consts.SHA256, consts.SHA384, consts.SHA512,
                                                    consts.SM3],
                                                   f"{descr}hash_method")

    self.check_positive_integer(self.key_length, f"{descr}key_length")
    if self.key_length < 1024:
        raise ValueError(f"key length must be >= 1024")

    LOGGER.debug("Finish DHParam parameter check!")
    return True


IntersectCache (BaseParam) ¶

Source code in federatedml/param/intersect_param.py

class IntersectCache(BaseParam):
    def __init__(self, use_cache=False, id_type=consts.PHONE, encrypt_type=consts.SHA256):
        """

        Parameters
        ----------
        use_cache: bool
            whether to use cached ids; with ver1.7 and above, this param is ignored
        id_type
            with ver1.7 and above, this param is ignored
        encrypt_type
            with ver1.7 and above, this param is ignored
        """
        super().__init__()
        self.use_cache = use_cache
        self.id_type = id_type
        self.encrypt_type = encrypt_type

    def check(self):
        descr = "intersect_cache param's "
        # self.check_boolean(self.use_cache, f"{descr}use_cache")

        self.check_and_change_lower(self.id_type,
                                    [consts.PHONE, consts.IMEI],
                                    f"{descr}id_type")
        self.check_and_change_lower(self.encrypt_type,
                                    [consts.MD5, consts.SHA256],
                                    f"{descr}encrypt_type")

Methods¶

__init__(self, use_cache=False, id_type='phone', encrypt_type='sha256') special ¶

Parameters:

Name	Type	Description	Default
`use_cache`	`bool`	whether to use cached ids; in ver 1.7 and above, this param is ignored	`False`
`id_type`	`str`	id type, 'phone' or 'imei'; in ver 1.7 and above, this param is ignored	`'phone'`
`encrypt_type`	`str`	hash type, 'md5' or 'sha256'; in ver 1.7 and above, this param is ignored	`'sha256'`

Source code in federatedml/param/intersect_param.py

def __init__(self, use_cache=False, id_type=consts.PHONE, encrypt_type=consts.SHA256):
    """

    Parameters
    ----------
    use_cache: bool
        whether to use cached ids; with ver1.7 and above, this param is ignored
    id_type
        with ver1.7 and above, this param is ignored
    encrypt_type
        with ver1.7 and above, this param is ignored
    """
    super().__init__()
    self.use_cache = use_cache
    self.id_type = id_type
    self.encrypt_type = encrypt_type

check(self) ¶

Source code in federatedml/param/intersect_param.py

def check(self):
    descr = "intersect_cache param's "
    # self.check_boolean(self.use_cache, f"{descr}use_cache")

    self.check_and_change_lower(self.id_type,
                                [consts.PHONE, consts.IMEI],
                                f"{descr}id_type")
    self.check_and_change_lower(self.encrypt_type,
                                [consts.MD5, consts.SHA256],
                                f"{descr}encrypt_type")


IntersectPreProcessParam (BaseParam) ¶

Specify parameters for pre-processing and cardinality-only mode

Parameters:

Name	Type	Description	Default
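`false_positive_rate`	`float`	initial target false positive rate when creating Bloom Filter, must be positive and <= 0.5, default 1e-3; note that IntersectParam additionally requires a value >= 0.01 when run_preprocess is True	`0.001`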
`encrypt_method`	`str`	encrypt method for encrypting id when performing cardinality_only task, supports rsa only, default rsa; specify rsa parameter setting with RSAParam	`'rsa'`
`hash_method`	`str`	the hash method for inserting ids, supports md5, sha1, sha224, sha256, sha384, sha512, sm3, default sha256	`'sha256'`
`preprocess_method`	`str`	the hash method for encoding ids before insertion into filter, default sha256, only effective for preprocessing	`'sha256'`
`preprocess_salt`	`str`	salt to be appended to hash result by preprocess_method before insertion into filter, default '', only effective for preprocessing	`''`
`random_state`	`int`	seed for random salt generator when constructing hash functions, salt is appended to hash result by hash_method when performing insertion, default None	`None`
`filter_owner`	`str`	role that constructs filter, either guest or host, default guest, only effective for preprocessing	`'guest'`
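
To give a feel for what `false_positive_rate` controls, the sketch below applies the textbook Bloom filter sizing formulas, m = -n ln(p) / (ln 2)^2 bits and k = (m / n) ln 2 hash functions. This is a generic illustration under stated assumptions: the function name `bloom_filter_size` is hypothetical, and whether FATE sizes its filter exactly this way is not confirmed by this page.

```python
import math


def bloom_filter_size(n_ids: int, false_positive_rate: float = 1e-3):
    """Textbook sizing: return (bit array length, number of hash functions)."""
    if n_ids <= 0:
        raise ValueError("n_ids must be positive")
    # Same upper bound as IntersectPreProcessParam.check(): rate must be <= 0.5.
    if not 0 < false_positive_rate <= 0.5:
        raise ValueError("false_positive_rate must be in (0, 0.5]")
    bits = math.ceil(-n_ids * math.log(false_positive_rate) / (math.log(2) ** 2))
    hashes = max(1, round(bits / n_ids * math.log(2)))
    return bits, hashes


bits, hashes = bloom_filter_size(1000, 0.01)
```

A lower target rate yields a larger filter and more hash functions, so the parameter trades filter size against the number of non-matching ids that slip through.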

Source code in federatedml/param/intersect_param.py

class IntersectPreProcessParam(BaseParam):
    """
    Specify parameters for pre-processing and cardinality-only mode

    Parameters
    ----------
    false_positive_rate: float
        initial target false positive rate when creating Bloom Filter,
        must be <= 0.5, default 1e-3

    encrypt_method: str
        encrypt method for encrypting id when performing cardinality_only task,
        supports rsa only, default rsa;
        specify rsa parameter setting with RSAParam

    hash_method: str
        the hash method for inserting ids, supports md5, sha1, sha224, sha256, sha384, sha512, sm3,
        default sha256

    preprocess_method: str
        the hash method for encoding ids before insertion into filter, default sha256,
        only effective for preprocessing

    preprocess_salt: str
        salt to be appended to hash result by preprocess_method before insertion into filter,
        default '', only effective for preprocessing

    random_state: int
        seed for random salt generator when constructing hash functions,
        salt is appended to hash result by hash_method when performing insertion, default None

    filter_owner: str
        role that constructs filter, either guest or host, default guest,
        only effective for preprocessing

    """

    def __init__(self, false_positive_rate=1e-3, encrypt_method=consts.RSA, hash_method='sha256',
                 preprocess_method='sha256', preprocess_salt='', random_state=None, filter_owner=consts.GUEST):
        super().__init__()
        self.false_positive_rate = false_positive_rate
        self.encrypt_method = encrypt_method
        self.hash_method = hash_method
        self.preprocess_method = preprocess_method
        self.preprocess_salt = preprocess_salt
        self.random_state = random_state
        self.filter_owner = filter_owner

    def check(self):
        descr = "intersect preprocess param's false_positive_rate "
        self.check_decimal_float(self.false_positive_rate, descr)
        self.check_positive_number(self.false_positive_rate, descr)
        if self.false_positive_rate > 0.5:
            raise ValueError(f"{descr} must be positive float no greater than 0.5")

        descr = "intersect preprocess param's encrypt_method "
        self.encrypt_method = self.check_and_change_lower(self.encrypt_method, [consts.RSA], descr)

        descr = "intersect preprocess param's random_state "
        if self.random_state:
            self.check_nonnegative_number(self.random_state, descr)

        descr = "intersect preprocess param's hash_method "
        self.hash_method = self.check_and_change_lower(self.hash_method,
                                                       [consts.MD5, consts.SHA1, consts.SHA224,
                                                        consts.SHA256, consts.SHA384, consts.SHA512,
                                                        consts.SM3],
                                                       descr)
        descr = "intersect preprocess param's preprocess_salt "
        self.check_string(self.preprocess_salt, descr)

        descr = "intersect preprocess param's preprocess_method "
        self.preprocess_method = self.check_and_change_lower(self.preprocess_method,
                                                             [consts.MD5, consts.SHA1, consts.SHA224,
                                                             consts.SHA256, consts.SHA384, consts.SHA512,
                                                             consts.SM3],
                                                             descr)

        descr = "intersect preprocess param's filter_owner "
        self.filter_owner = self.check_and_change_lower(self.filter_owner,
                                                        [consts.GUEST, consts.HOST],
                                                        descr)

        LOGGER.debug("Finish IntersectPreProcessParam parameter check!")
        return True

__init__(self, false_positive_rate=0.001, encrypt_method='rsa', hash_method='sha256', preprocess_method='sha256', preprocess_salt='', random_state=None, filter_owner='guest') special ¶

Source code in federatedml/param/intersect_param.py

def __init__(self, false_positive_rate=1e-3, encrypt_method=consts.RSA, hash_method='sha256',
             preprocess_method='sha256', preprocess_salt='', random_state=None, filter_owner=consts.GUEST):
    super().__init__()
    self.false_positive_rate = false_positive_rate
    self.encrypt_method = encrypt_method
    self.hash_method = hash_method
    self.preprocess_method = preprocess_method
    self.preprocess_salt = preprocess_salt
    self.random_state = random_state
    self.filter_owner = filter_owner

check(self) ¶

Source code in federatedml/param/intersect_param.py

def check(self):
    descr = "intersect preprocess param's false_positive_rate "
    self.check_decimal_float(self.false_positive_rate, descr)
    self.check_positive_number(self.false_positive_rate, descr)
    if self.false_positive_rate > 0.5:
        raise ValueError(f"{descr} must be positive float no greater than 0.5")

    descr = "intersect preprocess param's encrypt_method "
    self.encrypt_method = self.check_and_change_lower(self.encrypt_method, [consts.RSA], descr)

    descr = "intersect preprocess param's random_state "
    if self.random_state:
        self.check_nonnegative_number(self.random_state, descr)

    descr = "intersect preprocess param's hash_method "
    self.hash_method = self.check_and_change_lower(self.hash_method,
                                                   [consts.MD5, consts.SHA1, consts.SHA224,
                                                    consts.SHA256, consts.SHA384, consts.SHA512,
                                                    consts.SM3],
                                                   descr)
    descr = "intersect preprocess param's preprocess_salt "
    self.check_string(self.preprocess_salt, descr)

    descr = "intersect preprocess param's preprocess_method "
    self.preprocess_method = self.check_and_change_lower(self.preprocess_method,
                                                         [consts.MD5, consts.SHA1, consts.SHA224,
                                                         consts.SHA256, consts.SHA384, consts.SHA512,
                                                         consts.SM3],
                                                         descr)

    descr = "intersect preprocess param's filter_owner "
    self.filter_owner = self.check_and_change_lower(self.filter_owner,
                                                    [consts.GUEST, consts.HOST],
                                                    descr)

    LOGGER.debug("Finish IntersectPreProcessParam parameter check!")
    return True


IntersectParam (BaseParam) ¶

Define the intersect method

Parameters:

Name	Type	Description	Default
`intersect_method`	`str`	intersection method; supports 'rsa', 'raw', and 'dh', default 'rsa'	`'rsa'`
`random_bit`	`positive int`	size of the blinding factor in the rsa algorithm, default 128; note that this param will be deprecated in a future version, please use random_bit in RSAParam instead	`128`
`sync_intersect_ids`	`bool`	for rsa, True means guest or host will send the intersect results to the other parties, and False means it will not; for raw, True means the role specified by "join_role" will send the intersect results and the other parties will receive them. Default True.	`True`
`join_role`	`str`	role that joins ids, supports "guest" and "host" only and effective only for raw. If it is "guest", the host will send its ids to guest and the intersection is computed on the guest side; if it is "host", the guest will send its ids to host. Default "guest"; note that this param will be deprecated in a future version, please use 'join_role' in raw_params instead	`'guest'`
`only_output_key`	`bool`	if False, the intersection results will include both the key and the value from input data; if True, they will include only the key from input data, and the value will be empty or filled with a uniform string like "intersect_id"	`False`
`with_encode`	`bool`	if True, ids are hashed before intersection, effective for raw method only; note that this param will be deprecated in a future version, please use 'use_hash' in raw_params; currently if this param is set to True, settings in 'encode_params' are used instead of 'raw_params'.	`False`
`encode_params`	`EncodeParam`	effective only when with_encode is True; this param will be deprecated in a future version, use 'raw_params' instead	`EncodeParam()`
`raw_params`	`RAWParam`	effective for raw method only	`RAWParam()`
`rsa_params`	`RSAParam`	effective for rsa method only	`RSAParam()`
`dh_params`	`DHParam`	effective for dh method only	`DHParam()`
`join_method`	`{'inner_join', 'left_join'}`	if 'left_join', participants will all include sample_id_generator's (imputed) ids in output, default 'inner_join'	`'inner_join'`
`new_sample_id`	`bool`	whether to generate new id for sample_id_generator's ids, only effective when join_method is 'left_join' or when input data are instance with match id, default False	`False`
`sample_id_generator`	`str`	role whose ids are to be kept, effective only when join_method is 'left_join' or when input data are instance with match id, default 'guest'	`'guest'`
`intersect_cache_param`	`IntersectCache`	specification for cache generation; in ver 1.7 and above, this param is ignored.	`IntersectCache()`
`run_cache`	`bool`	whether to store Host's encrypted ids, only valid when intersect method is 'rsa' or 'dh', default False	`False`
`cardinality_only`	`bool`	whether to output only the estimated intersection count (cardinality), supported for rsa without split_calculation only; if sync_cardinality is True, then sync cardinality count with host(s)	`False`
`sync_cardinality`	`bool`	whether to sync cardinality with all participants, default False, only effective when cardinality_only set to True	`False`
`run_preprocess`	`bool`	whether to run the preprocessing step, default False	`False`
`intersect_preprocess_params`	`IntersectPreProcessParam`	used for preprocessing and cardinality_only mode	`IntersectPreProcessParam()`
`repeated_id_process`	`bool`	if True, intersection will handle ids that may be repeated; in ver 1.7 and above, repeated id processing is automatically applied to data with instance id, and this param is ignored	`False`
`repeated_id_owner`	`str`	which role has the repeated id; in ver 1.7 and above, this param is ignored	`'guest'`
`allow_info_share`	`bool`	in ver 1.7 and above, this param is ignored	`False`
`info_owner`	`str`	in ver 1.7 and above, this param is ignored	`'guest'`
`with_sample_id`	`bool`	data with sample id or not, default False; in ver 1.7 and above, this param is ignored	`False`
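
Several of these parameters constrain each other. The sketch below restates the cross-field rules enforced by `IntersectParam.check()` as a standalone function over a plain dict, so that it runs without FATE installed. The function name `validate_intersect_config` and the dict layout are hypothetical; only the rules themselves are taken from the `check()` source on this page.

```python
def validate_intersect_config(cfg: dict) -> None:
    """Hypothetical stand-in for the cross-field rules in IntersectParam.check()."""
    method = cfg.get("intersect_method", "rsa").lower()
    split = cfg.get("rsa_params", {}).get("split_calculation", False)
    fpr = cfg.get("intersect_preprocess_params", {}).get("false_positive_rate", 1e-3)
    cardinality_only = cfg.get("cardinality_only", False)
    run_preprocess = cfg.get("run_preprocess", False)
    run_cache = cfg.get("run_cache", False)

    if method not in ("rsa", "raw", "dh"):
        raise ValueError("intersect_method must be rsa, raw or dh")
    if cfg.get("join_method", "inner_join") == "left_join" and not cfg.get("sync_intersect_ids", True):
        raise ValueError("left join requires sync_intersect_ids")
    if cardinality_only and (method != "rsa" or split):
        raise ValueError("cardinality_only needs rsa without split_calculation")
    if run_preprocess:
        if fpr < 0.01:
            raise ValueError("preprocessing needs false_positive_rate >= 0.01")
        if cardinality_only:
            raise ValueError("cardinality_only cannot run preprocessing")
    if run_cache:
        if method not in ("rsa", "dh") or (method == "rsa" and split):
            raise ValueError("cache needs dh, or rsa without split_calculation")
        if cardinality_only or run_preprocess:
            raise ValueError("cache excludes cardinality_only and preprocessing")


# An RSA intersection with caching enabled passes all the rules.
validate_intersect_config({
    "intersect_method": "rsa",
    "rsa_params": {"hash_method": "sha256", "key_length": 2048},
    "run_cache": True,
})
```

One consequence worth noting: the default `false_positive_rate` of 1e-3 is below the 0.01 floor required for preprocessing, so enabling `run_preprocess` appears to require setting `false_positive_rate` explicitly.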

Source code in federatedml/param/intersect_param.py

class IntersectParam(BaseParam):
    """
    Define the intersect method

    Parameters
    ----------
    intersect_method: str
        it supports 'rsa', 'raw', and 'dh', default by 'rsa'

    random_bit: positive int
        it will define the size of blinding factor in rsa algorithm, default 128
        note that this param will be deprecated in future, please use random_bit in RSAParam instead

    sync_intersect_ids: bool
        In rsa, 'sync_intersect_ids' is True means guest or host will send intersect results to the others, and False will not.
        while in raw, 'sync_intersect_ids' is True means the role of "join_role" will send intersect results and the others will get them.
        Default by True.

    join_role: str
        role who joins ids, supports "guest" and "host" only and effective only for raw.
        If it is "guest", the host will send its ids to guest and find the intersection of
        ids in guest; if it is "host", the guest will send its ids to host. Default by "guest";
        note this param will be deprecated in future version, please use 'join_role' in raw_params instead

    only_output_key: bool
        if false, the results of intersection will include key and value which from input data; if true, it will just include key from input
        data and the value will be empty or filled by uniform string like "intersect_id"

    with_encode: bool
        if True, it will use hash method for intersect ids, effective for raw method only;
        note that this param will be deprecated in future version, please use 'use_hash' in raw_params;
        currently if this param is set to True,
        specification by 'encode_params' will be taken instead of 'raw_params'.

    encode_params: EncodeParam
        effective only when with_encode is True;
        this param will be deprecated in future version, use 'raw_params' in future implementation

    raw_params: RAWParam
        effective for raw method only

    rsa_params: RSAParam
        effective for rsa method only

    dh_params: DHParam
        effective for dh method only

    join_method: {'inner_join', 'left_join'}
        if 'left_join', participants will all include sample_id_generator's (imputed) ids in output,
        default 'inner_join'

    new_sample_id: bool
        whether to generate new id for sample_id_generator's ids,
        only effective when join_method is 'left_join' or when input data are instance with match id,
        default False

    sample_id_generator: str
        role whose ids are to be kept,
        effective only when join_method is 'left_join' or when input data are instance with match id,
        default 'guest'

    intersect_cache_param: IntersectCache
        specification for cache generation,
        with ver1.7 and above, this param is ignored.

    run_cache: bool
        whether to store Host's encrypted ids, only valid when intersect method is 'rsa' or 'dh', default False

    cardinality_only: bool
        whether to output estimated intersection count(cardinality);
        if sync_cardinality is True, then sync cardinality count with host(s)

    sync_cardinality: bool
        whether to sync cardinality with all participants, default False,
        only effective when cardinality_only set to True

    run_preprocess: bool
        whether to run preprocess process, default False

    intersect_preprocess_params: IntersectPreProcessParam
        used for preprocessing and cardinality_only mode

    repeated_id_process: bool
        if true, intersection will process the ids which can be repeatable;
        in ver 1.7 and above,repeated id process
        will be automatically applied to data with instance id, this param will be ignored

    repeated_id_owner: str
        which role has the repeated id; in ver 1.7 and above, this param is ignored

    allow_info_share: bool
        in ver 1.7 and above, this param is ignored

    info_owner: str
        in ver 1.7 and above, this param is ignored

    with_sample_id: bool
        data with sample id or not, default False; in ver 1.7 and above, this param is ignored
    """

    def __init__(self, intersect_method: str = consts.RSA, random_bit=DEFAULT_RANDOM_BIT, sync_intersect_ids=True,
                 join_role=consts.GUEST, only_output_key: bool=False,
                 with_encode=False, encode_params=EncodeParam(),
                 raw_params=RAWParam(), rsa_params=RSAParam(), dh_params=DHParam(),
                 join_method=consts.INNER_JOIN, new_sample_id: bool = False, sample_id_generator=consts.GUEST,
                 intersect_cache_param=IntersectCache(), run_cache: bool = False,
                 cardinality_only: bool = False, sync_cardinality: bool = False,
                 run_preprocess:bool = False,
                 intersect_preprocess_params=IntersectPreProcessParam(),
                 repeated_id_process=False, repeated_id_owner=consts.GUEST,
                 with_sample_id=False,  allow_info_share: bool = False, info_owner=consts.GUEST):
        super().__init__()
        self.intersect_method = intersect_method
        self.random_bit = random_bit
        self.sync_intersect_ids = sync_intersect_ids
        self.join_role = join_role
        self.with_encode = with_encode
        self.encode_params = copy.deepcopy(encode_params)
        self.raw_params = copy.deepcopy(raw_params)
        self.rsa_params = copy.deepcopy(rsa_params)
        self.only_output_key = only_output_key
        self.sample_id_generator = sample_id_generator
        self.intersect_cache_param = copy.deepcopy(intersect_cache_param)
        self.run_cache = run_cache
        self.repeated_id_process = repeated_id_process
        self.repeated_id_owner = repeated_id_owner
        self.allow_info_share = allow_info_share
        self.info_owner = info_owner
        self.with_sample_id = with_sample_id
        self.join_method = join_method
        self.new_sample_id = new_sample_id
        self.dh_params = copy.deepcopy(dh_params)
        self.cardinality_only = cardinality_only
        self.sync_cardinality = sync_cardinality
        self.run_preprocess = run_preprocess
        self.intersect_preprocess_params = copy.deepcopy(intersect_preprocess_params)

    def check(self):
        descr = "intersect param's "

        self.intersect_method = self.check_and_change_lower(self.intersect_method,
                                                            [consts.RSA, consts.RAW, consts.DH],
                                                            f"{descr}intersect_method")

        if self._warn_to_deprecate_param("random_bit", descr, "rsa_params' 'random_bit'"):
            if "rsa_params.random_bit" in self.get_user_feeded():
                raise ValueError(f"random_bit and rsa_params.random_bit should not be set simultaneously")
            self.rsa_params.random_bit = self.random_bit

        self.check_boolean(self.sync_intersect_ids, f"{descr}intersect_ids")

        if self._warn_to_deprecate_param("encode_param", "", ""):
            if "raw_params" in self.get_user_feeded():
                raise ValueError(f"encode_param and raw_params should not be set simultaneously")
            else:
                self.callback_param.callbacks = ["PerformanceEvaluate"]

        if self._warn_to_deprecate_param("join_role", descr, "raw_params' 'join_role'"):
            if "raw_params.join_role" in self.get_user_feeded():
                raise ValueError(f"join_role and raw_params.join_role should not be set simultaneously")
            self.raw_params.join_role = self.join_role

        self.check_boolean(self.only_output_key, f"{descr}only_output_key")

        self.join_method = self.check_and_change_lower(self.join_method, [consts.INNER_JOIN, consts.LEFT_JOIN],
                                                       f"{descr}join_method")
        self.check_boolean(self.new_sample_id, f"{descr}new_sample_id")
        self.sample_id_generator = self.check_and_change_lower(self.sample_id_generator,
                                                               [consts.GUEST, consts.HOST],
                                                               f"{descr}sample_id_generator")

        if self.join_method==consts.LEFT_JOIN:
            if not self.sync_intersect_ids:
                raise ValueError(f"Cannot perform left join without sync intersect ids")

        self.check_boolean(self.run_cache, f"{descr} run_cache")

        if self._warn_to_deprecate_param("encode_params", descr, "raw_params") or \
            self._warn_to_deprecate_param("with_encode", descr, "raw_params' 'use_hash'"):
            # self.encode_params.check()
            if "with_encode" in self.get_user_feeded() and "raw_params.use_hash" in self.get_user_feeded():
                raise ValueError(f"'raw_params' and 'encode_params' should not be set simultaneously.")
            if "raw_params" in self.get_user_feeded() and "encode_params" in self.get_user_feeded():
                raise ValueError(f"'raw_params' and 'encode_params' should not be set simultaneously.")
            LOGGER.warning(f"Param values from 'encode_params' will override 'raw_params' settings.")
            self.raw_params.use_hash = self.with_encode
            self.raw_params.hash_method = self.encode_params.encode_method
            self.raw_params.salt = self.encode_params.salt
            self.raw_params.base64 = self.encode_params.base64

        self.raw_params.check()
        self.rsa_params.check()
        self.dh_params.check()
        # self.intersect_cache_param.check()
        self.check_boolean(self.cardinality_only, f"{descr}cardinality_only")
        self.check_boolean(self.sync_cardinality, f"{descr}sync_cardinality")
        self.check_boolean(self.run_preprocess, f"{descr}run_preprocess")
        self.intersect_preprocess_params.check()
        if self.cardinality_only:
            if self.intersect_method not in [consts.RSA]:
                raise ValueError(f"cardinality-only mode only support rsa.")
            if self.intersect_method == consts.RSA and self.rsa_params.split_calculation:
                raise ValueError(f"cardinality-only mode only supports unified calculation.")
        if self.run_preprocess:
            if self.intersect_preprocess_params.false_positive_rate < 0.01:
                raise ValueError(f"for preprocessing ids, false_positive_rate must be no less than 0.01")
            if self.cardinality_only:
                raise ValueError(f"cardinality_only mode cannot run preprocessing.")
        if self.run_cache:
            if self.intersect_method not in [consts.RSA, consts.DH]:
                raise ValueError(f"Only rsa or dh method supports cache.")
            if self.intersect_method == consts.RSA and self.rsa_params.split_calculation:
                raise ValueError(f"RSA split_calculation does not support cache.")
            if self.cardinality_only:
                raise ValueError(f"cache is not available for cardinality_only mode.")
            if self.run_preprocess:
                raise ValueError(f"Preprocessing does not support cache.")

        deprecated_param_list = ["repeated_id_process", "repeated_id_owner", "intersect_cache_param",
                                 "allow_info_share", "info_owner", "with_sample_id"]
        for param in deprecated_param_list:
            self._warn_deprecated_param(param, descr)

        LOGGER.debug("Finish intersect parameter check!")
        return True

__init__(self, intersect_method='rsa', random_bit=128, sync_intersect_ids=True, join_role='guest', only_output_key=False, with_encode=False, encode_params=EncodeParam(), raw_params=RAWParam(), rsa_params=RSAParam(), dh_params=DHParam(), join_method='inner_join', new_sample_id=False, sample_id_generator='guest', intersect_cache_param=IntersectCache(), run_cache=False, cardinality_only=False, sync_cardinality=False, run_preprocess=False, intersect_preprocess_params=IntersectPreProcessParam(), repeated_id_process=False, repeated_id_owner='guest', with_sample_id=False, allow_info_share=False, info_owner='guest') special ¶

Source code in federatedml/param/intersect_param.py

def __init__(self, intersect_method: str = consts.RSA, random_bit=DEFAULT_RANDOM_BIT, sync_intersect_ids=True,
             join_role=consts.GUEST, only_output_key: bool=False,
             with_encode=False, encode_params=EncodeParam(),
             raw_params=RAWParam(), rsa_params=RSAParam(), dh_params=DHParam(),
             join_method=consts.INNER_JOIN, new_sample_id: bool = False, sample_id_generator=consts.GUEST,
             intersect_cache_param=IntersectCache(), run_cache: bool = False,
             cardinality_only: bool = False, sync_cardinality: bool = False,
             run_preprocess:bool = False,
             intersect_preprocess_params=IntersectPreProcessParam(),
             repeated_id_process=False, repeated_id_owner=consts.GUEST,
             with_sample_id=False,  allow_info_share: bool = False, info_owner=consts.GUEST):
    super().__init__()
    self.intersect_method = intersect_method
    self.random_bit = random_bit
    self.sync_intersect_ids = sync_intersect_ids
    self.join_role = join_role
    self.with_encode = with_encode
    self.encode_params = copy.deepcopy(encode_params)
    self.raw_params = copy.deepcopy(raw_params)
    self.rsa_params = copy.deepcopy(rsa_params)
    self.only_output_key = only_output_key
    self.sample_id_generator = sample_id_generator
    self.intersect_cache_param = copy.deepcopy(intersect_cache_param)
    self.run_cache = run_cache
    self.repeated_id_process = repeated_id_process
    self.repeated_id_owner = repeated_id_owner
    self.allow_info_share = allow_info_share
    self.info_owner = info_owner
    self.with_sample_id = with_sample_id
    self.join_method = join_method
    self.new_sample_id = new_sample_id
    self.dh_params = copy.deepcopy(dh_params)
    self.cardinality_only = cardinality_only
    self.sync_cardinality = sync_cardinality
    self.run_preprocess = run_preprocess
    self.intersect_preprocess_params = copy.deepcopy(intersect_preprocess_params)

check(self) ¶

Source code in federatedml/param/intersect_param.py

def check(self):
    descr = "intersect param's "

    self.intersect_method = self.check_and_change_lower(self.intersect_method,
                                                        [consts.RSA, consts.RAW, consts.DH],
                                                        f"{descr}intersect_method")

    if self._warn_to_deprecate_param("random_bit", descr, "rsa_params' 'random_bit'"):
        if "rsa_params.random_bit" in self.get_user_feeded():
            raise ValueError(f"random_bit and rsa_params.random_bit should not be set simultaneously")
        self.rsa_params.random_bit = self.random_bit

    self.check_boolean(self.sync_intersect_ids, f"{descr}intersect_ids")

    if self._warn_to_deprecate_param("encode_param", "", ""):
        if "raw_params" in self.get_user_feeded():
            raise ValueError(f"encode_param and raw_params should not be set simultaneously")
        else:
            self.callback_param.callbacks = ["PerformanceEvaluate"]

    if self._warn_to_deprecate_param("join_role", descr, "raw_params' 'join_role'"):
        if "raw_params.join_role" in self.get_user_feeded():
            raise ValueError(f"join_role and raw_params.join_role should not be set simultaneously")
        self.raw_params.join_role = self.join_role

    self.check_boolean(self.only_output_key, f"{descr}only_output_key")

    self.join_method = self.check_and_change_lower(self.join_method, [consts.INNER_JOIN, consts.LEFT_JOIN],
                                                   f"{descr}join_method")
    self.check_boolean(self.new_sample_id, f"{descr}new_sample_id")
    self.sample_id_generator = self.check_and_change_lower(self.sample_id_generator,
                                                           [consts.GUEST, consts.HOST],
                                                           f"{descr}sample_id_generator")

    if self.join_method==consts.LEFT_JOIN:
        if not self.sync_intersect_ids:
            raise ValueError(f"Cannot perform left join without sync intersect ids")

    self.check_boolean(self.run_cache, f"{descr} run_cache")

    if self._warn_to_deprecate_param("encode_params", descr, "raw_params") or \
        self._warn_to_deprecate_param("with_encode", descr, "raw_params' 'use_hash'"):
        # self.encode_params.check()
        if "with_encode" in self.get_user_feeded() and "raw_params.use_hash" in self.get_user_feeded():
            raise ValueError(f"'raw_params' and 'encode_params' should not be set simultaneously.")
        if "raw_params" in self.get_user_feeded() and "encode_params" in self.get_user_feeded():
            raise ValueError(f"'raw_params' and 'encode_params' should not be set simultaneously.")
        LOGGER.warning(f"Param values from 'encode_params' will override 'raw_params' settings.")
        self.raw_params.use_hash = self.with_encode
        self.raw_params.hash_method = self.encode_params.encode_method
        self.raw_params.salt = self.encode_params.salt
        self.raw_params.base64 = self.encode_params.base64

    self.raw_params.check()
    self.rsa_params.check()
    self.dh_params.check()
    # self.intersect_cache_param.check()
    self.check_boolean(self.cardinality_only, f"{descr}cardinality_only")
    self.check_boolean(self.sync_cardinality, f"{descr}sync_cardinality")
    self.check_boolean(self.run_preprocess, f"{descr}run_preprocess")
    self.intersect_preprocess_params.check()
    if self.cardinality_only:
        if self.intersect_method not in [consts.RSA]:
            raise ValueError(f"cardinality-only mode only support rsa.")
        if self.intersect_method == consts.RSA and self.rsa_params.split_calculation:
            raise ValueError(f"cardinality-only mode only supports unified calculation.")
    if self.run_preprocess:
        if self.intersect_preprocess_params.false_positive_rate < 0.01:
            raise ValueError(f"for preprocessing ids, false_positive_rate must be no less than 0.01")
        if self.cardinality_only:
            raise ValueError(f"cardinality_only mode cannot run preprocessing.")
    if self.run_cache:
        if self.intersect_method not in [consts.RSA, consts.DH]:
            raise ValueError(f"Only rsa or dh method supports cache.")
        if self.intersect_method == consts.RSA and self.rsa_params.split_calculation:
            raise ValueError(f"RSA split_calculation does not support cache.")
        if self.cardinality_only:
            raise ValueError(f"cache is not available for cardinality_only mode.")
        if self.run_preprocess:
            raise ValueError(f"Preprocessing does not support cache.")

    deprecated_param_list = ["repeated_id_process", "repeated_id_owner", "intersect_cache_param",
                             "allow_info_share", "info_owner", "with_sample_id"]
    for param in deprecated_param_list:
        self._warn_deprecated_param(param, descr)

    LOGGER.debug("Finish intersect parameter check!")
    return True
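
The mode constraints enforced near the end of `check` can be summarised in a small standalone sketch. It does not import `federatedml`: the function name `validate_intersect_modes`, its plain-string method names, and its default argument values are assumptions made for illustration. Only the rules themselves are taken from the listing above. Unlike `check`, which raises on the first violation, this sketch collects every violated rule.

```python
def validate_intersect_modes(intersect_method="rsa", split_calculation=False,
                             cardinality_only=False, run_preprocess=False,
                             run_cache=False, false_positive_rate=0.01):
    """Return the list of violated rules; an empty list means the combination is allowed.

    Hypothetical helper: names and defaults are illustrative, not the library's API.
    """
    errors = []
    if cardinality_only:
        if intersect_method != "rsa":
            errors.append("cardinality_only requires rsa")
        elif split_calculation:
            errors.append("cardinality_only requires unified calculation")
    if run_preprocess:
        if false_positive_rate < 0.01:
            errors.append("false_positive_rate must be no less than 0.01")
        if cardinality_only:
            errors.append("cardinality_only cannot run preprocessing")
    if run_cache:
        if intersect_method not in ("rsa", "dh"):
            errors.append("cache requires rsa or dh")
        if intersect_method == "rsa" and split_calculation:
            errors.append("cache does not support RSA split_calculation")
        if cardinality_only:
            errors.append("cache is not available with cardinality_only")
        if run_preprocess:
            errors.append("cache does not support preprocessing")
    return errors


if __name__ == "__main__":
    print(validate_intersect_modes())                                   # allowed
    print(validate_intersect_modes(intersect_method="raw", run_cache=True))
```

In short, `run_cache`, `cardinality_only` and `run_preprocess` cannot be combined with one another, and both cache and cardinality-only modes exclude RSA split calculation.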

`label_transform_param` ¶

Classes¶


LabelTransformParam (BaseParam) ¶

Defines the parameters used in label transform.

Parameters:

Name	Type	Description	Default
`label_encoder`	`None or dict, default : None`	Specify (label, encoded label) key-value pairs for transforming labels to new values. e.g. {"Yes": 1, "No": 0}	`None`
`label_list`	`None or list, default : None`	List all input labels, used for matching types of original keys in label_encoder dict, length should match key count in label_encoder e.g. ["Yes", "No"]	`None`
`need_run`	`bool, default: True`	Specify whether to run label transform	`True`

Source code in federatedml/param/label_transform_param.py

class LabelTransformParam(BaseParam):
    """
    Defines the parameters used in label transform.

    Parameters
    ----------

    label_encoder : None or dict, default : None
        Specify (label, encoded label) key-value pairs for transforming labels to new values.
        e.g. {"Yes": 1, "No": 0}

    label_list : None or list, default : None
        List all input labels, used for matching types of original keys in label_encoder dict,
        length should match key count in label_encoder
        e.g. ["Yes", "No"]

    need_run: bool, default: True
        Specify whether to run label transform

    """

    def __init__(self, label_encoder=None, label_list=None, need_run=True):
        super(LabelTransformParam, self).__init__()
        self.label_encoder = label_encoder
        self.label_list = label_list
        self.need_run = need_run

    def check(self):
        model_param_descr = "label transform param's "

        BaseParam.check_boolean(self.need_run, f"{model_param_descr} need run ")

        if self.label_encoder is not None:
            if not isinstance(self.label_encoder, dict):
                raise ValueError(f"{model_param_descr} label_encoder should be dict type")

        if self.label_list is not None:
            if not isinstance(self.label_list, list):
                raise ValueError(f"{model_param_descr} label_list should be list type")
            if self.label_encoder and len(self.label_list) != len(self.label_encoder.keys()):
                raise ValueError(f"label_list length should match label_encoder key count")

        LOGGER.debug("Finish label transformer parameter check!")
        return True

__init__(self, label_encoder=None, label_list=None, need_run=True) special ¶

Source code in federatedml/param/label_transform_param.py

def __init__(self, label_encoder=None, label_list=None, need_run=True):
    super(LabelTransformParam, self).__init__()
    self.label_encoder = label_encoder
    self.label_list = label_list
    self.need_run = need_run

check(self) ¶

Source code in federatedml/param/label_transform_param.py

def check(self):
    model_param_descr = "label transform param's "

    BaseParam.check_boolean(self.need_run, f"{model_param_descr} need run ")

    if self.label_encoder is not None:
        if not isinstance(self.label_encoder, dict):
            raise ValueError(f"{model_param_descr} label_encoder should be dict type")

    if self.label_list is not None:
        if not isinstance(self.label_list, list):
            raise ValueError(f"{model_param_descr} label_list should be list type")
        if self.label_encoder and len(self.label_list) != len(self.label_encoder.keys()):
            raise ValueError(f"label_list length should match label_encoder key count")

    LOGGER.debug("Finish label transformer parameter check!")
    return True
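
The interplay of `label_encoder` and `label_list` can be illustrated with a standalone sketch. It is one possible reading of the parameter descriptions above, not the library's implementation: the helper names `build_typed_encoder` and `transform_labels` are hypothetical, and the choice to leave unmapped labels unchanged is an assumption. The idea is that keys of a dict loaded from a JSON config are always strings, so `label_list` supplies the labels with their original types.

```python
def build_typed_encoder(label_encoder, label_list=None):
    """Rebuild label_encoder so that its keys carry the types given in label_list.

    Hypothetical helper; mirrors the length rule enforced by check().
    """
    if label_list is None:
        return dict(label_encoder)
    if len(label_list) != len(label_encoder):
        raise ValueError("label_list length should match label_encoder key count")
    typed = {}
    for label in label_list:
        # Fall back to the stringified label when the typed key is absent.
        key = label if label in label_encoder else str(label)
        typed[label] = label_encoder[key]
    return typed


def transform_labels(labels, encoder):
    """Map labels through the encoder; unmapped labels are kept as-is (an assumption)."""
    return [encoder.get(label, label) for label in labels]


if __name__ == "__main__":
    encoder = build_typed_encoder({"1": 0, "2": 1}, label_list=[1, 2])
    print(transform_labels([1, 2, 2], encoder))
```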

`linear_regression_param` ¶

Classes¶


LinearParam (BaseParam) ¶

Parameters used for Linear Regression.

Parameters:

Name	Type	Description	Default
`penalty`	`{'L2' or 'L1'}`	Penalty method used in LinR. Please note that, when using encrypted version in HeteroLinR, 'L1' is not supported.	`'L2'`
`tol`	`float, default: 1e-4`	The tolerance of convergence	`0.0001`
`alpha`	`float, default: 1.0`	Regularization strength coefficient.	`1.0`
`optimizer`	`{'sgd', 'rmsprop', 'adam', 'sqn', 'adagrad'}`	Optimize method	`'sgd'`
`batch_size`	`int, default: -1`	Batch size when updating model. -1 means use all data in a batch. i.e. Not to use mini-batch strategy.	`-1`
`learning_rate`	`float, default: 0.01`	Learning rate	`0.01`
`max_iter`	`int, default: 20`	The maximum iteration for training.	`20`
`init_param`	`InitParam object, default: default InitParam object`	Init param method object.	`InitParam()`
`early_stop`	`{'diff', 'abs', 'weight_diff'}`	Method used to judge convergence. a) diff: Use the difference of loss between two iterations to judge convergence. b) abs: Use the absolute value of loss to judge convergence, i.e. if loss < tol, the model has converged. c) weight_diff: Use the difference between the weights of two consecutive iterations.	`'diff'`
`predict_param`	`PredictParam object, default: default PredictParam object`	predict param	`PredictParam()`
`encrypt_param`	`EncryptParam object, default: default EncryptParam object`	encrypt param	`EncryptParam()`
`encrypted_mode_calculator_param`	`EncryptedModeCalculatorParam object, default: default EncryptedModeCalculatorParam object`	encrypted mode calculator param	`EncryptedModeCalculatorParam()`
`cv_param`	`CrossValidationParam object, default: default CrossValidationParam object`	cv param	`CrossValidationParam()`
`decay`	`int or float, default: 1`	Decay rate for learning rate. The learning rate follows this decay schedule: lr = lr0 / (1 + decay * t) if decay_sqrt is False; if decay_sqrt is True, lr = lr0 / sqrt(1 + decay * t), where t is the iteration number.	`1`
`decay_sqrt`	`Bool, default: True`	lr = lr0 / (1 + decay * t) if decay_sqrt is False, otherwise lr = lr0 / sqrt(1 + decay * t)	`True`
`validation_freqs`	`int, list, tuple, set, or None`	Validation frequency during training, required when using early stopping. The default value is None; 1 is suggested. You can set it to a number larger than 1 to speed up training by skipping validation rounds. When it is larger than 1, a number that evenly divides max_iter is recommended; otherwise, you will miss the validation scores of the last training iteration.	`None`
`early_stopping_rounds`	`int, default: None`	If a positive number is specified, the program checks the early stopping criteria at every specified number of training rounds. validation_freqs must also be set when using early stopping.	`None`
`metrics`	`list or None, default: None`	Specify which metrics are used when performing evaluation during the training process. If the metrics have not improved after early_stopping_rounds rounds, training stops before convergence. If set as empty, default metrics will be used. For regression tasks, default metrics are ['root_mean_squared_error', 'mean_absolute_error']	`None`
`use_first_metric_only`	`bool, default: False`	Indicate whether to use the first metric in `metrics` as the only criterion for early stopping judgement.	`False`
`floating_point_precision`	`None or integer`	If not None, use floating_point_precision-bit to speed up calculation, e.g.: convert an x to round(x * 2**floating_point_precision) during Paillier operation, then divide the result by 2**floating_point_precision in the end.	`23`
`callback_param`	`CallbackParam object`	callback param	`CallbackParam()`
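
The learning-rate schedule described by `decay` and `decay_sqrt` can be written out as a short standalone sketch. The function name `decayed_learning_rate` is hypothetical, and whether the library counts `t` from 0 or from 1 is not stated here, so the indexing below is an assumption.

```python
import math


def decayed_learning_rate(lr0, decay, t, decay_sqrt=True):
    """Learning rate at iteration t for an initial rate lr0.

    decay_sqrt=True  -> lr0 / sqrt(1 + decay * t)
    decay_sqrt=False -> lr0 / (1 + decay * t)
    """
    factor = 1 + decay * t
    return lr0 / math.sqrt(factor) if decay_sqrt else lr0 / factor


if __name__ == "__main__":
    # Compare both schedules over the first few iterations.
    for t in range(4):
        print(t,
              decayed_learning_rate(0.01, 1, t, decay_sqrt=True),
              decayed_learning_rate(0.01, 1, t, decay_sqrt=False))
```

With the square root, the rate shrinks more slowly, which is why `decay_sqrt=True` is the gentler of the two schedules for the same `decay`.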

Source code in federatedml/param/linear_regression_param.py

class LinearParam(BaseParam):
    """
    Parameters used for Linear Regression.

    Parameters
    ----------
    penalty : {'L2' or 'L1'}
        Penalty method used in LinR. Please note that, when using encrypted version in HeteroLinR,
        'L1' is not supported.

    tol : float, default: 1e-4
        The tolerance of convergence

    alpha : float, default: 1.0
        Regularization strength coefficient.

    optimizer : {'sgd', 'rmsprop', 'adam', 'sqn', 'adagrad'}
        Optimize method

    batch_size : int, default: -1
        Batch size when updating model. -1 means use all data in a batch. i.e. Not to use mini-batch strategy.

    learning_rate : float, default: 0.01
        Learning rate

    max_iter : int, default: 20
        The maximum iteration for training.

    init_param: InitParam object, default: default InitParam object
        Init param method object.

    early_stop : {'diff', 'abs', 'weight_diff'}
        Method used to judge convergence.
            a)  diff: Use the difference of loss between two iterations to judge convergence.
            b)  abs: Use the absolute value of loss to judge convergence, i.e. if loss < tol, the model has converged.
            c)  weight_diff: Use the difference between the weights of two consecutive iterations.

    predict_param: PredictParam object, default: default PredictParam object
        predict param

    encrypt_param: EncryptParam object, default: default EncryptParam object
        encrypt param

    encrypted_mode_calculator_param: EncryptedModeCalculatorParam object, default: default EncryptedModeCalculatorParam object
        encrypted mode calculator param

    cv_param: CrossValidationParam object, default: default CrossValidationParam object
        cv param

    decay: int or float, default: 1
        Decay rate for learning rate. learning rate will follow the following decay schedule.
        lr = lr0/(1+decay*t) if decay_sqrt is False. If decay_sqrt is True, lr = lr0 / sqrt(1+decay*t)
        where t is the iter number.

    decay_sqrt: Bool, default: True
        lr = lr0/(1+decay*t) if decay_sqrt is False, otherwise, lr = lr0 / sqrt(1+decay*t)

    validation_freqs: int, list, tuple, set, or None
        validation frequency during training, required when using early stopping.
        The default value is None; 1 is suggested. You can set it to a number larger than 1 to speed up training by skipping validation rounds.
        When it is larger than 1, a number that evenly divides max_iter is recommended; otherwise, you will miss the validation scores of the last training iteration.

    early_stopping_rounds: int, default: None
        If a positive number is specified, the program checks the early stopping criteria at every specified number of training rounds.
        validation_freqs must also be set when using early stopping.

    metrics: list or None, default: None
        Specify which metrics are used when performing evaluation during the training process. If the metrics have not improved after early_stopping_rounds rounds, training stops before convergence.
        If set as empty, default metrics will be used. For regression tasks, default metrics are ['root_mean_squared_error', 'mean_absolute_error']

    use_first_metric_only: bool, default: False
        Indicate whether to use the first metric in `metrics` as the only criterion for early stopping judgement.

    floating_point_precision: None or integer
        if not None, use floating_point_precision-bit to speed up calculation,
        e.g.: convert an x to round(x * 2**floating_point_precision) during Paillier operation, divide
                the result by 2**floating_point_precision in the end.
    callback_param: CallbackParam object
        callback param

    """

    def __init__(self, penalty='L2',
                 tol=1e-4, alpha=1.0, optimizer='sgd',
                 batch_size=-1, learning_rate=0.01, init_param=InitParam(),
                 max_iter=20, early_stop='diff', predict_param=PredictParam(),
                 encrypt_param=EncryptParam(), sqn_param=StochasticQuasiNewtonParam(),
                 encrypted_mode_calculator_param=EncryptedModeCalculatorParam(),
                 cv_param=CrossValidationParam(), decay=1, decay_sqrt=True, validation_freqs=None,
                 early_stopping_rounds=None, stepwise_param=StepwiseParam(), metrics=None, use_first_metric_only=False,
                 floating_point_precision=23, callback_param=CallbackParam()):
        super(LinearParam, self).__init__()
        self.penalty = penalty
        self.tol = tol
        self.alpha = alpha
        self.optimizer = optimizer
        self.batch_size = batch_size
        self.learning_rate = learning_rate
        self.init_param = copy.deepcopy(init_param)
        self.max_iter = max_iter
        self.early_stop = early_stop
        self.encrypt_param = encrypt_param
        self.encrypted_mode_calculator_param = copy.deepcopy(encrypted_mode_calculator_param)
        self.cv_param = copy.deepcopy(cv_param)
        self.predict_param = copy.deepcopy(predict_param)
        self.decay = decay
        self.decay_sqrt = decay_sqrt
        self.validation_freqs = validation_freqs
        self.sqn_param = copy.deepcopy(sqn_param)
        self.early_stopping_rounds = early_stopping_rounds
        self.stepwise_param = copy.deepcopy(stepwise_param)
        self.metrics = metrics or []
        self.use_first_metric_only = use_first_metric_only
        self.floating_point_precision = floating_point_precision
        self.callback_param = copy.deepcopy(callback_param)

    def check(self):
        descr = "linear_regression_param's "

        if self.penalty is None:
            self.penalty = 'NONE'
        elif type(self.penalty).__name__ != "str":
            raise ValueError(
                descr + "penalty {} not supported, should be str type".format(self.penalty))

        self.penalty = self.penalty.upper()
        if self.penalty not in ['L1', 'L2', 'NONE']:
            raise ValueError(
                "penalty {} not supported, penalty should be 'L1', 'L2' or 'none'".format(self.penalty))

        if type(self.tol).__name__ not in ["int", "float"]:
            raise ValueError(
                descr + "tol {} not supported, should be float type".format(self.tol))

        if type(self.alpha).__name__ not in ["int", "float"]:
            raise ValueError(
                descr + "alpha {} not supported, should be float type".format(self.alpha))

        if type(self.optimizer).__name__ != "str":
            raise ValueError(
                descr + "optimizer {} not supported, should be str type".format(self.optimizer))
        else:
            self.optimizer = self.optimizer.lower()
            if self.optimizer not in ['sgd', 'rmsprop', 'adam', 'adagrad', 'sqn']:
                raise ValueError(
                    descr + "optimizer not supported, optimizer should be"
                    " 'sgd', 'rmsprop', 'adam', 'sqn' or 'adagrad'")

        if type(self.batch_size).__name__ not in ["int", "long"]:
            raise ValueError(
                descr + "batch_size {} not supported, should be int type".format(self.batch_size))
        if self.batch_size != -1:
            if type(self.batch_size).__name__ not in ["int", "long"] \
                    or self.batch_size < consts.MIN_BATCH_SIZE:
                raise ValueError(descr + " {} not supported, should be larger than {} or "
                                         "-1 represent for all data".format(self.batch_size, consts.MIN_BATCH_SIZE))

        if type(self.learning_rate).__name__ not in ["int", "float"]:
            raise ValueError(
                descr + "learning_rate {} not supported, should be float type".format(
                    self.learning_rate))

        self.init_param.check()

        if type(self.max_iter).__name__ != "int":
            raise ValueError(
                descr + "max_iter {} not supported, should be int type".format(self.max_iter))
        elif self.max_iter <= 0:
            raise ValueError(
                descr + "max_iter must be greater or equal to 1")

        if type(self.early_stop).__name__ != "str":
            raise ValueError(
                descr + "early_stop {} not supported, should be str type".format(
                    self.early_stop))
        else:
            self.early_stop = self.early_stop.lower()
            if self.early_stop not in ['diff', 'abs', 'weight_diff']:
                raise ValueError(
                    descr + "early_stop not supported, early_stop should be 'weight_diff', 'diff' or 'abs'")

        self.encrypt_param.check()
        if self.encrypt_param.method != consts.PAILLIER:
            raise ValueError(
                descr + "encrypt method supports 'Paillier' only")

        self.encrypted_mode_calculator_param.check()

        if type(self.decay).__name__ not in ["int", "float"]:
            raise ValueError(
                descr + "decay {} not supported, should be 'int' or 'float'".format(self.decay)
            )
        if type(self.decay_sqrt).__name__ not in ["bool"]:
            raise ValueError(
                descr + "decay_sqrt {} not supported, should be 'bool'".format(self.decay_sqrt)
            )
        self.sqn_param.check()
        self.stepwise_param.check()

        for p in ["early_stopping_rounds", "validation_freqs", "metrics",
                  "use_first_metric_only"]:
            if self._warn_to_deprecate_param(p, "", ""):
                if "callback_param" in self.get_user_feeded():
                    raise ValueError(f"{p} and callback param should not be set simultaneously")
                else:
                    self.callback_param.callbacks = ["PerformanceEvaluate"]
                break

        if self._warn_to_deprecate_param("validation_freqs", descr, "callback_param's 'validation_freqs'"):
            self.callback_param.validation_freqs = self.validation_freqs

        if self._warn_to_deprecate_param("early_stopping_rounds", descr, "callback_param's 'early_stopping_rounds'"):
            self.callback_param.early_stopping_rounds = self.early_stopping_rounds

        if self._warn_to_deprecate_param("metrics", descr, "callback_param's 'metrics'"):
            self.callback_param.metrics = self.metrics

        if self._warn_to_deprecate_param("use_first_metric_only", descr, "callback_param's 'use_first_metric_only'"):
            self.callback_param.use_first_metric_only = self.use_first_metric_only

        if self.floating_point_precision is not None and \
                (not isinstance(self.floating_point_precision, int) or
                 self.floating_point_precision < 0 or self.floating_point_precision > 64):
            raise ValueError("floating point precision should be null or a integer between 0 and 64")
        self.callback_param.check()
        return True
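
The fixed-point conversion behind `floating_point_precision` can be sketched without any encryption library. The helper names `to_fixed_point` and `from_fixed_point` are hypothetical, and plain integer addition stands in for the homomorphic addition that would happen on Paillier ciphertexts. Only the scaling rule, round(x * 2**floating_point_precision) followed by a final division, comes from the parameter description.

```python
def to_fixed_point(x, precision=23):
    """Scale a float to an integer: round(x * 2**precision)."""
    return round(x * 2 ** precision)


def from_fixed_point(value, precision=23):
    """Undo the scaling once the integer arithmetic is finished."""
    return value / 2 ** precision


if __name__ == "__main__":
    # Integer addition stands in for additions on encrypted values (an assumption).
    total = to_fixed_point(0.1) + to_fixed_point(0.2)
    print(from_fixed_point(total))
```

Each converted value carries a rounding error of at most 2**-(precision + 1), so a larger precision trades speed for accuracy.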

__init__(self, penalty='L2', tol=0.0001, alpha=1.0, optimizer='sgd', batch_size=-1, learning_rate=0.01, init_param=InitParam(), max_iter=20, early_stop='diff', predict_param=PredictParam(), encrypt_param=EncryptParam(), sqn_param=StochasticQuasiNewtonParam(), encrypted_mode_calculator_param=EncryptedModeCalculatorParam(), cv_param=CrossValidationParam(), decay=1, decay_sqrt=True, validation_freqs=None, early_stopping_rounds=None, stepwise_param=StepwiseParam(), metrics=None, use_first_metric_only=False, floating_point_precision=23, callback_param=CallbackParam()) special ¶

Source code in federatedml/param/linear_regression_param.py

def __init__(self, penalty='L2',
             tol=1e-4, alpha=1.0, optimizer='sgd',
             batch_size=-1, learning_rate=0.01, init_param=InitParam(),
             max_iter=20, early_stop='diff', predict_param=PredictParam(),
             encrypt_param=EncryptParam(), sqn_param=StochasticQuasiNewtonParam(),
             encrypted_mode_calculator_param=EncryptedModeCalculatorParam(),
             cv_param=CrossValidationParam(), decay=1, decay_sqrt=True, validation_freqs=None,
             early_stopping_rounds=None, stepwise_param=StepwiseParam(), metrics=None, use_first_metric_only=False,
             floating_point_precision=23, callback_param=CallbackParam()):
    super(LinearParam, self).__init__()
    self.penalty = penalty
    self.tol = tol
    self.alpha = alpha
    self.optimizer = optimizer
    self.batch_size = batch_size
    self.learning_rate = learning_rate
    self.init_param = copy.deepcopy(init_param)
    self.max_iter = max_iter
    self.early_stop = early_stop
    self.encrypt_param = encrypt_param
    self.encrypted_mode_calculator_param = copy.deepcopy(encrypted_mode_calculator_param)
    self.cv_param = copy.deepcopy(cv_param)
    self.predict_param = copy.deepcopy(predict_param)
    self.decay = decay
    self.decay_sqrt = decay_sqrt
    self.validation_freqs = validation_freqs
    self.sqn_param = copy.deepcopy(sqn_param)
    self.early_stopping_rounds = early_stopping_rounds
    self.stepwise_param = copy.deepcopy(stepwise_param)
    self.metrics = metrics or []
    self.use_first_metric_only = use_first_metric_only
    self.floating_point_precision = floating_point_precision
    self.callback_param = copy.deepcopy(callback_param)
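
The advice to pick a `validation_freqs` value that evenly divides `max_iter` can be made concrete with a small sketch. The function `validated_iterations` is hypothetical. It assumes iterations are numbered from 1 and that an integer frequency triggers validation on every multiple of that number; the indexing the library actually uses is not shown in this section.

```python
def validated_iterations(max_iter, validation_freqs):
    """Return the 1-based iterations at which validation would run (assumed rule)."""
    if validation_freqs is None:
        return []
    if isinstance(validation_freqs, int):
        return [i for i in range(1, max_iter + 1) if i % validation_freqs == 0]
    # list, tuple or set: validate only at the listed iterations
    wanted = set(validation_freqs)
    return [i for i in range(1, max_iter + 1) if i in wanted]


if __name__ == "__main__":
    print(validated_iterations(20, 5))  # last entry is the final iteration
    print(validated_iterations(20, 3))  # final iteration is not validated
```

Under these assumptions, a frequency of 5 with `max_iter=20` ends on iteration 20, while a frequency of 3 stops validating at iteration 18 and never scores the final model.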

check(self) ¶

Source code in federatedml/param/linear_regression_param.py

def check(self):
    descr = "linear_regression_param's "

    if self.penalty is None:
        self.penalty = 'NONE'
    elif type(self.penalty).__name__ != "str":
        raise ValueError(
            descr + "penalty {} not supported, should be str type".format(self.penalty))

    self.penalty = self.penalty.upper()
    if self.penalty not in ['L1', 'L2', 'NONE']:
        raise ValueError(
            "penalty {} not supported, penalty should be 'L1', 'L2' or 'none'".format(self.penalty))

    if type(self.tol).__name__ not in ["int", "float"]:
        raise ValueError(
            descr + "tol {} not supported, should be float type".format(self.tol))

    if type(self.alpha).__name__ not in ["int", "float"]:
        raise ValueError(
            descr + "alpha {} not supported, should be float type".format(self.alpha))

    if type(self.optimizer).__name__ != "str":
        raise ValueError(
            descr + "optimizer {} not supported, should be str type".format(self.optimizer))
    else:
        self.optimizer = self.optimizer.lower()
        if self.optimizer not in ['sgd', 'rmsprop', 'adam', 'adagrad', 'sqn']:
            raise ValueError(
                descr + "optimizer not supported, optimizer should be"
                " 'sgd', 'rmsprop', 'adam', 'sqn' or 'adagrad'")

    if type(self.batch_size).__name__ not in ["int", "long"]:
        raise ValueError(
            descr + "batch_size {} not supported, should be int type".format(self.batch_size))
    if self.batch_size != -1:
        if type(self.batch_size).__name__ not in ["int", "long"] \
                or self.batch_size < consts.MIN_BATCH_SIZE:
            raise ValueError(descr + " {} not supported, should be larger than {} or "
                                     "-1 represent for all data".format(self.batch_size, consts.MIN_BATCH_SIZE))

    if type(self.learning_rate).__name__ not in ["int", "float"]:
        raise ValueError(
            descr + "learning_rate {} not supported, should be float type".format(
                self.learning_rate))

    self.init_param.check()

    if type(self.max_iter).__name__ != "int":
        raise ValueError(
            descr + "max_iter {} not supported, should be int type".format(self.max_iter))
    elif self.max_iter <= 0:
        raise ValueError(
            descr + "max_iter must be greater or equal to 1")

    if type(self.early_stop).__name__ != "str":
        raise ValueError(
            descr + "early_stop {} not supported, should be str type".format(
                self.early_stop))
    else:
        self.early_stop = self.early_stop.lower()
        if self.early_stop not in ['diff', 'abs', 'weight_diff']:
            raise ValueError(
                descr + "early_stop not supported, early_stop should be 'weight_diff', 'diff' or 'abs'")

    self.encrypt_param.check()
    if self.encrypt_param.method != consts.PAILLIER:
        raise ValueError(
            descr + "encrypt method supports 'Paillier' only")

    self.encrypted_mode_calculator_param.check()

    if type(self.decay).__name__ not in ["int", "float"]:
        raise ValueError(
            descr + "decay {} not supported, should be 'int' or 'float'".format(self.decay)
        )
    if type(self.decay_sqrt).__name__ not in ["bool"]:
        raise ValueError(
            descr + "decay_sqrt {} not supported, should be 'bool'".format(self.decay_sqrt)
        )
    self.sqn_param.check()
    self.stepwise_param.check()

    for p in ["early_stopping_rounds", "validation_freqs", "metrics",
              "use_first_metric_only"]:
        if self._warn_to_deprecate_param(p, "", ""):
            if "callback_param" in self.get_user_feeded():
                raise ValueError(f"{p} and callback param should not be set simultaneously")
            else:
                self.callback_param.callbacks = ["PerformanceEvaluate"]
            break

    if self._warn_to_deprecate_param("validation_freqs", descr, "callback_param's 'validation_freqs'"):
        self.callback_param.validation_freqs = self.validation_freqs

    if self._warn_to_deprecate_param("early_stopping_rounds", descr, "callback_param's 'early_stopping_rounds'"):
        self.callback_param.early_stopping_rounds = self.early_stopping_rounds

    if self._warn_to_deprecate_param("metrics", descr, "callback_param's 'metrics'"):
        self.callback_param.metrics = self.metrics

    if self._warn_to_deprecate_param("use_first_metric_only", descr, "callback_param's 'use_first_metric_only'"):
        self.callback_param.use_first_metric_only = self.use_first_metric_only

    if self.floating_point_precision is not None and \
            (not isinstance(self.floating_point_precision, int) or
             self.floating_point_precision < 0 or self.floating_point_precision > 64):
        raise ValueError("floating point precision should be null or an integer between 0 and 64")
    self.callback_param.check()
    return True
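
The deprecated-parameter handling at the end of `check` above can be pictured with plain dictionaries. The following is a minimal sketch, not FATE code: `DEPRECATED_KEYS` and `migrate_to_callback_param` are hypothetical names, and the sketch simplifies by moving the old keys under `callback_param`, whereas the method above copies the values and keeps the original attributes.

```python
# Hypothetical helper: illustrates the migration rule only, not part of FATE.
DEPRECATED_KEYS = ("validation_freqs", "early_stopping_rounds",
                   "metrics", "use_first_metric_only")


def migrate_to_callback_param(user_conf):
    """Return a new config with deprecated top-level keys placed under callback_param."""
    used = [key for key in DEPRECATED_KEYS if key in user_conf]
    if not used:
        return dict(user_conf)
    # Mixing the old style and the new style is rejected, as in check().
    if "callback_param" in user_conf:
        raise ValueError(f"{used[0]} and callback param should not be set simultaneously")
    conf = {k: v for k, v in user_conf.items() if k not in DEPRECATED_KEYS}
    # When any deprecated key is present, the evaluation callback is enabled.
    callback = {"callbacks": ["PerformanceEvaluate"]}
    for key in used:
        callback[key] = user_conf[key]
    conf["callback_param"] = callback
    return conf


old_style = {"max_iter": 10, "validation_freqs": 2}
new_style = migrate_to_callback_param(old_style)
print(new_style)
```

Setting a deprecated key together with `callback_param` raises `ValueError` in this sketch, mirroring the check above.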

`local_baseline_param` ¶

Classes¶


LocalBaselineParam            (BaseParam)

¶

Defines the parameters of the local baseline model.

Parameters:

Name	Type	Description	Default
`model_name`	`str`	Name of the sklearn model trained as the baseline. Only 'LogisticRegression' is accepted (case-insensitive).	`'LogisticRegression'`
`model_opts`	`dict or None, default None`	Options passed as input to the baseline model. None is treated as an empty dict.	`None`
`predict_param`	`PredictParam object, default: default PredictParam object`	Prediction parameters.	`PredictParam()`
`need_run`	`bool, default True`	Whether this module needs to be run.	`True`
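
As a rough illustration of how these parameters are validated, here is a minimal standalone sketch that mirrors the rules of `LocalBaselineParam.check`. The function `normalize_baseline_config` is hypothetical and not part of FATE. The option names `C` and `max_iter` are ordinary sklearn `LogisticRegression` arguments used only as sample values.

```python
# Hypothetical stand-in for LocalBaselineParam.check(); not part of FATE.
SUPPORTED_MODELS = ("logisticregression",)


def normalize_baseline_config(model_name="LogisticRegression", model_opts=None, need_run=True):
    """Validate the three user-facing fields and return them in normalized form."""
    if not isinstance(model_name, str) or model_name.lower() not in SUPPORTED_MODELS:
        raise ValueError("local baseline param model_name {} not supported".format(model_name))
    if not isinstance(need_run, bool):
        raise ValueError("local baseline param need_run must be bool")
    if model_opts is None:
        model_opts = {}  # None is replaced by an empty dict
    elif not isinstance(model_opts, dict):
        raise ValueError("local baseline param model_opts must be None or dict.")
    return {"model_name": model_name.lower(), "model_opts": dict(model_opts), "need_run": need_run}


config = normalize_baseline_config("LogisticRegression", {"C": 0.5, "max_iter": 200})
print(config)
```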

Source code in federatedml/param/local_baseline_param.py

class LocalBaselineParam(BaseParam):
    """
    Define the local baseline model param

    Parameters
    ----------
    model_name : str
        sklearn model used to train on baseline model

    model_opts : dict or none, default None
        Param to be used as input into baseline model

    predict_param : PredictParam object, default: default PredictParam object
        predict param

    need_run: bool, default True
        Indicate if this module needed to be run
    """

    def __init__(self, model_name="LogisticRegression", model_opts=None, predict_param=PredictParam(), need_run=True):
        super(LocalBaselineParam, self).__init__()
        self.model_name = model_name
        self.model_opts = model_opts
        self.predict_param = copy.deepcopy(predict_param)
        self.need_run = need_run

    def check(self):
        descr = "local baseline param"

        self.model_name = self.check_and_change_lower(self.model_name,
                                                      ["logisticregression"],
                                                      descr)
        self.check_boolean(self.need_run, descr)
        if self.model_opts is not None:
            if not isinstance(self.model_opts, dict):
                raise ValueError(descr + " model_opts must be None or dict.")
        if self.model_opts is None:
            self.model_opts = {}
        self.predict_param.check()

        return True

__init__(self, model_name='LogisticRegression', model_opts=None, predict_param=PredictParam(), need_run=True)

special ¶

Source code in federatedml/param/local_baseline_param.py

def __init__(self, model_name="LogisticRegression", model_opts=None, predict_param=PredictParam(), need_run=True):
    super(LocalBaselineParam, self).__init__()
    self.model_name = model_name
    self.model_opts = model_opts
    self.predict_param = copy.deepcopy(predict_param)
    self.need_run = need_run

check(self) ¶

Source code in federatedml/param/local_baseline_param.py

def check(self):
    descr = "local baseline param"

    self.model_name = self.check_and_change_lower(self.model_name,
                                                  ["logisticregression"],
                                                  descr)
    self.check_boolean(self.need_run, descr)
    if self.model_opts is not None:
        if not isinstance(self.model_opts, dict):
            raise ValueError(descr + " model_opts must be None or dict.")
    if self.model_opts is None:
        self.model_opts = {}
    self.predict_param.check()

    return True

`logistic_regression_param` ¶

deprecated_param_list ¶

Classes¶


LogisticParam            (BaseParam)

¶

Parameters used for Logistic Regression in both Homo and Hetero modes.

Parameters:

Name	Type	Description	Default
`penalty`	`{'L2', 'L1' or None}`	Penalty method used in LR. Note that 'L1' is not supported when using the encrypted version of HomoLR.	`'L2'`
`tol`	`float, default: 1e-4`	The convergence tolerance.	`0.0001`
`alpha`	`float, default: 1.0`	Regularization strength coefficient.	`1.0`
`optimizer`	`{'rmsprop', 'sgd', 'adam', 'nesterov_momentum_sgd', 'sqn', 'adagrad'}, default: 'rmsprop'`	Optimization method. If 'sqn' is set, sqn_param takes effect. Currently, 'sqn' supports hetero mode only.	`'rmsprop'`
`batch_size`	`int, default: -1`	Batch size used when updating the model. -1 means all data is used in a single batch, i.e. no mini-batch strategy.	`-1`
`learning_rate`	`float, default: 0.01`	Learning rate.	`0.01`
`max_iter`	`int, default: 100`	The maximum number of training iterations.	`100`
`early_stop`	`{'diff', 'weight_diff', 'abs'}, default: 'diff'`	Method used to judge convergence. a) diff: use the difference of loss between two iterations. b) weight_diff: use the difference between the weights of two consecutive iterations. c) abs: use the absolute value of the loss, i.e. training has converged if loss < eps. Note that in the hetero-lr multi-host scenario, this parameter supports "weight_diff" only.	`'diff'`
`decay`	`int or float, default: 1`	Decay rate for the learning rate. The learning rate follows the schedule lr = lr0/(1+decay*t) if decay_sqrt is False, and lr = lr0 / sqrt(1+decay*t) if decay_sqrt is True, where t is the iteration number.	`1`
`decay_sqrt`	`bool, default: True`	lr = lr0/(1+decay*t) if decay_sqrt is False, otherwise lr = lr0 / sqrt(1+decay*t).	`True`
`encrypt_param`	`EncryptParam object, default: default EncryptParam object`	Encryption parameters.	`EncryptParam()`
`predict_param`	`PredictParam object, default: default PredictParam object`	Prediction parameters.	`PredictParam()`
`callback_param`	`CallbackParam object`	Callback parameters.	`CallbackParam()`
`cv_param`	`CrossValidationParam object, default: default CrossValidationParam object`	Cross-validation parameters.	`CrossValidationParam()`
`multi_class`	`{'ovr'}, default: 'ovr'`	Strategy to use for a multi-class task. Currently, only 'ovr' (short for one_vs_rest) is supported.	`'ovr'`
`validation_freqs`	`int or list or tuple or set, or None, default None`	Validation frequency during training.	`None`
`early_stopping_rounds`	`int, default: None`	Training stops if one metric doesn't improve in the last early_stopping_rounds rounds.	`None`
`metrics`	`list or None, default: None`	Metrics used when evaluation is executed during training. If empty, the default metrics for the specific task type are used. For binary classification, the default metrics are ['auc', 'ks'].	`None`
`use_first_metric_only`	`bool, default: False`	Whether to use only the first metric for the early stopping judgement.	`False`
`floating_point_precision`	`None or integer`	If not None, use floating_point_precision bits to speed up calculation, e.g. convert x to round(x * 2**floating_point_precision) during Paillier operations and divide the result by 2**floating_point_precision at the end.	`23`
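
The learning-rate schedule controlled by `decay` and `decay_sqrt` can be sketched as follows. This is a minimal illustration of the two formulas in the table, not FATE's optimizer code. The function name `decayed_learning_rate` is hypothetical, and whether the iteration counter `t` starts at 0 or 1 in the library is an assumption left open here.

```python
import math


def decayed_learning_rate(lr0, decay, t, decay_sqrt=True):
    """Learning rate at iteration t for initial rate lr0 (hypothetical helper)."""
    if decay_sqrt:
        # lr = lr0 / sqrt(1 + decay * t)
        return lr0 / math.sqrt(1 + decay * t)
    # lr = lr0 / (1 + decay * t)
    return lr0 / (1 + decay * t)


# Compare both schedules with the documented defaults learning_rate=0.01, decay=1.
for t in (0, 3, 8):
    print(t,
          decayed_learning_rate(0.01, 1, t, decay_sqrt=True),
          decayed_learning_rate(0.01, 1, t, decay_sqrt=False))
```

With `decay_sqrt=True` the rate shrinks more slowly, which is why it is the gentler of the two options for long training runs.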

Source code in federatedml/param/logistic_regression_param.py

class LogisticParam(BaseParam):
    """
    Parameters used for Logistic Regression both for Homo mode or Hetero mode.

    Parameters
    ----------
    penalty : {'L2', 'L1' or None}
        Penalty method used in LR. Please note that, when using encrypted version in HomoLR,
        'L1' is not supported.

    tol : float, default: 1e-4
        The tolerance of convergence

    alpha : float, default: 1.0
        Regularization strength coefficient.

    optimizer : {'rmsprop', 'sgd', 'adam', 'nesterov_momentum_sgd', 'sqn', 'adagrad'}, default: 'rmsprop'
        Optimize method, if 'sqn' has been set, sqn_param will take effect. Currently, 'sqn' support hetero mode only.

    batch_size : int, default: -1
        Batch size when updating model. -1 means use all data in a batch. i.e. Not to use mini-batch strategy.

    learning_rate : float, default: 0.01
        Learning rate

    max_iter : int, default: 100
        The maximum iteration for training.

    early_stop : {'diff', 'weight_diff', 'abs'}, default: 'diff'
        Method used to judge converge or not.
            a)  diff: Use difference of loss between two iterations to judge whether converge.
            b)  weight_diff: Use difference between weights of two consecutive iterations
            c)  abs: Use the absolute value of loss to judge whether converge. i.e. if loss < eps, it is converged.

            Please note that for hetero-lr multi-host situation, this parameter support "weight_diff" only.

    decay: int or float, default: 1
        Decay rate for learning rate. learning rate will follow the following decay schedule.
        lr = lr0/(1+decay*t) if decay_sqrt is False. If decay_sqrt is True, lr = lr0 / sqrt(1+decay*t)
        where t is the iter number.

    decay_sqrt: bool, default: True
        lr = lr0/(1+decay*t) if decay_sqrt is False, otherwise, lr = lr0 / sqrt(1+decay*t)

    encrypt_param: EncryptParam object, default: default EncryptParam object
        encrypt param

    predict_param: PredictParam object, default: default PredictParam object
        predict param

    callback_param: CallbackParam object
        callback param

    cv_param: CrossValidationParam object, default: default CrossValidationParam object
        cv param

    multi_class: {'ovr'}, default: 'ovr'
        If it is a multi_class task, indicate what strategy to use. Currently, support 'ovr' short for one_vs_rest only.

    validation_freqs: int or list or tuple or set, or None, default None
        validation frequency during training.

    early_stopping_rounds: int, default: None
        Will stop training if one metric doesn’t improve in last early_stopping_round rounds

    metrics: list or None, default: None
        Indicate when executing evaluation during train process, which metrics will be used. If set as empty,
        default metrics for specific task type will be used. As for binary classification, default metrics are
        ['auc', 'ks']

    use_first_metric_only: bool, default: False
        Indicate whether use the first metric only for early stopping judgement.

    floating_point_precision: None or integer
        if not None, use floating_point_precision-bit to speed up calculation,
        e.g.: convert an x to round(x * 2**floating_point_precision) during Paillier operation, divide
                the result by 2**floating_point_precision in the end.

    """

    def __init__(self, penalty='L2',
                 tol=1e-4, alpha=1.0, optimizer='rmsprop',
                 batch_size=-1, learning_rate=0.01, init_param=InitParam(),
                 max_iter=100, early_stop='diff', encrypt_param=EncryptParam(),
                 predict_param=PredictParam(), cv_param=CrossValidationParam(),
                 decay=1, decay_sqrt=True,
                 multi_class='ovr', validation_freqs=None, early_stopping_rounds=None,
                 stepwise_param=StepwiseParam(), floating_point_precision=23,
                 metrics=None,
                 use_first_metric_only=False,
                 callback_param=CallbackParam()
                 ):
        super(LogisticParam, self).__init__()
        self.penalty = penalty
        self.tol = tol
        self.alpha = alpha
        self.optimizer = optimizer
        self.batch_size = batch_size
        self.learning_rate = learning_rate
        self.init_param = copy.deepcopy(init_param)
        self.max_iter = max_iter
        self.early_stop = early_stop
        self.encrypt_param = encrypt_param
        self.predict_param = copy.deepcopy(predict_param)
        self.cv_param = copy.deepcopy(cv_param)
        self.decay = decay
        self.decay_sqrt = decay_sqrt
        self.multi_class = multi_class
        self.validation_freqs = validation_freqs
        self.stepwise_param = copy.deepcopy(stepwise_param)
        self.early_stopping_rounds = early_stopping_rounds
        self.metrics = metrics or []
        self.use_first_metric_only = use_first_metric_only
        self.floating_point_precision = floating_point_precision
        self.callback_param = copy.deepcopy(callback_param)

    def check(self):
        descr = "logistic_param's"

        if self.penalty is None:
            pass
        elif type(self.penalty).__name__ != "str":
            raise ValueError(
                "logistic_param's penalty {} not supported, should be str type".format(self.penalty))
        else:
            self.penalty = self.penalty.upper()
            if self.penalty not in [consts.L1_PENALTY, consts.L2_PENALTY, 'NONE']:
                raise ValueError(
                    "logistic_param's penalty not supported, penalty should be 'L1', 'L2' or 'none'")

        if not isinstance(self.tol, (int, float)):
            raise ValueError(
                "logistic_param's tol {} not supported, should be float type".format(self.tol))

        if type(self.alpha).__name__ not in ["float", 'int']:
            raise ValueError(
                "logistic_param's alpha {} not supported, should be float or int type".format(self.alpha))

        if type(self.optimizer).__name__ != "str":
            raise ValueError(
                "logistic_param's optimizer {} not supported, should be str type".format(self.optimizer))
        else:
            self.optimizer = self.optimizer.lower()
            if self.optimizer not in ['sgd', 'rmsprop', 'adam', 'adagrad', 'nesterov_momentum_sgd', 'sqn']:
                raise ValueError(
                    "logistic_param's optimizer not supported, optimizer should be"
                    " 'sgd', 'rmsprop', 'adam', 'nesterov_momentum_sgd', 'sqn' or 'adagrad'")

        if self.batch_size != -1:
            if type(self.batch_size).__name__ not in ["int"] \
                    or self.batch_size < consts.MIN_BATCH_SIZE:
                raise ValueError(descr + " {} not supported, should be larger than {} or "
                                         "-1 represent for all data".format(self.batch_size, consts.MIN_BATCH_SIZE))

        if not isinstance(self.learning_rate, (float, int)):
            raise ValueError(
                "logistic_param's learning_rate {} not supported, should be float or int type".format(
                    self.learning_rate))

        self.init_param.check()

        if type(self.max_iter).__name__ != "int":
            raise ValueError(
                "logistic_param's max_iter {} not supported, should be int type".format(self.max_iter))
        elif self.max_iter <= 0:
            raise ValueError(
                "logistic_param's max_iter must be greater or equal to 1")

        if type(self.early_stop).__name__ != "str":
            raise ValueError(
                "logistic_param's early_stop {} not supported, should be str type".format(
                    self.early_stop))
        else:
            self.early_stop = self.early_stop.lower()
            if self.early_stop not in ['diff', 'abs', 'weight_diff']:
                raise ValueError(
                    "logistic_param's early_stop not supported, early_stop should be"
                    " 'diff', 'weight_diff' or 'abs'")

        self.encrypt_param.check()
        self.predict_param.check()
        if self.encrypt_param.method not in [consts.PAILLIER, None]:
            raise ValueError(
                "logistic_param's encrypted method support 'Paillier' or None only")

        if type(self.decay).__name__ not in ["int", 'float']:
            raise ValueError(
                "logistic_param's decay {} not supported, should be 'int' or 'float'".format(
                    self.decay))

        if type(self.decay_sqrt).__name__ not in ['bool']:
            raise ValueError(
                "logistic_param's decay_sqrt {} not supported, should be 'bool'".format(
                    self.decay_sqrt))
        self.stepwise_param.check()

        if self.early_stopping_rounds is None:
            pass
        elif isinstance(self.early_stopping_rounds, int):
            if self.early_stopping_rounds < 1:
                raise ValueError("early stopping rounds should be larger than 0 when it's integer")
            if self.validation_freqs is None:
                raise ValueError("validation freqs must be set when early stopping is enabled")

        if self.metrics is not None and not isinstance(self.metrics, list):
            raise ValueError("metrics should be a list")

        if not isinstance(self.use_first_metric_only, bool):
            raise ValueError("use_first_metric_only should be a boolean")

        if self.floating_point_precision is not None and \
                (not isinstance(self.floating_point_precision, int) or
                 self.floating_point_precision < 0 or self.floating_point_precision > 63):
            raise ValueError("floating point precision should be null or an integer between 0 and 63")

        for p in ["early_stopping_rounds", "validation_freqs", "metrics",
                  "use_first_metric_only"]:
            # if self._warn_to_deprecate_param(p, "", ""):
            if self._deprecated_params_set.get(p):
                if "callback_param" in self.get_user_feeded():
                    raise ValueError(f"{p} and callback param should not be set simultaneously, "
                                     f"{self._deprecated_params_set}, {self.get_user_feeded()}")
                else:
                    self.callback_param.callbacks = ["PerformanceEvaluate"]
                break

        if self._warn_to_deprecate_param("validation_freqs", descr, "callback_param's 'validation_freqs'"):
            self.callback_param.validation_freqs = self.validation_freqs

        if self._warn_to_deprecate_param("early_stopping_rounds", descr, "callback_param's 'early_stopping_rounds'"):
            self.callback_param.early_stopping_rounds = self.early_stopping_rounds

        if self._warn_to_deprecate_param("metrics", descr, "callback_param's 'metrics'"):
            self.callback_param.metrics = self.metrics

        if self._warn_to_deprecate_param("use_first_metric_only", descr, "callback_param's 'use_first_metric_only'"):
            self.callback_param.use_first_metric_only = self.use_first_metric_only
        return True
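
The three `early_stop` options validated above can be illustrated with a small standalone sketch. It follows the descriptions in the parameter docs, with `tol` playing the role of the threshold. The function `has_converged` is hypothetical, and using the Euclidean norm for `weight_diff` is an assumption, since the docs only say "difference between weights".

```python
import math


def has_converged(early_stop, tol, loss=None, prev_loss=None, weights=None, prev_weights=None):
    """Hypothetical convergence test for the three documented early_stop modes."""
    if early_stop == "diff":
        # Change in loss between two iterations.
        return abs(loss - prev_loss) < tol
    if early_stop == "abs":
        # Absolute value of the current loss.
        return abs(loss) < tol
    if early_stop == "weight_diff":
        # Assumed here: Euclidean distance between consecutive weight vectors.
        distance = math.sqrt(sum((w - p) ** 2 for w, p in zip(weights, prev_weights)))
        return distance < tol
    raise ValueError("early_stop should be 'diff', 'weight_diff' or 'abs'")


print(has_converged("diff", 1e-4, loss=0.50001, prev_loss=0.5))
print(has_converged("abs", 1e-4, loss=0.5))
print(has_converged("weight_diff", 1e-4, weights=[1.0, 2.0], prev_weights=[1.0, 2.0]))
```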

__init__(self, penalty='L2', tol=0.0001, alpha=1.0, optimizer='rmsprop', batch_size=-1, learning_rate=0.01, init_param=InitParam(), max_iter=100, early_stop='diff', encrypt_param=EncryptParam(), predict_param=PredictParam(), cv_param=CrossValidationParam(), decay=1, decay_sqrt=True, multi_class='ovr', validation_freqs=None, early_stopping_rounds=None, stepwise_param=StepwiseParam(), floating_point_precision=23, metrics=None, use_first_metric_only=False, callback_param=CallbackParam())

special ¶

Source code in federatedml/param/logistic_regression_param.py

def __init__(self, penalty='L2',
             tol=1e-4, alpha=1.0, optimizer='rmsprop',
             batch_size=-1, learning_rate=0.01, init_param=InitParam(),
             max_iter=100, early_stop='diff', encrypt_param=EncryptParam(),
             predict_param=PredictParam(), cv_param=CrossValidationParam(),
             decay=1, decay_sqrt=True,
             multi_class='ovr', validation_freqs=None, early_stopping_rounds=None,
             stepwise_param=StepwiseParam(), floating_point_precision=23,
             metrics=None,
             use_first_metric_only=False,
             callback_param=CallbackParam()
             ):
    super(LogisticParam, self).__init__()
    self.penalty = penalty
    self.tol = tol
    self.alpha = alpha
    self.optimizer = optimizer
    self.batch_size = batch_size
    self.learning_rate = learning_rate
    self.init_param = copy.deepcopy(init_param)
    self.max_iter = max_iter
    self.early_stop = early_stop
    self.encrypt_param = encrypt_param
    self.predict_param = copy.deepcopy(predict_param)
    self.cv_param = copy.deepcopy(cv_param)
    self.decay = decay
    self.decay_sqrt = decay_sqrt
    self.multi_class = multi_class
    self.validation_freqs = validation_freqs
    self.stepwise_param = copy.deepcopy(stepwise_param)
    self.early_stopping_rounds = early_stopping_rounds
    self.metrics = metrics or []
    self.use_first_metric_only = use_first_metric_only
    self.floating_point_precision = floating_point_precision
    self.callback_param = copy.deepcopy(callback_param)

check(self) ¶

Source code in federatedml/param/logistic_regression_param.py

def check(self):
    descr = "logistic_param's"

    if self.penalty is None:
        pass
    elif type(self.penalty).__name__ != "str":
        raise ValueError(
            "logistic_param's penalty {} not supported, should be str type".format(self.penalty))
    else:
        self.penalty = self.penalty.upper()
        if self.penalty not in [consts.L1_PENALTY, consts.L2_PENALTY, 'NONE']:
            raise ValueError(
                "logistic_param's penalty not supported, penalty should be 'L1', 'L2' or 'none'")

    if not isinstance(self.tol, (int, float)):
        raise ValueError(
            "logistic_param's tol {} not supported, should be float type".format(self.tol))

    if type(self.alpha).__name__ not in ["float", 'int']:
        raise ValueError(
            "logistic_param's alpha {} not supported, should be float or int type".format(self.alpha))

    if type(self.optimizer).__name__ != "str":
        raise ValueError(
            "logistic_param's optimizer {} not supported, should be str type".format(self.optimizer))
    else:
        self.optimizer = self.optimizer.lower()
        if self.optimizer not in ['sgd', 'rmsprop', 'adam', 'adagrad', 'nesterov_momentum_sgd', 'sqn']:
            raise ValueError(
                "logistic_param's optimizer not supported, optimizer should be"
                " 'sgd', 'rmsprop', 'adam', 'nesterov_momentum_sgd', 'sqn' or 'adagrad'")

    if self.batch_size != -1:
        if type(self.batch_size).__name__ not in ["int"] \
                or self.batch_size < consts.MIN_BATCH_SIZE:
            raise ValueError(descr + " {} not supported, should be larger than {} or "
                                     "-1 represent for all data".format(self.batch_size, consts.MIN_BATCH_SIZE))

    if not isinstance(self.learning_rate, (float, int)):
        raise ValueError(
            "logistic_param's learning_rate {} not supported, should be float or int type".format(
                self.learning_rate))

    self.init_param.check()

    if type(self.max_iter).__name__ != "int":
        raise ValueError(
            "logistic_param's max_iter {} not supported, should be int type".format(self.max_iter))
    elif self.max_iter <= 0:
        raise ValueError(
            "logistic_param's max_iter must be greater or equal to 1")

    if type(self.early_stop).__name__ != "str":
        raise ValueError(
            "logistic_param's early_stop {} not supported, should be str type".format(
                self.early_stop))
    else:
        self.early_stop = self.early_stop.lower()
        if self.early_stop not in ['diff', 'abs', 'weight_diff']:
            raise ValueError(
                "logistic_param's early_stop not supported, early_stop should be"
                " 'diff', 'weight_diff' or 'abs'")

    self.encrypt_param.check()
    self.predict_param.check()
    if self.encrypt_param.method not in [consts.PAILLIER, None]:
        raise ValueError(
            "logistic_param's encrypted method support 'Paillier' or None only")

    if type(self.decay).__name__ not in ["int", 'float']:
        raise ValueError(
            "logistic_param's decay {} not supported, should be 'int' or 'float'".format(
                self.decay))

    if type(self.decay_sqrt).__name__ not in ['bool']:
        raise ValueError(
            "logistic_param's decay_sqrt {} not supported, should be 'bool'".format(
                self.decay_sqrt))
    self.stepwise_param.check()

    if self.early_stopping_rounds is None:
        pass
    elif isinstance(self.early_stopping_rounds, int):
        if self.early_stopping_rounds < 1:
            raise ValueError("early stopping rounds should be larger than 0 when it's integer")
        if self.validation_freqs is None:
            raise ValueError("validation freqs must be set when early stopping is enabled")

    if self.metrics is not None and not isinstance(self.metrics, list):
        raise ValueError("metrics should be a list")

    if not isinstance(self.use_first_metric_only, bool):
        raise ValueError("use_first_metric_only should be a boolean")

    if self.floating_point_precision is not None and \
            (not isinstance(self.floating_point_precision, int) or
             self.floating_point_precision < 0 or self.floating_point_precision > 63):
        raise ValueError("floating point precision should be null or an integer between 0 and 63")

    for p in ["early_stopping_rounds", "validation_freqs", "metrics",
              "use_first_metric_only"]:
        # if self._warn_to_deprecate_param(p, "", ""):
        if self._deprecated_params_set.get(p):
            if "callback_param" in self.get_user_feeded():
                raise ValueError(f"{p} and callback param should not be set simultaneously, "
                                 f"{self._deprecated_params_set}, {self.get_user_feeded()}")
            else:
                self.callback_param.callbacks = ["PerformanceEvaluate"]
            break

    if self._warn_to_deprecate_param("validation_freqs", descr, "callback_param's 'validation_freqs'"):
        self.callback_param.validation_freqs = self.validation_freqs

    if self._warn_to_deprecate_param("early_stopping_rounds", descr, "callback_param's 'early_stopping_rounds'"):
        self.callback_param.early_stopping_rounds = self.early_stopping_rounds

    if self._warn_to_deprecate_param("metrics", descr, "callback_param's 'metrics'"):
        self.callback_param.metrics = self.metrics

    if self._warn_to_deprecate_param("use_first_metric_only", descr, "callback_param's 'use_first_metric_only'"):
        self.callback_param.use_first_metric_only = self.use_first_metric_only
    return True
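
The `floating_point_precision` check above restricts the value to None or an integer from 0 to 63. The fixed-point idea behind the parameter, as described in the docstring, can be sketched as below. The helper names `encode_fixed_point` and `decode_fixed_point` are hypothetical, and the sketch works on plain integers. How FATE combines this with Paillier ciphertexts is not shown here.

```python
def encode_fixed_point(x, precision=23):
    """Scale a float to an integer: round(x * 2**precision)."""
    return round(x * 2 ** precision)


def decode_fixed_point(n, precision=23):
    """Undo the scaling by dividing by 2**precision."""
    return n / 2 ** precision


x = 0.1234567
encoded = encode_fixed_point(x)
approx = decode_fixed_point(encoded)
# Rounding loses at most half a unit in the last place, i.e. 2**-(precision + 1).
print(encoded, approx, abs(approx - x) <= 2 ** -24)

# Sums can be computed on the integer encodings and decoded once at the end.
total = decode_fixed_point(encode_fixed_point(0.25) + encode_fixed_point(0.5))
print(total)
```

A larger precision reduces rounding error but leaves less headroom for the integer values, so the default of 23 is a trade-off rather than a requirement.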


HomoLogisticParam            (LogisticParam)

¶

Parameters:

Name	Type	Description	Default
`re_encrypt_batches`	`int, default: 2`	Required when using the encrypted version of HomoLR. Because updating the coefficients over multiple batches may cause an overflow error, the model needs to be re-encrypted every few batches. Set this parameter with care: too large a value may cause training to fail.	`2`
`aggregate_iters`	`int, default: 1`	Number of iterations run between two aggregations.	`1`
`use_proximal`	`bool, default: False`	Whether to turn on the additional proximal term. For more details on FedProx, please refer to https://arxiv.org/abs/1812.06127	`False`
`mu`	`float, default 0.1`	Scaling factor of the proximal term.	`0.1`
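
To make `use_proximal` and `mu` concrete, the sketch below follows the formulation in the FedProx paper linked above: each party adds the term (mu / 2) * ||w - w_global||^2 to its local loss, which contributes mu * (w - w_global) to the gradient. The function names are hypothetical, and how HomoLR applies the term internally is not taken from the source.

```python
def proximal_term(weights, global_weights, mu=0.1):
    """(mu / 2) * squared distance between local and global weights."""
    return 0.5 * mu * sum((w - g) ** 2 for w, g in zip(weights, global_weights))


def add_proximal_gradient(grad, weights, global_weights, mu=0.1):
    """Add the gradient of the proximal term, mu * (w - w_global), to a local gradient."""
    return [gi + mu * (w - g) for gi, w, g in zip(grad, weights, global_weights)]


local_w = [1.0, 2.0]
global_w = [0.0, 0.0]
print(proximal_term(local_w, global_w, mu=0.1))
print(add_proximal_gradient([0.0, 0.0], local_w, global_w, mu=0.5))
```

A larger `mu` pulls each party's local model more strongly toward the last aggregated model, which limits drift between parties whose data distributions differ.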

Source code in federatedml/param/logistic_regression_param.py

class HomoLogisticParam(LogisticParam):
    """
    Parameters
    ----------
    re_encrypt_batches : int, default: 2
        Required when using encrypted version HomoLR. Since multiple batch updating coefficient may cause
        overflow error. The model need to be re-encrypt for every several batches. Please be careful when setting
        this parameter. Too large batches may cause training failure.

    aggregate_iters : int, default: 1
        Indicate how many iterations are aggregated once.

    use_proximal: bool, default: False
        Whether to turn on additional proximal term. For more details of FedProx, Please refer to
        https://arxiv.org/abs/1812.06127

    mu: float, default 0.1
        To scale the proximal term

    """

    def __init__(self, penalty='L2',
                 tol=1e-4, alpha=1.0, optimizer='rmsprop',
                 batch_size=-1, learning_rate=0.01, init_param=InitParam(),
                 max_iter=100, early_stop='diff',
                 encrypt_param=EncryptParam(method=None), re_encrypt_batches=2,
                 predict_param=PredictParam(), cv_param=CrossValidationParam(),
                 decay=1, decay_sqrt=True,
                 aggregate_iters=1, multi_class='ovr', validation_freqs=None,
                 early_stopping_rounds=None,
                 metrics=['auc', 'ks'],
                 use_first_metric_only=False,
                 use_proximal=False,
                 mu=0.1, callback_param=CallbackParam()
                 ):
        super(HomoLogisticParam, self).__init__(penalty=penalty, tol=tol, alpha=alpha, optimizer=optimizer,
                                                batch_size=batch_size,
                                                learning_rate=learning_rate,
                                                init_param=init_param, max_iter=max_iter, early_stop=early_stop,
                                                encrypt_param=encrypt_param, predict_param=predict_param,
                                                cv_param=cv_param, multi_class=multi_class,
                                                validation_freqs=validation_freqs,
                                                decay=decay, decay_sqrt=decay_sqrt,
                                                early_stopping_rounds=early_stopping_rounds,
                                                metrics=metrics, use_first_metric_only=use_first_metric_only,
                                                callback_param=callback_param)
        self.re_encrypt_batches = re_encrypt_batches
        self.aggregate_iters = aggregate_iters
        self.use_proximal = use_proximal
        self.mu = mu

    def check(self):
        super().check()
        if type(self.re_encrypt_batches).__name__ != "int":
            raise ValueError(
                "logistic_param's re_encrypt_batches {} not supported, should be int type".format(
                    self.re_encrypt_batches))
        elif self.re_encrypt_batches < 0:
            raise ValueError(
                "logistic_param's re_encrypt_batches must be greater or equal to 0")

        if not isinstance(self.aggregate_iters, int):
            raise ValueError(
                "logistic_param's aggregate_iters {} not supported, should be int type".format(
                    self.aggregate_iters))

        if self.encrypt_param.method == consts.PAILLIER:
            if self.optimizer != 'sgd':
                raise ValueError("Paillier encryption mode supports 'sgd' optimizer method only.")

            if self.penalty == consts.L1_PENALTY:
                raise ValueError("Paillier encryption mode supports 'L2' penalty or None only.")

        if self.optimizer == 'sqn':
            raise ValueError("'sqn' optimizer is supported for hetero mode only.")

        return True

__init__(self, penalty='L2', tol=0.0001, alpha=1.0, optimizer='rmsprop', batch_size=-1, learning_rate=0.01, init_param=InitParam(), max_iter=100, early_stop='diff', encrypt_param=EncryptParam(method=None), re_encrypt_batches=2, predict_param=PredictParam(), cv_param=CrossValidationParam(), decay=1, decay_sqrt=True, aggregate_iters=1, multi_class='ovr', validation_freqs=None, early_stopping_rounds=None, metrics=['auc', 'ks'], use_first_metric_only=False, use_proximal=False, mu=0.1, callback_param=CallbackParam()) special ¶

Source code in federatedml/param/logistic_regression_param.py

def __init__(self, penalty='L2',
             tol=1e-4, alpha=1.0, optimizer='rmsprop',
             batch_size=-1, learning_rate=0.01, init_param=InitParam(),
             max_iter=100, early_stop='diff',
             encrypt_param=EncryptParam(method=None), re_encrypt_batches=2,
             predict_param=PredictParam(), cv_param=CrossValidationParam(),
             decay=1, decay_sqrt=True,
             aggregate_iters=1, multi_class='ovr', validation_freqs=None,
             early_stopping_rounds=None,
             metrics=['auc', 'ks'],
             use_first_metric_only=False,
             use_proximal=False,
             mu=0.1, callback_param=CallbackParam()
             ):
    super(HomoLogisticParam, self).__init__(penalty=penalty, tol=tol, alpha=alpha, optimizer=optimizer,
                                            batch_size=batch_size,
                                            learning_rate=learning_rate,
                                            init_param=init_param, max_iter=max_iter, early_stop=early_stop,
                                            encrypt_param=encrypt_param, predict_param=predict_param,
                                            cv_param=cv_param, multi_class=multi_class,
                                            validation_freqs=validation_freqs,
                                            decay=decay, decay_sqrt=decay_sqrt,
                                            early_stopping_rounds=early_stopping_rounds,
                                            metrics=metrics, use_first_metric_only=use_first_metric_only,
                                            callback_param=callback_param)
    self.re_encrypt_batches = re_encrypt_batches
    self.aggregate_iters = aggregate_iters
    self.use_proximal = use_proximal
    self.mu = mu

check(self) ¶

Source code in federatedml/param/logistic_regression_param.py

def check(self):
    super().check()
    if type(self.re_encrypt_batches).__name__ != "int":
        raise ValueError(
            "logistic_param's re_encrypt_batches {} not supported, should be int type".format(
                self.re_encrypt_batches))
    elif self.re_encrypt_batches < 0:
        raise ValueError(
            "logistic_param's re_encrypt_batches must be greater or equal to 0")

    if not isinstance(self.aggregate_iters, int):
        raise ValueError(
            "logistic_param's aggregate_iters {} not supported, should be int type".format(
                self.aggregate_iters))

    if self.encrypt_param.method == consts.PAILLIER:
        if self.optimizer != 'sgd':
            raise ValueError("Paillier encryption mode supports 'sgd' optimizer method only.")

        if self.penalty == consts.L1_PENALTY:
            raise ValueError("Paillier encryption mode supports 'L2' penalty or None only.")

    if self.optimizer == 'sqn':
        raise ValueError("'sqn' optimizer is supported for hetero mode only.")

    return True
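
The homo-specific rules enforced by `check()` can be restated as a standalone function. This is a minimal sketch, not the library code: `check_homo_lr` is a hypothetical name, and the constant values `"Paillier"` and `"L1"` are assumed stand-ins for `consts.PAILLIER` and `consts.L1_PENALTY`. It also illustrates that the two integer checks are not equivalent: the type-name comparison rejects `bool`, while `isinstance(..., int)` accepts it.

```python
PAILLIER = "Paillier"  # assumed value of consts.PAILLIER
L1_PENALTY = "L1"      # assumed value of consts.L1_PENALTY


def check_homo_lr(optimizer, penalty, encrypt_method, re_encrypt_batches, aggregate_iters):
    """Hypothetical stand-in for the homo-specific part of HomoLogisticParam.check()."""
    # Exact type-name match: True/False are rejected even though bool subclasses int.
    if type(re_encrypt_batches).__name__ != "int":
        raise ValueError("re_encrypt_batches should be int type")
    if re_encrypt_batches < 0:
        raise ValueError("re_encrypt_batches must be greater or equal to 0")

    # isinstance() is looser: a bool passes this check.
    if not isinstance(aggregate_iters, int):
        raise ValueError("aggregate_iters should be int type")

    # Paillier mode restricts both the optimizer and the penalty.
    if encrypt_method == PAILLIER:
        if optimizer != "sgd":
            raise ValueError("Paillier encryption mode supports 'sgd' optimizer method only.")
        if penalty == L1_PENALTY:
            raise ValueError("Paillier encryption mode supports 'L2' penalty or None only.")

    # 'sqn' is rejected in homo mode regardless of encryption.
    if optimizer == "sqn":
        raise ValueError("'sqn' optimizer is supported for hetero mode only.")
    return True


print(check_homo_lr("sgd", "L2", PAILLIER, 2, 1))   # valid Paillier configuration
print(check_homo_lr("rmsprop", "L2", None, 2, 1))   # valid unencrypted configuration
```

With these rules, switching `encrypt_method` to Paillier while keeping the default `'rmsprop'` optimizer raises a `ValueError`, so the optimizer has to be changed to `'sgd'` at the same time.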


HeteroLogisticParam (LogisticParam) ¶

Source code in federatedml/param/logistic_regression_param.py

class HeteroLogisticParam(LogisticParam):
    def __init__(self, penalty='L2',
                 tol=1e-4, alpha=1.0, optimizer='rmsprop',
                 batch_size=-1, learning_rate=0.01, init_param=InitParam(),
                 max_iter=100, early_stop='diff',
                 encrypted_mode_calculator_param=EncryptedModeCalculatorParam(),
                 predict_param=PredictParam(), cv_param=CrossValidationParam(),
                 decay=1, decay_sqrt=True, sqn_param=StochasticQuasiNewtonParam(),
                 multi_class='ovr', validation_freqs=None, early_stopping_rounds=None,
                 metrics=['auc', 'ks'], floating_point_precision=23,
                 encrypt_param=EncryptParam(),
                 use_first_metric_only=False, stepwise_param=StepwiseParam(),
                 callback_param=CallbackParam()
                 ):
        super(HeteroLogisticParam, self).__init__(penalty=penalty, tol=tol, alpha=alpha, optimizer=optimizer,
                                                  batch_size=batch_size,
                                                  learning_rate=learning_rate,
                                                  init_param=init_param, max_iter=max_iter, early_stop=early_stop,
                                                  predict_param=predict_param, cv_param=cv_param,
                                                  decay=decay,
                                                  decay_sqrt=decay_sqrt, multi_class=multi_class,
                                                  validation_freqs=validation_freqs,
                                                  early_stopping_rounds=early_stopping_rounds,
                                                  metrics=metrics, floating_point_precision=floating_point_precision,
                                                  encrypt_param=encrypt_param,
                                                  use_first_metric_only=use_first_metric_only,
                                                  stepwise_param=stepwise_param,
                                                  callback_param=callback_param)
        self.encrypted_mode_calculator_param = copy.deepcopy(encrypted_mode_calculator_param)
        self.sqn_param = copy.deepcopy(sqn_param)

    def check(self):
        super().check()
        self.encrypted_mode_calculator_param.check()
        self.sqn_param.check()
        return True

__init__(self, penalty='L2', tol=0.0001, alpha=1.0, optimizer='rmsprop', batch_size=-1, learning_rate=0.01, init_param=InitParam(), max_iter=100, early_stop='diff', encrypted_mode_calculator_param=EncryptedModeCalculatorParam(), predict_param=PredictParam(), cv_param=CrossValidationParam(), decay=1, decay_sqrt=True, sqn_param=StochasticQuasiNewtonParam(), multi_class='ovr', validation_freqs=None, early_stopping_rounds=None, metrics=['auc', 'ks'], floating_point_precision=23, encrypt_param=EncryptParam(), use_first_metric_only=False, stepwise_param=StepwiseParam(), callback_param=CallbackParam()) special ¶

Source code in federatedml/param/logistic_regression_param.py

def __init__(self, penalty='L2',
             tol=1e-4, alpha=1.0, optimizer='rmsprop',
             batch_size=-1, learning_rate=0.01, init_param=InitParam(),
             max_iter=100, early_stop='diff',
             encrypted_mode_calculator_param=EncryptedModeCalculatorParam(),
             predict_param=PredictParam(), cv_param=CrossValidationParam(),
             decay=1, decay_sqrt=True, sqn_param=StochasticQuasiNewtonParam(),
             multi_class='ovr', validation_freqs=None, early_stopping_rounds=None,
             metrics=['auc', 'ks'], floating_point_precision=23,
             encrypt_param=EncryptParam(),
             use_first_metric_only=False, stepwise_param=StepwiseParam(),
             callback_param=CallbackParam()
             ):
    super(HeteroLogisticParam, self).__init__(penalty=penalty, tol=tol, alpha=alpha, optimizer=optimizer,
                                              batch_size=batch_size,
                                              learning_rate=learning_rate,
                                              init_param=init_param, max_iter=max_iter, early_stop=early_stop,
                                              predict_param=predict_param, cv_param=cv_param,
                                              decay=decay,
                                              decay_sqrt=decay_sqrt, multi_class=multi_class,
                                              validation_freqs=validation_freqs,
                                              early_stopping_rounds=early_stopping_rounds,
                                              metrics=metrics, floating_point_precision=floating_point_precision,
                                              encrypt_param=encrypt_param,
                                              use_first_metric_only=use_first_metric_only,
                                              stepwise_param=stepwise_param,
                                              callback_param=callback_param)
    self.encrypted_mode_calculator_param = copy.deepcopy(encrypted_mode_calculator_param)
    self.sqn_param = copy.deepcopy(sqn_param)

check(self) ¶

Source code in federatedml/param/logistic_regression_param.py

def check(self):
    super().check()
    self.encrypted_mode_calculator_param.check()
    self.sqn_param.check()
    return True
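
The constructor stores `copy.deepcopy(...)` of its nested parameter objects rather than the objects themselves. A likely reason is that Python evaluates default arguments once, at function definition time, so every instance would otherwise share the same default object. The sketch below shows that general Python behaviour; `SubParam`, `SharedDefault`, `CopiedDefault` and the `value` field are hypothetical names for illustration, not FATE classes.

```python
import copy


class SubParam:
    """Hypothetical nested parameter object (stands in for e.g. a sqn_param default)."""

    def __init__(self, value=5):
        self.value = value


class SharedDefault:
    def __init__(self, sub=SubParam()):  # the default is created once, at definition time
        self.sub = sub                   # every instance keeps a reference to that one object


class CopiedDefault:
    def __init__(self, sub=SubParam()):
        self.sub = copy.deepcopy(sub)    # each instance gets its own private copy


a, b = SharedDefault(), SharedDefault()
a.sub.value = 99
print(b.sub.value)  # 99: the change made through `a` leaks into `b`

c, d = CopiedDefault(), CopiedDefault()
c.sub.value = 99
print(d.sub.value)  # 5: `d` is unaffected
```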

`one_vs_rest_param` ¶

Classes¶


OneVsRestParam (BaseParam) ¶

Define the one_vs_rest parameters.

Parameters:

Name	Type	Description	Default
`has_arbiter`	`bool, default: True`	Some algorithms, such as FATE's SecureBoost, run without an arbiter; for these algorithms, set this to False.	`True`

Source code in federatedml/param/one_vs_rest_param.py

class OneVsRestParam(BaseParam):
    """
    Define the one_vs_rest parameters.

    Parameters
    ----------
    has_arbiter: bool, default: True
        Some algorithms, such as FATE's SecureBoost, run without an arbiter;
        for these algorithms, set this to False.
    """

    def __init__(self, need_one_vs_rest=False, has_arbiter=True):
        super().__init__()
        self.need_one_vs_rest = need_one_vs_rest
        self.has_arbiter = has_arbiter

    def check(self):
        if type(self.has_arbiter).__name__ != "bool":
            raise ValueError(
                "one_vs_rest param's has_arbiter {} not supported, should be bool type".format(
                    self.has_arbiter))

        LOGGER.debug("Finish one_vs_rest parameter check!")
        return True

__init__(self, need_one_vs_rest=False, has_arbiter=True) special ¶

Source code in federatedml/param/one_vs_rest_param.py

def __init__(self, need_one_vs_rest=False, has_arbiter=True):
    super().__init__()
    self.need_one_vs_rest = need_one_vs_rest
    self.has_arbiter = has_arbiter

check(self) ¶

Source code in federatedml/param/one_vs_rest_param.py

def check(self):
    if type(self.has_arbiter).__name__ != "bool":
        raise ValueError(
            "one_vs_rest param's has_arbiter {} not supported, should be bool type".format(
                self.has_arbiter))

    LOGGER.debug("Finish one_vs_rest parameter check!")
    return True
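
For readers unfamiliar with the term, one-vs-rest turns a multiclass problem into one binary problem per class. The following is a generic, plaintext sketch of that label decomposition only; `one_vs_rest_labels` is a hypothetical helper and makes no claim about how FATE's federated implementation organises the work between parties.

```python
def one_vs_rest_labels(labels):
    """Return {class: binary label list}, with 1 where the sample belongs to that class."""
    classes = sorted(set(labels))
    return {c: [1 if y == c else 0 for y in labels] for c in classes}


tasks = one_vs_rest_labels(["a", "b", "c", "a"])
for cls, binary in tasks.items():
    print(cls, binary)
# a [1, 0, 0, 1]
# b [0, 1, 0, 0]
# c [0, 0, 1, 0]
```

Each binary label list would then be used to train one binary classifier, and prediction picks the class whose classifier gives the highest score.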

`onehot_encoder_param` ¶

Classes¶


OneHotEncoderParam (BaseParam) ¶

Parameters:

Name	Type	Description	Default
`transform_col_indexes`	`list or int, default: -1`	Specify which columns need to be calculated. -1 represents all columns.	`-1`
`transform_col_names`	`list of string, default: []`	Specify which columns need to be calculated. Each element in the list represents a column name in header. `None` is converted to an empty list.	`None`
`need_run`	`bool, default True`	Indicate whether this module needs to be run.	`True`

Source code in federatedml/param/onehot_encoder_param.py

class OneHotEncoderParam(BaseParam):
    """

    Parameters
    ----------

    transform_col_indexes: list or int, default: -1
        Specify which columns need to be calculated. -1 represents all columns.

    transform_col_names : list of string, default: []
        Specify which columns need to be calculated. Each element in the list represents a column name in header.


    need_run: bool, default True
        Indicate whether this module needs to be run
    """

    def __init__(self, transform_col_indexes=-1, transform_col_names=None, need_run=True):
        super(OneHotEncoderParam, self).__init__()
        if transform_col_names is None:
            transform_col_names = []
        self.transform_col_indexes = transform_col_indexes
        self.transform_col_names = transform_col_names
        self.need_run = need_run

    def check(self):
        descr = "One-hot encoder param's"
        self.check_defined_type(self.transform_col_indexes, descr, ['list', 'int', 'NoneType'])
        self.check_defined_type(self.transform_col_names, descr, ['list', 'NoneType'])
        return True

__init__(self, transform_col_indexes=-1, transform_col_names=None, need_run=True) special ¶

Source code in federatedml/param/onehot_encoder_param.py

def __init__(self, transform_col_indexes=-1, transform_col_names=None, need_run=True):
    super(OneHotEncoderParam, self).__init__()
    if transform_col_names is None:
        transform_col_names = []
    self.transform_col_indexes = transform_col_indexes
    self.transform_col_names = transform_col_names
    self.need_run = need_run

check(self) ¶

Source code in federatedml/param/onehot_encoder_param.py

def check(self):
    descr = "One-hot encoder param's"
    self.check_defined_type(self.transform_col_indexes, descr, ['list', 'int', 'NoneType'])
    self.check_defined_type(self.transform_col_names, descr, ['list', 'NoneType'])
    return True
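
The two column selectors can be pictured as follows. This is a sketch under stated assumptions, not the module's actual logic: `resolve_transform_columns` is a hypothetical helper, and combining the index list and the name list as a union is an assumption made for illustration. Only the meaning of `-1` (all columns) comes from the parameter description.

```python
def resolve_transform_columns(header, transform_col_indexes=-1, transform_col_names=None):
    """Hypothetical helper: return the header names selected for encoding, in header order."""
    if isinstance(transform_col_indexes, int):
        if transform_col_indexes != -1:
            raise ValueError("an int other than -1 is not handled in this sketch")
        return list(header)  # -1 selects every column

    # Assumption: indexes and names are merged into one selection.
    selected = {header[i] for i in (transform_col_indexes or [])}
    selected.update(transform_col_names or [])

    unknown = selected - set(header)
    if unknown:
        raise ValueError(f"columns not in header: {sorted(unknown)}")
    return [name for name in header if name in selected]


header = ["x0", "x1", "x2"]
print(resolve_transform_columns(header))                 # ['x0', 'x1', 'x2']
print(resolve_transform_columns(header, [0], ["x2"]))    # ['x0', 'x2']
print(resolve_transform_columns(header, None, ["x1"]))   # ['x1']
```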

`pearson_param` ¶

Classes¶


PearsonParam (BaseParam) ¶

param for pearson correlation

Parameters:

Name	Type	Description	Default
`column_names`	`list of string`	list of column names	`None`
`column_indexes`	`list of int`	list of column indexes; the integer -1 is the only non-list value accepted by `check()`	`None`
`cross_parties`	`bool, default: True`	if True, calculate correlation of columns from both parties	`True`
`need_run`	`bool`	set False to skip this party	`True`
`use_mix_rand`	`bool, default: False`	mix system random and pseudo random for quicker calculation	`False`
`calc_local_vif`	`bool, default True`	calculate VIF for local columns	`True`

Source code in federatedml/param/pearson_param.py

class PearsonParam(BaseParam):
    """
    param for pearson correlation

    Parameters
    ----------

    column_names : list of string
        list of column names

    column_indexes : list of int
        list of column indexes

    cross_parties : bool, default: True
        if True, calculate correlation of columns from both parties

    need_run : bool
        set False to skip this party

    use_mix_rand : bool, default: False
        mix system random and pseudo random for quicker calculation

    calc_local_vif : bool, default True
        calculate VIF for columns in local
    """
    def __init__(
        self,
        column_names=None,
        column_indexes=None,
        cross_parties=True,
        need_run=True,
        use_mix_rand=False,
        calc_local_vif=True,
    ):
        super().__init__()
        self.column_names = column_names
        self.column_indexes = column_indexes
        self.cross_parties = cross_parties
        self.need_run = need_run
        self.use_mix_rand = use_mix_rand
        if column_names is None:
            self.column_names = []
        if column_indexes is None:
            self.column_indexes = []
        self.calc_local_vif = calc_local_vif

    def check(self):
        if not isinstance(self.use_mix_rand, bool):
            raise ValueError(
                f"use_mix_rand accept bool type only, {type(self.use_mix_rand)} got"
            )
        if self.cross_parties and (not self.need_run):
            raise ValueError(
                f"need_run should be True(which is default) when cross_parties is True."
            )
        if not isinstance(self.column_names, list):
            raise ValueError(
                f"type mismatch, column_names with type {type(self.column_names)}"
            )
        for name in self.column_names:
            if not isinstance(name, str):
                raise ValueError(
                    f"type mismatch, column_names with element {name}(type is {type(name)})"
                )

        if isinstance(self.column_indexes, list):
            for idx in self.column_indexes:
                if not isinstance(idx, int):
                    raise ValueError(
                        f"type mismatch, column_indexes with element {idx}(type is {type(idx)})"
                    )

        if isinstance(self.column_indexes, int) and self.column_indexes != -1:
            raise ValueError(
                f"column_indexes with type int and value {self.column_indexes}(only -1 allowed)"
            )

        if self.need_run:
            if isinstance(self.column_indexes, list) and isinstance(
                self.column_names, list
            ):
                if len(self.column_indexes) == 0 and len(self.column_names) == 0:
                    raise ValueError(f"provide at least one column")

__init__(self, column_names=None, column_indexes=None, cross_parties=True, need_run=True, use_mix_rand=False, calc_local_vif=True) special ¶

Source code in federatedml/param/pearson_param.py

def __init__(
    self,
    column_names=None,
    column_indexes=None,
    cross_parties=True,
    need_run=True,
    use_mix_rand=False,
    calc_local_vif=True,
):
    super().__init__()
    self.column_names = column_names
    self.column_indexes = column_indexes
    self.cross_parties = cross_parties
    self.need_run = need_run
    self.use_mix_rand = use_mix_rand
    if column_names is None:
        self.column_names = []
    if column_indexes is None:
        self.column_indexes = []
    self.calc_local_vif = calc_local_vif

check(self) ¶

Source code in federatedml/param/pearson_param.py

def check(self):
    if not isinstance(self.use_mix_rand, bool):
        raise ValueError(
            f"use_mix_rand accept bool type only, {type(self.use_mix_rand)} got"
        )
    if self.cross_parties and (not self.need_run):
        raise ValueError(
            f"need_run should be True(which is default) when cross_parties is True."
        )
    if not isinstance(self.column_names, list):
        raise ValueError(
            f"type mismatch, column_names with type {type(self.column_names)}"
        )
    for name in self.column_names:
        if not isinstance(name, str):
            raise ValueError(
                f"type mismatch, column_names with element {name}(type is {type(name)})"
            )

    if isinstance(self.column_indexes, list):
        for idx in self.column_indexes:
            if not isinstance(idx, int):
                raise ValueError(
                    f"type mismatch, column_indexes with element {idx}(type is {type(idx)})"
                )

    if isinstance(self.column_indexes, int) and self.column_indexes != -1:
        raise ValueError(
            f"column_indexes with type int and value {self.column_indexes}(only -1 allowed)"
        )

    if self.need_run:
        if isinstance(self.column_indexes, list) and isinstance(
            self.column_names, list
        ):
            if len(self.column_indexes) == 0 and len(self.column_names) == 0:
                raise ValueError(f"provide at least one column")
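
As a reminder of the quantities this component reports, the sketch below computes a Pearson correlation coefficient and the variance inflation factor (VIF) in plaintext for the two-column case, where the VIF reduces to 1 / (1 - r^2). It is a local, unencrypted illustration only; `pearson` and `vif_two_columns` are hypothetical helper names, and the federated protocol used when `cross_parties` is True is not shown.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (norm_x * norm_y)


def vif_two_columns(r):
    """VIF of either column when there are exactly two columns with correlation r."""
    return 1.0 / (1.0 - r * r)


r = pearson([1, 2, 3, 4], [1, 3, 2, 4])
print(round(r, 6))                   # 0.8
print(round(vif_two_columns(r), 6))  # 2.777778
```

With more than two columns, the VIF values are the diagonal entries of the inverse of the correlation matrix, which requires a matrix inversion rather than this closed form.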

`poisson_regression_param` ¶

Classes¶


PoissonParam (BaseParam) ¶

Parameters used for Poisson Regression.

Parameters:

Name	Type	Description	Default
`penalty`	`{'L2', 'L1'}, default: 'L2'`	Penalty method used in Poisson. Please note that, when using encrypted version in HeteroPoisson, 'L1' is not supported.	`'L2'`
`tol`	`float, default: 1e-4`	The tolerance of convergence	`0.0001`
`alpha`	`float, default: 1.0`	Regularization strength coefficient.	`1.0`
`optimizer`	`{'rmsprop', 'sgd', 'adam', 'adagrad'}, default: 'rmsprop'`	Optimize method. `check()` also accepts 'nesterov_momentum_sgd'.	`'rmsprop'`
`batch_size`	`int, default: -1`	Batch size when updating model. -1 means use all data in a batch. i.e. Not to use mini-batch strategy.	`-1`
`learning_rate`	`float, default: 0.01`	Learning rate	`0.01`
`max_iter`	`int, default: 20`	The maximum iteration for training.	`20`
`init_param`	`InitParam object, default: default InitParam object`	Init param method object.	`InitParam()`
`early_stop`	`str, 'weight_diff', 'diff' or 'abs', default: 'diff'`	Method used to judge convergence. a) diff: use the difference of loss between two iterations. b) weight_diff: use the difference between the weights of two consecutive iterations. c) abs: use the absolute value of loss, i.e. the model has converged if loss < eps.	`'diff'`
`exposure_colname`	`str or None, default: None`	Name of optional exposure variable in dTable.	`None`
`predict_param`	`PredictParam object, default: default PredictParam object`	predict param	`PredictParam()`
`encrypt_param`	`EncryptParam object, default: default EncryptParam object`	encrypt param	`EncryptParam()`
`encrypted_mode_calculator_param`	`EncryptedModeCalculatorParam object, default: default EncryptedModeCalculatorParam object`	encrypted mode calculator param	`EncryptedModeCalculatorParam()`
`cv_param`	`CrossValidationParam object, default: default CrossValidationParam object`	cv param	`CrossValidationParam()`
`stepwise_param`	`StepwiseParam object, default: default StepwiseParam object`	stepwise param	`StepwiseParam()`
`decay`	`int or float, default: 1`	Decay rate for learning rate. The learning rate follows this decay schedule: `lr = lr0 / (1 + decay * t)` if decay_sqrt is False; if decay_sqrt is True, `lr = lr0 / sqrt(1 + decay * t)`, where t is the iter number.	`1`
`decay_sqrt`	`bool, default: True`	`lr = lr0 / (1 + decay * t)` if decay_sqrt is False, otherwise `lr = lr0 / sqrt(1 + decay * t)`	`True`
`validation_freqs`	`int, list, tuple, set, or None`	Validation frequency during training, required when using early stopping. The default value is None; 1 is suggested. You can set it to a number larger than 1 to speed up training by skipping validation rounds. When it is larger than 1, a divisor of max_iter is recommended; otherwise the validation scores of the last training iteration are missed.	`None`
`early_stopping_rounds`	`int, default: None`	If a positive number is specified, the program checks the early stopping criteria every that many training rounds. validation_freqs must also be set when using early stopping.	`None`
`metrics`	`list or None, default: None`	Specify which metrics to use when performing evaluation during the training process. If the metrics have not improved within early_stopping_rounds, training stops before convergence. If set as empty, default metrics will be used. For regression tasks, default metrics are ['root_mean_squared_error', 'mean_absolute_error']	`None`
`use_first_metric_only`	`bool, default: False`	Indicate whether to use the first metric in `metrics` as the only criterion for early stopping judgement.	`False`
`floating_point_precision`	`None or integer`	if not None, use floating_point_precision-bit to speed up calculation, e.g.: convert an x to `round(x * 2**floating_point_precision)` during Paillier operation, divide the result by `2**floating_point_precision` in the end.	`23`
`callback_param`	`CallbackParam object`	callback param	`CallbackParam()`
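
Two rows of the table are defined by formulas: the learning rate decay schedule (`decay`, `decay_sqrt`) and the fixed-point conversion controlled by `floating_point_precision`. The sketch below writes both out in plain Python. The function names are hypothetical, and the fixed-point part shows only the scaling arithmetic from the description, not how the library applies it inside Paillier operations.

```python
import math


def decayed_learning_rate(lr0, decay, t, decay_sqrt=True):
    """Learning rate at iteration t, following the schedule in the table."""
    if decay_sqrt:
        return lr0 / math.sqrt(1 + decay * t)
    return lr0 / (1 + decay * t)


def to_fixed_point(x, precision=23):
    """Scale x by 2**precision and round to an integer."""
    return round(x * 2 ** precision)


def from_fixed_point(n, precision=23):
    """Undo the scaling by dividing by 2**precision."""
    return n / 2 ** precision


# Defaults from the table: learning_rate=0.01, decay=1, decay_sqrt=True.
for t in (0, 3, 8):
    print(t, decayed_learning_rate(0.01, 1, t))

encoded = to_fixed_point(0.1)
print(encoded, from_fixed_point(encoded))  # the round trip is accurate to about 2**-24
```

With `decay_sqrt=True` the rate shrinks more slowly than with the plain inverse schedule, for example by a factor of 2 rather than 4 at `t = 3` when `decay = 1`.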

Source code in federatedml/param/poisson_regression_param.py

class PoissonParam(BaseParam):
    """
    Parameters used for Poisson Regression.

    Parameters
    ----------
    penalty : {'L2', 'L1'}, default: 'L2'
        Penalty method used in Poisson. Please note that, when using encrypted version in HeteroPoisson,
        'L1' is not supported.

    tol : float, default: 1e-4
        The tolerance of convergence

    alpha : float, default: 1.0
        Regularization strength coefficient.

    optimizer : {'rmsprop', 'sgd', 'adam', 'adagrad'}, default: 'rmsprop'
        Optimize method

    batch_size : int, default: -1
        Batch size when updating model. -1 means use all data in a batch. i.e. Not to use mini-batch strategy.

    learning_rate : float, default: 0.01
        Learning rate

    max_iter : int, default: 20
        The maximum iteration for training.

    init_param: InitParam object, default: default InitParam object
        Init param method object.

    early_stop : str, 'weight_diff', 'diff' or 'abs', default: 'diff'
        Method used to judge convergence.
            a)  diff: Use difference of loss between two iterations to judge whether converge.
            b)  weight_diff: Use difference between weights of two consecutive iterations
            c)  abs: Use the absolute value of loss to judge whether converge. i.e. if loss < eps, it is converged.

    exposure_colname: str or None, default: None
        Name of optional exposure variable in dTable.

    predict_param: PredictParam object, default: default PredictParam object
        predict param

    encrypt_param: EncryptParam object, default: default EncryptParam object
        encrypt param

    encrypted_mode_calculator_param: EncryptedModeCalculatorParam object, default: default EncryptedModeCalculatorParam object
        encrypted mode calculator param

    cv_param: CrossValidationParam object, default: default CrossValidationParam object
        cv param

    stepwise_param: StepwiseParam object, default: default StepwiseParam object
        stepwise param

    decay: int or float, default: 1
        Decay rate for learning rate. learning rate will follow the following decay schedule.
        lr = lr0/(1+decay*t) if decay_sqrt is False. If decay_sqrt is True, lr = lr0 / sqrt(1+decay*t)
        where t is the iter number.

    decay_sqrt: bool, default: True
        lr = lr0/(1+decay*t) if decay_sqrt is False, otherwise, lr = lr0 / sqrt(1+decay*t)

    validation_freqs: int, list, tuple, set, or None
        validation frequency during training, required when using early stopping.
        The default value is None, 1 is suggested. You can set it to a number larger than 1 in order to speed up training by skipping validation rounds.
        When it is larger than 1, a number which is divisible by "max_iter" is recommended, otherwise, you will miss the validation scores of the last training iteration.

    early_stopping_rounds: int, default: None
        If positive number specified, at every specified training rounds, program checks for early stopping criteria.
        Validation_freqs must also be set when using early stopping.

    metrics: list or None, default: None
        Specify which metrics to be used when performing evaluation during training process. If metrics have not improved at early_stopping rounds, training stops before convergence.
        If set as empty, default metrics will be used. For regression tasks, default metrics are ['root_mean_squared_error', 'mean_absolute_error']

    use_first_metric_only: bool, default: False
        Indicate whether to use the first metric in `metrics` as the only criterion for early stopping judgement.

    floating_point_precision: None or integer
        if not None, use floating_point_precision-bit to speed up calculation,
        e.g.: convert an x to round(x * 2**floating_point_precision) during Paillier operation, divide
                the result by 2**floating_point_precision in the end.

    callback_param: CallbackParam object
        callback param

    """

    def __init__(self, penalty='L2',
                 tol=1e-4, alpha=1.0, optimizer='rmsprop',
                 batch_size=-1, learning_rate=0.01, init_param=InitParam(),
                 max_iter=20, early_stop='diff',
                 exposure_colname = None, predict_param=PredictParam(),
                 encrypt_param=EncryptParam(),
                 encrypted_mode_calculator_param=EncryptedModeCalculatorParam(),
                 cv_param=CrossValidationParam(), stepwise_param=StepwiseParam(),
                 decay=1, decay_sqrt=True,
                 validation_freqs=None, early_stopping_rounds=None, metrics=None, use_first_metric_only=False,
                 floating_point_precision=23, callback_param=CallbackParam()):
        super(PoissonParam, self).__init__()
        self.penalty = penalty
        self.tol = tol
        self.alpha = alpha
        self.optimizer = optimizer
        self.batch_size = batch_size
        self.learning_rate = learning_rate
        self.init_param = copy.deepcopy(init_param)

        self.max_iter = max_iter
        self.early_stop = early_stop
        self.encrypt_param = encrypt_param
        self.encrypted_mode_calculator_param = copy.deepcopy(encrypted_mode_calculator_param)
        self.cv_param = copy.deepcopy(cv_param)
        self.predict_param = copy.deepcopy(predict_param)
        self.decay = decay
        self.decay_sqrt = decay_sqrt
        self.exposure_colname = exposure_colname
        self.validation_freqs = validation_freqs
        self.stepwise_param = stepwise_param
        self.early_stopping_rounds = early_stopping_rounds
        self.metrics = metrics or []
        self.use_first_metric_only = use_first_metric_only
        self.floating_point_precision = floating_point_precision
        self.callback_param = copy.deepcopy(callback_param)

    def check(self):
        descr = "poisson_regression_param's "

        if self.penalty is None:
            self.penalty = 'NONE'
        elif type(self.penalty).__name__ != "str":
            raise ValueError(
                descr + "penalty {} not supported, should be str type".format(self.penalty))

        self.penalty = self.penalty.upper()
        if self.penalty not in ['L1', 'L2', 'NONE']:
            raise ValueError(
                "penalty {} not supported, penalty should be 'L1', 'L2' or 'none'".format(self.penalty))

        if type(self.tol).__name__ not in ["int", "float"]:
            raise ValueError(
                descr + "tol {} not supported, should be float type".format(self.tol))

        if type(self.alpha).__name__ not in ["int", "float"]:
            raise ValueError(
                descr + "alpha {} not supported, should be float type".format(self.alpha))

        if type(self.optimizer).__name__ != "str":
            raise ValueError(
                descr + "optimizer {} not supported, should be str type".format(self.optimizer))
        else:
            self.optimizer = self.optimizer.lower()
            if self.optimizer not in ['sgd', 'rmsprop', 'adam', 'adagrad', 'nesterov_momentum_sgd']:
                raise ValueError(
                    descr + "optimizer not supported, optimizer should be"
                    " 'sgd', 'rmsprop', 'adam', 'adagrad' or 'nesterov_momentum_sgd'")

        if type(self.batch_size).__name__ not in ["int", "long"]:
            raise ValueError(
                descr + "batch_size {} not supported, should be int type".format(self.batch_size))
        if self.batch_size != -1:
            if type(self.batch_size).__name__ not in ["int", "long"] \
                or self.batch_size < consts.MIN_BATCH_SIZE:
                raise ValueError(descr + "batch_size {} not supported, should be no less than {} or "
                                         "-1, which represents all data".format(self.batch_size, consts.MIN_BATCH_SIZE))

        if type(self.learning_rate).__name__ not in ["int", "float"]:
            raise ValueError(
                descr + "learning_rate {} not supported, should be float type".format(
                    self.learning_rate))

        self.init_param.check()
        if self.encrypt_param.method != consts.PAILLIER:
            raise ValueError(
                descr + "encrypt method supports 'Paillier' only")

        if type(self.max_iter).__name__ != "int":
            raise ValueError(
                descr + "max_iter {} not supported, should be int type".format(self.max_iter))
        elif self.max_iter <= 0:
            raise ValueError(
                descr + "max_iter must be greater than or equal to 1")

        if self.exposure_colname is not None:
            if type(self.exposure_colname).__name__ != "str":
                raise ValueError(
                    descr + "exposure_colname {} not supported, should be string type".format(self.exposure_colname))

        if type(self.early_stop).__name__ != "str":
            raise ValueError(
                descr + "early_stop {} not supported, should be str type".format(
                    self.early_stop))
        else:
            self.early_stop = self.early_stop.lower()
            if self.early_stop not in ['diff', 'abs', 'weight_diff']:
                raise ValueError(
                    descr + "early_stop not supported, early_stop should be"
                    " 'diff', 'abs' or 'weight_diff'")

        self.encrypt_param.check()
        if self.encrypt_param.method != consts.PAILLIER:
            raise ValueError(
                descr + "encrypt method supports 'Paillier' only"
            )

        self.encrypted_mode_calculator_param.check()
        if type(self.decay).__name__ not in ["int", "float"]:
            raise ValueError(
                descr + "decay {} not supported, should be 'int' or 'float'".format(self.decay)
            )
        if type(self.decay_sqrt).__name__ not in ["bool"]:
            raise ValueError(
                descr + "decay_sqrt {} not supported, should be 'bool'".format(self.decay_sqrt)
            )

        self.stepwise_param.check()

        for p in ["early_stopping_rounds", "validation_freqs", "metrics",
                  "use_first_metric_only"]:
            if self._warn_to_deprecate_param(p, "", ""):
                if "callback_param" in self.get_user_feeded():
                    raise ValueError(f"{p} and callback param should not be set simultaneously, "
                                     f"{self._deprecated_params_set}, {self.get_user_feeded()}")
                else:
                    self.callback_param.callbacks = ["PerformanceEvaluate"]
                break

        if self._warn_to_deprecate_param("validation_freqs", descr, "callback_param's 'validation_freqs'"):
            self.callback_param.validation_freqs = self.validation_freqs

        if self._warn_to_deprecate_param("early_stopping_rounds", descr, "callback_param's 'early_stopping_rounds'"):
            self.callback_param.early_stopping_rounds = self.early_stopping_rounds

        if self._warn_to_deprecate_param("metrics", descr, "callback_param's 'metrics'"):
            self.callback_param.metrics = self.metrics

        if self._warn_to_deprecate_param("use_first_metric_only", descr, "callback_param's 'use_first_metric_only'"):
            self.callback_param.use_first_metric_only = self.use_first_metric_only

        if self.floating_point_precision is not None and \
                (not isinstance(self.floating_point_precision, int) or
                 self.floating_point_precision < 0 or self.floating_point_precision > 64):
            raise ValueError("floating point precision should be null or an integer between 0 and 64")
        self.callback_param.check()

        return True

__init__(self, penalty='L2', tol=0.0001, alpha=1.0, optimizer='rmsprop', batch_size=-1, learning_rate=0.01, init_param=InitParam(), max_iter=20, early_stop='diff', exposure_colname=None, predict_param=PredictParam(), encrypt_param=EncryptParam(), encrypted_mode_calculator_param=EncryptedModeCalculatorParam(), cv_param=CrossValidationParam(), stepwise_param=StepwiseParam(), decay=1, decay_sqrt=True, validation_freqs=None, early_stopping_rounds=None, metrics=None, use_first_metric_only=False, floating_point_precision=23, callback_param=CallbackParam()) special ¶

Source code in federatedml/param/poisson_regression_param.py

def __init__(self, penalty='L2',
             tol=1e-4, alpha=1.0, optimizer='rmsprop',
             batch_size=-1, learning_rate=0.01, init_param=InitParam(),
             max_iter=20, early_stop='diff',
             exposure_colname=None, predict_param=PredictParam(),
             encrypt_param=EncryptParam(),
             encrypted_mode_calculator_param=EncryptedModeCalculatorParam(),
             cv_param=CrossValidationParam(), stepwise_param=StepwiseParam(),
             decay=1, decay_sqrt=True,
             validation_freqs=None, early_stopping_rounds=None, metrics=None, use_first_metric_only=False,
             floating_point_precision=23, callback_param=CallbackParam()):
    super(PoissonParam, self).__init__()
    self.penalty = penalty
    self.tol = tol
    self.alpha = alpha
    self.optimizer = optimizer
    self.batch_size = batch_size
    self.learning_rate = learning_rate
    self.init_param = copy.deepcopy(init_param)

    self.max_iter = max_iter
    self.early_stop = early_stop
    self.encrypt_param = encrypt_param
    self.encrypted_mode_calculator_param = copy.deepcopy(encrypted_mode_calculator_param)
    self.cv_param = copy.deepcopy(cv_param)
    self.predict_param = copy.deepcopy(predict_param)
    self.decay = decay
    self.decay_sqrt = decay_sqrt
    self.exposure_colname = exposure_colname
    self.validation_freqs = validation_freqs
    self.stepwise_param = stepwise_param
    self.early_stopping_rounds = early_stopping_rounds
    self.metrics = metrics or []
    self.use_first_metric_only = use_first_metric_only
    self.floating_point_precision = floating_point_precision
    self.callback_param = copy.deepcopy(callback_param)

check(self) ¶

Source code in federatedml/param/poisson_regression_param.py

def check(self):
    descr = "poisson_regression_param's "

    if self.penalty is None:
        self.penalty = 'NONE'
    elif type(self.penalty).__name__ != "str":
        raise ValueError(
            descr + "penalty {} not supported, should be str type".format(self.penalty))

    self.penalty = self.penalty.upper()
    if self.penalty not in ['L1', 'L2', 'NONE']:
        raise ValueError(
            "penalty {} not supported, penalty should be 'L1', 'L2' or 'none'".format(self.penalty))

    if type(self.tol).__name__ not in ["int", "float"]:
        raise ValueError(
            descr + "tol {} not supported, should be float type".format(self.tol))

    if type(self.alpha).__name__ not in ["int", "float"]:
        raise ValueError(
            descr + "alpha {} not supported, should be float type".format(self.alpha))

    if type(self.optimizer).__name__ != "str":
        raise ValueError(
            descr + "optimizer {} not supported, should be str type".format(self.optimizer))
    else:
        self.optimizer = self.optimizer.lower()
        if self.optimizer not in ['sgd', 'rmsprop', 'adam', 'adagrad', 'nesterov_momentum_sgd']:
            raise ValueError(
                descr + "optimizer not supported, optimizer should be"
                " 'sgd', 'rmsprop', 'adam', 'adagrad' or 'nesterov_momentum_sgd'")

    if type(self.batch_size).__name__ not in ["int", "long"]:
        raise ValueError(
            descr + "batch_size {} not supported, should be int type".format(self.batch_size))
    if self.batch_size != -1:
        if type(self.batch_size).__name__ not in ["int", "long"] \
            or self.batch_size < consts.MIN_BATCH_SIZE:
            raise ValueError(descr + "batch_size {} not supported, should be no less than {} or "
                                     "-1, which represents all data".format(self.batch_size, consts.MIN_BATCH_SIZE))

    if type(self.learning_rate).__name__ not in ["int", "float"]:
        raise ValueError(
            descr + "learning_rate {} not supported, should be float type".format(
                self.learning_rate))

    self.init_param.check()
    if self.encrypt_param.method != consts.PAILLIER:
        raise ValueError(
            descr + "encrypt method supports 'Paillier' only")

    if type(self.max_iter).__name__ != "int":
        raise ValueError(
            descr + "max_iter {} not supported, should be int type".format(self.max_iter))
    elif self.max_iter <= 0:
        raise ValueError(
            descr + "max_iter must be greater than or equal to 1")

    if self.exposure_colname is not None:
        if type(self.exposure_colname).__name__ != "str":
            raise ValueError(
                descr + "exposure_colname {} not supported, should be string type".format(self.exposure_colname))

    if type(self.early_stop).__name__ != "str":
        raise ValueError(
            descr + "early_stop {} not supported, should be str type".format(
                self.early_stop))
    else:
        self.early_stop = self.early_stop.lower()
        if self.early_stop not in ['diff', 'abs', 'weight_diff']:
            raise ValueError(
                descr + "early_stop not supported, early_stop should be"
                " 'diff', 'abs' or 'weight_diff'")

    self.encrypt_param.check()
    if self.encrypt_param.method != consts.PAILLIER:
        raise ValueError(
            descr + "encrypt method supports 'Paillier' only"
        )

    self.encrypted_mode_calculator_param.check()
    if type(self.decay).__name__ not in ["int", "float"]:
        raise ValueError(
            descr + "decay {} not supported, should be 'int' or 'float'".format(self.decay)
        )
    if type(self.decay_sqrt).__name__ not in ["bool"]:
        raise ValueError(
            descr + "decay_sqrt {} not supported, should be 'bool'".format(self.decay_sqrt)
        )

    self.stepwise_param.check()

    for p in ["early_stopping_rounds", "validation_freqs", "metrics",
              "use_first_metric_only"]:
        if self._warn_to_deprecate_param(p, "", ""):
            if "callback_param" in self.get_user_feeded():
                raise ValueError(f"{p} and callback param should not be set simultaneously, "
                                 f"{self._deprecated_params_set}, {self.get_user_feeded()}")
            else:
                self.callback_param.callbacks = ["PerformanceEvaluate"]
            break

    if self._warn_to_deprecate_param("validation_freqs", descr, "callback_param's 'validation_freqs'"):
        self.callback_param.validation_freqs = self.validation_freqs

    if self._warn_to_deprecate_param("early_stopping_rounds", descr, "callback_param's 'early_stopping_rounds'"):
        self.callback_param.early_stopping_rounds = self.early_stopping_rounds

    if self._warn_to_deprecate_param("metrics", descr, "callback_param's 'metrics'"):
        self.callback_param.metrics = self.metrics

    if self._warn_to_deprecate_param("use_first_metric_only", descr, "callback_param's 'use_first_metric_only'"):
        self.callback_param.use_first_metric_only = self.use_first_metric_only

    if self.floating_point_precision is not None and \
            (not isinstance(self.floating_point_precision, int) or
             self.floating_point_precision < 0 or self.floating_point_precision > 64):
        raise ValueError("floating point precision should be null or an integer between 0 and 64")
    self.callback_param.check()

    return True
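The normalization rules in `check` can be tried out without a FATE installation. The following is a minimal standalone sketch, not the library's code: `normalize_poisson_config` is a hypothetical helper, and `MIN_BATCH_SIZE = 10` is an assumed stand-in for `consts.MIN_BATCH_SIZE`, whose real value is not shown on this page. It mirrors only the penalty, optimizer, early_stop, batch_size and floating_point_precision checks.

```python
# Assumed stand-in for consts.MIN_BATCH_SIZE; the real value is not shown on this page.
MIN_BATCH_SIZE = 10

PENALTIES = ("L1", "L2", "NONE")
OPTIMIZERS = ("sgd", "rmsprop", "adam", "adagrad", "nesterov_momentum_sgd")
EARLY_STOPS = ("diff", "abs", "weight_diff")


def normalize_poisson_config(cfg):
    """Return a normalized copy of cfg or raise ValueError (hypothetical helper, not FATE API)."""
    out = dict(cfg)

    # penalty: None becomes 'NONE'; strings are upper-cased before the membership test.
    penalty = out.get("penalty", "L2")
    if penalty is None:
        penalty = "NONE"
    elif not isinstance(penalty, str):
        raise ValueError("penalty should be str type")
    out["penalty"] = penalty.upper()
    if out["penalty"] not in PENALTIES:
        raise ValueError("penalty should be 'L1', 'L2' or 'none'")

    # optimizer and early_stop: strings are lower-cased before the membership test.
    for name, allowed, default in (("optimizer", OPTIMIZERS, "rmsprop"),
                                   ("early_stop", EARLY_STOPS, "diff")):
        value = out.get(name, default)
        if not isinstance(value, str):
            raise ValueError(name + " should be str type")
        out[name] = value.lower()
        if out[name] not in allowed:
            raise ValueError(name + " should be one of " + ", ".join(allowed))

    # batch_size: -1 means all data, otherwise it must reach the minimum batch size.
    batch_size = out.get("batch_size", -1)
    if type(batch_size) is not int or (batch_size != -1 and batch_size < MIN_BATCH_SIZE):
        raise ValueError("batch_size should be -1 or no less than {}".format(MIN_BATCH_SIZE))
    out["batch_size"] = batch_size

    # floating_point_precision: None, or an integer in [0, 64].
    precision = out.get("floating_point_precision", 23)
    if precision is not None and (not isinstance(precision, int) or not 0 <= precision <= 64):
        raise ValueError("floating_point_precision should be None or an integer between 0 and 64")
    out["floating_point_precision"] = precision
    return out


if __name__ == "__main__":
    print(normalize_poisson_config({"penalty": None, "optimizer": "RMSProp"}))
```

Case-insensitive input is accepted because the values are normalized before they are compared, which is why `'RMSProp'` and `'l2'` pass.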

`predict_param` ¶

Classes¶


PredictParam            (BaseParam)

¶

Define the predict method of HomoLR, HeteroLR, SecureBoosting

Parameters:

Name	Type	Description	Default
`threshold`	`float or int`	The threshold used to separate the positive and negative classes. Normally it should lie in (0, 1)	`0.5`

Source code in federatedml/param/predict_param.py

class PredictParam(BaseParam):
    """
    Define the predict method of HomoLR, HeteroLR, SecureBoosting

    Parameters
    ----------

    threshold: float or int
        The threshold used to separate the positive and negative classes. Normally it should lie in (0, 1)
    """

    def __init__(self, threshold=0.5):
        self.threshold = threshold

    def check(self):

        if type(self.threshold).__name__ not in ["float", "int"]:
            raise ValueError("predict param's threshold {} not supported, should be float or int".format(
                self.threshold))

        LOGGER.debug("Finish predict parameter check!")
        return True

__init__(self, threshold=0.5) special ¶

Source code in federatedml/param/predict_param.py

def __init__(self, threshold=0.5):
    self.threshold = threshold

check(self) ¶

Source code in federatedml/param/predict_param.py

def check(self):

    if type(self.threshold).__name__ not in ["float", "int"]:
        raise ValueError("predict param's threshold {} not supported, should be float or int".format(
            self.threshold))

    LOGGER.debug("Finish predict parameter check!")
    return True
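As an illustration of what `threshold` controls, here is a minimal sketch that turns predicted probabilities into binary labels. The function `classify` is hypothetical, and the strict `>` comparison is an assumption; this page does not show how the models treat a score exactly equal to the threshold.

```python
def classify(scores, threshold=0.5):
    """Map predicted probabilities to 0/1 labels (hypothetical helper, not FATE API)."""
    # Same type rule as PredictParam.check: only float or int thresholds are accepted.
    if type(threshold).__name__ not in ["float", "int"]:
        raise ValueError("threshold {} not supported, should be float or int".format(threshold))
    # Assumption: a score strictly above the threshold is the positive class.
    return [1 if score > threshold else 0 for score in scores]


if __name__ == "__main__":
    print(classify([0.2, 0.5, 0.9]))
    print(classify([0.2, 0.5, 0.9], threshold=0.1))
```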

`psi_param` ¶


PSIParam            (BaseParam)

¶

Source code in federatedml/param/psi_param.py

class PSIParam(BaseParam):

    def __init__(self, max_bin_num=20, need_run=True, dense_missing_val=None,
                 binning_error=consts.DEFAULT_RELATIVE_ERROR):
        super(PSIParam, self).__init__()
        self.max_bin_num = max_bin_num
        self.need_run = need_run
        self.dense_missing_val = dense_missing_val
        self.binning_error = binning_error

    def check(self):
        assert type(self.max_bin_num) == int and self.max_bin_num > 0, 'max bin must be an integer larger than 0'
        assert type(self.need_run) == bool

        if self.dense_missing_val is not None:
            assert type(self.dense_missing_val) == str or type(self.dense_missing_val) == int or \
                   type(self.dense_missing_val) == float, \
                   'missing value type {} not supported'.format(type(self.dense_missing_val))

        self.check_decimal_float(self.binning_error, "psi's param")

__init__(self, max_bin_num=20, need_run=True, dense_missing_val=None, binning_error=0.0001) special ¶

Source code in federatedml/param/psi_param.py

def __init__(self, max_bin_num=20, need_run=True, dense_missing_val=None,
             binning_error=consts.DEFAULT_RELATIVE_ERROR):
    super(PSIParam, self).__init__()
    self.max_bin_num = max_bin_num
    self.need_run = need_run
    self.dense_missing_val = dense_missing_val
    self.binning_error = binning_error

check(self) ¶

Source code in federatedml/param/psi_param.py

def check(self):
    assert type(self.max_bin_num) == int and self.max_bin_num > 0, 'max bin must be an integer larger than 0'
    assert type(self.need_run) == bool

    if self.dense_missing_val is not None:
        assert type(self.dense_missing_val) == str or type(self.dense_missing_val) == int or \
               type(self.dense_missing_val) == float, \
               'missing value type {} not supported'.format(type(self.dense_missing_val))

    self.check_decimal_float(self.binning_error, "psi's param")
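To show what `max_bin_num` governs, the sketch below computes a population stability index (PSI) between an expected and an actual sample. It is a simplified single-party illustration under stated assumptions: the helper names are hypothetical, split points are taken from plain quantiles of the expected sample, and empty bins are clipped to a small `eps`. The federated module's handling of `binning_error` and `dense_missing_val` is not modeled.

```python
import bisect
import math


def quantile_split_points(values, max_bin_num):
    """Pick up to max_bin_num - 1 increasing split points from the sorted sample."""
    ordered = sorted(values)
    n = len(ordered)
    points = []
    for i in range(1, max_bin_num):
        candidate = ordered[min(n - 1, (i * n) // max_bin_num)]
        if not points or candidate > points[-1]:
            points.append(candidate)
    return points


def bin_fractions(values, split_points):
    """Fraction of values falling into each bin defined by split_points."""
    counts = [0] * (len(split_points) + 1)
    for v in values:
        counts[bisect.bisect_right(split_points, v)] += 1
    return [c / len(values) for c in counts]


def psi(expected, actual, max_bin_num=20, eps=1e-6):
    """PSI = sum over bins of (a_i - e_i) * ln(a_i / e_i), with empty bins clipped to eps."""
    points = quantile_split_points(expected, max_bin_num)
    total = 0.0
    for e, a in zip(bin_fractions(expected, points), bin_fractions(actual, points)):
        e, a = max(e, eps), max(a, eps)
        total += (a - e) * math.log(a / e)
    return total


if __name__ == "__main__":
    base = list(range(100))
    shifted = [v + 30 for v in base]
    print(psi(base, base, max_bin_num=10), psi(base, shifted, max_bin_num=10))
```

Identical distributions give a PSI of zero, and every bin contributes a non-negative term, so larger values indicate a larger shift.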

`rsa_param` ¶

Classes¶


RsaParam            (BaseParam)

¶

Define the RSA parameters

Parameters:

Name	Type	Description	Default
`rsa_key_n`	`integer`	RSA modulus, default: None	`None`
`rsa_key_e`	`integer`	RSA public exponent, default: None	`None`
`rsa_key_d`	`integer`	RSA private exponent, default: None	`None`
`save_out_table_namespace`	`str`	namespace of the table in which the output data is stored. default: None	`None`
`save_out_table_name`	`str`	name of the table in which the output data is stored. default: None	`None`

Source code in federatedml/param/rsa_param.py

class RsaParam(BaseParam):
    """
    Define the RSA parameters

    Parameters
    ----------
    rsa_key_n: integer
        RSA modulus, default: None
    rsa_key_e: integer
        RSA public exponent, default: None
    rsa_key_d: integer
        RSA private exponent, default: None
    save_out_table_namespace: str
        namespace of the table in which the output data is stored. default: None
    save_out_table_name: str
        name of the table in which the output data is stored. default: None
    """
    def __init__(self, rsa_key_n=None, rsa_key_e=None, rsa_key_d=None, save_out_table_namespace=None, save_out_table_name=None):
        self.rsa_key_n = rsa_key_n
        self.rsa_key_e = rsa_key_e
        self.rsa_key_d = rsa_key_d
        self.save_out_table_namespace = save_out_table_namespace
        self.save_out_table_name = save_out_table_name

    def check(self):
        descr = "rsa param"
        self.check_positive_integer(self.rsa_key_n, descr)
        self.check_positive_integer(self.rsa_key_e, descr)
        self.check_positive_integer(self.rsa_key_d, descr)
        self.check_string(self.save_out_table_namespace, descr)
        self.check_string(self.save_out_table_name, descr)
        return True

__init__(self, rsa_key_n=None, rsa_key_e=None, rsa_key_d=None, save_out_table_namespace=None, save_out_table_name=None) special ¶

Source code in federatedml/param/rsa_param.py

def __init__(self, rsa_key_n=None, rsa_key_e=None, rsa_key_d=None, save_out_table_namespace=None, save_out_table_name=None):
    self.rsa_key_n = rsa_key_n
    self.rsa_key_e = rsa_key_e
    self.rsa_key_d = rsa_key_d
    self.save_out_table_namespace = save_out_table_namespace
    self.save_out_table_name = save_out_table_name

check(self) ¶

Source code in federatedml/param/rsa_param.py

def check(self):
    descr = "rsa param"
    self.check_positive_integer(self.rsa_key_n, descr)
    self.check_positive_integer(self.rsa_key_e, descr)
    self.check_positive_integer(self.rsa_key_d, descr)
    self.check_string(self.save_out_table_namespace, descr)
    self.check_string(self.save_out_table_name, descr)
    return True
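The three key parameters are related as in textbook RSA: `rsa_key_n` is the modulus, `rsa_key_e` the public exponent and `rsa_key_d` the private exponent. The sketch below uses deliberately tiny, insecure primes purely to show that relationship; `make_toy_rsa_key` is a hypothetical helper, and nothing here reflects how FATE generates or consumes these values.

```python
def make_toy_rsa_key(p=61, q=53, e=17):
    """Build (n, e, d) from two small primes. Toy sizes only, never use for real keys."""
    n = p * q                  # modulus, plays the role of rsa_key_n
    phi = (p - 1) * (q - 1)    # Euler's totient of n
    d = pow(e, -1, phi)        # private exponent: modular inverse of e (Python 3.8+)
    return n, e, d


def rsa_apply(value, exponent, modulus):
    """Modular exponentiation, used for both the public and the private operation."""
    return pow(value, exponent, modulus)


if __name__ == "__main__":
    n, e, d = make_toy_rsa_key()
    message = 65
    cipher = rsa_apply(message, e, n)
    # Applying the private exponent undoes the public operation.
    print(n, e, d, rsa_apply(cipher, d, n) == message)
```

All three values are positive integers, which is the property `check` enforces through `check_positive_integer`.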

`sample_param` ¶

Classes¶


SampleParam            (BaseParam)

¶

Define the sample method

Parameters:

Name	Type	Description	Default
`mode`	`{'random', 'stratified'}`	Sampling mode to use. Default: 'random'	`'random'`
`method`	`{'downsample', 'upsample'}, default: 'downsample'`	Sampling method to use	`'downsample'`
`fractions`	`None or float or list`	If mode is 'random', a float greater than 0; otherwise a list of pairs [label_i, sample_rate_i], e.g. [[0, 0.5], [1, 0.8], [2, 0.3]]. Default: None	`None`
`random_state`	`int, RandomState instance or None, default: None`	Random state used for sampling	`None`
`need_run`	`bool, default True`	Whether this module needs to be run	`True`

Source code in federatedml/param/sample_param.py

class SampleParam(BaseParam):
    """
    Define the sample method

    Parameters
    ----------
    mode: {'random', 'stratified'}
        Sampling mode to use. Default: 'random'

    method: {'downsample', 'upsample'}, default: 'downsample'
        Sampling method to use

    fractions: None or float or list
        If mode is 'random', a float greater than 0;
        otherwise a list of pairs [label_i, sample_rate_i], e.g. [[0, 0.5], [1, 0.8], [2, 0.3]]. Default: None

    random_state: int, RandomState instance or None, default: None
        Random state used for sampling

    need_run: bool, default True
        Whether this module needs to be run
    """

    def __init__(self, mode="random", method="downsample", fractions=None, random_state=None, task_type="hetero",
                 need_run=True):
        self.mode = mode
        self.method = method
        self.fractions = fractions
        self.random_state = random_state
        self.task_type = task_type
        self.need_run = need_run

    def check(self):
        descr = "sample param"
        self.mode = self.check_and_change_lower(self.mode,
                                                ["random", "stratified"],
                                                descr)

        self.method = self.check_and_change_lower(self.method,
                                                  ["upsample", "downsample"],
                                                  descr)

        if self.mode == "stratified" and self.fractions is not None:
            if not isinstance(self.fractions, list):
                raise ValueError("fractions of sample param when using stratified should be list")
            for ele in self.fractions:
                if not isinstance(ele, collections.Container) or len(ele) != 2:
                    raise ValueError(
                        "element in fractions of sample param using stratified should be a pair like [label_i, rate_i]")

        return True

__init__(self, mode='random', method='downsample', fractions=None, random_state=None, task_type='hetero', need_run=True) special ¶

Source code in federatedml/param/sample_param.py

def __init__(self, mode="random", method="downsample", fractions=None, random_state=None, task_type="hetero",
             need_run=True):
    self.mode = mode
    self.method = method
    self.fractions = fractions
    self.random_state = random_state
    self.task_type = task_type
    self.need_run = need_run

check(self) ¶

Source code in federatedml/param/sample_param.py

def check(self):
    descr = "sample param"
    self.mode = self.check_and_change_lower(self.mode,
                                            ["random", "stratified"],
                                            descr)

    self.method = self.check_and_change_lower(self.method,
                                              ["upsample", "downsample"],
                                              descr)

    if self.mode == "stratified" and self.fractions is not None:
        if not isinstance(self.fractions, list):
            raise ValueError("fractions of sample param when using stratified should be list")
        for ele in self.fractions:
            if not isinstance(ele, collections.Container) or len(ele) != 2:
                raise ValueError(
                    "element in fractions of sample param using stratified should be a pair like [label_i, rate_i]")

    return True
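The two shapes of `fractions` can be illustrated with a small single-party sketch: a single float for random mode, and a list of `[label_i, sample_rate_i]` pairs for stratified mode. The helper names are hypothetical, rows are assumed to be `(key, label)` tuples, and Python's `random.Random` stands in for whatever random state the real module uses.

```python
import random


def random_downsample(rows, fraction, random_state=None):
    """Keep roughly `fraction` of all rows, ignoring labels."""
    rng = random.Random(random_state)
    return rng.sample(rows, round(len(rows) * fraction))


def stratified_downsample(rows, fractions, random_state=None):
    """Keep a per-label share of rows; `fractions` is a list of [label, rate] pairs."""
    # Same shape rule as SampleParam.check: every element must be a pair.
    for pair in fractions:
        if len(pair) != 2:
            raise ValueError("each element should be a pair like [label_i, rate_i]")
    rate_of = {label: rate for label, rate in fractions}

    by_label = {}
    for key, label in rows:
        by_label.setdefault(label, []).append((key, label))

    rng = random.Random(random_state)
    sampled = []
    for label, group in by_label.items():
        # Assumption of this sketch: every label present in the data has a rate.
        sampled.extend(rng.sample(group, round(len(group) * rate_of[label])))
    return sampled


if __name__ == "__main__":
    rows = [(i, 0) for i in range(8)] + [(i, 1) for i in range(8, 12)]
    print(len(random_downsample(rows, 0.5, random_state=7)))
    print(stratified_downsample(rows, [[0, 0.5], [1, 0.25]], random_state=7))
```

Stratified sampling lets each class be reduced by a different rate, which is how a heavily skewed label distribution can be rebalanced.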

`sample_weight_param` ¶

Classes¶


SampleWeightParam            (BaseParam)

¶

Define sample weight parameters

Parameters:

Name	Type	Description	Default
`class_weight`	`str or dict, or None, default None`	Class weight dictionary or class weight computation mode; the only accepted string value is 'balanced'. If a dict is provided, keys should be classes (labels) and the weights will not be normalized, e.g. {'0': 1, '1': 2}. If both class_weight and sample_weight_name are None, the original input data is returned.	`None`
`sample_weight_name`	`str`	Name of the column (feature) that specifies the sample weight. If both class_weight and sample_weight_name are None, the original input data is returned.	`None`
`normalize`	`bool, default False`	whether to normalize sample weight extracted from `sample_weight_name` column	`False`
`need_run`	`bool, default True`	whether to run this module or not	`True`

Source code in federatedml/param/sample_weight_param.py

class SampleWeightParam(BaseParam):
    """
    Define sample weight parameters

    Parameters
    ----------

    class_weight : str or dict, or None, default None
        Class weight dictionary or class weight computation mode; the only accepted string value is 'balanced'.
        If a dict is provided, keys should be classes (labels) and the weights will not be normalized, e.g. {'0': 1, '1': 2}.
        If both class_weight and sample_weight_name are None, the original input data is returned.

    sample_weight_name : str
        Name of the column (feature) that specifies the sample weight.
        If both class_weight and sample_weight_name are None, the original input data is returned.

    normalize : bool, default False
        whether to normalize sample weight extracted from `sample_weight_name` column

    need_run : bool, default True
        whether to run this module or not

    """

    def __init__(self, class_weight=None, sample_weight_name=None, normalize=False, need_run=True):
        self.class_weight = class_weight
        self.sample_weight_name = sample_weight_name
        self.normalize = normalize
        self.need_run = need_run

    def check(self):

        descr = "sample weight param's"

        if self.class_weight:
            if not isinstance(self.class_weight, str) and not isinstance(self.class_weight, dict):
                raise ValueError(f"{descr} class_weight must be str, dict, or None.")
            if isinstance(self.class_weight, str):
                self.class_weight = self.check_and_change_lower(self.class_weight,
                                                                [consts.BALANCED],
                                                                f"{descr} class_weight")
            if isinstance(self.class_weight, dict):
                for k, v in self.class_weight.items():
                    if v < 0:
                        LOGGER.warning(f"Negative value {v} provided for class {k} as class_weight.")

        if self.sample_weight_name:
            self.check_string(self.sample_weight_name, f"{descr} sample_weight_name")

        self.check_boolean(self.need_run, f"{descr} need_run")

        self.check_boolean(self.normalize, f"{descr} normalize")

        return True

__init__(self, class_weight=None, sample_weight_name=None, normalize=False, need_run=True) special ¶

Source code in federatedml/param/sample_weight_param.py

def __init__(self, class_weight=None, sample_weight_name=None, normalize=False, need_run=True):
    self.class_weight = class_weight
    self.sample_weight_name = sample_weight_name
    self.normalize = normalize
    self.need_run = need_run

check(self) ¶

Source code in federatedml/param/sample_weight_param.py

def check(self):

    descr = "sample weight param's"

    if self.class_weight:
        if not isinstance(self.class_weight, str) and not isinstance(self.class_weight, dict):
            raise ValueError(f"{descr} class_weight must be str, dict, or None.")
        if isinstance(self.class_weight, str):
            self.class_weight = self.check_and_change_lower(self.class_weight,
                                                            [consts.BALANCED],
                                                            f"{descr} class_weight")
        if isinstance(self.class_weight, dict):
            for k, v in self.class_weight.items():
                if v < 0:
                    LOGGER.warning(f"Negative value {v} provided for class {k} as class_weight.")

    if self.sample_weight_name:
        self.check_string(self.sample_weight_name, f"{descr} sample_weight_name")

    self.check_boolean(self.need_run, f"{descr} need_run")

    self.check_boolean(self.normalize, f"{descr} normalize")

    return True
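The two forms of `class_weight` can be sketched as follows. This is an illustration under assumptions, not the module's implementation: the helper names are hypothetical, the 'balanced' formula follows the scikit-learn convention `n_samples / (n_classes * class_count)`, which this page does not confirm for FATE, and labels missing from a dict are given weight 1 here as an arbitrary choice.

```python
from collections import Counter


def balanced_class_weight(labels):
    """Weights inversely proportional to class frequency (scikit-learn convention, assumed)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {label: n_samples / (n_classes * count) for label, count in counts.items()}


def assign_weights(labels, class_weight):
    """Return one weight per sample from a dict or the string 'balanced'."""
    if isinstance(class_weight, str):
        if class_weight.lower() != "balanced":
            raise ValueError("string class_weight only accepts 'balanced'")
        class_weight = balanced_class_weight(labels)
    # Assumption of this sketch: labels absent from the dict keep weight 1.
    return [class_weight.get(label, 1) for label in labels]


if __name__ == "__main__":
    labels = ["0", "0", "0", "1"]
    print(assign_weights(labels, {"0": 1, "1": 2}))
    print(assign_weights(labels, "balanced"))
```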

`sbt_feature_transformer_param` ¶

Classes¶


SBTTransformerParam            (BaseParam)

¶

Source code in federatedml/param/sbt_feature_transformer_param.py

class SBTTransformerParam(BaseParam):

    def __init__(self, dense_format=True):

        """
        Parameters
        ----------
        dense_format: bool
            return data in dense vec if True, otherwise return in sparse vec
        """
        super(SBTTransformerParam, self).__init__()
        self.dense_format = dense_format

    def check(self):
        self.check_boolean(self.dense_format, 'SBTTransformer')

Methods¶

__init__(self, dense_format=True) special ¶

Parameters:

Name	Type	Description	Default
`dense_format`	`bool`	return data in dense vec if True, otherwise return in sparse vec	`True`

Source code in federatedml/param/sbt_feature_transformer_param.py

def __init__(self, dense_format=True):

    """
    Parameters
    ----------
    dense_format: bool
        return data in dense vec if True, otherwise return in sparse vec
    """
    super(SBTTransformerParam, self).__init__()
    self.dense_format = dense_format

check(self) ¶

Source code in federatedml/param/sbt_feature_transformer_param.py

def check(self):
    self.check_boolean(self.dense_format, 'SBTTransformer')
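The difference between the two output layouts selected by `dense_format` can be pictured with a generic dense/sparse conversion. The helpers below are hypothetical and say nothing about the concrete vector classes the transformer emits; they only show that a dense vector stores every position while a sparse one stores the non-zero positions.

```python
def to_sparse(dense):
    """Keep only non-zero entries as {index: value}."""
    return {index: value for index, value in enumerate(dense) if value != 0}


def to_dense(sparse, length):
    """Expand {index: value} back to a full list, filling gaps with 0."""
    return [sparse.get(index, 0) for index in range(length)]


if __name__ == "__main__":
    dense = [0, 0, 1, 0, 0, 0, 1, 0]
    sparse = to_sparse(dense)
    print(sparse, to_dense(sparse, len(dense)) == dense)
```

A sparse layout tends to pay off when most positions are zero, as with one-hot style encodings.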

`scale_param` ¶

Classes¶


ScaleParam            (BaseParam)

¶

Define the feature scale parameters.

Parameters:

Name	Type	Description	Default
`method`	`{"standard_scale", "min_max_scale"}`	Scaling method, similar to the scalers in sklearn. "min_max_scale" and "standard_scale" are currently supported. Default: "standard_scale"	`'standard_scale'`
`mode`	`{"normal", "cap"}`	In "normal" mode, feat_upper and feat_lower are plain values such as 10 or 3.1. In "cap" mode, feat_upper and feat_lower must be between 0 and 1 and are interpreted as percentiles of the column. Default: "normal"	`'normal'`
`feat_upper`	`int or float or list of int or float`	The upper limit of the column. If a list is given, mode must be "normal" and the list length must equal the number of features to scale. Scaled values larger than feat_upper are set to feat_upper	`None`
`feat_lower`	`int or float or list of int or float`	The lower limit of the column. If a list is given, mode must be "normal" and the list length must equal the number of features to scale. Scaled values smaller than feat_lower are set to feat_lower	`None`
`scale_col_indexes`	`list`	Indexes of the columns to scale. Columns whose index is not in scale_col_indexes are not scaled.	`-1`
`scale_names`	`list of string`	Names of the columns to scale. Each element of the list is a column name in the header. Default: []	`None`
`with_mean`	`bool`	Used for "standard_scale". Default True.	`True`
`with_std`	`bool`	Used for "standard_scale". Default True. The standard scale of column x is calculated as $z = (x - u) / s$, where $u$ is the mean of the column and $s$ is its standard deviation. If with_mean is False, $u$ is 0; if with_std is False, $s$ is 1.	`True`
`need_run`	`bool`	Whether this module needs to be run. Default True	`True`
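The formula $z = (x - u) / s$ and the role of the limits can be tried on a single column with the sketch below. It is a standalone illustration under assumptions: the function names are hypothetical, the population standard deviation is used (as scikit-learn's StandardScaler does), and `feat_lower`/`feat_upper` are treated as "normal" mode bounds that clip the column before min-max scaling. The "cap" (percentile) mode is not modeled.

```python
import statistics


def standard_scale(column, with_mean=True, with_std=True):
    """z = (x - u) / s, with u = 0 if with_mean is False and s = 1 if with_std is False."""
    u = statistics.fmean(column) if with_mean else 0.0
    s = statistics.pstdev(column) if with_std else 1.0
    if s == 0:
        s = 1.0  # constant column: avoid division by zero (choice made for this sketch)
    return [(x - u) / s for x in column]


def min_max_scale(column, feat_lower=None, feat_upper=None):
    """Scale to [0, 1]; explicit limits clip values first ("normal" mode assumption)."""
    lower = min(column) if feat_lower is None else feat_lower
    upper = max(column) if feat_upper is None else feat_upper
    span = upper - lower
    clipped = [min(max(x, lower), upper) for x in column]
    return [(x - lower) / span if span else 0.0 for x in clipped]


if __name__ == "__main__":
    print(standard_scale([1, 2, 3]))
    print(min_max_scale([0, 5, 20], feat_upper=10))
```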

Source code in federatedml/param/scale_param.py

class ScaleParam(BaseParam):
    """
    Define the feature scale parameters.

    Parameters
    ----------
    method : {"standard_scale", "min_max_scale"}
        Scaling method, similar to the scalers in sklearn.
        "min_max_scale" and "standard_scale" are currently supported. Default: "standard_scale"

    mode : {"normal", "cap"}
        In "normal" mode, feat_upper and feat_lower are plain values such as 10 or 3.1.
        In "cap" mode, feat_upper and feat_lower must be between 0 and 1 and are interpreted as percentiles of the column. Default: "normal"

    feat_upper : int or float or list of int or float
        The upper limit of the column.
        If a list is given, mode must be "normal" and the list length must equal the number of features to scale.
        Scaled values larger than feat_upper are set to feat_upper

    feat_lower: int or float or list of int or float
        The lower limit of the column.
        If a list is given, mode must be "normal" and the list length must equal the number of features to scale.
        Scaled values smaller than feat_lower are set to feat_lower

    scale_col_indexes: list
        Indexes of the columns to scale. Columns whose index is not in scale_col_indexes are not scaled.

    scale_names : list of string
        Names of the columns to scale. Each element of the list is a column name in the header. Default: []

    with_mean : bool
        Used for "standard_scale". Default True.

    with_std : bool
        Used for "standard_scale". Default True.
        The standard scale of column x is calculated as $z = (x - u) / s$, where $u$ is the mean of the column and $s$ is its standard deviation.
        If with_mean is False, $u$ is 0; if with_std is False, $s$ is 1.

    need_run : bool
        Whether this module needs to be run. Default True

    """

    def __init__(self, method="standard_scale", mode="normal", scale_col_indexes=-1, scale_names=None, feat_upper=None, feat_lower=None,
                 with_mean=True, with_std=True, need_run=True):
        super().__init__()
        self.scale_names = [] if scale_names is None else scale_names

        self.method = method
        self.mode = mode
        self.feat_upper = feat_upper
        # LOGGER.debug("self.feat_upper:{}, type:{}".format(self.feat_upper, type(self.feat_upper)))
        self.feat_lower = feat_lower
        self.scale_col_indexes = scale_col_indexes

        self.with_mean = with_mean
        self.with_std = with_std

        self.need_run = need_run

    def check(self):
        if self.method is not None:
            descr = "scale param's method"
            self.method = self.check_and_change_lower(self.method,
                                                      [consts.MINMAXSCALE, consts.STANDARDSCALE],
                                                      descr)

        descr = "scale param's mode"
        self.mode = self.check_and_change_lower(self.mode,
                                                [consts.NORMAL, consts.CAP],
                                                descr)
        # LOGGER.debug("self.feat_upper:{}, type:{}".format(self.feat_upper, type(self.feat_upper)))
        # if type(self.feat_upper).__name__ not in ["float", "int"]:
        #     raise ValueError("scale param's feat_upper {} not supported, should be float or int".format(
        #         self.feat_upper))


        if self.scale_col_indexes != -1 and not isinstance(self.scale_col_indexes, list):
            raise ValueError("scale_col_indexes should be -1 or a list")

        if self.scale_names is None:
            self.scale_names = []
        if not isinstance(self.scale_names, list):
            raise ValueError("scale_names should be a list of string")
        else:
            for e in self.scale_names:
                if not isinstance(e, str):
                    raise ValueError("scale_names should be a list of string")

        self.check_boolean(self.with_mean, "scale_param with_mean")
        self.check_boolean(self.with_std, "scale_param with_std")
        self.check_boolean(self.need_run, "scale_param need_run")

        LOGGER.debug("Finish scale parameter check!")
        return True

__init__(self, method='standard_scale', mode='normal', scale_col_indexes=-1, scale_names=None, feat_upper=None, feat_lower=None, with_mean=True, with_std=True, need_run=True)

special ¶

Source code in federatedml/param/scale_param.py

def __init__(self, method="standard_scale", mode="normal", scale_col_indexes=-1, scale_names=None, feat_upper=None, feat_lower=None,
             with_mean=True, with_std=True, need_run=True):
    super().__init__()
    self.scale_names = [] if scale_names is None else scale_names

    self.method = method
    self.mode = mode
    self.feat_upper = feat_upper
    # LOGGER.debug("self.feat_upper:{}, type:{}".format(self.feat_upper, type(self.feat_upper)))
    self.feat_lower = feat_lower
    self.scale_col_indexes = scale_col_indexes

    self.with_mean = with_mean
    self.with_std = with_std

    self.need_run = need_run

check(self) ¶

Source code in federatedml/param/scale_param.py

def check(self):
    if self.method is not None:
        descr = "scale param's method"
        self.method = self.check_and_change_lower(self.method,
                                                  [consts.MINMAXSCALE, consts.STANDARDSCALE],
                                                  descr)

    descr = "scale param's mode"
    self.mode = self.check_and_change_lower(self.mode,
                                            [consts.NORMAL, consts.CAP],
                                            descr)
    # LOGGER.debug("self.feat_upper:{}, type:{}".format(self.feat_upper, type(self.feat_upper)))
    # if type(self.feat_upper).__name__ not in ["float", "int"]:
    #     raise ValueError("scale param's feat_upper {} not supported, should be float or int".format(
    #         self.feat_upper))


    if self.scale_col_indexes != -1 and not isinstance(self.scale_col_indexes, list):
        raise ValueError("scale_col_indexes should be -1 or a list")

    if self.scale_names is None:
        self.scale_names = []
    if not isinstance(self.scale_names, list):
        raise ValueError("scale_names should be a list of string")
    else:
        for e in self.scale_names:
            if not isinstance(e, str):
                raise ValueError("scale_names should be a list of string")

    self.check_boolean(self.with_mean, "scale_param with_mean")
    self.check_boolean(self.with_std, "scale_param with_std")
    self.check_boolean(self.need_run, "scale_param need_run")

    LOGGER.debug("Finish scale parameter check!")
    return True
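
The role of `feat_upper` and `feat_lower` in "normal" mode can be pictured as a clamping step. The sketch below is an illustration only: `clamp_column` is a hypothetical helper, it handles scalar limits rather than per-column lists, and it does not show where the clamping sits relative to the scaling itself.

```python
def clamp_column(values, feat_lower=None, feat_upper=None):
    """Clamp each value into [feat_lower, feat_upper]; None disables a bound."""
    clamped = []
    for v in values:
        if feat_lower is not None and v < feat_lower:
            v = feat_lower   # values below the lower limit are set to it
        if feat_upper is not None and v > feat_upper:
            v = feat_upper   # values above the upper limit are set to it
        clamped.append(v)
    return clamped


limited = clamp_column([-2.0, 0.5, 3.1, 12.0], feat_lower=0, feat_upper=10)
unlimited = clamp_column([-2.0, 0.5, 3.1, 12.0])
print(limited, unlimited)
```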

`scorecard_param` ¶

Classes¶


ScorecardParam            (BaseParam)

¶

Define method used for transforming prediction score to credit score

Parameters:

Name	Type	Description	Default
`method`	`{"credit"}, default: 'credit'`	score method, currently only supports "credit"	`'credit'`
`offset`	`int or float, default: 500`	score baseline	`500`
`factor`	`int or float, default: 20`	scoring step, when odds double, result score increases by this factor	`20`
`factor_base`	`int or float, default: 2`	factor base, value ln(factor_base) is used for calculating result score	`2`
`upper_limit_ratio`	`int or float, default: 3`	upper bound for odds, credit score upper bound is upper_limit_ratio * offset	`3`
`lower_limit_value`	`int or float, default: 0`	lower bound for result score	`0`
`need_run`	`bool, default: True`	Indicate if this module needs to be run.	`True`
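
One way to read these parameters together is as a log-odds scorecard. The sketch below assumes the score is `offset + factor / ln(factor_base) * ln(odds)`, clamped between `lower_limit_value` and `upper_limit_ratio * offset`. That formula is an assumption consistent with the descriptions above, not a quotation of the module; `credit_score` is a hypothetical name, and how the odds are derived from a prediction score is not covered here.

```python
import math


def credit_score(odds, offset=500, factor=20, factor_base=2,
                 upper_limit_ratio=3, lower_limit_value=0):
    """Map odds to a bounded credit score (illustrative formula)."""
    raw = offset + factor / math.log(factor_base) * math.log(odds)
    upper = upper_limit_ratio * offset          # score upper bound
    return max(lower_limit_value, min(upper, raw))


# With the defaults, doubling the odds adds `factor` points to the score.
for odds in (1.0, 2.0, 4.0):
    print(odds, round(credit_score(odds), 6))
```

Under this reading, odds of 1 map to the baseline `offset`, and extreme odds are cut off at the two bounds.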

Source code in federatedml/param/scorecard_param.py

class ScorecardParam(BaseParam):
    """
    Define method used for transforming prediction score to credit score

    Parameters
    ----------

    method : {"credit"}, default: 'credit'
        score method, currently only supports "credit"

    offset : int or float, default: 500
        score baseline

    factor : int or float, default: 20
        scoring step, when odds double, result score increases by this factor

    factor_base : int or float, default: 2
        factor base, value ln(factor_base) is used for calculating result score

    upper_limit_ratio : int or float, default: 3
        upper bound for odds, credit score upper bound is upper_limit_ratio * offset

    lower_limit_value : int or float, default: 0
        lower bound for result score

    need_run : bool, default: True
        Indicate if this module needs to be run.

    """

    def __init__(self, method="credit", offset=500, factor=20, factor_base=2, upper_limit_ratio=3, lower_limit_value=0, need_run=True):
        super(ScorecardParam, self).__init__()
        self.method = method
        self.offset = offset
        self.factor = factor
        self.factor_base = factor_base
        self.upper_limit_ratio = upper_limit_ratio
        self.lower_limit_value = lower_limit_value
        self.need_run = need_run

    def check(self):
        descr = "scorecard param"
        if not isinstance(self.method, str):
            raise ValueError(f"{descr} method {self.method} not supported, should be str type")
        else:
            user_input = self.method.lower()
            if user_input == "credit":
                self.method = consts.CREDIT
            else:
                raise ValueError(f"{descr} method {user_input} not supported")

        if type(self.offset).__name__ not in ["int", "long", "float"]:
            raise ValueError(f"{descr} offset must be numeric, "
                             f"received {type(self.offset)} instead.")

        if type(self.factor).__name__ not in ["int", "long", "float"]:
            raise ValueError(f"{descr} factor must be numeric, "
                             f"received {type(self.factor)} instead.")

        if type(self.factor_base).__name__ not in ["int", "long", "float"]:
            raise ValueError(f"{descr} factor_base must be numeric, "
                             f"received {type(self.factor_base)} instead.")

        if type(self.upper_limit_ratio).__name__ not in ["int", "long", "float"]:
            raise ValueError(f"{descr} upper_limit_ratio must be numeric, "
                             f"received {type(self.upper_limit_ratio)} instead.")

        if type(self.lower_limit_value).__name__ not in ["int", "long", "float"]:
            raise ValueError(f"{descr} lower_limit_value must be numeric, "
                             f"received {type(self.lower_limit_value)} instead.")

        BaseParam.check_boolean(self.need_run, descr=descr + " need_run ")

        LOGGER.debug("Finish Scorecard parameter check!")
        return True

__init__(self, method='credit', offset=500, factor=20, factor_base=2, upper_limit_ratio=3, lower_limit_value=0, need_run=True)

special ¶

Source code in federatedml/param/scorecard_param.py

def __init__(self, method="credit", offset=500, factor=20, factor_base=2, upper_limit_ratio=3, lower_limit_value=0, need_run=True):
    super(ScorecardParam, self).__init__()
    self.method = method
    self.offset = offset
    self.factor = factor
    self.factor_base = factor_base
    self.upper_limit_ratio = upper_limit_ratio
    self.lower_limit_value = lower_limit_value
    self.need_run = need_run

check(self) ¶

Source code in federatedml/param/scorecard_param.py

def check(self):
    descr = "scorecard param"
    if not isinstance(self.method, str):
        raise ValueError(f"{descr} method {self.method} not supported, should be str type")
    else:
        user_input = self.method.lower()
        if user_input == "credit":
            self.method = consts.CREDIT
        else:
            raise ValueError(f"{descr} method {user_input} not supported")

    if type(self.offset).__name__ not in ["int", "long", "float"]:
        raise ValueError(f"{descr} offset must be numeric, "
                         f"received {type(self.offset)} instead.")

    if type(self.factor).__name__ not in ["int", "long", "float"]:
        raise ValueError(f"{descr} factor must be numeric, "
                         f"received {type(self.factor)} instead.")

    if type(self.factor_base).__name__ not in ["int", "long", "float"]:
        raise ValueError(f"{descr} factor_base must be numeric, "
                         f"received {type(self.factor_base)} instead.")

    if type(self.upper_limit_ratio).__name__ not in ["int", "long", "float"]:
        raise ValueError(f"{descr} upper_limit_ratio must be numeric, "
                         f"received {type(self.upper_limit_ratio)} instead.")

    if type(self.lower_limit_value).__name__ not in ["int", "long", "float"]:
        raise ValueError(f"{descr} lower_limit_value must be numeric, "
                         f"received {type(self.lower_limit_value)} instead.")

    BaseParam.check_boolean(self.need_run, descr=descr + " need_run ")

    LOGGER.debug("Finish Scorecard parameter check!")
    return True

`secure_add_example_param` ¶


SecureAddExampleParam            (BaseParam)

¶

Source code in federatedml/param/secure_add_example_param.py

class SecureAddExampleParam(BaseParam):
    def __init__(self, seed=None, partition=1, data_num=1000):
        self.seed = seed
        self.partition = partition
        self.data_num = data_num

    def check(self):
        if self.seed is not None and type(self.seed).__name__ != "int":
            raise ValueError("random seed should be None or an integer")

        if type(self.partition).__name__ != "int" or self.partition < 1:
            raise ValueError("partition should be an integer larger than 0")

        if type(self.data_num).__name__ != "int" or self.data_num < 1:
            raise ValueError("data_num should be an integer larger than 0")

__init__(self, seed=None, partition=1, data_num=1000) special ¶

Source code in federatedml/param/secure_add_example_param.py

def __init__(self, seed=None, partition=1, data_num=1000):
    self.seed = seed
    self.partition = partition
    self.data_num = data_num

check(self) ¶

Source code in federatedml/param/secure_add_example_param.py

def check(self):
    if self.seed is not None and type(self.seed).__name__ != "int":
        raise ValueError("random seed should be None or an integer")

    if type(self.partition).__name__ != "int" or self.partition < 1:
        raise ValueError("partition should be an integer larger than 0")

    if type(self.data_num).__name__ != "int" or self.data_num < 1:
        raise ValueError("data_num should be an integer larger than 0")
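
The checks above compare `type(...).__name__` with `"int"` rather than using `isinstance`. In plain Python the two differ for booleans, because `bool` is a subclass of `int`. The short sketch below shows the difference; `is_plain_int` is a hypothetical helper that mirrors the comparison used in `check`.

```python
def is_plain_int(value):
    """Mirror the type-name comparison: only exact ints pass."""
    return type(value).__name__ == "int"


# Compare the type-name test with isinstance for a few candidate values.
results = {
    repr(candidate): (is_plain_int(candidate), isinstance(candidate, int))
    for candidate in (5, True, 5.0, "5")
}
print(results)
```

With the type-name test, `True` is rejected as a value for `partition` or `data_num`, whereas an `isinstance` test would accept it.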

`sir_param` ¶

Classes¶


SecureInformationRetrievalParam            (BaseParam)

¶

Parameters:

Name	Type	Description	Default
`security_level`	`float, default 0.5`	Security level; must be in [0, 1]. A value of 0.0 means raw data retrieval.	`0.5`
`oblivious_transfer_protocol`	`{"OT_Hauck"}`	OT type; only OT_Hauck is supported	`'OT_Hauck'`
`commutative_encryption`	`{"CommutativeEncryptionPohligHellman"}`	The commutative encryption scheme used	`'CommutativeEncryptionPohligHellman'`
`non_committing_encryption`	`{"aes"}`	The non-committing encryption scheme used	`'aes'`
`dh_params`	`DHParam`	Parameters for Pohlig-Hellman encryption	`DHParam()`
`key_size`	`int, value >= 1024`	The key length of the commutative cipher. This parameter will be deprecated; specify key_length in dh_params instead.	`1024`
`raw_retrieval`	`bool`	Perform raw retrieval if True	`False`
`target_cols`	`str or list of str`	Target columns to retrieve; any value not retrieved is marked as "unretrieved". If target_cols is None, the label is retrieved, as in previous versions.	`None`
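
`target_cols` accepts None, a single string, or a list of strings, and the constructor and `check` reduce all three to a list. The standalone sketch below mirrors that normalization outside the class; `normalize_target_cols` is a hypothetical helper name, and an empty result stands for "retrieve the label".

```python
def normalize_target_cols(target_cols):
    """Return target_cols as a list of column names."""
    if target_cols is None:
        return []                      # empty list: the label is retrieved
    if not isinstance(target_cols, list):
        return [target_cols]           # a single name becomes a one-item list
    return target_cols


from_none = normalize_target_cols(None)
from_str = normalize_target_cols("x1")
from_list = normalize_target_cols(["x1", "x2"])
print(from_none, from_str, from_list)
```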

Source code in federatedml/param/sir_param.py

class SecureInformationRetrievalParam(BaseParam):
    """
    Parameters
    ----------
    security_level: float, default 0.5
        Security level; must be in [0, 1].
        A value of 0.0 means raw data retrieval.

    oblivious_transfer_protocol: {"OT_Hauck"}
        OT type; only OT_Hauck is supported.

    commutative_encryption : {"CommutativeEncryptionPohligHellman"}
        The commutative encryption scheme used.

    non_committing_encryption : {"aes"}
        The non-committing encryption scheme used.

    dh_params
        Parameters for Pohlig-Hellman encryption.

    key_size: int, value >= 1024
        The key length of the commutative cipher.
        This parameter will be deprecated; specify key_length in dh_params instead.

    raw_retrieval: bool
        Perform raw retrieval if True.

    target_cols: str or list of str
        Target columns to retrieve.
        Any value not retrieved is marked as "unretrieved".
        If target_cols is None, the label is retrieved, as in previous versions.
        Default None.

    """
    def __init__(self, security_level=0.5,
                 oblivious_transfer_protocol=consts.OT_HAUCK,
                 commutative_encryption=consts.CE_PH,
                 non_committing_encryption=consts.AES,
                 key_size=consts.DEFAULT_KEY_LENGTH,
                 dh_params=DHParam(),
                 raw_retrieval=False,
                 target_cols=None):
        super(SecureInformationRetrievalParam, self).__init__()
        self.security_level = security_level
        self.oblivious_transfer_protocol = oblivious_transfer_protocol
        self.commutative_encryption = commutative_encryption
        self.non_committing_encryption = non_committing_encryption
        self.dh_params = dh_params
        self.key_size = key_size
        self.raw_retrieval = raw_retrieval
        self.target_cols = [] if target_cols is None else target_cols

    def check(self):
        descr = "secure information retrieval param's "
        self.check_decimal_float(self.security_level, descr+"security_level")
        self.oblivious_transfer_protocol = self.check_and_change_lower(self.oblivious_transfer_protocol,
                                                                       [consts.OT_HAUCK.lower()],
                                                                       descr + "oblivious_transfer_protocol")
        self.commutative_encryption = self.check_and_change_lower(self.commutative_encryption,
                                                                  [consts.CE_PH.lower()],
                                                                  descr + "commutative_encryption")
        self.non_committing_encryption = self.check_and_change_lower(self.non_committing_encryption,
                                                                     [consts.AES.lower()],
                                                                     descr + "non_committing_encryption")
        if self._warn_to_deprecate_param("key_size", descr, "dh_param's key_length"):
            self.dh_params.key_length = self.key_size
        self.dh_params.check()
        if self._warn_to_deprecate_param("raw_retrieval", descr, "dh_param's security_level = 0"):
            self.check_boolean(self.raw_retrieval, descr)
        if not isinstance(self.target_cols, list):
            self.target_cols = [self.target_cols]
        for col in self.target_cols:
            self.check_string(col, descr+"target_cols")
        if len(self.target_cols) == 0:
            LOGGER.warning("'target_cols' is empty. Label will be retrieved.")

__init__(self, security_level=0.5, oblivious_transfer_protocol='OT_Hauck', commutative_encryption='CommutativeEncryptionPohligHellman', non_committing_encryption='aes', key_size=1024, dh_params=DHParam(), raw_retrieval=False, target_cols=None)

special ¶

Source code in federatedml/param/sir_param.py

def __init__(self, security_level=0.5,
             oblivious_transfer_protocol=consts.OT_HAUCK,
             commutative_encryption=consts.CE_PH,
             non_committing_encryption=consts.AES,
             key_size=consts.DEFAULT_KEY_LENGTH,
             dh_params=DHParam(),
             raw_retrieval=False,
             target_cols=None):
    super(SecureInformationRetrievalParam, self).__init__()
    self.security_level = security_level
    self.oblivious_transfer_protocol = oblivious_transfer_protocol
    self.commutative_encryption = commutative_encryption
    self.non_committing_encryption = non_committing_encryption
    self.dh_params = dh_params
    self.key_size = key_size
    self.raw_retrieval = raw_retrieval
    self.target_cols = [] if target_cols is None else target_cols
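
In this signature the default for `dh_params` is an object created once, when the function is defined, and `check` may later assign to `dh_params.key_length`. In plain Python such a default is shared by every call that omits the argument. The sketch below demonstrates the general behaviour with hypothetical stand-in classes (`KeyParams`, `SharedDefault`, `FreshDefault`); whether it has any practical effect in FederatedML depends on how parameter objects are copied elsewhere, which is not shown here.

```python
class KeyParams:
    """Hypothetical stand-in for a nested parameter object."""

    def __init__(self, key_length=1024):
        self.key_length = key_length


class SharedDefault:
    def __init__(self, params=KeyParams()):   # default built once, at definition
        self.params = params


class FreshDefault:
    def __init__(self, params=None):          # default built on every call
        self.params = KeyParams() if params is None else params


a, b = SharedDefault(), SharedDefault()
a.params.key_length = 2048                    # also visible through b
c, d = FreshDefault(), FreshDefault()
c.params.key_length = 2048                    # d keeps its own object
print(b.params.key_length, d.params.key_length)
```

Passing an explicit object per instance, or using a `None` default as in `FreshDefault`, keeps instances independent.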

check(self) ¶

Source code in federatedml/param/sir_param.py

def check(self):
    descr = "secure information retrieval param's "
    self.check_decimal_float(self.security_level, descr+"security_level")
    self.oblivious_transfer_protocol = self.check_and_change_lower(self.oblivious_transfer_protocol,
                                                                   [consts.OT_HAUCK.lower()],
                                                                   descr + "oblivious_transfer_protocol")
    self.commutative_encryption = self.check_and_change_lower(self.commutative_encryption,
                                                              [consts.CE_PH.lower()],
                                                              descr + "commutative_encryption")
    self.non_committing_encryption = self.check_and_change_lower(self.non_committing_encryption,
                                                                 [consts.AES.lower()],
                                                                 descr + "non_committing_encryption")
    if self._warn_to_deprecate_param("key_size", descr, "dh_param's key_length"):
        self.dh_params.key_length = self.key_size
    self.dh_params.check()
    if self._warn_to_deprecate_param("raw_retrieval", descr, "dh_param's security_level = 0"):
        self.check_boolean(self.raw_retrieval, descr)
    if not isinstance(self.target_cols, list):
        self.target_cols = [self.target_cols]
    for col in self.target_cols:
        self.check_string(col, descr+"target_cols")
    if len(self.target_cols) == 0:
        LOGGER.warning("'target_cols' is empty. Label will be retrieved.")

`sqn_param` ¶

Classes¶


StochasticQuasiNewtonParam            (BaseParam)

¶

Parameters used for stochastic quasi-newton method.

Parameters:

Name	Type	Description	Default
`update_interval_L`	`int, default: 3`	Number of iterations between updates of the Hessian matrix	`3`
`memory_M`	`int, default: 5`	Stack size of curvature information, i.e. y_k and s_k in the paper.	`5`
`sample_size`	`int, default: 5000`	Number of data samples used to update the Hessian matrix	`5000`
`random_seed`	`int, default: None`	Random seed; if given, it must be a positive integer	`None`
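
How `update_interval_L` and `memory_M` interact can be pictured with a bounded stack of curvature pairs. The sketch below is a schematic under stated assumptions, not the FederatedML optimizer: it assumes an update is triggered whenever the iteration count is a multiple of `update_interval_L`, it stores placeholder labels instead of real `s_k` and `y_k` vectors, and `run_schedule` is a hypothetical name.

```python
from collections import deque


def run_schedule(n_iters, update_interval_L=3, memory_M=5):
    """Return the iterations that trigger an update and the retained pairs."""
    memory = deque(maxlen=memory_M)        # oldest pair is dropped when full
    update_iters = []
    for k in range(1, n_iters + 1):
        if k % update_interval_L == 0:     # assumed trigger: every L iterations
            memory.append((f"s_{k}", f"y_{k}"))
            update_iters.append(k)
    return update_iters, list(memory)


update_iters, retained = run_schedule(20)
print(update_iters)
print(retained)
```

With the defaults, twenty iterations trigger six updates, and only the five most recent pairs remain in memory.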

Source code in federatedml/param/sqn_param.py

class StochasticQuasiNewtonParam(BaseParam):
    """
    Parameters used for stochastic quasi-newton method.

    Parameters
    ----------
    update_interval_L : int, default: 3
        Number of iterations between updates of the Hessian matrix

    memory_M : int, default: 5
        Stack size of curvature information, i.e. y_k and s_k in the paper.

    sample_size : int, default: 5000
        Number of data samples used to update the Hessian matrix

    random_seed : int, default: None
        Random seed; if given, it must be a positive integer

    """
    def __init__(self, update_interval_L=3, memory_M=5, sample_size=5000, random_seed=None):
        super().__init__()
        self.update_interval_L = update_interval_L
        self.memory_M = memory_M
        self.sample_size = sample_size
        self.random_seed = random_seed

    def check(self):
        descr = "hetero sqn param's"
        self.check_positive_integer(self.update_interval_L, descr)
        self.check_positive_integer(self.memory_M, descr)
        self.check_positive_integer(self.sample_size, descr)
        if self.random_seed is not None:
            self.check_positive_integer(self.random_seed, descr)
        return True

__init__(self, update_interval_L=3, memory_M=5, sample_size=5000, random_seed=None) special ¶

Source code in federatedml/param/sqn_param.py

def __init__(self, update_interval_L=3, memory_M=5, sample_size=5000, random_seed=None):
    super().__init__()
    self.update_interval_L = update_interval_L
    self.memory_M = memory_M
    self.sample_size = sample_size
    self.random_seed = random_seed

check(self) ¶

Source code in federatedml/param/sqn_param.py

def check(self):
    descr = "hetero sqn param's"
    self.check_positive_integer(self.update_interval_L, descr)
    self.check_positive_integer(self.memory_M, descr)
    self.check_positive_integer(self.sample_size, descr)
    if self.random_seed is not None:
        self.check_positive_integer(self.random_seed, descr)
    return True

`statistics_param` ¶

Classes¶


StatisticsParam            (BaseParam)

¶

Define statistics params

Parameters:

Name	Type	Description	Default
`statistics`	`list, string, default "summary"`	Specify the statistic types to be computed. "summary" represents the list [consts.SUM, consts.MEAN, consts.STANDARD_DEVIATION, consts.MEDIAN, consts.MIN, consts.MAX, consts.MISSING_RATIO, consts.MISSING_COUNT, consts.SKEWNESS, consts.KURTOSIS, consts.COEFFICIENT_OF_VARIATION]. These statistics are always computed; any other requested statistic is added to them.	`'summary'`
`column_names`	`list of string, default []`	Columns to use for statistic computation, given as column names in the header	`None`
`column_indexes`	`list of int, default -1`	Columns to use for statistic computation, given by column order in the header. -1 means statistics are computed over all columns.	`-1`
`bias`	`bool, default: True`	If False, the calculations of skewness and kurtosis are corrected for statistical bias.	`True`
`need_run`	`bool, default True`	Indicates whether to run this module	`True`
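
The way `check` resolves the `statistics` argument can be followed in isolation: it starts from the basic list and appends anything else that was requested. The sketch below mirrors that logic as a free function. The string values in `BASIC_STAT` are hypothetical placeholders for the `consts` names, whose actual values are not shown on this page, and `resolve_statistics` is a hypothetical helper name.

```python
# Placeholder strings standing in for the consts.* values (assumption).
BASIC_STAT = ["sum", "mean", "stddev", "median", "min", "max",
              "missing_ratio", "missing_count", "skewness", "kurtosis",
              "coefficient_of_variance"]


def resolve_statistics(statistics):
    """Return the basic statistics plus any extra requested names."""
    resolved = list(BASIC_STAT)
    if not isinstance(statistics, list):
        # A single name: "summary" adds nothing, anything else is appended.
        if statistics != "summary" and statistics not in resolved:
            resolved.append(statistics)
    else:
        for name in statistics:
            if name not in resolved:
                resolved.append(name)
    return resolved


summary = resolve_statistics("summary")
with_quantile = resolve_statistics(["95%", "mean"])
print(summary)
print(with_quantile)
```

Requesting `["95%", "mean"]` therefore yields the full basic list followed by `"95%"`; names are validated only after this merge.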

Source code in federatedml/param/statistics_param.py

class StatisticsParam(BaseParam):
    """
    Define statistics params

    Parameters
    ----------
    statistics: list, string, default "summary"
        Specify the statistic types to be computed.
        "summary" represents the list: [consts.SUM, consts.MEAN, consts.STANDARD_DEVIATION,
                    consts.MEDIAN, consts.MIN, consts.MAX, consts.MISSING_RATIO,
                    consts.MISSING_COUNT, consts.SKEWNESS, consts.KURTOSIS,
                    consts.COEFFICIENT_OF_VARIATION]
        These statistics are always computed; any other requested statistic is added to them.

    column_names: list of string, default []
        Columns to use for statistic computation, given as column names in the header

    column_indexes: list of int, default -1
        Columns to use for statistic computation, given by column order in the header.
        -1 means statistics are computed over all columns.

    bias: bool, default: True
        If False, the calculations of skewness and kurtosis are corrected for statistical bias.

    need_run: bool, default True
        Indicates whether to run this module
    """

    LEGAL_STAT = [consts.COUNT, consts.SUM, consts.MEAN, consts.STANDARD_DEVIATION,
                  consts.MEDIAN, consts.MIN, consts.MAX, consts.VARIANCE,
                  consts.COEFFICIENT_OF_VARIATION, consts.MISSING_COUNT,
                  consts.MISSING_RATIO,
                  consts.SKEWNESS, consts.KURTOSIS]
    BASIC_STAT = [consts.SUM, consts.MEAN, consts.STANDARD_DEVIATION,
                  consts.MEDIAN, consts.MIN, consts.MAX, consts.MISSING_RATIO,
                  consts.MISSING_COUNT, consts.SKEWNESS, consts.KURTOSIS,
                  consts.COEFFICIENT_OF_VARIATION]
    # Group the alternation so that both anchors and the trailing "%" apply to every branch.
    LEGAL_QUANTILE = re.compile(r"^((100)|([1-9]?[0-9]))%$")

    def __init__(self, statistics="summary", column_names=None,
                 column_indexes=-1, need_run=True, abnormal_list=None,
                 quantile_error=consts.DEFAULT_RELATIVE_ERROR, bias=True):
        super().__init__()
        self.statistics = statistics
        self.column_names = column_names
        self.column_indexes = column_indexes
        self.abnormal_list = abnormal_list
        self.need_run = need_run
        self.quantile_error = quantile_error
        self.bias = bias
        if column_names is None:
            self.column_names = []
        if column_indexes is None:
            self.column_indexes = []
        if abnormal_list is None:
            self.abnormal_list = []

    # @staticmethod
    # def extend_statistics(statistic_name):
    #     basic_metrics = [consts.SUM, consts.MEAN, consts.STANDARD_DEVIATION,
    #                 consts.MEDIAN, consts.MIN, consts.MAX, consts.MISSING_RATIO,
    #                 consts.MISSING_COUNT, consts.SKEWNESS, consts.KURTOSIS,
    #                 consts.COEFFICIENT_OF_VARIATION]
    #     if statistic_name == "summary":
    #         return basic_metrics
    #
    # if statistic_name == "describe":
    #     return [consts.COUNT, consts.MEAN, consts.STANDARD_DEVIATION,
    #             consts.MIN, consts.MAX]

    @staticmethod
    def find_stat_name_match(stat_name):
        if stat_name in StatisticsParam.LEGAL_STAT or StatisticsParam.LEGAL_QUANTILE.match(stat_name):
            return True
        return False

        # match_result = [legal_name == stat_name for legal_name in StatisticsParam.LEGAL_STAT]
        # match_result.append(0 if LEGAL_QUANTILE.match(stat_name) is None else True)
        # match_found = sum(match_result) > 0
        # return match_found

    def check(self):
        model_param_descr = "Statistics's param statistics"
        BaseParam.check_boolean(self.need_run, model_param_descr)
        statistics = copy.copy(self.BASIC_STAT)
        if not isinstance(self.statistics, list):
            if self.statistics in [consts.SUMMARY]:
                self.statistics = statistics
            else:
                if self.statistics not in statistics:
                    statistics.append(self.statistics)
                self.statistics = statistics
        else:
            for s in self.statistics:
                if s not in statistics:
                    statistics.append(s)
            self.statistics = statistics

        for stat_name in self.statistics:
            match_found = StatisticsParam.find_stat_name_match(stat_name)
            if not match_found:
                raise ValueError(f"Illegal statistics name provided: {stat_name}.")

        model_param_descr = "Statistics's param column_names"
        if not isinstance(self.column_names, list):
            raise ValueError(f"column_names should be list of string.")
        for col_name in self.column_names:
            BaseParam.check_string(col_name, model_param_descr)

        model_param_descr = "Statistics's param column_indexes"
        if not isinstance(self.column_indexes, list) and self.column_indexes != -1:
            raise ValueError(f"column_indexes should be list of int or -1.")
        if self.column_indexes != -1:
            for col_index in self.column_indexes:
                if not isinstance(col_index, int):
                    raise ValueError(f"{model_param_descr} should be int or list of int")
                if col_index < -consts.FLOAT_ZERO:
                    raise ValueError(f"{model_param_descr} should be non-negative int value(s)")

        if not isinstance(self.abnormal_list, list):
            raise ValueError(f"abnormal_list should be list of int or string.")

        self.check_decimal_float(self.quantile_error, "Statistics's param quantile_error ")
        self.check_boolean(self.bias, "Statistics's param bias ")
        return True

BASIC_STAT ¶

LEGAL_QUANTILE ¶

LEGAL_STAT ¶

__init__(self, statistics='summary', column_names=None, column_indexes=-1, need_run=True, abnormal_list=None, quantile_error=0.0001, bias=True)

special ¶

Source code in federatedml/param/statistics_param.py

def __init__(self, statistics="summary", column_names=None,
             column_indexes=-1, need_run=True, abnormal_list=None,
             quantile_error=consts.DEFAULT_RELATIVE_ERROR, bias=True):
    super().__init__()
    self.statistics = statistics
    self.column_names = column_names
    self.column_indexes = column_indexes
    self.abnormal_list = abnormal_list
    self.need_run = need_run
    self.quantile_error = quantile_error
    self.bias = bias
    if column_names is None:
        self.column_names = []
    if column_indexes is None:
        self.column_indexes = []
    if abnormal_list is None:
        self.abnormal_list = []

find_stat_name_match(stat_name) staticmethod ¶

Source code in federatedml/param/statistics_param.py

@staticmethod
def find_stat_name_match(stat_name):
    if stat_name in StatisticsParam.LEGAL_STAT or StatisticsParam.LEGAL_QUANTILE.match(stat_name):
        return True
    return False
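
Besides the names in `LEGAL_STAT`, `find_stat_name_match` accepts quantile names such as "95%" through a regular expression. When a pattern of this kind combines alternation with anchors, the grouping decides which alternatives the anchors and the trailing "%" apply to. The sketch below compares an ungrouped and a grouped pattern using `re.match`; `UNGROUPED` and `GROUPED` are illustrative names.

```python
import re

# Without an outer group, "^" binds only to the first branch
# and "%$" only to the second.
UNGROUPED = re.compile(r"^(100)|([1-9]?[0-9])%$")
# With an outer group, every branch must span the whole string and end in "%".
GROUPED = re.compile(r"^((100)|([1-9]?[0-9]))%$")

comparison = {
    name: (bool(UNGROUPED.match(name)), bool(GROUPED.match(name)))
    for name in ("95%", "100%", "100", "1000%", "mean")
}
print(comparison)
```

Both patterns accept "95%" and "100%" and reject "mean", but only the grouped one rejects "100" and "1000%".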

check(self) ¶

Source code in federatedml/param/statistics_param.py

def check(self):
    model_param_descr = "Statistics's param statistics"
    BaseParam.check_boolean(self.need_run, model_param_descr)
    statistics = copy.copy(self.BASIC_STAT)
    if not isinstance(self.statistics, list):
        if self.statistics in [consts.SUMMARY]:
            self.statistics = statistics
        else:
            if self.statistics not in statistics:
                statistics.append(self.statistics)
            self.statistics = statistics
    else:
        for s in self.statistics:
            if s not in statistics:
                statistics.append(s)
        self.statistics = statistics

    for stat_name in self.statistics:
        match_found = StatisticsParam.find_stat_name_match(stat_name)
        if not match_found:
            raise ValueError(f"Illegal statistics name provided: {stat_name}.")

    model_param_descr = "Statistics's param column_names"
    if not isinstance(self.column_names, list):
        raise ValueError(f"column_names should be list of string.")
    for col_name in self.column_names:
        BaseParam.check_string(col_name, model_param_descr)

    model_param_descr = "Statistics's param column_indexes"
    if not isinstance(self.column_indexes, list) and self.column_indexes != -1:
        raise ValueError(f"column_indexes should be list of int or -1.")
    if self.column_indexes != -1:
        for col_index in self.column_indexes:
            if not isinstance(col_index, int):
                raise ValueError(f"{model_param_descr} should be int or list of int")
            if col_index < -consts.FLOAT_ZERO:
                raise ValueError(f"{model_param_descr} should be non-negative int value(s)")

    if not isinstance(self.abnormal_list, list):
        raise ValueError(f"abnormal_list should be list of int or string.")

    self.check_decimal_float(self.quantile_error, "Statistics's param quantile_error ")
    self.check_boolean(self.bias, "Statistics's param bias ")
    return True
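
The normalization at the start of `check` can be hard to follow from the branches alone: the result always starts from `BASIC_STAT`, and any extra requested names are appended before each name is validated. The sketch below reproduces that logic in a standalone function. The contents of `BASIC_STAT`, `LEGAL_STAT`, and `LEGAL_QUANTILE` are not shown in this section, so the values used here are hypothetical stand-ins, and `normalize_statistics` is an illustrative name rather than part of FederatedML.

```python
import copy
import re

# Hypothetical stand-ins: the real values are class attributes of
# StatisticsParam and are not listed in this section.
BASIC_STAT = ["mean", "std_variance", "count"]
LEGAL_STAT = BASIC_STAT + ["max", "min", "median"]
LEGAL_QUANTILE = re.compile(r"^(100|[1-9]?[0-9])%$")
SUMMARY = "summary"


def normalize_statistics(statistics):
    """Start from BASIC_STAT, append any extra names, then validate each name."""
    result = copy.copy(BASIC_STAT)
    if not isinstance(statistics, list):
        # A single value: 'summary' adds nothing, anything else is one extra name.
        statistics = [] if statistics == SUMMARY else [statistics]
    for name in statistics:
        if name not in result:
            result.append(name)
    for name in result:
        if not (name in LEGAL_STAT or LEGAL_QUANTILE.match(name)):
            raise ValueError(f"Illegal statistics name provided: {name}.")
    return result
```

Under these assumed constants, passing `"summary"` yields only the basic statistics, while passing `"95%"` or `["max", "mean"]` yields the basic statistics followed by the names that were not already present.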

`stepwise_param` ¶

Classes¶


StepwiseParam (BaseParam) ¶

Define stepwise params

Parameters:

Name	Type	Description	Default
`score_name`	`{"AIC", "BIC"}, default: 'AIC'`	Specify which model selection criterion to use.	`'AIC'`
`mode`	`{"hetero", "homo"}, default: 'hetero'`	Indicate the mode of the current task.	`'hetero'`
`role`	`{"guest", "host", "arbiter"}, default: 'guest'`	Indicate the role of the current party.	`'guest'`
`direction`	`{"both", "forward", "backward"}, default: 'both'`	Indicate which direction stepwise selection takes. 'forward' means forward selection; 'backward' means backward elimination; 'both' means candidate models from both directions are examined at each step.	`'both'`
`max_step`	`int, default: 10`	Specify the total number of steps to run before a forced stop.	`10`
`nvmin`	`int, default: 2`	Specify the minimum subset size of the final model; cannot be lower than 2. When nvmin > 2, the final model size may be smaller than nvmin due to the max_step limit.	`2`
`nvmax`	`int, default: None`	Specify the maximum subset size of the final model, with 2 <= nvmin <= nvmax. The final model size may be larger than nvmax due to the max_step limit.	`None`
`need_stepwise`	`bool, default: False`	Indicate whether this module needs to be run.	`False`

Source code in federatedml/param/stepwise_param.py

class StepwiseParam(BaseParam):
    """
    Define stepwise params

    Parameters
    ----------
    score_name: {"AIC", "BIC"}, default: 'AIC'
        Specify which model selection criterion to use.

    mode: {"hetero", "homo"}, default: 'hetero'
        Indicate the mode of the current task.

    role: {"guest", "host", "arbiter"}, default: 'guest'
        Indicate the role of the current party.

    direction: {"both", "forward", "backward"}, default: 'both'
        Indicate which direction stepwise selection takes.
        'forward' means forward selection; 'backward' means backward elimination; 'both' means candidate models from both directions are examined at each step.

    max_step: int, default: 10
        Specify the total number of steps to run before a forced stop.

    nvmin: int, default: 2
        Specify the minimum subset size of the final model; cannot be lower than 2. When nvmin > 2, the final model size may be smaller than nvmin due to the max_step limit.

    nvmax: int, default: None
        Specify the maximum subset size of the final model, with 2 <= nvmin <= nvmax. The final model size may be larger than nvmax due to the max_step limit.

    need_stepwise: bool, default: False
        Indicate whether this module needs to be run.

    """

    def __init__(self, score_name="AIC", mode=consts.HETERO, role=consts.GUEST, direction="both",
                 max_step=10, nvmin=2, nvmax=None, need_stepwise=False):
        super(StepwiseParam, self).__init__()
        self.score_name = score_name
        self.mode = mode
        self.role = role
        self.direction = direction
        self.max_step = max_step
        self.nvmin = nvmin
        self.nvmax = nvmax
        self.need_stepwise = need_stepwise

    def check(self):
        model_param_descr = "stepwise param's"
        self.score_name = self.check_and_change_lower(self.score_name, ["aic", "bic"], model_param_descr)
        self.check_valid_value(self.mode, model_param_descr, valid_values=[consts.HOMO, consts.HETERO])
        self.check_valid_value(self.role, model_param_descr, valid_values=[consts.HOST, consts.GUEST, consts.ARBITER])
        self.direction = self.check_and_change_lower(self.direction, ["forward", "backward", "both"], model_param_descr)
        self.check_positive_integer(self.max_step, model_param_descr)
        self.check_positive_integer(self.nvmin, model_param_descr)
        if self.nvmin < 2:
            raise ValueError(model_param_descr + " nvmin must be no less than 2.")
        if self.nvmax is not None:
            self.check_positive_integer(self.nvmax, model_param_descr)
            if self.nvmin > self.nvmax:
                raise ValueError(model_param_descr + " nvmax must be no less than nvmin.")
        self.check_boolean(self.need_stepwise, model_param_descr)

__init__(self, score_name='AIC', mode='hetero', role='guest', direction='both', max_step=10, nvmin=2, nvmax=None, need_stepwise=False) special ¶

Source code in federatedml/param/stepwise_param.py

def __init__(self, score_name="AIC", mode=consts.HETERO, role=consts.GUEST, direction="both",
             max_step=10, nvmin=2, nvmax=None, need_stepwise=False):
    super(StepwiseParam, self).__init__()
    self.score_name = score_name
    self.mode = mode
    self.role = role
    self.direction = direction
    self.max_step = max_step
    self.nvmin = nvmin
    self.nvmax = nvmax
    self.need_stepwise = need_stepwise

check(self) ¶

Source code in federatedml/param/stepwise_param.py

def check(self):
    model_param_descr = "stepwise param's"
    self.score_name = self.check_and_change_lower(self.score_name, ["aic", "bic"], model_param_descr)
    self.check_valid_value(self.mode, model_param_descr, valid_values=[consts.HOMO, consts.HETERO])
    self.check_valid_value(self.role, model_param_descr, valid_values=[consts.HOST, consts.GUEST, consts.ARBITER])
    self.direction = self.check_and_change_lower(self.direction, ["forward", "backward", "both"], model_param_descr)
    self.check_positive_integer(self.max_step, model_param_descr)
    self.check_positive_integer(self.nvmin, model_param_descr)
    if self.nvmin < 2:
        raise ValueError(model_param_descr + " nvmin must be no less than 2.")
    if self.nvmax is not None:
        self.check_positive_integer(self.nvmax, model_param_descr)
        if self.nvmin > self.nvmax:
            raise ValueError(model_param_descr + " nvmax must be no less than nvmin.")
    self.check_boolean(self.need_stepwise, model_param_descr)
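
The size constraints enforced by `check` reduce to `2 <= nvmin <= nvmax`, with `nvmax` optional. A minimal standalone sketch of just those rules follows. The helper names `_is_positive_int` and `validate_subset_bounds` are hypothetical, and the positive-integer test is an assumption about what `check_positive_integer` accepts, since that method is not shown in this section.

```python
def _is_positive_int(value):
    # Assumed behaviour: bool is a subclass of int in Python, so it is
    # excluded explicitly here.
    return isinstance(value, int) and not isinstance(value, bool) and value > 0


def validate_subset_bounds(nvmin=2, nvmax=None):
    """Check 2 <= nvmin, and nvmin <= nvmax when nvmax is given."""
    if not _is_positive_int(nvmin) or nvmin < 2:
        raise ValueError("nvmin must be an integer no less than 2.")
    if nvmax is not None:
        if not _is_positive_int(nvmax):
            raise ValueError("nvmax must be a positive integer.")
        if nvmin > nvmax:
            raise ValueError("nvmax must be no less than nvmin.")
    return True
```

Note that `nvmin == nvmax` passes, which pins the final subset size to a single value (subject to the `max_step` caveat described in the parameter table).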

`test` `special` ¶

Modules¶

param_json_test ¶

home_dir ¶

Classes¶


TestParamExtract (TestCase) ¶

Source code in federatedml/param/test/param_json_test.py

class TestParamExtract(unittest.TestCase):
    def setUp(self):
        self.param = FeatureBinningParam()
        json_config_file = home_dir + '/param_feature_binning.json'
        self.config_path = json_config_file
        with open(json_config_file, 'r', encoding='utf-8') as load_f:
            role_config = json.load(load_f)
        self.config_json = role_config

    # def tearDown(self):
    #     os.system("rm -r " + self.config_path)

    def test_directly_extract(self):
        param_obj = FeatureBinningParam()
        extractor = ParamExtract()
        param_obj = extractor.parse_param_from_config(param_obj, self.config_json)
        self.assertTrue(param_obj.method == "quantile")
        self.assertTrue(param_obj.transform_param.transform_type == 'bin_num')

Methods¶

setUp(self) ¶

Hook method for setting up the test fixture before exercising it.

Source code in federatedml/param/test/param_json_test.py

def setUp(self):
    self.param = FeatureBinningParam()
    json_config_file = home_dir + '/param_feature_binning.json'
    self.config_path = json_config_file
    with open(json_config_file, 'r', encoding='utf-8') as load_f:
        role_config = json.load(load_f)
    self.config_json = role_config

test_directly_extract(self) ¶

Source code in federatedml/param/test/param_json_test.py

def test_directly_extract(self):
    param_obj = FeatureBinningParam()
    extractor = ParamExtract()
    param_obj = extractor.parse_param_from_config(param_obj, self.config_json)
    self.assertTrue(param_obj.method == "quantile")
    self.assertTrue(param_obj.transform_param.transform_type == 'bin_num')

`union_param` ¶

Classes¶


UnionParam (BaseParam) ¶

Define the union method for combining multiple dTables and keep entries with the same id

Parameters:

Name	Type	Description	Default
`need_run`	`bool, default True`	Indicate whether this module needs to be run.	`True`
`allow_missing`	`bool, default False`	Whether to allow a mismatch between feature length and header length in the result. Note that empty tables are always skipped regardless of this setting.	`False`
`keep_duplicate`	`bool, default False`	Whether to keep entries with duplicated keys. If set to True, a new id is generated for each duplicated entry in the format {id}_{table_name}.	`False`
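
To make the effect of `keep_duplicate` concrete, the sketch below merges plain Python dicts in place of dTables. It is an illustration only, not the Union component's implementation: `union_tables` is a hypothetical helper, and keeping the first-seen entry when `keep_duplicate` is False is an assumption.

```python
def union_tables(tables, keep_duplicate=False):
    """Merge {table_name: {id: value}} mappings into one dict.

    Tables are processed in insertion order. Empty tables are skipped.
    """
    merged = {}
    for table_name, rows in tables.items():
        if not rows:
            continue  # empty tables are always skipped
        for key, value in rows.items():
            if key not in merged:
                merged[key] = value
            elif keep_duplicate:
                # Duplicated id: store it under a new id, {id}_{table_name}.
                merged[f"{key}_{table_name}"] = value
            # Otherwise (assumption) the first-seen entry is kept.
    return merged
```

For example, with tables `a = {"1": "x", "2": "y"}` and `b = {"2": "z", "3": "w"}`, setting `keep_duplicate=True` retains the second `"2"` entry under the id `"2_b"`.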

Source code in federatedml/param/union_param.py

class UnionParam(BaseParam):
    """
    Define the union method for combining multiple dTables and keep entries with the same id

    Parameters
    ----------
    need_run: bool, default True
        Indicate whether this module needs to be run.

    allow_missing: bool, default False
        Whether to allow a mismatch between feature length and header length in the result. Note that empty tables are always skipped regardless of this setting.

    keep_duplicate: bool, default False
        Whether to keep entries with duplicated keys. If set to True, a new id is generated for each duplicated entry in the format {id}_{table_name}.
    """

    def __init__(self, need_run=True, allow_missing=False, keep_duplicate=False):
        super().__init__()
        self.need_run = need_run
        self.allow_missing = allow_missing
        self.keep_duplicate = keep_duplicate

    def check(self):
        descr = "union param's "

        if type(self.need_run).__name__ != "bool":
            raise ValueError(
                descr + "need_run {} not supported, should be bool".format(
                    self.need_run))

        if type(self.allow_missing).__name__ != "bool":
            raise ValueError(
                descr + "allow_missing {} not supported, should be bool".format(
                    self.allow_missing))

        if type(self.keep_duplicate).__name__ != "bool":
            raise ValueError(
                descr + "keep_duplicate {} not supported, should be bool".format(
                    self.keep_duplicate))

        LOGGER.info("Finish union parameter check!")
        return True

__init__(self, need_run=True, allow_missing=False, keep_duplicate=False) special ¶

Source code in federatedml/param/union_param.py

def __init__(self, need_run=True, allow_missing=False, keep_duplicate=False):
    super().__init__()
    self.need_run = need_run
    self.allow_missing = allow_missing
    self.keep_duplicate = keep_duplicate

check(self) ¶

Source code in federatedml/param/union_param.py

def check(self):
    descr = "union param's "

    if type(self.need_run).__name__ != "bool":
        raise ValueError(
            descr + "need_run {} not supported, should be bool".format(
                self.need_run))

    if type(self.allow_missing).__name__ != "bool":
        raise ValueError(
            descr + "allow_missing {} not supported, should be bool".format(
                self.allow_missing))

    if type(self.keep_duplicate).__name__ != "bool":
        raise ValueError(
            descr + "keep_duplicate {} not supported, should be bool".format(
                self.keep_duplicate))

    LOGGER.info("Finish union parameter check!")
    return True

Last update: 2021-11-23
