Federated Logistic Regression¶
Logistic Regression (LR) is a widely used statistical model for classification problems. FATE provides two kinds of federated LR: Homogeneous LR (HomoLR) and Heterogeneous LR (HeteroLR).
We simplify the federation process to three parties. Party A represents the Guest and party B the Host, while party C, also known as the “Arbiter”, is a third party that holds a private key for each party and works as a coordinator.
Homogeneous LR¶
As the name suggests, in HomoLR the feature spaces of guest and hosts are identical. An optional encryption mode for computing gradients is provided for host parties; with it enabled, the plaintext model is no longer available to the host.
The HomoLR process is shown in Figure 1 (Federated HomoLR Principle). The models of Party A and Party B have the same structure. In each iteration, each party trains its model on its own data. After that, all parties upload their encrypted (or plain, depending on your configuration) gradients to the arbiter. The arbiter aggregates these gradients into a federated gradient, which is then distributed to all parties for updating their local models. As in traditional LR, training stops when the federated model converges or the whole training process reaches a predefined max-iteration threshold. More details are available in this [paper]
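The local-train / aggregate / redistribute loop above can be sketched in plain Python. This is a minimal plaintext simulation (no encryption, no network transport); the function names and the sample-count weighting are illustrative, not FATE's actual API.

```python
import numpy as np

def local_gradient(w, X, y):
    """Plain logistic-regression gradient on one party's local data."""
    preds = 1.0 / (1.0 + np.exp(-X @ w))
    return X.T @ (preds - y) / len(y)

def arbiter_aggregate(gradients, weights):
    """Weighted average of the parties' gradients, forming the federated gradient."""
    total = sum(weights)
    return sum(wt * g for g, wt in zip(gradients, weights)) / total

# two parties with the same feature space, each holding its own data
rng = np.random.default_rng(0)
w = np.zeros(3)
X_a, y_a = rng.normal(size=(8, 3)), rng.integers(0, 2, 8)
X_b, y_b = rng.normal(size=(6, 3)), rng.integers(0, 2, 6)

for _ in range(50):
    g_a = local_gradient(w, X_a, y_a)
    g_b = local_gradient(w, X_b, y_b)
    # the arbiter aggregates and redistributes the federated gradient
    w -= 0.1 * arbiter_aggregate([g_a, g_b], weights=[len(y_a), len(y_b)])
```

In the real protocol the uploaded gradients may be encrypted, so the arbiter aggregates without seeing the individual plaintext values.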
Heterogeneous LR¶
HeteroLR carries out the federated learning in a different way. As shown in Figure 2, a sample alignment process is conducted before training. This step identifies the overlapping samples stored in the databases of the two involved parties; the federated model is built on those overlapping samples. The alignment process does not leak confidential information (e.g., sample ids) of either party, since it is conducted in an encrypted way. Check out [paper 1] for more details.
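Conceptually, sample alignment computes a private set intersection over sample ids. The sketch below shows only the intersection step using salted hashes; this is a deliberate simplification and is not the secure protocol FATE uses (salted hashes are vulnerable to dictionary attacks; the real protocol performs the exchange under encryption, as described in [paper 1]).

```python
import hashlib

def blind(ids, salt):
    """Hash ids before exchange; a toy stand-in for the encrypted exchange."""
    return {hashlib.sha256((salt + i).encode()).hexdigest(): i for i in ids}

guest_ids = ["u1", "u2", "u3", "u5"]
host_ids = ["u2", "u3", "u4"]

salt = "shared-salt"  # in practice agreed via a key-exchange step, not hard-coded
guest_tokens = blind(guest_ids, salt)
host_tokens = set(blind(host_ids, salt))

# only the overlapping samples participate in training
overlap = sorted(guest_tokens[t] for t in guest_tokens if t in host_tokens)
# → ['u2', 'u3']
```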
In the training process, party A and party B compute the elements needed for the final gradients. The arbiter aggregates them, computes the gradient, and transfers it back to each party. Check out [paper 2] for more details.
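Since the data is vertically partitioned, each party only holds some of the features, and the linear score is the sum of per-party partial scores. A plaintext sketch of the gradient elements each party contributes (in FATE these intermediate values are exchanged under Paillier encryption; all names here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10
# vertically partitioned features: guest holds 2 columns, host holds 3
X_guest, X_host = rng.normal(size=(n, 2)), rng.normal(size=(n, 3))
y = rng.integers(0, 2, n)
w_guest, w_host = np.zeros(2), np.zeros(3)

# each party computes its partial linear score; the full score is their sum
z = X_guest @ w_guest + X_host @ w_host
residual = 1.0 / (1.0 + np.exp(-z)) - y   # the shared "gradient element"

# each party then forms the gradient for its own slice of features
grad_guest = X_guest.T @ residual / n
grad_host = X_host.T @ residual / n
```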
Multi-host hetero-lr¶
In the multi-host scenario, the gradient computation remains the same as in the single-host case. However, we use the L2-norm of the difference of model weights between two consecutive iterations as the convergence criterion. Since the arbiter can obtain the complete model weights, the convergence check happens on the arbiter side.
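The weight-difference criterion described above amounts to a single norm check, sketched here (the function name and tolerance are illustrative):

```python
import numpy as np

def has_converged(w_prev, w_curr, eps=1e-4):
    """weight_diff criterion: L2-norm of the change in model weights."""
    return np.linalg.norm(w_curr - w_prev) < eps
```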
Param¶
- class HeteroLogisticParam(penalty='L2', tol=0.0001, alpha=1.0, optimizer='rmsprop', batch_size=-1, learning_rate=0.01, init_param=<federatedml.param.init_model_param.InitParam object>, max_iter=100, early_stop='diff', encrypted_mode_calculator_param=<federatedml.param.encrypted_mode_calculation_param.EncryptedModeCalculatorParam object>, predict_param=<federatedml.param.predict_param.PredictParam object>, cv_param=<federatedml.param.cross_validation_param.CrossValidationParam object>, decay=1, decay_sqrt=True, sqn_param=<federatedml.param.sqn_param.StochasticQuasiNewtonParam object>, multi_class='ovr', validation_freqs=None, early_stopping_rounds=None, metrics=['auc', 'ks'], floating_point_precision=23, use_first_metric_only=False, stepwise_param=<federatedml.param.stepwise_param.StepwiseParam object>)¶
- class HomoLogisticParam(penalty='L2', tol=0.0001, alpha=1.0, optimizer='rmsprop', batch_size=-1, learning_rate=0.01, init_param=<federatedml.param.init_model_param.InitParam object>, max_iter=100, early_stop='diff', encrypt_param=<federatedml.param.encrypt_param.EncryptParam object>, re_encrypt_batches=2, predict_param=<federatedml.param.predict_param.PredictParam object>, cv_param=<federatedml.param.cross_validation_param.CrossValidationParam object>, decay=1, decay_sqrt=True, aggregate_iters=1, multi_class='ovr', validation_freqs=None, early_stopping_rounds=None, metrics=['auc', 'ks'], use_first_metric_only=False, use_proximal=False, mu=0.1)¶
- Parameters
re_encrypt_batches (int, default: 2) – Required when using the encrypted version of HomoLR. Since updating coefficients over multiple batches may cause overflow, the model needs to be re-encrypted every few batches. Please be careful when setting this parameter: too large a value may cause training failure.
aggregate_iters (int, default: 1) – Indicates how many iterations are aggregated at once.
use_proximal (bool, default: False) – Whether to turn on the additional proximal term. For more details on FedProx, please refer to https://arxiv.org/abs/1812.06127
mu (float, default: 0.1) – Scale of the proximal term.
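When use_proximal is enabled, FedProx adds a proximal penalty that keeps the local model close to the global one. A minimal sketch of the adjusted local gradient, assuming the standard FedProx formulation from the paper above (the function name is illustrative):

```python
import numpy as np

def fedprox_gradient(grad, w_local, w_global, mu=0.1):
    """Local gradient plus the FedProx proximal term mu * (w_local - w_global),
    i.e. the gradient of the penalty (mu / 2) * ||w_local - w_global||**2."""
    return grad + mu * (w_local - w_global)
```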
- class LogisticParam(penalty='L2', tol=0.0001, alpha=1.0, optimizer='rmsprop', batch_size=-1, learning_rate=0.01, init_param=<federatedml.param.init_model_param.InitParam object>, max_iter=100, early_stop='diff', encrypt_param=<federatedml.param.encrypt_param.EncryptParam object>, predict_param=<federatedml.param.predict_param.PredictParam object>, cv_param=<federatedml.param.cross_validation_param.CrossValidationParam object>, decay=1, decay_sqrt=True, multi_class='ovr', validation_freqs=None, early_stopping_rounds=None, stepwise_param=<federatedml.param.stepwise_param.StepwiseParam object>, floating_point_precision=23, metrics=None, use_first_metric_only=False)¶
Parameters used for Logistic Regression, in both Homo mode and Hetero mode.
- Parameters
penalty (str, 'L1', 'L2' or None. default: 'L2') – Penalty method used in LR. Please note that, when using encrypted version in HomoLR, ‘L1’ is not supported.
tol (float, default: 1e-4) – The tolerance of convergence
alpha (float, default: 1.0) – Regularization strength coefficient.
optimizer (str, 'sgd', 'rmsprop', 'adam', 'nesterov_momentum_sgd', 'sqn' or 'adagrad', default: 'rmsprop') – Optimization method; if 'sqn' is set, sqn_param will take effect. Currently, 'sqn' supports hetero mode only.
batch_size (int, default: -1) – Batch size when updating the model. -1 means use all data in one batch, i.e. do not use the mini-batch strategy.
learning_rate (float, default: 0.01) – Learning rate
max_iter (int, default: 100) – The maximum iteration for training.
early_stop (str, 'diff', 'weight_diff' or 'abs', default: 'diff') –
- Method used to judge convergence.
diff: use the difference of loss between two consecutive iterations.
weight_diff: use the difference between the weights of two consecutive iterations.
abs: use the absolute value of loss, i.e. converged if loss < eps.
Please note that for the hetero-lr multi-host situation, this parameter supports “weight_diff” only.
decay (int or float, default: 1) – Decay rate for the learning rate, following the schedule lr = lr0 / (1 + decay*t) if decay_sqrt is False, or lr = lr0 / sqrt(1 + decay*t) if decay_sqrt is True, where t is the iteration number.
decay_sqrt (bool, default: True) – Whether to use the square-root form of the decay schedule above.
encrypt_param (EncryptParam object, default: default EncryptParam object) –
predict_param (PredictParam object, default: default PredictParam object) –
cv_param (CrossValidationParam object, default: default CrossValidationParam object) –
multi_class (str, 'ovr', default: 'ovr') – If it is a multi-class task, indicates which strategy to use. Currently, only 'ovr' (short for one-vs-rest) is supported.
validation_freqs (int, list, tuple, set, or None) – Validation frequency during training.
early_stopping_rounds (int, default: None) – Training stops if no metric improves in the last early_stopping_rounds rounds.
metrics (list or None, default: None) – Indicates which metrics will be used when executing evaluation during training. If empty, default metrics for the specific task type will be used; for binary classification, the defaults are ['auc', 'ks'].
use_first_metric_only (bool, default: False) – Indicates whether to use only the first metric for the early-stopping judgement.
floating_point_precision (None or int, default: 23) – If not None, use floating_point_precision bits to speed up calculation: e.g. convert x to round(x * 2**floating_point_precision) during Paillier operations, and divide the result by 2**floating_point_precision at the end.
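Two of the numeric parameters above translate directly into small formulas. The sketch below illustrates the decay/decay_sqrt learning-rate schedule and the fixed-point encoding behind floating_point_precision; the function names are illustrative, not FATE's API.

```python
import math

def decayed_lr(lr0, decay, t, decay_sqrt=True):
    """Learning-rate schedule from the decay / decay_sqrt parameters."""
    factor = 1 + decay * t
    return lr0 / math.sqrt(factor) if decay_sqrt else lr0 / factor

def encode(x, precision=23):
    """Fixed-point encode before Paillier ops: round(x * 2**precision)."""
    return round(x * (1 << precision))

def decode(n, precision=23):
    """Divide the integer result by 2**precision at the end."""
    return n / (1 << precision)

# with the defaults decay=1, decay_sqrt=True, the rate halves by t=3:
# decayed_lr(0.01, 1, 3) == 0.01 / sqrt(4) == 0.005

# encoded values multiply like integers; precisions add under multiplication
product = decode(encode(0.25) * encode(0.5), precision=46)   # 0.125
```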
Features¶
- Both Homo-LR and Hetero-LR
L1 & L2 regularization
Mini-batch mechanism
Weighted training
Six optimization methods:
- sgd
gradient descent with arbitrary batch size
- rmsprop
RMSProp
- adam
Adam
- adagrad
AdaGrad
- nesterov_momentum_sgd
Nesterov Momentum
- sqn
stochastic quasi-Newton. For algorithm details, refer to [this paper]
Three convergence criteria:
- diff
Use difference of loss between two iterations, not available for multi-host training;
- abs
use the absolute value of loss;
- weight_diff
use difference of model weights
Support multi-host modeling tasks. The detailed configuration for multi-host modeling is located in doc/dsl_conf_setting_guide.html
Support validation at arbitrary iteration intervals
Learning rate decay mechanism
- Homo-LR extra features
Two encryption modes
- “Paillier” mode
The host will not get the plaintext model. When using encryption mode, only the “sgd” optimizer is supported.
- Non-encryption mode
Everything is in clear text.
Secure aggregation mechanism used when aggregating models
Support aggregation at arbitrary iteration intervals.
Support FedProx mechanism. For more details, please refer to [this paper]
- Hetero-LR extra features
Support different encrypt-mode to balance speed and security
Support One-vs-Rest
When modeling a multi-host task, only the “weight_diff” convergence criterion is supported.
Support sparse format data
Support early-stopping mechanism
Support setting arbitrary metrics for validation during training
Support stepwise. For details on stepwise mode, please refer to [stepwise].
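The secure aggregation feature listed under Homo-LR can be illustrated with pairwise masking: each pair of parties shares a random mask that one adds and the other subtracts, so the masks cancel in the sum while individual updates stay hidden from the aggregator. This is a minimal sketch of the idea only; FATE's actual protocol details may differ.

```python
import numpy as np

rng = np.random.default_rng(2)
updates = [rng.normal(size=4) for _ in range(3)]  # each party's model update

# one shared random mask per (i, j) pair with i < j
masks = {(i, j): rng.normal(size=4) for i in range(3) for j in range(i + 1, 3)}

masked = []
for i, u in enumerate(updates):
    m = u.copy()
    for j in range(3):
        if i < j:
            m += masks[(i, j)]   # party i adds the mask it shares with j
        elif j < i:
            m -= masks[(j, i)]   # party j's counterpart subtracts it
    masked.append(m)

# every mask is added once and subtracted once, so the sum is exact
aggregate = sum(masked)
assert np.allclose(aggregate, sum(updates))
```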