Tree Models¶
Hetero SecureBoost¶
Gradient Boosting Decision Tree (GBDT) is a widely used statistical model for classification and regression problems. FATE provides a novel lossless privacy-preserving tree-boosting system known as SecureBoost: A Lossless Federated Learning Framework.
This federated learning system allows the learning process to be jointly conducted over multiple parties with partially common user samples but different feature sets, which corresponds to a vertically partitioned data set. An advantage of SecureBoost is that it provides the same level of accuracy as the non-privacy-preserving approach while revealing no information about private data.
The following figure shows the proposed Federated SecureBoost framework.
-
Active Party
We define the active party as the data provider who holds both a data matrix and the class label. Since the class label information is indispensable for supervised learning, there must be an active party with access to the label y. The active party naturally takes on the role of a dominating server in federated learning.
-
Passive Party
We define a data provider that has only a data matrix as a passive party. Passive parties play the role of clients in the federated learning setting. They also need to build a model to predict the class label y for their prediction purposes. Thus, they must collaborate with the active party to build a model that predicts y for their future users using their own features.
We align the data samples under an encryption scheme by using the privacy-preserving protocol for inter-database intersections to find the common shared users or data samples across the parties without compromising the non-shared parts of the user sets.
To ensure security, passive parties cannot access the gradients and hessians directly. We use an "XGBoost"-like tree-learning algorithm. To keep the gradients and hessians confidential, the active party encrypts them before sending the encrypted [gradient] and [hessian] to the passive parties. Each passive party uses [gradient] and [hessian] to calculate the encrypted feature histograms, encodes each (feature, split_bin_val) pair and constructs a (feature, split_bin_val) lookup table; it then sends the encoded values of (feature, split_bin_val) along with the feature histograms to the active party. After receiving the feature histograms from the passive parties, the active party decrypts them and finds the best gains. If the best-gain feature belongs to a passive party, the active party sends the encoded (feature, split_bin_val) back to the owner party. The following figure shows the process of finding a split in federated tree building.
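The following minimal sketch illustrates this exchange in a single process using the `phe` (python-paillier) library. The data values, `n_bins`/`lam` settings, and in-process "message passing" are illustrative assumptions rather than FATE's internal API; the gain formula is the standard XGBoost one.

```python
from phe import paillier

# --- Active party: encrypt per-sample gradients and hessians ---
pub, priv = paillier.generate_paillier_keypair(n_length=1024)
g = [0.3, -0.7, 0.5]                      # per-sample gradients (toy values)
h = [0.21, 0.21, 0.25]                    # per-sample hessians (toy values)
enc_g = [pub.encrypt(v) for v in g]
enc_h = [pub.encrypt(v) for v in h]

# --- Passive party: build encrypted histograms without ever seeing g/h ---
n_bins = 2
bin_ids = [0, 1, 0]                       # each sample's bin for one feature
hist_g = [pub.encrypt(0.0) for _ in range(n_bins)]
hist_h = [pub.encrypt(0.0) for _ in range(n_bins)]
for i, b in enumerate(bin_ids):
    hist_g[b] += enc_g[i]                 # homomorphic addition of ciphertexts
    hist_h[b] += enc_h[i]

# --- Active party: decrypt histograms and compute split gains ---
G = [priv.decrypt(c) for c in hist_g]
H = [priv.decrypt(c) for c in hist_h]
lam = 0.1                                 # l2 regularization
total_g, total_h = sum(G), sum(H)
for split in range(1, n_bins):
    g_l, h_l = sum(G[:split]), sum(H[:split])
    g_r, h_r = total_g - g_l, total_h - h_l
    gain = g_l ** 2 / (h_l + lam) + g_r ** 2 / (h_r + lam) \
        - total_g ** 2 / (total_h + lam)
    print(f"split before bin {split}: gain={gain:.4f}")
```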
The parties continue the split-finding process until tree construction finishes. Each party only knows the detailed split information of the tree nodes whose split features it provides. The following figure shows the final structure of a single decision tree.
To use the learned model to classify a new instance, the active party first judges which party the current tree node belongs to. If the node belongs to the active party, it uses its (feature, split_bin_val) lookup table to decide whether to go to the left or the right child node; otherwise, the active party sends the node id to the designated passive party, which checks its lookup table and replies with the branch the current node should go to. This process repeats until the current node is a leaf. The following figure shows the federated inference process.
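The sketch below mimics this traversal in plain Python. The node layout, the `host_lookup` table, and the direct function call standing in for the guest-host message exchange are all illustrative assumptions; in FATE the branch decision travels over federation channels.

```python
# Tree held by the active (guest) party: host-owned nodes expose only an
# opaque encoded id; their real split details live in the host's lookup table.
tree = {
    0: {"owner": "guest", "feature": "x1", "bin_val": 3, "left": 1, "right": 2},
    1: {"owner": "host", "encoded_id": 17, "left": 3, "right": 4},
    2: {"leaf": True, "weight": -0.4},
    3: {"leaf": True, "weight": 0.8},
    4: {"leaf": True, "weight": 0.1},
}

# Private (feature, split_bin_val) lookup table kept on the host side.
host_lookup = {17: ("x9", 5)}

def host_decide(encoded_id, host_sample):
    """Host checks its private split and reveals only the branch direction."""
    feature, bin_val = host_lookup[encoded_id]
    return "left" if host_sample[feature] <= bin_val else "right"

def predict(guest_sample, host_sample):
    node_id = 0
    while not tree[node_id].get("leaf"):
        node = tree[node_id]
        if node["owner"] == "guest":
            branch = "left" if guest_sample[node["feature"]] <= node["bin_val"] else "right"
        else:
            branch = host_decide(node["encoded_id"], host_sample)  # federated exchange
        node_id = node[branch]
    return tree[node_id]["weight"]

print(predict({"x1": 2}, {"x9": 7}))  # x1 <= 3, x9 > 5  ->  leaf weight 0.1
```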
By following the SecureBoost framework, multiple parties can jointly build a tree ensemble model without leaking privacy in federated learning. If you want to learn more about the algorithm, you can read the paper mentioned above.
Optimization in Parallel Learning¶
SecureBoost uses a data-parallel learning algorithm to build the decision trees in every party. The procedure of the data-parallel algorithm in each party is (a minimal sketch follows the list):
- Every party uses the mapPartitions API interface to generate feature histograms for each partition of its data.
- The reduce API interface merges all local feature histograms into global histograms.
- The best splits are found from the merged global histograms via federated learning, and the splits are then performed.
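A single-process emulation of this pattern is sketched below. `map_partition`, `merge`, the partition layout, and `N_BINS` are illustrative stand-ins for the distributed mapPartitions/reduce API.

```python
from functools import reduce

# Each partition holds (bin_id, gradient, hessian) triples for one feature.
partitions = [
    [(0, 0.3, 0.2), (1, -0.1, 0.2)],
    [(1, 0.5, 0.25), (0, -0.7, 0.21)],
]
N_BINS = 2

def map_partition(rows):
    """Build a local per-bin [sum_g, sum_h] histogram for one partition."""
    hist = [[0.0, 0.0] for _ in range(N_BINS)]
    for bin_id, g, h in rows:
        hist[bin_id][0] += g
        hist[bin_id][1] += h
    return hist

def merge(hist_a, hist_b):
    """Element-wise merge of two local histograms (the reduce step)."""
    return [[a[0] + b[0], a[1] + b[1]] for a, b in zip(hist_a, hist_b)]

local_hists = [map_partition(p) for p in partitions]   # "mapPartitions"
global_hist = reduce(merge, local_hists)               # "reduce"
print(global_hist)  # [[-0.4, 0.41], [0.4, 0.45]]
```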
Applications¶
Hetero SecureBoost supports the following applications.
- binary classification, the objective function is sigmoid cross-entropy (its gradient and hessian are sketched below)
- multi classification, the objective function is softmax cross-entropy
- regression, supported objective functions include least-squared-error-loss, least-absolute-error-loss, huber-loss, tweedie-loss, fair-loss and log-cosh-loss
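For reference, the sketch below shows the standard per-sample gradient and hessian of the sigmoid cross-entropy objective: g = p - y and h = p(1 - p), where p is the sigmoid of the current score. This is textbook XGBoost-style math, not FATE-specific code.

```python
import math

def sigmoid(score):
    return 1.0 / (1.0 + math.exp(-score))

def binary_cross_entropy_g_h(score, label):
    p = sigmoid(score)
    g = p - label        # first derivative of the loss w.r.t. the score
    h = p * (1.0 - p)    # second derivative
    return g, h

print(binary_cross_entropy_g_h(0.2, 1))  # (-0.4502..., 0.2475...)
```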
Other features¶
- Column sub-sample
- Control the number of nodes split in parallel at each layer by setting the max_split_nodes parameter, in order to avoid exceeding memory limits
- Support feature importance calculation
- Support Multi-host and single guest to build model
- Support different encrypt-mode to balance speed and security
- Support missing values in the train and predict processes
- Support evaluating training and validation data during the training process
- Support another homomorphic encryption method called "Iterative Affine"
- Support early stopping in FATE-1.4; to use early stopping, see Boosting Tree Param
- Support sparse data optimization in FATE-1.5. You can activate it by setting "sparse_optimization" to true in the conf. Notice that this feature may increase memory consumption.
- Support feature subsample random seed setting in FATE-1.5
- Support feature binning error setting
- Support GOSS sampling in FATE-1.6
- Support cipher compressing and g, h packing in FATE-1.7
Homo SecureBoost¶
Unlike Hetero SecureBoost, Homo SecureBoost is conducted under a different setting: every participant (client) holds data that shares the same feature space, and the participants jointly train a GBDT model without leaking any data sample.
The figure below shows the overall framework of the homo SecureBoost algorithm.
-
Client
Clients are the participants who hold their own labeled samples. Samples from all client parties share the same feature space. Clients aim to build a more powerful model together without leaking local samples, and they share the same trained model after learning.
-
Server
There is a potential for data leakage if every participant sends its local histogram (which contains sums of gradients and hessians) to the others, because features and labels can sometimes be inferred from gradient and hessian sums. Thus, to ensure security in the learning process, the server uses secure aggregation to combine all participants' local histograms in a safe manner: it obtains a global histogram without seeing any local histogram, and then finds and broadcasts the best splits to the clients. The server collaborates with all clients throughout the learning process.
The key steps of learning a Homo SecureBoost model are described below:
- Clients and the server initialize local settings, apply homo feature binning to obtain binning points for all features, and then pre-process local samples.
-
Clients and Server build a decision tree collaboratively:
a. Clients compute local histograms for the current leaf nodes (left nodes or the root node).
b. The server applies secure aggregation: every local histogram is masked with random numbers that cancel each other out when summed. In this way the server obtains the global histogram without learning any local histogram, and data leakage is prevented (a sketch of this masking follows the steps below). The figure below shows how secure histogram aggregation is conducted.
c. The server performs histogram subtraction: the right-node histograms are obtained by subtracting the left-node histograms from the parent-node histograms. The server then finds the best split points and broadcasts them to the clients.
d. After receiving the best split points, clients build the next layer of the current decision tree and re-assign samples. If the current decision tree reaches the max depth or the stop conditions are fulfilled, stop building the current tree; otherwise go back to step (a). The figure below shows the procedure of fitting a decision tree.
-
The Homo SecureBoost fitting process stops when the tree number reaches the maximum or the loss converges.
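The toy sketch below shows the mask-cancellation idea behind step (b). Real FATE secure aggregation derives the pairwise masks from shared secrets; here they are just fresh random numbers, which is enough to show why the server's sum equals the true global histogram.

```python
import random

local_hists = [[1.0, 2.0], [3.0, 1.0], [2.0, 2.0]]  # one histogram per client
n_clients, dim = len(local_hists), len(local_hists[0])

# Each client pair (i, j), i < j, agrees on a random mask r; client i adds it
# and client j subtracts it, so all masks cancel in the global sum.
masked = [h[:] for h in local_hists]
for i in range(n_clients):
    for j in range(i + 1, n_clients):
        r = [random.uniform(-10, 10) for _ in range(dim)]
        for d in range(dim):
            masked[i][d] += r[d]
            masked[j][d] -= r[d]

# The server only ever sees the masked histograms, yet their sum is exact.
global_hist = [sum(m[d] for m in masked) for d in range(dim)]
print([round(v, 6) for v in global_hist])  # [6.0, 5.0]
```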
By following the steps above clients are able to jointly build a GBDT model. Every client can then conduct inference on a new instance locally.
Optimization in learning¶
Homo SecureBoost utilizes data parallelization and histogram subtraction to accelerate the learning process.
- Every party uses the mapPartitions and reduce API interfaces to generate local feature histograms; only samples in left nodes are used when computing feature histograms.
- The server aggregates all local histograms to get global histograms, then gets sibling histograms by subtracting left-node histograms from parent histograms (a sketch follows this list).
- The server finds the best splits from the merged global histograms, then broadcasts the best splits.
- Computation and transmission costs are halved by using histogram subtraction.
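The subtraction itself is one line per bin, as the toy values below show; only the smaller child needs to be computed from raw samples.

```python
# Per-bin [sum_g, sum_h] histograms; values are illustrative.
parent_hist = [[6.0, 5.0], [2.0, 3.0]]
left_hist = [[2.5, 2.0], [0.5, 1.0]]

# Right child = parent - left, bin by bin.
right_hist = [[p[0] - l[0], p[1] - l[1]] for p, l in zip(parent_hist, left_hist)]
print(right_hist)  # [[3.5, 3.0], [1.5, 2.0]]
```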
Applications¶
Homo SecureBoost supports the following applications:
- binary classification, the objective function is sigmoid cross-entropy
- multi classification, the objective function is softmax cross-entropy
- regression, supported objective functions include least-squared-error-loss, least-absolute-error-loss, huber-loss, tweedie-loss, fair-loss and log-cosh-loss
Other features¶
- The server uses secure aggregation to aggregate clients' histograms and losses, ensuring data security
- Column sub-sample
- Control the number of nodes split in parallel at each layer by setting the max_split_nodes parameter, in order to avoid exceeding memory limits
- Support feature importance calculation
- Support Multi-host and single guest to build model
- Support missing values in the train and predict processes
- Support evaluating training and validation data during the training process
- Support feature subsample random seed setting in FATE-1.5
- Support feature binning error setting.
Param¶
boosting_param¶
Attributes¶
hetero_deprecated_param_list = ['early_stopping_rounds', 'validation_freqs', 'metrics', 'use_first_metric_only'] (module attribute)
homo_deprecated_param_list = ['validation_freqs', 'metrics'] (module attribute)
Classes¶
ObjectiveParam(objective='cross_entropy', params=None)¶
Bases: BaseParam
Define objective parameters used in federated ml.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| objective | str or None | None in the host's config; should be str in the guest's config. When task_type is classification, only 'cross_entropy' is supported; the other 6 types are supported in regression tasks | None |
| params | None or list | should be a non-empty list when objective is 'tweedie', 'fair' or 'huber'; the first element should be a float larger than 0.0 when objective is 'fair' or 'huber', and a float in [1.0, 2.0) when objective is 'tweedie' | None |

Source code in python/federatedml/param/boosting_param.py
Attributes¶
objective = objective (instance attribute)
params = params (instance attribute)
Functions¶
check(task_type=None)¶
Source code in python/federatedml/param/boosting_param.py
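A minimal usage sketch, assuming FATE's federatedml package is installed; the values follow the constraints in the table above.

```python
from federatedml.param.boosting_param import ObjectiveParam

# 'tweedie' requires a non-empty params list whose first element is in [1.0, 2.0)
obj_param = ObjectiveParam(objective="tweedie", params=[1.5])
obj_param.check(task_type="regression")  # validates the settings
```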
DecisionTreeParam(criterion_method='xgboost', criterion_params=[0.1, 0], max_depth=3, min_sample_split=2, min_impurity_split=0.001, min_leaf_node=1, max_split_nodes=consts.MAX_SPLIT_NODES, feature_importance_type='split', n_iter_no_change=True, tol=0.001, min_child_weight=0, use_missing=False, zero_as_missing=False, deterministic=False)¶
Bases: BaseParam
Define decision tree parameters used in federated ml.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| criterion_method | | the criterion function to use | "xgboost" |
| criterion_params | list or dict | should be non-empty with float elements. If a list is offered, the first element is the l2 regularization value and the second is the l1 regularization value; if a dict is offered, make sure it contains the keys 'l1' and 'l2'. l1 and l2 regularization values are non-negative floats. default: [0.1, 0] or {'l1': 0, 'l2': 0.1} | [0.1, 0] |
| max_depth | | the max depth of a decision tree, default: 3 | 3 |
| min_sample_split | | the least quantity of samples a node needs in order to be split, default: 2 | 2 |
| min_impurity_split | | the least gain a single split needs to reach, default: 1e-3 | 0.001 |
| min_child_weight | | the sum of hessian needed in child nodes, default: 0 | 0 |
| min_leaf_node | | when a node holds no more than min_leaf_node samples, it becomes a leaf, default: 1 | 1 |
| max_split_nodes | | no more than max_split_nodes will have their splits found in parallel in one batch, for memory consideration. default is 65536 | consts.MAX_SPLIT_NODES |
| feature_importance_type | | if 'split', feature importances are calculated by feature split times; if 'gain', by feature split gain. default: 'split'. Due to safety concerns, the training strategy of Hetero-SBT was adjusted in FATE-1.8, and this parameter is now abandoned when running Hetero-SBT: the guest side computes split and gain importances of local features and receives anonymous feature importance results from hosts, while hosts compute split importances of local features. | 'split' |
| use_missing | | whether to use missing values in the training process | False |
| zero_as_missing | | whether to regard 0 as a missing value; used only when use_missing=True, default: False | False |
| deterministic | | ensure stability when computing histograms. Set this to True to ensure stable results with the same data and parameters, but it may slow down computation. | False |

Source code in python/federatedml/param/boosting_param.py
Attributes¶
criterion_method = criterion_method, criterion_params = criterion_params, max_depth = max_depth, min_sample_split = min_sample_split, min_impurity_split = min_impurity_split, min_leaf_node = min_leaf_node, min_child_weight = min_child_weight, max_split_nodes = max_split_nodes, feature_importance_type = feature_importance_type, n_iter_no_change = n_iter_no_change, tol = tol, use_missing = use_missing, zero_as_missing = zero_as_missing, deterministic = deterministic (instance attributes)
Functions¶
check()¶
Source code in python/federatedml/param/boosting_param.py
BoostingParam(task_type=consts.CLASSIFICATION, objective_param=ObjectiveParam(), learning_rate=0.3, num_trees=5, subsample_feature_rate=1, n_iter_no_change=True, tol=0.0001, bin_num=32, predict_param=PredictParam(), cv_param=CrossValidationParam(), validation_freqs=None, metrics=None, random_seed=100, binning_error=consts.DEFAULT_RELATIVE_ERROR)¶
Bases: BaseParam
Basic parameters for boosting algorithms.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| task_type | | task type | 'classification' |
| objective_param | ObjectiveParam object | objective param | ObjectiveParam() |
| learning_rate | float, int or long | the learning rate of secure boost. default: 0.3 | 0.3 |
| num_trees | int or float | the max number of boosting rounds. default: 5 | 5 |
| subsample_feature_rate | float | a float number in [0, 1], default: 1.0 | 1 |
| n_iter_no_change | bool | when True and the residual error is less than tol, the tree building process will stop. default: True | True |
| bin_num | | bin number used in quantile binning. default: 32 | 32 |
| validation_freqs | | whether to do validation during the training process. If None, no validation is performed during training; if a positive integer, data is validated every validation_freqs epochs; if a container object in Python, data is validated when the epoch number belongs to the container, e.g. validation_freqs = [10, 15] validates at epochs 10 and 15. Default: None | None |

Source code in python/federatedml/param/boosting_param.py
Attributes¶
task_type = task_type, objective_param = copy.deepcopy(objective_param), learning_rate = learning_rate, num_trees = num_trees, subsample_feature_rate = subsample_feature_rate, n_iter_no_change = n_iter_no_change, tol = tol, bin_num = bin_num, predict_param = copy.deepcopy(predict_param), cv_param = copy.deepcopy(cv_param), validation_freqs = validation_freqs, metrics = metrics, random_seed = random_seed, binning_error = binning_error (instance attributes)
Functions¶
check()¶
Source code in python/federatedml/param/boosting_param.py
HeteroBoostingParam(task_type=consts.CLASSIFICATION, objective_param=ObjectiveParam(), learning_rate=0.3, num_trees=5, subsample_feature_rate=1, n_iter_no_change=True, tol=0.0001, encrypt_param=EncryptParam(), bin_num=32, encrypted_mode_calculator_param=EncryptedModeCalculatorParam(), predict_param=PredictParam(), cv_param=CrossValidationParam(), validation_freqs=None, early_stopping_rounds=None, metrics=None, use_first_metric_only=False, random_seed=100, binning_error=consts.DEFAULT_RELATIVE_ERROR)¶
Bases: BoostingParam
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| encrypt_param | EncryptParam object | the encrypt method used in secure boost, default: EncryptParam() | EncryptParam() |
| encrypted_mode_calculator_param | | the calculation mode used in secureboost, default: EncryptedModeCalculatorParam() | EncryptedModeCalculatorParam() |

Source code in python/federatedml/param/boosting_param.py
Attributes¶
encrypt_param = copy.deepcopy(encrypt_param), encrypted_mode_calculator_param = copy.deepcopy(encrypted_mode_calculator_param), early_stopping_rounds = early_stopping_rounds, use_first_metric_only = use_first_metric_only (instance attributes)
Functions¶
check()¶
Source code in python/federatedml/param/boosting_param.py
HeteroSecureBoostParam(tree_param=DecisionTreeParam(), task_type=consts.CLASSIFICATION, objective_param=ObjectiveParam(), learning_rate=0.3, num_trees=5, subsample_feature_rate=1.0, n_iter_no_change=True, tol=0.0001, encrypt_param=EncryptParam(), bin_num=32, encrypted_mode_calculator_param=EncryptedModeCalculatorParam(), predict_param=PredictParam(), cv_param=CrossValidationParam(), validation_freqs=None, early_stopping_rounds=None, use_missing=False, zero_as_missing=False, complete_secure=0, metrics=None, use_first_metric_only=False, random_seed=100, binning_error=consts.DEFAULT_RELATIVE_ERROR, sparse_optimization=False, run_goss=False, top_rate=0.2, other_rate=0.1, cipher_compress_error=None, cipher_compress=True, new_ver=True, boosting_strategy=consts.STD_TREE, work_mode=None, tree_num_per_party=1, guest_depth=2, host_depth=3, callback_param=CallbackParam(), multi_mode=consts.SINGLE_OUTPUT, EINI_inference=False, EINI_random_mask=False, EINI_complexity_check=False)¶
Bases: HeteroBoostingParam
Define boosting tree parameters used in federated ml.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| task_type | | task type | 'classification' |
| tree_param | DecisionTreeParam object | tree param | DecisionTreeParam() |
| objective_param | ObjectiveParam object | objective param | ObjectiveParam() |
| learning_rate | float, int or long | the learning rate of secure boost. default: 0.3 | 0.3 |
| num_trees | int or float | the max number of trees to build. default: 5 | 5 |
| subsample_feature_rate | float | a float number in [0, 1], default: 1.0 | 1.0 |
| random_seed | | seed that controls all random functions | 100 |
| n_iter_no_change | bool | when True and the residual error is less than tol, the tree building process will stop. default: True | True |
| encrypt_param | EncryptParam object | the encrypt method used in secure boost, default: EncryptParam(); this parameter is only for hetero-secureboost | EncryptParam() |
| bin_num | | bin number used in quantile binning. default: 32 | 32 |
| encrypted_mode_calculator_param | | the calculation mode used in secureboost, default: EncryptedModeCalculatorParam(); only for hetero-secureboost | EncryptedModeCalculatorParam() |
| use_missing | | whether to use missing values in the training process. default: False | False |
| zero_as_missing | | whether to regard 0 as a missing value; used only when use_missing=True, default: False | False |
| validation_freqs | | whether to do validation during the training process. If None, no validation is performed; if a positive integer, data is validated every validation_freqs epochs; if a container object in Python, data is validated when the epoch number belongs to the container, e.g. validation_freqs = [10, 15] validates at epochs 10 and 15. The default value is None; 1 is suggested. You can set it to a number larger than 1 to speed up training by skipping validation rounds; in that case a number by which "num_trees" is divisible is recommended, otherwise you will miss the validation scores of the last training iteration. | None |
| early_stopping_rounds | | training stops if one metric on one validation data set doesn't improve in the last early_stopping_rounds rounds; requires validation_freqs to be set, and early stopping is checked at every validation epoch | None |
| metrics | | specify which metrics to use when performing evaluation during the training process. If empty, default metrics are used: for regression tasks, ['root_mean_squared_error', 'mean_absolute_error']; for binary-classification tasks, ['auc', 'ks']; for multi-classification tasks, ['accuracy', 'precision', 'recall'] | None |
| use_first_metric_only | | use only the first metric for early stopping | False |
| complete_secure | | when complete secure is used, the first 'complete_secure' trees are built using only guest features | 0 |
| sparse_optimization | | this parameter is abandoned in FATE-1.7.1 | False |
| run_goss | | activate Gradient-based One-Side Sampling (GOSS), which selects large-gradient and small-gradient samples using top_rate and other_rate | False |
| top_rate | | the retain ratio of large-gradient samples, used when run_goss is True | 0.2 |
| other_rate | | the sampling ratio of small-gradient samples, used when run_goss is True | 0.1 |
| cipher_compress_error | | | None |
| cipher_compress | | whether to use cipher compressing to reduce computation and communication cost | True |
| boosting_strategy | | std: the standard SBT setting; mix: alternately use guest and host features to build trees, e.g. the first 'tree_num_per_party' trees use guest features, the second 'tree_num_per_party' trees use host features, and so on; layered: only supports 2 parties, and when running layered mode the first 'host_depth' layers use host features while the next 'guest_depth' layers use only guest features | consts.STD_TREE |
| work_mode | | this parameter has the same function as boosting_strategy, but is deprecated | None |
| tree_num_per_party | | the number of trees each party builds in turn; valid when boosting_strategy is mix | 1 |
| guest_depth | | the number of tree layers built with guest features when boosting_strategy is layered | 2 |
| host_depth | | the number of tree layers built with host features when boosting_strategy is layered | 3 |
| multi_mode | | single_output: the standard GBDT multi-classification strategy; multi_output: every leaf gives a multi-dimensional prediction. Using multi_output can save time by learning a model with fewer trees. | consts.SINGLE_OUTPUT |
| EINI_inference | | default is False; this option changes the inference algorithm used in predict tasks to a secure prediction method that hides the decision path in order to enhance security in the inference step. This method is inspired by the EINI inference algorithm. | False |
| EINI_random_mask | | default is False; multiply the predict result by a random float number to confuse the original predict result. This operation further enhances the security of the naive EINI algorithm. | False |
| EINI_complexity_check | | default is False; check the complexity of tree models when running EINI algorithms. Complex models can easily hide their decision paths, while simple tree models cannot; therefore, if a tree model is too simple, it is not allowed to run the EINI predict algorithm. | False |

Source code in python/federatedml/param/boosting_param.py
Attributes¶
tree_param = copy.deepcopy(tree_param), zero_as_missing = zero_as_missing, use_missing = use_missing, complete_secure = complete_secure, sparse_optimization = sparse_optimization, run_goss = run_goss, top_rate = top_rate, other_rate = other_rate, cipher_compress_error = cipher_compress_error, cipher_compress = cipher_compress, new_ver = new_ver, EINI_inference = EINI_inference, EINI_random_mask = EINI_random_mask, EINI_complexity_check = EINI_complexity_check, boosting_strategy = boosting_strategy, work_mode = work_mode, tree_num_per_party = tree_num_per_party, guest_depth = guest_depth, host_depth = host_depth, callback_param = copy.deepcopy(callback_param), multi_mode = multi_mode (instance attributes)
Functions¶
check()¶
Source code in python/federatedml/param/boosting_param.py
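A minimal construction sketch, assuming FATE's federatedml package is installed; the values mirror the JSON configs in the Examples section below.

```python
from federatedml.param.boosting_param import (
    DecisionTreeParam, HeteroSecureBoostParam, ObjectiveParam)

param = HeteroSecureBoostParam(
    task_type="classification",
    objective_param=ObjectiveParam(objective="cross_entropy"),
    tree_param=DecisionTreeParam(max_depth=3),
    num_trees=3,
    complete_secure=1,   # build the first tree with guest features only
)
param.check()            # validates all settings before training
```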
HomoSecureBoostParam(tree_param=DecisionTreeParam(), task_type=consts.CLASSIFICATION, objective_param=ObjectiveParam(), learning_rate=0.3, num_trees=5, subsample_feature_rate=1, n_iter_no_change=True, tol=0.0001, bin_num=32, predict_param=PredictParam(), cv_param=CrossValidationParam(), validation_freqs=None, use_missing=False, zero_as_missing=False, random_seed=100, binning_error=consts.DEFAULT_RELATIVE_ERROR, backend=consts.DISTRIBUTED_BACKEND, callback_param=CallbackParam(), multi_mode=consts.SINGLE_OUTPUT)¶
Bases: BoostingParam
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| backend | | decides which backend to use when computing histograms for homo-sbt | consts.DISTRIBUTED_BACKEND |

Source code in python/federatedml/param/boosting_param.py
Attributes¶
use_missing = use_missing, zero_as_missing = zero_as_missing, tree_param = copy.deepcopy(tree_param), backend = backend, callback_param = copy.deepcopy(callback_param), multi_mode = multi_mode (instance attributes)
Functions¶
check()¶
Source code in python/federatedml/param/boosting_param.py
Hetero Complete Secureboost¶
Hetero SecureBoost now adds a new option: complete_secure. Once enabled, the boosting model will build the first decision tree using only guest features. This can avoid label leakage, according to SecureBoost: A Lossless Federated Learning Framework.
Examples¶
Example
## Hetero SecureBoost Configuration Usage Guide.
#### Example Tasks.
1. Binary-Class:
example-data: (1) guest: breast_hetero_guest.csv (2) host: breast_hetero_host.csv
dsl: test_secureboost_train_dsl.json
runtime_config: test_secureboost_train_binary_conf.json
2. Multi-Class:
example-data: (1) guest: vehicle_scale_hetero_guest.csv
(2) host: vehicle_scale_hetero_host.csv
dsl: test_secureboost_train_dsl.json
runtime_config: test_secureboost_train_multi_conf.json
3. Regression:
example-data: (1) guest: student_hetero_guest.csv
(2) host: student_hetero_host.csv
dsl: test_secureboost_train_dsl.json
runtime_config: test_secureboost_train_regression_conf.json
4. Multi-Host Regression
example-data: (1) guest: motor_hetero_guest.csv
(2) host1: motor_hetero_host_1.csv;
(3) host2: motor_hetero_host_2.csv
dsl: test_secureboost_train_dsl.json
runtime_config: test_secureboost_train_regression_multi_host_conf.json
5. Binary-Class With Missing Value
example-data: (1) guest: ionosphere_scale_hetero_guest.csv
(2) host: ionosphere_scale_hetero_host.csv
dsl: test_secureboost_train_dsl.json
runtime_config: test_secureboost_train_binary_with_missing_value_conf.json
This example also contains another two features since FATE-1.1.
(1) evaluate data during training process, check the "validation_freqs" field in runtime_config
6. Early stopping example
example-data: (1) guest: student_hetero_guest.csv
(2) host: student_hetero_host.csv
dsl: test_secureboost_train_dsl.json
runtime_config: test_secureboost_train_with_early_stopping_conf.json
#### Cross Validation Class
1. Binary-Class:
example-data: (1) guest: breast_hetero_guest.csv
(2) host: breast_hetero_host.csv
dsl: test_secureboost_cross_validation_dsl.json
runtime_config: test_secureboost_cross_validation_binary_conf.json
2. Multi-Class:
example-data: (1) guest: vehicle_scale_hetero_guest.csv
(2) host: vehicle_scale_hetero_host.csv
dsl: test_secureboost_cross_validation_dsl.json
runtime_config: test_secureboost_cross_validation_multi_conf.json
3. Regression:
example-data: (1) guest: student_hetero_guest.csv
(2) host: student_hetero_host.csv
dsl: test_secureboost_cross_validation_dsl.json
runtime_config: test_secureboost_cross_validation_regression_conf.json
Users can use the following command to run a task.
flow job submit -c ${runtime_config} -d ${dsl}
Moreover, after successfully running the training task, you can use the trained model to predict.
test_secureboost_train_complete_secure_conf.json
{
"dsl_version": 2,
"initiator": {
"role": "guest",
"party_id": 9999
},
"role": {
"host": [
9998
],
"guest": [
9999
]
},
"component_parameters": {
"common": {
"hetero_secure_boost_0": {
"task_type": "classification",
"objective_param": {
"objective": "cross_entropy"
},
"num_trees": 3,
"validation_freqs": 1,
"encrypt_param": {
"method": "Paillier"
},
"tree_param": {
"max_depth": 3
},
"complete_secure": true
},
"evaluation_0": {
"eval_type": "binary"
}
},
"role": {
"guest": {
"0": {
"data_transform_0": {
"with_label": true,
"output_format": "dense"
},
"data_transform_1": {
"with_label": true,
"output_format": "dense"
},
"reader_0": {
"table": {
"name": "breast_hetero_guest",
"namespace": "experiment"
}
},
"reader_1": {
"table": {
"name": "breast_hetero_guest",
"namespace": "experiment"
}
}
}
},
"host": {
"0": {
"data_transform_0": {
"with_label": false
},
"data_transform_1": {
"with_label": false
},
"reader_0": {
"table": {
"name": "breast_hetero_host",
"namespace": "experiment"
}
},
"reader_1": {
"table": {
"name": "breast_hetero_host",
"namespace": "experiment"
}
}
}
}
}
}
}
test_secureboost_cross_validation_dsl.json
{
"components": {
"reader_0": {
"module": "Reader",
"output": {
"data": [
"data"
]
}
},
"data_transform_0": {
"module": "DataTransform",
"input": {
"data": {
"data": [
"reader_0.data"
]
}
},
"output": {
"data": [
"data"
],
"model": [
"model"
]
}
},
"intersection_0": {
"module": "Intersection",
"input": {
"data": {
"data": [
"data_transform_0.data"
]
}
},
"output": {
"data": [
"data"
]
}
},
"hetero_secure_boost_0": {
"module": "HeteroSecureBoost",
"input": {
"data": {
"train_data": [
"intersection_0.data"
]
}
},
"output": {
"data": [
"data"
],
"model": [
"model"
]
}
}
}
}
test_secureboost_train_regression_conf.json
{
"dsl_version": 2,
"initiator": {
"role": "guest",
"party_id": 9999
},
"role": {
"host": [
9998
],
"guest": [
9999
]
},
"component_parameters": {
"common": {
"hetero_secure_boost_0": {
"task_type": "regression",
"objective_param": {
"objective": "lse"
},
"num_trees": 3,
"validation_freqs": 1,
"encrypt_param": {
"method": "Paillier"
},
"tree_param": {
"max_depth": 3
}
},
"evaluation_0": {
"eval_type": "regression"
}
},
"role": {
"host": {
"0": {
"data_transform_0": {
"with_label": false
},
"reader_1": {
"table": {
"name": "student_hetero_host",
"namespace": "experiment"
}
},
"data_transform_1": {
"with_label": false
},
"reader_0": {
"table": {
"name": "student_hetero_host",
"namespace": "experiment"
}
}
}
},
"guest": {
"0": {
"data_transform_0": {
"with_label": true,
"label_type": "float",
"output_format": "dense"
},
"reader_1": {
"table": {
"name": "student_hetero_guest",
"namespace": "experiment"
}
},
"data_transform_1": {
"with_label": true,
"label_type": "float",
"output_format": "dense"
},
"reader_0": {
"table": {
"name": "student_hetero_guest",
"namespace": "experiment"
}
}
}
}
}
}
}
test_EINI_predict_conf.json
{
"dsl_version": 2,
"initiator": {
"role": "guest",
"party_id": 9999
},
"role": {
"host": [
10000
],
"guest": [
9999
]
},
"job_parameters": {
"common": {
"model_id": "guest-10000#host-9999#model",
"model_version": "20200928174750711017114",
"job_type": "predict"
}
},
"component_parameters": {
"common": {"hetero_secure_boost_0":{"EINI_inference": true, "EINI_random_mask": true}},
"role": {
"guest": {
"0": {
"reader_0": {
"table": {
"name": "breast_hetero_guest",
"namespace": "experiment"
}
}
}
},
"host": {
"0": {
"reader_0": {
"table": {
"name": "breast_hetero_host",
"namespace": "experiment"
}
}
}
}
}
}
}
test_secureboost_train_multi_conf.json
{
"dsl_version": 2,
"initiator": {
"role": "guest",
"party_id": 9999
},
"role": {
"host": [
9998
],
"guest": [
9999
]
},
"component_parameters": {
"common": {
"hetero_secure_boost_0": {
"task_type": "classification",
"objective_param": {
"objective": "cross_entropy"
},
"num_trees": 3,
"validation_freqs": 1,
"encrypt_param": {
"method": "Paillier"
},
"tree_param": {
"max_depth": 3
}
},
"evaluation_0": {
"eval_type": "multi"
}
},
"role": {
"host": {
"0": {
"reader_1": {
"table": {
"name": "vehicle_scale_hetero_host",
"namespace": "experiment"
}
},
"data_transform_0": {
"with_label": false
},
"data_transform_1": {
"with_label": false
},
"reader_0": {
"table": {
"name": "vehicle_scale_hetero_host",
"namespace": "experiment"
}
}
}
},
"guest": {
"0": {
"reader_1": {
"table": {
"name": "vehicle_scale_hetero_guest",
"namespace": "experiment"
}
},
"data_transform_0": {
"with_label": true,
"output_format": "dense"
},
"data_transform_1": {
"with_label": true,
"output_format": "dense"
},
"reader_0": {
"table": {
"name": "vehicle_scale_hetero_guest",
"namespace": "experiment"
}
}
}
}
}
}
}
test_secureboost_train_dsl.json
{
"components": {
"reader_0": {
"module": "Reader",
"output": {
"data": [
"data"
]
}
},
"reader_1": {
"module": "Reader",
"output": {
"data": [
"data"
]
}
},
"data_transform_0": {
"module": "DataTransform",
"input": {
"data": {
"data": [
"reader_0.data"
]
}
},
"output": {
"data": [
"data"
],
"model": [
"model"
]
}
},
"data_transform_1": {
"module": "DataTransform",
"input": {
"data": {
"data": [
"reader_1.data"
]
},
"model": [
"data_transform_0.model"
]
},
"output": {
"data": [
"data"
],
"model": [
"model"
]
}
},
"intersection_0": {
"module": "Intersection",
"input": {
"data": {
"data": [
"data_transform_0.data"
]
}
},
"output": {
"data": [
"data"
]
}
},
"intersection_1": {
"module": "Intersection",
"input": {
"data": {
"data": [
"data_transform_1.data"
]
}
},
"output": {
"data": [
"data"
]
}
},
"hetero_secure_boost_0": {
"module": "HeteroSecureBoost",
"input": {
"data": {
"train_data": [
"intersection_0.data"
],
"validate_data": [
"intersection_1.data"
]
}
},
"output": {
"data": [
"data"
],
"model": [
"model"
]
}
},
"evaluation_0": {
"module": "Evaluation",
"input": {
"data": {
"data": [
"hetero_secure_boost_0.data"
]
}
},
"output": {
"data": [
"data"
]
}
}
}
}
test_secureboost_warm_start_conf.json
{
"dsl_version": 2,
"initiator": {
"role": "guest",
"party_id": 9999
},
"role": {
"arbiter": [
9999
],
"host": [
10000
],
"guest": [
9999
]
},
"component_parameters": {
"role": {
"host": {
"0": {
"data_transform_0": {
"with_label": false
},
"reader_0": {
"table": {
"name": "breast_hetero_host",
"namespace": "experiment"
}
}
}
},
"guest": {
"0": {
"data_transform_0": {
"with_label": true
},
"reader_0": {
"table": {
"name": "breast_hetero_guest",
"namespace": "experiment"
}
}
}
}
},
"common": {
"data_transform_0": {
"output_format": "dense"
},
"hetero_secure_boost_0": {
"task_type": "classification",
"objective_param": {
"objective": "cross_entropy"
},
"num_trees": 3,
"encrypt_param": {
"method": "Paillier"
},
"tree_param": {
"max_depth": 3
}
},
"hetero_secure_boost_1": {
"task_type": "classification",
"objective_param": {
"objective": "cross_entropy"
},
"num_trees": 3,
"encrypt_param": {
"method": "Paillier"
},
"tree_param": {
"max_depth": 3
},
"callback_param": {
"callbacks": [
"PerformanceEvaluate"
],
"validation_freqs": 1
}
},
"evaluation_0": {
"eval_type": "binary"
}
}
}
}
test_secureboost_train_binary_with_missing_value_conf.json
{
"dsl_version": 2,
"initiator": {
"role": "guest",
"party_id": 9999
},
"role": {
"host": [
9998
],
"guest": [
9999
]
},
"component_parameters": {
"common": {
"hetero_secure_boost_0": {
"task_type": "classification",
"objective_param": {
"objective": "cross_entropy"
},
"num_trees": 3,
"validation_freqs": 1,
"encrypt_param": {
"method": "Paillier"
},
"tree_param": {
"max_depth": 3
},
"use_missing": true
},
"evaluation_0": {
"eval_type": "binary"
}
},
"role": {
"guest": {
"0": {
"reader_1": {
"table": {
"name": "ionosphere_scale_hetero_guest",
"namespace": "experiment"
}
},
"data_transform_0": {
"with_label": true,
"label_name": "label",
"label_type": "int",
"output_format": "dense"
},
"data_transform_1": {
"with_label": true,
"label_name": "label",
"label_type": "int",
"output_format": "dense"
},
"reader_0": {
"table": {
"name": "ionosphere_scale_hetero_guest",
"namespace": "experiment"
}
}
}
},
"host": {
"0": {
"reader_1": {
"table": {
"name": "ionosphere_scale_hetero_host",
"namespace": "experiment"
}
},
"data_transform_0": {
"with_label": false
},
"data_transform_1": {
"with_label": false
},
"reader_0": {
"table": {
"name": "ionosphere_scale_hetero_host",
"namespace": "experiment"
}
}
}
}
}
}
}
test_secureboost_cross_validation_regression_conf.json
{
"dsl_version": 2,
"initiator": {
"role": "guest",
"party_id": 9999
},
"role": {
"host": [
9998
],
"guest": [
9999
]
},
"component_parameters": {
"common": {
"hetero_secure_boost_0": {
"task_type": "regression",
"objective_param": {
"objective": "lse"
},
"num_trees": 3,
"cv_param": {
"need_cv": true,
"n_splits": 5,
"shuffle": false,
"random_seed": 103
},
"validation_freqs": 1,
"encrypt_param": {
"method": "Paillier"
},
"tree_param": {
"max_depth": 3
}
}
},
"role": {
"host": {
"0": {
"reader_0": {
"table": {
"name": "student_hetero_host",
"namespace": "experiment"
}
},
"data_transform_0": {
"with_label": false
}
}
},
"guest": {
"0": {
"reader_0": {
"table": {
"name": "student_hetero_guest",
"namespace": "experiment"
}
},
"data_transform_0": {
"with_label": true,
"label_type": "float",
"output_format": "dense"
}
}
}
}
}
}
test_secureboost_cross_validation_binary_conf.json
{
"dsl_version": 2,
"initiator": {
"role": "guest",
"party_id": 9999
},
"role": {
"host": [
9998
],
"guest": [
9999
]
},
"component_parameters": {
"common": {
"hetero_secure_boost_0": {
"task_type": "classification",
"objective_param": {
"objective": "cross_entropy"
},
"num_trees": 3,
"cv_param": {
"need_cv": true,
"n_splits": 5,
"shuffle": false,
"random_seed": 103
},
"validation_freqs": 1,
"encrypt_param": {
"method": "Paillier"
},
"tree_param": {
"max_depth": 3
}
}
},
"role": {
"guest": {
"0": {
"reader_0": {
"table": {
"name": "breast_hetero_guest",
"namespace": "experiment"
}
},
"data_transform_0": {
"with_label": true,
"output_format": "dense"
}
}
},
"host": {
"0": {
"reader_0": {
"table": {
"name": "breast_hetero_host",
"namespace": "experiment"
}
},
"data_transform_0": {
"with_label": false
}
}
}
}
}
}
test_secureboost_train_layered_mode_binary_conf.json
{
"dsl_version": 2,
"initiator": {
"role": "guest",
"party_id": 9999
},
"role": {
"host": [
9998
],
"guest": [
9999
]
},
"component_parameters": {
"common": {
"hetero_secure_boost_0": {
"task_type": "classification",
"objective_param": {
"objective": "cross_entropy"
},
"num_trees": 3,
"validation_freqs": 1,
"encrypt_param": {
"method": "Paillier"
},
"tree_param": {
"max_depth": 3
},
"boosting_strategy": "layered"
},
"evaluation_0": {
"eval_type": "binary"
}
},
"role": {
"guest": {
"0": {
"reader_1": {
"table": {
"name": "breast_hetero_guest",
"namespace": "experiment"
}
},
"reader_0": {
"table": {
"name": "breast_hetero_guest",
"namespace": "experiment"
}
},
"data_transform_0": {
"with_label": true,
"output_format": "dense"
},
"data_transform_1": {
"with_label": true,
"output_format": "dense"
}
}
},
"host": {
"0": {
"reader_1": {
"table": {
"name": "breast_hetero_host",
"namespace": "experiment"
}
},
"reader_0": {
"table": {
"name": "breast_hetero_host",
"namespace": "experiment"
}
},
"data_transform_0": {
"with_label": false
},
"data_transform_1": {
"with_label": false
}
}
}
}
}
}
test_secureboost_warm_start_dsl.json
{
"components": {
"reader_0": {
"module": "Reader",
"output": {
"data": [
"data"
]
}
},
"data_transform_0": {
"module": "DataTransform",
"input": {
"data": {
"data": [
"reader_0.data"
]
}
},
"output": {
"data": [
"data"
],
"model": [
"model"
]
}
},
"intersection_0": {
"module": "Intersection",
"input": {
"data": {
"data": [
"data_transform_0.data"
]
}
},
"output": {
"data": [
"data"
],
"cache": [
"cache"
]
}
},
"hetero_secure_boost_0": {
"module": "HeteroSecureBoost",
"input": {
"data": {
"train_data": [
"intersection_0.data"
]
}
},
"output": {
"data": [
"data"
],
"model": [
"model"
]
}
},
"hetero_secure_boost_1": {
"module": "HeteroSecureBoost",
"input": {
"data": {
"train_data": [
"intersection_0.data"
]
},
"model": [
"hetero_secure_boost_0.model"
]
},
"output": {
"data": [
"data"
],
"model": [
"model"
]
}
},
"evaluation_0": {
"module": "Evaluation",
"input": {
"data": {
"data": [
"hetero_secure_boost_1.data"
]
}
},
"output": {
"data": [
"data"
]
}
}
}
}
test_secureboost_train_with_early_stopping_conf.json
{
"dsl_version": 2,
"initiator": {
"role": "guest",
"party_id": 9999
},
"role": {
"host": [
9998
],
"guest": [
9999
]
},
"component_parameters": {
"common": {
"hetero_secure_boost_0": {
"task_type": "regression",
"objective_param": {
"objective": "lse"
},
"num_trees": 3,
"validation_freqs": 1,
"encrypt_param": {
"method": "Paillier"
},
"early_stopping_rounds": 1,
"tree_param": {
"max_depth": 3
}
},
"evaluation_0": {
"eval_type": "regression"
}
},
"role": {
"guest": {
"0": {
"data_transform_1": {
"with_label": true,
"output_format": "dense"
},
"data_transform_0": {
"with_label": true,
"output_format": "dense"
},
"reader_0": {
"table": {
"name": "student_hetero_guest",
"namespace": "experiment"
}
},
"reader_1": {
"table": {
"name": "student_hetero_guest",
"namespace": "experiment"
}
}
}
},
"host": {
"0": {
"data_transform_1": {
"with_label": false
},
"data_transform_0": {
"with_label": false
},
"reader_0": {
"table": {
"name": "student_hetero_host",
"namespace": "experiment"
}
},
"reader_1": {
"table": {
"name": "student_hetero_host",
"namespace": "experiment"
}
}
}
}
}
}
}
test_predict_conf.json
{
"dsl_version": 2,
"initiator": {
"role": "guest",
"party_id": 9999
},
"role": {
"host": [
10000
],
"guest": [
9999
]
},
"job_parameters": {
"common": {
"model_id": "guest-10000#host-9999#model",
"model_version": "20200928174750711017114",
"job_type": "predict"
}
},
"component_parameters": {
"role": {
"guest": {
"0": {
"reader_0": {
"table": {
"name": "breast_hetero_guest",
"namespace": "experiment"
}
}
}
},
"host": {
"0": {
"reader_0": {
"table": {
"name": "breast_hetero_host",
"namespace": "experiment"
}
}
}
}
}
}
}
test_secureboost_train_binary_cipher_compress_conf.json
{
"dsl_version": 2,
"initiator": {
"role": "guest",
"party_id": 9999
},
"role": {
"host": [
9998
],
"guest": [
9999
]
},
"component_parameters": {
"common": {
"hetero_secure_boost_0": {
"task_type": "classification",
"objective_param": {
"objective": "cross_entropy"
},
"num_trees": 3,
"validation_freqs": 1,
"encrypt_param": {
"method": "paillier"
},
"tree_param": {
"max_depth": 3
},
"cipher_compress_error": 8
},
"evaluation_0": {
"eval_type": "binary"
}
},
"role": {
"guest": {
"0": {
"reader_1": {
"table": {
"name": "breast_hetero_guest",
"namespace": "experiment"
}
},
"reader_0": {
"table": {
"name": "breast_hetero_guest",
"namespace": "experiment"
}
},
"data_transform_0": {
"with_label": true,
"output_format": "dense"
},
"data_transform_1": {
"with_label": true,
"output_format": "dense"
}
}
},
"host": {
"0": {
"reader_1": {
"table": {
"name": "breast_hetero_host",
"namespace": "experiment"
}
},
"reader_0": {
"table": {
"name": "breast_hetero_host",
"namespace": "experiment"
}
},
"data_transform_0": {
"with_label": false
},
"data_transform_1": {
"with_label": false
}
}
}
}
}
}
test_secureboost_train_binary_without_cipher_compress_conf.json
{
"dsl_version": 2,
"initiator": {
"role": "guest",
"party_id": 9999
},
"role": {
"host": [
9998
],
"guest": [
9999
]
},
"component_parameters": {
"common": {
"hetero_secure_boost_0": {
"task_type": "classification",
"objective_param": {
"objective": "cross_entropy"
},
"num_trees": 3,
"validation_freqs": 1,
"encrypt_param": {
"method": "Paillier"
},
"tree_param": {
"max_depth": 3
},
"cipher_compress": false
},
"evaluation_0": {
"eval_type": "binary"
}
},
"role": {
"guest": {
"0": {
"reader_1": {
"table": {
"name": "breast_hetero_guest",
"namespace": "experiment"
}
},
"reader_0": {
"table": {
"name": "breast_hetero_guest",
"namespace": "experiment"
}
},
"data_transform_0": {
"with_label": true,
"output_format": "dense"
},
"data_transform_1": {
"with_label": true,
"output_format": "dense"
}
}
},
"host": {
"0": {
"reader_1": {
"table": {
"name": "breast_hetero_host",
"namespace": "experiment"
}
},
"reader_0": {
"table": {
"name": "breast_hetero_host",
"namespace": "experiment"
}
},
"data_transform_0": {
"with_label": false
},
"data_transform_1": {
"with_label": false
}
}
}
}
}
}
hetero_secureboost_testsuite.json
{
"data": [
{
"file": "examples/data/breast_hetero_guest.csv",
"head": 1,
"partition": 4,
"table_name": "breast_hetero_guest",
"namespace": "experiment",
"role": "guest_0"
},
{
"file": "examples/data/breast_hetero_host.csv",
"head": 1,
"partition": 4,
"table_name": "breast_hetero_host",
"namespace": "experiment",
"role": "host_0"
},
{
"file": "examples/data/vehicle_scale_hetero_guest.csv",
"head": 1,
"partition": 4,
"table_name": "vehicle_scale_hetero_guest",
"namespace": "experiment",
"role": "guest_0"
},
{
"file": "examples/data/vehicle_scale_hetero_host.csv",
"head": 1,
"partition": 4,
"table_name": "vehicle_scale_hetero_host",
"namespace": "experiment",
"role": "host_0"
},
{
"file": "examples/data/student_hetero_guest.csv",
"head": 1,
"partition": 4,
"table_name": "student_hetero_guest",
"namespace": "experiment",
"role": "guest_0"
},
{
"file": "examples/data/student_hetero_host.csv",
"head": 1,
"partition": 4,
"table_name": "student_hetero_host",
"namespace": "experiment",
"role": "host_0"
},
{
"file": "examples/data/ionosphere_scale_hetero_guest.csv",
"head": 1,
"partition": 4,
"table_name": "ionosphere_scale_hetero_guest",
"namespace": "experiment",
"role": "guest_0"
},
{
"file": "examples/data/ionosphere_scale_hetero_host.csv",
"head": 1,
"partition": 4,
"table_name": "ionosphere_scale_hetero_host",
"namespace": "experiment",
"role": "host_0"
},
{
"file": "examples/data/motor_hetero_guest.csv",
"head": 1,
"partition": 4,
"table_name": "motor_hetero_guest",
"namespace": "experiment",
"role": "guest_0"
},
{
"file": "examples/data/motor_hetero_host_1.csv",
"head": 1,
"partition": 4,
"table_name": "motor_hetero_host_1",
"namespace": "experiment",
"role": "host_0"
},
{
"file": "examples/data/motor_hetero_host_2.csv",
"head": 1,
"partition": 4,
"table_name": "motor_hetero_host_2",
"namespace": "experiment",
"role": "host_1"
}
],
"tasks": {
"train_binary": {
"conf": "./test_secureboost_train_binary_conf.json",
"dsl": "./test_secureboost_train_dsl.json"
},
"warm_start_train": {
"conf": "./test_secureboost_warm_start_conf.json",
"dsl": "./test_secureboost_warm_start_dsl.json"
},
"train_complete_secure": {
"conf": "./test_secureboost_train_complete_secure_conf.json",
"dsl": "./test_secureboost_train_dsl.json"
},
"train_binary_without_cipher_compress": {
"conf": "./test_secureboost_train_binary_without_cipher_compress_conf.json",
"dsl": "./test_secureboost_train_dsl.json"
},
"train_binary_predict": {
"conf": "./test_predict_conf.json",
"dsl": "./test_predict_dsl.json",
"deps": "train_binary"
},
"train_binary_EINI_predict": {
"conf": "./test_EINI_predict_conf.json",
"dsl": "./test_predict_dsl.json",
"deps": "train_binary"
},
"train_multi": {
"conf": "./test_secureboost_train_multi_conf.json",
"dsl": "./test_secureboost_train_dsl.json"
},
"train_regression": {
"conf": "./test_secureboost_train_regression_conf.json",
"dsl": "./test_secureboost_train_dsl.json"
},
"cv_binary": {
"conf": "./test_secureboost_cross_validation_binary_conf.json",
"dsl": "./test_secureboost_cross_validation_dsl.json"
},
"cv_multi": {
"conf": "./test_secureboost_cross_validation_multi_conf.json",
"dsl": "./test_secureboost_cross_validation_dsl.json"
},
"cv_regression": {
"conf": "./test_secureboost_cross_validation_regression_conf.json",
"dsl": "./test_secureboost_cross_validation_dsl.json"
},
"train_binary_with_missing_value": {
"conf": "./test_secureboost_train_binary_with_missing_value_conf.json",
"dsl": "./test_secureboost_train_dsl.json"
},
"train_multi_host": {
"conf": "./test_secureboost_train_regression_multi_host_conf.json",
"dsl": "./test_secureboost_train_dsl.json"
},
"train_with_early_stopping": {
"conf": "./test_secureboost_train_with_early_stopping_conf.json",
"dsl": "./test_secureboost_train_dsl.json"
}
}
}
test_predict_dsl.json
{
"components": {
"reader_0": {
"module": "Reader",
"output": {
"data": [
"data"
]
}
},
"intersection_0": {
"input": {
"data": {
"data": [
"data_transform_0.data"
]
}
},
"module": "Intersection",
"output": {
"data": [
"data"
]
}
},
"data_transform_0": {
"input": {
"data": {
"data": [
"reader_0.data"
]
},
"model": [
"pipeline.data_transform_0.model"
]
},
"module": "DataTransform",
"output": {
"data": [
"data"
]
}
},
"hetero_secure_boost_0": {
"input": {
"data": {
"test_data": [
"intersection_0.data"
]
},
"model": [
"pipeline.hetero_secure_boost_0.model"
]
},
"module": "HeteroSecureBoost",
"output": {
"data": [
"data"
]
}
}
}
}
test_secureboost_cross_validation_multi_conf.json
{
"dsl_version": 2,
"initiator": {
"role": "guest",
"party_id": 9999
},
"role": {
"host": [
9998
],
"guest": [
9999
]
},
"component_parameters": {
"common": {
"hetero_secure_boost_0": {
"task_type": "classification",
"objective_param": {
"objective": "cross_entropy"
},
"num_trees": 3,
"cv_param": {
"need_cv": true,
"n_splits": 5,
"shuffle": false,
"random_seed": 103
},
"validation_freqs": 1,
"encrypt_param": {
"method": "Paillier"
},
"tree_param": {
"max_depth": 3
}
}
},
"role": {
"host": {
"0": {
"reader_0": {
"table": {
"name": "vehicle_scale_hetero_host",
"namespace": "experiment"
}
},
"data_transform_0": {
"with_label": false
}
}
},
"guest": {
"0": {
"reader_0": {
"table": {
"name": "vehicle_scale_hetero_guest",
"namespace": "experiment"
}
},
"data_transform_0": {
"with_label": true,
"output_format": "dense"
}
}
}
}
}
}
test_secureboost_train_mix_mode_binary_conf.json
{
"dsl_version": 2,
"initiator": {
"role": "guest",
"party_id": 9999
},
"role": {
"host": [
9998
],
"guest": [
9999
]
},
"component_parameters": {
"common": {
"hetero_secure_boost_0": {
"task_type": "classification",
"objective_param": {
"objective": "cross_entropy"
},
"num_trees": 3,
"validation_freqs": 1,
"encrypt_param": {
"method": "Paillier"
},
"tree_param": {
"max_depth": 3
},
"boosting_strategy": "mix"
},
"evaluation_0": {
"eval_type": "binary"
}
},
"role": {
"guest": {
"0": {
"reader_1": {
"table": {
"name": "breast_hetero_guest",
"namespace": "experiment"
}
},
"reader_0": {
"table": {
"name": "breast_hetero_guest",
"namespace": "experiment"
}
},
"data_transform_0": {
"with_label": true,
"output_format": "dense"
},
"data_transform_1": {
"with_label": true,
"output_format": "dense"
}
}
},
"host": {
"0": {
"reader_1": {
"table": {
"name": "breast_hetero_host",
"namespace": "experiment"
}
},
"reader_0": {
"table": {
"name": "breast_hetero_host",
"namespace": "experiment"
}
},
"data_transform_0": {
"with_label": false
},
"data_transform_1": {
"with_label": false
}
}
}
}
}
}
test_secureboost_train_layered_mode_multi_conf.json
{
"dsl_version": 2,
"initiator": {
"role": "guest",
"party_id": 9999
},
"role": {
"host": [
9998
],
"guest": [
9999
]
},
"component_parameters": {
"common": {
"hetero_secure_boost_0": {
"task_type": "classification",
"objective_param": {
"objective": "cross_entropy"
},
"num_trees": 3,
"validation_freqs": 1,
"encrypt_param": {
"method": "Paillier"
},
"tree_param": {
"max_depth": 3
},
"boosting_strategy": "layered"
},
"evaluation_0": {
"eval_type": "binary"
}
},
"role": {
"guest": {
"0": {
"reader_1": {
"table": {
"name": "vehicle_scale_hetero_guest",
"namespace": "experiment"
}
},
"reader_0": {
"table": {
"name": "vehicle_scale_hetero_guest",
"namespace": "experiment"
}
},
"data_transform_0": {
"with_label": true,
"output_format": "dense"
},
"data_transform_1": {
"with_label": true,
"output_format": "dense"
}
}
},
"host": {
"0": {
"reader_1": {
"table": {
"name": "vehicle_scale_hetero_host",
"namespace": "experiment"
}
},
"reader_0": {
"table": {
"name": "vehicle_scale_hetero_host",
"namespace": "experiment"
}
},
"data_transform_0": {
"with_label": false
},
"data_transform_1": {
"with_label": false
}
}
}
}
}
}
test_secureboost_train_mix_mode_multi_conf.json
{
"dsl_version": 2,
"initiator": {
"role": "guest",
"party_id": 9999
},
"role": {
"host": [
9998
],
"guest": [
9999
]
},
"component_parameters": {
"common": {
"hetero_secure_boost_0": {
"task_type": "classification",
"objective_param": {
"objective": "cross_entropy"
},
"num_trees": 3,
"validation_freqs": 1,
"encrypt_param": {
"method": "Paillier"
},
"tree_param": {
"max_depth": 3
},
"boosting_strategy": "mix"
},
"evaluation_0": {
"eval_type": "multi"
}
},
"role": {
"guest": {
"0": {
"reader_1": {
"table": {
"name": "vehicle_scale_hetero_guest",
"namespace": "experiment"
}
},
"reader_0": {
"table": {
"name": "vehicle_scale_hetero_guest",
"namespace": "experiment"
}
},
"data_transform_0": {
"with_label": true,
"output_format": "dense"
},
"data_transform_1": {
"with_label": true,
"output_format": "dense"
}
}
},
"host": {
"0": {
"reader_1": {
"table": {
"name": "vehicle_scale_hetero_host",
"namespace": "experiment"
}
},
"reader_0": {
"table": {
"name": "vehicle_scale_hetero_host",
"namespace": "experiment"
}
},
"data_transform_0": {
"with_label": false
},
"data_transform_1": {
"with_label": false
}
}
}
}
}
}
test_secureboost_train_binary_conf.json
{
"dsl_version": 2,
"initiator": {
"role": "guest",
"party_id": 9999
},
"role": {
"host": [
9998
],
"guest": [
9999
]
},
"component_parameters": {
"common": {
"hetero_secure_boost_0": {
"task_type": "classification",
"objective_param": {
"objective": "cross_entropy"
},
"num_trees": 3,
"validation_freqs": 1,
"encrypt_param": {
"method": "Paillier"
},
"tree_param": {
"max_depth": 3
}
},
"evaluation_0": {
"eval_type": "binary"
}
},
"role": {
"guest": {
"0": {
"reader_1": {
"table": {
"name": "breast_hetero_guest",
"namespace": "experiment"
}
},
"reader_0": {
"table": {
"name": "breast_hetero_guest",
"namespace": "experiment"
}
},
"data_transform_0": {
"with_label": true,
"output_format": "dense"
},
"data_transform_1": {
"with_label": true,
"output_format": "dense"
}
}
},
"host": {
"0": {
"reader_1": {
"table": {
"name": "breast_hetero_host",
"namespace": "experiment"
}
},
"reader_0": {
"table": {
"name": "breast_hetero_host",
"namespace": "experiment"
}
},
"data_transform_0": {
"with_label": false
},
"data_transform_1": {
"with_label": false
}
}
}
}
}
}
test_secureboost_train_regression_multi_host_conf.json
{
"dsl_version": 2,
"initiator": {
"role": "guest",
"party_id": 9999
},
"role": {
"host": [
9998,
10000
],
"guest": [
9999
]
},
"component_parameters": {
"common": {
"hetero_secure_boost_0": {
"task_type": "regression",
"objective_param": {
"objective": "lse"
},
"num_trees": 3,
"validation_freqs": 1,
"encrypt_param": {
"method": "Paillier"
},
"tree_param": {
"max_depth": 3
}
},
"evaluation_0": {
"eval_type": "regression"
}
},
"role": {
"guest": {
"0": {
"data_transform_0": {
"with_label": true,
"label_name": "motor_speed",
"label_type": "float",
"output_format": "dense"
},
"data_transform_1": {
"with_label": true,
"label_name": "motor_speed",
"label_type": "float",
"output_format": "dense"
},
"reader_1": {
"table": {
"name": "motor_hetero_guest",
"namespace": "experiment"
}
},
"reader_0": {
"table": {
"name": "motor_hetero_guest",
"namespace": "experiment"
}
}
}
},
"host": {
"1": {
"data_transform_0": {
"with_label": false
},
"data_transform_1": {
"with_label": false
},
"reader_1": {
"table": {
"name": "motor_hetero_host_2",
"namespace": "experiment"
}
},
"reader_0": {
"table": {
"name": "motor_hetero_host_2",
"namespace": "experiment"
}
}
},
"0": {
"data_transform_0": {
"with_label": false
},
"data_transform_1": {
"with_label": false
},
"reader_1": {
"table": {
"name": "motor_hetero_host_1",
"namespace": "experiment"
}
},
"reader_0": {
"table": {
"name": "motor_hetero_host_1",
"namespace": "experiment"
}
}
}
}
}
}
}
test_secureboost_train_multi_with_mo_conf.json
{
"dsl_version": 2,
"initiator": {
"role": "guest",
"party_id": 9999
},
"role": {
"host": [
9998
],
"guest": [
9999
]
},
"component_parameters": {
"common": {
"hetero_secure_boost_0": {
"task_type": "classification",
"objective_param": {
"objective": "cross_entropy"
},
"num_trees": 5,
"validation_freqs": 1,
"encrypt_param": {
"method": "Paillier"
},
"tree_param": {
"max_depth": 3,
"criterion_params": [0.5, 0]
},
"multi_mode": "multi_output"
},
"evaluation_0": {
"eval_type": "multi"
}
},
"role": {
"host": {
"0": {
"reader_1": {
"table": {
"name": "vehicle_scale_hetero_host",
"namespace": "experiment"
}
},
"data_transform_0": {
"with_label": false
},
"data_transform_1": {
"with_label": false
},
"reader_0": {
"table": {
"name": "vehicle_scale_hetero_host",
"namespace": "experiment"
}
}
}
},
"guest": {
"0": {
"reader_1": {
"table": {
"name": "vehicle_scale_hetero_guest",
"namespace": "experiment"
}
},
"data_transform_0": {
"with_label": true,
"output_format": "dense"
},
"data_transform_1": {
"with_label": true,
"output_format": "dense"
},
"reader_0": {
"table": {
"name": "vehicle_scale_hetero_guest",
"namespace": "experiment"
}
}
}
}
}
}
}
Hetero Fast SecureBoost¶
We support Hetero Fast SecureBoost (fast SBT for short) since FATE-1.5. Fast SBT uses guest features and host features alternately to build trees, in order to save encryption and communication costs. Fast SBT supports a MIX mode and a LAYERED mode, which use different strategies when building decision trees.
MIX mode¶
In mix mode, we offer a new parameter 'tree_num_per_party'. Every participating party builds 'tree_num_per_party' trees in turn using its local features, and this procedure repeats until the maximum tree number is reached. Figure 5 illustrates the mix mode.
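For concreteness, here is a minimal sketch of a mix-mode component configuration, written as a Python dict that mirrors the DSL v2 JSON listings above. The placement of 'tree_num_per_party' beside 'boosting_strategy' is an assumption inferred from the parameter description here, not a verbatim example file:

# Minimal sketch (Python dict mirroring the DSL v2 JSON configs above).
# ASSUMPTION: "tree_num_per_party" sits beside "boosting_strategy";
# this is inferred from the text, not copied from an example file.
hetero_secure_boost_0 = {
    "task_type": "classification",
    "objective_param": {"objective": "cross_entropy"},
    "num_trees": 6,
    "encrypt_param": {"method": "Paillier"},
    "tree_param": {"max_depth": 3},
    "boosting_strategy": "mix",
    "tree_num_per_party": 2,  # each party builds 2 trees per turn
}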
While building a guest tree, the guest side simply computes g/h and finds the best split points locally; the host parties skip this tree and wait. While building a host tree, the owning host party receives the encrypted g/h and finds the best split points with the assistance of the guest. The structures and split points of host trees are preserved on the host side, while leaf weights are preserved on the guest side. In this way, encryption and communication costs are roughly halved (assuming two parties).
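The alternation can be summarized by which party owns each boosting round. Below is a hedged sketch (illustrative names, not FATE internals) of the tree-to-party assignment implied by 'tree_num_per_party'; the guest-first ordering is an assumption:

def party_for_tree(tree_idx, tree_num_per_party, parties):
    """Return the party that builds tree `tree_idx` (0-based): parties
    take turns building blocks of `tree_num_per_party` trees each."""
    block = tree_idx // tree_num_per_party   # which block of trees
    return parties[block % len(parties)]     # round-robin over parties

# Example: 2 trees per party, one guest and one host (order assumed).
parties = ["guest", "host_0"]
expected = ["guest", "guest", "host_0", "host_0", "guest", "guest"]
assert [party_for_tree(t, 2, parties) for t in range(6)] == expected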
When conducting inference, every party traverses its trees locally. Each host sends its final leaf ids to the guest, and the guest retrieves the corresponding leaf weights using the received leaf ids. Prediction therefore needs only one communication round in mix mode.
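A self-contained sketch of this one-round prediction protocol, with illustrative data structures (not FATE's internal API): each tree is a nested dict whose internal nodes hold a split and whose leaves hold an id, and host leaf weights are stored on the guest side, keyed by tree and leaf id:

def locate_leaf(node, sample):
    """Traverse one tree locally and return the final leaf id."""
    while "leaf_id" not in node:
        feat, thr = node["split"]
        node = node["left"] if sample[feat] <= thr else node["right"]
    return node["leaf_id"]

def host_predict(host_trees, sample):
    # Host side: traverse host trees locally; only leaf ids leave the host.
    return [locate_leaf(t, sample) for t in host_trees]

def guest_aggregate(guest_trees, guest_leaf_weights,
                    host_leaf_ids, host_leaf_weights, sample):
    # Guest side: score its own trees, then map the received host leaf
    # ids to the host leaf weights kept at training time (one round).
    score = sum(guest_leaf_weights[i][locate_leaf(t, sample)]
                for i, t in enumerate(guest_trees))
    score += sum(host_leaf_weights[i][lid]
                 for i, lid in enumerate(host_leaf_ids))
    return score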
LAYERED mode¶
Layered mode supports only one guest party and one host party. The host is responsible for building the first 'host_depth' layers of each tree, with the help of the guest, and the guest is responsible for the next 'guest_depth' layers. All trees are built in this 'layered' manner.
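A hedged layered-mode fragment in the same dict form as above; 'host_depth' and 'guest_depth' are the parameter names quoted in this section, while their placement (and the convention that they sum to the tree depth) is an assumption for illustration:

# Minimal sketch of a layered-mode component configuration.
# ASSUMPTION: "host_depth"/"guest_depth" placement and the depth
# bookkeeping below are illustrative, not a verbatim example file.
hetero_secure_boost_0 = {
    "task_type": "classification",
    "objective_param": {"objective": "cross_entropy"},
    "num_trees": 3,
    "encrypt_param": {"method": "Paillier"},
    "tree_param": {"max_depth": 5},
    "boosting_strategy": "layered",
    "host_depth": 2,   # host builds the first 2 layers, guest-assisted
    "guest_depth": 3,  # guest builds the remaining 3 layers
}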
The benefits of layered mode are similar to those of mix mode: part of the communication and encryption costs are saved during training. Prediction again needs only one communication round, because the host can conduct inference on its host layers locally.
According to experiments on our standard data sets, the mix and layered modes of fast SBT deliver performance equivalent to (and sometimes better than) standard Hetero SecureBoost, even when the training data is unevenly distributed across parties or contains noisy features (binary, multi-class, and regression tasks were tested). At the same time, the training time of fast SBT is reduced by 30%~50% on average.
Optimization in learning¶
- Fast SBT uses guest features and host features alternately, by trees (mix mode) or by layers (layered mode), to reduce encryption and communication costs.
- Prediction needs only one communication round.
Applications¶
Fast SBT supports the following applications.
- binary classification; the objective function is sigmoid cross-entropy
- multi-class classification; the objective function is softmax cross-entropy
- regression; supported objective functions include least-squared-error loss, least-absolute-error loss, huber loss, tweedie loss, fair loss, and log-cosh loss
Other features¶
- In mix mode, every host party keeps only its own tree models; the guest keeps the guest trees and the host leaf weights.
- In mix mode, the host side supports feature importance calculation (the 'split' importance type is supported; the 'gain' type is not).
- In layered mode, the model exporting setting is the same as in normal SBT.
- The training time of fast SBT is reduced by 30%~50% on average.
Hetero SecureBoost with Multi-Output(SBT-MO)¶
In the traditional GBDT setting, the strategy for multi-classification learning is to separate the gradients/hessians of each class and learn one tree per class independently, so that each tree predicts a single output variable. In the vertical federated scenario, this traditional single-output-tree strategy has a limitation: all computation and communication costs are multiplied by the number of label classes, which makes training extremely time-consuming on datasets with many classes.
To address this efficiency problem, FATE-1.8 introduces a novel multi-output-tree-based vertical boosting technique for multi-classification tasks. The leaves of a multi-output tree give a multi-dimensional output, one value per class. Instead of learning a tree for every class separately, we now only need to fit one tree at every boosting epoch. We also combine our cipher optimization techniques with SBT-MO. According to our preliminary experiments, at the same level of accuracy, SBT-MO reduces tree-building time (ignoring data preprocessing and evaluation time) by over 50%. For more details on SBT-MO, please refer to SecureBoost+: A High Performance Gradient Boosting Tree Framework for Large Scale Vertical Federated Learning.
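To make the idea concrete, here is a minimal numeric sketch of the multi-output leaf computation, assuming XGBoost-style second-order leaf weights and a diagonal softmax hessian; this illustrates the concept only and is not FATE's implementation:

import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def grad_hess(scores, labels, num_classes):
    """Per-sample K-dimensional gradient/hessian of softmax cross-entropy."""
    p = softmax(scores)                  # predicted probabilities, (n, K)
    y = np.eye(num_classes)[labels]      # one-hot labels, (n, K)
    return p - y, p * (1.0 - p)          # gradient and diagonal hessian

def multi_output_leaf_weight(g, h, reg_lambda=1.0):
    """One K-dimensional weight vector per leaf: w_k = -G_k / (H_k + lambda).
    A single tree per boosting epoch emits all K outputs at once, instead
    of K single-output trees whose costs scale with the number of classes."""
    return -g.sum(axis=0) / (h.sum(axis=0) + reg_lambda)

# Example: 4 samples with 3 classes, all falling into the same leaf.
g, h = grad_hess(np.zeros((4, 3)), np.array([0, 1, 2, 0]), num_classes=3)
print(multi_output_leaf_weight(g, h))  # 3-dimensional leaf output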
Optimization in learning¶
SBT-MO solves multi-classification tasks with multi-output decision trees, saving time when training on a dataset with many label classes.
Applications¶
- multi-class classification; the objective function is softmax cross-entropy