Components¶

Each Component wraps a FederatedML Module. Modules implement machine learning algorithms on federated learning, while Components provide convenient interface for easy model building.

Interface¶

Input¶

Input encapsulates all upstream input to a component in a job workflow. There are two classes of input: data and model. Not all components have both classes of input, and a component may accept only some types of the class. For information on each components’ input, check the list below.

Here is an example to access a component’s input:

from pipeline.component import DataIO
dataio_0 = DataIO(name="dataio_0")
input_all = dataio_0.input
input_data = dataio_0.input.data
input_model = dataio_0.input.model

Output¶

Same as Input, Output encapsulates output data and model of component in a FATE job. Not all components have both classes of outputs. For information on each components’ output, check the list below.

Here is an example to access a component’s output:

from pipeline.component import DataIO
dataio_0 = DataIO(name="dataio_0")
output_all = dataio_0.output
output_data = dataio_0.output.data
output_model = dataio_0.output.model

Meanwhile, to download components’ output table or model, please use task info interface.

Data¶

In most cases, data sets are wrapped into data when being passed between modules. For instance, in the mini demo, data output of dataio_0 is set as data input to intersection_0.

pipeline.add_component(intersection_0, data=Data(data=dataio_0.output.data))

For data sets used in different modeling stages (e.g., train & validate) of the same component, additional keywords train_data and validate_data are used to distinguish data sets. Also from mini demo, result from intersection_0 and intersection_1 are set as train and validate data of hetero logistic regression, respectively.

pipeline.add_component(hetero_lr_0, data=Data(train_data=intersection_0.output.data,
                                              validate_data=intersection_1.output.data))

Another case of using keywords train_data and validate_data is to use data output from DataSplit module, which always has three data outputs: train_data, validate_data, and test_data.

pipeline.add_component(hetero_lr_0,
                       data=Data(train_data=hetero_data_split_0.output.data.train_data))

A special data type is predict_input. predict_input is only used for specifying data input when running prediction task.

Here is an example of running prediction with an upstream model within the same pipeline:

pipeline.add_component(hetero_lr_1,
                       data=Data(predict_input=hetero_data_split_0.output.data.test_data),
                       model=Model(model=hetero_lr_0))

To run prediction with with new data, data source needs to be updated in prediction job. Below is an example from mini demo, where data input of original dataio_0 component is set to be the data output from reader_2.

reader_2 = Reader(name="reader_2")
reader_2.get_party_instance(role="guest", party_id=guest).component_param(table=guest_eval_data)
reader_2.get_party_instance(role="host", party_id=host).component_param(table=host_eval_data)
# add data reader onto predict pipeline
predict_pipeline.add_component(reader_2)
predict_pipeline.add_component(pipeline,
                               data=Data(predict_input={pipeline.dataio_0.input.data: reader_2.output.data}))

Below lists all five types of data and whether Input and Output include them.

Data¶
Data Name	Input	Output	Use Case
data	Yes	Yes	single data input or output
train_data	Yes	Yes	model training; output of `DataSplit` component
validate_data	Yes	Yes	model training with validate data; output of `DataSplit` component
test_data	No	Yes	output of `DataSplit` component
predict_input	Yes	No	model prediction

All input and output data of components need to be wrapped into Data objects when being passed between components. For information on valid data types of each component, check the list below. Here is a an example of chaining components with different types of data input and output:

from pipeline.backend.pipeline import Pipeline
from pipeline.component import DataIO, Intersection, HeteroDataSplit, HeteroLR
# initialize a pipeline
pipeline = PipeLine().set_initiator(role='guest', party_id=guest).set_roles(guest=guest)
# define all components
dataio_0 = DataIO(name="dataio_0")
data_split = HeteroDataSplit(name="data_split_0")
hetero_lr_0 = HeteroLR(name="hetero_lr_0", max_iter=20)
# chain together all components
pipeline.add_component(reader_0)
pipeline.add_component(dataio_0, data=Data(data=reader_0.output.data))
pipeline.add_component(intersection_0, data=Data(data=dataio_0.output.data))
pipeline.add_component(hetero_data_split_0, data=Data(data=intersection_0.output.data))
pipeline.add_component(hetero_lr_0, data=Data(train_data=hetero_data_split_0.output.data.train_data,
                                              validate_data=hetero_data_split_0.output.data.test_data))

Model¶

There are two types of Model: model andisometric_model. When the current component is of the same class as the previous component, if receiving model, the current component will replicate all model parameters from the previous component. When a model from previous component is used as input but the current component is of different class from the previous component, isometric_model is used.

Check below for a case from mini demo, where model from dataio_0 is passed to dataio_1.

pipeline.add_component(dataio_1,
                       data=Data(data=reader_1.output.data),
                       model=Model(dataio_0.output.model))

Here is a case of using isometric model. HeteroFeatureSelection uses isometric_model from HeteroFeatureBinning to select the most important features.

pipeline.add_component(hetero_feature_selection_0,
                       data=Data(data=intersection_0.output.data),
                       isometric_model=Model(hetero_feature_binning_0.output.model))

Warning

Please note that when using stepwise or cross validation method, components do not have model output. For information on valid model types of each components, check the list below.

Parameter¶

Parameters of underlying module can be set for all job participants or per individual participant.

Parameters for all participants may be specified when defining a component:

from pipeline.component import DataIO
dataio_0 = DataIO(name="dataio_0", input_format="dense", output_format="dense",
                  outlier_replace=False)

Parameters can be set for each party individually:

# set guest dataio_0 component parameters
guest_dataio_0 = dataio_0.get_party_instance(role='guest', party_id=9999)
guest_dataio_0.component_param(with_label=True)
# set host dataio_0 component parameters
dataio_0.get_party_instance(role='host', party_id=10000).component_param(with_label=False)

Task Info¶

Output data and model information of Components can be retrieved with Pipeline task info API. Currently Pipeline support these four types of query on components:

get_output_data: returns downloaded output data; use parameter limits to limit output lines
get_output_data_table: returns output data table information(including table name and namespace)
get_model_param: returns fitted model parameters
get_summary: returns model summary

To obtain output of a component, the component needs to be first extracted from pipeline:

print(pipeline.get_component("dataio_0").get_output_data(limits=10))

Component List¶

Below lists input and output elements of each component.

Component¶
Algorithm	Component Name	Description	Acceptable Input Data	Acceptable Output Data	Acceptable Input Model	Acceptable Output Model
Reader	Reader	This component is always the first component of a pipeline task(except for upload). It loads raw data from storage.	None	data	None	None
DataIO	DataIO	This component usually follows `Reader`. It transforms user-uploaded date into Instance object(deprecate in FATe-v1.7, use DataTransform instead).	data	data	model	model
DataTransform	DataTransform	This component usually follows `Reader`. It transforms user-uploaded date into Instance object.	data	data	model	model
Intersect	Intersection	Compute intersect data set of multiple parties without leakage of difference set information. Mainly used in hetero scenario task.	data	data	model	model
Federated Sampling	FederatedSample	Federated Sampling data so that its distribution become balance in each party.This module supports standalone and federated versions.	data	data	model	model
Feature Scale	FeatureScale	Feature scaling and standardization.	data	data	model	model
Hetero Feature Binning	Hetero Feature Binning	With binning input data, calculates each column’s iv and woe and transform data according to the binned information.	data	data	model	model
Homo Feature Binning	Homo Feature Binning	Calculate quantile binning through multiple parties	data	data	None	model
OneHot Encoder	OneHotEncoder	Transfer a column into one-hot format.	data	data	model	model
Hetero Feature Selection	HeteroFeatureSelection	Provide 5 types of filters. Each filters can select columns according to user config	data	data	model; isometric model	model
Union	Union	Combine multiple data tables into one.	List[data]	data	model	model
Hetero-LR	HeteroLR	Build hetero logistic regression module through multiple parties.	train_data; validate_data; predict_input	data	model	model
Local Baseline	LocalBaseline	Wrapper that runs sklearn(scikit-learn) Logistic Regression model with local data.	train_data; validate_data; predict_input	data	None	None
Hetero-LinR	HeteroLinR	Build hetero linear regression module through multiple parties.	train_data; validate_data; predict_input	data	model	model
Hetero-Poisson	HeteroPoisson	Build hetero poisson regression module through multiple parties.	train_data; validate_data; predict_input	data	model	model
Homo-LR	HomoLR	Build homo logistic regression module through multiple parties.	train_data; validate_data; predict_input	data	model	model
Homo-NN	HomoNN	Build homo neural network module through multiple parties.	train_data; validate_data; predict_input	data	model	model
Hetero Secure Boosting	HeteroSecureBoost	Build hetero secure boosting module through multiple parties	train_data; validate_data; predict_input	data	model	model
Hetero Fast Secure Boosting	HeteroFastSecureBoost	Build hetero secure boosting model through multiple parties in layered/mix manners.	train_data; validate_data; predict_input	data	model	model
Hetero Secure Boost Feature Transformer	SBT Feature Transformer	This component can encode sample using Hetero SBT leaf indices.	data	data	model	model
Evaluation	Evaluation	Output the model evaluation metrics for user.	data; List[data]	None	model	None
Hetero Pearson	HeteroPearson	Calculate hetero correlation of features from different parties.	data	None	model	model
Hetero-NN	HeteroNN	Build hetero neural network module.	train_data; validate_data; predict_input	data	model	model
Homo Secure Boosting	HomoSecureBoost	Build homo secure boosting module through multiple parties	train_data; validate_data; predict_input	data	model	model
Homo OneHot Encoder	HomoOneHotEncoder	Build homo onehot encoder module through multiple parties.	data	data	model	model
Data Split	Data Split	Split one data table into 3 tables by given ratio or count	data	train_data & validate_data & test_data	None	None
Column Expand	Column Expand	Add arbitrary number of columns with user-provided values.	data	data	model	model
Hetero KMeans	Hetero KMeans	Build Hetero KMeans module through multiple parties	train_data; validate_data; predict_input	data	model	model
Data Statistics	Data Statistics	Compute Data Statistics	data	data	None	model
Scorecard	Scorecard	Scale predict score to credit score by given scaling parameters	data	data	None	None
Feldman Verifiable Sum	Feldman Verifiable Sum	This component will sum multiple privacy values without exposing data	data	data	None	None
Sample Weight	Sample Weight	Sample Weight assigns weight to instances according to user-specified parameters	data	data	None	None