Using FATE Built-In Datasets
In FATE-1.10, three built-in dataset classes are provided: table, nlp_tokenizer, and image. They cover the basic needs of tabular data, text data, and image data.
class TableDataset(Dataset):
"""
A Table Dataset, loads data from a given csv path, or transforms a FATE DTable
Parameters
----------
label_col : str, name of the label column in the csv; if None, will automatically take 'y', 'label', or 'target' as the label
feature_dtype : dtype of features, supports int, long, float, double
label_dtype : dtype of label, supports int, long, float, double
label_shape : list or tuple, the shape of the label
flatten_label : bool, flatten the extracted label column or not, default is False
"""
def __init__(
self,
label_col=None,
feature_dtype='float',
label_dtype='float',
label_shape=None,
flatten_label=False):
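As a minimal sketch, this is the csv layout TableDataset expects: an id column, feature columns, and a label column (here named 'y', one of the names TableDataset falls back to when label_col is None). The FATE calls are shown as comments because they require a FATE deployment; the module path federatedml.nn.dataset.table is assumed by analogy with the image dataset import used later in this tutorial.

```python
import csv
import os
import tempfile

# Build a toy csv in the layout TableDataset expects:
# an id column, feature columns, and a label column named 'y'.
path = os.path.join(tempfile.mkdtemp(), 'toy.csv')
with open(path, 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(['id', 'x0', 'x1', 'y'])
    writer.writerow([0, 0.1, 0.2, 1])
    writer.writerow([1, 0.3, 0.4, 0])

# In a FATE environment this csv can then be loaded directly
# (module path assumed):
# from federatedml.nn.dataset.table import TableDataset
# dataset = TableDataset(label_col='y', feature_dtype='float', label_dtype='long')
# dataset.load(path)
```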
TokenizerDataset
TokenizerDataset is provided in nlp_tokenizer.py. It is built on the transformers library's BertTokenizer: it reads strings from a csv, automatically tokenizes the text, and converts it into word ids.
class TokenizerDataset(Dataset):
"""
A Dataset for some basic NLP Tasks, this dataset will automatically transform raw text into word indices
using BertTokenizer from transformers library,
see https://huggingface.co/docs/transformers/model_doc/bert?highlight=berttokenizer for details of BertTokenizer
Parameters
----------
truncation : bool, truncate word sequences to 'text_max_length'
text_max_length : int, max length of word sequences
tokenizer_name_or_path : str, name of a bert tokenizer (see the transformers documentation for details) or path to a local
transformers tokenizer folder
return_label : bool, return the label or not; this option is for the host dataset when running Hetero-NN
"""
def __init__(self, truncation=True, text_max_length=128,
tokenizer_name_or_path="bert-base-uncased",
return_label=True):
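For illustration, here is a toy csv for TokenizerDataset; the column layout (id, text, label) is an assumption for this sketch, and the FATE calls are commented because they require a FATE deployment with transformers installed (the module path federatedml.nn.dataset.nlp_tokenizer is assumed from the file name above).

```python
import csv
import os
import tempfile

# A toy csv with one text sample per row (layout assumed: id, text, label).
path = os.path.join(tempfile.mkdtemp(), 'toy_text.csv')
with open(path, 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(['id', 'text', 'label'])
    writer.writerow([0, 'hello federated world', 1])

# In a FATE environment (module path assumed):
# from federatedml.nn.dataset.nlp_tokenizer import TokenizerDataset
# dataset = TokenizerDataset(text_max_length=64,
#                            tokenizer_name_or_path='bert-base-uncased')
# dataset.load(path)
# Indexing the dataset then yields word-id tensors truncated to text_max_length.
```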
class ImageDataset(Dataset):
"""
A basic Image Dataset built on pytorch ImageFolder, supports simple image transform
Given a folder path, ImageDataset will load images from this folder, images in this
folder need to be organized in a Torch-ImageFolder format, see
https://pytorch.org/vision/main/generated/torchvision.datasets.ImageFolder.html for details.
Image name will be automatically taken as the sample id.
Parameters
----------
center_crop : bool, use the center-crop transformer
center_crop_shape : tuple or list
generate_id_from_file_name : bool, whether to take the image name as the sample id
file_suffix : str, default is '.jpg'; if generate_id_from_file_name is True, this suffix is removed from the file name,
and the result is the sample id
return_label : bool, return the label or not; this option is for the host dataset, when running Hetero-NN
float64 : bool, returned image tensors will be transformed to double precision
label_dtype : str, 'long', 'float', or 'double', the dtype of the returned label
"""
def __init__(self, center_crop=False, center_crop_shape=None,
generate_id_from_file_name=True, file_suffix='.jpg',
return_label=True, float64=False, label_dtype='long'):
Using a Built-In Dataset
Using a built-in FATE dataset is exactly the same as using a user-customized dataset. As an example, here we use the image dataset and a new model with conv layers to solve the MNIST handwritten-digit recognition task again.
If you don't have the MNIST dataset, you can refer to the previous tutorial and download it:
from federatedml.nn.dataset.image import ImageDataset
! ls ../examples/data/mnist/
dataset = ImageDataset()
dataset.load('../../../../examples/data/mnist/')
len(dataset)
1309
dataset[400]
(tensor([[[0.0000, 0.0275, 0.0000,  ..., 0.0000, 0.0000, 0.0000],
          [0.0000, 0.0000, 0.0000,  ..., 0.0000, 0.0000, 0.0000],
          [0.0118, 0.0000, 0.0000,  ..., 0.0000, 0.0000, 0.0000],
          ...,
          [0.0000, 0.0000, 0.0000,  ..., 0.0000, 0.0000, 0.0000],
          [0.0000, 0.0000, 0.0000,  ..., 0.0000, 0.0000, 0.0000],
          [0.0000, 0.0000, 0.0000,  ..., 0.0000, 0.0000, 0.0000]],

         [[0.0000, 0.0275, 0.0000,  ..., 0.0000, 0.0000, 0.0000],
          [0.0000, 0.0000, 0.0000,  ..., 0.0000, 0.0000, 0.0000],
          [0.0118, 0.0000, 0.0000,  ..., 0.0000, 0.0000, 0.0000],
          ...,
          [0.0000, 0.0000, 0.0000,  ..., 0.0000, 0.0000, 0.0000],
          [0.0000, 0.0000, 0.0000,  ..., 0.0000, 0.0000, 0.0000],
          [0.0000, 0.0000, 0.0000,  ..., 0.0000, 0.0000, 0.0000]],

         [[0.0000, 0.0275, 0.0000,  ..., 0.0000, 0.0000, 0.0000],
          [0.0000, 0.0000, 0.0000,  ..., 0.0000, 0.0000, 0.0000],
          [0.0118, 0.0000, 0.0000,  ..., 0.0000, 0.0000, 0.0000],
          ...,
          [0.0000, 0.0000, 0.0000,  ..., 0.0000, 0.0000, 0.0000],
          [0.0000, 0.0000, 0.0000,  ..., 0.0000, 0.0000, 0.0000],
          [0.0000, 0.0000, 0.0000,  ..., 0.0000, 0.0000, 0.0000]]]),
 tensor(3))
from torch import nn
import torch as t
from torch.nn import functional as F
from pipeline.component.nn.backend.torch.operation import Flatten
# a new model with conv layer, it can work with our ImageDataset
model = t.nn.Sequential(
nn.Conv2d(in_channels=3, out_channels=12, kernel_size=5),
nn.MaxPool2d(kernel_size=3),
nn.Conv2d(in_channels=12, out_channels=12, kernel_size=3),
nn.AvgPool2d(kernel_size=3),
Flatten(start_dim=1),
nn.Linear(48, 32),
nn.ReLU(),
nn.Linear(32, 10),
nn.Softmax(dim=1)
)
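To see why the first fully connected layer is nn.Linear(48, 32), we can trace the spatial dimensions through the network. This is a minimal sanity check assuming 3 x 28 x 28 inputs (the standard MNIST image size; the input size is an assumption here, not stated by the model itself):

```python
# Trace spatial sizes through the conv/pool stack above, assuming 28 x 28 input.
def conv_out(size, kernel, stride=1):
    # Conv2d output size with no padding, dilation 1
    return (size - kernel) // stride + 1

def pool_out(size, kernel):
    # MaxPool2d/AvgPool2d output size; stride defaults to the kernel size
    return (size - kernel) // kernel + 1

s = 28
s = conv_out(s, 5)   # Conv2d(kernel_size=5)  -> 24
s = pool_out(s, 3)   # MaxPool2d(kernel_size=3) -> 8
s = conv_out(s, 3)   # Conv2d(kernel_size=3)  -> 6
s = pool_out(s, 3)   # AvgPool2d(kernel_size=3) -> 2
print(12 * s * s)    # 12 channels * 2 * 2 = 48, matching nn.Linear(48, 32)
```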
Local Test
In the case of local testing, all federation processes are skipped and the model does not perform federated averaging.
from federatedml.nn.homo.trainer.fedavg_trainer import FedAVGTrainer
trainer = FedAVGTrainer(epochs=5, batch_size=256, shuffle=True, data_loader_worker=8, pin_memory=False)  # trainer parameters
trainer.set_model(model)
trainer.local_mode()
optimizer = t.optim.Adam(model.parameters(), lr=0.01)
loss = t.nn.CrossEntropyLoss()
trainer.train(train_set=dataset, optimizer=optimizer, loss=loss)
epoch is 0
100%|██████████| 6/6 [00:00<00:00, 7.49it/s]
epoch loss is 2.6923995983336515
epoch is 1
100%|██████████| 6/6 [00:00<00:00, 7.78it/s]
epoch loss is 2.636708398735915
epoch is 2
100%|██████████| 6/6 [00:00<00:00, 7.75it/s]
epoch loss is 2.4953262410699364
epoch is 3
100%|██████████| 6/6 [00:00<00:00, 7.79it/s]
epoch loss is 2.3616474521715647
epoch is 4
100%|██████████| 6/6 [00:00<00:00, 8.26it/s]
epoch loss is 2.2441106669496635
It works; now we are ready to run a federated task.
A Homo-NN Task with a Built-In Dataset
import torch as t
from torch import nn
from pipeline import fate_torch_hook
from pipeline.component import HomoNN
from pipeline.backend.pipeline import PipeLine
from pipeline.component import Reader, Evaluation, DataTransform
from pipeline.interface import Data, Model
t = fate_torch_hook(t)
import os
# bind data path to name & namespace
fate_project_path = os.path.abspath('../../../../')
host = 10000
guest = 9999
arbiter = 10000
pipeline = PipeLine().set_initiator(role='guest', party_id=guest).set_roles(guest=guest, host=host,
arbiter=arbiter)
data_0 = {"name": "mnist_guest", "namespace": "experiment"}
data_1 = {"name": "mnist_host", "namespace": "experiment"}
data_path_0 = fate_project_path + '/examples/data/mnist'
data_path_1 = fate_project_path + '/examples/data/mnist'
pipeline.bind_table(name=data_0['name'], namespace=data_0['namespace'], path=data_path_0)
pipeline.bind_table(name=data_1['name'], namespace=data_1['namespace'], path=data_path_1)
{'namespace': 'experiment', 'table_name': 'mnist_host'}
# define the reader
reader_0 = Reader(name="reader_0")
reader_0.get_party_instance(role='guest', party_id=guest).component_param(table=data_0)
reader_0.get_party_instance(role='host', party_id=host).component_param(table=data_1)
from pipeline.component.homo_nn import DatasetParam, TrainerParam
# a new model with conv layer, it can work with our ImageDataset
model = t.nn.Sequential(
nn.Conv2d(in_channels=3, out_channels=12, kernel_size=5),
nn.MaxPool2d(kernel_size=3),
nn.Conv2d(in_channels=12, out_channels=12, kernel_size=3),
nn.AvgPool2d(kernel_size=3),
Flatten(start_dim=1),
nn.Linear(48, 32),
nn.ReLU(),
nn.Linear(32, 10),
nn.Softmax(dim=1)
)
nn_component = HomoNN(name='nn_0',
model=model, # model
loss=t.nn.CrossEntropyLoss(), # loss
optimizer=t.optim.Adam(model.parameters(), lr=0.01), # optimizer
dataset=DatasetParam(dataset_name='image', label_dtype='long'), # dataset
trainer=TrainerParam(trainer_name='fedavg_trainer', epochs=2, batch_size=1024, validation_freqs=1),
torch_seed=100 # random seed
)
pipeline.add_component(reader_0)
pipeline.add_component(nn_component, data=Data(train_data=reader_0.output.data))
pipeline.add_component(Evaluation(name='eval_0', eval_type='multi'), data=Data(data=nn_component.output.data))
<pipeline.backend.pipeline.PipeLine at 0x7fb116bffdf0>
pipeline.compile()
pipeline.fit()
2022-12-19 17:31:15.709 | INFO | pipeline.utils.invoker.job_submitter:monitor_job_status:83 - Job id is 202212191731149354320
2022-12-19 17:31:15.732 | INFO | pipeline.utils.invoker.job_submitter:monitor_job_status:98 - Job is still waiting, time elapse: 0:00:00
2022-12-19 17:31:16.815 | INFO | pipeline.utils.invoker.job_submitter:monitor_job_status:127 - Running component reader_0, time elapse: 0:00:01
...
2022-12-19 17:31:27.256 | INFO | pipeline.utils.invoker.job_submitter:monitor_job_status:127 - Running component nn_0, time elapse: 0:00:11
...
2022-12-19 17:32:00.874 | INFO | pipeline.utils.invoker.job_submitter:monitor_job_status:127 - Running component eval_0, time elapse: 0:00:45
...
2022-12-19 17:32:18.078 | INFO | pipeline.utils.invoker.job_submitter:monitor_job_status:89 - Job is success!!! Job id is 202212191731149354320
2022-12-19 17:32:18.081 | INFO | pipeline.utils.invoker.job_submitter:monitor_job_status:90 - Total time: 0:01:02