Federated GPT-2 Tuning with Parameter Efficient methods in FATE-LLM¶
In this tutorial, we demonstrate how to efficiently train federated large language models using the FATE-LLM framework. FATE-LLM introduces the "pellm" (Parameter Efficient Large Language Model) module, which is designed specifically for federated learning with large language models. It enables parameter-efficient methods in federated learning, reducing communication overhead while maintaining model performance. This tutorial focuses on GPT-2 and on fine-tuning it with the Adapter mechanism, which reduces communication volume and improves overall efficiency.
By following this tutorial, you will learn how to leverage the FATE framework to rapidly fine-tune federated large language models, such as GPT-2, with ease and efficiency.
GPT2¶
GPT-2 is a transformer-based language model whose largest version has 1.5 billion parameters, trained on a dataset of 8 million web pages. It is trained with a causal language modeling (CLM) objective, conditioning on a left-to-right context window of 1024 tokens. In this tutorial we use GPT-2. You can download the pretrained model from here (we choose the smallest version for this tutorial), or let the program download it automatically when you first use it.
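To make the causal objective concrete, the sketch below builds the left-to-right attention pattern and the next-token training pairs for a short sequence in plain Python. It is an illustration only: `causal_mask` and `next_token_pairs` are hypothetical helpers, not part of FATE-LLM or transformers, and real implementations apply the same pattern to tensors.

```python
def causal_mask(seq_len):
    """Return a seq_len x seq_len 0/1 matrix; row i may attend to columns 0..i."""
    return [[1 if j <= i else 0 for j in range(seq_len)] for i in range(seq_len)]


def next_token_pairs(token_ids):
    """Pair each left-to-right prefix with the token to be predicted next."""
    return [(token_ids[:i], token_ids[i]) for i in range(1, len(token_ids))]


mask = causal_mask(4)
pairs = next_token_pairs([3198, 286, 262, 584])

for row in mask:
    print(row)
for prefix, target in pairs:
    print(prefix, "->", target)
```

Each position sees only itself and earlier positions, which is why the representation at the final position summarizes the whole input.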
Dataset: IMDB Sentimental¶
In this section, we introduce the process of preparing the IMDB dataset for our federated learning task. We use our tokenizer dataset (based on the HuggingFace tokenizer) to preprocess the text data.
About IMDB Sentimental Dataset:
This is a binary classification dataset. You can download our processed dataset from here and place it in the examples/data folder:
- https://webank-ai-1251170195.cos.ap-guangzhou.myqcloud.com/fate/examples/data/IMDB.csv
The original data is from:
Check Dataset¶
import pandas as pd
df = pd.read_csv('../../../../examples/data/IMDB.csv')
df
| | id | text | label |
|---|---|---|---|
| 0 | 0 | One of the other reviewers has mentioned that ... | 1 |
| 1 | 1 | A wonderful little production. <br /><br />The... | 1 |
| 2 | 2 | I thought this was a wonderful way to spend ti... | 1 |
| 3 | 3 | Basically there's a family where a little boy ... | 0 |
| 4 | 4 | Petter Mattei's "Love in the Time of Money" is... | 1 |
| ... | ... | ... | ... |
| 1996 | 1996 | THE CELL (2000) Rating: 8/10<br /><br />The Ce... | 1 |
| 1997 | 1997 | This movie, despite its list of B, C, and D li... | 0 |
| 1998 | 1998 | I loved this movie! It was all I could do not ... | 1 |
| 1999 | 1999 | This was the worst movie I have ever seen Bill... | 0 |
| 2000 | 2000 | Stranded in Space (1972) MST3K version - a ver... | 0 |
2001 rows × 3 columns
from federatedml.nn.dataset.nlp_tokenizer import TokenizerDataset
ds = TokenizerDataset(tokenizer_name_or_path="gpt2", text_max_length=128,
padding_side="left", return_input_ids=False, pad_token='<|endoftext|>') # you can load tokenizer config from local pretrained tokenizer
ds.load('../../../../examples/data/IMDB.csv')
ds[0]
({'input_ids': tensor([ 3198, 286, 262, 584, 30702, 468, 4750, 326, 706, 4964,
655, 352, 18024, 4471, 345, 1183, 307, 23373, 13, 1119,
389, 826, 11, 355, 428, 318, 3446, 644, 3022, 351,
502, 29847, 1671, 1220, 6927, 1671, 11037, 464, 717, 1517,
326, 7425, 502, 546, 18024, 373, 663, 24557, 290, 42880,
8589, 278, 8188, 286, 3685, 11, 543, 900, 287, 826,
422, 262, 1573, 10351, 13, 9870, 502, 11, 428, 318,
407, 257, 905, 329, 262, 18107, 2612, 276, 393, 44295,
13, 770, 905, 16194, 645, 25495, 351, 13957, 284, 5010,
11, 1714, 393, 3685, 13, 6363, 318, 22823, 11, 287,
262, 6833, 779, 286, 262, 1573, 29847, 1671, 1220, 6927,
1671, 11037, 1026, 318, 1444, 440, 57, 355, 326, 318,
262, 21814, 1813, 284, 262, 34374, 22246, 4765]),
'attention_mask': tensor([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1])},
array([1.], dtype=float32))
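The sample above is exactly 128 tokens long, so no padding is visible. The sketch below shows what `padding_side="left"` with `pad_token='<|endoftext|>'` would look like for a shorter text. It is a plain-Python illustration, not the TokenizerDataset implementation: `left_pad` is a hypothetical helper, and truncation from the right is an assumption based on the sample output keeping the start of the review.

```python
GPT2_EOS_ID = 50256  # id of '<|endoftext|>' in the GPT-2 vocabulary


def left_pad(token_ids, max_length, pad_id=GPT2_EOS_ID):
    """Truncate to max_length, then pad on the left; return ids and attention mask."""
    ids = token_ids[:max_length]  # assumption: extra tokens are cut from the right
    n_pad = max_length - len(ids)
    return {
        "input_ids": [pad_id] * n_pad + ids,
        "attention_mask": [0] * n_pad + [1] * len(ids),
    }


short = left_pad([3198, 286, 262], max_length=5)
full = left_pad(list(range(200)), max_length=128)

print(short)
print(len(full["input_ids"]), sum(full["attention_mask"]))
```

With left padding the final position always holds a real token, which matters for any head that reads the hidden state at index -1.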
For more details on FATE dataset settings, we recommend that you first read these tutorials: NN Dataset Customization and Some Built-In Dataset.
PELLM Model with Adapter¶
In this section, we guide you through building a parameter-efficient language model with the FATE framework. We focus on the implementation of the PELLM model and the integration of the Adapter mechanism, which enables efficient fine-tuning and reduces communication overhead in federated learning settings. Taking GPT-2 as an example, you will learn how to use FATE built-in classes to rapidly develop and deploy a parameter-efficient language model. Before starting this section, we recommend that you first read this tutorial: Model Customization.
PELLM Models¶
In this section we introduce the PELLM models, parameter-efficient language models that can be used in federated learning settings. They are designed to be compatible with the FATE framework to enable federated model tuning and training.
PELLM models are located at federatedml.nn.model_zoo.pellm (federatedml/nn/model_zoo/pellm):
! ls ../../../../fate/python/federatedml/nn/model_zoo/pellm
albert.py bert.py distillbert.py parameter_efficient_llm.py roberta.py bart.py deberta.py gpt2.py __pycache__
You can initialize your GPT2 model with the corresponding model config dict, load the pretrained model from a local model folder, or download the pretrained model from Hugging Face. Here we initialize the GPT2 model with the Houlsby Adapter; Adapters are introduced in the Adapters section below.
from federatedml.nn.model_zoo.pellm.gpt2 import GPT2
# case 1 initialize with config dict
from transformers import GPT2Config
gpt2 = GPT2(config=GPT2Config().to_dict(), adapter_type='HoulsbyConfig')
# case 2 load pretrained weights from a local folder; equivalent to using the huggingface pretrained model
path_to_pretrained_folder = ''
gpt2 = GPT2(pretrained_path=path_to_pretrained_folder, adapter_type='HoulsbyConfig')
# case 3 directly download models from huggingface
gpt2 = GPT2(pretrained_path='gpt2', adapter_type='HoulsbyConfig')
In this version we currently support these language models for federated training:
- BERT
- ALBERT
- RoBERTa
- GPT-2
- BART
- DeBERTa
- DistilBERT
You can also use the auto model (similar to AutoModel in Hugging Face) to automatically import models from your local pretrained model folder:
from federatedml.nn.model_zoo.pellm.parameter_efficient_llm import AutoPELLM
path_to_pretrained_folder = ''
gpt2 = AutoPELLM(pretrained_path=path_to_pretrained_folder, adapter_type='HoulsbyConfig')
Adapters¶
We can directly use adapters from AdapterHub. See the Adapter Methods page for details. By specifying the adapter name and the adapter config dict, we can insert adapters into our language models:
from transformers.adapters import HoulsbyConfig
path_to_pretrained_folder = ''
# use default adapter setting
gpt2 = GPT2(pretrained_path=path_to_pretrained_folder, adapter_type='HoulsbyConfig', adapter_config=HoulsbyConfig().to_dict())
# set adapter parameters
gpt2 = GPT2(pretrained_path=path_to_pretrained_folder, adapter_type='HoulsbyConfig', adapter_config=HoulsbyConfig(init_weights='mam_adapter').to_dict())
During training, all weights of the pretrained language model are frozen and only the adapter weights are trainable. FATE therefore updates only the adapters' weights during local training and aggregates only those weights during federation.
For the currently available adapters, see Adapters Overview.
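The aggregation of adapter weights can be sketched as a standard FedAvg step. The example below is a minimal plain-Python illustration under stated assumptions: weighting by local sample count is the textbook FedAvg rule and is assumed here, the parameter names and values are made up, and FATE's actual aggregator (including secure aggregation) is not shown.

```python
def fedavg(client_weights, client_sizes):
    """Sample-size-weighted average of per-client parameter dicts."""
    total = sum(client_sizes)
    first = client_weights[0]
    return {
        name: [
            sum(w[name][i] * n for w, n in zip(client_weights, client_sizes)) / total
            for i in range(len(first[name]))
        ]
        for name in first
    }


# Each client uploads only its trainable (adapter) tensors, flattened to lists here.
# The frozen backbone weights never leave the client.
guest = {"adapter.down": [1.0, 2.0], "adapter.up": [0.0]}
host = {"adapter.down": [3.0, 6.0], "adapter.up": [4.0]}

global_adapter = fedavg([guest, host], client_sizes=[100, 300])
print(global_adapter)
```

Because the frozen weights are identical on every client, averaging only the adapter tensors is enough to keep the clients' models in sync.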
Use PELLM Model in FATE with CustModel¶
In the Model Customization tutorial, we demonstrate how to employ the t.nn.CustModel class in fate_torch to parse a model's structure and submit it to a federated learning task. CustModel automatically imports the model class from the model_zoo and initializes the model with the parameters provided. Since these language models are built in, we can use them directly in CustModel and easily add a classifier head to address the classification task at hand:
import torch as t
from pipeline import fate_torch_hook
from pipeline.component.nn import save_to_fate
fate_torch_hook(t)
%%save_to_fate model classifier_head.py
import torch as t
class Classifier(t.nn.Module):
    def __init__(self, in_size=768):
        super().__init__()
        self.linr = t.nn.Linear(in_size, 1)
        self.sigmoid = t.nn.Sigmoid()
    def forward(self, x):
        # use the hidden state of the last token as the sequence representation
        x = x.last_hidden_state[:, -1]
        return self.sigmoid(self.linr(x))
# build CustModel with PELLM, and add a classifier head
from transformers import GPT2Config
model = t.nn.Sequential(
t.nn.CustModel(module_name='pellm.gpt2', class_name='GPT2', config=GPT2Config().to_dict(), adapter_type='HoulsbyConfig'),
t.nn.CustModel(module_name='classifier_head', class_name='Classifier', in_size=768)
)
Please note that during the training process, the classifier parameters will be trained and sent to the server for aggregation. In fact, all trainable parameters will participate in the federated learning process.
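As a rough illustration of what "all trainable parameters" means for this model, the sketch below separates frozen from trainable parameter groups and estimates the size of one upload. The group names and the `ParamGroup` class are hypothetical; the counts are taken from the model summary printed in the Local Test section plus the `Linear(768, 1)` head, and the 4-bytes-per-parameter figure assumes float32 with no compression or encryption overhead.

```python
from dataclasses import dataclass


@dataclass
class ParamGroup:
    name: str
    num_params: int
    trainable: bool


groups = [
    ParamGroup("gpt2_backbone", 124_439_808, trainable=False),  # frozen
    ParamGroup("houlsby_adapters", 1_789_056, trainable=True),
    ParamGroup("classifier_head", 768 + 1, trainable=True),  # weights + bias
]

# Only trainable groups take part in federation.
payload = [g for g in groups if g.trainable]
payload_params = sum(g.num_params for g in payload)

BYTES_PER_PARAM = 4  # assumption: float32
payload_mib = payload_params * BYTES_PER_PARAM / 2**20
full_mib = sum(g.num_params for g in groups) * BYTES_PER_PARAM / 2**20

print([g.name for g in payload], payload_params)
print(f"payload: {payload_mib:.1f} MiB, full model: {full_mib:.1f} MiB")
```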
Local Test¶
Before submitting a federated learning task, we demonstrate how to perform local testing to ensure that your custom dataset and model work properly. We use the local mode of the FedAVGTrainer to check that our settings run correctly.
from federatedml.nn.model_zoo.pellm.gpt2 import GPT2
from federatedml.nn.homo.trainer.fedavg_trainer import FedAVGTrainer
from transformers import GPT2Config
from federatedml.nn.dataset.nlp_tokenizer import TokenizerDataset
# load dataset
ds = TokenizerDataset(tokenizer_name_or_path="gpt2", text_max_length=128,
padding_side="left", return_input_ids=False, pad_token='<|endoftext|>') # you can load tokenizer config from local pretrained tokenizer
ds.load('../../../../examples/data/IMDB.csv')
model = t.nn.Sequential(
GPT2(config=GPT2Config().to_dict(), adapter_type='HoulsbyConfig'),
Classifier(768)
)
trainer = FedAVGTrainer(epochs=1, batch_size=8, shuffle=True, data_loader_worker=8)
trainer.local_mode()
trainer.set_model(model)
Loading model from model config dict
PELM model summary:
================================================================================
Name                     Architecture         #Param      %Param  Active   Train
--------------------------------------------------------------------------------
federation               bottleneck        1,789,056       1.438       1       1
--------------------------------------------------------------------------------
Full model                               124,439,808     100.000               0
================================================================================
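The adapter row of the summary can be reproduced with a short calculation. The sketch below is one decomposition that matches the reported count exactly; it assumes a bottleneck of hidden size divided by 16 and two adapter modules per transformer block, which are assumptions about the Houlsby default configuration rather than values stated in this tutorial.

```python
HIDDEN = 768            # GPT-2 small hidden size
N_LAYERS = 12           # GPT-2 small transformer blocks
REDUCTION_FACTOR = 16   # assumption: bottleneck = hidden / 16
ADAPTERS_PER_LAYER = 2  # assumption: one after attention, one after feed-forward

bottleneck = HIDDEN // REDUCTION_FACTOR
down_proj = HIDDEN * bottleneck + bottleneck  # weights + bias
up_proj = bottleneck * HIDDEN + HIDDEN        # weights + bias
per_adapter = down_proj + up_proj
adapter_params = per_adapter * ADAPTERS_PER_LAYER * N_LAYERS

full_model_params = 124_439_808  # "Full model" row of the summary
percent = 100 * adapter_params / full_model_params

print(adapter_params, round(percent, 3))
```

The percentage is computed relative to the frozen base model, which is how the %Param column comes out at 1.438.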
opt = t.optim.Adam(model.parameters(), lr=0.001)
loss = t.nn.BCELoss()
# local test, here we only use CPU for training
trainer.train(ds, None, opt, loss)
epoch is 0
100%|██████████| 251/251 [07:11<00:00, 1.72s/it]
epoch loss is 0.7566220122894486
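Two numbers in this output can be sanity-checked by hand. The sketch below recomputes the 251 iterations from the dataset size and batch size, and computes the binary cross-entropy of a predictor that always outputs 0.5 as a rough reference point for the reported loss. The `bce` helper is a hypothetical per-sample version of the loss written for illustration.

```python
import math

num_rows, batch_size = 2001, 8
# The last batch holds the single leftover row.
steps_per_epoch = math.ceil(num_rows / batch_size)


def bce(prob, label):
    """Binary cross-entropy for a single prediction."""
    return -(label * math.log(prob) + (1 - label) * math.log(1 - prob))


chance_loss = bce(0.5, 1)      # uninformed predictor, equals ln 2
confident_loss = bce(0.9, 1)   # confident and correct prediction

print(steps_per_epoch, chance_loss, confident_loss)
```

A loss of about 0.76 after one epoch is in the same range as the chance-level value, which is plausible for a model initialized from a config dict without pretrained weights.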
Submit Federated Task¶
Once local testing has completed successfully, we can submit a task to FATE. Please note that this tutorial runs on a standalone version; if you are using a cluster version, you need to bind the data with the corresponding name and namespace on each machine.
In this example we load pretrained weights for the GPT-2 model.
import torch as t
import os
from pipeline import fate_torch_hook
from pipeline.component import HomoNN
from pipeline.backend.pipeline import PipeLine
from pipeline.component import Reader
from pipeline.interface import Data
from transformers import GPT2Config
fate_torch_hook(t)
fate_project_path = os.path.abspath('../../../../')
guest_0 = 10000
host_1 = 9999
pipeline = PipeLine().set_initiator(role='guest', party_id=guest_0).set_roles(guest=guest_0, host=host_1,
arbiter=guest_0)
data_0 = {"name": "imdb", "namespace": "experiment"}
data_path = fate_project_path + '/examples/data/IMDB.csv'
pipeline.bind_table(name=data_0['name'], namespace=data_0['namespace'], path=data_path)
pipeline.bind_table(name=data_0['name'], namespace=data_0['namespace'], path=data_path)
reader_0 = Reader(name="reader_0")
reader_0.get_party_instance(role='guest', party_id=guest_0).component_param(table=data_0)
reader_0.get_party_instance(role='host', party_id=host_1).component_param(table=data_0)
reader_1 = Reader(name="reader_1")
reader_1.get_party_instance(role='guest', party_id=guest_0).component_param(table=data_0)
reader_1.get_party_instance(role='host', party_id=host_1).component_param(table=data_0)
## Add your pretrained model path here; the model and tokenizer will be loaded from this path
model_path = ''
from pipeline.component.homo_nn import DatasetParam, TrainerParam
model = t.nn.Sequential(
t.nn.CustModel(module_name='pellm.gpt2', class_name='GPT2', pretrained_path=model_path, adapter_type='HoulsbyConfig'),  # load pretrained weights from model_path
t.nn.CustModel(module_name='classifier_head', class_name='Classifier', in_size=768)
)
# DatasetParam
dataset_param = DatasetParam(dataset_name='nlp_tokenizer',text_max_length=128, tokenizer_name_or_path=model_path,
padding_side="left", return_input_ids=False, pad_token='<|endoftext|>')
# TrainerParam
trainer_param = TrainerParam(trainer_name='fedavg_trainer', epochs=1, batch_size=8,
data_loader_worker=8, secure_aggregate=True)
nn_component = HomoNN(name='nn_0', model=model)
# set parameter for client 1
nn_component.get_party_instance(role='guest', party_id=guest_0).component_param(
loss=t.nn.BCELoss(),
optimizer = t.optim.Adam(lr=0.0001, eps=1e-8),
dataset=dataset_param,
trainer=trainer_param,
torch_seed=100
)
# set parameter for client 2
nn_component.get_party_instance(role='host', party_id=host_1).component_param(
loss=t.nn.BCELoss(),
optimizer = t.optim.Adam(lr=0.0001, eps=1e-8),
dataset=dataset_param,
trainer=trainer_param,
torch_seed=100
)
# set parameter for server
nn_component.get_party_instance(role='arbiter', party_id=guest_0).component_param(
trainer=trainer_param
)
pipeline.add_component(reader_0)
pipeline.add_component(nn_component, data=Data(train_data=reader_0.output.data))
pipeline.compile()
pipeline.fit()
<pipeline.backend.pipeline.PipeLine at 0x7f43f11b8820>
You can use this script to submit the task, but training takes a long time and generates a long log, so we do not run it here.
Training with CUDA¶
You can use GPU by setting the cuda parameter of the FedAVGTrainer:
trainer_param = TrainerParam(trainer_name='fedavg_trainer', epochs=1, batch_size=8,
data_loader_worker=8, secure_aggregate=True, cuda=0)
The cuda parameter here accepts an integer value that corresponds to the index of the GPU you want to use for training. In the example above, the value is set to 0, which means that on every client the first available GPU in the system will be used. If you have multiple GPUs and would like to use a specific one, simply change the value of the cuda parameter to the appropriate index.
Training with multiple GPUs¶
To take advantage of multiple GPUs, you can assign different GPUs to different clients during federated learning. In the example below, we configure two clients to use different sets of GPUs.
client_0_param = TrainerParam(trainer_name='fedavg_trainer', epochs=1, batch_size=8,
data_loader_worker=8, secure_aggregate=True, cuda=[0, 1, 2, 3])
client_1_param = TrainerParam(trainer_name='fedavg_trainer', epochs=1, batch_size=8,
data_loader_worker=8, secure_aggregate=True, cuda=[0, 3, 4])
server_param = TrainerParam(trainer_name='fedavg_trainer', epochs=1, batch_size=8,
data_loader_worker=8, secure_aggregate=True)
# set parameter for client 1
nn_component.get_party_instance(role='guest', party_id=guest_0).component_param(
loss=t.nn.BCELoss(),
optimizer = t.optim.Adam(lr=0.0001, eps=1e-8),
dataset=dataset_param,
trainer=client_0_param,
torch_seed=100
)
# set parameter for client 2
nn_component.get_party_instance(role='host', party_id=host_1).component_param(
loss=t.nn.BCELoss(),
optimizer = t.optim.Adam(lr=0.0001, eps=1e-8),
dataset=dataset_param,
trainer=client_1_param,
torch_seed=100
)
# set parameter for server
nn_component.get_party_instance(role='arbiter', party_id=guest_0).component_param(
trainer=server_param
)
In this example, client_0 is set to use GPUs with indices [0, 1, 2, 3], while client_1 uses GPUs with indices [0, 3, 4]. The server does not support GPU usage in the aggregation process.
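The two forms of the cuda setting used in this tutorial (a single index or a list of indices) can be pictured with the sketch below. `resolve_devices` is a hypothetical helper written for illustration; it is not how FedAVGTrainer handles the parameter internally, and the "cpu" fallback when no value is given is an assumption.

```python
def resolve_devices(cuda=None):
    """Map a cuda setting (None, int, or list of ints) to a list of device names."""
    if cuda is None:
        return ["cpu"]
    indices = [cuda] if isinstance(cuda, int) else list(cuda)
    if not indices or any(not isinstance(i, int) or i < 0 for i in indices):
        raise ValueError(f"invalid cuda setting: {cuda!r}")
    return [f"cuda:{i}" for i in indices]


single = resolve_devices(0)
guest_devices = resolve_devices([0, 1, 2, 3])
host_devices = resolve_devices([0, 3, 4])
server_devices = resolve_devices()  # the server aggregates without GPUs

print(single, guest_devices, host_devices, server_devices)
```

The indices are local to each party's machine, so index 0 on the guest and index 0 on the host refer to different physical GPUs when the parties run on separate hosts.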