Intersection
This module provides several methods of PSI (Private Set Intersection).
RSA Intersection
This mode implements an intersection algorithm based on RSA. It is built on FATE, eggroll, and the federation API, which together provide a secure, distributed, and parallel infrastructure.
Our Intersection module addresses the privacy-preserving entity matching problem: it helps two or more parties find their common entry ids without leaking non-overlapping ids. The process is illustrated below in Figure 1.
In Figure 1, Party A has user ids u1, u2, u3, u4, while Party B has u1, u2, u3, u5. After intersection, both parties learn their common user ids, u1, u2, u3, while neither party can decrypt the other's non-overlapping user ids. The processed ids that each party transmits to the other, such as Y-A and Z-B, do not reveal raw ids: Z-B is protected by Party B's private key, and each element of Y-A includes a different random value bound to the corresponding value in X-A, so it is safe as well.
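The blind-signature flow described above can be sketched in a single Python process. This is a textbook RSA blind-signature PSI written for illustration, not FATE's implementation: the toy primes, the helper names (`first_hash`, `final_hash`, `random_blinding_factor`, `rsa_psi`) and the simulation of both parties in one function are assumptions. Y-A corresponds to the blinded values and Z-B to the final hashes of B's signed ids.

```python
import hashlib
import math
import secrets

# Toy RSA key built from two well-known primes; far too small for real use
# (RSAParam requires key_length >= 1024 bits).
P, Q = 1_000_000_007, 998_244_353
N = P * Q
E = 65537
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent (Python 3.8+)


def first_hash(uid: str) -> int:
    """Hash a raw id into Z_N (plays the role of hash_method)."""
    return int(hashlib.sha256(uid.encode()).hexdigest(), 16) % N


def final_hash(value: int) -> str:
    """Hash a signed value before comparison (plays the role of final_hash_method)."""
    return hashlib.sha256(str(value).encode()).hexdigest()


def random_blinding_factor() -> int:
    """Draw r invertible mod N so that party A can divide it out later."""
    while True:
        r = secrets.randbelow(N - 2) + 2
        if math.gcd(r, N) == 1:
            return r


def rsa_psi(ids_a, ids_b):
    """Return the ids of party A that party B also holds."""
    # Party B (private-key owner) signs its own ids and publishes only Z-B.
    z_b = {final_hash(pow(first_hash(x), D, N)) for x in ids_b}

    # Party A blinds each hashed id with a fresh random r and sends Y-A.
    ordered = list(ids_a)
    blinds = [random_blinding_factor() for _ in ordered]
    y_a = [first_hash(x) * pow(r, E, N) % N for x, r in zip(ordered, blinds)]

    # Party B signs the blinded values; it sees only y_a, never A's raw ids.
    signed = [pow(y, D, N) for y in y_a]

    # Since y^D = H(x)^D * r mod N, A multiplies by r^-1 and compares with Z-B.
    return {
        x
        for x, s, r in zip(ordered, signed, blinds)
        if final_hash(s * pow(r, -1, N) % N) in z_b
    }


print(sorted(rsa_psi(["u1", "u2", "u3", "u4"], ["u1", "u2", "u3", "u5"])))
# → ['u1', 'u2', 'u3']
```

Party B never sees A's unblinded hashes, and A only learns which of its own ids match an entry in Z-B.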
Introduced in FATE version 1.6, the split_calculation option improves efficiency. Unlike the unified process described above, split_calculation first splits the hash-processed ids into even and odd groups; each group then runs through the RSA intersection process with either host or guest as the joining role. Note that with split_calculation, host(s) always learn their common even ids with guest, since they are responsible for finding the common even ids.
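A minimal sketch of the even/odd split, assuming the group is decided by the parity of a SHA-256 digest (FATE's exact criterion may differ) and using plain set intersection as a stand-in for the per-group RSA protocol. The function names are hypothetical.

```python
import hashlib


def hashed(uid: str) -> int:
    """Hash an id to an integer; both parties must use the same hash."""
    return int(hashlib.sha256(uid.encode()).hexdigest(), 16)


def split_by_parity(ids):
    """Split ids into (even, odd) groups by the parity of their hash."""
    even, odd = set(), set()
    for uid in ids:
        (even if hashed(uid) % 2 == 0 else odd).add(uid)
    return even, odd


def split_intersect(guest_ids, host_ids):
    g_even, g_odd = split_by_parity(guest_ids)
    h_even, h_odd = split_by_parity(host_ids)
    # Even group: host is the joining role, so host learns this part.
    common_even = g_even & h_even
    # Odd group: guest is the joining role.
    common_odd = g_odd & h_odd
    # A shared id has the same parity on both sides, so the union is complete.
    return common_even | common_odd
```

Because the hash is deterministic, an id common to both parties always lands in the same group on both sides, which is why the two partial results add up to the full intersection.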
With RSA intersection, participants can get their intersection ids securely and efficiently.
RAW Intersection
This mode implements a simple intersection method in which one participant sends all its ids to another participant, which then finds their common ids. Finally, the joining role sends the intersection ids back to the sender.
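The raw flow, including the optional encoding controlled by RAWParam's salt, hash_method and base64 options, might look like the sketch below. It is an illustration only: the function names are hypothetical, base64 is assumed to apply to the raw digest bytes, and `use_base64` is renamed to avoid shadowing the `base64` module. Whether "sm3" works with `hashlib.new` depends on the local OpenSSL build.

```python
import base64
import hashlib


def encode_id(uid, salt="", hash_method="none", use_base64=False):
    """Apply str = str + salt, then optional hashing and base64."""
    value = uid + salt
    if hash_method == "none":
        return value
    digest = hashlib.new(hash_method, value.encode())
    if use_base64:
        return base64.b64encode(digest.digest()).decode()
    return digest.hexdigest()


def raw_intersect(sender_ids, joiner_ids, **encode_kwargs):
    """Sender ships all encoded ids; the joining role intersects and replies."""
    # Sender keeps a private map from encoded value back to raw id.
    sender_map = {encode_id(x, **encode_kwargs): x for x in sender_ids}
    # Joining role encodes its own ids and intersects with what it received.
    joiner_map = {encode_id(x, **encode_kwargs): x for x in joiner_ids}
    common_encoded = sender_map.keys() & joiner_map.keys()
    joiner_result = {joiner_map[e] for e in common_encoded}
    # Joining role sends common_encoded back; sender maps it to its raw ids.
    sender_result = {sender_map[e] for e in common_encoded}
    return sender_result, joiner_result


print(raw_intersect(["a", "b"], ["b", "c"], hash_method="sha256"))
# → ({'b'}, {'b'})
```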
DH Intersection
This mode implements secure intersection based on symmetric encryption with the Pohlig–Hellman commutative cipher. DH Intersection is also used in the Secure Information Retrieval (SIR) module.
Below is an illustration of DH intersection with a single host and a single guest.
Here is an illustration of DH intersection with multiple hosts.
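The commutative property that DH intersection relies on can be shown with a toy Pohlig-Hellman cipher, E_k(m) = m^k mod p with k coprime to p - 1. This sketch assumes a 127-bit Mersenne prime and a naive hash into Z_p*; a real deployment would use a much larger modulus (DHParam's key_length) and a more careful group choice. All names are hypothetical.

```python
import hashlib
import math
import secrets

# Mersenne prime 2**127 - 1: a toy modulus for illustration only.
PRIME = 2**127 - 1


def hash_to_group(uid: str) -> int:
    """Map an id to a nonzero residue mod PRIME."""
    return int(hashlib.sha256(uid.encode()).hexdigest(), 16) % (PRIME - 1) + 1


def gen_key() -> int:
    """Pohlig-Hellman exponent: must be coprime to PRIME - 1 to be invertible."""
    while True:
        k = secrets.randbelow(PRIME - 3) + 2
        if math.gcd(k, PRIME - 1) == 1:
            return k


def encrypt(value: int, key: int) -> int:
    return pow(value, key, PRIME)


def dh_psi(guest_ids, host_ids):
    a, b = gen_key(), gen_key()  # guest key, host key
    # Guest -> host: E_a(H(x)); host returns E_b(E_a(H(x))) in the same order.
    guest_twice = [encrypt(encrypt(hash_to_group(x), a), b) for x in guest_ids]
    # Host -> guest: E_b(H(y)); guest applies its own key to get E_a(E_b(H(y))).
    host_twice = {encrypt(encrypt(hash_to_group(y), b), a) for y in host_ids}
    # (m^a)^b == (m^b)^a, so doubly encrypted values match iff the hashes match.
    return {x for x, v in zip(guest_ids, guest_twice) if v in host_twice}


print(sorted(dh_psi(["u1", "u2", "u3", "u4"], ["u1", "u2", "u3", "u5"])))
# → ['u1', 'u2', 'u3']
```

With multiple hosts, the guest would repeat this exchange with each host separately.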
ECDH Intersection
This mode implements secure intersection based on the elliptic curve Diffie-Hellman scheme. ECDH mode currently uses Curve25519, which offers 128 bits of security with a key size of 256 bits.
Below is an illustration of ECDH intersection. Note that the ECDH method currently supports only the single-host scenario.
For details on how values are hashed to the given curve, please refer here.
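The same commutative exchange can run on Curve25519 using the X25519 Montgomery ladder from RFC 7748. This pure-Python sketch is not constant-time and uses a deliberately naive `hash_to_u` (SHA-256 reduced mod p), which is not a proper hash-to-curve and may land on the quadratic twist; FATE's own hashing to the curve is the one referenced above. Function names are hypothetical.

```python
import hashlib
import secrets

P = 2**255 - 19  # Curve25519 field prime
A24 = 121665  # (486662 - 2) / 4, the ladder constant from RFC 7748


def clamp(k: int) -> int:
    """RFC 7748 clamping: clear bits 0-2 and bit 255, set bit 254."""
    return (k & ((1 << 254) - 8)) | (1 << 254)


def x25519(k: int, u: int) -> int:
    """Montgomery ladder (RFC 7748): u-coordinate of [k] applied to point u."""
    x1, x2, z2, x3, z3, swap = u % P, 1, 0, u % P, 1, 0
    for t in reversed(range(255)):
        bit = (k >> t) & 1
        swap ^= bit
        if swap:  # conditional swap; not constant-time, illustration only
            x2, x3, z2, z3 = x3, x2, z3, z2
        swap = bit
        a, b = x2 + z2, x2 - z2
        aa, bb = a * a % P, b * b % P
        e = aa - bb
        c, d = x3 + z3, x3 - z3
        da, cb = d * a % P, c * b % P
        x3 = (da + cb) ** 2 % P
        z3 = x1 * (da - cb) ** 2 % P
        x2 = aa * bb % P
        z2 = e * (aa + A24 * e) % P
    if swap:
        x2, z2 = x3, z3
    return x2 * pow(z2, P - 2, P) % P


def hash_to_u(uid: str) -> int:
    """Naive stand-in for hash-to-curve: SHA-256 digest reduced mod P."""
    return int(hashlib.sha256(uid.encode()).hexdigest(), 16) % P


def ecdh_psi(guest_ids, host_ids):
    a = clamp(secrets.randbits(256))  # guest secret scalar
    b = clamp(secrets.randbits(256))  # host secret scalar
    # Guest -> host: [a]H(x); host returns [b][a]H(x) in the same order.
    guest_twice = [x25519(b, x25519(a, hash_to_u(x))) for x in guest_ids]
    # Host -> guest: [b]H(y); guest applies a to obtain [a][b]H(y).
    host_twice = {x25519(a, x25519(b, hash_to_u(y))) for y in host_ids}
    return {x for x, u in zip(guest_ids, guest_twice) if u in host_twice}


print(sorted(ecdh_psi(["u1", "u2", "u3", "u4"], ["u1", "u2", "u3", "u5"])))
# → ['u1', 'u2', 'u3']
```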
Intersection With Cache
Intersection may be conducted in online and offline phases. RSA, DH, and ECDH intersection all support cache.
Multi-Host Intersection
RSA, RAW, and DH intersection support the multi-host scenario: a guest can perform intersection with more than one host simultaneously and obtain the ids common to all participants.
Refer to Figure 2 for a demonstration of one guest running intersection with two hosts; the same process applies to cases with more than two hosts. First, the guest runs intersection with each host and obtains the respective overlapping ids. Then, the guest finds the ids common to all intersection results. Optionally, the guest sends the common ids to every host.
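The three steps above can be sketched as follows. Plain set intersection stands in for each secure guest-host run, and the function names, host names and the `sync_to_hosts` flag are hypothetical.

```python
from functools import reduce


def pairwise_intersect(guest_ids, host_ids):
    """Stand-in for one secure guest-host PSI run (plain set intersection here)."""
    return set(guest_ids) & set(host_ids)


def multi_host_intersect(guest_ids, hosts, sync_to_hosts=True):
    # Step 1: guest runs intersection with each host separately.
    per_host = {name: pairwise_intersect(guest_ids, ids) for name, ids in hosts.items()}
    # Step 2: guest keeps only ids present in every pairwise result.
    common = reduce(set.intersection, per_host.values(), set(guest_ids))
    # Step 3 (optional): guest sends the final common ids to every host.
    sent = {name: set(common) for name in hosts} if sync_to_hosts else {}
    return common, sent


common, sent = multi_host_intersect(
    ["u1", "u2", "u3", "u4"],
    {"host_1": ["u1", "u2", "u3"], "host_2": ["u2", "u3", "u5"]},
)
print(sorted(common))
# → ['u2', 'u3']
```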
Match ID (Repeated ID) Intersection
Starting with ver 1.7, it is recommended to assign a random sid to uploaded data. The Intersection module then automatically detects and processes data with instance IDs.
Note that parameters of the original repeated ID process, such as repeated_id_process, are deprecated in ver 1.7. Instead, set sample_id_generator to the role whose sid is to be kept. For instance, when sample_id_generator is set to Guest (default) and Guest's data is:
sid, id, value
123, alice, 2
125, alice, 3
130, bob, 4
And Host's data is:
sid, id, value
210, alice, 5
232, alice, 5
212, bob, 4
After intersection, guest will get the intersection results:
sid, id, value
123, alice, 2
125, alice, 3
130, bob, 4
And for Host:
sid, id, value
123, alice, 5
125, alice, 5
130, bob, 4
Param
intersect_param
Attributes
DEFAULT_RANDOM_BIT = 128 (module-attribute)
Classes
EncodeParam(salt='', encode_method='none', base64=False)
Bases: BaseParam
Define the hash method for the raw intersect method
Parameters:
Name | Type | Description | Default |
---|---|---|---|
salt | | the src id will be str = str + salt; default is an empty string | '' |
encode_method | | the hash method of src id; supports md5, sha1, sha224, sha256, sha384, sha512, sm3; default 'none' | 'none' |
base64 | | if True, the hash result will be converted to base64; default False | False |
Source code in federatedml/param/intersect_param.py
Attributes
- salt = salt (instance-attribute)
- encode_method = encode_method (instance-attribute)
- base64 = base64 (instance-attribute)
Functions
check()
Source code in federatedml/param/intersect_param.py
RAWParam(use_hash=False, salt='', hash_method='none', base64=False, join_role=consts.GUEST)
Bases: BaseParam
Specify parameters for the raw intersect method
Parameters:
Name | Type | Description | Default |
---|---|---|---|
use_hash | | whether to hash ids for raw intersect | False |
salt | | the src id will be str = str + salt; default is an empty string | '' |
hash_method | | the hash method of src id; supports md5, sha1, sha224, sha256, sha384, sha512, sm3; default 'none' | 'none' |
base64 | | if True, the hash result will be converted to base64; default False | False |
join_role | | role who joins ids; supports "guest" and "host" only and is effective only for raw. If "guest", the host sends its ids to guest and the intersection is found on guest; if "host", the guest sends its ids to host. Default "guest" | consts.GUEST |
Source code in federatedml/param/intersect_param.py
Attributes
- use_hash = use_hash (instance-attribute)
- salt = salt (instance-attribute)
- hash_method = hash_method (instance-attribute)
- base64 = base64 (instance-attribute)
- join_role = join_role (instance-attribute)
Functions
check()
Source code in federatedml/param/intersect_param.py
RSAParam(salt='', hash_method='sha256', final_hash_method='sha256', split_calculation=False, random_base_fraction=None, key_length=consts.DEFAULT_KEY_LENGTH, random_bit=DEFAULT_RANDOM_BIT)
Bases: BaseParam
Specify parameters for the RSA intersect method
Parameters:
Name | Type | Description | Default |
---|---|---|---|
salt | | the src id will be str = str + salt; default '' | '' |
hash_method | | the hash method of src id; supports sha256, sha384, sha512, sm3; default sha256 | 'sha256' |
final_hash_method | | the hash method of the result data string; supports md5, sha1, sha224, sha256, sha384, sha512, sm3; default sha256 | 'sha256' |
split_calculation | | if True, Host & Guest split operations for faster performance; recommended for large data sets | False |
random_base_fraction | | if not None, generate (fraction * public key id count) values of r for encryption and reuse the generated r; note that values greater than 0.99 are taken as 1, and values less than 0.01 are rounded up to 0.01 | None |
key_length | | bit count of the RSA key, must be >= 1024; default 1024 | consts.DEFAULT_KEY_LENGTH |
random_bit | | defines the size of the blinding factor in the RSA algorithm; default 128 | DEFAULT_RANDOM_BIT |
Source code in federatedml/param/intersect_param.py
Attributes
- salt = salt (instance-attribute)
- hash_method = hash_method (instance-attribute)
- final_hash_method = final_hash_method (instance-attribute)
- split_calculation = split_calculation (instance-attribute)
- random_base_fraction = random_base_fraction (instance-attribute)
- key_length = key_length (instance-attribute)
- random_bit = random_bit (instance-attribute)
Functions
check()
Source code in federatedml/param/intersect_param.py
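One way to read the random_base_fraction and random_bit descriptions is sketched below: the documented clamping to [0.01, 1] is applied, a smaller pool of random_bit-sized blinding factors is generated, and pool entries are reused. The ceiling rounding, round-robin reuse and function names are assumptions for illustration, not FATE's code.

```python
import math
import secrets


def clamp_fraction(fraction):
    """Documented bounds: > 0.99 becomes 1, < 0.01 becomes 0.01."""
    if fraction > 0.99:
        return 1.0
    if fraction < 0.01:
        return 0.01
    return fraction


def blinding_factors(id_count, random_base_fraction=None, random_bit=128):
    """Return one blinding factor per id, reusing a smaller pool if a fraction is set."""
    if random_base_fraction is None:
        pool_size = id_count  # a fresh r for every id
    else:
        fraction = clamp_fraction(random_base_fraction)
        pool_size = max(1, math.ceil(fraction * id_count))
    pool = [secrets.randbits(random_bit) for _ in range(pool_size)]
    # Reuse pool entries round-robin (assumed) so every id still gets an r.
    return [pool[i % pool_size] for i in range(id_count)]


factors = blinding_factors(1000, random_base_fraction=0.1)
print(len(factors), len(set(factors)) <= 100)
# → 1000 True
```

Reusing r trades some per-id randomness for fewer expensive modular exponentiations, which is presumably why the option exists for large data sets.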
DHParam(salt='', hash_method='sha256', key_length=consts.DEFAULT_KEY_LENGTH)
Bases: BaseParam
Define the hash method for the DH intersect method
Parameters:
Name | Type | Description | Default |
---|---|---|---|
salt | | the src id will be str = str + salt; default '' | '' |
hash_method | | the hash method of src id; supports none, md5, sha1, sha224, sha256, sha384, sha512, sm3; default sha256 | 'sha256' |
key_length | | the key length of the commutative cipher p; default 1024 | consts.DEFAULT_KEY_LENGTH |
Source code in federatedml/param/intersect_param.py
Attributes
- salt = salt (instance-attribute)
- hash_method = hash_method (instance-attribute)
- key_length = key_length (instance-attribute)
Functions
check()
Source code in federatedml/param/intersect_param.py
ECDHParam(salt='', hash_method='sha256', curve=consts.CURVE25519)
Bases: BaseParam
Define the hash method for the ECDH intersect method
Parameters:
Name | Type | Description | Default |
---|---|---|---|
salt | | the src id will be str = str + salt; default '' | '' |
curve | str | the name of the curve; currently only 'curve25519' is supported, which offers 128 bits of security | consts.CURVE25519 |
Source code in federatedml/param/intersect_param.py
Attributes
- salt = salt (instance-attribute)
- hash_method = hash_method (instance-attribute)
- curve = curve (instance-attribute)
Functions
check()
Source code in federatedml/param/intersect_param.py
IntersectCache(use_cache=False, id_type=consts.PHONE, encrypt_type=consts.SHA256)
Bases: BaseParam
Parameters:
Name | Type | Description | Default |
---|---|---|---|
use_cache | | whether to use cached ids; in ver 1.7 and above, this param is ignored | False |
id_type | | in ver 1.7 and above, this param is ignored | consts.PHONE |
encrypt_type | | in ver 1.7 and above, this param is ignored | consts.SHA256 |
Source code in federatedml/param/intersect_param.py
Attributes
- use_cache = use_cache (instance-attribute)
- id_type = id_type (instance-attribute)
- encrypt_type = encrypt_type (instance-attribute)
Functions
check()
Source code in federatedml/param/intersect_param.py
IntersectPreProcessParam(false_positive_rate=0.001, encrypt_method=consts.RSA, hash_method='sha256', preprocess_method='sha256', preprocess_salt='', random_state=None, filter_owner=consts.GUEST)
Bases: BaseParam
Specify parameters for pre-processing and cardinality-only mode
Parameters:
Name | Type | Description | Default |
---|---|---|---|
false_positive_rate | | initial target false positive rate when creating the Bloom filter; must be <= 0.5; default 1e-3 | 0.001 |
encrypt_method | | method for encrypting ids when performing a cardinality_only task; supports rsa only; default rsa; specify RSA settings with RSAParam | consts.RSA |
hash_method | | the hash method for inserting ids; supports md5, sha1, sha224, sha256, sha384, sha512, sm3; default sha256 | 'sha256' |
preprocess_method | | the hash method for encoding ids before insertion into the filter; default sha256; only effective for preprocessing | 'sha256' |
preprocess_salt | | salt appended to the hash result of preprocess_method before insertion into the filter; default ''; only effective for preprocessing | '' |
random_state | | seed for the random salt generator used when constructing hash functions; the salt is appended to the hash result of hash_method when performing insertion; default None | None |
filter_owner | | role that constructs the filter, either guest or host; default guest; only effective for preprocessing | consts.GUEST |
Source code in federatedml/param/intersect_param.py
Attributes
- false_positive_rate = false_positive_rate (instance-attribute)
- encrypt_method = encrypt_method (instance-attribute)
- hash_method = hash_method (instance-attribute)
- preprocess_method = preprocess_method (instance-attribute)
- preprocess_salt = preprocess_salt (instance-attribute)
- random_state = random_state (instance-attribute)
- filter_owner = filter_owner (instance-attribute)
Functions
check()
Source code in federatedml/param/intersect_param.py
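The preprocessing step pairs a Bloom filter built by filter_owner with a pre-filter on the other party's ids. The sketch below uses the textbook sizing formulas m = -n ln p / (ln 2)^2 and k = (m / n) ln 2, with per-hash salts drawn from a generator seeded by random_state. The class and function names, the SHA-256 position hashing and the way salts are combined with ids are assumptions, not FATE's implementation.

```python
import hashlib
import math
import random


class BloomFilter:
    """Minimal Bloom filter sized from a target false positive rate."""

    def __init__(self, expected_items, false_positive_rate=0.001, random_state=None):
        if not 0 < false_positive_rate <= 0.5:
            raise ValueError("false_positive_rate must be in (0, 0.5]")
        n = max(1, expected_items)
        # Textbook sizing: m bits and k hash functions for n items at rate p.
        self.m = math.ceil(-n * math.log(false_positive_rate) / math.log(2) ** 2)
        self.k = max(1, round(self.m / n * math.log(2)))
        rng = random.Random(random_state)
        # One random salt per hash function, reproducible via random_state.
        self.salts = [str(rng.getrandbits(64)) for _ in range(self.k)]
        self.bits = bytearray((self.m + 7) // 8)

    def _positions(self, item):
        for salt in self.salts:
            digest = hashlib.sha256((item + salt).encode()).digest()
            yield int.from_bytes(digest, "big") % self.m

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(self.bits[pos // 8] >> (pos % 8) & 1 for pos in self._positions(item))


def prefilter(owner_ids, other_ids, false_positive_rate=0.001, random_state=None):
    """filter_owner inserts its ids; the other party keeps only ids that may match."""
    bf = BloomFilter(len(owner_ids), false_positive_rate, random_state)
    for uid in owner_ids:
        bf.add(uid)
    return [uid for uid in other_ids if uid in bf]
```

A Bloom filter has no false negatives, so every truly common id survives the pre-filter; only a small fraction of non-matching ids (about false_positive_rate) passes through to the full PSI.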
IntersectParam(intersect_method=consts.RSA, random_bit=DEFAULT_RANDOM_BIT, sync_intersect_ids=True, join_role=consts.GUEST, only_output_key=False, with_encode=False, encode_params=EncodeParam(), raw_params=RAWParam(), rsa_params=RSAParam(), dh_params=DHParam(), ecdh_params=ECDHParam(), join_method=consts.INNER_JOIN, new_sample_id=False, sample_id_generator=consts.GUEST, intersect_cache_param=IntersectCache(), run_cache=False, cardinality_only=False, sync_cardinality=False, cardinality_method=consts.ECDH, run_preprocess=False, intersect_preprocess_params=IntersectPreProcessParam(), repeated_id_process=False, repeated_id_owner=consts.GUEST, with_sample_id=False, allow_info_share=False, info_owner=consts.GUEST)
Bases: BaseParam
Define the intersect method
Parameters:
Name | Type | Description | Default |
---|---|---|---|
intersect_method | str | supports 'rsa', 'raw', 'dh', 'ecdh'; default 'rsa' | consts.RSA |
random_bit | | defines the size of the blinding factor in the RSA algorithm; default 128. Note that this param will be deprecated in the future; please use random_bit in RSAParam instead | DEFAULT_RANDOM_BIT |
sync_intersect_ids | | in rsa, True means guest or host will send intersection results to the other parties, and False means it will not; in raw, True means the "join_role" party sends intersection results and the others receive them. Default True | True |
join_role | | role who joins ids; supports "guest" and "host" only and is effective only for raw. If "guest", the host sends its ids to guest and the intersection is found on guest; if "host", the guest sends its ids to host. Default "guest". Note that this param will be deprecated in a future version; please use 'join_role' in raw_params instead | consts.GUEST |
only_output_key | bool | if False, intersection results include both key and value from the input data; if True, they include only the key, and the value is empty or filled with a uniform string such as "intersect_id" | False |
with_encode | | if True, ids are hashed for intersection; effective for raw method only. Note that this param will be deprecated in a future version; please use 'use_hash' in raw_params. Currently, if this param is set to True, the specification in 'encode_params' is used instead of 'raw_params' | False |
encode_params | | effective only when with_encode is True; this param will be deprecated in a future version, use 'raw_params' instead | EncodeParam() |
raw_params | | effective for raw method only | RAWParam() |
rsa_params | | effective for rsa method only | RSAParam() |
dh_params | | effective for dh method only | DHParam() |
ecdh_params | | effective for ecdh method only | ECDHParam() |
join_method | | if 'left_join', all participants include sample_id_generator's (imputed) ids in output; default 'inner_join' | consts.INNER_JOIN |
new_sample_id | bool | whether to generate new ids for sample_id_generator's ids; only effective when join_method is 'left_join' or when input data are instances with match id; default False | False |
sample_id_generator | | role whose ids are to be kept; effective only when join_method is 'left_join' or when input data are instances with match id; default 'guest' | consts.GUEST |
intersect_cache_param | | specification for cache generation; in ver 1.7 and above, this param is ignored | IntersectCache() |
run_cache | bool | whether to store Host's encrypted ids; only valid when intersect method is 'rsa', 'dh', or 'ecdh'; default False | False |
cardinality_only | bool | whether to output the estimated intersection count (cardinality); if sync_cardinality is True, the cardinality count is synced with host(s) | False |
cardinality_method | | specifies which intersect method to use for counting cardinality; default "ecdh". Note that "rsa" produces estimated cardinality, while "dh" and "ecdh" output exact cardinality, which only supports single-host tasks | consts.ECDH |
sync_cardinality | bool | whether to sync cardinality with all participants; default False; only effective when cardinality_only is set to True | False |
run_preprocess | bool | whether to run the preprocess step; default False | False |
intersect_preprocess_params | | used for preprocessing and cardinality_only mode | IntersectPreProcessParam() |
repeated_id_process | | if True, intersection will process ids that can be repeated; in ver 1.7 and above, repeated id processing is automatically applied to data with instance id, and this param is ignored | False |
repeated_id_owner | | which role has the repeated ids; in ver 1.7 and above, this param is ignored | consts.GUEST |
allow_info_share | bool | in ver 1.7 and above, this param is ignored | False |
info_owner | | in ver 1.7 and above, this param is ignored | consts.GUEST |
with_sample_id | | whether data contain sample id; default False; in ver 1.7 and above, this param is ignored | False |
Source code in federatedml/param/intersect_param.py
Attributes
- intersect_method = intersect_method (instance-attribute)
- random_bit = random_bit (instance-attribute)
- sync_intersect_ids = sync_intersect_ids (instance-attribute)
- join_role = join_role (instance-attribute)
- with_encode = with_encode (instance-attribute)
- encode_params = copy.deepcopy(encode_params) (instance-attribute)
- raw_params = copy.deepcopy(raw_params) (instance-attribute)
- rsa_params = copy.deepcopy(rsa_params) (instance-attribute)
- only_output_key = only_output_key (instance-attribute)
- sample_id_generator = sample_id_generator (instance-attribute)
- intersect_cache_param = copy.deepcopy(intersect_cache_param) (instance-attribute)
- run_cache = run_cache (instance-attribute)
- repeated_id_process = repeated_id_process (instance-attribute)
- repeated_id_owner = repeated_id_owner (instance-attribute)
- allow_info_share = allow_info_share (instance-attribute)
- info_owner = info_owner (instance-attribute)
- with_sample_id = with_sample_id (instance-attribute)
- join_method = join_method (instance-attribute)
- new_sample_id = new_sample_id (instance-attribute)
- dh_params = copy.deepcopy(dh_params) (instance-attribute)
- cardinality_only = cardinality_only (instance-attribute)
- sync_cardinality = sync_cardinality (instance-attribute)
- cardinality_method = cardinality_method (instance-attribute)
- run_preprocess = run_preprocess (instance-attribute)
- intersect_preprocess_params = copy.deepcopy(intersect_preprocess_params) (instance-attribute)
- ecdh_params = copy.deepcopy(ecdh_params) (instance-attribute)
Functions
check()
Source code in federatedml/param/intersect_param.py
Feature
The table below lists the features of the ECDH, RSA, DH, and RAW intersection methods.
Intersect Methods | PSI | Match-ID Support | Multi-Host | Exact-Cardinality | Estimated Cardinality | Preprocessing | Cache |
---|---|---|---|---|---|---|---|
ECDH | ✓ | ✓ | ✗ | ✓ | ✗ | ✓ | ✓ |
RSA | ✓ | ✓ | ✓ | ✗ | ✓ | ✓ | ✓ |
DH | ✓ | ✓ | ✓ | ✓ | ✗ | ✓ | ✓ |
RAW | ✓ | ✓ | ✓ | ✗ | ✗ | ✓ | ✗ |
All four methods support:
- Automatic match ID intersection using ID expansion (when data contain instance ids).
- Configurable hashing methods, including sha256, md5, and sm3; hash operators of RSA intersection can be configured separately, please refer here for more details.
- A preprocessing step that pre-filters Host's data for faster PSI.
RSA, RAW, and DH intersection methods support:
- Multi-host PSI task. The detailed configuration for multi-host task can be found here.
RSA, DH, and ECDH intersection methods also support:
- PSI with cache
RAW intersection supports the following extra feature:
- base64 encoding may be used for all hashing methods.
Cardinality Computation:
- Setting cardinality_method to rsa produces an estimated intersection cardinality;
- Setting cardinality_method to dh or ecdh (default) computes the exact intersection cardinality.
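A sketch of how an exact cardinality can be computed without revealing which ids match, assuming the host shuffles the doubly encrypted guest values before returning them (a common PSI-cardinality technique; whether FATE does exactly this is not stated here). It reuses the toy Pohlig-Hellman cipher idea, and all names are hypothetical.

```python
import hashlib
import math
import random
import secrets

PRIME = 2**127 - 1  # toy modulus for illustration only


def hash_to_group(uid: str) -> int:
    return int(hashlib.sha256(uid.encode()).hexdigest(), 16) % (PRIME - 1) + 1


def gen_key() -> int:
    while True:
        k = secrets.randbelow(PRIME - 3) + 2
        if math.gcd(k, PRIME - 1) == 1:
            return k


def dh_cardinality(guest_ids, host_ids):
    a, b = gen_key(), gen_key()  # guest key, host key
    guest_once = [pow(hash_to_group(x), a, PRIME) for x in guest_ids]
    # Host re-encrypts and shuffles, so guest cannot link results to its ids.
    guest_twice = [pow(v, b, PRIME) for v in guest_once]
    random.SystemRandom().shuffle(guest_twice)
    # Guest applies its key to the host's singly encrypted ids.
    host_twice = {pow(pow(hash_to_group(y), b, PRIME), a, PRIME) for y in host_ids}
    # Only the number of matches is learned, not which ids they are.
    return sum(1 for v in guest_twice if v in host_twice)


print(dh_cardinality(["u1", "u2", "u3", "u4"], ["u1", "u2", "u3", "u5"]))
# → 3
```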