Intersection

This module provides several methods for PSI (Private Set Intersection).

RSA Intersection

This mode implements an intersection algorithm based on RSA. It is built on FATE, eggroll, and the federation API, which together form the secure, distributed, and parallel infrastructure.

Our Intersection module addresses the Privacy-Preserving Entity Match problem. It helps two or more parties find their common entry ids without leaking non-overlapping ids. The process is illustrated in figure 1 below.

Figure 1 (RSA Intersection between party A and party B)

In figure 1, party A has user ids u1, u2, u3, u4, while party B has u1, u2, u3, u5. After intersection, party A and party B both learn their common user ids, which are u1, u2, u3, while neither party can decrypt the other's non-overlapping user ids. The processed ids that each party transmits to the other, such as Y-A and Z-B, do not reveal raw ids. Processed Z-B is protected by party B's private key. Each value in Y-A is blinded with a distinct random value bound to the corresponding value in X-A, so it is protected as well.
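The exchange in figure 1 can be sketched as a blind-RSA intersection. The following is a minimal, single-process illustration rather than FATE's implementation: the helper names (`hash_to_int`, `final_hash`, `rsa_intersect`) are hypothetical, the key is built from two small known primes purely so the example runs quickly, and both parties are simulated inside one function. It assumes Python 3.8+ for the modular inverse via `pow`.

```python
import hashlib
import math
import secrets

# Toy RSA key owned by party B. These Mersenne primes are far too small and
# too well known for real use; they only keep the illustration fast.
P, Q = 2**61 - 1, 2**89 - 1
N = P * Q
E = 65537
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent, known to party B only


def hash_to_int(value: str) -> int:
    """First hash: map an id into the RSA group (assumed SHA-256 here)."""
    digest = hashlib.sha256(value.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % N


def final_hash(number: int) -> str:
    """Final hash applied to signed values before comparison."""
    return hashlib.sha256(str(number).encode("utf-8")).hexdigest()


def rsa_intersect(ids_a, ids_b):
    # Party A: blind each hashed id with its own random r -> Y-A.
    blinded = {}
    for uid in ids_a:
        r = secrets.randbelow(N - 2) + 2
        while math.gcd(r, N) != 1:  # r must be invertible mod N
            r = secrets.randbelow(N - 2) + 2
        blinded[uid] = (r, pow(r, E, N) * hash_to_int(uid) % N)

    # Party B: sign its own hashed ids with the private key -> Z-B.
    z_b = {final_hash(pow(hash_to_int(uid), D, N)) for uid in ids_b}

    # Party B: sign Y-A without learning the underlying ids.
    signed = {uid: pow(y, D, N) for uid, (_, y) in blinded.items()}

    # Party A: remove the blinding factor and compare against Z-B.
    common = set()
    for uid, signature in signed.items():
        r = blinded[uid][0]
        unblinded = signature * pow(r, -1, N) % N  # equals H(uid)^d mod N
        if final_hash(unblinded) in z_b:
            common.add(uid)
    return common


common_ids = rsa_intersect(["u1", "u2", "u3", "u4"], ["u1", "u2", "u3", "u5"])
```

Because `(r^e * H(u))^d = r * H(u)^d mod N`, dividing by `r` leaves exactly the value party B would compute for the same id, which is why the comparison against Z-B works without either side exchanging raw ids.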

Introduced in FATE version 1.6, the split_calculation option is available for improved efficiency. Unlike the unified process described above, the split_calculation process first splits hash-processed ids into even and odd groups; each group then runs through the RSA intersection process with either host or guest as the joining role. Note that with split_calculation, host(s) always learn their common even ids with guest, since they are responsible for finding common even ids.
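As a rough illustration of the even/odd split, the sketch below partitions ids by the parity of their hash. The function name `split_by_parity` and the use of SHA-256 are assumptions made for the example; the source only states that hash-processed ids are divided into even and odd groups.

```python
import hashlib


def split_by_parity(ids, salt=""):
    """Split ids into (even, odd) groups by the parity of their hashed value."""
    even, odd = [], []
    for uid in ids:
        hashed = int(hashlib.sha256((uid + salt).encode("utf-8")).hexdigest(), 16)
        (even if hashed % 2 == 0 else odd).append(uid)
    return even, odd


guest_even, guest_odd = split_by_parity(["u1", "u2", "u3", "u4"])
host_even, host_odd = split_by_parity(["u1", "u2", "u3", "u5"])
```

Since both parties apply the same hash, a shared id always lands in the same group on both sides, so the two groups can be intersected independently with opposite joining roles.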

With RSA intersection, participants can get their intersection ids securely and efficiently.

RAW Intersection

This mode implements a simple intersection method in which one participant sends all its ids to another participant, and the receiving participant (the joining role) finds their common ids. Finally, the joining role sends the intersection ids back to the sender.
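A minimal sketch of this flow, with optional salted hashing of ids before they are sent, might look as follows. The function names are hypothetical and the encoding details (hex digest of `id + salt`) are assumptions for illustration, not FATE's exact wire format.

```python
import hashlib


def encode_ids(ids, use_hash=False, salt="", hash_method="sha256"):
    """Return a mapping from encoded id to raw id."""
    if not use_hash:
        return {uid: uid for uid in ids}
    return {
        hashlib.new(hash_method, (uid + salt).encode("utf-8")).hexdigest(): uid
        for uid in ids
    }


def raw_intersect(sender_ids, joiner_ids, **encode_kwargs):
    sent = encode_ids(sender_ids, **encode_kwargs)   # only the keys are transmitted
    local = encode_ids(joiner_ids, **encode_kwargs)  # joining role encodes its own ids
    common_encoded = sent.keys() & local.keys()      # joining role finds common ids
    joiner_result = {local[key] for key in common_encoded}
    sender_result = {sent[key] for key in common_encoded}  # after results are sent back
    return sender_result, joiner_result


sender_common, joiner_common = raw_intersect(
    ["u1", "u2", "u3", "u5"], ["u1", "u2", "u3", "u4"], use_hash=True, salt="s"
)
```

Note that hashing alone does not hide low-entropy ids from the joining role, which is the reason the RSA, DH, and ECDH modes exist.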

DH Intersection

This mode implements secure intersection based on symmetric encryption with the Pohlig–Hellman commutative cipher. DH Intersection is also used in the Secure Information Retrieval (SIR) module.

Below is an illustration of single-host-guest DH intersection.

Figure 2 (DH Intersection)
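The idea behind a Pohlig–Hellman commutative cipher is that exponentiation modulo a prime commutes: encrypting with the guest key and then the host key gives the same value as the reverse order. The sketch below simulates both parties in one process. The prime, helper names, and message order are assumptions chosen for illustration; they are not taken from FATE's code, and the modulus is far too small for real use.

```python
import hashlib
import math
import secrets

PRIME = 2**127 - 1  # toy Mersenne prime; real deployments use a much larger modulus


def hash_to_group(uid: str) -> int:
    """Map an id to an integer in [1, PRIME - 1]."""
    digest = hashlib.sha256(uid.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (PRIME - 1) + 1


def generate_key() -> int:
    """Pick an exponent coprime to PRIME - 1 so encryption is a bijection."""
    while True:
        key = secrets.randbelow(PRIME - 3) + 2
        if math.gcd(key, PRIME - 1) == 1:
            return key


def encrypt(value: int, key: int) -> int:
    return pow(value, key, PRIME)


def dh_intersect(guest_ids, host_ids):
    guest_key, host_key = generate_key(), generate_key()
    # Guest -> host: guest ids encrypted once, order preserved.
    guest_once = [encrypt(hash_to_group(uid), guest_key) for uid in guest_ids]
    # Host -> guest: the same values with the host layer added, same order.
    guest_twice = [encrypt(value, host_key) for value in guest_once]
    # Host -> guest: host ids encrypted once with the host key.
    host_once = [encrypt(hash_to_group(uid), host_key) for uid in host_ids]
    # Guest adds its own layer and compares doubly encrypted values.
    host_twice = {encrypt(value, guest_key) for value in host_once}
    return {uid for uid, value in zip(guest_ids, guest_twice) if value in host_twice}


common_ids = dh_intersect(["u1", "u2", "u3", "u4"], ["u1", "u2", "u3", "u5"])
```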

Here is an illustration of DH intersection with multiple hosts.

Figure 3 (Multi-host DH Intersection)

ECDH Intersection

This mode implements secure intersection based on the elliptic curve Diffie-Hellman scheme. ECDH mode currently uses Curve25519, which offers 128 bits of security with a key size of 256 bits.

Below is an illustration of ECDH intersection. Note that the ECDH method currently supports only the single-host scenario.

Figure 4 (ECDH Intersection)

For details on how to hash a value to the given curve, please refer here.
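To make the ECDH idea concrete, the sketch below uses a pure-Python X25519 scalar multiplication (the Montgomery ladder as specified in RFC 7748) and relies on the fact that scalar multiplication commutes. It is an illustration under loud assumptions: ids are mapped to a u-coordinate by simply reducing a SHA-256 digest modulo the field prime, which is a simplification and not the hash-to-curve procedure FATE uses; the helper names are hypothetical; and the code is not constant-time.

```python
import hashlib
import secrets

P = 2**255 - 19  # field prime of Curve25519
A24 = 121665     # curve constant (486662 - 2) / 4 from RFC 7748


def x25519(k: int, u: int) -> int:
    """Montgomery ladder (RFC 7748): u-coordinate of [k]P from the u-coordinate of P."""
    x1, x2, z2, x3, z3, swap = u, 1, 0, u, 1, 0
    for t in reversed(range(255)):
        bit = (k >> t) & 1
        swap ^= bit
        if swap:
            x2, x3, z2, z3 = x3, x2, z3, z2
        swap = bit
        a, b = (x2 + z2) % P, (x2 - z2) % P
        aa, bb = a * a % P, b * b % P
        e = (aa - bb) % P
        c, d = (x3 + z3) % P, (x3 - z3) % P
        da, cb = d * a % P, c * b % P
        x3 = (da + cb) ** 2 % P
        z3 = x1 * (da - cb) ** 2 % P
        x2 = aa * bb % P
        z2 = e * (aa + A24 * e) % P
    if swap:
        x2, x3, z2, z3 = x3, x2, z3, z2
    return x2 * pow(z2, P - 2, P) % P


def generate_key() -> int:
    """Random clamped scalar: multiple of 8 with bit 254 set."""
    k = secrets.randbits(255)
    k &= ~7
    k |= 1 << 254
    return k


def hash_to_u(uid: str) -> int:
    """Simplified id-to-coordinate mapping (assumption, not FATE's hash-to-curve)."""
    return int.from_bytes(hashlib.sha256(uid.encode("utf-8")).digest(), "little") % P


def ecdh_intersect(guest_ids, host_ids):
    guest_key, host_key = generate_key(), generate_key()
    guest_once = [x25519(guest_key, hash_to_u(uid)) for uid in guest_ids]
    guest_twice = [x25519(host_key, value) for value in guest_once]  # done by host
    host_once = [x25519(host_key, hash_to_u(uid)) for uid in host_ids]
    host_twice = {x25519(guest_key, value) for value in host_once}   # done by guest
    return {uid for uid, value in zip(guest_ids, guest_twice) if value in host_twice}


common_ids = ecdh_intersect(["u1", "u2", "u3", "u4"], ["u1", "u2", "u3", "u5"])
```

The message flow mirrors the DH mode; only the group operation changes from modular exponentiation to elliptic-curve scalar multiplication.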

Intersection With Cache

Intersection may be conducted in online/offline phases. Both RSA and DH Intersection support cache.

Multi-Host Intersection

RSA, RAW, and DH intersection support the multi-host scenario, in which a guest can perform intersection with more than one host simultaneously and obtain the common ids among all participants.

Figure 5 (Multi-host Intersection)

Refer to figure 5 for a demonstration of one guest running intersection with two hosts; the same process applies to cases with more than two hosts. First, guest runs intersection with each host and gets the respective overlapping ids. Then, guest finds the common ids across all intersection results. Optionally, guest sends the common ids to every host.
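The multi-host steps can be sketched as pairwise intersections followed by one final intersection on the guest side. In this illustration `pairwise_intersect` is a hypothetical stand-in for any of the two-party protocols; a plain set intersection is used so the example runs on its own.

```python
def multi_host_intersect(guest_ids, hosts_ids, pairwise_intersect):
    """Intersect guest ids with every host, then keep ids common to all results."""
    # Step 1: guest runs a two-party intersection with each host.
    per_host = [set(pairwise_intersect(guest_ids, host_ids)) for host_ids in hosts_ids]
    # Step 2: guest finds the ids present in every pairwise result.
    common = set.intersection(*per_host) if per_host else set()
    # Step 3 (optional): guest would send `common` to every host.
    return common


def plain_intersect(ids_a, ids_b):
    """Placeholder for a secure two-party protocol (assumption for the demo)."""
    return set(ids_a) & set(ids_b)


common_ids = multi_host_intersect(
    ["u1", "u2", "u3", "u4"],
    [["u1", "u2", "u3", "u5"], ["u2", "u3", "u6"]],
    plain_intersect,
)
```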

Match ID (Repeated ID) Intersection

Starting with ver 1.7, it is recommended to assign a random sid to uploaded data. The Intersection module then automatically checks for and processes data with instance ID.

Note that parameters for the original repeated ID process, such as repeated_id_process, are deprecated in ver 1.7. Set sample_id_generator to the role whose sid is to be kept. For instance, when sample_id_generator is set to Guest (default), Guest's data is

sid, id, value
123, alice, 2
125, alice, 3
130, bob, 4

Host's data is

sid, id, value
210, alice, 5
232, alice, 5
212, bob, 4

After intersection, Guest gets the intersection results:

sid, id, value
123, alice, 2
125, alice, 3
130, bob, 4

And for Host:

sid, id, value
123, alice, 5
125, alice, 5
130, bob, 4
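The example above can be reproduced with a small sketch in which the guest acts as sample_id_generator: every guest record whose id also appears on the host side is kept under the guest's sid, and the host output reuses those sids with the host's own values. The function name is hypothetical, and taking the first host record per id is an assumption (the sample data does not disambiguate, since both host records for alice carry the same value).

```python
def match_id_intersect(guest_rows, host_rows):
    """Rows are (sid, id, value) tuples; guest sids are kept in both outputs."""
    host_value_by_id = {}
    for _sid, match_id, value in host_rows:
        # Assumption: the first host record seen for an id supplies the value.
        host_value_by_id.setdefault(match_id, value)

    guest_out = [row for row in guest_rows if row[1] in host_value_by_id]
    host_out = [
        (sid, match_id, host_value_by_id[match_id])
        for sid, match_id, _value in guest_rows
        if match_id in host_value_by_id
    ]
    return guest_out, host_out


guest_rows = [(123, "alice", 2), (125, "alice", 3), (130, "bob", 4)]
host_rows = [(210, "alice", 5), (232, "alice", 5), (212, "bob", 4)]
guest_out, host_out = match_id_intersect(guest_rows, host_rows)
```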

Param

intersect_param

Attributes

DEFAULT_RANDOM_BIT = 128 module-attribute

Classes

EncodeParam(salt='', encode_method='none', base64=False)

Bases: BaseParam

Define the hash method for raw intersect method

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| salt | | salt appended to the src id (str = str + salt), default empty string | '' |
| encode_method | | the hash method of src id, supports md5, sha1, sha224, sha256, sha384, sha512, sm3, default 'none' | 'none' |
| base64 | | if True, the hash result is converted to base64, default False | False |
Source code in federatedml/param/intersect_param.py
def __init__(self, salt='', encode_method='none', base64=False):
    super().__init__()
    self.salt = salt
    self.encode_method = encode_method
    self.base64 = base64
Attributes
salt = salt instance-attribute
encode_method = encode_method instance-attribute
base64 = base64 instance-attribute
Functions
check()
Source code in federatedml/param/intersect_param.py
def check(self):
    if type(self.salt).__name__ != "str":
        raise ValueError(
            "encode param's salt {} not supported, should be str type".format(
                self.salt))

    descr = "encode param's "

    self.encode_method = self.check_and_change_lower(self.encode_method,
                                                     ["none", consts.MD5, consts.SHA1, consts.SHA224,
                                                      consts.SHA256, consts.SHA384, consts.SHA512,
                                                      consts.SM3],
                                                     descr)

    if type(self.base64).__name__ != "bool":
        raise ValueError(
            "hash param's base64 {} not supported, should be bool type".format(self.base64))

    LOGGER.debug("Finish EncodeParam check!")
    LOGGER.warning(f"'EncodeParam' will be replaced by 'RAWParam' in future release."
                   f"Please do not rely on current param naming in application.")
    return True
RAWParam(use_hash=False, salt='', hash_method='none', base64=False, join_role=consts.GUEST)

Bases: BaseParam

Specify parameters for raw intersect method

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| use_hash | | whether to hash ids for raw intersect | False |
| salt | | salt appended to the src id (str = str + salt), default empty string | '' |
| hash_method | | the hash method of src id, supports md5, sha1, sha224, sha256, sha384, sha512, sm3, default 'none' | 'none' |
| base64 | | if True, the hash result is converted to base64, default False | False |
| join_role | | role who joins ids, supports "guest" and "host" only and effective only for raw. If it is "guest", the host will send its ids to guest and the intersection is found on the guest side; if it is "host", the guest will send its ids to host. Default "guest" | consts.GUEST |
Source code in federatedml/param/intersect_param.py
def __init__(self, use_hash=False, salt='', hash_method='none', base64=False, join_role=consts.GUEST):
    super().__init__()
    self.use_hash = use_hash
    self.salt = salt
    self.hash_method = hash_method
    self.base64 = base64
    self.join_role = join_role
Attributes
use_hash = use_hash instance-attribute
salt = salt instance-attribute
hash_method = hash_method instance-attribute
base64 = base64 instance-attribute
join_role = join_role instance-attribute
Functions
check()
Source code in federatedml/param/intersect_param.py
def check(self):
    descr = "raw param's "

    self.check_boolean(self.use_hash, f"{descr}use_hash")
    self.check_string(self.salt, f"{descr}salt")

    self.hash_method = self.check_and_change_lower(self.hash_method,
                                                   ["none", consts.MD5, consts.SHA1, consts.SHA224,
                                                    consts.SHA256, consts.SHA384, consts.SHA512,
                                                    consts.SM3],
                                                   f"{descr}hash_method")

    self.check_boolean(self.base64, f"{descr}base_64")
    self.join_role = self.check_and_change_lower(self.join_role, [consts.GUEST, consts.HOST], f"{descr}join_role")

    LOGGER.debug("Finish RAWParam check!")
    return True
RSAParam(salt='', hash_method='sha256', final_hash_method='sha256', split_calculation=False, random_base_fraction=None, key_length=consts.DEFAULT_KEY_LENGTH, random_bit=DEFAULT_RANDOM_BIT)

Bases: BaseParam

Specify parameters for RSA intersect method

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| salt | | the src id will be str = str + salt, default '' | '' |
| hash_method | | the hash method of src id, support sha256, sha384, sha512, sm3, default sha256 | 'sha256' |
| final_hash_method | | the hash method of result data string, support md5, sha1, sha224, sha256, sha384, sha512, sm3, default sha256 | 'sha256' |
| split_calculation | | if True, Host & Guest split operations for faster performance, recommended on large data set | False |
| random_base_fraction | | if not None, generate (fraction * public key id count) of r for encryption and reuse generated r; note that value greater than 0.99 will be taken as 1, and value less than 0.01 will be rounded up to 0.01 | None |
| key_length | | value >= 1024, bit count of rsa key, default 1024 | consts.DEFAULT_KEY_LENGTH |
| random_bit | | it will define the size of blinding factor in rsa algorithm, default 128 | DEFAULT_RANDOM_BIT |
Source code in federatedml/param/intersect_param.py
def __init__(self, salt='', hash_method='sha256', final_hash_method='sha256',
             split_calculation=False, random_base_fraction=None, key_length=consts.DEFAULT_KEY_LENGTH,
             random_bit=DEFAULT_RANDOM_BIT):
    super().__init__()
    self.salt = salt
    self.hash_method = hash_method
    self.final_hash_method = final_hash_method
    self.split_calculation = split_calculation
    self.random_base_fraction = random_base_fraction
    self.key_length = key_length
    self.random_bit = random_bit
Attributes
salt = salt instance-attribute
hash_method = hash_method instance-attribute
final_hash_method = final_hash_method instance-attribute
split_calculation = split_calculation instance-attribute
random_base_fraction = random_base_fraction instance-attribute
key_length = key_length instance-attribute
random_bit = random_bit instance-attribute
Functions
check()
Source code in federatedml/param/intersect_param.py
def check(self):
    descr = "rsa param's "
    self.check_string(self.salt, f"{descr}salt")

    self.hash_method = self.check_and_change_lower(self.hash_method,
                                                   [consts.SHA256, consts.SHA384, consts.SHA512, consts.SM3],
                                                   f"{descr}hash_method")

    self.final_hash_method = self.check_and_change_lower(self.final_hash_method,
                                                         [consts.MD5, consts.SHA1, consts.SHA224,
                                                          consts.SHA256, consts.SHA384, consts.SHA512,
                                                          consts.SM3],
                                                         f"{descr}final_hash_method")

    self.check_boolean(self.split_calculation, f"{descr}split_calculation")

    if self.random_base_fraction:
        self.check_positive_number(self.random_base_fraction, descr)
        self.check_decimal_float(self.random_base_fraction, f"{descr}random_base_fraction")

    self.check_positive_integer(self.key_length, f"{descr}key_length")
    if self.key_length < 1024:
        raise ValueError(f"key length must be >= 1024")
    self.check_positive_integer(self.random_bit, f"{descr}random_bit")

    LOGGER.debug("Finish RSAParam parameter check!")
    return True
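The clamping rule documented for random_base_fraction can be sketched as below. The function names are hypothetical, and two details are assumptions rather than documented behavior: the count is rounded up with a minimum of one, and a value of None is treated as "one fresh r per id".

```python
import math


def effective_fraction(random_base_fraction):
    """Apply the documented clamping: > 0.99 becomes 1, < 0.01 becomes 0.01."""
    if random_base_fraction is None:
        return None
    if random_base_fraction > 0.99:
        return 1.0
    if random_base_fraction < 0.01:
        return 0.01
    return random_base_fraction


def random_base_count(random_base_fraction, id_count):
    """Number of distinct blinding values r to generate for id_count ids."""
    fraction = effective_fraction(random_base_fraction)
    if fraction is None:
        return id_count  # assumption: no reuse, one r per id
    return max(1, math.ceil(fraction * id_count))  # assumption: round up, at least 1


reused = random_base_count(0.5, 1000)
```

Reusing r values trades some of the blinding's independence for speed, since each fresh r costs a modular exponentiation with the public key.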
DHParam(salt='', hash_method='sha256', key_length=consts.DEFAULT_KEY_LENGTH)

Bases: BaseParam

Define the hash method for DH intersect method

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| salt | | the src id will be str = str + salt, default '' | '' |
| hash_method | | the hash method of src id, support none, md5, sha1, sha224, sha256, sha384, sha512, sm3, default sha256 | 'sha256' |
| key_length | | the key length of the commutative cipher p, default 1024 | consts.DEFAULT_KEY_LENGTH |
Source code in federatedml/param/intersect_param.py
def __init__(self, salt='', hash_method='sha256', key_length=consts.DEFAULT_KEY_LENGTH):
    super().__init__()
    self.salt = salt
    self.hash_method = hash_method
    self.key_length = key_length
Attributes
salt = salt instance-attribute
hash_method = hash_method instance-attribute
key_length = key_length instance-attribute
Functions
check()
Source code in federatedml/param/intersect_param.py
def check(self):
    descr = "dh param's "
    self.check_string(self.salt, f"{descr}salt")

    self.hash_method = self.check_and_change_lower(self.hash_method,
                                                   ["none", consts.MD5, consts.SHA1, consts.SHA224,
                                                    consts.SHA256, consts.SHA384, consts.SHA512,
                                                    consts.SM3],
                                                   f"{descr}hash_method")

    self.check_positive_integer(self.key_length, f"{descr}key_length")
    if self.key_length < 1024:
        raise ValueError(f"key length must be >= 1024")

    LOGGER.debug("Finish DHParam parameter check!")
    return True
ECDHParam(salt='', hash_method='sha256', curve=consts.CURVE25519)

Bases: BaseParam

Define the hash method for ECDH intersect method

Parameters:

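| Name | Type | Description | Default |
| --- | --- | --- | --- |
| salt | | the src id will be str = str + salt, default '' | '' |
| hash_method | | the hash method of src id, support sha256, sha384, sha512, sm3, default sha256 | 'sha256' |
| curve | str | the name of curve, currently only supports 'curve25519', which offers 128 bits of security | consts.CURVE25519 |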

Source code in federatedml/param/intersect_param.py
def __init__(self, salt='', hash_method='sha256', curve=consts.CURVE25519):
    super().__init__()
    self.salt = salt
    self.hash_method = hash_method
    self.curve = curve
Attributes
salt = salt instance-attribute
hash_method = hash_method instance-attribute
curve = curve instance-attribute
Functions
check()
Source code in federatedml/param/intersect_param.py
def check(self):
    descr = "ecdh param's "
    self.check_string(self.salt, f"{descr}salt")

    self.hash_method = self.check_and_change_lower(self.hash_method,
                                                   [consts.SHA256, consts.SHA384, consts.SHA512,
                                                    consts.SM3],
                                                   f"{descr}hash_method")

    self.curve = self.check_and_change_lower(self.curve, [consts.CURVE25519], f"{descr}curve")

    LOGGER.debug("Finish ECDHParam parameter check!")
    return True
IntersectCache(use_cache=False, id_type=consts.PHONE, encrypt_type=consts.SHA256)

Bases: BaseParam

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| use_cache | | whether to use cached ids; with ver1.7 and above, this param is ignored | False |
| id_type | | with ver1.7 and above, this param is ignored | consts.PHONE |
| encrypt_type | | with ver1.7 and above, this param is ignored | consts.SHA256 |
Source code in federatedml/param/intersect_param.py
def __init__(self, use_cache=False, id_type=consts.PHONE, encrypt_type=consts.SHA256):
    """

    Parameters
    ----------
    use_cache: bool
        whether to use cached ids; with ver1.7 and above, this param is ignored
    id_type
        with ver1.7 and above, this param is ignored
    encrypt_type
        with ver1.7 and above, this param is ignored
    """
    super().__init__()
    self.use_cache = use_cache
    self.id_type = id_type
    self.encrypt_type = encrypt_type
Attributes
use_cache = use_cache instance-attribute
id_type = id_type instance-attribute
encrypt_type = encrypt_type instance-attribute
Functions
check()
Source code in federatedml/param/intersect_param.py
def check(self):
    descr = "intersect_cache param's "
    # self.check_boolean(self.use_cache, f"{descr}use_cache")

    self.check_and_change_lower(self.id_type,
                                [consts.PHONE, consts.IMEI],
                                f"{descr}id_type")
    self.check_and_change_lower(self.encrypt_type,
                                [consts.MD5, consts.SHA256],
                                f"{descr}encrypt_type")
IntersectPreProcessParam(false_positive_rate=0.001, encrypt_method=consts.RSA, hash_method='sha256', preprocess_method='sha256', preprocess_salt='', random_state=None, filter_owner=consts.GUEST)

Bases: BaseParam

Specify parameters for pre-processing and cardinality-only mode

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| false_positive_rate | | initial target false positive rate when creating Bloom Filter, must be <= 0.5, default 1e-3 | 0.001 |
| encrypt_method | | encrypt method for encrypting id when performing cardinality_only task, supports rsa only, default rsa; specify rsa parameter setting with RSAParam | consts.RSA |
| hash_method | | the hash method for inserting ids, support md5, sha1, sha224, sha256, sha384, sha512, sm3, default sha256 | 'sha256' |
| preprocess_method | | the hash method for encoding ids before insertion into filter, default sha256, only effective for preprocessing | 'sha256' |
| preprocess_salt | | salt to be appended to hash result by preprocess_method before insertion into filter, default '', only effective for preprocessing | '' |
| random_state | | seed for random salt generator when constructing hash functions, salt is appended to hash result by hash_method when performing insertion, default None | None |
| filter_owner | | role that constructs filter, either guest or host, default guest, only effective for preprocessing | consts.GUEST |
Source code in federatedml/param/intersect_param.py
def __init__(self, false_positive_rate=1e-3, encrypt_method=consts.RSA, hash_method='sha256',
             preprocess_method='sha256', preprocess_salt='', random_state=None, filter_owner=consts.GUEST):
    super().__init__()
    self.false_positive_rate = false_positive_rate
    self.encrypt_method = encrypt_method
    self.hash_method = hash_method
    self.preprocess_method = preprocess_method
    self.preprocess_salt = preprocess_salt
    self.random_state = random_state
    self.filter_owner = filter_owner
Attributes
false_positive_rate = false_positive_rate instance-attribute
encrypt_method = encrypt_method instance-attribute
hash_method = hash_method instance-attribute
preprocess_method = preprocess_method instance-attribute
preprocess_salt = preprocess_salt instance-attribute
random_state = random_state instance-attribute
filter_owner = filter_owner instance-attribute
Functions
check()
Source code in federatedml/param/intersect_param.py
def check(self):
    descr = "intersect preprocess param's false_positive_rate "
    self.check_decimal_float(self.false_positive_rate, descr)
    self.check_positive_number(self.false_positive_rate, descr)
    if self.false_positive_rate > 0.5:
        raise ValueError(f"{descr} must be positive float no greater than 0.5")

    descr = "intersect preprocess param's encrypt_method "
    self.encrypt_method = self.check_and_change_lower(self.encrypt_method, [consts.RSA], descr)

    descr = "intersect preprocess param's random_state "
    if self.random_state:
        self.check_nonnegative_number(self.random_state, descr)

    descr = "intersect preprocess param's hash_method "
    self.hash_method = self.check_and_change_lower(self.hash_method,
                                                   [consts.MD5, consts.SHA1, consts.SHA224,
                                                    consts.SHA256, consts.SHA384, consts.SHA512,
                                                    consts.SM3],
                                                   descr)
    descr = "intersect preprocess param's preprocess_salt "
    self.check_string(self.preprocess_salt, descr)

    descr = "intersect preprocess param's preprocess_method "
    self.preprocess_method = self.check_and_change_lower(self.preprocess_method,
                                                         [consts.MD5, consts.SHA1, consts.SHA224,
                                                          consts.SHA256, consts.SHA384, consts.SHA512,
                                                          consts.SM3],
                                                         descr)

    descr = "intersect preprocess param's filter_owner "
    self.filter_owner = self.check_and_change_lower(self.filter_owner,
                                                    [consts.GUEST, consts.HOST],
                                                    descr)

    LOGGER.debug("Finish IntersectPreProcessParam parameter check!")
    return True
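To show how false_positive_rate and random_state could shape a Bloom filter, here is a small self-contained sketch. It uses the standard sizing formulas (bit count m = -n ln p / (ln 2)^2 and hash count k = (m / n) ln 2) and derives each hash function by appending a seeded random salt to the item before hashing. The class and its layout are hypothetical and are not FATE's filter implementation.

```python
import hashlib
import math
import random


class BloomFilter:
    def __init__(self, expected_items, false_positive_rate=1e-3, random_state=None):
        # Standard sizing for a target false positive rate.
        self.bit_count = max(
            1, math.ceil(-expected_items * math.log(false_positive_rate) / math.log(2) ** 2)
        )
        self.hash_count = max(1, round(self.bit_count / expected_items * math.log(2)))
        # One salt per hash function, reproducible through the seed.
        rng = random.Random(random_state)
        self.salts = [str(rng.getrandbits(64)) for _ in range(self.hash_count)]
        self.bits = 0  # Python int used as a bit array

    def _positions(self, item):
        for salt in self.salts:
            digest = hashlib.sha256((item + salt).encode("utf-8")).digest()
            yield int.from_bytes(digest, "big") % self.bit_count

    def insert(self, item):
        for position in self._positions(item):
            self.bits |= 1 << position

    def __contains__(self, item):
        return all((self.bits >> position) & 1 for position in self._positions(item))


bloom = BloomFilter(expected_items=1000, false_positive_rate=1e-3, random_state=42)
for i in range(1000):
    bloom.insert(f"id_{i}")
```

A Bloom filter never reports an inserted item as absent, so the party holding the filter can safely discard ids that test negative; ids that test positive may still be false positives at roughly the configured rate.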
IntersectParam(intersect_method=consts.RSA, random_bit=DEFAULT_RANDOM_BIT, sync_intersect_ids=True, join_role=consts.GUEST, only_output_key=False, with_encode=False, encode_params=EncodeParam(), raw_params=RAWParam(), rsa_params=RSAParam(), dh_params=DHParam(), ecdh_params=ECDHParam(), join_method=consts.INNER_JOIN, new_sample_id=False, sample_id_generator=consts.GUEST, intersect_cache_param=IntersectCache(), run_cache=False, cardinality_only=False, sync_cardinality=False, cardinality_method=consts.ECDH, run_preprocess=False, intersect_preprocess_params=IntersectPreProcessParam(), repeated_id_process=False, repeated_id_owner=consts.GUEST, with_sample_id=False, allow_info_share=False, info_owner=consts.GUEST)

Bases: BaseParam

Define the intersect method

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| intersect_method | str | supports 'rsa', 'raw', 'dh', 'ecdh', default 'rsa' | consts.RSA |
| random_bit | | defines the size of the blinding factor in the rsa algorithm, default 128; note that this param will be deprecated in future, please use random_bit in RSAParam instead | DEFAULT_RANDOM_BIT |
| sync_intersect_ids | | in rsa, True means guest or host will send intersect results to the others, and False means they will not; in raw, True means the role given by "join_role" will send intersect results and the others will receive them. Default True | True |
| join_role | | role who joins ids, supports "guest" and "host" only and effective only for raw. If it is "guest", the host will send its ids to guest and the intersection is found on the guest side; if it is "host", the guest will send its ids to host. Default "guest"; note this param will be deprecated in future version, please use 'join_role' in raw_params instead | consts.GUEST |
| only_output_key | bool | if False, the intersection results include the key and value from input data; if True, they include only the key from input data, and the value is empty or filled with a uniform string like "intersect_id" | False |
| with_encode | | if True, it will use hash method for intersect ids, effective for raw method only; note that this param will be deprecated in future version, please use 'use_hash' in raw_params; currently if this param is set to True, specification by 'encode_params' will be taken instead of 'raw_params' | False |
| encode_params | | effective only when with_encode is True; this param will be deprecated in future version, use 'raw_params' in future implementation | EncodeParam() |
| raw_params | | effective for raw method only | RAWParam() |
| rsa_params | | effective for rsa method only | RSAParam() |
| dh_params | | effective for dh method only | DHParam() |
| ecdh_params | | effective for ecdh method only | ECDHParam() |
| join_method | | if 'left_join', participants will all include sample_id_generator's (imputed) ids in output, default 'inner_join' | consts.INNER_JOIN |
| new_sample_id | bool | whether to generate new id for sample_id_generator's ids, only effective when join_method is 'left_join' or when input data are instance with match id, default False | False |
| sample_id_generator | | role whose ids are to be kept, effective only when join_method is 'left_join' or when input data are instance with match id, default 'guest' | consts.GUEST |
| intersect_cache_param | | specification for cache generation; with ver1.7 and above, this param is ignored | IntersectCache() |
| run_cache | bool | whether to store Host's encrypted ids, only valid when intersect method is 'rsa', 'dh', 'ecdh', default False | False |
| cardinality_only | bool | whether to output estimated intersection count (cardinality); if sync_cardinality is True, then sync cardinality count with host(s) | False |
| cardinality_method | | specifies which intersect method to use for counting cardinality, default "ecdh"; note that "rsa" produces an estimated cardinality, while "dh" and "ecdh" output exact cardinality but support only single-host tasks | consts.ECDH |
| sync_cardinality | bool | whether to sync cardinality with all participants, default False, only effective when cardinality_only is set to True | False |
| run_preprocess | bool | whether to run preprocess process, default False | False |
| intersect_preprocess_params | | used for preprocessing and cardinality_only mode | IntersectPreProcessParam() |
| repeated_id_process | | if True, intersection will process ids that may be repeated; in ver 1.7 and above, repeated id process is automatically applied to data with instance id, and this param is ignored | False |
| repeated_id_owner | | which role has the repeated id; in ver 1.7 and above, this param is ignored | consts.GUEST |
| allow_info_share | bool | in ver 1.7 and above, this param is ignored | False |
| info_owner | | in ver 1.7 and above, this param is ignored | consts.GUEST |
| with_sample_id | | data with sample id or not, default False; in ver 1.7 and above, this param is ignored | False |
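As a hedged illustration of how these parameters fit together, the dictionary below collects a plausible setting for an ECDH intersection. Only parameter names and string values documented above are used; how such a dictionary is handed to a job (configuration file, pipeline component, or otherwise) is not covered here and is left as an assumption.

```python
# Hypothetical parameter set for an ECDH intersection; keys mirror IntersectParam.
intersect_param = {
    "intersect_method": "ecdh",
    "sync_intersect_ids": True,      # share the result with the other party
    "only_output_key": False,        # keep values alongside the matched keys
    "join_method": "inner_join",
    "sample_id_generator": "guest",  # guest's sids are kept for match-id data
    "ecdh_params": {
        "salt": "",
        "hash_method": "sha256",
        "curve": "curve25519",
    },
}
```

Method-specific blocks such as ecdh_params only take effect when intersect_method selects the matching method, so settings for the other methods can be omitted.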
Source code in federatedml/param/intersect_param.py
def __init__(self, intersect_method: str = consts.RSA, random_bit=DEFAULT_RANDOM_BIT, sync_intersect_ids=True,
             join_role=consts.GUEST, only_output_key: bool = False,
             with_encode=False, encode_params=EncodeParam(),
             raw_params=RAWParam(), rsa_params=RSAParam(), dh_params=DHParam(), ecdh_params=ECDHParam(),
             join_method=consts.INNER_JOIN, new_sample_id: bool = False, sample_id_generator=consts.GUEST,
             intersect_cache_param=IntersectCache(), run_cache: bool = False,
             cardinality_only: bool = False, sync_cardinality: bool = False, cardinality_method=consts.ECDH,
             run_preprocess: bool = False,
             intersect_preprocess_params=IntersectPreProcessParam(),
             repeated_id_process=False, repeated_id_owner=consts.GUEST,
             with_sample_id=False, allow_info_share: bool = False, info_owner=consts.GUEST):
    super().__init__()
    self.intersect_method = intersect_method
    self.random_bit = random_bit
    self.sync_intersect_ids = sync_intersect_ids
    self.join_role = join_role
    self.with_encode = with_encode
    self.encode_params = copy.deepcopy(encode_params)
    self.raw_params = copy.deepcopy(raw_params)
    self.rsa_params = copy.deepcopy(rsa_params)
    self.only_output_key = only_output_key
    self.sample_id_generator = sample_id_generator
    self.intersect_cache_param = copy.deepcopy(intersect_cache_param)
    self.run_cache = run_cache
    self.repeated_id_process = repeated_id_process
    self.repeated_id_owner = repeated_id_owner
    self.allow_info_share = allow_info_share
    self.info_owner = info_owner
    self.with_sample_id = with_sample_id
    self.join_method = join_method
    self.new_sample_id = new_sample_id
    self.dh_params = copy.deepcopy(dh_params)
    self.cardinality_only = cardinality_only
    self.sync_cardinality = sync_cardinality
    self.cardinality_method = cardinality_method
    self.run_preprocess = run_preprocess
    self.intersect_preprocess_params = copy.deepcopy(intersect_preprocess_params)
    self.ecdh_params = copy.deepcopy(ecdh_params)
Attributes
intersect_method = intersect_method instance-attribute
random_bit = random_bit instance-attribute
sync_intersect_ids = sync_intersect_ids instance-attribute
join_role = join_role instance-attribute
with_encode = with_encode instance-attribute
encode_params = copy.deepcopy(encode_params) instance-attribute
raw_params = copy.deepcopy(raw_params) instance-attribute
rsa_params = copy.deepcopy(rsa_params) instance-attribute
only_output_key = only_output_key instance-attribute
sample_id_generator = sample_id_generator instance-attribute
intersect_cache_param = copy.deepcopy(intersect_cache_param) instance-attribute
run_cache = run_cache instance-attribute
repeated_id_process = repeated_id_process instance-attribute
repeated_id_owner = repeated_id_owner instance-attribute
allow_info_share = allow_info_share instance-attribute
info_owner = info_owner instance-attribute
with_sample_id = with_sample_id instance-attribute
join_method = join_method instance-attribute
new_sample_id = new_sample_id instance-attribute
dh_params = copy.deepcopy(dh_params) instance-attribute
cardinality_only = cardinality_only instance-attribute
sync_cardinality = sync_cardinality instance-attribute
cardinality_method = cardinality_method instance-attribute
run_preprocess = run_preprocess instance-attribute
intersect_preprocess_params = copy.deepcopy(intersect_preprocess_params) instance-attribute
ecdh_params = copy.deepcopy(ecdh_params) instance-attribute
Functions
check()
Source code in federatedml/param/intersect_param.py
def check(self):
    descr = "intersect param's "

    self.intersect_method = self.check_and_change_lower(self.intersect_method,
                                                        [consts.RSA, consts.RAW, consts.DH, consts.ECDH],
                                                        f"{descr}intersect_method")

    if self._warn_to_deprecate_param("random_bit", descr, "rsa_params' 'random_bit'"):
        if "rsa_params.random_bit" in self.get_user_feeded():
            raise ValueError(f"random_bit and rsa_params.random_bit should not be set simultaneously")
        self.rsa_params.random_bit = self.random_bit

    self.check_boolean(self.sync_intersect_ids, f"{descr}intersect_ids")

    if self._warn_to_deprecate_param("encode_param", "", ""):
        if "raw_params" in self.get_user_feeded():
            raise ValueError(f"encode_param and raw_params should not be set simultaneously")
        else:
            self.callback_param.callbacks = ["PerformanceEvaluate"]

    if self._warn_to_deprecate_param("join_role", descr, "raw_params' 'join_role'"):
        if "raw_params.join_role" in self.get_user_feeded():
            raise ValueError(f"join_role and raw_params.join_role should not be set simultaneously")
        self.raw_params.join_role = self.join_role

    self.check_boolean(self.only_output_key, f"{descr}only_output_key")

    self.join_method = self.check_and_change_lower(self.join_method, [consts.INNER_JOIN, consts.LEFT_JOIN],
                                                   f"{descr}join_method")
    self.check_boolean(self.new_sample_id, f"{descr}new_sample_id")
    self.sample_id_generator = self.check_and_change_lower(self.sample_id_generator,
                                                           [consts.GUEST, consts.HOST],
                                                           f"{descr}sample_id_generator")

    if self.join_method == consts.LEFT_JOIN:
        if not self.sync_intersect_ids:
            raise ValueError(f"Cannot perform left join without sync intersect ids")

    self.check_boolean(self.run_cache, f"{descr} run_cache")

    if self._warn_to_deprecate_param("encode_params", descr, "raw_params") or \
            self._warn_to_deprecate_param("with_encode", descr, "raw_params' 'use_hash'"):
        # self.encode_params.check()
        if "with_encode" in self.get_user_feeded() and "raw_params.use_hash" in self.get_user_feeded():
            raise ValueError(f"'raw_params' and 'encode_params' should not be set simultaneously.")
        if "raw_params" in self.get_user_feeded() and "encode_params" in self.get_user_feeded():
            raise ValueError(f"'raw_params' and 'encode_params' should not be set simultaneously.")
        LOGGER.warning(f"Param values from 'encode_params' will override 'raw_params' settings.")
        self.raw_params.use_hash = self.with_encode
        self.raw_params.hash_method = self.encode_params.encode_method
        self.raw_params.salt = self.encode_params.salt
        self.raw_params.base64 = self.encode_params.base64

    self.raw_params.check()
    self.rsa_params.check()
    self.dh_params.check()
    self.ecdh_params.check()
    self.check_boolean(self.cardinality_only, f"{descr}cardinality_only")
    self.check_boolean(self.sync_cardinality, f"{descr}sync_cardinality")
    self.check_boolean(self.run_preprocess, f"{descr}run_preprocess")
    self.intersect_preprocess_params.check()
    if self.cardinality_only:
        if self.cardinality_method not in [consts.RSA, consts.DH, consts.ECDH]:
            raise ValueError(f"cardinality-only mode only support rsa, dh, ecdh.")
        if self.cardinality_method == consts.RSA and self.rsa_params.split_calculation:
            raise ValueError(f"cardinality-only mode only supports unified calculation.")
    if self.run_preprocess:
        if self.intersect_preprocess_params.false_positive_rate < 0.01:
            raise ValueError(f"for preprocessing ids, false_positive_rate must be no less than 0.01")
        if self.cardinality_only:
            raise ValueError(f"cardinality_only mode cannot run preprocessing.")
    if self.run_cache:
        if self.intersect_method not in [consts.RSA, consts.DH, consts.ECDH]:
            raise ValueError(f"Only rsa, dh, or ecdh method supports cache.")
        if self.intersect_method == consts.RSA and self.rsa_params.split_calculation:
            raise ValueError(f"RSA split_calculation does not support cache.")
        if self.cardinality_only:
            raise ValueError(f"cache is not available for cardinality_only mode.")
        if self.run_preprocess:
            raise ValueError(f"Preprocessing does not support cache.")

    deprecated_param_list = ["repeated_id_process", "repeated_id_owner", "intersect_cache_param",
                             "allow_info_share", "info_owner", "with_sample_id"]
    for param in deprecated_param_list:
        self._warn_deprecated_param(param, descr)

    LOGGER.debug("Finish intersect parameter check!")
    return True
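
The mode-compatibility rules at the end of check() can be tried out in isolation. The following is a minimal standalone sketch, not FATE code: the function name find_mode_conflicts, its plain-argument signature, and its default values are hypothetical. It mirrors only the cardinality_only, run_preprocess, and run_cache branches shown above, and it collects every conflict instead of raising on the first one.

```python
# Hypothetical helper for illustration; it does not import or call FATE.
SECURE_METHODS = {"rsa", "dh", "ecdh"}


def find_mode_conflicts(intersect_method, cardinality_only=False,
                        cardinality_method="ecdh", run_preprocess=False,
                        run_cache=False, split_calculation=False,
                        false_positive_rate=0.1):
    """Return a list of conflict messages; an empty list means the combination passes."""
    conflicts = []
    if cardinality_only:
        if cardinality_method not in SECURE_METHODS:
            conflicts.append("cardinality-only mode only supports rsa, dh, ecdh")
        if cardinality_method == "rsa" and split_calculation:
            conflicts.append("cardinality-only mode only supports unified calculation")
    if run_preprocess:
        if false_positive_rate < 0.01:
            conflicts.append("false_positive_rate must be no less than 0.01")
        if cardinality_only:
            conflicts.append("cardinality_only mode cannot run preprocessing")
    if run_cache:
        if intersect_method not in SECURE_METHODS:
            conflicts.append("only rsa, dh, or ecdh supports cache")
        if intersect_method == "rsa" and split_calculation:
            conflicts.append("RSA split_calculation does not support cache")
        if cardinality_only:
            conflicts.append("cache is not available for cardinality_only mode")
        if run_preprocess:
            conflicts.append("preprocessing does not support cache")
    return conflicts


if __name__ == "__main__":
    # Cache combined with the raw method is rejected by the rules above.
    print(find_mode_conflicts("raw", run_cache=True))
```

Collecting all conflicts at once is a design choice of this sketch; check() itself stops at the first ValueError.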


Feature

The table below lists the features of the ECDH, RSA, DH, and RAW intersection methods.

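| Intersect Methods | PSI | Match-ID Support | Multi-Host | Exact-Cardinality | Estimated Cardinality | Preprocessing | Cache |
|---|---|---|---|---|---|---|---|
| ECDH | Yes | Yes | Yes | Yes | No | Yes | Yes |
| RSA | Yes | Yes | Yes | No | Yes | Yes | Yes |
| DH | Yes | Yes | Yes | Yes | No | Yes | Yes |
| RAW | Yes | Yes | Yes | No | No | Yes | No |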

All four methods support:

  1. Automatic match-ID intersection using ID expanding (when the data contains instance ids).
  2. Configurable hashing methods, including sha256, md5, and sm3. The hash operators of RSA intersection can be configured separately; please refer here for more details.
  3. A preprocessing step that pre-filters the Host's data for faster PSI.
  4. Multi-host PSI tasks. The detailed configuration for a multi-host task can be found here.

RSA, DH, ECDH intersection methods also support:

  1. PSI with cache

RAW intersection supports the following extra feature:

  1. base64 encoding, which may be combined with any of the hashing methods.
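
As a rough illustration of what hashing an id with an optional salt and base64 encoding looks like, here is a sketch that uses only Python's standard hashlib and base64 modules. The helper name hash_id and the choice to append the salt to the id before hashing are assumptions made for illustration and may differ from FATE's own implementation. sm3 is left out because its availability in hashlib depends on the underlying OpenSSL build.

```python
import base64
import hashlib


def hash_id(raw_id, method="sha256", salt="", use_base64=False):
    """Hash one id string; hypothetical helper, not part of FATE."""
    if method not in ("sha256", "md5"):
        raise ValueError(f"unsupported hash method in this sketch: {method}")
    # Assumption: the salt is simply appended to the id before hashing.
    digest = hashlib.new(method, (raw_id + salt).encode("utf-8"))
    if use_base64:
        # base64 of the raw digest bytes is shorter than the hex form.
        return base64.b64encode(digest.digest()).decode("ascii")
    return digest.hexdigest()


if __name__ == "__main__":
    # Both parties must use the same method, salt, and encoding,
    # otherwise identical ids will not produce identical hashes.
    print(hash_id("u1", method="sha256", salt="shared-salt"))
    print(hash_id("u1", method="md5", use_base64=True))
```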

Cardinality Computation:

  1. Setting cardinality_method to rsa produces an estimated intersection cardinality.
  2. Setting cardinality_method to dh or ecdh computes the exact intersection cardinality.
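
The two cardinality options can be summarized in a small sketch. The parameter names cardinality_only, cardinality_method, and sync_cardinality are the ones validated in check(). The flat dict layout and the helper cardinality_kind are hypothetical, and where these keys sit inside an actual job configuration is not shown here.

```python
# Hypothetical parameter fragment; only the key names come from check().
cardinality_params = {
    "cardinality_only": True,
    "cardinality_method": "ecdh",
    "sync_cardinality": True,
}


def cardinality_kind(cardinality_method):
    """Map a cardinality method to the kind of result described above."""
    method = cardinality_method.lower()
    if method == "rsa":
        return "estimated"
    if method in ("dh", "ecdh"):
        return "exact"
    raise ValueError("cardinality-only mode only supports rsa, dh, ecdh")


if __name__ == "__main__":
    print(cardinality_kind(cardinality_params["cardinality_method"]))
```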


Last updated: 2022-11-21