vicolinho/pprl_autoencoder

Autoencoder evaluation for Bloom filter encryption

Separate encoder model

Setup

Data Owner A ←— parameters —→ Data Owner B  
     ↓                              ↓  
Bloom Filters                 Bloom Filters  
     ↓                              ↓  
train Encoder                 train Encoder  
     ↓                              ↓  
encode Bloom Filters          encode Bloom Filters
     |                              |
      ————→    Linkage Unit    ←————
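
The README does not say how the Bloom filters are built or which parameters the two data owners exchange. As an illustration only, the sketch below assumes a common q-gram scheme: each value is split into padded bigrams, and each bigram sets k of m bits. The bit positions are chosen by double hashing with keyed hashes (HMAC) under a secret key that both owners share. The names qgrams, bloom_filter and dice, and the parameters key, m, k and q, are hypothetical and do not come from this repository.

```python
import hashlib
import hmac


def qgrams(value, q=2):
    """Split a value into overlapping q-grams, padded with '_' on both sides."""
    padded = "_" * (q - 1) + value.lower() + "_" * (q - 1)
    return [padded[i:i + q] for i in range(len(padded) - q + 1)]


def bloom_filter(value, key, m=1000, k=20, q=2):
    """Encode one value as an m-bit Bloom filter (hypothetical scheme).

    Each q-gram sets the k bit positions (h1 + i * h2) mod m, where h1 and h2
    are keyed hashes (HMAC-SHA256 and HMAC-SHA1) under the owners' shared key.
    """
    bits = [0] * m
    for gram in qgrams(value, q):
        data = gram.encode("utf-8")
        h1 = int.from_bytes(hmac.new(key, data, hashlib.sha256).digest(), "big")
        h2 = int.from_bytes(hmac.new(key, data, hashlib.sha1).digest(), "big")
        for i in range(k):
            bits[(h1 + i * h2) % m] = 1
    return bits


def dice(a, b):
    """Dice coefficient of two bit vectors: 2|A & B| / (|A| + |B|)."""
    common = sum(x & y for x, y in zip(a, b))
    return 2 * common / (sum(a) + sum(b))


# Both owners use the same (hypothetical) parameters agreed in the setup step.
shared = {"key": b"shared-secret", "m": 1000, "k": 20, "q": 2}
bf_a = bloom_filter("smith", **shared)
bf_b = bloom_filter("smyth", **shared)
print(round(dice(bf_a, bf_b), 2))
```

Because both owners use identical parameters, equal values produce identical filters, and values that share q-grams produce overlapping bit patterns. Each owner's autoencoder is then trained on filters of this kind.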

Linkage Mapper Data Generation

                Data Owner A  ←— b_decode(D) ——  Data Owner B  
                      |                              ↑
a_encode(b_decode(D)) |                              | random Data D
                       ————→    Linkage Unit    —————
                                     ↓
                      pairs (d, a_encode(b_decode(d)))
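
The pair generation above can be sketched in plain Python. This sketch makes two assumptions. First, "normalize" is taken to mean per-dimension z-scoring. Second, B's transformation "to fit the output distribution of its encoder" is taken to be the inverse z-score, computed with B's own means and standard deviations. The functions b_decode and a_encode are hypothetical stand-ins for the trained decoder and encoder networks, and all function names here are illustrative.

```python
import random
import statistics


def fit_normalizer(encodings):
    """Per-dimension mean and standard deviation of a set of encodings."""
    columns = list(zip(*encodings))
    means = [statistics.fmean(col) for col in columns]
    stds = [statistics.pstdev(col) or 1.0 for col in columns]  # guard constant dims
    return means, stds


def normalize(x, means, stds):
    """z-score each dimension (assumed normalization)."""
    return [(v - mu) / sd for v, mu, sd in zip(x, means, stds)]


def denormalize(z, means, stds):
    """Inverse z-score: map a standard-normal point into raw encoder-output range."""
    return [v * sd + mu for v, mu, sd in zip(z, means, stds)]


def generate_mapper_pairs(n_samples, dim, b_stats, b_decode, a_stats, a_encode, seed=0):
    """Return training pairs (d, normalized a_encode(b_decode(d)))."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n_samples):
        # Linkage unit: draw d ~ N(0, I_n) and send it to data owner B.
        d = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        # Data owner B: move d into the range of its raw encoder outputs, then decode.
        decoded = b_decode(denormalize(d, *b_stats))
        # Data owner A: encode B's decoder output and normalize with A's statistics.
        target = normalize(a_encode(decoded), *a_stats)
        # Linkage unit: store the pair for mapper training.
        pairs.append((d, target))
    return pairs
```

In this flow, the linkage unit only ever sees the random points d and A's normalized encoder outputs. B's decoder outputs pass from B to A, as in the diagram.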

Running run_all_configs.py -cdir <config_directory> in /src/ does the following for each configuration file in /src/<config_directory>/:

  • build two autoencoders with the same architecture (specified in the configuration file), one for each data owner A and B, and fit each one on its owner's set of Bloom filters
  • encode both datasets with the fitted encoders and normalize the encoded data
  • generate training data for a mapper between the two encodings, as follows:
      ◦ the linkage unit samples a random dataset D from an n-dimensional standard normal distribution (n being the dimension of the encodings) and sends it to data owner B
      ◦ B transforms the data points to fit the output distribution of its encoder and feeds them into its decoder network
      ◦ the decoder outputs are sent to A, which feeds them into its encoder network, normalizes the encoder outputs and sends them to the linkage unit
      ◦ the linkage unit trains a mapper model on the resulting pairs of data points (d, a_encode(b_decode(d)))
  • link the datasets by applying the mapper to B's encoded data points and searching for the nearest neighbor of each output in A's encoded data. If the distance is below a threshold (specified in the configuration file), the two data points are considered a match.
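
The mapper training and linkage steps can be sketched as follows. The README does not specify the mapper architecture or the distance measure. This sketch therefore uses a linear map fitted by gradient descent on mean squared error as a simple stand-in for the mapper model. It also assumes Euclidean distance with a brute-force nearest-neighbor search. The names fit_linear_mapper and link, and their parameters, are hypothetical.

```python
import math
import random


def fit_linear_mapper(pairs, epochs=500, lr=0.05):
    """Fit y ~ W x + b by full-batch gradient descent on mean squared error
    (the constant factor 2 of the gradient is absorbed into lr).
    Stand-in for the mapper model; the real one is configured per run."""
    n_in, n_out = len(pairs[0][0]), len(pairs[0][1])
    W = [[0.0] * n_in for _ in range(n_out)]
    b = [0.0] * n_out
    for _ in range(epochs):
        gW = [[0.0] * n_in for _ in range(n_out)]
        gb = [0.0] * n_out
        for x, y in pairs:
            for i in range(n_out):
                err = sum(W[i][j] * x[j] for j in range(n_in)) + b[i] - y[i]
                gb[i] += err
                for j in range(n_in):
                    gW[i][j] += err * x[j]
        for i in range(n_out):
            b[i] -= lr * gb[i] / len(pairs)
            for j in range(n_in):
                W[i][j] -= lr * gW[i][j] / len(pairs)
    return lambda x: [sum(W[i][j] * x[j] for j in range(n_in)) + b[i] for i in range(n_out)]


def link(b_encoded, a_encoded, mapper, threshold):
    """Map each of B's encodings into A's space, find the nearest of A's
    encodings (Euclidean, brute force) and keep it if the distance < threshold.
    Returns (index in A, index in B, distance) triples."""
    matches = []
    for j, z_b in enumerate(b_encoded):
        mapped = mapper(z_b)
        dists = [math.dist(mapped, z_a) for z_a in a_encoded]
        i = min(range(len(a_encoded)), key=dists.__getitem__)
        if dists[i] < threshold:
            matches.append((i, j, dists[i]))
    return matches
```

For realistic dataset sizes, a neural mapper and an approximate nearest-neighbor index could replace these two pieces. The threshold test at the end stays the same.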

For a configuration file /src/<config_directory>/<configname>.json, the linkage results, all models, the generated datasets and the training progress data are stored in /src/<config_directory>/<configname>/.
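
The following is a hypothetical sketch of the command-line handling and output layout described above, not code from run_all_configs.py. It assumes that the -cdir value is a directory path relative to the working directory and that every *.json file in that directory is one configuration. The functions parse_args, output_dir_for and iter_configs are illustrative names.

```python
import argparse
import json
from pathlib import Path


def parse_args(argv=None):
    """Parse the -cdir option (argparse accepts single-dash multi-letter options)."""
    parser = argparse.ArgumentParser(description="Run every configuration in a directory.")
    parser.add_argument("-cdir", dest="config_dir", required=True,
                        help="directory containing the *.json configuration files")
    return parser.parse_args(argv)


def output_dir_for(config_file):
    """<config_directory>/<configname>.json -> <config_directory>/<configname>"""
    return Path(config_file).with_suffix("")


def iter_configs(config_dir):
    """Yield (parsed configuration, output directory) for each *.json file,
    creating the output directory for models, datasets and results."""
    for config_file in sorted(Path(config_dir).glob("*.json")):
        settings = json.loads(config_file.read_text())
        out_dir = output_dir_for(config_file)
        out_dir.mkdir(parents=True, exist_ok=True)
        yield settings, out_dir
```

Deriving the output directory from the configuration file name keeps the results of each configuration next to the file that produced them.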
