GitHub - vicolinho/pprl_autoencoder

Autoencoder evaluation for Bloom filter encryption

separate encoder model

Setup

Data Owner A ←— parameters —→ Data Owner B  
     ↓                              ↓  
Bloom Filters                 Bloom Filters  
     ↓                              ↓  
train Encoder                 train Encoder  
     ↓                              ↓  
encode Bloom Filters          encode Bloom Filters
     |                              |
      ————→    Linkage Unit    ←————

Linkage Mapper Data Generation

                Data Owner A  ←— b_decode(D) ——  Data Owner B  
                      |                              ↑
a_encode(b_decode(D)) |                              | random Data D
                       ————→    Linkage Unit    —————
                                     ↓
                      pairs (d, a_encode(b_decode(d)))
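
The data flow in the diagram above can be sketched as follows. This is a minimal illustration and not the repository's code: `generate_mapper_pairs`, `standardize` and `destandardize` are hypothetical names, `b_decode` and `a_encode` stand in for the trained decoder of B and encoder of A, and normalization is assumed to be per-dimension standardization with means and standard deviations that each owner computed on its own encodings.

```python
import random


def standardize(rows, means, stds):
    """Shift and scale each dimension to zero mean and unit variance."""
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]


def destandardize(rows, means, stds):
    """Inverse of standardize: map standard-normal data to a given mean/std."""
    return [[x * s + m for x, m, s in zip(row, means, stds)] for row in rows]


def generate_mapper_pairs(n_samples, dim, b_decode, a_encode, b_stats, a_stats, seed=0):
    """Return training pairs (d, a_encode(b_decode(d))) for the mapper."""
    rng = random.Random(seed)
    # Linkage unit: sample D from an n-dimensional standard normal distribution.
    d = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_samples)]
    # Data owner B: rescale D to B's encoder output distribution, then decode.
    decoded = [b_decode(v) for v in destandardize(d, *b_stats)]
    # Data owner A: encode the decoded vectors and normalize the result.
    a_codes = standardize([a_encode(v) for v in decoded], *a_stats)
    # Linkage unit: pair each random point with its image in A's encoding.
    return list(zip(d, a_codes))


# Toy stand-ins for the trained networks (assumptions, for illustration only).
def b_decode(v):
    return [2.0 * x for x in v]


def a_encode(v):
    return [0.5 * x + 1.0 for x in v]


b_stats = ([0.0] * 3, [1.0] * 3)  # (means, stds) of B's encodings
a_stats = ([1.0] * 3, [1.0] * 3)  # (means, stds) of A's encodings
pairs = generate_mapper_pairs(5, 3, b_decode, a_encode, b_stats, a_stats)
```

With these toy functions, `a_encode(b_decode(d))` followed by A's normalization reproduces `d`, so the two halves of each pair coincide. With real networks they differ, and that difference is what the mapper has to learn.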

Running run_all_configs.py -cdir <config_directory> in /src/ does the following for each configuration file in /src/<config_directory>/:

1. Build two autoencoders of the same structure (specified in the configuration file) for the two data owners A and B, and fit each on its owner's set of Bloom filters.
2. Encode the two datasets using the fitted encoders and normalize the encoded data.
3. Generate training data for a mapper between the two encodings. This is done as follows:
   - A random dataset is sampled from an n-dimensional standard normal distribution in the linkage unit (n being the dimension of the encodings) and sent to data owner B.
   - The datapoints are transformed to fit the output distribution of B's encoder and fed into B's decoder network.
   - The decoder outputs are sent to A and fed into A's encoder network; the encoder outputs are normalized and sent to the linkage unit.
4. The linkage unit trains a mapper model on the obtained pairs of datapoints.
5. The datasets are linked by applying the mapper to B's encoded datapoints and searching for the nearest neighbor of each output in A's encoded data. If the distance is below a threshold (specified in the configuration file), the two datapoints are considered a match.
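
The final linkage step can be illustrated with a brute-force nearest-neighbor search. This is only a sketch under assumptions: `link` is a hypothetical helper, Euclidean distance is assumed as the metric, and `mapped_b` stands for B's encoded datapoints after the mapper has been applied. The repository may use a different metric or search structure.

```python
import math


def link(mapped_b, encoded_a, threshold):
    """Match each mapped B vector to its nearest A vector if it is close enough.

    Returns a list of (b_index, a_index, distance) tuples.
    """
    matches = []
    for b_idx, vb in enumerate(mapped_b):
        # Nearest neighbor of the mapped B vector among A's encodings.
        a_idx, dist = min(
            ((i, math.dist(vb, va)) for i, va in enumerate(encoded_a)),
            key=lambda pair: pair[1],
        )
        # Only distances below the configured threshold count as a match.
        if dist < threshold:
            matches.append((b_idx, a_idx, dist))
    return matches


encoded_a = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
mapped_b = [[0.1, 0.0], [3.0, 3.0]]
matches = link(mapped_b, encoded_a, threshold=0.5)
```

In this toy input the first B vector lies within the threshold of the first A vector and is matched. The second is too far from every A vector and stays unmatched.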

The linkage results, as well as all models, generated datasets, and training progress data for a configuration file /src/<config_directory>/<configname>.json, are stored in /src/<config_directory>/<configname>/.
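
The naming convention for the output directory can be expressed in a few lines. The helpers `result_dir` and `list_configs` are hypothetical and only illustrate the convention; they are not taken from the repository.

```python
from pathlib import Path


def result_dir(config_file):
    """Map <config_directory>/<configname>.json to <config_directory>/<configname>."""
    return Path(config_file).with_suffix("")


def list_configs(config_directory):
    """Return all .json configuration files in a directory, sorted by name."""
    return sorted(Path(config_directory).glob("*.json"))


out = result_dir("configs/experiment1.json")
```

Here `out` is the path `configs/experiment1`, where `configs` and `experiment1` are placeholder names.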

