Feature request: readpst should produce identical output for identical mails (?switch-controlled? boundary behavior) #9

@fhanzlik

Description

My current task: extract the mails from several (10+) .PST files (all from one account, collected as backups over the past 15 years or so), remove the duplicates, and convert the mails into a Maildir structure.
My idea was to extract the individual messages from these .PST files (using libpst/readpst) into separate trees, then delete the duplicates (using e.g. fdupes), and then merge the results.

In practice (apart from the problem of a different number of files being extracted when the same .pst file is processed repeatedly, which issue #7 touches on), I ran into a problem detecting identical/duplicate messages, because readpst currently generates the internal message boundaries as random strings. Thus even identical messages do not appear identical:

$ diff /home/mail/outlook-r2020/archive.pst.mdi/.Doručená\ pošta/cur/1681064600.005298:2,S /home/mail/outlook-r2023/outlook.pst.mdi/.Doručená\ pošta/cur/1681059416.005051:2,S 
38c38
<       boundary="--boundary-LibPST-iamunique-1906170776_-_-"
---
>       boundary="--boundary-LibPST-iamunique-1627685354_-_-"  
41c41
< ----boundary-LibPST-iamunique-1906170776_-_-
---
> ----boundary-LibPST-iamunique-1627685354_-_-  
112782c112782
< ----boundary-LibPST-iamunique-1906170776_-_-
---
> ----boundary-LibPST-iamunique-1627685354_-_-  

Perhaps it should be possible (via some switch controlling this behavior) to generate boundary strings that are predictable and the same in all mails, so that identical mails would also be represented by identical message files (in terms of content, not file names).
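
Until such a switch exists, a possible workaround is to compare the messages after replacing the random boundary with a fixed placeholder. The following is only a rough sketch in Python, not part of libpst: the boundary pattern is inferred from the diff above, the names `normalized_digest` and `find_duplicates` are made up for illustration, and it assumes the boundary is the only thing that differs between two extractions of the same mail.

```python
import hashlib
import re
from collections import defaultdict
from pathlib import Path

# Pattern inferred from the diff output above (an assumption, not taken from
# the libpst sources); the run of digits is the random part of the boundary.
BOUNDARY_RE = re.compile(rb"--boundary-LibPST-iamunique-\d+_-_-")
PLACEHOLDER = b"--boundary-LibPST-iamunique-0_-_-"


def normalized_digest(raw: bytes) -> str:
    """Hash a message with every LibPST boundary replaced by a fixed string."""
    return hashlib.sha256(BOUNDARY_RE.sub(PLACEHOLDER, raw)).hexdigest()


def find_duplicates(roots):
    """Group the files under the given trees by their normalized digest.

    Returns a list of groups; each group is a list of paths whose content
    is identical once the boundary strings are normalized.
    """
    groups = defaultdict(list)
    for root in roots:
        for path in sorted(Path(root).rglob("*")):
            if path.is_file():
                groups[normalized_digest(path.read_bytes())].append(path)
    return [paths for paths in groups.values() if len(paths) > 1]
```

It could be called with the extracted trees, e.g. `find_duplicates(["/home/mail/outlook-r2020", "/home/mail/outlook-r2023"])`, and every group in the result would contain candidates for deletion. The files on disk are not modified, only the in-memory copies used for hashing, so the remaining message files keep whatever boundary readpst generated.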

Thanks in advance, Franta Hanzlík
