Description
I now have the following task: extract mail from several (10+) .PST files (all from one account, collected as backups over roughly the past 15 years), remove duplicates, and convert the messages into a Maildir structure.
My idea was to extract the individual messages from these .PST files (using libpst/readpst) into separate trees, then delete duplicates (using e.g. fdupes), and finally merge the results.
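The duplicate-deletion step can be approximated in Python. This is a minimal sketch that assumes the extracted Maildir trees are ordinary directories. The function names `file_digest` and `find_duplicates` are illustrative and belong to no existing tool. fdupes uses its own staged comparison, but this sketch simply hashes every file's raw bytes with SHA-256 and groups files with equal digests.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path: str, chunk_size: int = 1 << 16) -> str:
    """Return the SHA-256 hex digest of a file's raw bytes."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in chunks so large messages with attachments are not loaded at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(roots: list[str]) -> list[list[str]]:
    """Group files under the given directory trees by identical content.

    Only groups with two or more members (byte-identical copies) are returned.
    """
    by_digest: dict[str, list[str]] = defaultdict(list)
    for root in roots:
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                path = os.path.join(dirpath, name)
                by_digest[file_digest(path)].append(path)
    return [sorted(paths) for paths in by_digest.values() if len(paths) > 1]
```

Like fdupes, this only matches byte-identical files. That is exactly the comparison that breaks once readpst writes different boundary strings into otherwise identical messages.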
In practice I ran into a problem detecting identical (duplicate) messages, apart from the separate problem that processing the same .pst file repeatedly yields a different number of extracted files (issue #7 touches on that). readpst now generates the internal MIME boundary strings randomly, so even identical messages do not compare as identical:
$ diff /home/mail/outlook-r2020/archive.pst.mdi/.Doručená\ pošta/cur/1681064600.005298:2,S /home/mail/outlook-r2023/outlook.pst.mdi/.Doručená\ pošta/cur/1681059416.005051:2,S
38c38
< boundary="--boundary-LibPST-iamunique-1906170776_-_-"
---
> boundary="--boundary-LibPST-iamunique-1627685354_-_-"
41c41
< ----boundary-LibPST-iamunique-1906170776_-_-
---
> ----boundary-LibPST-iamunique-1627685354_-_-
112782c112782
< ----boundary-LibPST-iamunique-1906170776_-_-
---
> ----boundary-LibPST-iamunique-1627685354_-_-
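Until readpst offers such an option, one workaround is to normalize the boundaries before comparing. The sketch below assumes the boundary format seen in the diff above, `--boundary-LibPST-iamunique-<digits>_-_-`. Other libpst versions may use a different pattern, so the regex is an assumption. Each distinct boundary is renamed in order of first appearance, so nested multipart levels stay distinct. The message is then hashed. The names `normalize_boundaries` and `normalized_digest` are hypothetical helpers.

```python
import hashlib
import re

# Boundary token as it appears in readpst output, e.g.
# "--boundary-LibPST-iamunique-1906170776_-_-" (format taken from the diff).
LIBPST_BOUNDARY = re.compile(rb"--boundary-LibPST-iamunique-\d+_-_-")


def normalize_boundaries(message: bytes) -> bytes:
    """Replace each distinct random libpst boundary with a stable, numbered token."""
    mapping: dict[bytes, bytes] = {}

    def repl(match):
        token = match.group(0)
        if token not in mapping:
            # Number boundaries by first appearance so nesting is preserved.
            mapping[token] = b"--boundary-LibPST-normalized-%d_-_-" % len(mapping)
        return mapping[token]

    return LIBPST_BOUNDARY.sub(repl, message)


def normalized_digest(message: bytes) -> str:
    """SHA-256 of the message with its libpst boundaries normalized."""
    return hashlib.sha256(normalize_boundaries(message)).hexdigest()
```

With this normalization, the two files from the diff would hash to the same value, provided they differ only in boundary strings. Grouping files by `normalized_digest` instead of a raw-byte hash then finds the duplicates that fdupes misses.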
Perhaps it could be made possible (via some switch) to generate predictable boundary strings that are the same in all mails, so that identical mails would also be represented by identical message files (in terms of content, not file names).
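A simpler alternative to a single fixed string is to derive each boundary from the message itself. This is only a Python sketch of the idea, not libpst code. The function `deterministic_boundary` is hypothetical, and no such readpst switch is assumed to exist. It also assumes the caller supplies a stable per-message key, such as the Message-ID header or a hash of the headers and body parts. An `index` value distinguishes nested multipart levels.

```python
import hashlib


def deterministic_boundary(message_key: bytes, index: int = 0) -> str:
    """Derive a repeatable MIME boundary from stable message data (hypothetical).

    message_key: bytes that identify the message, e.g. its Message-ID.
    index: distinguishes nested multipart levels within one message.
    """
    digest = hashlib.sha256(message_key + b"\0" + str(index).encode()).digest()
    # Keep the "iamunique-<number>_-_-" shape seen in readpst output.
    number = int.from_bytes(digest[:8], "big")
    return f"--boundary-LibPST-iamunique-{number}_-_-"
```

Re-extracting the same message would then always produce the same boundary, so byte-level tools like fdupes could detect the copies. As with random boundaries, the generated string must still not occur inside the message content.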
Thanks in advance, Franta Hanzlík