We'll show you how to use a state-of-the-art NLP pipeline to unredact (inaccurately) the Mueller Report at two Portland meetups on May 29 and 30:
- Wed May 29 at Vacasa: video, code, slides, meetup
- Thurs May 30 at New Relic: video, code, slides, meetup
Many of the techniques we use are explained in detail in Natural Language Processing in Action.
At the Portland Python User Group on May 30, 2019, we'll show you how to use a simple RNN to predict the next words or sentences in any document. We'll then show how it can compose reasonable guesses for the text behind the redactions (black boxes) in the Mueller Report. Finally, we'll show you how to use transfer learning in Keras with the state-of-the-art BERT language model to bring in information from outside the report text. This improves the accuracy of the generated text, but only for short redactions.
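To make the two framings concrete, here is a minimal sketch of how a redacted sentence might be prepared for each kind of model. It is not the code from the talks. The `█` redaction marker, the example sentence, the helper names, and the six-characters-per-token estimate are all assumptions made for illustration. Only `[MASK]` is taken from BERT's standard vocabulary.

```python
import re

# Assumed marker: a run of block characters stands in for one black box.
REDACTION = re.compile(r"█+")


def split_on_redaction(text):
    """Return (left context, box width in characters, right context)
    for the first redaction found in the text."""
    match = REDACTION.search(text)
    if match is None:
        raise ValueError("no redaction marker found")
    left = text[:match.start()].strip()
    right = text[match.end():].strip()
    return left, len(match.group()), right


def estimate_token_count(width, chars_per_token=6):
    """Guess how many tokens the box hides.
    The 6 characters per token figure is an assumption, not a measurement."""
    return max(1, round(width / chars_per_token))


def next_word_prompt(text):
    """Left-to-right framing (RNN style): the model sees only the text
    before the box and generates forward from there."""
    left, _, _ = split_on_redaction(text)
    return left


def masked_input(text, mask_token="[MASK]"):
    """Masked framing (BERT style): the model sees both sides of the box
    and fills in one mask token per estimated hidden token."""
    left, width, right = split_on_redaction(text)
    masks = " ".join([mask_token] * estimate_token_count(width))
    return f"{left} {masks} {right}".strip()


# Hypothetical sentence, not taken from the report.
example = "The package was delivered to ████████████ on Monday."

print(next_word_prompt(example))
print(masked_input(example))
```

The masked framing needs a guess at how many tokens are hidden. That guess gets less reliable as the box grows, which is one plausible reason a masked model fits short redactions better than long ones.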
We built this on the shoulders of a lot of good people contributing code and data to a valiant effort to improve US government transparency:
- Open Source Mueller Report (machine-readable LaTeX)
- Factbase's human-reviewed text
- Ian Landis Miller
- Zhao HG's port of BERT to Keras
- Sepehr Sameni's port of BERT to Keras
- Gaden Buie's improved OCR of the PDF
- Manuel Amunategui's RNN for Generating Mueller
- Paul Mooney's OCR of the PDF report
At TensorFlow World 2019, on Oct 28-31, Al Kari, Garrett Lander, Chris, and Hobson Lane from Manceps will present our experience with NLP at scale for long technical documents, such as medical records.
At TensorFlow World 2019, we'll show the results of our investigation into the latest natural language embeddings and their usefulness for search and abstractive summarization:
- Universal Sentence Encoder (USE)
- BERT
- GPT-2
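As a rough illustration of how embeddings support search, the sketch below ranks documents by cosine similarity between a query vector and document vectors. The three-dimensional vectors and document names are made-up stand-ins. In practice the vectors would come from an encoder such as the ones listed above and would have hundreds of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as matching nothing
    return dot / (norm_a * norm_b)


def search(query_vec, doc_vecs, top_k=2):
    """Return the ids of the top_k documents most similar to the query."""
    scored = [(cosine_similarity(query_vec, vec), doc_id)
              for doc_id, vec in doc_vecs.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [doc_id for _, doc_id in scored[:top_k]]


# Hypothetical toy embeddings; real ones come from a trained encoder.
doc_vecs = {
    "discharge_summary": [0.9, 0.1, 0.0],
    "lab_report": [0.1, 0.9, 0.2],
    "billing_notice": [0.0, 0.2, 0.9],
}
query_vec = [0.8, 0.3, 0.1]

print(search(query_vec, doc_vecs))
```

A linear scan like this is fine for a handful of documents. At scale, an approximate nearest neighbor index would usually replace the loop.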
At TensorFlow World 2019, we'll also show how to identify the vulnerabilities of your machine learning models to adversarial attacks:
- Image classification
- Document classification