
Robust Authorship Verification with Transfer Learning

EasyChair Preprint 865

12 pages · Date: March 29, 2019

Abstract

We address the problem of open-set authorship verification, a classification task that consists of attributing texts of unknown authorship to a given author when the testing set may differ significantly from the training set in terms of documents and candidate authors. We present an end-to-end model-building process that is universally applicable to a wide variety of corpora, with little to no modification or fine-tuning. It relies on transfer learning of a deep language model and uses a generative adversarial network and a number of text augmentation techniques to improve the model's generalization ability. The language model encodes documents of known and unknown authorship into a domain-invariant space, aligning document pairs as input to the classifier while keeping them separate. The resulting embeddings are used to train an ensemble of recurrent and quasi-recurrent neural networks. The entire pipeline is bidirectional; forward and backward pass results are averaged. We perform experiments on four traditional authorship verification datasets, a collection of machine learning papers mined from the web, and a large Amazon-Reviews dataset. The proposed approach outperforms baseline and state-of-the-art techniques in these experiments, validating its effectiveness.
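The abstract's final scoring scheme (each ensemble member scores a document pair in both reading directions, and the results are averaged across directions and across members) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the example model names (`rnn`, `qrnn`), the scores, and the decision threshold are all hypothetical.

```python
# Hypothetical sketch of the score averaging described in the abstract:
# average a model's forward- and backward-pass scores, then average
# across ensemble members and threshold to decide "same author".

def bidirectional_score(forward_score: float, backward_score: float) -> float:
    """Average the forward- and backward-pass scores for one model."""
    return (forward_score + backward_score) / 2.0

def ensemble_verify(pair_scores, threshold=0.5):
    """Average bidirectional scores over all ensemble members and
    compare against a decision threshold (same author if above)."""
    per_model = [bidirectional_score(f, b) for f, b in pair_scores.values()]
    score = sum(per_model) / len(per_model)
    return score, score >= threshold

# Illustrative scores for an RNN and a QRNN member of the ensemble.
scores = {"rnn": (0.72, 0.68), "qrnn": (0.61, 0.59)}
print(ensemble_verify(scores))
```

The averaging over directions is what the abstract means by the pipeline being bidirectional; any monotone combination (e.g. a learned weighting) could replace the uniform mean.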

Keyphrases: transfer learning, authorship verification, language modeling

BibTeX entry
BibTeX has no entry type for preprints; the following @booklet entry is a workaround that produces a correct reference:
@booklet{EasyChair:865,
  author    = {Dainis Boumber and Yifan Zhang and Marjan Hosseinia and Arjun Mukherjee and Ricardo Vilalta},
  title     = {Robust Authorship Verification with Transfer Learning},
  doi       = {10.29007/9nf3},
  howpublished = {EasyChair Preprint 865},
  year      = {2019}}