Abstract | Authorship verification (AV) is an important sub-area of digital text forensics and has been researched for more than two decades. The fundamental question addressed by AV is whether two documents were written by the same person. A serious problem that has received little attention in the literature so far is the question if AV methods actually focus on the writing style during classification, or whether they are unintentionally distorted by the topic of the documents. To counteract this problem, we propose an effective technique called POSNoise, which aims to mask topic-related content in documents. In this way, AV methods are forced to focus on those text units that are more related to the author's writing style. Based on a comprehensive evaluation with eight existing AV methods applied to eight corpora, we demonstrate that POSNoise is able to outperform a well-known topic masking approach in 51 out of 64 cases with up to 12.5% improvement in terms of accuracy. Furthermore, we show that for corpora preprocessed with POSNoise, the AV methods examined often achieve higher accuracies (improvement of up to 20.6%) compared to the original corpora. |
---|