Mathematical Algorithmic Analysis Of Natural Language L. Gerber, Uncredited Authorship Attribution: A Machine Learning Approach
Abstract
Authorship attribution is the task of identifying the author of a document by computational means. This paper studies the use of support vector machines (SVMs) and neural networks to classify texts by author using stylistic features such as lexical, syntactic, and structural patterns. We evaluate on two datasets, the Blog Authorship Corpus and Project Gutenberg texts, and obtain accuracies of 92.3% with the SVM and 94.7% with the neural network. The results show that machine learning models can capture stylistic fingerprints, which is of interest for forensic linguistics, plagiarism detection, and the digital humanities.
Keywords: Authorship attribution, stylometry, machine learning, SVM, neural networks
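To make the notion of stylistic features concrete, the following is a minimal sketch of how lexical, syntactic, and structural measurements might be extracted from a single document before being passed to a classifier. It is an illustration under stated assumptions, not the feature set used in this paper: the function name stylometric_features, the FUNCTION_WORDS list, and the specific measurements are all hypothetical, and function-word frequency serves only as a rough proxy for syntactic style.

```python
import re
from collections import Counter
from typing import Dict

# Hypothetical, illustrative function-word list; not taken from the paper.
FUNCTION_WORDS = ("the", "of", "and", "to", "a", "in", "that", "it")


def stylometric_features(text: str) -> Dict[str, float]:
    """Return a small dictionary of stylistic features for one document."""
    words = re.findall(r"[A-Za-z']+", text.lower())
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    paragraphs = [p for p in re.split(r"\n\s*\n", text) if p.strip()]
    n_words = max(len(words), 1)  # avoid division by zero on empty input
    counts = Counter(words)

    features = {
        # Lexical: vocabulary richness and average word length.
        "type_token_ratio": len(counts) / n_words,
        "mean_word_length": sum(len(w) for w in words) / n_words,
        # Structural: sentence length and number of paragraphs.
        "mean_sentence_length": len(words) / max(len(sentences), 1),
        "paragraph_count": float(len(paragraphs)),
    }
    # Syntactic proxy (assumption): relative frequency of function words.
    for fw in FUNCTION_WORDS:
        features["fw_" + fw] = counts[fw] / n_words
    return features
```

Calling stylometric_features on each document yields one fixed-length feature vector per text; such vectors are the kind of input an SVM or a neural network could be trained on, with the author as the class label.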