Foundations and Trends® in Information Retrieval > Vol 1 > Issue 3

Authorship Attribution

By Patrick Juola, Department of Mathematics and Computer Science, Duquesne University, USA, juola@mathcs.duq.edu

 
Suggested Citation
Patrick Juola (2008), "Authorship Attribution", Foundations and Trends® in Information Retrieval: Vol. 1: No. 3, pp 233-334. http://dx.doi.org/10.1561/1500000005

Publication Date: 07 Mar 2008
© 2008 P. Juola
 
Subjects
Natural language processing for IR,  Applications of IR
 

In this article:
1 Introduction
2 Background and History 
3 Linguistic and Mathematical Background 
4 Linguistic Features 
5 Attributional Analysis 
6 Empirical Testing 
7 Other Applications of Authorship Attribution 
8 Special Problems of Linguistic Forensics 
9 Recommendations 
References 

Abstract

Authorship attribution, the science of inferring characteristics of the author from the characteristics of documents written by that author, is a problem with a long history and a wide range of application. Recent work in "non-traditional" authorship attribution demonstrates the practicality of automatically analyzing documents based on authorial style, but the state of the art is confusing. Analyses are difficult to apply, little is known about type or rate of errors, and few "best practices" are available. In part because of this confusion, the field has perhaps had less uptake and general acceptance than is its due.

This review surveys the history and present state of the discipline, presenting some comparative results when available. It shows, first, that the discipline is quite successful, even in difficult cases involving small documents in unfamiliar and less studied languages; it further analyzes the types of analysis and features used and tries to determine characteristics of well-performing systems, finally formulating these in a set of recommendations for best practices.

DOI: 10.1561/1500000005
ISBN: 978-1-60198-118-9 (paperback), 112 pp., $80.00
ISBN: 978-1-60198-119-6 (e-book, .pdf), 112 pp., $100.00

Authorship Attribution

Authorship attribution, the science of inferring characteristics of the author from the characteristics of documents written by that author, is a problem with a long history and a wide range of applications. It is an important problem not only in information retrieval but in many other disciplines as well, from technology to teaching and from finance to forensics. The idea that authors have a statistical "fingerprint" that can be detected by computers is a compelling one that has received a lot of research attention. Authorship Attribution surveys the history and present state of the discipline, presenting some comparative results where available. It also provides a theoretical and empirically tested basis for further work. Many modern techniques are described and evaluated, along with some insights into their application for novices and experts alike. Authorship Attribution will be of particular interest to information retrieval researchers and students who want to keep up with the latest techniques and their applications. It is also a useful resource for people in other disciplines, whether the teacher interested in plagiarism detection or the historian interested in who wrote a particular document.

 