Authorship Attribution: From Identity to Essence
Dr. Patrick Juola
“Stylometry” has achieved remarkable successes in attributing the authorship of unknown or disputed works, most recently in this summer’s unmasking of J.K. Rowling as the author of The Cuckoo’s Calling, published under the pseudonym Robert Galbraith. The theory behind stylometry is straightforward: language contains a large number of learned, habitual choices that vary from person to person and from group to group.
What kinds of choices do people make? Right now, we can really track only three kinds of relatively uninformative choices: characters, words, and local grammatical constructions. Although these are informative in a narrow technical sense, they are meaningless for gaining a holistic understanding of people. Knowing that Madison differs significantly from Hamilton in his use of the word “by” tells us nothing about their key philosophical differences. What types of patterns would allow us to make the leap to the actual thoughts of the writer? Is it even possible to get a computer to codify something as abstract as “philosophical beliefs”?
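To make the three kinds of surface features concrete, here is a minimal sketch of how they might be counted. It is not the author's method, and the function names (`words`, `char_ngrams`, `word_rate`, `word_bigrams`) are hypothetical. Word bigrams stand in, as a crude assumption, for "local grammatical constructions", which would normally require a part-of-speech tagger. The `word_rate` function computes the kind of per-1,000-word frequency on which a comparison such as Madison's and Hamilton's use of "by" might rest.

```python
import re
from collections import Counter


def words(text: str) -> list[str]:
    # Lowercased word tokens; internal apostrophes are kept (a simplifying assumption).
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def char_ngrams(text: str, n: int = 3) -> Counter:
    # Character level: counts of overlapping n-character substrings.
    t = text.lower()
    return Counter(t[i:i + n] for i in range(len(t) - n + 1))


def word_rate(text: str, target: str, per: int = 1000) -> float:
    # Word level: occurrences of `target` per `per` running words.
    toks = words(text)
    if not toks:
        return 0.0
    return Counter(toks)[target] * per / len(toks)


def word_bigrams(text: str) -> Counter:
    # Hypothetical proxy for local grammatical constructions: adjacent word pairs.
    toks = words(text)
    return Counter(zip(toks, toks[1:]))


if __name__ == "__main__":
    sample = "The bill was passed by the house and signed by the governor."
    print(word_rate(sample, "by"))           # 2 of 12 words -> about 166.67 per 1,000
    print(char_ngrams("abab", 2))            # 'ab' twice, 'ba' once
    print(word_bigrams(sample)[("by", "the")])
```

Features like these show *that* two writers differ in a measurable way, but nothing in the counts themselves explains *why*. That gap is exactly the one the questions above raise.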
Similarly, stylometry’s ability to profile people is quite powerful, to the point of being rather scary. Using statistical models, we can perhaps correlate specific linguistic patterns with specific cognitive or behavioral patterns. I present some typical profiling results and discuss some potential future applications. For example, self-esteem can be inferred using these techniques, and in theory one could use this as a screening tool for depression. Is this helpful for a historian wanting to understand Jefferson’s mood when he wrote the Declaration of Independence? Is this a useful technology for studying the mindset of the author of a literary work? Would it help analyze fictional characters who are described as having particular moods? Would we care if an analysis of Dangerous Liaisons revealed that the Vicomte de Valmont had – or didn’t have – signs of low self-esteem? Does it add anything to our understanding of the structure of epistolary novels? Or would it simply be a statement without a useful foundation or any relation to anything interesting? One underlying cause of this unease is the same lack of an apparent relationship between the patterns and the person; without a causal theory of individual authorship, can we really trust group-based, purely statistical findings when they are applied to an individual?
Patrick Juola is co-founder and director of research of Juola & Associates, a text analysis firm focusing on stylometry and authorial studies. He is also associate professor of computer science at Duquesne University, Pittsburgh, PA. He was one of two analysts who revealed Rowling’s authorship of Galbraith’s The Cuckoo’s Calling, and he is currently working on a DARPA-funded project to use communication style as a replacement for password-based authentication. He is a long-standing member of the digital humanities community.