Universitas Scholarium

Authorship Attribution Simulacrum

Estimating the probability that a given author wrote a given text

Constructed Tool

Converse with Authorship Attribution Simulacrum →

What The Tool Does

Authorship attribution is a probability problem, not a certainty problem. Given a disputed text and one or more candidate authors, the tool estimates the likelihood that each candidate wrote the text, using the statistical fingerprint of their known writing — function-word frequencies, sentence-length distributions, vocabulary richness, rhythmic patterns, and a range of other features that writers reproduce unconsciously and cannot reliably disguise. The output is a probability distribution and a confidence estimate, not a verdict.
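The function-word side of that fingerprint is simple to compute. The sketch below is illustrative only: the eight-word list and the per-1,000-words normalisation are assumptions for the example, not the tool's actual feature set, which uses many more features than this.

```python
from collections import Counter
import re

# A small set of English function words; real systems use 100 or more.
# (Assumption: this word list is an illustrative subset.)
FUNCTION_WORDS = ["by", "from", "to", "upon", "while", "of", "the", "and"]

def function_word_profile(text):
    """Return each function word's rate per 1,000 words of the text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    total = len(tokens)
    return {w: 1000 * counts[w] / total for w in FUNCTION_WORDS}

sample = ("The letter was sent by the clerk to the magistrate, "
          "and from there to the court.")
profile = function_word_profile(sample)
```

Profiles like this, computed over texts of known authorship, are the raw material every method below works from.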

The tool works with any language for which sufficient reference texts exist. It requires a meaningful sample of known writing from each candidate — typically several thousand words minimum, preferably more — and a disputed sample of comparable length. Short disputed texts produce wide confidence intervals; the tool says so. Long disputed texts with distinctive stylistic signatures can be attributed with very high confidence, as Frederick Mosteller and David Wallace famously demonstrated for the disputed Federalist Papers.

Where The Method Comes From

The foundational modern work is Mosteller and Wallace's *Inference and Disputed Authorship: The Federalist* (1964), which used Bayesian analysis of function-word frequencies to resolve the long-standing question of whether James Madison or Alexander Hamilton wrote the disputed *Federalist* essays — conclusively in Madison's favour. Their method, using words like *by*, *from*, *to*, *upon*, *while* as stylistic markers invisible to authorial intention, remains the template for the field.
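The Bayesian logic can be illustrated with a single marker word under a deliberately simplified model. Mosteller and Wallace fit negative-binomial distributions to many words at once; the Poisson model, the word rates, and the essay length below are all hypothetical stand-ins chosen to show the shape of the calculation, not their actual figures.

```python
import math

def log_bayes_factor(count, n_words, rate_a, rate_b):
    """Log Bayes factor for author A over author B, given one marker
    word's count in an n_words-long text, under a Poisson model.
    rate_a and rate_b are the authors' rates per 1,000 words.
    (Simplification: Mosteller and Wallace used negative-binomial
    models over many function words, not a single Poisson.)"""
    lam_a = rate_a * n_words / 1000
    lam_b = rate_b * n_words / 1000
    # log P(count | Poisson(lam_a)) - log P(count | Poisson(lam_b));
    # the log(count!) terms cancel in the difference.
    return (count * math.log(lam_a) - lam_a) - (count * math.log(lam_b) - lam_b)

# 'upon' appearing 0 times in a 2,000-word essay, with hypothetical
# rates of 3.0 vs 0.2 per 1,000 words for the two candidates:
lbf = log_bayes_factor(0, 2000, 3.0, 0.2)  # negative: evidence for author B
```

Evidence of this kind accumulates word by word: summing the log Bayes factors over many marker words is what lets long texts be attributed with very high confidence.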

Contemporary stylometry extends the approach with a much larger toolkit: John Burrows's Delta method (2002), support-vector classifiers on word-frequency vectors, character n-grams, and more recently neural-network approaches that learn stylistic signatures directly from text. The field has produced high-profile attributions — Joe Klein as the anonymous author of *Primary Colors*, the contested but widely accepted identification of Elena Ferrante, analyses of the suspected Shakespearean collaborations — and just as importantly, it has refused to make attributions where the data did not support them.
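Burrows's Delta itself is compact enough to sketch. The version below is a minimal illustration under simplifying assumptions: the frequency dictionaries and word list are invented inputs, and the corpus used for the z-scores is just the candidate set, whereas practical implementations normalise against a larger reference corpus and hundreds of frequent words.

```python
import statistics

def delta(disputed_freqs, candidate_freqs_list, word_list):
    """Burrows's Delta: mean absolute difference in z-scored word
    frequencies between the disputed text and each candidate.
    Returns one score per candidate; lower means stylistically closer.
    (Simplification: z-scores are computed over the candidate set only.)"""
    mu = {w: statistics.mean(c[w] for c in candidate_freqs_list) for w in word_list}
    sd = {w: statistics.stdev(c[w] for c in candidate_freqs_list) for w in word_list}
    z = lambda freqs, w: (freqs[w] - mu[w]) / sd[w]
    return [
        sum(abs(z(disputed_freqs, w) - z(cand, w)) for w in word_list) / len(word_list)
        for cand in candidate_freqs_list
    ]

# Hypothetical relative frequencies of three very common words:
words = ["the", "of", "to"]
cand_a = {"the": 0.060, "of": 0.030, "to": 0.020}
cand_b = {"the": 0.050, "of": 0.040, "to": 0.030}
disputed = {"the": 0.058, "of": 0.032, "to": 0.022}
scores = delta(disputed, [cand_a, cand_b], words)  # scores[0] < scores[1]
```

In this toy input the disputed text sits closer to candidate A's profile, so A receives the lower Delta score; with real data the scores feed the probability distribution the tool reports.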

What It Can And Cannot Do

The tool can produce reliable probabilistic attributions when the candidate pool is small, the reference samples are substantial, and the disputed text is long enough to have a statistical signature. It works on prose in any language with adequate training data and has been extended experimentally to poetry, code, and translated texts with more cautious results.

It cannot produce a confident single-author attribution from a short text, from a pool of unknown candidates, or when the reference samples are heterogeneous in genre or period. It can be defeated by deliberate stylistic imitation (though imitation usually leaves its own signature), and it cannot distinguish a collaborator from a single author when both contributed intermingled passages. The tool is honest about these limits and reports them with every attribution.



Universitas Scholarium · scholar ID research_genitor
Part of Academic Tools · Research & Textual Analysis.