Thursday, 22 August 2013

Is it possible to supplement a Naive Bayes text classification algorithm with author information?

I am working on a text classification project where I am trying to assign
topic classifications to speeches from the Congressional Record.
Using topic codes from the Congressional Bills Project
(http://congressionalbills.org/), I've tagged speeches that mention a
specific bill as belonging to the topic of the bill. I'm using this as my
"training set" for the model.
I have a "vanilla" Naive Bayes classifier working well enough, but I keep
feeling that I could get better accuracy out of the algorithm by
incorporating information about the member of Congress who is making the
speech (e.g., certain members are much more likely to talk about foreign
policy than others).
One possibility would be to replace the prior in the NB classifier
(usually defined as the proportion of documents in the training set that
have the given classification) with a speaker-specific prior: the
proportion of that speaker's previously observed speeches that fall
under each topic.
Is this worth pursuing? Are there existing approaches that follow this
kind of logic? I'm somewhat familiar with the "author-topic models" that
extend Latent Dirichlet Allocation, but I like the simplicity of the NB
model.
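
A minimal sketch of the speaker-specific prior described above, using only
the Python standard library. Everything named here is an assumption made for
illustration: the class `SpeakerPriorNB`, the `prior_strength` parameter,
whitespace tokenization, and the four toy speeches, which are invented and
not taken from the Congressional Record. One design choice goes beyond the
description above: instead of using a speaker's raw topic proportions, the
sketch shrinks them toward the global prior, so a speaker with few or no
tagged speeches falls back to the vanilla NB prior and no topic gets
probability zero.

```python
import math
from collections import Counter, defaultdict


class SpeakerPriorNB:
    """Multinomial Naive Bayes whose class prior depends on the speaker."""

    def __init__(self, alpha=1.0, prior_strength=5.0):
        self.alpha = alpha                    # Laplace smoothing for word counts
        self.prior_strength = prior_strength  # pseudo-count weight of the global prior

    def fit(self, docs, topics, speakers):
        self.topics = sorted(set(topics))
        self.word_counts = {t: Counter() for t in self.topics}
        self.topic_counts = Counter(topics)
        self.speaker_counts = defaultdict(Counter)
        for doc, topic, speaker in zip(docs, topics, speakers):
            self.word_counts[topic].update(doc.lower().split())
            self.speaker_counts[speaker][topic] += 1
        self.vocab = set().union(*self.word_counts.values())
        return self

    def log_prior(self, topic, speaker):
        # Global prior: share of training speeches tagged with this topic.
        global_p = self.topic_counts[topic] / sum(self.topic_counts.values())
        # Speaker counts shrunk toward the global prior, so unseen or
        # rarely seen speakers fall back to the vanilla NB prior.
        counts = self.speaker_counts.get(speaker, Counter())
        n = sum(counts.values())
        p = (counts[topic] + self.prior_strength * global_p) / (n + self.prior_strength)
        return math.log(p)

    def log_likelihood(self, doc, topic):
        counts = self.word_counts[topic]
        denom = sum(counts.values()) + self.alpha * len(self.vocab)
        return sum(
            math.log((counts[w] + self.alpha) / denom)
            for w in doc.lower().split()
            if w in self.vocab  # ignore words never seen in training
        )

    def predict(self, doc, speaker):
        # Same decision rule as vanilla NB, with the prior swapped out.
        return max(
            self.topics,
            key=lambda t: self.log_prior(t, speaker) + self.log_likelihood(doc, t),
        )


# Toy data, invented for illustration only.
docs = [
    "troops treaty embassy",
    "treaty sanctions embassy",
    "hospital medicare insurance",
    "medicare insurance patients",
]
topics = ["foreign", "foreign", "health", "health"]
speakers = ["A", "A", "B", "B"]

model = SpeakerPriorNB().fit(docs, topics, speakers)
print(model.predict("budget committee", "A"))  # no known words: the prior decides
print(model.predict("treaty", "B"))            # word evidence competes with the prior
```

Setting `prior_strength` very high recovers the ordinary NB prior, and
setting it near zero approaches the raw per-speaker proportions, so the
value could be tuned on held-out speeches.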
