The science of science
Apr 28th 2011 | from the print edition
COMPUTER scientists have long tried to foist order on the explosion of data that is the internet. One obvious way is to group information by topic, but tagging it all comprehensively by hand is impossible. David Blei, of Princeton University, has therefore been trying to teach machines to do the job.

He starts by defining topics as sets of words that tend to crop up in the same document. For example, “Big Bang” and “black hole” will often co-occur, but not as often as each does with “galaxy”. Neither, however, would be expected to pop up next to “genome”. This captures the intuition that the first three terms, but not the fourth, are part of a single topic. Of course, much depends on how narrow you want a topic to be. But Dr Blei’s model, which he developed with John Lafferty, of Carnegie Mellon University, allows for that.
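
The co-occurrence intuition can be made concrete with a small sketch. This is not Dr Blei's model, whose details the article does not give; it merely counts how often pairs of terms share a document. The toy corpus and the function name `cooccurrence_counts` are invented for illustration.

```python
from collections import Counter
from itertools import combinations

# Toy corpus, invented for illustration: each document is reduced
# to the set of terms it mentions.
documents = [
    {"big bang", "galaxy"},
    {"black hole", "galaxy"},
    {"big bang", "black hole", "galaxy"},
    {"big bang", "galaxy"},
    {"black hole", "galaxy"},
    {"genome"},
]

def cooccurrence_counts(docs):
    """Count, for every pair of terms, how many documents contain both."""
    counts = Counter()
    for terms in docs:
        # Sort so each pair is always stored under the same key order.
        for pair in combinations(sorted(terms), 2):
            counts[pair] += 1
    return counts

counts = cooccurrence_counts(documents)

# Pairs that never share a document are absent from the Counter,
# which reports them as 0.
for pair in [("big bang", "black hole"), ("big bang", "galaxy"),
             ("black hole", "galaxy"), ("galaxy", "genome")]:
    print(pair, counts[pair])
```

In this made-up corpus “Big Bang” and “black hole” share a document less often than either does with “galaxy”, and “genome” shares one with none of them, which is the pattern the article describes. A real topic model would have to infer such groupings statistically from many documents, not read them off raw counts.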