Robert Sanderson

From: azaroth@liverpool.ac.uk
Subject: [code4libcon] Proposal: Library Text Mining
Date: December 1, 2005 3:40:03 PM PST
To: code4libcon@lists.gatech.edu

Rob Sanderson (azaroth@liv.ac.uk)
Prepared Talk: Library Text Mining

Using the TeraGrid[1] and the SRB DataGrid[2], we have sufficient
computational and storage facilities to run processing tasks that would
normally be prohibitively expensive. By integrating text and data mining
tools[3][4] into the Cheshire3[5] information architecture, we can
parse the natural language in 20 million MARC records (the University
of California's MELVYL collection) and extract information for use by
search/retrieve applications. In this talk, we'll discuss the results
of applying these new techniques to 'old' data.

1: http://www.teragrid.org
2: http://www.sdsc.edu/srb
3: http://www.ailab.si/orange
4: http://www-tsujii.is.s.u-tokyo.ac.jp/
5: http://www.cheshire3.org/

[May or may not be able to attend, but there's a proposal :)]

Rob