Download: textvistools.tar, funpartools.tar, probabilitytools.tar
This toolbox permits the spatial visualization of large corpora of documents based on
The following toolboxes are included:
A modified version of the Matlab Topic Modeling Toolbox 1.3.2, an implementation of latent Dirichlet allocation (LDA) in MATLAB by M. Steyvers and T. Griffiths, available at
http://psiexp.ss.uci.edu/research/programs_data/toolbox.htm
The changes were made solely to optimize memory usage, allowing for larger corpora.
A modified version of SOM_PAK Version 3.1 (April 7, 1995), which is the Self-Organizing Map Program Package by the SOM Programming Team of the Helsinki University of Technology.
The changes were made solely to avoid compilation errors and warnings on Mac OS X.
A modified version of LVQ_PAK Version 3.1 (April 7, 1995), which is the Learning Vector Quantization Package by the LVQ Programming Team of the Helsinki University of Technology.
The changes were made solely to avoid compilation errors and warnings on Mac OS X.
A modified version of SOM Analyst Tools (Fall 2006), which is a Python ArcGIS toolbox to interface with SOM_PAK by Martin Lacayo-Emery.
The changes were made solely to allow for more than 9999 documents and to permit a simple interface with MATLAB.
The FunParTools MATLAB toolbox by Joao Hespanha. This toolbox is mostly useful when writing scripts that batch-process data in multiple steps (each affected by several parameters) and that read and write intermediate files.
The ProbabilityTools MATLAB toolbox by Joao Hespanha. This toolbox provides a few simple macros to perform estimation.
These instructions should work "flawlessly" on Mac OS X and possibly also under Linux. On MS Windows several adaptations may be needed.
A pre-generated set of outputs for this example is included in the folder
textvistools/example/outputFiles/
The content of the various output files for the example is described in the document
textvistools/example/outputs_overview.rtf
One can get help on several of the key functions by typing the following commands at the MATLAB prompt. These commands produce reasonably detailed descriptions of all inputs and outputs of the different scripts, including input/output file formats and the parameters used by the different algorithms.
[Note that for the first set of commands the keyword 'help' appears AFTER the command name. That convention is used by the FunParTools toolbox, which is used by those functions to process the input parameters.]
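As a rough illustration of the two conventions mentioned in the note, the sketch below contrasts the standard MATLAB help call with the "help after the name" form. The name someToolboxFunction is a hypothetical placeholder, not an actual function of this toolbox, so those lines are left commented out. The equivalence between the two commented forms follows from MATLAB's general command syntax and assumes nothing about FunParTools internals.

```matlab
% Standard MATLAB convention: the keyword 'help' comes BEFORE the name.
help plot

% Convention described in the note: the keyword 'help' comes AFTER the name.
% someToolboxFunction is a HYPOTHETICAL placeholder; substitute one of the
% toolbox's own functions. MATLAB's command syntax passes the trailing word
% as a character-vector argument, so the two calls below are equivalent:
%
%   someToolboxFunction help
%   someToolboxFunction('help')
```

With the second form, it is the function itself that receives the argument 'help' and decides what to print.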
When used in research, please acknowledge the use of this software with the following reference:
Stacy Rebich Hespanha and João Hespanha. Text Visualization Toolbox — a MATLAB toolbox to visualize large corpus of documents. Available at http://www.ece.ucsb.edu/~hespanha, Feb. 2010.
or, if you use LaTeX/BibTeX:
@Misc{HespanhaHespanhaFeb10,
key = {matlab,lda,arcgis,som},
author = {Stacy Rebich Hespanha and João Pedro Hespanha},
title = {\texttt{Text Visualization Toolbox} --- a {MATLAB} toolbox to
visualize large corpus of documents},
howpublished = {Available at
\url{http://www.ece.ucsb.edu/~hespanha}},
month = feb,
year = 2010
}