This module takes a list of English-language documents
and builds a simple in-memory search engine using a
vector space model. Documents are stored as PDL (Perl
Data Language) objects, and after the initial indexing
phase, searches should be very fast. The implementation
applies a rudimentary stop list to filter out very
common words and uses a cosine measure to calculate
the similarity between the query and each document.
All documents above a user-configurable similarity
threshold are returned.
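
The approach described above can be illustrated with a minimal
sketch. It is written in Python rather than Perl and does not use
or reproduce the module's API: every name in it (STOP_WORDS,
tokenize, build_index, cosine, search), the tiny stop list, the raw
term-count weighting, and the default threshold of 0.5 are
assumptions made for illustration only. Plain lists stand in for
the PDL objects.

```python
import math
import re
from collections import Counter

# Hypothetical stop list for illustration; not the module's actual list.
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "on", "is", "to"}


def tokenize(text):
    """Lowercase the text, keep alphabetic words, drop stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower())
            if w not in STOP_WORDS]


def build_index(docs):
    """Indexing phase: one term-count vector per document."""
    counts = [Counter(tokenize(d)) for d in docs]
    vocab = sorted(set().union(*counts))
    vectors = [[c[t] for t in vocab] for c in counts]
    return vocab, vectors


def cosine(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def search(query, docs, vocab, vectors, threshold=0.5):
    """Return (score, document) pairs scoring above the threshold."""
    q = Counter(tokenize(query))
    qvec = [q[t] for t in vocab]  # query terms outside the vocabulary are ignored
    scored = ((cosine(qvec, v), d) for v, d in zip(vectors, docs))
    hits = [(s, d) for s, d in scored if s > threshold]
    return sorted(hits, key=lambda h: h[0], reverse=True)


docs = [
    "The cat sat on the mat",
    "Dogs chase cats in the park",
    "Stock markets fell sharply",
]
vocab, vectors = build_index(docs)
results = search("cat mat", docs, vocab, vectors)
```

Because the sketch does no stemming, "cats" in the second document
does not match the query term "cat", so only the first document
clears the threshold.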

Author:	Maciej Ceglowski <maciej AT ceglowski.com>
WWW:	http://search.cpan.org/dist/Search-VectorSpace/