Diffstat (limited to 'textproc/p5-Algorithm-RabinKarp')
-rw-r--r-- | textproc/p5-Algorithm-RabinKarp/pkg-descr | 16 |
1 files changed, 8 insertions, 8 deletions
diff --git a/textproc/p5-Algorithm-RabinKarp/pkg-descr b/textproc/p5-Algorithm-RabinKarp/pkg-descr
index 5d4ddcc02bcd..28adfaf561d7 100644
--- a/textproc/p5-Algorithm-RabinKarp/pkg-descr
+++ b/textproc/p5-Algorithm-RabinKarp/pkg-descr
@@ -1,17 +1,17 @@
-This is an implementation of Rabin and Karp's streaming hash, as described
-in "Winnowing: Local Algorithms for Document Fingerprinting" by Schleimer,
-Wilkerson, and Aiken. Following the suggestion of Schleimer, I am using
+This is an implementation of Rabin and Karp's streaming hash, as described
+in "Winnowing: Local Algorithms for Document Fingerprinting" by Schleimer,
+Wilkerson, and Aiken. Following the suggestion of Schleimer, I am using
 their second equation:
 
 $H[ $c[2..$k + 1] ] = (( $H[ $c[1..$k] ] - $c[1] ** $k ) + $c[$k+1] ) * $k
 
-The results of this hash encodes information about the next k values in
-the stream (hense k-gram.) This means for any given stream of length n
+The results of this hash encodes information about the next k values in
+the stream (hense k-gram.) This means for any given stream of length n
 integer values (or characters), you will get back n - k + 1 hash values.
 
-For best results, you will want to create a code generator that filters
-your data to remove all unnecessary information. For example, in a large
-english document, you should probably remove all white space, as well as
+For best results, you will want to create a code generator that filters
+your data to remove all unnecessary information. For example, in a large
+english document, you should probably remove all white space, as well as
 removing all capitalization.
 
 WWW: http://search.cpan.org/dist/Algorithm-RabinKarp/
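
For readers unfamiliar with the k-gram hashing that the description refers to, the following is a minimal Python sketch of a textbook polynomial rolling hash. It is not the Perl module's code and does not reproduce its exact formula: the function names `normalize` and `kgram_hashes`, the base of 256, and the modulus 2**61 - 1 are all assumptions chosen for illustration. It shows the two points the description makes: filtering whitespace and capitalization before hashing, and getting n - k + 1 hash values from a stream of n values.

```python
def normalize(text):
    """Filter step suggested by the description: drop all whitespace and
    fold case, then turn each remaining character into an integer."""
    return [ord(ch) for ch in text.lower() if not ch.isspace()]


def kgram_hashes(values, k, base=256, modulus=(1 << 61) - 1):
    """Return one hash per k-gram of `values`, n - k + 1 hashes in total.

    `base` and `modulus` are illustrative assumptions, not values taken
    from Algorithm::RabinKarp.
    """
    n = len(values)
    if k <= 0 or n < k:
        return []

    # Weight of the oldest value in the window: base ** (k - 1) mod modulus.
    top = pow(base, k - 1, modulus)

    # Hash the first window directly (Horner's rule).
    h = 0
    for v in values[:k]:
        h = (h * base + v) % modulus
    hashes = [h]

    # Slide the window: remove the value that leaves, shift, add the new one.
    for i in range(k, n):
        h = ((h - values[i - k] * top) * base + values[i]) % modulus
        hashes.append(h)
    return hashes
```

A call such as `kgram_hashes(normalize("A do run run"), 5)` hashes the nine characters of `adorunrun` and returns five values, one per 5-gram. Identical k-grams always produce identical hashes, which is the property fingerprinting schemes such as winnowing rely on.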