java.lang.Object
  extended by net.jot.search.simpleindexer.JOTSimpleSearchEngine
public class JOTSimpleSearchEngine
Implements a simple search engine using a text/keyword index. Use (or extend) it to index and search plain-text files. It is intended to be "barebones" and decoupled from the UI that presents the results.
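To make the idea of a text/keyword index concrete, here is a minimal in-memory sketch of an inverted index that maps each keyword to the documents (by uniqueId) and line numbers where it occurs. It is an illustration under assumptions, not the JOT implementation: the class KeywordIndexSketch and its methods addLine, remove and lookup are hypothetical, everything lives in memory rather than in index files under indexRoot, and words are split with a regex that only approximates the documented pattern field.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/**
 * Hypothetical in-memory sketch of a keyword (inverted) index:
 * keyword -> (uniqueId -> line numbers). Not the JOT implementation,
 * which keeps its index in files under indexRoot.
 */
class KeywordIndexSketch {
    private final Map<String, Map<String, Set<Integer>>> index = new TreeMap<>();

    /** Records each word of one line of a document under that document's id. */
    void addLine(String uniqueId, int lineNb, String line) {
        // Split on anything that is not a letter, digit, '_' or '-'.
        for (String word : line.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}_-]+")) {
            if (word.isEmpty()) {
                continue; // a leading separator yields one empty token
            }
            index.computeIfAbsent(word, k -> new TreeMap<>())
                 .computeIfAbsent(uniqueId, k -> new TreeSet<>())
                 .add(lineNb);
        }
    }

    /** Removes a document from every keyword entry, then drops keywords left empty. */
    void remove(String uniqueId) {
        for (Map<String, Set<Integer>> docs : index.values()) {
            docs.remove(uniqueId);
        }
        index.values().removeIf(Map::isEmpty);
    }

    /** Returns the ids of the documents containing the keyword, sorted. */
    List<String> lookup(String keyword) {
        Map<String, Set<Integer>> docs = index.get(keyword.toLowerCase(Locale.ROOT));
        if (docs == null) {
            return new ArrayList<>();
        }
        return new ArrayList<>(docs.keySet());
    }
}
```

Adding and removing by id mirrors what indexFile and removeFile do against the on-disk index; keeping line numbers per document is an assumption suggested by the lineNb parameter of indexLineInMemory.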
| Field Summary | |
|---|---|
| protected static JOTSearchSorter | defaultSorter |
| java.io.File | indexRoot: index property file |
| protected static java.util.regex.Pattern | pattern: Pattern matching "words". A word consists of letters and digits (Unicode, case-insensitive), plus - and _. |
| protected java.io.File | propFile |
| JOTPropertiesPreferences | props |
| protected int | WORD_BATCH_SIZE: Maximum number of words to process in memory before writing to file. Too low, and performance will suffer; too high, and it will use more memory. |
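The pattern field is only described in words. One plausible reconstruction with java.util.regex is sketched below; the exact expression JOT uses is not shown in this page, so WordPatternSketch, WORD and words() are hypothetical names and the character class is an assumption based on the description.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical reconstruction of the documented "word" pattern. */
class WordPatternSketch {
    // One or more Unicode letters or digits, '_' or '-'. The flags mirror the
    // documented "unicode case insensitive" wording; the lower-casing in words()
    // is what actually makes "Java" and "JAVA" the same keyword.
    static final Pattern WORD = Pattern.compile("[\\p{L}\\p{N}_-]+",
            Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);

    /** Returns the words of a line, lower-cased, in order of appearance. */
    static List<String> words(String line) {
        List<String> out = new ArrayList<>();
        Matcher m = WORD.matcher(line);
        while (m.find()) {
            out.add(m.group().toLowerCase(Locale.ROOT));
        }
        return out;
    }
}
```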
| Constructor Summary | |
|---|---|
| JOTSimpleSearchEngine(java.io.File indexRoot) | |
| Method Summary | |
|---|---|
| protected int | commitFromMemory(java.lang.String id, java.util.Hashtable hash): Writes the temporary in-memory hash to the index files. |
| int | indexFile(java.io.File textFile): Indexes the file using the file path as the unique key, reindexing only if the file timestamp was updated. |
| int | indexFile(java.io.File textFile, boolean onlyIfModified): Indexes the file using the file path as the unique key. |
| int | indexFile(java.io.File textFile, java.lang.String uniqueId): Indexes the file, only if its timestamp changed since the last indexing. |
| int | indexFile(java.io.File textFile, java.lang.String uniqueId, boolean onlyIfModified): Indexes a file (updating it if already indexed). |
| protected int | indexLineInMemory(java.util.Hashtable hash, java.lang.String lineNb, java.lang.String s): hash is the hashtable storing the keyword data. |
| static void | main(java.lang.String[] args): For testing / example. |
| static java.lang.String[] | parseQueryIntoKeywords(java.lang.String queryString): Utility method to parse a user-typed query (e.g. "a java server pAGes ") into keywords, e.g. [java, server, pages]. |
| JOTRawSearchResult[] | performRawSearch(java.lang.String[] keywords): Returns an array of raw search results (one JOTRawSearchResult per keyword, in the same order as the keywords). |
| JOTSearchResult[] | performSearch(java.lang.String[] keywords, JOTSearchSorter sorter): Returns a sorted list of files (uniqueIds) and their scores (1-5). |
| int | removeFile(java.io.File textFile, java.lang.String uniqueId): Removes a file from the index. |
| protected void | updateKeywordsCount(int nbNewKeywords) |
| static void | whipeoutIndex(java.io.File indexRoot): Completely wipes out the index so you can reindex from scratch. Simply deletes everything in the indexRoot folder! |
| Methods inherited from class java.lang.Object |
|---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
| Field Detail |
|---|
public java.io.File indexRoot
public JOTPropertiesPreferences props
protected java.io.File propFile
protected static java.util.regex.Pattern pattern
protected static JOTSearchSorter defaultSorter
protected int WORD_BATCH_SIZE
| Constructor Detail |
|---|
public JOTSimpleSearchEngine(java.io.File indexRoot)
throws java.lang.Exception
Parameters: indexRoot - root folder where the index data is or will be stored (an empty folder)
Throws: java.lang.Exception
| Method Detail |
|---|
public int indexFile(java.io.File textFile)
throws java.lang.Exception
Parameters: textFile - the file to index
Throws: java.lang.Exception
public int indexFile(java.io.File textFile,
boolean onlyIfModified)
throws java.lang.Exception
Parameters: textFile - the file to index
onlyIfModified - if true, only update if the file timestamp changed since the last indexing
Throws: java.lang.Exception
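The onlyIfModified flag amounts to comparing the file's current timestamp with the one recorded when it was last indexed. A minimal sketch, assuming the timestamps are kept in a java.util.Properties keyed by uniqueId (how the engine actually stores them is not documented here); ModifiedCheckSketch and needsIndexing are hypothetical names.

```java
import java.io.File;
import java.util.Properties;

/**
 * Hypothetical sketch of the onlyIfModified check: remember each file's
 * timestamp at indexing time and skip files whose timestamp is unchanged.
 * The in-memory Properties stands in for whatever the engine persists.
 */
class ModifiedCheckSketch {
    private final Properties lastIndexed = new Properties(); // uniqueId -> lastModified (ms)

    /** File-based entry point; a null uniqueId falls back to the absolute path. */
    boolean needsIndexing(File textFile, String uniqueId, boolean onlyIfModified) {
        String id = (uniqueId != null) ? uniqueId : textFile.getAbsolutePath();
        return needsIndexing(id, textFile.lastModified(), onlyIfModified);
    }

    /** Returns true if the document must be (re)indexed, and records its timestamp. */
    boolean needsIndexing(String id, long lastModified, boolean onlyIfModified) {
        String stamp = Long.toString(lastModified);
        if (onlyIfModified && stamp.equals(lastIndexed.getProperty(id))) {
            return false; // unchanged since the last indexing: skip it
        }
        lastIndexed.setProperty(id, stamp);
        return true;
    }
}
```

The sketch treats any change of timestamp as a modification, not only a newer one, so a file restored with an older timestamp is also reindexed; that is a choice made for the sketch.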
public int indexFile(java.io.File textFile,
java.lang.String uniqueId)
throws java.lang.Exception
Parameters: textFile - the file to index
uniqueId - a unique id for the file, e.g. its absolute path or an MD5 hash; if null, the absolute path is used
Throws: java.lang.Exception
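The documented fallback for a null uniqueId, and the MD5 alternative it mentions, can be sketched with java.security.MessageDigest. UniqueIdSketch, resolveId, md5Hex and md5Id are hypothetical helpers, not part of the JOT API.

```java
import java.io.File;
import java.io.IOException;
import java.math.BigInteger;
import java.nio.file.Files;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical helpers for choosing a uniqueId. */
class UniqueIdSketch {
    /** Documented default: a null uniqueId means the file's absolute path. */
    static String resolveId(File textFile, String uniqueId) {
        return (uniqueId != null) ? uniqueId : textFile.getAbsolutePath();
    }

    /** MD5 of some bytes as 32 lower-case hex digits. */
    static String md5Hex(byte[] content) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5").digest(content);
            return String.format("%032x", new BigInteger(1, digest));
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to provide MD5.
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    /** One possible alternative id: the MD5 of the file's contents. */
    static String md5Id(File textFile) throws IOException {
        return md5Hex(Files.readAllBytes(textFile.toPath()));
    }
}
```

A content hash changes whenever the file is edited, so with MD5 ids an edited file gets a new id and its old entry would have to be removed with removeFile; an absolute path stays stable across edits.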
public int indexFile(java.io.File textFile,
java.lang.String uniqueId,
boolean onlyIfModified)
throws java.lang.Exception
Parameters: textFile - the file to index
uniqueId - a unique id for the file, e.g. its absolute path or an MD5 hash; if null, the absolute path is used
onlyIfModified - if true, only update the file if its timestamp changed since the last indexing
Throws: java.lang.Exception
protected int commitFromMemory(java.lang.String id,
java.util.Hashtable hash)
throws java.lang.Exception
Parameters: id - the unique id of the file
hash - the temporary in-memory hash to write to the index files
Throws: java.lang.Exception
protected int indexLineInMemory(java.util.Hashtable hash,
java.lang.String lineNb,
java.lang.String s)
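WORD_BATCH_SIZE, indexLineInMemory and commitFromMemory together describe a batching scheme: words are accumulated in an in-memory table and written out once enough have been collected. The sketch below shows that trade-off under assumptions: BatchIndexerSketch is hypothetical, it uses a HashMap instead of the java.util.Hashtable in the real signatures, and "writing to the index files" is simulated by appending a copy of the batch to a list.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/**
 * Hypothetical sketch of batching: words are accumulated in memory and
 * committed once batchSize words have been seen (cf. WORD_BATCH_SIZE).
 */
class BatchIndexerSketch {
    private final int batchSize;
    private final Map<String, List<Integer>> pending = new HashMap<>(); // word -> line numbers
    private int pendingWords = 0;
    // Stands in for the index files: each element is one committed batch.
    final List<Map<String, List<Integer>>> commits = new ArrayList<>();

    BatchIndexerSketch(int batchSize) {
        this.batchSize = batchSize;
    }

    /** Adds the words of one line to the in-memory batch, committing when it is full. */
    void indexLine(int lineNb, String line) {
        for (String word : line.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}_-]+")) {
            if (word.isEmpty()) {
                continue;
            }
            pending.computeIfAbsent(word, k -> new ArrayList<>()).add(lineNb);
            if (++pendingWords >= batchSize) {
                commit();
            }
        }
    }

    /** "Writes" the pending batch (here: copies it into commits) and clears it. */
    void commit() {
        if (pending.isEmpty()) {
            return;
        }
        commits.add(new HashMap<>(pending));
        pending.clear();
        pendingWords = 0;
    }
}
```

A small batch size means many small commits (more I/O in a file-backed index), a large one means a bigger in-memory table, which is the trade-off the WORD_BATCH_SIZE description points at. A final commit() is needed after the last line so the partial batch is not lost.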
public int removeFile(java.io.File textFile,
java.lang.String uniqueId)
throws java.lang.Exception
Parameters: textFile - the file to remove
uniqueId - the unique id used for the file in indexFile, e.g. its absolute path or an MD5 hash; if null, the absolute path is used
Throws: java.lang.Exception
protected void updateKeywordsCount(int nbNewKeywords)
throws java.lang.Exception
Throws: java.lang.Exception
public static void whipeoutIndex(java.io.File indexRoot)
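whipeoutIndex is described as simply deleting everything in the indexRoot folder. A minimal sketch of that with java.nio.file follows, assuming indexRoot itself should survive as an empty folder (the description does not say); WipeIndexSketch and wipeContents are hypothetical names.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

/** Hypothetical sketch of "delete everything in the indexRoot folder". */
class WipeIndexSketch {
    /** Deletes all files and subfolders under indexRoot, keeping indexRoot itself. */
    static void wipeContents(Path indexRoot) throws IOException {
        if (!Files.isDirectory(indexRoot)) {
            return; // nothing to wipe
        }
        try (Stream<Path> paths = Files.walk(indexRoot)) {
            paths.sorted(Comparator.reverseOrder())  // children sort before their parents
                 .filter(p -> !p.equals(indexRoot))  // keep the root folder
                 .forEach(p -> {
                     try {
                         Files.delete(p);
                     } catch (IOException e) {
                         throw new UncheckedIOException(e); // lambdas cannot throw IOException
                     }
                 });
        }
    }
}
```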
public JOTSearchResult[] performSearch(java.lang.String[] keywords,
JOTSearchSorter sorter)
throws java.lang.Exception
Parameters: keywords - the keywords to search for
Throws: java.lang.Exception
public static java.lang.String[] parseQueryIntoKeywords(java.lang.String queryString)
Parameters: queryString - the query as typed by the user
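The method summary's example turns "a java server pAGes " into [java, server, pages]. One way to get that result is sketched below. The rules are assumptions: lower-casing is implied by the example, but dropping words shorter than two characters and removing duplicates are guesses that happen to fit it (JOT may instead use a stop-word list). QueryParserSketch and parseQuery are hypothetical.

```java
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch of turning a typed query into keywords. */
class QueryParserSketch {
    private static final Pattern WORD = Pattern.compile("[\\p{L}\\p{N}_-]+");
    // Assumption: words shorter than 2 characters are ignored; this is one rule
    // that reproduces the documented example, where "a" is dropped.
    private static final int MIN_LENGTH = 2;

    static String[] parseQuery(String queryString) {
        // Assumption: query order is kept and duplicate keywords are dropped.
        Set<String> keywords = new LinkedHashSet<>();
        Matcher m = WORD.matcher(queryString);
        while (m.find()) {
            String word = m.group().toLowerCase(Locale.ROOT);
            if (word.length() >= MIN_LENGTH) {
                keywords.add(word);
            }
        }
        return keywords.toArray(new String[0]);
    }
}
```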
public JOTRawSearchResult[] performRawSearch(java.lang.String[] keywords)
throws java.lang.Exception
Parameters: keywords - keywords should be space separated, e.g. "java server pages"
Throws: java.lang.Exception
public static void main(java.lang.String[] args)
Parameters: args - command-line arguments
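performRawSearch returns one raw result per keyword, and performSearch turns those into a sorted list of uniqueIds with a score from 1 to 5. The scoring formula is not documented, so the sketch below uses an assumed one: a document's score is the share of keywords it matches, scaled to 1..5 and rounded up, with ties broken by id. SearchScoringSketch and Hit are hypothetical, and the sort is done inline, whereas the real method takes a JOTSearchSorter.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch: merge per-keyword hits into scored, sorted results. */
class SearchScoringSketch {
    /** One scored hit: a document id and a score from 1 to 5. */
    static final class Hit {
        final String id;
        final int score;

        Hit(String id, int score) {
            this.id = id;
            this.score = score;
        }

        @Override
        public String toString() {
            return id + ":" + score;
        }
    }

    /**
     * rawResults holds one set of matching ids per keyword, in keyword order
     * (the shape performRawSearch is documented to return).
     * Assumption: score = share of keywords matched, scaled to 1..5, rounded up.
     */
    static List<Hit> score(List<? extends Set<String>> rawResults) {
        Map<String, Integer> matched = new HashMap<>(); // id -> number of keywords matched
        for (Set<String> ids : rawResults) {
            for (String id : ids) {
                matched.merge(id, 1, Integer::sum);
            }
        }
        int n = rawResults.size();
        List<Hit> hits = new ArrayList<>();
        for (Map.Entry<String, Integer> e : matched.entrySet()) {
            // e.getValue() is between 1 and n, so the score lands in 1..5.
            int score = (int) Math.ceil(5.0 * e.getValue() / n);
            hits.add(new Hit(e.getKey(), score));
        }
        // Best score first; ties broken by id so the order is deterministic.
        hits.sort(Comparator.comparingInt((Hit h) -> h.score).reversed()
                            .thenComparing((Hit h) -> h.id));
        return hits;
    }
}
```

With three keywords, a document matching all three scores 5 and one matching a single keyword scores 2.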