public final class DictionaryAlgorithms extends Object

Modifier and Type | Class | Description |
---|---|---|
static class | DictionaryAlgorithms.WordPosition | The WordPosition class is used to classify a word position within a NodeName instance. |

Modifier and Type | Field | Description |
---|---|---|
protected static Pattern | ALPHANUMERIC_CHARACTER | Alphanumeric character. |
protected static int | MIN_WORD_LENGTH | Minimum word length. |
protected static Pattern | STOPWORD | Pattern containing all stop-words. |
protected static Pattern | SUFFIXES | Pattern containing all suffixes which should be removed. |
protected static Pattern | WORD | Pattern used to extract words from a string. |
protected static Pattern | WORD_STRICT | Pattern used to extract words from a string. |

Constructor and Description |
---|
DictionaryAlgorithms() |

Modifier and Type | Method | Description |
---|---|---|
static String[] | calculateWordKeys(String word) | Calculates dimensionality-reduced keys for the given word with "strict" as target comparison mode. |
static String[] | calculateWordKeys(String word, ComparisonMode targetMode) | Calculates dimensionality-reduced keys for the given word. |
static boolean | containsAlphanumericChar(String s) | Returns true if the given string contains at least one alphanumeric character. |
static int | editDistance(String w1, String w2) | Calculates the edit distance between word w1 and word w2. |
static int | firstWhitespaceIndex(String s) | Returns the position of the first whitespace in the given string or Integer.MAX_VALUE if there is no whitespace in the string. |
static Set<DictionaryAlgorithms.WordPosition> | getWords(String s) | Returns the set of words from the given string (both strict and non-strict matches). |
static Set<DictionaryAlgorithms.WordPosition> | getWords(String s, boolean onlyStrict) | Returns the set of words from the given string. |
static boolean | isAlphabetic(char c) | |
static boolean | isAlphanumeric(char c) | |
static boolean | isNumeric(char c) | |
static boolean | isStopWord(String word) | Returns true if the given word is a stop word. |
static float | matchProbability(String w1, String w2) | Calculates the "match probability" between w1 and w2 as a value between 0.0f and 1.0f. |
static String | normalizeWord(String s, ComparisonMode mode) | Normalizes the node name according to the given comparison mode. |
static void | purgeWordsCache() | Clears the caches for the getWords method. |
static String | stripNonAlphabetic(String s) | Converts the incoming string to lowercase and removes any special characters. |
static String | stripNonAlphanumeric(String s) | Converts the incoming string to lowercase and removes any non-alphanumeric characters. |
static String | stripWhitespace(String s) | Strips all whitespace from the given string. |
static String | wordStem(String word) | Extracts the word stem of a certain word. |

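The summary for editDistance does not say which edit distance is meant. The following is a minimal sketch, assuming the common Levenshtein definition (insertions, deletions and substitutions each cost 1). The class name EditDistanceSketch is hypothetical, and the code is an illustration of the concept, not the implementation used by DictionaryAlgorithms.

```java
// Hypothetical illustration; NOT the DictionaryAlgorithms implementation.
// Assumes "edit distance" means Levenshtein distance with unit costs.
final class EditDistanceSketch {

    /** Computes the Levenshtein distance between w1 and w2 using two rolling rows. */
    static int editDistance(String w1, String w2) {
        int[] prev = new int[w2.length() + 1];
        int[] curr = new int[w2.length() + 1];

        // Turning the empty prefix of w1 into the first j chars of w2 takes j insertions.
        for (int j = 0; j <= w2.length(); j++) {
            prev[j] = j;
        }

        for (int i = 1; i <= w1.length(); i++) {
            // Turning the first i chars of w1 into the empty string takes i deletions.
            curr[0] = i;
            for (int j = 1; j <= w2.length(); j++) {
                int substitution = prev[j - 1] + (w1.charAt(i - 1) == w2.charAt(j - 1) ? 0 : 1);
                int insertion = curr[j - 1] + 1;
                int deletion = prev[j] + 1;
                curr[j] = Math.min(substitution, Math.min(insertion, deletion));
            }
            // Swap the rows so that prev always holds the last completed row.
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[w2.length()];
    }

    public static void main(String[] args) {
        System.out.println(editDistance("kitten", "sitting")); // prints 3
    }
}
```

How matchProbability maps a pair of words to a value between 0.0f and 1.0f is not stated in this reference, so it is deliberately left out of the sketch.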
protected static final int MIN_WORD_LENGTH
protected static final Pattern STOPWORD
protected static final Pattern SUFFIXES
protected static final Pattern WORD
protected static final Pattern WORD_STRICT
protected static final Pattern ALPHANUMERIC_CHARACTER
public static boolean containsAlphanumericChar(String s)
s - is the string which should be checked.

public static boolean isAlphabetic(char c)
public static boolean isNumeric(char c)
public static boolean isAlphanumeric(char c)
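The three character predicates above carry no description. The sketch below shows one plausible reading of them and of containsAlphanumericChar. It assumes Unicode-aware classification via java.lang.Character; the real class may well restrict itself to ASCII or use the ALPHANUMERIC_CHARACTER pattern instead. The class name CharClassSketch is hypothetical.

```java
// Hypothetical illustration; NOT the DictionaryAlgorithms implementation.
// Assumption: letters and digits are classified with java.lang.Character.
final class CharClassSketch {

    static boolean isAlphabetic(char c) {
        return Character.isLetter(c);
    }

    static boolean isNumeric(char c) {
        return Character.isDigit(c);
    }

    static boolean isAlphanumeric(char c) {
        return isAlphabetic(c) || isNumeric(c);
    }

    /** Returns true as soon as one alphanumeric character is found. */
    static boolean containsAlphanumericChar(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (isAlphanumeric(s.charAt(i))) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(containsAlphanumericChar("--x--")); // prints true
        System.out.println(containsAlphanumericChar("!? "));   // prints false
    }
}
```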
public static String stripNonAlphabetic(String s)
s - is the string from which all non-alphabetic characters should be stripped.

public static String stripNonAlphanumeric(String s)
s - is the string from which all non-alphanumeric characters should be stripped.

public static String stripWhitespace(String s)
s - is the string from which all whitespace should be stripped.

public static int firstWhitespaceIndex(String s)
s - is the string of which the first whitespace position should be returned.

public static Set<DictionaryAlgorithms.WordPosition> getWords(String s, boolean onlyStrict)
s - is the string from which the words should be extracted.
onlyStrict - if true, only words separated by whitespace are returned; otherwise both words separated by whitespace and words separated by special characters are returned.

public static void purgeWordsCache()
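The interplay between getWords(String, boolean) and purgeWordsCache() can be pictured with the sketch below. Everything in it is an assumption made for illustration: the two regular expressions stand in for the undocumented WORD and WORD_STRICT patterns, plain strings stand in for WordPosition, and the cache layout (one map keyed by mode and input) is invented. The class name WordCacheSketch is hypothetical.

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration; NOT the DictionaryAlgorithms implementation.
final class WordCacheSketch {

    // Assumed stand-ins for the real, undocumented patterns.
    private static final Pattern WORD_STRICT = Pattern.compile("\\S+");          // whitespace-delimited
    private static final Pattern WORD = Pattern.compile("[\\p{L}\\p{N}]+");      // letter/digit runs

    // Assumed cache layout: results keyed by mode prefix plus input string.
    private static final Map<String, Set<String>> CACHE = new ConcurrentHashMap<>();

    static Set<String> getWords(String s, boolean onlyStrict) {
        String key = (onlyStrict ? "S:" : "A:") + s;
        return CACHE.computeIfAbsent(key, k -> extract(s, onlyStrict));
    }

    private static Set<String> extract(String s, boolean onlyStrict) {
        Set<String> words = new LinkedHashSet<>();
        collect(WORD_STRICT, s, words);
        if (!onlyStrict) {
            // Non-strict mode additionally splits at special characters.
            collect(WORD, s, words);
        }
        return Collections.unmodifiableSet(words);
    }

    private static void collect(Pattern pattern, String s, Set<String> out) {
        Matcher m = pattern.matcher(s);
        while (m.find()) {
            out.add(m.group());
        }
    }

    /** Drops every cached result; later calls recompute from scratch. */
    static void purgeWordsCache() {
        CACHE.clear();
    }

    static int cacheSize() {
        return CACHE.size();
    }

    public static void main(String[] args) {
        System.out.println(getWords("foo-bar baz", true));  // prints [foo-bar, baz]
        System.out.println(getWords("foo-bar baz", false)); // prints [foo-bar, baz, foo, bar]
        purgeWordsCache();
        System.out.println(cacheSize());                    // prints 0
    }
}
```

A cache of this kind trades memory for speed when the same strings are tokenized repeatedly, which is presumably why an explicit purge method is exposed.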
public static Set<DictionaryAlgorithms.WordPosition> getWords(String s)
s - is the string from which the words should be extracted.

public static String normalizeWord(String s, ComparisonMode mode)
s - is the input string (should be a single word).
mode - is the comparison mode for which the node name should be normalized.

public static boolean isStopWord(String word)
word - is the word to be checked; must be lowercase.

public static String wordStem(String word)
word - is the word of which the word stem should be calculated.

public static int editDistance(String w1, String w2)
w1 - is the first word.
w2 - is the second word.

public static float matchProbability(String w1, String w2)
w1 - is the first word.
w2 - is the second word.

public static String[] calculateWordKeys(String word, ComparisonMode targetMode)
word - is the word for which the keys shall be calculated.
targetMode - specifies the target mode for which the keys are generated. Only keys for modes that are at least as general as the given mode are generated.

public static String[] calculateWordKeys(String word)
word - is the word for which the keys shall be calculated.

Copyright (C) 2013, 2014 Raphael Dickfelder, Jan Göpfert, Benjamin Paassen, Andreas Stöckel, licensed under the AGPL v. 3: http://openresearch.cit-ec.de/projects/scie