public final class StringUtil extends Object
Modifier and Type | Field and Description |
---|---|
static char | CHAR_NON_BREAKING_SPACE: nbsp (non-breaking space) character, decimal 160 = hex A0 |
Modifier and Type | Method and Description |
---|---|
static String | camelCaseToDisplayable(String camelCaseString, boolean dashesToSpaces): Convert a string from camel-case "identifier" style to a human-readable version by putting spaces between words, uppercasing the first letter and lowercasing the rest. |
static String | escapeRegexCharacters(String termStr): Escape regex special characters. (Pattern.quote() also does this, but this method is needed if you use a regex engine other than Java's, such as Lucene's.) |
static String | normalizeWhitespace(String s): Replace each run of adjacent whitespace characters with a single space. |
static String | ordinal(int docNumber): For a number n, return a string like "nth". |
static String | stripAccents(String input): Remove diacritics (~= accents) from a string. |
static String | trimWhitespaceAndPunctuation(String input): Remove any punctuation and whitespace at the start and end of input. |
static String | wildcardToRegex(String wildcard): Convert a wildcard string to a regex string. |
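The summary does not define the wildcard syntax accepted by wildcardToRegex. The following is a minimal sketch under stated assumptions: '*' matches any run of characters (possibly empty), '?' matches exactly one character, and every other regex metacharacter is escaped with a backslash so it matches literally. The class name WildcardToRegexSketch and the escaped character set are hypothetical, and the result is left unanchored on the assumption that the caller matches the whole string, as String.matches does.

```java
/** Hypothetical sketch of wildcardToRegex; the wildcard semantics are assumed, not taken from the library. */
final class WildcardToRegexSketch {
    private WildcardToRegexSketch() {}

    // Regex metacharacters escaped so they match literally (assumed set).
    private static final String SPECIAL = "\\^$.|?*+()[]{}";

    static String wildcardToRegex(String wildcard) {
        StringBuilder regex = new StringBuilder(wildcard.length() * 2);
        for (int i = 0; i < wildcard.length(); i++) {
            char c = wildcard.charAt(i);
            if (c == '*') {
                regex.append(".*");      // assumed: any run of characters, possibly empty
            } else if (c == '?') {
                regex.append('.');       // assumed: exactly one character
            } else {
                if (SPECIAL.indexOf(c) >= 0) {
                    regex.append('\\');  // escape the metacharacter so it is literal
                }
                regex.append(c);
            }
        }
        return regex.toString();
    }
}
```

With these assumptions, wildcardToRegex("report-?.txt") yields a pattern that matches "report-1.txt" but not "report-12.txt", because the dot is escaped and '?' stands for a single character.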
public static final char CHAR_NON_BREAKING_SPACE
public static String escapeRegexCharacters(String termStr)
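Pattern.quote() wraps its argument in \Q...\E, a Java construct that other regex engines may not understand, which is why a character-by-character escape is useful. A minimal sketch of that idea, assuming a hypothetical class name EscapeRegexSketch and an assumed set of special characters: Java's regex metacharacters plus characters that Lucene's RegExp syntax can treat as operators (such as ", #, @, &, <, > and ~). Java treats a backslash before any non-alphabetic character as a literal escape, and Lucene's RegExp syntax also uses a backslash to escape a single character.

```java
/** Hypothetical sketch of escapeRegexCharacters; the escaped character set is an assumption. */
final class EscapeRegexSketch {
    private EscapeRegexSketch() {}

    // Java regex metacharacters plus extra characters reserved by Lucene's RegExp syntax (assumed set).
    private static final String SPECIAL = "\\^$.|?*+()[]{}\"#@&<>~";

    static String escapeRegexCharacters(String termStr) {
        StringBuilder out = new StringBuilder(termStr.length() * 2);
        for (int i = 0; i < termStr.length(); i++) {
            char c = termStr.charAt(i);
            if (SPECIAL.indexOf(c) >= 0) {
                out.append('\\');  // prefix a backslash so the character matches literally
            }
            out.append(c);
        }
        return out.toString();
    }
}
```

For example, escapeRegexCharacters("a.b*c") produces a\.b\*c, which matches only the literal text "a.b*c".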
termStr - the string to escape characters in

public static String normalizeWhitespace(String s)
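normalizeWhitespace replaces each run of adjacent whitespace with a single space. A minimal sketch with a precompiled pattern follows; the class name NormalizeWhitespaceSketch is hypothetical, and two behaviors are assumptions: leading and trailing runs are collapsed but not trimmed, and the non-breaking space (CHAR_NON_BREAKING_SPACE, U+00A0) counts as whitespace. Java's default \s matches only ASCII whitespace, so the sketch enables Pattern.UNICODE_CHARACTER_SET, under which \s follows the Unicode White_Space property, which includes U+00A0.

```java
import java.util.regex.Pattern;

/** Hypothetical sketch of normalizeWhitespace; nbsp handling and no trimming are assumptions. */
final class NormalizeWhitespaceSketch {
    private NormalizeWhitespaceSketch() {}

    // Compiled once; UNICODE_CHARACTER_SET makes \s match Unicode White_Space, non-breaking space included.
    private static final Pattern WHITESPACE_RUN =
            Pattern.compile("\\s+", Pattern.UNICODE_CHARACTER_SET);

    /** Replaces every run of whitespace characters with a single ASCII space. */
    static String normalizeWhitespace(String s) {
        return WHITESPACE_RUN.matcher(s).replaceAll(" ");
    }
}
```

Dropping the UNICODE_CHARACTER_SET flag would leave non-breaking spaces untouched.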
s - source string

public static String stripAccents(String input)
Removes diacritics (~= accents) from a string. The case will not be altered.
For instance, 'à' will be replaced by 'a'.
Note that ligatures will be left as is.
StringUtils.stripAccents(null) = null
StringUtils.stripAccents("") = ""
StringUtils.stripAccents("control") = "control"
StringUtils.stripAccents("éclair") = "eclair"

NOTE: this method was copied from Apache StringUtils. The only change is precompiling the regular expression for efficiency.
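Assuming the Normalizer-based approach of Apache Commons Lang's StringUtils.stripAccents, from which this method was copied, a minimal sketch decomposes the string to Unicode NFD and deletes the resulting combining marks with a precompiled pattern. The class name StripAccentsSketch is hypothetical. NFD does not decompose ligature-like letters such as 'æ', so they pass through unchanged, consistent with the note on ligatures.

```java
import java.text.Normalizer;
import java.util.regex.Pattern;

/** Hypothetical sketch of stripAccents using NFD decomposition. */
final class StripAccentsSketch {
    private StripAccentsSketch() {}

    // Combining marks (block U+0300 to U+036F), compiled once rather than on every call.
    private static final Pattern COMBINING_MARKS =
            Pattern.compile("\\p{InCombiningDiacriticalMarks}+");

    static String stripAccents(String input) {
        if (input == null) {
            return null;
        }
        // NFD splits a precomposed letter such as U+00E9 into 'e' plus U+0301 COMBINING ACUTE ACCENT.
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD);
        // Removing the marks leaves the base letters; their case is not changed.
        return COMBINING_MARKS.matcher(decomposed).replaceAll("");
    }
}
```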
input - the string to be stripped

public static String trimWhitespaceAndPunctuation(String input)
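trimWhitespaceAndPunctuation removes punctuation and whitespace only at the ends of the input, leaving interior characters alone. The source does not say which characters count as punctuation, so this minimal sketch assumes the seven Unicode punctuation categories reported by Character.getType (DASH_PUNCTUATION, OTHER_PUNCTUATION and so on), plus Character.isWhitespace and the non-breaking space. Under that assumption, symbols such as '$' or '+' are not punctuation and are kept. The class name TrimSketch is hypothetical.

```java
/** Hypothetical sketch of trimWhitespaceAndPunctuation; the character classes are assumptions. */
final class TrimSketch {
    private TrimSketch() {}

    // Assumed definition: Java whitespace, the non-breaking space, or any Unicode punctuation category.
    private static boolean isTrimmable(int cp) {
        if (Character.isWhitespace(cp) || cp == 0x00A0) {
            return true;
        }
        switch (Character.getType(cp)) {
            case Character.CONNECTOR_PUNCTUATION:
            case Character.DASH_PUNCTUATION:
            case Character.START_PUNCTUATION:
            case Character.END_PUNCTUATION:
            case Character.INITIAL_QUOTE_PUNCTUATION:
            case Character.FINAL_QUOTE_PUNCTUATION:
            case Character.OTHER_PUNCTUATION:
                return true;
            default:
                return false;
        }
    }

    static String trimWhitespaceAndPunctuation(String input) {
        int start = 0;
        int end = input.length();
        // Advance past trimmable code points at the start.
        while (start < end) {
            int cp = input.codePointAt(start);
            if (!isTrimmable(cp)) break;
            start += Character.charCount(cp);
        }
        // Step back past trimmable code points at the end.
        while (end > start) {
            int cp = input.codePointBefore(end);
            if (!isTrimmable(cp)) break;
            end -= Character.charCount(cp);
        }
        return input.substring(start, end);
    }
}
```

Working on code points rather than chars keeps surrogate pairs intact at either end.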
input - the input string

public static String camelCaseToDisplayable(String camelCaseString, boolean dashesToSpaces)
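camelCaseToDisplayable puts spaces between words, uppercases the first letter and lowercases the rest. The source does not define where a word boundary falls; this minimal sketch assumes one wherever a lowercase letter is followed by an uppercase letter, so an acronym such as "XMLParser" is not split. The class name CamelCaseSketch is hypothetical.

```java
/** Hypothetical sketch of camelCaseToDisplayable; the word-boundary rule is an assumption. */
final class CamelCaseSketch {
    private CamelCaseSketch() {}

    static String camelCaseToDisplayable(String camelCaseString, boolean dashesToSpaces) {
        StringBuilder out = new StringBuilder(camelCaseString.length() + 8);
        char prev = 0;
        for (int i = 0; i < camelCaseString.length(); i++) {
            char c = camelCaseString.charAt(i);
            if (dashesToSpaces && (c == '-' || c == '_')) {
                c = ' ';                       // dashes and underscores become spaces
            } else if (Character.isUpperCase(c) && Character.isLowerCase(prev)) {
                out.append(' ');               // assumed boundary: lowercase followed by uppercase
            }
            out.append(Character.toLowerCase(c));  // lowercase everything first
            prev = c;
        }
        if (out.length() > 0) {
            out.setCharAt(0, Character.toUpperCase(out.charAt(0)));  // then uppercase the first letter
        }
        return out.toString();
    }
}
```

Under these assumptions, camelCaseToDisplayable("max-hits_perPage", true) gives "Max hits per page", while passing false keeps the dash and underscore: "Max-hits_per page".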
camelCaseString - a string in camel case, i.e. multiple capitalized words glued together.
dashesToSpaces - if true, also converts dashes and underscores to spaces

public static String ordinal(int docNumber)
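ordinal turns a number n into a string like "nth". A minimal sketch, assuming the usual English suffix rules: "st", "nd" and "rd" for numbers ending in 1, 2 and 3, except that numbers ending in 11, 12 or 13 take "th", as do all other numbers. The class name OrdinalSketch is hypothetical, and negative numbers are given the suffix of their absolute value.

```java
/** Hypothetical sketch of ordinal using English suffix rules. */
final class OrdinalSketch {
    private OrdinalSketch() {}

    static String ordinal(int docNumber) {
        int lastTwo = Math.abs(docNumber % 100);  // last two digits, sign removed
        String suffix;
        if (lastTwo >= 11 && lastTwo <= 13) {
            suffix = "th";                        // 11th, 12th, 13th, 111th, ...
        } else {
            switch (lastTwo % 10) {
                case 1: suffix = "st"; break;
                case 2: suffix = "nd"; break;
                case 3: suffix = "rd"; break;
                default: suffix = "th";
            }
        }
        return docNumber + suffix;
    }
}
```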
docNumber - number

Copyright © 2020 Instituut voor Nederlandse Taal (INT). All rights reserved.