Modifier and Type | Field and Description |
---|---|
protected static int | BYTES_PER_INT: Number of bytes per int. |
protected static Charset | DEFAULT_CHARSET |
protected static int | DEFAULT_MAX_MAP_SIZE: We set this to a lower value on Windows because we can't properly truncate the file while it is still mapped (there is no clean way to unmap a mapped file in Java, and Windows doesn't allow truncating a mapped file). |
static int | NO_TERM |
Constructor and Description |
---|
Terms() |
Modifier and Type | Method and Description |
---|---|
abstract void | clear(): Clear the Terms object. |
int | compareSortPosition(int termId1, int termId2, MatchSensitivity sensitivity): Compare two terms (from their term ids) based on their sort positions. |
int | deserializeToken(String term) |
abstract String | get(int id): Get a term by id. |
abstract int | idToSortPosition(int id, MatchSensitivity sensitivity): Get the sort position for a term based on its term id. |
abstract void | indexOf(org.eclipse.collections.api.set.primitive.MutableIntSet results, String term, MatchSensitivity sensitivity): Get the index number(s) of terms matching a string. |
abstract int | indexOf(String term): Get the existing index number of a term, or add it to the term list and assign it a new index number. |
void | initialize() |
abstract int | numberOfTerms() |
static Terms | openForReading(Collators collators, File termsFile, boolean useBlockBasedTermsFile, boolean buildTermIndexesOnInit) |
static Terms | openForWriting(Collators collators, File termsFile, boolean useBlockBasedTermsFile) |
protected IntBuffer | readFromFileChannel(FileChannel fc, long fileLength) |
String | serializeTerm(int valueTokenId) |
protected abstract void | setBlockBasedFile(boolean useBlockBasedTermsFile) |
abstract boolean | termsEqual(int[] termId, MatchSensitivity sensitivity) |
void | toSortOrder(int[] termId, int[] sortOrder, MatchSensitivity sensitivity): Convert an array of term ids to sort positions. |
abstract void | write(File termsFile): Write the terms file. |
public static final int NO_TERM
protected static final Charset DEFAULT_CHARSET
protected static final int DEFAULT_MAX_MAP_SIZE
protected static final int BYTES_PER_INT
public void initialize()
public abstract int indexOf(String term)
term - the term to get the index number for
public abstract void indexOf(org.eclipse.collections.api.set.primitive.MutableIntSet results, String term, MatchSensitivity sensitivity)
results - (out) index numbers for the matching term(s)
term - the term to get the index number for
sensitivity - compare sensitively? (case-sensitivity currently switches both case- and diacritics-sensitivity)
public abstract void clear()
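The two `indexOf` variants have different contracts: the single-argument form returns an existing index number or assigns a new one, while the three-argument form collects every index number that matches under the given sensitivity. The following is a minimal, self-contained sketch of that contract, not the BlackLab implementation. `TermListSketch` and `fold` are hypothetical names, a `boolean` stands in for `MatchSensitivity`, a `Set<Integer>` stands in for `MutableIntSet`, and insensitive matching is approximated by lowercasing and stripping diacritics rather than by using a collator.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for a terms list; illustrates the indexOf contracts only.
class TermListSketch {
    private final List<String> terms = new ArrayList<>();
    private final Map<String, Integer> exact = new HashMap<>();
    private final Map<String, List<Integer>> folded = new HashMap<>();

    // Fold case and diacritics together, mirroring the note that the
    // sensitivity setting currently switches both at once.
    static String fold(String s) {
        String decomposed = Normalizer.normalize(s, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}", "").toLowerCase(Locale.ROOT);
    }

    // Return the existing index number, or add the term and assign a new one.
    int indexOf(String term) {
        Integer id = exact.get(term);
        if (id != null) {
            return id;
        }
        int newId = terms.size();
        terms.add(term);
        exact.put(term, newId);
        folded.computeIfAbsent(fold(term), k -> new ArrayList<>()).add(newId);
        return newId;
    }

    // Collect the index numbers of all matching terms into results (out parameter).
    // This variant never adds a term.
    void indexOf(Set<Integer> results, String term, boolean sensitive) {
        if (sensitive) {
            Integer id = exact.get(term);
            if (id != null) {
                results.add(id);
            }
        } else {
            results.addAll(folded.getOrDefault(fold(term), Collections.emptyList()));
        }
    }

    String get(int id) {
        return terms.get(id);
    }

    int numberOfTerms() {
        return terms.size();
    }
}
```

In this sketch an insensitive lookup can return several index numbers for one string, which is why the results are passed as a set instead of being returned as a single int.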
protected IntBuffer readFromFileChannel(FileChannel fc, long fileLength) throws IOException
Throws: IOException
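The signature of `readFromFileChannel` suggests that it returns a file's contents as an `IntBuffer`. The sketch below shows one way such a read can work, under the assumption that the file is a plain sequence of 4-byte big-endian ints. `IntFileSketch`, `readInts` and `roundTrip` are hypothetical names, and the sketch copies the bytes into a heap buffer instead of memory-mapping them, so it sidesteps the Windows truncation problem described for `DEFAULT_MAX_MAP_SIZE`; whether the real method maps or copies is not stated here.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.IntBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper; not the BlackLab implementation.
class IntFileSketch {
    // Mirrors the documented constant: number of bytes per int.
    static final int BYTES_PER_INT = Integer.BYTES;

    // Read fileLength bytes from the channel's current position and view them as ints.
    // Assumption: fileLength fits in an int, so a single heap buffer is enough.
    static IntBuffer readInts(FileChannel fc, long fileLength) throws IOException {
        ByteBuffer bytes = ByteBuffer.allocate((int) fileLength);
        while (bytes.hasRemaining()) {
            if (fc.read(bytes) < 0) {
                break; // end of file reached before fileLength bytes were read
            }
        }
        bytes.flip();
        return bytes.asIntBuffer();
    }

    // Write the values to a temporary file and read them back through readInts.
    static int[] roundTrip(int[] values) {
        try {
            Path file = Files.createTempFile("terms-sketch", ".dat");
            try {
                ByteBuffer out = ByteBuffer.allocate(values.length * BYTES_PER_INT);
                out.asIntBuffer().put(values);
                Files.write(file, out.array());
                try (FileChannel fc = FileChannel.open(file, StandardOpenOption.READ)) {
                    IntBuffer ints = readInts(fc, fc.size());
                    int[] result = new int[ints.remaining()];
                    ints.get(result);
                    return result;
                }
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A memory-mapped variant would call `FileChannel.map(FileChannel.MapMode.READ_ONLY, 0, fileLength)` and then `asIntBuffer()` on the result, at the cost of the mapping restrictions noted above.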
public abstract void write(File termsFile)
termsFile - where to write the terms file
public abstract String get(int id)
id - the term id
public abstract int numberOfTerms()
public abstract int idToSortPosition(int id, MatchSensitivity sensitivity)
id - the term id
sensitivity - whether we want the sensitive or insensitive sort position
public void toSortOrder(int[] termId, int[] sortOrder, MatchSensitivity sensitivity)
termId - the term ids
sortOrder - the sort positions
sensitivity - whether we want the sensitive or insensitive sort positions
public int compareSortPosition(int termId1, int termId2, MatchSensitivity sensitivity)
termId1 - id of the first term
termId2 - id of the second term
sensitivity - whether we want to compare sensitively or insensitively
protected abstract void setBlockBasedFile(boolean useBlockBasedTermsFile)
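`idToSortPosition`, `toSortOrder` and `compareSortPosition` all work with sort positions: integers that can be compared instead of the term strings themselves. The sketch below illustrates the idea with `java.text.Collator`. It is not the BlackLab implementation: `SortPositionSketch` is a hypothetical class, a `boolean` stands in for `MatchSensitivity`, the English locale is an arbitrary choice, and it assumes that terms the collator considers equal share one sort position.

```java
import java.text.Collator;
import java.util.Arrays;
import java.util.Locale;

// Hypothetical class; precomputes one sort position per term id and sensitivity.
class SortPositionSketch {
    private final int[] sensitivePos;
    private final int[] insensitivePos;

    SortPositionSketch(String[] terms) {
        Collator sensitive = Collator.getInstance(Locale.ENGLISH);
        sensitive.setStrength(Collator.TERTIARY); // distinguishes case and diacritics
        Collator insensitive = Collator.getInstance(Locale.ENGLISH);
        insensitive.setStrength(Collator.PRIMARY); // ignores case and diacritics
        sensitivePos = sortPositions(terms, sensitive);
        insensitivePos = sortPositions(terms, insensitive);
    }

    // Sort the term ids with the collator, then give each id the rank of the
    // first term in its group of collator-equal terms.
    private static int[] sortPositions(String[] terms, Collator collator) {
        Integer[] ids = new Integer[terms.length];
        for (int i = 0; i < ids.length; i++) {
            ids[i] = i;
        }
        Arrays.sort(ids, (a, b) -> collator.compare(terms[a], terms[b]));
        int[] pos = new int[terms.length];
        int rank = 0;
        for (int i = 0; i < ids.length; i++) {
            if (i > 0 && collator.compare(terms[ids[i - 1]], terms[ids[i]]) != 0) {
                rank = i;
            }
            pos[ids[i]] = rank;
        }
        return pos;
    }

    int idToSortPosition(int id, boolean sensitive) {
        return sensitive ? sensitivePos[id] : insensitivePos[id];
    }

    // Fill sortOrder with the sort position of each term id in termId.
    void toSortOrder(int[] termId, int[] sortOrder, boolean sensitive) {
        for (int i = 0; i < termId.length; i++) {
            sortOrder[i] = idToSortPosition(termId[i], sensitive);
        }
    }

    int compareSortPosition(int termId1, int termId2, boolean sensitive) {
        return Integer.compare(idToSortPosition(termId1, sensitive),
                idToSortPosition(termId2, sensitive));
    }
}
```

Precomputing the positions means that comparing two terms at query time is a single integer comparison, with no collator call and no string lookup.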
public static Terms openForReading(Collators collators, File termsFile, boolean useBlockBasedTermsFile, boolean buildTermIndexesOnInit)
public static Terms openForWriting(Collators collators, File termsFile, boolean useBlockBasedTermsFile)
public abstract boolean termsEqual(int[] termId, MatchSensitivity sensitivity)
public int deserializeToken(String term)
public String serializeTerm(int valueTokenId)
Copyright © 2020 Instituut voor Nederlandse Taal (INT). All rights reserved.