public abstract class AnnotationForwardIndex extends Object
Modifier and Type | Class and Description |
---|---|
static class | AnnotationForwardIndex.CollatorVersion: Different versions of insensitive collator |
static interface | AnnotationForwardIndex.ForwardIndexDocTask: A task to perform on a document in the forward index. |
Modifier and Type | Field and Description |
---|---|
protected boolean | initialized: Has the tokens file been mapped? |
Constructor and Description |
---|
AnnotationForwardIndex(Annotation annotation, File dir, Collators collators, boolean largeTermsFileSupport) |
Modifier and Type | Method and Description |
---|---|
int | addDocument(List<String> content): Store the given content and assign an id to it. |
abstract int | addDocument(List<String> content, List<Integer> posIncr): Store the given content and assign an id to it. |
Annotation | annotation(): The annotation for which this is the forward index. |
boolean | canDoNfaMatching() |
abstract void | close(): Close the forward index. |
abstract void | deleteDocument(int fiid): Delete a document from the forward index. |
void | deleteDocumentByLuceneDoc(org.apache.lucene.document.Document d) |
abstract int | docLength(int fiid): Gets the length (in tokens) of a document. |
void | forEachDocument(AnnotationForwardIndex.ForwardIndexDocTask task): Perform a task on each document in the forward index. |
abstract int | freeBlocks() |
abstract long | freeSpace() |
int | getToken(int fiid, int pos) |
abstract Set<Integer> | idSet() |
void | initialize() |
abstract int | numDocs() |
static AnnotationForwardIndex | open(File dir, boolean indexMode, Collator collator, boolean create, Annotation annotation, boolean buildTermIndexesOnInit): Open a forward index. |
abstract List<int[]> | retrievePartsInt(int fiid, int[] start, int[] end): Retrieve one or more parts from the specified content, in the form of token ids. |
protected void | setLargeTermsFileSupport(boolean b) |
Terms | terms(): Get the Terms object in order to translate ids to token strings. |
String | toString() |
long | totalSize() |
public AnnotationForwardIndex(Annotation annotation, File dir, Collators collators, boolean largeTermsFileSupport)
public static AnnotationForwardIndex open(File dir, boolean indexMode, Collator collator, boolean create, Annotation annotation, boolean buildTermIndexesOnInit)
dir - forward index directory
indexMode - true iff we're in index mode (writing to the forward index); otherwise it will be read-only.
collator - collator to use for sorting
create - if true, create a new forward index
annotation - annotation for which this is the forward index, or null if we don't know (yet)
fiidLookup - how to look up fiid given docId
buildTermIndexesOnInit - whether to build term indexes right away or lazily
public void initialize()
public abstract void close()
public abstract int addDocument(List<String> content, List<Integer> posIncr)
content - the content to store
posIncr - the associated position increments, or null if position increment is always 1.
public int addDocument(List<String> content)
content - the content to store
public abstract void deleteDocument(int fiid)
fiid - id of the document to delete
public void deleteDocumentByLuceneDoc(org.apache.lucene.document.Document d)
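The addDocument and deleteDocument contracts above can be pictured with a small in-memory model. The sketch below is not the library's implementation: `ToyForwardIndex`, `positionsOf` and the sequential id assignment are hypothetical, and the one-argument overload delegating with a null `posIncr` is an assumption based on the parameter description. Token positions follow the Lucene convention in which each increment is added to a running position that starts at -1.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory stand-in for illustration only; not the real forward index.
class ToyForwardIndex {
    // fiid -> token strings (a real forward index would store term ids on disk)
    private final Map<Integer, List<String>> docs = new LinkedHashMap<>();
    // fiid -> position of each token, derived from the position increments
    private final Map<Integer, List<Integer>> positions = new LinkedHashMap<>();
    private int nextFiid = 0; // assumption: ids are handed out sequentially

    /** Stores the content and returns its id; posIncr may be null (every increment is 1). */
    int addDocument(List<String> content, List<Integer> posIncr) {
        if (posIncr != null && posIncr.size() != content.size())
            throw new IllegalArgumentException("posIncr must have one entry per token");
        List<Integer> pos = new ArrayList<>();
        int current = -1; // Lucene-style: a first token with increment 1 lands on position 0
        for (int i = 0; i < content.size(); i++) {
            current += (posIncr == null) ? 1 : posIncr.get(i);
            pos.add(current);
        }
        int fiid = nextFiid++;
        docs.put(fiid, new ArrayList<>(content));
        positions.put(fiid, pos);
        return fiid;
    }

    /** Convenience overload (assumed behaviour): all position increments are 1. */
    int addDocument(List<String> content) {
        return addDocument(content, null);
    }

    /** Removes the document with the given forward index id, if present. */
    void deleteDocument(int fiid) {
        docs.remove(fiid);
        positions.remove(fiid);
    }

    int numDocs() {
        return docs.size();
    }

    /** Hypothetical helper exposing the computed token positions. */
    List<Integer> positionsOf(int fiid) {
        return positions.get(fiid);
    }
}
```

In this model an increment of 0 places a token at the same position as the previous one, which is how alternative tokens at one position are usually expressed in Lucene token streams.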
public abstract List<int[]> retrievePartsInt(int fiid, int[] start, int[] end)
fiid - forward index document id
start - the starting points of the parts to retrieve (in words) (-1 for start of document)
end - the end points (i.e. first token beyond) of the parts to retrieve (in words) (-1 for end of document)
public Terms terms()
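As a rough illustration of the start/end conventions of retrievePartsInt (end is exclusive, -1 means start or end of the document) and of translating the resulting ids back to strings, here is a minimal sketch. `PartsSketch` is hypothetical: it works on a plain `int[]` of token ids rather than a stored document, and a `String[]` stands in for the Terms object, whose actual methods are not shown on this page.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the start/end conventions; not the real implementation.
class PartsSketch {
    /** Returns the token ids in each range [start[i], end[i]); -1 means document start/end. */
    static List<int[]> retrievePartsInt(int[] docTokenIds, int[] start, int[] end) {
        if (start.length != end.length)
            throw new IllegalArgumentException("start and end must have the same length");
        List<int[]> parts = new ArrayList<>();
        for (int i = 0; i < start.length; i++) {
            int from = (start[i] == -1) ? 0 : start[i];
            int to = (end[i] == -1) ? docTokenIds.length : end[i];
            parts.add(Arrays.copyOfRange(docTokenIds, from, to));
        }
        return parts;
    }

    /** Translates token ids to strings; the array is a stand-in for a Terms lookup. */
    static List<String> toStrings(int[] ids, String[] terms) {
        List<String> out = new ArrayList<>();
        for (int id : ids)
            out.add(terms[id]);
        return out;
    }

    public static void main(String[] args) {
        String[] terms = {"the", "cat", "sat", "on"};
        int[] doc = {0, 1, 2, 3, 0}; // the cat sat on the
        List<int[]> parts = retrievePartsInt(doc, new int[] {-1, 3}, new int[] {2, -1});
        for (int[] part : parts)
            System.out.println(toStrings(part, terms));
    }
}
```

Requesting several ranges in one call, as the array parameters allow, lets a caller fetch multiple snippets from the same document without repeating the document lookup.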
public abstract int numDocs()
public abstract long freeSpace()
public abstract int freeBlocks()
public long totalSize()
public abstract int docLength(int fiid)
fiid - forward index id of a document
protected void setLargeTermsFileSupport(boolean b)
public void forEachDocument(AnnotationForwardIndex.ForwardIndexDocTask task)
task - the task to perform
public int getToken(int fiid, int pos)
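The callback pattern behind forEachDocument, together with single-token access as in getToken, might look roughly like the sketch below. Everything here is hypothetical: the `DocTask` interface and its `perform(int fiid, int[] tokenIds)` signature are assumptions (the method of ForwardIndexDocTask is not listed on this page), and `ForEachSketch` keeps documents in a map purely for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of the visitor-style API; not the real implementation.
class ForEachSketch {
    /** Assumed callback shape; the real ForwardIndexDocTask signature may differ. */
    interface DocTask {
        void perform(int fiid, int[] tokenIds);
    }

    // fiid -> token ids of that document
    private final Map<Integer, int[]> docs = new LinkedHashMap<>();

    void put(int fiid, int[] tokenIds) {
        docs.put(fiid, tokenIds);
    }

    /** Calls the task once per stored document. */
    void forEachDocument(DocTask task) {
        for (Map.Entry<Integer, int[]> e : docs.entrySet())
            task.perform(e.getKey(), e.getValue());
    }

    /** Returns the token id at position pos of document fiid. */
    int getToken(int fiid, int pos) {
        return docs.get(fiid)[pos];
    }

    /** Example task: sum the lengths of all documents. */
    static long totalTokens(ForEachSketch index) {
        long[] total = {0}; // single-element array so the lambda can update it
        index.forEachDocument((fiid, ids) -> total[0] += ids.length);
        return total[0];
    }
}
```

Passing a task object keeps the iteration order and storage details inside the index, so callers only describe what to do with each document.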
public Annotation annotation()
public boolean canDoNfaMatching()
Copyright © 2020 Instituut voor Nederlandse Taal (INT). All rights reserved.