public class DocIndexerPlainTextBasic extends DocIndexerAbstract
Fields inherited from class DocIndexerAbstract
nDocumentsSkipped, reader, skippingCurrentDocument, wordsDone
Fields inherited from class DocIndexer
currentLuceneDoc, documentName, docWriter, logger, MAX_DOCVALUES_LENGTH, metadataFieldValues, omitNorms, parameters
Constructor and Description |
---|
DocIndexerPlainTextBasic(DocWriter indexer, String fileName, Reader reader) |
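The constructor takes a `fileName` and a `Reader`, so the caller is responsible for opening the input. The following is a minimal sketch of preparing those two arguments with the standard library only. The class `PlainTextReaderSketch` and its helper methods are hypothetical names introduced for illustration, UTF-8 is an assumed encoding, and the `DocWriter` (shown as `docWriter` in the comment) is assumed to be obtained elsewhere.

```java
import java.io.IOException;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper class; not part of BlackLab.
class PlainTextReaderSketch {

    /** Opens a plain-text file as a Reader, assuming UTF-8; the caller closes it. */
    static Reader openUtf8(Path file) throws IOException {
        return Files.newBufferedReader(file, StandardCharsets.UTF_8);
    }

    /** Derives a value for the fileName argument from the last path element. */
    static String fileNameOf(Path file) {
        return file.getFileName().toString();
    }

    public static void main(String[] args) throws IOException {
        // Create a small temporary input file so the sketch runs on its own.
        Path file = Files.createTempFile("sample", ".txt");
        Files.write(file, "The quick brown fox.".getBytes(StandardCharsets.UTF_8));
        try (Reader reader = openUtf8(file)) {
            String fileName = fileNameOf(file);
            // With BlackLab on the classpath and a DocWriter available
            // (assumed here as docWriter), the call would look like:
            //   new DocIndexerPlainTextBasic(docWriter, fileName, reader).index();
            System.out.println("Ready to index " + fileName);
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```

Passing an explicit charset avoids depending on the platform default encoding, which matters when the same corpus is indexed on different machines.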
Modifier and Type | Method and Description |
---|---|
AnnotationWriter | addAnnotation(String propName, AnnotationWriter.SensitivitySetting sensitivity) |
AnnotatedFieldWriter | getContentsField() |
AnnotationWriter | getMainAnnotation() |
AnnotationWriter | getPropPunct() |
int | getWordPosition() Returns the current word in the content. |
void | index() Index documents contained in a file. |
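The methods above expose a main annotation, a punctuation annotation, and a running word position. As a conceptual illustration only, the sketch below shows one way a plain-text indexer could split input into words, record the text preceding each word, and count positions. It is not taken from the BlackLab source: the class `PlainTextTokenSketch`, the `Token` type, and the word pattern are all assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-alone sketch; it does not use or reproduce BlackLab classes.
class PlainTextTokenSketch {

    /** One token: the word, the text preceding it, and its 0-based word position. */
    static final class Token {
        final String word;
        final String punctBefore;
        final int position;

        Token(String word, String punctBefore, int position) {
            this.word = word;
            this.punctBefore = punctBefore;
            this.position = position;
        }
    }

    // Assumed word definition: runs of letters/digits, optionally joined by
    // an apostrophe or hyphen. Everything else counts as punctuation/whitespace.
    private static final Pattern WORD =
            Pattern.compile("[\\p{L}\\p{N}]+(?:['-][\\p{L}\\p{N}]+)*");

    /** Splits text into tokens, pairing each word with the text before it. */
    static List<Token> tokenize(String text) {
        List<Token> tokens = new ArrayList<>();
        Matcher m = WORD.matcher(text);
        int prevEnd = 0;       // end offset of the previous word
        int wordPosition = 0;  // analogous to a running word counter
        while (m.find()) {
            String punct = text.substring(prevEnd, m.start());
            tokens.add(new Token(m.group(), punct, wordPosition));
            wordPosition++;
            prevEnd = m.end();
        }
        return tokens;
    }

    public static void main(String[] args) {
        for (Token t : tokenize("Hello, world! It's fine.")) {
            System.out.println(t.position + ": [" + t.punctBefore + "] " + t.word);
        }
    }
}
```

Keeping the intervening text alongside each word is what lets the original content be reassembled from the tokens, which is the usual reason for storing punctuation as its own annotation.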
Methods inherited from class DocIndexerAbstract
appendContent, appendContent, close, getCharacterPosition, getDescription, getDisplayName, isVisible, processContent, processContent, reportCharsProcessed, reportTokensProcessed, setDocument, startCaptureContent, storeCapturedContent, storePartCapturedContent
Methods inherited from class DocIndexer
addMetadataField, addMetadataFieldsFromParameters, addMetadataToDocument, addNumericFields, addToForwardIndex, getCurrentLuceneDoc, getDocWriter, getMetadataField, getMetadataFieldTypeFromIndexerProperties, getParameter, getParameter, getParameter, getParameter, getSensitivitySetting, hasParameter, luceneTypeFromIndexMetadataType, optTranslateFieldName, setDocument, setDocument, setDocument, setDocumentName, setDocWriter, setOmitNorms, setParameter, setParameters, tokenizeField, warn
public AnnotationWriter getPropPunct()
public AnnotationWriter getMainAnnotation()
public AnnotatedFieldWriter getContentsField()
public int getWordPosition()
public AnnotationWriter addAnnotation(String propName, AnnotationWriter.SensitivitySetting sensitivity)
public void index() throws IOException, MalformedInputFile
index in class DocIndexer
Throws:
IOException - if an I/O error occurred
MalformedInputFile - if the input file wasn't valid
Copyright © 2020 Instituut voor Nederlandse Taal (INT). All rights reserved.