public class DocIndexerConvertAndTag extends DocIndexerConfig
This class shares the ConfigInputFormat object with the actual DocIndexer and should be considered an internal implementation detail of the DocIndexer system.
config
wordsDone
currentLuceneDoc, documentName, docWriter, logger, MAX_DOCVALUES_LENGTH, metadataFieldValues, omitNorms, parameters
Constructor and Description |
---|
DocIndexerConvertAndTag(DocIndexerConfig actualIndexer, ConfigInputFormat config) |
Modifier and Type | Method and Description |
---|---|
void | addMetadataField(String fieldName, String value) |
void | addNumericFields(Collection<String> fields) |
void | close() |
protected int | getCharacterPosition() |
org.apache.lucene.document.Document | getCurrentLuceneDoc() |
DocWriter | getDocWriter(): Returns our DocWriter object. |
void | index(): Index documents contained in a file. |
void | indexSpecificDocument(String documentExpr): Index a specific document. |
void | setConfigInputFormat(ConfigInputFormat config) |
void | setDocument(InputStream is, Charset cs): Set the document to index. |
void | setDocument(Reader reader): Use setDocument(InputStream, Charset) if at all possible. |
void | setDocWriter(DocWriter indexer): Set the DocWriter object. |
void | setOmitNorms(boolean b): Enables or disables norms. |
boolean | shouldAddDefaultPunctuation() |
protected void | storeDocument(): Store (or finish storing) the document in the content store. |
fromConfig, getSensitivitySetting, init, opChatFormatAgeToMonths, optTranslateFieldName, processLinkedDocument, processMetadataValue, processString, processStringMultipleValues, replaceDollarRefs
addAnnotatedField, addEndChar, addStartChar, annotation, beginWord, dedupe, endDocument, endWord, getAnnotatedField, getAnnotatedFields, getAnnotation, getContentStoreName, getCurrentTokenPosition, getMainAnnotatedField, getMetadataFetcher, indexLinkedDocument, inlineTag, isStoreDocuments, propMain, propPunct, propTags, punctuation, reportCharsProcessed, reportTokensProcessed, resolveFileReference, setAddDefaultPunctuation, setCurrentAnnotatedFieldName, setPreventNextDefaultPunctuation, setStoreDocuments, startDocument, storeWholeDocument, trace, traceln
addMetadataFieldsFromParameters, addMetadataToDocument, addToForwardIndex, getMetadataField, getMetadataFieldTypeFromIndexerProperties, getParameter, getParameter, getParameter, getParameter, getSensitivitySetting, hasParameter, luceneTypeFromIndexMetadataType, setDocument, setDocument, setDocumentName, setParameter, setParameters, tokenizeField, warn
public DocIndexerConvertAndTag(DocIndexerConfig actualIndexer, ConfigInputFormat config)
public void close() throws BlackLabRuntimeException
close in interface AutoCloseable
close in class DocIndexer
Throws:
BlackLabRuntimeException
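Because close() comes from AutoCloseable, an indexer can be managed with try-with-resources. The sketch below illustrates only that language mechanism. StandInIndexer is a hypothetical stand-in, not a BlackLab class, and it assumes (from the name alone) that BlackLabRuntimeException is unchecked, so its close() declares no checked exception.

```java
// Hypothetical stand-in for illustration only; this is not a BlackLab class.
class CloseSketch {

    static class StandInIndexer implements AutoCloseable {
        boolean closed = false;

        void index() {
            // indexing work would happen here
        }

        @Override
        public void close() {
            // Declares no checked exception, so callers need no catch block.
            closed = true;
        }
    }

    /** Runs the stand-in inside try-with-resources and reports whether close() ran. */
    static boolean run() {
        StandInIndexer indexer = new StandInIndexer();
        try (StandInIndexer resource = indexer) {
            resource.index();
        }
        // close() has been called automatically by the time the try block exits.
        return indexer.closed;
    }

    public static void main(String[] args) {
        System.out.println("closed after try block: " + run());
    }
}
```

The same pattern applies to any AutoCloseable: close() runs even if the body of the try block throws.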
public void setDocument(Reader reader)
Use setDocument(InputStream, Charset) if at all possible.
setDocument in class DocIndexer
Parameters:
reader - document
public void setDocument(InputStream is, Charset cs)
Description copied from class: DocIndexer
Set the document to index.
setDocument in class DocIndexer
Parameters:
is - document contents
cs - charset to use if no BOM found, or null for the default (utf-8)
public void index() throws PluginException, MalformedInputFile, IOException
Description copied from class: DocIndexer
Index documents contained in a file.
index in class DocIndexerConfig
Throws:
PluginException - if an error occurred in a plugin
MalformedInputFile - if the input file wasn't valid
IOException - if an I/O error occurred
protected int getCharacterPosition()
getCharacterPosition in class DocIndexer
protected void storeDocument()
Description copied from class: DocIndexerBase
Store (or finish storing) the document in the content store.
storeDocument in class DocIndexerBase
public void setDocWriter(DocWriter indexer)
Description copied from class: DocIndexer
Set the DocWriter object.
setDocWriter in class DocIndexer
Parameters:
indexer - our DocWriter object
public void addMetadataField(String fieldName, String value)
addMetadataField in class DocIndexerBase
public void addNumericFields(Collection<String> fields)
addNumericFields in class DocIndexer
public org.apache.lucene.document.Document getCurrentLuceneDoc()
getCurrentLuceneDoc in class DocIndexer
public DocWriter getDocWriter()
Description copied from class: DocIndexer
Returns our DocWriter object.
getDocWriter in class DocIndexer
public void indexSpecificDocument(String documentExpr)
Description copied from class: DocIndexerBase
Index a specific document.
indexSpecificDocument in class DocIndexerConfig
Parameters:
documentExpr - Expression (e.g. XPath) used to find the document to index in the file
public void setConfigInputFormat(ConfigInputFormat config)
setConfigInputFormat in class DocIndexerConfig
public void setOmitNorms(boolean b)
Description copied from class: DocIndexer
Enables or disables norms.
setOmitNorms in class DocIndexer
Parameters:
b - if true, doesn't store norms; if false, does store norms
public boolean shouldAddDefaultPunctuation()
shouldAddDefaultPunctuation in class DocIndexerBase
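The cs parameter of setDocument(InputStream, Charset) is documented above as the charset to use if no BOM is found, or null for the default (utf-8). The following is a minimal sketch of that fallback rule using only the Java standard library. It is not BlackLab's implementation: the class BomCharsetSketch and its methods open and readAll are hypothetical names, and the sketch recognizes only UTF-8 and UTF-16 BOMs.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.PushbackInputStream;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; not part of BlackLab.
class BomCharsetSketch {

    /**
     * Opens a Reader over the stream. A UTF-8 or UTF-16 BOM decides the charset
     * if present; otherwise the fallback is used, or UTF-8 when fallback is null.
     */
    static Reader open(InputStream is, Charset fallback) throws IOException {
        PushbackInputStream in = new PushbackInputStream(is, 3);
        byte[] head = new byte[3];
        int n = 0;
        // Read up to three bytes; a short stream simply yields fewer.
        while (n < head.length) {
            int r = in.read(head, n, head.length - n);
            if (r < 0) break;
            n += r;
        }
        Charset charset = (fallback != null) ? fallback : StandardCharsets.UTF_8;
        int bomLength = 0;
        if (n >= 3 && (head[0] & 0xFF) == 0xEF && (head[1] & 0xFF) == 0xBB
                && (head[2] & 0xFF) == 0xBF) {
            charset = StandardCharsets.UTF_8;
            bomLength = 3;
        } else if (n >= 2 && (head[0] & 0xFF) == 0xFE && (head[1] & 0xFF) == 0xFF) {
            charset = StandardCharsets.UTF_16BE;
            bomLength = 2;
        } else if (n >= 2 && (head[0] & 0xFF) == 0xFF && (head[1] & 0xFF) == 0xFE) {
            charset = StandardCharsets.UTF_16LE;
            bomLength = 2;
        }
        // Push back everything that was read past the BOM so no content is lost.
        in.unread(head, bomLength, n - bomLength);
        return new InputStreamReader(in, charset);
    }

    /** Reads the whole Reader into a String. */
    static String readAll(Reader reader) throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[1024];
        int k;
        while ((k = reader.read(buf)) != -1) {
            sb.append(buf, 0, k);
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // The UTF-8 BOM takes precedence over the ISO-8859-1 fallback.
        byte[] withBom = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'h', 'i'};
        System.out.println(
                readAll(open(new ByteArrayInputStream(withBom), StandardCharsets.ISO_8859_1)));
    }
}
```

Passing raw bytes plus a fallback charset lets the decoder inspect the BOM, which is not possible once the input has already been wrapped in a Reader. That is one plausible reason to prefer the InputStream overload, though the source does not state its rationale.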
Copyright © 2020 Instituut voor Nederlandse Taal (INT). All rights reserved.