- All Implemented Interfaces:
- Closeable, AutoCloseable
public class BLDutchTokenizer
extends org.apache.lucene.analysis.util.CharTokenizer
A simple tokenizer for Dutch texts. It works like a whitespace tokenizer,
except that a few punctuation characters are treated as exceptions and kept
as part of tokens.
These are the exceptions:
* apostrophes (e.g. zo'n, da's; apostrophes at the beginning or end of a
  token will be filtered out later)
* dashes (e.g. ex-man, multi-)
* periods (e.g. a.u.b., N.B.; will be filtered out later)
* parens and brackets (e.g. bel(len), (pre)cursor; will be filtered out later)
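
The rules above can be illustrated with a small standalone sketch. It is not the actual BLDutchTokenizer source and does not use Lucene: the class name DutchTokenSketch and its methods are hypothetical, and it assumes that letters, digits and the listed punctuation characters belong to tokens while whitespace and all other characters act as separators. The later filtering step mentioned above is not modelled.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not the real BLDutchTokenizer implementation.
public class DutchTokenSketch {

    // Assumption: letters, digits and the documented exception characters
    // are token characters; everything else (including whitespace) separates tokens.
    static boolean isTokenChar(int c) {
        if (Character.isLetterOrDigit(c)) {
            return true;
        }
        switch (c) {
            case '\'': // apostrophes: zo'n, da's
            case '-':  // dashes: ex-man, multi-
            case '.':  // periods: a.u.b., N.B.
            case '(':  // parens: bel(len), (pre)cursor
            case ')':
            case '[':  // brackets
            case ']':
                return true;
            default:
                return false;
        }
    }

    // Splits text into maximal runs of token characters.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int c = text.codePointAt(i);
            i += Character.charCount(c);
            if (isTokenChar(c)) {
                current.appendCodePoint(c);
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints [Da's, zo'n, ex-man, a.u.b., bel(len)]
        System.out.println(tokenize("Da's zo'n ex-man, a.u.b. bel(len)!"));
    }
}
```

In a real CharTokenizer subclass, a predicate like isTokenChar would be supplied by overriding the protected isTokenChar(int) method, and the base class would handle the splitting.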