# Term frequencies

Returns frequencies per term, sorted by descending frequency.

URL: `/blacklab-server/<corpus-name>/termfreq`

Method: `GET`
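The endpoint is a plain `GET` with its options passed as query parameters (see Parameters). The sketch below only composes the request URL. It assumes a hypothetical server at `http://localhost:8080`, a hypothetical corpus named `mycorpus` and a hypothetical `lemma` annotation; the helper `termfreq_url` is illustrative and not part of BlackLab.

```python
from urllib.parse import quote, urlencode

# Hypothetical server location; adjust to wherever BlackLab Server is deployed.
BASE_URL = "http://localhost:8080/blacklab-server"


def termfreq_url(corpus: str, **params) -> str:
    """Build the GET URL for a corpus's termfreq endpoint (illustrative helper).

    Parameters set to None are omitted so the server defaults apply;
    booleans are sent as lowercase "true"/"false".
    """
    path = f"{BASE_URL}/{quote(corpus, safe='')}/termfreq"
    query = {
        name: str(value).lower() if isinstance(value, bool) else value
        for name, value in params.items()
        if value is not None
    }
    return f"{path}?{urlencode(query)}" if query else path


print(termfreq_url("mycorpus", annotation="lemma", number=-1))
# → http://localhost:8080/blacklab-server/mycorpus/termfreq?annotation=lemma&number=-1
```

Note that `urlencode` percent-encodes reserved characters, so a `terms` value such as `het,de` is sent as `het%2Cde`, which the server decodes back to the comma-separated list.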

# Parameters

| Parameter | Description |
|---|---|
| `annotation` | Annotation to get term frequencies for. Default: main annotation (usually `word`) |
| `sensitive` | Whether to list terms case- and diacritics-sensitively. If not (the default), capital letters and diacritics are ignored when counting frequencies, so *Het*, *hét* and *het* are lumped together and the total is reported as *het*. Default: `false` |
| `first` | First result (0-based) to return. Use this to get a page of results from the total set. Default: `0` |
| `number` | Maximum number of results to return. Default: `20`. NOTE: this value is limited by the `parameters.pageSize.max` setting in `blacklab-server.yaml`. Pass `-1` to get the maximum allowed. |
| `filter` | Lucene Query Language document filter query |
| `terms` | Comma-separated list of terms for which to get the frequencies. Default: all terms |
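To illustrate what `sensitive=false` does to the counts, here is a rough approximation of the folding: decompose accented characters, drop the combining marks, then lowercase. This is an assumption made for illustration only, not BlackLab's actual normalization code, which may treat some characters differently; `desensitize` and `insensitive_freqs` are hypothetical names.

```python
import unicodedata
from collections import Counter


def desensitize(term: str) -> str:
    """Approximate case- and diacritics-insensitive form of a term (illustrative only)."""
    # NFD splits e.g. "é" into "e" followed by a combining acute accent.
    decomposed = unicodedata.normalize("NFD", term)
    # Drop the combining marks, keeping only the base characters.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.lower()


def insensitive_freqs(tokens):
    """Count tokens after folding, mimicking the effect of sensitive=false."""
    return Counter(desensitize(t) for t in tokens)


print(insensitive_freqs(["Het", "hét", "het", "en"]))
# → Counter({'het': 3, 'en': 1})
```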

NOTE: this operation always has to determine the frequencies of all terms, even if it only returns one page. Hence, unlike some other operations, it has no `waitfortotal` parameter (you always have to wait). Results are cached, though, so after the first page has been returned, paging through the rest of the results with further requests should be fast.
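Because results are cached after the first request, a client can page through all terms by increasing `first` by `number` until a page comes back shorter than requested. A minimal sketch, assuming a hypothetical server at `http://localhost:8080` and a JSON response shaped like the content example on this page; `iter_term_freqs` and `fetch_json` are hypothetical helpers. Keep `page_size` at or below `parameters.pageSize.max`, otherwise the server returns fewer results than requested and the loop stops early.

```python
import json
from typing import Callable, Dict, Iterator, Tuple
from urllib.parse import quote, urlencode
from urllib.request import urlopen

# Hypothetical server location; adjust to your deployment.
BASE_URL = "http://localhost:8080/blacklab-server"


def fetch_json(url: str) -> dict:
    """GET a URL and decode the JSON response body."""
    with urlopen(url) as response:
        return json.load(response)


def iter_term_freqs(corpus: str, page_size: int = 20,
                    fetch: Callable[[str], dict] = fetch_json) -> Iterator[Tuple[str, int]]:
    """Yield (term, frequency) pairs for a whole corpus, one page at a time."""
    first = 0
    while True:
        query = urlencode({"first": first, "number": page_size})
        url = f"{BASE_URL}/{quote(corpus, safe='')}/termfreq?{query}"
        page: Dict[str, int] = fetch(url)["termFreq"]
        yield from page.items()
        if len(page) < page_size:
            break  # a short (or empty) page means there are no more terms
        first += page_size
```

The `fetch` parameter makes it easy to swap in another HTTP client, or a fake one for testing. If the total number of terms is an exact multiple of `page_size`, the loop makes one extra request that returns an empty page.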

# Success Response

Code: `200 OK`

# Content examples

```json
{
  "termFreq": {
    "en": 14221,
    "de": 10540,
    "dat": 9546,
    "van": 9313,
    "te": 6922,
    "het": 6760,
    "met": 5468,
    "een": 5261,
    "in": 5101,
    "is": 5061,
    "ik": 4784,
    "mijn": 4649,
    "niet": 4001,
    "ick": 3773,
    "als": 3724,
    "ende": 3510,
    "den": 3439,
    "die": 3370,
    "soo": 3215,
    "op": 3083
  }
}
```
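The response maps each term to its frequency. Since JSON objects are formally unordered, a client that needs the ranking may prefer to re-sort by frequency itself rather than rely on the key order of the response. A small sketch using a shortened copy of the example response; `ranked_terms` is a hypothetical helper.

```python
import json

# A shortened copy of the content example response.
RESPONSE = '{"termFreq": {"en": 14221, "de": 10540, "dat": 9546, "van": 9313}}'


def ranked_terms(body: str):
    """Return (term, frequency) pairs sorted by descending frequency."""
    freqs = json.loads(body)["termFreq"]
    # sorted() is stable, so terms with equal frequency keep their response order.
    return sorted(freqs.items(), key=lambda item: item[1], reverse=True)


for term, freq in ranked_terms(RESPONSE):
    print(f"{term}\t{freq}")
```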

# Notes

Regular grouped hits could also be used for this and should be reasonably fast, thanks to an optimization that recognizes this type of query (`patt` matching any token, `[]`, grouped by the matched text) and takes a faster path. However, that operation determines term frequencies using the forward index, whereas this one uses Lucene's term dictionary. We should test whether the two ever produce different results; if they don't (and they shouldn't), we should always use the faster implementation.
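One way to run such a test is to collect the complete term-to-frequency map from each implementation (same annotation, same `sensitive` setting, all pages) and list every term on which they disagree. The sketch below only does the comparison step; how the grouped-hits result is fetched and converted to a map is left out, and the sample numbers are made up for illustration. `freq_differences` is a hypothetical helper.

```python
from typing import Dict, Tuple


def freq_differences(a: Dict[str, int], b: Dict[str, int]) -> Dict[str, Tuple[int, int]]:
    """Return every term whose frequency differs between two result maps.

    A term missing from one map counts as frequency 0 there.
    """
    return {
        term: (a.get(term, 0), b.get(term, 0))
        for term in a.keys() | b.keys()
        if a.get(term, 0) != b.get(term, 0)
    }


# Made-up numbers, purely to illustrate the output format.
from_term_dictionary = {"en": 14221, "de": 10540}
from_forward_index = {"en": 14221, "de": 10539, "dat": 1}
print(freq_differences(from_term_dictionary, from_forward_index))
```

An empty result means the two implementations agree on every term.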

After that, we could consider removing this endpoint, or we could keep it for convenience and backwards compatibility.