# BlackLab Server
# What is it?
BlackLab Server is a web service providing a REST API for accessing BlackLab corpora. This makes it easy to use BlackLab from your favourite programming language. It can be used for anything from quick analysis scripts to full-featured corpus search applications.
This page explains how to set up and use BlackLab Server.
# Basic installation, configuration
# Using Docker
Images are available on Docker Hub. We are preparing for an official Docker release. The current image is usable, but should be considered experimental: details may change in the final version. Also, there are currently no stable release tags, only a `latest` version (updated from the `dev` branch with no particular schedule) and several versions of specific commits.
Suggestions for improving the image (and this guide) are welcome.
A Docker version supporting BuildKit is required (18.09 or higher), as well as Docker Compose version 1.27.1 or higher.
We assume here that you are familiar with the BlackLab indexing process; see Indexing with BlackLab to learn more.
Create a file named `test.env` with your indexing configuration:

    BLACKLAB_FORMATS_DIR=/path/to/my/formats
    INDEX_NAME=my-index
    INDEX_FORMAT=my-file-format
    INDEX_INPUT_DIR=/path/to/my/input-files
    JAVA_OPTS=-Xmx10G

To index your data:

    docker-compose --env-file test.env run --rm indexer

Now start the server:

    docker-compose up -d

Your index should now be accessible at http://localhost:8080/blacklab-server/my-index.
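To check from a script that the corpus is reachable, a minimal Python sketch follows. It assumes the host, port and corpus name from the Docker example, and that the server accepts an `outputformat=json` parameter; the helper names `corpus_url` and `fetch_corpus_info` are hypothetical and not part of BlackLab.

```python
import json
import urllib.request

# Assumption: host, port and path match the Docker example in this guide.
BASE_URL = "http://localhost:8080/blacklab-server"


def corpus_url(base_url: str, corpus: str) -> str:
    """Build the URL of a corpus, asking for JSON output.

    `outputformat=json` is an assumption; adjust it if your server differs.
    """
    return f"{base_url.rstrip('/')}/{corpus}?outputformat=json"


def fetch_corpus_info(base_url: str, corpus: str, timeout: float = 10.0) -> dict:
    """Fetch and parse the corpus information (requires a running server)."""
    with urllib.request.urlopen(corpus_url(base_url, corpus), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the server running, `fetch_corpus_info(BASE_URL, "my-index")` should return the parsed response; a connection error usually means the container is not up yet.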
See the Docker README for more details.
# Java JRE
Install a JRE (Java Runtime Environment). BlackLab requires at least Java 11; version 17 and newer should work as well.
BlackLab Server needs a Java application server to run. We will use Apache Tomcat.
WARNING: Tomcat 10 not yet supported
BlackLab currently uses Java EE and therefore runs in Tomcat 8 and 9, but not in Tomcat 10 (which migrated to Jakarta EE). If you try to run BlackLab Server on Tomcat 10, you will get a `ClassNotFoundException`. A future release of BlackLab will migrate to Jakarta EE.
# Configuration file
Create a configuration file in `/etc/blacklab`.
TIP: Other locations for the configuration file
If `/etc/blacklab` is not practical for you, you can also place the configuration file in one of these locations:
- the directory specified in `$BLACKLAB_CONFIG_DIR`, if Tomcat is started with this environment variable set (create or edit `setenv.sh` in the Tomcat `bin` directory to set environment variables, or e.g. put it in `/etc/sysconfig/tomcat` on a system using systemd)
- somewhere on Tomcat's Java classpath
- `$HOME/.blacklab/` (if you're running Tomcat under your own user account, e.g. on a development machine; `$HOME` refers to your home directory)

The minimal configuration file only needs to specify a location for your corpora. Create a directory for your corpora, e.g. `/data/index`, and refer to it in your configuration file:

    ---
    configVersion: 2

    # Where BlackLab can find corpora
    indexLocations:
    - /data/index

Your corpora would be in directories under `/data/index`.
# BlackLab Server WAR
Download the BlackLab Server WAR (Java web application archive). You can either:
- download the binary attached to the latest release
- clone the repository and build it using Maven (`mvn package`)

Place `blacklab-server.war` in Tomcat's `webapps` directory (`$TOMCAT/webapps`, where `$TOMCAT` is the directory where Tomcat is installed). Tomcat should automatically discover and deploy it, and you should be able to go to http://servername:8080/blacklab-server/ and see the BlackLab Server information page, which includes a list of available corpora.
TIP: Unicode URLs
To ensure the correct handling of accented characters in (search) URLs, you should configure Tomcat to interpret URLs as UTF-8 (some Tomcat versions and settings default to ISO-8859-1) by adding an attribute `URIEncoding="UTF-8"` to the `<Connector/>` element with the attribute `port="8080"` in Tomcat's configuration.
Of course, make sure that URLs you send to BlackLab are URL-encoded using UTF-8 (so e.g. searching for `"señor"` corresponds to a request like `http://myserver/blacklab-server/mycorpus/hits?patt=%22se%C3%B1or%22`). BlackLab Frontend does this by default.
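When building such URLs from your own scripts, the encoding can be left to a standard library. The following Python sketch reproduces the example request above; the function names `encode_pattern` and `hits_url` are hypothetical helpers, and the server and corpus names are the placeholders from the example.

```python
from urllib.parse import quote


def encode_pattern(patt: str) -> str:
    """Percent-encode a search pattern as UTF-8, leaving no character unescaped
    except unreserved ones (letters, digits, '-', '.', '_', '~')."""
    return quote(patt, safe="", encoding="utf-8")


def hits_url(server: str, corpus: str, patt: str) -> str:
    """Build a hits request URL for the given (placeholder) server and corpus."""
    return f"{server}/{corpus}/hits?patt={encode_pattern(patt)}"


print(hits_url("http://myserver/blacklab-server", "mycorpus", '"señor"'))
# prints http://myserver/blacklab-server/mycorpus/hits?patt=%22se%C3%B1or%22
```

The double quotes become `%22` and `ñ` becomes the two UTF-8 bytes `%C3%B1`, matching the request shown above.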
TIP: Memory usage
For larger indices, it is important to give Tomcat's JVM enough heap memory. (If heap memory is low and/or fragmented, the JVM garbage collector might start taking 100% CPU moving objects in order to recover enough free space, slowing things down to a crawl.) If you are indexing unique ids for each word, you may also be able to save memory by disabling the forward index for that 'unique id' annotation.
We used to also recommend locking the forward index in memory using the `vmtouch` utility, but we now believe it's better to leave disk cache management to the operating system.
# Indexing data
You can index your data using the provided command-line tool `IndexTool`. See Indexing with BlackLab.
Another option is to configure user authentication to allow users to create corpora and add their data using BlackLab Server. See here to get started.
There is currently no way to use BlackLab Server to add data to non-user ("global" or regular) corpora. In the future, this will be available using Solr.
# Searching your corpus
TODO: provide a very short introduction here
We have a full-featured corpus search frontend available. See BlackLab Frontend for more information.
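For scripted access instead of the frontend, a search request can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the `hits` endpoint and the `patt` parameter appear in the URL example earlier on this page, while `first`, `number` and `outputformat` are assumed paging and format parameters that you should verify against your server. The function names `build_hits_url` and `search` are hypothetical.

```python
import json
import urllib.request
from urllib.parse import quote, urlencode


def build_hits_url(server: str, corpus: str, patt: str,
                   first: int = 0, number: int = 20) -> str:
    """Build a hits request URL with UTF-8 percent-encoded parameters."""
    params = {
        "patt": patt,            # search pattern, e.g. '"the"'
        "first": first,          # assumed: index of the first hit to return
        "number": number,        # assumed: maximum number of hits to return
        "outputformat": "json",  # assumed: ask for JSON instead of XML
    }
    # quote_via=quote encodes spaces as %20 rather than '+'.
    query = urlencode(params, quote_via=quote)
    return f"{server.rstrip('/')}/{corpus}/hits?{query}"


def search(server: str, corpus: str, patt: str, **paging: int) -> dict:
    """Run the search and return the parsed JSON (requires a running server)."""
    url = build_hits_url(server, corpus, patt, **paging)
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)
```

For example, `search("http://localhost:8080/blacklab-server", "my-index", '"the"', number=5)` would request the first five hits for the word "the" from a locally running server; the structure of the returned dictionary depends on the server version, so inspect it before relying on specific keys.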