# BlackLab Server

# What is it?

BlackLab Server is a web service providing a REST API for accessing BlackLab corpora. This makes it easy to use BlackLab from your favourite programming language. It can be used for anything from quick analysis scripts to full-featured corpus search applications.

This page explains how to set up and use BlackLab Server.

# Basic installation and configuration

## Using Docker

Images are available on Docker Hub. We are preparing an official Docker release. The current image is usable but should be considered experimental, and details may change in the final version. There are also no stable release tags yet. There is only a latest tag, which is updated from the dev branch on no particular schedule, plus several tags for specific commits on the dev branch.

Suggestions for improving the image (and this guide) are welcome.

You need a Docker version that supports BuildKit (18.09 or higher) and Docker Compose 1.27.1 or higher.

We assume here that you are familiar with the BlackLab indexing process; see Indexing with BlackLab to learn more.

Create a file named test.env with your indexing configuration:

```
BLACKLAB_FORMATS_DIR=/path/to/my/formats
INDEX_NAME=my-index
INDEX_FORMAT=my-file-format
INDEX_INPUT_DIR=/path/to/my/input-files
JAVA_OPTS=-Xmx10G
```
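A typo in test.env usually only shows up once the indexer fails. You can catch it earlier with a quick check before running the indexer. The sketch below is a hypothetical helper and is not part of BlackLab or its Docker setup. It parses simple KEY=VALUE lines and reports which of the variables shown above are missing. It assumes all of those variables are required, and it also checks that the two directory settings point to existing directories.

```python
from pathlib import Path

# Variables from the example test.env above. Treating all of them as
# required is an assumption made for this sketch.
REQUIRED_KEYS = ("BLACKLAB_FORMATS_DIR", "INDEX_NAME", "INDEX_FORMAT",
                 "INDEX_INPUT_DIR", "JAVA_OPTS")
# Variables that should point to existing directories on the host.
DIR_KEYS = ("BLACKLAB_FORMATS_DIR", "INDEX_INPUT_DIR")


def parse_env_file(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")  # split on the first '=' only
        env[key.strip()] = value.strip()
    return env


def check_env(env: dict[str, str]) -> list[str]:
    """Return human-readable problems; an empty list means none were found."""
    problems = [f"missing {key}" for key in REQUIRED_KEYS if not env.get(key)]
    for key in DIR_KEYS:
        if env.get(key) and not Path(env[key]).is_dir():
            problems.append(f"{key} is not a directory: {env[key]}")
    return problems


if __name__ == "__main__" and Path("test.env").is_file():
    for problem in check_env(parse_env_file(Path("test.env").read_text(encoding="utf-8"))):
        print(problem)
```

The parser handles only the plain KEY=VALUE form used above. It does not handle quoting or variable interpolation.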

To index your data:

```bash
docker-compose --env-file test.env run --rm indexer
```

Now start the server:

```bash
docker-compose up -d
```

Your index should now be accessible at http://localhost:8080/blacklab-server/my-index. The last part of the URL is the INDEX_NAME from test.env.
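`docker-compose up -d` returns before the server is ready to answer requests. If you script these steps, you may want to wait until the URL responds before you start sending searches. The sketch below uses only the Python standard library. The function name, timeout and polling interval are our own choices and are not part of BlackLab.

```python
import time
import urllib.request


def wait_until_up(url: str, timeout: float = 120.0, interval: float = 2.0) -> bool:
    """Poll `url` until it answers with HTTP 200 or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with urllib.request.urlopen(url, timeout=interval) as response:
                if response.status == 200:
                    return True
        except OSError:
            # URLError (e.g. connection refused), HTTPError (e.g. a 404 or 503
            # while the web application is still deploying) and socket
            # timeouts are all OSError subclasses: treat them as "not ready".
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

For the setup above you would call `wait_until_up("http://localhost:8080/blacklab-server/my-index")`.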

See the Docker README for more details.

## Java JRE

Install a JRE (Java runtime environment). BlackLab requires at least version 11, and version 17 and newer should also work.
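To check whether the installed JRE is recent enough, you can parse the output of `java -version`. The hypothetical Python helper below handles the two common formats: the old "1.x" numbering used up to Java 8 and the current scheme. It assumes the usual OpenJDK/Oracle output (`version "..."`). Other vendors may print something slightly different.

```python
import re
import subprocess

MIN_JAVA = 11  # minimum major version BlackLab requires, per the text above


def java_major_version(version_output: str) -> int:
    """Extract the major version from `java -version` output.

    "1.8.0_292" -> 8 (pre-Java 9 numbering), "17.0.8" -> 17, "21" -> 21.
    """
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        raise ValueError("unrecognised java -version output")
    major = int(match.group(1))
    if major == 1 and match.group(2) is not None:
        major = int(match.group(2))
    return major


def installed_java_major() -> int:
    """Run `java -version` (which prints to stderr) and parse its output."""
    result = subprocess.run(["java", "-version"], capture_output=True, text=True)
    return java_major_version(result.stderr)
```

On a machine with Java on the PATH, `installed_java_major() >= MIN_JAVA` tells you whether the requirement is met.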

## Tomcat

BlackLab Server needs a Java application server to run. We will use Apache Tomcat.

Install Tomcat on your machine. See the official docs or an OS-specific guide like this one for Ubuntu.

WARNING: Tomcat 10 not yet supported

BlackLab currently uses Java EE and therefore runs in Tomcat 8 and 9, but not in Tomcat 10 (which migrated to Jakarta EE). If you try to run BlackLab Server on Tomcat 10, you will get a ClassNotFoundException. A future release of BlackLab will migrate to Jakarta EE.

## Configuration file

Create a configuration file /etc/blacklab/blacklab-server.yaml.

TIP: Other locations for the configuration file

If /etc/blacklab is not practical for you, you can also place blacklab-server.yaml here:

  • the directory specified in $BLACKLAB_CONFIG_DIR, if Tomcat is started with this environment variable set (to set environment variables, create or edit setenv.sh in Tomcat's bin directory or, on a system using systemd, put them in e.g. /etc/sysconfig/tomcat)
  • somewhere on Tomcat's Java classpath, e.g. in its lib directory
  • $HOME/.blacklab/ (if you're running Tomcat under your own user account, e.g. on a development machine; $HOME refers to your home directory)
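You may want to check which of these locations a given machine will use. The hypothetical helper below looks for blacklab-server.yaml in the directories listed above and returns the first match. This is not BlackLab's own lookup code. The order in which it tries the directories is an assumption, and BlackLab's actual search order may differ. It also skips the classpath option, because that cannot be checked from outside the JVM.

```python
from __future__ import annotations

import os
from pathlib import Path
from typing import Optional

CONFIG_NAME = "blacklab-server.yaml"


def candidate_config_dirs() -> list[Path]:
    """Directories from the list above, in an ASSUMED order of preference."""
    dirs = []
    # Note: this reads the environment of the Python process, which is not
    # necessarily the environment Tomcat was started with.
    if os.environ.get("BLACKLAB_CONFIG_DIR"):
        dirs.append(Path(os.environ["BLACKLAB_CONFIG_DIR"]))
    dirs.append(Path.home() / ".blacklab")
    dirs.append(Path("/etc/blacklab"))
    return dirs


def find_config(dirs: list[Path]) -> Optional[Path]:
    """Return the first existing blacklab-server.yaml in `dirs`, or None."""
    for directory in dirs:
        path = directory / CONFIG_NAME
        if path.is_file():
            return path
    return None
```

Calling `find_config(candidate_config_dirs())` returns the first configuration file found, or None if none of the checked directories contains one.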

The minimal configuration file only needs to specify a location for your corpora. Create a directory for your corpora, e.g. /data/index, and refer to it in your blacklab-server.yaml file:

```yaml
---
configVersion: 2

# Where BlackLab can find corpora
indexLocations:
- /data/index
```

Each corpus then lives in its own subdirectory: /data/index/corpus1, /data/index/corpus2, etc.
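As a quick sanity check of this layout, you can list the subdirectories of an index location together with the URLs where you would expect to find them. This is a hypothetical helper, and it rests on two assumptions. The first is that a corpus's name equals its directory name, as with my-index in the Docker example. The second is that every subdirectory is a corpus, whereas BlackLab itself decides whether a directory actually contains a valid index.

```python
from pathlib import Path


def candidate_corpora(index_location: str, base_url: str) -> dict[str, str]:
    """Map each subdirectory of an index location to its expected URL.

    Assumes corpus name == directory name; BlackLab may still ignore
    directories that do not contain a valid index.
    """
    root = Path(index_location)
    return {
        p.name: f"{base_url.rstrip('/')}/{p.name}/"
        for p in sorted(root.iterdir())
        if p.is_dir()
    }
```

For example, `candidate_corpora("/data/index", "http://localhost:8080/blacklab-server")` would map corpus1 to `http://localhost:8080/blacklab-server/corpus1/`.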

## BlackLab Server WAR

Download the BlackLab Server WAR (Java web application archive).

Place blacklab-server.war in Tomcat's webapps directory ($TOMCAT/webapps/, where $TOMCAT is the directory where Tomcat is installed). Tomcat should automatically discover and deploy it. You should then be able to go to http://servername:8080/blacklab-server/ and see the BlackLab Server information page, which includes a list of available corpora.

TIP: Unicode URLs

To ensure correct handling of accented characters in (search) URLs, Tomcat must interpret URLs as UTF-8. Tomcat 8 and later do this by default. Older versions, and servers running in strict servlet compliance mode, use ISO-8859-1 instead. To be safe, configure Tomcat explicitly by adding the attribute URIEncoding="UTF-8" to the <Connector/> element with the attribute port="8080" in Tomcat's server.xml file.
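Editing server.xml by hand is usually simplest. If you provision servers with scripts, the sketch below adds the attribute using only Python's standard library. It is our own helper and is not part of Tomcat or BlackLab. Re-serializing the file drops the XML declaration and may change whitespace and quoting, so keep a backup of the original server.xml.

```python
import xml.etree.ElementTree as ET


def set_uri_encoding(server_xml: str, port: str = "8080") -> str:
    """Add URIEncoding="UTF-8" to the <Connector> element listening on `port`."""
    # Keep XML comments (Tomcat's stock server.xml has many); needs Python 3.8+.
    parser = ET.XMLParser(target=ET.TreeBuilder(insert_comments=True))
    root = ET.fromstring(server_xml, parser=parser)
    for connector in root.iter("Connector"):
        if connector.get("port") == port:
            connector.set("URIEncoding", "UTF-8")
    return ET.tostring(root, encoding="unicode")
```

Read server.xml into a string, pass it through `set_uri_encoding`, write the result back and restart Tomcat.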

Also make sure that the URLs you send to BlackLab are URL-encoded using UTF-8. For example, searching for "señor" corresponds to a request like http://myserver/blacklab-server/mycorpus/hits?patt=%22se%C3%B1or%22. BlackLab Frontend does this by default.
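Most HTTP libraries can do this encoding for you. In Python, for example, `urllib.parse.urlencode` percent-encodes values as UTF-8 by default and reproduces the query string of the request above:

```python
from urllib.parse import urlencode

# Encode a CQL pattern containing a non-ASCII character; UTF-8 is the default.
query = urlencode({"patt": '"señor"'})
print(query)  # patt=%22se%C3%B1or%22

url = "http://myserver/blacklab-server/mycorpus/hits?" + query
```

Note that `urlencode` encodes spaces as `+`, which servers decode as a space in query strings.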

TIP: Memory usage

For larger indices, it is important to give Tomcat's JVM enough heap memory. (If heap memory is low and/or fragmented, the JVM garbage collector might start taking 100% CPU moving objects in order to recover enough free space, slowing things down to a crawl.) If you are indexing unique ids for each word, you may also be able to save memory by disabling the forward index for that 'unique id' annotation.

We used to also recommend locking the forward index in memory using the vmtouch utility, but we now believe it's better to leave disk cache management to the operating system.

# Indexing data

You can index your data using the provided command-line tool IndexTool. See Indexing with BlackLab.

Another option is to configure user authentication to allow users to create corpora and add their data using BlackLab Server. See here to get started.

There is currently no way to use BlackLab Server to add data to non-user ("global" or regular) corpora. In the future, this will be available using Solr.

# Searching your corpus

You can try most BlackLab Server requests out by typing URLs into your browser. See How to use and the API reference for more information.
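For scripts, any HTTP client will do. The minimal Python sketch below uses the standard library only and reuses the hits resource and patt parameter from the example URL earlier on this page. The helper names and BASE_URL are our own. Requesting JSON through the Accept header is an assumption, so check the API reference for the output formats and extra parameters your BlackLab version supports, and for the structure of the response.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import Request, urlopen

# Hypothetical base URL; adjust to your own server.
BASE_URL = "http://localhost:8080/blacklab-server"


def hits_url(corpus: str, cql: str, **params: str) -> str:
    """Build a /hits URL for `corpus` with CQL pattern `cql` (UTF-8 encoded)."""
    query = urlencode({"patt": cql, **params})
    return f"{BASE_URL}/{quote(corpus)}/hits?{query}"


def search(corpus: str, cql: str, **params: str) -> dict:
    """Run a hits search and return the parsed JSON response."""
    # ASSUMPTION: the server honours "Accept: application/json".
    request = Request(hits_url(corpus, cql, **params),
                      headers={"Accept": "application/json"})
    with urlopen(request) as response:
        return json.load(response)
```

With the Docker setup above, `search("my-index", '"señor"')` would send the same kind of request as the UTF-8 example earlier and return the server's response as a Python dictionary.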


We have a full-featured corpus search frontend available. See BlackLab Frontend for more information.

# What's next?