SolrCloud on Docker
This is a follow-up to my Solr on Docker post. For this one, we’ll use a standalone ZooKeeper node, and three SolrCloud nodes, all in their own Docker containers.
I'm using Docker version 0.7, build 0d078b6, on Ubuntu 13.04.
ZooKeeper
The current version of ZooKeeper is 3.4.5, and there is a docker-zookeeper project which runs that in a single-node configuration.
If we build and run that in an instance named “zookeeper”:
cd ~
mkdir zookeeper-docker
cd zookeeper-docker
wget https://raw.github.com/jplock/docker-zookeeper/master/Dockerfile
docker build -t makuk66/zookeeper:3.4.5 .
...
Successfully built 26871fd90d0c
docker run -name zookeeper -p 2181 -p 2888 -p 3888 makuk66/zookeeper:3.4.5
We see that ZooKeeper starts running, and after a few seconds we can verify it’s happy:
$ echo ruok | nc -q 2 localhost `docker port zookeeper 2181|sed 's/.*://'`; echo
imok
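ZooKeeper needs those few seconds before it answers, so a script that starts Solr straight afterwards may want to wait for the "imok" reply. Here is a minimal sketch of that idea, assuming the same nc and sed tools as the one-liner above. The function names host_port and wait_for_zookeeper are made up for this example and are not part of Docker or ZooKeeper.

```shell
#!/bin/sh
# Hypothetical helper: keep only the text after the last colon of a
# "host:port" mapping such as the one `docker port` prints.
host_port() {
    printf '%s\n' "$1" | sed 's/.*://'
}

# Hypothetical helper: send ZooKeeper's "ruok" command once per second
# until it answers "imok" or the attempts run out.
# $1 = port on localhost, $2 = number of attempts (default 10).
wait_for_zookeeper() {
    port=$1
    attempts=${2:-10}
    while [ "$attempts" -gt 0 ]; do
        if [ "$(echo ruok | nc -q 2 localhost "$port")" = "imok" ]; then
            return 0
        fi
        attempts=$((attempts - 1))
        sleep 1
    done
    return 1
}
```

With the container from above it could be called as wait_for_zookeeper "$(host_port "$(docker port zookeeper 2181)")", which returns non-zero if ZooKeeper never reports that it is healthy.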
SolrCloud: Distributed Solr
The current version of Solr is 4.6.0, so we download that:
cd ~
mkdir solr-docker
cd solr-docker
wget http://www.mirrorservice.org/sites/ftp.apache.org/lucene/solr/4.6.0/solr-4.6.0.tgz
This locally cached copy will get added to the Docker image at build time.
Create a Dockerfile:
cat > Dockerfile <<'EOM'
#
# VERSION 0.2
FROM ubuntu
MAINTAINER Martijn Koster "mak-docker@greenhills.co.uk"
ENV SOLR solr-4.6.0
RUN mkdir -p /opt
ADD $SOLR.tgz /opt/$SOLR.tgz
RUN tar -C /opt --extract --file /opt/$SOLR.tgz
RUN ln -s /opt/$SOLR /opt/solr
RUN apt-get update
RUN apt-get --yes install openjdk-6-jdk
EXPOSE 8983
CMD ["/bin/bash", "-c", "cd /opt/solr/example; java -jar start.jar"]
EOM
and build:
docker build -rm=true -t makuk66/solr4:4.6.0 .
where makuk66 is my username; substitute your own.
If you don’t want to build your own image, you can pull makuk66/docker-solr, and use makuk66/docker-solr instead of makuk66/solr4:4.6.0 below.
Now we’ll manually run this with docker in the foreground.
The first node bootstraps the collection (like the SolrCloud Example A):
docker run -link zookeeper:ZK -i -p 8983 -t makuk66/solr4:4.6.0 \
  /bin/bash -c 'cd /opt/solr/example; java -Dbootstrap_confdir=./solr/collection1/conf -Dcollection.configName=myconf -DzkHost=$ZK_PORT_2181_TCP_ADDR:$ZK_PORT_2181_TCP_PORT -DnumShards=2 -jar start.jar'
The -link zookeeper:ZK option makes the network information from the node named “zookeeper” available as environment variables with the ZK_ prefix.
The other two nodes start like this:
docker run -link zookeeper:ZK -i -p 8983 -t makuk66/solr4:4.6.0 \
  /bin/bash -c 'cd /opt/solr/example; java -DzkHost=$ZK_PORT_2181_TCP_ADDR:$ZK_PORT_2181_TCP_PORT -jar start.jar'
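Both commands build the zkHost value from the two ZK_ variables that the link provides. If the link is missing, those variables are empty and Solr is handed a bare ":" as its ZooKeeper address. Here is a small sketch of a guard against that, assuming the ZK alias used above. The function name zk_host is hypothetical, and it would have to live in a start script inside the image to be of use.

```shell
#!/bin/sh
# Hypothetical helper: print "address:port" for the linked ZooKeeper,
# or fail with a message when the link variables are not set.
zk_host() {
    if [ -z "${ZK_PORT_2181_TCP_ADDR:-}" ] || [ -z "${ZK_PORT_2181_TCP_PORT:-}" ]; then
        echo "ZK link variables are not set; was -link zookeeper:ZK used?" >&2
        return 1
    fi
    printf '%s:%s\n' "$ZK_PORT_2181_TCP_ADDR" "$ZK_PORT_2181_TCP_PORT"
}
```

Inside the container the Java command could then read java -DzkHost="$(zk_host)" -jar start.jar, after checking that zk_host succeeded.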
To show all the running containers:
$ docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
1cac635ec128 makuk66/solr4:4.6.0 /bin/bash -c cd /opt 3 seconds ago Up 2 seconds 0.0.0.0:49158->8983/tcp prickly_mccarthy
bd23d3891dd6 makuk66/solr4:4.6.0 /bin/bash -c cd /opt 5 seconds ago Up 4 seconds 0.0.0.0:49157->8983/tcp high_albattani
365a17a69176 makuk66/solr4:4.6.0 /bin/bash -c cd /opt About a minute ago Up About a minute 0.0.0.0:49156->8983/tcp elegant_bardeen
13805a493a79 makuk66/zookeeper:3.4.5 /opt/zookeeper-3.4.5 25 minutes ago Up 25 minutes 0.0.0.0:49153->2181/tcp, 0.0.0.0:49154->2888/tcp, 0.0.0.0:49155->3888/tcp elegant_bardeen/ZK,high_albattani/ZK,prickly_mccarthy/ZK,zookeeper
We can now use one of the exposed ports to look at Solr: http://docker1:49159/solr/#/~cloud, which shows the 3 Solr nodes in the cluster running on their own internal IP addresses. Neat.
Of course we won’t believe it’s real unless we see search in action.
So let’s run another docker instance to load some data, using the docker host port for one of the nodes above:
docker run -link zookeeper:ZK -i -t makuk66/solr4:4.6.0 /bin/bash
cd /opt/solr/example/exampledocs
java -Durl=http://192.168.0.221:49158/solr/update -jar post.jar *.xml
and search:
apt-get install wget
wget -O - 'http://192.168.0.221:49158/solr/collection1/select?q=solr&wt=xml'
You can do the same directly against the internal address, which you can find using docker inspect:
docker inspect prickly_mccarthy
wget -O - 'http://172.17.0.37:8983/solr/collection1/select?q=solr&wt=xml'
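docker inspect prints a JSON document, and the internal address is the "IPAddress" value in it. Rather than reading it off by eye, a sed filter can pull it out. This is only a sketch: the function name container_ip is made up, and it assumes the value appears on its own line as "IPAddress": "...", which may not hold for every Docker version.

```shell
#!/bin/sh
# Hypothetical helper: read `docker inspect` JSON on stdin and print the
# first "IPAddress" value that looks like a dotted address.
container_ip() {
    sed -n 's/.*"IPAddress": *"\([0-9.][0-9.]*\)".*/\1/p' | head -n 1
}
```

For example, docker inspect prickly_mccarthy | container_ip should print the address to use in the wget command above.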
You can see the shards in action by comparing:
wget -O - 'http://192.168.0.221:49158/solr/collection1/select?q=*:*&wt=xml' | sed 's/.*numFound="//' | sed 's/".*//'
32
wget -O - 'http://192.168.0.221:49158/solr/collection1/select?q=*:*&wt=xml&shards=shard1' | sed 's/.*numFound="//' | sed 's/".*//'
14
wget -O - 'http://192.168.0.221:49158/solr/collection1/select?q=*:*&wt=xml&shards=shard2' | sed 's/.*numFound="//' | sed 's/".*//'
18
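The two shard counts add up to the total: 14 + 18 = 32. To repeat that check without eyeballing the numbers, the numFound extraction can be wrapped in a function. This is a sketch with made-up names, num_found and shards_add_up. It uses sed -n so that only the matching line is printed.

```shell
#!/bin/sh
# Hypothetical helper: read a Solr XML response on stdin and print the
# value of the first numFound attribute.
num_found() {
    sed -n 's/.*numFound="\([0-9][0-9]*\)".*/\1/p' | head -n 1
}

# Hypothetical helper: succeed when the per-shard counts ($2, $3, ...)
# sum to the total ($1).
shards_add_up() {
    total=$1
    shift
    sum=0
    for n in "$@"; do
        sum=$((sum + n))
    done
    [ "$sum" -eq "$total" ]
}
```

Each wget command above can be piped into num_found, and shards_add_up 32 14 18 succeeds for the counts shown here.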
Also interesting to try is:
docker diff prickly_mccarthy
to see what changes were made to the filesystem.
Further work
We can do a bunch of further polish here:
- we should be able to create images rather than specify command lines
- to allow multiple clusters to co-exist on a single Docker host, we should use something more dynamic than a ‘ZK’ prefix
- it’d be nice if we had a single script that deployed a whole cluster
- we should probably use Data Volumes for index storage
- we may want supervisord/upstart to monitor Java to recover from crashes
- it might be nice to auto-discover the latest versions of ZooKeeper and Solr and use those
- if we register containers, we could consider pre-expanding the Solr .war, for startup speed and to reduce diffs
but those all depend a bit on use-case, and are for another day.
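As a starting point for the single-script idea, here is a dry-run sketch that only prints the docker commands for one ZooKeeper container and a number of Solr nodes, so they can be reviewed before anything runs. The function names are made up. The -d flag (run detached) is my assumption, swapped in for the foreground -i -t runs used above, and the image names are the ones built earlier.

```shell
#!/bin/sh
# Dry-run sketch: print the docker commands instead of running them.
IMAGE=${IMAGE:-makuk66/solr4:4.6.0}
# Single quotes keep the ZK_ variables unexpanded until the container runs.
ZK_ARGS='-DzkHost=$ZK_PORT_2181_TCP_ADDR:$ZK_PORT_2181_TCP_PORT'
BOOTSTRAP_ARGS='-Dbootstrap_confdir=./solr/collection1/conf -Dcollection.configName=myconf -DnumShards=2'

# Hypothetical helper: print the command for Solr node number $1.
# Only node 1 bootstraps the collection.
solr_command() {
    if [ "$1" -eq 1 ]; then
        java_args="$BOOTSTRAP_ARGS $ZK_ARGS"
    else
        java_args="$ZK_ARGS"
    fi
    printf "docker run -link zookeeper:ZK -d -p 8983 %s /bin/bash -c 'cd /opt/solr/example; java %s -jar start.jar'\n" "$IMAGE" "$java_args"
}

# Hypothetical helper: print the ZooKeeper command followed by $1 Solr
# commands (default 3).
print_cluster_commands() {
    nodes=${1:-3}
    echo "docker run -d -name zookeeper -p 2181 -p 2888 -p 3888 makuk66/zookeeper:3.4.5"
    i=1
    while [ "$i" -le "$nodes" ]; do
        solr_command "$i"
        i=$((i + 1))
    done
}
```

Running print_cluster_commands 3 prints four lines. A real script would also need to wait for ZooKeeper and for the first Solr node before starting the others.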
Conclusion
I can really see the value of this approach for certain use-cases.
The resource efficiency, startup speed and cleanliness make it ideal for proof-of-concept deployments, A/B testing,
and for application developers to use as a local sandbox.
I’m intrigued about production use-cases for this kind of setup. It’s obviously suitable for
multi-tenant deployments, and I’d be interested in how you could set up a SolrCloud deployment
across multiple Docker hosts.