Developer’s Guide to Memex Explorer

Setting up Memex Explorer

To set up your machine, you will need Anaconda or Miniconda installed. Miniconda is a minimal Anaconda installation that bootstraps conda and Python on any operating system. Install Anaconda or Miniconda from their respective sites.

Memex Explorer requires conda, either from Miniconda or Anaconda.

Application Setup

To set up a developer’s environment, clone the repository, then run the app_setup.sh script:

$ git clone https://github.com/memex-explorer/memex-explorer.git
$ cd memex-explorer/source
$ ./app_setup.sh

You can then start the application from this directory:

$ source activate memex
$ supervisord

Memex Explorer will now be running locally at http://localhost:8000.

Tests

To run the tests, return to the root directory and run:

$ py.test

The Database Model

The current entity relation diagram:
_images/DbVisualizer.png

Updating the Database

As of version 0.4.0, Memex Explorer will start tracking all database migrations. This means that you will be able to upgrade your database and preserve the data without any issues.

If you are using a version that is 0.3.0 or earlier, and you are unable to update your database without server errors, the best course of action is to delete the existing file at source/db.sqlite3 and start over with a fresh database.

Enabling Non-Default Services

Nutch Visualizations

Nutch visualizations are not enabled by default. Nutch visualizations require RabbitMQ, and the method for installing RabbitMQ varies depending on the operating system. RabbitMQ can be installed via Homebrew on Mac, and apt-get on Debian systems. For more information on how to install RabbitMQ, read this page. Note: You may also need to change the below command to sudo rabbitmq-server, depending on how RabbitMQ is installed on your system and the permissions of the current user.

RabbitMQ and Bokeh-Server are necessary for creating the Nutch visualizations. The Nutch streaming visualization works by creating and subscribing to a queue of AMQP messages (hosted by RabbitMQ) being dispatched from Nutch as it runs the crawl. A background task reads the messages and updates the plot (hosted by Bokeh server).

To enable Bokeh visualizations for Nutch, change autostart=false to autostart=true for both of these directives in source/supervisord.conf, and then kill and restart supervisor.

[program:rabbitmq]
command=rabbitmq-server
priority=1
-autostart=false
+autostart=true

[program:bokeh-server]
command=bokeh-server --backend memory --port 5006
priority=1
-autostart=false
+autostart=true

Domain Discovery Tool (DDT)

Domain Discovery Tool can be installed as a conda package. Simply run conda install ddt to download the package for DDT.

Like with Nutch visualizations, to enable DDT, change the directive in source/supervisord.

[program:ddt]
command=ddt
priority=5
-autostart=false
+autostart=false

Temporal Anomaly Detection (TAD)

TAD does not currently have a conda package. Like the Nutch visualizations, it also has a RabbitMQ dependency. For instructions on installing TAD, visit the github repository.

Like DDT and Nutch Visualizations, you also have to change the supervisor directive.

[program:tad]
command=tad
priority=5
-autostart=false
+autostart=false