Installation
Via pip
Requirements
The following dependencies need to be installed in order to run synphage on your system.
- Python 3.11
- A Python package manager such as pip or uv
- Blast+ >= 2.12.0
Install Python and Blast+ using your package manager of choice, or by downloading an installer appropriate for your system from python.org and from the NCBI respectively.
The Python package manager pip is installed by default with Python; however, you may need to upgrade it to the latest version:
pip install --upgrade pip
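The requirements above can be checked from a terminal before installing synphage. The following is a minimal sketch, not part of synphage: the function names version_ge and check_synphage_requirements are hypothetical, and the script assumes that Python 3.11 is callable as python3.11, that BLAST+ provides blastn on your PATH, and that sort -V (GNU coreutils) is available.

```shell
# Hypothetical helper: succeed if version $1 >= version $2.
# Relies on `sort -V` (version sort, GNU coreutils).
version_ge() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Hypothetical pre-flight check for the requirements listed above.
check_synphage_requirements() {
    status=0

    # Python 3.11 must be callable as python3.11
    if command -v python3.11 >/dev/null 2>&1; then
        echo "Python: $(python3.11 -V 2>&1)"
    else
        echo "Python 3.11 not found" >&2
        status=1
    fi

    # BLAST+ >= 2.12.0; the first line of `blastn -version` looks like "blastn: 2.12.0+"
    if command -v blastn >/dev/null 2>&1; then
        blast_version=$(blastn -version | head -n 1 | sed 's/^blastn: \([0-9.]*\).*/\1/')
        if version_ge "$blast_version" 2.12.0; then
            echo "BLAST+: $blast_version"
        else
            echo "BLAST+ $blast_version is older than 2.12.0" >&2
            status=1
        fi
    else
        echo "BLAST+ (blastn) not found" >&2
        status=1
    fi

    return $status
}
```

After loading these functions in your shell, run check_synphage_requirements; a non-zero exit status means at least one requirement is missing.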
Install synphage
synphage is available as a Python package and can be installed with the Python package manager pip from a terminal window:
# Install the latest release with pip
pip install synphage
# Or, equivalently, run pip through the Python interpreter
python -m pip install synphage
Step-by-step installation of synphage in the Windows Subsystem for Linux (WSL):
# Install the build dependencies of Python
sudo apt install build-essential zlib1g-dev libncurses5-dev libgdbm-dev libnss3-dev libssl-dev libsqlite3-dev libreadline-dev libffi-dev curl libbz2-dev
# Download the Python 3.11.9 source tarball
wget https://www.python.org/ftp/python/3.11.9/Python-3.11.9.tgz
# Unpack the tarball file
tar -zxvf Python-3.11.9.tgz
# Build Python
cd Python-3.11.9/
./configure --enable-optimizations
make -j 2
sudo make install
# Test the Python installation
python3.11 -V
# Python installed
cd ..
# Install the Cairo and Python development headers
sudo apt install libcairo2-dev pkg-config python3-dev
# Create project folder
mkdir -p ~/synphage_home
cd ~/synphage_home
# Create and activate a Python virtual environment
python3.11 -m venv .venv
source ./.venv/bin/activate
# Install synphage
pip install synphage
# Install the Blast+ dependency
sudo apt install ncbi-blast+
# Run synphage (dagster_home stores the Dagster run metadata)
mkdir -p dagster_home
DAGSTER_HOME=$PWD/dagster_home dagster dev -h 0.0.0.0 -p 3000 -m synphage
Run synphage
synphage uses the following environment variables:
- INPUT_DIR: path to the folder containing the user's GenBank files. If not set, this path defaults to the temporary folder. It can also be changed at run time.
- OUTPUT_DIR: path to the folder where the data generated during the run are stored. If not set, this path defaults to the temporary folder.
- EMAIL (optional): used for connecting to the NCBI database.
- API_KEY (optional): used for connecting to the NCBI database and downloading files.
- DAGSTER_HOME (optional): used for storing the metadata generated during previous runs of the pipeline.
Optional env
EMAIL and API_KEY are only required for connecting to the NCBI database and downloading GenBank files. If the user only works with local data, these two variables can be ignored. DAGSTER_HOME is only needed to keep track of previous runs and their metadata; leaving it unset does not affect data storage.
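To see which of these variables are set in the current shell before starting synphage, a small helper can print them. This is a hypothetical sketch (show_synphage_env is an illustrative name, not a synphage command). It only inspects the shell environment: values defined in a .env file are loaded by Dagster at start-up and will be reported as not set here. The API key value is deliberately hidden.

```shell
# Hypothetical helper: report the synphage-related variables of the current shell.
show_synphage_env() {
    for var in INPUT_DIR OUTPUT_DIR EMAIL API_KEY DAGSTER_HOME; do
        # Indirect expansion through eval keeps this POSIX-compatible
        eval "value=\${$var-}"
        if [ -z "$value" ]; then
            printf '%s is not set\n' "$var"
        elif [ "$var" = "API_KEY" ]; then
            # Never print the key itself
            printf '%s is set (hidden)\n' "$var"
        else
            printf '%s=%s\n' "$var" "$value"
        fi
    done
}
```

For example, run show_synphage_env right before dagster dev to confirm which input and output paths will be used.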
Setting your env
These variables can be set in a .env file located in your working directory (Dagster automatically loads the .env file when initialising the pipeline), or exported in the terminal before starting synphage.
.env file:
INPUT_DIR=path/to/my/data/
OUTPUT_DIR=path/to/synphage/data
EMAIL=user.email@email.com
API_KEY=UserApiKey
Terminal:
export INPUT_DIR=<path_to_data_folder>
export OUTPUT_DIR=<path_to_synphage_folder>
export EMAIL=user.email@email.com
export API_KEY=UserApiKey
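Dagster reads the .env file on its own, so the following is only useful when the same values are also needed in the shell, for example when starting synphage from another directory. It is a minimal sketch: load_env_file is a hypothetical name, and it assumes plain KEY=value lines (no spaces around '=', no unquoted spaces in values; '#' comment lines are fine).

```shell
# Hypothetical helper: export every KEY=value line of a .env file into the current shell.
load_env_file() {
    env_file=${1:-.env}
    if [ ! -f "$env_file" ]; then
        echo "No such file: $env_file" >&2
        return 1
    fi
    # '.' searches PATH for names without a slash, so force a relative path
    case "$env_file" in
        */*) ;;
        *) env_file=./$env_file ;;
    esac
    set -a          # auto-export every variable assigned from now on
    . "$env_file"   # '.' is the POSIX equivalent of 'source'
    set +a
}
```

Called without an argument, load_env_file reads .env in the current directory.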
Data Input and Output
The input data are the GenBank files located in INPUT_DIR. However, paths to other data locations can be passed at run time to load data from another directory.
Warning
- Only a single path can be configured per loading job run.
- Special characters in file names may cause errors downstream.
GenBank file extensions
.gb and .gbk are both valid extensions for GenBank files.
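Because special characters in file names may cause errors downstream, it can help to check the input folder before loading it. The sketch below is a hypothetical helper (check_genbank_dir is not a synphage command). What counts as a "special character" is an assumption here: anything other than letters, digits, '.', '_' and '-'.

```shell
# Hypothetical pre-flight check of a GenBank input folder (defaults to $INPUT_DIR).
# Reports files without a .gb/.gbk extension and names with "special" characters.
check_genbank_dir() {
    dir=${1:-${INPUT_DIR:-}}
    if [ -z "$dir" ] || [ ! -d "$dir" ]; then
        echo "Not a directory: '$dir'" >&2
        return 2
    fi
    problems=0
    for path in "$dir"/*; do
        [ -f "$path" ] || continue   # skip sub-folders and an unmatched glob
        name=$(basename "$path")
        case "$name" in
            *.gb|*.gbk) ;;
            *) echo "not a .gb/.gbk file: $name"; problems=$((problems + 1)); continue ;;
        esac
        case "$name" in
            *[!A-Za-z0-9._-]*) echo "special characters in name: $name"; problems=$((problems + 1)) ;;
        esac
    done
    [ "$problems" -eq 0 ]
}
```

A zero exit status means every file in the folder has a valid extension and a "safe" name under this assumption.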
All output data are located in the OUTPUT_DIR set by the user. This directory can be reused in future runs if the user needs to process additional sequences or simply generate additional synteny diagrams.
Warning
- If no output directory is set, the data folder defaults to the temporary folder. Be aware that the name of the temporary folder (temp/, tmp/, ...) depends on your system.
Tip
The current data directory can be checked in the config panel of the jobs.
Start synphage via dagster web-based interface
To start synphage, run the following command:
dagster dev -h 0.0.0.0 -p 3000 -m synphage
Tip
As synphage uses dagster-webserver, the -h and -p flags are required to visualise the pipeline in your browser:
- -h: host to use for the Dagster webserver
- -p: port to use for the Dagster webserver
To access the webserver, follow the link displayed in your terminal or copy/paste it into your web browser. In this example:
http://0.0.0.0:3000
Screenshot: Dagster running from the terminal, with the link to the webserver.
Stop synphage
After completing your work, you can close the web browser and stop the process running in the terminal with Ctrl+C.
Screenshot: Dagster shutting down.
Via synphage docker image
Requirements
The following dependency must be installed to run the synphage Docker image on your system: Docker or Docker Desktop.
- Linux: install Docker Desktop from the executable, or check the full Docker documentation for Linux.
- Mac: install Docker Desktop from the executable, or check the full Docker documentation for Mac.
- Windows: install Docker Desktop from the executable, or check the full Docker documentation for Windows.
Info
When installing docker from the website, the right version should automatically be selected for your computer.
Pull synphage image
- Open the Docker Desktop app and go to Images.
- Go to the search bar and search for synphage.
Note
The latest image is selected automatically (advised).
- Pull the image: select Pull and wait for the download to complete.
Screenshot: the synphage Docker image is installed.
Note
Your dashboard might look a bit different depending on your Docker Desktop version and your OS.
# Pull the image from Docker Hub
docker pull vestalisvirginis/synphage:<tag>
# Check the list of installed Docker images
docker image ls
Replace <tag> with the latest image tag.
Run synphage container
- Start the container
- Open the drop-down menu Optional settings:
- Set the host port to 3000.
Tip
Setting the port is required to run synphage, as it uses a web interface. 3000 is given as an example; any other available port can be used.
Warning
Make sure that the port is available and not already in use (by another running container, for example).
- Set the Volumes
- Data Output: all output data are located in the /data directory of the container. The output data can be copied from the /data folder after the run, or they can be stored in a Docker Volume that can be mounted to a new Docker container and reused in subsequent runs if the user needs to process additional sequences or simply generate additional synteny diagrams.
Screenshots: create a Docker Volume for your data; mount your volume to the /data directory when starting your container; download the data from the container to your computer.
- Dagster home: metadata generated during the successive runs of the pipeline are stored in the /dagster directory. Setting a DAGSTER_HOME volume is only necessary to keep track of previous runs and their metadata; it does not affect data storage if not set.
Danger
All data will be deleted when the container is removed. If no volume is mounted to the /data directory and the user does not save the data, the data will be lost.
- Set the environment variables (optional)
synphage uses the following environment variables:
- EMAIL (optional): used for connecting to the NCBI database.
- API_KEY (optional): used for connecting to the NCBI database and downloading files.
- DAGSTER_HOME (optional): used for storing the metadata generated during previous runs of the pipeline.
Info
EMAIL and API_KEY are only required for connecting to the NCBI database and downloading GenBank files. If the user only works with local data, these two variables can be ignored.
- Press the Run button. Your container is now running.
- Import local GenBank files (optional)
/user_files is the directory that receives the user's GenBank files. To use locally stored GenBank files, import them or drag and drop them (depending on your system) into the /user_files directory.
Warning
- Special characters in file names may cause errors downstream.
Note
.gb and .gbk are both valid extensions for GenBank files.
- Connect to the web interface
To connect to the web interface, select the link to the port or copy this link into your web browser.
- Stop and remove your container
After completing your work, you can close the web browser and stop the container. After stopping your container, it is good practice to remove it.
Screenshots: stop the container; remove the container.
synphage uses the following environment variables:
- EMAIL (optional): used for connecting to the NCBI database.
- API_KEY (optional): used for connecting to the NCBI database and downloading files.
Info
EMAIL and API_KEY are only required for connecting to the NCBI database and downloading GenBank files. If the user only works with local data, these two variables can be ignored.
Tip
These variables can be exported in the terminal and then forwarded to the container with the -e flag of docker run (a container does not inherit the variables of your terminal):
export EMAIL=user.email@email.com
export API_KEY=UserApiKey
docker run -d --rm --name my_phage_box -e EMAIL -e API_KEY -p 3000:3000 vestalisvirginis/synphage:<tag>
Start the container
To start the container, run the following command:
docker run -d --rm --name my_phage_box -p 3000:3000 vestalisvirginis/synphage:<tag>
Image version
The <tag> corresponds to the tag of the downloaded image.
Tip
- As synphage uses dagster-webserver, the -p flag is required to visualise the pipeline in your browser:
-p: [host_port:container_port]
The container_port is fixed to 3000.
- To access the webserver, copy/paste the following link into your web browser. In this example:
http://0.0.0.0:3000
Tip
- It is good practice to name your containers so you can find them easily: --name
- It is also good practice to remove the container at the end of the run. With the --rm flag, the container is removed automatically once it is stopped.
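The docker run command above can be wrapped in a small shell function so that the tag, host port and container name are easy to change, for example when port 3000 is already taken. This is a hypothetical sketch: start_synphage, run_or_print and the DRY_RUN variable are illustrative names, not part of synphage or Docker. The container port stays fixed to 3000 as described above.

```shell
# Hypothetical helper: run a command, or only print it when DRY_RUN is set.
run_or_print() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Usage: start_synphage <tag> [host_port] [container_name]
# Only the host port can change; the container port is fixed to 3000.
start_synphage() {
    if [ -z "${1:-}" ]; then
        echo "usage: start_synphage <tag> [host_port] [container_name]" >&2
        return 2
    fi
    tag=$1
    host_port=${2:-3000}
    name=${3:-my_phage_box}
    run_or_print docker run -d --rm --name "$name" -p "$host_port:3000" "vestalisvirginis/synphage:$tag"
}
```

For example, start_synphage <tag> 8080 publishes the webserver on host port 8080, and DRY_RUN=1 start_synphage <tag> only prints the command it would run.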
Set the Volumes
- Data Output: all output data are located in the /data directory of the container. The output data can be copied from the /data folder after the run, or they can be stored in a Docker Volume that can be mounted to a new Docker container and reused in subsequent runs if the user needs to process additional sequences or simply generate additional synteny diagrams.
# Create the volume synphage_data
docker volume create synphage_data
# Mount the volume to the /data directory in the container
docker run -d --rm --name my_phage_box -v synphage_data:/data -p 3000:3000 vestalisvirginis/synphage:<tag>
# Copy the content of /data from the container to a local directory
docker cp container-id:/data/. your/local/data_directory/
- Dagster home: metadata generated during the successive runs of the pipeline are stored in the /dagster directory. Setting a DAGSTER_HOME volume is only necessary to keep track of previous runs and their metadata; it does not affect data storage if not set.
# Create the volumes synphage_data and dagster_home
docker volume create synphage_data
docker volume create dagster_home
# Mount the volumes to the /data and /dagster directories in the container
docker run -d --rm --name my_phage_box -v synphage_data:/data -v dagster_home:/dagster -p 3000:3000 vestalisvirginis/synphage:<tag>
Danger
All data will be deleted when the container is removed. If no volume is mounted to the /data directory and the user does not save the data, the data will be lost.
Warning
Volume names must be unique. You cannot create two volumes with the same name.
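Since a container started with --rm is removed as soon as it stops, the content of /data has to be copied out before stopping it when no volume is mounted. A minimal sketch, assuming the container name my_phage_box used above; backup_synphage_data and DRY_RUN are hypothetical names. The trailing "/." tells docker cp to copy the contents of /data rather than the /data directory itself.

```shell
# Hypothetical helper: copy the content of /data from a running synphage
# container to a local folder (created if missing).
# Set DRY_RUN=1 to print the docker command instead of running it.
backup_synphage_data() {
    container=${1:-my_phage_box}
    dest=${2:-./synphage_data_backup}
    mkdir -p "$dest" || return 1
    if [ -n "${DRY_RUN:-}" ]; then
        echo "docker cp $container:/data/. $dest"
    else
        docker cp "$container:/data/." "$dest"
    fi
}
```

Run it, for example as backup_synphage_data my_phage_box ~/synphage_results, before stopping the container.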
Import local GenBank files (optional)
/user_files is the directory that receives the user's GenBank files. To use locally stored GenBank files, copy them into the /user_files directory. As docker cp takes a single source path, copy the files one at a time:
for f in path_to_my_gb_files/*.gb*; do docker cp "$f" container_id:/user_files/; done
Tip
Start the container first, then copy the files into it.
Warning
- Special characters in file names may cause errors downstream.
GenBank file extensions
.gb and .gbk are both valid extensions for GenBank files.
Connect to the web interface
To connect to the web interface, select the link to the port or copy this link into your web browser: http://0.0.0.0:3000
Screenshot: Dagster running at the start of the Docker container.
Stop synphage
After completing your work, you can close the web browser and stop the container with docker stop my_phage_box. As the container was started with the --rm flag, it is removed automatically once stopped.
Screenshot: Dagster shutting down; the Docker container is stopped and removed automatically.