Skip to content

Synteny diagram of bacteriophage genomes with 'synphage'

For this step-by-step example, a group of closely related Lactococcus 936-type phages has been selected based on name from Bacterial and Viral Bioinformatics Resource Center (BV-BRC).

Selected Lactococcus phages
Selected Lactococcus phages for this example

Lactococcus 936-type phages are especially relevant for the diary industry. Genomes of this phage species are known for its mosaic architecture i.e., they carry genes or segments from distinct evolutionary origins, likely acquired by recombination. In this example, unique and conserved genes and proteins will be highlighted within this phage group, based on analysis of 35 genomes.

Running 'synphage' pipeline

To get familiar with synphage capabilities, you can reproduce the step-by-step example described below. To get started, go to step 1.

Prerequisite:

You need to have synphage installed in a python environment or in a docker container or to have pulled synphage docker image. Start synphage and open the Dagster UI in your browser to get started.

pip install synphage
dagster dev -h 0.0.0.0 -p 3000 -m synphage
For more details, see installation instruction or how to run the software.

docker pull vestalisvirginis/synphage:<tag>
docker run --rm --name my-synphage-container -p 3000 vestalisvirginis/synphage:<tag>
For more details, see installation instruction or how to run the software.

Dagster home page
Dagster UI - landing page

Step 1: Download the data of interest

Go to Image title Image title Dagster_home -> Jobs -> download.

Download job - overview
Overview of the download job
Info

A warning message will pop-up. Select Confirm. This will not impact the smooth run of the download job.

Warning message - Download Job
Warning message - Download job

In order to query our genomes of interest, we need to pass a query to the search-key in the configuration panel.
To access the configuration window, open the dropdown menu (white arrow on the right of the black box located on the up right corner, labelled Materialize all) and select Open launchpad.

Lauchpad
Access the lauchpad to the job configuration

Configure the search_key parameter, changing the default value with the following keywords (Accession names for the genomes previously selected in this case) to query the NCBI database:

KP793101[Accession] OR KP793102[Accession] OR KP793103[Accession] OR KP793105[Accession] OR KP793104[Accession] OR KP793107[Accession] OR KP793106[Accession] OR KP793108[Accession] OR KP793109[Accession] OR KP793112[Accession] OR KP793114[Accession] OR KP793113[Accession] OR KP793110[Accession] OR KP793115[Accession] OR KP793117[Accession] OR KP793118[Accession] OR KP793122[Accession] OR KP793116[Accession] OR KP793120[Accession] OR KP793121[Accession] OR KP793123[Accession] OR KP793126[Accession] OR KP793127[Accession] OR KP793125[Accession] OR KP793124[Accession] OR KP793128[Accession] OR KP793130[Accession] OR KP793129[Accession] OR KP793132[Accession] OR KP793131[Accession] OR KP793133[Accession] OR KP793135[Accession] OR KP793134[Accession] OR KP793111[Accession] OR KP793119[Accession]
Config panel - Download Job
Configuration panel - Download job

Select Materialize, in the right bottom corner.

Most of the assets provides metadata after successful complition of the run. These metadata allow the user to easily follow the smooth execution of the job.

Metadata fetch genome - Download Job
Example of metadata provided for the fetch_genome asset. After complition the user can see the number and the name of GenBank files that have been downloaded.

When the job finishes its execution, we can move to step 2 to run the validation on the data.

Executed - Download Job
Job is terminated

Step 2: Run data quality checks on the dataset

Note

For more detailed information on this step, check the Validation.

Go to Dagster_home -> Jobs -> make_validations.

Job overview - Validation Job ZoomIn- Validation Job
Zoom In
Overview of the validation job

Select Materialize all (black box located on the up right corner).

This job runs checks on each of the files that have been downloaded. The number of checks that pass or fail is directly visible on the assets and reflects how complete each dataset is. The result table for the checks is available in the metadata panel and the detailed results for the checks can be accessed either from the right panel under Checks -> View all check details or by selecting the asset and then the Checks tab.

Assets with checks - Validation Job
The assets display the number of failed and passed checks.
Check table - Validation Job
Result table for the check, accessible from the metadata panel.
Check result overview - Validation Job
Check results are accessible via the right panel.
Asset check results - Validation Job
Full asset check results.

The metadata attached to the second step of the validation inform the user about the logic applied to file processing in later steps, more precisely what feature type the software will be using for downstream processing and what attribute will be used as unique identifier for the coding genes.

Logic - Validation Job
Step 2 of the validation with metadata.

The metadata attached to the last step of the validation, render an overview of the transformed data.

Transformation - Validation Job
Step 3 of the validation with metadata.

Step 3: Run the blast

For this example, blastn was run on the dataset.

Go to Dagster_home -> Jobs -> make_blastn

Overview - Blastn Job
Overview of the blastn job

Select Materialize all (black box located on the up right corner).

Completed - Blastn Job
Completed job

Checks are run at the beginning of the job to verify that the keys and identifiers used for each of the coding element are unique over all the sequences.

Checks - Blastn Job
The checks confirm the uniqueness of the chosen identifier for each of the coding elements.

Several file types are generated during this step. An sample is presented below.

KP793103_1.fna
>9338265281086248732 
ATGCAAAAACAAAACGGTGGCAGGCCCACAATTTTACCTAAGATGTATGAAGAACCGCTTTTTAGTCAAATCATTGATAAAATTGAATCAGGCTGTAATGACAGAGAAATCTACACCAGTTTGCATTGTTCTGCTAAAACTTTTAGAAAGTGGCGAGATGACAATATAAAGGCGTATAACGAAGCTAAAAGCATTGCTAGGGGAAATCTATTAGAACTAGCTGAAAGTGCCTTAGCGAGCAAACTGACAGTCAGAACGCTAAAAGAAACAGAAACAATATATGACGCTGACGGAAACGTTGAAAAAGTAAAGGTTAAAGAAAAAGAACTGGATAAAGATAGCTTAGTAGCAATGATGGTTGCTAAGGCTGGAAACCCTGAACTTTATAACCCTACTGAATGGCGGAGATTACAACAGGAAGAAGCAAGCTCTAATGACCTTAAAGCTAAAATCGAAGAACTTGACGACTATAAACTAAGTAAGTATAAAACGCCAGAAGTTGAAGCACCGAAAGGGTTTGAATAA
>16784793607745071823 
ATGTATTATTTAAATAAAATGTTGGAATACAACAAAGAAAATGGCATTATTATTAATAAATACATTCGTAAGACTATTCAGAAGCAAATACGTATTCATAATAAGTATATTTATCGCTATGACCGTGTTACACAAGCTATTGAGTGGATACAAGACAACTTCTATTTGACTACTGGTAACCTGACGAAAATCGAGCTACTACCGCCACAAATTTGGTGGTACGAGTTAATGCTTGGTTATGATATGATTGATGAAAAAGGCGTTCAGGTCAACCTAGTTAATGAGATTTTCCTTAATTTAGGTCGTGGATCAGGTAAGTCAAGTTTAATGGCTACAAGAGTGCTTAACTGGATGATTTTAGGCGGACAATATGGTGGAGAGAGCTTAGTTATTGCGTACGATAACACACAGGCTAGACACGTATTTGACCAGGTTCGGAATCAAACGGAAGCAAGCGATACATTAAGAGTATACAATGAAAACAAGATTTTCAAGAGTACGAAACAAGGGCTAGAGTTTACTTCTTTTAAAACCACTTTCAAAAAGCAAACAAATGATACTTTGAGGGCGCAAGGTGGTAACAGTTCCCTTAATATATTTGATGAAGTTCATACCTATGGCGAAGATATAACAGAATCAGTCAATAAAGGTTCACGTCAAAAACAAGATAACTGGCAAAGTATTTACATTACTTCTGGTGGACTTAAACGAGACGGTTTATATGATAAACTTGTTGAACGCTTCAAATCAGAAGAAGAATTTTACAATGATAGGTCGTTCGGCTTACTTTACATGCTAGAAAGTCATGAGCAGGTTAAAGATAAGAAGAATTGGACTATGGCATTGCCTCTTATTGGTAATGTCCCTAAGTGGTCAGGAGTTATTGAGGAGTATGAGCTCGCGCAAGGAGACCCAGCGTTACAGAATAAGTTCTTAGCGTTTAATATGGGCTTGCCTATGCAGGACACAGCTTACTACTTCACTCCGCAAGATACTAAACTAACAGACTTCAATTTATCTGTATTTAATAAAAATAGAACTTATGTAGGAATTGACCTATCCTTAATTGGCGATTTAACCGCTGTGTCGTTCGTTTGCGAGTTAGAGGGTAAAACTTACAGTCATACGCTAACTTTCTCTGTACGGTCGCAATATGAGCAACTGGACACAGAACAACAAGAGTTATGGACTGAATTCGTTGACAGAGGTGAACTAATCTTACTTGATACGGAATACATTAATGTAAATGACTTAATACCATATATCAACGACTTTAGAACCAAGACAGGGTGCAGACTTAGAAAAATCGGTTATGACCCAGCACGATATGAGATTTTAAAAGGGTTGATCGAGCGTTATTTCTTCGATAAAGACGGAGATAACCAAAGAGCGATTCGACAAGGTTTCTCGATGAATGACTATATTAAGTTATTAAAATCTAAATTAGTGGAAAATAAACTTATCCATAATCAAAAAGTCATGCAATGGGCTTTAAATAATACTGCTGTTAAAATCGGACAAAGTGGGGATTACATGTATACTAAAAAACTTGAAAAAGATAAAATTGACCCTACTGTGGCTTTAACAATGGCATTAGAAATGGCGGTGTCAGATGAAGTATAA
>10311725312318426547 
ATGAAGTATAACGTTGACACAGTCAGAGAGAGTGGCTGGTACAATAAAAAAGAATGGTTGGCTGTTCGTGATTATGTCAGACAACGTGATAAGATGACTTGCGTAAGATGTGGTGCATTCGGTGCCAAAAAGTACGAAGTTGACCATATTGTAGAACTAACTTGGGAAAATCTTGATGATTGGAAAATAGCGCTAAACCCTGATAACCTACAACTCCTTTGTAAGTCTTGCCATAACAAGAAAACAGGCGAGTATAAACGAGGGAAAGGCGTAAGTTTATGGTAG
>16833909307678177342 
KP793103_1
KP793103_1.ndb
KP793103_1.nhr
KP793103_1.nim
KP793103_1.not
KP793103_1.nsq
KP793103_1.ntf
KP793103_1.nto
KP793103_1_vs_KP793107_1
      "results": {
        "search": {
          "query_id": "Query_1",
          "query_title": "9338265281086248732",
          "query_len": 525,
          "hits": [
            {
              "num": 1,
              "description": [
                {
                  "id": "gnl|BL_ORD_ID|0",
                  "accession": "0",
                  "title": "7763576513298070802 "
                }
              ],
              "len": 525,
              "hsps": [
                {
                  "num": 1,
                  "bit_score": 821.033,
                  "score": 444,
                  "evalue": 0,
                  "identity": 498,
                  "query_from": 1,
                  "query_to": 525,
                  "query_strand": "Plus",
                  "hit_from": 1,
                  "hit_to": 525,
                  "hit_strand": "Plus",
                  "align_len": 525,
                  "gaps": 0,
                  "qseq": "ATGCAAAAACAAAACGGTGGCAGGCCCACAATTTTACCTAAGATGTATGAAGAACCGCTTTTTAGTCAAATCATTGATAAAATTGAATCAGGCTGTAATGACAGAGAAATCTACACCAGTTTGCATTGTTCTGCTAAAACTTTTAGAAAGTGGCGAGATGACAATATAAAGGCGTATAACGAAGCTAAAAGCATTGCTAGGGGAAATCTATTAGAACTAGCTGAAAGTGCCTTAGCGAGCAAACTGACAGTCAGAACGCTAAAAGAAACAGAAACAATATATGACGCTGACGGAAACGTTGAAAAAGTAAAGGTTAAAGAAAAAGAACTGGATAAAGATAGCTTAGTAGCAATGATGGTTGCTAAGGCTGGAAACCCTGAACTTTATAACCCTACTGAATGGCGGAGATTACAACAGGAAGAAGCAAGCTCTAATGACCTTAAAGCTAAAATCGAAGAACTTGACGACTATAAACTAAGTAAGTATAAAACGCCAGAAGTTGAAGCACCGAAAGGGTTTGAATAA",
                  "hseq": "ATGCAAACACAAAACGGTGGAAGACCCACAATTTTACCTAAGATGTACGAAGAACCGCTTTTTAGTCAAATCATTGATAAAATGGAATCAGGCTGCAATGACAGAGAAATCTACACCAGTTTGCATTGTTCAGCTAAAACTTTTAGAAAGTGGCGAGATGACAATATAAAGGCGTATGACGAAGCTAAAAGTATCGCTAGGGGAAATCTATTAGAACTAGCTGAAAGTGCCTTAGCGAGCAAACTGACGGTCAGAACGCTAAAGGAAACAGAGACAATCTATGACGCTGACGGAAACGTTGAAAAAGTAAAGGTTAAAGAAAAAGAACTGGATAAAGACAGCTTGGTAGCGATGATGGTTGCTAAGGCTGGTAACCCTGAACTTTATAACCCTACTGAATGGCGGAGATTACAACAGGAAGAAGCAAGCTCTAATGACCTTAAAGCTAAGATCGAAGAACTTGATGACTATAAACTAAGTAAGTATAAAACGCCAGAAATCGAAGTCCCAGAGGGGTTTGAATAA",

KP793103_1_vs_KP793107_1.parquet

search_target query_id query_key query_len number_of_hits source_key num bit_score score evalue identity query_from query_to query_strand hit_from hit_to hit_strand align_len gaps percentage_of_identity
{"db":"/data/gene_identity/blastn_database/KP793107_1"} Query_1 9338265281086248732 525 1 7763576513298070802 1 821.033 444 0 498 1 525 Plus 1 525 Plus 525 0 94.857
{"db":"/data/gene_identity/blastn_database/KP793107_1"} Query_2 16784793607745071823 1623 1 5312076624733181123 1 2366.68 1281 0 1509 1 1623 Plus 1 1623 Plus 1623 0 92.976
{"db":"/data/gene_identity/blastn_database/KP793107_1"} Query_3 10311725312318426547 285 1 17285202555729306767 1 433.236 234 2.63096e-124 268 1 285 Plus 1 285 Plus 285 0 94.035
{"db":"/data/gene_identity/blastn_database/KP793107_1"} Query_4 16833909307678177342 1137 1 12431818248849801861 1 1945.64 1053 0 1109 1 1137 Plus 1 1137 Plus 1137 0 97.537
{"db":"/data/gene_identity/blastn_database/KP793107_1"} Query_5 7716253111867132591 537 1 6258967671276254993 1 931.832 504 0 526 1 537 Plus 1 537 Plus 537 0 97.952

gene_uniqueness.parquet

query_cds_gene query_cds_locus_tag query_protein_id query_function query_product query_translation query_transl_table query_codon_start query_start_sequence query_end_sequence query_strand query_cds_extract query_gene query_locus_tag query_extract query_translation_fn query_id query_name query_description query_topology query_organism query_taxonomy query_filename query_gb_type query_key query_len source_key query_from query_to percentage_of_identity source_cds_gene source_cds_locus_tag source_protein_id source_function source_product source_translation source_transl_table source_codon_start source_start_sequence source_end_sequence source_strand source_cds_extract source_gene source_locus_tag source_extract source_translation_fn source_id source_name source_description source_topology source_organism source_taxonomy source_filename source_gb_type
0 Phi4_05 ALM63044.1 protease MKLITNSAEIKVTENEDGSKSFQGIGSEVGVENLNGIILTPNCIEFARERYPLLYEHGTGSSEVIGDAKVYYDLASNKYLTDFTLYENAPNINKAVENGAFDSLSIAYYITDYEFNENDSLVVNKALFKEISLVSVPADPNAKFIQNALGEELTEERNKIIESRNALKEIEDIKKKYE 11 1 3821 4358 1 ATGAAACTAATAACCAATAGTGCTGAAATTAAAGTGACTGAAAACGAGGACGGCTCTAAGTCGTTCCAAGGTATCGGGTCAGAAGTTGGTGTAGAGAACCTTAATGGTATTATCTTGACTCCTAACTGTATTGAGTTTGCTAGAGAACGATATCCATTGCTATATGAACATGGTACTGGATCTAGCGAAGTCATCGGGGACGCAAAAGTTTACTATGATTTGGCTTCTAATAAATACCTGACTGACTTTACGCTTTACGAAAATGCACCAAACATTAATAAGGCTGTTGAAAATGGCGCTTTTGACTCACTATCAATTGCCTATTACATTACGGATTATGAGTTTAATGAAAATGATTCTCTAGTCGTAAATAAAGCACTGTTTAAAGAGATTTCTCTCGTTTCAGTACCAGCAGACCCTAACGCAAAATTTATTCAAAACGCGCTAGGCGAAGAACTCACAGAAGAACGTAACAAAATTATTGAAAGCCGAAACGCTTTGAAAGAAATTGAGGATATTAAAAAGAAATATGAATAA Phi4_05 ATGAAACTAATAACCAATAGTGCTGAAATTAAAGTGACTGAAAACGAGGACGGCTCTAAGTCGTTCCAAGGTATCGGGTCAGAAGTTGGTGTAGAGAACCTTAATGGTATTATCTTGACTCCTAACTGTATTGAGTTTGCTAGAGAACGATATCCATTGCTATATGAACATGGTACTGGATCTAGCGAAGTCATCGGGGACGCAAAAGTTTACTATGATTTGGCTTCTAATAAATACCTGACTGACTTTACGCTTTACGAAAATGCACCAAACATTAATAAGGCTGTTGAAAATGGCGCTTTTGACTCACTATCAATTGCCTATTACATTACGGATTATGAGTTTAATGAAAATGATTCTCTAGTCGTAAATAAAGCACTGTTTAAAGAGATTTCTCTCGTTTCAGTACCAGCAGACCCTAACGCAAAATTTATTCAAAACGCGCTAGGCGAAGAACTCACAGAAGAACGTAACAAAATTATTGAAAGCCGAAACGCTTTGAAAGAAATTGAGGATATTAAAAAGAAATATGAATAA MKLITNSAEIKVTENEDGSKSFQGIGSEVGVENLNGIILTPNCIEFARERYPLLYEHGTGSSEVIGDAKVYYDLASNKYLTDFTLYENAPNINKAVENGAFDSLSIAYYITDYEFNENDSLVVNKALFKEISLVSVPADPNAKFIQNALGEELTEERNKIIESRNALKEIEDIKKKYE KP793101.1 KP793101 Lactococcus phage 936 group phage Phi4, complete genome linear Lactococcus phage 936 group phage Phi4 ['Viruses' 'Duplodnaviria' 'Heunggongvirae' 'Uroviricota' 'Caudoviricetes' /data/genbank/KP793101_1.gb locus_tag 12466092645417556140 537 10806178786950748368 1 537 100 PhiA16_05 ALM63095.1 protease MKLITNSAEIKVTENEDGSKSFQGIGSEVGVENLNGIILTPNCIEFARERYPLLYEHGTGSSEVIGDAKVYYDLASNKYLTDFTLYENAPNINKAVENGAFDSLSIAYYITDYEFNENDSLVVNKALFKEISLVSVPADPNAKFIQNALGEELTEERNKIIESRNALKEIEDIKKKYE 11 1 3810 4347 1 ATGAAACTAATAACCAATAGTGCTGAAATTAAAGTGACTGAAAACGAGGACGGCTCTAAGTCGTTCCAAGGTATCGGGTCAGAAGTTGGTGTAGAGAACCTTAATGGTATTATCTTGACTCCTAACTGTATTGAGTTTGCTAGAGAACGATATCCATTGCTATATGAACATGGTACTGGATCTAGCGAAGTCATCGGGGACGCAAAAGTTTACTATGATTTGGCTTCTAATAAATACCTGACTGACTTTACGCTTTACGAAAATGCACCAAACATTAATAAGGCTGTTGAAAATGGCGCTTTTGACTCACTATCAATTGCCTATTACATTACGGATTATGAGTTTAATGAAAATGATTCTCTAGTCGTAAATAAAGCACTGTTTAAAGAGATTTCTCTCGTTTCAGTACCAGCAGACCCTAACGCAAAATTTATTCAAAACGCGCTAGGCGAAGAACTCACAGAAGAACGTAACAAAATTATTGAAAGCCGAAACGCTTTGAAAGAAATTGAGGATATTAAAAAGAAATATGAATAA PhiA16_05 ATGAAACTAATAACCAATAGTGCTGAAATTAAAGTGACTGAAAACGAGGACGGCTCTAAGTCGTTCCAAGGTATCGGGTCAGAAGTTGGTGTAGAGAACCTTAATGGTATTATCTTGACTCCTAACTGTATTGAGTTTGCTAGAGAACGATATCCATTGCTATATGAACATGGTACTGGATCTAGCGAAGTCATCGGGGACGCAAAAGTTTACTATGATTTGGCTTCTAATAAATACCTGACTGACTTTACGCTTTACGAAAATGCACCAAACATTAATAAGGCTGTTGAAAATGGCGCTTTTGACTCACTATCAATTGCCTATTACATTACGGATTATGAGTTTAATGAAAATGATTCTCTAGTCGTAAATAAAGCACTGTTTAAAGAGATTTCTCTCGTTTCAGTACCAGCAGACCCTAACGCAAAATTTATTCAAAACGCGCTAGGCGAAGAACTCACAGAAGAACGTAACAAAATTATTGAAAGCCGAAACGCTTTGAAAGAAATTGAGGATATTAAAAAGAAATATGAATAA MKLITNSAEIKVTENEDGSKSFQGIGSEVGVENLNGIILTPNCIEFARERYPLLYEHGTGSSEVIGDAKVYYDLASNKYLTDFTLYENAPNINKAVENGAFDSLSIAYYITDYEFNENDSLVVNKALFKEISLVSVPADPNAKFIQNALGEELTEERNKIIESRNALKEIEDIKKKYE KP793102.1 KP793102 Lactococcus phage 936 group phage PhiA.16, complete genome linear Lactococcus phage 936 group phage PhiA.16 ['Viruses' 'Duplodnaviria' 'Heunggongvirae' 'Uroviricota' 'Caudoviricetes' /data/genbank/KP793102_1.gb locus_tag
1 Phi4_06 ALM63045.1 major capsid protein MNKPDLIEKQNRLAELKENNVSLKSQISGFEVKNAIEDLPKVQELEKTLSENSIEIIKIENELNAQEEKPKGKAKMTNFIESQNAVTEFFDVLKKNSGKSEIKNAWNAKLAENGVTVTDKTFELPRKLVDSINTTLLNANPVFQVFRVTNVGALLVSRSFDSANEAQVHKDGQQKTEQAATLTIDTLEPVMVYKLQSLAERVKRLQMSYSELYNLIVAELTQAIVNKIVDLALVEGDGTNGFKSIDKEADAKKIKKITTKAKSAGKTPFADAIEEAVDFVRPTAGRRYLIVKAEDRKALLDELRQATANANVRIKNDDAEIASEVGVDEIIVYTGTKAVKPTVLVDQKYHIDMQDITKVDAFEWKTNSNMILVETLTSGHVETL 11 1 4350 5505 1 ATGAATAAACCTGATTTAATCGAAAAACAAAATCGCTTGGCAGAACTTAAAGAAAATAACGTATCTTTAAAATCTCAAATTAGTGGCTTTGAAGTAAAAAACGCAATTGAAGACTTGCCTAAAGTACAAGAATTAGAAAAAACACTTTCAGAAAATTCAATTGAAATTATCAAAATTGAGAACGAACTTAACGCACAGGAAGAAAAACCAAAAGGAAAAGCTAAAATGACAAACTTTATTGAATCACAAAACGCTGTAACAGAATTTTTCGATGTATTGAAAAAGAACTCTGGAAAATCAGAAATTAAAAACGCTTGGAATGCAAAACTTGCTGAAAATGGTGTAACTGTAACAGACAAAACTTTTGAGCTTCCACGTAAATTGGTTGACTCAATCAACACAACTTTGTTAAATGCTAACCCAGTATTCCAAGTCTTCCGTGTTACAAATGTTGGTGCTTTGCTTGTATCACGCTCATTTGATTCAGCTAATGAAGCACAAGTTCACAAAGACGGACAACAAAAAACAGAGCAGGCGGCTACACTCACTATTGACACTCTTGAACCTGTAATGGTTTATAAATTGCAATCACTTGCTGAACGTGTTAAACGACTTCAAATGTCATACTCTGAACTTTACAACTTGATTGTAGCAGAACTTACACAAGCTATCGTTAACAAAATTGTCGACCTTGCTCTTGTTGAGGGTGACGGAACGAACGGTTTTAAATCAATCGACAAAGAAGCAGACGCCAAAAAAATCAAAAAGATTACTACAAAGGCTAAATCAGCTGGAAAAACTCCATTTGCTGACGCTATCGAAGAAGCGGTTGACTTTGTTCGTCCTACTGCTGGTCGTCGTTATTTGATTGTTAAAGCGGAAGACCGCAAAGCATTGTTAGATGAGTTACGTCAAGCGACTGCAAATGCTAACGTTCGTATTAAAAATGATGACGCTGAAATTGCTTCAGAAGTTGGAGTAGATGAAATCATTGTCTATACAGGTACAAAGGCTGTTAAACCTACTGTATTAGTAGACCAAAAATATCATATCGATATGCAAGACATTACAAAAGTTGACGCATTTGAATGGAAAACTAATAGCAACATGATTTTGGTTGAAACACTAACAAGCGGACACGTTGAAACGTTATAA Phi4_06 ATGAATAAACCTGATTTAATCGAAAAACAAAATCGCTTGGCAGAACTTAAAGAAAATAACGTATCTTTAAAATCTCAAATTAGTGGCTTTGAAGTAAAAAACGCAATTGAAGACTTGCCTAAAGTACAAGAATTAGAAAAAACACTTTCAGAAAATTCAATTGAAATTATCAAAATTGAGAACGAACTTAACGCACAGGAAGAAAAACCAAAAGGAAAAGCTAAAATGACAAACTTTATTGAATCACAAAACGCTGTAACAGAATTTTTCGATGTATTGAAAAAGAACTCTGGAAAATCAGAAATTAAAAACGCTTGGAATGCAAAACTTGCTGAAAATGGTGTAACTGTAACAGACAAAACTTTTGAGCTTCCACGTAAATTGGTTGACTCAATCAACACAACTTTGTTAAATGCTAACCCAGTATTCCAAGTCTTCCGTGTTACAAATGTTGGTGCTTTGCTTGTATCACGCTCATTTGATTCAGCTAATGAAGCACAAGTTCACAAAGACGGACAACAAAAAACAGAGCAGGCGGCTACACTCACTATTGACACTCTTGAACCTGTAATGGTTTATAAATTGCAATCACTTGCTGAACGTGTTAAACGACTTCAAATGTCATACTCTGAACTTTACAACTTGATTGTAGCAGAACTTACACAAGCTATCGTTAACAAAATTGTCGACCTTGCTCTTGTTGAGGGTGACGGAACGAACGGTTTTAAATCAATCGACAAAGAAGCAGACGCCAAAAAAATCAAAAAGATTACTACAAAGGCTAAATCAGCTGGAAAAACTCCATTTGCTGACGCTATCGAAGAAGCGGTTGACTTTGTTCGTCCTACTGCTGGTCGTCGTTATTTGATTGTTAAAGCGGAAGACCGCAAAGCATTGTTAGATGAGTTACGTCAAGCGACTGCAAATGCTAACGTTCGTATTAAAAATGATGACGCTGAAATTGCTTCAGAAGTTGGAGTAGATGAAATCATTGTCTATACAGGTACAAAGGCTGTTAAACCTACTGTATTAGTAGACCAAAAATATCATATCGATATGCAAGACATTACAAAAGTTGACGCATTTGAATGGAAAACTAATAGCAACATGATTTTGGTTGAAACACTAACAAGCGGACACGTTGAAACGTTATAA MNKPDLIEKQNRLAELKENNVSLKSQISGFEVKNAIEDLPKVQELEKTLSENSIEIIKIENELNAQEEKPKGKAKMTNFIESQNAVTEFFDVLKKNSGKSEIKNAWNAKLAENGVTVTDKTFELPRKLVDSINTTLLNANPVFQVFRVTNVGALLVSRSFDSANEAQVHKDGQQKTEQAATLTIDTLEPVMVYKLQSLAERVKRLQMSYSELYNLIVAELTQAIVNKIVDLALVEGDGTNGFKSIDKEADAKKIKKITTKAKSAGKTPFADAIEEAVDFVRPTAGRRYLIVKAEDRKALLDELRQATANANVRIKNDDAEIASEVGVDEIIVYTGTKAVKPTVLVDQKYHIDMQDITKVDAFEWKTNSNMILVETLTSGHVETL KP793101.1 KP793101 Lactococcus phage 936 group phage Phi4, complete genome linear Lactococcus phage 936 group phage Phi4 ['Viruses' 'Duplodnaviria' 'Heunggongvirae' 'Uroviricota' 'Caudoviricetes' /data/genbank/KP793101_1.gb locus_tag 14120938943619292890 1155 6959416468477847864 1 1155 99.481 PhiA16_06 ALM63096.1 major capsid protein MNKPDLIEKQNRLAELKENNVSLKSQISGFEVKNAIEDLPKVQELEKTLSENSIEIIKIENELNAQEEKPKGKAKMTNFIESQNAVTEFFDVLKKNSGKSEIKNAWNAKLAENGVTVTDKTFELPRKLVDSINTTLLNANPVFQVFRVTNVGALLVSRSFDSVDEAQVHKDGQQKTEQAATLTIDTLEPVMVYKLQSLAERVKRLQMSYSELYNLIVAELTQAIVNKIVDLALVEGDGTNGFKSIDKEADAKKIKKITTKAKSAGKTPFADAIEEAVDFVRPTAGRRYLILKAEDRKALLDELRQATANANVRIKNDDAEIASEVGVDEIIVYTGTKAVKPTVLVDQKYHIDMQDITKVDAFEWKTNSNMILVETLTSGHVETYNAGAVITVA 11 1 4339 5521 1 ATGAATAAACCTGATTTAATCGAAAAACAAAATCGCTTGGCAGAACTTAAAGAAAATAACGTATCTTTAAAATCTCAAATTAGTGGCTTTGAAGTAAAAAACGCAATTGAAGACTTGCCTAAAGTACAAGAATTAGAAAAAACACTTTCAGAAAATTCAATTGAAATTATCAAAATTGAGAACGAACTTAACGCACAGGAAGAAAAACCAAAAGGAAAAGCTAAAATGACAAACTTTATTGAATCACAAAACGCTGTAACAGAATTTTTCGATGTATTGAAAAAGAACTCTGGAAAATCAGAAATTAAAAACGCTTGGAATGCAAAACTTGCTGAAAATGGTGTAACTGTAACAGACAAAACTTTTGAGCTTCCACGTAAATTGGTTGACTCAATCAACACAACTTTGTTAAATGCTAACCCAGTATTCCAAGTCTTCCGTGTTACAAATGTTGGTGCTTTGCTTGTATCACGCTCATTTGATTCAGTTGATGAAGCACAAGTTCACAAAGACGGACAACAAAAAACAGAGCAGGCGGCTACACTCACTATTGACACTCTTGAACCTGTAATGGTTTATAAATTGCAATCACTTGCTGAACGTGTTAAACGACTTCAAATGTCATACTCTGAACTTTACAACTTGATTGTAGCAGAACTTACACAAGCTATCGTTAACAAAATTGTCGACCTTGCTCTTGTTGAGGGTGACGGAACGAACGGTTTTAAATCAATCGACAAAGAAGCAGACGCCAAAAAAATCAAAAAGATTACTACAAAGGCTAAATCAGCTGGAAAAACTCCATTTGCTGACGCTATCGAAGAAGCGGTTGACTTTGTTCGTCCTACTGCTGGTCGTCGTTATTTGATTCTTAAAGCGGAAGATCGCAAAGCATTGTTAGATGAGTTACGTCAAGCGACTGCAAATGCTAACGTTCGTATTAAAAATGATGACGCTGAAATTGCTTCAGAAGTTGGAGTAGATGAAATTATTGTCTATACAGGTACAAAGGCTGTTAAACCTACTGTATTAGTAGACCAAAAATATCATATCGATATGCAAGACATTACAAAAGTTGACGCATTTGAATGGAAAACTAATAGCAACATGATTTTGGTTGAAACACTAACAAGCGGACACGTTGAAACTTATAACGCTGGTGCAGTAATTACAGTAGCATAA PhiA16_06 ATGAATAAACCTGATTTAATCGAAAAACAAAATCGCTTGGCAGAACTTAAAGAAAATAACGTATCTTTAAAATCTCAAATTAGTGGCTTTGAAGTAAAAAACGCAATTGAAGACTTGCCTAAAGTACAAGAATTAGAAAAAACACTTTCAGAAAATTCAATTGAAATTATCAAAATTGAGAACGAACTTAACGCACAGGAAGAAAAACCAAAAGGAAAAGCTAAAATGACAAACTTTATTGAATCACAAAACGCTGTAACAGAATTTTTCGATGTATTGAAAAAGAACTCTGGAAAATCAGAAATTAAAAACGCTTGGAATGCAAAACTTGCTGAAAATGGTGTAACTGTAACAGACAAAACTTTTGAGCTTCCACGTAAATTGGTTGACTCAATCAACACAACTTTGTTAAATGCTAACCCAGTATTCCAAGTCTTCCGTGTTACAAATGTTGGTGCTTTGCTTGTATCACGCTCATTTGATTCAGTTGATGAAGCACAAGTTCACAAAGACGGACAACAAAAAACAGAGCAGGCGGCTACACTCACTATTGACACTCTTGAACCTGTAATGGTTTATAAATTGCAATCACTTGCTGAACGTGTTAAACGACTTCAAATGTCATACTCTGAACTTTACAACTTGATTGTAGCAGAACTTACACAAGCTATCGTTAACAAAATTGTCGACCTTGCTCTTGTTGAGGGTGACGGAACGAACGGTTTTAAATCAATCGACAAAGAAGCAGACGCCAAAAAAATCAAAAAGATTACTACAAAGGCTAAATCAGCTGGAAAAACTCCATTTGCTGACGCTATCGAAGAAGCGGTTGACTTTGTTCGTCCTACTGCTGGTCGTCGTTATTTGATTCTTAAAGCGGAAGATCGCAAAGCATTGTTAGATGAGTTACGTCAAGCGACTGCAAATGCTAACGTTCGTATTAAAAATGATGACGCTGAAATTGCTTCAGAAGTTGGAGTAGATGAAATTATTGTCTATACAGGTACAAAGGCTGTTAAACCTACTGTATTAGTAGACCAAAAATATCATATCGATATGCAAGACATTACAAAAGTTGACGCATTTGAATGGAAAACTAATAGCAACATGATTTTGGTTGAAACACTAACAAGCGGACACGTTGAAACTTATAACGCTGGTGCAGTAATTACAGTAGCATAA MNKPDLIEKQNRLAELKENNVSLKSQISGFEVKNAIEDLPKVQELEKTLSENSIEIIKIENELNAQEEKPKGKAKMTNFIESQNAVTEFFDVLKKNSGKSEIKNAWNAKLAENGVTVTDKTFELPRKLVDSINTTLLNANPVFQVFRVTNVGALLVSRSFDSVDEAQVHKDGQQKTEQAATLTIDTLEPVMVYKLQSLAERVKRLQMSYSELYNLIVAELTQAIVNKIVDLALVEGDGTNGFKSIDKEADAKKIKKITTKAKSAGKTPFADAIEEAVDFVRPTAGRRYLILKAEDRKALLDELRQATANANVRIKNDDAEIASEVGVDEIIVYTGTKAVKPTVLVDQKYHIDMQDITKVDAFEWKTNSNMILVETLTSGHVETYNAGAVITVA KP793102.1 KP793102 Lactococcus phage 936 group phage PhiA.16, complete genome linear Lactococcus phage 936 group phage PhiA.16 ['Viruses' 'Duplodnaviria' 'Heunggongvirae' 'Uroviricota' 'Caudoviricetes' /data/genbank/KP793102_1.gb locus_tag
2 Phi4_07 ALM63046.1 putative strucural protein 1 MIDYIKVYCGIPILVTAYDSKLILFRSIAIKLLEKNGIKADETSVLVKEFISCYCRLNIVDEPAEQWRNAEMKRLASLQELMYYGGI 11 1 5553 5817 1 ATGATAGATTATATCAAGGTCTATTGTGGTATTCCGATTTTAGTAACAGCTTATGATAGTAAACTTATCTTATTCCGTTCAATAGCTATTAAATTGCTAGAAAAAAATGGTATTAAAGCTGACGAAACAAGTGTATTAGTGAAAGAATTTATCTCTTGTTATTGTCGGCTTAATATTGTTGATGAACCAGCAGAACAATGGCGAAATGCTGAAATGAAACGTTTGGCTTCTTTGCAAGAGTTAATGTATTATGGAGGTATTTAA Phi4_07 ATGATAGATTATATCAAGGTCTATTGTGGTATTCCGATTTTAGTAACAGCTTATGATAGTAAACTTATCTTATTCCGTTCAATAGCTATTAAATTGCTAGAAAAAAATGGTATTAAAGCTGACGAAACAAGTGTATTAGTGAAAGAATTTATCTCTTGTTATTGTCGGCTTAATATTGTTGATGAACCAGCAGAACAATGGCGAAATGCTGAAATGAAACGTTTGGCTTCTTTGCAAGAGTTAATGTATTATGGAGGTATTTAA MIDYIKVYCGIPILVTAYDSKLILFRSIAIKLLEKNGIKADETSVLVKEFISCYCRLNIVDEPAEQWRNAEMKRLASLQELMYYGGI KP793101.1 KP793101 Lactococcus phage 936 group phage Phi4, complete genome linear Lactococcus phage 936 group phage Phi4 ['Viruses' 'Duplodnaviria' 'Heunggongvirae' 'Uroviricota' 'Caudoviricetes' /data/genbank/KP793101_1.gb locus_tag 2759541733603255338 264 7364650362854213369 1 264 100 PhiA16_07 ALM63097.1 hypothetical protein MIDYIKVYCGIPILVTAYDSKLILFRSIAIKLLEKNGIKADETSVLVKEFISCYCRLNIVDEPAEQWRNAEMKRLASLQELMYYGGI 11 1 5541 5805 1 ATGATAGATTATATCAAGGTCTATTGTGGTATTCCGATTTTAGTAACAGCTTATGATAGTAAACTTATCTTATTCCGTTCAATAGCTATTAAATTGCTAGAAAAAAATGGTATTAAAGCTGACGAAACAAGTGTATTAGTGAAAGAATTTATCTCTTGTTATTGTCGGCTTAATATTGTTGATGAACCAGCAGAACAATGGCGAAATGCTGAAATGAAACGTTTGGCTTCTTTGCAAGAGTTAATGTATTATGGAGGTATTTAA PhiA16_07 ATGATAGATTATATCAAGGTCTATTGTGGTATTCCGATTTTAGTAACAGCTTATGATAGTAAACTTATCTTATTCCGTTCAATAGCTATTAAATTGCTAGAAAAAAATGGTATTAAAGCTGACGAAACAAGTGTATTAGTGAAAGAATTTATCTCTTGTTATTGTCGGCTTAATATTGTTGATGAACCAGCAGAACAATGGCGAAATGCTGAAATGAAACGTTTGGCTTCTTTGCAAGAGTTAATGTATTATGGAGGTATTTAA MIDYIKVYCGIPILVTAYDSKLILFRSIAIKLLEKNGIKADETSVLVKEFISCYCRLNIVDEPAEQWRNAEMKRLASLQELMYYGGI KP793102.1 KP793102 Lactococcus phage 936 group phage PhiA.16, complete genome linear Lactococcus phage 936 group phage PhiA.16 ['Viruses' 'Duplodnaviria' 'Heunggongvirae' 'Uroviricota' 'Caudoviricetes' /data/genbank/KP793102_1.gb locus_tag
3 Phi4_08 ALM63047.1 putative strucural protein 2 MIFSQVTLQVETTVKKKNGAEANVIKPIVLPAVKQRISQLRLDEFSMIGLGKNVRYELNGIGEMEDLIFNYFLDEKGDTFKRTTWERNPKNNKMILEGVVSNGL 11 1 5816 6131 1 ATGATATTTTCACAAGTAACATTGCAAGTTGAAACGACTGTTAAGAAGAAGAACGGTGCAGAAGCTAATGTTATAAAGCCTATCGTTTTACCAGCAGTTAAACAGAGAATTAGTCAGTTAAGACTTGATGAGTTTTCTATGATTGGACTAGGTAAAAATGTAAGATACGAGCTTAACGGAATCGGAGAAATGGAAGACTTAATTTTCAACTATTTCTTAGACGAAAAAGGCGATACTTTCAAGCGTACAACATGGGAAAGAAACCCTAAAAATAACAAGATGATTTTAGAGGGGGTCGTGAGCAACGGACTATGA Phi4_08 ATGATATTTTCACAAGTAACATTGCAAGTTGAAACGACTGTTAAGAAGAAGAACGGTGCAGAAGCTAATGTTATAAAGCCTATCGTTTTACCAGCAGTTAAACAGAGAATTAGTCAGTTAAGACTTGATGAGTTTTCTATGATTGGACTAGGTAAAAATGTAAGATACGAGCTTAACGGAATCGGAGAAATGGAAGACTTAATTTTCAACTATTTCTTAGACGAAAAAGGCGATACTTTCAAGCGTACAACATGGGAAAGAAACCCTAAAAATAACAAGATGATTTTAGAGGGGGTCGTGAGCAACGGACTATGA MIFSQVTLQVETTVKKKNGAEANVIKPIVLPAVKQRISQLRLDEFSMIGLGKNVRYELNGIGEMEDLIFNYFLDEKGDTFKRTTWERNPKNNKMILEGVVSNGL KP793101.1 KP793101 Lactococcus phage 936 group phage Phi4, complete genome linear Lactococcus phage 936 group phage Phi4 ['Viruses' 'Duplodnaviria' 'Heunggongvirae' 'Uroviricota' 'Caudoviricetes' /data/genbank/KP793101_1.gb locus_tag 8749918410610294761 315 16664177951761433604 1 315 98.73 PhiA16_08 ALM63098.1 hypothetical protein MIFSQVTLQVETTVKKKNGAEANVIKPIVLPAVKQRISQLRLDEFSMIGLGKNVRYELNGIGEMEDLIFNYFLDEKGDTFKRTTWERNPKNNKMILEGVVSNGL 11 1 5804 6119 1 ATGATATTTTCACAAGTAACATTGCAAGTTGAAACGACTGTTAAGAAGAAGAACGGTGCAGAAGCTAATGTTATAAAGCCTATCGTTTTACCAGCAGTTAAACAGAGAATTAGTCAGTTAAGACTTGATGAGTTTTCTATGATTGGACTAGGTAAAAATGTAAGATACGAGCTTAACGGAATCGGAGAAATGGAAGACTTAATTTTCAACTATTTCTTAGACGAAAAAGGCGATACTTTCAAGCGTACAACATGGGAAAGAAACCCTAAAAATAACAAAATGATTTTAGAAGGAGTCGTGAGTAACGGACTATGA PhiA16_08 ATGATATTTTCACAAGTAACATTGCAAGTTGAAACGACTGTTAAGAAGAAGAACGGTGCAGAAGCTAATGTTATAAAGCCTATCGTTTTACCAGCAGTTAAACAGAGAATTAGTCAGTTAAGACTTGATGAGTTTTCTATGATTGGACTAGGTAAAAATGTAAGATACGAGCTTAACGGAATCGGAGAAATGGAAGACTTAATTTTCAACTATTTCTTAGACGAAAAAGGCGATACTTTCAAGCGTACAACATGGGAAAGAAACCCTAAAAATAACAAAATGATTTTAGAAGGAGTCGTGAGTAACGGACTATGA MIFSQVTLQVETTVKKKNGAEANVIKPIVLPAVKQRISQLRLDEFSMIGLGKNVRYELNGIGEMEDLIFNYFLDEKGDTFKRTTWERNPKNNKMILEGVVSNGL KP793102.1 KP793102 Lactococcus phage 936 group phage PhiA.16, complete genome linear Lactococcus phage 936 group phage PhiA.16 ['Viruses' 'Duplodnaviria' 'Heunggongvirae' 'Uroviricota' 'Caudoviricetes' /data/genbank/KP793102_1.gb locus_tag
4 Phi4_12 ALM63051.1 neck passage structure MSLDNFRNRTILWDTVNKDFPQPIQIMQGDVNARTLLIKIVDNGAQIDLTGYSLKLTYQYTNSSNSGLVMIPPKDLAKGEFILVIPIEMTATGVIEANLILLNKDKEQVIVSKSLTFISDDSTVTNLAQKVNNKIDDFTKLLLENMPQVLRSELNDLHAQTESNKSNVELKANLADMTSLQSAMTELKNEVEAFGISPENLVTIKSLLDAIASNASESEVAELINSVKALTSNVSLMSNGDYSPKANQTDLESLQYAVNDHSATISAKANQTDLNNLQATVDKQGISISEKAEQSELSITNKNVATAQETAKQAESEAKNAMAKATEAQANSLPLNGTAVSALKLATPRKLRVNLQSSSFQYFDGTADATDIGVTGVLQIANGGTSTDDGVVNTIAYANSADGTDGFTTVYPNLNLLNNTRDYAGWTFYRTEILEADGTPTKILKINYDPNAWAGGSSPNIIQSVRPKPGDTVTLSFYAKGHGRIYSTIDGVNGAISADATDDWKLYKLTGVATKEVHSVAIYLHNVDKTSTSIYIHSVKIEKGSTATPWMPSASEVTINDYPKYVGFSNSIKPNKKNSDYKWLPMGLVSIDRVTGSLKPAVIGIDCAEAHPVGSVVTNTSSSSSGYSTGKWENIGSAVIGSTTIYYWKRTA 11 1 7365 9324 1 ATGAGTTTAGATAATTTTAGAAATAGAACGATTTTGTGGGATACAGTTAATAAAGATTTCCCCCAACCAATACAAATAATGCAAGGCGATGTCAATGCTAGAACATTATTAATTAAAATAGTTGATAATGGAGCTCAAATTGACTTAACTGGTTATTCGTTAAAACTTACATATCAATACACTAATAGTAGTAATTCTGGTCTTGTTATGATCCCTCCTAAGGACTTAGCTAAGGGAGAATTTATTTTGGTAATTCCTATCGAAATGACAGCGACAGGAGTTATTGAAGCGAACTTAATACTTCTCAATAAAGACAAAGAGCAAGTTATTGTCAGTAAAAGTCTTACATTTATATCAGATGATTCCACAGTTACAAATTTAGCTCAAAAAGTAAATAATAAGATTGATGATTTTACAAAATTATTATTGGAAAATATGCCACAAGTGTTGCGCAGTGAGTTGAATGACTTACATGCTCAAACTGAATCAAACAAGAGCAATGTTGAGCTTAAAGCAAATTTAGCTGATATGACGAGCTTACAAAGCGCAATGACAGAGCTAAAAAATGAAGTAGAAGCATTTGGTATTAGTCCTGAAAATTTAGTTACTATAAAATCGCTATTAGACGCAATCGCAAGTAACGCCAGTGAATCAGAAGTAGCCGAACTAATAAATTCAGTAAAGGCTTTAACAAGTAACGTTTCTCTGATGAGTAATGGAGATTATTCCCCTAAGGCTAATCAAACTGATTTAGAAAGTTTACAGTATGCTGTAAATGACCATTCGGCAACCATTTCAGCAAAGGCTAATCAAACAGACTTAAATAACTTACAAGCTACTGTTGATAAACAAGGTATCTCTATTTCAGAAAAAGCTGAACAATCAGAGTTATCAATCACAAATAAAAATGTCGCAACTGCTCAAGAAACAGCAAAACAAGCTGAAAGTGAAGCCAAAAATGCAATGGCGAAGGCTACCGAAGCACAAGCGAACAGTTTACCACTTAATGGCACCGCGGTAAGTGCACTCAAACTGGCAACACCTAGAAAACTCAGAGTAAATCTTCAATCTTCATCATTTCAATACTTTGACGGGACTGCTGATGCAACTGATATTGGAGTTACGGGTGTACTTCAAATTGCAAATGGAGGCACTTCAACAGATGACGGAGTTGTAAATACCATTGCCTATGCCAACAGCGCAGACGGTACTGACGGTTTCACGACTGTTTATCCTAATTTGAATCTGTTGAATAATACACGTGATTATGCTGGATGGACATTTTATCGCACAGAAATATTAGAAGCGGATGGAACGCCTACTAAGATTCTTAAAATTAATTACGATCCTAACGCTTGGGCAGGTGGATCTTCACCCAATATCATTCAGTCAGTAAGACCTAAACCTGGCGATACAGTTACTCTTAGTTTCTATGCAAAAGGACATGGTAGGATTTATTCTACTATTGACGGTGTTAATGGAGCAATTAGCGCCGATGCTACTGATGATTGGAAGCTTTACAAGTTGACTGGGGTGGCTACGAAAGAAGTTCATAGTGTCGCTATCTATCTACACAACGTTGACAAGACATCAACAAGCATTTATATTCATTCCGTTAAAATAGAAAAAGGCTCAACCGCCACCCCTTGGATGCCATCAGCTAGCGAAGTAACAATAAATGACTATCCGAAGTATGTGGGGTTTAGTAATAGCATTAAACCAAATAAGAAAAATTCTGATTACAAATGGCTACCAATGGGGTTAGTGTCAATTGATAGGGTTACAGGCTCACTCAAGCCTGCGGTTATAGGTATAGATTGCGCTGAAGCACACCCAGTTGGCTCAGTAGTCACAAATACTTCAAGTTCATCATCAGGATATTCCACAGGCAAATGGGAAAATATCGGTTCAGCAGTAATCGGTTCAACAACAATATATTATTGGAAACGTACTGCATAA Phi4_12 ATGAGTTTAGATAATTTTAGAAATAGAACGATTTTGTGGGATACAGTTAATAAAGATTTCCCCCAACCAATACAAATAATGCAAGGCGATGTCAATGCTAGAACATTATTAATTAAAATAGTTGATAATGGAGCTCAAATTGACTTAACTGGTTATTCGTTAAAACTTACATATCAATACACTAATAGTAGTAATTCTGGTCTTGTTATGATCCCTCCTAAGGACTTAGCTAAGGGAGAATTTATTTTGGTAATTCCTATCGAAATGACAGCGACAGGAGTTATTGAAGCGAACTTAATACTTCTCAATAAAGACAAAGAGCAAGTTATTGTCAGTAAAAGTCTTACATTTATATCAGATGATTCCACAGTTACAAATTTAGCTCAAAAAGTAAATAATAAGATTGATGATTTTACAAAATTATTATTGGAAAATATGCCACAAGTGTTGCGCAGTGAGTTGAATGACTTACATGCTCAAACTGAATCAAACAAGAGCAATGTTGAGCTTAAAGCAAATTTAGCTGATATGACGAGCTTACAAAGCGCAATGACAGAGCTAAAAAATGAAGTAGAAGCATTTGGTATTAGTCCTGAAAATTTAGTTACTATAAAATCGCTATTAGACGCAATCGCAAGTAACGCCAGTGAATCAGAAGTAGCCGAACTAATAAATTCAGTAAAGGCTTTAACAAGTAACGTTTCTCTGATGAGTAATGGAGATTATTCCCCTAAGGCTAATCAAACTGATTTAGAAAGTTTACAGTATGCTGTAAATGACCATTCGGCAACCATTTCAGCAAAGGCTAATCAAACAGACTTAAATAACTTACAAGCTACTGTTGATAAACAAGGTATCTCTATTTCAGAAAAAGCTGAACAATCAGAGTTATCAATCACAAATAAAAATGTCGCAACTGCTCAAGAAACAGCAAAACAAGCTGAAAGTGAAGCCAAAAATGCAATGGCGAAGGCTACCGAAGCACAAGCGAACAGTTTACCACTTAATGGCACCGCGGTAAGTGCACTCAAACTGGCAACACCTAGAAAACTCAGAGTAAATCTTCAATCTTCATCATTTCAATACTTTGACGGGACTGCTGATGCAACTGATATTGGAGTTACGGGTGTACTTCAAATTGCAAATGGAGGCACTTCAACAGATGACGGAGTTGTAAATACCATTGCCTATGCCAACAGCGCAGACGGTACTGACGGTTTCACGACTGTTTATCCTAATTTGAATCTGTTGAATAATACACGTGATTATGCTGGATGGACATTTTATCGCACAGAAATATTAGAAGCGGATGGAACGCCTACTAAGATTCTTAAAATTAATTACGATCCTAACGCTTGGGCAGGTGGATCTTCACCCAATATCATTCAGTCAGTAAGACCTAAACCTGGCGATACAGTTACTCTTAGTTTCTATGCAAAAGGACATGGTAGGATTTATTCTACTATTGACGGTGTTAATGGAGCAATTAGCGCCGATGCTACTGATGATTGGAAGCTTTACAAGTTGACTGGGGTGGCTACGAAAGAAGTTCATAGTGTCGCTATCTATCTACACAACGTTGACAAGACATCAACAAGCATTTATATTCATTCCGTTAAAATAGAAAAAGGCTCAACCGCCACCCCTTGGATGCCATCAGCTAGCGAAGTAACAATAAATGACTATCCGAAGTATGTGGGGTTTAGTAATAGCATTAAACCAAATAAGAAAAATTCTGATTACAAATGGCTACCAATGGGGTTAGTGTCAATTGATAGGGTTACAGGCTCACTCAAGCCTGCGGTTATAGGTATAGATTGCGCTGAAGCACACCCAGTTGGCTCAGTAGTCACAAATACTTCAAGTTCATCATCAGGATATTCCACAGGCAAATGGGAAAATATCGGTTCAGCAGTAATCGGTTCAACAACAATATATTATTGGAAACGTACTGCATAA MSLDNFRNRTILWDTVNKDFPQPIQIMQGDVNARTLLIKIVDNGAQIDLTGYSLKLTYQYTNSSNSGLVMIPPKDLAKGEFILVIPIEMTATGVIEANLILLNKDKEQVIVSKSLTFISDDSTVTNLAQKVNNKIDDFTKLLLENMPQVLRSELNDLHAQTESNKSNVELKANLADMTSLQSAMTELKNEVEAFGISPENLVTIKSLLDAIASNASESEVAELINSVKALTSNVSLMSNGDYSPKANQTDLESLQYAVNDHSATISAKANQTDLNNLQATVDKQGISISEKAEQSELSITNKNVATAQETAKQAESEAKNAMAKATEAQANSLPLNGTAVSALKLATPRKLRVNLQSSSFQYFDGTADATDIGVTGVLQIANGGTSTDDGVVNTIAYANSADGTDGFTTVYPNLNLLNNTRDYAGWTFYRTEILEADGTPTKILKINYDPNAWAGGSSPNIIQSVRPKPGDTVTLSFYAKGHGRIYSTIDGVNGAISADATDDWKLYKLTGVATKEVHSVAIYLHNVDKTSTSIYIHSVKIEKGSTATPWMPSASEVTINDYPKYVGFSNSIKPNKKNSDYKWLPMGLVSIDRVTGSLKPAVIGIDCAEAHPVGSVVTNTSSSSSGYSTGKWENIGSAVIGSTTIYYWKRTA KP793101.1 KP793101 Lactococcus phage 936 group phage Phi4, complete genome linear Lactococcus phage 936 group phage Phi4 ['Viruses' 'Duplodnaviria' 'Heunggongvirae' 'Uroviricota' 'Caudoviricetes' /data/genbank/KP793101_1.gb locus_tag 10926695262399362751 1959 11448547458600305838 1 1959 99.49 PhiA16_11 ALM63101.1 neck passage structure MSLDNFRNRTILWDTVNKDFPQPIQIMQGDVNARTLLIKIVDNGAQIDLTGYSLKLTYQYTNSSNSGLVMIPPKDLAKGEFILVIPTEMTATGVIEANLILLNKDKEQVIVSKSLTFISDNSTVTNLAQKVNNKIDDFTKLLLENMPQVLRSELNDLHAQTESNKSNVELKANLADMTSLQNAMTELKNEVEAFGISPENLVTIKSLLDAIASNASESEVAELINSVKALTSNVSLMSNGDYSPKANQTDLESLQHTVNDHSATISAKANQTDLNNLQATVDKQGISISEKAEQSELSITNKNVATAQETAKQAESEAKNAMAKATEAQANSLPLNGTAVSALKLATPRKLRVNLQSSSFQYFDGTADATDIGVTGVLQIANGGTSTDDGVVNTIAYANSADGTDGFTTVYPNLNLLNNTRDYAGWTFYRTEILEADGTPTKILKINYDPNAWAGGSSPNIIQSVRPKPGDTVTLSFYAKGHGRVYSTIDGVNGAISADATDDWKLYKLTGVATKEVHSVAIYLHNVDKTSTSIYVHSVKIEKGSTATPWMPSASEVTINDYPKYVGFSNSIKPNKKNSDYKWLPMGLVSIDRVTGSLKPAVIGIDCAEAHPVGSVVTNTSSSSSGYSTGKWENIGSAVIGSTTIYYWKRTA 11 1 6869 8828 1 ATGAGTTTAGATAATTTTAGAAATAGAACGATTTTGTGGGATACAGTTAATAAAGATTTCCCCCAACCAATACAAATAATGCAAGGCGATGTCAATGCTAGAACATTATTAATTAAAATAGTTGATAATGGAGCTCAAATTGACTTAACTGGTTATTCGTTAAAACTTACATATCAATACACTAATAGTAGTAATTCTGGTCTTGTTATGATCCCTCCTAAGGACTTAGCTAAGGGAGAATTTATTTTGGTAATTCCTACCGAGATGACAGCGACAGGAGTTATTGAAGCGAACTTAATACTTCTCAATAAAGACAAAGAGCAAGTTATTGTCAGTAAAAGTCTTACATTTATATCAGATAATTCCACAGTTACAAATTTAGCTCAAAAAGTAAATAATAAGATTGATGATTTTACAAAATTATTATTGGAAAATATGCCACAAGTGTTGCGCAGTGAGTTGAATGACTTACATGCTCAAACTGAATCAAACAAGAGCAATGTTGAGCTTAAAGCAAATTTAGCTGATATGACGAGCTTACAAAACGCAATGACAGAGCTAAAAAATGAAGTAGAAGCATTTGGTATTAGTCCTGAAAATTTAGTTACTATAAAATCGCTATTAGACGCAATCGCAAGTAACGCCAGTGAATCAGAAGTAGCCGAACTAATAAATTCAGTAAAGGCTTTAACAAGTAACGTTTCTCTGATGAGTAATGGAGATTATTCCCCTAAGGCTAATCAAACAGATTTAGAAAGTTTACAGCATACTGTAAATGACCATTCGGCAACCATTTCAGCAAAGGCTAATCAAACAGACTTAAATAACTTACAAGCTACTGTTGATAAACAAGGTATCTCTATTTCAGAAAAAGCTGAACAATCAGAGTTATCAATCACAAATAAAAATGTCGCAACTGCTCAAGAAACAGCAAAACAAGCTGAAAGTGAAGCCAAAAATGCAATGGCGAAGGCTACCGAAGCACAAGCGAACAGTTTACCACTTAATGGCACCGCGGTAAGTGCACTCAAACTGGCAACACCTAGAAAACTCAGAGTAAATCTTCAATCTTCATCATTTCAATACTTTGACGGGACTGCTGATGCAACTGATATTGGAGTTACGGGTGTACTTCAAATTGCAAATGGAGGCACTTCAACAGATGACGGAGTTGTAAATACCATTGCCTATGCCAATAGCGCAGACGGTACTGACGGTTTCACGACTGTTTATCCTAATTTGAATCTGTTGAATAATACACGTGATTATGCTGGATGGACATTTTATCGCACAGAAATATTAGAAGCGGATGGAACGCCTACTAAGATTCTTAAAATTAATTACGATCCTAACGCTTGGGCAGGTGGATCTTCACCCAATATCATTCAGTCAGTAAGACCTAAACCTGGCGATACAGTTACTCTTAGTTTCTATGCAAAAGGACATGGTAGGGTTTATTCTACTATTGACGGTGTTAATGGAGCAATTAGCGCCGATGCTACTGATGATTGGAAGCTTTACAAGTTGACTGGGGTGGCTACGAAAGAAGTTCATAGTGTCGCTATCTATCTACACAACGTTGACAAGACATCAACAAGCATTTATGTTCATTCCGTTAAAATAGAAAAAGGCTCAACCGCCACCCCTTGGATGCCATCAGCTAGCGAAGTAACAATAAATGACTATCCGAAGTATGTGGGGTTTAGTAATAGCATTAAACCAAATAAGAAAAATTCTGATTACAAATGGCTACCAATGGGGTTAGTGTCAATTGATAGGGTTACAGGCTCACTCAAGCCTGCGGTTATAGGTATAGATTGCGCTGAAGCACACCCAGTTGGCTCAGTAGTCACAAATACTTCAAGTTCATCATCAGGATATTCCACAGGCAAATGGGAAAATATCGGTTCAGCAGTAATCGGTTCAACAACAATATATTATTGGAAACGTACTGCATAA PhiA16_11 ATGAGTTTAGATAATTTTAGAAATAGAACGATTTTGTGGGATACAGTTAATAAAGATTTCCCCCAACCAATACAAATAATGCAAGGCGATGTCAATGCTAGAACATTATTAATTAAAATAGTTGATAATGGAGCTCAAATTGACTTAACTGGTTATTCGTTAAAACTTACATATCAATACACTAATAGTAGTAATTCTGGTCTTGTTATGATCCCTCCTAAGGACTTAGCTAAGGGAGAATTTATTTTGGTAATTCCTACCGAGATGACAGCGACAGGAGTTATTGAAGCGAACTTAATACTTCTCAATAAAGACAAAGAGCAAGTTATTGTCAGTAAAAGTCTTACATTTATATCAGATAATTCCACAGTTACAAATTTAGCTCAAAAAGTAAATAATAAGATTGATGATTTTACAAAATTATTATTGGAAAATATGCCACAAGTGTTGCGCAGTGAGTTGAATGACTTACATGCTCAAACTGAATCAAACAAGAGCAATGTTGAGCTTAAAGCAAATTTAGCTGATATGACGAGCTTACAAAACGCAATGACAGAGCTAAAAAATGAAGTAGAAGCATTTGGTATTAGTCCTGAAAATTTAGTTACTATAAAATCGCTATTAGACGCAATCGCAAGTAACGCCAGTGAATCAGAAGTAGCCGAACTAATAAATTCAGTAAAGGCTTTAACAAGTAACGTTTCTCTGATGAGTAATGGAGATTATTCCCCTAAGGCTAATCAAACAGATTTAGAAAGTTTACAGCATACTGTAAATGACCATTCGGCAACCATTTCAGCAAAGGCTAATCAAACAGACTTAAATAACTTACAAGCTACTGTTGATAAACAAGGTATCTCTATTTCAGAAAAAGCTGAACAATCAGAGTTATCAATCACAAATAAAAATGTCGCAACTGCTCAAGAAACAGCAAAACAAGCTGAAAGTGAAGCCAAAAATGCAATGGCGAAGGCTACCGAAGCACAAGCGAACAGTTTACCACTTAATGGCACCGCGGTAAGTGCACTCAAACTGGCAACACCTAGAAAACTCAGAGTAAATCTTCAATCTTCATCATTTCAATACTTTGACGGGACTGCTGATGCAACTGATATTGGAGTTACGGGTGTACTTCAAATTGCAAATGGAGGCACTTCAACAGATGACGGAGTTGTAAATACCATTGCCTATGCCAATAGCGCAGACGGTACTGACGGTTTCACGACTGTTTATCCTAATTTGAATCTGTTGAATAATACACGTGATTATGCTGGATGGACATTTTATCGCACAGAAATATTAGAAGCGGATGGAACGCCTACTAAGATTCTTAAAATTAATTACGATCCTAACGCTTGGGCAGGTGGATCTTCACCCAATATCATTCAGTCAGTAAGACCTAAACCTGGCGATACAGTTACTCTTAGTTTCTATGCAAAAGGACATGGTAGGGTTTATTCTACTATTGACGGTGTTAATGGAGCAATTAGCGCCGATGCTACTGATGATTGGAAGCTTTACAAGTTGACTGGGGTGGCTACGAAAGAAGTTCATAGTGTCGCTATCTATCTACACAACGTTGACAAGACATCAACAAGCATTTATGTTCATTCCGTTAAAATAGAAAAAGGCTCAACCGCCACCCCTTGGATGCCATCAGCTAGCGAAGTAACAATAAATGACTATCCGAAGTATGTGGGGTTTAGTAATAGCATTAAACCAAATAAGAAAAATTCTGATTACAAATGGCTACCAATGGGGTTAGTGTCAATTGATAGGGTTACAGGCTCACTCAAGCCTGCGGTTATAGGTATAGATTGCGCTGAAGCACACCCAGTTGGCTCAGTAGTCACAAATACTTCAAGTTCATCATCAGGATATTCCACAGGCAAATGGGAAAATATCGGTTCAGCAGTAATCGGTTCAACAACAATATATTATTGGAAACGTACTGCATAA MSLDNFRNRTILWDTVNKDFPQPIQIMQGDVNARTLLIKIVDNGAQIDLTGYSLKLTYQYTNSSNSGLVMIPPKDLAKGEFILVIPTEMTATGVIEANLILLNKDKEQVIVSKSLTFISDNSTVTNLAQKVNNKIDDFTKLLLENMPQVLRSELNDLHAQTESNKSNVELKANLADMTSLQNAMTELKNEVEAFGISPENLVTIKSLLDAIASNASESEVAELINSVKALTSNVSLMSNGDYSPKANQTDLESLQHTVNDHSATISAKANQTDLNNLQATVDKQGISISEKAEQSELSITNKNVATAQETAKQAESEAKNAMAKATEAQANSLPLNGTAVSALKLATPRKLRVNLQSSSFQYFDGTADATDIGVTGVLQIANGGTSTDDGVVNTIAYANSADGTDGFTTVYPNLNLLNNTRDYAGWTFYRTEILEADGTPTKILKINYDPNAWAGGSSPNIIQSVRPKPGDTVTLSFYAKGHGRVYSTIDGVNGAISADATDDWKLYKLTGVATKEVHSVAIYLHNVDKTSTSIYVHSVKIEKGSTATPWMPSASEVTINDYPKYVGFSNSIKPNKKNSDYKWLPMGLVSIDRVTGSLKPAVIGIDCAEAHPVGSVVTNTSSSSSGYSTGKWENIGSAVIGSTTIYYWKRTA KP793102.1 KP793102 Lactococcus phage 936 group phage PhiA.16, complete genome linear Lactococcus phage 936 group phage PhiA.16 ['Viruses' 'Duplodnaviria' 'Heunggongvirae' 'Uroviricota' 'Caudoviricetes' /data/genbank/KP793102_1.gb locus_tag
5 Phi4_13 ALM63052.1 major tail protein MKLDYNSREIFFGNEALIVADMAKGSNGKPEFTNHKIVTGLVSVGEMEDQAETNSYPADDVPDHGVKKGATLLQGEMVFIQTDQALKEDILGQQRTANGLGWSPTGNWKTKCVQYLIKGRKRDKVTGEFIDGYRVVVYPSLRPTAEATKESETDSVDGVDPIQWTLAVQATESDIYLNGDKKVPAIEYEIWGEQAKSFAKKMEAGLFIMQPDTVLADTFTLVPPVIPNTITAKHGGNDGAIIVPTTLKDSNGETVKVTSVIKDAHGKEATNGKLAPGIYLVTSSADGYKDVTSGFTVTDHS 11 1 9347 10253 1 ATGAAATTAGATTATAACTCACGTGAGATTTTCTTTGGTAATGAAGCTCTAATCGTAGCTGATATGGCTAAGGGAAGCAACGGAAAACCAGAGTTCACTAACCATAAAATCGTAACTGGTTTAGTATCAGTTGGCGAAATGGAAGACCAAGCGGAGACAAACAGCTATCCTGCTGATGACGTGCCAGACCATGGAGTAAAAAAAGGTGCTACCTTGCTTCAAGGCGAAATGGTATTTATTCAAACAGACCAAGCACTTAAAGAAGATATTTTAGGTCAACAAAGAACAGCGAATGGTTTGGGTTGGTCTCCTACTGGTAATTGGAAAACGAAATGTGTTCAGTACCTTATTAAAGGTCGCAAGCGTGATAAAGTTACAGGAGAATTTATTGACGGTTACCGTGTAGTCGTTTATCCAAGTTTGAGACCAACAGCAGAAGCTACAAAAGAATCAGAAACAGATTCAGTAGACGGTGTAGACCCTATTCAATGGACTCTGGCAGTACAAGCGACTGAATCAGATATTTATTTGAATGGCGATAAAAAAGTTCCTGCTATTGAGTATGAAATTTGGGGAGAACAAGCTAAAAGCTTTGCTAAGAAAATGGAAGCAGGCTTATTCATCATGCAACCTGACACAGTTCTAGCTGACACATTTACACTTGTACCTCCTGTTATTCCTAATACGATTACTGCTAAACATGGAGGAAATGACGGAGCAATCATAGTACCTACCACTTTGAAAGACTCTAATGGTGAAACTGTAAAAGTAACATCAGTGATTAAGGACGCACATGGAAAAGAAGCAACAAATGGGAAACTTGCGCCCGGTATCTATCTCGTAACGTCCTCCGCTGACGGTTATAAAGATGTTACCTCAGGGTTTACAGTAACTGACCATTCATAA Phi4_13 ATGAAATTAGATTATAACTCACGTGAGATTTTCTTTGGTAATGAAGCTCTAATCGTAGCTGATATGGCTAAGGGAAGCAACGGAAAACCAGAGTTCACTAACCATAAAATCGTAACTGGTTTAGTATCAGTTGGCGAAATGGAAGACCAAGCGGAGACAAACAGCTATCCTGCTGATGACGTGCCAGACCATGGAGTAAAAAAAGGTGCTACCTTGCTTCAAGGCGAAATGGTATTTATTCAAACAGACCAAGCACTTAAAGAAGATATTTTAGGTCAACAAAGAACAGCGAATGGTTTGGGTTGGTCTCCTACTGGTAATTGGAAAACGAAATGTGTTCAGTACCTTATTAAAGGTCGCAAGCGTGATAAAGTTACAGGAGAATTTATTGACGGTTACCGTGTAGTCGTTTATCCAAGTTTGAGACCAACAGCAGAAGCTACAAAAGAATCAGAAACAGATTCAGTAGACGGTGTAGACCCTATTCAATGGACTCTGGCAGTACAAGCGACTGAATCAGATATTTATTTGAATGGCGATAAAAAAGTTCCTGCTATTGAGTATGAAATTTGGGGAGAACAAGCTAAAAGCTTTGCTAAGAAAATGGAAGCAGGCTTATTCATCATGCAACCTGACACAGTTCTAGCTGACACATTTACACTTGTACCTCCTGTTATTCCTAATACGATTACTGCTAAACATGGAGGAAATGACGGAGCAATCATAGTACCTACCACTTTGAAAGACTCTAATGGTGAAACTGTAAAAGTAACATCAGTGATTAAGGACGCACATGGAAAAGAAGCAACAAATGGGAAACTTGCGCCCGGTATCTATCTCGTAACGTCCTCCGCTGACGGTTATAAAGATGTTACCTCAGGGTTTACAGTAACTGACCATTCATAA MKLDYNSREIFFGNEALIVADMAKGSNGKPEFTNHKIVTGLVSVGEMEDQAETNSYPADDVPDHGVKKGATLLQGEMVFIQTDQALKEDILGQQRTANGLGWSPTGNWKTKCVQYLIKGRKRDKVTGEFIDGYRVVVYPSLRPTAEATKESETDSVDGVDPIQWTLAVQATESDIYLNGDKKVPAIEYEIWGEQAKSFAKKMEAGLFIMQPDTVLADTFTLVPPVIPNTITAKHGGNDGAIIVPTTLKDSNGETVKVTSVIKDAHGKEATNGKLAPGIYLVTSSADGYKDVTSGFTVTDHS KP793101.1 KP793101 Lactococcus phage 936 group phage Phi4, complete genome linear Lactococcus phage 936 group phage Phi4 ['Viruses' 'Duplodnaviria' 'Heunggongvirae' 'Uroviricota' 'Caudoviricetes' /data/genbank/KP793101_1.gb locus_tag 3871623470077828020 906 4088581426845717968 1 906 94.046 PhiA16_12 ALM63102.1 major tail protein MKLDYNSREIFFGNEALVVADMTKGSNGKPEFTNHKIVTGLVSVGEMEDQAETNSYPADDVPDHGVKKGATLLQGEMVFIQTDQALKEDILGQQRTANGLGWSPTGNWKAKCVQYLIKGRKRDKVTGEFIDGYRVVVYPNLRPTAEATKESETDSVDGVDPIQWTLKVQATESDIYLNGDKKVPAIEYEIWGEQAKDFAKKMEAGLFIMQPDTELAGAVTLVAPVIPNVTTATKGNNDGTIVVPATLKDSKGGTVKVTSVIRDAHGKEATNGKLAPGVYLVTSSADGYKDVTSGFTVTDHS 11 1 8851 9757 1 ATGAAATTAGATTATAACTCACGTGAGATTTTCTTTGGTAATGAAGCTCTAGTCGTAGCTGATATGACTAAGGGAAGCAACGGAAAACCAGAGTTCACTAACCATAAAATCGTAACTGGTTTAGTATCAGTTGGCGAAATGGAAGACCAAGCGGAGACAAACAGCTATCCTGCTGATGACGTGCCAGACCATGGAGTAAAAAAAGGTGCTACCTTGCTTCAAGGCGAAATGGTATTTATTCAAACAGACCAAGCACTTAAAGAAGATATTTTAGGTCAACAAAGAACAGCGAATGGCTTGGGTTGGTCTCCTACTGGTAATTGGAAAGCGAAATGTGTTCAGTACCTTATTAAAGGTCGCAAGCGTGATAAAGTTACAGGAGAATTTATTGACGGTTACCGTGTAGTCGTTTATCCAAATTTGAGACCAACAGCAGAAGCAACAAAAGAATCAGAAACAGATTCAGTAGACGGCGTAGACCCTATCCAATGGACTTTGAAAGTACAAGCGACCGAGTCAGATATTTATTTGAATGGCGATAAAAAAGTACCTGCTATTGAATATGAAATTTGGGGAGAACAAGCAAAAGACTTCGCTAAGAAAATGGAAGCAGGCTTATTCATCATGCAACCTGACACGGAATTGGCTGGTGCTGTTACATTAGTTGCTCCAGTTATTCCTAATGTAACTACTGCTACAAAGGGTAATAATGACGGAACAATCGTAGTGCCTGCCACTTTGAAAGACTCTAAGGGTGGAACTGTAAAAGTAACATCAGTAATTAGAGACGCACATGGAAAAGAAGCAACAAATGGGAAACTTGCGCCCGGTGTCTATCTCGTAACGTCCTCCGCTGACGGTTATAAAGATGTTACCTCAGGGTTTACAGTAACTGACCATTCATAA PhiA16_12 ATGAAATTAGATTATAACTCACGTGAGATTTTCTTTGGTAATGAAGCTCTAGTCGTAGCTGATATGACTAAGGGAAGCAACGGAAAACCAGAGTTCACTAACCATAAAATCGTAACTGGTTTAGTATCAGTTGGCGAAATGGAAGACCAAGCGGAGACAAACAGCTATCCTGCTGATGACGTGCCAGACCATGGAGTAAAAAAAGGTGCTACCTTGCTTCAAGGCGAAATGGTATTTATTCAAACAGACCAAGCACTTAAAGAAGATATTTTAGGTCAACAAAGAACAGCGAATGGCTTGGGTTGGTCTCCTACTGGTAATTGGAAAGCGAAATGTGTTCAGTACCTTATTAAAGGTCGCAAGCGTGATAAAGTTACAGGAGAATTTATTGACGGTTACCGTGTAGTCGTTTATCCAAATTTGAGACCAACAGCAGAAGCAACAAAAGAATCAGAAACAGATTCAGTAGACGGCGTAGACCCTATCCAATGGACTTTGAAAGTACAAGCGACCGAGTCAGATATTTATTTGAATGGCGATAAAAAAGTACCTGCTATTGAATATGAAATTTGGGGAGAACAAGCAAAAGACTTCGCTAAGAAAATGGAAGCAGGCTTATTCATCATGCAACCTGACACGGAATTGGCTGGTGCTGTTACATTAGTTGCTCCAGTTATTCCTAATGTAACTACTGCTACAAAGGGTAATAATGACGGAACAATCGTAGTGCCTGCCACTTTGAAAGACTCTAAGGGTGGAACTGTAAAAGTAACATCAGTAATTAGAGACGCACATGGAAAAGAAGCAACAAATGGGAAACTTGCGCCCGGTGTCTATCTCGTAACGTCCTCCGCTGACGGTTATAAAGATGTTACCTCAGGGTTTACAGTAACTGACCATTCATAA MKLDYNSREIFFGNEALVVADMTKGSNGKPEFTNHKIVTGLVSVGEMEDQAETNSYPADDVPDHGVKKGATLLQGEMVFIQTDQALKEDILGQQRTANGLGWSPTGNWKAKCVQYLIKGRKRDKVTGEFIDGYRVVVYPNLRPTAEATKESETDSVDGVDPIQWTLKVQATESDIYLNGDKKVPAIEYEIWGEQAKDFAKKMEAGLFIMQPDTELAGAVTLVAPVIPNVTTATKGNNDGTIVVPATLKDSKGGTVKVTSVIRDAHGKEATNGKLAPGVYLVTSSADGYKDVTSGFTVTDHS KP793102.1 KP793102 Lactococcus phage 936 group phage PhiA.16, complete genome linear Lactococcus phage 936 group phage PhiA.16 ['Viruses' 'Duplodnaviria' 'Heunggongvirae' 'Uroviricota' 'Caudoviricetes' /data/genbank/KP793102_1.gb locus_tag

The generated gene_uniqueness.parquet file is used to generate the downstream graphic.
This file can be read and manipulated with any DataFrame API the user choose, such as Pandas, Apache Spark, Polars, DuckDB but also in a non-programmatic manner using softwares such as Tad.

Note

The pipeline to generate the blastp data follow the same logic and the same type of data is generated, except that it is based on the protein sequences.
The unified dataset generated through the blastp pipeline is saved under protein_uniqueness.parquet.

Step 4: Generate the plot

Go to Dagster_home -> Jobs -> make_plot

Overview - Plot Job
Overview of the plot job.
Info

If only the blastn or the blastp pipeline has been run, a warning message will pop-up. Select confirm to run the job. This will not impair the smooth run of step 4.

Warning message - Plot Job
Warning message - Plot Job.

Tip
  • By default, the graph will be plotted based on the DataFrame generated through the blastn pipeline. To plot the data obtained for the blastp, the value for graph-type needs to be changed to blastp.
  • To see the complete configuration for this step, go to the Configuration.
  • To access the configuration window, open the dropdown menu (white arrow on the right of the black box located on the up right corner, labelled Materialize all) and select Open launchpad.
Launchpad - Plot Job
Launchpad - Plot Job.
create_graph
ops:
    create_genome:
        config:
        sequence_file: sequences.csv
    create_graph:
        config:
        colours:
            - "#fde725"
            - "#90d743"
            - "#35b779"
            - "#21918c"
            - "#31688e"
            - "#443983"
            - "#440154"
        gradient: "#B22222"
        graph_fragments: 1
        graph_pagesize: A4
        graph_shape: linear
        graph_start: 0
        graph_type: blastn
        output_format: SVG
        title: synteny_based_on_blastn
ops:
    create_genome:
        config:
        sequence_file: sequences.csv
    create_graph:
        config:
        colours:
            - "#fde725"
            - "#90d743"
            - "#35b779"
            - "#21918c"
            - "#31688e"
            - "#443983"
            - "#440154"
        gradient: "#B22222"
        graph_fragments: 1
        graph_pagesize: A4
        graph_shape: linear
        graph_start: 0
        graph_type: blastp
        output_format: SVG
        title: synteny_based_on_blastp

Select the Materialize botton to run the job.

Completed job - Plot Job
Job after completion.

Metadata are also available for the plot, including a preview of the graph.

Sequences - Plot Job
Sequences to be plotted.
Diagram preview - Plot Job
Preview of the synteny diagram.

Analysing the data

The output plot(s) allows to quickly visualise conserved and unique genes among our 35 Lactococcus 936-type phage sequences.

Synteny diagram

Synteny diagram based on blastn data

Synteny diagram based on blastp data

In addition, the generated parquet files gene_uniqueness.parquet and protein_uniqueness.parquet, respectively as output of the blastn and the blastp, allows to query for particular gene(s) or sequence(s) of interest.

unique genes and proteins

For example, we would like to query all the unique genes presents in KP793123 sequence (using polars):

Query:
pl.read_parquet('gene_uniqueness.parquet').select('query_name', 'query_locus_tag', 'query_gene', 'query_protein_id', 'percentage_of_identity', 'source_name', 'source_locus_tag', 'source_gene', 'source_protein_id').filter((pl.col('query_name')=='KP793123') & (pl.col('source_name').is_null())).sort('query_locus_tag')
Result:

query_name query_locus_tag query_gene query_protein_id percentage_of_identity source_name source_locus_tag source_gene source_protein_id
KP793123 Phi42_18 ALM64133.1 nan
KP793123 Phi42_19 ALM64134.1 nan
KP793123 Phi42_20 ALM64135.1 nan
KP793123 Phi42_25 ALM64140.1 nan
KP793123 Phi42_26 ALM64141.1 nan

According to the synteny diagram, only four genomes KP793108, KP793119, KP793131 and KP793132 have unique proteins. We can retrieve the protein ids from the Dataframe (using polars):

Query:
pl.read_parquet('protein_uniqueness.parquet').select('query_name', 'query_locus_tag', 'query_gene', 'query_protein_id', 'percentage_of_identity', 'source_name', 'source_locus_tag', 'source_gene', 'source_protein_id').filter(pl.col('source_name').is_null()).sort('query_name', 'query_locus_tag')
Result:

query_name query_locus_tag query_gene query_protein_id percentage_of_identity source_name source_locus_tag source_gene source_protein_id
KP793108 Phi512_34 ALM63433.1 nan
KP793108 Phi512_45 ALM63444.1 nan
KP793119 Phi105_42 ALM64941.1 nan
KP793119 Phi105_44 ALM64943.1 nan
KP793131 PhiE1127_54 ALM64620.1 nan
KP793132 PhiM1127_03 ALM64629.1 nan

Gene(s), locus_tag(s) and protein_id(s) of interest can also be directly queried (using polars): example of locus_tag Phi19_46

pl.read_parquet('gene_uniqueness.parquet').select('query_name', 'query_locus_tag', 'query_gene', 'query_protein_id', 'percentage_of_identity', 'source_name', 'source_locus_tag', 'source_gene', 'source_protein_id', 'source_function', 'source_product').filter(pl.col('query_name')=='KP793103').filter(pl.col('query_locus_tag')=='Phi19_46').sort('source_name')
Result:

query_name query_locus_tag query_gene query_protein_id percentage_of_identity source_name source_locus_tag source_gene source_protein_id source_function source_product
KP793103 Phi19_46 ALM63185.1 100 KP793101 Phi4_47 ALM63086.1 hypothetical protein
KP793103 Phi19_46 ALM63185.1 92.063 KP793104 PhiB1127_49 ALM63238.1 hypothetical protein
KP793103 Phi19_46 ALM63185.1 94.18 KP793105 Phi193_48 ALM63290.1 hypothetical protein
KP793103 Phi19_46 ALM63185.1 98.413 KP793106 PhiA1127_48 ALM63342.1 hypothetical protein
KP793103 Phi19_46 ALM63185.1 91.053 KP793108 Phi512_49 ALM63448.1 hypothetical protein
KP793103 Phi19_46 ALM63185.1 96.045 KP793109 PhiC0139_49 ALM63501.1 hypothetical protein
KP793103 Phi19_46 ALM63185.1 94.18 KP793111 Phi192_51 ALM64895.1 hypothetical protein
KP793103 Phi19_46 ALM63185.1 93.122 KP793113 PhiF17_52 ALM63665.1 hypothetical protein
KP793103 Phi19_46 ALM63185.1 89.474 KP793114 Phi17_52 ALM63721.1 hypothetical protein
KP793103 Phi19_46 ALM63185.1 94.18 KP793115 Phi114_49 ALM63774.1 hypothetical protein
KP793103 Phi19_46 ALM63185.1 94.18 KP793116 Phi1316_50 ALM63828.1 hypothetical protein
KP793103 Phi19_46 ALM63185.1 93.478 KP793117 PhiG_53 ALM63885.1 hypothetical protein
KP793103 Phi19_46 ALM63185.1 92.593 KP793118 PhiF0139_52 ALM63942.1 hypothetical protein
KP793103 Phi19_46 ALM63185.1 95.238 KP793119 Phi105_59 ALM64958.1 hypothetical protein
KP793103 Phi19_46 ALM63185.1 90.526 KP793120 PhiL18_51 ALM63998.1 hypothetical protein
KP793103 Phi19_46 ALM63185.1 92.593 KP793121 Phi109_50 ALM64052.1 hypothetical protein
KP793103 Phi19_46 ALM63185.1 91.579 KP793122 PhiL6_54 ALM64110.1 hypothetical protein
KP793103 Phi19_46 ALM63185.1 94.709 KP793123 Phi42_54 ALM64169.1 hypothetical protein
KP793103 Phi19_46 ALM63185.1 94.18 KP793124 Phi44_48 ALM64222.1 hypothetical protein
KP793103 Phi19_46 ALM63185.1 89.474 KP793125 Phi91127_52 ALM64278.1 hypothetical protein
KP793103 Phi19_46 ALM63185.1 89.474 KP793126 PhiM5_53 ALM64337.1 hypothetical protein
KP793103 Phi19_46 ALM63185.1 90 KP793127 Phi40_52 ALM64394.1 hypothetical protein
KP793103 Phi19_46 ALM63185.1 90.526 KP793128 PhiM16_50 ALM64449.1 hypothetical protein
KP793103 Phi19_46 ALM63185.1 93.478 KP793129 PhiJF1_53 ALM64506.1 hypothetical protein
KP793103 Phi19_46 ALM63185.1 90 KP793130 Phi155_50 ALM64561.1 hypothetical protein
KP793103 Phi19_46 ALM63185.1 89.474 KP793131 PhiE1127_55 ALM64621.1 hypothetical protein
KP793103 Phi19_46 ALM63185.1 90.526 KP793132 PhiM1127_55 ALM64681.1 hypothetical protein
KP793103 Phi19_46 ALM63185.1 89.474 KP793135 Phi16_52 ALM64839.1 hypothetical protein