October 7, 2017   |   Use Case

Unraveling the structure and function of your bacterial genome

A great deal of valuable genetic information is embedded in microbial genomes, such as rRNA, tRNA and protein-coding genes.

Predicting these structural genes and adding pertinent information to their sequences, a process generally known as genome annotation, is an important step in dissecting the biological functions of a microbial genome. Tools are available to predict each type of structural gene, and BLAST sequence similarity searches can then be used to assign functions to the predicted genes.

However, the process involves many steps, requires different tools, and can be resource intensive, especially for larger genomes and larger numbers of samples. Moreover, prior knowledge of software configuration, command-line execution and Linux/UNIX-like operating systems is needed to run all the analyses. This makes the process daunting for beginners in genomics and bioinformatics. Arkgene aims to help users bypass these obstacles by offering an automated and optimized microbial annotation pipeline, bridging the gap between the initial genome sequence and the final annotation outputs.

As an example, a food biotechnologist at a local university has isolated a new species of Lactobacillus from a food sample. He/she would like to know the genomic characteristics of this new species, which might provide clues about the hydrolytic enzymes it expresses. This class of enzymes drives the biotransformation of milk into cheese and promotes cheese ripening/maturation. He/she has the DNA sequencing data and a genome assembly for this new species. However, because his/her expertise lies in a different domain, he/she is not equipped to perform the bioinformatics analyses needed to gain insight into the functional genomics of this new species.

With Arkgene, he/she was able to upload the genome assembly files to the cloud storage and run the automated genome annotation pipeline. Annotating the ~3 Mb Lactobacillus genome took only ~2 minutes and generated a set of annotation outputs covering tRNAs, rRNAs and functional genes.

Examples of annotation output files:

  • filename.tRNA.out – a list of predicted tRNAs with their tRNA type, anticodon, start and end nucleotide positions and Cove score
  • filename.rRNA.gff2 – a list of predicted rRNAs (23S, 16S and 5S) with sequence start and end, score, strand and frame
  • filename.genemodel.gff – a list of CDS with sequence start and end, score, strand and frame
  • filename.annot.txt – a list of functional annotations obtained by searching the predicted proteins against the microbial sequences in the NCBI SwissProt database using DIAMOND blastp
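
To give a feel for how a gene model file like the one listed above can be inspected, here is a minimal Python sketch. It assumes the file follows the standard tab-separated GFF column order (sequence ID, source, feature type, start, end, score, strand, frame, attributes); the exact layout of Arkgene's output is not documented here, and the sample records and the helper names `parse_gff` and `cds_lengths` are made up for illustration.

```python
from typing import Dict, List, NamedTuple


class Feature(NamedTuple):
    seqid: str
    type: str
    start: int
    end: int
    score: str
    strand: str
    frame: str


# Made-up records in standard GFF column order (assumption: the real
# filename.genemodel.gff uses the same tab-separated layout).
SAMPLE_GFF = "\n".join([
    "##gff-version 3",
    "\t".join(["contig_1", "example", "gene", "101", "400", ".", "+", ".", "ID=g1"]),
    "\t".join(["contig_1", "example", "CDS", "101", "400", ".", "+", "0", "ID=gene_0001"]),
    "\t".join(["contig_1", "example", "CDS", "450", "1049", ".", "-", "0", "ID=gene_0002"]),
])


def parse_gff(text: str) -> List[Feature]:
    """Parse GFF text, skipping comments, blank lines and short lines."""
    features = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) < 8:
            continue  # not a complete feature line
        features.append(
            Feature(cols[0], cols[2], int(cols[3]), int(cols[4]),
                    cols[5], cols[6], cols[7])
        )
    return features


def cds_lengths(features: List[Feature]) -> Dict[str, int]:
    """Map each CDS location to its length in nucleotides (inclusive coordinates)."""
    return {
        f"{f.seqid}:{f.start}-{f.end}({f.strand})": f.end - f.start + 1
        for f in features
        if f.type == "CDS"
    }


if __name__ == "__main__":
    for location, length in cds_lengths(parse_gff(SAMPLE_GFF)).items():
        print(location, length)
```

For a real file, the text passed to `parse_gff` would simply be the contents of the downloaded output, for example `open("filename.genemodel.gff").read()`.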

By looking through the list of protein-coding genes provided, he/she was able to identify candidate hydrolytic enzymes involved in proteolysis and lipolysis during milk-to-cheese conversion. These enzymes could be promising targets for metabolic engineering of bacteria and for industrial-scale cheese production.
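
A search like this can also be scripted. The sketch below is only an illustration: it assumes the annotation table is tab-separated with the gene ID in the first column and a protein description in the last column, which may not match the actual layout of filename.annot.txt. The keyword list, the sample rows and the function name `find_candidates` are likewise hypothetical.

```python
from typing import List, Sequence, Tuple

# Hypothetical keywords for enzymes involved in proteolysis and lipolysis.
HYDROLASE_KEYWORDS = ("protease", "peptidase", "lipase", "esterase")

# Made-up rows (assumption: gene ID first, description last, tab-separated).
SAMPLE_ANNOT = "\n".join([
    "gene_0001\tXaa-Pro dipeptidase",
    "gene_0002\tDNA gyrase subunit A",
    "gene_0003\tTriacylglycerol lipase",
])


def find_candidates(
    annotation_text: str,
    keywords: Sequence[str] = HYDROLASE_KEYWORDS,
) -> List[Tuple[str, str]]:
    """Return (gene ID, description) pairs whose description mentions a keyword."""
    hits = []
    for line in annotation_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 2:
            continue  # no description column to search
        gene_id, description = fields[0], fields[-1]
        lowered = description.lower()
        if any(keyword in lowered for keyword in keywords):
            hits.append((gene_id, description))
    return hits


if __name__ == "__main__":
    for gene_id, description in find_candidates(SAMPLE_ANNOT):
        print(gene_id, description)
```

Keyword matching of this kind only produces a shortlist; each hit still needs to be checked against its alignment statistics and the literature before it is treated as a genuine candidate.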

With Arkgene, the gap between raw data and the discovery of important genes can be bridged.