October 13, 2017   |   Use Case

Make your own custom BLAST database

What is the best way to search for a specific group of genes in a newly sequenced bacterial genome?

– Create your own customised database in the cloud and run a local BLAST search against it.

An easy way to speed up gene identification is to run a local BLAST search against a smaller database targeted to the sequences of interest. The targeted data can be a subset of a parent database (e.g. the SwissProt protein database at NCBI), integrated from publicly available databases (e.g. a bacterial virulence factor database), or taken from an in-house database of manually curated data.

For example, a researcher who studies marine bacteria that degrade sulfated polysaccharides may have curated an in-house database of polysaccharide degradation genes (e.g. alginate lyases, agarases and carrageenases), manually mined from different sources. With a local BLAST, they can quickly run a sequence similarity search of each newly sequenced bacterial genome against the curated database, enabling rapid discovery of putative polysaccharide degradation genes across different classes of marine bacteria.

The traditional way of setting up custom BLAST databases and running local BLAST searches against them requires installing software, executing commands on the command line (e.g. makeblastdb -in input.fasta -dbtype nucl) and setting configuration options (e.g. -parse_seqids, -dbtype, -outfmt). This kind of working environment is a barrier for researchers who are not experienced in bioinformatics or Linux-based operating systems.
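For readers curious what those command-line steps look like, here is a minimal Python sketch that assembles the two NCBI BLAST+ commands involved: makeblastdb to index a FASTA file, then a blastn, blastp or blastx search against the result. The helper names, file names and default E-value are hypothetical choices for illustration; the flags (-in, -dbtype, -out, -parse_seqids, -query, -db, -evalue, -outfmt) are standard BLAST+ options. Actually executing the commands assumes BLAST+ is installed and on your PATH.

```python
import shlex
import subprocess
from typing import List


def makeblastdb_cmd(fasta: str, dbtype: str, out: str,
                    parse_seqids: bool = False) -> List[str]:
    """Build a makeblastdb command that indexes a FASTA file as a BLAST database."""
    if dbtype not in ("nucl", "prot"):
        raise ValueError("dbtype must be 'nucl' or 'prot'")
    cmd = ["makeblastdb", "-in", fasta, "-dbtype", dbtype, "-out", out]
    if parse_seqids:
        # Parse FASTA sequence IDs so individual entries can later be retrieved by ID.
        cmd.append("-parse_seqids")
    return cmd


def blast_cmd(program: str, query: str, db: str, out: str,
              evalue: float = 1e-5, outfmt: int = 6) -> List[str]:
    """Build a blastn/blastp/blastx search command against a local database."""
    if program not in ("blastn", "blastp", "blastx"):
        raise ValueError(f"unsupported program: {program}")
    return [program, "-query", query, "-db", db, "-out", out,
            "-evalue", str(evalue), "-outfmt", str(outfmt)]


def run(cmd: List[str]) -> None:
    """Execute a command; requires NCBI BLAST+ to be installed and on PATH."""
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    db = makeblastdb_cmd("polysaccharide_genes.fasta", "prot", "polysac_db")
    search = blast_cmd("blastx", "new_genome.fasta", "polysac_db", "hits.tsv")
    print(shlex.join(db))
    print(shlex.join(search))
    # run(db); run(search)  # uncomment once BLAST+ is installed
```

Building the commands as argument lists, rather than shell strings, avoids quoting problems with file names that contain spaces; `shlex.join` is used only to print a copy-pasteable version.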

To overcome this problem, we built a local BLAST function into Arkgene that helps researchers annotate their bacterial genomes using custom sets of genetic data suited to their specific needs. These custom databases may contain a relatively small set of sequences, but those sequences are high quality, well curated and specific to a species, a biosynthetic pathway or a biochemical property.

Manually curated sequences in a supported format (such as FASTA or .fas files) can be uploaded to the Arkgene cloud into folders created by the user (e.g. Database). Next, a local BLAST search can be performed with our BLAST tool, where you select the database file, database type, BLAST input file, output filename, BLAST program (blastn, blastp or blastx) and E-value. The sequence file selected as the BLAST database is automatically converted into an indexed BLAST database by the makeblastdb command, which runs in the background.
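Choosing the database type and BLAST program together matters, because each program expects a particular combination: blastn searches a nucleotide query against a nucleotide database, blastp a protein query against a protein database, and blastx a translated nucleotide query against a protein database. The Python sketch below is a hypothetical pre-flight check, not Arkgene's implementation; it guesses whether a FASTA file holds nucleotide or protein sequence using a simple composition heuristic (the 90% A/C/G/T/U/N threshold is an assumption) and reports any mismatch with the chosen program.

```python
from typing import Dict, List, Tuple

# Query and database types each BLAST program expects: (query, database).
PROGRAM_TYPES: Dict[str, Tuple[str, str]] = {
    "blastn": ("nucl", "nucl"),  # nucleotide query vs nucleotide database
    "blastp": ("prot", "prot"),  # protein query vs protein database
    "blastx": ("nucl", "prot"),  # translated nucleotide query vs protein database
}

NUCLEOTIDE_CHARS = set("ACGTUN")


def guess_seq_type(fasta_text: str, threshold: float = 0.9) -> str:
    """Heuristic guess: 'nucl' if at least `threshold` of residues are A/C/G/T/U/N."""
    residues = "".join(
        s.upper()
        for s in (line.strip() for line in fasta_text.splitlines())
        if s and not s.startswith(">")  # skip blank lines and FASTA headers
    )
    # Drop alignment gaps ('-') and stop symbols ('*') so they do not skew the count.
    residues = residues.replace("-", "").replace("*", "")
    if not residues:
        raise ValueError("no sequence data found")
    nucleotide_count = sum(1 for c in residues if c in NUCLEOTIDE_CHARS)
    return "nucl" if nucleotide_count / len(residues) >= threshold else "prot"


def check_compatible(program: str, query_fasta: str, db_fasta: str) -> List[str]:
    """Return a list of problems; an empty list means the inputs look compatible."""
    if program not in PROGRAM_TYPES:
        raise ValueError(f"unsupported program: {program}")
    want_query, want_db = PROGRAM_TYPES[program]
    got_query = guess_seq_type(query_fasta)
    got_db = guess_seq_type(db_fasta)
    problems = []
    if got_query != want_query:
        problems.append(f"{program} expects a {want_query} query, got {got_query}")
    if got_db != want_db:
        problems.append(f"{program} expects a {want_db} database, got {got_db}")
    return problems
```

A composition heuristic like this can misjudge very short or unusual sequences, so in practice an explicit database type selection, as in the Arkgene BLAST tool, remains the safer choice.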

A local BLAST search takes from a few minutes to an hour to complete, depending on the sizes of the database and input files as well as the BLAST program used. The results are then displayed in HTML format, showing the pairwise alignment of every significant match with its score, E-value, identities and gaps, together with technical details of the BLAST search (program version, database, etc.).
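Arkgene presents results as an HTML report. If you run BLAST+ yourself with tabular output (-outfmt 6), the same key values (identity, E-value, bit score) can be filtered with a short script. The Python sketch below assumes the default 12 columns of -outfmt 6 and keeps the best-scoring hit per query; the E-value and percent-identity cut-offs are illustrative assumptions, not recommended thresholds.

```python
from dataclasses import dataclass
from typing import Dict, List

# Default column order of BLAST+ tabular output (-outfmt 6).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


@dataclass
class Hit:
    query: str
    subject: str
    pident: float
    length: int
    evalue: float
    bitscore: float


def parse_outfmt6(text: str) -> List[Hit]:
    """Parse default -outfmt 6 lines into Hit records, skipping blanks and comments."""
    hits = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        rec = dict(zip(FIELDS, line.split("\t")))
        hits.append(Hit(
            query=rec["qseqid"],
            subject=rec["sseqid"],
            pident=float(rec["pident"]),
            length=int(rec["length"]),
            evalue=float(rec["evalue"]),
            bitscore=float(rec["bitscore"]),
        ))
    return hits


def best_hits(hits: List[Hit], max_evalue: float = 1e-5,
              min_pident: float = 30.0) -> Dict[str, Hit]:
    """Keep the highest-bitscore hit per query that passes both thresholds."""
    best: Dict[str, Hit] = {}
    for h in hits:
        if h.evalue > max_evalue or h.pident < min_pident:
            continue
        current = best.get(h.query)
        if current is None or h.bitscore > current.bitscore:
            best[h.query] = h
    return best
```

Ranking by bit score rather than E-value keeps the comparison independent of database size, which is convenient when the same query set is searched against several small curated databases.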

Arkgene users can also share their curated and integrated databases with research collaborators, making efficient use of the data for gene identification and annotation easier.

With this, Arkgene is not only well suited to individual work but is also a great platform for easy and efficient research collaboration.