fast.genomics: compare 6,377 genera of bacteria & archaea

 

Enter an identifier, a protein sequence, or a genus name and a protein description
examples: ING2E5A_RS06865, P0A884, 3osdA, Escherichia thymidylate synthase

Or search for a taxon:
use % for wild cards

About fast.genomics

Fast.genomics includes one representative genome for each of 6,377 genera of Archaea and Bacteria. Genomes were classified using the Genome Taxonomy Database (GTDB). Only high-quality genomes are included, and potential chimeras were excluded using GUNC. Where possible, genomes were taken from NCBI's RefSeq.

Fast.genomics uses mmseqs2 to find homologs for a protein sequence of interest. This usually takes a few seconds. To speed up the search, fast.genomics keeps the mmseqs2 index in memory and runs the alignment step in parallel. The protein of interest need not be in the fast.genomics database.
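As a rough illustration of what such a search looks like from the command line, the sketch below builds an `mmseqs easy-search` invocation in Python. It is not fast.genomics's actual setup, which this page does not describe: the file names, the thread count, and the choice of `--db-load-mode 2` (memory-mapped database access) are all assumptions made for the example, and `build_search_command` and `run_search` are hypothetical helper names.

```python
import subprocess


def build_search_command(query_fasta, target_db, result_file, tmp_dir, threads=8):
    """Build an `mmseqs easy-search` command as a list of arguments.

    All paths and the default thread count are placeholders for illustration.
    """
    return [
        "mmseqs", "easy-search",
        query_fasta,      # FASTA file with the protein of interest
        target_db,        # mmseqs2 database of target proteins
        result_file,      # tabular output of hits
        tmp_dir,          # scratch directory required by mmseqs2
        "--threads", str(threads),   # run the search on several threads
        "--db-load-mode", "2",       # assumption: memory-map the database/index
    ]


def run_search(cmd):
    """Run the command; raises CalledProcessError if mmseqs exits with an error."""
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    cmd = build_search_command("query.faa", "targetDB", "hits.m8", "tmp", threads=4)
    print(" ".join(cmd))
```

Building the argument list separately from running it makes the command easy to inspect or log before anything is executed.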

Once the homologs are identified, fast.genomics can quickly show:

(These examples are for a putative 3-ketoglycoside hydrolase, ING2E5A_RS06865. This family of proteins was formerly known as DUF1080.)

Also see the preprint.

A database for each order

Fast.genomics also includes a database for each order, with every species represented, and up to 10 genomes per species. The per-order database will often include many more close homologs than the top-level database (example). You can reach the per-order database from the taxon or genome pages. Also, most gene pages have a link to search for homologs within that genome's order.

To speed up searches for homologs within an order, fast.genomics uses a pre-computed clustering (from CD-HIT) of all of the proteins in that order. First, fast.genomics searches against clusters (using protein BLAST and E ≤ 0.001); then it compares the query to all members of those clusters (using lastal and E ≤ 0.001, with E-values rescaled).
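The two-stage idea can be sketched in a few lines of Python. This is a toy model only: a k-mer Jaccard similarity stands in for the real alignment tools (protein BLAST and lastal), a similarity threshold stands in for the E-value cutoff, and E-value rescaling is not modeled. The function names, the data layout (a dict from each cluster representative to its members), and the threshold value are all assumptions made for the example.

```python
def kmers(seq, k=3):
    """Return the set of overlapping k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def similarity(a, b, k=3):
    """Toy stand-in for an alignment score: Jaccard similarity of k-mer sets."""
    ka, kb = kmers(a, k), kmers(b, k)
    if not ka or not kb:
        return 0.0
    return len(ka & kb) / len(ka | kb)


def two_stage_search(query, clusters, sequences, threshold=0.3):
    """Search cluster representatives first, then all members of hit clusters.

    clusters:  dict mapping representative id -> list of member ids
               (the representative is included among its members)
    sequences: dict mapping protein id -> amino acid sequence
    """
    # Stage 1: compare the query only to one representative per cluster.
    hit_reps = [
        rep for rep in clusters
        if similarity(query, sequences[rep]) >= threshold
    ]

    # Stage 2: compare the query to every member of the clusters that hit.
    hits = []
    for rep in hit_reps:
        for member in clusters[rep]:
            score = similarity(query, sequences[member])
            if score >= threshold:
                hits.append((member, score))

    # Best-scoring homologs first.
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


if __name__ == "__main__":
    sequences = {
        "a1": "MKTAYIAKQR",
        "a2": "MKTAYIAKQL",
        "b1": "GGSPPLLWWC",
    }
    clusters = {"a1": ["a1", "a2"], "b1": ["b1"]}
    for protein_id, score in two_stage_search("MKTAYIAKQR", clusters, sequences):
        print(protein_id, round(score, 2))
```

The saving comes from stage 1: the query is compared to one sequence per cluster rather than to every protein in the order, and the more expensive all-members comparison is limited to clusters that already look promising.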

Statistics for the main database of diverse Bacteria and Archaea

Phyla: 128
Classes: 283
Orders: 745
Families: 1,606
Genomes: 6,377
Database date: May 12, 2023
GTDB version: R08-RS214 (Apr 2023)

Downloads for the main database

by Morgan Price, Arkin group
Lawrence Berkeley National Laboratory