fast.genomics: compare 6,377 genera of bacteria & archaea


Enter an identifier, a protein sequence, or a genus name and a protein description
examples: ING2E5A_RS06865, P0A884, 3osdA, Escherichia thymidylate synthase

Or search for a taxon:
(use % as a wildcard)

About fast.genomics

Fast.genomics includes representative genomes for 6,377 genera of Archaea and Bacteria. These were classified using the Genome Taxonomy Database (GTDB). Only high-quality genomes are included. Potential chimeras were excluded using GUNC. Where possible, genomes were taken from NCBI's RefSeq.

Fast.genomics uses MMseqs2 to find homologs for a protein sequence of interest. This usually takes a few seconds. To speed up the search, fast.genomics splits the protein database into pieces, so that a single query can be analyzed in parallel across the pieces. Fast.genomics also keeps the indexes in memory. The protein of interest need not be in fast.genomics' database.
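
The split-and-search idea can be illustrated with a minimal Python sketch. This is not fast.genomics' code and it does not call MMseqs2: the scoring function is a toy shared k-mer count standing in for a real aligner, and the names split_database, search_shard and parallel_search are hypothetical. It only shows the general pattern of cutting a database into pieces, searching each piece concurrently for one query, and merging the hits.

```python
from concurrent.futures import ThreadPoolExecutor
import heapq

def kmers(seq, k=3):
    """Return the set of overlapping k-mers in a protein sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def split_database(records, n_shards):
    """Distribute (id, sequence) records round-robin across n_shards pieces."""
    shards = [[] for _ in range(n_shards)]
    for i, rec in enumerate(records):
        shards[i % n_shards].append(rec)
    return shards

def search_shard(query, shard, k=3):
    """Score each sequence in one piece by shared k-mer count.

    Toy stand-in for a real homology search; assumed for illustration only.
    """
    q = kmers(query, k)
    hits = []
    for seq_id, seq in shard:
        score = len(q & kmers(seq, k))
        if score > 0:
            hits.append((score, seq_id))
    return hits

def parallel_search(query, shards, top_n=5):
    """Search all pieces concurrently for a single query and merge the hits."""
    # Threads keep the sketch simple; a CPU-bound search would need
    # processes or native code to gain real speed.
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        per_shard = pool.map(lambda s: search_shard(query, s), shards)
        all_hits = [h for hits in per_shard for h in hits]
    return heapq.nlargest(top_n, all_hits)

records = [
    ("a", "MKTAYIAKQR"),
    ("b", "MKTAYIAKQL"),
    ("c", "GGGGGGGG"),
    ("d", "MKTWWWWW"),
]
shards = split_database(records, n_shards=2)
print(parallel_search("MKTAYIAKQR", shards))
```

Because each piece is searched independently, the merged result does not depend on how the records were distributed, only on the scores.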

Once the homologs are identified, fast.genomics can quickly show:

(These examples are for a putative 3-ketoglycoside hydrolase, ING2E5A_RS06865. This family of proteins was formerly known as DUF1080.)

A database for each order

Fast.genomics also includes a database for each order, with every species represented and up to 10 genomes per species. The per-order database will often include many more close homologs than the top-level database. You can reach the per-order database from the taxon or genome pages.
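
As a rough illustration of the "up to 10 genomes per species" rule, the following Python sketch groups genome records by order and caps each species at a fixed number of genomes. The record fields (order, species, genome_id) and the function name build_order_databases are hypothetical, and the sketch simply keeps the first genomes it encounters; how fast.genomics actually chooses which genomes to keep is not described here.

```python
from collections import defaultdict

def build_order_databases(genomes, max_per_species=10):
    """Group genome ids by order, keeping at most max_per_species per species.

    Keeps the first genomes seen for each species (an assumption made
    for illustration, not the site's selection rule).
    """
    kept = defaultdict(lambda: defaultdict(list))
    for g in genomes:
        bucket = kept[g["order"]][g["species"]]
        if len(bucket) < max_per_species:
            bucket.append(g["genome_id"])
    # Convert nested defaultdicts to plain dicts for a clean result.
    return {order: dict(species) for order, species in kept.items()}

# Hypothetical input: 12 genomes of one species, 1 of another, same order.
genomes = [
    {"order": "OrderA", "species": "sp1", "genome_id": f"g{i}"} for i in range(12)
] + [{"order": "OrderA", "species": "sp2", "genome_id": "h0"}]

dbs = build_order_databases(genomes)
print({sp: len(ids) for sp, ids in dbs["OrderA"].items()})
```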

Statistics for the main database of diverse Bacteria and Archaea

Protein sequences: 21,822,708
Database date: May 12, 2023
GTDB version: R08-RS214 (Apr 2023)

Downloads for the main database

by Morgan Price, Arkin group
Lawrence Berkeley National Laboratory