fast.genomics: compare 6,377 genera of bacteria & archaea

 

Enter an identifier, a protein sequence, or a genus name and a protein description
examples: ING2E5A_RS06865, P0A884, 3osdA, Escherichia thymidylate synthase

Or search for a taxon:
use % for wild cards

About fast.genomics

Fast.genomics includes representative genomes for 6,377 genera of Archaea and Bacteria. Genomes were classified using the Genome Taxonomy Database (GTDB). Only high-quality genomes are included, and potential chimeras were excluded using GUNC. Where possible, genomes were taken from NCBI's RefSeq.

Fast.genomics uses MMseqs2 to find homologs for a protein sequence of interest. This usually takes a few seconds. To speed up the search, fast.genomics splits the protein database into pieces, so that a single query can be analyzed in parallel, and keeps the indexes in memory. The protein of interest need not be in fast.genomics' database.
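The idea of splitting the database so that one query can be searched in parallel can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it does not call MMseqs2, the shared k-mer count is a toy stand-in for a real alignment score, and the names split_database, search_shard, and parallel_search are hypothetical. Python threads are used here for brevity; a real deployment would rely on separate processes or a native tool for actual speedup.

```python
from concurrent.futures import ThreadPoolExecutor
import heapq


def kmers(seq, k=3):
    """Return the set of overlapping k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def split_database(db, n_shards):
    """Split a {protein_id: sequence} dict into n_shards roughly equal pieces."""
    shards = [{} for _ in range(n_shards)]
    for i, (pid, seq) in enumerate(db.items()):
        shards[i % n_shards][pid] = seq
    return shards


def search_shard(query, shard):
    """Score every protein in one shard by its number of k-mers shared with the query.

    Toy scoring only (an assumption for illustration), not a real aligner.
    """
    q = kmers(query)
    return [(len(q & kmers(seq)), pid) for pid, seq in shard.items()]


def parallel_search(query, shards, top_n=3):
    """Search all shards concurrently, then merge the hits and keep the best top_n."""
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        per_shard = pool.map(lambda shard: search_shard(query, shard), shards)
        all_hits = [hit for hits in per_shard for hit in hits if hit[0] > 0]
    return heapq.nlargest(top_n, all_hits)
```

Because each shard is scored independently, the merged result is the same as a search over the undivided database; only the wall-clock time changes.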

Once the homologs are identified, fast.genomics can quickly show:

(These examples are for a putative 3-ketoglycoside hydrolase, ING2E5A_RS06865. This family of proteins was formerly known as DUF1080.)

A database for each order

Fast.genomics also includes a database for each order, with every species represented, and up to 10 genomes per species. The per-order database will often include many more close homologs than the top-level database (example). You can reach the per-order database from the taxon or genome pages.
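As a rough illustration of the "every species represented, up to 10 genomes per species" rule, the sketch below builds a per-order genome list from a flat table. The function name build_order_database and the (genome_id, order, species) tuple layout are hypothetical, and keeping the first genomes encountered is an assumption; the page does not say how fast.genomics chooses which genomes to keep for a species.

```python
from collections import defaultdict


def build_order_database(genomes, order, max_per_species=10):
    """Group the genomes of one order by species, keeping at most max_per_species each.

    genomes: iterable of (genome_id, order, species) tuples (hypothetical layout).
    Genomes are kept in the order they are seen, which is an assumption.
    """
    by_species = defaultdict(list)
    for genome_id, genome_order, species in genomes:
        if genome_order == order and len(by_species[species]) < max_per_species:
            by_species[species].append(genome_id)
    return dict(by_species)
```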

Statistics for the main database of diverse Bacteria and Archaea

Phyla: 128
Classes: 283
Orders: 745
Families: 1,606
Genomes: 6,377
Protein sequences: 21,822,708
Genes: 22,778,507
Database date: May 12, 2023
GTDB version: R08-RS214 (Apr 2023)

Downloads for the main database

by Morgan Price, Arkin group
Lawrence Berkeley National Laboratory