Fast.genomics includes representative genomes for 6,377 genera of Archaea and Bacteria. These were classified using the Genome Taxonomy Database (GTDB). Only high-quality genomes are included, and potential chimeras were excluded using GUNC. Where possible, genomes were taken from NCBI's RefSeq.
Fast.genomics uses MMseqs2 to find homologs of a protein sequence of interest, which usually takes a few seconds. To speed up the search, fast.genomics splits the protein database into pieces, so that a single query can be analyzed in parallel, and keeps the indexes in memory. The protein of interest need not be in the fast.genomics database.
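The idea of splitting the database so that one query can be searched in parallel can be illustrated with a small sketch. This is not how fast.genomics or MMseqs2 is implemented: the shared 3-mer score is a toy stand-in for a real alignment, and `TOY_DB`, `split_into_shards`, `search_shard`, and `sharded_search` are hypothetical names invented for this illustration. The point is only that each piece can be searched independently and the per-piece hits merged afterwards.

```python
from concurrent.futures import ThreadPoolExecutor
import heapq

# Hypothetical toy database: protein id -> amino acid sequence.
TOY_DB = {
    "protA": "MKTAYIAKQR",
    "protB": "MKTAYIVKQR",
    "protC": "GGGGSGGGGS",
    "protD": "AYIAKQRMKT",
}


def kmers(seq, k=3):
    """Return the set of overlapping k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def split_into_shards(db, n_shards):
    """Distribute database entries round-robin over n_shards pieces."""
    items = sorted(db.items())
    return [dict(items[i::n_shards]) for i in range(n_shards)]


def search_shard(query, shard, k=3):
    """Score every sequence in one shard by shared k-mers (toy similarity)."""
    query_kmers = kmers(query, k)
    hits = []
    for protein_id, seq in shard.items():
        score = len(query_kmers & kmers(seq, k))
        if score > 0:
            hits.append((score, protein_id))
    return hits


def sharded_search(query, shards, top_n=5):
    """Search all shards concurrently, then merge and keep the best hits."""
    with ThreadPoolExecutor(max_workers=max(1, len(shards))) as pool:
        per_shard = list(pool.map(lambda shard: search_shard(query, shard), shards))
    all_hits = [hit for hits in per_shard for hit in hits]
    return heapq.nlargest(top_n, all_hits)


if __name__ == "__main__":
    shards = split_into_shards(TOY_DB, n_shards=2)
    for score, protein_id in sharded_search("MKTAYIAKQR", shards):
        print(protein_id, score)
```

Because each shard is scored independently, the merged result is the same as searching the unsplit database; only the elapsed time changes. In a real deployment the per-shard work would be done by separate processes or machines rather than Python threads.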
Once the homologs are identified, fast.genomics can quickly show:
(These examples are for a putative 3-ketoglycoside hydrolase, ING2E5A_RS06865. This family of proteins was formerly known as DUF1080.)
Fast.genomics also includes a database for each order, in which every species is represented by up to 10 genomes. The per-order database will often include many more close homologs than the top-level database (example). You can reach the per-order database from the taxon or genome pages.
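The selection rule for the per-order databases (every species represented, at most 10 genomes per species) can be sketched as a simple grouping step. This is an illustrative sketch only: the source does not say how the genomes for a species are chosen, so keeping the first ones encountered is an assumption, and `build_per_order_sets` and the example order, species, and genome names are hypothetical.

```python
from collections import defaultdict

MAX_GENOMES_PER_SPECIES = 10  # cap stated in the text above


def build_per_order_sets(genomes, max_per_species=MAX_GENOMES_PER_SPECIES):
    """Group genomes by order and species, keeping at most max_per_species each.

    genomes: iterable of (genome_id, order, species) tuples.
    Assumption: genomes are kept in input order; the real selection
    criteria are not described in the source.
    """
    by_order = defaultdict(lambda: defaultdict(list))
    for genome_id, order, species in genomes:
        chosen = by_order[order][species]
        if len(chosen) < max_per_species:
            chosen.append(genome_id)
    # Convert nested defaultdicts to plain dicts for easier inspection.
    return {order: dict(species_map) for order, species_map in by_order.items()}


if __name__ == "__main__":
    # Hypothetical input: 12 genomes of one species, 1 of another, 2 in a second order.
    example = [(f"genome{i}", "o__OrderX", "s__Species one") for i in range(12)]
    example.append(("genome12", "o__OrderX", "s__Species two"))
    example += [("genome13", "o__OrderY", "s__Species three"),
                ("genome14", "o__OrderY", "s__Species three")]
    for order, species_map in build_per_order_sets(example).items():
        print(order, {sp: len(ids) for sp, ids in species_map.items()})
```

With this rule, a heavily sequenced species cannot crowd out the rest of its order, while rarely sequenced species still contribute every genome they have.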
| Item | Value |
| --- | --- |
| Phyla | 128 |
| Classes | 283 |
| Orders | 745 |
| Families | 1,606 |
| Genomes | 6,377 |
| Protein sequences | 21,822,708 |
| Genes | 22,778,507 |
| Database date | May 12, 2023 |
| GTDB version | R08-RS214 (Apr 2023) |
Lawrence Berkeley National Laboratory