About Rummagene

Many biomedical research papers are published every day, and a portion of them include supporting tables with data about genes, transcripts, variants, and proteins. For example, supporting tables may contain differentially expressed genes and proteins from transcriptomics and proteomics assays, targets of transcription factors from ChIP-seq experiments, hits from genome-wide CRISPR screens, or genes identified to harbor mutations from GWAS. Because these gene sets are commonly buried in the supplemental tables of research publications, they are not widely available for search and reuse. Rummagene is a web server application that provides access to hundreds of thousands of human and mouse gene sets extracted from the supporting materials of publications listed on PubMed Central (PMC). To create Rummagene, we first developed a softbot that extracts human and mouse gene sets from supporting tables of PMC publications. The softbot scans PMC articles and retains those whose supporting tables contain gene sets. These gene sets are served for enrichment analysis, free-text search, and table-title search. Users of Rummagene can submit their own gene sets to find matching gene sets ranked by their overlap with the input gene set. In addition to providing the extracted gene sets for search, we investigated the massive corpus of these gene sets for statistical patterns. We show how Rummagene can be used for transcription factor and kinase enrichment analyses, for universal predictions of cell types for single-cell RNA-seq data, and for gene function predictions. Finally, by combining gene set similarity with abstract similarity, Rummagene can be used to find surprising relationships between unexpected biological processes, concepts, and named entities.
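To make the idea of ranking gene sets by their overlap with an input gene set concrete, here is a minimal Python sketch. It is not Rummagene's implementation: the scoring statistic is not described on this page, so the right-tailed hypergeometric p-value used below is an assumed, commonly used stand-in, and the function names, the toy library, and the background size of 20,000 genes are all hypothetical.

```python
from math import comb


def overlap_pvalue(k, n_query, n_set, background):
    """Right-tailed hypergeometric probability of seeing at least k shared genes.

    Assumed scoring choice for illustration; not taken from the Rummagene source.
    """
    upper = min(n_query, n_set)
    total = comb(background, n_query)
    tail = sum(
        comb(n_set, i) * comb(background - n_set, n_query - i)
        for i in range(k, upper + 1)
    )
    return tail / total


def rank_by_overlap(query, library, background=20000):
    """Return (name, overlap size, p-value) tuples, most significant first."""
    query = set(query)
    results = []
    for name, genes in library.items():
        genes = set(genes)
        shared = query & genes
        if not shared:
            continue  # gene sets with no shared genes are not reported
        p = overlap_pvalue(len(shared), len(query), len(genes), background)
        results.append((name, len(shared), p))
    # Sort by ascending p-value, breaking ties by larger overlap.
    return sorted(results, key=lambda r: (r[2], -r[1]))


# Hypothetical toy library standing in for gene sets extracted from tables.
toy_library = {
    "table_A": ["TP53", "MDM2", "CDKN1A"],
    "table_B": ["TP53", "GAPDH", "ACTB", "ALB"],
    "table_C": ["INS"],
}
ranked = rank_by_overlap(["TP53", "MDM2", "CDKN1A", "BAX"], toy_library)
for name, size, p in ranked:
    print(name, size, p)
```

In this sketch, gene sets that share no genes with the query are dropped, and the remaining ones are ordered by how unlikely their overlap would be by chance under the assumed background.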


The database is updated weekly with gene sets extracted automatically from newly published open-access PMC articles.


This site is programmatically accessible via a GraphQL API.
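As a rough illustration of programmatic access, the sketch below posts a GraphQL query over HTTP using only the Python standard library. The endpoint URL is an assumption (this page does not state it), and the helper names are hypothetical. The only query shown is the standard GraphQL introspection query, which a server may or may not have enabled; the actual Rummagene schema should be taken from the site's API documentation.

```python
import json
import urllib.request

# Assumed endpoint for illustration; confirm the real URL in the site's API docs.
GRAPHQL_URL = "https://rummagene.com/graphql"

# Standard GraphQL introspection query; it makes no assumptions about the schema.
INTROSPECTION_QUERY = "{ __schema { queryType { name } } }"


def build_graphql_request(query, variables=None, url=GRAPHQL_URL):
    """Build a POST request carrying a GraphQL query as a JSON body."""
    payload = json.dumps({"query": query, "variables": variables or {}})
    return urllib.request.Request(
        url,
        data=payload.encode("utf-8"),
        headers={"Content-Type": "application/json", "Accept": "application/json"},
        method="POST",
    )


def run_graphql(query, variables=None, url=GRAPHQL_URL, timeout=30):
    """Send the query and return the "data" field, raising on GraphQL errors."""
    request = build_graphql_request(query, variables, url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.loads(response.read().decode("utf-8"))
    if body.get("errors"):
        raise RuntimeError(body["errors"])
    return body["data"]
```

Calling `run_graphql(INTROSPECTION_QUERY)` would perform the network request. Building the request separately keeps the payload easy to inspect before anything is sent.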


Rummagene is actively being developed by the Ma'ayan Lab.


Please acknowledge Rummagene in your publications by citing the following reference:

Clarke, D. J. B., Marino, G. B., Deng, E. Z., Xie, Z., Evangelista, J. E. & Ma'ayan, A. Rummagene: massive mining of gene sets from supporting materials of biomedical research publications. Commun Biol 7 (2024). https://doi.org/10.1038/s42003-024-06177-7