Given their genomic uniqueness, a local data set is highly desirable for Indians, who are underrepresented in existing public databases. We hypothesize that patients with rare monogenic disorders and their family members can provide a reliable source of common variants in the population. Exome sequencing (ES) data from families with rare Mendelian disorders were aggregated from five centers in India. The data set was refined by excluding related individuals and removing disease-causing variants (refined cohort). The efficiency of these data sets for variant prioritization was assessed in a new set of 50 exomes against gnomAD and GenomeAsia. Our original cohort comprised 1455 individuals from 1203 families. The refined cohort comprised 836 unrelated individuals and retained 1,251,064 variants, including 181,125 population-specific and 489,618 common variants. The allele frequencies from our cohort helped to reclassify 97,609 variants that are rare in gnomAD and 44,520 variants that are rare in GenomeAsia as common variants in our population. Our variant data set provided an additional 1.7% and 0.1% efficiency for prioritizing heterozygous and homozygous variants, respectively, for rare monogenic disorders. We observed 19 additional genes/human knockouts. We list carrier frequencies for 142 recessive disorders. This is a large and useful resource of exonic variants for Indians. Despite limitations, data sets from patients are efficient tools for variant prioritization in a resource-limited setting.
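The reclassification step described above, in which variants that are rare in a public database turn out to be common in the local cohort, can be illustrated with a minimal sketch. It is not the authors' pipeline. The 1% cutoff for "common", the `Variant` fields, the variant identifiers, and all counts are assumptions made for illustration; the only figure taken from the text is the cohort size of 836 unrelated individuals, which gives 1672 autosomal alleles if every individual is genotyped at a site.

```python
from dataclasses import dataclass

# Assumed cutoff: the abstract does not state the threshold used for "common".
COMMON_AF = 0.01


@dataclass(frozen=True)
class Variant:
    variant_id: str      # hypothetical identifier
    local_ac: int        # alternate allele count in the local cohort
    local_an: int        # total alleles genotyped in the local cohort
    reference_af: float  # allele frequency in a public database (0.0 if absent)


def local_af(v: Variant) -> float:
    """Allele frequency in the local cohort (0.0 when nothing was genotyped)."""
    return v.local_ac / v.local_an if v.local_an else 0.0


def reclassified_as_common(variants, threshold=COMMON_AF):
    """Return variants that are rare in the reference but common locally."""
    return [
        v for v in variants
        if v.reference_af < threshold and local_af(v) >= threshold
    ]


# Illustrative records only; 836 unrelated individuals -> 1672 alleles.
variants = [
    Variant("var_A", 40, 1672, 0.0005),  # rare in reference, common locally
    Variant("var_B", 2, 1672, 0.0002),   # rare in both
    Variant("var_C", 300, 1672, 0.15),   # common in both
]

for v in reclassified_as_common(variants):
    print(v.variant_id, round(local_af(v), 4))
```

Under these assumptions, only `var_A` would be dropped from a rare-variant shortlist that relied on the public database alone.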