Ever wondered if some genomic regions of interest overlap significantly with known (or your own) sets of regions?
LOLA is an R package that handles that for you. It includes a “core” set of regions from public databases and lets you extend them with your own regions of interest.
I wanted to include the position of every known SNP associated with a trait (especially clinical ones) in the database, preferably grouped by the broad type of trait. Here’s what I came up with using EBI’s GWAS catalog.
Creating a bed file with SNPs for each disease group
At the end you will have a file named index.txt listing the various region sets, which live inside a folder named regions.
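One way to get there, as a rough sketch rather than the exact script I used: take the rows of the GWAS catalog download, bucket them by trait group, and write one BED file per group plus the index. The column names CHR_ID, CHR_POS and SNPS are what I assume the catalog export uses; the "group" column, the `write_region_sets` function and the "filename"/"description" header of index.txt are hypothetical choices for illustration, so check them against your own files and the LOLA documentation.

```python
import csv
import os
import re
from collections import defaultdict


def write_region_sets(rows, out_dir, group_key="group"):
    """Write one BED file per trait group under out_dir/regions, plus index.txt.

    rows: iterable of dicts, e.g. from csv.DictReader over the catalog TSV.
    group_key: hypothetical column holding the broad trait group of each SNP.
    """
    regions_dir = os.path.join(out_dir, "regions")
    os.makedirs(regions_dir, exist_ok=True)

    # Collect unique intervals per group (a set drops duplicated SNPs).
    groups = defaultdict(set)
    for row in rows:
        chrom, pos = row.get("CHR_ID", ""), row.get("CHR_POS", "")
        if not chrom or not pos.isdigit():
            continue  # skip SNPs without a single mapped position
        pos = int(pos)
        # BED is 0-based, half-open: a SNP at 1-based position p spans [p-1, p).
        groups[row[group_key]].add(("chr" + chrom, pos - 1, pos, row.get("SNPS", ".")))

    with open(os.path.join(out_dir, "index.txt"), "w", newline="") as index:
        writer = csv.writer(index, delimiter="\t", lineterminator="\n")
        writer.writerow(["filename", "description"])  # assumed index header
        for group in sorted(groups):
            # Make the group name safe to use as a file name.
            filename = re.sub(r"\W+", "_", group) + ".bed"
            with open(os.path.join(regions_dir, filename), "w", newline="") as bed:
                bed_writer = csv.writer(bed, delimiter="\t", lineterminator="\n")
                bed_writer.writerows(sorted(groups[group]))
            writer.writerow([filename, group])
    return sorted(groups)
```

Feeding it `csv.DictReader(handle, delimiter="\t")` rows, with a group column added beforehand, should produce the regions folder and index.txt described above. How you assign each trait to a broad group is the part you have to decide yourself.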
Documenting your region sets
Simply create a tab-delimited file named collection.txt in the same folder as index.txt, with the following information:
collector	date	source	description
arendeiro	2016-01-28	customRegionDB/hg38/gwas	GWAS from EBI’s GWAS catalog (https://www.ebi.ac.uk/gwas/)
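If you would rather generate that file from a script than type it by hand, here is a minimal sketch. The `write_collection` helper is a hypothetical name of my own, and it assumes the four-column layout shown above is all you need.

```python
import csv
import os


def write_collection(collection_dir, collector, date, source, description):
    """Write a two-line, tab-delimited collection.txt into collection_dir."""
    os.makedirs(collection_dir, exist_ok=True)
    path = os.path.join(collection_dir, "collection.txt")
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        # Header row, then a single row describing the whole collection.
        writer.writerow(["collector", "date", "source", "description"])
        writer.writerow([collector, date, source, description])
    return path
```

Called with the values from this post, that would be `write_collection("customRegionDB/hg38/gwas", "arendeiro", "2016-01-28", "customRegionDB/hg38/gwas", "GWAS from EBI’s GWAS catalog (https://www.ebi.ac.uk/gwas/)")`.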