This R Markdown document prioritizes variants by whether they fall in a
credible set and whether their posterior inclusion probability is high.
Here, we fine-map 81 loci from the latest AD GWAS from the
European Alzheimer’s Disease BioBank (Bellenguez et al. 2022).
Statistical fine-mapping was performed on each locus separately with
FINEMAP and SuSiE, Bayesian fine-mapping methods that report:
1) Credible Set (CS): set of SNPs
containing the true functional variant with 95% coverage
probability.
2) Posterior Inclusion Probability (PIP): probability
from 0 to 1 indicating a given variant’s likelihood of causality.
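To make the two quantities concrete, here is a minimal sketch of how a 95% credible set can be built from per-SNP posterior probabilities. It assumes the simplest case of a single causal variant per locus, where the probabilities sum to 1; SuSiE and FINEMAP handle multiple causal variants and construct their sets differently. The helper `credible_set` and the toy values are hypothetical, not taken from the analysis.

```r
# Toy per-SNP posterior probabilities for one locus (made-up values).
pip <- c(rs1 = 0.62, rs2 = 0.21, rs3 = 0.09, rs4 = 0.05, rs5 = 0.03)

# Hypothetical helper (not part of SuSiE or FINEMAP): smallest set of SNPs
# whose cumulative posterior probability reaches the requested coverage.
# ASSUMPTION: one causal variant per locus, so the probabilities sum to 1.
credible_set <- function(pip, coverage = 0.95) {
  ranked <- sort(pip, decreasing = TRUE)
  n_keep <- which(cumsum(ranked) >= coverage)[1]
  # If the coverage is never reached, fall back to keeping every SNP.
  if (is.na(n_keep)) n_keep <- length(ranked)
  names(ranked)[seq_len(n_keep)]
}

cs95 <- credible_set(pip)
print(cs95)
```

In this toy locus the top SNP alone carries only 62% of the posterior mass, so the set has to grow to four SNPs before it reaches 95% coverage.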
Functional fine-mapping was performed using PolyFun, which computes
SNP-wise heritability-derived prior probabilities using an
L2-regularized extension of stratified linkage disequilibrium (LD) score
regression (S-LDSC). For PolyFun + SuSiE, we used the default UK Biobank
baseline model composed of 187 binarized epigenomic and genic
annotations. Finally, we also ran SuSiE and FINEMAP without PolyFun
priors. In all subsequent analyses presented here, SNPs that fall within
the HLA region were excluded because of the region’s particularly
complex LD structure.
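As an illustration of the HLA exclusion, the sketch below drops SNPs inside a fixed chromosome 6 interval. The boundaries used here are a commonly quoted extended MHC interval on GRCh37 and are an assumption: the text does not state which coordinates or genome build were used. The column names `SNP`, `CHR` and `POS` are also hypothetical.

```r
# Toy SNP table (made-up rsIDs and positions).
snps <- data.frame(
  SNP = c("rsA", "rsB", "rsC", "rsD"),
  CHR = c(6, 6, 6, 19),
  POS = c(27000000, 30000000, 33000000, 45000000),
  stringsAsFactors = FALSE
)

# ASSUMPTION: extended MHC/HLA boundaries on GRCh37. Replace these with the
# interval that matches the genome build of the summary statistics.
hla_chr   <- 6
hla_start <- 28477797
hla_end   <- 33448354

# Flag SNPs inside the interval, then keep everything else.
in_hla <- snps$CHR == hla_chr & snps$POS >= hla_start & snps$POS <= hla_end
snps_no_hla <- snps[!in_hla, ]
print(snps_no_hla$SNP)
```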
For all analyses presented, we set the maximum number of causal SNPs
per locus to five.
All fine-mapped results split by method are located in the full_results folder within the GitHub repository.
Figure 2a: Here, we show the size of the smallest
credible set for each locus across all 6 fine-mapping methods. Methods
are listed here:
1) SuSiE (UKBB LD)
2) FINEMAP (UKBB LD)
3) PolyFun + SuSiE (UKBB LD)
4) PolyFun + FINEMAP (UKBB LD)
5) SuSiE (GWAS LD)
6) FINEMAP (GWAS LD)
Here, we observe that all SNPs fall into a credible set when we fine-map with FINEMAP, regardless of whether we add functional annotations or use GWAS LD. We also observe that many of these loci are noisy and contain very large credible sets. Does this mean that every SNP within such a set has a very high PIP? To check, we calculate a modified credible set size that counts only the SNPs above a given posterior probability threshold.
As we raise the posterior probability threshold, the credible sets become much smaller, and many loci no longer have any credible-set SNPs above the threshold.
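The modified credible set size can be sketched as a per-locus count of credible-set SNPs at or above a PIP threshold. The table layout (`Locus`, `SNP`, `PIP`, `in_CS`), the helper `cs_size_at` and the values are assumptions for illustration; the real results tables may be organized differently.

```r
# Toy fine-mapping results for one method (made-up values).
res <- data.frame(
  Locus = c("L1", "L1", "L1", "L2", "L2"),
  SNP   = c("rs1", "rs2", "rs3", "rs4", "rs5"),
  PIP   = c(0.80, 0.15, 0.02, 0.06, 0.04),
  in_CS = c(TRUE, TRUE, TRUE, TRUE, TRUE),
  stringsAsFactors = FALSE
)

# Hypothetical helper: number of credible-set SNPs per locus with
# PIP >= threshold. Loci with no SNP left are reported with size 0.
cs_size_at <- function(res, threshold) {
  keep <- res$in_CS & res$PIP >= threshold
  vapply(unique(res$Locus),
         function(locus) sum(keep & res$Locus == locus),
         integer(1))
}

# One column per threshold, one row per locus.
sizes <- sapply(c(0, 0.1, 0.5), function(th) cs_size_at(res, th))
colnames(sizes) <- c("PIP>=0", "PIP>=0.1", "PIP>=0.5")
print(sizes)
```

Reporting zero rather than dropping the locus keeps the loci that lose all of their SNPs visible in the comparison.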
Figure 2b: Here, we check how many SNPs with PIP > 0.1 overlap across all fine-mapping methods.
We also show how many SNPs with PIP > 0.1 found in a credible set are found across all methods.
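A minimal sketch of the overlap check, assuming each method's hits (SNPs with PIP > 0.1) are available as a character vector of unique rsIDs. The method labels and rsIDs below are made up.

```r
# Toy per-method hit lists (made-up rsIDs); each SNP appears at most once
# per method, which the support count below relies on.
hits <- list(
  SUSIE         = c("rs1", "rs2", "rs3"),
  FINEMAP       = c("rs1", "rs3", "rs4"),
  POLYFUN_SUSIE = c("rs1", "rs3")
)

# SNPs reported by every method.
shared_all <- Reduce(intersect, hits)

# Number of methods that report each SNP.
support <- table(unlist(hits))

print(shared_all)
print(support)
```

The `support` counts give the finer-grained view: they show how many methods agree on each SNP, not only which SNPs are shared by all of them.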
We define consensus SNPs as SNPs that have a PIP > 0.1 and fall in a credible set in at least 2 methods. We identify 1,654 consensus SNPs across all 6 fine-mapping methods (Supplementary Table 1).
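# Rename the rsID column so both tables share the join key "SNP".
names(UCS_rsids)[names(UCS_rsids) == "UCS_set.SNP"] <- "SNP"
# Keep only SNPs present in both tables.
df_sub <- dplyr::inner_join(UCS_rsids, UCS_set, by = "SNP")
# Drop the duplicated ".y" columns created by the join.
df_sub <- dplyr::select(df_sub, -contains(".y"))
# Columns 2 to 7 are taken to hold one indicator per fine-mapping method.
df_sub$number_of_methods <- rowSums(df_sub[2:7])
# Flag SNPs supported by at least two methods.
df_sub$consensus_snp <- df_sub$number_of_methods >= 2
createDT(df_sub)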