Fine-mapping Summary

This R Markdown focuses on prioritizing variants based on whether they fall in a credible set or have a high posterior inclusion probability (PIP). Here, we fine-map 81 loci from the latest AD GWAS from the European Alzheimer & Dementia Biobank (Bellenguez et al. 2022).

Fine-mapping Methods/Review

Statistical fine-mapping was performed on each locus separately with FINEMAP and SuSiE, Bayesian fine-mapping methods that report:
1) Credible Set (CS): a set of SNPs that contains the true causal variant with 95% coverage probability.
2) Posterior Inclusion Probability (PIP): the probability, from 0 to 1, that a given variant is causal.
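To make the relationship between the two quantities concrete, here is a minimal sketch of how a 95% credible set can be built from PIPs. It assumes a single causal variant in the locus, so the PIPs sum to 1; the SNP names, PIP values, and the `credible_set()` helper are made up for illustration and are not taken from FINEMAP or SuSiE.

```r
# Toy PIPs for one locus under a single-causal-variant assumption
# (illustrative values that sum to 1).
pip <- c(rs1 = 0.55, rs2 = 0.30, rs3 = 0.08, rs4 = 0.04, rs5 = 0.03)

# Hypothetical helper: rank SNPs by PIP and keep the fewest top-ranked SNPs
# whose cumulative PIP reaches the requested coverage.
credible_set <- function(pip, coverage = 0.95) {
  ranked <- sort(pip, decreasing = TRUE)
  k <- which(cumsum(ranked) >= coverage)[1]
  if (is.na(k)) k <- length(ranked)  # coverage never reached: keep every SNP
  names(ranked)[seq_len(k)]
}

cs <- credible_set(pip)
print(cs)
```

SuSiE and FINEMAP allow several causal variants per locus and build their credible sets differently, so this shows only the basic idea of coverage.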

Functional fine-mapping was performed using PolyFun, which computes SNP-wise heritability-derived prior probabilities using an L2-regularized extension of stratified linkage disequilibrium (LD) score regression (S-LDSC). For PolyFun + SuSiE, we used the default UK Biobank baseline model composed of 187 binarized epigenomic and genic annotations. Finally, we also ran SuSiE and FINEMAP without functional priors. In all subsequent analyses presented here, SNPs that fall within the HLA region were excluded because of its particularly complex LD structure.

For all analyses presented, we set the maximum number of causal SNPs per locus to five.
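As an illustration of where this setting enters, the sketch below runs SuSiE on simulated summary statistics with `L = 5`. The data, sample size, and effect size are invented, and the call assumes a recent version of the `susieR` package in which `susie_rss()` accepts `z`, `R`, `n`, and `L`. It is not the pipeline used for the results in this report.

```r
# Sketch only: simulated data stand in for real GWAS summary statistics.
set.seed(1)
n <- 500   # hypothetical sample size
p <- 50    # hypothetical number of SNPs in the locus
X <- matrix(rnorm(n * p), n, p)
y <- as.numeric(0.5 * X[, 10] + rnorm(n))  # SNP 10 is the simulated causal variant

# Marginal z-scores (t statistics) from per-SNP linear regressions
z <- apply(X, 2, function(x) summary(lm(y ~ x))$coefficients[2, 3])
R <- cor(X)  # in-sample LD (correlation) matrix

# Run SuSiE only if the package is installed
if (requireNamespace("susieR", quietly = TRUE)) {
  fit <- susieR::susie_rss(z = z, R = R, n = n, L = 5)  # L = max causal SNPs
  print(fit$sets$cs)   # credible sets, as indices into the SNP list
  print(fit$pip[10])   # PIP of the simulated causal SNP
}
```

`L` is an upper bound, so a locus with a single signal can still return one credible set.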

All fine-mapped results split by method are located in the full_results folder within the GitHub repository.

Size of the smallest Credible Set for each Locus

Figure 2a: Here, we show the size of the smallest credible set for each locus across all 6 fine-mapping methods. Methods are listed here:
1) SuSiE (UKBB LD)
2) FINEMAP (UKBB LD)
3) PolyFun + SuSiE (UKBB LD)
4) PolyFun + FINEMAP (UKBB LD)
5) SuSiE (GWAS LD)
6) FINEMAP (GWAS LD)

Size of Smallest Credible Set for each locus (ALL SNPs included)

Here, we observe that every SNP falls into a credible set when we fine-map with FINEMAP, regardless of whether we add functional annotations or use GWAS LD. We also observe that many of these loci are noisy and contain very large credible sets. Does this mean that all of the SNPs within a set have very high PIPs? To check, we calculate a modified credible set size that counts only SNPs above a posterior probability (PP) threshold.

Modified Smallest Credible Set Size for each locus (PP >= 0.1)

Modified Smallest Credible Set Size for each locus (PP >= 0.5)

Modified Smallest Credible Set Size for each locus (PP >= 0.9)

As we raise the posterior probability threshold, the credible sets become much smaller, and many loci no longer have any SNP in a credible set that passes the threshold.
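The thresholding can be sketched as follows. This is one plausible reading of the modified credible set size, not necessarily the exact computation behind the figures. The column names (`Locus`, `SNP`, `CS`, `PP`), the toy values, and the `smallest_cs_size()` helper are all hypothetical.

```r
# Toy fine-mapping output for a single method; all names and values are made up.
res <- data.frame(
  Locus = c("A", "A", "A", "A", "B", "B", "B"),
  SNP   = paste0("rs", 1:7),
  CS    = c(1, 1, 1, 1, 1, 1, 1),   # credible set index; 0 would mean "not in any set"
  PP    = c(0.60, 0.30, 0.05, 0.02, 0.08, 0.07, 0.05)
)

# Size of the smallest credible set per locus, counting only SNPs with PP >= min_pp.
# Loci with no qualifying SNP are absent from the result.
smallest_cs_size <- function(res, min_pp = 0) {
  kept <- res[res$CS > 0 & res$PP >= min_pp, ]
  if (nrow(kept) == 0) return(setNames(integer(0), character(0)))
  sizes <- table(kept$Locus, kept$CS)          # SNP count per locus x credible set
  apply(sizes, 1, function(x) min(x[x > 0]))   # smallest non-empty set per locus
}

for (threshold in c(0, 0.1, 0.5, 0.9)) {
  cat("PP >=", threshold, "\n")
  print(smallest_cs_size(res, threshold))
}
```

In this toy data, locus B drops out at PP >= 0.1 and locus A drops out at PP >= 0.9, which mirrors the pattern described above.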

Upset Plot

Figure 2b: Here, we check how many SNPs with PIP > 0.1 overlap across all fine-mapping methods.

We also show how many SNPs that have PIP > 0.1 and fall in a credible set are shared across methods.

print(side_plot + coord_flip())

# ggsave(file="/Users/ashvinravi/Desktop/finemapping_MPRA_project/Upset_side_plot.svg", plot=side_plot + coord_flip(), width=8, height=6)
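The counts behind an UpSet plot can be reproduced without a plotting package. The sketch below builds a SNP-by-method membership matrix and tabulates each distinct membership pattern. The method labels and rsIDs are invented for illustration, and only three methods are shown to keep the output short.

```r
# Hypothetical SNP sets (e.g. SNPs with PIP > 0.1) for three methods.
snps_by_method <- list(
  SUSIE         = c("rs1", "rs2", "rs3", "rs6"),
  FINEMAP       = c("rs2", "rs3", "rs4", "rs6"),
  POLYFUN_SUSIE = c("rs3", "rs5")
)

all_snps <- sort(unique(unlist(snps_by_method)))

# Binary membership matrix: one row per SNP, one column per method
membership <- sapply(snps_by_method, function(s) as.integer(all_snps %in% s))
rownames(membership) <- all_snps

# Each distinct row pattern corresponds to one bar of an UpSet plot
pattern <- apply(membership, 1, paste, collapse = "")
intersections <- sort(table(pattern), decreasing = TRUE)

print(membership)
print(intersections)
```

A pattern such as `110` reads as "found by the first two methods only", in the column order of `membership`.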

Consensus SNPs

We define consensus SNPs as SNPs that have PIP > 0.1 and fall in a credible set in at least 2 methods. We identify 1,654 consensus SNPs across all 6 fine-mapping methods (Supplementary Table 1).

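library(dplyr)  # inner_join(), select(), contains()

# Rename the rsID column so both tables share the join key "SNP"
names(UCS_rsids)[names(UCS_rsids) == "UCS_set.SNP"] <- "SNP"
df_sub <- inner_join(UCS_rsids, UCS_set, by = "SNP")

# Drop the duplicated ".y" columns created by the join
# df_sub <- dplyr::select(UCS_set, -contains(".x.x"))
df_sub <- dplyr::select(df_sub, -contains(".y"))

# Columns 2-7 are assumed to hold one 0/1 support indicator per method
df_sub$number_of_methods <- rowSums(df_sub[2:7])
# A consensus SNP is supported by at least 2 of the 6 methods
df_sub$consensus_snp <- df_sub$number_of_methods >= 2

# createDT() is assumed to be a helper defined earlier in this R Markdown
createDT(df_sub)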