Many users have had difficulty matching the variant IDs in their genotype files to the variant IDs in the prediction models.

Here is one example, now added to the PrediXcan tutorial, where the matching was failing. The on-the-fly mapping option did not take into account that the GTEx v8 VCF file names chromosomes as chr#, whereas other VCFs (more common in hg19?) identify chromosomes by number or letter alone, with no chr prefix.
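To make the mismatch concrete, here is a minimal sketch of how a `{}` template could be filled from the CHROM, POS, REF and ALT fields of a VCF record. The helper `fill_template` and the example variant (position 13550, alleles G/A) are made up for illustration, and the assumption that placeholders are filled in that order is inferred from the template itself rather than taken from the MetaXcan source.

```shell
# Hypothetical helper (not part of MetaXcan): fill a "{}" template with
# CHROM, POS, REF, ALT, in that order, and print the resulting variant ID.
fill_template() {
  # $1 = template with {} placeholders; remaining args = CHROM POS REF ALT
  fmt=$(printf '%s' "$1" | sed 's/{}/%s/g')
  shift
  printf "${fmt}\n" "$@"
}

# CHROM already carries the chr prefix (GTEx v8 style): template has no prefix
fill_template "{}_{}_{}_{}_b38" chr1 13550 G A

# CHROM is a bare number: the template has to add the prefix
fill_template "chr{}_{}_{}_{}_b38" 1 13550 G A

# Prefixed CHROM combined with a prefixed template: the prefix is doubled
fill_template "chr{}_{}_{}_{}_b38" chr1 13550 G A
```

The first two calls produce an ID of the form `chr1_13550_G_A_b38`, while the third produces `chrchr1_13550_G_A_b38`, which would not match any model variant ID of that form.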

A working example for the GTEx genotype data with GTEx v8 mashr models is shown below.


# Paths for the test data set (adjust PRE to your own location)
export PRE=/gpfs/data/im-lab/nas40t2/Data/test-PrediXcan-GTEx
export DATA=$PRE/data
export MODEL=$PRE/models
export RESULTS=$PRE/results
export METAXCAN=$PRE/repos/MetaXcan-master/software
export VCFSMALL=$DATA/gtex-small-common-test.vcf.gz

printf "Predict expression\n\n"

# The GTEx v8 VCF already names chromosomes chr#, so the mapping template carries no chr prefix
python3 $METAXCAN/Predict.py \
--model_db_path $MODEL/gtex_v8_mashr/mashr_Whole_Blood.db \
--model_db_snp_key varID \
--vcf_genotypes $VCFSMALL \
--vcf_mode genotyped \
--prediction_output $RESULTS/Whole_Blood__predict.txt \
--prediction_summary_output $RESULTS/Whole_Blood__summary.txt \
--verbosity 9 \
--throw \
--on_the_fly_mapping METADATA "{}_{}_{}_{}_b38"
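
Before running the prediction, it may help to check which chromosome naming a VCF uses. The following is a rough sketch, not part of MetaXcan: `pick_template` is a hypothetical helper that reads VCF text on standard input, looks only at the first data line, and prints the template that would match it, assuming b38 coordinates and model variant IDs of the form chr#_pos_ref_alt_b38.

```shell
# Hypothetical helper: choose the on-the-fly template from the first data line.
pick_template() {
  awk '!/^#/ {                        # skip header lines starting with #
    if ($1 ~ /^chr/)
      print "{}_{}_{}_{}_b38"         # CHROM already has the chr prefix
    else
      print "chr{}_{}_{}_{}_b38"      # CHROM is bare, so add the prefix
    exit                              # one data line is enough
  }'
}

# Usage on a compressed VCF:
#   gzip -dc "$VCFSMALL" | pick_template
```

The printed template can then be passed as the last argument to `--on_the_fly_mapping METADATA`.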

**Thank you, Yanyu, for solving the mystery**

Reuse

Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The source code is licensed under MIT.

Citation

For attribution, please cite this work as

Haky Im (2020). PrediXcan 0% variant mapping issue. ImLab Notes. /post/2020/12/01/predixcan-0-variant-mapping-issue/

BibTeX citation

@misc{
  title = "PrediXcan 0% variant mapping issue",
  author = "Haky Im",
  year = "2020",
  journal = "ImLab Notes",
  note = "/post/2020/12/01/predixcan-0-variant-mapping-issue/"
}