BGEN format

A bgen file begins with a header block holding information about the file, including the number of samples, the number of variant data blocks, and flags that describe how the data is stored. Each variant data block contains the data for a single SNP, including its ID, position, and alleles. bgen files from UKBiobank also have a sample identifier block, which holds an identifier for each sample.
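To make the header layout concrete, here is a minimal sketch that reads the header fields with od. The byte offsets and flag bits follow the BGEN specification as we understand it (a 4-byte offset, then the header length, variant count, sample count, a 4-byte magic number, and a flags word at the end of the header). The name bgen_header is hypothetical, and because od decodes integers in host byte order, the sketch assumes a little-endian machine.

```shell
# Print the main header fields of a bgen file (hypothetical helper).
# Assumes the layout in the BGEN spec and a little-endian host, since od
# decodes integers in host byte order and bgen stores them little-endian.
# The body is in ( ) so it runs in a subshell and leaks no variables.
bgen_header() (
    file=$1
    # Bytes 4-15: header length, number of variant blocks, number of samples.
    set -- $(od -A n -t u4 -j 4 -N 12 "$file")
    header_len=$1
    n_variants=$2
    n_samples=$3
    # The flags word is the last 4 bytes of the header block, which starts at
    # byte 4, so it sits at byte offset 4 + header_len - 4 = header_len.
    flags=$(od -A n -t u4 -j "$header_len" -N 4 "$file" | tr -d '[:space:]')
    # Bits 0-1: compression; bits 2-5: layout; bit 31: sample identifier block.
    echo "variants=$n_variants samples=$n_samples compression=$((flags & 3)) layout=$(((flags >> 2) & 15)) sample_ids=$(((flags >> 31) & 1))"
)
```

Calling bgen_header on a bgen file prints one line of fields; sample_ids=1 means the file carries a sample identifier block.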

Querying a subset of a bgen genotype

Start an interactive job with qsub -I. We can then extract the data for a region of a chromosome from a bgen file:

module load gcc/6.2.0; module load bgen/1.1.3
bgenix -g /gpfs/data/ukb-share/genotypes/v3/ukb_imp_chr17_v3.bgen -incl-range 17:46018872-46026674 > pnpo_37.bgen

This outputs a bgen file containing only the SNPs on chromosome 17 between positions 46018872 and 46026674.
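The extraction step can be wrapped in a small function so the region is checked before bgenix runs. This is only a sketch: extract_region is a hypothetical name, and it uses just the -g and -incl-range options shown above.

```shell
# Extract chr:start-end from a bgen file into a new bgen file (hypothetical helper).
# Usage: extract_region INPUT.bgen CHR START END OUTPUT.bgen
extract_region() {
    if [ "$#" -ne 5 ]; then
        echo "usage: extract_region INPUT.bgen CHR START END OUTPUT.bgen" >&2
        return 2
    fi
    # Reject empty or non-numeric positions before running bgenix.
    for pos in "$3" "$4"; do
        case "$pos" in
            ''|*[!0-9]*)
                echo "positions must be non-negative integers: '$pos'" >&2
                return 2
                ;;
        esac
    done
    # Reject reversed ranges.
    if [ "$3" -gt "$4" ]; then
        echo "start ($3) is greater than end ($4)" >&2
        return 2
    fi
    bgenix -g "$1" -incl-range "$2:$3-$4" > "$5"
}
```

With this in place, the command above becomes extract_region /gpfs/data/ukb-share/genotypes/v3/ukb_imp_chr17_v3.bgen 17 46018872 46026674 pnpo_37.bgen.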

Before querying it, we need to create an index file:

bgenix -g pnpo_37.bgen -index
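bgenix writes the index next to the bgen file with a .bgi suffix, so the indexing step can be made safe to rerun. A minimal sketch, where ensure_index is a hypothetical helper name:

```shell
# Create the bgenix index only if it is missing (hypothetical helper).
# bgenix -index writes the index to <file>.bgi alongside the bgen file.
ensure_index() {
    if [ -e "$1.bgi" ]; then
        echo "index already present: $1.bgi"
    else
        bgenix -g "$1" -index
    fi
}
```

Once the index exists, bgenix -g pnpo_37.bgen -list should print the variants in the file, which is a quick check that the extraction picked up the expected region.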

More documentation is here.

Convert to VCF

For reasons we have not tracked down, converting the bgen output to a VCF with the bgenix -vcf argument is unreliable, so we use qctool instead. qctool is installed in /gpfs/data/im-lab/nas40t2/bin. Note that qctool requires gcc, so it must be run in a job with the gcc module loaded.

export PATH=$PATH:/gpfs/data/im-lab/nas40t2/bin/software

bgenix -g pnpo_37.bgen | qctool -g - -filetype bgen -s /gpfs/data/ukb-share/genotypes/ukb19526_imp_chr1_v3_s487395.sample -og ~/PNPO_37.vcf
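The conversion can also be wrapped so that missing tools or inputs are caught before the pipeline starts. This sketch reuses exactly the bgenix and qctool options from the command above; bgen_to_vcf is a hypothetical name, and it assumes both tools are already on PATH.

```shell
# Convert a bgen file to VCF with qctool (hypothetical helper).
# Usage: bgen_to_vcf INPUT.bgen SAMPLE_FILE OUTPUT.vcf
bgen_to_vcf() {
    # Fail early if either tool is not available in this job.
    for tool in bgenix qctool; do
        if ! command -v "$tool" > /dev/null 2>&1; then
            echo "$tool not found; load the modules and update PATH first" >&2
            return 1
        fi
    done
    # Fail early if the bgen file or the sample file cannot be read.
    for input in "$1" "$2"; do
        if [ ! -r "$input" ]; then
            echo "cannot read $input" >&2
            return 1
        fi
    done
    # Stream the bgen through bgenix and let qctool read it from stdin (-g -).
    bgenix -g "$1" | qctool -g - -filetype bgen -s "$2" -og "$3"
}
```

For the example here, the call would be bgen_to_vcf pnpo_37.bgen /gpfs/data/ukb-share/genotypes/ukb19526_imp_chr1_v3_s487395.sample ~/PNPO_37.vcf.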

Run qctool -help for a list of options. More documentation is at https://www.well.ox.ac.uk/~gav/qctool_v2/index.html.

Reuse

Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The source code is licensed under MIT.

Citation

For attribution, please cite this work as

Sabrina Mi (2020). Querying bgen files in CRI. ImLab Notes. /post/2020/11/10/querying-bgen-files-in-cri/

BibTeX citation

@misc{
  title = "Querying bgen files in CRI",
  author = "Sabrina Mi",
  year = "2020",
  journal = "ImLab Notes",
  note = "/post/2020/11/10/querying-bgen-files-in-cri/"
}