BBJ data directory: /gpfs/data/im-lab/nas40t2/Data/BBJ

I first downloaded and decrypted the Biobank Japan data (instructions), then organized the files, in their original form, into the subdirectories BBJ-genotypes-decrypted and BBJ-phenotypes-decrypted.


BBJ phenotypes file: /gpfs/data/im-lab/nas40t2/Data/BBJ/BBJ-phenotypes.csv

This CSV combines all phenotype data in the BBJ-phenotypes-decrypted subdirectory into one file. The original BBJ phenotype data in BBJ-phenotypes-decrypted was bulky and identified each dataset by ID rather than by phenotype name. The file BBJ-phenotype-list.txt lists every phenotype alongside its folder name (download).

I created the combined phenotype file with the following script:

python3 --BBJ_folder /Users/sabrinami/BBJ/BBJ-phenotypes \
--phenotype_mapping /Users/sabrinami/Github/analysis-sabrina/BBJ-data-processing/BBJ-phenotype-list.txt \
--output /Users/sabrinami/Github/analysis-sabrina/BBJ-data-processing/BBJ-phenotypes.csv
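
For readers who want a feel for what the combining step involves, here is a minimal sketch. It is not the original script. It assumes, hypothetically, that BBJ-phenotype-list.txt is tab-separated with the phenotype name in the first column and the folder name in the second, and that each folder holds tab-delimited .txt tables with a header row. The function names and the added phenotype column are illustrative choices, not taken from the source.

```python
import csv
from pathlib import Path


def read_mapping(mapping_path):
    """Return {folder_name: phenotype_name}.

    Assumed layout (hypothetical): tab-separated, phenotype name first,
    folder name second. Rows with fewer than two fields are skipped.
    """
    mapping = {}
    with open(mapping_path, newline="") as handle:
        for row in csv.reader(handle, delimiter="\t"):
            if len(row) >= 2:
                mapping[row[1].strip()] = row[0].strip()
    return mapping


def combine_phenotypes(bbj_folder, mapping_path, output_path):
    """Stack every table found under bbj_folder/<folder>/ into one CSV.

    Each output row gets a leading "phenotype" column so that rows are
    labelled by phenotype name rather than by folder name.
    Returns the number of data rows written.
    """
    mapping = read_mapping(mapping_path)
    rows, columns = [], ["phenotype"]
    for folder, phenotype in sorted(mapping.items()):
        # Assumption: tables are tab-delimited *.txt files with a header row.
        for table in sorted(Path(bbj_folder, folder).glob("*.txt")):
            with open(table, newline="") as handle:
                for raw in csv.DictReader(handle, delimiter="\t"):
                    # Drop the None key that DictReader uses for overflow fields.
                    record = {"phenotype": phenotype}
                    record.update({k: v for k, v in raw.items() if k is not None})
                    # Track the union of columns in first-seen order.
                    for name in record:
                        if name not in columns:
                            columns.append(name)
                    rows.append(record)
    with open(output_path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=columns, restval="")
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```

Under these assumptions, calling combine_phenotypes with the three paths from the command above would write a single long-format CSV. Tables with differing columns are padded with empty cells instead of being rejected.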


BBJ genotypes folder: /gpfs/data/im-lab/nas40t2/Data/BBJ/BBJ-genotypes-decrypted


Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The source code is licensed under MIT.


For attribution, please cite this work as

Sabrina Mi (2022). Biobank Japan Data in CRI. ImLab Notes. /post/2022/01/02/biobank-japan-data-in-cri/

BibTeX citation

  title = "Biobank Japan Data in CRI",
  author = "Sabrina Mi",
  year = "2022",
  journal = "ImLab Notes",
  note = "/post/2022/01/02/biobank-japan-data-in-cri/"