St. Jude Children's Research Hospital
Memphis, USA
Bioinformatics Research Scientist
03/2020-Present
My work at St. Jude focuses mainly on immunotherapy. I have been analyzing time-series single-cell RNA-Seq and single-cell TCR-Seq data to identify CAR-T cells that persist longer than typical CAR-T cells. My colleagues and I also traced the trajectories of CAR-T lineages after injection and characterized their functions over time. We also developed a classifier that predicts CAR-T cell persistence from single-cell gene expression, which achieved approximately 0.8 accuracy and 0.9 AUC. I pre-processed the single-cell data and analyzed it with the Seurat package, performing differential expression (DE) analysis, pathway analysis, and clustering, and generating UMAP, dot, violin, and network plots along with other data visualizations.
Another interesting project involves analyzing murine bone marrow (BM) HSPC (Hematopoietic Stem and Progenitor Cell) data to determine how different cell types interact with each other over time. My colleagues and I revealed that fetal BM HSPCs are present by E15.5 but remain phenotypically and functionally distinct from the HSPC pool in fetal liver until near birth. We also generated the first transcriptional atlas of perinatal BM HSPCs and the BM niche in mice across ontogeny, revealing that fetal BM lacks both HSPCs with robust intrinsic stem cell programs and niche cells that support HSPCs.
Another task I frequently performed at St. Jude was computational HLA typing of both single-cell and bulk RNA-Seq data to identify mislabeled samples.
Columbia University Medical Center
New York, USA
Associate Research Scientist
10/2017-02/2020
I supported client investigators who needed computational and bioinformatic analyses to answer their biological questions. I provided many types of analyses, including principal component analysis, differential expression analysis, pathway analysis, differential methylation analysis, RNA-Seq, ChIP-Seq, and ATAC-Seq analyses, somatic mutation analysis, variant calling, sequence alignment, clustering, classification, regression, gene co-expression network construction, and any other specific analyses clients requested. I met with client investigators to determine which analyses they needed and which tools should be implemented to meet their research goals. I then wrote the code for the analyses, ran them, and discussed the results with the clients.
I assisted 15 groups, including principal investigators, physicians, clinical researchers, and biologists, and provided them with numerous computational and bioinformatic analyses. For example, I performed differential gene expression analysis for L-DOPA-induced dyskinesia vs control, spinal muscular atrophy vs control, Apcdd1-knockout vs wild-type mice, and androgenetic alopecia vs control. I also performed differential methylation analysis for periodontitis vs normal and colorectal cancer vs normal. I constructed ARACNe gene co-expression networks for more than 80 tissues, including various cancer and normal tissues, performed variant calling for lung cancer patients, and carried out somatic mutation analysis on colorectal cancer data. Additionally, I performed clustering, correlation analysis, and survival analysis on a glioblastoma dataset, ran transcription factor footprinting analysis on mouse androgenetic alopecia data, and performed pathway analysis on Apcdd1-knockout mouse and spinal muscular atrophy data. I also helped clients with data visualization, producing figures that presented their data clearly and efficiently, such as heatmaps, line graphs, bar graphs, box plots, beeswarm plots, survival plots, scatter plots, and pie charts.
In addition to analyzing clients' data, I pursued my own research comparing gene expression networks between cancer (TCGA) and normal (GTEx) tissues to identify cancer-risk regulator modules. I developed a new approach that identified meaningful modules based on differentially conserved regulons between cancer and normal tissues, and I am writing a paper on the results.
University of Cambridge
Cambridge, UK
Postdoctoral Research Associate
09/2016-09/2017
My work in Cambridge focused on identifying genetic risk factors for lung cancer and brain cancer. My main tasks as a postdoc were discovering differentially expressed genes, disease-associated pathways, master regulator transcription factors, correlations between genetic factors, and driver mutations.
More specifically, this work falls into the following five areas:
1. RNA-Seq analyses
I developed and ran DE and pathway analyses on RNA-Seq data. Through these analyses, I found that individuals with low DNA-repair capacity show elevated immune responses. I also found that genes highly expressed in TNF-α samples were enriched in inflammatory pathways, and genes highly expressed in highly hypoxic samples were enriched in necrosis-related pathways.
2. Master regulator analysis
Master regulator analysis is a type of network analysis based on gene regulatory networks. I performed it with RTN (Reconstruction and analysis of Transcriptional Networks) and identified transcription factors that may regulate hypoxia-related genes in glioblastoma.
3. Variant calling from RNA-Seq
I used GATK HaplotypeCaller to call variants from RNA-Seq data and compared them with genotypes obtained from blood samples to find swapped or mislabeled samples.
4. Somatic mutation calling from RNA-Seq
I used Mutect2 on normal epithelial samples to call somatic mutations. Because the data were RNA-Seq from normal tissues, this was challenging. One approach was simply to remove all variants already present in dbSNP and the 1000 Genomes Project; another was to compare bronchial and nasal samples to call tissue-specific somatic mutations. I implemented both approaches.
5. Mutation frequency analysis
I compared mutation frequencies between cancer patients and healthy volunteers to identify highly mutated genes. I also measured correlations between mutation frequencies and DNA repair capacity.
University of Texas Southwestern Medical Center
Dallas, USA
Visiting Junior Researcher
12/2015-02/2016
I proposed a drug adverse event detection algorithm using the FDA Adverse Event Reporting System (FAERS). I devised a computational algorithm to detect drug adverse events from clinical notes in electronic health record (EHR) data. However, the procedure at UTSW for obtaining the EHR dataset was intricate and time-consuming, so I instead worked on implementing the algorithm with a neural network on openFDA data.
Yonsei University
Seoul, South Korea
PhD Student
09/2010-08/2016
My PhD can be described in two words: machine learning and network biology. During my PhD, I researched bioinformatics and data mining. I devised cancer classification algorithms (cancer vs benign, aggressive vs non-aggressive cancer, and cancer stage [multi-class]) using gene expression data (microarray, RNA-Seq). I also proposed computational algorithms to identify disease-associated genes, miRNAs, and drugs from various datasets, such as gene expression, protein-protein interactions, environmental factors, biomedical literature, Google search data, and web crawl data. Most of these algorithms used network-based approaches.
I also conducted two additional studies during my PhD. First, I used a location concept and web crawl data to identify undiscovered disease-associated drugs. Using web crawl data as a text source in place of biomedical literature, I proposed location-based discovery, a novel literature-based discovery approach that introduces location as a new linking term, and discovered several novel and interesting disease-drug candidates with it. Second, I constructed a world disease map from disease-location associations extracted from web crawl data.