By: Nitin S.
Year: 2021
School: Troy High
Grade: 12
Science Teacher: James Kirkpatrick
Abstract: Breast cancer (BC) is categorized into molecular subtypes by receptor expression: ER+, PR+, or HER2+. Cancers that lack expression of all three receptors are classified as triple-negative (TN), a highly heterogeneous and deadly subtype. Thus, there is an urgent need to identify targeted therapies for TN patients. In this study, Nitin used bioinformatic tools to analyze gene expression data, study variation among BC subtypes, and test its clinical relevance.
Normalized gene expression and corresponding clinical data of BC patients were downloaded using the TCGAbiolinks R package. Dimensionality reduction methods (principal component analysis (PCA) and uniform manifold approximation and projection (UMAP)), k-means clustering, and Kaplan-Meier survival plots were used to analyze the data. The results revealed one unique UMAP cluster, which consisted mostly of TN samples (labeled as TN-UMAP). Differential gene expression analysis of the TN-UMAP cluster revealed a bimodal distribution of 10 genes within the cluster. Survival analysis of the full dataset within the TN-UMAP cluster revealed that high expression of MYH7, XBP1, and IGFBP5 confers better overall survival.
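The Kaplan-Meier step described above can be illustrated with a minimal sketch. The study itself was carried out in R, so this pure-Python version is not the author's code. The function names, the median split into "high" and "low" expression groups, and all toy values below are assumptions made up for illustration only.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Return [(time, survival_probability)] at each distinct event time.

    times  -- follow-up time for each patient
    events -- 1 if death was observed at that time, 0 if censored
    """
    deaths = Counter(t for t, e in zip(times, events) if e)
    leaving = Counter(times)  # each patient leaves the risk set at their own time
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        d = deaths.get(t, 0)
        if d:
            # Product-limit update: multiply by the fraction surviving this time
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        at_risk -= leaving[t]
    return curve


def split_by_median(expression):
    """Label each sample 'high' or 'low' relative to the median expression."""
    ordered = sorted(expression)
    n = len(ordered)
    if n % 2:
        median = ordered[n // 2]
    else:
        median = (ordered[n // 2 - 1] + ordered[n // 2]) / 2
    return ["high" if x > median else "low" for x in expression]


# Toy data (invented, not from TCGA): one gene's expression per patient,
# follow-up time in months, and whether death was observed (1) or censored (0).
expression = [2.1, 8.4, 7.9, 1.5, 9.2, 3.0]
times = [5, 20, 18, 3, 25, 8]
events = [1, 0, 1, 1, 0, 1]

groups = split_by_median(expression)
for label in ("high", "low"):
    idx = [i for i, g in enumerate(groups) if g == label]
    curve = kaplan_meier([times[i] for i in idx], [events[i] for i in idx])
    print(label, curve)
```

Comparing the two printed curves mirrors, in miniature, how survival can be contrasted between high- and low-expression groups. A real analysis would also test whether the difference is statistically significant, for example with a log-rank test.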
This study shows that visualizing global gene expression through UMAP may be a valuable method for classifying TN phenotypes. Future studies of gene expression variation between these groups may help characterize TN heterogeneity and inform the design of precision therapies. The results suggest that the differentially expressed genes may regulate the initiation and progression of TN breast cancer. These observations may provide insights into the pathogenesis of breast cancer at the subtype level and help identify alternative biomarkers and potential therapeutic targets for TN patients.