-
What is PAM50 for breast cancer?
PAM50 is a set of 50 genes used to classify breast cancers into one of five intrinsic subtypes from formalin-fixed, paraffin-embedded tissues by quantitative reverse transcriptase polymerase chain reaction (qRT-PCR). Initially developed on microarray data, PAM50 is now used successfully on digital multiplexed gene expression platforms such as NanoString nCounter®, which is the basis for the Prosigna® […]
-
Machine Learning Project in Oncology 1 – Deep Learning-based Identification of Prostate Cancer using TCGA RNA-seq
Prostate cancer (PRAD) is the most common non-skin cancer among men in America. In the United States, 1 in 8 men will be diagnosed with prostate cancer in their lifetime. The challenge of classifying PRAD and normal tissues based on gene expression data has been tackled through the development of diverse machine learning methods, such as self-organizing […]
-
How to build a reproducible analytic pipeline on a Linux cluster using GitHub
GitHub is a code hosting platform for version control and collaboration. It lets you and others work together on projects from anywhere. Here is a brief tutorial on how to build a reproducible analytic pipeline on a Linux cluster using GitHub. First, create a GitHub account. You will see an interface like this. Click New. You can […]
-
Retrieve RNA-seq data and read counts from TCGA
The Cancer Genome Atlas (TCGA) is a landmark cancer genomics program that sequenced and molecularly characterized over 11,000 cases of primary cancer. TCGA provides RNA-seq profiles for these primary cancer samples. I use two R packages for data retrieval. The first, TCGA2STAT, enables users to easily download TCGA data directly into a format ready for statistical […]
-
Two types of learning approaches for data science and machine learning
There are two main approaches to mastering new knowledge: bottom-up and top-down learning. What is bottom-up learning? According to the Encyclopedia of the Sciences of Learning, bottom-up learning refers to learning implicit knowledge first and then learning explicit knowledge on that basis (i.e., through “extracting” implicit knowledge). Most of our schools are built around the bottom-up natural […]
-
Share a great review paper on machine learning concepts for biologists
As biologists, we inevitably use data science and machine learning technologies. The question is: where should we start? Here’s a very good review paper on the topic: https://www.nature.com/articles/s41580-021-00407-0