
    2022-09-14

    Consensus Molecular Subtypes
    Using 6 independent classification methods and similar consensus networking analysis, the CRCSC defined 4 consensus molecular subtypes (CMS1, CMS2, CMS3, and CMS4) plus a Mixed (non-consensus) group.4 These subtypes carry significant biological and clinical associations. For all samples in this dataset, the CRCSC reports CMS classifications obtained by network analysis, by the random forest classifier, and a “CMS-final” classification that agrees with the network classification when there is a consensus among the contributing classifiers and with the random forest classifier for non-consensus samples. In this article, the term “CMS subtype” refers to “CMS-final” unless otherwise specified.
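The "CMS-final" rule described above can be sketched in a few lines. This is a minimal illustration, not the CRCSC's code: the function name `cms_final`, the sample IDs, and the encoding of a non-consensus network call as `None` or `"Mixed"` are all assumptions made for the example.

```python
from typing import Optional


def cms_final(network_label: Optional[str], random_forest_label: str) -> str:
    """Combine two CMS calls into a single "CMS-final" label.

    Assumption (not from the source): a non-consensus network call is
    encoded as None or the string "Mixed".
    """
    if network_label is None or network_label == "Mixed":
        # Non-consensus sample: fall back to the random forest call.
        return random_forest_label
    # Consensus sample: the network classification wins.
    return network_label


# Hypothetical samples: (network call, random forest call).
samples = {
    "S1": ("CMS2", "CMS2"),
    "S2": ("Mixed", "CMS4"),
    "S3": ("CMS1", "CMS3"),
}

final = {sid: cms_final(net, rf) for sid, (net, rf) in samples.items()}
print(final)
```

Under this encoding, S2 takes the random forest call because its network call is non-consensus, while S3 keeps the network call even though the random forest disagrees.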
    Genomic Score Predicting Subtype Methods
    A continuous score that predicts genomic location was defined using multistate gene methodology.17 To apply this method, the
    Independent Classifiers of Location
    Independent classifiers were built and validated using SVM with linear, polynomial, and radial kernels, a decision tree, and a random forest. Each method was developed independently using the Cohort A training set with anatomic location as the response variable of the classifier. To begin, the set of probes in the microarray data was restricted to those with coefficient of variation > 0.7 (n = 2081). These features were used for all classifiers. For an SVM classification with linear kernel, the cost parameter of the model was set at the value that minimized error in 10-fold cross validation of the training dataset (cost = 10,000). SVM classifiers with radial and polynomial kernels were defined similarly. Using the 2081 features described above as variables, a decision tree was built to predict anatomic location. Using cross validation, the full decision tree had the lowest cross validation error within 1 standard deviation of the smallest tree; hence, the final decision tree was unpruned. This tree used a total of 12 genomic features (Figure 1). A random forest classifier was also built using the same set of variables as the other classifiers. The number of trees (10,001) was chosen to achieve the most accurate classification in the training dataset while maintaining a reasonable computation time. An odd number of trees was needed to avoid random ties.
    Colon Tumor Side by Gene Expression
    Figure 1 The Final Decision Tree for Tumor Side Computed With Gene Expression in the Cohort A Training Set. Up Arrow Indicates Increased Expression; Down Arrow Indicates Decreased Expression
    Abbreviations: L = left-side tumor; R = right-side tumor.
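Two of the steps above lend themselves to a short illustration: the coefficient-of-variation filter on probes, and the reason an odd number of trees rules out ties in a two-class vote. The sketch below uses only the Python standard library. The probe names, expression values, and function names are hypothetical, and it assumes the coefficient of variation is the sample standard deviation divided by the mean and that the forest combines trees by hard majority vote; the text does not state either detail.

```python
from collections import Counter
from statistics import mean, stdev

CV_THRESHOLD = 0.7  # threshold stated in the text


def coefficient_of_variation(values):
    """Sample standard deviation divided by the absolute mean (assumed definition)."""
    m = mean(values)
    if m == 0:
        raise ValueError("coefficient of variation is undefined for a zero mean")
    return stdev(values) / abs(m)


def select_probes(expression, threshold=CV_THRESHOLD):
    """Keep the probes whose coefficient of variation exceeds the threshold."""
    return [
        probe
        for probe, values in expression.items()
        if coefficient_of_variation(values) > threshold
    ]


def majority_vote(votes):
    """Return "L" or "R" by hard majority; an even split is a tie."""
    counts = Counter(votes)
    left, right = counts["L"], counts["R"]
    if left == right:
        raise ValueError("tie: an odd number of voters avoids this case")
    return "L" if left > right else "R"


# Hypothetical expression values for three probes across four samples.
toy_expression = {
    "probe_a": [1.0, 5.0, 9.0, 1.0],   # highly variable, kept
    "probe_b": [5.0, 5.1, 4.9, 5.0],   # nearly constant, dropped
    "probe_c": [0.5, 4.0, 0.2, 3.5],   # highly variable, kept
}

selected = select_probes(toy_expression)
print(selected)
print(majority_vote(["L", "R", "L"]))
```

With two classes and an odd number of votes, the two counts can never be equal, which is why a tree count such as 10,001 never needs a tie-breaking rule.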
    Predictions of left-sided or right-sided location from each of these 5 classifiers were computed for the Cohort A validation set and compared with anatomic location (Table 2). Each classification method resulted in a group of discordant samples for which the anatomic location was different from the predicted genomic location (Table 3). There was a high degree of overlap between the locations predicted by different classifiers, even for the samples whose predicted location disagreed with anatomic location, suggesting that discordance is a feature of the tumor genome and not simply owing to statistical error.
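The comparison described above amounts to collecting, for each classifier, the samples whose predicted side differs from the anatomic side, and then checking how much those sets overlap. A minimal sketch with invented data follows; the sample IDs, classifier names, and predictions are hypothetical and do not reproduce Table 2 or Table 3.

```python
def discordant(predicted, anatomic):
    """Return the sample IDs whose predicted side differs from the anatomic side."""
    return {sid for sid, side in predicted.items() if side != anatomic[sid]}


# Hypothetical anatomic locations ("L" = left, "R" = right).
anatomic = {"S1": "L", "S2": "L", "S3": "R", "S4": "R", "S5": "L"}

# Hypothetical predictions from three of the classifiers.
predictions = {
    "svm_linear":    {"S1": "L", "S2": "R", "S3": "R", "S4": "L", "S5": "L"},
    "random_forest": {"S1": "L", "S2": "R", "S3": "R", "S4": "R", "S5": "L"},
    "decision_tree": {"S1": "R", "S2": "R", "S3": "R", "S4": "L", "S5": "L"},
}

discordant_sets = {
    name: discordant(pred, anatomic) for name, pred in predictions.items()
}

# Samples that every classifier places on the side opposite to the anatomy.
shared = set.intersection(*discordant_sets.values())

for name, ids in discordant_sets.items():
    print(name, sorted(ids))
print("shared by all classifiers:", sorted(shared))
```

A sample that lands in the shared set is called discordant by independently built classifiers, which is the pattern the text interprets as a property of the tumor rather than classifier noise.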
    Consensus Genomic Left and Genomic Right Subtypes
    Each of the 5 classifiers defines a set of samples in Cohort A that it predicts to be right-sided or left-sided tumors. We next identified the samples for which there was a consensus prediction among the classifiers. Specifically, for each sample we used a hypergeometric test to compute the probability that the multiple classifiers predicted the sample to be left-sided or right-sided. A sample was termed genomic left-sided (respectively, genomic right-sided) if said probability of the