# Data were represented by a matrix AScore

data were represented by a matrix $A_{\mathrm{Score}}$ of size $M \times N$. Each column of $A_{\mathrm{Score}}$ holds the collapsed scores of the $M$ genes for one of the $N$ subjects. The matrix $A_{\mathrm{Score}}$ was then decomposed using NMF. The purpose of this study was to find a set of intrinsic patterns that are likely to distinguish cancer types. To perform NMF, the matrix $A_{\mathrm{Score}}$ was factored into two low-rank matrices $W$ and $H$. Mathematically, $A_{\mathrm{Score}}$ is approximated by

$$A_{\mathrm{Score}} \approx WH.$$

Each column of $A_{\mathrm{Score}}$ is thus approximated by a linear combination of the column vectors of $W$, with coefficients supplied by the corresponding column of $H$. Matrix $W$ has size $M \times K$, with each of the $K$ columns representing a group of weighted genes and $w_{ij}$ corresponding to the weight of gene $i$ in group $j$. $K$ denotes the number of factors and is a given input. Matrix $H$ has size $K \times N$, where each of the $N$ columns denotes the feature coefficients for one subject. Entry $h_{ij}$ is the value of feature $i$ in sample $j$. The decomposition is achieved by iteratively updating the matrices $W$ and $H$ to minimize a divergence objective [16,25]. To promote sparseness, we used non-smooth Nonnegative Matrix Factorization (nsNMF) [26], which achieves a high degree of sparseness by adding a positive symmetric matrix to the objective function. We achieved a modest to high degree of sparseness in both $W$ (average 51% sparseness) and $H$ (average 85% sparseness). Each analysis was repeated ten times to address the local optima problem.
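The iterative update scheme described above can be sketched as follows. This is a minimal illustration, not the implementation used in the study: it assumes the standard nsNMF formulation, in which a smoothing matrix $S = (1-\theta)I + (\theta/K)\mathbf{1}\mathbf{1}^{T}$ is placed between $W$ and $H$ and the multiplicative updates for the generalized Kullback-Leibler divergence are applied to $WS$ and $SH$. The function name `nsnmf_kl`, the value of `theta`, the iteration count, and the stabilizing constant `eps` are all assumptions chosen for illustration; the column normalization of $W$ used in the original nsNMF algorithm is omitted for brevity.

```python
import numpy as np

def nsnmf_kl(A, k, theta=0.5, n_iter=200, seed=0, eps=1e-9):
    """Sketch of nsNMF: A (M x N) is approximated by W @ S @ H.

    theta in [0, 1] controls the smoothing; theta = 0 gives plain NMF.
    """
    rng = np.random.default_rng(seed)
    m, n = A.shape
    W = rng.random((m, k)) + eps   # non-negative random start
    H = rng.random((k, n)) + eps
    # Positive symmetric smoothing matrix (assumed standard nsNMF form).
    S = (1.0 - theta) * np.eye(k) + (theta / k) * np.ones((k, k))
    for _ in range(n_iter):
        # Update H with W @ S acting as the basis matrix.
        WS = W @ S
        ratio = A / (WS @ H + eps)
        H *= (WS.T @ ratio) / (WS.sum(axis=0)[:, None] + eps)
        # Update W with S @ H acting as the coefficient matrix.
        SH = S @ H
        ratio = A / (W @ SH + eps)
        W *= (ratio @ SH.T) / (SH.sum(axis=1)[None, :] + eps)
    return W, S, H
```

Because the updates only converge to a local optimum, calling the function with several different `seed` values and comparing the results mirrors the repeated runs mentioned above.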

2.4. Classifier training

Matrix $H$ has size $K \times N$, where each of the $N$ columns denotes the feature coefficients for the corresponding subject. To retain the information from both the $W$ and $H$ matrices, a new matrix $F$ was generated by multiplying the transposed matrix $A_{\mathrm{Score}}$ with matrix $W$. Specifically, an entry in matrix $F$ was computed as

$$f_{ij} = \sum_{x=1}^{M} \left(A_{\mathrm{Score}}^{T}\right)_{ix} W_{xj}.$$

Matrix $F$ has size $N \times K$, with each of the $N$ rows corresponding to one subject and each of the $K$ columns holding the coefficients of one factor across subjects. Since $A_{\mathrm{Score}} \approx WH$, matrix $F$ can be approximated by a kernel matrix $(WH)^{T} W$. Subsequently, the columns of matrix $F$ were used as features to train the classifier. A support vector machine (SVM) was used for training, where each column corresponds to one predictor in the model. The Radial Basis Function (RBF) kernel was used, with the parameters gamma and C set to their default values. The trained SVM model served as the cancer type classifier.
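As a small worked check of the feature construction, the sketch below builds $F = A_{\mathrm{Score}}^{T} W$ on toy data and compares it with the entry-wise sum and with the kernel form $(WH)^{T} W$. The matrix sizes, the random data, and the helper names `feature_matrix` and `feature_entry` are hypothetical and serve only to illustrate the shapes involved.

```python
import numpy as np

def feature_matrix(A, W):
    """F = A^T W: one row per subject, one column per factor."""
    return A.T @ W

def feature_entry(A, W, i, j):
    """Entry f_ij written out as the explicit sum over the M genes."""
    M = A.shape[0]
    return sum(A[x, i] * W[x, j] for x in range(M))

rng = np.random.default_rng(0)
M, N, K = 6, 4, 2            # genes, subjects, factors (toy sizes)
W = rng.random((M, K))
H = rng.random((K, N))
A = W @ H                    # toy matrix that factorizes exactly

F = feature_matrix(A, W)
print(F.shape)                                          # (4, 2)
print(np.isclose(F[1, 0], feature_entry(A, W, 1, 0)))   # True
print(np.allclose(F, (W @ H).T @ W))                    # True
```

With real data the factorization is only approximate, so the last comparison would hold approximately rather than exactly.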

2.5. Factor number selection

Fig. 1. Workflow of the study. Orange boxes are the data or processes; green boxes are the tools used; blue boxes are the results of the study. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

Before factorization, the number of factors $K$ needs to be predefined. Typically, $K$ is chosen so that $(N + M) \times K < N \times M$ [16]. Selection of $K$ is critical because it determines the number of patterns to be found. Numerous studies have presented different methods for factor number selection: $K$ can be determined based on metrics such as the cophenetic correlation coefficient [18,26], the variation of the sum of squares [27], or maximum information reservation [28]. In our study, the most important property of the classifier is its ability to identify intrinsic patterns that best distinguish cancer types. To achieve this, a numerical screening test was conducted over different numbers of factors to find the best prediction performance. The screened factor numbers ranged from 2 to 15. For each factor number, five-fold cross-validation was conducted and the multi-class prediction accuracy was obtained as the performance measurement. The factor number with the best cross-validation performance was chosen. Each experiment was replicated ten times with different initial seeds. Prediction accuracy, precision, recall, and F-measure were used as performance evaluation metrics.
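The screening over factor numbers could be sketched roughly as below, using scikit-learn. Several points are assumptions rather than details given in the text: scikit-learn's plain `NMF` stands in for nsNMF, the factorization is computed once on all subjects before cross-validation, `y` is a hypothetical vector of cancer type labels, and the function name `screen_factor_numbers` is illustrative. The SVM uses the library defaults for `C` and `gamma`, in line with the description above.

```python
import numpy as np
from sklearn.decomposition import NMF
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

def screen_factor_numbers(A, y, k_values=range(2, 16), seed=0):
    """Return (best_k, {k: mean 5-fold CV accuracy}).

    A is a non-negative genes x subjects matrix (M x N);
    y holds one class label per subject (length N).
    """
    scores = {}
    for k in k_values:
        # Plain NMF is used here as a stand-in for nsNMF (assumption).
        model = NMF(n_components=k, init="random",
                    random_state=seed, max_iter=500)
        W = model.fit_transform(A)        # shape (M, k)
        F = A.T @ W                       # shape (N, k), one row per subject
        clf = SVC(kernel="rbf")           # default C and gamma
        acc = cross_val_score(clf, F, y, cv=5, scoring="accuracy")
        scores[k] = acc.mean()
    best_k = max(scores, key=scores.get)
    return best_k, scores
```

Calling the function with different `seed` values and averaging the accuracies would correspond to the replicated experiments; precision, recall, and F-measure could be collected in the same loop by changing the `scoring` argument.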

To set up baselines for comparison, the mutation frequency and the collapsed scores were used as independent predictors to fit SVM and penalized logistic regression models for predicting cancer types. All somatic mutations were also used as predictive variables for cancer classification with the SVM and penalized logistic regression models. A number of models have been developed for cancer classification using somatic mutation profiles, including variations of CADD scores [29], L1-regularized logistic regression [30], and SVM-RFE [30]. In this study, we compared the performance of our proposed model with that of these reported methods for cancer classification. All studies were replicated ten times with different initial seeds, and significance tests were performed to obtain P-values for the evaluations.