ADD results from the modeling paper

2021-07-29 13:01:36 -04:00 · 2021-07-29 13:01:36 -04:00 · 89866515e8
parent 0c98ecce44
commit 89866515e8
1 changed files with 38 additions and 0 deletions
--- a/tex/thesis.tex
+++ b/tex/thesis.tex
@@ -2265,6 +2265,17 @@ Venn diagram from the venn R package.
 % TODO this section header sucks
 \subsection{AI modeling reveals highly predictive species}

+Because the collected multivariate data are heterogeneous and no single model
+structure is optimal for every application, we adopted a model-agnostic
+approach to better understand these TN+TCM responses. Specifically, we performed
+a consensus analysis using seven machine learning (ML) techniques, namely Random
+Forest (RF), Gradient Boosted Machine (GBM), Conditional Inference Forest (CIF),
+Least Absolute Shrinkage and Selection Operator (LASSO), Partial Least-Squares
+Regression (PLSR), Support Vector Machine (SVM), and DataModeler's Symbolic
+Regression (SR), to molecularly characterize TN+TCM cells and to extract
+features predictive of quality early in their expansion process (Fig.~1d--e).
+
 % TODO this table looks like crap, break it up into smaller tables
 \begin{table}[!h] \centering
  \caption{Results for data-driven modeling}
@@ -2272,6 +2283,25 @@ Venn diagram from the venn R package.
  \input{../tables/model_results.tex}
 \end{table}

+SR models achieved the highest predictive performance ($R^2 > 93\%$) for all
+endpoint responses when using multi-omics predictors (\cref{tab:mod_results}).
+For the CD4+ and CD4+/CD8+ TN+TCM responses, SR achieved $R^2 > 98\%$, while GBM
+tree-based ensembles reached a leave-one-out cross-validated $R^2$ (LOO-$R^2$)
+above 95\%. Similarly, LASSO, PLSR, and SVM showed consistently high LOO-$R^2$
+(92.9\%, 99.7\%, and 90.5\%, respectively) for predicting CD4+/CD8+ TN+TCM.
+However, these three methods showed a reduction of about 10\% in LOO-$R^2$
+(72.5\%--81.7\%) for CD4+ TN+TCM. Lastly, for CD8+ TN+TCM cells, SR and PLSR
+achieved $R^2 > 90\%$, whereas the other ML methods exhibited highly variable
+LOO-$R^2$, ranging from 0.3\% (RF) to 51.5\% (LASSO).
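+
+For reference, LOO-$R^2$ (written $R^2_{\mathrm{LOO}}$) is read here in its
+common predicted residual sum of squares (PRESS) form; this definition is an
+assumption stated for clarity, not a restatement of the Methods. With $n$
+samples, observed responses $y_i$ with mean $\bar{y}$, and $\hat{y}_{(-i)}$ the
+prediction for sample $i$ from a model fit without that sample,
+\begin{equation}
+  R^2_{\mathrm{LOO}} = 1 - \frac{\sum_{i=1}^{n} \left( y_i - \hat{y}_{(-i)} \right)^2}
+                               {\sum_{i=1}^{n} \left( y_i - \bar{y} \right)^2}.
+\end{equation}
+Unlike the in-sample $R^2$, this quantity penalizes models that fit the
+training samples well but generalize poorly, and it can even be negative.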
+
+% FIGURE the CD4/CD8 model results using SR
+
+The top-performing technique, SR, showed that the median aggregated predictions
+for both CD4+ and CD8+ TN+TCM cells increase as IL2 concentration, IL15, and
+IL2R increase and IL17a decreases, in conjunction with other features. These
+patterns, combined with low DMS concentration and low GM-CSF, uniquely
+characterized the maximum CD8+ TN+TCM predictions. Meanwhile, higher glycine and
+lower IL13, in combination with other features, yielded the maximum CD4+ TN+TCM
+predictions (Fig.~2).
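+
+To make ``median aggregated predictions'' concrete, assume that SR returns an
+ensemble of $M$ models $f_1, \dots, f_M$ (the ensemble size and selection
+criteria are assumptions here, as they are not given in this section). The
+aggregated prediction for a feature vector $\mathbf{x}$ is then
+\begin{equation}
+  \hat{y}_{\mathrm{SR}}(\mathbf{x}) = \mathop{\mathrm{median}}_{m = 1, \dots, M} f_m(\mathbf{x}).
+\end{equation}
+Taking the median rather than the mean limits the influence of any single
+ensemble member that extrapolates poorly.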
+
 \begin{figure*}[ht!]
  \begingroup

@@ -2291,6 +2321,14 @@ Venn diagram from the venn R package.
  \label{fig:mod_flower}
 \end{figure*}

+Ideally, candidate CPPs and CQAs for T cell memory are selected consistently
+across modeling methods. Here, \gls{tnfa} was selected in consensus by all seven
+ML methods for predicting CD4+/CD8+ TN+TCM when considering the features with
+the highest importance scores across models (Fig.~3a; Methods). Other features,
+namely IL2R, IL4, IL17a, and DMS concentration, were selected by at least five
+of the seven methods (Fig.~3a,c). In addition, SR identified IL13 and IL15 as
+predictive in combination with these features (Supp.~Table~S4).
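+
+The consensus criterion can be written compactly under a simplifying
+assumption: each method $m \in \{1, \dots, 7\}$ ranks the candidate features by
+its own importance score, and method $m$ selects feature $j$ when its rank
+$r_m(j)$ is within a cutoff $k$. The cutoff $k$ is a hypothetical parameter
+introduced for illustration; the actual selection rule is given in the Methods.
+The number of methods selecting feature $j$ is then
+\begin{equation}
+  c_j = \sum_{m=1}^{7} \mathbf{1}\left[ r_m(j) \le k \right],
+\end{equation}
+where $\mathbf{1}[\cdot]$ equals 1 when its argument holds and 0 otherwise. In
+these terms, \gls{tnfa} has $c_j = 7$, whereas IL2R, IL4, IL17a, and DMS
+concentration have $c_j \ge 5$.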
+
 \section{discussion}

 \chapter{aim 2b}\label{aim2b}