diff --git a/tex/thesis.tex b/tex/thesis.tex
index 7eceb7a..167627b 100644
--- a/tex/thesis.tex
+++ b/tex/thesis.tex
@@ -2265,6 +2265,17 @@ Venn diagram from the venn R package.
 % TODO this section header sucks
 \subsection{AI modeling reveals highly predictive species}
+Because the multivariate data were heterogeneous and no single model
+structure suits every application, we took a model-agnostic approach to better
+understand these TN+TCM responses. We ran a consensus analysis using seven
+machine learning (ML) techniques: Random Forest (RF), Gradient Boosted Machine
+(GBM), Conditional Inference Forest (CIF), Least Absolute Shrinkage and
+Selection Operator (LASSO), Partial Least-Squares Regression (PLSR), Support
+Vector Machine (SVM), and DataModeler's Symbolic Regression (SR). The goal was
+to characterize TN+TCM cells at the molecular level and to extract features
+that predict quality early in their expansion process
+(Fig.~1d--e).
+
 % TODO this table looks like crap, break it up into smaller tables
 \begin{table}[!h]
 \centering
 \caption{Results for data-driven modeling}
@@ -2272,6 +2283,25 @@ Venn diagram from the venn R package.
 \input{../tables/model_results.tex}
 \end{table}
+SR models achieved the highest predictive performance ($R^2 > 93\%$) when using
+multi-omics predictors for all endpoint responses (\cref{tab:mod_results}). SR achieved
+$R^2 > 98\%$, while GBM tree-based ensembles showed a leave-one-out cross-validated
+$R^2$ (LOO-$R^2$) $> 95\%$, for CD4+ and CD4+/CD8+ TN+TCM responses. Similarly, LASSO,
+PLSR, and SVM showed consistently high LOO-$R^2$ (92.9\%, 99.7\%, and 90.5\%,
+respectively) when predicting CD4+/CD8+ TN+TCM. With these three methods, however,
+LOO-$R^2$ dropped by about 10\%, to 72.5\%--81.7\%, for CD4+ TN+TCM. Lastly, SR and
+PLSR achieved $R^2 > 90\%$ for CD8+ TN+TCM cells, while the other ML methods
+exhibited highly variable LOO-$R^2$ (from 0.3\% for RF to 51.5\% for LASSO).
+
+% FIGURE the CD4/CD8 model results using SR
+
+The top-performing technique, SR, showed that the median aggregated predictions
+for CD4+ and CD8+ TN+TCM cells increase when IL2 concentration, IL15, and IL2R
+increase and IL17a decreases, in conjunction with other features. These
+patterns, combined with low values of DMS concentration and GM-CSF, uniquely
+characterized maximum CD8+ TN+TCM. Meanwhile, higher glycine and lower IL13, in
+combination with other features, gave the maximum CD4+ TN+TCM predictions (Fig.~2).
+
 \begin{figure*}[ht!]
 \begingroup
@@ -2291,6 +2321,14 @@ Venn diagram from the venn R package.
 \label{fig:mod_flower}
 \end{figure*}
+Consistent selection of candidate CPPs and CQAs for T cell memory is desirable.
+Here, \gls{tnfa} was selected by all seven ML methods as a predictor of
+CD4+/CD8+ TN+TCM when considering the features with the highest importance scores
+across models (Fig.~3a; Methods). Other features (IL2R, IL4, IL17a, and DMS
+concentration) were selected by $\geq 5$ ML methods (Fig.~3a,c). Moreover,
+IL13 and IL15 were found to be predictive in combination with these using SR
+(Supp.~Table~S4).
+
 \section{discussion}
 \chapter{aim 2b}\label{aim2b}