ADD results from the modeling paper

Nathan Dwarshuis 2021-07-29 13:01:36 -04:00
parent 0c98ecce44
commit 89866515e8
1 changed files with 38 additions and 0 deletions


@ -2265,6 +2265,17 @@ Venn diagram from the venn R package.
% TODO this section header sucks
\subsection{AI modeling reveals highly predictive species}
Because the multivariate data collected were heterogeneous and no single model
structure is perfect for all applications, we implemented an agnostic modeling
approach to better understand these TN+TCM responses. To this end, we performed
a consensus analysis using seven machine learning (ML) techniques: Random Forest
(RF), Gradient Boosted Machine (GBM), Conditional Inference Forest (CIF), Least
Absolute Shrinkage and Selection Operator (LASSO), Partial Least-Squares
Regression (PLSR), Support Vector Machine (SVM), and DataModelers Symbolic
Regression (SR). The goal was to molecularly characterize TN+TCM cells and to
extract features predictive of quality early in their expansion process
(Fig.~1d--e).
% TODO this table looks like crap, break it up into smaller tables
\begin{table}[!h] \centering
\caption{Results for data-driven modeling}
@ -2272,6 +2283,25 @@ Venn diagram from the venn R package.
\input{../tables/model_results.tex}
\end{table}
SR models achieved the highest predictive performance ($R^2 > 93\%$) when using
multi-omics predictors for all endpoint responses (\cref{tab:mod_results}). SR
achieved $R^2 > 98\%$, while GBM tree-based ensembles showed leave-one-out
cross-validated $R^2$ (LOO-$R^2$) $> 95\%$, for CD4+ and CD4+/CD8+ TN+TCM
responses. Similarly, LASSO, PLSR, and SVM showed consistently high LOO-$R^2$
(92.9\%, 99.7\%, and 90.5\%, respectively) when predicting CD4+/CD8+ TN+TCM.
However, LOO-$R^2$ dropped by about 10\%, to 72.5\%--81.7\%, for CD4+ TN+TCM
with these three methods. Lastly, for CD8+ TN+TCM cells, SR and PLSR achieved
$R^2 > 90\%$, while the other ML methods exhibited highly variable LOO-$R^2$,
ranging from 0.3\% (RF) to 51.5\% (LASSO).
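For reference, and assuming the conventional definition (the symbols below are
illustrative and are not taken from the methods), LOO-$R^2$ for $n$ samples can
be written as
\[
  \mathrm{LOO}\text{-}R^2
  = 1 - \frac{\sum_{i=1}^{n} \left( y_i - \hat{y}_{(-i)} \right)^2}
             {\sum_{i=1}^{n} \left( y_i - \bar{y} \right)^2},
\]
where $y_i$ is the observed response for sample $i$, $\hat{y}_{(-i)}$ is the
prediction for sample $i$ from a model trained on the other $n-1$ samples, and
$\bar{y}$ is the mean observed response. Under this definition the values
reported here correspond to $100 \times \mathrm{LOO}\text{-}R^2$.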
% FIGURE the CD4/CD8 model results using SR
The top-performing technique, SR, showed that the median aggregated predictions
for CD4+ and CD8+ TN+TCM cells increase when IL2 concentration, IL15, and IL2R
increase and IL17a decreases, in conjunction with other features. These
patterns, combined with low values of DMS concentration and GM-CSF, uniquely
characterized maximum CD8+ TN+TCM. Meanwhile, higher glycine and lower IL13, in
combination with other features, gave maximum CD4+ TN+TCM predictions (Fig.~2).
\begin{figure*}[ht!]
\begingroup
@ -2291,6 +2321,14 @@ Venn diagram from the venn R package.
\label{fig:mod_flower}
\end{figure*}
Consistently selecting CPP and CQA candidates for T cell memory is desirable.
Here, \gls{tnfa} was selected in consensus by all seven ML methods for
predicting CD4+/CD8+ TN+TCM when considering the features with the highest
importance scores across models (Fig.~3a; Methods). Other features, namely IL2R,
IL4, IL17a, and DMS concentration, were commonly selected by $\geq 5$ ML
methods (Fig.~3a,c). Moreover, SR found IL13 and IL15 to be predictive in
combination with these features (Supp.~Table~S4).
\section{discussion}
\chapter{aim 2b}\label{aim2b}