ENH proof aim2a

Nathan Dwarshuis 2021-09-08 19:14:31 -04:00
parent 597c316a5c
commit eb19087bd2
9 changed files with 213 additions and 220 deletions

View File

@ -3,8 +3,8 @@
\begin{tabular}{@{\extracolsep{5pt}}lc} \begin{tabular}{@{\extracolsep{5pt}}lc}
\\[-1.8ex]\hline \\[-1.8ex]\hline
\hline \\[-1.8ex] \hline \\[-1.8ex]
& \multicolumn{1}{c}{\textit{Dependent variable:}} \\ % & \multicolumn{1}{c}{\textit{Dependent variable:}} \\
\cline{2-2} % \cline{2-2}
\\[-1.8ex] & CD4+ Cells \\ \\[-1.8ex] & CD4+ Cells \\
\hline \\[-1.8ex] \hline \\[-1.8ex]
Dataset [2] & 1,271,171.000$^{**}$ \\ Dataset [2] & 1,271,171.000$^{**}$ \\
@ -13,7 +13,7 @@
DMS Conc. (1/ml) & 1,742.752$^{***}$ \\ DMS Conc. (1/ml) & 1,742.752$^{***}$ \\
Intercept & $-$5,344,494.000$^{***}$ \\ Intercept & $-$5,344,494.000$^{***}$ \\
\hline \\[-1.8ex] \hline \\[-1.8ex]
Observations & 30 \\ % Observations & 30 \\
R$^{2}$ & 0.888 \\ R$^{2}$ & 0.888 \\
Adjusted R$^{2}$ & 0.870 \\ Adjusted R$^{2}$ & 0.870 \\
% Residual Std. Error & 727,042.800 (df = 25) \\ % Residual Std. Error & 727,042.800 (df = 25) \\

View File

@ -3,8 +3,8 @@
\begin{tabular}{@{\extracolsep{5pt}}lc} \begin{tabular}{@{\extracolsep{5pt}}lc}
\\[-1.8ex]\hline \\[-1.8ex]\hline
\hline \\[-1.8ex] \hline \\[-1.8ex]
& \multicolumn{1}{c}{\textit{Dependent variable:}} \\ % & \multicolumn{1}{c}{\textit{Dependent variable:}} \\
\cline{2-2} % \cline{2-2}
\\[-1.8ex] & CD62L+CCR7+ Cells \\ \\[-1.8ex] & CD62L+CCR7+ Cells \\
\hline \\[-1.8ex] \hline \\[-1.8ex]
Dataset [2] & 4,661,754.000$^{*}$ \\ Dataset [2] & 4,661,754.000$^{*}$ \\
@ -13,7 +13,7 @@
DMS Conc. (1/ml) & 240.038 \\ DMS Conc. (1/ml) & 240.038 \\
Intercept & $-$3,478,851.000 \\ Intercept & $-$3,478,851.000 \\
\hline \\[-1.8ex] \hline \\[-1.8ex]
Observations & 30 \\ % Observations & 30 \\
R$^{2}$ & 0.331 \\ R$^{2}$ & 0.331 \\
Adjusted R$^{2}$ & 0.224 \\ Adjusted R$^{2}$ & 0.224 \\
% Residual Std. Error & 3,659,501.000 (df = 25) \\ % Residual Std. Error & 3,659,501.000 (df = 25) \\

View File

@ -3,8 +3,8 @@
\begin{tabular}{@{\extracolsep{5pt}}lc} \begin{tabular}{@{\extracolsep{5pt}}lc}
\\[-1.8ex]\hline \\[-1.8ex]\hline
\hline \\[-1.8ex] \hline \\[-1.8ex]
& \multicolumn{1}{c}{\textit{Dependent variable:}} \\ % & \multicolumn{1}{c}{\textit{Dependent variable:}} \\
\cline{2-2} % \cline{2-2}
\\[-1.8ex] & log(CD62L+CCR7+ Cells) \\ \\[-1.8ex] & log(CD62L+CCR7+ Cells) \\
\hline \\[-1.8ex] \hline \\[-1.8ex]
Dataset [2] & 0.269 \\ Dataset [2] & 0.269 \\
@ -20,7 +20,7 @@
(Functional mAb \%)*(IL2 Conc. (IU/ml)*(DMS Conc. (1/ml)) & 0.00000$^{*}$ \\ (Functional mAb \%)*(IL2 Conc. (IU/ml)*(DMS Conc. (1/ml)) & 0.00000$^{*}$ \\
Intercept & 20.899$^{***}$ \\ Intercept & 20.899$^{***}$ \\
\hline \\[-1.8ex] \hline \\[-1.8ex]
Observations & 30 \\ % Observations & 30 \\
R$^{2}$ & 0.741 \\ R$^{2}$ & 0.741 \\
Adjusted R$^{2}$ & 0.583 \\ Adjusted R$^{2}$ & 0.583 \\
% Residual Std. Error & 0.228 (df = 18) \\ % Residual Std. Error & 0.228 (df = 18) \\

View File

@ -11,7 +11,7 @@
DMS Conc. (1/ml) & 926.925$^{***}$ \\ DMS Conc. (1/ml) & 926.925$^{***}$ \\
Intercept & $-$3,368,762.000$^{***}$ \\ Intercept & $-$3,368,762.000$^{***}$ \\
\hline \\[-1.8ex] \hline \\[-1.8ex]
Observations & 30 \\ % Observations & 30 \\
R$^{2}$ & 0.835 \\ R$^{2}$ & 0.835 \\
Adjusted R$^{2}$ & 0.808 \\ Adjusted R$^{2}$ & 0.808 \\
% Residual Std. Error & 493,168.700 (df = 25) \\ % Residual Std. Error & 493,168.700 (df = 25) \\

View File

@ -12,7 +12,7 @@
DMS Conc. (1/ml) & 0.0001$^{***}$ \\ DMS Conc. (1/ml) & 0.0001$^{***}$ \\
Intercept & $-$0.144$^{*}$ \\ Intercept & $-$0.144$^{*}$ \\
\hline \\[-1.8ex] \hline \\[-1.8ex]
Observations & 30 \\ % Observations & 30 \\
R$^{2}$ & 0.879 \\ R$^{2}$ & 0.879 \\
Adjusted R$^{2}$ & 0.860 \\ Adjusted R$^{2}$ & 0.860 \\
% Residual Std. Error & 0.039 (df = 25) \\ % Residual Std. Error & 0.039 (df = 25) \\

View File

@ -23,7 +23,7 @@ DOE & 16 & 10 & 500 & 100\\
DOE & 17 & 20 & 1500 & 60\\ DOE & 17 & 20 & 1500 & 60\\
DOE & 18 & 30 & 2500 & 60\\ DOE & 18 & 30 & 2500 & 60\\
ADOE & 1 & 40 & 500 & 100\\ ADOE & 1 & 40 & 500 & 100\\
ADOE & 2 & 35 & 2000 & 100\\ ADOE & 2\tnote{a} & 35 & 2000 & 100\\
ADOE & 3 & 30 & 1500 & 100\\ ADOE & 3 & 30 & 1500 & 100\\
ADOE & 4 & 30 & 2500 & 100\\ ADOE & 4 & 30 & 2500 & 100\\
ADOE & 5 & 40 & 2500 & 100\\ ADOE & 5 & 40 & 2500 & 100\\
@ -31,14 +31,14 @@ ADOE & 6 & 40 & 1500 & 100\\
ADOE & 7 & 30 & 500 & 100\\ ADOE & 7 & 30 & 500 & 100\\
ADOE & 8 & 35 & 2000 & 100\\ ADOE & 8 & 35 & 2000 & 100\\
ADOE & 9 & 35 & 1000 & 100\\ ADOE & 9 & 35 & 1000 & 100\\
ADOE & 10 & 30 & 1500 & 100\\ ADOE & 10\tnote{a} & 30 & 1500 & 100\\
ADOE & 11 & 35 & 3000 & 100\\ ADOE & 11 & 35 & 3000 & 100\\
ADOE & 12 & 30 & 2500 & 100\\ ADOE & 12 & 30 & 2500 & 100\\
ADOE & 13 & 40 & 1500 & 100\\ ADOE & 13\tnote{a} & 40 & 1500 & 100\\
ADOE & 14 & 40 & 500 & 100\\ ADOE & 14\tnote{a} & 40 & 500 & 100\\
ADOE & 15 & 30 & 500 & 100\\ ADOE & 15\tnote{a} & 30 & 500 & 100\\
ADOE & 16 & 35 & 1000 & 100\\ ADOE & 16\tnote{a} & 35 & 1000 & 100\\
ADOE & 17 & 35 & 3000 & 100\\ ADOE & 17\tnote{a} & 35 & 3000 & 100\\
ADOE & 18 & 40 & 3500 & 100\\ ADOE & 18 & 40 & 3500 & 100\\
ADOE & 19 & 40 & 2500 & 100\\ ADOE & 19 & 40 & 2500 & 100\\
ADOE & 20 & 40 & 3500 & 100\\ ADOE & 20 & 40 & 3500 & 100\\

View File

@ -4,20 +4,23 @@
\\[-1.8ex] Response/Predictors & SR & RF & GBM & CIF & LASSO & PLSR & SVM \\ \\[-1.8ex] Response/Predictors & SR & RF & GBM & CIF & LASSO & PLSR & SVM \\
\hline \\[-1.8ex] \hline \\[-1.8ex]
\multicolumn{8}{l}{CD4:CD8 Ratio} \\ \multicolumn{8}{l}{CD4:CD8 Ratio} \\
\\[-1.8ex]
PP+N4 & \SI{99}{\percent} & \SI{86.8}{\percent} & \SI{96.3}{\percent} & \SI{84.5}{\percent} & \SI{88.6}{\percent} & \SI{92.5}{\percent} & \SI{88.5}{\percent}\\ PP+N4 & \SI{99}{\percent} & \SI{86.8}{\percent} & \SI{96.3}{\percent} & \SI{84.5}{\percent} & \SI{88.6}{\percent} & \SI{92.5}{\percent} & \SI{88.5}{\percent}\\
PP+N6 & \SI{99}{\percent} & \SI{73.6}{\percent} & \SI{95.9}{\percent} & \SI{70.1}{\percent} & \SI{81.0}{\percent} & \SI{95.8}{\percent} & \SI{79.7}{\percent}\\ PP+N6 & \SI{99}{\percent} & \SI{73.6}{\percent} & \SI{95.9}{\percent} & \SI{70.1}{\percent} & \SI{81.0}{\percent} & \SI{95.8}{\percent} & \SI{79.7}{\percent}\\
PP+S6 & \SI{99}{\percent} & \SI{87.1}{\percent} & \SI{99.9}{\percent} & \SI{83.4}{\percent} & \SI{87.2}{\percent} & \SI{97.9}{\percent} & \SI{86.8}{\percent}\\ PP+S6 & \SI{99}{\percent} & \SI{87.1}{\percent} & \SI{99.9}{\percent} & \SI{83.4}{\percent} & \SI{87.2}{\percent} & \SI{97.9}{\percent} & \SI{86.8}{\percent}\\
PP+S6+N6 & \SI{99}{\percent} & \SI{85.5}{\percent} & \SI{95.3}{\percent} & \SI{83.4}{\percent} & \SI{92.9}{\percent} & \SI{99.7}{\percent} & \SI{90.5}{\percent}\\ PP+S6+N6 & \SI{99}{\percent} & \SI{85.5}{\percent} & \SI{95.3}{\percent} & \SI{83.4}{\percent} & \SI{92.9}{\percent} & \SI{99.7}{\percent} & \SI{90.5}{\percent}\\
\hline \\ \hline \\[-1.8ex]
\multicolumn{8}{l}{\ptmemh{} cells} \\ \multicolumn{8}{l}{\ptmemh{} cells} \\
\\[-1.8ex]
PP+N4 & \SI{97}{\percent} & \SI{67.0}{\percent} & \SI{93.6}{\percent} & \SI{69.3}{\percent} & \SI{34.3}{\percent} & \SI{90.1}{\percent} & \SI{75.5}{\percent}\\ PP+N4 & \SI{97}{\percent} & \SI{67.0}{\percent} & \SI{93.6}{\percent} & \SI{69.3}{\percent} & \SI{34.3}{\percent} & \SI{90.1}{\percent} & \SI{75.5}{\percent}\\
PP+N6 & \SI{96}{\percent} & \SI{45.9}{\percent} & \SI{92.6}{\percent} & \SI{51.2}{\percent} & \SI{42.8}{\percent} & \SI{92.1}{\percent} & \SI{79.4}{\percent}\\ PP+N6 & \SI{96}{\percent} & \SI{45.9}{\percent} & \SI{92.6}{\percent} & \SI{51.2}{\percent} & \SI{42.8}{\percent} & \SI{92.1}{\percent} & \SI{79.4}{\percent}\\
PP+S6 & \SI{98}{\percent} & \SI{71.4}{\percent} & \SI{99.9}{\percent} & \SI{75.0}{\percent} & \SI{74.9}{\percent} & \SI{80.0}{\percent} & \SI{75.5}{\percent}\\ PP+S6 & \SI{98}{\percent} & \SI{71.4}{\percent} & \SI{99.9}{\percent} & \SI{75.0}{\percent} & \SI{74.9}{\percent} & \SI{80.0}{\percent} & \SI{75.5}{\percent}\\
PP+S6+N6 & \SI{98}{\percent} & \SI{68.2}{\percent} & \SI{95.6}{\percent} & \SI{74.4}{\percent} & \SI{72.5}{\percent} & \SI{81.7}{\percent} & \SI{77.0}{\percent}\\ PP+S6+N6 & \SI{98}{\percent} & \SI{68.2}{\percent} & \SI{95.6}{\percent} & \SI{74.4}{\percent} & \SI{72.5}{\percent} & \SI{81.7}{\percent} & \SI{77.0}{\percent}\\
\hline \\ \hline \\[-1.8ex]
\multicolumn{8}{l}{\ptmemk{} cells} \\ \multicolumn{8}{l}{\ptmemk{} cells} \\
\\[-1.8ex]
PP+N4 & \SI{93}{\percent} & \SI{4.7}{\percent} & \SI{44.4}{\percent} & \SI{9.2}{\percent} & \SI{1.2}{\percent} & \SI{65.1}{\percent} & \SI{9.1}{\percent}\\ PP+N4 & \SI{93}{\percent} & \SI{4.7}{\percent} & \SI{44.4}{\percent} & \SI{9.2}{\percent} & \SI{1.2}{\percent} & \SI{65.1}{\percent} & \SI{9.1}{\percent}\\
PP+N6 & \SI{86}{\percent} & \SI{2.0}{\percent} & \SI{29.9}{\percent} & \SI{15.8}{\percent} & \SI{28.5}{\percent} & \SI{63.3}{\percent} & \SI{30.6}{\percent}\\ PP+N6 & \SI{86}{\percent} & \SI{2.0}{\percent} & \SI{29.9}{\percent} & \SI{15.8}{\percent} & \SI{28.5}{\percent} & \SI{63.3}{\percent} & \SI{30.6}{\percent}\\
PP+S6 & \SI{93}{\percent} & \SI{7.8}{\percent} & \SI{28.0}{\percent} & \SI{15.1}{\percent} & \SI{76.2}{\percent} & \SI{98.4}{\percent} & \SI{49.8}{\percent}\\ PP+S6 & \SI{93}{\percent} & \SI{7.8}{\percent} & \SI{28.0}{\percent} & \SI{15.1}{\percent} & \SI{76.2}{\percent} & \SI{98.4}{\percent} & \SI{49.8}{\percent}\\

View File

@ -2803,6 +2803,19 @@ CONCLUSIONS: We developed a simplified, semi-closed system for the initial selec
isbn = {046509760X}, isbn = {046509760X},
} }
@Article{Holmes2006,
author = {E. Holmes and O. Cloarec and J. K. Nicholson},
journal = {Journal of Proteome Research},
title = {Probing Latent Biomarker Signatures and in Vivo Pathway Activity in Experimental Disease States via Statistical Total Correlation Spectroscopy ({STOCSY}) of Biofluids: Application to {HgCl}$_2$ Toxicity},
year = {2006},
month = {jun},
number = {6},
pages = {1313--1320},
volume = {5},
doi = {10.1021/pr050399w},
publisher = {American Chemical Society ({ACS})},
}
@Comment{jabref-meta: databaseType:bibtex;} @Comment{jabref-meta: databaseType:bibtex;}
@Comment{jabref-meta: grouping: @Comment{jabref-meta: grouping:

View File

@ -21,6 +21,7 @@
\usepackage{listings} \usepackage{listings}
\usepackage{tocloft} \usepackage{tocloft}
\usepackage{epigraph} \usepackage{epigraph}
\usepackage{threeparttable}
\hypersetup{ \hypersetup{
colorlinks=true, colorlinks=true,
@ -2876,8 +2877,8 @@ The purpose of this sub-aim was to develop computational methods to identify
novel \glspl{cqa} and \glspl{cpp} that could be used for release criteria, novel \glspl{cqa} and \glspl{cpp} that could be used for release criteria,
process control, and process optimization for the \gls{dms} platform. We process control, and process optimization for the \gls{dms} platform. We
hypothesized that T cells grown using the \gls{dms} system would produce hypothesized that T cells grown using the \gls{dms} system would produce
detectable biological signatures in the media supernatant which corresponded to detectable biological signatures in the media supernatant which would correspond
clinically relevant responses such as fold expansion or phenotype. We tested to clinically relevant responses such as fold expansion or phenotype. We tested
this hypothesis by activating T cells under a variety of conditions using a this hypothesis by activating T cells under a variety of conditions using a
\gls{doe}, sampling the media at intermediate timepoints, and creating models to \gls{doe}, sampling the media at intermediate timepoints, and creating models to
predict the outcome of the cultures. We should stress that the specific predict the outcome of the cultures. We should stress that the specific
@ -2921,13 +2922,12 @@ progressed. Data from inputs and/or longitudinal samples were used to predict
the endpoint response. The fusion of cytokine and \gls{nmr} profiles from media the endpoint response. The fusion of cytokine and \gls{nmr} profiles from media
to model these responses included 30 cytokines from a custom Thermo Fisher to model these responses included 30 cytokines from a custom Thermo Fisher
ProcartaPlex Luminex kit and 20 \gls{nmr} features. These 20 spectral features ProcartaPlex Luminex kit and 20 \gls{nmr} features. These 20 spectral features
from \gls{nmr} media analysis were selected out of approximately 250 peaks from \gls{nmr} media analysis were selected out of approximately 250 peaks using
through the implementation of a variance-based feature selection approach and a variance-based feature selection approach and some manual inspection steps.
some manual inspection steps.
The first \gls{doe} resulted in a randomized 18-run I-optimal custom design The first \gls{doe} resulted in a randomized 18-run I-optimal custom design
where each \gls{dms} parameter was evaluated at three levels: \pilII{} (10, 20, where each \gls{dms} parameter was evaluated at three levels: \pilII{} (10, 20,
and 30 U/uL), \pdms{} (500, 1500, 2500 \si{\dms\per\ul}), and \pmab{} (60, 80, and 30 U/uL), \pdms{} (500, 1500, 2500 \si{\dms\per\ml}), and \pmab{} (60, 80,
100 \si{\percent}). These 18 runs consisted of 14 unique parameter combinations 100 \si{\percent}). These 18 runs consisted of 14 unique parameter combinations
where 4 of them were replicated twice to assess prediction error. To further where 4 of them were replicated twice to assess prediction error. To further
optimize the initial region explored, an \gls{adoe} was designed with 10 unique optimize the initial region explored, an \gls{adoe} was designed with 10 unique
@ -2972,11 +2972,11 @@ Cytokines were quantified via Luminex as described in
Prior to analysis, samples were centrifuged at \SI{2990}{\gforce} for Prior to analysis, samples were centrifuged at \SI{2990}{\gforce} for
\SI{20}{\minute} at \SI{4}{\degreeCelsius} to clear any debris\footnote{all \SI{20}{\minute} at \SI{4}{\degreeCelsius} to clear any debris\footnote{all
\gls{nmr} analysis was done by our collaborators Max Colonna and Art Edison at \gls{nmr} analysis was done by our collaborators Max Colonna and Art Edison at
the University of Georgia; methods included here for reference}. \SI{5}{\ul} of the University of Georgia; methods included here for reference}. \SI{5}{\ul}
100/3 \si{\mM} DSS-D6 in deuterium oxide (Cambridge Isotope Laboratories) were of 100/3 \si{\mM} DSS-D6 in deuterium oxide (Cambridge Isotope Laboratories)
added to \SI{1.7}{\mm} \gls{nmr} tubes (Bruker BioSpin), followed by were added to \SI{1.7}{\mm} \gls{nmr} tubes (Bruker BioSpin), followed by
\SI{45}{\ul} of media from each sample that was added and mixed, for a final \SI{45}{\ul} of media from each sample that was added and mixed, for a final
volume of \SI{50}{\ul} in each tube. Samples were prepared on ice and in volume of \SI{50}{\ul} in each tube. Samples were prepared on ice in
predetermined, randomized order. The remaining volume from each sample in the predetermined, randomized order. The remaining volume from each sample in the
rack (approx. \SI{4}{\ul}) was combined to create an internal pool. This rack (approx. \SI{4}{\ul}) was combined to create an internal pool. This
material was used for internal controls within each rack as well as metabolite material was used for internal controls within each rack as well as metabolite
@ -3010,15 +3010,15 @@ Two-dimensional spectra collected on pooled samples were uploaded to COLMARm web
server, where \gls{hsqc} peaks were automatically matched to database peaks. server, where \gls{hsqc} peaks were automatically matched to database peaks.
\gls{hsqc} matches were manually reviewed with additional 2D and proton spectra \gls{hsqc} matches were manually reviewed with additional 2D and proton spectra
to confirm the match. Annotations were assigned a confidence score based upon to confirm the match. Annotations were assigned a confidence score based upon
the levels of spectral data supporting the match as previously spectral data levels supporting the match as previously
described\cite{Dashti2017}. Annotated metabolites were matched to previously described\cite{Dashti2017}. Annotated metabolites were matched to previously
selected features used for statistical analysis. selected features used for statistical analysis.
Several low abundance features selected for analysis did not have database Several low abundance features selected for analysis did not have database
matches and were not annotated. Statistical total correlation spectroscopy41 matches and were not annotated. Statistical total correlation
suggested that some of these unknown features belonged to the same molecules spectroscopy\cite{Holmes2006} suggested that some of these unknown features
(not shown). Additional multidimensional \gls{nmr} experiments will be required belonged to the same molecules (not shown). Additional multidimensional
to determine their identity. \gls{nmr} experiments will be required to determine their identity.
\subsection{Machine Learning and Statistical Analysis} \subsection{Machine Learning and Statistical Analysis}
@ -3026,26 +3026,24 @@ Linear regression analysis of the \glspl{doe} was performed as described in
\cref{sec:statistics}. \cref{sec:statistics}.
Seven \gls{ml} techniques were implemented to predict three responses related to Seven \gls{ml} techniques were implemented to predict three responses related to
the memory phenotype of the cultured T cells under different process the memory phenotype of the cultured T cells under different process conditions
conditions (\rmemh{}, \rmemk{}, and \rratio{}). The \gls{ml} methods (\rmemh{}, \rmemk{}, and \rratio{}). The \gls{ml} methods executed were
executed were \gls{rf}, \gls{gbm}, \gls{cif}, \gls{lasso}, \gls{plsr}, \gls{rf}, \gls{gbm}, \gls{cif}, \gls{lasso}, \gls{plsr}, \gls{svm}, and
\gls{svm}, and DataModelers \gls{sr}\footnote{\gls{sr} was performed by Theresa DataModelers \gls{sr}\footnote{\gls{sr} was performed by Theresa Kotanchek at
Kotanchek at Evolved Analytics, \gls{rf}, \gls{gbm}, \gls{cif}, \gls{plsr}, Evolved Analytics, \gls{rf}, \gls{gbm}, \gls{cif}, \gls{plsr}, \gls{svm} were
\gls{svm} were performed by Valerie Odeh-Couvertier at UPRM. Methods included performed by Valerie Odeh-Couvertier at UPRM. Methods included here for
here for reference}. Primarily, \gls{sr} models were used to optimize process reference}. Primarily, \gls{sr} models were used to optimize process parameter
parameter values based on \ptmem{} phenotype and to extract early predictive values based on \ptmem{} phenotype and to extract early predictive variable
variable combinations from the multi-omics experiments. Furthermore, all combinations from the multi-omics experiments. Furthermore, high-performing
regression methods were executed, and the high-performing models were used to models from each method were used in consensus analysis to extract potential
perform a consensus analysis of the important variables to extract potential \glspl{cqa} and \glspl{cpp} predictive of T cell potency, safety, and
critical quality attributes and critical process parameters predictive of T cell consistency at the early stages of the manufacturing process.
potency, safety, and consistency at the early stages of the manufacturing
process.
\gls{sr} was done using Evolved Analytics DataModeler software (Evolved \gls{sr} was done using Evolved Analytics DataModeler software (Evolved
Analytics LLC, Midland, MI). DataModeler utilizes genetic programming to evolve Analytics LLC, Midland, MI). DataModeler utilizes genetic programming to evolve
symbolic regression models (both linear and non-linear) rewarding simplicity and symbolic regression models (both linear and non-linear) rewarding simplicity and
accuracy. Using the selection criteria of highest accuracy accuracy. Using the selection criteria of highest accuracy
($R^2$>\SI{90}{\percent}) and lowest complexity, the top-performing models were ($R^2>\SI{90}{\percent}$) and lowest complexity, the top-performing models were
identified. Driving variables, variable combinations, and model dimensionality identified. Driving variables, variable combinations, and model dimensionality
tables were generated. The top-performing variable combinations were used to tables were generated. The top-performing variable combinations were used to
generate model ensembles. In this analysis, DataModelers generate model ensembles. In this analysis, DataModelers
@ -3073,7 +3071,7 @@ values, potential optima in the responses, and regions of parameter values where
the predictions diverge the most. the predictions diverge the most.
Non-parametric tree-based ensembles were done through the Non-parametric tree-based ensembles were done through the
\inlinecode{randomForest}, inlinecode{gbm}, and \inlinecode{cforest} regression \inlinecode{randomForest}, \inlinecode{gbm}, and \inlinecode{cforest} regression
functions in R, for \gls{rf}, \gls{gbm}, and \gls{cif} models, respectively. functions in R, for \gls{rf}, \gls{gbm}, and \gls{cif} models, respectively.
Both \gls{rf} and \gls{cif} construct multiple decision trees in parallel, by Both \gls{rf} and \gls{cif} construct multiple decision trees in parallel, by
randomly choosing a subset of features at each decision tree split, in the randomly choosing a subset of features at each decision tree split, in the
@ -3117,8 +3115,8 @@ model with \gls{loocv} tuned parameters.
Consensus analysis of the relevant variables extracted from each machine Consensus analysis of the relevant variables extracted from each machine
learning model was done to identify consistent predictive features of quality at learning model was done to identify consistent predictive features of quality at
the early stages of manufacturing. First importance scores for all features were the early stages of manufacturing. First, importance scores for all features
measured across all \gls{ml} models using \inlinecode{varImp} with were measured across all \gls{ml} models using \inlinecode{varImp} with
\inlinecode{caret} R package, except for \gls{svm} scores, for which the \inlinecode{caret} R package, except for \gls{svm} scores, for which the
\inlinecode{rminer} R package was used. These importance scores were percent \inlinecode{rminer} R package was used. These importance scores were percent
increase in \gls{mse}, relative importance through average increase in increase in \gls{mse}, relative importance through average increase in
@ -3130,26 +3128,25 @@ respectively. Using these scores, key predictive variables were selected if
their importance scores were within the \nth{80} percentile ranking for the their importance scores were within the \nth{80} percentile ranking for the
following \gls{ml} methods: \gls{rf}, \gls{gbm}, \gls{cif}, \gls{lasso}, following \gls{ml} methods: \gls{rf}, \gls{gbm}, \gls{cif}, \gls{lasso},
\gls{plsr}, \gls{svm} while for \gls{sr} variables present in >\SI{30}{\percent} \gls{plsr}, \gls{svm} while for \gls{sr} variables present in >\SI{30}{\percent}
of the top-performing \gls{sr} models from DataModeler ($R^2\ge$ of the top-performing \gls{sr} models from DataModeler
\SI{90}{\percent}, Complexity $\ge$ 100) were chosen to investigate consensus ($R^2\ge \SI{90}{\percent}$, Complexity $\ge 100$) were chosen to investigate
except for \gls{nmr} media models at day 4 which considered a combination of the consensus except for \gls{nmr} media models at day 4 which considered a
top-performing results of models excluding lactate ppms, and included those combination of the top-performing results of models excluding lactate ppms, and
variables which were in >\SI{40}{\percent} of the best performing models. Only included those variables which were in >\SI{40}{\percent} of the best performing
variables with those high percentile scoring values were evaluated in terms of models. Only variables with high percentile scoring values were evaluated in
their logical relation (intersection across \gls{ml} models) and depicted using terms of their logical relation (intersection across \gls{ml} models) and
a Venn diagram from the \inlinecode{venn} R package. depicted using a Venn diagram from the \inlinecode{venn} R package.
\section{Results} \section{Results}
\subsection{DMSs Grow T Cells With Lower IL2 Concentrations} \subsection{DMSs Grow T Cells With Lower IL2 Concentrations}
Prior to the main experiments in this aim, we performed a preliminary experiment Prior to the main experiments in this aim, we assessed the effect of lowering
to assess the effect of lowering the \gls{il2} concentration on the T cells the \gls{il2} concentration on the T cells grown with either bead or \gls{dms}.
grown with either bead or \gls{dms}. One of the hypotheses for the \gls{dms} One of our hypotheses for the \gls{dms} system was that higher cell density
system was that the higher cell density would enable more efficient cross-talk would enhance cross-talk between T cells. Since \gls{il2} is secreted by
between T cells. Since \gls{il2} is secreted by activated T cells themselves, activated T cells themselves, T cells in the \gls{dms} system may need less or
T cells in the \gls{dms} system may need less or no \gls{il2} if this hypothesis no \gls{il2} if this is true.
were true.
\begin{figure*}[ht!] \begin{figure*}[ht!]
\begingroup \begingroup
@ -3164,7 +3161,7 @@ were true.
\caption[T Cells Grown at Varying IL2 Concentrations] \caption[T Cells Grown at Varying IL2 Concentrations]
{\glspl{dms} grow T cells effectively at lower IL2 concentrations. {\glspl{dms} grow T cells effectively at lower IL2 concentrations.
\subcap{fig:il2_mod_timecourse}{Longitudinal cell counts of T cells grown \subcap{fig:il2_mod_timecourse}{Longitudinal cell counts of T cells grown
with either bead or \glspl{dms} using varying IL2 concentrations} with either bead or \glspl{dms} using varying IL2 concentrations.}
Day 14 counts of either \subcap{fig:il2_mod_total}{total cells} or Day 14 counts of either \subcap{fig:il2_mod_total}{total cells} or
\subcap{fig:il2_mod_mem}{\ptmem{} cells} plotted against \gls{il2} \subcap{fig:il2_mod_mem}{\ptmem{} cells} plotted against \gls{il2}
concentration. concentration.
@ -3179,14 +3176,9 @@ expanded T cells as described in \cref{sec:tcellculture}. T cells grown with
either method expanded robustly as \gls{il2} concentration was increased either method expanded robustly as \gls{il2} concentration was increased
(\cref{fig:il2_mod_timecourse}). Surprisingly, neither the bead nor the \gls{dms} (\cref{fig:il2_mod_timecourse}). Surprisingly, neither the bead nor the \gls{dms}
group expanded at all with \SI{0}{\IU\per\ml} \gls{il2}. When examining the group expanded at all with \SI{0}{\IU\per\ml} \gls{il2}. When examining the
endpoint fold change after \SI{14}{\day}, we observe that the difference between endpoint fold change after \SI{14}{\day}, we observed that the difference
the bead and \gls{dms} appears to be greater at lower \gls{il2} concentrations between the bead and \gls{dms} appears to be greater at lower \gls{il2}
(\cref{fig:il2_mod_total}). concentrations (\cref{fig:il2_mod_total}). Furthermore, the same trend can be
% This is further supported by fitting a non-linear
% least squares equation to the data following a hyperbolic curve (which should be
% a plausible model given that this curve describes receptor-ligand kinetics,
% which we can assume \gls{il2} to follow).
Furthermore, the same trend can be
seen when only examining the \ptmem{} cell expansion at day 14 seen when only examining the \ptmem{} cell expansion at day 14
(\cref{fig:il2_mod_mem}). In this case, the \ptmemp{} of the T cells seemed to (\cref{fig:il2_mod_mem}). In this case, the \ptmemp{} of the T cells seemed to
be relatively close at higher \gls{il2} concentrations, but separated further at be relatively close at higher \gls{il2} concentrations, but separated further at
@ -3196,16 +3188,24 @@ Taken together, these data do not support the hypothesis that the \gls{dms}
system does not need \gls{il2} at all; however, it appears to have a modest system does not need \gls{il2} at all; however, it appears to have a modest
advantage at lower \gls{il2} concentrations compared to beads. For this reason, advantage at lower \gls{il2} concentrations compared to beads. For this reason,
we decided to investigate the lower range of \gls{il2} concentrations starting we decided to investigate the lower range of \gls{il2} concentrations starting
at \SI{10}{\IU\per\ml} throughout the remainder of this aim. at \SI{10}{\IU\per\ml} in the remainder of this aim.
\subsection{DOE Shows Optimal Conditions for Potent T Cells} \subsection{DOE Shows Optimal Conditions for Potent T Cells}
% TABLE not all of these were actually used, explain why by either adding columns \begin{table}[!h]
% or marking with an asterisk \centering
\begin{table}[!h] \centering \begin{threeparttable}
\caption{DOE Runs} \caption{DOE Runs}
\label{tab:doe_runs} \label{tab:doe_runs}
\input{../tables/doe_runs.tex} \input{../tables/doe_runs.tex}
\begin{tablenotes}
\item[a] It was determined later that the total \glspl{mab} surface density
may not be consistent across each batch of \gls{dms} used. Thus, these
runs were taken out as they were created at a different scale and with a
different operator compared to the rest. Leaving them in may produce
unobserved confounding factors.
\end{tablenotes}
\end{threeparttable}
\end{table} \end{table}
\begin{figure*}[ht!] \begin{figure*}[ht!]
@ -3224,38 +3224,29 @@ at \SI{10}{\IU\per\ml} throughout the remainder of this aim.
\label{fig:doe_response_first} \label{fig:doe_response_first}
\end{figure*} \end{figure*}
% RESULT maybe add regression tables to this, although it doesn't really matter
% since we end up doing regression on the full thing later anyways.
We conducted two consecutive \glspl{doe} to optimize the \pth{} and \ptmem{} We conducted two consecutive \glspl{doe} to optimize the \pth{} and \ptmem{}
responses for the \gls{dms} system. In the first \gls{doe} we, tested \pilII{} in responses for the \gls{dms} system. In the first, we tested \pilII{} in the
the range of \SIrange{10}{30}{\IU\per\ml}, \pdms{} in the range of range of \SIrange{10}{30}{\IU\per\ml}, \pdms{} in the range of
\SIrange{500}{2500}{\dms\per\ml}, and \pmab{} in the range of \SIrange{500}{2500}{\dms\per\ml}, and \pmab{} in the range of
\SIrange{60}{100}{\percent}. When looking at the total \ptmemp{} output, we \SIrange{60}{100}{\percent}. When looking at total \ptmemp{} cells, \pilII{}
observed that \pilII{} showed a positive linear trend with the \pdms{} and showed a positive linear trend and \pdms{} and \pmab{} showed possible
\pmab{} showing possible second-order effects with maximums and minimums at the second-order effects with intermediate maximums and minimums respectively
intermediate level (\cref{fig:doe_response_first_mem}). In the case of \pth{}, (\cref{fig:doe_response_first_mem}). In the case of \pth{}, all parameters
we observed that all parameters seemed to have a positive linear response, with showed a positive trend, suggesting a maximum might exist at a higher value for each.
\pilII{} and \pdms{} showing slight second order effects that suggest a maximum
might exist at a higher value for each.
After performing the first \gls{doe} we augmented the original design matrix After performing the first \gls{doe}, we augmented the original design matrix
with an \gls{adoe} which was built with three goals in mind. Firstly, we wished with an \gls{adoe} which was built with three goals in mind. Firstly, we wished
to validate the first \gls{doe} by assessing the strength and responses of each to validate the first \gls{doe} by assessing the strength and responses of each
effect. Secondly, we wished to improve our confidence in regions that showed effect. Secondly, we wished to improve our confidence in regions that showed
high complexity, such as the peak in the \gls{dms} concentration for the total high complexity, such as the peak in the \gls{dms} concentration for the total
\ptmem{} cell response. Thirdly, we wished to explore additional ranges of each \ptmem{} cell response. Thirdly, we wished to explore additional ranges of each
response. Since \pilII{} and \pdms{} appeared to continue positively influence response. Notably, \pilII{} appeared to increase beyond our tested range, thus
multiple responses beyond our tested range, we were curious if there was an we were curious if there was an optimum at some higher setting. For this reason,
optimum at some higher setting of either of these values. For this reason, we we increased the \pilII{} to include \SI{40}{\IU\per\ml} and the \pdms{} to
increased the \pilII{} to include \SI{40}{\IU\per\ml} and the \pdms{} to
\SI{3500}{\dms\per\ml}. Note that it was impossible to go beyond \SI{3500}{\dms\per\ml}. Note that it was impossible to go beyond
\SI{100}{\percent} for the \pmab{}, so runs were positioned for this parameter \SI{100}{\percent} for the \pmab{}, so runs were positioned for this parameter
with validation and confidence improvements in mind. The runs for each \gls{doe} with validation and confidence improvements in mind. The runs for each \gls{doe}
were shown in \cref{tab:doe_runs}\footnote{Not all runs in this table were used. were shown in \cref{tab:doe_runs}.
It was determined later that the total \glspl{mab} surface density may not be
consistent across each batch of \gls{dms} used, primarily due to the fact that a
subset were created at different scale and with a different operator. To remove
this bias in our data, these runs were not used.}.
\begin{figure*}[ht!] \begin{figure*}[ht!]
\begingroup \begingroup
@ -3329,10 +3320,10 @@ responses showed mostly linear relationships in all parameter cases
% anything to be significant % anything to be significant
We performed linear regression on the three input parameters as well as a binary We performed linear regression on the three input parameters as well as a binary
parameter representing if a given run came from the first or second \gls{doe} parameter representing if a given run came from the first or second \gls{doe}
(called `dataset'). Starting with the total \ptmem{} cells response, we fit a (called ``dataset''). Starting with the total \ptmem{} cells response, we fit a
first order regression model using these four parameters first order regression model using these four parameters
(\cref{tab:doe_mem1.tex}). While \pilII{} was found to be a significant (\cref{tab:doe_mem1.tex}). While \pilII{} was found to be a significant
predictor, the model fit was extremely poor ($R^2$ of 0.331). This was not predictor, the model fit was extremely poor ($R^2 = 0.331$). This was not
surprising given the apparent complexity of this response surprising given the apparent complexity of this response
(\cref{fig:doe_responses_mem}). To obtain a better fit, we added second and (\cref{fig:doe_responses_mem}). To obtain a better fit, we added second and
third degree terms (\cref{tab:doe_mem2.tex}). Note that the dataset parameter third degree terms (\cref{tab:doe_mem2.tex}). Note that the dataset parameter
@ -3350,9 +3341,8 @@ that our data might be underpowered for a model this complex. Further
experiments beyond what was performed here may be needed to fully describe this experiments beyond what was performed here may be needed to fully describe this
response. response.
% TABLE combine these tables into one
We performed linear regression on the other three responses, all of which We performed linear regression on the other three responses, all of which
performed much better than the \ptmem{} response as expected given the much performed much better than the \ptmem{} response as expected given the
lower apparent complexity in the response plots lower apparent complexity in the response plots
(\cref{fig:doe_responses_cd4,fig:doe_responses_mem4,fig:doe_responses_ratio}). (\cref{fig:doe_responses_cd4,fig:doe_responses_mem4,fig:doe_responses_ratio}).
All these models appeared to fit well, with $R^2$ and $R_{adj}^2$ upward of All these models appeared to fit well, with $R^2$ and $R_{adj}^2$ upward of
@ -3380,11 +3370,10 @@ significant predictors.
We then visualized the total \ptmemh{} cells and \rratio{} using the response We then visualized the total \ptmemh{} cells and \rratio{} using the response
explorer in DataModeler to create contour plots around the maximum responses. explorer in DataModeler to create contour plots around the maximum responses.
For both, it appeared that maximizing all three input parameters resulted in the For both, maximizing all input parameters maximized both responses
maximum value for either response (\cref{fig:doe_sr_contour}). While not all (\cref{fig:doe_sr_contour}). While not all combinations at and around this
combinations at and around this optimum were tested, the model nonetheless optimum were tested, these plots suggest that there were no other optimal values
showed that there were no other optimal values or regions elsewhere in the elsewhere.
model.
\subsection{Modeling with Machine Learning Reveals Putative CQAs} \subsection{Modeling with Machine Learning Reveals Putative CQAs}
@ -3407,16 +3396,15 @@ features of quality early in their expansion process.
\label{fig:doe_luminex} \label{fig:doe_luminex}
\end{figure*} \end{figure*}
We collected secretome data via luminex for days 4, 6, 8, 11, and 14. We collected secretome data via luminex for days 4, 6, 8, 11, and 14. Plotting
Plotting the concentrations of these cytokines showed a large variation over all the concentrations of these cytokines showed a large variation over all runs and
runs and between different timepoints, demonstrated that these could potentially between different timepoints, demonstrating that these could be used to
be used to differentiate between different process conditions qualitatively differentiate between different process conditions qualitatively simply based on
simply based on variance (\cref{fig:doe_luminex}). These were also much higher variance (\cref{fig:doe_luminex}). These were also much higher in most cases
in most cases than a set of bead based runs which were run in parallel, in than a set of bead based runs which were run in parallel, in agreement with the
agreement with the luminex data obtained previously in the Grex system (these luminex data obtained previously in the Grex system (these data were collected
data were collected in plates) (\cref{fig:grex_luminex}). in plates) (\cref{fig:grex_luminex}).
% TABLE this table looks like crap, break it up into smaller tables
\begin{table}[!h] \centering \begin{table}[!h] \centering
\caption[Machine Learning Model Results] \caption[Machine Learning Model Results]
{Results for \gls{ml} modeling using process parameters (PP) with {Results for \gls{ml} modeling using process parameters (PP) with
@ -3428,15 +3416,15 @@ data were collected in plates) (\cref{fig:grex_luminex}).
\end{table} \end{table}
\gls{sr} models achieved the highest predictive performance \gls{sr} models achieved the highest predictive performance
($R^2$>\SI{93}{\percent}) when using multi-omics predictors for all endpoint ($R^2>\SI{93}{\percent}$) when using multi-omics predictors for all endpoint
responses (\cref{tab:mod_results}). \gls{sr} achieved $R^2$>\SI{98}{\percent} responses (\cref{tab:mod_results}). \gls{sr} achieved $R^2>\SI{98}{\percent}$
while \gls{gbm} ensembles showed \gls{loocv} $R^2$ > \SI{95}{\percent} for while \gls{gbm} ensembles showed \gls{loocv} $R^2>\SI{95}{\percent}$ for
\rmemh{} and \rmemk{} responses. Similarly, \gls{lasso}, \gls{plsr}, and \rmemh{} and \rmemk{} responses. Similarly, \gls{lasso}, \gls{plsr}, and
\gls{svm} methods showed consistently high \gls{loocv} (\SI{92.9}{\percent}, \gls{svm} methods showed consistently high \gls{loocv} (\SI{92.9}{\percent},
\SI{99.7}{\percent}, and \SI{90.5}{\percent} respectively), to predict the \SI{99.7}{\percent}, and \SI{90.5}{\percent} respectively), to predict the
\rratio{}. Yet, about \SI{10}{\percent} reduction in \gls{loocv}, \rratio{}. Yet, about \SI{10}{\percent} reduction in \gls{loocv},
\SIrange{72.5}{81.7}{\percent}, was observed for \rmemh{} with these three \SIrange{72.5}{81.7}{\percent}, was observed for \rmemh{} with these three
methods. Lastly, \gls{sr} and \gls{plsr} achieved $R^2$>\SI{90}{\percent} while methods. Lastly, \gls{sr} and \gls{plsr} achieved $R^2>\SI{90}{\percent}$ while
other \gls{ml} methods exhibited exceedingly variable \gls{loocv} other \gls{ml} methods exhibited exceedingly variable \gls{loocv}
(\SI{0.3}{\percent} for \gls{rf} to \SI{51.5}{\percent} for \gls{lasso}) for (\SI{0.3}{\percent} for \gls{rf} to \SI{51.5}{\percent} for \gls{lasso}) for
\rmemk{}. \rmemk{}.
@ -3485,18 +3473,13 @@ methods for predicting \rratio{} when considering features with the highest
importance scores across models (\cref{fig:mod_flower_48r}). Other features, importance scores across models (\cref{fig:mod_flower_48r}). Other features,
IL2R, IL4, IL17a, and \pdms{}, were commonly selected in $\ge$ 5 \gls{ml} IL2R, IL4, IL17a, and \pdms{}, were commonly selected in $\ge$ 5 \gls{ml}
methods (\cref{fig:mod_flower_48r}). When restricting the models only to include methods (\cref{fig:mod_flower_48r}). When restricting the models only to include
metabolome, formate emerged as the dominant predictor shared across all seven metabolome, formate was the sole predictor shared by all.
models.
% Moreover, IL13 and IL15 were found predictive in combination When performing similar analysis on \rmemh{}, no species for either secretome or
% with these using \gls{sr} (Supp.Table.S4). metabolome was shared by all models (\cref{fig:mod_flower_cd4}). These models
also had worse fits compared to those for \rratio{} (\cref{tab:mod_results}).
When performing similar analysis on \rmemh{}, we observe that no species for For the secretome, IL4, IL17a, and IL2R were agreed upon by $\ge$ 5 models. For
either the secretome or metabolome was agreed upon by all seven models the metabolome, formate once again was shared by $\ge$ 5 models as well as
(\cref{fig:mod_flower_cd4}). We also observed that these models did not fit as
well as they did for \rratio{} (\cref{tab:mod_results}). For the secretome, the
species that were agreed upon by $\ge$ 5 models were IL4, IL17a, and IL2R. For
the metabolome, formate once again was agreed upon by $\ge$ 5 models as well as
lactate. lactate.
\begin{figure*}[ht!] \begin{figure*}[ht!]
@ -3523,12 +3506,12 @@ lactate.
\label{fig:nmr_cors} \label{fig:nmr_cors}
\end{figure*} \end{figure*}
We also investigated the \gls{nmr} features extracted from day of expansion to We also asked if day 4 \gls{nmr} features could predict \ptmemh{}; these models
assess if there was any predictive power for \ptmemh{}; in general these models generally fit well despite being 2 days earlier in the process
had almost as good of fit despite being 2 days earlier in the process (\cref{fig:nmr_cors})\footnote{for anyone wondering why we don't have the
(\cref{fig:nmr_cors}). Lactate and formate were observed to correlate with each matching secretome data for day 4, blame UPS for losing our samples}. Lactate
other, and both correlated with \rmemh{}. Furthermore, lactate was observed to and formate correlated with each other and with \rmemh{}. Furthermore, lactate
positively correlate with \pdms{} and negatively correlate with glucose positively correlated with \pdms{} and negatively correlated with glucose
(\cref{fig:nmr_cors_lactate}). Formate also had the same correlation patterns (\cref{fig:nmr_cors_lactate}). Formate also had the same correlation patterns
(\cref{fig:nmr_cors_formate}). Glucose was only negatively correlated with (\cref{fig:nmr_cors_formate}). Glucose was only negatively correlated with
formate and lactate (\cref{fig:nmr_cors_glucose}). Together, these data suggest formate and lactate (\cref{fig:nmr_cors_glucose}). Together, these data suggest
@ -3537,38 +3520,34 @@ that lactate, formate, \pdms{}, and \rmemh{} are fundamentally linked.
\section{Discussion} \section{Discussion}
\gls{cpp} modeling and understanding are critical to new product development and \gls{cpp} modeling and understanding are critical to new product development and
in cell therapy development, it can have life-saving implications. The have life-saving implications in the context of cell therapy. The challenges for
challenges for effective modeling grow with the increasing complexity of effective modeling grow with the increasing process complexity due to high
processes due to high dimensionality, and the potential for process interactions dimensionality, interactions between parameters, and nonlinearity. Another critical
and nonlinear relationships. Another critical challenge is the limited amount of challenge is the limited amount of available data. \gls{sr} has the necessary
available data, mostly small \gls{doe} datasets. \gls{sr} has the necessary
capabilities to resolve the issues of process effects modeling and has been capabilities to resolve the issues of process effects modeling and has been
applied across multiple industries\cite{Kordona}. \gls{sr} discovers applied across multiple industries\cite{Kordona}. \gls{sr} discovers
mathematical expressions that fit a given sample and differs from conventional mathematical expressions that fit a given sample and differs from conventional
regression techniques in that a model structure is not defined \textit{a regression techniques in that a model structure is not defined \textit{a
priori}\cite{Koza1992}. Hence, a key advantage of this methodology is that priori}\cite{Koza1992}. Hence, a key advantage of this methodology is that
transparent, human-interpretable models can be generated from small and large transparent, human-interpretable models can be generated from small and large
datasets with no prior assumptions\cite{Kotancheka}. datasets with few prior assumptions\cite{Kotancheka}.
Since the model search process lets the data determine the model, diverse and Since the model search process lets the data determine the model, diverse and
competitive model structures are typically discovered. An ensemble of diverse competitive model structures are typically discovered. A diverse ensemble will
models can be formed where its constituent models will tend to agree when contain models that agree in regions constrained by observable data and diverge
constrained by observed data yet diverge in new regions. Collecting data in in regions without data. Collecting data in divergent regions ensures the system
these regions helps to ensure that the target system is accurately modeled, and is accurately modeled and its optimum accurately located\cite{Kotancheka}.
its optimum is accurately located\cite{Kotancheka}. Exploiting these features Consequently, this \gls{adoe} approach is useful in many scenarios, including
allows adaptive data collection and interactive modeling. Consequently, this maximizing model validity for model-based decision making, optimizing processing
\gls{adoe} approach is useful in a variety of scenarios, including maximizing parameters to maximize yield, and developing emulators for online optimization
model validity for model-based decision making, optimizing processing parameters and human understanding\cite{Kotancheka}.
to maximize target yields, and developing emulators for online optimization and
human understanding\cite{Kotancheka}.
An in-depth characterization of potential \gls{dms} based T cell \glspl{cqa} An in-depth characterization of potential \gls{dms} based T cell \glspl{cqa}
includes a list of cytokine and \gls{nmr} features from media samples that are includes a list of cytokine and \gls{nmr} features from media samples that are
crucial in many aspects of T cell fate decisions and effector functions of crucial in many aspects of T cell fate decisions and effector functions of
immune cells. Cytokine features were observed to slightly improve prediction and immune cells. Cytokine features slightly improved prediction and dominated the
dominated the ranking of important features and variable combinations when ranking of important features and variable combinations when modeling together
modeling together with \gls{nmr} media analysis and process parameters with \gls{nmr} media analysis and process parameters (\cref{fig:mod_flower}).
(\cref{fig:mod_flower}).
Predictive cytokine features such as \gls{tnfa}, IL2R, IL4, IL17a, IL13, and Predictive cytokine features such as \gls{tnfa}, IL2R, IL4, IL17a, IL13, and
IL15 were biologically assessed in terms of their known functions and activities IL15 were biologically assessed in terms of their known functions and activities
@ -3577,35 +3556,35 @@ cells, as per their main functions, and activated T cells secrete more cytokines
than resting T cells. It is possible that some cytokines simply reflect the than resting T cells. It is possible that some cytokines simply reflect the
\rratio{} and the activation degree by proxy proliferation. However, the exact \rratio{} and the activation degree by proxy proliferation. However, the exact
ratio of expected cytokine abundance is less clear and depends on the subtypes ratio of expected cytokine abundance is less clear and depends on the subtypes
present, and thus examination of each relevant cytokine is needed. present, thus examination of each relevant cytokine is needed.
IL2R is secreted by activated T cells and binds to IL2, acting as a sink to IL2R is secreted by activated T cells and binds to IL2, acting as a sink to
dampen its effect on T cells\cite{Witkowska2005}. Since IL2R was much greater dampen its effect on T cells\cite{Witkowska2005}. Since IL2R was more abundant
than IL2 in solution, this might reduce the overall effect of IL2, which could than IL2 in solution, this might reduce the overall effect of IL2, which could
be further investigated by blocking IL2R with an antibody. In T cells, TNF can be further investigated by blocking IL2R with an antibody. In T cells, TNF can
increase IL2R, proliferation, and cytokine production\cite{Mehta2018}. It may increase IL2R, proliferation, and cytokine production\cite{Mehta2018}. It may
also induce apoptosis depending on concentration and alter the CD4+ to CD8+ also induce apoptosis depending on concentration and alter the CD4:CD8
ratio\cite{Vudattu2005}. Given that TNF has both a soluble and membrane-bound ratio\cite{Vudattu2005}. Given that TNF has both a soluble and membrane-bound
form, this may either increase or decrease CD4+ ratio and/or memory T cells form, this may either increase or decrease CD4:CD8 ratio and/or memory T cells
depending on the ratio of the membrane to soluble TNF\cite{Mehta2018}. Since depending on the ratio of the membrane to soluble TNF\cite{Mehta2018}. Since
only soluble TNF was measured, membrane TNF is needed to understand its impact only soluble TNF was measured, membrane TNF is needed to understand its impact
on both CD4+ ratio and memory T cells. Furthermore, IL13 is known to be critical on both CD4:CD8 ratio and memory T cells. Furthermore, IL13 is known to be
for \gls{th2} response and therefore could be secreted if there are significant critical for \gls{th2} response and therefore could be secreted if there are
\glspl{th2} already present in the starting population\cite{Wong2011}. This significant \glspl{th2} already present in the starting
cytokine has limited signaling in T cells and is thought to be more of an population\cite{Wong2011}. This cytokine has limited signaling in T cells and is
effector than a differentiation cytokine\cite{Junttila2018}. It might be thought to be more of an effector than a differentiation
emerging as relevant due to an initially large number of \glspl{th2} or because cytokine\cite{Junttila2018}. It might be emerging here due to an initially large
\glspl{th2} were preferentially expanded; indeed, IL4, also found important, is number of \glspl{th2} or because \glspl{th2} were preferentially expanded;
the conical cytokine that induces \gls{th2} differentiation indeed, IL4, also found important, is the canonical cytokine that induces
(\cref{fig:mod_flower}). The role of these cytokines could be investigated by \gls{th2} differentiation (\cref{fig:mod_flower}). The role of these cytokines
quantifying \glspl{th1}, \glspl{th2}, or \glspl{th17} both in the starting could be investigated by quantifying \glspl{th1}, \glspl{th2}, or \glspl{th17}
population and longitudinally. Similar to IL13, IL17 is an effector cytokine both in the starting population and longitudinally. Similar to IL13, IL17 is an
produced by \glspl{th17}\cite{Amatya2017} and thus may reflect the number of effector cytokine produced by \glspl{th17}\cite{Amatya2017} and thus may reflect the
\glspl{th17} in the population. GM-CSF has been linked with activated T cells, number of \glspl{th17} in the population. GM-CSF has been linked with activated
specifically \glspl{th17}, but it is not clear if this cytokine is inducing T cells, specifically \glspl{th17}, but it is not clear if this cytokine is
differential expansion of CD8+ T cells or if it is simply a covariate with inducing differential expansion of CD8+ T cells or if it is simply a covariate
another cytokine inducing this expansion\cite{Becher2016}. Finally, IL15 has with another cytokine inducing this expansion\cite{Becher2016}. Finally, IL15
been shown to be essential for memory signaling and effective in skewing has been shown to be essential for memory signaling and effective in skewing
\gls{car} T cells toward \glspl{tscm} when using membrane-bound IL15Ra and \gls{car} T cells toward \glspl{tscm} when using membrane-bound IL15Ra and
IL15R\cite{Hurton2016}. Its high predictive behavior goes with its ability to IL15R\cite{Hurton2016}. Its high predictive behavior goes with its ability to
induce large numbers of memory T cells by functioning in an autocrine/paracrine induce large numbers of memory T cells by functioning in an autocrine/paracrine
@ -3616,24 +3595,24 @@ activity associated with T cell activation and differentiation, yet it is not
clear how the various combinations of metabolites relate with each other in a clear how the various combinations of metabolites relate with each other in a
heterogeneous cell population. Formate and lactate were found to be highly heterogeneous cell population. Formate and lactate were found to be highly
predictive and observed to positively correlate with higher values of total live predictive and observed to positively correlate with higher values of total live
\rmemh{} cells (~\cref{fig:nmr_cors}). Formate is a byproduct of the \rmemh{} cells (\cref{fig:nmr_cors}). Formate is a byproduct of the one-carbon
one-carbon cycle implicated in promoting T cell activation\cite{RonHarel2016}. cycle implicated in promoting T cell activation\cite{RonHarel2016}. Importantly,
Importantly, this cycle occurs between the cytosol and mitochondria of cells and this cycle occurs between the cytosol and mitochondria, from which formate is
formate excreted\cite{Pietzke2020}. Mitochondrial biogenesis and function are excreted\cite{Pietzke2020}. Mitochondrial biogenesis and function are shown to
shown necessary for memory cell persistence\cite{van_der_Windt_2012, be necessary for memory cell persistence\cite{van_der_Windt_2012, Vardhana2020}.
Vardhana2020}. Therefore, increased formate in media could be an indicator of Therefore, increased formate in media could be an indicator of one-carbon
one-carbon metabolism and mitochondrial activity in the culture. metabolism and mitochondrial activity in the culture.
In addition to formate, lactate was found as a putative \gls{cqa} of \ptmem{} In addition to formate, lactate was found as a putative \gls{cqa} of \ptmem{}
cells. Lactate is the end-product of aerobic glycolysis, characteristic of cells. Lactate is the end-product of aerobic glycolysis, characteristic of
highly proliferating cells and activated T cells\cite{Lunt2011, Chang2013}. highly proliferating cells and activated T cells\cite{Lunt2011, Chang2013}.
Glucose import and glycolytic genes are immediately upregulated in response to T Glucose import and glycolytic genes are upregulated in response to T cell
cell stimulation, and thus generation of lactate. At earlier time-points, this stimulation, thus leading to lactate. At earlier time-points, this abundance
abundance suggests a more robust induction of glycolysis and higher overall T suggests a more robust induction of glycolysis and higher overall T cell
cell proliferation. Interestingly, our models indicate that higher lactate proliferation. Interestingly, our models indicate that higher lactate predicts
predicts higher CD4+, both in total and in proportion to CD8+, seemingly higher CD4+, both in total and in proportion to CD8+, seemingly contrary to
contrary to previous studies showing that CD8+ T cells rely more on glycolysis previous studies showing that CD8+ T cells rely more on glycolysis for
for proliferation following activation\cite{Cao2014}. It may be that glycolytic proliferation following activation\cite{Cao2014}. It may be that glycolytic
cells dominate in the culture at the early time points used for prediction, and cells dominate in the culture at the early time points used for prediction, and
higher lactate reflects more cells. higher lactate reflects more cells.
@ -3652,28 +3631,26 @@ confounded by the partial replacement of media that occurred periodically during
expansion, thus likely diluting some metabolic byproducts (such as formate, expansion, thus likely diluting some metabolic byproducts (such as formate,
lactate) and elevating depleted precursors (such as glucose and amino acids). lactate) and elevating depleted precursors (such as glucose and amino acids).
More definitive conclusions of metabolic activity across the expanding cell More definitive conclusions of metabolic activity across the expanding cell
population can be addressed by a closed system, ideally with on-line process population can be addressed by a closed system, ideally with on-line sensors and
sensors and controls for formate, lactate, along with ethanol and glucose. controls for formate, lactate, ethanol, and glucose.
Practically, knowledge of how cytokines and/or metabolites are related to Practically, knowledge of how cytokines and/or metabolites are related to
outcome can be utilized for process control, which involves measuring the outcome can be utilized for process control, which involves measuring the
current state of the culture, comparing it to a desired state, and intervening current state of the culture, comparing it to a desired state, and intervening
if it is outside an acceptable range. In the case of lactate and formate, a if it is outside an acceptable range. In the case of lactate and formate, a
benchtop \gls{nmr} can be utilized to sample the media in real time during benchtop \gls{nmr} can be tuned to quantify lactate and formate to sample the
culture. This \gls{nmr} can be tuned to automatically quantify the presence of media in real time during culture. Formate is part of the one-carbon pathway,
lactate and formate. Formate is part of the one-carbon pathway, and thus culture and thus culture fate may be controlled by altering the inputs to this pathway
fate may be controlled by altering the inputs to this pathway (glycine, serine, (glycine, serine, choline) and/or adding folic acid inhibitors\cite{Ducker2017}.
choline) and/or adding folic acid inhibitors\cite{Ducker2017}. Since lactate is Since lactate is a direct byproduct of glycolysis, this may be controlled by
a direct byproduct of glycolysis, this may be controlled by altering the altering the concentration of glucose in solution. Each of these control schemes
concentration of glucose in solution. Each of these control schemes would need would need further study to assess if they have enough precision and temporal
further study to assess if they have enough precision and temporal resolution to resolution to reasonably ensure product quality. For cytokines, there is
reasonably ensure product quality. In the case of cytokines, there is currently currently no analogue to a benchtop \gls{nmr}; however, research is underway to
no analogue to a benchtop \gls{nmr}; however, research is underway to develop develop protein-specific sensors using aptamers\cite{Parolo2020}. Even without
protein-specific sensors using aptamers\cite{Parolo2020}. Even without these these developments, \gls{elisa} or Luminex can still quantify cytokines in a
developments, one could still use \gls{elisa} or Luminex to assess protein semi-automated manner. However, these are temporally discrete and impose a
levels in a semi-automated manner, but the disadvantage is that these assays are non-trivial delay before the intervention can be performed.
temporally discrete and impose a significant time lag before the intervention
can be performed.
\chapter{AIM 2B}\label{aim2b} \chapter{AIM 2B}\label{aim2b}