@ -21,6 +21,7 @@
\usepackage{listings}
\usepackage{tocloft}
\usepackage{epigraph}
\usepackage{threeparttable}
\hypersetup{
colorlinks=true,
@ -2876,8 +2877,8 @@ The purpose of this sub-aim was to develop computational methods to identify
novel \glspl{cqa} and \glspl{cpp} that could be used for release criteria,
process control, and process optimization for the \gls{dms} platform. We
hypothesized that T cells grown using the \gls{dms} system would produce
detectable biological signatures in the media supernatant that would correspond
to clinically relevant responses such as fold expansion or phenotype. We tested
this hypothesis by activating T cells under a variety of conditions using a
\gls{doe}, sampling the media at intermediate timepoints, and creating models to
predict the outcome of the cultures. We should stress that the specific
@ -2921,13 +2922,12 @@ progressed. Data from inputs and/or longitudinal samples were used to predict
the endpoint response. The fused cytokine and \gls{nmr} media profiles used
to model these responses included 30 cytokines from a custom Thermo Fisher
ProcartaPlex Luminex kit and 20 \gls{nmr} features. These 20 spectral features
from \gls{nmr} media analysis were selected out of approximately 250 peaks using
a variance-based feature selection approach and some manual inspection steps.
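The variance-based part of this selection can be illustrated with a short Python sketch. It assumes that each sample is a list of peak intensities and that peaks are ranked by raw (unscaled) variance across samples. The manual inspection steps are not modeled, and the helper name and toy intensities are hypothetical rather than the exact procedure used in this work.
\begin{lstlisting}[language=Python]
from statistics import pvariance


def select_by_variance(spectra, k):
    """Return the sorted indices of the k most variable peaks.

    Hypothetical helper: `spectra` is a list of samples, each a list of
    peak intensities of equal length; raw, unscaled variance is assumed.
    """
    n_peaks = len(spectra[0])
    # Population variance of each peak (column) across all samples.
    variances = [pvariance([s[j] for s in spectra]) for j in range(n_peaks)]
    # Rank peaks from most to least variable and keep the top k.
    ranked = sorted(range(n_peaks), key=lambda j: variances[j], reverse=True)
    return sorted(ranked[:k])


# Toy data: 4 samples x 5 peaks (made-up intensities).
toy = [
    [1.0, 10.0, 5.0, 0.1, 3.0],
    [1.1, 20.0, 5.0, 0.1, 9.0],
    [0.9, 15.0, 5.1, 0.2, 1.0],
    [1.0, 30.0, 4.9, 0.1, 6.0],
]
print(select_by_variance(toy, 2))  # [1, 4]
\end{lstlisting}
Spectra are often normalized or scaled before a ranking like this, because raw variance favors high-intensity peaks. Which normalization, if any, preceded the selection here is not specified.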
The first \gls{doe} resulted in a randomized 18-run I-optimal custom design
where each \gls{dms} parameter was evaluated at three levels: \pilII{} (10, 20,
and 30 \si{\IU\per\ml}), \pdms{} (500, 1500, and 2500 \si{\dms\per\ml}), and
\pmab{} (60, 80, and 100 \si{\percent}). These 18 runs consisted of 14 unique
parameter combinations, 4 of which were run in duplicate to assess prediction
error. To further optimize the initial region explored, an \gls{adoe} was
designed with 10 unique
@ -2972,11 +2972,11 @@ Cytokines were quantified via Luminex as described in
Prior to analysis, samples were centrifuged at \SI{2990}{\gforce} for
\SI{20}{\minute} at \SI{4}{\degreeCelsius} to clear any debris\footnote{All
\gls{nmr} analysis was done by our collaborators Max Colonna and Art Edison at
the University of Georgia; methods are included here for reference.}. \SI{5}{\ul}
of 100/3 \si{\mM} DSS-D6 in deuterium oxide (Cambridge Isotope Laboratories)
were added to \SI{1.7}{\mm} \gls{nmr} tubes (Bruker BioSpin), followed by
\SI{45}{\ul} of media from each sample, which was added and mixed for a final
volume of \SI{50}{\ul} in each tube. Samples were prepared on ice in a
predetermined, randomized order. The remaining volume from each sample in the
rack (approx. \SI{4}{\ul}) was combined to create an internal pool. This
material was used for internal controls within each rack as well as metabolite
@ -3010,15 +3010,15 @@ Two-dimensional spectra collected on pooled samples were uploaded to COLMARm web
server, where \gls{hsqc} peaks were automatically matched to database peaks.
\gls{hsqc} matches were manually reviewed with additional 2D and proton spectra
to confirm the match. Annotations were assigned a confidence score based upon
the level of spectral data supporting the match, as previously
described\cite{Dashti2017}. Annotated metabolites were matched to previously
selected features used for statistical analysis.
Several low-abundance features selected for analysis did not have database
matches and were not annotated. Statistical total correlation
spectroscopy\cite{Holmes2006} suggested that some of these unknown features
belonged to the same molecules (not shown). Additional multidimensional
\gls{nmr} experiments will be required to determine their identity.
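In its simplest one-dimensional form, statistical total correlation spectroscopy computes the correlation between the intensity of a chosen driver peak and every other spectral variable across samples. Peaks belonging to the same molecule co-vary and therefore correlate strongly. The Python sketch below shows only this correlation step on made-up intensities. The driver index, the $r \ge 0.9$ cutoff, and the helper name are assumptions, and the sketch is not the analysis summarized above.
\begin{lstlisting}[language=Python]
from statistics import correlation  # requires Python 3.10+


def stocsy(spectra, driver, cutoff=0.9):
    """Correlate one driver peak with every peak across samples.

    Hypothetical helper: `spectra` is a list of samples, each a list of
    peak intensities. Returns the Pearson r for every peak and the indices
    of the other peaks with r >= cutoff. A constant column raises
    statistics.StatisticsError.
    """
    n_peaks = len(spectra[0])
    columns = [[s[j] for s in spectra] for j in range(n_peaks)]
    r = [correlation(columns[driver], columns[j]) for j in range(n_peaks)]
    linked = [j for j in range(n_peaks) if j != driver and r[j] >= cutoff]
    return r, linked


# Peaks 0 and 2 rise together (same molecule); peak 1 is unrelated.
toy = [[1.0, 5.0, 2.0], [2.0, 3.0, 4.1], [3.0, 6.0, 5.9], [4.0, 4.0, 8.0]]
r, linked = stocsy(toy, driver=0)
print(linked)  # [2]
\end{lstlisting}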
\subsection{Machine Learning and Statistical Analysis}
@ -3026,26 +3026,24 @@ Linear regression analysis of the \glspl{doe} was performed as described in
\cref{sec:statistics}.
Seven \gls{ml} techniques were implemented to predict three responses related to
the memory phenotype of the cultured T cells under different process conditions
(\rmemh{}, \rmemk{}, and \rratio{}). The \gls{ml} methods executed were
\gls{rf}, \gls{gbm}, \gls{cif}, \gls{lasso}, \gls{plsr}, \gls{svm}, and
DataModeler's \gls{sr}\footnote{\gls{sr} was performed by Theresa Kotanchek at
Evolved Analytics; \gls{rf}, \gls{gbm}, \gls{cif}, \gls{plsr}, and \gls{svm} were
performed by Valerie Odeh-Couvertier at UPRM. Methods are included here for
reference.}. Primarily, \gls{sr} models were used to optimize process parameter
values based on the \ptmem{} phenotype and to extract early predictive variable
combinations from the multi-omics experiments. Furthermore, the high-performing
models from each method were used in a consensus analysis to extract potential
\glspl{cqa} and \glspl{cpp} predictive of T cell potency, safety, and
consistency at the early stages of the manufacturing process.
\gls{sr} was done using Evolved Analytics' DataModeler software (Evolved
Analytics LLC, Midland, MI). DataModeler uses genetic programming to evolve
symbolic regression models (both linear and non-linear), rewarding both
simplicity and accuracy. Using the selection criteria of highest accuracy
($R^2 > \SI{90}{\percent}$) and lowest complexity, the top-performing models were
identified. Driving variables, variable combinations, and model dimensionality
tables were generated. The top-performing variable combinations were used to
generate model ensembles. In this analysis, DataModeler's
@ -3073,7 +3071,7 @@ values, potential optima in the responses, and regions of parameter values where
the predictions diverge the most.
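DataModeler's own selection logic is not reproduced here. As a hedged illustration of the two criteria stated above, the Python sketch first discards models whose $R^2$ does not exceed 0.90. It then keeps the accuracy-complexity Pareto front: the models for which no other passing model is at least as accurate and at least as simple while being strictly better in one of the two. The \inlinecode{Model} record, the complexity values, and the dominance rule are assumptions for illustration.
\begin{lstlisting}[language=Python]
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    r2: float        # goodness of fit on a 0-1 scale
    complexity: int  # lower means simpler (assumed scale)


def dominates(a, b):
    """True if `a` is at least as accurate and as simple as `b`, better in one."""
    return (a.r2 >= b.r2 and a.complexity <= b.complexity
            and (a.r2 > b.r2 or a.complexity < b.complexity))


def pareto_front(models, min_r2=0.90):
    """Models with r2 > min_r2 that no other passing model dominates."""
    passing = [m for m in models if m.r2 > min_r2]
    front = [m for m in passing if not any(dominates(o, m) for o in passing)]
    return sorted(front, key=lambda m: m.complexity)


candidates = [
    Model("A", 0.95, 40),
    Model("B", 0.97, 120),
    Model("C", 0.93, 60),  # dominated by A: less accurate and more complex
    Model("D", 0.85, 10),  # fails the accuracy threshold
]
print([m.name for m in pareto_front(candidates)])  # ['A', 'B']
\end{lstlisting}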
Non-parametric tree-based ensembles were fit using the
\inlinecode{randomForest}, \inlinecode{gbm}, and \inlinecode{cforest} regression
functions in R for the \gls{rf}, \gls{gbm}, and \gls{cif} models, respectively.
Both \gls{rf} and \gls{cif} construct multiple decision trees in parallel by
randomly choosing a subset of features at each decision tree split, in the
@ -3117,8 +3115,8 @@ model with \gls{loocv} tuned parameters.
Consensus analysis of the relevant variables extracted from each machine
learning model was done to identify consistent predictive features of quality at
the early stages of manufacturing. First, importance scores for all features
were measured across all \gls{ml} models using \inlinecode{varImp} from the
\inlinecode{caret} R package, except for \gls{svm}, for which the
\inlinecode{rminer} R package was used. These importance scores were percent
increase in \gls{mse}, relative importance through average increase in
@ -3130,26 +3128,25 @@ respectively. Using these scores, key predictive variables were selected if
their importance scores were within the \nth{80} percentile ranking for the
following \gls{ml} methods: \gls{rf}, \gls{gbm}, \gls{cif}, \gls{lasso},
\gls{plsr}, and \gls{svm}. For \gls{sr}, variables present in $>\SI{30}{\percent}$
of the top-performing \gls{sr} models from DataModeler
($R^2 \ge \SI{90}{\percent}$, Complexity $\ge 100$) were chosen to investigate
consensus, except for the \gls{nmr} media models at day 4, which considered a
combination of the top-performing models excluding lactate ppms and
included those variables present in $>\SI{40}{\percent}$ of the best-performing
models. Only variables with high percentile scores were evaluated in
terms of their logical relation (intersection across \gls{ml} models) and
depicted using a Venn diagram from the \inlinecode{venn} R package.
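A minimal Python sketch of this consensus logic follows. It assumes that importance scores arrive as one dictionary per method. It reads ``within the \nth{80} percentile'' as a score at or above that method's \nth{80} percentile, computed with the \inlinecode{inclusive} method of Python's \inlinecode{statistics} module, which interpolates in the same way as R's default \inlinecode{quantile} (type 7). The \gls{sr} rule is sketched as counting how often a variable appears across the top-performing models. Feature names and scores are made up.
\begin{lstlisting}[language=Python]
from collections import Counter
from statistics import quantiles


def top_features(scores):
    """Features scoring at or above this method's 80th percentile."""
    cutoff = quantiles(scores.values(), n=5, method="inclusive")[-1]
    return {f for f, s in scores.items() if s >= cutoff}


def consensus(importance_by_method, min_methods):
    """Features selected by at least `min_methods` methods."""
    counts = Counter()
    for scores in importance_by_method.values():
        counts.update(top_features(scores))
    return {f for f, c in counts.items() if c >= min_methods}


def frequent_in_models(model_variables, min_fraction=0.30):
    """SR rule: variables present in more than `min_fraction` of models."""
    counts = Counter(v for variables in model_variables for v in set(variables))
    return {v for v, c in counts.items() if c / len(model_variables) > min_fraction}


# Hypothetical importance scores for three methods and five features.
importance = {
    "rf": {"f1": 7, "f2": 8, "f3": 9, "f4": 2, "f5": 1},
    "gbm": {"f1": 5, "f2": 1, "f3": 9, "f4": 8, "f5": 2},
    "lasso": {"f1": 7, "f2": 2, "f3": 8, "f4": 1, "f5": 3},
}
print(consensus(importance, min_methods=len(importance)))  # {'f3'}
print(frequent_in_models([["f1", "f3"], ["f3"], ["f2", "f3"], ["f4"]]))  # {'f3'}
\end{lstlisting}
With seven methods, the same \inlinecode{consensus} call with a threshold of five corresponds to the ``selected by $\ge 5$ methods'' summaries used in the Results.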
\section{Results}
\subsection{DMSs Grow T Cells With Lower IL2 Concentrations}
Prior to the main experiments in this aim, we assessed the effect of lowering
the \gls{il2} concentration on T cells grown with either beads or \glspl{dms}.
One of our hypotheses for the \gls{dms} system was that higher cell density
would enhance cross-talk between T cells. Since \gls{il2} is secreted by
activated T cells themselves, T cells in the \gls{dms} system may need little or
no added \gls{il2} if this hypothesis is correct.
\begin{figure*}[ht!]
\begingroup
@ -3164,7 +3161,7 @@ were true.
\caption[T Cells Grown at Varying IL2 Concentrations]
{\glspl{dms} grow T cells effectively at lower IL2 concentrations.
\subcap{fig:il2_mod_timecourse}{Longitudinal cell counts of T cells grown
with either beads or \glspl{dms} using varying IL2 concentrations.}
Day 14 counts of either \subcap{fig:il2_mod_total}{total cells} or
\subcap{fig:il2_mod_mem}{\ptmem{} cells} plotted against \gls{il2}
concentration.
@ -3179,14 +3176,9 @@ expanded T cells as described in \cref{sec:tcellculture}. T cells grown with
either method expanded robustly as \gls{il2} concentration was increased
(\cref{fig:il2_mod_timecourse}). Surprisingly, neither the bead nor the \gls{dms}
group expanded at all with \SI{0}{\IU\per\ml} \gls{il2}. When examining the
endpoint fold change after \SI{14}{\day}, we observed that the difference
between the bead and \gls{dms} groups appeared to be greater at lower \gls{il2}
concentrations (\cref{fig:il2_mod_total}). The same trend was
seen when examining only the \ptmem{} cell expansion at day 14
(\cref{fig:il2_mod_mem}). In this case, the \ptmemp{} of the T cells seemed to
be relatively close at higher \gls{il2} concentrations, but separated further at
@ -3196,16 +3188,24 @@ Taken together, these data do not support the hypothesis that the \gls{dms}
system does not need \gls{il2} at all; however, it appears to have a modest
advantage at lower \gls{il2} concentrations compared to beads. For this reason,
we investigated the lower range of \gls{il2} concentrations, starting
at \SI{10}{\IU\per\ml}, in the remainder of this aim.
\subsection{DOE Shows Optimal Conditions for Potent T Cells}
\begin{table}[!h]
\centering
\begin{threeparttable}
\caption{DOE Runs}
\label{tab:doe_runs}
\input{../tables/doe_runs.tex}
\begin{tablenotes}
\item[a] It was later determined that the total \gls{mab} surface density
may not have been consistent across the batches of \glspl{dms} used. These
runs were therefore excluded, as they were created at a different scale and by a
different operator than the rest; leaving them in may have introduced
unobserved confounding factors.
\end{tablenotes}
\end{threeparttable}
\end{table}
\begin{figure*}[ht!]
@ -3224,38 +3224,29 @@ at \SI{10}{\IU\per\ml} throughout the remainder of this aim.
\label{fig:doe_response_first}
\end{figure*}
We conducted two consecutive \glspl{doe} to optimize the \pth{} and \ptmem{}
responses for the \gls{dms} system. In the first, we tested \pilII{} in the
range of \SIrange{10}{30}{\IU\per\ml}, \pdms{} in the range of
\SIrange{500}{2500}{\dms\per\ml}, and \pmab{} in the range of
\SIrange{60}{100}{\percent}. When looking at total \ptmemp{} cells, \pilII{}
showed a positive linear trend, while \pdms{} and \pmab{} showed possible
second-order effects with a maximum and a minimum at the intermediate level,
respectively (\cref{fig:doe_response_first_mem}). In the case of \pth{}, all
parameters showed a positive linear response, with \pilII{} and \pdms{} showing
slight second-order effects that suggest a maximum might exist at a higher value
for each.
After performing the first \gls{doe}, we augmented the original design matrix
with an \gls{adoe}, which was built with three goals in mind. First, we wished
to validate the first \gls{doe} by re-assessing the strength of each
effect. Second, we wished to improve our confidence in regions that showed
high complexity, such as the apparent peak in the total \ptmem{} cell response
at intermediate \gls{dms} concentrations. Third, we wished to explore additional
ranges of each parameter. Since \pilII{} and \pdms{} appeared to continue to
positively influence multiple responses beyond our tested range, we were curious
whether an optimum existed at some higher setting of either parameter. For this
reason, we increased \pilII{} to include \SI{40}{\IU\per\ml} and \pdms{} to include
\SI{3500}{\dms\per\ml}. Because \pmab{} cannot exceed \SI{100}{\percent}, runs
for this parameter were positioned with validation and confidence improvements
in mind. The runs for each \gls{doe} are shown in \cref{tab:doe_runs}.
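Design software typically works in coded units, mapping the low and high settings of each factor to $-1$ and $+1$. Assuming that convention, with the first \gls{doe}'s ranges as reference, the short Python sketch below shows where the \gls{adoe} settings fall. Both \SI{40}{\IU\per\ml} \pilII{} and \SI{3500}{\dms\per\ml} \pdms{} sit at $+2$, one half-range beyond the original region, while \SI{100}{\percent} \pmab{} stays at $+1$. The helper name is hypothetical.
\begin{lstlisting}[language=Python]
def to_coded(value, low, high):
    """Map a natural-unit setting to coded units (low -> -1, high -> +1)."""
    center = (low + high) / 2
    half_range = (high - low) / 2
    return (value - center) / half_range


# Reference ranges taken from the first DOE.
print(to_coded(40, 10, 30))       # IL2 (IU/mL): 2.0
print(to_coded(3500, 500, 2500))  # DMS (DMS/mL): 2.0
print(to_coded(100, 60, 100))     # mAb (%): 1.0, cannot go higher
\end{lstlisting}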
\begin{figure*}[ht!]
\begingroup
@ -3329,10 +3320,10 @@ responses showed mostly linear relationships in all parameter cases
% anything to be significant
We performed linear regression on the three input parameters as well as a binary
parameter representing whether a given run came from the first or second \gls{doe}
(called ``dataset''). Starting with the total \ptmem{} cell response, we fit a
first-order regression model using these four parameters
(\cref{tab:doe_mem1.tex}). While \pilII{} was found to be a significant
predictor, the model fit was extremely poor ($R^2 = 0.331$). This was not
surprising given the apparent complexity of this response
(\cref{fig:doe_responses_mem}). To obtain a better fit, we added second- and
third-degree terms (\cref{tab:doe_mem2.tex}). Note that the dataset parameter
@ -3350,9 +3341,8 @@ that our data might be underpowered for a model this complex. Further
experiments beyond what was performed here may be needed to fully describe this
response.
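One way to see why such a model may be underpowered is to count its coefficients. The Python sketch below is a hedged illustration, not the model actually fit. It enumerates every monomial up to total degree three in the three continuous parameters, plus the intercept, and adds the binary dataset term, which can only enter linearly. It also defines the adjusted $R^2$, $R_{adj}^2 = 1 - (1 - R^2)\frac{n - 1}{n - p - 1}$ for $n$ runs and $p$ non-intercept terms, which penalizes each added term. Whether the reported model contained every one of these terms is an assumption, and the factor labels are placeholders.
\begin{lstlisting}[language=Python]
from itertools import combinations_with_replacement


def polynomial_terms(factors, degree):
    """All monomials of `factors` up to `degree`; () is the intercept."""
    terms = [()]
    for d in range(1, degree + 1):
        terms.extend(combinations_with_replacement(factors, d))
    return terms


def adjusted_r2(r2, n, p):
    """Adjusted R^2 for n runs and p terms (intercept excluded)."""
    if n - p - 1 <= 0:
        raise ValueError("need more runs than terms")
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


continuous = ["il2", "dms", "mab"]  # placeholder factor labels
full_cubic = polynomial_terms(continuous, 3) + [("dataset",)]  # binary: linear only
print(len(polynomial_terms(continuous, 1)) + 1)  # first-order model: 5 coefficients
print(len(full_cubic))                            # full cubic model: 21 coefficients
\end{lstlisting}
With $p = 20$ non-intercept terms, the denominator $n - p - 1$ shrinks quickly as the number of usable runs approaches 21. Both the adjusted $R^2$ and the coefficient estimates then become unstable.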
We performed linear regression on the other three responses, all of which
performed much better than the \ptmem{} response, as expected given the
lower apparent complexity in their response plots
(\cref{fig:doe_responses_cd4,fig:doe_responses_mem4,fig:doe_responses_ratio}).
All of these models appeared to fit well, with $R^2$ and $R_{adj}^2$ upward of
@ -3380,11 +3370,10 @@ significant predictors.
We then visualized the total \ptmemh{} cells and \rratio{} using the response
explorer in DataModeler to create contour plots around the maximum responses.
For both, the models predicted that maximizing all three input parameters would
maximize the response (\cref{fig:doe_sr_contour}). While not all combinations at
and around this optimum were tested, these plots suggest that no other optimal
regions exist elsewhere within the explored parameter space.
\subsection{Modeling with Machine Learning Reveals Putative CQAs}
@ -3407,16 +3396,15 @@ features of quality early in their expansion process.
\label{fig:doe_luminex}
\end{figure*}
We collected secretome data via Luminex on days 4, 6, 8, 11, and 14. Plotting
the concentrations of these cytokines showed a large variation across all runs
and between timepoints, indicating that these could be used to qualitatively
differentiate between process conditions based on variance alone
(\cref{fig:doe_luminex}). In most cases, these concentrations were also much
higher than those from a set of bead-based runs performed in parallel, in
agreement with the Luminex data obtained previously in the Grex system (these
data were collected in plates) (\cref{fig:grex_luminex}).
\begin{table}[!h] \centering
\caption[Machine Learning Model Results]
{Results for \gls{ml} modeling using process parameters (PP) with
@ -3428,15 +3416,15 @@ data were collected in plates) (\cref{fig:grex_luminex}).
\end{table}
\gls{sr} models achieved the highest predictive performance
($R^2 > \SI{93}{\percent}$) when using multi-omics predictors for all endpoint
responses (\cref{tab:mod_results}). \gls{sr} achieved $R^2 > \SI{98}{\percent}$
while \gls{gbm} ensembles showed \gls{loocv} $R^2 > \SI{95}{\percent}$ for the
\rmemh{} and \rmemk{} responses. Similarly, the \gls{lasso}, \gls{plsr}, and
\gls{svm} methods showed consistently high \gls{loocv} $R^2$ (\SI{92.9}{\percent},
\SI{99.7}{\percent}, and \SI{90.5}{\percent}, respectively) when predicting
\rratio{}. However, a reduction of about \SI{10}{\percent} in \gls{loocv} $R^2$,
to \SIrange{72.5}{81.7}{\percent}, was observed for \rmemh{} with these three
methods. Lastly, \gls{sr} and \gls{plsr} achieved $R^2 > \SI{90}{\percent}$ for
\rmemk{}, while the other \gls{ml} methods exhibited highly variable \gls{loocv}
$R^2$ (\SI{0.3}{\percent} for \gls{rf} to \SI{51.5}{\percent} for \gls{lasso}).
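For reference, a \gls{loocv} $R^2$ (often written $Q^2$) refits the model $n$ times, each time predicting the single held-out run. It then compares the prediction error sum of squares (PRESS) with the total sum of squares: $Q^2 = 1 - \mathrm{PRESS}/\mathrm{TSS}$. Unlike the training $R^2$, it can fall to zero or below for models that generalize poorly. The Python sketch below computes it for a one-predictor least-squares line on toy data. It only illustrates the metric, and whether the packages used above compute \gls{loocv} performance exactly this way is an assumption.
\begin{lstlisting}[language=Python]
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def loocv_r2(xs, ys):
    """1 - PRESS / TSS, with PRESS built from leave-one-out predictions."""
    press = 0.0
    for i in range(len(xs)):
        # Refit without run i, then predict run i.
        a, b = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        press += (ys[i] - (a + b * xs[i])) ** 2
    mean_y = sum(ys) / len(ys)
    tss = sum((y - mean_y) ** 2 for y in ys)
    return 1 - press / tss


xs = [1.0, 2.0, 3.0, 4.0, 5.0]   # toy predictor values
ys = [2.1, 3.9, 6.2, 7.8, 10.1]  # toy responses, roughly y = 2x
print(loocv_r2(xs, ys))
\end{lstlisting}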
@ -3485,18 +3473,13 @@ methods for predicting \rratio{} when considering features with the highest
importance scores across models (\cref{fig:mod_flower_48r}). Other features
(IL2R, IL4, IL17a, and \pdms{}) were commonly selected by at least five \gls{ml}
methods (\cref{fig:mod_flower_48r}). When the models were restricted to the
metabolome, formate was the only predictor shared by all seven models.

In a similar analysis of \rmemh{}, no secretome or metabolome species was shared
by all seven models (\cref{fig:mod_flower_cd4}). These models also fit worse
than those for \rratio{} (\cref{tab:mod_results}). For the secretome, IL4,
IL17a, and IL2R were selected by at least five models. For the metabolome,
formate was again selected by at least five models, as was lactate.
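
The notion of a ``shared'' feature in the flower plots can be made explicit as
follows. This notation is introduced here for illustration and is not taken from
the original analysis. Let $S_m$ be the set of features selected by model $m$,
for $m = 1, \dots, M$ with $M = 7$. The number of models selecting feature $j$
is then
\begin{equation}
c_j = \sum_{m=1}^{M} \mathbf{1}\left[\, j \in S_m \,\right].
\end{equation}
A feature is shared by all models when $c_j = M$ and is commonly selected when
$c_j \ge 5$. For example, formate reaches $c_j = 7$ for \rratio{} with
metabolome-only models, whereas no feature reaches $c_j = 7$ for \rmemh{}.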
\begin{figure*}[ht!]
@ -3523,12 +3506,12 @@ lactate.
\label{fig:nmr_cors}
\end{figure*}
We also asked whether day 4 \gls{nmr} features could predict \ptmemh{}. These
models generally fit well despite using samples taken 2 days earlier in the
process (\cref{fig:nmr_cors}). Lactate and formate correlated with each other,
and both correlated with \rmemh{}. Furthermore, lactate correlated positively
with \pdms{} and negatively with glucose (\cref{fig:nmr_cors_lactate}). Formate
showed the same correlation pattern (\cref{fig:nmr_cors_formate}). Glucose was
negatively correlated only with formate and lactate
(\cref{fig:nmr_cors_glucose}). Together, these data suggest
@ -3537,38 +3520,34 @@ that lactate, formate, \pdms{}, and \rmemh{} are fundamentally linked.
\section{Discussion}
\gls{cpp} modeling and understanding are critical to new product development
and, in the context of cell therapy, can have life-saving implications. The
challenges for effective modeling grow with process complexity because of high
dimensionality, interactions between parameters, and nonlinearity. Another
critical challenge is the limited amount of available data. \gls{sr} is well
suited to these process-effects modeling problems and has been applied across
multiple industries\cite{Kordona}. \gls{sr} discovers mathematical expressions
that fit a given sample. It differs from conventional regression techniques in
that the model structure is not defined \textit{a priori}\cite{Koza1992}.
Hence, a key advantage of this methodology is that transparent,
human-interpretable models can be generated from both small and large datasets
with few prior assumptions\cite{Kotancheka}.
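
The difference can be stated schematically. This is an illustration, not a
description of the specific \gls{sr} software used here. Conventional regression
fixes a model form, for example a linear model, and estimates only its
coefficients:
\begin{equation}
(\hat{\beta}_0, \dots, \hat{\beta}_p) = \arg\min_{\beta_0, \dots, \beta_p} \sum_{i=1}^{n} \left(y_i - \beta_0 - \sum_{j=1}^{p} \beta_j x_{ij}\right)^2 .
\end{equation}
\gls{sr} instead searches over a space $\mathcal{F}$ of candidate expressions
built from the input variables and a set of operators, and scores each candidate
by its error
\begin{equation}
E(f) = \sum_{i=1}^{n} \left(y_i - f(\mathbf{x}_i)\right)^2, \qquad f \in \mathcal{F}.
\end{equation}
Many \gls{sr} implementations also use a complexity measure $C(f)$. They return
the expressions that are Pareto-optimal with respect to $E(f)$ and $C(f)$. An
expression is kept if no other candidate is at least as accurate and at least as
simple while being strictly better in one of the two. The result is therefore a
set of models ranging from simple to complex, rather than a single prespecified
structure.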

Since the model search process lets the data determine the model, diverse and
competitive model structures are typically discovered. A diverse ensemble will
contain models that agree in regions constrained by observed data and diverge in
regions without data. Collecting data in these divergent regions helps ensure
that the system is accurately modeled and that its optimum is accurately
located\cite{Kotancheka}. Exploiting this behavior enables adaptive data
collection and interactive modeling. This \gls{adoe} approach is therefore
useful in many scenarios, including maximizing model validity for model-based
decision making, optimizing processing parameters to maximize yield, and
developing emulators for online optimization and human
understanding\cite{Kotancheka}.
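
One way to make this disagreement criterion concrete is sketched below. It
assumes an ensemble of $M$ models $f_1, \dots, f_M$ and a set $\mathcal{X}$ of
candidate process conditions, both introduced here for illustration. First,
compute the ensemble mean and spread at each candidate condition $\mathbf{x}$:
\begin{equation}
\bar{f}(\mathbf{x}) = \frac{1}{M}\sum_{m=1}^{M} f_m(\mathbf{x}), \qquad
s(\mathbf{x}) = \sqrt{\frac{1}{M-1}\sum_{m=1}^{M}\left(f_m(\mathbf{x}) - \bar{f}(\mathbf{x})\right)^2}.
\end{equation}
Then select the next experiment where the models disagree most:
\begin{equation}
\mathbf{x}^{\ast} = \arg\max_{\mathbf{x} \in \mathcal{X}} s(\mathbf{x}).
\end{equation}
After the response at $\mathbf{x}^{\ast}$ is measured, the ensemble is refit and
the procedure repeats. Near the observed data, $s(\mathbf{x})$ tends to be small
because every model must fit those points. In unexplored regions it tends to
grow. This rule is a sketch of the general idea and not necessarily the exact
criterion used by \gls{adoe} tools.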

Our characterization of potential \gls{dms}-based T cell \glspl{cqa} identified
cytokine and \gls{nmr} media features that play key roles in T cell fate
decisions and in the effector functions of immune cells. When modeled together
with \gls{nmr} media features and process parameters, cytokine features slightly
improved prediction. They also dominated the rankings of important features and
variable combinations (\cref{fig:mod_flower}).

Predictive cytokine features such as \gls{tnfa}, IL2R, IL4, IL17a, IL13, and
IL15 were biologically assessed in terms of their known functions and activities
@ -3577,35 +3556,35 @@ cells, as per their main functions, and activated T cells secrete more cytokines
than resting T cells. It is possible that some cytokines simply act as proxies
for proliferation, reflecting the \rratio{} and the degree of activation.
However, the expected ratio of cytokine abundances is less clear and depends on
the T cell subtypes present. Each relevant cytokine must therefore be examined.

IL2R is secreted by activated T cells and binds to IL2, acting as a sink that
dampens its effect on T cells\cite{Witkowska2005}. Because IL2R was more
abundant than IL2 in solution, it may have reduced the overall effect of IL2.
This could be investigated further by blocking IL2R with an antibody. In T
cells, TNF can increase IL2R expression, proliferation, and cytokine
production\cite{Mehta2018}. Depending on its concentration, it may also induce
apoptosis and alter the CD4:CD8 ratio\cite{Vudattu2005}. TNF exists in both
soluble and membrane-bound forms. It may therefore either increase or decrease
the CD4:CD8 ratio and/or the number of memory T cells, depending on the ratio of
membrane-bound to soluble TNF\cite{Mehta2018}. Since only soluble TNF was
measured, membrane-bound TNF would also need to be quantified to understand its
impact on the CD4:CD8 ratio and memory T cells. Furthermore, IL13 is known to be
critical for the \gls{th2} response. It could therefore be secreted if
substantial numbers of \glspl{th2} are already present in the starting
population\cite{Wong2011}. This cytokine has limited signaling in T cells and is
thought to act more as an effector than as a differentiation
cytokine\cite{Junttila2018}. It might emerge here because of an initially large
number of \glspl{th2} or because \glspl{th2} were preferentially expanded.
Indeed, IL4, which was also found to be important, is the canonical cytokine
that induces \gls{th2} differentiation (\cref{fig:mod_flower}). The role of
these cytokines could be investigated by quantifying \glspl{th1},
\glspl{th2}, and \glspl{th17}, both in the starting population and
longitudinally. Like IL13, IL17 is an effector cytokine. It is produced by
\glspl{th17}\cite{Amatya2017} and thus may reflect the number of \glspl{th17}
in the population. GM-CSF has been linked with activated T cells, specifically
\glspl{th17}. However, it is not clear whether this cytokine induces
differential expansion of CD8+ T cells or simply covaries with another cytokine
that induces this expansion\cite{Becher2016}. Finally, IL15 has been shown to be
essential for memory signaling. It is also effective in skewing \gls{car} T
cells toward \glspl{tscm} when using membrane-bound IL15Ra and
IL15R\cite{Hurton2016}. Its high predictive value is consistent with its ability
to induce large numbers of memory T cells by functioning in an
autocrine/paracrine
@ -3616,24 +3595,24 @@ activity associated with T cell activation and differentiation, yet it is not
clear how the various combinations of metabolites relate to each other in a
heterogeneous cell population. Formate and lactate were found to be highly
predictive and correlated positively with total live \rmemh{} cells
(\cref{fig:nmr_cors}). Formate is a byproduct of the one-carbon cycle, which has
been implicated in promoting T cell activation\cite{RonHarel2016}. Importantly,
this cycle operates across the cytosol and mitochondria, and formate is excreted
as one of its products\cite{Pietzke2020}. Mitochondrial biogenesis and function
have been shown to be necessary for memory cell
persistence\cite{van_der_Windt_2012, Vardhana2020}. Therefore, increased formate
in the media could indicate active one-carbon metabolism and mitochondrial
activity in the culture. This would be consistent with its positive association
with \rmemh{}.

In addition to formate, lactate was identified as a putative \gls{cqa} of
\ptmem{} cells. Lactate is the end product of aerobic glycolysis, which is
characteristic of highly proliferating cells and activated T
cells\cite{Lunt2011, Chang2013}. Glucose import and glycolytic genes are
upregulated in response to T cell stimulation, which increases lactate
production. At earlier time points, higher lactate abundance therefore suggests
a more robust induction of glycolysis and higher overall T cell proliferation.
Interestingly, our models indicate that higher lactate predicts more CD4+ T
cells, both in total and in proportion to CD8+ T cells. This seems contrary to
previous studies showing that CD8+ T cells rely more on glycolysis for
proliferation following activation\cite{Cao2014}. It may be that glycolytic
cells dominate the culture at the early time points used for prediction, so
that higher lactate simply reflects more cells.
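
A simple mass balance illustrates why higher lactate can reflect more cells. For
illustration, assume a constant cell-specific lactate production rate
$q_{\mathrm{Lac}}$ and no media exchange. The lactate concentration then evolves
as
\begin{equation}
\frac{d[\mathrm{Lac}]}{dt} = q_{\mathrm{Lac}}\, X(t)
\quad\Longrightarrow\quad
[\mathrm{Lac}](t) = [\mathrm{Lac}](0) + q_{\mathrm{Lac}} \int_{0}^{t} X(\tau)\, d\tau,
\end{equation}
where $X(t)$ is the viable cell density. Under these assumptions, accumulated
lactate tracks the integral of viable cell density. A culture with more cells at
early time points would therefore show higher lactate even if per-cell
glycolytic activity were similar across conditions. Glucose is consumed by the
same cells, and glycolysis yields at most two lactate molecules per glucose
molecule. The balance is therefore also consistent with the negative correlation
between lactate and glucose in the media.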
@ -3652,28 +3631,26 @@ confounded by the partial replacement of media that occurred periodically during
expansion. This likely diluted some metabolic byproducts (such as formate and
lactate) and replenished depleted precursors (such as glucose and amino acids).
More definitive conclusions about metabolic activity across the expanding cell
population could be drawn from a closed system, ideally with on-line sensors and
controls for formate, lactate, ethanol, and glucose.
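
The effect of partial media replacement on these measurements can be made
explicit with a mixing balance. For illustration, assume that a fraction $\phi$
of the culture volume is replaced with fresh medium containing concentration
$c_{\mathrm{fresh}}$ of a given species. The concentration immediately after the
exchange is then
\begin{equation}
c_{\mathrm{after}} = (1 - \phi)\, c_{\mathrm{before}} + \phi\, c_{\mathrm{fresh}}.
\end{equation}
Formate and lactate are byproducts assumed to be absent from fresh medium
($c_{\mathrm{fresh}} = 0$), so each exchange scales their concentration by
$(1-\phi)$. For example, replacing half of the medium ($\phi = 0.5$) halves it.
For depleted precursors such as glucose, $c_{\mathrm{fresh}} > c_{\mathrm{before}}$,
so the exchange raises the concentration. These step changes can mask
differences in metabolic activity between cultures unless the feeding schedule
is accounted for.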

Practically, knowledge of how cytokines and/or metabolites relate to outcome can
be used for process control. This involves measuring the current state of the
culture, comparing it to a desired state, and intervening if it is outside an
acceptable range. For lactate and formate, a benchtop \gls{nmr} could sample the
media in real time during culture and be tuned to quantify both metabolites
automatically. Because formate is part of the one-carbon pathway, culture fate
might be controlled by altering the inputs to this pathway (glycine, serine,
choline) and/or by adding folate pathway inhibitors\cite{Ducker2017}. Since
lactate is a direct byproduct of glycolysis, it might be controlled by altering
the glucose concentration in the medium. Each of these control schemes would
need further study to assess whether it has enough precision and temporal
resolution to reasonably ensure product quality. For cytokines, there is
currently no analogue to a benchtop \gls{nmr}. However, research is underway to
develop protein-specific sensors using aptamers\cite{Parolo2020}. Even without
these developments, \gls{elisa} or Luminex assays can quantify cytokines in a
semi-automated manner. These assays are, however, temporally discrete and impose
a non-trivial delay before an intervention can be performed.
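
The control logic described above can be stated compactly. Let $y(t_k)$ be a
measured quantity, such as lactate or formate, at sampling time $t_k$. Let
$[y_{\min}(t_k), y_{\max}(t_k)]$ be a hypothetical acceptable range derived from
the predictive models. An intervention is triggered when
\begin{equation}
y(t_k) < y_{\min}(t_k) \quad \mathrm{or} \quad y(t_k) > y_{\max}(t_k),
\end{equation}
and the earliest time at which it can take effect is
\begin{equation}
t_{\mathrm{act}} = t_k + \tau,
\end{equation}
where $\tau$ is the delay between sampling and obtaining the measurement. A
deviation that begins just after a sample is taken may therefore go uncorrected
for up to $\Delta t + \tau$, where $\Delta t$ is the sampling interval. An
in-line benchtop \gls{nmr} aims to keep both $\Delta t$ and $\tau$ small. Batch
\gls{elisa} or Luminex assays typically increase both. The symbols and the
acceptable range here are illustrative assumptions rather than values from this
study.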
\chapter{AIM 2B} \label{aim2b}