ADD methods to aim 2a
parent
f6a24398ca
commit
6d9084fe92
256
tex/thesis.tex
|
@ -722,7 +722,7 @@ better retention of memory phenotype compared to current bead-based methods.
|
||||||
|
|
||||||
\section{methods}
|
\section{methods}
|
||||||
|
|
||||||
\subsection{dms functionalization}
|
\subsection{dms functionalization}\label{sec:dms_fab}
|
||||||
|
|
||||||
\begin{figure*}[ht!]
|
\begin{figure*}[ht!]
|
||||||
\begingroup
|
\begingroup
|
||||||
|
@ -778,14 +778,6 @@ was then manually counted to obtain a concentration. Surface area for
|
||||||
\si{\ab\per\um\squared} was calculated using the properties for \gls{cus} and
|
\si{\ab\per\um\squared} was calculated using the properties for \gls{cus} and
|
||||||
\gls{cug} as given by the manufacturer {Table X}.
|
\gls{cug} as given by the manufacturer {Table X}.
|
||||||
|
|
||||||
%TODO this bit belongs in the next aim
|
|
||||||
% In the case of the \gls{doe} experiment where
|
|
||||||
% variable mAb surface density was utilized, the anti-CD3/anti-CD28 mAb mixture
|
|
||||||
% was further combined with a biotinylated isotype control to reduce the overall
|
|
||||||
% fraction of targeted mAbs (for example the 60\% mAb surface density corresponded
|
|
||||||
% to 3 mass parts anti-CD3, 3 mass parts anti-CD8, and 4 mass parts isotype
|
|
||||||
% control).
|
|
||||||
|
|
||||||
\subsection{dms quality control assays}
|
\subsection{dms quality control assays}
|
||||||
|
|
||||||
Biotin was quantified using the \product{\gls{haba} assay}{Sigma}{H2153-1VL}. In
|
Biotin was quantified using the \product{\gls{haba} assay}{Sigma}{H2153-1VL}. In
|
||||||
|
@ -848,11 +840,6 @@ depending on media color or a \SI{300}{\mg\per\deci\liter} minimum glucose
|
||||||
threshold. Media glucose was measured using a \product{GlucCell glucose
|
threshold. Media glucose was measured using a \product{GlucCell glucose
|
||||||
meter}{Chemglass}{CLS-1322-02}.
|
meter}{Chemglass}{CLS-1322-02}.
|
||||||
|
|
||||||
% TODO this belongs in aim 2
|
|
||||||
% In order to remove \glspl{dms} from
|
|
||||||
% culture, collagenase D (Sigma Aldrich) was sterile filtered in culture media and
|
|
||||||
% added to a final concentration of \SI{50}{\ug\per\ml} during media addition.
|
|
||||||
|
|
||||||
Cells on the \glspl{dms} were visualized by adding \SI{0.5}{\ul}
|
Cells on the \glspl{dms} were visualized by adding \SI{0.5}{\ul}
|
||||||
\product{\gls{stppe}}{\bl}{405204} and \SI{2}{\ul}
|
\product{\gls{stppe}}{\bl}{405204} and \SI{2}{\ul}
|
||||||
\product{\acd{45}-\gls{af647}}{\bl}{368538}, incubating for \SI{1}{\hour}, and
|
\product{\acd{45}-\gls{af647}}{\bl}{368538}, incubating for \SI{1}{\hour}, and
|
||||||
|
@ -1047,7 +1034,7 @@ These equations were then used analogously to describe the reaction profile of
|
||||||
|
|
||||||
% METHOD add the equation governing the washing steps
|
% METHOD add the equation governing the washing steps
|
||||||
|
|
||||||
\subsection{Luminex Analysis}
|
\subsection{Luminex Analysis}\label{sec:luminex_analysis}
|
||||||
|
|
||||||
Luminex was performed using a \product{ProcartaPlex kit}{\thermo}{custom} for
|
Luminex was performed using a \product{ProcartaPlex kit}{\thermo}{custom} for
|
||||||
the markers outlined in \cref{tab:luminex_panel} with modifications (note that
|
the markers outlined in \cref{tab:luminex_panel} with modifications (note that
|
||||||
|
@ -1055,14 +1042,21 @@ some markers were run in separate panels to allow for proper dilutions).
|
||||||
Briefly, media supernatants from cells were sampled as desired and immediately
|
Briefly, media supernatants from cells were sampled as desired and immediately
|
||||||
placed in a \SI{-80}{\degreeCelsius} freezer until use. Before use, samples were
|
placed in a \SI{-80}{\degreeCelsius} freezer until use. Before use, samples were
|
||||||
thawed at \gls{rt} and vortexed to ensure homogeneity. To run the plate,
|
thawed at \gls{rt} and vortexed to ensure homogeneity. To run the plate,
|
||||||
\SI{25}{\ul} of magnetic beads were added to the plate and washed 3x using
|
\SI{25}{\ul} of magnetic beads were added to the plate and washed 3X using
|
||||||
\SI{300}{\ul} of wash buffer. \SI{25}{\ul} of samples or standard were added to
|
\SI{300}{\ul} of wash buffer. \SI{25}{\ul} of samples or standard were added to
|
||||||
the plate and incubated for \SI{120}{\minute} at \SI{850}{\rpm} at \gls{rt}
|
the plate and incubated for \SI{120}{\minute} at \SI{850}{\rpm} at \gls{rt}
|
||||||
before washing analogously 3X with wash buffer. \SI{12.5}{\ul} detection \glspl{mab}
|
before washing analogously 3X with wash buffer. \SI{12.5}{\ul} detection \glspl{mab}
|
||||||
and \SI{25}{\ul} \gls{stppe} were sequentially added, incubated for
|
and \SI{25}{\ul} \gls{stppe} were sequentially added, incubated for
|
||||||
\SI{30}{\minute} and vortexed, and washed analogously to the sample step.
|
\SI{30}{\minute} and vortexed, and washed analogously to the sample step.
|
||||||
Finally, samples were resuspended in \SI{120}{\ul} reading buffer and analyzed
|
Finally, samples were resuspended in \SI{120}{\ul} reading buffer and analyzed
|
||||||
via a Biorad Bioplex 200 plate reader.
|
via a BioRad Bioplex 200 plate reader. An 8-point log2 standard curve was used,
|
||||||
|
and all samples were run as single replicates.
|
||||||
|
|
||||||
|
Luminex data was preprocessed using R for inclusion in downstream analysis as
|
||||||
|
follows. Any cytokine level that was over-range (`OOR >' in output spreadsheet)
|
||||||
|
was set to the maximum value of the standard curve for that cytokine. Any value
|
||||||
|
that was under-range (`OOR <' in output spreadsheet) was set to zero. All values
|
||||||
|
that were extrapolated from the standard curve were left unchanged.
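
As a compact restatement of the preprocessing rules above (the symbols are
introduced here for illustration only), let $x$ denote the reported level of a
cytokine and $x_{\max}$ the highest point of its standard curve. The value $x'$
carried into downstream analysis can then be written as
\[
  x' = \left\{
  \begin{array}{ll}
    x_{\max} & \mbox{if the value was flagged `OOR $>$'} \\
    0 & \mbox{if the value was flagged `OOR $<$'} \\
    x & \mbox{otherwise (including extrapolated values)}
  \end{array}
  \right.
\]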
|
||||||
|
|
||||||
\begin{table}[!h] \centering
|
\begin{table}[!h] \centering
|
||||||
\caption{Luminex Panel}
|
\caption{Luminex Panel}
|
||||||
|
@ -1119,6 +1113,11 @@ lack-of-fit tests where replicates were present (to assess model fit in the
|
||||||
context of pure error). Statistical significance was evaluated at $\upalpha$ =
|
context of pure error). Statistical significance was evaluated at $\upalpha$ =
|
||||||
0.05.
|
0.05.
|
||||||
|
|
||||||
|
\subsection{flow cytometry}\label{sec:flow_cytometry}
|
||||||
|
|
||||||
|
% METHOD add flow cytometry
|
||||||
|
% FIGURE add gating strategy
|
||||||
|
|
||||||
\section{results}
|
\section{results}
|
||||||
|
|
||||||
\subsection{DMSs can be fabricated in a controlled manner}
|
\subsection{DMSs can be fabricated in a controlled manner}
|
||||||
|
@ -1944,6 +1943,229 @@ provide these benefits.
|
||||||
|
|
||||||
\section{introduction}
|
\section{introduction}
|
||||||
\section{methods}
|
\section{methods}
|
||||||
|
|
||||||
|
\subsection{study design}
|
||||||
|
|
||||||
|
The first DOE resulted in a randomized 18-run I-optimal custom design where each
|
||||||
|
DMS parameter was evaluated at three levels: IL2 concentration (10, 20, and 30
|
||||||
|
U/\si{\ul}), DMS concentration (500, 1500, 2500 carrier/\si{\ul}), and functionalized
|
||||||
|
antibody percent (60\%, 80\%, 100\%). These 18 runs consisted of 14 unique
|
||||||
|
parameter combinations, 4 of which were replicated twice to assess
|
||||||
|
prediction error. Process parameters for the ADOE were evaluated at multiple
|
||||||
|
levels: IL2 concentration (30, 35, and 40 U/\si{\ul}), DMS concentration (500, 1000,
|
||||||
|
1500, 2000, 2500, 3000, 3500 carrier/\si{\ul}), and functionalized antibody percent
|
||||||
|
(100\%) as depicted in Fig.1b. To further optimize the initial region explored
|
||||||
|
(DOE) in terms of total live CD4+ TN+TCM cells, a sequential adaptive
|
||||||
|
design-of-experiment (ADOE) was designed with 10 unique parameter combinations,
|
||||||
|
two of these replicated twice for a total of 12 additional samples (Fig.1b). The
|
||||||
|
fusion of cytokine and NMR profiles from media to model these responses included
|
||||||
|
30 cytokines from a custom Thermo Fisher ProcartaPlex Luminex kit and 20 NMR
|
||||||
|
features. These 20 spectral features from NMR media analysis were selected out
|
||||||
|
of approximately 250 peaks through the implementation of a variance-based
|
||||||
|
feature selection approach and some manual inspection steps.
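
As a check on the run counts stated above, the totals follow from simple
arithmetic on the design sizes. The comparison with a full factorial is
included only for context and assumes that every design point lies on the
three-level grid, which is not stated explicitly.
\[
  \underbrace{10 + 4 \times 2}_{\mathrm{DOE}} = 18, \qquad
  \underbrace{8 + 2 \times 2}_{\mathrm{ADOE}} = 12, \qquad
  3^3 = 27 .
\]
Here the DOE has 14 unique combinations, of which 10 are run once and 4 are run
twice. The ADOE has 10 unique combinations, of which 8 are run once and 2 are
run twice. A full factorial of three factors at three levels would require 27
runs.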
|
||||||
|
|
||||||
|
\subsection{DMS fabrication}
|
||||||
|
|
||||||
|
\glspl{dms} were fabricated as described in \cref{sec:dms_fab} with the
|
||||||
|
following modifications in order to obtain a variable functional \gls{mab}
|
||||||
|
surface density. During the \gls{mab} coating step, the anti-CD3/anti-CD28 mAb
|
||||||
|
mixture was further combined with a biotinylated isotype control to reduce the
|
||||||
|
overall fraction of targeted \glspl{mab} (for example the \SI{60}{\percent}
|
||||||
|
\gls{mab} surface density corresponded to 3 mass parts \acd{3}, 3 mass parts
|
||||||
|
\acd{28}, and 4 mass parts isotype control).
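
As a check of the example above, and assuming that the functional \gls{mab}
surface density scales with the mass fraction of targeted \glspl{mab} in the
coating mixture, the targeted fraction is
\[
  \frac{3 + 3}{3 + 3 + 4} = \frac{6}{10} = 0.6 ,
\]
which corresponds to the \SI{60}{\percent} level.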
|
||||||
|
|
||||||
|
\subsection{T cell culture}
|
||||||
|
|
||||||
|
T cell culture was performed as described in \cref{sec:tcellculture} with the
|
||||||
|
following modifications. At days 4, 6, 8, and 11, \SI{100}{\ul} media were
|
||||||
|
collected for the Luminex assay and \gls{nmr} analysis. The volume of removed
|
||||||
|
media was equivalently replaced during the media feeding step, which took place
|
||||||
|
immediately after sample collection. Additionally, the same media feeding
|
||||||
|
schedule was followed for the DOE and ADOE to improve consistency, and the same
|
||||||
|
donor lot was used for both experiments. All cell counts were performed using
|
||||||
|
\gls{aopi}.
|
||||||
|
|
||||||
|
\subsection{flow cytometry}
|
||||||
|
|
||||||
|
Flow cytometry was performed analogously to \cref{sec:flow_cytometry}.
|
||||||
|
|
||||||
|
\subsection{Cytokine quantification}
|
||||||
|
|
||||||
|
Cytokines were quantified via Luminex as described in
|
||||||
|
\cref{sec:luminex_analysis}.
|
||||||
|
|
||||||
|
% TODO paraphrase this entire section since I didn't do it
|
||||||
|
\subsection{NMR metabolomics}
|
||||||
|
|
||||||
|
Prior to analysis, samples were centrifuged at \SI{2990}{\gforce} for
|
||||||
|
\SI{20}{\minute} at \SI{4}{\degreeCelsius} to clear any debris. \SI{5}{\ul} of 100/3 mM
|
||||||
|
DSS-D6 in deuterium oxide (Cambridge Isotope Laboratories) were added to \SI{1.7}{\mm}
|
||||||
|
NMR tubes (Bruker BioSpin), followed by \SI{45}{\ul} of media from each sample, which was
|
||||||
|
added and mixed, for a final volume of \SI{50}{\ul} in each tube. Samples were prepared
|
||||||
|
on ice and in predetermined, randomized order. The remaining volume from each
|
||||||
|
sample in the rack ($\sim$\SI{4}{\ul}) was combined to create an internal pool. This
|
||||||
|
material was used for internal controls within each rack as well as metabolite
|
||||||
|
annotation.
|
||||||
|
|
||||||
|
NMR spectra were collected on a Bruker Avance III HD spectrometer at \SI{600}{\MHz}
|
||||||
|
using a 5-mm TXI cryogenic probe and TopSpin software (Bruker BioSpin).
|
||||||
|
One-dimensional spectra were collected on all samples using the noesypr1d pulse
|
||||||
|
sequence under automation using ICON NMR software. Two-dimensional HSQC and
|
||||||
|
TOCSY spectra were collected on internal pooled control samples for metabolite
|
||||||
|
annotation.
|
||||||
|
|
||||||
|
One-dimensional spectra were manually phased and baseline corrected in TopSpin.
|
||||||
|
Two-dimensional spectra were processed in NMRPipe\textsuperscript{37}. One-dimensional spectra
|
||||||
|
were referenced, water/end regions removed, and normalized with the PQN
|
||||||
|
algorithm\textsuperscript{38} using an in-house MATLAB (The MathWorks, Inc.) toolbox
|
||||||
|
(\url{https://github.com/artedison/Edison_Lab_Shared_Metabolomics_UGA}).
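
For reference, probabilistic quotient normalization (PQN) as it is commonly
described can be sketched as follows. The notation is introduced here for
illustration, and the reference spectrum used by the toolbox is not stated
above; the median spectrum across samples is assumed. For sample $i$ with
intensity $x_{ij}$ at spectral point $j$ and reference intensity $r_j$,
\[
  q_{ij} = \frac{x_{ij}}{r_j}, \qquad
  s_i = \mathop{\mathrm{median}}_{j}\, q_{ij}, \qquad
  x'_{ij} = \frac{x_{ij}}{s_i},
\]
so that each spectrum is divided by its most probable dilution factor relative
to the reference.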
|
||||||
|
|
||||||
|
To reduce the total number of spectral features from approximately 250 peaks and
|
||||||
|
enrich for those that would be most useful for statistical modeling, a
|
||||||
|
variance-based feature selection was performed within MATLAB. For each digitized
|
||||||
|
point on the spectrum, the variance was calculated across all experimental
|
||||||
|
samples and plotted. Clearly-resolved features corresponding to peaks in the
|
||||||
|
variance spectrum were manually binned and integrated to obtain quantitative
|
||||||
|
feature intensities across all samples (Supp.Fig.S24). In addition to highly
|
||||||
|
variable features, several other clearly resolved and easily identifiable
|
||||||
|
features were selected (glucose, BCAA region, etc). Some features were later
|
||||||
|
discovered to belong to the same metabolite but were included in further
|
||||||
|
analysis.
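
The variance spectrum described above can be written explicitly. The symbols
are illustrative, and the use of the unbiased ($n-1$) estimator is an
assumption, since the normalization is not stated. For $n$ samples with
intensity $x_{ij}$ at digitized point $j$,
\[
  \bar{x}_j = \frac{1}{n} \sum_{i=1}^{n} x_{ij}, \qquad
  s_j^2 = \frac{1}{n-1} \sum_{i=1}^{n} \left( x_{ij} - \bar{x}_j \right)^2 ,
\]
and the intensity of a manually defined bin $B_k$ in sample $i$ is its
integral, approximated here as a sum:
\[
  f_{ik} = \sum_{j \in B_k} x_{ij} .
\]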
|
||||||
|
|
||||||
|
Two-dimensional spectra collected on pooled samples were uploaded to COLMARm web
|
||||||
|
server\textsuperscript{10}, where HSQC peaks were automatically matched to database peaks. HSQC
|
||||||
|
matches were manually reviewed with additional 2D and proton spectra to confirm
|
||||||
|
the match. Annotations were assigned a confidence score based upon the levels of
|
||||||
|
spectral data supporting the match as previously described\textsuperscript{11}. Annotated
|
||||||
|
metabolites were matched to previously selected features used for statistical
|
||||||
|
analysis.
|
||||||
|
|
||||||
|
Using the list of annotated metabolites obtained above, an approximation of a
|
||||||
|
representative experimental spectrum was generated using the GISSMO mixture
|
||||||
|
simulation tool\textsuperscript{39,40}. With the simulated mixture of compounds, generated at
|
||||||
|
\SI{600}{\MHz} to match the experimental data, a new simulation was generated at \SI{80}{\MHz} to
|
||||||
|
match the field strength of commercially available benchtop NMR spectrometers.
|
||||||
|
The GISSMO tool allows visualization of signals contributed from each individual
|
||||||
|
compound as well as the mixture, which allows annotation of features in the
|
||||||
|
mixture belonging to specific compounds.
|
||||||
|
|
||||||
|
Several low abundance features selected for analysis did not have database
|
||||||
|
matches and were not annotated. Statistical total correlation spectroscopy\textsuperscript{41}
|
||||||
|
suggested that some of these unknown features belonged to the same molecules
|
||||||
|
(not shown). Additional multidimensional NMR experiments will be required to
|
||||||
|
determine their identity.
|
||||||
|
|
||||||
|
% TODO paraphrase most of this since I didn't do much of the analysis myself
|
||||||
|
\subsection{machine learning and statistical analysis}
|
||||||
|
|
||||||
|
Seven machine learning (ML) techniques were implemented to predict three
|
||||||
|
responses related to the memory phenotype of the cultured T cells under
|
||||||
|
different process parameter conditions (i.e. Total Live CD4+ TN+TCM, Total
|
||||||
|
Live CD8+ TN+TCM, and Ratio CD4+/CD8+ TN+TCM). The ML methods executed were
|
||||||
|
Random Forest (RF), Gradient Boosted Machine (GBM), Conditional Inference Forest
|
||||||
|
(CIF), Least Absolute Shrinkage and Selection Operator (LASSO), Partial
|
||||||
|
Least-Squares Regression (PLSR), Support Vector Machine (SVM), and DataModeler’s
|
||||||
|
Symbolic Regression (SR). Primarily, SR models were used to optimize process
|
||||||
|
parameter values based on TN+TCM phenotype and to extract early predictive
|
||||||
|
variable combinations from the multi-omics experiments. Furthermore, all
|
||||||
|
regression methods were executed, and the high-performing models were used to
|
||||||
|
perform a consensus analysis of the important variables to extract potential
|
||||||
|
critical quality attributes and critical process parameters predictive of T-cell
|
||||||
|
potency, safety, and consistency at the early stages of the manufacturing
|
||||||
|
process.
|
||||||
|
|
||||||
|
Symbolic regression (SR) was done using Evolved Analytics’ DataModeler software
|
||||||
|
(Evolved Analytics LLC, Midland, MI). DataModeler utilizes genetic programming
|
||||||
|
to evolve symbolic regression models (both linear and non-linear) rewarding
|
||||||
|
simplicity and accuracy. Using the selection criteria of highest accuracy
|
||||||
|
(R2>90\% or noise-power) and lowest complexity, the top-performing models were
|
||||||
|
identified. Driving variables, variable combinations, and model dimensionality
|
||||||
|
tables were generated. The top-performing variable combinations were used to
|
||||||
|
generate model ensembles. In this analysis, DataModeler’s SymbolicRegression
|
||||||
|
function was used to develop explicit algebraic (linear and nonlinear) models.
|
||||||
|
The fittest models were analyzed to identify the dominant variables using the
|
||||||
|
VariablePresence function, the dominant variable combinations using the
|
||||||
|
VariableCombinations function, and the model dimensionality (number of unique
|
||||||
|
variables) using the ModelDimensionality function. CreateModelEnsemble was used
|
||||||
|
to define trustable model ensembles using selected variable combinations and
|
||||||
|
these were summarized (model expressions, model phenotype, model tree plot,
|
||||||
|
ensemble quality, model quality, variable presence map, ANOVA tables, model
|
||||||
|
prediction plot, exportable model forms) using the ModelSummaryTable function.
|
||||||
|
Ensemble prediction and residual performance were respectively assessed via the
|
||||||
|
EnsemblePredictionPlot and EnsembleResidualPlot subroutines. Model maxima
|
||||||
|
(ModelMaximum function) and model minima (ModelMinimum function) were calculated
|
||||||
|
and displayed using the ResponsePlotExplorer function. Trade-off performance of
|
||||||
|
multiple responses was explored using the MultiTargetResponseExplorer and
|
||||||
|
ResponseComparisonExplorer with additional insights derived from the
|
||||||
|
ResponseContourPlotExplorer. Graphics and tables were generated by DataModeler.
|
||||||
|
These model ensembles were used to identify predicted response values, potential
|
||||||
|
optima in the responses, and regions of parameter values where the predictions
|
||||||
|
diverge the most.
|
||||||
|
|
||||||
|
Non-parametric tree-based ensembles were done through the randomForest, gbm, and
|
||||||
|
cforest regression functions in R, for random forest, gradient boosted trees,
|
||||||
|
and conditional inference forest models, respectively. Both random forest and
|
||||||
|
conditional inference forest construct multiple decision trees in parallel, by
|
||||||
|
randomly choosing a subset of features at each decision tree split, in the
|
||||||
|
training stage. Random forest individual decision trees are split using the Gini
|
||||||
|
Index, while conditional inference forest uses a statistical significance test
|
||||||
|
procedure to select the variables at each split, reducing correlation bias. In
|
||||||
|
contrast, gradient boosted trees construct regression trees in series through an
|
||||||
|
iterative procedure that adapts over the training set. This model learns from
|
||||||
|
the mistakes of previous regression trees in an iterative fashion to correct
|
||||||
|
errors from its precursors’ trees (i.e. minimize mean squared errors).
|
||||||
|
Prediction performance was evaluated using leave-one-out cross-validation
|
||||||
|
(LOO)-R2 and permutation-based variable importance scores assessing \% increase
|
||||||
|
of mean squared errors (MSE), relative influence based on the increase of
|
||||||
|
prediction error, and coefficient values for RF, GBM, and CIF, respectively. Partial
|
||||||
|
least squares regression was executed using the plsr function from the pls
|
||||||
|
package in R, while LASSO regression was performed using the cv.glmnet function from the glmnet R package,
|
||||||
|
both using leave-one-out cross-validation. Finally, the kernlab R package was
|
||||||
|
used to construct the Support Vector Machine regression models.
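
For clarity, one common definition of the leave-one-out $R^2$ (LOO-R2) is
sketched below. This is an illustrative definition under stated assumptions and
may differ in detail from the quantity reported by the software used; some
implementations instead report the squared correlation between observed and
held-out predicted values. With $y_i$ the observed response for run $i$,
$\hat{y}_{(-i)}$ the prediction for run $i$ from a model trained on all other
runs, and $\bar{y}$ the mean observed response,
\[
  R^2_{\mathrm{LOO}} = 1 -
  \frac{\sum_{i=1}^{n} \left( y_i - \hat{y}_{(-i)} \right)^2}
       {\sum_{i=1}^{n} \left( y_i - \bar{y} \right)^2} .
\]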
|
||||||
|
|
||||||
|
Parameter tuning was done for all models in a grid search manner using the train
|
||||||
|
function from the caret R package using LOO-R2 as the optimization criterion.
|
||||||
|
Specifically, the number of features randomly sampled as candidates at each
|
||||||
|
split (mtry) and the number of trees to grow (ntree) were tuned parameters for
|
||||||
|
random forest and conditional inference forest. In particular, minimum sum of
|
||||||
|
weights in a node to be considered for splitting and the minimum sum of weights
|
||||||
|
in a terminal node were manually tuned for building the CIF models. Moreover,
|
||||||
|
GBM parameters such as the number of trees to grow, maximum depth of each tree,
|
||||||
|
learning rate, and the minimal number of observations at the terminal node, were
|
||||||
|
tuned for optimum LOO-R2 performance as well. For PLSR, the optimal number of
|
||||||
|
components to be used in the model was assessed based on the standard error of
|
||||||
|
the cross-validation residuals using the function selectNcomp from the pls
|
||||||
|
package. Moreover, LASSO regression was performed using the cv.glmnet function
|
||||||
|
with alpha = 1. The best lambda for each response was chosen using the minimum
|
||||||
|
error criterion. Lastly, a fixed linear kernel (i.e. svmLinear) was used to build
|
||||||
|
the SVM regression models evaluating the cost parameter value with best LOO-R2.
|
||||||
|
Prediction performance was measured for all models using the final model with
|
||||||
|
LOO-R2 tuned parameters. Table M2 shows the parameter values evaluated per model
|
||||||
|
at the final stages of results reporting.
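
To make the role of alpha and lambda in the LASSO models explicit, the
elastic-net objective minimized by glmnet for a continuous response is
reproduced below. The symbols are introduced here for illustration: $N$ is the
number of runs, $x_i$ the feature vector of run $i$, and $\beta_0$, $\beta$ the
intercept and coefficients.
\[
  \min_{\beta_0,\, \beta} \;
  \frac{1}{2N} \sum_{i=1}^{N} \left( y_i - \beta_0 - x_i^{T} \beta \right)^2
  + \lambda \left[ \frac{1 - \alpha}{2} \, \| \beta \|_2^2
  + \alpha \, \| \beta \|_1 \right]
\]
With $\alpha = 1$ the ridge term vanishes and only the $\ell_1$ penalty
remains, which is the LASSO. The lambda chosen by the minimum error criterion
is presumably the value that minimizes the cross-validated error.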
|
||||||
|
|
||||||
|
\subsection{consensus analysis}
|
||||||
|
|
||||||
|
Consensus analysis of the relevant variables extracted from each machine
|
||||||
|
learning model was done to identify consistent predictive features of quality at
|
||||||
|
the early stages of manufacturing. First, importance scores for all features were
|
||||||
|
measured across all ML models using varImp from the caret R package, except for
|
||||||
|
SVM, for which the rminer R package was used. These importance scores were
|
||||||
|
percent increase in mean squared error (MSE), relative importance through
|
||||||
|
average increase in prediction error when a given predictor is permuted,
|
||||||
|
permuted coefficients values, absolute coefficient values, weighted sum of
|
||||||
|
absolute coefficients values, and relative importance from sensitivity analysis
|
||||||
|
determined for RF, GBM, CIF, LASSO, PLSR, and SVM, respectively. Using these
|
||||||
|
scores, key predictive variables were selected if their importance scores were
|
||||||
|
within the 80th percentile ranking for the following ML methods: RF, GBM, CIF,
|
||||||
|
LASSO, PLSR, and SVM. For SR, variables present in $>$30\% of the top-performing
|
||||||
|
SR models from DataModeler (R2 $\geq$ 90\%, Complexity $\geq$ 100) were chosen to
|
||||||
|
investigate consensus except for NMR media models at day 4 which considered a
|
||||||
|
combination of the top-performing results of models excluding lactate ppms, and
|
||||||
|
included those variables which were in > 40\% of the best performing models.
|
||||||
|
Only variables with those high percentile scoring values were evaluated in terms
|
||||||
|
of their logical relation (intersection across ML models) and depicted using a
|
||||||
|
Venn diagram from the venn R package.
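
The selection rule above can be summarized in set notation. The symbols are
illustrative, and ``within the 80th percentile ranking'' is interpreted here as
at or above the 80th percentile of the importance scores, which is an
assumption. For each method $m$ in RF, GBM, CIF, LASSO, PLSR, and SVM, with
importance score $I_m(v)$ for variable $v$ and $Q_{0.8}(I_m)$ the 80th
percentile of those scores, and with $p(v)$ the fraction of top-performing SR
models containing $v$,
\[
  S_m = \left\{ v : I_m(v) \geq Q_{0.8}(I_m) \right\}, \qquad
  S_{\mathrm{SR}} = \left\{ v : p(v) > 0.3 \right\}, \qquad
  C = S_{\mathrm{SR}} \cap \bigcap_{m} S_m .
\]
The threshold on $p(v)$ is 0.4 rather than 0.3 for the day 4 NMR media models.
The Venn diagram displays all pairwise and higher-order intersections of these
sets, of which $C$ is the full consensus.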
|
||||||
|
|
||||||
\section{results}
|
\section{results}
|
||||||
|
|
||||||
\subsection{DOE shows optimal conditions for expanded potent T cells}
|
\subsection{DOE shows optimal conditions for expanded potent T cells}
|
||||||
|
|