From 6d9084fe92969c99a59e9a4d33316b1795bc3578 Mon Sep 17 00:00:00 2001
From: ndwarshuis
Date: Thu, 29 Jul 2021 12:48:13 -0400
Subject: [PATCH] ADD methods to aim 2a

---
 tex/thesis.tex | 256 +++++++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 239 insertions(+), 17 deletions(-)

diff --git a/tex/thesis.tex b/tex/thesis.tex
index db987ad..7062876 100644
--- a/tex/thesis.tex
+++ b/tex/thesis.tex
@@ -722,7 +722,7 @@ better retention of memory phenotype compared to current bead-based methods.

 \section{methods}

-\subsection{dms functionalization}
+\subsection{dms functionalization}\label{sec:dms_fab}

 \begin{figure*}[ht!]
 \begingroup
@@ -778,14 +778,6 @@ was then manually counted to obtain a concentration. Surface area for
 \si{\ab\per\um\squared} was calculated using the properties for \gls{cus} and
 \gls{cug} as given by the manufacturer {Table X}.

-%TODO this bit belongs in the next aim
-% In the case of the \gls{doe} experiment where
-% variable mAb surface density was utilized, the anti-CD3/anti-CD28 mAb mixture
-% was further combined with a biotinylated isotype control to reduce the overall
-% fraction of targeted mAbs (for example the 60\% mAb surface density corresponded
-% to 3 mass parts anti-CD3, 3 mass parts anti-CD8, and 4 mass parts isotype
-% control).
-
 \subsection{dms quality control assays}

 Biotin was quantified using the \product{\gls{haba} assay}{Sigma}{H2153-1VL}. In
@@ -848,11 +840,6 @@ depending on media color or a \SI{300}{\mg\per\deci\liter} minimum glucose
 threshold. Media glucose was measured using a \product{GlucCell glucose
 meter}{Chemglass}{CLS-1322-02}.

-% TODO this belongs in aim 2
-% In order to remove \glspl{dms} from
-% culture, collagenase D (Sigma Aldrich) was sterile filtered in culture media and
-% added to a final concentration of \SI{50}{\ug\per\ml} during media addition.
-
 Cells on the \glspl{dms} were visualized by adding \SI{0.5}{\ul}
 \product{\gls{stppe}}{\bl}{405204} and \SI{2}{\ul}
 \product{\acd{45}-\gls{af647}}{\bl}{368538}, incubating for \SI{1}{\hour}, and
@@ -1047,7 +1034,7 @@ These equations were then used analogously to describe the reaction profile of

 % METHOD add the equation governing the washing steps

-\subsection{Luminex Analysis}
+\subsection{Luminex Analysis}\label{sec:luminex_analysis}

 Luminex was performed using a \product{ProcartaPlex kit}{\thermo}{custom} for
 the markers outlined in \cref{tab:luminex_panel} with modifications (note that
 some markers were run in separate panels to allow for proper dilutions).
 Briefly, media supernatants from cells were sampled as desired and immediately
 placed in a \SI{-80}{\degreeCelsius} freezer until use. Before use, samples
 were thawed at \gls{rt} and vortexed to ensure homogeneity. To run the plate,
-\SI{25}{\ul} of magnetic beads were added to the plate and washed 3x using
+\SI{25}{\ul} of magnetic beads were added to the plate and washed 3X using
 \SI{300}{\ul} of wash buffer. \SI{25}{\ul} of samples or standard were added
 to the plate and incubated for \SI{120}{\minute} at \SI{850}{\rpm} and
 \gls{rt} before washing analogously 3X with wash buffer. \SI{12.5}{\ul} of
 detection \glspl{mab} and \SI{25}{\ul} of \gls{stppe} were sequentially
 added, incubated for \SI{30}{\minute} while vortexing, and washed analogously
 to the sample step. Finally, samples were resuspended in \SI{120}{\ul}
 reading buffer and analyzed
-via a Biorad Bioplex 200 plate reader.
+via a Bio-Rad Bio-Plex 200 plate reader. An 8-point \(\log_2\) standard curve
+was used, and all samples were run as single replicates.
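+
+To illustrate the standard-curve arithmetic (a minimal sketch only: the
+top-standard concentration and bead intensities below are hypothetical, and a
+simple log-log fit stands in for the instrument software's five-parameter
+logistic fit), an 8-point \(\log_2\) dilution series can be generated and
+interpolated in R as follows:
+
+\begin{verbatim}
+## Hypothetical 8-point log2 (two-fold serial dilution) standard curve
+## for one cytokine; concentrations (pg/ml) and intensities are invented.
+top_conc <- 10000                 # assumed top-standard concentration (pg/ml)
+conc <- top_conc / 2^(0:7)        # 8-point two-fold dilution series
+mfi <- c(21000, 11800, 6100, 3000, 1500, 760, 380, 190)  # example readings
+
+## simple log-log linear fit in place of the instrument's 5PL fit
+fit <- lm(log2(conc) ~ log2(mfi))
+
+## interpolate the concentration of an unknown sample from its intensity
+2^predict(fit, newdata = data.frame(mfi = 4500))  # estimated pg/ml
+\end{verbatim}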
+
+Luminex data were preprocessed using R for inclusion in downstream analysis
+as follows. Any cytokine level that was over-range (`OOR >' in the output
+spreadsheet) was set to the maximum value of the standard curve for that
+cytokine, and any value that was under-range (`OOR <') was set to the minimum
+value of the standard curve for that cytokine.
+
+By filtering for the highest accuracy (\(R^2 \geq 90\%\) or noise-power) and
+lowest complexity, the top-performing symbolic regression (SR) models were
+identified. Driving variables, variable combinations, and model
+dimensionality tables were generated. The top-performing variable
+combinations were used to generate model ensembles. In this analysis,
+DataModeler's SymbolicRegression function was used to develop explicit
+algebraic (linear and nonlinear) models. The fittest models were analyzed to
+identify the dominant variables using the VariablePresence function, the
+dominant variable combinations using the VariableCombinations function, and
+the model dimensionality (number of unique variables) using the
+ModelDimensionality function. CreateModelEnsemble was used to define
+trustable model ensembles using selected variable combinations; these were
+summarized (model expressions, model phenotype, model tree plot, ensemble
+quality, model quality, variable presence map, ANOVA tables, model prediction
+plot, and exportable model forms) using the ModelSummaryTable function.
+Ensemble prediction and residual performance were assessed via the
+EnsemblePredictionPlot and EnsembleResidualPlot subroutines, respectively.
+Model maxima (ModelMaximum function) and minima (ModelMinimum function) were
+calculated and displayed using the ResponsePlotExplorer function. Trade-off
+performance of multiple responses was explored using the
+MultiTargetResponseExplorer and ResponseComparisonExplorer, with additional
+insights derived from the ResponseContourPlotExplorer. Graphics and tables
+were generated by DataModeler. These model ensembles were used to identify
+predicted response values, potential optima in the responses, and regions of
+parameter values where the predictions diverge the most.
+
+Non-parametric tree-based ensembles were constructed using the randomForest,
+gbm, and cforest regression functions in R for random forest (RF), gradient
+boosted trees (GBM), and conditional inference forest (CIF) models,
+respectively. During training, both random forest and conditional inference
+forest construct multiple decision trees in parallel, randomly choosing a
+subset of features at each decision tree split. Individual random forest
+trees are split using the Gini index, while conditional inference forest uses
+a statistical significance test procedure to select the variables at each
+split, reducing correlation bias. In contrast, gradient boosted trees
+construct regression trees in series through an iterative procedure that
+adapts over the training set, with each new tree correcting the errors of its
+precursors (that is, minimizing the mean squared error). Prediction
+performance was evaluated using leave-one-out cross-validation (LOO)
+\(R^2\), and variable importance was assessed using the permutation-based
+\% increase in mean squared error (MSE), the relative influence based on the
+increase in prediction error, and coefficient values for RF, GBM, and CIF,
+respectively. Partial least squares regression (PLSR) was executed using the
+plsr function from the pls package in R, while LASSO regression was performed
+using the cv.glmnet R package, both with leave-one-out cross-validation.
+Finally, the kernlab R package was used to construct the support vector
+machine (SVM) regression models.
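+
+A minimal sketch of how such models might be fit in R is shown below. The
+data, feature names, and tuning grids are placeholders (not the study's
+dataset), and the grids are abbreviated relative to a full search:
+
+\begin{verbatim}
+## Simulated example: tree ensembles, PLSR, LASSO, and SVM with
+## leave-one-out cross-validation (LOO). All data are placeholders.
+library(caret)         # train(), trainControl()
+library(randomForest)  # backend for method = "rf"
+library(gbm)           # backend for method = "gbm"
+library(party)         # backend for method = "cforest"
+library(pls)           # plsr()
+library(glmnet)        # cv.glmnet()
+library(kernlab)       # ksvm()
+
+set.seed(1)
+n <- 30
+x <- data.frame(matrix(rnorm(n * 6), ncol = 6))
+names(x) <- paste0("feature", 1:6)
+dat <- cbind(x, response = rnorm(n))
+
+loo <- trainControl(method = "LOOCV")
+
+rf_fit  <- train(response ~ ., data = dat, method = "rf",
+                 trControl = loo, tuneGrid = expand.grid(mtry = 2:4),
+                 ntree = 500)          # ntree is fixed, mtry is tuned
+gbm_fit <- train(response ~ ., data = dat, method = "gbm",
+                 trControl = loo, verbose = FALSE)
+cif_fit <- train(response ~ ., data = dat, method = "cforest",
+                 trControl = loo)
+
+## PLSR with built-in LOO validation
+pls_fit <- plsr(response ~ ., data = dat, validation = "LOO")
+
+## LASSO (alpha = 1); nfolds = n gives leave-one-out CV
+lasso_fit <- cv.glmnet(as.matrix(x), dat$response, alpha = 1,
+                       nfolds = n, grouped = FALSE)
+lasso_fit$lambda.min   # best lambda by the minimum-error criterion
+
+## SVM regression with a fixed linear kernel and cost parameter C
+svm_fit <- ksvm(response ~ ., data = dat, kernel = "vanilladot", C = 1)
+\end{verbatim}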
+
+Parameters for all models were tuned via grid search using the train function
+from the caret R package, with LOO-\(R^2\) as the optimization criterion.
+Specifically, the number of features randomly sampled as candidates at each
+split (mtry) and the number of trees to grow (ntree) were tuned for random
+forest and conditional inference forest. In addition, the minimum sum of
+weights in a node for it to be considered for splitting and the minimum sum
+of weights in a terminal node were manually tuned when building the CIF
+models. GBM parameters, including the number of trees to grow, the maximum
+depth of each tree, the learning rate, and the minimum number of observations
+at a terminal node, were likewise tuned for optimal LOO-\(R^2\) performance.
+For PLSR, the optimal number of components in the model was assessed based on
+the standard error of the cross-validation residuals using the selectNcomp
+function from the pls package. LASSO regression was performed using the
+cv.glmnet package with alpha = 1, and the best lambda for each response was
+chosen using the minimum-error criterion. Lastly, a fixed linear kernel
+(svmLinear) was used to build the SVM regression models, with the cost
+parameter tuned for the best LOO-\(R^2\). Prediction performance for all
+models was measured using the final model with the LOO-\(R^2\)-tuned
+parameters. Table M2 shows the parameter values evaluated per model at the
+final stages of results reporting.
+
+\subsection{consensus analysis}
+
+Consensus analysis of the relevant variables extracted from each machine
+learning model was performed to identify consistent predictive features of
+quality at the early stages of manufacturing. First, importance scores for
+all features were measured across all ML models using the varImp function
+from the caret R package, except for SVM, for which the rminer R package was
+used. These importance scores were the percent increase in MSE, the relative
+importance through the average increase in prediction error when a given
+predictor is permuted, permuted coefficient values, absolute coefficient
+values, weighted sums of absolute coefficients, and the relative importance
+from sensitivity analysis for RF, GBM, CIF, LASSO, PLSR, and SVM,
+respectively. Using these scores, key predictive variables were selected for
+RF, GBM, CIF, LASSO, PLSR, and SVM if their importance scores were within the
+80th percentile ranking; for SR, variables present in more than 30\% of the
+top-performing SR models from DataModeler (\(R^2 \geq 90\%\), complexity
+\(\geq 100\)) were chosen to investigate consensus. The exception was the
+day-4 NMR media models, which considered a combination of the top-performing
+models excluding lactate ppms and included those variables present in more
+than 40\% of the best-performing models. Only variables with these
+high-percentile scores were evaluated in terms of their logical relations
+(intersections across ML models) and depicted using a Venn diagram generated
+with the venn R package.
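+
+The consensus step reduces to a percentile filter and a set intersection. A
+minimal sketch (hypothetical feature names and importance scores in place of
+the varImp/rminer output described above) is:
+
+\begin{verbatim}
+## Consensus sketch: keep features at or above the 80th percentile of
+## importance within each model, then intersect across models and plot.
+library(venn)   # venn(): Venn diagram of set overlaps
+
+set.seed(1)
+features <- paste0("feature", 1:20)
+imp <- list(    # placeholder importance scores per model
+  RF    = setNames(runif(20), features),
+  GBM   = setNames(runif(20), features),
+  LASSO = setNames(runif(20), features)
+)
+
+## features within the 80th percentile ranking for each model
+top <- lapply(imp, function(s) names(s)[s >= quantile(s, 0.80)])
+
+Reduce(intersect, top)   # consensus features across all models
+venn(top)                # depict the logical relations
+\end{verbatim}
+
 \section{results}

 \subsection{DOE shows optimal conditions for expanded potent T cells}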