From bead38f3cd14f2c52c928d34c73d0038f446cb7d Mon Sep 17 00:00:00 2001
From: ndwarshuis
Date: Sun, 1 Aug 2021 17:27:14 -0400
Subject: [PATCH] ADD a bunch of fluff about how doe's work

---
 tex/thesis.tex | 61 +++++++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 58 insertions(+), 3 deletions(-)

diff --git a/tex/thesis.tex b/tex/thesis.tex
index e0049c5..5ecf4f5 100644
--- a/tex/thesis.tex
+++ b/tex/thesis.tex
@@ -719,7 +719,7 @@ themselves express \il{15} and all three of its receptor components ().
 Additionally, blocking \il{15} itself or \il{15R$\upalpha$} \invitro{} has been
 shown to inhibit homeostatic proliferation of resting human T cells ().
 
-\subsection*{strategies to optimize cell manufacturing}
+\subsection*{overview of design of experiments}
 
 The \gls{dms} system has a number of parameters that can be optimized, and a
 \gls{doe} is an ideal framework to test multiple parameters simultaneously. The
@@ -729,7 +729,60 @@ resources. It was developed in many non-biological industries throughout the
 engineers needed to minimize downtime and resource consumption on full-scale
 production lines.
 
-% TODO add a bit more about the math of a DOE here
+At its core, a \gls{doe} is simply a matrix of conditions to test. Each row is
+usually called a `run' and corresponds to one experimental unit to which the
+conditions are applied, and each column represents a parameter to be tested.
+The value in each cell is the level at which the corresponding parameter is set
+for that run. When the experiment is performed using this matrix of conditions,
+the results are summarized into one or more `responses' for each run. These
+responses are then modeled (usually using linear regression) to determine the
+statistical relationship (also called an `effect') between each parameter and
+the response(s).
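+
+% illustrative example only: x_1 and x_2 are hypothetical parameters, not a
+% design used in this work; assumes amsmath is loaded (bmatrix, \boldsymbol)
+As a minimal illustration, consider a hypothetical experiment with two
+parameters, $x_1$ and $x_2$, each coded to a low ($-1$) and a high ($+1$) level.
+A $2^2$ full factorial design tests all four combinations, and the responses
+$y_i$ can be modeled with two main effects and their interaction:
+\begin{equation*}
+  y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \beta_{12} x_{i1} x_{i2}
+  + \varepsilon_i, \qquad i = 1, \dots, 4,
+\end{equation*}
+where $\varepsilon_i$ is the error of run $i$. In matrix form this is
+$\mathbf{y} = X \boldsymbol{\beta} + \boldsymbol{\varepsilon}$. Here $X$ is the
+model matrix, meaning the design matrix augmented with an intercept column and
+an interaction column. Each row of $X$ is a run, and its columns correspond to
+the intercept, $x_1$, $x_2$, and $x_1 x_2$:
+\begin{equation*}
+  X =
+  \begin{bmatrix}
+    1 & -1 & -1 & +1 \\
+    1 & +1 & -1 & -1 \\
+    1 & -1 & +1 & -1 \\
+    1 & +1 & +1 & +1
+  \end{bmatrix},
+  \qquad
+  \hat{\boldsymbol{\beta}} = \left( X^\top X \right)^{-1} X^\top \mathbf{y}.
+\end{equation*}
+Because the columns of $X$ are mutually orthogonal, $X^\top X = 4I$, where $I$
+is the $4 \times 4$ identity matrix. The least-squares estimate therefore
+reduces to $\hat{\boldsymbol{\beta}} = \frac{1}{4} X^\top \mathbf{y}$, so each
+coefficient is estimated independently of the others. For example,
+$\hat{\beta}_1 = \frac{1}{4}(-y_1 + y_2 - y_3 + y_4)$. This model also has as
+many coefficients as runs, so without replicate runs no degrees of freedom
+remain to estimate the experimental error.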
+
+Collectively, the space spanned by all parameters over their feasible ranges is
+commonly referred to as the `design space', and the general goal of a
+\gls{doe} is to explore this design space using as few runs as possible. While
+there are many types of \glspl{doe} depending on the nature of the parameters
+and the goals of the experimenter, they all share several common principles:
+
+% BACKGROUND cite montgomery, because I feel like it
+\begin{description}
+\item [randomization --] The order in which runs are performed, and their
+  physical placement, should be randomized as much as possible. This mitigates
+  confounding factors that depend on the order or position of the experimental
+  runs. For example, media evaporates much faster from wells at the perimeter of
+  a tissue culture plate than from those in the center. While randomization does
+  not eliminate this bias, it spreads the bias evenly across all runs rather
+  than allowing it to coincide systematically with any particular parameter.
+\item [replication --] Since the analysis of a \gls{doe} is inherently
+  statistical, replicate runs should be included so that the distribution of
+  the experimental error can be estimated. While replication is not strictly
+  necessary to draw conclusions from a \gls{doe}, omitting it requires strong
+  assumptions about the model structure (particularly for high-complexity
+  models, which could easily fit the data perfectly). It also precludes
+  statistical tests such as the lack-of-fit test, which can be useful for
+  accepting or rejecting a particular model. Note that replication is related
+  to, but not the same as, power analysis, which concerns the number of runs
+  required to detect an effect of a given size.
+\item [orthogonality --] Orthogonality refers to the independence of the
+  parameters in the design matrix. In other words, the columns representing
+  different parameters are uncorrelated, so the levels tested for any one
+  parameter provide information about the response(s) that does not overlap
+  with the information provided by the others.
Again, + while not strictly necessary, orthogonality drastically simplifies the + analysis of the experiment by allowing each parameter to be treated + separately. In cases where orthogonality is impossible (which is often true in + experiments with many categorical variables) strategies exist to maximize + orthogonality. +\item [blocking --] In the case where the experiment must be non-randomly spread + over multiple groups, runs are assigned to `blocks' which are not necessarily + relevant to the goals of the experiment but nonetheless could affect the + response. A key assumption that is (usually) made in the case of blocking is + that there is no interaction between the blocking variable and any of the + experimental parameters. For example, in T cell expansion, if media lot were a + blocking variable and expansion method were a parameter, we would by default + assume that the effect of the expansion method does not depend on the media + lot (even if the media lot itself might change the mean of the response). +\end{description} \Glspl{doe} served three purposes in this dissertation. First, we used them as screening tools, which allowed us to test many input parameters and filter out @@ -738,7 +791,9 @@ used to make a robust response surface model to predict optimums using relatively few resources, especially compared to full factorial or one-factor-at-a-time approaches. Third, we used \glspl{doe} to discover novel effects and interactions that generated hypotheses that could influence the -directions for future work. +directions for future work. To this end, the types of \glspl{doe} we generally +used in this work were fractional factorial designs with three levels, which +enable the estimation of both main effects and second order quadratic effects. \subsection*{strategies to characterize cell manufacturing}