ADD a bunch of fluff about how doe's work

2021-08-01 17:27:14 -04:00 · 2021-08-01 17:27:14 -04:00 · bead38f3cd
parent 1ce4b6bd01
commit bead38f3cd
1 changed files with 58 additions and 3 deletions
--- a/tex/thesis.tex
+++ b/tex/thesis.tex
@@ -719,7 +719,7 @@ themselves express \il{15} and all three of its receptor components ().
 Additionally, blocking \il{15} itself or \il{15R$\upalpha$} \invitro{} has been
 shown to inhibit homeostatic proliferation of resting human T cells ().
-\subsection*{strategies to optimize cell manufacturing}
+\subsection*{overview of design of experiments}
 The \gls{dms} system has a number of parameters that can be optimized, and a
 \gls{doe} is an ideal framework to test multiple parameters simultaneously. The
@@ -729,7 +729,60 @@ resources. It was developed in many non-biological industries throughout the
 engineers needed to minimize downtime and resource consumption on full-scale
 production lines.
-% TODO add a bit more about the math of a DOE here
+At its core, a \gls{doe} is simply a matrix of conditions to test, where each row
 is usually called a `run' and corresponds to one experimental unit to which the
 conditions are applied, and each column represents a parameter of concern to be
 tested. The values in each cell represent the level at which each parameter is
 to be tested. When the experiment is performed using this matrix of conditions,
 the results are summarized into one or more `responses' that correspond to
 each run. These responses are then modeled (usually using linear regression)
 to determine the statistical relationship (also called an `effect') between each
 parameter and the response(s).
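+The regression step described above can be written compactly. As a sketch in
+our own notation (not taken from a specific reference), let $y$ be the vector
+of $n$ responses, one per run, and let $X$ be the $n \times p$ `model matrix',
+which contains a column of ones for the intercept plus one column for each
+model term (each parameter and, optionally, products of parameters for
+interactions or squared parameters for curvature). Assuming $X^{\top}X$ is
+invertible (that is, no model term is a linear combination of the others), the
+coefficients $\beta$ are estimated by ordinary least squares:
+\[
+  y = X\beta + \varepsilon, \qquad
+  \hat{\beta} = \left(X^{\top}X\right)^{-1} X^{\top} y,
+\]
+where $\varepsilon$ is the vector of random errors, typically assumed to be
+independent with mean zero and constant variance. Each estimated coefficient
+$\hat{\beta}_j$ then quantifies the effect of the corresponding term on the
+response.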
 Collectively, the space spanned by all parameters at their feasible ranges is
 commonly referred to as the `design space', and generally the goal of a
 \gls{doe} is to explore this design space using the fewest runs
 possible. While there are many types of \glspl{doe} depending on the nature
 of the parameters and the goal of the experimenter, they all share common
 principles:
 % BACKGROUND cite montgomery, because I feel like it
 \begin{description}
 \item [randomization --] The order in which the runs are performed should
  ideally be as random as possible. This mitigates confounding factors that
  may depend on the order or position of the experimental runs. For example,
  the evaporation rate of media in a tissue culture plate is much faster at the
  perimeter of the plate than at the center. While randomization does not
  eliminate this bias, it makes it unlikely that the bias is systematically
  aligned with any parameter; instead, the bias is spread across all runs.
 \item [replication --] Since the analysis of a \gls{doe} is inherently
  statistical, replicates should be used so that the underlying distribution
  of errors can be estimated. While this is not strictly necessary to draw
  conclusions from a \gls{doe}, omitting replicates requires strong assumptions
  about the model structure (particularly for high-complexity models, which
  could easily fit the data perfectly) and also precludes statistical tests
  such as the lack-of-fit test, which can be useful in rejecting or accepting a
  particular model. Note that replication is related to, but distinct from,
  power analysis, which concerns the number of runs required to detect an
  effect of a given size.
 \item [orthogonality --] Orthogonality refers to the independence of each
  parameter in the design matrix: with levels coded symmetrically about zero,
  the columns of the design matrix have pairwise dot products of zero. In other
  words, the levels tested for any given parameter add information about the
  response(s) that does not overlap with that of any other parameter. Again,
  while not strictly necessary, orthogonality drastically simplifies the
  analysis of the experiment by allowing each parameter to be treated
  separately. In cases where orthogonality is impossible (which is often true
  in experiments with many categorical variables), strategies exist to
  maximize orthogonality.
 \item [blocking --] When the runs cannot be fully randomized and must instead
  be split across multiple groups, runs are assigned to `blocks', which are not
  necessarily relevant to the goals of the experiment but could nonetheless
  affect the response. A key assumption usually made when blocking is
  that there is no interaction between the blocking variable and any of the
  experimental parameters. For example, in T cell expansion, if media lot were a
  blocking variable and expansion method were a parameter, we would by default
  assume that the effect of the expansion method does not depend on the media
  lot (even if the media lot itself might change the mean of the response).
 \end{description}
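+The orthogonality principle can be made concrete with a minimal example. As a
+sketch (notation ours), suppose two parameters $x_1$ and $x_2$ are tested in a
+$2^2$ full factorial with levels coded as $-1$ (low) and $+1$ (high), and the
+model contains an intercept and both main effects. The model matrix $X$ (rows
+listed in standard order; in practice the runs would be performed in random
+order) and the resulting cross-product matrix are
+\[
+  X = \left(\begin{array}{rrr}
+    1 & -1 & -1 \\
+    1 &  1 & -1 \\
+    1 & -1 &  1 \\
+    1 &  1 &  1
+  \end{array}\right),
+  \qquad
+  X^{\top}X = 4I_3,
+  \qquad
+  \hat{\beta}_j = \frac{1}{4}\, x_{(j)}^{\top} y \quad (j = 0, 1, 2),
+\]
+where $I_3$ is the $3 \times 3$ identity matrix, $x_{(j)}$ is the column of $X$
+multiplying $\beta_j$, and $y$ is the vector of the four responses. Because
+every pair of columns has a zero dot product, $X^{\top}X$ is diagonal, so each
+least-squares coefficient depends only on its own column; this is what allows
+orthogonal parameters to be analyzed separately.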
 \Glspl{doe} served three purposes in this dissertation. First, we used them as
 screening tools, which allowed us to test many input parameters and filter out
@@ -738,7 +791,9 @@ used to make a robust response surface model to predict optima using
 relatively few resources, especially compared to full factorial or
 one-factor-at-a-time approaches. Third, we used \glspl{doe} to discover novel
 effects and interactions, generating hypotheses that could inform the
-directions for future work.
+directions for future work. To these ends, the \glspl{doe} we used in this
 work were generally fractional factorial designs with three levels, which
 enable estimation of both main effects and second-order (quadratic) effects.
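+To see why a third level is needed for quadratic effects, consider a generic
+second-order model in $k$ coded parameters $x_1, \dots, x_k$ (a sketch in our
+own notation; which terms remain estimable depends on the specific design and
+its aliasing):
+\[
+  y = \beta_0 + \sum_{i=1}^{k}\beta_i x_i + \sum_{i=1}^{k}\beta_{ii} x_i^2
+    + \sum_{i<j}\beta_{ij} x_i x_j + \varepsilon.
+\]
+If each parameter is tested only at the coded levels $\pm 1$, then $x_i^2 = 1$
+in every run, so the column for $\beta_{ii}$ is identical to the intercept
+column and the quadratic effect cannot be estimated. Adding a middle level
+($x_i = 0$) breaks this dependence. Assuming a regular three-level fraction,
+the run savings are substantial: a full three-level factorial in $k$
+parameters requires $3^k$ runs, whereas a $3^{k-p}$ fractional factorial uses
+only $3^{k-p}$ runs. For example, with $k = 4$ parameters a full factorial
+needs $81$ runs, while a $3^{4-1}$ fraction needs $27$.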
 \subsection*{strategies to characterize cell manufacturing}