ADD a bunch of fluff about how doe's work

Nathan Dwarshuis 2021-08-01 17:27:14 -04:00
parent 1ce4b6bd01
commit bead38f3cd
1 changed files with 58 additions and 3 deletions


@ -719,7 +719,7 @@ themselves express \il{15} and all three of its receptor components ().
Additionally, blocking \il{15} itself or \il{15R$\upalpha$} \invitro{} has been
shown to inhibit homeostatic proliferation of resting human T cells ().
\subsection*{overview of design of experiments}
The \gls{dms} system has a number of parameters that can be optimized, and a
\gls{doe} is an ideal framework to test multiple parameters simultaneously. The
@ -729,7 +729,60 @@ resources. It was developed in many non-biological industries throughout the
engineers needed to minimize downtime and resource consumption on full-scale
production lines.
% TODO add a bit more about the math of a DOE here
At its core, a \gls{doe} is a matrix of conditions to test, in which each row
(usually called a `run') corresponds to one experimental unit to which the
conditions are applied, and each column represents a parameter to be tested.
The value in each cell is the level at which that parameter is tested in that
run. After the experiment is performed according to this matrix, the results
are summarized into one or more `responses' for each run. These responses are
then modeled (usually with linear regression) to determine the statistical
relationship (also called an `effect') between each parameter and the
response(s).
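As a minimal sketch of this matrix-of-runs idea (not the analysis code used in
this work), the Python below builds a two-level full factorial design for three
hypothetical parameters A, B, and C in coded units ($-1$ for low, $+1$ for
high) and estimates a linear-regression coefficient for each parameter from
made-up, noise-free responses. Because the coded columns of this design are
balanced and orthogonal, each least-squares coefficient reduces to the mean of
the column multiplied elementwise by the response. The parameter names, the
response function, and the helper names are assumptions for illustration.

```python
from itertools import product

# Hypothetical parameters (illustrative only), each tested at two coded levels.
PARAMS = ["A", "B", "C"]


def full_factorial(n_params):
    """Return every combination of -1/+1 levels; each tuple is one run (row)."""
    return list(product((-1, 1), repeat=n_params))


def coefficients(design, responses):
    """Least-squares coefficients for an intercept plus one linear term per column.

    Valid only for a balanced, orthogonal +/-1 design, where X^T X = n I,
    so each coefficient is sum(x * y) / n.
    """
    n = len(design)
    coefs = {"intercept": sum(responses) / n}
    for j, name in enumerate(PARAMS):
        coefs[name] = sum(run[j] * y for run, y in zip(design, responses)) / n
    return coefs


design = full_factorial(len(PARAMS))  # 8 runs x 3 parameters
# Made-up responses generated as y = 10 + 2A - 1B (C has no effect), no noise.
responses = [10 + 2 * a - 1 * b for a, b, _ in design]
print(coefficients(design, responses))
# {'intercept': 10.0, 'A': 2.0, 'B': -1.0, 'C': 0.0}
```

Under this $\pm 1$ coding, the classical `effect' of a parameter (the
difference between the mean responses at its high and low levels) is twice its
regression coefficient, so parameter A in this sketch has an effect of 4.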
Collectively, the space spanned by all parameters over their feasible ranges is
commonly referred to as the `design space', and the goal of a \gls{doe} is
generally to explore this design space using as few runs as possible. While
there are many types of \glspl{doe} depending on the nature of the parameters
and the goals of the experimenter, they all share several common principles:
% BACKGROUND cite montgomery, because I feel like it
\begin{description}
\item [randomization --] The order in which the runs are performed should
be as random as possible. This mitigates confounding factors that depend on
the order or position of the experimental runs. For example, media evaporates
much faster from wells at the perimeter of a tissue culture plate than from
wells at the center. Randomization does not eliminate this bias, but it
ensures the bias is `spread' across runs rather than systematically aligned
with any one parameter.
\item [replication --] Since the analysis of a \gls{doe} is inherently
statistical, replicates should be used so that the underlying distribution of
errors can be estimated. While replication is not strictly necessary to draw
conclusions from a \gls{doe}, omitting it requires strong assumptions about
the model structure (particularly for high-complexity models, which can
easily fit the data perfectly) and also precludes statistical tests such as
the lack-of-fit test, which can be useful for accepting or rejecting a
particular model. Note that replication is related to, but distinct from,
power analysis, which concerns the number of runs required to detect an
effect of a given size.
\item [orthogonality --] Orthogonality refers to the independence of the
parameters (columns) in the design matrix. In other words, the levels tested
for each parameter provide information about the response(s) that does not
overlap with the information provided by any other parameter. Again, while
not strictly necessary, orthogonality drastically simplifies the analysis of
the experiment because each parameter's effect can be estimated independently
of the others. In cases where orthogonality is impossible (as is often true in
experiments with many categorical variables), strategies exist to make the
design as close to orthogonal as possible.
\item [blocking --] When the experiment must be non-randomly split across
multiple groups, runs are assigned to `blocks', which are not necessarily
relevant to the goals of the experiment but could nonetheless affect the
response. A key assumption usually made when blocking is that there is no
interaction between the blocking variable and any of the experimental
parameters. For example, in T cell expansion, if media lot were a blocking
variable and expansion method were a parameter, we would by default assume
that the effect of the expansion method does not depend on the media lot (even
if the media lot itself might shift the mean of the response).
\end{description}
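The principles above can be illustrated with a short Python sketch on a
hypothetical two-level design with three parameters; the function names, block
labels, random seed, and replicate values are all assumptions for illustration
rather than the procedure used in this work. The sketch checks orthogonality by
confirming that every pair of coded columns has a zero dot product, assigns
runs to two blocks (for example, two media lots) by confounding the block with
the three-factor interaction, a standard textbook choice, randomizes the run
order within each block, and pools replicate responses into a model-independent
(`pure error') estimate of the error variance.

```python
import random
from itertools import combinations, product
from statistics import variance

# Hypothetical 2^3 design in coded units; each tuple is (A, B, C) for one run.
design = list(product((-1, 1), repeat=3))


def is_orthogonal(design):
    """Orthogonality: every pair of coded columns has a zero dot product."""
    n_cols = len(design[0])
    return all(
        sum(run[i] * run[j] for run in design) == 0
        for i, j in combinations(range(n_cols), 2)
    )


def assign_blocks(design):
    """Blocking: split runs into two blocks by the sign of A*B*C.

    Only the ABC interaction is confounded with the block, so main effects
    and two-factor interactions remain estimable.
    """
    blocks = {1: [], 2: []}
    for a, b, c in design:
        blocks[1 if a * b * c == 1 else 2].append((a, b, c))
    return blocks


def randomized_order(blocks, seed=0):
    """Randomization: shuffle the run order separately within each block."""
    rng = random.Random(seed)
    order = []
    for block_id in sorted(blocks):
        runs = list(blocks[block_id])
        rng.shuffle(runs)
        order.extend((block_id, run) for run in runs)
    return order


def pure_error_variance(replicates):
    """Replication: pooled variance of responses repeated at identical conditions.

    `replicates` maps a run to a list of at least two responses; the result
    estimates the error variance without relying on any fitted model.
    """
    ss = sum(variance(ys) * (len(ys) - 1) for ys in replicates.values())
    df = sum(len(ys) - 1 for ys in replicates.values())
    return ss / df


print(is_orthogonal(design))  # True
for block_id, run in randomized_order(assign_blocks(design)):
    print(block_id, run)
# Made-up duplicate responses at two conditions (illustrative only).
print(pure_error_variance({(-1, -1, -1): [1.0, 3.0], (1, 1, 1): [2.0, 4.0]}))  # 2.0
```

Shuffling within each block, rather than across the whole experiment, keeps
the block structure intact while still protecting against order effects inside
each block.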
\Glspl{doe} served three purposes in this dissertation. First, we used them as
screening tools, which allowed us to test many input parameters and filter out
@ -738,7 +791,9 @@ used to make a robust response surface model to predict optimums using
relatively few resources, especially compared to full factorial or
one-factor-at-a-time approaches. Third, we used \glspl{doe} to discover novel
effects and interactions, which in turn generated hypotheses that could influence the
directions for future work. To serve these purposes, the \glspl{doe} in this
work were generally three-level fractional factorial designs, which allow
estimation of both main effects and second-order (quadratic) effects.
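To make the run savings and the quadratic terms concrete, the Python sketch
below constructs a regular one-third fraction of a three-level factorial for
three hypothetical parameters (9 runs instead of $3^3 = 27$) by setting the
level index of the third factor to the sum of the other two modulo 3. It then
estimates the linear and quadratic terms for one factor from its mean response
at each level, using the orthogonal polynomial contrasts $(-1, 0, 1)$ and
$(1, -2, 1)$, and locates the stationary point of the fitted curve. The
generator, coded levels, synthetic response, and function names are
assumptions for illustration; this is not necessarily how the designs in this
work were constructed.

```python
from itertools import product

LEVELS = (-1, 0, 1)  # coded low / center / high


def three_level_fraction():
    """One-third fraction of a 3^3 factorial (9 runs instead of 27).

    Factors A and B take every combination of levels; the level index of C is
    (index_A + index_B) mod 3, so every pair of factors is balanced.
    """
    runs = []
    for i, j in product(range(3), repeat=2):
        k = (i + j) % 3
        runs.append((LEVELS[i], LEVELS[j], LEVELS[k]))
    return runs


def quadratic_fit(design, responses, factor):
    """Fit y = b0 + b1*x + b2*x^2 through the mean response at each level.

    With levels -1, 0, +1 this uses the orthogonal contrasts (-1, 0, 1) for the
    linear term and (1, -2, 1) for the quadratic term.
    """
    means = {}
    for level in LEVELS:
        ys = [y for run, y in zip(design, responses) if run[factor] == level]
        means[level] = sum(ys) / len(ys)
    b0 = means[0]
    b1 = (means[1] - means[-1]) / 2
    b2 = (means[1] - 2 * means[0] + means[-1]) / 2
    return b0, b1, b2


design = three_level_fraction()
# Synthetic response (illustrative only): y = 50 + 4A - 3A^2 + 2B, no noise.
responses = [50 + 4 * a - 3 * a**2 + 2 * b for a, b, _ in design]

b0, b1, b2 = quadratic_fit(design, responses, factor=0)
print(len(design), b0, b1, b2)  # 9 50.0 4.0 -3.0
# Stationary point of the fitted curve (a maximum here because b2 < 0).
print(-b1 / (2 * b2))
```

In this fraction, main effects are aliased with components of the two-factor
interactions, so these marginal estimates are only unbiased when those
interactions are negligible; a full response surface model would instead fit
all terms jointly by least squares.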
\subsection*{strategies to characterize cell manufacturing}