Multivariate Data Analysis for Biotechnology and Bio-processing

Powerful Multivariate Data Analysis and Design of Experiments methods are giving biotechnology companies greater insights into their complex data and driving efficiency and innovation across the entire product lifecycle.

Executive Summary

Multivariate data analysis (MVA) and design of experiments (DoE) are advanced analysis techniques which enable biotech organizations to improve their data analysis and optimize operations across the product lifecycle. MVA and DoE are used in applications such as raw material assessment, analysis of clinical trial results, understanding and controlling fermentation processes and improving quality control.

Given the large number and complexity of variables in biological systems, multivariate analysis has significant advantages over traditional statistical analysis tools. The powerful data mining capabilities allow researchers, scientists and engineers to cut through complex data sets to discover underlying patterns, while advanced regression methods can be used to make more robust predictions about a system’s behaviour.

Today’s biotech companies are increasingly looking to accelerate development, reduce process related costs and improve time to market. Unlocking the value in their data with tools such as multivariate analysis and design of experiments is a major source of potential gains in these areas.

This white paper covers applications of MVA and DoE across the product lifecycle, including examples of data analyzed for candidate therapy discovery, product formulation, clinical trials and fermentation batch process monitoring. It illustrates how these powerful analytical tools can be integrated with different systems throughout a biotechnology operation.


The modern biopharmaceutical/biotechnology manufacturing facility contains many sophisticated control, data logging and data archiving systems. Massive amounts of data are collected from sources such as raw materials analysis, process outputs and final quality assessments, which are stored in data warehouses.

The sheer volume of data contained in these warehouses makes it nearly impossible to extract the information using simple charting and univariate methods of analysis. Such complex data requires methods of analysis that can cope with multiple variables simultaneously and that reveal not only the influential variables, but also the relationships between them. This is where multivariate analysis (MVA) is finding a much greater role in the analysis of complex bioprocess data.

With much more effort being put into the discovery and development of biotherapies and personalised medicines, biopharmaceutical and biotechnology companies are looking for ways to accelerate drug discovery, and through initiatives such as Quality by Design (QbD) and data driven knowledge discovery, reduce the regulatory approval time and be first to market. This means that data collected throughout the entire product lifecycle must be analysed and interpreted in order to gain extensive product and process understanding. This, in turn, leads to improved quality, greater confidence in the market for a company’s products and ultimately market capitalisation.

It is estimated that it takes approximately 12 years to bring a new drug or therapy to market. This usually involves three phases:

  • Discovery
  • Clinical trials
  • Registration

Coupled with these phases is the development of a suitable manufacturing process that can consistently produce the highest quality product. This includes the development of a formulation that is robust under processing conditions, scale up considerations and technology transfer from facility to facility or even between different types of manufacturing equipment. Each of these phases can be improved and accelerated through the use of MVA and design of experiments (DoE).

Even before data is analysed, one of the biggest challenges facing the industry is getting this data into a format that is amenable to MVA. Many data collection and agglomeration systems are commercially available for compiling various forms of data and these can be seamlessly integrated into MVA packages such as Unscrambler so that the vast array of graphical and analytical approaches can be applied to reveal the information it contains.

Multivariate analysis in the complete product lifecycle

Compared with small-molecule drug products, biotherapies are fundamentally more complex in structure and application, and they suffer greatly from natural biological variability. For example, the tools of MVA greatly aid the isolation and selection of cell cultures or bacterial strains for further development into future products, as well as the monitoring of the processes (e.g. fermentation reactions) used to produce them. From there, the tools of DoE can be used to devise formulations that stabilise the active component(s) during manufacture; they are also useful in product scale-up studies.

Once the candidate therapy (cell culture, antibody, virus strain etc.) has been formulated into a stable matrix, MVA can be used to assist in the interpretation of clinical trial data. Its more comprehensive and holistic approach to data analysis can even accelerate this lengthy process, especially when combined with the principles of adaptive designs and the Critical Path Initiative endorsed by the US Food and Drug Administration (USFDA).

When the candidate therapy has been approved for market release, the tools of MVA are useful for assessing the success of technology transfer from R&D to production, or from one manufacturing facility to another. In the production environment, MVA is useful for assessing incoming or internally produced raw material quality and characteristics.

Combined with rapid spectroscopic or other characterisation methods, control strategies for the real time monitoring and adjustment of processes within the so-called ‘design space’ can be devised so that proactive quality control can be realised. DoE and MVA are then used in developing robust analytical methods for stability studies and other post production analyses.

Multivariate analysis

Multivariate data analysis (MVA) is the analysis of more than one statistical variable at a time. Essentially, it is a tool to find patterns and relationships between several variables simultaneously. It lets us predict the effect a change in one variable will have on other variables. Multivariate analysis methods include exploratory data analysis (data mining), classification (e.g. cluster analysis), regression analysis and predictive modelling.


Design of Experiments

Design of Experiments is a systematic approach involving a series of structured tests in which planned changes are made to a process or system, with the effect of the changes on a pre-defined output measured. It enables researchers to maximize product and process understanding with the least number of experiments and is widely used in R&D, process optimization and quality control applications.
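As a minimal sketch of what "a series of structured tests" can look like, the following Python snippet enumerates a two-level full factorial design. The factor names and levels are hypothetical and chosen only for illustration; they are not taken from any study described in this paper, and real projects often use fractional or optimal designs to reduce the number of runs further.

```python
from itertools import product


def full_factorial(factors):
    """Return every combination of factor levels as a list of run dicts."""
    names = list(factors)
    return [dict(zip(names, levels)) for levels in product(*factors.values())]


# Hypothetical factors and levels for a fermentation screening study
factors = {
    "temperature_C": [30, 37],
    "pH": [6.5, 7.2],
    "agitation_rpm": [150, 250],
}

runs = full_factorial(factors)
for run_number, run in enumerate(runs, start=1):
    print(run_number, run)
```

Three factors at two levels each give 2 x 2 x 2 = 8 planned runs, each of which would be carried out and its response measured.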


Benefits of MVA and DoE in biotechnology

  • Reduced development timeframes and costs
  • Reduced manufacturing costs
  • Improved process understanding
  • Improved product quality
  • Faster time to market

Data collected over time from a manufacturing facility can be modelled to assess consistency from batch to batch and to facilitate continuous improvement (CI) and corrective and preventive action (CAPA) programs. The entire process is summarised in Figure 1.

Figure 1. The application of MVA and DoE in the biotech product lifecycle. Biotechnology companies can gain significant benefits using MVA and DoE, from product development through to manufacturing and quality control.

The Design Space
The multidimensional (multivariate) combination and interaction of input variables and process parameters that have been demonstrated to provide assurance of quality.

Development and Discovery

Candidate Therapy Discovery

During the initial development of new therapies, there is usually much information available on the chemical, biological and toxicological properties of candidate cultures, antibodies etc. Combined with information on origin and other background data, principal component analysis (PCA) gives the development scientist a key data mining tool, not only to classify candidates with similar properties and characteristics, but also to discover unique classes that may be better suited to the treatment of specific conditions.

PCA provides a visual map of the sample groupings, allowing for the more efficient selection of real candidate therapies, but it also provides a map of the input variables and their relationships that cause the samples to group the way they do. Figure 2 provides an example of the outputs of a PCA in the form of the Scores and Loadings plots. The Scores provide a map of the samples and the Loadings provide a map of the input variables.

Figure 2. Scores and Loadings plots for a candidate selection study. The Scores and Loadings plots clearly show how different samples cluster according to similar characteristics enabling faster identification of the important discriminators between classes.

In the example above, Source 1 samples (blue) have high amounts of impurities whereas Source 3 samples (green) have the highest cell count. As a rule of thumb, variables located outside the inner ellipse in the Loadings plot are regarded as important when interpreting clusters in the Scores plot.

PCA (or more generally MVA) applied to this kind of data is sometimes referred to as quantitative structure activity relationships (QSAR) and has helped some companies to significantly reduce the time and effort required to isolate suitable candidates for further development.

Principal Component Analysis (PCA)

PCA is a method for analyzing variability in data. It does this by separating the data into principal components (PCs). Each PC contributes to explaining the total variability, with the first PC describing the greatest source of variability. The goal is to describe as much of the information in the system as possible with the fewest number of PCs and whatever is left can be attributed to noise (i.e. no information).
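To make the idea concrete, here is a small sketch that extracts the first principal component from a tiny data table using only the Python standard library. The variable names and values are invented for illustration, and the power iteration used here is just one simple way to find the dominant component; commercial MVA packages typically use other algorithms and also scale the variables before analysis.

```python
import math


def first_principal_component(rows, iterations=200):
    """Return (loadings, scores, explained_fraction) for the first PC."""
    n, p = len(rows), len(rows[0])
    # Mean-centre each variable (column)
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    X = [[r[j] - means[j] for j in range(p)] for r in rows]
    # Sample covariance matrix (p x p)
    cov = [[sum(x[a] * x[b] for x in X) / (n - 1) for b in range(p)]
           for a in range(p)]
    # Power iteration: repeated multiplication by the covariance matrix
    # converges to its dominant eigenvector (assumes the start vector is
    # not orthogonal to it, which holds for this data)
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(c * c for c in w))
        v = [c / norm for c in w]
    eigenvalue = sum(v[a] * sum(cov[a][b] * v[b] for b in range(p))
                     for a in range(p))
    total_variance = sum(cov[a][a] for a in range(p))
    # Scores: projection of each centred sample onto the loadings
    scores = [sum(x[j] * v[j] for j in range(p)) for x in X]
    return v, scores, eigenvalue / total_variance


# Hypothetical measurements (arbitrary units): cell count, titre, impurity
data = [
    [2.0, 1.9, 0.5],
    [4.0, 4.1, 0.4],
    [6.0, 6.2, 0.6],
    [8.0, 7.8, 0.5],
    [10.0, 10.0, 0.5],
]

loadings, scores, explained = first_principal_component(data)
print("loadings:", [round(c, 3) for c in loadings])
print("scores:", [round(s, 3) for s in scores])
print("fraction of variance explained by PC1:", round(explained, 4))
```

Because the first two hypothetical variables rise and fall together, a single PC captures almost all of the variability. The scores locate each sample along that component and the loadings show which variables drive it.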

Formulation of suitable products

Stabilising the candidate into a suitable matrix for manufacturing and delivery is best approached using DoE, and in particular, excipient screening and mixture designs. Excipient screening designs allow the formulation scientist to select the best components that will preserve the nature of the candidate, while mixture designs allow for the development of the best combination that will not only stabilise the candidate, but also protect it during subsequent manufacturing processes. Figure 3 provides an example of the output from a mixture design, allowing the formulation scientist to fine tune the product to meet exact requirements.

Figure 3. Using Response Surfaces to optimize formulations. Response surfaces are often used in Design of Experiments projects to give a map of the optimal parameter space to ensure the best formulation.

In the example above, an analysis of a mixture design in a series of formulation experiments reveals the change in product quality as a function of the proportions of the compounds in the mixture. The gradient shows that viscosity is lowest in the upper left part of the response surface, as seen by the dark blue shading. The dark orange shading in the center shows where viscosity is highest.
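The source does not state which mixture design was used in this example. As one common possibility, the sketch below generates a simplex-lattice design, in which each component's proportion is a multiple of 1/degree and the proportions of every blend sum to 1. The excipient names are hypothetical.

```python
from fractions import Fraction
from itertools import product


def simplex_lattice(n_components, degree):
    """All blends whose proportions are multiples of 1/degree and sum to 1."""
    points = []
    for combo in product(range(degree + 1), repeat=n_components):
        if sum(combo) == degree:
            points.append(tuple(Fraction(c, degree) for c in combo))
    return points


# Hypothetical excipients for a three-component formulation study
components = ["sucrose", "mannitol", "glycine"]

design = simplex_lattice(len(components), degree=2)
for blend in design:
    print({name: float(share) for name, share in zip(components, blend)})
```

With three components and degree 2 this gives six blends: the three pure components and the three 50:50 binary mixtures. A response such as viscosity would be measured for each blend and a response surface fitted to the results.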

Clinical Trials

Clinical trials have traditionally been the domain of univariate statistical approaches (in particular clinical statistics) where statistical significance is assessed for parameters such as efficacy and major side effects. The tools of MVA can be used to complement the findings generated by clinical trial statistics to further confirm and accelerate key findings through this phase of product development.

The ability to incorporate demographics, age, sex and patient history into predictive or exploratory models is a unique feature of the MVA method, and approaches such as the L-PLS model can provide an overall picture of the patient groups, disease markers and the candidate properties to better assess the effect of the therapy on specific patient groups. Figure 4 provides an example of the L-PLS model structure and an example output.


The ‘L’ shaped PLS (Partial Least Squares) model is an extension of the PLS method which allows three data tables to be analysed simultaneously. It is a highly investigative tool for analysing population data. L-PLS first finds the correlation between the Y-reference data and some external data collected on the system (e.g. demographics in sensory data or chemical band assignments in spectroscopic data), so that important information is captured from Y-reference that may better model with X.

Figure 4. The L-PLS model and its potential for clinical trial data analysis. The L-PLS model provides an insightful ‘map’ of how different variables and characteristics relate to each other. Variables in green describe the background information of the patients, the variables in blue are the side effects of the formulations (the actual formulations in light blue) and the red dots indicate patient groups. This combined plot is the most informative way of displaying the relationship between the three data tables depicted in the frame above.

Manufacture and control

MVA tools for monitoring and controlling bioprocesses have helped manufacturers worldwide make significant cost savings through proactive quality control. During the scale up and technology transfer of a process from R&D to full scale manufacturing, the use of DoE is a critical strategy for assessing the effect of changing process and equipment variables. This allows the definition of the design space, which defines the most effective control strategy for the process.

Multivariate statistical process control (MSPC) uses multivariate exploratory and predictive models and integrates them into the entire data collection and process control system. This allows manufacturers to be more innovative in their approach to quality, combining in-line process analytics into single or holistic process models that better assess the quality of production than single measurements in isolation. Two particular processes that are commonly used in biotherapy manufacture are fermentation and lyophilisation. Some applications of MVA for these processes are discussed in the following sections.

Univariate analysis

Univariate analysis is the simplest form of quantitative (statistical) analysis. The analysis is carried out with the description of a single variable and its attributes of the applicable unit of analysis. Univariate analysis is also used primarily for descriptive purposes, while multivariate analysis is geared more towards explanatory purposes. (Source: Wikipedia)


Advantages of Multivariate Analysis over Univariate analysis

The complex natural processes in biotechnology often have many interrelated variables, making it necessary to sample, observe, study or measure more than one variable simultaneously to understand a process or set of samples. Univariate statistics are limited by only looking at one variable at a time. Crucially, they often fail to detect the relationships that exist between variables because they treat all variables as independent of each other. For more information please see Figure 6.

MVA for fermentation batch monitoring

For many years manufacturers have been challenged with the development of suitable models for monitoring the progress of batch processes, fermentation being one such process. Batch models aim to establish a process trajectory and associated limits around the trajectory that define the bounds of acceptable product quality.

Methods exist that unfold batch data and use so-called ‘maturity indices’ to model the process. However, the major drawback of these methods is that they assume linear relationships in the process. This assumption is fundamentally incorrect, so these methods have only partially solved the batch problem.

Other approaches use time warping to distort the time scale and align batch trajectories. These approaches are also fundamentally flawed, as they distort the chemistry or biology of the system and hence do not describe the true state of the process.

Relative Time Mapping (RTM) addresses the shortcomings of the methods described above by keeping the chemistry/biology of the system intact, while at the same time providing the usual batch trajectory plots and associated diagnostics that have become synonymous with batch analysis. Figure 5 shows some typical outputs from an RTM batch modelling process.

Figure 5. Relative time mapping (RTM) for batch monitoring. RTM gives a more realistic and accurate picture of batch behaviour, essential for understanding complex biotechnology processes.

The example above shows the time-dependent change in a batch process, modelled for a number of historical batches by applying the relative time mapping algorithm. The model is independent of the actual time and frequency of sampling, and allows new batches to be monitored on the biological time scale. The plot shows that, at the start of monitoring, the new batch has not yet reached the common starting point for the historical batches, shown by point ‘0’ on the ‘X’ axis. However, as the batch evolves further, it follows the trajectory, stays within the red confidence interval lines, and ends up in the “sweet spot”, i.e. with product quality inside the specifications.

Relative time mapping (RTM) for batch monitoring

Traditional online batch monitoring solutions using ‘maturity indices’ assume linear relationships in the process, which is fundamentally incorrect. Relative Time Mapping (RTM) addresses the shortcomings of traditional batch monitoring methods by keeping the chemistry/biology of the system intact while also providing batch trajectory plots and associated diagnostics.

Whether batch models or traditional statistical process control (SPC) charts are used to assess the progress of a bioprocess, there are many diagnostics available in multivariate models that can be used to determine the onset of process failure.

The term early event detection (EED) is being increasingly used to describe the application of multivariate statistical process control for the detection of process faults. The diagnostics from these models can be fed back into the manufacturing control systems using protocols such as OPC to automate process adjustments and therefore maximise the quality of the final product. Figure 7 provides a schematic of such a system.

Multivariate Statistical Process Control (MSPC)

MSPC is fundamentally similar to traditional SPC, with the advantage of using powerful multivariate statistics which give a more holistic view of the process. As most processes involve several variables, MSPC is often better suited than basic SPC approaches. Additionally, multivariate process control can visualize all variables on one or two control charts, rather than many charts, simplifying the job for process operators and engineers.

In many processes, the variables have important interactions affecting the outcome (e.g. final product quality) which cannot be detected by traditional univariate statistical process control charts. An example of this is shown in Figure 6.


Figure 6. Comparing univariate and multivariate views of a simple process involving only two variables, temperature and pH. The process appears to be within specification limits when examining two separate univariate control charts (temperature control chart and pH control chart). When switching to a multivariate view, however, a fault in the process can be clearly observed outside the limits.

  • Only with multivariate analysis can the fault be detected
  • The univariate limits are too wide to detect a multivariate fault
  • The two variables under consideration are not independent
  • The “sweet spot” is defined by the ellipse
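The situation described for Figure 6 can be reproduced with a small numerical sketch. The code below uses Hotelling's T² statistic, a standard multivariate control statistic whose limit traces an ellipse for two variables; the source does not say which statistic underlies its figure, so this choice is an assumption. The temperature and pH readings are invented, and the control limit uses a large-sample chi-square approximation rather than the exact F-based limit.

```python
import math


def within_3_sigma(values, x):
    """Univariate check: is x within mean +/- 3 standard deviations?"""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return abs(x - mean) <= 3 * sd


def hotelling_t2(reference, point):
    """Hotelling's T^2 of a two-variable point relative to reference data."""
    n = len(reference)
    mx = sum(r[0] for r in reference) / n
    my = sum(r[1] for r in reference) / n
    sxx = sum((r[0] - mx) ** 2 for r in reference) / (n - 1)
    syy = sum((r[1] - my) ** 2 for r in reference) / (n - 1)
    sxy = sum((r[0] - mx) * (r[1] - my) for r in reference) / (n - 1)
    det = sxx * syy - sxy * sxy
    dx, dy = point[0] - mx, point[1] - my
    # d' S^-1 d written out for the 2 x 2 covariance matrix S
    return (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det


# Hypothetical in-control readings (temperature in deg C, pH); the two
# variables are strongly positively correlated
reference = [
    (36.0, 6.8), (36.5, 6.9), (37.0, 7.0), (37.5, 7.1), (38.0, 7.2),
    (36.2, 6.9), (37.8, 7.1), (36.8, 6.9), (37.2, 7.1), (37.0, 7.0),
]
temps = [r[0] for r in reference]
phs = [r[1] for r in reference]

# New observation: high temperature with low pH, against the usual pattern
fault = (37.8, 6.85)

# 99% quantile of the chi-square distribution with 2 degrees of freedom
# (closed form for 2 df; a large-sample approximation to the T^2 limit)
limit = -2.0 * math.log(1 - 0.99)

univariate_ok = within_3_sigma(temps, fault[0]) and within_3_sigma(phs, fault[1])
multivariate_fault = hotelling_t2(reference, fault) > limit
print(univariate_ok, multivariate_fault)  # prints: True True
```

Each reading looks acceptable on its own chart, yet the combination breaks the correlation between temperature and pH and falls well outside the multivariate limit.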

Quality Control Applications

Although initiatives such as process analytical technology (PAT) have been used by many manufacturers globally to assess product and process quality at the point of manufacture, not every process measurement can be replaced at the point of manufacture. Quality control (QC) operations are still vital in the final release stage of some, if not all, products.

Due to the high variability in many biological assays, DoE and MVA can be used to design and refine the analytical methods used in the QC laboratory and have been successfully applied to the optimisation of chromatographic methods, the refinement of sampling procedures and the analysis of complex data produced by mass spectrometers.

Another advantage of combining spectroscopic analysis with MVA methods is in stability studies. Since near infrared (NIR) spectroscopy is non-destructive and is sensitive to changes in the product and its matrix, the same sample can be assessed over the entire timeframe of the study. Where applicable, this avoids the destruction of product, and the results are completely representative as the same sample is being assessed each time.

MVA for the assessment of lyophilised product quality

Near infrared (NIR) spectroscopy has been used for many years with multivariate predictive and exploratory models for the rapid, non-destructive assessment of product quality. One common application of the NIR method is the quantitative analysis of residual moisture in lyophilised products.

Lyophilisation is a common method used in the manufacture of biopharmaceutical products as it uses low temperatures to remove residual moisture, thus preserving the structure of the active components and allowing their storage at room temperature. The traditional method of analysis for residual moisture in lyophilised product is Karl Fischer (KF) titration, which is a destructive test and can only be applied to a small number of samples.

Replacement of the KF method with NIR not only results in non-destructive testing, but also allows for 100% inspection systems to be put in place. These systems use MVA predictive models to transform the NIR spectrum into a single value for residual moisture (or other properties) and are used to accept and reject product as it is being manufactured.
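As a rough sketch of how a predictive model turns a spectrum into a single moisture value, the code below fits a one-component PLS regression using only the Python standard library. Everything in it is an illustrative assumption: the four-point "spectra" are synthetic, constructed so that absorbance changes exactly linearly with moisture, and a real NIR calibration would use many more wavelengths, spectral pre-processing, more components and proper validation against reference KF values.

```python
import math


def fit_pls1_one_component(X, y):
    """Fit a single-component PLS1 model; returns (x_mean, y_mean, w, q)."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    yc = [v - y_mean for v in y]
    # Weight vector: spectral direction with maximum covariance with y
    w = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(p)]
    norm = math.sqrt(sum(c * c for c in w))
    w = [c / norm for c in w]
    # Scores of the calibration samples and regression of y on the scores
    t = [sum(Xc[i][j] * w[j] for j in range(p)) for i in range(n)]
    q = sum(ti * yi for ti, yi in zip(t, yc)) / sum(ti * ti for ti in t)
    return x_mean, y_mean, w, q


def predict(model, spectrum):
    """Predict the response for one new spectrum."""
    x_mean, y_mean, w, q = model
    score = sum((s - m) * wj for s, m, wj in zip(spectrum, x_mean, w))
    return y_mean + q * score


def synthetic_spectrum(moisture):
    """Hypothetical 4-wavelength spectrum that varies linearly with moisture."""
    baseline = [0.20, 0.35, 0.50, 0.30]
    water_profile = [0.00, 0.02, 0.10, 0.01]
    return [b + moisture * k for b, k in zip(baseline, water_profile)]


# Calibration set: reference moisture values (% w/w) and matching spectra
moisture_reference = [0.5, 1.0, 1.5, 2.0, 3.0]
spectra = [synthetic_spectrum(m) for m in moisture_reference]

model = fit_pls1_one_component(spectra, moisture_reference)
predicted = predict(model, synthetic_spectrum(2.5))
print(round(predicted, 3))
```

In an inspection system, the predicted value would be compared with a release specification to accept or reject each unit.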

In one case, a biopharmaceutical manufacturer saved approximately $1 million by using the NIR method combined with PCA to validate the performance of a new freeze dryer. They also developed a quantitative partial least squares regression (PLSR) model to replace the KF method in the laboratory. This method saves them $1,000 per sample and provides more confidence when releasing the batch to market.


Applications of MVA in Quality Control

  • Non-destructive quality checks
  • More representative sampling
  • 100% real-time spectroscopic inspection

Applying advanced analytics across the product lifecycle: Putting it all together

MVA and DoE are fast becoming essential tools for all process development and monitoring applications. Bioprocesses provide an excellent, but challenging application area. Modern manufacturing execution systems and control platforms produce a massive amount of data that requires the tools of MVA to fully ‘data mine’ the most important information and make real-time quality decisions.

From raw material analysis to final product release, MVA models can be integrated into the total quality management system (QMS) allowing manufacturers to gain the benefits of the Quality by Design (QbD) initiative.

Figure 7 provides an overall schematic on how MVA can be applied within an existing manufacturing plant. By implementing such procedures the full economic and cost saving benefits can be achieved from the discovery and development phase, to scale up, manufacturing and final release.

Multivariate analysis software can be seamlessly integrated into different areas of a biotech company to provide better understanding, process and quality control

The scalability and flexibility of today’s software systems allow powerful analytical tools to be integrated into existing systems such as MES/ERP systems, process equipment and scientific instruments (e.g. spectrometers). This enables faster, more informed decision making from the laboratory to the shopfloor and quality department.


Figure 7. Overview of how MVA and DoE software can be used across different operational areas.
