Computer as data analysis in Preclinical development

Unit - V  Computer as data analysis in Preclinical development


  Scientists from many different disciplines participate in pharmaceutical development. Their research areas may be very different, but they all generate scientific data (and text documents), which are the products of development laboratories.
  Literally, truckloads of data and documents are submitted to the regulatory authorities in support of investigational and marketing authorization filings.
  For example, even a typical Investigational New Drug (IND) application requires around 50,000 pages of supporting documents. One way or another, every single data point has to go through the acquiring, analyzing, managing, reporting, auditing, and archiving process according to a set of specific rules and regulations. Needless to say, the wide use of computers has tremendously increased efficiency and productivity in pharmaceutical development. 
  On the other hand, it has also created unique problems and challenges for the industry. This overview discusses these topics briefly by focusing on the preclinical development area (also known as the area of Chemistry, Manufacturing, and Controls, or CMC). Considering the pervasiveness of computer applications in every scientist’s daily activities, special emphasis is put on three widely used computer systems:
 • CDS—chromatographic data systems
 • LIMS—laboratory information management systems
 • TIMS—text information management systems
  It is probably fair to say that these three computer systems handle the majority of the work in data/document management in the preclinical area, supporting the New Drug Application (NDA) and Marketing Authorization Application (MAA) filings. For each of these three types of systems, there are many vendors who provide various products. The selection of the right product can be complicated, and a mistake made in the process can also be costly. This overview tries to list some of the vendors that are more focused on serving the pharmaceutical industry. The lists are by no means comprehensive. The readers are encouraged to contact the vendors for more in-depth information.
It may also be beneficial to the reader if we define the sources of the scientific data in preclinical development. Some development activities that generate the majority of the data are:
 • Drug substance/drug product purity, potency, and other testing
 • Drug substance/drug product stability testing
 • Method development, validation, and transfer
 • Drug product formulation development
 • Drug substance/drug product manufacturing process development, validation, and transfer
 • Master production and control record keeping
 • Batch production and control record keeping
 • Equipment cleaning testing
  Another important aspect for discussion is the impact of regulations, specifically the regulation on electronic document management and electronic signatures, 21 CFR Part 11, published by the Food and Drug Administration (FDA) for the first time in 1997 [1] (also see Chapter 26, which covers 21 CFR Part 11 in detail). Since that time the draft rules of Part 11 have been withdrawn and reissued along with various guidance documents [2–3]. Some of the key points of Part 11 are as follows:
• Computer systems must be validated to ensure accuracy, reliability, and consistency with intended performance.
• Computer systems must provide time-stamped audit trails to record actions that create, modify, or delete electronic records.
• Computer system access must be limited to authorized personnel. 
• Computer systems should have configurable user capabilities.
Even though Part 11 has not yet been enforced by the FDA, the rules have impacted CDS, LIMS, and TIMS with regard to the architectural design and security of these systems.
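The time-stamped audit trail and restricted access listed among the Part 11 key points can be pictured with a small Python sketch. It is an illustration only, not a compliant or validated implementation: the class name `AuditedRecordStore`, the user name, and the record fields are all hypothetical and do not come from the regulation or from any vendor product.

```python
from datetime import datetime, timezone


class AuditedRecordStore:
    """Toy electronic-record store with access control and an audit trail."""

    def __init__(self, authorized_users):
        self.authorized_users = set(authorized_users)  # hypothetical user list
        self.records = {}      # record id -> current value
        self.audit_trail = []  # append-only list of audit entries

    def _log(self, user, action, record_id, old, new):
        # Every create/modify/delete is time-stamped and appended, never edited.
        self.audit_trail.append({
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "user": user,
            "action": action,
            "record_id": record_id,
            "old_value": old,
            "new_value": new,
        })

    def _check_access(self, user):
        if user not in self.authorized_users:
            raise PermissionError(f"{user} is not an authorized user")

    def create(self, user, record_id, value):
        self._check_access(user)
        if record_id in self.records:
            raise KeyError(f"{record_id} already exists")
        self.records[record_id] = value
        self._log(user, "create", record_id, None, value)

    def modify(self, user, record_id, value):
        self._check_access(user)
        old = self.records[record_id]
        self.records[record_id] = value
        self._log(user, "modify", record_id, old, value)

    def delete(self, user, record_id):
        self._check_access(user)
        old = self.records.pop(record_id)
        self._log(user, "delete", record_id, old, None)


store = AuditedRecordStore(authorized_users=["analyst1"])
store.create("analyst1", "assay-001", 99.2)
store.modify("analyst1", "assay-001", 99.4)
for entry in store.audit_trail:
    print(entry["timestamp"], entry["user"], entry["action"], entry["new_value"])
```

The old value is kept in each audit entry so that a modification never hides what was recorded before it. A real system would also have to protect the trail itself from tampering.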


   The importance of CDS is directly related to the roles that chromatography, particularly high-performance liquid chromatography (HPLC) and gas chromatography (GC), play in pharmaceutical analysis. HPLC and GC are the main workhorses in pharmaceutical analysis. In today’s pharmaceutical companies, development work cannot be done without HPLC and GC. CDS are also used for several other instrumental analysis technologies such as ion (exchange) chromatography (IC), capillary electrophoresis (CE), and supercritical fluid chromatography (SFC).

The Days Before CDS

  In the 1960s and early 1970s, chromatographs were relatively primitive and inefficient. Chromatographers had to use microsyringes for sample injection and stopwatches for measurement of retention times. The chromatograms were collected with a strip chart recorder. Data analysis was also performed manually. Peak areas were obtained by drawing a “best fit” triangle manually for each peak and then using the equation Area = ½Base × Height. At that time, the management of chromatographic data was essentially paper based and very inefficient.
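The manual triangle calculation can be compared with numerical integration in a short Python sketch. The Gaussian peak shape, its height and width, and the construction of the triangle from tangents at the inflection points are assumptions made for this illustration; the text only gives the formula Area = ½Base × Height.

```python
import math


def triangle_peak_area(base, height):
    """Manual 'best fit' triangle estimate: Area = 1/2 * Base * Height."""
    return 0.5 * base * height


def trapezoid_area(times, signal):
    """Numerical area under a sampled signal (trapezoidal rule)."""
    return sum(
        0.5 * (signal[i] + signal[i + 1]) * (times[i + 1] - times[i])
        for i in range(len(times) - 1)
    )


# Hypothetical Gaussian peak: height 1.0, sigma 0.1 min, centred at 5.0 min.
height, sigma, center = 1.0, 0.1, 5.0
times = [4.0 + 0.001 * i for i in range(2001)]  # 4.0 to 6.0 min
signal = [height * math.exp(-((t - center) ** 2) / (2 * sigma ** 2)) for t in times]

# Tangents drawn at the inflection points of a Gaussian meet the baseline
# 4 sigma apart and cross each other at 2*exp(-0.5) (about 1.213) times
# the peak height.
manual = triangle_peak_area(base=4 * sigma, height=2 * math.exp(-0.5) * height)
numeric = trapezoid_area(times, signal)
print(f"triangle estimate: {manual:.4f}, numerical area: {numeric:.4f}")
```

Under these assumptions the triangle slightly underestimates the true area, which hints at why automatic integration was such a welcome improvement.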
  However, compared with the traditional analytical methods, the adoption of chromatographic methods represented a significant improvement in pharmaceutical analysis. This was because chromatographic methods had the advantage of specificity, that is, the ability to separate and detect low-level impurities. Specificity is especially important for methods intended for early-phase drug development, when the chemical and physical properties of the active pharmaceutical ingredient (API) are not fully understood and the synthetic processes are not fully developed. The assurance of safety in clinical trials of an API therefore relies heavily on the ability of analytical methods to detect and quantitate unknown impurities that may pose safety concerns. This task was not easily performed, or simply could not be carried out, by classic wet chemistry methods. As a result, HPLC and GC slowly established their places as the mainstream analytical methods in pharmaceutical analysis.
   As chromatographic methods became more and more important in the pharmaceutical industry as well as in other industries, practical needs prompted instrument vendors to come up with more efficient ways for collecting and processing chromatographic data. In the mid-1970s, the integrator was introduced. At first, the integrator worked similarly to a strip chart recorder with the added capabilities of automatically calculating peak area and peak height. Because of limited available memory, chromatograms could not be stored for batch processing. However, new models with increasing capabilities quickly replaced the older ones. The newer models had a battery back-up to maintain integration parameters and larger memory modules to allow the storage of chromatograms for playback and reintegration. At that time, the integrator increased productivity and efficiency in pharmaceutical analysis, which in turn made HPLC and GC even more popular.

The Emergence and Evolution of CDS

   For some instrument vendors, the early CDS were developed as proprietary products to help with the sale of instruments. The first generation of CDS was based on a working model of multiuser, time-sharing minicomputers. The minicomputers were connected to terminals in the laboratory that the analysts would use. The detector channels of the chromatographs were connected to the data system through a device called the analog-to-digital (A/D) converter, which would convert the analog signals from the detectors into digital signals. In the late 1970s, Hewlett-Packard introduced the HP-3300 series data-acquisition system. Through the A/D converters, the HP system was able to collect chromatographic data from up to 60 detector channels. This represented the beginning of computerized chromatographic data analysis and management [5].
Drawbacks of the first-generation CDS:
 • Because the CDS used a dedicated hardware and wiring system, it was relatively expensive to install.
 • It was difficult to scale up because more minicomputers would be needed as the number of users increased.
 • The performance of the system would degrade as the number of users increased.
   The next generation of CDS did not appear until the start of the personal computer (PC) revolution in the 1980s. The early PCs commercialized by Apple and IBM were not very reliable or powerful compared with today’s PCs. The operating systems were text based and difficult to use. However, it was economically feasible to put them on the desktop in each laboratory, and they were evolving rapidly to become more powerful in terms of hardware and software. By the early 1990s, PCs were reaching the calculation speed of a minicomputer at a fraction of the cost. A graphics-based operating system also made them more user-friendly.
   Taking advantage of the PC revolution, a new generation of CDS appeared on the market that utilized a client/server model. In the new CDS, the client provided the graphical user interface through a PC and was responsible for some or most of the application processing. The server typically maintained the database and processed requests from the clients to extract data from or update the database. This model was adopted widely in the industry for almost a decade because of its scalability, which overcame the scale-up problem of the minicomputer-based systems. It also facilitated the activities of data sharing, method transfer, result review and approval, and troubleshooting at different laboratories and locations.
   During this period of time, in parallel with the progress in CDS, chromatography itself was developing rapidly. Instrumentation had adopted modular design so that each functional part became more reliable and serviceable. Progress in microelectronics and machinery made the solvent delivery pump more accurate and reproducible. The accuracy and precision of autosamplers also were significantly improved. Compared with the time when chart recorders or integrators were used, the fully automated HPLC could now be programmed to run for days and nights nonstop. Results could also be accessed and processed remotely. With the help of sophisticated CDS, chromatography finally established its dominance in pharmaceutical analysis.
   As instrumental analysis played an increasingly important part in pharmaceutical development, an ever-larger percentage of the data in Good Manufacturing Practice and/or Good Laboratory Practice (GMP/GLP) studies were captured and stored electronically. As CDS became more sophisticated, new functions such as electronic approval became available. However, the legal issues related to electronic signatures needed to be addressed and recognized by the regulatory authorities. To clarify the confusion and provide clear guidelines regarding electronic data, the FDA issued 21 CFR Part 11 rules to address concerns regarding the electronic media of scientific data. With respect to the FDA’s expectations, the CDS operated with the client/server model had a significant drawback. In the client/server model, the client must retain parts of the applications. To fulfill the requirements of system qualification, performance verification, and validation, one must validate not only the server, but also each PC used by the client. This created an enormous burden for the customer, which resulted in the adoption of a new operating model of server-based computing.
   With server-based computing, the applications are deployed, managed, supported, and executed on a dedicated application server. Server-based computing uses a multiuser operating system and a method for distributing the presentation of an application’s interface to a client device. There are no software components installed on the client PC. The client’s PC simply acts as the application server’s display. CDS using this model significantly reduced the total cost in implementation and maintenance and significantly increased its compliance with regulatory guidelines.

The Modern CDS

   Use of server-based computing is only one of the important features of the modern CDS. The other two important features are the use of embedded data structure and direct instrument control. The earlier generations of CDS used a directory file structure, meaning that the raw data and other files such as the instrument method and data processing method were stored at separate locations. There would either be no connections or only partial connections between these files. The most significant drawback of this type of file management was the potential for methods and raw data to be accidentally overwritten. To prevent this from happening, the raw data and result files had to be locked. If the locked data needed to be reprocessed, the system administrator had to unlock the files. The embedded relational database, which has been widely used for LIMS, is a much better file structure. The embedded data structure can be used to manage not only chromatographic data, but also all aspects of the CDS, including system security and user privileges. It maintains all information and changes by date- and time-stamping them to prevent accidental overwriting of raw data and method files. It controls versions of all processed result files, acquisition methods, processing methods, and reporting methods to provide full audit trails. All of the metadata (acquisition, processing, and reporting methods) related to a specific result are tied together.
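The idea of an embedded data structure that versions methods instead of overwriting them, and ties each result to the method version that produced it, can be sketched with Python's built-in `sqlite3` module. The table and column names, the method name, and the parameter strings below are hypothetical; no commercial CDS schema is being reproduced here.

```python
import sqlite3
from datetime import datetime, timezone

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE processing_method (
        name       TEXT    NOT NULL,
        version    INTEGER NOT NULL,
        parameters TEXT    NOT NULL,
        saved_by   TEXT    NOT NULL,
        saved_at   TEXT    NOT NULL,
        PRIMARY KEY (name, version)
    )
    """
)
conn.execute(
    """
    CREATE TABLE result (
        sample_id      TEXT    NOT NULL,
        method_name    TEXT    NOT NULL,
        method_version INTEGER NOT NULL,
        peak_area      REAL    NOT NULL,
        FOREIGN KEY (method_name, method_version)
            REFERENCES processing_method (name, version)
    )
    """
)


def save_method(name, parameters, user):
    """Insert a new time-stamped version instead of overwriting the old one."""
    row = conn.execute(
        "SELECT COALESCE(MAX(version), 0) FROM processing_method WHERE name = ?",
        (name,),
    ).fetchone()
    version = row[0] + 1
    conn.execute(
        "INSERT INTO processing_method VALUES (?, ?, ?, ?, ?)",
        (name, version, parameters, user, datetime.now(timezone.utc).isoformat()),
    )
    return version


v1 = save_method("assay_integration", "threshold=50", "analyst1")
conn.execute(
    "INSERT INTO result VALUES (?, ?, ?, ?)",
    ("S-001", "assay_integration", v1, 1520.3),
)
v2 = save_method("assay_integration", "threshold=30", "analyst1")

# The earlier version is still present, and the result still points at it.
history = conn.execute(
    "SELECT version, parameters FROM processing_method ORDER BY version"
).fetchall()
print(history)
```

Because saving a method always inserts a new row, reprocessing with changed parameters cannot silently replace the settings behind an earlier result.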
   Direct instrument control (or the lack of it) was an important issue for the earlier versions of CDS. The scheme of connecting the detector channels through A/D converters to the CDS worked well in analytical laboratories across the pharmaceutical industry. The scheme provided enough flexibility so that the CDS could collect data from a variety of instruments, including GC, HPLC, IC, SFC, and CE. It was equally important that the CDS could be connected to instruments that were manufactured by different vendors. It was not uncommon to find a variety of instruments from different vendors in a global pharmaceutical research company. The disadvantage of this scheme was that the instrument metadata could not be linked to the result file of each sample analyzed, so it could not be guaranteed that the proper instrument parameters were used in sample analysis. Another need came from the increased use of information-rich detectors such as photodiode array detectors and mass spectrometer (MS) detectors. To use these detectors in the GMP/GLP environment, data security had to be ensured, but the data from these detectors could not be collected by CDS through A/D converters. This represented an important gap in reaching full compliance with the 21 CFR Part 11 regulations. In addition, the use of A/D converters inevitably introduced additional noise and nonlinearity. Direct instrument control would avoid these problems. To achieve it, the instrument vendors had to cooperate by providing each other with the source codes of their software. Some progress has been made in this area. A good example is that of the CDS Empower (Waters), which now can directly control HPLC and GC equipment manufactured by Agilent.


   CDS have certainly served the pharmaceutical industry well by being continuously improved. CDS have helped the pharmaceutical industry to increase efficiency and productivity by automating a large part of pharmaceutical analysis. But CDS still have room for improvement. So far the main focus of CDS has been on providing accurate and reliable data. The current regulatory trend in the pharmaceutical industry is to shift from data-based filings to information-based filings, meaning that the data must be analyzed and converted into information. This implies that enhancements in data searching and trend analysis capabilities will be desirable in the future.


Laboratory information management systems, or LIMS, represent an integral part of the data management systems used in preclinical development. LIMS are needed partly because CDS cannot provide enough data management capability. For example, CDS cannot handle data from nonchromatographic tests.
  Another important use of LIMS is for sample management in preclinical development, more specifically in drug substance and drug product stability studies. Stability studies are very labor intensive, and the results have an important impact on regulatory filings. LIMS are designed to automate a large part of these stability studies including sample tracking, sample distribution, work assignment, results capturing, data processing, data review and approval, report generation, and data archiving, retrieving, and sharing. 
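The sample-tracking side of a stability study can be pictured with a small Python sketch. The stage names are derived from the list of LIMS functions above, but the strictly linear order, the class name `StabilitySample`, and the sample identifier and storage condition are assumptions made for illustration, not features of any particular LIMS.

```python
from dataclasses import dataclass, field

# Stages taken from the list of LIMS functions in the text; the strict
# linear order used here is an assumption for illustration.
STAGES = [
    "logged",
    "distributed",
    "assigned",
    "results_captured",
    "reviewed",
    "approved",
    "archived",
]


@dataclass
class StabilitySample:
    sample_id: str
    condition: str           # storage condition label (hypothetical)
    time_point_months: int
    stage: str = STAGES[0]
    history: list = field(default_factory=lambda: [STAGES[0]])

    def advance(self):
        """Move the sample to the next stage; stages cannot be skipped."""
        index = STAGES.index(self.stage)
        if index == len(STAGES) - 1:
            raise ValueError(f"{self.sample_id} is already archived")
        self.stage = STAGES[index + 1]
        self.history.append(self.stage)
        return self.stage


sample = StabilitySample("DP-LOT1-3M", condition="40C/75%RH", time_point_months=3)
while sample.stage != "archived":
    sample.advance()
print(sample.history)
```

Keeping the full history on each sample is what lets a reviewer see, at any time, where a stability sample is and which steps it has already passed through.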

LIMS Hardware and Architectures

Commercial LIMS appeared on the market in the early 1980s. These operated on then state-of-the-art minicomputers such as the 16-bit Hewlett-Packard 1000 and 32-bit Digital VAX system. By the late 1980s, several DOS-based PC LIMS operating on the primitive PC network were available.
By the early 1990s, most LIMS started using commercial relational database technology and client/server systems, which operated on UNIX or the new Windows NT platform. The most advanced LIMS utilize server-based architecture to ensure system security and control.
   There are four main types of architectural options when implementing LIMS. The first is the LAN (local area network) installation. In a multiple-site situation and through the standard client/server setup, the application would be hosted separately on a server at each site connected to PC clients. In this setup, the LIMS are installed on both the clients and the server. System administration is required at each facility.
  The second type is the WAN (wide area network) installation. In this setup the LIMS take advantage of telecommunication technology to cover a great distance. The setup can also be used to connect disparate LANs together. In this configuration, the LIMS are installed on both the clients and a central server. The third type is the so-called “centrally hosted thin client installation.” For this setup, system administration is managed at a corporate center, where the LIMS are hosted and distributed via a WAN or the Internet with a virtual private network (VPN). The last and also the newest type is the ASP (application service provider)-hosted installation. In this setup, the LIMS are hosted on a centrally managed server farm and maintained by third-party specialists. Users access the LIMS with any Internet-connected PC with a standard Web browser.

Different Types of LIMS

 The implementation of LIMS requires a significant investment of capital and manpower. There are large numbers of established vendors that provide commercial LIMS with a similar range of core functionality, but few of them are dedicated to the pharmaceutical industry because of the market size. The following discussion is not intended to categorize different types of LIMS; rather, we briefly point out the most obvious characteristics of different LIMS. LIMS may possess certain distinctive features, but their core functionalities may be very similar.
 Customer-tailored LIMS—In an implementation of this type of LIMS, the customer purchases a generic product from the vendor. The vendor and customer will work together over a period of time to configure the software to adapt it to meet end user needs. This usually involves extensive programming, which can be performed by the trained end user or dedicated supporting personnel on the customer side. Programming support is usually needed for the entire life of the LIMS to accommodate changes in development projects. The advantage is that the LIMS functions relatively closely to the business practices of the customer and the system can be tailored to fit the needs of the customer’s development projects. The disadvantage is that it takes considerable resources to implement and maintain the LIMS.
 Preconfigured LIMS—This LIMS does not require extensive customer programming. To meet specific needs of end users, the vendors provide a comprehensive suite of configuration tools. These tools allow end users to add new screens, menus, functions, and reports in a rapid and intuitive manner. The tools also allow the LIMS to be more easily integrated with other business applications such as document processing, spreadsheets, and manufacturing systems.
 Specialized LIMS—This type of LIMS is based on the fact that certain laboratories have a range of well-defined processes (e.g., stability testing) that are performed according to a specific set of regulations and by using well-established tests. The tests are done according to industry-wide accepted protocols. Specialized LIMS are tailor-made for certain types of laboratories. Therefore the performance can be optimized for clearly defined work processes.
 LIMS as rented service—The application service provider (ASP) model is a means of obtaining access to software applications without the need to acquire expensive licenses and hardware or employ high-cost support resources. The application is hosted on a third-party site with system maintenance, backup, and recovery provided by a third party. Products and services can be rented for a contract period on a fixed cost per user/per month basis. The advantages of obtaining LIMS in this fashion include reduced cost in initial investment and reduced requirement of resources for maintaining the LIMS. The continued security and integrity of the data transferred over the Internet is a major concern for this type of LIMS.

Implementation of LIMS

  Because of their complexity, implementing LIMS usually is a traumatic process. Good communication and planning can reduce the level of turmoil caused by LIMS.
   Planning (defining expectations) is the first step in a lengthy process of acquiring the LIMS. The LIMS vendor and customer have to work very closely at this stage. A series of meetings must be held between the LIMS vendor and potential end users and laboratory supervisors. The business processes and sample flows need to be mapped and documented to prepare for future system configuration. For each type of sample to be tracked by the LIMS, the attributes related to the samples must be defined. Even the data format has to be decided so that it is consistent with existing procedures and practices of the organization. When the expectations are compiled and analyzed, it is important to balance the needs of the end users from different disciplines because they may have different concerns, priorities, and requirements. Mistakes made in the planning stage can be very costly later on over the life span of the LIMS.
  The LIMS for GMP/GLP use must be validated. Validation includes design qualification, installation qualification, operational qualification, performance qualification, and final documentation. Each of these steps needs good planning and documentation. The compliance function (QA) of the development organization will need to be involved in reviewing and approving the plan and in the audit of the final report. During validation, the system is tested against normal, boundary value, and invalid data sets. Invalid data should be identified and flagged by the software. Dynamic “stress” tests should also be done with large data sets to verify whether the hardware is adequate. The validation work usually is conducted on a test system that is an exact copy of the production system to protect the data integrity of the production system.
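Testing against normal, boundary value, and invalid data sets can be illustrated with a short Python sketch of a result-entry check. The function name `check_assay_result` and the 0 to 200 acceptance window are hypothetical choices for this example, not a regulatory or vendor limit.

```python
def check_assay_result(value, low=0.0, high=200.0):
    """Return (accepted, flag) for a percent-of-label-claim entry.

    The 0-200 range is a hypothetical acceptance window chosen for
    this sketch, not a regulatory or vendor limit.
    """
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        return False, "not a number"
    if value != value:  # NaN never equals itself
        return False, "not a number"
    if value < low or value > high:
        return False, "out of range"
    return True, "ok"


normal_cases = [98.5, 101.2]
boundary_cases = [0.0, 200.0]
invalid_cases = [-0.1, 200.1, "abc", None, float("nan")]

for label, cases in [
    ("normal", normal_cases),
    ("boundary", boundary_cases),
    ("invalid", invalid_cases),
]:
    for case in cases:
        print(label, repr(case), check_assay_result(case))
```

The point of the three groups is that values exactly on the limits are accepted, values just outside them are flagged, and entries that are not numbers at all are flagged rather than causing a failure.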
  One of the major undertakings during LIMS implementation is user training, which should cover not only the LIMS itself but also the standard operating procedures (SOPs) that govern use, administration, training, and other aspects of the LIMS. The training should be conducted on the test system instead of the production system. The trainers should keep in mind that the LIMS is one of the less user-friendly systems for end users because of its complexity and rigid audit trail setups. Adequate support after training and rollout may have a long-lasting impact on the success of the new LIMS. 


  LIMS is a complex system and requires significant capital and manpower investment. Selection of the right LIMS product is a daunting task, and the outcome can have a significant impact on the business.
  Compared with CDS, LIMS has more core functionalities in managing laboratory data and other electronic information. It also has much stronger search and reporting capabilities. It is interesting to point out that some LIMS vendors have started to use the term “data mining” in their product introduction brochures. This means that they are aware of a new trend in the pharmaceutical industry, especially in preclinical development, namely, toward a better understanding and control of data in pharmaceutical manufacturing. The FDA has issued a new Guidance on Process Analytical Technologies (PAT) [9], promoting the concepts of “quality by design,” “process understanding,” and “real-time assurance of quality.” These concepts may have a profound impact on how pharmaceutical development is conducted in the future. To put these concepts into practice will mean an explosion in the amount of scientific data, not only through standard testing such as HPLC and GC but also through nonstandard technologies such as near-infrared spectroscopy, Raman spectroscopy, various particle size analysis techniques, etc. More importantly, the data will need to be analyzed with new (e.g., chemometrics) tools to generate process/product information and knowledge. The current LIMS are not designed to handle large amounts of spectral data. We will have to see whether the core functionalities of LIMS can be expanded or totally new information management systems will have to be developed to meet the new challenges.


   The name “text information management system” is not as widely used as the name “laboratory information management system.” Nevertheless, a text document management system is essential in preclinical development because huge numbers of text documents and other related information such as images, drawings, and photographs are generated in the area. All these documents and information are considered intellectual property and require protection and easy access.
 One of the characteristics of the pharmaceutical industry is large quantities of paperwork, particularly in areas where GMP/GLP are strictly enforced. The slogan “documentation, documentation, and documentation . . .” is always in the mind of laboratory scientists.
 The scientists in preclinical development spend quite a large percentage of their working time writing compound documents (reports). The report generation, review, approval, filing, and retrieval process can be very inefficient or even bureaucratic in a pharmaceutical company, partly because of the strict regulations. The following scenario could be seen often as recently as the late 1980s: The scientist would prepare his report with one type or another of text and graphic software, often through multiple cut-and-paste procedures to include pictures or images. Then the scientist would make hard copies of the report for review by managers and the department head. After all the corrections were made, the scientist would print out another copy for the QA auditor for auditing (this is only done for the documents used for submission). It could take months before the report was finally ready to be filed in the company record center, where photocopies and microfilms were made and indexing took place. When an end user needed a copy of the report, he would have to make a request to the record center for a hard copy.
  When TIMS is used in today’s workflow, the scientist can use a report template to facilitate report writing. Some cut-and-paste procedures are still needed to include data and figures. After the draft report is completed, the scientist can send the reviewers an electronic link for the document. The reviewers can review the document and make changes and corrections with the “tracking change” function. When the review is completed, the author can choose to accept the changes or deny them. If auditing is needed, the same process can be used. The finalized document is issued within the TIMS by adding an issue date and signatures, if necessary, and converting into an unalterable PDF file. Future changes made after issuance are captured through version control. End users can also access the issued document electronically and remotely. Comparison of the new process vs. the old one has demonstrated the advantages of TIMS.
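The review, issuance, and version-control steps described above can be sketched in a few lines of Python. The class `ControlledDocument`, its status names, and the example title and date are hypothetical; the frozen copy stored at issuance merely stands in for the unalterable PDF and is not how any particular TIMS works.

```python
from dataclasses import dataclass, field


@dataclass
class ControlledDocument:
    title: str
    text: str
    version: int = 1
    status: str = "draft"             # draft -> in_review -> issued
    issued_versions: dict = field(default_factory=dict)

    def send_for_review(self):
        if self.status != "draft":
            raise ValueError("only a draft can be sent for review")
        self.status = "in_review"

    def apply_review(self, revised_text, accept):
        """The author accepts or rejects the reviewer's tracked changes."""
        if self.status != "in_review":
            raise ValueError("document is not in review")
        if accept:
            self.text = revised_text

    def issue(self, issue_date):
        if self.status != "in_review":
            raise ValueError("document must be reviewed before issuance")
        # Keep a frozen copy per version, standing in for the unalterable PDF.
        self.issued_versions[self.version] = (issue_date, self.text)
        self.status = "issued"

    def revise(self, new_text):
        """A change after issuance starts a new draft version."""
        if self.status != "issued":
            raise ValueError("only an issued document can be revised")
        self.version += 1
        self.text = new_text
        self.status = "draft"


doc = ControlledDocument("Stability report", "Draft text")
doc.send_for_review()
doc.apply_review("Reviewed text", accept=True)
doc.issue("2024-01-15")          # hypothetical issue date
doc.revise("Updated text")
print(doc.version, doc.status, doc.issued_versions[1])
```

Revising an issued document opens version 2 as a new draft while the issued text of version 1 remains retrievable, which is the essence of capturing post-issuance changes through version control.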

Documentation Requirements in Preclinical Development

In preclinical development, the GMP/GLP regulations are enforced not only for scientific data but also for text documents. This section discusses several types of controlled text documents used in preclinical development. Most of these documents are managed by the fully validated TIMS.
   Product specification documents and analytical test methods—In preclinical development, these are important documents and they evolve along with the development phases. Drug substances and products for clinical trials are tested based on these documents, and so are the stability samples. It is critical to ensure that the analyst will perform the right tests against the right specifications with the correct version of the test method. Therefore a mechanism must be in place to control these documents. This can be done manually or with TIMS. A manually controlled system would require the analyst to sign out hard copies of the documents from a central location. After the testing is done, the analyst would have to return these controlled documents to the central location. Sometimes the wrong documents are used by mistake, which results in repeated testing and unnecessary investigation. If TIMS is implemented, the analyst can obtain the documents from the secured database, and any copies obtained should be destroyed after the test is completed.
  Standard operating procedures (SOPs)—The SOPs are controlled in a way similar to that of specification documents and analytical methods. It must be ensured that the correct versions of the SOPs are accessed and used by the scientists. After use, the hard copies should be destroyed and disposed of properly. An added requirement is that the SOPs should be accessible during working hours without interruption. Hard copies should be available at a manageable location so that the SOPs are available when the electronic system is down. 
 Research reports—Research reports such as stability reports, method validation and transfer reports, and pharmaceutical development reports are key documents used for NDA/MAA filings. These documents are strictly version controlled.
 Laboratory notebooks—It may be debatable to consider laboratory notebooks as text documents, but they should be mentioned here because of their importance in preclinical development. Laboratory notebooks are used to record experimental procedures, observations, raw data, and other important information. Although laboratory notebooks are rarely used for submission to regulatory agencies directly, they are available for inspection by the authorities in the Preapproval Inspection (PAI) and other GMP/GLP-related inspections. Currently, most of the major pharmaceutical companies still use paper-based laboratory notebooks.

Current TIMS Products

   Various so-called Enterprise Content Management (ECM) systems are commercially available that can meet different end user requirements.
   TIMS used in preclinical text document management usually is a simplified version of ECM. At the highest enterprise platform level, ECM vendors include Documentum, FileNet, Interwoven, Stellent, and Vignette. At a lower level, the upper-tier products are provided by Day Software, FatWire, and IBM. For less costly products, there are Ingeniux, PaperThin, RedDot Solutions, and Serena Software. It should also be pointed out that the cost of acquiring and maintaining a fully validated TIMS is much higher than that of a non-GMP/GLP system. Therefore many of the non-GMP/GLP documents in early-phase development are managed with nonvalidated TIMS.


   TIMS has helped the pharmaceutical industry to improve efficiency in managing business-critical text documents. However, it is still a time-consuming process to write, review, audit, approve, and publish text documents for submission. The pharmaceutical industry is working toward making submissions electronically. However, this may take time, and the industry may need many changes in business practices to reach the goal.


1. FDA. “Code of Federal Regulations, Title 21 Food and Drugs, Part 11 Electronic Records; Electronic Signatures: Final Rule,” Fed Regist 62 (54), 13429–13466 (20 March 1997).
2. FDA. “Withdrawal of draft guidance for industry on Electronic Records; Electronic Signatures, Electronic Copies of Electronic Records,” Fed Regist 68 (23), 5645 (4 February 2003).
3. FDA. “Draft Guidance for Industry on ‘Part 11, Electronic Records; Electronic Signatures—Scope and Application;’ Availability of Draft Guidance and Withdrawal of Draft Part 11 Guidance Documents and a Compliance Policy Guide,” Fed Regist 68 (37), 8775–6 (25 February 2003).
4. Snyder LR, Kirkland JJ. Introduction to modern liquid chromatography, 2nd ed., New York, Wiley-Interscience, 1979.
5. Ahuja S, Dong MW. Handbook of pharmaceutical analysis by HPLC, Amsterdam, Elsevier Academic, 2005.
6. Thurston CG. LIMS/instrument integration computing architecture for improved automation and flexibility. Am Lab 2004; Sep. 15–19.
7. Tejero J, Fish M. Internet delivery of LIMS via the application service provider model. Am Lab 2002; Sep. 32–8.
