
Clinical Trial Data Analysis

-all roads lead to Rome

The primary objective of most clinical trial data analyses is to answer a simple question: is the treatment effective and safe? How many products have been approved by regulatory bodies on the strength of very complex statistical analyses? Probably very few. Having to resort to complex statistical analysis often indicates poor data, weak results or flaws in the protocol. However, the exploratory part of a clinical trial data analysis can be scientifically valuable even when its results are not submitted to regulators. It may involve a variety of data mining techniques to extract as much information from the data as possible.

Whether the task is a simple significance test, an effect size calculation or a complex data mining exercise, the most important requirement is knowledge of clinical trials and a solid understanding of statistical concepts. No tool can replace that knowledge. Powerful tools combined with inadequate knowledge can lead to erroneous conclusions or wasted effort.
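As an illustration of the simplest end of that range, the sketch below computes Cohen's d (the difference in means divided by a pooled standard deviation) and Welch's t statistic in Python using only the standard library. The function names and the data are hypothetical. Turning t into a p-value requires the t distribution and is left to a statistics package.

```python
from math import sqrt
from statistics import mean, stdev


def cohens_d(treatment, control):
    """Cohen's d: difference in means divided by the pooled sample SD."""
    n1, n2 = len(treatment), len(control)
    s1, s2 = stdev(treatment), stdev(control)
    pooled_sd = sqrt(((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2))
    return (mean(treatment) - mean(control)) / pooled_sd


def welch_t(treatment, control):
    """Welch's t statistic (no equal-variance assumption); p-value not computed."""
    se = sqrt(stdev(treatment) ** 2 / len(treatment)
              + stdev(control) ** 2 / len(control))
    return (mean(treatment) - mean(control)) / se


# Hypothetical change-from-baseline scores for two treatment arms.
treatment = [5.0, 6.0, 7.0, 8.0, 9.0]
control = [3.0, 4.0, 5.0, 6.0, 7.0]

print(f"Cohen's d = {cohens_d(treatment, control):.2f}")
print(f"Welch t   = {welch_t(treatment, control):.2f}")
```

With these made-up values the arms differ by 2 points, giving d of about 1.26 and t of 2.00.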

This note aims to show that researchers with sufficient clinical and statistical knowledge can use a variety of tools to achieve their data analysis goals. Whichever tool is used, it is extremely important that the data source have a sensible structure and be easily accessible. Too many data systems have data structures so convoluted that it is nearly impossible for third-party tools to access the data directly for analysis.
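To make "sensible structure" concrete, here is one possible layout, sketched in Python against an in-memory SQLite database that stands in for a production database server (an assumption made only so the example runs anywhere). The table and column names are hypothetical. Lab results are stored in a long format, one row per subject, visit and test, so adding a new lab test requires no schema change, and any tool that can issue SQL can retrieve the data with a plain query.

```python
import sqlite3

# In-memory SQLite stands in for MS SQL Server, MySQL or Oracle (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE subject (
    subject_id TEXT PRIMARY KEY,
    arm        TEXT NOT NULL          -- randomized treatment arm
);
CREATE TABLE lab_result (
    subject_id TEXT NOT NULL REFERENCES subject(subject_id),
    visit      INTEGER NOT NULL,      -- visit number
    test_code  TEXT NOT NULL,         -- hypothetical codes such as 'ALT'
    value      REAL,
    PRIMARY KEY (subject_id, visit, test_code)
);
""")
conn.executemany("INSERT INTO subject VALUES (?, ?)",
                 [("S01", "Active"), ("S02", "Placebo")])
conn.executemany("INSERT INTO lab_result VALUES (?, ?, ?, ?)", [
    ("S01", 1, "ALT", 22.0),
    ("S01", 2, "ALT", 25.0),
    ("S02", 1, "ALT", 30.0),
    ("S02", 2, "ALT", 28.0),
])

# A third-party tool needs only a simple parameterized query to get the data.
rows = conn.execute(
    "SELECT subject_id, visit, value FROM lab_result "
    "WHERE test_code = ? ORDER BY subject_id, visit",
    ("ALT",),
).fetchall()
print(rows)
conn.close()
```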

For this note, it is assumed that data are stored on one of the widely deployed database servers (e.g. MS SQL Server, MySQL, Oracle). The following are examples of data analysis tools; there are many other similar tools.

  • SAS may be the most popular and most expensive data analysis tool for clinical trials. It has a steep learning curve. It has interfaces for all popular database servers, so its users can access the data directly within SAS without resorting to data export.
  • STATISTICA, with its floating single-user network license priced at around $2.5K, is a serious alternative to SAS. Its friendly graphical user interface allows users to start analyzing data without learning any programming. For advanced users, its R language integration can be handy, and STATISTICA Visual Basic is powerful for automating data analysis tasks, pre- or post-analysis data processing, and presentation. It can access any database server that supports OLE DB or ODBC, such as Oracle and SQL Server.
  • IBM SPSS, with an annual subscription priced at $1.5K, is another affordable alternative to SAS. Its seamless integration with R and Python may appeal to those who already know R or Python. It can connect to database servers via ODBC.
  • Minitab is much better known in the Six Sigma world than in clinical research, but there are clinical trials whose statistical analyses are done solely with Minitab. It has a user-friendly interface and requires little programming. It is similar to STATISTICA to a certain degree, but a bit lighter in almost all aspects. It also supports ODBC for accessing database servers. Its annual license fee is about $500 per user, with a minimum of 5 users.
  • Excel may not be sufficient as the sole tool for a clinical study's data analysis, but it may be adequate for the majority of quick ad-hoc analyses and reports. Almost everyone knows how to use Excel and generate nice charts from spreadsheet data, but not many know that Excel has numerous statistical functions, including those for all the major distributions such as normal, t, chi-square and F. It takes only a few clicks to access data in a variety of formats: database servers, Access, text, XML, etc. The free OpenOffice.org Calc has a number of statistical functions and can access database servers too.
  • Access is probably the least used of the Office applications. Though it can store data in its own database engine, its power lies in its ability to manage and consume data from other major database servers directly. It provides a user-friendly interface for querying data without writing any scripts. One can create complex queries involving grouping, joining, filtering and other operations that are practically impossible to do in spreadsheet applications like Excel. Even for users well versed in SQL, the database query language, Access is often the preferred way to query data, do light analysis and generate charts quickly. Its counterpart in the free OpenOffice.org suite is Base.
  • SSRS (SQL Server Reporting Services) is probably an under-utilized feature of Microsoft SQL Server; many organizations that use SQL Server never use it. SSRS allows web-based reports to be built by drag-and-drop. A nice-looking report with tables and charts can be created in minutes and shared on the web. Though SSRS is part of MS SQL Server, it supports OLE DB and ODBC, so it can access other database servers such as Oracle. Oracle's counterpart of SSRS is Oracle Reports Services.
  • Business intelligence tools such as Oracle Business Intelligence Enterprise Edition and Microsoft SQL Server Analysis Services (SSAS) are meant primarily for large-scale data analyses such as those of e-tailers (e.g. Amazon, Netflix) and stock markets. Oracle Business Intelligence Suite Enterprise Edition costs $300K. The least expensive edition of SQL Server that includes SSAS costs about $7K. For those who already use these tools, some of their algorithms (e.g. association, neural network) can be handy for mining large amounts of clinical data for exploratory purposes.
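The grouping, joining and filtering mentioned for Access can also be written in plain SQL, which many of the tools above can send to a database server. The sketch below again uses an in-memory SQLite database as a stand-in, with hypothetical table names and made-up scores, to compute the number of subjects and the mean score per treatment arm at one visit. Access's own SQL dialect differs in some details, so this illustrates the query logic rather than text to paste into Access.

```python
import sqlite3

# In-memory SQLite stands in for a production database server (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE subject (
    subject_id TEXT PRIMARY KEY,
    arm        TEXT NOT NULL
);
CREATE TABLE efficacy (
    subject_id TEXT NOT NULL REFERENCES subject(subject_id),
    visit      INTEGER NOT NULL,
    score      REAL
);
""")
conn.executemany("INSERT INTO subject VALUES (?, ?)", [
    ("S01", "Active"), ("S02", "Active"), ("S03", "Placebo"), ("S04", "Placebo"),
])
conn.executemany("INSERT INTO efficacy VALUES (?, ?, ?)", [
    ("S01", 1, 5.0), ("S01", 4, 12.0),
    ("S02", 1, 6.0), ("S02", 4, 14.0),
    ("S03", 1, 5.0), ("S03", 4, 8.0),
    ("S04", 1, 7.0), ("S04", 4, 9.0),
])

# Join subjects to scores, keep a single visit, and summarize by arm.
query = """
SELECT s.arm, COUNT(*) AS n, AVG(e.score) AS mean_score
FROM subject AS s
JOIN efficacy AS e ON e.subject_id = s.subject_id
WHERE e.visit = ?
GROUP BY s.arm
ORDER BY s.arm
"""
summary = conn.execute(query, (4,)).fetchall()
for arm, n, mean_score in summary:
    print(f"{arm}: n={n}, mean={mean_score:.1f}")
conn.close()
```

With these made-up scores the query prints n=2, mean=13.0 for Active and n=2, mean=8.5 for Placebo.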

In summary, many tools can be used alone or in combination for clinical data analysis, and a sensible data structure is critical.