Clinical Trial Data Analysis
-all roads lead to Rome
The primary objective of most clinical trial data analyses is answering the simple
question: is the treatment effective and safe? How many products have been
approved by regulatory bodies based on very complex statistical data analyses? Probably
very few. The very fact of having to resort to complex statistical analysis
indicates poor data, weak results or protocol flaws. However, the exploratory part
of a clinical trial data analysis can be very valuable scientifically even though
the results are not submitted. It may involve a variety of data mining techniques
to extract as much information from the data as possible.
Whether the task is a simple test of statistical significance, an effect size
calculation or a complex data mining exercise, the most important requirement is
knowledge of clinical trials and a solid understanding of statistical concepts. Such
knowledge cannot be replaced by any tool. The combination of powerful tools
and inadequate knowledge can lead to erroneous conclusions or wasted effort.
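To make "simple statistical significance" and "effect size calculation" concrete, here is a minimal sketch using only the Python standard library. The function names and the blood pressure values are made up for illustration; a real analysis would follow the trial's statistical analysis plan and would normally use a validated statistics package.

```python
from math import sqrt
from statistics import mean, stdev


def cohens_d(treated, control):
    """Standardized mean difference using the pooled standard deviation."""
    n1, n2 = len(treated), len(control)
    s1, s2 = stdev(treated), stdev(control)
    pooled = sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
    return (mean(treated) - mean(control)) / pooled


def welch_t(treated, control):
    """Welch's t statistic and its approximate degrees of freedom."""
    n1, n2 = len(treated), len(control)
    v1 = stdev(treated) ** 2 / n1  # squared standard error, group 1
    v2 = stdev(control) ** 2 / n2  # squared standard error, group 2
    t = (mean(treated) - mean(control)) / sqrt(v1 + v2)
    # Welch-Satterthwaite approximation for the degrees of freedom
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return t, df


# Hypothetical data: change in systolic blood pressure (mmHg) per subject
treated = [-12.0, -9.5, -14.2, -8.1, -11.3, -10.4]
control = [-3.1, -5.0, -1.2, -4.4, -2.7, -6.0]

d = cohens_d(treated, control)
t, df = welch_t(treated, control)
print(f"Cohen's d = {d:.2f}, Welch t = {t:.2f}, df = {df:.1f}")
```

The arithmetic is the easy part. Deciding which comparison is meaningful, and whether the assumptions behind it hold, is where the clinical and statistical knowledge comes in.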
This note is intended to show that researchers armed with sufficient clinical and
statistical knowledge can employ a variety of tools to achieve their data analysis
goals. No matter which tool is employed, it is extremely important for the
data source to have a sensible structure and be easily accessible. Too many
data systems have data structures so convoluted that it is nearly impossible
for third-party tools to access the data directly for analysis.
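As one possible reading of "sensible structure", the sketch below stores subjects and their measurements in two plain tables that any SQL-capable tool could query directly. The table names, column names and values are hypothetical, and an in-memory SQLite database stands in for a real database server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for a real database server
conn.executescript("""
CREATE TABLE subject (
    subject_id INTEGER PRIMARY KEY,
    arm        TEXT NOT NULL          -- 'treatment' or 'placebo'
);
CREATE TABLE measurement (
    subject_id INTEGER NOT NULL REFERENCES subject(subject_id),
    visit      TEXT NOT NULL,         -- e.g. 'baseline', 'week12'
    test_name  TEXT NOT NULL,         -- e.g. 'SBP'
    value      REAL NOT NULL
);
""")

# Made-up rows, for illustration only
conn.executemany(
    "INSERT INTO subject VALUES (?, ?)",
    [(1, "treatment"), (2, "treatment"), (3, "placebo"), (4, "placebo")],
)
conn.executemany(
    "INSERT INTO measurement VALUES (?, ?, ?, ?)",
    [
        (1, "week12", "SBP", 128.0),
        (2, "week12", "SBP", 132.0),
        (3, "week12", "SBP", 141.0),
        (4, "week12", "SBP", 139.0),
    ],
)

# One join plus one GROUP BY yields a per-arm summary
rows = conn.execute("""
    SELECT s.arm, COUNT(*) AS n, AVG(m.value) AS mean_value
    FROM measurement AS m
    JOIN subject AS s ON s.subject_id = m.subject_id
    WHERE m.test_name = 'SBP' AND m.visit = 'week12'
    GROUP BY s.arm
    ORDER BY s.arm
""").fetchall()

for arm, n, mean_value in rows:
    print(arm, n, mean_value)
```

With a layout like this, the same query could be issued from any tool that speaks SQL over ODBC or OLE DB, without a custom export step.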
For this note, it is assumed that data are stored on one of the widely deployed
database servers (e.g. MS SQL, MySQL, Oracle). The following are examples
of data analysis tools. There are many other similar tools.
- SAS may be the most popular and most expensive
data analysis tool for clinical trials. It has a steep learning curve. It has
interfaces for all popular database servers, so its users can access the data directly
within SAS without resorting to data export.
- STATISTICA with its floating single-user network license
priced around $2.5K is a serious alternative to SAS. It has a friendly graphical
user interface allowing users to start data analysis without learning any programming.
For advanced users, its R language integration can be handy. Its STATISTICA Visual
Basic is powerful for automating data analysis tasks and pre- or post-analysis data
processing or presentation. It can access any database server supporting OLE DB or
ODBC, such as Oracle or SQL Server.
- IBM SPSS with an annual subscription priced
at $1.5K is another affordable alternative to SAS. It has seamless integration with R
and Python, which may be attractive to those who are already familiar with either
language. It can connect to database servers via ODBC.
- Minitab is much more famous
in the Six Sigma world than in the clinical research arena, but there are clinical
trials whose statistical analyses are done solely with Minitab. It has a
user-friendly interface and requires little programming. It is similar to STATISTICA to
a certain degree, but a bit lighter in almost all aspects. It also supports
ODBC for accessing database servers. Its annual license fee is about $500
per user, but the minimum number of users is 5.
- Excel
may not be sufficient as the sole tool for a clinical study's data analysis,
but it may be adequate for the majority of quick ad-hoc data analyses and reports.
Almost everyone knows how to use Excel and generate nice charts from data on spreadsheets,
but not many know that Excel has numerous statistical functions, including those for
all the major distributions such as normal, t, chi-square and F. It takes only
a few clicks to access data in a variety of formats: database server, Access, text,
XML, etc. The free OpenOffice.org Calc has a number of statistical functions and
can access database servers too.
- Access is probably the
least used among the Office applications. Though it can store data with its
own database engine, its power lies in its capability of managing and consuming
data from other major database servers directly. It provides a user-friendly interface
for querying data without writing any scripts. One can create complex queries involving
grouping, joining, filtering and other tasks that are practically impossible to do
with spreadsheet applications like Excel. Even for users well versed in SQL, the
database language, Access is often the preferred way to carry out data queries, light
analysis and chart generation quickly. Its counterpart in the free OpenOffice.org
suite is Base.
- SSRS (SQL Server
Reporting Services) is probably an under-utilized feature of Microsoft SQL Server.
Many organizations using SQL Server never use it. SSRS allows generating web-based
reports by drag-and-drop. A nice-looking report with tables and charts can
be created in minutes and shared on the web. Though SSRS is a part of MS SQL Server,
it supports OLE DB and ODBC, so it can access other database servers such as Oracle.
Oracle's counterpart of SSRS is Oracle Reports Services.
- Business intelligence tools such as
Oracle Business Intelligence Enterprise Edition and
Microsoft SQL Server Analysis Services (SSAS) are meant primarily
for large-scale data analyses such as those of e-tailers (e.g. Amazon, Netflix) and
stock markets. Oracle Business Intelligence Suite Enterprise Edition costs
$300K. The least expensive version of SQL Server that comes with SSAS costs
about $7K. For those who already use these tools, some of their algorithms (e.g.
association, neural network) can be handy for mining large amounts of clinical
data for exploratory purposes.
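The Excel entry in the list above mentions built-in functions for the major distributions. As a rough illustration of what such functions are used for, the sketch below does the normal-distribution lookups with Python's standard library. The helper names are hypothetical, and the Excel functions named in the comments (NORM.S.DIST, NORM.S.INV) are given only as the comparable spreadsheet calls.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1


def two_sided_p(z):
    """Two-sided p-value for a z statistic.

    Comparable spreadsheet formula: 2 * (1 - NORM.S.DIST(ABS(z), TRUE))
    """
    return 2 * (1 - std_normal.cdf(abs(z)))


def critical_z(alpha=0.05):
    """Two-sided critical value.

    Comparable spreadsheet formula: NORM.S.INV(1 - alpha / 2)
    """
    return std_normal.inv_cdf(1 - alpha / 2)


print(f"critical z at alpha = 0.05: {critical_z():.3f}")
print(f"two-sided p for z = 2.5: {two_sided_p(2.5):.4f}")
```

The t, chi-square and F distributions are not in Python's standard library, so a comparable lookup for those would need a spreadsheet function or a third-party statistics package.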
In summary, many tools can be used alone or in combination for clinical data
analysis, and a sensible data structure is critical.