Bioinformatics is a relatively new discipline that addresses the need to manage and interpret the
massive amounts of data generated by genomic research over the past decade. This discipline
represents the convergence of genomics, biotechnology and information technology, and
encompasses analysis and interpretation of data, modeling of biological phenomena, and
development of algorithms and statistics.
Bioinformatics is by nature a cross-disciplinary
field that began in the 1960s with the efforts of Margaret O. Dayhoff, Walter M. Fitch,
Russell F. Doolittle, and others, and has matured into a fully developed discipline. However,
bioinformatics is wide-ranging and is therefore difficult to define. For many, including
myself, it is still a nebulous term that encompasses molecular evolution, biological modeling,
biophysics, and systems biology. For others, it is plainly computational science applied to a
biological system.
Bioinformatics is also a thriving field that is currently at the forefront of
science and technology. Our society is investing heavily in the acquisition, transfer, and
exploitation of data, and bioinformatics is at center stage of the activities that focus on the
living world. Bioinformatics skills are currently in high demand, and students of bioinformatics
will benefit from employment opportunities in government, the private sector, and academia.
With the advent of computers, humans have become ‘data gatherers’, measuring every aspect
of our lives and deriving inferences from these measurements. In this new culture, everything can
and will become data (from internet traffic and consumer taste to the mapping of galaxies or
human behavior). Everything can be measured (in pixels, Hertz, nucleotide bases, etc.), turned
into collections of numbers that can be stored (generally in bytes of information), archived in
databases, disseminated (through cable or wireless conduits), and analyzed. We are expecting
giant pay-offs from our data: proactive control of our world (from earthquakes and disease to
finance and social stability), and clear understanding of chemical, biological and
cosmological processes. Ultimately, we expect a better life. Unfortunately, data brings clutter
and noise, and its interpretation cannot keep pace with its accumulation.
One problem with data is its multidimensionality and the difficulty of uncovering the underlying
signal (patterns) in the most parsimonious way (generally using nonlinear approaches). Another
problem relates to what we do with the data. Scientific discovery is driven by falsifiability and
imagination, not by purely logical processes that turn observations into understanding. Data will
not generate knowledge if we rely on inductive principles.
The gathering, archival, dissemination, modeling, and analysis of biological data fall within
a relatively young field of scientific inquiry, currently known as ‘bioinformatics’.
Bioinformatics was spurred by the wide accessibility of computers with increased computing
power and by the advent of genomics. Genomics made it possible to acquire nucleic acid
sequence and structural information from a wide range of genomes at an unprecedented pace
and made this information accessible to further analysis and experimentation. For example,
sequences were matched to those coding for globular proteins of known structure (defined by
crystallography) and were used in high-throughput combinatorial approaches (such as DNA
microarrays) to study patterns of gene expression. Inferences from sequences and
biochemical data were used to construct metabolic networks.
These activities have generated terabytes of data that are now being analyzed with computational,
statistical, and machine learning techniques. The sheer volume of sequences and information
derived from these endeavors has given the false impression that imagination and hypothesis do
not play a role in the acquisition of biological knowledge. However, bioinformatics becomes a
science only when fueled by hypothesis-driven research and set within the context of the complex
and ever-changing living world.
The science that relates to bioinformatics has many components. It usually relates to
biological molecules and therefore requires knowledge in the fields of biochemistry,
molecular biology, molecular evolution, thermodynamics, biophysics, molecular engineering,
and statistical mechanics, to name a few. It also requires the use of computer science,
mathematical, and statistical principles. Bioinformatics is at the crossroads of experimental
and theoretical science. Bioinformatics is not only about modeling or data ‘mining’; it is
about understanding the molecular world that fuels life from evolutionary and mechanistic
perspectives. It is truly interdisciplinary and is changing. Much like biotechnology and
genomics, bioinformatics is moving from applied to basic science, from developing tools to
developing hypotheses.