
e-Science 2008 4th IEEE International Conference on e-Science

Main Conference Sessions

MapReduce for Data Intensive Scientific Analyses


  • Jaliya Ekanayake, Indiana University
  • Shrideep Pallickara, Indiana University
  • Geoffrey Fox, Indiana University


Most scientific data analyses involve processing voluminous data collected from various instruments. Efficient parallel and concurrent algorithms and frameworks are key to meeting the scalability and performance requirements of such analyses. The recently introduced MapReduce technique has gained considerable attention from the scientific community for its applicability to large parallel data analyses. Although there are many evaluations of the MapReduce technique on large textual data collections, there have been only a few for scientific data analyses. The goals of this paper are twofold. First, we present our experience in applying the MapReduce technique to two scientific data analyses: (i) High Energy Physics data analyses; (ii) K-means clustering. Second, we present CGL-MapReduce, a stream-based MapReduce implementation, and compare its performance with Hadoop.
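To make the second analysis concrete, here is a minimal sketch of how one K-means iteration can be phrased as a map step and a reduce step. It is an illustration only: it runs in a single process on in-memory lists, it is not CGL-MapReduce or the Hadoop API, and the function names (`kmeans_map`, `kmeans_reduce`, `kmeans_iteration`) and the data layout are assumptions made for this example rather than anything taken from the paper.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]
Partial = Tuple[Point, int]  # (coordinate-wise sum, point count)


def nearest(point: Point, centroids: Sequence[Point]) -> int:
    """Return the index of the centroid closest to point (squared Euclidean)."""
    return min(
        range(len(centroids)),
        key=lambda i: sum((p - c) ** 2 for p, c in zip(point, centroids[i])),
    )


def kmeans_map(chunk: Sequence[Point],
               centroids: Sequence[Point]) -> List[Tuple[int, Partial]]:
    """Map step (hypothetical name): assign each point of one data chunk to
    its nearest centroid and emit one (cluster index, partial) pair per
    cluster that appears in the chunk."""
    sums: Dict[int, List[float]] = {}
    counts: Dict[int, int] = defaultdict(int)
    for point in chunk:
        idx = nearest(point, centroids)
        acc = sums.setdefault(idx, [0.0] * len(point))
        for d, value in enumerate(point):
            acc[d] += value
        counts[idx] += 1
    return [(idx, (tuple(sums[idx]), counts[idx])) for idx in sums]


def kmeans_reduce(partials: Sequence[Partial]) -> Point:
    """Reduce step (hypothetical name): merge the partial sums and counts
    for one cluster into its new centroid."""
    total = [0.0] * len(partials[0][0])
    n = 0
    for vec, count in partials:
        for d, value in enumerate(vec):
            total[d] += value
        n += count
    return tuple(v / n for v in total)


def kmeans_iteration(chunks: Sequence[Sequence[Point]],
                     centroids: Sequence[Point]) -> List[Point]:
    """Run one iteration: map every chunk, group by cluster index, reduce.
    Clusters that receive no points keep their previous centroid."""
    grouped: Dict[int, List[Partial]] = defaultdict(list)
    for chunk in chunks:
        for idx, partial in kmeans_map(chunk, centroids):
            grouped[idx].append(partial)
    updated = list(centroids)
    for idx, partials in grouped.items():
        updated[idx] = kmeans_reduce(partials)
    return updated
```

In this sketch each chunk stands in for the portion of the data handled by one map task. A full clustering run would call `kmeans_iteration` repeatedly, feeding the returned centroids back in until they stop changing.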

Date and Time

Friday, December 12, 10:15 a.m. to 10:45 a.m.

Room Number

