Abstract:
This project demonstrates how MapReduce, running in a distributed environment, makes it straightforward to compute descriptive measurements over empirical data and to answer questions such as how many observed stars resemble our Sun and how many Earth-like planets have been observed. The project was carried out as part of the coursework for the course Distributed Computing. After applying MapReduce and Hadoop techniques to the planetary database and to the star database, we find that MapReduce makes it easy to bring a large dataset into a tidy and insightful categorization. The project also shows the power and elegance of MapReduce in astronomy and the physical sciences.
Introduction:
Astronomy is a fascinating area of science in which much remains unknown. People have been observing the stars since time immemorial, and most of us have wondered whether living beings like those on Earth exist elsewhere in the universe. We know that there are stars, planets, and galaxies like the ones Earth is part of. The immediate question that arises is whether there is any planet like Earth, or any star like the Sun.
In recent times, there have been tremendous advances in the study of space, and the volume of data received from satellites is huge. New discoveries are being made about planets and stars that were previously unknown to the world. The satellites send data day by day and the research is going on in finding some
Hadoop \cite{white2012hadoop} is an open-source framework for distributed storage and data-intensive processing, created by Doug Cutting and Mike Cafarella and developed extensively at Yahoo!. It has two core components: the Hadoop Distributed File System (HDFS) and the MapReduce programming model \cite{dean2008mapreduce}. HDFS is a distributed file system that splits data into blocks and stores them on nodes throughout a cluster, keeping several replicas of each block. It provides a reliable, fault-tolerant, consistent, efficient and cost-effective way to store a large amount of data. The MapReduce model consists of two key functions: Mapper and Reducer. The Mapper processes input data splits in parallel through different map tasks; the framework sorts and shuffles the map outputs to the Reducers, which in turn group them by key and process each group with one reduce call.
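The Mapper and Reducer roles described above can be sketched in a few lines of plain Python. This is a minimal single-process illustration, not Hadoop code and not the project's implementation: the record layout, the `STARS` list, and the helper names `mapper`, `reducer`, and `run_job` are all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical star records: (name, spectral_class).
# Illustrative only; not taken from the project's star database.
STARS = [
    ("Sun", "G"),
    ("Alpha Centauri A", "G"),
    ("Sirius A", "A"),
    ("Proxima Centauri", "M"),
    ("Barnard's Star", "M"),
    ("Tau Ceti", "G"),
]


def mapper(record):
    """Emit one (key, value) pair per input record."""
    _name, spectral_class = record
    yield spectral_class, 1


def reducer(key, values):
    """Combine all values that share a key."""
    return key, sum(values)


def run_job(records, n_splits=2):
    # Split the input, loosely mimicking how a file is divided into splits.
    splits = [records[i::n_splits] for i in range(n_splits)]
    # Map phase: every split is processed independently of the others.
    intermediate = [
        pair for split in splits for record in split for pair in mapper(record)
    ]
    # Shuffle and sort: collect the values that belong to each key.
    groups = defaultdict(list)
    for key, value in intermediate:
        groups[key].append(value)
    # Reduce phase: one reducer call per key, in sorted key order.
    return [reducer(key, groups[key]) for key in sorted(groups)]


counts = run_job(STARS)
print(counts)
```

Under these assumptions, the entry for class "G" answers a question of the form "how many stars in the sample share the Sun's spectral class". On a real cluster the splits would be processed on different nodes rather than in one loop.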
The Sun is a star made up mostly of hydrogen and helium gas. Unlike the other stars we see in the sky, the Sun delivers the energy and radiation that allow living things on Earth to exist. The Sun is also much closer to us than the other stars in the sky.
Each map task produces an intermediate data set that the reduce tasks use to combine the map results. The paper also proposes various extensions to the framework that allow users to customize the data partitioning as well as the combiner. In addition, the framework provides a mechanism for user programs to track relevant metrics and publish them. The paper describes the various error scenarios that occur in a large cluster of commodity machines. Fault tolerance to handle various map tasks
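The combiner, custom partitioning, and user-defined metrics mentioned in this paragraph can be illustrated with a small Python sketch. It is a simulation under stated assumptions, not the framework's API: the function names `map_task`, `combine`, and `partition`, the sample `splits`, and the counter names are hypothetical, and a CRC32 hash stands in for whatever partitioning rule a real job would use.

```python
import zlib
from collections import Counter


def map_task(records):
    """Map phase for one split: emit (spectral_class, 1) per record."""
    return [(cls, 1) for cls in records]


def combine(pairs):
    """Combiner: pre-aggregate one map task's output before it leaves the node."""
    totals = Counter()
    for key, value in pairs:
        totals[key] += value
    return sorted(totals.items())


def partition(key, n_reducers):
    """Custom partitioner: a stable hash decides which reducer owns a key."""
    return zlib.crc32(key.encode("utf-8")) % n_reducers


# Hypothetical input: each inner list is the split handled by one map task.
splits = [["G", "G", "M", "G"], ["M", "K", "G", "M"]]
n_reducers = 2
counters = Counter()  # user-defined metrics, published after the job

reducer_inputs = [[] for _ in range(n_reducers)]
for split in splits:
    raw = map_task(split)
    combined = combine(raw)
    counters["map_output_records"] += len(raw)
    counters["combine_output_records"] += len(combined)
    for key, value in combined:
        reducer_inputs[partition(key, n_reducers)].append((key, value))

# Reduce phase: sum the pre-aggregated values for each key.
totals = Counter()
for pairs in reducer_inputs:
    for key, value in pairs:
        totals[key] += value

print(dict(totals), dict(counters))
```

The two counters show why a combiner is worthwhile: fewer records cross the network than the map tasks emitted, while the final totals are unchanged. This shortcut is only safe because addition is associative and commutative.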
Despite that, 3D graphics and virtual reality are very important to NASA for visualizing space environments; otherwise, it would be sending astronauts out blindly. Along with 3D graphics, satellites are among the most important and most widely used pieces of technology at NASA. They have been used to communicate with astronauts in space and to transmit radio waves for communicating with people on space stations. "Satellite telescopes have been critical to understanding phenomena like pulsars and black holes as well as measuring the age of the universe." (What are Satellites used
The key functions to be implemented are Map and Reduce. The MapReduce framework operates on key-value pairs. Each Map task processes one input split and generates intermediate data in key-value format. The intermediate pairs are then sorted and partitioned by key, so that in the Reduce phase all pairs with the same key are routed to the same reducer for further processing. During the shuffle phase, partitions from different nodes that hold the same keys are transferred to a single node, where they are merged and made ready to be fed to the reduce task. The output of the Reduce tasks is in the same format, key and value, as
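The merge step of the shuffle phase can be sketched as follows. This is an illustrative simulation, assuming that each map node has already sorted the partition it sends to a given reducer; the node names, the planet categories, and the helpers `shuffle_merge` and `reduce_stream` are hypothetical.

```python
import heapq
from itertools import groupby
from operator import itemgetter

# Sorted map output destined for ONE reducer, from three hypothetical map nodes.
node_a = [("earth_like", 1), ("gas_giant", 1), ("gas_giant", 1)]
node_b = [("earth_like", 1), ("super_earth", 1)]
node_c = [("gas_giant", 1), ("super_earth", 1), ("super_earth", 1)]


def shuffle_merge(*sorted_runs):
    """Merge already-sorted runs into a single key-ordered stream."""
    return heapq.merge(*sorted_runs, key=itemgetter(0))


def reduce_stream(merged):
    """Apply the reduce logic once per key on the merged stream."""
    for key, group in groupby(merged, key=itemgetter(0)):
        yield key, sum(value for _key, value in group)


result = list(reduce_stream(shuffle_merge(node_a, node_b, node_c)))
print(result)
```

Because every run is already sorted, the merge never needs all of the data in memory at once, and the reducer sees all values for one key consecutively. The output is again a list of key-value pairs.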
During the Space Race, the first orbital satellites were sent to outer space. Today, Americans can communicate, predict the weather, and study geodetics because of the satellites that first orbited in space. All satellites and space probes carry specialized radio receiving and transmitting equipment for guidance and control purposes as well as for relaying data from space to Earth. Without the first satellite transmission of a human voice in 1958, communication would not be as efficient (Rabinowitch, 1963).
Hadoop provides a distributed filesystem and a framework for the analysis and transformation of very large data sets using the MapReduce [DG04] paradigm. While the interface to HDFS is patterned after the Unix filesystem, faithfulness to standards was sacrificed in favor of improved performance for the applications at hand.
Since the beginning of time, humans have always been curious about their surroundings. But as times change, so do the techniques and technology used by curious investigators. Weather satellites have proved the most useful technology in the passages because they can gather information about oncoming weather, they can save lives, and they work well with other weather instruments to allow better prediction of approaching weather and storms.
The Milky Way, in which we live, is expanding 3.25 million light years per second (Moskowitz). By the time you finish reading this paper, the Milky Way will have expanded by 1.9 million light years. This might seem like a lot, but the closest galaxy to the Milky Way is 13.2 billion light years away. This means there is a chance extra-terrestrial life might exist by the time you finish reading this paper. The galaxy is made up of two hundred billion stars that NASA has discovered. Each of these stars has at least one planet orbiting it. That being so, if the Sun has one habitable planet orbiting it, what is the chance of another star having a habitable planet circling it? Peter Behroozi, a Hubble fellow at the University of California, Berkeley, talks about this issue: “For every grain of sand there is, there are ten other planets out there like Earth” (Cofield). Astronomers have only seen our galaxy, and there are galaxies larger than the Milky Way and more
Among the billions of stars in each galaxy, many have planets that revolve around them. At least a few are bound to be of the right size and to have the right position relative to their star to allow life to grow. We believe this because life exists on Earth. Astronomers have concluded that there must be hundreds of billions of other stars with conditions approximating those of the Sun. Therefore there could be a million Earth-like planets with advanced civilizations in the Milky Way galaxy alone (Stilley). It is very probable that there is intelligent life elsewhere in the universe.
To begin with, there is more than one sun out there in space, and a sun can contain about one million planets in space. Scientists have discovered about one thousand planets orbiting distant stars. They have much more to discover! There is a strong possibility that life could exist on one of those planets. Another big clue that scientists have discovered is that there are many galaxies in space, and we live in one. The galaxy in which Earth is located is called the Milky Way. A galaxy can contain many planets, just as our galaxy contains billions of planets.
There are 2,271 satellites in space right now, according to the Goddard Space Flight Center's lists. Satellites are helping astronauts discover new things and uncover mysteries. The first satellite was put into orbit about sixty years ago. Space exploration is a subject of much debate right now. About half of the U.S. population wants the money to go to another organization rather than to space, and the other half thinks the money being spent on space is going where it needs to go. Space exploration is beneficial because NASA missions do not use as much of the U.S. budget as Americans think they do, and the medical and health fields are becoming even more advanced through NASA missions.
"A planet is a celestial body that revolves around a central star and does not shine by its own light " (Grolier, 1992). The only planetary system that is known to man is our
Humans live on a small planet in a tiny part of a vast universe. This part of the universe is called the solar system, and it is dominated by a single brilliant star, the Sun. The solar system is the Earth’s neighbourhood, and the planets Mercury, Venus, Mars, Jupiter, Saturn, Uranus, Neptune and Pluto are the Earth’s neighbours. They all have the same stars in the sky and orbit the same Sun.
Over the years it has become essential to process large amounts of data with high precision and speed. Data sets too large to be processed by traditional systems are called Big Data. Hadoop, a Linux-based framework of tools, addresses three main problems that traditional systems cannot handle when processing Big Data: the first is the speed of the data flow, the second is the size of the data, and the third is the format of the data. Hadoop divides the data and the computation into smaller pieces, sends them to different computers, then gathers the partial results, combines them, and returns the combined result to the application. This is done using MapReduce and HDFS, the Hadoop Distributed File System. The data node and name node components of the architecture belong to HDFS.
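The divide, distribute, and gather pattern described above can be sketched on a single machine, with worker threads standing in for the computers of a cluster. Everything specific here is an assumption made for illustration: the `RADII` values, the block size, the helper names, and in particular the 0.8 to 1.25 Earth-radius window, which is a hypothetical cutoff and not the project's definition of an Earth-like planet.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical planet radii, in Earth radii. Not real catalogue data.
RADII = [0.5, 1.0, 1.1, 11.2, 0.9, 3.9, 1.2, 2.4, 0.95, 9.4]


def split_into_blocks(data, block_size):
    """Divide the data into fixed-size pieces, as a file is divided into blocks."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def count_earth_sized(block, low=0.8, high=1.25):
    """Work done on one piece: count radii inside an illustrative size window."""
    return sum(1 for radius in block if low <= radius <= high)


def run(data, block_size=3, workers=4):
    blocks = split_into_blocks(data, block_size)
    # Send each piece to a worker; threads stand in for cluster nodes here.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_counts = list(pool.map(count_earth_sized, blocks))
    # Gather the partial results and combine them for the application.
    return blocks, partial_counts, sum(partial_counts)


blocks, partial_counts, total = run(RADII)
print(partial_counts, total)
```

The combined total does not depend on the block size or on the number of workers, which is what allows the same job to scale from one machine to many.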