Publish Date : 4/1/2017   Journal Name :   Pages : 14
Aras: A Method with Uniform Distributed Dataset to Solve Data Warehouse Problems for Big Data

Abstract

Because of the high rate of data growth and the need for data analysis, data warehouse management for big data is an important issue. Single-node solutions cannot manage such large amounts of information, so the data must be distributed over multiple hardware nodes. However, once data is distributed, each node needs data from other nodes to execute a query. This data exchange among nodes creates problems, such as joins between data segments that reside on different nodes, network congestion, and nodes waiting to receive data. In addition, when a fact table is distributed over nodes, it is impossible to change the data warehouse dimensions and measures. Another important problem is the management of big dimensions: how should this type of dimension be distributed over nodes? In this paper, the Aras method is proposed. It is a MapReduce-based method that introduces a data set on each mapper. With this method, each mapper node can execute its query independently, without exchanging data with other nodes. Node independence solves the aforementioned data distribution problems. The proposed method has been compared with prominent data warehouses for big data, and the Aras query execution time was much lower than that of the other methods.
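
The node-independence idea can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify how the per-mapper data set is built, so the sketch simply assumes that each node holds a self-contained partition whose rows already carry the dimension attributes a query needs. The names `NODE_PARTITIONS`, `map_local`, `merge_partials`, and `run_query`, as well as the sample rows, are hypothetical.

```python
from collections import defaultdict

# Hypothetical per-node partitions (assumption, not from the paper):
# every row already contains its dimension attributes, so a node never
# has to fetch rows from another node to evaluate a query.
NODE_PARTITIONS = [
    [{"region": "north", "year": 2016, "sales": 120.0},
     {"region": "south", "year": 2016, "sales": 80.0}],
    [{"region": "north", "year": 2016, "sales": 30.0},
     {"region": "south", "year": 2017, "sales": 50.0}],
]


def map_local(rows, group_key, measure, predicate):
    """Evaluate the query on one node's rows only; return partial sums."""
    partial = defaultdict(float)
    for row in rows:
        if predicate(row):
            partial[row[group_key]] += row[measure]
    return dict(partial)


def merge_partials(partials):
    """Combine the per-node partial sums into the final answer."""
    total = defaultdict(float)
    for partial in partials:
        for key, value in partial.items():
            total[key] += value
    return dict(total)


def run_query(partitions, group_key, measure, predicate):
    # Each call to map_local touches a single partition, mirroring a
    # mapper that works without exchanging data with other mappers.
    partials = [map_local(rows, group_key, measure, predicate)
                for rows in partitions]
    return merge_partials(partials)


# Total sales per region for 2016, computed from independent partials.
result = run_query(NODE_PARTITIONS, "region", "sales",
                   lambda row: row["year"] == 2016)
```

In this sketch only the small per-group partial sums leave a node, which is the property the abstract attributes to node independence: no cross-node joins and no waiting on rows held elsewhere.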


Authors : Mohammadhossein Barkhordari, Mahdi Niamanesh