Efficient Data Deduplication in Hadoop, by Priteshkumar Prajapati and Parth Shah

About the Book

Hadoop is widely used for massively distributed data storage. Although it is highly fault tolerant, scalable, and runs on commodity hardware, it does not provide an efficient, optimized data storage solution. When a user uploads files with identical contents, Hadoop stores every copy in HDFS (the Hadoop Distributed File System), so the duplicated contents waste storage space. Data deduplication is the process of reducing the required storage capacity by storing only the unique instances of data. Deduplication is widely used in file servers, database management systems, backup storage, and many other storage solutions; a proper deduplication strategy makes full use of limited storage devices. Since Hadoop does not offer a data deduplication solution of its own, this work integrates a deduplication module into the Hadoop framework to achieve optimized data storage. All this and more can be found in Efficient Data Deduplication in Hadoop (Priteshkumar Prajapati and Parth Shah).
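The core idea behind such a module is content-addressed storage: hash each incoming file and write it to HDFS only if that hash has not been seen before. The sketch below illustrates this with Hadoop's Java FileSystem API. It is a minimal illustration under stated assumptions, not the authors' actual module: the DedupUploader class and its in-memory hash index are inventions for the example.

    import java.io.InputStream;
    import java.security.MessageDigest;
    import java.util.HashMap;
    import java.util.Map;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class DedupUploader {
        // Hypothetical index: content hash -> HDFS path of the stored copy.
        // A real system would persist and share this index.
        private final Map<String, Path> hashIndex = new HashMap<>();
        private final FileSystem fs;

        public DedupUploader(Configuration conf) throws Exception {
            this.fs = FileSystem.get(conf);
        }

        /** Uploads the local file only if its content is not already stored. */
        public Path upload(Path localFile, Path hdfsTarget) throws Exception {
            String digest = sha256(localFile);
            Path existing = hashIndex.get(digest);
            if (existing != null) {
                return existing; // duplicate content: reuse the stored copy
            }
            fs.copyFromLocalFile(localFile, hdfsTarget); // unique content: store it
            hashIndex.put(digest, hdfsTarget);
            return hdfsTarget;
        }

        // Computes the SHA-256 digest of a local file's contents.
        private String sha256(Path file) throws Exception {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            FileSystem local = FileSystem.getLocal(fs.getConf());
            try (InputStream in = local.open(file)) {
                byte[] buf = new byte[8192];
                int n;
                while ((n = in.read(buf)) > 0) {
                    md.update(buf, 0, n);
                }
            }
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest()) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        }
    }

In a real deployment the hash index would have to be persisted and shared across clients (for example, in a database or on HDFS itself) rather than held in memory, and whole-file hashing could be replaced by block-level hashing for finer-grained savings.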

Full title: Efficient Data Deduplication in Hadoop
Author: Priteshkumar Prajapati and Parth Shah
Keywords: computer literature, computer science fundamentals, general works
Categories: Computers and Internet
ISBN: 9783659679711
Publisher:
Year: 2015