Abstract—
Cloud computing is an evolving paradigm with tremendous
momentum, but its unique aspects exacerbate security and privacy challenges.
Cloud computing provides massive computation power and storage capacity which
enable users to deploy computation and data-intensive applications without
infrastructure investment. During the processing of such applications, a large
volume of intermediate data sets will be generated, and often stored to save
the cost of re-computing them. However, preserving the privacy of intermediate
data sets becomes a challenging problem because adversaries may recover
privacy-sensitive information by analyzing multiple intermediate data sets.
Encrypting all data sets in the cloud is widely adopted in existing approaches to
address this challenge. But we argue that encrypting all intermediate data sets
is neither efficient nor cost-effective, because it is very time consuming and
costly for data-intensive applications to encrypt and decrypt data sets
frequently while performing any operation on them. In this paper, we propose a
novel approach, based on an upper-bound constraint on privacy leakage, to
identify which intermediate data sets need to be encrypted
and which do not, so that privacy-preserving cost can be saved while the
privacy requirements of data holders can still be satisfied. Evaluation results
demonstrate that the privacy-preserving cost of intermediate data sets can be
significantly reduced with our approach over existing ones where all data sets
are encrypted.
Index Terms: Cloud Computing, Data Sets, Privacy Preserving,
Data Privacy Management, Privacy Upper Bound
I. INTRODUCTION
Cloud computing is the use of computing resources (hardware
and software) that are delivered as a service over a network (typically the
Internet). The name comes from the common use of a cloud-shaped symbol as an
abstraction for the complex infrastructure it contains in system diagrams.
Figure 1: Structure of cloud computing
Cloud computing entrusts remote services
with a user's data, software and computation. Cloud computing consists of
hardware and software resources made available on the Internet as managed
third-party services. These services typically provide access to advanced
software applications and high-end networks of server computers. The goal of
cloud computing is to apply traditional supercomputing, or high-performance
computing power, normally used by military and research facilities to perform
tens of trillions of computations per second, to consumer-oriented applications
such as financial portfolios, delivering personalized information, providing
data storage, or powering large, immersive computer games. Cloud computing
uses networks of large groups of servers, typically running low-cost consumer PC
technology, with specialized connections to spread data-processing chores across
them. This shared IT infrastructure contains large pools of systems that are
linked together. Often, virtualization techniques are used to maximize the
power of cloud computing.
Cloud users can selectively store their valuable intermediate data sets
when processing original data sets in a data-intensive application,
in order to curtail overall expenses by avoiding frequent re-computation to
obtain these data sets. Data users often reanalyze results, conduct new
analysis, or share some intermediate results with others for collaboration.
Privacy-preserving cost can be saved by identifying which intermediate data
sets need to be encrypted and which do not. The technical approaches for
preserving the privacy of data sets stored in the cloud mainly include encryption
and anonymization. On the one hand, encrypting all data sets, an effective
approach, is widely adopted in current research.
However, processing encrypted data sets efficiently is a
challenging task, because most applications run only on unencrypted data
sets. Although homomorphic encryption theoretically allows computation to be
performed on encrypted data sets, applying current algorithms is rather expensive
due to their inefficiency. On the other hand, partial information about data sets,
for example aggregate information, must be exposed to data users in most
cloud applications such as data mining and analytics. In such cases, data sets are
anonymized rather than encrypted to ensure both data utility and privacy
preservation. Current privacy-preserving techniques like generalization can
withstand most privacy attacks on a single data set, while preserving privacy
for multiple data sets is still a challenging problem. Thus, for preserving
the privacy of multiple data sets, it is promising to anonymize all data sets first
and then encrypt them before storing or sharing them in the cloud. Usually, the
volume of intermediate data sets is huge. Hence, encrypting all intermediate
data sets will lead to high overhead and low efficiency when they are
frequently accessed or processed. To address this issue, this paper proposes to
encrypt only a part of the intermediate data sets rather than all of them, to
reduce privacy-preserving cost.
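The idea of encrypting only part of the intermediate data sets can be illustrated with a minimal sketch. It is not the algorithm of this paper. The cost model (size times access frequency times a unit price), the use of summed per-data-set leakage as the bound, the greedy ranking, and all names (DataSet, encryption_cost, select_unencrypted, epsilon) are assumptions made purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataSet:
    name: str
    size_gb: float    # assumed: stored size of the data set
    accesses: float   # assumed: number of accesses per time unit
    leakage: float    # assumed: privacy leakage if stored unencrypted


def encryption_cost(d, price_per_gb=1.0):
    """Assumed cost model: each access pays to en/decrypt the whole data set."""
    return d.size_gb * d.accesses * price_per_gb


def select_unencrypted(data_sets, epsilon):
    """Greedily pick data sets to leave unencrypted.

    Data sets are ranked by saved cost per unit of leakage (highest first)
    and accepted while the summed leakage stays within epsilon. The sum of
    individual leakages is assumed to bound the joint leakage.
    """
    def ratio(d):
        return encryption_cost(d) / d.leakage if d.leakage > 0 else float("inf")

    remaining = epsilon
    chosen = []
    for d in sorted(data_sets, key=ratio, reverse=True):
        if d.leakage <= remaining:
            chosen.append(d)
            remaining -= d.leakage
    return chosen


if __name__ == "__main__":
    data = [
        DataSet("d1", size_gb=10, accesses=5, leakage=0.75),
        DataSet("d2", size_gb=2, accesses=1, leakage=0.25),
        DataSet("d3", size_gb=8, accesses=4, leakage=0.25),
        DataSet("d4", size_gb=1, accesses=1, leakage=0.0),
    ]
    plain = select_unencrypted(data, epsilon=0.5)
    encrypted = [d for d in data if d not in plain]
    print("unencrypted:", [d.name for d in plain])
    print("cost if all encrypted:", sum(encryption_cost(d) for d in data))
    print("cost with selection:", sum(encryption_cost(d) for d in encrypted))
```

A greedy ranking like this is only a heuristic and may miss the cheapest feasible selection. It is shown here because it makes the trade-off between saved cost and leaked privacy easy to see.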
CONCLUSION-
This paper has proposed an approach that identifies which
part of the intermediate data sets needs to be encrypted while the rest does not,
in order to save privacy-preserving cost. A tree structure has been modeled
from the generation relationships of intermediate data sets to analyze privacy
propagation among data sets. The problem of saving privacy-preserving cost has
been modeled as a constrained optimization problem, which is addressed by
decomposing the privacy leakage constraints. A practical heuristic algorithm has
been designed accordingly. Evaluation results on real-world data sets and larger,
extensive data sets have demonstrated that the cost of preserving privacy in the
cloud can be reduced significantly with this approach over existing ones where all
data sets are encrypted.
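The tree of generation relationships and the leakage constraint can be pictured with a small sketch. The names (Node, layers, unencrypted_leakage, satisfies_bound), the per-node leakage values, and the use of a plain sum as the upper bound are assumptions for illustration only, not the paper's definitions.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Node:
    """One data set; children are the data sets generated from it."""
    name: str
    leakage: float = 0.0      # assumed: leakage if stored unencrypted
    encrypted: bool = False
    children: list = field(default_factory=list)

    def add(self, child):
        self.children.append(child)
        return child


def layers(root):
    """Group the nodes of the tree by depth using a breadth-first walk."""
    result = []
    queue = deque([(root, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == len(result):
            result.append([])
        result[depth].append(node)
        for child in node.children:
            queue.append((child, depth + 1))
    return result


def unencrypted_leakage(root):
    """Sum the leakage of every node that is stored unencrypted."""
    return sum(n.leakage for layer in layers(root) for n in layer
               if not n.encrypted)


def satisfies_bound(root, epsilon):
    """Check the assumed constraint: summed leakage must not exceed epsilon."""
    return unencrypted_leakage(root) <= epsilon


if __name__ == "__main__":
    root = Node("d0", encrypted=True)   # original data set, kept encrypted here
    d1 = root.add(Node("d1", leakage=0.5))
    d2 = root.add(Node("d2", leakage=0.25))
    d3 = d1.add(Node("d3", leakage=0.125))
    print([[n.name for n in layer] for layer in layers(root)])
    print(satisfies_bound(root, epsilon=0.5))
    d1.encrypted = True                 # encrypt one more data set
    print(satisfies_bound(root, epsilon=0.5))
```

Grouping nodes by layer is one way to picture how a global constraint might be split into smaller per-layer checks. The decomposition actually used in the paper is not reproduced here.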