An Efficient Scheme for Secure Deduplication with Privacy-Preserving

Loading...

Solving the Secure Storage Dilemma: An Efficient Scheme for Secure Deduplication with Privacy-Preserving Public Auditing S¨ uleyman Karda¸s∗ and Mehmet Sabır Kiraz+ ∗ Dept. of Computer Science and Engineering, Batman University, Turkey + Mathematical and Computational Sciences Labs ¨ ITAK ˙ ˙ TUB BILGEM, Turkey E-mail: [email protected], [email protected]

Abstract. Existing cloud storage systems obtain the data in its plaintext form and perform conventional (server-side) deduplication mechanisms. However, disclosing the data to the cloud can potentially threaten the security and privacy of users, which is of utmost importance for a realworld cloud storage. This can be solved by secure deduplication mechanisms which enables the user to encrypt the data on the client-side (or via an encryption-as-a-service module) before uploading it to the cloud storage. Conventional client-side encryption solutions unfortunately make the deduplication more challenging. Privacy-preserving public auditing schemes, on the other hand, is also crucial because the clients outsource their data to the cloud providers and then permanently deletes the data from their local storages. In this paper, we consider the problem of secure deduplication over encrypted data stored in the cloud while supporting a privacy-preserving public auditing mechanism. We show that existing solutions cannot support both goals simultaneously due to the conflict of their security and efficiency requirements. In this respect, we present an efficient and secure deduplication scheme that supports client-side encryption and privacy-preserving public auditing. We finally show that our scheme provides better security and efficiency with respect to the very recently proposed existing schemes. Keywords: Secure Client-Side Deduplication, Public auditing, Privacy, Cloud Storage.

1

Introduction

When n clients upload the same copy of a file F , the cloud provider stores only one copy that is sent back to these clients upon download. This is called deduplication technology which is a useful method for reducing data redundancy on storage space, reducing computations on disk storage, saving on network bandwidth, and speeding up the actual backups. The goal of a deduplication mechanism is to find out basically all identical files on the storage space. More storage costs on the server-side would lead to the clients to pay higher storage fees making deduplication to be one of the most cost saving mechanism for Cloud Storage Providers (CSPs). In terms of location of deduplication processing, the underlying techniques can be classified into two methods: (1) deduplication occurs before the upload on the client-side (see Figure 1), (2) deduplication occurs after the upload on the server-side (see Figure 2). Deduplication can also be classified into two methods in terms of data processing method: (1) deduplication at a block level where the data is divided into smaller fixed-size or variable-size blocks and only a single copy of each block is stored [1], (2) at a file level where only a single copy of a file is stored [2]. The server-side deduplication allows a server to check whether the two stored files are identical. In this case, the client first uploads a file, and then the server can internally check if there is a duplication. In this case, the clients do not know whether there is a duplicate. Server-side deduplication mechanisms are easy to handle and already in use [3–5]. However, server-side solutions only reduces the cost of storage for the server and do not reduce the bandwidth.

(1) Upload Hash(Data/File)

(5) Upload Data/File Cloud Service Provider Client/User

(4) Hash(Data/File) does not exist!

(3) No!

Storage System (2) Hash(Data/File)1

Fig. 1. Client-Side Deduplication. Note that if the answer is Yes! at Step (3) then a proof of ownership protocol is required to be executed between the user and the server.

Client-side deduplication is more interesting because both the clients and the CSPs do not want the files to be uploaded if they already exist in the server which also save high bandwidth costs [6, 7]. However, the client-side deduplication requires a client to prove the possession of the data without sending the actual data (e.g., the hash of the data instead of full plaintext where hash is a function taking a string as input and generates a unique compressed string). Namely, duplication control is performed on the client-side before users upload their files. More concretely, the client-side deduplication is jointly performed by the client and the server to check the existence of the uploading files in the server. In general, proof of ownership mechanisms are used to assure that the owner has the file without uploading it to the server. For example, a malicious user which does not own a file F may obtain its hash value, and then may send to the cloud server to prove that the file was actually on the local storage. Therefore, there must be a secure cryptographic protocol for the proof of possession mechanism. User data is already known to the CSPs in the existing deduplication solutions. What if the user does not want to reveal her data to the cloud providers while the CSPs still desire deduplication? The solution of this problem is known as secure deduplication and is not easy for the following reason: Assume that Alice and Bob have the same file F , and they encrypt F using a conventional cryptographic scheme like AES under their own secret symmetric keys KA and KB , respectively. Finally, they upload the ciphertexts FA = EncKA (F ) and FB = EncKB (F ) to the CSP, respectively. However, it is now computationally hard for the CSP to decide whether FA and FB are the same (because of the underlying encryption algorithm and KA 6= KB ). Furthermore, even if the CSP could decide, it would be difficult for the CSP to store FA and FB in a short copy that allows both Alice and Bob to obtain the plaintext F by decrypting it. Therefore, it should be ensured that the resulting ciphertexts are the same without giving any sensitive information about F to the CSPs. In the case that semantically secure encryption is used, secure deduplication will also be impossible because neither Alice nor Bob can decide whether the two ciphertexts correspond to the same plaintext. Note also that offline dictionary attacks are applicable if the keys are derived from the F (i.e., KA (= KB ) = H(F ) where H is a conventional hash function mapping inputs of arbitrary length to an output of a fixed length). Existing solutions in use unfortunately have either security and privacy issues (e.g., Dropbox, Amazon AWS, Google Drive, Bitcasa, Mozy, Memopal apply client-side deduplication [6, 8].). 2

(1) Upload Data/File

Cloud Service Provider Client/User

Storage System (2) Data/File Exists?

Fig. 2. Server-Side Deduplication

As a growing need, data integrity means that data should be correctly stored in the cloud server, and any modification without the consent of a user (e.g., if data is lost, becomes corrupt, modified or compromised) should easily be detected. More concretely, users store files to the cloud and want to guarantee that they are never changed or removed [9]. Integrity auditing is usually solved by using either message authentication codes (MACs) or digital signatures. Complete assurance of data integrity in an efficient manner is also not easy to handle. The main reason of scalability problem is that the computing resources is not run locally and the users cannot be sure that the CSP is correctly performing the computation. A direct solution may be first is to download complete files and check the integrity. However, this requires very high communication cost which resulting a non-scalable solution. Hence, to reduce the communication cost the work is generally delegated to a Third Party Auditor (TPA). Briefly, the TPA is assumed to be semi-honest and continuously performs the integrity auditing on behalf of the users by querying only a random subset of the small pieces of data [10–16]. The TPA does not need the client through the auditing process to assure that her/his data stored is indeed in the CSP. 1.1

Our Contributions

Privacy-preserving public auditing and secure deduplication are the two main lines of the research for secure storage of the outsourced data. However, these solutions may contradict each other because existing privacy-preserving auditing mechanisms require both the data and its signatures while deduplication requires not to upload the duplicated data again. The difficulty in solving this contradictory problem comes from the fact that once a user signs and uploads a file then how it can be possible for the next user to audit the same duplicated file without sending the actual file and her signatures. In this paper, we focus on this contradictory problem and propose an efficient solution for secure deduplication with a privacy-preserving public auditing mechanism. To that extent, our proposal is the first scheme for secure storage providing both a secure deduplication mechanism and a privacy-preserving public auditing solution simultaneously. In particular, it maintains the storage and bandwidth savings of the client-side solution. Furthermore, it can also support both file and block level deduplication. Our main contributions are as follows: 3

– Our first contribution is to provide a efficient privacy-preserved public auditing mechanism anonymously while supporting secure deduplication, and to verify the signatures without compromising the privacy of the users. Hence, our mechanisms ensure that the CSP cannot obtain any information about the data and the TPA assures that the data is available and has not been changed. To ensure the privacy of users, we basically use anonymous public-key certificate architectures [17–21]. – Our second contribution is supporting client-side encryption while also assuring proof of ownership. To achieve this we use an Oblivious Pseudorandom function (OPRF) for encryption key generation. – Our third contribution is to disallow CSP for denying the existence of the uploaded file. We utilize digital signature mechanism while uploading a new or deduplicated file to the CSP. 1.2

Roadmap

In Section 2, we summarize the related works by focusing on deduplication, privacy-preserving public auditing, and secure deduplication with privacy-preserving public auditing separately. We then give the necessary background and preliminaries in Section 3 that will be used throughout the manuscript. In Section 4, we propose our security model, and Section 5 presents our mechanism for secure deduplication with privacy-preserving public auditing. We analyze the security of our protocol in Section 6 and give its complexity in Section 7. Finally, Section 8 concludes the paper.

2

Related Work

2.1

Secure Deduplication

The concept of secure deduplication is interesting for both research and industrial community because of reducing high costs in cloud computing environment (e.g., Amazon S3, Dropbox, OwnCloud, TeamDrive, Box, OneDrive (formerly SkyDrive), Google Drive, DepSky, and SugarSync). The first solution is to apply convergent encryption mechanism which is designed by Douceur et al. [22]. The data here is encrypted using a symmetric encryption scheme with a key which is deterministically derived from the hash of the data content. The convergent encryption mechanism is actively used by commercial CSPs like Amazon S3, Dropbox, Google, and Bitcasa [23, 24]. Note that convergent encryption does not provide semantic security because of content-guessing attacks using the deterministic nature of the content hashing 2 . Several secure deduplication schemes have already been proposed [6, 25–27]. In [28], Harnik et al. showed some attacks in the case of client-side deduplication which can lead to data leakage. Later, the concept of proof of ownership has been introduced to prevent such attacks [29, 30]. However, none of these results guarantee the confidentiality against semi-honest or malicious CSPs. In [31], the authors discuss performance, back-up, and security and privacy issues of Mozy, Carbonite, Dropbox, and CrashPlan. [32] the authors specifically analyze the Dropbox client software with its communication protocol, and show the security and privacy issues within the Dropbox such as unauthorized access to the stored files. Later, Bellare, Keelveedhi, and Ristenpart in [23] formalized the convergent encryption, which is defined as message-locked encryption. Message-locked encryption basically allows a user to encrypt the data under a symmetric key sk where sk = H(M ), H is a conventional hash function [23, 27, 33–35]. Therefore, a plaintext M will lead to the same ciphertext Encsk (M ). 2

Semantic security in this context means that ciphertexts leak no information about the underlying plaintexts except the knowledge of their equalities.

4

Key Server (KS) Trusted Party Auditor (TPA)

ion

rv se re

De leg

-P

at

cy

seu us P

4

g bl di t in

c Fun

Da

Au

ta

ic

om

Au

di

Pu

and

tin

g

in

dor

1

iva

livio

Pr

Ob

0

g

tion

3 Secure Deduplication

Signed Data Flow

Secure Communication

Users Ui

2

Cloud Service Provider (CSP)

Fig. 3. A Privacy-Preserving Public Auditing Model with Secure Deduplication: (0) i-th user Ui is delegating the auditing mechanism to a TPA, (1) a protocol for oblivious pseudorandom random function is run between the i-th user Ui and the KS, (2) the i-th user Ui securely communicates with the CSP, (3) the i-th user Ui uploads a file to the CSP which supports both the secure deduplication and the privacy-preserving public auditing properties, (4) a privacy-preserving public auditing is run between the TPA and the CSP.

Furthermore, only sk is stored and her encrypted file M in the CSP in order to retrieve later and decrypt it. However, this mechanism of Bellare et al. does not provide semantic security and only satisfies security for unpredictable messages (since the secret key can be easily obtained for predictable files, and therefore, the knowledge of initialization vectors of the underlying symmetric encryption schemes become useless). Bellare et al. later presented another scheme called DupLESS which is a server-aided encryption mechanism for deduplicated storage and ensures semantic security [23]. In [36], the authors propose the first secure cross-user deduplication scheme that provides the client-side encryption. The scheme is based on using a PAKE (Password Authenticated Key Exchange) protocol and does not require any additional independent servers. Furthermore, in [37], the authors present a specific secure architecture enabling the encrypted cloud media center.

2.2

Privacy-preserving public auditing

There are basically two research direction of data integrity in the cloud setting: Provable Data Possession (PDP) and Proof of Retrievability (POR). PDP was first introduced by Ateniese et al. [38, 39] for assuring that the CSPs indeed possess the files without retrieving or downloading the whole data. It is basically a challenge-response protocol which assures a client that her data stored on the cloud has not been modified without being detected. The protocol is realized by asking a random proper subset of blocks from the CSPs to prove that they indeed possess these blocks. The verifier TPA, which is assumed to be semi-honest, only checks a small portion of data. In [40], Wang et al. proposed proxy PDP in public clouds. It is also rather important to utilize stateless and semi-honest third party verifiers (i.e., auditors) [41–45]. In general, existing 5

mechanisms comprised of four main procedures: (1) signature generation by the client, (2) challenges by the TPA, (3) proof by the CSP, and (4) verification by the TPA. Compared to PDP, POR (which is a challenge-response protocol) not only assures the cloud servers possess the target files, but also guarantees their full recovery [46–48].

2.3

The secure storage dilemma: Secure deduplication with privacy-preserving public auditing

Secure management and verification of integrity for the outsourced data contents are of importance in practice. Although they have been studied extensively in industrial and academic field a trivial combination of privacy-preserving public auditing and secure deduplication mechanisms does not simply solve both the secure deduplication and integrity auditing in an efficient manner (see Figure 3). This is because achieving storage efficiency contradicts with the deduplication of authentication tags (i.e., signatures). If the deduplicated data and its authentication tags were never sent to the cloud then how could the user audit data integrity on the cloud side? We would like to first highlight that there are many solutions that deal with public auditing with deduplication [49, 50]. However, these solutions do not provide secure deduplication simultaneously because files are already visible to storage providers. In this respect, to the best of our knowledge, only the following papers consider both secure deduplication and public auditing notions simultaneously. Similarly, in [51], the authors claim that public auditing with deduplication. However, this scheme does not provide secure deduplication because the data is available in its plaintext form on the cloud-side. To the best of our knowledge, the only existing schemes supporting both secure deduplication and public auditing are belong to [52] and [53]. In [52], the authors propose a variant of clientside deduplication mechanism where an honest mediator obtains the data in plaintext form and performs a block-level deduplication on users’ data before they are sent to the cloud. Privacypreserving public auditing is performed as usual via the trusted party auditor using Wang et al.’s scheme [42]. The main drawback of this scheme is to use of a mediator in the client’s company environment. Therefore, not many deduplication would occur on a small size of enterprises instead of performing deduplication on the large scale CSPs, and hence, this solution looses the benefits of deduplication. . Li et al. [53] proposed SecCloud+ which aims to solve both data integrity and deduplication by using the convergent encryption technique. The main drawback of this scheme is that the user also uploads his/her file to a TPA which is assumed to be trusted. Namely, this assumption is not realistic where the TPA may easily violate the privacy of the user. To the best our knowledge, disclosing the data to the TPA is also never modeled under the existing privacy-preserving public auditing schemes. Furthermore, this scheme has also several security and efficiency drawbacks. For example, their protocol is also too inefficient because of the use of the Type-1 pairing in their setting [54]. More concretely, for a secure implementation they need supersingular elliptic curves over large prime fields which significantly increase the complexity of their protocol (see [54] for details). On the other hand, uploading the file to an honest TPA a strong assumption because TPAs are always assumed to be semi-honest in all privacy-preserving existing auditing mechanisms to prevent privacy violations. Also, and more importantly, because the data is also sent to the TPA the bandwidth advantage of the client-side solutions is completely eliminated for the user. Their protocol is also too inefficient because of the same reasoning as above (i.e., they use Type-1 pairing [54]). 6

3 3.1

Preliminaries and Notation Bilinear Maps for Auditing Schemes.

Bilinear maps are one of the best candidate for enabling auditing property. We follow the lines of [55, Chapter IX], [54] for the properties of pairings. Let (G1 , +) and (G2 , +) be two additive cyclic groups of order q with G1 =< Q > and G2 =< P >, (G3 , ·) be a multiplicative cyclic group of order q, where q is a prime number, 0 denotes the identity element of G1 , G2 , and 1 denotes the identity element of G3 . Assume that Discrete Logarithm Problem (DLP) is hard in both G1 and G2 (i.e., given a random R ∈ G1 (or ∈ G2 ), it is computationally infeasible to find an integer x ∈ Z such that R = x · Q). A bilinear map is a map e : G1 × G2 → G3 satisfying the following properties: – Bilinearity: For all P1 , Q1 ∈ G1 , P1′ , Q′1 ∈ G2 , e is a group homomorphism in each component, i.e. 1. e(P1 + Q1 , P1′ ) = e(P1 , P1′ ) · e(Q1 , P1′ ), 2. e(P1 , P1′ + Q′1 ) = e(P1 , P1′ ) · e(P1 , Q′1 ). – Non-degeneracy: e is non-degenerate in each component, i.e. 1. For all P ∈ G1 , P 6= 0, there is an element Q ∈ G2 such that e(P, Q) 6= 1, 2. For all Q ∈ G2 , Q 6= 0, there is an element P ∈ G1 such that e(P, Q) 6= 1. – Computability: There exists an algorithm which computes the bilinear map e efficiently. Furthermore, the Computational Diffie-Hellman (CDH) problem in a multiplicative group G with a generator g is defined as follows: given only g, g x , g y ∈ G where x, y ∈ Z, compute g xy without the knowledge of x or y [56]. We also assume that the CDH problem is computationally intractable (which will be used for the underlying privacy-preserving public auditing mechanism). For the sake of easy notation, we use multiplicative notation for elements of G1 , G2 , and G3 in our protocol descriptions. 3.2

Symmetric Encryption & Hash Functions.

We denote a symmetric key encryption with Esym = SymEnck (M ) and decryption with M = SymDeck (Esym ), where k is a secret key and M is a plaintext to be encrypted. We denote H1 (.) for the conventional collision resistant hashing process where H1 : {0, 1}∗ → {0, 1}ℓ , where ℓ is a fixed length of output (e.g., if H1 is SHA-512 based, then ℓ = 512). Denote H2 (.) for a secure hashing of an arbitrary binary string to a point on the elliptic curve group G1 , i.e., H2 : {0, 1}∗ → G1 . 7

3.3

Asymmetric Encryption & Signature.

The underlying public key encryption scheme of our protocol must be semantically secure (some well-known algorithms are RSA-OAEP [57], Paillier [58], and ElGamal [59]). We utilize elliptic curve versions of ElGamal encryption scheme due to the underlying algorithmic suits which need to work compatibly. – (pkU , skU ) denotes a public and private key pair of a user U (which is generated by a trusted third party (TTP)). – AsymEncpkU (M ) denotes an encryption of a message M using the public key of U and AsymDecskU (C) denotes a decryption of a ciphertext using the private key of U . – We denote SignskU (M ) for the signature of a user U using its private key skU . Note that the functionality Sign is the traditional PKI signature scheme such as Digital Signature Standard (DSS), American National Standard (ANS) X9.31, and Public Key Cryptography Standard (PKCS) #1. 3.4

Anonymous Public-Key Certificates for Secure Deduplication and Privacy-Preserving Public Auditing.

Assume that a user A uploads a file F to the CSP (in signed and encrypted form) and would like to get the services for both secure deduplication and public auditing. Assume also that another user B has also the same file F and would like to upload to the CSP. However, the user B should not send the file in encrypted or signed form, otherwise the client B and the CSP would not be able to benefit communication and/or storage advantages. Still, the user B should be able to delegate the auditing of the duplicated data without harming the privacy of the user A. Namely, the TPA needs to verify the signatures without revealing the real identity of the user A. Traditional PKI based architectures unfortunately cannot satisfy because of the inherent disclosure of the real identities. We propose to use anonymous public-key certificates in our scheme to hide the real identities [17,18,20,21]. In an anonymous public-key certificate architecture, the certificate issuing protocol allows a user to apply to certificate authorities for generating anonymous certificates using a certificate already issued. Namely, pseudonyms will be in the anonymous certificate instead of the real identity of the users. Furthermore, in case of a malicious activity, an identity tracing protocol still enables a legal authority to trace the pseudonym in an anonymous certificate back to the real identity of the corresponding party. Note also that both Idemix [60] and uProve [61] offer complete anonymity even against to the certificate authorities. Furthermore, in case of a malicious user, their extensions can always extract the real identity of a user from an interaction with a relying party (by means of utilizing an external legal authority). The Requests for Comments (RFC) document in [19] already defines a practical architecture to assure privacy of a user who requests and uses an X.509 certificate containing a pseudonym. This RFC document still provides the ability to link a certificate to the identity of the user who requested it. We denote AnonymSignskU (M ) for the signature of an anonymous user U on a message M using its private key skU . Our proposed mechanism can easily utilize the anonymous public-key certificate architecture of the RFC in [19]. 8

3.5

Oblivious Pseudorandom Functions (OPRF).

OPRF is a two-party protocol between a client C and a key server KS for privately computing a pseudorandom function fkKS (x) where x is private key of C and kKS is the private key of KS [62, 63]. At the end of the protocol, C obtains only fkKS (x) while KS learns no information. More concretely, a functionality for the OPRF can be realized as FOPRF : (kKS , x) → (⊥, fkKS (x)) for some pseudorandom function fkKS . Chaum’s blind signatures can be an example of OPRF [23, 33, 64].

4

Security Model

In this section, we aim at providing a security model for both privacy-preserving public auditing and secure deduplication. 4.1

Components

Our system involves five entities: A Certificate Authority (CA), a Key Server (KS), a CSP, a Trusted Party Auditor (TPA), and a user Ui . Their roles are described below: – The CA is responsible for generating and issuing both traditional and anonymous public-key certificates. They are also responsible to trace the attackers back and identify their real identities (in case of a malicious activity). Note that this protocol is run only during the system setup. – The KS and Ui runs a two-party protocol OPRF. Note that this protocol is run only during the system setup. – The CSP is a data storage owner which provides storage services for its clients to create, store, update and request data for retrievability. We assume that the CSP has a large storage, computation resources, and also provides host application services by the underlying virtual infrastructures. The users utilize these services in order to store and update their data in the CSP. Furthermore, the CSP supports secure client-side deduplication and privacy-preserving public auditing. – The user (Ui ) is a client of the CSP and has large amount of data to be stored. Moreover, Ui utilizes the CSP’s resources to store, retrieve, and share data with multiple users. Ui uploads her encrypted data to the CSP, and deletes the original one from the local storage. As for auditing property, we assume that Ui delegates the integrity property to the TPA to verify that her data indeed stays in the CSP correctly. – The TPA is assumed to be semi-honest. It can check the cloud storage reliability and validity of cloud storage. Furthermore, the TPA is assumed to have a secure connection with the CSP in order to check the integrity and validity of users’ data. The TPA also assists users and the CSP to perform the duplication control and support secure deduplication. – A key server KS is assumed to be semi-honest. An oblivious pseudorandom function is run between the KS and a user to realize a FOPRF . 4.2

Threat Model

First of all, TPA is assumed to be semi-honest, and all other components do not trust each other and are assumed to be malicious. Furthermore, we assume that the CA is trustworthy and do 9

not collude with other components. The security of the proposed system is proven based on the CDH assumption in the random oracle model [65]. For designing a secure client-side deduplication scheme, we consider the following adversary types: (1) Malicious outside adversary, (2) Malicious insider adversary CSP, (3) Malicious cloud user adversary, and (4) Semi-honest adversary TPA. – Malicious outside adversary. All the communication channels are assumed to be secure to maintain the confidentiality and integrity of the data being transferred (the communication channel between the cloud storage provider and the client can be secured by TLS (which is already an industry standard) while uploading or downloading files.). The outside adversary is also assumed to possess its hash value for later to fool the CSP and to pass the proof of ownership of the data. – Malicious insider adversary CSP. We assume that the CSP is malicious which may arbitrarily deviate from the protocol and intends to modify or delete the users’ encrypted data contents in order to gain extra information over the outsourced sensitive data. Moreover, the CSP may also try to build links between user profiles and accessed data files. We also assume that the CSP can always detect and learn the existence of a deduplication. Furthermore, since the CSPs already knows their clients (i.e., their real identities) they know which users share the duplicated data but do not disclose their real identities. However, they cannot learn any information about the duplicated data. – Malicious cloud user adversary. The goal of a malicious user is to convince the CSP that he is indeed a legitimate data owner. We assume that the malicious adversary has the short hash of the data. This information can then be used as an input to be able to fool the CSP and pass the proof of ownership successfully. – Semi-honest adversary TPA. The TPA continuously executes the auditing scheme in a correct manner to assess the reliability of CSP on behalf of the users, but also tries to obtain some information about users’ data. We also assume that CSP and TPA are deployed by different organizations and they do not collude each other. 4.3

Security Goals

The proposed secure deduplication protocol with privacy-preserving public auditing model satisfies the following security objectives. – Public verifiability: TPA can verify the correctness and the availability of the outsourced data without retrieving the entire data and without having online connections with the users. – Storage correctness: A CSP can pass TPA’s verification only if it indeed keeps the user’s data. – Privacy: No information about the outsourced data is leaked to TPA during the auditing process. Also, data itself is never leaked to the CSP except the information of duplication. Users’ identities are also not disclosed when there is duplication. – Secure deduplication: Secure deduplication is still possible on the CSP and does not leak any information to the CSP except the knowledge of duplication.

5 5.1

Our Secure Deduplication with Public Auditing Scheme Setup

The following setup is based on Wang et al. [42] as we follow their secure auditing mechanism as a subalgorithm. The main difference is that the encrypted messages will be audited instead 10

of its plaintext form and the underlying signature will be anonymous and they are verified as described-above. We note that the scheme of Wang et al. [42] can adopt the use of the wellknown optimization Merkle trees for proof of retrievability of very large files in an efficient manner (i.e., without having to read the entire file) [66]. First of all, a user Ui runs a key generation protocol and obtains public and private inputs as follows. The signing key pair (spki , sski ) is generated, and then the public key parameters of the user Ui is computed, denoted by pki = {spki , gi , hi , vi , e(vi , hi ), q} where xi ∈ Zp , vi ∈ G1 , hi = gixi , q is the order of G1 , G2 (q = |G1 | = |G2 |) and the private key of the user Ui is denoted by ski = {xi , sski }. Note that the public and private key pairs (pki , ski ) of the i-th user is issued through an anonymous certificate protocol as described above (i.e., pseudonym is included in the certificate instead of the real identity). Furthermore, (pkCSP , skCSP ) denotes the public and private key pair of the CSP. Also, the CSP has the public keys gi , hi for all users Ui s. Similarly, the user Ui has also the public key of CSP denoted as pkCSP and has also the his file F = F1 k · · · kFt where Fi ∈ {0, 1}, i = 1, · · · , t. KUi also denotes the (private) symmetric key of the user Ui . Furthermore, an oblivious pseudorandom function (OPRF) protocol is run between a key server (KS) and a user in order to provide semantic security during encryption key generation where kKS is the private key belongs to the KS (e.g., blind signature protocols). Finally, the security parameter k is chosen large enough in order to make the brute force or other attacks computationally impossible on finding the pre-image message m (m ∈ {0, 1}k ) given y and k such that y = H1 (m).

Fig. 4. Overview of Our Secure Deduplication Mechanism with Public Auditing

5.2

Overview of Our Mechanism

Before we present the detailed description of our system, we refer to Figure 4 to illustrate the main architecture. Roughly speaking, the user Uℓ first interacts with a key server KS in order H1 (F ) to obtain a secure and blinded key α = OPRFkKS . Next, Uℓ communicates with the CSP in order for uploading a new file. At this stage, Uℓ first computes a secure hash value of the encryption of F (i.e., h = H1 (SymEncα (F ))) and interrogates the CSP with h. If there is no duplication on the server, the protocol depicted in Figure 5 will be run; otherwise, the protocol 11

TPA Private:⊥ Public:gi , hi , vi , pkCSP ∀i

User (Uℓ ) Private: F = F1 k · · · kFt , Fi ∈ {0, 1}, KUℓ , xℓ , sskℓ x Public: gℓ , hℓ (= gℓ ℓ ), pkCSP , vℓ , spkℓ , q

CSP Private Input: skCSP Public: pkCSP , gi , hi , vi , spki , q ∀ Ui

H (F )

1 α ← OPRFkKS CF ← SymEncα (F ) h ← H1 (CF )

h −−−−−−−−−−−−−−−−−−− → Check if h exists IUℓ ← {i1 , · · · , ik }, for 1 ≤ ij ≤ t, j = 1, · · · , k No Duplicate , IUℓ −−−−−−−−−−−−−−−−−−− ←

F (ℓ) ← Fi1 k · · · kFik , ij ∈ IUℓ CCSP ← AsymEncpkCSP (H1 (F (ℓ) )) CUℓ ← SymEncKU (α) ℓ m1 · · · mn ← CF kCCSP , ∀mi ∈ Z∗q (ℓ) ∀i : σi ← [H2 (iknameℓ )vℓmi ]xℓ tℓ ← nameℓ ||AnonymSignsskℓ (nameℓ ) (ℓ) SendData ← {∀i : σi }ktℓ kCF kCCSP kCUℓ SendData −−−−−−−−−−−−−−−−−−−→ −

Compute σ (ℓ) ←

Q ?

(ℓ)

∀i

σP i

Check e(σ (ℓ) , gℓ ) = e(vℓ

∀i

mi

Q

∀i

H2 (iknameℓ ), hℓ )

?

SignCSP (P ossessionF Uℓ ) −−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−− ← Verify and Store SignCSP (P ossessionF Uℓ ))

Verify SignCSP (P ossessionF Uℓ )

Check h = H1 (CF ) Store H1 (F (ℓ) ) ← AsymDecskCSP (CCSP ) Verify and Store SendData, H1 (F (ℓ) ) Store P ossessionF Uℓ ← H1 (CF )||tℓ ||Uℓ SignCSP (P ossessionF Uℓ ) −−−−−−−−−−−−−−−−−−−− ←

Fig. 5. The Scheme of Uploading a New File

of Figure 6 will be completed. Later, Uℓ can easily retrieve his/her file(s) by running the protocol of Figure 7. Finally, using the protocol summarized on Figure 8, TPA would audit the uploaded new/subsequent file on the behalf of the user Uℓ without further help of Uℓ . 5.3

Secure Data Upload Satisfying Deduplication with Public Auditing Scheme

The uploading algorithm is defined as follows (see Figure 5): H (F )

1 1. The ℓ-th user Uℓ runs an OPRF protocol to compute α = OPRFkKS . Next, Uℓ computes CF ← SymEncα (F ), h ← H1 (CF ), and finally sends h to CSP. 2. The CSP checks whether h exists in the database. If it does not exist, it chooses a random set IUℓ ← {i1 , · · · , ik } where 1 ≤ ij ≤ t (bit length of F ), j = 1, · · · , k, and returns No Duplicate, IUℓ to the Uℓ . Note that CSP will use this index set as challenges in Algorithm 6 when another user claims that she/he owns this file. 3. The Uℓ first computes F (ℓ) ← Fi1 k · · · kFik for ij ∈ IUℓ . Note that F (ℓ) is computed in order to ensure proof of retrievability (i.e., the data is available). Next, it encrypts it as CCSP ← AsymEncpkCSP (H1 (F (ℓ) )) and CUℓ ← SymEncKU (α). Let m1 · · · mn ← CF kCCSP

(ℓ)



where ∀mi ∈ Z∗q . For i = 1, · · · , n, it computes σi ← AnonymSignxℓ (H2 (ik nameℓ ) vℓmi ) = [H2 (iknameℓ )vℓmi ]xℓ . Next, Uℓ computes the file tag for tℓ ← nameℓ || AnonymSignsskℓ (nameℓ ) to ensure the integrity of the unique file identifier nameℓ . Finally, it prepares SendData ← (ℓ) {∀i : σi }ktℓ kCF kCCSP kCKℓ and returns SendData to the CSP. P Q m Q ? (ℓ) 4. The CSP first computes σ (ℓ) ← ∀i σi and checks the signatures by e(σ (ℓ) , gℓ ) = e(vℓ ∀i i ∀i H2 (iknameℓ ), ?

Next, it also checks h = H1 (CF ). If they are correct, it decrypts CCSP by computing AsymDecskCSP (CCSP ) and stores H1 (F (ℓ) ). It verifies and stores SendData, H1 (F (ℓ) ). Finally, CSP prepares P ossessionFUℓ ← H1 (CF )||tℓ ||Uℓ and returns SignCSP (P ossessionFUℓ ) to Uℓ . 5. The Uℓ verifies the signature SignCSP (P ossessionFUℓ ) and forwards it to TPA. 6. TPA verifies and stores SignCSP (P ossessionFUℓ ). 12

TPA Private:⊥ Public:gi , hi , vi , pkCSP ∀i

User (Um ) Private: F = F1 k · · · kFt , Fi ∈ {0, 1}, KUm , xm , sskm xm Public:gm , hm (= gm ), pkCSP , vm , spkm

CSP Private:skCSP Public:pkCSP , gi , hi , vi , spki ∀ Ui

H (F )

1 α ← OPRFkKS CF ← SymEncα (F ) h ← H1 (CF )

h −−−−−−−−−−−−−−−−−−−−− →

Verify H1 (CF kr) F (m) ← Fi1 , · · · , Fik , ij ∈ IUℓ ′ CCSP ← AsymEncpkCSP (H1 (F (m) )) s ∈R {0, 1}λ CUm ← EKUm (α)

Check if h exists r ∈R {0, 1}λ Duplicate: r, H1 (CF kr), IUℓ −−−−−−−−−−−−−−−−−−−−−− ←

′ s, H1 (CF krks), CUm , CCSP → − −−−−−−−−−−−−−−−−−−−−−

Verify H1 (CF krks) ?

′ ) Check H1 (F (ℓ) ) = AsymDecskCSP (CCSP (F ) Store DuplicateUm ← H1 (CF )ktℓ kUm kUℓ (F )

(F )

(F ) SignCSP (DuplicateUm )

SignCSP (DuplicateUm ) −−−−−−−−−−−−−−−−−−−−− ←

Verify SignCSP (DuplicateUm )

−−−−−−−−−−−−−−−−−−−−−−−−−−− ← (F )

Verify and Store SignCSP (DuplicateUm )

Fig. 6. The Scheme of Subsequent Upload

Remark 1. Note that, in the second round of the protocol, the cloud server searches the hash value h in the database and decides whether the encrypted file CF is in the database. The security of this solution is based on collision-resistant hash functions H1 (.). However, if malicious users somehow obtain or compute collisions in practice then only looking for h become not sufficient. Namely, if the malicious users somehow has the short hashes h then they could fool the CSP that they indeed have the encrypted file CF s. Therefore, in the third round of our protocol, we additionally request F (ℓ) to eliminate these threats and to ensure the proof of correctness and retrievability. 5.4

The Scheme of Subsequent Upload

The following algorithm is executed if there is duplicate file on the CSP (see Figure 6): H (F )

1 1. The m-th user Um runs an OPRF protocol to compute α ← OPRFkKS . Next, Um computes CF ← SymEncα (F ), h ← H1 (CF ), and finally sends h to CSP. 2. The CSP checks whether h exists in the database. If it exists, it chooses a random number r ∈R {0, 1}λ and returns Duplicate: r, H1 (CF kr), IUℓ . The message pair r, H1 (CF kr) is sent to prove the possession of CF to the Um . IUℓ index set has been already created in the first protocol between the first owner of file F (When no duplication occurs.). The same set IUℓ is sent to prove that the current user indeed possess the file F . 3. The Um first verifies H1 (CF kr) to ensure that the CSP indeed possess the CF . Next, it com′ putes F (m) ← Fi1 , · · · , Fik , CCSP ← AsymEncpkCSP (H1 (F (m) )). As described in Remark 1 for the protocol in Figure 5, the goal of sending F (m) is to guarantee the proof of retrievability of

13

the encrypted file CF (i.e., data is indeed on the client-side). Next, it chooses random number ′ s ∈R {0, 1}λ and computes CUm ← EKUm (α). Finally, it sends s, H1 (CF krks), CUm , CCSP . Similarly as in the previous round, s and H1 (CF krks) are sent to prove that the Um indeed ′ possesses CF . Also, CCSP proves the possession of F . ?

(F )

′ 4. The CSP verifies H1 (CF krks) and checks H1 (F (ℓ) ) = AsymDecskCSP (CCSP ), and stores DuplicateUm (F )

← H1 (CF ) k tℓ k Um k Uℓ . Finally, it returns SignCSP (DuplicateUm ) to Um . This signature assures the user that the data is linked with the Um . Also, it prevents a malicious CSP (the signer) from denying the signing. Note that the users’ pseudonyms are used instead of their real identities. (F ) 5. The Um verifies the signature SignCSP (DuplicateUm ) and forwards it to TPA. (F )

6. The TPA verifies and stores SignCSP (DuplicateUm ). Note that TPA will process the auditing utilizing the signatures of Uℓ on behalf of Um . Because the signatures are anonymous no one will know any information about the real identity of Uℓ .

User (Ui ) Private: KUi , xi , sski Public:gi , hi (= gixi ), pkCSP , vi , spki

α ← SymDecKU (CUi ) i F ← SymDecα (CF )

CSP Private:skCSP Public:pkCSP , gj , hj , vj , spkj ∀ Uj Download Request −−−−−−−−−−−−−−− → − (CF , CUi ) −−−−−−−−−−−−−−− ←

Private Output: F

Private: ⊥

Fig. 7. The Scheme of Data Retrieval

(F )

Remark 2. We highlight that one needs to store the CSP’s signatures SignCSP (DuplicateUm ) for later auditing processing since otherwise one cannot later convince CSP that the deduplication was already occurred between the specific anonymous users. Therefore, in our proposal, the CSP sends these signatures to the users which forwards them to the TPA. We highlight that, unlike the existing auditing mechanisms, the TPAs in our proposal are not stateless (memorysless). In case of utilizing a stateless TPA, a local copy of these signatures should be copied on the user side (instead of forwarding to the TPA). However, in that case, the user must always forward these signatures to the TPA before it can start the auditing process. Charging each client in the case of deduplication. In practice, cloud providers typically charge its clients based on the amount of data they stored. We would like to highlight that anonymous certificates are only used for duplicated users and cloud storage providers can identify their clients in our mechanism since there is already an authentication protocol between each client and the storage provider. Therefore, it is easy for a cloud provider to charge each client if there is already a deduplication. 5.5

The Scheme of Data Retrieval

Our data retrieval protocol is as follows (see Figure 7): the user Um sends a download request from CSP and receives (CF , CUm ). Next, Um then computes the encryption key α ← SymDecKUm (CUm ) and obtains the file by computing F ← SymDecα (CF ). 14

5.6

Privacy-Preserving Public Auditing Scheme

The privacy of our auditing mechanism follows from the Wang et al.’s scheme [42] (see Figure 8). Let F be the file for auditing and Uℓ is the first owner of the file. Let Um is the target user for auditing and Um also posses Fm . (F ) In order for auditing the file of the user Um , TPA gets either sign1 = SignCSP (P ossessionUm ) (F )

(F )

(F )

or sign2 = SignCSP (DuplicateUm ) where P ossesionUm = H1 (CF )ktℓ kUm and DuplicateUm = H1 (CF )ktℓ kUm kUℓ . Note that if m 6= ℓ then sign1 is null (there is no duplication or Um is the first owner of the file), otherwise signature2 is null. Note that the existence of sign2 ensures the CSP that there was a deduplication with the user Uℓ , and in this case, the CSP needs to send necessary auditing information of the Uℓ . Then, the TPA next picks a random proper subset of J ⊂ I = {1, · · · , n} and for all j ∈ J, it sends (j, ωj ) to the CSP. It also sends either (F ) (F ) SignCSP (P ossessionUm ) or SignCSP (DuplicateUm ). P Q (ℓ) The CSP first verifies the signatures and then computes σ (ℓ) ← j∈J (σj )ωj , µ′ ← j∈J ωj · mj , and for random r ∈R Zp it computes R ← e(vℓ , hℓ )r . Next, the CSP computes γ ← H2 (R) and µ ← r + γµ′ and sends σ (ℓ) , µ, R back to the TPA. The TPA first computes γ ← H2 (R) and Q ? then verifies R · e((σ (ℓ) )γ , gℓ ) = e(( j∈J H2 (j||nameℓ )ωj )γ · vℓµ , hℓ ). TPA Private: ⊥ Public: pkCSP , gi , hi , vi ∀Ui

CSP Private: skCSP Public: pkCSP , gi , hi , vi , spki ∀ Ui

Let Um be target user and F be target file Let Uℓ be the first owner of F (F ) (F ) Either SignCSP (P ossessionUm ) or SignCSP (DuplicateUm )} is valid Pick random J ⊂ I = {1, · · · , n} (F ) (F ) {∀j ∈ J : (j, ωj )}, {SignCSP (P ossessionUm ) or SignCSP (DuplicateUm )} − −−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−− → −

(F )

(F )

{Verify SignCSP (P ossessionUm ) or SignCSP (DuplicateUm )} Q (ℓ) σ (ℓ) ← j∈J (σj )ωj P µ′ ← j∈J ωj · mj r ∈R Zp , R ← e(vℓ , hℓ )r γ ← H2 (R) and µ ← r + γµ′ Compute γ ← H2 (R) Q ? R · e((σ (ℓ) )γ , gℓ ) = e(( j∈J H2 (j||nameℓ )ωj )γ · vℓµ , hℓ )

(σ (ℓ) , µ, R) −−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−− ←

Fig. 8. Privacy-Preserving Public Auditing Scheme

6

Security Analysis

In this section, we prove that our proposed protocols in Section 5 satisfy the security model defined in Section 4. In the following proof, we start by constructing a probabilistic polynomial-time simulator that produces a protocol transcript which is statistically indistinguishable from the one resulting from a real execution of the authentication protocol [67]. Namely, given a corrupted party (either a user Uℓ or the CSP), we construct an efficient simulator that has access to the public and private input of the corrupted party before the simulation is run and produces a view which is statistically indistinguishable from the view of that party interacting with the other honest party. Theorem 1. Assume that OPRF used is secure, H1 , H2 are hash functions modeled as random oracles, and the CDH problem is hard. Then, our protocol in Figure 5 securely uploads a file F in the presence of malicious adversaries. Proof. The security proof of the protocol is shown in the OPRF-hybrid model, where the user and SF are assumed to have access to a trusted party computing the OPRF functionality following 15

the ideal model of our definition. Thus, the simulator will play the trusted party in the OPRF, when simulating for the adversary. We separately consider the different corruption cases (when no parties are corrupted, and when either one of the parties is corrupted). In the case that no parties are corrupted, the security reduces to the semi-honest case. Assume that the user Uℓ is corrupted: In this case, the simulator A has the skCSP apart from the public parameters. First of all, the adversary A arbitrarily a set IUℓ . A then sends Q (ℓ) (ℓ) ← the IUℓ to the user and obtains SendData. The simulator A then computes σ ∀i σi P m Q ? and checks the signatures by e(σ (ℓ) , gℓ ) = e(vℓ ∀i i ∀i H2 (iknameℓ ), hℓ ). Next, it also checks ?

h = H1 (CF ). If they are correct, it simulates AsymDecskCSP (CCSP ) since it knows the CCSP , the encrypted message AsymEncpkCSP (CCSP ), and pkCSP . Assume that the CSP is corrupted: In this case, the simulator A has the private values of the user Uℓ , i.e., F , KUℓ , xℓ , sskℓ . The adversary A first arbitrarily chooses a file F and calls the ideal H1 (F ) functionality OPRF and learns α = OPRFkKS . Next, A computes CF ← SymEncα (F ), h ← H1 (CF ), and finally sends h to CSP. Upon receiving No Duplicate and IUℓ , the adversary A computes F (ℓ) ← Fi1 k · · · kFik for ij ∈ IUℓ . Next, it encrypts it as CCSP ← AsymEncpkCSP (H1 (F (ℓ) )) and CUℓ ← SymEncKU (α). Let m1 · · · mn ← CF kCCSP where ∀mi ∈ Z∗q . For i = 1, · · · , n, it (ℓ)



computes σi ← AnonymSignxℓ (H2 (ik nameℓ ) vℓmi ) = [H2 (iknameℓ )vℓmi ]xℓ . Next, A computes the file tag for tℓ ← nameℓ || AnonymSignsskℓ to ensure the integrity of the unique file identifier (ℓ)

nameℓ . Finally, it prepares SendData ← {∀i : σi }ktℓ kCF kCCSP kCKℓ and returns SendData to the CSP. The transcript of the view of both parties is simulated, and it is consistent and statistically indistinguishable when interacting with the honest verifier.

CSP

Client Side

# # EC Pairing Multiplication The Scheme of Up- 2 1 loading a New File (Figure 5)

# Asym- # # EC metric Pairing MultiEnc/Sign plication 1 Sign 0 2n 1 AsymDec

The Scheme of Subse- 0 quent Upload (Figure 6) The Scheme of Data 0 Retrieval (Figure 7) Privacy-Preserving 0 Public Auditing Scheme (Figure 8)

0

0

1 Sign 0 1 AsymDec 0 0

|J| + 1

0

0

NA

TPA

# Asym- # # EC metric Pairing MultiEnc/Sign plication H1 (F ) 1 OPRFkKS NA NA 1 AsymEnc 1 AnonymSign H1 (F ) 1 OPRFkKS 0 0 1 AsymEnc

# Asymmetric Enc/Sign NA

0

0

NA

NA

NA

NA

NA

2

|J| + 3

0

0

Table 1. Computational Complexity of the Proposed Scheme. NA denotes “Not Applicable”, n denotes the number of blocks of the message CF ||CCSP , and J ⊂ I = {1, · · · , n}. Note that, for the sake of easy notation, we use multiplicative notation for elements of G1 , G2 , and G3 in our protocol descriptions.

Theorem 2. Assume that H1 is a hash function modeled as a random oracle and the CDH problem is hard. Then, our protocol for the subsequent upload in Figure 6 is secure against malicious users (uploaders) and malicius outside attackers. Proof. We provide a formal proof via the formal verification tool Scyther [68, 69] to ensure the proof of possession of Cf and that a malicious user cannot get any information during the secure 16

deduplication part of the protocol in Figure 6. Before we give a formal proof, we would like to briefly sketch the idea on the proof: the objective of the malicious uploader to compromise the confidentiality of data uploaded by legitimate users. In particular, the goal of an adversary is to convince the CSP that it possesses a pre-stored data. The proof of possession and the proof of retrievability are given by the first three rounds of the protocol. Namely, proving the knowledge of F (m) indeed guarantees that the file F is available on the client-side. Informally speaking, in the first step, an adversary Uadv obtains the hash of the encrypted data. Uadv then sends the hash of encrypted data which only allows him to learn whether there is deduplication. If there is a deduplication, the CSP challenges the adversary to assure the possession of data via r, H1 (CF ||r), IUi for some i. Note that because of the underlying secure hash function (in the random oracle model) the adversary cannot upload a fake data (which compromises the integrity of data) and it does not give any sensitive information to the adversary. The adversary has to return H1 (CF ||r||s), CUadv which is only possible if the adversary possess the data F . The CSP can only be ensured if the verifications are passed successfully. Hence, the adversary cannot link himself to a legitimate data without possessing it. The formal proof via a formal verification tool Scyther [68, 69] is given in Appendix A. The modeling of this part is given in a language called is Security Protocol Description Language (SPDL) in Appendix A including the codes of the validation of the protocol specifications. In this way, we show that no malicious adversary can get a valuable information during the deduplication part of the protocol. If the proof of possession of the ciphertext Cf is guaranteed then the user must ensure the ?

′ knowledge of the message F (m) by checking H1 (F (ℓ) ) = AsymDecskCSP (CCSP ). Furthermore, the (F )

signature SignCSP (DuplicateUm ) links anonymously the Uℓ with the Um . Furthermore, the CSP (F )

returns SignCSP (DuplicateUm ) to the Um which forwards it to the TPA. Because the signatures are anonymous no one will know any information about the real identity of Uℓ . Hence, the proposed protocol is secure against malicious adversaries. Theorem 3. Our protocol for the data retrieval in Figure 7 is secure against malicious users (uploaders) and malicious outside attackers. Proof. It is trivial to show that the data retrieval protocol does not lead to a security breach because the only value that the CSP sends are the encryptions CF and CUi . That is, since the adversary does not know the symmetric key of the encryptions it cannot get any information about the data.

Alkhojandi and Miri [52]

Li et al. [53]

This paper

Efficient Realizability (in terms of Deduplica- Privacy-Preserving Pubtion and Auditing) lic Auditing ✗ (Loss of deduplication ad✗ vantages) ✗ ✗ (achieves weak privacy since (Loss of bandwidth advanthe file is uploaded to the tages) TPA) X X

Table 2. Comparison with the Existing Schemes.

Theorem 4. Assume that H2 is a hash function modeled as a random oracle and the CDH problem is hard. Then, our protocol for the privacy-preserving public auditing in Figure 8 is secure against semi-honest TPAs and malicious CSPs. 17

Proof. First of all, the TPA obtains the tags tℓ and verifies their corresponding signatures. If they are correct, the TPA next picks a random proper subset of J ⊂ I = {1, · · · , n} and for all j ∈ J, it sends (j, ωj ) to the CSP to start the auditing process. The security, the privacypreserving public verifiability, and the storage correctness against corrupted TPAs and CSPs basically follow the underlying auditing mechanisms of Wang et al. [42]. Note that the proof of Wang et al. only ensures the integrity of the data and do not hide the identity of the users. In our scheme, there are two cases. (F )

– Um is the first owner of the file and TPA would have the signature of possession SignCSP (P ossessionUm ) (F )

where P ossessionUm = H1 (CF ) k tℓ k Um . This signature prevents a malicious CSP (the signer) from denying the signing (i.e., assures non-repudiation). On the other hand, they prevent malicious users from claiming false possession. (F ) – Um is owner of a deduplicated file and TPA would have the signature of duplication SignCSP (DuplicateUm ) (F )

where DuplicateUm = H1 (CF ) k tℓ k Um kUℓ . Note that these signatures ensure that there (F )

is a deduplication with the user Uℓ . On the one hand, the signatures SignCSP (DuplicateUm ) assures the user Um that her data is linked to the user Uℓ , and therefore, it prevents a malicious CSP (the signer) from denying the signing (i.e., assures non-repudiation). On the other hand, they prevent malicious users from claiming false duplications. Furthermore, the privacy of users during the auditing process is guaranteed by the underlying anonymous public-key certificates, i.e., users’ pseudonyms are used in the signatures instead of their real identities. In the case of a malicious activity, the CA can also easily trace the malicious users back and can detect their real identities. Hence, the modified version of the auditing scheme provides both secure deduplication privacy of users simultaneously.

7

Complexity Analysis

In Table 1, we provide the number of pairings, Elliptic Curve (EC) multiplications, and Asymmetric operations (in terms of public key encryption and signatures) of our system. The comparison has been given for the user, the CSP, and the TPA separately. Furthermore, in Table 2, we compare our protocol with the only existing protocols of [52] and [53] which aim to satisfy public auditing and secure deduplication simultaneously. Note that the main drawback of [52] is that deduplication is performed by an mediator within the enterprise. Therefore, not many deduplication would occur on a small size of enterprises instead of performing deduplication on the large CSPs. On the other hand, the main drawback of [53] is that the file is uploaded to the semi-honest TPA in plaintext form. Therefore, privacy will not be achieved against malicious TPAs. And more importantly, client-side deduplication solution of [53] would not eliminate the bandwidth advantage for a user. We also note that the protocol of [53] is also too inefficient because for a secure implementation one needs supersingular elliptic curves over large prime fields (see for further details [54]).

8

Conclusion

In this paper, we proposed a solution for the dilemma where the CSPs are willing to use deduplication to save communication and storage costs while the users are willing to encrypt their data before uploading and also to ensure that their data has never been modified without being detected. In this respect, we propose an efficient solution in order to fulfill privacy-preserving data 18

integrity and secure duplication simultaneously. We prove the security of the proposed scheme in the random oracle model, assuming that the CDH problem is hard. We leave open the question of constructing efficient cryptographic protocols that do not need certificate authorities or anonymous certificates. Acknowledgments. This work is supported by a grant from Ministry of Development of Turkey provided to the Cloud Computing and Big Data Research Lab Project (the project id: ¨ 2014K121030). The authors would like to thank Osmanbey Uzunkol, Devrim Unal, Ziya Alper Gen¸c, and anonymous referees for their valuable comments and suggestions.

References 1. Sean Quinlan and Sean Dorward. Venti: A new approach to archival storage. In Proceedings of the Conference on File and Storage Technologies, FAST ’02, pages 89–101, Berkeley, CA, USA, 2002. USENIX Association. 2. John R. Douceur, Atul Adya, William J. Bolosky, Dan Simon, and Marvin Theimer. Reclaiming space from duplicate files in a serverless distributed file system. In Proceedings of the 22 Nd International Conference on Distributed Computing Systems (ICDCS’02), ICDCS ’02, Washington, DC, USA, 2002. IEEE Computer Society. 3. Alex Osuna, Eva Balogh, Alexandre Ramos Galante de Carvalho, Rucel F. Javier, and Zohar Mann. Implementing ibm storage data deduplication solutions. 2011. 4. Fatema Rashid. Secure Data Deduplication in Cloud Environments. PhD thesis, Ryerson University, Toronto, Ontario, Canada, 2015. 5. Jesus Diaz Vico. Study of the security in cloud storage services: Analysis of dropbox and mega, 2016 (Accessed June 2016). http://www.incibe.es/CERT_en/publications/Studies/. 6. Danny Harnik, Benny Pinkas, and Alexandra Shulman-Peleg. Side channels in cloud services: Deduplication in cloud storage. IEEE Security and Privacy, 8(6):40–47, November 2010. 7. Alex Osuna, Eva Balogh, Alexandre Ramos Galante de Carvalho, Rucel F. Javier, and Zohar Mann. Implementing IBM Storage Data Deduplication Solutions, pages 1–322. march 2011. 8. Kate Miller. Cloud deduplication, on-demand: Storreduce, an apn technology partner. 2015 (Accessed July 2016). 9. Frederik Armknecht, Jens-Matthias Bohli, Ghassan O. Karame, Zongren Liu, and Christian A. Reuter. Outsourced proofs of retrievability. In Proceedings of the 2014 ACM SIGSAC Conference on Computer and Communications Security, CCS ’14, pages 831–843, New York, NY, USA, 2014. ACM. 10. Richard Chow, Philippe Golle, Markus Jakobsson, Elaine Shi, Jessica Staddon, Ryusuke Masuoka, and Jesus Molina. Controlling data in the cloud: Outsourcing computation without outsourcing control. In Proceedings of the 2009 ACM Workshop on Cloud Computing Security, CCSW ’09, pages 85–90, New York, NY, USA, 2009. ACM. 11. Guangwei Xu, Chunlin Chen, Hongya Wang, Zhuping Zang, Mugen Pang, and Ping Jiang. Two-level verification of data integrity for data storage in cloud computing. In Gang Shen and Xiong Huang, editors, Advanced Research on Electronic Commerce, Web Application, and Communication, volume 143 of Communications in Computer and Information Science, pages 439–445. Springer Berlin Heidelberg, 2011. 12. Wenjun Luo and Guojing Bai. Ensuring the data integrity in cloud data storage. In Cloud Computing and Intelligence Systems (CCIS), 2011 IEEE International Conference on, pages 240–243, Sept 2011. 13. Hovav Shacham and Brent Waters. Compact proofs of retrievability. Journal of Cryptology, 26(3):442–483, 2013. 14. Sara Bouchenak, Gregory Chockler, Hana Chockler, Gabriela Gheorghe, Nuno Santos, and Alexander Shraer. Verifying cloud services: Present and future. SIGOPS Oper. Syst. Rev., 47(2):6–19, July 2013. 15. B. Wang, B. Li, and H. Li. Panda: Public auditing for shared data with efficient user revocation in the cloud. IEEE Transactions on Services Computing, 8(1):92–106, Jan 2015. 16. Mehmet Sabır Kiraz. A comprehensive meta-analysis of cryptographic security mechanisms for cloud computing. Journal of Ambient Intelligence and Humanized Computing, pages 1–30, june 2016. 17. N. Zhang, Q. Shi, and M. Merabti. Anonymous public-key certificates for anonymous and fair document exchange. IEE Proceedings - Communications, 147(6):345–350, Dec 2000. 18. D. Critchlow and N. Zhang. Revocation invocation for accountable anonymous pki certificate trees. In Computers and Communications, 2004. Proceedings. ISCC 2004. Ninth International Symposium on, volume 1, pages 386–392 Vol.1, June 2004. 19. S. Park, H. Park, Y. Won, J. Lee KISA, and S. Kent. Public-Key Cryptography Standards (PKCS) #1: RSA Cryptography Specifications Version 2.1. RFC 5636, August 2009.

19

20. Jesus Diaz, David Arroyo, and Francisco B. Rodriguez. Anonymity Revocation through Standard Infrastructures, pages 112–127. Springer Berlin Heidelberg, Berlin, Heidelberg, 2013. 21. Jesus Diaz, David Arroyo, and Francisco B. Rodriguez. New x.509-based mechanisms for fair anonymity management. Computers & Security, 46:111–125, 2014. 22. J.R. Douceur, A. Adya, W.J. Bolosky, P. Simon, and M. Theimer. Reclaiming space from duplicate files in a serverless distributed file system. In Distributed Computing Systems, 2002. Proceedings. 22nd International Conference on, pages 617–624, 2002. 23. Mihir Bellare, Sriram Keelveedhi, and Thomas Ristenpart. Dupless: Server-aided encryption for deduplicated storage. In Proceedings of the 22Nd USENIX Conference on Security, SEC’13, pages 179–194. USENIX Association, 2013. 24. Jan Stanek, Alessandro Sorniotti, Elli Androulaki, and Lukas Kencl. A Secure Data Deduplication Scheme for Cloud Storage, pages 99–118. Springer Berlin Heidelberg, Berlin, Heidelberg, 2014. 25. Nagapramod Mandagere, Pin Zhou, Mark A Smith, and Sandeep Uttamchandani. Demystifying data deduplication. In Proceedings of the ACM/IFIP/USENIX Middleware ’08 Conference Companion, Companion ’08, pages 12–17. ACM, 2008. 26. Dirk Meister and Andr´e Brinkmann. Multi-level comparison of data deduplication in a backup scenario. In Proceedings of SYSTOR 2009: The Israeli Experimental Systems Conference, SYSTOR ’09, pages 8:1–8:12. ACM, 2009. 27. Jia Xu, Ee-Chien Chang, and Jianying Zhou. Weak leakage-resilient client-side deduplication of encrypted data in cloud storage. In Proceedings of the 8th ACM SIGSAC Symposium on Information, Computer and Communications Security, ASIA CCS ’13, pages 195–206, New York, NY, USA, 2013. ACM. 28. D. Harnik, B. Pinkas, and A. Shulman-Peleg. Side channels in cloud services: Deduplication in cloud storage. Security Privacy, IEEE, 8(6):40–47, Nov 2010. 29. Shai Halevi, Danny Harnik, Benny Pinkas, and Alexandra Shulman-Peleg. Proofs of ownership in remote storage systems. In Proceedings of the 18th ACM Conference on Computer and Communications Security, CCS ’11, pages 491–500. ACM, 2011. 30. Roberto Di Pietro and Alessandro Sorniotti. Boosting efficiency and security in proof of ownership for deduplication. In Proceedings of the 7th ACM Symposium on Information, Computer and Communications Security, ASIACCS ’12, pages 81–82. ACM, 2012. 31. Wenjin Hu, Tao Yang, and Jeanna N. Matthews. The good, the bad and the ugly of consumer cloud storage. SIGOPS Oper. Syst. Rev., 44(3):110–115, August 2010. 32. Martin Mulazzani, Sebastian Schrittwieser, Manuel Leithner, Markus Huber, and Edgar Weippl. Dark clouds on the horizon: Using cloud storage as attack vector and online slack space. In Proceedings of the 20th USENIX Conference on Security, SEC’11, pages 5–5. USENIX Association, 2011. 33. Mihir Bellare, Sriram Keelveedhi, and Thomas Ristenpart. Message-locked encryption and secure deduplication. In Thomas Johansson and PhongQ. Nguyen, editors, Advances in Cryptology – EUROCRYPT 2013, volume 7881 of Lecture Notes in Computer Science, pages 296–312. Springer Berlin Heidelberg, 2013. 34. Jin Li, Xiaofeng Chen, Mingqiang Li, Jingwei Li, P.P.C. Lee, and Wenjing Lou. Secure deduplication with efficient and reliable convergent key management. Parallel and Distributed Systems, IEEE Transactions on, 25(6):1615–1625, June 2014. 35. Jan Stanek, Alessandro Sorniotti, Elli Androulaki, and Lukas Kencl. A secure data deduplication scheme for cloud storage. In Nicolas Christin and Reihaneh Safavi-Naini, editors, Financial Cryptography and Data Security, volume 8437 of Lecture Notes in Computer Science, pages 99–118. Springer Berlin Heidelberg, 2014. 36. Jian Liu, N. Asokan, and Benny Pinkas. Secure deduplication of encrypted data without additional independent servers. In Proceedings of the 22Nd ACM SIGSAC Conference on Computer and Communications Security, CCS ’15, pages 874–885, New York, NY, USA, 2015. ACM. 37. Yifeng Zheng, Xingliang Yuan, Xinyu Wang, Jinghua Jiang, Cong Wang, and Xiaolin Gui. Enabling encrypted cloud media center with secure deduplication. In Proceedings of the 10th ACM Symposium on Information, Computer and Communications Security, ASIA CCS ’15, pages 63–72. ACM, 2015. 38. Giuseppe Ateniese, Randal Burns, Reza Curtmola, Joseph Herring, Lea Kissner, Zachary Peterson, and Dawn Song. Provable data possession at untrusted stores. Cryptology ePrint Archive, Report 2007/202, 2007. http://eprint.iacr.org/2007/202. 39. Giuseppe Ateniese, Randal Burns, Reza Curtmola, Joseph Herring, Osama Khan, Lea Kissner, Zachary Peterson, and Dawn Song. Remote data checking using provable data possession. ACM Trans. Inf. Syst. Secur., 14(1):12:1–12:34, June 2011. 40. Huaqun Wang. Proxy provable data possession in public clouds. Services Computing, IEEE Transactions on, 6(4):551–559, Oct 2013. 41. Jiawei Yuan and Shucheng Yu. Proofs of retrievability with public verifiability and constant communication cost in cloud. In Proceedings of the 2013 International Workshop on Security in Cloud Computing, Cloud Computing ’13, pages 19–26, New York, NY, USA, 2013. ACM.

20

42. Cong Wang, S.S.M. Chow, Qian Wang, Kui Ren, and Wenjing Lou. Privacy-preserving public auditing for secure cloud storage. Computers, IEEE Transactions on, 62(2):362–375, Feb 2013. 43. Mehdi Sookhak, Hamid Talebian, Ejaz Ahmed, Abdullah Gani, and Muhammad Khurram Khan. A review on remote data auditing in single cloud server: Taxonomy and open issues. Journal of Network and Computer Applications, 43(0):121 – 141, 2014. 44. Alshaimaa Abo-alian, N.L. Badr, and M.F. Tolba. Auditing-as-a-service for cloud storage. In P. Angelov, K.T. Atanassov, L. Doukovska, M. Hadjiski, V. Jotsov, J. Kacprzyk, N. Kasabov, S. Sotirov, E. Szmidt, and S. Zadro˙zny, editors, Intelligent Systems’2014, volume 322 of Advances in Intelligent Systems and Computing, pages 559–568. Springer International Publishing, 2015. ˙ Sertkaya, and Osmanbey Uzunkol. An Efficient ID-Based Message Recoverable 45. Mehmet Sabır Kiraz, Isa Privacy-Preserving Auditing Scheme. In Privacy, Security and Trust (PST), 2015 13th Annual Conference on, pages 117–124. IEEE, July 2015. 46. Hovav Shacham and Brent Waters. Compact proofs of retrievability. In Josef Pieprzyk, editor, Advances in Cryptology - ASIACRYPT 2008, volume 5350 of Lecture Notes in Computer Science, pages 90–107. Springer Berlin Heidelberg, 2008. 47. Qian Wang, Cong Wang, Jin Li, Kui Ren, and Wenjing Lou. Enabling public verifiability and data dynamics for storage security in cloud computing. In Proceedings of the 14th European Conference on Research in Computer Security, ESORICS’09, pages 355–370. Springer-Verlag, 2009. 48. Jia Xu and Ee-Chien Chang. Towards efficient proofs of retrievability. In Proceedings of the 7th ACM Symposium on Information, Computer and Communications Security, ASIACCS ’12, pages 79–80. ACM, 2012. 49. Nathalie Baracaldo, Elli Androulaki, Joseph Glider, and Alessandro Sorniotti. Reconciling end-to-end confidentiality and data reduction in cloud storage. In Proceedings of the 6th Edition of the ACM Workshop on Cloud Computing Security, CCSW ’14, pages 21–32. ACM, 2014. 50. Youngjoo Shin, Dongyoung Koo, Junbeom Hur, and Joobeom Yun. Secure proof of storage with deduplication for cloud storage systems. Multimedia Tools and Applications, pages 1–16, 2015. 51. Jiawei Yuan and Shucheng Yu. Secure and constant cost public cloud storage auditing with deduplication. In Communications and Network Security (CNS), 2013 IEEE Conference on, pages 145–153, Oct 2013. 52. Naelah Alkhojandi and Ali Miri. Foundations and Practice of Security: 7th International Symposium, FPS 2014, Montreal, QC, Canada, November 3-5, 2014. Revised Selected Papers, chapter Privacy-Preserving Public Auditing in Cloud Computing with Data Deduplication, pages 35–48. Springer International Publishing, Cham, 2015. 53. J. Li, J. Li, D. Xie, and Z. Cai. Secure auditing and deduplicating data in cloud. IEEE Transactions on Computers, PP(99), 2015. 54. Mehmet Sabır Kiraz and Osmanbey Uzunkol. Still wrong use of pairings in cryptography. Cryptology ePrint Archive, Report 2016/223, 2016. http://eprint.iacr.org/2016/223. 55. I. Blake, G. Seroussi, N. Smart, and J. W. S. Cassels. Advances in Elliptic Curve Cryptography (London Mathematical Society Lecture Note Series). Cambridge University Press, New York, NY, USA, 2005. 56. W. Diffie and M. Hellman. New directions in cryptography. IEEE Trans. Inf. Theor., 22(6):644–654, September 2006. 57. Mihir Bellare and Phillip Rogaway. Optimal asymmetric encryption – how to encrypt with rsa. In Advances in Cryptology, EUROCRYPT 94, volume 950, pages 92–111. Springer-Verlag, May 1995. 58. Pascal Paillier and David Pointcheval. Efficient public-key cryptosystems provably secure against active adversaries. In Proc. of Asiacrypt’99, Lecture Notes in Computer Science, pages 165–179. Springer Verlag, 1999. 59. Taher El Gamal. A public key cryptosystem and a signature scheme based on discrete logarithms. In Proceedings of CRYPTO 84 on Advances in cryptology, pages 10–18, 1985. 60. Jan Camenisch and Els Van Herreweghen. Design and implementation of the idemix anonymous credential system. In Proceedings of the 9th ACM Conference on Computer and Communications Security, CCS ’02, pages 21–30, New York, NY, USA, 2002. ACM. 61. Christian Paquin. U-prove technology overview v1.1 (revision 2). April 2013. 62. Michael J. Freedman, Yuval Ishai, Benny Pinkas, and Omer Reingold. Keyword search and oblivious pseudorandom functions. In Theory of Cryptography: Second Theory of Cryptography Conference, TCC 2005, Cambridge, MA, USA, February 10-12, 2005. Proceedings, pages 303–324, Berlin, Heidelberg, 2005. Springer Berlin Heidelberg. 63. Stanislaw Jarecki and Xiaomin Liu. Efficient oblivious pseudorandom function with applications to adaptive ot and secure computation of set intersection. In Proceedings of the 6th Theory of Cryptography Conference on Theory of Cryptography, TCC ’09, pages 577–594, Berlin, Heidelberg, 2009. Springer-Verlag. 64. David Chaum. Blind Signatures for Untraceable Payments, pages 199–203. Springer US, Boston, MA, 1983. 65. Mihir Bellare and Phillip Rogaway. Random oracles are practical: A paradigm for designing efficient protocols. In Proceedings of the 1st ACM Conference on Computer and Communications Security, CCS ’93, pages 62–73, New York, NY, USA, 1993. ACM.

21

66. R. C. Merkle. Protocols for public key cryptosystems. In 1980 IEEE Symposium on Security and Privacy, pages 122–122, April 1980. 67. Oded Goldreich. The Foundations of Cryptography - Volume 1, Basic Techniques. Cambridge University Press, 2001. 68. C.J.F. Cremers. Scyther - Semantics and Verification of Security Protocols. Ph.D. dissertation, Eindhoven University of Technology, 2006. 69. Cas J. Cremers. The scyther tool: Verification, falsification, and analysis of security protocols. In Proceedings of the 20th International Conference on Computer Aided Verification, CAV ’08, pages 414–418, Berlin, Heidelberg, 2008. Springer-Verlag.

A A.1

Formal Security Analysis Using Scyhter Security Protocol Description Language (SPDL) Code

Because it is very effective for finding errors, its modeling adversaries with different capabilities, and it is easy to implement the protocol we chose Scyther [68, 69] as an automatic evaluation tool. Scyther’s output also ensures that no information is leaked during the protocol. # C f i s modeled a s k ( I ,R) b e c a u s e o f p r i v a t e s h a r e d i n f o r m a t i o n hashfunction h ; p r o t o c o l hashed−p r o t o c o l ( I ,R) { role I { var r : Nonce ; f r e s h s : Nonce ; s e n d 1 ( I , R, h ( k ( I ,R ) ) ) ; # U m s e n d s H 1 ( C f ) t o CSP r e c v 2 (R, I , r , h ( k ( I ,R) , r ) ) ; # CSP s e n d s r , H 1 ( C f | | r ) t o U m s e n d 3 ( I , R, s , h ( k ( I ,R) , r , s ) ) ; # U m s e n d s s , H 1 ( C f | | r | | s ) t o CSP c l a i m i 1 ( I , N i a g r e e ) ; # c h e c k s t h e i n t e g r i t y o f t h e t r a n s m i t t e d data c l a i m i 2 ( I , S e c r e t , k ( I ,R ) ) ; # c h e c k s t h e c o n f i d e n t i a l i t y o f t h e data } role R { f r e s h r : Nonce ; var s : Nonce ; r e c v 1 ( I , R, h ( k ( I ,R ) ) ) ; s e n d 2 (R, I , r , h ( k ( I ,R) , r ) ) ; r e c v 3 ( I , R, s , h ( k ( I ,R) , r , s ) ) ; c l a i m r 1 (R, N i a g r e e ) ; # c h e c k s t h e i n t e g r i t y o f t h e t r a n s m i t t e d data c l a i m r 2 (R, S e c r e t , k ( I ,R ) ) ; # c h e c k s t h e c o n f i d e n t i a l i t y o f t h e data } }

22

Loading...

An Efficient Scheme for Secure Deduplication with Privacy-Preserving

Solving the Secure Storage Dilemma: An Efficient Scheme for Secure Deduplication with Privacy-Preserving Public Auditing S¨ uleyman Karda¸s∗ and Mehme...

411KB Sizes 2 Downloads 12 Views

Recommend Documents

Secure and Efficient Log Management with Quest - LOMAG-MAN .Org
The information in this document is provided in connection with Quest products. No license, express or implied, by ... c

An Efficient Mobile Voting System Security Scheme Based on Elliptic
An Efficient Mobile Voting System Security Scheme Based on Elliptic Curve Cryptography. Abstract: The wide-spread use of

Platform for efficient and secure data collection and exploitation in
Oct 21, 2016 - based routing protocol (Secure Greedy Traffic Aware Routing Protocol (S-GyTAR)) which takes into .... sin

A Dual Key-Based Activation Scheme for Secure LoRaWAN - Hindawi
Oct 12, 2017 - (ii)Gateway: a LoRaWAN gateway receives packets from the end node via a LoRa radio link. ..... (ii)Gatewa

Global Deduplication with Veeam Backup&Replication - StarWind
Global Deduplication with Veeam Backup&Replication. Page | 2. TRADEMARKS. “StarWind”, “StarWind Software”, and t

BioTechniques - An electrophoretic capture method for efficient
(PCR Control, 1250 GE-purified human DNA spike with no stool sample added.) GE, genomic equivalents. We have shown in th

A new forward secure digital signature scheme
Destructive and ginger ferret rectifies its complement fatidically kaolinize precipitator. winterier and particularize t

Characteristics of an efficient market
Keywords: efficient market hypothesis, random walk model, information efficiency. ... command economy, A competitive mar

Frontiers | Development of an Efficient Bacterial Consortium for the
Kaustuvmani Patowary1, Rupshikha Patowary1, Mohan C. Kalita2 and Suresh Deka1* .... Moreover, hydrocarbon-degrading bact

PcapWT: An Efficient Packet Extraction Tool for Large - HAL-Inria
Jan 29, 2014 - select queries, thus we can use the RG or RRR representation to handle the bit sequences. The wavelet tre