An Improved Utility Driven Approach Towards K-Anonymity Using Data Constraint Rules

Morton, Stuart Michael

Files

Final_version3.pdf (942.46 KB)

Date

2013-08-14

Authors

Morton, Stuart Michael

Language

American English

Committee Chair

Mahoui, Malika

Committee Members

Palakal, Mathew J.
Gibson, P. Joseph
Kharrazi, Hadi

Degree

Ph.D.

Degree Year

2012

Department

School of Informatics and Computing

Grantor

Indiana University

Abstract

As medical data continues to transition to electronic formats, opportunities arise for researchers to use this microdata to discover patterns and increase knowledge that can improve patient care. Now more than ever, it is critical to protect the identities of the patients contained in these databases. Even after removing obvious “identifier” attributes that clearly identify a specific person, such as social security numbers or first and last names, it is possible to join “quasi-identifier” attributes from two or more publicly available databases to identify individuals. K-anonymity is an approach that has been used to ensure that no individual can be distinguished within a group of at least k individuals. However, most proposed approaches to k-anonymity have focused on improving the efficiency of the anonymization algorithms; less emphasis has been placed on ensuring the “utility” of anonymized data from a researcher’s perspective. We propose a new data utility measurement, called the research value (RV), which extends existing utility measurements by employing data constraint rules that are designed to improve the effectiveness of queries against the anonymized data. To anonymize a given raw dataset, two algorithms are proposed that use predefined generalizations provided by the data content expert and their corresponding research values to assess an attribute’s data utility as they generalize the data to ensure k-anonymity. In addition, an automated algorithm is presented that uses clustering and the RV to anonymize the dataset. All of the proposed algorithms scale efficiently when the number of attributes in a dataset is large.
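
The k-anonymity property described in the abstract can be illustrated with a minimal sketch. It checks only whether every combination of quasi-identifier values is shared by at least k records; it does not implement the research value (RV) or any of the thesis's algorithms. The function name `is_k_anonymous`, the attribute names, and the sample records are hypothetical and chosen purely for illustration.

```python
from collections import Counter


def is_k_anonymous(records, quasi_identifiers, k):
    """Return True if every combination of quasi-identifier values
    occurs in at least k records (hypothetical helper, not from the thesis)."""
    # Count how many records share each quasi-identifier combination.
    groups = Counter(
        tuple(record[attr] for attr in quasi_identifiers)
        for record in records
    )
    # The dataset is k-anonymous when no group is smaller than k.
    return all(count >= k for count in groups.values())


# Hypothetical, already-generalized records (illustrative only).
records = [
    {"age": "30-39", "zip": "462**", "diagnosis": "asthma"},
    {"age": "30-39", "zip": "462**", "diagnosis": "diabetes"},
    {"age": "40-49", "zip": "462**", "diagnosis": "asthma"},
    {"age": "40-49", "zip": "462**", "diagnosis": "flu"},
]

two_anonymous = is_k_anonymous(records, ["age", "zip"], 2)
three_anonymous = is_k_anonymous(records, ["age", "zip"], 3)
```

In this sample, each (age, zip) combination is shared by two records, so the check passes for k = 2 and fails for k = 3. Generalizing values further (for example, widening the age ranges) would raise the achievable k at the cost of the data utility that the abstract is concerned with.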

Description

Indiana University-Purdue University Indianapolis (IUPUI)

Keywords

Data Privacy, Utility, K-Anonymity

LC Subjects

Electronic records -- Access control, Privacy, Right of, Public records -- Access control, Utility theory -- Mathematical models, Attribute focusing (Data mining), Data protection -- Research, Cluster analysis -- Data processing, Database security

Type

Thesis

Permanent Link

https://hdl.handle.net/1805/3427
http://dx.doi.org/10.7912/C2/924

Collections

Informatics Graduate Theses and PhD Dissertations
Informatics School Theses and Dissertations