An Improved Utility Driven Approach Towards K-Anonymity Using Data Constraint Rules

Morton, Stuart Michael

dc.contributor.advisor	Mahoui, Malika
dc.contributor.author	Morton, Stuart Michael
dc.contributor.other	Palakal, Mathew J.
dc.contributor.other	Gibson, P. Joseph
dc.contributor.other	Kharrazi, Hadi
dc.date.accessioned	2013-08-14T16:20:50Z
dc.date.available	2013-08-14T16:20:50Z
dc.date.issued	2013-08-14
dc.degree.date	2012	en_US
dc.degree.discipline	School of Informatics and Computing	en_US
dc.degree.grantor	Indiana University	en_US
dc.degree.level	Ph.D.	en_US
dc.description	Indiana University-Purdue University Indianapolis (IUPUI)	en_US
dc.description.abstract	As medical data continues to transition to electronic formats, opportunities arise for researchers to use this microdata to discover patterns and increase knowledge that can improve patient care. Now more than ever, it is critical to protect the identities of the patients contained in these databases. Even after removing obvious “identifier” attributes, such as social security numbers or first and last names, that clearly identify a specific person, it is possible to join “quasi-identifier” attributes from two or more publicly available databases to identify individuals. K-anonymity is an approach that has been used to ensure that no one individual can be distinguished within a group of at least k individuals. However, most proposed approaches to k-anonymity have focused on improving the efficiency of the algorithms that implement it; less emphasis has been placed on ensuring the “utility” of anonymized data from a researcher’s perspective. We propose a new data utility measurement, called the research value (RV), which extends existing utility measurements by employing data constraint rules that are designed to improve the effectiveness of queries against the anonymized data. To anonymize a given raw dataset, two algorithms are proposed that use predefined generalizations provided by the data content expert and their corresponding research values to assess an attribute’s data utility while generalizing the data to ensure k-anonymity. In addition, an automated algorithm is presented that uses clustering and the RV to anonymize the dataset. All of the proposed algorithms scale efficiently when the number of attributes in a dataset is large.	en_US
dc.identifier.uri	https://hdl.handle.net/1805/3427
dc.identifier.uri	http://dx.doi.org/10.7912/C2/924
dc.language.iso	en_US	en_US
dc.subject	Data Privacy	en_US
dc.subject	Utility	en_US
dc.subject	K-Anonymity	en_US
dc.subject.lcsh	Electronic records -- Access control	en_US
dc.subject.lcsh	Privacy, Right of	en_US
dc.subject.lcsh	Public records -- Access control	en_US
dc.subject.lcsh	Utility theory -- Mathematical models	en_US
dc.subject.lcsh	Attribute focusing (Data mining)	en_US
dc.subject.lcsh	Data protection -- Research	en_US
dc.subject.lcsh	Cluster analysis -- Data processing	en_US
dc.subject.lcsh	Database security	en_US
dc.title	An Improved Utility Driven Approach Towards K-Anonymity Using Data Constraint Rules	en_US
dc.type	Thesis	en

Files

Original bundle

Name: Final_version3.pdf
Size: 942.46 KB
Format: Adobe Portable Document Format
Description: Correct table of contents version

Collections

Informatics Graduate Theses and PhD Dissertations
Informatics School Theses and Dissertations