An Improved Utility Driven Approach Towards K-Anonymity Using Data Constraint Rules

Field | Value | Language
dc.contributor.advisor | Mahoui, Malika |
dc.contributor.author | Morton, Stuart Michael |
dc.contributor.other | Palakal, Mathew J. |
dc.contributor.other | Gibson, P. Joseph |
dc.contributor.other | Kharrazi, Hadi |
dc.date.accessioned | 2013-08-14T16:20:50Z |
dc.date.available | 2013-08-14T16:20:50Z |
dc.date.issued | 2013-08-14 |
dc.degree.date | 2012 | en_US
dc.degree.discipline | School of Informatics and Computing | en_US
dc.degree.grantor | Indiana University | en_US
dc.degree.level | Ph.D. | en_US
dc.description | Indiana University-Purdue University Indianapolis (IUPUI) | en_US
dc.description.abstract | As medical data continues to transition to electronic formats, researchers gain opportunities to use this microdata to discover patterns and build knowledge that can improve patient care. It is therefore more critical than ever to protect the identities of the patients contained in these databases. Even after removing obvious “identifier” attributes that clearly identify a specific person, such as Social Security numbers or first and last names, it is possible to join “quasi-identifier” attributes from two or more publicly available databases to re-identify individuals. K-anonymity is an approach used to ensure that no individual can be distinguished from the others within a group of at least k individuals sharing the same quasi-identifier values. However, most proposed k-anonymity approaches have focused on improving algorithmic efficiency; less emphasis has been placed on ensuring the “utility” of the anonymized data from a researcher’s perspective. We propose a new data utility measurement, called the research value (RV), which extends existing utility measurements by employing data constraint rules designed to improve the effectiveness of queries against the anonymized data. To anonymize a given raw dataset, two algorithms are proposed that use predefined generalizations provided by the data content expert, together with their corresponding research values, to assess each attribute’s data utility as the algorithms generalize the data to ensure k-anonymity. In addition, an automated algorithm is presented that uses clustering and the RV to anonymize the dataset. All of the proposed algorithms scale efficiently when the number of attributes in a dataset is large. | en_US
dc.identifier.uri | https://hdl.handle.net/1805/3427 |
dc.identifier.uri | http://dx.doi.org/10.7912/C2/924 |
dc.language.iso | en_US | en_US
dc.subject | Data Privacy | en_US
dc.subject | Utility | en_US
dc.subject | K-Anonymity | en_US
dc.subject.lcsh | Electronic records -- Access control | en_US
dc.subject.lcsh | Privacy, Right of | en_US
dc.subject.lcsh | Public records -- Access control | en_US
dc.subject.lcsh | Utility theory -- Mathematical models | en_US
dc.subject.lcsh | Attribute focusing (Data mining) | en_US
dc.subject.lcsh | Data protection -- Research | en_US
dc.subject.lcsh | Cluster analysis -- Data processing | en_US
dc.subject.lcsh | Database security | en_US
dc.title | An Improved Utility Driven Approach Towards K-Anonymity Using Data Constraint Rules | en_US
dc.type | Thesis | en
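The abstract defines k-anonymity as every individual being indistinguishable within a group of at least k individuals. That property can be checked mechanically: group the records by their quasi-identifier values and confirm that every group has at least k members. The following is a minimal sketch under stated assumptions. The toy records, the choice of age and ZIP code as quasi-identifiers, and the generalization functions `generalize_age` and `generalize_zip` (10-year age bands, ZIP truncated to three digits) are hypothetical and chosen for illustration only. This is not the thesis's RV-based algorithm, whose details are not given in this record.

```python
from collections import Counter

def is_k_anonymous(records, quasi_identifiers, k):
    """Return True if every combination of quasi-identifier values occurs at least k times."""
    counts = Counter(tuple(r[a] for a in quasi_identifiers) for r in records)
    return all(c >= k for c in counts.values())

# Hypothetical generalizations (illustration only, not from the thesis).
def generalize_age(age):
    """Map an exact age to a 10-year band, e.g. 34 -> '30-39'."""
    low = (age // 10) * 10
    return f"{low}-{low + 9}"

def generalize_zip(zip_code):
    """Keep the first three ZIP digits and mask the rest, e.g. '46202' -> '462**'."""
    return zip_code[:3] + "**"

def generalize(records):
    """Apply the hypothetical generalizations to the quasi-identifier attributes."""
    return [
        {**r, "age": generalize_age(r["age"]), "zip": generalize_zip(r["zip"])}
        for r in records
    ]

# Toy microdata: "diagnosis" is the sensitive attribute; age and zip are quasi-identifiers.
raw = [
    {"age": 34, "zip": "46202", "diagnosis": "asthma"},
    {"age": 37, "zip": "46208", "diagnosis": "diabetes"},
    {"age": 52, "zip": "46220", "diagnosis": "flu"},
    {"age": 58, "zip": "46227", "diagnosis": "asthma"},
]
qi = ["age", "zip"]

print(is_k_anonymous(raw, qi, k=2))              # False: every raw record is unique
print(is_k_anonymous(generalize(raw), qi, k=2))  # True: two groups of two records each
```

Generalizing more aggressively (for example, suppressing the ZIP code entirely) would also satisfy k = 2 but would discard more information. Guiding that privacy-versus-information trade-off is the role the abstract assigns to a utility measure such as the RV.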
Files
Original bundle
Name: Final_version3.pdf
Size: 942.46 KB
Format: Adobe Portable Document Format
Description: Correct table of contents version
License bundle
Name: license.txt
Size: 1.88 KB
Format: Item-specific license agreed upon to submission