An Improved Utility Driven Approach Towards K-Anonymity Using Data Constraint Rules
dc.contributor.advisor | Mahoui, Malika | |
dc.contributor.author | Morton, Stuart Michael | |
dc.contributor.other | Palakal, Mathew J. | |
dc.contributor.other | Gibson, P. Joseph | |
dc.contributor.other | Kharrazi, Hadi | |
dc.date.accessioned | 2013-08-14T16:20:50Z | |
dc.date.available | 2013-08-14T16:20:50Z | |
dc.date.issued | 2013-08-14 | |
dc.degree.date | 2012 | en_US |
dc.degree.discipline | School of Informatics and Computing | en_US |
dc.degree.grantor | Indiana University | en_US |
dc.degree.level | Ph.D. | en_US |
dc.description | Indiana University-Purdue University Indianapolis (IUPUI) | en_US |
dc.description.abstract | As medical data continues to transition to electronic formats, opportunities arise for researchers to use this microdata to discover patterns and increase knowledge that can improve patient care. Now more than ever, it is critical to protect the identities of the patients contained in these databases. Even after removing obvious “identifier” attributes that clearly identify a specific person, such as social security numbers or first and last names, it is possible to join “quasi-identifier” attributes from two or more publicly available databases to identify individuals. K-anonymity is an approach that has been used to ensure that no one individual can be distinguished within a group of at least k individuals. However, most proposed approaches to k-anonymity have focused on improving the efficiency of the anonymization algorithms; less emphasis has been placed on ensuring the “utility” of the anonymized data from a researcher’s perspective. We propose a new data utility measurement, called the research value (RV), which extends existing utility measurements by employing data constraint rules that are designed to improve the effectiveness of queries against the anonymized data. To anonymize a given raw dataset, two algorithms are proposed that use predefined generalizations provided by the data content expert, together with their corresponding research values, to assess an attribute’s data utility as they generalize the data to ensure k-anonymity. In addition, an automated algorithm is presented that uses clustering and the RV to anonymize the dataset. All of the proposed algorithms scale efficiently when the number of attributes in a dataset is large. | en_US |
dc.identifier.uri | https://hdl.handle.net/1805/3427 | |
dc.identifier.uri | http://dx.doi.org/10.7912/C2/924 | |
dc.language.iso | en_US | en_US |
dc.subject | Data Privacy | en_US |
dc.subject | Utility | en_US |
dc.subject | K-Anonymity | en_US |
dc.subject.lcsh | Electronic records -- Access control | en_US |
dc.subject.lcsh | Privacy, Right of | en_US |
dc.subject.lcsh | Public records -- Access control | en_US |
dc.subject.lcsh | Utility theory -- Mathematical models | en_US |
dc.subject.lcsh | Attribute focusing (Data mining) | en_US |
dc.subject.lcsh | Data protection -- Research | en_US |
dc.subject.lcsh | Cluster analysis -- Data processing | en_US |
dc.subject.lcsh | Database security | en_US |
dc.title | An Improved Utility Driven Approach Towards K-Anonymity Using Data Constraint Rules | en_US |
dc.type | Thesis | en |
Files
Original bundle
- Name: Final_version3.pdf
- Size: 942.46 KB
- Format: Adobe Portable Document Format
- Description: Correct table of contents version
License bundle
- Name: license.txt
- Size: 1.88 KB
- Format: Item-specific license agreed upon to submission