Utilizing Transfer Learning and Multi-Task Learning for Evaluating the Prediction of Chromatin Accessibility in Cancer and Neuron Cell Lines Using Genomic Sequences

Date
2023-08
Language
American English
Degree
M.S.E.C.E.
Degree Year
2023
Department
Electrical & Computer Engineering
Grantor
Purdue University
Abstract

Predicting chromatin accessibility in cancer and neuron cell lines from genomic sequences is challenging, and advances in machine learning and deep learning make it possible to address that challenge. This thesis investigates transfer learning and multi-task learning, and demonstrates their potential to improve prediction accuracy for twenty-three human cancer cell line types and fourteen neuron cell line types. Three network architectures are used: the Basset network, the DanQ network, and the DeepSEA network. Two transfer learning techniques are also used. In the first technique, data relevant to the desired prediction task is not used during the pre-training stage; in the second technique, limited data about the desired prediction task is included in the pre-training stage. Because negative samples greatly outnumber positive ones, the preferred performance metric for evaluating the models was the area under the precision-recall curve (AUPRC).

For the DeepSEA network, predictions across all twenty-three cancer cell line types showed an average improvement of 4% with the first technique, a decrease of 0.42% with the second technique, and an increase of 0.40% with multi-task learning. For the fourteen neuron cell line types, DeepSEA showed average improvements of 3.09% with the first technique, 1.16% with the second technique, and 4.60% with multi-task learning. For the DanQ network on the cancer cell line types, the first technique gave an average improvement of 1.18%, the second technique an average decrease of 1.93%, and multi-task learning a decrease of 0.90%. On the neuron cell line types, DanQ showed average improvements of 1.56% with the first technique, 3.21% with the second technique, and 5.35% with multi-task learning. For the Basset network on the cancer cell line types, the first technique gave an average improvement of 2.93%, while the second technique and multi-task learning gave average decreases of 0.02% and 0.63%, respectively. On the neuron cell line types, Basset showed average increases of 2.47%, 3.80%, and 5.50% for the first technique, the second technique, and multi-task learning, respectively.

The results show that the best technique for predicting the cancer cell lines is the first transfer learning technique, as it improved all three networks, while the best technique for predicting chromatin accessibility in the neuron cell lines is multi-task learning, which showed the highest average improvement for every network. The DeepSEA network showed the greatest improvement in performance among all techniques when predicting the different cancer cell line types. It also showed the greatest improvement when using the first transfer learning technique to predict chromatin accessibility for neuron cell lines in the brain. The Basset network showed the greatest improvement for multi-task learning and for the second transfer learning technique when predicting accessibility for neuron cell lines.
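
To illustrate why AUPRC can be a more informative metric than AUROC when negative samples dominate, the following is a minimal sketch in plain Python. It is not the thesis's evaluation code: the function names `average_precision` and `auroc`, the toy labels, and the toy scores are all assumptions made for illustration, and AUPRC is approximated here by the step-wise average precision.

```python
def average_precision(labels, scores):
    """Step-wise area under the precision-recall curve (average precision).

    Hypothetical helper for illustration; assumes binary labels (0/1)
    and no tied scores.
    """
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("need at least one positive label")
    # Rank samples from highest to lowest predicted score.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    true_pos = 0
    ap = 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            true_pos += 1
            # Precision at the rank where recall increases.
            ap += true_pos / rank
    return ap / total_pos


def auroc(labels, scores):
    """Area under the ROC curve as the fraction of correctly ordered
    (positive, negative) pairs; ties count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy, made-up example: 2 accessible (positive) regions among 10.
toy_labels = [1, 0, 0, 0, 1, 0, 0, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]

print(average_precision(toy_labels, toy_scores))  # about 0.7
print(auroc(toy_labels, toy_scores))              # 0.8125
```

In this toy case the second positive is ranked below three negatives. AUROC still looks fairly high because it averages over the many negative samples, whereas average precision drops more sharply because it only measures how clean the top of the ranking is.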

Description
IUPUI
Type
Thesis