Department of Computer and Information Science Works
Recent Submissions
Unsupervised Deep Learning for an Image Based Network Intrusion Detection System (IEEE, 2023-12)
Hosler, Ryan; Sundar, Agnideven; Zou, Xukai; Li, Feng; Gao, Tianchong; Computer and Information Science, Purdue School of Science
The most cost-effective method of cybersecurity is prevention. Therefore, organizations and individuals utilize Network Intrusion Detection Systems (NIDS) to inspect network flow for potential intrusions. However, Deep Learning based NIDS still struggle with high false alarm rates and with detecting novel and unseen attacks. Therefore, in this paper, we propose a novel NIDS framework based on generating images from feature vectors and applying Unsupervised Deep Learning. For evaluation, we apply this method to four publicly available datasets and demonstrate an accuracy improvement of up to 8.25% when compared to Deep Learning models applied to the original feature vectors.

A neural network approach to multi-biomarker panel discovery by high-throughput plasma proteomics profiling of breast cancer (Springer Nature, 2013)
Zhang, Fan; Chen, Jake; Wang, Mu; Drabier, Renee; Computer and Information Science, Purdue School of Science
Background: In the past several years, there has been increasing interest and enthusiasm in molecular biomarkers as tools for early detection of cancer. The liquid chromatography tandem mass spectrometry (LC/MS/MS) based plasma proteomics profiling technique is a promising technology platform to study candidate protein biomarkers for early detection of cancer. Factors such as inherent variability, protein detectability limitation, and peptide discovery biases among LC/MS/MS platforms have made the classification and prediction of proteomics profiles challenging.
Developing proteomics data analysis methods to identify multi-protein biomarker panels for breast cancer diagnosis based on neural networks provides hope for improving both the sensitivity and the specificity of candidate cancer biomarkers for early detection. Results: In our previous method, we developed a Feed Forward Neural Network (FFNN) based method to build the classifier for plasma samples of breast cancer and then applied the classifier to predict a blind dataset of breast cancer. However, the optimal combination C* in our previous method was actually determined by applying the trained FFNN on the testing set with the combination. Therefore, in this paper, we applied a three-way data split to the Feed Forward Neural Network for training, validation, and testing. We found that the prediction performance of the FFNN model based on the three-way data split outperforms our previous method, and the prediction performance is improved from (AUC = 0.8706, precision = 82.5%, accuracy = 82.5%, sensitivity = 82.5%, specificity = 82.5% for the testing set) to (AUC = 0.895, precision = 86.84%, accuracy = 85%, sensitivity = 82.5%, specificity = 87.5% for the testing set). Conclusions: Further pathway analysis showed that the top three five-marker panels are associated with complement and coagulation cascades, signaling, activation, and hemostasis, which are consistent with previous findings. We believe the new approach is a better solution for multi-biomarker panel discovery and it can be applied to other clinical proteomics.

Hyper-structure mining of frequent patterns in uncertain data streams (Springer Nature, 2013)
HewaNadungodage, Chandima; Xia, Yuni; Lee, Jaehwan John; Tu, Yi-Cheng; Computer and Information Science, Purdue School of Science
Data uncertainty is inherent in many real-world applications such as sensor monitoring systems, location-based services, and medical diagnostic systems.
Moreover, many real-world applications are now capable of producing continuous, unbounded data streams. In recent years, new methods have been developed to find frequent patterns in uncertain databases; nevertheless, very limited work has been done in discovering frequent patterns in uncertain data streams. The current solutions for frequent pattern mining in uncertain streams take an FP-tree-based approach; however, recent studies have shown that FP-tree-based algorithms do not perform well in the presence of data uncertainty. In this paper, we propose two hyper-structure-based false-positive-oriented algorithms to efficiently mine frequent itemsets from streams of uncertain data. The first algorithm, UHS-Stream, is designed to find all frequent itemsets up to the current moment. The second algorithm, TFUHS-Stream, is designed to find frequent itemsets in an uncertain data stream in a time-fading manner. Experimental results show that the proposed hyper-structure-based algorithms outperform the existing tree-based algorithms in terms of accuracy, runtime, and memory usage.

Proof of User Similarity: The Spatial Measurer of Blockchain (IEEE, 2024-05)
Wang, Shengling; Shi, Lina; Shi, Hongwei; Zhang, Yifang; Hu, Qin; Cheng, Xiuzhen; Computer and Information Science, Purdue School of Science
Although proof of work (PoW) consensus dominates most current blockchain-based systems, it has always been criticized for its uneconomic brute-force calculation. As alternatives, energy-conservation and energy-recycling mechanisms have emerged. In this article, we propose proof of user similarity (PoUS), a distinct energy-recycling consensus mechanism, harnessing the valuable computing power to calculate the similarities of users and enacting the calculation results into the packing rule. However, the expensive calculation required in PoUS makes it challenging for miners to participate and may induce plagiarism and lying risks.
To resolve these issues, PoUS embraces the best-effort schema by allowing miners to compute partially. Besides, a voting mechanism based on secure two-party computation and Bayesian truth serum is proposed to guarantee privacy-preserving voting and truthful reports. Notably, PoUS distinguishes itself in recycling the computing power back to blockchain, since it redirects otherwise wasted resources to facilitate refined cohort analysis of users, serving as the spatial measurer and enabling a searchable blockchain. We build a prototype of PoUS and compare its performance with PoW. The results show that PoUS outperforms PoW, achieving an average transactions per second (TPS) improvement of 24.01% and an average confirmation latency reduction of 43.64%. Besides, PoUS functions well in mirroring the spatial information of users, with negligible computation time and communication cost.

Resource Optimization for Blockchain-Based Federated Learning in Mobile Edge Computing (IEEE, 2024-05)
Wang, Zhilin; Hu, Qin; Xiong, Zehui; Computer and Information Science, Purdue School of Science
With the boom of mobile edge computing (MEC) and blockchain-based federated learning (BCFL), more studies suggest deploying BCFL on edge servers. In this case, edge servers with restricted resources face the dilemma of serving both mobile devices for their offloading tasks and the BCFL system for model training and blockchain consensus without sacrificing the service quality for either side. To address this challenge, this article proposes a resource allocation scheme for edge servers to provide optimal services at the minimum cost. Specifically, we first analyze the energy consumption of the MEC and BCFL tasks, considering the completion time of each task as the service quality constraint. Then, we model the resource allocation challenge as a multivariate, multiconstraint, and convex optimization problem.
While solving the problem in a progressive manner, we design two algorithms based on the alternating direction method of multipliers (ADMM) in both homogeneous and heterogeneous situations, where equal and on-demand resource distribution strategies are adopted, respectively. The validity of our proposed algorithms is proved via rigorous theoretical analysis. Moreover, the convergence and efficiency of our proposed resource allocation schemes are evaluated through extensive experiments.

Online Learning for Failure-Aware Edge Backup of Service Function Chains With the Minimum Latency (IEEE, 2023-12)
Wang, Chen; Hu, Qin; Yu, Dongxiao; Cheng, Xiuzhen; Computer and Information Science, Purdue School of Science
Virtual network functions (VNFs) have been widely deployed in mobile edge computing (MEC) to flexibly and efficiently serve end users running resource-intensive applications, which can be further serialized to form service function chains (SFCs), providing customized networking services. To ensure the availability of SFCs, it is effective to place redundant SFC backups at the edge for quickly recovering from any failures. The existing research largely overlooks the influences of SFC popularity, backup completeness, and failure rate on the optimal deployment of SFC backups on edge servers. In this paper, we comprehensively consider the perspectives of both the end users and the edge system to back up SFCs for providing popular services with the lowest latency. To overcome the challenges resulting from unknown SFC popularity and failure rate, as well as the known system parameter constraints, we take advantage of the online bandit learning technique to cope with the uncertainty issue. Combining the Prim-inspired method with the greedy strategy, we propose a Real-Time Selection and Deployment (RTSD) algorithm.
Extensive simulation experiments are conducted to demonstrate the superiority of our proposed algorithms.

Incentive Mechanism Design for Joint Resource Allocation in Blockchain-Based Federated Learning (IEEE, 2023-05)
Wang, Zhilin; Hu, Qin; Li, Ruinian; Xu, Minghui; Xiong, Zehui; Computer and Information Science, Purdue School of Science
Blockchain-based federated learning (BCFL) has recently gained tremendous attention because of its advantages, such as decentralization and privacy protection of raw data. However, there have been few studies focusing on the allocation of resources for the participating devices (i.e., clients) in the BCFL system. In particular, in the BCFL framework where the FL clients are also the blockchain miners, clients have to train the local models, broadcast the trained model updates to the blockchain network, and then perform mining to generate new blocks. Since each client has a limited amount of computing resources, the problem of allocating computing resources to training and mining needs to be carefully addressed. In this paper, we design an incentive mechanism to help the model owner (MO) (i.e., the BCFL task publisher) assign each client appropriate rewards for training and mining, and then the client will determine the amount of computing power to allocate for each subtask based on these rewards using the two-stage Stackelberg game. After analyzing the utilities of the MO and clients, we transform the game model into two optimization problems, which are sequentially solved to derive the optimal strategies for both the MO and clients. Further, considering the fact that local training-related information of each client may not be known by others, we extend the game model with analytical solutions to the incomplete information scenario.
Extensive experimental results demonstrate the validity of our proposed schemes.

An Uncertainty- and Collusion-Proof Voting Consensus Mechanism in Blockchain (IEEE, 2023-10)
Wang, Shengling; Qu, Xidi; Hu, Qin; Wang, Xia; Cheng, Xiuzhen; Computer and Information Science, Purdue School of Science
Though voting-based consensus algorithms in blockchain outperform proof-based ones in energy- and transaction-efficiency, they are prone to incur wrong elections and bribery elections. The former originates from the uncertainties of candidates’ capability and availability, and the latter comes from the egoism of voters and candidates. Hence, in this paper, we propose an uncertainty- and collusion-proof voting consensus mechanism, including the selection pressure-based voting algorithm and the trustworthiness evaluation algorithm. The first algorithm can decrease the side effects of candidates’ uncertainties, lowering wrong elections while trading off the balance between efficiency and fairness in voting miners. The second algorithm adopts an incentive-compatible scoring rule to evaluate the trustworthiness of voting, motivating voters to report true beliefs on candidates by making egoism consistent with altruism so as to avoid bribery elections. A salient feature of our work is theoretically analyzing the proposed voting consensus mechanism by the large deviation theory. Our analysis provides not only the voting failure rate of a candidate but also its decay speed.
The voting failure rate measures the incompetence of any candidate from a personal perspective by voting, based on which the concepts of the effective selection valve and the effective expectation of merit are introduced to help the system designer determine the optimal voting standard and guide a candidate to behave in an optimal way for lowering the voting failure rate.

Online-Learning-Based Fast-Convergent and Energy-Efficient Device Selection in Federated Edge Learning (IEEE, 2023-03)
Peng, Cheng; Hu, Qin; Wang, Zhilin; Liu, Ryan Wen; Xiong, Zehui; Computer and Information Science, Purdue School of Science
As edge computing faces increasingly severe data security and privacy issues of edge devices, a framework called federated edge learning (FEL) has recently been proposed to enable machine learning (ML) model training at the edge, ensuring communication efficiency and data privacy protection for edge devices. In this paradigm, the training efficiency has long been challenged by the heterogeneity of communication conditions, computing capabilities, and available data sets at devices. Currently, researchers focus on solving this challenge via device selection from the perspective of optimizing energy consumption or convergence speed. However, the consideration of any one of them is insufficient to guarantee the long-term system efficiency and stability. To fill the gap, we propose an optimization problem to simultaneously minimize the total energy consumption of selected devices and maximize the convergence speed of the global model for device selection in FEL, under the constraints of training data amount and time consumption. For the accurate calculation of energy consumption, we deploy online bandit learning to estimate the CPU-cycle frequency availability of each device, based on which an efficient algorithm, named fast-convergent energy-efficient device selection (FCE2DS), is proposed to solve the optimization problem with a low level of time complexity.
Through a series of comparative experiments, we evaluate the performance of the proposed FCE2DS scheme, verifying its high training accuracy and energy efficiency.

Alliance Makes Difference? Maximizing Social Welfare in Cross-Silo Federated Learning (IEEE, 2024-02)
Chen, Jianan; Hu, Qin; Jiang, Honglu; Computer and Information Science, Purdue School of Science
As one of the typical settings of Federated Learning (FL), cross-silo FL allows organizations to jointly train an optimal Machine Learning (ML) model. In this case, some organizations may try to obtain the global model without contributing their local training power, lowering the social welfare. In this article, we model the interactions among organizations in cross-silo FL as a public goods game and theoretically prove that there exists a social dilemma where the maximum social welfare is not achieved in Nash equilibrium. To overcome this dilemma, we employ the Multi-player Multi-action Zero-Determinant (MMZD) strategy to maximize the social welfare. With the help of the MMZD, an individual organization can unilaterally control the social welfare without extra cost. Since the MMZD strategy can be adopted by all organizations, we further study the case of multiple organizations jointly adopting the MMZD strategy to form an MMZD Alliance (MMZDA). We prove that the MMZDA strategy can strengthen the control of the maximum social welfare. Experimental results validate that the MMZD strategy is effective in obtaining the maximum social welfare and the MMZDA can achieve a larger maximum value.
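
The social dilemma mentioned in the cross-silo federated learning abstract above can be illustrated with a generic linear public goods game. The sketch below is not the paper's model: the payoff rule, the multiplication factor `r`, the unit contribution cost, and all function names are assumptions chosen only for illustration.

```python
"""Toy linear public goods game (illustrative assumption, not the paper's model)."""


def payoff(contributes: bool, others_contributing: int, n: int,
           r: float, cost: float = 1.0) -> float:
    """Payoff of one organization: equal share of the multiplied pot minus its own cost."""
    contributors = others_contributing + (1 if contributes else 0)
    share = r * cost * contributors / n
    return share - (cost if contributes else 0.0)


def social_welfare(contributors: int, n: int, r: float, cost: float = 1.0) -> float:
    """Sum of all payoffs when `contributors` of the n organizations contribute."""
    paying = 0.0
    if contributors:
        paying = contributors * payoff(True, contributors - 1, n, r, cost)
    free_riding = (n - contributors) * payoff(False, contributors, n, r, cost)
    return paying + free_riding


def free_riding_is_dominant(n: int, r: float, cost: float = 1.0) -> bool:
    """True if withholding pays strictly more, whatever the other organizations do."""
    return all(
        payoff(False, k, n, r, cost) > payoff(True, k, n, r, cost)
        for k in range(n)
    )
```

Under these assumed parameters, with four organizations and `r = 2`, withholding is the dominant choice, so the Nash equilibrium has no contributors and a social welfare of 0, while full contribution would yield a social welfare of 4. That gap is the kind of dilemma the abstract refers to.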