Browsing by Author "Mukhopadhyay, Snehasis"
Now showing 1 - 10 of 46

Automatic Detection of Associations Among Terms Related to Alzheimer's Disease from Medline Abstracts
Lai, Dongbing; Mukhopadhyay, Snehasis
Alzheimer's disease is a progressive, age-related, degenerative brain disorder and one of the most serious diseases in old people. Patients gradually lose their memory, and their personality and behavior change; furthermore, this process is irreversible until the patients die [1]. Alzheimer's disease first attacks the entorhinal cortex, then spreads to the hippocampus, which helps to control short-term memory, and then to other regions, especially the cerebral cortex, which is very important in using language and reasoning [1, 2]. After the attack, the neurons degenerate, lose synapses, and eventually die [1, 2]. According to the age of onset, Alzheimer's disease can be divided into early-onset (usually at age 30 to 60) and late-onset (at age 65 or older) [1]. About 5% to 10% of Alzheimer's disease cases are early-onset [1]. Another way to describe Alzheimer's disease is according to the inheritance pattern: sporadic Alzheimer's disease, which has no certain inheritance pattern, and familial Alzheimer's disease (FAD), which has a certain inheritance pattern [1]. All FAD cases are early-onset [1]. Alzheimer's disease is a progressive disease, and the progression of symptoms can be divided into mild, moderate, and severe phases [2, 3]. The symptoms of mild Alzheimer's disease include loss of memory, disorientation, and difficulty performing routine tasks [2]. Patients in this phase can live independently [3]. The moderate symptoms include great difficulty in daily living, wandering, personality changes, agitation, and anxiety [2]. Patients in this phase need to be cared for by other people.
People in the severe phase lose all communication functions, can hardly think, and need total care [2].

Automatic Extraction of Computer Science Concept Phrases Using a Hybrid Machine Learning Paradigm
(2023-05) Jahin, S M Abrar; Al Hasan, Mohammad; Fang, Shiaofen; Mukhopadhyay, Snehasis
With the proliferation of computer science in modern society in recent years, the number of computer science-related jobs is expanding quickly. Software engineer has been chosen as the best job for 2023 based on pay, stress level, opportunity for professional growth, and balance between work and personal life. This was decided by rankings from various news outlets, journals, and publications. Computer science occupations are anticipated to be in high demand not just in 2023, but also for the foreseeable future. It's not surprising that the number of computer science students at universities is growing and will continue to grow. The enormous increase in student enrolment in many subdisciplines of computing has presented some distinct issues. If computer science is to be incorporated into the K-12 curriculum, it is vital that K-12 educators are competent. But one of the biggest problems with this plan is that there aren't enough trained computer science professors. Numerous new fields and applications, for instance, are being introduced to computer science. In addition, it is difficult for schools to recruit skilled computer science instructors for a variety of reasons, including low salaries. Utilizing the K-12 teachers who are already in the schools, have a love for teaching, and consider teaching a vocation is therefore the most effective strategy to address this issue. So, if we want teachers to quickly grasp computer science topics, we need to give them an easy way to learn about computer science.
To simplify and expedite the study of computer science, we must acquaint schoolteachers with the terminology associated with computer science concepts so they can know which things they need to learn according to their profile. If we want to make it easier for schoolteachers to comprehend computer science concepts, it would be ideal if we could provide them with a tree of words and phrases from which they could determine where the phrases originated and which phrases are connected to them so that they can be effectively learned. To find a good concept word or phrase, we must first identify concepts and then establish their connections or linkages. As computer science is a fast-developing field, its nomenclature is also expanding at a frenetic rate. Therefore, adding all concepts and terms to the knowledge graph would be a challenging endeavor. Creating a system that automatically adds all computer science domain terms to the knowledge graph would be a straightforward solution to the issue. We have identified knowledge graph use cases for the schoolteacher training program, which motivates the development of a knowledge graph. We have analyzed the knowledge graph's use case and the knowledge graph's ideal characteristics. We have designed a web-based system for adding, editing, and removing words from a knowledge graph. In addition, a term or phrase can be represented with its children list, parent list, and synonym list for enhanced comprehension. We've developed an automated system for extracting words and phrases that can extract computer science concept phrases from any supplied text, thereby enriching the knowledge graph.
Therefore, we have designed the knowledge graph for use in teacher education so that schoolteachers can teach K-12 students computer science topics effectively.

Biomedical Literature Mining with Transitive Closure and Maximum Network Flow
(http://doi.acm.org/10.1145/1851476.1851552, 2011-05-15) Hoblitzell, Andrew P.; Mukhopadhyay, Snehasis; Xia, Yuni; Fang, Shiaofen
The biological literature is a huge and constantly increasing source of information which biologists may consult for information about their field, but the vast amount of data can sometimes become overwhelming. Medline, which makes a great amount of biological journal data available online, makes the development of automated text mining systems and hence “data-driven discovery” possible. This thesis examines current work in the field of text mining and biological literature, and then aims to mine documents pertaining to bone biology. The documents are retrieved from PubMed, and then direct associations between the terms are computed. Potentially novel transitive associations among biological objects are then discovered using the transitive closure algorithm and the maximum flow algorithm. The thesis discusses in detail the extraction of biological objects from the collected documents and the co-occurrence based text mining algorithm, the transitive closure algorithm, and the maximum network flow algorithm, which were then run to extract the potentially novel biological associations. Generated hypotheses (novel associations) were assigned significance scores for further validation by an expert bone biologist. Extension of the work into hypergraphs for enhanced meaning and accuracy is also examined in the thesis.

Brain Connectome Network Properties Visualization
(2018-12) Zhang, Chenfeng; Fang, Shiaofen; Tuceryan, Mihran; Mukhopadhyay, Snehasis
Brain connectome network visualization could help neurologists inspect the brain structure easily and quickly.
In the thesis, the model of the brain connectome network is visualized in both a three-dimensional (3D) environment and a two-dimensional (2D) environment. The first, named “Brain Explorer for Connectomic Analysis” (BECA), was developed in previous research. It can present the 3D model of brain structure with regions of interest (ROIs) in different colors [5]. The other is mainly for the information visualization of the brain connectome in 2D. It adopts the force-directed layout to visualize the network. However, the brain network visualization does not give the user an intuitive idea of the brain structure. Sometimes, as the number of ROIs (nodes) increases, the visualization produces more visual clutter for readers [3]. So, brain connectome network properties visualization becomes a useful complement to brain network visualization. For a better understanding of the effect of Alzheimer’s disease on the brain nerves, the thesis introduces several methods for brain graph properties visualization. Five selected graph properties are discussed in the thesis. The degree and closeness are node properties. The shortest path, maximum flow, and clique are edge properties. Except for clique, the properties are visualized in both 3D and 2D. The clique is visualized only in 2D. For the clique, a new hypergraph visualization method is proposed with three different algorithms. Instead of using an extra node to present a clique, the thesis uses a “belt” to connect all nodes within the same clique. The methods of node connection are based on the traveling salesman problem (TSP) and the law of cosines. In addition, the thesis also applies the result of the clique to adjust the force-directed layout of the brain graph in 2D to dramatically reduce the visual clutter.
Therefore, with the support of the graph properties visualization, the brain connectome network visualization tools become more flexible.

Computational Analysis of Flow Cytometry Data
(2013-07-12) Irvine, Allison W.; Dundar, Murat; Tuceryan, Mihran; Mukhopadhyay, Snehasis; Fang, Shiaofen
The objective of this thesis is to compare automated methods for performing analysis of flow cytometry data. Flow cytometry is an important and efficient tool for analyzing the characteristics of cells. It is used in several fields, including immunology, pathology, marine biology, and molecular biology. Flow cytometry measures light scatter from cells and fluorescent emission from dyes which are attached to cells. There are two main tasks that must be performed. The first is the adjustment of measured fluorescence from the cells to correct for the overlap of the spectra of the fluorescent markers used to characterize a cell’s chemical characteristics. The second is to use the amount of markers present in each cell to identify its phenotype. Several methods are compared to perform these tasks. The Unconstrained Least Squares, Orthogonal Subspace Projection, Fully Constrained Least Squares and Fully Constrained One Norm methods are used to perform compensation and compared. The fully constrained least squares method of compensation gives the overall best results in terms of accuracy and running time. Spectral Clustering, Gaussian Mixture Modeling, Naive Bayes classification, Support Vector Machine and Expectation Maximization using a Gaussian mixture model are used to classify cells based on the amounts of dyes present in each cell. The generative models created by the Naive Bayes and Gaussian mixture modeling methods performed classification of cells most accurately. These supervised methods may be the most useful when online classification is necessary, such as in cell sorting applications of flow cytometers.
Unsupervised methods may be used to completely replace manual analysis when no training data is given. Expectation Maximization combined with a cluster merging post-processing step gives the best results of the unsupervised methods considered.

Computational Mining and Survey of Simple Sequence Repeats (SSRs) in Expressed Sequence Tags (ESTs) of Dicotyledonous Plants
(2004-07) Kumpatla, Siva Prasad; Mukhopadhyay, Snehasis
DNA markers have revolutionized the field of genetics by increasing the pace of genetic analysis. Simple sequence repeats (SSRs) are repetitions of nucleotide motifs of 1 to 5 bases and are currently the markers of choice in many plant and animal genomes due to their abundant distribution in the genomes, hypervariable nature and suitability for high-throughput analysis. While SSRs, once developed, are extremely valuable, their development is time consuming, laborious and expensive. Sequences from many genomes are continuously made freely available in the public databases and mining of these sources using computational approaches permits rapid and economical marker development. Expressed sequence tags (ESTs) are ideal candidates for mining SSRs not only because of their availability in large numbers but also due to the fact that they represent expressed genes. Large scale SSR mining efforts in plants to date focused on monocotyledonous plants. In this project, an efficient SSR identification tool was developed and used to mine SSRs from more than 53 dicotyledonous species. A total of 92,648 non-redundant ESTs or 6.0% of the 1.54 million dicotyledonous ESTs investigated in this study were found to contain SSRs. The frequency of non-redundant ESTs containing SSRs among the species investigated ranged from 2.65% to 16.82%. More than 80% of the non-redundant ESTs having SSRs contained a single SSR repeat while others contained 2 or more SSRs.
An extensive analysis of the occurrence and frequencies of various SSR types revealed that the A/T mononucleotide, AG/GA/CT/TC dinucleotide, AAG/AGA/GAA/CTT/TTC/TCT trinucleotide and TTTA and TTAA tetranucleotide repeats are the most abundant in dicotyledonous species. In addition, an analysis of the number of repeats across species revealed that the majority of the mononucleotide SSRs contained 15-25 repeats while the majority of the di- and tri-nucleotide SSRs contained 5-10 repeats. By providing valuable information on the abundance of SSRs in ESTs of a large number of dicotyledonous species, this study demonstrates the potential of the computational mining approach for rapid discovery of SSRs towards the development of markers for genetic analysis and related applications.

Decentralized and Partially Decentralized Multi-Agent Reinforcement Learning
(2013-08-22) Tilak, Omkar Jayant; Mukhopadhyay, Snehasis; Si, Luo; Neville, Jennifer; Raje, Rajeev; Tuceryan, Mihran; Gorman, William J.
Multi-agent systems consist of multiple agents that interact and coordinate with each other to work towards a certain goal. Multi-agent systems naturally arise in a variety of domains such as robotics, telecommunications, and economics. The dynamic and complex nature of these systems requires the agents to learn the optimal solutions on their own instead of following a pre-programmed strategy. Reinforcement learning provides a framework in which agents learn optimal behavior based on the response obtained from the environment. In this thesis, we propose various novel decentralized, learning automaton based algorithms which can be employed by a group of interacting learning automata. We propose a completely decentralized version of the estimator algorithm. As compared to the completely centralized versions proposed before, this completely decentralized version proves to be a great improvement in terms of space complexity and convergence speed.
The decentralized learning algorithm was applied, for the first time, to the domains of distributed object tracking and distributed watershed management. The results obtained by these experiments show the usefulness of the decentralized estimator algorithms to solve complex optimization problems. Taking inspiration from the completely decentralized learning algorithm, we propose the novel concept of partial decentralization. The partial decentralization bridges the gap between the completely decentralized and completely centralized algorithms and thus forms a comprehensive and continuous spectrum of multi-agent algorithms for the learning automata. To demonstrate the applicability of the partial decentralization, we employ a partially decentralized team of learning automata to control multi-agent Markov chains. More flexibility, expressiveness and flavor can be added to the partially decentralized framework by allowing different decentralized modules to engage in different types of games. We propose the novel framework of heterogeneous games of learning automata which allows the learning automata to engage in disparate games under the same formalism. We propose an algorithm to control the dynamic zero-sum games using heterogeneous games of learning automata.

Deep Learning Based Methods for Automatic Extraction of Syntactic Patterns and their Application for Knowledge Discovery
(2023-12-28) Kabir, Md. Ahsanul; Hasan, Mohammad Al; Mukhopadhyay, Snehasis; Tuceryan, Mihran; Fang, Shiaofen
Semantic pairs, which consist of related entities or concepts, serve as the foundation for comprehending the meaning of language in both written and spoken forms. These pairs make it possible to grasp the nuances of relationships between words, phrases, or ideas, forming the basis for more advanced language tasks like entity recognition, sentiment analysis, machine translation, and question answering.
They make it possible to infer causality, identify hierarchies, and connect ideas within a text, ultimately enhancing the depth and accuracy of automated language processing. Nevertheless, the task of extracting semantic pairs from sentences poses a significant challenge, which makes syntactic dependency patterns (SDPs) relevant. Thankfully, semantic relationships exhibit adherence to distinct SDPs when connecting pairs of entities. Recognizing this fact underscores the critical importance of extracting these SDPs, particularly for specific semantic relationships like hyponym-hypernym, meronym-holonym, and cause-effect associations. The automated extraction of such SDPs carries substantial advantages for various downstream applications, including entity extraction, ontology development, and question answering. Unfortunately, this pivotal facet of pattern extraction has remained relatively overlooked by researchers in the domains of natural language processing (NLP) and information retrieval. To address this gap, I introduce an attention-based supervised deep learning model, ASPER. ASPER is designed to extract SDPs that denote semantic relationships between entities within a given sentential context. I rigorously evaluate the performance of ASPER across three distinct semantic relations: hyponym-hypernym, cause-effect, and meronym-holonym, utilizing six datasets. My experimental findings demonstrate ASPER's ability to automatically identify an array of SDPs that mirror the presence of these semantic relationships within sentences, outperforming existing pattern extraction methods by a substantial margin. Second, I want to use the SDPs to extract semantic pairs from sentences. I choose to extract cause-effect entities from medical literature. This task is instrumental in compiling various causality relationships, such as those between diseases and symptoms, medications and side effects, and genes and diseases.
Existing solutions excel in sentences where cause and effect phrases are straightforward, such as named entities, single-word nouns, or short noun phrases. However, in the complex landscape of medical literature, cause and effect expressions often extend over several words, stumping existing methods and resulting in incomplete extractions that provide low-quality, non-informative, and at times, conflicting information. To overcome this challenge, I introduce an innovative unsupervised method for extracting cause and effect phrases, PatternCausality, tailored explicitly for medical literature. PatternCausality employs a set of cause-effect dependency patterns as templates to identify the key terms within cause and effect phrases. It then utilizes a novel phrase extraction technique to produce comprehensive and meaningful cause and effect expressions from sentences. Experiments conducted on a dataset constructed from PubMed articles reveal that PatternCausality significantly outperforms existing methods, achieving a remarkable order of magnitude improvement in the F-score metric over the best-performing alternatives. I also develop various PatternCausality variants that utilize diverse phrase extraction methods, all of which surpass existing approaches. PatternCausality and its variants exhibit notable performance improvements in extracting cause and effect entities in a domain-neutral benchmark dataset, wherein cause and effect entities are confined to single-word nouns or noun phrases of one to two words. Nevertheless, PatternCausality operates within an unsupervised framework and relies heavily on SDPs, motivating me to explore the development of a supervised approach. Although SDPs play a pivotal role in semantic relation extraction, pattern-based methodologies remain unsupervised, and the multitude of potential patterns within a language can be overwhelming.
Furthermore, patterns do not consistently capture the broader context of a sentence, leading to the extraction of false-positive semantic pairs. As an illustration, consider the hyponym-hypernym pattern “the w of u”, which can correctly extract semantic pairs for a sentence like “the village of Aasu” but fails to do so for the phrase “the moment of impact”. The root cause of this limitation lies in the pattern's inability to capture the nuanced meaning of words and phrases in a sentence and their contextual significance. These observations have spurred my exploration of a third model, DepBERT, which is a dependency-aware supervised transformer model. DepBERT's primary contribution lies in introducing the underlying dependency structure of sentences to a language model with the aim of enhancing token classification performance. To achieve this, I must first reframe the task of semantic pair extraction as a token classification problem. The DepBERT model can harness both the tree-like structure of dependency patterns and the masked language architecture of transformers, marking a significant milestone, as most large language models (LLMs) predominantly focus on semantics and word co-occurrence while neglecting the crucial role of dependency architecture. In summary, my overarching contributions in this thesis are threefold. First, I validate the significance of the dependency architecture within various components of sentences and publish SDPs that incorporate these dependency relationships. Subsequently, I employ these SDPs in a practical medical domain to extract vital cause-effect pairs from sentences.
Finally, my third contribution distinguishes this thesis by integrating dependency relations into a deep learning model, enhancing the understanding of language and the extraction of valuable semantic associations.

The design and implementation of mobile deluge on Android platform for wireless sensor network reprogramming
(2017-11-28) Faruk, MD Omor; Liang, Yao; Tuceryan, Mihran; Mukhopadhyay, Snehasis
Wireless Sensor Networks (WSNs) are being used in various applications including environmental monitoring, site inspection, and military use. A WSN is a distributed network of sensor devices that can be used to monitor temperature, humidity, light and other important metrics. The software that runs on the sensor devices defines how the devices should operate. In real-world WSN deployments, device software updates are required to maintain optimal operation. In this thesis, we propose a novel idea of updating the software of the sensor nodes using a mobile device running the Android operating system. Our implementation builds upon Mobile Deluge, a method of reprogramming a WSN with a laptop computer, and adds a few enhancements. We have evaluated our application's performance by lab experiments and in real-world deployments of WSNs and found the application stable and battery efficient.

Effect of Stakeholder Attitudes on the Optimization of Watershed Conservation Practices
(2013-01-30) Piemonti, Adriana Debora; Babbar-Sebens, Meghna; Jacinthe, Pierre-Andre; Mukhopadhyay, Snehasis; Luzar, E. Jane, 1951-
Land use alterations have been major drivers for modifying hydrologic cycles in many watersheds nationwide. Imbalances in this cycle have led to unexpected or extreme changes in flood and drought patterns and intensities, severe impairment of rivers and streams due to pollutants, and extensive economic losses to affected communities. Eagle Creek Watershed (ECW) is a typical Midwestern agricultural watershed with growing urban land use that has been affected by these problems.
Structural solutions, such as ditches and tiles, have helped in the past to reduce the flooding problem in the upland agricultural area. But these structures have led to extensive flooding and water quality problems downstream and loss of moisture storage in the soil upstream. It has been suggested that re-naturalization of watershed hydrology via a spatially-distributed implementation of non-structural and structural conservation practices, such as cover crops, wetlands, riparian buffers, grassed waterways, etc., will help to reduce these problems by improving the upland runoff (storing water temporarily as moisture in the soil or in depression storages). However, spatial implementation of these upland storage practices poses hurdles not only due to the large number of possible alternatives offered by physical models, but also due to the effect of tenure, social attitudes, and behaviors of landowners that could further add complexities on whether and how these practices are adopted and effectively implemented for benefits. This study investigates (a) how landowner tenure and attitudes can be used to identify promising conservation practices in an agricultural watershed, (b) how the different attitudes and preferences of stakeholders can modify the effectiveness of solutions obtained via classic optimization approaches that do not include the influence of social attitudes in a watershed, and (c) how spatial distribution of landowner tenure affects the spatial optimization of conservation practices on a watershed scale. Results showed two main preferred practices, one from an economic perspective (filter strips) and one from an environmental perspective (wetlands). A land tenure comparison showed differences in the spatial distribution of systems considering all the conservation practices. It was also observed that the practices selected by cash renters provide a better cost-revenue relation than the selected optimal solution.