Mathematical Sciences Department Theses and Dissertations

Permanent URI for this collection

For more information about the Mathematical Sciences graduate programs visit:


Recent Submissions

Now showing 1 - 10 of 37
  • Item
    Sample Size Determination for Subsampling in the Analysis of Big Data, Multiplicative Models for Confidence Intervals and Free-Knot Changepoint Models
    (2024-05) Zhang, Sheng; Peng, Hanxiang; Tan, Fei; Sarkar, Jyoti; Boukai, Ben
    The dissertation consists of three parts. Motivated by subsampling in the analysis of Big Data and by data-splitting in machine learning, sample size determination for multidimensional parameters is presented in the first part. In the second part, we propose a novel approach to the construction of confidence intervals based on improved concentration inequalities. We provide the missing factor for the tail probability of a random variable, which generalizes Talagrand’s (1995) result on the missing factor in Hoeffding’s inequalities. We give the procedure for constructing confidence intervals and illustrate it with simulations. In the third part, we study irregular change-point models using free-knot splines. The consistency and asymptotic normality of the least squares estimators are proved for the irregular models in which the linear spline is not differentiable. Simulations are carried out to explore the numerical properties of the proposed models. The results are used to analyze the US Covid-19 data.
  • Item
    Efficient Inference and Dominant-Set Based Clustering for Functional Data
    (2024-05) Wang, Xiang; Wang, Honglang; Boukai, Benzion; Tan, Fei; Peng, Hanxiang
    This dissertation addresses three progressively fundamental problems for functional data analysis: (1) To do efficient inference for the functional mean model accounting for within-subject correlation, we propose the refined and bias-corrected empirical likelihood method. (2) To identify functional subjects potentially from different populations, we propose the dominant-set based unsupervised clustering method using the similarity matrix. (3) To learn the similarity matrix from various similarity metrics for functional data clustering, we propose the modularity guided and dominant-set based semi-supervised clustering method. In the first problem, the empirical likelihood method is utilized to do inference for the mean function of functional data by constructing the refined and bias-corrected estimating equation. The proposed estimating equation not only improves efficiency but also enables practically feasible empirical likelihood inference by properly incorporating within-subject correlation, which has not been achieved by previous studies. In the second problem, the dominant-set based unsupervised clustering method is proposed to maximize the within-cluster similarity and applied to functional data with a flexible choice of similarity measures between curves. The proposed unsupervised clustering method is a hierarchical bipartition procedure under a penalized optimization framework, with the tuning parameter selected by maximizing a clustering criterion, the modularity of the resulting two clusters. The procedure is inspired by the concept of a dominant set in graph theory and is solved by replicator dynamics from game theory. This approach is robust not only to imbalanced group sizes but also to outliers, which overcomes a limitation of many existing clustering methods.
In the third problem, the metric-based semi-supervised clustering method is proposed, with the similarity metric learned by modularity maximization and followed by the dominant-set based clustering procedure proposed above. Under the semi-supervised setting, where some cluster memberships are known, the goal is to determine the best linear combination of candidate similarity metrics as the final metric to enhance the clustering performance. Besides the global metric-based algorithm, another algorithm is also proposed to learn individual metrics for each cluster, which permits overlapping membership for the clustering. This differs from many existing methods. The method is well suited to functional data with various similarity metrics between functional curves, while also exhibiting robustness to imbalanced group sizes, which is intrinsic to the dominant-set based clustering approach. In all three problems, the advantages of the proposed methods are demonstrated through extensive empirical investigations using simulations as well as real data applications.
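The replicator-dynamics step mentioned in the abstract above can be illustrated with a minimal sketch. It is not the dissertation's procedure: the penalty term, the hierarchical bipartition, and the modularity-based tuning are all omitted, and the function names and the toy similarity matrix are hypothetical. It only shows the standard discrete-time replicator update x_i <- x_i (Ax)_i / (x^T A x), whose limit puts its weight on a cohesive group of items.

```python
def replicator_dynamics(similarity, iterations=1000, tol=1e-10):
    """Iterate x_i <- x_i * (A x)_i / (x^T A x) on the probability simplex."""
    n = len(similarity)
    x = [1.0 / n] * n  # start at the barycenter of the simplex
    for _ in range(iterations):
        ax = [sum(similarity[i][j] * x[j] for j in range(n)) for i in range(n)]
        total = sum(x[i] * ax[i] for i in range(n))
        if total == 0.0:  # degenerate case: no similarity mass left
            break
        new_x = [x[i] * ax[i] / total for i in range(n)]
        change = max(abs(a - b) for a, b in zip(new_x, x))
        x = new_x
        if change < tol:
            break
    return x


def dominant_set(similarity, threshold=1e-4):
    """Indices whose limiting weight stays above a small (assumed) threshold."""
    weights = replicator_dynamics(similarity)
    return [i for i, w in enumerate(weights) if w > threshold]


# Hypothetical toy data: items 0-2 form a tight group, items 3-4 a looser one.
TOY_SIMILARITY = [
    [0.0, 0.9, 0.9, 0.1, 0.1],
    [0.9, 0.0, 0.9, 0.1, 0.1],
    [0.9, 0.9, 0.0, 0.1, 0.1],
    [0.1, 0.1, 0.1, 0.0, 0.8],
    [0.1, 0.1, 0.1, 0.8, 0.0],
]

if __name__ == "__main__":
    print(dominant_set(TOY_SIMILARITY))
```

In this toy case the weight concentrates on the tighter group; removing that group and repeating on the remainder is one simple way to obtain further clusters.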
  • Item
    Weighted Curvatures in Finsler Geometry
    (2023-08) Zhao, Runzhong; Shen, Zhongmin; Buse, Olguta; Ramras, Daniel; Roeder, Roland
    The curvatures in Finsler geometry can be defined in ways similar to those in Riemannian geometry. However, since there are fewer restrictions on the metrics, many geometric quantities arise in Finsler geometry which vanish in the Riemannian case. These quantities are generally known as non-Riemannian quantities and interact with the curvatures in controlling the global geometrical and topological properties of Finsler manifolds. In the present work, we study general weighted Ricci curvatures which combine the Ricci curvature and the S-curvature, and define a weighted flag curvature which combines the flag curvature and the T-curvature. We characterize Randers metrics of almost isotropic weighted Ricci curvatures and show the general weighted Ricci curvatures can be divided into three types. On the other hand, we show that a proper open forward complete Finsler manifold with positive weighted flag curvature is necessarily diffeomorphic to the Euclidean space, generalizing the Gromoll-Meyer theorem in Riemannian geometry.
  • Item
    Values of Ramanujan's Continued Fractions Arising as Periodic Points of Algebraic Functions
    (2023-08) Akkarapakam, Sushmanth Jacob; Morton, Richard Patrick; Klimek, Slawomir D.; Roeder, Roland K. W.; Geller, William A.
    The main focus of this dissertation is to find and explain the periodic points of certain algebraic functions that are related to some modular functions, which themselves can be represented by continued fractions. Some of these continued fractions were first explored by Srinivasa Ramanujan in the early 20th century. Since then, much work has been done on studying these continued fractions, proving relations and identities, and giving different representations for them. The layout of this report is as follows. Chapter 1 covers the basic background on algebraic number theory, class field theory, Ramanujan’s theta functions, etc. In Chapter 2, we look at the Ramanujan-Göllnitz-Gordon continued fraction, which we call v(τ), and evaluate it at certain arguments in the field K = Q(√−d), with −d ≡ 1 (mod 8), in which the ideal (2) = ℘₂℘₂′ is a product of two prime ideals. We prove several identities relating v(τ) to itself and to other modular functions. Some of these are new, while others are known but are given different proofs here. These values of v(τ) are shown to generate the inertia field of ℘₂ or ℘₂′ in an extended ring class field over the field K. The conjugates over Q of these same values, together with 0, −1 ± √2, are shown to form the exact set of periodic points of a fixed algebraic function F̂(x), independent of d. These are analogues of similar results for the Rogers-Ramanujan continued fraction. See [1] and [2]. This joint work with my advisor, Dr. Morton, has been submitted for publication to the New York Journal. In Chapters 3 and 4, we take a similar approach in studying two more continued fractions c(τ) and u(τ), the first of which is more commonly known as Ramanujan’s cubic continued fraction. We show what fields a value of this continued fraction generates over Q, and we describe how the periodic points of the described functions arise as values of these continued fractions.
Then, in the last chapter, we summarise all these results, give some possible directions for future research, and mention some conjectures.
  • Item
    Certain Aspects of Quantum and Classical Integrable Systems
    (2022-08) Kosmakov, Maksim; Tarasov, Vitaly; Its, Alexander; Mukhin, Evgeny; Ramras, Daniel
    We derive new combinatorial formulas for vector-valued weight functions for the evolution modules over the Yangians Y(gl_n). We obtain them using the Nested Algebraic Bethe ansatz method. We also describe the asymptotic behavior of the radial solutions of the negative tt* equation via the Riemann-Hilbert problem and the Deift-Zhou nonlinear steepest descent method.
  • Item
    Optimal Policies in Reliability Modelling of Systems Subject to Sporadic Shocks and Continuous Healing
    (2022-12) Chatterjee, Debolina; Sarkar, Jyotirmoy; Boukai, Benzion; Li, Fang; Wang, Honglang
    Recent years have seen a growth in research on system reliability and maintenance. Various studies in the scientific fields of reliability engineering, quality and productivity analyses, risk assessment, software reliability, and probabilistic machine learning are being undertaken in the present era. The dependency of human life on technology has made it more important to maintain such systems and maximize their potential. In this dissertation, some methodologies are presented that maximize certain measures of system reliability, explain the underlying stochastic behavior of certain systems, and prevent the risk of system failure. An overview of the dissertation is provided in Chapter 1, where we briefly discuss some useful definitions and concepts in probability theory and stochastic processes and present some mathematical results required in later chapters. Thereafter, we present the motivation and outline of each subsequent chapter. In Chapter 2, we compute the limiting average availability of a one-unit repairable system subject to repair facilities and spare units. Formulas for finding the limiting average availability of a repairable system exist only for some special cases: (1) either the lifetime or the repair-time is exponential; or (2) there is one spare unit and one repair facility. In contrast, we consider a more general setting involving several spare units and several repair facilities; and we allow arbitrary life- and repair-time distributions. Under periodic monitoring, which essentially discretizes the time variable, we compute the limiting average availability. The discretization approach closely approximates the existing results in the special cases; and demonstrates as anticipated that the limiting average availability increases with additional spare unit and/or repair facility. 
In Chapter 3, the system experiences two types of sporadic impact: valid shocks that cause damage instantaneously and positive interventions that induce partial healing. Whereas each shock inflicts a fixed magnitude of damage, the accumulated effect of k positive interventions nullifies the damaging effect of one shock. The system is said to be in Stage 1, when it can possibly heal, until the net count of impacts (valid shocks registered minus valid shocks nullified) reaches a threshold $m_1$. The system then enters Stage 2, where no further healing is possible. The system fails when the net count of valid shocks reaches another threshold $m_2 (> m_1)$. The inter-arrival times between successive valid shocks and those between successive positive interventions are independent and follow arbitrary distributions. Thus, we remove the restrictive assumption of an exponential distribution, often found in the literature. We find the distributions of the sojourn time in Stage 1 and the failure time of the system. Finally, we find the optimal values of the choice variables that minimize the expected maintenance cost per unit time for three different maintenance policies. In Chapter 4, the above defined Stage 1 is further subdivided into two parts: In the early part, called Stage 1A, healing happens faster than in the later stage, called Stage 1B. The system stays in Stage 1A until the net count of impacts reaches a predetermined threshold $m_A$; then the system enters Stage 1B and stays there until the net count reaches another predetermined threshold $m_1 (>m_A)$. Subsequently, the system enters Stage 2 where it can no longer heal. The system fails when the net count of valid shocks reaches another predetermined higher threshold $m_2 (> m_1)$. All other assumptions are the same as those in Chapter 3. We calculate the percentage improvement in the lifetime of the system due to the subdivision of Stage 1. 
Finally, we make optimal choices to minimize the expected maintenance cost per unit time for two maintenance policies. Next, we eliminate the restrictive assumption that all valid shocks and all positive interventions have equal magnitude, and the boundary threshold is a preset constant value. In Chapter 5, we study a system that experiences damaging external shocks of random magnitude at stochastic intervals, continuous degradation, and self-healing. The system fails if cumulative damage exceeds a time-dependent threshold. We develop a preventive maintenance policy to replace the system such that its lifetime is utilized prudently. Further, we consider three variations on the healing pattern: (1) shocks heal for a fixed finite duration $\tau$; (2) a fixed proportion of shocks are non-healable (that is, $\tau=0$); (3) there are two types of shocks---self healable shocks heal for a finite duration, and non-healable shocks. We implement a proposed preventive maintenance policy and compare the optimal replacement times in these new cases with those in the original case, where all shocks heal indefinitely. Finally, in Chapter 6, we present a summary of the dissertation with conclusions and future research potential.
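The two-stage shock-and-healing model of Chapter 3 described above can be sketched as a small Monte Carlo simulation. This is an illustration under stated assumptions, not the dissertation's method (which derives the distributions analytically): the function names, the Weibull and uniform inter-arrival distributions, and the parameter values are hypothetical, and interventions that arrive when the net count is zero are assumed to have no effect.

```python
import random


def simulate_lifetime(draw_shock_gap, draw_heal_gap, k, m1, m2, rng):
    """Simulate one system history; return (time Stage 1 ends, failure time)."""
    t_shock = draw_shock_gap(rng)  # arrival time of the next valid shock
    t_heal = draw_heal_gap(rng)    # arrival time of the next positive intervention
    net = 0            # valid shocks registered minus valid shocks nullified
    credits = 0        # interventions accumulated toward nullifying one shock
    stage1_end = None  # stays None while the system is still in Stage 1
    while True:
        if stage1_end is None and t_heal < t_shock:
            # Stage 1 only: an intervention arrives before the next shock.
            if net > 0:  # assumption: nothing to heal when the net count is 0
                credits += 1
                if credits == k:  # k interventions nullify one shock
                    net -= 1
                    credits = 0
            t_heal += draw_heal_gap(rng)
        else:
            net += 1
            now = t_shock
            if stage1_end is None and net >= m1:
                stage1_end = now  # Stage 2 begins; no further healing
            if net >= m2:
                return stage1_end, now
            t_shock += draw_shock_gap(rng)


def estimate_means(runs=2000, seed=0, k=3, m1=3, m2=6):
    """Average Stage 1 sojourn time and failure time over simulated histories."""
    rng = random.Random(seed)
    shock_gap = lambda r: r.weibullvariate(1.0, 1.5)  # non-exponential example
    heal_gap = lambda r: r.uniform(0.2, 1.0)          # non-exponential example
    results = [simulate_lifetime(shock_gap, heal_gap, k, m1, m2, rng)
               for _ in range(runs)]
    mean_stage1 = sum(s for s, _ in results) / runs
    mean_failure = sum(f for _, f in results) / runs
    return mean_stage1, mean_failure


if __name__ == "__main__":
    print(estimate_means())
```

Because the inter-arrival samplers are passed in as functions, any distribution can be substituted, which mirrors the abstract's removal of the exponential assumption.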
  • Item
    Sample Size Determination in Multivariate Parameters With Applications to Nonuniform Subsampling in Big Data High Dimensional Linear Regression
    (2021-12) Wang, Yu; Peng, Hanxiang; Li, Fang; Sarkar, Jyoti; Tan, Fei
    Subsampling is an important method in the analysis of Big Data. Subsample size determination (SSSD) plays a crucial part in extracting information from data and in overcoming the challenges resulting from huge data sizes. In this thesis, (1) Sample size determination (SSD) is investigated in multivariate parameters, and sample size formulas are obtained for multivariate normal distribution. (2) Sample size formulas are obtained based on concentration inequalities. (3) Improved bounds for McDiarmid’s inequalities are obtained. (4) The obtained results are applied to nonuniform subsampling in Big Data high dimensional linear regression. (5) Numerical studies are conducted. The sample size formula for the univariate normal distribution is a staple of elementary statistics. To the best of our knowledge, its generalization to the multivariate normal (or, more generally, to multivariate parameters) has not received much attention. In this thesis, we introduce a definition for SSD, and obtain explicit formulas for multivariate normal distribution, in gratifying analogy to the sample size formula in univariate normal. Commonly used concentration inequalities provide exponential rates, and sample sizes based on these inequalities are often loose. Talagrand (1995) provided the missing factor to sharpen these inequalities. We obtained the numeric values of the constants in the missing factor and slightly improved his results. Furthermore, we provided the missing factor in McDiarmid’s inequality. These improved bounds are used to give shrunken sample sizes.
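To make the contrast in the abstract above concrete, here is a minimal sketch that compares the classical univariate normal sample size formula with a sample size derived from the standard two-sided Hoeffding inequality. The function names and the numerical inputs are hypothetical, and the improved "missing factor" bounds developed in the thesis are not reproduced here.

```python
import math
from statistics import NormalDist


def normal_sample_size(sigma, margin, alpha=0.05):
    """Classical formula n = (z_{1-alpha/2} * sigma / margin)^2, rounded up."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return math.ceil((z * sigma / margin) ** 2)


def hoeffding_sample_size(lower, upper, margin, alpha=0.05):
    """Smallest n with 2 * exp(-2 n margin^2 / (upper - lower)^2) <= alpha."""
    width = upper - lower
    return math.ceil(width ** 2 * math.log(2 / alpha) / (2 * margin ** 2))


if __name__ == "__main__":
    # Estimating a mean of values in [0, 1] to within 0.05 at the 95% level.
    # sigma = 0.5 is the largest standard deviation possible on [0, 1].
    print(normal_sample_size(sigma=0.5, margin=0.05))   # 385
    print(hoeffding_sample_size(0.0, 1.0, margin=0.05))  # 738
```

In this example the Hoeffding-based size is nearly twice the normal-theory size, which is the kind of looseness that sharper concentration bounds aim to reduce.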
  • Item
    Genera of integer representations and the Lyndon-Hochschild-Serre spectral sequence
    (2021-08) Neuffer, Christopher; Ramras, Daniel; Ji, Ronghui; Morton, Patrick; Buse, Olguta
    In the past ten to fifteen years there has been a surge of activity concerning the cohomology of semi-direct product groups of the form $\mathbb{Z}^{n}\rtimes G$ with $G$ finite. A problem first stated by Adem-Ge-Pan-Petrosyan asks for suitable conditions for the Lyndon-Hochschild-Serre spectral sequence associated to this group extension to collapse at the second page. In this thesis we use facts from integer representation theory to reduce this problem to only considering representatives from each genus of representations, and establish techniques for constructing new examples in which the spectral sequence collapses.
  • Item
    On Random Polynomials Spanned by OPUC
    (2020-12) Aljubran, Hanan; Yattselev, Maxim; Bleher, Pavel; Mukhin, Evgeny; Roeder, Roland
    We consider the behavior of zeros of random polynomials of the form \begin{equation*} P_{n,m}(z) := \eta_0\varphi_m^{(m)}(z) + \eta_1 \varphi_{m+1}^{(m)}(z) + \cdots + \eta_n \varphi_{n+m}^{(m)}(z) \end{equation*} as \( n\to\infty \), where \( m \) is a non-negative integer (most of the work deals with the case \( m =0 \) ), \( \{\eta_n\}_{n=0}^\infty \) is a sequence of i.i.d. Gaussian random variables, and \( \{\varphi_n(z)\}_{n=0}^\infty \) is a sequence of orthonormal polynomials on the unit circle \( \mathbb T \) for some Borel measure \( \mu \) on \( \mathbb T \) with infinitely many points in its support. Most of the work is done by manipulating the density function for the expected number of zeros of a random polynomial, which we call the intensity function.
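    As an illustration of the intensity function in the simplest special case (an example chosen here, not a result taken from the thesis), assume \( m = 0 \), let \( \mu \) be the normalized arc-length measure on \( \mathbb T \), and take the \( \eta_k \) to be standard complex Gaussian. Then \( \varphi_n(z) = z^n \), and the intensity \( \rho_n \) with respect to area measure has the well-known closed form \begin{equation*} P_{n,0}(z) = \sum_{k=0}^{n} \eta_k z^k, \qquad \rho_n(z) = \frac{1}{\pi}\left[ \frac{1}{(1-|z|^2)^2} - \frac{(n+1)^2 |z|^{2n}}{\left(1-|z|^{2(n+1)}\right)^2} \right], \quad |z| \neq 1, \end{equation*} so that the expected number of zeros in a region \( \Omega \) is \( \int_\Omega \rho_n \, dA \). As \( n\to\infty \) the zeros concentrate near \( \mathbb T \).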
  • Item
    Modeling Temporal Patterns of Neural Synchronization: Synaptic Plasticity and Stochastic Mechanisms
    (2020-08) Zirkle, Joel; Rubchinsky, Leonid; Kuznetsov, Alexey; Arciero, Julia; Barber, Jared
    Neural synchrony in the brain at rest is usually variable and intermittent: intervals of predominantly synchronized activity are interrupted by intervals of desynchronized activity. Prior studies suggested that this temporal structure of the weakly synchronous activity might be functionally significant: many short desynchronizations may be functionally different from few long desynchronizations, even if the average synchrony level is the same. In this thesis, we use computational neuroscience methods to investigate the effects of (i) spike-timing dependent plasticity (STDP) and (ii) noise on the temporal patterns of synchronization in a simple model. The model is composed of two conductance-based neurons connected via excitatory unidirectional synapses. In (i) these excitatory synapses are made plastic; in (ii) two different types of noise implementation, modeling the stochasticity of membrane ion channels, are considered. The plasticity results are taken from our recently published article, while the noise results are currently being compiled into a manuscript. The dynamics of this network are subjected to the time-series analysis methods used in prior experimental studies. We provide numerical evidence that both STDP and channel noise can alter the synchronized dynamics in the network in several ways. This depends on the time scale that plasticity acts on and the intensity of the noise. However, in general, the action of STDP and noise in the simple network considered here is to promote dynamics with short desynchronizations (i.e., dynamics reminiscent of that observed in experimental studies) over dynamics with longer desynchronizations.