Doctoral Public Lecture | Ana Carolina Da Cruz

Student Name: Ana Carolina Da Cruz
Program: Statistics
Thesis Title: Bayesian Methods for Change-Point Data Clustering and Functional Data Analysis

Supervisors: Camila de Souza
Location: Western Science Centre 248

Abstract: 

Technological advancements have made high-dimensional data, such as multiple sequences of change-point data and functional data, increasingly available. However, the complexity of such data presents significant challenges for data analysis, creating a need for efficient and reliable statistical methodologies. Bayesian inference methods, which integrate prior knowledge and manage model complexity, are commonly used in data analysis. Variational inference (VI) methods, in particular, are becoming increasingly popular for Bayesian model inference due to their efficiency and low computational cost. In this thesis, we propose Bayesian inference methods for analyzing high-dimensional data in three contexts: clustering constant-wise change-point data; basis function selection for functional data representation with within-curve correlation; and estimation and variable selection in functional regression models.

In Chapter 2, we propose and implement a nonparametric Bayesian model, fitted via a Gibbs sampler, for clustering observations based on their constant-wise change-point profiles. We place a Dirichlet process on the constant-wise change-point structures to cluster observations while simultaneously performing multiple change-point estimation. In addition, we develop the R package BayesCPclust, available on CRAN. In Chapter 3, we develop and implement a variational Expectation-Maximization (VEM) algorithm for selecting basis functions for smoothing one or multiple curves simultaneously, while accounting for within-curve correlation. In Chapter 4, we propose and implement variational Bayes (VB) algorithms for estimation and variable selection in scalar-on-function and partially functional regression models.

Extensive simulation studies assess the performance of the proposed algorithms, comparing them with alternative methods, including Markov chain Monte Carlo (MCMC) methods when available. Applications to real data illustrate the applicability of the proposed methods: a single-cell genomic sequencing dataset in Chapter 2; the motorcycle, Canadian weather, and LIDAR (LIght Detection And Ranging) datasets in Chapter 3; and the sugar spectra and Japan weather datasets in Chapter 4.

Please contact the Graduate Assistant in the program for further information: https://grad.uwo.ca/about_us/program_contacts.cfm.