Statistics Seminar: Elena Tuzhilina – Statistical curve models for inferring 3D chromatin architecture
Feb 28, 2023
3:30 PM to 5:00 PM
Speaker: Elena Tuzhilina, University of Toronto
Title: Statistical curve models for inferring 3D chromatin architecture
Abstract: Conformation reconstruction is an important challenge in computational biology. In this project we develop a model for the 3D spatial organization of chromatin, which plays a crucial role in numerous cellular processes. The central object in this study is the so-called contact matrix, which records the frequency of contacts between each pair of genomic loci and can therefore be used to infer the 3D structure. Contact counts are usually linked to the conformation by the following heuristic: loci that are close to each other in 3D space should have higher contact values. Most existing methods that operate on contact matrices are based on multidimensional scaling (MDS) and produce reconstructed 3D configurations in the form of a polygonal chain. However, none of them exploit the fact that the target solution should be a smooth curve in 3D. Smoothness is either ignored or addressed indirectly by introducing highly non-convex penalties into the model, which typically increases computational complexity and makes the reconstruction algorithm unstable. In this work we develop Principal Curve Metric Scaling (PCMS), a novel approach that models chromatin directly as a smooth curve. PCMS combines the advantages of MDS and smoothness penalties while remaining computationally efficient. We then use PCMS as a building block to create more complex distribution-based models for the conformation. In particular, we propose the PoisMS technique, which assumes a Poisson distribution for the contact counts. The performance of the methods is illustrated on real Hi-C data for chromosome 20 and evaluated against orthogonal multiplex FISH imaging.
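To make the pipeline in the abstract more concrete, here is a minimal sketch of the general idea. It is not the PCMS or PoisMS implementation presented in the talk. It assumes (1) contacts are converted to distances by a hypothetical power law d = c^(-alpha), (2) classical MDS is applied to the double-centred squared distances, and (3) smoothness is imposed by restricting the 3D coordinates to a low-dimensional basis of cosines, X = H @ Theta. The cosine basis is an illustrative choice (spline bases are a common alternative), and it replaces an explicit smoothness penalty with a hard constraint. The Poisson log-likelihood uses an assumed log-linear link in log-distance. All function names are hypothetical.

```python
import math
import numpy as np


def pairwise_distances(X):
    """Euclidean distance matrix between the rows of X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.linalg.norm(diff, axis=-1)


def contacts_to_distances(C, alpha=1.0, eps=1e-9):
    """Hypothetical heuristic: more contacts -> shorter distance, d_ij = c_ij^(-alpha)."""
    C = np.asarray(C, dtype=float)
    D = (C + eps) ** (-alpha)  # eps avoids division by zero; zero counts give huge distances
    np.fill_diagonal(D, 0.0)
    return D


def smooth_basis(n, k):
    """Assumed smooth basis: the first k DCT-II cosines, orthonormalized (n x k)."""
    t = (np.arange(n) + 0.5) / n
    B = np.column_stack([np.cos(np.pi * j * t) for j in range(k)])
    Q, _ = np.linalg.qr(B)
    return Q


def smooth_mds(D, k=8, dim=3):
    """Classical MDS with the coordinates restricted to span(H), i.e. X = H @ Theta."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    Z = -0.5 * J @ (D ** 2) @ J          # double-centred Gram (similarity) matrix
    H = smooth_basis(n, k)
    M = H.T @ Z @ H                      # reduced k x k problem
    M = (M + M.T) / 2                    # guard against round-off asymmetry
    vals, vecs = np.linalg.eigh(M)       # eigenvalues in ascending order
    top = np.argsort(vals)[::-1][:dim]   # indices of the dim largest eigenvalues
    lam = np.clip(vals[top], 0.0, None)  # keep only the positive semidefinite part
    Theta = vecs[:, top] * np.sqrt(lam)
    return H @ Theta                     # n x dim coordinates of a smooth curve


def poisson_nll(C, X, alpha=-1.0, beta=0.0):
    """Poisson negative log-likelihood of counts C given curve X (upper triangle only).

    Assumed link (hypothetical): log rate_ij = beta + alpha * log d_ij.
    """
    iu = np.triu_indices(X.shape[0], k=1)
    d = pairwise_distances(X)[iu]
    log_rate = beta + alpha * np.log(d)
    c = np.asarray(C, dtype=float)[iu]
    log_fact = np.array([math.lgamma(v + 1.0) for v in c])
    return float(np.sum(np.exp(log_rate) - c * log_rate + log_fact))
```

Because H has orthonormal columns, fitting Theta reduces to an eigendecomposition of the small k x k matrix H.T @ Z @ H instead of the full n x n one. This hints at why a curve-restricted MDS can be cheap to compute. A distribution-based variant would instead choose X to minimize a likelihood such as `poisson_nll` over the observed counts.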
Keywords: chromatin conformation reconstruction; principal curves; multidimensional scaling; matrix-type generalized linear models.
Date/Time: Tuesday, February 28, 2023, 3:30 – 5:00 PM
Location: MDCL 1115