Title: Balancing covariates in randomized experiments using the Gram–Schmidt walk

Date: 05/21/2020

Time: 2:30 PM - 3:30 PM

Place:

(Part of the One World MINDS series: https://sites.google.com/view/minds-seminar/home)
In randomized experiments, such as medical trials, we randomly assign the treatment, such as a drug or a placebo, that each experimental subject receives. Randomization helps us estimate the difference in treatment effects accurately with high probability. We also want the treatment and control groups to be similar: ideally, the two groups would be similar in every statistic we can measure beforehand. Recent advances in algorithmic discrepancy theory allow us to divide subjects into groups with similar statistics.
By exploiting the recent Gram-Schmidt Walk algorithm of Bansal, Dadush, Garg, and Lovett, we can obtain random assignments of low discrepancy. These allow us to obtain more accurate estimates of treatment effects when the information we measure about the subjects is predictive, while also bounding the worst-case behavior when it is not.
We will explain the experimental design problem we address, the Gram-Schmidt walk algorithm, and the major ideas behind our analyses. This is joint work with Chris Harshaw, Fredrik Sävje, and Peng Zhang.
Paper: https://arxiv.org/abs/1911.03071
Code: https://github.com/crharshaw/GSWDesign.jl
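As a rough illustration of the premise of the talk (why covariate balance improves difference-in-means estimates), here is a minimal numpy simulation. It is not the Gram-Schmidt walk itself; the toy outcome model, sample size, and the "keep the 10% most balanced assignments" rule are all assumptions made for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100
x = rng.normal(size=n)             # a covariate measured before treatment
tau = 2.0                          # true treatment effect
y0 = 10.0 + 5.0 * x                # potential outcome under control (toy model)
y1 = y0 + tau                      # potential outcome under treatment

def estimate(assign):
    """Difference-in-means estimate of the treatment effect."""
    return y1[assign].mean() - y0[~assign].mean()

# Draw many half/half random assignments; record the estimate and the
# covariate imbalance between the two groups.
draws = []
for _ in range(2000):
    perm = rng.permutation(n)
    assign = np.zeros(n, dtype=bool)
    assign[perm[: n // 2]] = True
    imbalance = abs(x[assign].mean() - x[~assign].mean())
    draws.append((imbalance, estimate(assign)))

imb = np.array([d[0] for d in draws])
est = np.array([d[1] for d in draws])

mse_all = np.mean((est - tau) ** 2)
# Restrict to the 10% most covariate-balanced assignments.
keep = imb <= np.quantile(imb, 0.1)
mse_balanced = np.mean((est[keep] - tau) ** 2)
print(mse_balanced, mse_all)
```

When the covariate is predictive of the outcome, as here, the balanced assignments estimate tau with far smaller error; the Gram-Schmidt walk design achieves such balance while preserving enough randomness to bound the worst case.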

Title: The Analytic Geometries of Data; zoom link @ https://sites.google.com/view/minds-seminar/home

Date: 05/28/2020

Time: 2:30 PM - 3:30 PM

Place:

(Part of the One World MINDS series: https://sites.google.com/view/minds-seminar/home)
We will describe methodologies for building data geometries designed to simultaneously analyze and process databases. The different geometries, or affinity metrics, arise naturally as we learn to contextualize and conceptualize, i.e., to relate data regions and data features (which we extend to data tensors). Moreover, we generate tensorial multiscale structures.
We will indicate connections to analysis by deep nets and describe applications to modeling observations of dynamical systems, from stochastic molecular dynamics to calcium imaging of brain activity.
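A minimal sketch of one such data geometry, in the diffusion-geometry style: build a Gaussian affinity matrix on a toy data set, normalize it into a Markov matrix, and read off coordinates from its eigenvectors. The two-cluster data, kernel bandwidth choice, and the use of only the second eigenvector are assumptions of this illustration, not the speaker's construction.

```python
import numpy as np

rng = np.random.default_rng(1)
# Toy data: two noisy clusters standing in for distinct "data regions".
X = np.vstack([rng.normal(0, 0.3, (30, 2)),
               rng.normal(3, 0.3, (30, 2))])

# Pairwise Gaussian affinities between data points.
d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
eps = np.median(d2)                      # a common bandwidth heuristic
W = np.exp(-d2 / eps)

# Row-normalize to a Markov (diffusion) matrix.
P = W / W.sum(axis=1, keepdims=True)

# Eigenvectors of P supply multiscale diffusion coordinates.
vals, vecs = np.linalg.eig(P)
order = np.argsort(-vals.real)
vals, vecs = vals.real[order], vecs.real[:, order]

# The second eigenvector already separates the two data regions by sign.
coord = vecs[:, 1]
print(coord[:30].mean(), coord[30:].mean())
```

Powers of P (diffusion at different time scales) give the multiscale structure; the same normalization applied to the transposed data relates features rather than regions.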

Title: The troublesome kernel: instabilities in deep learning for inverse problems; zoom link @ https://sites.google.com/view/minds-seminar/home

Date: 06/04/2020

Time: 2:30 PM - 3:30 PM

Place:

(Part of One World MINDS seminar:
https://sites.google.com/view/minds-seminar/home)
Due to their stunning success in traditional machine learning applications such as classification, techniques based on deep learning have recently begun to be actively investigated for problems in computational science and engineering. One of the key areas at the forefront of this trend is inverse problems, and specifically, inverse problems in imaging. The last few years have witnessed the emergence of many neural network-based algorithms for important imaging modalities such as MRI and X-ray CT. These claim to achieve competitive, and sometimes even superior, performance to current state-of-the-art techniques.
However, there is a problem. Techniques based on deep learning are typically unstable. For example, small perturbations in the data can lead to a myriad of artifacts in the recovered images. Such artifacts can be hard to dismiss as obviously unphysical, meaning that this phenomenon has potentially serious consequences for the safe deployment of deep learning in practice. In this talk, I will first showcase the instability phenomenon empirically in a range of examples. I will then focus on its mathematical underpinnings, the consequences of these insights for potential remedies, and the future possibilities for computing genuinely stable neural networks for inverse problems in imaging.
This is joint work with Vegard Antun, Nina M. Gottschling, Anders C. Hansen, Clarice Poon, and Francesco Renna.
Papers:
https://www.pnas.org/content/early/2020/05/08/1907377117
https://arxiv.org/abs/2001.01258
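The "kernel" of the title is the nullspace of the forward operator. A minimal numpy sketch of why it is troublesome (an illustration with a toy random operator, not the paper's experiments): in an undersampled inverse problem, two very different signals can produce identical measurements, so any reconstruction map, learned or not, must err badly on at least one of them.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 20, 50                      # undersampled: fewer measurements than unknowns
A = rng.normal(size=(m, n))        # toy forward operator

# A basis for the nullspace ("the kernel") of A via the SVD.
_, _, Vt = np.linalg.svd(A)
null_basis = Vt[m:]                # orthonormal rows spanning ker(A)

x = rng.normal(size=n)             # a ground-truth signal
v = 10.0 * null_basis[0]           # a large perturbation invisible to A
x_alt = x + v

# Two very different signals, numerically identical data.
print(np.linalg.norm(A @ x - A @ x_alt))   # ~0
print(np.linalg.norm(x - x_alt))           # large
```

Instability arises when a network is trained to "invert" such an operator: components in or near the kernel are unconstrained by the data, so small measurement perturbations can trigger large reconstruction changes.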

Title: Optimal terminal dimensionality reduction in Euclidean space; zoom link @ https://sites.google.com/view/minds-seminar/home

Date: 06/11/2020

Time: 2:30 PM - 3:30 PM

Place:

(Part of One World MINDS seminar:
https://sites.google.com/view/minds-seminar/home)
The Johnson-Lindenstrauss lemma states that for any subset X of R^d with |X| = n and any epsilon in (0, 1), there exists a map f: X --> R^m with m = O(log n / epsilon^2) such that for all x, y in X, (1-epsilon)|x - y|_2 <= |f(x) - f(y)|_2 <= (1+epsilon)|x - y|_2. We show that this statement can be strengthened. In particular, the above claim holds true even if "for all y in X" is replaced with "for all y in R^d". Joint work with Shyam Narayanan.
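As a baseline, here is the classical (non-terminal) guarantee checked numerically with a random Gaussian projection; the point counts and target dimension are assumptions of the sketch. Note that this linear map only controls distances between pairs within X; the talk's terminal embeddings are non-linear maps that extend the guarantee to every y in R^d.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, eps = 20, 1000, 0.5
m = 200                                # target dimension, O(log n / eps^2)

X = rng.normal(size=(n, d))            # n points in R^d
G = rng.normal(size=(m, d))            # random Gaussian matrix
Y = X @ G.T / np.sqrt(m)               # the JL map f(x) = Gx / sqrt(m)

# Check the distortion of every pairwise distance within X.
worst = 0.0
for i in range(n):
    for j in range(i + 1, n):
        ratio = np.linalg.norm(Y[i] - Y[j]) / np.linalg.norm(X[i] - X[j])
        worst = max(worst, abs(ratio - 1.0))
print(worst)
```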

Title: Understanding Deep Neural Networks: From Generalization to Interpretability; zoom link @ https://sites.google.com/view/minds-seminar/home

Date: 06/18/2020

Time: 2:30 PM - 3:30 PM

Place:

(Part of One World MINDS seminar:
https://sites.google.com/view/minds-seminar/home)
Deep neural networks have recently seen an impressive comeback, with applications in both the public sector and the sciences. However, despite their outstanding success, a comprehensive theoretical foundation of deep neural networks is still missing.
One main goal in deriving a theoretical understanding of deep neural networks is to analyze their generalization ability, i.e., their performance on unseen data. For graph convolutional neural networks, which are today heavily used, for instance, in recommender systems, the generalization capability to signals on graphs unseen in the training set, typically termed transferability, had not been rigorously analyzed. In this talk, we will prove that spectral graph convolutional neural networks are indeed transferable, thereby also debunking a common misconception about this type of network.
If such theoretical approaches fail, or if one is given a trained neural network without knowledge of how it was trained, interpretability approaches become necessary. These aim to "break open the black box" by identifying the features of the input that are most relevant for the observed output. Aiming at a theoretically founded approach to this problem, we introduced a novel method based on rate-distortion theory, termed Rate-Distortion Explanation (RDE), which not only provides state-of-the-art explanations but also allows first theoretical insights into the complexity of such problems. In this talk we will discuss this approach and show that it also gives a precise mathematical meaning to the previously vague notion of the relevant parts of the input.
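A drastically simplified sketch of the rate-distortion idea behind RDE (the toy "network", feature count, and brute-force subset search are all assumptions of this illustration, not the RDE implementation): a feature set is relevant if fixing it and randomizing everything else leaves the output essentially unchanged, and the explanation is the smallest such set.

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(0)

def f(x):
    # A toy "network": only features 0 and 2 actually matter.
    return 3.0 * x[0] - 2.0 * x[2]

x = np.array([1.0, -0.5, 2.0, 0.3])     # the input to be explained
y = f(x)

def distortion(keep, trials=2000):
    """Expected squared output change when features outside `keep`
    are replaced by random values."""
    total = 0.0
    for _ in range(trials):
        z = rng.normal(size=4)
        z[list(keep)] = x[list(keep)]
        total += (f(z) - y) ** 2
    return total / trials

# Smallest feature set whose distortion is (near) zero.
best = None
for k in range(5):
    for S in combinations(range(4), k):
        if distortion(S) < 1e-6:
            best = S
            break
    if best is not None:
        break
print(best)
```

The exhaustive search makes the combinatorial hardness alluded to in the talk visible: the number of candidate sets grows exponentially in the input dimension, which is why RDE's complexity-theoretic analysis and practical relaxations matter.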

Title: Mad Max: Affine Spline Insights into Deep Learning; zoom link @ https://sites.google.com/view/minds-seminar/home

Date: 06/25/2020

Time: 2:30 PM - 3:30 PM

Place:

(Part of the One World MINDS series: https://sites.google.com/view/minds-seminar/home)
We build a rigorous bridge between deep networks (DNs) and approximation theory via spline functions and operators. Our key result is that a large class of DNs can be written as a composition of max-affine spline operators (MASOs), which provide a powerful portal through which to view and analyze their inner workings. For instance, conditioned on the input signal, the output of a MASO DN can be written as a simple affine transformation of the input. This implies that a DN constructs a set of signal-dependent, class-specific templates against which the signal is compared via a simple inner product; we explore the links to the classical theory of optimal classification via matched filters and the effects of data memorization. Going further, we propose a simple penalty term that can be added to the cost function of any DN learning algorithm to force the templates to be orthogonal to each other; this leads to significantly improved classification performance and reduced overfitting with no change to the DN architecture. The spline partition of the input signal space that is implicitly induced by a MASO directly links DNs to the theory of vector quantization (VQ) and K-means clustering, which opens up a new geometric avenue for studying how DNs organize signals in a hierarchical fashion. To validate the utility of the VQ interpretation, we develop and validate a new distance metric for signals and images that quantifies the difference between their VQ encodings.
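The "affine conditioned on the input" claim can be checked directly for a ReLU network, which is in the MASO class. A minimal sketch (toy layer sizes are an assumption): freeze the ReLU activation pattern at an input x, collapse the network to a single affine map A x + c, and confirm it reproduces the network's output at x.

```python
import numpy as np

rng = np.random.default_rng(0)
# A small ReLU network: one hidden layer, then a linear readout.
W1, b1 = rng.normal(size=(8, 4)), rng.normal(size=8)
W2, b2 = rng.normal(size=(3, 8)), rng.normal(size=3)

def net(x):
    return W2 @ np.maximum(W1 @ x + b1, 0.0) + b2

x = rng.normal(size=4)

# Conditioned on x, the ReLU acts as a fixed 0/1 diagonal matrix, so the
# whole network collapses to one affine map A x + c.
D = np.diag((W1 @ x + b1 > 0).astype(float))
A = W2 @ D @ W1
c = W2 @ D @ b1 + b2

print(np.allclose(net(x), A @ x + c))   # True
```

The rows of A are exactly the signal-dependent templates of the abstract: each output coordinate is a template's inner product with x, plus a bias.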

Title: Beyond Sparsity: Non-Linear Harmonic Analysis with Phase for Deep Networks; zoom link @ https://sites.google.com/view/minds-seminar/home

Date: 07/02/2020

Time: 2:30 PM - 3:30 PM

Place:

(Part of One World MINDS seminar:
https://sites.google.com/view/minds-seminar/home)
Understanding the properties of deep neural networks is not just a matter of applying standard harmonic analysis tools with a bit of optimization. It is shaking our understanding of non-linear harmonic analysis and opening new horizons. By considering image generation and classification problems of different complexities, I will show that sparsity is not always the answer and that phase plays an important role in capturing important structures, including symmetries, within multiscale representations. This talk will raise more questions than answers.
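A classical one-dimensional demonstration of the importance of phase (an illustration of the general theme, not the speaker's experiments; the impulse signal and noise magnitude spectrum are assumptions of the sketch): combine the magnitude spectrum of one signal with the phase spectrum of another, and the reconstruction inherits the *structure* of the phase donor.

```python
import numpy as np

rng = np.random.default_rng(0)
N, p = 64, 17
a = np.zeros(N); a[p] = 1.0        # signal A: an impulse (structure located at p)
b = rng.normal(size=N)             # signal B: unrelated noise

# Combine B's magnitude spectrum with A's phase spectrum.
mag_b = np.abs(np.fft.fft(b))
phase_a = np.angle(np.fft.fft(a))
recon = np.fft.ifft(mag_b * np.exp(1j * phase_a)).real

# The reconstruction peaks at p: phase alone carried the location.
print(int(np.argmax(np.abs(recon))))   # 17
```

The magnitude spectrum (what sparsity-based descriptions typically summarize) fixed none of the geometry; the phase determined where the structure sits, which is the kind of information multiscale phase representations aim to capture.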

Title: Convergence of gradient flows for learning deep linear neural networks; zoom link @ https://sites.google.com/view/minds-seminar/home

Date: 07/09/2020

Time: 2:30 PM - 3:30 PM

Place:

(Part of One World MINDS seminar:
https://sites.google.com/view/minds-seminar/home)
Learning neural networks amounts to minimizing a loss function over given training data. Often gradient descent algorithms are used for this task, but their convergence properties are not yet well understood. In order to make progress, we consider the simplified setting of linear networks optimized via gradient flows. We show that such a gradient flow, defined with respect to the layers (factors), can in certain cases be reinterpreted as a Riemannian gradient flow on the manifold of rank-r matrices. The gradient flow always converges to a critical point of the underlying loss functional and, for almost all initializations, it converges to a global minimum on the manifold of rank-k matrices for some k.
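A minimal discretized sketch of the setting (small step size standing in for the continuous flow; the two-layer size, initialization scale, and step count are assumptions of the illustration): a product W2 W1 trained by gradient descent on a squared loss, with the flow driving the loss down toward a critical point.

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.normal(size=(3, 3))                       # target matrix to fit
W1 = np.eye(3) + 0.1 * rng.normal(size=(3, 3))    # layer 1 (factor)
W2 = np.eye(3) + 0.1 * rng.normal(size=(3, 3))    # layer 2 (factor)

def loss():
    return np.linalg.norm(W2 @ W1 - M) ** 2

losses = [loss()]
lr = 0.01
for _ in range(5000):                  # small steps approximate the gradient flow
    R = W2 @ W1 - M                    # residual
    g1 = 2.0 * W2.T @ R                # dL/dW1
    g2 = 2.0 * R @ W1.T                # dL/dW2
    W1 -= lr * g1
    W2 -= lr * g2
    losses.append(loss())

print(losses[0], losses[-1])
```

Note that the gradients act on the factors W1, W2, not on the product directly; the talk's result is that the induced dynamics of the product can nevertheless be read as a Riemannian gradient flow on a fixed-rank matrix manifold.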

Title: Multi-Domain Data Integration: From Observations to Mechanistic Insights; zoom link @ https://sites.google.com/view/minds-seminar/home

Date: 07/16/2020

Time: 2:30 PM - 3:30 PM

Place:

(Part of One World MINDS seminar: https://sites.google.com/view/minds-seminar/home)
Massive data collection holds the promise of a better understanding of complex phenomena and, ultimately, of better decisions. An exciting opportunity in this regard stems from the growing availability of perturbation/intervention data (manufacturing, advertisement, education, genomics, etc.). In order to obtain mechanistic insights from such data, a major challenge is the integration of different data modalities (video, audio, interventional, observational, etc.). Using genomics, and in particular the problem of identifying drugs for repurposing against COVID-19, as an example, I will first discuss our recent work on coupling autoencoders in the latent space to integrate and translate between data of very different modalities such as sequencing and imaging. I will then present a framework for integrating observational and interventional data for causal structure discovery and characterize the causal relationships that are identifiable from such data. We end with a theoretical analysis of autoencoders linking overparameterization to memorization. In particular, I will characterize the implicit bias of overparameterized autoencoders and show that such networks trained using standard optimization methods implement associative memory. Collectively, our results have major implications for planning and learning from interventions in various application domains.

Speaker: Tamara Kolda, Sandia National Laboratories

Title: Practical Leverage-Based Sampling for Low-Rank Tensor Decomposition; zoom link @ https://sites.google.com/view/minds-seminar/home

Date: 07/23/2020

Time: 2:30 PM - 3:30 PM

Place:

(Part of One World MINDS seminar:
https://sites.google.com/view/minds-seminar/home)
Conventional algorithms for finding low-rank canonical polyadic (CP) tensor decompositions are unwieldy for large sparse tensors. The CP decomposition can be computed by solving a sequence of overdetermined least squares problems with special Khatri-Rao structure. In this work, we present an application of randomized algorithms to fitting the CP decomposition of sparse tensors, solving a significantly smaller sampled least squares problem at each iteration with probabilistic guarantees on the approximation errors. Prior work has shown that sketching is effective in the dense case, but the prior approach cannot be applied to the sparse case because a fast Johnson-Lindenstrauss transform (e.g., using a fast Fourier transform) must be applied in each mode, causing the sparse tensor to become dense. Instead, we perform sketching through leverage score sampling, crucially relying on the fact that the structure of the Khatri-Rao product allows sampling from overestimates of the leverage scores without forming the full product or the corresponding probabilities. Naive application of leverage score sampling is ineffective because we often have cases where a few scores are quite large, so we propose a novel hybrid of deterministic and random leverage score sampling which consistently yields improved fits. Numerical results on real-world large-scale tensors show the method is significantly faster than competing methods without sacrificing accuracy. This is joint work with Brett Larsen, Stanford University.
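The core subroutine, leverage score sampling for an overdetermined least squares problem, can be sketched in a few lines (a generic tall matrix stands in for the Khatri-Rao structured one, and the sample size is an assumption of the illustration): compute leverage scores from the orthonormal factor, sample and reweight rows, and solve the much smaller problem.

```python
import numpy as np

rng = np.random.default_rng(0)
m, d, s = 2000, 5, 100
A = rng.normal(size=(m, d))            # tall design matrix (stand-in for Khatri-Rao)
x_true = rng.normal(size=d)
b = A @ x_true + 0.01 * rng.normal(size=m)

x_full = np.linalg.lstsq(A, b, rcond=None)[0]   # full least squares solution

# Exact leverage scores: squared row norms of the orthonormal factor Q.
Q, _ = np.linalg.qr(A)
lev = (Q ** 2).sum(axis=1)
probs = lev / lev.sum()                         # sampling probabilities

# Sample rows proportionally to leverage, reweight, solve the small problem.
idx = rng.choice(m, size=s, p=probs)
w = 1.0 / np.sqrt(s * probs[idx])
x_samp = np.linalg.lstsq(w[:, None] * A[idx], w * b[idx], rcond=None)[0]

print(np.linalg.norm(x_samp - x_full) / np.linalg.norm(x_full))  # small
```

The paper's contribution goes beyond this sketch in two ways: the Khatri-Rao structure lets one sample from overestimates of the scores without ever forming Q for the full product, and a deterministic/random hybrid handles the heavy-hitter scores.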

Title: Sparse Recovery for Orthogonal Polynomial Transforms; zoom link @ https://sites.google.com/view/minds-seminar/home

Date: 07/30/2020

Time: 2:30 PM - 3:30 PM

Place:

(Part of One World MINDS seminar:
https://sites.google.com/view/minds-seminar/home)
Consider the following sparse recovery problem. We have query access to a vector x in R^N such that \hat{x} = Fx is k-sparse (or nearly k-sparse) for some orthogonal transform F. The goal is to output an approximation to \hat{x} in sublinear time. This problem has been well-studied in the special case that F is the Discrete Fourier Transform (DFT), and a long line of work has resulted in sparse Fast Fourier Transforms that run in time O(k polylog(N)). However, for transforms F other than the DFT (or closely related transforms like the Discrete Cosine Transform), the question is much less settled.
In this talk I'll describe sublinear-time algorithms---running in time poly(k log N)---for solving the sparse recovery problem for orthogonal transforms F that arise from orthogonal polynomials. More precisely, our algorithm works for any F that is an orthogonal polynomial transform derived from Jacobi polynomials. The Jacobi polynomials are a large class of classical orthogonal polynomials (and include Chebyshev and Legendre polynomials as special cases), and show up extensively in applications like numerical analysis and signal processing. One caveat of our work is that we require an assumption on the sparsity structure of the sparse vector, although we note that vectors with random support have this property with high probability.
Our approach is to give a very general reduction from the k-sparse sparse recovery problem to the 1-sparse sparse recovery problem, which works for any "flat" orthogonal polynomial transform.
This is joint work with Anna Gilbert, Albert Gu, Chris Re, and Atri Rudra.
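To make the problem setup concrete, here is a numpy sketch of a Chebyshev polynomial transform (a Jacobi special case, realized as an orthonormal DCT-II matrix) applied to a vector whose transform is k-sparse on a random support. The sizes are assumptions of the illustration; note the sanity check below recovers \hat{x} with a full O(N^2) transform, whereas the talk's algorithms use only poly(k log N) queries.

```python
import numpy as np

N, k = 256, 5
# Orthonormal Chebyshev (DCT-II) transform: row j samples cos(j * arccos t)
# at the Chebyshev nodes t_n = cos(pi * (n + 1/2) / N).
n = np.arange(N)
F = np.sqrt(2.0 / N) * np.cos(np.pi * np.outer(n, n + 0.5) / N)
F[0] /= np.sqrt(2.0)

rng = np.random.default_rng(0)
xhat = np.zeros(N)
support = rng.choice(N, size=k, replace=False)   # random sparse support
xhat[support] = rng.normal(size=k)

x = F.T @ xhat        # the vector we get query access to: \hat{x} = F x is k-sparse

# Sanity checks: F is orthogonal, and the full transform recovers xhat.
print(np.allclose(F @ F.T, np.eye(N)), np.allclose(F @ x, xhat))
```

The random support matches the caveat in the abstract: the sublinear algorithm assumes a sparsity structure that random supports satisfy with high probability.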