Talk_id  Date  Speaker  Title 
26835

Thursday 1/7 2:30 PM

Jose Perea, Michigan State University

TBA; zoom link @ https://sites.google.com/view/mindsseminar/home
01/07/2021, 2:30 PM – 3:30 PM, online (virtual meeting)
Organizer: Olga Turanova (turanova@msu.edu)
No abstract available.

26848

Thursday 1/14 3:30 PM

Raymond Chan, City University of Hong Kong

TBA; zoom link @ https://sites.google.com/view/mindsseminar/home
01/14/2021, 3:30 PM – 4:30 PM, online (virtual meeting)
Organizer: Olga Turanova (turanova@msu.edu)
TBA. (Note the unusual time: 4:30 pm in Shanghai, 10:30 am in Paris.)

26961

Thursday 1/21 2:30 PM

Daniel Kane, University of California, San Diego

Point Location and Active Learning
01/21/2021, 2:30 PM – 3:30 PM, online (virtual meeting)
Organizer: Olga Turanova (turanova@msu.edu)
In the point location problem, one is given a hyperplane arrangement and an unknown point, and by making linear queries about that point one wants to determine which cell of the arrangement it lies in. This problem has an unexpected connection to the machine-learning problem of actively learning a halfspace. We discuss these problems and their relationship, and provide a new and nearly optimal algorithm for solving them.
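As a minimal illustration of the problem setup (the naive baseline, not the talk's near-optimal algorithm), the sketch below labels the cell of a random hyperplane arrangement by spending one linear query per hyperplane; the oracle `linear_query` is a hypothetical stand-in that reveals only the sign of a linear measurement of the hidden point.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hyperplane arrangement: m hyperplanes {x : <a_i, x> = b_i} in R^d.
m, d = 20, 3
A = rng.standard_normal((m, d))
b = rng.standard_normal(m)

x = rng.standard_normal(d)  # the unknown point (hidden behind the oracle)

def linear_query(v, c):
    """Oracle: reveals only the sign of <v, x> - c."""
    return np.sign(v @ x - c)

# Naive point location: one linear query per hyperplane recovers the
# sign vector that labels the cell of the arrangement containing x.
cell = np.array([linear_query(A[i], b[i]) for i in range(m)])
```

The sign vector `cell` identifies the cell; the talk's algorithm achieves the same with far fewer queries.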

26962

Thursday 1/28 3:30 AM

Zuowei Shen, National University of Singapore

Deep Approximation via Deep Learning
01/28/2021, 3:30 AM – 4:30 AM, online (virtual meeting)
Organizer: Olga Turanova (turanova@msu.edu)
The primary task of many applications is approximating or estimating a function from samples drawn from a probability distribution on the input space. Deep approximation approximates a function by compositions of many layers of simple functions, which can be viewed as a series of nested feature extractors. The key idea of a deep learning network is to convert the layers of the composition into layers of tunable parameters that are adjusted through a learning process, so that the network achieves a good approximation with respect to the input data. In this talk, we discuss the mathematical theory behind this new approach and the approximation rates of deep networks; how this approach differs from classical approximation theory; and how the new theory can be used to understand and design deep learning networks.
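A toy illustration of "composition of simple functions": a single layer of the form x -> relu(Wx + c) already represents |x| exactly via |x| = relu(x) + relu(-x), and composing such layers builds richer approximants. This is only a sketch of the approximation-class viewpoint, not the talk's theory.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

# One simple layer: x -> relu(W x); composing such layers gives the
# "deep approximation" class.  Here a single hidden layer represents
# |x| exactly:  |x| = relu(x) + relu(-x).
W1 = np.array([[1.0], [-1.0]])   # hidden-layer weights
w2 = np.array([1.0, 1.0])        # output weights

def net(x):
    return w2 @ relu(W1 @ np.atleast_1d(x))

xs = np.linspace(-2.0, 2.0, 101)
err = max(abs(net(t) - abs(t)) for t in xs)
```

Here `err` is zero up to floating-point rounding, since the representation is exact rather than approximate.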

26963

Thursday 2/4 2:30 PM

Tino Ullrich, TU Chemnitz

A New Subsampling Technique for Random Points and Optimal Least Squares Approximation of High-Dimensional Functions
02/04/2021, 2:30 PM – 3:30 PM, online (virtual meeting)
Organizer: Olga Turanova (turanova@msu.edu)
We provide a new general upper bound for the minimal L_2 worst-case recovery error in the framework of reproducing kernel Hilbert spaces (RKHS), where only n function samples are allowed. This quantity can be bounded in terms of the singular numbers of the compact embedding into the space of square-integrable functions. It turns out that in many relevant situations this quantity is asymptotically worse only by a factor of the square root of log(n) compared to the singular numbers. The algorithm that realizes this behavior is a weighted least squares algorithm based on a specific set of sampling nodes that works for the whole class of functions simultaneously. These points are constructed from a random draw with respect to a distribution tailored to the spectral properties of the reproducing kernel (importance sampling), in combination with a subsampling procedure coming from the celebrated proof of Weaver's conjecture, which was shown to be equivalent to the Kadison-Singer problem. For the above multivariate setting, it is still a fundamental open problem whether sampling algorithms are as powerful as algorithms allowing general linear information, such as Fourier or wavelet coefficients. However, the gap is now rather small. As a consequence, we may study well-known scenarios where it was widely believed that sparse-grid sampling recovery methods perform optimally. It turns out that this is not the case for dimensions d > 2.
This is joint work with N. Nagel and M. Schaefer from TU Chemnitz.
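A stripped-down sketch of least squares recovery from random sampling nodes: a trigonometric polynomial is recovered exactly from a modestly oversampled uniform random draw. This is illustrative only; the talk's method additionally uses importance sampling tailored to the kernel's spectral data and a Weaver/Kadison-Singer-type subsampling of the draw, neither of which is attempted here.

```python
import numpy as np

rng = np.random.default_rng(1)

# Recover a function in span{e^{ikx} : |k| <= deg} from random nodes
# by (plain, unweighted) least squares.
deg = 5
n = 4 * (2 * deg + 1)                 # oversampled random draw
x = rng.uniform(0.0, 2.0 * np.pi, n)  # uniform sampling nodes

ks = np.arange(-deg, deg + 1)
V = np.exp(1j * np.outer(x, ks))      # sampling matrix (node x frequency)

f = lambda t: np.cos(3 * t) + 0.5 * np.sin(t)   # lies in the span
coef, *_ = np.linalg.lstsq(V, f(x), rcond=None)

t = np.linspace(0.0, 2.0 * np.pi, 200)
recon = np.exp(1j * np.outer(t, ks)) @ coef
max_err = np.max(np.abs(recon - f(t)))
```

Since the target lies in the span and the random sampling matrix has full column rank, the least squares fit reproduces it to machine precision; the talk's contribution concerns how few (and which) nodes suffice for a whole function class.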

26964

Thursday 2/11 2:30 PM

Massimo Fornasier, Technische Universität München

Consensus-based Optimization on the Sphere
02/11/2021, 2:30 PM – 3:30 PM, online (virtual meeting)
Organizer: Olga Turanova (turanova@msu.edu)
I present new stochastic multi-particle models for the global optimization of nonconvex functions on the sphere. These models belong to the class of Consensus-Based Optimization methods. Particles move over the manifold driven by a drift towards an instantaneous consensus point, computed as a combination of the particle locations weighted by the cost function according to Laplace's principle; the consensus point represents an approximation to a global minimizer. The dynamics is further perturbed by a random vector field that favors exploration, whose variance is a function of the distance of the particles to the consensus point. In particular, as soon as consensus is reached, the stochastic component vanishes. In the first part of the talk, I present the well-posedness of the model on the sphere and rigorously derive its mean-field approximation in the large-particle limit.
In the second part, I address the proof of convergence of numerical schemes to global minimizers under well-preparation conditions on the initial datum. The proof combines the mean-field limit with a novel asymptotic analysis and with classical convergence results for numerical methods for SDEs. We present several numerical experiments, which show that the proposed algorithm scales well with the dimension and is extremely versatile. To quantify the performance of the new approach, we show that the algorithm performs essentially as well as ad hoc state-of-the-art methods on challenging problems in signal processing and machine learning, namely phase retrieval and robust subspace detection.
Joint work with H. Huang, L. Pareschi, and P. Sünnen.
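The drift-plus-vanishing-noise dynamics described above can be sketched in a few lines. This is a generic toy discretization, not the talk's scheme: all constants (alpha, lambda, sigma, dt) and the test objective are assumptions of this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

# Consensus-based optimization on the unit sphere S^{d-1} (toy sketch).
# Minimize f(v) = ||v - e1||^2; the global minimizer is e1.
d, n_particles, alpha, lam, sigma, dt = 3, 200, 30.0, 1.0, 0.1, 0.1
e1 = np.eye(d)[0]
f = lambda V: np.sum((V - e1) ** 2, axis=1)

V = rng.standard_normal((n_particles, d))
V /= np.linalg.norm(V, axis=1, keepdims=True)

for _ in range(500):
    w = np.exp(-alpha * f(V))          # Laplace-principle weights
    v_cons = (w @ V) / w.sum()         # instantaneous consensus point
    diff = v_cons - V
    # Noise scaled by the distance to consensus: it vanishes at consensus.
    noise = sigma * np.sqrt(dt) * np.linalg.norm(diff, axis=1, keepdims=True) * rng.standard_normal((n_particles, d))
    V = V + lam * dt * diff + noise
    V /= np.linalg.norm(V, axis=1, keepdims=True)   # project back to the sphere

v_final = v_cons / np.linalg.norm(v_cons)
```

With these (assumed) parameters the consensus point ends up close to the minimizer e1; the talk's analysis makes this rigorous in the mean-field limit.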

26965

Thursday 2/18 2:30 PM

Mikhail Belkin, University of California, San Diego

A theory of optimization and transition to linearity in deep learning
02/18/2021, 2:30 PM – 3:30 PM, online (virtual meeting)
Organizer: Olga Turanova (turanova@msu.edu)
The success of deep learning is due, to a large extent, to the remarkable effectiveness of gradient-based optimization methods applied to large neural networks. In this talk I will discuss some general mathematical principles that allow for efficient optimization in over-parameterized nonlinear systems, a setting that includes deep neural networks. Remarkably, it seems that optimization of such systems is "easy". In particular, optimization problems corresponding to these systems are not convex, even locally, but instead satisfy the Polyak-Lojasiewicz (PL) condition locally, allowing for efficient optimization by gradient descent or SGD. We connect the PL condition of these systems to the condition number associated with the tangent kernel and develop a nonlinear theory parallel to classical analyses of over-parameterized linear equations.
In a related but conceptually separate development, I will discuss a new perspective on the remarkable recently discovered phenomenon of transition to linearity (constancy of the NTK) in certain classes of large neural networks. I will show how this transition to linearity results from the scaling of the Hessian with the size of the network.
Joint work with Chaoyue Liu and Libin Zhu.
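A one-dimensional illustration of the PL phenomenon (a standard textbook example, not the talk's neural-network setting): f(x) = x^2 + 3 sin^2(x) is nonconvex but satisfies the PL inequality, so plain gradient descent still converges to the global minimum f* = 0.

```python
import math

# f is nonconvex (its second derivative 2 + 6*cos(2x) changes sign) but
# satisfies the Polyak-Lojasiewicz condition f'(x)^2 >= 2*mu*(f(x) - f*),
# which is enough for linear convergence of gradient descent.
f = lambda x: x * x + 3.0 * math.sin(x) ** 2
grad = lambda x: 2.0 * x + 3.0 * math.sin(2.0 * x)

x = 2.0
for _ in range(300):
    x -= 0.1 * grad(x)   # |f''| <= 8, so any step below 0.25 is safe
```

After a few hundred steps the iterate is at the global minimizer x = 0 to high precision, despite the lack of convexity.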

26983

Thursday 2/25 3:30 AM

Zaiwen Wen, Peking University, China

Stochastic Second-Order Methods for Deep Learning
02/25/2021, 3:30 AM – 4:30 AM, online (virtual meeting)
Organizer: Olga Turanova (turanova@msu.edu)
Stochastic methods are widely used in deep learning. In this talk, we first review state-of-the-art methods such as K-FAC. Then we present a structured stochastic quasi-Newton method and a sketchy empirical natural gradient method. Numerical results on deep convolutional networks illustrate that our methods are competitive with SGD and K-FAC.

26984

Thursday 3/4 2:30 PM

Ronald DeVore, Texas A&M University

Deep Learning and Neural Networks: The Mathematical View
03/04/2021, 2:30 PM – 3:30 PM, online (virtual meeting)
Organizer: Olga Turanova (turanova@msu.edu)
Deep Learning is much publicized and has had great empirical success on challenging learning problems. Yet there are no quantifiable proofs of performance or certified guarantees for these methods. This talk will give an overview of Deep Learning from the viewpoint of mathematics and numerical computation.

27001

Thursday 3/11 3:30 AM

Jianfeng Cai, Hong Kong University of Science and Technology

Landscape analysis of nonconvex optimizations in phase retrieval
03/11/2021, 3:30 AM – 4:30 AM, online (virtual meeting)
Organizer: Olga Turanova (turanova@msu.edu)
Nonconvex optimization is a ubiquitous tool in scientific and engineering research. For many important problems, simple nonconvex optimization algorithms often provide good solutions efficiently and effectively, despite possible local minima. One way to explain the success of these algorithms is through global landscape analysis. In this talk, we present some results in this direction for phase retrieval. The main results are that, for several nonconvex optimization formulations of phase retrieval, any local minimum is also global, and all other critical points have a direction of negative curvature. These results not only explain why simple nonconvex algorithms usually find a global minimizer for phase retrieval, but are also useful for developing new efficient algorithms with theoretical guarantees, by applying methods that are guaranteed to find a local minimum.
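A minimal numerical instance of the phenomenon, under assumptions of this sketch (real-valued signals, Gaussian measurements, plain gradient descent on the intensity least squares loss): despite the nonconvex objective, a randomly initialized descent reaches a global minimizer, up to the inherent sign ambiguity z ~ -z.

```python
import numpy as np

rng = np.random.default_rng(0)

# Phase retrieval via gradient descent on
#   f(z) = (1/4m) * sum_i ((a_i^T z)^2 - y_i)^2,   y_i = (a_i^T x)^2.
n, m = 5, 200
x = rng.standard_normal(n)
x /= np.linalg.norm(x)              # ground truth, ||x|| = 1
A = rng.standard_normal((m, n))     # Gaussian measurement vectors
y = (A @ x) ** 2                    # intensity-only measurements

def grad(z):
    r = (A @ z) ** 2 - y
    return A.T @ (r * (A @ z)) / m

z = 0.5 * rng.standard_normal(n)    # random initialization
for _ in range(2000):
    z -= 0.05 * grad(z)

dist = min(np.linalg.norm(z - x), np.linalg.norm(z + x))
```

With many more measurements than unknowns (m >> n), the landscape results quoted in the abstract predict exactly this behavior: no spurious local minima to trap the iteration.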

27002

Thursday 3/18 2:30 PM

Roberto Imbuzeiro Oliveira, IMPA, Rio de Janeiro

Sample average approximation with heavier tails
03/18/2021, 2:30 PM – 3:30 PM, online (virtual meeting)
Organizer: Olga Turanova (turanova@msu.edu)
Consider an "ideal" optimization problem where constraints and objective function are defined in terms of expectations over some distribution P. The sample average approximation (SAA)  a fundamental idea in stochastic optimization  consists of replacing the expectations by an average over a sample from P. A key question is how much the solutions of the SAA differ from those of the original problem. Results by Shapiro from many years ago consider what happens asymptotically when the sample size diverges, especially when the solution of the ideal problem lies on the boundary of the feasible set. In joint work with Philip Thompson (Purdue), we consider what happens with finite samples. As we will see, our results improve upon the nonasymptotic state of the art in various ways: we allow for heavier tails, unbounded feasible sets, and obtain bounds that (in favorable cases) only depend on the geometry of the feasible set in a small neighborhood of the optimal solution. Our results combine "localization" and "fixedpoint" type arguments inpired by the work of Mendelson with chainingtype inequalities. One of our contributions is showing what can be said when the SAA constraints are random.

27003

Thursday 3/25 2:30 PM

Rachel Ward, University of Texas at Austin

Function Approximation via Sparse Random Features
03/25/2021, 2:30 PM – 3:30 PM, online (virtual meeting)
Organizer: Olga Turanova (turanova@msu.edu)
Random feature methods have been successful in various machine learning tasks, are easy to compute, and come with theoretical accuracy bounds. They serve as an alternative to standard neural networks, since they can represent similar function spaces without a costly training phase. However, for accuracy, random feature methods require more measurements than trainable parameters, limiting their use in data-scarce applications and in scientific machine learning. This talk introduces the sparse random feature method, which learns parsimonious random feature models using techniques from compressive sensing. We provide uniform bounds on the approximation error for functions in a reproducing kernel Hilbert space depending on the number of samples and the distribution of features. The error bounds improve under additional structural conditions, such as coordinate sparsity, compact clusters of the spectrum, or rapid spectral decay. We show that the sparse random feature method outperforms shallow networks for well-structured functions, with applications to scientific machine learning tasks.
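A sketch of the compressive-sensing flavor of the method: fit random Fourier features with an l1-regularized least squares (plain ISTA), so that fewer samples than features still recover a parsimonious model. The target here is synthetic (built from two of the dictionary's own features) so that a sparse coefficient vector exists by construction; this is an assumption of the sketch, not the talk's setting.

```python
import numpy as np

rng = np.random.default_rng(0)

D, m = 200, 100                        # more random features than samples
w = rng.standard_normal(D)             # random frequencies
b = rng.uniform(0.0, 2.0 * np.pi, D)   # random phases
feats = lambda x: np.cos(np.outer(x, w) + b)   # random Fourier features

x_train = rng.uniform(-3.0, 3.0, m)
Phi = feats(x_train)
c_true = np.zeros(D)
c_true[[10, 50]] = [2.0, -1.0]         # synthetic 2-sparse target
y = Phi @ c_true

# ISTA (proximal gradient) for  min_c 0.5*||Phi c - y||^2 + lam*||c||_1
step = 1.0 / np.linalg.norm(Phi, 2) ** 2
lam, c = 0.05, np.zeros(D)
for _ in range(5000):
    c = c - step * Phi.T @ (Phi @ c - y)
    c = np.sign(c) * np.maximum(np.abs(c) - step * lam, 0.0)  # soft threshold

x_test = rng.uniform(-3.0, 3.0, 500)
rel_err = np.linalg.norm(feats(x_test) @ (c - c_true)) / np.linalg.norm(feats(x_test) @ c_true)
```

The fit generalizes to test points while keeping only a handful of active features, which is the "parsimonious random feature model" the abstract describes.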

27004

Thursday 4/1 2:30 PM

Yi Ma, University of California, Berkeley

Deep Networks from First Principles
04/01/2021, 2:30 PM – 3:30 PM, online (virtual meeting)
Organizer: Olga Turanova (turanova@msu.edu)
In this talk, we offer an entirely “white box” interpretation of deep (convolutional) networks from the perspective of data compression (and group invariance). In particular, we show how modern deep layered architectures, linear (convolutional) operators and nonlinear activations, and even all parameters can be derived from the principle of maximizing rate reduction (with group invariance). All layers, operators, and parameters of the network are explicitly constructed via forward propagation, instead of learned via back propagation. All components of the so-obtained network, called ReduNet, have precise optimization, geometric, and statistical interpretations. There are also several nice surprises from this principled approach: it reveals a fundamental tradeoff between invariance and sparsity for class separability; it reveals a fundamental connection between deep networks and the Fourier transform for group invariance, namely the computational advantage of the spectral domain (why spiking neurons?); and it clarifies the mathematical roles of forward propagation (optimization) and backward propagation (variation). In particular, the so-obtained ReduNet is amenable to fine-tuning via both forward and backward (stochastic) propagation, both optimizing the same objective.
This is joint work with students Yaodong Yu, Ryan Chan, and Haozhi Qi of Berkeley, Dr. Chong You, now at Google Research, and Professor John Wright of Columbia University.
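The rate reduction objective itself is easy to compute. The sketch below follows the coding-rate expressions used in the maximal-rate-reduction line of work (the exact constants and the value eps = 0.5 are assumptions of this sketch): well-separated classes yield a larger rate reduction than classes collapsed into the same subspace, which is what the network is built to maximize.

```python
import numpy as np

rng = np.random.default_rng(0)

# Coding rate of data Z in R^{d x m} at distortion eps, and the rate
# reduction dR = R(union) - weighted sum of per-class rates.
def R(Z, eps=0.5):
    d, m = Z.shape
    return 0.5 * np.linalg.slogdet(np.eye(d) + (d / (m * eps**2)) * Z @ Z.T)[1]

def rate_reduction(Zs):
    Z = np.hstack(Zs)
    m = Z.shape[1]
    return R(Z) - sum((Zj.shape[1] / m) * R(Zj) for Zj in Zs)

d, m_j = 10, 50
U = np.linalg.qr(rng.standard_normal((d, d)))[0]
sep = [U[:, :2] @ rng.standard_normal((2, m_j)),    # class 1: a 2-D subspace
       U[:, 2:4] @ rng.standard_normal((2, m_j))]   # class 2: orthogonal 2-D subspace
same = [U[:, :2] @ rng.standard_normal((2, m_j)),   # both classes collapsed
        U[:, :2] @ rng.standard_normal((2, m_j))]   # into one subspace
```

Evaluating `rate_reduction(sep)` versus `rate_reduction(same)` shows the objective rewarding between-class diversity while compressing each class, the tradeoff the abstract refers to.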

28028

Thursday 4/8 2:30 PM

Mahdi Soltanolkotabi, University of Southern California

Precise Tradeoffs for Adversarial Training
04/08/2021, 2:30 PM – 3:30 PM, online (virtual meeting)
Organizer: Olga Turanova (turanova@msu.edu)
Despite breakthrough performance, modern learning models are known to be highly vulnerable to small adversarial perturbations of their inputs. While a wide variety of recent adversarial training methods have been effective at improving robustness to perturbed inputs (robust accuracy), this benefit often comes with a decrease in accuracy on benign inputs (standard accuracy), leading to a tradeoff between often-competing objectives. Complicating matters further, recent empirical evidence suggests that a variety of other factors (size and quality of training data, model size, etc.) affect this tradeoff in somewhat surprising ways. In this talk we provide a precise and comprehensive understanding of the role of adversarial training in the context of linear regression with Gaussian features and binary classification in a mixture model. We precisely characterize the standard/robust accuracy and the corresponding tradeoff achieved by a contemporary minimax adversarial training approach in a high-dimensional regime where the number of data points and the parameters of the model grow in proportion to each other. Our theory for adversarial training algorithms also facilitates the rigorous study of how a variety of factors (size and quality of training data, model overparametrization, etc.) affect the tradeoff between these two competing accuracies.
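For linear regression with l_inf-bounded input perturbations, the inner maximization of adversarial training has a simple closed form, which makes the minimax problem concrete. The sketch below is a generic illustration of that closed form and its minimization by subgradient descent, not the paper's exact analysis; the data model and all constants are assumptions of the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

# For a linear model w and perturbation budget eps in l_inf:
#   max_{||delta||_inf <= eps} (y - w.(x + delta))^2
#     = (|y - w.x| + eps * ||w||_1)^2,
# so adversarial training minimizes the average of the right-hand side.
n, d, eps = 200, 5, 0.2
X = rng.standard_normal((n, d))
w_star = rng.standard_normal(d)
y = X @ w_star + 0.5 * rng.standard_normal(n)

def adv_loss(w):
    r = np.abs(y - X @ w) + eps * np.abs(w).sum()
    return np.mean(r ** 2)

# Subgradient descent on the (convex) adversarial loss.
w = np.zeros(d)
for _ in range(5000):
    r = y - X @ w
    a = np.abs(r) + eps * np.abs(w).sum()
    g = (2.0 / n) * (-X.T @ (a * np.sign(r)) + eps * np.sum(a) * np.sign(w))
    w -= 0.01 * g

w_ls = np.linalg.lstsq(X, y, rcond=None)[0]   # standard (non-robust) fit
```

Comparing `adv_loss(w)` with `adv_loss(w_ls)` exhibits the tradeoff in miniature: the robustly trained model does better on the adversarial objective than the ordinary least squares fit, typically at some cost in standard accuracy.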

28029

Thursday 4/15 2:30 PM

Michael Wakin, Colorado School of Mines

Spectral Properties of Time-limited Toeplitz Operators and Applications in Signal Processing
04/15/2021, 2:30 PM – 3:30 PM, online (virtual meeting)
Organizer: Olga Turanova (turanova@msu.edu)
Toeplitz operators are fundamental and ubiquitous in signal processing and information theory as models for convolutional (filtering) systems. Because any practical system can access only signals of finite duration, however, time-limited restrictions of Toeplitz operators are also of interest. In the discrete-time case, time-limited Toeplitz operators are simply Toeplitz matrices. In this talk we survey existing bounds and present new bounds on the eigenvalues (spectra) of time-limited Toeplitz operators, and we discuss applications of these results in various signal processing contexts. As a special case, we discuss time-frequency limiting operators, which alternately limit a signal in the time and frequency domains. Slepian functions arise as eigenfunctions of these operators, and we describe applications of Slepian functions in the spectral analysis of multiband signals, super-resolution SAR imaging, and blind beamforming in antenna arrays. This talk draws on joint work with numerous collaborators, including Zhihui Zhu of the University of Denver.
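The time-frequency limiting special case is easy to reproduce numerically: the discrete operator is the classical "prolate matrix", a Toeplitz matrix whose symbol is an ideal low-pass filter of bandwidth W. Its eigenvalues famously cluster near 1 and 0, with roughly 2NW of them near 1, and its eigenvectors are the Slepian (DPSS) sequences.

```python
import numpy as np

# Prolate matrix: B[m, n] = sin(2*pi*W*(m - n)) / (pi*(m - n)), with
# diagonal 2W.  np.sinc(x) = sin(pi x)/(pi x), so 2W*sinc(2W k) gives
# exactly sin(2 pi W k)/(pi k).
N, W = 64, 0.1
idx = np.arange(N)
diff = idx[:, None] - idx[None, :]
B = 2.0 * W * np.sinc(2.0 * W * diff)

eigvals = np.linalg.eigvalsh(B)          # spectrum lies in (0, 1)
near_one = int(np.sum(eigvals > 0.5))    # ~ 2NW eigenvalues cluster near 1
```

Here 2NW = 12.8, and `near_one` lands at that count up to the width of the "plunge region", the transition band whose size is one subject of the eigenvalue bounds mentioned in the abstract.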

28030

Thursday 4/22 4:30 AM

Shahar Mendelson, Australian National University

Approximating L_p balls via sampling
04/22/2021, 4:30 AM – 5:30 AM, online (virtual meeting)
Organizer: Olga Turanova (turanova@msu.edu)
Let X be a centred random vector in R^n. The L_p norms that X endows on R^n are defined by \|v\|_{L_p} = (E <X,v>^p)^{1/p}. The goal is to approximate these L_p norms, and the given data consist of N independent sample points X_1, ..., X_N distributed as X. More accurately, one would like to construct data-dependent functionals \phi_{p,\epsilon} which satisfy, with (very) high probability, that for every v in R^n,
(1 - \epsilon) \phi_{p,\epsilon} \leq E <X,v>^p \leq (1 + \epsilon) \phi_{p,\epsilon}.
I will show that the functionals \frac{1}{N} \sum_{j \in J} <X_j, v>^p are a good choice, where the set of indices J is obtained from \{1, ..., N\} by removing the c \epsilon^2 N largest values of <X_j, v>. Under mild assumptions on X, only N = c^p \epsilon^{-2} n measurements are required, and the probability that the functional performs well is at least 1 - 2\exp(-c \epsilon^2 N).
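The trimmed functional is simple to evaluate on data. Below is a minimal numerical sketch for a standard Gaussian X and p = 2, where the truth is E<X,v>^2 = ||v||^2 = 1; the 1% trimming fraction is an assumption of the sketch (the abstract's fraction is c*eps^2).

```python
import numpy as np

rng = np.random.default_rng(0)

# Trimmed-mean estimator of E <X, v>^p: average <X_j, v>^p after
# discarding the largest values.
n, N, p = 10, 10_000, 2
X = rng.standard_normal((N, n))
v = rng.standard_normal(n)
v /= np.linalg.norm(v)           # target value is exactly E<X,v>^2 = 1

vals = np.sort((X @ v) ** p)
k = int(0.01 * N)                # drop the k largest values
phi = vals[: N - k].mean()       # the data-dependent functional
```

The estimate lands close to 1 with a slight downward bias from the trimming; the talk's point is that this bias is controlled uniformly over all directions v, with exponentially high probability.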

28031

Thursday 4/29 1:00 PM

Anne Gelb, Dartmouth College

Empirical Bayesian Inference using Joint Sparsity
04/29/2021, 1:00 PM – 2:00 PM, online (virtual meeting)
Organizer: Olga Turanova (turanova@msu.edu)
We develop a new empirical Bayesian inference algorithm for solving a linear inverse problem given multiple measurement vectors (MMV) of undersampled and noisy observable data. Specifically, by exploiting the joint sparsity across the multiple measurements in the sparse domain of the underlying signal or image, we construct a new support-informed, sparsity-promoting prior. Several applications can be modeled using this framework. Our numerical experiments demonstrate that using this new prior not only improves the accuracy of the recovery, but also reduces the uncertainty in the posterior when compared to standard sparsity-promoting priors.
This is joint work with Theresa Scarnati, formerly of the Air Force Research Laboratory at Wright-Patterson and now at Qualis Corporation in Huntsville, AL, and Jack Zhang, a recent bachelor's degree recipient from Dartmouth College now enrolled in the University of Minnesota's PhD program in mathematics.
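The joint-sparsity idea behind the support-informed prior can be sketched in its simplest form: averaging magnitudes across the multiple measurement vectors makes the common support stand out from the noise. This toy uses direct noisy observations of the signal itself; the talk's setting uses undersampled linear measurements and a sparsifying transform, and then builds a weighted prior from the estimated support.

```python
import numpy as np

rng = np.random.default_rng(0)

# J noisy measurement vectors sharing one sparse signal x.
n, J, k = 100, 10, 5
x = np.zeros(n)
support = rng.choice(n, size=k, replace=False)
x[support] = rng.choice([-1.0, 1.0], size=k)

Y = x[:, None] + 0.3 * rng.standard_normal((n, J))  # MMV data

# Joint sparsity: the row-wise average magnitude across the J vectors
# separates the common support from pure-noise rows.
row_score = np.abs(Y).mean(axis=1)
est_support = np.where(row_score > 0.6)[0]
```

A single noisy vector would misclassify some entries at this noise level; averaging across the MMV suppresses the noise and recovers the support, which is then available to inform the prior.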
