Talk_id | Date & Time | Speaker | Title
26835 | Thursday, 01/07/2021, 2:30 PM - 3:30 PM | Jose Perea, Michigan State University | TBA; zoom link @ https://sites.google.com/view/minds-seminar/home
- Location: Online (virtual meeting)
- Contact: Olga Turanova (turanova@msu.edu)
No abstract available.

26848 | Thursday, 01/14/2021, 3:30 PM - 4:30 PM | Raymond Chan, City University of Hong Kong | TBA; zoom link @ https://sites.google.com/view/minds-seminar/home
- Location: Online (virtual meeting)
- Contact: Olga Turanova (turanova@msu.edu)
TBA; (Note the unusual time: 4:30pm Shanghai, 10:30am Paris.)

26961 | Thursday, 01/21/2021, 2:30 PM - 3:30 PM | Daniel Kane, University of California, San Diego | Point Location and Active Learning
- Location: Online (virtual meeting)
- Contact: Olga Turanova (turanova@msu.edu)
In the point location problem one is given a hyperplane arrangement and an unknown point. By making linear queries about that point one wants to determine which cell of the hyperplane arrangement it lies in. This problem has an unexpected connection to the problem in machine learning of actively learning a halfspace. We discuss these problems and their relationship and provide a new and nearly optimal algorithm for solving them.
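To make the query model concrete, here is a minimal sketch (my own illustration, not the speaker's algorithm) of the naive strategy that spends one linear query per hyperplane; the talk concerns algorithms that locate the cell with far fewer queries. All names below are hypothetical.

```python
import numpy as np

def locate_cell(hyperplanes, query_sign):
    """Naive point location: one linear query per hyperplane.

    hyperplanes: list of (w, b) pairs describing the hyperplane w.x = b.
    query_sign:  oracle returning sign(w.x - b) for the hidden point x.
    The resulting sign vector identifies the cell of the arrangement
    containing x; the talk is about using far fewer queries than this.
    """
    return tuple(query_sign(w, b) for (w, b) in hyperplanes)

# Illustrative run with a hidden point and three hyperplanes in R^2.
rng = np.random.default_rng(0)
x_hidden = rng.normal(size=2)
arrangement = [(rng.normal(size=2), rng.normal()) for _ in range(3)]
oracle = lambda w, b: int(np.sign(w @ x_hidden - b))
print(locate_cell(arrangement, oracle))
```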

26962 | Thursday, 01/28/2021, 3:30 AM - 4:30 AM | Zuowei Shen, National University of Singapore | Deep Approximation via Deep Learning
- Location: Online (virtual meeting)
- Contact: Olga Turanova (turanova@msu.edu)
The primary task of many applications is approximating or estimating a function from samples drawn from a probability distribution on the input space. Deep approximation approximates a function by composing many layers of simple functions, which can be viewed as a series of nested feature extractors. The key idea of a deep learning network is to convert the layers of compositions into layers of tunable parameters that can be adjusted through a learning process, so that the composition achieves a good approximation with respect to the input data. In this talk, we shall discuss the mathematical theory behind this new approach and the approximation rates of deep networks, how this approach differs from classical approximation theory, and how the new theory can be used to understand and design deep learning networks.
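As a concrete reading of "compositions of many layers of simple functions", here is a minimal sketch (my illustration only, with hypothetical layer widths); learning would tune the parameters of each layer so that the composition fits the sampled data.

```python
import numpy as np

def deep_approximation(x, params):
    """Compose simple ReLU layers (nested feature extractors) ending in a linear read-out."""
    *hidden, (W_last, b_last) = params
    for W, b in hidden:
        x = np.maximum(W @ x + b, 0.0)   # one simple tunable layer
    return W_last @ x + b_last

# Hypothetical parameters for a 1 -> 16 -> 16 -> 1 composition.
rng = np.random.default_rng(0)
widths = [1, 16, 16, 1]
params = [(0.5 * rng.normal(size=(m, n)), np.zeros(m))
          for n, m in zip(widths[:-1], widths[1:])]
print(deep_approximation(np.array([0.3]), params))   # value of the composed approximant at x = 0.3
```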

26963 | Thursday, 02/04/2021, 2:30 PM - 3:30 PM | Tino Ullrich, TU Chemnitz | A New Subsampling Technique for Random Points and Optimal Least Squares Approximation of High-Dimensional Functions
- Location: Online (virtual meeting)
- Contact: Olga Turanova (turanova@msu.edu)
We provide a new general upper bound for the minimal L2 worst-case recovery error in the framework of reproducing kernel Hilbert spaces (RKHS), where only n function samples are allowed. This quantity can be bounded in terms of the singular numbers of the compact embedding into the space of square-integrable functions. It turns out that in many relevant situations this quantity is asymptotically worse only by a factor of sqrt(log n) compared to the singular numbers. The algorithm which realizes this behavior is a weighted least squares algorithm based on a specific set of sampling nodes, which works for the whole class of functions simultaneously. These points are constructed from a random draw with respect to a distribution tailored to the spectral properties of the reproducing kernel (importance sampling), in combination with a subsampling procedure coming from the celebrated proof of Weaver's conjecture, which was shown to be equivalent to the Kadison-Singer problem. For the above multivariate setting, it is still a fundamental open problem whether sampling algorithms are as powerful as algorithms allowing general linear information, such as Fourier or wavelet coefficients. However, the gap is now rather small. As a consequence, we may study well-known scenarios where it was widely believed that sparse grid sampling recovery methods perform optimally. It turns out that this is not the case for dimensions d > 2.
This is joint work with N. Nagel and M. Schaefer from TU Chemnitz.
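For intuition, here is a minimal sketch of a weighted least squares fit on random sampling nodes (my own toy setup with a trigonometric basis and uniform weights; the talk's method instead draws nodes from a density tailored to the kernel's spectrum and then subsamples them via the Weaver/Kadison-Singer machinery).

```python
import numpy as np

def weighted_least_squares(f, nodes, weights, n_basis):
    """Minimize sum_i w_i |f(x_i) - sum_k c_k phi_k(x_i)|^2 for phi_k(x) = e^{2 pi i k x}."""
    k = np.arange(n_basis)
    Phi = np.exp(2j * np.pi * np.outer(nodes, k))        # basis evaluated at the nodes
    sw = np.sqrt(weights)
    coeffs, *_ = np.linalg.lstsq(sw[:, None] * Phi, sw * f(nodes).astype(complex), rcond=None)
    return coeffs

rng = np.random.default_rng(0)
nodes = rng.uniform(size=200)                     # random sampling nodes on [0, 1)
weights = np.full(nodes.shape, 1.0 / nodes.size)  # uniform weights (a placeholder choice)
f = lambda x: np.cos(2 * np.pi * 3 * x)
print(np.round(np.abs(weighted_least_squares(f, nodes, weights, 8)), 3))
```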

26964 | Thursday, 02/11/2021, 2:30 PM - 3:30 PM | Massimo Fornasier, Technische Universität München | Consensus-based Optimization on the Sphere
- Location: Online (virtual meeting)
- Contact: Olga Turanova (turanova@msu.edu)
I present new stochastic multi-particle models for the global optimization of nonconvex functions on the sphere. These models belong to the class of Consensus-Based Optimization (CBO) methods. Particles move over the manifold driven by a drift towards an instantaneous consensus point, computed as a combination of the particle locations weighted by the cost function according to Laplace's principle; the consensus point represents an approximation to a global minimizer. The dynamics is further perturbed by a random vector field to favor exploration, whose variance is a function of the distance of the particles to the consensus point; in particular, as soon as consensus is reached, the stochastic component vanishes. In the first part of the talk, I present the well-posedness of the model on the sphere and rigorously derive its mean-field approximation in the large-particle limit.
In the second part, I address the proof of convergence of numerical schemes to global minimizers under well-preparation conditions on the initial datum. The proof combines the mean-field limit with a novel asymptotic analysis and classical convergence results for numerical methods for SDEs. We present several numerical experiments, which show that the proposed algorithm scales well with the dimension and is extremely versatile. To quantify the performance of the new approach, we show that the algorithm performs essentially as well as ad hoc state-of-the-art methods on challenging problems in signal processing and machine learning, namely phase retrieval and robust subspace detection.
Joint work with H. Huang, L. Pareschi, and P. Sünnen.
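A minimal sketch of the dynamics described above, in a simple Euler-Maruyama discretization of my own (the talk's scheme, drift and noise terms differ in detail); the objective f and all parameter values below are hypothetical.

```python
import numpy as np

def cbo_sphere(f, n_particles=100, dim=3, steps=500, dt=0.05,
               lam=1.0, sigma=0.7, alpha=30.0, seed=0):
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n_particles, dim))
    X /= np.linalg.norm(X, axis=1, keepdims=True)        # start on the unit sphere
    for _ in range(steps):
        w = np.exp(-alpha * f(X))                         # Laplace-principle weights
        v = (w @ X) / w.sum()                             # instantaneous consensus point
        diff = X - v
        noise = rng.normal(size=X.shape)
        X = X - lam * diff * dt \
              + sigma * np.linalg.norm(diff, axis=1, keepdims=True) * noise * np.sqrt(dt)
        X /= np.linalg.norm(X, axis=1, keepdims=True)     # project back onto the sphere
    return v / np.linalg.norm(v)

# A toy nonconvex objective on the unit sphere (for illustration only).
f = lambda X: 1.0 - X[:, 2] + 0.3 * np.sin(5 * X[:, 0])
print(np.round(cbo_sphere(f), 3))                         # approximate consensus point (candidate minimizer)
```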

26965 | Thursday, 02/18/2021, 2:30 PM - 3:30 PM | Mikhail Belkin, University of California, San Diego | A theory of optimization and transition to linearity in deep learning
- Location: Online (virtual meeting)
- Contact: Olga Turanova (turanova@msu.edu)
The success of deep learning is due, to a large extent, to the remarkable effectiveness of gradient-based optimization methods applied to large neural networks. In this talk I will discuss some general mathematical principles allowing for efficient optimization in over-parameterized non-linear systems, a setting that includes deep neural networks. Remarkably, it seems that optimization of such systems is "easy". In particular, optimization problems corresponding to these systems are not convex, even locally, but instead satisfy locally the Polyak-Lojasiewicz (PL) condition allowing for efficient optimization by gradient descent or SGD. We connect the PL condition of these systems to the condition number associated to the tangent kernel and develop a non-linear theory parallel to classical analyses of over-parameterized linear equations.
In a related but conceptually separate development, I will discuss a new perspective on the remarkable recently discovered phenomenon of transition to linearity (constancy of NTK) in certain classes of large neural networks. I will show how this transition to linearity results from the scaling of the Hessian with the size of the network.
Joint work with Chaoyue Liu and Libin Zhu
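For reference, a standard statement of the Polyak-Lojasiewicz condition mentioned above, together with the linear convergence it yields for gradient descent (textbook form, not specific to the talk's results):

```latex
% PL condition with constant \mu > 0 for a loss \mathcal{L} with infimum \mathcal{L}^*:
\frac{1}{2}\,\|\nabla \mathcal{L}(w)\|^{2} \;\ge\; \mu\,\bigl(\mathcal{L}(w)-\mathcal{L}^{*}\bigr)
\qquad \text{for all } w.
% If, in addition, \mathcal{L} is L-smooth, gradient descent with step size 1/L satisfies
\mathcal{L}(w_{t}) - \mathcal{L}^{*} \;\le\; \Bigl(1-\frac{\mu}{L}\Bigr)^{t}\bigl(\mathcal{L}(w_{0})-\mathcal{L}^{*}\bigr).
```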

26983 | Thursday, 02/25/2021, 3:30 AM - 4:30 AM | Zaiwen Wen, Peking University, China | Stochastic Second-Order Methods For Deep Learning
- Location: Online (virtual meeting)
- Contact: Olga Turanova (turanova@msu.edu)
Stochastic methods are widely used in deep learning. In this talk, we first review state-of-the-art methods such as KFAC. Then we present a structured stochastic quasi-Newton method and a sketchy empirical natural gradient method. Numerical results on deep convolutional networks illustrate that our methods are quite competitive with SGD and KFAC.

26984 | Thursday, 03/04/2021, 2:30 PM - 3:30 PM | Ronald DeVore, Texas A&M University | Deep Learning and Neural Networks: The Mathematical View
- Location: Online (virtual meeting)
- Contact: Olga Turanova (turanova@msu.edu)
Deep Learning is much publicized and has had great empirical success on challenging problems in learning. Yet there are no quantifiable proofs of performance or certified guarantees for these methods. This talk will give an overview of Deep Learning from the viewpoint of mathematics and numerical computation.

27001 | Thursday, 03/11/2021, 3:30 AM - 4:30 AM | Jianfeng Cai, Hong Kong University of Science and Technology | Landscape analysis of non-convex optimizations in phase retrieval
- Location: Online (virtual meeting)
- Contact: Olga Turanova (turanova@msu.edu)
Non-convex optimization is a ubiquitous tool in scientific and engineering research. For many important problems, simple non-convex optimization algorithms often provide good solutions efficiently and effectively, despite possible local minima. One way to explain the success of these algorithms is through global landscape analysis. In this talk, we present some results along this direction for phase retrieval. The main results are that, for several non-convex formulations of phase retrieval, every local minimum is also global and all other critical points have a negative directional curvature. These results not only explain why simple non-convex algorithms usually find a global minimizer for phase retrieval, but are also useful for developing new efficient algorithms with theoretical guarantees, by applying algorithms that are guaranteed to find a local minimum.
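As one concrete example of the non-convex optimizations in question, here is a sketch of the intensity least squares formulation (in the style of Wirtinger-flow-type methods) with plain gradient descent from a random start; this is my own toy instance, not the talk's specific objectives.

```python
import numpy as np

def loss_and_grad(z, A, y):
    """Intensity least squares: f(z) = (1/(4m)) * sum_i (|a_i.z|^2 - y_i)^2 (real case)."""
    Az = A @ z
    r = Az ** 2 - y
    return 0.25 * np.mean(r ** 2), (A.T @ (r * Az)) / len(y)

rng = np.random.default_rng(0)
n, m = 20, 200
x_true = rng.normal(size=n); x_true /= np.linalg.norm(x_true)
A = rng.normal(size=(m, n))
y = (A @ x_true) ** 2                      # phaseless (intensity) measurements
z = rng.normal(size=n); z /= np.linalg.norm(z)
for _ in range(3000):                      # plain gradient descent, random initialization
    _, grad = loss_and_grad(z, A, y)
    z -= 0.05 * grad
# Distance to the closest of the two global minimizers +/- x_true:
print(round(min(np.linalg.norm(z - x_true), np.linalg.norm(z + x_true)), 4))
```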

27002 | Thursday, 03/18/2021, 2:30 PM - 3:30 PM | Roberto Imbuzeiro Oliveira, IMPA, Rio de Janeiro | Sample average approximation with heavier tails
- Location: Online (virtual meeting)
- Contact: Olga Turanova (turanova@msu.edu)
Consider an "ideal" optimization problem where constraints and objective function are defined in terms of expectations over some distribution P. The sample average approximation (SAA) -- a fundamental idea in stochastic optimization -- consists of replacing the expectations by an average over a sample from P. A key question is how much the solutions of the SAA differ from those of the original problem. Results by Shapiro from many years ago consider what happens asymptotically when the sample size diverges, especially when the solution of the ideal problem lies on the boundary of the feasible set. In joint work with Philip Thompson (Purdue), we consider what happens with finite samples. As we will see, our results improve upon the nonasymptotic state of the art in various ways: we allow for heavier tails and unbounded feasible sets, and obtain bounds that (in favorable cases) depend only on the geometry of the feasible set in a small neighborhood of the optimal solution. Our results combine "localization" and "fixed-point" type arguments inspired by the work of Mendelson with chaining-type inequalities. One of our contributions is showing what can be said when the SAA constraints are random.
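A minimal sketch of the SAA idea on a toy problem of my own choosing (the function F, the feasible interval, and the heavier-tailed distribution below are all hypothetical): the expectation in the ideal problem is replaced by a sample average, which is then minimized over the feasible set.

```python
import numpy as np
from scipy.optimize import minimize

def saa_solution(sample, bounds):
    """Solve min_{x in [bounds]} (1/n) sum_i F(x, Z_i) with F(x, z) = (x - z)^2."""
    objective = lambda x: np.mean((x - sample) ** 2)
    return minimize(objective, x0=np.zeros(1), bounds=[bounds]).x[0]

rng = np.random.default_rng(0)
sample = rng.standard_t(df=3, size=500)    # a heavier-tailed distribution P
# The ideal solution is the mean of P (zero), which lies inside the feasible interval.
print(round(saa_solution(sample, (-0.5, 0.5)), 3))
```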

27003 | Thursday, 03/25/2021, 2:30 PM - 3:30 PM | Rachel Ward, University of Texas at Austin | Function Approximation via Sparse Random Features
- Location: Online (virtual meeting)
- Contact: Olga Turanova (turanova@msu.edu)
Random feature methods have been successful in various machine learning tasks, are easy to compute, and come with theoretical accuracy bounds. They serve as an alternative approach to standard neural networks since they can represent similar function spaces without a costly training phase. However, for accuracy, random feature methods require more measurements than trainable parameters, limiting their use for data-scarce applications and problems in scientific machine learning. This talk introduces the sparse random feature method, which learns parsimonious random feature models using techniques from compressive sensing. We provide uniform bounds on the approximation error for functions in a reproducing kernel Hilbert space depending on the number of samples and the distribution of features. The error bounds improve with additional structural conditions, such as coordinate sparsity, compact clusters of the spectrum, or rapid spectral decay. We show that the sparse random feature method outperforms shallow networks for well-structured functions and in applications to scientific machine learning tasks.
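A minimal sketch of a sparse random feature fit (my own toy construction: random cosine features followed by l1-regularized least squares standing in for the compressive sensing step; the parameter values are arbitrary).

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n_samples, dim, n_features = 200, 10, 1000
X = rng.uniform(-1, 1, size=(n_samples, dim))
y = np.sin(2 * X[:, 0]) + 0.5 * X[:, 1] ** 2        # target depends on few coordinates

Omega = rng.normal(size=(dim, n_features))           # random feature weights
bias = rng.uniform(0, 2 * np.pi, size=n_features)
Phi = np.cos(X @ Omega + bias)                       # random feature matrix

model = Lasso(alpha=0.01, max_iter=5000).fit(Phi, y) # sparse coefficient recovery
print(int(np.sum(model.coef_ != 0)), "of", n_features, "random features kept")
```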

27004 | Thursday, 04/01/2021, 2:30 PM - 3:30 PM | Yi Ma, University of California, Berkeley | Deep Networks from First Principles
- Location: Online (virtual meeting)
- Contact: Olga Turanova (turanova@msu.edu)
In this talk, we offer an entirely "white box" interpretation of deep (convolutional) networks from the perspective of data compression (and group invariance). In particular, we show how modern deep layered architectures, linear (convolution) operators and nonlinear activations, and even all parameters can be derived from the principle of maximizing rate reduction (with group invariance). All layers, operators, and parameters of the network are explicitly constructed via forward propagation, instead of learned via back propagation. All components of the so-obtained network, called ReduNet, have precise optimization, geometric, and statistical interpretations. There are also several nice surprises from this principled approach: it reveals a fundamental tradeoff between invariance and sparsity for class separability; it reveals a fundamental connection between deep networks and the Fourier transform for group invariance, namely the computational advantage in the spectral domain (why spiking neurons?); and it clarifies the mathematical role of forward propagation (optimization) versus backward propagation (variation). In particular, the so-obtained ReduNet is amenable to fine-tuning via both forward and backward (stochastic) propagation, both for optimizing the same objective.
This is joint work with students Yaodong Yu, Ryan Chan, and Haozhi Qi of Berkeley, Dr. Chong You (now at Google Research), and Professor John Wright of Columbia University.
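For orientation, the rate reduction principle mentioned above is, in the speakers' related work on maximal coding rate reduction, an objective roughly of the following form (my paraphrase from memory, so treat it as a sketch; Z denotes the learned features of n samples in R^d, Pi_j the membership matrix of class j, and epsilon a chosen coding precision):

```latex
\Delta R(Z,\Pi,\varepsilon)
= \tfrac{1}{2}\log\det\!\Bigl(I + \tfrac{d}{n\varepsilon^{2}}\,ZZ^{\top}\Bigr)
\;-\; \sum_{j}\tfrac{\operatorname{tr}(\Pi_{j})}{2n}\,
\log\det\!\Bigl(I + \tfrac{d}{\operatorname{tr}(\Pi_{j})\,\varepsilon^{2}}\,Z\Pi_{j}Z^{\top}\Bigr)
```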

28028 | Thursday, 04/08/2021, 2:30 PM - 3:30 PM | Mahdi Soltanolkotabi, University of Southern California | Precise Tradeoffs for Adversarial Training
- Location: Online (virtual meeting)
- Contact: Olga Turanova (turanova@msu.edu)
Despite breakthrough performance, modern learning models are known to be highly vulnerable to small adversarial perturbations in their inputs. While a wide variety of recent adversarial training methods have been effective at improving robustness to perturbed inputs (robust accuracy), often this benefit is accompanied by a decrease in accuracy on benign inputs (standard accuracy), leading to a tradeoff between these often competing objectives. Complicating matters further, recent empirical evidence suggests that a variety of other factors (size and quality of training data, model size, etc.) affect this tradeoff in somewhat surprising ways. In this talk we will provide a precise and comprehensive understanding of the role of adversarial training in the context of linear regression with Gaussian features and binary classification in a mixture model. We precisely characterize the standard/robust accuracy and the corresponding tradeoff achieved by a contemporary mini-max adversarial training approach in a high-dimensional regime where the number of data points and the parameters of the model grow in proportion to each other. Our theory for adversarial training algorithms also facilitates the rigorous study of how a variety of factors (size and quality of training data, model overparameterization, etc.) affect the tradeoff between these two competing accuracies.
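As a sketch of the mini-max adversarial training formulation in the linear regression setting described above (notation mine: theta are the regression parameters, (x_i, y_i) the data, and epsilon the perturbation budget):

```latex
\min_{\theta}\; \frac{1}{n}\sum_{i=1}^{n}\;
\max_{\|\delta_{i}\|_{2}\le \varepsilon}\;
\bigl(y_{i} - \langle x_{i} + \delta_{i},\, \theta\rangle\bigr)^{2}
```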

28029 | Thursday, 04/15/2021, 2:30 PM - 3:30 PM | Michael Wakin, Colorado School of Mines | Spectral Properties of Time-limited Toeplitz Operators and Applications in Signal Processing
- Location: Online (virtual meeting)
- Contact: Olga Turanova (turanova@msu.edu)
Toeplitz operators are fundamental and ubiquitous in signal processing and information theory as models for convolutional (filtering) systems. Due to the fact that any practical system can access only signals of finite duration, however, time-limited restrictions of Toeplitz operators are also of interest. In the discrete-time case, time-limited Toeplitz operators are simply Toeplitz matrices. In this talk we survey existing and present new bounds on the eigenvalues (spectra) of time-limited Toeplitz operators, and we discuss applications of these results in various signal processing contexts. As a special case, we discuss time-frequency limiting operators, which alternatingly limit a signal in the time and frequency domains. Slepian functions arise as eigenfunctions of these operators, and we describe applications of Slepian functions in spectral analysis of multiband signals, super-resolution SAR imaging, and blind beamforming in antenna arrays. This talk draws from joint work with numerous collaborators including Zhihui Zhu from the University of Denver.
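A minimal sketch of the discrete-time time-frequency limiting operator mentioned above: the N x N prolate (Toeplitz) matrix whose eigenvectors are the Slepian (DPSS) sequences and whose eigenvalues cluster near 1 and 0 with a narrow transition band; the sizes below are arbitrary.

```python
import numpy as np

def prolate_matrix(N, W):
    """Prolate matrix B[m, n] = sin(2*pi*W*(m - n)) / (pi*(m - n)), with 2W on the diagonal."""
    m, n = np.meshgrid(np.arange(N), np.arange(N), indexing="ij")
    d = m - n
    return np.where(d == 0, 2 * W,
                    np.sin(2 * np.pi * W * d) / (np.pi * np.where(d == 0, 1, d)))

eigvals = np.linalg.eigvalsh(prolate_matrix(128, 0.1))[::-1]   # descending order
print(np.round(eigvals[:5], 4))                                # leading eigenvalues, close to 1
print(int(np.sum(eigvals > 0.5)), "eigenvalues above 1/2 (about 2*N*W = 25.6)")
```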

28030 | Thursday, 04/22/2021, 4:30 AM - 5:30 AM | Shahar Mendelson, Australian National University | TBA
- Location: Online (virtual meeting)
- Contact: Olga Turanova (turanova@msu.edu)
No abstract available.

28031 | Thursday, 04/29/2021, 1:00 PM - 2:00 PM | Anne Gelb, Dartmouth College | TBA
- Location: Online (virtual meeting)
- Contact: Olga Turanova (turanova@msu.edu)
No abstract available; zoom link @ https://sites.google.com/view/minds-seminar/home