In the framework of tensor spaces, we consider orthogonalization kernels to generate an orthogonal basis of a tensor subspace from a set of linearly independent tensors. In particular, we investigate numerically the loss of orthogonality of six orthogonalization methods, namely Classical and Modified Gram-Schmidt with (CGS2, MGS2) and without (CGS, MGS) re-orthogonalization, the Gram approach, and the Householder transformation. To tackle the curse of dimensionality, we represent tensor with low rank approximation using the Tensor Train (TT) formalism, and we introduce recompression steps in the standard algorithm outline through the TT-rounding method at a prescribed accuracy. After describing the algorithm structure and properties, we illustrate numerically that the theoretical bounds for the loss of orthogonality in the classical matrix computation round-off analysis results are maintained, with the unit round-off replaced by the TT-rounding accuracy. The computational analysis for each orthogonalization kernel in terms of the memory requirement and the computational complexity measured as a function of the number of TT-rounding, which happens to be the computational most expensive operation, completes the study.
HAL
The backward stable variants of GMRES in variable accuracy
Emmanuel Agullo, Olivier Coulaud, Luc Giraud, and
3 more authors
In the context where the representation of the data is decoupled from the arithmetic used to process them, we investigate the backward stability of two backward-stable implementations of the GMRES method, namely the so-called Modified Gram-Schmidt (MGS) and the Householder variants. Considering data may be compressed to alleviate the memory footprint, we are interested in the situation where the leading part of the rounding error is related to the data representation. When the data representation of vectors introduces componentwise perturbations, we show that the existing backward stability analyses of MGS-GMRES and Householder-GMRES still apply. We illustrate this backward stability property in a practical context where an agnostic lossy compressor is employed and enables the reduction of the memory requirement to store the orthonormal Arnoldi basis or the Householder reflectors. Although technical arguments of the theoretical backward stability proofs do not readily apply to the situation where only the normwise relative perturbations of the vector storage can be controlled, we show experimentally that the backward stability is maintained; that is, the attainable normwise backward error is of the same order as the normwise perturbations induced by the data storage. We illustrate it with numerical experiments in two practical different contexts. The first one corresponds to the use of an agnostic compressor where vector compression is controlled normwise. The second one arises in the solution of tensor linear systems, where low-rank tensor approximations based on Tensor-Train is considered to tackle the curse of dimensionality.
HAL
A robust GMRES algorithm in Tensor Train format
Olivier Coulaud, Luc Giraud, and Martina Iannacito
We consider the solution of linear systems with tensor product structure using a GMRES algorithm. In order to cope with the computational complexity in large dimension both in terms of floating point operations and memory requirement, our algorithm is based on low-rank tensor representation, namely the Tensor Train format. In a backward error analysis framework, we show how the tensor approximation affects the accuracy of the computed solution. With the bacwkward perspective, we investigate the situations where the (𝑑+1)-dimensional problem to be solved results from the concatenation of a sequence of 𝑑-dimensional problems (like parametric linear operator or parametric right-hand side problems), we provide backward error bounds to relate the accuracy of the (𝑑+1)-dimensional computed solution with the numerical quality of the sequence of 𝑑-dimensional solutions that can be extracted form it. This enables to prescribe convergence threshold when solving the (𝑑+1)-dimensional problem that ensures the numerical quality of the 𝑑-dimensional solutions that will be extracted from the (𝑑+1)-dimensional computed solution once the solver has converged. The above mentioned features are illustrated on a set of academic examples of varying dimensions and sizes.
2021
HAL
Extension of Correspondence Analysis to multiway data-sets through High Order SVD: a geometric framework
Olivier Coulaud, Alain Franc, and Martina Iannacito
This paper presents an extension of Correspondence Analysis (CA) to tensors through High Order Singular Value Decomposition (HOSVD) from a geometric viewpoint. Correspondence analysis is a well-known tool, developed from principal component analysis, for studying contingency tables. Different algebraic extensions of CA to multi-way tables have been proposed over the years, nevertheless neglecting its geometric meaning. Relying on the Tucker model and the HOSVD, we propose a direct way to associate with each tensor mode a point cloud. We prove that the point clouds are related to each other. Specifically using the CA metrics we show that the barycentric relation is still true in the tensor framework. Finally two data sets are used to underline the advantages and the drawbacks of our strategy with respect to the classical matrix approaches.
Peer-reviewed journal
2021
BUMI
High order singular value decomposition for plant diversity estimation
Alessandra Bernardi, Martina Iannacito, and Duccio Rocchini
Bollettino dell’Unione Matematica Italiana, Dec 2021
We propose a new method to estimate plant diversity with Rényi and Rao indexes through the so called High Order Singular Value Decomposition (HOSVD) of tensors. Starting from NASA multi-spectral images we evaluate diversity and we compare original diversity estimates with those realized via the HOSVD compression methods for big data. Our strategy turns out to be extremely powerful in terms of memory storage and precision of the outcome. The obtained results are so promising that we can support the efficiency of our method in the ecological framework.
Comm. Eco.
Measuring diversity from space: a global view of the free and open source rasterdiv R package under a coding perspective
Elisa Thouverai, Matteo Marcantonio, Giovanni Bacaro, and
7 more authors
The variation of species diversity over space and time has been widely recognised as a key challenge in ecology. However, measuring species diversity over large areas might be difficult for logistic reasons related to both time and cost savings for sampling, as well as accessibility of remote ecosystems. In this paper, we present a new R package - rasterdiv - to calculate diversity indices based on remotely sensed data, by discussing the theory behind the developed algorithms. Obviously, measures of diversity from space should not be viewed as a replacement of in situ data on biological diversity, but they are rather complementary to existing data and approaches. In practice, they integrate available information of Earth surface properties, including aspects of functional (structural, biophysical and biochemical), taxonomic, phylogenetic and genetic diversity. Making use of the rasterdiv package can result useful in making multiple calculations based on reproducible open source algorithms, robustly rooted in Information Theory.
Glob. Ecol.
From zero to infinity: Minimum to maximum diversity of the planet by spatio-parametric Rao’s quadratic entropy
Duccio Rocchini, Matteo Marcantonio, Daniele Da Re, and
20 more authors
The majority of work done to gather information on the Earth’s biodiversity has been carried out using in-situ data, with known issues related to epistemology (e.g., species determination and taxonomy), spatial uncertainty, logistics (time and costs), among others. An alternative way to gather information about spatial ecosystem variability is the use of satellite remote sensing. It works as a powerful tool for attaining rapid and standardized information. Several metrics used to calculate remotely sensed diversity of ecosystems are based on Shannon’s information theory, namely on the differences in relative abundance of pixel reflectances in a certain area. Additional metrics like the Rao’s quadratic entropy allow the use of spectral distance beside abundance, but they are point descriptors of diversity, that is they can account only for a part of the whole diversity continuum. The aim of this paper is thus to generalize the Rao’s quadratic entropy by proposing its parameterization for the first time. Innovation The parametric Rao’s quadratic entropy, coded in R, (a) allows the representation of the whole continuum of potential diversity indices in one formula, and (b) starting from the Rao’s quadratic entropy, allows the explicit use of distances among pixel reflectance values, together with relative abundances. Main conclusions The proposed unifying measure is an integration between abundance- and distance-based algorithms to map the continuum of diversity given a satellite image at any spatial scale. Being part of the rasterdiv R package, the proposed method is expected to ensure high robustness and reproducibility.
MEE
rasterdiv—An Information Theory tailored R package for measuring ecosystem heterogeneity from space: To the origin and back
Duccio Rocchini, Elisa Thouverai, Matteo Marcantonio, and
32 more authors
Ecosystem heterogeneity has been widely recognized as a key ecological indicator of several ecological functions, diversity patterns and change, metapopulation dynamics, population connectivity or gene flow. In this paper, we present a new R package—rasterdiv—to calculate heterogeneity indices based on remotely sensed data. We also provide an ecological application at the landscape scale and demonstrate its power in revealing potentially hidden heterogeneity patterns. The rasterdiv package allows calculating multiple indices, robustly rooted in Information Theory, and based on reproducible open-source algorithms.
Other works
2022
thesis
Numerical linear algebra and data analysis in large dimensions using tensor format
Martina Iannacito
Inria center at the University of Bordeaux, France, Dec 2022
This work aims to establish which theoretical properties of classical linear algebra techniques, developed in two different contexts, namely numerical linear algebra and data analysis, are maintained and which are lost when extended to low-rank tensors. In the numerical linear algebra part, we experimentally study the effects of rounding errors on an iterative solver (Generalised Minimal RESidual) and several orthogonalization methods (Classical and Modified Gram-Schmidt and Householder transformation, among others) when extended to tensors by the Tensor Train (TT) formalism. In all the considered algorithms, we introduce additional rounding steps, with the TT-rounding compression algorithm, to cope with memory constraints, which are always crucial when dealing with tensors. Our experimental tests suggest that for these algorithms, the classical bounds due to the propagation of rounding errors remain valid, replacing the accuracy of the arithmetic with that of the TT-rounding algorithm. In the data analysis part, we geometrically study the generalization of correspondence analysis to multi-way tables by the Tucker tensor decomposition technique, thus contributing to the understanding of Multi-Way Correspondence Analysis (MWCA). Examples of MWCA applied to datasets complement the theoretical results. In particular, we perform the MWCA on an original ecological dataset provided to us in the framework of the Malabar project of the BioGeCo (INRAE) and Pleiade (Inria) teams.
2019
thesis
HOSVD FOR MULTISPECTRAL IMAGES. A numerical approach to the plant biodiversity estimate.
Remote sensing has already proved its power in estimating plant biodiversity. Since plants reflect and absorb in a specific way sunlight, ecologists use multispectral photos to estimate plant biodiversity of a certain area through biodiversity indexes, e.g., Rényi’s and Rao’s index. Working with these big data is often expensive in terms of storage memory requirements. A possible solution is compressing multispectral data through tensor techniques, as High Order Singular Value Decomposition. We apply two recent variants of HOSVD to spectral images and we compute biodiversity indexes over the compressed data generated. For the Rényi index compressing data is extremely advantageous, since the information loss in estimating biodiversity is very low. For Rao’s index the results are not as satisfactory, even if they are promising. Using approximatively 15% of the original data information, in the first case we estimate an average error per pixel between 5.5% and 7.5%, while in the second case the average error per pixel is between 17% and 19%.