This special issue of the IEEE Journal on Selected Areas in Information Theory is dedicated to the memory of Toby Berger, one of the most important information theorists of our time, who passed away in 2022 at the age of 81. He made foundational contributions to a wide range of areas in information theory, including rate-distortion theory, network information theory, quantum information theory, and bio-information theory. He also left a deep imprint on diverse fields in applied mathematics and theoretical engineering, such as Markov random fields, group testing, multiple access theory, and detection and estimation. Well known for his technical brilliance, he tackled many challenging problems, but above all, it is his pursuit of elegance in research and writing that shines throughout his work. The goal of this special issue is to celebrate Toby Berger’s lasting legacy and his impact on information theory and beyond. Original research papers on topics within the realm of his scientific investigations and their “offspring”, as well as expository articles that survey his pioneering contributions and their modern developments, are invited.

In the problem of quickest change detection, a change occurs at some unknown time in the distribution of a sequence of random vectors that are monitored in real time, and the goal is to detect this change as quickly as possible subject to a certain false alarm constraint. In this work we consider this problem in the presence of parametric uncertainty in the post-change regime and controlled sensing. That is, the post-change distribution contains an unknown parameter, and the distribution of each observation, before and after the change, is affected by a control action. In this context, in addition to a stopping rule that determines the time at which it is declared that the change has occurred, one also needs to determine a sequential control policy, which chooses the control action at each time based on the already collected observations. We formulate this problem mathematically using Lorden’s minimax criterion, and assuming that there are finitely many possible actions and post-change parameter values. We then propose a specific procedure for this problem that employs an adaptive CuSum statistic in which (i) the estimate of the parameter is based on a fixed number of the more recent observations, and (ii) each action is selected to maximize the Kullback-Leibler divergence of the next observation based on the current parameter estimate, apart from a small number of exploration times. We show that this procedure, which we call the Windowed Chernoff-CuSum (WCC), is first-order asymptotically optimal under Lorden’s minimax criterion, for every possible value of the unknown post-change parameter, as the mean time to false alarm goes to infinity. We also provide simulation results to illustrate the performance of the WCC procedure.

We study the problem of group testing with non-identical, independent priors. So far, the pooling strategies that have been proposed in the literature take the following approach: a hand-crafted test design along with a decoding strategy is proposed, and guarantees are provided on how many tests are sufficient in order to identify all infections in a population. In this paper, we take a different, yet perhaps more practical, approach: we fix the decoder and the number of tests, and we ask, given these, what is the best test design one could use? We explore this question for the Definite Non-Defectives (DND) decoder. We formulate a (non-convex) optimization problem, where the objective function is the expected number of errors for a particular design. We find approximate solutions via gradient descent, which we further optimize with informed initialization. We illustrate through simulations that our method can achieve significant performance improvement over traditional approaches.

We consider lossy compression of an information source when the decoder has lossless access to a correlated one. This setup, also known as the Wyner-Ziv problem, is a special case of distributed source coding. To this day, practical approaches for the Wyner-Ziv problem have neither been fully developed nor heavily investigated. We propose a data-driven method based on machine learning that leverages the universal function approximation capability of artificial neural networks. We find that our neural network-based compression scheme, based on variational vector quantization, recovers some principles of the optimum theoretical solution of the Wyner-Ziv setup, such as binning in the source space as well as optimal combination of the quantization index and side information, for exemplary sources. These behaviors emerge although no structure exploiting knowledge of the source distributions was imposed. Binning is a widely used tool in information theoretic proofs and methods, and to our knowledge, this is the first time it has been explicitly observed to emerge from data-driven learning.

In the popular federated learning scenarios, distributed nodes often represent and exchange information through functions or statistics of data, with communicative processes constrained by the dimensionality of transmitted information. This paper investigates the fundamental limits of distributed parameter estimation and model training problems under such constraints. Specifically, we assume that each node can observe a sequence of i.i.d. sampled data and communicate statistics of the observed data with dimensionality constraints. We first show the Cramer-Rao lower bound (CRLB) and the corresponding achievable estimators for the distributed parameter estimation problems, and the geometric insights and the computable algorithms of designing efficient estimators are also presented. Moreover, we consider model parameters training for distributed nodes with limited communicable statistics. We demonstrate that in order to optimize the excess risk, the feature functions of the statistics shall be designed along the largest eigenvectors of a matrix induced by the model training loss function. In summary, our results potentially provide theoretical guidelines of designing efficient algorithms for enhancing the performance of distributed learning systems.

Wireless communication links suffer from outage events caused by fading and interference. To facilitate a tractable analysis of network communication throughput and latency, we propose an outage link model to represent a communication link in the slow fading phenomenon. For a line-topology network with outage links, we study three types of intermediate network node schemes: random linear network coding, store-and-forward, and hop-by-hop retransmission. We provide the analytical formulas for the maximum throughputs and the end-to-end latency for each scheme. To gain a more explicit understanding, we perform a scalability analysis of the throughput and latency as the network length increases. We observe that the same order of throughput/latency holds across a wide range of outage functions for each scheme. We illustrate how our exact formulae and scalability results can be applied to compare different schemes.

Consider a source and multiple users who observe the independent and identically distributed (i.i.d.) copies of correlated Gaussian random variables. The source wishes to compress its observations and store the result in a public database such that (i) authorized sets of users are able to reconstruct the source with a certain distortion level, and (ii) information leakage to non-authorized sets of colluding users is minimized. In other words, the recovery of the source is restricted to a predefined access structure. The main result of this paper is a closed-form characterization of the fundamental trade-off between the source coding rate and the information leakage rate. As an example, threshold access structures are studied, i.e., the case where any set of at least t users is able to reconstruct the source with some predefined distortion level and the information leakage at any set of users with a size smaller than t is minimized.

We consider the Distributed Source Coding (DSC) problem concerning the task of encoding an input in the absence of correlated side information that is only available to the decoder. Remarkably, Slepian and Wolf showed in 1973 that an encoder without access to the side information can asymptotically achieve the same compression rate as when the side information is available to it. This seminal result was later extended to lossy compression of distributed sources by Wyner, Ziv, Berger, and Tung. While there is vast prior work on this topic, practical DSC has been limited to synthetic datasets and specific correlation structures. Here we present a framework for lossy DSC that is agnostic to the correlation structure and can scale to high dimensions. Rather than relying on hand-crafted source modeling, our method utilizes a conditional Vector-Quantized Variational auto-encoder (VQ-VAE) to learn the distributed encoder and decoder. We evaluate our method on multiple datasets and show that our method can handle complex correlations and achieves state-of-the-art PSNR.

This paper brings together topics of two of Berger’s main contributions to information theory: distributed source coding, and living information theory. Our goal is to understand which information theory techniques can be helpful in understanding a distributed source coding strategy used by the natural world. Towards this goal, we study the example of the encoding of location of an animal by grid cells in its brain. We use information measures of partial information decomposition (PID) to assess the unique, redundant, and synergistic information carried by multiple grid cells, first for simulated grid cells utilizing known encodings, and subsequently for data from real grid cells. In all cases, we make simplifying assumptions so we can assess the consistency of specific PID definitions with intuition. Our results suggest that the measure of PID proposed by Bertschinger et al. (Entropy, 2014) provides intuitive insights on distributed source coding by grid cells, and can be used for subsequent studies for understanding grid-cell encoding as well as broadly in neuroscience.

Conventional image compression techniques are primarily developed for the human visual system. However, with the extensive use of deep neural networks (DNNs) for computer vision, more and more images will be consumed by DNN-based intelligent machines, which makes it crucial to develop image compression techniques customized for DNN vision while being JPEG compliant. In this paper, we revisit the JPEG rate distortion theory for DNN vision. First, we propose a novel distortion measure, dubbed the sensitivity weighted error (SWE), for DNN vision. Second, we incorporate SWE into the soft decision quantization (SDQ) process of JPEG to trade SWE for rate. Finally, we develop an algorithm, called OptS, for designing optimal quantization tables for the luminance channel and chrominance channels, respectively. To test the performance of the resulting DNN-oriented compression framework and algorithm, experiments of image classification are conducted on the ImageNet dataset for four prevalent DNN models. Results demonstrate that our proposed framework and algorithm achieve better rate-accuracy (R-A) performance than the default JPEG. For some DNN models, our proposed framework and algorithm provide a significant reduction in the compression rate up to 67.84% with no accuracy loss compared to the default JPEG.

We consider a line network of nodes, connected by additive white noise channels, equipped with local feedback. We study the velocity at which information spreads over this network. For transmission of a data packet, we give an explicit positive lower bound on the velocity, for any packet size. Furthermore, we consider streaming, that is, transmission of data packets generated at a given average arrival rate. We show that a positive velocity exists as long as the arrival rate is below the individual Gaussian channel capacity, and provide an explicit lower bound. Our analysis involves applying pulse-amplitude modulation to the data (successively in the streaming case), and using linear mean-squared error estimation at the network nodes. For general white noise, we derive exponential error-probability bounds. For single-packet transmission over channels with (sub-)Gaussian noise, we show a doubly-exponential behavior, which reduces to the celebrated Schalkwijk–Kailath scheme when considering a single node. Viewing the constellation as an “analog source”, we also provide bounds on the exponential decay of the mean-squared error of source transmission over the network.

The problem of coding for the uplink and downlink of cloud radio access networks (C-RAN’s) with K users and L relays is considered. It is shown that low-complexity coding schemes that achieve any point in the rate-fronthaul region of joint coding and compression can be constructed starting from at most $4(K+L)-2$ point-to-point codes designed for symmetric channels. This reduces the seemingly hard task of constructing good codes for C-RAN’s to the much better understood task of finding good codes for single-user channels. To show this result, an equivalence between the achievable rate-fronthaul regions of joint coding and successive coding is established. Then, rate-splitting and quantization-splitting techniques are used to show that the task of achieving a rate-fronthaul point in the joint coding region can be simplified to that of achieving a corner point in a higher-dimensional C-RAN problem. As a by-product, some interesting properties of the rate-fronthaul region of joint decoding for uplink C-RAN’s are also derived.

The Shannon lower bound has been the subject of several important contributions by Berger. This paper surveys Shannon bounds on rate-distortion problems under mean-squared error distortion with a particular emphasis on Berger’s techniques. Moreover, as a new result, the Gray-Wyner network is added to the canon of settings for which such bounds are known. In the Shannon bounding technique, elegant lower bounds are expressed in terms of the source entropy power. Moreover, there is often a complementary upper bound that involves the source variance in such a way that the bounds coincide in the special case of Gaussian statistics. Such pairs of bounds are sometimes referred to as Shannon bounds. The present paper puts Berger’s work on many aspects of this problem in the context of more recent developments, encompassing indirect and remote source coding such as the CEO problem, originally proposed by Berger, as well as the Gray-Wyner network as a new contribution.

Proactive testing and interventions are crucial for disease containment during a pandemic until widespread vaccination is achieved. However, a key challenge remains: Can we accurately identify all new daily infections with only a fraction of tests needed compared to testing everyone, everyday? Group testing reduces the number of tests but overlooks infection dynamics and non i.i.d nature of infections in a community, while on the other hand traditional SIR (Susceptible-Infected-Recovered) models address these dynamics but don’t integrate discrete-time testing and interventions. This paper bridges the gap. We propose a “discrete-time SIR stochastic block model” that incorporates group testing and daily interventions, as a discrete counterpart to the well-known continuous-time SIR model that reflects community structure through a specific weighted graph. We analyze the model to determine the minimum number of daily group tests required to identify all infections with vanishing error probability. We find that one can leverage the knowledge of the community and the model to inform nonadaptive group testing algorithms that are order-optimal, and therefore achieve the same performance as complete testing using a much smaller number of tests.