## Computer Science (cs) updates on the arXiv.org e-print archive



There is increasing regulatory interest in whether machine learning algorithms deployed in consequential domains (e.g. in criminal justice) treat different demographic groups "fairly." However, there are several proposed notions of fairness, typically mutually incompatible. Using criminal justice as an example, we study a model in which society chooses an incarceration rule. Agents of different demographic groups differ in their outside options (e.g. opportunity for legal employment) and decide whether to commit crimes. We show that equalizing type I and type II errors across groups is consistent with the goal of minimizing the overall crime rate; other popular notions of fairness are not.

This study presents the analytical and finite element formulation of a geometrically nonlinear and fractional-order nonlocal model of an Euler-Bernoulli beam. The finite nonlocal strains in the Euler-Bernoulli beam are obtained from a frame-invariant and dimensionally consistent fractional-order (nonlocal) continuum formulation. The finite fractional strain theory provides a positive definite formulation that results in a unique solution which is consistent across loading and boundary conditions. The governing equations and the corresponding boundary conditions of the geometrically nonlinear and nonlocal Euler-Bernoulli beam are obtained using variational principles. Further, a nonlinear finite element model for the fractional-order system is developed in order to achieve the numerical solution of the integro-differential nonlinear governing equations. Following a thorough validation with benchmark problems, the fractional finite element model (f-FEM) is used to study the geometrically nonlinear response of a nonlocal beam subject to various loading and boundary conditions. Although presented in the context of a 1D beam, this nonlinear f-FEM formulation can be extended to higher dimensional fractional-order boundary value problems.

An $\alpha$-additive spanner of an undirected graph $G=(V, E)$ is a subgraph $H$ such that the distance between any two vertices in $G$ is stretched by no more than an additive factor of $\alpha$. It was previously known that unweighted graphs have 2-, 4-, and 6-additive spanners containing $\widetilde{O}(n^{3/2})$, $\widetilde{O}(n^{7/5})$, and $O(n^{4/3})$ edges, respectively. In this paper, we generalize these results to weighted graphs. We consider $\alpha=2W$, $4W$, $6W$, where $W$ is the maximum edge weight in $G$. We first show that every $n$-node graph has a subsetwise $2W$-spanner on $O(n |S|^{1/2})$ edges where $S \subseteq V$ and $W$ is a constant. We then show that for every set $P$ with $|P| = p$ vertex demand pairs, there are pairwise $2W$- and $4W$-spanners on $O(np^{1/3})$ and $O(np^{2/7})$ edges respectively. We also show that for every set $P$, there is a $6W$-spanner on $O(np^{1/4})$ edges where $W$ is a constant. We then show that every graph has additive $2W$- and $4W$-spanners on $O(n^{3/2})$ and $O(n^{7/5})$ edges respectively. Finally, we show that every graph has an additive $6W$-spanner on $O(n^{4/3})$ edges where $W$ is a constant.
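To make the stretch guarantee concrete, the sketch below (stdlib-only Python; the helper names are ours, not the paper's) checks whether a subgraph $H$ is an $\alpha$-additive spanner of $G$ by comparing all-pairs shortest-path distances:

```python
import heapq

def dijkstra(adj, src):
    """Single-source shortest paths; adj maps node -> [(neighbor, weight)]."""
    dist = {src: 0}
    pq = [(0, src)]
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist.get(u, float("inf")):
            continue
        for v, w in adj[u]:
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(pq, (nd, v))
    return dist

def is_additive_spanner(g_adj, h_adj, alpha):
    """True iff dist_H(u, v) <= dist_G(u, v) + alpha for all vertex pairs."""
    for u in g_adj:
        dg = dijkstra(g_adj, u)
        dh = dijkstra(h_adj, u)
        for v, d in dg.items():
            if dh.get(v, float("inf")) > d + alpha:
                return False
    return True
```

For example, deleting one edge of a unit-weight 4-cycle leaves a subgraph that is a 2-additive but not a 1-additive spanner.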

A recent research theme has been the development of automatic methods to minimize robots' resource footprints. In particular, the class of combinatorial filters (discrete variants of widely-used probabilistic estimators) has been studied, and methods have been developed for automatically reducing their space requirements. This paper extends existing combinatorial filters by introducing a natural generalization that we dub cover combinatorial filters. In addressing the new---but still NP-complete---problem of minimization of cover filters, this paper shows that three concepts previously believed to hold for combinatorial filters (and actually conjectured, claimed, or assumed to) are in fact false. For instance, minimization does not induce an equivalence relation. We give an exact algorithm for the cover filter minimization problem. Unlike prior work (based on graph coloring), we consider a type of clique-cover problem involving a new conditional constraint, from which we can find more general relations. In addition to solving the more general problem, the algorithm we present also corrects flaws present in all prior filter reduction methods. The algorithm also forms a promising basis for practical future development, as it involves a reduction to SAT.

Conventional high-speed Wi-Fi has recently become a contender for low-power Internet-of-Things (IoT) communications. OFDM continues its adoption in the new IoT Wi-Fi standard due to its spectrum efficiency, which can support the demands of massive IoT connectivity. While the IoT Wi-Fi standard offers many new features to improve power and spectrum efficiency, the basic physical layer (PHY) structure of the transceiver design still conforms to its conventional design rationale, in which access points (APs) and clients employ the same OFDM PHY. In this paper, we argue that the current Wi-Fi PHY design does not take full advantage of the inherent asymmetry between the AP and IoT devices. To fill the gap, we propose an asymmetric design in which IoT devices transmit uplink packets using the lowest power while pushing all the decoding burden to the AP side. Such a design uses the ample power and computational resources at the AP to trade for the transmission (TX) power of IoT devices. The core technique enabling this asymmetric design is that the AP exploits its high clock rate to boost its decoding ability. We provide an implementation of our design and show that it can reduce the IoT device's TX power by up to 88% when the AP uses an $8\times$ clock rate.

To study radiotherapy-related adverse effects, detailed dose information (3D distribution) is needed for accurate dose-effect modeling. For childhood cancer survivors who underwent radiotherapy in the pre-CT era, only 2D radiographs were acquired; thus, 3D dose distributions must be reconstructed. State-of-the-art methods achieve this by using 3D surrogate anatomies. These can, however, lack personalization and lead to coarse reconstructions. We present and validate a surrogate-free dose reconstruction method based on Machine Learning (ML). Abdominal planning CTs (n=142) of recently treated childhood cancer patients were gathered, their organs at risk were segmented, and 300 artificial Wilms' tumor plans were sampled automatically. Each artificial plan was automatically emulated on the 142 CTs, resulting in 42,600 3D dose distributions from which dose-volume metrics were derived. Anatomical features were extracted from digitally reconstructed radiographs simulated from the CTs to resemble historical radiographs. Further, patient and radiotherapy plan features typically available from historical treatment records were collected. An evolutionary ML algorithm was then used to link features to dose-volume metrics. Besides 5-fold cross-validation, a further evaluation was done on an independent dataset of five CTs, each associated with two clinical plans. Cross-validation resulted in mean absolute errors $\leq$0.6 Gy for organs completely inside or outside the field. For organs positioned at the edge of the field, mean absolute errors $\leq$1.7 Gy for $D_{mean}$, $\leq$2.9 Gy for $D_{2cc}$, and $\leq$13% for $V_{5Gy}$ and $V_{10Gy}$ were obtained, without systematic bias. Similar results were found for the independent dataset. To conclude, our novel organ dose reconstruction method is not only accurate but also efficient, as the setup of a surrogate is no longer needed.

Domain-specific software and hardware co-design is promising, as it is much easier to achieve efficiency for a narrower set of tasks. Agile domain-specific benchmarking speeds up the process, as it provides not only relevant design inputs but also relevant metrics and tools. Unfortunately, modern workloads such as Big Data, AI, and Internet services dwarf traditional workloads in terms of code size, deployment scale, and execution path, and hence raise serious benchmarking challenges.

This paper proposes an agile domain-specific benchmarking methodology. Together with seventeen industry partners, we identify ten important end-to-end application scenarios, from which sixteen representative AI tasks are distilled as the AI component benchmarks. We propose permutations of essential AI and non-AI component benchmarks as end-to-end benchmarks. An end-to-end benchmark is a distillation of the essential attributes of an industry-scale application. We design and implement a highly extensible, configurable, and flexible benchmark framework, on the basis of which we propose guidelines for building end-to-end benchmarks and present the first end-to-end Internet service AI benchmark.

The preliminary evaluation shows the value of our benchmark suite, AIBench, against MLPerf and TailBench for hardware and software designers, micro-architectural researchers, and code developers. The specifications, source code, testbed, and results are publicly available from the web site \url{this http URL}.

A disc packing in the plane is compact if its contact graph is a triangulation. There are $9$ values of $r$ such that a compact packing by discs of radii $1$ and $r$ exists. We prove, for each of these $9$ values, that the maximal density over all packings by discs of radii $1$ and $r$ is attained by a compact packing, which we exhibit together with its density.

Policy evaluation is a key process in Reinforcement Learning (RL). It assesses a given policy by estimating the corresponding value function. When using parameterized value functions, common approaches minimize the sum of squared Bellman temporal-difference errors and obtain a point estimate for the parameters. Kalman-based and Gaussian-process-based frameworks have been suggested to evaluate the policy by treating the value as a random variable. These frameworks can learn uncertainties over the value parameters and exploit them for policy exploration. When adapting these frameworks to deep RL tasks, several limitations are revealed: excessive computation in each optimization step; difficulty handling batches of samples, which slows training; and the effect of memory in stochastic environments, which prevents off-policy learning. In this work, we discuss these limitations and propose to overcome them with an alternative general framework based on the extended Kalman filter. We devise an optimization method, called Kalman Optimization for Value Approximation (KOVA), that can be incorporated as a policy evaluation component in policy optimization algorithms. KOVA minimizes a regularized objective function that accounts for both parameter and noisy return uncertainties. We analyze the properties of KOVA and present its performance on deep RL control tasks.
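KOVA itself optimizes a regularized objective; as a minimal illustration of the underlying idea only (tracking value parameters as a random variable with a Kalman filter), here is a sketch of a single Kalman update for a linear value function. The function name, noise settings, and linear parameterization are our assumptions, not the paper's algorithm.

```python
import numpy as np

def ekf_value_update(theta, P, phi, target, obs_noise=1.0, process_noise=1e-3):
    """One Kalman update of value-function parameters theta.

    theta:  (d,) parameter mean;  P: (d, d) parameter covariance;
    phi:    (d,) feature vector of the visited state;
    target: observed (noisy) return for that state.
    """
    P = P + process_noise * np.eye(len(theta))   # predict step
    H = phi.reshape(1, -1)                       # Jacobian of V(s) = theta @ phi
    S = H @ P @ H.T + obs_noise                  # innovation variance
    K = (P @ H.T) / S                            # Kalman gain, shape (d, 1)
    innovation = target - theta @ phi
    theta = theta + K.flatten() * innovation     # correct the mean
    P = P - K @ H @ P                            # shrink the covariance
    return theta, P
```

After each update, the diagonal of `P` quantifies the remaining uncertainty per parameter, which is what such frameworks exploit for exploration.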

Impersonators on Online Social Networks such as Instagram play an important role in the propagation of content. These entities are nefarious fake accounts that intend to disguise themselves as a legitimate account by creating similar profiles. In addition to observing impersonated profiles, we observed considerable engagement from these entities with the published posts of verified accounts. To that end, we study the engagement of impersonators, in terms of active and passive engagement, in three major communities on Instagram: "Politician", "News agency", and "Sports star". Inside each community, four verified accounts were selected. Based on the approach implemented in our previous studies, we collected 4.8K comments and 2.6K likes across 566 posts, created by 3.8K impersonators, over 7 months. Our study sheds light on this interesting phenomenon and provides surprising observations that can help us better understand how impersonators engage inside Instagram in terms of writing comments and leaving likes.

In the second decade of the 21st century, blockchain became one of the most trending computational technologies. This research questions the feasibility and suitability of using blockchain technology within e-voting systems, regarding both technical and non-technical aspects. Although adoption has been considerably slow, several countries already use means of e-voting for many social and economic reasons, which we further investigate. Moreover, the number of countries offering various e-government solutions apart from e-voting is significantly high. E-voting systems naturally require much more attention and assurance regarding potential security and anonymity issues, since voting is one of the few extremely critical governmental processes. Nevertheless, e-voting is not purely a governmental service: many companies and nonprofit organizations would benefit from the cost-efficiency, scalability, remote accessibility, and ease of use that it provides. Blockchain technology is claimed to address some, though obviously not all, important security concerns, including anonymity, confidentiality, integrity, and non-repudiation. The analysis results presented in this article mostly confirm these claims.

A Wireless Sensor Network (WSN) is a collection of tiny nodes with low energy budgets; WSNs have become an essential component of modern communication infrastructure and are important in both industry and academia. Energy is crucial in WSNs: research on WSN design centers on energy efficiency, and reducing node energy consumption is a major challenge for extending a WSN's lifetime. It may be costly or even impossible to charge or replace depleted batteries in difficult environments. This article surveys energy-efficiency methods that decrease energy consumption, improve network performance, and increase network lifetime.

The recently proposed Multilinear Compressive Learning (MCL) framework combines Multilinear Compressive Sensing and Machine Learning into an end-to-end system that takes into account the multidimensional structure of the signals when designing the sensing and feature synthesis components. The key idea behind MCL is the assumption of the existence of a tensor subspace which can capture the essential features from the signal for the downstream learning task. Thus, the ability to find such a discriminative tensor subspace and optimize the system to project the signals onto that data manifold plays an important role in Multilinear Compressive Learning. In this paper, we propose a novel solution to address both of the aforementioned requirements, i.e., how to find those tensor subspaces in which the signals of interest are highly separable, and how to optimize the sensing and feature synthesis components to transform the original signals to the data manifold found in the first question. In our proposal, the discovery of a high-quality data manifold is conducted by training a nonlinear compressive learning system on the inference task. Its knowledge of the data manifold of interest is then progressively transferred to the MCL components via multi-stage supervised training, with the supervisory information encoding how the compressed measurements, the synthesized features, and the predictions should look. The proposed knowledge transfer algorithm also comes with a semi-supervised adaptation that enables compressive learning models to utilize unlabeled data effectively. Extensive experiments demonstrate that the proposed knowledge transfer method can effectively train MCL models to compressively sense and synthesize better features for the learning tasks, with improved performance especially as the complexity of the learning task increases.

Graph neural networks (GNNs) have achieved outstanding performance in learning graph-structured data. Many current GNNs suffer from three problems when facing large graphs and using a deeper structure: neighbor explosion, node dependence, and oversmoothing. In this paper, we propose a general subgraph-based training framework, namely Ripple Walk Training (RWT), for deep and large graph neural networks. RWT samples subgraphs from the full graph to constitute a mini-batch, and the full GNN is updated based on the mini-batch gradient. We theoretically analyze what makes a high-quality subgraph for a mini-batch. A novel sampling method, Ripple Walk Sampler, samples these high-quality subgraphs to constitute the mini-batch, considering both the randomness and connectivity of the graph-structured data. Extensive experiments on graphs of different sizes demonstrate the effectiveness of RWT in training various GNNs (GCN & GAT).
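The subgraph sampling idea can be sketched in a few lines. The function below is an illustrative ripple-style expansion (the name, the expansion ratio, and the stopping rule are our assumptions, not the authors' exact Ripple Walk Sampler): starting from a random seed node, a random fraction of the current frontier joins the subgraph at each step, which balances randomness against connectivity.

```python
import random

def ripple_walk_sample(adj, target_size, expand_ratio=0.5, rng=random):
    """Sample a connected subgraph by iteratively 'rippling' outward.

    adj: dict mapping node -> set of neighbors.  At each step a random
    fraction of the frontier (neighbors of the current subgraph) is
    absorbed, so the result stays connected by construction.
    """
    seed = rng.choice(list(adj))
    sub = {seed}
    while len(sub) < target_size:
        frontier = set().union(*(adj[v] for v in sub)) - sub
        if not frontier:                       # connected component exhausted
            break
        k = max(1, int(expand_ratio * len(frontier)))
        sub.update(rng.sample(sorted(frontier), min(k, target_size - len(sub))))
    return sub
```

A mini-batch is then a handful of such subgraphs, and the GNN gradient is computed on them instead of the full graph.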

There is an extensive literature on dynamic algorithms for a large number of graph-theoretic problems, particularly for all varieties of shortest path problems. Germane to this paper are a number of fully dynamic algorithms known for chordal graphs. However, to the best of our knowledge, no study has been done on dynamic algorithms for strongly chordal graphs. To address this gap, in this paper we propose a semi-dynamic algorithm for edge deletions and a semi-dynamic algorithm for edge insertions in a strongly chordal graph, $G = (V, E)$, on $n$ vertices and $m$ edges. The query complexity of an edge deletion is $O(d_u^2 d_v^2 (n + m))$, where $d_u$ and $d_v$ are the degrees of the vertices $u$ and $v$ of the candidate edge $\{u, v\}$, while the query complexity of an edge insertion is $O(n^2)$.

In 1992, Nisan (Combinatorica'92) constructed a pseudorandom generator for length-$n$, width-$w$ read-once branching programs with error $\varepsilon$ and seed length $O(\log n\cdot \log(nw)+\log n\cdot\log(1/\varepsilon))$. A central question in complexity theory is to reduce the seed length to $O(\log (nw/\varepsilon))$, which would imply $\mathbf{BPL}=\mathbf{L}$. However, there has been no improvement on Nisan's construction for the case $n=w$, which is most relevant to space-bounded derandomization.

Recently, in a beautiful work, Braverman, Cohen and Garg (STOC'18) introduced the notion of a pseudorandom pseudo-distribution (PRPD) and gave an explicit construction of a PRPD with seed length $\tilde{O}(\log n\cdot \log(nw)+\log(1/\varepsilon))$. A PRPD is a relaxation of a pseudorandom generator, which suffices for derandomizing $\mathbf{BPL}$ and also implies a hitting set. Unfortunately, their construction is quite involved and complicated. Hoza and Zuckerman (FOCS'18) later constructed a much simpler hitting set generator with seed length $O(\log n\cdot \log(nw)+\log(1/\varepsilon))$, but their techniques are restricted to hitting sets.

In this work, we construct a PRPD with seed length $$O(\log n\cdot \log (nw)\cdot \log\log(nw)+\log(1/\varepsilon)).$$ This improves upon the construction in [BCG18] by an $O(\log\log(1/\varepsilon))$ factor and is optimal in the small-error regime. In addition, we believe our construction and analysis to be simpler than the work of Braverman, Cohen and Garg.

We describe a new method to remove short cycles on regular graphs while maintaining spectral bounds (the nontrivial eigenvalues of the adjacency matrix), as long as the graphs have certain combinatorial properties. These combinatorial properties are related to the number and distance between short cycles and are known to happen with high probability in uniformly random regular graphs.

Using this method we can show two results involving high-girth spectral expander graphs. First, we show that given $d \geq 3$ and $n$, there exists an explicit distribution of $d$-regular $\Theta(n)$-vertex graphs where, with high probability, its samples have girth $\Omega(\log_{d - 1} n)$ and are $\epsilon$-near-Ramanujan; i.e., their eigenvalues are bounded in magnitude by $2\sqrt{d - 1} + \epsilon$ (excluding the single trivial eigenvalue of $d$). Then, for every constant $d \geq 3$ and $\epsilon > 0$, we give a deterministic poly$(n)$-time algorithm that outputs a $d$-regular graph on $\Theta(n)$ vertices that is $\epsilon$-near-Ramanujan and has girth $\Omega(\sqrt{\log n})$, based on the work of arXiv:1909.06988.

The multi-armed bandit formalism has been extensively studied under various attack models, in which an adversary can modify the reward revealed to the player. Previous studies focused on scenarios where the attack value either is bounded at each round or has a vanishing probability of occurrence. These models do not capture powerful adversaries that can catastrophically perturb the revealed reward. This paper investigates the attack model where an adversary attacks with a certain probability at each round, and its attack value can be arbitrary and unbounded if it attacks. Furthermore, the attack value does not necessarily follow a statistical distribution. We propose a novel sample median-based and exploration-aided UCB algorithm (called med-E-UCB) and a median-based $\epsilon$-greedy algorithm (called med-$\epsilon$-greedy). Both of these algorithms are provably robust to the aforementioned attack model. More specifically, we show that both algorithms achieve $\mathcal{O}(\log T)$ pseudo-regret (i.e., the optimal regret without attacks). We also provide a high probability guarantee of $\mathcal{O}(\log T)$ regret with respect to random rewards and random occurrence of attacks. These bounds are achieved under arbitrary and unbounded reward perturbation as long as the attack probability does not exceed a certain constant threshold. We provide multiple synthetic simulations of the proposed algorithms to verify these claims and showcase the inability of existing techniques to achieve sublinear regret. We also provide experimental results of the algorithm operating in a cognitive radio setting using multiple software-defined radios.
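The robustness intuition is that a median, unlike a mean, is unmoved by a few arbitrarily corrupted rewards. The sketch below is a minimal median-based UCB index (illustrative only; the paper's med-E-UCB adds an exploration-aiding schedule not reproduced here, and the constant `c` is our assumption):

```python
import math
import statistics

def median_ucb_pick(rewards, t, c=2.0):
    """Pick an arm by a median-plus-bonus index.

    rewards: list of per-arm reward histories; t: current round.
    Using the sample median instead of the mean keeps the index
    bounded even when some rewards are arbitrarily corrupted.
    """
    best, best_idx = -math.inf, 0
    for i, hist in enumerate(rewards):
        if not hist:                       # play each arm once first
            return i
        bonus = math.sqrt(c * math.log(t) / len(hist))
        idx = statistics.median(hist) + bonus
        if idx > best:
            best, best_idx = idx, i
    return best_idx
```

With one reward of arm 0 replaced by $-1000$, a mean-based index would abandon the arm, while the median-based index still prefers it.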

This paper proposes a framework for distributed, in-storage training of neural networks on clusters of computational storage devices. Such devices not only contain hardware accelerators but also eliminate data movement between the host and storage, resulting in both improved performance and power savings. More importantly, this in-storage processing style of training ensures that private data never leaves the storage while fully controlling the sharing of public data. Experimental results show up to 2.7x speedup, a 69% reduction in energy consumption, and no significant loss in accuracy.

To make decisions based on a model fit by Auto-Encoding Variational Bayes (AEVB), practitioners typically use importance sampling to estimate a functional of the posterior distribution. The variational distribution found by AEVB serves as the proposal distribution for importance sampling. However, this proposal distribution may give unreliable (high variance) importance sampling estimates, thus leading to poor decisions. We explore how changing the objective function for learning the variational distribution, while continuing to learn the generative model based on the ELBO, affects the quality of downstream decisions. For a particular model, we characterize the error of importance sampling as a function of posterior variance and show that proposal distributions learned with evidence upper bounds are better. Motivated by these theoretical results, we propose a novel variant of the VAE. In addition to experimenting with MNIST, we present a full-fledged application of the proposed method to single-cell RNA sequencing. In this challenging instance of multiple hypothesis testing, the proposed method surpasses the current state of the art.
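The role of the proposal distribution is easy to see in code. Below is a generic self-normalized importance sampler (not the paper's estimator; all names are our assumptions) that estimates $E_p[f]$ from proposal samples and also reports the effective sample size, which collapses when the proposal yields high-variance weights. Unnormalized log-densities suffice, since constants cancel in the self-normalization.

```python
import math
import random

def snis_estimate(f, log_p, log_q, sample_q, n=1000, rng=random):
    """Self-normalized importance sampling of E_p[f] with proposal q.

    A few weights dominating all others (low effective sample size)
    signals that the proposal poorly covers the target posterior.
    """
    xs = [sample_q(rng) for _ in range(n)]
    log_w = [log_p(x) - log_q(x) for x in xs]
    m = max(log_w)
    w = [math.exp(lw - m) for lw in log_w]           # numerically stabilized
    z = sum(w)
    est = sum(wi * f(x) for wi, x in zip(w, xs)) / z
    ess = z * z / sum(wi * wi for wi in w)           # effective sample size
    return est, ess
```

For a standard-normal target and a wider $N(0, 2^2)$ proposal, the estimate of $E_p[x^2]$ is close to 1 with a healthy effective sample size.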

Type-two constructions abound in cryptography: adversaries for encryption and authentication schemes, if active, are modeled as algorithms having access to oracles, i.e. as second-order algorithms. But how about making cryptographic schemes themselves higher-order? This paper gives an answer to this question, by first describing why higher-order cryptography is interesting as an object of study, then showing how the concept of probabilistic polynomial time algorithm can be generalized so as to encompass algorithms of order strictly higher than two, and finally proving some positive and negative results about the existence of higher-order cryptographic primitives, namely authentication schemes and pseudorandom functions.

The support vector machine (SVM) and deep learning (e.g., convolutional neural networks (CNNs)) are among the best-known algorithms for small and big data, respectively. Nonetheless, small datasets may be very important, costly, and not easy to obtain in a short time. This paper proposes a novel convolutional SVM (CSVM) that combines the advantages of CNNs and SVMs to improve the accuracy and effectiveness of mining smaller datasets. The proposed CSVM adapts the convolution product from CNNs to learn new information hidden deeply in the datasets. In addition, it uses a modified simplified swarm optimization (SSO) to train the CSVM to update classifiers, with the traditional SVM implemented as the fitness function for the SSO to estimate accuracy. To evaluate the performance of the proposed CSVM, experiments were conducted on five well-known benchmark databases for the classification problem. Numerical results compared favorably with those obtained using SVM, a 3-layer artificial neural network (ANN), and a 4-layer ANN. These experiments verify that the proposed CSVM with the proposed SSO can effectively increase classification accuracy.

With increasing technology development, there is a massive number of websites with varying purposes. But a particular type exists within this large collection: so-called phishing sites, which aim to deceive their users. The main challenge in detecting phishing websites is discovering the techniques that have been used, as phishers continually improve their strategies and create web pages that evade many forms of detection. It is therefore necessary to develop reliable, active, and contemporary phishing detection methods to combat the adaptive techniques used by phishers. In this paper, different phishing detection approaches are reviewed and classified into three main groups. Then, the proposed model is presented in two stages. In the first stage, different machine learning algorithms are applied to validate the chosen dataset, and feature selection methods are applied to it; the best accuracy, 98.11%, was achieved by combining only 20 of the 48 features with Random Forest. In the second stage, the same dataset is applied to various fuzzy logic algorithms, with remarkable experimental results: applying the FURIA algorithm with only five features, the accuracy rate was 99.98%. Finally, the results of the machine learning and fuzzy logic algorithms are compared and discussed; the performance of the fuzzy logic algorithms exceeds that of the machine learning algorithms.

The choice of activation function can have a large effect on the performance of a neural network. While there have been some attempts to hand-engineer novel activation functions, the Rectified Linear Unit (ReLU) remains the most commonly-used in practice. This paper shows that evolutionary algorithms can discover novel activation functions that outperform ReLU. A tree-based search space of candidate activation functions is defined and explored with mutation, crossover, and exhaustive search. Experiments on training wide residual networks on the CIFAR-10 and CIFAR-100 image datasets show that this approach is effective. Replacing ReLU with evolved activation functions results in statistically significant increases in network accuracy. Optimal performance is achieved when evolution is allowed to customize activation functions to a particular task; however, these novel activation functions are shown to generalize, achieving high performance across tasks. Evolutionary optimization of activation functions is therefore a promising new dimension of metalearning in neural networks.
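A minimal version of such a tree-based search space can be sketched as follows (the tuple encoding, the operator set, and the mutation rule are illustrative assumptions; the paper's space and its crossover operator are richer):

```python
import math
import random

# Candidate activations are expression trees over the input "x".
UNARY = {"relu": lambda a: max(a, 0.0), "tanh": math.tanh, "neg": lambda a: -a}
BINARY = {"add": lambda a, b: a + b, "mul": lambda a, b: a * b,
          "max": lambda a, b: max(a, b)}

def evaluate(tree, x):
    """Evaluate an activation-function expression tree at input x."""
    if tree == "x":
        return x
    op, *args = tree
    if op in UNARY:
        return UNARY[op](evaluate(args[0], x))
    return BINARY[op](evaluate(args[0], x), evaluate(args[1], x))

def mutate(tree, rng=random):
    """Swap each operator for a random one of the same arity."""
    if tree == "x":
        return tree
    op, *args = tree
    pool = UNARY if op in UNARY else BINARY
    return (rng.choice(list(pool)), *(mutate(a, rng) for a in args))
```

For instance, the tree `("max", ("relu", "x"), ("neg", "x"))` encodes $\max(\mathrm{relu}(x), -x) = |x|$; evolution explores such trees by mutation, crossover, and fitness evaluation on a validation set.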

Rate splitting (RS) is a potentially powerful and flexible technique for multi-antenna downlink transmission. In this paper, we address several technical challenges towards its practical implementation for beyond 5G systems. To this end, we focus on a single-cell system with a multi-antenna base station (BS) and K single-antenna receivers. We consider RS in its most general form, and joint decoding to fully exploit the potential of RS. First, we investigate the achievable rates under joint decoding and formulate the precoder design problems to maximize a general utility function, or to minimize the transmit power under pre-defined rate targets. Building upon the concave-convex procedure (CCCP), we propose precoder design algorithms for an arbitrary number of users. Our proposed algorithms approximate the intractable non-convex problems with a number of successively refined convex problems, and provably converge to stationary points of the original problems. Then, to reduce the decoding complexity, we consider the optimization of the precoder and the decoding order under successive decoding. Further, we propose a stream selection algorithm to reduce the number of precoded signals. With a reduced number of streams and successive decoding at the receivers, our proposed algorithm can even be implemented when the number of users is relatively large, whereas the complexity was previously considered as prohibitively high in the same setting. Finally, we propose a simple adaptation of our algorithms to account for the imperfection of the channel state information at the transmitter. Numerical results demonstrate that the general RS scheme provides a substantial performance gain as compared to state-of-the-art linear precoding schemes, especially with a moderately large number of users.

Face frontalization provides an effective and efficient way for face data augmentation and further improves the face recognition performance in extreme pose scenarios. Despite recent advances in deep learning-based face synthesis approaches, this problem is still challenging due to significant pose and illumination discrepancy. In this paper, we present a novel Dual-Attention Generative Adversarial Network (DA-GAN) for photo-realistic face frontalization by capturing both contextual dependencies and local consistency during GAN training. Specifically, a self-attention-based generator is introduced to integrate local features with their long-range dependencies, yielding better feature representations and hence generating faces that preserve identities better, especially for larger pose angles. Moreover, a novel face-attention-based discriminator is applied to emphasize local features of face regions, and hence reinforce the realism of synthetic frontal faces. Guided by semantic segmentation, four independent discriminators are used to distinguish between different aspects of a face (i.e., skin, keypoints, hairline, and frontalized face). By introducing these two complementary attention mechanisms in the generator and discriminator separately, we can learn a richer feature representation and generate identity-preserving inference of frontal views with much finer details (i.e., more accurate facial appearance and textures) compared to the state of the art. Quantitative and qualitative experimental results demonstrate the effectiveness and efficiency of our DA-GAN approach.

Many sequence-to-sequence generation tasks, including machine translation and text-to-speech, can be posed as estimating the density of the output y given the input x: p(y|x). Given this interpretation, it is natural to evaluate sequence-to-sequence models using conditional log-likelihood on a test set. However, the goal of sequence-to-sequence generation (or structured prediction) is to find the best output y^ given an input x, and each task has its own downstream metric R that scores a model output by comparing against a set of references y*: R(y^, y* | x). While we hope that a model that excels in density estimation also performs well on the downstream metric, the exact correlation has not been studied for sequence generation tasks. In this paper, by comparing several density estimators on five machine translation tasks, we find that the correlation between rankings of models based on log-likelihood and BLEU varies significantly depending on the range of the model families being compared. First, log-likelihood is highly correlated with BLEU when we consider models within the same family (e.g. autoregressive models, or latent variable models with the same parameterization of the prior). However, we observe no correlation between rankings of models across different families: (1) among non-autoregressive latent variable models, a flexible prior distribution is better at density estimation but gives worse generation quality than a simple prior, and (2) autoregressive models offer the best translation performance overall, while latent variable models with a normalizing flow prior give the highest held-out log-likelihood across all datasets. Therefore, we recommend using a simple prior for the latent variable non-autoregressive model when fast generation speed is desired.

In this work, we establish lower-bounds against memory bounded algorithms for distinguishing between natural pairs of related distributions from samples that arrive in a streaming setting.

In our first result, we show that any algorithm that distinguishes between uniform distribution on $\{0,1\}^n$ and uniform distribution on an $n/2$-dimensional linear subspace of $\{0,1\}^n$ with non-negligible advantage needs $2^{\Omega(n)}$ samples or $\Omega(n^2)$ memory.

Our second result applies to distinguishing outputs of Goldreich's local pseudorandom generator from the uniform distribution on the output domain. Specifically, Goldreich's pseudorandom generator $G$ fixes a predicate $P:\{0,1\}^k \rightarrow \{0,1\}$ and a collection of subsets $S_1, S_2, \ldots, S_m \subseteq [n]$ of size $k$. For any seed $x \in \{0,1\}^n$, it outputs $P(x_{S_1}), P(x_{S_2}), \ldots, P(x_{S_m})$ where $x_{S_i}$ is the projection of $x$ to the coordinates in $S_i$. We prove that whenever $P$ is $t$-resilient (all non-zero Fourier coefficients of $(-1)^P$ are of degree $t$ or higher), then no algorithm, with $<n^\epsilon$ memory, can distinguish the output of $G$ from the uniform distribution on $\{0,1\}^m$ with a large inverse polynomial advantage, for stretch $m \le \left(\frac{n}{t}\right)^{\frac{(1-\epsilon)}{36}\cdot t}$ (barring some restrictions on $k$). The lower bound holds in the streaming model where at each time step $i$, $S_i\subseteq [n]$ is a randomly chosen (ordered) subset of size $k$ and the distinguisher sees either $P(x_{S_i})$ or a uniformly random bit along with $S_i$.

Our proof builds on the recently developed machinery for proving time-space trade-offs (Raz 2016 and follow-ups) for search/learning problems.

In this work we present a new method for black-box optimization and constraint satisfaction. Existing algorithms that attempt to solve this problem cannot consider multiple modes and cannot adapt to changes in environment dynamics. To address these issues, we developed a modified Cross-Entropy Method (CEM) that uses a masked auto-regressive neural network for modeling uniform distributions over the solution space. We train the model using maximum entropy policy gradient methods from reinforcement learning. Our algorithm can express complicated solution spaces, allowing it to track a variety of different solution regions. We empirically compare our algorithm with variations of CEM, including one with a Gaussian prior with fixed variance, and demonstrate better performance in terms of the number of diverse solutions found, mode discovery in multi-modal problems, and, in certain cases, sample efficiency.
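
For reference, a minimal sketch of the fixed-variance Gaussian CEM baseline the paper compares against (not the proposed auto-regressive method; all parameter values are illustrative):

```python
import numpy as np

def cem_minimize(f, dim, n_iter=50, pop=100, elite_frac=0.2, sigma=1.0, seed=0):
    """Classic Cross-Entropy Method: sample a population, keep the elite
    fraction with the lowest cost, and refit the Gaussian mean to the elites.
    The variance is kept fixed, as in the fixed-variance variant above."""
    rng = np.random.default_rng(seed)
    mu = np.zeros(dim)
    n_elite = max(1, int(pop * elite_frac))
    for _ in range(n_iter):
        samples = mu + sigma * rng.standard_normal((pop, dim))
        costs = np.array([f(s) for s in samples])
        elite = samples[np.argsort(costs)[:n_elite]]
        mu = elite.mean(axis=0)
    return mu

# Minimize a shifted quadratic; the mean should approach the optimum (3, -2).
best = cem_minimize(lambda x: np.sum((x - np.array([3.0, -2.0])) ** 2), dim=2)
```

Because the distribution collapses onto a single mean, this baseline can track only one solution region at a time, which is the limitation the paper's method targets.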

We propose a method for extracting hierarchical backbones from a bipartite network. Our method leverages the observation that a hierarchical relationship between two nodes in a bipartite network is often manifested as an asymmetry in the conditional probability of observing the connections to them from the other node set. Our method estimates both the importance and direction of the hierarchical relationship between a pair of nodes, thereby providing a flexible way to identify the essential part of the networks. Using semi-synthetic benchmarks, we show that our method outperforms existing methods at identifying planted hierarchy while offering more flexibility. Application of our method to empirical datasets---a bipartite network of skills and individuals as well as the network between gene products and Gene Ontology (GO) terms---demonstrates the possibility of automatically extracting or augmenting ontology from data.
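
The asymmetry idea can be illustrated on a toy bipartite network. The score below is a hypothetical simplification of the spirit of the method (comparing the two conditional connection probabilities), not the paper's exact estimator:

```python
import numpy as np

# Toy bipartite adjacency: rows are "skills", columns are "individuals".
A = np.array([
    [1, 1, 1, 1, 1, 1],   # a general skill, held by everyone
    [1, 1, 1, 0, 0, 0],   # a specialized skill, held by a subset
])

def conditional(a, b):
    """P(connected to row a | connected to row b) = |N(a) & N(b)| / |N(b)|."""
    return np.sum(A[a] * A[b]) / np.sum(A[b])

# The specialized skill implies the general one, but not vice versa,
# so the conditional probabilities are asymmetric:
p_general_given_special = conditional(0, 1)
p_special_given_general = conditional(1, 0)
direction = p_general_given_special - p_special_given_general  # > 0: 0 above 1
```

A positive `direction` suggests row 0 sits above row 1 in the hierarchy, which matches the intuition that everyone with the specialized skill also has the general one.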

In this article we present a class of codes with few weights arising from special type of linear sets. We explicitly show the weights of such codes, their weight enumerator and possible choices for their generator matrices. In particular, our construction yields also to linear codes with three weights and, in some cases, to almost MDS codes. The interest for these codes relies on their applications to authentication codes and secret schemes, and their connections with further objects such as association schemes and graphs.

Smoothing deep neural network based classifiers via isotropic Gaussian perturbation has recently been shown to be an effective and scalable way to provide a state-of-the-art probabilistic robustness guarantee against $\ell_2$ norm bounded adversarial perturbations. However, how to train a good base classifier that is both accurate and robust when smoothed has not been fully investigated. In this work, we derive a new regularized risk, in which the regularizer can adaptively encourage the accuracy and robustness of the smoothed counterpart when training the base classifier. It is computationally efficient and can be implemented in parallel with other empirical defense methods. We discuss how to implement it under both standard (non-adversarial) and adversarial training schemes. At the same time, we also design a new certification algorithm, which can leverage the regularization effect to provide a tighter robustness lower bound that holds with high probability. Our extensive experiments demonstrate the effectiveness of the proposed training and certification approaches on the CIFAR-10 and ImageNet datasets.
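
The smoothed classifier being certified is simple to state: it predicts the class the base classifier outputs most often under Gaussian input noise. A Monte Carlo sketch (illustrative only; a certified procedure would add a statistical test on the top class, which this sketch omits):

```python
import numpy as np

def smoothed_predict(base_classifier, x, sigma=0.25, n=1000, seed=0):
    """Monte Carlo estimate of the Gaussian-smoothed classifier
    g(x) = argmax_c P(f(x + eps) = c), with eps ~ N(0, sigma^2 I)."""
    rng = np.random.default_rng(seed)
    noise = sigma * rng.standard_normal((n,) + x.shape)
    preds = np.array([base_classifier(x + e) for e in noise])
    classes, counts = np.unique(preds, return_counts=True)
    return classes[np.argmax(counts)]

# Toy 1-D base classifier: class 1 iff the (noisy) input exceeds 0.
f = lambda z: int(z[0] > 0.0)
label = smoothed_predict(f, np.array([0.3]))
```

The paper's regularizer targets the base classifier `f` so that this smoothed counterpart is both accurate and robust.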

We prove that if $q_1, \ldots, q_m: {\Bbb R}^n \longrightarrow {\Bbb R}$ are quadratic forms in variables $x_1, \ldots, x_n$ such that each $q_k$ depends on at most $r$ variables and each $q_k$ has common variables with at most $r$ other forms, then the average value of the product $\left(1+ q_1\right) \cdots \left(1+q_m\right)$ with respect to the standard Gaussian measure in ${\Bbb R}^n$ can be approximated within relative error $\epsilon >0$ in quasi-polynomial $n^{O(1)} m^{O(\ln m -\ln \epsilon)}$ time, provided $|q_k(x)| \leq \gamma \|x\|^2 /r$ for some absolute constant $\gamma > 0$ and $k=1, \ldots, m$. When $q_k$ are interpreted as pairwise squared distances for configurations of points in Euclidean space, the average can be interpreted as the partition function of systems of particles with mollified logarithmic potentials. We sketch a possible application to testing the feasibility of systems of real quadratic equations.
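
The quantity being approximated can be sanity-checked by Monte Carlo in a case where the answer is known in closed form: for a single form $q(x) = a x_1^2$, the Gaussian average of $1 + q$ is exactly $1 + a$ (this is only a numerical illustration of the target quantity, not the paper's quasi-polynomial algorithm):

```python
import numpy as np

# Monte Carlo estimate of E[1 + q(x)] under the standard Gaussian measure,
# with q(x) = a * x_1^2; the exact value is 1 + a.
rng = np.random.default_rng(0)
a = 0.1
x = rng.standard_normal((200_000, 3))
estimate = float(np.mean(1.0 + a * x[:, 0] ** 2))
```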

Graph embedding has recently gained momentum in the research community, in particular after the introduction of random walk and neural network based approaches. However, most of the embedding approaches focus on representing the local neighborhood of nodes and fail to capture the global graph structure, i.e. to retain the relations to distant nodes. To counter that problem, we propose a novel extension to random walk based graph embedding, which removes a percentage of the least frequent nodes from the walks at different levels. By this removal, we simulate farther distant nodes to reside in the close neighborhood of a node and hence explicitly represent their connection. Besides the common evaluation tasks for graph embeddings, such as node classification and link prediction, we evaluate and compare our approach against related methods on shortest path approximation. The results indicate that extensions to random walk based methods (including our own) improve predictive performance only slightly, if at all.
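
The removal step can be sketched on toy walks. The function below is an illustrative reading of the idea (drop a fraction of the globally least frequent nodes from every walk so that distant nodes become co-occurring), not the paper's exact multi-level procedure:

```python
from collections import Counter

# Toy random walks over a small graph.
walks = [
    ["a", "b", "c", "a", "b"],
    ["a", "d", "b", "a", "e"],
]

def prune_walks(walks, drop_frac):
    """Remove the drop_frac least frequent nodes from every walk, so that
    the remaining nodes move closer together inside each walk."""
    freq = Counter(n for w in walks for n in w)
    n_drop = int(len(freq) * drop_frac)
    dropped = {n for n, _ in sorted(freq.items(), key=lambda kv: kv[1])[:n_drop]}
    return [[n for n in w if n not in dropped] for w in walks]

pruned = prune_walks(walks, drop_frac=0.4)
```

After pruning, nodes that were separated by rare intermediaries (here "c" and "d") appear adjacent in the walks, which is what lets skip-gram-style training see their connection.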

Given a natural number $k\ge 2$ and a $k$-automatic set $S$ of natural numbers, we show that the lower density and upper density of $S$ are recursively computable rational numbers and we provide an algorithm for computing these quantities. In addition, we show that for every natural number $k\ge 2$ and every pair of rational numbers $(\alpha,\beta)$ with $0<\alpha<\beta<1$ or with $(\alpha,\beta)\in \{(0,0),(1,1)\}$ there is a $k$-automatic subset of the natural numbers whose lower density and upper density are $\alpha$ and $\beta$ respectively, and we show that these are precisely the values that can occur as the lower and upper densities of an automatic set.
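
A classic instance makes the oscillation concrete: the 2-automatic set of numbers whose binary expansion has even length has lower density 1/3 and upper density 2/3, with the extremes approached along $N = 2\cdot 4^k - 1$ and $N = 4^k - 1$. An empirical check:

```python
# Density of {n : binary length of n is even} among 1..N.
def density_upto(N):
    count = sum(1 for n in range(1, N + 1) if n.bit_length() % 2 == 0)
    return count / N

d_high = density_upto(4 ** 8 - 1)       # near the upper density 2/3
d_low = density_upto(2 * 4 ** 8 - 1)    # near the lower density 1/3
```

The density itself never converges; only the lim inf and lim sup (the quantities the algorithm in the abstract computes) are well defined.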

This paper presents a network hardware-in-the-loop (HIL) simulation system for modeling large-scale power systems. Researchers have developed many HIL test systems for power systems in recent years. Those test systems can model both microsecond-level dynamic responses of power electronic systems and millisecond-level transients of transmission and distribution grids. By integrating individual HIL test systems into a network of HIL test systems, we can create large-scale power grid digital twins with flexible structures at the modeling resolution required for a wide range of system operating conditions. This will not only significantly reduce the need for field tests when developing new technologies but also greatly shorten the model development cycle. In this paper, we present a networked OPAL-RT based HIL test system for developing transmission-distribution coordinative Volt-VAR regulation technologies as an example to illustrate system setups, communication requirements among different HIL simulation systems, and system connection mechanisms. The impacts of communication delays, information exchange cycles, and computing delays are illustrated. Simulation results show that the performance of a networked HIL test system is satisfactory.

We consider combinatorial semi-bandits over a set of arms ${\cal X} \subset \{0,1\}^d$ where rewards are uncorrelated across items. For this problem, the algorithm ESCB yields the smallest known regret bound $R(T) = {\cal O}\Big( {d (\ln m)^2 (\ln T) \over \Delta_{\min} }\Big)$, but it has computational complexity ${\cal O}(|{\cal X}|)$, which is typically exponential in $d$, and cannot be used in large dimensions. We propose the first algorithm which is both computationally and statistically efficient for this problem, with regret $R(T) = {\cal O} \Big({d (\ln m)^2 (\ln T)\over \Delta_{\min} }\Big)$ and computational complexity ${\cal O}(T {\bf poly}(d))$. Our approach involves carefully designing an approximate version of ESCB with the same regret guarantees, showing that this approximate algorithm can be implemented in time ${\cal O}(T {\bf poly}(d))$ by repeatedly maximizing a linear function over ${\cal X}$ subject to a linear budget constraint, and showing how to solve these maximization problems efficiently.
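
The expensive step in ESCB is maximizing an optimistic index over all arms. The sketch below shows an ESCB-style index (empirical reward plus a count-dependent confidence width; the exact confidence constants are schematic) and the brute-force ${\cal O}(|{\cal X}|)$ maximization that the proposed approximate algorithm avoids:

```python
import itertools
import numpy as np

def escb_index(x, theta_hat, counts, f_t):
    """Optimistic index for arm x in {0,1}^d: empirical mean reward plus a
    confidence width driven by the per-item observation counts."""
    width = np.sqrt(f_t * np.sum(x / counts))
    return float(theta_hat @ x) + width

# Brute force over all m-subsets of d items (exponential in general).
d, m = 6, 2
theta_hat = np.array([0.5, 0.4, 0.3, 0.2, 0.1, 0.0])  # empirical item means
counts = np.array([50, 2, 50, 50, 50, 50])            # item 1 is under-explored
arms = [np.array([1 if i in c else 0 for i in range(d)])
        for c in itertools.combinations(range(d), m)]
best = max(arms, key=lambda x: escb_index(x, theta_hat, counts, f_t=0.5))
```

The index favors the under-explored item 1 despite its lower empirical mean, which is the optimism that drives the regret bound; replacing this enumeration with budgeted linear maximization is the paper's contribution.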

We introduce a novel approach to optimizing the architecture of deep neural networks by identifying critical neurons and removing non-critical ones. The proposed approach utilizes a mixed integer programming (MIP) formulation of neural models which includes a continuous importance score computed for each neuron in the network. The MIP solver minimizes the number of critical neurons (i.e., those with high importance scores) that need to be kept for maintaining the overall accuracy of the model. Further, the proposed formulation generalizes the recently considered lottery ticket optimization by identifying multiple "lucky" sub-networks, resulting in optimized architectures that not only perform well on a single dataset, but also generalize across multiple ones upon retraining of network weights. Finally, the proposed framework provides a significant improvement in the scalability of automatic sparsification of deep network architectures compared to previous attempts. We validate the performance and generalizability of our approach on the MNIST, Fashion-MNIST, and CIFAR-10 datasets, using three different neural networks: LeNet-5 and two fully connected ReLU models.

A standard method for analyzing the asymptotic complexity of a program is to extract a recurrence that describes its cost in terms of the size of its input, and then to compute a closed-form upper bound on that recurrence. In practice there is rarely a formal argument that the recurrence is in fact an upper bound; indeed, there is usually no formal connection between the program and the recurrence at all. Here we develop a method for extracting recurrences from functional programs in a higher-order language with let-polymorphism that provably bound their operational cost. The method consists of two phases. In the first phase, a monadic translation is performed to extract a cost-annotated version of the original program. In the second phase, the extracted program is interpreted in a model. The key feature of this second phase is that different models describe different notions of size. This plays out specifically for values of inductive type, where different notions of size may be appropriate depending on the analysis, and for polymorphic functions, where we show that the notion of size for a polymorphic function can be described formally as the data that is common to the notions of size of its instances. We give several examples of different models that formally justify various informal extract-a-recurrence-and-solve cost analyses to show the applicability of our approach.

In recent years, deep learning has become a part of our everyday life and is revolutionizing quantum chemistry as well. In this work, we show how deep learning can be used to advance the research field of photochemistry by learning all important properties for photodynamics simulations. The properties are multiple energies, forces, nonadiabatic couplings and spin-orbit couplings. The nonadiabatic couplings are learned in a phase-free manner as derivatives of a virtually constructed property by the deep learning model, which guarantees rotational covariance. Additionally, an approximation for nonadiabatic couplings is introduced, based on the potentials, their gradients and Hessians. As the deep-learning method, we employ SchNet, extended for multiple electronic states. In combination with the molecular dynamics program SHARC, our approach, termed SchNarc, is tested on a model system and two realistic polyatomic molecules and paves the way towards efficient photodynamics simulations of complex systems.

This paper tackles the problem of data fusion in the semantic scene completion (SSC) task, which can simultaneously deal with semantic labeling and scene completion. RGB images contain texture details of the object(s) which are vital for semantic scene understanding. Meanwhile, depth images capture geometric clues of high relevance for shape completion. Using both RGB and depth images can further boost the accuracy of SSC over employing one modality in isolation. We propose a 3D gated recurrent fusion network (GRFNet), which learns to adaptively select and fuse the relevant information from depth and RGB by making use of the gate and memory modules. Based on the single-stage fusion, we further propose a multi-stage fusion strategy, which could model the correlations among different stages within the network. Extensive experiments on two benchmark datasets demonstrate the superior performance and the effectiveness of the proposed GRFNet for data fusion in SSC. Code will be made available.

In this article we investigate the development of computers over the past decades in order to identify the factors that influence it most. We describe these factors and use them to predict the direction of further development. To do so, we use the concept of Computer Capacity, which allows us to estimate the performance of computers theoretically, relying only on a description of their architecture.

We study \emph{partial-information} two-player turn-based games on graphs with omega-regular objectives, when the partial-information player has \emph{limited memory}. Such games are a natural formalization for reactive synthesis when the environment player is not genuinely adversarial to the system player. The environment player has goals of its own, but the exact goal of the environment player is unknown to the system player. We prove that the problem of determining the existence of a winning strategy for the system player is PSPACE-hard for reachability, safety, and parity objectives. Moreover, when the environment player is memoryless, the problem is PSPACE-complete. However, it is simpler to decide if the environment player has a winning strategy; that problem is only NP-complete. Additionally, we construct a game where the partial-information player needs at least $\mathcal{O}(\sqrt{n})$ bits of memory to retain winning strategies in a game of size $\mathcal{O}(n)$.

We study alternating good-for-games (GFG) automata, i.e., alternating automata where both conjunctive and disjunctive choices can be resolved in an online manner, without knowledge of the suffix of the input word still to be read. We show that they can be exponentially more succinct than both their nondeterministic and universal counterparts. Furthermore, we lift many results from nondeterministic parity GFG automata to alternating ones: a single exponential determinisation procedure, an Exptime upper bound to the GFGness problem, a PTime algorithm for the GFGness problem of weak automata, and a reduction from a positive solution to the $G_2$ conjecture to a PTime algorithm for the GFGness problem of parity automata with a fixed index. The $G_2$ conjecture states that a nondeterministic parity automaton A is GFG if and only if a token game, known as the $G_2$ game, played on A is won by the first player. So far, it had only been proved for B\"uchi automata; we provide further evidence for it by proving it for coB\"uchi automata. We also study the complexity of deciding "half-GFGness", a property specific to alternating automata that only requires nondeterministic choices to be resolved in an online manner. We show that this problem is strictly more difficult than GFGness check, already for alternating automata on finite words.

We present a novel attention-based sequential model for mutually dependent spatio-temporal discrete event data, which is a versatile framework for capturing the non-homogeneous influence of events. We go beyond the assumption that the influence of a historical event (causing an upward or downward jump in the intensity function) will fade monotonically over time, which is a key assumption made by many widely-used point process models, including those based on Recurrent Neural Networks (RNNs). We borrow the idea of the attention model based on a probabilistic score function, which leads to a flexible representation of the intensity function and is highly interpretable. We demonstrate the superior performance of our approach compared to the state-of-the-art on both synthetic and real data.

Algorithms that tackle deep exploration -- an important challenge in reinforcement learning -- have relied on epistemic uncertainty representation through ensembles or other hypermodels, exploration bonuses, or visitation count distributions. An open question is whether deep exploration can be achieved by an incremental reinforcement learning algorithm that tracks a single point estimate, without additional complexity required to account for epistemic uncertainty. We answer this question in the affirmative. In particular, we develop Langevin DQN, a variation of DQN that differs only in perturbing parameter updates with Gaussian noise, and demonstrate through a computational study that the algorithm achieves deep exploration. We also provide an intuition for why Langevin DQN performs deep exploration.
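
The single modification the abstract describes (Gaussian noise added to each parameter update) is easy to state in isolation. A sketch of the update rule, sanity-checked on a quadratic potential where the Langevin iterates should sample approximately from a standard normal (the DQN-specific machinery is omitted):

```python
import numpy as np

def langevin_step(theta, grad, lr, rng):
    """One Langevin update: an ordinary gradient step plus Gaussian noise
    of variance 2*lr -- the only change made to the usual parameter update."""
    return theta - lr * grad + np.sqrt(2.0 * lr) * rng.standard_normal(theta.shape)

# Sanity check on U(theta) = theta^2 / 2, whose gradient is theta:
# the iterates should approximately sample from N(0, 1).
rng = np.random.default_rng(0)
theta = np.array([3.0])
samples = []
for _ in range(100_000):
    theta = langevin_step(theta, grad=theta, lr=0.01, rng=rng)
    samples.append(theta[0])
```

Because the iterates wander according to the (implicit) posterior landscape rather than collapsing to a point, the single tracked estimate can still drive deep exploration.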

We study convex empirical risk minimization for high-dimensional inference in binary models. Our first result sharply predicts the statistical performance of such estimators in the linear asymptotic regime under isotropic Gaussian features. Importantly, the predictions hold for a wide class of convex loss functions, which we exploit in order to prove a bound on the best achievable performance among them. Notably, we show that the proposed bound is tight for popular binary models (such as Signed, Logistic or Probit), by constructing appropriate loss functions that achieve it. More interestingly, for binary linear classification under the Logistic and Probit models, we prove that the performance of least-squares is no worse than 0.997 and 0.98 times the optimal one. Numerical simulations corroborate our theoretical findings and suggest they are accurate even for relatively small problem dimensions.

We consider the estimation of treatment effects in settings where multiple treatments are assigned over time and treatments can have a causal effect on future outcomes. We formulate the problem as a linear state space Markov process with a high dimensional state and propose an extension of the double/debiased machine learning framework to estimate the dynamic effects of treatments. Our method allows the use of arbitrary machine learning methods to control for the high dimensional state, subject to a mean square error guarantee, while still allowing parametric estimation and construction of confidence intervals for the dynamic treatment effect parameters of interest. Our method is based on a sequential regression peeling process, which we show can be equivalently interpreted as a Neyman orthogonal moment estimator. This allows us to show root-n asymptotic normality of the estimated causal effects.

Assume that an $N$-bit sequence $S$ of $k$ self-delimiting numbers is given as input. We present space-efficient algorithms for sorting, dense ranking and (competitive) ranking $S$ on the word RAM model with word size $\Omega(\log N)$. Our algorithms run in $O(k + \frac{N}{\log N})$ time and use $O(N)$ bits. The sorting algorithm returns the given numbers in sorted order, stored within a bit-vector of $N$ bits, whereas our ranking algorithms construct data structures that allow us to subsequently return the (dense) rank of each number $x$ in $S$ in constant time, provided the position of $x$ in $S$ is given together with $x$. As an application of our algorithms we give an algorithm for tree isomorphism that runs in $O(n)$ time and uses $O(n)$ bits on $n$-node trees.
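
A toy illustration of the input format and the bit-vector output (the code below uses a hypothetical unary-length prefix encoding and ignores duplicates; the paper's $O(N)$-bit machinery is far more involved):

```python
# Each number is self-delimiting: its bit length L in unary ("1"*L + "0"),
# followed by the L binary digits of the number.
def decode(s):
    nums, i = [], 0
    while i < len(s):
        L = 0
        while s[i] == "1":
            L, i = L + 1, i + 1
        i += 1                          # skip the "0" ending the unary length
        nums.append(int(s[i:i + L], 2))
        i += L
    return nums

def sort_to_bitvector(nums, size):
    """Mark each decoded number in a presence bit vector; scanning the
    vector left to right then yields the numbers in sorted order."""
    bv = [0] * size
    for x in nums:
        bv[x] = 1
    return bv

S = "11011" + "100" + "1110101"         # encodes 3, 0, 5
bv = sort_to_bitvector(decode(S), size=8)
```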

The rapid progress in the physical implementation of quantum computers has paved the way for tools that help users write quantum programs for any given quantum device. The physical constraints inherent to current NISQ architectures prevent most quantum algorithms from being directly executed on quantum devices. To enable the two-qubit gates of an algorithm, existing works focus on inserting SWAP gates to dynamically remap logical qubits to physical qubits. However, their schemes do not consider the depth of the generated quantum circuits. In this work, we propose a depth-aware SWAP insertion scheme for the qubit mapping problem in the NISQ era.
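
The routing problem can be illustrated with a greedy, depth-oblivious baseline on a linear-nearest-neighbour device (this is not the proposed depth-aware scheme, only the kind of SWAP insertion it improves upon):

```python
def route_gate(mapping, a, b):
    """Insert SWAPs on a linear coupling graph until logical qubits a and b
    sit on adjacent physical qubits. mapping is logical -> physical.
    Returns the list of physical-qubit pairs to SWAP, in order."""
    swaps = []
    while abs(mapping[a] - mapping[b]) > 1:
        step = 1 if mapping[a] < mapping[b] else -1
        p_a, p_t = mapping[a], mapping[a] + step
        other = next(q for q, p in mapping.items() if p == p_t)
        mapping[a], mapping[other] = p_t, p_a   # move a one site toward b
        swaps.append((p_a, p_t))
    return swaps

layout = {0: 0, 1: 1, 2: 2, 3: 3}   # logical -> physical on the line 0-1-2-3
swaps = route_gate(layout, 0, 3)    # a CNOT(0, 3) needs routing first
```

Each inserted SWAP lengthens the circuit, which is why a scheme that accounts for the resulting depth, rather than just the SWAP count, matters.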

This paper introduces human-robot sensory augmentation and illustrates it on a tracking task, where performance can be improved by the exchange of sensory information between the robot and its human user. It was recently found that during interaction between humans, the partners use each other's sensory information to improve their own sensing, and thus also their performance and learning. In this paper, we develop a computational model of this unique human ability, and use it to build a novel control framework for human-robot interaction. The human partner's control is formulated as a feedback control with unknown control gains and desired trajectory. A Kalman filter is used to estimate first the control gains and then the desired trajectory. The estimated desired trajectory of the human partner is used as augmented sensory information about the system and combined with the robot's measurement to estimate an uncertain target trajectory. Simulations and an implementation of the presented framework on a robotic interface validate the proposed observer-predictor pair for a tracking task. The results obtained using this robot demonstrate how the human user's control can be identified, and exhibit benefits of this sensory augmentation similar to those observed between interacting humans.

We study the problem of estimating the distribution of effect sizes (the mean of the test statistic under the alternate hypothesis) in a multiple testing setting. Knowing this distribution allows us to calculate the power (type II error) of any experimental design. We show that it is possible to estimate this distribution using an inexpensive pilot experiment, which takes significantly fewer samples than would be required by an experiment that identified the discoveries. Our estimator can be used to guarantee the number of discoveries that will be made using a given experimental design in a future experiment. We prove that this simple and computationally efficient estimator enjoys a number of favorable theoretical properties, and demonstrate its effectiveness on data from a gene knockout experiment on influenza inhibition in Drosophila.

Population protocols are a model of distributed computation intended for the study of networks of independent computing agents with dynamic communication structure. Each agent has a finite number of states, and communication opportunities occur nondeterministically, allowing the agents involved to change their states based on each other's states. Population protocols are often studied in terms of reaching a consensus on whether the input configuration satisfies some predicate.

In the present paper we propose an alternative point of view. Instead of studying the properties of inputs that a protocol can recognise, we study the properties of outputs that a protocol eventually ensures, and define the corresponding notion of constructive expressive power. We show that for general population protocols and immediate observation population protocols the constructive expressive power coincides with the normal expressive power.

Immediate observation protocols also preserve their relatively low verification complexity in the constructive expressive power setting.

Pre-trained models have demonstrated their effectiveness in many downstream natural language processing (NLP) tasks. The availability of multilingual pre-trained models enables zero-shot transfer of NLP tasks from high resource languages to low resource ones. However, recent research in improving pre-trained models focuses heavily on English. While it is possible to train the latest neural architectures for other languages from scratch, it is undesirable due to the required amount of compute. In this work, we tackle the problem of transferring an existing pre-trained model from English to other languages under a limited computational budget. With a single GPU, our approach can obtain a foreign BERT base model within a day and a foreign BERT large within two days. Furthermore, evaluating our models on six languages, we demonstrate that our models are better than multilingual BERT on two zero-shot tasks: natural language inference and dependency parsing.

We uncover privacy vulnerabilities in the ICAO 9303 standard implemented by ePassports worldwide. These vulnerabilities, confirmed by ICAO, enable an ePassport holder who recently passed through a checkpoint to be reidentified without opening their ePassport. This paper explains how bisimilarity was used to discover these vulnerabilities. In order to tackle such bisimilarity problems, we develop here a chain of methods for the applied pi-calculus, including a symbolic under-approximation of bisimilarity, called open bisimilarity, and a modal logic, called classical FM, for describing and certifying attacks. Evidence is provided to argue for a new scheme for specifying such unlinkability problems that more accurately reflects the capabilities of an attacker.

In this paper, an optimal approach based on an on-off controller is used to optimally control a DC-DC step-down converter. It is shown that conventional control techniques for DC-DC converters based on a linearized averaging model have several drawbacks, including different operating modes, linearization concerns, and difficulties with constraints. A single-mode discretized state-space model is used, and a new optimal control policy is implemented to control a step-down DC-DC converter. The simulation results confirm that the proposed DC-DC model, together with the optimal controller, functions properly in controlling an unfixed switch-mode step-down DC-DC converter facing load changes, noisy inputs, and start-up procedures.
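
The on-off idea can be demonstrated on a drastically simplified model: a first-order discretized approximation of the output voltage, with the switch turned on whenever the output is below the reference (this toy model and its parameters are illustrative only, not the paper's state-space formulation):

```python
def simulate(v_ref, v_in, alpha, steps):
    """Bang-bang (on-off) regulation of a first-order RC-like output:
    each step, the switch is on iff the output is below the reference."""
    v, trace = 0.0, []
    for _ in range(steps):
        u = 1.0 if v < v_ref else 0.0   # on-off decision
        v = v + alpha * (u * v_in - v)  # discretized first-order update
        trace.append(v)
    return trace

trace = simulate(v_ref=5.0, v_in=12.0, alpha=0.1, steps=200)
```

After the start-up transient, the output chatters in a band around the reference; the optimal policy in the paper shapes this switching decision rather than relying on a linearized averaged model.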

In general, adversarial perturbations superimposed on inputs are realistic threats to deep neural networks (DNNs). In this paper, we propose a practical method for generating such adversarial perturbations for black-box attacks, which require access only to the input-output relationship. Thus, attackers can generate such perturbations without invoking inner functions or accessing the inner states of the DNN. Unlike earlier studies, the perturbation-generation algorithm presented in this study requires far fewer query trials. Moreover, to show the effectiveness of the extracted adversarial perturbation, we experiment with a DNN for semantic segmentation. The results show that the network is deceived more easily by the generated perturbation than by uniformly distributed random noise of the same magnitude.

It is commonly observed that data are scattered everywhere and difficult to centralize. Data privacy and security have also become sensitive topics. Laws and regulations such as the European Union's General Data Protection Regulation (GDPR) are designed to protect the public's data privacy. However, machine learning requires a large amount of data for good performance, and the current circumstances make deploying real-life AI applications extremely difficult. To tackle these challenges, in this paper we propose a novel privacy-preserving federated machine learning model, named Federated Extra-Trees, which applies local differential privacy in a federated trees model. A secure multi-institutional machine learning system was developed to provide superior performance by performing the modeling jointly on different clients without exchanging any raw data. We validated the accuracy of our work through extensive experiments on public datasets, and the efficiency and robustness were also verified by simulating real-world scenarios. Overall, we present an extensible, scalable and practical solution to the data island problem.

We investigate the problem of the successive refinement for Wyner-Ziv coding with degraded side information and obtain a complete characterization of the rate region for the quadratic vector Gaussian case. The achievability part is based on the evaluation of the Tian-Diggavi inner bound that involves Gaussian auxiliary random vectors. For the converse part, a matching outer bound is obtained with the aid of a new extremal inequality. Herein, the proof of this extremal inequality depends on the integration of the monotone path argument and the doubling trick as well as information-estimation relations.

To ensure pedestrian-friendly streets in the era of automated vehicles, reassessment of current policies, practices, design, rules and regulations of urban areas is important. This study investigates pedestrian crossing behaviour, as an important element of urban dynamics that is expected to be affected by the presence of automated vehicles. For this purpose, an interpretable machine learning framework is proposed to explore factors affecting pedestrians' wait time before crossing mid-block crosswalks in the presence of automated vehicles. To collect rich behavioural data, we developed a dynamic and immersive virtual reality experiment, with 180 participants from a heterogeneous population in 4 different locations in the Greater Toronto Area (GTA). Pedestrian wait time behaviour is then analyzed using a data-driven Cox Proportional Hazards (CPH) model, in which the linear combination of the covariates is replaced by a flexible non-linear deep neural network. The proposed model achieved a 5% improvement in goodness of fit, but more importantly, enabled us to incorporate a richer set of covariates. A game-theoretic interpretability method is used to understand the contribution of different covariates to the time pedestrians wait before crossing. Results show that the presence of automated vehicles on roads, wider lane widths, high density on roads, limited sight distance, and lack of walking habits are the main contributing factors to longer wait times. Our study suggests that, to move towards pedestrian-friendly urban areas, national level educational programs for children, enhanced safety measures for seniors, promotion of active modes of transportation, and revised traffic rules and regulations should be considered.

Despite significant progress in sequencing technology, there are many cellular enzymatic activities that remain unknown. We develop a new method, referred to as SUNDRY (Similarity-weighting for UNlabeled Data in a Residual HierarchY), for training enzyme-specific predictors that take as input a query substrate molecule and return whether the enzyme would act on that substrate or not. When addressing this enzyme promiscuity prediction problem, a major challenge is the lack of abundant labeled data, especially the shortage of labeled data for negative cases (enzyme-substrate pairs where the enzyme does not act to transform the substrate to a product molecule). To overcome this issue, our proposed method can learn to classify a target enzyme by sharing information from related enzymes via known tree hierarchies. Our method can also incorporate three types of data: those molecules known to be catalyzed by an enzyme (positive cases), those with unknown relationships (unlabeled cases), and molecules labeled as inhibitors for the enzyme. We refer to inhibitors as hard negative cases because they may be difficult to classify well: they bind to the enzyme, like positive cases, but are not transformed by the enzyme. Our method uses confidence scores derived from structural similarity to treat unlabeled examples as weighted negatives. We compare our proposed hierarchy-aware predictor against a baseline that cannot share information across related enzymes. Using data from the BRENDA database, we show that each of our contributions (hierarchical sharing, per-example confidence weighting of unlabeled data based on molecular similarity, and including inhibitors as hard-negative examples) contributes towards a better characterization of enzyme promiscuity.
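
The paper's exact confidence weighting is not given in the abstract; a common cheminformatics choice, shown here purely as an assumption, is to down-weight unlabeled molecules by their maximum Tanimoto (Jaccard) similarity to any known positive substrate, computed on set-valued fingerprints:

```python
def negative_confidence(unlabeled_fp, positive_fps):
    """Confidence that an unlabeled molecule is a true negative:
    1 minus its maximum Tanimoto (Jaccard) similarity to any known
    positive substrate. Fingerprints are sets of feature indices.
    (Illustrative weighting; the paper's scheme may differ.)"""
    def tanimoto(a, b):
        return len(a & b) / len(a | b) if a | b else 0.0
    return 1.0 - max(tanimoto(unlabeled_fp, p) for p in positive_fps)
```

An unlabeled molecule that looks nothing like any known substrate is treated as a near-certain negative, while one resembling a positive contributes little to the negative class.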

A large fraction of online advertisement is sold via repeated second price auctions. In these auctions, the reserve price is the main tool for the auctioneer to boost revenues. In this work, we investigate the following question: Can changing the reserve prices based on the previous bids improve the revenue of the auction, taking into account the long-term incentives and strategic behavior of the bidders? We show that if the distribution of the valuations is known and satisfies the standard regularity assumptions, then the optimal mechanism has a constant reserve. However, when there is uncertainty in the distribution of the valuations, previous bids can be used to learn the distribution of the valuations and to update the reserve price. We present a simple, approximately incentive-compatible, and asymptotically optimal dynamic reserve mechanism that can significantly improve the revenue over the best static reserve.
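
To make the role of the reserve price concrete, here is a toy sketch of a single second-price auction with a reserve (illustrative only; it is not the paper's dynamic mechanism):

```python
def second_price_revenue(bids, reserve):
    """Revenue of one second-price auction with a reserve price:
    the highest bidder wins if she clears the reserve, and pays the
    larger of the second-highest bid and the reserve."""
    bids = sorted(bids, reverse=True)
    if not bids or bids[0] < reserve:
        return 0.0                      # no sale: highest bid below reserve
    runner_up = bids[1] if len(bids) > 1 else 0.0
    return max(runner_up, reserve)
```

For i.i.d. valuations uniform on [0, 1] (a regular distribution), the Myerson-optimal static reserve is 1/2; the abstract's point is that learning the reserve from bids only helps when this distribution is uncertain.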

The paper is from July 2014 (our submission to WINE 2014), posted later here on the arXiv to complement the 1-page abstract in the WINE 2014 proceedings.

In order to deal with issues caused by the increasing penetration of renewable resources in power systems, this paper proposes a novel distributed frequency control algorithm for each generating unit and controllable load in a transmission network to replace the conventional automatic generation control (AGC). The targets of the proposed control algorithm are twofold. First, it is to restore the nominal frequency and scheduled net inter-area power exchanges after an active power mismatch between generation and demand. Second, it is to optimally coordinate the active powers of all controllable units in a distributed manner. The designed controller only relies on local information, computation, and peer-to-peer communication between cyber-connected buses, and it is also robust against uncertain system parameters. Asymptotic stability of the closed-loop system under the designed algorithm is analysed by using a nonlinear structure-preserving model including the first-order turbine-governor dynamics. Finally, case studies validate the effectiveness of the proposed method.

We study the problem of locating the source of an epidemic diffusion process from a sparse set of sensors, under noise. In a graph $G=(V,E)$, an unknown source node $v^* \in V$ is drawn uniformly at random, and unknown edge weights $w(e)$ for $e\in E$, representing the propagation delays along the edges, are drawn independently from a Gaussian distribution of mean $1$ and variance $\sigma^2$. An algorithm then attempts to locate $v^*$ by picking sensor (also called query) nodes $s \in V$ and being told the length of the shortest path between $s$ and $v^*$ in graph $G$ weighted by $w$. We consider two settings: static, in which all query nodes must be decided in advance, and sequential, in which each query can depend on the results of the previous ones.

We characterize the query complexity when $G$ is an $n$-node path. In the static setting, $\Theta(n\sigma^2)$ queries are needed for $\sigma^2 \leq 1$, and $\Theta(n)$ for $\sigma^2 \geq 1$. In the sequential setting, somewhat surprisingly, only $\Theta(\log\log_{1/\sigma}n)$ queries are needed when $\sigma^2 \leq 1/2$, and $\Theta(\log \log n)+O_\sigma(1)$ when $\sigma^2 \geq 1/2$. This is the first mathematical study of source location under non-trivial amounts of noise.

Self-attention mechanisms have achieved great success on a variety of NLP tasks due to their flexibility in capturing dependencies between arbitrary positions in a sequence. For problems such as query-based summarization (Qsumm) and knowledge graph reasoning, where each input sequence is associated with an extra query, explicitly modeling such conditional contextual dependencies can lead to a more accurate solution; however, these dependencies cannot be captured by existing self-attention mechanisms. In this paper, we propose \textit{conditional self-attention} (CSA), a neural network module designed for conditional dependency modeling. CSA works by adjusting the pairwise attention between input tokens in a self-attention module with the matching score of the inputs to the given query. Thereby, the contextual dependencies modeled by CSA will be highly relevant to the query. We further study variants of CSA defined by different types of attention. Experiments on the Debatepedia and HotpotQA benchmark datasets show that CSA consistently outperforms the vanilla Transformer and previous models on the Qsumm problem.
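
One minimal reading of the idea, sketched in numpy (the paper's actual CSA parameterization is learned and differs in detail): the pairwise self-attention logits are biased by each token's matching score against the query, so the surviving contextual dependencies are query-relevant.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def conditional_self_attention(X, q):
    """Toy conditional self-attention. X is (n, d) token embeddings and
    q is a (d,) query embedding. Each column of the attention matrix is
    re-weighted by how well that token matches the query."""
    n, d = X.shape
    logits = X @ X.T / np.sqrt(d)              # vanilla self-attention logits
    match = softmax(X @ q / np.sqrt(d))        # per-token query-matching score
    weights = softmax(logits + np.log(match + 1e-9)[None, :])
    return weights @ X                         # query-conditioned context
```

With a zero or uninformative query the module degrades gracefully toward plain self-attention, which matches the claim that CSA generalizes the vanilla mechanism.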

The synthesis process is essential for achieving computational experiment design in the field of inorganic materials chemistry. In this work, we present a novel corpus of the synthesis process for all-solid-state batteries and an automated machine reading system for extracting the synthesis processes buried in the scientific literature. We define the representation of the synthesis processes using flow graphs, and create a corpus from the experimental sections of 243 papers. The automated machine-reading system is developed by a deep learning-based sequence tagger and simple heuristic rule-based relation extractor. Our experimental results demonstrate that the sequence tagger with the optimal setting can detect the entities with a macro-averaged F1 score of 0.826, while the rule-based relation extractor can achieve high performance with a macro-averaged F1 score of 0.887.

This letter studies the problem of maintaining information freshness under passive eavesdropping attacks. The classical three-node wiretap channel model is considered, in which a source aims to send its latest status wirelessly to its intended destination, while protecting the message from being overheard by an eavesdropper. Considering that conventional channel capacity-based secrecy metrics are no longer adequate to measure the information timeliness in status update systems, we define two new age of information-based metrics to characterize the secrecy performance of the considered system. We further propose, analyze, and optimize a randomized stationary transmission policy implemented at the source for further enhancing the secrecy performance. Simulation results are provided to validate our analysis and optimization.

The rapid development of the fifth generation mobile communication systems accelerates the implementation of vehicle-to-everything communications. Compared with the other types of vehicular communications, vehicle-to-vehicle (V2V) communications mainly focus on the exchange of driving safety information with neighboring vehicles, which requires ultra-reliable and low-latency communications (URLLCs). However, the frame size is significantly shortened in V2V URLLCs because of the rigorous latency requirements, and thus the overhead is no longer negligible compared with the payload information from the perspective of size. In this paper, we investigate the frame design and resource allocation for an urban V2V URLLC system in which the uplink cellular resources are reused in underlay mode. Specifically, we first analyze the lower bounds of performance for V2V pairs and cellular users based on the regular pilot scheme and the superimposed pilot scheme. Then, we propose a frame design algorithm and a semi-persistent scheduling algorithm to achieve the optimal frame design and resource allocation with reasonable complexity. Finally, our simulation results show that the proposed frame design and resource allocation scheme can satisfy the URLLC requirements of V2V pairs and guarantee the communication quality of cellular users.

A permutation $\pi$ over alphabet $\Sigma = \{1,2,3,\ldots,n\}$ is a sequence in which every element of $\Sigma$ occurs exactly once. $S_n$ is the symmetric group consisting of all permutations of length $n$ defined over $\Sigma$. $I_n = (1, 2, 3,\ldots, n)$ and $R_n = (n, n-1, n-2,\ldots, 2, 1)$ are the identity (i.e. sorted) and reverse permutations, respectively. An operation, which we call the $LRE$ operation, is defined in OEIS with identity A186752. This operation is constituted by three generators: left-rotation, right-rotation, and transposition(1,2). We call transposition(1,2), which swaps the two leftmost elements, $Exchange$. The minimum number of moves required to transform $R_n$ into $I_n$ with the $LRE$ operation is known for $n \leq 11$, as listed in OEIS sequence A186752. For this problem no upper bound is known. OEIS sequence A186783 gives the conjectured diameter of the symmetric group $S_n$ when generated by $LRE$ operations \cite{oeis}. The contributions of this article are: (a) the first non-trivial upper bound on the number of moves required to sort $R_n$ with $LRE$; (b) a tighter upper bound on the number of moves required to sort $R_n$ with $LRE$; and (c) computation of the minimum number of moves required to sort $R_{10}$ and $R_{11}$. Here we compute an upper bound on the diameter of the Cayley graph generated by the $LRE$ operation. Cayley graphs are employed in computer interconnection networks to model efficient parallel architectures; the diameter of the network corresponds to the maximum delay in the network.
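
The minimum move counts for small $n$ can be computed directly by breadth-first search over the three generators (memory grows as $n!$, so this brute force is feasible only up to roughly $n = 10$ or $11$, which is consistent with the values listed in OEIS):

```python
from collections import deque

def lre_sort_distance(n):
    """Minimum number of LRE moves (left-rotation, right-rotation,
    Exchange of the two leftmost elements) needed to transform the
    reverse permutation R_n into the identity I_n, via BFS."""
    start = tuple(range(n, 0, -1))           # R_n
    goal = tuple(range(1, n + 1))            # I_n
    dist = {start: 0}
    queue = deque([start])
    while queue:
        p = queue.popleft()
        if p == goal:
            return dist[p]
        moves = (p[1:] + p[:1],              # left-rotation
                 p[-1:] + p[:-1],            # right-rotation
                 (p[1], p[0]) + p[2:])       # Exchange
        for q in moves:
            if q not in dist:
                dist[q] = dist[p] + 1
                queue.append(q)
    return None
```

For example, $R_3 = (3,2,1)$ sorts in two moves: a left-rotation to $(2,1,3)$ followed by an Exchange to $(1,2,3)$.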

Area under the ROC curve (AUC) is a widely used performance measure for classification models. We propose a new distributionally robust AUC maximization model (DR-AUC) that relies on the Kantorovich metric and approximates the AUC with the hinge loss function. We use duality theory to reformulate the DR-AUC model as a tractable convex quadratic optimization problem. Numerical experiments show that the proposed DR-AUC model -- benchmarked against the standard deterministic AUC and support vector machine models -- improves out-of-sample performance on the majority of the considered datasets. The results are particularly encouraging since our numerical experiments are conducted with small training sets, which are known to be conducive to poor out-of-sample performance.
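
The hinge approximation of the AUC is the standard pairwise surrogate; a minimal numpy sketch (the distributionally robust reformulation itself is not shown here):

```python
import numpy as np

def hinge_auc_loss(scores_pos, scores_neg):
    """Hinge surrogate of 1 - AUC over all positive/negative pairs:
    mean of max(0, 1 - (s_pos - s_neg)). It is zero exactly when every
    positive example outscores every negative by at least the margin 1."""
    diff = np.subtract.outer(scores_pos, scores_neg)   # s_pos - s_neg, all pairs
    return np.maximum(0.0, 1.0 - diff).mean()
```

Minimizing this convex surrogate in the model scores is what makes the subsequent dual reformulation into a convex quadratic program possible.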

Compressive sensing (CS) is well-known for its unique combination of sensing, compression, and security (i.e. CS measurements are equally important). However, there is a tradeoff: improving sensing and compression efficiency with prior signal information tends to favor particular measurements, thus decreasing security. This work aims to improve sensing and compression efficiency without compromising security, using a novel sampling matrix named the Restricted Structural Random Matrix (RSRM). RSRM unifies the advantages of frame-based and block-based sensing together with a global smoothness prior (i.e. low-resolution signals are highly correlated). RSRM acquires compressive measurements by random projection (keeping measurements equally important) of multiple randomly sub-sampled signals, restricted to low-resolution signals of equal energy, so that its observations are equally important. RSRM is proven to satisfy the Restricted Isometry Property and shows reconstruction performance comparable to recent state-of-the-art compressive sensing and deep learning-based methods.

We present a new active learning algorithm that adaptively partitions the input space into a finite number of regions, and subsequently seeks a distinct predictor for each region, both phases actively requesting labels. We prove theoretical guarantees for both the generalization error and the label complexity of our algorithm, and analyze the number of regions defined by the algorithm under some mild assumptions. We also report the results of an extensive suite of experiments on several real-world datasets demonstrating substantial empirical benefits over existing single-region and non-adaptive region-based active learning baselines.

Unsupervised anomaly detection aims to identify anomalous samples from highly complex and unstructured data, which is pervasive in both fundamental research and industrial applications. However, most existing methods neglect the complex correlation among data samples, which is important for capturing normal patterns from which the abnormal ones deviate. In this paper, we propose a method for Correlation-aware unsupervised Anomaly detection via a Deep Gaussian Mixture Model (CADGMM), which captures the complex correlation among data points for high-quality low-dimensional representation learning. Specifically, the relations among data samples are first encoded as a graph structure, in which nodes denote samples and edges denote the feature-space correlation between pairs of samples. Then, a dual-encoder consisting of a graph encoder and a feature encoder is employed to jointly encode both the feature and correlation information of samples into the low-dimensional latent space, followed by a decoder for data reconstruction. Finally, a separate estimation network, acting as a Gaussian Mixture Model, is utilized to estimate the density of the learned latent vectors, and anomalies can be detected by measuring the energy of the samples. Extensive experiments on real-world datasets demonstrate the effectiveness of the proposed method.

To efficiently support real-time control applications, networked control systems operating with ultra-reliable and low-latency communications (URLLCs) become a fundamental technology for the future Internet of things (IoT). However, the design of control, sensing, and communications is generally isolated at present. In this paper, we propose the joint optimization of control cost and energy consumption for a centralized wireless networked control system. Specifically, with the "sensing-then-control" protocol, we first develop an optimization framework which jointly takes control, sensing, and communications into account. In this framework, we derive the spectral efficiency, linear quadratic regulator cost, and energy consumption. Then, a novel performance metric called the \textit{energy-to-control efficiency} is proposed for the IoT control system. In addition, we optimize the energy-to-control efficiency while guaranteeing the requirements of URLLCs, whereupon a general and complex max-min joint optimization problem is formulated for the IoT control system. To optimally solve the formulated problem with reasonable complexity, we propose two radio resource allocation algorithms. Finally, simulation results show that our proposed algorithms can significantly improve the energy-to-control efficiency of the IoT control system with URLLCs.

Orthogonal blinding based schemes for wireless physical layer security aim to achieve secure communication by injecting noise into channels orthogonal to the main channel and corrupting the eavesdropper's signal reception. These methods, albeit practical, have been proven vulnerable against multi-antenna eavesdroppers who can filter the message from the noise. The vulnerability is rooted in the fact that the main channel state remains static in spite of the noise injection, which allows an eavesdropper to estimate it promptly via known symbols and filter out the noise. Our proposed scheme leverages a reconfigurable antenna for Alice to rapidly change the channel state during transmission and a compressive sensing based algorithm for her to predict and cancel the changing effects for Bob. As a result, the communication between Alice and Bob remains clear, whereas randomized channel state prevents Eve from launching the known-plaintext attack. We formally analyze the security of the scheme against both single and multi-antenna eavesdroppers and identify its unique anti-eavesdropping properties due to the artificially created fast-changing channel. We conduct extensive simulations and real-world experiments to evaluate its performance. Empirical results show that our scheme can suppress Eve's attack success rate to the level of random guessing, even if she knows all the symbols transmitted through other antenna modes.

Recently, temporal action localization (TAL), i.e., finding specific action segments in untrimmed videos, has attracted increasing attention from the computer vision community. State-of-the-art solutions for TAL involve predicting three values at each time point, corresponding to the probabilities that the action starts, continues, and ends, and post-processing these curves for the final localization. This paper delves into this mechanism and argues that existing approaches mostly ignore the potential relationships among these curves, resulting in low-quality action proposals. To alleviate this problem, we add extra constraints to these curves, e.g., the probability of "action continues" should be relatively high between probability peaks of "action starts" and "action ends", so that the entire framework is aware of these latent constraints during an end-to-end optimization process. Experiments are performed on two popular TAL datasets, THUMOS14 and ActivityNet1.3. Our approach clearly outperforms the baseline both quantitatively (in terms of AR@AN and mAP) and qualitatively (the curves in the testing stage become much smoother). In particular, when we build our constraints on top of TSA-Net and PGCN, we achieve state-of-the-art performance, especially at strict high IoU settings. The code will be available.

Prior work in multi-task learning has mainly focused on predictions on a single image. In this work, we present a new approach for multi-task learning from videos. Our approach contains a novel inter-frame attention module which allows learning of task-specific attention across frames. We embed the attention module in a "slow-fast" architecture, where the slower network runs on sparsely sampled keyframes and the lightweight shallow network runs on non-key frames at a high frame rate. We further propose an effective adversarial learning strategy to encourage the slow and fast networks to learn similar features. The proposed architecture ensures low-latency multi-task learning while maintaining high-quality predictions. Experiments show competitive accuracy compared to the state of the art on two multi-task learning benchmarks while reducing the number of floating point operations (FLOPs) by 70%. Meanwhile, our attention-based feature propagation outperforms other feature propagation methods in accuracy while reducing FLOPs by up to 90%.

In a stable matching setting, we consider a query model that allows for an interactive learning algorithm to make precisely one type of query: proposing a matching, the response to which is either that the proposed matching is stable, or a blocking pair (chosen adversarially) indicating that this matching is unstable. For one-to-one matching markets, our main result is an essentially tight upper bound of $O(n^2\log n)$ on the deterministic query complexity of interactively learning a stable matching in this coarse query model, along with an efficient randomized algorithm that achieves this query complexity with high probability. For many-to-many matching markets in which participants have responsive preferences, we first give an interactive learning algorithm whose query complexity and running time are polynomial in the size of the market if the maximum quota of each agent is bounded; our main result for many-to-many markets is that the deterministic query complexity can be made polynomial (more specifically, $O(n^3 \log n)$) in the size of the market even for arbitrary (e.g., linear in the market size) quotas.
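
The query model can be simulated with a small oracle that either certifies stability or returns a blocking pair; a sketch for one-to-one markets (the paper's oracle chooses the blocking pair adversarially, whereas this sketch simply returns the first one found):

```python
def blocking_pair(matching, men_pref, women_pref):
    """Given a proposed one-to-one matching (dict: man -> woman) and
    ranked preference lists, return a blocking pair (m, w), i.e. a pair
    who each prefer one another to their assigned partners, or None if
    the matching is stable."""
    husband = {w: m for m, w in matching.items()}
    for m, prefs in men_pref.items():
        for w in prefs:
            if w == matching[m]:
                break                          # m ranks no one above his partner
            # m prefers w to his partner; does w prefer m back?
            wp = women_pref[w]
            if wp.index(m) < wp.index(husband[w]):
                return (m, w)
    return None
```

An interactive learner in this model proposes matchings to such an oracle and uses the returned blocking pairs to refine its beliefs about the hidden preferences.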

In this paper, the task of cross-network node classification, which leverages the abundant labeled nodes from a source network to help classify unlabeled nodes in a target network, is studied. The existing domain adaptation algorithms generally fail to model the network structural information, and the current network embedding models mainly focus on single-network applications. Thus, neither can be directly applied to solve the cross-network node classification problem. This motivates us to propose an adversarial cross-network deep network embedding (ACDNE) model that integrates adversarial domain adaptation with deep network embedding so as to learn network-invariant node representations that also preserve the network structural information well. In ACDNE, the deep network embedding module utilizes two feature extractors to jointly preserve attributed affinity and topological proximity between nodes. In addition, a node classifier is incorporated to make node representations label-discriminative. Moreover, an adversarial domain adaptation technique is employed to make node representations network-invariant. Extensive experimental results demonstrate that the proposed ACDNE model achieves state-of-the-art performance in cross-network node classification.

Sliced-Wasserstein distance (SWD) and its variant, Max Sliced-Wasserstein distance (Max-SWD), have been widely used in recent years due to their fast computation and scalability when the probability measures lie in very high dimension. However, these distances still have weaknesses: SWD requires many projection samples because it samples projecting directions from the uniform distribution, while Max-SWD uses only one projection, causing it to lose a large amount of information. In this paper, we propose a novel distance that finds an optimal penalized probability measure over the slices, named the Distributional Sliced-Wasserstein distance (DSWD). We show that DSWD is a generalization of both SWD and Max-SWD, and that the proposed distance can be found by searching for the push-forward measure over a set of measures satisfying certain constraints. Moreover, similarly to SWD, we extend the Generalized Sliced-Wasserstein distance (GSWD) to the Distributional Generalized Sliced-Wasserstein distance (DGSWD). Finally, we carry out extensive experiments to demonstrate the favorable generative modeling performance of our distances over previous sliced-based distances on large-scale real datasets.
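
The plain SWD that DSWD generalizes can be sketched in a few lines of numpy: project both samples onto random directions and average the resulting one-dimensional Wasserstein distances (equal sample sizes are assumed for simplicity, so each 1D distance reduces to sorting):

```python
import numpy as np

def sliced_wasserstein(X, Y, n_proj=100, p=2, rng=None):
    """Monte Carlo sliced-Wasserstein distance between samples X, Y of
    shape (n, d): average 1D p-Wasserstein distance over uniformly
    random projection directions on the unit sphere."""
    rng = rng or np.random.default_rng(0)
    d = X.shape[1]
    total = 0.0
    for _ in range(n_proj):
        theta = rng.normal(size=d)
        theta /= np.linalg.norm(theta)           # uniform direction on sphere
        # 1D Wasserstein-p between equal-size samples: sort and compare
        xs, ys = np.sort(X @ theta), np.sort(Y @ theta)
        total += np.mean(np.abs(xs - ys) ** p)
    return (total / n_proj) ** (1 / p)
```

The `n_proj` parameter is exactly the cost the abstract criticizes: uniform sampling wastes many projections on uninformative directions, which is what the proposed distributional weighting over slices addresses.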

In this paper, we propose a structured linear parameterization of a feedback policy to solve the model-free stochastic optimal control problem. This parametrization is corroborated by a decoupling principle that is shown to be near-optimal under a small noise assumption, both in theory and by empirical analyses. Further, we incorporate a model-free version of the Iterative Linear Quadratic Regulator (ILQR) in a sample-efficient manner into our framework. Simulations on systems over a range of complexities reveal that the resulting algorithm is able to harness the superior second-order convergence properties of ILQR. As a result, it is fast and is scalable to a wide variety of higher dimensional systems. Comparisons are made with a state-of-the-art reinforcement learning algorithm, the Deep Deterministic Policy Gradient (DDPG) technique, in order to demonstrate the significant merits of our approach in terms of training-efficiency.

Current semantic segmentation models only exploit first-order statistics, while rarely exploring high-order statistics. However, common first-order statistics are insufficient to support a solid unanimous representation. In this paper, we propose the High-Order Paired-ASPP Network to exploit high-order statistics from various feature levels. The network first introduces a High-Order Representation module to extract the contextual high-order information from all stages of the backbone, which provides more semantic clues and discriminative information than the first-order statistics. Besides, a Paired-ASPP module is proposed to embed high-order statistics of the early stages into the last stage. It can further preserve the boundary-related and spatial context in the low-level features for final prediction. Our experiments show that the high-order statistics significantly boost the performance on confusing objects. Our method achieves competitive performance without bells and whistles on three benchmarks, i.e., Cityscapes, ADE20K and Pascal-Context, with mIoU of 81.6%, 45.3% and 52.9%, respectively.

The rateless and information-additive properties of fountain codes make them attractive for use in broadcast/multicast applications, especially in radio environments where channel characteristics vary with time and bandwidth is expensive. Conventional schemes using a combination of ARQ (Automatic Repeat reQuest) and FEC (Forward Error Correction) suffer from serious drawbacks: feedback implosion at the transmitter, the need to know the channel characteristics a priori so that the FEC scheme can be designed effectively, and the need for a reverse channel to request retransmissions if the FEC fails. This paper considers the assessment of fountain codes over radio channels. The performance of fountain codes, in terms of the associated overheads, over radio channels of the type experienced in GPRS (General Packet Radio Service) is presented. The work is then extended to assessing the performance of fountain codes in combination with the GPRS channel coding schemes in a radio environment.
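
A toy LT-style fountain encoder and peeling decoder illustrate the rateless, information-additive property (the uniform degree distribution here is a simplification of the robust soliton distribution used in practice):

```python
import random

def lt_encode_symbol(blocks, rng):
    """One rateless LT symbol: the XOR of a random subset of source blocks,
    tagged with the indices of the blocks it covers."""
    k = len(blocks)
    degree = rng.choice([1, 2, 3, 4])          # toy degree distribution
    idx = rng.sample(range(k), min(degree, k))
    value = 0
    for i in idx:
        value ^= blocks[i]
    return frozenset(idx), value

def lt_decode(symbols, k):
    """Peeling decoder: repeatedly subtract known blocks from each symbol
    and resolve any symbol whose residual degree drops to one."""
    decoded = {}
    pending = [[set(idx), val] for idx, val in symbols]
    progress = True
    while progress and len(decoded) < k:
        progress = False
        for sym in pending:
            idx, val = sym
            for i in [j for j in idx if j in decoded]:
                idx.discard(i)
                val ^= decoded[i]
            sym[1] = val
            if len(idx) == 1:
                i = idx.pop()
                if i not in decoded:
                    decoded[i] = val
                    progress = True
    return [decoded.get(i) for i in range(k)]
```

A receiver simply collects symbols until decoding completes; no feedback channel or a priori channel knowledge is needed, which is precisely the property assessed over GPRS channels, and the number of extra symbols beyond k is the overhead being measured.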

A Relational Markov Decision Process (RMDP) is a first-order representation expressing all instances of a single probabilistic planning domain with a possibly unbounded number of objects. Early work on RMDPs outputs generalized (instance-independent) first-order policies or value functions as a means to solve all instances of a domain at once. Unfortunately, this line of work met with limited success due to inherent limitations of the representation space used in such policies or value functions. Can neural models provide the missing link by easily representing more complex generalized policies, thus making them effective on all instances of a given domain?

We present the first neural approach for solving RMDPs, expressed in the probabilistic planning language of RDDL. Our solution first converts an RDDL instance into a ground DBN. We then extract a graph structure from the DBN. We train a relational neural model that computes an embedding for each node in the graph and also scores each ground action as a function over the first-order action variable and object embeddings on which the action is applied. In essence, this represents a neural generalized policy for the whole domain. Given a new test problem of the same domain, we can compute all node embeddings using trained parameters and score each ground action to choose the best action using a single forward pass without any retraining. Our experiments on nine RDDL domains from IPPC demonstrate that neural generalized policies are significantly better than random and sometimes even more effective than training a state-of-the-art deep reactive policy from scratch.

Overparameterization has been shown to benefit both the optimization and generalization of neural networks, but large networks are resource hungry at both training and test time. Network pruning can reduce test-time resource requirements, but is typically applied to trained networks and therefore cannot avoid the expensive training process. We aim to prune networks at initialization, thereby saving resources at training time as well. Specifically, we argue that efficient training requires preserving the gradient flow through the network. This leads to a simple but effective pruning criterion we term Gradient Signal Preservation (GraSP). We empirically investigate the effectiveness of the proposed method with extensive experiments on CIFAR-10, CIFAR-100, Tiny-ImageNet and ImageNet, using VGGNet and ResNet architectures. Our method can prune 80% of the weights of a VGG-16 network on ImageNet at initialization, with only a 1.6% drop in top-1 accuracy. Moreover, our method achieves significantly better performance than the baseline at extreme sparsity levels.
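
The gradient-flow criterion can be sketched for a model where the gradient and Hessian-vector products are available in closed form, e.g. linear least squares; the sign and keep/prune conventions below are a simplification of the paper's:

```python
import numpy as np

def grasp_scores(X, y, w):
    """GraSP-style scores for linear least squares, where the gradient
    g = X^T (Xw - y) / n and the Hessian-vector product Hg = X^T X g / n
    are available in closed form. score_q ~ w_q * (Hg)_q approximates
    weight q's effect on gradient flow if it were removed."""
    n = len(y)
    g = X.T @ (X @ w - y) / n
    Hg = X.T @ (X @ g) / n          # Hessian-vector product, no explicit H
    return w * Hg

def prune_mask(scores, sparsity):
    """Keep the (1 - sparsity) fraction of weights with the lowest scores,
    i.e. those whose removal hurts gradient flow the most is avoided."""
    keep = len(scores) - int(len(scores) * sparsity)
    mask = np.zeros(len(scores), dtype=bool)
    mask[np.argsort(scores)[:keep]] = True
    return mask
```

In a neural network the same Hessian-vector product is computed by automatic differentiation at initialization, before any training has occurred.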

This paper considers the distributed optimization problem over a network where the global objective is to optimize a sum of local functions using only local computation and communication. Since the existing algorithms either adopt a linear consensus mechanism, which converges at best linearly, or assume that each node starts sufficiently close to an optimal solution, they cannot achieve globally superlinear convergence. To break through the linear consensus rate, we propose a finite-time set-consensus method, and then incorporate it into Polyak's adaptive Newton method, leading to our distributed adaptive Newton algorithm (DAN). To avoid transmitting local Hessians, we adopt a low-rank approximation idea to compress the Hessian and design a communication-efficient DAN-LA. Then, the size of transmitted messages in DAN-LA is reduced to $O(p)$ per iteration, where $p$ is the dimension of decision vectors and is the same as the first-order methods. We show that DAN and DAN-LA can globally achieve quadratic and superlinear convergence rates, respectively. Numerical experiments on logistic regression problems are finally conducted to show the advantages over existing methods.
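
The abstract does not detail DAN-LA's compression scheme; as a generic illustration of how a symmetric $p \times p$ Hessian can be reduced to $O(p)$ transmitted numbers, consider a rank-1 approximation via the dominant eigenpair:

```python
import numpy as np

def compress_rank_one(H):
    """Compress a symmetric p x p Hessian to its dominant eigenpair:
    1 + p numbers are transmitted instead of p^2."""
    vals, vecs = np.linalg.eigh(H)
    i = np.argmax(np.abs(vals))
    return vals[i], vecs[:, i]

def decompress(lam, v):
    """Rebuild the rank-1 approximation lam * v v^T at the receiver."""
    return lam * np.outer(v, v)
```

When the Hessian is (close to) rank-1, the reconstruction is (nearly) exact; more generally, such low-rank messages match the $O(p)$ per-iteration communication of first-order methods that the abstract highlights.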

Accurate simulation of complex physical systems enables the development, testing, and certification of control strategies before they are deployed into the real systems. As simulators become more advanced, the analytical tractability of the differential equations and associated numerical solvers incorporated in the simulations diminishes, making them difficult to analyse. A potential solution is the use of probabilistic inference to assess the uncertainty of the simulation parameters given real observations of the system. Unfortunately, the likelihood function required for inference is generally expensive to compute or totally intractable. In this paper we propose to leverage the power of modern simulators and recent techniques in Bayesian statistics for likelihood-free inference to design a control framework that is efficient and robust with respect to the uncertainty over simulation parameters. The posterior distribution over simulation parameters is propagated through a potentially non-analytical model of the system with the unscented transform, within a variant of information-theoretic model predictive control. This approach provides a more efficient way to evaluate trajectory rollouts than Monte Carlo sampling, reducing the online computation burden. Experiments show that the proposed controller attains superior performance and robustness on classical control and robotics tasks when compared to models not accounting for the uncertainty over model parameters.
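
The unscented transform mentioned above propagates a Gaussian through a nonlinearity with $2n+1$ deterministic sigma points rather than Monte Carlo samples; a minimal numpy version (the `kappa` scaling and weighting conventions vary across the literature):

```python
import numpy as np

def unscented_transform(f, mean, cov, kappa=0.0):
    """Propagate a Gaussian (mean, cov) through a nonlinear function f
    using 2n+1 deterministic sigma points instead of Monte Carlo sampling.
    Returns the approximate mean and covariance of f(x)."""
    n = len(mean)
    L = np.linalg.cholesky((n + kappa) * cov)   # scaled matrix square root
    sigma = [mean]
    for i in range(n):
        sigma += [mean + L[:, i], mean - L[:, i]]
    w = np.full(2 * n + 1, 1.0 / (2 * (n + kappa)))
    w[0] = kappa / (n + kappa)                  # weight of the central point
    Y = np.array([f(s) for s in sigma])
    mean_y = w @ Y
    diff = Y - mean_y
    return mean_y, (w[:, None] * diff).T @ diff
```

For a linear map the transform is exact, and for smooth nonlinearities it matches the true moments to second order at the cost of only $2n+1$ rollouts, which is the efficiency gain over Monte Carlo claimed in the abstract.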

Network function virtualization is a promising technology to simultaneously support multiple services with diverse characteristics and requirements in the fifth generation and beyond networks. In practice, each service consists of a predetermined sequence of functions, called a service function chain (SFC), running on a cloud environment. To make different service slices work properly in harmony, it is crucial to select the cloud nodes to deploy the functions in the SFC and flexibly route the flow of the services such that these functions are processed in sequence, the end-to-end (E2E) latency constraints of all services are guaranteed, and all resource constraints are respected. In this paper, we propose a new mixed binary linear programming formulation of the above network slicing problem that optimizes system energy efficiency while jointly considering the resource budget, function instantiation, flow routing, and E2E latency requirements. Numerical results show the advantage of the proposed formulation compared to the existing ones.

Robots are required not only to learn spatial concepts autonomously but also to utilize such knowledge for various tasks in a domestic environment. A spatial concept represents a multimodal place category acquired from the robot's spatial experience, including vision, speech-language, and self-position. The aim of this study is to enable a mobile robot to perform navigational tasks with human speech instructions, such as 'Go to the kitchen', via probabilistic inference on a Bayesian generative model using spatial concepts. Specifically, path planning was formalized as the maximization of the probability distribution over the path trajectory under a speech instruction, based on a control-as-inference framework. Furthermore, we describe the relationship between probabilistic inference based on the Bayesian generative model and control problems, including reinforcement learning. We demonstrated path planning based on human instruction using acquired spatial concepts to verify the usefulness of the proposed approach in simulated and real environments. Experimentally, places instructed by the user's speech commands showed high probability values, and the trajectory toward the target place was correctly estimated. Our approach, based on probabilistic inference concerning decision-making, can lead to further improvements in robot autonomy.

Self-supervision is key to extending the use of deep learning for label-scarce domains. In most self-supervised approaches, data transformations play an important role. However, until now the impact of these transformations has not been studied. Furthermore, different transformations may have different impacts on the system. We provide novel insights into the use of data transformation in self-supervised tasks, especially pertaining to clustering. We show theoretically and empirically that certain sets of transformations are helpful for the convergence of self-supervised clustering. We also show cases where the transformations are not helpful or are even harmful. We show a faster convergence rate with valid transformations for convex as well as certain families of non-convex objectives, along with a proof of convergence to the original set of optima. We present experiments on synthetic as well as real-world data. Empirically, our results conform with the theoretical insights provided.

When a neural network is partitioned and distributed across physical nodes, failure of physical nodes causes the failure of the neural units that are placed on those nodes, which results in a significant performance drop. Current approaches focus on resiliency of training in distributed neural networks. However, resiliency of inference in distributed neural networks is less explored. We introduce ResiliNet, a scheme for making inference in distributed neural networks resilient to physical node failures. ResiliNet combines two concepts to provide resiliency: skip connection in residual neural networks, and a novel technique called failout, which is introduced in this paper. Failout simulates physical node failure conditions during training using dropout, and is specifically designed to improve the resiliency of distributed neural networks. The results of the experiments and ablation studies using three datasets confirm the ability of ResiliNet to provide inference resiliency for distributed neural networks.
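A minimal sketch of the failout idea described above: with some probability a whole partition's output is zeroed during training (simulating the loss of a physical node), so the skip connection must learn to carry the information. The names and the residual-style combination are illustrative assumptions, not ResiliNet's exact architecture:

```python
import random

def failout(partition_output, skip_input, p_fail, training=True):
    """Simulate the failure of a whole physical node during training:
    with probability p_fail the partition's entire output is zeroed
    (unlike dropout's per-unit masking), leaving only the skip
    connection to carry information forward."""
    if training and random.random() < p_fail:
        failed = [0.0] * len(partition_output)
    else:
        failed = partition_output
    # Residual-style combination: skip connection plus (possibly failed) output.
    return [s + o for s, o in zip(skip_input, failed)]
```

At inference time (`training=False`) the layer is a plain residual connection; during training, surviving paths are forced to compensate for the zeroed partition.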

This paper mainly illustrates the bit error rate (BER) performance of M-ary QAM and M-ary PSK for different values of SNR over a Rician fading channel. In a wireless communication system, a signal experiences multipath propagation, which causes rapid fluctuations of the signal amplitude in time; this is defined as fading. Rician fading is a small-scale fading model: a stochastic model for radio propagation anomalies caused by partial cancellation of a radio signal by itself, where the signal reaches the receiver by several different paths and at least one of the paths is lengthened or shortened. From this paper, it can be observed that the bit error rate decreases as the signal-to-noise ratio (in decibels) increases for M-ary QAM and M-ary PSK, such as 256-QAM and 64-PSK. Constellation diagrams of M-QAM and M-PSK have also been shown in this paper using MATLAB simulation. The fall of the bit error rate with increasing diversity order for a fixed value of SNR has also been included. Diversity is a powerful receiver technique that offers an improvement in received signal strength.
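For intuition on the BER-vs-SNR trend discussed above, here is a standard closed-form approximation for Gray-coded square M-QAM over an AWGN channel (a hedged simplification: Rician-fading curves would be obtained by averaging such an expression over the fading distribution, which is not done here):

```python
import math

def qfunc(x):
    # Gaussian tail probability Q(x).
    return 0.5 * math.erfc(x / math.sqrt(2))

def ber_mqam_awgn(M, ebn0_db):
    """Approximate BER of Gray-coded square M-QAM over AWGN:
    BER ~ (4/k) * (1 - 1/sqrt(M)) * Q(sqrt(3*k*EbN0 / (M-1))),
    with k = log2(M) bits per symbol and EbN0 in linear scale."""
    k = math.log2(M)
    ebn0 = 10 ** (ebn0_db / 10)
    arg = math.sqrt(3 * k * ebn0 / (M - 1))
    return (4 / k) * (1 - 1 / math.sqrt(M)) * qfunc(arg)
```

The formula reproduces the two qualitative observations of the paper: BER falls as SNR grows, and denser constellations (e.g. 256-QAM) need more SNR than sparser ones (e.g. 16-QAM) for the same BER.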

It has been shown in the literature that the main technological factors limiting the communication rates of single-photon quantum cryptography systems are related to the choice of the encoding method. In fact, the efficiency of the sources used is very limited, at best of the order of a few percent for single-photon sources, and photon counters cannot be operated beyond a certain speed and suffer from low detection efficiency. To partially overcome these drawbacks, it is advantageous to use continuous quantum states as an alternative to standard encodings based on quantum qubits. In this context, we propose a new reconciliation method based on Turbo codes. Our theoretical model assumptions are supported by experimental results. Indeed, our method leads to a significant improvement of the protocol security and a large decrease of the QBER. The gain is obtained with a reasonable increase in complexity. A further novelty of our work is that the reconciliation method is tested on a real photonic system simulated with VPItransmissionMaker.

Deep neural networks are known to be annotation-hungry. Numerous efforts have been devoted to reducing the annotation cost when learning with deep networks. Two prominent directions include learning with noisy labels and semi-supervised learning by exploiting unlabeled data. In this work, we propose DivideMix, a novel framework for learning with noisy labels by leveraging semi-supervised learning techniques. In particular, DivideMix models the per-sample loss distribution with a mixture model to dynamically divide the training data into a labeled set with clean samples and an unlabeled set with noisy samples, and trains the model on both the labeled and unlabeled data in a semi-supervised manner. To avoid confirmation bias, we simultaneously train two diverged networks where each network uses the dataset division from the other network. During the semi-supervised training phase, we improve the MixMatch strategy by performing label co-refinement and label co-guessing on labeled and unlabeled samples, respectively. Experiments on multiple benchmark datasets demonstrate substantial improvements over state-of-the-art methods. Code is available at https://github.com/LiJunnan1992/DivideMix .
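The clean/noisy split described above can be illustrated with a tiny 1-D expectation-maximization routine over per-sample losses; DivideMix itself fits the mixture with standard tooling, so this pure-Python version is only a sketch:

```python
import math

def fit_two_gaussians(losses, iters=50):
    """Tiny 1-D EM for a two-component Gaussian mixture over per-sample
    losses; returns P(low-loss "clean" component | loss) per sample,
    which DivideMix-style training would threshold to divide the data."""
    lo, hi = min(losses), max(losses)
    mu, var, pi = [lo, hi], [1.0, 1.0], [0.5, 0.5]
    resp = []
    for _ in range(iters):
        # E-step: responsibility of the low-loss component for each sample.
        resp = []
        for x in losses:
            dens = [pi[k] / math.sqrt(2 * math.pi * var[k])
                    * math.exp(-(x - mu[k]) ** 2 / (2 * var[k]))
                    for k in (0, 1)]
            resp.append(dens[0] / (dens[0] + dens[1]))
        # M-step: refit means, variances, and mixing weights.
        for k in (0, 1):
            w = [r if k == 0 else 1 - r for r in resp]
            tot = sum(w)
            mu[k] = sum(wi * x for wi, x in zip(w, losses)) / tot
            var[k] = sum(wi * (x - mu[k]) ** 2
                         for wi, x in zip(w, losses)) / tot + 1e-6
            pi[k] = tot / len(losses)
    return resp
```

Samples with a high clean-component probability would go to the labeled set; the rest would be treated as unlabeled, as in the semi-supervised phase described above.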

This paper investigates a reconciliation method for establishing an errorless secret key in a QKD protocol. Classical key distribution protocols are not unconditionally secure because their security rests on the computational complexity of mathematical problems. In this context, QKD protocols offer the highest level of security because they are based on the quantum laws of physics. However, protocol performance can be degraded by multiple errors. In such a situation, reconciliation must be performed so that the legitimate partners can remove the errors. The proposed method accomplishes reconciliation by using QTC for the special problem of side-information source coding (the Slepian-Wolf coding model). Our theoretical hypotheses are supported by experimental results that confirm the advantage of our method in solving the reconciliation problem compared to a recent related work. Indeed, the integration of our method yields an important gain in security and a large decrease of the QBER. The gain is obtained with a reasonable increase in complexity. A further novelty of our work is that the reconciliation method is tested on a real photonic system simulated with VPItransmissionMaker.

Open-domain retrieval-based dialogue systems require a considerable amount of training data to learn their parameters. However, in practice, the negative samples of training data are usually selected from an unannotated conversation data set at random. The generated training data is likely to contain noise and affect the performance of the response selection models. To address this difficulty, we consider utilizing the underlying correlation in the data resource itself to derive different kinds of supervision signals and reduce the influence of noisy data. More specifically, we consider a main-complementary task pair. The main task (i.e., our focus) selects the correct response given the last utterance and context, and the complementary task selects the last utterance given the response and context. The key point is that the output of the complementary task is used to set instance weights for the main task. We conduct extensive experiments on two public datasets and obtain significant improvements on both. We also investigate variants of our approach in multiple aspects, and the results verify its effectiveness.

Federated learning is a new distributed machine learning framework, where a set of heterogeneous clients collaboratively train a model without sharing training data. In this work, we consider a practical and ubiquitous issue in federated learning: intermittent client availability, where the set of eligible clients may change during the training process. Such an intermittent client availability model would significantly deteriorate the performance of the classical Federated Averaging algorithm (FedAvg for short). We propose a simple distributed non-convex optimization algorithm, called Federated Latest Averaging (FedLaAvg for short), which leverages the latest gradients of all clients, even those that are currently unavailable, to jointly update the global model in each iteration. Our theoretical analysis shows that FedLaAvg attains the convergence rate of $O(1/(N^{1/4} T^{1/2}))$, achieving a sublinear speedup with respect to the total number of clients. We implement and evaluate FedLaAvg with the CIFAR-10 dataset. The evaluation results demonstrate that FedLaAvg indeed reaches a sublinear speedup and achieves 4.23% higher test accuracy than FedAvg.
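A minimal sketch of the latest-averaging idea, with illustrative names and a toy availability pattern (not the paper's implementation): each client's most recent gradient is cached and reused while that client is unavailable, so every round still averages over all clients seen so far.

```python
def fedlaavg(grad_fns, x0, lr, rounds, available):
    """Latest averaging: keep each client's most recent gradient and
    average over all known gradients every round, reusing stale
    gradients for clients that are currently unavailable."""
    x = x0
    latest = [None] * len(grad_fns)
    for t in range(rounds):
        for i in available(t):          # only available clients recompute
            latest[i] = grad_fns[i](x)
        known = [g for g in latest if g is not None]
        x -= lr * sum(known) / len(known)
    return x
```

On a toy problem with two quadratic clients centered at 1 and 3, alternating availability, the iterate still converges near the joint optimum 2 despite every round using one stale gradient.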

In recent years we have seen a rapidly growing line of research showing the learnability of various models via common neural network algorithms. Yet, aside from a few outliers, these results show learnability of models that can also be learned using linear methods. Namely, such results show that learning neural networks with gradient descent is competitive with learning a linear classifier on top of a data-independent representation of the examples. This leaves much to be desired, as neural networks are far more successful than linear methods. Furthermore, on the more conceptual level, linear models don't seem to capture the "deepness" of deep networks. In this paper we take a step towards showing learnability of models that are inherently non-linear. We show that under certain distributions, sparse parities are learnable via gradient descent on depth-two networks. On the other hand, under the same distributions, these parities cannot be learned efficiently by linear methods.

Graph Pattern based Node Matching (GPNM) aims to find all the matches of the nodes in a data graph GD based on a given pattern graph GP. GPNM has become increasingly important in many applications, e.g., group finding and expert recommendation. In real scenarios, both GP and GD are updated frequently. However, the existing GPNM methods either need to perform a new GPNM procedure from scratch to deliver the node matching results based on the updated GP and GD or incrementally perform the GPNM procedure for each of the updates, leading to low efficiency. Therefore, there is a pressing need for a new method to efficiently deliver the node matching results on the updated graphs. In this paper, we first analyze and detect the elimination relationships between the updates. Then, we construct an Elimination Hierarchy Tree (EH-Tree) to index these elimination relationships. In order to speed up the GPNM process, we propose a graph partition method and then a new updates-aware GPNM method, called UA-GPNM, considering both the single-graph elimination relationships among the updates in a single graph of GP or GD, and the cross-graph elimination relationships between the updates in GP and the updates in GD. UA-GPNM first delivers the GPNM result of an initial query, and then delivers the GPNM result of a subsequent query, based on the initial GPNM result and the multiple updates that occur between the two queries. The experimental results on five real-world social graphs demonstrate that our proposed UA-GPNM is much more efficient than the state-of-the-art GPNM methods.

The requirement in most current blockchains that all full nodes execute all tasks limits the throughput of existing blockchains; this limitation is well documented and is among the most significant hurdles to the widespread adoption of decentralized technology.

This paper extends our presentation of Flow, a pipelined blockchain architecture, which separates the process of consensus on the transaction order from transaction computation. As we experimentally showed in our previous white paper, our architecture provides a significant throughput improvement while preserving the security of the system. Flow exploits the heterogeneity offered by the nodes, in terms of bandwidth, storage, and computational capacity, and defines the roles for the nodes based on their tasks in the pipeline, i.e., Collector, Consensus, Execution, and Verification. While transaction collection from the user agents is handled by the bandwidth-optimized Collector Nodes, their execution is carried out by the compute-optimized Execution Nodes. Checking the execution result is then distributed among a more extensive set of Verification Nodes, which confirm the result is correct in a distributed and parallel manner. In contrast to more traditional blockchain architectures, Flow's Consensus Nodes do not execute the transactions. Instead, Verification Nodes report observed faulty executions to the Consensus Nodes, which adjudicate the received challenges and slash malicious actors.

In this paper, we detail the lifecycle of transactions from their submission to the system until they are executed. The paper covers the Collector, Consensus, and Execution roles. We provide a protocol specification for collecting the transactions, forming a block, and executing the resulting block. Moreover, we elaborate on the safety and liveness of the system with respect to these processes.

There has been an ongoing cycle where stronger defenses against adversarial attacks are subsequently broken by a more advanced defense-aware attack. We present a new approach towards ending this cycle where we "deflect" adversarial attacks by causing the attacker to produce an input that semantically resembles the attack's target class. To this end, we first propose a stronger defense based on Capsule Networks that combines three detection mechanisms to achieve state-of-the-art detection performance on both standard and defense-aware attacks. We then show that undetected attacks against our defense often perceptually resemble the adversarial target class by performing a human study where participants are asked to label images produced by the attack. These attack images can no longer be called "adversarial" because our network classifies them the same way as humans do.

We consider an extension of the rollout algorithm that applies to constrained deterministic dynamic programming, including challenging combinatorial optimization problems. The algorithm relies on a suboptimal policy, called base heuristic. Under suitable assumptions, we show that if the base heuristic produces a feasible solution, the rollout algorithm has a cost improvement property: it produces a feasible solution, whose cost is no worse than the base heuristic's cost.

We then focus on multiagent problems, where the control at each stage consists of multiple components (one per agent), which are coupled either through the cost function or the constraints or both. We show that the cost improvement property is maintained with an alternative implementation that has greatly reduced computational requirements, and makes possible the use of rollout in problems with many agents. We demonstrate this alternative algorithm by applying it to layered graph problems that involve both a spatial and a temporal structure. We consider in some detail a prominent example of such problems: multidimensional assignment, where we use the auction algorithm for 2-dimensional assignment as a base heuristic. This auction algorithm is particularly well-suited for our context, because through the use of prices, it can advantageously use the solution of an assignment problem as a starting point for solving other related assignment problems, and this can greatly speed up the execution of the rollout algorithm.
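The cost improvement property is easy to see on a toy constrained problem. The sketch below applies one-step rollout with a greedy base heuristic to a small 0/1 knapsack instance; the problem and heuristic are illustrative, not the multidimensional assignment setting of the paper:

```python
def greedy(values, weights, i, cap):
    """Base heuristic: from item i on, take each item in order if it fits."""
    total = 0
    for j in range(i, len(values)):
        if weights[j] <= cap:
            total += values[j]
            cap -= weights[j]
    return total

def rollout(values, weights, cap):
    """One-step lookahead: at each stage try both controls (skip/take
    item i), complete the trajectory with the base heuristic, and
    commit to the better control. Feasibility is preserved because
    every candidate completion is itself feasible."""
    total = 0
    for i in range(len(values)):
        skip = greedy(values, weights, i + 1, cap)
        take = -1
        if weights[i] <= cap:
            take = values[i] + greedy(values, weights, i + 1, cap - weights[i])
        if take >= skip:
            total += values[i]
            cap -= weights[i]
    return total
```

On the classic instance below the base heuristic collects 160, while its rollout collects the optimal 220, illustrating the improvement property (here stated for maximization).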

Online real-time bidding (RTB) is known as a complex auction game where ad platforms seek to consider various influential key performance indicators (KPIs), like revenue and return on investment (ROI). The trade-off among these competing goals needs to be balanced on a massive scale. To address the problem, we propose a multi-objective reinforcement learning algorithm, named MoTiAC, for the problem of bidding optimization with various goals. Specifically, in MoTiAC, instead of using a fixed and linear combination of multiple objectives, we compute adaptive weights over time on the basis of how well the current state agrees with the agent's prior. In addition, we provide interesting properties of model updating and further prove that Pareto optimality can be guaranteed. We demonstrate the effectiveness of our method on a real-world commercial dataset. Experiments show that the model outperforms all state-of-the-art baselines.

Consider a distributed graph where each vertex holds one of two distinct opinions. In this paper, we are interested in synchronous voting processes where each vertex updates its opinion according to a predefined common local updating rule. For example, each vertex adopts the majority opinion among 1) itself and two randomly picked neighbors in best-of-two or 2) three randomly picked neighbors in best-of-three. Previous works intensively studied specific rules including best-of-two and best-of-three individually.

In this paper, we generalize and extend previous works of best-of-two and best-of-three on expander graphs by proposing a new model, quasi-majority functional voting. This new model contains best-of-two and best-of-three as special cases. We show that, on expander graphs with sufficiently large initial bias, any quasi-majority functional voting reaches consensus within $O(\log n)$ steps with high probability. Moreover, we show that, for any initial opinion configuration, any quasi-majority functional voting on expander graphs with higher expansion (e.g., the Erd\H{o}s-R\'enyi graph $G(n,p)$ with $p=\Omega(1/\sqrt{n})$) reaches consensus within $O(\log n)$ steps with high probability. Furthermore, we show that the consensus time of best-of-$(2k+1)$ is $O(\log n/\log k)$ for $k=o(n/\log n)$.
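A toy simulation of the best-of-three rule on a complete graph illustrates the fast-consensus behavior (illustrative only; the paper's results concern general expander graphs and the broader quasi-majority class):

```python
import random

def best_of_three(opinions, adjacency, rounds=100):
    """Synchronous best-of-three: every vertex simultaneously adopts
    the majority opinion among three randomly picked neighbors."""
    for _ in range(rounds):
        nxt = []
        for v in range(len(opinions)):
            picks = [opinions[random.choice(adjacency[v])] for _ in range(3)]
            nxt.append(1 if sum(picks) >= 2 else 0)
        opinions = nxt
        if len(set(opinions)) == 1:   # consensus reached
            break
    return opinions
```

With an initial bias of 0.8 on a complete graph, the majority opinion is amplified each round (roughly $p \mapsto 3p^2 - 2p^3$ in expectation), so consensus on the majority is reached in a handful of steps.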

We consider the parity variants of basic problems studied in fine-grained complexity. We show that finding the exact solution is just as hard as finding its parity (i.e. if the solution is even or odd) for a large number of classical problems, including All-Pairs Shortest Paths (APSP), Diameter, Radius, Median, Second Shortest Path, Maximum Consecutive Subsums, Min-Plus Convolution, and $0/1$-Knapsack.

A direct reduction from a problem to its parity version is often difficult to design. Instead, we revisit the existing hardness reductions and tailor them in a problem-specific way to the parity version. Nearly all reductions from APSP in the literature proceed via the (subcubic-equivalent but simpler) Negative Weight Triangle (NWT) problem. Our new modified reductions also start from NWT or a non-standard parity variant of it. We are not able to establish a subcubic-equivalence with the more natural parity counting variant of NWT, where we ask if the number of negative triangles is even or odd. Perhaps surprisingly, we justify this by designing a reduction from the seemingly-harder Zero Weight Triangle problem, showing that parity is (conditionally) strictly harder than decision for NWT.
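For concreteness, here is one of the listed problems, Min-Plus Convolution, together with its parity version; the equivalences above say that computing just these parities is (conditionally) as hard as computing the values themselves:

```python
def min_plus_convolution(a, b):
    """Min-plus (tropical) convolution: c[k] = min over i+j=k of a[i] + b[j]."""
    n, m = len(a), len(b)
    return [min(a[i] + b[k - i]
                for i in range(max(0, k - m + 1), min(k, n - 1) + 1))
            for k in range(n + m - 1)]

def parity_version(a, b):
    """The parity variant asks only whether each c[k] is even or odd."""
    return [c % 2 for c in min_plus_convolution(a, b)]
```

The naive algorithm above runs in $O(n^2)$ time; the fine-grained conjecture is that no truly subquadratic algorithm exists, and the paper shows the same barrier applies to the parity output alone.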

The dominant object detection approaches treat each dataset separately and fit towards a specific domain, and thus cannot adapt to other domains without extensive retraining. In this paper, we address the problem of designing a universal object detection model that exploits diverse category granularity from multiple domains and predicts all kinds of categories in one system. Existing works treat this problem by integrating multiple detection branches upon one shared backbone network. However, this paradigm overlooks the crucial semantic correlations between multiple domains, such as the category hierarchy, visual similarity, and linguistic relationships. To address these drawbacks, we present a novel universal object detector called Universal-RCNN that incorporates graph transfer learning for propagating relevant semantic information across multiple datasets to reach semantic coherency. Specifically, we first generate a global semantic pool by integrating the high-level semantic representations of all the categories. Then an Intra-Domain Reasoning Module learns and propagates the sparse graph representation within one dataset guided by a spatial-aware GCN. Finally, an Inter-Domain Transfer Module is proposed to exploit diverse transfer dependencies across all domains and enhance the regional feature representation by attending and transferring semantic contexts globally. Extensive experiments demonstrate that the proposed method significantly outperforms multiple-branch models and achieves state-of-the-art results on multiple object detection benchmarks (mAP: 49.1% on COCO).

Reinforcement learning agents usually learn from scratch, which requires a large number of interactions with the environment. This is quite different from the learning process of humans. When faced with a new task, humans naturally draw on common sense and prior knowledge to derive an initial policy and to guide the learning process afterwards. Although the prior knowledge may not be fully applicable to the new task, the learning process is significantly sped up, since the initial policy ensures a quick start of learning and intermediate guidance avoids unnecessary exploration. Taking this inspiration, we propose knowledge guided policy network (KoGuN), a novel framework that combines human prior suboptimal knowledge with reinforcement learning. Our framework consists of a fuzzy rule controller to represent human knowledge and a refine module to fine-tune the suboptimal prior knowledge. The proposed framework is end-to-end and can be combined with existing policy-based reinforcement learning algorithms. We conduct experiments on both discrete and continuous control tasks. The empirical results show that our approach, which combines human suboptimal knowledge and RL, achieves significant improvement in the learning efficiency of flat RL algorithms, even with very low-performance human prior knowledge.

In this work, we discuss in detail a flaw in the original security proof of the W-OTS${^+}$ variant of the Winternitz one-time signature scheme, which is an important component for various stateless and stateful many-time hash-based digital signature schemes. We update the security proof for the W-OTS${^+}$ scheme and derive the corresponding security level. Our result is of importance for the security analysis of hash-based digital signature schemes.

The notions of distance and similarity play a key role in many machine learning approaches, and artificial intelligence (AI) in general, since they can serve as an organizing principle by which individuals classify objects, form concepts and make generalizations. While distance functions for propositional representations have been thoroughly studied, work on distance functions for structured representations, such as graphs, frames or logical clauses, has been carried out in different communities and is much less understood. Specifically, a significant amount of work that requires the use of a distance or similarity function for structured representations of data usually employs ad-hoc functions for specific applications. Therefore, the goal of this paper is to provide an overview of this work to identify connections between the work carried out in different areas and point out directions for future work.

Object detectors trained on fully-annotated data currently yield state-of-the-art performance but require expensive manual annotations. On the other hand, weakly-supervised detectors have much lower performance and cannot be used reliably in a realistic setting. In this paper, we study the hybrid-supervised object detection problem, aiming to train a high-quality detector with only a limited amount of fully-annotated data while fully exploiting cheap data with image-level labels. State-of-the-art methods typically propose an iterative approach, alternating between generating pseudo-labels and updating a detector. This paradigm requires careful manual hyper-parameter tuning for mining good pseudo-labels at each round and is quite time-consuming. To address these issues, we present EHSOD, an end-to-end hybrid-supervised object detection system which can be trained in one shot on both fully- and weakly-annotated data. Specifically, based on a two-stage detector, we propose two modules to fully utilize the information from both kinds of labels: 1) a CAM-RPN module that finds foreground proposals guided by a class activation heat-map; 2) a hybrid-supervised cascade module that further refines the bounding-box position and classification with the help of an auxiliary head compatible with image-level data. Extensive experiments demonstrate the effectiveness of the proposed method: it achieves comparable results on multiple object detection benchmarks with only 30% fully-annotated data, e.g., 37.5% mAP on COCO. We will release the code and the trained models.

It is known that Recurrent Neural Networks (RNNs) can remember, in their hidden layers, part of the semantic information expressed by a sequence (e.g., a sentence) that is being processed. Different types of recurrent units have been designed to enable RNNs to remember information over longer time spans. However, the memory abilities of different recurrent units are still theoretically and empirically unclear, thus limiting the development of more effective and explainable RNNs. To tackle the problem, in this paper, we identify and analyze the internal and external factors that affect the memory ability of RNNs, and propose a Semantic Euclidean Space to represent the semantics expressed by a sequence. Based on the Semantic Euclidean Space, a series of evaluation indicators are defined to measure the memory abilities of different recurrent units and analyze their limitations. These evaluation indicators also provide useful guidance for selecting suitable sequence lengths for different RNNs during training.

Building on a consistent topological statistical theory, the application of structural statistics requires a quantification of the proximity structure of model spaces. An important tool for studying these structures are (pseudo-)Riemannian metrics, which in the category of statistical models are induced by statistical divergences. The present article extends the notion of topological statistical models by a differential structure to statistical manifolds and introduces the differential-geometric foundations needed to study specific families of probability distributions. For this purpose, the article successively incorporates the structures of differential, Riemannian, and symplectic geometry within an underlying topological statistical model. The last section addresses a specific structural category, termed a dually flat statistical manifold, which can be used to study the properties of exponential families, which are of particular importance in machine learning and deep learning.
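As a concrete instance of a divergence-induced metric mentioned above, the second-order expansion of the Kullback-Leibler divergence around a parameter point yields the Fisher information metric (a standard example, not specific to this article's construction):

```latex
% The second-order expansion of a divergence D around \theta induces a
% Riemannian metric on the statistical manifold; for the KL divergence
% this is the Fisher information metric:
D(p_\theta \,\|\, p_{\theta + d\theta})
  = \tfrac{1}{2}\, g_{ij}(\theta)\, d\theta^i d\theta^j + O(\|d\theta\|^3),
\qquad
g_{ij}(\theta) = \mathbb{E}_{p_\theta}\!\left[
  \partial_i \log p_\theta(x)\, \partial_j \log p_\theta(x) \right].
```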

Explainability is a gateway between Artificial Intelligence and society as the current popular deep learning models are generally weak in explaining the reasoning process and prediction results. Local Interpretable Model-agnostic Explanation (LIME) is a recent technique that explains the predictions of any classifier faithfully by learning an interpretable model locally around the prediction. However, the sampling operation in the standard implementation of LIME is defective. Perturbed samples are generated from a uniform distribution, ignoring the complicated correlation between features. This paper proposes a novel Modified Perturbed Sampling operation for LIME (MPS-LIME), which is formalized as the clique set construction problem. In image classification, MPS-LIME converts the superpixel image into an undirected graph. Various experiments show that the MPS-LIME explanation of the black-box model achieves much better performance in terms of understandability, fidelity, and efficiency.

Most existing 3D CNNs for video representation learning are clip-based methods, and thus do not consider video-level temporal evolution of spatio-temporal features. In this paper, we propose Video-level 4D Convolutional Neural Networks, referred to as V4D, to model the evolution of long-range spatio-temporal representation with 4D convolutions, and at the same time, to preserve strong 3D spatio-temporal representation with residual connections. Specifically, we design a new 4D residual block able to capture inter-clip interactions, which could enhance the representation power of the original clip-level 3D CNNs. The 4D residual blocks can be easily integrated into the existing 3D CNNs to perform long-range modeling hierarchically. We further introduce the training and inference methods for the proposed V4D. Extensive experiments are conducted on three video recognition benchmarks, where V4D achieves excellent results, surpassing recent 3D CNNs by a large margin.

Hyper-heuristics are a methodology for the adaptive hybridization of meta-heuristic algorithms to derive a general algorithm for solving optimization problems. This work focuses on a selection hyper-heuristic, the Exponential Monte Carlo with Counter (EMCQ). Current implementations rely on memory-less selection, which can be counterproductive, as the selected search operator may not (historically) be the best-performing operator for the current search instance. Addressing this issue, we propose to integrate memory into EMCQ for combinatorial t-wise test suite generation using reinforcement learning based on the Q-learning mechanism, called Q-EMCQ. The limited application of combinatorial test generation to industrial programs can hinder the adoption of techniques such as Q-EMCQ. Thus, there is a need to evaluate this kind of approach on relevant industrial software, with the purpose of showing the degree of interaction required to cover the code as well as finding faults. We applied Q-EMCQ to 37 real-world industrial programs written in the Function Block Diagram (FBD) language, which is used for developing a train control management system at Bombardier Transportation Sweden AB. The results of this study show that Q-EMCQ is an efficient technique for test case generation. Additionally, unlike t-wise test suite generation, which deals with a minimization problem, we have also subjected Q-EMCQ to a maximization problem involving general module clustering to demonstrate the effectiveness of our approach.
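The memory-based operator selection can be sketched as a small Q-learning loop over search operators; the reward function, epsilon-greedy policy, and parameters below are illustrative assumptions, not the exact Q-EMCQ update:

```python
import random

def q_select(ops, episodes, reward_fn, eps=0.2, alpha=0.5):
    """Q-learning-style operator selection for a selection hyper-heuristic:
    keep a Q-value per search operator, pick greedily with epsilon
    exploration, and update from the observed reward (e.g., the
    coverage gain contributed by the operator)."""
    q = {op: 0.0 for op in ops}
    for _ in range(episodes):
        if random.random() < eps:
            op = random.choice(ops)       # explore
        else:
            op = max(q, key=q.get)        # exploit the remembered best
        r = reward_fn(op)
        q[op] += alpha * (r - q[op])      # incremental Q-update
    return q
```

Unlike memory-less selection, the Q-table retains which operator has historically performed best, so a productive operator keeps being chosen even after an occasional bad draw.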

We suggest a generalization of Karchmer-Wigderson communication games to the multiparty setting. Our generalization turns out to be tightly connected to circuits consisting of threshold gates. This allows us to obtain new explicit constructions of such circuits for several functions. In particular, we provide an explicit (polynomial-time computable) log-depth monotone formula for the Majority function, consisting only of 3-bit majority gates and variables. This resolves a conjecture of Cohen et al. (CRYPTO 2013).

Bigraph theory is a relatively young, yet formally rigorous, mathematical framework that, on the one hand, encompasses Robin Milner's previous work on process calculi and, on the other, provides a generic meta-model for complex systems such as multi-agent systems. A bigraph $F = \langle F^P, F^L\rangle$ is a superposition of two independent graph structures comprising a place graph $F^P$ (i.e., a forest) and a link graph $F^L$ (i.e., a hypergraph), sharing the same node set, to express locality and communication of processes independently from each other.

In this paper, we take some preparatory steps towards an algorithm for generating random bigraphs with preferential attachment feature w.r.t. $F^P$ and assortative (disassortative) linkage pattern w.r.t. $F^L$. We employ parameters allowing one to fine-tune the characteristics of the generated bigraph structures. To study the pattern formation properties of our algorithmic model, we analyze several metrics from graph theory based on artificially created bigraphs under different configurations.

Bigraphs provide a quite useful and expressive semantics for process calculi for mobile and global ubiquitous computing. So far, this subject has not received attention in the bigraph-related scientific literature. However, artificial models may be particularly useful for the simulation and evaluation of real-world applications in ubiquitous systems necessitating random structures.

We consider practical data characteristics underlying federated learning, where unbalanced and non-i.i.d. data from clients have a block-cyclic structure: each cycle contains several blocks, and each client's training data follow block-specific and non-i.i.d. distributions. Such a data structure would introduce client and block biases during collaborative training: the single global model would be biased towards the client- or block-specific data. To overcome the biases, we propose two new distributed optimization algorithms called multi-model parallel SGD (MM-PSGD) and multi-chain parallel SGD (MC-PSGD) with a convergence rate of $O(1/\sqrt{NT})$, achieving a linear speedup with respect to the total number of clients. In particular, MM-PSGD adopts the block-mixed training strategy, while MC-PSGD further adds the block-separate training strategy. Both algorithms create a specific predictor for each block by averaging and comparing the historical global models generated in this block from different cycles. We extensively evaluate our algorithms over the CIFAR-10 dataset. Evaluation results demonstrate that our algorithms significantly outperform the conventional federated averaging algorithm in terms of test accuracy, and also remain robust to variation in critical parameters.
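
The core idea of a block-specific predictor obtained by averaging historical global models can be illustrated with a deliberately tiny, one-parameter sketch. The client objective, learning rate, and update schedule here are invented for illustration and do not reproduce MM-PSGD/MC-PSGD exactly:

```python
# Toy block-cyclic federated training with a single scalar parameter.
def local_sgd(model, data, lr=0.1, steps=5):
    # one-parameter least squares: minimize (model - x)^2 over the client's data
    for _ in range(steps):
        for x in data:
            model -= lr * 2 * (model - x)
    return model

def train(block_data, cycles=3):
    global_model = 0.0
    history = {b: [] for b in block_data}            # global models seen per block
    for _ in range(cycles):
        for b, clients in block_data.items():
            locals_ = [local_sgd(global_model, d) for d in clients]
            global_model = sum(locals_) / len(locals_)   # FedAvg-style step
            history[b].append(global_model)
    # block-specific predictor: average of that block's historical global models
    return {b: sum(h) / len(h) for b, h in history.items()}
```

With block-specific data distributions (e.g. "day" clients near $+1$, "night" clients near $-1$), the per-block predictors separate, whereas a single global model would oscillate between the two.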

In this article I propose a new model for Chinese word segmentation (CWS), which may also have the potential to be applied to other domains in the future. It takes a new view of CWS compared to previous work, treating it as a clustering problem instead of a labeling problem. In this model, LSTM and self-attention structures are used to collect context as well as sentence-level features in every layer, and after several layers a clustering model is applied to split characters into groups, which form the final segmentation result. I call this model CLNN. The algorithm reaches an F score of 98 percent (without OOV words) and 85 to 95 percent (with OOV words) on the training data sets. Error analysis shows that OOV words greatly reduce performance, which calls for deeper research in the future.

The tensor train approximation of electronic wave functions lies at the core of the QC-DMRG (Quantum Chemistry Density Matrix Renormalization Group) method, a recent state-of-the-art method for numerically solving the $N$-electron Schr\"odinger equation. It is well known that the accuracy of TT approximations is governed by the tail of the associated singular values, which in turn strongly depends on the ordering of the one-body basis.

Here we find that the singular values $s_1\ge s_2\ge ... \ge s_d$ of tensors representing ground states of noninteracting Hamiltonians possess a surprising inversion symmetry, $s_1s_d=s_2s_{d-1}=s_3s_{d-2}=...$, thus reducing the tail behaviour to a single hidden invariant, which moreover depends explicitly on the ordering of the basis. For correlated wavefunctions, we find that the tail is upper bounded by a suitable superposition of the invariants. Optimizing the invariants or their superposition thus provides a new ordering scheme for QC-DMRG. Numerical tests on simple examples, i.e. linear combinations of a few Slater determinants, show that the new scheme reduces the tail of the singular values by several orders of magnitude over existing methods, including the widely used Fiedler order.

Given a dataset $V$ of points from some metric space, the popular $k$-center problem requires identifying a subset of $k$ points (centers) in $V$ minimizing the maximum distance of any point of $V$ from its closest center. The \emph{robust} formulation of the problem features a further parameter $z$ and allows up to $z$ points of $V$ (outliers) to be disregarded when computing the maximum distance from the centers. In this paper, we focus on two important constrained variants of the robust $k$-center problem, namely, the Robust Matroid Center (RMC) problem, where the set of returned centers is constrained to be an independent set of a matroid of rank $k$ built on $V$, and the Robust Knapsack Center (RKC) problem, where each element $i\in V$ is given a positive weight $w_i<1$ and the aggregate weight of the returned centers must be at most 1. We devise coreset-based strategies for the two problems which yield efficient sequential, MapReduce, and Streaming algorithms. More specifically, for any fixed $\epsilon>0$, the algorithms return solutions featuring a $(3+\epsilon)$-approximation ratio, which is a mere additive term $\epsilon$ away from the 3-approximations achievable by the best known polynomial-time sequential algorithms for the two problems. Moreover, the algorithms obliviously adapt to the intrinsic complexity of the dataset, captured by its doubling dimension $D$. For wide ranges of the parameters $k,z,\epsilon, D$, we obtain a sequential algorithm with running time linear in $|V|$, and MapReduce/Streaming algorithms with few rounds/passes and substantially sublinear local/working memory.
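
For orientation, the classic greedy farthest-first traversal (Gonzalez' 2-approximation) for the plain, unconstrained $k$-center problem is shown below; it is the standard building block for such algorithms, not the paper's coreset-based method:

```python
# Farthest-first traversal: greedy 2-approximation for (plain) k-center.
def k_center_greedy(points, k, dist):
    centers = [points[0]]
    while len(centers) < k:
        # pick the point farthest from its closest current center
        far = max(points, key=lambda p: min(dist(p, c) for c in centers))
        centers.append(far)
    radius = max(min(dist(p, c) for c in centers) for p in points)
    return centers, radius
```

On well-clustered data the returned radius is within a factor 2 of the optimum; the robust/constrained variants studied in the paper require substantially more machinery.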

Registration of multi-view point sets is a prerequisite for 3D model reconstruction. To solve this problem, most previous approaches either only partially explore the available information or blindly utilize unnecessary information to align each point set, which may lead to undesired results or extra computational complexity. To this end, this paper considers the multi-view registration problem as a maximum likelihood estimation problem and proposes a novel multi-view registration approach from the perspective of Expectation-Maximization (EM). The basic idea of our approach is that different data points are generated by the same number of Gaussian mixture models (GMMs). For each data point in one well-aligned point set, its nearest neighbors can be searched from the other well-aligned point sets to explore more available information. Then, we can suppose this data point is generated by a special GMM, which is composed of each of its nearest neighbors adhered with one Gaussian distribution. Based on this assumption, it is reasonable to define the likelihood function, which contains all rigid transformations required to be estimated for multi-view registration. Subsequently, the EM algorithm is utilized to maximize the likelihood function so as to estimate all rigid transformations. Finally, the proposed approach is tested on several benchmark data sets and compared with some state-of-the-art algorithms. Experimental results illustrate its superior accuracy and efficiency for the registration of multi-view point sets.
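
The rigid-transformation estimate at the heart of such methods can be illustrated by the classical closed-form least-squares alignment of two 2-D point sets with known correspondences. This is a simplified, unweighted stand-in for an EM M-step, which would instead use responsibility-weighted correspondences:

```python
import math

# Closed-form 2-D rigid alignment (Procrustes) of corresponding point sets.
def rigid_align_2d(src, dst):
    n = len(src)
    csx = sum(p[0] for p in src) / n; csy = sum(p[1] for p in src) / n
    cdx = sum(p[0] for p in dst) / n; cdy = sum(p[1] for p in dst) / n
    # accumulate cross-covariance terms of the centered sets
    a = b = 0.0
    for (sx, sy), (dx_, dy_) in zip(src, dst):
        sx -= csx; sy -= csy; dx_ -= cdx; dy_ -= cdy
        a += sx * dx_ + sy * dy_        # "dot" part
        b += sx * dy_ - sy * dx_        # "cross" part
    theta = math.atan2(b, a)            # optimal least-squares rotation angle
    c, s = math.cos(theta), math.sin(theta)
    # translation maps the rotated source centroid onto the target centroid
    tx = cdx - (c * csx - s * csy)
    ty = cdy - (s * csx + c * csy)
    return theta, (tx, ty)
```

In an EM registration loop, this step would be repeated per view after re-estimating soft correspondences in the E-step.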

We study the existence of approximate pure Nash equilibria ($\alpha$-PNE) in weighted atomic congestion games with polynomial cost functions of maximum degree $d$. Previously it was known that $d$-approximate equilibria always exist, while nonexistence was established only for small constants, namely for $1.153$-PNE. We improve significantly upon this gap, proving that such games in general do not have $\tilde{\Theta}(\sqrt{d})$-approximate PNE, which provides the first super-constant lower bound.

Furthermore, we provide a black-box gap-introducing method of combining such nonexistence results with a specific circuit gadget, in order to derive NP-completeness of the decision version of the problem. In particular, deploying this technique we are able to show that deciding whether a weighted congestion game has an $\tilde{O}(\sqrt{d})$-PNE is NP-complete. Previous hardness results were known only for the special case of exact equilibria and arbitrary cost functions.

The circuit gadget is of independent interest and it allows us to also prove hardness for a variety of problems related to the complexity of PNE in congestion games. For example, we demonstrate that the question of existence of $\alpha$-PNE in which a certain set of players plays a specific strategy profile is NP-hard for any $\alpha < 3^{d/2}$, even for unweighted congestion games.

Finally, we study the existence of approximate equilibria in weighted congestion games with general (nondecreasing) costs, as a function of the number of players $n$. We show that $n$-PNE always exist, matched by an almost tight nonexistence bound of $\tilde\Theta(n)$ which we can again transform into an NP-completeness proof for the decision problem.
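
Checking whether a given strategy profile is an $\alpha$-approximate PNE is straightforward by enumerating unilateral deviations; below is a minimal sketch for finite congestion games (the two-link example in the test is invented for illustration):

```python
# alpha-approximate pure Nash equilibrium check for a finite congestion game.
def is_alpha_pne(strategies, profile, costs, alpha):
    # strategies[i]: list of strategies (sets of resources) of player i
    # costs[r]: cost function of resource r, as a function of its load
    def player_cost(i, prof):
        loads = {}
        for s in prof:
            for r in s:
                loads[r] = loads.get(r, 0) + 1
        return sum(costs[r](loads[r]) for r in prof[i])

    for i, options in enumerate(strategies):
        current = player_cost(i, profile)
        for s in options:
            dev = list(profile); dev[i] = s
            # a profitable deviation beyond factor alpha refutes the alpha-PNE
            if current > alpha * player_cost(i, dev):
                return False
    return True
```

For two players on two parallel links with cost $c(x)=x$, the split profile is an exact PNE, while the profile where both share one link is only a 2-approximate PNE.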

Gaussian Markov random fields (GMRFs) are probabilistic graphical models widely used in spatial statistics and related fields to model dependencies over spatial structures. We establish a formal connection between GMRFs and convolutional neural networks (CNNs). Common GMRFs are special cases of a generative model where the inverse mapping from data to latent variables is given by a 1-layer linear CNN. This connection allows us to generalize GMRFs to multi-layer CNN architectures, effectively increasing the order of the corresponding GMRF in a way which has favorable computational scaling. We describe how well-established tools, such as autodiff and variational inference, can be used for simple and efficient inference and learning of the deep GMRF. We demonstrate the flexibility of the proposed model and show that it outperforms the state-of-the-art on a dataset of satellite temperatures, in terms of prediction and predictive uncertainty.

We propose an algorithm for calculating the cardiothoracic ratio (CTR) from chest X-ray films. Our approach applies a deep learning model based on U-Net with VGG16 encoder to extract lung and heart masks from chest X-ray images and calculate CTR from the extents of obtained masks. Human radiologists evaluated our CTR measurements, and $76.5\%$ were accepted to be included in medical reports without any need for adjustment. This result translates to a large amount of time and labor saved for radiologists using our automated tools.
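
The geometric step after segmentation reduces to comparing horizontal extents of the two masks. A minimal sketch of one common operationalization of CTR, assuming the thoracic width is taken as the outer horizontal extent of the lung mask:

```python
# CTR from binary masks: widest horizontal extent of the heart mask over
# that of the lung mask (rows are lists of 0/1 pixels).
def width(mask):
    cols = [x for row in mask for x, v in enumerate(row) if v]
    return max(cols) - min(cols) + 1 if cols else 0

def cardiothoracic_ratio(heart_mask, lung_mask):
    return width(heart_mask) / width(lung_mask)
```

In practice the masks would come from the U-Net segmentation, and a CTR above roughly 0.5 is the conventional flag for cardiomegaly.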

In this paper, a neural network is derived from first principles, assuming only that each layer begins with a linear dimension-reducing transformation. The approach appeals to the principle of Maximum Entropy (MaxEnt) to find the posterior distribution of the input data of each layer, conditioned on the layer output variables. This posterior has a well-defined mean, the conditional mean estimator, that is calculated using a type of neural network with theoretically derived activation functions similar to sigmoid, softplus, and ReLU. This implicitly provides a theoretical justification for their use. A theorem that finds the conditional distribution and conditional mean estimator under the MaxEnt prior is proposed, unifying results for special cases. Combining layers results in an auto-encoder with a conventional feed-forward analysis network and a type of linear Bayesian belief network in the reconstruction path.

An exploratory, descriptive analysis is presented of the national orientation of scientific, scholarly journals as reflected in the affiliations of publishing or citing authors. It calculates an Index of National Orientation (INO) for journals covered in Scopus, and analyses the distribution of INO values across disciplines and countries, as well as the correlation between INO values and journal impact factors. The study did not find solid evidence that journal impact factors are good measures of journal internationality in terms of the geographical distribution of publishing or citing authors, as the relationship between a journal's national orientation and its citation impact is found to be inverse U-shaped. In addition, journals publishing in English are not necessarily internationally oriented in terms of the affiliations of publishing or citing authors; in the social sciences and humanities, even the USA has its nationally oriented literatures. The paper examines the extent to which nationally oriented journals that entered Scopus in earlier years have become more international in recent years. It is found that about 40 per cent of such journals in the study set do reveal traces of internationalization, and that the use of English as publication language and an Open Access (OA) status are important determinants.
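
One simple operationalization of national orientation, assumed here as the share of a journal's papers attributable to its single most represented country, can be computed directly from affiliation data (the study's exact INO definition may differ in detail):

```python
# Share of a journal's papers from its most represented country, as a crude
# national-orientation index in [1/n, 1].
def ino(affiliation_countries):
    counts = {}
    for c in affiliation_countries:
        counts[c] = counts.get(c, 0) + 1
    return max(counts.values()) / len(affiliation_countries)
```

A value near 1 indicates a strongly nationally oriented journal; a value near $1/n$ for $n$ contributing countries indicates a geographically dispersed one.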

In this work, we propose Knowledge Integration Networks (referred as KINet) for video action recognition. KINet is capable of aggregating meaningful context features which are of great importance to identifying an action, such as human information and scene context. We design a three-branch architecture consisting of a main branch for action recognition, and two auxiliary branches for human parsing and scene recognition which allow the model to encode the knowledge of human and scene for action recognition. We explore two pre-trained models as teacher networks to distill the knowledge of human and scene for training the auxiliary tasks of KINet. Furthermore, we propose a two-level knowledge encoding mechanism which contains a Cross Branch Integration (CBI) module for encoding the auxiliary knowledge into medium-level convolutional features, and an Action Knowledge Graph (AKG) for effectively fusing high-level context information. This results in an end-to-end trainable framework where the three tasks can be trained collaboratively, allowing the model to compute strong context knowledge efficiently. The proposed KINet achieves the state-of-the-art performance on a large-scale action recognition benchmark Kinetics-400, with a top-1 accuracy of 77.8%. We further demonstrate that our KINet has strong capability by transferring the Kinetics-trained model to UCF-101, where it obtains 97.8% top-1 accuracy.

It is often useful to tap secondary information from a running R script. Obvious use cases include logging and profiling of time or memory consumption. Perhaps less obvious cases include tracking changes in R objects or collecting the output of unit tests (assertions). In this paper we demonstrate an approach that abstracts the collection and processing of such secondary information from the code in the running script. The approach is implemented in pure R, and allows users to control the secondary information stream without global side effects and without altering existing code. Although some elements of the approach discussed here have been applied in existing packages, the combination of elements proposed here appears thus far to have been overlooked.

The article clarifies the concept of a "virtual information educational environment" (VIEE) and examines researchers' views on its meaning as exposed in the scientific literature. Based on an analysis of the authors' experience of blended learning by means of Google Classroom, the article determines the didactic potential of the virtual information educational environment for training geography students. It also specifies its features (immersion, interactivity, dynamism, sense of presence, continuity, and causality). The authors highlight the advantages of implementing a virtual information educational environment, such as: increased efficiency of the educational process by intensifying cognition and interpersonal interactive communication; continuous access to multimedia content both in Google Classroom and beyond; saving student time owing to the absence of the necessity to work through the training material "manually"; availability of virtual pages of the virtual class; individualization of the educational process; formation of an informational culture among geography students; and more productive learning of the educational material thanks to IT educational facilities. Among the disadvantages, the article mentions a low level of computerization, the insignificant quantity and low quality of software products, underestimation of the role of the VIEE in the professional training of geography students, and a lack of economic stimuli.

This paper contributes to the design of a fractional order (FO) internal model controller (IMC) for a first order plus time delay (FOPTD) process model to satisfy a given set of desired robustness specifications in terms of gain margin (Am) and phase margin (Pm). The highlight of the design is the choice of an FO filter in the IMC structure, which has two parameters (lambda and beta) to tune, as compared to only one tuning parameter (lambda) for the traditionally used integer order (IO) filter. These parameters are evaluated for the controller so that Am and Pm can be chosen independently. A new methodology is proposed to find a complete solution for the controller parameters; the methodology also gives the system gain crossover frequency (wg) and phase crossover frequency (wp). Moreover, the solution is found without any approximation of the delay term appearing in the controller.

Dimensionality reduction methods are an essential tool for multidimensional data analysis, and many interesting processes can be studied as time-dependent multivariate datasets. There are, however, few studies and proposals that leverage the concise expressive power of projections in the context of dynamic/temporal data. In this paper, we aim to provide an approach to assess projection techniques for dynamic data and to understand the relationship between visual quality and stability. Our approach relies on an experimental setup that consists of existing techniques designed for time-dependent data and new variations of static methods. To support the evaluation of these techniques, we provide a collection of datasets that has a wide variety of traits that encode dynamic patterns, as well as a set of spatial and temporal stability metrics that assess the quality of the layouts. We present an evaluation of 11 methods, 10 datasets, and 12 quality metrics, and elect the best-suited methods for projecting time-dependent multivariate data, exploring the design choices and characteristics of each method. All our results are documented and made available in a public repository to allow reproducibility of results.
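
One simple temporal stability metric for a dynamic projection is the average displacement of each point between consecutive time frames (lower means more stable). This is an illustrative example of the kind of metric such an evaluation uses, not necessarily one of the paper's twelve:

```python
import math

# Mean per-point displacement between consecutive frames of a 2-D layout.
def temporal_stability(layouts):
    # layouts: list of frames; each frame is a list of (x, y) per data point
    total, count = 0.0, 0
    for prev, cur in zip(layouts, layouts[1:]):
        for (x0, y0), (x1, y1) in zip(prev, cur):
            total += math.hypot(x1 - x0, y1 - y0)
            count += 1
    return total / count if count else 0.0
```

A perfectly static sequence scores 0; metrics like this are traded off against per-frame spatial quality (e.g. neighborhood preservation).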

Motion blur is a known issue in photography, as it limits the exposure time while capturing moving objects. Extensive research has been carried out to compensate for it. In this work, a computational imaging approach for motion deblurring is proposed and demonstrated. Using dynamic phase-coding in the lens aperture during the image acquisition, the trajectory of the motion is encoded in an intermediate optical image. This encoding embeds both the motion direction and extent by coloring the spatial blur of each object. The color cues serve as prior information for a blind deblurring process, implemented using a convolutional neural network (CNN) trained to utilize such coding for image restoration. We demonstrate the advantage of the proposed approach over blind deblurring with no coding and other solutions that use coded acquisition, both in simulation and real-world experiments.

Fully supervised deep-learning-based denoisers are currently the best-performing image denoising solutions. However, they require clean reference images. When the target noise is complex, e.g. composed of an unknown mixture of primary noises with unknown intensities, fully supervised solutions are limited by the difficulty of building a training set suited to the problem. This paper proposes a gradual denoising strategy that iteratively detects the dominating noise in an image and removes it using a tailored denoiser. The method is shown to keep up with state-of-the-art blind denoisers on mixture noises. Moreover, noise analysis is demonstrated to guide denoisers efficiently not only on noise type, but also on noise intensity. The method provides an insight into the nature of the encountered noise, and it makes it possible to extend an existing denoiser to a new noise type. This feature makes the method adaptive to varied denoising cases.

Wirelessly interconnected sensors, actuators, and controllers promise greater flexibility, lower installation and maintenance costs, and higher robustness in harsh conditions than wired solutions. However, to facilitate the adoption of wireless communication in cyber-physical systems (CPS), the functional and non-functional properties must be similar to those known from wired architectures. We thus present Time-Triggered Wireless (TTW), a wireless architecture for multi-mode CPS that offers reliable communication with guarantees on end-to-end delays and jitter among distributed applications executing on low-cost, low-power embedded devices. We achieve this by exploiting the high reliability and deterministic behavior of a synchronous-transmission-based communication stack we design, and by coupling the timings of distributed task executions and message exchanges across the wireless network by solving a novel co-scheduling problem. While some of the concepts in TTW have existed for some time and TTW has already been successfully applied for feedback control and coordination of multiple mechanical systems with closed-loop stability guarantees, this paper presents the key algorithmic, scheduling, and networking mechanisms behind TTW, along with their experimental evaluation, which have not been known so far. TTW is open source and ready to use.

Land-use regression (LUR) models are important for the assessment of air pollution concentrations in areas without measurement stations. While many such models exist, they often use manually constructed features based on restricted, locally available data. Thus, they are typically hard to reproduce and challenging to adapt to areas beyond those they have been developed for. In this paper, we advocate a paradigm shift for LUR models: We propose the Data-driven, Open, Global (DOG) paradigm that entails models based on purely data-driven approaches using only openly and globally available data. Progress within this paradigm will alleviate the need for experts to adapt models to the local characteristics of the available data sources and thus facilitate the generalizability of air pollution models to new areas on a global scale. In order to illustrate the feasibility of the DOG paradigm for LUR, we introduce a deep learning model called MapLUR. It is based on a convolutional neural network architecture and is trained exclusively on globally and openly available map data without requiring manual feature engineering. We compare our model to state-of-the-art baselines like linear regression, random forests and multi-layer perceptrons using a large data set of modeled $\text{NO}_2$ concentrations in Central London. Our results show that MapLUR significantly outperforms these approaches even though they are provided with manually tailored features. Furthermore, we illustrate that the automatic feature extraction inherent to models based on the DOG paradigm can learn features that are readily interpretable and closely resemble those commonly used in traditional LUR approaches.

The paper studies the main aspects of the realization of 2 x 2 ternary reversible circuits based on cycles, considering the results of the realization of all 362,880 2 x 2 ternary reversible functions. It has been shown that in most cases, realizations obtained with the MMD+ algorithm have a lower complexity (in terms of cost) than realizations based on cycles. The paper shows under which conditions realizations based on transpositions may have a higher or a lower cost than realizations using larger cycles. Finally, it is shown that there are a few special cases where realizations based on transpositions have the same or a possibly lower cost than the MMD+-based realizations. Aspects of scalability are considered in terms of 2 x 2-based n x n reversible circuits.
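
The two realization styles compared in the paper both start from a permutation view of the reversible function; the decomposition of a permutation into disjoint cycles and then into transpositions can be sketched abstractly (the ternary gate cost model itself is not reproduced here):

```python
# Disjoint-cycle decomposition of a permutation given as a list: i -> perm[i].
def cycles(perm):
    seen, out = set(), []
    for start in range(len(perm)):
        if start in seen:
            continue
        cyc, x = [], start
        while x not in seen:
            seen.add(x); cyc.append(x); x = perm[x]
        if len(cyc) > 1:            # fixed points need no gate
            out.append(cyc)
    return out

def transpositions(perm):
    # each cycle (a b ... z) factors as (a z)...(a b), rightmost applied first
    return [(c[0], c[i]) for c in cycles(perm) for i in range(len(c) - 1, 0, -1)]
```

Comparing the cost of realizing each transposition against realizing the larger cycle directly is exactly the trade-off the paper quantifies for ternary gates.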

Score matching provides an effective approach to learning flexible unnormalized models, but its scalability is limited by the need to evaluate a second-order derivative. In this paper, we present a scalable approximation to a general family of learning objectives including score matching, by observing a new connection between these objectives and Wasserstein gradient flows. We present applications with promise in learning neural density estimators on manifolds, and training implicit variational and Wasserstein auto-encoders with a manifold-valued prior.

We consider a diffuse interface approach for solving an elliptic PDE on a given closed hypersurface. The method is based on a (bulk) finite element scheme employing numerical quadrature for the phase field function and hence is very easy to implement compared to other approaches. We estimate the error in natural norms in terms of the spatial grid size, the interface width and the order of the underlying quadrature rule. Numerical test calculations are presented which confirm the form of the error bounds.

Recent years have seen an increasing integration of distributed renewable energy resources into existing electric power grids. Due to the uncertain nature of renewable energy resources, network operators are faced with new challenges in balancing load and generation. In order to meet the new requirements, intelligent distributed energy resource plants can be used. However, the calculation of an adequate schedule for the unit commitment of such distributed energy resources is a complex optimization problem which is typically too complex for standard optimization algorithms if large numbers of distributed energy resources are considered. For solving such complex optimization tasks, population-based metaheuristics -- as, e.g., evolutionary algorithms -- represent powerful alternatives. Admittedly, evolutionary algorithms require considerable computational power for solving such problems in a timely manner. One promising solution for this performance problem is the parallelization of the usually time-consuming evaluation of alternative solutions. In the present paper, a new generic and highly scalable parallel method for unit commitment of distributed energy resources using metaheuristic algorithms is presented. It is based on microservices, container virtualization and the publish/subscribe messaging paradigm for scheduling distributed energy resources. Scalability and applicability of the proposed solution are evaluated by performing parallelized optimizations in a big data environment for three distinct distributed energy resource scheduling scenarios. Thereby, unlike all other optimization methods in the literature, the new method provides cluster or cloud parallelizability and is able to deal with a comparably large number of distributed energy resources. The application of the new proposed method results in very good performance for scaling up optimization speed.
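
The parallelization idea, evaluating a population of candidate schedules concurrently, can be sketched in a few lines. A local thread pool stands in here for the paper's microservice/publish-subscribe infrastructure, and the toy fitness function is invented for illustration:

```python
from concurrent.futures import ThreadPoolExecutor

# Evaluate all candidate schedules concurrently; in the paper's architecture
# each evaluation would instead be dispatched to a containerized worker.
def evaluate_population(population, fitness, workers=4):
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fitness, population))
```

Because a metaheuristic's evaluation step dominates runtime and individual evaluations are independent, this step scales almost linearly with the number of workers until coordination overhead dominates.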

Reliability is an important requirement for both communication and storage systems. Due to continuous technology scaling, the probability of errors in multiple adjacent bits increases. Data may be corrupted by soft errors. Error correction codes are used to detect and correct these errors. In this paper, designs of single error correction-double error detection (SEC-DED) and single error correction-double error detection-double adjacent error correction (SEC-DED-DAEC) codes for different data lengths are proposed. The proposed SEC-DED and SEC-DED-DAEC codes require lower delay and power than existing coding schemes. The area complexity, in terms of logic gates, of the proposed and existing codes is presented. ASIC-based synthesis results show a notable reduction compared to existing SEC-DED codes. All the codec architectures are synthesized on an ASIC platform. The performance of different SEC-DED-DAEC codes is tabulated in terms of area, power, and delay.
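
The SEC-DED principle such codes build on is the classic extended Hamming construction: a Hamming code corrects single errors, and one extra overall parity bit distinguishes single from double errors. A compact (8,4) codec sketch (the paper's DAEC extension and hardware cost model are not reproduced here):

```python
# Extended Hamming (8,4) SEC-DED codec.
def encode(d):                       # d: 4 data bits, e.g. [1, 0, 1, 1]
    c = [0] * 8                      # c[1..7]: Hamming codeword; c[0]: overall parity
    c[3], c[5], c[6], c[7] = d
    c[1] = c[3] ^ c[5] ^ c[7]
    c[2] = c[3] ^ c[6] ^ c[7]
    c[4] = c[5] ^ c[6] ^ c[7]
    c[0] = c[1] ^ c[2] ^ c[3] ^ c[4] ^ c[5] ^ c[6] ^ c[7]
    return c

def decode(c):
    # syndrome addresses the erroneous bit position (0 means no Hamming error)
    syndrome = (c[1] ^ c[3] ^ c[5] ^ c[7]) \
             | (c[2] ^ c[3] ^ c[6] ^ c[7]) << 1 \
             | (c[4] ^ c[5] ^ c[6] ^ c[7]) << 2
    parity_ok = sum(c) % 2 == 0
    if syndrome and parity_ok:       # nonzero syndrome but even parity: two errors
        return None, "double error detected"
    c = list(c)
    if syndrome:                     # single error: flip the addressed bit
        c[syndrome] ^= 1
    elif not parity_ok:              # error in the overall parity bit itself
        c[0] ^= 1
    return [c[3], c[5], c[6], c[7]], "ok"
```

The parity equations above translate directly into the XOR trees whose gate count, delay, and power the paper compares across codes.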

Fault-tolerant distributed systems offer high reliability because, even if faults occur in their components, they do not exhibit erroneous behavior. Depending on the fault model adopted, hardware and software errors that do not result in a process crashing are usually not tolerated. To tolerate these rather common failures, the usual solution is to adopt a stronger fault model, such as the arbitrary or Byzantine fault model. Algorithms created for this fault model, however, are considerably more complex and require more system resources than those developed for less strict fault models. One approach to reach a middle ground is the non-malicious arbitrary fault model. This model assumes it is possible to detect and filter faults with a given probability, provided these faults are not created with malicious intent, allowing the isolation and mapping of these faults to benign faults. In this paper we describe how we extended an implementation of active replication in the non-malicious fault model with a basic type of distributed validation, where a deviation from the expected algorithm behavior makes a process crash. We experimentally evaluate this implementation using a fault injection framework, showing that it is feasible to extend the concept of non-malicious failures beyond hardware failures.

Knowledge-grounded dialogue is the task of generating an informative response based on both discourse context and external knowledge. Focusing on better modeling of knowledge selection in multi-turn knowledge-grounded dialogue, we propose a sequential latent variable model as the first approach to this matter. The model, named sequential knowledge transformer (SKT), can keep track of the prior and posterior distributions over knowledge; as a result, it can not only reduce the ambiguity caused by the diversity in knowledge selection of conversation but also better leverage the response information for a proper choice of knowledge. Our experimental results show that the proposed model improves knowledge selection accuracy and subsequently the performance of utterance generation. We achieve new state-of-the-art performance on Wizard of Wikipedia (Dinan et al., 2019), one of the largest and most challenging benchmarks. We further validate the effectiveness of our model over existing conversation methods on another knowledge-grounded dialogue dataset, Holl-E (Moghe et al., 2018).

It is now cost-effective to outsource large datasets and perform queries over the cloud. However, in this scenario there are serious security and privacy issues, as sensitive information contained in the dataset can be leaked. The most effective way to address this is to encrypt the data before outsourcing. Nevertheless, it remains a grand challenge to process queries over ciphertext efficiently. In this work, we focus on solving one representative query task, namely the \textit{dynamic skyline query}, in a secure manner over the cloud. It is difficult to perform on encrypted data because its dynamic domination criteria require both subtraction and comparison, which cannot be directly and efficiently supported by a single encryption scheme. To this end, we present a novel framework called \textsc{scale}. It works by transforming traditional dynamic skyline domination into pure comparisons. The whole process can be completed in a single round of interaction between the user and the cloud. We theoretically prove that the outsourced database, query requests, and returned results are all kept secret under our model. Moreover, we also present an efficient strategy for dynamic insertion and deletion of stored records. An empirical study over a series of datasets demonstrates that our framework improves the efficiency of query processing by nearly \textbf{three orders of magnitude} compared to the state-of-the-art.
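
The plaintext functionality being protected is easy to state: with respect to a query point $q$, a point $p$ dynamically dominates $p'$ if $p$ is at least as close to $q$ in every dimension and strictly closer in at least one. A plain (unencrypted) sketch of that semantics:

```python
# Plaintext dynamic skyline: the functionality a secure framework would
# evaluate over ciphertext.
def dominates(p, q, other):
    dp = [abs(a - b) for a, b in zip(p, q)]
    do = [abs(a - b) for a, b in zip(other, q)]
    return all(x <= y for x, y in zip(dp, do)) and \
           any(x < y for x, y in zip(dp, do))

def dynamic_skyline(points, q):
    # keep every point not dynamically dominated by another point
    return [p for p in points if not any(dominates(o, q, p) for o in points)]
```

The `abs(a - b)` step is exactly the subtraction that, combined with comparison, makes a single encryption scheme insufficient and motivates reformulating domination as pure comparisons.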

An Intrusion Detection System (IDS) aims to alert users of incoming attacks by deploying a detector that monitors network traffic continuously. To increase detection capability, a set of independent IDS detectors typically works collaboratively to build a holistic representation of the network, an arrangement referred to as a Collaborative Intrusion Detection System (CIDS). However, developing an effective CIDS, particularly for the IoT ecosystem, raises several challenges. Recent trends and advances in blockchain technology, which provides assurance in distributed trust and secure immutable storage, may contribute towards the design of effective CIDSs. In this poster abstract, we present our ongoing work on a decentralized CIDS for IoT based on blockchain technology. We propose an architecture that provides accountable trust establishment, which promotes incentives and penalties, and scalable intrusion information storage by exchanging Bloom filters. We are currently implementing a proof-of-concept of our modular architecture in a local test-bed and evaluating its effectiveness in detecting common attacks in IoT networks, along with the associated overhead.

In the loss function of Variational Autoencoders there is a well-known tension between two components: the reconstruction loss, which improves the quality of the resulting images, and the Kullback-Leibler divergence, which acts as a regularizer of the latent space. Correctly balancing these two components is a delicate issue that easily results in poor generative behaviour. In a recent work, Dai and Wipf obtained a significant improvement by allowing the network to learn the balancing factor during training, according to a suitable loss function. In this article, we show that learning can be replaced by a simple deterministic computation, which helps to understand the underlying mechanism and results in faster and more accurate behaviour. On typical datasets such as CIFAR and CelebA, our technique noticeably outperforms all previous VAE architectures.

Distributed algorithms that operate in the fail-recovery model rely on the state stored in stable memory to guarantee the irreversibility of operations even in the presence of failures. The performance of these algorithms leans heavily on the performance of stable memory. Current storage technologies have a defined performance profile: data is accessed in blocks of hundreds or thousands of bytes, random access to these blocks is expensive, and sequential access is somewhat better. File system implementations hide some of the performance limitations of the underlying storage devices using buffers and caches. However, fail-recovery distributed algorithms bypass some of these techniques and perform synchronous writes in order to tolerate a failure during the write itself. Assuming the distributed system designer is able to buffer the algorithm's writes, we ask how buffer size and latency complement each other. In this paper we start to answer this question by characterizing the performance (throughput and latency) of typical stable memory devices using a representative set of current file systems.

Graph neural networks (GNNs) have received much attention recently because of their excellent performance on graph-based tasks. However, existing research on GNNs focuses on designing more effective models without paying much attention to the quality of the input data itself. In this paper, we propose self-enhanced GNN, which improves the quality of the input data using the outputs of existing GNN models for better performance on semi-supervised node classification. As graph data consist of both topology and node labels, we improve input data quality from both perspectives. For topology, we observe that higher classification accuracy can be achieved when the ratio of inter-class edges (connecting nodes from different classes) is low, and propose topology update to remove inter-class edges and add intra-class edges. For node labels, we propose training node augmentation, which enlarges the training set using the labels predicted by existing GNN models. As self-enhanced GNN improves the quality of the input graph data, it is general and can easily be combined with existing GNN models. Experimental results on three well-known GNN models and seven popular datasets show that self-enhanced GNN consistently improves the performance of all three models, reducing classification error by 16.2% on average and by as much as 35.1%.
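The topology-update step can be sketched as follows; this is an illustrative Python fragment under our own simplifying assumptions (the helper name, the confidence threshold, and the dictionary-based encoding are not from the paper):

```python
from itertools import combinations

def topology_update(edges, pred, conf, add_conf=0.9):
    """Illustrative topology update: drop edges whose endpoints receive
    different predicted classes (likely inter-class edges), and add edges
    between high-confidence node pairs sharing a predicted class (likely
    intra-class edges). `pred` maps node -> predicted class, `conf` maps
    node -> prediction confidence."""
    kept = {(u, v) for (u, v) in edges if pred[u] == pred[v]}
    added = {(u, v) for u, v in combinations(sorted(pred), 2)
             if pred[u] == pred[v]
             and conf[u] >= add_conf and conf[v] >= add_conf
             and (u, v) not in kept and (v, u) not in kept}
    return kept | added
```

The cleaned edge set is then fed back into any off-the-shelf GNN, which is what makes the approach model-agnostic.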

We analyze the effect of quantizing weights and activations of neural networks on their loss and derive a simple regularization scheme that improves robustness against post-training quantization. By training quantization-ready networks, our approach enables storing a single set of weights that can be quantized on demand to different bit-widths as the energy and memory requirements of the application change. Unlike quantization-aware training using the straight-through estimator, which targets only a specific bit-width and requires access to the training data and pipeline, our regularization-based method paves the way for "on the fly" post-training quantization to various bit-widths. We show that by modeling quantization as an $\ell_\infty$-bounded perturbation, the first-order term in the loss expansion can be regularized using the $\ell_1$-norm of the gradients. We experimentally validate the effectiveness of our regularization scheme on different architectures on the CIFAR-10 and ImageNet datasets.
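The first-order argument can be checked numerically: for a perturbation $\delta$ with $\|\delta\|_\infty \le \epsilon$, the linear term $\nabla L \cdot \delta$ is maximized by $\delta = \epsilon\,\mathrm{sign}(\nabla L)$, where it equals $\epsilon \|\nabla L\|_1$. A small NumPy sketch, using a random vector as a stand-in for the true loss gradient:

```python
import numpy as np

# Numerical check of the first-order bound: for ||d||_inf <= eps, the
# linear term <grad, d> in the loss expansion is maximized by
# d = eps * sign(grad), where it equals eps * ||grad||_1. Penalizing
# the l1-norm of the gradient therefore controls the worst-case
# first-order effect of quantization noise.
rng = np.random.default_rng(0)
grad = rng.normal(size=10)      # stand-in for the loss gradient at the weights
eps = 0.01                      # bound on the quantization perturbation

worst_d = eps * np.sign(grad)   # worst-case perturbation in the l_inf ball
first_order_change = grad @ worst_d

assert np.isclose(first_order_change, eps * np.linalg.norm(grad, 1))
```

This is why an $\ell_1$ gradient penalty, rather than an $\ell_2$ one, is the natural regularizer for $\ell_\infty$-bounded quantization noise.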

Few-shot learning is often motivated by the ability of humans to learn new tasks from few examples. However, standard few-shot classification benchmarks assume that the representation is learned on a limited amount of base class data, ignoring the amount of prior knowledge a human may have accumulated before learning new tasks. At the same time, even if a powerful representation is available, it may happen in some domains that base class data are limited or non-existent. This motivates us to study a problem where the representation is obtained from a classifier pre-trained on a large-scale dataset of a different domain, assuming no access to its training process, while the base class data are limited to few examples per class; their role is to adapt the representation to the domain at hand rather than to learn from scratch. We adapt the representation in two stages, namely on the few base class data if available and on the even fewer data of new tasks. In doing so, we obtain from the pre-trained classifier a spatial attention map that allows focusing on objects and suppressing background clutter. This is important in the new problem, because when base class data are few, the network cannot learn where to focus implicitly. We also show that a pre-trained network may be easily adapted to novel classes, without meta-learning.

We prove convergence of a finite difference approximation of the compressible Navier--Stokes system towards the strong solution in $\mathbb{R}^d$, $d=2,3$, for the adiabatic coefficient $\gamma>1$. Employing the relative energy functional, we find a convergence rate which is \emph{uniform} in terms of the discretization parameters for $\gamma \geq d/2$. All results are \emph{unconditional} in the sense that we make no assumptions on the regularity or boundedness of the numerical solution. We also provide numerical experiments to validate the theoretical convergence rate. To the best of our knowledge, this work contains the first unconditional result on the convergence of a finite difference scheme for the unsteady compressible Navier--Stokes system in multiple dimensions.

In recent years, natural language processing (NLP) has seen great advances thanks to deep learning techniques. In the sub-field of machine translation, a new approach named Neural Machine Translation (NMT) has emerged and received massive attention from both academia and industry. However, despite the significant number of studies proposed in the past several years, there has been little work investigating the development process of this new technology trend. This literature survey traces the origin and principal development timeline of NMT, investigates its important branches, categorizes different research orientations, and discusses some future research trends in this field.

We introduce a method to design a computationally efficient $G$-invariant neural network that approximates functions invariant to the action of a given permutation subgroup $G \leq S_n$ of the symmetric group on input data. The key element of the proposed network architecture is a new $G$-invariant transformation module, which produces a $G$-invariant latent representation of the input data. This latent representation is then processed with a multi-layer perceptron in the network. We prove the universality of the proposed architecture, discuss its properties and highlight its computational and memory efficiency. Theoretical considerations are supported by numerical experiments involving different network configurations, which demonstrate the effectiveness and strong generalization properties of the proposed method in comparison to other $G$-invariant neural networks.

The generalized linear bandit framework has attracted a lot of attention in recent years by extending the well-understood linear setting and allowing richer reward structures to be modeled. It notably covers the logistic model, widely used when rewards are binary. For logistic bandits, the frequentist regret guarantees of existing algorithms are $\tilde{\mathcal{O}}(\kappa \sqrt{T})$, where $\kappa$ is a problem-dependent constant. Unfortunately, $\kappa$ can be arbitrarily large as it scales exponentially with the size of the decision set. This may lead to significantly loose regret bounds and poor empirical performance. In this work, we study the logistic bandit with a focus on the prohibitive dependencies introduced by $\kappa$. We propose a new optimistic algorithm based on a finer examination of the non-linearities of the reward function. We show that it enjoys a $\tilde{\mathcal{O}}(\sqrt{T})$ regret with no dependency on $\kappa$ except in a second-order term. Our analysis is based on a new tail inequality for self-normalized martingales, of independent interest.

Robotic vision introduces requirements for real-time processing of fast-varying, noisy information in a continuously changing environment. In a real-world environment, convenient assumptions, such as static camera systems and deep learning algorithms devouring high volumes of ideally slightly-varying data, are hard to sustain. Leveraging recent studies on the neural connectome associated with eye movements, we designed a neuromorphic oculomotor controller and placed it at the heart of our in-house biomimetic robotic head prototype. The controller is unique in the sense that (1) all data are encoded and processed by a spiking neural network (SNN), and (2) by mimicking the connectivity of the associated brain areas, the SNN requires no training to operate. A biologically-constrained Hebbian learning rule further improved the SNN's performance in tracking a moving target. Here, we report the tracking performance of the robotic head and show that the robotic eye kinematics are similar to those reported in human eye studies. This work contributes to our ongoing effort to develop energy-efficient neuromorphic SNNs and harness their emerging intelligence to control biomimetic robots with versatility and robustness.

Cyber-physical systems (CPSs) are widely used in applications ranging from industrial automation to search-and-rescue. So far, in these applications they either work in isolation with high mobility or operate in a static network setup. When mobile CPSs do work cooperatively, it is in applications with relaxed real-time requirements. To enable such cooperation also in hard real-time applications, we present a scheduling approach that adapts real-time schedules to the changes that occur in mobile networks. We present a Mixed Integer Linear Programming model and a heuristic to generate schedules for such networks. One of the key challenges is that running applications must not be interrupted while the schedule is adapted. Therefore, the scheduler has to respect the delay and jitter bounds given by the application while generating the adapted schedule.

State Machine Replication (SMR) solutions often divide time into rounds, with a designated leader driving decisions in each round. Progress is guaranteed once all correct processes synchronize to the same round, and the leader of that round is correct. Recently suggested Byzantine SMR solutions such as HotStuff, Tendermint, and LibraBFT achieve progress with a linear message complexity and a constant time complexity once such round synchronization occurs. But round synchronization itself incurs an additional cost. By Dolev and Reischuk's lower bound, any deterministic solution must have $\Omega(n^2)$ communication complexity. Yet the question of randomized round synchronization with an expected linear message complexity remained open. We present an algorithm that, for the first time, achieves round synchronization with expected linear message complexity and expected constant latency. Existing protocols can use our round synchronization algorithm to solve Byzantine SMR with the same asymptotic performance.

Although cognitive engagement (CE) is crucial for motor learning, it remains underutilized in rehabilitation robots, partly because its assessment currently relies on subjective and coarse measurements taken intermittently. Here, we propose an end-to-end computational framework that assesses CE in real time, using electroencephalography (EEG) signals as objective measurements. The framework consists of (i) a deep convolutional neural network (CNN) that extracts task-discriminative spatiotemporal EEG features to predict the level of CE for two classes -- cognitively engaged vs. disengaged; and (ii) a novel sliding-window method that predicts continuous levels of CE in real time. We evaluated our framework on 8 subjects using an in-house Go/No-Go experiment that adapted its gameplay parameters to induce cognitive fatigue. The proposed CNN had an average leave-one-out accuracy of 88.13\%. The CE prediction correlated well with a commonly used behavioral metric based on self-reports taken every 5 minutes ($\rho$=0.93). Our results objectify CE in real time and pave the way for using CE as a rehabilitation parameter for tailoring robotic therapy to each patient's needs and skills.
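The sliding-window idea can be illustrated with a short, hypothetical Python sketch; the window size and the simple averaging rule are our own stand-ins, not the paper's exact method:

```python
def sliding_window_level(probs, win=5):
    """Turn a stream of per-segment 'engaged' probabilities from a
    classifier into a continuous engagement level: at each time step,
    average the most recent `win` predictions (fewer at the start)."""
    out = []
    for t in range(len(probs)):
        lo = max(0, t - win + 1)
        out.append(sum(probs[lo:t + 1]) / (t + 1 - lo))
    return out
```

Smoothing per-window class probabilities in this way is what turns a binary engaged/disengaged classifier into a continuous real-time signal.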

The problem of distributed synthesis is to automatically generate a distributed algorithm, given a target communication network and a specification of the algorithm's correct behavior. Previous work has focused on static networks with an a priori fixed message size. This approach has two shortcomings: recent work in distributed computing is shifting towards dynamically changing communication networks rather than static ones, and an important class of distributed algorithms are so-called full-information protocols, where nodes piggyback previously received messages onto current messages. In this work we consider the synthesis problem for a system of two nodes communicating in rounds over a dynamic link whose message size is not bounded. Given a network model, i.e., a set of link directions, in each round of the execution the adversary chooses a link from the network model, restricted only by the specification, and delivers messages according to the current link's directions. Motivated by communication buses with direct acknowledgement mechanisms, we further assume that nodes are aware of which messages have been delivered. We show that the synthesis problem is decidable for a network model if and only if it does not contain the empty link that dismisses both nodes' messages.

While there have been significant advances in detecting emotions in text, many problems remain to be solved in the field of utterance-level emotion recognition (ULER). In this paper, we address some challenges in ULER in dialog systems. (1) The same utterance can deliver different emotions in different contexts or from different speakers. (2) Long-range contextual information is hard to capture effectively. (3) Unlike the traditional text classification problem, this task is supported by only a limited number of datasets, most of which contain inadequate conversations or speech. To address these problems, we propose a hierarchical transformer framework (except when describing other studies, "transformer" in this paper refers to the encoder part of the transformer) with a lower-level transformer that models the word-level input and an upper-level transformer that captures the context of utterance-level embeddings. We use the pretrained language model bidirectional encoder representations from transformers (BERT) as the lower-level transformer, which is equivalent to introducing external data into the model and alleviates the data-shortage problem to some extent. In addition, we add speaker embeddings to the model for the first time, which enables our model to capture the interaction between speakers. Experiments on three dialog emotion datasets, Friends, EmotionPush, and EmoryNLP, demonstrate that our proposed hierarchical transformer network models achieve 1.98%, 2.83%, and 3.94% improvement, respectively, over the state-of-the-art methods on each dataset in terms of macro-F1.

We explain how the popular, highly abstract MapReduce model of parallel computation (MRC) can be rooted in reality by explaining how it can be simulated on realistic distributed-memory parallel machine models like BSP. We first refine the model (MRC$^+$) to include parameters for total work $w$, bottleneck work $\hat{w}$, data volume $m$, and maximum object sizes $\hat{m}$. We then show matching upper and lower bounds for executing a MapReduce calculation on the distributed-memory machine -- $\Theta(w/p+\hat{w}+\log p)$ work and $\Theta(m/p+\hat{m}+\log p)$ bottleneck communication volume using $p$ processors.
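The stated bounds can be written down directly; the sketch below evaluates them as plain formulas (constants suppressed, base-2 logarithm chosen purely for illustration):

```python
import math

def mrc_round_cost(w, w_hat, m, m_hat, p):
    """Evaluate the matching upper/lower bounds stated above, up to
    constants: work Theta(w/p + w_hat + log p) and bottleneck
    communication volume Theta(m/p + m_hat + log p) for executing one
    MapReduce calculation on p processors, where w is total work,
    w_hat bottleneck work, m data volume, m_hat maximum object size."""
    work = w / p + w_hat + math.log2(p)
    comm = m / p + m_hat + math.log2(p)
    return work, comm
```

The formulas make the scaling behavior visible: perfect speedup in the bulk terms $w/p$ and $m/p$, limited by the bottleneck terms $\hat{w}$ and $\hat{m}$ plus a logarithmic synchronization cost.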

The spectral deferred correction (SDC) method is a class of iterative solvers for ordinary differential equations (ODEs). It can be interpreted as a preconditioned Picard iteration for the collocation problem. The convergence of this method is well known: for suitable problems it gains one order per iteration, up to the order of the quadrature rule of the underlying collocation problem. This appealing feature enables the easy construction of flexible, high-order accurate methods for ODEs. A variation of SDC is the multi-level spectral deferred correction (MLSDC) method. Here, iterations are performed on a hierarchy of levels, and an FAS correction term, as in nonlinear multigrid methods, couples solutions on different levels. While there are several numerical examples demonstrating its capabilities and efficiency, a theoretical convergence proof has been missing. This paper addresses this issue. We give a proof of the convergence of MLSDC, including the determination of the convergence rate in the time-step size, and numerically demonstrate the results of the theoretical analysis. It turns out that there are restrictions on the advantages of this method over SDC regarding the convergence rate.

We study the relation of containment up to unknown regular resynchronization between two-way non-deterministic transducers. We show that it constitutes a preorder, and that the corresponding equivalence relation is properly intermediate between origin equivalence and classical equivalence. We give a syntactical characterization for containment of two transducers up to resynchronization, and use it to show that this containment relation is undecidable already for one-way non-deterministic transducers, and for simple classes of resynchronizations. This answers the open problem stated in recent works, asking whether this relation is decidable for two-way non-deterministic transducers.

This paper shows how the use of Structural Operational Semantics (SOS) in the style popularized by the process-algebra community can lead to a more succinct and useful construction for building finite automata from regular expressions. Such constructions have been known for decades, and form the basis for the proofs of one direction of Kleene's Theorem. The purpose of the new construction is, on the one hand, to show students how small automata can be constructed, without the need for empty transitions, and on the other hand to show how the construction method admits closure proofs of regular languages with respect to other operators as well. These results, while not theoretically surprising, point to an additional influence of process-algebraic research: in addition to providing fundamental insights into the nature of concurrent computation, it also sheds new light on old, well-known constructions in automata theory.

Nowadays, a significant portion of the posts interacted with daily on social media is infected by rumors. This study approaches rumor analysis from an angle different from other research: it tackles, for the first time, the unaddressed problem of calculating the Spread Power of Rumor (SPR) and examines spread power as a function of multi-contextual features. For this purpose, the theory of Allport and Postman is adopted, which claims that two key factors determine the spread power of rumors, namely importance and ambiguity. The proposed Rumor Spread Power Measurement Model (RSPMM) computes SPR using a textual approach that draws on contextual features to compute the spread power of rumors in two categories: False Rumor (FR) and True Rumor (TR). In total, 51 contextual features are introduced to measure SPR and their impact on classification is investigated; 42 features in the two categories "importance" (28 features) and "ambiguity" (14 features) are then selected to compute SPR. The proposed RSPMM is verified on two labelled datasets collected from Twitter and Telegram. The results show that (i) the proposed new features are effective and efficient in discriminating between FRs and TRs; (ii) although RSPMM relies only on contextual features while existing techniques are based on structure and content features, it achieves considerably better results (F-measure = 83%); and (iii) a t-test shows that the SPR criteria can significantly distinguish between FR and TR, and can moreover serve as a new method to verify the veracity of rumors.

SURFACE, which stands for Secure, Use-case adaptive, and Relatively Fork-free Approach of Chain Extension, is a consensus algorithm for blockchains in real-world networks that enjoys the benefits of both Nakamoto consensus and Byzantine Fault Tolerance (BFT) consensus. In SURFACE, a committee is randomly selected every round to validate and endorse the proposed block. The size of the committee can be set according to network conditions such that the blockchain is almost fork-free with minimal communication overhead. Hence, SURFACE achieves fast probabilistic confirmation with high throughput and low latency when the network is not in an extreme situation such as a large network partition or an attack. Meanwhile, a BFT-based mechanism guarantees consistency in such extreme situations.

We study the incentives of banks in a financial network, where the network consists of debt contracts and credit default swaps (CDSs) between banks. One of the most important questions in such a system is the problem of deciding which of the banks are in default, and how much of their liabilities these banks can pay. We study the payoff and preferences of the banks in the different solutions to this problem. We also introduce a more refined model which allows assigning priorities to payment obligations; this provides a more expressive and realistic model of real-life financial systems, while it always ensures the existence of a solution.

The main focus of the paper is an analysis of the actions that a single bank can execute in a financial system in order to influence the outcome to its advantage. We show that removing an incoming debt, or donating funds to another bank can result in a single new solution that is strictly more favorable to the acting bank. We also show that increasing the bank's external funds or modifying the priorities of outgoing payments cannot introduce a more favorable new solution into the system, but may allow the bank to remove some unfavorable solutions, or to increase its recovery rate. Finally, we show how the actions of two banks in a simple financial system can result in classical game theoretic situations like the prisoner's dilemma or the dollar auction, demonstrating the wide expressive capability of the financial system model.

Addressing a quest by Gupta et al. [ICALP'14], we provide a first, comprehensive study of finding a short s-t path in the multistage graph model, referred to as the Multistage s-t Path problem. Herein, given a sequence of graphs over the same vertex set but changing edge sets, the task is to find short s-t paths in each graph ("snapshot") such that in the resulting path sequence the consecutive s-t paths are "similar". We measure similarity by the size of the symmetric difference of either the vertex set (vertex-similarity) or the edge set (edge-similarity) of any two consecutive paths. We prove that the two variants of Multistage s-t Path are already NP-hard for an input sequence of only two graphs. Motivated by this fact and natural applications of this scenario e.g. in traffic route planning, we perform a parameterized complexity analysis. Among other results, we prove parameterized hardness (W[1]-hardness) regarding the size of the path sequence (solution size) for both variants, vertex- and edge-similarity. As a novel conceptual contribution, we then modify the multistage model by asking for dissimilar consecutive paths. As one of the main results, we prove that dissimilarity allows for fixed-parameter tractability for the parameter solution size, thereby contrasting our W[1]-hardness proof of the corresponding similarity case.
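The two similarity measures between consecutive paths can be made concrete with a short Python sketch (function names are ours):

```python
def vertex_similarity_cost(path_a, path_b):
    """Size of the symmetric difference of the vertex sets of two
    consecutive s-t paths -- the vertex-similarity measure above."""
    return len(set(path_a) ^ set(path_b))

def edge_similarity_cost(path_a, path_b):
    """The same measure on the (undirected) edge sets, where a path is
    given as its vertex sequence and edges are consecutive pairs."""
    edges = lambda p: {frozenset(e) for e in zip(p, p[1:])}
    return len(edges(path_a) ^ edges(path_b))
```

A multistage solution then bounds (for the similarity variant) or lower-bounds (for the dissimilarity variant) these costs between every pair of consecutive snapshots.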

A marked free monoid morphism is a morphism for which the image of each generator starts with a different letter, and immersions are the analogous maps in free groups. We show that the (simultaneous) PCP is decidable for immersions of free groups, and provide an algorithm to compute bases for the sets, called equalisers, on which the immersions take the same values. We also answer a question of Stallings about the rank of the equaliser.

Analogous results are proven for marked morphisms of free monoids.

Accurate and timely metro passenger flow forecasting is critical for the successful deployment of intelligent transportation systems. However, it is quite challenging to propose an efficient and robust forecasting approach due to the inherent randomness and variations of metro passenger flow. In this study, we present a novel adaptive ensemble (AdaEnsemble) learning approach to accurately forecast the volume of metro passenger flows; it combines the complementary advantages of variational mode decomposition (VMD), the seasonal autoregressive integrated moving average (SARIMA) model, the multilayer perceptron (MLP) network and the long short-term memory (LSTM) network. The AdaEnsemble learning approach consists of three stages. The first stage applies VMD to decompose the metro passenger flow data into a periodic component, a deterministic component and a volatility component. We then employ the SARIMA model to forecast the periodic component, the LSTM network to learn and forecast the deterministic component, and the MLP network to forecast the volatility component. In the last stage, the forecasts of the individual components are reconstructed by another MLP network. The empirical results, based on historical passenger flow data from the Shenzhen subway system and several standard evaluation measures, show that our proposed AdaEnsemble learning approach not only has the best forecasting performance compared with state-of-the-art models but also appears to be the most promising and robust.
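The three-stage structure (decompose, forecast each component, recombine) can be sketched as follows. Note this is only a structural sketch: per-phase seasonal means and naive forecasts stand in for VMD, SARIMA, LSTM and MLP, which are far too heavy for a few lines.

```python
import numpy as np

def decompose(x, season=7):
    """Stand-in for VMD: split a series into a periodic part (per-phase
    seasonal means), a crude deterministic part (mean of the remainder)
    and a volatility residual; the three parts sum back to x."""
    x = np.asarray(x, float)
    periodic = np.array([x[t % season::season].mean() for t in range(len(x))])
    rest = x - periodic
    deterministic = np.full(len(x), rest.mean())
    volatility = rest - deterministic
    return periodic, deterministic, volatility

def forecast_next(x, season=7):
    """Forecast one step ahead by forecasting each component separately
    (seasonal repeat / last mean / zero, as naive stand-ins for SARIMA,
    LSTM and MLP) and recombining the component forecasts."""
    periodic, deterministic, volatility = decompose(x, season)
    t = len(x)
    return periodic[t % season] + deterministic[-1] + 0.0
```

In the real pipeline the recombination step is itself learned by an MLP rather than a plain sum, which lets the ensemble weight the component forecasts adaptively.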

Virtual reality (VR) is on the verge of becoming a mainstream platform for gaming, education and product design. The feeling of being present in the virtual world is influenced by many factors, and, more intriguingly, a single negative influence can destroy an illusion that was created with great effort by other measures. It is therefore crucial to balance the influencing factors, know their relative importance, and have a good estimate of how much effort it takes to bring each factor to a certain level of fidelity. This paper collects influencing factors discussed in the literature, analyses the immersion of current off-the-shelf VR solutions, and presents results from an empirical study on the efforts and benefits of certain aspects influencing presence in VR experiences. It turns out that delivering high fidelity is sometimes easier to achieve than medium fidelity, while for other aspects it is worthwhile to invest more effort in higher fidelity to improve presence substantially.

With the Single-Instance Multi-Tenancy (SIMT) model for composite Software-as-a-Service (SaaS) applications, a single composite application instance can host multiple tenants, yielding the benefits of better service and resource utilization, and reduced operational cost for the SaaS provider. An SIMT application needs to share services and their aggregation (the application) among its tenants while supporting variations in the functional and performance requirements of the tenants. The SaaS provider requires a middleware environment that can deploy, enact and manage a designed SIMT application, to achieve the varied requirements of the different tenants in a controlled manner. This paper presents the SDSN@RT (Software-Defined Service Networks @ RunTime) middleware environment that can meet the aforementioned requirements. SDSN@RT represents an SIMT composite cloud application as a multi-tenant service network, where the same service network simultaneously hosts a set of virtual service networks (VSNs), one for each tenant. A service network connects a set of services, and coordinates the interactions between them. A VSN realizes the requirements for a specific tenant and can be deployed, configured, and logically isolated in the service network at runtime. SDSN@RT also supports the monitoring and runtime changes of the deployed multi-tenant service networks. We show the feasibility of SDSN@RT with a prototype implementation, and demonstrate its capabilities to host SIMT applications and support their changes with a case study. The performance study of the prototype implementation shows that the runtime capabilities of our middleware incur little overhead.

The proliferation of connected devices and emergence of internet-of-everything represent a major challenge for broadband wireless networks. This requires a paradigm shift towards the development of innovative technologies for next generation wireless systems. One of the key challenges is the scarcity of spectrum, owing to the unprecedented broadband penetration rate in recent years. A promising solution is the proposal of visible light communications (VLC), which explores the unregulated visible light spectrum to enable high-speed communications, in addition to efficient lighting. This solution offers a wider bandwidth that can accommodate ubiquitous broadband connectivity to indoor users and offload data traffic from cellular networks. Although VLC is secure and able to overcome the shortcomings of RF systems, it suffers from several limitations, e.g., limited modulation bandwidth. In this respect, solutions have been proposed recently to overcome this limitation. In particular, most common orthogonal and non-orthogonal multiple access techniques initially proposed for RF systems, e.g., space-division multiple access (SDMA) and NOMA, have been considered in the context of VLC. In spite of their promising gains, the performance of these techniques is somewhat limited. Consequently, in this article a new and generalized multiple access technique, called rate-splitting multiple access (RSMA), is introduced and investigated for the first time in VLC networks. We first provide an overview of the key multiple access technologies used in VLC systems. Then, we propose the first comprehensive approach to the integration of RSMA with VLC systems. In our proposed framework, SINR expressions are derived and used to evaluate the weighted sum rate (WSR) of a two-user scenario. Our results illustrate the flexibility of RSMA in generalizing NOMA and SDMA, and its WSR superiority in the VLC context.

Multi-field packet classification is a crucial component in modern software-defined data center networks. To achieve high throughput and low latency, state-of-the-art algorithms strive to fit the rule lookup data structures into on-die caches; however, they do not scale well with the number of rules. We present a novel approach, NuevoMatch, which improves the memory scaling of existing methods. A new data structure, Range Query Recursive Model Index (RQ-RMI), is the key component that enables NuevoMatch to replace most of the accesses to main memory with model inference computations. We describe an efficient training algorithm which guarantees the correctness of the RQ-RMI-based classification. The use of RQ-RMI allows the packet rules to be compressed into model weights that fit into the hardware cache and takes advantage of the growing support for fast neural network processing in modern CPUs, such as wide vector processing engines, achieving a rate of tens of nanoseconds per lookup. Our evaluation using 500K multi-field rules from the standard ClassBench benchmark shows a geomean compression factor of 4.9X, 8X, and 82X, and an average performance improvement of 2.7X, 4.4X and 2.6X in latency and 1.3X, 2.2X, and 1.2X in throughput compared to CutSplit, NeuroCuts, and TupleMerge, all state-of-the-art algorithms.

In this project, we aim to classify the speech taken as one of the four emotions namely, sadness, anger, fear and happiness. The samples that have been taken to complete this project are taken from Linguistic Data Consortium (LDC) and UGA database. The important characteristics determined from the samples are energy, pitch, MFCC coefficients, LPCC coefficients and speaker rate. The classifier used to classify these emotional states is Support Vector Machine (SVM) and this is done using two classification strategies: One against All (OAA) and Gender Dependent Classification. Furthermore, a comparative analysis has been conducted between the two and LPCC and MFCC algorithms as well.

A comprehensive and high-quality lexicon plays a crucial role in traditional text classification approaches. And it improves the utilization of the linguistic knowledge. Although it is helpful for the task, the lexicon has got little attention in recent neural network models. Firstly, getting a high-quality lexicon is not easy. We lack an effective automated lexicon extraction method, and most lexicons are hand crafted, which is very inefficient for big data. What's more, there is no an effective way to use a lexicon in a neural network. To address those limitations, we propose a Pre-Attention mechanism for text classification in this paper, which can learn attention of different words according to their effects in the classification tasks. The words with different attention can form a domain lexicon. Experiments on three benchmark text classification tasks show that our models get competitive result comparing with the state-of-the-art methods. We get 90.5% accuracy on Stanford Large Movie Review dataset, 82.3% on Subjectivity dataset, 93.7% on Movie Reviews. And compared with the text classification model without Pre-Attention mechanism, those with Pre-Attention mechanism improve by 0.9%-2.4% accuracy, which proves the validity of the Pre-Attention mechanism. In addition, the Pre-Attention mechanism performs well followed by different types of neural networks (e.g., convolutional neural networks and Long Short-Term Memory networks). For the same dataset, when we use Pre-Attention mechanism to get attention value followed by different neural networks, those words with high attention values have a high degree of coincidence, which proves the versatility and portability of the Pre-Attention mechanism. we can get stable lexicons by attention values, which is an inspiring method of information extraction.

Machine learning has emerged as a promising paradigm for enabling connected, automated vehicles to autonomously cruise the streets and react to unexpected situations. A key challenge, however, is to collect and select real-time and reliable information for the correct classification of unexpected, and often rare, situations that may happen on the road. Indeed, the data generated by vehicles, or received from neighboring vehicles, may be affected by errors or have different levels of resolution and freshness. To tackle this challenge, we propose an active learning framework that, leveraging the information collected through onboard sensors as well as received from other vehicles, effectively deals with scarce and noisy data. In particular, given the available information, our solution selects the data to add to the training set by trading off between two essential features, namely, quality and diversity. The results, obtained using real-world data sets, show that the proposed method significantly outperforms state-of-the-art solutions, providing high classification accuracy at the cost of a limited bandwidth requirement for the data exchange between vehicles.

Massive multiple-input multiple-output (MIMO) systems have attracted much attention lately due to the many advantages they provide over single-antenna systems. Owing to the many antennas, low-cost implementation and low power consumption per antenna are desired. To that end, massive MIMO structures with low-resolution analog-to-digital converters (ADC) have been investigated in many studies. However, the effect of a strong interferer in the adjacent band on quantized massive MIMO systems have not been examined yet. In this study, we analyze the performance of uplink massive MIMO with low-resolution ADCs under frequency selective fading with orthogonal frequency division multiplexing in the perfect and imperfect receiver channel state information cases. We derive analytical expressions for the bit error rate and ergodic capacity. We show that the interfering band can be suppressed by increasing the number of antennas or the oversampling rate when a zero-forcing receiver is employed.

The start up costs in many kinds of generators lead to complex cost structures, which in turn yield severe market loopholes in the locational marginal price (LMP) scheme. Convex hull pricing (a.k.a. extended LMP) is proposed to improve the market efficiency by providing the minimal uplift payment to the generators. In this letter, we consider a stylized model where all generators share the same generation capacity. We analyze the generators' possible strategic behaviors in such a setting, and then propose an index for market power quantification in the convex hull pricing schemes.

We consider two agents playing simultaneously the same stochastic three-armed bandit problem. The two agents are cooperating but they cannot communicate. We propose a strategy with no collisions at all between the players (with very high probability), and with near-optimal regret $O(\sqrt{T \log(T)})$. We also provide evidence that the extra logarithmic term $\sqrt{\log(T)}$ is necessary, with a lower bound for a variant of the problem.

In burst-mode communication systems, the quality of frame synchronization (FS) at receivers significantly impacts the overall system performance. To guarantee FS, an extreme learning machine (ELM)-based synchronization method is proposed to overcome the nonlinear distortion caused by nonlinear devices or blocks. In the proposed method, a preprocessing is first performed to capture the coarse features of synchronization metric (SM) by using empirical knowledge. Then, an ELM-based FS network is employed to reduce system's nonlinear distortion and improve SMs. Experimental results indicate that, compared with existing methods, our approach could significantly reduce the error probability of FS while improve the performance in terms of robustness and generalization.

Homogenization is a technique commonly used in multiscale computational science and engineering for predicting collective response of heterogeneous materials and extracting effective mechanical properties. In this paper, a three-dimensional deep convolutional neural network (3D-CNN) is proposed to predict the effective material properties for representative volume elements (RVEs) with random spherical inclusions. The high-fidelity dataset generated by a computational homogenization approach is used for training the 3D-CNN models. The inference results of the trained networks on unseen data indicate that the network is capable of capturing the microstructural features of RVEs and produces an accurate prediction of effective stiffness and Poisson's ratio. The benefits of the 3D-CNN over conventional finite-element-based homogenization with regard to computational efficiency, uncertainty quantification and model's transferability are discussed in sequence. We find the salient features of the 3D-CNN approach make it a potentially suitable alternative for facilitating material design with fast product design iteration and efficient uncertainty quantification.

Inspired by the recent advances in deep learning (DL), this work presents a deep neural network aided decoding algorithm for binary linear codes. Based on the concept of deep unfolding, we design a decoding network by unfolding the alternating direction method of multipliers (ADMM)-penalized decoder. In addition, we propose two improved versions of the proposed network. The first one transforms the penalty parameter into a set of iteration-dependent ones, and the second one adopts a specially designed penalty function, which is based on a piecewise linear function with adjustable slopes. Numerical results show that the resulting DL-aided decoders outperform the original ADMM-penalized decoder for various low density parity check (LDPC) codes with similar computational complexity.

In this paper, a new take on the concept of an active subspace for reducing the dimension of the design parameter space in a multidisciplinary analysis and optimization (MDAO) problem is proposed. The new approach is intertwined with the concepts of adaptive parameter sampling, projection-based model order reduction, and a database of linear, projection-based reduced-order models equipped with interpolation on matrix manifolds, in order to construct an efficient computational framework for MDAO. The framework is fully developed for MDAO problems with linearized fluid-structure interaction constraints. It is applied to the aeroelastic tailoring, under flutter constraints, of two different flight systems: a flexible configuration of NASA's Common Research Model; and NASA's Aeroelastic Research Wing #2 (ARW-2). The obtained results illustrate the feasibility of the computational framework for realistic MDAO problems and highlight the benefits of the new approach for constructing an active subspace in both terms of solution optimality and wall-clock time reduction

With the rapid development of manufacturing industry, machine fault diagnosis has become increasingly significant to ensure safe equipment operation and production. Consequently, multifarious approaches have been explored and developed in the past years, of which intelligent algorithms develop particularly rapidly. Convolutional neural network, as a typical representative of intelligent diagnostic models, has been extensively studied and applied in recent five years, and a large amount of literature has been published in academic journals and conference proceedings. However, there has not been a systematic review to cover these studies and make a prospect for the further research. To fill in this gap, this work attempts to review and summarize the development of the Convolutional Network based Fault Diagnosis (CNFD) approaches comprehensively. Generally, a typical CNFD framework is composed of the following steps, namely, data collection, model construction, and feature learning and decision making, thus this paper is organized by following this stream. Firstly, data collection process is described, in which several popular datasets are introduced. Then, the fundamental theory from the basic convolutional neural network to its variants is elaborated. After that, the applications of CNFD are reviewed in terms of three mainstream directions, i.e. classification, prediction and transfer diagnosis. Finally, conclusions and prospects are presented to point out the characteristics of current development, facing challenges and future trends. Last but not least, it is expected that this work would provide convenience and inspire further exploration for researchers in this field.

Cloud-RAN is a recent architecture for mobile networks where the processing units are located in distant data-centers while, until now, they were attached to antennas. The main challenge, to fulfill protocol time constraints, is to guarantee a low latency for the periodic messages sent from each antenna to its processing unit and back. The problem we address is to find a sending scheme of these periodic messages without contention nor buffering. We focus on a simple but common star shaped topology, where all contentions are on a single link shared by all antennas. For messages of arbitrary size, we show that there is always a solution as soon as the load of the network is less than $40\%$. Moreover, we explain how we can restrict our study to message of size $1$ without increasing too much the global latency. For message of size $1$, we prove that it is always possible to schedule them, when the load is less than $61\%$ using a polynomial time algorithm. Moreover, using a simple random greedy algorithm, we show that almost all instances of a given load admit a solution, explaining why most greedy algorithms work so well in practice.

Gerrymandering is a practice of manipulating district boundaries and locations in order to achieve a political advantage for a particular party. Lewenberg, Lev, and Rosenschein [AAMAS 2017] initiated the algorithmic study of a geographically-based manipulation problem, where voters must vote at the ballot box closest to them. In this variant of gerrymandering, for a given set of possible locations of ballot boxes and known political preferences of $n$ voters, the task is to identify locations for $k$ boxes out of $m$ possible locations to guarantee victory of a certain party in at least $l$ districts. Here integers $k$ and $l$ are some selected parameter.

It is known that the problem is NP-complete already for 4 political parties and prior to our work only heuristic algorithms for this problem were developed. We initiate the rigorous study of the gerrymandering problem from the perspectives of parameterized and fine-grained complexity and provide asymptotically matching lower and upper bounds on its computational complexity. We prove that the problem is W[1]-hard parameterized by $k+n$ and that it does not admit an $f(n,k)\cdot m^{o(\sqrt{k})}$ algorithm for any function $f$ of $k$ and $n$ only, unless Exponential Time Hypothesis (ETH) fails. Our lower bounds hold already for $2$ parties. On the other hand, we give an algorithm that solves the problem for a constant number of parties in time $(m+n)^{O(\sqrt{k})}$.

In online social network (OSN), understanding the factors bound to the role and strength of interaction(tie) are essential to model a wide variety of network-based applications. The recognition of these interactions can enhance the accuracy of link prediction, improve in ranking of dominants, reliability of recommendation and enhance targeted marketing as decision support system. In recent years, research interest on tie strength measures in OSN and its applications to diverse areas have increased, therefore it needs a comprehensive review covering tie strength estimation systematically. The objective of this paper is to provide an in-depth review, analyze and explore the tie strength in online social networks. A methodical category for tie strength estimation techniques are discussed and analyzed in a wide variety of network types. Representative applications of tie strength estimation are also addressed. Finally, a set of future challenges of the tie strength in online social networks is discussed.

We study the maximal independent set (MIS) and maximum independent set (MAX-IS) problems on dynamic sets of $O(n)$ axis-parallel rectangles, which can be modeled as dynamic rectangle intersection graphs. We consider the fully dynamic vertex update (insertion/deletion) model for two types of rectangles: (i) uniform height and width and (ii) uniform height and arbitrary width. These types of dynamic vertex update problems arise, e.g., in interactive map labeling. We present the first deterministic algorithm for maintaining a MIS (and thus a 4-approximate MAX-IS) of a dynamic set of uniform rectangles with amortized sub-logarithmic update time. This breaks the natural barrier of $O(\Delta)$ update time (where $\Delta$ is the maximum degree in the graph) for vertex updates presented by Assadi et al. (STOC 2018). We continue by investigating MAX-IS and provide a series of deterministic dynamic approximation schemes. For uniform rectangles, we first give an algorithm that maintains a $4$-approximate MAX-IS with $O(1)$ update time. In a subsequent algorithm, we establish the trade-off between approximation quality $2(1+\frac{1}{k})$ and update time $O(k^2\log n)$ for $k\in \mathbb{N}$. We conclude with an algorithm that maintains a $2$-approximate MAX-IS for dynamic sets of uniform height and arbitrary width rectangles with $O(\omega \log n)$ update time, where $\omega$ is the largest number of maximal cliques stabbed by any axis-parallel line. We have implemented our algorithms and report the results of an experimental comparison exploring the trade-off between solution size and update time for synthetic and real-world map labeling data sets.

A popular model to measure network stability is the $k$-core, that is the maximal induced subgraph in which every vertex has degree at least $k$. For example, $k$-cores are commonly used to model the unraveling phenomena in social networks. In this model, users having less than $k$ connections within the network leave it, so the remaining users form exactly the $k$-core. In this paper we study the question whether it is possible to make the network more robust by spending only a limited amount of resources on new connections. A mathematical model for the $k$-core construction problem is the following Edge $k$-Core optimization problem. We are given a graph $G$ and integers $k$, $b$ and $p$. The task is to ensure that the $k$-core of $G$ has at least $p$ vertices by adding at most $b$ edges.

The previous studies on Edge $k$-Core demonstrate that the problem is computationally challenging. In particular, it is NP-hard when $k=3$, W[1]-hard being parameterized by $k+b+p$ (Chitnis and Talmon, 2018), and APX-hard (Zhou et al, 2019). Nevertheless, we show that there are efficient algorithms with provable guarantee when the $k$-core has to be constructed from a sparse graph with some additional structural properties. Our results are 1) When the input graph is a forest, Edge $k$-Core is solvable in polynomial time; 2) Edge $k$-Core is fixed-parameter tractable (FPT) being parameterized by the minimum size of a vertex cover in the input graph. On the other hand, with such parameterization, the problem does not admit a polynomial kernel subject to a widely-believed assumption from complexity theory; 3) Edge $k$-Core is FPT parameterized by $\mathrm{tw}+k$. This improves upon the result of Chitnis and Talmon by not requiring $b$ to be small. Each of our algorithms is built upon a new graph-theoretical result interesting in its own.

Medical images differ from natural images in significantly higher resolutions and smaller regions of interest. Because of these differences, neural network architectures that work well for natural images might not be applicable to medical image analysis. In this work, we extend the globally-aware multiple instance classifier, a framework we proposed to address these unique properties of medical images. This model first uses a low-capacity, yet memory-efficient, network on the whole image to identify the most informative regions. It then applies another higher-capacity network to collect details from chosen regions. Finally, it employs a fusion module that aggregates global and local information to make a final prediction. While existing methods often require lesion segmentation during training, our model is trained with only image-level labels and can generate pixel-level saliency maps indicating possible malignant findings. We apply the model to screening mammography interpretation: predicting the presence or absence of benign and malignant lesions. On the NYU Breast Cancer Screening Dataset, consisting of more than one million images, our model achieves an AUC of 0.93 in classifying breasts with malignant findings, outperforming ResNet-34 and Faster R-CNN. Compared to ResNet-34, our model is 4.1x faster for inference while using 78.4% less GPU memory. Furthermore, we demonstrate, in a reader study, that our model surpasses radiologist-level AUC by a margin of 0.11. The proposed model is available online: https://github.com/nyukat/GMIC.

Although freelancing work has grown substantially in recent years, in part facilitated by a number of online labor marketplaces, (e.g., Guru, Freelancer, Amazon Mechanical Turk), traditional forms of "in-sourcing" work continue being the dominant form of employment. This means that, at least for the time being, freelancing and salaried employment will continue to co-exist. In this paper, we provide algorithms for outsourcing and hiring workers in a general setting, where workers form a team and contribute different skills to perform a task. We call this model team formation with outsourcing. In our model, tasks arrive in an online fashion: neither the number nor the composition of the tasks is known a-priori. At any point in time, there is a team of hired workers who receive a fixed salary independently of the work they perform. This team is dynamic: new members can be hired and existing members can be fired, at some cost. Additionally, some parts of the arriving tasks can be outsourced and thus completed by non-team members, at a premium. Our contribution is an efficient online cost-minimizing algorithm for hiring and firing team members and outsourcing tasks. We present theoretical bounds obtained using a primal-dual scheme proving that our algorithms have a logarithmic competitive approximation ratio. We complement these results with experiments using semi-synthetic datasets based on actual task requirements and worker skills from three large online labor marketplaces.

Pathology slides of lung malignancies are classified using the "Salient Slices" technique described in Frank et al., 2020. A four-fold cross-validation study using a small image set (42 adenocarcinoma slides and 42 squamous cell carcinoma slides) produced fully correct classifications in each fold. Probability maps enable visualization of the underlying basis for a classification.

In this paper, we present a topology optimization (TO) framework to enable automated design of mechanical components while ensuring the result can be manufactured using multi-axis machining. Although TO improves the part's performance, the as-designed model is often geometrically too complex to be machined and the as-manufactured model can significantly vary due to machining constraints that are not accounted for during TO. In other words, many of the optimized design features cannot be accessed by a machine tool without colliding with the part (or fixtures). The subsequent post-processing to make the part machinable with the given setup requires trial-and-error without guarantees on preserving the optimized performance. Our proposed approach is based on the well-established accessibility analysis formulation using convolutions in configuration space that is extensively used in spatial planning and robotics. We define an 'inaccessibility measure field' (IMF) over the design domain to identify non-manufacturable features and quantify their contribution to non-manufacturability. The IMF is used to penalize the sensitivity field of performance objectives and constraints to prevent formation of inaccessible regions. Unlike existing discrete formulations, our IMF provides a continuous spatial field that is desirable for TO convergence. Our approach applies to arbitrary geometric complexity of the part, tools, and fixtures, and is highly parallelizable on multi-core architecture. We demonstrate the effectiveness of our framework on benchmark and realistic examples in 2D and 3D. We also show that it is possible to directly construct manufacturing plans for the optimized designs based on the accessibility information.

Automatic speaker verification systems are vulnerable to audio replay attacks which bypass security by replaying recordings of authorized speakers. Replay attack detection (RA) detection systems built upon Residual Neural Networks (ResNet)s have yielded astonishing results on the public benchmark ASVspoof 2019 Physical Access challenge. With most teams using fine-tuned feature extraction pipelines and model architectures, the generalizability of such systems remains questionable though. In this work, we analyse the effect of discriminative feature learning in a multi-task learning (MTL) setting can have on the generalizability and discriminability of RA detection systems. We use a popular ResNet architecture optimized by the cross-entropy criterion as our baseline and compare it to the same architecture optimized by MTL using Siamese Neural Networks (SNN). It can be shown that SNN outperform the baseline by relative 26.8 % Equal Error Rate (EER). We further enhance the model's architecture and demonstrate that SNN with additional reconstruction loss yield another significant improvement of relative 13.8 % EER.

Iterative linear quadradic regulator(iLQR) has become a benchmark method to deal with nonlinear stochastic optimal control problem. However, it does not apply to delay system. In this paper, we extend the iLQR theory and prove new theorem in case of input signal with fixed delay. Which could be beneficial for machine learning or optimal control application to real time robot or human assistive device.

We consider the problem of downlink power control in wireless networks, consisting of multiple transmitter-receiver pairs communicating with each other over a single shared wireless medium. To mitigate the interference among concurrent transmissions, we leverage the network topology to create a graph neural network architecture, and we then use an unsupervised primal-dual counterfactual optimization approach to learn optimal power allocation decisions. We show how the counterfactual optimization technique allows us to guarantee a minimum rate constraint, which adapts to the network size, hence achieving the right balance between average and $5^{th}$ percentile user rates throughout a range of network configurations.

The goal of this paper is to develop a novel numerical method for efficient multiplicative noise removal. The nonlocal self-similarity of natural images implies that the matrices formed by their nonlocal similar patches are low-rank. By exploiting this low-rank prior with application to multiplicative noise removal, we propose a nonlocal low-rank model for this task and develop a proximal alternating reweighted minimization (PARM) algorithm to solve the optimization problem resulting from the model. Specifically, we utilize a generalized nonconvex surrogate of the rank function to regularize the patch matrices and develop a new nonlocal low-rank model, which is a nonconvex nonsmooth optimization problem having a patchwise data fidelity and a generalized nonlocal low-rank regularization term. To solve this optimization problem, we propose the PARM algorithm, which has a proximal alternating scheme with a reweighted approximation of its subproblem. A theoretical analysis of the proposed PARM algorithm is conducted to guarantee its global convergence to a critical point. Numerical experiments demonstrate that the proposed method for multiplicative noise removal significantly outperforms existing methods such as the benchmark SAR-BM3D method in terms of the visual quality of the denoised images, and the PSNR (the peak-signal-to-noise ratio) and SSIM (the structural similarity index measure) values.

We present a method for financial time series forecasting using representation learning techniques. Recent progress on deep autoregressive models has shown their ability to capture long-term dependencies of the sequence data. However, the shortage of available financial data for training will make the deep models susceptible to the overfitting problem. In this paper, we propose a neural-network-powered conditional mutual information (CMI) estimator for learning representations for the forecasting task. Specifically, we first train an encoder to maximize the mutual information between the latent variables and the label information conditioned on the encoded observed variables. Then the features extracted from the trained encoder are used to learn a subsequent logistic regression model for predicting time series movements. Our proposed estimator transforms the CMI maximization problem to a classification problem whether two encoded representations are sampled from the same class or not. This is equivalent to perform pairwise comparisons of the training datapoints, and thus, improves the generalization ability of the deep autoregressive model. Empirical experiments indicate that our proposed method has the potential to advance the state-of-the-art performance.

IoT devices are decentralized and deployed in un-stable environments, which causes them to be prone to various kinds of faults, such as device failure and network disruption. Yet, current IoT platforms require programmers to handle faults manually, a complex and error-prone task. In this paper, we present IoTRepair, a fault-handling system for IoT that (1)integrates a fault identification module to track faulty devices,(2) provides a library of fault-handling functions for effectively handling different fault types, (3) provides a fault handler on top of the library for autonomous IoT fault handling, with user and developer configuration as input. Through an evaluation in a simulated lab environment and with various fault injectio nmethods,IoTRepair is compared with current fault-handling solutions. The fault handler reduces the incorrect states on average 50.01%, which corresponds to less unsafe and insecure device states. Overall, through a systematic design of an IoT fault handler, we provide users flexibility and convenience in handling complex IoT fault handling, allowing safer IoT environments.

Arbitrary style transfer is the task of synthesis of an image that has never been seen before, using two given images: content image and style image. The content image forms the structure, the basic geometric lines and shapes of the resulting image, while the style image sets the color and texture of the result. The word "arbitrary" in this context means the absence of any one pre-learned style. So, for example, convolutional neural networks capable of transferring a new style only after training or retraining on a new amount of data are not con-sidered to solve such a problem, while networks based on the attention mech-anism that are capable of performing such a transformation without retraining - yes. An original image can be, for example, a photograph, and a style image can be a painting of a famous artist. The resulting image in this case will be the scene depicted in the original photograph, made in the stylie of this picture. Recent arbitrary style transfer algorithms make it possible to achieve good re-sults in this task, however, in processing portrait images of people, the result of such algorithms is either unacceptable due to excessive distortion of facial features, or weakly expressed, not bearing the characteristic features of a style image. In this paper, we consider an approach to solving this problem using the combined architecture of deep neural networks with a attention mechanism that transfers style based on the contents of a particular image segment: with a clear predominance of style over the form for the background part of the im-age, and with the prevalence of content over the form in the image part con-taining directly the image of a person.

To address the multi-objective optimal reactive power dispatch (MORPD) problem, a two-step approach is proposed in this paper. First, to ensure the economy and security of the power system, an MORPD model aiming to minimize active power loss and voltage deviation is formulated. Then the two-step approach, which integrates decision-making into optimization, is used to solve the model. Specifically, the first step seeks Pareto optimal solutions (POSs) with good distribution using a multi-objective optimization (MOO) algorithm, the classification and Pareto domination based multi-objective evolutionary algorithm (CPSMOEA); a reference Pareto-optimal front is also generated to validate the Pareto front obtained by CPSMOEA. In the second step, integrated decision-making combining the fuzzy c-means algorithm (FCM) with the grey relation projection method (GRP) extracts from the POSs the best compromise solutions reflecting the preferences of decision-makers. Test results on the IEEE 30-bus and IEEE 118-bus systems demonstrate that the proposed approach not only addresses the MORPD problem but also outperforms other commonly used MOO algorithms, including multi-objective particle swarm optimization (MOPSO), the preference-inspired coevolutionary algorithm (PICEAg), and the third evolution step of generalized differential evolution (GDE3).

The compact Merkle multiproof is a new and significantly more memory-efficient way to generate and verify sparse Merkle multiproofs. A standard sparse Merkle multiproof requires storing an index for every non-leaf hash in the multiproof. The compact Merkle multiproof, on the other hand, requires only $k$ leaf indices, where $k$ is the number of elements used for creating the multiproof. This significantly reduces the size of multiproofs, especially for larger Merkle trees.
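The index savings can be sketched in a few lines of Python (an illustrative reconstruction, not the paper's exact encoding): the verifier re-derives every non-leaf position from the $k$ leaf indices alone, so the proof ships only the sibling hashes it cannot recompute.

```python
import hashlib

def H(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def build_tree(leaves):
    # heap layout: nodes[1] is the root, hashed leaves sit at nodes[n..2n-1]
    n = len(leaves)
    nodes = [b""] * (2 * n)
    for i, leaf in enumerate(leaves):
        nodes[n + i] = H(leaf)
    for i in range(n - 1, 0, -1):
        nodes[i] = H(nodes[2 * i] + nodes[2 * i + 1])
    return nodes

def make_multiproof(nodes, leaf_indices):
    # collect only the sibling hashes the verifier cannot recompute itself
    n = len(nodes) // 2
    frontier = sorted(n + i for i in leaf_indices)
    proof = []
    while frontier != [1]:
        parents, i = [], 0
        while i < len(frontier):
            node, sib = frontier[i], frontier[i] ^ 1
            if i + 1 < len(frontier) and frontier[i + 1] == sib:
                i += 2                    # sibling is derivable: store nothing
            else:
                proof.append(nodes[sib])  # sibling must ship with the proof
                i += 1
            parents.append(node // 2)
        frontier = parents
    return proof

def verify_multiproof(root, indexed_leaves, proof, n):
    # indexed_leaves: [(leaf_index, leaf_bytes)]; only k leaf indices are stored
    frontier = sorted((n + i, H(leaf)) for i, leaf in indexed_leaves)
    supplied = iter(proof)
    while frontier[0][0] != 1:
        parents, i = [], 0
        while i < len(frontier):
            pos, h = frontier[i]
            if i + 1 < len(frontier) and frontier[i + 1][0] == (pos ^ 1):
                parent = H(h + frontier[i + 1][1])
                i += 2
            elif pos % 2 == 0:            # left child: proof hash goes right
                parent = H(h + next(supplied))
                i += 1
            else:                         # right child: proof hash goes left
                parent = H(next(supplied) + h)
                i += 1
            parents.append((pos // 2, parent))
        frontier = parents
    return frontier[0][1] == root
```

A standard multiproof would additionally carry a tree position for each of the shipped hashes; here those positions are implicit in the sorted frontier.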

We study the maximum cardinality matching problem in a standard distributed setting, where the nodes $V$ of a given $n$-node network graph $G=(V,E)$ communicate over the edges $E$ in synchronous rounds. More specifically, we consider the distributed CONGEST model, where in each round, each node of $G$ can send an $O(\log n)$-bit message to each of its neighbors. We show that for every graph $G$ and matching $M$ of $G$, there is a randomized CONGEST algorithm that verifies that $M$ is a maximum matching of $G$ in time $O(|M|)$ and disproves it in time $O(D + \ell)$, where $D$ is the diameter of $G$ and $\ell$ is the length of a shortest augmenting path. We hope that our algorithm constitutes a significant step towards developing a CONGEST algorithm to compute a maximum matching in time $\tilde{O}(s^*)$, where $s^*$ is the size of a maximum matching.

Uncertainty estimation is important for ensuring the safety and robustness of AI systems, especially for high-risk applications. While much progress has recently been made in this area, most research has focused on unstructured prediction, such as image classification and regression tasks. Although task-specific forms of confidence score estimation have been investigated by the speech and machine translation communities, little work has investigated general uncertainty estimation approaches for structured prediction. Thus, this work aims to investigate uncertainty estimation for structured prediction tasks within a single unified and interpretable probabilistic ensemble-based framework. We consider uncertainty estimation for sequence data at the token level and the complete-sequence level, provide interpretations for, and applications of, various measures of uncertainty, and discuss the challenges associated with obtaining them. This work also explores the practical challenges associated with obtaining uncertainty estimates for structured prediction tasks and provides baselines for token-level error detection, sequence-level prediction rejection, and sequence-level out-of-domain input detection using ensembles of auto-regressive transformer models trained on the WMT'14 English-French and WMT'17 English-German translation datasets and the LibriSpeech speech recognition dataset.

Learning to Rank is the problem of ranking a sequence of documents based on their relevance to a given query. Deep Q-Learning has been shown to be a useful method for training an agent in sequential decision making. In this paper, we show that DeepQRank, our Deep Q-Learning ranking agent, demonstrates performance that can be considered state-of-the-art. Though less computationally efficient than a supervised learning approach such as linear regression, our agent has fewer limitations in terms of which formats of data it can use for training and evaluation. We run our algorithm against Microsoft's LETOR listwise dataset and achieve an NDCG@1 (ranking accuracy in the range [0,1]) of 0.5075, narrowly beating the leading supervised learning model, SVMRank (0.4958).
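For readers unfamiliar with the metric, NDCG@k can be computed in a few lines (a standard sketch using the common $2^{rel}-1$ gain; conventions vary across libraries):

```python
import math

def dcg_at_k(rels, k):
    # discounted cumulative gain with the common 2^rel - 1 gain and log2 discount
    return sum((2 ** r - 1) / math.log2(i + 2) for i, r in enumerate(rels[:k]))

def ndcg_at_k(rels, k):
    # normalize by the DCG of the ideal (descending-relevance) ordering
    ideal = dcg_at_k(sorted(rels, reverse=True), k)
    return dcg_at_k(rels, k) / ideal if ideal > 0 else 0.0
```

NDCG@1 is simply this with k = 1, so only the top-ranked document's relevance matters.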

We demonstrate how the key notions of Tononi et al.'s Integrated Information Theory (IIT) can be studied within the simple graphical language of process theories, i.e. symmetric monoidal categories. This allows IIT to be generalised to a broad range of physical theories, including as a special case the Quantum IIT of Zanardi, Tomka and Venuti.

Integrated Information Theory is one of the leading models of consciousness. It aims to describe both the quality and quantity of the conscious experience of a physical system, such as the brain, in a particular state. In this contribution, we propound the mathematical structure of the theory, separating the essentials from auxiliary formal tools. We provide a definition of a generalized IIT which has IIT 3.0 of Tononi et al., as well as the Quantum IIT introduced by Zanardi et al., as special cases. This provides an axiomatic definition of the theory which may serve as the starting point for future formal investigations and as an introduction suitable for researchers with a formal background.

We introduce the use of autoregressive normalizing flows for rapid likelihood-free inference of binary black hole system parameters from gravitational-wave data with deep neural networks. A normalizing flow is an invertible mapping on a sample space that can be used to induce a transformation from a simple probability distribution to a more complex one: if the simple distribution can be rapidly sampled and its density evaluated, then so can the complex distribution. Our first application to gravitational waves uses an autoregressive flow, conditioned on detector strain data, to map a multivariate standard normal distribution into the posterior distribution over system parameters. We train the model on artificial strain data consisting of IMRPhenomPv2 waveforms drawn from a five-parameter $(m_1, m_2, \phi_0, t_c, d_L)$ prior and stationary Gaussian noise realizations with a fixed power spectral density. This gives performance comparable to current best deep-learning approaches to gravitational-wave parameter estimation. We then build a more powerful latent variable model by incorporating autoregressive flows within the variational autoencoder framework. This model has performance comparable to Markov chain Monte Carlo and, in particular, successfully models the multimodal $\phi_0$ posterior. Finally, we train the autoregressive latent variable model on an expanded parameter space, including also aligned spins $(\chi_{1z}, \chi_{2z})$ and binary inclination $\theta_{JN}$, and show that all parameters and degeneracies are well-recovered. In all cases, sampling is extremely fast, requiring less than two seconds to draw $10^4$ posterior samples.
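The change-of-variables principle behind normalizing flows can be illustrated with a minimal elementwise affine flow (a toy sketch, not the conditioned autoregressive architecture used in the paper):

```python
import numpy as np

def flow_forward(z, shift, log_scale):
    # elementwise affine flow: x = shift + exp(log_scale) * z (trivially invertible)
    return shift + np.exp(log_scale) * z

def flow_log_density(x, shift, log_scale):
    # change of variables: log p_x(x) = log p_z(f^{-1}(x)) - log|det J_f|,
    # with p_z a standard normal and log|det J_f| = sum(log_scale)
    z = (x - shift) / np.exp(log_scale)
    log_pz = -0.5 * np.sum(z ** 2 + np.log(2 * np.pi), axis=-1)
    return log_pz - np.sum(log_scale)
```

Because the base density and the Jacobian determinant are both cheap, sampling and density evaluation of the transformed distribution are as fast as for the base distribution; deep flows stack many such invertible maps with learned, data-conditioned parameters.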

We study the following algorithm synthesis question: given the description of a locally checkable graph problem $\Pi$ for paths or cycles, determine in which instances $\Pi$ is solvable, determine what is the distributed round complexity of solving $\Pi$ in the usual $\mathsf{LOCAL}$ model of distributed computing, and construct an asymptotically optimal distributed algorithm for solving $\Pi$.

To answer such questions, we represent $\Pi$ as a nondeterministic finite automaton $\mathcal{M}$ over a unary alphabet. We classify the states of $\mathcal{M}$ into repeatable states, flexible states, mirror-flexible states, loops, and mirror-flexible loops; all of these can be decided in polynomial time. We show that these five classes of states completely answer all questions related to the solvability and distributed computational complexity of $\Pi$ on cycles.

On paths, there is one case in which the question of solvability coincides with the classical universality problem for unary regular languages, and hence determining if a given problem $\Pi$ is always solvable is co-$\mathsf{NP}$-complete. However, we show that all other questions, including the question of determining the distributed round complexity of $\Pi$ and finding an asymptotically optimal algorithm for solving $\Pi$, can be answered in polynomial time.

We show the surprising result that the cutpoint isolation problem is decidable for probabilistic finite automata where input words are taken from a letter-monotonic context-free language. A context-free language $L$ is letter-monotonic when $L \subseteq a_1^*a_2^* \cdots a_\ell^*$ for some finite $\ell > 0$ where each letter is distinct. A cutpoint is isolated when it cannot be approached arbitrarily closely. The decidability of this problem is in marked contrast to the situation for the (strict) emptiness problem for PFA which is undecidable under the even more severe restrictions of PFA with polynomial ambiguity, commutative matrices and input over a letter-monotonic language as well as the injectivity problem which is undecidable for PFA over letter-monotonic languages. We provide a constructive nondeterministic algorithm to solve the cutpoint isolation problem, even for exponentially ambiguous PFA, and we also show that the problem is at least NP-hard.

We consider a system of linear equations whose coefficients depend linearly on interval parameters. Its solution set is defined as the set of all solutions of all admissible realizations of the parameters. We study unbounded directions of the solution set and their relation to its kernel. The kernel of a matrix characterizes the unbounded directions in the real case and in the case of ordinary interval systems. In the general parametric case, however, this is not completely true. There is still a close relation preserved, which we discuss in the paper. Nevertheless, we identify several special subclasses for which the characterization remains valid. Next, we extend the results to so-called AE parametric systems, which are defined by forall-exists quantification.

Most state-of-the-art object detectors output multiple detections per object. The duplicates are removed in a post-processing step called Non-Maximum Suppression. Classical Non-Maximum Suppression has shortcomings in scenes that contain objects with high overlap: the idea of this heuristic is that a high bounding box overlap corresponds to a high probability of having a duplicate. We propose FeatureNMS to solve this problem. FeatureNMS recognizes duplicates not only based on the intersection over union between bounding boxes, but also based on the difference of feature vectors. These feature vectors can encode additional information such as visual appearance. Our approach outperforms classical NMS and derived approaches and achieves state-of-the-art performance.
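A minimal sketch of the idea (the thresholds are invented for illustration; the paper's tuned values and embedding network are not reproduced here):

```python
import numpy as np

def iou(a, b):
    # boxes as [x1, y1, x2, y2]
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1]) +
             (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def feature_nms(boxes, scores, feats, iou_lo=0.4, iou_hi=0.9, feat_dist=1.0):
    # IoU < iou_lo -> keep; IoU >= iou_hi -> suppress; in the band between,
    # the embedding distance decides whether two boxes show the same object
    keep = []
    for i in np.argsort(scores)[::-1]:            # highest score first
        duplicate = False
        for j in keep:
            o = iou(boxes[i], boxes[j])
            if o >= iou_hi or (o >= iou_lo and
                               np.linalg.norm(feats[i] - feats[j]) < feat_dist):
                duplicate = True
                break
        if not duplicate:
            keep.append(int(i))
    return keep
```

Two heavily overlapping boxes with dissimilar embeddings thus survive as distinct objects, which plain IoU-based NMS would merge.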

Common reporting styles for statistical results, such as confidence intervals (CI), are prone to dichotomous interpretations, especially under null hypothesis testing frameworks: for example, significant differences between drug treatment and placebo groups may be claimed because the CIs of the mean effects do not overlap, while the magnitudes and absolute differences of the effect sizes are disregarded. Techniques relying on the visual estimation of the strength of evidence have been recommended to limit such dichotomous interpretations, but their effectiveness has been challenged. We ran two experiments to compare several alternative representations of confidence intervals, and used Bayesian multilevel models to estimate the effects of the representation styles on differences in subjective confidence in the results and on preferences in visualization styles. Our results suggest that adding visual information to the classic CI representation can decrease the sudden drop around $p$-value 0.05 compared to classic CIs and textual representations of CIs with $p$-values. All data analysis and scripts are available at https://github.com/helske/statvis.

We consider parameterized concurrent systems consisting of a finite but unknown number of components, obtained by replicating a given set of finite state automata. Components communicate by executing atomic interactions whose participants update their states simultaneously. We introduce an interaction logic to specify both the type of interactions (e.g. rendez-vous, broadcast) and the topology of the system (e.g. pipeline, ring). The logic can be easily embedded in the monadic second order logic of finitely many successors, and is therefore decidable.

Proving safety properties of such a parameterized system, like deadlock freedom or mutual exclusion, requires inferring an inductive invariant that contains all reachable states of all system instances, and no unsafe state. We present a method to synthesize inductive invariants automatically, directly from the formula describing the interactions, without costly fixed point iterations. We experimentally show that this invariant is strong enough to verify safety properties of a large number of systems including textbook examples (dining philosophers, synchronization schemes), classical mutual exclusion algorithms, cache-coherence protocols and self-stabilization algorithms, for an arbitrary number of components.

Kolmogorov complexity is the length of the ultimately compressed version of a file (that is, anything which can be put in a computer). Formally, it is the length of a shortest program from which the file can be reconstructed. We discuss the incomputability of Kolmogorov complexity, the formal loopholes this leaves us, recent approaches to compute or approximate Kolmogorov complexity, and which of these approaches fail and which are viable.

Calibration and equal error rates are fundamental conditions for algorithmic fairness that have been shown to conflict with each other, suggesting that they cannot be satisfied simultaneously. This paper shows that the two are in fact compatible and presents a method for reconciling them. In particular, we derive necessary and sufficient conditions for the existence of calibrated scores that yield classifications achieving equal error rates. We then present an algorithm that searches for the most informative score subject to both calibration and minimal error rate disparity. Applied empirically to credit lending, our algorithm provides a solution that is more fair and profitable than a common alternative that omits sensitive features.

This paper gives a broader insight into the application of adaptive filters to noise cancellation in various signal transmission processes. Adaptive filtering techniques such as RLS, LMS and normalized LMS are used to filter the input signal, using the concept of negative feedback to predict the nature of the noise and remove it effectively from the input. In this paper, a comparative study of the effectiveness of RLS, LMS and normalized LMS is performed based on parameters such as SNR (signal-to-noise ratio), MSE (mean squared error) and cross-correlation. Implementation and analysis of the filters are carried out with different step sizes and different filter orders.
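A normalized-LMS noise canceller of the kind compared here fits in a few lines (a minimal sketch; the signal, noise path and step size below are invented for illustration):

```python
import numpy as np

def nlms(x, d, order=4, mu=0.05, eps=1e-8):
    # normalized LMS: x is the noise reference, d the corrupted signal;
    # in noise cancellation the error e is the cleaned-up output
    w = np.zeros(order)
    e = np.zeros(len(d))
    for k in range(order - 1, len(d)):
        u = x[k - order + 1:k + 1][::-1]        # most recent sample first
        y = w @ u                               # current noise estimate
        e[k] = d[k] - y                         # residual = signal estimate
        w += (mu / (eps + u @ u)) * e[k] * u    # normalized weight update
    return e, w
```

The normalization by the regressor energy `u @ u` is what distinguishes NLMS from plain LMS: it makes the effective step size insensitive to the input power, at the cost of one extra inner product per sample.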

We consider extensions of a closure system on a finite set S, i.e., closure systems on the same set S containing the given one as a sublattice. A closure system can be represented in different ways, e.g. by an implicational base or by the set of its meet-irreducible elements. When a closure system is described by an implicational base, we provide a characterization of the implicational base for the largest extension. We also show that the largest extension can be obtained by a small modification of the implicational base of the input closure system. This answers a question asked in [12]. Second, we are interested in computing the largest extension when the closure system is given by the set of all its meet-irreducible elements. We give an incremental polynomial time algorithm to compute the largest extension of a closure system, and leave open whether the number of meet-irreducible elements can grow exponentially.

Infrared spectra obtained from cell or tissue specimen have commonly been observed to involve a significant degree of (resonant) Mie scattering, which often overshadows biochemically relevant spectral information by a non-linear, non-additive spectral component in Fourier transformed infrared (FTIR) spectroscopic measurements. Correspondingly, many successful machine learning approaches for FTIR spectra have relied on preprocessing procedures that computationally remove the scattering components from an infrared spectrum. We propose an approach to approximate this complex preprocessing function using deep neural networks. As we demonstrate, the resulting model is not just several orders of magnitudes faster, which is important for real-time clinical applications, but also generalizes strongly across different tissue types. Furthermore, our proposed method overcomes the trade-off between computation time and the corrected spectrum being biased towards an artificial reference spectrum.

Fueled by massive data, important decision making is being automated with the help of algorithms; therefore, fairness in algorithms has become an especially important research topic. In this work, we design new streaming and distributed algorithms for the fair $k$-center problem that models fair data summarization. The streaming and distributed models of computation have the attractive feature of being able to handle massive data sets that do not fit into main memory. Our main contributions are: (a) the first distributed algorithm, which has a provably constant approximation ratio and is extremely parallelizable, and (b) a two-pass streaming algorithm with a provable approximation guarantee matching that of the best known algorithm (which is not a streaming algorithm). Our algorithms have the advantages of being easy to implement in practice, being fast with linear running times, having very small working memory and communication, and outperforming existing algorithms on several real and synthetic data sets. To complement our distributed algorithm, we also give a hardness result for natural distributed algorithms, which holds even for the special case of $k$-center.
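For context, the classical greedy 2-approximation for unconstrained $k$-center (Gonzalez's farthest-first traversal), which fair variants build upon, can be sketched as follows (background only, not the paper's streaming or distributed algorithm):

```python
import numpy as np

def greedy_k_center(points, k, first=0):
    # Gonzalez's farthest-first traversal: a classical 2-approximation for
    # (unconstrained) k-center; fair variants additionally enforce
    # per-group quotas on the chosen centers
    centers = [first]
    dist = np.linalg.norm(points - points[first], axis=1)
    for _ in range(k - 1):
        nxt = int(np.argmax(dist))               # farthest point from current centers
        centers.append(nxt)
        dist = np.minimum(dist, np.linalg.norm(points - points[nxt], axis=1))
    return centers, float(dist.max())            # centers and covering radius
```

Each iteration is a single pass over the points, which is why this style of algorithm adapts well to streaming and distributed settings.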

In many real world applications, data are characterized by a complex structure, that can be naturally encoded as a graph. In the last years, the popularity of deep learning techniques has renewed the interest in neural models able to process complex patterns. In particular, inspired by the Graph Neural Network (GNN) model, different architectures have been proposed to extend the original GNN scheme. GNNs exploit a set of state variables, each assigned to a graph node, and a diffusion mechanism of the states among neighbor nodes, to implement an iterative procedure to compute the fixed point of the (learnable) state transition function. In this paper, we propose a novel approach to the state computation and the learning algorithm for GNNs, based on a constraint optimisation task solved in the Lagrangian framework. The state convergence procedure is implicitly expressed by the constraint satisfaction mechanism and does not require a separate iterative phase for each epoch of the learning procedure. In fact, the computational structure is based on the search for saddle points of the Lagrangian in the adjoint space composed of weights, neural outputs (node states), and Lagrange multipliers. The proposed approach is compared experimentally with other popular models for processing graphs.

Neural network quantization methods often involve simulating the quantization process during training. This makes the trained model highly dependent on the precise way quantization is performed. Since low-precision accelerators differ in their quantization policies and their supported mix of data-types, a model trained for one accelerator may not be suitable for another. To address this issue, we propose KURE, a method that provides intrinsic robustness to the model against a broad range of quantization implementations. We show that KURE yields a generic model that may be deployed on numerous inference accelerators without a significant loss in accuracy.

On-device machine learning (ML) is quickly gaining popularity among mobile apps. It allows offline model inference while preserving user privacy. However, ML models, considered as core intellectual properties of model owners, are now stored on billions of untrusted devices and subject to potential thefts. Leaked models can cause both severe financial loss and security consequences.

This paper presents the first empirical study of ML model protection on mobile devices. Our study aims to answer three open questions with quantitative evidence: How widely is model protection used in apps? How robust are existing model protection techniques? How much can (stolen) models cost? To that end, we built a simple app analysis pipeline and analyzed 46,753 popular apps collected from the US and Chinese app markets. We identified 1,468 ML apps spanning all popular app categories. We found that, alarmingly, 41% of ML apps do not protect their models at all, which can be trivially stolen from app packages. Even for those apps that use model protection or encryption, we were able to extract the models from 66% of them via unsophisticated dynamic analysis techniques. The extracted models are mostly commercial products and used for face recognition, liveness detection, ID/bank card recognition, and malware detection. We quantitatively estimated the potential financial impact of a leaked model, which can amount to millions of dollars for different stakeholders.

Our study reveals that on-device models are currently at high risk of being leaked; attackers are highly motivated to steal such models. Drawn from our large-scale study, we report our insights into this emerging security problem and discuss the technical challenges, hoping to inspire future research on robust and practical model protection for mobile devices.

We study how efficiently a $k$-element set $S\subseteq[n]$ can be learned from a uniform superposition $|S\rangle$ of its elements. One can think of $|S\rangle=\sum_{i\in S}|i\rangle/\sqrt{|S|}$ as the quantum version of a uniformly random sample over $S$, as in the classical analysis of the "coupon collector problem." We show that if $k$ is close to $n$, then we can learn $S$ using asymptotically fewer quantum samples than random samples. In particular, if there are $n-k=O(1)$ missing elements then $O(k)$ copies of $|S\rangle$ suffice, in contrast to the $\Theta(k\log k)$ random samples needed by a classical coupon collector. On the other hand, if $n-k=\Omega(k)$, then $\Omega(k\log k)$ quantum samples are necessary.

More generally, we give tight bounds on the number of quantum samples needed for every $k$ and $n$, and we give efficient quantum learning algorithms. We also give tight bounds in the model where we can additionally reflect through $|S\rangle$. Finally, we relate coupon collection to a known example separating proper and improper PAC learning that turns out to show no separation in the quantum case.
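The classical $\Theta(k\log k)$ baseline referenced above is easy to check empirically (a simulation of the classical coupon collector only; the quantum bounds are the paper's contribution):

```python
import random

def coupon_collector_draws(k, rng):
    # draw uniform samples from {0, ..., k-1} until every element has appeared;
    # the expected number of draws is k * H_k = k ln k + O(k)
    seen, draws = set(), 0
    while len(seen) < k:
        seen.add(rng.randrange(k))
        draws += 1
    return draws
```

Averaged over many trials, the draw count concentrates around $k H_k$, the harmonic-number bound behind the $\Theta(k\log k)$ classical sample complexity.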

Current mobile augmented reality devices are often equipped with range sensors. The Microsoft HoloLens, for instance, is equipped with a Time-Of-Flight (ToF) range camera providing coarse triangle meshes that can be used in custom applications. We suggest using these triangle meshes for the automatic generation of indoor models that can serve as a basis for augmenting their physical counterpart with location-dependent information. In this paper, we present a novel voxel-based approach for automated indoor reconstruction from unstructured three-dimensional geometries like triangle meshes. After an initial voxelization of the input data, rooms are detected in the resulting voxel grid by segmenting connected voxel components of ceiling candidates and extruding them downwards to find floor candidates. Semantic class labels like 'Wall', 'Wall Opening', 'Interior Object' and 'Empty Interior' are then assigned to the room voxels in between ceiling and floor by a rule-based voxel sweep algorithm. Finally, the geometry of the detected walls and their openings is refined in voxel representation. The proposed approach is not restricted to Manhattan World scenarios and does not rely on room surfaces being planar.

We consider cache-aided wireless communication scenarios where each user requests both a file from an a-priori generated cacheable library (referred to as 'content'), and an uncacheable 'non-content' message generated at the start of the wireless transmission session. This scenario is easily found in real-world wireless networks, where the two types of traffic coexist and share limited radio resources. We focus on single-transmitter, single-antenna wireless networks with cache-aided receivers, where the wireless channel is modelled by a degraded Gaussian broadcast channel (GBC). For this setting, we study the delay-rate trade-off, which characterizes the content delivery time and non-content communication rates that can be achieved simultaneously. We propose a scheme based on the separation principle, which isolates the coded caching and multicasting problem from the physical layer transmission problem. We show that this separation-based scheme is sufficient for achieving an information-theoretically order optimal performance, up to a multiplicative factor of 2.01 for the content delivery time, when working in the generalized degrees of freedom (GDoF) limit. We further show that the achievable performance is near-optimal after relaxing the GDoF limit, up to an additional additive factor of 2 bits per dimension for the non-content rates. A key insight emerging from our scheme is that in some scenarios considerable amounts of non-content traffic can be communicated while maintaining the minimum content delivery time, achieved in the absence of non-content messages; compliments of 'topological holes' arising from asymmetries in wireless channel gains.

Computing cohesive subgraphs is a central problem in graph theory. While many formulations of cohesive subgraphs lead to NP-hard problems, finding a densest subgraph can be done in polynomial time. As such, the densest subgraph model has emerged as the most popular notion of cohesiveness. Recently, the data mining community has started looking into the problem of computing k densest subgraphs in a given graph, rather than one, with various restrictions on the possible overlap between the subgraphs. However, there seems to be very little known on this important and natural generalization from a theoretical perspective. In this paper we hope to remedy this situation by analyzing three natural variants of the k densest subgraphs problem. Each variant differs in the amount of overlap that is allowed between the subgraphs. In one extreme, when no overlap is allowed, we prove that the problem is NP-hard for k >= 3, but polynomial-time solvable for k <= 2. On the other extreme, when overlap is allowed without any restrictions and the solution subgraphs only have to be distinct, we show that the problem is fixed-parameter tractable with respect to k, and admits a PTAS for constant k. Finally, when a limited amount of overlap is allowed between the subgraphs, we prove that the problem is NP-hard for k = 2.
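For context, the polynomial-time solvability of the single densest subgraph underlies all three variants; even Charikar's simple greedy peeling gives a 1/2-approximation (background sketch, not one of the paper's algorithms):

```python
from collections import defaultdict

def peel_densest_subgraph(edges):
    # Charikar's peeling: repeatedly delete a minimum-degree vertex and keep
    # the intermediate subgraph of highest density |E|/|V| (a 1/2-approximation)
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    verts, m = set(adj), len(edges)
    best, best_density = set(verts), m / len(verts)
    while len(verts) > 1:
        v = min(verts, key=lambda x: len(adj[x]))  # O(n) scan; a heap makes it near-linear
        for u in adj[v]:
            adj[u].discard(v)
        m -= len(adj[v])
        del adj[v]
        verts.discard(v)
        density = m / len(verts)
        if density > best_density:
            best, best_density = set(verts), density
    return best, best_density
```

An exact optimum needs a flow-based or LP formulation; the peeling heuristic is what the data mining community typically scales to large graphs.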

An important problem in multiview representation learning is finding the optimal combination of views with respect to the specific task at hand. To this end, we introduce NAM: a Neural Attentive Multiview machine that learns multiview item representations and similarity by employing a novel attention mechanism. NAM harnesses multiple information sources and automatically quantifies their relevancy with respect to a supervised task. Finally, a very practical advantage of NAM is its robustness to datasets with missing views. We demonstrate the effectiveness of NAM for the task of movie and app recommendations. Our evaluations indicate that NAM outperforms single-view models as well as alternative multiview methods on item recommendation tasks, including cold-start scenarios.

A cycle $C$ of a graph $G$ is \emph{isolating} if every component of $G-V(C)$ is a single vertex. We show that isolating cycles in polyhedral graphs can be extended to larger ones: every isolating cycle $C$ of length $8 \leq |E(C)| < \frac{2}{3}(|V(G)|+3)$ implies an isolating cycle $C'$ of larger length that contains $V(C)$. By "hopping" iteratively to such larger cycles, we obtain a powerful and very general inductive motor for proving and computing long cycles (we will give an algorithm with running time $O(n^2)$). This provides a method to prove lower bounds on Tutte cycles, as $C'$ will be a Tutte cycle of $G$ if $C$ is. We also prove that $|E(C')| \leq |E(C)|+3$ if $G$ does not contain faces of size five, which gives a new tool for proving results about cycle spectra and evidence that these face sizes obstruct long cycles. As a sample application, we test our motor on a conjecture on essentially 4-connected graphs. A planar graph is \emph{essentially $4$-connected} if it is 3-connected and each of its 3-separators is the neighborhood of a single vertex. Essentially $4$-connected graphs have been thoroughly investigated throughout the literature as the subject of Hamiltonicity studies. Jackson and Wormald proved that every essentially 4-connected planar graph $G$ on $n$ vertices contains a cycle of length at least $\frac{2}{5}(n+2)$, and this result has recently been improved multiple times, culminating in the lower bound $\frac{5}{8}(n+2)$. However, the best known upper bound is given by an infinite family of such graphs in which every graph $G$ on $n$ vertices has no cycle longer than $\frac{2}{3}(n+4)$; this upper bound is still unmatched. Using isolating cycles, we improve the lower bound to match the upper one (up to a summand $+1$). This settles the long-standing open problem of determining the circumference of essentially 4-connected planar graphs.

In this study, we propose a differentiable layer for OFDM-based autoencoders (OFDM-AEs) to avoid high instantaneous power without regularizing the cost function used during the training. The proposed approach relies on the manipulation of the parameters of a set of functions that yield complementary sequences (CSs) through a deep neural network (DNN). We guarantee the peak-to-average-power ratio (PAPR) of each OFDM-AE symbol to be less than or equal to 3 dB. We also show how to normalize the mean power by using the functions in addition to PAPR. The introduced layer admits auxiliary parameters that allow one to control the amplitude and phase deviations in the frequency domain. Numerical results show that DNNs at the transmitter and receiver can achieve reliable communications under this protection layer at the expense of complexity.
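The complementary-sequence property that caps PAPR at 3 dB can be checked numerically (a sketch using the standard Golay-Rudin-Shapiro construction, not the paper's DNN parameterization):

```python
import numpy as np

def golay_pair(m):
    # Golay-Rudin-Shapiro recursion: builds a length-2^m complementary pair
    a, b = np.array([1.0]), np.array([1.0])
    for _ in range(m):
        a, b = np.concatenate([a, b]), np.concatenate([a, -b])
    return a, b

def papr_db(symbols, oversample=4):
    # PAPR of the OFDM time-domain signal carrying `symbols` on its subcarriers;
    # zero-padding the IFFT approximates the continuous-time peak from below
    n = len(symbols)
    x = np.fft.ifft(np.concatenate([symbols, np.zeros((oversample - 1) * n)]))
    p = np.abs(x) ** 2
    return 10 * np.log10(p.max() / p.mean())
```

For a complementary pair $(a, b)$, the power spectra satisfy $|A(\omega)|^2 + |B(\omega)|^2 = 2n$ for all $\omega$, so each sequence's instantaneous power never exceeds twice its average, i.e. PAPR $\leq$ 3.01 dB.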

The application of machine learning technologies, especially deep learning, to medical image segmentation is being widely studied because of its state-of-the-art performance. It can be a key step in providing a reliable basis for clinical diagnosis, such as the 3D reconstruction of human tissues, image-guided interventions, and image analysis and visualization. In this review article, deep-learning-based methods for ultrasound image segmentation are first categorized into six main groups according to their architectures and training. Second, for each group, several current representative algorithms are selected, introduced, analyzed and summarized in detail. In addition, common evaluation methods for image segmentation and ultrasound image segmentation datasets are summarized, and the performance of the current methods and their evaluations are reviewed. Finally, the challenges and potential research directions for medical ultrasound image segmentation are discussed.

In this work we introduce a new bounding-box free network (BBFNet) for panoptic segmentation. Panoptic segmentation is an ideal problem for a bounding-box free approach as it already requires per-pixel semantic class labels. We use this observation to exploit class boundaries from an off-the-shelf semantic segmentation network and refine them to predict instance labels. Towards this goal BBFNet predicts coarse watershed levels and uses them to detect large instance candidates whose boundaries are well defined. For smaller instances, whose boundaries are less reliable, BBFNet also predicts instance centers by means of Hough voting followed by mean-shift to reliably detect small objects. A novel triplet loss network helps merge fragmented instances while refining boundary pixels. Our approach is distinct from previous works in panoptic segmentation that rely on a combination of a semantic segmentation network with a computationally costly instance segmentation network based on bounding boxes, such as Mask R-CNN, to guide the prediction of instance labels using a Mixture-of-Experts (MoE) approach. We benchmark our non-MoE method on the Cityscapes and Microsoft COCO datasets and show competitive performance with MoE-based approaches while outperforming existing non-proposal based approaches. We achieve this while being computationally more efficient in terms of the number of parameters and FLOPs.

The story of information processing is a story of great success. Today's microprocessors are devices of unprecedented complexity, and the MOSFET transistor is considered the most widely produced artifact in the history of mankind. The miniaturization of electronic circuits has been pushed almost to the physical limit and begins to suffer from various parasitic effects. These facts stimulate intense research on neuromimetic devices. This feature article is devoted to various in materio implementations of neuromimetic processes, including neuronal dynamics, synaptic plasticity, and higher-level signal and information processing, along with more sophisticated implementations, including signal processing, speech recognition and data security. Due to the vast number of papers in the field, only a subjective selection of topics is presented in this review.

Online machine learning (OML) algorithms do not need a training phase and can be deployed directly in an unknown environment. OML includes multi-armed bandit (MAB) algorithms that can identify the best arm among several arms by achieving a balance between exploration of all arms and exploitation of the optimal arm. The Kullback-Leibler divergence based upper confidence bound (KLUCB) is the state-of-the-art MAB algorithm that optimizes the exploration-exploitation trade-off, but it is complex due to the underlying optimization routine. This limits its usefulness for robotics and radio applications, which demand integration of KLUCB with the PHY on a system on chip (SoC). In this paper, we efficiently map the KLUCB algorithm onto an SoC by realizing the optimization routine via alternative synthesizable computation without compromising performance. The proposed architecture is dynamically reconfigurable such that the number of arms, as well as the type of algorithm, can be changed on the fly. Specifically, after initial learning, an on-the-fly switch to the lightweight UCB offers around a 10-fold improvement in latency and throughput. Since the learning duration depends on the unknown arm statistics, we embed intelligence in the architecture to decide the switching instant. We validate the functional correctness and usefulness of the proposed architecture via a realistic wireless application, and a detailed complexity analysis demonstrates its feasibility for realizing intelligent radios.
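For Bernoulli rewards, the KLUCB index of an arm is the largest mean $q$ still compatible with the arm's empirical mean under a KL-divergence budget; it has no closed form and is typically found by bisection, which is precisely the optimization routine a hardware mapping must replace with synthesizable computation. An illustrative sketch of the standard algorithm (the usual $c \log\log t$ refinement of the budget is omitted; this is not the paper's hardware realization):

```python
import math

def kl_bernoulli(p, q, eps=1e-12):
    # KL divergence between Bernoulli(p) and Bernoulli(q)
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def klucb_index(p_hat, pulls, t, iters=32):
    # largest q >= p_hat with pulls * kl(p_hat, q) <= log(t);
    # kl(p_hat, .) is increasing on [p_hat, 1], so bisection converges
    budget = math.log(max(t, 2)) / pulls
    lo, hi = p_hat, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if kl_bernoulli(p_hat, mid) <= budget:
            lo = mid
        else:
            hi = mid
    return lo
```

The index shrinks toward the empirical mean as an arm is pulled more often, and grows slowly with the time horizon — the exploration-exploitation balance mentioned above.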

Relation extraction from simple questions aims to capture the relation of a factoid question with one underlying relation from a set of predefined ones in a knowledge base. Most recent methods take advantage of neural networks to match a question against all relations in order to find the one best expressed by that question. In this paper, we propose an instance-based method that finds questions similar to a new question, in the sense of their relations, to predict its mentioned relation. The motivation is rooted in the fact that a relation can be expressed by different forms of a question, and these forms mostly share similar terms or concepts. Our experiments on the SimpleQuestions dataset show that the proposed model achieves better accuracy than state-of-the-art relation extraction models.

Automating molecular design using deep reinforcement learning (RL) holds the promise of accelerating the discovery of new chemical compounds. A limitation of existing approaches is that they work with molecular graphs and thus ignore the location of atoms in space, which restricts them to 1) generating single organic molecules and 2) heuristic reward functions. To address this, we present a novel RL formulation for molecular design in Cartesian coordinates, thereby extending the class of molecules that can be built. Our reward function is directly based on fundamental physical properties such as the energy, which we approximate via fast quantum-chemical methods. To enable progress towards de-novo molecular design, we introduce MolGym, an RL environment comprising several challenging molecular design tasks along with baselines. In our experiments, we show that our agent can efficiently learn to solve these tasks from scratch by working in a translation and rotation invariant state-action space.

In this paper we study a constraint-based representation of neural network architectures. We cast the learning problem in the Lagrangian framework and we investigate a simple optimization procedure that is well suited to fulfil the so-called architectural constraints, learning from the available supervision. The computational structure of the proposed Local Propagation (LP) algorithm is based on the search for saddle points in the adjoint space composed of weights, neural outputs, and Lagrange multipliers. All the updates of the model variables are performed locally, so that LP is fully parallelizable over the neural units, circumventing the classic problem of gradient vanishing in deep networks. The implementation of popular neural models is described in the context of LP, together with the conditions that trace a natural connection with Backpropagation. We also investigate the setting in which we tolerate bounded violations of the architectural constraints, and we provide experimental evidence that LP is a feasible approach to train shallow and deep networks, opening the way to further investigations on more complex architectures, easily describable by constraints.

The increase in data traffic on the internet has significantly increased the relevance of data and image encryption. Among the techniques most used in cryptography, chaotic systems have received great attention due to their easy implementation. However, it has recently been observed that these systems can lose their chaotic properties due to the finite precision of computers. In this work, we investigate flexible computing tools, particularly interval analysis, to reduce this problem. We opted for the Lorenz system, as it is one of the few systems whose chaoticity is proven analytically. The results of this study, based on correlation and entropy indices, were superior to other studies published in the recent literature.

Vector fields and line fields, their counterparts without orientations on tangent lines, are familiar objects in the theory of dynamical systems. Among the techniques used in their study, the Morse--Smale decomposition of a (generic) field plays a fundamental role, relating the geometric structure of phase space to a combinatorial object consisting of critical points and separatrices. Such concepts led Forman to a satisfactory theory of discrete vector fields, in close analogy to the continuous case. In this paper, we introduce discrete line fields. Again, our definition is rich enough to provide the counterparts of the basic results in the theory of continuous line fields: an Euler--Poincar\'e formula, a Morse--Smale decomposition and a topologically consistent cancellation of critical elements, which allows for topological simplification of the original discrete line field.

We present a study on the efficacy of adversarial training on transformer neural network models, with respect to the task of detecting check-worthy claims. In this work, we introduce the first adversarially-regularized, transformer-based claim spotter model that achieves state-of-the-art results on multiple challenging benchmarks. We obtain a 4.31 point F1-score improvement and a 1.09 point mAP score improvement over current state-of-the-art models on the ClaimBuster Dataset and CLEF2019 Dataset, respectively. In the process, we propose a method to apply adversarial training to transformer models, which has the potential to be generalized to many similar text classification tasks. Along with our results, we are releasing our codebase and manually labeled datasets. We also showcase our models' real-world usage via a live public API.

We consider the rooted orienteering problem in Euclidean space: Given $n$ points $P$ in $\mathbb R^d$, a root point $s\in P$ and a budget $\mathcal B>0$, find a path that starts from $s$, has total length at most $\mathcal B$, and visits as many points of $P$ as possible. This problem is known to be NP-hard, hence we study $(1-\delta)$-approximation algorithms. The previous Polynomial-Time Approximation Scheme (PTAS) for this problem, due to Chen and Har-Peled (2008), runs in time $n^{O(d\sqrt{d}/\delta)}(\log n)^{(d/\delta)^{O(d)}}$, and improving on this time bound was left as an open problem. Our main contribution is a PTAS with a significantly improved time complexity of $n^{O(1/\delta)}(\log n)^{(d/\delta)^{O(d)}}$.

A known technique for approximating the orienteering problem is to reduce it to solving $1/\delta$ correlated instances of rooted $k$-TSP (a $k$-TSP tour is one that visits at least $k$ points). However, the $k$-TSP tours in this reduction must achieve a certain excess guarantee (namely, their length can surpass the optimum length only in proportion to a parameter of the optimum called excess) that is stronger than the usual $(1+\delta)$-approximation. Our main technical contribution is to improve the running time of these $k$-TSP variants, particularly in its dependence on the dimension $d$. Indeed, our running time is polynomial even for a moderately large dimension, roughly up to $d=O(\log\log n)$ instead of $d=O(1)$.

We develop a generic data-driven method for estimator selection in off-policy policy evaluation settings. We establish a strong performance guarantee for the method, showing that it is competitive with the oracle estimator, up to a constant factor. Via in-depth case studies in contextual bandits and reinforcement learning, we demonstrate the generality and applicability of the method. We also perform comprehensive experiments, demonstrating the empirical efficacy of our approach and comparing with related approaches. In both case studies, our method compares favorably with existing methods.

Nowadays, technology is being adopted in every aspect of our lives, and it is one of the most important transformation drivers in industry. Moreover, many of the systems and digital services that we use daily rely on artificial intelligence technology capable of modeling social or individual behaviors, which in turn also modifies personal decisions and actions. In this paper, we briefly discuss, from a technological perspective, a number of critical issues, including the purpose of promoting trust and ensuring social benefit through the proper use of Artificial Intelligence Systems. To achieve this goal, we propose a generic ethical technological framework as a first attempt to define a common context for developing truly ethical-by-design engineering. We hope that this initial proposal will be useful for early adopters and especially for standardization teams.

We turn the definition of individual fairness on its head---rather than ascertaining the fairness of a model given a predetermined metric, we find a metric for a given model that satisfies individual fairness. This can facilitate the discussion on the fairness of a model, addressing the issue that it may be difficult to specify a priori a suitable metric. Our contributions are twofold: First, we introduce the definition of a minimal metric and characterize the behavior of models in terms of minimal metrics. Second, for more complicated models, we apply the mechanism of randomized smoothing from adversarial robustness to make them individually fair under a given weighted $L^p$ metric. Our experiments show that adapting the minimal metrics of linear models to more complicated neural networks can lead to meaningful and interpretable fairness guarantees at little cost to utility.

We study financial networks with debt contracts and credit default swaps between specific pairs of banks. Given such a financial system, we want to decide which of the banks are in default, and how much of their liabilities these defaulting banks can pay. There can easily be multiple different solutions to this problem, leading to a situation of default ambiguity and a range of possible solutions to implement for a financial authority.

In this paper, we study the general properties of the solution space of such financial systems, and analyze a wide range of reasonable objective functions for selecting from the set of solutions. Examples of such objective functions include minimizing the number of defaulting banks, minimizing the amount of unpaid debt, maximizing the number of satisfied banks, maximizing the equity of a specific bank, and finding the most balanced distribution of equity, among others. We show that for all of these objective functions, it is not only NP-hard to find the optimal solution, but also NP-hard to approximate this optimum: for each objective function, we show inapproximability to within either an $n^{1/2-\epsilon}$ or an $n^{1/4-\epsilon}$ factor for any $\epsilon>0$, with $n$ denoting the number of banks in the system. Thus even if an authority has clear criteria for selecting a solution in case of default ambiguity, it is computationally intractable to find a solution that is reasonably good in terms of these criteria. We also show that our hardness results hold in a wide range of model variants.

We consider N-fold integer programming problems. After a decade of continuous progress, the currently fastest algorithm for N-fold integer programming, by Jansen et al. (2019), has a running time of $(rs\Delta)^{O(r^2s + s^2)} {\phi}^2 \cdot nt \log^{O(1)}(nt)$. Here ${\phi}$ is the largest binary encoding length of a number in the input. This algorithm, like its predecessors, is based on the augmentation framework, a tailored integer programming variant of local search. In this paper we propose a different approach that is not based on augmentation. Our algorithm instead relies on a stronger LP-relaxation of the N-fold integer program. This relaxation can be solved in polynomial time with parameter dependence $(s{\Delta})^{O(s^2)}$ by resorting to standard techniques from convex optimization. We show that, for any given optimal vertex solution $x^*$ of this relaxation, there exists an optimal integer solution $z^*$ within short $\ell_1$-distance, namely $\|x^* - z^*\|_{1} \leq (rs\Delta)^{O(rs)}$. With dynamic programming one can then find an optimal integer solution of the N-fold IP in time $(rs\Delta)^{O(r^2s + s^2)} \,nt$. This, together with an off-the-shelf method from convex optimization, results in the currently fastest algorithm for N-fold integer programming.
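For reference, the parameters above refer to the standard N-fold block structure (our notation; the uniform case is shown): the constraint matrix consists of $n$ copies of an $r \times t$ block $A$ across the top and an $s \times t$ block $B$ repeated along the diagonal, with all entries bounded by $\Delta$ in absolute value:

```latex
\min\Big\{\, c^{\mathsf T} x \;:\; \mathcal A x = b,\; \ell \le x \le u,\; x \in \mathbb{Z}^{nt} \,\Big\},
\qquad
\mathcal A =
\begin{pmatrix}
A & A & \cdots & A \\
B & 0 & \cdots & 0 \\
0 & B & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & B
\end{pmatrix}.
```

The diagonal structure is what makes the dynamic program over the $n$ bricks possible once the $\ell_1$-proximity bound confines the integer optimum near the fractional vertex.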

We introduce a very natural generalization of the well-known problem of simultaneous congruences. Instead of searching for a positive integer $s$ that is specified by $n$ fixed remainders modulo integer divisors $a_1,\dots,a_n$ we consider remainder intervals $R_1,\dots,R_n$ such that $s$ is feasible if and only if $s$ is congruent to $r_i$ modulo $a_i$ for some remainder $r_i$ in interval $R_i$ for all $i$.

This problem is a special case of a 2-stage integer program with only two variables per constraint, which is closely related to directed Diophantine approximation as well as the mixing set problem. We give a hardness result showing that the problem is NP-hard in general.

Motivated by the study of the mixing set problem and a recent result in the field of real-time systems, we investigate the case of harmonic divisors, i.e., $a_{i+1}/a_i$ is an integer for all $i<n$. We present an algorithm to decide the feasibility of an instance in time $\mathcal{O}(n^2)$, and we show that even the smallest feasible solution can be computed in strongly polynomial time $\mathcal{O}(n^3)$.
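For concreteness, the feasibility notion can be stated as a brute-force reference implementation that scans one full period. This is exponential in the input size and only illustrates the definition — it is not the $\mathcal{O}(n^2)$/$\mathcal{O}(n^3)$ algorithms of the paper — and intervals are taken as non-wrapping pairs $(lo_i, hi_i)$, an assumption of this sketch:

```python
from math import lcm  # Python >= 3.9

def feasible(s, divisors, intervals):
    # s is feasible iff s mod a_i lies in [lo_i, hi_i] for every i
    return all(lo <= s % a <= hi for a, (lo, hi) in zip(divisors, intervals))

def smallest_feasible(divisors, intervals):
    # brute-force reference: the feasible set is periodic with period
    # lcm(a_1, ..., a_n), so scanning one period decides feasibility
    period = lcm(*divisors)
    for s in range(1, period + 1):
        if feasible(s, divisors, intervals):
            return s
    return None
```

For harmonic divisors such as $2, 4, 8$, each interval progressively narrows the residue class; e.g. with intervals $(1,1), (1,2), (3,5)$ the smallest feasible $s$ is 5.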

The Interplanetary Filesystem (IPFS) is a distributed data storage service frequently used by blockchain applications and for sharing content in a censorship-resistant manner. Data is distributed within an open set of peers using a Kademlia-based distributed hash table (DHT). In this paper, we study the structure of the resulting overlay network, as it significantly influences the robustness and performance of IPFS. We monitor and systematically crawl IPFS' DHT towards mapping the IPFS overlay network. Our measurements found an average of 44474 nodes at any given time. At least 52.19% of these reside behind a NAT and are not reachable from the outside, suggesting that a large share of the network is operated by private individuals on an as-needed basis. Based on our measurements and our analysis of the IPFS code, we conclude that the topology of the IPFS network is, in its current state, closer to an unstructured overlay network than it is to a classical DHT. While such a structure has benefits for robustness and the resistance against Sybil attacks, it leaves room for improvement in terms of performance and query privacy.
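For readers unfamiliar with Kademlia, lookups are driven by the XOR metric over node and content identifiers, and a peer is stored in the k-bucket indexed by the highest bit in which its ID differs from the local ID. A minimal sketch of these two primitives (standard Kademlia, not IPFS-specific code; identifier widths and bucket policies vary by implementation):

```python
def kad_distance(a: int, b: int) -> int:
    # Kademlia's XOR metric over node/content identifiers
    return a ^ b

def bucket_index(local_id: int, peer_id: int) -> int:
    # index of the k-bucket a peer falls into: the position of the
    # highest differing bit (-1 would mean peer_id == local_id)
    return kad_distance(local_id, peer_id).bit_length() - 1
```

Crawling a Kademlia DHT amounts to repeatedly querying peers for the contents of their buckets at increasing distances, which is how the overlay topology above was mapped.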

A shadow stack validates on-stack return addresses and prevents arbitrary code execution vulnerabilities due to malicious returns. Several recent works demonstrate that without shadow stack protection, control-flow integrity -- a related security hardening scheme -- is vulnerable to attacks. These benefits notwithstanding, shadow stacks have not found mass adoption due to the high overheads they impose.

In this work, we re-examine the performance viability of shadow stacks as a binary hardening technique. Our work is inspired by the design principle of separating mechanism and policy. Existing research on shadow stacks focuses on optimizing the implementation of the shadow stack, which is the mechanism. At the policy level, we define Return Address Safety (RA-Safety) to formally capture the impact of memory writes on return addresses. Based on RA-Safety, we design safe function elision and safe path elision, two novel algorithms that optimize the instrumentation policy for shadow stacks. These two algorithms statically identify functions and control-flow paths that will not overwrite any return address, so instrumentation on them can safely be elided. Finally, we complement the above policy improvements with Register frame, Binary function inlining, and Dead register chasing: three new mechanism optimizations.

We evaluate our new shadow stack implementation, ShadowGuard, on SPEC 2017 and show that it reduces the geometric mean overhead from 8% to 2% relative to an unoptimized shadow stack. We also evaluate several hardened server benchmarks, including Apache HTTP Server and Redis; the results show that the above techniques significantly reduce the latency and throughput overheads.

A secure multi-party batch matrix multiplication problem (SMBMM) is considered, where the goal is to allow a master node to efficiently compute the pairwise products of two batches of massive matrices that originate at external source nodes, by distributing the computation across $S$ honest but curious servers. Any group of up to $X$ colluding servers should gain no information about the input matrices, and the master should gain no additional information about the input matrices beyond the product. A solution called Generalized Cross Subspace Alignment codes with Noise Alignment (GCSA-NA for short) is proposed in this work, based on cross-subspace alignment codes. These codes originated in secure private information retrieval, and have recently been applied to distributed batch computation problems, where they generalize and improve upon state-of-the-art schemes such as Entangled Polynomial Codes and Lagrange Coded Computing. The prior state-of-the-art solution to SMBMM is a coding scheme called polynomial sharing (PS) that was proposed by Nodehi and Maddah-Ali. GCSA-NA outperforms PS codes in several key aspects -- more efficient and secure inter-server communication (which can entirely take place beforehand, i.e., even before the input matrices are determined), flexible inter-server network topology, efficient batch processing, and tolerance to stragglers. The idea of noise alignment can also be applied to construct schemes for $N$ sources based on $N$-CSA codes, and to construct schemes for symmetric secure private information retrieval that achieve the asymptotic capacity.

The efficiency of a spatial DNN accelerator depends heavily on the compiler's ability to generate optimized mappings of a given DNN's operators (layers) onto the accelerator's compute and memory resources. Searching for the optimal mapping is challenging because of the massive space of possible data layouts and loop transformations for the DNN layers. For example, there are over $10^{19}$ valid mappings for a single convolution layer on average when mapping ResNet50 and MobileNetV2 onto a representative DNN edge accelerator. This challenge is exacerbated by new layer types (e.g., depth-wise and point-wise convolutions) and diverse hardware accelerator configurations. To address this challenge, we propose a decoupled off-chip/on-chip approach that decomposes the mapping space into off-chip and on-chip subspaces, and first optimizes the off-chip subspace followed by the on-chip subspace. The motivation for this decomposition is to dramatically reduce the size of the search space, and to prioritize the optimization of off-chip data movement, which is 2-3 orders of magnitude larger than the on-chip data movement. We introduce {\em Marvel}, which implements the above approach by leveraging two cost models to explore the two subspaces -- a classical distinct-block (DB) locality cost model for the off-chip subspace, and a state-of-the-art DNN accelerator behavioral cost model, MAESTRO, for the on-chip subspace. Our approach also considers dimension permutation, a form of data layout, in the mapping space formulation along with the loop transformations.

This paper considers a convolutional neural network transformation that reduces computation complexity and thus speeds up neural network processing. The use of convolutional neural networks (CNNs) is the standard approach to image recognition, despite the fact that they can be too computationally demanding, for example for recognition on mobile platforms or in embedded systems. In this paper we propose a CNN structure transformation that expresses 2D convolution filters as a linear combination of separable filters. It allows separated convolutional filters to be obtained by standard training algorithms. We study the computational efficiency of this structure transformation and suggest a fast implementation easily handled by a CPU or GPU. We demonstrate that CNNs of the proposed structure designed for letter and digit recognition show a 15% speedup without accuracy loss in an industrial image recognition system. In conclusion, we discuss the question of possible accuracy decrease and the application of the proposed transformation to different recognition problems. Keywords: convolutional neural networks, computational optimization, separable filters, complexity reduction.
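The decomposition behind such a transformation can be illustrated with the SVD: any $k \times k$ filter equals a sum of $k$ rank-1 (separable) terms, and a rank-$r$ truncation replaces one $k^2$-multiply convolution per pixel with $2kr$ multiplies (a column pass and a row pass per term). A sketch of the decomposition itself — illustrative only, since the paper obtains the separated filters by training rather than by SVD:

```python
import numpy as np

def separable_terms(kernel, rank):
    # kernel = sum_i s_i * u_i v_i^T; each outer-product term is a
    # separable filter (one vertical 1D pass, then one horizontal 1D pass)
    u, s, vt = np.linalg.svd(kernel)
    return [(np.sqrt(s[i]) * u[:, i], np.sqrt(s[i]) * vt[i])
            for i in range(rank)]

# the Sobel kernel is exactly separable: [1, 2, 1]^T outer [1, 0, -1]
sobel = np.array([[1.0, 0.0, -1.0],
                  [2.0, 0.0, -2.0],
                  [1.0, 0.0, -1.0]])
(col, row), = separable_terms(sobel, 1)
rank1 = np.outer(col, row)  # reconstructs the Sobel kernel exactly
```

Filters learned by training are rarely exactly low-rank, which is why learning in the separated parameterization from the start avoids the approximation error of post-hoc truncation.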

We propose a hierarchical correlation clustering method that extends the well-known correlation clustering to produce hierarchical clusters. We then investigate embedding the resulting hierarchy for (tree-preserving) embedding and feature extraction. We study the connection of such an embedding to single-linkage embedding and minimax distances, and in particular study minimax distances for correlation clustering. Finally, we demonstrate the performance of our methods on several UCI and 20 Newsgroups datasets.
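The minimax distances mentioned above can be computed for a finite dataset with a Floyd-Warshall-style recurrence in which the cost of a path is its largest edge, minimized over all paths (equivalently, the maximum edge on the minimum-spanning-tree path between the two points). An illustrative $O(n^3)$ sketch, not the paper's method:

```python
def minimax_distances(dist):
    # all-pairs minimax path distances: relax with max instead of +,
    # i.e. m[i][j] = min over k of max(m[i][k], m[k][j])
    n = len(dist)
    m = [row[:] for row in dist]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                m[i][j] = min(m[i][j], max(m[i][k], m[k][j]))
    return m

# four points on a line at 0, 1, 2, 10: the chain keeps hops small
xs = [0.0, 1.0, 2.0, 10.0]
dist = [[abs(p - q) for q in xs] for p in xs]
m = minimax_distances(dist)  # m[0][3] follows the chain; its largest hop is 8
```

Elongated clusters become tight under this metric (consecutive hops are short even when endpoints are far apart), which is what makes it useful for the embeddings discussed above.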

Random access schemes in modern wireless communications are generally based on framed ALOHA (f-ALOHA), which can be optimized by flexibly organizing devices' transmissions and re-transmissions. However, this optimization is generally intractable due to the lack of information about complex traffic generation statistics and the occurrence of random collisions. In this article, we first summarize the general structure of access control optimization for different random access schemes, and then review existing access control optimization based on Machine Learning (ML) and non-ML techniques. We demonstrate that ML-based methods can optimize the access control problem better than non-ML based methods, owing to their capability to solve high-complexity long-term optimization problems and to learn experiential knowledge from reality. To further improve random access performance, we propose two-step learning optimizers for access control optimization, which separately execute traffic prediction and access control configuration. In detail, our traffic prediction method relies on online supervised learning with Recurrent Neural Networks (RNNs) that can accurately capture traffic statistics over consecutive frames, and the access control configuration can use either a non-ML based controller or a cooperatively trained Deep Reinforcement Learning (DRL) based controller, depending on the complexity of the random access scheme. Numerical results show that the proposed two-step cooperative learning optimizer considerably outperforms the conventional Deep Q-Network (DQN) in terms of training efficiency and access performance.

We develop an FPT algorithm and a kernel for the Weighted Edge Clique Partition (WECP) problem, where a graph with $n$ vertices and integer edge weights is given together with an integer $k$, and the aim is to find $k$ cliques, such that every edge appears in exactly as many cliques as its weight. The problem has been previously only studied in the unweighted version called Edge Clique Partition (ECP), where the edges need to be partitioned into $k$ cliques. It was shown that ECP admits a kernel with $k^2$ vertices [Mujuni and Rosamond, 2008], but this kernel does not extend to WECP. The previously fastest algorithm known for ECP had a runtime of $2^{O(k^2)}n^{O(1)}$ [Issac, 2019]. For WECP we develop a bi-kernel with $4^k$ vertices, and an algorithm with runtime $2^{O(k^{3/2}w^{1/2}\log(k/w))}n^{O(1)}$, where $w$ is the maximum edge weight. The latter in particular improves the runtime for ECP to~$2^{O(k^{3/2}\log k)}n^{O(1)}$. We also show that our algorithm necessarily needs a runtime of $2^{\Theta(k^{3/2}\log k)}n^{O(1)}$ to solve ECP.

In this paper we define a new puzzle called Proof-of-Interaction, and we show how it can replace the Proof-of-Work algorithm in the Bitcoin protocol.

Representative sampling appears rare in software engineering research. Not all studies need representative samples, but a general lack of representative sampling undermines a scientific field. This study therefore investigates the state of sampling in recent, high-quality software engineering research. The key findings are: (1) random sampling is rare; (2) sophisticated sampling strategies are very rare; (3) sampling, representativeness and randomness do not appear well-understood. To address these problems, the paper synthesizes existing knowledge of sampling into a succinct primer and proposes extensive guidelines for improving the conduct, presentation and evaluation of sampling in software engineering research. It is further recommended that while researchers should strive for more representative samples, disparaging non-probability sampling is generally capricious and particularly misguided for predominantly qualitative research.

Separating high-dimensional data like images into independent latent factors remains an open research problem. Here we develop a method that jointly learns a linear independent component analysis (ICA) model with non-linear bijective feature maps. By combining these two methods, ICA can learn interpretable latent structure for images. For non-square ICA, where we assume the number of sources is less than the dimensionality of the data, we achieve better unsupervised latent factor discovery than flow-based models and linear ICA. This performance scales to large image datasets such as CelebA.

One of the obstacles to abstractive summarization is the presence of multiple potentially correct predictions. Widely used objective functions for supervised learning, such as cross-entropy loss, cannot handle alternative answers effectively; rather, they act as training noise. In this paper, we propose a Semantic Similarity strategy that can take the semantic meaning of generated summaries into account during training. Our training objective includes maximizing a semantic similarity score, computed by an additional layer that estimates the semantic similarity between the generated summary and the reference summary. By leveraging pre-trained language models, our model achieves a new state-of-the-art performance, a ROUGE-L score of 41.5 on the CNN/DM dataset. To complement automatic evaluation, we also conducted a human evaluation, in which our summaries received higher scores than both baseline and reference summaries.

In this article, the authors find evidence that media coverage across 13 online newspapers enhanced the electoral results of the right-wing party Vox in Spain during the general election of November 2019. We consider the mentions of political parties and leaders in these media during the electoral campaign from 1 to 10 November 2019, and find that the visibility or prominence dimension alone is sufficient to establish the evidence.

Security in embedded systems has become a main requirement of modern electronic devices. The demand for low-cost and highly secure cryptographic algorithms is growing in fields such as mobile telecommunications and handheld devices. In this paper, we analyze and evaluate the development of cheap and relatively fast hardware implementations of the KATAN family of block ciphers. KATAN is a family of six hardware-oriented block ciphers. All KATAN ciphers share an 80-bit key and have 32-, 48-, or 64-bit blocks. We use VHDL under Altera Quartus in conjunction with ModelSim to implement and analyze our hardware designs. The developed designs are mapped onto high-performance Field Programmable Gate Arrays. We compare our findings with similar hardware implementations and with C software versions of the algorithms. The performance analysis of the C implementations is done using Intel VTune Amplifier running on a Dell Precision T7500 with its dual quad-core Xeon processor and 24 GB of RAM. The obtained results show better performance when compared with existing hardware and software implementations.

We present ConSORT, a type system for safety verification in the presence of mutability and aliasing. Mutability requires strong updates to model changing invariants during program execution, but aliasing between pointers makes it difficult to determine which invariants must be updated in response to mutation. Our type system addresses this difficulty with a novel combination of refinement types and fractional ownership types. Fractional ownership types provide flow-sensitive and precise aliasing information for reference variables. ConSORT interprets this ownership information to soundly handle strong updates of potentially aliased references. We have proved ConSORT sound and implemented a prototype, fully automated inference tool. We evaluated our tool and found it verifies non-trivial programs including data structure implementations.

Neural networks and tree ensembles are state-of-the-art learners, each with its unique statistical and computational advantages. We aim to combine these advantages by introducing a new layer for neural networks, composed of an ensemble of differentiable decision trees (a.k.a. soft trees). While differentiable trees demonstrate promising results in the literature, in practice they are typically slow in training and inference as they do not support conditional computation. We mitigate this issue by introducing a new sparse activation function for sample routing, and implement true conditional computation by developing specialized forward and backward propagation algorithms that exploit sparsity. Our efficient algorithms pave the way for jointly training over deep and wide tree ensembles using first-order methods (e.g., SGD). Experiments on 23 classification datasets indicate over 10x speed-ups compared to the differentiable trees used in the literature and over 20x reduction in the number of parameters compared to gradient boosted trees, while maintaining competitive performance. Moreover, experiments on CIFAR, MNIST, and Fashion MNIST indicate that replacing dense layers in CNNs with our tree layer reduces the test loss by 7-53% and the number of parameters by 8x. We provide an open-source TensorFlow implementation with a Keras API.
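The conditional-computation idea above can be illustrated with a small NumPy sketch (inference only, not the paper's TensorFlow implementation): a soft tree routes each sample down every path with some probability, and a saturating "sparse" activation drives many of those path probabilities to exactly zero, so whole subtrees can be skipped. The class name, the particular hard-sigmoid activation, and all shapes below are illustrative assumptions, not the authors' design.

```python
import numpy as np

def sparse_sigmoid(x):
    # A "hard" sigmoid that saturates exactly at 0 and 1, so many
    # root-to-leaf path probabilities become exactly zero.
    return np.clip(0.5 * (x + 1.0), 0.0, 1.0)

class SoftTree:
    """A single soft decision tree of a given depth (forward pass only)."""
    def __init__(self, depth, n_features, n_outputs, rng):
        self.depth = depth
        n_internal = 2 ** depth - 1
        n_leaves = 2 ** depth
        self.W = rng.normal(size=(n_internal, n_features))
        self.b = rng.normal(size=n_internal)
        self.leaf_values = rng.normal(size=(n_leaves, n_outputs))

    def forward(self, X):
        n = X.shape[0]
        probs = np.ones((n, 1))            # probability of reaching each node
        for d in range(self.depth):
            start = 2 ** d - 1             # first internal node of level d
            gates = sparse_sigmoid(X @ self.W[start:start + 2 ** d].T
                                   + self.b[start:start + 2 ** d])
            # each node splits its mass between left (gate) and right (1-gate)
            probs = np.stack([probs * gates, probs * (1 - gates)],
                             axis=2).reshape(n, -1)
        return probs @ self.leaf_values, probs
```

Because the leaf probabilities sum to one and many are exactly zero, a specialized implementation can evaluate only the reachable leaves, which is the source of the reported speed-ups.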

Tree-based Long Short-Term Memory (LSTM) networks have become state-of-the-art for modeling the meaning of language texts, as they can effectively exploit grammatical syntax and thereby the non-linear dependencies among the words of a sentence. However, most of these models cannot recognize the difference in meaning caused by a change in the semantic roles of words or phrases, because they do not account for the type of grammatical relations, also known as typed dependencies, in the sentence structure. This paper proposes an enhanced LSTM architecture, called relation gated LSTM, which can model the relationship between two inputs of a sequence using a control input. We also introduce a Tree-LSTM model called Typed Dependency Tree-LSTM that uses the sentence dependency parse structure as well as the dependency type to embed sentence meaning into a dense vector. The proposed model outperformed its type-unaware counterpart on two typical NLP tasks, Semantic Relatedness Scoring and Sentiment Analysis, in fewer training epochs. The results were comparable or competitive with other state-of-the-art models. Qualitative analysis showed that changes in the voice of sentences had little effect on the model's predicted scores, while changes in nominal (noun) words had a more significant impact. The model recognized subtle semantic relationships in sentence pairs. The magnitudes of the learned typed dependency embeddings were also in agreement with human intuitions. The research findings imply the significance of grammatical relations in sentence modeling. The proposed models would serve as a base for future research in this direction.

This paper investigates a new information reconciliation method for quantum key distribution, in the setting where two parties exchange a key in the presence of a malevolent eavesdropper. We observe that reconciliation is a special case of channel coding and that, consequently, existing channel coding techniques can be adapted for reconciliation. We describe an explicit reconciliation method based on Turbo codes. We believe that the proposed method can improve the efficiency of quantum key distribution protocols based on discrete quantum states.

Motivated by team formation applications, we study discrete optimization problems of the form $\max_{S\in\mathcal{S}}\left(f(S)-w(S)\right)$, where $f:2^{V}\to\mathbb{R}_{+}$ is a non-negative monotone submodular function, $w:2^{V}\to\mathbb{R}_{+}$ is a non-negative linear function, and $\mathcal{S}\subseteq2^{V}$. We give very simple and efficient algorithms for classical constraints, such as cardinality and matroid constraints, that work in a variety of models, including the offline, online, and streaming models. Our algorithms use a very simple scaling approach: we pick an absolute constant $c\geq1$ and optimize the function $f(S)-c\cdot w(S)$ using a black-box application of standard algorithms, such as the classical Greedy algorithm and the single-threshold Greedy algorithm. These algorithms are based on recent works that use (time varying) scaling combined with classical algorithms such as the discrete and continuous Greedy algorithms (Feldman, WADS'19; Harshaw \emph{et al.}, ICML'19).
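The scaling approach can be sketched in a few lines: pick a constant $c \geq 1$ and run the classical Greedy algorithm on the surrogate objective $f(S) - c\cdot w(S)$ under a cardinality constraint. The coverage function and the costs below are a toy instance of our own choosing, not from the paper, and the choice $c=2$ is an arbitrary illustration.

```python
def scaled_greedy(elements, f, w, k, c=2.0):
    """Greedy on the scaled objective f(S) - c*w(S), under |S| <= k.

    f: set -> float (monotone submodular); w: element -> float (linear cost).
    """
    S = set()
    for _ in range(k):
        best, best_gain = None, 0.0
        for e in elements - S:
            gain = f(S | {e}) - f(S) - c * w(e)
            if gain > best_gain:
                best, best_gain = e, gain
        if best is None:      # no element has a positive scaled marginal gain
            break
        S.add(best)
    return S

# Toy coverage instance: f(S) = size of the union of covered items.
cover = {1: {"a", "b"}, 2: {"b", "c"}, 3: {"d"}, 4: set()}
cost = {1: 0.1, 2: 0.1, 3: 0.1, 4: 5.0}
f = lambda S: len(set().union(*(cover[e] for e in S))) if S else 0
S = scaled_greedy(set(cover), f, cost.__getitem__, k=3)
```

On this instance the scaled greedy picks the three cheap, useful elements and skips the expensive useless one.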

The k-means++ algorithm of Arthur and Vassilvitskii (SODA 2007) is a state-of-the-art algorithm for solving the k-means clustering problem and is known to give an O(log k)-approximation in expectation. Recently, Lattanzi and Sohler (ICML 2019) proposed augmenting k-means++ with O(k log log k) local search steps to yield a constant approximation (in expectation) to the k-means clustering problem. In this paper, we improve their analysis to show that, for any arbitrarily small constant $\varepsilon > 0$, with only $\varepsilon k$ additional local search steps, one can achieve a constant approximation guarantee (with high probability in k), resolving an open problem in their paper.
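For reference, the two ingredients discussed above can be sketched as follows: $D^2$-sampling for k-means++ seeding, and a single local search step that samples a candidate center by $D^2$ and keeps the best swap. This is a minimal sketch of the generic framework, not the authors' analysis or their exact procedure.

```python
import numpy as np

def d2_sample(X, centers, rng):
    # Sample a point with probability proportional to its squared distance
    # to the closest current center (the k-means++ "D^2" distribution).
    d2 = np.min(((X[:, None, :] - np.array(centers)[None]) ** 2).sum(-1), axis=1)
    return X[rng.choice(len(X), p=d2 / d2.sum())]

def kmeanspp(X, k, rng):
    centers = [X[rng.integers(len(X))]]
    while len(centers) < k:
        centers.append(d2_sample(X, centers, rng))
    return centers

def cost(X, centers):
    return np.min(((X[:, None, :] - np.array(centers)[None]) ** 2).sum(-1),
                  axis=1).sum()

def local_search_step(X, centers, rng):
    # Sample one candidate by D^2 and try swapping it with each center;
    # keep the cheapest configuration (never worse than the input).
    cand = d2_sample(X, centers, rng)
    best = centers
    for i in range(len(centers)):
        trial = centers[:i] + centers[i + 1:] + [cand]
        if cost(X, trial) < cost(X, best):
            best = trial
    return best
```

By construction each local search step never increases the clustering cost.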

The proliferation of personalized recommendation technologies has raised concerns about discrepancies in their recommendation performance across different genders, age groups, and racial or ethnic populations. This varying degree of performance could impact users' trust in the system and may pose legal and ethical issues in domains where fairness and equity are critical concerns, like job recommendation. In this paper, we investigate several potential factors that could be associated with discriminatory performance of a recommendation algorithm for women versus men. We specifically study several characteristics of user profiles and analyze their possible associations with disparate behavior of the system towards different genders. These characteristics include the anomaly in rating behavior, the entropy of users' profiles, and the users' profile size. Our experimental results on a public dataset using four recommendation algorithms show that, with respect to all three factors, women receive less accurate recommendations than men, indicating the unfair behavior of recommendation algorithms across genders.

Negotiation is a process where agents aim to work through disputes and maximize their surplus. As the use of deep reinforcement learning in bargaining games is unexplored, this paper evaluates its ability to exploit, adapt, and cooperate to produce fair outcomes. Two actor-critic networks were trained for the bidding and acceptance strategy, against time-based agents, behavior-based agents, and through self-play. Gameplay against these agents reveals three key findings. 1) Neural agents learn to exploit time-based agents, achieving clear transitions in decision preference values. The Cauchy distribution emerges as suitable for sampling offers, due to its peaky center and heavy tails. The kurtosis and variance sensitivity of the probability distributions used for continuous control produce trade-offs in exploration and exploitation. 2) Neural agents demonstrate adaptive behavior against different combinations of concession, discount factors, and behavior-based strategies. 3) Most importantly, neural agents learn to cooperate with other behavior-based agents, in certain cases utilizing non-credible threats to force fairer results. This bears similarities to reputation-based strategies in evolutionary dynamics, and departs from the equilibria of classical game theory.

The Fields Medal, often referred to as the Nobel Prize of mathematics, is awarded every four years to no more than four mathematicians under the age of 40. In recent years, its conferral has come under the scrutiny of math historians for rewarding the existing elite rather than fulfilling its original goal of elevating mathematicians from under-represented communities. Prior studies of elitism focus on citational practices and sub-fields; the structural forces that prevent equitable access remain unclear. Here we show the flow of elite mathematicians between countries and lingo-ethnic identities, using network analysis and natural language processing on 240,000 mathematicians and their advisor-advisee relationships. Through analysis of the elite circle formed around Fields Medalists, we find that the Fields Medal helped integrate Japan after WWII. Arabic, African, and East Asian identities remain under-represented at the elite level. Through analysis of inflow and outflow, we rebut the myth that minority communities create their own barriers to entry. Our results demonstrate that concerted efforts by international academic committees, such as prize-giving, are a powerful force for providing equal access. We anticipate that our methodology of academic genealogical analysis can serve as a useful diagnostic for equality within academic fields.

Outlier, or anomaly, detection is essential for the optimal performance of machine learning methods and statistical predictive models. It is not just a technical step in a data cleaning process but a key topic in many fields, such as fraudulent document detection, medical applications and assisted diagnosis systems, or the detection of security threats. In contrast to population-based methods, neighborhood-based local approaches are simple, flexible methods that have the potential to perform well in small-sample, unbalanced problems. However, a main concern of local approaches is the impact that the computation of each sample's neighborhood has on the method's performance. Most approaches use a distance in the feature space to define a single neighborhood, which requires careful selection of several parameters. This work presents a local approach based on a local measure of the heterogeneity of sample labels in the feature space, considered as a topological manifold. The topology is computed using the communities of a weighted graph codifying mutual nearest neighbors in the feature space. This way, we provide a set of multiple neighborhoods able to describe the structure of complex spaces without parameter fine-tuning. Extensive experiments on real-world data sets show that our approach overall outperforms both local and global strategies in multi- and single-view settings.
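The mutual-nearest-neighbor ingredient can be sketched in NumPy: score each sample by the label disagreement among its mutual k-nearest neighbors. This is only the neighborhood-heterogeneity idea; it deliberately omits the graph-community stage that the paper uses to derive multiple neighborhoods, and the function name and scoring rule are our own illustrative choices.

```python
import numpy as np

def mutual_knn_heterogeneity(X, y, k=3):
    """Score each sample by label disagreement among its mutual k-NN.

    Samples whose mutual neighbors mostly carry a different label get a
    high score and are candidate outliers/mislabeled points.
    """
    n = len(X)
    d = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d, np.inf)
    knn = np.argsort(d, axis=1)[:, :k]
    neigh = [set(knn[i]) for i in range(n)]
    scores = np.zeros(n)
    for i in range(n):
        # keep only mutual nearest neighbors (i in j's list and vice versa)
        mutual = [j for j in neigh[i] if i in neigh[j]]
        if mutual:
            scores[i] = np.mean([y[j] != y[i] for j in mutual])
    return scores
```

A mislabeled point inside a homogeneous cluster receives the maximal score, since all of its mutual neighbors disagree with its label.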

Recent interest in self-supervised dense tracking has yielded rapid progress, but performance still remains far from that of supervised methods. We propose a dense tracking model trained on videos without any annotations that surpasses previous self-supervised methods on existing benchmarks by a significant margin (+15%), and achieves performance comparable to supervised methods. In this paper, we first reassess the traditional choices used for self-supervised training and the reconstruction loss by conducting thorough experiments that elucidate the optimal choices. Second, we further improve on existing methods by augmenting our architecture with a crucial memory component. Third, we benchmark on large-scale semi-supervised video object segmentation (a.k.a. dense tracking), and propose a new metric: generalizability. Our first two contributions yield a self-supervised network that for the first time is competitive with supervised methods on standard evaluation metrics of dense tracking. When measuring generalizability, we show self-supervised approaches are actually superior to the majority of supervised methods. We believe this new generalizability metric can better capture the real-world use-cases for dense tracking, and will spur new interest in this research direction.

Graphics processor units (GPUs) are prevalent in modern computing systems at all scales. They consume a significant fraction of the energy in these systems. However, vendors do not publish the actual cost of the power/energy overhead of their internal microarchitecture. In this paper, we accurately measure the energy consumption of various instructions found in modern NVIDIA GPUs. We provide an exhaustive comparison of more than 40 instructions for four high-end NVIDIA GPUs from four different generations (Maxwell, Pascal, Volta, and Turing). Furthermore, we show the effect of the CUDA compiler optimizations on the energy consumption of each instruction. We use three different software techniques, based on NVIDIA's NVML API, to read the GPU on-chip power sensors, and provide an in-depth comparison between these techniques. Additionally, we verified the software measurement techniques against a custom-designed hardware power measurement. The results show that Volta GPUs have the best energy efficiency among the four generations for the different categories of instructions. This work should give GPU architects and developers a more concrete understanding of these representative NVIDIA GPUs' microarchitecture. It should also make energy measurements of any GPU kernel both efficient and accurate.

We study $p$-Faulty Search, a variant of the classic cow-path optimization problem, where a unit speed robot searches the half-line (or $1$-ray) for a hidden item. The searcher is probabilistically faulty, and detection of the item with each visitation is an independent Bernoulli trial whose probability of success $p$ is known. The objective is to minimize the worst case expected detection time, relative to the distance of the hidden item to the origin. A variation of the same problem was first proposed by Gal in 1980. Then in 2003, Alpern and Gal [The Theory of Search Games and Rendezvous] proposed a so-called monotone solution for searching the line ($2$-rays); that is, a trajectory in which the newly searched space increases monotonically in each ray and in each iteration. Moreover, they conjectured that an optimal trajectory for the $2$-rays problem must be monotone. We disprove this conjecture when the search domain is the half-line ($1$-ray). We provide a lower bound for all monotone algorithms, which we also match with an upper bound. Our main contribution is the design and analysis of a sequence of refined search strategies, outside the family of monotone algorithms, which we call $t$-sub-monotone algorithms. Such algorithms induce performance that is strictly decreasing with $t$, and for all $p \in (0,1)$. The value of $t$ quantifies, in a certain sense, how much our algorithms deviate from being monotone, demonstrating that monotone algorithms are sub-optimal when searching the half-line.
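The probabilistic search model above is easy to simulate: a unit-speed searcher follows a sequence of turn points on the half-line, and every pass over the target is an independent Bernoulli(p) detection trial. The Monte Carlo harness and the doubling turn points below are an illustrative monotone strategy of our own, not the paper's $t$-sub-monotone algorithms.

```python
import numpy as np

def detection_time(turn_points, target, p, rng):
    """Time until a faulty unit-speed searcher on the half-line detects the target.

    The searcher starts at the origin and turns at turn_points[0], the origin,
    turn_points[1], the origin, and so on; every pass over `target` is an
    independent Bernoulli(p) detection trial.
    """
    t, pos = 0.0, 0.0
    for tp in turn_points:
        for nxt in (tp, 0.0):                    # out, then back to the origin
            lo, hi = min(pos, nxt), max(pos, nxt)
            if lo <= target <= hi and rng.random() < p:
                return t + abs(target - pos)     # detected while passing over it
            t += abs(nxt - pos)
            pos = nxt
    return float("inf")                          # ran out of turn points
```

Averaging `detection_time` over many runs and dividing by the target distance estimates the expected competitive ratio of a given strategy.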

The model of camera that was used to capture a particular photographic image (model attribution) can be inferred from model-specific artefacts present within the image. Typically these artefacts are found in high-frequency pixel patterns, rather than image content. Model anonymisation is the process of transforming these artefacts such that the apparent capture model is changed. Improved methods for attribution and anonymisation are important for improving digital forensics, and understanding its limits. Through conditional adversarial training, we present an approach for learning these transformations. Significantly, we augment the objective with the losses from pre-trained auxiliary model attribution classifiers that constrain the generator to not only synthesise discriminative high-frequency artefacts, but also salient image-based artefacts lost during image content suppression. Quantitative comparisons against a recent representative approach demonstrate the efficacy of our framework in a non-interactive black-box setting.

Given a graph $G$ and terminal vertices $s$ and $t$, the TRACKING PATHS problem asks to compute a minimum number of vertices to be marked as trackers, such that the sequence of trackers encountered in each s-t path is unique. TRACKING PATHS is NP-hard in both directed and undirected graphs in general. In this paper we give a collection of polynomial time algorithms for some restricted versions of TRACKING PATHS. We prove that TRACKING PATHS is polynomial time solvable for chordal graphs and tournament graphs. We prove that TRACKING PATHS is NP-hard in graphs with bounded maximum degree $\delta\geq 6$, and give a $2(\delta+1)$-approximation algorithm for this case. We also analyze the variant of tracking s-t paths where paths are tracked using edges instead of vertices, and give a polynomial time algorithm for it. Finally, we show how to reconstruct an s-t path, given a sequence of trackers and a tracking set for the graph in consideration.

We study dynamic graph algorithms in the Massively Parallel Computation model (MPC), inspired by practical data processing systems. Our goal is to provide algorithms that can efficiently handle large batches of edge insertions and deletions.

We show algorithms that require fewer rounds to update a solution to problems such as Minimum Spanning Forest and Maximal Matching than would be required by their static counterparts to compute it from scratch. They work in the most restrictive memory regime, in which local memory per machine is strongly sublinear in the number of graph vertices. Improving on the size of the batch they can handle would improve on the round complexity of known static algorithms on sparse graphs.

More precisely, we provide $O(1)$ round algorithms that can process a batch of updates of size $O(S)$ for the Minimum Spanning Forest problem and a batch of updates of size $O(S^{1-\varepsilon})$ for the Maximal Matching problem, where $S$ is the limit on the local memory of a single machine.

Empirical networks are often globally sparse, with a small average number of connections per node, when compared to the total size of the network. However, this sparsity tends not to be homogeneous, and networks can also be locally dense, for example with a few nodes connecting to a large fraction of the rest of the network, or with small groups of nodes with a large probability of connections between them. Here we show how latent Poisson models which generate hidden multigraphs can be effective at capturing this density heterogeneity, while being more tractable mathematically than some of the alternatives that model simple graphs directly. We show how these latent multigraphs can be reconstructed from data on simple graphs, and how this allows us to disentangle disassortative degree-degree correlations from the constraints of imposed degree sequences, and to improve the identification of community structure in empirically relevant scenarios.

The design of symbol detectors in digital communication systems has traditionally relied on statistical channel models that describe the relation between the transmitted symbols and the observed signal at the receiver. Here we review a data-driven framework to symbol detection design which combines machine learning (ML) and model-based algorithms. In this hybrid approach, well-known channel-model-based algorithms such as the Viterbi method, BCJR detection, and multiple-input multiple-output (MIMO) soft interference cancellation (SIC) are augmented with ML-based algorithms to remove their channel-model-dependence, allowing the receiver to learn to implement these algorithms solely from data. The resulting data-driven receivers are most suitable for systems where the underlying channel models are poorly understood, highly complex, or do not well-capture the underlying physics. Our approach is unique in that it only replaces the channel-model-based computations with dedicated neural networks that can be trained from a small amount of data, while keeping the general algorithm intact. Our results demonstrate that these techniques can yield near-optimal performance of model-based algorithms without knowing the exact channel input-output statistical relationship and in the presence of channel state information uncertainty.
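The key design point above is that the dynamic program stays intact while only the branch metric is learned. A minimal sketch of Viterbi detection with a pluggable metric illustrates this: the `branch_metric` callable below is where a small neural network would be substituted; here a hand-written squared-error metric stands in for it, and all names and the toy trellis are our own assumptions.

```python
import numpy as np

def viterbi(observations, n_states, predecessors, branch_metric):
    """Maximum-likelihood sequence detection over a trellis.

    predecessors[s] lists the states that can transition into state s;
    branch_metric(y, s_prev, s) returns the cost of that branch. In the
    hybrid data-driven receivers, this single function is learned from
    data while the dynamic program itself is unchanged.
    """
    T = len(observations)
    cost = np.full((T + 1, n_states), np.inf)
    cost[0, 0] = 0.0                       # assume the trellis starts in state 0
    back = np.zeros((T, n_states), dtype=int)
    for t, y in enumerate(observations):
        for s in range(n_states):
            for sp in predecessors[s]:
                c = cost[t, sp] + branch_metric(y, sp, s)
                if c < cost[t + 1, s]:
                    cost[t + 1, s], back[t, s] = c, sp
    # backtrack from the cheapest final state
    path = [int(np.argmin(cost[T]))]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]
```

With a memoryless toy channel (state = transmitted bit, metric = squared error), the decoder simply rounds each observation, which makes the sketch easy to sanity-check.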

We give a new construction for a small space summary satisfying the coreset guarantee of a data set with respect to the $k$-means objective function. The number of points required in an offline construction is in $\tilde{O}(k \epsilon^{-2}\min(d,k\epsilon^{-2}))$ which is minimal among all available constructions.

Aside from two constructions with exponential dependence on the dimension, all known coresets are maintained in data streams via the merge and reduce framework, which incurs a large space dependency on $\log n$. Instead, our construction crucially relies on Johnson-Lindenstrauss type embeddings, which, combined with results from online algorithms, give us a new technique for efficiently maintaining coresets in data streams without relying on merge and reduce. The final number of points stored by our algorithm in a data stream is in $\tilde{O}(k^2 \epsilon^{-2} \log^2 n \min(d,k\epsilon^{-2}))$.
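For intuition, the Johnson-Lindenstrauss ingredient is a random linear map that approximately preserves pairwise distances while reducing dimension. The plain Gaussian projection below is a generic JL sketch, not the specific embedding or coreset construction of the paper.

```python
import numpy as np

def jl_embed(X, target_dim, rng):
    """Johnson-Lindenstrauss embedding via a scaled random Gaussian projection.

    Pairwise Euclidean distances among the rows of X are preserved up to a
    (1 +/- eps) factor with high probability when target_dim is large enough.
    """
    d = X.shape[1]
    G = rng.normal(size=(d, target_dim)) / np.sqrt(target_dim)
    return X @ G
```

Because the guarantee depends only on the number of points (not the ambient dimension), such embeddings let a streaming algorithm work in a much smaller space.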

This paper studies ranking policies in a stylized trial-offer marketplace model, in which a single firm offers products to consumers with heterogeneous preferences. Consumer trials are influenced by past purchases and by the ranking of each product. The platform owner needs to devise a ranking policy for displaying the products so as to maximize the number of purchases in the long run. The proposed model attempts to understand the impact of market segmentation in a trial-offer market with social influence. In our model, consumers' choices are based on a very general choice model known as the mixed MNL. We analyze the long-term dynamics of this highly complex stochastic model and quantify the expected benefits of market segmentation. When past purchases are displayed, consumer heterogeneity makes buyers try sub-optimal products, reducing the overall sales rate. We show that consumer heterogeneity makes the ranking problem NP-hard. We then analyze the benefits of market segmentation. We find tight bounds on the expected benefits of offering a distinct ranking to each consumer segment. Finally, we show that the market segmentation strategy always benefits from social influence when the average quality ranking is used. One of the managerial implications is that the firm is better off using an aggregate ranking policy when the variety of consumer preferences is limited, but it should adopt a market segmentation policy when consumers are highly heterogeneous. We also show that this result is robust to relatively small consumer classification mistakes; when these are large, an aggregate ranking is preferred.
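The mixed MNL choice model mentioned above is a weighted mixture of standard multinomial logit models, one per consumer segment. The small sketch below computes aggregate choice probabilities; the utilities and weights are hypothetical inputs, and this is the textbook model, not the paper's full trial-offer dynamics.

```python
import numpy as np

def mixed_mnl_probs(utilities, segment_weights):
    """Aggregate choice probabilities under a mixed multinomial logit model.

    utilities: (n_segments, n_products) deterministic utilities per segment.
    segment_weights: (n_segments,) probabilities of the consumer segments.
    """
    # subtract the row max for numerical stability before exponentiating
    expu = np.exp(utilities - utilities.max(axis=1, keepdims=True))
    per_segment = expu / expu.sum(axis=1, keepdims=True)   # MNL within segment
    return segment_weights @ per_segment
```

With two symmetric segments that each strongly prefer a different product, the aggregate probabilities are balanced, which is exactly the heterogeneity that makes a single aggregate ranking sub-optimal.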

In this paper we propose a new fast Fourier transform to recover a real nonnegative signal ${\bf x}$ from its discrete Fourier transform. If the signal ${\mathbf x}$ appears to have a short support, i.e., vanishes outside a support interval of length $m < N$, then the algorithm has an arithmetical complexity of only ${\cal O}(m \log m \log (N/m))$ and requires ${\cal O}(m \log (N/m))$ Fourier samples for this computation. In contrast to other approaches, no a priori knowledge about sparsity or support bounds for the vector ${\bf x}$ is needed. The algorithm automatically recognizes and exploits a possible short support of the vector and falls back to a usual radix-2 FFT algorithm if ${\bf x}$ has (almost) full support. The numerical stability of the proposed algorithm is shown by numerical examples.

With the fast development of deep learning, it has become common to learn big neural networks using massive training data. Asynchronous Stochastic Gradient Descent (ASGD) is widely adopted to fulfill this task for its efficiency, but it is known to suffer from the problem of delayed gradients. That is, when a local worker adds its gradient to the global model, the global model may have been updated by other workers, and this gradient becomes "delayed". We propose a novel technique to compensate for this delay, so as to make the optimization behavior of ASGD closer to that of sequential SGD. This is achieved by leveraging a Taylor expansion of the gradient function and an efficient approximation to the Hessian matrix of the loss function. We call the new algorithm Delay Compensated ASGD (DC-ASGD). We evaluated the proposed algorithm on the CIFAR-10 and ImageNet datasets, and the experimental results demonstrate that DC-ASGD outperforms both synchronous SGD and asynchronous SGD, and nearly approaches the performance of sequential SGD.
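The compensation step can be sketched in one line: the first-order Taylor correction $g(w_{\text{now}}) \approx g(w_{\text{backup}}) + H\,(w_{\text{now}} - w_{\text{backup}})$, with the Hessian $H$ replaced by a cheap elementwise estimator $\lambda\, g \odot g$. The function name and the value of $\lambda$ below are illustrative assumptions.

```python
import numpy as np

def dc_asgd_gradient(g_delayed, w_now, w_backup, lam=0.04):
    """Delay-compensated gradient (DC-ASGD sketch).

    g_delayed: gradient computed by a worker at the stale weights w_backup.
    w_now:     current global weights at the time the update is applied.
    The Hessian in the Taylor correction is approximated by the
    elementwise diagonal estimator lam * g * g.
    """
    return g_delayed + lam * g_delayed * g_delayed * (w_now - w_backup)
```

The parameter server applies this corrected gradient instead of the raw delayed one, at the cost of storing one backup copy of the weights per worker.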

Winning probabilities for The Hat Game (Ebert's Hat Problem) with three players and three colors are only known in the symmetric case, where all colors are equally probable. This paper solves the asymmetric case, in which the probabilities may differ. We find winning probabilities and optimal strategies in all cases.
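Since the game is small, the winning probability of any strategy can be computed by enumerating all $3^3 = 27$ hat assignments. The evaluator below does exactly that; the toy strategy plugged in is our own simple example (guess color 0 only upon seeing two 0-hats), not one of the paper's optimal strategies.

```python
from itertools import product

def win_probability(strategy, color_probs):
    """Win probability of a 3-player, 3-color hat strategy by enumeration.

    strategy(i, seen) -> a color guess or None (pass), where `seen` is the
    pair of colors player i observes on the other two heads. The team wins
    iff at least one player guesses and every guess made is correct.
    """
    total = 0.0
    for hats in product(range(3), repeat=3):
        p = 1.0
        for h in hats:
            p *= color_probs[h]          # hats are drawn independently
        guesses = [strategy(i, hats[:i] + hats[i + 1:]) for i in range(3)]
        made = [(g, hats[i]) for i, g in enumerate(guesses) if g is not None]
        if made and all(g == h for g, h in made):
            total += p
    return total

# Toy strategy (not the paper's optimum): guess color 0 only when both
# visible hats are color 0, otherwise pass.
toy = lambda i, seen: 0 if seen == (0, 0) else None
```

Under uniform colors this toy strategy wins only on the all-zero assignment, so its winning probability is 1/27; replacing the strategy function lets one evaluate any candidate under any color distribution.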

In the problem of compressive phase retrieval, one wants to recover an approximately $k$-sparse signal $x \in \mathbb{C}^n$, given the magnitudes of the entries of $\Phi x$, where $\Phi \in \mathbb{C}^{m \times n}$. This problem has received a fair amount of attention, with sublinear time algorithms appearing in \cite{cai2014super,pedarsani2014phasecode,yin2015fast}. In this paper we further investigate the direction of sublinear decoding for real signals by giving a recovery scheme under the $\ell_2 / \ell_2$ guarantee, with an almost optimal, $O(k \log n)$, number of measurements. Our result outperforms all previous sublinear-time algorithms in the case of real signals. Moreover, we give a very simple deterministic scheme that recovers all $k$-sparse vectors in $O(k^3)$ time, using $4k-1$ measurements.

Distributed state estimation strongly depends on collaborative signal processing, which often requires excessive communication and computation to be executed on resource-constrained sensor nodes. To address this problem, we propose an event-triggered diffusion Kalman filter, which collects measurements and exchanges messages between nodes based on a local signal indicating the estimation error. On this basis, we develop an energy-aware state estimation algorithm that regulates the resource consumption in wireless networks and ensures the effectiveness of every consumed resource. The proposed algorithm does not require the nodes to share their local covariance matrices, thereby allowing a considerable reduction in the number of transmitted messages. To confirm its efficiency, we apply the proposed algorithm to the distributed simultaneous localization and time synchronization problem and evaluate it on a physical testbed of a mobile quadrotor node and stationary custom ultra-wideband wireless devices. The obtained experimental results indicate that the proposed algorithm saves 86% of the communication overhead associated with the original diffusion Kalman filter while degrading performance by only 16%. We make the Matlab code and the real testing data available online.

Low-Power Wide Area Networks (LPWAN) play a key role in the IoT marketplace wherein LoRaWAN is considered a leading solution. Despite the traction of LoRaWAN, research shows that the current contention management mechanisms of LoRaWAN do not scale. This paper tackles contention on LoRaWAN by introducing FLIP, a fully distributed and open architecture for LoRaWAN that fundamentally rethinks how LoRa gateways should be managed and coordinated. FLIP transforms LoRa gateways into a federated network that provides inherent support for roaming while tackling contention using consensus-driven load balancing. FLIP offers identical security guarantees to LoRaWAN, is compatible with existing gateway hardware and requires no updates to end-device hardware or firmware. These features ensure the practicality of FLIP and provide a path to its adoption. We evaluate the performance of FLIP in a large-scale real-world deployment and demonstrate that FLIP delivers scalable roaming and improved contention management in comparison to LoRaWAN. FLIP achieves these benefits within the resource constraints of conventional LoRa gateways and requires no server hardware.

Motivated by problems in data clustering, we establish general conditions under which families of nonparametric mixture models are identifiable, by introducing a novel framework involving clustering overfitted \emph{parametric} (i.e. misspecified) mixture models. These identifiability conditions generalize existing conditions in the literature, and are flexible enough to include for example mixtures of Gaussian mixtures. In contrast to the recent literature on estimating nonparametric mixtures, we allow for general nonparametric mixture components, and instead impose regularity assumptions on the underlying mixing measure. As our primary application, we apply these results to partition-based clustering, generalizing the notion of a Bayes optimal partition from classical parametric model-based clustering to nonparametric settings. Furthermore, this framework is constructive so that it yields a practical algorithm for learning identified mixtures, which is illustrated through several examples on real data. The key conceptual device in the analysis is the convex, metric geometry of probability measures on metric spaces and its connection to the Wasserstein convergence of mixing measures. The result is a flexible framework for nonparametric clustering with formal consistency guarantees.

Simulating the time-evolution of quantum mechanical systems is BQP-hard and expected to be one of the foremost applications of quantum computers. We consider classical algorithms for the approximation of Hamiltonian dynamics using subsampling methods from randomized numerical linear algebra. We derive a simulation technique whose runtime scales polynomially in the number of qubits and the Frobenius norm of the Hamiltonian. As an immediate application, we show that sample based quantum simulation, a type of evolution where the Hamiltonian is a density matrix, can be efficiently classically simulated under specific structural conditions. Our main technical contribution is a randomized algorithm for approximating Hermitian matrix exponentials. The proof leverages a low-rank, symmetric approximation via the Nystr\"om method. Our results suggest that under strong sampling assumptions there exist classical poly-logarithmic time simulations of quantum computations.
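The core numerical subroutine, a Nyström-based approximation of a Hermitian matrix exponential, can be sketched as follows. This simplified version of our own makes several assumptions: the input is a real symmetric PSD matrix, the column indices are supplied directly rather than sampled by norm, and the exponential of the low-rank factor is computed exactly through a small eigendecomposition.

```python
import numpy as np

def nystrom_expm(A, idx, eps=1e-10):
    """Approximate the matrix exponential of a symmetric PSD matrix A.

    Uses the Nystrom approximation A ~ C W^+ C^T, where C holds the sampled
    columns idx and W is the corresponding principal submatrix, then
    exponentiates the low-rank factorization via a small eigendecomposition.
    """
    C = A[:, idx]
    W = A[np.ix_(idx, idx)]
    sw, Uw = np.linalg.eigh(W)
    inv_sqrt = np.where(sw > eps, 1.0 / np.sqrt(np.maximum(sw, eps)), 0.0)
    F = C @ (Uw * inv_sqrt) @ Uw.T          # pseudo-inverse square root: A ~ F F^T
    lam, V = np.linalg.eigh(F.T @ F)        # small k-by-k eigenproblem
    lam = np.maximum(lam, 0.0)
    # exp(F F^T) = I + F V diag((exp(lam) - 1) / lam) V^T F^T
    phi = np.where(lam > eps, np.expm1(lam) / np.maximum(lam, eps), 1.0)
    return np.eye(A.shape[0]) + F @ (V * phi) @ V.T @ F.T
```

When A has exact low rank and the sampled columns span its range, the Nyström approximation, and hence the exponential, is exact up to floating-point error; the randomized analysis in the paper controls the error for approximately low-rank inputs.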

Mobile banking apps, belonging to the most security-critical app category, render massive and dynamic transactions susceptible to security risks. Given the huge potential financial loss that vulnerabilities can cause, the literature still lacks a comprehensive empirical study on the security risks of global banking apps that could provide useful insights and help improve their security.

Since data-related weaknesses in banking apps are critical and may directly cause serious financial loss, this paper first revisits the state-of-the-art available tools and finds that they have limited capability in identifying data-related security weaknesses of banking apps. To complement the capability of existing tools in data-related weakness detection, we propose a three-phase automated security risk assessment system, named AUSERA, which leverages static program analysis techniques and sensitive keyword identification. By leveraging AUSERA, we collect 2,157 weaknesses in 693 real-world banking apps across 83 countries, which we use as a basis to conduct a comprehensive empirical study from different aspects, such as global distribution and weakness evolution during version updates. We find that apps owned by subsidiary banks are always less secure than or equivalent to those owned by parent banks. In addition, we track the patching of weaknesses and have received much positive feedback from banking entities, which helps improve the security of banking apps in practice. To date, we highlight that 21 banks have confirmed the weaknesses we reported. We have also exchanged insights with 7 banks, such as HSBC in the UK and OCBC in Singapore, via in-person or online meetings to help them improve their apps. We hope that the insights developed in this paper will inform the communities about the gaps among multiple stakeholders, including banks, academic researchers, and third-party security companies.

As a building block of symmetric cryptography, designing Boolean functions that satisfy multiple properties is an important problem for sequence ciphers, block ciphers, and hash functions. However, the search for $n$-variable Boolean functions fulfilling global cryptographic constraints is computationally hard due to the super-exponential size $\mathcal{O}(2^{2^n})$ of the search space. Here, we introduce a codification of the cryptographically relevant constraints in the ground state of an Ising Hamiltonian, allowing us to naturally encode the problem in a quantum annealer, which seems to provide a quantum speedup. Additionally, we benchmark small-$n$ cases on a D-Wave machine, showing its capacity to devise bent functions, the most relevant class of cryptographic Boolean functions. We complement this with local search and chain repair to improve the D-Wave quantum annealer's performance in relation to its low connectivity. This work shows how to codify super-exponential cryptographic problems into quantum annealers and paves the way for reaching quantum supremacy with an adequately designed chip.
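Bentness is easy to verify classically: a Boolean function on an even number $n$ of variables is bent iff every Walsh-Hadamard coefficient has absolute value $2^{n/2}$. The checker below uses the standard fast Walsh-Hadamard transform; the example function $x_1x_2 \oplus x_3x_4$ is the classic quadratic bent function, not an output of the annealer described above.

```python
import numpy as np

def walsh_spectrum(truth_table):
    """Walsh-Hadamard spectrum of a Boolean function given as a truth table.

    truth_table[x] is f(x) in {0, 1}, indexed by the integer encoding of x.
    """
    w = np.array([(-1) ** b for b in truth_table], dtype=float)
    h = 1
    while h < len(w):                        # in-place fast transform
        for i in range(0, len(w), 2 * h):
            a, b = w[i:i + h].copy(), w[i + h:i + 2 * h].copy()
            w[i:i + h], w[i + h:i + 2 * h] = a + b, a - b
        h *= 2
    return w

def is_bent(truth_table):
    n = len(truth_table).bit_length() - 1
    return n % 2 == 0 and bool(np.all(np.abs(walsh_spectrum(truth_table))
                                      == 2 ** (n // 2)))
```

Such a checker is exactly what one would run on candidate functions returned by the annealer to confirm they are bent.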

This paper introduces the first open-source software library for Constraint Consistent Learning (CCL). It implements a family of data-driven methods that are capable of (i) learning state-independent and -dependent constraints, (ii) decomposing the behaviour of redundant systems into task- and null-space parts, and (iii) uncovering the underlying null-space control policy. It is a tool to analyse and decompose many everyday tasks, such as wiping, reaching and drawing. The library also includes several tutorials that demonstrate its use with both simulated and real-world data in a systematic way. This paper documents the implementation of the library, tutorials and associated helper methods. The software is made freely available to the community, to enable code reuse and allow users to gain in-depth experience in statistical learning in this area.

We present a new and formal coinductive proof of confluence and normalisation of B\"ohm reduction in infinitary lambda calculus. The proof is simpler than previous proofs of this result. The proof technique is new, i.e., it is not merely a coinductive reformulation of earlier proofs. We formalised the proof in the Coq proof assistant.

We consider the sensitivity of real zeros of structured polynomial systems to perturbations of their coefficients. In particular, we provide explicit estimates for condition numbers of structured random real polynomial systems, and extend these estimates to the smoothed analysis setting.

In this paper, we extend the notion of Lyndon word to transfinite words. We prove two main results. We first show that, given a transfinite word, there exists a unique factorization into Lyndon words that are densely non-increasing, a relaxation of the condition used in the case of finite words.

In the appendix, we prove that the factorization of a rational word has a special form and that it can be computed from a rational expression describing the word.
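For finite words, the classical Chen-Fox-Lyndon factorization into non-increasing Lyndon words can be computed in linear time by Duval's algorithm; a minimal Python sketch (finite case only, not the paper's transfinite construction):

```python
def duval(s):
    """Duval's algorithm: factor a finite word into non-increasing Lyndon words.

    A Lyndon word is strictly smaller than all of its proper suffixes; every
    finite word has a unique factorization w = w1 w2 ... wk with w1 >= ... >= wk.
    """
    n, i = len(s), 0
    factors = []
    while i < n:
        j, k = i + 1, i
        # Extend the current candidate as long as it stays a (power of a) Lyndon word.
        while j < n and s[k] <= s[j]:
            k = i if s[k] < s[j] else k + 1
            j += 1
        # Emit one or more copies of the Lyndon word of length j - k.
        while i <= k:
            factors.append(s[i:i + j - k])
            i += j - k
    return factors
```

For example, `duval("banana")` yields `["b", "an", "an", "a"]`, a non-increasing sequence of Lyndon words.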

Consider a wireless network where each communication link has a minimum bandwidth quality-of-service requirement. Certain pairs of wireless links interfere with each other due to being in the same vicinity, and this interference is modeled by a conflict graph. Given the conflict graph and link bandwidth requirements, the objective is to determine, using only localized information, whether the demands of all the links can be satisfied. At one extreme, each node knows the demands of only its neighbors; at the other extreme, there exists an optimal, centralized scheduler that has global information. The present work interpolates between these two extremes by quantifying the tradeoff between the degree of decentralization and the performance of the distributed algorithm. This open problem is resolved for the primary interference model, and the following general result is obtained: if each node knows the demands of all links in a ball of radius $d$ centered at the node, then there is a distributed algorithm whose performance is away from that of an optimal, centralized algorithm by a factor of at most $(2d+3)/(2d+2)$. The tradeoff between performance and complexity of the distributed algorithm is also analyzed. It is shown that for line networks under the protocol interference model, the row constraints are a factor of at most $3$ away from optimal. Both bounds are best possible.

We consider network graphs $G=(V,E)$ in which adjacent nodes share common secrets. In this setting, certain techniques for perfect end-to-end security (in the sense of confidentiality, authenticity (implying integrity) and availability, i.e., CIA+) can be made applicable without end-to-end shared secrets and without computational intractability assumptions. To this end, we introduce and study the concept of a unique-neighborhood network, in which nodes are uniquely identifiable by their graph-topological neighborhood. While the concept is motivated by authentication, it may enjoy wider applicability as a technology-agnostic (yet topology-aware) form of addressing nodes in a network.
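As a toy illustration, one natural reading of the property (nodes distinguished by their open neighborhoods; the paper's exact definition may differ) is easy to check directly:

```python
def is_unique_neighborhood(adj):
    """Check whether every node is uniquely identified by its (open)
    neighborhood, i.e., no two nodes have the same neighbor set.

    adj: dict mapping node -> iterable of neighbors.
    """
    seen = {}
    for v, nbrs in adj.items():
        key = frozenset(nbrs)
        if key in seen:
            return False  # v and seen[key] share the same neighborhood
        seen[key] = v
    return True

# A path a-b-c-d: all four neighbor sets differ, so it is unique-neighborhood.
path = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
# A 4-cycle: opposite nodes have identical neighborhoods, so it is not.
cycle = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [0, 2]}
```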

Memorability measures how easily an image can be memorized after a glance, which may contribute to designing magazine covers, tourism publicity materials, and so forth. Recent works have shed light on the visual features that make generic images, object images, or face photographs memorable. However, these methods cannot effectively predict the memorability of outdoor natural scene images. To overcome this shortcoming of previous works, in this paper we attempt to answer: "what exactly makes outdoor natural scenes memorable?" To this end, we first establish a large-scale outdoor natural scene image memorability (LNSIM) database, containing 2,632 outdoor natural scene images with ground-truth memorability scores and multi-label scene category annotations. Then, similar to previous works, we mine our database to investigate how low-, middle- and high-level handcrafted features affect the memorability of outdoor natural scenes. In particular, we find that the high-level feature of scene category is strongly correlated with outdoor natural scene memorability, and that deep features learnt by a deep neural network (DNN) are also effective in predicting memorability scores. Moreover, combining the deep features with the category feature can further boost the performance of memorability prediction. Therefore, we propose an end-to-end DNN-based outdoor natural scene memorability (DeepNSM) predictor, which takes advantage of the learned category-related features. Experimental results validate the effectiveness of our DeepNSM model, which exceeds the state-of-the-art methods. Finally, we analyze the reasons for the good performance of our DeepNSM model, and study the cases in which it succeeds or fails to accurately predict the memorability of outdoor natural scenes.

Deep convolutional neural networks (CNNs) have demonstrated impressive performance on many visual tasks. Recently, they have become useful models for the visual system in neuroscience. However, it is still not clear what CNNs learn in terms of neuronal circuits. When a deep CNN with many layers is used to model the visual system, it is not easy to relate the structural components of the CNN to possible neuroscience underpinnings, owing to the highly complex circuits from the retina to higher visual cortex. Here we address this issue by focusing on single retinal ganglion cells, using biophysical models and recording data from animals. By training CNNs on white-noise images to predict neuronal responses, we find that fine structures of the retinal receptive field can be revealed. Specifically, the learned convolutional filters resemble biological components of the retinal circuit. This suggests that a CNN learned from a single retinal cell reveals a minimal neural network implemented in this cell. Furthermore, when CNNs learned from different cells are transferred between cells, transfer performance varies widely, which indicates that the CNNs are cell-specific. Moreover, when CNNs are transferred between different types of input images, here white noise vs. natural images, transfer learning performs well, which implies that the CNNs indeed capture the full computational ability of a single retinal cell for different inputs. Taken together, these results suggest that CNNs can be used to reveal structural components of neuronal circuits, and they provide a powerful model for neural system identification.

Learning algorithms are enabling robots to solve increasingly challenging real-world tasks. These approaches often rely on demonstrations and reproduce the behavior shown. Unexpected changes in the environment may require using different behaviors to achieve the same effect, for instance to reach and grasp an object in changing clutter. An emerging paradigm addressing this robustness issue is to learn a diverse set of successful behaviors for a given task, from which a robot can select the most suitable policy when faced with a new environment. In this paper, we explore a novel realization of this vision by learning a generative model over policies. Rather than learning a single policy, or a small fixed repertoire, our generative model for policies compactly encodes an unbounded number of policies and allows novel controller variants to be sampled. Leveraging our generative policy network, a robot can sample novel behaviors until it finds one that works for a new environment. We demonstrate this idea with an application of robust ball-throwing in the presence of obstacles. We show that this approach achieves a greater diversity of behaviors than an existing evolutionary approach, while maintaining good efficacy of sampled behaviors, allowing a Baxter robot to hit targets more often when ball throwing in the presence of obstacles.

The generalized Prony method introduced by Peter & Plonka (2013) is a reconstruction technique for a large variety of sparse signal models that can be represented as sparse expansions into eigenfunctions of a linear operator $A$. However, this procedure requires the evaluation of higher powers of the linear operator $A$ that are often expensive to provide.

In this paper we propose two important extensions of the generalized Prony method that substantially simplify the acquisition of the needed samples and at the same time can improve the numerical stability of the method. The first extension concerns a change of operators from $A$ to $\varphi(A)$, where $\varphi$ is an analytic function and $A$ and $\varphi(A)$ possess the same set of eigenfunctions. The goal is to choose $\varphi$ such that the powers of $\varphi(A)$ are much simpler to evaluate than the powers of $A$. The second extension concerns the choice of the sampling functionals. We show how new sets of sampling functionals $F_{k}$ can be applied to reduce the number of powers of the operator $A$ (resp. $\varphi(A)$) needed in the sampling scheme and to simplify the acquisition process for the recovery method.
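For reference, the classical (non-generalized) Prony method recovers the nodes $z_j$ of $f(k) = \sum_j c_j z_j^k$ from $2M$ samples by solving a Hankel system and rooting the Prony polynomial; a minimal numpy sketch:

```python
import numpy as np

def prony(samples, M):
    """Classical Prony: recover nodes z_j from samples f(k) = sum_j c_j z_j^k.

    Needs 2*M samples.  Solves the Hankel system for the coefficients of the
    Prony polynomial z^M + p_{M-1} z^{M-1} + ... + p_0, whose roots are the z_j.
    """
    f = np.asarray(samples, dtype=float)
    A = np.array([[f[k + l] for l in range(M)] for k in range(M)])
    b = -f[M:2 * M]
    p = np.linalg.solve(A, b)             # p_0, ..., p_{M-1}
    return np.roots(np.r_[1.0, p[::-1]])  # highest-degree coefficient first

# Recover two nodes from 4 exact samples.
z_true = np.array([0.5, 0.9])
samples = [sum(z ** k for z in z_true) for k in range(4)]
z_est = np.sort(prony(samples, M=2).real)
```

The generalized method replaces the shifts $f(k) \mapsto f(k+1)$ by powers of an operator $A$; the paper's extensions aim to make those powers cheaper to sample.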

Locally checkable labeling problems (LCLs) are distributed graph problems in which a solution is globally feasible if it is locally feasible in all constant-radius neighborhoods. Vertex colorings, maximal independent sets, and maximal matchings are examples of LCLs.

On the one hand, it is known that some LCLs benefit exponentially from randomness---for example, any deterministic distributed algorithm that finds a sinkless orientation requires $\Theta(\log n)$ rounds in the LOCAL model, while the randomized complexity of the problem is $\Theta(\log \log n)$ rounds. On the other hand, there are also many LCLs in which randomness is useless.

Previously, it was not known whether there are any LCLs that benefit from randomness, but only subexponentially so. We show that such problems exist: for example, there is an LCL with deterministic complexity $\Theta(\log^2 n)$ rounds and randomized complexity $\Theta(\log n \log \log n)$ rounds.

We present a data partitioning technique performed over skip graphs that promotes significant quantitative and qualitative improvements on NUMA locality in concurrent data structures, as well as reduced contention. We build on previous techniques of thread-local indexing and laziness, and, at a high level, our design consists of a partitioned skip graph, well-integrated with thread-local sequential maps, operating without contention. As a proof-of-concept, we implemented map and relaxed priority queue ADTs using our technique. Maps were conceived using lazy and non-lazy approaches to insertions and removals, and our implementations are shown to be competitive with state-of-the-art maps. We observe a 6x higher CAS locality, a 68.6% reduction in the number of remote CAS operations, and an increase from 88.3% to 99% in CAS success rate when using a lazy skip graph as compared to a control skip list (subject to the same codebase, optimizations, and implementation practices). Qualitatively speaking, remote memory accesses are not only reduced in number: the larger the NUMA distance between threads, the larger the reduction. We consider two alternative implementations of relaxed priority queues that further take advantage of our data partitioning over skip graphs: (a) using "spraying", a well-known random-walk technique usually performed over skip lists, but now performed over skip graphs; and (b) a custom protocol that traverses the skip graph deterministically, marking elements along the traversal. We provide formal arguments indicating that the first approach is more \emph{relaxed}, that is, the span of removed keys is larger, while the second approach has smaller contention. Experimental results indicate that the approach based on spraying performs better on skip graphs, yet both seem to scale appropriately.

Metric-based meta-learning techniques have successfully been applied to few-shot classification problems. In this paper, we propose to leverage cross-modal information to enhance metric-based few-shot learning methods. Visual and semantic feature spaces have different structures by definition. For certain concepts, visual features might be richer and more discriminative than text ones, while for others the inverse might be true. Moreover, when the support from visual information is limited in image classification, semantic representations (learned from unsupervised text corpora) can provide strong prior knowledge and context to aid learning. Based on these two intuitions, we propose a mechanism that adaptively combines information from the two modalities according to the new image categories to be learned. Through a series of experiments, we show that this adaptive combination allows our model to outperform current uni-modality few-shot learning methods and modality-alignment methods by a large margin on all benchmarks and few-shot scenarios tested. Experiments also show that our model can effectively adjust its focus on the two modalities. The improvement in performance is particularly large when the number of shots is very small.

The determination of accurate bathymetric information is a key element for near-offshore activities and hydrological studies such as coastal engineering applications, sedimentary processes, hydrographic surveying, archaeological mapping, and biological research. UAV imagery processed with Structure from Motion (SfM) and Multi-View Stereo (MVS) techniques can provide a low-cost alternative to established shallow-seabed mapping techniques, while also offering important visual information. Nevertheless, water refraction poses significant challenges for depth determination. Until now, this problem has been addressed through customized image-based refraction correction algorithms or by modifying the collinearity equation. In this paper, in order to overcome water refraction errors, we employ machine learning tools that are able to learn the systematic underestimation of depth. In the proposed approach, based on known depth observations from bathymetric LiDAR surveys, we develop an SVR model able to estimate the real depths of point clouds derived from SfM-MVS procedures more accurately. Experimental results over two test sites, along with the performed quantitative validation, indicate the high potential of the developed approach.
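A minimal sketch of the idea on synthetic data, with ordinary least squares standing in for the paper's SVR and an illustrative refraction factor of 1.34 (all numbers hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic scenario: SfM-MVS depths systematically underestimate the true
# depth (roughly by the refractive index of water, ~1.34); a model learns
# the correction from points with known LiDAR reference depths.
true_depth = rng.uniform(0.5, 10.0, size=200)
apparent_depth = true_depth / 1.34 + rng.normal(0.0, 0.05, size=200)

# Least-squares fit true = a * apparent + b (stand-in for the paper's SVR).
X = np.c_[apparent_depth, np.ones_like(apparent_depth)]
a, b = np.linalg.lstsq(X, true_depth, rcond=None)[0]

corrected = a * apparent_depth + b
rmse_before = np.sqrt(np.mean((apparent_depth - true_depth) ** 2))
rmse_after = np.sqrt(np.mean((corrected - true_depth) ** 2))
```

The learned slope recovers the systematic underestimation factor, and the corrected depths have a much smaller error against the reference than the raw SfM-MVS depths.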

We introduce data-driven decision-making algorithms that achieve state-of-the-art \emph{dynamic regret} bounds for non-stationary bandit settings. These settings capture applications such as advertisement allocation, dynamic pricing, and traffic network routing in changing environments. We show how the difficulty posed by the (unknown \emph{a priori} and possibly adversarial) non-stationarity can be overcome by an unconventional marriage between stochastic and adversarial bandit learning algorithms. Our main contribution is a general algorithmic recipe for a wide variety of non-stationary bandit problems. Specifically, we design and analyze the sliding-window upper confidence bound algorithm, which achieves the optimal dynamic regret bound for each of the settings when we know the respective underlying \emph{variation budget}, which quantifies the total amount of temporal variation of the latent environments. Boosted by the novel bandit-over-bandit framework that adapts to the latent changes, we can further enjoy the (nearly) optimal dynamic regret bounds in a (surprisingly) parameter-free manner. In addition to the classical exploration-exploitation trade-off, our algorithms leverage the power of the "forgetting principle" in the learning process, which is vital in changing environments. Our extensive numerical experiments on both synthetic and real-world online auto-loan datasets show that our proposed algorithms achieve superior empirical performance compared to existing algorithms.
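A minimal sketch of a sliding-window UCB on a two-armed bandit with one abrupt change (illustrative only; the paper's algorithm, confidence radius, and analysis are more refined):

```python
import math
from collections import deque

def sw_ucb(reward_fns, horizon, window):
    """Sliding-window UCB: statistics are computed only over the last
    `window` pulls, so the estimates forget the stale environment."""
    history = deque()  # (arm, reward) pairs inside the window
    choices = []
    for t in range(horizon):
        if len(history) == window:
            history.popleft()  # forget the oldest observation
        stats = {}
        for arm, r in history:
            n, s = stats.get(arm, (0, 0.0))
            stats[arm] = (n + 1, s + r)
        ucb = []
        for arm in range(len(reward_fns)):
            n, s = stats.get(arm, (0, 0.0))
            if n == 0:
                ucb.append(float("inf"))  # force exploration of stale arms
            else:
                ucb.append(s / n + math.sqrt(2 * math.log(min(t + 1, window)) / n))
        arm = ucb.index(max(ucb))
        history.append((arm, reward_fns[arm](t)))
        choices.append(arm)
    return choices

# Abrupt change at t = 100: arm 0 is best before, arm 1 after.
arm0 = lambda t: 1.0 if t < 100 else 0.0
arm1 = lambda t: 0.0 if t < 100 else 1.0
choices = sw_ucb([arm0, arm1], horizon=300, window=50)
```

Because old observations leave the window, the algorithm tracks the change: it mostly pulls arm 0 before $t = 100$ and switches to arm 1 shortly after.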

To efficiently support safety-related vehicular applications, the ultra-reliable and low-latency communication (URLLC) concept has become an indispensable component of vehicular networks (VNETs). Due to the high mobility of VNETs, exchanging near-instantaneous channel state information (CSI) and making reliable resource allocation decisions based on such short-term CSI evaluations are not practical. In this paper, we consider the downlink of a vehicle-to-infrastructure (V2I) system conceived for URLLC based on idealized perfect and realistic imperfect CSI. By exploiting the benefits of the massive MIMO concept, a two-stage radio resource allocation problem is formulated based on a novel twin-timescale perspective for avoiding the frequent exchange of near-instantaneous CSI. Specifically, based on the prevalent road-traffic density, Stage 1 is constructed for minimizing the worst-case transmission latency on a long-term timescale. In Stage 2, the base station allocates the total power at a short-term timescale according to the large-scale fading CSI encountered for minimizing the maximum transmission latency across all vehicular users. Then, a primary algorithm and a secondary algorithm are conceived for our V2I URLLC system to find the optimal solution of the twin-timescale resource allocation problem, with special emphasis on the complexity imposed. Finally, our simulation results show that the proposed resource allocation scheme significantly reduces the maximum transmission latency, and it is not sensitive to the fluctuation of road-traffic density.

Human-robot collaboration (HRC) is becoming increasingly important as the paradigm of manufacturing shifts from mass production to mass customization. The introduction of HRC can significantly improve the flexibility and intelligence of automation. However, due to the stochastic and time-varying nature of human collaborators, it is challenging for the robot to efficiently and accurately identify the human's plan and respond in a safe manner. To address this challenge, we propose an integrated human-robot collaboration framework that includes both plan recognition and trajectory prediction. Such a framework enables the robot to perceive, predict, and adapt its actions to the human's plan, and to intelligently avoid collisions based on the predicted human trajectory. Moreover, by explicitly leveraging the hierarchical relationship between plans and trajectories, more robust plan recognition performance can be achieved. Experiments on an industrial robot verify the proposed framework, showing that it not only assures safe HRC but also improves the time efficiency of the HRC team, and that the plan recognition module is not sensitive to noise.

Dynamic Time Warping (DTW) is a well-known similarity measure for time series. The standard dynamic programming approach to compute the DTW distance of two length-$n$ time series, however, requires $O(n^2)$ time, which is often too slow for real-world applications. Therefore, many heuristics have been proposed to speed up the DTW computation. These are often based on lower bounding techniques, approximating the DTW distance, or considering special input data such as binary or piecewise constant time series. In this paper, we present a first exact algorithm to compute the DTW distance of two run-length encoded time series whose running time only depends on the encoding lengths of the inputs. The worst-case running time is cubic in the encoding length. In experiments we show that our algorithm is indeed fast for time series with short encoding lengths.
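For reference, the standard $O(n^2)$ dynamic program that the run-length-encoded algorithm speeds up, with the usual absolute-difference local cost:

```python
def dtw(a, b):
    """Standard dynamic program for the DTW distance of two time series."""
    INF = float("inf")
    n, m = len(a), len(b)
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i][j] = abs(a[i - 1] - b[j - 1]) + min(
                D[i - 1][j],      # stretch a[i-1]
                D[i][j - 1],      # stretch b[j-1]
                D[i - 1][j - 1],  # advance both series
            )
    return D[n][m]
```

For example, `dtw([1, 2, 3], [1, 2, 2, 3])` is 0, since warping aligns the repeated value; this repetition structure is exactly what a run-length encoding captures.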

One of the most intriguing unsolved questions of matroid optimization is the characterization of the existence of $k$ disjoint common bases of two matroids. The significance of the problem is well-illustrated by the long list of conjectures that can be formulated as special cases, such as Woodall's conjecture on packing disjoint dijoins in a directed graph, or Rota's beautiful conjecture on rearrangements of bases.

In the present paper we prove that the problem is difficult under the rank oracle model, i.e., we show that there is no algorithm which decides if the common ground set of two matroids can be partitioned into $k$ common bases by using a polynomial number of independence queries. Our complexity result holds even for the very special case when $k=2$.

Through a series of reductions, we also show that the abstract problem of packing common bases in two matroids includes the NAE-SAT problem and the Perfect Even Factor problem in directed graphs. These results in turn imply that the problem is not only difficult in the independence oracle model but also includes NP-complete special cases already when $k=2$, one of the matroids is a partition matroid, while the other matroid is linear and is given by an explicit representation.

In this paper we develop optimal algorithms in the binary-forking model for a variety of fundamental problems, including sorting, semisorting, list ranking, tree contraction, range minima, and set union, intersection and difference. In the binary-forking model, tasks can only fork into two child tasks, but can do so recursively and asynchronously, and join up later. The tasks share memory, and costs are measured in terms of work (total number of instructions), and span (longest dependence chain).

Due to the asynchronous nature of the model, and thanks to a variety of schedulers that are efficient in both theory and practice, variants of the model are widely used in practice in languages such as Cilk and Java Fork-Join. PRAM algorithms can be simulated in the model, but at a loss of a factor of $\Omega(\log n)$, so most PRAM algorithms are not optimal in the model even if optimal on the PRAM. All the algorithms we describe are optimal in work and have logarithmic span. Several are randomized. Beyond being the first optimal algorithms for their problems in the model, most are very simple.

We introduce a general framework for the construction of well-balanced finite volume methods for hyperbolic balance laws. The term well-balancing is used here in a wider sense, since the method can be applied to exactly follow any solution of any system of hyperbolic balance laws in multiple spatial dimensions. The solution has to be known a priori, either as an analytical expression or as discrete data. The proposed framework modifies the standard finite volume approach such that the well-balancing property is obtained. The potentially high order of accuracy of the method is maintained under the modification. We show numerical tests for the compressible Euler equations with and without gravity source term and with different equations of state, and for the equations of compressible ideal magnetohydrodynamics. Different grid geometries and reconstruction methods are used. We demonstrate high order convergence numerically.
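A one-dimensional toy version of the idea (an illustrative sketch, not the paper's high-order method): for $u_t + u_x = \cos(x)$ with known steady solution $u^*(x) = \sin(x)$, subtracting the discrete residual of $u^*$ from a first-order upwind scheme makes the scheme preserve $u^*$ exactly, while the unmodified scheme drifts by its truncation error.

```python
import numpy as np

# Periodic grid on [0, 2*pi) with a first-order upwind discretization.
N = 64
x = np.linspace(0.0, 2 * np.pi, N, endpoint=False)
dx = x[1] - x[0]
ustar = np.sin(x)  # known steady solution of u_t + u_x = cos(x)

def residual(u):
    """Upwind residual for u_t + u_x = cos(x); steady states satisfy residual = 0."""
    return (u - np.roll(u, 1)) / dx - np.cos(x)

u_std = ustar.copy()
u_wb = ustar.copy()
dt = 0.5 * dx
for _ in range(50):
    u_std = u_std - dt * residual(u_std)                   # standard scheme
    u_wb = u_wb - dt * (residual(u_wb) - residual(ustar))  # well-balanced variant

err_std = np.max(np.abs(u_std - ustar))
err_wb = np.max(np.abs(u_wb - ustar))
```

Since the well-balanced update is the difference of two identical residual evaluations when $u = u^*$, the known solution is preserved to machine precision, independently of the truncation error of the underlying scheme.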

In image-based feature descriptor design, an iterative scanning operation (e.g., convolution) is typically adopted to extract local information from the image pixels. In this paper, we propose a Matrix-based Local Binary Pattern (M-LBP) descriptor and a Matrix-based Histogram of Oriented Gradients (M-HOG) descriptor built on global matrix projection. An integrated form of M-LBP and M-HOG, namely M-LBP-HOG, is subsequently constructed in a single line of matrix formulation. The proposed descriptors are evaluated on a publicly available mammogram database. The results show promising performance in terms of classification accuracy and computational efficiency.
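For contrast with the proposed matrix formulation, the standard scan-based LBP iterates over pixels and compares each one with its 8 neighbors (reference sketch of the classical descriptor, not the paper's M-LBP):

```python
def lbp_codes(img):
    """Standard scan-based LBP: for each interior pixel, compare the 8
    neighbors to the center (clockwise from top-left) and read the
    comparison bits as a byte in [0, 255]."""
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    h, w = len(img), len(img[0])
    codes = []
    for i in range(1, h - 1):
        row = []
        for j in range(1, w - 1):
            c = img[i][j]
            bits = [1 if img[i + di][j + dj] >= c else 0 for di, dj in offsets]
            row.append(sum(b << k for k, b in enumerate(bits)))
        codes.append(row)
    return codes

flat = [[5] * 4 for _ in range(4)]  # constant patch: every neighbor >= center
codes = lbp_codes(flat)
```

The matrix-based variant in the paper replaces this per-pixel scan with a global matrix projection, which is where the computational savings come from.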

Human annotations play an important role in computational models where the target constructs under study are hidden, such as dimensions of affect. This is especially relevant in machine learning, where subjective labels derived from related observable signals (e.g., audio, video, text) are needed to support model training and testing. Current research trends focus on correcting artifacts and biases introduced by annotators during the annotation process while fusing them into a single annotation. In this work, we propose a novel annotation approach using triplet embeddings. By lifting the absolute annotation process to relative annotations, where the annotator compares individual target constructs in triplets, we leverage the greater accuracy of comparisons over absolute ratings by human annotators. We then build a 1-dimensional embedding in Euclidean space that is indexed in time and serves as a label for regression. In this setting, the annotation fusion occurs naturally as a union of sets of sampled triplet comparisons among different annotators. We show that by using our proposed sampling method to find an embedding, we are able to accurately represent synthetic hidden constructs in time under noisy sampling conditions. We further validate this approach using human annotations collected from Mechanical Turk and show that we can recover the underlying structure of the hidden construct up to bias and scaling factors.

We first pose the Unsupervised Progressive Learning (UPL) problem: learning salient representations from a non-stationary stream of unlabeled data in which the number of object classes increases with time. To solve the UPL problem we propose an architecture that involves a module called Self-Taught Associative Memory (STAM). Layered hierarchies of STAM modules learn based on a combination of online clustering, novelty detection, forgetting outliers, and storing only prototypical representations rather than specific examples. We evaluate STAM representations using clustering and classification tasks, relying on limited labeled data for the latter. Even though there are no prior approaches that are directly applicable to the UPL problem, we compare the STAM architecture to a couple of unsupervised and self-supervised deep learning approaches adapted in the UPL context.

Community-based Question and Answering (CQA) platforms are nowadays enlightening over a billion people with crowdsourced knowledge. A key design issue in CQA platforms is how to find the potential answerers and to provide the askers timely and suitable answers, i.e., the so-called \textit{question routing} problem. State-of-the-art approaches often rely on extracting topics from the question texts. In this work, we analyze the question routing problem in a CQA system named Farm-Doctor that is dedicated to agricultural knowledge. The major challenge is that its questions contain limited textual information.

To this end, we conduct an extensive measurement and obtain the whole knowledge repository of Farm-Doctor, which consists of over 690 thousand questions and over 3 million answers. To remedy the text deficiency, we model Farm-Doctor as a heterogeneous information network that incorporates rich side information, and, based on network representation learning models, we accurately recommend for each question the users that are highly likely to answer it. With an average income of fewer than 6 dollars a day, over 300 thousand farmers in China seek agricultural advice online in Farm-Doctor. Our method helps these less eloquent farmers with their cultivation and hopefully provides a way to improve their lives.

Today's multiagent systems have grown too complex to rely on centralized controllers, prompting increasing interest in the design of distributed algorithms. In this respect, game theory has emerged as a valuable tool to complement more traditional techniques. The fundamental idea behind this approach is the assignment of agents' local cost functions, such that their selfish minimization attains, or is provably close to, the global objective. Any algorithm capable of computing an equilibrium of the corresponding game inherits an approximation ratio that is, in the worst case, equal to its price-of-anarchy. Therefore, a successful application of the game design approach hinges on the possibility to quantify and optimize the equilibrium performance.

Toward this end, we introduce the notion of generalized smoothness, and show that the resulting efficiency bounds are significantly tighter compared to those obtained using the traditional smoothness approach. Leveraging this newly-introduced notion, we quantify the equilibrium performance for the class of local resource allocation games. Finally, we show how the agents' local decision rules can be designed in order to optimize the efficiency of the corresponding equilibria, by means of a tractable linear program.

In this paper, we propose a novel control architecture, inspired by neuroscience, for adaptive control of continuous-time systems. A key objective explored in this paper is to design control architectures and algorithms that can learn and adapt quickly to changes, even abrupt ones. The proposed architecture, in the setting of standard neural network (NN) based adaptive control, augments the NN with an {\it external working memory}. Through a write operation, the external working memory stores recently observed feature vectors from the hidden layer of the NN, and it can quickly update this information if it becomes less relevant after abrupt changes. The controller modifies the final control signal by retrieving information from the working memory. The use of external working memory is aimed at improving the context, thereby inducing the learning system to search in a particular direction. This directed learning allows the learning system to quickly find a good approximation of the unknown function even after abrupt changes. We consider two classes of controllers for concrete development of our ideas: (i) a model-reference NN adaptive controller for linear systems with matched uncertainty and (ii) a robot arm controller. We prove that the resulting controllers lead to Uniformly Ultimately Bounded (UUB) stable closed-loop systems. We provide a detailed illustration of the working of this learning mechanism through a simple example. For a simpler estimation problem, we also prove that memory augmentation improves estimation in fewer iterations. Through extensive simulations and specific metrics, we show that memory augmentation significantly improves learning even when the system undergoes sudden changes.

The goal of this study is to introduce a comprehensive gait database of 93 human subjects who walked between two endpoints during two different sessions, with gait data recorded by two smartphones, one attached to the right thigh and the other to the left side of the waist. The data were collected with the intention of being used by deep learning-based methods, which require a sufficient number of time points. Metadata including age, gender, smoking habits, daily exercise time, height, and weight of each individual are recorded. The data set is publicly available.

General matrix-matrix multiplications with double-precision real and complex entries (DGEMM and ZGEMM) in vendor-supplied BLAS libraries are best optimized for square matrices but often show bad performance for tall & skinny matrices, which are much taller than wide. NVIDIA's current CUBLAS implementation delivers only a fraction of the potential performance as indicated by the roofline model in this case. We describe the challenges and key characteristics of an implementation that can achieve close to optimal performance. We further evaluate different strategies of parallelization and thread distribution, and devise a flexible, configurable mapping scheme. To ensure flexibility and allow for highly tailored implementations we use code generation combined with autotuning. For a large range of matrix sizes in the domain of interest we achieve at least 2/3 of the roofline performance and often substantially outperform state-of-the-art CUBLAS results on an NVIDIA Volta GPGPU.

We present fully polynomial approximation schemes for a broad class of Holant problems with complex edge weights, which we call Holant polynomials. We transform these problems into partition functions of abstract combinatorial structures known as polymers in statistical physics. Our method involves establishing zero-free regions for the partition functions of polymer models and using the most significant terms of the cluster expansion to approximate them.

Results of our technique include new approximation and sampling algorithms for a diverse class of Holant polynomials in the low-temperature regime and approximation algorithms for general Holant problems with small signature weights. Additionally, we give randomised approximation and sampling algorithms with faster running times for more restrictive classes. Finally, we improve the known zero-free regions for a perfect matching polynomial.

We develop a simple and efficient algorithm for approximating the John Ellipsoid of a symmetric polytope. Our algorithm is near optimal in the sense that our time complexity matches the current best verification algorithm. We also provide the MATLAB code for further research.
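As background, a simple fixed-point iteration for the John ellipsoid of a symmetric polytope $P = \{x : |a_i^\top x| \le 1\}$ can be sketched as follows; this is an illustrative sketch in the spirit of such algorithms, not the paper's exact method or its MATLAB code.

```python
import numpy as np

# Fixed-point sketch: seek weights w (summing to d) such that
# E = {x : x^T (A^T diag(w) A) x <= 1} approximates the John ellipsoid
# of the symmetric polytope {x : |a_i . x| <= 1}. The update multiplies
# each weight by a leverage-score-like quantity a_i^T Q a_i and provably
# preserves sum(w) = d, since sum_i w_i a_i^T Q a_i = tr(Q A^T diag(w) A) = d.

def john_ellipsoid_weights(A: np.ndarray, iters: int = 20) -> np.ndarray:
    m, d = A.shape
    w = np.full(m, d / m)                       # uniform start, sums to d
    for _ in range(iters):
        Q = np.linalg.inv(A.T @ (w[:, None] * A))
        w = w * np.einsum('ij,jk,ik->i', A, Q, A)   # w_i <- w_i * a_i^T Q a_i
    return w

# Unit square described with a redundant constraint (row [1,0] repeated):
# the iteration splits the weight evenly between the duplicate rows.
A = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]])
w = john_ellipsoid_weights(A)
print(np.round(w, 6))  # -> [0.5 1.  0.5]
```

For the unit square the John ellipsoid is the unit disk, which the converged weights recover: $A^\top \mathrm{diag}(w) A = I$.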

A/B testing is of central importance in industry, especially in the technology sector, where companies run long sequences of A/B tests to optimize their products. Since the space of potential innovations is typically vast, the experimenter must make quick and good decisions without wasting too much time on a single A/B test in the sequence. In particular, discarding an innovation with a small benefit might be better in the long run than using many samples to precisely determine its value. In this work, we introduce a performance measure that captures this idea and design an efficient algorithm that performs almost as well as the best A/B strategy in a given set. As it turns out, a key technical difficulty that significantly affects the learning rates is the hardness of obtaining unbiased estimates of the strategy rewards.

We introduce a probabilistic approach to unify deep continual learning with open set recognition, based on variational Bayesian inference. Our single model combines a joint probabilistic encoder with a generative model and a linear classifier that get shared across sequentially arriving tasks. In order to successfully distinguish unseen unknown data from trained known tasks, we propose to bound the class-specific approximate posterior by fitting regions of high density on the basis of correctly classified data points. These bounds are further used to significantly alleviate catastrophic forgetting by avoiding samples from low-density areas in generative replay. Our approach requires neither storage of old data nor upfront knowledge of future data, and is empirically validated on visual and audio tasks in class-incremental, as well as cross-dataset scenarios across modalities.

Explaining decisions of deep neural networks is a hot research topic with applications in medical imaging, video surveillance, and self-driving cars. Many methods have been proposed in the literature to explain these decisions by identifying the relevance of different pixels, which limits the types of explanations possible. In this paper, we propose a method that can generate contrastive explanations for such data: we not only highlight aspects that are in themselves sufficient to justify the classification by the deep model, but also new aspects which, if added, would change the classification. In order to move beyond the limitations of previous explanations, our key contribution is how we define "addition" for such rich data in a formal yet humanly interpretable way that leads to meaningful results. This was one of the open questions laid out in Dhurandhar et al. (2018) [6], which proposed a general framework for creating (local) contrastive explanations for deep models but is limited to simple use cases such as black/white images. We showcase the efficacy of our approach on three diverse image data sets (faces, skin lesions, and fashion apparel) in creating intuitive explanations that are also quantitatively superior to other state-of-the-art interpretability methods. A thorough user study with 200 individuals asks how well the various methods are understood by humans and demonstrates which aspects of contrastive explanations are most desirable.

Influential recent work by Jacot et al. (2018) has shown that training a neural network of any kind with gradient descent in parameter space is strongly related to kernel gradient descent in function space with respect to the Neural Tangent Kernel (NTK). Lee et al. (2019) built on this result by establishing that the output of a neural network trained using gradient descent can be approximated by a linear model for wide networks. In parallel, a recent line of studies (Schoenholz et al. (2017), Hayou et al. (2019)) has suggested that a special initialization known as the Edge of Chaos improves training. In this paper, we bridge the gap between these two concepts by quantifying the impact of the initialization and the activation function on the NTK when the network depth becomes large. We provide experiments illustrating our theoretical results.
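For reference, the Neural Tangent Kernel of a network $f(x;\theta)$ is defined (Jacot et al., 2018) as the Gram matrix of parameter gradients, and it governs the training dynamics under gradient flow:

```latex
\Theta(x, x') \;=\; \big\langle \nabla_\theta f(x;\theta),\, \nabla_\theta f(x';\theta) \big\rangle
\;=\; \sum_{p} \partial_{\theta_p} f(x;\theta)\, \partial_{\theta_p} f(x';\theta),
\qquad
\frac{\partial f_t(x)}{\partial t} \;=\; -\sum_{i=1}^{n} \Theta_t(x, x_i)\, \frac{\partial L}{\partial f_t(x_i)} .
```

In the infinite-width limit $\Theta_t$ stays fixed at its initialization value, which is why initialization and activation choices propagate directly into the kernel studied here.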

We study gradient compression methods to alleviate the communication bottleneck in data-parallel distributed optimization. Despite the significant attention received, current compression schemes either do not scale well or fail to achieve the target test accuracy. We propose a new low-rank gradient compressor based on power iteration that can i) compress gradients rapidly, ii) efficiently aggregate the compressed gradients using all-reduce, and iii) achieve test performance on par with SGD. The proposed algorithm is the only method evaluated that achieves consistent wall-clock speedups when benchmarked against regular SGD with an optimized communication backend. We demonstrate reduced training times for convolutional networks as well as LSTMs on common datasets. Our code is available at https://github.com/epfml/powersgd.
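A minimal sketch of the power-iteration compressor idea is below. It shows only the rank-1 core; the actual PowerSGD algorithm adds warm-started factors, error feedback, and all-reduce aggregation across workers.

```python
import numpy as np

# Rank-1 power-iteration gradient compressor, in the spirit of PowerSGD.
# A gradient reshaped to an (n x m) matrix M is compressed to two vectors
# p (n,) and q (m,), i.e. n*m values become n + m values.

def compress_rank1(M: np.ndarray, q: np.ndarray):
    """One power-iteration step starting from right factor q."""
    p = M @ q
    p /= np.linalg.norm(p) + 1e-12      # normalize the left factor
    q = M.T @ p                          # best right factor for this p
    return p, q

def decompress_rank1(p: np.ndarray, q: np.ndarray) -> np.ndarray:
    return np.outer(p, q)

rng = np.random.default_rng(0)
# On an exactly rank-1 "gradient", a single step recovers it perfectly.
u, v = rng.normal(size=8), rng.normal(size=5)
M = np.outer(u, v)
p, q = compress_rank1(M, rng.normal(size=5))
print(np.allclose(decompress_rank1(p, q), M))  # -> True
```

Because `p` and `q` are linear in `M`, compressed gradients from different workers can be summed by all-reduce before decompression, which is the property that makes the scheme scale.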

We propose a novel score-based approach to learning a directed acyclic graph (DAG) from observational data. We adapt a recently proposed continuous constrained optimization formulation to allow for nonlinear relationships between variables using neural networks. This extension makes it possible to model complex interactions while avoiding the combinatorial nature of the problem. In addition to comparing our method to existing continuous optimization methods, we provide missing empirical comparisons to nonlinear greedy search methods. On both synthetic and real-world data sets, this new method outperforms current continuous methods on most tasks, while being competitive with existing greedy search methods on important metrics for causal inference.
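The "continuous constrained optimization formulation" referenced here is the NOTEARS characterization of acyclicity (Zheng et al., 2018). A sketch of its polynomial variant, which replaces the combinatorial DAG constraint with a smooth function that vanishes exactly on DAGs:

```python
import numpy as np

# Continuous acyclicity constraint (polynomial variant of NOTEARS):
# h(W) = tr((I + W*W/d)^d) - d, where W*W is the elementwise square.
# Powers of the non-negative matrix W*W count weighted closed walks,
# so h(W) = 0 iff the weighted adjacency matrix W encodes a DAG.

def acyclicity(W: np.ndarray) -> float:
    d = W.shape[0]
    M = np.eye(d) + W * W / d            # elementwise square removes signs
    return float(np.trace(np.linalg.matrix_power(M, d)) - d)

dag = np.array([[0.0, 1.0], [0.0, 0.0]])   # single edge 0 -> 1: acyclic
cyc = np.array([[0.0, 1.0], [1.0, 0.0]])   # 0 <-> 1: a 2-cycle
print(acyclicity(dag))  # -> 0.0
print(acyclicity(cyc))  # -> 0.5
```

In score-based learning, `acyclicity(W)` is driven to zero with an augmented Lagrangian while a fit score is minimized; the nonlinear extension in this paper replaces linear edge weights with neural-network functions but keeps a constraint of this form.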

We improve the over-parametrization bounds of two beautiful results, [Li and Liang, 2018] and [Du, Zhai, Poczos and Singh, 2019], in deep learning theory.

Grounding referring expressions in images aims to locate the object instance in an image described by a referring expression. It involves a joint understanding of natural language and image content, and is essential for a range of visual tasks related to human-computer interaction. As a language-to-vision matching task, the core of this problem is to not only extract all the necessary information (i.e., objects and the relationships among them) in both the image and referring expression, but also make full use of context information to align cross-modal semantic concepts in the extracted information. Unfortunately, existing work on grounding referring expressions fails to accurately extract multi-order relationships from the referring expression and associate them with the objects and their related contexts in the image. In this paper, we propose a Cross-Modal Relationship Extractor (CMRE) to adaptively highlight objects and relationships (spatial and semantic relations) related to the given expression with a cross-modal attention mechanism, and represent the extracted information as a language-guided visual relation graph. In addition, we propose a Gated Graph Convolutional Network (GGCN) to compute multimodal semantic contexts by fusing information from different modes and propagating multimodal information in the structured relation graph. Experimental results on three common benchmark datasets show that our Cross-Modal Relationship Inference Network, which consists of CMRE and GGCN, significantly surpasses all existing state-of-the-art methods.

Learning to disentangle the hidden factors of variations within a set of observations is a key task for artificial intelligence. We present a unified formulation for class and content disentanglement and use it to illustrate the limitations of current methods. We therefore introduce LORD, a novel method based on Latent Optimization for Representation Disentanglement. We find that latent optimization, along with an asymmetric noise regularization, is superior to amortized inference for achieving disentangled representations. In extensive experiments, our method is shown to achieve better disentanglement performance than both adversarial and non-adversarial methods that use the same level of supervision. We further introduce a clustering-based approach for extending our method to settings that exhibit in-class variation, with promising results on the task of domain translation.

Data selection methods, such as active learning and core-set selection, are useful tools for machine learning on large datasets. However, they can be prohibitively expensive to apply in deep learning because they depend on feature representations that need to be learned. In this work, we show that we can greatly improve the computational efficiency by using a small proxy model to perform data selection (e.g., selecting data points to label for active learning). By removing hidden layers from the target model, using smaller architectures, and training for fewer epochs, we create proxies that are an order of magnitude faster to train. Although these small proxy models have higher error rates, we find that they empirically provide useful signals for data selection. We evaluate this "selection via proxy" (SVP) approach on several data selection tasks across five datasets: CIFAR10, CIFAR100, ImageNet, Amazon Review Polarity, and Amazon Review Full. For active learning, applying SVP can give an order of magnitude improvement in data selection runtime (i.e., the time it takes to repeatedly train and select points) without significantly increasing the final error (often within 0.1%). For core-set selection on CIFAR10, proxies that are over 10x faster to train than their larger, more accurate targets can remove up to 50% of the data without harming the final accuracy of the target, leading to a 1.6x end-to-end training time improvement.

We provide a discussion of several recent results which, in certain scenarios, are able to overcome a barrier in distributed stochastic optimization for machine learning. Our focus is the so-called asymptotic network independence property, which is achieved whenever a distributed method executed over a network of n nodes asymptotically converges to the optimal solution at a comparable rate to a centralized method with the same computational power as the entire network. We explain this property through an example involving the training of ML models and sketch a short mathematical analysis for comparing the performance of distributed stochastic gradient descent (DSGD) with centralized stochastic gradient descent (SGD).
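A toy version of the DSGD scheme discussed here can be sketched in a few lines: each node alternates a gossip (consensus) step with a local gradient step. The quadratic objective, gossip matrix, and step size below are illustrative choices, not the survey's analysis.

```python
import numpy as np

# Toy DSGD on f(x) = (1/n) sum_i (x - c_i)^2 / 2, where node i only
# knows its own target c[i]. W is a doubly stochastic gossip matrix.

def dsgd(c: np.ndarray, W: np.ndarray, steps: int = 200, lr: float = 0.1):
    x = np.zeros_like(c)                 # one iterate per node
    for _ in range(steps):
        x = W @ x - lr * (x - c)         # consensus step + local gradient step
    return x

n = 4
c = np.array([1.0, 2.0, 3.0, 4.0])
W = np.full((n, n), 1.0 / n)             # complete graph: fastest mixing
x = dsgd(c, W)
# Nodes cluster around the global minimizer mean(c) = 2.5; a constant
# step size leaves a small residual disagreement between nodes.
print(np.round(x, 3))
```

Replacing the complete-graph `W` with a sparser doubly stochastic matrix slows consensus, which is exactly the network-dependent transient that the asymptotic network independence property says eventually washes out.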

We propose a fast, model agnostic method for finding interpretable counterfactual explanations of classifier predictions by using class prototypes. We show that class prototypes, obtained using either an encoder or through class-specific k-d trees, significantly speed up the search for counterfactual instances and result in more interpretable explanations. We introduce two novel metrics to quantitatively evaluate local interpretability at the instance level. We use these metrics to illustrate the effectiveness of our method on an image dataset and a tabular dataset, respectively MNIST and Breast Cancer Wisconsin (Diagnostic). The method also eliminates the computational bottleneck that arises because of numerical gradient evaluation for $\textit{black box}$ models.
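The prototype idea can be sketched as follows: build one k-d tree per class over training points and pull the counterfactual search toward the nearest point of the target class. The data, the single-step "search", and all names here are illustrative, not the paper's implementation.

```python
import numpy as np
from scipy.spatial import cKDTree

# One k-d tree per class; the prototype of class k for input x is (in
# this sketch) the nearest class-k training point, and the counterfactual
# is nudged toward it instead of searching blindly in input space.

rng = np.random.default_rng(1)
X0 = rng.normal(loc=-2.0, size=(50, 2))   # class-0 cluster
X1 = rng.normal(loc=+2.0, size=(50, 2))   # class-1 cluster
data = {0: X0, 1: X1}
trees = {k: cKDTree(v) for k, v in data.items()}

def prototype(x: np.ndarray, target_class: int) -> np.ndarray:
    _, idx = trees[target_class].query(x)  # nearest neighbour in the class
    return data[target_class][idx]

x = np.array([-2.0, -2.0])                 # an instance of class 0
proto = prototype(x, target_class=1)
x_cf = x + 0.5 * (proto - x)               # one step toward the prototype
print(np.linalg.norm(x_cf - proto) < np.linalg.norm(x - proto))  # -> True
```

Because the tree query needs no model gradients, this kind of guidance sidesteps the numerical-gradient bottleneck mentioned above for black-box models.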

Adversarial examples raise questions about whether neural network models are sensitive to the same visual features as humans. In this paper, we first detect adversarial examples or otherwise corrupted images based on a class-conditional reconstruction of the input. To specifically attack our detection mechanism, we propose the Reconstructive Attack, which seeks both to cause a misclassification and a low reconstruction error. This reconstructive attack produces undetected adversarial examples, but with a much lower success rate. Among all these attacks, we find that CapsNets always perform better than convolutional networks. Then, we diagnose the adversarial examples for CapsNets and find that the success of the reconstructive attack is highly related to the visual similarity between the source and target class. Additionally, the resulting perturbations can cause the input image to appear visually more like the target class and hence become non-adversarial. This suggests that CapsNets use features that are more aligned with human perception and have the potential to address the central issue raised by adversarial examples.

In this work the problem of path planning for an autonomous vehicle that moves on a freeway is considered. The most common approaches that are used to address this problem are based on optimal control methods, which make assumptions about the model of the environment and the system dynamics. On the contrary, this work proposes the development of a driving policy based on reinforcement learning. In this way, the proposed driving policy makes minimal or no assumptions about the environment, since a priori knowledge about the system dynamics is not required. Driving scenarios where the road is occupied both by autonomous and manual driving vehicles are considered. To the best of our knowledge, this is one of the first approaches that propose a reinforcement learning driving policy for mixed driving environments. The derived reinforcement learning policy, firstly, is compared against an optimal policy derived via dynamic programming, and, secondly, its efficiency is evaluated under realistic scenarios generated by the established SUMO microscopic traffic flow simulator. Finally, some initial results regarding the effect of autonomous vehicles' behavior on the overall traffic flow are presented.

Consider a graph problem that is locally checkable but not locally solvable: given a solution we can check that it is feasible by verifying all constant-radius neighborhoods, but to find a solution each node needs to explore the input graph at least up to distance $\Omega(\log n)$ in order to produce its output. We consider the complexity of such problems from the perspective of volume: how large a subgraph does a node need to see in order to produce its output. We study locally checkable graph problems on bounded-degree graphs. We give a number of constructions that exhibit tradeoffs between deterministic distance, randomized distance, deterministic volume, and randomized volume:

- If the deterministic distance is linear, it is also known that randomized distance is near-linear. In contrast, we show that there are problems with linear deterministic volume but only logarithmic randomized volume.

- We prove a volume hierarchy theorem for randomized complexity: among problems with linear deterministic volume complexity, there are infinitely many distinct randomized volume complexity classes between $\Omega(\log n)$ and $O(n)$. This hierarchy persists even when restricting to problems whose randomized and deterministic distance complexities are $\Theta(\log n)$.

- Similar hierarchies exist for polynomial distance complexities: for any $k, \ell \in N$ with $k \leq \ell$, there are problems whose randomized and deterministic distance complexities are $\Theta(n^{1/\ell})$, randomized volume complexities are $\Theta(n^{1/k})$, and whose deterministic volume complexities are $\Theta(n)$.

Additionally, we consider connections between our volume model and massively parallel computation (MPC). We give a general simulation argument that any volume-efficient algorithm can be transformed into a space-efficient MPC algorithm.

A serious challenge when finding influential actors in real-world social networks is the lack of knowledge about the structure of the underlying network. Current state-of-the-art methods rely on hand-crafted sampling algorithms; these methods sample nodes and their neighbours in a carefully constructed order and choose opinion leaders from this discovered network to maximize influence spread in the (unknown) complete network. In this work, we propose a reinforcement learning framework for network discovery that automatically learns useful node and graph representations that encode important structural properties of the network. At training time, the method identifies portions of the network such that the nodes selected from this sampled subgraph can effectively influence nodes in the complete network. The realization of such transferable network structure based adaptable policies is attributed to the meticulous design of the framework that encodes relevant node and graph signatures driven by an appropriate reward scheme. We experiment with real-world social networks from four different domains and show that the policies learned by our RL agent provide a 10-36% improvement over the current state-of-the-art method.

The densest subgraph problem, introduced in the 80s by Picard and Queyranne as well as Goldberg, is a classic problem in combinatorial optimization with a wide range of applications. The lowest outdegree orientation problem is known to be its dual problem. We study both the problem of finding dense subgraphs and the problem of computing a low outdegree orientation in the distributed setting.

Suppose $G=(V,E)$ is the underlying network as well as the input graph. Let $D$ denote the density of the maximum density subgraph of $G$. Our main results are as follows.

Given a value $\tilde{D} \leq D$ and $0 < \epsilon < 1$, we show that a subgraph with density at least $(1-\epsilon)\tilde{D}$ can be identified deterministically in $O((\log n) / \epsilon)$ rounds in the LOCAL model. We also present a lower bound showing that our result for the LOCAL model is tight up to an $O(\log n)$ factor.

In the CONGEST model, we show that such a subgraph can be identified in $O((\log^3 n) / \epsilon^3)$ rounds with high probability. Our techniques also lead to an $O(diameter + (\log^4 n)/\epsilon^4)$-round algorithm that yields a $1-\epsilon$ approximation to the densest subgraph. This improves upon the previous $O(diameter /\epsilon \cdot \log n)$-round algorithm by Das Sarma et al. [DISC 2012] that only yields a $1/2-\epsilon$ approximation.

Given an integer $\tilde{D} \geq D$ and $\Omega(1/\tilde{D}) < \epsilon < 1/4$, we give a deterministic, $\tilde{O}((\log^2 n) /\epsilon^2)$-round algorithm in the CONGEST model that computes an orientation where the outdegree of every vertex is upper bounded by $(1+\epsilon)\tilde{D}$. Previously, the best deterministic algorithm and randomized algorithm by Harris [FOCS 2019] run in $\tilde{O}((\log^6 n)/ \epsilon^4)$ rounds and $\tilde{O}((\log^3 n) /\epsilon^3)$ rounds respectively and only work in the LOCAL model.
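For background, the classical sequential baseline that these distributed algorithms parallelize is Charikar's greedy peeling, which 1/2-approximates the maximum density $|E(S)|/|S|$; the sketch below is this sequential algorithm, not the paper's LOCAL/CONGEST procedures.

```python
import heapq
from collections import defaultdict

# Charikar's peeling: repeatedly remove a minimum-degree vertex and keep
# the densest intermediate subgraph. Stale heap entries are skipped
# lazily (standard heapq idiom), giving an O((n + m) log n) sketch.

def densest_subgraph_peel(n, edges):
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v); adj[v].add(u)
    alive, m = set(range(n)), len(edges)
    heap = [(len(adj[v]), v) for v in range(n)]
    heapq.heapify(heap)
    best, best_set = 0.0, set(alive)
    while alive:
        d, v = heapq.heappop(heap)
        if v not in alive or d != len(adj[v]):
            continue                          # stale entry, skip
        density = m / len(alive)
        if density > best:
            best, best_set = density, set(alive)
        alive.remove(v); m -= len(adj[v])
        for u in adj[v]:
            adj[u].remove(v)
            heapq.heappush(heap, (len(adj[u]), u))
        adj[v].clear()
    return best, best_set

# K4 on {0,1,2,3} plus a pendant vertex 4: the densest subgraph is K4
# with density 6/4 = 1.5, beating the whole graph's 7/5 = 1.4.
d, s = densest_subgraph_peel(
    5, [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3), (3, 4)])
print(d, sorted(s))  # -> 1.5 [0, 1, 2, 3]
```

The distributed results above improve on this 1/2 guarantee, reaching $(1-\epsilon)$-approximations in polylogarithmic rounds.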

The multigroup neutron transport criticality calculations using modern supercomputers have been widely employed in nuclear reactor analysis for studying whether or not a system is self-sustaining. However, the design and development of efficient parallel algorithms for the transport criticality calculations is challenging, especially when the number of processor cores is large and an unstructured mesh is adopted. In particular, both the compute time and memory usage have to be carefully taken into consideration due to the dimensionality of the neutron transport equations. In this paper, we study a monolithic multilevel Schwarz preconditioner for the transport criticality calculations based on a nonlinear diffusion acceleration (NDA) method. We propose a monolithic multilevel Schwarz method that is capable of efficiently handling the systems of linear equations for both the transport system and the diffusion system. However, in the multilevel method, algebraically constructing coarse spaces is expensive and often unscalable. We study a subspace-based coarsening algorithm to address such a challenge by exploring the matrix structures of the transport equations and the nonlinear diffusion equations. We numerically demonstrate that the monolithic multilevel preconditioner with the subspace-based coarsening algorithm is twice as fast as that equipped with an unmodified coarsening approach on thousands of processor cores for an unstructured mesh neutron transport problem with billions of unknowns.

Byzantine reliable broadcast is a powerful primitive that allows a set of processes to agree on a message from a designated sender, even if some processes (including the sender) are Byzantine. Existing broadcast protocols for this setting scale poorly, as they typically build on quorum systems with strong intersection guarantees, which results in linear per-process communication and computation complexity.

We generalize the Byzantine reliable broadcast abstraction to the probabilistic setting, allowing each of its properties to be violated with a fixed, arbitrarily small probability. We leverage these relaxed guarantees in a protocol where we replace quorums with stochastic samples. Compared to quorums, samples are significantly smaller in size, leading to a more scalable design. We obtain the first Byzantine reliable broadcast protocol with logarithmic per-process communication and computation complexity.

We conduct a complete and thorough analysis of our protocol, deriving bounds on the probability of each of its properties being compromised. During our analysis, we introduce a novel general technique we call adversary decorators. Adversary decorators allow us to make claims about the optimal strategy of the Byzantine adversary without having to make any additional assumptions. We also introduce Threshold Contagion, a model of message propagation through a system with Byzantine processes. To the best of our knowledge, this is the first formal analysis of a probabilistic broadcast protocol in the Byzantine fault model. We show numerically that practically negligible failure probabilities can be achieved with realistic security parameters.

Stochastic differential equations (SDEs) are widely used to model systems affected by random processes. In general, the analysis of an SDE model requires numerical solutions to be generated many times over multiple parameter combinations. However, this process often requires considerable computational resources to be practicable. Due to the embarrassingly parallel nature of the task, devices such as multi-core processors and graphics processing units (GPUs) can be employed for acceleration.

Here, we present {\bf SODECL} (\url{https://github.com/avramidis/sodecl}), a software library that utilises such devices to calculate multiple orbits of an SDE model. To evaluate the acceleration provided by SODECL, we compared the time required to calculate multiple orbits of an exemplar stochastic model when one CPU core is used, to the time required when using all CPU cores or a GPU. In addition, to assess scalability, we investigated how the model size affected execution time on different parallel compute devices.

Our results show that when using all 32 CPU cores of a high-end high-performance computing node, the task is accelerated by a factor of up to $\simeq$6.7, compared to when using a single CPU core. Executing the task on a high-end GPU yielded accelerations of up to $\simeq$4.5, compared to a single CPU core.
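The per-orbit computation that SODECL parallelizes is ordinary SDE integration; a minimal Euler-Maruyama sketch is given below. This is independent of SODECL's actual API and is only meant to show the kind of kernel that is run many times over parameter combinations.

```python
import numpy as np

# Euler-Maruyama for a scalar SDE dX = f(X) dt + g(X) dW:
# X_{i+1} = X_i + f(X_i) dt + g(X_i) sqrt(dt) * N(0, 1).

def euler_maruyama(f, g, x0, t_end, dt, rng):
    n = round(t_end / dt)
    x = np.empty(n + 1)
    x[0] = x0
    for i in range(n):
        dw = rng.normal(scale=np.sqrt(dt))   # Brownian increment
        x[i + 1] = x[i] + f(x[i]) * dt + g(x[i]) * dw
    return x

# Ornstein-Uhlenbeck process dX = -X dt + 0.3 dW as an exemplar model.
rng = np.random.default_rng(42)
path = euler_maruyama(lambda x: -x, lambda x: 0.3, x0=1.0,
                      t_end=5.0, dt=0.01, rng=rng)
print(len(path))  # -> 501

# Sanity check: with zero noise the scheme reduces to explicit Euler
# for dx/dt = -x, so x(5) should be close to exp(-5).
det = euler_maruyama(lambda x: -x, lambda x: 0.0, 1.0, 5.0, 0.01, rng)
print(abs(det[-1] - np.exp(-5.0)) < 0.01)  # -> True
```

Since each orbit uses an independent noise stream, thousands of such loops are trivially independent, which is the embarrassingly parallel structure the library exploits on CPUs and GPUs.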

We study the algorithmic problem of estimating the mean of heavy-tailed random vector in $\mathbb{R}^d$, given $n$ i.i.d. samples. The goal is to design an efficient estimator that attains the optimal sub-gaussian error bound, only assuming that the random vector has bounded mean and covariance. Polynomial-time solutions to this problem are known but have high runtime due to their use of semi-definite programming (SDP). Conceptually, it remains open whether convex relaxation is truly necessary for this problem.

In this work, we show that it is possible to go beyond SDP and achieve better computational efficiency. In particular, we provide a spectral algorithm that achieves the optimal statistical performance and runs in time $\widetilde O\left(n^2 d \right)$, improving upon the previous fastest runtime $\widetilde O\left(n^{3.5}+ n^2d\right)$ by Cherapanamjeri el al. (COLT '19). Our algorithm is spectral in that it only requires (approximate) eigenvector computations, which can be implemented very efficiently by, for example, power iteration or the Lanczos method.

At the core of our algorithm is a novel connection between the furthest hyperplane problem introduced by Karnin et al. (COLT '12) and a structural lemma on heavy-tailed distributions by Lugosi and Mendelson (Ann. Stat. '19). This allows us to iteratively reduce the estimation error at a geometric rate using only the information derived from the top singular vector of the data matrix, leading to a significantly faster running time.

A fundamental issue in multiscale materials modeling and design is the consideration of traction-separation behavior at the interface. By enriching the deep material network (DMN) with cohesive layers, the paper presents a novel data-driven material model which enables accurate and efficient prediction of multiscale responses for heterogeneous materials with interfacial effect. In the newly invoked cohesive building block, the fitting parameters have physical meanings related to the length scale and orientation of the cohesive layer. It is shown that the enriched material network can be effectively optimized via a multi-stage training strategy, with training data generated only from linear elastic direct numerical simulation (DNS). The extrapolation capability of the method to unknown material and loading spaces is demonstrated through the debonding analysis of a unidirectional fiber-reinforced composite, where the interface behavior is governed by an irreversible softening mixed-mode cohesive law. Its predictive accuracy is validated against the nonlinear path-dependent DNS results, and the reduction in computational time is particularly significant.

In this paper, we explore meta-learning for few-shot text classification. Meta-learning has shown strong performance in computer vision, where low-level patterns are transferable across learning tasks. However, directly applying this approach to text is challenging--lexical features highly informative for one task may be insignificant for another. Thus, rather than learning solely from words, our model also leverages their distributional signatures, which encode pertinent word occurrence patterns. Our model is trained within a meta-learning framework to map these signatures into attention scores, which are then used to weight the lexical representations of words. We demonstrate that our model consistently outperforms prototypical networks learned on lexical knowledge (Snell et al., 2017) in both few-shot text classification and relation classification by a significant margin across six benchmark datasets (20.0% on average in 1-shot classification).

We introduce a new pre-trainable generic representation for visual-linguistic tasks, called Visual-Linguistic BERT (VL-BERT for short). VL-BERT adopts the simple yet powerful Transformer model as the backbone, and extends it to take both visual and linguistic embedded features as input. In it, each element of the input is either a word from the input sentence or a region-of-interest (RoI) from the input image. It is designed to fit most visual-linguistic downstream tasks. To better exploit the generic representation, we pre-train VL-BERT on the massive-scale Conceptual Captions dataset, together with text-only corpora. Extensive empirical analysis demonstrates that the pre-training procedure can better align the visual-linguistic clues and benefit downstream tasks such as visual commonsense reasoning, visual question answering and referring expression comprehension. It is worth noting that VL-BERT achieved first place for single models on the leaderboard of the VCR benchmark. Code is released at \url{https://github.com/jackroos/VL-BERT}.

When an agent acquires new information, ideally it would immediately be capable of using that information to understand its environment. This is not possible using conventional deep neural networks, which suffer from catastrophic forgetting when they are incrementally updated, with new knowledge overwriting established representations. A variety of approaches have been developed that attempt to mitigate catastrophic forgetting in the incremental batch learning scenario, where a model learns from a series of large collections of labeled samples. However, in this setting, inference is only possible after a batch has been accumulated, which prohibits many applications. An alternative paradigm is online learning in a single pass through the training dataset on a resource constrained budget, which is known as streaming learning. Streaming learning has been much less studied in the deep learning community. In streaming learning, an agent learns instances one-by-one and can be tested at any time, rather than only after learning a large batch. Here, we revisit streaming linear discriminant analysis, which has been widely used in the data mining research community. By combining streaming linear discriminant analysis with deep learning, we are able to outperform both incremental batch learning and streaming learning algorithms on both ImageNet ILSVRC-2012 and CORe50, a dataset that involves learning to classify from temporally ordered samples.
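The streaming LDA idea can be sketched as follows: maintain running class means and a shared running covariance, update them one sample at a time, and classify with the standard LDA discriminant. The shrinkage value and data are illustrative; the paper's deep variant applies this on top of a fixed feature extractor.

```python
import numpy as np

# Streaming LDA sketch: O(d^2) state, O(1) samples in memory, can be
# queried at any time, matching the streaming-learning setting above.

class StreamingLDA:
    def __init__(self, dim, n_classes, shrink=1e-2):
        self.mu = np.zeros((n_classes, dim))   # running class means
        self.count = np.zeros(n_classes)
        self.cov = np.eye(dim) * shrink        # shared covariance (shrunk)
        self.seen = 0

    def fit_one(self, x, y):
        self.seen += 1
        if self.count[y] > 0:                  # need a mean before deviations
            delta = x - self.mu[y]
            self.cov += (np.outer(delta, delta) - self.cov) / self.seen
        self.count[y] += 1
        self.mu[y] += (x - self.mu[y]) / self.count[y]

    def predict(self, x):
        W = np.linalg.solve(self.cov, self.mu.T).T   # rows: Sigma^-1 mu_k
        b = -0.5 * np.einsum('kd,kd->k', W, self.mu)
        return int(np.argmax(W @ x + b))             # LDA discriminant

rng = np.random.default_rng(0)
clf = StreamingLDA(dim=2, n_classes=2)
for _ in range(500):                           # one pass, one sample at a time
    y = int(rng.integers(2))
    x = rng.normal(size=2) + (4.0 if y else -4.0)
    clf.fit_one(x, y)
print(clf.predict(np.array([3.5, 3.5])), clf.predict(np.array([-3.5, -3.5])))
```

Since only the means and covariance are updated, old samples never need to be replayed, which is how this scheme sidesteps catastrophic forgetting for the classifier head.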

Machine-learning (ML) algorithms or models, especially deep neural networks (DNNs), have shown significant promise in several areas. However, researchers have recently demonstrated that ML algorithms, especially DNNs, are vulnerable to adversarial examples (slightly perturbed samples that cause misclassification). The existence of adversarial examples has hindered the deployment of ML algorithms in safety-critical sectors, such as security. Several defenses for adversarial examples exist in the literature. One of the important classes of defenses is manifold-based defenses, where a sample is "pulled back" into the data manifold before classifying. These defenses rely on the assumption that data lie on a manifold of lower dimension than the input space, and they use a generative model to approximate the input distribution. In this paper, we investigate the following question: do the generative models used in manifold-based defenses need to be topology-aware? We suggest the answer is yes, and we provide theoretical and empirical evidence to support our claim.

As the use of black-box models becomes ubiquitous in high stake decision-making systems, demands for fair and interpretable models are increasing. While it has been shown that interpretable models can be as accurate as black-box models in several critical domains, existing fair classification techniques that are interpretable by design often display poor accuracy/fairness tradeoffs in comparison with their non-interpretable counterparts. In this paper, we propose FairCORELS, a fair classification technique interpretable by design, whose objective is to learn fair rule lists. Our solution is a multi-objective variant of CORELS, a branch-and-bound algorithm to learn rule lists, that supports several statistical notions of fairness. Examples of such measures include statistical parity, equal opportunity and equalized odds. The empirical evaluation of FairCORELS on real-world datasets demonstrates that it outperforms state-of-the-art fair classification techniques that are interpretable by design while being competitive with non-interpretable ones.

We solve an open question in code-based cryptography by introducing two provably secure group signature schemes from code-based assumptions. Our basic scheme satisfies the CPA-anonymity and traceability requirements in the random oracle model, assuming the hardness of the McEliece problem, the Learning Parity with Noise problem, and a variant of the Syndrome Decoding problem. The construction produces smaller key and signature sizes than the previous group signature schemes from lattices, as long as the cardinality of the underlying group does not exceed $2^{24}$, which is roughly comparable to the current population of the Netherlands. We develop the basic scheme further to achieve the strongest anonymity notion, i.e., CCA-anonymity, with a small overhead in terms of efficiency. The feasibility of the two proposed schemes is supported by implementation results. Our two schemes are the first in their respective classes of provably secure group signature schemes. Additionally, the techniques introduced in this work might be of independent interest: a new verifiable encryption protocol for the randomized McEliece encryption scheme and a novel approach to designing formal security reductions from the Syndrome Decoding problem.

Many relevant applications in the environmental and socioeconomic sciences use areal data, such as biodiversity checklists, agricultural statistics, or socioeconomic surveys. For applications that surpass the spatial, temporal, or thematic scope of any single data source, data must be integrated from several heterogeneous sources. Inconsistent concepts, definitions, or messy data tables make this a tedious and error-prone process. To date, a dedicated tool for organising areal data is still lacking. Here, we introduce the R package \texttt{arealDB}, which integrates heterogeneous areal data and associated geometries into a consistent database. It is useful for harmonising the language and semantics of variables, relating data to geometries, and documenting metadata and provenance. We illustrate the functionality by integrating two disparate datasets (Brazil, USA) on the harvested area of soybean. The easy-to-use tools in \texttt{arealDB} promise both quality improvements to downstream scientific, monitoring, and management applications and substantial time savings in database collation efforts.

In this work we present a monocular visual odometry (VO) algorithm which leverages geometry-based methods and deep learning. Most existing VO/SLAM systems with superior performance are based on geometry and have to be carefully designed for different application scenarios. Moreover, most monocular systems suffer from the scale-drift issue. Some recent deep learning works learn VO in an end-to-end manner, but the performance of these deep systems is still not comparable to that of geometry-based methods. In this work, we revisit the basics of VO and explore the right way to integrate deep learning with epipolar geometry and the Perspective-n-Point (PnP) method. Specifically, we train two convolutional neural networks (CNNs) for estimating single-view depths and two-view optical flows as intermediate outputs. With the deep predictions, we design a simple but robust frame-to-frame VO algorithm (DF-VO) which outperforms pure deep learning-based and geometry-based methods. More importantly, our system does not suffer from the scale-drift issue, being aided by a scale-consistent single-view depth CNN. Extensive experiments on the KITTI dataset show the robustness of our system, and a detailed ablation study shows the effect of different factors in our system.

Today's deep neural networks require substantial computation resources for their training, storage, and inference, which limits their effective use on resource-constrained devices. Many recent research activities explore different options for compressing and optimizing deep models. On the one hand, in many real-world applications, we face the data imbalance challenge, i.e., the number of labeled instances of one class considerably outweighs the number of labeled instances of the other class. On the other hand, applications may pose a class imbalance problem, i.e., a higher number of false positives produced when training a model and optimizing its performance may be tolerable, yet the number of false negatives must stay low. The problem originates from the fact that some classes are more important for the application than others, e.g., detection problems in the medical and surveillance domains. Motivated by the success of the lottery ticket hypothesis, in this paper we propose an iterative deep model compression technique that keeps the number of false negatives of the compressed model close to that of the original model, at the price of increasing the number of false positives if necessary. Our experimental evaluation using two benchmark data sets shows that the resulting compressed sub-networks 1) achieve up to 35\% fewer false negatives than the compressed model without class optimization, 2) provide an overall higher AUC-ROC measure, and 3) use up to 99\% fewer parameters compared to the original network.

A decision list is an ordered list of rules. Each rule is specified by a term, which is a conjunction of literals, and a value. Given an input, the output of a decision list is the value corresponding to the first rule whose term is satisfied by the input. Decision lists generalize both CNFs and DNFs, and have been studied both in complexity theory and in learning theory.

The size of a decision list is the number of rules, and its width is the maximal number of variables in a term. We prove that decision lists of small width can always be approximated by decision lists of small size, where we obtain sharp bounds. This in particular resolves a conjecture of Gopalan, Meka and Reingold (Computational Complexity, 2013) on DNF sparsification.

An ingredient in our proof is a new random restriction lemma, which allows one to analyze how DNFs (and more generally, decision lists) simplify when a small fraction of the variables are fixed. This is in contrast to the more commonly used switching lemma, which requires most of the variables to be fixed.
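
As a concrete picture of a restriction acting on a DNF (a toy encoding of our own, not the paper's machinery): fixing a variable kills every term it falsifies and shrinks every term it partially satisfies:

```python
def restrict_dnf(terms, rho):
    """Apply a restriction rho (partial assignment: var -> bool) to a DNF,
    given as a list of terms, each a dict var -> required bool.
    Returns the simplified DNF, or True if some term becomes identically true."""
    out = []
    for term in terms:
        new = {}
        killed = False
        for v, b in term.items():
            if v in rho:
                if rho[v] != b:
                    killed = True  # a literal is falsified: drop the whole term
                    break
            else:
                new[v] = b  # literal still free: keep it
        if killed:
            continue
        if not new:
            return True  # every literal of the term was satisfied
        out.append(new)
    return out

# (x0 AND NOT x1) OR (x1 AND x2), restricted by fixing x1 = True:
dnf = [{0: True, 1: False}, {1: True, 2: True}]
print(restrict_dnf(dnf, {1: True}))  # -> [{2: True}]: first term dies, second shrinks
```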

We give lower bounds on the complexity of the word problem of certain non-solvable groups: for a large class of non-solvable infinite groups, including in particular free groups, Grigorchuk's group and Thompson's groups, we prove that their word problem is $\mathsf{NC}^1$-hard. For some of these groups (including Grigorchuk's group and Thompson's groups) we prove that the compressed word problem (which is equivalent to the circuit evaluation problem) is $\mathsf{PSPACE}$-complete.

Kernelization is the fundamental notion for polynomial-time data reduction with performance guarantees. Kernelization for weighted problems in particular requires one to also shrink weights. Marx and V\'egh [ACM Trans. Algorithms 2015] and Etscheid et al. [J. Comput. Syst. Sci. 2017] used a technique of Frank and Tardos [Combinatorica 1987] to obtain polynomial-size kernels for weighted problems, mostly with additive goal functions. We lift the technique to linearizable functions, a function type that we introduce and that also contains non-additive functions. Using the lifted technique, we obtain kernelization results for natural problems in graph partitioning, network design, facility location, scheduling, vehicle routing, and computational social choice, thereby improving and generalizing results from the literature.

Graph parameters such as the clique number, the chromatic number, and the independence number are central in many areas, ranging from computer networks to linguistics to computational neuroscience to social networks. In particular, the chromatic number of a graph (i.e., the smallest number of colors needed to color all vertices such that no two adjacent vertices are of the same color) can be applied in solving practical tasks as diverse as pattern matching, scheduling jobs to machines, allocating registers in compiler optimization, and even solving Sudoku puzzles. Typically, however, the underlying graphs are subject to (often minor) changes. To make these applications of graph parameters robust, it is important to know which graphs are stable for them in the sense that adding or deleting single edges or vertices does not change them. We initiate the study of stability of graphs for such parameters in terms of their computational complexity. We show that, for various central graph parameters, the problem of determining whether or not a given graph is stable is complete for $\Theta_2^p$, a well-known complexity class in the second level of the polynomial hierarchy, which is also known as "parallel access to NP."
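
Both notions can be made concrete by brute force on tiny graphs (purely illustrative; the paper studies the complexity of deciding stability, not this algorithm):

```python
from itertools import product

def chromatic_number(n, edges):
    """Smallest k such that vertices 0..n-1 admit a proper k-coloring.
    Brute force; only sensible for tiny graphs."""
    for k in range(1, n + 1):
        for coloring in product(range(k), repeat=n):
            if all(coloring[u] != coloring[v] for u, v in edges):
                return k
    return n

# A triangle needs 3 colors, but deleting any single edge drops this to 2,
# so the triangle is NOT stable for the chromatic number under edge deletion.
triangle = [(0, 1), (1, 2), (0, 2)]
base = chromatic_number(3, triangle)
stable = all(
    chromatic_number(3, [e for e in triangle if e != removed]) == base
    for removed in triangle
)
print(base)    # -> 3
print(stable)  # -> False
```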

Addressing the non-uniform missing mechanism of rating feedback is critical to building a well-performing recommender in real-world systems. To tackle this challenging issue, we first define an ideal loss function that should be optimized to achieve the goal of recommendation. Then, we derive a generalization error bound of the ideal loss that alleviates the variance and misspecification problems of previous propensity-based methods. We further propose a meta-learning method minimizing the bound. Empirical evaluation using real-world datasets validates the theoretical findings and demonstrates the practical advantages of the proposed upper bound minimization approach.

Generating visualizations and interpretations from high-dimensional data is a common problem in many applications. Two key approaches for tackling this problem are clustering and representation learning. On the one hand, there are very performant deep clustering models, such as DEC and IDEC. On the other hand, there are interpretable representation learning techniques, often relying on latent topological structures such as self-organizing maps. However, current methods do not yet successfully combine these two approaches. We present a novel way to fit self-organizing maps with probabilistic cluster assignments, PSOM, a new deep architecture for probabilistic clustering, DPSOM, and its extension to time series data, T-DPSOM. We show that they achieve superior clustering performance compared to current deep clustering methods on static MNIST/Fashion-MNIST data as well as medical time series, while also inducing an interpretable representation. Moreover, on medical time series, T-DPSOM successfully predicts future trajectories in the original data space.

Scalability in terms of object density in a scene is a primary challenge in unsupervised sequential object-oriented representation learning. Most of the previous models have been shown to work only on scenes with a few objects. In this paper, we propose SCALOR, a probabilistic generative model for learning SCALable Object-oriented Representation of a video. With the proposed spatially parallel attention and proposal-rejection mechanisms, SCALOR can deal with orders of magnitude more objects than the previous state-of-the-art models. Additionally, we introduce a background module that allows SCALOR to model complex dynamic backgrounds as well as many foreground objects in the scene. We demonstrate that SCALOR can deal with crowded scenes containing up to a hundred objects while jointly modeling complex dynamic backgrounds. Importantly, SCALOR is the first unsupervised object representation model shown to work for natural scenes containing several tens of moving objects.

It is important to collect credible training samples $(x,y)$ for building data-intensive learning systems (e.g., a deep learning system). In the literature, there is a line of studies on eliciting distributional information from self-interested agents who hold relevant information. Asking people to report a complex distribution $p(x)$, though theoretically viable, is challenging in practice. This is primarily due to the heavy cognitive load required for human agents to reason about and report this high-dimensional information. This paper introduces a deep learning aided method to incentivize credible sample contributions from selfish and rational agents. The challenge in doing so is to design an incentive-compatible score function to score each reported sample, inducing truthful reports instead of arbitrary or even adversarial ones. We show that with accurate estimation of a certain $f$-divergence function we are able to achieve approximate incentive compatibility in eliciting truthful samples. We then present an efficient estimator with theoretical guarantee via studying the variational forms of the $f$-divergence function. Our work complements the literature of information elicitation via introducing \emph{sample elicitation}. We also show a connection between this sample elicitation problem and $f$-GAN, and how this connection can help reconstruct an estimator of the distribution based on collected samples. Thorough numerical experiments are conducted to validate our designed mechanisms.

Learning multilingual representations of text has proven a successful method for many cross-lingual transfer learning tasks. There are two main paradigms for learning such representations: (1) alignment, which maps different independently trained monolingual representations into a shared space, and (2) joint training, which directly learns unified multilingual representations using monolingual and cross-lingual objectives jointly. In this paper, we first conduct direct comparisons of representations learned using both of these methods across diverse cross-lingual tasks. Our empirical results reveal a set of pros and cons for both methods, and show that the relative performance of alignment versus joint training is task-dependent. Stemming from this analysis, we propose a simple and novel framework that combines these two previously mutually-exclusive approaches. Extensive experiments demonstrate that our proposed framework alleviates limitations of both approaches, and outperforms existing methods on the MUSE bilingual lexicon induction (BLI) benchmark. We further show that this framework can generalize to contextualized representations such as Multilingual BERT, and produces state-of-the-art results on the CoNLL cross-lingual NER benchmark.

We present simple and efficient algorithms for the batched stochastic multi-armed bandit and batched stochastic linear bandit problems. We prove bounds for their expected regrets that improve over the best-known regret bounds for any number of batches. In particular, our algorithms in both settings achieve the optimal expected regrets by using only a logarithmic number of batches. We also study the batched adversarial multi-armed bandit problem for the first time and find the optimal regret, up to logarithmic factors, of any algorithm with predetermined batch sizes.
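
The batched constraint means that within a batch every arm pull must be decided before any reward feedback is observed. A toy batched explore-then-commit strategy illustrates the setting (a generic sketch with hypothetical parameters, not the algorithms of this paper):

```python
import random

def batched_etc(arms, n_explore_batches, batch_size, horizon, seed=0):
    """Toy batched explore-then-commit: pull every arm batch_size times per
    exploration batch (decisions fixed before feedback within a batch), then
    commit to the empirical best arm for the remaining rounds.
    arms maps arm name -> Bernoulli reward probability."""
    rng = random.Random(seed)
    pulls = {a: [] for a in arms}
    t = 0
    for _ in range(n_explore_batches):
        for a in arms:                      # schedule is fixed for the batch
            for _ in range(batch_size):
                pulls[a].append(rng.random() < arms[a])  # Bernoulli reward
                t += 1
    best = max(arms, key=lambda a: sum(pulls[a]) / len(pulls[a]))
    reward = sum(sum(p) for p in pulls.values())
    reward += sum(rng.random() < arms[best] for _ in range(horizon - t))
    return best, reward

best, _ = batched_etc({"A": 0.9, "B": 0.1}, n_explore_batches=2,
                      batch_size=20, horizon=200)
print(best)  # "A" (the better arm) with overwhelming probability
```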

We examine the efficiency of Recurrent Neural Networks in forecasting the spatiotemporal dynamics of high dimensional and reduced order complex systems using Reservoir Computing (RC) and Backpropagation through time (BPTT) for gated network architectures. We highlight advantages and limitations of each method and discuss their implementation for parallel computing architectures. We quantify the relative prediction accuracy of these algorithms for the long-term forecasting of chaotic systems, using as benchmarks the Lorenz-96 and the Kuramoto-Sivashinsky (KS) equations. We find that, when the full state dynamics are available for training, RC outperforms BPTT approaches in terms of predictive performance and in capturing the long-term statistics, while at the same time requiring much less training time. However, in the case of reduced order data, large scale RC models can be unstable and more likely than the BPTT algorithms to diverge. In contrast, RNNs trained via BPTT show superior forecasting abilities and capture well the dynamics of reduced order systems. Furthermore, the present study quantifies for the first time the Lyapunov Spectrum of the KS equation with BPTT, achieving similar accuracy as RC. This study establishes that RNNs are a potent computational framework for the learning and forecasting of complex spatiotemporal systems.

A rapid change of channels in high-speed mobile communications will lead to difficulties in channel estimation and tracking but can also provide Doppler diversity. In this paper, the performance of a multiple-input multiple-output system with pilot-assisted repetition coding and spatial multiplexing is studied. With minimum mean square error (MMSE) channel estimation, an equivalent channel model and the corresponding system model are presented. Based on random matrix theory, asymptotic expressions of the normalized achievable sum rate of the linear receivers, such as the maximal ratio combining (MRC) receiver, MMSE receiver and MRC-like receiver, are derived. In addition, according to the symbol error rate of the MRC-like receiver, the maximum normalized Doppler diversity order and the minimum coding gain loss can be achieved when the repetition number and signal-to-noise ratio tend to infinity, and the corresponding conditions are derived. Based on the theoretical results, the impacts of different system configurations and channel parameters on the system performance are demonstrated.

Deployment and operation of autonomous underwater vehicles is expensive and time-consuming. High-quality realistic sonar data simulation could be of benefit to multiple applications, including training of human operators for post-mission analysis, as well as tuning and validation of autonomous target recognition (ATR) systems for underwater vehicles. Producing realistic synthetic sonar imagery is a challenging problem, as the model has to account for specific artefacts of real acoustic sensors, vehicle altitude, and a variety of environmental factors. We propose a novel method for generating realistic-looking sonar side-scans of full-length missions, called Markov Conditional pix2pix (MC-pix2pix). Quantitative assessment results confirm that the quality of the produced data is almost indistinguishable from real data. Furthermore, we show that bootstrapping ATR systems with MC-pix2pix data can improve their performance. Synthetic data is generated 18 times faster than real acquisition speed, with full user control over the topography of the generated data.

We introduce an alternative closed-form lower bound on the Gaussian process ($\mathcal{GP}$) likelihood based on the R\'enyi $\alpha$-divergence. This new lower bound can be viewed as a convex combination of the Nystr\"om approximation and the exact $\mathcal{GP}$. The key advantage of this bound is its capability to control and tune the regularization enforced on the model; it is thus a generalization of traditional variational $\mathcal{GP}$ regression. From a theoretical perspective, we provide the convergence rate and risk bound for inference using our proposed approach. Experiments on real data show that the proposed algorithm may be able to deliver improvement over several $\mathcal{GP}$ inference methods.

A robust observer for performing power system dynamic state estimation (DSE) of a synchronous generator is proposed. The observer is developed using the concept of $\mathcal{L}_{\infty}$ stability for uncertain, nonlinear dynamic generator models. We use this concept to (i) design a simple, scalable, and robust dynamic state estimator and (ii) obtain a performance guarantee on the state estimation error norm relative to the magnitude of uncertainty from unknown generator inputs, and process and measurement noises. Theoretical methods to obtain upper and lower bounds on the estimation error are also provided. Numerical tests validate the performance of the $\mathcal{L}_{\infty}$-based estimator in performing DSE under various scenarios. The case studies reveal that the derived theoretical bounds are valid for a variety of case studies and operating conditions, while yielding better performance than existing power system DSE methods.

Modern graph or network datasets often contain rich structure that goes beyond simple pairwise connections between nodes. This calls for complex representations that can capture, for instance, edges of different types as well as so-called "higher-order interactions" that involve more than two nodes at a time. However, we have fewer rigorous methods that can provide insight from such representations. Here, we develop a computational framework for the problem of clustering hypergraphs with categorical edge labels --- or different interaction types --- where clusters correspond to groups of nodes that frequently participate in the same type of interaction.

Our methodology is based on a combinatorial objective function that is related to correlation clustering on graphs but enables the design of much more efficient algorithms that also seamlessly generalize to hypergraphs. When there are only two label types, our objective can be optimized in polynomial time, using an algorithm based on minimum cuts. Minimizing our objective becomes NP-hard with more than two label types, but we develop fast approximation algorithms based on linear programming relaxations that have theoretical cluster quality guarantees. We demonstrate the efficacy of our algorithms and the scope of the model through problems in edge-label community detection, clustering with temporal data, and exploratory data analysis.

Consequential decision-making incentivizes individuals to strategically adapt their behavior to the specifics of the decision rule. While a long line of work has viewed strategic adaptation as gaming and attempted to mitigate its effects, recent work has instead sought to design classifiers that incentivize individuals to improve a desired quality. Key to both accounts is a cost function that dictates which adaptations are rational to undertake. In this work, we develop a causal framework for strategic adaptation. Our causal perspective clearly distinguishes between gaming and improvement and reveals an important obstacle to incentive design. We prove any procedure for designing classifiers that incentivize improvement must inevitably solve a non-trivial causal inference problem. Moreover, we show a similar result holds for designing cost functions that satisfy the requirements of previous work. With the benefit of hindsight, our results show much of the prior work on strategic classification is causal modeling in disguise.

This extended abstract describes our solution for the Traffic4Cast Challenge 2019. The key problem we addressed is to properly model both low-level (pixel-based) and high-level spatial information while still preserving the temporal relations among the frames. Our approach is inspired by the recent adoption of convolutional features into recurrent neural networks such as LSTMs to jointly capture spatio-temporal dependencies. While this approach has been proven to surpass traditional stacked CNNs (using 2D or 3D kernels) in action recognition, we observe suboptimal performance in the traffic prediction setting. Therefore, we apply a number of adaptations in the frame encoder-decoder layers and in the sampling procedure to better capture the high-resolution trajectories and to increase training efficiency.

The performance of optimizers, particularly in deep learning, depends considerably on their chosen hyperparameter configuration. The efficacy of optimizers is often studied under near-optimal problem-specific hyperparameters, and finding these settings may be prohibitively costly for practitioners. In this work, we argue that a fair assessment of optimizers' performance must take the computational cost of hyperparameter tuning into account, i.e., how easy it is to find good hyperparameter configurations using an automatic hyperparameter search. Evaluating a variety of optimizers on an extensive set of standard datasets and architectures, our results indicate that Adam is the most practical solution, particularly in low-budget scenarios.

Multiple access technology has played an important role in wireless communication over the last decades: it increases the capacity of the channel and allows different users to access the system simultaneously. However, conventional multiple access technology, as originally designed for current human-centric wireless networks, is not scalable for future machine-centric wireless networks.

Massive access (also known as Massive-device Multiple Access, Unsourced Massive Random Access, Massive Connectivity, Massive Machine-type Communication, and Many-Access Channels) exhibits a clean break with current networks by potentially supporting millions of devices in each cellular network. The tremendous growth in the number of connected devices requires a fundamental rethinking of the conventional multiple access technologies in favor of new schemes suited to massive random access. Among the many new challenges arising in this setting, the most relevant are: the fundamental limits of communication from a massive number of bursty devices transmitting simultaneously with short packets, the design of low complexity and energy-efficient massive access coding and communication schemes, efficient methods for the detection of a relatively small number of active users among a large number of potential user devices with sporadic transmission patterns, and the integration of massive access with massive MIMO and other important wireless communication technologies. This paper presents an overview of the concept of massive access wireless communication and of the contemporary research on this important topic.

Generative Adversarial Networks (GANs) are a powerful class of generative models in the deep learning community. Current practice on large-scale GAN training~\citep{brock2018large} utilizes large models and distributed large-batch training strategies, and is implemented on deep learning frameworks (e.g., TensorFlow, PyTorch, etc.) designed in a centralized manner. In the centralized network topology, every worker needs to communicate with the central node. However, when the network bandwidth is low or network latency is high, the performance would be significantly degraded. Despite recent progress on decentralized algorithms for training deep neural networks, it remains unclear whether it is possible to train GANs in a decentralized manner. The main difficulty lies in handling the nonconvex-nonconcave min-max optimization and the decentralized communication simultaneously. In this paper, we address this difficulty by designing the \textbf{first gradient-based decentralized parallel algorithm} which allows workers to have multiple rounds of communications in one iteration and to update the discriminator and generator simultaneously; this design makes it amenable to convergence analysis. Theoretically, our proposed decentralized algorithm is able to solve a class of non-convex non-concave min-max problems with provable non-asymptotic convergence to a first-order stationary point. Experimental results on GANs demonstrate the effectiveness of the proposed algorithm.

Few-shot learning systems for sound event recognition have gained interest since they require only a few examples to adapt to new target classes without fine-tuning. However, such systems have only been applied to chunks of sounds for classification or verification. In this paper, we aim to achieve few-shot detection of rare sound events from query sequences that contain not only the target events but also other events and background noise. It is therefore necessary to prevent false-positive reactions to both the other events and the background noise. We propose metric learning with a background noise class for few-shot detection. The contribution is the explicit inclusion of background noise as an independent class, a suitable loss function that emphasizes this additional class, and a corresponding sampling strategy that assists training. It provides a feature space where the event classes and the background noise class are sufficiently separated. Evaluations on few-shot detection tasks, using DCASE 2017 Task 2 and ESC-50, show that our proposed method outperforms metric learning without the background noise class. The few-shot detection performance is also comparable to that of the DCASE 2017 Task 2 baseline system, which requires a huge amount of annotated audio data.

We propose a new model for supervised learning to rank. In our model, the relevance labels are assumed to follow a categorical distribution whose probabilities are constructed based on a scoring function. We optimize the training objective with respect to the multivariate categorical variables with an unbiased and low-variance gradient estimator. Learning-to-rank methods can generally be categorized into pointwise, pairwise, and listwise approaches. Although our scoring function is pointwise, the proposed framework permits flexibility over the choice of the loss function. In our new model, the loss function need not be differentiable and can either be pointwise or listwise. Our proposed method achieves better or comparable results on two datasets compared with existing pairwise and listwise methods.
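
A minimal sketch of the modeling idea: a pointwise scoring function induces a categorical distribution over relevance labels. The particular form used below (label $k$ scored as $k$ times a linear score, softmax link) is a hypothetical choice for illustration, not the paper's construction:

```python
import math

def label_distribution(features, weights, n_labels):
    """Categorical distribution over relevance labels {0, ..., n_labels-1}
    built from a pointwise scoring function. Hypothetical form: label k is
    scored as k * w.x, probabilities via softmax."""
    s = sum(w * x for w, x in zip(weights, features))  # pointwise score w.x
    scores = [k * s for k in range(n_labels)]
    m = max(scores)                                    # stabilize the softmax
    exps = [math.exp(v - m) for v in scores]
    z = sum(exps)
    return [e / z for e in exps]

probs = label_distribution([1.0, 2.0], [0.5, 0.25], n_labels=3)
print(abs(sum(probs) - 1.0) < 1e-9)  # -> True: a valid categorical distribution
print(probs[2] > probs[0])           # -> True: positive score favors higher labels
```

A gradient estimator for the training objective would then be taken with respect to such categorical variables, as the abstract describes.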

The Burrows-Wheeler Transform (BWT) has been an essential tool in text compression and indexing. First introduced in 1994, it went on to provide the backbone for the first encoding of the classic suffix tree data structure in space close to the entropy-based lower bound. Recently, compact suffix trees have been developed in space proportional to "$r$", the number of runs in the BWT, and $r$ has appeared in the time complexity of new algorithms. Unlike other popular measures of compression, the parameter $r$ is sensitive to the lexicographic ordering given to the text's alphabet. Despite several past attempts to exploit this, the existence of a provably efficient algorithm for finding, or approximating, an alphabet ordering that minimizes $r$ has been open for years.

We present the first set of results on the computational complexity of minimizing BWT-runs via alphabet reordering. We prove that the decision version of this problem is NP-complete and cannot be solved in time $2^{o(\sigma + \sqrt{n})}$ unless the Exponential Time Hypothesis fails, where $\sigma$ is the size of the alphabet and $n$ is the length of the text. We also show that the optimization problem is APX-hard. In doing so, we relate two previously disparate topics: the optimal traveling salesperson path and the number of runs in the BWT of a text, providing a surprising connection between problems on graphs and text compression. Also, by relating recent results in the field of dictionary compression, we illustrate that an arbitrary alphabet ordering provides a $O(\log^2 n)$-approximation.

We provide an optimal linear-time algorithm for the problem of finding a run-minimizing ordering on a subset of symbols (occurring only once) under ordering constraints, and prove that a generalization of this problem to a class of graphs with BWT-like properties, called Wheeler graphs, is NP-complete.
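
The sensitivity of $r$ to the alphabet ordering is easy to observe on a toy text with a naive rotation-sorting BWT (illustrative only; not an algorithm from this work):

```python
def bwt_runs(text, order):
    """Number of equal-letter runs in the BWT of text + '$', with rotations
    sorted under the given alphabet order (a permutation string; '$' is
    taken as the smallest symbol, as usual)."""
    s = text + "$"
    rank = {c: i for i, c in enumerate("$" + order)}
    rotations = sorted(
        (s[i:] + s[:i] for i in range(len(s))),
        key=lambda rot: [rank[c] for c in rot],
    )
    bwt = "".join(rot[-1] for rot in rotations)  # last column of sorted rotations
    return 1 + sum(1 for a, b in zip(bwt, bwt[1:]) if a != b)

text = "cabcabcab"  # a toy text, not an example from the paper
print(bwt_runs(text, "abc"))  # -> 5
print(bwt_runs(text, "cba"))  # -> 4  (reordering the alphabet reduces r)
```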

We study the problem of verifying differential privacy for loop-free programs with probabilistic choice. Programs in this class can be seen as randomized Boolean circuits, which we will use as a formal model to answer two different questions: first, deciding whether a program satisfies a prescribed level of privacy; second, approximating the privacy parameters a program realizes. We show that the problem of deciding whether a program satisfies $\varepsilon$-differential privacy is $coNP^{\#P}$-complete. In fact, this is the case when either the input domain or the output range of the program is large. Further, we show that deciding whether a program is $(\varepsilon,\delta)$-differentially private is $coNP^{\#P}$-hard, and in $coNP^{\#P}$ for small output domains, but always in $coNP^{\#P^{\#P}}$. Finally, we show that the problem of approximating the level of differential privacy is both $NP$-hard and $coNP$-hard. These results complement previous results by Murtagh and Vadhan showing that deciding the optimal composition of differentially private components is $\#P$-complete, and that approximating the optimal composition of differentially private components is in $P$.
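
For very small input and output domains, the decision problem can be checked by brute force directly from the definition of pure differential privacy (a sketch of the problem statement only; it does not reflect the paper's hardness results, which concern programs given as circuits):

```python
import math

def is_eps_dp(dists, neighbors, eps, tol=1e-12):
    """Brute-force check of pure eps-DP for a mechanism with finite input and
    output domains: P[M(x)=o] <= e^eps * P[M(x')=o] for all neighboring
    inputs (x, x') and all outputs o. tol absorbs floating-point error."""
    bound = math.exp(eps)
    return all(
        dists[x][o] <= bound * dists[x2][o] + tol
        for x, x2 in neighbors
        for o in dists[x]
    )

# Randomized response on one bit (report the truth w.p. 3/4) is eps-DP
# exactly for eps = ln 3.
rr = {0: {0: 0.75, 1: 0.25}, 1: {0: 0.25, 1: 0.75}}
nbrs = [(0, 1), (1, 0)]
print(is_eps_dp(rr, nbrs, math.log(3)))      # -> True
print(is_eps_dp(rr, nbrs, math.log(3) / 2))  # -> False
```

The hardness results above say precisely that this naive enumeration stops being feasible once the program's input or output domain is large.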

We consider a class of nonsmooth optimization problems over the Stiefel manifold, in which the objective function is weakly convex in the ambient Euclidean space. Such problems are ubiquitous in engineering applications but still largely unexplored. We present a family of Riemannian subgradient-type methods---namely Riemannian subgradient, incremental subgradient, and stochastic subgradient methods---to solve these problems and show that they all have an iteration complexity of ${\cal O}(\varepsilon^{-4})$ for driving a natural stationarity measure below $\varepsilon$. In addition, we establish the local linear convergence of the Riemannian subgradient and incremental subgradient methods when the problem at hand further satisfies a sharpness property and the algorithms are properly initialized and use geometrically diminishing stepsizes. To the best of our knowledge, these are the first convergence guarantees for using Riemannian subgradient-type methods to optimize a class of nonconvex nonsmooth functions over the Stiefel manifold. The fundamental ingredient in the proof of the aforementioned convergence results is a new Riemannian subgradient inequality for restrictions of weakly convex functions on the Stiefel manifold, which could be of independent interest. We also show that our convergence results can be extended to handle a class of compact embedded submanifolds of the Euclidean space. Finally, we discuss the sharpness properties of various formulations of the robust subspace recovery and orthogonal dictionary learning problems and demonstrate the convergence performance of the algorithms on both problems via numerical simulations.

We study the problem of maximizing a non-monotone submodular function under multiple knapsack constraints. We propose a simple discrete greedy algorithm for this problem and prove that it yields strong approximation guarantees for functions with bounded curvature. In contrast to other heuristics, it requires no relaxation of the problem to continuous domains, and it maintains a constant-factor approximation guarantee in the problem size. In the case of a single knapsack, our analysis suggests that the standard greedy algorithm can be used in non-monotone settings.

Additionally, we study this problem in a dynamic setting, in which the knapsacks change during the optimization process. We modify our greedy algorithm to avoid a complete restart at each constraint update. This modification retains the approximation guarantees of the static case.

We evaluate our results experimentally on video summarization and sensor placement tasks. We show that our proposed algorithm competes with the state-of-the-art in static settings. Furthermore, we show that in dynamic settings with a tight computational time budget, our modified greedy algorithm yields significant improvements in solution quality over restarting the greedy from scratch.
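A minimal sketch of a marginal-gain greedy under multiple knapsacks, in the spirit of the algorithm described above; the cost-scaled selection rule and tie-breaking here are illustrative assumptions, not the paper's exact procedure:

```python
def greedy_multi_knapsack(ground, f, weights, budgets):
    """Repeatedly add the feasible element with the largest marginal gain
    per unit of its largest weight (sketch, not the paper's algorithm)."""
    m = len(budgets)
    S, used, cand = [], [0.0] * m, set(ground)
    while True:
        best, best_ratio = None, 0.0
        for e in cand:
            if any(used[k] + weights[e][k] > budgets[k] for k in range(m)):
                continue  # adding e would violate some knapsack
            gain = f(S + [e]) - f(S)
            cost = max(weights[e])
            ratio = gain / cost if cost > 0 else gain
            if ratio > best_ratio:
                best, best_ratio = e, ratio
        if best is None:
            break
        S.append(best)
        used = [used[k] + weights[best][k] for k in range(m)]
        cand.discard(best)
    return S

# Toy coverage function (submodular) with two knapsack constraints.
cover = {0: {1, 2}, 1: {2, 3}, 2: {4}, 3: {1, 2, 3, 4}}
f = lambda S: len(set().union(*(cover[e] for e in S))) if S else 0
weights = {0: (1, 1), 1: (1, 2), 2: (2, 1), 3: (3, 3)}
S = greedy_multi_knapsack(cover.keys(), f, weights, budgets=(3, 3))
```

In the dynamic setting the abstract describes, the point is that `S` and `used` can be repaired at a constraint update instead of rerunning this loop from scratch.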

The control of constrained systems using model predictive control (MPC) becomes more challenging when full state information is not available and when the nominal system model and measurements are corrupted by noise. Since these conditions are often seen in practical scenarios, techniques such as robust output feedback MPC have been developed to address them. However, existing approaches to robust output feedback MPC are challenged by the increased complexity of the online optimization problem, increased computational requirements for controller synthesis, or both. In this work we present a simple and efficient methodology for synthesizing a tube-based robust output feedback MPC scheme for linear, discrete-time, time-invariant systems subject to bounded, additive disturbances. Specifically, we first formulate a scheme where the online MPC optimization problem has the same complexity as in nominal full state feedback MPC by using a single tube with constant cross-section. This makes our proposed approach simpler to implement and less computationally demanding than previous methods for both online implementation and offline controller synthesis. Second, we propose a novel and simple procedure for the computation of robust positively invariant (RPI) sets that are approximations of the minimal RPI set, which can be used to define the tube in the proposed control scheme.

LTLf synthesis is the automated construction of a reactive system from a high-level description, expressed in LTLf, of its finite-horizon behavior. So far, the conversion of LTLf formulas to deterministic finite-state automata (DFAs) has been identified as the primary bottleneck to the scalability of synthesis. Recent investigations have also shown that the size of the DFA state space plays a critical role in synthesis as well.

Therefore, effectively resolving the bottleneck for synthesis requires the conversion to be time- and memory-performant and to prevent state-space explosion. Current conversion approaches, however, which are based on either explicit-state or symbolic-state representations, fail to address these necessities adequately at scale: explicit-state approaches generate a minimal DFA but are slow due to expensive DFA minimization, while symbolic-state representations can be succinct, but due to the lack of DFA minimization they generate such large state spaces that even their symbolic representations cannot compensate for the blow-up.

This work proposes a hybrid representation approach for the conversion. Our approach utilizes both explicit and symbolic representations of the state-space, and effectively leverages their complementary strengths. In doing so, we offer an LTLf to DFA conversion technique that addresses all three necessities, hence resolving the bottleneck. A comprehensive empirical evaluation on conversion and synthesis benchmarks supports the merits of our hybrid approach.

Performance monitoring is an essential function for margin measurements in live systems. Historically, system budgets have been described by the Q-factor converted from the bit error rate (BER) under binary modulation and direct detection. The introduction of hard-decision forward error correction (FEC) did not change this. In recent years, technologies have changed significantly to comprise coherent detection, multilevel modulation, and soft FEC. In such advanced systems, different metrics such as the (normalized) generalized mutual information (GMI/NGMI) and the asymmetric information (ASI) are regarded as more reliable. On the other hand, Q budgets are still useful because pre-FEC BER monitoring is established in industry for live system monitoring.

The pre-FEC BER is easily estimated from the number of flipped bits in FEC decoding, which does not require knowledge of the transmitted bits, as these are unknown in live systems. Metrics like GMI/NGMI/ASI, in contrast, do require the transmitted bits, so their use for performance monitoring has not been possible in live systems. In this work, however, we propose a blind soft-performance estimation method: based on a histogram of log-likelihood values, and without knowledge of the transmitted bits, we show how the ASI can be estimated.

We examined the proposed method experimentally for 16 and 64-ary quadrature amplitude modulation (QAM) and probabilistically shaped 16, 64, and 256-QAM in recirculating loop experiments. We see a relative error of 3.6%, which corresponds to around 0.5 dB signal-to-noise ratio difference for binary modulation, in the regime where the ASI is larger than the assumed FEC threshold. For this proposed method, the digital signal processing circuitry requires only a minimal additional function of storing the L-value histograms before the soft-decision FEC decoder.
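One way the blind estimation idea can be sketched: replace the unknown transmitted bits by hard decisions, i.e. work with the magnitudes of the L-values. The estimator below is a common form consistent with that idea and is our assumption, not necessarily the paper's exact formula:

```python
import numpy as np

def blind_asi(llrs):
    """Blind ASI estimate from soft-decision L-values (sketch).

    Uses |l| in place of the correctly signed L-value, i.e. assumes the
    hard decision is correct; the paper's estimator may differ in detail.
    """
    a = np.abs(np.asarray(llrs, dtype=float))
    return 1.0 - np.mean(np.log2(1.0 + np.exp(-a)))
```

Completely uninformative L-values (all zero) give an ASI of 0, while large-magnitude L-values push the estimate toward 1, matching the intuition that the ASI acts as a soft-decision counterpart of the pre-FEC BER threshold.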

We present a complete classification of the deterministic distributed time complexity for a family of graph problems: binary labeling problems in trees. These are locally checkable problems that can be encoded with an alphabet of size two in the edge labeling formalism. Examples of binary labeling problems include sinkless orientation, sinkless and sourceless orientation, 2-vertex coloring, perfect matching, and the task of coloring edges red and blue such that all nodes are incident to at least one red and at least one blue edge. More generally, we can encode e.g. any cardinality constraints on indegrees and outdegrees.

We study the deterministic time complexity of solving a given binary labeling problem in trees, in the usual LOCAL model of distributed computing. We show that the complexity of any such problem is in one of the following classes: $O(1)$, $\Theta(\log n)$, $\Theta(n)$, or unsolvable. In particular, a problem that can be represented in the binary labeling formalism cannot have time complexity $\Theta(\log^* n)$, and hence we know that e.g. any encoding of maximal matchings has to use at least three labels (which is tight).

Furthermore, given the description of any binary labeling problem, we can easily determine in which of the four classes it is and what is an asymptotically optimal algorithm for solving it. Hence the distributed time complexity of binary labeling problems is decidable, not only in principle, but also in practice: there is a simple and efficient algorithm that takes the description of a binary labeling problem and outputs its distributed time complexity.

Neural Ordinary Differential Equations (N-ODEs) are a powerful building block for learning systems, which extend residual networks to a continuous-time dynamical system. We propose a Bayesian version of N-ODEs that enables well-calibrated quantification of prediction uncertainty, while maintaining the expressive power of their deterministic counterpart. We assign Bayesian Neural Nets (BNNs) to both the drift and the diffusion terms of a Stochastic Differential Equation (SDE) that models the flow of the activation map in time. We infer the posterior on the BNN weights using a straightforward adaptation of Stochastic Gradient Langevin Dynamics (SGLD). We illustrate significantly improved stability on two synthetic time series prediction tasks and report better model fit on UCI regression benchmarks with our method when compared to its non-Bayesian counterpart.

This paper builds on the connection between graph neural networks and traditional dynamical systems. We propose continuous graph neural networks (CGNN), which generalise existing graph neural networks with discrete dynamics in that those can be viewed as a specific discretisation scheme. The key idea is to characterise the continuous dynamics of node representations, i.e. the derivatives of node representations with respect to time. Inspired by existing diffusion-based methods on graphs (e.g. PageRank and epidemic models on social networks), we define the derivatives as a combination of the current node representations, the representations of neighbors, and the initial values of the nodes. We propose and analyse two possible dynamics on graphs, in which each dimension of the node representations (i.e. each feature channel) either changes independently or interacts with the others, both with theoretical justification. The proposed continuous graph neural networks are robust to over-smoothing and hence allow us to build deeper networks, which in turn are able to capture long-range dependencies between nodes. Experimental results on the task of node classification demonstrate the effectiveness of our proposed approach over competitive baselines.
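The derivative definition described above can be sketched as an ODE right-hand side plus a forward-Euler integrator; the mixing coefficient `alpha` and the exact combination of terms are illustrative assumptions, not the paper's parameterization:

```python
import numpy as np

def cgnn_rhs(H, A_norm, H0, alpha=0.9):
    """dH/dt as a combination of neighbor aggregation (A_norm @ H), the
    current state, and the initial node features H0, in the spirit of the
    dynamics described above. `alpha` is a hypothetical mixing weight."""
    return alpha * A_norm @ H - H + H0

def integrate(H0, A_norm, t=1.0, steps=100):
    # Forward-Euler discretisation; one step corresponds to one discrete
    # GNN layer, which is the discretisation view taken in the abstract.
    H, dt = H0.copy(), t / steps
    for _ in range(steps):
        H = H + dt * cgnn_rhs(H, A_norm, H0)
    return H

# Tiny 3-node path graph with self-loops, row-normalised adjacency.
A = np.array([[1, 1, 0], [1, 1, 1], [0, 1, 1]], dtype=float)
A_norm = A / A.sum(axis=1, keepdims=True)
H = integrate(np.eye(3), A_norm)
```

Because the initial features `H0` re-enter the derivative at every step, representations are anchored to the input, which is one intuition for why such dynamics resist over-smoothing.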

Deep neural networks (DNNs) are poorly calibrated when trained in conventional ways. To improve confidence calibration of DNNs, we propose a novel training method, distance-based learning from errors (DBLE). DBLE bases its confidence estimation on distances in the representation space. In DBLE, we first adapt prototypical learning to train classification models. It yields a representation space where the distance between a test sample and its ground truth class center can calibrate the model's classification performance. At inference, however, these distances are not available due to the lack of ground truth labels. To circumvent this, we propose to train a confidence model jointly with the classification model to infer this distance for every test sample. We integrate this into training by learning only from mis-classified training samples, which we show to be highly beneficial for effective learning. On multiple datasets and DNN architectures, we demonstrate that DBLE outperforms alternative single-model confidence calibration approaches. DBLE also achieves comparable performance with computationally expensive ensemble approaches at lower computational cost and with fewer parameters.
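The distance-based confidence idea can be sketched with class prototypes and a softmax over negative distances; the temperature `T` and this exact functional form are assumptions for illustration, not DBLE's specification:

```python
import numpy as np

def prototypes(Z, y, num_classes):
    """Per-class centers of the training embeddings (prototypical learning)."""
    return np.stack([Z[y == c].mean(axis=0) for c in range(num_classes)])

def distance_confidence(z, protos, T=1.0):
    """Confidence from distances to class centers: samples close to their
    class prototype get a high, and far samples a low, confidence."""
    d = np.linalg.norm(protos - z, axis=1)
    logits = -d / T
    p = np.exp(logits - logits.max())  # numerically stable softmax
    return p / p.sum()

# Two well-separated classes in a toy 2-D embedding space.
Z = np.array([[0.0, 0.0], [0.2, 0.0], [3.0, 3.0], [3.2, 3.0]])
y = np.array([0, 0, 1, 1])
P = prototypes(Z, y, 2)
conf = distance_confidence(np.array([0.1, 0.1]), P)
```

At test time the ground-truth distance is unknown, which is exactly why DBLE trains a separate confidence model to predict it.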

Singing voice conversion converts one singer's voice into another's without changing the singing content. Recent work shows that unsupervised singing voice conversion can be achieved with an autoencoder-based approach [1]. However, the converted singing voice can easily be out of key, showing that the existing approach cannot model the pitch information precisely. In this paper, we advance the unsupervised singing voice conversion method proposed in [1] to achieve more accurate pitch translation and flexible pitch manipulation. Specifically, the proposed PitchNet adds an adversarially trained pitch regression network that forces the encoder network to learn a pitch-invariant phoneme representation, and a separate module that feeds the pitch extracted from the source audio to the decoder network. Our evaluation shows that the proposed method greatly improves the quality of the converted singing voice (2.92 vs 3.75 in MOS). We also demonstrate that the pitch of the converted singing can easily be controlled during generation by changing the level of the extracted pitch before passing it to the decoder network.

End-to-end approaches for automatic speech recognition (ASR) benefit from directly modeling the probability of the word sequence given the input audio stream in a single neural network. However, compared to conventional ASR systems, these models typically require more data to achieve comparable results. Well-known model adaptation techniques, to account for domain and style adaptation, are not easily applicable to end-to-end systems. Conventional HMM-based systems, on the other hand, have been optimized for various production environments and use cases. In this work, we propose to combine the benefits of end-to-end approaches with a conventional system using an attention-based discriminative language model that learns to rescore the output of a first-pass ASR system. We show that learning to rescore a list of potential ASR outputs is much simpler than learning to generate the hypothesis. The proposed model results in an 8% improvement in word error rate even when the amount of training data is a fraction of the data used for training the first-pass system.

We propose a novel formulation of group fairness in the contextual multi-armed bandit (CMAB) setting. In the CMAB setting a sequential decision maker must at each time step choose an arm to pull from a finite set of arms after observing some context for each of the potential arm pulls. In our model arms are partitioned into two or more sensitive groups based on some protected feature (e.g., age, race, or socio-economic status). Despite the fact that there may be differences in expected payout between the groups, we may wish to ensure some form of fairness between picking arms from the various groups. In this work we explore two definitions of fairness: equal group probability, wherein the probability of pulling an arm from any of the protected groups is the same; and proportional parity, wherein the probability of choosing an arm from a particular group is proportional to the size of that group. We provide a novel algorithm that can accommodate these notions of fairness for an arbitrary number of groups, and provide bounds on the regret for our algorithm. We then validate our algorithm using synthetic data as well as two real-world datasets for intervention settings wherein we want to allocate resources fairly across protected groups.
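The proportional parity notion described above can be sketched as a toy selection rule: pick a group with probability proportional to its size, then exploit within the group. This is only an illustration of the fairness constraint; the paper's algorithm additionally manages exploration to bound regret:

```python
import random

def fair_select(arms_by_group, scores, rng=None):
    """Proportional-parity arm selection (sketch).

    `arms_by_group` maps a protected group to its arms; `scores` holds any
    per-arm payout estimate. Exploration/regret control is omitted here.
    """
    rng = rng or random.Random(0)
    groups = list(arms_by_group)
    sizes = [len(arms_by_group[g]) for g in groups]
    g = rng.choices(groups, weights=sizes, k=1)[0]
    return max(arms_by_group[g], key=lambda a: scores[a])

arms_by_group = {"A": ["a1", "a2"], "B": ["b1"]}
scores = {"a1": 0.3, "a2": 0.9, "b1": 0.5}
arm = fair_select(arms_by_group, scores)
```

Equal group probability corresponds to replacing the size-proportional weights with uniform weights over the groups.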

We present a framework capable of tackling the problem of continual object recognition in a setting which resembles that under which humans see and learn. This setting has a set of unique characteristics: it assumes an egocentric point-of-view bound to the needs of a single person, which implies a relatively low diversity of data and a cold start with no data; it requires operating in an open world, where new objects can be encountered at any time; supervision is scarce and has to be solicited from the user, and completely unsupervised recognition of new objects should be possible. Note that this setting differs from the one addressed in the open world recognition literature, where supervised feedback is always requested to be able to incorporate new objects. We propose a first solution to this problem in the form of a memory-based incremental framework that is capable of storing information on each and any object it encounters, while using the supervision of the user to learn to discriminate between known and unknown objects. Our approach is based on four main features: the use of time and space persistence (i.e., the appearance of objects changes relatively slowly), the use of similarity as the main driving principle for object recognition and novelty detection, the progressive introduction of new objects in a developmental fashion, and the selective elicitation of user feedback in an online active learning fashion. Experimental results show the feasibility of open world, generic object recognition, the ability to recognize, memorize and re-identify new objects even in complete absence of user supervision, and the utility of persistence and incrementality in boosting performance.

This work is motivated by the need of collecting fresh data from power-constrained sensors in the industrial Internet of Things (IIoT) network. A recently proposed metric, the Age of Information (AoI), is adopted to measure data freshness from the perspective of the central controller in the IIoT network. We ask what minimum average AoI the network can achieve and how to design scheduling algorithms to approach it. To answer these questions when the channel states of the network are Markov time-varying and scheduling decisions are restricted by a bandwidth constraint, we first decouple the multi-sensor scheduling problem into single-sensor constrained Markov decision processes (CMDPs) through relaxation of the hard bandwidth constraint. Next, we exploit the threshold structure of the optimal policy for the decoupled single-sensor CMDP and obtain the optimal solution through linear programming (LP). Finally, an asymptotically optimal truncated policy that satisfies the hard bandwidth constraint is built upon the optimal solutions to the decoupled single-sensor problems. Our investigation shows that to obtain a small AoI: (1) the scheduler exploits good channels to schedule sensors with limited power; (2) sensors equipped with enough transmission power are updated in a timely manner such that the bandwidth constraint can be satisfied.
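The AoI bookkeeping underlying such scheduling problems can be sketched as a per-sensor recursion; resetting the age to one after a successful, scheduled update is a common convention and an assumption here, not necessarily the paper's exact timing model:

```python
def simulate_aoi(scheduled, delivered):
    """Track the Age of Information of one sensor over time slots (sketch).

    The age grows by one every slot and resets to one after a slot in which
    the sensor was both scheduled and its update delivered.
    """
    age, trace = 1, []
    for s, ok in zip(scheduled, delivered):
        age = 1 if (s and ok) else age + 1
        trace.append(age)
    return trace

# A bandwidth constraint means only a few sensors can be scheduled per
# slot; a failed delivery (bad channel state) leaves the age growing.
trace = simulate_aoi([1, 0, 0, 1, 0], [1, 1, 1, 0, 1])
```

The scheduler's objective in the abstract is the long-run average of such traces over all sensors, subject to the per-slot bandwidth constraint.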

We propose FC, a logic on words that combines the previous approaches of finite-model theory and the theory of concatenation. It has immediate applications to spanners, a formalism for extracting structured data from text that has recently received considerable attention in database theory. In fact, FC is designed to be to spanners what FO is to relational databases.

Like the theory of concatenation, FC is built around word equations; in contrast to it, its semantics are defined to only allow finite models, by limiting the universe to a word and all its subwords. As a consequence of this, FC has many of the desirable properties of FO[<], while being far more expressive. Most noteworthy among these desirable properties are sufficient criteria for efficient model checking and capturing various complexity classes by extending the logic with appropriate closure or iteration operators.

These results allow us to obtain new insights into and techniques for the expressive power and efficient evaluation of spanners. More importantly, FC provides us with a general framework for logic on words that has potential applications far beyond spanners.

In this paper, we introduce Random Path Generative Adversarial Network (RPGAN) -- an alternative design of GANs that can serve as a tool for generative model analysis. While the latent space of a typical GAN consists of input vectors, randomly sampled from the standard Gaussian distribution, the latent space of RPGAN consists of random paths in a generator network. As we show, this design makes it possible to understand the factors of variation captured by different generator layers, providing natural interpretability. With experiments on standard benchmarks, we demonstrate that RPGAN reveals several interesting insights about the roles that different layers play in the image generation process. Aside from interpretability, the RPGAN model also provides competitive generation quality and allows efficient incremental learning on new data.

We study the risk of the minimum-norm linear least squares estimator when the number of parameters $d$ depends on $n$ and $\frac{d}{n} \rightarrow \infty$. We assume that the data has an underlying low-rank structure by restricting ourselves to spike covariance matrices, where a fixed finite number of eigenvalues grow with $n$ and are much larger than the remaining eigenvalues, which are (asymptotically) of the same order. We show that in this setting the risk of the minimum-norm least squares estimator vanishes compared to the risk of the null estimator. We give asymptotic and non-asymptotic upper bounds for this risk, and also leverage the spike-model assumption to give an analysis of the bias that leads to tighter bounds than previous work.
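The estimator under study has a closed form via the pseudoinverse; a minimal NumPy sketch on synthetic interpolating data (purely illustrative, not the spike-covariance setting of the analysis):

```python
import numpy as np

# Minimum-norm least squares when d > n: among all coefficient vectors
# that interpolate the data, the pseudoinverse solution has the smallest
# l2 norm. Equivalently: np.linalg.lstsq(X, y, rcond=None)[0].
rng = np.random.default_rng(0)
n, d = 20, 100
X = rng.standard_normal((n, d))
beta_star = np.zeros(d)
beta_star[0] = 1.0          # sparse ground truth, noiseless for clarity
y = X @ beta_star
beta_hat = np.linalg.pinv(X) @ y
```

Since `beta_star` itself interpolates the data, the minimum-norm property guarantees that the fitted coefficients have norm at most that of the ground truth, while still fitting the training data exactly.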

A partition $\mathcal{P}$ of a weighted graph $G$ is $(\sigma,\tau,\Delta)$-sparse if every cluster has diameter at most $\Delta$, and every ball of radius $\Delta/\sigma$ intersects at most $\tau$ clusters. Similarly, $\mathcal{P}$ is $(\sigma,\tau,\Delta)$-scattering if instead for balls we require that every shortest path of length at most $\Delta/\sigma$ intersects at most $\tau$ clusters. Given a graph $G$ that admits a $(\sigma,\tau,\Delta)$-sparse partition for all $\Delta>0$, Jia et al. [STOC05] constructed a solution for the Universal Steiner Tree problem (and also Universal TSP) with stretch $O(\tau\sigma^2\log_\tau n)$. Given a graph $G$ that admits a $(\sigma,\tau,\Delta)$-scattering partition for all $\Delta>0$, we construct a solution for the Steiner Point Removal problem with stretch $O(\tau^3\sigma^3)$. We then construct sparse and scattering partitions for various different graph families, receiving many new results for the Universal Steiner Tree and Steiner Point Removal problems.

Large Transformer models routinely achieve state-of-the-art results on a number of tasks but training these models can be prohibitively costly, especially on long sequences. We introduce two techniques to improve the efficiency of Transformers. For one, we replace dot-product attention by one that uses locality-sensitive hashing, changing its complexity from O($L^2$) to O($L\log L$), where $L$ is the length of the sequence. Furthermore, we use reversible residual layers instead of the standard residuals, which allows storing activations only once in the training process instead of $N$ times, where $N$ is the number of layers. The resulting model, the Reformer, performs on par with Transformer models while being much more memory-efficient and much faster on long sequences.
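The locality-sensitive hashing step can be sketched with angular LSH: vectors pointing in similar directions land in the same bucket, and attention is then restricted to within-bucket pairs. This is a sketch of the idea only, not Reformer's full multi-round implementation:

```python
import numpy as np

def lsh_bucket_ids(x, n_buckets=8, seed=0):
    """Angular LSH (sketch): project onto random directions and take the
    argmax over [proj, -proj]. Nearby vectors (by angle) tend to share a
    bucket, so full O(L^2) attention can be replaced by within-bucket
    attention over sorted buckets."""
    rng = np.random.default_rng(seed)
    R = rng.standard_normal((x.shape[-1], n_buckets // 2))
    proj = x @ R
    return np.argmax(np.concatenate([proj, -proj], axis=-1), axis=-1)

# Two aligned vectors share a bucket; an opposite vector does not.
x = np.array([[1.0, 0.0], [2.0, 0.0], [-1.0, 0.0]])
buckets = lsh_bucket_ids(x)
```

Sorting positions by bucket id and attending within (and between adjacent) chunks is what brings the complexity down to O(L log L) in the scheme described above.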

This paper proposes a new method for interpreting and simplifying the black box model of a deep random forest (RF) using rule elimination. In a deep RF, a large number of decision trees are connected across multiple layers, making analysis difficult. It has a performance similar to that of a deep neural network (DNN) but achieves better generalizability. In this study, we therefore quantify the feature contributions and frequency of the fully trained deep RF in the form of a decision rule set. The feature contributions provide a basis for determining how features affect the decision process in a rule set. Model simplification is achieved by eliminating unnecessary rules guided by the measured feature contributions. Consequently, the simplified model has fewer parameters and rules than before. Experimental results show that a feature contribution analysis allows a black box model to be decomposed for quantitatively interpreting a rule set. The proposed method was successfully applied to various deep RF models and benchmark datasets while maintaining robust performance despite the elimination of a large number of rules.

Diverse inverse problems in imaging can be cast as variational problems composed of a task-specific data fidelity term and a regularization term. In this paper, we propose a novel learnable general-purpose regularizer exploiting recent architectural design patterns from deep learning. We cast the learning problem as a discrete sampled optimal control problem, for which we derive the adjoint state equations and an optimality condition. By exploiting the variational structure of our approach, we perform a sensitivity analysis with respect to the learned parameters obtained from different training datasets. Moreover, we carry out a nonlinear eigenfunction analysis, which reveals interesting properties of the learned regularizer. We show state-of-the-art performance for classical image restoration and medical image reconstruction problems.

Quantum effects are known to provide an advantage in particle transfer across networks. In order to achieve this advantage, requirements on both the graph type and the coherence of the quantum system must be found. Here we show that the process of finding these requirements can be automated by learning from simulated examples. The automation is done by using a convolutional neural network of a particular type that learns to recognize on which networks and under which coherence requirements a quantum advantage is possible. Our machine learning approach is applied to study noisy quantum walks on cycle graphs of different sizes. We found that it is possible to predict the existence of quantum advantage for the entire decoherence parameter range, even for graphs outside of the training set. Our results are of importance for demonstration of advantage in quantum experiments and pave the way towards automating scientific research and discoveries.

This paper presents BigEarthNet, a large-scale Sentinel-2 multispectral image dataset with a new class nomenclature to advance deep learning (DL) studies in remote sensing (RS). BigEarthNet is made up of 590,326 image patches annotated with multi-labels provided by the CORINE Land Cover (CLC) map of 2018, based on its most detailed thematic Level-3 class nomenclature. Initial research demonstrates that some CLC classes are challenging to describe accurately by considering only Sentinel-2 images. To increase the effectiveness of BigEarthNet, in this paper we introduce an alternative class nomenclature that allows DL models to better learn and describe the complex spatial and spectral information content of the Sentinel-2 images. This is achieved by interpreting and arranging the CLC Level-3 nomenclature, based on the properties of Sentinel-2 images, into a new nomenclature of 19 classes. Then, the new class nomenclature of BigEarthNet is used within state-of-the-art DL models in the context of multi-label classification. Results show that the models trained from scratch on BigEarthNet outperform those pre-trained on ImageNet, especially in relation to some complex classes, including agriculture and other vegetated and natural environments. All DL models are made publicly available at this http URL, offering an important resource to guide future progress on RS image analysis.

The survey provides an overview of the developing area of parameterized algorithms for graph modification problems. We concentrate on edge modification problems, where the task is to change a small number of adjacencies in a graph in order to satisfy some required property.

Derksen proved that the spectral norm is multiplicative with respect to vertical tensor products (also known as tensor Kronecker products). We will use this result to show that the nuclear norm and other norms of interest are also multiplicative with respect to vertical tensor products.

Multi-writer distributed atomic registers are at the heart of a large number of distributed algorithms. While enjoying the benefits of atomicity, researchers further explore fast implementations of atomic registers that are optimal in terms of data access latency. Though it is proved that multi-writer atomic register implementations are impossible when both reads and writes are required to be fast, it is still open whether implementations are impossible when only writes or only reads are required to be fast. This work proves the impossibility of fast write implementations based on a series of chain arguments among indistinguishable executions. We also show the necessary and sufficient condition for fast read implementations by extending the results in the single-writer case. This work completes a series of studies on fast implementations of distributed atomic registers.

Modern embedded computing platforms consist of a large number of heterogeneous resources, which allows multiple applications to execute on a single device. The number of running applications on the system varies with time, and so does the amount of available resources. This has considerably increased the complexity of analysis and optimization algorithms for runtime mapping of firm real-time applications. To reduce the runtime overhead, researchers have proposed to pre-compute partial mappings at compile time and have the runtime efficiently compute the final mapping. However, most existing solutions only compute a fixed mapping for a given set of running applications, and the mapping is defined for the entire duration of the workload execution. In this work we allow applications to adapt to the amount of available resources by using mapping segments. This way, applications may switch between different configurations with varying degrees of parallelism. We present a runtime manager for firm real-time applications that generates such mapping segments based on partial solutions and aims at minimizing the overall energy consumption without deadline violations. The proposed algorithm outperforms state-of-the-art approaches on overall energy consumption by up to 13% while incurring an order of magnitude less scheduling overhead.

Few-shot classification aims to recognize novel categories with only a few labeled images in each class. Existing metric-based few-shot classification algorithms predict categories by comparing the feature embeddings of query images with those from a few labeled images (support examples) using a learned metric function. While promising performance has been demonstrated, these methods often fail to generalize to unseen domains due to the large discrepancy of the feature distributions across domains. In this work, we address the problem of few-shot classification under domain shifts for metric-based methods. Our core idea is to use feature-wise transformation layers for augmenting the image features using affine transforms to simulate various feature distributions under different domains in the training stage. To capture variations of the feature distributions under different domains, we further apply a learning-to-learn approach to search for the hyper-parameters of the feature-wise transformation layers. We conduct extensive experiments and ablation studies under the domain generalization setting using five few-shot classification datasets: mini-ImageNet, CUB, Cars, Places, and Plantae. Experimental results demonstrate that the proposed feature-wise transformation layer is applicable to various metric-based models, and provides consistent improvements on the few-shot classification performance under domain shift.

Centralized DNS over HTTPS/TLS (DoH/DoT) resolution, which has started being deployed by major hosting providers and web browsers, has sparked controversy among Internet activists and privacy advocates due to several privacy concerns. This design decision causes the trace of all DNS resolutions to be exposed to a third-party resolver, different from the one specified by the user's access network. In this work we propose K-resolver, a DNS resolution mechanism that disperses DNS queries across multiple DoH resolvers, reducing the amount of information about a user's browsing activity exposed to each individual resolver. As a result, none of the resolvers can learn a user's entire web browsing history. We have implemented a prototype of our approach for Mozilla Firefox, and used it to evaluate web page load time compared to the default centralized DoH approach. While our K-resolver mechanism has some effect on DNS resolution time and web page load time, we show that this is mainly due to the geographical location of the selected DoH servers. When more well-provisioned anycast servers are available, our approach incurs negligible overhead while improving user privacy.
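A minimal sketch of the dispersal idea, with hypothetical resolver URLs: hash each queried domain to one of K DoH resolvers, so every resolver deterministically observes only a fixed slice of the user's browsing history:

```python
import hashlib

# Hypothetical DoH endpoints; a real deployment would use actual resolver URLs.
RESOLVERS = [
    "https://doh1.example/dns-query",
    "https://doh2.example/dns-query",
    "https://doh3.example/dns-query",
]

def pick_resolver(domain, resolvers=RESOLVERS):
    """Deterministically map a domain to one of K resolvers, so repeated
    lookups of the same site always go to the same resolver and no single
    resolver sees the full browsing history."""
    digest = hashlib.sha256(domain.lower().encode()).digest()
    return resolvers[digest[0] % len(resolvers)]
```

Hashing on the domain (rather than round-robin per query) keeps all subresource lookups of one site at one resolver, which limits cross-resolver correlation.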

We investigate how reinforcement learning can be used to train level-designing agents. This represents a new approach to procedural content generation in games, where level design is framed as a game, and the content generator itself is learned. By seeing the design problem as a sequential task, we can use reinforcement learning to learn how to take the next action so that the expected final level quality is maximized. This approach can be used when few or no examples exist to train from, and the trained generator is very fast. We investigate three different ways of transforming two-dimensional level design problems into Markov decision processes and apply these to three game environments.
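A toy sketch of this framing, with an invented quality metric: the state is a tile grid, an action edits one tile, and the reward is the change in level quality, so maximizing return maximizes expected final quality:

```python
import random

def design_episode(width=8, height=8, steps=32, rng=None):
    """Run one random-policy episode of a toy level-design MDP.
    The quality metric (count of 'solid' tiles) is a placeholder;
    a real generator would use a trained policy and a game-specific metric."""
    rng = rng or random.Random(0)
    quality = lambda g: sum(row.count(1) for row in g)
    grid = [[0] * width for _ in range(height)]
    ret = 0.0
    for _ in range(steps):
        x, y, tile = rng.randrange(width), rng.randrange(height), rng.choice([0, 1])
        before = quality(grid)
        grid[y][x] = tile                  # the design action
        ret += quality(grid) - before      # shaped reward: quality improvement
    return grid, ret
```

Because the shaped rewards telescope, the episode return equals the final quality minus the (zero) initial quality.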

Signal temporal logic (STL) is an expressive language to specify time-bound real-world robotic tasks and safety specifications. Recently, there has been an interest in learning optimal policies to satisfy STL specifications via reinforcement learning (RL). Learning to satisfy STL specifications often needs a sufficient length of state history to compute reward and the next action. The need for history results in exponential state-space growth for the learning problem. Thus the learning problem becomes computationally intractable for most real-world applications. In this paper, we propose a compact means to capture state history in a new augmented state-space representation. An approximation to the objective (maximizing probability of satisfaction) is proposed and solved for in the new augmented state-space. We show the performance bound of the approximate solution and compare it with the solution of an existing technique via simulations.

A recent body of work has focused on the theoretical study of neural networks in the regime of large width. Specifically, it was shown that training infinitely wide, properly scaled vanilla ReLU networks using the L2 loss is equivalent to kernel regression using the Neural Tangent Kernel (NTK), which is deterministic and remains constant during training. In this work, we derive the form of the limiting kernel for architectures incorporating bypass connections, namely residual networks (ResNets), as well as for densely connected networks (DenseNets). In addition, we derive finite width and depth corrections for both cases. Our analysis reveals that deep practical residual architectures might operate much closer to the 'kernel regime' than their vanilla counterparts: in networks without skip connections, convergence to the NTK requires fixing the depth while increasing the layers' width. Our findings show that in ResNets, convergence to the NTK may occur when depth and width simultaneously tend to infinity, provided proper initialization. In DenseNets, however, convergence to the NTK as the width tends to infinity is guaranteed, at a rate that is independent of both the depth and the scale of the weights.

Background. Recently, in Italy, vaccination coverage for key immunizations, such as MMR, has been declining, leading to measles outbreaks. In 2017, the Italian Government expanded the number of mandatory immunizations, establishing penalties for the families of unvaccinated children. During the 2018 election campaign, immunization policy entered the political debate, with the government accusing the opposition of fuelling vaccine scepticism. A new government established in 2018 temporarily relaxed penalties and announced the introduction of flexibility.

Objectives and Methods. Through a sentiment analysis of tweets posted in Italian during 2018, we aimed at (i) characterising the temporal flow of communication on vaccines, (ii) evaluating the usefulness of Twitter data for estimating vaccination parameters, and (iii) investigating whether the ambiguous political communication might have caused disorientation among the public.

Results. The population appeared to be mostly composed of "serial twitterers" tweeting about everything, including vaccines. Tweets favourable to vaccination accounted for 75% of retained tweets, undecided for 14%, and unfavourable for 11%. The Twitter activity of the Italian public health institutions was negligible. After smoothing the temporal pattern, an up-and-down trend in the favourable proportion emerged, synchronized with the switch between governments, providing clear evidence of disorientation.

Conclusion. The reported evidence of disorientation shows that critical health topics, such as immunization, should never be exploited for political consensus. This is especially true given the increasing role of online social media as an information source, which may generate social pressures ultimately harmful to vaccine uptake, and is worsened by the lack of institutional presence on Twitter. This calls for efforts to counter misinformation and the ensuing spread of hesitancy.

Building on previous works, we present a general method to define proof relevant intersection type semantics for pure lambda calculus. We argue that the bicategory of distributors is an appropriate categorical framework for this kind of semantics. We first introduce a class of 2-monads whose algebras are monoidal categories modelling resource management, following Marsden-Zwardt's approach. We show how these monadic constructions determine Kleisli bicategories over the bicategory of distributors and we give a sufficient condition for cartesian closedness. We define a family of non-extensional models for pure lambda calculus. We then prove that the interpretation of lambda terms induced by these models can be concretely described via intersection type systems. The intersection constructor corresponds to the particular tensor product given by the considered free monadic construction. We conclude by describing two particular examples of these distributor-induced intersection type systems, proving that they characterise head-normalization.

Accurate local-level poverty measurement is an essential task for governments and humanitarian organizations to track the progress towards improving livelihoods and distribute scarce resources. Recent computer vision advances in using satellite imagery to predict poverty have shown increasing accuracy, but they do not generate features that are interpretable to policymakers, inhibiting adoption by practitioners. Here we demonstrate an interpretable computational framework to accurately predict poverty at a local level by applying object detectors to high resolution (30cm) satellite images. Using the weighted counts of objects as features, we achieve 0.539 Pearson's r^2 in predicting village-level poverty in Uganda, a 31% improvement over existing (and less interpretable) benchmarks. Feature importance and ablation analysis reveal intuitive relationships between object counts and poverty predictions. Our results suggest that interpretability does not have to come at the cost of performance, at least in this important domain.
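The weighted-counts idea can be sketched as a simple linear model over per-class object counts; the classes, weights, and data below are synthetic stand-ins, purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in: 200 villages x 5 object classes (e.g. buildings, trucks).
X = rng.poisson(lam=[3.0, 8.0, 1.5, 4.0, 0.7], size=(200, 5)).astype(float)
true_w = np.array([-0.4, -0.1, 0.8, -0.2, 0.5])   # invented ground-truth weights
y = X @ true_w + rng.normal(0.0, 0.1, size=200)   # synthetic "poverty index"

# Ridge regression over the counts: the fitted per-class weights are
# directly interpretable (sign and magnitude per object class).
lam = 1.0
w = np.linalg.solve(X.T @ X + lam * np.eye(5), X.T @ y)
```

Inspecting `w` is exactly the kind of feature-importance analysis the abstract refers to: each coefficient says how a one-object change in a class's count shifts the prediction.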

Large-scale nonconvex optimization problems are ubiquitous in modern machine learning, and among practitioners interested in solving them, Stochastic Gradient Descent (SGD) reigns supreme. We revisit the analysis of SGD in the nonconvex setting and propose a new variant of the recently introduced expected smoothness assumption which governs the behaviour of the second moment of the stochastic gradient. We show that our assumption is both more general and more reasonable than assumptions made in all prior work. Moreover, our results yield the optimal $\mathcal{O}(\varepsilon^{-4})$ rate for finding a stationary point of nonconvex smooth functions, and recover the optimal $\mathcal{O}(\varepsilon^{-1})$ rate for finding a global solution if the Polyak-{\L}ojasiewicz condition is satisfied. We compare against convergence rates under convexity and prove a theorem on the convergence of SGD under Quadratic Functional Growth and convexity, which might be of independent interest. Moreover, we perform our analysis in a framework which allows for a detailed study of the effects of a wide array of sampling strategies and minibatch sizes for finite-sum optimization problems. We corroborate our theoretical results with experiments on real and synthetic data.

Stochastic optimization algorithms, such as Stochastic Gradient Descent (SGD) and its variants, are mainstream methods for training deep networks in practice. However, the theoretical mechanism behind gradient noise still remains to be further investigated. Deep learning is known to find flat minima with a large neighboring region in parameter space from which each weight vector has similar small error. In this paper, we focus on a fundamental problem in deep learning, "How can deep learning usually find flat minima among so many minima?" To answer the question, we develop a density diffusion theory (DDT) for revealing the fundamental dynamical mechanism of SGD and deep learning. More specifically, we study how escape time from loss valleys to the outside of valleys depends on minima sharpness, gradient noise and hyperparameters. One of the most interesting findings is that stochastic gradient noise from SGD can help escape from sharp minima exponentially faster than flat minima, while white noise can only help escape from sharp minima polynomially faster than flat minima. We also find large-batch training requires exponentially many iterations to pass through sharp minima and find flat minima. We present direct empirical evidence supporting the proposed theoretical results.

We show a hardness result for random smoothing to achieve certified adversarial robustness against attacks in the $\ell_p$ ball of radius $\epsilon$ when $p>2$. Although random smoothing has been well understood for the $\ell_2$ case using the Gaussian distribution, much remains unknown concerning the existence of a noise distribution that works for the case of $p>2$. This has been posed as an open problem by Cohen et al. (2019) and includes many significant paradigms such as the $\ell_\infty$ threat model. In this work, we show that any noise distribution $\mathcal{D}$ over $\mathbb{R}^d$ that provides $\ell_p$ robustness with $p>2$ for all base classifiers must satisfy $\mathbb{E}\eta_i^2=\Omega(d^{1-2/p}\epsilon^2(1-\delta)/\delta^2)$ for 99% of the features (pixels) of vector $\eta$ drawn from $\mathcal{D}$, where $\epsilon$ is the robust radius and $\delta$ measures the score gap between the highest-scored class and the runner-up. Therefore, for high-dimensional images with pixel values bounded in $[0,255]$, the required noise will eventually dominate the useful information in the images, leading to trivial smoothed classifiers.

The latent spaces of typical GAN models often have semantically meaningful directions. Moving in these directions corresponds to human-interpretable image transformations, such as zooming or recoloring, enabling a more controllable generation process. However, the discovery of such directions is currently performed in a supervised manner, requiring human labels, pretrained models, or some form of self-supervision. These requirements can severely limit a range of directions existing approaches can discover. In this paper, we introduce an unsupervised method to identify interpretable directions in the latent space of a pretrained GAN model. By a simple model-agnostic procedure, we find directions corresponding to sensible semantic manipulations without any form of (self-)supervision. Furthermore, we reveal several non-trivial findings, which would be difficult to obtain by existing methods, e.g., a direction corresponding to background removal. As an immediate practical benefit of our work, we show how to exploit this finding to achieve a new state-of-the-art for the problem of saliency detection.

RGB-Infrared (IR) person re-identification is very challenging due to the large cross-modality variations between RGB and IR images. The key solution is to learn aligned features to bridge the RGB and IR modalities. However, due to the lack of correspondence labels between every pair of RGB and IR images, most methods try to alleviate the variations with set-level alignment by reducing the distance between the entire RGB and IR sets. However, this set-level alignment may lead to misalignment of some instances, which limits the performance of RGB-IR Re-ID. Different from existing methods, in this paper, we propose to generate cross-modality paired images and perform both global set-level and fine-grained instance-level alignments. Our proposed method enjoys several merits. First, our method can perform set-level alignment by disentangling modality-specific and modality-invariant features. Compared with conventional methods, ours can explicitly remove the modality-specific features, and the modality variation can be better reduced. Second, given cross-modality unpaired images of a person, our method can generate cross-modality paired images from exchanged images. With them, we can directly perform instance-level alignment by minimizing the distance of every pair of images. Extensive experimental results on two standard benchmarks demonstrate that the proposed model performs favourably against state-of-the-art methods. In particular, on the SYSU-MM01 dataset, our model achieves a gain of 9.2% and 7.7% in terms of Rank-1 and mAP, respectively. Code is available at https://github.com/wangguanan/JSIA-ReID.

With the increasing awareness of the importance of a high-quality life, there is a growing need for health monitoring devices running robust algorithms in the home environment. Health monitoring technologies enable real-time analysis of users' health status, offering long-term healthcare support and reducing hospitalization time. The purpose of this work is twofold. On the software side, we focus on the analysis of gait, which is widely adopted for joint correction and for assessing lower limb or spinal problems. On the hardware side, we design a novel marker-less gait analysis device using a low-cost RGB camera mounted on a mobile tele-robot. As gait analysis with a single camera is much more challenging than in previous works utilizing multiple cameras, an RGB-D camera, or wearable sensors, we propose using vision-based human pose estimation approaches. More specifically, based on the output of two state-of-the-art human pose estimation models (OpenPose and VNect), we devise measurements for four bespoke gait parameters: inversion/eversion, dorsiflexion/plantarflexion, and ankle and foot progression angles. We thereby classify walking patterns into normal, supination, pronation, and limp. We also illustrate how to run the proposed machine learning models in low-resource environments such as a single entry-level CPU. Experiments show that our single RGB camera method achieves competitive performance compared to state-of-the-art methods based on depth cameras or multi-camera motion capture systems, at a smaller hardware cost.
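For illustration, an angle-based gait parameter can be derived directly from 2D pose keypoints; the knee-ankle-toe triple below is a hypothetical choice, not necessarily the paper's exact definition:

```python
import math

def joint_angle(a, b, c):
    """Angle (degrees) at keypoint b formed by 2D keypoints a-b-c,
    e.g. knee-ankle-toe as a rough dorsiflexion/plantarflexion proxy."""
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    cos_angle = dot / (math.hypot(*v1) * math.hypot(*v2))
    # Clamp for numerical safety before taking the arccosine.
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_angle))))
```

Tracking such an angle frame by frame over a walking sequence is what allows classifying the pattern (e.g. persistent deviation suggesting supination or pronation).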

Graph deep learning has recently emerged as a powerful ML concept that generalizes successful deep neural architectures to non-Euclidean structured data. Such methods have shown promising results on a broad spectrum of applications ranging from social science, biomedicine, and particle physics to computer vision, graphics, and chemistry. One of the limitations of the majority of current graph neural network architectures is that they are often restricted to the transductive setting and rely on the assumption that the underlying graph is known and fixed. In many settings, such as those arising in medical and healthcare applications, this assumption does not necessarily hold since the graph may be noisy, partially or even completely unknown, and one is thus interested in inferring it from the data. This is especially important in inductive settings when dealing with nodes not present in the graph at training time. Furthermore, sometimes such a graph itself may convey insights that are even more important than the downstream task. In this paper, we introduce the Differentiable Graph Module (DGM), a learnable function predicting the edge probabilities in the graph relevant for the task, which can be combined with convolutional graph neural network layers and trained in an end-to-end fashion. We provide an extensive evaluation of applications from the domains of healthcare (disease prediction), brain imaging (gender and age prediction), computer graphics (3D point cloud segmentation), and computer vision (zero-shot learning). We show that our model provides a significant improvement over baselines both in transductive and inductive settings and achieves state-of-the-art results.

Let $\Omega \subseteq \{1,\dots,m\} \times \{1,\dots,n\}$. We consider fibers of coordinate projections $\pi_\Omega : \mathscr{M}_k(r,m \times n) \rightarrow k^{\# \Omega}$ from the algebraic variety of $m \times n$ matrices of rank at most $r$ over an infinite field $k$. For $\#\Omega = \dim \mathscr{M}_k(r,m \times n)$ we describe a class of $\Omega$'s for which there exist non-empty Zariski open sets $\mathscr{U}_\Omega \subset \mathscr{M}_k(r,m \times n)$ such that $\pi_\Omega^{-1}\big(\pi_\Omega(X)\big) \cap \mathscr{U}_\Omega$ is a finite set $\forall X \in \mathscr{U}_\Omega$. For this we interpret matrix completion from a point of view of hyperplane sections on the Grassmannian $\operatorname{Gr}(r,m)$. Crucial is a description by Sturmfels $\&$ Zelevinsky of classes of local coordinates on $\operatorname{Gr}(r,m)$ induced by vertices of the Newton polytope of the product of maximal minors of an $m \times (m-r)$ matrix of variables.

The rise of the Internet has brought numerous changes to our lives. It has radically changed the way we live, the way we spend our holidays, how we communicate with one another daily, and how we purchase products. Internet users now generate content through sources such as social media, review sites, blogs, product fan pages and more. This has led to new ways of planning a trip or searching for a suitable hotel to stay in. Thus, hotel review sites have become a popular platform for guests to share their experiences, reviews, and suggestions about the hotels they have visited. In Europe, the hotel business has been one of the most important economic drivers. The primary objective of a hotel is to satisfy its customers, to be able to provide a high quality of service, and to give guests a memorable experience during their stay. The purpose of this study is to understand and identify the range of factors that may contribute to customer satisfaction and, through customer reviews, to determine the measures of customers' expectations. Data were gathered from online review sites such as Booking.com, and text analytics is used to analyze the collected content.

The choice of approximate posterior distributions plays a central role in stochastic variational inference (SVI). One effective solution is the use of normalizing flows to construct flexible posterior distributions. However, one key limitation of existing normalizing flows is that they are restricted to Euclidean space and are ill-equipped to model data with an underlying hierarchical structure. To address this fundamental limitation, we present the first extension of normalizing flows to hyperbolic spaces. We first elevate normalizing flows to hyperbolic spaces using coupling transforms defined on the tangent bundle, termed Tangent Coupling ($\mathcal{TC}$). We further introduce Wrapped Hyperboloid Coupling ($\mathcal{W}\mathbb{H}C$), a fully invertible and learnable transformation that explicitly utilizes the geometric structure of hyperbolic spaces, allowing for expressive posteriors while being efficient to sample from. We demonstrate the efficacy of our novel normalizing flows over hyperbolic VAEs and Euclidean normalizing flows. Our approach achieves improved performance on density estimation, as well as on the reconstruction of real-world graph data, which exhibit a hierarchical structure. Finally, we show that our approach can be used to power a generative model over hierarchical data using hyperbolic latent variables.

Inspired by the Galerkin and particular methods, a new approximation approach is first recalled in the Cartesian case. In this paper, we are especially interested in constructing this method when the domain of consideration is a two-dimensional ball, with a view to extending the work to higher dimensions. We reduce the number of iterations required to compute the integrals and the numerical solutions of the Poisson and heat problems (elliptic and parabolic PDEs) in a very fast way.

Due to the outstanding capability of capturing underlying data distributions, deep learning techniques have been recently utilized for a series of traditional database problems. In this paper, we investigate the possibilities of utilizing deep learning for cardinality estimation of similarity selection. Answering this problem accurately and efficiently is essential to many data management applications, especially for query optimization. Moreover, in some applications the estimated cardinality is supposed to be consistent and interpretable. Hence a monotonic estimation w.r.t. the query threshold is preferred. We propose a novel and generic method that can be applied to any data type and distance function. Our method consists of a feature extraction model and a regression model. The feature extraction model transforms original data and threshold to a Hamming space, in which a deep learning-based regression model is utilized to exploit the incremental property of cardinality w.r.t. the threshold for both accuracy and monotonicity. We develop a training strategy tailored to our model as well as techniques for fast estimation. We also discuss how to handle updates. We demonstrate the accuracy and the efficiency of our method through experiments, and show how it improves the performance of a query optimizer.
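One way to guarantee the monotonicity property mentioned above is to have the regression model output non-negative per-threshold increments and cumulatively sum them; this is our own minimal illustration of the incremental idea, not the paper's exact architecture:

```python
import numpy as np

def monotone_cardinality(raw_outputs):
    """Turn raw model outputs (one per integer Hamming threshold) into a
    cardinality estimate that never decreases as the threshold grows:
    clamp each increment to be non-negative, then cumulatively sum."""
    deltas = np.maximum(np.asarray(raw_outputs, dtype=float), 0.0)
    return np.cumsum(deltas)
```

By construction the estimate is non-decreasing in the threshold, so a query optimizer sees consistent selectivities for nested threshold predicates.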

Contract-based design is a promising methodology for taming the complexity of developing sophisticated systems. A formal contract distinguishes between assumptions, which are constraints that the designer of a component puts on the environments in which the component can be used safely, and guarantees, which are promises that the designer asks from the team that implements the component. A theory of formal contracts can be formalized as an interface theory, which supports the composition and refinement of both assumptions and guarantees. Although there is a rich landscape of contract-based design methods that address functional and extra-functional properties, we present the first interface theory designed for ensuring system-wide security properties, thus paving the way for a science of safety and security co-engineering. Our framework provides a refinement relation and a composition operation that support both incremental design and independent implementability. We develop our theory for both stateless and stateful interfaces. We illustrate the applicability of our framework with an example inspired by the automotive domain. Finally, we provide three plausible trace semantics for stateful information-flow interfaces and show that only two correspond to temporal logics for specifying hyperproperties, while the third defines a new class of hyperproperties that lies between the other two classes.

The top-k operation, i.e., finding the k largest or smallest elements from a collection of scores, is an important model component that is widely used in information retrieval, machine learning, and data mining. However, if the top-k operation is implemented in an algorithmic way, e.g., using the bubble sort algorithm, the resulting model cannot be trained in an end-to-end way using prevalent gradient descent algorithms. This is because these implementations typically involve swapping indices, whose gradient cannot be computed. Moreover, the corresponding mapping from the input scores to the indicator vector of whether an element belongs to the top-k set is essentially discontinuous. To address the issue, we propose a smoothed approximation, namely the SOFT (Scalable Optimal transport-based diFferenTiable) top-k operator. Specifically, our SOFT top-k operator approximates the output of the top-k operation as the solution of an Entropic Optimal Transport (EOT) problem. The gradient of the SOFT operator can then be efficiently approximated based on the optimality conditions of the EOT problem. We apply the proposed operator to the k-nearest neighbors and beam search algorithms, and demonstrate improved performance.
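A minimal NumPy sketch of the idea, with illustrative hyperparameters: solve an entropic OT problem between the n scores and two targets {0, 1}, where mass k/n must flow to target 1; the transport mass to target 1 then acts as a smoothed top-k indicator (the real SOFT operator additionally differentiates through the optimality conditions):

```python
import numpy as np

def soft_topk(scores, k, eps=0.1, iters=200):
    """Smoothed top-k membership via entropic optimal transport,
    solved with plain Sinkhorn iterations (illustrative sketch)."""
    s = np.asarray(scores, dtype=float)
    n = len(s)
    s = (s - s.min()) / (s.max() - s.min() + 1e-12)   # normalize to [0, 1]
    # Squared-distance cost to the targets 0 ("out") and 1 ("in the top-k").
    C = np.stack([(s - 0.0) ** 2, (s - 1.0) ** 2], axis=1)
    K = np.exp(-C / eps)
    mu = np.full(n, 1.0 / n)              # uniform mass on the scores
    nu = np.array([(n - k) / n, k / n])   # mass split between the two targets
    u, v = np.ones(n), np.ones(2)
    for _ in range(iters):                # Sinkhorn fixed-point iterations
        u = mu / (K @ v)
        v = nu / (K.T @ u)
    gamma = u[:, None] * K * v[None, :]   # transport plan
    return n * gamma[:, 1]                # smoothed top-k indicator per score
```

Because the cost to target 1 is monotone in the score, the smoothed indicator preserves the score ranking while remaining differentiable in the scores.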

Though the early successes of Statistical Machine Translation (SMT) systems are attributed in part to the explicit modelling of the interaction between any two source and target units, e.g., alignment, the recent Neural Machine Translation (NMT) systems resort to attention, which only partially encodes the interaction, for efficiency. In this paper, we employ Joint Representation, which fully accounts for each possible interaction. We sidestep the inefficiency issue by refining representations with the proposed efficient attention operation. The resulting Reformer models offer a new Sequence-to-Sequence modelling paradigm besides the Encoder-Decoder framework, and outperform the Transformer baseline on both the small-scale IWSLT14 German-English, English-German, and IWSLT15 Vietnamese-English tasks and the large-scale NIST12 Chinese-English translation task by about 1 BLEU point. We also propose a systematic model scaling approach, allowing the Reformer model to beat the state-of-the-art Transformer on IWSLT14 German-English and NIST12 Chinese-English with about 50% fewer parameters. The code is publicly available at https://github.com/lyy1994/reformer.

State-of-the-art lane detection methods achieve strong performance. Despite their advantages, these methods have critical deficiencies such as a limited number of detectable lanes and high false positive rates. In particular, high false positive rates can cause wrong and dangerous control. In this paper, we propose a novel lane detection method for an arbitrary number of lanes using deep learning, which yields a lower number of false positives than other recent lane detection methods. The architecture of the proposed method has shared feature extraction layers and several branches for detection and embedding to cluster lanes. The proposed method generates exact points on the lanes, and we cast the clustering problem for the generated points as a point cloud instance segmentation problem. The proposed method is more compact because it generates fewer points than the original image pixel size. Our proposed post-processing method eliminates outliers successfully and increases the performance notably. The whole proposed framework achieves competitive results on the tuSimple dataset.

Testing automotive mechatronic systems partly uses the software-in-the-loop approach, where systematically covering inputs of the system-under-test remains a major challenge. In current practice, there are two major techniques of input stimulation. One approach is to craft input sequences which eases control and feedback of the test process but falls short of exposing the system to realistic scenarios. The other is to replay sequences recorded from field operations which accounts for reality but requires collecting a well-labeled dataset of sufficient capacity for widespread use, which is expensive. This work applies the well-known unsupervised learning framework of Generative Adversarial Networks (GAN) to learn an unlabeled dataset of recorded in-vehicle signals and uses it for generation of synthetic input stimuli. Additionally, a metric-based linear interpolation algorithm is demonstrated, which guarantees that generated stimuli follow a customizable similarity relationship with specified references. This combination of techniques enables controlled generation of a rich range of meaningful and realistic input patterns, improving virtual test coverage and reducing the need for expensive field tests.

Several network growth models have been proposed in the literature that attempt to incorporate properties of citation networks. Generally, these models aim at retaining the degree distribution observed in real-world networks. In this work, we explore whether existing network growth models can realize the diversity in citation growth exhibited by individual papers - a new node-centric property observed recently in citation networks across multiple domains of research. We theoretically and empirically show that the network growth models which are solely based on degree and/or intrinsic fitness cannot realize certain temporal growth behaviors that are observed in real-world citation networks. To this end, we propose two new growth models that localize the influence of papers through an appropriate attachment mechanism. Experimental results on the real-world citation networks of Computer Science and Physics domains show that our proposed models can better explain the temporal behavior of citation networks than existing models.

There are several real-world tasks that would benefit from applying multiagent reinforcement learning (MARL) algorithms, including the coordination among self-driving cars. The real world has challenging conditions for multiagent learning systems, such as its partially observable and nonstationary nature. Moreover, if agents must share a limited resource (e.g. network bandwidth) they must all learn how to coordinate resource use. This paper introduces a deep recurrent multiagent actor-critic framework (R-MADDPG) for handling multiagent coordination under partially observable settings and limited communication. We investigate the effects of recurrency on the performance and communication use of a team of agents. We demonstrate that the resulting framework learns time dependencies for sharing missing observations, handling resource limitations, and developing different communication patterns among agents.

Permissioned blockchain, in which only known nodes are allowed to participate, has been widely used by governments, companies, institutes and so on. We study the case where permissioned blockchain is applied to the field of horizontal strategic alliances to ensure that any participant of the alliance who does not follow the regulation will be detected and punished for his behavior afterward. We propose a general hierarchical model of permissioned blockchain which includes three tiers: providers, collectors, and governors. To utilize the overlap of collectors in gathering transactions from providers, we introduce reputation as a measure of the reliability of the collectors. With the help of the reputation system, governors do not need to check all transactions uploaded by collectors, so our protocol achieves a significant improvement in efficiency. Meanwhile, denoting by $T$ the total number of transactions, the number of mistakes that governors suffer is only asymptotically $O(\sqrt{T})$ when guided by our reputation mechanism, as long as there exists a collector who behaves well. This result implies that our protocol maintains high performance. The reputation mechanism also provides incentives for collectors to work honestly. To our knowledge, our work is the first to give an analytical result on reputation mechanisms in permissioned blockchains. Furthermore, we demonstrate two typical cases to which our model can be well applied.
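To illustrate the flavour of such a mechanism (a toy sketch of our own; the paper's exact update rule may differ), reputations can be maintained as multiplicative weights that shrink whenever a governor's audit catches a collector misbehaving:

```python
def update_reputations(reps, audit_ok, eta=0.5):
    """Multiplicative-weights style reputation update: scale down the
    reputation of any collector whose audited batch was bad, then
    renormalize so the reputations form a sampling distribution that
    governors can use to decide which uploads to check."""
    scaled = {cid: (rep if audit_ok.get(cid, True) else rep * (1.0 - eta))
              for cid, rep in reps.items()}
    total = sum(scaled.values())
    return {cid: r / total for cid, r in scaled.items()}
```

Repeated misbehavior drives a collector's weight toward zero, so governors audit it less often at no loss, which is the intuition behind sublinear mistake bounds of this kind.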

The design of reliable path-following controllers is a key ingredient for the successful deployment of self-driving vehicles. This controller-design problem is especially challenging for a general 2-trailer with a car-like tractor, due to the vehicle's structurally unstable joint-angle kinematics in backward motion and the car-like tractor's curvature limitations, which can cause the vehicle segments to fold and enter a jackknife state. Furthermore, optical sensors with a limited field of view have been proposed to solve the joint-angle estimation problem online, which introduces additional restrictions on which vehicle states can be reliably estimated. To incorporate these restrictions at the level of control, a model predictive path-following controller is proposed. By taking the vehicle's physical and sensing limitations into account, real-world experiments show that the proposed path-following controller significantly outperforms a previously proposed solution that neglects these constraints, in terms of suppressing disturbances and recovering from non-trivial initial states.
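The instability in backward motion can be illustrated with a much simpler system than the paper's 2-trailer: for a single on-axle trailer with hitch length $L$, a standard kinematic model gives the joint angle $\beta$ the approximate dynamics $\dot{\beta} = -(v/L)\sin\beta$. Forward motion ($v > 0$) damps $\beta$ toward zero, while backward motion ($v < 0$) amplifies any offset, which is the onset of jackknifing. The parameters below are illustrative only.

```python
import math

def simulate_joint_angle(v, beta0=0.1, L=1.0, dt=0.01, steps=200):
    """Forward-Euler integration of d(beta)/dt = -(v/L) * sin(beta)."""
    beta = beta0
    for _ in range(steps):
        beta += dt * (-(v / L) * math.sin(beta))
    return beta

forward = simulate_joint_angle(v=1.0)    # driving forward: offset shrinks
backward = simulate_joint_angle(v=-1.0)  # reversing: offset grows
print(abs(forward) < 0.1 < abs(backward))
```

The same initial joint-angle offset decays when driving forward but grows when reversing, which is why backward maneuvers demand the tight state constraints the proposed controller enforces.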

The information bottleneck principle provides an information-theoretic method for representation learning, by training an encoder to retain all information which is relevant for predicting the label while minimizing the amount of other, excess information in the representation. The original formulation, however, requires labeled data to identify the superfluous information. In this work, we extend this ability to the multi-view unsupervised setting, where two views of the same underlying entity are provided but the label is unknown. This enables us to identify superfluous information as that not shared by both views. A theoretical analysis leads to the definition of a new multi-view model that produces state-of-the-art results on the Sketchy dataset and label-limited versions of the MIR-Flickr dataset. We also extend our theory to the single-view setting by taking advantage of standard data augmentation techniques, empirically showing better generalization capabilities when compared to common unsupervised approaches for representation learning.
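The key quantity here can be made concrete with discrete variables: information in view $v_1$ that is not shared with view $v_2$ is treated as superfluous. The snippet below estimates mutual information from samples of a synthetic two-view source in which one view carries an extra view-specific bit; it illustrates the information-theoretic idea only, not the paper's model or training objective.

```python
import math
from collections import Counter

def mutual_information(pairs):
    """I(X;Y) in bits, estimated from a list of (x, y) samples."""
    n = len(pairs)
    pxy = Counter(pairs)
    px = Counter(x for x, _ in pairs)
    py = Counter(y for _, y in pairs)
    mi = 0.0
    for (x, y), c in pxy.items():
        # p(x,y) * log2( p(x,y) / (p(x) p(y)) ), with counts normalized by n
        mi += (c / n) * math.log2(c * n / (px[x] * py[y]))
    return mi

# A shared "content" bit c and a view-specific "nuisance" bit m, uniform.
samples = [(c, m) for c in (0, 1) for m in (0, 1)]
v1 = samples                  # view 1 sees content + nuisance
v2 = [c for c, _ in samples]  # view 2 sees content only
shared = mutual_information(list(zip(v1, v2)))    # 1 bit shared by both views
total_v1 = mutual_information(list(zip(v1, v1)))  # 2 bits in view 1 overall
print(shared, total_v1)
```

Of the two bits in view 1, only one is shared with view 2; the other is exactly the "superfluous information" a multi-view bottleneck is trained to discard.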

Partial quorum systems are widely used in distributed key-value stores due to their latency benefits at the expense of providing weaker consistency guarantees. The probabilistically bounded staleness framework (PBS) studied the latency-consistency trade-off of Dynamo-style partial quorum systems through Monte Carlo event-based simulations. In this paper, we study the latency-consistency trade-off for such systems analytically and derive a closed-form expression for the inconsistency probability. Our approach allows fine-tuning of latency and consistency guarantees in key-value stores, which is intractable using Monte Carlo event-based simulations.
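To give the flavor of such an analysis: in Dynamo-style replication with $N$ replicas, write quorum size $W$, and read quorum size $R$, the classic combinatorial baseline for the probability that a uniformly random read quorum misses every replica holding the latest write is $\binom{N-W}{R} / \binom{N}{R}$. This is the simple latency-free baseline, not the paper's full closed-form expression.

```python
from math import comb

def stale_read_probability(N, R, W):
    """Probability a random read quorum misses all W freshly written replicas."""
    if R + W > N:
        return 0.0  # strict quorum: read and write sets must intersect
    return comb(N - W, R) / comb(N, R)

print(stale_read_probability(3, 1, 1))  # partial quorum: may return stale data
print(stale_read_probability(3, 2, 2))  # strict quorum: always consistent
```

With $N=3$, $R=W=1$ a read misses the written replica with probability $2/3$, while $R=W=2$ guarantees intersection; the latency-dependent analysis in the paper refines this picture for in-flight writes.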