## Computer Science (cs) updates on the arXiv.org e-print archive



Lecture videos are an increasingly important learning resource for higher education. However, the challenge of quickly finding the content of interest in a lecture video is an important limitation of this format. This paper introduces visual summarization of lecture video segments to enhance navigation. A lecture video is divided into segments based on the frame-to-frame similarity of content. The user navigates the lecture video content by viewing a single frame visual and textual summary of each segment. The paper presents a novel methodology to generate the visual summary of a lecture video segment by computing similarities between images extracted from the segment and employing a graph-based algorithm to identify the subset of most representative images. The results from this research are integrated into a real-world lecture video management portal called Videopoints. To collect ground truth for evaluation, a survey was conducted where multiple users manually provided visual summaries for 40 lecture video segments. The users also stated whether any images were not selected for the summary because they were similar to other selected images. The graph based algorithm for identifying summary images achieves 78% precision and 72% F1-measure with frequently selected images as the ground truth, and 94% precision and 72% F1-measure with the union of all user selected images as the ground truth. For 98% of algorithm selected visual summary images, at least one user also selected that image for their summary or considered it similar to another image they selected. Over 65% of automatically generated summaries were rated as good or very good by the users on a 4-point scale from poor to very good. Overall, the results establish that the methodology introduced in this paper produces good quality visual summaries that are practically useful for lecture video navigation.

When training a machine learning model, it is standard procedure for the researcher to have full knowledge of both the data and model. However, this engenders a lack of trust between data owners and data scientists. Data owners are justifiably reluctant to relinquish control of private information to third parties. Privacy-preserving techniques distribute computation in order to ensure that data remains in the control of the owner while learning takes place. However, architectures distributed amongst multiple agents introduce an entirely new set of security and trust complications. These include data poisoning and model theft. This paper outlines a distributed infrastructure which is used to facilitate peer-to-peer trust between distributed agents; collaboratively performing a privacy-preserving workflow. Our outlined prototype sets industry gatekeepers and governance bodies as credential issuers. Before participating in the distributed learning workflow, malicious actors must first negotiate valid credentials. We detail a proof of concept using Hyperledger Aries, Decentralised Identifiers (DIDs) and Verifiable Credentials (VCs) to establish a distributed trust architecture during a privacy-preserving machine learning experiment. Specifically, we utilise secure and authenticated DID communication channels in order to facilitate a federated learning workflow related to mental health care data.

Multi-dimensional Hawkes process (MHP) is a class of self and mutually exciting point processes that find wide range of applications -- from prediction of earthquakes to modelling of order books in high frequency trading. This paper makes two major contributions, we first find an unbiased estimator for the log-likelihood estimator of the Hawkes process to enable efficient use of the stochastic gradient descent method for maximum likelihood estimation. The second contribution is, we propose a specific single hidden layered neural network for the non-parametric estimation of the underlying kernels of the MHP. We evaluate the proposed model on both synthetic and real datasets, and find the method has comparable or better performance than existing estimation methods. The use of shallow neural network ensures that we do not compromise on the interpretability of the Hawkes model, while at the same time have the flexibility to estimate any non-standard Hawkes excitation kernel.

Geomagnetic disturbances (GMDs), a result of space weather, pose a severe risk to electric grids. When GMDs occur, they can cause geomagnetically-induced currents (GICs), which saturate transformers, induce hot-spot heating, and increase reactive power losses in the transmission grid. Furthermore, uncertainty in the magnitude and orientation of the geo-electric field, and insufficient historical data make the problem of mitigating the effects of uncertain GMDs challenging. In this paper, we propose a novel distributionally robust optimization (DRO) approach that models uncertain GMDs and mitigates the effects of GICs on electric grids. This is achieved via a set of mitigation actions (e.g., line switching, locating blocking devices, generator re-dispatch and load shedding), prior to the GMD event, such that the worst-case expectation of the system cost is minimized. To this end, we develop a column-and-constraint generation algorithm that solves a sequence of mixed-integer second-order conic programs to handle the underlying convex support set of the uncertain GMDs. Also, we present a monolithic exact reformulation of our DRO model when the underlying support set can be approximated by a polytope with three extreme points. Numerical experiments on 'epri-21' system show the efficacy of the proposed algorithms and the exact reformulation of our DRO model.

Machine learning inference is becoming a core building block for interactive web applications. As a result, the underlying model serving systems on which these applications depend must consistently meet low latency targets. Existing model serving architectures use well-known reactive techniques to alleviate common-case sources of latency, but cannot effectively curtail tail latency caused by unpredictable execution times. Yet the underlying execution times are not fundamentally unpredictable-on the contrary we observe that inference using Deep Neural Network (DNN) models has deterministic performance. Here, starting with the predictable execution times of individual DNN inferences, we adopt a principled design methodology to successively build a fully distributed model serving system that achieves predictable end-to-end performance. We evaluate our implementation, Clockwork, using production trace workloads, and show that Clockwork can support thousands of models while simultaneously meeting 100 ms latency targets for 99.997% of requests. We further demonstrate that Clockwork exploits predictable execution times to achieve tight request-level service-level objectives (SLOs) as well as a high degree of request-level performance isolation.

The popularity of smartphone messaging apps like WhatsApp are revolutionizing how many users communicate and interact with the internet. Characteristics such as the immediacy of messages directly delivered to the user's phone and secure communication through end-to-end encryption have made this tool unique but also allowed it to be extensively abused to create and spread misinformation. Due to the private encrypted nature of the messages it is hard to track the dissemination of misinformation at scale. In this work, we propose an approach for WhatsApp to counter misinformation that does not rely on content moderation. The idea is based on on-device checking, where WhatsApp can detect when a user shares multimedia content which have been previously labeled as misinformation by fact-checkers, without violating the privacy of the users. We evaluate the potential of this strategy for combating misinformation using data collected from both fact-checking agencies and WhatsApp during recent elections in Brazil and India. Our results show that our approach has the potential to detect a considerable amount of images containing misinformation, reducing 40.7% and 82.2% of their shares in Brazil and India, respectively.

An ongoing interest in innovation studies is to understand how knowledge generated from scientific research can be used in the development of technologies. While previous inquiries have devoted to studying the scientific capacity of technologies and institutional factors facilitating technology transfer, little is known about the intrinsic characteristics of scientific publications that gain direct technological impact. Here we focus on two features, namely basicness and novelty. Using a corpus of 3.8 million papers published between 1980 and 1999, we find that basic science papers and novel papers are substantially more likely to achieve direct technological impact. Further analysis that limits to papers with technological impact reveals that basic science and novel science have more patent citations, experience shorter time lag, and have impact in broader technological fields.

Object detection networks are powerful in computer vision, but not necessarily optimized for biomedical object detection. In this work, we propose CircleNet, a simple anchor-free detection method with circle representation for detection of the ball-shaped glomerulus. Different from the traditional bounding box based detection method, the bounding circle (1) reduces the degrees of freedom of detection representation, (2) is naturally rotation invariant, (3) and optimized for ball-shaped objects. The key innovation to enable this representation is the anchor-free framework with the circle detection head. We evaluate CircleNet in the context of detection of glomerulus. CircleNet increases average precision of the glomerulus detection from 0.598 to 0.647. Another key advantage is that CircleNet achieves better rotation consistency compared with bounding box representations.

We study the biased random walk where at each step of a random walk a "controller" can, with a certain small probability, fix the next step. This model was introduced by Azar et al. [STOC1992]; we extend their work to the time dependent setting and consider cover times of this walk. We obtain new bounds on the cover and hitting times and make progress towards resolving a conjecture of Azar et al. on maximising values of the stationary distribution. We also consider the problem of computing an optimal strategy for the controller to minimise the cover time and show that for directed graphs determining the cover time is PSPACE-complete.

We propose a loss function for generative adversarial networks (GANs) using R\'{e}nyi information measures with parameter $\alpha$. More specifically, we formulate GAN's generator loss function in terms of R\'{e}nyi cross-entropy functionals. We demonstrate that for any $\alpha$, this generalized loss function preserves the equilibrium point satisfied by the original GAN loss based on the Jensen-Renyi divergence, a natural extension of the Jensen-Shannon divergence. We also prove that the R\'{e}nyi-centric loss function reduces to the original GAN loss function as $\alpha \to 1$. We show empirically that the proposed loss function, when implemented on both DCGAN (with $L_1$ normalization) and StyleGAN architectures, confers performance benefits by virtue of the extra degree of freedom provided by the parameter $\alpha$. More specifically, we show improvements with regard to: (a) the quality of the generated images as measured via the Fr\'echet Inception Distance (FID) score (e.g., best FID=8.33 for RenyiStyleGAN vs 9.7 for StyleGAN when evaluated over 64$\times$64 CelebA images) and (b) training stability. While it was applied to GANs in this study, the proposed approach is generic and can be used in other applications of information theory to deep learning, e.g., AI bias or privacy.

In this paper, we document a design, implementation, and field tests of a vehicle-to-vehicle (V2V) communication-enabled rail collision avoidance system (RCAS) for urban light rail vehicles---trams. The RCAS runs onboard a tram and issues an acoustic warning to a tram driver if a collision with another tram is imminent---no commands to the braking subsystem are issued in the current version. The prediction of an imminent collision with another tram is based on real-time evaluation of predicted trajectories of both trams. The predictions are based on mathematical models of the longitudinal dynamics of the vehicles and real-time estimation of the current motion states (position and velocity). The information about the other tram's predicted trajectory is accessed through V2V communication. We also document the results of verification of the functionality of the proposed RCAS through several field tests with a real tram.

We propose to explain the behavior of black-box prediction methods (e.g., deep neural networks trained on image pixel data) using causal graphical models. Specifically, we explore learning the structure of a causal graph where the nodes represent prediction outcomes along with a set of macro-level "interpretable" features, while allowing for arbitrary unmeasured confounding among these variables. The resulting graph may indicate which of the interpretable features, if any, are possible causes of the prediction outcome and which may be merely associated with prediction outcomes due to confounding. The approach is motivated by a counterfactual theory of causal explanation wherein good explanations point to factors which are "difference-makers" in an interventionist sense. The resulting analysis may be useful in algorithm auditing and evaluation, by identifying features which make a causal difference to the algorithm's output.

We report on the development of TMVis, a web service to provide visualizations of how individual webpages have changed over time. We leverage past research on summarizing collections of webpages with thumbnail-sized screenshots and on choosing a small number of representative past archived webpages from a large collection. We offer four visualizations: image grid, image slider, timeline, and animated GIF. Embed codes for the image grid and image slider can be produced to include these on separate webpages. The animated GIF can be downloaded as an image file for the same purpose. This tool can be used to allow scholars from various disciplines, as well as the general public, to explore the temporal nature of web archives. We hope that these visualizations will just be the beginning and will provide a starting point for others to expand these types of offerings for users of web archives.

One of the main challenges for end-to-end speech translation is data scarcity. We leverage pseudo-labels generated from unlabeled audio by a cascade and an end-to-end speech translation model. This provides 8.3 and 5.7 BLEU gains over a strong semi-supervised baseline on the MuST-C English-French and English-German datasets, reaching state-of-the art performance. The effect of the quality of the pseudo-labels is investigated. Our approach is shown to be more effective than simply pre-training the encoder on the speech recognition task. Finally, we demonstrate the effectiveness of self-training by directly generating pseudo-labels with an end-to-end model instead of a cascade model.

Neural ordinary differential equations (NODEs) have recently attracted increasing attention; however, their empirical performance on benchmark tasks (e.g. image classification) are significantly inferior to discrete-layer models. We demonstrate an explanation for their poorer performance is the inaccuracy of existing gradient estimation methods: the adjoint method has numerical errors in reverse-mode integration; the naive method directly back-propagates through ODE solvers, but suffers from a redundantly deep computation graph when searching for the optimal stepsize. We propose the Adaptive Checkpoint Adjoint (ACA) method: in automatic differentiation, ACA applies a trajectory checkpoint strategy which records the forward-mode trajectory as the reverse-mode trajectory to guarantee accuracy; ACA deletes redundant components for shallow computation graphs; and ACA supports adaptive solvers. On image classification tasks, compared with the adjoint and naive method, ACA achieves half the error rate in half the training time; NODE trained with ACA outperforms ResNet in both accuracy and test-retest reliability. On time-series modeling, ACA outperforms competing methods. Finally, in an example of the three-body problem, we show NODE with ACA can incorporate physical knowledge to achieve better accuracy. We provide the PyTorch implementation of ACA: \url{https://github.com/juntang-zhuang/torch-ACA}.

In standard balanced truncation model order reduction, the initial condition is typically ignored in the reduction procedure and is assumed to be zero instead. However, such a reduced-order model may be a bad approximation to the full-order system, if the initial condition is not zero. In the literature there are several attempts for modified reduction methods at the price of having no error bound or only a posteriori error bounds which are often too expensive to evaluate. In this work we propose a new balancing procedure that is based on a shift transformation on the state. We first derive a joint projection reduced-order model in which the part of the system depending only on the input and the one depending only on the initial value are reduced at once and we prove an a priori error bound. With this result at hand, we derive a separate projection procedure in which the two parts are reduced separately. This gives the freedom to choose different reduction orders for the different subsystems. Moreover, we discuss how the reduced-order models can be constructed in practice. Since the error bounds are parameter-dependent we show how they can be optimized efficiently. We conclude this paper by comparing our results with the ones from the literature by a series of numerical experiments.

Internet of Things (IoT) services will use machine learning tools to efficiently analyze various types of data collected by IoT devices for inference, autonomy, and control purposes. However, due to resource constraints and privacy challenges, edge IoT devices may not be able to transmit their collected data to a central controller for training machine learning models. To overcome this challenge, federated learning (FL) has been proposed as a means for enabling edge devices to train a shared machine learning model without data exchanges thus reducing communication overhead and preserving data privacy. However, Google's seminal FL algorithm requires all devices to be directly connected with a central controller, which significantly limits its application scenarios. In this context, this paper introduces a novel FL framework, called collaborative FL (CFL), which enables edge devices to implement FL with less reliance on a central controller. The fundamentals of this framework are developed and then, a number of communication techniques are proposed so as to improve the performance of CFL. To this end, an overview of centralized learning, Google's seminal FL, and CFL is first presented. For each type of learning, the basic architecture as well as its advantages, drawbacks, and usage conditions are introduced. Then, three CFL performance metrics are presented and a suite of communication techniques ranging from network formation, device scheduling, mobility management, and coding is introduced to optimize the performance of CFL. For each technique, future research opportunities are also discussed. In a nutshell, this article will showcase how the proposed CFL framework can be effectively implemented at the edge of large-scale wireless systems such as the Internet of Things.

A hydrogeological model for the spread of pollution in an aquifer is considered. The model consists in a convection-diffusion-reaction equation involving the dispersion tensor which depends nonlinearly of the fluid velocity. We introduce an explicit flux in the model and use a mixed Finite Element Method for the discretization. We provide existence, uniqueness and stability results for the discrete model. A convergence result is obtained for the semi-discretized in time problem and for the fully discretization.

Many predictions are probabilistic in nature; for example, a prediction could be for precipitation tomorrow, but with only a 30 percent chance. Given both the predictions and the actual outcomes, "reliability diagrams" (also known as "calibration plots") help detect and diagnose statistically significant discrepancies between the predictions and the outcomes. The canonical reliability diagrams are based on histogramming the observed and expected values of the predictions; several variants of the standard reliability diagrams propose to replace the hard histogram binning with soft kernel density estimation using smooth convolutional kernels of widths similar to the widths of the bins. In all cases, an important question naturally arises: which widths are best (or are multiple plots with different widths better)? Rather than answering this question, plots of the cumulative differences between the observed and expected values largely avoid the question, by displaying miscalibration directly as the slopes of secant lines for the graphs. Slope is easy to perceive with quantitative precision even when the constant offsets of the secant lines are irrelevant. There is no need to bin or perform kernel density estimation with a somewhat arbitrary kernel.

Artificial Neural Networks (ANN) have been popularized in many science and technological areas due to their capacity to solve many complex pattern matching problems. That is the case of Virtual Screening, a research area that studies how to identify those molecular compounds with the highest probability to present biological activity for a therapeutic target. Due to the vast number of small organic compounds and the thousands of targets for which such large-scale screening can potentially be carried out, there has been an increasing interest in the research community to increase both, processing speed and energy efficiency in the screening of molecular databases. In this work, we present a classification model describing each molecule with a single energy-based vector and propose a machine-learning system based on the use of ANNs. Different ANNs are studied with respect to their suitability to identify biochemical similarities. Also, a high-performance and energy-efficient hardware acceleration platform based on the use of stochastic computing is proposed for the ANN implementation. This platform is of utility when screening vast libraries of compounds. As a result, the proposed model showed appreciable improvements with respect previously published works in terms of the main relevant characteristics (accuracy, speed and energy-efficiency).

Stein Variational Gradient Descent (SVGD), a popular sampling algorithm, is often described as the kernelized gradient flow for the Kullback-Leibler divergence in the geometry of optimal transport. We introduce a new perspective on SVGD that instead views SVGD as the (kernelized) gradient flow of the chi-squared divergence which, we show, exhibits a strong form of uniform exponential ergodicity under conditions as weak as a Poincar\'e inequality. This perspective leads us to propose an alternative to SVGD, called Laplacian Adjusted Wasserstein Gradient Descent (LAWGD), that can be implemented from the spectral decomposition of the Laplacian operator associated with the target density. We show that LAWGD exhibits strong convergence guarantees and good practical performance.

We consider the problem of generating a time-optimal quadrotor trajectory that attains a set of prescribed waypoints. This problem is challenging since the optimal trajectory is located on the boundary of the set of dynamically feasible trajectories. This boundary is hard to model as it involves limitations of the entire system, including hardware and software, in agile high-speed flight. In this work, we propose a multi-fidelity Bayesian optimization framework that models the feasibility constraints based on analytical approximation, numerical simulation, and real-world flight experiments. By combining evaluations at different fidelities, trajectory time is optimized while keeping the number of required costly flight experiments to a minimum. The algorithm is thoroughly evaluated in both simulation and real-world flight experiments at speeds up to 11 m/s. Resulting trajectories were found to be significantly faster than those obtained through minimum-snap trajectory planning.

A number of governments and organizations around the world agree that the first step to address national and international problems such as energy independence, global warming or emergency resilience, is the redesign of electricity networks, known as Smart Grids. Typically, power grids have broadcast power from generation plants to large population of consumers on a sub-optimal way. Nevertheless, the fusion of energy delivery networks and digital information networks, along with the introduction of intelligent monitoring systems (Smart Meters) and renewable energies, would enable two-way electricity trading relationships between electricity suppliers and electricity consumers. The availability of real-time information on electricity demand and pricing, would enable suppliers optimizing their delivery systems, while consumers would have the means to minimize their bill by turning on appliances at off-peak hours. The construction of the Smart Grid entails the design and deployment of information networks and systems of unprecedented requirements on storage, real-time event processing and availability. In this paper, a series of system architectures to store and process Smart Meter reading data are explored and compared aiming to establish a solid foundation in which future intelligent systems could be supported.

Originating from condensed matter physics, tensor networks are compact representations of high-dimensional tensors. In this paper, the prowess of tensor networks is demonstrated on the particular task of one-class anomaly detection. We exploit the memory and computational efficiency of tensor networks to learn a linear transformation over a space with dimension exponential in the number of original features. The linearity of our model enables us to ensure a tight fit around training instances by penalizing the model's global tendency to a predict normality via its Frobenius norm---a task that is infeasible for most deep learning models. Our method outperforms deep and classical algorithms on tabular datasets and produces competitive results on image datasets, despite not exploiting the locality of images.

With the recent development of autonomous vehicle technology, there have been active efforts on the deployment of this technology at different scales that include urban and highway driving. While many of the prototypes showcased have shown to operate under specific cases, little effort has been made to better understand their shortcomings and generalizability to new areas. Distance, uptime and number of manual disengagements performed during autonomous driving provide a high-level idea on the performance of an autonomous system but without proper data normalization, testing location information, and the number of vehicles involved in testing, the disengagement reports alone do not fully encompass system performance and robustness. Thus, in this study a complete set of metrics are proposed for benchmarking autonomous vehicle systems in a variety of scenarios that can be extended for comparison with human drivers. These metrics have been used to benchmark UC San Diego's autonomous vehicle platforms during early deployments for micro-transit and autonomous mail delivery applications.

In this research, we propose a series of methodologies to mine transit riders travel pattern and behavioral preferences, and then we use these knowledges to adjust and optimize the transit systems. Contributions are: 1) To increase the data validity: a) we propose a novel approach to rectify the time discrepancy of data between the AFC (Automated Fare Collection) systems and AVL (Automated Vehicle Location) system, our approach transforms data events into signals and applies time domain correlation the detect and rectify their relative discrepancies. b) By combining historical data and passengers ticketing time stamps, we induct and compensate missing information in AVL datasets. 2) To infer passengers alighting point, we introduce a maximum probabilistic model incorporating passengers home place to recover their complete transit trajectory from semi-complete boarding records.Then we propose an enhance activity identification algorithm which is capable of specifying passengers short-term activity from ordinary transfers. Finally, we analyze the temporal-spatial characteristic of transit ridership. 3) To discover passengers travel demands. We integrate each passengers trajectory data in multiple days and construct a Hybrid Trip Graph (HTG). We then use a depth search algorithm to derive the spatially closed transit trip chains; Finally, we use closed transit trip chains of passengers to study their travel pattern from various perspectives. Finally, we analyze urban transit corridors by aggregating the passengers critical transit chains.4) We derive eight influential factors, and then construct passengers choice models under various scenarios. Next, we validate our model using ridership re-distribute simulations. Finally, we conduct a comprehensive analysis on passengers temporal choice preference and use this information to optimize urban transit systems.

Performance of neural network models relies on the availability of large datasets with minimal levels of uncertainty. Transfer Learning (TL) models have been proposed to resolve the issue of small dataset size by letting the model train on a bigger, task-related reference dataset and then fine-tune on a smaller, task-specific dataset. In this work, we apply a transfer learning approach to improve predictive power in noisy data systems with large variable confidence datasets. We propose a deep neural network method called Filtered Transfer Learning (FTL) that defines multiple tiers of data confidence as separate tasks in a transfer learning setting. The deep neural network is fine-tuned in a hierarchical process by iteratively removing (filtering) data points with lower label confidence, and retraining. In this report we use FTL for predicting the interaction of drugs and proteins. We demonstrate that using FTL to learn stepwise, across the label confidence distribution, results in higher performance compared to deep neural network models trained on a single confidence range. We anticipate that this approach will enable the machine learning community to benefit from large datasets with uncertain labels in fields such as biology and medicine.

In this paper we propose an approach for computing multiple high-quality near-isometric maps between a pair of 3D shapes. Our method is fully automatic and does not rely on user-provided landmarks or descriptors. This allows us to analyze the full space of maps and extract multiple diverse and accurate solutions, rather than optimizing for a single optimal correspondence as done in previous approaches. To achieve this, we propose a compact tree structure based on the spectral map representation for encoding and enumerating possible rough initializations, and a novel efficient approach for refining them to dense pointwise maps. This leads to the first method capable of both producing multiple high-quality correspondences across shapes and revealing the symmetry structure of a shape without a priori information. In addition, we demonstrate through extensive experiments that our method is robust and results in more accurate correspondences than state-of-the-art for shape matching and symmetry detection.

Estimating depth from RGB images is a long-standing ill-posed problem, which has been explored for decades by the computer vision, graphics, and machine learning communities. Among the existing techniques, stereo matching remains one of the most widely used in the literature due to its strong connection to the human binocular system. Traditionally, stereo-based depth estimation has been addressed through matching hand-crafted features across multiple images. Despite the extensive amount of research, these traditional techniques still suffer in the presence of highly textured areas, large uniform regions, and occlusions. Motivated by their growing success in solving various 2D and 3D vision problems, deep learning for stereo-based depth estimation has attracted growing interest from the community, with more than 150 papers published in this area between 2014 and 2019. This new generation of methods has demonstrated a significant leap in performance, enabling applications such as autonomous driving and augmented reality. In this article, we provide a comprehensive survey of this new and continuously growing field of research, summarize the most commonly used pipelines, and discuss their benefits and limitations. In retrospect of what has been achieved so far, we also conjecture what the future may hold for deep learning-based stereo for depth estimation research.

Dopamine (DA) is an organic chemical that influences several parts of behaviour and physical functions. Fast-scan cyclic voltammetry (FSCV) is a technique used for in vivo phasic dopamine release measurements. The analysis of such measurements, though, requires notable effort. In this paper, we present the use of convolutional neural networks (CNNs) for the identification of phasic dopamine releases.

This paper develops a novel Continuous-time Accelerated Proximal Point Algorithm (CAPPA) for $\ell_1$-minimization problems with provable fixed-time convergence guarantees. The problem of $\ell_1$-minimization appears in several contexts, such as sparse recovery (SR) in Compressed Sensing (CS) theory, and sparse linear and logistic regressions in machine learning to name a few. Most existing algorithms for solving $\ell_1$-minimization problems are discrete-time, inefficient and require exhaustive computer-guided iterations. CAPPA alleviates this problem on two fronts: (a) it encompasses a continuous-time algorithm that can be implemented using analog circuits; (b) it betters LCA and finite-time LCA (recently developed continuous-time dynamical systems for solving SR problems) by exhibiting provable fixed-time convergence to optimal solution. Consequently, CAPPA is better suited for fast and efficient handling of SR problems. Simulation studies are presented that corroborate computational advantages of CAPPA.

While fast multipole methods (FMMs) are in widespread use for the rapid evaluation of potential fields governed by the Laplace, Helmholtz, Maxwell or Stokes equations, their coupling to high-order quadratures for evaluating layer potentials is still an area of active research. In three dimensions, a number of issues need to be addressed, including the specification of the surface as the union of high-order patches, the incorporation of accurate quadrature rules for integrating singular or weakly singular Green's functions on such patches, and their coupling to the oct-tree data structures on which the FMM separates near and far field interactions. Although the latter is straightforward for point distributions, the near field for a patch is determined by its physical dimensions, not the distribution of discretization points on the surface.

Here, we present a general framework for efficiently coupling locally corrected quadratures with FMMs, relying primarily on what are called generalized Gaussian quadratures rules, supplemented by adaptive integration. The approach, however, is quite general and easily applicable to other schemes, such as Quadrature by Expansion (QBX). We also introduce a number of accelerations to reduce the cost of quadrature generation itself, and present several numerical examples of acoustic scattering that demonstrate the accuracy, robustness, and computational efficiency of the scheme. On a single core of an Intel i5 2.3GHz processor, a Fortran implementation of the scheme can generate near field quadrature corrections for between 1000 and 10,000 points per second, depending on the order of accuracy and the desired precision. A Fortran implementation of the algorithm described in this work is available at https://gitlab.com/fastalgorithms/fmm3dbie.

Probabilistic Latent Variable Models (LVMs) provide an alternative to self-supervised learning approaches for linguistic representation learning from speech. LVMs admit an intuitive probabilistic interpretation where the latent structure shapes the information extracted from the signal. Even though LVMs have recently seen a renewed interest due to the introduction of Variational Autoencoders (VAEs), their use for speech representation learning remains largely unexplored. In this work, we propose Convolutional Deep Markov Model (ConvDMM), a Gaussian state-space model with non-linear emission and transition functions modelled by deep neural networks. This unsupervised model is trained using black box variational inference. A deep convolutional neural network is used as an inference network for structured variational approximation. When trained on a large scale speech dataset (LibriSpeech), ConvDMM produces features that significantly outperform multiple self-supervised feature extracting methods on linear phone classification and recognition on the Wall Street Journal dataset. Furthermore, we found that ConvDMM complements self-supervised methods like Wav2Vec and PASE, improving on the results achieved with any of the methods alone. Lastly, we find that ConvDMM features enable learning better phone recognizers than any other features in an extreme low-resource regime with few labeled training examples.

Normalizing flows model complex probability distributions by combining a base distribution with a series of bijective neural networks. State-of-the-art architectures rely on coupling and autoregressive transformations to lift up invertible functions from scalars to vectors. In this work, we revisit these transformations as probabilistic graphical models, showing that a flow reduces to a Bayesian network with a pre-defined topology and a learnable density at each node. From this new perspective, we propose the graphical normalizing flow, a new invertible transformation with either a prescribed or a learnable graphical structure. This model provides a promising way to inject domain knowledge into normalizing flows while preserving both the interpretability of Bayesian networks and the representation capacity of normalizing flows. We demonstrate experimentally that normalizing flows built on top of graphical conditioners are competitive density estimators. Finally, we illustrate how inductive bias can be embedded into normalizing flows by parameterizing graphical conditioners with convolutional networks.

In an electrostatic simulation, an equipotential condition with an undefined/floating potential value has to be enforced on the surface of an isolated conductor. If this conductor is charged, a nonzero charge condition is also required. While implementation of these conditions using a traditional finite element method (FEM) is not straightforward, they can be easily discretized and incorporated within a discontinuous Galerkin (DG) method. However, DG discretization results in a larger number of unknowns as compared to FEM. In this work, a hybridizable DG (HDG) method is proposed to alleviate this problem. Floating potential boundary conditions, possibly with different charge values, are introduced on surfaces of each isolated conductor and are weakly enforced in the global problem of HDG. The unknowns of the global HDG problem are those only associated with the nodes on the mesh skeleton and their number is much smaller than the total number of unknowns required by DG. Numerical examples show that the proposed method is as accurate as DG while it improves the computational efficiency significantly.

The focus of our work is dispersive, second-order effective model describing the low-frequency wave motion in heterogeneous (e.g.~functionally-graded) media endowed with periodic microstructure. For this class of quasi-periodic medium variations, we pursue homogenization of the scalar wave equation in $\mathbb{R}^d$, $d\geqslant 1$ within the framework of multiple scales expansion. When either $d=1$ or $d=2$, this model problem bears direct relevance to the description of (anti-plane) shear waves in elastic solids. By adopting the lengthscale of microscopic medium fluctuations as the perturbation parameter, we synthesize the germane low-frequency behavior via a fourth-order differential equation (with smoothly varying coefficients) governing the mean wave motion in the medium, where the effect of microscopic heterogeneities is upscaled by way of the so-called cell functions. In an effort to demonstrate the relevance of our analysis toward solving boundary value problems (deemed to be the ultimate goal of most homogenization studies), we also develop effective boundary conditions, up to the second order of asymptotic approximation, applicable to one-dimensional (1D) shear wave motion in a macroscopically heterogeneous solid with periodic microstructure. We illustrate the analysis numerically in 1D by considering (i) low-frequency wave dispersion, (ii) mean-field homogenized description of the shear waves propagating in a finite domain, and (iii) full-field homogenized description thereof. In contrast to (i) where the overall wave dispersion appears to be fairly well described by the leading-order model, the results in (ii) and (iii) demonstrate the critical role that higher-order corrections may have in approximating the actual waveforms in quasi-periodic media.

The perfectly matched layer (PML) is one of the most popular domain truncation techniques used in differential-equation based wave-type equation solvers. Practical implementations of PMLs often use smooth-varying attenuation coefficients to reduce the numerical reflection from PMLs and to reduce the thickness required for reaching a certain level of absorption. In discontinuous Galerkin time-domain (DGTD) methods, assigning smooth-varying coefficients calls for storing individual mass matrices for each mesh element, which significantly increases the memory-cost. This work proposes a memory-efficient implementation of smooth-varying PMLs in DGTD. A weight-adjusted approximation is applied to the mass matrices involved in the PML formulation, which yields a low memory-cost algorithm and maintains the high-order accuracy of DGTD. The proposed scheme has the same accuracy as the implementation of storing local mass matrices of each element, and provides a higher mesh flexibility and a smaller reflection than using a constant coefficient in each element.

Topological Data Analysis (TDA) provides novel approaches that allow us to analyze the geometrical shapes and topological structures of a dataset. As one important application, TDA can be used for data visualization and dimension reduction. We follow the framework of circular coordinate representation, which allows us to perform dimension reduction and visualization for high-dimensional datasets on a torus using persistent cohomology. In this paper, we propose a method to adapt the circular coordinate framework to take into account sparsity in high-dimensional applications. We use a generalized penalty function instead of an $L_{2}$ penalty in the traditional circular coordinate algorithm. We provide simulation experiments and real data analysis to support our claim that circular coordinates with generalized penalty will accommodate the sparsity in high-dimensional datasets under different sampling schemes while preserving the topological structures.

In this paper, we employ Cooperative Rate-Splitting (CRS) technique to enhance the Secrecy Sum Rate (SSR) for the Multiple Input Single Output (MISO) Broadcast Channel (BC), consisting of two legitimate users and one eavesdropper, with perfect Channel State Information (CSI) available at all nodes. For CRS based on the three-node relay channel, the transmitter splits and encodes the messages of legitimate users into common and private streams based on Rate-Splitting (RS). With the goal of maximizing SSR, the proposed CRS strategy opportunistically asks the relaying legitimate user to forward its decoded common message. During the transmission, the eavesdropper keeps wiretapping silently. To ensure secure transmission, the common message is used for the dual purpose, serving both as a desired message and Artificial Noise (AN) without consuming extra transmit power comparing to the conventional AN design. Taking into account the total power constraint and the Physical Layer (PHY) security, the precoders and time-slot allocation are jointly optimized by solving the non-convex SSR maximization problem based on Sequential Convex Approximation (SCA) algorithm. Numerical results show that the proposed CRS secure transmission scheme outperforms existing Multi-User Linear Precoding (MU-LP) and Cooperative Non-Orthogonal Multiple Access (C-NOMA) strategies. Therefore, CRS is a promising strategy to enhance the PHY security in multi-antenna BC systems.

Anderson acceleration (AA) is a popular method for accelerating fixed-point iterations, but may suffer from instability and stagnation. We propose a globalization method for AA to improve its stability and achieve global and local convergence. Unlike existing AA globalization approaches that often rely on safeguarding operations and might hinder fast local convergence, we adopt a nonmonotone trust-region framework and introduce an adaptive quadratic regularization together with a tailored acceptance mechanism. We prove global convergence and show that our algorithm attains the same local convergence as AA under appropriate assumptions. The effectiveness of our method is demonstrated in several numerical experiments.

Hacking password databases is one of the most frequently reported cyber-attacks. Current password management systems are based on known and public algorithms. Also, many studies have shown that users select weak passwords. Thus, with the emergence of new powerful computing devices, the passwords based on known algorithms can be disclosed. Using physical unclonable functions (PUFs) for increasing the security level of password management systems is a quite recent method that is proposed to solve this problem. In this method, Addressable PUF Generator (APG) is added to the conventional password management systems. This report is aimed at implementing the password generation scheme using SRAM-based PUF. The bit error is indeed the main issue with using PUFs is addresses in this paper. To solve this issue, Ternary Addresseble PUF Generator is used.

Structural balance theory assumes triads in networks to gravitate towards stable configurations. The theory has been verified for undirected graphs. Since real-world networks are often directed, we introduce a novel method for considering both transitivity and sign consistency for calculating balance in signed digraphs. We test our approach on graphs that we constructed by using different methods for identifying edge signs: natural language processing to infer signs from underlying text data, and self-reported survey data. Our results show that for various social contexts and edge sign detection methods, balance is moderately high, ranging from 67.5% to 92.4%.

We present a corpus of 7,500 tweets annotated with COVID-19 events, including positive test results, denied access to testing, and more. We show that our corpus enables automatic identification of COVID-19 events mentioned in Twitter with text spans that fill a set of pre-defined slots for each event. We also present analyses on the self-reporting cases and user's demographic information. We will make our annotated corpus and extraction tools available for the research community to use upon publication at https://github.com/viczong/extract_COVID19_events_from_Twitter

Purpose: We proposed a deep convolutional neural network (CNN), named Retinal Fluid Segmentation Network (ReF-Net) to segment volumetric retinal fluid on optical coherence tomography (OCT) volume. Methods: 3 x 3-mm OCT scans were acquired on one eye by a 70-kHz OCT commercial AngioVue system (RTVue-XR; Optovue, Inc.) from 51 participants in a clinical diabetic retinopathy (DR) study (45 with retinal edema and 6 healthy controls). A CNN with U-Net-like architecture was constructed to detect and segment the retinal fluid. Cross-sectional OCT and angiography (OCTA) scans were used for training and testing ReF-Net. The effect of including OCTA data for retinal fluid segmentation was investigated in this study. Volumetric retinal fluid can be constructed using the output of ReF-Net. Area-under-Receiver-Operating-Characteristic-curve (AROC), intersection-over-union (IoU), and F1-score were calculated to evaluate the performance of ReF-Net. Results: ReF-Net shows high accuracy (F1 = 0.864 +/- 0.084) in retinal fluid segmentation. The performance can be further improved (F1 = 0.892 +/- 0.038) by including information from both OCTA and structural OCT. ReF-Net also shows strong robustness to shadow artifacts. Volumetric retinal fluid can provide more comprehensive information than the 2D area, whether cross-sectional or en face projections. Conclusions: A deep-learning-based method can accurately segment retinal fluid volumetrically on OCT/OCTA scans with strong robustness to shadow artifacts. OCTA data can improve retinal fluid segmentation. Volumetric representations of retinal fluid are superior to 2D projections. Translational Relevance: Using a deep learning method to segment retinal fluid volumetrically has the potential to improve the diagnostic accuracy of diabetic macular edema by OCT systems.

The outbreak of COVID-19 has shocked the entire world with its fairly rapid spread and has challenged different sectors. One of the most effective ways to limit its spread is the early and accurate diagnosis of infected patients. Medical imaging such as X-ray and Computed Tomography (CT) combined with the potential of Artificial Intelligence (AI) plays an essential role in supporting the medical staff in the diagnosis process. Thereby, the use of five different deep learning models (ResNet18, ResNet34, InceptionV3, InceptionResNetV2, and DenseNet161) and their Ensemble have been used in this paper, to classify COVID-19, pneumoni{\ae} and healthy subjects using Chest X-Ray. Multi-label classification was performed to predict multiple pathologies for each patient, if present. Foremost, the interpretability of each of the networks was thoroughly studied using techniques like occlusion, saliency, input X gradient, guided backpropagation, integrated gradients, and DeepLIFT. The mean Micro-F1 score of the models for COVID-19 classifications ranges from 0.66 to 0.875, and is 0.89 for the Ensemble of the network models. The qualitative results depicted the ResNets to be the most interpretable model.

Entropy regularization in optimal transport (OT) has been the driver of many recent interests for Wasserstein metrics and barycenters in machine learning. It allows to keep the appealing geometrical properties of the unregularized Wasserstein distance while having a significantly lower complexity thanks to Sinkhorn's algorithm. However, entropy brings some inherent smoothing bias, resulting for example in blurred barycenters. This side effect has prompted an increasing temptation in the community to settle for a slower algorithm such as log-domain stabilized Sinkhorn which breaks the parallel structure that can be leveraged on GPUs, or even go back to unregularized OT. Here we show how this bias is tightly linked to the reference measure that defines the entropy regularizer and propose debiased Wasserstein barycenters that preserve the best of both worlds: fast Sinkhorn-like iterations without entropy smoothing. Theoretically, we prove that the entropic OT barycenter of univariate Gaussians is a Gaussian and quantify its variance bias. This result is obtained by extending the differentiability and convexity of entropic OT to sub-Gaussian measures with unbounded supports. Empirically, we illustrate the reduced blurring and the computational advantage on various applications.

This paper concerns the ethics and morality of algorithms and computational systems, and has been circulating internally at Facebook for the past couple years. The paper reviews many Nobel laureates' work, as well as the work of other prominent scientists such as Richard Dawkins, Andrei Kolmogorov, Vilfredo Pareto, and John von Neumann. The paper draws conclusions based on such works, as summarized in the title. The paper argues that the standard approach to modern machine learning and artificial intelligence is bound to be biased and unfair, and that longstanding traditions in the professions of law, justice, politics, and medicine should help.

Robust traffic sign detection and recognition (TSDR) is of paramount importance for the successful realization of autonomous vehicle technology. The importance of this task has led to a vast amount of research efforts and many promising methods have been proposed in the existing literature. However, the SOTA (SOTA) methods have been evaluated on clean and challenge-free datasets and overlooked the performance deterioration associated with different challenging conditions (CCs) that obscure the traffic images captured in the wild. In this paper, we look at the TSDR problem under CCs and focus on the performance degradation associated with them. To overcome this, we propose a Convolutional Neural Network (CNN) based TSDR framework with prior enhancement. Our modular approach consists of a CNN-based challenge classifier, Enhance-Net, an encoder-decoder CNN architecture for image enhancement, and two separate CNN architectures for sign-detection and classification. We propose a novel training pipeline for Enhance-Net that focuses on the enhancement of the traffic sign regions (instead of the whole image) in the challenging images subject to their accurate detection. We used CURE-TSD dataset consisting of traffic videos captured under different CCs to evaluate the efficacy of our approach. We experimentally show that our method obtains an overall precision and recall of 91.1% and 70.71% that is 7.58% and 35.90% improvement in precision and recall, respectively, compared to the current benchmark. Furthermore, we compare our approach with SOTA object detection networks, Faster-RCNN and R-FCN, and show that our approach outperforms them by a large margin.

Reinforcement learning algorithms have had tremendous successes in online learning settings. However, these successes have relied on low-stakes interactions between the algorithmic agent and its environment. In many settings where RL could be of use, such as health care and autonomous driving, the mistakes made by most online RL algorithms during early training come with unacceptable costs. These settings require developing reinforcement learning algorithms that can operate in the so-called batch setting, where the algorithms must learn from set of data that is fixed, finite, and generated from some (possibly unknown) policy. Evaluating policies different from the one that collected the data is called off-policy evaluation, and naturally poses counter-factual questions. In this project we show how off-policy evaluation and the estimation of treatment effects in causal inference are two approaches to the same problem, and compare recent progress in these two areas.

We consider speeding up stochastic gradient descent (SGD) by parallelizing it across multiple workers. We assume the same data set is shared among $n$ workers, who can take SGD steps and coordinate with a central server. Unfortunately, this could require a lot of communication between the workers and the server, which can dramatically reduce the gains from parallelism. The Local SGD method, proposed and analyzed in the earlier literature, suggests machines should make many local steps between such communications. While the initial analysis of Local SGD showed it needs $\Omega ( \sqrt{T} )$ communications for $T$ local gradient steps in order for the error to scale proportionately to $1/(nT)$, this has been successively improved in a string of papers, with the state-of-the-art requiring $\Omega \left( n \left( \mbox{ polynomial in log } (T) \right) \right)$ communications. In this paper, we give a new analysis of Local SGD. A consequence of our analysis is that Local SGD can achieve an error that scales as $1/(nT)$ with only a fixed number of communications independent of $T$: specifically, only $\Omega(n)$ communications are required.

Online mirror descent (OMD) and dual averaging (DA) are two fundamental algorithms for online convex optimization. They are known to have very similar (or even identical) performance guarantees in most scenarios when a \emph{fixed} learning rate is used. However, for \emph{dynamic} learning rates OMD is provably inferior to DA. It is known that, with a dynamic learning rate, OMD can suffer linear regret, even in common settings such as prediction with expert advice. This hints that the relationship between OMD and DA is not fully understood at present.

In this paper, we modify the OMD algorithm by a simple technique that we call stabilization. We give essentially the same abstract regret bound for stabilized OMD and DA by modifying the classical OMD convergence analysis in a careful and modular way, yielding proofs that we believe to be clean and flexible. Simple corollaries of these bounds show that OMD with stabilization and DA enjoy the same performance guarantees in many applications even under dynamic learning rates. We also shed some light on the similarities between OMD and DA and show simple conditions under which stabilized OMD and DA generate the same iterates.

Graphs neural networks (GNNs) learn node features by aggregating and combining neighbor information, which have achieved promising performance on many graph tasks. However, GNNs are mostly treated as black-boxes and lack human intelligible explanations. Thus, they cannot be fully trusted and used in certain application domains if GNN models cannot be explained. In this work, we propose a novel approach, known as XGNN, to interpret GNNs at the model-level. Our approach can provide high-level insights and generic understanding of how GNNs work. In particular, we propose to explain GNNs by training a graph generator so that the generated graph patterns maximize a certain prediction of the model.We formulate the graph generation as a reinforcement learning task, where for each step, the graph generator predicts how to add an edge into the current graph. The graph generator is trained via a policy gradient method based on information from the trained GNNs. In addition, we incorporate several graph rules to encourage the generated graphs to be valid. Experimental results on both synthetic and real-world datasets show that our proposed methods help understand and verify the trained GNNs. Furthermore, our experimental results indicate that the generated graphs can provide guidance on how to improve the trained GNNs.

Dialog policy determines the next-step actions for agents and hence is central to a dialogue system. However, when migrated to novel domains with little data, a policy model can fail to adapt due to insufficient interactions with the new environment. We propose Deep Transferable Q-Network (DTQN) to utilize shareable low-level signals between domains, such as dialogue acts and slots. We decompose the state and action representation space into feature subspaces corresponding to these low-level components to facilitate cross-domain knowledge transfer. Furthermore, we embed DTQN in a meta-learning framework and introduce Meta-DTQN with a dual-replay mechanism to enable effective off-policy training and adaptation. In experiments, our model outperforms baseline models in terms of both success rate and dialogue efficiency on the multi-domain dialogue dataset MultiWOZ 2.0.

A new method for improving the optimization performance of a state-of-the-art differential evolution (DE) variant is proposed in this paper. The technique can increase the exploration by adopting the long-tailed property of the Cauchy distribution, which helps the algorithm to generate a trial vector with great diversity. Compared to the previous approaches, the proposed approach perturbs a target vector instead of a mutant vector based on a jumping rate. We applied the proposed approach to LSHADE-RSP ranked second place in the CEC 2018 competition on single objective real-valued optimization. A set of 30 different and difficult optimization problems is used to evaluate the optimization performance of the improved LSHADE-RSP. Our experimental results verify that the improved LSHADE-RSP significantly outperformed not only its predecessor LSHADE-RSP but also several cutting-edge DE variants in terms of convergence speed and solution accuracy.

Data augmentations have been widely studied to improve the accuracy and robustness of classifiers. However, the potential of image augmentation in improving GAN models for image synthesis has not been thoroughly investigated in previous studies. In this work, we systematically study the effectiveness of various existing augmentation techniques for GAN training in a variety of settings. We provide insights and guidelines on how to augment images for both vanilla GANs and GANs with regularizations, improving the fidelity of the generated images substantially. Surprisingly, we find that vanilla GANs attain generation quality on par with recent state-of-the-art results if we use augmentations on both real and generated images. When this GAN training is combined with other augmentation-based regularization techniques, such as contrastive loss and consistency regularization, the augmentations further improve the quality of generated images. We provide new state-of-the-art results for conditional generation on CIFAR-10 with both consistency loss and contrastive loss as additional regularizations.

Tracking an unknown target captured from medium- or high-aerial view is challenging, especially in scenarios of small objects, large viewpoint change, drastic camera motion, and high density. This paper introduces a context-aware IoU-guided tracker that exploits an offline reference proposal generation strategy and a multitask two-stream network. The proposed strategy introduces an efficient sampling strategy to generalize the network on the target and its parts without imposing extra computational complexity during online tracking. It considerably helps the proposed tracker, COMET, to handle occlusion and view-point change, where only some parts of the target are visible. Extensive experimental evaluations on broad range of small object benchmarks (UAVDT, VisDrone-2019, and Small-90) demonstrate the effectiveness of our approach for small object tracking.

A major endeavor of computer vision is to represent, understand and extract structure from 3D data. Towards this goal, unsupervised learning is a powerful and necessary tool. Most current unsupervised methods for 3D shape analysis use datasets that are aligned, require objects to be reconstructed and suffer from deteriorated performance on downstream tasks. To solve these issues, we propose to extend the InfoMax and contrastive learning principles on 3D shapes. We show that we can maximize the mutual information between 3D objects and their "chunks" to improve the representations in aligned datasets. Furthermore, we can achieve rotation invariance in SO${(3)}$ group by maximizing the mutual information between the 3D objects and their geometric transformed versions. Finally, we conduct several experiments such as clustering, transfer learning, shape retrieval, and achieve state of art results.

We study the convergence rates of the EM algorithm for learning two-component mixed linear regression under all regimes of signal-to-noise ratio (SNR). We resolve a long-standing question that many recent results have attempted to tackle: we completely characterize the convergence behavior of EM, and show that the EM algorithm achieves minimax optimal sample complexity under all SNR regimes. In particular, when the SNR is sufficiently large, the EM updates converge to the true parameter $\theta^{*}$ at the standard parametric convergence rate $\mathcal{O}((d/n)^{1/2})$ after $\mathcal{O}(\log(n/d))$ iterations. In the regime where the SNR is above $\mathcal{O}((d/n)^{1/4})$ and below some constant, the EM iterates converge to a $\mathcal{O}({\rm SNR}^{-1} (d/n)^{1/2})$ neighborhood of the true parameter, when the number of iterations is of the order $\mathcal{O}({\rm SNR}^{-2} \log(n/d))$. In the low SNR regime where the SNR is below $\mathcal{O}((d/n)^{1/4})$, we show that EM converges to a $\mathcal{O}((d/n)^{1/4})$ neighborhood of the true parameters, after $\mathcal{O}((n/d)^{1/2})$ iterations. Notably, these results are achieved under mild conditions of either random initialization or an efficiently computable local initialization. By providing tight convergence guarantees of the EM algorithm in middle-to-low SNR regimes, we fill the remaining gap in the literature, and significantly, reveal that in low SNR, EM changes rate, matching the $n^{-1/4}$ rate of the MLE, a behavior that previous work had been unable to show.

This paper investigates the multi-GPU performance of a 3D buoyancy driven cavity solver using MPI and OpenACC directives on different platforms. The paper shows that decomposing the total problem in different dimensions affects the strong scaling performance significantly for the GPU. Without proper performance optimizations, it is shown that 1D domain decomposition scales poorly on multiple GPUs due to the noncontiguous memory access. The performance using whatever decompositions can be benefited from a series of performance optimizations in the paper. Since the buoyancy driven cavity code is latency-bounded on the clusters examined, a series of optimizations both agnostic and tailored to the platforms are designed to reduce the latency cost and improve memory throughput between hosts and devices efficiently. First, the parallel message packing/unpacking strategy developed for noncontiguous data movement between hosts and devices improves the overall performance by about a factor of 2. Second, transferring different data based on the stencil sizes for different variables further reduces the communication overhead. These two optimizations are general enough to be beneficial to stencil computations having ghost changes on all of the clusters tested. Third, GPUDirect is used to improve the communication on clusters which have the hardware and software support for direct communication between GPUs without staging CPU's memory. Finally, overlapping the communication and computations is shown to be not efficient on multi-GPUs if only using MPI or MPI+OpenACC. Although we believe our implementation has revealed enough overlap, the actual running does not utilize the overlap well due to a lack of asynchronous progression.

Model-based reinforcement learning (MBRL) has been applied to meta-learning settings and demonstrated its high sample efficiency. However, in previous MBRL for meta-learning settings, policies are optimized via rollouts that fully rely on a predictive model for an environment, and thus its performance in a real environment tends to degrade when the predictive model is inaccurate. In this paper, we prove that the performance degradation can be suppressed by using branched meta-rollouts. Based on this theoretical analysis, we propose meta-model-based meta-policy optimization (M3PO), in which the branched meta-rollouts are used for policy optimization. We demonstrate that M3PO outperforms existing meta reinforcement learning methods in continuous-control benchmarks.

Multi-object tracking has seen a lot of progress recently, albeit with substantial annotation costs for developing better and larger labeled datasets. In this work, we remove the need for annotated datasets by proposing an unsupervised re-identification network, thus sidestepping the labeling costs entirely, required for training. Given unlabeled videos, our proposed method (SimpleReID) first generates tracking labels using SORT and trains a ReID network to predict the generated labels using crossentropy loss. We demonstrate that SimpleReID performs substantially better than simpler alternatives, and we recover the full performance of its supervised counterpart consistently across diverse tracking frameworks. The observations are unusual because unsupervised ReID is not expected to excel in crowded scenarios with occlusions, and drastic viewpoint changes. By incorporating our unsupervised SimpleReID with CenterTrack trained on augmented still images, we establish a new state-of-the-art performance on popular datasets like MOT16/17 without using tracking supervision, beating current best (CenterTrack) by 0.2-0.3 MOTA and 4.4-4.8 IDF1 scores. We further provide evidence for limited scope for improvement in IDF1 scores beyond our unsupervised ReID in the studied settings. Our investigation suggests reconsideration towards more sophisticated, supervised, end-to-end trackers by showing promise in simpler unsupervised alternatives.

Automated heart sounds classification is a much-required diagnostic tool in the view of increasing incidences of heart related diseases worldwide. In this study, we conduct a comprehensive study of heart sounds classification by using various supervised, semi-supervised and unsupervised approaches on the PhysioNet/CinC 2016 Challenge dataset. Supervised approaches, including deep learning and machine learning methods, require large amounts of labelled data to train the models, which are challenging to obtain in most practical scenarios. In view of the need to reduce the labelling burden for clinical practices, where human labelling is both expensive and time-consuming, semi-supervised or even unsupervised approaches in restricted data setting are desirable. A GAN based semi-supervised method is therefore proposed, which allows the usage of unlabelled data samples to boost the learning of data distribution. It achieves a better performance in terms of AUROC over the supervised baseline when limited data samples exist. Furthermore, several unsupervised methods are explored as an alternative approach by considering the given problem as an anomaly detection scenario. In particular, the unsupervised feature extraction using 1D CNN Autoencoder coupled with one-class SVM obtains good performance without any data labelling. The potential of the proposed semi-supervised and unsupervised methods may lead to a workflow tool in the future for the creation of higher quality datasets.

We consider the problem of model selection for two popular stochastic linear bandit settings, and propose algorithms that adapts to the unknown problem complexity. In the first setting, we consider the $K$ armed mixture bandits, where the mean reward of arm $i \in [K]$, is $\mu_i+ \langle \alpha_{i,t},\theta^* \rangle$, with $\alpha_{i,t} \in \mathbb{R}^d$ being the known context vector and $\mu_i \in [-1,1]$ and $\theta^*$ are unknown parameters. We define $\|\theta^*\|$ as the problem complexity and consider a sequence of nested hypothesis classes, each positing a different upper bound on $\|\theta^*\|$. Exploiting this, we propose Adaptive Linear Bandit (ALB), a novel phase based algorithm that adapts to the true problem complexity, $\|\theta^*\|$. We show that ALB achieves regret scaling of $O(\|\theta^*\|\sqrt{T})$, where $\|\theta^*\|$ is apriori unknown. As a corollary, when $\theta^*=0$, ALB recovers the minimax regret for the simple bandit algorithm without such knowledge of $\theta^*$. ALB is the first algorithm that uses parameter norm as model section criteria for linear bandits. Prior state of art algorithms \cite{osom} achieve a regret of $O(L\sqrt{T})$, where $L$ is the upper bound on $\|\theta^*\|$, fed as an input to the problem. In the second setting, we consider the standard linear bandit problem (with possibly an infinite number of arms) where the sparsity of $\theta^*$, denoted by $d^* \leq d$, is unknown to the algorithm. Defining $d^*$ as the problem complexity, we show that ALB achieves $O(d^*\sqrt{T})$ regret, matching that of an oracle who knew the true sparsity level. This is the first algorithm that achieves such model selection guarantees resolving an open problem in \cite{foster_model_selection}. We further verify through synthetic and real-data experiments that the performance gains are fundamental and not artifacts of mathematical bounds.

In this article, we consider the problem of high-dimensional conditional independence testing, which is a key building block in statistics and machine learning. We propose a double generative adversarial networks (GANs)-based inference procedure. We first introduce a double GANs framework to learn two generators, and integrate the two generators to construct a doubly-robust test statistic. We next consider multiple generalized covariance measures, and take their maximum as our test statistic. Finally, we obtain the empirical distribution of our test statistic through multiplier bootstrap. We show that our test controls type-I error, while the power approaches one asymptotically. More importantly, these theoretical guarantees are obtained under much weaker and practically more feasible conditions compared to existing tests. We demonstrate the efficacy of our test through both synthetic and real datasets.

End-to-end speaker diarization using a fully supervised self-attention mechanism (SA-EEND) has achieved significant improvement from the state-of-art clustering-based methods, especially for the overlapping case. However, applications of original SA-EEND are limited since it has been developed based on offline self-attention algorithms. In this paper, we propose a novel speaker-tracing mechanism to extend SA-EEND to online speaker diarization for practical use. First, this paper demonstrates oracle experiments to show that a straightforward online extension, in which SA-EEND is performed independently for each chunked recording, results in degrading the diarization error rate (DER) due to the speaker permutation inconsistency across the chunk. To circumvent this inconsistency issue, our proposed method, called speaker-tracing buffer, maintains the speaker permutation information determined in previous chunks within the self-attention mechanism for correct speaker-tracing. Our experimental results show that the proposed online SA-EEND with speaker-tracing buffer achieved the DERs of 12.84% for CALLHOME and 21.64% for Corpus of Spontaneous Japanese with 1s latency. These results are significantly better than the conventional online clustering method based on x-vector with 1.5s latency, which achieved the DERs of 26.90% and 25.45%, respectively.

Machine learning is poised as a very powerful tool that can drastically improve our ability to carry out scientific research. However, many issues need to be addressed before this becomes a reality. This article focuses on one particular issue of broad interest: How can we integrate machine learning with physics-based modeling to develop new interpretable and truly reliable physical models? After introducing the general guidelines, we discuss the two most important issues for developing machine learning-based physical models: Imposing physical constraints and obtaining optimal datasets. We also provide a simple and intuitive explanation for the fundamental reasons behind the success of modern machine learning, as well as an introduction to the concurrent machine learning framework needed for integrating machine learning with physics-based modeling. Molecular dynamics and moment closure of kinetic equations are used as examples to illustrate the main issues discussed. We end with a general discussion on where this integration will lead us to, and where the new frontier will be after machine learning is successfully integrated into scientific modeling.

Image Completion refers to the task of filling in the missing regions of an image and Image Extrapolation refers to the task of extending an image at its boundaries while keeping it coherent. Many recent works based on GAN have shown progress in addressing these problem statements but lack adaptability for these two cases, i.e. the neural network trained for the completion of interior masked images does not generalize well for extrapolating over the boundaries and vice-versa. In this paper, we present a technique to train both completion and extrapolation networks concurrently while benefiting each other. We demonstrate our method's efficiency in completing large missing regions and we show the comparisons with the contemporary state of the art baseline.

Most existing black-box optimization methods assume that all variables in the system being optimized have equal cost and can change freely at each iteration. However, in many real world systems, inputs are passed through a sequence of different operations or modules, making variables in earlier stages of processing more costly to update. Such structure imposes a cost on switching variables in early parts of a data processing pipeline. In this work, we propose a new algorithm for switch cost-aware optimization called Lazy Modular Bayesian Optimization (LaMBO). This method efficiently identifies the global optimum while minimizing cost through a passive change of variables in early modules. The method is theoretical grounded and achieves vanishing regret when augmented with switching cost. We apply LaMBO to multiple synthetic functions and a three-stage image segmentation pipeline used in a neuroscience application, where we obtain promising improvements over prevailing cost-aware Bayesian optimization algorithms. Our results demonstrate that LaMBO is an effective strategy for black-box optimization that is capable of minimizing switching costs in modular systems.

We propose a new numerical method for one dimensional stochastic differential equations (SDEs). The main idea of this method is based on a representation of a weak solution of a SDE with a time changed Brownian motion, dated back to Doeblin (1940). In cases where the diffusion coefficient is bounded and $\beta$-H\"{o}lder continuous with $0 < \beta \leq 1$, we provide the rate of strong convergence. An advantage of our approach is that we approximate the weak solution, which enables us to treat a SDE with no strong solution. Our scheme is the first to achieve the strong convergence for the case $0 < \beta < 1/2$.

Whole brain extraction, also known as skull stripping, is a process in neuroimaging in which non-brain tissue such as skull, eyeballs, skin, etc. are removed from neuroimages. Skull striping is a preliminary step in presurgical planning, cortical reconstruction, and automatic tumor segmentation. Despite a plethora of skull stripping approaches in the literature, few are sufficiently accurate for processing pathology-presenting MRIs, especially MRIs with brain tumors. In this work we propose a deep learning approach for skull striping common MRI sequences in oncology such as T1-weighted with gadolinium contrast (T1Gd) and T2-weighted fluid attenuated inversion recovery (FLAIR) in patients with brain tumors. We automatically created gray matter, white matter, and CSF probability masks using SPM12 software and merged the masks into one for a final whole-brain mask for model training. Dice agreement, sensitivity, and specificity of the model (referred herein as DeepBrain) was tested against manual brain masks. To assess data efficiency, we retrained our models using progressively fewer training data examples and calculated average dice scores on the test set for the models trained in each round. Further, we tested our model against MRI of healthy brains from the LBP40A dataset. Overall, DeepBrain yielded an average dice score of 94.5%, sensitivity of 96.4%, and specificity of 98.5% on brain tumor data. For healthy brains, model performance improved to a dice score of 96.2%, sensitivity of 96.6% and specificity of 99.2%. The data efficiency experiment showed that, for this specific task, comparable levels of accuracy could have been achieved with as few as 50 training samples. In conclusion, this study demonstrated that a deep learning model trained on minimally processed automatically-generated labels can generate more accurate brain masks on MRI of brain tumor patients within seconds.

Multi-modal multi-objective optimization problems (MMMOPs) have multiple solution vectors mapping to the same objective vector. For MMMOPs, it is important to discover equivalent solutions associated with each point in the Pareto-Front for allowing end-users to make informed decisions. Prevalent multi-objective evolutionary algorithms are incapable of searching for multiple solution subsets, whereas, algorithms designed for MMMOPs demonstrate degraded performance in the objective space. This motivates the design of better algorithms for addressing MMMOPs. The present work highlights the disadvantage of using crowding distance in the decision space when solving MMMOPs. Subsequently, an evolutionary framework, called graph Laplacian based Optimization using Reference vector assisted Decomposition (LORD), is proposed, which is the first algorithm to use decomposition in both objective and decision space for dealing with MMMOPs. Its filtering step is further extended to present LORD-II algorithm, which demonstrates its dynamics on multi-modal many-objective problems. The efficacy of the frameworks are established by comparing their performance on 34 test instances (obtained from the CEC 2019 multi-modal multi-objective test suite) with the state-of-the-art algorithms for MMMOPs and other multi- and many-objective evolutionary algorithms. The manuscript is concluded mentioning the limitations of the proposed frameworks and future directions to design still better algorithms for MMMOPs.

We present FastReID, as a widely used object re-identification (re-id) software system in JD AI Research. High modular and extensible design makes it easy for the researcher to achieve new research ideas. Friendly manageable system configuration and engineering deployment functions allow practitioners to quickly deploy models into productions. We have implemented some state-of-the-art algorithms, including person re-id, partial re-id, cross-domain re-id and vehicle re-id, and plan to release these pre-trained models on multiple benchmark datasets. FastReID is by far the most complete and high-performance toolbox supports single and multiple GPU servers, you can reproduce our project results very easily and are very welcome to use it, the code and models are available at https://github.com/JDAI-CV/fast-reid.

There are increasingly applications of natural language processing techniques for information retrieval, indexing and topic modelling in the engineering contexts. A standard component of such tasks is the removal of stopwords, which are uninformative components of the data. While researchers use readily available stopword lists which are derived for general English language, the technical jargon of engineering fields contains their own highly frequent and uninformative words and there exists no standard stopword list for technical language processing applications. Here we address this gap by rigorously identifying generic, insignificant, uninformative stopwords in engineering texts beyond the stopwords in general texts, based on the synthesis of alternative data-driven approaches, and curating a stopword list ready for technical language processing applications.

For the problem of nonlinear equations, the homotopy methods (continuation methods) are popular in engineering fields since their convergence regions are large and they are fairly reliable to find a solution. The disadvantage of the classical homotopy methods is that their consumed time is heavy since they need to solve many auxiliary systems of nonlinear equations during the intermediate continuation processes. In order to overcome this shortcoming, we consider the special continuation method based on the Newton flow and follow its trajectory with the new time-stepping scheme based on the trust-region technique. Furthermore, we analyze the global convergence and local superlinear convergence of the new method. Finally, the promising numerical results of the new method for some real-world problems are also reported, with comparison to the traditional trust-region method (the built-in subroutine fsolve.m of the MATLAB environment \cite{MATLAB,More1978}) and the classical homotopy continuation methods (HOMPACK90 \cite{WSMMW1997} and the built-in subroutines psolve.m for polynomial systems, GaussNewton.m for non-polynomial systems of the NAClab environment \cite{ZL2013}).

This paper presents a Multitask Multilingual Multimodal Pre-trained model (M3P) that combines multilingual-monomodal pre-training and monolingual-multimodal pre-training into a unified framework via multitask learning and weight sharing. The model learns universal representations that can map objects that occurred in different modalities or expressed in different languages to vectors in a common semantic space. To verify the generalization capability of M3P, we fine-tune the pre-trained model for different types of downstream tasks: multilingual image-text retrieval, multilingual image captioning, multimodal machine translation, multilingual natural language inference and multilingual text generation. Evaluation shows that M3P can (i) achieve comparable results on multilingual tasks and English multimodal tasks, compared to the state-of-the-art models pre-trained for these two types of tasks separately, and (ii) obtain new state-of-the-art results on non-English multimodal tasks in the zero-shot or few-shot setting. We also build a new Multilingual Image-Language Dataset (MILD) by collecting large amounts of (text-query, image, context) triplets in 8 languages from the logs of a commercial search engine

Roads have well defined geometries, topologies, and traffic rules. While this has been widely exploited in motion planning methods to produce maneuvers that obey the law, little work has been devoted to utilize these priors in perception and motion forecasting methods. In this paper we propose to incorporate these structured priors as a loss function. In contrast to imposing hard constraints, this approach allows the model to handle non-compliant maneuvers when those happen in the real world. Safe motion planning is the end goal, and thus a probabilistic characterization of the possible future developments of the scene is key to choose the plan with the lowest expected cost. Towards this goal, we design a framework that leverages REINFORCE to incorporate non-differentiable priors over sample trajectories from a probabilistic model, thus optimizing the whole distribution. We demonstrate the effectiveness of our approach on real-world self-driving datasets containing complex road topologies and multi-agent interactions. Our motion forecasts not only exhibit better precision and map understanding, but most importantly result in safer motion plans taken by our self-driving vehicle. We emphasize that despite the importance of this evaluation, it has been often overlooked by previous perception and motion forecasting works.

For the gradient computation across the time domain in Spiking Neural Networks (SNNs) training, two different approaches have been independently studied. The first is to compute the gradients with respect to the change in spike activation (activation-based methods), and the second is to compute the gradients with respect to the change in spike timing (timing-based methods). In this work, we present a comparative study of the two methods and propose a new supervised learning method that combines them. The proposed method utilizes each individual spike more effectively by shifting spike timings as in the timing-based methods as wells as generating and removing spikes as in the activation-based methods. Experimental results showed that the proposed method achieves higher performance in terms of both accuracy and efficiency than the previous approaches.

Motivated by the prevalent data science applications of processing and mining large-scale graph data such as social networks, web graphs, and biological networks, as well as the high I/O and communication costs of storing and transmitting such data, this paper investigates lossless compression of data appearing in the form of a labeled graph. A universal graph compression scheme is proposed, which does not depend on the underlying statistics/distribution of the graph model. For graphs generated by a stochastic block model, which is a widely used random graph model capturing the clustering effects in social networks, the proposed scheme achieves the optimal theoretical limit of lossless compression without the need to know edge probabilities, community labels, or the number of communities.

The key ideas in establishing universality for stochastic block models include: 1) block decomposition of the adjacency matrix of the graph; 2) generalization of the Krichevsky-Trofimov probability assignment, which was initially designed for i.i.d. random processes. In four benchmark graph datasets (protein-to-protein interaction, LiveJournal friendship, Flickr, and YouTube), the compressed files from competing algorithms (including CSR, Ligra+, PNG image compressor, and Lempel-Ziv compressor for two-dimensional data) take 2.4 to 27 times the space needed by the proposed scheme.

We modeled the Quora question pairs dataset to identify a similar question. The dataset that we use is provided by Quora. The task is a binary classification. We tried several methods and algorithms and different approach from previous works. For feature extraction, we used Bag of Words including Count Vectorizer, and Term Frequency-Inverse Document Frequency with unigram for XGBoost and CatBoost. Furthermore, we also experimented with WordPiece tokenizer which improves the model performance significantly. We achieved up to 97 percent accuracy. Code and Dataset.

Transfer learning entails taking an artificial neural network (ANN) that is trained on a source dataset and adapting it to a new target dataset. While this has been shown to be quite powerful, its use has generally been restricted by architectural constraints. Previously, in order to reuse and adapt an ANN's internal weights and structure, the underlying topology of the ANN being transferred across tasks must remain mostly the same while a new output layer is attached, discarding the old output layer's weights. This work introduces network-aware adaptive structure transfer learning (N-ASTL), an advancement over prior efforts to remove this restriction. N-ASTL utilizes statistical information related to the source network's topology and weight distribution in order to inform how new input and output neurons are to be integrated into the existing structure. Results show improvements over prior state-of-the-art, including the ability to transfer in challenging real-world datasets not previously possible and improved generalization over RNNs trained without transfer.

We present a motion planning algorithm with probabilistic guarantees for limbed robots with stochastic gripping forces. Planners based on deterministic models with a worst-case uncertainty can be conservative and inflexible to consider the stochastic behavior of the contact, especially when a gripper is installed. Our proposed planner enables the robot to simultaneously plan its pose and contact force trajectories while considering the risk associated with the gripping forces. Our planner is formulated as a nonlinear programming problem with chance constraints, which allows the robot to generate a variety of motions based on different risk bounds. To model the gripping forces as random variables, we employ Gaussian Process regression. We validate our proposed motion planning algorithm on an 11.5 kg six-limbed robot for two-wall climbing. Our results show that our proposed planner generates various trajectories (e.g., avoiding low friction terrain under the low risk bound, choosing an unstable but faster gait under the high risk bound) by changing the probability of risk based on various specifications.

Distributed localization is essential in many robotic collective tasks such as shape formation and self-assembly.Inspired by the statistical mechanics of energy transition, this paper presents a fully distributed localization algorithm named as virtual particle exchange (VPE) localization algorithm, where each robot repetitively exchanges virtual particles (VPs) with neighbors and eventually obtains its relative position from the virtual particle (VP) amount it owns. Using custom-designed hardware and protocol, VPE localization algorithm allows robots to achieve localization using sensor readings only, avoiding direct communication with neighbors and keeping anonymity. Moreover, VPE localization algorithm determines the swarm center automatically, thereby eliminating the requirement of fixed beacons to embody the origin of coordinates. Theoretical analysis proves that the VPE localization algorithm can always converge to the same result regardless of initial state and has low asymptotic time and memory complexity. Extensive localization simulations with up to 10000 robots and experiments with 52 lowcost robots are carried out, which verify that VPE localization algorithm is scalable, accurate and robust to sensor noises. Based on the VPE localization algorithm, shape formations are further achieved in both simulations and experiments with 52 robots, illustrating that the algorithm can be directly applied to support swarm collaborative tasks.

With the increasing popularity of deep neural networks (DNNs), it has recently been applied to many advanced and diverse tasks, such as medical diagnosis, automatic pilot etc. Due to the lack of transparency of the deep models, it causes serious concern about widespread deployment of ML/DL technologies. In this work, we address the Explainable AI problem of black-box classifiers which take images as input and output probabilities of classes. We propose a novel technology, the Morphological Fragmental Perturbation Pyramid (MFPP), in which we segment input image into different scales of fragments and randomly mask them as perturbation to generate an importance map that indicates how salient each pixel is for prediction results of the black-box DNNs. Compared to existing input sampling perturbation methods, this pyramid structure fragmentation has proven to be more efficient and it can better explore the morphological information of input image to match its semantic information, while it does not require any values inside model. We qualitatively and quantitatively demonstrate that MFPP matches and exceeds the performance of state-of-the-art black-box explanation methods on multiple models and datasets.

Identification of lesions plays a vital role in the accurate classification of retinal diseases and in helping clinicians analyzing the disease severity. In this paper, we present a detailed evaluation of RAGNet, PSPNet, SegNet, UNet, FCN-8 and FCN-32 for the extraction of retinal lesions such as intra-retinal fluid, sub-retinal fluid, hard exudates, drusen, and other chorioretinal anomalies from retinal fundus and OCT scans. We also discuss the transferability of these models for extracting retinal lesions by varying training-testing dataset pairs. A total of 363 fundus and 173,915 OCT scans were considered in this evaluation from seven publicly available datasets from which 297 fundus and 59,593 OCT scans were used for testing purposes. Overall, the best performance is achieved by RAGNet with a mean dice coefficient ($\mathrm{D_C}$) score of 0.822 for extracting retinal lesions. The second-best performance is achieved by PSPNet (mean $\mathrm{D_C}$: 0.785) using ResNet\textsubscript{50} as a backbone. Moreover, the best performance for extracting drusen is achieved by UNet ($\mathrm{D_C}$: 0.864). The source code is available at: this http URL

Infectious keratitis is the most common entities of corneal diseases, in which pathogen grows in the cornea leading to inflammation and destruction of the corneal tissues. Infectious keratitis is a medical emergency, for which a rapid and accurate diagnosis is needed for speedy initiation of prompt and precise treatment to halt the disease progress and to limit the extent of corneal damage; otherwise it may develop sight-threatening and even eye-globe-threatening condition. In this paper, we propose a sequential-level deep learning model to effectively discriminate the distinction and subtlety of infectious corneal disease via the classification of clinical images. In this approach, we devise an appropriate mechanism to preserve the spatial structures of clinical images and disentangle the informative features for clinical image classification of infectious keratitis. In competition with 421 ophthalmologists, the performance of the proposed sequential-level deep model achieved 80.00% diagnostic accuracy, far better than the 49.27% diagnostic accuracy achieved by ophthalmologists over 120 test images.

In this work we present our work in developing a software verification tool for llvm-code - Lodin - that incorporates both explicit-state model checking, statistical model checking and symbolic state model checking algorithms.

We study sample complexity of optimizing "hill-climbing friendly" functions defined on a graph under noisy observations. We define a notion of convexity, and we show that a variant of best-arm identification can find a near-optimal solution after a small number of queries that is independent of the size of the graph. For functions that have local minima and are nearly convex, we show a sample complexity for the classical simulated annealing under noisy observations. We show effectiveness of the greedy algorithm with restarts and the simulated annealing on problems of graph-based nearest neighbor classification as well as a web document re-ranking application.

Optical wireless communication is a promising technology for underwater broadband access networks, which are particularly important for high-resolution environmental monitoring applications. This paper focuses on a deep sea monitoring system, where an underwater optical wireless network is deployed on the seafloor. We model such an optical wireless network as a general queueing network and formulate an optimal relay placement problem, whose objective is to maximize the stability region of the whole system, i.e., the supremum of the traffic volume that the network is capable of accommodating. The formulated optimization problem is further shown to be non-convex, so that its global optimization is non-trivial. In this paper, we develop a global optimization method for this problem and we provide an efficient algorithm to compute an optimal solution. Through numerical evaluations, we show that a significant performance gain can be obtained by using the derived optimal solution.

Many libraries offer free access to digitised historical newspapers via user interfaces. After an initial period of search and filter options as the only features, the availability of more advanced tools and the desire for more options among users has ushered in a period of interface development. However, this raises a number of open questions and challenges. For example, how can we provide interfaces for different user groups? What tools should be available on interfaces and how can we avoid too much complexity? What tools are helpful and how can we improve usability? This paper will not provide definite answers to these questions, but it gives an insight into the difficulties, challenges and risks of using interfaces to investigate historical newspapers. More importantly, it provides ideas and recommendations for the improvement of user interfaces and digital tools.

While vehicular sensor networks (VSNs) have earned the stature of a mobile sensing paradigm utilizing sensors built into cars, they have limited sensing scopes since car drivers only opportunistically discover new events. Conversely, social sensing is emerging as a new sensing paradigm where measurements about the physical world are collected from humans. In contrast to VSNs, social sensing is more pervasive, but one of its key limitations lies in its inconsistent reliability stemming from the data contributed by unreliable human sensors. In this paper, we present DASC, a road Damage-Aware Social-media-driven Car sensing framework that exploits the collective power of social sensing and VSNs for reliable disaster response applications. However, integrating VSNs with social sensing introduces a new set of challenges: i) How to leverage noisy and unreliable social signals to route the vehicles to accurate regions of interest? ii) How to tackle the inconsistent availability (e.g., churns) caused by car drivers being rational actors? iii) How to efficiently guide the cars to the event locations with little prior knowledge of the road damage caused by the disaster, while also handling the dynamics of the physical world and social media? The DASC framework addresses the above challenges by establishing a novel hybrid social-car sensing system that employs techniques from game theory, feedback control, and Markov Decision Process (MDP). In particular, DASC distills signals emitted from social media and discovers the road damages to effectively drive cars to target areas for verifying emergency events. We implement and evaluate DASC in a reputed vehicle simulator that can emulate real-world disaster response scenarios. The results of a real-world application demonstrate the superiority of DASC over current VSNs-based solutions in detection accuracy and efficiency.

Generative Adversarial Networks (GANs) have been successful in producing outstanding results in areas as diverse as image, video, and text generation. Building on these successes, a large number of empirical studies have validated the benefits of the cousin approach called Wasserstein GANs (WGANs), which brings stabilization in the training process. In the present paper, we add a new stone to the edifice by proposing some theoretical advances in the properties of WGANs. First, we properly define the architecture of WGANs in the context of integral probability metrics parameterized by neural networks and highlight some of their basic mathematical features. We stress in particular interesting optimization properties arising from the use of a parametric 1-Lipschitz discriminator. Then, in a statistically-driven approach, we study the convergence of empirical WGANs as the sample size tends to infinity, and clarify the adversarial effects of the generator and the discrimi-nator by underlining some trade-off properties. These features are finally illustrated with experiments using both synthetic and real-world datasets.

Medical image segmentation is inherently an ambiguous task due to factors such as partial volumes and variations in anatomical definitions. While in most cases the segmentation uncertainty is around the border of structures of interest, there can also be considerable inter-rater differences. The class of conditional variational autoencoders (cVAE) offers a principled approach to inferring distributions over plausible segmentations that are conditioned on input images. Segmentation uncertainty estimated from samples of such distributions can be more informative than using pixel level probability scores. In this work, we propose a novel conditional generative model that is based on conditional Normalizing Flow (cFlow). The basic idea is to increase the expressivity of the cVAE by introducing a cFlow transformation step after the encoder. This yields improved approximations of the latent posterior distribution, allowing the model to capture richer segmentation variations. With this we show that the quality and diversity of samples obtained from our conditional generative model is enhanced. Performance of our model, which we call cFlow Net, is evaluated on two medical imaging datasets demonstrating substantial improvements in both qualitative and quantitative measures when compared to a recent cVAE based model.

Graph neural networks (GNNs) model nonlinear representations in graph data with applications in distributed agent coordination, control, and planning among others. Current GNN architectures assume ideal scenarios and ignore link fluctuations that occur due to environment, human factors, or external attacks. In these situations, the GNN fails to address its distributed task if the topological randomness is not considered accordingly. To overcome this issue, we put forth the stochastic graph neural network (SGNN) model: a GNN where the distributed graph convolution module accounts for the random network changes. Since stochasticity brings in a new learning paradigm, we conduct a statistical analysis on the SGNN output variance to identify conditions the learned filters should satisfy for achieving robust transference to perturbed scenarios, ultimately revealing the explicit impact of random link losses. We further develop a stochastic gradient descent (SGD) based learning process for the SGNN and derive conditions on the learning rate under which this learning process converges to a stationary point. Numerical results corroborate our theoretical findings and compare the benefits of SGNN robust transference with a conventional GNN that ignores graph perturbations during learning.

This paper deals with the problem of state estimation for a class of linear time-invariant systems with quadratic output measurements. An immersion-type approach is presented that transforms the system into a state-affine system by adding a finite number of states to the original system. Under suitable persistence of excitation conditions on the input and its higher derivatives, global state estimation is exhibited by means of a Kalman-type observer. A numerical example is provided to illustrate the applicability of the proposed observer design for the problem of position and velocity estimation for a vehicle navigating in the $n-$dimensional Euclidean space using a single position range measurement.

Despite significant progress in general AI planning, certain domains remain out of reach of current AI planning systems. Sokoban is a PSPACE-complete planning task and represents one of the hardest domains for current AI planners. Even domain-specific specialized search methods fail quickly due to the exponential search complexity on hard instances. Our approach based on deep reinforcement learning augmented with a curriculum-driven method is the first one to solve hard instances within one day of training while other modern solvers cannot solve these instances within any reasonable time limit. In contrast to prior efforts, which use carefully handcrafted pruning techniques, our approach automatically uncovers domain structure. Our results reveal that deep RL provides a promising framework for solving previously unsolved AI planning problems, provided a proper training curriculum can be devised.

The paper describes our experience collecting a new dataset for the light source estimation problem in a single image. The analysis of existing color targets is presented along with various technical and scientific aspects essential for data collection. The paper also contains an announcement of an upcoming 2-nd International Illumination Estimation Challenge (IEC 2020). %international illumination estimation challenge.

Nucleus segmentation is an important task in medical image analysis. However, machine learning models cannot perform well because there are large amount of clusters of crowded nuclei. To handle this problem, existing approaches typically resort to sophisticated hand-crafted post-processing strategies; therefore, they are vulnerable to the variation of post-processing hyper-parameters. Accordingly, in this paper, we devise a Boundary-assisted Region Proposal Network (BRP-Net) that achieves robust instance-level nucleus segmentation. First, we propose a novel Task-aware Feature Encoding (TAFE) network that efficiently extracts respective high-quality features for semantic segmentation and instance boundary detection tasks. This is achieved by carefully considering the correlation and differences between the two tasks. Second, coarse nucleus proposals are generated based on the predictions of the above two tasks. Third, these proposals are fed into instance segmentation networks for more accurate prediction. Experimental results demonstrate that the performance of BRP-Net is robust to the variation of post-processing hyper-parameters. Furthermore, BRP-Net achieves state-of-the-art performances on both the Kumar and CPM17 datasets. The code of BRP-Net will be released at https://github.com/csccsccsccsc/brpnet.

This study examines the time complexities of the unbalanced optimal transport problems from an algorithmic perspective for the first time. We reveal which problems in unbalanced optimal transport can/cannot be solved efficiently. Specifically, we prove that the Kantrovich Rubinstein distance and optimal partial transport in Euclidean metric cannot be computed in strongly subquadratic time under the strong exponential time hypothesis. Then, we propose an algorithm that solves a more general unbalanced optimal transport problem exactly in quasi-linear time on a tree metric. The proposed algorithm processes a tree with one million nodes in less than one second. Our analysis forms a foundation for the theoretical study of unbalanced optimal transport algorithms and opens the door to the applications of unbalanced optimal transport to million-scale datasets.

The recent development of light-weighted neural networks has promoted the applications of deep learning under resource constraints and mobile applications. Many of these applications need to perform a real-time and efficient prediction for semantic segmentation with a light-weighted network. This paper introduces a light-weighted network with an efficient reduced non-local module (LRNNet) for efficient and realtime semantic segmentation. We proposed a factorized convolutional block in ResNet-Style encoder to achieve more lightweighted, efficient and powerful feature extraction. Meanwhile, our proposed reduced non-local module utilizes spatial regional dominant singular vectors to achieve reduced and more representative non-local feature integration with much lower computation and memory cost. Experiments demonstrate our superior trade-off among light-weight, speed, computation and accuracy. Without additional processing and pretraining, LRNNet achieves 72.2% mIoU on Cityscapes test dataset only using the fine annotation data for training with only 0.68M parameters and with 71 FPS on a GTX 1080Ti card.

Single-view depth estimation using CNNs trained from unlabelled videos has shown significant promise. However, the excellent results have mostly been obtained in street-scene driving scenarios, and such methods often fail in other settings, particularly indoor videos taken by handheld devices, in which case the ego-motion is often degenerate, i.e., the rotation dominates the translation. In this work, we establish that the degenerate camera motions exhibited in handheld settings are a critical obstacle for unsupervised depth learning. A main contribution of our work is fundamental analysis which shows that the rotation behaves as noise during training, as opposed to the translation (baseline) which provides supervision signals. To capitalise on our findings, we propose a novel data pre-processing method for effective training, i.e., we search for image pairs with modest translation and remove their rotation via the proposed weak image rectification. With our pre-processing, existing unsupervised models can be trained well in challenging scenarios (e.g., NYUv2 dataset), and the results outperform the unsupervised SOTA by a large margin (0.147 vs. 0.189 in the AbsRel error).

Flexible electricity networks continuously coordinate and optimize operations through ICT systems. An overlay data grid conveys information about the state of the electricity grid, as well as the status of demand and production of electricity in households and industry. Data is thus the basis for decisions that affect electricity costs and availability of assets on the electricity grid. It is crucial that these decisions are formed and monitored according to a well-defined governance model. No such data governance model exists today. In this paper, we focus on the central role of virtual power plants in flexible electricity networks. We define the problem of data governance in a virtual power plant, insisting on the issues linked to the inherent asymmetry of this system. We propose unionization as a framing device to reason about this issue. The central contribution of this paper is thus principles for a unionized data governance model for virtual power plants.

Rapid advancements in driver-assistance technology will lead to the integration of fully autonomous vehicles on our roads that will interact with other road users. To address the problem that driverless vehicles make interaction through eye contact impossible, we describe a framework for estimating the crossing intentions of pedestrians in order to reduce the uncertainty that the lack of eye contact between road users creates. The framework was deployed in a real vehicle and tested with three experimental cases that showed a variety of communication messages to pedestrians in a shared space scenario. Results from the performed field tests showed the feasibility of the presented approach.

Domain adaptive object re-ID aims to transfer the learned knowledge from the labeled source domain to the unlabeled target domain to tackle the open-class re-identification problems. Although state-of-the-art pseudo-label-based methods have achieved great success, they did not make full use of all valuable information because of the domain gap and unsatisfying clustering performance. To solve these problems, we propose a novel self-paced contrastive learning framework with hybrid memory. The hybrid memory dynamically generates source-domain class-level, target-domain cluster-level and un-clustered instance-level supervisory signals for learning feature representations. Different from the conventional contrastive learning strategy, the proposed framework jointly distinguishes source-domain classes, and target-domain clusters and un-clustered instances. Most importantly, the proposed self-paced method gradually creates more reliable clusters to refine the hybrid memory and learning targets, and is shown to be the key to our outstanding performance. Our method outperforms state-of-the-arts on multiple domain adaptation tasks of object re-ID and even boosts the performance on the source domain without any extra annotations. Our generalized version on unsupervised person re-ID surpasses state-of-the-art algorithms by considerable 16.2% and 14.6% on Market-1501 and DukeMTMC-reID benchmarks. Code is available at https://github.com/yxgeee/SpCL.

In this paper we formalize and prove the soundness of Tarsis, a new abstract domain based on the abstract interpretation theory that approximates string values through finite state automata. The main novelty of Tarsis is that it works over an alphabet of strings instead of single characters. On the one hand, such approach requires a more complex and refined definition of the widening operator, and the abstract semantics of string operators. On the other hand, it is in position to obtain strictly more precise results than than state-of-the-art approaches. We implemented a prototype of Tarsis, and we applied it on some case studies taken from some of the most popular Java libraries manipulating string values. The experimental results confirm that Tarsis is in position to obtain strictly more precise results than existing analyses.

Tabu Search (TS) metaheuristic improves simple local search algorithms (e.g. steepest ascend hill-climbing) by enabling the algorithm to escape local optima points. It has shown to be useful for addressing several combinatorial optimization problems. This paper investigates the performance of TS and considers the effects of the size of the Tabu list and the size of the neighbourhood for a procedural content generation, specifically the generation of maps for a popular tabletop game called Terra Mystica. The results validate the feasibility of the proposed method and how it can be used to generate maps that improve existing maps for the game.

Deep Learning has become one of the primary research areas in developing intelligent machines. Most of the well-known applications (such as Speech Recognition, Image Processing and NLP) of AI are driven by Deep Learning. Deep Learning algorithms mimic human brain using artificial neural networks and progressively learn to accurately solve a given problem. But there are significant challenges in Deep Learning systems. There have been many attempts to make deep learning models imitate the biological neural network. However, many deep learning models have performed poorly in the presence of adversarial examples. Poor performance in adversarial examples leads to adversarial attacks and in turn leads to safety and security in most of the applications. In this paper we make an attempt to characterize the solution space of a deep neural network in terms of three different subsets viz. weights belonging to exact trained patterns, weights belonging to generalized pattern set and weights belonging to adversarial pattern sets. We attempt to characterize the solution space with two seemingly different learning paradigms viz. the Deep Neural Networks and the Dense Associative Memory Model, which try to achieve learning via quite different mechanisms. We also show that adversarial attacks are generally less successful against Associative Memory Models than Deep Neural Networks.

Narrowband Internet-of-Things (NB-IoT) is one of the major access technologies proposed to support massive machine type communications (mMTC) services for the 5th generation (5G) mobile networks. Many emerging services and networking paradigms are expected to be developed on top of NB-IoT networks. This paper summarizes the steps required to build up an open source narrowband Internet-of-Things (NB-IoT) network. This work is a joint research and development (R&D) result from industry and academic collaboration. The open source NB-IoT enhanced Node B (eNB) is jointly developed by B-COM and NTUST based on the well-known OpenAirInterfaceTM (OAI) open source Long-Term Evolution (LTE) eNB developed by EURECOM. The NB-IoT eNB is successfully connected to an evolved packet core (EPC) developed by Nokia Bell Lab. We demonstrate how to use commercial off-the-shelf (COTS) NB-IoT module to forward its sensing data to the Internet via the open source NB-IoT network.

We study numerically the one-dimensional Allen-Cahn equation with the spectral fractional Laplacian $(-\Delta)^{\alpha/2}$ on intervals with homogeneous Neumann boundary conditions. In particular, we are interested in the speed of sharp interfaces approaching and annihilating each other. This process is known to be exponentially slow in the case of the classical Laplacian. Here we investigate how the width and speed of the interfaces change if we vary the exponent $\alpha$ of the fractional Laplacian. For the associated model on the real-line we derive asymptotic formulas for the interface speed and time-to-collision in terms of $\alpha$ and a scaling parameter $\varepsilon$. We use a numerical approach via a finite-element method based upon extending the fractional Laplacian to a cylinder in the upper-half plane, and compute the interface speed, time-to-collapse and interface width for $\alpha\in(0.2,2]$. A comparison shows that the asymptotic formulas for the interface speed and time-to-collision give a good approximation for large intervals.

In this paper, we propose a maximum mutual information (MMI) framework for multi-agent reinforcement learning (MARL) to enable multiple agents to learn coordinated behaviors by regularizing the accumulated return with the mutual information between actions. By introducing a latent variable to induce nonzero mutual information between actions and applying a variational bound, we derive a tractable lower bound on the considered MMI-regularized objective function. Applying policy iteration to maximize the derived lower bound, we propose a practical algorithm named variational maximum mutual information multi-agent actor-critic (VM3-AC), which follows centralized learning with decentralized execution (CTDE). We evaluated VM3-AC for several games requiring coordination, and numerical results show that VM3-AC outperforms MADDPG and other MARL algorithms in multi-agent tasks requiring coordination.

Deep learning requires regularization mechanisms to reduce overfitting and improve generalization. We address this problem by a new regularization method based on distributional robust optimization. The key idea is to modify the contribution from each sample for tightening the empirical risk bound. During the stochastic training, the selection of samples is done according to their accuracy in such a way that the worst performed samples are the ones that contribute the most in the optimization. We study different scenarios and show the ones where it can make the convergence faster or increase the accuracy.

Recent advances in blockchain have led to a significant interest in developing blockchain-based applications. While data can be retained in blockchains, the stored values can be deleted or updated. From a user viewpoint that searches for the data, it is unclear whether the discovered data from the blockchain storage is relevant for real-time decision-making process for blockchain-based application. The data freshness issue serves as a critical factor especially in dynamic networks handling real-time information. In general, transactions to renew the data require additional processing time inside the blockchain network, which is called ledger-commitment latency. Due to this problem, some users may receive outdated data. As a result, it is important to investigate if blockchain is suitable for providing real-time data services. In this article, we first describe blockchain-enabled (BCE) networks with Hyperledger Fabric (HLF). Then, we define age of information (AoI) of BCE networks and investigate the influential factors in this AoI. Analysis and experiments are conducted to support our proposed framework. Lastly, we conclude by discussing some future challenges.

The 2019 Multi-Agent Programming Contest introduced a new scenario, Agents Assemble, where two teams of agents move around a 2D grid and compete to assemble complex block structures. In this paper, we describe the strategies used by our team that led us to achieve first place in the contest. Our strategies tackle some of the major challenges in the 2019 contest: how to explore and build a map when agents only have access to local vision and no global coordinates; how to move around the map efficiently even though there are dynamic events that can change the cells in the grid; and how to assemble and submit complex block structures given that the opposing team may try to sabotage us. To implement our strategies, we use the multi-agent systems development platform JaCaMo to program our agents and the Fast Downward planner to plan the movement of the agent in the grid. We also provide a brief discussion of our matches in the contest and give our analysis of how our team performed in each match.

With the growing technological advances in autonomous driving, the transport industry and research community seek to determine the impact that autonomous vehicles (AV) will have on consumers, as well as identify the different factors that will influence their use. Most of the research performed so far relies on laboratory-controlled conditions using driving simulators, as they offer a safe environment for testing advanced driving assistance systems (ADAS). In this study we analyze the behavior of drivers that are placed in control of an automated vehicle in a real life driving environment. The vehicle is equipped with advanced autonomy, making driver control of the vehicle unnecessary in many scenarios, although a driver take over is possible and sometimes required. In doing so, we aim to determine the impact of such a system on the driver and their driving performance. To this end road users' behavior from naturalistic driving data is analyzed focusing on awareness and diagnosis of the road situation. Results showed that the road features determined the level of visual attention and trust in the automation. They also showed that the activities performed during the automation affected the reaction time to take over the control of the vehicle.

The Multi-Agent Programming Contest, MAPC, is an annual event organized since 2005 out of Clausthal University of Technology. Its aim is to investigate the potential of using decentralized, autonomously acting intelligent agents, by providing a complex scenario to be solved in a competitive environment. For this we need suitable benchmarks where agent-based systems can shine. We present previous editions of the contest and also its current scenario and results from its use in the 2019 MAPC with a special focus on its suitability. We conclude with lessons learned over the years.

In this paper, we investigate a general class of stochastic gradient descent (SGD) algorithms, called conditioned SGD, based on a preconditioning of the gradient direction. Under some mild assumptions, namely the $L$-smoothness of the objective function and some weak growth condition on the noise, we establish the almost sure convergence and the asymptotic normality for a broad class of conditioning matrices. In particular, when the conditioning matrix is an estimate of the inverse Hessian at the optimal point, the algorithm is proved to be asymptotically optimal. The benefits of this approach are validated on simulated and real datasets.

This comment presents the results of using chance-constrained model predictive control (MPC) to solve a one-horizon benchmark collision avoidance problem.

Dynamic real-time optimization (DRTO) is a challenging task due to the fact that optimal operating conditions must be computed in real time. The main bottleneck in the industrial application of DRTO is the presence of uncertainty. Many stochastic systems present the following obstacles: 1) plant-model mismatch, 2) process disturbances, 3) risks in violation of process constraints. To accommodate these difficulties, we present a constrained reinforcement learning (RL) based approach. RL naturally handles the process uncertainty by computing an optimal feedback policy. However, no state constraints can be introduced intuitively. To address this problem, we present a chance-constrained RL methodology. We use chance constraints to guarantee the probabilistic satisfaction of process constraints, which is accomplished by introducing backoffs, such that the optimal policy and backoffs are computed simultaneously. Backoffs are adjusted using the empirical cumulative distribution function to guarantee the satisfaction of a joint chance constraint. The advantage and performance of this strategy are illustrated through a stochastic dynamic bioprocess optimization problem, to produce sustainable high-value bioproducts.

This paper offers a review of numerical methods for computation of the eigenvalues of Hermitian matrices and the singular values of general and some classes of structured matrices. The focus is on the main principles behind the methods that guarantee high accuracy even in the cases that are ill-conditioned for the conventional methods. First, it is shown that a particular structure of the errors in a finite precision implementation of an algorithm allows for a much better measure of sensitivity and that computation with high accuracy is possible despite a large classical condition number. Such structured errors incurred by finite precision computation are in some algorithms e.g. entry-wise or column-wise small, which is much better than the usually considered errors that are in general small only when measured in the Frobenius matrix norm. Specially tailored perturbation theory for such structured perturbations of Hermitian matrices guarantees much better bounds for the relative errors in the computed eigenvalues. % Secondly, we review an unconventional approach to accurate computation of the singular values and eigenvalues of some notoriously ill-conditioned structured matrices, such as e.g. Cauchy, Vandermonde and Hankel matrices. The distinctive feature of accurate algorithms is using the intrinsic parameters that define such matrices to obtain a non-orthogonal factorization, such as the \textsf{LDU} factorization, and then computing the singular values of the product of thus computed factors. The state of the art software is discussed as well.

Since Android has become a popular software platform for mobile devices recently; they offer almost the same functionality as personal computers. Malwares have also become a big concern. As the number of new Android applications tends to be rapidly increased in the near future, there is a need for automatic malware detection quickly and efficiently. In this paper, we define a simple static analysis approach to first extract the features of the android application based on intents and categories the application into a known major category and later on mapping it with the permissions requested by the application and also comparing it with the most obvious intents of category. As a result, getting to know which apps are using features which they are not supposed to use or they do not need.

It has been clearly identified that I/O is one of the bottleneck to extend application for the exascale era. New concepts such as 'in transit' and 'in situ' visualization and analysis have been identified as key technologies to circumvent this particular issue. A new parallel I/O and data management library called Hercule, developed at CEA-DAM, has been integrated to Ramses, an AMR simulation code for self-gravitating fluids. Splitting the original Ramses output format in Hercule database formats dedicated to either checkpoints/restarts (HProt format) or post-processing (HDep format) not only improved I/O performance and scalability of the Ramses code but also introduced much more flexibility in the simulation outputs to help astrophysicists prepare their DMP (Data Management Plan). Furthermore, the very lightweight and purpose-specific post-processing format (HDep) will significantly improve the overall performance of analysis and visualization tools such as PyMSES 5. An introduction to the Hercule parallel I/O library as well as I/O benchmark results will be discussed.

The authors design and demonstrate a process for carrying out design science (DS) research in information systems and demonstrate use of the process to conduct research in two case studies. Several IS researchers have pioneered the acceptance of DS research in IS, but in the last 15 years little DS research has been done within the discipline. The lack of a generally accepted process for DS research in IS may have contributed to this problem. We sought to design a design science research process (DSRP) model that would meet three objectives: it would be consistent with prior literature, it would provide a nominal process model for doing DS research, and it would provide a mental model for presenting and appreciating DS research in IS. The process includes six steps: problem identification and motivation, objectives for a solution, design and development, evaluation, and communication. We demonstrated the process by using it in this study and by presenting two case studies, one in IS planning to develop application ideas for mobile financial services and another in requirements engineering to specify feature requirements for a self service advertising design and sales system intended for wide audience end users. The process effectively satisfies the three objectives and has the potential to help aid the acceptance of DS research in the IS discipline.

We study graph-based Laplacian semi-supervised learning at low labeling rates. Laplacian learning uses harmonic extension on a graph to propagate labels. At very low label rates, Laplacian learning becomes degenerate and the solution is roughly constant with spikes at each labeled data point. Previous work has shown that this degeneracy occurs when the number of labeled data points is finite while the number of unlabeled data points tends to infinity. In this work we allow the number of labeled data points to grow to infinity with the number of labels. Our results show that for a random geometric graph with length scale $\varepsilon>0$ and labeling rate $\beta>0$, if $\beta \ll\varepsilon^2$ then the solution becomes degenerate and spikes form, and if $\beta\gg \varepsilon^2$ then Laplacian learning is well-posed and consistent with a continuum Laplace equation. Furthermore, in the well-posed setting we prove quantitative error estimates of $O(\varepsilon\beta^{-1/2})$ for the difference between the solutions of the discrete problem and continuum PDE, up to logarithmic factors. We also study $p$-Laplacian regularization and show the same degeneracy result when $\beta \ll \varepsilon^p$. The proofs of our well-posedness results use the random walk interpretation of Laplacian learning and PDE arguments, while the proofs of the ill-posedness results use $\Gamma$-convergence tools from the calculus of variations. We also present numerical results on synthetic and real data to illustrate our results.

We propose a generative framework based on generative adversarial network (GAN) to enhance facial attractiveness while preserving facial identity and high-fidelity. Given a portrait image as input, having applied gradient descent to recover a latent vector that this generative framework can use to synthesize an image resemble to the input image, beauty semantic editing manipulation on the corresponding recovered latent vector based on InterFaceGAN enables this framework to achieve facial image beautification. This paper compared our system with Beholder-GAN and our proposed result-enhanced version of Beholder-GAN. It turns out that our framework obtained state-of-art attractiveness enhancement results. The code is available at https://github.com/zoezhou1999/BeautifyBasedOnGAN.

Intelligent Conversational Agent development using Artificial Intelligence or Machine Learning technique is an interesting problem in the field of Natural Language Processing. With the rise of deep learning, these models were quickly replaced by end to end trainable neural networks.

Pruning neural networks has regained interest in recent years as a means to compress state-of-the-art deep neural networks and enable their deployment on resource-constrained devices. In this paper, we propose a robust compressive learning framework that efficiently prunes network parameters during training with minimal computational overhead. We incorporate fast mechanisms to prune individual layers and build upon these to automatically prune the entire network under a user-defined budget constraint. Key to our end-to-end network pruning approach is the formulation of an intuitive and easy-to-implement adaptive sparsity loss that is used to explicitly control sparsity during training, enabling efficient budget-aware optimization. Extensive experiments demonstrate the effectiveness of the proposed framework for image classification on the CIFAR and ImageNet datasets using different architectures, including AlexNet, ResNets and Wide ResNets.

Secondary analysis or the reuse of existing data is a common practice among social scientists. The complexity of datasets, however, exceeds those known from traditional document retrieval. Dataset retrieval, especially in the social sciences, incorporates additional material such as codebooks, questionnaires, raw data files and more. Due to the diverse nature of datasets, document retrieval models often do not work as efficiently for retrieving datasets. One way of enhancing these types of searches is to incorporate the users' interaction context in order to personalise dataset retrieval sessions. As a first step towards this long term goal, we study characteristics of dataset retrieval sessions from a real-life Digital Library for the social sciences that incorporates both: research data and publications. Previous studies reported a way of discerning queries between document of dataset search by query length. In this paper, we argue the claim and report our findings of indistinguishability of queries, whether aiming for dataset or a document. Amongst others, we report our findings of dataset retrieval sessions with respect to query characteristics, interaction sequences and topical drift within 65,000 unique sessions.

Formalisms based on quantum theory have been used in Cognitive Science for decades due to their descriptive features. A quantum-like (QL) approach provides descriptive features such as state superposition and probabilistic interference behavior. Moreover, quantum systems dynamics have been found isomorphic to cognitive or biological systems dynamics. The objective of this paper is to study the feasibility of a QL perception model for a robot with limited sensing capabilities. We introduce a case study, we highlight its limitations, and we investigate and analyze actual robot behaviors through simulations, while actual implementations based on quantum devices encounter errors for unbalanced situations. In order to investigate QL models for robot behavior, and to study the advantages leveraged by QL approaches for robot knowledge representation and processing, we argue that it is preferable to proceed with simulation-oriented techniques rather than actual realizations on quantum backends.

We investigate the impact of more realistic room simulation for training far-field keyword spotting systems without fine-tuning on in-domain data. To this end, we study the impact of incorporating the following factors in the room impulse response (RIR) generation: air absorption, surface- and frequency-dependent coefficients of real materials, and stochastic ray tracing. Through an ablation study, a wake word task is used to measure the impact of these factors in comparison with a ground-truth set of measured RIRs. On a hold-out set of re-recordings under clean and noisy far-field conditions, we demonstrate up to $35.8\%$ relative improvement over the commonly-used (single absorption coefficient) image source method. Source code is made available in the Pyroomacoustics package, allowing others to incorporate these techniques in their work.

In this paper, a novel and efficient hardware implementation of steganographic cryptosystem based on a public-key cryptography is proposed. Digital images are utilized as carriers of secret data between sender and receiver parties in the communication channel. The proposed public-key cryptosystem offers a separable framework that allows to embed or extract secret data and encrypt or decrypt the carrier using the public-private key pair, independently. Paillier cryptographic system is adopted to encrypt and decrypt pixels of the digital image. To achieve efficiency, a proposed efficient parallel montgomery exponentiation core is designed and implemented for performing the underlying field operations in the Paillier cryptosystem. The hardware implementation results of the proposed steganographic cryptosystem show an efficiency in terms of area (resources), performance (speed) and power consumption. Our steganographic cryptosystem represents a small footprint making it well-suited for the embedded systems and real-time processing engines in applications such as medical scanning devices, autopilot cars and drones.

Increasing availability and quality of actual, as opposed to scheduled, open transport data offers new possibilities for capturing the spatiotemporal dynamics of the railway and other networks of social infrastructure. One way to describe such complex phenomena is in terms of stochastic processes. At its core, a stochastic model is domain-agnostic and algorithms discussed here have been successfully used in other applications, including Google's PageRank citation ranking. Our key assumption is that train routes constitute meaningful sequences analogous to sentences of literary text. A corpus of routes is thus susceptible to the same analytic tool-set as a corpus of sentences. With our experiment in Switzerland, we introduce a method for building Markov Chains from aggregated daily streams of railway traffic data. The stationary distributions under normal and perturbed conditions are used to define systemic risk measures with non-evident, valuable information for operation, and planning.

A new model description and type classification carried out on its base of a wide variety of practical hysteresis loops are suggested. An analysis of the loop approximating function was carried out; the parameters and characteristics of the model were defined - coersitivity, remanent polarization, value of hysteresis, spontaneous polarization, induced piezocoefficients, value of saturation, hysteresis losses of energy per cycle. It was shown that with piezomanipulators of certain hysteresis loop types, there is no difference in heat production. The harmonic linearization coefficients were calculated, and the harmonically linearized transfer function of a nonlinear hysteresis element was deduced. The hysteresis loop type was defined that possesses minimum phase shift. The average relative approximation error of the model has been evaluated as 1.5%-6% for real hysteresis loops. A procedure for definition of the model parameters by experimental data is introduced. Examples of using the results in a scan unit of a scanning tunneling microscope for compensation of raster distortion are given.

From 2017 to 2019 the Text REtrieval Conference (TREC) held a challenge task on precision medicine using documents from medical publications (PubMed) and clinical trials. Despite lots of performance measurements carried out in these evaluation campaigns, the scientific community is still pretty unsure about the impact individual system features and their weights have on the overall system performance. In order to overcome this explanatory gap, we first determined optimal feature configurations using the Sequential Model-based Algorithm Configuration (SMAC) program and applied its output to a BM25-based search engine. We then ran an ablation study to systematically assess the individual contributions of relevant system features: BM25 parameters, query type and weighting schema, query expansion, stop word filtering, and keyword boosting. For evaluation, we employed the gold standard data from the three TREC-PM installments to evaluate the effectiveness of different features using the commonly shared infNDCG metric.

Most approaches to multi-talker overlapped speech separation and recognition assume that the number of simultaneously active speakers is given, but in realistic situations, it is typically unknown. To cope with this, we extend an iterative speech extraction system with mechanisms to count the number of sources and combine it with a single-talker speech recognizer to form the first end-to-end multi-talker automatic speech recognition system for an unknown number of active speakers. Our experiments show very promising performance in counting accuracy, source separation and speech recognition on simulated clean mixtures from WSJ0-2mix and WSJ0-3mix. Among others, we set a new state-of-the-art word error rate on the WSJ0-2mix database. Furthermore, our system generalizes well to a larger number of speakers than it ever saw during training, as shown in experiments with the WSJ0-4mix database.

Persistence diagrams, a key tool in the field of Topological Data Analysis, concisely represent the topology of a point cloud. Most current methods to integrate persistence diagrams into machine learning either require prior knowledge of a ground-truth topology or map them into a feature vector, offering a trade-off between information loss and invoking the curse of dimensionality. In this paper we give an algorithm for Fuzzy c-Means (FCM) clustering directly on the space of persistence diagrams, enabling unsupervised learning that automatically captures the topological structure of data, with no prior knowledge or additional processing of persistence diagrams. We prove the same convergence guarantees as traditional FCM clustering: that any convergent subsequence of our algorithm tends to a local minimum or saddle point of the cost function. We end by presenting experiments that demonstrate our algorithm can successfully cluster transformed datasets from materials science where comparable Wasserstein barycentre clustering algorithms fail, whilst also running at least an order of magnitude faster.

In past few years, linear rectified unit activation functions have shown its significance in the neural networks, surpassing the performance of sigmoid activations. RELU (Nair & Hinton, 2010), ELU (Clevert et al., 2015), PRELU (He et al., 2015), LRELU (Maas et al., 2013), SRELU (Jin et al., 2016), ThresholdedRELU, all these linear rectified activation functions have its own significance over others in some aspect. Most of the time these activation functions suffer from bias shift problem due to non-zero output mean, and high weight update problem in deep complex networks due to unit gradient, which results in slower training, and high variance in model prediction respectively. In this paper, we propose, "Thresholded exponential rectified linear unit" (TERELU) activation function that works better in alleviating in overfitting: large weight update problem. Along with alleviating overfitting problem, this method also gives good amount of non-linearity as compared to other linear rectifiers. We will show better performance on the various datasets using neural networks, considering TERELU activation method compared to other activations.

Understanding the 3D geometric structure of the Earth's surface has been an active research topic in photogrammetry and remote sensing community for decades, serving as an essential building block for various applications such as 3D digital city modeling, change detection, and city management. Previous researches have extensively studied the problem of height estimation from aerial images based on stereo or multi-view image matching. These methods require two or more images from different perspectives to reconstruct 3D coordinates with camera information provided. In this paper, we deal with the ambiguous and unsolved problem of height estimation from a single aerial image. Driven by the great success of deep learning, especially deep convolution neural networks (CNNs), some researches have proposed to estimate height information from a single aerial image by training a deep CNN model with large-scale annotated datasets. These methods treat height estimation as a regression problem and directly use an encoder-decoder network to regress the height values. In this paper, we proposed to divide height values into spacing-increasing intervals and transform the regression problem into an ordinal regression problem, using an ordinal loss for network training. To enable multi-scale feature extraction, we further incorporate an Atrous Spatial Pyramid Pooling (ASPP) module to extract features from multiple dilated convolution layers. After that, a post-processing technique is designed to transform the predicted height map of each patch into a seamless height map. Finally, we conduct extensive experiments on ISPRS Vaihingen and Potsdam datasets. Experimental results demonstrate significantly better performance of our method compared to the state-of-the-art methods.

Human infants have the remarkable ability to learn the associations between object names and visual objects from inherently ambiguous experiences. Researchers in cognitive science and developmental psychology have built formal models that implement in-principle learning algorithms, and then used pre-selected and pre-cleaned datasets to test the abilities of the models to find statistical regularities in the input data. In contrast to previous modeling approaches, the present study used egocentric video and gaze data collected from infant learners during natural toy play with their parents. This allowed us to capture the learning environment from the perspective of the learner's own point of view. We then used a Convolutional Neural Network (CNN) model to process sensory data from the infant's point of view and learn name-object associations from scratch. As the first model that takes raw egocentric video to simulate infant word learning, the present study provides a proof of principle that the problem of early word learning can be solved, using actual visual data perceived by infant learners. Moreover, we conducted simulation experiments to systematically determine how visual, perceptual, and attentional properties of infants' sensory experiences may affect word learning.

Convolutional Neural Networks (CNNs) have been widely used in many fields. However, the training process costs much energy and time, in which the convolution operations consume the major part. In this paper, we propose a fixed-point training framework, in order to reduce the data bit-width for the convolution multiplications. Firstly, we propose two constrained group-wise scaling methods that can be implemented with low hardware cost. Secondly, to overcome the challenge of trading off overflow and rounding error, a shiftable fixed-point data format is used in this framework. Finally, we propose a double-width deployment technique to boost inference performance with the same bit-width hardware multiplier. The experimental results show that the input data of convolution in the training process can be quantized to 2-bit for CIFAR-10 dataset, 6-bit for ImageNet dataset, with negligible accuracy degradation. Furthermore, our fixed-point train-ing framework has the potential to save at least 75% energy of the computation in the training process.

This investigation reports on the results of convolutional neural networks developed for the recently introduced PathologicAL Myopia (PALM) dataset, which consists of 1200 fundus images. We propose a new Optic Nerve Head (ONH)-based prediction enhancement for the segmentation of atrophy and fovea. Models trained with 400 available training images achieved an AUC of 0.9867 for pathological myopia classification, and a Euclidean distance of 58.27 pixels on the fovea localization task, evaluated on a test set of 400 images. Dice and F1 metrics for semantic segmentation of lesions scored 0.9303 and 0.9869 on optic disc, 0.8001 and 0.9135 on retinal atrophy, and 0.8073 and 0.7059 on retinal detachment, respectively. Our work was acknowledged with an award in the context of the "PathologicAL Myopia detection from retinal images" challenge held during the IEEE International Symposium on Biomedical Imaging (April 2019). Considering that (pathological) myopia cases are often identified as false positives and negatives in classification systems for glaucoma, we envision that the current work could aid in future research to discriminate between glaucomatous and highly-myopic eyes, complemented by the localization and segmentation of landmarks such as fovea, optic disc and atrophy.

More than half of the 7,000 languages in the world are in imminent danger of going extinct. Traditional methods of documenting language proceed by collecting audio data followed by manual annotation by trained linguists at different levels of granularity. This time consuming and painstaking process could benefit from machine learning. Many endangered languages do not have any orthographic form but usually have speakers that are bi-lingual and trained in a high resource language. It is relatively easy to obtain textual translations corresponding to speech. In this work, we provide a multimodal machine learning framework for speech representation learning by exploiting the correlations between the two modalities namely speech and its corresponding text translation. Here, we construct a convolutional neural network audio encoder capable of extracting linguistic representations from speech. The audio encoder is trained to perform a speech-translation retrieval task in a contrastive learning framework. By evaluating the learned representations on a phone recognition task, we demonstrate that linguistic representations emerge in the audio encoder's internal representations as a by-product of learning to perform the retrieval task.

The 2019 Multi-Agent Programming Contest (MAPC) scenario poses many challenges for agents participating in the contest. We discuss The Requirement Gatherers' (TRG) approach to handling the various challenges we faced -- including how we designed our system, how we went about debugging our agents, and the strategy we employed to each of our agents. We conclude the paper with remarks about the performance of our agents, and what we should have done differently.

In this paper, we propose enhancing actor-critic reinforcement learning agents by parameterising the final actor layer which produces the actions in order to accommodate the behaviour discrepancy of different actuators, under different load conditions during interaction with the environment. We propose branching the action producing layer in the actor to learn the tuning parameter controlling the activation layer (e.g. Tanh and Sigmoid). The learned parameters are then used to create tailored activation functions for each actuator. We ran experiments on three OpenAI Gym environments, i.e. Pendulum-v0, LunarLanderContinuous-v2 and BipedalWalker-v2. Results have shown an average of 23.15% and 33.80% increase in total episode reward of the LunarLanderContinuous-v2 and BipedalWalker-v2 environments, respectively. There was no significant improvement in Pendulum-v0 environment but the proposed method produces a more stable actuation signal compared to the state-of-the-art method. The proposed method allows the reinforcement learning actor to produce more robust actions that accommodate the discrepancy in the actuators' response functions. This is particularly useful for real life scenarios where actuators exhibit different response functions depending on the load and the interaction with the environment. This also simplifies the transfer learning problem by fine tuning the parameterised activation layers instead of retraining the entire policy every time an actuator is replaced. Finally, the proposed method would allow better accommodation to biological actuators (e.g. muscles) in biomechanical systems.

The study presents a neural network, which uses filters based on logistic mapping (LogNNet). LogNNet has a feedforward network structure, but possesses the properties of reservoir neural networks. The input weight matrix, set by a recurrent logistic mapping, forms the kernels that transform the input space to the higher-dimensional feature space. The most effective MNIST handwritten digit recognition occurs under chaotic behavior of logistic map. An advantage of LogNNet implementation on IoT Devices is the significant savings in used memory (more than 10 times) compared to other neural networks. The presented network architecture uses an array of weights with a total memory size from 1 kB to 29 kB and achieves a classification accuracy of 80.3-96.3 %.

Communication is crucial when disasters isolate communities of people and rescue is delayed. Such delays force citizens to be first responders and form small rescue teams. Rescue teams require reliable communication, particularly in the first 72 hours, which is challenging due to damaged infrastructure and electrical blackouts. We design a peer-to-peer communication network that meets these challenges. We introduce the concept of participatory fairness: equal communication opportunities for all citizens regardless of initial inequality in phone battery charge. Our value-sensitive design approach achieves an even battery charge distribution across phones over time and enables citizens to communicate over 72 hours. We apply the fairness principle to communication in an adapted standard Barabasi-Albert model of a scale-free network that automatically (i) assigns high-battery phones as hubs, (ii) adapts the network topology to the spatio-temporal battery charge distribution, and (iii) self-organizes to remain robust and reliable when links fail or phones leave the network. While the Barabasi-Albert model has become a widespread descriptive model, we demonstrate its use as a design principle to meet values such as fairness and systemic efficiency. Our results demonstrate that, compared to a generic peer-to-peer mesh network, the new protocol achieves (i) a longer network lifetime, (ii) an adaptive information flow, (iii) a fair distribution of battery charge, and (iv) higher participation rates. Hence, our protocol, Self-Organization for Survival ('SOS'), provides fair communication opportunities to all citizens during a disaster through self-organization. SOS enables participatory resilience and sustainability, empowering citizens to communicate when they need it most.

Event cameras are bio-inspired sensors capable of providing a continuous stream of events with low latency and high dynamic range. As a single event only carries limited information about the brightness change at a particular pixel, events are commonly accumulated into spatio-temporal windows for further processing. However, the optimal window length varies depending on the scene, camera motion, the task being performed, and other factors. In this research, we develop a novel ensemble-based scheme for combining spatio-temporal windows of varying lengths that are processed in parallel. For applications where the increased computational requirements of this approach are not practical, we also introduce a new "approximate" ensemble scheme that achieves significant computational efficiencies without unduly compromising the original performance gains provided by the ensemble approach. We demonstrate our ensemble scheme on the visual place recognition (VPR) task, introducing a new Brisbane-Event-VPR dataset with annotated recordings captured using a DAVIS346 color event camera. We show that our proposed ensemble scheme significantly outperforms all the single-window baselines and conventional model-based ensembles, irrespective of the image reconstruction and feature extraction methods used in the VPR pipeline, and evaluate which ensemble combination technique performs best. These results demonstrate the significant benefits of ensemble schemes for event camera processing in the VPR domain and may have relevance to other related processes, including feature tracking, visual-inertial odometry, and steering prediction in driving.

A Hybrid cloud is an integration of resources between private and public clouds. It enables users to horizontally scale their on-premises infrastructure up to public clouds in order to improve performance and cut up-front investment cost. This model of applications deployment is called cloud bursting that allows data-intensive applications especially distributed database systems to have the benefit of both private and public clouds. In this work, we present an automated implementation of a hybrid cloud using (i) a robust and zero-cost Linux-based VPN to make a secure connection between private and public clouds, and (ii) Terraform as a software tool to deploy infrastructure resources based on the requirements of hybrid cloud. We also explore performance evaluation of cloud bursting for six modern and distributed database systems on the hybrid cloud spanning over local OpenStack and Microsoft Azure. Our results reveal that MongoDB and MySQL Cluster work efficient in terms of throughput and operations latency if they burst into a public cloud to supply their resources. In contrast, the performance of Cassandra, Riak, Redis, and Couchdb reduces if they significantly leverage their required resources via cloud bursting.

State-of-the-art spoof detection methods tend to overfit to the spoof types seen during training and fail to generalize to unknown spoof types. Given that face anti-spoofing is inherently a local task, we propose a face anti-spoofing framework, namely Self-Supervised Regional Fully Convolutional Network (SSR-FCN), that is trained to learn local discriminative cues from a face image in a self-supervised manner. The proposed framework improves generalizability while maintaining the computational efficiency of holistic face anti-spoofing approaches (< 4 ms on a Nvidia GTX 1080Ti GPU). The proposed method is interpretable since it localizes which parts of the face are labeled as spoofs. Experimental results show that SSR-FCN can achieve TDR = 65% @ 2.0% FDR when evaluated on a dataset comprising of 13 different spoof types under unknown attacks while achieving competitive performances under standard benchmark datasets (Oulu-NPU, CASIA-MFSD, and Replay-Attack).

In this work we present a mapping from a fragment of the quantum programming language Quipper, called Quip-E, to the semantics of the QPMC model checker, aiming at the automatic verification of quantum programs. As a main outcome, we define a structural operational semantics for the Quip-E language corresponding to quantum Markov chains, and we use it as a basis for analysing quantum programs through the QPMC model checker. The properties of the semantics are proved and contextualised in the development of a tool translating from quantum programs to quantum Markov chains.

In this article, the shape optimization of a linear elastic body subject to frictional (Tresca) contact is investigated. Due to the projection operators involved in the formulation of the contact problem, the solution is not shape differentiable in general. Moreover, shape optimization of the contact zone requires the computation of the gap between the bodies in contact, as well as its shape derivative. Working with directional derivatives, sufficient conditions for shape differentiability are derived. %The problem is addressed in the general framework of two bodies with smooth boundaries. Then, some numerical results, obtained with a gradient descent algorithm based on those shape derivatives, are presented.

Analogy-making is at the core of human intelligence and creativity with applications to such diverse tasks as commonsense reasoning, learning, language acquisition, and story telling. This paper contributes to the foundations of artificial general intelligence by introducing an abstract algebraic framework of analogical proportions of the form $a$ is to $b$ what $c$ is to $d$' in the general setting of universal algebra. This enables us to compare mathematical objects possibly across different domains in a uniform way which is crucial for artificial general intelligence. The main idea is to define solutions to analogical proportions in terms of generalizations and to derive abstract terms of concrete elements from a known' source domain which can then be instantiated in an unknown' target domain to obtain analogous elements. We extensively compare our framework with two prominent and recently introduced frameworks of analogical proportions from the literature in the concrete domains of sets, numbers, and words and show that our framework is strictly more general in all of these cases which provides strong evidence for the applicability of our framework. In a broader sense, this paper is a first step towards an algebraic theory of analogical reasoning and learning systems with potential applications to fundamental AI-problems like commonsense reasoning and computational learning and creativity.

In 1988, Eric B. Baum showed that two-layers neural networks with threshold activation function can perfectly memorize the binary labels of $n$ points in general position in $\mathbb{R}^d$ using only $\ulcorner n/d \urcorner$ neurons. We observe that with ReLU networks, using four times as many neurons one can fit arbitrary real labels. Moreover, for approximate memorization up to error $\epsilon$, the neural tangent kernel can also memorize with only $O\left(\frac{n}{d} \cdot \log(1/\epsilon) \right)$ neurons (assuming that the data is well dispersed too). We show however that these constructions give rise to networks where the magnitude of the neurons' weights are far from optimal. In contrast we propose a new training procedure for ReLU networks, based on complex (as opposed to real) recombination of the neurons, for which we show approximate memorization with both $O\left(\frac{n}{d} \cdot \frac{\log(1/\epsilon)}{\epsilon}\right)$ neurons, as well as nearly-optimal size of the weights.

We consider three-dimensional (3D) localization and imaging of space debris from only one two-dimensional (2D) snapshot image. The technique involves an optical imager that exploits off-center image rotation to encode both the lateral and depth coordinates of point sources, with the latter being encoded in the angle of rotation of the PSF. We formulate 3D localization into a large-scale sparse 3D inverse problem in the discretized form. A recently developed penalty called continuous exact l0 (CEL0) is applied in this problem for the Gaussian noise model. Numerical experiments and comparisons illustrate the efficiency of the algorithm.

Mobile Augmented Reality (MAR) mixes physical environments with user-interactive virtual annotations. Immersive MAR experiences are supported by computation-intensive tasks which rely on offloading mechanisms to ease device workloads. However, this introduces additional network traffic which in turn influences the motion-to-photon latency (a determinant of user-perceived quality of experience). Therefore, a proper transport protocol is crucial to minimise transmission latency and ensure sufficient throughput to support MAR performance. Relatedly, 5G, a potential MAR supporting technology, is widely believed to be smarter, faster, and more efficient than its predecessors. However, the suitability and performance of existing transport protocols in MAR in the 5G context has not been explored. Therefore, we present an evaluation of popular transport protocols, including UDP, TCP, MPEG-TS, RTP, and QUIC, with a MAR system on a real-world 5G testbed. We also compare with their 5G performance with LTE and WiFi. Our evaluation results indicate that TCP has the lowest round-trip-time on 5G, with a median of $15.09\pm0.26$ ms, while QUIC appears to perform better on LTE. Through an additional test with varying signal quality (specifically, degrading secondary synchronisation signal reference signal received quality), we discover that protocol performance appears to be significantly impacted by signal quality.

This work is done as part of a research master's thesis project. The goal is to generate SPARQL queries based on user-supplied keywords to query RDF graphs. To do this, we first transformed the input ontology into an RDF graph that reflects the semantics represented in the ontology. Subsequently, we stored this RDF graph in the Neo4j graphical database to ensure efficient and persistent management of RDF data. At the time of the interrogation, we studied the different possible and desired interpretations of the request originally made by the user. We have also proposed to carry out a sort of transformation between the two query languages SPARQL and Cypher, which is specific to Neo4j. This allows us to implement the architecture of our system over a wide variety of BD-RDFs providing their query languages, without changing any of the other components of the system. Finally, we tested and evaluated our tool using different test bases, and it turned out that our tool is comprehensive, effective, and powerful enough.

We revisit the hypothesis testing problem against independence over a noisy channel and prove a strong converse theorem. In particular, under the Neyman-Pearson formulation, we derive a non-asymptotic upper bound on the type-II exponent of any encoding-decoding functions which can ensure that the type-I error probability is upper bounded by a constant. The strong converse theorem for the problem follows as a corollary as our result. Our proof is based on the recently proposed strong converse technique by Tyagi and Watanabe (TIT 2020) which is based on the change of measure technique. Our work is the first application of the strong converse technique by Tyagi and Watanabe to a hypothesis testing problem over a noisy channel and thus further demonstrates the generality of the technique.

Complex systems thinking is applied to a wide variety of domains, from neuroscience to computer science and economics. The wide variety of implementations has resulted in two key challenges: the progenation of many domain-specific strategies that are seldom revisited or questioned, and the siloing of ideas within a domain due to inconsistency of complex systems language. In this work we offer basic, domain-agnostic language in order to advance towards a more cohesive vocabulary. We use this language to evaluate each step of the complex systems analysis pipeline, beginning with the system and data collected, then moving through different mathematical formalisms for encoding the observed data (i.e. graphs, simplicial complexes, and hypergraphs), and relevant computational methods for each formalism. At each step we consider different types of \emph{dependencies}; these are properties of the system that describe how the existence of one relation among the parts of a system may influence the existence of another relation. We discuss how dependencies may arise and how they may alter interpretation of results or the entirety of the analysis pipeline. We close with two real-world examples using coauthorship data and email communications data that illustrate how the system under study, the dependencies therein, the research question, and choice of mathematical representation influence the results. We hope this work can serve as an opportunity of reflection for experienced complexity scientists, as well as an introductory resource for new researchers.

Available alternative routes on which traffic can be rerouted in the case of disruptions are vital for transportation networks. Line sections with less traffic under normal operational conditions but with increased importance in the case of disruptions are identified in the railway network of Hungary by using a weighted directed graph. To describe the goodness of the individual alternative routes the so-called redundancy index is used. The results show that the structure of the network is good, but the lines with the highest redundancy (lines No. 80, 2, 4 and 77 according to the numbering of the national railway operator, M\'AV) are mostly single tracked and in many cases the line speed is low. The building of additional tracks and electrifying these lines while still maintaining the existing diesel locomotives for the case of disruptions of the electric support are the keys to make the performance of the rather dense railway network of Hungary sustainable.

Improving neural machine translation (NMT) models using the back-translations of the monolingual target data (synthetic parallel data) is currently the state-of-the-art approach for training improved translation systems. The quality of the backward system - which is trained on the available parallel data and used for the back-translation - has been shown in many studies to affect the performance of the final NMT model. In low resource conditions, the available parallel data is usually not enough to train a backward model that can produce the qualitative synthetic data needed to train a standard translation model. This work proposes a self-training strategy where the output of the backward model is used to improve the model itself through the forward translation technique. The technique was shown to improve baseline low resource IWSLT'14 English-German and IWSLT'15 English-Vietnamese backward translation models by 11.06 and 1.5 BLEUs respectively. The synthetic data generated by the improved English-German backward model was used to train a forward model which out-performed another forward model trained using standard back-translation by 2.7 BLEU.

We present an approach to synthesizing new graph structures from empirically specified distributions. The generative model is an auto-decoder that learns to synthesize graphs from latent codes. The graph synthesis model is learned jointly with an empirical distribution over the latent codes. Graphs are synthesized using self-attention modules that are trained to identify likely connectivity patterns. Graph-based normalizing flows are used to sample latent codes from the distribution learned by the auto-decoder. The resulting model combines accuracy and scalability. On benchmark datasets of large graphs, the presented model outperforms the state of the art by a factor of 1.5 in mean accuracy and average rank across at least three different graph statistics, with a 2x speedup during inference.

This article presents a simple port-Hamiltonian formulation of the equations for an RLC electric circuit as a differential-algebraic equation system, and a proof that structural analysis always succeeds on it for a well-posed circuit, thus providing a correct regularisation for numerical solution. The DAE is small - its number of variables/equations is at most the number of edges in the circuit graph.

Inference systems are a widespread framework used to define possibly recursive predicates by means of inference rules. They allow both inductive and coinductive interpretations that are fairly well-studied. In this paper, we consider a middle way interpretation, called regular, which combines advantages of both approaches: it allows non-well-founded reasoning while being finite. We show that the natural proof-theoretic definition of the regular interpretation, based on regular trees, coincides with a rational fixed point. Then, we provide an equivalent inductive characterization, which leads to an algorithm to check whether a judgement is derivable in the regular interpretation. Relying on these results, we define proof techniques for regular reasoning: the regular coinduction principle, to prove completeness, and and an inductive technique to prove soundness, based on the inductive characterization of the regular interpretation. Finally, we show the regular approach can be smoothly extended to inference systems with corules, a recently introduced, generalised framework, which allows to refine the coinductive interpretation.

The main goal of 1-bit compressive sampling is to decode $n$ dimensional signals with sparsity level $s$ from $m$ binary measurements. This is a challenging task due to the presence of nonlinearity, noises and sign flips. In this paper, the cardinality constraint least square is proposed as a desired decoder. We prove that, up to a constant $c$, with high probability, the proposed decoder achieves a minimax estimation error as long as $m \geq \mathcal{O}( s\log n)$. Computationally, we utilize a generalized Newton algorithm (GNA) to solve the cardinality constraint minimization problem with the cost of solving a least squares problem with small size at each iteration. We prove that, with high probability, the $\ell_{\infty}$ norm of the estimation error between the output of GNA and the underlying target decays to $\mathcal{O}(\sqrt{\frac{\log n }{m}})$ after at most $\mathcal{O}(\log s)$ iterations. Moreover, the underlying support can be recovered with high probability in $\mathcal{O}(\log s)$ steps provided that the target signal is detectable. Extensive numerical simulations and comparisons with state-of-the-art methods are presented to illustrate the robustness of our proposed decoder and the efficiency of the GNA algorithm.

Proof-of-work (PoW) is one of the most common techniques to defend against Sybil attacks. Unfortunately, current PoW defenses have two main drawbacks. First, they require work to be done even in the absence of an attack. Second, during an attack, they require good identities (IDs) to spend as much as the attacker.

Recent theoretical work by Gupta, Saia, and Young suggests the possibility of overcoming these two drawbacks. In particular, they describe a new algorithm, GMCom, that always ensures that a minority of IDs are Sybil. They show that rate at which all good IDs perform computation is $O(J_G + \sqrt{T(J_G+1)})$, where $J_G$ is the join rate of good IDs, and $T$ is the rate at which the adversary performs computation.

Unfortunately, this cost bound only holds in the case where (1) GMCom always knows the join rate of good IDs; and (2) there is a fixed constant amount of time that separates join events by good IDs. Here, we present ToGCom, which removes these two shortcomings. To do so, we design and analyze a mechanism for estimating the join rate of good IDs; and also devise a new method for setting the computational cost to join the system. Additionally, we evaluate the performance of ToGCom alongside prior PoW-based defenses. Based on our experiments, we design heuristics that further improve the performance of ToGCom by up to $3$ orders of magnitude over these previous Sybil defenses.

Deep learning (DL) approaches are achieving extraordinary results in a wide range of domains but often require a massive collection of private data. Hence, methods for training neural networks on the joint data of different data owners, that keep each party's input confidential, are called for. We address the setting of horizontally distributed data in deep learning, where the participants' vulnerable intermediate results have to be processed in a privacy-preserving manner. The predominant scheme for this setting is based on homomorphic encryption (HE), and it is widely considered to be without alternative. In contrast to this, we demonstrate that a carefully chosen, less complex and computationally less expensive secure sum protocol in conjunction with default secure channels exhibits superior properties in terms of both collusion-resistance and runtime. Finally, we discuss several open research questions in the context of collaborative DL, which possibly might lead back to HE-based solutions.

Physical layer security is essential in optical networks. In this paper, we study a jamming-aware control plane, in which a high power jamming attack exists in the network. The studied control plane considers that the jammed connections can be detected and avoided. We used a physical layer model, in which we embedded the additional jamming power, to evaluate different security in scenarios, such as a jamming-free scenario, jamming with an unaware controller, and jamming with an aware controller. The performance is analyzed in terms of the blocking rate and slots utilization. We analyze the impact of jamming attacks in the least used link and in the most used link on the network. The results demonstrates that the jamming avoidance by the control plane can reach performance near the not jammed scenario.

How about converting Taylor series to a network to solve the black-box nature of Neural Networks? Controllable and readable polynomial neural network (Gang transform or CR-PNN) is the Taylor expansion in the form of network, which is about ten times more efficient than typical BPNN for forward-propagation. Additionally, we can control the approximation precision and explain the internal structure of the network; thus, it is used for prediction and system identification. However, as the network depth increases, the computational complexity increases. Here, we presented an accelerated method based on an expanded order to optimize CR-PNN. The running speed of the structure of CR-PNN II is significantly higher than CR-PNN I under preserving the properties of CR-PNN I.

In this paper we introduce a recurrent neural network (RNN) based variational autoencoder (VAE) model with a new constrained loss function that can generate more meaningful electroencephalography (EEG) features from raw EEG features to improve the performance of EEG based speech recognition systems. We demonstrate that both continuous and isolated speech recognition systems trained and tested using EEG features generated from raw EEG features using our VAE model results in improved performance and we demonstrate our results for a limited English vocabulary consisting of 30 unique sentences for continuous speech recognition and for an English vocabulary consisting of 2 unique sentences for isolated speech recognition. We compare our method with another recently introduced method described by authors in [1] to improve the performance of EEG based continuous speech recognition systems and we demonstrate that our method outperforms their method as vocabulary size increases when trained and tested using the same data set. Even though we demonstrate results only for automatic speech recognition (ASR) experiments in this paper, the proposed VAE model with constrained loss function can be extended to a variety of other EEG based brain computer interface (BCI) applications.

Deep learning has made major breakthroughs and progress in many fields. This is due to the powerful automatic representation capabilities of deep learning. It has been proved that the design of the network architecture is crucial to the feature representation of data and the final performance. In order to obtain a good feature representation of data, the researchers designed various complex network architectures. However, the design of the network architecture relies heavily on the researchers' prior knowledge and experience. Therefore, a natural idea is to reduce human intervention as much as possible and let the algorithm automatically design the architecture of the network. Thus going further to the strong intelligence.

In recent years, a large number of related algorithms for \textit{Neural Architecture Search} (NAS) have emerged. They have made various improvements to the NAS algorithm, and the related research work is complicated and rich. In order to reduce the difficulty for beginners to conduct NAS-related research, a comprehensive and systematic survey on the NAS is essential. Previously related surveys began to classify existing work mainly from the basic components of NAS: search space, search strategy and evaluation strategy. This classification method is more intuitive, but it is difficult for readers to grasp the challenges and the landmark work in the middle. Therefore, in this survey, we provide a new perspective: starting with an overview of the characteristics of the earliest NAS algorithms, summarizing the problems in these early NAS algorithms, and then giving solutions for subsequent related research work. In addition, we conducted a detailed and comprehensive analysis, comparison and summary of these works. Finally, we give possible future research directions.

This paper uses deformed coherent states, based on a deformed Weyl-Heisenberg algebra that unifies the well-known SU(2), Weyl-Heisenberg, and SU(1,1) groups, through a common parameter. We show that deformed coherent states provide the theoretical foundation of a meta-kernel function, that is a kernel which in turn defines kernel functions. Kernel functions drive developments in the field of machine learning and the meta-kernel function presented in this paper opens new theoretical avenues for the definition and exploration of kernel functions. The meta-kernel function applies associated revolution surfaces as feature spaces identified with non-linear coherent states. An empirical investigation compares the deformed SU(2) and SU(1,1) kernels derived from the meta-kernel which shows performance similar to the Radial Basis kernel, and offers new insights (based on the deformed Weyl-Heisenberg algebra).

The purpose of this work was to develop of metrics to assess network architectures that balance neural network size and task performance. To this end, the concept of neural efficiency is introduced to measure neural layer utilization, and a second metric called artificial intelligence quotient (aIQ) was created to balance neural network performance and neural network efficiency. To study aIQ and neural efficiency, two simple neural networks were trained on MNIST: a fully connected network (LeNet-300-100) and a convolutional neural network (LeNet-5). The LeNet-5 network with the highest aIQ was 2.32% less accurate but contained 30,912 times fewer parameters than the highest accuracy network. Both batch normalization and dropout layers were found to increase neural efficiency. Finally, high aIQ networks are shown to be memorization and overtraining resistant, capable of learning proper digit classification with an accuracy of 92.51% even when 75% of the class labels are randomized. These results demonstrate the utility of aIQ and neural efficiency as metrics for balancing network performance and size.

This paper presents tailor-made neural model structures and two custom fitting criteria for learning dynamical systems. The proposed framework is based on a representation of the system behavior in terms of continuous-time state-space models. The sequence of hidden states is optimized along with the neural network parameters in order to minimize the difference between measured and estimated outputs, and at the same time to guarantee that the optimized state sequence is consistent with the estimated system dynamics. The effectiveness of the approach is demonstrated through three case studies, including two public system identification benchmarks based on experimental data.

Anyone can take a photo, but not everybody has the ability to retouch their pictures and obtain result close to professional. Since it is not possible to ask experts to retouch thousands of pictures, we thought about teaching a piece of software how to reproduce the work of those said experts. This study aims to explore the possibility to use deep learning methods and more specifically, generative adversarial networks (GANs), to mimic artists' retouching and find which one of the studied models provides the best results.

Stochastic gradient descent (SGD) is an inherently sequential training algorithm--computing the gradient at batch $i$ depends on the model parameters learned from batch $i-1$. Prior approaches that break this dependence do not honor them (e.g., sum the gradients for each batch, which is not what sequential SGD would do) and thus potentially suffer from poor convergence. This paper introduces a novel method to combine gradients called Adasum (for adaptive sum) that converges faster than prior work. Adasum is easy to implement, almost as efficient as simply summing gradients, and is integrated into the open-source toolkit Horovod.

This paper first provides a formal justification for Adasum and then empirically demonstrates Adasum is more accurate than prior gradient accumulation methods. It then introduces a series of case-studies to show Adasum works with multiple frameworks, (TensorFlow and PyTorch), scales multiple optimizers (Momentum-SGD, Adam, and LAMB) to larger batch-sizes while still giving good downstream accuracy. Finally, it proves that Adasum converges.

To summarize, Adasum scales Momentum-SGD on the MLPerf Resnet50 benchmark to 64K examples before communication (no MLPerf v0.5 entry converged with more than 16K), the Adam optimizer to 64K examples before communication on BERT-LARGE (prior work showed Adam stopped scaling at 16K), and the LAMB optimizer to 128K before communication on BERT-LARGE (prior work used 64K), all while maintaining downstream accuracy metrics. Finally, if a user does not need to scale, we show LAMB with Adasum on BERT-LARGE converges in 30% fewer steps than the baseline.

Security analysis is an essential activity in security engineering to identify potential system vulnerabilities and achieve security requirements in the early design phases. Due to the increasing complexity of modern systems, traditional approaches, which only consider component failures and simple cause-and-effect linkages, lack the power to identify insecure incidents caused by complex interactions among physical systems, human and social entities. By contrast, a top-down System-Theoretic Process Analysis for Security (STPA-Sec) approach views losses as resulting from interactions, focuses on controlling system vulnerabilities instead of external threats and is applicable for complex socio-technical systems. In this paper, we proposed an extension of STPA-Sec based on data flow structures to overcome STPA-Sec's limitations and achieve security constraints of information-critical systems systematically. We analyzed a Bluetooth digital key system of a vehicle by using both the proposed and the original approach to investigate the relationship and differences between both approaches as well as their applicability and highlights. To conclude, the proposed approach can identify more information-related problems with technical details and be used with other STPA-based approaches to co-design systems in multi-disciplines under the unified STPA process framework.

As the 5G communication networks are being widely deployed worldwide, both industry and academia have started to move beyond 5G and explore 6G communications. It is generally believed that 6G will be established on ubiquitous Artificial Intelligence (AI) to achieve data-driven Machine Learning (ML) solutions in heterogeneous and massive-scale networks. However, traditional ML techniques require centralized data collection and processing by a central server, which is becoming a bottleneck of large-scale implementation in daily life due to significantly increasing privacy concerns. Federated learning, as an emerging distributed AI approach with privacy preservation nature, is particularly attractive for various wireless applications, especially being treated as one of the vital solutions to achieve ubiquitous AI in 6G. In this article, we first introduce the integration of 6G and federated learning and provide potential federated learning applications for 6G. We then describe key technical challenges, the corresponding federated learning methods, and open problems for future research on federated learning in the context of 6G communications.

The use of container technology has skyrocketed during the last few years, with Docker as the leading container platform. Docker's online repository for publicly available container images, called Docker Hub, hosts over 3.5 million images at the time of writing, making it the world's largest community of container images. We perform an extensive vulnerability analysis of 2500 Docker images. It is of particular interest to perform this type of analysis because the vulnerability landscape is a rapidly changing category, the vulnerability scanners are constantly developed and updated, new vulnerabilities are discovered, and the volume of images on Docker Hub is increasing every day. Our main findings reveal that (1) the number of newly introduced vulnerabilities on Docker Hub is rapidly increasing; (2) certified images are the most vulnerable; (3) official images are the least vulnerable; (4) there is no correlation between the number of vulnerabilities and image features (i.e., number of pulls, number of stars, and days since the last update); (5) the most severe vulnerabilities originate from two of the most popular scripting languages, JavaScript and Python; and (6) Python 2.x packages and jackson-databind packages contain the highest number of severe vulnerabilities. We perceive our study as the most extensive vulnerability analysis published in the open literature in the last couple of years.

Detection and description of keypoints from an image is a well-studied problem in Computer Vision. Some methods like SIFT, SURF or ORB are computationally really efficient. This paper proposes a solution for a particular case study on object recognition of industrial parts based on hierarchical classification. Reducing the number of instances leads to better performance, indeed, that is what the use of the hierarchical classification is looking for. We demonstrate that this method performs better than using just one method like ORB, SIFT or FREAK, despite being fairly slower.

In a low-rank linear bandit problem, the reward of an action (represented by a matrix of size $d_1 \times d_2$) is the inner product between the action and an unknown low-rank matrix $\Theta^*$. We propose an algorithm based on a novel combination of online-to-confidence-set conversion~\citep{abbasi2012online} and the exponentially weighted average forecaster constructed by a covering of low-rank matrices. In $T$ rounds, our algorithm achieves $\widetilde{O}((d_1+d_2)^{3/2}\sqrt{rT})$ regret that improves upon the standard linear bandit regret bound of $\widetilde{O}(d_1d_2\sqrt{T})$ when the rank of $\Theta^*$: $r \ll \min\{d_1,d_2\}$. We also extend our algorithmic approach to the generalized linear setting to get an algorithm which enjoys a similar bound under regularity conditions on the link function. To get around the computational intractability of covering based approaches, we propose an efficient algorithm by extending the "Explore-Subspace-Then-Refine" algorithm of~\citet{jun2019bilinear}. Our efficient algorithm achieves $\widetilde{O}((d_1+d_2)^{3/2}\sqrt{rT})$ regret under a mild condition on the action set $\mathcal{X}$ and the $r$-th singular value of $\Theta^*$. Our upper bounds match the conjectured lower bound of \cite{jun2019bilinear} for a subclass of low-rank linear bandit problems. Further, we show that existing lower bounds for the sparse linear bandit problem strongly suggest that our regret bounds are unimprovable. To complement our theoretical contributions, we also conduct experiments to demonstrate that our algorithm can greatly outperform the performance of the standard linear bandit approach when $\Theta^*$ is low-rank.

Throughout life, we might seek a calling, companions, skills, entertainment, truth, self-knowledge, beauty, and edification. The practice of curiosity can be viewed as an extended and open-ended search for valuable information with hidden identity and location in a complex space of interconnected information. Despite its importance, curiosity has been challenging to computationally model because the practice of curiosity often flourishes without specific goals, external reward, or immediate feedback. Here, we show how network science, statistical physics, and philosophy can be integrated into an approach that coheres with and expands the psychological taxonomies of specific-diversive and perceptual-epistemic curiosity. Using this interdisciplinary approach, we distill functional modes of curious information seeking as searching movements in information space. The kinesthetic model of curiosity offers a vibrant counterpart to the deliberative predictions of model-based reinforcement learning. In doing so, this model unearths new computational opportunities for identifying what makes curiosity curious.

As COVID-19 transmissions spread worldwide, governments have announced and enforced travel restrictions to prevent further infections. Such restrictions have a direct effect on the volume of international flights among these countries, resulting in extensive social and economic costs. To better understand the situation in a quantitative manner, we used the Opensky network data to clarify flight patterns and flight densities around the world and observe relationships between flight numbers with new infections, and with the economy (unemployment rate) in Barcelona. We found that the number of daily flights gradually decreased and suddenly dropped 64% during the second half of March in 2020 after the US and Europe enacted travel restrictions. We also observed a 51% decrease in the global flight network density decreased during this period. Regarding new COVID-19 cases, the world had an unexpected surge regardless of travel restrictions. Finally, the layoffs for temporary workers in the tourism and airplane business increased by 4.3 fold in the weeks following Spain's decision to close its borders.

How can deep neural networks encode information that corresponds to words in human speech into raw acoustic data? This paper proposes two neural network architectures for modeling unsupervised lexical learning from raw acoustic inputs, ciwGAN (Categorical InfoWaveGAN) and fiwGAN (Featural InfoWaveGAN), that combine a DCGAN architecture for audio data (WaveGAN; arXiv:1705.07904) with InfoGAN (arXiv:1606.03657), and propose a new latent space structure that can model featural learning simultaneously with a higher level classification. The architectures introduce a network that learns to retrieve latent codes from generated audio outputs. Lexical learning is thus modeled as emergent from an architecture that forces a deep neural network to output data such that unique information is retrievable from its acoustic outputs. The networks trained on lexical items from TIMIT learn to encode unique information corresponding to lexical items in the form of categorical variables. By manipulating these variables, the network outputs specific lexical items. Innovative outputs suggest that phonetic and phonological representations learned by the network can be productively recombined and directly paralleled to productivity in human speech: a fiwGAN network trained on 'suit' and 'dark' outputs innovative 'start', even though it never saw 'start' or even a [st] sequence in the training data. We also argue that setting latent featural codes to values well beyond training range results in almost categorical generation of prototypical lexical items and reveals underlying values of each latent code. Probing deep neural networks trained on well understood dependencies in speech bears implications for latent space interpretability, understanding how deep neural networks learn meaningful representations, as well as a potential for unsupervised text-to-speech generation in the GAN framework.

For the Stokes equation over 2D and 3D domains, explicit a posteriori and a priori error estimation are novelly developed for the finite element solution. The difficulty in handling the divergence-free condition of the Stokes equation is solved by utilizing the extended hypercircle method along with the Scott-Vogelius finite element scheme. Since all terms in the error estimation have explicit values, by further applying the interval arithmetic and verified computing algorithms, the computed results provide rigorous estimation for the approximation error. As an application of the proposed error estimation, the eigenvalue problem of the Stokes operator is considered and rigorous bounds for the eigenvalues are obtained. The efficiency of proposed error estimation is demonstrated by solving the Stokes equation on both convex and non-convex 3D domains.

Gaussian Mixture models (GMMs) are a powerful tool for clustering, classification and density estimation when clustering structures are embedded in the data. The presence of missing values can largely impact the GMMs estimation process, thus handling missing data turns out to be a crucial point in clustering, classification and density estimation. Several techniques have been developed to impute the missing values before model estimation. Among these, multiple imputation is a simple and useful general approach to handle missing data. In this paper we propose two different methods to fit Gaussian mixtures in the presence of missing data. Both methods use a variant of the Monte Carlo Expectation-Maximisation (MCEM) algorithm for data augmentation. Thus, multiple imputations are performed during the E-step, followed by the standard M-step for a given eigen-decomposed component-covariance matrix. We show that the proposed methods outperform the multiple imputation approach, both in terms of clusters identification and density estimation.

Many real-world scenarios require the random selection of one or more individuals from a pool of eligible candidates. One example of especial social relevance refers to the legal system, in which the jurors and judges are commonly picked according to some probability distribution aiming to avoid biased decisions. In this scenario, ensuring auditability of the random drawing procedure is imperative to promote confidence in its fairness. With this goal in mind, this article describes a protocol for random drawings specially designed for use in legal systems. The proposed design combines the following properties: security by design, ensuring the fairness of the random draw as long as at least one participant behaves honestly; auditability by any interested party, even those having no technical background, using only public information; and statistical robustness, supporting drawings where candidates may have distinct probability distributions. Moreover, it is capable of inviting and engaging as participating stakeholders the main interested parties of a legal process, in a way that promotes process transparency, public trust and institutional resilience. An open-source implementation is also provided as supplementary material.

Reservoir Computing (RC) is a well-known strategy for designing Recurrent Neural Networks featured by striking efficiency of training. The crucial aspect of RC is to properly instantiate the hidden recurrent layer that serves as dynamical memory to the system. In this respect, the common recipe is to create a pool of randomly and sparsely connected recurrent neurons. While the aspect of sparsity in the design of RC systems has been debated in the literature, it is nowadays understood mainly as a way to enhance the efficiency of computation, exploiting sparse matrix operations. In this paper, we empirically investigate the role of sparsity in RC network design under the perspective of the richness of the developed temporal representations. We analyze both sparsity in the recurrent connections, and in the connections from the input to the reservoir. Our results point out that sparsity, in particular in input-reservoir connections, has a major role in developing internal temporal representations that have a longer short-term memory of past inputs and a higher dimension.

The amount of video data being produced is rapidly growing. At the same time, advances in machine learning and computer vision have enabled applications to query over the contents of videos. For example, an ornithology application may retrieve birds of various species from a nature video. However, modern video data management systems store videos as a single encoded file, which does not provide opportunities to optimize queries for spatial subsets of videos. We propose utilizing a feature in modern video codecs called "tiles" to enable spatial random access into encoded videos. We present the design of TASM, a tile-aware storage manager, and describe techniques it uses to optimize the physical layout of videos for various query workloads. We demonstrate how TASM can significantly improve the performance of queries over videos when the workload is known, as well as how it can incrementally adapt the physical layout of videos to improve performance even when the workload is not known. Layouts picked by TASM speed up individual queries by an average of 51% and up to 94% while maintaining good quality.

This paper presents a free Japanese singing voice corpus that can be used for highly applicable and reproducible singing voice synthesis research. A singing voice corpus helps develop singing voice synthesis, but existing corpora have two critical problems: data imbalance (singing voice corpora do not guarantee phoneme balance, unlike speaking-voice corpora) and copyright issues (cannot legally share data). As a way to avoid these problems, we constructed a PJS (phoneme-balanced Japanese singing voice) corpus that guarantees phoneme balance and is licensed with CC BY-SA 4.0, and we composed melodies using a phoneme-balanced speaking-voice corpus. This paper describes how we built the corpus.

In this paper we analyse full discretizations of an initial boundary value problem (IBVP) related to reaction-diffusion equations. The IBVP is first discretized in time via the deferred correction method for the implicit midpoint rule and leads to a time-stepping scheme of order $2p+2$ of accuracy at the stage $p=0,1,2,\cdots$ of the correction. Each semi-discretized scheme results in a nonlinear elliptic equation for which the existence of a solution is proven using the Schaefer fixed point theorem. The elliptic equation corresponding to the stage $p$ of the correction is discretized by the Galerkin finite element method and gives a full discretization of the IBVP. This fully discretized scheme is unconditionlly stable with order $2p+2$ of accuracy in time. The order of accuracy in space is equal to the degree of the finite element used when the family of meshes considered is shape-regular while an increment of one order is proven for shape-regular and quasi-uniform family of meshes. A numerical test with a bistable reaction-diffusion equation having a strong stiffness ratio is performed and shows that the orders 2,4,6,8 and 10 of accuracy in time are achieved with a very strong stability.

RarePlanes is a unique open-source machine learning dataset that incorporates both real and synthetically generated satellite imagery. The RarePlanes dataset specifically focuses on the value of synthetic data to aid computer vision algorithms in their ability to automatically detect aircraft and their attributes in satellite imagery. Although other synthetic/real combination datasets exist, RarePlanes is the largest openly-available very-high resolution dataset built to test the value of synthetic data from an overhead perspective. Previous research has shown that synthetic data can reduce the amount of real training data needed and potentially improve performance for many tasks in the computer vision domain. The real portion of the dataset consists of 253 Maxar WorldView-3 satellite scenes spanning 112 locations and 2,142 km^2 with 14,700 hand-annotated aircraft. The accompanying synthetic dataset is generated via a novel simulation platform and features 50,000 synthetic satellite images with ~630,000 aircraft annotations. Both the real and synthetically generated aircraft feature 10 fine grain attributes including: aircraft length, wingspan, wing-shape, wing-position, wingspan class, propulsion, number of engines, number of vertical-stabilizers, presence of canards, and aircraft role. Finally, we conduct extensive experiments to evaluate the real and synthetic datasets and compare performances. By doing so, we show the value of synthetic data for the task of detecting and classifying aircraft from an overhead perspective.

Grammar error correction (GEC) systems have become ubiquitous in a variety of software applications, and have started to approach human-level performance for some datasets. However, very little is known about how to efficiently personalize these systems to the user's characteristics, such as their proficiency level and first language, or to emerging domains of text. We present the first results on adapting a general-purpose neural GEC system to both the proficiency level and the first language of a writer, using only a few thousand annotated sentences. Our study is the broadest of its kind, covering five proficiency levels and twelve different languages, and comparing three different adaptation scenarios: adapting to the proficiency level only, to the first language only, or to both aspects simultaneously. We show that tailoring to both scenarios achieves the largest performance improvement (3.6 F0.5) relative to a strong baseline.

This paper describes FBK's participation in the IWSLT 2020 offline speech translation (ST) task. The task evaluates systems' ability to translate English TED talks audio into German texts. The test talks are provided in two versions: one contains the data already segmented with automatic tools and the other is the raw data without any segmentation. Participants can decide whether to work on custom segmentation or not. We used the provided segmentation. Our system is an end-to-end model based on an adaptation of the Transformer for speech data. Its training process is the main focus of this paper and it is based on: i) transfer learning (ASR pretraining and knowledge distillation), ii) data augmentation (SpecAugment, time stretch and synthetic data), iii) combining synthetic and real data marked as different domains, and iv) multi-task learning using the CTC loss. Finally, after the training with word-level knowledge distillation is complete, our ST models are fine-tuned using label smoothed cross entropy. Our best model scored 29 BLEU on the MuST-C En-De test set, which is an excellent result compared to recent papers, and 23.7 BLEU on the same data segmented with VAD, showing the need for researching solutions addressing this specific data condition.

A major goal of dynamical systems theory is the search for simplified descriptions of the dynamics of a large number of interacting states. For overwhelmingly complex dynamical systems, the derivation of a reduced description on the entire dynamics at once is computationally infeasible. Other complex systems are so expansive that despite the continual onslaught of new data only partial information is available. To address this challenge, we define and optimise for a local quality function severability for measuring the dynamical coherency of a set of states over time. The theoretical underpinnings of severability lie in our local adaptation of the Simon-Ando-Fisher time-scale separation theorem, which formalises the intuition of local wells in the Markov landscape of a dynamical process, or the separation between a microscopic and a macroscopic dynamics. Finally, we demonstrate the practical relevance of severability by applying it to examples drawn from power networks, image segmentation, social networks, metabolic networks, and word association.

This research gauges the ability of deep reinforcement learning (DRL) techniques to assist the optimization and control of fluid mechanical systems. It combines a novel, "degenerate" version of the proximal policy optimization (PPO) algorithm, that trains a neural network in optimizing the system only once per learning episode, and an in-house stabilized finite elements environment implementing the variational multiscale (VMS) method, that computes the numerical reward fed to the neural network. Three prototypical examples of separated flows in two dimensions are used as testbed for developing the methodology, each of which adds a layer of complexity due either to the unsteadiness of the flow solutions, or the sharpness of the objective function, or the dimension of the control parameter space. Relevance is carefully assessed by comparing systematically to reference data obtained by canonical direct and adjoint methods. Beyond adding value to the shallow literature on this subject, these findings establish the potential of single-step PPO for reliable black-box optimization of computational fluid dynamics (CFD) systems, which paves the way for future progress in optimal flow control using this new class of methods.

Median regression analysis has robustness properties which make it attractive compared with regression based on the mean, while differential privacy can protect individual privacy during statistical analysis of certain datasets. In this paper, three privacy preserving methods are proposed for median regression. The first algorithm is based on a finite smoothing method, the second provides an iterative way and the last one further employs the greedy coordinate descent approach. Privacy preserving properties of these three methods are all proved. Accuracy bound or convergence properties of these algorithms are also provided. Numerical calculation shows that the first method has better accuracy than the others when the sample size is small. When the sample size becomes larger, the first method needs more time while the second method needs less time with well-matched accuracy. For the third method, it costs less time in both cases, while it highly depends on step size.

Reinforcement learning is a popular machine learning paradigm which can find near optimal solutions to complex problems. Most often, these procedures involve function approximation using neural networks with gradient based updates to optimise weights for the problem being considered. While this common approach generally works well, there are other update mechanisms which are largely unexplored in reinforcement learning. One such mechanism is Extreme Learning Machines. These were initially proposed to drastically improve the training speed of neural networks and have since seen many applications. Here we attempt to apply extreme learning machines to a reinforcement learning problem in the same manner as gradient based updates. This new algorithm is called Extreme Q-Learning Machine (EQLM). We compare its performance to a typical Q-Network on the cart-pole task - a benchmark reinforcement learning problem - and show EQLM has similar long-term learning performance to a Q-Network.

We present an integer programming model to compute the strong rainbow connection number $src(G)$ of any simple graph $G$. We introduce several enhancements to the proposed model, including a fast heuristic, a novel class of valid inequalities, and a variable elimination scheme. Moreover, we present a novel lower bound for $src(G)$ which may be of independent research interest. We evaluate our model with a traditional branch and cut approach as well as an alternative scheme based on iterative lower bound improvement, which we show to be highly effective in practice. To our knowledge, these are the first computational methods for the strong rainbow connection problem. We demonstrate the efficacy of our methods by computing the strong rainbow connection numbers of graphs with up to $167$ vertices.

Humans are able to create rich representations of their external reality. Their internal representations allow for cross-modality inference, where available perceptions can induce the perceptual experience of missing input modalities. In this paper, we contribute the Multimodal Hierarchical Variational Auto-encoder (MHVAE), a hierarchical multimodal generative model for representation learning. Inspired by human cognitive models, the MHVAE is able to learn modality-specific distributions, of an arbitrary number of modalities, and a joint-modality distribution, responsible for cross-modality inference. We formally derive the model's evidence lower bound and propose a novel methodology to approximate the joint-modality posterior based on modality-specific representation dropout. We evaluate the MHVAE on standard multimodal datasets. Our model performs on par with other state-of-the-art generative models regarding joint-modality reconstruction from arbitrary input modalities and cross-modality inference.

A shared grasp is a grasp formed by contacts between the manipulated object and both the robot hand and the environment. By trading off hand contacts for environmental contacts, a shared grasp requires fewer contacts with the hand, and enables manipulation even when a full grasp is not possible. Previous research has used shared grasps for non-prehensile manipulation such as pivoting and tumbling. This paper treats the problem more generally, with methods to select the best shared grasp and robot actions for a desired object motion. The central issue is to evaluate the feasible contact modes: for each contact, whether that contact will remain active, and whether slip will occur. Robustness is important. When a contact mode fails, e.g., when a contact is lost, or when unintentional slip occurs, the operation will fail, and in some cases damage may occur. In this work, we enumerate all feasible contact modes, calculate corresponding controls, and select the most robust candidate. We can also optimize the contact geometry for robustness. This paper employs quasi-static analysis of planar rigid bodies with Coulomb friction to derive the algorithms and controls. Finally, we demonstrate the robustness of shared grasping and the use of our methods in representative experiments and examples. The video can be found at https://youtu.be/tyNhJvRYZNk

We show that the cop number of any graph on 18 or fewer vertices is at most 3. This answers a specific case of a question posed by Baird et al. on the minimum order of 4-cop-win graphs, first appearing in 2011. We also find all 3-cop-win graphs on 11 vertices, narrow down the possible 4-cop-win graphs on 19 vertices and get some progress on finding the minimum order of 3-cop-win planar graphs.

Upper Confidence Bound (UCB) is arguably the most commonly used method for linear multi-arm bandit problems. While conceptually and computationally simple, this method highly relies on the confidence bounds, failing to strike the optimal exploration-exploitation if these bounds are not properly set. In the literature, confidence bounds are typically derived from concentration inequalities based on assumptions on the reward distribution, e.g., sub-Gaussianity. The validity of these assumptions however is unknown in practice. In this work, we aim at learning the confidence bound in a data-driven fashion, making it adaptive to the actual problem structure. Specifically, noting that existing UCB-typed algorithms are not differentiable with respect to confidence bound, we first propose a novel differentiable linear bandit algorithm. Then, we introduce a gradient estimator, which allows the confidence bound to be learned via gradient ascent. Theoretically, we show that the proposed algorithm achieves a $\tilde{\mathcal{O}}(\hat{\beta}\sqrt{dT})$ upper bound of $T$-round regret, where $d$ is the dimension of arm features and $\hat{\beta}$ is the learned size of confidence bound. Empirical results show that $\hat{\beta}$ is significantly smaller than its theoretical upper bound and proposed algorithms outperforms baseline ones on both simulated and real-world datasets.

Automatic emotion recognition plays a significant role in the process of human computer interaction and the design of Internet of Things (IOT) technologies. Yet, a common problem in emotion recognition systems lies in the scarcity of reliable labels. By modeling pairwise differences between samples of interest, a Siamese network can help to mitigate this challenge since it requires fewer samples than traditional deep learning methods. In this paper, we propose a distance loss, which can be applied on the Siamese network fine-tuning, by optimizing the model based on the relevant distance between same and difference class pairs. Our system use samples from the source data to pre-train the weights of proposed Siamese neural network, which are fine-tuned based on the target data. We present an emotion recognition task that uses speech, since it is one of the most ubiquitous and frequently used bio-behavioral signals. Our target data comes from the RAVDESS dataset, while the CREMA-D and eNTERFACE'05 are used as source data, respectively. Our results indicate that the proposed distance loss is able to greatly benefit the fine-tuning process of Siamese network. Also, the selection of source data has more effect on the Siamese network performance compared to the number of frozen layers. These suggest the great potential of applying the Siamese network and modelling pairwise differences in the field of transfer learning for automatic emotion recognition.

Functional Distributional Semantics provides a computationally tractable framework for learning truth-conditional semantics from a corpus. Previous work in this framework has provided a probabilistic version of first-order logic, recasting quantification as Bayesian inference. In this paper, I show how the previous formulation gives trivial truth values when a precise quantifier is used with vague predicates. I propose an improved account, avoiding this problem by treating a vague predicate as a distribution over precise predicates. I connect this account to recent work in the Rational Speech Acts framework on modelling generic quantification, and I extend this to modelling donkey sentences. Finally, I explain how the generic quantifier can be both pragmatically complex and yet computationally simpler than precise quantifiers.

We explore if it is possible to learn a directed acyclic graph (DAG) from data without imposing explicitly the acyclicity constraint. In particular, for Gaussian distributions, we frame structural learning as a sparse matrix factorization problem and we empirically show that solving an $\ell_1$-penalized optimization yields to good recovery of the true graph and, in general, to almost-DAG graphs. Moreover, this approach is computationally efficient and is not affected by the explosion of combinatorial complexity as in classical structural learning algorithms.

Spiking Neural Networks (SNN) represent a biologically inspired computation model capable of emulating neural computation in human brain and brain-like structures. The main promise is very low energy consumption. Unfortunately, classic Von Neumann architecture based SNN accelerators often fail to address demanding computation and data transfer requirements efficiently at scale. In this work, we propose a promising alternative, an in-memory SNN accelerator based on Spintronic Computational RAM (CRAM) to overcome scalability limitations, which can reduce the energy consumption by up to 164.1$\times$ when compared to a representative ASIC solution.

Let $s$ and $t$ be positive integers. We use $P_t$ to denote the path with $t$ vertices and $K_{1,s}$ to denote the complete bipartite graph with parts of size $1$ and $s$ respectively. The one-subdivision of $K_{1,s}$ is obtained by replacing every edge $\{u,v\}$ of $K_{1,s}$ by two edges $\{u,w\}$ and $\{v,w\}$ with a new vertex $w$. In this paper, we give a polynomial-time algorithm for the list-three-coloring problem restricted to the class of $P_t$-free graph with no induced 1-subdivision of $K_{1,s}$.

We present a system that allows a user to search a large linguistically annotated corpus using syntactic patterns over dependency graphs. In contrast to previous attempts to this effect, we introduce a light-weight query language that does not require the user to know the details of the underlying syntactic representations, and instead to query the corpus by providing an example sentence coupled with simple markup. Search is performed at an interactive speed due to an efficient linguistic graph-indexing and retrieval engine. This allows for rapid exploration, development and refinement of syntax-based queries. We demonstrate the system using queries over two corpora: the English wikipedia, and a collection of English pubmed abstracts. A demo of the wikipedia system is available at: https://allenai.github.io/spike

We introduce a stochastic variational inference procedure for training scalable Gaussian process (GP) models whose per-iteration complexity is independent of both the number of training points, $n$, and the number basis functions used in the kernel approximation, $m$. Our central contributions include an unbiased stochastic estimator of the evidence lower bound (ELBO) for a Gaussian likelihood, as well as a stochastic estimator that lower bounds the ELBO for several other likelihoods such as Laplace and logistic. Independence of the stochastic optimization update complexity on $n$ and $m$ enables inference on huge datasets using large capacity GP models. We demonstrate accurate inference on large classification and regression datasets using GPs and relevance vector machines with up to $m = 10^7$ basis functions.

Reproducible workflow solutions commonly use high-level technologies that were popular when they were created, providing an immediate solution which is unlikely to be sustainable in the long term. We therefore introduce a set of criteria to address this problem and demonstrate their practicality and implementation. The criteria have been tested in several research publications and can be summarized as: completeness (no dependency beyond a POSIX-compatible operating system, no administrator privileges, no network connection and storage primarily in plain text); modular design; minimal complexity; scalability; verifiable inputs and outputs; temporal provenance; linking analysis with narrative; and free-and-open-source software. As a proof of concept, we have implemented "Maneage", a solution which stores the project in machine-actionable and human-readable plain-text, enables version-control, cheap archiving, automatic parsing to extract data provenance, and peer-reviewable verification. We show that requiring longevity of a reproducible workflow solution is realistic, without sacrificing immediate or short-term reproducibility and discuss the benefits of the criteria for scientific progress. This paper has itself been written in Maneage, with snapshot 1637cce.

Live video commenting systems are an emerging feature of online video sites. Recently the Chinese video sharing platform Bilibili, has popularised a novel captioning system where user comments are displayed as streams of moving subtitles overlaid on the video playback screen and broadcast to all viewers in real-time. LiveBot was recently introduced as a novel Automatic Live Video Commenting (ALVC) application. This enables the automatic generation of live video comments from both the existing video stream and existing viewers comments. In seeking to reproduce the baseline results reported in the original Livebot paper, we found differences between the reproduced results using the project codebase and the numbers reported in the paper. Further examination of this situation suggests that this may be caused by a number of small issues in the project code, including a non-obvious overlap between the training and test sets. In this paper, we study these discrepancies in detail and propose an alternative baseline implementation as a reference for other researchers in this field.

The current crisis, at the time of writing, has had a profound impact on the financial world, introducing the need for creative approaches to revitalising the economy at the micro level as well as the macro level. In this informal analysis and design proposal, we describe how infrastructure for digital assets can serve as a useful monetary and fiscal policy tool and an enabler of existing tools in the future, particularly during crises, while aligning the trajectory of financial technology innovation toward a brighter future. We propose an approach to digital currency that would allow people without banking relationships to transact electronically and privately, including both internet purchases and point-of-sale purchases that are required to be cashless. We also propose an approach to digital currency that would allow for more efficient and transparent clearing and settlement, implementation of monetary and fiscal policy, and management of systemic risk. The digital currency could be implemented as central bank digital currency (CBDC), or it could be issued by the government and collateralised by public funds or Treasury assets. Our proposed architecture allows both manifestations and would be operated by banks and other money services businesses, operating within a framework overseen by government regulators. We argue that now is the time for action to undertake development of such a system, not only because of the current crisis but also in anticipation of future crises resulting from geopolitical risks, the continued globalisation of the digital economy, and the changing value and risks that technology brings.

The objective of this paper is to recover the original component signals from a mixture audio with the aid of visual cues of the sound sources. Such task is usually referred as visually guided sound source separation. The proposed Cascaded Opponent Filter (COF) framework consists of multiple stages, which recursively refine the sound separation based on appearance and motion information. A key element is a novel opponent filter module that identifies and relocates residual components between sound sources. Finally, we propose a Sound Source Location Masking (SSLM) technique, which, together with COF, produces a pixel level mask of the source location. The entire system is trained end-to-end using a large set of unlabelled videos. We compare COF with recent baselines and obtain state-of-the-art performance in three challenging datasets (MUSIC, A-MUSIC, and A-NATURAL). The implementation and pre-trained models will be made publicly available.

Modern deep neural networks increasingly make use of features such as dynamic control flow, data structures and dynamic tensor shapes. Existing deep learning systems focus on optimizing and executing static neural networks which assume a pre-determined model architecture and input data shapes--assumptions which are violated by dynamic neural networks. Therefore, executing dynamic models with deep learning systems is currently both inflexible and sub-optimal, if not impossible. Optimizing dynamic neural networks is more challenging than static neural networks; optimizations must consider all possible execution paths and tensor shapes. This paper proposes Nimble, a high-performance and flexible system to optimize, compile, and execute dynamic neural networks on multiple platforms. Nimble handles model dynamism by introducing a dynamic type system, a set of dynamism-oriented optimizations, and a light-weight virtual machine runtime. Our evaluation demonstrates that Nimble outperforms state-of-the-art deep learning frameworks and runtime systems for dynamic neural networks by up to 20x on hardware platforms including Intel CPUs, ARM CPUs, and Nvidia GPUs.

This paper presents a new challenging information extraction task in the domain of materials science. We develop an annotation scheme for marking information on experiments related to solid oxide fuel cells in scientific publications, such as involved materials and measurement conditions. With this paper, we publish our annotation guidelines, as well as our SOFC-Exp corpus consisting of 45 open-access scholarly articles annotated by domain experts. A corpus and an inter-annotator agreement study demonstrate the complexity of the suggested named entity recognition and slot filling tasks as well as high annotation quality. We also present strong neural-network based models for a variety of tasks that can be addressed on the basis of our new data set. On all tasks, using BERT embeddings leads to large performance gains, but with increasing task complexity, adding a recurrent neural network on top seems beneficial. Our models will serve as competitive baselines in future work, and analysis of their performance highlights difficult cases when modeling the data and suggests promising research directions.

Predicting the electrical behavior of the heart, from the cellular scale to the tissue level, relies on the formulation and numerical approximation of coupled nonlinear dynamical systems. These systems describe the cardiac action potential, that is the polarization/depolarization cycle occurring at every heart beat that models the time evolution of the electrical potential across the cell membrane, as well as a set of ionic variables. Multiple solutions of these systems, corresponding to different model inputs, are required to evaluate outputs of clinical interest, such as activation maps and action potential duration. More importantly, these models feature coherent structures that propagate over time, such as wavefronts. These systems can hardly be reduced to lower dimensional problems by conventional reduced order models (ROMs) such as, e.g., the reduced basis (RB) method. This is primarily due to the low regularity of the solution manifold (with respect to the problem parameters) as well as to the nonlinear nature of the input-output maps that we intend to reconstruct numerically. To overcome this difficulty, in this paper we propose a new, nonlinear approach which exploits deep learning (DL) algorithms to obtain accurate and efficient ROMs, whose dimensionality matches the number of system parameters. Our DL approach combines deep feedforward neural networks (NNs) and convolutional autoencoders (AEs). We show that the proposed DL-ROM framework can efficiently provide solutions to parametrized electrophysiology problems, thus enabling multi-scenario analysis in pathological cases. We investigate three challenging test cases in cardiac electrophysiology and prove that DL-ROM outperforms classical projection-based ROMs.

Asynchronous Q-learning aims to learn the optimal action-value function (or Q-function) of a Markov decision process (MDP), based on a single trajectory of Markovian samples induced by a behavior policy. Focusing on a $\gamma$-discounted MDP with state space $\mathcal{S}$ and action space $\mathcal{A}$, we demonstrate that the $\ell_{\infty}$-based sample complexity of classical asynchronous Q-learning -- namely, the number of samples needed to yield an entrywise $\varepsilon$-accurate estimate of the Q-function -- is at most on the order of \begin{equation*} \frac{1}{\mu_{\mathsf{min}}(1-\gamma)^5\varepsilon^2}+ \frac{t_{\mathsf{mix}}}{\mu_{\mathsf{min}}(1-\gamma)} \end{equation*} up to some logarithmic factor, provided that a proper constant learning rate is adopted. Here, $t_{\mathsf{mix}}$ and $\mu_{\mathsf{min}}$ denote respectively the mixing time and the minimum state-action occupancy probability of the sample trajectory. The first term of this bound matches the complexity in the case with independent samples drawn from the stationary distribution of the trajectory. The second term reflects the expense taken for the empirical distribution of the Markovian trajectory to reach a steady state, which is incurred at the very beginning and becomes amortized as the algorithm runs. Encouragingly, the above bound improves upon the state-of-the-art result by a factor of at least $|\mathcal{S}||\mathcal{A}|$. Further, the scaling on the discount complexity can be improved by means of variance reduction.

In large-scale distributed storage systems, erasure codes are used to achieve fault tolerance in the face of node failures. Tuning code parameters to observed failure rates has been shown to significantly reduce storage cost. Such tuning of redundancy requires "code conversion", i.e., a change in code dimension and length on already encoded data. Convertible codes are a new class of codes designed to perform such conversions efficiently. The access cost of conversion is the number of nodes accessed during conversion.

Existing literature has characterized the access cost of conversion of linear MDS convertible codes only for a specific and small subset of parameters. In this paper, we present lower bounds on the access cost of conversion of linear MDS codes for all valid parameters. Furthermore, we show that these lower bounds are tight by presenting an explicit construction for access-optimal linear MDS convertible codes for all valid parameters. En route, we show that, one of the degrees-of-freedom in the design of convertible codes that was inconsequential in the previously studied parameter regimes, turns out to be crucial when going beyond these regimes and adds to the challenge in the analysis and code construction.

Difficulty algorithms are a fundamental component of Proof-of-Work blockchains, aimed at maintaining stable block production times by dynamically adjusting the network difficulty in response to the miners' constantly changing computational power. Targeting stable block times is critical, as this ensures consistent transaction throughput. Some blockchains need difficulty algorithms that react quickly to severe hash rate fluctuations. However, without careful design this could create vulnerabilities that incentivize miners to engage in coin-hopping strategies which yield an unreliable system due to unstable processing of transactions.

We provide an empirical analysis of how Bitcoin Cash exhibits cyclicality in block solve times as a consequence of a positive feedback loop in its difficulty algorithm design. Additionally, we examine the extent to which miners' behavior contributes towards this phenomenon over time. In response, we mathematically derive a difficulty algorithm based on a negative exponential filter that prohibits the formation of positive feedback loops and exhibits additional desirable properties, such as history agnosticism. We compare the described algorithm to that of Bitcoin Cash in a simulated mining environment and verify that the former would eliminate the severe oscillations in block solve times. Lastly, we outline how this model can more generally replace difficulty algorithms in other Proof-of-Work blockchains.

Providing high quality-of-service for live communication is a pervasive challenge which is plagued by packet losses during transmission. Streaming codes are a class of erasure codes specifically designed for such low-latency streaming communication settings. We consider the recently proposed setting of streaming codes under variable-size messages which reflects the requirements of applications such as live video streaming. In practice, streaming codes often need to operate in an "online" setting where the sizes of the future messages are unknown. Yet, previously studied upper bounds on the rate apply to "offline" coding schemes with access to all (including future) message sizes.

In this paper, we evaluate whether the optimal offline rate is a feasible goal for online streaming codes when communicating over a burst-only packet loss channel. We identify two broad parameter regimes where, perhaps surprisingly, online streaming codes can, in fact, match the optimal offline rate. For both of these settings, we present rate-optimal online code constructions. For all remaining parameter settings, we establish that it is impossible for online coding schemes to attain the optimal offline rate.

Information-theoretic formulations of the private information retrieval (PIR) problem have been investigated under a variety of scenarios. Symmetric private information retrieval (SPIR) is a variant where a user is able to privately retrieve one out of $K$ messages from $N$ non-colluding replicated databases without learning anything about the remaining $K-1$ messages. However, the goal of perfect privacy can be too taxing for certain applications. In this paper, we investigate if the information-theoretic capacity of SPIR (equivalently, the inverse of the minimum download cost) can be increased by relaxing both user and DB privacy definitions. Such relaxation is relevant in applications where privacy can be traded for communication efficiency. We introduce and investigate the Asymmetric Leaky PIR (AL-PIR) model with different privacy leakage budgets in each direction. For user privacy leakage, we bound the probability ratios between all possible realizations of DB queries by a function of a non-negative constant $\epsilon$. For DB privacy, we bound the mutual information between the undesired messages, the queries, and the answers, by a function of a non-negative constant $\delta$. We propose a general AL-PIR scheme that achieves an upper bound on the optimal download cost for arbitrary $\epsilon$ and $\delta$. We show that the optimal download cost of AL-PIR is upper-bounded as $D^{*}(\epsilon,\delta)\leq 1+\frac{1}{N-1}-\frac{\delta e^{\epsilon}}{N^{K-1}-1}$. Second, we obtain an information-theoretic lower bound on the download cost as $D^{*}(\epsilon,\delta)\geq 1+\frac{1}{Ne^{\epsilon}-1}-\frac{\delta}{(Ne^{\epsilon})^{K-1}-1}$. The gap analysis between the two bounds shows that our AL-PIR scheme is optimal when $\epsilon =0$, i.e., under perfect user privacy and it is optimal within a maximum multiplicative gap of $\frac{N-e^{-\epsilon}}{N-1}$ for any $(\epsilon,\delta)$.

We consider the content delivery problem in a fading multi-input single-output channel with cache-aided users. We are interested in the scalability of the equivalent content delivery rate when the number of users, $K$, is large. Analytical results show that, using coded caching and wireless multicasting, without channel state information at the transmitter (CSIT), linear scaling of the content delivery rate with respect to $K$ can be achieved in some different ways. First, if the multicast transmission spans over $L$ independent sub-channels, e.g., in quasi-static fading if $L = 1$, and in block fading or multi-carrier systems if $L>1$, linear scaling can be obtained when the product of the number of transmit antennas and the number of sub-channels scales logarithmically with $K$. Second, even with a fixed number of antennas, we can achieve the linear scaling with a threshold-based user selection requiring only one-bit feedbacks from the users. When CSIT is available, we propose a mixed strategy that combines spatial multiplexing and multicasting. Numerical results show that, by optimizing the power split between spatial multiplexing and multicasting, we can achieve a significant gain of the content delivery rate with moderate cache size.

A leading choice of error correction for scalable quantum computing is the surface code with lattice surgery. The basic lattice surgery operations, the merging and splitting of logical qubits, act non-unitarily on the logical states and are not easily captured by standard circuit notation. This raises the question of how best to design, verify, and optimise protocols that use lattice surgery, in particular in architectures with complex resource management issues. In this paper we demonstrate that the operations of the ZX calculus -- a form of quantum diagrammatic reasoning based on bialgebras -- match exactly the operations of lattice surgery. Red and green "spider" nodes match rough and smooth merges and splits, and follow the axioms of a dagger special associative Frobenius algebra. Some lattice surgery operations require non-trivial correction operations, which are captured natively in the use of the ZX calculus in the form of ensembles of diagrams. We give a first taste of the power of the calculus as a language for lattice surgery by considering two operations (T gates and producing a CNOT ) and show how ZX diagram re-write rules give lattice surgery procedures for these operations that are novel, efficient, and highly configurable.

A Behavior Tree (BT) is a way to structure the switching between different tasks in an autonomous agent, such as a robot or a virtual entity in a computer game. BTs are a very efficient way of creating complex systems that are both modular and reactive. These properties are crucial in many applications, which has led to the spread of BT from computer game programming to many branches of AI and Robotics. In this book, we will first give an introduction to BTs, then we describe how BTs relate to, and in many cases generalize, earlier switching structures. These ideas are then used as a foundation for a set of efficient and easy to use design principles. Properties such as safety, robustness, and efficiency are important for an autonomous system, and we describe a set of tools for formally analyzing these using a state space description of BTs. With the new analysis tools, we can formalize the descriptions of how BTs generalize earlier approaches. We also show the use of BTs in automated planning and machine learning. Finally, we describe an extended set of tools to capture the behavior of Stochastic BTs, where the outcomes of actions are described by probabilities. These tools enable the computation of both success probabilities and time to completion.

To a large extent, the stiffness of the bidomain and monodomain models depends on the choice of the ionic model, which varies in terms of complexity and realism. In this paper, we compare and analyze a variety of time-stepping methods: explicit or semi-implicit, operator splitting, exponential, and deferred correction methods. We compare these methods for solving the bidomain model coupled with three ionic models of varying complexity and stiffness: the phenomenological Mitchell-Schaeffer model, the more realistic Beeler-Reuter model, and the stiff and very complex ten Tuscher-Noble-Noble-Panfilov (TNNP) model. For each method, we derive absolute stability criteria of the spatially discretized monodomain model and verify that the theoretical critical time-steps obtained closely match the ones in numerical experiments. We also verify that the numerical methods achieve an optimal order of convergence on the model variables and derived quantities (such as speed of the wave, depolarization time), and this in spite of the local non-differentiability of some of the ionic models. The efficiency of the different methods is also considered by comparing computational times for similar accuracy. Conclusions are drawn on the methods to be used to solve the monodomain model based on the model stiffness and complexity, measured respectively by the eigenvalues of the model's Jacobian and the number of variables, and based on strict stability and accuracy criteria.

Internet boards are platforms for online discussions about a variety of topics. On these boards, individuals may start a new thread on a specific matter, or leave comments in an existing discussion. The resulting collective process leads to the formation of discussion trees', where nodes represent a post and comments, and an edge represents a reply-to' relation. The structure of discussion trees has been analysed in previous works, but only from a static perspective. In this paper, we focus on their structural and dynamical properties by modelling their formation as a self-exciting Hawkes process. We first study a Reddit dataset to show that the structure of the trees resemble those produced by a Galton-Watson process with a special root offspring distribution. The dynamical aspect of the model is then used to predict future commenting activity and the final size of a discussion tree. We compare the efficiency of our approach with previous works and show its superiority for the prediction of the dynamics of discussions.

Skin detection is the process of discriminating skin and non-skin regions in a digital image and it is widely used in several applications ranging from hand gesture analysis to track body parts and face detection. Skin detection is a challenging problem which has drawn extensive attention from the research community, nevertheless a fair comparison among approaches is very difficult due to the lack of a common benchmark and a unified testing protocol. In this work, we investigate the most recent researches in this field and we propose a fair comparison among approaches using several different datasets. The major contributions of this work are an exhaustive literature review of skin color detection approaches, a framework to evaluate and combine different skin detector approaches, whose source code is made freely available for future research, and an extensive experimental comparison among several recent methods which have also been used to define an ensemble that works well in many different problems. Experiments are carried out in 10 different datasets including more than 10000 labelled images: experimental results confirm that the best method here proposed obtains a very good performance with respect to other stand-alone approaches, without requiring ad hoc parameter tuning. A MATLAB version of the framework for testing and of the methods proposed in this paper will be freely available from https://github.com/LorisNanni

We give the first polynomial-time algorithm for performing linear or polynomial regression resilient to adversarial corruptions in both examples and labels.

Given a sufficiently large (polynomial-size) training set drawn i.i.d. from distribution D and subsequently corrupted on some fraction of points, our algorithm outputs a linear function whose squared error is close to the squared error of the best-fitting linear function with respect to D, assuming that the marginal distribution of D over the input space is \emph{certifiably hypercontractive}. This natural property is satisfied by many well-studied distributions such as Gaussian, strongly log-concave distributions and, uniform distribution on the hypercube among others. We also give a simple statistical lower bound showing that some distributional assumption is necessary to succeed in this setting.

These results are the first of their kind and were not known to be even information-theoretically possible prior to our work.

Our approach is based on the sum-of-squares (SoS) method and is inspired by the recent applications of the method for parameter recovery problems in unsupervised learning. Our algorithm can be seen as a natural convex relaxation of the following conceptually simple non-convex optimization problem: find a linear function and a large subset of the input corrupted sample such that the least squares loss of the function over the subset is minimized over all possible large subsets.

Reviews of products or services on Internet marketplace websites contain a rich amount of information. Users often wish to survey reviews or review snippets from the perspective of a certain aspect, which has resulted in a large body of work on aspect identification and extraction from such corpora. In this work, we evaluate a newly-proposed neural model for aspect extraction on two practical tasks. The first is to extract canonical sentences of various aspects from reviews, and is judged by human evaluators against alternatives. A $k$-means baseline does remarkably well in this setting. The second experiment focuses on the suitability of the recovered aspect distributions to represent users by the reviews they have written. Through a set of review reranking experiments, we find that aspect-based profiles can largely capture notions of user preferences, by showing that divergent users generate markedly different review rankings.

Academic performance is perceived as a product of complex interactions between students' overall experience, personal characteristics and upbringing. Data science techniques, most commonly involving regression analysis and related approaches, serve as a viable means to explore this interplay. However, these tend to extract factors with wide-ranging impact, while overlooking variations specific to individual students. Focusing on each student's peculiarities is generally impossible with thousands or even hundreds of subjects, yet data mining methods might prove effective in devising more targeted approaches. For instance, subjects with shared characteristics can be assigned to clusters, which can then be examined separately with machine learning algorithms, thereby providing a more nuanced view of the factors affecting individuals in a particular group. In this context, we introduce a data science workflow allowing for fine-grained analysis of academic performance correlates that captures the subtle differences in students' sensitivities to these factors. Leveraging the Local Interpretable Model-Agnostic Explanations (LIME) algorithm from the toolbox of Explainable Artificial Intelligence (XAI) techniques, the proposed pipeline yields groups of students having similar academic attainment indicators, rather than similar features (e.g. familial background) as typically practiced in prior studies. As a proof-of-concept case study, a rich longitudinal dataset is selected to evaluate the effectiveness of the proposed approach versus a standard regression model.

Compressed Sensing using $\ell_1$ regularization is among the most powerful and popular sparsification technique in many applications, but why has it not been used to obtain sparse deep learning model such as convolutional neural network (CNN)? This paper is aimed to provide an answer to this question and to show how to make it work. We first demonstrate that the commonly used stochastic gradient decent (SGD) and variants training algorithm is not an appropriate match with $\ell_1$ regularization and then replace it with a different training algorithm based on a regularized dual averaging (RDA) method. RDA was originally designed specifically for convex problem, but with new theoretical insight and algorithmic modifications (using proper initialization and adaptivity), we have made it an effective match with $\ell_1$ regularization to achieve a state-of-the-art sparsity for CNN compared to other weight pruning methods without compromising accuracy (achieving 95\% sparsity for ResNet18 on CIFAR-10, for example).

The fifth generation of cellular networks (5G) will rely on edge cloud deployments to satisfy the ultra-low latency demand of future applications. In this paper, we argue that such deployments can also be used to enable advanced data-driven and Machine Learning (ML) applications in mobile networks. We propose an edge-controller-based architecture for cellular networks and evaluate its performance with real data from hundreds of base stations of a major U.S. operator. In this regard, we will provide insights on how to dynamically cluster and associate base stations and controllers, according to the global mobility patterns of the users. Then, we will describe how the controllers can be used to run ML algorithms to predict the number of users in each base station, and a use case in which these predictions are exploited by a higher-layer application to route vehicular traffic according to network Key Performance Indicators (KPIs). We show that the prediction accuracy improves when based on machine learning algorithms that rely on the controllers' view and, consequently, on the spatial correlation introduced by the user mobility, with respect to when the prediction is based only on the local data of each single base station.

In the Shortest Common Superstring (SCS) problem, one is given a collection of strings, and needs to find a shortest string containing each of them as a substring. SCS admits $2\frac{11}{23}$-approximation in polynomial time (Mucha, SODA'13). While this algorithm and its analysis are technically involved, the 30 years old Greedy Conjecture claims that the trivial and efficient Greedy Algorithm gives a 2-approximation for SCS.

We develop a graph-theoretic framework for studying approximation algorithms for SCS. The framework is reminiscent of the classical 2-approximation for Traveling Salesman: take two copies of an optimal solution, apply a trivial edge-collapsing procedure, and get an approximate solution. In this framework, we observe two surprising properties of SCS solutions, and we conjecture that they hold for all input instances. The first conjecture, that we call Collapsing Superstring conjecture, claims that there is an elementary way to transform any solution repeated twice into the same graph $G$. This conjecture would give an elementary 2-approximate algorithm for SCS. The second conjecture claims that not only the resulting graph $G$ is the same for all solutions, but that $G$ can be computed by an elementary greedy procedure called Greedy Hierarchical Algorithm.

While the second conjecture clearly implies the first one, perhaps surprisingly we prove their equivalence. We support these equivalent conjectures by giving a proof for the special case where all input strings have length at most 3. We prove that the standard Greedy Conjecture implies Greedy Hierarchical Conjecture, while the latter is sufficient for an efficient greedy 2-approximate approximation of SCS. Except for its (conjectured) good approximation ratio, the Greedy Hierarchical Algorithm provably finds a 3.5-approximation.

We prove that a minor-closed class of graphs has bounded layered pathwidth if and only if some apex-forest is not in the class. This generalises a theorem of Robertson and Seymour, which says that a minor-closed class of graphs has bounded pathwidth if and only if some forest is not in the class.

Online forums provide rich environments where users may post questions and comments about different topics. Understanding how people behave in online forums may shed light on the fundamental mechanisms by which collective thinking emerges in a group of individuals, but it has also important practical applications, for instance to improve user experience, increase engagement or automatically identify bullying. Importantly, the datasets generated by the activity of the users are often openly available for researchers, in contrast to other sources of data in computational social science. In this survey, we map the main research directions that arose in recent years and focus primarily on the most popular platform, Reddit. We distinguish and categorise research depending on their focus on the posts or on the users, and point to different types of methodologies to extract information from the structure and dynamics of the system. We emphasize the diversity and richness of the research in terms of questions and methods, and suggest future avenues of research.

Periodic event-triggered control (PETC) evaluates the triggering rule periodically and is well-suited for implementation on digital platforms. This paper investigates PETC design for nonlinear systems affected by external disturbances under the impulsive system formulation. Sufficient conditions are provided to ensure the input-to-state stability of the resulting closed-loop system for the state feedback and the observer-based output feedback configurations separately. For each configuration, the sampling period and the triggering functions are provided explicitly. Sufficient conditions in the form of linear matrix inequalities are provided for the PETC design of incrementally quadratic nonlinear systems. Two examples are given to illustrate the effectiveness of the proposed method.

Due to the complexity of modeling the elastic properties of materials, the use of machine learning algorithms is continuously increasing for tactile sensing applications. Recent advances in deep neural networks applied to computer vision make vision-based tactile sensors very appealing for their high-resolution and low cost. A soft optical tactile sensor that is scalable to large surfaces with arbitrary shape is discussed in this paper. A supervised learning algorithm trains a model that is able to reconstruct the normal force distribution on the sensor's surface, purely from the images recorded by an internal camera. In order to reduce the training times and the need for large datasets, a calibration procedure is proposed to transfer the acquired knowledge across multiple sensors while maintaining satisfactory performance.

A simple method to produce a random order type is to take the order type of a random point set. We conjecture that many probability distributions on order types defined in this way are heavily concentrated and therefore sample inefficiently the space of order types. We present two results on this question. First, we study experimentally the bias in the order types of $n$ random points chosen uniformly and independently in a square, for $n$ up to $16$. Second, we study algorithms for determining the order type of a point set in terms of the number of coordinate bits they require to know. We give an algorithm that requires on average $4n \log\_2 n+O(n)$ bits to determine the order type of $P$, and show that any algorithm requires at least $4n \log\_2 n - O(n \log\log n)$ bits. This implies that the concentration conjecture cannot be proven by an "efficient encoding" argument.

We develop new methods for the stabilization (stability analysis) of a linear system with general time-varying distributed delays existing at the system's states, inputs and outputs. In contrast to most existing results where the time-varying delay is assumed to be continuous, we assume the delay function to be bounded and measurable. Furthermore, the distributed delay kernels of our system's model can be any $\fL^2$ function over a bounded interval, where the kernels are handled directly by using a decomposition scenario without using approximations of any kind. By constructing a Krasovski\u{i} functional via the application of a novel integral inequality, sufficient conditions for the existence of a dissipative state feedback controller are derived in terms of matrix inequalities without utilizing the reciprocally convex combination lemmas. The proposed synthesis (stability) conditions, which take dissipativity into account, can be either solved directly by a standard numerical solver of semidefinite programming if they are convex, or reshaped into linear matrix inequalities, or solved via a proposed iterative algorithm. To the best of our knowledge, no existing methods can handle the synthesis problem investigated in this paper. Finally, numerical examples are presented to demonstrate the effectiveness of the proposed methodologies.

Deep Convolutional Neural Networks (DCNNs) are used extensively in medical image segmentation and hence 3D navigation for robot-assisted Minimally Invasive Surgeries (MISs). However, current DCNNs usually use down sampling layers for increasing the receptive field and gaining abstract semantic information. These down sampling layers decrease the spatial dimension of feature maps, which can be detrimental to image segmentation. Atrous convolution is an alternative for the down sampling layer. It increases the receptive field whilst maintains the spatial dimension of feature maps. In this paper, a method for effective atrous rate setting is proposed to achieve the largest and fully-covered receptive field with a minimum number of atrous convolutional layers. Furthermore, a new and full resolution DCNN - Atrous Convolutional Neural Network (ACNN), which incorporates cascaded atrous II-blocks, residual learning and Instance Normalization (IN) is proposed. Application results of the proposed ACNN to Magnetic Resonance Imaging (MRI) and Computed Tomography (CT) image segmentation demonstrate that the proposed ACNN can achieve higher segmentation Intersection over Unions (IoUs) than U-Net and Deeplabv3+, but with reduced trainable parameters.

This articles first investigates boundary integral operators for the three-dimensional isotropic linear elasticity of a biphasic model with piecewise constant Lam\'e coefficients in the form of a bounded domain of arbitrary shape surrounded by a background material. In the simple case of a spherical inclusion, the vector spherical harmonics consist of eigenfunctions of the single and double layer boundary operators and we provide their spectra. Further, in the case of many spherical inclusions with isotropic materials, each with its own set of Lam\'e parameters, we propose an integral equation and a subsequent Galerkin discretization using the vector spherical harmonics and apply the discretization to several numerical test cases.

We examine some combinatorial properties of parallel cut elimination in multiplicative linear logic (MLL) proof nets. We show that, provided we impose a constraint on some paths, we can bound the size of all the nets satisfying this constraint and reducing to a fixed resultant net. This result gives a sufficient condition for an infinite weighted sum of nets to reduce into another sum of nets, while keeping coefficients finite. We moreover show that our constraints are stable under reduction.

Our approach is motivated by the quantitative semantics of linear logic: many models have been proposed, whose structure reflect the Taylor expansion of multiplicative exponential linear logic (MELL) proof nets into infinite sums of differential nets. In order to simulate one cut elimination step in MELL, it is necessary to reduce an arbitrary number of cuts in the differential nets of its Taylor expansion. It turns out our results apply to differential nets, because their cut elimination is essentially multiplicative. We moreover show that the set of differential nets that occur in the Taylor expansion of an MELL net automatically satisfies our constraints.

Interestingly, our nets are untyped: we only rely on the sequentiality of linear logic nets and the dynamics of cut elimination. The paths on which we impose bounds are the switching paths involved in the Danos--Regnier criterion for sequentiality. In order to accommodate multiplicative units and weakenings, our nets come equipped with jumps: each weakening node is connected to some other node. Our constraint can then be summed up as a bound on both the length of switching paths, and the number of weakenings that jump to a common node.

We prove a theorem concerning the approximation of bandlimited multivariate functions by deep ReLU networks for which the curse of the dimensionality is overcome. Our theorem is based on a result by Maurey and on the ability of deep ReLU networks to approximate Chebyshev polynomials and analytic functions efficiently.

The standard solution concept for stochastic games is Markov perfect equilibrium (MPE); however, its computation becomes intractable as the number of players increases. Instead, we consider mean field equilibrium (MFE) that has been popularized in the recent literature. MFE takes advantage of averaging effects in models with a large number of players. We make three main contributions. First, our main result provides conditions that ensure the uniqueness of an MFE. We believe this uniqueness result is the first of its nature in the class of models we study. Second, we generalize previous MFE existence results. Third, we provide general comparative statics results. We apply our results to dynamic oligopoly models and to heterogeneous agent macroeconomic models commonly used in previous work in economics and operations.

It has long been recognized as a difficult problem to determine whether the observed statistical correlation between two classical variables arise from causality or from common causes. Recent research has shown that in quantum theoretical framework, the mechanisms of entanglement and quantum coherence provide an advantage in tackling this problem. In some particular cases, quantum common causes and quantum causality can be effectively distinguished using observations only. However, these solutions do not apply to all cases. There still exist enormous cases in which quantum common causes and quantum causality can not be distinguished. In this paper, along the line of considering unitary transformation as causality in the quantum world, we formally show quantum common causes and quantum causality are universally separable. Based on the analysis, we further provide a general method to discriminate the two.

We consider a linear elliptic partial differential equation (PDE) with a generic uniformly bounded parametric coefficient. The solution to this PDE problem is approximated in the framework of stochastic Galerkin finite element methods. We perform a posteriori error analysis of Galerkin approximations and derive a reliable and efficient estimate for the energy error in these approximations. Practical versions of this error estimate are discussed and tested numerically for a model problem with non-affine parametric representation of the coefficient. Furthermore, we use the error reduction indicators derived from spatial and parametric error estimators to guide an adaptive solution algorithm for the given parametric PDE problem. The performance of the adaptive algorithm is tested numerically for model problems with two different non-affine parametric representations of the coefficient.

In big cloud structures or large data structures, fog computing could be interpreted, referring critically to the growing issues and problems in accessing the information among the Internet of things (IoT) devices. Fog computing can be used to compute, store, control and connect smart devices to each other. IoT is an architecture of uniquely identified interrelated physical things, these physical things are able to communicate with each other and can transmit and receive information. This research presents a framework of the combination of the Internet of Things (IoT) and Fog computing. The blockchain is also the emerging technology that provides a hyper, distributed, public, authentic ledger to record the transactions. Blockchains technology is a secure technology that can be a great benefit to the next generation computing. The confluence of fog, blockchains, and IoT in this area introduces a new incentive. In this research work, the author mentions the convergence of blockchain, fog and IoT technological innovations to present an effective communication framework. The framework is implemented and tested using different scenarios.

We study the quantum complexity of time evolution in large-$N$ chaotic systems, with the SYK model as our main example. This complexity is expected to increase linearly for exponential time prior to saturating at its maximum value, and is related to the length of minimal geodesics on the manifold of unitary operators that act on Hilbert space. Using the Euler-Arnold formalism, we demonstrate that there is always a geodesic between the identity and the time evolution operator $e^{-iHt}$ whose length grows linearly with time. This geodesic is minimal until there is an obstruction to its minimality, after which it can fail to be a minimum either locally or globally. We identify a criterion - the Eigenstate Complexity Hypothesis (ECH) - which bounds the overlap between off-diagonal energy eigenstate projectors and the $k$-local operators of the theory, and use it to show that the linear geodesic will at least be a local minimum for exponential time. We show numerically that the large-$N$ SYK model (which is chaotic) satisfies ECH and thus has no local obstructions to linear growth of complexity for exponential time, as expected from holographic duality. In contrast, we also study the case with $N=2$ fermions (which is integrable) and find short-time linear complexity growth followed by oscillations. Our analysis relates complexity to familiar properties of physical theories like their spectra and the structure of energy eigenstates and has implications for the hypothesized computational complexity class separations PSPACE $\nsubseteq$ BQP/poly and PSPACE $\nsubseteq$ BQSUBEXP/subexp, and the "fast-forwarding" of quantum Hamiltonians.

In this paper, we propose a practical structured constellation for non-coherent communication with a single transmit antenna over Rayleigh flat and block fading channel without instantaneous channel state information. The constellation symbols belong to the Grassmannian of lines and are defined up to a complex scaling. The constellation is generated by partitioning the Grassmannian of lines into a collection of bent hypercubes and defining a mapping onto each of these bent hypercubes such that the resulting symbols are approximately uniformly distributed on the Grassmannian. With a reasonable choice of parameters, this so-called cube-split constellation has higher packing efficiency, represented by the minimum distance, than the existing structured constellations. Furthermore, exploiting the constellation structure, we propose low-complexity greedy symbol decoder and log-likelihood ratio computation, as well as an efficient way to associate it to a multilevel code with multistage decoding. Numerical results show that the performance of the cube-split constellation is close to that of a numerically optimized constellation and better than other structured constellations. It also outperforms a coherent pilot-based scheme in terms of error probability and achievable data rate in the regime of short coherence time and large constellation size.

A popular run-time attack technique is to compromise the control-flow integrity of a program by modifying function return addresses on the stack. So far, shadow stacks have proven to be essential for comprehensively preventing return address manipulation. Shadow stacks record return addresses in integrity-protected memory secured with hardware-assistance or software access control. Software shadow stacks incur high overheads or trade off security for efficiency. Hardwareassisted shadow stacks are efficient and secure, but require the deployment of special-purpose hardware.

We present authenticated call stack (ACS), an approach that uses chained message authentication codes (MACs). Our prototype, PACStack , uses the ARMv8.3-A general purpose hardware mechanism for pointer authentication (PA) to implement ACS. Via a rigorous security analysis, we show that PACStack achieves security comparable to hardware-assisted shadow stacks without requiring dedicated hardware. We demonstrate that PACStack's performance overhead is small (~3%).

We consider the non-coherent single-input multiple-output (SIMO) multiple access channel with general signaling under spatially correlated Rayleigh block fading. We propose a novel soft-output multi-user detector that computes an approximate marginal posterior of each transmitted signal using only the knowledge about the channel distribution. Our detector is based on expectation propagation (EP) approximate inference and has polynomial complexity in the number of users, the number of receive antennas, and channel coherence time. We also propose two simplifications of this detector with reduced complexity. With Grassmannian signaling, the proposed detectors outperform a state-of-the-art non-coherent detector with projection-based interference mitigation. With pilot-assisted signaling, the EP detector outperforms, in terms of symbol error rate, some conventional coherent pilot-based detectors, including a sphere decoder and a joint channel estimation-data detection scheme. Our EP-based detectors produce accurate approximates of the true posterior leading to high achievable sum-rates. The gains of these detectors are further observed in terms of the bit error rate when using their soft outputs for a turbo channel decoder.

Neural Networks (NNs) have been extensively used for a wide spectrum of real-world regression tasks, where the goal is to predict a numerical outcome such as revenue, effectiveness, or a quantitative result. In many such tasks, the point prediction is not enough: the uncertainty (i.e. risk or confidence) of that prediction must also be estimated. Standard NNs, which are most often used in such tasks, do not provide uncertainty information. Existing approaches address this issue by combining Bayesian models with NNs, but these models are hard to implement, more expensive to train, and usually do not predict as accurately as standard NNs. In this paper, a new framework (RIO) is developed that makes it possible to estimate uncertainty in any pretrained standard NN. The behavior of the NN is captured by modeling its prediction residuals with a Gaussian Process, whose kernel includes both the NN's input and its output. The framework is evaluated in twelve real-world datasets, where it is found to (1) provide reliable estimates of uncertainty, (2) reduce the error of the point predictions, and (3) scale well to large datasets. Given that RIO can be applied to any standard NN without modifications to model architecture or training pipeline, it provides an important ingredient for building real-world NN applications.

We show that the number of length-n words over a k-letter alphabet having no even palindromic prefix is the same as the number of length-n unbordered words, by constructing an explicit bijection between the two sets. A slightly different but analogous result holds for those words having no odd palindromic prefix. Using known results on borders, we get an asymptotic enumeration for the number of words having no even (resp., odd) palindromic prefix . We obtain an analogous result for words having no nontrivial palindromic prefix. Finally, we obtain similar results for words having no square prefix, thus proving a 2013 conjecture of Chaffin, Linderman, Sloane, and Wilks.

Noise modeling lies in the heart of many image processing tasks. However, existing deep learning methods for noise modeling generally require clean and noisy image pairs for model training; these image pairs are difficult to obtain in many realistic scenarios. To ameliorate this problem, we propose a self-consistent GAN (SCGAN), that can directly extract noise maps from noisy images, thus enabling unsupervised noise modeling. In particular, the SCGAN introduces three novel self-consistent constraints that are complementary to one another, viz.: the noise model should produce a zero response over a clean input; the noise model should return the same output when fed with a specific pure noise input; and the noise model also should re-extract a pure noise map if the map is added to a clean image. These three constraints are simple yet effective. They jointly facilitate unsupervised learning of a noise model for various noise types. To demonstrate its wide applicability, we deploy the SCGAN on three image processing tasks including blind image denoising, rain streak removal, and noisy image super-resolution. The results demonstrate the effectiveness and superiority of our method over the state-of-the-art methods on a variety of benchmark datasets, even though the noise types vary significantly and paired clean images are not available.

State-of-the-art simulations of detailed neural models follow the Bulk Synchronous Parallel execution model. Execution is divided in equidistant communication intervals, equivalent to the shortest synaptic delay in the network. Neurons stepping is performed independently, with collective communication guiding synchronization and exchange of synaptic events.

The interpolation step size is fixed and chosen based on some prior knowledge of the fastest possible dynamics in the system. However, simulations driven by stiff dynamics or a wide range of time scales - such as multiscale simulations of neural networks - struggle with fixed step interpolation methods, yielding excessive computation of intervals of quasi-constant activity, inaccurate interpolation of periods of high volatility solution, and being incapable of handling unknown or distinct time constants. A common alternative is the usage of adaptive stepping methods, however they have been deemed inefficient in parallel executions due to computational load imbalance at the synchronization barriers that characterize the BSP execution model.

We introduce a distributed fully-asynchronous execution model that removes global synchronization, allowing for longer variable timestep interpolations. Asynchronicity is provided by active point-to-point communication notifying neurons' time advancement to synaptic connectivities. Time stepping is driven by scheduled neuron advancements based on synaptic delays across neurons, yielding an "exhaustive yet not speculative" adaptive-step execution. Execution benchmarks on 64 Cray XE6 compute nodes demonstrate a reduced number of interpolation steps, higher numerical accuracy and lower time to solution, compared to state-of-the-art methods. Efficiency is shown to be activity-dependent, with scaling of the algorithm demonstrated on a simulation of a laboratory experiment.

We consider the problem of completely covering an unknown discrete environment with a swarm of asynchronous, frequently-crashing autonomous mobile robots. We represent the environment by a discrete graph, and task the robots with occupying every vertex and with constructing an implicit distributed spanning tree of the graph. The robotic agents activate independently at random exponential waiting times of mean $1$ and enter the graph environment over time from a source location. They grow the environment's coverage by 'settling' at empty locations and aiding other robots' navigation from these locations. The robots are identical and make decisions driven by the same simple and local rule of behaviour. The local rule is based only on the presence of neighbouring robots, and on whether a settled robot points to the current location. Whenever a robot moves, it may crash and disappear from the environment. Each vertex in the environment has limited physical space, so robots frequently obstruct each other.

Our goal is to show that even under conditions of asynchronicity, frequent crashing, and limited physical space, the simple mobile robots complete their mission in linear time asymptotically almost surely, and time to completion degrades gracefully with the frequency of the crashes. Our model and analysis are based on the well-studied "totally asymmetric simple exclusion process" in statistical mechanics.

Overlap between treatment groups is required for non-parametric estimation of causal effects. If a subgroup of subjects always receives the same intervention, we cannot estimate the effect of intervention changes on that subgroup without further assumptions. When overlap does not hold globally, characterizing local regions of overlap can inform the relevance of causal conclusions for new subjects, and can help guide additional data collection. To have impact, these descriptions must be interpretable for downstream users who are not machine learning experts, such as policy makers. We formalize overlap estimation as a problem of finding minimum volume sets subject to coverage constraints and reduce this problem to binary classification with Boolean rule classifiers. We then generalize this method to estimate overlap in off-policy policy evaluation. In several real-world applications, we demonstrate that these rules have comparable accuracy to black-box estimators and provide intuitive and informative explanations that can inform policy making.

In this paper we show that a large class of parallel server networks, with $\sqrt{n}$ safety staffing, and no abandonment, in the Halfin-Whitt regime are exponentially ergodic and their invariant probability distributions are tight, uniformly over all stationary Markov controls. This class consists of all networks of tree topology with a single non-leaf server pool, such as the 'N' and 'M' models, as well as networks of any tree topology with class-dependent service rates.

We first define a parameter which characterizes the spare capacity (safety staffing) of the network. If the spare capacity parameter is negative, we show that the controlled diffusion is transient under any stationary Markov control, and that it cannot be positive recurrent when this parameter is zero. Then we show that the limiting diffusion is uniformly exponentially ergodic over all stationary Markov controls if this parameter is positive.

As well known, joint work conservation, that is, keeping all servers busy unless all queues are empty, cannot be always enforced in multiclass multi-pool networks, and as a result the diffusion limit and the prelimit do not "match" on the entire state space. For this reason, we introduce the concept of "system-wide work conserving policies," which are defined as policies that minimize the number of idle servers at all times. We show that, provided the spare capacity parameter is positive, the diffusion-scaled processes are geometrically ergodic and the invariant distributions are tight, uniformly over all system-wide work conserving policies. In addition, when the spare capacity is negative we show that the diffusion-scaled processes are transient under any stationary Markov control, and when it is zero, they cannot be positive recurrent. We use a unified approach in which the same Lyapunov function is used in the study of the prelimit and diffusion limit.

In much of the literature on function approximation by deep networks, the function is assumed to be defined on some known domain, such as a cube or a sphere. In practice, the data might not be dense on these domains, and therefore, the approximation theory results are observed to be too conservative. In manifold learning, one assumes instead that the data is sampled from an unknown manifold; i.e., the manifold is defined by the data itself. Function approximation on this unknown manifold is then a two stage procedure: first, one approximates the Laplace-Beltrami operator (and its eigen-decomposition) on this manifold using a graph Laplacian, and next, approximates the target function using the eigen-functions. Alternatively, one estimates first some atlas on the manifold and then uses local approximation techniques based on the local coordinate charts.

In this paper, we propose a more direct approach to function approximation on \emph{unknown}, data defined manifolds without computing the eigen-decomposition of some operator or an atlas for the manifold, and estimate the degree of approximation. Our constructions are universal; i.e., do not require the knowledge of any prior on the target function other than continuity on the manifold. For smooth functions, the estimates do not suffer from the so-called saturation phenomenon. We demonstrate via a property called good propagation of errors how the results can be lifted for function approximation using deep networks where each channel evaluates a Gaussian network on a possibly unknown manifold.

We propose a numerical method for solving high dimensional fully nonlinear partial differential equations (PDEs). Our algorithm estimates simultaneously by backward time induction the solution and its gradient by multi-layer neural networks, while the Hessian is approximated by automatic differentiation of the gradient at previous step. This methodology extends to the fully nonlinear case the approach recently proposed in \cite{HPW19} for semi-linear PDEs. Numerical tests illustrate the performance and accuracy of our method on several examples in high dimension with nonlinearity on the Hessian term including a linear quadratic control problem with control on the diffusion coefficient, Monge-Amp{\e}re equation and Hamilton-Jacobi-Bellman equation in portfolio optimization.

Reinforcement learning can interact with the environment and is suitable for applications in decision control systems. Therefore, we used the reinforcement learning method to establish a foreign exchange transaction, avoiding the long-standing problem of unstable trends in deep learning predictions. In the system design, we optimized the Sure-Fire statistical arbitrage policy, set three different actions, encoded the continuous price over a period of time into a heat-map view of the Gramian Angular Field (GAF) and compared the Deep Q Learning (DQN) and Proximal Policy Optimization (PPO) algorithms. To test feasibility, we analyzed three currency pairs, namely EUR/USD, GBP/USD, and AUD/USD. We trained the data in units of four hours from 1 August 2018 to 30 November 2018 and tested model performance using data between 1 December 2018 and 31 December 2018. The test results of the various models indicated that favorable investment performance was achieved as long as the model was able to handle complex and random processes and the state was able to describe the environment, validating the feasibility of reinforcement learning in the development of trading strategies.

In this work, we investigate a variational formulation for a time-fractional Fokker-Planck equation which arises in the study of complex physical systems involving anomalously slow diffusion. The model involves a fractional-order Caputo derivative in time, and thus inherently nonlocal. The study follows the Wasserstein gradient flow approach pioneered by [26]. We propose a JKO type scheme for discretizing the model, using the L1 scheme for the Caputo fractional derivative in time, and establish the convergence of the scheme as the time step size tends to zero. Illustrative numerical results in one- and two-dimensional problems are also presented to show the approach.

Pretrained language models are promising particularly for low-resource languages as they only require unlabelled data. However, training existing models requires huge amounts of compute, while pretrained cross-lingual models often underperform on low-resource languages. We propose Multi-lingual language model Fine-Tuning (MultiFiT) to enable practitioners to train and fine-tune language models efficiently in their own language. In addition, we propose a zero-shot method using an existing pretrained cross-lingual model. We evaluate our methods on two widely used cross-lingual classification datasets where they outperform models pretrained on orders of magnitude more data and compute. We release all models and code.

This paper presents the design and performance of a screw-propelled redundant serpentine robot. This robot comprises serially linked, identical modules, each incorporating an Archimedes' screw for propulsion and a universal joint (U-Joint) for orientation control. When serially chained, these modules form a versatile snake robot platform which enables the robot to reshape its body configuration for varying environments and gait patterns that would be typical of snake movement. Furthermore, the Archimedes' screws allow for novel omni-wheel drive-like motions by speed controlling their screw threads. This paper considers the mechanical and electrical design, as well as the software architecture for realizing a fully integrated system. The system includes 3$N$ actuators for $N$ segments, each controlled using a BeagleBone Black with a customized power-electronics cape, a 9 Degrees of Freedom (DoF) Inertial Measurement Unit (IMU), and a scalable communication channel over ROS. The intended application for this robot is its use as an instrumentation mobility platform on terrestrial planets where the terrain may involve vents, caves, ice, and rocky surfaces. Additional experiments are shown on our website.

This paper introduces a new nonlinear observer for state estimation of linear time invariant systems. The proposed observer contains a (nonlinear) cubic term in its error dynamics.

"For the final version of this article, please refer to the published version in Int. Jr. of Adaptive Control and Signal Processing. The differences between the published and arxiv version are substantial and significant"

We present a framework for data-driven robotics that makes use of a large dataset of recorded robot experience and scales to several tasks using learned reward functions. We show how to apply this framework to accomplish three different object manipulation tasks on a real robot platform. Given demonstrations of a task together with task-agnostic recorded experience, we use a special form of human annotation as supervision to learn a reward function, which enables us to deal with real-world tasks where the reward signal cannot be acquired directly. Learned rewards are used in combination with a large dataset of experience from different tasks to learn a robot policy offline using batch RL. We show that using our approach it is possible to train agents to perform a variety of challenging manipulation tasks including stacking rigid objects and handling cloth.

This paper presents for the first time, to our knowledge, a framework for verifying neural network behavior in power system applications. Up to this moment, neural networks have been applied in power systems as a black-box; this has presented a major barrier for their adoption in practice. Developing a rigorous framework based on mixed integer linear programming, our methods can determine the range of inputs that neural networks classify as safe or unsafe, and are able to systematically identify adversarial examples. Such methods have the potential to build the missing trust of power system operators on neural networks, and unlock a series of new applications in power systems. This paper presents the framework, methods to assess and improve neural network robustness in power systems, and addresses concerns related to scalability and accuracy. We demonstrate our methods on the IEEE 9-bus, 14-bus, and 162-bus systems, treating both N-1 security and small-signal stability.

The distributional perspective on reinforcement learning (RL) has given rise to a series of successful Q-learning algorithms, resulting in state-of-the-art performance in arcade game environments. However, it has not yet been analyzed how these findings from a discrete setting translate to complex practical applications characterized by noisy, high dimensional and continuous state-action spaces. In this work, we propose Quantile QT-Opt (Q2-Opt), a distributional variant of the recently introduced distributed Q-learning algorithm for continuous domains, and examine its behaviour in a series of simulated and real vision-based robotic grasping tasks. The absence of an actor in Q2-Opt allows us to directly draw a parallel to the previous discrete experiments in the literature without the additional complexities induced by an actor-critic architecture. We demonstrate that Q2-Opt achieves a superior vision-based object grasping success rate, while also being more sample efficient. The distributional formulation also allows us to experiment with various risk distortion metrics that give us an indication of how robots can concretely manage risk in practice using a Deep RL control policy. As an additional contribution, we perform batch RL experiments in our virtual environment and compare them with the latest findings from discrete settings. Surprisingly, we find that the previous batch RL findings from the literature obtained on arcade game environments do not generalise to our setup.

We study the vertex classification problem on a graph whose vertices are in $k\ (k\geq 2)$ different communities, edges are only allowed between distinct communities, and the number of vertices in different communities are not necessarily equal. The observation is a weighted adjacency matrix, perturbed by a scalar multiple of the Gaussian Orthogonal Ensemble (GOE), or Gaussian Unitary Ensemble (GUE) matrix. For the exact recovery of the maximum likelihood estimation (MLE) with various weighted adjacency matrices, we prove sharp thresholds of the intensity $\sigma$ of the Gaussian perturbation. These weighted adjacency matrices may be considered as natural models for the electric network. Surprisingly, these thresholds of $\sigma$ do not depend on whether the sample space for MLE is restricted to such classifications that the number of vertices in each group is equal to the true value. In contrast to the $\ZZ_2$-synchronization, a new complex version of the semi-definite programming (SDP) is designed to efficiently implement the community detection problem when the number of communities $k$ is greater than 2, and a common region (independent of $k$) for $\sigma$ such that SDP exactly recovers the true classification is obtained.

Bayesian optimization provides sample-efficient global optimization for a broad range of applications, including automatic machine learning, engineering, physics, and experimental design. We introduce BoTorch, a modern programming framework for Bayesian optimization that combines Monte-Carlo (MC) acquisition functions, a novel sample average approximation optimization approach, auto-differentiation, and variance reduction techniques. BoTorch's modular design facilitates flexible specification and optimization of probabilistic models written in PyTorch, simplifying implementation of new acquisition functions. Our approach is backed by novel theoretical convergence results and made practical by a distinctive algorithmic foundation that leverages fast predictive distributions, hardware acceleration, and deterministic optimization. In experiments, we demonstrate the improved sample efficiency of BoTorch relative to other popular libraries.

The semantic image segmentation task consists of classifying each pixel of an image into an instance, where each instance corresponds to a class. This task is a part of the concept of scene understanding or better explaining the global context of an image. In the medical image analysis domain, image segmentation can be used for image-guided interventions, radiotherapy, or improved radiological diagnostics. In this review, we categorize the leading deep learning-based medical and non-medical image segmentation solutions into six main groups of deep architectural, data synthesis-based, loss function-based, sequenced models, weakly supervised, and multi-task methods and provide a comprehensive review of the contributions in each of these groups. Further, for each group, we analyze each variant of these groups and discuss the limitations of the current approaches and present potential future research directions for semantic image segmentation.

Attackers can access sensitive information of programs by exploiting the side-effects of speculatively-executed instructions using Spectre attacks. To mitigate theses attacks, major compilers deployed a wide range of countermeasures. However, the security of these countermeasures has not been ascertained yet: while some of them are believed to be secure, others are known to be insecure and result in vulnerable programs. In this paper, we formally prove the security (or insecurity) of compiler-level countermeasures for Spectre. To prove security, we introduce a novel, general methodology built upon recent secure compilation theory. We use this theory to derive secure compilation criteria formalizing when a compiler produces secure code against Spectre attacks. With these criteria, we formally prove that some countermeasures, such as speculative load hardening (SLH), are vulnerable to Spectre attacks and others (like speculation-barriers-insertion and variations of SLH) are secure. This work provides sound foundations to formally reason about the security of compiler-level countermeasures against Spectre attacks as well as the first proofs of security and insecurity of said countermeasures.

Vanilla CNNs, as uncalibrated classifiers, suffer from classifying out-of-distribution (OOD) samples nearly as confidently as in-distribution samples. To tackle this challenge, some recent works have demonstrated the gains of leveraging available OOD sets for training end-to-end calibrated CNNs. However, a critical question remains unanswered in these works: how to differentiate OOD sets for selecting the most effective one(s) that induce training such CNNs with high detection rates on unseen OOD sets? To address this pivotal question, we provide a criterion based on generalization errors of Augmented-CNN, a vanilla CNN with an added extra class employed for rejection, on in-distribution and unseen OOD sets. However, selecting the most effective OOD set by directly optimizing this criterion incurs a huge computational cost. Instead, we propose three novel computationally-efficient metrics for differentiating between OOD sets according to their "protection" level of in-distribution sub-manifolds. We empirically verify that the most protective OOD sets -- selected according to our metrics -- lead to A-CNNs with significantly lower generalization errors than the A-CNNs trained on the least protective ones. We also empirically show the effectiveness of a protective OOD set for training well-generalized confidence-calibrated vanilla CNNs. These results confirm that 1) all OOD sets are not equally effective for training well-performing end-to-end models (i.e., A-CNNs and calibrated CNNs) for OOD detection tasks and 2) the protection level of OOD sets is a viable factor for recognizing the most effective one. Finally, across the image classification tasks, we exhibit A-CNN trained on the most protective OOD set can also detect black-box FGS adversarial examples as their distance (measured by our metrics) is becoming larger from the protected sub-manifolds.

In this paper, we present a nonlinear robust model predictive control (MPC) framework for general (state and input dependent) disturbances. This approach uses an online constructed tube in order to tighten the nominal (state and input) constraints. To facilitate an efficient online implementation, the shape of the tube is based on an offline computed incremental Lyapunov function with a corresponding (nonlinear) incrementally stabilizing feedback. Crucially, the online optimization only implicitly includes these nonlinear functions in terms of scalar bounds, which enables an efficient implementation. Furthermore, to account for an efficient evaluation of the worst case disturbance, a simple function is constructed offline that upper bounds the possible disturbance realizations in a neighbourhood of a given point of the open-loop trajectory. The resulting MPC scheme ensures robust constraint satisfaction and practical asymptotic stability with a moderate increase in the online computational demand compared to a nominal MPC. We demonstrate the applicability of the proposed framework in comparison to state of the art robust MPC approaches with a nonlinear benchmark example. This paper is an extended version of [1], and contains further details and additional considers: continuous-time systems (App. A), more general nonlinear constraints (App. B) and special cases (Sec. IV).

Optimization is becoming increasingly common in scientific and engineering domains. Oftentimes, these problems involve various levels of stochasticity or uncertainty in generating proposed solutions. Therefore, optimization in these scenarios must consider this stochasticity to properly guide the design of future experiments. Here, we adapt Bayesian optimization to handle uncertain outcomes, proposing a new framework called stochastic sampling Bayesian optimization (SSBO). We show that the bounds on expected regret for an upper confidence bound search in SSBO resemble those of earlier Bayesian optimization approaches, with added penalties due to the stochastic generation of inputs. Additionally, we adapt existing batch optimization techniques to properly limit the myopic decision making that can arise when selecting multiple instances before feedback. Finally, we show that SSBO techniques properly optimize a set of standard optimization problems as well as an applied problem inspired by bioengineering.

Recent studies show that 20.4% of the internet traffic originates from automated agents. To identify and block such ill-intentioned traffic, mechanisms that verify the humanness of the user are widely deployed across the internet. CAPTCHA is the most popular among such mechanisms. Original CAPTCHAs require extra user effort (e.g., solving mathematical or image-based puzzles), which severely harms user's experience, especially on mobile, and provide only sporadic verification of their humanness. More recent solutions like Google's reCAPTCHA v3 leverage attestation data (e.g., user behavioral data, device fingerprints) shared with a remote server, thus raising significant privacy concerns. To address all of the above, we present ZKSENSE: the first zero knowledge proof-based humanness attestation system designed for mobile devices. Contrary to state-of-the-art systems, ZKSENSE assesses humanness continuously on the background in a privacy preserving way. ZKSENSE achieves that by classifying the motion sensor outputs of the mobile device based on a model trained by using both publicly available sensor data and data collected from a small group of volunteers. The classification result is enclosed in a zero knowledge proof of humanness that can be safely shared with an attestation service such as Privacy Pass. We implement ZKSENSE as an Android service to demonstrate its effectiveness and practicability. In our evaluation, we show that ZKSENSE verifies the humanness of the users asynchronously, on the background, without degrading their experience or jeopardizing user privacy, while it achieves 91% accuracy across a variety of attack scenarios. On a two years old Samsung S9, each attestation takes around 3 seconds in total (when visual CAPTCHAs need 9.8 seconds) and consumes a negligible amount of battery.

Neural networks have been widely used, and most networks achieve excellent performance by stacking certain types of basic units. Compared to increasing the depth and width of the network, designing more effective basic units has become an important research topic. Inspired by the elastic collision model in physics, we present a universal structure that could be integrated into the existing network structures to speed up the training process and increase their generalization abilities. We term this structure the "Inter-layer Collision" (IC) structure. We built two kinds of basic computational units (IC layer and IC block) that compose the convolutional neural networks (CNNs) by combining the IC structure with the convolution operation. Compared to traditional convolutions, both of the proposed computational units have a stronger non-linear representation ability and can filter features useful for a given task. Using these computational units to build networks, we bring significant improvements in performance for existing state-of-the-art CNNs. On the imagenet experiment, we integrate the IC block into ResNet-50 and reduce the top-1 error from 22.85% to 21.49%, which also exceeds the top-1 error of ResNet-100 (21.75%).

Previous studies have proven that integrating video signals, as a complementary modality, can facilitate improved performance for speech enhancement (SE). However, video clips usually contain large amounts of data and pose a high cost in terms of computational resources and thus may complicate the SE system. As an alternative source, a bone-conducted speech signal has a moderate data size while manifesting speech-phoneme structures, and thus complements its air-conducted counterpart. In this study, we propose a novel multi-modal SE structure in the time domain that leverages bone- and air-conducted signals. In addition, we examine two ensemble-learning-based strategies, early fusion (EF) and late fusion (LF), to integrate the two types of speech signals, and adopt a deep learning-based fully convolutional network to conduct the enhancement. The experiment results on the Mandarin corpus indicate that this newly presented multi-modal (integrating bone- and air-conducted signals) SE structure significantly outperforms the single-source SE counterparts (with a bone- or air-conducted signal only) in various speech evaluation metrics. In addition, the adoption of an LF strategy other than an EF in this novel SE multi-modal structure achieves better results.

The properties of gradient techniques for the phase retrieval problem have received a considerable attention in recent years. In almost all applications, however, the phase retrieval problem is solved using a family of algorithms that can be interpreted as variants of Douglas-Rachford splitting. In this work, we establish a connection between Douglas-Rachford and gradient algorithms. Specifically, we show that in some cases a generalization of Douglas-Rachford, called relaxed-reflect-reflect (RRR), can be viewed as gradient descent on a certain objective function. The solutions coincide with the critical points of that objective, which---in contrast to standard gradient techniques---are not its minimizers. Using the objective function, we give simple proofs of some basic properties of the RRR algorithm. Specifically, we describe its set of solutions, show a local convexity around any solution, and derive stability guarantees. Nevertheless, in its present state, the analysis does not elucidate the remarkable empirical performance of RRR and its global properties.

Life quality in cities is deeply related to the mobility options, and how easily one can access different services and attractions. The pedestrian infrastructure network provides the backbone for social life in cities. While there are many approaches to quantify life quality, most do not take specifically into account the walkability of the city, and rather offer a city-wide measure. Here we develop a data-driven, network-based method to quantify the liveability of a city. We introduce a life quality index (LQI) based on pedestrian accessibility to amenities and services, safety and environmental variables. Our computational approach outlines novel ways to measure life quality in a more granular scale, that can become valuable for urban planners, city officials and stakeholders. We apply data-driven methods to Budapest, but as having an emphasis on the online and easily available quantitative data, the methods can be generalized and applied to any city.

Throughout science and technology, receiver operating characteristic (ROC) curves and associated area under the curve (AUC) measures constitute powerful tools for assessing the predictive abilities of features, markers and tests in binary classification problems. Despite its immense popularity, ROC analysis has been subject to a fundamental restriction, in that it applies to dichotomous (yes or no) outcomes only. Here we introduce ROC movies and universal ROC (UROC) curves that apply to just any ordinal or real-valued outcome, along with a new, asymmetric coefficient of predictive ability (CPA) measure. CPA equals the area under the UROC curve, and admits appealing interpretations in terms of probabilities and rank based covariances. For binary outcomes CPA equals AUC, and for pairwise distinct outcomes CPA relates linearly to Spearman's rank correlation coefficient. ROC movies, UROC curves, and CPA nest and generalize the tools of classical ROC analysis, and are bound to supersede them in a wealth of applications. Their usage is illustrated in data examples from biomedicine and meteorology, where CPA yields new insights in the WeatherBench comparison of the predictive performance of convolutional neural networks and physical-numerical models for weather prediction.

RDF triplestores and property graph databases are two approaches for data management which are based on modeling, storing, and querying graph-like data. In spite of such common principles, they present special features that complicate the task of database interoperability. While there exist some methods to transform RDF graphs into property graphs, and vice versa, they lack compatibility and a solid formal foundation. This paper presents three direct mappings (schema-dependent and schema-independent) for transforming an RDF database into a property graph database, including data and schema. We show that two of the proposed mappings satisfy the properties of semantics preservation and information preservation. The existence of both mappings allows us to conclude that the property graph data model subsumes the information capacity of the RDF data model.

In a colouring of $\mathbb{R}^d$ a pair $(S,s_0)$ with $S\subseteq \mathbb{R}^d$ and with $s_0\in S$ is \emph{almost monochromatic} if $S\setminus \{s_0\}$ is monochromatic but $S$ is not. We consider questions about finding almost monochromatic similar copies of pairs $(S,s_0)$ in colourings of $\mathbb{R}^d$, $\mathbb{Z}^d$, and in $\mathbb{Q}$ under some restrictions on the colouring.

Among other results, we characterise those $(S,s_0)$ with $S\subseteq \mathbb{Z}$ for which every finite colouring of $\mathbb{R}$ without an infinite monochromatic arithmetic progression contains an almost monochromatic similar copy of $(S,s_0)$. We also show that if $S\subseteq \mathbb{Z}^d$ and $s_0$ is outside of the convex hull of $S\setminus \{s_0\}$, then every finite colouring of $\mathbb{R}^d$ without a similar monochromatic copy of $\mathbb{Z}^d$ contains an almost monochromatic similar copy of $(S,s_0)$.

Further, we propose an approach of finding almost-monochromatic sets that might lead to a non-computer assisted proof of $\chi(\R^2)\geq 5$.

We extend the inflationary fixed-point logic, IFP, with a new kind of second-order quantifiers which have (poly-)logarithmic bounds. We prove that on ordered structures the new logic $\exists^{\log^{\omega}}\text{IFP}$ captures the limited nondeterminism class $\beta\text{P}$. In order to study its expressive power, we also design a new version of Ehrenfeucht-Fra\"iss\'e game for this logic and show that our capturing result will not hold on the general case, i.e. on all the finite structures.

Improving transaction throughput is an important challenge for Bitcoin. However, shortening the block generation interval or increasing the block size to improve throughput makes it sharing blocks within the network slower and increases the number of orphan blocks. Consequently, the security of the blockchain is sacrificed. To mitigate this, it is necessary to reduce the block propagation delay. Because of the contribution of new Bitcoin protocols and the improvements of the Internet, the block propagation delay in the Bitcoin network has been shortened in recent years. In this study, we identify impacts of compact block relay---an up-to-date Bitcoin protocol---and Internet improvement on the block propagation delay and fork rate in the Bitcoin network from 2015 to 2019. Existing measurement studies could not identify them but our simulation enables it. The experimental results reveal that compact block relay contributes to shortening the block propagation delay more than Internet improvements. The block propagation delay is reduced by 64.5% for the 50th percentile and 63.7% for the 90th percentile due to Internet improvements, and by 90.1% for the 50th percentile and by 87.6% for the 90th percentile due to compact block relay.

We propose to predict histograms of object sizes in crowded scenes directly without any explicit object instance segmentation. What makes this task challenging is the high density of objects (of the same category), which makes instance identification hard. Instead of explicitly segmenting object instances, we show that directly learning histograms of object sizes improves accuracy while using drastically less parameters. This is very useful for application scenarios where explicit, pixel-accurate instance segmentation is not needed, but there lies interest in the overall distribution of instance sizes. Our core applications are in biology, where we estimate the size distribution of soldier fly larvae, and medicine, where we estimate the size distribution of cancer cells as an intermediate step to calculate the tumor cellularity score. Given an image with hundreds of small object instances, we output the total count and the size histogram. We also provide a new data set for this task, the FlyLarvae data set, which consists of 11,000 larvae instances labeled pixel-wise. Our method results in an overall improvement in the count and size distribution prediction as compared to state-of-the-art instance segmentation method Mask R-CNN.

A precise, controllable, interpretable and easily trainable text removal approach is necessary for both user-specific and large-scale text removal applications. To achieve this, we propose a one-stage mask-based text inpainting network, MTRNet++. It has a novel architecture that includes mask-refine, coarse-inpainting and fine-inpainting branches, and attention blocks. With this architecture, MTRNet++ can remove text either with or without an external mask. It achieves state-of-the-art results on both the Oxford and SCUT datasets without using external ground-truth masks. The results of ablation studies demonstrate that the proposed multi-branch architecture with attention blocks is effective and essential. It also demonstrates controllability and interpretability.

Previous generations of face recognition algorithms differ in accuracy for images of different races (race bias). Here, we present the possible underlying factors (data-driven and scenario modeling) and methodological considerations for assessing race bias in algorithms. We discuss data driven factors (e.g., image quality, image population statistics, and algorithm architecture), and scenario modeling factors that consider the role of the "user" of the algorithm (e.g., threshold decisions and demographic constraints). To illustrate how these issues apply, we present data from four face recognition algorithms (a previous-generation algorithm and three deep convolutional neural networks, DCNNs) for East Asian and Caucasian faces. First, dataset difficulty affected both overall recognition accuracy and race bias, such that race bias increased with item difficulty. Second, for all four algorithms, the degree of bias varied depending on the identification decision threshold. To achieve equal false accept rates (FARs), East Asian faces required higher identification thresholds than Caucasian faces, for all algorithms. Third, demographic constraints on the formulation of the distributions used in the test, impacted estimates of algorithm accuracy. We conclude that race bias needs to be measured for individual applications and we provide a checklist for measuring this bias in face recognition algorithms.

Stochastic dual dynamic programming is a cutting plane type algorithm for multi-stage stochastic optimization originated about 30 years ago. In spite of its popularity in practice, there does not exist any analysis on the convergence rates of this method. In this paper, we first establish the number of iterations, i.e., iteration complexity, required by a basic dynamic cutting plane method for solving relatively simple multi-stage optimization problems, by introducing novel mathematical tools including the saturation of search points. We then refine these basic tools and establish the iteration complexity for both deterministic and stochastic dual dynamic programming methods for solving more general multi-stage stochastic optimization problems under the standard stage-wise independence assumption. Our results indicate that the complexity of these methods mildly increases with the number of stages $T$, in fact linearly dependent on $T$ for discounted problems. Therefore, they are efficient for strategic decision making which involves a large number of stages, but with a relatively small number of decision variables in each stage. Without explicitly discretizing the state and action spaces, these methods might also be pertinent to the related reinforcement learning and stochastic control areas.

Light field (LF) cameras record both intensity and directions of light rays, and capture scenes from a number of viewpoints. Both information within each perspective (i.e., spatial information) and among different perspectives (i.e., angular information) is beneficial to image super-resolution (SR). In this paper, we propose a spatial-angular interactive network (namely, LF-InterNet) for LF image SR. Specifically, spatial and angular features are first separately extracted from input LFs, and then repetitively interacted to progressively incorporate spatial and angular information. Finally, the interacted features are fused to superresolve each sub-aperture image. Experimental results demonstrate the superiority of LF-InterNet over the state-of-the-art methods, i.e., our method can achieve high PSNR and SSIM scores with low computational cost, and recover faithful details in the reconstructed images.

The rapid proliferation of shared edge computing platforms has enabled application service providers to deploy a wide variety of services with stringent latency and high bandwidth requirements. A key advantage of these platforms is that they provide pay-as-you-go flexibility by charging clients in proportion to their resource usage through short-term contracts. This affords the client significant cost-saving opportunities, by dynamically deciding when to host (cache) its service on the platform, depending on the changing intensity of requests. A natural caching policy for our setting is the Time-To-Live (TTL) policy. We show that TTL performs poorly both in the adversarial arrival setting, i.e., in terms of the competitive ratio, and for i.i.d. stochastic arrivals with low arrival rates, irrespective of the value of the TTL timer. We propose an online caching policy called RetroRenting (RR) and show that in the class of deterministic online policies, RR is order-optimal with respect to the competitive ratio. In addition, we provide performance guarantees for RR for i.i.d. stochastic arrival processes and prove that it compares well with the optimal online policy. Further, we conduct simulations using both synthetic and real world traces to compare the performance of RR with the optimal offline and online policies. The simulations show that the performance of RR is near optimal for all settings considered. Our results illustrate the universality of RR.

Grid computing is a collection of computer resources that are gathered together from various areas to give computational resources such as storage, data or application services. This is to permit clients to access this huge measure of processing resources without the need to know where these might be found and what technology such as, hardware equipment and operating system was used. Dependability and performance are among the key difficulties faced in a grid computing environment. Various systems have been proposed in the literature to handle recouping from resource failure in Grid computing environment. One case of such system is checkpointing. Checkpointing is a system that endures faults when resources failed. Checkpointing method has the upside of lessening the work lost because of resource faults. However, checkpointing presents a huge runtime overhead. In this paper, we propose an improved checkpointing system to bring down runtime overhead. A replica is added to ensure the availability of resources. This replicates all checkpointing files to other machines as opposed to having dedicated checkpointing machine. The results of simulation employing GridSim noted that retaining the number of resources fixed and changing the number of gridlets, gains of up to 11%, 9%, and 11% on makespan, throughput, and turnaround time respectively were realized while changing the number of resources and preserving the number of gridlets fixed, increases of up to 11%, 8%, and 9% on makespan, throughput, and turnaround time respectively, were realized

Spiking Neural Network (SNN) is a brain-inspired, event-driven machine learning algorithm that has recognized potential in producing ultra-high-energy-efficient hardware. Among existing SNNs, unsupervised SNNs are based on synaptic plasticity and considered to have more potential in imitating the learning process of biological brain. Most unsupervised SNNs are trained through competitive learning with Spike-Timing-Dependent Plasticity (STDP). However, the STDP-based SNNs are limited by slow learning speed and/or constrained learning capability. In this paper, to overcome these limitations, we: 1) designed a high-parallelism network architecture, inspired by the Inception module in the Artificial Neural Network (ANN) literature; 2) extended a widely used vote-based spike decoding scheme to a Vote-for-All (VFA) decoding layer to reduce the information loss in the spike decoding; 3) proposed to use adaptive repolarization (i.e. resetting) in the spiking neuron model to enhance the spiking activities and thus further accelerate the network's learning. We evaluated our contributions on the two established benchmark datasets (MNIST/EMNIST). Our experimental results show that our architecture exhibits superior performance than widely used Fully-Connected (FC) and Locally-Connected (LC) architectures. Our SNN not only achieves comparable results with the state-of-the-art unsupervised SNNs (95.64%/80.11% accuracy on the MNIST/EMNISE dataset), but also shows superior learning efficiency and robustness against hardware damage. Our SNN trained with only hundreds of iterations can achieve a great classification accuracy, and random destruction of large numbers of synapses and neurons only leads to negligible performance degradation.

Polling systems have been widely studied, however most of these studies focus on polling systems with renewal processes for arrivals and random variables for service times. There is a need driven by practical applications to study polling systems with arbitrary arrivals (not restricted to time-varying or in batches) and revealed service time upon a job's arrival. To address that need, our work considers a polling system with generic setting and for the first time provides the worst-case analysis for online scheduling policies in this system. We provide conditions for the existence of constant competitive ratios, and competitive lower bounds for general scheduling policies in polling systems. Our work also bridges the queueing and scheduling communities by proving the competitive ratios for several well-studied policies in the queueing literature, such as cyclic policies with exhaustive, gated or l-limited service disciplines for polling systems.

Organizations are starting to realize of the combined power of data and data-driven algorithmic models to gain insights, situational awareness, and advance their mission. A common challenge to gaining insights is connecting inherently different datasets. These datasets (e.g. geocoded features, video streams, raw text, social network data, etc.) per separate they provide very narrow answers; however collectively they can provide new capabilities. In this work, we present a data fusion framework for accelerating solutions for Processing, Exploitation, and Dissemination (PED). Our platform is a collection of services that extract information from several data sources (per separate) by leveraging deep learning and other means of processing. This information is fused by a set of analytical engines that perform data correlations, searches, and other modeling operations to combine information from the disparate data sources. As a result, events of interest are detected, geolocated, logged, and presented into a common operating picture. This common operating picture allows the user to visualize in real time all the data sources, per separate and their collective cooperation. In addition, forensic activities have been implemented and made available through the framework. Users can review archived results and compare them to the most recent snapshot of the operational environment. In our first iteration we have focused on visual data (FMV, WAMI, CCTV/PTZ-Cameras, open source video, etc.) and AIS data streams (satellite and terrestrial sources). As a proof-of-concept, in our experiments we show how FMV detections can be combined with vessel tracking signals from AIS sources to confirm identity, tip-and-cue aerial reconnaissance, and monitor vessel activity in an area.

This paper addresses two crucial problems of learning disentangled image representations, namely controlling the degree of disentanglement during image editing, and balancing the disentanglement strength and the reconstruction quality. To encourage disentanglement, we devise a distance covariance based decorrelation regularization. Further, for the reconstruction step, our model leverages a soft target representation combined with the latent image code. By exploring the real-valued space of the soft target representation, we are able to synthesize novel images with the designated properties. To improve the perceptual quality of images generated by autoencoder (AE)-based models, we extend the encoder-decoder architecture with the generative adversarial network (GAN) by collapsing the AE decoder and the GAN generator into one. We also design a classification based protocol to quantitatively evaluate the disentanglement strength of our model. Experimental results showcase the benefits of the proposed model.

Graph Convolution Network (GCN) has become new state-of-the-art for collaborative filtering. Nevertheless, the reasons of its effectiveness for recommendation are not well understood. Existing work that adapts GCN to recommendation lacks thorough ablation analyses on GCN, which is originally designed for graph classification tasks and equipped with many neural network operations. However, we empirically find that the two most common designs in GCNs -- feature transformation and nonlinear activation -- contribute little to the performance of collaborative filtering. Even worse, including them adds to the difficulty of training and degrades recommendation performance.

In this work, we aim to simplify the design of GCN to make it more concise and appropriate for recommendation. We propose a new model named LightGCN, including only the most essential component in GCN -- neighborhood aggregation -- for collaborative filtering. Specifically, LightGCN learns user and item embeddings by linearly propagating them on the user-item interaction graph, and uses the weighted sum of the embeddings learned at all layers as the final embedding. Such simple, linear, and neat model is much easier to implement and train, exhibiting substantial improvements (about 16.0\% relative improvement on average) over Neural Graph Collaborative Filtering (NGCF) -- a state-of-the-art GCN-based recommender model -- under exactly the same experimental setting. Further analyses are provided towards the rationality of the simple LightGCN from both analytical and empirical perspectives.

Convolutional neural networks (CNNs) often perform well, but their stability is poorly understood. To address this problem, we consider the simple prototypical problem of signal denoising, where classical approaches such as nonlinear diffusion, wavelet-based methods and regularisation offer provable stability guarantees. To transfer such guarantees to CNNs, we interpret numerical approximations of these classical methods as a specific residual network (ResNet) architecture. This leads to a dictionary which allows to translate diffusivities, shrinkage functions, and regularisers into activation functions, and enables a direct communication between the four research communities. On the CNN side, it does not only inspire new families of nonmonotone activation functions, but also introduces intrinsically stable architectures for an arbitrary number of layers.

The task of text classification is usually divided into two stages: {\it text feature extraction} and {\it classification}. In this standard formalization categories are merely represented as indexes in the label vocabulary, and the model lacks for explicit instructions on what to classify. Inspired by the current trend of formalizing NLP problems as question answering tasks, we propose a new framework for text classification, in which each category label is associated with a category description. Descriptions are generated by hand-crafted templates or using abstractive/extractive models from reinforcement learning. The concatenation of the description and the text is fed to the classifier to decide whether or not the current label should be assigned to the text. The proposed strategy forces the model to attend to the most salient texts with respect to the label, which can be regarded as a hard version of attention, leading to better performances. We observe significant performance boosts over strong baselines on a wide range of text classification tasks including single-label classification, multi-label classification and multi-aspect sentiment analysis.

This paper considers a compressed-coding scheme that combines compressed sensing with forward error control coding. Approximate message passing (AMP) is used to decode the message. Based on the state evolution analysis of AMP, we derive the performance limit of compressed-coding. We show that compressed-coding can approach Gaussian capacity at a very low compression ratio. Further, the results are extended to systems involving non-linear effects such as clipping. We show that the capacity approaching property can still be maintained when generalized AMP is used to decode the message.

To approach the capacity, a low-rate underlying code should be designed according to the curve matching principle, which is complicated in practice. Instead, analog spatial-coupling is used to avoid sophisticated low-rate code design. In the end, we study the coupled scheme in a multiuser environment, where spatial-coupling can be realized in a distributive way. The overall block length can be shared by many users, which reduces block length per-user.

Let $f_1,\ldots,f_m$ be elements in a quotient $R^n / N$ which has finite dimension as a $K$-vector space, where $R = K[X_1,\ldots,X_r]$ and $N$ is an $R$-submodule of $R^n$. We address the problem of computing a Gr\"obner basis of the module of syzygies of $(f_1,\ldots,f_m)$, that is, of vectors $(p_1,\ldots,p_m) \in R^m$ such that $p_1 f_1 + \cdots + p_m f_m = 0$.

An iterative algorithm for this problem was given by Marinari, M\"oller, and Mora (1993) using a dual representation of $R^n / N$ as the kernel of a collection of linear functionals. Following this viewpoint, we design a divide-and-conquer algorithm, which can be interpreted as a generalization to several variables of Beckermann and Labahn's recursive approach for matrix Pad\'e and rational interpolation problems. To highlight the interest of this method, we focus on the specific case of bivariate Pad\'e approximation and show that it improves upon the best known complexity bounds.

Recent years have seen an increasing integration of distributed renewable energy resources into existing electric power grids. Due to the uncertain nature of renewable energy resources, network operators are faced with new challenges in balancing load and generation. In order to meet the new requirements, intelligent distributed energy resource plants can be used which provide as virtual power plants e.g. demand side management or flexible generation. However, the calculation of an adequate schedule for the unit commitment of such distributed energy resources is a complex optimization problem which is typically too complex for standard optimization algorithms if large numbers of distributed energy resources are considered. For solving such complex optimization tasks, population-based metaheuristics -- as e.g. evolutionary algorithms -- represent powerful alternatives. Admittedly, evolutionary algorithms do require lots of computational power for solving such problems in a timely manner. One promising solution for this performance problem is the parallelization of the usually time-consuming evaluation of alternative solutions. In the present paper, a new generic and highly scalable parallel method for unit commitment of distributed energy resources using metaheuristic algorithms is presented. It is based on microservices, container virtualization and the publish/subscribe messaging paradigm for scheduling distributed energy resources. Scalability and applicability of the proposed solution are evaluated by performing parallelized optimizations in a big data environment for three distinct distributed energy resource scheduling scenarios. The new method provides cluster or cloud parallelizability and is able to deal with a comparably large number of distributed energy resources. The application of the new proposed method results in very good performance for scaling up optimization speed.

Despite progress in training neural networks for lossy image compression, current approaches fail to maintain both perceptual quality and abstract features at very low bitrates. Encouraged by recent success in learning discrete representations with Vector Quantized Variational Autoencoders (VQ-VAEs), we motivate the use of a hierarchy of VQ-VAEs to attain high factors of compression. We show that the combination of stochastic quantization and hierarchical latent structure aids likelihood-based image compression. This leads us to introduce a novel objective for training hierarchical VQ-VAEs. Our resulting scheme produces a Markovian series of latent variables that reconstruct images of high-perceptual quality which retain semantically meaningful features. We provide qualitative and quantitative evaluations on the CelebA and MNIST datasets.

Neural network pruning reduces the computational cost of an over-parameterized network to improve its efficiency. Popular methods vary from $\ell_1$-norm sparsification to Neural Architecture Search (NAS). In this work, we propose a novel pruning method that optimizes the final accuracy of the pruned network and distills knowledge from the over-parameterized parent network's inner layers. To enable this approach, we formulate the network pruning as a Knapsack Problem which optimizes the trade-off between the importance of neurons and their associated computational cost. Then we prune the network channels while maintaining the high-level structure of the network. The pruned network is fine-tuned under the supervision of the parent network using its inner network knowledge, a technique we refer to as the Inner Knowledge Distillation. Our method leads to state-of-the-art pruning results on ImageNet, CIFAR-10 and CIFAR-100 using ResNet backbones. To prune complex network structures such as convolutions with skip-links and depth-wise convolutions, we propose a block grouping approach to cope with these structures. Through this we produce compact architectures with the same FLOPs as EfficientNet-B0 and MobileNetV3 but with higher accuracy, by $1\%$ and $0.3\%$ respectively on ImageNet, and faster runtime on GPU.

Running Stochastic Gradient Descent (SGD) in a decentralized fashion has shown promising results. In this paper we propose Moniqua, a technique that allows decentralized SGD to use quantized communication. We prove in theory that Moniqua communicates a provably bounded number of bits per iteration, while converging at the same asymptotic rate as the original algorithm does with full-precision communication. Moniqua improves upon prior works in that it (1) requires zero additional memory, (2) works with 1-bit quantization, and (3) is applicable to a variety of decentralized algorithms. We demonstrate empirically that Moniqua converges faster with respect to wall clock time than other quantized decentralized algorithms. We also show that Moniqua is robust to very low bit-budgets, allowing 1-bit-per-parameter communication without compromising validation accuracy when training ResNet20 and ResNet110 on CIFAR10.

Nondestructive evaluation methods play an important role in ensuring component integrity and safety in many industries. Operator fatigue can play a critical role in the reliability of such methods. This is important for inspecting high value assets or assets with a high consequence of failure, such as aerospace and nuclear components. Recent advances in convolution neural networks can support and automate these inspection efforts. This paper proposes using residual neural networks (ResNets) for real-time detection of corrosion, including iron oxide discoloration, pitting and stress corrosion cracking, in dry storage stainless steel canisters housing used nuclear fuel. The proposed approach crops nuclear canister images into smaller tiles, trains a ResNet on these tiles, and classifies images as corroded or intact using the per-image count of tiles predicted as corroded by the ResNet. The results demonstrate that such a deep learning approach allows to detect the locus of corrosion via smaller tiles, and at the same time to infer with high accuracy whether an image comes from a corroded canister. Thereby, the proposed approach holds promise to automate and speed up nuclear fuel canister inspections, to minimize inspection costs, and to partially replace human-conducted onsite inspections, thus reducing radiation doses to personnel.

We present Asynchronous Stochastic Parallel Pose Graph Optimization (ASAPP), the first asynchronous algorithm for distributed pose graph optimization (PGO) in multi-robot simultaneous localization and mapping. By enabling robots to optimize their local trajectory estimates without synchronization, ASAPP offers resiliency against communication delays and alleviates the need to wait for stragglers in the network. Furthermore, ASAPP can be applied on the rank-restricted relaxations of PGO, a crucial class of non-convex Riemannian optimization problems that underlies recent breakthroughs on globally optimal PGO. Under bounded delay, we establish the global first-order convergence of ASAPP using a sufficiently small stepsize. The derived stepsize depends on the worst-case delay and inherent problem sparsity, and furthermore matches known result for synchronous algorithms when there is no delay. Numerical evaluations on simulated and real-world datasets demonstrate favorable performance compared to state-of-the-art synchronous approach, and show ASAPP's resilience against a wide range of delays in practice.

Despite recent improvements in medical image segmentation, the ability to generalize across imaging contrasts remains an open issue. To tackle this challenge, we implement Feature-wise Linear Modulation (FiLM) to leverage physics knowledge within the segmentation model and learn the characteristics of each contrast. Interestingly, a well-optimised U-Net reached the same performance as our FiLMed-Unet on a multi-contrast dataset (0.72 of Dice score), which suggests that there is a bottleneck in spinal MS lesion segmentation different from the generalization across varying contrasts. This bottleneck likely stems from inter-rater variability, which is estimated at 0.61 of Dice score in our dataset.

Graph autoencoders (GAEs) are the effective unsupervised learning frameworks to learn the latent representations of graph data for network embedding. Most exiting GAE approaches typically focus on the graph reconstruction but neglect the node features reconstruction, which often results in overfitting due to the capacity of the autoencoders. Additionally, the adjacency matrix of these methods cannot accurately represent the connections among nodes in latent space when the prior information is contaminated. To address these problems, we propose a novel Graph Convolutional Auto-encoder with Bi-decoder and Adaptive-sharing Adjacency method, namely BAGE. The framework encodes the graph construction and node features into latent representations, upon which the bi-decoder is trained to reconstruct the graph construction and node features simultaneously. Moreover, the convolutional adjacency matrix is shared with the embedded graph optimization and can be adaptively learned by constructing the Laplacian, leading to more accurate connections. Consequently, the proposed method can be applied to more general datasets than the existing GAEs, i.e., data without a pre-given graph construction to prevent the GAEs from excessive dependence on the prior adjacency information. Experimental results validate the superiority of our method to the state-of-the-art network embedding methods.

The known linear-time kernelizations for $d$-Hitting Set guarantee linear worst-case running times using a quadratic-size data structure (that is not fully initialized). Getting rid of this data structure, we show that problem kernels of asymptotically optimal size $O(k^d)$ for $d$-Hitting Set are computable in linear time and space. Additionally, we experimentally compare the linear-time kernelizations for $d$-Hitting Set to each other and to a classical data reduction algorithm due to Weihe.

Complexities that arise from implementation of object-oriented concepts in C++ such as virtual dispatch and dynamic type casting have attracted the attention of attackers and defenders alike.

Binary-level defenses are dependent on full and precise recovery of class inheritance tree of a given program.

While current solutions focus on recovering single and multiple inheritances from the binary, they are oblivious to virtual inheritance. Conventional wisdom among binary-level defenses is that virtual inheritance is uncommon and/or support for single and multiple inheritances provides implicit support for virtual inheritance. In this paper, we show neither to be true.

Specifically, (1) we present an efficient technique to detect virtual inheritance in C++ binaries and show through a study that virtual inheritance can be found in non-negligible number (more than 10\% on Linux and 12.5\% on Windows) of real-world C++ programs including Mysql and libstdc++. (2) we show that failure to handle virtual inheritance introduces both false positives and false negatives in the hierarchy tree. These false positves and negatives either introduce attack surface when the hierarchy recovered is used to enforce CFI policies, or make the hierarchy difficult to understand when it is needed for program understanding (e.g., during decompilation). (3) We present a solution to recover virtual inheritance from COTS binaries. We recover a maximum of 95\% and 95.5\% (GCC -O0) and a minimum of 77.5\% and 73.8\% (Clang -O2) of virtual and intermediate bases respectively in the virtual inheritance tree.

Here we demonstrate how Deep Neural Network (DNN) detections of multiple constitutive or component objects that are part of a larger, more complex, and encompassing feature can be spatially fused to improve the search, detection, and retrieval (ranking) of the larger complex feature. First, scores computed from a spatial clustering algorithm are normalized to a reference space so that they are independent of image resolution and DNN input chip size. Then, multi-scale DNN detections from various component objects are fused to improve the detection and retrieval of DNN detections of a larger complex feature. We demonstrate the utility of this approach for broad area search and detection of Surface-to-Air Missile (SAM) sites that have a very low occurrence rate (only 16 sites) over a ~90,000 km^2 study area in SE China. The results demonstrate that spatial fusion of multi-scale component-object DNN detections can reduce the detection error rate of SAM Sites by $>$85% while still maintaining a 100% recall. The novel spatial fusion approach demonstrated here can be easily extended to a wide variety of other challenging object search and detection problems in large-scale remote sensing image datasets.

Surjective Constraint Satisfaction Problem (SCSP) is the problem of deciding whether there exists a surjective assignment to a set of variables subject to some specified constraints. In this paper we show that one of the most popular variants of the SCSP, called No-Rainbow Problem, is NP-Hard. Additionally, we disprove the conjecture saying that SCSP over a constraint language $\Gamma$ is equivalent to CSP over the same language with constants. Our counter example also shows that the complexity of SCSP cannot be described in terms of polymorphisms of the constraint language.

Blockchain-based cryptocurrencies and applications have flourished the blockchain research community. Massive data generated from diverse blockchain systems bring not only huge business values and but also technical challenges in data analytics of heterogeneous blockchain data. Different from Bitcoin and Ethereum, EOSIO has richer diversity and higher volume of blockchain data due to its unique architectural design in resource management, consensus scheme and high throughput. Despite its popularity (e.g., 89,800,000 blocks generated till Nov. 14, 2019 since its launching in June 8, 2018), few studies have been made on data analysis of EOSIO. To fill this gap, we collect and process the up-to-date on-chain data from EOSIO. We name these well-processed EOSIO datasets as XBlock-EOS, which consists of 7 well-processed datasets: 1) Block, Transaction and Action, 2) Internal and External EOS Transfer Action, 3) Contract Information, 4) Contract Invocation, 5) Token Action, 6) Account Creation, 7) Resource Management. It is challenging to process and analyze high volume of raw EOSIO data and establish the mapping from original raw data to the fine-grained datasets since it requires substantial efforts in exacting various types of data as well as sophisticated knowledge on software engineering and data analytics. Meanwhile, we present statistics and exploration on these datasets. Moreover, we also outline the possible research opportunities based on XBlock-EOS.

Suppose $\mathbb{K}$ is a large enough field and $\mathcal{P} \subset \mathbb{K}^2$ is a fixed, generic set of points which is available for precomputation. We introduce a technique called \emph{reshaping} which allows us to design quasi-linear algorithms for both: computing the evaluations of an input polynomial $f \in \mathbb{K}[x,y]$ at all points of $\mathcal{P}$; and computing an interpolant $f \in \mathbb{K}[x,y]$ which takes prescribed values on $\mathcal{P}$ and satisfies an input $y$-degree bound. Our genericity assumption is explicit and we prove that it holds for most point sets over a large enough field. If $\mathcal{P}$ violates the assumption, our algorithms still work and the performance degrades smoothly according to a distance from being generic. To show that the reshaping technique may have an impact on other related problems, we apply it to modular composition: suppose generic polynomials $M \in \mathbb{K}[x]$ and $A \in \mathbb{K}[x]$ are available for precomputation, then given an input $f \in \mathbb{K}[x,y]$ we show how to compute $f(x, A(x)) \operatorname{rem} M(x)$ in quasi-linear time.

Graph convolution networks (GCN) are increasingly popular in many applications, yet remain notoriously hard to train over large graph datasets. They need to compute node representations recursively from their neighbors. Current GCN training algorithms suffer from either high computational costs that grow exponentially with the number of layers, or high memory usage for loading the entire graph and node embeddings. In this paper, we propose a novel efficient layer-wise training framework for GCN (L-GCN), that disentangles feature aggregation and feature transformation during training, hence greatly reducing time and memory complexities. We present theoretical analysis for L-GCN under the graph isomorphism framework, that L-GCN leads to as powerful GCNs as the more costly conventional training algorithm does, under mild conditions. We further propose L^2-GCN, which learns a controller for each layer that can automatically adjust the training epochs per layer in L-GCN. Experiments show that L-GCN is faster than state-of-the-arts by at least an order of magnitude, with a consistent of memory usage not dependent on dataset size, while maintaining comparable prediction performance. With the learned controller, L^2-GCN can further cut the training time in half. Our codes are available at https://github.com/Shen-Lab/L2-GCN and supplementary materials at https://yyou1996.github.io/papers/cvpr2020_l2gcn/supplement.pdf.

Oversmoothing has been assumed to be the major cause of performance drop in deep graph convolutional networks (GCNs). In this paper, we propose a new view that deep GCNs can actually learn to anti-oversmooth during training. This work interprets a standard GCN architecture as layerwise integration of a Multi-layer Perceptron (MLP) and graph regularization. We analyze and conclude that before training, the final representation of a deep GCN does over-smooth, however, it learns anti-oversmoothing during training. Based on the conclusion, the paper further designs a cheap but effective trick to improve GCN training. We verify our conclusions and evaluate the trick on three citation networks and further provide insights on neighborhood aggregation in GCNs.

Existing autonomous robot navigation systems allow robots to move from one point to another in a collision-free manner. However, when facing new environments, these systems generally require re-tuning by expert roboticists with a good understanding of the inner workings of the navigation system. In contrast, even users who are unversed in the details of robot navigation algorithms can generate desirable navigation behavior in new environments via teleoperation. In this paper, we introduce APPLD, Adaptive Planner Parameter Learning from Demonstration, that allows existing navigation systems to be successfully applied to new complex environments, given only a human teleoperated demonstration of desirable navigation. APPLD is verified on two robots running different navigation systems in different environments. Experimental results show that APPLD can outperform navigation systems with the default and expert-tuned parameters, and even the human demonstrator themselves.

Following the recent advances in deep networks, object detection and tracking algorithms with deep learning backbones have been improved significantly; however, this rapid development resulted in the necessity of large amounts of annotated labels. Even if the details of such semi-automatic annotation processes for most of these datasets are not known precisely, especially for the video annotations, some automated labeling processes are usually employed. Unfortunately, such approaches might result with erroneous annotations. In this work, different types of annotation errors for object detection problem are simulated and the performance of a popular state-of-the-art object detector, YOLOv3, with erroneous annotations during training and testing stages is examined. Moreover, some inevitable annotation errors in CVPR-2020 Anti-UAV Challenge dataset is also examined in this manner, while proposing a solution to correct such annotation errors of this valuable data set.

Purpose: To present a method that automatically segments and quantifies abnormal CT patterns commonly present in coronavirus disease 2019 (COVID-19), namely ground glass opacities and consolidations. Materials and Methods: In this retrospective study, the proposed method takes as input a non-contrasted chest CT and segments the lesions, lungs, and lobes in three dimensions, based on a dataset of 9749 chest CT volumes. The method outputs two combined measures of the severity of lung and lobe involvement, quantifying both the extent of COVID-19 abnormalities and presence of high opacities, based on deep learning and deep reinforcement learning. The first measure of (PO, PHO) is global, while the second of (LSS, LHOS) is lobewise. Evaluation of the algorithm is reported on CTs of 200 participants (100 COVID-19 confirmed patients and 100 healthy controls) from institutions from Canada, Europe and the United States collected between 2002-Present (April, 2020). Ground truth is established by manual annotations of lesions, lungs, and lobes. Correlation and regression analyses were performed to compare the prediction to the ground truth. Results: Pearson correlation coefficient between method prediction and ground truth for COVID-19 cases was calculated as 0.92 for PO (P < .001), 0.97 for PHO(P < .001), 0.91 for LSS (P < .001), 0.90 for LHOS (P < .001). 98 of 100 healthy controls had a predicted PO of less than 1%, 2 had between 1-2%. Automated processing time to compute the severity scores was 10 seconds per case compared to 30 minutes required for manual annotations. Conclusion: A new method segments regions of CT abnormalities associated with COVID-19 and computes (PO, PHO), as well as (LSS, LHOS) severity scores.

Recent attempts at Super-Resolution for medical images used deep learning techniques such as Generative Adversarial Networks (GANs) to achieve perceptually realistic single image Super-Resolution. Yet, they are constrained by their inability to generalise to different scale factors. This involves high storage and energy costs as every integer scale factor involves a separate neural network. A recent paper has proposed a novel meta-learning technique that uses a Weight Prediction Network to enable Super-Resolution on arbitrary scale factors using only a single neural network. In this paper, we propose a new network that combines that technique with SRGAN, a state-of-the-art GAN-based architecture, to achieve arbitrary scale, high fidelity Super-Resolution for medical images. By using this network to perform arbitrary scale magnifications on images from the Multimodal Brain Tumor Segmentation Challenge (BraTS) dataset, we demonstrate that it is able to outperform traditional interpolation methods by up to 20$\%$ on SSIM scores whilst retaining generalisability on brain MRI images. We show that performance across scales is not compromised, and that it is able to achieve competitive results with other state-of-the-art methods such as EDSR whilst being fifty times smaller than them. Combining efficiency, performance, and generalisability, this can hopefully become a new foundation for tackling Super-Resolution on medical images.

Check out the webapp here: https://metasrgan.herokuapp.com/ Check out the github tutorial here: https://github.com/pancakewaffles/metasrgan-tutorial

Visual Question Answering (VQA) systems are tasked with answering natural language questions corresponding to a presented image. Traditional VQA datasets typically contain questions related to the spatial information of objects, object attributes, or general scene questions. Recently, researchers have recognized the need to improve the balance of such datasets to reduce the system's dependency on memorized linguistic features and statistical biases, while aiming for enhanced visual understanding. However, it is unclear whether any latent patterns exist to quantify and explain these failures. As an initial step towards better quantifying our understanding of the performance of VQA models, we use a taxonomy of Knowledge Gaps (KGs) to tag questions with one or more types of KGs. Each Knowledge Gap (KG) describes the reasoning abilities needed to arrive at a resolution. After identifying KGs for each question, we examine the skew in the distribution of questions for each KG. We then introduce a targeted question generation model to reduce this skew, which allows us to generate new types of questions for an image. These new questions can be added to existing VQA datasets to increase the diversity of questions and reduce the skew.

In the first part of this paper, we prove a theorem which is the $q$-analogue of a generalized modular Ray-Chaudhuri-Wilson Theorem shown in [Alon, Babai, Suzuki, J. Combin. Theory Series A, 1991]. It is also a generalization of the main theorem in [Frankl and Graham, European J. Combin. 1985] under certain circumstances.

In the second part of this paper, we prove $q$-analogues of results on a recent notion called \emph{fractional $L$-intersecting family} for families of subspaces of a given vector space. We use the above theorem to obtain a general upper bound to the cardinality of such families. We give an improvement to this general upper bound in certain special cases.

Opinion dynamics have attracted the interest of researchers from different fields. Local interactions among individuals create interesting dynamics for the system as a whole. Such dynamics are important from a variety of perspectives. Group decision making, successful marketing, and constructing networks (in which consensus can be reached or prevented) are a few examples of existing or potential applications. The invention of the Internet has made the opinion fusion faster, unilateral, and on a whole different scale. Spread of fake news, propaganda, and election interferences have made it clear there is an essential need to know more about these dynamics.

The emergence of new ideas in the field has accelerated over the last few years. In the first quarter of 2020, at least 50 research papers have emerged, either peer-reviewed and published or on pre-print outlets such as arXiv. In this paper, we summarize these ground-breaking ideas and their fascinating extensions and introduce newly surfaced concepts.

Cortical plasticity is one of the main features that enable our capability to learn and adapt in our environment. Indeed, the cerebral cortex has the ability to self-organize itself through two distinct forms of plasticity: the structural plasticity that creates (sprouting) or cuts (pruning) synaptic connections between neurons, and the synaptic plasticity that modifies the synaptic connections strength. These mechanisms are very likely at the basis of an extremely interesting characteristic of the human brain development: the multimodal association. [...] To model such a behavior, Edelman and Damasio proposed respectively the Reentry and the Convergence Divergence Zone frameworks where bi-directional neural communications can lead to both multimodal fusion (convergence) and inter-modal activation (divergence). [...] In this paper, we build a brain-inspired neural system based on the Reentry principles, using Self-Organizing Maps and Hebbian-like learning. We propose and compare different computational methods for unsupervised learning and inference, then quantify the gain of both convergence and divergence mechanisms in a multimodal classification task. The divergence mechanism is used to label one modality based on the other, while the convergence mechanism is used to improve the overall accuracy of the system. We perform our experiments on a constructed written/spoken digits database and a DVS/EMG hand gestures database. Finally, we implement our system on the Iterative Grid, a cellular neuromorphic architecture that enables distributed computing with local connectivity. We show the gain of the so-called hardware plasticity induced by our model, where the system's topology is not fixed by the user but learned along the system's experience through self-organization.

Variational Autoencoders (VAE) and their variants have been widely used in a variety of applications, such as dialog generation, image generation and disentangled representation learning. However, the existing VAE models have some limitations in different applications. For example, a VAE easily suffers from KL vanishing in language modeling and low reconstruction quality for disentangling. To address these issues, we propose a novel controllable variational autoencoder framework, ControlVAE, that combines a controller, inspired by automatic control theory, with the basic VAE to improve the performance of resulting generative models. Specifically, we design a new non-linear PI controller, a variant of the proportional-integral-derivative (PID) control, to automatically tune the hyperparameter (weight) added in the VAE objective using the output KL-divergence as feedback during model training. The framework is evaluated using three applications; namely, language modeling, disentangled representation learning, and image generation. The results show that ControlVAE can achieve better disentangling and reconstruction quality than the existing methods. For language modelling, it not only averts the KL-vanishing, but also improves the diversity of generated text. Finally, we also demonstrate that ControlVAE improves the reconstruction quality of generated images compared to the original VAE.

In this paper, the issue of model uncertainty in safety-critical control is addressed with a data-driven approach. For this purpose, we utilize the structure of an input-ouput linearization controller based on a nominal model along with a Control Barrier Function and Control Lyapunov Function based Quadratic Program (CBF-CLF-QP). Specifically, we propose a novel reinforcement learning framework which learns the model uncertainty present in the CBF and CLF constraints, as well as other control-affine dynamic constraints in the quadratic program. The trained policy is combined with the nominal model-based CBF-CLF-QP, resulting in the Reinforcement Learning-based CBF-CLF-QP (RL-CBF-CLF-QP), which addresses the problem of model uncertainty in the safety constraints. The performance of the proposed method is validated by testing it on an underactuated nonlinear bipedal robot walking on randomly spaced stepping stones with one step preview, obtaining stable and safe walking under model uncertainty.

Typical end-to-end formulations for learning robotic navigation involve predicting a small set of steering command actions (e.g., step forward, turn left, turn right, etc.) from images of the current state (e.g., a bird's-eye view of a SLAM reconstruction). Instead, we show that it can be advantageous to learn with dense action representations defined in the same domain as the state. In this work, we present "spatial action maps," in which the set of possible actions is represented by a pixel map (aligned with the input image of the current state), where each pixel represents a local navigational endpoint at the corresponding scene location. Using ConvNets to infer spatial action maps from state images, action predictions are thereby spatially anchored on local visual features in the scene, enabling significantly faster learning of complex behaviors for mobile manipulation tasks with reinforcement learning. In our experiments, we task a robot with pushing objects to a goal location, and find that policies learned with spatial action maps achieve much better performance than traditional alternatives.

We define a neural network as a septuple consisting of (1) a state vector, (2) an input projection, (3) an output projection, (4) a weight matrix, (5) a bias vector, (6) an activation map and (7) a loss function. We argue that the loss function can be imposed either on the boundary (i.e. input and/or output neurons) or in the bulk (i.e. hidden neurons) for both supervised and unsupervised systems. We apply the principle of maximum entropy to derive a canonical ensemble of the state vectors subject to a constraint imposed on the bulk loss function by a Lagrange multiplier (or an inverse temperature parameter). We show that in an equilibrium the canonical partition function must be a product of two factors: a function of the temperature and a function of the bias vector and weight matrix. Consequently, the total Shannon entropy consists of two terms which represent respectively a thermodynamic entropy and a complexity of the neural network. We derive the first and second laws of learning: during learning the total entropy must decrease until the system reaches an equilibrium (i.e. the second law), and the increment in the loss function must be proportional to the increment in the thermodynamic entropy plus the increment in the complexity (i.e. the first law). We calculate the entropy destruction to show that the efficiency of learning is given by the Laplacian of the total free energy which is to be maximized in an optimal neural architecture, and explain why the optimization condition is better satisfied in a deep network with a large number of hidden layers. The key properties of the model are verified numerically by training a supervised feedforward neural network using the method of stochastic gradient descent. We also discuss a possibility that the entire universe on its most fundamental level is a neural network.

The proper initialization of weights is crucial for the effective training and fast convergence of deep neural networks (DNNs). Prior work in this area has mostly focused on balancing the variance among weights per layer to maintain stability of (i) the input data propagated forwards through the network and (ii) the loss gradients propagated backwards, respectively. This prevalent heuristic is however agnostic of dependencies among gradients across the various layers and captures only firstorder effects. In this paper, we propose and discuss an initialization principle that is based on a rigorous estimation of the global curvature of weights across layers by approximating and controlling the norm of their Hessian matrix. The proposed approach is more systematic and recovers previous results for DNN activations such as smooth functions, dropouts, and ReLU. Our experiments on Word2Vec and the MNIST/CIFAR image classification tasks confirm that tracking the Hessian norm is a useful diagnostic tool which helps to more rigorously initialize weights

We consider representation learning (hypothesis class $\mathcal{H} = \mathcal{F}\circ\mathcal{G}$) where training and test distributions can be different. Recent studies provide hints and failure examples for domain invariant representation learning, a common approach for this problem, but the explanations provided are somewhat different and do not provide a unified picture. In this paper, we provide new decompositions of risk which give finer-grained explanations and clarify potential generalization issues. For Single-Source Domain Adaptation, we give an exact decomposition (an equality) of the target risk, via a natural hybrid argument, as sum of three factors: (1) source risk, (2) representation conditional label divergence, and (3) representation covariate shift. We derive a similar decomposition for the Multi-Source case. These decompositions reveal factors (2) and (3) as the precise reasons for failure to generalize. For example, we demonstrate that domain adversarial neural networks (DANN) attempt to regularize for (3) but miss (2), while a recent technique Invariant Risk Minimization (IRM) attempts to account for (2) but does not consider (3). We also verify our observations experimentally.

While deep neural networks (DNNs) are being increasingly used to make predictions from high-dimensional, complex data, they are widely seen as uninterpretable "black boxes", since it can be difficult to discover what input information is used to make predictions. This ability is particularly important for applications in cognitive neuroscience and neuroinformatics. A saliency map is a common approach for producing interpretable visualizations of the relative importance of input features for a prediction. However, many methods for creating these maps fail due to focusing too much on the input or being extremely sensitive to small input noise. It is also challenging to quantitatively evaluate how well saliency maps correspond to the truly relevant input information. In this paper, we develop two quantitative evaluation procedures for saliency methods, using the fact that the Human Connectome Project (HCP) dataset contains functional magnetic resonance imaging (fMRI) data from multiple tasks per subject to create ground truth saliency maps. We then introduce an adversarial training method that makes DNNs robust to small input noise, and demonstrate that it measurably improves interpretability.

This article concerns second-order time discretization of subdiffusion equations with time-dependent diffusion coefficients. High-order differentiability and regularity estimates are established for subdiffusion equations with time-dependent coefficients. Using these regularity results and a perturbation argument of freezing the diffusion coefficient, we prove that the convolution quadrature generated by the second-order backward differentiation formula, with proper correction at the first time step, can achieve second-order convergence for both nonsmooth initial data and incompatible source term. Numerical experiments are consistent with the theoretical results.

Model-Free Control (MFC), which is easy to implement both from software and hardware viewpoints, permits the introduction of a high level control synthesis for the Industrial Internet of Things (IIoT) and the Industry 4.0. The choice of the User Diagram Protocol (UDP) as the Internet Protocol permits to neglect the latency. In spite of most severe packet losses, convincing computer simulations and laboratory experiments show that MFC exhibits a good Quality of Service (QoS) and behaves better than a classic PI regulator.

Recent progress in Natural Language Understanding (NLU) is driving fast-paced advances in Information Retrieval (IR), largely owed to fine-tuning deep language models (LMs) for document ranking. While remarkably effective, the ranking models based on these LMs increase computational cost by orders of magnitude over prior approaches, particularly as they must feed each query-document pair through a massive neural network to compute a single relevance score. To tackle this, we present ColBERT, a novel ranking model that adapts deep LMs (in particular, BERT) for efficient retrieval. ColBERT introduces a late interaction architecture that independently encodes the query and the document using BERT and then employs a cheap yet powerful interaction step that models their fine-grained similarity. By delaying and yet retaining this fine-granular interaction, ColBERT can leverage the expressiveness of deep LMs while simultaneously gaining the ability to pre-compute document representations offline, considerably speeding up query processing. Beyond reducing the cost of re-ranking the documents retrieved by a traditional model, ColBERT's pruning-friendly interaction mechanism enables leveraging vector-similarity indexes for end-to-end retrieval directly from a large document collection. We extensively evaluate ColBERT using two recent passage search datasets. Results show that ColBERT's effectiveness is competitive with existing BERT-based models (and outperforms every non-BERT baseline), while executing two orders-of-magnitude faster and requiring four orders-of-magnitude fewer FLOPs per query.

Self-similarity refers to the image prior widely used in image restoration algorithms that small but similar patterns tend to occur at different locations and scales. However, recent advanced deep convolutional neural network based methods for image restoration do not take full advantage of self-similarities by relying on self-attention neural modules that only process information at the same scale. To solve this problem, we present a novel Pyramid Attention module for image restoration, which captures long-range feature correspondences from a multi-scale feature pyramid. Inspired by the fact that corruptions, such as noise or compression artifacts, drop drastically at coarser image scales, our attention module is designed to be able to borrow clean signals from their "clean" correspondences at the coarser levels. The proposed pyramid attention module is a generic building block that can be flexibly integrated into various neural architectures. Its effectiveness is validated through extensive experiments on multiple image restoration tasks: image denoising, demosaicing, compression artifact reduction, and super resolution. Without any bells and whistles, our PANet (pyramid attention module with simple network backbones) can produce state-of-the-art results with superior accuracy and visual quality. Our code will be available at https://github.com/SHI-Labs/Pyramid-Attention-Networks

In this paper, we extend techniques developed in the context of the Travelling Salesperson Problem for cycle problems. Particularly, we study the shrinking of support graphs and the exact algorithms for subcycle elimination separation problems. The efficient application of the considered techniques has proved to be essential in the Travelling Salesperson Problem when solving large size problems by Branch-and-Cut, and this has been the motivation behind this work. Regarding the shrinking of support graphs, we prove the validity of the Padberg-Rinaldi general shrinking rules and the Crowder-Padberg subcycle-safe shrinking rules. Concerning the subcycle separation problems, we extend two exact separation algorithms, the Dynamic Hong and the Extended Padberg-Gr\"otschel algorithms, which are shown to be superior to the ones used so far in the literature of cycle problems.

The proposed techniques are empirically tested in 24 subcycle elimination problem instances generated by solving the Orienteering Problem (involving up to 15112 vertices) with Branch-and-Cut. The experiments suggest the relevance of the proposed techniques for cycle problems. The obtained average speedup for the subcycle separation problems in the Orienteering Problem when the proposed techniques are used together is around 50 times in medium-sized instances and around 250 times in large-sized instances.

It has been shown that the majority of existing adversarial defense methods achieve robustness at the cost of sacrificing prediction accuracy. We propose a novel defense framework, \emph{\underline{R}obust and \underline{A}ccurate \underline{I}mage classificatio\underline{N}} (RAIN), to improve the robustness of given CNN classifiers and, at the same time, preserve their high prediction accuracies. RAIN introduces a new randomization-enhancement scheme. It applies randomization over inputs to break the ties between the model forward prediction path and the backward gradient path, thus improving the model robustness. It then enhances the input's high-frequency details to retain the CNN's high prediction accuracy. Concretely, RAIN consists of two complementary randomization modules: randomized small circular shift (RdmSCS) and randomized down-upsampling (RdmDU). The \emph{RdmDU} module first randomly downsamples the input image. Then, the \emph{RdmSCS} module circularly shifts the input image along a randomly chosen direction by a small but random number of pixels. Finally, the RdmDU module performs upsampling with a high-performance super-resolution model, such as the EDSR, to reconstruct an image with rich details, since an empirical study we conduct reveals that the loss of high-frequency components in input images leads to a drop in the accuracy of a classifier. We conduct extensive experiments on the STL10 and ImageNet datasets to verify the effectiveness of RAIN. Our numerical results show that RAIN outperforms several state-of-the-art methods in both robustness and prediction accuracy.

It is a challenging task to extract the best of both worlds by combining the spatial characteristics of a visible image and the spectral content of an infrared image. In this work, we propose a spatially constrained adversarial autoencoder that extracts deep features from the infrared and visible images to obtain a more exhaustive and global representation. In this paper, we propose a residual autoencoder architecture, regularised by a residual adversarial network, to generate a more realistic fused image. The residual module serves as primary building for the encoder, decoder and adversarial network, as an add on the symmetric skip connections perform the functionality of embedding the spatial characteristics directly from the initial layers of encoder structure to the decoder part of the network. The spectral information in the infrared image is incorporated by adding the feature maps over several layers in the encoder part of the fusion structure, which makes inference on both the visual and infrared images separately. In order to efficiently optimize the parameters of the network, we propose an adversarial regulariser network which would perform supervised learning on the fused image and the original visual image.

Syntactic dependencies can be predicted with high accuracy, and are useful for both machine-learned and pattern-based information extraction tasks. However, their utility can be improved. These syntactic dependencies are designed to accurately reflect syntactic relations, and they do not make semantic relations explicit. Therefore, these representations lack many explicit connections between content words, that would be useful for downstream applications. Proposals like English Enhanced UD improve the situation by extending universal dependency trees with additional explicit arcs. However, they are not available to Python users, and are also limited in coverage. We introduce a broad-coverage, data-driven and linguistically sound set of transformations, that makes event-structure and many lexical relations explicit. We present pyBART, an easy-to-use open-source Python library for converting English UD trees either to Enhanced UD graphs or to our representation. The library can work as a standalone package or be integrated within a spaCy NLP pipeline. When evaluated in a pattern-based relation extraction scenario, our representation results in higher extraction scores than Enhanced UD, while requiring fewer patterns.

We derive a well-defined renormalized version of mutual information that allows to estimate the dependence between continuous random variables in the important case when one is deterministically dependent on the other. This is the situation relevant for feature extraction, where the goal is to produce a low-dimensional effective description of a high-dimensional system. Our approach enables the discovery of collective variables in physical systems, thus adding to the toolbox of artificial scientific discovery, while also aiding the analysis of information flow in artificial neural networks.

The performance of learning-based control techniques crucially depends on how effectively the system is explored. While most exploration techniques aim to achieve a globally accurate model, such approaches are generally unsuited for systems with unbounded state spaces. Furthermore, a globally accurate model is not required to achieve good performance in many common control applications, e.g., local stabilization tasks. In this paper, we propose an active learning strategy for Gaussian process state space models that aims to obtain an accurate model on a bounded subset of the state-action space. Our approach aims to maximize the mutual information of the exploration trajectories with respect to a discretization of the region of interest. By employing model predictive control, the proposed technique integrates information collected during exploration and adaptively improves its exploration strategy. To enable computational tractability, we decouple the choice of most informative data points from the model predictive control optimization step. This yields two optimization problems that can be solved in parallel. We apply the proposed method to explore the state space of various dynamical systems and compare our approach to a commonly used entropy-based exploration strategy. In all experiments, our method yields a better model within the region of interest than the entropy-based method.

It is commonly known that $\zeta(2k) = q_{k}\frac{\zeta(2k + 2)}{\pi^2}$ with known rational numbers $q_{k}$. In this work we construct recurrence relations of the form $\sum_{k = 1}^{\infty}r_{k}\frac{\zeta(2k + 1)}{\pi^{2k}} = 0$ and show that series representations for the coefficients $r_{k} \in \mathbb{R}$ can be computed explicitly.

Predicting the trajectories of surrounding agents is an essential ability for robots navigating complex real-world environments. Autonomous vehicles (AV) in particular, can generate safe and efficient path plans by predicting the motion of surrounding road users. Future trajectories of agents can be inferred using two tightly linked cues: the locations and past motion of agents, and the static scene structure. The configuration of the agents may uncover which part of the scene is more relevant, while the scene structure can determine the relative influence of agents on each other's motion. To better model the interdependence of the two cues, we propose a multi-head attention-based model that uses a joint representation of the static scene and agent configuration for generating both keys and values for the attention heads. Moreover, to address the multimodality of future agent motion, we propose to use each attention head to generate a distinct future trajectory of the agent. Our model achieves state of the art results on the publicly available nuScenes dataset and generates diverse future trajectories compliant with scene structure and agent configuration. Additionally, the visualization of attention maps adds a layer of interpretability to the trajectories predicted by the model.

Designing reward functions is a challenging problem in AI and robotics. Humans usually have a difficult time directly specifying all the desirable behaviors that a robot needs to optimize. One common approach is to learn reward functions from collected expert demonstrations. However, learning reward functions from demonstrations introduces many challenges: some methods require highly structured models, e.g. reward functions that are linear in some predefined set of features, while others adopt less structured reward functions that on the other hand require tremendous amount of data. In addition, humans tend to have a difficult time providing demonstrations on robots with high degrees of freedom, or even quantifying reward values for given demonstrations. To address these challenges, we present a preference-based learning approach, where as an alternative, the human feedback is only in the form of comparisons between trajectories. Furthermore, we do not assume highly constrained structures on the reward function. Instead, we model the reward function using a Gaussian Process (GP) and propose a mathematical formulation to actively find a GP using only human preferences. Our approach enables us to tackle both inflexibility and data-inefficiency problems within a preference-based learning framework. Our results in simulations and a user study suggest that our approach can efficiently learn expressive reward functions for robotics tasks.

We consider a routing game among non-atomic agents where link latency functions are conditional on an uncertain state of the network. All the agents have the same prior belief about the state, but only a fixed fraction receive private route recommendations or a common message, which are generated by a known randomization, referred to as private or public signal respectively. The remaining non-receiving agents choose route according to Bayes Nash flow with respect to the prior. We develop a computational approach to solve the optimal information design problem, i.e., to minimize expected social latency cost over all public or obedient private signals. For a fixed flow induced by non-receiving agents, design of an optimal private signal is shown to be a generalized problem of moments for affine link latency functions, and to admit an atomic solution for the basic two link case. Motivated by this, a hierarchy of polynomial optimization is proposed to approximate, with increasing accuracy, information design over private and public signals, when the non-receiving agents choose route according to Bayes Nash flow. The first level of this hierarchy is shown to be exact for the basic two link case.

We propose a new technique for pushing an unknown object from an initial configuration to a goal configuration with stability constraints. The proposed method leverages recent progress in differentiable physics models to learn unknown mechanical properties of pushed objects, such as their distributions of mass and coefficients of friction. The proposed learning technique computes the gradient of the distance between predicted poses of objects and their actual observed poses and utilizes that gradient to search for values of the mechanical properties that reduce the reality gap. The proposed approach is also utilized to optimize a policy to efficiently push an object toward the desired goal configuration. Experiments with real objects using a real robot to gather data show that the proposed approach can identify the mechanical properties of heterogeneous objects from a small number of pushing actions.

In this work, we investigate a class of elliptic inverse problems and aim to simultaneously recover multiple inhomogeneous inclusions arising from two different physical parameters, using very limited boundary Cauchy data collected only at one or two measurement events. We propose a new fast, stable and highly parallelable direct sampling method (DSM) for the simultaneous reconstruction process. Two groups of probing and index functions are constructed, and their desired properties are analyzed. In order to identify and decouple the multiple inhomogeneous inclusions of different physical nature, we introduce a new concept of mutually almost orthogonality property that generalizes the important concept of almost orthogonality property in classical DSMs for inhomogeneous inclusions of same physical nature. With the help of this new concept, we develop a reliable strategy to distinguish two different types of inhomogeneous inclusions with noisy data collected at one or two measurement events. We further improve the decoupling effect by choosing an appropriate boundary influx. Numerical experiments are presented to illustrate the robustness and efficiency of the proposed method.

The Word Mover's Distance (WMD) is a metric that measures the semantic dissimilarity between two text documents by computing the cost of moving all words of a source/query document to the most similar words of a target document optimally. Computing WMD between two documents is costly because it requires solving an optimization problem that costs $$O(V^3log(V))$$ where $$V$$ is the number of unique words in the document. Fortunately, the WMD can be framed as the Earth Mover's Distance (EMD) (also known as the Optimal Transportation Distance) for which it has been shown that the algorithmic complexity can be reduced to $$O(V^2)$$ by adding an entropy penalty to the optimization problem and a similar idea can be adapted to compute WMD efficiently. Additionally, the computation can be made highly parallel by computing WMD of a single query document against multiple target documents at once (e.g., finding whether a given tweet is similar to any other tweets happened in a day). In this paper, we present a shared-memory parallel Sinkhorn-Knopp Algorithm to compute the WMD of one document against many other documents by adopting the $$O(V^2)$$ EMD algorithm. We used algorithmic transformations to change the original dense compute-heavy kernel to a sparse compute kernel and obtained $$67\times$$ speedup using $$96$$ cores on the state-of-the-art of Intel\textregistered{} 4-sockets Cascade Lake machine w.r.t. its sequential run. Our parallel algorithm is over $$700\times$$ faster than the naive parallel python code that internally uses optimized matrix library calls.

Inertial measurement units are commonly used to estimate the attitude of moving objects. Numerous nonlinear filter approaches have been proposed for solving the inherent sensor fusion problem. However, when a large range of different dynamic and static rotational and translational motions is considered, the attainable accuracy is limited by the need for situation-dependent adjustment of accelerometer and gyroscope fusion weights. We investigate to what extent these limitations can be overcome by means of artificial neural networks and how much domain-specific optimization of the neural network model is required to outperform the conventional filter solution. A diverse set of motion recordings with a marker-based optical ground truth is used for performance evaluation and comparison. The proposed neural networks are found to outperform the conventional filter across all motions only if domain-specific optimizations are introduced. We conclude that they are a promising tool for inertial-sensor-based real-time attitude estimation, but both expert knowledge and rich datasets are required to achieve top performance.

New representations for an integral kernel of the transmutation operator and for a regular solution of the perturbed Bessel equation of the form $-u^{\prime\prime}+\left(\frac{\ell(\ell+1)}{x^{2}}+q(x)\right)u=\omega^{2}u$ are obtained. The integral kernel is represented as a Fourier-Jacobi series. The solution is represented as a Neumann series of Bessel functions uniformly convergent with respect to $\omega$. For the coefficients of the series convenient for numerical computation recurrent integration formulas are obtained. The new representation improves the ones from arXiv:1609.06679 and arXiv:1712.01363 for large values of $\omega$ and $\ell$ and for non-integer values of $\ell$.

The results are based on application of several ideas from the classical transmutation (transformation) operator theory, asymptotic formulas for the solution, results connecting the decay rate of the Fourier transform with the smoothness of a function, the Paley-Wiener theorem and some results from constructive approximation theory.

We show that the analytical representation obtained among other possible applications offers a simple and efficient numerical method able to compute large sets of eigendata with a nondeteriorating accuracy.

We demonstrate that a production-quality keyword-spotting model can be trained on-device using federated learning and achieve comparable false accept and false reject rates to a centrally-trained model. To overcome the algorithmic constraints associated with fitting on-device data (which are inherently non-independent and identically distributed), we conduct thorough empirical studies of optimization algorithms and hyperparameter configurations using large-scale federated simulations. To overcome resource constraints, we replace memory intensive MTR data augmentation with SpecAugment, which reduces the false reject rate by 56%. Finally, to label examples (given the zero visibility into on-device data), we explore teacher-student training.

In recent years, various deep learning architectures have been proposed to solve complex challenges (e.g., spatial dependency, temporal dependency) in traffic domain, which have achieved satisfactory performance. These architectures are composed of multiple deep learning techniques in order to tackle various challenges in traffic data. Traditionally, convolution neural networks (CNNs) are utilized to model spatial dependency by decomposing the traffic network as grids. However, many traffic networks are graph-structured in nature. In order to utilize such spatial information fully, it's more appropriate to formulate traffic networks as graphs mathematically. Recently, various novel deep learning techniques have been developed to process graph data, called graph neural networks (GNNs). More and more works combine GNNs with other deep learning techniques to construct an architecture dealing with various challenges in a complex traffic task, where GNNs are responsible for extracting spatial correlations in traffic network. These graph-based architectures have achieved state-of-the-art performance. To provide a comprehensive and clear picture of such emerging trend, this survey carefully examines various graph-based deep learning architectures in many traffic applications. We first give guidelines to formulate a traffic problem based on graph and construct graphs from various traffic data. Then we decompose these graph-based architectures to discuss their shared deep learning techniques, clarifying the utilization of each technique in traffic tasks. What's more, we summarize common traffic challenges and the corresponding graph-based deep learning solutions to each challenge. Finally, we provide benchmark datasets, open source codes and future research directions in this rapidly growing field.

Greybox fuzzing has been the most scalable and practical approach to software testing. Most greybox fuzzing tools are coverage guided as code coverage is strongly correlated with bug coverage. However, since most covered codes may not contain bugs, blindly extending code coverage is less efficient, especially for corner cases. Unlike coverage-based fuzzers who extend the code coverage in an undirected manner, a directed fuzzer spends most of its time budget on reaching specific target locations (e.g., the bug-prone zone) without wasting resources stressing unrelated parts. Thus, directed greybox fuzzing is particularly suitable for scenarios such as patch testing, bug reproduction, and special bug hunting. In this paper, we conduct the first in-depth study of directed greybox fuzzing. We investigate 28 state-of-the-art fuzzers (82% are published after 2019) closely related to DGF, which have various directed types and optimization techniques. Based on the feature of DGF, we extract 15 metrics to conduct a thorough assessment of the collected tools and systemize the knowledge of this field. Finally, we summarize the challenges and provide perspectives of this field, aiming to facilitate and boost future research on this topic.

This paper presents fast non-sampling based methods to assess the risk of trajectories for autonomous vehicles when probabilistic predictions of other agents' futures are generated by deep neural networks (DNNs). The presented methods address a wide range of representations for uncertain predictions including both Gaussian and non-Gaussian mixture models for predictions of both agent positions and controls. We show that the problem of risk assessment when Gaussian mixture models (GMMs) of agent positions are learned can be solved rapidly to arbitrary levels of accuracy with existing numerical methods. To address the problem of risk assessment for non-Gaussian mixture models of agent position, we propose finding upper bounds on risk using Chebyshev's Inequality and sums-of-squares (SOS) programming; they are both of interest as the former is much faster while the latter can be arbitrarily tight. These approaches only require statistical moments of agent positions to determine upper bounds on risk. To perform risk assessment when models are learned for agent controls as opposed to positions, we develop TreeRing, an algorithm analogous to tree search over the ring of polynomials that can be used to exactly propagate moments of control distributions into position distributions through nonlinear dynamics. The presented methods are demonstrated on realistic predictions from DNNs trained on the Argoverse and CARLA datasets and are shown to be effective for rapidly assessing the probability of low probability events.

We propose an in silico molecular associative memory model for pattern learning, storage and denoising using Pairwise Markov Random Field (PMRF) model. Our PMRF-based molecular associative memory model extracts locally distributed features from the exposed examples, learns and stores the patterns in the molecular associative memory and denoises the given noisy patterns via DNA computation based operations. Thus, our computational molecular model demonstrates the functionalities of content-addressability of human memory. Our molecular simulation results show that the averaged mean squared error between the learned and denoised patterns are low (< 0.014) up to 30% of noise.

The work establishes the exact performance limits of stochastic coded caching when users share a bounded number of cache states, and when the association between users and caches, is random. Under the premise that more balanced user-to-cache associations perform better than unbalanced ones, our work provides a statistical analysis of the average performance of such networks, identifying in closed form, the exact optimal average delivery time. To insightfully capture this delay, we derive easy to compute closed-form analytical bounds that prove tight in the limit of a large number $\Lambda$ of cache states. In the scenario where delivery involves $K$ users, we conclude that the multiplicative performance deterioration due to randomness -- as compared to the well-known deterministic uniform case -- can be unbounded and can scale as $\Theta\left( \frac{\log \Lambda}{\log \log \Lambda} \right)$ at $K=\Theta\left(\Lambda\right)$, and that this scaling vanishes when $K=\Omega\left(\Lambda\log \Lambda\right)$. To alleviate this adverse effect of cache-load imbalance, we consider various load balancing methods, and show that employing proximity-bounded load balancing with an ability to choose from $h$ neighboring caches, the aforementioned scaling reduces to $\Theta \left(\frac{\log(\Lambda / h)}{ \log \log(\Lambda / h)} \right)$, while when the proximity constraint is removed, the scaling is of a much slower order $\Theta \left( \log \log \Lambda \right)$. The above analysis is extensively validated numerically.

Neural Architecture Search (NAS) refers to automatically design the architecture. We propose an hourglass-inspired approach (HourNAS) for this problem that is motivated by the fact that the effects of the architecture often proceed from the vital few blocks. Acting like the narrow neck of an hourglass, vital blocks in the guaranteed path from the input to the output of a deep neural network restrict the information flow and influence the network accuracy. The other blocks occupy the major volume of the network and determine the overall network complexity, corresponding to the bulbs of an hourglass. To achieve an extremely fast NAS while preserving the high accuracy, we propose to identify the vital blocks and make them the priority in the architecture search. The search space of those non-vital blocks is further shrunk to only cover the candidates that are affordable under the computational resource constraints. Experimental results on the ImageNet show that only using 3 hours (0.1 days) with one GPU, our HourNAS can search an architecture that achieves a 77.0% Top-1 accuracy, which outperforms the state-of-the-art methods.

Explainable Artificial Intelligence (XAI) has experienced a significant growth over the last few years. This is due to the widespread application of machine learning, particularly deep learning, that has led to the development of highly accurate models but lack explainability and interpretability. A plethora of methods to tackle this problem have been proposed, developed and tested. This systematic review contributes to the body of knowledge by clustering these methods with a hierarchical classification system with four main clusters: review articles, theories and notions, methods and their evaluation. It also summarises the state-of-the-art in XAI and recommends future research directions.

We propose a multi-head attention mechanism as a blending layer in a neural network model that translates natural language to a high level behavioral language for indoor robot navigation. We follow the framework established by (Zang et al., 2018a) that proposes the use of a navigation graph as a knowledge base for the task. Our results show significant performance gains when translating instructions on previously unseen environments, therefore, improving the generalization capabilities of the model.

Normalizing flows have emerged as an important family of deep neural networks for modelling complex probability distributions. In this note, we revisit their coupling and autoregressive transformation layers as probabilistic graphical models and show that they reduce to Bayesian networks with a pre-defined topology and a learnable density at each node. From this new perspective, we provide three results. First, we show that stacking multiple transformations in a normalizing flow relaxes independence assumptions and entangles the model distribution. Second, we show that a fundamental leap of capacity emerges when the depth of affine flows exceeds 3 transformation layers. Third, we prove the non-universality of the affine normalizing flow, regardless of its depth.

This paper describes BIMCV COVID-19+, a large dataset from the Valencian Region Medical ImageBank (BIMCV) containing chest X-ray images CXR (CR, DX) and computed tomography (CT) imaging of COVID-19+ patients along with their radiological findings and locations, pathologies, radiological reports (in Spanish), DICOM metadata, Polymerase chain reaction (PCR), Immunoglobulin G (IgG) and Immunoglobulin M (IgM) diagnostic antibody tests. The findings have been mapped onto standard Unified Medical Language System (UMLS) terminology and cover a wide spectrum of thoracic entities, unlike the considerably more reduced number of entities annotated in previous datasets. Images are stored in high resolution and entities are localized with anatomical labels and stored in a Medical Imaging Data Structure (MIDS) format. In addition, 10 images were annotated by a team of radiologists to include semantic segmentation of radiological findings. This first iteration of the database includes 1,380 CX, 885 DX and 163 CT studies from 1,311 COVID-19+ patients. This is, to the best of our knowledge, the largest COVID-19+ dataset of images available in an open format. The dataset can be downloaded from \url{this http URL}.

This paper presents fast procedures for thermal infrared remote sensing in dark, GPS-denied environments, such as those found in industrial plants such as in High-Voltage Direct Current (HVDC) converter stations. These procedures are based on the combination of the depth estimation obtained from either a 1-Dimensional LIDAR laser or a 2-Dimensional Hokuyo laser or a 3D MultiSense SLB laser sensor and the visible and thermal cameras from a FLIR Duo R dual-sensor thermal camera. The combination of these sensors/cameras is suitable to be mounted on Unmanned Aerial Vehicles (UAVs) and/or robots in order to provide reliable information about the potential malfunctions, which can be found within the hazardous environment. For example, the capabilities of the developed software and hardware system corresponding to the combination of the 1-D LIDAR sensor and the FLIR Duo R dual-sensor thermal camera is assessed from the point of the accuracy of results and the required computational times: the obtained computational times are under 10 ms, with a maximum localization error of 8 mm and an average standard deviation for the measured temperatures of 1.11 degree Celsius, which results are obtained for a number of test cases. The paper is structured as follows: the description of the system used for identification and localization of hotspots in industrial plants is presented in section II. In section III, the method for faults identification and localization in plants by using a 1-Dimensional LIDAR laser sensor and thermal images is described together with results. In section IV the real time thermal image processing is presented. Fusion of the 2-Dimensional depth laser Hokuyo and the thermal images is described in section V. In section VI the combination of the 3D MultiSense SLB laser and thermal images is described. In section VII a discussion and several conclusions are drawn.

In general, the labels used in sequence labeling consist of different types of elements. For example, IOB-format entity labels, such as B-Person and I-Person, can be decomposed into span (B and I) and type information (Person). However, while most sequence labeling models do not consider such label components, the shared components across labels, such as Person, can be beneficial for label prediction. In this work, we propose to integrate label component information as embeddings into models. Through experiments on English and Japanese fine-grained named entity recognition, we demonstrate that the proposed method improves performance, especially for instances with low-frequency labels.

In this paper, we consider cross-domain imitation learning (CDIL) in which an agent in a target domain learns a policy to perform well in the target domain by observing expert demonstrations in a source domain without accessing any reward function. In order to overcome the domain difference for imitation learning, we propose a dual-structured learning method. The proposed learning method extracts two feature vectors from each input observation such that one vector contains domain information and the other vector contains policy expertness information, and then enhances feature vectors by synthesizing new feature vectors containing both target-domain and policy expertness information. The proposed CDIL method is tested on several MuJoCo tasks where the domain difference is determined by image angles or colors. Numerical results show that the proposed method shows superior performance in CDIL to other existing algorithms and achieves almost the same performance as imitation learning without domain difference.

Information Centric Networks (ICNs) have emerged in recent years as a new networking paradigm for the next-generation Internet. The primary goal of these networks is to provide effective mechanisms for content distribution and retrieval based on in-network content caching. The design of different ICN architectures addressed many of the security issues found in the traditional Internet. Therefore, allowing for a secure, reliable, and scalable communication over the Internet. However, recent research studies showed that these architectures are vulnerable to different types of DDoS attacks. In this paper, we propose a defense mechanism against distributed denial of service attacks (DDoS) in path-identifier based information centric networks. The proposed mechanism, called LogDos, performs GET Message logging based filtering and employs Bloom filter based logging to store incoming GET messages such that corresponding content messages are verified, while filtering packets originating from malicious hosts. We develop three versions of LogDos with varying levels of storage overhead at LogDos-enabled router. Extensive simulation experiments show that LogDos is very effective against DDoS attacks as it can filter more than 99.98 % of attack traffic in different attack scenarios while incurring acceptable storage overhead.

Many unsupervised representation learning methods belong to the class of similarity learning models. While various modality-specific approaches exist for different types of data, a core property of many methods is that representations of similar inputs are close under some similarity function. We propose EMDE (Efficient Manifold Density Estimator) - a framework utilizing arbitrary vector representations with the property of local similarity to succinctly represent smooth probability densities on Riemannian manifolds. Our approximate representation has the desirable properties of being fixed-size and having simple additive compositionality, thus being especially amenable to treatment with neural networks - both as input and output format, producing efficient conditional estimators. We generalize and reformulate the problem of multi-modal recommendations as conditional, weighted density estimation on manifolds. Our approach allows for trivial inclusion of multiple interaction types, modalities of data as well as interaction strengths for any recommendation setting. Applying EMDE to both top-k and session-based recommendation settings, we establish new state-of-the-art results on multiple open datasets in both uni-modal and multi-modal settings. We release the source code and our own real-world dataset of e-commerce product purchases, with special focus on modeling of the item cold-start problem.

Hateful rhetoric is plaguing online discourse, fostering extreme societal movements and possibly giving rise to real-world violence. A potential solution to this growing global problem is citizen-generated counter speech where citizens actively engage in hate-filled conversations to attempt to restore civil non-polarized discourse. However, its actual effectiveness in curbing the spread of hatred is unknown and hard to quantify. One major obstacle to researching this question is a lack of large labeled data sets for training automated classifiers to identify counter speech. Here we made use of a unique situation in Germany where self-labeling groups engaged in organized online hate and counter speech. We used an ensemble learning algorithm which pairs a variety of paragraph embeddings with regularized logistic regression functions to classify both hate and counter speech in a corpus of millions of relevant tweets from these two groups. Our pipeline achieved macro F1 scores on out of sample balanced test sets ranging from 0.76 to 0.97---accuracy in line and even exceeding the state of the art. On thousands of tweets, we used crowdsourcing to verify that the judgments made by the classifier are in close alignment with human judgment. We then used the classifier to discover hate and counter speech in more than 135,000 fully-resolved Twitter conversations occurring from 2013 to 2018 and study their frequency and interaction. Altogether, our results highlight the potential of automated methods to evaluate the impact of coordinated counter speech in stabilizing conversations on social media.

Criminal investigations rely on the collection of conversational data. The identity of speakers must be assessed in order to build or improve the accuracy of an existing criminal network. Investigators use social network analysis tools to identify the most central character and the different communities within the network. We introduce Crime Scene Investigation (CSI) television show as a potential candidate for criminal conversational data. We also introduce the metric of conversation accuracy in the context of criminal investigations. In this paper, a speaker identification baseline is improved by re-ranking candidate speakers based on the frequency of previous interactions between speakers and the topology of the criminal network. The proposed method can be applied to conversations involving two or more speakers. We show that our approach outperforms the baseline speaker accuracy by 1.3% absolute (1.5% relative), and the conversation accuracy by 3.7% absolute (4.7% relative) on CSI data.

Developing modern systems software is a complex task that combines business logic programming and Software Performance Engineering (SPE). The later is an experimental and labor-intensive activity focused on optimizing the system for a given hardware, software, and workload (hw/sw/wl) context.

Today's SPE is performed during build/release phases by specialized teams, and cursed by: 1) lack of standardized and automated tools, 2) significant repeated work as hw/sw/wl context changes, 3) fragility induced by a "one-size-fit-all" tuning (where improvements on one workload or component may impact others). The net result: despite costly investments, system software is often outside its optimal operating point - anecdotally leaving 30% to 40% of performance on the table.

The recent developments in Data Science (DS) hints at an opportunity: combining DS tooling and methodologies with a new developer experience to transform the practice of SPE. In this paper we present: MLOS, an ML-powered infrastructure and methodology to democratize and automate Software Performance Engineering. MLOS enables continuous, instance-level, robust, and trackable systems optimization. MLOS is being developed and employed within Microsoft to optimize SQL Server performance. Early results indicated that component-level optimizations can lead to 20%-90% improvements when custom-tuning for a specific hw/sw/wl, hinting at a significant opportunity. However, several research challenges remain that will require community involvement. To this end, we are in the process of open-sourcing the MLOS core infrastructure, and we are engaging with academic institutions to create an educational program around Software 2.0 and MLOS ideas.

Nowadays e-commerce search has become an integral part of many people's shopping routines. Two critical challenges stay in today's e-commerce search: how to retrieve items that are semantically relevant but not exact matching to query terms, and how to retrieve items that are more personalized to different users for the same search query. In this paper, we present a novel approach called DPSR, which stands for Deep Personalized and Semantic Retrieval, to tackle this problem. Explicitly, we share our design decisions on how to architect a retrieval system so as to serve industry-scale traffic efficiently and how to train a model so as to learn query and item semantics accurately. Based on offline evaluations and online A/B test with live traffics, we show that DPSR model outperforms existing models, and DPSR system can retrieve more personalized and semantically relevant items to significantly improve users' search experience by +1.29% conversion rate, especially for long tail queries by +10.03%. As a result, our DPSR system has been successfully deployed into JD.com's search production since 2019.

Currently, the screening of Wagner grades of diabetic feet (DF) still relies on professional podiatrists. However, in less-developed countries, podiatrists are scarce, which led to the majority of undiagnosed patients. In this study, we proposed the real-time detection and location method for Wagner grades of DF based on refinements on YOLOv3. We collected 2,688 data samples and implemented several methods, such as a visual coherent image mixup, label smoothing, and training scheduler revamping, based on the ablation study. The experimental results suggested that the refinements on YOLOv3 achieved an accuracy of 91.95% and the inference speed of a single picture reaches 31ms with the NVIDIA Tesla V100. To test the performance of the model on a smartphone, we deployed the refinements on YOLOv3 models on an Android 9 system smartphone. This work has the potential to lead to a paradigm shift for clinical treatment of the DF in the future, to provide an effective healthcare solution for DF tissue analysis and healing status.

Convolutional Neural Networks (CNNs) are now a well-established tool for solving computational imaging problems. Modern CNN-based algorithms obtain state-of-the-art performance in diverse image restoration problems. Furthermore, it has been recently shown that, despite being highly overparametrized, networks trained with a single corrupted image can still perform as well as fully trained networks, a phenomenon encapsulated in the deep image prior. We introduce a novel interpretation of denoising networks with no clean training data in the context of the neural tangent kernel (NTK), elucidating the strong links with well-known non-local filtering techniques, such as non-local means or BM3D. The filtering function associated with a given network architecture can be obtained in closed form without need to train the network, being fully characterized by the random initialization of the network weights. While the NTK theory accurately predicts the filter associated with networks trained using standard gradient descent, our analysis shows that it falls short to explain the behaviour of networks trained using the popular Adam optimizer. The latter achieves a larger change of weights in hidden layers, adapting the non-local filtering function during training. We evaluate our findings via extensive image denoising experiments.

Following early work on Hessian-free methods for deep learning, we study a stochastic generalized Gauss-Newton method (SGN) for training DNNs. SGN is a second-order optimization method, with efficient iterations, that we demonstrate to often require substantially fewer iterations than standard SGD to converge. As the name suggests, SGN uses a Gauss-Newton approximation for the Hessian matrix, and, in order to compute an approximate search direction, relies on the conjugate gradient method combined with forward and reverse automatic differentiation. Despite the success of SGD and its first-order variants, and despite Hessian-free methods based on the Gauss-Newton Hessian approximation having been already theoretically proposed as practical methods for training DNNs, we believe that SGN has a lot of undiscovered and yet not fully displayed potential in big mini-batch scenarios. For this setting, we demonstrate that SGN does not only substantially improve over SGD in terms of the number of iterations, but also in terms of runtime. This is made possible by an efficient, easy-to-use and flexible implementation of SGN we propose in the Theano deep learning platform, which, unlike Tensorflow and Pytorch, supports forward and reverse automatic differentiation. This enables researchers to further study and improve this promising optimization technique and hopefully reconsider stochastic second-order methods as competitive optimization techniques for training DNNs; we also hope that the promise of SGN may lead to forward and reverse automatic differentiation being added to Tensorflow or Pytorch. Our results also show that in big mini-batch scenarios SGN is more robust than SGD with respect to its hyperparameters (we never had to tune its step-size for our benchmarks!), which eases the expensive process of hyperparameter tuning that is instead crucial for the performance of first-order methods.

Much recent research effort has been directed to the development of efficient algorithms for solving minimax problems with theoretical convergence guarantees due to the relevance of these problems to a few emergent applications. In this paper, we propose a unified single-loop alternating gradient projection (AGP) algorithm for solving nonconvex-(strongly) concave and (strongly) convex-nonconcave minimax problems. AGP employs simple gradient projection steps for updating the primal and dual variables alternatively at each iteration. We show that it can find an $\varepsilon$-stationary point of the objective function in $\mathcal{O}\left( \varepsilon ^{-2} \right)$ (resp. $\mathcal{O}\left( \varepsilon ^{-4} \right)$) iterations under nonconvex-strongly concave (resp. nonconvex-concave) setting. Moreover, its gradient complexity to obtain an $\varepsilon$-stationary point of the objective function is bounded by $\mathcal{O}\left( \varepsilon ^{-2} \right)$ (resp., $\mathcal{O}\left( \varepsilon ^{-4} \right)$) under the strongly convex-nonconcave (resp., convex-nonconcave) setting. To the best of our knowledge, this is the first time that a simple and unified single-loop algorithm is developed for solving both nonconvex-(strongly) concave and (strongly) convex-nonconcave minimax problems. Moreover, the complexity results for solving the latter (strongly) convex-nonconcave minimax problems have never been obtained before in the literature.

Reconfigurable intelligent surface (RIS)-assisted communication appears as one of the potential enablers for sixth generation (6G) wireless networks by providing a new degree of freedom in the system design to telecom operators. Particularly, RIS-empowered millimeter wave (mmWave) communication systems can be a remedy to provide broadband and ubiquitous connectivity. This paper aims to fill an important gap in the open literature by providing a physical, accurate, open-source, and widely applicable RIS channel model for mmWave frequencies. Our model is not only applicable in various indoor and outdoor environments but also includes the physical characteristics of wireless propagation in the presence of RISs by considering 5G radio channel conditions. Various deployment scenarios are presented for RISs and useful insights are provided for system designers from the perspective of potential RIS use-cases and their efficient positioning. The scenarios in which the use of an RIS makes a big difference or might not have a big impact on the communication system performance, are revealed. The open-source and comprehensive SimRIS Channel Simulator is also introduced in this paper.