## Computer Science (cs) updates on the arXiv.org e-print archive



The current tsunami of deep learning (the hyper-vitamined return of artificial neural networks) applies not only to traditional statistical machine learning tasks such as prediction and classification (e.g., weather prediction and pattern recognition), but has already conquered other areas, such as translation. A growing area of application is the generation of creative content, in particular music, the topic of this paper. The motivation is to use the capacity of modern deep learning techniques to automatically learn musical styles from arbitrary musical corpora and then to generate musical samples from the estimated distribution, with some degree of control over the generation. This article provides a survey of music generation based on deep learning techniques. After a short introduction to the topic illustrated by a recent example, the article analyses some early works from the late 1980s using artificial neural networks for music generation and how their pioneering contributions foreshadowed current techniques. We then introduce a conceptual framework to analyze the various concepts and dimensions involved. Various examples of recent systems are introduced and analyzed to illustrate the variety of concerns and techniques.

In this paper, we study the problem of employing pre-trained language models for multi-turn response selection in retrieval-based chatbots. We propose a new model, named Speaker-Aware BERT (SA-BERT), to make the model aware of speaker-change information, an important and intrinsic property of multi-turn dialogues. Furthermore, a speaker-aware disentanglement strategy is proposed to tackle entangled dialogues. This strategy selects a small number of the most important utterances as the filtered context according to the speaker information they contain. Finally, domain adaptation is performed to incorporate in-domain knowledge into the pre-trained language models. Experiments on five public datasets show that our proposed model outperforms existing models on all metrics by large margins and achieves new state-of-the-art performance for multi-turn response selection.

The attention mechanism plays a dominant role in sequence generation models and has been used to improve the performance of machine translation and abstractive text summarization. Unlike in neural machine translation, in text summarization salience estimation for words, phrases, or sentences is a critical component, since the output summary is a distillation of the input text. Although the typical attention mechanism can select text fragments from the input conditioned on the decoder states, it still falls short of direct and effective salience detection. To bring direct salience estimation back to summarization with neural networks, we propose a Multi-Attention Learning framework that contains two new attention learning components for salience estimation: supervised attention learning and unsupervised attention learning. We regard the attention weights as salience information, meaning that semantic units with large attention values are considered more important. The context information obtained from the estimated salience is combined with the typical attention mechanism in the decoder to conduct summary generation. Extensive experiments on benchmark datasets in different languages demonstrate the effectiveness of the proposed framework for abstractive summarization.
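The idea of reading attention weights as salience scores can be sketched in a few lines; everything below (the toy encoder states, the dot-product scoring, the dimensions) is an illustrative assumption, not the paper's architecture:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

rng = np.random.default_rng(0)

# Toy encoder states for 5 input tokens and a decoder query state
# (stand-ins for learned representations).
H = rng.normal(size=(5, 4))        # (tokens, hidden)
query = rng.normal(size=4)         # current decoder state

scores = H @ query                 # dot-product attention scores
salience = softmax(scores)         # attention weights read as salience
context = salience @ H             # salience-weighted context vector
```

Tokens with larger `salience` entries contribute more to `context`, which is the sense in which the attention distribution doubles as a salience estimate.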

Many tasks in computer vision and graphics fall within the framework of conditional image synthesis. In recent years, generative adversarial nets (GANs) have delivered impressive advances in quality of synthesized images. However, it remains a challenge to generate both diverse and plausible images for the same input, due to the problem of mode collapse. In this paper, we develop a new generic multimodal conditional image synthesis method based on Implicit Maximum Likelihood Estimation (IMLE) and demonstrate improved multimodal image synthesis performance on two tasks, single image super-resolution and image synthesis from scene layouts. We make our implementation publicly available.
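The core IMLE idea, matching each data point to its nearest generated sample and pulling that sample toward the data (rather than the reverse, as in GANs), can be sketched with a toy linear "generator"; the linear map, dimensions, and step size here are illustrative assumptions, not the paper's method:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "generator": a linear map from 8-d latent codes to 2-d data space
# (a stand-in for a deep network; purely illustrative).
W = rng.normal(size=(2, 8)) * 0.1

def generate(z):
    return z @ W.T

def imle_step(data, n_samples=64, lr=0.05):
    """One IMLE step: sample latents, match each data point to its nearest
    generated sample, and move those samples toward the data."""
    global W
    z = rng.normal(size=(n_samples, 8))
    fake = generate(z)                                   # (n_samples, 2)
    d = ((data[:, None, :] - fake[None, :, :]) ** 2).sum(-1)
    nearest = d.argmin(axis=1)                           # nearest sample per data point
    diff = generate(z[nearest]) - data                   # (n_data, 2)
    W -= lr * (diff.T @ z[nearest]) / len(data)          # gradient of mean sq. distance
    return float((diff ** 2).sum(axis=1).mean())

data = rng.normal(size=(16, 2))
losses = [imle_step(data) for _ in range(100)]
```

Because every data point always has some nearby generated sample being pulled toward it, no data mode can be ignored, which is how IMLE sidesteps mode collapse.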

In the context of the dynamic evolution of workflow processes, the change region identifies the part of the old process from which migration to the new process is not guaranteed to be consistent. However, this approach may lead to overestimated regions, incorrectly identifying migratable instances as non-migratable. This overestimation causes delays due to the postponement of immediate migration. The paper analyzes this overestimation problem on a class of Petri net models. Structural properties leading to conditions for minimal change regions and overestimations are developed, resulting in a classification of change regions into two types, called Structural Change Regions and Perfect Structural Change Regions. Necessary and sufficient conditions for perfect regions are identified. The paper also discusses ways of computing them in terms of structural properties of the old and new processes.

In this paper, we present an implicit finite difference method for the numerical solution of the Black-Scholes model for American put options without dividend payments. We combine the proposed numerical method with a front-fixing approach in which the option price and the early exercise boundary are computed simultaneously. Consistency and stability properties of the method are studied. We improve the accuracy of the computed solution via mesh refinement based on Richardson extrapolation. Comparisons with other proposed methods for the American option problem are carried out to validate the obtained numerical results and to show the efficiency of the proposed numerical methods. Finally, using an \textit{a posteriori} error estimator, we find a suitable computational grid for which the computed solution satisfies a prescribed tolerance.
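Richardson extrapolation, the mesh-refinement device mentioned above, combines two approximations computed at step sizes h and h/2 to cancel the leading O(h^p) error term. A minimal sketch on a central difference (the test function and step size are illustrative, not the paper's Black-Scholes setting):

```python
import math

def richardson(f, h, p=2):
    """Richardson extrapolation: combine approximations at step sizes h and
    h/2 to cancel the leading O(h^p) error term."""
    coarse = f(h)
    fine = f(h / 2)
    return (2**p * fine - coarse) / (2**p - 1)

# Central difference for d/dx sin(x) at x = 1; exact value is cos(1).
def central_diff(h, x=1.0):
    return (math.sin(x + h) - math.sin(x - h)) / (2 * h)

h = 0.1
plain = central_diff(h / 2)          # second-order accurate
extrapolated = richardson(central_diff, h)   # effectively fourth-order
exact = math.cos(1.0)
```

The extrapolated value is markedly closer to the exact derivative than either raw approximation, at the cost of one extra evaluation on the finer grid.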

Due to its variety of real-world applications, the task of single-image crowd counting has received a lot of interest in recent years. Recently, several approaches have been proposed to address the various problems encountered in crowd counting. These approaches are essentially based on convolutional neural networks, which require large amounts of data to train their parameters. Considering this, we introduce a new large-scale unconstrained crowd counting dataset (JHU-CROWD++) that contains 4,372 images with 1.51 million annotations. In comparison to existing datasets, the proposed dataset is collected under a variety of diverse scenarios and environmental conditions. Specifically, the dataset includes several images with weather-based degradations and illumination variations, making it very challenging. Additionally, the dataset provides a rich set of annotations at both the image level and the head level. Several recent methods are evaluated and compared on this dataset. The dataset can be downloaded from this http URL .

Furthermore, we propose a novel crowd counting network that progressively generates crowd density maps via residual error estimation. The proposed method uses VGG16 as the backbone network and employs the density map generated by the final layer as a coarse prediction, which is refined into finer density maps in a progressive fashion using residual learning. Additionally, the residual learning is guided by an uncertainty-based confidence weighting mechanism that permits the flow of only high-confidence residuals in the refinement path. The proposed Confidence Guided Deep Residual Counting Network (CG-DRCN) is evaluated on recent complex datasets and achieves significant error reductions.

There is a fundamental gap between how humans understand and use language -- in open-ended, real-world situations -- and today's NLP benchmarks for language understanding. To narrow this gap, we propose to evaluate machines by their success at real-world language use -- which greatly expands the scope of language tasks that can be measured and studied.

We introduce TuringAdvice, a new challenge for language understanding systems. Given a complex situation faced by a real person, a machine must generate helpful advice. We make our challenge concrete by introducing RedditAdvice, a dataset and leaderboard for measuring progress. Though we release a training set with 600k examples, our evaluation is dynamic, continually evolving with the language people use: models must generate helpful advice for recently-written situations.

Empirical results show that today's models struggle at our task, even those with billions of parameters. The best model, a finetuned T5, writes advice that is at least as helpful as human-written advice in only 9% of cases. This low performance reveals language understanding errors that are hard to spot outside of a generative setting, showing much room for progress.

The scientific literature is growing faster than ever. Finding an expert in a particular scientific domain has never been harder than today, because of the increasing number of publications and the ever-growing diversity of expertise fields. To tackle this challenge, automatic expert finding algorithms rely on the vast heterogeneous scientific network to match textual queries with potential expert candidates. In this context, document network embedding methods seem an ideal choice for building representations of the scientific literature. Citation and authorship links contain major information complementary to the textual content of publications. In this paper, we propose a benchmark for expert finding in document networks, leveraging data extracted from a scientific citation network and three scientific question & answer websites. We compare the performance of several algorithms on these different sources of data and further study the applicability of embedding methods to an expert finding task.

Unsupervised representation learning holds the promise of exploiting large amounts of unlabeled data to learn general representations. A promising technique for unsupervised learning is the framework of Variational Auto-encoders (VAEs). However, unsupervised representations learned by VAEs are significantly outperformed by those learned by supervised learning for recognition. Our hypothesis is that, to learn useful representations for recognition, the model needs to be encouraged to learn about repeating and consistent patterns in data. Drawing inspiration from mid-level representation discovery work, we propose PatchVAE, which reasons about images at the patch level. Our key contribution is a bottleneck formulation that encourages mid-level style representations in the VAE framework. Our experiments demonstrate that representations learned by our method perform much better on recognition tasks than those learned by vanilla VAEs.

This paper experiments with the number of fully-connected layers in a deep convolutional neural network applied to the classification of fundus retinal images. The images analysed came from the ODIR 2019 dataset (Peking University International Competition on Ocular Disease Intelligent Recognition) [9], which includes images of various eye diseases (cataract, glaucoma, myopia, diabetic retinopathy, age-related macular degeneration (AMD), hypertension) as well as normal cases. This work focused on the classification of Normal, Cataract, AMD, and Myopia. The feature extraction (convolutional) part of the neural network is kept the same while the feature mapping (linear) part is changed. Different data sets, each differing in the number of classes it contains, are also explored with these networks. This paper hence aims to find the relationship between the number of classes and the number of fully-connected layers. We found that the effect of increasing the number of fully-connected layers depends on the type of data set being used. For simple, linearly separable data sets, adding fully-connected layers is worth exploring and can improve training accuracy, although a direct correlation was not found. However, as the complexity of the data set increases (more overlapping classes), increasing the number of fully-connected layers causes the neural network to stop learning, and this happens sooner the more complex the data set is.

We study the suitability of keystroke dynamics for authenticating 100K users typing free text. For this, we first analyze to what extent our method, based on a Siamese Recurrent Neural Network (RNN) called TypeNet, is able to authenticate users when the amount of data per user is scarce, a common scenario in free-text keystroke authentication. With 1K test users, a population size comparable to previous works, TypeNet obtains an equal error rate of 4.8% using only 5 enrollment sequences and 1 test sequence per user, with 50 keystrokes per sequence. Using the same amount of data per user, as the number of test users is scaled up to 100K, performance decays by less than 5% relative to the 1K case, demonstrating the potential of TypeNet to scale well to large numbers of users. Our experiments are conducted with the Aalto University keystroke database which, to the best of our knowledge, is the largest free-text keystroke database captured, with more than 136M keystrokes from 168K users.

The availability of low-cost sensors has led to an unprecedented growth in the volume of spatial data. However, the time required to evaluate even simple spatial queries over large data sets greatly hampers our ability to interactively explore these data sets and extract actionable insights. Graphics Processing Units (GPUs) are increasingly being used to speed up spatial queries. However, existing GPU-based solutions have two important drawbacks: they are often tightly coupled to the specific query types they target, making it hard to adapt them to other queries; and since their design is based on CPU-based approaches, it can be difficult to effectively utilize all the benefits provided by the GPU. As a first step towards making GPU spatial query processing mainstream, we propose a new model that represents spatial data as geometric objects and define an algebra consisting of GPU-friendly composable operators that operate over these objects. We demonstrate the expressiveness of the proposed algebra by formulating standard spatial queries as algebraic expressions. We also present a proof-of-concept prototype that supports a subset of the operators and show that it is at least two orders of magnitude faster than a CPU-based implementation. This performance gain is obtained both on a discrete Nvidia mobile GPU and on the less powerful integrated GPUs common in commodity laptops.

In this work, an inverse problem for the fractional diffusion equation with a random source is considered. The measurements used are the statistical moments of realizations of single-point data $u(x_0,t,\omega)$. We build a representation of the solution $u$ in an integral sense, then prove that the unknowns can be bounded by the moments theoretically. For the numerical reconstruction, we establish an iterative algorithm of regularized Levenberg-Marquardt type, and some numerical results generated by this algorithm are displayed. For the case of highly heterogeneous media, the Generalized Multiscale Finite Element Method (GMsFEM) is employed.

In natural language processing, relation extraction seeks to extract structured semantic relations from unstructured text. Here, we propose a novel SpanBERT-based graph convolutional network (DG-SpanBERT) that extracts semantic features from a raw sentence using the pre-trained language model SpanBERT and pools latent features with a graph convolutional network (GCN). Our DG-SpanBERT model inherits the advantage of SpanBERT in learning rich lexical features from large-scale corpora. It can also capture long-range relations between entities thanks to the use of a GCN over the dependency tree. Experimental results show that our model outperforms existing dependency-based and sequence-based models and achieves state-of-the-art performance on the TACRED dataset.

High-capacity models require vast amounts of data, and data augmentation is a common remedy when this resource is limited. Standard augmentation techniques apply small hand-tuned transformations to existing data, which is a brittle process that realistically only allows for simple transformations. We propose a Bayesian interpretation of data augmentation where the transformations are modelled as latent variables to be marginalized, and show how these can be inferred variationally in an end-to-end fashion. This allows for significantly more complex transformations than manual tuning, and the marginalization implies a form of test-time data augmentation. The resulting model can be interpreted as a probabilistic extension of spatial transformer networks. Experimentally, we demonstrate improvements in accuracy and uncertainty quantification in image and time series classification tasks.
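The test-time data augmentation implied by the marginalization can be sketched as Monte-Carlo averaging of predictions over sampled transformations; the toy linear "classifier" and the additive-noise transformation below are illustrative assumptions, not the paper's learned transformation model:

```python
import numpy as np

rng = np.random.default_rng(1)

def predict(x):
    """Toy two-class classifier: softmax over two fixed linear scores
    (a stand-in for a trained network)."""
    logits = np.array([x[:8].sum(), x[8:].sum()])
    e = np.exp(logits - logits.max())
    return e / e.sum()

def tta_predict(x, n=64, sigma=0.1):
    """Monte-Carlo approximation of the prediction marginalized over a
    transformation distribution (here: additive Gaussian perturbations)."""
    probs = np.zeros(2)
    for _ in range(n):
        probs += predict(x + sigma * rng.normal(size=x.shape))
    return probs / n

x = rng.normal(size=16)
p = tta_predict(x)
```

Averaging over transformations smooths the predictive distribution, which is one way marginalizing latent transformations can improve uncertainty quantification.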

Sparsity-inducing regularization problems are ubiquitous in machine learning applications, ranging from feature selection to model compression. In this paper, we present a novel stochastic method, the Orthant Based Proximal Stochastic Gradient Method (OBProx-SG), to solve perhaps the most popular instance, the l1-regularized problem. The OBProx-SG method contains two steps: (i) a proximal stochastic gradient step to predict a support cover of the solution; and (ii) an orthant step to aggressively enhance the sparsity level via orthant face projection. Compared to state-of-the-art methods, e.g., Prox-SG, RDA and Prox-SVRG, OBProx-SG not only converges to the global optimal solutions (in the convex scenario) or stationary points (in the non-convex scenario), but also promotes the sparsity of the solutions substantially. In particular, on a large number of convex problems, OBProx-SG comprehensively outperforms existing methods in terms of sparsity exploration and objective values. Moreover, experiments on non-convex deep neural networks, e.g., MobileNetV1 and ResNet18, further demonstrate its superiority by achieving solutions of much higher sparsity without sacrificing generalization accuracy.
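The two steps can be sketched on a toy l1-regularized least-squares problem. For clarity this sketch uses full (deterministic) gradients and an orthonormal design, whereas OBProx-SG itself works with stochastic gradients; every concrete choice below (problem, step size, phase-switch point) is an illustrative assumption:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy l1-regularized least squares: min_x 0.5*||A x - b||^2 + lam*||x||_1,
# with orthonormal columns so the solution is easy to reason about.
A, _ = np.linalg.qr(rng.normal(size=(40, 10)))
x_true = np.zeros(10)
x_true[:3] = [2.0, -1.5, 1.0]
b = A @ x_true + 0.01 * rng.normal(size=40)
lam, lr = 0.5, 0.5

def soft_threshold(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

x = np.zeros(10)
for it in range(300):
    grad = A.T @ (A @ x - b)
    if it < 150:
        # (i) proximal gradient step: predicts a support cover of the solution.
        x = soft_threshold(x - lr * grad, lr * lam)
    else:
        # (ii) orthant step: move within the orthant of the current iterate,
        # then project onto its face (zero any coordinate whose sign flips),
        # aggressively promoting sparsity.
        sign = np.sign(x)
        y = x - lr * (grad + lam * sign)
        x = np.where(np.sign(y) == sign, y, 0.0)
```

The orthant-face projection is what distinguishes step (ii) from a plain subgradient step: coordinates that try to cross zero are pinned there instead of oscillating with small nonzero values.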

We present ESP4ML, an open-source system-level design flow to build and program SoC architectures for embedded applications that require the hardware acceleration of machine learning and signal processing algorithms. We realized ESP4ML by combining two established open-source projects (ESP and HLS4ML) into a new, fully-automated design flow. For the SoC integration of accelerators generated by HLS4ML, we designed a set of new parameterized interface circuits synthesizable with high-level synthesis. For accelerator configuration and management, we developed an embedded software runtime system on top of Linux. With this HW/SW layer, we addressed the challenge of dynamically shaping the data traffic on a network-on-chip to activate and support the reconfigurable pipelines of accelerators that are needed by the application workloads currently running on the SoC. We demonstrate our vertically-integrated contributions with the FPGA-based implementations of complete SoC instances booting Linux and executing computer-vision applications that process images taken from the Google Street View database.

There has been great progress in improving streaming machine translation, a simultaneous paradigm where the system appends to a growing hypothesis as more source content becomes available. We study a related problem in which revisions to the hypothesis beyond strictly appending words are permitted. This is suitable for applications such as live captioning an audio feed. In this setting, we compare custom streaming approaches to re-translation, a straightforward strategy where each new source token triggers a distinct translation from scratch. We find re-translation to be as good or better than state-of-the-art streaming systems, even when operating under constraints that allow very few revisions. We attribute much of this success to a previously proposed data-augmentation technique that adds prefix-pairs to the training data, which alongside wait-k inference forms a strong baseline for streaming translation. We also highlight re-translation's ability to wrap arbitrarily powerful MT systems with an experiment showing large improvements from an upgrade to its base model.
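The re-translation strategy is easy to state concretely: every new source token triggers a fresh translation of the whole prefix, so earlier output may be revised rather than only appended to. A toy sketch (the three-word French lexicon and the adjective-reordering rule are invented for illustration; a real system would call a full MT model):

```python
# Toy lexicon and reordering rule standing in for a real MT system.
LEXICON = {"un": "a", "chat": "cat", "noir": "black"}
ADJECTIVES = {"noir"}

def translate(source_tokens):
    """Translate the full prefix from scratch; French adjectives are moved
    in front of the preceding noun, which can revise earlier output."""
    out = []
    for tok in source_tokens:
        word = LEXICON.get(tok, tok)
        if tok in ADJECTIVES and out:
            out.insert(-1, word)
        else:
            out.append(word)
    return out

def retranslation_stream(source_tokens):
    """Each new source token triggers a distinct translation from scratch."""
    return [translate(source_tokens[:i]) for i in range(1, len(source_tokens) + 1)]

hyps = retranslation_stream(["un", "chat", "noir"])
```

Here the second hypothesis is revised, not merely extended, once the trailing adjective arrives, which is exactly the flexibility that strict streaming systems give up.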

Causal inference is at the heart of empirical research in natural and social sciences and is critical for scientific discovery and informed decision making. The gold standard in causal inference is performing randomized controlled trials; unfortunately these are not always feasible due to ethical, legal, or cost constraints. As an alternative, methodologies for causal inference from observational data have been developed in statistical studies and social sciences. However, existing methods critically rely on restrictive assumptions such as the study population consisting of homogeneous elements that can be represented in a single flat table, where each row is referred to as a unit. In contrast, in many real-world settings, the study domain naturally consists of heterogeneous elements with complex relational structure, where the data is naturally represented in multiple related tables. In this paper, we present a formal framework for causal inference from such relational data. We propose a declarative language called CaRL for capturing causal background knowledge and assumptions and specifying causal queries using simple Datalog-like rules.CaRL provides a foundation for inferring causality and reasoning about the effect of complex interventions in relational domains. We present an extensive experimental evaluation on real relational data to illustrate the applicability of CaRL in social sciences and healthcare.

Specific low-bitrate coding strategies are examined through their effect on LQ control performance. By limiting the subject to these methods, we are able to identify principles underlying coding for control, a subject of significant recent interest but one with few tangible results. In particular, we consider coding the quantized output signal using period-two codes with differing delay-versus-accuracy tradeoffs. Coding performance is quantified via the LQ control cost. The feedback control system comprises the coder-decoder in the path between the output and the state estimator, which is followed by linear state-variable feedback, as is optimal in the memoryless case. The quantizer is treated as the functional composition of an infinitely long linear staircase function and a saturation. This permits the analysis to subdivide into estimator computations, seemingly independent of the control performance criterion, and an escape-time evaluation, which ties the control back into the choice of quantizer saturation bound. An example is studied which illustrates the role of the control objective in determining the efficacy of coding with these schemes. The results mesh well with those observed in signal coding. However, the introduction of a realization-based escape time is a novelty that departs significantly from mean-square computations.
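The quantizer decomposition used in the analysis, an unbounded uniform staircase composed with a saturation, can be written directly; the mid-tread step size and saturation bound below are arbitrary illustrative values:

```python
import numpy as np

def saturate(x, bound):
    """Clip the input to the quantizer's saturation bound."""
    return np.clip(x, -bound, bound)

def staircase(x, step):
    """Infinitely long uniform (mid-tread) staircase: round to the nearest
    multiple of the step size."""
    return step * np.round(x / step)

def quantize(x, step=0.25, bound=1.0):
    """Quantizer as the functional composition of the staircase and the
    saturation, mirroring the decomposition described above."""
    return staircase(saturate(x, bound), step)
```

The split matters because the staircase part drives the estimator computations while the saturation bound alone enters the escape-time evaluation.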

We consider a scenario wherein two parties, Alice and Bob, are provided with $X_{1}^{n}$ and $X_{2}^{n}$ samples that are IID from a PMF $p_{X_1 X_2}$. Alice and Bob can communicate with Charles over (noiseless) communication links of rates $R_1$ and $R_2$, respectively. Their goal is to enable Charles to generate samples $Y^{n}$ such that the triple $(X_{1}^{n},X_{2}^{n},Y^{n})$ has a PMF that is close, in total variation, to $\prod p_{X_1 X_2 Y}$. In addition, the three parties may possess shared common randomness at rate $C$. We address the problem of characterizing the set of rate triples $(R_1,R_2,C)$ for which the above goal can be accomplished. We provide a set of sufficient conditions, i.e., an achievable rate region, for this three-party setup. Our work also provides a complete characterization of a point-to-point setup wherein Bob is absent and Charles is provided with side information.
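The closeness criterion here is total variation distance, which for PMFs on a finite alphabet is half the l1 distance between the probability vectors; a quick sketch with made-up PMFs:

```python
import numpy as np

def total_variation(p, q):
    """Total variation distance between two PMFs on the same finite
    alphabet: half the l1 distance between the probability vectors."""
    return 0.5 * float(np.abs(np.asarray(p) - np.asarray(q)).sum())

# Two example PMFs on a 3-symbol alphabet (illustrative values).
d = total_variation([0.5, 0.3, 0.2], [0.4, 0.4, 0.2])
```

Equivalently, it is the largest possible difference the two distributions can assign to any single event, which is why it is the natural yardstick for "indistinguishable" simulated samples.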

Transport protocols use port numbers to allow connection multiplexing on Internet hosts. TCP and UDP, the two most widely used transport protocols, have limitations on what constitutes a valid and invalid port number. One example of an invalid port number for these protocols is port 0. In this work, we present preliminary results from analyzing port 0 traffic at a large European IXP. In one week of traffic we find 74 GB of port 0 traffic. The vast majority of this traffic has both source and destination ports set to 0, suggesting scanning or reconnaissance as its root cause. Our analysis also shows that more than half of all port 0 traffic is targeted at just 18 ASes, whereas more than half of it originates from about 100 ASes, suggesting a more diverse set of source ASes.

Gauge invariance is a fundamental concept in physics, known to provide mathematical justification for the fundamental forces. In this paper, we provide discrete counterparts to the main gauge-theoretical concepts directly in terms of Cellular Automata. More precisely, the notions of gauge invariance and gauge equivalence in Cellular Automata are formalized. A step-by-step gauging procedure to enforce this symmetry upon a given Cellular Automaton is developed, and three examples of gauge-invariant Cellular Automata are examined.

In this paper, we identify a new phenomenon called activation-divergence, which occurs in Federated Learning (FL) due to data heterogeneity (i.e., data being non-IID) across multiple users. Specifically, we argue that the activation vectors in FL can diverge, even if subsets of users share a few common classes with data residing on different devices. To address the activation-divergence issue, we introduce a prior based on the principle of maximum entropy; this prior assumes minimal information about the per-device activation vectors and aims at making the activation vectors of the same classes as similar as possible across multiple devices. Our results show that, for both IID and non-IID settings, our proposed approach results in better accuracy (due to the significantly more similar activation vectors across devices) and is more communication-efficient than state-of-the-art approaches in FL. Finally, we illustrate the effectiveness of our approach on a few common benchmarks and two large medical datasets.

Neural approaches to natural language processing (NLP) often fail at the logical reasoning needed for deeper language understanding. In particular, neural approaches to reasoning that rely on embedded \emph{generalizations} of a knowledge base (KB) implicitly model which facts are \emph{plausible}, but may not model which facts are \emph{true} according to the KB. While generalizing the facts in a KB is useful for KB completion, the inability to distinguish between plausible inferences and logically entailed conclusions can be problematic in settings like KB question answering (KBQA). We propose here a novel KB embedding scheme that supports generalization but also allows accurate logical reasoning with a KB. Our approach introduces two new mechanisms for KB reasoning: neural retrieval over a set of embedded triples, and "memorization" of highly specific information with a compact sketch structure. Experimentally, this leads to substantial improvements over the state of the art on two KBQA benchmarks.

The Russian Drug Reaction Corpus (RuDReC) is a new partially annotated corpus of consumer reviews in Russian about pharmaceutical products, intended for the detection of health-related named entities and of the effectiveness of pharmaceutical products. The corpus consists of two parts, a raw one and a labelled one. The raw part includes 1.4 million health-related user-generated texts collected from various Internet sources, including social media. The labelled part contains 500 consumer reviews about drug therapy with drug- and disease-related information. Sentences are labelled for the presence or absence of health-related issues; those mentioning such issues are additionally labelled at the expression level to identify fine-grained subtypes such as drug classes and drug forms, drug indications, and drug reactions. Further, we present baseline models for named entity recognition (NER) and multi-label sentence classification on this corpus. A macro F1 score of 74.85% on the NER task was achieved by our RuDR-BERT model. For the sentence classification task, our model achieves a macro F1 score of 68.82%, a gain of 7.47% over a BERT model trained on Russian data. We make the RuDReC corpus and pretrained weights of the domain-specific BERT models freely available at https://github.com/cimm-kzn/RuDReC

When video collections become huge, exploring them efficiently, both within and across videos, is challenging. Video summarization is one way to tackle this issue. Traditional summarization approaches limit the effectiveness of video exploration because they generate only one fixed summary for a given input video, independent of the user's information need. In this work, we introduce a method that takes a text-based query as input and generates a corresponding video summary. We do so by modeling video summarization as a supervised learning problem and propose an end-to-end deep learning based method for query-controllable video summarization that generates a query-dependent video summary. Our proposed method consists of a video summary controller, a video summary generator, and a video summary output module. To foster research on query-controllable video summarization and to conduct our experiments, we introduce a dataset that contains frame-based relevance score labels. Our experimental results show that the text-based query helps control the video summary and improves model performance. Our code and dataset: https://github.com/Jhhuangkay/Query-controllable-Video-Summarization.

We study the problem of designing interval-valued observers that simultaneously estimate the system state and learn an unknown dynamic model for partially unknown nonlinear systems with dynamic unknown inputs and bounded noise signals. Leveraging affine abstraction methods and the existence of nonlinear decomposition functions, as well as applying our previously developed data-driven function over-approximation/abstraction approach to over-estimate the unknown dynamic model, our proposed observer recursively computes the maximal and minimal elements of the estimate intervals that are proven to contain the true augmented states. Then, using observed output/measurement signals, the observer iteratively shrinks the intervals by eliminating estimates that are not compatible with the measurements. Finally, given new interval estimates, the observer updates the over-approximation of the unknown model dynamics. Moreover, we provide sufficient conditions for uniform boundedness of the sequence of estimate interval widths, i.e., stability of the designed observer, in the form of tractable (mixed-)integer programs with finitely countable feasible sets.
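The propagate-then-shrink cycle the observer performs can be illustrated on a toy scalar system; the dynamics, bounds, and measurement below are invented for illustration and are far simpler than the paper's nonlinear, partially unknown setting:

```python
def propagate(lo, hi, a=0.9, w_bound=0.05):
    """Propagate an interval through toy stable scalar dynamics
    x+ = a*x + w with a >= 0 and |w| <= w_bound (a stand-in for the
    decomposition-function step)."""
    return a * lo - w_bound, a * hi + w_bound

def shrink(lo, hi, y, v_bound):
    """Eliminate estimates incompatible with the measurement y = x + v,
    |v| <= v_bound, by intersecting the propagated interval with the
    measurement-consistent set."""
    return max(lo, y - v_bound), min(hi, y + v_bound)

# One observer cycle: propagate the interval, then shrink it with a measurement.
lo, hi = -1.0, 1.0
lo, hi = propagate(lo, hi)                    # wider: process noise enters
lo, hi = shrink(lo, hi, y=0.3, v_bound=0.1)   # tighter: measurement cuts it down
```

Boundedness of the interval width then hinges on the shrink step removing at least as much width as propagation adds, which is what the paper's stability conditions formalize.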

Formal methods for state space exploration have been successfully applied to evaluate robust, critical software systems. Formal methods enable the discovery of error conditions that conventional testing may miss and can aid in planning complex system operations. However, their broad application has been hampered by the effort required to generate formal specifications for real systems. In this paper we present the State Linked Interface Compliance Engine for Data (SLICED), a methodology that addresses the complexity of formal state machine specification generation by leveraging conventional engineering models to derive compositional formal state models and to generate formal assertions on the state machines. We demonstrate SLICED using the Virtual ADAPT model published by NASA and validate our results by replicating them using Simulink.

We present a new supervised image classification method for problems where the data at hand conform to certain deformation models applied to unknown prototypes or templates. The method makes use of the previously described Radon Cumulative Distribution Transform (R-CDT) for image data, whose mathematical properties are exploited to express the image data in a form that is more suitable for machine learning. While certain operations such as translation, scaling, and higher-order transformations are challenging to model in native image space, we show the R-CDT can capture some of these variations and thus render the associated image classification problems easier to solve. The method is simple to implement, non-iterative, has no hyper-parameters to tune, is computationally efficient, and provides accuracies competitive with state-of-the-art neural networks for many types of classification problems, especially in a learning with few labels setting. Furthermore, we show improvements with respect to neural network-based methods in terms of computational efficiency (it can be implemented without the use of GPUs), number of training samples needed for training, as well as out-of-distribution generalization. The Python code for reproducing our results is available at https://github.com/rohdelab/rcdt_ns_classifier.
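The key property the R-CDT exploits is easiest to see in one dimension (the R-CDT applies a 1D Cumulative Distribution Transform along each Radon projection). A minimal NumPy sketch, assuming a uniform reference density and ignoring boundary quantiles; the signals below are illustrative, not from the paper:

```python
import numpy as np

def cdt(signal, x):
    # 1D Cumulative Distribution Transform w.r.t. a uniform reference:
    # normalize the signal to a density, form its CDF, and sample the
    # inverse CDF on an evenly spaced grid of quantile levels.
    density = signal / signal.sum()
    cdf = np.cumsum(density)
    levels = np.linspace(0.0, 1.0, len(x))
    return np.interp(levels, cdf, x)

x = np.linspace(-10.0, 10.0, 2001)
bump = np.exp(-x**2 / 2)
shifted = np.exp(-(x - 2.0)**2 / 2)
c1, c2 = cdt(bump, x), cdt(shifted, x)
# A translation of the signal becomes an additive shift of its CDT,
# which is what makes such deformation classes easier to separate
# with simple (even linear) classifiers.
```

Away from the boundary quantiles, `c2 - c1` is approximately the constant shift of 2 between the two bumps.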

The increasing use of Internet-of-Things (IoT) devices for monitoring a wide spectrum of applications, along with the challenges of "big data" streaming support they often require for data analysis, is nowadays pushing for an increased attention to the emerging edge computing paradigm. In particular, smart approaches to manage and analyze data directly on the network edge are increasingly investigated, and Artificial Intelligence (AI) powered edge computing is envisaged to be a promising direction. In this paper, we focus on Data Centers (DCs) and Supercomputers (SCs), where a new generation of high-resolution monitoring systems is being deployed, opening new opportunities for analysis like anomaly detection and security, but introducing new challenges for handling the vast amount of data it produces. In detail, we report on a novel lightweight and scalable approach to increase the security of DCs/SCs, which involves AI-powered edge computing on high-resolution power consumption measurements. The method -- called pAElla -- targets real-time Malware Detection (MD), it runs on an out-of-band IoT-based monitoring system for DCs/SCs, and involves Power Spectral Density of power measurements, along with AutoEncoders. Results are promising, with an F1-score close to 1, and False Alarm and Malware Miss rates close to 0%. We compare our method with State-of-the-Art MD techniques and show that, in the context of DCs/SCs, pAElla can cover a wider range of malware, significantly outperforming SoA approaches in terms of accuracy. Moreover, we propose a methodology for online training suitable for DCs/SCs in production, and release an open dataset and code.
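The overall detection recipe can be caricatured in a few lines: compute a log power spectral density per power trace, learn a low-dimensional reconstruction on benign traces, and flag traces whose reconstruction error is large. A minimal NumPy sketch, substituting a linear PCA "autoencoder" for the paper's AutoEncoders and a plain periodogram for the PSD estimate; the signals and threshold rule are illustrative assumptions:

```python
import numpy as np

def log_psd(x):
    # Log power spectral density via a plain periodogram.
    p = np.abs(np.fft.rfft(x))**2 / len(x)
    return np.log10(p + 1e-12)

rng = np.random.default_rng(0)
t = np.arange(2048) / 1000.0
# Benign traces: a dominant 50 Hz component plus mild measurement noise.
benign = [np.sin(2*np.pi*50*t) + 0.1*rng.standard_normal(t.size)
          for _ in range(20)]

X = np.stack([log_psd(x) for x in benign])
mu = X.mean(axis=0)
_, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
W = Vt[:4]  # 4 principal components act as a linear "bottleneck"

def recon_error(x):
    # Distance between a trace's spectrum and its low-dim reconstruction.
    z = log_psd(x) - mu
    return float(np.linalg.norm(z - W.T @ (W @ z)))

threshold = 1.5 * max(recon_error(x) for x in benign)
suspicious = rng.standard_normal(t.size)  # broadband, unlike benign traces
```

A trace whose spectral signature deviates from the benign baseline, like `suspicious` here, reconstructs poorly and exceeds the threshold.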

Back-translation has proven to be an effective method to utilize monolingual data in neural machine translation (NMT), and iteratively conducting back-translation can further improve the model performance. Selecting which monolingual data to back-translate is crucial, as we require that the resulting synthetic data are of high quality \textit{and} reflect the target domain. To achieve these two goals, data selection and weighting strategies have been proposed, with a common practice being to select samples close to the target domain but also dissimilar to the average general-domain text. In this paper, we provide insights into this commonly used approach and generalize it to a dynamic curriculum learning strategy, which is applied to iterative back-translation models. In addition, we propose weighting strategies based on both the current quality of the sentence and its improvement over the previous iteration. We evaluate our models on domain adaptation, low-resource, and high-resource MT settings and on two language pairs. Experimental results demonstrate that our methods achieve improvements of up to 1.8 BLEU points over competitive baselines.
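The static heuristic this paper generalizes — select samples close to the target domain but dissimilar to average general-domain text — can be illustrated with a cross-entropy-difference score in the spirit of Moore-Lewis selection: a sentence is favored when an in-domain language model likes it more than a general-domain one. A toy sketch with add-one-smoothed unigram LMs; the corpora and scoring granularity are illustrative assumptions, not the paper's setup:

```python
import math
from collections import Counter

def unigram_lm(corpus):
    # Add-one-smoothed unigram language model over whitespace tokens.
    counts = Counter(w for s in corpus for w in s.split())
    total, vocab = sum(counts.values()), len(counts) + 1
    return lambda w: (counts[w] + 1) / (total + vocab)

def domain_score(sentence, p_in, p_gen):
    # Per-word log-probability difference: positive means the sentence
    # looks more in-domain than general-domain.
    words = sentence.split()
    return sum(math.log(p_in(w) / p_gen(w)) for w in words) / len(words)

p_in = unigram_lm(["the patient received treatment",
                   "clinical treatment results"])
p_gen = unigram_lm(["the cat sat on the mat",
                    "dogs run in the park"])
```

Ranking candidate monolingual sentences by this score (and re-weighting it across iterations) is the kind of selection the dynamic curriculum generalizes.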

The Lean mathematical library mathlib is developed by a community of users with very different backgrounds and levels of experience. To lower the barrier of entry for contributors and to lessen the burden of reviewing contributions, we have developed a number of tools for the library which check proof developments for subtle mistakes in the code and generate documentation suited for our varied audience.

Segmentation of Multiple Sclerosis (MS) lesions in longitudinal brain MR scans is performed to monitor the progression of MS lesions. To improve segmentation, we use spatio-temporal cues in longitudinal data. To that end, we propose two complementary approaches: a longitudinal segmentation architecture grounded in early fusion of longitudinal data, and a novel multi-task learning approach that defines an auxiliary self-supervised task of deformable registration between two time-points to guide the neural network toward learning from spatio-temporal changes. We show the effectiveness of our methods on two datasets: an in-house dataset comprising 70 patients with one follow-up study per patient, and the ISBI longitudinal MS lesion segmentation challenge dataset, which has 19 patients with three to five follow-up studies. Our results show that spatio-temporal information in longitudinal data is a beneficial cue for improving segmentation. Code is publicly available.

Image manipulation can be considered a special case of image generation where the image to be produced is a modification of an existing image. Image generation and manipulation have been, for the most part, tasks that operate on raw pixels. However, the remarkable progress in learning rich image and object representations has opened the way for tasks such as text-to-image or layout-to-image generation that are mainly driven by semantics. In our work, we address the novel problem of image manipulation from scene graphs, in which a user can edit images by merely applying changes in the nodes or edges of a semantic graph that is generated from the image. Our goal is to encode image information in a given constellation and from there on generate new constellations, such as replacing objects or even changing relationships between objects, while respecting the semantics and style from the original image. We introduce a spatio-semantic scene graph network that does not require direct supervision for constellation changes or image edits. This makes it possible to train the system from existing real-world datasets with no additional annotation effort.

With the growing popularity of deep-learning based NLP models comes a need for interpretable systems. But what is interpretability, and what constitutes a high-quality interpretation? In this opinion piece we reflect on the current state of interpretability evaluation research. We call for more clearly differentiating between the different desired criteria an interpretation should satisfy, and focus on the faithfulness criterion. We survey the literature with respect to faithfulness evaluation, and arrange the current approaches around three assumptions, providing an explicit form to how faithfulness is "defined" by the community. We provide concrete guidelines on how evaluation of interpretation methods should and should not be conducted. Finally, we claim that the current binary definition for faithfulness sets a potentially unrealistic bar for being considered faithful. We call for discarding the binary notion of faithfulness in favor of a more graded one, which we believe will be of greater practical utility.

We propose a method for building large collections of human poses with full 3D annotations captured "in the wild", for which specialized capture equipment cannot be used. We start with a dataset with 2D keypoint annotations, such as COCO and MPII, and generate corresponding 3D poses. This is done via Exemplar Fine-Tuning (EFT), a new method to fit a 3D parametric model to 2D keypoints. EFT is accurate and can exploit a data-driven pose prior to resolve the depth reconstruction ambiguity that comes from using only 2D observations as input. We use EFT to augment these large in-the-wild datasets with plausible and accurate 3D pose annotations. We then use this data to strongly supervise a 3D pose regression network, achieving state-of-the-art results on standard benchmarks, including those collected outdoors. This network also achieves unprecedented 3D pose estimation quality on extremely challenging Internet videos.

As the COVID-19 pandemic continues its march around the world, an unprecedented amount of open data is being generated for genetics and epidemiological research. The unparalleled rate at which many research groups around the world are releasing data and publications on the ongoing pandemic is allowing other scientists to learn from local experiences and data generated in the front lines of the COVID-19 pandemic. However, there is a need to integrate additional data sources that map and measure the role of social dynamics of such a unique world-wide event into biomedical, biological, and epidemiological analyses. For this purpose, we present a large-scale curated dataset of over 152 million tweets, growing daily, related to COVID-19 chatter generated between January 1st and April 4th (at the time of writing). This open dataset will allow researchers to conduct a number of research projects relating to the emotional and mental responses to social distancing measures, the identification of sources of misinformation, and the stratified measurement of sentiment towards the pandemic in near real time.

Manipulation in cluttered environments like homes requires stable grasps, precise placement and robustness against external contact. We present the Soft-Bubble gripper system with a highly compliant gripping surface and dense-geometry visuotactile sensing, capable of multiple kinds of tactile perception. We first present various mechanical design advances and a fabrication technique to deposit custom patterns on the internal surface of the sensor that enable tracking of shear-induced displacement of the manipuland. The depth maps output by the internal imaging sensor are used in an in-hand proximity pose estimation framework -- the method better captures distances to corners or edges on the manipuland geometry. We also extend our previous work on tactile classification and integrate the system within a robust manipulation pipeline for cluttered home environments. The capabilities of the proposed system are demonstrated through robust execution of multiple real-world manipulation tasks. A video of the system in action can be found here: [https://youtu.be/G_wBsbQyBfc].

We present a novel greedy Gauss-Seidel method for solving large linear least squares problems. This method improves the greedy randomized coordinate descent (GRCD) method proposed recently by Bai and Wu [Bai ZZ, and Wu WT. On greedy randomized coordinate descent methods for solving large linear least-squares problems. Numer Linear Algebra Appl. 2019;26(4):1--15], which in turn improves the popular randomized Gauss-Seidel method. Convergence analysis of the new method is provided. Numerical experiments show that, for the same accuracy, our method outperforms the GRCD method in terms of computing time.
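In outline, methods of this family repeatedly pick a "good" coordinate of the normal-equations residual and update only that entry. A minimal NumPy sketch of a greedy coordinate update for min_x ||Ax - b||_2; the selection rule below is a simplified stand-in, not necessarily the paper's exact greedy criterion:

```python
import numpy as np

def greedy_gauss_seidel(A, b, iters=500):
    # Greedy coordinate descent for least squares: at each step,
    # update the coordinate with the largest norm-scaled correlation
    # between its column and the current residual.
    m, n = A.shape
    x = np.zeros(n)
    col_norms = np.sum(A**2, axis=0)
    r = b - A @ x
    for _ in range(iters):
        g = A.T @ r                       # normal-equations residual
        j = int(np.argmax(g**2 / col_norms))
        step = g[j] / col_norms[j]        # exact 1-D minimization
        x[j] += step
        r -= step * A[:, j]
    return x

rng = np.random.default_rng(1)
A = rng.standard_normal((50, 5))
b = rng.standard_normal(50)
x = greedy_gauss_seidel(A, b)
x_ref, *_ = np.linalg.lstsq(A, b, rcond=None)
```

On this small well-conditioned problem the iterate converges to the least-squares solution to machine precision.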

In this short note, we show that cutting cycles of rods is fixed-parameter tractable by reducing the problem to computing a feedback vertex set in a mixed graph.

Autotuning techniques are a promising approach to minimize the otherwise tedious manual effort of optimizing scientific applications for a specific target platform. Ideally, an autotuning approach is capable of reliably identifying the most efficient implementation variant(s) for a new target system or new characteristics of the input by applying suitable program transformations and analytic models. In this work, we introduce Offsite, an offline autotuning approach which automates this selection process at installation time by rating implementation variants based on an analytic performance model without requiring time-consuming runtime experiments. From abstract multilevel YAML description languages, Offsite automatically derives optimized, platform-specific and problem-specific code of possible implementation variants and applies the performance model to these implementation variants.

We apply Offsite to parallel numerical methods for ordinary differential equations (ODEs). In particular, we investigate tuning a specific class of explicit ODE solvers (PIRK methods) for various initial value problems (IVPs) on shared-memory systems. Our experiments demonstrate that Offsite is able to reliably identify a set of the most efficient implementation variants for given test configurations (ODE solver, IVP, platform) and is capable of effectively handling important autotuning scenarios.

The precise segmentation of retinal blood vessels is of great significance for early diagnosis of eye-related diseases such as diabetes and hypertension. In this work, we propose a lightweight network named Spatial Attention U-Net (SA-UNet) that does not require thousands of annotated training samples and can be utilized in a data augmentation manner to use the available annotated samples more efficiently. SA-UNet introduces a spatial attention module which infers the attention map along the spatial dimension and then multiplies the attention map by the input feature map for adaptive feature refinement. In addition, the proposed network employs a structured dropout convolutional block instead of the original convolutional block of U-Net to prevent the network from overfitting. We evaluate SA-UNet on two benchmark retinal datasets: the Digital Retinal Images for Vessel Extraction (DRIVE) dataset and the Child Heart and Health Study (CHASE_DB1) dataset. The results show that our proposed SA-UNet achieves state-of-the-art retinal vessel segmentation accuracy on both datasets.
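The spatial attention step can be sketched in a few lines: pool the feature map along the channel axis, turn the pooled maps into a single attention map in [0, 1], and rescale the input. A NumPy sketch with a hand-fixed 1x1 combination standing in for the learned convolution of CBAM-style modules; the weights and shapes are illustrative assumptions:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def spatial_attention(x, w=(0.5, 0.5), bias=0.0):
    # x: feature map of shape (C, H, W).
    # Pool along the channel axis, combine the two pooled maps with a
    # hand-fixed 1x1 "convolution" (a learned 7x7 conv in practice),
    # squash to (0, 1), and rescale the input feature map.
    avg_pool = x.mean(axis=0)                      # (H, W)
    max_pool = x.max(axis=0)                       # (H, W)
    attn = sigmoid(w[0] * avg_pool + w[1] * max_pool + bias)
    return x * attn[None, :, :]                    # broadcast over channels

x = np.random.default_rng(0).standard_normal((3, 4, 5))
out = spatial_attention(x)
```

Because the attention map lies in (0, 1), the module can only attenuate features, refining where the network "looks" spatially without changing the tensor shape.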

Retinal vessel segmentation plays an important role in the field of retinal image analysis because changes in retinal vascular structure can aid in the diagnosis of diseases such as hypertension and diabetes. In recent research, numerous successful segmentation methods for fundus images have been proposed, but for other retinal imaging modalities more research is needed to explore vascular extraction. In this work, we propose an efficient method to segment blood vessels in Scanning Laser Ophthalmoscopy (SLO) retinal images. Inspired by U-Net, "feature map reuse" and residual learning, we propose a deep dense residual network structure called DRNet. In DRNet, feature maps of previous blocks are adaptively aggregated into subsequent layers as input, which not only facilitates spatial reconstruction, but also learns more efficiently due to more stable gradients. Furthermore, we introduce DropBlock to alleviate the overfitting problem of the network. We train and test this model on the recent SLO public dataset. The results show that our method achieves state-of-the-art performance even without data augmentation.

Coronavirus disease (COVID-19) emerged towards the end of 2019, and the World Health Organization (WHO) identified it as a global pandemic. There is consensus that using Computerized Tomography (CT) techniques for early diagnosis of the disease gives both fast and accurate results, and expert radiologists have stated that COVID-19 displays distinctive patterns in CT images. In this study, a novel method is proposed that fuses and ranks deep features to detect COVID-19 in the early phase. 16x16 (Subset-1) and 32x32 (Subset-2) patches were obtained from 150 CT images to generate sub-datasets; within the scope of the proposed method, 3000 patch images were labelled as COVID-19 or No finding for use in the training and testing phases. Feature fusion and ranking were applied to increase the performance of the proposed method, and the processed data were then classified with a Support Vector Machine (SVM). Compared with other pre-trained Convolutional Neural Network (CNN) models used in transfer learning, the proposed method shows high performance on Subset-2, with 98.27% accuracy, 98.93% sensitivity, 97.60% specificity, 97.63% precision, 98.28% F1-score and 96.54% Matthews Correlation Coefficient (MCC).

Retinal vessel segmentation is a vital step for the diagnosis of many early eye-related diseases. In this work, we propose a new deep learning model, namely Channel Attention Residual U-Net (CAR-U-Net), to accurately segment retinal vascular and non-vascular pixels. In this model, the channel attention mechanism was introduced into the residual block and a Channel Attention Residual Block (CARB) was proposed to enhance the discriminative ability of the network by considering the interdependence between feature channels. Moreover, to prevent the convolutional network from overfitting, a Structured Dropout Residual Block (SDRB) was proposed, consisting of a pre-activated residual block and DropBlock. The results show that our proposed CAR-U-Net reaches state-of-the-art performance on two publicly available retinal vessel datasets: DRIVE and CHASE DB1.

Deep learning based models have surpassed classical machine learning based approaches in various text classification tasks, including sentiment analysis, news categorization, question answering, and natural language inference. In this work, we provide a detailed review of more than 150 deep learning based models for text classification developed in recent years, and discuss their technical contributions, similarities, and strengths. We also provide a summary of more than 40 popular datasets widely used for text classification. Finally, we provide a quantitative analysis of the performance of different deep learning models on popular benchmarks, and discuss future research directions.

Classic deep learning methods achieve impressive results in image recognition over large-scale artificially-balanced datasets. However, real-world datasets exhibit highly class-imbalanced distributions. In this work we address the problem of long tail recognition wherein the training set is highly imbalanced and the test set is kept balanced. The key challenges faced by any long tail recognition technique are relative imbalance amongst the classes and data scarcity or unseen concepts for medium-shot or few-shot classes. Existing techniques rely on data-resampling, cost-sensitive learning, online hard example mining, reshaping the loss objective and complex memory-based models to address this problem. We instead propose an ensemble of experts technique that decomposes the imbalanced problem into multiple balanced classification problems which are more tractable. Our ensemble of experts reaches close to state-of-the-art results and an extended ensemble establishes a new state-of-the-art on two benchmarks for long tail recognition. We conduct numerous experiments to analyse the performance of the ensemble, and show that in modern datasets relative imbalance is a harder problem than data scarcity.
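The decomposition idea can be illustrated by grouping classes of similar frequency, so that each expert trains on a roughly balanced sub-problem. The grouping rule below is an illustrative guess at the general idea (classes within a fixed frequency ratio share a group), not the paper's exact procedure:

```python
def class_groups(counts, ratio=4):
    # counts: dict class -> training-sample count.
    # Walk classes from most to least frequent and start a new group
    # whenever the frequency gap to the group's head exceeds `ratio`,
    # so every group (one per expert) is roughly balanced internally.
    ordered = sorted(counts, key=counts.get, reverse=True)
    groups, cur = [], [ordered[0]]
    for c in ordered[1:]:
        if counts[cur[0]] / counts[c] > ratio:
            groups.append(cur)
            cur = [c]
        else:
            cur.append(c)
    groups.append(cur)
    return groups

counts = {"a": 1000, "b": 900, "c": 100, "d": 90, "e": 10}
groups = class_groups(counts)
```

Each group then defines one balanced classification problem; an ensemble combines the experts' predictions at test time.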

While image captioning has progressed rapidly, existing works focus mainly on describing single images. In this paper, we introduce a new task, context-aware group captioning, which aims to describe a group of target images in the context of another group of related reference images. Context-aware group captioning requires not only summarizing information from both the target and reference image groups but also contrasting between them. To solve this problem, we propose a framework combining a self-attention mechanism with contrastive feature construction to effectively summarize common information from each image group while capturing discriminative information between them. To build the dataset for this task, we propose to group the images and generate the group captions based on single image captions using scene graph matching. Our datasets are constructed on top of the public Conceptual Captions dataset and our new Stock Captions dataset. Experiments on the two datasets show the effectiveness of our method on this new task. Related datasets and code are released at https://lizw14.github.io/project/groupcap .

The increasingly collaborative, globalized nature of scientific research combined with the need to share data and the explosion in data volumes present an urgent need for a scientific data management system (SDMS). An SDMS presents a logical and holistic view of data that greatly simplifies and empowers data organization, curation, searching, sharing, dissemination, etc. We present DataFed -- a lightweight, distributed SDMS that spans a federation of storage systems within a loosely-coupled network of scientific facilities. Unlike existing SDMS offerings, DataFed uses high-performance and scalable user management and data transfer technologies that simplify deployment, maintenance, and expansion of DataFed. DataFed provides web-based and command-line interfaces to manage data and integrate with complex scientific workflows. DataFed represents a step towards reproducible scientific research by enabling reliable staging of the correct data at the desired environment.

This paper proposes a novel framework for the segmentation of phonocardiogram (PCG) signals into heart states, exploiting the temporal evolution of the PCG as well as considering the salient information that it provides for the detection of the heart state. We propose the use of recurrent neural networks and exploit recent advancements in attention-based learning to segment the PCG signal. This allows the network to identify the most salient aspects of the signal and disregard uninformative information. The proposed method attains state-of-the-art performance on multiple benchmarks including both human and animal heart recordings. Furthermore, we empirically analyse different feature combinations including envelope features, wavelets, and Mel Frequency Cepstral Coefficients (MFCCs), and provide quantitative measurements that explore the importance of different features in the proposed approach. We demonstrate that a recurrent neural network coupled with attention mechanisms can effectively learn from irregular and noisy PCG recordings. Our analysis of different feature combinations shows that MFCC features and their derivatives offer the best performance compared to classical wavelet and envelope features. Heart sound segmentation is a crucial pre-processing step for many diagnostic applications. The proposed method provides a cost-effective alternative to labour-intensive manual segmentation, and provides a more accurate segmentation than existing methods. As such, it can improve the performance of further analysis including the detection of murmurs and ejection clicks. The proposed method is also applicable for detection and segmentation of other one-dimensional biomedical signals.

We consider the problem of incrementally maintaining the triangle queries with arbitrary free variables under single-tuple updates to the input relations. We introduce an approach called IVM$^\epsilon$ that exhibits a trade-off between the update time, the space, and the delay for the enumeration of the query result, such that the update time ranges from the square root to linear in the database size while the delay ranges from constant to linear time. IVM$^\epsilon$ achieves Pareto worst-case optimality in the update-delay space conditioned on the Online Matrix-Vector Multiplication conjecture. It is strongly Pareto optimal for the triangle queries with zero or three free variables and weakly Pareto optimal for the triangle queries with one or two free variables.

Machine learning applications that are implemented with spike-based computation model, e.g., Spiking Neural Network (SNN), have a great potential to lower the energy consumption when they are executed on a neuromorphic hardware. However, compiling and mapping an SNN to the hardware is challenging, especially when compute and storage resources of the hardware (viz. crossbar) need to be shared among the neurons and synapses of the SNN. We propose an approach to analyze and compile SNNs on a resource-constrained neuromorphic hardware, providing guarantee on key performance metrics such as execution time and throughput. Our approach makes the following three key contributions. First, we propose a greedy technique to partition an SNN into clusters of neurons and synapses such that each cluster can fit on to the resources of a crossbar. Second, we exploit the rich semantics and expressiveness of Synchronous Dataflow Graphs (SDFGs) to represent a clustered SNN and analyze its performance using Max-Plus Algebra, considering the available compute and storage capacities, buffer sizes, and communication bandwidth. Third, we propose a self-timed execution-based fast technique to compile and admit SNN-based applications to a neuromorphic hardware at run-time, adapting dynamically to the available resources on the hardware. We evaluate our approach with standard SNN-based applications and demonstrate a significant performance improvement compared to current practices.
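The first contribution, greedy clustering under crossbar resource limits, can be sketched abstractly: walk the synapse list and start a new cluster whenever adding a synapse would exceed the crossbar's neuron or synapse capacity. This is a simplified illustration of the packing idea, not the paper's actual partitioning algorithm:

```python
def greedy_partition(synapses, max_neurons, max_synapses):
    # synapses: list of (pre, post) neuron pairs.
    # Greedily pack synapses into clusters so that each cluster's
    # neuron and synapse counts fit on a single crossbar.
    clusters = []
    cur_syn, cur_neurons = [], set()
    for pre, post in synapses:
        new_neurons = cur_neurons | {pre, post}
        if len(cur_syn) + 1 > max_synapses or len(new_neurons) > max_neurons:
            clusters.append((cur_syn, cur_neurons))  # close current cluster
            cur_syn, cur_neurons = [], set()
            new_neurons = {pre, post}
        cur_syn = cur_syn + [(pre, post)]
        cur_neurons = new_neurons
    if cur_syn:
        clusters.append((cur_syn, cur_neurons))
    return clusters

# A toy chain network, packed onto crossbars of 3 neurons / 2 synapses.
clusters = greedy_partition([(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)],
                            max_neurons=3, max_synapses=2)
```

Each resulting cluster maps to one crossbar; the clustered network is then what the SDFG-based analysis reasons about.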

Smallholder farmers in Tanzania are challenged by the lack of tools for early detection of banana diseases. This study aimed at developing a mobile application for early detection of Fusarium wilt race 1 and black Sigatoka banana diseases using deep learning. We used a dataset of 3000 banana leaf images. We pre-trained our model on the Resnet152 and Inceptionv3 Convolutional Neural Network architectures. Resnet152 achieved an accuracy of 99.2% and Inceptionv3 an accuracy of 95.41%. For deployment on Android mobile phones, we chose Inceptionv3 since it has lower memory requirements than Resnet152. In a real environment, the mobile application detected the two diseases with a confidence level of 99% on the captured leaf area. This result indicates the potential to improve the yield of bananas for smallholder farmers using a tool for early detection of diseases.

Despite several (accepted) standards, core notions typically employed in information technology (IT) architectures lack the precise and exact foundations encountered in logic, algebra, and other branches of mathematics. In this contribution we define the term "architecture" in a mathematically rigorous way. We motivate our particular choice by demonstrating (i) how commonly understood and expected properties of an architecture can be suitably defined or derived within our formalization, and (ii) how our concept is fully compatible with real-life (business) architectures. Based on our fundamental definitions we further develop a rigorous notion of architectural \emph{similarity}, grounded in "homomorphisms" between architectures. We demonstrate the (theoretical) applicability by deriving some theorems on the characterization of n-tier architectures.
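As a toy rendering of the homomorphism idea (the paper's actual definitions are richer), an architecture can be modeled as a set of components plus directed connections, and a structure-preserving map between two architectures checked directly; all names below are hypothetical:

```python
def is_homomorphism(h, arch1, arch2):
    # arch = (components, connections), with connections a set of
    # (source, target) pairs. h maps components of arch1 to components
    # of arch2 and must preserve every connection.
    comps1, conns1 = arch1
    comps2, conns2 = arch2
    return (all(h[c] in comps2 for c in comps1)
            and all((h[a], h[b]) in conns2 for a, b in conns1))

# A 2-component architecture mapped into a 3-tier one.
small = ({"ui", "api"}, {("ui", "api")})
tiers = ({"web", "svc", "db"}, {("web", "svc"), ("svc", "db")})
good = {"ui": "web", "api": "svc"}   # preserves the ui->api connection
bad = {"ui": "web", "api": "db"}     # web->db is not a connection in tiers
```

Under such a definition, the existence of a homomorphism gives one precise sense in which one architecture is "similar to" (embeddable in) another.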

The success of pretrained transformer language models in natural language processing has led to a wide range of different pretraining setups. These models employ a variety of subword tokenization methods, most notably byte pair encoding (BPE) (Sennrich et al., 2016; Gage, 1994), the WordPiece method (Schuster and Nakajima, 2012), and unigram language modeling (Kudo, 2018), to segment text. However, to the best of our knowledge, the literature does not contain a direct evaluation of the impact of tokenization on language model pretraining. First, we analyze differences between BPE and unigram LM tokenization, and find that the unigram LM method is able to recover subword units that more strongly align with underlying morphology, in addition to avoiding several shortcomings of BPE stemming from its greedy construction procedure. We then compare the fine-tuned task performance of identical transformer masked language models pretrained with these tokenizations. Across downstream tasks, we find that the unigram LM tokenization method consistently matches or outperforms BPE. We hope that developers of future pretrained language models will consider adopting the unigram LM method over the more common BPE.
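The greedy construction the authors critique is easy to exhibit: BPE repeatedly merges the currently most frequent adjacent token pair, with no global objective (unlike the unigram LM method, which prunes a large vocabulary against a likelihood). A toy sketch; the corpus is an illustrative assumption:

```python
from collections import Counter

def bpe_merges(words, num_merges):
    # words: dict word -> frequency; tokens start as single characters.
    # Greedily merge the most frequent adjacent pair, num_merges times.
    vocab = {tuple(w): f for w, f in words.items()}
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for toks, f in vocab.items():
            for a, b in zip(toks, toks[1:]):
                pairs[(a, b)] += f
        if not pairs:
            break
        (a, b), _ = pairs.most_common(1)[0]
        merges.append((a, b))
        new_vocab = {}
        for toks, f in vocab.items():
            out, i = [], 0
            while i < len(toks):
                if i + 1 < len(toks) and toks[i] == a and toks[i + 1] == b:
                    out.append(a + b)
                    i += 2
                else:
                    out.append(toks[i])
                    i += 1
            new_vocab[tuple(out)] = f
        vocab = new_vocab
    return merges, vocab

merges, vocab = bpe_merges({"low": 5, "lower": 2, "lowest": 3}, 2)
```

Two merges already produce the unit "low", but notice the procedure commits to each merge based only on local frequency, which is the source of the shortcomings the paper discusses.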

The global expansion of maritime activities and the development of the Automatic Identification System (AIS) have driven the advances in maritime monitoring systems in the last decade. Monitoring vessel behavior is fundamental to safeguard maritime operations, protecting other vessels sailing the ocean and the marine fauna and flora. Given the enormous volume of vessel data continually being generated, real-time analysis of vessel behaviors is only possible because of decision support systems provided with event and anomaly detection methods. However, current works on vessel event detection are ad-hoc methods able to handle only a single or a few predefined types of vessel behavior. Most of the existing approaches do not learn from the data and require the definition of queries and rules for describing each behavior. In this paper, we discuss challenges and opportunities in classical machine learning and deep learning for vessel event and anomaly detection. We hope to motivate the research of novel methods and tools, since addressing these challenges is an essential step towards actual intelligent maritime monitoring systems.

This paper develops a distributed solution to the fully-heterogeneous containment control problem (CCP), for which not only the followers' dynamics but also the leaders' dynamics are non-identical. A novel formulation of the fully-heterogeneous CCP is first presented in which each follower constructs its virtual exo-system. To build these virtual exo-systems by followers, a novel distributed algorithm is developed to calculate the so-called normalized level of influences (NLIs) of all leaders on each follower and a novel adaptive distributed observer is designed to estimate the dynamics and states of all leaders that have an influence on each follower. Then, a distributed control protocol is proposed based on the cooperative output regulation framework, utilizing this virtual exo-system. Based on estimations of leaders' dynamics and states and NLIs of leaders on each follower, the solutions of the so-called linear regulator equations are calculated in a distributed manner, and consequently, a distributed control protocol is designed for solving the output containment problem. Finally, theoretical results are verified by performing numerical simulations.

Advanced systems such as IoT comprise many heterogeneous, interconnected, and autonomous entities operating in often highly dynamic environments. Due to their large scale and complexity, large volumes of monitoring data are generated and need to be stored, retrieved, and mined in a time- and resource-efficient manner. Architectural self-adaptation automates the control, orchestration, and operation of such systems. This can only be achieved via sophisticated decision-making schemes supported by monitoring data that fully captures the system behavior and its history.

Employing model-driven engineering techniques we propose a highly scalable, history-aware approach to store and retrieve monitoring data in form of enriched runtime models. We take advantage of rule-based adaptation where change events in the system trigger adaptation rules. We first present a scheme to incrementally check model queries in the form of temporal logic formulas which represent the conditions of adaptation rules against a runtime model with history. Then we enhance the model to retain only information that is temporally relevant to the queries, therefore reducing the accumulation of information to a required minimum. Finally, we demonstrate the feasibility and scalability of our approach via experiments on a simulated smart healthcare system employing a real-world medical guideline.

Online recommendation systems make use of a variety of information sources to provide users with the items they are potentially interested in. However, due to the openness of the online platform, recommendation systems are vulnerable to data poisoning attacks. Existing attack approaches are either based on simple heuristic rules or designed against specific recommendation approaches. The former often suffer from unsatisfactory performance, while the latter require strong knowledge of the target system. In this paper, we focus on a general next-item recommendation setting and propose a practical poisoning attack approach named LOKI against black-box recommendation systems. LOKI uses a reinforcement learning algorithm to train the attack agent, which can be used to generate user behavior samples for data poisoning. In real-world recommendation systems, the cost of retraining recommendation models is high, and the interaction frequency between users and a recommendation system is restricted. Given these real-world restrictions, we propose to let the agent interact with a recommender simulator instead of the target recommendation system, and to leverage the transferability of the generated adversarial samples to poison the target system. We also propose to use the influence function to efficiently estimate the influence of injected samples on the recommendation results, without retraining the models within the simulator. Extensive experiments on two datasets against four representative recommendation models show that LOKI achieves better attack performance than existing methods.

Recently, the Wasserstein loss function has been proven to be effective when applied to deterministic full-waveform inversion (FWI) problems. We consider the application of this loss function in Bayesian FWI so that the uncertainty can be captured in the solution. Other loss functions that are commonly used in practice are also considered for comparison. Existence and stability of the resulting Gibbs posteriors are shown on function space under weak assumptions on the prior and model. In particular, the distribution arising from the Wasserstein loss is shown to be quite stable with respect to high-frequency noise in the data. We then illustrate the difference between the resulting distributions numerically, using Laplace approximations and dimension-robust MCMC to estimate the unknown velocity field and uncertainty associated with the estimates.
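To illustrate why the Wasserstein loss is attractive for waveform data, note that the 1-D Wasserstein-1 distance between two equal-weight empirical samples reduces to comparing sorted values, and it grows linearly under a constant shift of the signal (where an L2 misfit would not). The sketch below is a generic NumPy illustration, not the paper's FWI implementation.

```python
import numpy as np

def wasserstein_1d(a, b):
    """1-D Wasserstein-1 distance between two equal-length, equal-weight
    samples, computed from the quantile (sorted-sample) representation."""
    a, b = np.sort(np.asarray(a, float)), np.sort(np.asarray(b, float))
    return np.mean(np.abs(a - b))

# A constant shift of the signal changes W1 linearly, one reason this
# misfit behaves well for waveform data compared to pointwise losses.
x = np.random.default_rng(0).normal(0.0, 1.0, 1000)
print(wasserstein_1d(x, x + 0.5))  # exactly 0.5 for a pure shift
```

In practice FWI compares predicted and observed traces (after a suitable normalization to make them distributions); the same sorted-sample formula applies trace by trace.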

We present a locality preserving loss (LPL) that improves the alignment between vector space representations (i.e., word or sentence embeddings) while separating (increasing the distance between) uncorrelated representations, compared to the standard method that minimizes only the mean squared error (MSE). The locality preserving loss optimizes the projection by maintaining, in the target domain, the local neighborhood of embeddings found in the source domain. This reduces the overall size of the dataset required to train the model. We argue that vector space alignment (with MSE and LPL losses) acts as a regularizer in certain language-based classification tasks, leading to better accuracy than the baseline, especially when the size of the training set is small. We validate the effectiveness of LPL on a cross-lingual word alignment task, a natural language inference task, and a multi-lingual inference task.
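A minimal sketch of the idea, assuming a linear projection `W` and a simple unweighted k-nearest-neighbour locality term; the paper's exact neighbourhood weighting and separation term may differ:

```python
import numpy as np

def knn_indices(X, k):
    # Brute-force k nearest neighbours (excluding self) in the source space.
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)
    return np.argsort(d, axis=1)[:, :k]

def alignment_loss(W, X, Y, k=3, lam=0.5):
    """MSE alignment plus a locality-preserving penalty (a sketch):
    each projected source embedding should both match its target and
    stay near the mean of its source-space neighbours' projections."""
    P = X @ W                                   # projected source embeddings
    mse = np.mean((P - Y) ** 2)
    nbrs = knn_indices(X, k)
    lpl = np.mean((P - P[nbrs].mean(axis=1)) ** 2)
    return mse + lam * lpl
```

With `lam=0` this reduces to the standard MSE alignment objective; the extra term regularizes the projection using only source-side neighbourhood structure, which is why less paired data is needed.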

Mixed-integer convex programming (MICP) has seen significant algorithmic and hardware improvements with several orders of magnitude solve time speedups compared to 25 years ago. Despite these advances, MICP has been rarely applied to real-world robotic control because the solution times are still too slow for online applications. In this work, we extend the machine learning optimizer (MLOPT) framework to solve MICPs arising in robotics at very high speed. MLOPT encodes the combinatorial part of the optimal solution into a strategy. Using data collected from offline problem solutions, we train a multiclass classifier to predict the optimal strategy given problem-specific parameters such as states or obstacles. Compared to previous approaches, we use task-specific strategies and prune redundant ones to significantly reduce the number of classes the predictor has to select from, thereby greatly improving scalability. Given the predicted strategy, the control task becomes a small convex optimization problem that we can solve in milliseconds. Numerical experiments on a cart-pole system with walls, a free-flying space robot and task-oriented grasps show that our method provides not only 1 to 2 orders of magnitude speedups compared to state-of-the-art solvers but also performance close to the globally optimal MICP solution.

Unconstrained remote gaze estimation remains challenging, mostly due to its vulnerability to large variability in head pose. Prior solutions struggle to maintain reliable accuracy in unconstrained remote gaze tracking. Among them, appearance-based solutions demonstrate tremendous potential for improving gaze accuracy. However, existing works still suffer from head movement and are not robust enough to handle real-world scenarios. In particular, most of them study gaze estimation under controlled scenarios where the collected datasets cover only limited ranges of both head pose and gaze, which introduces further bias. In this paper, we propose novel end-to-end appearance-based gaze estimation methods that more robustly incorporate different levels of head-pose representation into gaze estimation. Our methods generalize to real-world scenarios with low image quality and varied lighting, and to scenarios where direct head-pose information is not available. To better demonstrate the advantage of our methods, we further propose a new benchmark dataset with the richest distribution of head-gaze combinations, reflecting real-world scenarios. Extensive evaluations on several public datasets and our own dataset demonstrate that our method consistently outperforms the state-of-the-art by a significant margin.

The Internet of Things (IoT) has started to empower the future of many industrial and mass-market applications. Localization techniques are becoming key to add location context to IoT data without human perception and intervention. Meanwhile, the newly-emerged Low-Power Wide-Area Network (LPWAN) technologies have advantages such as long range, low power consumption, low cost, massive connectivity, and the capability for communication in both indoor and outdoor areas. These features make LPWAN signals strong candidates for mass-market localization applications. However, various error sources have limited the localization performance achievable with such IoT signals. This paper reviews the IoT localization system through the following sequence: IoT localization system review -- localization data sources -- localization algorithms -- localization error sources and mitigation -- localization performance evaluation. Compared to the related surveys, this paper has a more comprehensive and state-of-the-art review of IoT localization methods, an original review of IoT localization error sources and mitigation, an original review of IoT localization performance evaluation, and a more comprehensive review of IoT localization applications, opportunities, and challenges. Thus, this survey provides comprehensive guidance for peers who are interested in enabling localization ability in existing IoT systems, using IoT systems for localization, or integrating IoT signals with existing localization sensors.

Recent advances in large-scale language representation models such as BERT have improved state-of-the-art performance on many NLP tasks. Meanwhile, character-level Chinese NLP models, including BERT for Chinese, have also demonstrated that they can outperform existing models. In this paper, however, we show that such BERT-based models are vulnerable to character-level adversarial attacks. We propose a novel Chinese character-level attack method against BERT-based classifiers. Essentially, we generate a "small" perturbation at the character level in the embedding space and use it to guide the character substitution procedure. Extensive experiments show that classification accuracy on a Chinese news dataset drops from 91.8% to 0% by manipulating fewer than 2 characters on average with the proposed attack. Human evaluations also confirm that our generated Chinese adversarial examples barely affect human performance on these NLP tasks.

The recently proposed SNLI-VE corpus for recognising visual-textual entailment is a large, real-world dataset for fine-grained multimodal reasoning. However, the automatic way in which SNLI-VE has been assembled (via combining parts of two related datasets) gives rise to a large number of errors in the labels of this corpus. In this paper, we first present a data collection effort to correct the class with the highest error rate in SNLI-VE. Secondly, we re-evaluate an existing model on the corrected corpus, which we call SNLI-VE-2.0, and provide a quantitative comparison with its performance on the non-corrected corpus. Thirdly, we introduce e-SNLI-VE-2.0, which appends human-written natural language explanations to SNLI-VE-2.0. Finally, we train models that learn from these explanations at training time, and output such explanations at testing time.

COVID-19 is currently one of the most life-threatening problems around the world. Fast and accurate detection of COVID-19 infection is essential to identify patients, make better decisions, and ensure treatment, which will help save lives. In this paper, we propose a fast and efficient way to identify COVID-19 patients with multi-task deep learning (DL) methods. Both X-ray and CT scan images are considered to evaluate the proposed technique. We employ our Inception Residual Recurrent Convolutional Neural Network with Transfer Learning (TL) approach for COVID-19 detection and our NABLA-N network model for segmenting the regions infected by COVID-19. The detection model shows around 84.67% testing accuracy on X-ray images and 98.78% accuracy on CT images. A novel quantitative analysis strategy is also proposed in this paper to determine the percentage of infected regions in X-ray and CT images. The qualitative and quantitative results demonstrate promising performance for COVID-19 detection and infected region localization.

This paper investigates the stochastic optimization problem with a focus on developing scalable parallel algorithms for deep learning tasks. Our solution involves a reformulation of the objective function for stochastic optimization in neural network models, along with a novel parallel strategy, coined weighted aggregating stochastic gradient descent (WASGD). Following a theoretical analysis of the characteristics of the new objective function, WASGD introduces a decentralized weighted aggregating scheme based on the performance of local workers. Without any center variable, the new method automatically assesses the importance of local workers and weights them according to their contributions. Furthermore, we develop an enhanced version of the method, WASGD+, by (1) considering a designed sample order and (2) applying a more advanced weight evaluation function. To validate the new method, we benchmark our schemes against several popular algorithms, including state-of-the-art techniques (e.g., elastic averaging SGD), in training deep neural networks for classification tasks. Comprehensive experiments have been conducted on four classic datasets: CIFAR-100, CIFAR-10, Fashion-MNIST, and MNIST. The results demonstrate the superiority of the WASGD scheme in accelerating the training of deep architectures. Better still, the enhanced version, WASGD+, is shown to be a significant improvement over its basic version.
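The decentralized aggregation step can be sketched as follows. The softmax-over-negative-losses weighting used here is a placeholder assumption for illustration, not necessarily the paper's actual weight evaluation function:

```python
import numpy as np

def weighted_aggregate(params, losses, temp=1.0):
    """Decentralised weighted aggregation (sketch): each worker
    contributes its parameter vector, and workers with lower local
    loss receive higher weight. No center variable is needed; every
    worker can compute this from its neighbours' (params, loss) pairs."""
    losses = np.asarray(losses, float)
    w = np.exp(-losses / temp)        # performance-based scores (assumed form)
    w /= w.sum()                      # normalise to a convex combination
    agg = sum(wi * p for wi, p in zip(w, params))
    return agg, w
```

With equal losses this reduces to plain parameter averaging; as one worker's loss grows, its weight (and influence on the aggregate) decays smoothly toward zero.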

Visual Question Answering (VQA) systems are tasked with answering natural language questions about a presented image. Current VQA datasets typically contain questions related to the spatial information of objects, object attributes, or general scene questions. Recently, researchers have recognized the need to improve the balance of such datasets to reduce systems' dependence on memorized linguistic features and statistical biases and to allow for improved visual understanding. However, it is unclear whether there are any latent patterns that can be used to quantify and explain these failures. To better quantify our understanding of the performance of VQA models, we use a taxonomy of Knowledge Gaps (KGs) to tag questions with one or more types of KGs. Each KG describes the reasoning abilities needed to arrive at a resolution, and failure to resolve a gap indicates an absence of the required reasoning ability. After identifying the KGs for each question, we examine the skew in the distribution of the number of questions per KG. To reduce this skew, we introduce a targeted question generation model that allows us to generate new types of questions for an image.

We present an open loop system, called DashCam Pay, that enables in-vehicle payments using face and voice biometrics. The system uses a plug-and-play device (dashcam) mounted in the vehicle to capture face images and voice commands of passengers. The dashcam is connected to the mobile devices of passengers sitting in the vehicle, and uses privacy-preserving biometric comparison techniques to compare the biometric data captured by the dashcam with the biometric data enrolled on the users' mobile devices to determine the payer. Once the payer is verified, payment is initiated via the payer's mobile device. For an initial feasibility analysis, we collected data from 20 different subjects at two different sites using a commercially available dashcam, and evaluated open-source biometric algorithms on the collected data. Subsequently, we built an Android prototype of the proposed system using open-source software packages to demonstrate its utility in facilitating secure in-vehicle payments. DashCam Pay can be integrated by either dashcam or vehicle manufacturers to enable open loop in-vehicle payments. We also discuss the applicability of the system to other payment scenarios, such as in-store payments.

We present a model describing the temporal evolution of opinions due to interactions among a network of individuals. This Accept-Shift-Constrict (ASC) model is formulated in terms of coupled nonlinear differential equations for opinions and uncertainties. The ASC model dynamics allows for the emergence and persistence of majority positions so that the mean opinion can shift even for a symmetric network. The model also formulates a distinction between opinion and rhetoric in accordance with a recently proposed theory of the group polarization effect. This enables the modeling of discussion-induced shifts toward the extreme without the typical modeling assumption of greater resistance to persuasion among extremists. An experiment is described in which triads engaged in online discussion. Simulations show that the ASC model is in qualitative and quantitative agreement with the experimental data.

Disentanglement is the problem in which multiple conversations occur in the same channel simultaneously, and the listener must decide which utterances belong to the conversation they will respond to. We propose a new model, named Dialogue BERT (DialBERT), which integrates local and global semantics in a single stream of messages to disentangle conversations that are mixed together. We employ BERT to capture matching information in each utterance pair at the utterance level, and use a BiLSTM to aggregate and incorporate context-level information. With only a 3% increase in parameters, a 12% improvement in F1-score has been attained in comparison to BERT. The model achieves state-of-the-art results on a new dataset proposed by IBM and surpasses previous work by a substantial margin.

Recent developments in Transformers have opened new interesting areas of research in partially observable reinforcement learning tasks. Results from late 2019 showed that Transformers are able to outperform LSTMs on both memory intense and reactive tasks. In this work we first partially replicate the results shown in Stabilizing Transformers in RL on both reactive and memory based environments. We then show performance improvement coupled with reduced computation when adding adaptive attention span to this Stable Transformer on a challenging DMLab30 environment. The code for all our experiments and models is available at https://github.com/jerrodparker20/adaptive-transformers-in-rl.

Early work on narrative modeling used explicit plans and goals to generate stories, but the language generation itself was restricted and inflexible. Modern methods use language models for more robust generation, but often lack an explicit representation of the scaffolding and dynamics that guide a coherent narrative. This paper introduces a new model that integrates explicit narrative structure with neural language models, formalizing narrative modeling as a Switching Linear Dynamical System (SLDS). An SLDS is a dynamical system in which the latent dynamics (i.e., how the state vector transforms over time) are controlled by top-level discrete switching variables. The switching variables represent narrative structure (e.g., sentiment or discourse states), while the latent state vector encodes information on the current state of the narrative. This probabilistic formulation allows us to control generation, and can be learned in a semi-supervised fashion using both labeled and unlabeled data. Additionally, we derive a Gibbs sampler for our model that can fill in arbitrary parts of the narrative, guided by the switching variables. Our filled-in (English language) narratives outperform several baselines on both automatic and human evaluations.
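Ancestral sampling from a generic SLDS can be written in a few lines: a discrete Markov chain over switching variables selects which linear dynamics drive the latent state at each step. This is only the generative core; the paper's language-model emission, semi-supervised training, and Gibbs infilling are not shown:

```python
import numpy as np

def sample_slds(T, A, Q, Pz, x0, z0, rng):
    """Ancestral sampling from a switching linear dynamical system:
    z_t follows a Markov chain with transition matrix Pz, and selects
    the dynamics  x_t = A[z_t] @ x_{t-1} + noise ~ N(0, Q[z_t])."""
    x, z = x0, z0
    xs, zs = [x0], [z0]
    for _ in range(T - 1):
        z = rng.choice(len(Pz), p=Pz[z])     # switch (e.g. sentiment state)
        x = A[z] @ x + rng.multivariate_normal(np.zeros(len(x)), Q[z])
        xs.append(x)
        zs.append(z)
    return np.array(xs), np.array(zs)
```

In the narrative model, each latent x_t would condition a sentence decoder, so fixing a sequence of switching variables (e.g. a sentiment arc) steers the generated story.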

This paper is concerned with the numerical approximation of some two-dimensional Keller-Segel chemotaxis models, especially those generating pattern formation. The numerical resolution of such nonlinear parabolic-parabolic or parabolic-elliptic systems of partial differential equations consumes significant computational time when solved with fully implicit schemes. Standard linearized semi-implicit schemes require reasonable computational time but suffer from a lack of accuracy. In this work, two methods based on a single-layer neural network are developed to build linearized implicit schemes: a basic one, called the each step training linearized implicit (ESTLI) method, and a more efficient one, the selected steps training linearized implicit (SSTLI) method. The proposed schemes also make use of a spatial finite volume method with a hybrid difference scheme approximation for convection-diffusion fluxes. Several numerical tests are performed to illustrate the accuracy, efficiency and robustness of the proposed methods. Generalization of the developed methods to other nonlinear partial differential equations is straightforward.

Today's increasing rate of technological change results from the rapid growth in computer processing speed combined with the declining cost of processing capacity, and is of historical import. The daily life of billions of individuals worldwide has been forever changed by technology in just the last few years. Costly data breaches continue at an alarming rate. The challenge facing humans as they attempt to govern the process of artificial intelligence, machine learning, and the impact of billions of sensory devices connected to the Internet is the subject of this Article.

We proceed in nine sections. First, we define the Internet of Things (IoT), comment on the explosive growth in sensory devices connected to the Internet, provide examples of IoT devices, and speak to the promise of the IoT. Second, we discuss legal requirements for corporate governance as a foundation for considering the challenge of governing the IoT. Third, we look at potential IoT threats. Fourth, we discuss the Mirai botnet. Fifth, we look at IoT threat vector vulnerabilities during times of crisis. Sixth, we discuss the Manufacturer Usage Description (MUD) methodology. Seventh, we discuss recent regulatory developments. Next, we look at a few recommendations. And finally, we conclude. We believe this Article contributes to our understanding of the widespread exposure to malware associated with the IoT and adds to the nascent but emerging literature on governance of enterprise risk, a subject of vital societal importance.

A significant remaining challenge for existing recommender systems is that users may not trust them, whether for lack of explanation or because of inaccurate recommendation results. Thus, it becomes critical to build trustworthy recommender systems. This survey provides a systematic summary of three categories of trust-aware recommender systems: social-aware recommender systems that leverage users' social relationships; robust recommender systems that filter untruthful noise (e.g., spammers and fake information) or enhance attack resistance; and explainable recommender systems that provide explanations of recommended items. We focus on work based on deep learning techniques, an emerging area in recommendation research.

Supervised relation extraction methods based on deep neural networks play an important role in the field of information extraction. At present, however, their performance still falls short due to the existence of complicated relations. Meanwhile, recently proposed pre-trained language models (PLMs) have achieved great success on multiple natural language processing tasks when fine-tuned in combination with downstream task models. However, the original pre-training tasks of PLMs do not include relation extraction. We believe that PLMs can also be used to solve the relation extraction problem, but it is necessary to establish a specially designed downstream task model, or even a loss function, for dealing with complicated relations. In this paper, a new network architecture with a special loss function is designed to serve as the downstream model of PLMs for supervised relation extraction. Experiments show that our method significantly exceeds the current optimal baseline models across multiple public relation extraction datasets.

Satirical news detection is an important yet challenging task for preventing the spread of misinformation. Many feature-based and end-to-end neural network-based satirical news detection systems have been proposed and have delivered promising results. Existing approaches explore comprehensive word features from satirical news articles, but lack semantic metrics using word vectors for satirical news in tweet form. Moreover, the vagueness of satire and news parody means that a news tweet can hardly be classified with a binary decision, that is, satirical or legitimate. To address these issues, we collect satirical and legitimate news tweets and propose a semantic-feature-based approach. Features are extracted by exploring inconsistencies in phrases, entities, and between main and relative clauses. We apply a game-theoretic rough set model to detect satirical news, in which probabilistic thresholds are derived by game equilibrium and a repetition learning mechanism. Experimental results on the collected dataset show the robustness and improvement of the proposed approach compared with the Pawlak rough set model and SVM.

Recently, the performance of single image super-resolution (SR) has been significantly improved by powerful networks. However, these networks are developed for image SR with a single specific integer scale (e.g., x2, x3, x4) and cannot be used for non-integer or asymmetric SR. In this paper, we propose to learn a scale-arbitrary image SR network from scale-specific networks. Specifically, we propose a plug-in module for existing SR networks to perform scale-arbitrary SR, which consists of multiple scale-aware feature adaption blocks and a scale-aware upsampling layer. Moreover, we introduce a scale-aware knowledge transfer paradigm to transfer knowledge from scale-specific networks to the scale-arbitrary network. Our plug-in module can be easily adapted to existing networks to achieve scale-arbitrary SR. Networks equipped with our module achieve promising results for non-integer and asymmetric SR while maintaining state-of-the-art performance for SR with integer scale factors. Besides, the additional computational and memory cost of our module is very small.

We define and analyze a multi-agent multi-armed bandit problem in which decision-making agents can observe the choices and rewards of their neighbors under a linear observation cost. Neighbors are defined by a network graph that encodes the inherent observation constraints of the system. We define a cost associated with observations such that at every instance an agent makes an observation it receives a constant observation regret. We design a sampling algorithm and an observation protocol for each agent to maximize its own expected cumulative reward through minimizing expected cumulative sampling regret and expected cumulative observation regret. For our proposed protocol, we prove that total cumulative regret is logarithmically bounded. We verify the accuracy of analytical bounds using numerical simulations.
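A single-agent sketch of the two regret terms, using plain UCB sampling and a constant per-observation cost. The paper's network-aware observation protocol (which decides *when* to observe neighbors) is replaced here by observing every round, purely to make the bookkeeping concrete:

```python
import numpy as np

def ucb_with_observations(means, T, obs_cost, rng):
    """UCB sampling while accumulating two regrets (sketch):
    sampling regret (gap to the best arm's mean) and observation
    regret (a constant cost paid for each observation made)."""
    K = len(means)
    counts = np.ones(K)
    est = rng.normal(means, 1.0)          # one initial pull per arm
    best = float(np.max(means))
    sampling_regret, observation_regret = 0.0, 0.0
    for t in range(K, T):
        ucb = est + np.sqrt(2 * np.log(t + 1) / counts)
        a = int(np.argmax(ucb))
        r = rng.normal(means[a], 1.0)     # pull arm a, observe noisy reward
        est[a] = (est[a] * counts[a] + r) / (counts[a] + 1)
        counts[a] += 1
        sampling_regret += best - means[a]
        observation_regret += obs_cost    # constant cost per observation
    return sampling_regret, observation_regret
```

The design question the paper addresses is exactly the trade-off visible here: more observations sharpen the estimates (lowering sampling regret) but each one adds a fixed observation regret, so the protocol must choose observation times carefully to keep the total logarithmic.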

Training large language representation models has become a standard in the natural language processing community. This allows for fine-tuning on any number of specific tasks; moreover, these large, high-capacity models can continue to train on domain-specific unlabeled data to make initialization even more robust for supervised tasks. We demonstrate that in practice these pre-trained models exhibit performance deterioration in the form of catastrophic forgetting when evaluated on tasks from a general domain such as GLUE. In this work we propose CALM, Continuous Adaptive Learning for Language Modeling: techniques to render models that retain knowledge across multiple domains. With these methods, we are able to reduce the performance gap across supervised tasks introduced by task-specific models, which we demonstrate using a continual learning setting in the biomedical and clinical domains.

LiDAR odometry is a fundamental task for various areas such as robotics and autonomous driving. The problem is difficult because it requires systems to run robustly on noisy real-world data. Existing methods are mostly local iterative methods. Feature-based global registration methods are not preferred, since extracting accurate matching pairs from nonuniform and sparse LiDAR data remains challenging. In this paper, we present Deep Matching LiDAR Odometry (DMLO), a novel learning-based framework that makes feature matching applicable to the LiDAR odometry task. Unlike many recent learning-based methods, DMLO explicitly enforces geometric constraints in the framework. Specifically, DMLO decomposes 6-DoF pose estimation into two parts: a learning-based matching network that provides accurate correspondences between two scans, and rigid transformation estimation with a closed-form solution via Singular Value Decomposition (SVD). Comprehensive experimental results on the real-world KITTI and Argoverse datasets demonstrate that DMLO dramatically outperforms existing learning-based methods and is comparable with state-of-the-art geometry-based approaches.
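The closed-form SVD step mentioned above is the classical Kabsch/Umeyama solution: given putative point correspondences, the rigid transform minimizing the squared alignment error is recovered from the SVD of the centered cross-covariance matrix. A standalone 3-D sketch (not DMLO's code):

```python
import numpy as np

def rigid_transform_svd(P, Q):
    """Closed-form rigid transform (R, t) minimising ||R @ p_i + t - q_i||^2
    over 3-D correspondences P[i] <-> Q[i], via SVD (Kabsch/Umeyama)."""
    cp, cq = P.mean(axis=0), Q.mean(axis=0)
    H = (P - cp).T @ (Q - cq)                  # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))     # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = cq - R @ cp
    return R, t
```

Because this step is differentiable and exact, a matching network only has to produce good correspondences; the pose itself needs no iterative refinement.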

Generative Adversarial Networks (GANs) have gained significant attention in recent years, with particularly impressive applications highlighted in computer vision. In this work, we present a Mixture Density Conditional Generative Adversarial Model (MD-CGAN), in which the generator is a Gaussian mixture model, with a focus on time series forecasting. Compared to examples in vision, there have been more limited applications of GAN models to time series. We show that our model is capable of estimating a probabilistic posterior distribution over forecasts and that, in comparison to a set of benchmark methods, the MD-CGAN model performs well, particularly in situations where noise in the time series is significant. Further, by using a Gaussian mixture model that allows for a flexible number of mixture coefficients, the MD-CGAN offers posterior distributions that are non-Gaussian.
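Once a generator head emits mixture parameters (weights, means, scales), forecast samples, and hence non-Gaussian predictive distributions, are obtained by sampling the mixture. The names and shapes below are illustrative, not MD-CGAN's actual interface:

```python
import numpy as np

def sample_mixture(pis, mus, sigmas, n, rng):
    """Draw n forecast samples from a 1-D Gaussian mixture:
    first pick a component per sample according to the weights pis,
    then draw from that component's Gaussian."""
    comp = rng.choice(len(pis), size=n, p=pis)
    return rng.normal(np.asarray(mus)[comp], np.asarray(sigmas)[comp])
```

With well-separated means this produces multi-modal forecast distributions, which is exactly what a single-Gaussian output head cannot express.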

Binary sequences with low odd-periodic correlation magnitudes have found important applications in communication systems. It is well known that the nega-cyclic shift and negation preserve the odd-periodic autocorrelation function (OACF) values in general. In this paper, we define a new operation based on Parker's transformation, which also preserves the OACF values of binary sequences. This enables us to classify Parker's 16 cases into 8, and may further allow all constructions based on Parker's transformation to be classified.
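The OACF and the nega-cyclic shift can be written down directly, and the invariance mentioned above is easy to check numerically. The definition below is the standard one (not quoted from the paper): for a +/-1 sequence of length N, C(tau) = sum_i (-1)^floor((i+tau)/N) s_i s_{(i+tau) mod N}.

```python
import numpy as np

def oacf(s):
    """Odd-periodic autocorrelation of a +/-1 sequence s of length N."""
    s = np.asarray(s)
    N = len(s)
    return np.array([
        sum(((-1) ** ((i + tau) // N)) * s[i] * s[(i + tau) % N]
            for i in range(N))
        for tau in range(N)
    ])

def negacyclic_shift(s):
    # (s_1, ..., s_{N-1}, -s_0): shift left and negate the wrapped element.
    s = np.asarray(s)
    return np.concatenate([s[1:], -s[:1]])
```

The invariance follows because the OACF equals the (ordinary) correlation of the associated odd-periodic sequence of period 2N, and the nega-cyclic shift of s is just a cyclic shift of that associated sequence.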

To speed up Deep Neural Network (DNN) accelerator design and enable effective implementation, we propose HybridDNN, a framework for building high-performance hybrid DNN accelerators and delivering FPGA-based hardware implementations. Novel techniques include a highly flexible and scalable architecture with a hybrid Spatial/Winograd convolution (CONV) Processing Engine (PE), a comprehensive design space exploration tool, and a complete design flow to fully support accelerator design and implementation. Experimental results show that the accelerators generated by HybridDNN can deliver 3375.7 and 83.3 GOPS on a high-end FPGA (VU9P) and an embedded FPGA (PYNQ-Z1), respectively, a 1.8x performance improvement compared to state-of-the-art accelerator designs. This demonstrates that HybridDNN is flexible and scalable and can target both cloud and embedded hardware platforms with vastly different resource constraints.

Efficient rendering of photo-realistic virtual worlds is a long standing effort of computer graphics. Modern graphics techniques have succeeded in synthesizing photo-realistic images from hand-crafted scene representations. However, the automatic generation of shape, materials, lighting, and other aspects of scenes remains a challenging problem that, if solved, would make photo-realistic computer graphics more widely accessible. Concurrently, progress in computer vision and machine learning have given rise to a new approach to image synthesis and editing, namely deep generative models. Neural rendering is a new and rapidly emerging field that combines generative machine learning techniques with physical knowledge from computer graphics, e.g., by the integration of differentiable rendering into network training. With a plethora of applications in computer graphics and vision, neural rendering is poised to become a new area in the graphics community, yet no survey of this emerging field exists. This state-of-the-art report summarizes the recent trends and applications of neural rendering. We focus on approaches that combine classic computer graphics techniques with deep generative models to obtain controllable and photo-realistic outputs. Starting with an overview of the underlying computer graphics and machine learning concepts, we discuss critical aspects of neural rendering approaches. This state-of-the-art report is focused on the many important use cases for the described algorithms such as novel view synthesis, semantic photo manipulation, facial and body reenactment, relighting, free-viewpoint video, and the creation of photo-realistic avatars for virtual and augmented reality telepresence. Finally, we conclude with a discussion of the social implications of such technology and investigate open research problems.

We introduce SciWING, an open-source software toolkit which provides access to pre-trained models for scientific document processing tasks, including citation string parsing and logical structure recovery. SciWING enables researchers to rapidly experiment with different models by swapping and stacking different modules. It also enables them to declare and run models from a configuration file. It enables researchers to perform production-ready transfer learning from general, pre-trained transformers (e.g., BERT, SciBERT), and aids development of end-user applications. It includes ready-to-use web and terminal-based applications and demonstrations (Available from this http URL).

One of the most popular paradigms for applying large, pre-trained NLP models such as BERT is to fine-tune them on a smaller dataset. However, one challenge remains: the fine-tuned model often overfits on smaller datasets. A symptom of this phenomenon is that irrelevant words in the sentences, even when they are obvious to humans, can substantially degrade the performance of these fine-tuned BERT models. In this paper, we propose a novel technique, called Self-Supervised Attention (SSA), to address this generalization challenge. Specifically, SSA automatically generates weak, token-level attention labels iteratively by "probing" the fine-tuned model from the previous iteration. We investigate two different ways of integrating SSA into BERT and propose a hybrid approach to combine their benefits. Empirically, on a variety of public datasets, we illustrate significant performance improvements using our SSA-enhanced BERT model.

Many studies have applied reinforcement learning to train a dialog policy and shown great promise in recent years. One common approach is to employ a user simulator to obtain a large number of simulated user experiences for reinforcement learning algorithms. However, modeling a realistic user simulator is challenging. A rule-based simulator requires heavy domain expertise for complex tasks, while a data-driven simulator requires considerable data, and it is unclear how to evaluate a simulator at all. To avoid explicitly building a user simulator beforehand, we propose Multi-Agent Dialog Policy Learning, which regards both the system and the user as dialog agents. The two agents interact with each other and are learned jointly. The method uses the actor-critic framework to facilitate pretraining and improve scalability. We also propose a Hybrid Value Network for role-aware reward decomposition to integrate role-specific domain knowledge of each agent in the task-oriented dialog. Results show that our method can successfully build a system policy and a user policy simultaneously, and that the two agents can achieve a high task success rate through conversational interaction.

This paper proposes a statistical approach to 2D pose estimation from human images. The main problems with the standard supervised approach, which is based on a deep recognition (image-to-pose) model, are that it often yields anatomically implausible poses, and its performance is limited by the amount of paired data. To solve these problems, we propose a semi-supervised method that can make effective use of images with and without pose annotations. Specifically, we formulate a hierarchical generative model of poses and images by integrating a deep generative model of poses from pose features with that of images from poses and image features. We then introduce a deep recognition model that infers poses from images. Given images as observed data, these models can be trained jointly in a hierarchical variational autoencoding (image-to-pose-to-feature-to-pose-to-image) manner. The results of experiments show that the proposed reflective architecture makes estimated poses anatomically plausible, and that the performance of pose estimation is improved by integrating the recognition and generative models and also by feeding non-annotated images.

Query optimization remains one of the most challenging problems in data management systems. Recent efforts to apply machine learning techniques to query optimization challenges have been promising, but have shown few practical gains due to substantive training overhead, inability to adapt to changes, and poor tail performance. Motivated by these difficulties and drawing upon a long history of research in multi-armed bandits, we introduce Bao (the BAndit Optimizer). Bao takes advantage of the wisdom built into existing query optimizers by providing per-query optimization hints. Bao combines modern tree convolutional neural networks with Thompson sampling, a decades-old and well-studied reinforcement learning algorithm. As a result, Bao automatically learns from its mistakes and adapts to changes in query workloads, data, and schema. Experimentally, we demonstrate that Bao can quickly (an order of magnitude faster than previous approaches) learn strategies that improve end-to-end query execution performance, including tail latency. In cloud environments, we show that Bao can offer both reduced costs and better performance compared with a sophisticated commercial system.
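A minimal tabular sketch of the Thompson-sampling loop over hint sets (the hint names and the Gaussian posterior below are illustrative assumptions; Bao itself predicts per-plan rewards with a tree convolutional network rather than a table):

```python
import random

class ThompsonHintSelector:
    """Toy Thompson-sampling selector over query-optimizer hint sets.
    Each hint set's reward (e.g., negated latency) gets a Gaussian
    posterior whose variance shrinks as the arm is tried more often."""
    def __init__(self, hint_sets, prior_var=1.0):
        self.hint_sets = hint_sets
        self.stats = {h: (0.0, 0) for h in hint_sets}  # (mean reward, count)
        self.prior_var = prior_var

    def choose(self):
        # Sample a plausible mean reward for each arm; pick the best sample.
        def sample(h):
            mean, n = self.stats[h]
            return random.gauss(mean, (self.prior_var / (n + 1)) ** 0.5)
        return max(self.hint_sets, key=sample)

    def update(self, h, reward):
        # Incremental running mean of the observed rewards for arm h.
        mean, n = self.stats[h]
        self.stats[h] = (mean + (reward - mean) / (n + 1), n + 1)
```

Because exploration is driven by posterior sampling rather than a fixed schedule, the selector keeps adapting when the workload, data, or schema shifts the rewards.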

Predicting the relevance between two given videos with respect to their visual content is a key component of content-based video recommendation and retrieval. Thanks to the increasing availability of pre-trained image and video convolutional neural network models, deep visual features are widely used for video content representation. However, because the notion of relevance is task-dependent, such off-the-shelf features are not always optimal for all tasks. Moreover, due to varied concerns including copyright, privacy, and security, one might have access only to pre-computed video features rather than the original videos. In this paper we propose feature re-learning for improving video relevance prediction, without revisiting the original video content. In particular, re-learning is realized by projecting a given deep feature into a new space by an affine transformation. We optimize the re-learning process with a novel negative-enhanced triplet ranking loss. To generate more training data, we propose a new data augmentation strategy which works directly on frame-level and video-level features. Extensive experiments in the context of the Hulu Content-based Video Relevance Prediction Challenge 2018 justify the effectiveness of the proposed method and its state-of-the-art performance for content-based video relevance prediction.
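A sketch of the two ingredients, assuming a standard hinge-style triplet formulation; the paper's exact "negative-enhanced" loss may differ, and `neg_weight` is a hypothetical knob standing in for that enhancement:

```python
def affine(x, W, b):
    """Re-learning step: project a deep feature x into a new space, W x + b."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + b_i
            for row, b_i in zip(W, b)]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = sum(a * a for a in u) ** 0.5
    nv = sum(b * b for b in v) ** 0.5
    return dot / (nu * nv)

def triplet_loss(anchor, pos, neg, margin=0.2, neg_weight=1.0):
    """Hinge triplet ranking loss on re-learned features; neg_weight > 1
    is one plausible way to 'enhance' the negative term."""
    return max(0.0, margin - cosine(anchor, pos) + neg_weight * cosine(anchor, neg))
```

Training would minimize `triplet_loss` over triplets of re-learned (affine-projected) features, updating only W and b, so the frozen backbone features never need to be recomputed.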

The graph matching problem aims to find the latent vertex correspondence between two edge-correlated graphs and has many practical applications. In this work, we study a version of the seeded graph matching problem, which assumes that a set of seeds, i.e., pre-mapped vertex-pairs, is given in advance. Specifically, consider two correlated graphs whose edges are sampled independently with probability $s$ from a parent \ER graph $\mathcal{G}(n,p)$. Furthermore, a mapping between the vertices of the two graphs is provided as seeds, of which an unknown $\beta$ fraction is correct. This problem was first studied in \cite{lubars2018correcting} where an algorithm is proposed and shown to perfectly recover the correct vertex mapping with high probability if $\beta\geq\max\left\{\frac{8}{3}p,\frac{16\log{n}}{nps^2}\right\}$. We improve their condition to $\beta\geq\max\left\{30\sqrt{\frac{\log n}{n(1-p)^2s^2}},\frac{45\log{n}}{np(1-p)^2s^2}\right\}$. However, when $p=O\left( \sqrt{{\log n}/{ns^2}}\right)$, our improved condition still requires that $\beta$ must increase inversely proportionally to $np$. In order to improve the matching performance for sparse graphs, we propose a new algorithm that uses "witnesses" in the 2-hop neighborhood, instead of only the 1-hop neighborhood as in \cite{lubars2018correcting}. We show that when $np^2\leq\frac{1}{135\log n}$, our new algorithm can achieve perfect recovery with high probability if $\beta\geq\max\left\{900\sqrt{\frac{np^3(1-s)\log n}{s}},600\sqrt{\frac{\log n}{ns^4}}, \frac{1200\log n}{n^2p^2s^4}\right\}$ and $nps^2\geq 128\log n$. Numerical experiments on both synthetic and real graphs corroborate our theoretical findings and show that our 2-hop algorithm significantly outperforms the 1-hop algorithm when the graphs are relatively sparse.
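The 1-hop witness idea of \cite{lubars2018correcting} can be sketched as follows (adjacency given as sets; the paper's 2-hop algorithm replaces direct adjacency with 2-hop reachability, which is more informative on sparse graphs):

```python
def witness_counts(A, B, seeds):
    """Count 1-hop witnesses for every candidate pair (u, v).
    A, B: adjacency dicts {vertex: set of neighbors} of the two graphs.
    A seed pair (s, t) is a witness for (u, v) if s is adjacent to u
    in A and t is adjacent to v in B; pairs with many witnesses are
    likely correct matches."""
    counts = {}
    for u in A:
        for v in B:
            counts[(u, v)] = sum(1 for s, t in seeds
                                 if s in A[u] and t in B[v])
    return counts
```

A matching is then read off by greedily pairing vertices with the highest witness counts, which can correct the wrong fraction of the initial seed mapping.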

In Transformer-based neural machine translation (NMT), the positional encoding mechanism helps the self-attention networks to learn the source representation with order dependency, which makes the Transformer-based NMT achieve state-of-the-art results for various translation tasks. However, Transformer-based NMT only adds representations of positions sequentially to word vectors in the input sentence and does not explicitly consider reordering information in this sentence. In this paper, we first empirically investigate the relationship between source reordering information and translation performance. The empirical findings show that the source input with the target order learned from the bilingual parallel dataset can substantially improve translation performance. Thus, we propose a novel reordering method to explicitly model this reordering information for the Transformer-based NMT. The empirical results on the WMT14 English-to-German, WAT ASPEC Japanese-to-English, and WMT17 Chinese-to-English translation tasks show the effectiveness of the proposed approach.
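For reference, the positional encoding in question is, in the original Transformer, a fixed sinusoidal vector simply added to each word embedding, so absolute positions enter the model but no explicit reordering information does:

```python
import math

def positional_encoding(max_len, d_model):
    """Sinusoidal positional encoding of the original Transformer:
    PE[pos, 2i]   = sin(pos / 10000^(2i / d_model))
    PE[pos, 2i+1] = cos(pos / 10000^(2i / d_model))
    The resulting vectors are added to the word embeddings; nothing
    in them reflects a source-side reordering toward target order."""
    pe = [[0.0] * d_model for _ in range(max_len)]
    for pos in range(max_len):
        for i in range(0, d_model, 2):
            angle = pos / (10000 ** (i / d_model))
            pe[pos][i] = math.sin(angle)
            if i + 1 < d_model:
                pe[pos][i + 1] = math.cos(angle)
    return pe
```

The proposed reordering method goes beyond this by modeling a learned source order, which this fixed encoding cannot express.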

Minor embedding heuristics have become an indispensable tool for compiling problems in quadratically unconstrained binary optimization (QUBO) into the hardware graphs of quantum and CMOS annealing processors. While recent embedding heuristics have been developed for annealers of moderate size (about 2000 nodes), the size of the latest CMOS annealing processor (with 102,400 nodes) poses entirely new demands on the embedding heuristic. This raises the question of whether recent embedding heuristics can maintain meaningful embedding performance on hardware graphs of increasing size. Here, we develop an improved version of the probabilistic-swap-shift-annealing (PSSA) embedding heuristic [which has recently been demonstrated to outperform the standard embedding heuristic by D-Wave Systems (Cai et al., 2014)] and evaluate its embedding performance on hardware graphs of increasing size. For random-cubic and Barabasi-Albert graphs we find the embedding performance of improved PSSA to consistently exceed the threshold of the best known complete graph embedding by a factor of 3.2 and 2.8, respectively, up to hardware graphs with 102,400 nodes. On the other hand, for random graphs with constant edge density not even improved PSSA can overcome the deterministic threshold guaranteed by the existence of the best known complete graph embedding. Finally, we prove a new upper bound on the maximal embeddable size of complete graphs into hardware graphs of CMOS annealers and show that the embedding performance of its currently best known complete graph embedding has optimal order for hardware graphs with fixed coordination number.

Recent years have seen strong growth in the biomedical sciences and a corresponding increase in publication volume. Extraction of specific information from these sources requires highly sophisticated text mining and information extraction tools. However, the integration of freely available tools into customized workflows is often cumbersome and difficult. We describe SIA (Scalable Interoperable Annotation Server), our contribution to the BeCalm Technical Interoperability and Performance of annotation Servers (BeCalm-TIPS) task: a scalable, extensible, and robust annotation service. The system currently covers six named entity types (i.e., Chemicals, Diseases, Genes, miRNA, Mutations, and Organisms) and is freely available under the Apache 2.0 license at https://github.com/Erechtheus/sia.

Traditional convolution-based generative adversarial networks synthesize images based on hierarchical local operations, where long-range dependencies are implicitly modeled with a Markov chain. This is still not sufficient for categories with complicated structures. In this paper, we characterize long-range dependence with attentive normalization (AN), an extension of traditional instance normalization. Specifically, the input feature map is softly divided into several regions based on its internal semantic similarity, each of which is normalized separately. This enhances consistency between distant regions with semantic correspondence. Compared with self-attention GANs, our attentive normalization does not need to measure the correlation of all locations, and thus can be directly applied to large-size feature maps without much computational burden. Extensive experiments on class-conditional image generation and semantic inpainting verify the efficacy of our proposed module.
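A simplified pure-Python sketch of the idea (the "semantic centers" driving the soft region assignment are hypothetical stand-ins for the paper's learned parameters):

```python
import math

def attentive_norm(x, centers, eps=1e-5):
    """Simplified attentive normalization for one flattened feature map.
    x: C lists of length P (P = H*W spatial positions);
    centers: K length-C vectors (hypothetical learned region descriptors).
    Positions are softly assigned to K regions by similarity to the
    centers, and each region is normalized with its own statistics."""
    C, P, K = len(x), len(x[0]), len(centers)
    # Soft region assignment: softmax over similarities to the K centers.
    attn = []
    for p in range(P):
        sims = [sum(x[c][p] * centers[k][c] for c in range(C)) for k in range(K)]
        m = max(sims)
        e = [math.exp(s - m) for s in sims]
        z = sum(e)
        attn.append([v / z for v in e])
    # Per-region weighted normalization, recombined by the soft weights.
    out = [[0.0] * P for _ in range(C)]
    for k in range(K):
        wsum = sum(attn[p][k] for p in range(P)) + eps
        for c in range(C):
            mean = sum(attn[p][k] * x[c][p] for p in range(P)) / wsum
            var = sum(attn[p][k] * (x[c][p] - mean) ** 2 for p in range(P)) / wsum
            for p in range(P):
                out[c][p] += attn[p][k] * (x[c][p] - mean) / math.sqrt(var + eps)
    return out
```

With a single region (K = 1) this reduces to plain instance normalization; the cost grows with K rather than with the quadratic number of location pairs that self-attention would compare.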

Fine-tuning pre-trained generative language models to down-stream language generation tasks has shown promising results. However, this comes with the cost of having a single, large model for each task, which is not ideal in low-memory/power scenarios (e.g., mobile). In this paper, we propose an effective way to fine-tune multiple down-stream generation tasks simultaneously using a single, large pre-trained model. The experiments on five diverse language generation tasks show that by just using an additional 2-3% parameters for each task, our model can maintain or even improve the performance of fine-tuning the whole model.
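Bottleneck adapters are one common way to reach a 2-3% per-task parameter budget (whether the paper uses exactly this mechanism is an assumption); the arithmetic for a BERT-base-sized model illustrates the scale:

```python
def adapter_params(d_model, bottleneck, n_layers):
    """Parameter count of bottleneck adapters: per layer, a down-projection
    (d_model x bottleneck), an up-projection (bottleneck x d_model), and the
    two bias vectors. Only these are trained per task; the shared backbone
    stays frozen."""
    per_layer = (d_model * bottleneck + bottleneck      # down-projection + bias
                 + bottleneck * d_model + d_model)      # up-projection + bias
    return per_layer * n_layers

# Illustrative numbers for a BERT-base-like backbone (110M parameters,
# 12 layers, hidden size 768) with a bottleneck of 128:
extra = adapter_params(768, 128, 12)
ratio = extra / 110_000_000
```

With these (assumed) sizes the per-task overhead lands at roughly 2% of the backbone, matching the order of magnitude the abstract reports.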

Change detection in heterogeneous remote sensing images is crucial for disaster damage assessment. Recent methods use homogeneous transformation, which maps the heterogeneous optical and SAR remote sensing images into the same feature space, to achieve change detection. Such transformations mainly operate on the low-level feature space and may corrupt the semantic content, deteriorating the performance of change detection. To solve this problem, this paper presents a new homogeneous transformation model termed deep homogeneous feature fusion (DHFF), based on image style transfer (IST). Unlike the existing methods, the DHFF method segregates the semantic content and the style features in the heterogeneous images to perform homogeneous transformation. The separation of the semantic content and the style in homogeneous transformation prevents the corruption of image semantic content, especially in the regions of change. In this way, the detection performance is improved with accurate homogeneous transformation. Furthermore, we present a new iterative IST (IIST) strategy, where the cost function in each IST iteration measures and thus maximizes the feature homogeneity in additional new feature subspaces for change detection. After that, change detection is accomplished accurately on the original and the transformed images that are in the same feature space. Real remote sensing images acquired by SAR and optical satellites are utilized to evaluate the performance of the proposed method. The experiments demonstrate that the proposed DHFF method achieves significant improvement for change detection in heterogeneous optical and SAR remote sensing images, in terms of both accuracy rate and Kappa index.

In recent years, cyber-security of power systems has become a growing concern. To protect power systems from malicious adversaries, advanced defense strategies that exploit sophisticated detection algorithms are required. Motivated by this, in this paper we introduce an active defense method based on dynamic clustering. Our detection strategy uses a moving-target approach where information about the system's varying operating point is first used to cluster measurements according to their transfer function characteristics that change over time. Then, detection is carried out through a series of similarity checks between measurements within the same cluster. The proposed method is effective in detecting cyber-attacks even when the attacker has extensive knowledge of the system parameters, model, and detection policy at some point in time. The effectiveness of our proposed detection algorithm is demonstrated through a numerical example on the IEEE 24-bus power system.

Current neural networks are mostly built upon the MP model, which usually formulates the neuron as executing an activation function on the real-valued weighted aggregation of signals received from other neurons. In this paper, we propose the Flexible Transmitter (FT) model, a novel bio-plausible neuron with flexible plasticity. The FT model employs a pair of parameters to model the transmitter between neurons and sets up a neurotransmitter-regulated memory unit to record the long-term learning information of the concerned neuron, thus leading to the formulation of the FT model as a two-variable two-valued function, which takes the commonly-used MP neuron model as its special case. The FT model can handle more complicated data, even time series signals. To exhibit the power and potential of our FT model, we present the Flexible Transmitter Network (FTNet), which is built in the most common fully-connected feed-forward architecture by incorporating the FT neuron as the basic building block. FTNet allows gradient calculation and can be implemented by an extension of the backpropagation algorithm in the complex domain. Experiments on a broad range of tasks show the superiority of the proposed FTNet. This study provides an alternative basic building block in neural networks and exhibits the feasibility of developing artificial neural networks with neuronal plasticity.

This paper focuses on the new privacy challenges that arise in smart homes. Specifically, the paper focuses on inferring the user's activities -- which may, in turn, compromise the user's privacy -- via device activities and network traffic analysis. We develop techniques based on cryptographically secure token circulation in a ring network consisting of smart home devices to prevent inferences from device activities, via device workflow, i.e., inferences from a coordinated sequence of devices' actuation. The solution hides the device activity and corresponding channel activities, and thus preserves the privacy of the individual's activities. We also extend our solution to deal with a large number of devices, and with devices that produce large-sized data, by implementing parallel rings. Our experiments evaluate the communication overhead of the proposed approach and the privacy it provides.

This paper presents an online-capable deep learning model for probabilistic vehicle trajectory prediction. We propose a simple encoder-decoder architecture based on multi-head attention. The proposed model generates the distribution of the predicted trajectories for multiple vehicles in parallel. Our approach to modeling the interactions learns to attend to a few influential vehicles in an unsupervised manner, which can improve the interpretability of the network. Experiments using naturalistic trajectories on highways show a clear improvement in positional error in both the longitudinal and lateral directions.
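The interpretability claim rests on the attention weights themselves. A self-contained sketch of scaled dot-product attention, the building block of the multi-head mechanism, shows where those per-vehicle weights come from:

```python
import math

def attention(queries, keys, values):
    """Scaled dot-product attention: each query (e.g., the predicted
    vehicle's state) attends to all keys (surrounding vehicles' states).
    Returns the attended outputs and the softmax weights; the weights
    reveal which vehicles influence each prediction."""
    d = len(queries[0])
    out, weights = [], []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in keys]
        m = max(scores)
        e = [math.exp(s - m) for s in scores]
        z = sum(e)
        w = [v / z for v in e]
        weights.append(w)
        out.append([sum(wi * vrow[j] for wi, vrow in zip(w, values))
                    for j in range(len(values[0]))])
    return out, weights
```

A multi-head layer runs several such attentions on learned projections in parallel; inspecting `weights` per head is what makes "a few influential vehicles" visible.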

The ongoing neural revolution in Natural Language Processing has recently been dominated by large-scale pre-trained Transformer models, where size does matter: it has been shown that the number of parameters in such a model is typically positively correlated with its performance. Naturally, this situation has unleashed a race for ever larger models, many of which, including the large versions of popular models such as BERT, XLNet, and RoBERTa, are now out of reach for researchers and practitioners without large-memory GPUs/TPUs. To address this issue, we explore a number of memory-light model reduction strategies that do not require model pre-training from scratch. The experimental results show that we are able to prune BERT, RoBERTa and XLNet models by up to 40%, while maintaining up to 98% of their original performance. We also show that our pruned models are on par with DistilBERT in terms of both model size and performance. Finally, our pruning strategies enable interesting comparative analysis between BERT and XLNet.

Cluster structure detection is a fundamental task for the analysis of graphs, in order to understand and to visualize their functional characteristics. Among the different cluster structure detection methods, spectral clustering is currently one of the most widely used due to its speed and simplicity. Yet, there are few theoretical guarantees of recovering the underlying partitions of the graph for general models. This paper therefore presents a variant of spectral clustering, called $\ell_1$-spectral clustering, performed on a new random model closely related to the stochastic block model. Its goal is to promote a sparse eigenbasis solution of an $\ell_1$ minimization problem revealing the natural structure of the graph. The effectiveness and the robustness to small noise perturbations of our technique are confirmed through a collection of simulated and real data examples.

Multilingual sequence labeling is the task of predicting label sequences using a single unified model for multiple languages. Compared with relying on multiple monolingual models, using a multilingual model has the benefits of a smaller model size, easier online serving, and generalizability to low-resource languages. However, current multilingual models still significantly underperform individual monolingual models due to model capacity limitations. In this paper, we propose to reduce the gap between monolingual models and the unified multilingual model by distilling the structural knowledge of several monolingual models (teachers) into the unified multilingual model (student). We propose two novel KD methods based on structure-level information: the first approximately minimizes the distance between the student's and the teachers' structure-level probability distributions, and the second aggregates the structure-level knowledge into local distributions and minimizes the distance between the two local probability distributions. Our experiments on 4 multilingual tasks with 25 datasets show that our approaches outperform several strong baselines and have stronger zero-shot generalizability than both the baseline model and the teacher models.
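Both methods bottom out in a distance between probability distributions; a usual choice is the KL divergence from the teacher to the student, sketched here at token level (the paper extends this to structure-level and aggregated local distributions):

```python
import math

def kd_loss(teacher_probs, student_probs):
    """Knowledge-distillation loss KL(teacher || student) between two
    label distributions for one token. Minimizing it pulls the student's
    distribution toward the teacher's soft predictions."""
    return sum(t * math.log(t / s)
               for t, s in zip(teacher_probs, student_probs) if t > 0)
```

In training, this term is summed over tokens (or structures) and typically mixed with the ordinary supervised loss on gold labels.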

In this letter, we contribute a multi-language handwritten digit recognition dataset named MNIST-MIX, which is the largest dataset of its type in terms of both languages and data samples. With the same data format as MNIST, MNIST-MIX can be seamlessly applied in existing studies of handwritten digit recognition. By introducing digits from 10 different languages, MNIST-MIX becomes a more challenging dataset, and its imbalanced classification requires a better design of models. We also present baseline results obtained by applying a LeNet model pre-trained on MNIST.

This paper presents the system used in our submission to the \textit{CoNLL 2019 shared task: Cross-Framework Meaning Representation Parsing}. Our system is a graph-based parser which combines an extended pointer-generator network that generates nodes and a second-order mean field variational inference module that predicts edges. Our system achieved \nth{1} and \nth{2} place for the DM and PSD frameworks respectively on the in-framework ranks and achieved \nth{3} place for the DM framework on the cross-framework ranks.

Are 5G connectivity and UAVs merely parts of an extravagant and luxurious world, or are they essential parts of a practical world in a way we have yet to see? To help address this question, we provide a practical framework for immersive aerial monitoring for public safety. Because the framework is built on top of actual realizations and implementations designed to fulfill specific use cases, a high level of practicality is ensured by design. We first investigate 5G network performance on UAVs by isolating performance for different aspects of expected flight missions. Finally, the novel aerial monitoring scheme that we introduce relies on the recent advances brought by 5G networks and mitigates the inherent limitations of 5G networks that we investigate in this paper.

In this document it is shown that the chemical shift, spin-spin couplings, and return to equilibrium observed in Nuclear Magnetic Resonance (NMR) are naturally contained in the real-time nuclear spin dynamics, if the dynamics is calculated directly from molecular Quantum Electrodynamics at finite temperatures. Thus, no effective NMR parameters or relaxation superoperators are used for the calculation of \textit{continuous} NMR spectra. This provides a basis for the repeal of Ramsey's theory from the 1950s, NMR relaxation theory, and later developments which form the current basis of NMR theory. The presented approach replaces the discrete spectrum of the effective spin model by a continuous spectrum, whose numerical calculation is enabled by the mathematical structure of algebraic Quantum Field Theory. While the findings are demonstrated for the hydrogen atom, it is outlined that the approach can be applied to any molecular system for which the electronic structure can be calculated with a common quantum chemical method. Thus, the presented approach has potential for improved NMR data analysis and more accurate predictions for hyperpolarized Magnetic Resonance Imaging.

Besides being part of the Internet of Things (IoT), drones can play a relevant role in it as enablers. The 3D mobility of UAVs can be exploited to improve node localization in IoT networks for, e.g., search and rescue or goods localization and tracking. One of the widespread IoT communication technologies is Long Range Wide Area Network (LoRaWAN), which allows achieving long communication distances with low power. In this work, we present a drone-aided localization system for LoRa networks in which a UAV is used to improve the estimation of a node's location initially provided by the network. We characterize the relevant parameters of the communication system and use them to develop and test a search algorithm in a realistic simulated scenario. We then move to the full implementation of a real system in which a drone is seamlessly integrated into Swisscom's LoRa network. The drone coordinates with the network through a two-way exchange of information, which results in an accurate and fully autonomous localization system. The results obtained in our field tests show a ten-fold improvement in localization precision with respect to the estimation provided by the fixed network. To the best of our knowledge, this is the first time a UAV has been successfully integrated into a LoRa network to improve its localization accuracy.

We consider the problem of fitting a polynomial to a set of data points, each data point consisting of a feature vector and a response variable. In contrast to standard least-squares polynomial regression, we require that the polynomial regressor satisfy shape constraints, such as monotonicity with respect to a variable, Lipschitz-continuity, or convexity over a region. Constraints of this type appear quite frequently in a number of areas including economics, operations research, and pricing. We show how to use semidefinite programming to obtain polynomial regressors that have these properties. We further show that, under some assumptions on the generation of the data points, the regressors obtained are consistent estimators of the underlying shape-constrained function that maps the feature vectors to the responses. We apply our methodology to the US KLEMS dataset to estimate production of a sector as a function of capital, energy, labor, materials, and services. We observe that it outperforms the more traditional approach (which consists in modelling the production curves as Cobb-Douglas functions) on 50 out of the 65 industries listed in the KLEMS database.

We introduce High-Relative-Degree Stochastic Control Lyapunov Functions and Barrier Functions as a means to ensure asymptotic stability and to incorporate state-dependent, high-relative-degree safety constraints for non-linear stochastic systems. Our proposed formulation also provides a generalisation of the existing literature on control Lyapunov and barrier functions for stochastic systems. The control policies are evaluated using a constrained quadratic program that is based on control Lyapunov and barrier functions. Our proposed control design is validated via simulated experiments on a relative degree 2 system (2-dimensional car navigation) and a relative degree 4 system (two-link pendulum with elastic actuator).

We propose a novel method for image stitching that is robust against repetitive patterns and featureless regions in the imagery. In such cases, typical image stitching methods easily produce stitching artifacts, since they may produce false pairwise image registrations that are in conflict within the global connectivity graph. By contrast, our method collects all the plausible pairwise image registration candidates, among which globally consistent candidates are chosen. This enables the method to determine the correct pairwise registrations by utilizing all the available information from the whole imagery, such as unambiguous registrations outside the repeating pattern and featureless regions. We formalize the method as a weighted multigraph whose nodes represent the individual image transformations from the composite image, and whose sets of multiple edges between two nodes represent all the plausible transformations between the pixel coordinates of the two images. The edge weights represent the plausibility of the transformations. The image transformations and the edge weights are solved from a non-linear minimization problem with linear constraints, for which a projection method is used. As an example, we apply the method in a scanning application where the transformations are primarily translations with only slight rotation and scaling components.

An increasing number of decisions are guided by machine learning algorithms. In many settings, from consumer credit to criminal justice, those decisions are made by applying an estimator to data on an individual's observed behavior. But when consequential decisions are encoded in rules, individuals may strategically alter their behavior to achieve desired outcomes. This paper develops a new class of estimators that are stable under manipulation, even when the decision rule is fully transparent. We explicitly model the costs of manipulating different behaviors, and identify decision rules that are stable in equilibrium. Through a large field experiment in Kenya, we show that decision rules estimated with our strategy-robust method outperform those based on standard supervised learning approaches.

The intersection of adversarial learning and satellite image processing is an emerging field in remote sensing. In this study, we address the synthesis of high-resolution multi-spectral satellite imagery using adversarial learning. Guided by the discovery of the attention mechanism, we regulate the process of band synthesis through spatio-spectral Laplacian attention. Further, we use a Wasserstein GAN with a gradient penalty norm to improve the training and stability of adversarial learning. In this regard, we introduce a new cost function for the discriminator based on spatial attention and a domain adaptation loss. We critically analyze the qualitative and quantitative results compared with state-of-the-art methods using widely adopted evaluation metrics. Our experiments on datasets from three different sensors, namely LISS-3, LISS-4, and WorldView-2, show that attention learning performs favorably against state-of-the-art methods. Using the proposed method, we provide an additional data product consistent with existing high-resolution bands. Furthermore, we synthesize over 4000 high-resolution scenes covering various terrains to analyze scientific fidelity. Finally, we demonstrate plausible large-scale real-world applications of the synthesized band.
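The gradient penalty pushes the norm of the critic's input gradient towards 1 at interpolates of real and fake samples (the 1-Lipschitz constraint of WGAN-GP). A toy sketch with a linear critic, whose input gradient is simply its weight vector; deep critics obtain this gradient by automatic differentiation:

```python
def gradient_penalty(w, real, fake, alpha=0.5):
    """WGAN-GP penalty for a toy linear critic f(x) = w . x.
    x_hat is the interpolate at which the gradient is evaluated; for a
    linear critic the input gradient equals w everywhere, so x_hat does
    not affect the value here, but it is where a deep critic would be
    differentiated. Returns (||grad f(x_hat)|| - 1)^2."""
    x_hat = [alpha * r + (1 - alpha) * f for r, f in zip(real, fake)]
    grad_norm = sum(wi * wi for wi in w) ** 0.5
    return (grad_norm - 1.0) ** 2
```

In training, this penalty (scaled by a coefficient, commonly 10) is added to the critic loss, replacing the weight clipping of the original WGAN.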

In previous work, artificial agents were shown to achieve almost perfect accuracy in referential games where they have to communicate to identify images. Nevertheless, the resulting communication protocols rarely display salient features of natural languages, such as compositionality. In this paper, we propose some realistic sources of pressure on communication that avert this outcome. More specifically, we formalise the principle of least effort through an auxiliary objective. Moreover, we explore several game variants, inspired by the principle of object constancy, in which we alter the frequency, position, and luminosity of the objects in the images. We perform an extensive analysis on their effect through compositionality metrics, diagnostic classifiers, and zero-shot evaluation. Our findings reveal that the proposed sources of pressure result in emerging languages with less redundancy, more focus on high-level conceptual information, and better abilities of generalisation. Overall, our contributions reduce the gap between emergent and natural languages.

Polar codes are able to achieve the capacity of memoryless channels under successive cancellation (SC) decoding. Soft Cancellation (SCAN) is a soft-output decoder based on the SC schedule, useful in iterative decoding and concatenation of polar codes. However, the sequential nature of this decoder leads to high decoding latency compared to state-of-the-art codes. To reduce the latency of SCAN, in this paper we identify special nodes in the decoding tree, corresponding to specific frozen-bit sequences, and propose dedicated low-latency decoding approaches for each of them. The resulting fast-SCAN decoder does not alter the soft-output compared to the standard SCAN while dramatically reducing the decoding latency and yielding the same error-correction performance.

The purpose of this paper is to present a comprehensive study of a coherent feedback network where the main component consists of two distant double quantum dot (DQD) qubits which are directly coupled to a cavity. This main component has recently been physically realized (van Woerkom, {\it et al.}, Microwave photon-mediated interactions between semiconductor qubits, Physical Review X, 8(4):041018, 2018). The feedback loop is closed by cascading this main component with a beamsplitter. The dynamics of this coherent feedback network is studied from three perspectives. First, an analytic form of the output single-photon state of the network driven by a single-photon state is derived; in particular, it is observed that coherent feedback considerably prolongs the interaction between the input single photon and the network. Second, excitation probabilities of the DQD qubits are computed when the network is driven by a single-photon input state. Moreover, if the input is vacuum but one of the two DQD qubits is initialized in its excited state, the explicit expression of the state of the network is derived; in particular, it is shown that the output field and the two DQD qubits can form an entangled state if the transition frequencies of the two DQD qubits are equal. Finally, the exact form of the pulse shape is obtained by which the single-photon input can fully excite one of these two DQD qubits at any controllable time, which may be useful in the construction of $2$-qubit quantum gates.

Separating different music instruments playing the same piece is a challenging task since the different audio sources are synchronized and playing in harmony. Moreover, the number of sources may vary for each piece and some of the sources may belong to the same family of instruments, thus sharing timbral characteristics and making the sources more correlated. This paper proposes a source separation method for multiple musical instruments sounding simultaneously and explores how much additional information apart from the audio stream can lift the quality of source separation. We explore conditioning techniques at different levels of a primary source separation network and utilize two extra modalities of data, namely presence or absence of instruments in the mixture, and the corresponding video stream data.

This chapter focuses on the performance enhancement brought by the addition of caching capabilities to full-duplex (FD) radios in the context of ultra-dense networks (UDNs). More specifically, we aim at showing that the interference footprint of such networks, i.e., the major bottleneck to overcome to observe the theoretical FD throughput doubling at the network level, can be significantly reduced thanks to edge caching. Fundamental results show that most of the gain, as compared to their half-duplex (HD) counterparts, can be achieved by such networks only if costly modifications to their infrastructure are performed and/or if high-rate signaling is exchanged between user equipments (UEs) over suitable control links. Therefore, we aim at proposing a viable and cost-effective alternative to these solutions based on pre-fetching locally popular contents at the network edge. We start by considering an interference-rich scenario such as an ultra-dense FD small-cell network, in which several non-cooperative FD base stations (BSs) serve their associated UEs while communicating with a wireless backhaul node (BN) to retrieve the content to deliver. We then describe a geographical caching policy aiming at capturing local file popularity and compute the corresponding cache-hit probability. Thereupon, we calculate the probability of successful transmission of a file requested by a UE, either directly by its serving small-cell base station (SBS) or by the corresponding BN: this quantity is then used to lower-bound the throughput of the considered network. Our approach leverages tools from stochastic geometry in order to guarantee both analytical tractability of the problem and generality of the results. Our numerical simulations show that shifting from cache-free to cache-aided FD small-cell networks yields a remarkable performance improvement.

News headline generation aims to produce a short sentence that attracts readers to the news. One news article often contains multiple keyphrases that are of interest to different users, and can thus naturally have multiple reasonable headlines. However, most existing methods focus on single-headline generation. In this paper, we propose generating multiple headlines with keyphrases of user interest: the main idea is to first generate multiple keyphrases of interest to users for the news, and then generate multiple keyphrase-relevant headlines. We propose a multi-source Transformer decoder, which takes three sources as inputs: (a) the keyphrase, (b) the keyphrase-filtered article, and (c) the original article, to generate keyphrase-relevant, high-quality, and diverse headlines. Furthermore, we propose a simple and effective method to mine the keyphrases of interest in the news article and build the first large-scale keyphrase-aware news headline corpus, which contains over 180K aligned triples of $<$news article, headline, keyphrase$>$. Extensive experimental comparisons on the real-world dataset show that the proposed method achieves state-of-the-art results in terms of quality and diversity.

Coupled with the rise of deep learning, the wealth of data and enhanced computation capabilities of Internet of Vehicles (IoV) components enable effective Artificial Intelligence (AI) based models to be built. Beyond ground data sources, Unmanned Aerial Vehicle (UAV) based service providers for data collection and AI model training, i.e., Drones-as-a-Service (DaaS), have become increasingly popular in recent years. However, the stringent regulations governing data privacy potentially impede data sharing across independently owned UAVs. To this end, we propose the adoption of a Federated Learning (FL) based approach to enable privacy-preserving collaborative machine learning across a federation of independent DaaS providers for the development of IoV applications, e.g., for traffic prediction and car park occupancy management. Given the information asymmetry and incentive mismatches between the UAVs and model owners, we leverage the self-revealing properties of a multi-dimensional contract to ensure truthful reporting of the UAV types, while accounting for the multiple sources of heterogeneity, e.g., in sensing, computation, and transmission costs. Then, we adopt the Gale-Shapley algorithm to match the lowest-cost UAV to each subregion. The simulation results validate the incentive compatibility of our contract design and show the efficiency of our matching, thus guaranteeing profit maximization for the model owner amid information asymmetry.
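The matching step mentioned above rests on the classic Gale-Shapley deferred-acceptance procedure. As a hedged illustration only (the function, toy cost table, and agent names below are invented for the example, not taken from the paper), a minimal stable matching between subregions and UAVs with cost-ranked preferences might look like:

```python
def gale_shapley(proposer_prefs, acceptor_prefs):
    """proposer_prefs / acceptor_prefs: dict mapping each agent to an
    ordered list of partners, most preferred first.
    Returns a stable matching as {proposer: acceptor}."""
    free = list(proposer_prefs)                    # unmatched proposers
    next_choice = {p: 0 for p in proposer_prefs}   # next index to propose to
    engaged = {}                                   # acceptor -> proposer
    rank = {a: {p: i for i, p in enumerate(prefs)}
            for a, prefs in acceptor_prefs.items()}
    while free:
        p = free.pop()
        a = proposer_prefs[p][next_choice[p]]
        next_choice[p] += 1
        if a not in engaged:
            engaged[a] = p
        elif rank[a][p] < rank[a][engaged[a]]:     # a prefers p to its current match
            free.append(engaged[a])
            engaged[a] = p
        else:
            free.append(p)
    return {p: a for a, p in engaged.items()}

# Hypothetical sensing costs; each side ranks the other by ascending cost.
costs = {"uav1": {"r1": 3, "r2": 1}, "uav2": {"r1": 2, "r2": 4}}
region_prefs = {r: sorted(costs, key=lambda u: costs[u][r]) for r in ("r1", "r2")}
uav_prefs = {u: sorted(("r1", "r2"), key=lambda r: costs[u][r]) for u in costs}
print(gale_shapley(region_prefs, uav_prefs))
```

With these toy costs each subregion is matched to its lowest-cost UAV, which is the behavior the abstract describes.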

In the past few years, supervised and adversarial learning have been widely adopted in various complex computer vision tasks. It seems natural to wonder whether another branch of artificial intelligence, commonly known as Reinforcement Learning (RL), can benefit such complex vision tasks. In this study, we explore the plausible usage of RL in the super resolution of remote sensing imagery. Guided by recent advances in super resolution, we propose a theoretical framework that leverages the benefits of supervised and reinforcement learning. We argue that a straightforward implementation of RL is not adequate to address ill-posed super resolution, as the action variables are not fully known. To tackle this issue, we propose to parameterize the action variables by matrices and train our policy network using Monte-Carlo sampling. We study the implications of a parametric action space in a model-free environment from theoretical and empirical perspectives. Furthermore, we analyze the quantitative and qualitative results on both remote sensing and non-remote sensing datasets. Based on our experiments, we report considerable improvement over state-of-the-art methods by encapsulating supervised models in a reinforcement learning framework.

Exploiting more information from ground truth (GT) images is a new research direction for further improving the performance of CNNs in CT image segmentation. Previous methods focus on devising a loss function for this purpose. However, it is rather difficult to devise a general and optimization-friendly loss function. We here present a novel and practical method that exploits GT images beyond the loss function. Our insight is that the feature maps of two CNNs trained respectively on GT and CT images should be similar on some metric space, because both are used to describe the same objects for the same purpose. We hence exploit GT images by enforcing the two CNNs' feature maps to be consistent. We assess the proposed method on two data sets and compare its performance to several competitive methods. Extensive experimental results show that the proposed method is effective, outperforming all the compared methods.

In this paper, we introduce a family of vectorial prolate spheroidal wave functions of real order $\alpha>-1$ on the unit ball in $R^3$ that satisfy the divergence-free constraint, and are thus termed divergence-free vectorial ball PSWFs. They are vectorial eigenfunctions of an integral operator related to the finite Fourier transform, and solve the divergence-free constrained maximum concentration problem in three dimensions, i.e., to what extent can the total energy of a band-limited divergence-free vectorial function be concentrated on the unit ball? Interestingly, any optimally concentrated divergence-free vectorial function, when represented as a series in vector spherical harmonics, is also concentrated in one of the three vectorial spherical harmonic modes. Moreover, divergence-free ball PSWFs are exactly the vectorial eigenfunctions of the second-order Sturm-Liouville differential operator that defines the scalar ball PSWFs. Indeed, the divergence-free vectorial ball PSWFs possess such a simple and close relation with the scalar ball PSWFs that they share the same merits. At the same time, it turns out that the divergence-free ball PSWFs solve another second-order Sturm-Liouville eigenvalue equation defined through the curl operator $\nabla\times$ instead of the gradient operator $\nabla$.

Flow-based generative models are an important class of exact inference models that admit efficient inference and sampling for image synthesis. Owing to the efficiency constraints on the design of the flow layers, e.g. split coupling flow layers in which approximately half the pixels do not undergo further transformations, they have limited expressiveness for modeling long-range data dependencies compared to autoregressive models that rely on conditional pixel-wise generation. In this work, we improve the representational power of flow-based models by introducing channel-wise dependencies in their latent space through multi-scale autoregressive priors (mAR). Our mAR prior for models with split coupling flow layers (mAR-SCF) can better capture dependencies in complex multimodal data. The resulting model achieves state-of-the-art density estimation results on MNIST, CIFAR-10, and ImageNet. Furthermore, we show that mAR-SCF allows for improved image generation quality, with gains in FID and Inception scores compared to state-of-the-art flow-based models.

Segmenting overlapping cytoplasm of cells in cervical smear images is a clinically essential task for quantitatively measuring cell-level features in order to diagnose cervical cancer. This task, however, remains rather challenging, mainly due to the deficiency of intensity (or color) information in the overlapping region. Although shape prior-based models that compensate for the intensity deficiency by introducing prior shape information (shape priors) about cytoplasm are firmly established, they often yield visually implausible results, mainly because they model shape priors only by limited shape hypotheses about cytoplasm, exploit cytoplasm-level shape priors alone, and impose no shape constraint on the resulting shape of the cytoplasm. In this paper, we present a novel and effective shape prior-based approach, called constrained multi-shape evolution, that segments all overlapping cytoplasms in the clump simultaneously by jointly evolving each cytoplasm's shape guided by the modeled shape priors. We model local shape priors (cytoplasm-level) by an infinitely large shape hypothesis set which contains all possible shapes of the cytoplasm. In the shape evolution, we compensate for the intensity deficiency by introducing not only the modeled local shape priors but also global shape priors (clump-level) modeled by considering mutual shape constraints of cytoplasms in the clump. We also constrain the resulting shape in each evolution to be in the built shape hypothesis set, further reducing implausible segmentation results. We evaluated the proposed method on two typical cervical smear datasets, and the extensive experimental results show that the proposed method is effective at segmenting overlapping cytoplasm, consistently outperforming the state-of-the-art methods.

This paper makes a first step towards compatible and hence reusable network components. Rather than training networks for different tasks independently, we adapt the training process to produce network components that are compatible across tasks. We propose and compare several different approaches to accomplish compatibility. Our experiments on CIFAR-10 show that: (i) we can train networks to produce compatible features, without degrading task accuracy compared to training networks independently; (ii) the degree of compatibility is highly dependent on where we split the network into a feature extractor and a classification head; (iii) random initialization has a large effect on compatibility; (iv) we can train incrementally: given previously trained components, we can train new ones which are also compatible with them. This work is part of a larger goal to increase network reusability: we envision that compatibility will enable solving new tasks by mixing and matching suitable components.

Learning words is a challenge for children and neural networks alike. However, what they struggle with can differ. When prompted by novel words, children have been shown to tend to associate them with unfamiliar referents. This has been taken to reflect a propensity toward mutual exclusivity. In this study, we investigate whether and under which circumstances neural models can exhibit analogous behavior. To this end, we evaluate cross-situational neural models on novel items with distractors, contrasting the interaction between different word learning and referent selection strategies. We find that, as long as they bring about competition between words, constraints in both learning and referent selection can improve success in tasks with novel words and referents. For neural network research, our findings clarify the role of available options for enhanced performance in tasks where mutual exclusivity is advantageous. For cognitive research, they highlight latent interactions between word learning, referent selection mechanisms, and the structure of stimuli.

This paper develops a new exponential forgetting algorithm that can prevent the so-called estimator windup problem while retaining fast convergence speed. To investigate the properties of the proposed forgetting algorithm, the boundedness of the covariance matrix is first analysed and compared with various exponential and directional forgetting algorithms. Then, the stability of the estimation error, with and without the persistent excitation condition, is theoretically analysed in comparison with existing benchmark algorithms. Numerical simulations on wing rock motion validate the analysis results.

Here, we propose an unsupervised fuzzy rule-based dimensionality reduction method primarily for data visualization. It considers the following important issues relevant to dimensionality reduction-based data visualization: (i) preservation of neighborhood relationships, (ii) handling data on a non-linear manifold, (iii) the capability of predicting projections for new test data points, (iv) interpretability of the system, and (v) the ability to reject test points if required. For this, we use a first-order Takagi-Sugeno type model. We generate rule antecedents using clusters in the input data. In this context, we also propose a new variant of the Geodesic c-means clustering algorithm. We estimate the rule parameters by minimizing an error function that preserves the inter-point geodesic distances (distances over the manifold) as Euclidean distances in the projected space. We apply the proposed method to three synthetic and three real-world data sets and visually compare the results with four other standard data visualization methods. The obtained results show that the proposed method behaves desirably and performs better than, or comparably to, the methods it is compared with. The proposed method is found to be robust to the initial conditions. The predictability of the proposed method for test points is validated by experiments. We also assess the ability of our method to reject output points when it should. We then extend this concept to provide a general framework for learning an unsupervised fuzzy model for data projection with different objective functions. To the best of our knowledge, this is the first attempt at manifold learning using unsupervised fuzzy modeling.
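The error function above relies on inter-point geodesic distances, i.e., shortest-path distances over a neighborhood graph of the data. A minimal sketch of that standard Isomap-style computation (illustrative only; the function name and toy data are ours, not the authors'):

```python
import numpy as np

def geodesic_distances(X, k=2):
    """Approximate geodesic (over-the-manifold) distances: connect each
    point to its k nearest neighbours, then run Floyd-Warshall."""
    n = len(X)
    # All pairwise Euclidean distances.
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))
    G = np.full((n, n), np.inf)
    np.fill_diagonal(G, 0.0)
    for i in range(n):
        for j in np.argsort(D[i])[1:k + 1]:    # k nearest neighbours of i
            G[i, j] = G[j, i] = D[i, j]
    for m in range(n):                          # Floyd-Warshall relaxation
        G = np.minimum(G, G[:, m:m + 1] + G[m:m + 1, :])
    return G

# Four points on a line: the geodesic from end to end is the sum of hops.
X = np.array([[0.0], [1.0], [2.0], [3.0]])
G = geodesic_distances(X, k=1)
print(G[0, 3])
```

On this chain, the end-to-end geodesic equals the sum of the three unit hops, 3.0, while the raw Euclidean distance would of course be the same here; on a curved manifold the two diverge, which is exactly what the method's error function exploits.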

The paper presents the method of attractive cylinders -- a generalization of the attractive ellipsoid method to the cases of tracking and observation. Based on the developed method, an algorithm for calculating the parameters of the controller, which ensures the boundedness of tracking or observation errors in the presence of bounded external disturbances, is proposed. The effectiveness of the proposed method is demonstrated by examples.

We study the differential properties of higher-order statistical probabilistic programs with recursion and conditioning. Our starting point is an open problem posed by Hongseok Yang: what class of statistical probabilistic programs have densities that are differentiable almost everywhere? To formalise the problem, we consider Statistical PCF (SPCF), an extension of call-by-value PCF with real numbers, and constructs for sampling and conditioning. We give SPCF a sampling-style operational semantics a la Borgstrom et al., and study the associated weight (commonly referred to as the density) function and value function on the set of possible execution traces. Our main result is that almost-surely terminating SPCF programs, generated from a set of primitive functions (e.g. the set of analytic functions) satisfying mild closure properties, have weight and value functions that are almost-everywhere differentiable. We use a stochastic form of symbolic execution to reason about almost-everywhere differentiability. A by-product of this work is that almost-surely terminating deterministic (S)PCF programs with real parameters denote functions that are almost-everywhere differentiable. Our result is of practical interest, as almost-everywhere differentiability of the density function is required to hold for the correctness of major gradient-based inference algorithms.

The Coronavirus pandemic has taken the world, and social media, by storm. As awareness about the ailment increased, so did messages, videos, and posts acknowledging its presence. The social networking site Twitter demonstrated a similar effect, with the number of posts related to coronavirus showing unprecedented growth in a very short span of time. This paper presents a statistical analysis of the Twitter messages related to this disease posted since January 2020. Two types of empirical studies have been performed. The first is on word frequency and the second on the sentiments of individual tweet messages. Inspection of the word frequency is useful in characterizing the patterns or trends in the words used on the site; this also reflects on the psychology of Twitter users at this critical juncture. Unigram, bigram, and trigram frequencies have been modeled by a power law distribution. The results have been validated by the Sum of Squared Errors (SSE), R^2, and Root Mean Square Error (RMSE). High values of R^2 and low values of SSE and RMSE support the goodness of fit of this model. Sentiment analysis has been conducted to understand the general attitudes of Twitter users at this time. Tweets by both the general public and the WHO were part of the corpus. The results showed that the majority of the tweets had a positive polarity and only about 15% were negative.
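The power-law modeling of n-gram frequencies can be sketched as a least-squares fit in log-log space; the helper below is a hedged illustration with synthetic Zipf-like counts, not the authors' pipeline:

```python
import numpy as np

def fit_power_law(freqs):
    """Fit frequency ~ C * rank^(-alpha) by least squares in log-log
    space; report R^2 on the log frequencies as a goodness-of-fit check."""
    freqs = np.sort(np.asarray(freqs, float))[::-1]   # rank by frequency
    ranks = np.arange(1, len(freqs) + 1)
    slope, intercept = np.polyfit(np.log(ranks), np.log(freqs), 1)
    pred = intercept + slope * np.log(ranks)
    resid = np.log(freqs) - pred
    r2 = 1 - resid.var() / np.log(freqs).var()
    return -slope, np.exp(intercept), r2              # alpha, C, R^2

# Synthetic Zipf-like counts (~1000 / rank): a near-exact power law.
counts = [1000, 500, 333, 250, 200]
alpha, C, r2 = fit_power_law(counts)
print(round(alpha, 2), round(r2, 3))
```

On such Zipf-like input the fitted exponent is close to 1 with R^2 near 1, the kind of high-R^2 fit the abstract reports for real n-gram data.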

In this work, we propose efficient algorithms for joint independent subspace analysis (JISA), an extension of independent component analysis that deals with parallel mixtures, where not all the components are independent. We derive an algorithmic framework for JISA based on the majorization-minimization (MM) optimization technique (JISA-MM). We use a well-known inequality for super-Gaussian sources to derive a surrogate function of the negative log-likelihood of the observed data. The minimization of this surrogate function leads to a variant of the hybrid exact-approximate diagonalization problem, but where multiple demixing vectors are grouped together. In the spirit of auxiliary function based independent vector analysis (AuxIVA), we propose several updates that can be applied alternately to one, or jointly to two, groups of demixing vectors.

Recently, blind extraction of one or more sources has gained interest as a reasonable way of exploiting larger microphone arrays to achieve better separation. In particular, several MM algorithms have been proposed for overdetermined IVA (OverIVA). By applying JISA-MM, we are not only able to rederive these in a general manner, but also find several new algorithms. We run extensive numerical experiments to evaluate their performance, and compare it to that of full separation with AuxIVA. We find that algorithms using pairwise updates of two sources, or of one source and the background have the fastest convergence, and are able to separate target sources quickly and precisely from the background. In addition, we characterize the performance of all algorithms under a large number of noise, reverberation, and background mismatch conditions.

We consider the problem of controlling an unstable scalar linear plant over a power-constrained additive white Gaussian noise (AWGN) channel, where the controller/receiver has access to an additional noisy measurement of the state of the control system. To that end, we view the noisy measurement as side information and recast the problem to that of joint source-channel coding with side information at the receiver. We argue that judicious modulo-based schemes improve over their linear counterparts and allow one to avoid a large increase in the transmit power due to the ignorance of the side information at the sensor/transmitter. We demonstrate the usefulness of our technique in settings where (i) the sensor is oblivious of the control objectives, control actions, and previous controller state estimates, and (ii) the system output tracks a desired reference signal that is available only at the controller via integral control action.

Knowledge about the daily number of new Covid-19 infections is important because it is the basis for political decisions resulting in lockdowns and urgent health care measures. We use Germany as an example to illustrate shortcomings of official numbers, which are, at least in Germany, disclosed only with several days of delay and severely underreported on weekends (by more than 40%). These shortcomings outline an urgent need for alternative data sources. The other widely cited source, provided by the Center for Systems Science and Engineering at Johns Hopkins University (JHU), also deviates from the official numbers for Germany by 79% on average. We argue that Google Search and Twitter data should complement official numbers. They predict the official numbers even better than the JHU values, and do so several days ahead. These two data sources could also be used in parts of the world where official numbers do not exist or are perceived to be unreliable.

To address the problems that convolutional neural networks for image super-resolution reconstruction neglect the inherent attributes of natural images and extract features at only a single scale, a network structure based on an attention mechanism and multi-scale feature fusion is proposed. By using the attention mechanism, the network can effectively integrate the non-local information and second-order features of the image, thereby improving its feature expression ability. At the same time, convolution kernels of different scales are used to extract the multi-scale information of the image, preserving complete feature information at each scale. Experimental results show that the proposed method achieves better performance than other representative super-resolution reconstruction algorithms in both objective quantitative metrics and visual quality.

We consider the routing flow shop problem with two machines on an asymmetric network. For this problem we discuss properties of an optimal schedule and present a polynomial time algorithm assuming the number of nodes of the network to be bounded by a constant. To the best of our knowledge, this is the first positive result on the complexity of the routing flow shop problem with an arbitrary structure of the transportation network, even in the case of a symmetric network. This result stands in contrast with the complexity of the two-machine routing open shop problem, which was shown to be NP-hard even on the two-node network.
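For intuition, the classical two-machine flow shop without routing is solved exactly by Johnson's rule, the natural building block behind such schedules; the sketch below shows only that classical rule (it is not the paper's routing algorithm):

```python
def johnson(jobs):
    """jobs: list of (a, b) processing times on machines 1 and 2.
    Johnson's rule: jobs with a <= b first in ascending a, then the
    rest in descending b. Returns an order of job indices that
    minimizes makespan for the two-machine flow shop."""
    first = sorted((i for i, (a, b) in enumerate(jobs) if a <= b),
                   key=lambda i: jobs[i][0])
    last = sorted((i for i, (a, b) in enumerate(jobs) if a > b),
                  key=lambda i: jobs[i][1], reverse=True)
    return first + last

def makespan(jobs, order):
    """Completion time of the last job on machine 2."""
    t1 = t2 = 0
    for i in order:
        a, b = jobs[i]
        t1 += a                   # machine 1 processes jobs back to back
        t2 = max(t2, t1) + b      # machine 2 waits for machine 1's output
    return t2

jobs = [(3, 2), (1, 4), (2, 5)]   # toy instance
order = johnson(jobs)
print(order, makespan(jobs, order))
```

On this toy instance Johnson's rule yields the order [1, 2, 0] with makespan 12, which brute-force enumeration of all six orders confirms is optimal; the routing variant additionally has to account for travel times between network nodes.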

The technology of vehicle and driver detection in Intelligent Transportation Systems (ITS) has been a hot topic in recent years. In particular, driver detection is still a challenging problem that is conducive to supervising traffic order and maintaining public safety. In this paper, an algorithm based on YOLOv3 is proposed to detect and classify vehicles, drivers, and people on the highway, so as to distinguish drivers from passengers and form a one-to-one correspondence between vehicles and drivers. The proposed model and contrast experiments are conducted on our self-built traffic driver face database. The effectiveness of our proposed algorithm is validated by extensive experiments and verified under various complex highway conditions. Compared with other advanced vehicle and driver detection technologies, the model performs well and is robust to road occlusions, different poses, and extreme lighting.

In multi-label learning, the issue of missing labels poses a major challenge. Many methods attempt to recover missing labels by exploiting the low-rank structure of the label matrix. However, these methods utilize only the global low-rank label structure and, to some extent, ignore both local low-rank label structures and label discriminant information, leaving room for further performance improvement. In this paper, we develop a simple yet effective discriminant multi-label learning (DM2L) method for multi-label learning with missing labels. Specifically, we impose low-rank structures on all the predictions of instances from the same label (local shrinking of rank), and a maximally separated structure (high-rank structure) on the predictions of instances from different labels (global expanding of rank). In this way, the imposed low-rank structures help model both local and global low-rank label structures, while the imposed high-rank structure helps provide more underlying discriminability. Our subsequent theoretical analysis also supports these intuitions. In addition, we provide a nonlinear extension via the kernel trick to enhance DM2L and establish a concave-convex objective to learn these models. Compared to other methods, our method involves the fewest assumptions and only one hyper-parameter. Even so, extensive experiments show that our method still outperforms the state-of-the-art methods.
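The "shrinking of rank" idea can be made concrete with the nuclear norm, the usual convex surrogate for matrix rank; the snippet below illustrates that general principle only, not the DM2L objective itself:

```python
import numpy as np

def nuclear_norm(M):
    """Sum of singular values: the convex surrogate for rank(M)."""
    return np.linalg.svd(M, compute_uv=False).sum()

rng = np.random.default_rng(0)
# Predictions of four same-label instances pushed onto one direction:
# the block is rank 1, so its nuclear norm collapses to a single
# singular value (equal to its Frobenius norm).
same = np.tile(rng.standard_normal((1, 5)), (4, 1))
# A generic block (almost surely full rank) spreads energy over
# several singular values, so shrinking its nuclear norm would force
# the predictions toward a shared low-dimensional subspace.
diff = rng.standard_normal((4, 5))
print(round(nuclear_norm(same), 3), round(nuclear_norm(diff), 3))
```

Minimizing the nuclear norm of same-label prediction blocks while keeping cross-label blocks well separated is the local-shrink/global-expand intuition the abstract describes.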

In this paper, we propose a system for file classification in large data sets based on spiking neural networks (SNNs). File information contained in key-value metadata pairs is mapped by a novel correlative temporal encoding scheme to spike patterns that are input to an SNN. The correlation between input spike patterns is determined by a file similarity measure. Unsupervised training of such networks using spike-timing-dependent plasticity (STDP) is addressed first. Then, supervised SNN training is considered by backpropagation of an error signal that is obtained by comparing the spike pattern at the output neurons with a target pattern representing the desired class. The classification accuracy is measured for various publicly available data sets with tens of thousands of elements, and compared with other learning algorithms, including logistic regression and support vector machines. Simulation results indicate that the proposed SNN-based system using memristive synapses may represent a valid alternative to classical machine learning algorithms for inference tasks, especially in environments with asynchronous ingest of input data and limited resources.
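The STDP rule referred to above is commonly implemented as a pair-based exponential window; the sketch below shows that textbook form (the parameter values are illustrative, not the paper's):

```python
import math

def stdp_dw(t_pre, t_post, a_plus=0.1, a_minus=0.12, tau=20.0):
    """Pair-based STDP weight update (times in ms): potentiate when the
    presynaptic spike precedes the postsynaptic spike, depress
    otherwise, with exponentially decaying magnitude in |dt|."""
    dt = t_post - t_pre
    if dt > 0:    # pre before post -> potentiation (causal pairing)
        return a_plus * math.exp(-dt / tau)
    else:         # post before (or with) pre -> depression
        return -a_minus * math.exp(dt / tau)

print(stdp_dw(10.0, 15.0))   # pre 5 ms before post: positive update
print(stdp_dw(15.0, 10.0))   # post 5 ms before pre: negative update
```

Accumulating such updates over the correlated spike patterns produced by the encoding scheme is what lets similar files strengthen the same synapses during unsupervised training.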

Channel symmetry properties that imply the tightness of Shannon's random coding inner bound have recently been used to determine the capacity region of discrete-memoryless two-way channels (DM-TWCs). For channels without such symmetry properties, outer bounds are often needed to estimate the capacity region. However, validating symmetry conditions and/or evaluating non-trivial outer bounds are computationally demanding, especially for channels with large input and output alphabets. In this paper, three easy-to-check conditions that identify DM-TWCs with no such symmetry properties as well as an easy-to-compute outer bound are derived. The bound is obtained from Shannon's inner bound computation but is non-trivial. Using this outer bound, approximate capacity results can be established for certain DM-TWCs. The results are illustrated by two examples.

Despite the success of artificial neural networks (ANNs), many remain concerned about their "black box" nature. Why do they work? Could we design a "transparent" network? This paper presents a controllable and readable polynomial neural network (CR-PNN) for approximation, prediction, and system identification. CR-PNN is simple enough to be described by one "small" formula, so that we can control the approximation precision and explain the internal structure of the network. CR-PNN is, in essence, the Taylor expansion in the form of a network. The number of layers represents the precision, and the derivatives of the Taylor expansion are imitated by the error back-propagation algorithm. First, we demonstrated that CR-PNN analyzes a "black box" system well on ten synthetic noisy datasets, and compared the results with the synthetic data to substantiate its search towards the global optimum. Second, we verified on ten real-world applications that CR-PNN brings better generalization capability than typical ANNs, whose approximation depends on nonlinear activation functions. Finally, 200,000 repeated experiments with 4,898 samples demonstrated that CR-PNN is five times more efficient than a typical ANN for one epoch and ten times more efficient for one forward propagation. In short, compared with traditional neural networks, the novelties and advantages of CR-PNN include readability of the internal structure, guarantees of a globally optimal solution, lower computational complexity, and likely better robustness to real-world approximation. (We are strong believers in open source and provide the CR-PNN code on GitHub: https://github.com/liugang1234567/CR-PNN#cr-pnn)
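As a rough illustration of the Taylor-expansion view (not the CR-PNN architecture itself, whose code is linked above), a plain polynomial least-squares fit shows how increasing the order sharpens the approximation of a smooth "black box" function:

```python
import numpy as np

# Polynomial approximation in the spirit of a Taylor expansion:
# a higher order (degree) gives higher precision on a smooth target.
x = np.linspace(-1.0, 1.0, 200)
y = np.exp(x)                      # smooth "black box" to approximate

def poly_mse(degree):
    coeffs = np.polyfit(x, y, degree)          # linear least squares
    return float(np.mean((np.polyval(coeffs, x) - y) ** 2))

errs = [poly_mse(d) for d in (1, 2, 4)]        # error shrinks with degree
```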

We present results extending the foundational work of Choromanska et al (2015) on the complexity of the loss surfaces of multi-layer neural networks. We remove the strict reliance on specifically ReLU activation functions and obtain broadly the same results for general activation functions. This is achieved with piece-wise linear approximations to general activation functions, Kac-Rice calculations akin to those of Auffinger, Ben Arous and \v{C}ern\'y (2013) and asymptotic analysis made possible by supersymmetric methods. Our results strengthen the case for the conclusions of Choromanska et al (2015) and the calculations contain various novel details required to deal with certain perturbations to the classical spin-glass calculations.

The persistent growth in phishing and the rising volume of phishing websites has led to individuals and organizations worldwide becoming increasingly exposed to various cyber-attacks. Consequently, more effective phishing detection is required for improved cyber defence. Hence, in this paper we present a deep learning-based approach to enable high accuracy detection of phishing sites. The proposed approach utilizes convolutional neural networks (CNN) for high accuracy classification to distinguish genuine sites from phishing sites. We evaluate the models using a dataset obtained from 6,157 genuine and 4,898 phishing websites. Based on the results of extensive experiments, our CNN-based models proved to be highly effective in detecting unknown phishing sites. Furthermore, the CNN-based approach performed better than traditional machine learning classifiers evaluated on the same dataset, reaching a 98.2% phishing detection rate with an F1-score of 0.976. The method presented in this paper compares favourably to the state-of-the-art in deep learning-based phishing website detection.

Wireless signal-based gesture recognition has promoted the development of VR games, smart homes, etc. However, traditional approaches suffer from the influence of the domain gap: recognition accuracy drops when the recognition model is trained in one domain but used in another. Though some solutions, such as adversarial learning, transfer learning and body-coordinate velocity profiles, have been proposed to achieve cross-domain recognition, each of these solutions has its flaws. In this paper, we define the concept of the domain gap and then propose a more promising solution, named DI, to eliminate the domain gap and thereby achieve domain-independent gesture recognition. DI leverages the sign map of the gradient map as the domain-gap eliminator to improve the recognition accuracy. We conduct experiments with ten domains and ten gestures. The experimental results show that DI achieves recognition accuracies of 87.13%, 90.12% and 94.45% with KNN, SVM and CNN classifiers, respectively, outperforming existing solutions.
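The intuition behind a sign-map eliminator can be sketched directly: taking the sign of the gradient keeps the direction of change while discarding domain-dependent magnitudes. The "gesture profiles" below are synthetic stand-ins for real wireless signal data:

```python
import numpy as np

def sign_map(profile):
    """Sign of the gradient map along the time axis: keeps the
    direction of change, discards domain-dependent magnitudes."""
    grad = np.gradient(profile, axis=0)
    return np.sign(grad)

# Two "domains": the same gesture pattern at a different amplitude/offset.
t = np.linspace(0, 2 * np.pi, 50)
domain_a = np.sin(t)[:, None]
domain_b = 3.0 * np.sin(t)[:, None] + 5.0   # rescaled and shifted
# Their sign maps coincide, so a classifier trained on one transfers.
```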

We develop a method for automatically synthesizing a rap verse given an input text written in another form, such as a summary of a news article. Our approach is to train a Transformer-based denoising autoencoder to reconstruct rap lyrics from content words. We study three different approaches for automatically stripping content words that convey the essential meaning of the lyrics. Moreover, we propose a BERT-based paraphrasing scheme for rhyme enhancement and show that it increases the average rhyme density of the lyrics by 10%. Experimental results on three diverse input domains -- existing rap lyrics, news, and movie plot summaries -- show that our method is capable of generating coherent and technically fluent rap verses that preserve the input content words. Human evaluation demonstrates that our approach gives a good trade-off between content preservation and style transfer compared to a strong information retrieval baseline.
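Stripping a text down to content words can be approximated very simply; the sketch below just drops stopword tokens, whereas the paper studies three more careful stripping approaches. The stopword list here is a tiny illustrative stand-in for a proper one (e.g. from NLTK):

```python
import re

# A tiny stopword list stands in for a proper one.
STOPWORDS = {"a", "an", "the", "is", "are", "was", "in", "on", "of",
             "to", "and", "that", "it", "this", "for", "with"}

def content_words(text):
    """Keep only words likely to carry the essential meaning."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return [tok for tok in tokens if tok not in STOPWORDS]

content_words("The storm is moving to the coast in the morning")
# e.g. the content words of a news sentence, from which lyrics are reconstructed
```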

Scene understanding has been of high interest in computer vision. It encompasses not only identifying objects in a scene, but also their relationships within the given context. With this goal, a recent line of works tackles 3D semantic segmentation and scene layout prediction. In our work we focus on scene graphs, a data structure that organizes the entities of a scene in a graph, where objects are nodes and their relationships modeled as edges. We leverage inference on scene graphs as a way to carry out 3D scene understanding, mapping objects and their relationships. In particular, we propose a learned method that regresses a scene graph from the point cloud of a scene. Our novel architecture is based on PointNet and Graph Convolutional Networks (GCN). In addition, we introduce 3DSSG, a semi-automatically generated dataset, that contains semantically rich scene graphs of 3D scenes. We show the application of our method in a domain-agnostic retrieval task, where graphs serve as an intermediate representation for 3D-3D and 2D-3D matching.

Polynomial chaos expansion (PCE) is an increasingly popular technique for uncertainty propagation and quantification in systems and control. Based on the theory of Hilbert spaces and orthogonal polynomials, PCE allows for a unifying mathematical framework to study systems under arbitrary uncertainties of finite variance; we introduce this problem as a so-called mapping under uncertainty. For practical PCE-based applications we require orthogonal polynomials relative to given probability densities, and their quadrature rules. With PolyChaos we provide a Julia software package that delivers the desired functionality: given a probability density function, PolyChaos offers several numerical routines to construct the respective orthogonal polynomials, and the quadrature rules together with tensorized scalar products. PolyChaos is the first PCE-related software written in Julia, a scientific programming language that combines the readability of scripted languages with the speed of compiled languages. We provide illustrating numerical examples that show both PCE and PolyChaos in action.
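A minimal Python sketch of the quadrature side of PCE (PolyChaos itself is a Julia package): NumPy's probabilists' Hermite rule integrates against exp(-x^2/2), so rescaling the weights by sqrt(2*pi) turns the rule into an expectation under a standard normal density:

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

def gauss_normal(deg):
    """Quadrature rule for E[f(X)], X ~ N(0, 1), from the HermiteE rule."""
    nodes, weights = hermegauss(deg)
    return nodes, weights / np.sqrt(2.0 * np.pi)

nodes, weights = gauss_normal(5)
mean   = float(weights @ nodes)          # E[X]   = 0
second = float(weights @ nodes**2)       # E[X^2] = 1
```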

Context: Programmers frequently look for the code of previously solved problems that they can adapt for their own problem. Despite existing example code on the web, on sites like Stack Overflow, cryptographic Application Programming Interfaces (APIs) are commonly misused. Little is known about what makes examples helpful for developers in using crypto APIs. Analogical problem solving is a psychological theory that investigates how people use known solutions to solve new problems. There is evidence that the capacity to reason and solve novel problems, a.k.a. Fluid Intelligence (Gf), and structurally and procedurally similar solutions support problem solving. Aim: Our goal is to understand whether similarity and Gf also have an effect in the context of using cryptographic APIs with the help of code examples. Method: We conducted a controlled experiment with 76 student participants developing with or without procedurally similar examples, using one of two Java crypto libraries, and measured the participants' Gf as well as the effect on usability (effectiveness, efficiency, satisfaction) and security bugs. Results: We observed a strong effect of code examples with high procedural similarity on all dependent variables. Fluid intelligence (Gf) had no effect, and it also made no difference which library the participants used. Conclusions: To have a positive effect in a development task, example code must be highly similar to a concrete solution, rather than abstract and generic.

Topic models extract meaningful groups of words from documents, allowing for a better understanding of data. However, the solutions are often not coherent enough, and thus harder to interpret. Coherence can be improved by adding more contextual knowledge to the model. Recently, neural topic models have become available, while BERT-based representations have further pushed the state of the art of neural models in general. We combine pre-trained representations and neural topic models. Pre-trained BERT sentence embeddings indeed support the generation of more meaningful and coherent topics than either standard LDA or existing neural topic models. Results on four datasets show that our approach effectively increases topic coherence.

Private Set Intersection (PSI) is a vital cryptographic technique for securely computing the common elements of different sets. In PSI protocols, two parties typically wish to find their common set elements without disclosing their uncommon ones. In recent years, the cloud has played an influential role in PSI protocols, which often involve heavy computation. In 2017, Abadi et al. introduced a scheme named EO-PSI, which delegates the main computations to a cloud and does not include any public-key operations. In EO-PSI, parties need to set up secure channels beforehand; otherwise, an attacker can easily eavesdrop on communications between honest parties and learn private information. This paper presents an improved EO-PSI scheme that outperforms the previous scheme in terms of privacy and complexity. By presenting possible attacks on the prior scheme, we show the necessity of using secure channels between parties. Our proposed protocol, by contrast, is secure against passive attacks without requiring any secure channels. We measure the protocol's overhead and show that its computational complexity is considerably reduced, and also distributed more fairly, compared to the previous scheme.
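For intuition only, here is the functionality a PSI protocol computes, in a deliberately insecure hashed form. This is emphatically not the EO-PSI protocol: plain salted hashing is vulnerable to exactly the kind of dictionary and eavesdropping attacks that motivate real protocols.

```python
import hashlib

def h(element, salt=b"session-salt"):     # salt is illustrative only
    return hashlib.sha256(salt + element.encode()).hexdigest()

def naive_psi(set_a, set_b):
    """What PSI computes: the intersection, and nothing else.
    Hashing alone is NOT secure against dictionary attacks; real
    protocols need cryptographic machinery and, as the paper argues,
    protection against eavesdroppers on the channels."""
    hashed_b = {h(x) for x in set_b}
    return {x for x in set_a if h(x) in hashed_b}

common = naive_psi({"alice", "bob", "carol"}, {"bob", "carol", "dave"})
```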

In this paper, we consider the filtering problem for partially observed diffusions, which are regularly observed at discrete times. We are concerned with the case when one must resort to time-discretization of the diffusion process if the transition density is not available in an appropriate form. In such cases, one must resort to advanced numerical algorithms such as particle filters to consistently estimate the filter. It is also well known that the particle filter can be enhanced by considering hierarchies of discretizations and the multilevel Monte Carlo (MLMC) method, in the sense of reducing the computational effort to achieve a given mean square error (MSE). A variety of multilevel particle filters (MLPF) have been suggested in the literature, e.g., in Jasra et al., SIAM J. Numer. Anal., 55, 3068--3096. Here we introduce a new alternative that involves a resampling step based on the optimal Wasserstein coupling. We prove a central limit theorem (CLT) for the new method. On considering the asymptotic variance, we establish that in some scenarios, there is a reduction, relative to the approach in the aforementioned paper by Jasra et al., in computational effort to achieve a given MSE. These findings are confirmed in numerical examples. We also consider filtering diffusions with unstable dynamics; we empirically show that in such cases a change of measure technique seems to be required to maintain our findings.
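In one dimension, the optimal Wasserstein coupling of two equally weighted particle systems is obtained simply by sorting and pairing order statistics, which is the intuition behind a coupled resampling step. A toy sketch, with illustrative Gaussian samples:

```python
import numpy as np

def wasserstein_coupling_1d(x, y):
    """Optimal (Wasserstein) coupling of two equally weighted 1-D
    samples: sort both and pair order statistics. Resampling along
    this coupling keeps paired particles close, reducing variance."""
    pairs = np.empty((len(x), 2))
    pairs[:, 0] = np.sort(x)
    pairs[:, 1] = np.sort(y)
    return pairs

rng = np.random.default_rng(1)
x = rng.normal(0.0, 1.0, 1000)
y = rng.normal(0.1, 1.0, 1000)
coupled = wasserstein_coupling_1d(x, y)
cost = float(np.mean((coupled[:, 0] - coupled[:, 1]) ** 2))
# Compare against an arbitrary (unsorted) pairing of the same samples.
naive = float(np.mean((x - y) ** 2))
```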

The large size of today's online multimedia databases makes retrieving their content a difficult and time-consuming task. Users of online sound collections typically submit search queries that express a broad intent, often making the system return large and unmanageable result sets. Search Result Clustering is a technique that organises search-result content into coherent groups, which allows users to identify useful subsets in their results. Obtaining coherent and distinctive clusters that can be explored with a suitable interface is crucial for making this technique a useful complement of traditional search engines. In our work, we propose a graph-based approach using audio features for clustering diverse sound collections obtained when querying large online databases. We propose an approach to assess the performance of different features at scale, by taking advantage of the metadata associated with each sound. This analysis is complemented with an evaluation using ground-truth labels from manually annotated datasets. We show that using a confidence measure for discarding inconsistent clusters improves the quality of the partitions. After identifying the most appropriate features for clustering, we conduct an experiment with users performing a sound design task, in order to evaluate our approach and its user interface. A qualitative analysis is carried out including usability questionnaires and semi-structured interviews. This provides us with valuable new insights regarding the features that promote efficient interaction with the clusters.

In 3D human pose estimation one of the biggest problems is the lack of large, diverse datasets. This is especially true for multi-person 3D pose estimation, where, to our knowledge, there are only machine generated annotations available for training. To mitigate this issue, we introduce a network that can be trained with additional RGB-D images in a weakly supervised fashion. Due to the existence of cheap sensors, videos with depth maps are widely available, and our method can exploit a large, unannotated dataset. Our algorithm is a monocular, multi-person, absolute pose estimator. We evaluate the algorithm on several benchmarks, showing a consistent improvement in error rates. Also, our model achieves state-of-the-art results on the MuPoTS-3D dataset by a considerable margin.

Previous work on symmetric group equivariant neural networks generally only considered the case where the group acts by permuting the elements of a single vector. In this paper we derive formulae for general permutation equivariant layers, including the case where the layer acts on matrices by permuting their rows and columns simultaneously. This case arises naturally in graph learning and relation learning applications. As a specific case of higher order permutation equivariant networks, we present a second order graph variational encoder, and show that the latent distribution of equivariant generative models must be exchangeable. We demonstrate the efficacy of this architecture on the tasks of link prediction in citation graphs and molecular graph generation.
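A minimal first-order permutation equivariant layer (in the style of the single-vector permutation actions the paper generalizes) can be written and checked directly; the shapes and weights below are illustrative:

```python
import numpy as np

def equivariant_layer(x, w_self, w_pool):
    """First-order permutation equivariant layer: a per-element linear
    map plus a pooled (mean) term shared across all set elements."""
    return x @ w_self + np.mean(x, axis=0, keepdims=True) @ w_pool

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 3))          # 5 set elements, 3 features each
w_self = rng.normal(size=(3, 4))
w_pool = rng.normal(size=(3, 4))

# Equivariance check: permuting inputs permutes outputs identically.
perm = rng.permutation(5)
out_then_perm = equivariant_layer(x, w_self, w_pool)[perm]
perm_then_out = equivariant_layer(x[perm], w_self, w_pool)
```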

We propose learning discrete structured representations from unlabeled data by maximizing the mutual information between a structured latent variable and a target variable. Calculating mutual information is intractable in this setting. Our key technical contribution is an adversarial objective that can be used to tractably estimate mutual information assuming only the feasibility of cross entropy calculation. We develop a concrete realization of this general formulation with Markov distributions over binary encodings. We report critical and unexpected findings on practical aspects of the objective such as the choice of variational priors. We apply our model on document hashing and show that it outperforms current best baselines based on discrete and vector quantized variational autoencoders. It also yields highly compressed interpretable representations.

Graph convolutional networks (GCNs) have gained popularity due to high performance achievable on several downstream tasks including node classification. Several architectural variants of these networks have been proposed and investigated with experimental studies in the literature. Motivated by a recent work on simplifying GCNs, we study the problem of designing other variants and propose a framework to compose networks using building blocks of GCN. The framework offers flexibility to compose and evaluate different networks using feature and/or label propagation networks, linear or non-linear networks, with each composition having different computational complexity. We conduct a detailed experimental study on several benchmark datasets with many variants and present observations from our evaluation. Our empirical experimental results suggest that several newly composed variants are useful alternatives to consider because they are as competitive as, or better than the original GCN.
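A toy sketch of composing a network from GCN building blocks: k steps of normalized feature propagation followed by a linear map, with an optional non-linearity per hop (the linear, non-linear composition choice mentioned above). The graph and weights are illustrative:

```python
import numpy as np

def normalized_adjacency(adj):
    """Symmetric normalization with self-loops: D^{-1/2}(A+I)D^{-1/2}."""
    a_hat = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def compose(adj, x, k, weight, nonlinear=False):
    """Compose from building blocks: k feature-propagation steps, then a
    linear map; nonlinear=True inserts a ReLU after each hop."""
    s = normalized_adjacency(adj)
    h = x
    for _ in range(k):
        h = s @ h
        if nonlinear:
            h = np.maximum(h, 0.0)
    return h @ weight

adj = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], float)  # path graph
x = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])        # node features
w = np.eye(2)
out = compose(adj, x, k=2, weight=w)   # SGC-style linear variant
```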

We consider one-level additive Schwarz domain decomposition preconditioners for the Helmholtz equation with variable coefficients (modelling wave propagation in heterogeneous media), subject to boundary conditions that include wave scattering problems. Absorption is included as a parameter in the problem. This problem is discretised using $H^1$-conforming nodal finite elements of fixed local degree $p$ on meshes with diameter $h = h(k)$, chosen so that the error remains bounded with increasing $k$. The action of the one-level preconditioner consists of the parallel solution of problems on subdomains (which can be of general geometry), each equipped with an impedance boundary condition. We prove rigorous estimates on the norm and field of values of the left- or right-preconditioned matrix that show explicitly how the absorption, the heterogeneity in the coefficients and the dependence on the degree enter the estimates. These estimates prove rigorously that, with enough absorption and for $k$ large enough, GMRES is guaranteed to converge in a number of iterations that is independent of $k,p,$ and the coefficients. The theoretical threshold for $k$ to be large enough depends on $p$ and on the local variation of coefficients in subdomains (and not globally). Extensive numerical experiments are given for both the absorptive and the propagative cases; in the latter case we investigate examples both when the coefficients are nontrapping and when they are trapping. These experiments (i) support our theory in terms of dependence on polynomial degree and the coefficients; (ii) support the sharpness of our field of values estimates in terms of the level of absorption required.

Our society is built on a complex web of interdependencies whose effects become manifest during extraordinary events such as the COVID-19 pandemic, with shocks in one system propagating to the others to an exceptional extent. We analyzed more than 100 million Twitter messages posted worldwide in 64 languages during the epidemic emergency due to SARS-CoV-2 and classified the reliability of the news diffused. We found that waves of unreliable and low-quality information precede the epidemic waves, exposing entire countries to irrational social behavior and serious threats to public health. When the epidemic hits the same area, reliable information is quickly inoculated, like antibodies, and the system shifts focus towards certified informational sources. Contrary to mainstream beliefs, we show that human response to falsehood exhibits early-warning signals that might be mitigated with adequate communication strategies.

Due to unreliable individual commodity components, failures are common in large-scale distributed storage systems. Erasure codes are widely deployed in practical storage systems to provide fault tolerance with low storage overhead. However, random data distribution (RDD), commonly used in erasure-coded storage systems, induces heavy cross-rack traffic, load imbalance, and random access, which adversely affect failure recovery. In this paper, with orthogonal arrays, we define a Deterministic Data Distribution ($D^3$) to uniformly distribute data/parity blocks among nodes, and propose an efficient failure recovery approach based on $D^3$, which minimizes the cross-rack repair traffic against a single node failure. Thanks to the uniformity of $D^3$, the proposed recovery approach balances the repair traffic not only among nodes within a rack but also among racks. We implement $D^3$ over Reed-Solomon codes and Locally Repairable Codes in Hadoop Distributed File System (HDFS) with a cluster of 28 machines. Compared with RDD, our experiments show that $D^3$ significantly speeds up the failure recovery, up to 2.49 times for RS codes and 1.38 times for LRCs. Moreover, $D^3$ supports front-end applications better than RDD in both normal and recovery states.

Learning how to adapt to complex and dynamic environments is one of the most important factors that contribute to our intelligence. Endowing artificial agents with this ability is not a simple task, particularly in competitive scenarios. In this paper, we present a broad study on how popular reinforcement learning algorithms can be adapted and implemented to learn and to play a real-world implementation of a competitive multiplayer card game. We propose specific training and validation routines for the learning agents, in order to evaluate how the agents learn to be competitive and explain how they adapt to each other's playing style. Finally, we pinpoint how the behavior of each agent derives from their learning style and create a baseline for future research on this scenario.

There are several approaches for improving neural machine translation for low-resource languages: Monolingual data can be exploited via pretraining or data augmentation; Parallel corpora on related language pairs can be used via parameter sharing or transfer learning in multilingual models; Subword segmentation and regularization techniques can be applied to ensure high coverage of the vocabulary. We review these approaches in the context of an asymmetric-resource one-to-many translation task, in which the pair of target languages are related, with one being a very low-resource and the other a higher-resource language. We test various methods on three artificially restricted translation tasks---English to Estonian (low-resource) and Finnish (high-resource), English to Slovak and Czech, English to Danish and Swedish---and one real-world task, Norwegian to North S\'ami and Finnish. The experiments show positive effects especially for scheduled multi-task learning, denoising autoencoder, and subword sampling.

Given a social network with nonuniform selection cost of the users, the problem of \textit{Budgeted Influence Maximization} (BIM in short) asks for selecting a subset of the nodes within an allocated budget for initial activation, such that due to the cascading effect, influence in the network is maximized. In this paper, we study this problem with a variation, where a set of nodes are designated as target nodes, each of them is assigned a benefit value that can be earned by influencing them, and our goal is to maximize the earned benefit by initially activating a set of nodes within the budget. We call this problem the \textsc{Earned Benefit Maximization Problem}. First, we show that this problem is NP\mbox{-}Hard and the benefit function is \textit{monotone}, \textit{sub\mbox{-}modular} under the \textit{Independent Cascade Model} of diffusion. We propose an incremental greedy strategy for this problem and show that, with a minor modification, it gives a $(1-\frac{1}{\sqrt{e}})$\mbox{-}factor approximation guarantee on the earned benefit. Next, by exploiting the sub\mbox{-}modularity property of the benefit function, we improve the efficiency of the proposed greedy algorithm. Then, we propose a hop\mbox{-}based heuristic method, which works based on the computation of the `expected earned benefit' of the effective neighbors corresponding to the target nodes. Finally, we perform a series of extensive experiments with four real\mbox{-}life, publicly available social network datasets. From the experiments, we observe that the seed sets selected by the proposed algorithms can achieve more benefit compared to many existing methods. Particularly, the hop\mbox{-}based approach is found to be more efficient than the other ones for solving this problem.
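An incremental greedy strategy of this kind (without the paper's efficiency improvements) can be sketched on a toy coverage-style benefit function; the marginal-benefit-to-cost selection rule shown here is one common variant for budgeted submodular maximization:

```python
def greedy_benefit(candidates, cost, benefit, budget):
    """Cost-aware incremental greedy: repeatedly pick the affordable
    node with the best marginal-benefit-to-cost ratio. `benefit` maps a
    frozenset of seeds to earned benefit (monotone submodular)."""
    seeds, spent = set(), 0.0
    while True:
        best, best_ratio = None, 0.0
        for v in candidates - seeds:
            if spent + cost[v] > budget:
                continue
            gain = benefit(frozenset(seeds | {v})) - benefit(frozenset(seeds))
            ratio = gain / cost[v]
            if ratio > best_ratio:
                best, best_ratio = v, ratio
        if best is None:
            break
        seeds.add(best)
        spent += cost[best]
    return seeds

# Toy coverage instance: seeding a node "influences" its target set.
targets = {"a": {1, 2}, "b": {2, 3}, "c": {4}}
cost = {"a": 1.0, "b": 1.0, "c": 2.0}
benefit = lambda s: float(len(set().union(*(targets[v] for v in s)))) if s else 0.0
seeds = greedy_benefit({"a", "b", "c"}, cost, benefit, budget=2.0)
```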

The signature in rough path theory provides a graduated summary of a path through an examination of the effects of its increments. Inspired by recent developments of signature features in the context of machine learning, we explore a transformation that is able to embed the effect of the absolute position of the data stream into signature features. This unified feature is particularly effective for its simplifying role in allowing the signature feature set to accommodate nonlinear functions of absolute and relative values.
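A depth-2 signature of a piecewise-linear path can be computed directly from its increments. As a sanity check, level one is the total displacement and the symmetrized level two satisfies the shuffle identity S^{ij} + S^{ji} = S^i S^j; the path below is illustrative:

```python
import numpy as np

def signature_level2(path):
    """Depth-2 signature of a piecewise-linear path (n points, d dims):
    level 1 is the total increment; level 2 collects the iterated
    integrals S[i, j], accumulated increment by increment."""
    inc = np.diff(path, axis=0)        # stepwise increments
    d = path.shape[1]
    s1 = np.zeros(d)
    s2 = np.zeros((d, d))
    for dx in inc:
        s2 += np.outer(s1, dx) + 0.5 * np.outer(dx, dx)
        s1 += dx
    return s1, s2

path = np.array([[0.0, 0.0], [1.0, 0.5], [2.0, 2.0], [1.5, 3.0]])
s1, s2 = signature_level2(path)
```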

Large pre-trained contextual word representations have transformed the field of natural language processing, obtaining impressive results on a wide range of tasks. However, as models increase in size, computational limitations make them impractical for researchers and practitioners alike. We hypothesize that contextual representations have both intrinsic and task-specific redundancies. We propose a novel feature selection method, which takes advantage of these redundancies to reduce the size of the pre-trained features. In a comprehensive evaluation on two pre-trained models, BERT and XLNet, using a diverse suite of sequence labeling and sequence classification tasks, our method reduces the feature set down to 1--7% of the original size, while maintaining more than 97% of the performance.

Speaker verification systems usually suffer from mismatch between training and evaluation data, such as speaker population mismatch and channel and environment variations. Addressing this issue requires the system to have good generalization ability on unseen data. In this work, we incorporate Bayesian neural networks (BNNs) into the deep neural network (DNN) x-vector speaker verification system to improve the system's generalization ability. With the weight uncertainty modeling provided by BNNs, we expect the system to generalize better on the evaluation data and make verification decisions more accurately. Our experiment results indicate that the DNN x-vector system could benefit from BNNs especially when the mismatch problem is severe for evaluations using out-of-domain data. Specifically, results show that the system could benefit from BNNs by a relative EER decrease of 2.66% and 2.32% respectively for short- and long-utterance in-domain evaluations. Additionally, the fusion of DNN x-vector and Bayesian x-vector systems could achieve further improvement. Moreover, experiments conducted with out-of-domain evaluations, e.g. models trained on Voxceleb1 while evaluated on the NIST SRE10 core test, suggest that BNNs could bring a larger relative EER decrease of around 4.69%.

We present a timely and novel methodology that combines disease estimates from mechanistic models with digital traces, via interpretable machine-learning methodologies, to reliably forecast COVID-19 activity in Chinese provinces in real-time. Specifically, our method is able to produce stable and accurate forecasts 2 days ahead of current time, and uses as inputs (a) official health reports from the Chinese Center for Disease Control and Prevention (China CDC), (b) COVID-19-related internet search activity from Baidu, (c) news media activity reported by Media Cloud, and (d) daily forecasts of COVID-19 activity from GLEAM, an agent-based mechanistic model. Our machine-learning methodology uses a clustering technique that enables the exploitation of geo-spatial synchronicities of COVID-19 activity across Chinese provinces, and a data augmentation technique to deal with the small number of historical disease activity observations, characteristic of emerging outbreaks. Our model's predictive power outperforms a collection of baseline models in 27 out of the 32 Chinese provinces, and could be easily extended to other geographies currently affected by the COVID-19 outbreak to help decision makers.

Early quantification of the effects of the Tuta absoluta pest on tomato plants is a very important factor in controlling and preventing serious damage from the pest. The invasion of Tuta absoluta is considered a major threat to tomato production, causing heavy losses ranging from 80 to 100 percent when not properly managed. Therefore, real-time and early quantification of the tomato leaf miner Tuta absoluta can play an important role in addressing the issue of pest management and enhance farmers' decisions. In this study, we propose a Convolutional Neural Network (CNN) approach to determining the effects of Tuta absoluta on tomato plants. Four pre-trained CNN architectures (VGG16, VGG19, ResNet and Inception-V3) were used to train classifiers on a dataset containing healthy and infested tomato leaves collected from real field experiments. Among the pre-trained architectures, experimental results showed that Inception-V3 yielded the best results, with an average accuracy of 87.2 percent in estimating the severity status of Tuta absoluta in tomato plants. The pre-trained models could also identify the High Tuta severity status more easily than the other severity statuses (Low Tuta and No Tuta).

By 2030, we expect to witness 6G mobile communication technology, which will enable the Internet of Everything. Even though 5G has yet to be experienced by most people and B5G has yet to be developed, researchers have already started planning, envisioning, and gathering requirements for 6G. 6G promises to take everyone into a different era of technology, connecting every smart device to the Internet, from smartphones to intelligent vehicles. 6G will provide sophisticated services with high QoS, such as holographic communication, augmented reality/virtual reality, and many more, and will also focus on Quality of Experience (QoE) to deliver rich user experiences. Notably, it is very important to envision the issues and challenges of 6G technology; otherwise, its promises may not be delivered on time. The requirements of 6G pose new challenges to the research community, and to achieve the desired parameters, researchers are exploring various alternatives. Hence, there are diverse research challenges to envision, from devices to softwarization. In this article, we therefore discuss the future issues and challenges to be faced by 6G technology, covering all aspects from hardware to the enabling technologies that 6G will utilize.

Varying power-infeed from converter-based generation units introduces great uncertainty on system parameters such as inertia and damping. As a consequence, system operators face increasing challenges in performing dynamic security assessment and taking real-time control actions. Exploiting the widespread deployment of phasor measurement units (PMUs) and aiming at developing a fast dynamic state and parameter estimation tool, this paper investigates the performance of Physics-Informed Neural Networks (PINN) for discovering the frequency dynamics of future power systems and monitoring the system inertia in real-time. PINNs have the potential to address challenges such as the stronger non-linearities of low-inertia systems, increased measurement noise, and limited availability of data. The estimator is demonstrated in several test cases using a 4-bus system, and compared with state of the art algorithms, such as the Unscented Kalman Filter (UKF), to assess its performance.

Public transportation commuters are often interested in getting accurate travel time information to plan their daily activities. However, this information is often difficult to predict accurately due to the irregularities of road traffic, caused by factors such as weather conditions, road accidents, and traffic jams. In this study, two neural network models, namely a multi-layer perceptron (MLP) and a long short-term memory (LSTM) network, are developed for predicting the link travel time of a busy route, with inputs generated from an Origin-Destination travel time matrix derived from a historical GPS dataset. The experimental results showed that both models can make near-accurate predictions; however, the LSTM is more susceptible to noise as the time step increases.
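A scaled-down version of the MLP variant can be sketched in plain numpy. The model below is our toy, not the paper's network: the synthetic travel-time series, layer sizes, and learning rate are all invented. It predicts the next link travel time from the previous three.

```python
import numpy as np

# Toy single-hidden-layer MLP for next-step link travel time prediction.
# Everything here (synthetic series, sizes, learning rate) is invented.
rng = np.random.default_rng(0)
t = np.arange(200)
series = 10 + 2 * np.sin(0.2 * t) + rng.normal(0, 0.1, 200)   # minutes

X = np.stack([series[i:i + 3] for i in range(197)])  # last three times
y = series[3:]                                       # next travel time

W1 = rng.normal(0, 0.1, (3, 16)); b1 = np.zeros(16)
W2 = rng.normal(0, 0.1, (16, 1)); b2 = np.zeros(1)

losses = []
for _ in range(1000):                     # plain batch gradient descent
    h = np.tanh(X @ W1 + b1)              # hidden activations
    err = (h @ W2 + b2).ravel() - y
    losses.append(np.mean(err ** 2))
    gW2 = h.T @ err[:, None] / len(y); gb2 = err.mean(keepdims=True)
    dh = (err[:, None] @ W2.T) * (1 - h ** 2)   # backprop through tanh
    gW1 = X.T @ dh / len(y); gb1 = dh.mean(axis=0)
    W1 -= 0.02 * gW1; b1 -= 0.02 * gb1
    W2 -= 0.02 * gW2; b2 -= 0.02 * gb2
```

The training loss falls well below its initial value; an LSTM replaces the fixed lag window with a learned recurrent state.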

The purpose of this paper is to explore the question "to what extent could we produce formal, machine-verifiable, proofs in real algebraic geometry?" The question has been asked before, but as yet the leading algorithms for answering such questions have not been formalised. We present a thesis that a new algorithm for ascertaining the satisfiability of formulae over the reals via Cylindrical Algebraic Coverings [\'{A}brah\'{a}m, Davenport, England, Kremer, \emph{Deciding the Consistency of Non-Linear Real Arithmetic Constraints with a Conflict Driven Search Using Cylindrical Algebraic Coverings}, 2020] might produce traces and outputs that make its results more amenable to machine verification than those of competing algorithms.

Pre-trained language models like BERT and RoBERTa, though powerful in many natural language processing tasks, are expensive in both computation and memory. To alleviate this problem, one approach is to compress them for specific tasks before deployment. However, recent works on BERT compression usually reduce the large BERT model to a fixed smaller size and cannot fully satisfy the requirements of different edge devices with various hardware capabilities. In this paper, we propose a novel dynamic BERT model (abbreviated as DynaBERT), which can run at adaptive width and depth. The training process of DynaBERT first trains a width-adaptive BERT and then allows both adaptive width and depth, by distilling knowledge from the full-sized model to small sub-networks. Network rewiring is also used to keep the more important attention heads and neurons shared by more sub-networks. Comprehensive experiments under various efficiency constraints demonstrate that our proposed dynamic BERT (or RoBERTa) at its largest size has comparable performance to BERT (or RoBERTa), while at smaller widths and depths it consistently outperforms existing BERT compression methods.
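The width-adaptive idea can be illustrated on a toy feed-forward block. The sketch below is our simplification, with invented names and sizes: running the layer at half width just slices the weight matrices, assuming rewiring has already placed the most important hidden units first.

```python
import numpy as np

# Hypothetical sketch of width adaptivity (names and sizes are invented):
# a full feed-forward block is run at a fraction of its hidden width by
# keeping only the first k hidden units, which rewiring is assumed to
# have made the most important ones.
rng = np.random.default_rng(0)
d_model, d_hidden = 8, 32
W1 = rng.normal(size=(d_model, d_hidden))   # full-width weights
W2 = rng.normal(size=(d_hidden, d_model))

def ffn(x, width_mult=1.0):
    """Run the feed-forward block at a fraction of its hidden width."""
    k = int(d_hidden * width_mult)           # number of hidden units kept
    h = np.maximum(x @ W1[:, :k], 0.0)       # ReLU over the sliced weights
    return h @ W2[:k, :]

x = rng.normal(size=(4, d_model))
full, half = ffn(x, 1.0), ffn(x, 0.5)        # same output shape, half the FLOPs
```

Distillation then trains the sub-network outputs (like `half`) to mimic the full-width teacher, so one model serves devices with different budgets.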

Singing voice detection is the task of identifying whether frames of an audio signal contain singer vocals. It has been one of the main components in music information retrieval (MIR) and is applicable to melody extraction, artist recognition, and music discovery in popular music. Although several methods have been proposed, a more robust and complete system is desired to improve detection performance. In this paper, our motivation is to provide an extensive comparison of the different stages of singing voice detection. Based on this analysis, a novel method is proposed to build a more efficient singing voice detection system. The proposed system has three main parts. The first is a singing voice separation pre-process that extracts the vocal without the music; several singing voice separation methods were compared to decide which one to integrate into the detection system. The second is a deep neural network based classifier that labels the given frames; different deep models for classification were also compared. The last is a post-process that filters out anomalous frames from the classifier's predictions; a median filter and a Hidden Markov Model (HMM) based filter were compared as post-processes. Through this step-by-step module extension, the different methods were compared and analyzed. Finally, classification performance on two public datasets indicates that the proposed approach, which is based on the Long-term Recurrent Convolutional Network (LRCN) model, is a promising alternative.
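The post-processing stage can be illustrated with the simpler of the two compared filters. The sketch below is a generic median filter over binary frame decisions; the window width is our invented choice, not the paper's setting.

```python
# A minimal sketch (details assumed) of the post-processing step: a median
# filter that removes isolated anomalous frames from the classifier's
# per-frame vocal/non-vocal decisions.
def median_filter(frames, width=3):
    """Median-smooth a binary frame sequence; width should be odd."""
    half = width // 2
    out = []
    for i in range(len(frames)):
        window = frames[max(0, i - half): i + half + 1]
        # majority vote within the window == median of a binary sequence
        out.append(1 if sum(window) * 2 > len(window) else 0)
    return out

preds = [0, 0, 1, 0, 0, 1, 1, 1, 0, 1, 1]   # raw frame decisions
smoothed = median_filter(preds)              # isolated flips removed
```

The isolated positive at frame 2 is suppressed and the single gap inside the vocal region is filled; an HMM-based filter generalizes this by also modelling transition probabilities.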

In recent years, it has become crucial to improve the resilience of electricity distribution networks (DNs) against storm-induced failures. Microgrids enabled by Distributed Energy Resources (DERs) can significantly help speed up re-energization of loads, particularly in the complete absence of bulk power supply. We describe an integrated approach which considers a pre-storm DER allocation problem under the uncertainty of failure scenarios as well as a post-storm dispatch problem in microgrids during the multi-period repair of the failed components. This problem is computationally challenging because the number of scenarios (resp. binary variables) increases exponentially (resp. quadratically) in the network size. Our overall solution approach for solving the resulting two-stage mixed-integer linear program (MILP) involves implementing the sample average approximation (SAA) method and Benders Decomposition. Additionally, we implement a greedy approach to reduce the computational time requirements of the post-storm repair scheduling and dispatch problem. The optimality of the resulting solution is evaluated on a modified IEEE 36-node network.

The outbreak of the COVID-19 pandemic has changed our lives in unprecedented ways. In the face of the projected catastrophic consequences, many countries have enacted social distancing measures in an attempt to limit the spread of the virus. Under these conditions, the Web has become an indispensable medium for information acquisition, communication, and entertainment. At the same time, unfortunately, the Web is being exploited for the dissemination of potentially harmful and disturbing content, such as the spread of conspiracy theories and hateful speech towards specific ethnic groups, in particular towards Chinese people since COVID-19 is believed to have originated from China.

In this paper, we make a first attempt to study the emergence of Sinophobic behavior on the Web during the outbreak of the COVID-19 pandemic. We collect two large-scale datasets from Twitter and 4chan's Politically Incorrect board (/pol/) over a time period of approximately five months and analyze them to investigate whether there is a rise in, or important differences with regard to, the dissemination of Sinophobic content. We find that COVID-19 indeed drives the rise of Sinophobia on the Web and that the dissemination of Sinophobic content is a cross-platform phenomenon: it exists on fringe Web communities like /pol/, and to a lesser extent on mainstream ones like Twitter. Also, using word embeddings over time, we characterize the evolution and emergence of new Sinophobic slurs on both Twitter and /pol/. Finally, we find interesting differences in the context in which words related to Chinese people are used on the Web before and after the COVID-19 outbreak: on Twitter we observe a shift towards blaming China for the situation, while on /pol/ we find a shift towards using more (and new) Sinophobic slurs.

We present an analysis of semi-supervised acoustic and language model training for English-isiZulu code-switched ASR using soap opera speech. Approximately 11 hours of untranscribed multilingual speech was transcribed automatically using four bilingual code-switching transcription systems operating in English-isiZulu, English-isiXhosa, English-Setswana and English-Sesotho. These transcriptions were incorporated into the acoustic and language model training sets. Results showed that the TDNN-F acoustic models benefit from the additional semi-supervised data and that even better performance could be achieved by including additional CNN layers. Using these CNN-TDNN-F acoustic models, a first iteration of semi-supervised training achieved an absolute mixed-language WER reduction of 3.4%, and a further 2.2% after a second iteration. Although the languages in the untranscribed data were unknown, the best results were obtained when all automatically transcribed data was used for training and not just the utterances classified as English-isiZulu. Despite reducing perplexity, the semi-supervised language model was not able to improve the ASR performance.

Contact tracing is being widely employed to combat the spread of COVID-19. Many apps have been developed that allow tracing to be done automatically based on location and interaction data generated by users. There are concerns, however, regarding the privacy and security of users' data when using these apps. These concerns are paramount for users who contract the virus, as they are generally required to release all their data. Motivated by the need to protect users' privacy, we propose two solutions to this problem. Our first solution builds on current "message based" methods and our second leverages ideas from secret sharing and additively homomorphic encryption.
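To make the secret-sharing ingredient concrete, the sketch below shows plain additive secret sharing over a prime field. This is a generic textbook construction, not the paper's protocol; the modulus and party count are arbitrary choices of ours.

```python
import random

# A minimal sketch of additive secret sharing (not the paper's protocol):
# a value is split into random shares modulo a prime, so no single party
# learns it, yet the shares sum back to the secret. Sums of shares also
# reconstruct sums of secrets, the additive-homomorphic property.
P = 2**61 - 1  # a Mersenne prime chosen as the modulus (an assumption)

def share(secret, n_parties=3):
    """Split `secret` into n additive shares mod P."""
    shares = [random.randrange(P) for _ in range(n_parties - 1)]
    shares.append((secret - sum(shares)) % P)   # last share fixes the sum
    return shares

def reconstruct(shares):
    return sum(shares) % P

shares = share(123456)
value = reconstruct(shares)
```

Because shares add homomorphically, parties can aggregate contact counts without any of them seeing an individual's raw data, which is the spirit of the second solution.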

Recent attempts to ingest external knowledge into neural models for named-entity recognition (NER) have exhibited mixed results. In this work, we present GazSelfAttn, a novel gazetteer embedding approach that uses self-attention and match span encoding to build enhanced gazetteer embeddings. In addition, we demonstrate how to build gazetteer resources from the open-source Wikidata knowledge base. Evaluations on the CoNLL-03 and OntoNotes 5 datasets show F1 improvements over the baseline model from 92.34 to 92.86 and from 89.11 to 89.32, respectively, achieving performance comparable to large state-of-the-art models.
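Gazetteer matching with span encoding can be illustrated with a toy longest-match tagger. The BIOES-style labels below are a common convention for span encoding; GazSelfAttn's exact scheme may differ, and the gazetteer here is invented.

```python
# A hypothetical sketch of gazetteer matching with span encoding (the
# paper's exact scheme is not reproduced): longest matches against a toy
# gazetteer are tagged with BIOES-style span labels.
def gazetteer_spans(tokens, gazetteer):
    tags = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        # try the longest match starting at position i
        for j in range(len(tokens), i, -1):
            if " ".join(tokens[i:j]).lower() in gazetteer:
                if j - i == 1:
                    tags[i] = "S"                  # single-token match
                else:
                    tags[i] = "B"                  # begin of span
                    for k in range(i + 1, j - 1):
                        tags[k] = "I"              # inside of span
                    tags[j - 1] = "E"              # end of span
                i = j
                break
        else:
            i += 1
    return tags

gaz = {"new york", "london"}
tags = gazetteer_spans("She moved from New York to London".split(), gaz)
```

These match-span tags would then be embedded and combined with the token representations via self-attention.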

Convolutional neural networks (CNNs) give state-of-the-art performance in many pattern recognition problems, but can be fooled by carefully crafted patterns of noise. We report that CNN face recognition systems also make surprising "errors". We tested six commercial face recognition CNNs and found that they outperform typical human participants on standard face matching tasks. However, they also declare matches that humans would not, where one image of the pair has been transformed to appear to be a different sex or race. This is not due to poor performance; the best CNNs perform almost perfectly on the human face matching tasks, yet also declare the most matches for faces of a different apparent race or sex. Although differing on the salience of sex and race, humans and computer systems are not working in completely different ways: they tend to find the same pairs of images difficult, suggesting some agreement about the underlying similarity space.

Existing algorithms for aligning cross-lingual word vector spaces assume that vector spaces are approximately isomorphic. As a result, they perform poorly or fail completely on non-isomorphic spaces. Such non-isomorphism has been hypothesised to result almost exclusively from typological differences between languages. In this work, we ask whether non-isomorphism is also crucially a sign of degenerate word vector spaces. We present a series of experiments across diverse languages which show that, besides inherent typological differences, variance in performance across language pairs can largely be attributed to the size of the monolingual resources available, and to the properties and duration of monolingual training (e.g. "under-training").

We develop a novel Multilevel Asymptotic-Preserving Monte Carlo (ML-APMC) method for simulating the kinetic Boltzmann transport equation with Bhatnagar-Gross-Krook (BGK) collision operator. This equation occurs, for instance, in mathematical models of the neutral particles in the plasma edge of nuclear fusion reactors. The main features of our method are a new and improved recipe for correlating particle trajectories with different time step sizes, and a new and more general level selection strategy. We illustrate the efficiency of our ML-APMC method by applying it to a one-dimensional fusion test case with nonhomogeneous and anisotropic plasma background. Our method yields significant speedups, both in the low and high collisional regime. In the high-collisional case, our ML-APMC outperforms the single-level APMC method by several orders of magnitude.
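The core multilevel idea, correlating trajectories simulated at different time step sizes, can be shown on a toy problem far simpler than the Boltzmann-BGK setting. The sketch below applies the standard MLMC telescoping sum to geometric Brownian motion with Euler time stepping, sharing Brownian increments between fine and coarse paths; all parameters are invented and nothing here reproduces the paper's correlation recipe.

```python
import numpy as np

# Generic multilevel Monte Carlo sketch (toy GBM, not the Boltzmann-BGK
# setting): coarse and fine Euler paths share the same Brownian
# increments, so the level correction E[P_l - P_{l-1}] has low variance
# and the telescoping sum estimates E[P_L] cheaply.
rng = np.random.default_rng(1)
S0, mu, sigma, T = 1.0, 0.05, 0.2, 1.0

def euler_paths(n_samples, n_steps, dW):
    dt = T / n_steps
    S = np.full(n_samples, S0)
    for k in range(n_steps):            # dW[:, k] are N(0, dt) increments
        S = S * (1.0 + mu * dt + sigma * dW[:, k])
    return S

def mlmc_mean(n_samples=20000, levels=4):
    est = 0.0
    for l in range(levels + 1):
        nf = 2 ** l                                  # fine steps at level l
        dWf = rng.normal(0.0, np.sqrt(T / nf), size=(n_samples, nf))
        Pf = euler_paths(n_samples, nf, dWf)
        if l == 0:
            est += Pf.mean()
        else:
            # coarse path reuses summed fine increments (same Brownian path)
            dWc = dWf.reshape(n_samples, nf // 2, 2).sum(axis=2)
            Pc = euler_paths(n_samples, nf // 2, dWc)
            est += (Pf - Pc).mean()
        n_samples //= 2                              # fewer samples per level
    return est

estimate = mlmc_mean()   # analytic value is S0 * exp(mu * T) ~ 1.0513
```

Because the corrections shrink with the level, most samples are spent on the cheap coarse level, which is the source of the speedups the abstract reports in the kinetic setting.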

In this paper, robust deep learning frameworks are introduced that aim to detect respiratory diseases from respiratory sound inputs. The process begins with front-end feature extraction that transforms recordings into spectrograms. Next, a back-end deep learning model classifies the spectrogram features into categories of respiratory disease or anomaly. Experiments are conducted on the ICBHI benchmark dataset of respiratory sounds. Based on the experimental results, we make three main contributions to lung-sound analysis: firstly, we provide an extensive analysis of common factors (type of spectrogram, time resolution, cycle length, data augmentation, etc.) that affect the final prediction accuracy of a deep learning based system. Secondly, we propose novel deep learning based frameworks that use the most influential factors identified; the proposed frameworks outperform state-of-the-art methods. Finally, we successfully apply a Teacher-Student scheme to resolve the trade-off between model performance and model size, which helps to enable real-time applications.

The ability to learn in dynamic, nonstationary environments without forgetting previous knowledge, also known as Continual Learning (CL), is a key enabler for scalable and trustworthy deployments of adaptive solutions. While the importance of continual learning is largely acknowledged in machine vision and reinforcement learning problems, this is mostly under-documented for sequence processing tasks. This work proposes a Recurrent Neural Network (RNN) model for CL that is able to deal with concept drift in input distribution without forgetting previously acquired knowledge. We also implement and test a popular CL approach, Elastic Weight Consolidation (EWC), on top of two different types of RNNs. Finally, we compare the performances of our enhanced architecture against EWC and RNNs on a set of standard CL benchmarks, adapted to the sequential data processing scenario. Results show the superior performance of our architecture and highlight the need for special solutions designed to address CL in RNNs.
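For reference, the EWC regularizer mentioned above can be sketched in a few lines. The function below shows only the penalty term; the Fisher information estimate and the RNN itself are out of scope, and all numbers are invented.

```python
import numpy as np

# Minimal numpy sketch of the EWC penalty (the paper applies EWC to RNNs;
# this is just the regularizer): the loss on a new task is augmented with
# a quadratic pull toward the old task's weights theta_star, scaled
# per-parameter by a Fisher information estimate F.
def ewc_loss(task_loss, theta, theta_star, fisher, lam=10.0):
    penalty = 0.5 * lam * np.sum(fisher * (theta - theta_star) ** 2)
    return task_loss + penalty

theta_star = np.array([1.0, -2.0, 0.5])     # weights after the old task
fisher = np.array([5.0, 0.1, 0.0])          # importance of each weight
theta = np.array([1.2, 0.0, 3.0])           # candidate weights, new task

loss = ewc_loss(task_loss=0.3, theta=theta, theta_star=theta_star,
                fisher=fisher)
```

Weights with high Fisher values are expensive to move, so the old task is preserved; weights the old task never relied on (Fisher near zero) remain free to adapt.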

Neuromorphic event-based dynamic vision sensors (DVS) have much faster sampling rates and a higher dynamic range than frame-based imagers. However, they are sensitive to unwanted background activity (BA) events. We propose a new criterion, with little computational overhead, for distinguishing real events from BA events by utilizing global space and time information rather than the local information obtained by Gaussian convolution; the criterion can also be used as a filter, which we denote GF. We demonstrate GF on three datasets, each recorded by a different DVS with a different output size. The experimental results show that our filter produces the clearest frames compared with baseline filters and runs fast.
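The underlying intuition, that real events have spatiotemporal support while BA events are isolated, can be sketched with a naive neighbourhood filter. This is our simplification, not the paper's GF criterion; the radius and time thresholds are invented.

```python
# A simplified neighbourhood-support filter in the spirit of BA-event
# removal (the paper's GF criterion is not reproduced): an event is kept
# only if another event occurred near it in space and time, since
# isolated events are likely background-activity noise.
def filter_events(events, radius=1, dt_max=1000):
    """events: list of (x, y, timestamp_us); returns the kept events."""
    kept = []
    for i, (x, y, t) in enumerate(events):
        for j, (x2, y2, t2) in enumerate(events):
            if i != j and abs(x - x2) <= radius and abs(y - y2) <= radius \
                    and abs(t - t2) <= dt_max:
                kept.append((x, y, t))      # supported by a neighbour
                break
    return kept

events = [(10, 10, 0), (11, 10, 200), (50, 3, 500), (10, 11, 900)]
real = filter_events(events)                # the isolated event is dropped
```

The paper's contribution is precisely to replace this local, quadratic-cost test with a cheap criterion based on global space-time statistics.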

This paper introduces the first dataset of satellite images labeled with forage quality by on-the-ground experts and provides proof of concept for applying computer vision methods to index-based drought insurance. We also present the results of a collaborative benchmark tool used to crowdsource an accurate machine learning model on the dataset. Our methods significantly outperform the existing technology for an insurance program in Northern Kenya, suggesting that a computer vision-based approach could substantially benefit pastoralists, whose exposure to droughts is severe and worsening with climate change.

Pose estimation and map building are central ingredients of autonomous robots and typically rely on the registration of sensor data. In this paper, we investigate a new metric for registering images that builds upon the idea of the photometric error. Our approach combines a gradient orientation-based metric with a magnitude-dependent scaling term. We integrate both into stereo estimation as well as visual odometry systems and show clear benefits for typical disparity and direct image registration tasks when using our proposed metric. Our experimental evaluation indicates that our metric leads to more robust and more accurate estimates of the scene depth as well as the camera trajectory. Thus, the metric improves camera pose estimation and, in turn, the mapping capabilities of mobile robots. We believe that a series of existing visual odometry and visual SLAM systems can benefit from the findings reported in this paper.

Point cloud analysis has received much attention recently, and segmentation is one of its most important tasks. The success of existing approaches is attributed to deep network design and a large amount of labelled training data, where the latter is assumed to be always available. However, obtaining 3D point cloud segmentation labels is often very costly in practice. In this work, we propose a weakly supervised point cloud segmentation approach which requires only a tiny fraction of points to be labelled in the training stage. This is made possible by learning gradient approximation and by exploiting additional spatial and color smoothness constraints. Experiments are done on three public datasets with different degrees of weak supervision. In particular, our proposed method can produce results that are close to, and sometimes even better than, its fully supervised counterpart with 10$\times$ fewer labels.

When trained effectively, the Variational Autoencoder (VAE) can be both a powerful generative model and an effective representation learning framework for natural language. In this paper, we propose the first large-scale language VAE model, Optimus. A universal latent embedding space for sentences is first pre-trained on a large text corpus, and then fine-tuned for various language generation and understanding tasks. Compared with GPT-2, Optimus enables guided language generation at an abstract level using the latent vectors. Compared with BERT, Optimus can generalize better on low-resource language understanding tasks due to the smooth latent space structure. Extensive experimental results on a wide range of language tasks demonstrate the effectiveness of Optimus. It achieves new state-of-the-art results on VAE language modeling benchmarks. We hope that our first pre-trained big VAE language model and its results can help the NLP community renew interest in deep generative models in the era of large-scale pre-training, and make these principled methods more practical.

The recent advances in deep learning indicate significant progress in the field of single image super-resolution. With the advent of these techniques, high-resolution images with a high peak signal-to-noise ratio (PSNR) and excellent perceptual quality can be reconstructed. The major challenges associated with existing deep convolutional neural networks are their computational complexity and runtime; the increasing depth of the networks often results in high space complexity. To alleviate these issues, we developed an innovative shallow residual feature representative network (SRFRN) that uses a bicubic interpolated low-resolution image as input and residual feature representative (RFR) units which comprise serially stacked residual non-linear convolutions. Furthermore, the high-resolution image is reconstructed by combining the outputs of the RFR units with the residual output from the bicubic interpolated LR image. Finally, multiple experiments have been performed on the benchmark datasets, and the proposed model demonstrates superior performance at higher scales. Besides, this model also exhibits faster execution time compared to all the existing approaches.

Deep speaker embedding has demonstrated state-of-the-art performance in audio speaker recognition (SRE). However, one potential issue with this approach is that the speaker vectors derived from deep embedding models tend to be non-Gaussian for each individual speaker, and non-homogeneous for distributions of different speakers. These irregular distributions can seriously impact SRE performance, especially with the popular PLDA scoring method, which assumes homogeneous Gaussian distribution. In this paper, we argue that deep speaker vectors require deep normalization, and propose a deep normalization approach based on a novel discriminative normalization flow (DNF) model. We demonstrate the effectiveness of the proposed approach with experiments using the widely used SITW and CNCeleb corpora. In these experiments, the DNF-based normalization delivered substantial performance gains and also showed strong generalization capability in out-of-domain tests.

Speaker embeddings (x-vectors) extracted from very short segments of speech have recently been shown to give competitive performance in speaker diarization. We generalize this recipe by extracting from each speech segment, in parallel with the x-vector, also a diagonal precision matrix, thus providing a path for the propagation of information about the quality of the speech segment into a PLDA scoring backend. These precisions quantify the uncertainty about what the values of the embeddings might have been if they had been extracted from high quality speech segments. The proposed probabilistic embeddings (x-vectors with precisions) are interfaced with the PLDA model by treating the x-vectors as hidden variables and marginalizing them out. We apply the proposed probabilistic embeddings as input to an agglomerative hierarchical clustering (AHC) algorithm to do diarization in the DIHARD'19 evaluation set. We compute the full PLDA likelihood 'by the book' for each clustering hypothesis that is considered by AHC. We do joint discriminative training of the PLDA parameters and of the probabilistic x-vector extractor. We demonstrate accuracy gains relative to a baseline AHC algorithm that is applied to traditional x-vectors (without uncertainty) and uses averaging of binary log-likelihood-ratios rather than by-the-book scoring.

Due to the simple design pipeline, end-to-end (E2E) neural models for speech enhancement (SE) have attracted great interest. In order to improve the performance of the E2E model, the locality and temporal sequential properties of speech should be efficiently taken into account when modelling. However, in most current E2E models for SE, these properties are either not fully considered, or are too complex to be realized. In this paper, we propose an efficient E2E SE model, termed WaveCRN. In WaveCRN, the speech locality feature is captured by a convolutional neural network (CNN), while the temporal sequential property of the locality feature is modeled by stacked simple recurrent units (SRU). Unlike a conventional temporal sequential model that uses a long short-term memory (LSTM) network, which is difficult to parallelize, SRU can be efficiently parallelized in calculation with even fewer model parameters. In addition, in order to more effectively suppress the noise components in the input noisy speech, we derive a novel restricted feature masking (RFM) approach that performs enhancement on the embedded features in the hidden layers instead of on the physical spectral features commonly used in speech separation tasks. Experimental results on speech denoising and compressed speech restoration tasks confirm that with the lightweight architecture of SRU and the feature-mapping-based RFM, WaveCRN performs comparably with other state-of-the-art approaches with notably reduced model complexity and inference time.

Automatic Speech Recognition (ASR) is the interdisciplinary subfield of computational linguistics that develops methodologies and technologies enabling the recognition and translation of spoken language into text by computers. It incorporates knowledge and research from the fields of linguistics, computer science, and electrical engineering. Sentiment analysis is contextual mining of text that identifies and extracts subjective information in the source material, helping a business understand the social sentiment of its brand, product, or service while monitoring online conversations. According to the speech structure, three models are used in speech recognition to perform the match: an acoustic model, a phonetic dictionary, and a language model. Any speech recognition program is evaluated using two factors: accuracy (the percentage of errors in converting spoken words to digital data) and speed (the extent to which the program can keep up with a human speaker). For the purpose of converting speech to text (STT), we study the following open-source toolkits: CMU Sphinx and Kaldi. The toolkits use Mel-Frequency Cepstral Coefficients (MFCC) and i-vectors for feature extraction. CMU Sphinx is used with pre-trained Hidden Markov Models (HMM) and Gaussian Mixture Models (GMM), while Kaldi is used with pre-trained neural networks (NNET) as acoustic models. The n-gram language models contain the phonemes or pdf-ids for generating the most probable hypothesis (transcription) in the form of a lattice. The speech dataset is stored as .raw or .wav files and is transcribed into .txt files. The system then tries to identify opinions within the text and extract the following attributes: polarity (whether the speaker expresses a positive or negative opinion) and keywords (the thing being talked about).
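To make the MFCC front end concrete, the sketch below builds the standard triangular mel filterbank from the textbook hz-to-mel formula. This is the common construction, not the exact code of CMU Sphinx or Kaldi; the filter count and FFT size are typical defaults we assume.

```python
import numpy as np

# Sketch of the mel filterbank underlying MFCC features (common textbook
# construction, not a specific toolkit's): triangular filters spaced
# uniformly on the mel scale pool the FFT magnitude bins.
def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters=26, n_fft=512, sr=16000):
    mels = np.linspace(hz_to_mel(0), hz_to_mel(sr / 2), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, centre, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, centre):           # rising edge of triangle
            fb[i - 1, k] = (k - left) / max(centre - left, 1)
        for k in range(centre, right):          # falling edge
            fb[i - 1, k] = (right - k) / max(right - centre, 1)
    return fb

fb = mel_filterbank()   # one triangular filter per row
```

MFCCs are then the discrete cosine transform of the log of the filterbank energies; the toolkits wrap this whole pipeline behind their feature extraction front ends.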

The research of knowledge-driven conversational systems is largely limited by the lack of dialog data consisting of multi-turn conversations on multiple topics with knowledge annotations. In this paper, we propose a Chinese multi-domain knowledge-driven conversation dataset, KdConv, which grounds the topics in multi-turn conversations to knowledge graphs. Our corpus contains 4.5K conversations from three domains (film, music, and travel), and 86K utterances with an average turn number of 19.0. These conversations contain in-depth discussions on related topics and natural transitions between multiple topics. To facilitate follow-up research on this corpus, we provide several benchmark models. Comparative results show that the models can be enhanced by introducing background knowledge, yet there is still ample room for leveraging knowledge to model multi-turn conversations in further research. Results also show that there are obvious performance differences between the domains, indicating that it is worth further exploring transfer learning and domain adaptation. The corpus and benchmark models are publicly available.

Emotion intensity prediction determines the degree or intensity of an emotion that the author intends to express in a text, extending previous categorical approaches to emotion detection. While most previous work on this topic has concentrated on English texts, other languages would also benefit from fine-grained emotion classification, preferably without having to recreate the amount of annotated data available in English for each new language. Consequently, we explore cross-lingual transfer approaches for fine-grained emotion detection in Spanish and Catalan tweets. To this end, we annotate a test set of Spanish and Catalan tweets using Best-Worst Scaling. We compare four cross-lingual approaches, such as machine translation and cross-lingual embedding projection, which have varying requirements for parallel data, ranging from millions of parallel sentences to completely unsupervised. The results show that, on this data, low-resource methods surprisingly outperform conventional supervised methods, which we explain through an in-depth error analysis. We make the dataset and the code available at https://github.com/jbarnesspain/fine-grained_cross-lingual_emotion.

Blockchain-enabled Federated Learning (BFL) enables model updates of Federated Learning (FL) to be stored in the blockchain in a secure and reliable manner. However, one issue with BFL is that the training latency may increase due to the blockchain mining process. Another issue is that the mobile devices in BFL have energy and CPU constraints that may reduce the system lifetime and training efficiency. To address these issues, the Machine Learning Model Owner (MLMO) needs to (i) decide how much data and energy the mobile devices should use for the training and (ii) determine the mining difficulty, so as to minimize the training latency and energy consumption while achieving the target model accuracy. Under the uncertainty of the BFL environment, it is challenging for the MLMO to determine the optimal decisions. We propose to use Deep Reinforcement Learning (DRL) to derive the optimal decisions for the MLMO.

We investigate the relationship between the frequency with which verbs are found in particular subcategorization frames and the acceptability of those verbs in those frames, focusing in particular on subordinate clause-taking verbs, such as "think", "want", and "tell". We show that verbs' subcategorization frame frequency distributions are poor predictors of their acceptability in those frames---explaining, at best, less than 1/3 of the total information about acceptability across the lexicon---and, further, that common matrix factorization techniques used to model the acquisition of verbs' acceptability in subcategorization frames fare only marginally better. All data and code are available at this http URL

Event-related desynchronization and synchronization (ERD/S) and movement-related cortical potentials (MRCP) play an important role in brain-computer interfaces (BCI) for lower limb rehabilitation, particularly in standing and sitting. However, little is known about the differences in cortical activation between standing and sitting, especially how the brain's intention modulates the pre-movement sensorimotor rhythm during the switch between these movements. In this study, we aim to investigate the decoding of continuous EEG rhythms during action observation (AO), motor imagery (MI), and motor execution (ME) for standing and sitting. We developed a behavioral task in which participants were instructed to perform both AO and MI/ME of sit-to-stand and stand-to-sit actions. Our results demonstrated that ERD was prominent during AO, whereas ERS was typical during MI at the alpha band across the sensorimotor area. A combination of the filter bank common spatial pattern (FBCSP) and a support vector machine (SVM) was used for both offline and pseudo-online classification. The offline analysis indicated that the classification of AO versus MI provided the highest mean accuracy of 82.73$\pm$2.38\% in the stand-to-sit transition. These results are acceptable in comparison to the original FBCSP study of right hand and right foot activation classification. By applying the pseudo-online analysis, we demonstrated the possibility of decoding neural intentions from the integration of both AO and MI. These observations point to the promising prospect of using our developed tasks to build future exoskeleton-based rehabilitation systems.
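The CSP step at the core of FBCSP can be sketched with plain numpy. The code below is our illustration: the band-pass filter bank and the SVM stage are omitted, and the toy EEG data are synthetic.

```python
import numpy as np

# Numpy-only sketch of common spatial patterns (CSP), the core of FBCSP
# (band-pass filtering and the SVM are omitted): spatial filters maximise
# the variance ratio between two classes via a generalized
# eigendecomposition of the class covariance matrices.
def csp_filters(X1, X2, n_pairs=2):
    """X1, X2: trial arrays of shape (n_trials, n_channels, n_samples)."""
    def mean_cov(X):
        covs = [x @ x.T / np.trace(x @ x.T) for x in X]
        return np.mean(covs, axis=0)
    C1, C2 = mean_cov(X1), mean_cov(X2)
    # solve C1 w = lambda (C1 + C2) w
    eigvals, eigvecs = np.linalg.eig(np.linalg.solve(C1 + C2, C1))
    order = np.argsort(eigvals.real)
    keep = np.concatenate([order[:n_pairs], order[-n_pairs:]])
    return eigvecs[:, keep].real.T               # (2*n_pairs, n_channels)

rng = np.random.default_rng(0)
n_ch = 6
X1 = rng.normal(size=(20, n_ch, 100))
X1[:, 0] *= 3.0                                  # class 1: channel 0 strong
X2 = rng.normal(size=(20, n_ch, 100))
X2[:, 1] *= 3.0                                  # class 2: channel 1 strong
W = csp_filters(X1, X2)
features = np.log(np.var(W @ X1[0], axis=1))     # per-trial CSP features
```

In FBCSP this is repeated per frequency band, and the resulting log-variance features are fed to the SVM classifier.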

This paper presents modal truncation and the singular value decomposition (SVD) technique as two main algorithms for dynamic model reduction of power systems. The significance and accuracy of the proposed methods are investigated, with their detailed formulation derived for a constrained linear system. The full linearized model of the original nonlinear system is determined and used as the input to the dynamic reduction technique. The variables of a synchronous machine in a multi-machine system are then studied and replaced with a much simpler dynamic model. This equivalent dynamic model should behave similarly to what is observed from the system under study. The capability of each technique in retaining the dominant oscillation modes after dynamic reduction is used as the comparison criterion. The reduction techniques are validated through simulation on the dynamic 39-bus New England test system.
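Modal truncation itself can be sketched in a few lines of numpy. The example below is ours and ignores the constrained, multi-machine details of the paper: it simply projects a small state-space model onto its slowest (dominant) modes.

```python
import numpy as np

# Toy modal truncation (the paper's constrained, multi-machine
# formulation is not reproduced): diagonalize A, keep the slow dominant
# modes, and project B and C accordingly.
def modal_truncation(A, B, C, n_keep):
    eigvals, V = np.linalg.eig(A)
    order = np.argsort(np.abs(eigvals.real))   # slowest modes first
    Vk = V[:, order[:n_keep]]                  # retained mode shapes
    Wk = np.linalg.pinv(Vk)                    # left projection
    Ar = Wk @ A @ Vk
    return Ar, Wk @ B, C @ Vk

A = np.diag([-0.1, -0.5, -50.0, -80.0])        # two slow, two fast modes
B = np.ones((4, 1))
C = np.ones((1, 4))
Ar, Br, Cr = modal_truncation(A, B, C, n_keep=2)
```

The reduced system keeps exactly the slow eigenvalues that dominate the long-term (oscillatory) response, which is the retention criterion the paper uses for comparison.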

We propose a concept for reservoir computing on oscillators using the high-order synchronization effect. The reservoir output is presented in the form of oscillator synchronization metrics: the fractional high-order synchronization value and the synchronization efficiency, expressed as a percentage. Using two coupled relaxation oscillators built on VO2 switches, we created an oscillator reservoir that can compute the XOR operation. The reservoir can operate both with static input data (power currents, coupling forces) and with dynamic data in the form of spike sequences. Despite its small number of oscillators, the reservoir's significant non-linearity lets it express a wide range of dynamic states. The proposed computing concept can be implemented on oscillators of diverse nature.
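
While the paper's reservoir is physical (coupled VO2 oscillators read out through synchronization metrics), the computing principle can be illustrated with a generic software reservoir: a fixed random nonlinear expansion makes XOR, which is not linearly separable in the raw inputs, solvable by a trained linear readout. Everything below is an illustrative stand-in, not the authors' device:

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], float)
y = np.array([0.0, 1.0, 1.0, 0.0])        # XOR targets

# Fixed random "reservoir": an untrained nonlinear expansion of the input.
Win = rng.normal(size=(2, 20))
b = rng.normal(size=20)
states = np.tanh(X @ Win + b)

# Only the linear readout is trained (least squares on reservoir states).
design = np.c_[states, np.ones(4)]
w, *_ = np.linalg.lstsq(design, y, rcond=None)
pred = (design @ w > 0.5).astype(int)
```

A linear readout on the raw two-bit inputs cannot separate XOR; on the expanded states it can, which is the essence of reservoir computing.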

Deep neural networks have experimentally demonstrated superior performance over other machine learning approaches in decision-making predictions. However, one major concern is the closed-set nature of the classification decision, restricted to the trained classes, which can have serious consequences in safety-critical systems. When a deep neural network operates in a streaming environment, fast interpretation of its classifications is required to determine whether a result can be trusted. Untrusted classifications can occur when the input data to the deep neural network changes over time. One such change is concept evolution, where a new class is introduced that the deep neural network was not trained on. In the majority of deep neural network architectures, the only option is to assign such an instance to one of the trained classes, which would be incorrect. The aim of this research is to detect the arrival of a new class in the stream. Existing work on interpreting deep neural networks often focuses on neuron activations to provide visual interpretation and feature extraction. Our novel approach, coined DeepStreamCE, uses streaming approaches for real-time concept evolution detection in deep neural networks. In the offline phase, DeepStreamCE reduces neuron activations with an autoencoder and applies MCOD stream-based clustering. Both outputs are used in the online phase to analyse the neuron activations in the evolving stream and detect concept evolution in real time. We evaluate DeepStreamCE by training VGG16 convolutional neural networks on combinations of data from the CIFAR-10 dataset, holding out some classes to serve as concept evolution. For comparison, we apply the same data and VGG16 networks to an open-set deep network solution, OpenMax. DeepStreamCE outperforms OpenMax when identifying concept evolution on our datasets.

We consider a counter-adversarial sequential decision-making problem where an agent computes its private belief (posterior distribution) of the current state of the world, by filtering private information. According to its private belief, the agent performs an action, which is observed by an adversarial agent. We have recently shown how the adversarial agent can reconstruct the private belief of the decision-making agent via inverse optimization. The main contribution of this paper is a method to obfuscate the private belief of the agent from the adversary, by performing a suboptimal action. The proposed method optimizes the trade-off between obfuscating the private belief and limiting the increase in cost accrued due to taking a suboptimal action. We propose a probabilistic relaxation to obtain a linear optimization problem for solving the trade-off. In numerical examples, we show that the proposed methods enable the agent to obfuscate its private belief without compromising its cost budget.
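
The flavor of the resulting linear program can be shown on a toy instance (all numbers hypothetical): choose a randomized action policy that minimizes the expected extra cost, subject to a linear bound on how revealing the actions are of the private belief.

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical small example: 3 actions with an extra cost vs the optimal
# action, and a linear "leakage" score (how revealing each action is).
cost = np.array([0.0, 1.0, 3.0])
leak = np.array([1.0, 0.4, 0.1])
budget = 0.5                       # maximum tolerated expected leakage

# Minimize expected cost over a probability vector p subject to
# expected leakage <= budget, sum(p) = 1, p >= 0.
res = linprog(c=cost,
              A_ub=leak[None, :], b_ub=[budget],
              A_eq=np.ones((1, 3)), b_eq=[1.0],
              bounds=[(0, 1)] * 3)
p = res.x                          # obfuscating randomized policy
```

Here the optimum mixes the cheap-but-revealing action with a moderately costly one, exactly the obfuscation/cost trade-off described above.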

In this paper we investigate some of the issues that arise from scalarizing the multi-objective optimization problem in the Advantage Actor Critic (A2C) reinforcement learning algorithm. We show how a naive scalarization leads to overlapping gradients, and we argue that the entropy regularization term simply injects uncontrolled noise into the system. We propose two methods: one that avoids gradient overlapping (NOG) while keeping the same loss formulation, and one that avoids the noise injection (TE) by generating action distributions with a desired entropy. A comprehensive pilot experiment shows that our proposed methods speed up training by 210%. We also discuss how the proposed solutions can be applied to all advantage-based reinforcement learning algorithms.
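
The idea of generating action distributions with a desired entropy can be sketched by rescaling policy logits with a temperature found by bisection, since softmax entropy is monotone in temperature. This is an illustrative sketch of the principle, not the paper's exact TE construction:

```python
import numpy as np

def entropy(p):
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())

def with_target_entropy(logits, target, iters=80):
    """Softmax of logits/t, with temperature t chosen by bisection so the
    resulting distribution has (approximately) the target entropy."""
    lo, hi = 1e-3, 1e3
    for _ in range(iters):
        t = np.sqrt(lo * hi)                 # bisect in log-space
        z = logits / t
        p = np.exp(z - z.max())
        p /= p.sum()
        if entropy(p) < target:
            lo = t                           # too peaked: raise temperature
        else:
            hi = t
    return p
```

The target must lie between 0 and log(n) for n actions; within that range the bisection converges quickly.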

In developing and developed countries alike, skin diseases are becoming a very frequent health problem for people of all age groups. Skin problems affect mental health, can foster addiction to alcohol and drugs, and sometimes cause social isolation. Considering this importance, we propose an automatic technique to detect three common skin diseases, Leprosy, Tinea versicolor, and Vitiligo, from images of skin lesions. The proposed technique uses the Weber local descriptor and the local binary pattern to represent the texture of the affected skin regions. This ensemble technique achieved 91.38% accuracy using a multi-level support vector machine classifier, where features are extracted from different regions defined relative to the center of gravity. We have also applied several popular deep learning networks, such as MobileNet, ResNet_152, GoogLeNet, DenseNet_121, and ResNet_101, obtaining 89% accuracy with ResNet_101. The ensemble approach clearly outperforms all of the deep learning networks used. This imaging tool will be useful for early skin disease screening.

Named entity recognition systems perform well on standard datasets comprising English news. But given the paucity of data, it is difficult to draw conclusions about the robustness of systems with respect to recognizing a diverse set of entities. We propose a method for auditing the in-domain robustness of systems, focusing specifically on differences in performance due to the national origin of entities. We create entity-switched datasets, in which named entities in the original texts are replaced by plausible named entities of the same type but of different national origin. We find that state-of-the-art systems' performance varies widely even in-domain: in the same context, entities from certain origins are more reliably recognized than entities from elsewhere. Systems perform best on American and Indian entities, and worst on Vietnamese and Indonesian entities. This auditing approach can facilitate the development of more robust named entity recognition systems, and will allow research in this area to consider fairness criteria that have received heightened attention in other predictive technology work.
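
Constructing an entity-switched example boils down to swapping annotated spans while preserving the surrounding context. A minimal sketch (the span offsets and replacement table are hypothetical; the paper samples plausible same-type entities rather than using a fixed table):

```python
def switch_entities(text, spans, replacements):
    """Replace annotated entity spans with same-type entities of a different
    national origin. spans: list of (start, end, entity_type). Spans are
    applied right to left so earlier offsets stay valid."""
    out = text
    for start, end, etype in sorted(spans, reverse=True):
        out = out[:start] + replacements[etype] + out[end:]
    return out

text = "John Smith visited Paris."
spans = [(0, 10, "PER"), (19, 24, "LOC")]
switched = switch_entities(text, spans, {"PER": "Nguyen Van An", "LOC": "Hanoi"})
```

Running the same NER system on the original and switched texts then isolates the effect of entity origin from the effect of context.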

BERT is a cutting-edge language representation model pre-trained on a large corpus, which achieves superior performance on various natural language understanding tasks. However, a major blocking issue in applying BERT to online services is that it is memory-intensive and leads to unsatisfactory latency for user requests, raising the necessity of model compression. Existing solutions leverage the knowledge distillation framework to learn a smaller model that imitates the behaviors of BERT. However, the training procedure of knowledge distillation is itself expensive, as it requires sufficient training data to imitate the teacher model. In this paper, we address this issue by proposing a hybrid solution named LadaBERT (Lightweight adaptation of BERT through hybrid model compression), which combines the advantages of different model compression methods, including weight pruning, matrix factorization and knowledge distillation. LadaBERT achieves state-of-the-art accuracy on various public datasets while the training overheads can be reduced by an order of magnitude.
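
Of the three compression methods LadaBERT combines, matrix factorization is the easiest to sketch: replace a weight matrix by the product of two thin factors obtained from a truncated SVD. A generic sketch (the dimensions and rank are arbitrary choices, not the paper's settings):

```python
import numpy as np

def factorize(W, rank):
    """Compress an (out, in) weight matrix into two thin factors A @ B,
    keeping only the top `rank` singular directions."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    return U[:, :rank] * s[:rank], Vt[:rank]

# A 768x768 layer at rank 64 shrinks from 589,824 to 98,304 parameters.
A, B = factorize(np.random.randn(768, 768), 64)
```

In practice the factorized (and pruned) student is then fine-tuned with knowledge distillation to recover accuracy.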

In this work we report that the public reacted on social media at an early stage of the COVID-19 pandemic in a surprisingly accurate way, with activity levels reflecting the severity of the contagion figures registered almost a month later. Specifically, the intensity of COVID-related social media activity from different Italian regions at the beginning of the epidemic (21-24/2/2020) predicts the total number of deaths reached almost a month later (21/3/2020) in each region. It should be noted that at the time of the initial Twitter reaction no tabulated regional data on the epidemic was readily available: by 24 February 2020 only two regions had reported death cases and only three had reported infected subjects.

Extended versions of the Lambek Calculus currently used in computational linguistics rely on unary modalities to allow for the controlled application of structural rules affecting word order and phrase structure. These controlled structural operations give rise to derivational ambiguities that are missed by the original Lambek Calculus or its pregroup simplification. Proposals for compositional interpretation of extended Lambek Calculus in the compact closed category of FVect and linear maps have been made, but in these proposals the syntax-semantics mapping ignores the control modalities, effectively restricting their role to the syntax. Our aim is to turn the modalities into first-class citizens of the vectorial interpretation. Building on the density matrix semantics of (Correia et al., 2019), we extend the interpretation of the type system with an extra spin density matrix space. The interpretation of proofs then results in ambiguous derivations being tensored with orthogonal spin states. Our method introduces a way of simultaneously representing co-existing interpretations of ambiguous utterances, and provides a uniform framework for the integration of lexical and derivational ambiguity.

Data on transient events, such as GRBs, are often contained in large databases of unstructured data from space experiments, merged with potentially large amounts of background or simply undesired information. We present a computational formal model for applying techniques of modern computer science, such as data mining (DM) and knowledge discovery in databases (KDD), to a generic large database derived from a high-energy astrophysics experiment. The method aims to search for, identify, and extract expected information, and possibly to discover unexpected information.

We present CURL: Contrastive Unsupervised Representations for Reinforcement Learning. CURL extracts high-level features from raw pixels using contrastive learning and performs off-policy control on top of the extracted features. CURL outperforms prior pixel-based methods, both model-based and model-free, on complex tasks in the DeepMind Control Suite and Atari Games, showing 2.8x and 1.6x performance gains respectively at the 100K interaction steps benchmark. On the DeepMind Control Suite, CURL is the first image-based algorithm to nearly match the sample-efficiency and performance of methods that use state-based features.
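
The contrastive objective at the heart of CURL can be sketched as an InfoNCE loss with a bilinear similarity between query and key embeddings of two augmentations of the same observation. This NumPy sketch covers only the loss; the actual method trains a convolutional encoder and a momentum key encoder, which are omitted:

```python
import numpy as np

def info_nce(q, k, W, temperature=1.0):
    """InfoNCE loss with bilinear similarity q @ W @ k^T.

    q, k: (batch, dim) embeddings of two augmentations of the same batch;
    matching rows are positives, all other rows are negatives."""
    logits = q @ W @ k.T / temperature
    logits -= logits.max(axis=1, keepdims=True)       # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))        # positives on diagonal
```

The loss is the cross-entropy of picking the correct key for each query, so it is near zero when matching pairs are far more similar than mismatched ones.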

Today, data analysts largely rely on intuition to determine whether missing or withheld rows of a dataset significantly affect their analyses. We propose a framework that can produce automatic contingency analysis, i.e., the range of values an aggregate SQL query could take, under formal constraints describing the variation and frequency of missing data tuples. We describe how to process SUM, COUNT, AVG, MIN, and MAX queries in these conditions resulting in hard error bounds with testable constraints. We propose an optimization algorithm based on an integer program that reconciles a set of such constraints, even if they are overlapping, conflicting, or unsatisfiable, into such bounds. Our experiments on real-world datasets against several statistical imputation and inference baselines show that statistical techniques can have a deceptively high error rate that is often unpredictable. In contrast, our framework offers hard bounds that are guaranteed to hold if the constraints are not violated. In spite of these hard bounds, we show competitive accuracy to statistical baselines.
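
The hard bounds for the simplest aggregates follow directly from such constraints: if at most m rows are missing and every missing value lies in a known range, the query's extreme values are immediate. A sketch of the COUNT and SUM cases (the paper's integer program handles the general, overlapping-constraint setting):

```python
def count_bounds(observed_count, max_missing):
    """COUNT can only grow by the number of missing rows."""
    return observed_count, observed_count + max_missing

def sum_bounds(observed_sum, max_missing, lo, hi):
    """SUM bounds when each of up to max_missing missing values is in [lo, hi].
    Missing rows can only lower the sum if values may be negative, and only
    raise it if values may be positive (choosing zero missing rows otherwise)."""
    return (observed_sum + max_missing * min(lo, 0.0),
            observed_sum + max_missing * max(hi, 0.0))
```

These bounds are guaranteed to hold whenever the stated constraints hold, which is the contrast with statistical imputation drawn above.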

One of the greatest obstacles to the adoption of deep neural networks for new applications is that training the network typically requires a large number of manually labeled training samples. We empirically investigate the scenario where one has access to large amounts of unlabeled data but requires labeling only a single prototypical sample per class in order to train a deep network (i.e., one-shot semi-supervised learning). Specifically, we investigate the recent FixMatch results for one-shot semi-supervised learning to understand the factors that affect and impede high accuracy and reliability on CIFAR-10. For example, we discover that one barrier to one-shot semi-supervised learning for high-performance image classification is the unevenness of class accuracy during training. These results point to solutions that might enable more widespread adoption of one-shot semi-supervised training methods for new applications.

Scene flow estimation has been receiving increasing attention for 3D environment perception. Monocular scene flow estimation -- obtaining 3D structure and 3D motion from two temporally consecutive images -- is a highly ill-posed problem, and practical solutions are lacking to date. We propose a novel monocular scene flow method that yields competitive accuracy and real-time performance. By taking an inverse problem view, we design a single convolutional neural network (CNN) that successfully estimates depth and 3D motion simultaneously from a classical optical flow cost volume. We adopt self-supervised learning with 3D loss functions and occlusion reasoning to leverage unlabeled data. We validate our design choices, including the proxy loss and augmentation setup. Our model achieves state-of-the-art accuracy among unsupervised/self-supervised learning approaches to monocular scene flow, and yields competitive results for the optical flow and monocular depth estimation sub-tasks. Semi-supervised fine-tuning further improves the accuracy and yields promising results in real-time.

This document describes the aggregation and anonymization process applied to the initial version of Google COVID-19 Community Mobility Reports (published at this http URL on April 2, 2020), a publicly available resource intended to help public health authorities understand what has changed in response to work-from-home, shelter-in-place, and other recommended policies aimed at flattening the curve of the COVID-19 pandemic. Our anonymization process is designed to ensure that no personal data, including an individual's location, movement, or contacts, can be derived from the resulting metrics.

The high-level description of the procedure is as follows: we first generate a set of anonymized metrics from the data of Google users who opted in to Location History. Then, we compute percentage changes of these metrics from a baseline based on the historical part of the anonymized metrics. We then discard a subset which does not meet our bar for statistical reliability, and release the rest publicly in a format that compares the result to the private baseline.
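
The percentage-change-and-discard step can be sketched as follows. The threshold and field names are hypothetical, and the real pipeline anonymizes the underlying counts upstream, which is omitted here:

```python
def mobility_metrics(counts, baseline, min_baseline=100):
    """Percentage change of daily counts from a per-day baseline.

    Days whose baseline is too small to be statistically reliable are
    discarded (withheld) rather than reported as a noisy number; the
    min_baseline threshold is a made-up stand-in for that reliability bar."""
    out = {}
    for day, value in counts.items():
        base = baseline.get(day)
        if base is None or base < min_baseline:
            continue
        out[day] = round(100.0 * (value - base) / base)
    return out
```

Only the percentage changes are released; the baseline itself stays private, matching the description above.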

The Indian railway network has been analyzed on the basis of the number of trains directly linking two railway zones. The network is displayed as a weighted graph in which the weights denote the number of trains between zones. It may be pointed out that each zone is a complex network in itself and may exhibit different characteristic features; the zonal network can therefore be considered a network of complex networks. In this paper, the self-links, in-degree, and out-degree of each zone have been computed, providing information about inter- and intra-zonal connectivity. The degree-passenger correlation, which relates the number of trains and the number of passengers originating from a particular zone and might inform policy-making decisions, has also been studied. Other complex network parameters, such as betweenness, clustering coefficient, and cliques, have been obtained to gain more insight into the complex Indian zonal network.
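
The self-link, in-degree, and out-degree computation on the weighted zonal graph is straightforward from an edge list; a small sketch with hypothetical zone codes and train counts:

```python
def zone_stats(trains):
    """trains: iterable of (origin_zone, dest_zone, n_trains) weighted edges.

    Returns weighted out-degree and in-degree per zone (inter-zonal
    connectivity) and self-link weights (intra-zonal connectivity)."""
    out_deg, in_deg, self_links = {}, {}, {}
    for src, dst, w in trains:
        if src == dst:
            self_links[src] = self_links.get(src, 0) + w
        else:
            out_deg[src] = out_deg.get(src, 0) + w
            in_deg[dst] = in_deg.get(dst, 0) + w
    return out_deg, in_deg, self_links
```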

The automatic detection of events in sport videos has important applications for data analytics, as well as for broadcasting and media companies. This paper presents a comprehensive approach for detecting a wide range of complex events in soccer videos starting from positional data. The event detector is designed as a two-tier system that detects atomic and complex events. Atomic events are detected based on temporal and logical combinations of the detected objects, their relative distances, as well as spatio-temporal features such as velocity and acceleration. Complex events are defined as temporal and logical combinations of atomic and complex events, and are expressed by means of a declarative Interval Temporal Logic (ITL). The effectiveness of the proposed approach is demonstrated over 16 different events, including complex situations such as tackles and filtering passes. By formalizing events based on principled ITL, it is possible to easily perform reasoning tasks, such as understanding which passes or crosses result in a goal being scored. To counterbalance the lack of suitable, annotated public datasets, we built on an open source soccer simulation engine to release the synthetic SoccER (Soccer Event Recognition) dataset, which includes complete positional data and annotations for more than 1.6 million atomic events and 9,000 complex events. The dataset and code are available at https://gitlab.com/grains2/slicing-and-dicing-soccer
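
The first tier, atomic event detection from positional data, reduces to thresholding spatio-temporal features such as velocity. A minimal sketch detecting "sprint" intervals from a single player track (the sampling rate and speed threshold are made-up values, not the paper's):

```python
import math

def velocities(track, dt=0.1):
    """Speeds (m/s) from a list of (x, y) positions sampled every dt seconds."""
    return [math.hypot(x2 - x1, y2 - y1) / dt
            for (x1, y1), (x2, y2) in zip(track, track[1:])]

def detect_sprints(track, dt=0.1, v_min=7.0):
    """Atomic 'sprint' events: maximal index intervals where speed >= v_min."""
    events, start = [], None
    for i, v in enumerate(velocities(track, dt)):
        if v >= v_min and start is None:
            start = i
        elif v < v_min and start is not None:
            events.append((start, i))
            start = None
    if start is not None:
        events.append((start, len(track) - 1))
    return events
```

Complex events are then expressed as temporal and logical combinations of such atomic intervals, which is what the declarative ITL layer provides.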

The ability to simulate a fermionic system on a quantum computer is expected to revolutionize chemical engineering, materials design, and nuclear physics, to name a few. Thus, optimizing the simulation circuits is of significance in harnessing the power of quantum computers. Here, we address this problem in two aspects. In the fault-tolerant regime, we optimize the $\rzgate$ gate counts and depths, assuming the use of a product-formula algorithm for implementation. In the pre-fault-tolerant regime, we optimize the two-qubit gate counts, assuming the use of the variational quantum eigensolver (VQE) approach. Specific to the latter, we present a framework that enables bootstrapping the VQE progression towards the convergence of the ground-state energy of the fermionic system. This framework, based on perturbation theory, also improves the energy estimate at each cycle of the VQE progression dramatically, resulting in significant savings of quantum resources required to be within a pre-specified tolerance from the known ground-state energy in the test-bed, classically accessible system of the water molecule. We also explore a suite of generalized transformations of fermion to qubit operators and show that resource-requirement savings of up to nearly $20\%$ are possible.

In this paper, a new analogue of the blossom based on post-quantum calculus is introduced. The post-quantum blossom is adapted to develop identities and algorithms for Bernstein bases and Bézier curves. By applying the post-quantum blossom, various new identities and formulae expressing the monomials in terms of the post-quantum Bernstein basis functions, along with a post-quantum variant of Marsden's identity, are investigated. For each post-quantum Bézier curve of degree $m$, a collection of $m!$ new, affine-invariant, recursive evaluation algorithms is derived.

An important task in quantum physics is the estimation of local quantities for ground states of local Hamiltonians. Recently, [Ambainis, CCC 2014] defined the complexity class P^QMA[log], and motivated its study by showing that the physical task of estimating the expectation value of a local observable against the ground state of a local Hamiltonian is P^QMA[log]-complete. In this paper, we continue the study of P^QMA[log], obtaining the following lower and upper bounds.

Lower bounds (hardness results): (1) The P^QMA[log]-completeness result of [Ambainis, CCC 2014] requires O(log n)-local observables and Hamiltonians. We show that simulating even a single qubit measurement on ground states of 5-local Hamiltonians is P^QMA[log]-complete, resolving an open question of Ambainis. (2) We formalize the complexity theoretic study of estimating two-point correlation functions against ground states, and show that this task is similarly P^QMA[log]-complete. (3) We identify a flaw in [Ambainis, CCC 2014] regarding a P^UQMA[log]-hardness proof for estimating spectral gaps of local Hamiltonians. By introducing a "query validation" technique, we build on [Ambainis, CCC 2014] to obtain P^UQMA[log]-hardness for estimating spectral gaps under polynomial-time Turing reductions.

Upper bounds (containment in complexity classes): P^QMA[log] is thought of as "slightly harder" than QMA. We justify this formally by exploiting the hierarchical voting technique of [Beigel, Hemachandra, Wechsung, SCT 1989] to show that P^QMA[log] is in PP. This improves on the previously known containment of QMA in PP [Kitaev, Watrous, STOC 2000].

This work contributes a rigorous treatment of the subtlety involved in studying oracle classes in which the oracle solves a promise problem. This is particularly relevant for quantum complexity theory, where most natural classes such as BQP and QMA are defined as promise classes.

We investigate the interesting impact of mobility on the problem of efficient wireless power transfer in ad hoc networks. We consider a set of mobile agents (consuming energy to perform certain sensing and communication tasks), and a single static charger (with finite energy) which can recharge the agents when they get in its range. In particular, we focus on the problem of efficiently computing the appropriate range of the charger with the goal of prolonging the network lifetime. We first demonstrate (under the realistic assumption of fixed energy supplies) the limitations of any fixed charging range and, therefore, the need for (and power of) a dynamic selection of the charging range, by adapting to the behavior of the mobile agents which is revealed in an online manner. We investigate the complexity of optimizing the selection of such an adaptive charging range, by showing that two simplified offline optimization problems (closely related to the online one) are NP-hard. To effectively address the involved performance trade-offs, we finally present a variety of adaptive heuristics, assuming different levels of agent information regarding their mobility and energy.

Infinite time Turing machine models with tape length $\alpha$, denoted $T_\alpha$, strengthen the machines of Hamkins and Kidder [HL00] with tape length $\omega$. A new phenomenon is that for some countable ordinals $\alpha$, some cells cannot be halting positions of $T_\alpha$ given trivial input. The main open question in [Rin14] asks about the size of the least such ordinal $\delta$.

We answer this by providing various characterizations. For instance, $\delta$ is the least ordinal with any of the following properties: (a) For some $\xi<\alpha$, there is a $T_\xi$-writable but not $T_\alpha$-writable subset of $\omega$. (b) There is a gap in the $T_\alpha$-writable ordinals. (c) $\alpha$ is uncountable in $L_{\lambda_\alpha}$. Here $\lambda_\alpha$ denotes the supremum of $T_\alpha$-writable ordinals, i.e. those with a $T_\alpha$-writable code of length $\alpha$.

We further use the above characterizations, and an analogue to Welch's submodel characterization of the ordinals $\lambda$, $\zeta$ and $\Sigma$, to show that $\delta$ is large in the sense that it is a closure point of the function $\alpha \mapsto \Sigma_\alpha$, where $\Sigma_\alpha$ denotes the supremum of the $T_\alpha$-accidentally writable ordinals.

Datasets drive vision progress, yet existing driving datasets are impoverished in terms of visual content and supported tasks to study multitask learning for autonomous driving. Researchers are usually constrained to study a small set of problems on one dataset, while real-world computer vision applications require performing tasks of various complexities. We construct BDD100K, the largest driving video dataset with 100K videos and 10 tasks to evaluate the exciting progress of image recognition algorithms on autonomous driving. The dataset possesses geographic, environmental, and weather diversity, which is useful for training models that are less likely to be surprised by new conditions. Based on this diverse dataset, we build a benchmark for heterogeneous multitask learning and study how to solve the tasks together. Our experiments show that special training strategies are needed for existing models to perform such heterogeneous tasks. BDD100K opens the door for future studies in this important venue.

In this paper, we focus on weakly supervised learning with noisy training data for both classification and regression problems. We assume that the training outputs are collected from a mixture of a target distribution and correlated noise distributions. Our proposed method simultaneously estimates the target distribution and the quality of each datum, defined as the correlation between the target and data-generating distributions. The cornerstone of the proposed method is a Cholesky Block that enables modeling dependencies among mixture distributions in a differentiable manner, while maintaining a distribution over the network weights. We first provide illustrative examples in both regression and classification tasks to show the effectiveness of the proposed method. The method is then extensively evaluated in a number of experiments, where it consistently shows performance comparable or superior to existing baseline methods in handling noisy data.

We study the fundamental problem of distributed network formation among mobile agents of limited computational power that aim to achieve energy balance by wirelessly transmitting and receiving energy in a peer-to-peer manner. Specifically, we design simple distributed protocols consisting of a small number of states and interaction rules for the formation of arbitrary and k-ary tree networks. Furthermore, we evaluate (theoretically and also using computer simulations) a plethora of energy redistribution protocols that exploit different levels of knowledge in order to achieve desired energy distributions among the agents which require that every agent has exactly or at least twice the energy of the agents of higher depth, according to the structure of the network. Our study shows that without using any knowledge about the network structure, such energy distributions cannot be achieved in a timely manner, meaning that there might be high energy loss during the redistribution process. On the other hand, only a few extra bits of information seem to be enough to guarantee quick convergence to energy distributions that satisfy particular properties, yielding low energy loss.

As shown in (this http URL), the usual update modes of Boolean networks (BNs), including synchronous and (generalized) asynchronous, fail to capture behaviors introduced by multivalued refinements. Thus, update modes do not allow correct abstract reasoning on the dynamics of biological systems, as they may lead to rejecting valid BN models. This technical report lists the main definitions and properties of the most permissive semantics of BNs introduced in this http URL. This semantics provides a correct abstraction of any multivalued refinement, with any update mode. It subsumes all the usual update modes, while enabling new behaviors achievable by more concrete models. Moreover, classical dynamical analyses of reachability and attractors have a simpler computational complexity: reachability can be assessed in a polynomial number of iterations. The computation of iterations is in NP in the very general case, and is linear when local functions are monotonic, or with some usual representations of functions of BNs (binary decision diagrams, Petri nets, automata networks, etc.). Thus, reachability is in P with locally-monotonic BNs, and P$^{\text{NP}}$ otherwise (instead of being PSPACE-complete with update modes); deciding whether a configuration belongs to an attractor is in coNP with locally-monotonic BNs, and coNP$^{\text{coNP}}$ otherwise (instead of PSPACE-complete with update modes). Furthermore, we demonstrate that the semantics completely captures any behavior achievable with any multilevel or ODE refinement of the BN, and that it is minimal with respect to this refinement criterion: for any most permissive trajectory, there exists a multilevel refinement of the BN which can reproduce it. In brief, the most permissive semantics of BNs enables correct abstract reasoning on the dynamics of BNs, with greater tractability than previously introduced update modes.
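
For readers unfamiliar with BN update modes: under the synchronous mode all local functions are applied at once, while the fully asynchronous mode changes one component per transition. A minimal sketch of these two classical modes (the most permissive semantics itself, which subsumes both, is more involved and omitted):

```python
def sync_step(state, fs):
    """Synchronous update: every local function is applied simultaneously."""
    return tuple(f(state) for f in fs)

def async_successors(state, fs):
    """Fully asynchronous update: the set of states reachable by changing
    exactly one component whose local function disagrees with it."""
    succ = set()
    for i, f in enumerate(fs):
        v = f(state)
        if v != state[i]:
            succ.add(state[:i] + (v,) + state[i + 1:])
    return succ
```

On a two-node example with local functions f0(s) = s1 and f1(s) = NOT s0, the two modes already yield different transition graphs, which is why results phrased "with update modes" must specify which one.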

We still do not have perfect decoders for topological codes that can satisfy all the needs of different experimental setups. Recently, a few neural network based decoders have been studied, with the motivation that they can adapt to a wide range of noise models, and can easily run on dedicated chips without a full-fledged computer. The latter feature might lead to fast speed and the ability to operate at low temperatures. However, a question which has not been addressed in previous works is whether neural network decoders can handle 2D topological codes with large distances. In this work, we provide a positive answer for the toric code. The structure of our neural network decoder is inspired by the renormalization group decoder. With a fairly strict policy on training time, when the bit-flip error rate is lower than $9\%$ and syndrome extraction is perfect, the neural network decoder performs better as the code distance increases. With a less strict policy, we find it is not hard for the neural decoder to achieve a performance close to the minimum-weight perfect matching algorithm. The numerical simulation is done up to code distance $d=64$. Last but not least, we describe and analyze a few failed approaches. They guide us to the final design of our neural decoder, but also serve as a caution when we gauge the versatility of stock deep neural networks. The source code of our neural decoder can be found at https://github.com/XiaotongNi/toric-code-neural-decoder .

This paper proposes a new highly scalable and asymptotically optimal control synthesis algorithm from linear temporal logic specifications, called $\text{STyLuS}^{*}$ for large-Scale optimal Temporal Logic Synthesis, that is designed to solve complex temporal planning problems in large-scale multi-robot systems. Existing planning approaches with temporal logic specifications rely on graph search techniques applied to a product automaton constructed among the robots. In our previous work, we have proposed a more tractable sampling-based algorithm that incrementally builds trees that approximate the state-space and transitions of the synchronous product automaton and does not require sophisticated graph search techniques. Here, we extend our previous work by introducing bias in the sampling process, guided by transitions in the Büchi automaton that belong to the shortest path to the accepting states. This allows us to synthesize optimal motion plans from product automata with hundreds of orders of magnitude more states than those that existing optimal control synthesis methods or off-the-shelf model checkers can manipulate. We show that $\text{STyLuS}^{*}$ is probabilistically complete and asymptotically optimal and has an exponential convergence rate. This is the first time that convergence rate results are provided for sampling-based optimal control synthesis methods. We provide simulation results showing that $\text{STyLuS}^{*}$ can synthesize optimal motion plans for very large multi-robot systems, which is impossible using state-of-the-art methods.

We study the problem of the embedding degree of an abelian variety over a finite field which is vital in pairing-based cryptography. In particular, we show that for a prescribed CM field $L$ of degree $\geq 4$, prescribed integers $m$, $n$ and any prime $l\equiv 1 \pmod{mn}$, there exists an ordinary abelian variety over a finite field with endomorphism algebra $L$, embedding degree $n$ with respect to $l$ and the field extension generated by the $l$-torsion points of degree $mn$ over the field of definition. We also study a class of absolutely simple higher dimensional abelian varieties whose endomorphism algebras are central over imaginary quadratic fields.

We address the problem of learning the parameters of a stable linear time invariant (LTI) system or linear dynamical system (LDS) with unknown latent space dimension, or order, from a single time series of noisy input-output data. We focus on learning the best lower order approximation allowed by finite data. Motivated by subspace algorithms in systems theory, where the doubly infinite system Hankel matrix captures both order and good lower order approximations, we construct a Hankel-like matrix from noisy finite data using ordinary least squares. This circumvents the non-convexities that arise in system identification, and allows accurate estimation of the underlying LTI system. Our results rely on careful analysis of self-normalized martingale difference terms that helps bound identification error up to logarithmic factors of the lower bound. We provide a data-dependent scheme for order selection and find an accurate realization of system parameters, corresponding to that order, by an approach that is closely related to the Ho-Kalman subspace algorithm. We demonstrate that the proposed model order selection procedure is not overly conservative, i.e., for the given data length it is not possible to estimate higher order models or find higher order approximations with reasonable accuracy.
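As a toy illustration of why a Hankel matrix encodes model order, the sketch below builds the matrix with entries $H_{ij} = C A^{i+j} B$ for a scalar order-1 system. It is a simplification under stated assumptions: the Markov parameters are taken as exact, whereas the paper estimates them from noisy data by ordinary least squares.

```python
# Toy sketch (not the paper's exact construction): a Hankel matrix built
# from the Markov parameters g_k = C A^k B of an LTI system reveals the
# model order. For a true order-1 system every 2x2 minor vanishes.
def hankel(g, rows, cols):
    """Hankel matrix with entries H[i][j] = g[i + j]."""
    assert len(g) >= rows + cols - 1
    return [[g[i + j] for j in range(cols)] for i in range(rows)]

# Order-1 system x_{k+1} = 0.5 x_k + u_k, y_k = x_k, so g_k = 0.5**k.
g = [0.5 ** k for k in range(8)]
H = hankel(g, 3, 3)
minor = H[0][0] * H[1][1] - H[0][1] * H[1][0]  # = 0 for an order-1 system
```

With noisy data the minors are only approximately zero, which is why the paper needs a data-dependent scheme to decide where to cut the order.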

Ants are six-legged insects that can carry loads ten times heavier than their body weight. Because they have six legs, they are intrinsically stable, and they are powerful enough to carry heavy loads. For these reasons, in this paper a new parallel kinematic structure is proposed for a six-legged ant robot. The mechanical structure is designed and optimized in SolidWorks. The mechanism has six legs, yet only two DC motors actuate all of them, so from a mechanical point of view the design is an optimal one. The robot is lightweight and, thanks to its wireless modules, semi-autonomous, which makes it suitable for social robotics and rescue robotics applications. The transmitter program is implemented on a supervisor computer in LabVIEW, and a microcontroller serves as the main controller. The electronic board is designed and tested in Proteus Professional, the PCB is laid out in Altium Designer, and the microcontroller is programmed in CodeVision.

We introduce a Curry-Howard correspondence for a large class of intermediate logics characterized by intuitionistic proofs with non-nested applications of rules for classical disjunctive tautologies (1-depth intermediate proofs). The resulting calculus, which we call $\lambda_{\parallel}$, is a strongly normalizing parallel extension of the simply typed $\lambda$-calculus. Although simple, the $\lambda_{\parallel}$ reduction rules can model arbitrary process network topologies, and encode interesting parallel programs ranging from numeric computation to algorithms on graphs.

The complexity of parity games is a long-standing open problem that saw a major breakthrough in 2017 when two quasi-polynomial algorithms were published. This article presents a third, independent approach to solving parity games in quasi-polynomial time, based on the notion of register game, a parameterised variant of a parity game. The analysis of register games leads to a quasi-polynomial algorithm for parity games, a polynomial algorithm for restricted classes of parity games and a novel measure of complexity, the register index, which aims to capture the combined complexity of the priority assignment and the underlying game graph.

We further present a translation of alternating parity word automata into alternating weak automata with only a quasi-polynomial increase in size, based on register games; this improves on the previous exponential translation.

We also use register games to investigate the parity index hierarchy: while for words the index hierarchy of alternating parity automata collapses to the weak level, and for trees it is strict, for structures between trees and words, it collapses logarithmically, in the sense that any parity tree automaton of size n is equivalent, on these particular classes of structures, to an automaton with a number of priorities logarithmic in n.

We consider a communication scenario, in which an intruder tries to determine the modulation scheme of the intercepted signal. Our aim is to minimize the accuracy of the intruder, while guaranteeing that the intended receiver can still recover the underlying message with the highest reliability. This is achieved by perturbing channel input symbols at the encoder, similarly to adversarial attacks against classifiers in machine learning. In image classification, the perturbation is limited to be imperceptible to a human observer, while in our case the perturbation is constrained so that the message can still be reliably decoded by the legitimate receiver, which is oblivious to the perturbation. Simulation results demonstrate the viability of our approach to make wireless communication secure against state-of-the-art intruders (using deep learning or decision trees) with minimal sacrifice in the communication performance. On the other hand, we also demonstrate that using diverse training data and curriculum learning can significantly boost the accuracy of the intruder.

Most existing work focuses on the generalization of KKT for nonsmooth convex optimization problems, but this paper explores a generalized form of Karush-Kuhn-Tucker (KKT) conditions for real continuous optimization problems.

Few-shot classification refers to learning a classifier for new classes given only a few examples. While a plethora of models have emerged to tackle it, we find the procedure and datasets that are used to assess their progress lacking. To address this limitation, we propose Meta-Dataset: a new benchmark for training and evaluating models that is large-scale, consists of diverse datasets, and presents more realistic tasks. We experiment with popular baselines and meta-learners on Meta-Dataset, along with a competitive method that we propose. We analyze performance as a function of various characteristics of test tasks and examine the models' ability to leverage diverse training sources for improving their generalization. We also propose a new set of baselines for quantifying the benefit of meta-learning in Meta-Dataset. Our extensive experimentation has uncovered important research challenges and we hope to inspire work in these directions.

We study the problem of finding the optimal dosage in early stage clinical trials through the multi-armed bandit lens. We advocate the use of the Thompson Sampling principle, a flexible algorithm that can accommodate different types of monotonicity assumptions on the toxicity and efficacy of the doses. For the simplest version of Thompson Sampling, based on a uniform prior distribution for each dose, we provide finite-time upper bounds on the number of sub-optimal dose selections, which is unprecedented for dose-finding algorithms. Through a large simulation study, we then show that variants of Thompson Sampling based on more sophisticated prior distributions outperform state-of-the-art dose identification algorithms in different types of dose-finding studies that occur in phase I or phase I/II trials.
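The simplest variant described above can be sketched as Beta-Bernoulli Thompson Sampling with a uniform Beta(1,1) prior per dose. This is an illustration only: it models efficacy alone and omits the toxicity and monotonicity modeling of real dose-finding designs, and all names below are assumptions.

```python
import random

# Illustrative Beta-Bernoulli Thompson Sampling with a uniform Beta(1,1)
# prior per dose (efficacy only; real dose-finding trials also model
# toxicity and monotonicity, which this sketch omits).
def thompson_sampling(true_rates, horizon, seed=0):
    rng = random.Random(seed)
    n_doses = len(true_rates)
    successes = [0] * n_doses
    failures = [0] * n_doses
    pulls = [0] * n_doses
    for _ in range(horizon):
        # Sample one plausible efficacy per dose from its posterior ...
        samples = [rng.betavariate(1 + successes[d], 1 + failures[d])
                   for d in range(n_doses)]
        # ... and administer the dose whose sampled efficacy is largest.
        d = max(range(n_doses), key=lambda i: samples[i])
        pulls[d] += 1
        if rng.random() < true_rates[d]:
            successes[d] += 1
        else:
            failures[d] += 1
    return pulls

pulls = thompson_sampling([0.2, 0.4, 0.6], horizon=2000)
```

Over a long enough horizon the highest-efficacy dose should accumulate the large majority of selections, with sub-optimal selections bounded as in the finite-time analysis mentioned above.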

Recently, it has been shown that many functions on sets can be represented by sum decompositions. These decompositions easily lend themselves to neural approximations, extending the applicability of neural nets to set-valued inputs (Deep Set learning). This work investigates a core component of the Deep Set architecture: aggregation functions. We suggest and examine alternatives to commonly used aggregation functions, including learnable recurrent aggregation functions. Empirically, we show that Deep Set networks are highly sensitive to the choice of aggregation function: beyond improved performance, we find that learnable aggregations lower hyper-parameter sensitivity and generalize better to out-of-distribution input sizes.
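The sum-decomposition idea can be sketched in a few lines. Here `phi` and `rho` are toy closed-form stand-ins for the neural networks used in Deep Sets, and the aggregation function is the swappable component the abstract is concerned with.

```python
# A permutation-invariant set function f(S) = rho(aggregate(phi(x) for x
# in S)). phi and rho are toy stand-ins for the learned networks.
def set_function(xs, aggregate):
    phi = lambda x: (x, x * x)          # per-element embedding
    rho = lambda z: z[1] - z[0]         # readout applied to the aggregate
    embedded = [phi(x) for x in xs]
    return rho(aggregate(embedded))

def sum_agg(embs):
    return tuple(sum(e[i] for e in embs) for i in range(2))

def max_agg(embs):
    return tuple(max(e[i] for e in embs) for i in range(2))

# Permutation invariance: the output ignores element order.
a = set_function([1, 2, 3], sum_agg)
b = set_function([3, 1, 2], sum_agg)
c = set_function([1, 2, 3], max_agg)   # a different aggregation, a
                                       # different represented function
```

Swapping `sum_agg` for `max_agg` (or a learnable recurrent aggregator) changes which set function the same `phi`/`rho` pair represents, which is exactly the sensitivity the work investigates.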

Taking down botnets has always been a major challenge. In P2P botnets, the robustness of C&C channels is increased and detecting the botmaster is harder. In this paper, we propose a probabilistic method to reconstruct the topology of the C&C channel of P2P botnets. Due to the geographic dispersion of P2P botnet members, it is not possible to monitor all members, and the data required by other graph reconstruction methods are not all available. So far, no general method has been introduced to reconstruct the C&C channel topology for all types of P2P botnets. In our method, the probability of connections between bots is estimated using the inaccurate receiving times of several cascades, the network model parameters of the C&C channel, and the end-to-end delay distribution of the Internet. The receiving times can be collected by observing the external reactions of bots to commands. Our simulations show that more than 90% of the edges in a 1000-member network with a mean node degree of 50 are accurately estimated by collecting the inaccurate receiving times of 22 cascades. When the receiving times of only half of the bots are collected, the same estimation accuracy is obtained using 95 cascades.

Auditory frisson is the experience of cold or shivering sensations related to sound in the absence of a physical cold stimulus. Multiple examples of frisson-inducing sounds have been reported, but the mechanism of auditory frisson remains elusive. Typical frisson-inducing sounds may contain a looming effect, in which a sound appears to approach the listener's peripersonal space. Previous studies on sound in peripersonal space have provided objective measurements of sound-induced effects, but few have investigated the subjective experience of frisson-inducing sounds. Here we explored whether it is possible to produce subjective feelings of frisson by moving a noise stimulus (white noise, rolling-beads noise, or frictional noise produced by rubbing a plastic bag) around a listener's head. Our results demonstrate that sound-induced frisson is experienced more strongly when auditory stimuli are rotated around the head (binaural moving sounds) than when they are not (monaural static sounds), regardless of the source of the noise. Pearson's correlation analysis showed that several acoustic features of the auditory stimuli, such as the variance of the interaural level difference (ILD), loudness, and sharpness, were correlated with the magnitude of subjective frisson. We also observed that subjective feelings of frisson were stronger for a moving musical sound than for a static one.

We consider a scenario where a system experiences a disruption, and the states (representing health values) of its components continue to reduce over time, unless they are acted upon by a controller. Given this dynamical setting, we consider the problem of finding an optimal control (or switching) sequence to maximize the sum of the weights of the components whose states are brought back to the maximum value. We first provide several characteristics of the optimal policy for the general (fully heterogeneous) version of this problem. We then show that under certain conditions on the rates of repair and deterioration, we can explicitly characterize the optimal control policy as a function of the states. When the deterioration rate (when not being repaired) is larger than or equal to the repair rate, and the deterioration and repair rates as well as the weights are homogeneous across all the components, the optimal control policy is to target the component that has the largest state value at each time step. On the other hand, if the repair rates are sufficiently larger than the deterioration rates, the optimal control policy is to target the component whose state minus the deterioration rate is least in a particular subset of components at each time step.
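The largest-state rule for the homogeneous case can be illustrated with a small simulation. This is a sketch under assumed parameter names (`delta` for the deterioration rate, `rho` for the repair rate, `x_max` for the maximum health value), not the paper's formal model or proof.

```python
# Illustrative simulation of the homogeneous setting: states decay by
# `delta` unless repaired, the targeted component gains `rho`, and when
# decay is at least as fast as repair the policy described above targets
# the component with the largest state at each step.
def simulate(states, delta, rho, x_max, steps):
    states = list(states)
    recovered = set()
    for _ in range(steps):
        alive = [i for i in range(len(states))
                 if i not in recovered and states[i] > 0]
        if not alive:
            break
        target = max(alive, key=lambda i: states[i])   # largest-state rule
        for i in alive:
            if i == target:
                states[i] = min(x_max, states[i] + rho)
            else:
                states[i] = max(0.0, states[i] - delta)
        if states[target] >= x_max:
            recovered.add(target)
    return recovered, states

recovered, final = simulate([9.0, 5.0], delta=1.0, rho=1.0,
                            x_max=10.0, steps=8)
```

With these equal rates, the component closest to full health is restored first (one step), and the remaining component is then repaired before it decays to zero.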

Reinforcement learning (RL) methods learn optimal decisions in the presence of a stationary environment. However, the stationarity assumption on the environment is very restrictive. In many real-world problems, such as traffic signal control and robotic applications, one often encounters non-stationary environments, and in these scenarios RL methods yield sub-optimal decisions. In this paper, we therefore consider the problem of developing RL methods that obtain optimal decisions in a non-stationary environment. The goal is to maximize the long-term discounted reward when the underlying model of the environment changes over time. To achieve this, we first adapt a change point algorithm to detect changes in the statistics of the environment and then develop an RL algorithm that maximizes the long-run reward accrued. We illustrate that our change point method effectively detects changes in the model of the environment and thus facilitates the RL algorithm in maximizing the long-run reward. We further validate the effectiveness of the proposed solution on non-stationary random Markov decision processes, a sensor energy management problem, and a traffic signal control problem.
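The detect-then-adapt idea can be sketched with a simple mean-shift test on the reward stream. This is an illustrative stand-in, not the specific change-point algorithm adapted in the paper; the window and threshold are assumed parameters.

```python
# Illustrative change-point detector on a stream of rewards: flag a
# change when the mean over a recent window drifts far from the mean of
# everything seen before it. (A stand-in for the paper's algorithm.)
def detect_change(stream, window, threshold):
    history = []
    for t, r in enumerate(stream):
        history.append(r)
        if len(history) >= 2 * window:
            recent = history[-window:]
            past = history[:-window]
            drift = abs(sum(recent) / len(recent) - sum(past) / len(past))
            if drift > threshold:
                return t            # time step at which a change is flagged
    return None

# Environment model shifts at t = 50: rewards jump from 0 to 1.
stream = [0.0] * 50 + [1.0] * 50
t = detect_change(stream, window=10, threshold=0.5)
```

Once a change is flagged, the RL agent can reset or re-weight its value estimates rather than keep averaging over two different environment models.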

A critical decision point when training predictors using multiple studies is whether these studies should be combined or treated separately. We compare two multi-study learning approaches in the presence of potential heterogeneity in predictor-outcome relationships across datasets. We consider 1) merging all of the datasets and training a single learner, and 2) multi-study ensembling, which involves training a separate learner on each dataset and combining the predictions resulting from each learner. In a linear regression setting, we show analytically and confirm via simulation that merging yields lower prediction error than ensembling when the predictor-outcome relationships are relatively homogeneous across studies. However, as cross-study heterogeneity increases, there exists a transition point beyond which ensembling outperforms merging. We provide analytic expressions for the transition point in various scenarios, study asymptotic properties, and illustrate how transition point theory can be used for deciding when studies should be combined with an application from metabolomics.
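A toy version of the merging-versus-ensembling comparison can be sketched for 1D regression through the origin with study-specific slopes. The paper's setting is more general; all names and parameter values below are illustrative assumptions.

```python
import random

# Toy merging vs. ensembling: each study s has slope beta_s = beta +
# heterogeneity noise; we compare the squared error of the two
# estimators of the population slope beta.
def fit_slope(xs, ys):
    # Least-squares slope for y = beta * x (no intercept).
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

def make_study(rng, beta, het, n=50):
    beta_s = beta + rng.gauss(0, het)            # cross-study heterogeneity
    xs = [rng.gauss(0, 1) for _ in range(n)]
    ys = [beta_s * x + rng.gauss(0, 0.1) for x in xs]
    return xs, ys

def compare(het, n_studies=20, seed=1):
    rng = random.Random(seed)
    studies = [make_study(rng, beta=1.0, het=het) for _ in range(n_studies)]
    merged = fit_slope([x for s in studies for x in s[0]],
                       [y for s in studies for y in s[1]])
    ensembled = sum(fit_slope(*s) for s in studies) / n_studies
    return (merged - 1.0) ** 2, (ensembled - 1.0) ** 2

homog = compare(het=0.0)    # homogeneous studies: both estimators accurate
heterog = compare(het=0.5)  # heterogeneity inflates both errors; the
                            # paper characterizes where their order flips
```

In the homogeneous case both estimators recover the common slope almost exactly; as `het` grows, the errors grow and the transition-point analysis in the paper tells you which side of the crossover you are on.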

Gradient-based distributed learning in Parameter Server (PS) computing architectures is subject to random delays due to straggling worker nodes, as well as to possible communication bottlenecks between PS and workers. Solutions have been recently proposed to separately address these impairments based on the ideas of gradient coding, worker grouping, and adaptive worker selection. This paper provides a unified analysis of these techniques in terms of wall-clock time, communication, and computation complexity measures. Furthermore, in order to combine the benefits of gradient coding and grouping in terms of robustness to stragglers with the communication and computation load gains of adaptive selection, novel strategies, named Lazily Aggregated Gradient Coding (LAGC) and Grouped-LAG (G-LAG), are introduced. Analysis and results show that G-LAG provides the best wall-clock time and communication performance, while maintaining a low computational cost, for two representative distributions of the computing times of the worker nodes.

We prove bounds on the generalization error of convolutional networks. The bounds are in terms of the training loss, the number of parameters, the Lipschitz constant of the loss and the distance from the weights to the initial weights. They are independent of the number of pixels in the input, and the height and width of hidden feature maps. We present experiments using CIFAR-10 with varying hyperparameters of a deep convolutional network, comparing our bounds with practical generalization gaps.

Conditional generative adversarial networks (cGANs) have been widely researched to generate class conditional images using a single generator. However, in the conventional cGANs techniques, it is still challenging for the generator to learn condition-specific features, since a standard convolutional layer with the same weights is used regardless of the condition. In this paper, we propose a novel convolution layer, called the conditional convolution layer, which directly generates different feature maps by employing the weights which are adjusted depending on the conditions. More specifically, in each conditional convolution layer, the weights are conditioned in a simple but effective way through filter-wise scaling and channel-wise shifting operations. In contrast to the conventional methods, the proposed method with a single generator can effectively handle condition-specific characteristics. The experimental results on CIFAR, LSUN and ImageNet datasets show that the generator with the proposed conditional convolution layer achieves a higher quality of conditional image generation than that with the standard convolution layer.

We study a constrained shortest path problem in group-labeled graphs with nonnegative edge length, called the shortest non-zero path problem. Depending on the group in question, this problem includes two types of tractable variants in undirected graphs: one is the parity-constrained shortest path/cycle problem, and the other is computing a shortest noncontractible cycle in surface-embedded graphs.

For the shortest non-zero path problem with respect to finite abelian groups, Kobayashi and Toyooka (2017) proposed a randomized, pseudopolynomial-time algorithm via permanent computation. For a slightly more general class of groups, Yamaguchi (2016) showed a reduction of the problem to the weighted linear matroid parity problem. In particular, some cases are solved in strongly polynomial time via the reduction with the aid of a deterministic, polynomial-time algorithm for the weighted linear matroid parity problem developed by Iwata and Kobayashi (2017), which generalizes a well-known fact that the parity-constrained shortest path problem is solved via weighted matching.

In this paper, as the first general solution independent of the group, we present a rather simple, deterministic, and strongly polynomial-time algorithm for the shortest non-zero path problem. The algorithm is based on Dijkstra's algorithm for the unconstrained shortest path problem and Edmonds' blossom shrinking technique in matching algorithms, and clarifies a common tractable feature behind the parity and topological constraints in the shortest path/cycle problem. Furthermore, we demonstrate a faster algorithm without explicit blossom shrinking together with a dual linear programming formulation of the equivalent problem like potential maximization for the unconstrained shortest path problem.
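For the simplest group $\mathbb{Z}_2$, the group-labeling idea can be sketched with Dijkstra's algorithm on two copies of the graph indexed by the accumulated group element. Note the caveat: this finds a shortest odd *walk*, whereas the algorithm described above handles non-zero simple paths over arbitrary groups, which additionally needs the blossom shrinking technique; the sketch only conveys the layering idea.

```python
import heapq

# Sketch for the group Z_2: shortest walk whose edge labels sum to 1
# (odd), via Dijkstra on state pairs (vertex, accumulated label).
def shortest_odd_walk(n, edges, s, t):
    adj = [[] for _ in range(n)]
    for u, v, w, label in edges:            # label in {0, 1} = Z_2
        adj[u].append((v, w, label))
        adj[v].append((u, w, label))
    INF = float("inf")
    dist = [[INF, INF] for _ in range(n)]   # dist[v][g]: best cost to reach
    dist[s][0] = 0                          # v with accumulated label g
    pq = [(0, s, 0)]
    while pq:
        d, v, g = heapq.heappop(pq)
        if d > dist[v][g]:
            continue
        for u, w, label in adj[v]:
            ng = g ^ label                  # group operation in Z_2
            if d + w < dist[u][ng]:
                dist[u][ng] = d + w
                heapq.heappush(pq, (d + w, u, ng))
    return dist[t][1]                       # non-zero (odd) constraint

# Triangle where the direct s-t edge is even but the detour is odd.
edges = [(0, 1, 1, 0), (1, 2, 1, 1), (0, 2, 1, 0)]
best = shortest_odd_walk(3, edges, 0, 2)
```

Here the direct even edge of length 1 is rejected and the odd detour of length 2 through vertex 1 is returned; guaranteeing the walk is a simple path is exactly where blossom shrinking enters.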

Panoptic segmentation requires segmenting both "things" (countable object instances) and "stuff" (uncountable and amorphous regions) within a single output. A common approach fuses instance segmentation (for "things") and semantic segmentation (for "stuff") into a non-overlapping placement of segments by resolving overlaps. However, ordering instances by detection confidence does not correlate well with natural occlusion relationships. To resolve this issue, we propose a branch that is tasked with modeling, as a binary relation, how two instance masks should overlap one another. Our method, named OCFusion, is lightweight but particularly effective in the instance fusion process. OCFusion is trained with ground truth relations derived automatically from existing dataset annotations. We obtain state-of-the-art results on COCO and show competitive results on the Cityscapes panoptic segmentation benchmark.
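The role of a learned pairwise occlusion relation in fusion can be sketched as follows. The hard-coded dict stands in for the predictions of the occlusion branch, and all names and data are illustrative.

```python
# Toy overlap resolution with a binary occlusion relation: when two
# instance masks claim the same pixel, the pixel goes to the instance
# predicted to be in front. (The dict stands in for learned predictions.)
def fuse(masks, occludes):
    # masks: {instance_id: set of pixels};
    # occludes[(a, b)] = True if a is predicted to appear in front of b.
    assignment = {}
    for inst, pixels in masks.items():
        for p in pixels:
            cur = assignment.get(p)
            if cur is None or occludes.get((inst, cur), False):
                assignment[p] = inst
    return assignment

masks = {"person": {(0, 0), (0, 1)}, "bench": {(0, 1), (0, 2)}}
occludes = {("person", "bench"): True}   # the person occludes the bench
out = fuse(masks, occludes)
```

Pixel (0, 1) is claimed by both masks and is assigned to the person because the relation says it is in front, regardless of which detection scored higher.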

We propose a novel method for unsupervised image-to-image translation, which incorporates a new attention module and a new learnable normalization function in an end-to-end manner. The attention module guides our model to focus on more important regions distinguishing between source and target domains based on the attention map obtained by the auxiliary classifier. Unlike previous attention-based methods, which cannot handle the geometric changes between domains, our model can translate both images requiring holistic changes and images requiring large shape changes. Moreover, our new AdaLIN (Adaptive Layer-Instance Normalization) function helps our attention-guided model to flexibly control the amount of change in shape and texture by learned parameters depending on datasets. Experimental results show the superiority of the proposed method compared to the existing state-of-the-art models with a fixed network architecture and hyper-parameters. Our code and datasets are available at https://github.com/taki0112/UGATIT or https://github.com/znxlwm/UGATIT-pytorch.
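The AdaLIN idea, blending instance-normalized and layer-normalized activations with a learnable ratio before the usual affine transform, can be sketched numerically. This is an illustrative re-implementation of the published formula, not the authors' code; `rho`, `gamma`, and `beta` are the learned parameters.

```python
# Numeric sketch of AdaLIN: out = gamma * (rho * IN(x) + (1-rho) * LN(x))
# + beta, where IN normalizes per channel and LN across all channels.
def mean_std(vals, eps=1e-5):
    m = sum(vals) / len(vals)
    var = sum((v - m) ** 2 for v in vals) / len(vals)
    return m, (var + eps) ** 0.5

def adalin(feature_map, rho, gamma, beta):
    # feature_map: list of channels, each a flat list of activations.
    flat = [v for ch in feature_map for v in ch]
    ln_m, ln_s = mean_std(flat)                      # layer-norm stats
    out = []
    for ch in feature_map:
        in_m, in_s = mean_std(ch)                    # instance-norm stats
        out.append([gamma * (rho * (v - in_m) / in_s
                             + (1 - rho) * (v - ln_m) / ln_s) + beta
                    for v in ch])
    return out

fmap = [[1.0, 3.0], [5.0, 7.0]]
pure_in = adalin(fmap, rho=1.0, gamma=1.0, beta=0.0)  # instance norm only
pure_ln = adalin(fmap, rho=0.0, gamma=1.0, beta=0.0)  # layer norm only
```

Learning `rho` lets each layer interpolate between the two endpoints, which is how the model trades off shape change (layer-norm-like) against texture preservation (instance-norm-like).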

Let $d\geq 1$ be an integer, and let $\mathcal{P}$ be the convex hull in $\mathbb{R}^s$ of all integral points $\mathbf{e}_{i_1}+\cdots+\mathbf{e}_{i_d}$ such that $1\leq i_1<\cdots< i_d\leq s$, where $\mathbf{e}_i$ is the $i$-th unit vector in $\mathbb{R}^s$. Given a finite field $\mathbb{F}_q$, we determine the minimum distance of the projective toric code of $\mathcal{P}$. Then, we show a formula, in terms of the degree and a fixed monomial order, to compute the $r$-th generalized Hamming weights of an affine Reed--Muller-type code, and give a lower bound for this number which is easier to compute. We determine the minimum distance and the 2nd generalized Hamming weight of a squarefree evaluation code over an affine torus.

In this paper, we propose "personal VAD", a system to detect the voice activity of a target speaker at the frame level. This system is useful for gating the inputs to a streaming on-device speech recognition system, such that it only triggers for the target user, which helps reduce the computational cost and battery consumption, especially in scenarios where a keyword detector is undesirable. We achieve this by training a VAD-alike neural network that is conditioned on the target speaker embedding or the speaker verification score. For each frame, personal VAD outputs the probabilities for three classes: non-speech, target speaker speech, and non-target speaker speech. Under our optimal setup, we are able to train a model with only 130K parameters that outperforms a baseline system where individually trained standard VAD and speaker recognition networks are combined to perform the same task.

In this study, we introduce a novel multi-task learning algorithm based on capsule networks to encode visual attributes towards image-based diagnosis. By learning visual attributes, our proposed capsule architecture, called X-Caps, is considered explainable: it models high-level visual attributes within the vectors of its capsules, then forms predictions based solely on these interpretable features. To accomplish this, we modify the dynamic routing algorithm to independently route information from child capsules to parents for the visual attribute vectors. To further increase the explainability of our method, we propose to train our network on a distribution of expert labels directly, rather than on the average of these labels as done in previous studies. At test time, this provides a meaningful metric of model confidence, punishing over/under-confidence, directly supervised by human experts' agreement, while visual attribute prediction scores are verified via a reconstruction branch of the network. To test and validate the proposed algorithm, we conduct experiments on a large dataset of over 1000 CT scans, where our proposed X-Caps, even though it is a relatively small 2D capsule network, outperforms the previous state-of-the-art deep dual-path dense 3D CNN in predicting visual attribute scores while also improving diagnostic accuracy. To the best of our knowledge, this is the first study to investigate capsule networks for making predictions based on radiologist-level interpretable attributes, and their application to medical image diagnosis.

In recent years, inspired by the large body of research on adversarial examples for computer vision, there has been growing interest in designing adversarial attacks for Natural Language Processing (NLP) tasks, followed by very few works on adversarial defenses for NLP. To our knowledge, there exists no defense method against the successful synonym-substitution-based attacks, which aim to satisfy all the lexical, grammatical, and semantic constraints and are thus hard for humans to perceive. We help fill this gap and propose a novel adversarial defense method called \textit{Synonym Encoding Method} (SEM), which inserts an encoder before the input layer of the model and then trains the model to eliminate adversarial perturbations. Extensive experiments demonstrate that SEM can efficiently defend against the current best synonym-substitution-based adversarial attacks with little decay in accuracy on benign examples. To better evaluate SEM, we also design a strong attack method called Improved Genetic Algorithm (IGA), which adopts the genetic metaheuristic for synonym-substitution-based attacks. Compared with the first genetic-based adversarial attack proposed in 2018, IGA achieves a higher attack success rate with a lower word substitution rate, while maintaining the transferability of adversarial examples.

For robots to exhibit a high level of intelligence in the real world, they must be able to assess objects for which they have no prior knowledge. Therefore, it is crucial for robots to perceive object affordances by reasoning about physical interactions with the object. In this paper, we propose a novel method to provide robots with an ability to imagine object affordances using physical simulations. The class of chair is chosen here as an initial category of objects to illustrate a more general paradigm. In our method, the robot "imagines" the affordance of an arbitrarily oriented object as a chair by simulating a physical sitting interaction between an articulated human body and the object. This object affordance reasoning is used as a cue for object classification (chair vs non-chair). Moreover, if an object is classified as a chair, the affordance reasoning can also predict the object's upright pose, which allows the sitting interaction to take place. We call such poses functional poses. We demonstrate our method in chair classification on synthetic 3D CAD models. Although our method uses only 30 models for training, it outperforms appearance-based deep learning methods, which require a large amount of training data, when the upright orientation is not assumed to be known a priori. In addition, we showcase that the functional pose predictions of our method align well with human judgments on both synthetic models and real objects scanned by a depth camera.

Molecular communication via diffusion (MCvD) is a method of achieving nano- and micro-scale connectivity by utilizing the free diffusion mechanism of information molecules. The randomness in diffusive propagation is the main cause of inter-symbol interference (ISI) and the limiting factor of high data rate MCvD applications. In this paper, an apertured plane is considered between the transmitter and the receiver of an MCvD link. Either after being artificially placed or occurring naturally, surfaces or volumes that resemble an apertured plane only allow a fraction of the molecules to pass. Contrary to intuition, it is observed that such a topology may improve communication performance, given that the molecules passing through the aperture are the ones that take more directed paths towards the receiver. Furthermore, through both computer simulations and a theoretical signal evaluation metric named signal-to-interference and noise amplitude ratio (SINAR), it is found that the size of the aperture imposes a trade-off between the received signal power and the ISI combating capability of an MCvD system, hinting at an optimal aperture size that minimizes the bit error rate (BER). It is observed that the trend of BER is accurately mirrored by SINAR, suggesting the proposed metric's applicability to optimization tasks in MCvD systems, including finding the optimal aperture size of an apertured plane. In addition, computer simulations and SINAR show that said optimal aperture size is affected by the location of the aperture and the bit rate. Lastly, the paper analyzes the effects of radial and angular offsets in the placement of the apertured plane, and finds that a reduction in BER is still in effect up to certain offset values. Overall, our results imply that apertured plane-like surfaces may actually help communication efficiency, even though they reduce the received signal power.

This paper presents a new neural architecture that combines a modulated Hebbian network (MOHN) with DQN, which we call modulated Hebbian plus Q network architecture (MOHQA). The hypothesis is that such a combination allows MOHQA to solve difficult partially observable Markov decision process (POMDP) problems which impair temporal difference (TD)-based RL algorithms such as DQN, as the TD error cannot be easily derived from observations. The key idea is to use a Hebbian network with bio-inspired neural traces in order to bridge temporal delays between actions and rewards when confounding observations and sparse rewards result in inaccurate TD errors. In MOHQA, DQN learns low level features and control, while the MOHN contributes to the high-level decisions by associating rewards with past states and actions. Thus the proposed architecture combines two modules with significantly different learning algorithms, a Hebbian associative network and a classical DQN pipeline, exploiting the advantages of both. Simulations on a set of POMDPs and on the MALMO environment show that the proposed algorithm improved DQN's results and even outperformed control tests with A2C, QRDQN+LSTM and REINFORCE algorithms on some POMDPs with confounding stimuli and sparse rewards.

We propose Dynamically Pruned Message Passing Networks (DPMPN) for large-scale knowledge graph reasoning. In contrast to existing models, embedding-based or path-based, we learn an input-dependent subgraph to explicitly model the reasoning process. Subgraphs are dynamically constructed and expanded by applying a graphical attention mechanism conditioned on input queries. In this way, we not only construct graph-structured explanations but also enable the message passing designed in Graph Neural Networks (GNNs) to scale with graph sizes. We take inspiration from the consciousness prior and develop a two-GNN framework to simultaneously encode an input-agnostic full-graph representation and learn an input-dependent local one, coordinated by an attention module. Experiments demonstrate the reasoning capability of our model, which provides clear graphical explanations as well as accurate predictions, outperforming most state-of-the-art methods in knowledge base completion tasks.

Existing graph neural network-based models have predominantly used a supervised training setting for graph classification, and they often share conventional limitations in exploiting potential dependencies among nodes. To address this, we present U2GNN, a novel embedding model leveraging the strength of the transformer self-attention network, to learn low-dimensional embeddings of graphs. In particular, given an input graph, U2GNN applies a self-attention mechanism followed by a recurrent transition to update the vector representation of each node from its neighbors. Thus, U2GNN can address the limitations of existing models to produce plausible node embeddings whose sum is the final embedding of the whole graph. Experimental results in both supervised and unsupervised training settings show that our U2GNN achieves new state-of-the-art performances on a range of well-known benchmark datasets for the graph classification task. To the best of our knowledge, this is the first work showing that an unsupervised model performs better than supervised models by a large margin.

While deep learning-based classification is generally tackled using standardized approaches, a wide variety of techniques are employed for regression. In computer vision, one particularly popular such technique is that of confidence-based regression, which entails predicting a confidence value for each input-target pair (x,y). While this approach has demonstrated impressive results, it requires important task-dependent design choices, and the predicted confidences lack a natural probabilistic meaning. We address these issues by proposing a general and conceptually simple regression method with a clear probabilistic interpretation. In our proposed approach, we create an energy-based model of the conditional target density p(y|x), using a deep neural network to predict the un-normalized density from (x,y). This model of p(y|x) is trained by directly minimizing the associated negative log-likelihood, approximated using Monte Carlo sampling. We perform comprehensive experiments on four computer vision regression tasks. Our approach outperforms direct regression, as well as other probabilistic and confidence-based methods. Notably, our model achieves a 2.2% AP improvement over Faster-RCNN for object detection on the COCO dataset, and sets a new state-of-the-art on visual tracking when applied for bounding box estimation. In contrast to confidence-based methods, our approach is also shown to be directly applicable to more general tasks such as age and head-pose estimation.
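
The training objective can be made concrete with a toy example. For one pair $(x,y)$, the negative log-likelihood is $-f(x,y) + \log Z(x)$, where $Z(x)$ is the integral of $\exp f(x,y)$ over $y$, approximated by importance sampling. Below, a hypothetical quadratic $f$ stands in for the deep network; its true partition function is $\sqrt{\pi}$, so the Monte Carlo estimate can be sanity-checked:

```python
import math, random

def unnorm_logdens(x, y):
    # Toy stand-in for the deep network f_theta(x, y): an unnormalized
    # Gaussian log-density centred at 2*x (hypothetical, not the paper's net).
    return -(y - 2.0 * x) ** 2

def mc_nll(x, y_true, f, n_samples=20000, half_width=5.0, rng=None):
    """Negative log-likelihood -log p(y|x), with the partition function
    Z(x) estimated by importance sampling from a uniform proposal on
    [2x - half_width, 2x + half_width]."""
    rng = rng or random.Random(0)
    centre = 2.0 * x
    q = 1.0 / (2.0 * half_width)          # uniform proposal density
    total = 0.0
    for _ in range(n_samples):
        y = rng.uniform(centre - half_width, centre + half_width)
        total += math.exp(f(x, y)) / q    # importance-weighted sample
    z_hat = total / n_samples             # Monte Carlo estimate of Z(x)
    return -f(x, y_true) + math.log(z_hat), z_hat
```

With `y_true` at the mode, the NLL reduces to $\log Z(x) \approx \log\sqrt{\pi} \approx 0.57$.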

We consider the problem of learning low-dimensional representations for large-scale Markov chains. We formulate the task of representation learning as that of mapping the state space of the model to a low-dimensional state space, called the kernel space. The kernel space contains a set of meta states which are desired to be representative of only a small subset of original states. To promote this structural property, we constrain the number of nonzero entries of the mappings between the state space and the kernel space. By imposing the desired characteristics of the representation, we cast the problem as a constrained nonnegative matrix factorization. To compute the solution, we propose an efficient block coordinate gradient descent and theoretically analyze its convergence properties.

We present a number of contributions to bridging the gap between supervisory control theory and coordination of services in order to explore the frontiers between coordination and control systems. Firstly, we modify the classical synthesis algorithm from supervisory control theory for obtaining the so-called most permissive controller in order to synthesise orchestrations and choreographies of service contracts formalised as contract automata. The key ingredient to make this possible is a novel notion of controllability. Then, we present an abstract parametric synthesis algorithm and show that it generalises the classical synthesis as well as the orchestration and choreography syntheses. Finally, through the novel abstract synthesis, we show that the concrete syntheses are in a refinement order. A running example from the service domain illustrates our contributions.

Intelligent detection and processing capabilities can be instrumental in improving the safety, efficiency, and successful completion of rescue missions conducted by firefighters in emergency first response settings. The objective of this research is to create an automated system that is capable of real-time, intelligent object detection and recognition and facilitates improved situational awareness for firefighters during an emergency response. We have explored state-of-the-art machine/deep learning techniques to achieve this objective. The goal of this work is to enhance the situational awareness of firefighters by effectively exploiting the information gathered from infrared cameras they carry. To accomplish this, we use a trained deep Convolutional Neural Network (CNN) system to classify and identify objects of interest from thermal imagery in real time. Amid the critical circumstances created by a structure fire, this system is able to accurately inform the decision-making process of firefighters with up-to-date scene information by extracting, processing, and analyzing crucial information in real time. With the new information produced by the framework, firefighters are able to make more informed inferences about the circumstances for their safe navigation through such hazardous and potentially catastrophic environments.

Graph-based semi-supervised learning has been shown to be one of the most effective approaches for classification tasks from a wide range of domains, such as image classification and text classification, as they can exploit the connectivity patterns between labeled and unlabeled samples to improve learning performance. In this work, we advance this effective learning paradigm towards a scenario where labeled data are severely limited. More specifically, we address the problem of graph-based semi-supervised learning in the presence of severely limited labeled samples, and propose a new framework, called {\em Shoestring}, that improves the learning performance through semantic transfer from these very few labeled samples to large numbers of unlabeled samples.

In particular, our framework learns a metric space in which classification can be performed by computing the similarity to centroid embedding of each class. {\em Shoestring} is trained in an end-to-end fashion to learn to leverage the semantic knowledge of limited labeled samples as well as their connectivity patterns with large numbers of unlabeled samples simultaneously. By combining {\em Shoestring} with graph convolutional networks, label propagation and their recent label-efficient variations (IGCN and GLP), we are able to achieve state-of-the-art node classification performance in the presence of very few labeled samples. In addition, we demonstrate the effectiveness of our framework on image classification tasks in the few-shot learning regime, with significant gains on miniImageNet ($2.57\%\sim3.59\%$) and tieredImageNet ($1.05\%\sim2.70\%$).
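
The centroid-based decision rule described above is easy to sketch: embed each class's labeled samples, average them into a centroid, and assign a query embedding to the most similar centroid. This is a generic nearest-centroid rule under cosine similarity; the function names are ours, not Shoestring's API:

```python
import math

def centroid(vectors):
    # Mean embedding of the labeled samples of one class.
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

def cosine(u, v):
    # Cosine similarity between two embeddings.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def classify(embedding, centroids):
    # Predict the class whose centroid is most similar to the embedding.
    return max(centroids, key=lambda c: cosine(embedding, centroids[c]))
```

In Shoestring the metric space itself is learned end-to-end; this sketch only shows the inference-time rule applied to fixed embeddings.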

This paper shows that pretraining multilingual language models at scale leads to significant performance gains for a wide range of cross-lingual transfer tasks. We train a Transformer-based masked language model on one hundred languages, using more than two terabytes of filtered CommonCrawl data. Our model, dubbed XLM-R, significantly outperforms multilingual BERT (mBERT) on a variety of cross-lingual benchmarks, including +14.6% average accuracy on XNLI, +13% average F1 score on MLQA, and +2.4% F1 score on NER. XLM-R performs particularly well on low-resource languages, improving 15.7% in XNLI accuracy for Swahili and 11.4% for Urdu over previous XLM models. We also present a detailed empirical analysis of the key factors that are required to achieve these gains, including the trade-offs between (1) positive transfer and capacity dilution and (2) the performance of high and low resource languages at scale. Finally, we show, for the first time, the possibility of multilingual modeling without sacrificing per-language performance; XLM-R is very competitive with strong monolingual models on the GLUE and XNLI benchmarks. We will make our code, data and models publicly available.

Most of the existing pre-trained language representation models neglect to consider the linguistic knowledge of texts, whereas we argue that such knowledge can promote language understanding in various NLP tasks. To benefit the downstream tasks in sentiment analysis, we propose a novel language representation model called SentiLR, which introduces word-level linguistic knowledge including part-of-speech tag and prior sentiment polarity from SentiWordNet. During pre-training, we first acquire the prior sentiment polarity of each word by querying the SentiWordNet dictionary with its part-of-speech tag. Then, we devise a new pre-training task called label-aware masked language model consisting of two sub-tasks: 1) word knowledge recovering given the sentence-level label; 2) sentence-level label prediction with linguistic knowledge enhanced context. Experiments show that SentiLR achieves state-of-the-art performances on sentence-level / aspect-level sentiment analysis and sentiment-aware data augmentation.

We introduce S2ORC, a large contextual citation graph of English-language academic papers from multiple scientific domains; the corpus consists of 81.1M papers, 380.5M citation edges, and associated paper metadata. We provide structured full text for 8.1M open access papers. All inline citation mentions in the full text are detected and linked to their corresponding bibliography entries, which are linked to their referenced papers, forming contextual citation edges. To our knowledge, this is the largest publicly-available contextual citation graph. The full text alone is the largest structured academic text corpus to date. We release S2ORC to facilitate research and development of tools and tasks for the analysis of scientific text.

Cross-lingual word embeddings transfer knowledge between languages: models trained for a high-resource language can be used in a low-resource language. These embeddings are usually trained on general-purpose corpora but used for a domain-specific task. We introduce CLIME, an interactive system that allows a user to quickly refine cross-lingual word embeddings for a given classification problem. First, CLIME ranks words in the vocabulary by their salience to the downstream task. Then, salient keywords are displayed on an interface for users to mark the similarity between each keyword and its nearest neighbors in the embedding space. Finally, CLIME updates the embeddings using the annotations. We evaluate CLIME on a cross-lingual text classification benchmark for four low-resource languages: Ilocano, Sinhalese, Tigrinya, and Uyghur. Embeddings refined by CLIME capture more nuanced word semantics and have higher test accuracy than the original embeddings. CLIME also improves test accuracy faster than an active learning baseline and can be easily combined with active learning to improve results.

Motivated by the problem of spam classification, we study online learning in strategic classification settings from the perspective of the learner, who repeatedly faces myopically rational strategic agents. We model this interplay as a repeated Stackelberg game, where at each timestep nature selects a feature vector observed only by the agent, the learner deploys a high-dimensional linear classifier, and the agent, after observing the classifier and according to his underlying utility function, best responds with a potentially altered feature vector. The performance of the learner is measured in terms of Stackelberg regret for her 0--1 loss function. In this game-theoretic setting, applying standard online learning algorithms that minimize external regret is not helpful; we prove that Stackelberg and external regret are strongly incompatible, i.e., there exist worst-case scenarios where any sequence of actions providing sublinear external regret might result in linear Stackelberg regret and vice versa. To address this, we introduce GRINDER, an adaptive discretization algorithm potentially of independent interest, and prove a data-dependent upper bound on its Stackelberg regret given oracle access, while keeping it computationally efficient. We also prove a nearly matching lower bound for the Stackelberg regret of online strategic classification against myopically rational agents. We complement our theoretical analysis with simulation results, which suggest that our algorithm outperforms the benchmarks, even given access only to approximation oracles. Our results advance the known state-of-the-art in the growing literature of learning from revealed preferences, which has so far focused on "smoother" utility and loss functions from the perspectives of the agents and the learner, respectively.

Predicting an agent's future trajectory is a challenging task given the complicated stimuli (environmental/inertial/social) of motion. Prior works learn individual stimuli in different modules and fuse the representations in an end-to-end manner, which makes it hard to understand what is actually captured and how it is fused. In this work, we borrow the notion of a potential field from physics as an interpretable and unified representation to model all stimuli. This allows us not only to supervise the intermediate learning process, but also to fuse the information of different sources in a coherent manner. From the generated potential fields, we further estimate future motion direction and speed, which are modeled as Gaussian distributions to account for the multi-modal nature of the problem. The final prediction results are generated by recurrently moving the past location based on the estimated motion direction and speed. We show state-of-the-art results on the ETH, UCY, and Stanford Drone datasets.

Function inversion is the problem where, given a random function $f: [M] \to [N]$, we want to find a pre-image of any image $y$, i.e., an element of $f^{-1}(y)$, in time $T$. In this work, we revisit this problem under the preprocessing model, where we can compute some auxiliary information or advice of size $S$ that depends only on $f$ but not on $y$. This problem is well studied in the classical setting; however, it is not clear how quantum algorithms can solve this task any better besides invoking Grover's algorithm, which does not leverage the power of preprocessing.

Nayebi et al. proved a lower bound $ST^2 \ge \tilde\Omega(N)$ for quantum algorithms inverting permutations; however, they only consider algorithms with classical advice. Hhan et al. subsequently extended this lower bound to fully quantum algorithms for inverting permutations. In this work, we establish the same asymptotic lower bound for fully quantum algorithms inverting functions, in the regime where $M = O(N)$.

In order to prove these bounds, we generalize the notion of quantum random access code, originally introduced by Ambainis et al., to the setting where we are given a list of (not necessarily independent) random variables, and we wish to compress them into a variable-length encoding such that we can retrieve a random element just using the encoding with high probability. As our main technical contribution, we give a nearly tight lower bound (for a wide parameter range) for this generalized notion of quantum random access codes, which may be of independent interest.

Object detection has recently experienced substantial progress. Yet, the widely adopted horizontal bounding box representation is not appropriate for ubiquitous oriented objects such as objects in aerial images and scene texts. In this paper, we propose a simple yet effective framework to detect multi-oriented objects. Instead of directly regressing the four vertices, we glide each vertex of the horizontal bounding box along its corresponding side to accurately describe a multi-oriented object. Specifically, we regress four length ratios characterizing the relative gliding offset on each corresponding side. This may facilitate the offset learning and avoid the confusion issue of sequential label points for oriented objects. To further remedy the confusion issue for nearly horizontal objects, we also introduce an obliquity factor based on the area ratio between the object and its horizontal bounding box, guiding the selection of horizontal or oriented detection for each object. We add these five extra target variables to the regression head of Faster R-CNN, which requires negligible extra computation time. Extensive experimental results demonstrate that, without bells and whistles, the proposed method achieves superior performance on multiple multi-oriented object detection benchmarks, including object detection in aerial images, scene text detection, and pedestrian detection in fisheye images.
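
The representation is simple to reconstruct: each of the four ratios slides one vertex of the horizontal box along its side, and the obliquity factor is the area ratio of the resulting quadrilateral to the box. A minimal geometric sketch (variable names are ours):

```python
def glide_vertices(hbox, ratios):
    """Recover an oriented quadrilateral from a horizontal box (x1, y1, x2, y2)
    and four gliding ratios, one per side (a sketch of the representation;
    names are ours)."""
    x1, y1, x2, y2 = hbox
    w, h = x2 - x1, y2 - y1
    a1, a2, a3, a4 = ratios
    return [(x1 + a1 * w, y1),   # glides right along the top side
            (x2, y1 + a2 * h),   # glides down along the right side
            (x2 - a3 * w, y2),   # glides left along the bottom side
            (x1, y2 - a4 * h)]   # glides up along the left side

def obliquity(hbox, ratios):
    # Area ratio between the oriented object and its horizontal box,
    # used to choose horizontal vs. oriented output for each object.
    verts = glide_vertices(hbox, ratios)
    shoelace = 0.0
    for (px, py), (qx, qy) in zip(verts, verts[1:] + verts[:1]):
        shoelace += px * qy - qx * py
    x1, y1, x2, y2 = hbox
    return abs(shoelace) / 2.0 / ((x2 - x1) * (y2 - y1))
```

Zero ratios recover the horizontal box itself (obliquity 1), while ratios of 0.5 on a square produce a 45-degree diamond with obliquity 0.5.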

Deep convolutional neural networks are hindered by training instability and feature redundancy, which limit further performance improvement. A promising solution is to impose orthogonality on convolutional filters.

We develop an efficient approach to impose filter orthogonality on a convolutional layer based on the doubly block-Toeplitz matrix representation of the convolutional kernel instead of using the common kernel orthogonality approach, which we show is only necessary but not sufficient for ensuring orthogonal convolutions.
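
The gap between the two notions can be seen already in the simplest 1-D, single-channel case, where the doubly block-Toeplitz matrix reduces to an ordinary Toeplitz matrix whose rows are shifted copies of the kernel. Rows are mutually orthogonal only if the kernel's autocorrelation vanishes at every nonzero shift, which unit norm (kernel orthogonality) alone does not guarantee; this is an illustration of the claim, not the paper's construction:

```python
def autocorr(k, shift):
    # Inner product between a convolution row and its copy shifted by `shift`;
    # Toeplitz rows are orthogonal iff this vanishes for every nonzero shift.
    return sum(k[i] * k[i + shift] for i in range(len(k) - shift))

# A 1-D, single-channel kernel with unit norm: "kernel orthogonality" holds
# (the reshaped 1-row kernel matrix K satisfies K K^T = I) ...
k = [0.6, 0.8, 0.0]
assert abs(autocorr(k, 0) - 1.0) < 1e-12   # unit norm

# ... yet the convolution it induces is NOT orthogonal: adjacent Toeplitz
# rows overlap with inner product 0.48, so kernel orthogonality is
# necessary but not sufficient for an orthogonal convolution.
assert abs(autocorr(k, 1) - 0.48) < 1e-12
```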

Our proposed orthogonal convolution requires no additional parameters and little computational overhead. This method consistently outperforms the kernel orthogonality alternative on a wide range of tasks such as image classification and inpainting under supervised, semi-supervised and unsupervised settings. Further, it learns more diverse and expressive features with better training stability, robustness, and generalization. Our code is publicly available at https://github.com/samaonline/Orthogonal-Convolutional-Neural-Networks.

Deep learning-based models have been very successful in achieving state-of-the-art results in many of the computer vision, speech recognition, and natural language processing tasks in the last few years. These models seem a natural fit for handling the ever-increasing scale of biometric recognition problems, from cellphone authentication to airport security systems. Deep learning-based models have increasingly been leveraged to improve the accuracy of different biometric recognition systems in recent years. In this work, we provide a comprehensive survey of more than 120 promising works on biometric recognition (including face, fingerprint, iris, palmprint, ear, voice, signature, and gait recognition), which deploy deep learning models, and show their strengths and potentials in different applications. For each biometric, we first introduce the available datasets that are widely used in the literature and their characteristics. We will then talk about several promising deep learning works developed for that biometric, and show their performance on popular public benchmarks. We will also discuss some of the main challenges while using these models for biometric recognition, and possible future directions to which research in this area is headed.

Neural Architecture Search (NAS), which aims to automate the procedure of architecture design, has achieved promising results in many computer vision fields. In this paper, we propose an AdversarialNAS method specially tailored for Generative Adversarial Networks (GANs) to search for a superior generative model on the task of unconditional image generation. AdversarialNAS is the first method that can search the architectures of the generator and discriminator simultaneously in a differentiable manner. During searching, the designed adversarial search algorithm does not need to compute any extra metric to evaluate the performance of the searched architecture, and the search paradigm considers the relevance between the two network architectures and improves their mutual balance. Therefore, AdversarialNAS is very efficient and only takes 1 GPU day to search for a superior generative model in the proposed large search space (of size $10^{38}$). Experiments demonstrate the effectiveness and superiority of our method. The discovered generative model sets a new state-of-the-art FID score of $10.87$ and a highly competitive Inception Score of $8.74$ on CIFAR-10. Its transferability is also proven by setting a new state-of-the-art FID score of $26.98$ and Inception Score of $9.63$ on STL-10. Code is at: \url{https://github.com/chengaopro/AdversarialNAS}.

Proof-of-Stake consensus protocols give rise to complex modeling challenges. We analyze the recently-updated Tezos Proof-of-Stake protocol and demonstrate that, under certain conditions, rational participants are incentivized to behave dishonestly. In doing so, we provide a theoretical analysis of the feasibility and profitability of a block stealing attack that we call selfish endorsing, a concrete instance of an attack previously only theoretically considered. We propose and analyze a simple change to the Tezos protocol which significantly reduces the (already small) profitability of this dishonest behavior, and introduce a new delay and reward scheme that is provably secure against length-1 and length-2 selfish endorsing attacks. Our framework provides a template for analyzing other Proof-of-Stake implementations for selfish behavior.

We investigate the problem of simultaneous machine translation of long-form speech content. We target a continuous speech-to-text scenario, generating translated captions for a live audio feed, such as a lecture or play-by-play commentary. As this scenario allows for revisions to our incremental translations, we adopt a re-translation approach to simultaneous translation, where the source is repeatedly translated from scratch as it grows. This approach naturally exhibits very low latency and high final quality, but at the cost of incremental instability as the output is continuously refined. We experiment with a pipeline of industry-grade speech recognition and translation tools, augmented with simple inference heuristics to improve stability. We use TED Talks as a source of multilingual test data, developing our techniques on English-to-German spoken language translation. Our minimalist approach to simultaneous translation allows us to easily scale our final evaluation to six more target languages, dramatically improving incremental stability for all of them.

Tuning a data miner for software analytics is something of a black art. Recent research has shown that some of that tuning can be achieved via automatic tools called "hyperparameter optimizers". Much of that research has used tools developed outside of SE. Hence, here, we ask how and when we can exploit the special properties of SE data to build faster and better optimizers.

Specifically, we apply hyperparameter optimization to 120 data sets addressing problems like bad smell detection, predicting GitHub issue close time, bug report analysis, defect prediction, and dozens of other non-SE problems. To these, we applied a tool developed using SE data which (a) out-performs the state-of-the-art for these SE problems yet (b) fails badly on non-SE problems. From this experience, we can infer a simple rule for when to use or avoid different kinds of optimizers. SE data is often about infrequent issues, like the occasional defect, the rarely exploited security violation, or the requirement that holds for one special case; as we show, the same does not hold for non-SE data.

Our conclusion is that we can exploit these special properties of SE data to great effect; specifically, to find better optimizations for SE tasks via a tactic called "dodging" (explained in this paper).

The goal of this paper is to design a simplex algorithm for linear programs on lattice polytopes that traces short simplex paths from any given vertex to an optimal one. We consider a lattice polytope $P$ contained in $[0,k]^n$ and defined via $m$ linear inequalities. Our first contribution is a simplex algorithm that reaches an optimal vertex by tracing a path along the edges of $P$ of length in $O(n^4 k\log(nk))$. The length of this path is independent of $m$ and is the best possible up to a polynomial function. In fact, it is only polynomially far from the worst-case diameter, which roughly grows as a linear function in $n$ and $k$.

Motivated by the fact that most known lattice polytopes are defined via $0,\pm 1$ constraint matrices, our second contribution is an iterative algorithm which exploits the largest absolute value $\alpha$ of the entries in the constraint matrix. We show that the length of the simplex path generated by the iterative algorithm is in $O(n^2k \log(nk\alpha))$. In particular, if $\alpha$ is bounded by a polynomial in $n, k$, then the length of the simplex path is in $O(n^2k \log(nk))$.

For both algorithms, the number of arithmetic operations needed to compute the next vertex in the path is polynomial in $n$, $m$ and $\log k$. If $k$ is polynomially bounded by $n$ and $m$, the algorithm runs in strongly polynomial time.

We investigate applying convolutional neural network (CNN) architectures to facilitate aerial hyperspectral scene understanding and present a new hyperspectral dataset, AeroRIT, that is large enough for CNN training. To date, the majority of airborne hyperspectral datasets have been confined to various sub-categories of vegetation and roads; this scene introduces two new categories: buildings and cars. To the best of our knowledge, this is the first comprehensive large-scale hyperspectral scene with nearly seven million pixel annotations for identifying cars, roads, and buildings. We compare the performance of three popular architectures -- SegNet, U-Net, and Res-U-Net -- for scene understanding and object identification via the task of dense semantic segmentation to establish a benchmark for the scene. To further strengthen the network, we add squeeze-and-excitation blocks for better channel interactions and use self-supervised learning for better encoder initialization. Aerial hyperspectral image analysis has so far been restricted to small datasets with limited train/test split capabilities, and we believe that AeroRIT, with its more complex object distribution, will help advance research in the field. The full dataset, with flight lines in the radiance and reflectance domains, is available for download at https://github.com/aneesh3108/AeroRIT. This dataset is a first step towards developing robust algorithms for hyperspectral airborne sensing that can perform advanced tasks like vehicle tracking and occlusion handling.

Savitzky-Golay (SG) filters are finite impulse response (FIR) realizations of least-squares polynomial regression and they are widely used for filtering (e.g. smoothing, interpolating, predicting, differentiating) and processing (e.g. detecting and classifying) non-stationary signals in non-Gaussian noise. For such inputs, the Wiener filter is biased and the Kalman filter is sub-optimal. Sequentially-correlated (i.e. colored) noise models are an integral part of the Wiener filter and an optional addition to the Kalman filter; however, their use in SG-filters has been overlooked in recent times. It is shown here that colored (wide-band and narrow-band) noise models are readily incorporated into a standard SG-filter and that this also addresses the well-known deficiency of their poor frequency-selectivity/configurability. A wide-band noise model sets the main-lobe/side-lobe width/height and provides physical justification for band-limited design procedures described elsewhere. The proposed narrow-band noise model, with arbitrarily placed side-lobe nulls, has the potential to outperform other SG filters when sinusoidal interferers of known frequency are present. The utility of these whitened SG-filters is illustrated in a hypothetical pulse/peak-detection application using a test statistic that is shaped by the noise model.
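
The least-squares origin of SG filters is easy to reproduce: the smoothing weights for the window centre are the first row of $(A^T A)^{-1} A^T$, where $A$ is the polynomial design matrix over the window. A minimal sketch with plain normal equations (no colored-noise model, which is the paper's extension) recovers the classic 5-point quadratic weights $[-3, 12, 17, 12, -3]/35$:

```python
def sg_weights(half, degree):
    """Savitzky-Golay smoothing weights for the window centre: the first row
    of (A^T A)^{-1} A^T, where A[j] = [1, t_j, t_j^2, ...] over the window."""
    ts = list(range(-half, half + 1))
    A = [[t ** p for p in range(degree + 1)] for t in ts]
    n = degree + 1
    # Normal-equation matrix M = A^T A; solving M z = e0 gives the first
    # row (= column, by symmetry) of (A^T A)^{-1}.
    M = [[sum(A[j][r] * A[j][c] for j in range(len(ts))) for c in range(n)]
         for r in range(n)]
    z = _solve(M, [1.0] + [0.0] * (n - 1))
    # w_j = z . A[j]: evaluate the fitted polynomial's constant term.
    return [sum(z[p] * A[j][p] for p in range(n)) for j in range(len(ts))]

def _solve(M, b):
    # Plain Gauss-Jordan elimination with partial pivoting.
    n = len(b)
    M = [row[:] + [bv] for row, bv in zip(M, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * c for a, c in zip(M[r], M[col])]
    return [M[r][n] / M[r][r] for r in range(n)]
```

The weights sum to 1 (constants pass through unchanged); the paper's whitened variants replace the implicit identity noise covariance in these normal equations with a colored-noise model.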

We present an end-to-end joint training framework that explicitly models 6-DoF motion of multiple dynamic objects, ego-motion and depth in a monocular camera setup without supervision. Our technical contributions are three-fold. First, we propose a differentiable forward rigid projection module that plays a key role in our instance-wise depth and motion learning. Second, we design an instance-wise photometric and geometric consistency loss that effectively decomposes background and moving object regions. Lastly, we introduce a new auto-annotation scheme to produce video instance segmentation maps that will be utilized as input to our training pipeline. These proposed elements are validated in a detailed ablation study. Through extensive experiments conducted on the KITTI dataset, our framework is shown to outperform the state-of-the-art depth and motion estimation methods. Our code and dataset will be available at https://github.com/SeokjuLee/Insta-DM.
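
The forward rigid projection underlying such pipelines can be sketched for a single point: apply the rigid motion $X' = RX + t$, then project through a pinhole camera with intrinsics $(f_x, f_y, c_x, c_y)$. This is a minimal point-wise sketch; the paper's module is differentiable and operates on dense depth maps:

```python
def project(point, rot, trans, fx, fy, cx, cy):
    """Rigidly move a 3-D point with (rot, trans), then project it with a
    pinhole camera (a point-wise sketch; the paper's module is
    differentiable and instance-wise over dense depth maps)."""
    # Rigid motion X' = R X + t.
    xr = sum(r * p for r, p in zip(rot[0], point)) + trans[0]
    yr = sum(r * p for r, p in zip(rot[1], point)) + trans[1]
    zr = sum(r * p for r, p in zip(rot[2], point)) + trans[2]
    # Pinhole projection onto the image plane.
    return fx * xr / zr + cx, fy * yr / zr + cy
```

A point on the optical axis projects to the principal point $(c_x, c_y)$; any object or camera motion shifts the projection, which is the signal a photometric consistency loss exploits.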

Wind farm control using dynamic concepts is a research topic that is receiving an increasing amount of interest. The main concept of this approach is that dynamic variations of the wind turbine control settings lead to higher wake turbulence, and subsequently faster wake recovery due to increased mixing. As a result, downstream turbines experience higher wind speeds, thus increasing their energy capture. The current state of the art in dynamic wind farm control is to vary the magnitude of the thrust force of an upstream turbine. Although very effective, this approach also leads to increased power and thrust variations, negatively impacting energy quality and fatigue loading. In this paper, a novel approach for the dynamic control of wind turbines in a wind farm is proposed: using individual pitch control, the fixed-frame tilt and yaw moments on the turbine are varied, thus dynamically manipulating the wake. This strategy is named the helix approach since the resulting wake has a helical shape. Large eddy simulations of a two-turbine wind farm show that the helix approach leads to enhanced wake mixing with minimal power and thrust variations.

The rapid proliferation of shared edge computing platforms has enabled application service providers to deploy a wide variety of services with stringent latency and high bandwidth requirements. A key advantage of these platforms is that they provide pay-as-you-go flexibility by charging clients in proportion to their resource usage through short-term contracts. This affords the client significant cost-saving opportunities, by dynamically deciding when to host (cache) its service on the platform, depending on the changing intensity of requests. A natural caching policy for our setting is the Time-To-Live (TTL) policy. We show that TTL performs poorly both in the adversarial arrival setting, i.e., in terms of the competitive ratio, and for i.i.d. stochastic arrivals with low arrival rates, irrespective of the value of the TTL timer. We propose an online caching policy called RetroRenting (RR) and show that in the class of deterministic online policies, RR is order-optimal with respect to the competitive ratio. In addition, we provide performance guarantees for RR for i.i.d. stochastic arrival processes and prove that it compares well with the optimal online policy. Further, we conduct simulations using both synthetic and real world traces to compare the performance of RR and its variants with the optimal offline and online policies. The simulations show that the performance of RR is near optimal for all settings considered. Our results illustrate the universality of RR.
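
The TTL policy itself is simple to state: a miss fetches the service and caches it for a fixed timer, and every hit resets the timer. A toy cost simulator makes the accounting concrete; the fetch and rent parameters below are illustrative, not the paper's model:

```python
def ttl_cost(arrivals, ttl, fetch_cost=1.0, rent_rate=0.1):
    """Cost of a Time-To-Live policy on a sorted list of request times.
    A miss pays fetch_cost and rents storage for `ttl` time units; a hit
    extends the lease to t + ttl and pays rent only for the extension.
    (Illustrative cost model, not the paper's.)"""
    cost, cached_until = 0.0, float("-inf")
    for t in arrivals:
        if t <= cached_until:
            # Hit: pay rent for extending the lease from cached_until to t + ttl.
            cost += rent_rate * (t + ttl - cached_until)
        else:
            # Miss: fetch the service and rent storage for a full TTL window.
            cost += fetch_cost + rent_rate * ttl
        cached_until = t + ttl
    return cost
```

A TTL of zero degenerates to always serving misses, while a long TTL trades fetch cost for rent; neither fixed timer is robust across arrival patterns, which is what motivates a retrospective policy like RR.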

Pattern matching in time series data streams is considered an essential data mining problem that remains challenging for many practical scenarios. Different factors such as noise, varying amplitude scale or shift, and signal stretches or shrinks in time all lead to performance degradation of many existing pattern matching algorithms. In this paper, we introduce a dynamic z-normalization mechanism allowing for proper signal scaling even under significant time and amplitude distortions. Based on that, we further propose a Dynamic Time Warping-based real-time pattern matching method to recover hidden patterns that can be distorted in both time and amplitude. We evaluate our proposed method on synthetic and real-world scenarios under realistic conditions, demonstrating its strong operational characteristics compared to other state-of-the-art pattern matching methods.
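
The two ingredients combine naturally: z-normalize both sequences so that amplitude scale and shift cancel, then compare them with dynamic-programming DTW so that local stretches in time are tolerated. The sketch below is the textbook static version, not the paper's dynamic mechanism:

```python
import math

def znorm(seq):
    # Z-normalize: zero mean, unit variance, so amplitude scale and shift
    # do not dominate the distance.
    mu = sum(seq) / len(seq)
    sd = math.sqrt(sum((v - mu) ** 2 for v in seq) / len(seq)) or 1.0
    return [(v - mu) / sd for v in seq]

def dtw(a, b):
    # Classic O(len(a) * len(b)) dynamic-programming DTW distance,
    # tolerant to local stretches and shrinks in time.
    INF = float("inf")
    D = [[INF] * (len(b) + 1) for _ in range(len(a) + 1)]
    D[0][0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[len(a)][len(b)]
```

After z-normalization, a pattern and its scaled/shifted copy (e.g. `[0, 1, 2, 1, 0]` versus `[10, 12, 14, 12, 10]`) become identical, so their DTW distance is essentially zero.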

The goal of this work is to build a classifier that can identify text complexity within the context of teaching reading to English as a Second Language (ESL) learners. To present language learners with texts suitable to their level of English, we identified a set of features that can describe the phonological, morphological, lexical, syntactic, discursive, and psychological complexity of a given text. Using a corpus of 6171 texts, which had already been classified into three different levels of difficulty by ESL experts, different experiments were conducted with five machine learning algorithms. The results showed that the adopted linguistic features provide a good overall classification performance (F-score = 0.97). A scalability evaluation was conducted to test whether such a classifier could be used within real applications, where it can, for example, be plugged into a search engine or a web-scraping module. In this evaluation, the texts in the test set are not only different from those in the training set but also of different types (ESL texts vs. children's reading texts). Although the overall performance of the classifier decreased significantly (F-score = 0.65), the confusion matrix shows that most of the classification errors are between classes two and three (the middle-level classes) and that the system performs robustly in categorizing texts of classes one and four. This behavior can be explained by the difference in classification criteria between the two corpora. Hence, the observed results confirm the usability of such a classifier within a real-world application.
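
A few surface-level lexical measures illustrate the kind of features such a classifier consumes; these are hypothetical examples of the genre, not the paper's actual feature set:

```python
def complexity_features(text):
    """Illustrative lexical complexity features (hypothetical examples of
    the kind of features used; not the paper's feature set)."""
    words = text.split()
    sentences = [s for s in text.replace("!", ".").replace("?", ".").split(".")
                 if s.strip()]
    return {
        # Longer words tend to correlate with morphological complexity.
        "avg_word_len": sum(len(w) for w in words) / len(words),
        # Longer sentences tend to correlate with syntactic complexity.
        "avg_sent_len": len(words) / len(sentences),
        # Lexical diversity: unique tokens over total tokens.
        "type_token_ratio": len(set(w.lower() for w in words)) / len(words),
    }
```

Feature vectors like these would then be fed to any of the five machine learning algorithms the paper compares.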

The ergodicity and the output-controllability of stochastic reaction networks have been shown to be essential properties to fulfill to enable their control using, for instance, antithetic integral control. We propose here to extend those properties to the case of uncertain networks. To this aim, the notions of interval, robust, sign, and structural ergodicity/output-controllability are introduced. The obtained results lie in the same spirit as those obtained in [Briat, Gupta & Khammash, Cell Systems, 2016] where those properties are characterized in terms of control theoretic concepts, linear algebraic conditions, linear programs, and graph-theoretic/algebraic conditions. An important conclusion is that all those properties can be characterized by linear programs. Two examples are given for illustration.

An adaptive joint source-channel coding (JSCC) scheme is presented for transmitting correlated sources over discrete-memoryless two-way channels subject to distortion constraints. The proposed JSCC scheme makes use of the previously transmitted and received channel signals as well as the sources' correlation to facilitate coordination between terminals. It is shown that the adaptive scheme strictly subsumes prior lossy coding methods for two-way simultaneous transmission and yields a new adaptive separate source-channel coding result. Two examples are given to show the scheme's advantages.

Most state-of-the-art 3D object detectors heavily rely on LiDAR sensors because there is a large performance gap between image-based and LiDAR-based methods. The gap stems from the way representations are formed for prediction in 3D scenarios. Our method, called Deep Stereo Geometry Network (DSGN), significantly reduces this gap by detecting 3D objects on a differentiable volumetric representation -- 3D geometric volume, which effectively encodes 3D geometric structure for 3D regular space. With this representation, we learn depth information and semantic cues simultaneously. For the first time, we provide a simple and effective one-stage stereo-based 3D detection pipeline that jointly estimates the depth and detects 3D objects in an end-to-end learning manner. Our approach outperforms previous stereo-based 3D detectors (by about 10 points in AP) and even achieves performance comparable to several LiDAR-based methods on the KITTI 3D object detection leaderboard. Our code is publicly available at https://github.com/chenyilun95/DSGN.

MRI with multiple protocols is commonly used for diagnosis, but it suffers from a long acquisition time, which leaves the image quality vulnerable to, for example, motion artifacts. To accelerate acquisition, various methods have been proposed to reconstruct full images from under-sampled k-space data. However, these algorithms are inadequate for two main reasons. First, aliasing artifacts generated in the image domain are structural and non-local, so restoration in the image domain alone is insufficient. Second, although an MRI exam comprises multiple protocols, almost all previous studies reconstruct each protocol individually from a highly distorted undersampled image, leaving the use of a fully-sampled short protocol (e.g., T1) as complementary information largely underexplored. In this work, we address these two limitations by proposing a Dual Domain Recurrent Network (DuDoRNet) with a deep T1 prior embedded to simultaneously recover k-space and images, accelerating the acquisition of MRI with a long imaging protocol. Specifically, a Dilated Residual Dense Network (DRDNet) is customized for dual domain restorations from undersampled MRI data. Extensive experiments on different sampling patterns and acceleration rates demonstrate that our method consistently outperforms state-of-the-art methods and can reconstruct high-quality MRI.

Percentiles are statistics pointing to the standing of a paper's citation impact relative to other papers in a given citation distribution. Percentile Ranks (PRs) often play an important role in evaluating the impact of scholars, institutions, and lines of study. Because PRs are so important for the assessment of scholarly impact, and because citation practices differ greatly across time and fields, various percentile approaches have been proposed to time- and field-normalize citations. Unfortunately, current popular methods often face significant problems in time- and field-normalization, including when papers are assigned to multiple fields or have been published by more than one unit (e.g., researchers or countries). They also face problems for estimating citation counts (CCs) for pre-defined PRs (e.g., the 90th PR). We offer a series of guidelines and procedures that, we argue, address these problems and others and provide a superior means to make the use of percentile methods more accurate and informative. In particular, we introduce two approaches, CP-IN and CP-EX, that should be preferred in bibliometric studies because they consider the complete citation distribution. Both approaches are based on cumulative frequencies in percentages (CPs). The paper further shows how bar graphs and beamplots can present PRs in a more meaningful and accurate manner.
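As a rough sketch, percentile ranks based on cumulative frequencies in percentages might be computed as below. The inclusive/exclusive reading of CP-IN and CP-EX is our assumption from the names, not the paper's exact formulas, and the citation counts are invented:

```python
import numpy as np

def cumulative_percentile_ranks(citations):
    """Percentile ranks from cumulative frequencies in percentages (CPs).

    CP-IN: share of papers with at most this many citations (inclusive).
    CP-EX: share of papers with strictly fewer citations (exclusive).
    These definitions are our reading of the approach, not a verbatim
    reproduction of the paper's formulas.
    """
    c = np.asarray(citations)
    n = len(c)
    cp_in = np.array([(c <= x).sum() / n * 100 for x in c])
    cp_ex = np.array([(c < x).sum() / n * 100 for x in c])
    return cp_in, cp_ex

# A tiny, skewed citation distribution (typical of real citation data).
cits = [0, 1, 1, 2, 5, 10, 25, 50, 100, 400]
cp_in, cp_ex = cumulative_percentile_ranks(cits)
```

Because every paper in the distribution receives a rank, the whole citation distribution is taken into account, including ties at the low end.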

Image segmentation is a key topic in image processing and computer vision, with applications such as scene understanding, medical image analysis, robotic perception, video surveillance, augmented reality, and image compression, among many others. Various algorithms for image segmentation have been developed in the literature. Recently, due to the success of deep learning models in a wide range of vision applications, a substantial body of work has aimed at developing image segmentation approaches using deep learning models. In this survey, we provide a comprehensive review of the literature at the time of this writing, covering a broad spectrum of pioneering works for semantic and instance-level segmentation, including fully convolutional pixel-labeling networks, encoder-decoder architectures, multi-scale and pyramid-based approaches, recurrent networks, visual attention models, and generative models in adversarial settings. We investigate the similarities, strengths, and challenges of these deep learning models, examine the most widely used datasets, report performances, and discuss promising future research directions in this area.

Bitcoin is a peer-to-peer payment system proposed by Nakamoto in 2008. Based on the Nakamoto consensus, Bagaria, Kannan, Tse, Fanti, and Viswanath proposed the Prism protocol in 2018 and showed that it achieves near-optimal blockchain throughput while maintaining a level of security similar to bitcoin's. Previous probabilistic security guarantees for the bitcoin and Prism backbone protocols were either established under a simplified discrete-time model or expressed as exponential-order results. This paper presents a streamlined and strengthened analysis under a more realistic continuous-time model in which the block propagation delays are heterogeneous, arbitrary, and upper bounded by some constant. A fully rigorous model for blockchains is developed with no assumption on adversarial miners except for an upper bound on their aggregate mining rate. Also introduced are the notions of a ''credible'' blockchain and ''typical'' events concerning block production over time intervals. A blockchain growth theorem, a blockchain quality theorem, and a common prefix theorem are established with explicit probability bounds. These results guarantee that, under a certain typical event which occurs with probability close to $1$, a credible transaction that is deep enough in one credible blockchain becomes permanent in all future credible blockchains.

Batch Normalization (BN) is one of the most widely used techniques in deep learning. However, its performance degrades severely with insufficient batch size. This weakness limits the usage of BN in many computer vision tasks, such as detection or segmentation, where the batch size is usually small due to the constraint of memory consumption. Many modified normalization techniques have therefore been proposed, but they either fail to restore the performance of BN completely or have to introduce additional nonlinear operations in the inference procedure, incurring substantial extra cost. In this paper, we reveal that there are two extra batch statistics involved in the backward propagation of BN which have never been well discussed before. These extra batch statistics associated with gradients can also severely affect the training of deep neural networks. Based on our analysis, we propose a novel normalization method, named Moving Average Batch Normalization (MABN). MABN can completely restore the performance of vanilla BN in small-batch cases without introducing any additional nonlinear operations in the inference procedure. We prove the benefits of MABN by both theoretical analysis and experiments. Our experiments demonstrate the effectiveness of MABN in multiple computer vision tasks, including ImageNet and COCO. The code has been released at https://github.com/megvii-model/MABN.
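A minimal sketch of the moving-average idea is to normalize with exponential moving averages (EMAs) of the batch statistics rather than the noisy per-batch statistics. The momentum, data, and class below are illustrative; MABN's actual treatment of the backward-pass statistics is not reproduced here:

```python
import numpy as np

class MovingAverageNorm:
    """Normalize activations with EMAs of batch statistics instead of the
    per-batch mean/variance. This is only a sketch of the forward-pass idea;
    the paper's full method also replaces the batch statistics that appear
    in the backward pass."""

    def __init__(self, num_features, momentum=0.9, eps=1e-5):
        self.mu = np.zeros(num_features)
        self.var = np.ones(num_features)
        self.momentum = momentum
        self.eps = eps

    def __call__(self, x):
        # Update the EMAs from the (possibly tiny) current batch ...
        self.mu = self.momentum * self.mu + (1 - self.momentum) * x.mean(axis=0)
        self.var = self.momentum * self.var + (1 - self.momentum) * x.var(axis=0)
        # ... and normalize with the smoothed statistics.
        return (x - self.mu) / np.sqrt(self.var + self.eps)

rng = np.random.default_rng(0)
norm = MovingAverageNorm(4)
for _ in range(500):                      # a stream of batch-size-2 batches
    out = norm(rng.normal(3.0, 2.0, size=(2, 4)))
```

With batch size 2, the per-batch mean and variance are extremely noisy; the EMAs track the population statistics much more stably, which is the intuition behind replacing batch statistics with moving averages.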

Graph generative models have been extensively studied in the data mining literature. While traditional techniques are based on generating structures that adhere to a pre-decided distribution, recent techniques have shifted towards learning this distribution directly from the data. While learning-based approaches have imparted significant improvement in quality, some limitations remain to be addressed. First, learning graph distributions introduces additional computational overhead, which limits their scalability to large graph databases. Second, many techniques only learn the structure and do not address the need to also learn node and edge labels, which encode important semantic information and influence the structure itself. Third, existing techniques often incorporate domain-specific rules and lack generalizability. Fourth, the experimentation of existing techniques is not comprehensive enough due to either using weak evaluation metrics or focusing primarily on synthetic or small datasets. In this work, we develop a domain-agnostic technique called GraphGen to overcome all of these limitations. GraphGen converts graphs to sequences using minimum DFS codes. Minimum DFS codes are canonical labels and capture the graph structure precisely along with the label information. The complex joint distributions between structure and semantic labels are learned through a novel LSTM architecture. Extensive experiments on million-sized, real graph datasets show GraphGen to be 4 times faster on average than state-of-the-art techniques while being significantly better in quality across a comprehensive set of 11 different metrics. Our code is released at https://github.com/idea-iitd/graphgen.

The recent emergence of a new coronavirus (COVID-19) has received wide coverage in public media and worldwide news. The virus has caused viral pneumonia in tens of thousands of people in Wuhan, a central city of China. This short paper gives a brief introduction to how the demand for information on this new epidemic is reflected in Google Trends. The reported period is 31 December 2019 to 20 March 2020. The authors draw conclusions on current infodemiological data on COVID-19 using three main search keywords: coronavirus, SARS and MERS. Two perspectives are taken: first a worldwide one, and second a Chinese one. The latter reveals that in China this disease was in its first days more often referred to as SARS than as a general coronavirus, whereas worldwide it has from the beginning been more often referred to as coronavirus.

We compare the performance of two popular iterative algorithms, fictitious play and counterfactual regret minimization, in approximating Nash equilibrium in multiplayer games. Despite recent success of counterfactual regret minimization in multiplayer poker and conjectures of its superiority, we show that fictitious play leads to improved Nash equilibrium approximation with statistical significance over a variety of game classes and sizes.
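Fictitious play itself is easy to sketch: each player best-responds to the opponent's empirical action frequencies. The two-player matching-pennies example below is our own choice (not one of the paper's multiplayer game classes) and shows the empirical frequencies approaching the uniform Nash equilibrium:

```python
import numpy as np

# Fictitious play in two-player zero-sum matching pennies.
# Row player wins (+1) on a match; column player wins on a mismatch.
A = np.array([[1.0, -1.0],
              [-1.0, 1.0]])          # row player's payoff matrix

counts1 = np.ones(2)                  # empirical action counts (smoothed start)
counts2 = np.ones(2)
for _ in range(20000):
    x = counts1 / counts1.sum()       # row's empirical mixed strategy
    y = counts2 / counts2.sum()       # column's empirical mixed strategy
    a1 = int(np.argmax(A @ y))        # row best-responds to column's history
    a2 = int(np.argmax(-(A.T) @ x))   # column best-responds to row's history
    counts1[a1] += 1
    counts2[a2] += 1

freq1 = counts1 / counts1.sum()
freq2 = counts2 / counts2.sum()       # both approach the (0.5, 0.5) equilibrium
```

In two-player zero-sum games the empirical frequencies of fictitious play are known to converge to a Nash equilibrium; the paper's contribution concerns its surprisingly strong empirical behavior in multiplayer games, which this sketch does not cover.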

Automated algorithm selection and hyperparameter tuning facilitates the application of machine learning. Traditional multi-armed bandit strategies look to the history of observed rewards to identify the most promising arms for optimizing expected total reward in the long run. When considering limited time budgets and computational resources, this backward view of rewards is inappropriate, as the bandit should look into the future to anticipate the highest final reward at the end of a specified time budget. This work builds on that insight by introducing HAMLET, which extends the bandit approach with learning curve extrapolation and computation time-awareness for selecting among a set of machine learning algorithms. Results show that the HAMLET Variants 1-3 exhibit equal or better performance than other bandit-based algorithm selection strategies in experiments with recorded hyperparameter tuning traces for the majority of considered time budgets. The best-performing HAMLET Variant 3 combines learning curve extrapolation with the well-known upper confidence bound exploration bonus. That variant performs better than all non-HAMLET policies with statistical significance at the 95% level over 1,485 runs.
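The "well-known upper confidence bound exploration bonus" is the classic UCB1 rule, sketched below on synthetic Bernoulli arms of our own choosing; HAMLET's learning-curve extrapolation and time-budget awareness are not shown:

```python
import math
import random

def ucb1(arm_probs, horizon, seed=0):
    """Classic UCB1: play the arm maximizing mean + sqrt(2 ln t / n).
    `arm_probs` maps each arm to its true Bernoulli reward probability.
    This is the backward-looking baseline the paper contrasts with."""
    rng = random.Random(seed)
    k = len(arm_probs)
    n = [0] * k            # pull counts
    s = [0.0] * k          # reward sums
    for t in range(1, horizon + 1):
        if t <= k:
            a = t - 1      # play each arm once to initialize
        else:
            a = max(range(k),
                    key=lambda i: s[i] / n[i] + math.sqrt(2 * math.log(t) / n[i]))
        r = 1.0 if rng.random() < arm_probs[a] else 0.0
        n[a] += 1
        s[a] += r
    return n

counts = ucb1([0.2, 0.5, 0.8], horizon=5000)   # best arm gets most pulls
```

The exploration bonus shrinks as an arm accumulates pulls, so play concentrates on the empirically best arm while still occasionally revisiting the others.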

In this paper we introduce the Ladder Algorithm; a novel recurrent algorithm to detect repetitive structures in natural images with high accuracy using little training data.

We then demonstrate the algorithm on the task of extracting vertebrae from whole spine magnetic resonance scans, using only lumbar MR scans as training data. It is shown to achieve high performance, with 99.8% precision and recall, exceeding current state-of-the-art approaches for lumbar vertebrae detection in T1- and T2-weighted scans. It also generalises without retraining to whole spine images with a minimal drop in accuracy, achieving a 99.4% detection rate.

We consider incentivized exploration: a version of multi-armed bandits where the choice of actions is controlled by self-interested agents, and the algorithm can only issue recommendations. The algorithm controls the flow of information, and the information asymmetry can incentivize the agents to explore. Prior work matches the optimal regret rates for bandits up to "constant" multiplicative factors determined by the Bayesian prior. However, this dependence on the prior could be arbitrarily large, and the dependence on the number of arms K could be exponential. The optimal dependence on the prior and on K remains unclear. We make progress on these issues. Our first result is that Thompson sampling is incentive-compatible if initialized with enough data points. Thus, we reduce the problem of designing incentive-compatible algorithms to that of sample complexity:

(i) How many data points are needed to incentivize Thompson sampling?

(ii) How many rounds does it take to collect these samples?

We address both questions, providing upper bounds on sample complexity that are typically polynomial in K and lower bounds that are polynomially matching.
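Standard Beta-Bernoulli Thompson sampling, where previously collected data points enter simply as posterior counts, can be sketched as follows. The arm means, prior counts, and horizon are illustrative; the incentive-compatibility analysis itself is not reproduced:

```python
import random

def thompson(arm_probs, horizon, init_successes=1, init_failures=1, seed=0):
    """Beta-Bernoulli Thompson sampling. `init_successes`/`init_failures`
    play the role of initial data points: each arm's posterior starts from
    Beta(init_successes, init_failures) rather than from no data at all."""
    rng = random.Random(seed)
    k = len(arm_probs)
    a = [init_successes] * k          # posterior alpha per arm
    b = [init_failures] * k           # posterior beta per arm
    pulls = [0] * k
    for _ in range(horizon):
        # Sample a mean estimate from each posterior and play the best.
        samples = [rng.betavariate(a[i], b[i]) for i in range(k)]
        arm = max(range(k), key=samples.__getitem__)
        reward = 1 if rng.random() < arm_probs[arm] else 0
        a[arm] += reward
        b[arm] += 1 - reward
        pulls[arm] += 1
    return pulls

pulls = thompson([0.3, 0.7], horizon=2000)
```

The result quoted above says that once the posteriors are seeded with enough such data points, following the sampled recommendation is in each agent's own interest.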

Super-resolution aims at increasing image resolution by algorithmic means and has progressed over the recent years due to advances in the fields of computer vision and deep learning. Convolutional Neural Networks based on a variety of architectures have been applied to the problem, e.g. autoencoders and residual networks. While most research focuses on the processing of photographs consisting only of RGB color channels, little work can be found concentrating on multi-band, analytic satellite imagery. Satellite images often include a panchromatic band, which has higher spatial resolution but lower spectral resolution than the other bands. In the field of remote sensing, there is a long tradition of applying pan-sharpening to satellite images, i.e. bringing the multispectral bands to the higher spatial resolution by merging them with the panchromatic band. To our knowledge there are so far no approaches to super-resolution which take advantage of the panchromatic band. In this paper we propose a method to train state-of-the-art CNNs using pairs of lower-resolution multispectral and high-resolution pan-sharpened image tiles in order to create super-resolved analytic images. The derived quality metrics show that the method improves information content of the processed images. We compare the results created by four CNN architectures, with RedNet30 performing best.
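For readers unfamiliar with pan-sharpening, a minimal Brovey-style ratio merge (a classical remote-sensing method, not the CNN approach proposed here) looks like this on synthetic data; the band count, sizes, and intensities are invented:

```python
import numpy as np

# Brovey-style pan-sharpening: scale each (upsampled) multispectral band
# by the ratio of the panchromatic intensity to the bands' mean intensity,
# injecting the pan band's spatial detail into every spectral band.
rng = np.random.default_rng(0)
ms = rng.uniform(0.2, 0.8, size=(4, 64, 64))   # 4 bands, upsampled to pan grid
pan = ms.mean(axis=0) * rng.uniform(0.9, 1.1, size=(64, 64))  # synthetic pan band

ratio = pan / (ms.mean(axis=0) + 1e-8)         # pixel-wise detail ratio
sharpened = ms * ratio                         # broadcast over the band axis
```

After the merge, the band-averaged intensity of the sharpened product matches the panchromatic band, which is how the higher spatial resolution is carried over.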

The primary characteristic of robust speaker representations is that they are invariant to factors of variability not related to speaker identity. Disentanglement of speaker representations is one of the techniques used to improve robustness of speaker representations to both intrinsic factors that are acquired during speech production (e.g., emotion, lexical content) and extrinsic factors that are acquired during signal capture (e.g., channel, noise). Disentanglement in neural speaker representations can be achieved either in a supervised fashion with annotations of the nuisance factors (factors not related to speaker identity) or in an unsupervised fashion without labels of the factors to be removed. In either case it is important to understand the extent to which the various factors of variability are entangled in the representations. In this work, we examine speaker representations with and without unsupervised disentanglement for the amount of information they capture related to a suite of factors. Using classification experiments we provide empirical evidence that disentanglement reduces the information with respect to nuisance factors from speaker representations, while retaining speaker information. This is further validated by speaker verification experiments on the VOiCES corpus in several challenging acoustic conditions. We also show improved robustness in speaker verification tasks using data augmentation during training of disentangled speaker embeddings. Finally, based on our findings, we provide insights into the factors that can be effectively separated using the unsupervised disentanglement technique and discuss potential future directions.

The spoofing countermeasure (CM) systems in automatic speaker verification (ASV) are not typically used in isolation of each other. These systems can be combined, for example, into a cascaded system where the CM first decides whether the input is synthetic or bona fide speech. If the CM decides it is a bona fide sample, the ASV system then considers it for speaker verification. End users of the system are not interested in the performance of the individual sub-modules, but in the performance of the combined system. Such a combination can be evaluated with the tandem detection cost function (t-DCF) measure, yet the individual components are trained separately from each other using their own performance metrics. In this work we study training the ASV and CM components together for a better t-DCF measure by using reinforcement learning. We demonstrate that such a training procedure is indeed able to improve the performance of the combined system, and does so with more reliable results than the standard supervised learning techniques we compare against.

Recent works have demonstrated that deep learning (DL) based compressed sensing (CS) implementations can provide impressive improvements when reconstructing high-quality MR images from sub-sampled k-space data. However, the network architectures adopted in current methods are all designed by hand, so the performance of these networks is limited by the researchers' expertise and labor. In this manuscript, we propose a novel and efficient MR image reconstruction framework based on a Neural Architecture Search (NAS) algorithm. The inner cells in our reconstruction network are automatically defined from a flexible search space in a differentiable manner. Compared to previous works, where only a few common convolutional operations are tried by humans, our method can sufficiently explore different operations (e.g., dilated convolution) and their possible combinations. Our proposed method can also reach a better trade-off between computation cost and reconstruction performance for practical clinical translation. Experiments performed on a publicly available dataset show that our network produces better reconstruction results than the previous state-of-the-art methods in terms of PSNR and SSIM with 4 times fewer computation resources. The final network architecture found by the algorithm can also offer insights for network architecture design in other medical image analysis applications.

Since the late 1950s, when quasi-Newton methods first appeared, they have become one of the most widely used and efficient algorithmic paradigms for unconstrained optimization. Despite their immense practical success, there is little theory explaining why these methods are so efficient. We provide a semi-local rate of convergence for the randomized BFGS method that can be significantly better than that of gradient descent, finally giving theoretical evidence supporting the method's superior empirical performance.
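For reference, the deterministic BFGS update that the randomized method builds on can be sketched on a small quadratic. The Armijo line-search constants and the test problem are our own choices, not taken from the paper:

```python
import numpy as np

def bfgs(f, grad, x0, iters=100):
    """Textbook BFGS with a backtracking (Armijo) line search."""
    n = len(x0)
    H = np.eye(n)                      # inverse-Hessian approximation
    x = np.asarray(x0, dtype=float)
    g = grad(x)
    for _ in range(iters):
        p = -H @ g                     # quasi-Newton search direction
        alpha = 1.0                    # backtrack until sufficient decrease
        while alpha > 1e-10 and f(x + alpha * p) > f(x) + 1e-4 * alpha * (g @ p):
            alpha *= 0.5
        x_new = x + alpha * p
        g_new = grad(x_new)
        s, y = x_new - x, g_new - g
        if s @ y > 1e-12:              # curvature condition keeps H positive definite
            rho = 1.0 / (s @ y)
            I = np.eye(n)
            H = (I - rho * np.outer(s, y)) @ H @ (I - rho * np.outer(y, s)) \
                + rho * np.outer(s, s)
        x, g = x_new, g_new
    return x

# Minimize f(x) = 0.5 x^T A x - b^T x, whose minimizer solves A x = b.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])
f = lambda x: 0.5 * x @ A @ x - b @ x
x_star = bfgs(f, lambda x: A @ x - b, np.zeros(2))
```

The rank-two update builds up an inverse-Hessian approximation from gradient differences alone, which is what makes quasi-Newton methods cheap per iteration yet fast in practice.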

In this paper, we conduct mathematical and numerical analyses for COVID-19. To predict the trend of COVID-19, we propose a time-dependent SIR model that tracks the transmission rate and the recovering rate at time $t$. Using the data provided by the Chinese authorities, we show our one-day prediction errors are almost all below $3\%$. The turning point and the total number of confirmed cases in China are predicted under our model. To analyze the impact of the asymptomatic infections on the spread of disease, we extend our SIR model by considering two types of infected persons: detectable infected persons and undetectable infected persons. Whether there is an outbreak is characterized by the spectral radius of a $2 \times 2$ matrix that is closely related to the basic reproduction number $R_0$. We plot the phase transition diagram of an outbreak and show that several countries were on the verge of COVID-19 outbreaks on Mar. 2, 2020. To illustrate the effectiveness of social distancing, we analyze the independent cascade model for disease propagation in a random network specified by a degree distribution. We show two approaches of social distancing that can lead to a reduction of $R_0$.
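A discrete-time version of such a time-dependent SIR model is straightforward to simulate. The decaying transmission-rate schedule and all parameter values below are illustrative assumptions, not the rates fitted from the China data:

```python
import numpy as np

def sir_step(S, I, R, beta, gamma):
    """One day of a discrete-time SIR model with time-varying rates:
    beta is the transmission rate and gamma the recovering rate that day.
    The population is normalized to 1, so S + I + R is conserved."""
    new_inf = beta * S * I
    new_rec = gamma * I
    return S - new_inf, I + new_inf - new_rec, R + new_rec

S, I, R = 0.99, 0.01, 0.0
history = [(S, I, R)]
for t in range(120):
    beta = 0.35 * np.exp(-0.03 * t)   # illustrative: controls tighten over time
    gamma = 0.1                        # illustrative: ~10-day infectious period
    S, I, R = sir_step(S, I, R, beta, gamma)
    history.append((S, I, R))
```

Tracking beta and gamma day by day, instead of holding them constant, is what lets the model follow a changing epidemic and estimate a time-varying reproduction number beta(t)/gamma(t).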

Absorption imaging is the most common probing technique in experiments with ultracold atoms. The standard procedure involves the division of two frames acquired at successive exposures, one with the atomic absorption signal and one without. A well-known problem is the presence of residual structured noise in the final image, due to small differences between the imaging light in the two exposures. Here we solve this problem by performing absorption imaging with only a single exposure, where instead of a second exposure the reference frame is generated by an unsupervised image-completion autoencoder neural network. The network is trained on images without absorption signal such that it can infer the noise overlaying the atomic signal based only on the information in the region encircling the signal. We demonstrate our approach on data captured with a quantum degenerate Fermi gas. The average residual noise in the resulting images is below that of the standard double-shot technique. Our method simplifies the experimental sequence, reduces the hardware requirements, and can improve the accuracy of extracted physical observables. The trained network and its generating scripts are available as an open-source repository (this http URL).
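The double-shot arithmetic that the single-exposure method replaces is a pixel-wise optical density computed from the two frames. The Gaussian cloud, flat beam, and all numbers below are synthetic assumptions for illustration:

```python
import numpy as np

# Standard double-shot absorption imaging: the atomic column density is
# proportional to the optical density OD = -ln(I_atoms / I_reference),
# computed pixel-wise from the exposure with atoms and the one without.
x = np.linspace(-1, 1, 65)
X, Y = np.meshgrid(x, x)

I_ref = 1000.0 * np.ones((65, 65))                        # flat imaging beam
absorption = np.exp(-2.0 * np.exp(-(X**2 + Y**2) / 0.1))  # Gaussian cloud, peak OD 2
I_atoms = I_ref * absorption                              # frame with atoms

od = -np.log(I_atoms / I_ref)                             # recovered optical density
```

In the single-exposure scheme described above, `I_ref` is not measured but inferred by the autoencoder from the region surrounding the atomic signal, so any structured noise difference between two real exposures never enters the division.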

We consider the problem of neural network training in a time-varying context. Machine learning algorithms have excelled in problems that do not change over time. However, problems encountered in financial markets are often non-stationary. We propose the online early stopping algorithm and show that a neural network trained using this algorithm can track a function changing with unknown dynamics. We applied the proposed algorithm to the stock return prediction problem studied in Gu et al. (2019) and achieved mean rank correlation of 4.69%, almost twice as high as the expanding window approach. We also show that prominent factors, such as the size effect and momentum, exhibit time varying stock return predictiveness.

This paper provides a tutorial introduction to disk margins. These are robust stability measures that account for simultaneous gain and phase perturbations in a feedback system. The paper first reviews the classical (gain-only and phase-only) margins and their limitations. This motivates the use of disk margins, which are defined using a set of perturbations that have simultaneous gain and phase variations. A necessary and sufficient condition is provided to compute the disk margin for a single-input, single-output feedback system. Frequency-dependent disk margins can also be computed, yielding additional insight. The paper concludes with a discussion of stability margins for multiple-input, multiple-output (MIMO) feedback systems. A typical approach is to assess robust stability "loop-at-a-time" with a perturbation introduced into a single channel and all other channels held at their nominal values. MIMO disk margins provide a useful extension to consider simultaneous variations in multiple channels. This multiple-loop analysis can provide a more accurate robustness assessment as compared to the loop-at-a-time approach.
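The classical gain margin reviewed here can be computed directly from the frequency response. The example loop L(s) = 1/(s(s+1)(s+2)) is our own choice; analytically its phase crossover is at w = sqrt(2) and its gain margin is 6:

```python
import numpy as np

# Classical gain margin of L(s) = 1 / (s (s+1) (s+2)): find the phase-
# crossover frequency where arg L(jw) = -180 degrees, then GM = 1 / |L(jw)|.
w = np.linspace(0.5, 3.0, 200001)
L = 1.0 / (1j * w * (1j * w + 1) * (1j * w + 2))
phase = np.unwrap(np.angle(L))             # continuous phase in radians
idx = np.argmin(np.abs(phase + np.pi))     # sample closest to -180 degrees
gain_margin = 1.0 / np.abs(L[idx])         # analytically 6 at w = sqrt(2)
```

A disk margin would instead certify a whole disk of simultaneous gain and phase perturbations around the nominal loop, which this gain-only computation cannot capture.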

3D instance segmentation, with a variety of applications in robotics and augmented reality, is in high demand these days. Unlike 2D images, which are projective observations of the environment, 3D models provide a metric reconstruction of the scenes without occlusion or scale ambiguity. In this paper, we define the "3D occupancy size" as the number of voxels occupied by each instance. This signal is robust to predict, and on this basis we propose OccuSeg, an occupancy-aware 3D instance segmentation scheme. Our multi-task learning produces both the occupancy signal and embedding representations, where the training of the spatial and feature embeddings is adapted to their different levels of scale awareness. Our clustering scheme benefits from the reliable comparison between the predicted occupancy size and the clustered occupancy size, which encourages hard samples to be clustered correctly and avoids over-segmentation. The proposed approach achieves state-of-the-art performance on 3 real-world datasets, i.e., ScanNetV2, S3DIS and SceneNN, while maintaining high efficiency.

Personalized dialogue systems are an essential step toward better human-machine interaction. Existing personalized dialogue agents rely on properly designed conversational datasets, which are mostly monolingual (e.g., English); this greatly limits the usage of conversational agents in other languages. In this paper, we propose a multilingual extension of Persona-Chat, namely XPersona. Our dataset includes persona conversations in six languages other than English for building and evaluating multilingual personalized agents. We experiment with both multilingual and cross-lingual trained baselines and evaluate them against monolingual and translation-pipeline models using both automatic and human evaluation. Experimental results show that the multilingual trained models outperform the translation-pipeline and that they are on par with the monolingual models, with the advantage of having a single model across multiple languages. On the other hand, the state-of-the-art cross-lingual trained models achieve inferior performance to the other models, showing that cross-lingual conversation modeling is a challenging task. We hope that our dataset and baselines will accelerate research in multilingual dialogue systems.

This paper studies privacy-preserving weighted federated learning within the oracle-aided multi-party computation (MPC) framework. The contribution of this paper is three-fold:

In the first fold, a new notion which we call weighted federated learning (wFL) is introduced and formalized, inspired by McMahan et al.'s seminal paper. The weighted federated learning concept formalized in this paper differs from that presented in McMahan et al.'s paper, since both addition and multiplication operations are executed over ciphers in our model while these operations are executed over plaintexts in McMahan et al.'s model.

In the second fold, an oracle-aided MPC solution for computing weighted federated learning is formalized by decoupling the security of federated learning systems from that of the underlying multi-party computations. Our decoupling formulation may help machine learning developers select their best security practices from the state-of-the-art security tool sets.

In the third fold, a concrete solution to the weighted federated learning problem is presented and analysed. The security of our implementation is guaranteed by the security composition theorem assuming that the underlying multiplication algorithm is secure against honest-but-curious adversaries.

We expose a new security leak for smartphone users, which allows an attacker to steal users' personal data by accessing the mobile operator's user page when auto-login is employed. We show how any "apparently" genuine app can steal these data from some mobile operators, affecting more than 80% of Italian mobile smartphones.

Reinforcement learning is one of the most popular approaches for automated game playing. This method allows an agent to estimate the expected utility of its state in order to make optimal actions in an unknown environment. We seek to apply reinforcement learning algorithms to the game Flappy Bird. We implement SARSA and Q-Learning with some modifications such as $\epsilon$-greedy policy, discretization and backward updates. We find that SARSA and Q-Learning outperform the baseline, regularly achieving scores of 1400+, with the highest in-game score of 2069.
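Tabular Q-learning with an epsilon-greedy policy can be sketched on a toy chain environment; this stand-in replaces Flappy Bird itself, which would additionally require the discretized game state the abstract mentions:

```python
import random

def q_learning(n_states=6, episodes=500, alpha=0.5, gamma=0.9, eps=0.1, seed=0):
    """Tabular Q-learning on a chain: states 0..n-1, actions left/right,
    reward 1 only on reaching the right end. Optimistic initialization
    encourages early exploration alongside the epsilon-greedy policy."""
    rng = random.Random(seed)
    goal = n_states - 1
    Q = [[1.0, 1.0] for _ in range(n_states)]   # optimistic initial values
    for _ in range(episodes):
        s = 0
        while s != goal:
            # Epsilon-greedy action selection.
            if rng.random() < eps:
                a = rng.randrange(2)
            else:
                a = 0 if Q[s][0] >= Q[s][1] else 1
            s2 = max(0, s - 1) if a == 0 else s + 1
            r = 1.0 if s2 == goal else 0.0
            # TD update toward r + gamma * max_a' Q(s', a'); terminal value is 0.
            target = r if s2 == goal else r + gamma * max(Q[s2])
            Q[s][a] += alpha * (target - Q[s][a])
            s = s2
    return Q

Q = q_learning()
greedy = [0 if Q[s][0] >= Q[s][1] else 1 for s in range(5)]  # learned policy
```

For Flappy Bird, the same update applies once the continuous bird/pipe geometry is discretized into a manageable state table, which is the role of the discretization modification mentioned above.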

With the rapid popularity of blockchain, decentralized human intelligence tasks (HITs) are proposed to crowdsource human knowledge without relying on vulnerable third-party platforms. However, the inherent limits of blockchain cause decentralized HITs to face a few "new" challenges. For example, the confidentiality of solicited data turns out to be the sine qua non, though it was an arguably dispensable property in the centralized setting. To ensure the "new" requirement of data privacy, existing decentralized HITs use generic zero-knowledge proof frameworks (e.g. SNARK), but scarcely perform well in practice, due to the inherently expensive cost of generality.

We present a practical decentralized protocol for HITs, which also achieves fairness between requesters and workers. At the core of our contributions, we avoid the powerful yet highly costly generic zk-proof tools and propose a special-purpose scheme to prove the quality of encrypted data. By various non-trivial statement reformations, proving the quality of encrypted data is reduced to efficient verifiable encryption, thus making decentralized HITs practical. Along the way, we rigorously define the ideal functionality of decentralized HITs and then prove security via the ideal-real paradigm.

We further instantiate our protocol to implement a system called Dragoon, an instance of which is deployed atop Ethereum to facilitate an image annotation task used by ImageNet. Our evaluations demonstrate its practicality: the on-chain handling cost of Dragoon is even less than the handling fee of Amazon's Mechanical Turk for the same ImageNet HIT.

The emerging area of computational pathology (CPath) is ripe ground for the application of deep learning (DL) methods to healthcare due to the sheer volume of raw pixel data in whole-slide images (WSIs) of cancerous tissue slides. However, it is imperative for the DL algorithms relying on nuclei-level details to be able to cope with data from `the clinical wild', which tends to be quite challenging.

We study and extend the recently released PanNuke dataset, which consists of ~200,000 nuclei categorized into 5 clinically important classes for the challenging tasks of segmenting and classifying nuclei in WSIs. Previous pan-cancer datasets consisted of only up to 9 different tissues and up to 21,000 unlabeled nuclei and just over 24,000 labeled nuclei with segmentation masks. PanNuke consists of 19 different tissue types that have been semi-automatically annotated and quality controlled by clinical pathologists, leading to a dataset with statistics similar to the clinical wild and with minimal selection bias. We study the performance of segmentation and classification models when applied to the proposed dataset and demonstrate the application of models trained on PanNuke to whole-slide images. We provide comprehensive statistics about the dataset and outline recommendations and research directions to address the limitations of existing DL tools when applied to real-world CPath applications.

Feature warping is a core technique in optical flow estimation; however, the ambiguity caused by occluded areas during warping is a major problem that remains unsolved. In this paper, we propose an asymmetric occlusion-aware feature matching module, which can learn a rough occlusion mask that filters useless (occluded) areas immediately after feature warping without any explicit supervision. The proposed module can be easily integrated into end-to-end network architectures and enjoys performance gains while introducing negligible computational cost. The learned occlusion mask can be further fed into a subsequent network cascade with dual feature pyramids with which we achieve state-of-the-art performance. At the time of submission, our method, called MaskFlownet, surpasses all published optical flow methods on the MPI Sintel, KITTI 2012 and 2015 benchmarks. Code is available at https://github.com/microsoft/MaskFlownet.

Multi-robot systems of increasing size and complexity are used to solve large-scale problems, such as area exploration and search and rescue. A key decision in human-robot teaming is dividing a multi-robot system into teams to address separate issues or to accomplish a task over a large area. In order to address the problem of selecting teams in a multi-robot system, we propose a new multimodal graph embedding method to construct a unified representation that fuses multiple information modalities to describe and divide a multi-robot system. The relationship modalities are encoded as directed graphs that can encode asymmetrical relationships, which are embedded into a unified representation for each robot. Then, the constructed multimodal representation is used to determine teams based upon unsupervised learning. We perform experiments to evaluate our approach on expert-defined team formations, large-scale simulated multi-robot systems, and a system of physical robots. Experimental results show that our method successfully decides correct teams based on the multifaceted internal structures describing multi-robot systems, and outperforms baseline methods based upon only one mode of information, as well as other graph embedding-based division methods.

We created the CORD-19-NER dataset with comprehensive named entity recognition (NER) on the COVID-19 Open Research Dataset Challenge (CORD-19) corpus (2020-03-13). The CORD-19-NER dataset covers 74 fine-grained named entity types. It is automatically generated by combining the annotation results from four sources: (1) a pre-trained NER model on 18 general entity types from spaCy, (2) a pre-trained NER model on 18 biomedical entity types from scispaCy, (3) a knowledge base (KB)-guided NER model on 127 biomedical entity types with our distantly-supervised NER method, and (4) a seed-guided NER model on 8 new entity types (specifically related to COVID-19 studies) with our weakly-supervised NER method. We hope this dataset can help the text mining community build downstream applications. We also hope this dataset can bring insights for COVID-19 studies, both on the biomedical side and on the social side.

Optimal power flow (OPF) is a fundamental yet vital optimization problem in power systems, which aims at optimizing a specific objective function (e.g., generator costs) while keeping the system in stable and safe operation. In this paper, we adopt state-of-the-art artificial intelligence (AI) techniques to train an agent to solve the AC OPF problem, where the nonlinear power balance equations are considered. The modified IEEE 14-bus system was utilized to validate the proposed approach. The testing results show a great potential of adopting AI techniques in power system operations.

Vehicular cloud computing has emerged as a promising solution to fulfill users' demands for processing computation-intensive applications in modern driving environments. Such applications are commonly represented by graphs consisting of components and edges. However, encouraging vehicles to share resources poses significant challenges owing to users' selfishness. In this paper, an auction-based graph job allocation problem is studied in vehicular cloud-assisted networks considering resource reutilization. Our goal is to map each buyer (component) to a feasible seller (virtual machine) while maximizing the buyers' utility-of-service, which concerns the execution time and commission cost. First, we formulate the auction-based graph job allocation as an integer programming (IP) problem. Then, a Vickrey-Clarke-Groves (VCG) based payment rule is proposed which satisfies the desired economic properties of truthfulness and individual rationality. We face two challenges: 1) the above-mentioned IP problem is NP-hard; 2) one constraint associated with the IP problem requires solving the subgraph isomorphism problem. Thus, obtaining the optimal solution is practically infeasible in large-scale networks. Motivated by this, we develop a structure-preserved matching algorithm that maximizes the utility-of-service gain, together with a corresponding payment rule which offers the economic properties at low computational complexity. Extensive simulations demonstrate that the proposed algorithm outperforms the benchmark methods for various problem sizes.
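As background for the payment rule above, a minimal single-item illustration of the VCG principle (the winner pays the externality it imposes on the others) might look like the following; the paper's auction maps many components to virtual machines, so this is a deliberate simplification with illustrative names:

```python
def vcg_payments(bids):
    """Single-item VCG illustration: the highest bidder wins and pays the
    welfare the others lose because of its presence, i.e. the
    second-highest bid (a minimal sketch, not the paper's mechanism).

    bids: dict mapping bidder id -> bid value.
    """
    winner = max(bids, key=bids.get)
    others = [v for b, v in bids.items() if b != winner]
    payment = max(others) if others else 0.0
    return winner, payment
```

In this reduced setting VCG coincides with a second-price auction, which is why truthful bidding is a dominant strategy.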

The task of Knowledge Graph Completion (KGC) aims to automatically infer missing facts in a Knowledge Graph (KG). In this paper, we take a new perspective and leverage rich user-item interaction data (user interaction data for short) to improve the KGC task. Our work is inspired by the observation that many KG entities correspond to online items in application systems. However, the two kinds of data sources have very different intrinsic characteristics, and a simple fusion strategy is likely to hurt the original performance. To address this challenge, we propose a novel adversarial learning approach that leverages user interaction data for the KGC task. Our generator is isolated from user interaction data and serves to improve the performance of the discriminator. The discriminator takes the useful information learned from user interaction data as input, and gradually enhances its evaluation capacity in order to identify the fake samples generated by the generator. To discover the implicit entity preferences of users, we design an elaborate collaborative learning algorithm based on graph neural networks, which is jointly optimized with the discriminator. Such an approach effectively alleviates the issues of data heterogeneity and semantic complexity for the KGC task. Extensive experiments on three real-world datasets demonstrate the effectiveness of our approach on the KGC task.

Let $G$ be a graph with $n$ vertices and $m$ edges. One of several hierarchies towards the stability number of $G$ is the exact subgraph hierarchy (ESH). On the first level it computes the Lov\'{a}sz theta function $\vartheta(G)$ as a semidefinite program (SDP) with a matrix variable of order $n+1$ and $n+m+1$ constraints. On the $k$-th level it adds all exact subgraph constraints (ESC) for subgraphs of order $k$ to the SDP. An ESC ensures that the submatrix of the matrix variable corresponding to the subgraph is in the correct polytope. By including only some of the ESCs in the SDP, the ESH can be exploited computationally.
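For reference, one standard SDP formulation of the Lov\'{a}sz theta function with a matrix variable of order $n$ and $m+1$ constraints (one trace constraint plus one per edge) is

\[ \vartheta(G) = \max\,\{\langle J, X\rangle : \operatorname{tr}(X) = 1,\; X_{ij} = 0 \ \forall \{i,j\} \in E(G),\; X \succeq 0\}, \]

where $J$ is the all-ones matrix; this is not necessarily the exact formulation used in the paper, but it matches the size of the smaller SDP discussed next.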

In this paper we introduce a variant of the ESH that computes $\vartheta(G)$ through an SDP with a matrix variable of order $n$ and $m+1$ constraints. We show that it makes sense to include the ESCs into this SDP and introduce the compressed ESH (CESH) analogously to the ESH. Computationally the CESH seems favorable as the SDP is smaller. However, we prove that the bounds based on the ESH are always at least as good as those of the CESH. In computations, they are sometimes significantly better.

We also introduce scaled ESCs (SESCs), which are a more natural way to include exactness constraints into the smaller SDP and we prove that including an SESC is equivalent to including an ESC for every subgraph.

We present Catalyst.RL, an open-source PyTorch framework for reproducible and sample-efficient reinforcement learning (RL) research. Main features of Catalyst.RL include large-scale asynchronous distributed training, efficient implementations of various RL algorithms and auxiliary tricks, such as n-step returns, value distributions, hyperbolic reinforcement learning, etc. To demonstrate the effectiveness of Catalyst.RL, we applied it to a physics-based reinforcement learning challenge "NeurIPS 2019: Learn to Move -- Walk Around" with the objective to build a locomotion controller for a human musculoskeletal model. The environment is computationally expensive, has a high-dimensional continuous action space and is stochastic. Our team took the 2nd place, capitalizing on the ability of Catalyst.RL to train high-quality and sample-efficient RL agents in only a few hours of training time. The implementation along with experiments is open-sourced so results can be reproduced and novel ideas tried out.
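As a sketch of one of the auxiliary tricks mentioned, n-step returns for a single trajectory can be computed roughly as follows (function and argument names are illustrative, not Catalyst.RL's API):

```python
def n_step_returns(rewards, values, gamma=0.99, n=3):
    """Compute n-step returns along one trajectory (illustrative sketch).

    rewards: r_0..r_{T-1}; values: bootstrap estimates v_t ~ V(s_t).
    G_t = sum_{k=0}^{n-1} gamma^k r_{t+k} + gamma^n v_{t+n},
    truncated at the end of the trajectory (no bootstrap past it).
    """
    T = len(rewards)
    returns = []
    for t in range(T):
        G, discount = 0.0, 1.0
        for k in range(n):
            if t + k >= T:
                break
            G += discount * rewards[t + k]
            discount *= gamma
        if t + n < T:
            G += discount * values[t + n]
        returns.append(G)
    return returns
```

Larger n trades lower bias (less bootstrapping) for higher variance, which is why frameworks expose it as a tunable.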

Governments and researchers around the world are implementing digital contact tracing solutions to stem the spread of infectious disease, namely COVID-19. Many of these solutions threaten individual rights and privacy. Our goal is to break past the false dichotomy of effective versus privacy-preserving contact tracing. We offer an alternative approach to assess and communicate users' risk of exposure to an infectious disease while preserving individual privacy. Our proposal uses recent GPS location histories, which are transformed and encrypted, and a private set intersection protocol to interface with a semi-trusted authority.

There have been other recent proposals for privacy-preserving contact tracing, based on Bluetooth and decentralization, that could further eliminate the need for trust in authority. However, solutions with Bluetooth are currently limited to certain devices and contexts while decentralization adds complexity. The goal of this work is two-fold: we aim to propose a location-based system that is more privacy-preserving than what is currently being adopted by governments around the world, and that is also practical to implement with the immediacy needed to stem a viral outbreak.

Optimization techniques are of great importance to effectively and efficiently train a deep neural network (DNN). It has been shown that using the first and second order statistics (e.g., mean and variance) to perform Z-score standardization on network activations or weight vectors, such as batch normalization (BN) and weight standardization (WS), can improve the training performance. Different from these existing methods that mostly operate on activations or weights, we present a new optimization technique, namely gradient centralization (GC), which operates directly on gradients by centralizing the gradient vectors to have zero mean. GC can be viewed as a projected gradient descent method with a constrained loss function. We show that GC can regularize both the weight space and output feature space so that it can boost the generalization performance of DNNs. Moreover, GC improves the Lipschitzness of the loss function and its gradient so that the training process becomes more efficient and stable. GC is very simple to implement and can be easily embedded into existing gradient based DNN optimizers with only one line of code. It can also be directly used to fine-tune the pre-trained DNNs. Our experiments on various applications, including general image classification, fine-grained image classification, detection and segmentation, demonstrate that GC can consistently improve the performance of DNN learning. The code of GC can be found at https://github.com/Yonghongwei/Gradient-Centralization.
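A minimal sketch of the centralization step consistent with the description above (remove the mean of each gradient vector, computed over all dimensions except the output/filter axis); the exact axis conventions in the released code may differ:

```python
import numpy as np

def centralize_gradient(grad):
    """Gradient centralization sketch: subtract the per-slice mean over all
    axes except axis 0, so each gradient vector has zero mean. Applies to
    weight matrices and conv kernels; 1D gradients (e.g., biases) are
    left unchanged, matching the description of GC operating on weight
    tensors only."""
    if grad.ndim > 1:
        axes = tuple(range(1, grad.ndim))
        return grad - grad.mean(axis=axes, keepdims=True)
    return grad
```

Embedding this into an optimizer is indeed a one-line change: apply `centralize_gradient` to each weight gradient just before the update step.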

Many real-world signal sources are complex-valued, having real and imaginary components. However, the vast majority of existing deep learning platforms and network architectures do not support the use of complex-valued data. MRI data is inherently complex-valued, so existing approaches discard the richer algebraic structure of the complex data. In this work, we investigate end-to-end complex-valued convolutional neural networks - specifically, for image reconstruction in lieu of two-channel real-valued networks. We apply this to magnetic resonance imaging reconstruction for the purpose of accelerating scan times and determine the performance of various promising complex-valued activation functions. We find that complex-valued CNNs with complex-valued convolutions provide superior reconstructions compared to real-valued convolutions with the same number of trainable parameters, over a variety of network architectures and datasets.
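As an illustration of the building block involved, a complex-valued convolution can be assembled from four real convolutions via (a+bi)(c+di) = (ac-bd) + i(ad+bc); this 1D NumPy sketch is illustrative and is not the paper's implementation:

```python
import numpy as np

def complex_conv1d(x, w):
    """Valid-mode 1D complex convolution built from four real
    convolutions, following (a+bi)(c+di) = (ac - bd) + i(ad + bc).

    x, w: complex NumPy arrays (signal and kernel).
    """
    xr, xi = x.real, x.imag
    wr, wi = w.real, w.imag
    real = np.convolve(xr, wr, "valid") - np.convolve(xi, wi, "valid")
    imag = np.convolve(xr, wi, "valid") + np.convolve(xi, wr, "valid")
    return real + 1j * imag
```

The same identity underlies complex layers in deep learning frameworks that only support real tensors: the real and imaginary parts are carried as two channels and mixed according to complex arithmetic.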

We study the security of large-scale cyber-physical systems (CPS) consisting of multiple interdependent subsystems, each managed by a different defender. Defenders invest their security budgets with the goal of thwarting the spread of cyber attacks to their critical assets. We model the security investment decisions made by the defenders as a security game. While prior work has used security games to analyze such scenarios, we propose behavioral security games, in which defenders exhibit characteristics of human decision making that have been identified in behavioral economics as representing typical human cognitive biases. This is important as many of the critical security decisions in our target class of systems are made by humans.

We provide empirical evidence for our behavioral model through a controlled subject experiment. We then show that behavioral decision making leads to a suboptimal pattern of resource allocation compared to non-behavioral decision making. We illustrate the effects of behavioral decision making using two representative real-world interdependent CPS. In particular, we identify the effects of the defenders' security budget availability and distribution, the degree of interdependency among defenders, and collaborative defense strategies, on the degree of suboptimality of security outcomes due to behavioral decision making. In this context, the adverse effects of behavioral decision making are most severe with moderate defense budgets. Moreover, the impact of behavioral suboptimal decision making is magnified as the degree of the interdependency between subnetworks belonging to different defenders increases. We also observe that selfish defense decisions together with behavioral decisions significantly increase security risk.

In this paper, we characterize the content and discourse on BitChute, a social video-hosting platform. Launched in 2017 as an alternative to YouTube, BitChute joins an ecosystem of alternative, low content moderation platforms, including Gab, Voat, Minds, and 4chan. Uniquely, BitChute is the first of these alternative platforms to focus on video content and is growing in popularity. Our analysis reveals several key characteristics of the platform. We find that only a handful of channels receive any engagement, and almost all of those channels contain conspiracies or hate speech. This high rate of hate speech on the platform as a whole, much of which is anti-Semitic, is particularly concerning. Our results suggest that BitChute has a higher rate of hate speech than Gab but less than 4chan. Lastly, we find that while some BitChute content producers have been banned from other platforms, many maintain profiles on mainstream social media platforms, particularly YouTube. This paper contributes a first look at the content and discourse on BitChute and provides a building block for future research on low content moderation platforms.

The fact that image datasets are often imbalanced poses an intense challenge for deep learning techniques. In this paper, we propose a method to restore the balance in imbalanced images by coalescing two concurrent methods: generative adversarial networks (GANs) and capsule networks. In our model, the generative and discriminative networks play a novel competitive game, in which the generator generates samples towards specific classes from a multivariate probability distribution. The discriminator of our model is designed in a way that, while recognizing real and fake samples, it is also required to assign classes to its inputs. Since GAN approaches require fully observed data during training, they might generate similar samples when the training samples are imbalanced, leading to overfitting. This problem is addressed by jointly providing all the available information from both class components in the adversarial training. This improves learning from imbalanced data by incorporating the majority distribution structure in the generation of new minority samples. Furthermore, the generator is trained with a feature-matching loss function to improve training convergence. In addition, it prevents the generation of outliers and does not affect the majority class space. The evaluations show the effectiveness of our proposed methodology; in particular, the coalescing of capsule-GAN is effective at recognizing highly overlapping classes with far fewer parameters compared with the convolutional-GAN.

Word meaning has different aspects, while existing word representations "compress" these aspects into a single vector, requiring further analysis to recover the information in different dimensions. Inspired by quantum probability, we represent words as density matrices, which are inherently capable of representing mixed states. Experiments show that the density matrix representation can effectively capture different aspects of word meaning while maintaining comparable reliability with the vector representation. Furthermore, we propose a novel method to combine coherent summation and incoherent summation in the computation of both vectors and density matrices. It achieves consistent improvement on the word analogy task.
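As a toy illustration of the representation: one standard way to form a density matrix is a probability-weighted mixture of outer products of unit vectors (here hypothetically standing in for word senses); the paper's actual construction and training procedure may differ:

```python
import numpy as np

def density_matrix(sense_vectors, probs):
    """Build a word's density matrix as a probability-weighted mixture of
    outer products of unit-normalized 'sense' vectors (illustrative
    construction, not the paper's). The result is symmetric, positive
    semidefinite, and has unit trace -- the defining properties of a
    density matrix representing a mixed state."""
    dim = sense_vectors.shape[1]
    rho = np.zeros((dim, dim))
    for v, p in zip(sense_vectors, probs):
        u = v / np.linalg.norm(v)
        rho += p * np.outer(u, u)
    return rho
```

Because each outer product is rank one, the rank of the mixture reflects how many distinct aspects contribute, which is exactly the extra expressiveness over a single vector.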

In this paper, we propose a deep learning framework, TSception, for emotion detection from electroencephalogram (EEG). TSception consists of temporal and spatial convolutional layers, which learn discriminative representations in the time and channel domains simultaneously. The temporal learner consists of multi-scale 1D convolutional kernels whose lengths are related to the sampling rate of the EEG signal, which learns multiple temporal and frequency representations. The spatial learner takes advantage of the asymmetry property of emotion responses at the frontal brain area to learn the discriminative representations from the left and right hemispheres of the brain. In our study, a system is designed to study the emotional arousal in an immersive virtual reality (VR) environment. EEG data were collected from 18 healthy subjects using this system to evaluate the performance of the proposed deep learning network for the classification of low and high emotional arousal states. The proposed method is compared with SVM, EEGNet, and LSTM. TSception achieves a high classification accuracy of 86.03%, which outperforms the prior methods significantly (p<0.05). The code is available at https://github.com/deepBrains/TSception

This paper reports a novel approach that uses transistor aging in an integrated circuit (IC) to detect hardware Trojans. When the supply voltage of a transistor is reduced from an initial higher voltage to a smaller voltage, it causes short-term aging in the transistor. When stressing the transistor, the higher supply voltage produces a larger aging-induced threshold voltage. Transitioning the supply voltage to a smaller value at this higher threshold voltage reduces the transistor current. This results in a longer switching delay through the transistor than standalone voltage scaling would cause. The increased delay produces timing violations that manifest as timing errors at the output of the IC during its operation. We present experiments using aging-aware standard cell libraries to illustrate the usefulness of the technique in detecting hardware Trojans. Combining IC aging with over-clocking produces a pattern of bit errors at the IC output via the induced timing violations. We use machine learning to learn the bit error distribution at the output of a clean IC, and detect a Trojan by the divergence of the observed bit-error pattern from this baseline distribution. We simulate the golden IC and show robustness to IC-to-IC manufacturing variations. The approach is effective and can detect a Trojan even if it is planted far off the critical paths. Simulation results on benchmarks from Trust-hub show a Trojan detection accuracy of over 99%.

The chance-constrained knapsack problem is a variant of the classical knapsack problem where each item has a weight distribution instead of a deterministic weight. The objective is to maximize the total profit of the selected items under the condition that the weight of the selected items exceeds the given weight bound only with a small probability $\alpha$. In this paper, we consider problem-specific single-objective and multi-objective approaches for the problem. We examine the use of heavy-tail mutations and introduce a problem-specific crossover operator to deal with the chance-constrained knapsack problem. Empirical results for single-objective evolutionary algorithms show the effectiveness of our operators compared to the use of classical operators. Moreover, we introduce a new effective multi-objective model for the chance-constrained knapsack problem. We use this model in combination with the problem-specific crossover operator in multi-objective evolutionary algorithms to solve the problem. Our experimental results show that this leads to significant performance improvements when using the approach in evolutionary multi-objective algorithms such as GSEMO and NSGA-II.
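A rough sketch of a heavy-tailed mutation operator in the spirit described above; the power-law exponent, the cap on the mutation strength, and the bit-flip rule are illustrative assumptions, not the paper's exact operator:

```python
import random

def heavy_tail_mutation(bits, beta=1.5, rng=random):
    """Heavy-tailed mutation sketch: draw a mutation strength k from a
    power-law distribution P(k) ~ k^(-beta) over k = 1..n/2, then flip
    each bit independently with probability k/n. Large k is rare but not
    vanishingly so, letting the algorithm occasionally make big jumps
    (hypothetical parameter choices)."""
    n = len(bits)
    ks = list(range(1, max(2, n // 2) + 1))
    weights = [k ** (-beta) for k in ks]
    k = rng.choices(ks, weights=weights)[0]
    return [b ^ (rng.random() < k / n) for b in bits]
```

Compared to standard bit-flip mutation (fixed rate 1/n), the heavy tail gives a non-negligible chance of flipping many bits at once, which helps escape local optima.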

Several emerging technologies for byte-addressable non-volatile memory (NVM) have been considered to replace DRAM as the main memory in computer systems in recent years. The lower write endurance of NVM technologies like Phase-Change Memory (PCM) or Ferroelectric RAM (FeRAM), compared to DRAM, has been addressed in the literature. As a solution, in-memory wear-leveling techniques have been proposed, which aim to balance the wear level over all memory cells to achieve an increased memory lifetime. Generally, to apply such advanced aging-aware wear-leveling techniques proposed in the literature, additional special hardware is introduced into the memory system to provide the necessary information about the cell age and thus enable aging-aware wear-leveling decisions.

This paper proposes software-only aging-aware wear-leveling based on common CPU features, without relying on any additional hardware support from the memory subsystem. Specifically, we exploit the memory management unit (MMU), performance counters, and interrupts to approximate memory write counts as an aging indicator. Although the software-only approach may lead to slightly worse wear-leveling, it is applicable on commonly available hardware. We achieve page-level coarse-grained wear-leveling by approximating the current cell age through statistical sampling and performing physical memory remapping through the MMU. This method results in non-uniform memory usage patterns within a memory page. Hence, we further propose fine-grained wear-leveling in the stack region of compiled C/C++ software.
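A toy sketch of the coarse-grained remapping step: given sampled write-count estimates per physical page, swap the logical pages mapped to the most- and least-worn frames. This is a hypothetical simplification for illustration; the actual MMU-based mechanism and remapping policy are more involved:

```python
def remap_hottest(write_counts, mapping):
    """Coarse-grained wear-leveling sketch: swap the logical pages mapped
    to the most- and least-worn physical pages.

    write_counts: approximate per-physical-page write counts (e.g., from
        statistical sampling of performance counters).
    mapping: dict logical_page -> physical_page, mutated in place.
    """
    hot = max(write_counts, key=write_counts.get)
    cold = min(write_counts, key=write_counts.get)
    inverse = {phys: log for log, phys in mapping.items()}
    # Redirect the write-heavy logical page to the least-worn frame.
    mapping[inverse[hot]], mapping[inverse[cold]] = cold, hot
    return mapping
```

Repeating this periodically spreads writes across frames, which is the essence of balancing the wear level over all cells.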

By applying both wear-leveling techniques, we achieve up to $78.43\%$ of the ideal memory lifetime, which is a lifetime improvement of more than a factor of $900$ compared to the lifetime without any wear-leveling.

Deep unsupervised representation learning has recently led to new approaches in the field of Unsupervised Anomaly Detection (UAD) in brain MRI. The main principle behind these works is to learn a model of normal anatomy by learning to compress and recover healthy data. This makes it possible to spot abnormal structures from erroneous recoveries of compressed, potentially anomalous samples. The concept is of great interest to the medical image analysis community as it i) removes the need for vast amounts of manually segmented training data---a necessity for and pitfall of current supervised Deep Learning---and ii) theoretically allows the detection of arbitrary, even rare pathologies which supervised approaches might fail to find. To date, the experimental design of most works hinders a valid comparison, because i) they are evaluated against different datasets and different pathologies, ii) they use different image resolutions and iii) different model architectures with varying complexity. The intent of this work is to establish comparability among recent methods by utilizing a single architecture, a single resolution and the same dataset(s). Besides providing a ranking of the methods, we also try to answer questions such as i) how many healthy training subjects are needed to model normality and ii) whether the reviewed approaches are also sensitive to domain shift. Further, we identify open challenges and provide suggestions for future community efforts and research directions.

Natural language inference (NLI) and semantic textual similarity (STS) are key tasks in natural language understanding (NLU). Although several benchmark datasets for those tasks have been released in English and a few other languages, there are no publicly available NLI or STS datasets in the Korean language. Motivated by this, we construct and release new datasets for Korean NLI and STS, dubbed KorNLI and KorSTS, respectively. Following previous approaches, we machine-translate existing English training sets and manually translate development and test sets into Korean. To accelerate research on Korean NLU, we also establish baselines on KorNLI and KorSTS. Our datasets are made publicly available via our GitHub repository.

Automatic image-based food recognition is a particularly challenging task. Traditional image analysis approaches have achieved low classification accuracy in the past, whereas deep learning approaches enabled the identification of food types and their ingredients. The contents of food dishes are typically deformable objects, usually including complex semantics, which makes the task of defining their structure very difficult. Deep learning methods have already shown very promising results in such challenges, so this chapter focuses on the presentation of some popular approaches and techniques applied in image-based food recognition. The three main lines of solutions, namely the design from scratch, the transfer learning and the platform-based approaches, are outlined, particularly for the task at hand, and are tested and compared to reveal the inherent strengths and weaknesses. The chapter is complemented with basic background material, a section devoted to the relevant datasets that are crucial in light of the empirical approaches adopted, and some concluding remarks that underline the future directions.

Despite rapid advances in image-based machine learning, the threat identification of a knife-wielding attacker has not garnered substantial academic attention. This research gap is surprising given the high knife assault rate (>100,000 annually) and the increasing availability of public video surveillance to analyze and forensically document. We present three complementary methods for scoring automated threat identification using multiple knife image datasets, each with the goal of narrowing down possible assault intentions while minimizing misidentifying false positives and risky false negatives. To alert an observer to a knife-wielding threat, we test and deploy classification built around MobileNet in a sparse and pruned neural network with a small memory requirement (<2.2 megabytes) and 95% test accuracy. Second, we train a detection algorithm (MaskRCNN) to segment the hand from the knife in a single image and assign probable certainty to their relative location. This segmentation accomplishes not only localization with bounding boxes but also yields relative positions from which to infer overhand threats. A final model built on the PoseNet architecture assigns anatomical waypoints or skeletal features to narrow the threat characteristics and reduce misunderstood intentions. We further identify and supplement existing data gaps that might blind a deployed knife threat detector, such as collecting innocuous hand and fist images as important negative training sets. One original research contribution is this systematic survey of timely and readily available image-based alerts, automated on commodity hardware and software, to task and prioritize crime-prevention countermeasures prior to a tragic outcome.

We study the attribution problem [28] for deep networks applied to perception tasks. For vision tasks, attribution techniques attribute the prediction of a network to the pixels of the input image. We propose a new technique called \emph{Blur Integrated Gradients}. This technique has several advantages over other methods. First, it can tell at what scale a network recognizes an object. It produces scores in the scale/frequency dimension that we find capture interesting phenomena. Second, it satisfies the scale-space axioms [14], which imply that it employs perturbations that are free of artifacts. We therefore produce explanations that are cleaner and consistent with the operation of deep networks. Third, it eliminates the need for a 'baseline' parameter for Integrated Gradients [31] for perception tasks. This is desirable because the choice of baseline has a significant effect on the explanations. We compare the proposed technique against previous techniques and demonstrate application on three tasks: ImageNet object recognition, Diabetic Retinopathy prediction, and AudioSet audio event identification.
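A rough NumPy sketch of the idea in a discrete difference form, accumulating input gradients along a path of increasingly blurred inputs; the box blur, the toy gradient function, and the discretization are illustrative stand-ins for the paper's Gaussian scale-space formulation:

```python
import numpy as np

def box_blur(x, width):
    """Simple 1D box blur with reflect padding (odd widths only); a
    stand-in for the Gaussian scale-space blur of the actual method."""
    if width <= 1:
        return x
    pad = width // 2
    xp = np.pad(x, pad, mode="reflect")
    kernel = np.ones(width) / width
    return np.convolve(xp, kernel, mode="valid")

def blur_integrated_gradients(x, grad_fn, widths):
    """Blur IG sketch (difference form): sum grad(x_k) * (x_k - x_{k+1})
    along a path of increasingly blurred inputs x_k. grad_fn(z) returns
    the model's gradient with respect to its input at z. Illustrative
    only; widths must be increasing with widths[0] == 1 (no blur)."""
    path = [box_blur(x, w) for w in widths]
    attr = np.zeros_like(x)
    for k in range(len(path) - 1):
        attr += grad_fn(path[k]) * (path[k] - path[k + 1])
    return attr
```

For a linear model the attributions telescope, so their sum equals the difference between the model's scores on the original and most-blurred inputs, which mirrors the completeness property of integrated-gradients methods.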

There exist several inherent trade-offs in designing a fair model, such as those between the model's predictive performance and fairness, or even among different notions of fairness. In practice, exploring these trade-offs requires significant human and computational resources. We propose a diagnostic that enables practitioners to explore these trade-offs without training a single model. Our work hinges on the observation that many widely-used fairness definitions can be expressed via the fairness-confusion tensor, an object obtained by splitting the traditional confusion matrix according to protected data attributes. Optimizing accuracy and fairness objectives directly over the elements in this tensor yields a data-dependent yet model-agnostic way of understanding several types of trade-offs. We further leverage this tensor-based perspective to generalize existing theoretical impossibility results to a wider range of fairness definitions. Finally, we demonstrate the usefulness of the proposed diagnostic on synthetic and real datasets.
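As an illustration of the object described, one can split the binary confusion matrix by protected group and read fairness metrics directly off the resulting tensor; the names and the demographic-parity example below are illustrative, and the paper covers many more fairness definitions:

```python
import numpy as np

def fairness_confusion_tensor(y_true, y_pred, groups):
    """Build the fairness-confusion tensor: one 2x2 confusion matrix per
    protected group (rows: true label, cols: predicted label)."""
    tensor = {}
    for g in sorted(set(groups)):
        m = np.zeros((2, 2), dtype=int)
        for yt, yp, gg in zip(y_true, y_pred, groups):
            if gg == g:
                m[yt, yp] += 1
        tensor[g] = m
    return tensor

def demographic_parity_gap(tensor):
    """Absolute difference in positive-prediction rates between two
    groups, computed directly from the tensor's entries."""
    rates = [m[:, 1].sum() / m.sum() for m in tensor.values()]
    return abs(rates[0] - rates[1])
```

Because metrics like demographic parity, equalized odds, and predictive parity are all functions of this tensor's entries, one can optimize over the tensor itself without training any model, which is the diagnostic's key observation.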

Conversational AI assistants are becoming popular, and question answering is an important part of any conversational assistant. Using relevant utterances as features in question answering has been shown to improve both the precision and recall of retrieving the right answer. Hence, utterance generation has become an important problem, with the goal of generating relevant utterances (sentences or phrases) from a knowledge-base article consisting of a title and a description. However, generating good utterances usually requires substantial manual effort, creating the need for automated utterance generation. In this paper, we propose an utterance generation system which 1) uses extractive summarization to extract important sentences from the description, 2) uses multiple paraphrasing techniques to generate a diverse set of paraphrases of the title and summary sentences, and 3) selects good candidate paraphrases with the help of a novel candidate selection algorithm.

In this paper, we present TailorNet, a neural model which predicts clothing deformation in 3D as a function of three factors: pose, shape and style (garment geometry), while retaining wrinkle detail. This goes beyond prior models, which are either specific to one style and shape, or generalize to different shapes but produce smooth results, despite being style specific. Our hypothesis is that (even non-linear) combinations of examples smooth out high-frequency components such as fine wrinkles, which makes learning the three factors jointly hard. At the heart of our technique is a decomposition of deformation into a high-frequency and a low-frequency component. While the low-frequency component is predicted from pose, shape and style parameters with an MLP, the high-frequency component is predicted with a mixture of shape-style-specific pose models. The weights of the mixture are computed with a narrow-bandwidth kernel to guarantee that only predictions with similar high-frequency patterns are combined. The style variation is obtained by computing, in a canonical pose, a subspace of deformation which satisfies physical constraints such as inter-penetration and draping on the body. TailorNet delivers 3D garments which retain the wrinkles of the physics-based simulation (PBS) it is learned from, while running more than 1000 times faster. In contrast to PBS, TailorNet is easy to use and fully differentiable, which is crucial for computer vision algorithms. Several experiments demonstrate that TailorNet produces more realistic results than prior work, and even generates temporally coherent deformations on sequences of the AMASS dataset, despite being trained on static poses from a different dataset. To stimulate further research in this direction, we will make a dataset consisting of 55800 frames, as well as our model, publicly available at https://virtualhumans.mpi-inf.mpg.de/tailornet.

Graph burning runs in discrete time steps. The aim of the graph burning problem is to burn all the vertices of a given graph in the fewest time steps; this minimum number of steps is known as the burning number of the graph. Graph burning can model the spread of social influence, an alarm, or a social contagion: the smaller the burning number, the faster the spread.

Computationally, graph burning is hard. It has already been proved that burning path forests, spider graphs, and trees with maximum degree three is NP-complete. In this work, we study graph burning on geometric graphs and show NP-completeness results for several subclasses. More precisely, we show the burning problem to be NP-complete on interval graphs, permutation graphs, and disk graphs.
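To make the definition concrete, a brute-force burning-number computation for tiny graphs (exponential, purely illustrative; the abstract's results concern hardness, not algorithms) can use the observation that a sequence of sources (x1, ..., xk) burns the graph iff every vertex lies within distance k−i of some source xi:

```python
from itertools import permutations

def burning_number(adj):
    """Brute-force burning number for a small graph given as an
    adjacency list {v: [neighbours]}.  Source xi, placed at step i,
    has k - i spreading steps left in a k-step schedule, so it covers
    the ball of radius k - i around itself."""
    def bfs_dist(src):
        # Unweighted shortest-path distances from src via BFS.
        dist, frontier = {src: 0}, [src]
        while frontier:
            nxt = []
            for u in frontier:
                for w in adj[u]:
                    if w not in dist:
                        dist[w] = dist[u] + 1
                        nxt.append(w)
            frontier = nxt
        return dist

    dists = {v: bfs_dist(v) for v in adj}
    nodes = list(adj)
    for k in range(1, len(nodes) + 1):
        # Try every ordered choice of k sources.
        for seq in permutations(nodes, k):
            if all(any(dists[x].get(v, float("inf")) <= k - 1 - i
                       for i, x in enumerate(seq)) for v in nodes):
                return k
    return len(nodes)
```

On a path with four vertices this returns 2: one source placed at an inner vertex covers a radius-1 ball, and a second source ignites the remaining endpoint, matching the known value ⌈√n⌉ for paths.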