# cs updates on arXiv.org

## Computer Science (cs) updates on the arXiv.org e-print archive



We propose exact count formulae for the 21 topologically distinct non-induced connected subgraphs on five nodes, in simple, unweighted and undirected graphs. We prove the main result using short and purely combinatorial arguments that can be adapted to derive count formulae for larger subgraphs. To illustrate, we give analytic results for some regular graphs, and present a short empirical application on real-world network data. We also discuss the well-known result that induced subgraph counts follow as linear combinations of non-induced counts.
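The flavour of such count formulae can be illustrated on the two smallest connected subgraphs. The sketch below is a toy illustration (not the paper's five-node formulae): it counts non-induced 3-node paths from the degree sequence and triangles from the trace of A^3; the example graph (a 4-cycle with one chord) is ours, not from the paper.

```python
import numpy as np
from math import comb

def count_p3(A):
    """Non-induced paths on 3 nodes: sum over vertices of C(deg(v), 2)."""
    degrees = A.sum(axis=1)
    return sum(comb(int(d), 2) for d in degrees)

def count_triangles(A):
    """Triangles: trace(A^3) / 6, since each triangle yields 6 closed walks."""
    return int(np.trace(np.linalg.matrix_power(A, 3)) // 6)

# "Diamond" graph: K4 minus the edge 0-2.
A = np.array([[0, 1, 0, 1],
              [1, 0, 1, 1],
              [0, 1, 0, 1],
              [1, 1, 1, 0]])
```

Here the degree sequence is (2, 3, 2, 3), giving 1 + 3 + 1 + 3 = 8 non-induced paths, and the two triangles {0,1,3} and {1,2,3}.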

There is an increasing focus on model-based dialog evaluation metrics such as ADEM, RUBER, and the more recent BERT-based metrics. These models aim to assign a high score to all relevant responses and a low score to all irrelevant responses. Ideally, such models should be trained using multiple relevant and irrelevant responses for any given context. However, no such data is publicly available, and hence existing models are usually trained using a single relevant response and multiple randomly selected responses from other contexts (random negatives). To allow for better training and robust evaluation of model-based metrics, we introduce the DailyDialog++ dataset, consisting of (i) five relevant responses for each context and (ii) five adversarially crafted irrelevant responses for each context. Using this dataset, we first show that even in the presence of multiple correct references, n-gram based metrics and embedding based metrics do not perform well at separating relevant responses from even random negatives. While model-based metrics perform better than n-gram and embedding based metrics on random negatives, their performance drops substantially when evaluated on adversarial examples. To check if large scale pretraining could help, we propose a new BERT-based evaluation metric called DEB, which is pretrained on 727M Reddit conversations and then finetuned on our dataset. DEB significantly outperforms existing models, showing better correlation with human judgements and better performance on random negatives (88.27% accuracy). However, its performance again drops substantially when evaluated on adversarial responses, thereby highlighting that even large-scale pretrained evaluation models are not robust to the adversarial examples in our dataset. The dataset and code are publicly available.

Research in vertebral bone micro-structure generally requires costly procedures to obtain physical scans of real bone with a specific pathology under study, since no methods are available yet to generate realistic bone structures in-silico. Here we propose to apply recent advances in generative adversarial networks (GANs) to develop such a method. We adapted style-transfer techniques, which have been widely used in other contexts, to transfer style between image pairs while preserving their informational content. In a first step, we trained a volumetric generative model in a progressive manner using a Wasserstein objective and gradient penalty (PWGAN-GP) to create patches of realistic bone structure in-silico. The training set contained 7660 purely spongeous bone samples from twelve human vertebrae (T12 or L1) with an isotropic resolution of 164 um, scanned with a high-resolution peripheral quantitative CT (Scanco XCT). After training, we generated new samples with tailored micro-structure properties by optimizing a vector z in the learned latent space. To solve this optimization problem, we formulated a differentiable goal function that leads to valid samples while balancing the appearance (content) against target 3D properties (style). Properties of the learned latent space effectively matched the data distribution. Furthermore, we were able to simulate the resulting bone structure after deterioration or treatment effects of osteoporosis therapies based only on expected changes of micro-structural parameters. Our method makes it possible to generate a virtually infinite number of patches of realistic bone micro-structure, and can thereby serve in the development of bone biomarkers and in simulating bone therapies in advance.
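The latent-space step can be sketched in miniature. In the toy below, the trained generator and the content/style losses are stand-ins (a fixed random linear map and a least-squares target match); only the overall pattern, gradient descent on a latent vector z under a differentiable goal function, mirrors the abstract.

```python
import numpy as np

rng = np.random.default_rng(0)
G = rng.standard_normal((8, 4))      # stand-in "generator": z -> sample
target = rng.standard_normal(8)      # desired micro-structural properties

def loss_and_grad(z):
    """Differentiable goal function: squared distance to the target."""
    residual = G @ z - target
    return 0.5 * residual @ residual, G.T @ residual

z = np.zeros(4)
for _ in range(2000):                # plain gradient descent on z
    _, grad = loss_and_grad(z)
    z -= 0.05 * grad
```

After the loop, z is (approximately) the latent vector whose generated sample best matches the target properties under this toy loss.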

A variant of the index coding problem (ICP), the embedded index coding problem (EICP), was introduced in [A. Porter and M. Wootters, "Embedded index coding," ITW, Sweden, 2019], motivated by its application in distributed computing where every user can act as a sender for other users, and an algorithm for code construction was reported. The construction depends on the computation of the minrank of a matrix, which is computationally intensive. In [A. Mahesh, N. Sageer Karat and B. S. Rajan, "Min-rank of Embedded Index Coding Problems," ISIT, 2020], a notion of side-information matrix was introduced for the EICP and it was proved that the length of an optimal scalar linear index code is equal to the min-rank of the side-information matrix. The authors also provided an explicit code construction for a class of EICP - \textit{Consecutive and Symmetric Embedded Index Coding Problems (CS-EICP)}. We introduce the idea of sub-packetization of the messages in index coding problems to provide a novel code construction for CS-EICP, in contrast to the scalar linear solutions provided in the prior works. For CS-EICP, the normalized rate, defined as the number of bits transmitted by all the users together normalized by the total number of bits of all the messages, is lower for our construction than the normalized rate achieved by Mahesh et al. for scalar linear codes.
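The basic index coding economy, side information letting one coded transmission replace several uncoded ones, can be seen in the smallest classical example (this is the textbook XOR scheme, not the CS-EICP construction of the paper): three users each want one message and already hold the other two.

```python
# Messages x1, x2, x3 (4-bit values); each user i wants m[i] and
# holds the other two messages as side information.
m = [0b1011, 0b0110, 0b1100]
broadcast = m[0] ^ m[1] ^ m[2]       # one coded transmission instead of three

def decode(side_information):
    """XOR the broadcast with the side information to recover the message."""
    out = broadcast
    for s in side_information:
        out ^= s
    return out

assert decode([m[1], m[2]]) == m[0]
```

A single broadcast thus serves all three users, a rate of 1/3 transmission per message versus 1 uncoded.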

In this work we study a variant of the well-known multi-armed bandit (MAB) problem, which has the properties of a delay in feedback, and a loss that declines over time. We introduce an algorithm, EXP4-DFDC, to solve this MAB variant, and demonstrate that the regret vanishes as the time increases. We also show that LeCaR, a previously published machine learning-based cache replacement algorithm, is an instance of EXP4-DFDC. Our results can be used to provide insight on the choice of hyperparameters, and optimize future LeCaR instances.

The Gibbs Sampler is a general method for sampling high-dimensional distributions, dating back to 1971 [Turchin1971]. In each step, we pick a random coordinate and re-sample that coordinate from the distribution induced by fixing all other coordinates. While it has become widely used over the past half-century, guarantees of efficient convergence have been elusive. Here we show that for convex bodies in $\mathbb{R}^{n}$ with diameter $D$, the resulting Coordinate Hit-and-Run (CHAR) algorithm mixes in poly$(n,D)$ steps. This is the first polynomial guarantee for this widely-used algorithm. We also give a lower bound on the mixing rate, showing that it is strictly worse than hit-and-run or the ball walk in the worst case.
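As a concrete toy instance of the scheme, the sketch below runs Coordinate Hit-and-Run on the unit ball in R^3: at each step a random coordinate is resampled uniformly along the chord through the current point (for a convex body, the induced one-dimensional distribution of the uniform measure is uniform on that chord). The dimension, step count, and starting point are illustrative choices, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
n, steps = 3, 2000
x = np.zeros(n)                      # start at the center of the ball
samples = []
for _ in range(steps):
    i = rng.integers(n)              # pick a random coordinate
    rest = x.copy()
    rest[i] = 0.0
    half = np.sqrt(max(0.0, 1.0 - rest @ rest))   # chord half-length
    x[i] = rng.uniform(-half, half)  # resample coordinate i along the chord
    samples.append(x.copy())
samples = np.asarray(samples)
```

Every iterate stays inside the ball by construction; the mixing question the abstract addresses is how quickly such iterates approach the uniform distribution.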

While being an essential component of spoken language, fillers (e.g., "um" or "uh") often remain overlooked in Spoken Language Understanding (SLU) tasks. We explore the possibility of representing them with deep contextualised embeddings, showing improvements on modelling spoken language and on two downstream tasks: predicting a speaker's stance and expressed confidence.

In this work, we propose a multi-stage training strategy for the development of deep learning algorithms applied to problems with multiscale features. Each stage of the proposed strategy shares an (almost) identical network structure and predicts the same reduced order model of the multiscale problem. The output of the previous stage is combined with an intermediate layer of the current stage. We numerically show that using different reduced order models as inputs to each stage can improve the training, and we propose several ways of adding different information into the systems. These methods include mathematical multiscale model reductions and network approaches; we found that the mathematical approach is a systematic way of decoupling information and gives the best results. We finally verified our training methodology on a time-dependent nonlinear problem and a steady-state model.

We consider the problem of relative pose regression in visual relocalization. Recently, several promising approaches have emerged in this area. We claim that even though these approaches are demonstrated on the same datasets, with the same train/test splits, a faithful comparison between them has not been available: under the currently used evaluation metric, some approaches may appear to perform favorably while in reality performing worse. We reveal a tradeoff between accuracy and the 3D volume of the regressed subspace. We believe that, unlike other relocalization approaches, in the case of relative pose regression the regressed subspace's 3D volume is less dependent on the scene and more affected by the method used to score the overlap, which determines how closely viewpoints are sampled. We propose three new metrics to remedy the issue mentioned above. The proposed metrics incorporate statistics about the regression subspace volume. We also propose a new pose regression network that serves as a new baseline for this task. We compare the performance of our trained model on the Microsoft 7-Scenes and Cambridge Landmarks datasets with both the standard metrics and the newly proposed metrics, and adjust the overlap score to reveal the tradeoff between the subspace and performance. The results show that the proposed metrics are more robust to different overlap thresholds than the conventional approaches. Finally, we show that our network generalizes well; specifically, training on a single scene leads to little loss of performance on the other scenes.

This paper presents an optimization-based collision avoidance trajectory generation method for autonomous driving in free-space environments, with enhanced robustness, driving comfort and efficiency. Starting from the hybrid optimization-based framework, we introduce two warm start methods, temporal and dual variable warm starts, to improve the efficiency. We also reformulate the problem to improve the robustness and efficiency. We name this new algorithm TDR-OBCA. With these changes, compared with the original hybrid optimization we achieve a 96.67% decrease in failure rate with respect to initial conditions, a 13.53% increase in driving comfort and a 3.33% to 44.82% increase in planner efficiency as the number of obstacles scales. We validate our results in hundreds of simulation scenarios and hundreds of hours of public road tests in both the U.S. and China. Our source code is available at https://github.com/ApolloAuto/apollo.

Machine learning is rapidly becoming one of the most important technologies for malware traffic detection, since the continuous evolution of malware requires constant adaptation and the ability to generalize. However, network traffic datasets are usually oversized and contain redundant and irrelevant information, which may dramatically increase the computational cost and decrease the accuracy of most classifiers, with the risk of introducing further noise.

We propose two novel dataset optimization strategies which exploit and combine several state-of-the-art approaches in order to achieve an effective optimization of the network traffic datasets used to train malware detectors. The first approach is a feature selection technique based on mutual information measures and sensibility enhancement. The second is a dimensionality reduction technique based on autoencoders. Both approaches have been experimentally applied to the MTA-KDD'19 dataset, and the optimized results were evaluated and compared using a Multi Layer Perceptron as the machine learning model for malware detection.

Constrained Markov Decision Processes (CMDPs) formalize sequential decision-making problems whose objective is to minimize a cost function while satisfying constraints on various cost functions. In this paper, we consider the setting of episodic fixed-horizon CMDPs. We propose an online algorithm which leverages the linear programming formulation of finite-horizon CMDP for repeated optimistic planning to provide a probably approximately correct (PAC) guarantee on the number of episodes needed to ensure an $\epsilon$-optimal policy, i.e., with resulting objective value within $\epsilon$ of the optimal value and satisfying the constraints within $\epsilon$-tolerance, with probability at least $1-\delta$. The number of episodes needed is shown to be of the order $\tilde{\mathcal{O}}\big(\frac{|S||A|C^{2}H^{2}}{\epsilon^{2}}\log\frac{1}{\delta}\big)$, where $C$ is the upper bound on the number of possible successor states for a state-action pair. Therefore, if $C \ll |S|$, the number of episodes needed has a linear dependence on the state and action space sizes $|S|$ and $|A|$, respectively, and a quadratic dependence on the time horizon $H$.

In this work we introduce a method for estimating the entropy rate and entropy production rate from finite symbolic time series. From the point of view of statistics, estimating entropy from a finite series can be interpreted as a problem of estimating parameters of a distribution with a censored or truncated sample. We use this point of view to give estimates of the entropy rate and entropy production rate, assuming that each is a parameter of a (limiting) distribution. The last statement is actually a consequence of the fact that the distribution of estimators coming from recurrence-time statistics complies with the central limit theorem. We test our method on a Markov chain model where these quantities can be exactly computed.
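A Markov chain is a natural test bed because its entropy rate is exactly computable. The sketch below uses a plain plug-in estimator from empirical transition counts (not the paper's recurrence-time estimator) on a two-state chain, where the exact rate is the stationary-weighted average of the row entropies; chain parameters and sample size are illustrative.

```python
import numpy as np

def h(p):
    """Binary entropy in nats."""
    return 0.0 if p in (0.0, 1.0) else -p * np.log(p) - (1 - p) * np.log(1 - p)

P = np.array([[0.9, 0.1],
              [0.4, 0.6]])             # transition matrix
pi = np.array([0.8, 0.2])              # its stationary distribution
exact = pi[0] * h(P[0, 0]) + pi[1] * h(P[1, 0])

rng = np.random.default_rng(2)
state, counts = 0, np.zeros((2, 2))
for _ in range(100_000):               # simulate and count transitions
    nxt = 1 if rng.random() < P[state, 1] else 0
    counts[state, nxt] += 1
    state = nxt
Phat = counts / counts.sum(axis=1, keepdims=True)
pihat = counts.sum(axis=1) / counts.sum()
estimate = pihat[0] * h(Phat[0, 0]) + pihat[1] * h(Phat[1, 0])
```

For a long enough trajectory the plug-in estimate concentrates around the exact entropy rate, consistent with the central-limit behaviour mentioned above.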

This document presents a detailed description of the challenge on clarifying questions for dialogue systems (ClariQ). The challenge is organized as part of the Conversational AI challenge series (ConvAI3) at the Search Oriented Conversational AI (SCAI) EMNLP workshop in 2020. The main aim of conversational systems is to return an appropriate answer in response to user requests. However, some user requests might be ambiguous. In IR settings such a situation is handled mainly through diversification of the search result page. It is, however, much more challenging in dialogue settings with limited bandwidth. Therefore, in this challenge, we provide a common evaluation framework to evaluate mixed-initiative conversations. Participants are asked to rank clarifying questions in information-seeking conversations. The challenge is organized in two stages: in Stage 1 we evaluate the submissions in an offline setting with single-turn conversations. Top participants of Stage 1 get the chance to have their models tested by human annotators.

The present paper is devoted to clustering geometric graphs. While standard spectral clustering is often not effective for geometric graphs, we present an effective generalization, which we call higher-order spectral clustering. It resembles the classical spectral clustering method in concept, but uses the eigenvector associated with a higher-order eigenvalue for partitioning. We establish the weak consistency of this algorithm for a wide class of geometric graphs which we call the Soft Geometric Block Model. A small adjustment of the algorithm provides strong consistency. We also show that our method is effective in numerical experiments even for graphs of modest size.
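The underlying mechanism, partitioning by the sign pattern of a Laplacian eigenvector, can be shown in a few lines. The toy below uses the classical Fiedler vector (the second-smallest eigenvalue); the paper's method instead selects an eigenvector of a higher-order eigenvalue, and the example graph (two 4-cliques joined by a bridge) is our own illustration.

```python
import numpy as np

# Two 4-node cliques joined by a single edge.
A = np.zeros((8, 8))
for block in (range(4), range(4, 8)):
    for i in block:
        for j in block:
            if i != j:
                A[i, j] = 1
A[3, 4] = A[4, 3] = 1                    # bridge between the cliques

L = np.diag(A.sum(axis=1)) - A           # graph Laplacian
eigvals, eigvecs = np.linalg.eigh(L)
fiedler = eigvecs[:, 1]                  # eigenvector of 2nd-smallest eigenvalue
labels = (fiedler > 0).astype(int)       # partition by sign
```

The sign pattern of the chosen eigenvector splits the two cliques; higher-order spectral clustering applies the same idea with a different eigenpair tuned to geometric graphs.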

Learning low-dimensional representations for entities and relations in knowledge graphs using contrastive estimation represents a scalable and effective method for inferring connectivity patterns. A crucial aspect of contrastive learning approaches is the choice of corruption distribution that generates hard negative samples, which force the embedding model to learn discriminative representations and find critical characteristics of observed data. Earlier methods either employ overly simple corruption distributions (i.e., uniform), yielding easy, uninformative negatives, or sophisticated adversarial distributions with challenging optimization schemes; in neither case do they explicitly incorporate known graph structure, resulting in suboptimal negatives. In this paper, we propose Structure Aware Negative Sampling (SANS), an inexpensive negative sampling strategy that utilizes the rich graph structure by selecting negative samples from a node's k-hop neighborhood. Empirically, we demonstrate that SANS finds high-quality negatives that are highly competitive with SOTA methods, and requires neither additional parameters nor difficult adversarial optimization.
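The k-hop sampling idea can be sketched with a BFS: corrupt a triple's tail with an entity drawn from the head's k-hop neighborhood, excluding the true tail. The toy graph and triple below are our own illustration, not SANS's actual implementation or data.

```python
import numpy as np
from collections import deque

edges = [(0, 1), (1, 2), (2, 3), (3, 4), (1, 5)]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

def k_hop(node, k):
    """All nodes within k hops of `node` (excluding the node itself), via BFS."""
    seen, frontier = {node}, deque([(node, 0)])
    while frontier:
        u, d = frontier.popleft()
        if d == k:
            continue
        for w in adj.get(u, ()):
            if w not in seen:
                seen.add(w)
                frontier.append((w, d + 1))
    return seen - {node}

rng = np.random.default_rng(3)
head, true_tail = 0, 1
candidates = sorted(k_hop(head, k=2) - {true_tail})   # structure-aware pool
negative_tail = int(rng.choice(candidates))
```

Negatives drawn this way are "close" to the head in graph distance, and so tend to be harder than uniform corruptions.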

We present an error analysis for the discontinuous Galerkin method applied to the discrete-ordinate discretization of the steady-state radiative transfer equation. Under some mild assumptions, we show that the DG method converges uniformly with respect to a scaling parameter $\varepsilon$ which characterizes the strength of scattering in the system. However, the rate is not optimal and can be polluted by the presence of boundary layers. In one-dimensional slab geometries, we demonstrate optimal convergence when boundary layers are not present and analyze a simple strategy for balancing interior and boundary-layer errors. Some numerical tests are also provided in this reduced setting.

The theory of integral quadratic constraints (IQCs) allows the certification of exponential convergence of interconnected systems containing nonlinear or uncertain elements. In this work, we adapt the IQC theory to study first-order methods for smooth and strongly-monotone games and show how to design tailored quadratic constraints to obtain tight upper bounds on convergence rates. Using this framework, we recover the existing bound for the gradient method~(GD), derive sharper bounds for the proximal point method~(PPM) and optimistic gradient method~(OG), and provide \emph{for the first time} a global convergence rate for the negative momentum method~(NM) with an iteration complexity $\mathcal{O}(\kappa^{1.5})$, which matches its known lower bound. In addition, for time-varying systems, we prove that the gradient method with optimal step size achieves the fastest provable worst-case convergence rate with quadratic Lyapunov functions. Finally, we further extend our analysis to stochastic games and study the impact of multiplicative noise on different algorithms. We show that it is impossible for an algorithm with one step of memory to achieve acceleration if it only queries the gradient once per batch (in contrast with the stochastic strongly-convex optimization setting, where such acceleration has been demonstrated). However, we exhibit an algorithm which achieves acceleration with two gradient queries per batch.

The problem of monotone missing data has been broadly studied during the last two decades and has many applications in different fields such as bioinformatics and statistics. Commonly used imputation techniques require multiple iterations through the data before converging. Moreover, those approaches may introduce extra noise and bias into the subsequent modeling. In this work, we derive exact formulas and propose a novel algorithm, EPEM, to compute the maximum likelihood estimators (MLEs) for a multiple-class, monotone missing dataset when the covariance matrices of all categories are assumed to be equal. We then illustrate an application of our proposed methods in Linear Discriminant Analysis (LDA). As the computation is exact, our EPEM algorithm does not require multiple iterations through the data as other imputation approaches do, and is thus promising to be much less time-consuming than other methods. This effectiveness was validated by empirical results, where EPEM reduced the error rates significantly and required a short computation time compared to several imputation-based approaches. We also release all the code and data of our experiments in a GitHub repository to contribute to the research community working on this problem.

Self-interference (SI) is considered a main challenge in full-duplex (FD) systems. Therefore, efficient SI cancelers are required for the effective deployment of FD systems in beyond-fifth-generation wireless networks. Existing methods for SI cancellation have mostly considered the polynomial representation of the SI signal at the receiver. These methods are shown to operate well in practice while requiring high computational complexity. Alternatively, neural networks (NNs) are envisioned as promising candidates for modeling the SI signal with reduced computational complexity. Consequently, in this paper, two novel low-complexity NN structures, referred to as the ladder-wise grid structure (LWGS) and moving-window grid structure (MWGS), are proposed. The core idea of these two structures is to mimic the non-linearity and memory effect introduced to the SI signal in order to achieve proper SI cancellation while exhibiting low computational complexity. The simulation results reveal that the LWGS and MWGS NN-based cancelers attain the same cancellation performance as the polynomial-based canceler while providing 49.87% and 34.19% complexity reductions, respectively.

Accurate forecasts of fine particulate matter (PM 2.5) from wildfire smoke are crucial to safeguarding cardiopulmonary public health. Existing forecasting systems are trained on sparse and inaccurate ground truths, and do not take sufficient advantage of important spatial inductive biases. In this work, we present a convolutional neural network which preserves sparsity invariance throughout, and leverages multitask learning to perform dense forecasts of PM 2.5 values. We demonstrate that our model outperforms two existing smoke forecasting systems during the 2018 and 2019 wildfire seasons in British Columbia, Canada, predicting PM 2.5 at a grid resolution of 10 km, 24 hours in advance with high fidelity. Most interestingly, our model also generalizes to meaningful smoke dispersion patterns despite training with irregularly distributed ground truth PM 2.5 values available in only 0.5% of grid cells.

Recently we introduced a model of symbiosis, Model-S, based on the evolution of seed patterns in Conway's Game of Life. In the model, the fitness of a seed pattern is measured by one-on-one competitions in the Immigration Game, a two-player variation of the Game of Life. This article examines the role of autopoiesis in determining fitness in Model-S. We connect our research on evolution, symbiosis, and autopoiesis with research on soups in the Game of Life community. A soup is a random initial pattern in a cellular automaton, such as the Game of Life. When a game begins, there is usually a flare of rapid change in the soup, resembling a fire spreading through a forest. Eventually the fire burns down and the remaining patterns are called ash. Ashes are stable, oscillating, and flying patterns, studied intensively by the Game of Life community for insights into the game. Ashes are instances of autopoietic structures. We use the apgsearch software (Ash Pattern Generator Search), developed for the study of ash, to analyze autopoiesis in Model-S. We find that the fitness of evolved seed patterns in Model-S is highly correlated with the diversity and quantity of autopoietic structures (ash).
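The soup-to-ash dynamics rest on the standard Game of Life update rule, which fits in a few lines of array code (this is just the basic rule on a wrapped grid, not apgsearch or Model-S itself; the blinker example is ours).

```python
import numpy as np

def life_step(grid):
    """One Game of Life update on a toroidal grid: a cell is alive next step
    if it has exactly 3 live neighbors, or is alive with exactly 2."""
    neighbors = sum(np.roll(np.roll(grid, dy, 0), dx, 1)
                    for dy in (-1, 0, 1) for dx in (-1, 0, 1)
                    if (dy, dx) != (0, 0))
    return ((neighbors == 3) | ((grid == 1) & (neighbors == 2))).astype(int)

grid = np.zeros((5, 5), dtype=int)
grid[2, 1:4] = 1                     # a "blinker": the simplest oscillating ash
after_one = life_step(grid)
after_two = life_step(after_one)     # back to the original orientation
```

Iterating such a step on a random soup until only stable, oscillating, and flying patterns remain is exactly the "fire burning down to ash" described above.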

I give a brief, non-technical, historical perspective on numerical analysis and optimization. I also touch on emerging trends and future challenges. This content is based on the short presentation that I made at the opening ceremony of \emph{The International Conference on Numerical Analysis and Optimization}, which was held at Sultan Qaboos University, Muscat, Oman, on January 6--9, 2020. Of course, the material covered here is necessarily incomplete and biased towards my own interests and comfort zones. My aim is to give a feel for how the area has developed over the past few decades and how it may continue.

The process of simultaneously mapping the environment in three-dimensional (3D) space and localizing a moving vehicle's pose (orientation and position) is termed Simultaneous Localization and Mapping (SLAM). SLAM is a core task in robotics applications. In the SLAM problem, both the vehicle's pose and the environment are assumed to be completely unknown. This paper takes the conventional SLAM design as a basis and proposes a novel approach that ensures fast adaptation of the nonlinear observer for SLAM. Because the true SLAM problem is nonlinear and is modeled on the Lie group $\mathbb{SLAM}_{n}\left(3\right)$, the proposed observer for SLAM is nonlinear and modeled on $\mathbb{SLAM}_{n}\left(3\right)$. The proposed observer compensates for the unknown bias attached to velocity measurements. Simulation results illustrate the robustness of the proposed approach.

We consider the method-of-moments approach to solve the Boltzmann equation of rarefied gas dynamics, which results in the following moment-closure problem: given a set of moments, find the underlying probability density function. The moment-closure problem has infinitely many solutions and requires an additional optimality criterion to single out a unique solution. Motivated by a discontinuous Galerkin velocity discretization, we consider an optimality criterion based upon L2-minimization. To ensure a positive solution to the moment-closure problem, we enforce positivity constraints on the L2-minimization. This results in a quadratic optimization problem with moment and positivity constraints. We show that a Courant-Friedrichs-Lewy (CFL)-type condition ensures both the feasibility of the optimization problem and the L2-stability of the moment approximation. Numerical experiments showcase the accuracy of our moment method.

Non-intrusive methods have been used for two decades to derive reduced-order models for geometrically nonlinear structures, with a particular emphasis on the so-called STiffness Evaluation Procedure (STEP), relying on the static application of prescribed displacements in a finite-element context. We show that a particularly slow convergence of the modal expansion is observed when applying the method with 3D elements, because of nonlinear couplings with very high frequency modes involving 3D thickness deformations. Focusing on the case of flat structures, we first show, by computing all the modes of the structure, that a converged solution can be exhibited by using either static condensation or normal form theory. We then show that static modal derivatives provide the same solution with fewer calculations. Finally, we propose a modified STEP, where the prescribed displacements are imposed solely on specific degrees of freedom of the structure, and show that this adjustment also efficiently provides a converged solution.

In the last decades, unsupervised deep learning based methods have attracted researchers' attention, since in many applications collecting a large number of training examples is not always feasible. Moreover, the construction of a good training set is time-consuming and hard, because the selected data have to be sufficiently representative for the task. In this paper, we mainly focus on the Deep Image Prior (DIP) framework, augmented with a Total Variation regularizer which promotes gradient-sparsity of the solution. Differently from other existing approaches, we solve the arising minimization problem using the well-known Alternating Direction Method of Multipliers (ADMM) framework, decoupling the contributions of the DIP $L_{2}$-norm and Total Variation terms. The promising performance of the proposed approach, in terms of PSNR and SSIM values, is demonstrated through experiments on different image restoration tasks, on synthetic as well as real data.
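Two of the pieces that ADMM decouples in such a TV-regularized problem are easy to write down: the forward-difference gradient that defines the (anisotropic) TV term, and the soft-thresholding proximal step that handles the L1 part in closed form. The sketch below shows both in isolation; it is a generic illustration of these building blocks, not the paper's full DIP-TV solver.

```python
import numpy as np

def grad2d(u):
    """Forward differences along each axis (the anisotropic TV transform)."""
    dx = np.diff(u, axis=1, append=u[:, -1:])
    dy = np.diff(u, axis=0, append=u[-1:, :])
    return dx, dy

def soft_threshold(v, tau):
    """Prox of tau*||.||_1: shrink each entry toward zero by tau."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

v = np.array([-2.0, -0.3, 0.0, 0.5, 3.0])
shrunk = soft_threshold(v, 1.0)      # -> [-1., 0., 0., 0., 2.]
```

Within an ADMM iteration, the shrinkage is applied to the (shifted) image gradients, while the remaining quadratic subproblem absorbs the DIP fitting term.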

In contrast with previous approaches where information flows only towards deeper layers of a stack, we consider a multi-pass transformer (MPT) architecture in which earlier layers are allowed to process information in light of the output of later layers. To maintain a directed acyclic graph structure, the encoder stack of a transformer is repeated along a new multi-pass dimension, keeping the parameters tied, and information is allowed to proceed unidirectionally both towards deeper layers within an encoder stack and towards any layer of subsequent stacks. We consider both soft (i.e., continuous) and hard (i.e., discrete) connections between parallel encoder stacks, relying on a neural architecture search to find the best connection pattern in the hard case. We perform an extensive ablation study of the proposed MPT architecture and compare it with other state-of-the-art transformer architectures. Surprisingly, a Base Transformer equipped with MPT can surpass the performance of a Large Transformer on the challenging En-De and En-Fr machine translation datasets. In the hard connection case, the optimal connection pattern found for En-De also leads to improved performance on En-Fr.

We investigate the problem of monitoring multiple targets using a single mobile sensor, with the goal of minimizing the maximum estimation error among all the targets over long time horizons. The sensor can move in a network-constrained structure, where it has to plan which targets to visit and for how long to dwell at each node. We prove that in an optimal observation time allocation, the peak uncertainty is the same among all the targets. By further restricting the agent policy to only visit each target once every cycle, we develop a scheme to optimize the agent's behavior that is significantly simpler computationally when compared to previous approaches for similar problems.

C/C++/OpenCL-based high-level synthesis (HLS) has become increasingly popular for field-programmable gate array (FPGA) accelerators in many application domains in recent years, thanks to its competitive quality of result (QoR) and short development cycle compared with the traditional register-transfer level (RTL) design approach. Yet, limited by the sequential C semantics, it remains challenging to adopt the same highly productive high-level programming approach in many other application domains, where coarse-grained tasks run in parallel and communicate with each other at a fine-grained level. While current HLS tools support task-parallel programs, the productivity is greatly limited in the code development, correctness verification, and QoR tuning cycles, due to the poor programmability, restricted software simulation, and slow code generation, respectively. Such limited productivity often defeats the purpose of HLS and hinders programmers from adopting HLS for task-parallel FPGA accelerators. In this paper, we extend the HLS C++ language and present a fully automated framework with programmer-friendly interfaces, universal software simulation, and fast code generation to overcome these limitations. Experimental results based on a wide range of real-world task-parallel programs show that, on average, the lines of kernel and host code are reduced by 22% and 51%, respectively, which considerably improves the programmability. The correctness verification and the iterative QoR tuning cycles are accelerated by 3.2x and 6.8x, respectively.

What really sparked my interest was how certain parameters led to better convergence of optimization algorithms even though the objective formula had no significant differences. The research question was therefore: 'Which parameters provide the most optimal convergence of an objective formula using the on-the-fly method?' The research followed an experimental design in which five different algorithms were tested with different objective functions to discover which parameters would yield the best convergence. To find the correct parameters, a method called 'on-the-fly' was applied. I ran the experiments with five different optimization algorithms. The test runs showed that each parameter increases or decreases the convergence accuracy toward the objective function, depending on which optimization algorithm is chosen. One of the results, in which an evolutionary algorithm was applied with only the recombination technique, did well at finding the best optimum. Some results also showed increasing accuracy when mutation was combined with several other parameters in one test run. In conclusion, each algorithm has its own set of parameters that converge differently, depending also on the target formula that is used. This confirms that the on-the-fly method is a suitable approach for finding the best parameters: manipulating parameters and observing the effects in order to find the right ones works as long as the learning cost rate decreases over time.

We determine the border ranks of tensors that could potentially advance the known upper bound for the exponent $\omega$ of matrix multiplication. The Kronecker square of the small $q=2$ Coppersmith-Winograd tensor equals the $3\times 3$ permanent, and could potentially be used to show $\omega=2$. We prove the negative result for complexity theory that its border rank is $16$, resolving a longstanding problem. Regarding its $q=4$ skew cousin in $C^5\otimes C^5\otimes C^5$, which could potentially be used to prove $\omega\leq 2.11$, we show the border rank of its Kronecker square is at most $42$, a remarkable sub-multiplicativity result, as the square of its border rank is $64$. We also determine moduli spaces $\underline{VSP}$ for the small Coppersmith-Winograd tensors.
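The sub-multiplicativity claim can be stated compactly. Writing $\underline{R}$ for border rank and $\boxtimes$ for the Kronecker product (notation assumed, following common usage in this literature):

```latex
% Border rank is submultiplicative under Kronecker products, so for the
% skew q=4 tensor T, whose border rank is 8, the trivial bound is
\underline{R}(T^{\boxtimes 2}) \;\le\; \underline{R}(T)^2 \;=\; 64,
% whereas the result above establishes the strictly better bound
\underline{R}(T^{\boxtimes 2}) \;\le\; 42.
```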

Randomized SVD has become an extremely successful approach for efficiently computing a low-rank approximation of matrices. In particular, the paper by Halko, Martinsson, and Tropp (SIREV 2011) contains extensive analysis and has made it a very popular method. The typical complexity for a rank-$r$ approximation of $m\times n$ matrices is $O(mn\log n+(m+n)r^2)$ for dense matrices. The classical Nystr{\"o}m method is much faster, but applicable only to positive semidefinite matrices. This work studies a generalization of the Nystr{\"o}m method applicable to general matrices, and shows that (i) it has near-optimal approximation quality comparable to competing methods, (ii) the computational cost is the near-optimal $O(mn\log n+r^3)$ for dense matrices, with small hidden constants, and (iii) crucially, it can be implemented in a numerically stable fashion despite the presence of an ill-conditioned pseudoinverse. Numerical experiments illustrate that generalized Nystr{\"o}m can significantly outperform state-of-the-art methods, especially when $r\gg 1$, achieving up to a 10-fold speedup. The method is also well suited to updating and downdating the matrix.
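The basic shape of a generalized Nyström approximation can be sketched in a few lines of NumPy. This is a simplified illustration under assumed choices (Gaussian sketch matrices, a modest oversampling of the left sketch, and a least-squares solve standing in for the pseudoinverse), not the paper's stabilized implementation:

```python
import numpy as np

def generalized_nystrom(A, r, oversample=10, seed=0):
    """Rank-r approximation A ~ (A X) (Y^T A X)^+ (Y^T A).

    A sketch under assumed choices: Gaussian sketches and a least-squares
    solve in place of an explicit pseudoinverse of the (possibly
    ill-conditioned) core matrix.
    """
    rng = np.random.default_rng(seed)
    m, n = A.shape
    X = rng.standard_normal((n, r))               # right sketch, n x r
    Y = rng.standard_normal((m, r + oversample))  # oversampled left sketch
    AX = A @ X                                    # m x r
    YA = Y.T @ A                                  # (r+p) x n
    core = Y.T @ AX                               # (r+p) x r
    # Solve the least-squares problem rather than forming pinv(core).
    return AX @ np.linalg.lstsq(core, YA, rcond=None)[0]
```

For a matrix of exact rank $r$, generic sketches make the approximation exact up to rounding, which is a quick sanity check of the formula.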

Strong presentation skills are valuable and sought-after in workplace and classroom environments alike. Among the possible shortcomings of vocal presentations, disfluencies and stutters in particular remain among the most common and prominent. Millions of people are affected by stuttering and other speech disfluencies, and much of the world's population has experienced mild stutters while communicating under stressful conditions. While there has been much research in the field of automatic speech recognition and language models, there is a lack of work on disfluency detection and recognition. To this end, we propose an end-to-end deep neural network, FluentNet, capable of detecting a number of different disfluency types. FluentNet consists of a Squeeze-and-Excitation residual convolutional neural network, which facilitates the learning of strong spectral frame-level representations, followed by a set of bidirectional long short-term memory layers that aid in learning effective temporal relationships. Lastly, FluentNet uses an attention mechanism to focus on the important parts of speech and obtain better performance. We perform a number of experiments, comparisons, and ablation studies to evaluate our model. Our model achieves state-of-the-art results, outperforming other solutions in the field on the publicly available UCLASS dataset. Additionally, we present LibriStutter: a disfluency dataset based on the public LibriSpeech dataset with synthesized stutters. We also evaluate FluentNet on this dataset, showing the strong performance of our model versus a number of benchmark techniques.

Deep neural networks (DNNs) have proven to be powerful tools for processing unstructured data. However, for high-dimensional data like images, they are inherently vulnerable to adversarial attacks: small, almost invisible perturbations added to the input can be used to fool DNNs. Various attacks, hardening methods, and detection methods have been introduced in recent years. Notoriously, Carlini-Wagner (CW) type attacks computed by iterative minimization are among those that are most difficult to detect. In this work, we demonstrate that such iterative minimization attacks can be used as detectors themselves; in some sense, one can fight fire with fire. This work also outlines a mathematical proof that, under certain assumptions, this detector provides asymptotically optimal separation of original and attacked images. In numerical experiments, we obtain AUROC values up to 99.73% for our detection method, which distinctly surpasses state-of-the-art detection rates for CW attacks from the literature. We also give numerical evidence that our method is robust against the attacker's choice of the method of attack.

Reinforcement learning algorithms solve sequential decision-making problems in probabilistic environments by optimizing for long-term reward. The desire to use reinforcement learning in safety-critical settings has inspired a recent line of work on formally constrained reinforcement learning; however, these methods place the implementation of the learning algorithm in their Trusted Computing Base. The crucial correctness property of these implementations is a guarantee that the learning algorithm converges to an optimal policy. This paper begins the work of closing this gap by developing a Coq formalization of two canonical reinforcement learning algorithms: value and policy iteration for finite-state Markov decision processes. The central results are a formalization of Bellman's optimality principle and its proof, which uses a contraction property of the Bellman optimality operator to establish that a sequence converges in the infinite horizon limit. The CertRL development exemplifies how the Giry monad and mechanized metric coinduction streamline optimality proofs for reinforcement learning algorithms. The CertRL library provides a general framework for proving properties about Markov decision processes and reinforcement learning algorithms, paving the way for further work on the formalization of reinforcement learning algorithms.
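The algorithm whose convergence CertRL formalizes is classical value iteration. A minimal NumPy sketch of the algorithm itself (not the Coq development; the array layout and names here are my own illustrative choices):

```python
import numpy as np

def value_iteration(P, R, gamma, iters=500):
    """Value iteration for a finite MDP.

    P has shape (A, S, S): P[a] is the transition matrix for action a.
    R has shape (A, S): R[a, s] is the reward for taking a in s.
    The Bellman optimality operator applied each iteration is a
    gamma-contraction in the sup norm, so the iterates converge to the
    optimal value function in the infinite horizon limit.
    """
    n = P.shape[1]
    V = np.zeros(n)
    for _ in range(iters):
        # (TV)(s) = max_a [ R(s,a) + gamma * sum_s' P(s'|s,a) V(s') ]
        V = np.max(R + gamma * (P @ V), axis=0)
    return V
```

On a two-state chain with one action that swaps the states and rewards (0, 1), the fixed point solves $V_0 = \gamma V_1$ and $V_1 = 1 + \gamma V_0$.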

A bioinformatics researcher and a game design researcher walk into a lab... This paper shares two case studies of a collaboration between a bioinformatics researcher who is developing a set of educational VR simulations for youth and a consultative game design researcher with a background in games User Research (GUR) techniques who assesses and iteratively improves the player experience in the simulations. By introducing games-based player engagement strategies, the two researchers improve the (re)playability of these VR simulations to encourage greater player engagement and retention.

In this work, we develop a novel fairness learning approach for multi-task regression models based on a biased training dataset, using a popular rank-based non-parametric independence test, the Mann-Whitney U statistic, to measure the dependency between the target variable and protected variables. To solve this learning problem efficiently, we first reformulate the problem as a new non-convex optimization problem, in which a non-convex constraint is defined based on group-wise ranking functions of individual objects. We then develop an efficient model-training algorithm based on the framework of the non-convex alternating direction method of multipliers (NC-ADMM), in which one of the main challenges is to implement an efficient projection oracle onto the preceding non-convex set defined based on ranking functions. Through extensive experiments on both synthetic and real-world datasets, we validate that our new approach outperforms several state-of-the-art competing methods on several popular metrics relevant to fairness learning.
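The Mann-Whitney U statistic underlying the approach can be computed directly from ranks. A simplified sketch for a binary protected attribute (no tie handling; the function name and return convention are mine, not the paper's):

```python
import numpy as np

def mann_whitney_u(scores, protected):
    """U statistic measuring dependence between model scores and a
    binary protected attribute (ties are not handled in this sketch).

    U near its null mean n0*n1/2 indicates the score distribution is
    similar across the two groups, i.e. the independence a fair model
    should exhibit; U near 0 or n0*n1 indicates strong dependence.
    """
    scores = np.asarray(scores, dtype=float)
    protected = np.asarray(protected, dtype=bool)
    ranks = scores.argsort().argsort() + 1        # ranks 1..n, no ties assumed
    n1 = protected.sum()
    n0 = len(scores) - n1
    u1 = ranks[protected].sum() - n1 * (n1 + 1) / 2
    return u1, n0 * n1 / 2.0                      # statistic and its null mean
```

For example, if the protected group holds the three highest of six scores, U is at its maximum of 9 against a null mean of 4.5, signalling strong dependence.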

We study fairness in supervised few-shot meta-learning models that are sensitive to discrimination (or bias) in historical data. A machine learning model trained on biased data tends to make unfair predictions for users from minority groups. Although this problem has been studied before, existing methods mainly aim to detect and control the dependency effect of the protected variables (e.g. race, gender) on target prediction based on a large amount of training data. These approaches carry two major drawbacks: (1) they lack a global cause-effect visualization for all variables, and (2) they do not generalize both accuracy and fairness to unseen tasks. In this work, we first discover discrimination from data using a causal Bayesian knowledge graph, which not only demonstrates the dependency between the protected variable and the target but also indicates causal effects between all variables. Next, we develop a novel algorithm based on risk difference to quantify the discriminatory influence of each protected variable in the graph. Furthermore, to protect predictions from unfairness, we propose a fast-adapted bias-control approach in meta-learning, which efficiently mitigates statistical disparity for each task and thus ensures the independence of predictions from protected attributes based on biased and few-shot data samples. Distinct from existing meta-learning models, group unfairness of tasks is efficiently reduced by leveraging the mean difference between (un)protected groups for regression problems. Through extensive experiments on both synthetic and real-world data sets, we demonstrate that our proposed unfairness discovery and prevention approaches efficiently detect discrimination and mitigate biases in model output, as well as generalize both accuracy and fairness to unseen tasks with a limited amount of training samples.

Forecasting influenza in a timely manner aids health organizations and policymakers in adequate preparation and decision making. However, effective influenza forecasting remains a challenge despite increasing research interest. It is even more challenging amidst the COVID pandemic, when influenza-like illness (ILI) counts are affected by various factors such as symptomatic similarities with COVID-19 and a shift in the healthcare-seeking patterns of the general population. We term the ILI values observed while potentially affected by these factors COVID-ILI. Under the current pandemic, historical influenza models carry valuable expertise about the disease dynamics but face difficulties adapting. Therefore, we propose CALI-NET, a neural transfer learning architecture which allows us to 'steer' a historical disease forecasting model to new scenarios where flu and COVID co-exist. Our framework enables this adaptation by automatically learning when it should emphasize learning from COVID-related signals and when from the historical model. In this way, we exploit representations learned from historical ILI data as well as the limited COVID-related signals. Our experiments demonstrate that our approach is successful in adapting a historical forecasting model to the current pandemic. In addition, we show that success in our primary goal, adaptation, does not sacrifice overall performance as compared with state-of-the-art influenza forecasting approaches.

The success of deep learning relies on the availability of large-scale annotated data sets, the acquisition of which can be costly, requiring expert domain knowledge. Semi-supervised learning (SSL) mitigates this challenge by exploiting the behavior of the neural function on large unlabeled data. The smoothness of the neural function is a commonly used assumption exploited in SSL. A successful example is the adoption of mixup strategy in SSL that enforces the global smoothness of the neural function by encouraging it to behave linearly when interpolating between training examples. Despite its empirical success, however, the theoretical underpinning of how mixup regularizes the neural function has not been fully understood. In this paper, we offer a theoretically substantiated proposition that mixup improves the smoothness of the neural function by bounding the Lipschitz constant of the gradient function of the neural networks. We then propose that this can be strengthened by simultaneously constraining the Lipschitz constant of the neural function itself through adversarial Lipschitz regularization, encouraging the neural function to behave linearly while also constraining the slope of this linear function. On three benchmark data sets and one real-world biomedical data set, we demonstrate that this combined regularization results in improved generalization performance of SSL when learning from a small amount of labeled data. We further demonstrate the robustness of the presented method against single-step adversarial attacks. Our code is available at https://github.com/Prasanna1991/Mixup-LR.
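The mixup operation at the center of this analysis is only a few lines. A minimal sketch (the Beta parameter and function name are illustrative defaults, not the paper's exact training recipe):

```python
import numpy as np

def mixup(x1, y1, x2, y2, alpha=0.75, rng=None):
    """Mixup: interpolate a pair of examples and their labels with a
    Beta(alpha, alpha)-distributed coefficient.

    Training on such interpolates encourages the network to behave
    linearly between examples, the property the analysis connects to
    bounding the Lipschitz constant of the network's gradient.
    """
    rng = rng or np.random.default_rng()
    lam = rng.beta(alpha, alpha)          # one coefficient per pair
    # the SAME lambda mixes both the inputs and the targets
    return lam * x1 + (1 - lam) * x2, lam * y1 + (1 - lam) * y2
```

A useful invariant to keep in mind: the mixed input always lies on the segment between the two examples, and the mixed label uses the identical coefficient.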

Visual analysis of temporal networks is an effective way to understand network dynamics, facilitating the identification of patterns, anomalies, and other network properties, thus resulting in fast decision making. The amount of data in real-world networks, however, may result in a layout with high visual clutter due to edge overlapping. This is particularly relevant in so-called streaming networks, in which edges arrive continuously (online) and in non-stationary distribution. All three network dimensions, namely node, edge, and time, can be manipulated to reduce such clutter and improve readability. This paper presents an online and nonuniform timeslicing method that considers the underlying network structure and addresses streaming network analyses. We conducted experiments using two real-world networks to compare our method against uniform and nonuniform timeslicing strategies. The results show that our method automatically selects timeslices that effectively reduce visual clutter in periods with bursts of events. As a consequence, decision making based on the identification of global temporal patterns becomes faster and more reliable.

We describe an approach to task-oriented dialogue in which dialogue state is represented as a dataflow graph. A dialogue agent maps each user utterance to a program that extends this graph. Programs include metacomputation operators for reference and revision that reuse dataflow fragments from previous turns. Our graph-based state enables the expression and manipulation of complex user intents, and explicit metacomputation makes these intents easier for learned models to predict. We introduce a new dataset, SMCalFlow, featuring complex dialogues about events, weather, places, and people. Experiments show that dataflow graphs and metacomputation substantially improve representability and predictability in these natural dialogues. Additional experiments on the MultiWOZ dataset show that our dataflow representation enables an otherwise off-the-shelf sequence-to-sequence model to match the best existing task-specific state tracking model. The SMCalFlow dataset and code for replicating experiments are available at https://www.microsoft.com/en-us/research/project/dataflow-based-dialogue-semantic-machines.

Person re-identification (Re-ID) is a challenging task as persons often appear against different backgrounds. Most recent Re-ID methods treat foreground and background information equally for person discriminative learning, which can easily lead to false alarms when different persons are in similar backgrounds or the same person is in different backgrounds. In this paper, we propose a Foreground-Guided Texture-Focused Network (FTN) for Re-ID, which can weaken the representation of unrelated background and highlight person-related attributes in an end-to-end manner. FTN consists of a semantic encoder (S-Enc) and a compact foreground attention module (CFA) for the Re-ID task, and a texture-focused decoder (TF-Dec) for the reconstruction task. In particular, we build a foreground-guided semi-supervised learning strategy for TF-Dec because the reconstruction ground-truths are simply the inputs of FTN weighted by the Gaussian mask and the attention mask generated by CFA. Moreover, a new gradient loss is introduced to encourage the network to mine the texture consistency between the inputs and the reconstructed outputs. Our FTN is computationally efficient, and extensive experiments on three commonly used datasets, Market1501, CUHK03, and MSMT17, demonstrate that the proposed method performs favorably against state-of-the-art methods.

Petrographic analysis based on microfacies identification in thin sections is widely used in sedimentary environment interpretation and paleoecological reconstruction. Fossil recognition from microfacies is an essential procedure for petrographers to complete this task. Distinguishing the morphological and microstructural diversity of skeletal fragments requires extensive prior knowledge of fossil morphotypes in microfacies and long training sessions under the microscope. This requirement engenders certain challenges for sedimentologists and paleontologists, especially novices. However, a machine classifier can help address this challenge. We collected a microfacies image dataset comprising both public data from 1,149 references and our own materials (including a total of 30,815 images of 22 fossil and abiotic grain groups). We employed a high-performance workstation to implement four classic deep convolutional neural networks (DCNNs), which have proven to be highly efficient in computer vision over the last several years. Our framework uses a transfer learning technique, which reuses the pre-trained parameters that are trained on a larger ImageNet dataset as initialization for the network to achieve high accuracy with low computing costs. We obtained up to 95% of the top one and 99% of the top three test accuracies in the Inception ResNet v2 architecture. The machine classifier exhibited 0.99 precision on minerals, such as dolomite and pyrite. Although it had some difficulty on samples having similar morphologies, such as the bivalve, brachiopod, and ostracod, it nevertheless obtained 0.88 precision. Our machine learning framework demonstrated high accuracy with reproducibility and bias avoidance that was comparable to those of human classifiers. Its application can thus eliminate much of the tedious, manually intensive efforts by human experts conducting routine identification.

Wildlife monitoring is crucial to nature conservation and has been done by manual observations from motion-triggered camera traps deployed in the field. Widespread adoption of such in-situ sensors has resulted in unprecedented data volumes being collected over the last decade. A significant challenge exists to process and reliably identify what is in these images efficiently. Advances in computer vision are poised to provide effective solutions with custom AI models built to automatically identify images of interest and label the species in them. Here we outline the data unification effort for the Wildlife Insights platform from various conservation partners, and the challenges involved. Then we present an initial deep convolutional neural network model, trained on 2.9M images across 465 fine-grained species, with a goal to reduce the load on human experts to classify species in images manually. The long-term goal is to enable scientists to make conservation recommendations from near real-time analysis of species abundance and population health.

The Maximum Independent Set (MaxIS) problem is a fundamental NP-hard problem in graph theory. Since the underlying graphs are constantly changing in numerous applications, computing a MaxIS over dynamic graphs has received increasing attention in recent years. Since it is intractable to compute an exact MaxIS or a good approximation of it over dynamic graphs, this paper studies the problem of maintaining a high-quality independent set, an approximation of the MaxIS, over dynamic graphs. A framework based on swap operations for solving this problem is presented, and two concrete update algorithms based on one-swappable vertices and two-swappable vertex pairs are designed. Both algorithms can compute high-quality independent sets over dynamic graphs in $O(\Delta^3)$ time for general graphs and in $O(1)$ time for bounded-degree graphs. Moreover, a lower bound on the size of the solution maintained by our algorithms is derived for the case where it contains no swappable vertex. The algorithms are then extended to the \textit{semi-external} setting to maintain a high-quality independent set with limited memory consumption. Extensive experiments are conducted on real graphs to confirm the effectiveness and efficiency of the proposed methods.
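The one-swap rule can be illustrated on a static graph. This is a sketch under my reading of "one-swappable" (removing one independent-set vertex admits at least two non-adjacent replacements, growing the set); the paper's algorithms apply such swaps incrementally under edge updates:

```python
def one_swap_improve(adj, indep):
    """One pass of the one-swap rule on a static graph.

    adj: dict mapping each vertex to its set of neighbors.
    indep: an independent set. Remove a vertex u from the set if doing
    so lets at least two of its neighbors join, which grows the set by
    at least one while preserving independence.
    """
    indep = set(indep)
    for u in list(indep):
        # neighbors of u whose only independent-set neighbor is u
        candidates = [v for v in adj[u]
                      if v not in indep
                      and all(w == u or w not in indep for w in adj[v])]
        # keep a pairwise non-adjacent subset of the candidates (greedy)
        chosen = []
        for v in candidates:
            if all(v not in adj[w] for w in chosen):
                chosen.append(v)
        if len(chosen) >= 2:
            indep.remove(u)       # u is one-swappable: swap it out
            indep.update(chosen)  # and its replacements in
    return indep
```

On the path 0-1-2 with the set {1}, the middle vertex is one-swappable: removing it admits both endpoints, improving the solution from size 1 to size 2.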

The system we used for Task 6 (Automated Audio Captioning) of the Detection and Classification of Acoustic Scenes and Events (DCASE) 2020 Challenge combines three elements, namely, data augmentation, multi-task learning, and post-processing, for audio captioning. The system received the highest evaluation scores, but which of the individual elements contributed most to its performance has not yet been clarified. Here, to assess their contributions, we first conducted an element-wise ablation study on our system to estimate to what extent each element is effective. We then conducted a detailed module-wise ablation study to further clarify the key processing modules for improving accuracy. The results show that data augmentation and post-processing significantly improve the score in our system. In particular, mix-up data augmentation and beam search in post-processing improve SPIDEr by 0.8 and 1.6 points, respectively.

We propose a new hybrid clock distribution scheme that uses global current-mode (CM) and local voltage-mode (VM) clocking to distribute a high-performance clock signal with reduced power consumption. In order to enable hybrid clocking, we propose two new current-to-voltage converters. The converters are simple current receiver circuits based on amplifier and current-mirror circuits. The global clocking is bufferless and relies on current rather than voltage, which reduces the jitter. The local VM network improves compatibility with traditional CMOS logic. The hybrid clock distribution network exhibits 29% lower average power and 54% lower jitter-induced skew in a symmetric network compared to traditional VM clocks. To use hybrid clocking efficiently, we present a methodology to identify the optimal cluster size and the number of required receiver circuits, which we demonstrate using the ISPD 2009, ISPD 2010, and ISCAS89 testbench networks. At 1--2GHz clock frequency, the proposed methodology uses up to 45% and 42% lower power compared to a synthesized buffered VM scheme using ISPD 2009 and ISPD 2010 testbenches, respectively. In addition, the proposed hybrid clocking scheme saves up to 50% and 59% of power compared to a buffered scheme using the ISCAS89 benchmark circuit at 1GHz and 2GHz clock frequency, respectively.

Spin-Transfer Torque RAM (STTRAM) is a promising alternative to SRAM in on-chip caches due to several advantages. These advantages include non-volatility, low leakage, high integration density, and CMOS compatibility. Prior studies have shown that relaxing and adapting the STTRAM retention time to runtime application needs can substantially reduce overall cache energy without significant latency overheads, due to the lower STTRAM write energy and latency in shorter retention times. In this paper, as a first step towards efficient prefetching across the STTRAM cache hierarchy, we study prefetching in reduced retention STTRAM L1 caches. Using SPEC CPU 2017 benchmarks, we analyze the energy and latency impact of different prefetch distances in different STTRAM cache retention times for different applications. We show that expired_unused_prefetches---the number of unused prefetches expired by the reduced retention time STTRAM cache---can accurately determine the best retention time for energy consumption and access latency. This new metric can also provide insights into the best prefetch distance for memory bandwidth consumption and prefetch accuracy. Based on our analysis and insights, we propose Prefetch-Aware Retention time Tuning (PART) and Retention time-based Prefetch Control (RPC). Compared to a base STTRAM cache, PART and RPC collectively reduced the average cache energy and latency by 22.24% and 24.59%, respectively. When the base architecture was augmented with the state-of-the-art near-side prefetch throttling (NST), PART+RPC reduced the average cache energy and latency by 3.50% and 3.59%, respectively, and reduced the hardware overhead by 54.55%.

Artificial Neural Networks are uniquely adroit at machine learning by processing data through a network of artificial neurons. The inter-neuronal connection weights represent the learnt Neural Program that instructs the network on how to compute the data. However, without an external memory to store Neural Programs, they are restricted to only one, overwriting learnt programs when trained on new data. This is functionally equivalent to a special-purpose computer. Here we design Neurocoder, an entirely new class of general-purpose conditional computational machines in which the neural network "codes" itself in a data-responsive way by composing relevant programs from a set of shareable, modular programs. This can be considered analogous to building Lego structures from simple Lego bricks. Notably, our bricks change their shape through learning. External memory is used to create, store and retrieve modular programs. Like today's stored-program computers, Neurocoder can now access diverse programs to process different data. Unlike manually crafted computer programs, Neurocoder creates programs through training. Integrating Neurocoder into current neural architectures, we demonstrate new capacity to learn modular programs, handle severe pattern shifts and remember old programs as new ones are learnt, and show substantial performance improvement in solving object recognition, playing video games and continual learning tasks. Such integration with Neurocoder increases the computation capability of any current neural network and endows it with entirely new capacity to reuse simple programs to build complex ones. For the first time a Neural Program is treated as a datum in memory, paving the way for modular, recursive and procedural neural programming.

A technique for object localization based on pose estimation and camera calibration is presented. The 3-dimensional (3D) coordinates are estimated by collecting multiple 2-dimensional (2D) images of the object and are utilized for the calibration of the camera. The calibration steps, which involve calculating a number of parameters, including the intrinsic and extrinsic parameters for the removal of lens distortion, the computation of the object's size, and the calculation of the camera's position, are discussed. A transformation strategy to estimate the 3D pose using the 2D images is presented. The proposed method is implemented in MATLAB, and validation experiments are carried out for both pose estimation and camera calibration.

We examine a control problem where the states of the components of a system deteriorate after a disruption, if they are not being repaired by an entity. There exists a set of dependencies in the form of precedence constraints between the components, captured by a directed acyclic graph (DAG). The objective of the entity is to maximize the number of components whose states are brought back to the fully repaired state within a given time. We prove that the general problem is NP-hard, and therefore we characterize near-optimal control policies for special instances of the problem. We show that when the deterioration rates are larger than or equal to the repair rates and the precedence constraints are given by a DAG, it is optimal to continue repairing a component until its state reaches the fully recovered state before switching to repair any other component. Under the aforementioned assumptions and when the deterioration and the repair rates are homogeneous across all the components, we prove that the control policy that targets the healthiest component at each time-step while respecting the precedence and time constraints fully repairs at least half the number of components that would be fully repaired by an optimal policy. Finally, we prove that when the repair rates are sufficiently larger than the deterioration rates, the precedence constraints are given by a set of disjoint trees that each contain at most k nodes, and there is no time constraint, the policy that targets the component with the least value of health minus the deterioration rate at each time-step while respecting the precedence constraints fully repairs at least 1/k times the number of components that would be fully repaired by an optimal policy.
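Under the homogeneous-rate assumptions, the healthiest-first rule is easy to simulate. A toy sketch with precedence constraints omitted and health normalized to [0, 1] (all names and conventions here are mine, for illustration only):

```python
def healthiest_first(health, deteriorate, repair, steps):
    """Each time-step, repair the not-yet-recovered component with the
    highest health (the healthiest-first policy); every other
    unrecovered component deteriorates. Health is clipped to [0, 1];
    a component at 1.0 counts as fully repaired and is left alone.
    Returns the number of fully repaired components.
    """
    health = list(health)
    for _ in range(steps):
        active = [i for i, h in enumerate(health) if h < 1.0]
        if not active:
            break
        target = max(active, key=lambda i: health[i])
        for i in active:
            delta = repair if i == target else -deteriorate
            health[i] = min(1.0, max(0.0, health[i] + delta))
    return sum(h >= 1.0 for h in health)
```

Note the policy's rationale under these assumptions: finishing the healthiest component first minimizes the repair time it commits to before any other component can be saved.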

Nonlinear dynamics of spiking neural networks has recently attracted much interest as an approach to understand possible information processing in the brain and apply it to artificial intelligence. Since information can be processed by collective spiking dynamics of neurons, the fine control of spiking dynamics is desirable for neuromorphic devices. Here we show that photonic spiking neurons implemented with paired nonlinear optical oscillators can be controlled to generate two modes of bio-realistic spiking dynamics by changing the optical pump amplitude. When they are coupled in a network, we found that the interaction between the photonic neurons induces an effective change in the pump amplitude depending on the order parameter that characterizes synchronization. The experimental results show that the effective change causes spontaneous modification of the spiking modes and firing rates of clustered neurons, and such collective dynamics can be utilized to realize efficient heuristics for solving NP-hard combinatorial optimization problems.

Packet Compressed Sensing Imaging (PCSI) is a connectionless digital image transmission method resilient to packet loss. The goal is to develop a robust image transmission method that is computationally trivial to transmit (e.g., compatible with low-power 8-bit microcontrollers) and well suited for weak-signal environments where packets are likely to be lost. In other image transmission techniques, noise and packet loss lead to parts of the image being distorted or missing. In PCSI, every packet contains random pixel information from the entire image, and each additional packet received (in any order) simply enhances image quality. Satisfactory SSTV-resolution (320x240 pixel) images can be received in ~1-2 minutes when transmitted at 1200 baud AFSK, which is on par with analog SSTV transmission time. Image transmission and reception can occur simultaneously on a computer, and multiple images can be received from multiple stations simultaneously, allowing for the creation of "image nets." This paper presents a simple computer application for Windows, Mac, and Linux that implements PCSI transmission and reception on any KISS-compatible hardware or software modem on any band and digital mode.
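The any-order, every-packet-helps property can be sketched with packets that carry random (index, value) pixel pairs. This is a simplified stand-in: real PCSI reconstructs unreceived pixels via compressed sensing, whereas the sketch below just fills them with the mean of the received pixels (all names are mine):

```python
import random

def make_packets(pixels, per_packet, n_packets, seed=0):
    """Each packet carries `per_packet` randomly chosen (index, value)
    pairs drawn from the whole image, so any subset of packets, arriving
    in any order, contributes useful pixels."""
    rng = random.Random(seed)
    n = len(pixels)
    return [[(i, pixels[i]) for i in rng.sample(range(n), per_packet)]
            for _ in range(n_packets)]

def receive(packets, n):
    """Rebuild the image from whichever packets arrived. Pixels never
    received are filled with the mean of received ones -- a crude
    stand-in for the compressed-sensing reconstruction PCSI performs."""
    known = {}
    for pkt in packets:
        known.update(pkt)       # order does not matter
    fill = sum(known.values()) / len(known) if known else 0
    return [known.get(i, fill) for i in range(n)]
```

Dropping or reordering packets only changes which pixels are known exactly; every received packet strictly adds information.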

With computer vision reaching an inflection point in the past decade, face recognition technology has become pervasive in policing, intelligence gathering, and consumer applications. Recently, face recognition technology has been deployed on body-worn cameras to keep officers safe, enable situational awareness, and provide evidence for trial. However, limited academic research has been conducted on this topic, using traditional techniques on datasets with small sample sizes. This paper aims to bridge the gap in the state of the art of face recognition using body-worn cameras (BWC). To this aim, the contribution of this work is two-fold: (1) the collection of a dataset called BWCFace, consisting of a total of 178K facial images of 132 subjects captured using a body-worn camera in indoor and daylight conditions, and (2) an open-set evaluation of the latest deep-learning-based convolutional neural network (CNN) architectures combined with five different loss functions for face identification on the collected dataset. Experimental results on our BWCFace dataset suggest a maximum of 33.89% Rank-1 accuracy obtained when facial features are extracted using SENet-50 trained on the large-scale VGGFace2 facial image dataset. However, performance improved up to a maximum of 99.00% Rank-1 accuracy when pretrained CNN models were fine-tuned on a subset of identities in our BWCFace dataset. Equivalent performance was obtained across the body-worn camera sensor models used in existing face datasets. The collected BWCFace dataset and the pretrained/fine-tuned algorithms are publicly available to promote further research and development in this area; a download link for the dataset and algorithms is available by contacting the authors.

Uncertain partially observable Markov decision processes (uPOMDPs) allow the probabilistic transition and observation functions of standard POMDPs to belong to a so-called uncertainty set. Such uncertainty sets capture uncountable sets of probability distributions. We develop an algorithm to compute finite-memory policies for uPOMDPs that robustly satisfy given specifications against any admissible distribution. In general, computing such policies is both theoretically and practically intractable. We provide an efficient solution to this problem in four steps. (1) We state the underlying problem as a nonconvex optimization problem with infinitely many constraints. (2) A dedicated dualization scheme yields a dual problem that is still nonconvex but has finitely many constraints. (3) We linearize this dual problem and (4) solve the resulting finite linear program to obtain locally optimal solutions to the original problem. The resulting problem formulation is exponentially smaller than those resulting from existing methods. We demonstrate the applicability of our algorithm using large instances of an aircraft collision-avoidance scenario and a novel spacecraft motion planning case study.

Pretrained neural language models (LMs) are prone to generating racist, sexist, or otherwise toxic language which hinders their safe deployment. We investigate the extent to which pretrained LMs can be prompted to generate toxic language, and the effectiveness of controllable text generation algorithms at preventing such toxic degeneration. We create and release RealToxicityPrompts, a dataset of 100K naturally occurring, sentence-level prompts derived from a large corpus of English web text, paired with toxicity scores from a widely-used toxicity classifier. Using RealToxicityPrompts, we find that pretrained LMs can degenerate into toxic text even from seemingly innocuous prompts. We empirically assess several controllable generation methods, and find that while data- or compute-intensive methods (e.g., adaptive pretraining on non-toxic data) are more effective at steering away from toxicity than simpler solutions (e.g., banning "bad" words), no current method is failsafe against neural toxic degeneration. To pinpoint the potential cause of such persistent toxic degeneration, we analyze two web text corpora used to pretrain several LMs (including GPT-2; Radford et al., 2019), and find a significant amount of offensive, factually unreliable, and otherwise toxic content. Our work provides a test bed for evaluating toxic generations by LMs and stresses the need for better data selection processes for pretraining.

Most of the prior work in massively parallel data processing assumes homogeneity, i.e., every computing unit has the same computational capability, and can communicate with every other unit with the same latency and bandwidth. However, this strong assumption of a uniform topology rarely holds in practical settings, where computing units are connected through complex networks. To address this issue, Blanas et al. recently proposed a topology-aware massively parallel computation model that integrates network structure and heterogeneity into the cost model. The network is modeled as a directed graph, where each edge is associated with a cost function that depends on the data transferred between the two endpoints. The computation proceeds in synchronous rounds, and the cost of each round is measured as the maximum cost over all the edges in the network.

In this work, we take a first step toward investigating three fundamental data processing tasks in this topology-aware parallel model: set intersection, cartesian product, and sorting. We focus on tree network topologies and present both lower bounds and (asymptotically) matching upper bounds. The optimality of our algorithms is with respect to the initial data distribution among the network nodes, rather than assuming a worst-case distribution as in previous results. Beyond their theoretical optimality, our protocols are simple, use a constant number of rounds, and we believe they can be implemented in practical settings as well.
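The round-cost measure described above is easy to state concretely: the cost of a round is the maximum, over all edges, of that edge's cost function evaluated on the data it carries. A minimal sketch (edge cost functions and names invented for illustration):

```python
def round_cost(edges, transfer):
    """Cost of one synchronous round in the topology-aware model:
    the MAXIMUM edge cost, where each directed edge (u, v) carries
    a cost function of the data sent from u to v this round."""
    return max(cost_fn(transfer.get((u, v), 0))
               for (u, v), cost_fn in edges.items())
```

The slowest link dictates the round cost, which is why the optimal protocols must adapt to where the data initially resides.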

We present the design of a low-cost wheeled mobile robot, and an analytical model for predicting its motion under the influence of motor torques and friction forces. Using our proposed model, we show how to analytically compute the gradient of an appropriate loss function that measures the deviation between predicted motion trajectories and real-world trajectories, which are estimated using AprilTags and an overhead camera. These analytical gradients allow us to automatically infer the unknown friction coefficients by minimizing the loss function using gradient descent. Motion trajectories predicted by the optimized model are in excellent agreement with their real-world counterparts. Experiments show that our proposed approach is computationally superior to existing black-box system identification methods and other data-driven techniques, and also requires very few real-world samples for accurate trajectory prediction. The proposed approach combines the data efficiency of analytical models based on first principles with the flexibility of data-driven methods, which makes it appropriate for low-cost robots. Using the learned model and our gradient-based optimization approach, we show how to automatically compute motor control signals for driving the robot along pre-specified curves.
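The idea of fitting friction parameters by analytic gradients can be sketched on a one-parameter toy model, a velocity with viscous friction mu, with the gradient obtained by forward sensitivity analysis of the update rule. This is a drastic simplification of the paper's full wheeled-robot model; all names are illustrative.

```python
def simulate(mu, u_seq, v0=0.0, dt=0.1):
    """Velocity update v' = v + dt * (u - mu * v): motor input u
    accelerates, viscous friction mu decelerates."""
    v, traj = v0, []
    for u in u_seq:
        v = v + dt * (u - mu * v)
        traj.append(v)
    return traj

def loss_and_grad(mu, u_seq, observed, v0=0.0, dt=0.1):
    """Squared trajectory error and its ANALYTIC gradient w.r.t. mu,
    propagating the sensitivity s_k = d v_k / d mu alongside v_k."""
    v, s, loss, grad = v0, 0.0, 0.0, 0.0
    for u, obs in zip(u_seq, observed):
        s = s * (1 - dt * mu) - dt * v   # differentiate the update in mu
        v = v + dt * (u - mu * v)
        loss += (v - obs) ** 2
        grad += 2 * (v - obs) * s
    return loss, grad

def fit_friction(u_seq, observed, mu0=0.0, lr=0.01, steps=5000):
    """Plain gradient descent on the analytic gradient."""
    mu = mu0
    for _ in range(steps):
        _, g = loss_and_grad(mu, u_seq, observed)
        mu -= lr * g
    return mu
```

Because the gradient is exact rather than estimated from samples, very few observed trajectories suffice, which is the data-efficiency argument made above.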

We propose a framework based on Recurrent Neural Networks (RNNs) to determine an optimal control strategy for a discrete-time system that is required to satisfy specifications given as Signal Temporal Logic (STL) formulae. RNNs can store information about a system over time and thus enable us to determine satisfaction of the dynamic temporal requirements specified in STL formulae. Given an STL formula and a dataset of satisfying system executions with corresponding control policies, we can use RNNs to predict a control policy at each time step based on the current and previous states of the system. We use Control Barrier Functions (CBFs) to guarantee the safety of the predicted control policy. We validate our theoretical formulation and demonstrate its performance in an optimal control problem subject to partially unknown safety constraints through simulations.
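The CBF safety layer can be sketched for a scalar toy system; the paper pairs such a filter with RNN-predicted policies, whereas here the system, barrier, and names are all illustrative assumptions.

```python
X_MAX = 1.0   # safe set: {x : x <= X_MAX}
ALPHA = 0.5   # CBF decay rate, 0 < ALPHA <= 1

def h(x):
    """Barrier function: h(x) >= 0 exactly on the safe set."""
    return X_MAX - x

def safe_control(x, u_nom):
    """Clamp the (e.g. RNN-predicted) nominal control so the
    discrete CBF condition h(x + u) >= (1 - ALPHA) * h(x) holds,
    which keeps h nonnegative at every future step."""
    return min(u_nom, ALPHA * h(x))

def rollout(x0, u_noms):
    """Scalar system x' = x + u under the safety filter."""
    xs = [x0]
    for u_nom in u_noms:
        xs.append(xs[-1] + safe_control(xs[-1], u_nom))
    return xs
```

The filter only intervenes near the boundary: far inside the safe set the nominal control passes through unchanged.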

Graph convolutional networks (GCNs) have achieved promising performance on various graph-based tasks. However, they suffer from over-smoothing when stacking more layers. In this paper, we present a quantitative study of this phenomenon and develop novel insights into deeper GCNs. First, we interpret current graph convolutional operations from an optimization perspective and argue that over-smoothing is mainly caused by the naive first-order approximation of the solution to the optimization problem. Subsequently, we introduce two metrics to measure over-smoothing on node-level tasks. Specifically, we calculate the fraction of the pairwise distance between connected and disconnected nodes, respectively, to the overall distance. Based on our theoretical and empirical analysis, we establish a universal theoretical framework for GCNs from an optimization perspective and derive a novel convolutional kernel named GCN+, which has fewer parameters while inherently relieving over-smoothing. Extensive experiments on real-world datasets demonstrate the superior performance of GCN+ over state-of-the-art baseline methods on node classification tasks.
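A minimal reading of these metrics can be sketched as follows; the paper's exact definitions may differ, and this simply splits the total pairwise feature distance into connected and disconnected shares:

```python
from itertools import combinations

def oversmoothing_metrics(features, edges):
    """Split all pairwise feature distances into connected vs.
    disconnected pairs and report each group's share of the total
    distance. As over-smoothing sets in, representations collapse
    and both shares shrink toward zero absolute distance."""
    edge_set = {frozenset(e) for e in edges}
    dist = lambda a, b: sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    conn = disc = 0.0
    for u, v in combinations(sorted(features), 2):
        d = dist(features[u], features[v])
        if frozenset((u, v)) in edge_set:
            conn += d
        else:
            disc += d
    total = conn + disc
    return conn / total, disc / total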

Ancient Chinese is the essence of Chinese culture. There are several natural language processing tasks in the ancient Chinese domain, such as ancient-modern Chinese translation, poem generation, and couplet generation. Previous studies usually use supervised models that rely heavily on parallel data. However, it is difficult to obtain large-scale parallel data for ancient Chinese. In order to make full use of the more easily available monolingual ancient Chinese corpora, we release AnchiBERT, a pre-trained language model based on the architecture of BERT and trained on large-scale ancient Chinese corpora. We evaluate AnchiBERT on both language understanding and generation tasks, including poem classification, ancient-modern Chinese translation, poem generation, and couplet generation. The experimental results show that AnchiBERT outperforms BERT as well as non-pretrained models and achieves state-of-the-art results in all cases.

We propose two new criteria for understanding the advantage of deepening neural networks. Knowing the expressivity of functions computable by deep neural networks is important for understanding this advantage: unless deep neural networks have enough expressivity, they cannot perform well even when learning succeeds. The proposed criteria contribute to this understanding because they evaluate expressivity independently of the efficiency of learning. The first criterion measures the approximation accuracy of deep neural networks with respect to the target function, motivated by the fact that the goal of deep learning is to approximate the target function. The second criterion characterizes the linear regions of functions computable by deep neural networks, motivated by the fact that deep neural networks whose activation functions are piecewise linear are themselves piecewise linear. Using the two criteria, we show that increasing the number of layers improves the expressivity of deep neural networks more effectively than increasing the number of units per layer.
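The depth advantage for piecewise-linear networks can be made concrete with the classic sawtooth construction: composing a two-unit ReLU "tent" layer L times yields 2^L linear pieces from only 2L units, while a single hidden layer needs on the order of 2^L units for the same count. This is a standard illustration, not the paper's exact criteria.

```python
def relu(x):
    return max(0.0, x)

def tent(x):
    """Two-unit ReLU layer computing the tent map on [0, 1]:
    2x on [0, 0.5], 2 - 2x on [0.5, 1]."""
    return 2 * relu(x) - 4 * relu(x - 0.5)

def deep_net(x, depth):
    """Compose the tent layer `depth` times: 2 * depth units total."""
    for _ in range(depth):
        x = tent(x)
    return x

def count_pieces(f, n=1024):
    """Count linear pieces of f on [0, 1] by counting slope changes
    over a fine grid."""
    xs = [i / n for i in range(n + 1)]
    slopes = [(f(xs[i + 1]) - f(xs[i])) * n for i in range(n)]
    pieces = 1
    for a, b in zip(slopes, slopes[1:]):
        if abs(a - b) > 1e-6:
            pieces += 1
    return pieces
```

The piece count doubles with every added layer, giving exponential growth in depth versus only linear growth in width.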

Should social media platforms intervene when communities repeatedly break rules? What actions can they consider? In light of this hotly debated issue, platforms have begun experimenting with softer alternatives to outright bans. We examine one such intervention, called quarantining, which impedes direct access to and promotion of controversial communities. Specifically, we present two case studies of what happened when Reddit quarantined the influential communities r/TheRedPill (TRP) and r/The_Donald (TD). Working with over 85M Reddit posts, we apply causal inference methods to examine the quarantine's effects on TRP and TD. We find that the quarantine made it more difficult to recruit new members: new user influx to TRP and TD decreased by 79.5% and 58%, respectively. Despite quarantining, existing users' misogyny and racism levels remained unaffected. We conclude by reflecting on the effectiveness of this design friction in limiting the influence of toxic communities and discuss broader implications for content moderation.

Cybersecurity tools are increasingly automated with artificial intelligence (AI) capabilities to match the exponential scale of attacks, compensate for the relatively slower rate at which new cybersecurity talent is trained, and improve the accuracy and performance of both tools and users. However, the safe and appropriate use of autonomous cyber attack tools, especially during their development, is still a largely unaddressed gap. Our survey of current literature and tools showed that most existing cyber range designs rely on manual tools and have not considered augmenting automated tools or the potential security issues those tools can cause. In other words, there is still room for a novel cyber range design that allows security researchers to safely deploy autonomous tools and perform automated tool testing when needed. In this paper, we introduce Pandora, a safe testing environment that allows security researchers and cyber range users to experiment with automated cyber attack tools that have strong potential for use and, at the same time, strong potential for risk. Unlike existing testbeds and cyber ranges, which are directly compatible with enterprise computer systems and therefore risk attack propagation across the enterprise network, our test system is intentionally designed to be incompatible with real-world enterprise computing systems to reduce the risk of attack propagation into actual infrastructure. Our design also provides a tool to convert in-development automated cyber attack tools into executable test binaries for validation and for use in realistic enterprise system environments if required. Our experiments tested automated attack tools on the proposed system to validate its usability, and demonstrated its safety through compatibility testing with simple malicious code.

Word embeddings can reflect semantic representations, and embedding quality can be comprehensively evaluated against human cognitive data from natural reading. In this paper, we propose the CogniFNN framework, the first attempt to use fuzzy neural networks to extract non-linear and non-stationary characteristics when evaluating English word embeddings against corresponding cognitive datasets. In our experiments, we used 15 human cognitive datasets across three modalities (EEG, fMRI, and eye-tracking) and selected mean squared error and multiple hypothesis testing as metrics to evaluate the proposed framework. Compared to the recent pioneering framework, CogniFNN showed smaller prediction errors for both context-independent (GloVe) and context-sensitive (BERT) word embeddings, and achieved higher significance ratios against randomly generated word embeddings. Our findings suggest that the CogniFNN framework can provide a more accurate and comprehensive evaluation of word embeddings against cognitive data, and may benefit further evaluation of word embeddings on extrinsic natural language processing tasks.

Automated gender classification has important applications in many domains, such as demographic research, law enforcement, online advertising, as well as human-computer interaction. Recent research has questioned the fairness of this technology across gender and race. Specifically, most studies have raised concerns about the higher error rates of face-based gender classification systems for darker-skinned people, such as African-Americans, and for women. However, to date, most existing studies have been limited to African-American and Caucasian subjects only. The aim of this paper is to investigate the differential performance of gender classification algorithms across gender-race groups. To this aim, we investigate the impact of (a) architectural differences in the deep learning algorithms and (b) training set imbalance as potential sources of bias causing differential performance across gender and race. Experimental investigations are conducted on two of the latest large-scale publicly available facial attribute datasets, namely UTKFace and FairFace. The experimental results suggest that performance varies across architectures but is consistently skewed against specific gender-race groups. For instance, across all the algorithms used, Black females (and the Black race in general) always obtained the lowest accuracy rates, while Middle Eastern males and Latino females obtained higher accuracy rates most of the time. Training set imbalance further widens the gap in accuracy rates across gender-race groups. Further investigation using facial landmarks suggests that facial morphological differences, due to bone structure influenced by genetic and environmental factors, could be a cause of the lower performance on Black females and the Black race in general.

Model discovery based on existing data has been one of the major focuses of mathematical modelers for decades. Despite tremendous achievements in model identification from adequate data, how to recover models from limited data is less well resolved. In this paper, we focus on the model discovery problem when the data are not sufficiently sampled in time, which is common due to limited experimental accessibility and labor/resource constraints. Specifically, we introduce a recursive deep neural network (RDNN) for data-driven model discovery. This recursive approach retrieves the governing equation in a simple and efficient manner, and can significantly improve approximation accuracy as the number of recursive stages increases. In particular, the proposed approach shows superior power when the available data are sampled with a large time lag, from which traditional approaches may fail to recover the model well. Several widely used dynamical systems are used to benchmark this newly proposed recursive approach, and numerical comparisons confirm its effectiveness for model discovery.

The Common Vulnerabilities and Exposures (CVE) system represents a standard means for sharing publicly known information-security vulnerabilities. One or more CVEs are grouped into Common Weakness Enumeration (CWE) classes for the purpose of understanding the software or configuration flaws and potential impacts enabled by these vulnerabilities, and of identifying means to detect or prevent exploitation. As CVE-to-CWE classification is mostly performed manually by domain experts, thousands of critical and new CVEs remain unclassified, and hence unaddressed. This significantly limits the utility of CVEs and slows down proactive threat mitigation. This paper presents ThreatZoom, the first automatic tool to classify CVEs into CWEs. ThreatZoom uses a novel learning algorithm that employs an adaptive hierarchical neural network which adjusts its weights based on text-analytic scores and classification errors. It automatically estimates the CWE classes corresponding to a CVE instance using both statistical and semantic features extracted from the CVE's description. The tool has been rigorously tested on various datasets provided by MITRE and the National Vulnerability Database (NVD). The accuracy of classifying CVE instances to their correct CWE classes is 92% (fine-grain) and 94% (coarse-grain) on the NVD dataset, and 75% (fine-grain) and 90% (coarse-grain) on the MITRE dataset, despite the small corpus.

This thesis investigates an H(div)-conforming hybrid discontinuous Galerkin discretization for incompressible turbulent flows. The discretization method provides many physical and solver-oriented properties that may be advantageous for resolving computationally intensive turbulent structures. A standard continuous Galerkin discretization of the Navier-Stokes equations with the well-known Taylor-Hood elements is also introduced to provide a comparison. The four main principles of simulating turbulent flows are explained: Reynolds-averaged Navier-Stokes simulation, large eddy simulation, the variational multiscale method, and direct numerical simulation. Large eddy simulation and the variational multiscale method have shown good promise in computing traditionally difficult turbulent cases. Their accuracy can only be surpassed by directly solving the Navier-Stokes equations, which comes with excessively high computational costs. The most common strategy is the Reynolds-averaged approach, since it is the most cost-effective. These modelling principles are applied to the two discretization techniques and validated on the basic plane channel flow test case. All numerical tests were conducted with the finite element library Netgen/NGSolve.

Automatic math word problem solving has attracted growing attention in recent years. The evaluation datasets used by previous works have serious limitations in terms of scale and diversity. In this paper, we release a new large-scale and template-rich math word problem dataset named Ape210K. It consists of 210K Chinese elementary school-level math problems, which is 9 times the size of the largest public dataset, Math23K. Each problem contains both the gold answer and the equations needed to derive the answer. Ape210K is also of greater diversity, with 56K templates, which is 25 times more than Math23K. Our analysis shows that solving Ape210K requires not only natural language understanding but also commonsense knowledge. We expect Ape210K to be a benchmark for math word problem solving systems. Experiments indicate that state-of-the-art models on the Math23K dataset perform poorly on Ape210K. We propose a copy-augmented and feature-enriched sequence to sequence (seq2seq) model, which outperforms existing models by 3.2% on the Math23K dataset and serves as a strong baseline for the Ape210K dataset. The gap between humans and our baseline model remains significant, calling for further research efforts. We make the Ape210K dataset publicly available at https://github.com/yuantiku/ape210k

Deep neural networks (DNNs) have demonstrated excellent performance on various tasks; however, they are vulnerable to adversarial examples, which can be easily generated when the target model is accessible to an attacker (the white-box setting). As plenty of machine learning models are deployed via online services that only provide query outputs from inaccessible models (e.g., the Google Cloud Vision API), black-box adversarial attacks (with an inaccessible target model) are a more critical practical security concern than white-box ones. However, existing query-based black-box adversarial attacks often require excessive model queries to maintain a high attack success rate. To improve query efficiency, we explore the distribution of adversarial examples around benign inputs with the help of image structure information characterized by a Neural Process, and propose a Neural Process based black-box adversarial attack (NP-Attack). Extensive experiments show that NP-Attack greatly decreases the query counts under the black-box setting.

Information networks are ubiquitous and ideal for modeling relational data. Because networks are sparse and irregular, network embedding algorithms have caught the attention of many researchers, who have proposed numerous embedding algorithms for static networks. Yet in real life, networks constantly evolve over time. Hence, evolutionary patterns, namely how nodes develop over time, serve as a powerful complement to static structures in embedding networks, yet relatively few works focus on them. In this paper, we propose EPNE, a temporal network embedding model that preserves evolutionary patterns of the local structure of nodes. In particular, we analyze evolutionary patterns with and without periodicity and design corresponding strategies to model such patterns in time-frequency domains based on causal convolutions. In addition, we propose a temporal objective function that is optimized jointly with proximity objectives such that both temporal and structural information are preserved. With this adequate modeling of temporal information, our model is able to outperform other competitive methods in various prediction tasks.

We prove the equivalence of two fundamental problems in the theory of computing. For every polynomial $t(n)\geq (1+\varepsilon)n, \varepsilon>0$, the following are equivalent:

- One-way functions exist (which in turn is equivalent to the existence of secure private-key encryption schemes, digital signatures, pseudorandom generators, pseudorandom functions, commitment schemes, and more);

- $t$-time bounded Kolmogorov Complexity, $K^t$, is mildly hard-on-average (i.e., there exists a polynomial $p(n)>0$ such that no PPT algorithm can compute $K^t$ for more than a $1-\frac{1}{p(n)}$ fraction of $n$-bit strings).

In doing so, we present the first natural, and well-studied, computational problem characterizing the feasibility of the central private-key primitives and protocols in Cryptography.

We develop an efficient algorithm for sampling the eigenvalues of random matrices distributed according to the Haar measure over the orthogonal or unitary group. Our technique directly samples a factorization of the Hessenberg form of such matrices, and then computes their eigenvalues with a tailored core-chasing algorithm. This approach requires a number of floating-point operations that is quadratic in the order of the matrix being sampled, and can be adapted to other matrix groups. In particular, we explain how it can be used to sample the Haar measure over the special orthogonal and unitary groups and the conditional probability distribution obtained by requiring the determinant of the sampled matrix to be a given complex number on the complex unit circle.
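For context, the textbook way to sample the Haar measure over the orthogonal group is to QR-factor a Gaussian matrix while keeping R's diagonal positive, at O(n^3) cost per sample. A pure-Python sketch of that baseline follows; the quadratic-cost Hessenberg/core-chasing algorithm described above is not reproduced here.

```python
import random

def haar_orthogonal(n, seed=0):
    """Baseline sampler: QR-factor a Gaussian matrix via modified
    Gram-Schmidt, normalizing so that R's diagonal is positive,
    which makes Q exactly Haar-distributed over the orthogonal
    group. Cost is O(n^3) per sample."""
    rng = random.Random(seed)
    g = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(n)]
    cols = []
    for j in range(n):
        v = [g[i][j] for i in range(n)]
        for u in cols:                       # modified Gram-Schmidt
            dot = sum(ui * vi for ui, vi in zip(u, v))
            v = [vi - dot * ui for ui, vi in zip(u, v)]
        norm = sum(vi * vi for vi in v) ** 0.5
        cols.append([vi / norm for vi in v])
    # return Q in row-major order (cols[j] is the j-th column)
    return [[cols[j][i] for j in range(n)] for i in range(n)]
```

Since Q is orthogonal, its eigenvalues lie on the unit circle; the paper's contribution is obtaining them without forming and diagonalizing Q at cubic cost.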

Unmanned aerial vehicles (UAVs) are considered one of the promising technologies for next-generation wireless communication networks. Their mobility and their ability to establish line-of-sight (LOS) links with users make them key solutions for many potential applications. In the same vein, artificial intelligence is growing rapidly nowadays and has been very successful, particularly due to the massive amount of available data. As a result, a significant part of the research community has started to integrate intelligence at the core of UAV networks by applying machine learning (ML) algorithms to solve several problems related to drones. In this article, we provide a comprehensive overview of some potential applications of ML in UAV-based networks. We also highlight the limits of existing works and outline some potential future applications of ML for UAV networks.

Language models have emerged as a central component across NLP, and a great deal of progress depends on the ability to cheaply adapt them (e.g., through finetuning) to new domains and tasks. A language model's \emph{vocabulary}---typically selected before training and permanently fixed later---affects its size and is part of what makes it resistant to such adaptation. Prior work has used compositional input embeddings based on surface forms to ameliorate this issue. In this work, we go one step beyond and propose a fully compositional output embedding layer for language models, which is further grounded in information from a structured lexicon (WordNet), namely semantically related words and free-text definitions. To our knowledge, the result is the first word-level language model with a size that does not depend on the training vocabulary. We evaluate the model on conventional language modeling as well as challenging cross-domain settings with an open vocabulary, finding that it matches or outperforms previous state-of-the-art output embedding methods and adaptation approaches. Our analysis attributes the improvements to sample efficiency: our model is more accurate for low-frequency words.
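The surface-form compositionality idea that this work extends to the output layer can be sketched with fastText-style hashed character n-grams, which makes the embedding table's size independent of the vocabulary. Dimensions, bucket count, and helper names below are invented for illustration; the model described above additionally grounds its output embeddings in WordNet relations and free-text definitions, which is not shown.

```python
import random
import zlib

DIM, BUCKETS = 16, 1000  # illustrative sizes

def ngrams(word, n=3):
    padded = "<" + word + ">"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]

def bucket_vec(gram):
    """Deterministic vector for an n-gram: hash it into one of
    BUCKETS buckets, then derive a fixed pseudo-random vector."""
    rng = random.Random(zlib.crc32(gram.encode()) % BUCKETS)
    return [rng.gauss(0, 1) for _ in range(DIM)]

def embed(word):
    """Surface-form compositional embedding: the mean of the
    word's n-gram vectors. Any string, seen or unseen, gets an
    embedding, so model size does not depend on the vocabulary."""
    vecs = [bucket_vec(g) for g in ngrams(word)]
    return [sum(v[d] for v in vecs) / len(vecs) for d in range(DIM)]
```

Because parameters live in a fixed number of hash buckets rather than one row per word, adapting to a new domain never requires growing the embedding layer.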

Brain connectivity networks, derived from magnetic resonance imaging (MRI), non-invasively quantify the relationship in function, structure, and morphology between two brain regions of interest (ROIs) and give insights into gender-related connectional differences. However, to the best of our knowledge, studies of gender differences in brain connectivity have been limited to investigating pairwise (i.e., low-order) relationships between ROIs, overlooking the complex high-order interconnectedness of the brain as a network. To address this limitation, brain multiplexes have been introduced to model the relationship between at least two different brain networks. However, this inhibits their application to datasets with single brain networks such as functional networks. To fill this gap, we propose the first work on predicting brain multiplexes from a source network to investigate gender differences. Recently, generative adversarial networks (GANs) have swept the field of medical data synthesis. However, although conventional GANs work well on images, they cannot handle brain networks due to their non-Euclidean topological structure. Differently, in this paper, we tap into the nascent field of geometric GANs (G-GAN) to design a deep multiplex prediction architecture comprising (i) a geometric source-to-target network translator mimicking a U-Net architecture with skip connections and (ii) a conditional discriminator which classifies predicted target intra-layers by conditioning on the multiplex source intra-layers. Such an architecture simultaneously learns the latent source network representation and the deep non-linear mapping from the source to target multiplex intra-layers. Our experiments on a large dataset demonstrated that predicted multiplexes significantly boost gender classification accuracy compared with source networks and identify both low- and high-order gender-specific multiplex connections.

Modern object detection methods can be divided into one-stage and two-stage approaches. One-stage detectors are more efficient owing to their straightforward architectures, but two-stage detectors still lead in accuracy. Although recent works try to improve one-stage detectors by imitating the structural design of two-stage ones, the accuracy gap is still significant. In this paper, we propose MimicDet, a novel and efficient framework to train a one-stage detector by directly mimicking two-stage features, aiming to bridge the accuracy gap between one-stage and two-stage detectors. Unlike conventional mimic methods, MimicDet has a shared backbone for the one-stage and two-stage detectors, which then branches into two heads designed to have compatible features for mimicking. Thus MimicDet can be trained end-to-end without pre-training a teacher network, and the cost does not increase much, which makes it practical to adopt large networks as backbones. We also make several specialized designs, such as dual-path mimicking and a staggered feature pyramid, to facilitate the mimicking process. Experiments on the challenging COCO detection benchmark demonstrate the effectiveness of MimicDet: it achieves 46.1 mAP with a ResNeXt-101 backbone on the COCO test-dev set, significantly surpassing current state-of-the-art methods.

The individual brain can be viewed as a highly-complex multigraph (i.e. a set of graphs, also called connectomes), where each graph represents a unique connectional view of pairwise brain region (node) relationships such as function or morphology. Due to its multifold complexity, understanding how brain disorders alter not only a single view of the brain graph, but its multigraph representation at the individual and population scales, remains one of the most challenging obstacles to profiling brain connectivity for ultimately disentangling a wide spectrum of brain states (e.g., healthy vs. disordered). In this work, cross-pollinating the fields of spectral graph theory and diffusion models, we propose an eigen-based cross-diffusion strategy for multigraph brain integration, comparison, and profiling. Specifically, we first devise a brain multigraph fusion model guided by eigenvector centrality to rely on the most central nodes in the cross-diffusion process. Next, since the graph spectrum encodes its shape (or geometry) as if one can hear the shape of the graph, for the first time, we profile the fused multigraphs at several diffusion timescales by extracting the compact heat-trace signatures of their corresponding Laplacian matrices. Here, we reveal for the first time autistic and healthy profiles of morphological brain multigraphs, derived from T1-w magnetic resonance imaging (MRI), and demonstrate their discriminability in boosting the classification of unseen samples in comparison with state-of-the-art methods. This study presents a first step towards hearing the shape of the brain multigraph, which can be leveraged for profiling and disentangling comorbid neurological disorders, thereby advancing precision medicine.

Adapting pre-trained language models (PrLMs) (e.g., BERT) to new domains has gained much attention recently. Instead of fine-tuning PrLMs as done in most previous work, we investigate how to adapt the features of PrLMs to new domains without fine-tuning. Specifically, we explore unsupervised domain adaptation (UDA): with features from PrLMs, we adapt models trained with labeled data from a source domain to an unlabeled target domain. Self-training, which predicts pseudo labels on the target domain data for training, is widely used for UDA. However, the predicted pseudo labels inevitably include noise, which negatively affects training a robust model. To improve the robustness of self-training, in this paper we present class-aware feature self-distillation (CFd) to learn discriminative features from PrLMs, in which PrLM features are self-distilled into a feature adaptation module and the features from the same class are more tightly clustered. We further extend CFd to a cross-language setting, in which language discrepancy is studied. Experiments on two monolingual and multilingual Amazon review datasets show that CFd can consistently improve the performance of self-training in both cross-domain and cross-language settings.

Process mining techniques such as process discovery and conformance checking provide insights into actual processes by analyzing event data that are widely available in information systems. These data are very valuable, but often contain sensitive information, and process analysts need to balance confidentiality and utility. Privacy issues in process mining have recently been receiving more attention from researchers; this work should be complemented by tools that integrate the proposed solutions and make them available in the real world. In this paper, we introduce a Python-based infrastructure implementing state-of-the-art privacy preservation techniques in process mining. The infrastructure provides a hierarchy of usages, from single techniques to collections of techniques, integrated as web-based tools. Our infrastructure manages both standard and non-standard event data resulting from privacy preservation techniques. It also stores explicit privacy metadata to track the modifications applied to protect sensitive data.

In this paper we propose an index key compression scheme based on the notion of distinction bits, by proving that the distinction bits of index keys are sufficient to determine the sorted order of the index keys correctly. While the actual compression ratio may vary depending on the characteristics of the dataset (an average compression ratio of 2.76 to one was observed in our experiments), the index key compression scheme leads to significant performance improvements during the reconstruction of large-scale indexes. Our index key compression can be effectively used in database replication and index recovery in modern main-memory database systems.
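
The basic property behind the scheme can be sketched as follows: for keys in sorted order, the first bit position at which a key differs from its predecessor (their distinction bit) always holds a 0 in the smaller key and a 1 in the larger one. The sketch below (key width and values are illustrative, not from the paper) checks this:

```python
def first_diff_bit(a: int, b: int, width: int = 16) -> int:
    """Index, counted from the most significant bit, of the first bit
    at which two distinct keys differ."""
    x = a ^ b
    assert x != 0, "keys must be distinct"
    return width - x.bit_length()

keys = sorted([0x00FF, 0x0100, 0x0103, 0x8000])

# For adjacent keys in sorted order, the smaller key holds a 0 and the
# larger key holds a 1 at their first differing (distinction) bit.
for lo, hi in zip(keys, keys[1:]):
    d = first_diff_bit(lo, hi)
    assert (lo >> (15 - d)) & 1 == 0
    assert (hi >> (15 - d)) & 1 == 1
print("distinction-bit property holds for all adjacent pairs")
```

Comparing keys only at such positions is what allows a compression scheme to discard the remaining bits while still recovering the sorted order.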

The motivation of our research is to establish a Laplace-domain theory that provides principles and methodology to analyze and synthesize systems with nonlinear dynamics. A semigroup of composition operators defined for nonlinear autonomous dynamical systems---the Koopman semigroup and its associated Koopman generator---plays a central role in this study. We introduce the resolvent of the Koopman generator, which we call the Koopman resolvent, and provide its spectral characterization for three types of nonlinear dynamics: ergodic evolution on an attractor, convergence to a stable equilibrium point, and convergence to a (quasi-)stable limit cycle. This shows that the Koopman resolvent provides the Laplace-domain representation of such nonlinear autonomous dynamics. A computational aspect of the Laplace-domain representation is also discussed with emphasis on non-stationary Koopman modes.

It is well-known that optimal (i.e., revenue-maximizing) selling mechanisms in multidimensional type spaces may involve randomization. We study mechanisms for selling two identical, indivisible objects to a single buyer. We analyze two settings: (i) decreasing marginal values (DMV) and (ii) increasing marginal values (IMV). Thus, the two marginal values of the buyer are not independent. We obtain sufficient conditions on the distribution of buyer values for the existence of an optimal mechanism that is deterministic.

In the DMV model, we show that under a well-known condition, it is optimal to sell the first unit deterministically. Under the same sufficient condition, a bundling mechanism (which is deterministic) is optimal in the IMV model. Under a stronger sufficient condition, a deterministic mechanism is optimal in the DMV model.

Our results apply to heterogeneous objects when there is a specified sequence in which the two objects must be sold.

Increasing awareness of privacy has led to a strong focus on anonymous systems that protect users' anonymity. By studying early schemes, we summarize some intractable problems of anonymous systems. Centralization is a universal problem, since most anonymous systems rely on central proxies or preset nodes to forward and mix messages, which compromises users' privacy in some way. Besides, availability has become another important factor limiting the development of anonymous systems, due to the large requirement for additional resources (i.e., bandwidth and storage) and high latency. Moreover, existing anonymous systems may suffer from different attacks, including Man-in-the-Middle (MitM) attacks, Distributed Denial-of-Service (DDoS) attacks, and so on. In this context, we first propose a BlockChain-based Mix-Net (BCMN) protocol and theoretically demonstrate its security and anonymity. Then we construct a concrete dynamic self-organizing BlockChain-based MIX anonymous system (BCMIX). In the system, users and mix nodes utilize blockchain transactions and their addresses to negotiate keys with each other, which resists MitM attacks. In addition, we design an IP sharding algorithm to mitigate Sybil attacks. To evaluate the BCMIX system, we leverage the distribution of mining pools in the real world to test the system's performance and resistance to attacks. Compared with other systems, BCMIX provides better resilience to known attacks, while achieving low-latency anonymous communication without significant bandwidth or storage resources.

We consider the joint constellation design problem for the noncoherent multiple-input multiple-output (MIMO) multiple-access channel. By analyzing the noncoherent maximum-likelihood (ML) detection error, we propose novel design criteria so as to minimize the error probability. Our first criterion is the minimum expected pairwise log-likelihood ratio over the joint constellation. From an analysis of this metric at high signal-to-noise ratio, we obtain further simplified metrics. For any given set of constellation sizes, the proposed metrics can be optimized over the set of signal matrices. Using these criteria, we evaluate two simple constructions: partitioning a single-user constellation, which is effective for relatively small constellations, and precoding individual constellations of lower dimension. For a fixed joint constellation, the design metrics can be further optimized over the per-user transmit power, especially when the users transmit at different rates. Considering unitary space-time modulation, we investigate the option of building each individual constellation as a set of truncated unitary matrices scaled by the respective transmit power. Numerical results show that our proposed metrics are meaningful, and can be used as objectives to generate constellations through numerical optimization that perform better, for the same transmission rate and power constraint, than a common pilot-based scheme and the constellations optimized with existing metrics.

Recent advances in single image super-resolution (SISR) have explored the power of convolutional neural networks (CNNs) to achieve better performance. Despite the great success of CNN-based methods, it is not easy to apply them to edge devices due to their heavy computation requirements. To solve this problem, various fast and lightweight CNN models have been proposed. The information distillation network is one of the state-of-the-art methods; it adopts a channel splitting operation to extract distilled features. However, it is not clear how this operation helps in the design of efficient SISR models. In this paper, we propose the feature distillation connection (FDC), which is functionally equivalent to the channel splitting operation while being more lightweight and flexible. Thanks to FDC, we can rethink the information multi-distillation network (IMDN) and propose a lightweight and accurate SISR model called the residual feature distillation network (RFDN). RFDN uses multiple feature distillation connections to learn more discriminative feature representations. We also propose a shallow residual block (SRB) as the main building block of RFDN, so that the network can benefit the most from residual learning while still being lightweight enough. Extensive experimental results show that the proposed RFDN achieves a better trade-off between performance and model complexity than the state-of-the-art methods. Moreover, we propose an enhanced RFDN (E-RFDN), which won first place in the AIM 2020 efficient super-resolution challenge. Code will be available at https://github.com/njulj/RFDN.

We study fundamental graph problems such as graph connectivity, minimum spanning forest (MSF), and approximate maximum (weight) matching in a distributed setting. In particular, we focus on the Adaptive Massively Parallel Computation (AMPC) model, which is a theoretical model that captures MapReduce-like computation augmented with a distributed hash table.

We show the first AMPC algorithms for all of the studied problems that run in a constant number of rounds and use only $O(n^\epsilon)$ space per machine, where $0 < \epsilon < 1$. Our results improve both upon the previous results in the AMPC model, as well as the best-known results in the MPC model, which is the theoretical model underpinning many popular distributed computation frameworks, such as MapReduce, Hadoop, Beam, Pregel and Giraph.

Finally, we provide an empirical comparison of the algorithms in the MPC and AMPC models in a fault-tolerant distributed computation environment. We empirically evaluate our algorithms on a set of large real-world graphs and show that our AMPC algorithms can achieve improvements in both running time and round complexity over optimized MPC baselines.

Graph embedding is a powerful method for representing graph neurological data (e.g., brain connectomes) in a low-dimensional space for brain connectivity mapping, prediction, and classification. However, existing embedding algorithms have two major limitations. First, they primarily focus on preserving one-to-one topological relationships between nodes (i.e., regions of interest (ROIs) in a connectome), but have mostly ignored many-to-many relationships (i.e., set to set), which can be captured using a hyperconnectome structure. Second, existing graph embedding techniques cannot be easily adapted to multi-view graph data with heterogeneous distributions. In this paper, cross-pollinating adversarial deep learning with hypergraph theory, we aim to jointly learn deep latent embeddings of subject-specific multi-view brain graphs to eventually disentangle different brain states. First, we propose a simple strategy to build a hyperconnectome for each brain view based on a nearest-neighbour algorithm, preserving the connectivities across pairs of ROIs. Second, we design a hyperconnectome autoencoder (HCAE) framework which operates directly on the multi-view hyperconnectomes, using hypergraph convolutional layers to better capture the many-to-many relationships between brain regions (i.e., nodes). For each subject, we further regularize the hypergraph autoencoding by adversarial regularization to align the distribution of the learned hyperconnectome embeddings with that of the input hyperconnectomes. We formalize our hyperconnectome embedding within a geometric deep learning framework optimized for a given subject, thereby designing an individual-based learning framework. Our experiments show that the embeddings learned by HCAE yield better results for brain state classification than other deep graph embedding methods.
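
As a rough illustration of the first step, one plausible nearest-neighbour hyperedge construction could look like the following (a minimal sketch of our own; the function name, choice of k, and feature values are assumptions, not the authors' code):

```python
import math

def knn_hyperedges(features, k=2):
    """Build one hyperedge per node: the node together with its k nearest
    neighbours under Euclidean distance between feature rows."""
    n = len(features)
    edges = []
    for i in range(n):
        dists = sorted(
            (math.dist(features[i], features[j]), j)
            for j in range(n) if j != i
        )
        edges.append({i} | {j for _, j in dists[:k]})
    return edges

# Toy "connectivity" features for 4 ROIs (illustrative values only).
feats = [[0.0, 0.1], [0.1, 0.0], [1.0, 1.1], [1.1, 1.0]]
edges = knn_hyperedges(feats, k=1)
print(edges)  # two tight pairs of ROIs
```

Unlike an ordinary k-NN graph, each hyperedge here groups a whole neighbourhood as one set, which is the many-to-many structure the hyperconnectome is meant to capture.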

The high-SNR capacity of the noncoherent MIMO channel has been derived for the case of independent and identically distributed (IID) Rayleigh block fading by exploiting the Gaussianity of the channel matrix. This implies the optimal degrees of freedom (DoF), i.e., the capacity pre-log factor. Nevertheless, as far as the optimal DoF is concerned, IID Rayleigh fading is apparently a sufficient but not necessary condition. In this paper, we show that the optimal DoF for the IID Rayleigh block fading channel is also the optimal DoF for a more general class of generic block fading channels, in which the random channel matrix has finite power and finite differential entropy. Our main contribution is a novel converse proof based on the duality approach.

Similarity-preserving hashing is a core technique for fast similarity searches; it randomly maps data points in a metric space to strings of discrete symbols (i.e., sketches) in the Hamming space. While traditional hashing techniques produce binary sketches, recent ones produce integer sketches that preserve various similarity measures. However, most similarity search methods are designed for binary sketches and are inefficient for integer sketches. Moreover, most methods are either inapplicable or inefficient for dynamic datasets, although modern real-world datasets are updated over time. We propose the dynamic filter trie (DyFT), a dynamic similarity search method for both binary and integer sketches. An extensive experimental analysis on large real-world datasets shows that DyFT offers superior scalability, time performance, and memory efficiency. For example, on a huge dataset of 216 million data points, DyFT performs a similarity search 6,000 times faster than a state-of-the-art method while reducing memory usage to one-thirteenth.
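
For context, the generalised Hamming distance on integer sketches, and the brute-force linear scan that trie-based indexes such as DyFT are designed to beat, can be sketched as follows (our own illustration, not the paper's code):

```python
def hamming(s, t):
    """Generalised Hamming distance: number of positions whose symbols differ."""
    return sum(a != b for a, b in zip(s, t))

def range_search(db, query, radius):
    """Brute-force similarity search: indices of all sketches within
    `radius` of `query` (the linear-scan baseline)."""
    return [i for i, s in enumerate(db) if hamming(s, query) <= radius]

db = [(1, 2, 3, 4), (1, 2, 3, 5), (7, 2, 3, 4), (9, 9, 9, 9)]
print(range_search(db, (1, 2, 3, 4), radius=1))  # → [0, 1, 2]
```

A linear scan touches every sketch per query; an index such as a trie prunes whole subtrees whose shared prefix already exceeds the radius.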

Salient object segmentation aims at distinguishing various salient objects from backgrounds. Despite the lack of semantic consistency, salient objects often have obvious texture and location characteristics in local areas. Based on this prior, we propose a novel Local Context Attention Network (LCANet) to generate locally reinforced feature maps in a uniform representational architecture. The proposed network introduces an Attentional Correlation Filter (ACF) module to generate explicit local attention by calculating the correlation feature map between the coarse prediction and the global context. It is then expanded into a Local Context Block (LCB). Furthermore, a one-stage coarse-to-fine structure is implemented based on the LCB to adaptively enhance the local context description ability. Comprehensive experiments on several salient object segmentation datasets demonstrate the superior performance of the proposed LCANet against state-of-the-art methods, notably with a 0.883 max F-score and 0.034 MAE on the DUTS-TE dataset.

Equipping machines with comprehensive knowledge of the world's entities and their relationships has been a long-standing goal of AI. Over the last decade, large-scale knowledge bases, also known as knowledge graphs, have been automatically constructed from web contents and text sources, and have become a key asset for search engines. This machine knowledge can be harnessed to semantically interpret textual phrases in news, social media and web tables, and contributes to question answering, natural language processing and data analytics. This article surveys fundamental concepts and practical methods for creating and curating large knowledge bases. It covers models and methods for discovering and canonicalizing entities and their semantic types and organizing them into clean taxonomies. On top of this, the article discusses the automatic extraction of entity-centric properties. To support the long-term life-cycle and the quality assurance of machine knowledge, the article presents methods for constructing open schemas and for knowledge curation. Case studies on academic projects and industrial knowledge graphs complement the survey of concepts and methods.

The aim of the present paper is to introduce a new numerical method for solving nonlinear Volterra integro-differential equations involving delay. We apply the trapezium rule to the integral involved in the equation, and then employ the Daftardar-Gejji and Jafari method (DGJ) to solve the resulting implicit equation. An existence-uniqueness theorem is derived for solutions of such equations, and the error and convergence analysis of the proposed method is presented. We illustrate the efficacy of the newly proposed method by constructing examples.
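
To convey the flavour of the DGJ iteration, here is a minimal scalar sketch for an implicit equation x = c + N(x) (a toy setting of our own, not the delay integro-differential problem treated in the paper):

```python
import math

def dgj_solve(c, N, n_terms=25):
    """DGJ series for x = c + N(x): u0 = c,
    u_{m+1} = N(u0 + ... + u_m) - N(u0 + ... + u_{m-1});
    the solution is the sum of all terms."""
    total = c       # running partial sum u0 + ... + u_m
    prev_N = 0.0
    for _ in range(n_terms):
        cur_N = N(total)
        total += cur_N - prev_N   # add the next DGJ term
        prev_N = cur_N
    return total

x = dgj_solve(1.0, lambda t: 0.2 * math.sin(t))
assert abs(x - (1.0 + 0.2 * math.sin(x))) < 1e-10  # satisfies the equation
print(f"fixed point ≈ {x:.6f}")
```

For a contractive N the partial sums converge rapidly; in the paper, the analogous iteration is applied to the implicit equation produced by the trapezium-rule discretisation.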

Along with the rapid development of cloud computing technology, containerization technology has drawn much attention from both industry and academia. In this paper, we perform a comparative measurement analysis of Docker-sec, a Linux Security Module proposed in 2018, and a new AppArmor profile generator called Lic-Sec, which combines Docker-sec with a modified version of LiCShield, another Linux Security Module, proposed in 2015. Docker-sec and LiCShield enhance Docker container security based on mandatory access control and protect containers without manual configuration. Lic-Sec brings together their strengths and provides stronger protection. We evaluate the effectiveness and performance of Docker-sec and Lic-Sec by testing them against real-world attacks. We generate an exploit database of 42 exploits effective on Docker containers, selected from the latest 400 exploits on Exploit-db, and launch these exploits on containers spawned with Docker-sec and Lic-Sec separately. Our evaluations show that, for demanding images, Lic-Sec protects against all privilege escalation attacks for which Docker-sec failed to give protection.

Providing personalized recommendations that are also accompanied by explanations as to why an item is recommended is a research area of growing importance. At the same time, progress is limited by the availability of open evaluation resources. In this work, we address the task of scientific literature recommendation. We present arXivDigest, which is an online service providing personalized arXiv recommendations to end users and operates as a living lab for researchers wishing to work on explainable scientific literature recommendations.

Nowcasting is a field of meteorology which aims at forecasting weather over a short term of up to a few hours. Within the meteorology landscape, this field is rather specific, as it requires particular techniques such as data extrapolation, whereas conventional meteorology is generally based on physical modeling. In this paper, we focus on cloud cover nowcasting, which has various application areas such as satellite shot optimisation and photovoltaic energy production forecasting.

Following recent deep learning successes on multiple imagery tasks, we applied deep convolutional neural networks to Meteosat satellite images for cloud cover nowcasting. We present the results of several architectures specialized in image segmentation and time series prediction, and select the best models according to machine learning metrics as well as meteorological metrics. All selected architectures showed significant improvements over persistence, and the well-known U-Net surpasses the AROME physical model.

We consider a discrete-time nonatomic routing game with variable demand and uncertain costs. Given a fixed routing network with a single origin and destination, the cost functions on the edges depend on some uncertain persistent state parameter. Every period, a variable traffic demand routes through the network. The experienced costs are publicly observed and the belief about the state parameter is updated in a Bayesian manner. This paper studies the dynamics of equilibrium and beliefs. We say that there is strong learning when beliefs converge to the truth, and weak learning when equilibrium flows converge to those under complete information. Our main result is a characterization of the networks for which learning occurs for all increasing cost functions, given highly variable demand. We prove that these networks have a series-parallel structure and provide a counterexample showing that this condition is necessary.

Lung cancer is one of the deadliest diseases in the world, and detecting its tumors at an early stage can be a tedious task. Existing deep learning architectures for lung nodule identification use complex designs with large numbers of parameters. This study developed a cascaded architecture which can accurately segment lung nodules on computed tomography (CT) images and classify them as benign or malignant. The main contribution of this study is a segmentation network whose first stage, trained on a public dataset, helps recognize images containing a nodule from any dataset by means of transfer learning; the segmentation of a nodule then improves the second stage's classification of nodules into benign and malignant. The proposed architecture outperformed conventional methods with an area under the curve value of 95.67\%. The experimental results showed a classification accuracy of 97.96\%, outperforming other simple and complex architectures in classifying lung nodules for lung cancer detection.

Travel time is a crucial measure in transportation, and accurate travel time prediction is fundamental for operations and advanced information systems. A variety of solutions exist for short-term travel time prediction, such as those that utilize real-time GPS data and optimization methods to track the path of a vehicle. However, reliable long-term predictions remain challenging. In this paper, we show the applicability and usefulness of travel time (i.e., delivery time) prediction for postal services. We investigate several methods, such as linear regression models and tree-based ensembles (random forest, bagging, and boosting), to predict delivery time, conducting extensive experiments and considering many usability scenarios. Results reveal that travel time prediction can help mitigate high delays in postal services. We show that some boosting algorithms, such as light gradient boosting and CatBoost, achieve higher accuracy and runtime efficiency than other baselines such as linear regression models, the bagging regressor, and random forest.
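
As a self-contained illustration of why boosting can beat a trivial predictor on such regression tasks, here is a from-scratch gradient-boosting sketch on synthetic delivery-time data (our own toy example with made-up data; the paper evaluates library implementations such as light gradient boosting and CatBoost on real postal data):

```python
import random

def fit_stump(xs, resid):
    """Best single-threshold regression stump for squared error on residuals."""
    best = None
    for thr in sorted(set(xs)):
        left = [r for x, r in zip(xs, resid) if x <= thr]
        right = [r for x, r in zip(xs, resid) if x > thr]
        if not left or not right:
            continue
        lv, rv = sum(left) / len(left), sum(right) / len(right)
        err = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
        if best is None or err < best[0]:
            best = (err, thr, lv, rv)
    return best[1:]

def boost(xs, ys, rounds=50, lr=0.3):
    """Gradient boosting for squared error: repeatedly fit stumps to residuals."""
    pred = [sum(ys) / len(ys)] * len(ys)
    for _ in range(rounds):
        resid = [y - p for y, p in zip(ys, pred)]
        thr, lv, rv = fit_stump(xs, resid)
        pred = [p + lr * (lv if x <= thr else rv) for x, p in zip(xs, pred)]
    return pred

random.seed(0)
xs = [random.uniform(0, 10) for _ in range(200)]          # e.g. route length
ys = [3.0 * x + 5.0 + random.gauss(0, 0.5) for x in xs]   # synthetic delivery time

def mse(pred):
    return sum((y - p) ** 2 for y, p in zip(ys, pred)) / len(ys)

mse_mean = mse([sum(ys) / len(ys)] * len(ys))
mse_boost = mse(boost(xs, ys))
print(f"mean-predictor MSE {mse_mean:.2f} -> boosted MSE {mse_boost:.2f}")
```

Each round fits a weak learner to the current residuals and adds a damped correction, which is the core mechanism shared by the library implementations compared in the paper.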

Distributed development involving globally distributed teams in different countries and time zones adds additional complexity to an already complex undertaking. This paper focuses on the effect of global software development on motivation. Specifically, we ask: what impact does misalignment between needed and actual autonomy have on global team motivation? We studied members of two distributed software development teams with different degrees of distribution, both following the Scrum approach to software development. One team's members are distributed across Ireland, England and Wales; the other has members in locations across Europe and North America. We observed the teams during their Scrum "ceremonies," and interviewed each team member, during which we asked team members to rate their motivation on a 5-point ordinal scale. Considering both the reported motivation levels and a qualitative analysis of our observations and interviews, our results suggest that autonomy appears to be just one of three job aspects that affect motivation, the others being competence and relatedness. We hypothesize that (1) autonomy is a necessary but not sufficient condition for motivation among experienced team members, and (2) autonomy is not a motivator unless accompanied by sufficient competence.

In this article, we introduce and study the concept of the exponent of a cyclic code over a finite field $\mathbb{F}_q.$ We give a relation between the exponent of a cyclic code and its dual code. Finally, we introduce and determine the exponent distribution of the cyclic code.

Clustering data with uneven density distributions under high noise levels is challenging. Currently, HDBSCAN is considered the state-of-the-art (SOTA) algorithm for this problem. In this paper, we propose a novel clustering algorithm based on what we call the graph of density topology (GDT). GDT jointly considers the local and global structures of the data samples: it first forms local clusters via a density-growing process, with a strategy for proper noise handling as well as cluster boundary detection, and then estimates a GDT from the relationships between local clusters in terms of a connectivity measure, giving a global topological graph. The connectivity, measuring similarity between neighboring local clusters, is based on local clusters rather than individual points, ensuring its robustness even to very large noise. Evaluation results on both toy and real-world datasets show that GDT achieves SOTA performance by far on almost all the popular datasets, and has a low time complexity of O(n log n). The code is available at https://github.com/gaozhangyang/DGC.git.

Lithium-ion batteries are increasingly being deployed in liberalised electricity systems, where their use is driven by economic optimisation in a specific market context. However, battery degradation depends strongly on operational profile, and this is particularly variable in energy trading applications. Here, we present results from a year-long experiment where pairs of batteries were cycled with profiles calculated by solving an economic optimisation problem for wholesale energy trading, including a physically-motivated degradation model as a constraint. The results show that this approach can increase revenue by 20% whilst simultaneously decreasing degradation by 30% compared to existing methods. The physics-based approach increases the lifetime both in terms of years and number of cycles, as well as the revenue per year, increasing the possible lifetime revenue by 70%. This demonstrates the potential to unlock significant extra performance using control engineering incorporating physical models of battery ageing.

The identification of safe faults (i.e., faults which are guaranteed not to produce any failure) in an electronic system is a crucial step when analyzing its dependability and developing its test plan. Unfortunately, safe fault identification is poorly supported by available EDA tools, and thus remains an open problem. The complexity growth of modern systems used in safety-critical applications further complicates their identification. In this article, we identify some classes of safe faults within an embedded system based on a pipelined processor. A new method for automating safe fault identification is also proposed, in which the safe faults belonging to each class are identified by resorting to Automatic Test Pattern Generation (ATPG) techniques. The proposed methodology is applied to a sample system built around the OpenRisc1200 open source processor.

Ulam's metric is the minimal number of moves, each consisting of the removal of one element from a permutation and its subsequent reinsertion in a different place, needed to go between two given permutations. The elements that are not moved form a longest common subsequence of the permutations. Aldous and Diaconis pointed out that Ulam's metric had been introduced in the context of questions concerning sorting and tossing cards. In this paper we define and study Ulam's metric in higher dimensions: for dimension one the considered object is a pair of permutations, while for dimension k it is a pair of k-tuples of permutations. Over encodings by k-tuples of permutations we define two dually related hierarchies. Our original motivation comes from the paper by Murata et al., in which pairs of permutations were used to represent topological relations between rectangles packed into a minimal area, with applications to VLSI physical design. Our results concern hardness, approximability, and parameterized complexity inside the hierarchies.
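
The connection to longest common subsequences gives a simple way to compute Ulam's metric in the one-dimensional case: relabel one permutation by positions in the other, then take n minus the length of a longest increasing subsequence. A sketch (ours, not the paper's):

```python
from bisect import bisect_left

def ulam_distance(p, q):
    """Ulam's metric between two permutations of the same elements:
    n minus the length of their longest common subsequence, computed
    here as an LIS via patience sorting in O(n log n)."""
    pos = {v: i for i, v in enumerate(p)}
    seq = [pos[v] for v in q]
    tails = []  # smallest tail of an increasing subsequence of each length
    for x in seq:
        i = bisect_left(tails, x)
        if i == len(tails):
            tails.append(x)
        else:
            tails[i] = x
    return len(p) - len(tails)

print(ulam_distance([1, 2, 3, 4, 5], [2, 3, 4, 5, 1]))  # → 1 (move 1 to the back)
```

The elements contributing to the LIS are exactly the ones left in place; everything else must be removed and reinserted, one move each.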

We prove the existence and computability of optimal strategies in weighted limit games, zero-sum infinite-duration games with a B\"uchi-style winning condition that requires producing infinitely many play prefixes satisfying a given regular specification. The quality of a play is measured by the maximal weight of the infixes between successive play prefixes that satisfy the specification.

This paper deals with belief base revision, a form of belief change consisting of the incorporation of new facts into an agent's beliefs, represented by a finite set of propositional formulas. With the aim of guaranteeing more reliability and rationality in real applications while performing revision, we propose the idea of credible belief base revision, leading us to define two new formula-based revision operators using the tools offered by evidence theory. These operators, uniformly presented in the same spirit as others in [9], stem from consistent subbases that are maximal with respect to credibility instead of set inclusion or cardinality. Moreover, in between these two extreme operators, evidence theory lets us shed some light on a compromise operator that avoids losing initial beliefs to the maximum extent possible. Its idea is to capture the maximal consistent sets stemming from all possible intersections of maximal consistent subbases. All these operators are illustrated and compared with others through examples.

For graphs $G,H$, a homomorphism from $G$ to $H$ is an edge-preserving mapping from $V(G)$ to $V(H)$. In the list homomorphism problem, denoted by \textsc{LHom}($H$), we are given a graph $G$ and lists $L: V(G) \to 2^{V(H)}$, and we ask for a homomorphism from $G$ to $H$ which additionally respects the lists $L$.
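
A brute-force checker makes the problem definition concrete (exponential-time and for illustration only; the names and encoding are ours):

```python
from itertools import product

def list_hom_exists(G_edges, H_edges, lists):
    """Check whether a list homomorphism from G to H exists:
    an edge-preserving map f with f(v) in lists[v] for every vertex v of G."""
    H_adj = set()
    for u, v in H_edges:
        H_adj.add((u, v))
        H_adj.add((v, u))
    nodes = sorted(lists)
    for choice in product(*(lists[v] for v in nodes)):
        f = dict(zip(nodes, choice))
        if all((f[u], f[v]) in H_adj for u, v in G_edges):
            return True
    return False

# A triangle has no homomorphism to K2 (odd cycles are not 2-colourable).
tri = [(0, 1), (1, 2), (2, 0)]
k2 = [("a", "b")]
print(list_hom_exists(tri, k2, {0: ["a", "b"], 1: ["a", "b"], 2: ["a", "b"]}))  # → False
```

With H = K_k this is exactly \textsc{List $k$-Coloring}; the algorithms discussed below replace the exhaustive enumeration with dynamic programming over a graph decomposition.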

Very recently, Okrasa, Piecyk, and Rz\k{a}\.zewski [ESA 2020] defined an invariant $i^*(H)$ and proved that, under the SETH, $\mathcal{O}^*\left (i^*(H)^{\textrm{tw}(G)}\right)$ is the tight complexity bound for \textsc{LHom}($H$), parameterized by the treewidth $\textrm{tw}(G)$ of the instance graph $G$. We study the complexity of the problem under different parameterizations. As the first result, we show that $i^*(H)$ is also the right complexity base if the parameter is the size of a minimum feedback vertex set of $G$.

Then we turn our attention to a parameterization by the cutwidth $\textrm{ctw}(G)$ of $G$. Jansen and Nederlof~[ESA 2018] showed that \textsc{List $k$-Coloring} (i.e., \textsc{LHom}($K_k$)) can be solved in time $\mathcal{O}^*\left (c^{\textrm{ctw}(G)}\right)$ where $c$ does not depend on $k$. Jansen asked if this behavior extends to graph homomorphisms. As the main result of the paper, we answer the question in the negative. We define a new graph invariant $mim^*(H)$ and prove that \textsc{LHom}($H$) problem cannot be solved in time $\mathcal{O}^*\left ((mim^*(H)-\varepsilon)^{\textrm{ctw}(G)}\right)$ for any $\varepsilon >0$, unless the SETH fails. This implies that there is no $c$, such that for every odd cycle the non-list version of the problem can be solved in time $\mathcal{O}^*\left (c^{\textrm{ctw}(G)} \right)$.

Finally, we generalize the algorithm of Jansen and Nederlof, so that it can be used to solve \textsc{LHom}($H$) for every graph $H$; its complexity depends on $\textrm{ctw}(G)$ and another invariant of $H$, which is constant for cliques.

A labelled Markov decision process is a labelled Markov chain with nondeterminism, i.e., together with a strategy a labelled MDP induces a labelled Markov chain. The model is related to interval Markov chains. Motivated by applications of equivalence checking for the verification of anonymity, we study the algorithmic comparison of two labelled MDPs, in particular, whether there exist strategies such that the MDPs become equivalent/inequivalent, both in terms of trace equivalence and in terms of probabilistic bisimilarity. We provide the first polynomial-time algorithms for computing memoryless strategies to make the two labelled MDPs inequivalent if such strategies exist. We also study the computational complexity of qualitative problems about making the total variation distance and the probabilistic bisimilarity distance less than one or equal to one.

Cough audio signal classification has been successfully used to diagnose a variety of respiratory conditions, and there has been significant interest in leveraging Machine Learning (ML) to provide widespread COVID-19 screening. However, there is currently no validated database of cough sounds with which to train such ML models. The COUGHVID dataset provides over 20,000 crowdsourced cough recordings representing a wide range of subject ages, genders, geographic locations, and COVID-19 statuses. First, we filtered the dataset using our open-sourced cough detection algorithm. Second, experienced pulmonologists labeled more than 2,000 recordings to diagnose medical abnormalities present in the coughs, thereby contributing one of the largest expert-labeled cough datasets in existence that can be used for a plethora of cough audio classification tasks. Finally, we ensured that coughs labeled as symptomatic and COVID-19 originate from countries with high infection rates, and that their expert labels are consistent. As a result, the COUGHVID dataset contributes a wealth of cough recordings for training ML models to address the world's most urgent health crises.

Legal scholars have postulated that there have been three eras of American law to date, consisting in chronological order of the initial Age of Discovery, the Age of Faith, and then the Age of Anxiety. An open question that has received erudite attention in legal studies is what the next era, the fourth era, might consist of, for which various proposals exist, including the Age of Consent, the Age of Information, and others. There is no consensus in the literature as yet on what the fourth era is, nor on whether the fourth era has already begun or will instead emerge in the future. This paper examines the potential era-elucidating impacts amid the advent of autonomous Artificial Intelligence Legal Reasoning (AILR), entailing whether such AILR will be an element of a fourth era or a driver of a fourth, fifth, or perhaps sixth era of American law. Also, a set of meta-characteristics for identifying a legal era changeover is introduced, along with an innovative discussion of the role of legal formalism versus legal realism in the emergence of the American law eras.

In the last twenty years, data series similarity search has emerged as a fundamental operation at the core of several analysis tasks and applications related to data series collections. Many solutions to different mining problems work by means of similarity search. In this regard, all the proposed solutions require prior knowledge of the series length on which similarity search is performed. In several cases, the choice of the length is critical and substantially influences the quality of the expected outcome. Unfortunately, the obvious brute-force solution, which provides an outcome for all lengths within a given range, is computationally untenable. In this Ph.D. work, we present the first solutions that inherently support scalable and variable-length similarity search in data series, applied to sequence/subsequence matching, motif discovery and discord discovery problems. The experimental results show that our approaches are up to orders of magnitude faster than the alternatives. They also demonstrate that we can remove the unrealistic constraint of performing analytics using a predefined length, leading to more intuitive and actionable results, which would have otherwise been missed.

In this paper, we investigate a prescribed-time and fully distributed Nash Equilibrium (NE) seeking problem for continuous-time noncooperative games. By exploiting pseudo-gradient play and consensus-based schemes, various distributed NE seeking algorithms are presented over either fixed or switching communication topologies so that convergence to the NE is reached in a prescribed time. In particular, a prescribed-time distributed NE seeking algorithm is first developed under a fixed graph to find the NE in an a priori given and user-defined time, provided that a static controller gain can be selected based on certain global information, such as the algebraic connectivity of the communication graph and both the Lipschitz and monotone constants of the pseudo-gradient associated with the players' objective functions. Secondly, a prescribed-time and fully distributed NE seeking algorithm is proposed that removes the need for global information by designing heterogeneous dynamic gains that tune the weights of the communication topology online. Further, we extend this algorithm to accommodate jointly switching topologies. It is theoretically proved that the global convergence of the proposed algorithms to the NE is rigorously guaranteed in a prescribed time, based on a time function transformation approach. Finally, numerical simulation results are presented to verify the effectiveness of the designs.

Annotation is the labeling of data by human effort. Annotation is critical to modern machine learning, and Bloomberg has developed years of experience in annotation at scale. This report captures a wealth of wisdom for applied annotation projects, collected from more than 30 experienced annotation project managers in Bloomberg's Global Data department.

In this paper, a fully coupled system comprising a transient Navier-Stokes (NS) fluid flow model and a variable-coefficient unsteady Advection-Diffusion-Reaction (VADR) transport model is studied through a subgrid multiscale stabilized finite element method. In particular, an algebraic approach to approximating the subscales is considered to arrive at a stabilized variational formulation of the coupled system. The system is strongly coupled, since the viscosity of the fluid depends upon the concentration, whose transport is modelled by the VADR equation. Fully implicit schemes are considered for time discretisation. Furthermore, elaborate derivations of both a priori and a posteriori error estimates for the stabilized finite element scheme are carried out. The credibility of the stabilized method is also well established through various numerical experiments, presented before concluding.

In [Cou15] a multiplier technique, going back to Leray and Gårding for scalar hyperbolic partial differential equations, has been extended to the context of finite difference schemes for evolutionary problems. The key point of the analysis in [Cou15] was to obtain a discrete energy-dissipation balance law when the initial difference operator is multiplied by a suitable quantity. The construction of the energy and dissipation functionals was achieved in [Cou15] under the assumption that all modes were separated. We relax this assumption here and construct, for the same multiplier as in [Cou15], the energy and dissipation functionals when some modes cross. Semigroup estimates for fully discrete hyperbolic initial boundary value problems are deduced in this broader context by following the arguments of [Cou15].

We propose a heuristics-based social-sensor cloud service selection and composition model to reconstruct mosaic scenes. The proposed approach leverages crowdsourced social media images to create an image mosaic that reconstructs a scene at a designated location and interval of time. The novel approach relies on a set of features defined on the basis of the image metadata to determine the relevance and composability of services. Novel heuristics are developed to filter out non-relevant services. Multiple machine learning strategies are employed to produce smooth service compositions, resulting in a mosaic of relevant images indexed by geolocation and time. The preliminary analytical results demonstrate the feasibility of the proposed composition model.

Subgraph counting aims to count the occurrences of a template T in a given network G(V, E). It is a powerful graph analysis tool and has found real-world applications in diverse domains. Scaling subgraph counting problems is known to be memory bounded and computationally challenging, with exponential complexity. Although scalable parallel algorithms are known for several graph problems, such as Triangle Counting and PageRank, this is not common for counting complex subgraphs. Here we address this challenge and study connected acyclic templates, i.e., trees. We propose a novel vectorized subgraph counting algorithm, named Subgraph2Vec, together with both shared-memory and distributed implementations. Subgraph2Vec 1) reduces algorithmic complexity by minimizing neighbor traversal; 2) achieves a highly vectorized implementation on top of linear algebra kernels to significantly improve performance and hardware utilization; 3) improves overall performance over the state-of-the-art by orders of magnitude, and by up to 660x on a single node; 4) scales the template size up to 20 in distributed mode while maintaining good strong scalability; and 5) is portable to both CPU and GPU.
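The idea of counting subgraphs with linear algebra kernels can be illustrated, in a much simplified form, with the smallest tree template: counting non-induced 3-node paths from the square of the adjacency matrix. This is only a toy sketch of the vectorized style, not the authors' Subgraph2Vec algorithm.

```python
import numpy as np

def count_3_paths(A):
    """Count non-induced 3-node paths (u-v-w, u != w) via linear algebra.

    A is a symmetric 0/1 adjacency matrix with zero diagonal.
    (A @ A)[u, w] counts walks of length 2 from u to w; subtracting
    closed walks (the trace) and halving for direction gives paths.
    """
    A2 = A @ A
    return int((A2.sum() - np.trace(A2)) // 2)

# Triangle K3: each node's two incident edges form one path -> 3 paths.
K3 = np.array([[0, 1, 1],
               [1, 0, 1],
               [1, 1, 0]])
print(count_3_paths(K3))  # 3
```

The same walk-counting trick generalizes to larger tree templates, which is where careful bookkeeping (and, in the paper, vectorization and distribution) becomes essential.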

Forecasting plays a critical role in the development of organisational business strategies. Despite a considerable body of research in the area of forecasting, the focus has largely been on the financial and economic outcomes of the forecasting process as opposed to societal benefits. Our motivation in this study is to promote the latter, with a view to using the forecasting process to advance social and environmental objectives such as equality, social justice and sustainability. We refer to such forecasting practices as Forecasting for Social Good (FSG) where the benefits to society and the environment take precedence over economic and financial outcomes. We conceptualise FSG and discuss its scope and boundaries in the context of the "Doughnut theory". We present some key attributes that qualify a forecasting process as FSG: it is concerned with a real problem, it is focused on advancing social and environmental goals and prioritises these over conventional measures of economic success, and it has a broad societal impact. We also position FSG in the wider literature on forecasting and social good practices. We propose an FSG maturity framework as the means to engage academics and practitioners with research in this area. Finally, we highlight that FSG: (i) cannot be distilled to a prescriptive set of guidelines, (ii) is scalable, and (iii) has the potential to make significant contributions to advancing social objectives.

This paper proposes a new analysis of graphs using the concept of electric potential, and also proposes a graph simplification method based on this analysis. Suppose that each node in a weighted graph has its respective potential value, and that the start and terminal nodes have maximum and zero potentials, respectively. When the level of each node is defined as the minimum number of edges/hops from the start node to that node, the proper potential of each level can be estimated based on a geometric proportionality relationship. Based on the estimated potential for each level, we can re-design the graph for path-finding problems as an electrical circuit, so that Kirchhoff's Circuit Law can be directly applied to simplify the graph for path-finding problems.
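The level-and-potential construction described above can be sketched roughly as follows; the geometric decay ratio and the zero-potential override for the terminal node are illustrative assumptions, not the paper's exact estimation rule.

```python
from collections import deque

def bfs_levels(adj, start):
    """Level of each node = minimum number of hops from the start node."""
    level = {start: 0}
    q = deque([start])
    while q:
        u = q.popleft()
        for v in adj[u]:
            if v not in level:
                level[v] = level[u] + 1
                q.append(v)
    return level

def geometric_potentials(adj, start, terminal, v_max=1.0, ratio=0.5):
    """Assign each BFS level a potential decaying geometrically from
    v_max at the start node; the terminal node is forced to zero.
    (The decay ratio here is an illustrative choice.)"""
    level = bfs_levels(adj, start)
    pot = {u: v_max * ratio ** k for u, k in level.items()}
    pot[terminal] = 0.0
    return pot

# Small diamond graph: 0 -> {1, 2} -> 3.
adj = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2]}
print(geometric_potentials(adj, start=0, terminal=3))
```

With the potentials fixed, edges between equal-potential nodes carry no "current" and become candidates for removal in a circuit-style simplification.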

Recent research on expertise assessment of soccer players has emphasized the importance of perceptual skills. Former research focused on either high experimental control or a natural presentation mode. To assess the perceptual skills of athletes in an optimized manner, we captured omnidirectional in-field scenes and showed them to 12 expert, 9 intermediate and 13 novice soccer goalkeepers on virtual reality glasses. All scenes were shown from the same natural goalkeeper perspective and ended after the return pass to the goalkeeper. Based on their responses and gaze behavior, we classified their expertise with common machine learning techniques. This pilot study shows promising results for the objective classification of goalkeepers' expertise based on their gaze behaviour.

Recent work has identified a number of formally incompatible operational measures for the unfairness of a machine learning (ML) system. As these measures all capture intuitively desirable aspects of a fair system, choosing "the one true" measure is not possible, and instead a reasonable approach is to minimize a weighted combination of measures. However, this simply raises the question of how to choose the weights. Here, we formulate Legally Grounded Fairness Objectives (LGFO), which uses signals from the legal system to non-arbitrarily measure the social cost of a specific degree of unfairness. The LGFO is the expected damages under a putative lawsuit that might be awarded to those who were wrongly classified, in the sense that the ML system made a decision different to that which would have been made under the court's preferred measure. Notably, the two quantities necessary to compute the LGFO, the court's preferences about fairness measures, and the expected damages, are unknown but well-defined, and can be estimated by legal advice. Further, as the damages awarded by the legal system are designed to measure and compensate for the harm caused to an individual by an unfair classification, the LGFO aligns closely with society's estimate of the social cost.

The success of machine learning algorithms often relies on a large amount of high-quality data to train well-performing models. However, data is a valuable resource and is, in reality, often held by different parties. An effective solution to such a data isolation problem is to employ federated learning, which allows multiple parties to collaboratively train a model. In this paper, we propose a Secure version of the widely used Maximum Mean Discrepancy (SMMD) based on homomorphic encryption to enable effective knowledge transfer under the data federation setting without compromising data privacy. The proposed SMMD is able to avoid the potential information leakage in transfer learning when aligning the source and target data distributions. As a result, both the source domain and target domain can fully utilize their data to build more scalable models. Experimental results demonstrate that our proposed SMMD is secure and effective.
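For orientation, the plain (unencrypted) Maximum Mean Discrepancy statistic that SMMD secures can be computed as below; the homomorphic-encryption layer that makes it privacy-preserving is omitted entirely in this sketch.

```python
import numpy as np

def mmd_rbf(X, Y, gamma=1.0):
    """Squared Maximum Mean Discrepancy with an RBF kernel, computed
    in the clear. MMD^2 = E[k(x,x')] + E[k(y,y')] - 2 E[k(x,y)]."""
    def k(A, B):
        # Pairwise squared distances via broadcasting, then RBF kernel.
        d = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * d)
    return k(X, X).mean() + k(Y, Y).mean() - 2 * k(X, Y).mean()

rng = np.random.default_rng(0)
X = rng.normal(0.0, 1.0, size=(200, 2))   # "source" samples
Y = rng.normal(3.0, 1.0, size=(200, 2))   # shifted "target" samples
print(mmd_rbf(X, X) < mmd_rbf(X, Y))  # True: shifted samples give larger MMD
```

Aligning distributions then amounts to minimizing this statistic between source and target representations, which is where the secure version must avoid leaking the raw samples.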

Pre-sales customer service is of importance to E-commerce platforms as it contributes to optimizing customers' buying process. To better serve users, we propose AliMe KG, a domain knowledge graph in E-commerce that captures user problems, points of interest (POIs), item information and relations thereof. It helps to understand user needs, answer pre-sales questions and generate explanation texts. We applied AliMe KG to several online business scenarios such as shopping guide, question answering over properties and recommendation reason generation, and gained positive results. In this paper, we systematically introduce how we construct a domain knowledge graph from free text, and demonstrate its business value with several applications. Our experience shows that mining structured knowledge from free text in a vertical domain is practicable, and can be of substantial value in industrial settings.

Software refactoring aims at improving code quality while preserving the system's external behavior. Although in principle refactoring is a behavior-preserving activity, a study presented by Bavota et al. in 2012 reported the proneness of some refactoring actions (e.g., pull up method) to induce faults. The study was performed by mining refactoring activities and bugs from three systems. Taking advantage of the advances made in the mining software repositories field (e.g., better tools to detect refactoring actions at commit-level granularity), we present a differentiated replication of the work by Bavota et al. in which we (i) overcome some of the weaknesses that affect their experimental design, (ii) answer the same research questions of the original study on a much larger dataset (3 vs 103 systems), and (iii) complement the quantitative analysis of the relationship between refactoring and bugs with a qualitative, manual inspection of commits aimed at verifying the extent to which refactoring actions trigger bug-fixing activities. The results of our quantitative analysis confirm the findings of the replicated study, while the qualitative analysis partially demystifies the role played by refactoring actions in bug introduction.

We study the problem of convergence to stability in coalition formation games in which the strategies of each agent are coalitions in which she can participate and outcomes are coalition structures. Given a natural blocking dynamic, an absorbing set is a minimum set of coalition structures that once reached is never abandoned. The coexistence of single and non-single absorbing sets is what causes lack of convergence to stability. To characterize games in which both types of set are present, we first relate circularity among coalitions in preferences (rings) with circularity among coalition structures (cycles) and show that there is a ring in preferences if and only if there is a cycle in coalition structures. Then we identify a special configuration of overlapping rings in preferences characterizing games that lack convergence to stability. Finally, we apply our findings to the study of games induced by sharing rules.
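The ring-in-preferences notion above can be loosely pictured as a directed cycle over coalitions. The sketch below uses a hypothetical "strictly preferred successor" relation as a stand-in for the paper's blocking dynamic, and detects a ring with ordinary depth-first cycle detection.

```python
def has_ring(prefers):
    """Detect a directed cycle in a 'preferred successor' relation
    among coalitions (a toy stand-in for the blocking dynamic).
    Standard 3-color DFS: a back edge to a GRAY node closes a cycle."""
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {c: WHITE for c in prefers}

    def dfs(c):
        color[c] = GRAY
        for d in prefers.get(c, []):
            if color[d] == GRAY:
                return True
            if color[d] == WHITE and dfs(d):
                return True
        color[c] = BLACK
        return False

    return any(color[c] == WHITE and dfs(c) for c in prefers)

ring = {"AB": ["BC"], "BC": ["AC"], "AC": ["AB"]}   # AB -> BC -> AC -> AB
chain = {"AB": ["BC"], "BC": ["AC"], "AC": []}       # no cycle
print(has_ring(ring), has_ring(chain))  # True False
```

In the paper's terms, the existence of such a ring is what corresponds to a cycle among coalition structures, and hence to possible non-convergence.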

Despite the success of generative pre-trained language models on a series of text generation tasks, they still suffer in cases where reasoning over underlying commonsense knowledge is required during generation. Existing approaches that integrate commonsense knowledge into generative pre-trained language models simply transfer relational knowledge by post-training on individual knowledge triples, while ignoring rich connections within the knowledge graph. We argue that exploiting both the structural and semantic information of the knowledge graph facilitates commonsense-aware text generation. In this paper, we propose Generation with Multi-Hop Reasoning Flow (GRF), which enables pre-trained models to perform dynamic multi-hop reasoning on multi-relational paths extracted from an external commonsense knowledge graph. We empirically show that our model outperforms existing baselines on three text generation tasks that require reasoning over commonsense knowledge. We also demonstrate the effectiveness of the dynamic multi-hop reasoning module with reasoning paths inferred by the model that provide a rationale for the generation.

Carbon dioxide Capture and Storage (CCS) is an important strategy in mitigating anthropogenic CO$_2$ emissions. In order for CCS to be successful, large quantities of CO$_2$ must be stored and the storage site conformance must be monitored. Here we present a deep learning method to reconstruct pressure fields and classify the flux out of the storage formation based on the pressure data from Above Zone Monitoring Interval (AZMI) wells. The deep learning method is a version of a semi-conditional variational auto-encoder tailored to solve two tasks: reconstruction of an incremental pressure field and leakage rate classification. The method, predictions and associated uncertainty estimates are illustrated on synthetic data from a high-fidelity heterogeneous 2D numerical reservoir model, which was used to simulate subsurface CO$_2$ movement and pressure changes in the AZMI due to a CO$_2$ leakage.

The Poisson-Boltzmann equation is a widely used model to study electrostatics in molecular solvation. Its numerical solution using a boundary integral formulation requires a mesh on the molecular surface only, yielding accurate representations of the solute, which usually has a complicated geometry. Here, we utilize adjoint-based analyses to form two goal-oriented error estimates that allow us to determine the contribution of each discretization element (panel) to the numerical error in the solvation free energy. This information is useful to identify high-error panels and then refine them adaptively to find optimal surface meshes. We present results for spheres and real molecular geometries, and see that elements with large error tend to be in regions where there is a high electrostatic potential. We also find that even though the two estimates predict different total errors, they have similar performance as part of an adaptive mesh refinement scheme. Our test cases suggest that the adaptive mesh refinement scheme is very effective, as we are able to reduce the error by one order of magnitude while increasing the mesh size by less than 20\%. This result sets the basis for efficient automatic mesh refinement schemes that produce optimal meshes for solvation energy calculations.

In analogy to compressed sensing, which allows sample-efficient signal reconstruction given prior knowledge of its sparsity in the frequency domain, we propose to utilize policy simplicity (Occam's Razor) as a prior to enable sample-efficient imitation learning. We first demonstrate the feasibility of this scheme in the linear case, where the state-value function can be sampled directly. We then extend the scheme to scenarios where only actions are visible and to scenarios where the policy is obtained from a nonlinear network. The method is benchmarked against behavior cloning and results in significantly higher scores with limited expert demonstrations.
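As a minimal illustration of the linear setting mentioned above, behavior cloning of a hypothetical linear expert policy reduces to a least-squares fit of actions on states (the expert matrix `K_true` below is an invented example, not from the paper).

```python
import numpy as np

# Hypothetical expert with a linear policy a = K s; behavior cloning
# recovers K from (state, action) demonstrations by least squares.
rng = np.random.default_rng(1)
K_true = np.array([[0.5, -1.0],
                   [2.0,  0.3]])
S = rng.normal(size=(50, 2))   # demonstrated states
A = S @ K_true.T               # demonstrated actions (noise-free)

# Solve min ||S X - A||_F; X is K^T, so transpose to get K back.
K_hat, *_ = np.linalg.lstsq(S, A, rcond=None)
K_hat = K_hat.T
print(np.allclose(K_hat, K_true))  # True
```

A simplicity prior in this setting would amount to preferring sparse or low-complexity `K_hat` when demonstrations are too few for the plain fit to be well determined.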

Artificial intelligence (AI) provides many opportunities to improve private and public life. Discovering patterns and structures in large troves of data in an automated manner is a core component of data science, and currently drives applications in diverse areas such as computational biology, law and finance. However, such a highly positive impact is coupled with significant challenges: how do we understand the decisions suggested by these systems in order that we can trust them? In this report, we focus specifically on data-driven methods -- machine learning (ML) and pattern recognition models in particular -- so as to survey and distill the results and observations from the literature. The purpose of this report can be especially appreciated by noting that ML models are increasingly deployed in a wide range of businesses. However, with the increasing prevalence and complexity of methods, business stakeholders have, at the very least, a growing number of concerns about the drawbacks of models, data-specific biases, and so on. Analogously, data science practitioners are often not aware of approaches emerging from the academic literature, or may struggle to appreciate the differences between different methods, and so end up using industry standards such as SHAP. Here, we have undertaken a survey to help industry practitioners (but also data scientists more broadly) understand the field of explainable machine learning better and apply the right tools. Our latter sections build a narrative around a putative data scientist, and discuss how she might go about explaining her models by asking the right questions.

The finite element method, finite difference method, finite volume method and spectral method have achieved great success in solving partial differential equations. However, the high accuracy of traditional numerical methods comes at the cost of efficiency. Especially for high-dimensional problems, traditional numerical methods are often infeasible, due to the subdivision of high-dimensional meshes and the differentiability and integrability of high-order terms. In deep learning, a neural network can deal with high-dimensional problems by adding layers or expanding the number of neurons, which gives it great advantages over traditional numerical methods. In this article, we consider the Deep Galerkin Method (DGM) for solving the general Stokes equations with a deep neural network, without generating a mesh grid. The DGM can reduce the computational complexity and achieve competitive results. Here, we construct the objective function based on the L2 error to control the quality of the approximate solution. Then, we prove the convergence of the objective function and the convergence of the neural network to the exact solution. Finally, the effectiveness of the proposed framework is demonstrated through numerical experiments.

Multivariate time series analysis is an important problem in data mining because of its widespread applications. With the increase of time series data available for training, implementing deep neural networks in the field of time series analysis is becoming common. Res2Net, a recently proposed backbone, can further improve state-of-the-art networks, as it improves the multi-scale representation ability by connecting different groups of filters. However, Res2Net ignores the correlations of the feature maps and lacks control over the information interaction process. To address this problem, we propose a backbone convolutional neural network based on the idea of a gated mechanism and Res2Net, namely Gated Res2Net (GRes2Net), for multivariate time series analysis. The hierarchical residual-like connections are modulated by gates whose values are calculated from the original feature maps, the previous output feature maps and the next input feature maps, thus considering the correlations between the feature maps more effectively. Through the gated mechanism, the network can control the flow of information and hence better capture and utilize both the temporal information and the correlations between the feature maps. We evaluate GRes2Net on four multivariate time series datasets, including two classification datasets and two forecasting datasets. The results demonstrate that GRes2Net performs better than the state-of-the-art methods, indicating its superiority.

In this study, we produce a geometrically scaled perceptual timbre space from dissimilarity ratings of subtractive synthesized sounds and correlate the resulting dimensions with a set of acoustic descriptors. We curate a set of 15 sounds, produced by a synthesis model that uses varying source waveforms, frequency modulation (FM) and a lowpass filter with an enveloped cutoff frequency. Pairwise dissimilarity ratings were collected within an online browser-based experiment. We hypothesized that a varied waveform input source and enveloped filter would act as the main vehicles for timbral variation, providing novel acoustic correlates for the perception of synthesized timbres.
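Pairwise dissimilarity ratings are typically turned into a spatial map with multidimensional scaling; below is the textbook classical (Torgerson) MDS construction, which is not necessarily the exact scaling procedure used in the study.

```python
import numpy as np

def classical_mds(D, dims=2):
    """Classical (Torgerson) MDS: embed points from a matrix of
    pairwise dissimilarities D via double-centering followed by an
    eigendecomposition of the resulting Gram matrix."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (D ** 2) @ J            # double-centered Gram matrix
    w, V = np.linalg.eigh(B)
    idx = np.argsort(w)[::-1][:dims]       # keep the largest eigenvalues
    return V[:, idx] * np.sqrt(np.maximum(w[idx], 0.0))

# Dissimilarities of three collinear "sounds" at positions 0, 1, 2:
D = np.array([[0., 1., 2.],
              [1., 0., 1.],
              [2., 1., 0.]])
X = classical_mds(D, dims=1)
print(np.round(np.abs(X[:, 0]), 3))  # [1. 0. 1.] (positions up to sign/shift)
```

Once the space is recovered, each dimension can be correlated with acoustic descriptors, as the abstract describes.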

This work presents a mathematical treatment of the relation between Self-Organizing Maps (SOMs) and Gaussian Mixture Models (GMMs). We show that energy-based SOM models can be interpreted as performing gradient descent, minimizing an approximation to the GMM log-likelihood that is particularly valid for high data dimensionalities. The SOM-like decrease of the neighborhood radius can be understood as an annealing procedure ensuring that gradient descent does not get stuck in undesirable local minima. This link allows us to treat SOMs as generative probabilistic models, giving a formal justification for using SOMs, e.g., to detect outliers or for sampling.
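The annealed neighborhood radius discussed above appears in even the most minimal SOM training loop; the exponential annealing schedule below is an illustrative choice, not the paper's analysis.

```python
import numpy as np

def train_som(data, n_units=10, epochs=200, lr=0.1,
              sigma0=3.0, sigma_end=0.5, seed=0):
    """Minimal 1-D SOM: each update pulls units toward the sample,
    weighted by a Gaussian neighborhood whose radius is annealed
    from sigma0 down to sigma_end over training."""
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(n_units, data.shape[1]))
    grid = np.arange(n_units)
    for t in range(epochs):
        # Exponentially decaying neighborhood radius (the "annealing").
        sigma = sigma0 * (sigma_end / sigma0) ** (t / (epochs - 1))
        for x in data[rng.permutation(len(data))]:
            bmu = np.argmin(((W - x) ** 2).sum(1))   # best-matching unit
            h = np.exp(-((grid - bmu) ** 2) / (2 * sigma ** 2))
            W += lr * h[:, None] * (x - W)
    return W

data = np.random.default_rng(1).uniform(-1, 1, size=(100, 2))
W = train_som(data)
print(W.shape)  # (10, 2)
```

In the paper's reading, the early wide-radius phase corresponds to a smoothed objective, so the shrinking radius plays the role of an annealing schedule for the GMM-like log-likelihood.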

High-dimensional streaming data are becoming increasingly ubiquitous in many fields. They often lie in multiple low-dimensional subspaces, and the manifold structures may change abruptly over time due to pattern shift or the occurrence of anomalies. However, the problem of detecting such structural changes in real time has not been well studied. To fill this gap, we propose a dynamic sparse subspace learning (DSSL) approach for online structural change-point detection of high-dimensional streaming data. A novel multiple structural change-point model is proposed and shown to be equivalent to maximizing the posterior under certain conditions. The asymptotic properties of the estimators are investigated. The penalty coefficients in our model can be selected by the AMDL criterion based on historical data. An efficient Pruned Exact Linear Time (PELT) based method is proposed for online optimization and change-point detection. The effectiveness of the proposed method is demonstrated through a simulation study and a real case study using gesture data for motion tracking.
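PELT builds on the exact penalized dynamic-programming recursion for change-point detection; the unpruned version of that recursion, with a simple quadratic mean-shift cost standing in for the paper's subspace model, can be sketched as follows.

```python
import numpy as np

def seg_cost(y, s, t):
    """Quadratic cost of fitting y[s:t] with its segment mean."""
    seg = y[s:t]
    return ((seg - seg.mean()) ** 2).sum()

def optimal_partitioning(y, beta):
    """Exact penalized change-point detection by dynamic programming.
    F[t] = min over s of F[s] + cost(s, t) + beta; PELT adds pruning
    of dominated candidates s to reach linear expected time."""
    n = len(y)
    F = np.full(n + 1, np.inf)
    F[0] = -beta
    last = np.zeros(n + 1, dtype=int)
    for t in range(1, n + 1):
        costs = [F[s] + seg_cost(y, s, t) + beta for s in range(t)]
        last[t] = int(np.argmin(costs))
        F[t] = costs[last[t]]
    cps, t = [], n
    while t > 0:                 # backtrack the optimal segmentation
        t = last[t]
        if t > 0:
            cps.append(t)
    return sorted(cps)

y = np.concatenate([np.zeros(20), 5 * np.ones(20)])
print(optimal_partitioning(y, beta=1.0))  # [20]
```

The DSSL method replaces this mean-shift cost with a sparse-subspace fitting cost, but the penalized recursion being optimized online is of the same shape.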

In this paper we present a methodology that uses convolutional neural networks (CNNs) for segmentation by iteratively growing predicted mask regions in each coordinate direction. The CNN is used to predict class probability scores in a small neighborhood of the center pixel in a tile of an image. We use a threshold on the CNN probability scores to determine whether pixels are added to the region and the iteration continues until no new pixels are added to the region. Our method is able to achieve high segmentation accuracy and preserve biologically realistic morphological features while leveraging small amounts of training data and maintaining computational efficiency. Using retinal blood vessel images from the DRIVE database we found that our method is more accurate than a fully convolutional semantic segmentation CNN for several evaluation metrics.
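The iterative growing rule described above, adding thresholded neighbors until no pixel is added, can be sketched with a precomputed probability map standing in for the CNN's per-pixel scores.

```python
import numpy as np

def grow_region(prob, seeds, thresh=0.5):
    """Grow a binary mask from seed pixels: a 4-neighbor of the current
    region joins it when its probability score exceeds the threshold;
    iteration stops when no new pixel is added. (In the paper, `prob`
    comes from a CNN evaluated on image tiles; here it is just given.)"""
    mask = np.zeros_like(prob, dtype=bool)
    frontier = list(seeds)
    for r, c in frontier:
        mask[r, c] = True
    while frontier:
        r, c = frontier.pop()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if (0 <= nr < prob.shape[0] and 0 <= nc < prob.shape[1]
                    and not mask[nr, nc] and prob[nr, nc] > thresh):
                mask[nr, nc] = True
                frontier.append((nr, nc))
    return mask

prob = np.array([[0.9, 0.8, 0.1],
                 [0.2, 0.7, 0.1],
                 [0.1, 0.9, 0.9]])
print(grow_region(prob, seeds=[(0, 0)]).astype(int))
```

Because growth only follows connected high-score pixels, thin connected structures such as vessels are traced while isolated high scores away from the seed are ignored.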

Deep neural networks have achieved great success in both computer vision and natural language processing tasks. However, most state-of-the-art methods rely heavily on external training or computation to improve performance. To alleviate this external reliance, we propose a gradient enhancement approach, realized through short-circuit neural connections, to improve the gradient learning of deep neural networks. The proposed short circuit is a unidirectional connection that back-propagates the sensitivity from deep layers to shallow ones. Moreover, the short circuit acts as a gradient truncation of the layers it crosses, and can be plugged into backbone deep neural networks without introducing external training parameters. Extensive experiments demonstrate that deep neural networks with our short circuit gain a large margin over the baselines on both computer vision and natural language processing tasks.

Self-driving cars and autonomous vehicles are revolutionizing the automotive sector, shaping the future of mobility altogether. Although the integration of novel technologies such as Artificial Intelligence (AI) and Cloud/Edge computing provides golden opportunities to improve autonomous driving applications, the whole prototyping and deployment cycle of AI components needs to be modernized accordingly. This paper proposes a novel framework for developing so-called AI Inference Engines for autonomous driving applications based on deep learning modules, where training tasks are deployed elastically over both Cloud and Edge resources, with the purpose of reducing the required network bandwidth, as well as mitigating privacy issues. Based on our proposed data-driven V-Model, we introduce a simple yet elegant solution for the AI components development cycle, where prototyping takes place in the cloud according to the Software-in-the-Loop (SiL) paradigm, while deployment and evaluation on the target ECUs (Electronic Control Units) is performed as Hardware-in-the-Loop (HiL) testing. The effectiveness of the proposed framework is demonstrated using two real-world use-cases of AI inference engines for autonomous vehicles, that is environment perception and most probable path prediction.

Although indoor localization is a widely researched topic, the results obtained may not meet the requirements of some domains. Most approaches cannot precisely localize a fast-moving object even with a complex installation, which complicates their use in the automated driving domain. In this publication, common technologies are analyzed, and a commercial product, Marvelmind Indoor GPS, was chosen for our use case, in which both ultrasound and radio frequency communications are used. The evaluation is given first on small indoor scenarios with static and moving objects. Further tests were carried out over wider areas, where the system is integrated within our Robot Operating System (ROS)-based self-developed 'Smart PhysIcal Demonstration and evaluation Robot (SPIDER)', and the results of these outdoor tests are compared with the localization obtained by the GPS installed on the robot. Finally, the next steps for improving the results in further developments are discussed.

Upon starting a collective endeavour, it is important to understand your partners' preferences and how strongly they commit to a common goal. Establishing a prior commitment or agreement in terms of posterior benefits and consequences from those engaging in it provides an important mechanism for securing cooperation in both pairwise and multiparty cooperation dilemmas. Resorting to methods from Evolutionary Game Theory (EGT), here we analyse how prior commitments can also be adopted as a tool for enhancing coordination when its outcomes exhibit an asymmetric payoff structure, in both pairwise and multiparty interactions. Arguably, coordination is more complex to achieve than cooperation since there might be several desirable collective outcomes in a coordination problem (compared to mutual cooperation, the only desirable collective outcome in cooperation dilemmas), especially when these outcomes entail asymmetric benefits for those involved. Our analysis, both analytically and via numerical simulations, shows that whether prior commitment would be a viable evolutionary mechanism for enhancing coordination and the overall population social welfare strongly depends on the collective benefit and severity of competition, and more importantly, how asymmetric benefits are resolved in a commitment deal. Moreover, in multiparty interactions, prior commitments prove to be crucial when a high level of group diversity is required for optimal coordination. Our results are shown to be robust for different selection intensities. We frame our model within the context of the technology adoption decision making, but the obtained results are applicable to other coordination problems.

This paper aims to understand and improve the utility of the dropout operation from the perspective of game-theoretic interactions. We prove that dropout can suppress the strength of interactions between the input variables of deep neural networks (DNNs). The theoretical proof is also verified by various experiments. Furthermore, we find that such interactions are strongly related to the over-fitting problem in deep learning. Thus, the utility of dropout can be regarded as decreasing interactions and thereby alleviating over-fitting. Based on this understanding, we propose an interaction loss to further improve the utility of dropout. Experimental results show that the interaction loss can effectively improve the utility of dropout and boost the performance of DNNs.
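
As a toy illustration of what a pairwise interaction between input variables means, the sketch below uses a simplified baseline-masking definition (the paper's formal measure is game-theoretic and more involved; the function names and baseline choice here are assumptions for the example): the interaction of inputs i and j is how much their joint presence changes the output beyond their separate contributions.

```python
import numpy as np

def interaction(f, x, i, j, baseline=0.0):
    """Toy pairwise interaction of inputs i and j for a model f:
    T(i,j) = f(both present) - f(i only) - f(j only) + f(neither),
    with all other inputs kept present and absent inputs replaced
    by a baseline value. (A simplified stand-in, not the paper's measure.)"""
    def mask(keep):
        z = np.full_like(x, baseline)
        z[keep] = x[keep]
        return z
    others = [k for k in range(len(x)) if k not in (i, j)]
    return (f(mask(others + [i, j])) - f(mask(others + [i]))
            - f(mask(others + [j])) + f(mask(others)))

x = np.array([2.0, 3.0, 5.0])
f = lambda v: v[0] * v[1] + v[2]   # multiplicative term -> x0 and x1 interact
g = lambda v: v[0] + v[1] + v[2]   # purely additive -> no interaction
print(interaction(f, x, 0, 1))     # 6.0: strength of the x0-x1 interaction
print(interaction(g, x, 0, 1))     # 0.0: additive models have none
```

Dropout, by randomly removing variables, limits how often such joint terms are seen intact during training, which is the intuition behind its interaction-suppressing effect.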

Deep learning approaches to anomaly detection have recently improved the state of the art in detection performance on complex datasets such as large collections of images or text. These results have sparked a renewed interest in the anomaly detection problem and led to the introduction of a great variety of new methods. With the emergence of numerous such methods, including approaches based on generative models, one-class classification, and reconstruction, there is a growing need to bring methods of this field into a systematic and unified perspective. In this review we aim to identify the common underlying principles as well as the assumptions that are often made implicitly by various methods. In particular, we draw connections between classic 'shallow' and novel deep approaches and show how this relation might cross-fertilize or extend both directions. We further provide an empirical assessment of major existing methods that is enriched by the use of recent explainability techniques, and present specific worked-through examples together with practical advice. Finally, we outline critical open challenges and identify specific paths for future research in anomaly detection.

Graph sparsification aims to reduce the number of edges of a network while maintaining its accuracy for given tasks. In this study, we propose a novel method called GSGAN, which is able to sparsify networks for community detection tasks. GSGAN captures relationships that are not shown in the original graph but are relatively important, creating artificial edges to reflect these relationships and further increase the effectiveness of community detection. We adopt a GAN as the learning model and guide the generator to produce random walks that capture the structure of a network. Specifically, during the training phase, in addition to judging the authenticity of a random walk, the discriminator also considers the relationships between nodes. We design a reward function to guide the generator to create random walks that contain useful hidden relation information. These random walks are then combined to form a new social network that is efficient and effective for community detection. Experiments with real-world networks demonstrate that the proposed GSGAN is much more effective than the baselines, and that GSGAN can be applied to, and is helpful for, various clustering algorithms for community detection.
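
For intuition, here is a sketch of the kind of random-walk corpus such a model operates on. This is plain uniform sampling over an invented toy graph, not GSGAN's learned generator; in the method described above, a generator would be trained to produce walks like these, with the discriminator and reward function steering it toward structure-preserving ones.

```python
import random

def random_walks(adj, walk_len=8, walks_per_node=2, seed=0):
    """Sample uniform random walks over an adjacency-list graph.
    Each walk is a list of node ids following existing edges."""
    rng = random.Random(seed)
    walks = []
    for start in adj:
        for _ in range(walks_per_node):
            walk, node = [start], start
            for _ in range(walk_len - 1):
                if not adj[node]:     # dead end: stop early
                    break
                node = rng.choice(adj[node])
                walk.append(node)
            walks.append(walk)
    return walks

# Toy graph: two triangles (0-1-2 and 3-4-5) joined by a bridge edge 2-3.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3],
       3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
for w in random_walks(adj)[:3]:
    print(w)
```

Walks started inside one triangle tend to stay there, which is exactly the community signal a walk-based sparsifier can exploit when reassembling the sampled walks into a smaller graph.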

Today's sensor network implementations often comprise various types of nodes connected through different types of networks. These and various other aspects influence the delay of transmitting data and therefore the occurrence of out-of-order data. This becomes a crucial problem in time-sensitive applications, where data must be processed promptly and decisions must be reliable.

In this paper, we investigate dynamic buffer sizing algorithms for multiple, distributed and independent sources, which reorder event streams and thus enable subsequent time-sensitive applications to work correctly. To be able to evaluate such algorithms, we first had to record suitable datasets. Five novel dynamic buffer sizing algorithms were implemented and compared to state-of-the-art approaches in this domain. The evaluation shows that the use of a dynamic time-out buffering method is preferable over a static buffer. The higher the variation of the network or of other influences in the environment, the more necessary it becomes to use an algorithm that dynamically adapts its buffer size. These algorithms are universally applicable, easy to integrate into existing architectures, and particularly interesting for time-sensitive applications. Dynamic time-out buffering remains a trade-off between reaction time and out-of-order event compensation.
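
A minimal sketch of the underlying idea, assuming a *static* time-out policy (the algorithms studied above adapt the time-out at runtime; the class name and API below are invented for illustration): each event is held for a fixed time window, after which buffered events are released in timestamp order.

```python
import heapq

class TimeoutReorderBuffer:
    """Hold each event for `timeout` time units, then release events
    in event-timestamp order. A dynamic variant would adjust `timeout`
    based on observed network delay variation."""
    def __init__(self, timeout):
        self.timeout = timeout
        self.heap = []  # min-heap of (event_timestamp, payload)

    def push(self, arrival_time, event_time, payload):
        heapq.heappush(self.heap, (event_time, payload))
        return self.flush(arrival_time)

    def flush(self, now):
        """Release every event whose timestamp is older than now - timeout."""
        out = []
        while self.heap and self.heap[0][0] <= now - self.timeout:
            out.append(heapq.heappop(self.heap)[1])
        return out

buf = TimeoutReorderBuffer(timeout=2.0)
out = []
# Events arrive out of order: 'c' (timestamp 3) arrives before 'b' (timestamp 2).
for arrival, ts, name in [(1.0, 1, "a"), (1.5, 3, "c"), (1.6, 2, "b")]:
    out += buf.push(arrival, ts, name)
out += buf.flush(6.0)
print(out)  # ['a', 'b', 'c'] despite 'c' arriving before 'b'
```

The trade-off stated in the abstract is visible here: a larger `timeout` compensates more reordering but delays every release, while a smaller one reacts faster at the risk of emitting events out of order.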

This work aims to advance computational methods for projection-based reduced order models (ROMs) of linear time-invariant (LTI) dynamical systems. For such systems, current practice relies on ROM formulations expressing the state as a rank-1 tensor (i.e., a vector), leading to computational kernels that are memory-bandwidth bound and, therefore, ill-suited for scalable performance on modern many-core and hybrid computing nodes. This weakness can be particularly limiting when tackling many-query studies, where one needs to run a large number of simulations. This work introduces a reformulation, called rank-2 Galerkin, of the Galerkin ROM for LTI dynamical systems, which converts the nature of the ROM problem from memory-bandwidth bound to compute bound. We present the details of the formulation and its implementation, and demonstrate its utility through numerical experiments using, as a test case, the simulation of elastic seismic shear waves in an axisymmetric domain. We quantify and analyze performance and scaling results for varying numbers of threads and problem sizes. Finally, we present an end-to-end demonstration of using the rank-2 Galerkin ROM for a Monte Carlo sampling study. We show that the rank-2 Galerkin ROM is one order of magnitude more efficient than the rank-1 Galerkin ROM (the current practice) and about 970X more efficient than the full order model, while maintaining excellent accuracy in both the mean and the statistics of the field.
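
The memory-bound versus compute-bound distinction can be illustrated with a toy sketch (not the paper's solver; the operator, sizes, and forward-Euler update below are illustrative assumptions): advancing s samples one at a time relies on matrix-vector kernels, while the rank-2 view stores the states of all samples as the columns of one matrix and advances the whole block with a single matrix-matrix product, a kernel with much higher arithmetic intensity.

```python
import numpy as np

rng = np.random.default_rng(0)
n, s, steps = 64, 32, 10                # ROM size, number of samples, time steps
A = rng.standard_normal((n, n)) / n     # reduced operator for x' = A x
X0 = rng.standard_normal((n, s))        # s initial conditions (e.g. MC samples)
dt = 0.01

# Rank-1 style: each sample advanced separately via matrix-vector products.
X1 = X0.copy()
for _ in range(steps):
    for j in range(s):
        X1[:, j] = X1[:, j] + dt * (A @ X1[:, j])

# Rank-2 style: all samples advanced at once via one matrix-matrix product.
X2 = X0.copy()
for _ in range(steps):
    X2 = X2 + dt * (A @ X2)

print(np.allclose(X1, X2))  # same trajectories, very different kernels
```

Both loops compute the same states; the difference is that the second reuses the operator A across all s columns per step, which is what shifts the workload from memory bandwidth toward compute on many-core hardware.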

The suicide of Indian actor Sushant Singh Rajput in the midst of the COVID-19 lockdown triggered a media frenzy of prime time coverage that lasted several months and became a political hot button issue. Using data from Twitter, YouTube, and an archive of debunked misinformation stories, we found two important patterns. First, that retweet rates on Twitter clearly suggest that commentators benefited from talking about the case, which got higher engagement than other topics. Second, that politicians, in particular, were instrumental in changing the course of the discourse by referring to the case as 'murder', rather than 'suicide'. In conclusion, we consider the effects of Rajput's outsider status as a small-town implant in the film industry within the broader narrative of systemic injustice, as well as the gendered aspects of mob justice that have taken aim at his former partner in the months since.

Bank transaction fraud results in over $13B in annual losses for banks, merchants, and card holders worldwide. Much of this fraud starts with a Point-of-Compromise (a data breach or a skimming operation) where credit and debit card digital information is stolen, resold, and later used to perform fraud. We introduce this problem and present an automatic Point-of-Compromise (POC) detection procedure. BreachRadar is a distributed alternating algorithm that assigns a probability of being compromised to the different possible locations. We implement this method using Apache Spark and show its linear scalability in the number of machines and transactions. BreachRadar is applied to two datasets with billions of real transaction records and fraud labels, where we provide multiple examples of real Points-of-Compromise we are able to detect. We further show the effectiveness of our method when injecting Points-of-Compromise into one of these datasets, simultaneously achieving over 90% precision and recall when only 10% of the cards have been victims of fraud.

Engineering projects are notoriously hard to complete on time, with project delays often theorised to propagate across interdependent activities. Here, we use a novel dataset consisting of activity networks from 14 diverse, large-scale engineering projects to uncover network properties that impact timely project completion. We provide the first empirical evidence of the infectious nature of activity deviations, where perturbations in the delivery of a single activity can impact up to 4 activities downstream, leading to large perturbation cascades. We further show that perturbation clustering significantly affects overall project delays. Finally, we find that poorly performing projects have their highest perturbations in high-reach nodes, which can lead to the largest cascades, while well-performing projects have perturbations in low-reach nodes, resulting in localised cascades. Altogether, these findings pave the way for a network-science framework that can materially enhance the delivery of large-scale engineering projects.

Commonsense explanation generation aims to empower a machine's sense-making capability by generating plausible explanations for statements that go against commonsense. While this task is easy for humans, machines still struggle to generate reasonable and informative explanations. In this work, we propose a method that first extracts the underlying concepts which serve as \textit{bridges} in the reasoning chain, and then integrates these concepts to generate the final explanation. To facilitate the reasoning process, we utilize external commonsense knowledge to build the connection between a statement and the bridge concepts, extracting and pruning multi-hop paths to form a subgraph. We design a bridge concept extraction model that first scores the triples, routes the paths in the subgraph, and further selects bridge concepts with weak supervision at both the triple level and the concept level. We conduct experiments on the commonsense explanation generation task, and our model outperforms the state-of-the-art baselines in both automatic and human evaluation.

Conflict-avoiding codes (CACs) were introduced by Levenshtein as a single-channel transmission scheme for a multiple-access collision channel without feedback. When the number of simultaneously active source nodes is at most the weight of a CAC, it provides a hard guarantee that each active source node transmits at least one packet successfully within a fixed time duration, no matter what the relative time offsets between the source nodes are. In this paper, we extend CACs to multichannel CACs, providing such a hard guarantee over multiple orthogonal channels. Upper bounds on the number of codewords for multichannel CACs of weights three and four are derived, and constructions that are optimal with respect to these bounds are presented.
We study the expressive power of successor-invariant first-order logic, an extension of first-order logic in which the use of an additional successor relation on the structure is allowed, as long as the validity of formulas is independent of the choice of a particular successor. We show that when the degree is bounded, successor-invariant first-order logic is no more expressive than first-order logic.

In the classical multi-party computation setting, multiple parties jointly compute a function without revealing their own input data. We consider a variant of this problem, where the input data can be shared for machine learning training purposes, but the data are also encrypted so that they cannot be recovered by other parties. We present a rotation-based method using a flow model, and theoretically justify its security. We demonstrate the effectiveness of our method in different scenarios, including supervised secure model training and unsupervised generative model training. Our code is available at https://github.com/duchenzhuang/flowencrypt.

This paper explores a new research problem of unsupervised transfer learning across multiple spatiotemporal prediction tasks. Unlike most existing transfer learning methods that focus on fixing the discrepancy between supervised tasks, we study how to transfer knowledge from a zoo of unsupervisedly learned models towards another predictive network. Our motivation is that models from different sources are expected to understand the complex spatiotemporal dynamics from different perspectives, thereby effectively supplementing the new task, even if the task has sufficient training samples. Technically, we propose a differentiable framework named transferable memory. It adaptively distills knowledge from a bank of memory states of multiple pretrained RNNs, and applies it to the target network via a novel recurrent structure called the Transferable Memory Unit (TMU).
Compared with finetuning, our approach yields significant improvements on three benchmarks for spatiotemporal prediction, and benefits the target task even from less relevant pretext ones.

Wikipedia is edited by volunteer editors around the world. Considering the large amount of existing content (e.g. over 5M articles in English Wikipedia), deciding what to edit next can be difficult, both for experienced users who usually have a huge backlog of articles to prioritize, and for newcomers who might need guidance in selecting the next article to contribute to. Therefore, helping editors to find relevant articles should improve their performance and help in the retention of new editors. In this paper, we address the problem of recommending relevant articles to editors. To do this, we develop a scalable system on top of Graph Convolutional Networks and Doc2Vec, learning how to represent Wikipedia articles and deliver personalized recommendations for editors. We test our model on editors' histories, predicting their most recent edits based on their prior edits. We outperform competitive implicit-feedback collaborative-filtering methods such as WMRF based on ALS, as well as a traditional IR method such as content-based filtering based on BM25. All of the data used in this paper is publicly available, including graph embeddings for Wikipedia articles, and we release our code to support replication of our experiments. Moreover, we contribute a scalable implementation of a state-of-the-art graph embedding algorithm, as current ones cannot efficiently handle the sheer size of the Wikipedia graph.

The Bluetooth standard specifies two incompatible wireless transports: Bluetooth Classic (BT) for high-throughput services and Bluetooth Low Energy (BLE) for very low-power services. BT and BLE have different security architectures and threat models, but they use similar security mechanisms.
In particular, pairing enables two devices to establish a long-term key to secure the communication. Two devices have to pair over both BT and BLE to use the two transports securely. Since pairing the same devices twice is considered user-unfriendly, Bluetooth v4.2 introduced Cross-Transport Key Derivation (CTKD). CTKD allows two devices to pair once, either over BT or BLE, and generate both BT and BLE long-term keys. Although CTKD allows traversal of the security boundary between BT and BLE, the security implications of CTKD have not yet been investigated. We present the first security analysis of CTKD and identify five cross-transport issues for BT and BLE. These issues enable, for the first time, exploitation of both BT and BLE by attacking either transport. Based on the identified issues, we demonstrate four novel cross-transport attacks resulting in device impersonation, traffic manipulation, and malicious session establishment. We refer to them as BLUR attacks, as they blur the security boundary between BT and BLE. The BLUR attacks are standard-compliant and therefore apply to all devices supporting CTKD, regardless of implementation details. We successfully demonstrate the BLUR attacks on 13 devices with 10 unique Bluetooth chips, and discuss effective countermeasures. We disclosed our findings and countermeasures to the Bluetooth SIG in May 2020.

We present a polynomial-space Monte Carlo algorithm that, given a directed graph on $n$ vertices with average outdegree $\delta$, detects whether the graph has a Hamiltonian cycle in $2^{n-\Omega(\frac{n}{\delta})}$ time. This asymptotic scaling of the savings in the running time matches that of the fastest known exponential-space algorithm by Bj\"orklund and Williams (ICALP 2019). By comparison, the previously best polynomial-space algorithm by Kowalik and Majewski (IPEC 2020) guarantees a $2^{n-\Omega(\frac{n}{2^\delta})}$ time bound.
Our algorithm combines, for the first time, the idea of obtaining a fingerprint of the presence of a Hamiltonian cycle through an inclusion--exclusion summation over the Laplacian of the graph, from Bj\"orklund, Kaski, and Koutis (ICALP 2017), with the idea of sieving for the non-zero terms in an inclusion--exclusion summation by listing solutions to systems of linear equations over $\mathbb{Z}_2$, from Bj\"orklund and Husfeldt (FOCS 2013).

We present a new method for learning a control law that stabilizes an unknown nonlinear dynamical system at an equilibrium point. We formulate a system identification task in a self-supervised learning setting that jointly learns a controller and a corresponding stable closed-loop dynamics hypothesis. The open-loop input-output behavior of the underlying dynamical system is used as the supervising signal to train the neural-network-based system model and controller. The method relies on Lyapunov stability theory to generate a stable closed-loop dynamics hypothesis and corresponding control law. We demonstrate our method on various nonlinear control problems such as n-link pendulum balancing, pendulum-on-cart balancing, and wheeled vehicle path following.

In a partitioned Bloom filter, the $m$-bit vector is split into $k$ disjoint parts of size $m/k$, one per hash function. Contrary to hardware designs, where they prevail, software implementations mostly adopt standard Bloom filters, considering partitioned filters slightly worse due to their slightly larger false positive rate (FPR). In this paper, by performing an in-depth analysis, we first show that the FPR advantage of standard Bloom filters is smaller than previously thought; more importantly, by studying the per-element FPR, we show that standard Bloom filters have weak spots in the domain: elements which will be tested as false positives much more frequently than expected. This is relevant in scenarios where an element is tested against many filters, e.g., in packet forwarding.
Moreover, standard Bloom filters are prone to exhibit extremely weak spots if naive double hashing is used, something that occurs in several, even mainstream, libraries. Partitioned Bloom filters exhibit a uniform distribution of the FPR over the domain and are robust to the naive use of double hashing, having no weak spots. Finally, by surveying several usages other than testing set membership, we point out the many advantages of having disjoint parts: they can be individually sampled, extracted, added or retired, leading to superior designs for, e.g., SIMD usage, size reduction, tests of set disjointness, or duplicate detection in streams. Partitioned Bloom filters are better, and should replace the standard form, both in general-purpose libraries and as the base for novel designs.

Despite the influence that image-based communication has on online discourse, the role played by images in disinformation is still not well understood. In this paper, we present the first large-scale study of fauxtography, analyzing the use of manipulated or misleading images in news discussion on online communities. First, we develop a computational pipeline geared to detect fauxtography, and identify over 61k instances of fauxtography discussed on Twitter, 4chan, and Reddit. Then, we study how posting fauxtography affects the engagement of posts on social media, finding that posts containing it receive more interactions in the form of re-shares, likes, and comments. Finally, we show that fauxtography images are often turned into memes by Web communities. Our findings show that effective mitigation against disinformation needs to take images into account, and highlight a number of challenges in dealing with image-based disinformation.

In the \textsc{Maximum Degree Contraction} problem, the input is a graph $G$ on $n$ vertices and integers $k, d$, and the objective is to check whether $G$ can be transformed into a graph of maximum degree at most $d$ using at most $k$ edge contractions.
A simple brute-force algorithm that checks all possible sets of edges for a solution runs in time $n^{\mathcal{O}(k)}$. As our first result, we prove that this algorithm is asymptotically optimal, up to constants in the exponent, under the Exponential Time Hypothesis (ETH). Belmonte, Golovach, van't Hof, and Paulusma studied the problem in the realm of Parameterized Complexity and proved, among other things, that it admits an FPT algorithm running in time $(d + k)^{2k} \cdot n^{\mathcal{O}(1)} = 2^{\mathcal{O}(k \log (k+d))} \cdot n^{\mathcal{O}(1)}$, and remains NP-hard for every constant $d \ge 2$ (Acta Informatica, 2014). We present a different FPT algorithm that runs in time $2^{\mathcal{O}(dk)} \cdot n^{\mathcal{O}(1)}$. In particular, our algorithm runs in time $2^{\mathcal{O}(k)} \cdot n^{\mathcal{O}(1)}$ for every fixed $d$. In the same article, the authors asked whether the problem admits a polynomial kernel when parameterized by $k + d$. We answer this question in the negative and prove that it does not admit a polynomial compression unless NP $\subseteq$ coNP/poly.

Domain adaptation or transfer learning using pre-trained language models such as BERT has proven to be an effective approach for many natural language processing tasks. In this work, we propose to formulate word sense disambiguation as a relevance ranking task, and fine-tune BERT on a sequence-pair ranking task to select the most probable sense definition given a context sentence and a list of candidate sense definitions. We also introduce a data augmentation technique for WSD using existing example sentences from WordNet. Using the proposed training objective and data augmentation technique, our models are able to achieve state-of-the-art results on the English all-words benchmark datasets.

Agriculture is a key component of any country's development. Domain-specific knowledge resources serve to gain insight into the domain.
Existing knowledge resources such as AGROVOC and the NAL Thesaurus are developed and maintained by domain experts. The population of terms into these knowledge resources can be automated by using automatic term extraction tools to process unstructured agricultural text. Automatic term extraction is also a key component in many semantic web applications, such as ontology creation, recommendation systems, sentiment classification, and query expansion, among others. The primary goal of an automatic term extraction system is to maximize the number of valid terms and minimize the number of invalid terms extracted from the input set of documents. Despite its importance in various applications, the availability of online tools for this purpose is rather limited. Moreover, the performance of the most popular ones varies significantly. As a consequence, selecting the right term extraction tool is perceived as a serious problem for different knowledge-based applications. This paper presents an analysis of three commonly used term extraction tools, viz. RAKE, TerMine, and TermRaider, and compares their performance, in terms of precision and recall, with RENT, a more recent term extractor developed by the authors for the agriculture domain.

In this study, we applied reinforcement learning based on the proximal policy optimization algorithm to perform motion planning for an unmanned aerial vehicle (UAV) in an open space with static obstacles. The application of reinforcement learning on a real UAV has several limitations, such as time and cost; thus, we used the Gazebo simulator to train a virtual quadrotor UAV in a virtual environment. As the reinforcement learning progressed, the mean reward and goal rate of the model increased. Furthermore, testing the trained model shows that the UAV reaches the goal with an 81% goal rate using the simple reward function suggested in this work.
In this paper, we propose a bio-inspired receiver that detects the electrons transmitted through a nanowire and then converts the detected information into blue light using bioluminescence. Using light allows the designed receiver to also act as a relay for the nearest gateway (photo-detector). We simulate the construction of the nanowire, present its electrical characteristics, and calculate its maximum throughput for a better design of the receiver. The designed receiver contains two parts: a part that detects the transmitted electrons, which we model using an equivalent circuit, and a part that converts the detected electrons into blue light. We derive the analytical expressions of the equivalent circuit's components, and we calculate the emitted photons for each electrical pulse detected. We also propose modulation techniques that guarantee effective decoding of the information. We send a binary message, follow the electron detection process of the proposed receiver through to light emission, and calculate the Bit Error Rate (BER) to evaluate the performance of the designed receiver. The results of this study show that the designed receiver can accurately detect the electrons sent through a conductive nanowire in wired nano-communication networks, and that it can also act as a relay for the nearest gateway.

The A64FX processor from Fujitsu, designed for computational simulation and machine learning applications, has the potential for unprecedented performance in HPC systems. In this paper, we evaluate the A64FX by benchmarking it against a range of production HPC platforms that cover a number of processor technologies. We investigate the performance of complex scientific applications across multiple nodes, as well as single-node and mini-kernel benchmarks.
This paper finds that the performance of the A64FX processor across our chosen benchmarks often significantly exceeds that of other platforms, even without application optimisations specific to the processor's instruction set or hardware. However, this is not true for all the benchmarks we have undertaken. Furthermore, the specific configuration of applications can have an impact on the runtime and performance experienced.

Obtaining data with meaningful labels is often costly and error-prone. In this situation, semi-supervised learning (SSL) approaches are interesting, as they leverage assumptions about the unlabeled data to make up for the limited amount of labels. However, in real-world situations, we cannot assume that the labeling process is infallible, and the accuracy of many SSL classifiers decreases significantly in the presence of label noise. In this work, we introduce LGC_LVOF, a leave-one-out filtering approach based on the Local and Global Consistency (LGC) algorithm. Our method aims to detect and remove wrong labels, and thus can be used as a preprocessing step for any SSL classifier. Given the propagation matrix, detecting noisy labels takes O(cl) per step, with c the number of classes and l the number of labels. Moreover, one does not need to compute the whole propagation matrix, but only an $l$ by $l$ submatrix corresponding to interactions between labeled instances. As a result, our approach is best suited to datasets with a large amount of unlabeled data but not many labels. Results are provided for a number of datasets, including MNIST and ISOLET. LGC_LVOF appears to be equally or more precise than the adapted gradient-based filter. We show that the best-case accuracy of embedding LGC_LVOF into LGC yields performance comparable to the best case of $\ell_1$-based classifiers designed to be robust to label noise. We provide a heuristic to choose the number of removed instances.
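
As a toy illustration of label-propagation-based noise filtering in this spirit (not the authors' exact criterion or its O(cl) implementation; the graph, labels, and flagging rule below are invented for the example): propagate labels with LGC while hiding one labeled point at a time, and flag that point if the remaining labels predict a different class for it.

```python
import numpy as np

def lgc_propagate(W, Y, alpha=0.9):
    """Local and Global Consistency: F = (I - alpha*S)^{-1} Y,
    with S the symmetrically normalised affinity matrix."""
    d = W.sum(axis=1)
    S = W / np.sqrt(np.outer(d, d))
    return np.linalg.solve(np.eye(len(W)) - alpha * S, Y)

def leave_one_out_flags(W, Y, alpha=0.9):
    """Flag a labeled point as noisy if propagation from the *other*
    labels predicts a different class for it (a toy LOO criterion)."""
    flags = []
    for i in np.flatnonzero(Y.any(axis=1)):
        Yi = Y.copy()
        Yi[i] = 0.0                       # hide label i
        F = lgc_propagate(W, Yi, alpha)
        flags.append(bool(F[i].argmax() != Y[i].argmax()))
    return flags

# Two 4-cliques; node 3 sits in the class-0 clique but is labeled class 1.
W = np.zeros((8, 8))
W[:4, :4] = 1.0
W[4:, 4:] = 1.0
np.fill_diagonal(W, 0.0)
Y = np.zeros((8, 2))
Y[[0, 1, 2], 0] = 1.0   # correct class-0 labels
Y[3, 1] = 1.0           # deliberately wrong label
Y[[4, 5], 1] = 1.0      # correct class-1 labels
print(leave_one_out_flags(W, Y))  # only the mislabeled node 3 is flagged
```

Because each leave-one-out pass only needs the propagated values at labeled points, restricting the computation to interactions among labeled instances (as the abstract notes) keeps the cost low when labels are scarce.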
The goal of zero-shot learning (ZSL) is to train a model to classify samples of classes that were not seen during training. To address this challenging task, most ZSL methods relate unseen test classes to seen (training) classes via a pre-defined set of attributes that can describe all classes in the same semantic space, so the knowledge learned on the training classes can be adapted to unseen classes. In this paper, we aim to optimize the attribute space for ZSL by training a propagation mechanism to refine the semantic attributes of each class based on its neighbors and related classes on a graph of classes. We show that the propagated attributes can produce classifiers for zero-shot classes with significantly improved performance in different ZSL settings. The graph of classes is usually free or very cheap to acquire, such as from WordNet or the ImageNet classes. When the graph is not provided, given pre-defined semantic embeddings of the classes, we can learn a mechanism to generate the graph in an end-to-end manner, along with the propagation mechanism. However, this graph-aided technique has not been well explored in the literature. In this paper, we introduce the attribute propagation network (APNet), which is composed of 1) a graph propagation model generating an attribute vector for each class, and 2) a parameterized nearest neighbor (NN) classifier categorizing an image to the class with the attribute vector nearest to the image's embedding. For better generalization over unseen classes, different from previous methods, we adopt a meta-learning strategy to train the propagation mechanism and the similarity metric for the NN classifier on multiple sub-graphs, each associated with a classification task over a subset of the training classes. In experiments with two zero-shot learning settings and five benchmark datasets, APNet achieves either compelling performance or new state-of-the-art results.

Sequential sensor data is generated in a wide variety of practical applications.
A fundamental challenge involves learning effective classifiers for such sequential data. While deep learning has led to impressive performance gains in recent years in domains such as speech, this has relied on the availability of large datasets of sequences with high-quality labels. In many applications, however, the associated class labels are often extremely limited, with precise labelling/segmentation being too expensive to perform at high volume. However, large amounts of unlabeled data may still be available. In this paper, we propose a novel framework for semi-supervised learning in such contexts. In an unsupervised manner, change point detection methods can be used to identify points within a sequence corresponding to likely class changes. We show that change points provide examples of similar/dissimilar pairs of sequences which, when coupled with labeled data, can be used in a semi-supervised classification setting. Leveraging the change points and labeled data, we form examples of similar/dissimilar sequences to train a neural network to learn improved representations for classification. We provide extensive synthetic simulations and show that the learned representations are superior to those learned through an autoencoder, and obtain improved results on both simulated and real-world human activity recognition datasets.

Fuzzy string matching and language classification are important tools in Natural Language Processing pipelines; this paper provides advances in both areas. We propose a fast novel approach to string tokenisation for fuzzy language matching and experimentally demonstrate an 83.6% decrease in processing time, with an estimated improvement in recall of 3.1% at the cost of a 2.6% decrease in precision. This approach works even where keywords are subdivided into multiple words, without needing to scan character-to-character. So far there has been little work on using metadata to enhance language classification algorithms.
We provide observational data and find that the Accept-Language header is 14% more likely to match the classification than the IP address. We consider an approach for community detection in time-varying networks. At its core, this approach maintains a small sketch graph to capture the essential community structure found in each snapshot of the full network. We demonstrate how the sketch can be used to explicitly identify six key community events which typically occur during network evolution: growth, shrinkage, merging, splitting, birth and death. Based on these detection techniques, we formulate a community detection algorithm which can process a network exhibiting all of these events concurrently. One advantage afforded by the sketch-based algorithm is the efficient handling of large networks. Whereas detecting events in the full graph may be computationally expensive, the small size of the sketch allows changes to be quickly assessed. A second advantage arises in networks containing clusters of disproportionate size. The sketch is constructed such that there is equal representation of each cluster, thus reducing the possibility that small clusters are lost in the estimate. We present a new standardized benchmark based on the stochastic block model which models the addition and deletion of nodes, as well as the birth and death of communities. When coupled with existing benchmarks, this new benchmark provides a comprehensive suite of tests encompassing all six community events. We provide a set of numerical results demonstrating the advantages of our approach both in run time and in the handling of small clusters. Since the outbreak of the COVID-19 pandemic, many healthcare facilities have suffered from shortages in medical resources, particularly in Personal Protective Equipment (PPE). In this paper, we propose a game-theoretic approach to schedule PPE orders among healthcare facilities. 
In this PPE game, each independent healthcare facility optimises its own storage utilisation in order to keep its PPE cost at a minimum. Such a model can reduce peak demand considerably when applied to a fluctuating PPE consumption profile. Experiments conducted for NHS England regions using actual data confirm that the challenge of securing PPE supply during disasters such as COVID-19 can be eased if proper stock management procedures are adopted. These procedures can include early stockpiling, increasing storage capacities and implementing measures that can prolong the time period between successive infection waves, such as social distancing measures. Simulation results suggest that the provision of dedicated PPE storage space can be a viable solution to avoid straining PPE supply chains in case a second wave of COVID-19 infections occurs. Recent network pruning methods focus on pruning models early on in training. To estimate the impact of removing a parameter, these methods use importance measures that were originally designed for pruning trained models. Despite lacking justification for their use early on in training, models pruned using such measures suffer surprisingly little accuracy loss. To better explain this behavior, we develop a general, gradient-flow based framework that relates state-of-the-art importance measures through the order of the time derivative of the norm of model parameters. 
We use this framework to determine the relationship between pruning measures and the evolution of model parameters, establishing several findings related to pruning models early on in training: (i) magnitude-based pruning removes parameters that contribute least to reduction in loss, resulting in models that converge faster than those produced by magnitude-agnostic methods; (ii) loss-preservation based pruning preserves first-order model evolution dynamics and is well-motivated for pruning minimally trained models; and (iii) gradient-norm based pruning affects second-order model evolution dynamics, and increasing gradient norm via pruning can produce poorly performing models. We validate our claims on several VGG-13, MobileNet-V1, and ResNet-56 models trained on CIFAR-10 and CIFAR-100. Code available at https://github.com/EkdeepSLubana/flowandprune. The tasks of scheduling jobs to machines while minimizing the total makespan, the sum of weighted completion times, or a norm of the load vector are among the oldest and most fundamental tasks in combinatorial optimization. Since all of these problems are in general NP-hard, much attention has been given to the regime where there is only a small number $k$ of job types, but the number of jobs $n$ may be large; this is the few job types, high-multiplicity regime. Despite many positive results, the hardness boundary of this regime was not understood until now. We show that makespan minimization on uniformly related machines ($Q|HM|C_{\max}$) is NP-hard already with $6$ job types, and that the related Cutting Stock problem is NP-hard already with $8$ item types. For the more general unrelated machines model ($R|HM|C_{\max}$), we show that if either the largest job size $p_{\max}$ or the number of jobs $n$ is polynomially bounded in the instance size $|I|$, there are algorithms with complexity $|I|^{\textrm{poly}(k)}$. 
Our main result is that this is unlikely to be improved, because $Q||C_{\max}$ is W[1]-hard parameterized by $k$ already when $n$, $p_{\max}$, and the numbers describing the speeds are polynomial in $|I|$; the same holds for $R|HM|C_{\max}$ (without speeds) when the job sizes matrix has rank $2$. Our positive and negative results also extend to the objectives $\ell_2$-norm minimization of the load vector and, partially, the sum of weighted completion times $\sum w_j C_j$. Along the way, we answer affirmatively the question whether makespan minimization on identical machines ($P||C_{\max}$) is fixed-parameter tractable parameterized by $k$, extending our understanding of this fundamental problem. Together with our hardness results for $Q||C_{\max}$, this implies that the complexity of $P|HM|C_{\max}$ is the only remaining open case. We study the limits of an information intermediary in Bayesian auctions. Formally, we consider the standard single-item auction, with a revenue-maximizing seller and $n$ buyers with independent private values; in addition, we now have an intermediary who knows the buyers' true values, and can map these to a public signal so as to try to increase buyer surplus. This model was proposed by Bergemann et al., who present a signaling scheme for the single-buyer setting that raises the optimal consumer surplus, by guaranteeing the item is always sold while ensuring the seller gets the same revenue as without signaling. Our work aims to understand how this result ports to the setting with multiple buyers. Our first result is an impossibility: we show that such a signaling scheme need not exist even for $n=2$ buyers with $2$-point valuation distributions. Indeed, no signaling scheme can always allocate the item to the highest-valued buyer while preserving any non-trivial fraction of the original consumer surplus; further, no signaling scheme can achieve consumer surplus better than a factor of $\frac{1}{2}$ compared to the maximum achievable. 
These results are existential (and not computational) impossibilities, and thus provide a sharp separation between the single- and multi-buyer settings. On the positive side, for discrete valuation distributions, we develop signaling schemes with good approximation guarantees for the consumer surplus compared to the maximum achievable, in settings where either the number of agents or the support size of the valuations is small. Formally, for i.i.d. buyers, we present an $O(\min(\log n, K))$-approximation where $K$ is the support size of the valuations. Moreover, for general distributions, we present an $O(\min(n \log n, K^2))$-approximation. Our signaling schemes are conceptually simple and computable in polynomial (in $n$ and $K$) time. We study how neural networks trained by gradient descent extrapolate, i.e., what they learn outside the support of the training distribution. Previous works report mixed empirical results when extrapolating with neural networks: while multilayer perceptrons (MLPs) do not extrapolate well in simple tasks, Graph Neural Networks (GNNs), structured networks with MLP modules, have had some success in more complex tasks. We provide a theoretical explanation and identify conditions under which MLPs and GNNs extrapolate well. We start by showing that ReLU MLPs trained by gradient descent converge quickly to linear functions along any direction from the origin, which suggests ReLU MLPs cannot extrapolate well in most non-linear tasks. On the other hand, ReLU MLPs can provably converge to a linear target function when the training distribution is "diverse" enough. These observations lead to a hypothesis: GNNs can extrapolate well in dynamic programming (DP) tasks if we encode appropriate non-linearity in the architecture and input representation. We provide theoretical and empirical support for the hypothesis. 
Our theory explains previous extrapolation successes and suggests their limitations: successful extrapolation relies on incorporating task-specific non-linearity, which often requires domain knowledge or extensive model search. This paper proposes an ensemble of deep convolutional neural networks (CNNs) based on EfficientNet, named ECOVNet, to detect COVID-19 using a large chest X-ray dataset. First, the open-access large chest X-ray collection is augmented, and ImageNet pre-trained weights for EfficientNet are transferred, with some customized fine-tuned top layers that are trained, followed by an ensemble of model snapshots to classify chest X-rays corresponding to COVID-19, normal, and pneumonia. The predictions of the model snapshots, which are created during a single training run, are combined through two ensemble strategies, i.e., hard ensemble and soft ensemble, to improve classification performance and generalization in the related task of classifying chest X-rays. Constrained robot motion planning is a widely used technique to solve complex robot tasks. We consider the problem of learning representations of constraints from demonstrations with a deep neural network, which we call the Equality Constraint Manifold Neural Network (ECoMaNN). The key idea is to learn a level-set function of the constraint suitable for integration into a constrained sampling-based motion planner. Learning proceeds by aligning subspaces in the network with subspaces of the data. We combine both learned constraints and analytically described constraints in the planner and use a projection-based strategy to find valid points. We evaluate ECoMaNN on its ability to represent constraint manifolds, the impact of its individual loss terms, and the motions produced when it is incorporated into a planner. 
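The projection-based strategy mentioned in the ECoMaNN abstract above can be illustrated with a toy sketch: given a level-set function f (here a hand-written stand-in for the learned network), iteratively move a sampled point onto the constraint manifold f(x) = 0. The function, step rule, and tolerances are illustrative assumptions, not the paper's code.

```python
def project_to_manifold(f, x, steps=50, eps=1e-6):
    """Gradient-based projection of point x onto {x : f(x) = 0}."""
    x = list(x)
    for _ in range(steps):
        val = f(x)
        if abs(val) < 1e-9:
            break
        # Numerical gradient of f at x.
        grad = []
        for i in range(len(x)):
            xp = x[:]
            xp[i] += eps
            grad.append((f(xp) - val) / eps)
        norm2 = sum(g * g for g in grad) or 1.0
        # Newton-style step toward the zero level set.
        x = [xi - val * gi / norm2 for xi, gi in zip(x, grad)]
    return x

# Example constraint: points on the unit circle, f(x) = x0^2 + x1^2 - 1.
f = lambda x: x[0] ** 2 + x[1] ** 2 - 1.0
p = project_to_manifold(f, [2.0, 0.0])
print(round(f(p), 6))  # 0.0: the projected point satisfies the constraint
```

A sampling-based planner would apply such a projection to every randomly drawn configuration before adding it to the search tree.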
Stochastic gradient descent (SGD) is often applied to train Deep Neural Networks (DNNs), and research efforts have been devoted to investigating the convergent dynamics of SGD and the minima found by SGD. The influencing factors identified in the literature include learning rate, batch size, Hessian, and gradient covariance, and stochastic differential equations are used to model SGD and establish the relationships among these factors for characterizing the minima found by SGD. It has been found that the ratio of batch size to learning rate is a main factor characterizing the underlying SGD dynamics; however, the influence of other important factors such as the Hessian and gradient covariance is not entirely agreed upon. This paper describes the factors and relationships in the recent literature and presents numerical findings on the relationships. In particular, it confirms the four-factor and general relationship results obtained in Wang (2019), while the three-factor and associated relationship results found in Jastrz\c{e}bski et al. (2018) may not hold beyond the considered special case. A common dilemma in 3D object detection for autonomous driving is that high-quality, dense point clouds are only available during training, but not testing. We use knowledge distillation to bridge the gap between a model trained on high-quality inputs at training time and another tested on low-quality inputs at inference time. In particular, we design a two-stage training pipeline for point cloud object detection. First, we train an object detection model on dense point clouds, which are generated from multiple frames using extra information available only at training time. Then, we train the model's identical counterpart on sparse single-frame point clouds with consistency regularization on features from both models. We show that this procedure improves performance on low-quality data during testing, without additional overhead. 
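The consistency regularization in the distillation pipeline above amounts to adding a feature-matching term to the training objective. A hedged sketch, assuming a simple L2 penalty between the dense-input ("teacher") features and the sparse-input ("student") features; the feature vectors and weight are toy stand-ins, only the loss structure follows the description:

```python
def mse(a, b):
    """Mean squared error between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def total_loss(detection_loss, teacher_feats, student_feats, weight=1.0):
    """Task loss plus a consistency term between the two models' features."""
    return detection_loss + weight * mse(teacher_feats, student_feats)

# Identical features incur no consistency penalty:
print(total_loss(0.5, [1.0, 2.0], [1.0, 2.0]))  # 0.5
# Diverging features are penalized:
print(total_loss(0.5, [1.0, 2.0], [0.0, 0.0]))  # 3.0, i.e. 0.5 + (1 + 4) / 2
```

During training, only the student is updated by this loss, which pushes its sparse-input features toward those the teacher computes from dense inputs.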
We ask the question whether entropy accumulates, in the sense that the operationally relevant total uncertainty about an $n$-partite system $A = (A_1, \ldots, A_n)$ corresponds to the sum of the entropies of its parts $A_i$. The Asymptotic Equipartition Property implies that this is indeed the case to first order in $n$, under the assumption that the parts $A_i$ are identical and independent of each other. Here we show that entropy accumulation occurs more generally, i.e., without an independence assumption, provided one quantifies the uncertainty about the individual systems $A_i$ by the von Neumann entropy of suitably chosen conditional states. The analysis of a large system can hence be reduced to the study of its parts. This is relevant for applications. In device-independent cryptography, for instance, the approach yields essentially optimal security bounds valid for general attacks, as shown by Arnon-Friedman et al. Our concern is the overhead of answering OWL 2 QL ontology-mediated queries (OMQs) in ontology-based data access compared to evaluating their underlying tree-shaped and bounded-treewidth conjunctive queries (CQs). We show that OMQs with bounded-depth ontologies have nonrecursive datalog (NDL) rewritings that can be constructed and evaluated in LOGCFL for combined complexity, even in NL if their CQs are tree-shaped with a bounded number of leaves, and so incur no overhead in complexity-theoretic terms. For OMQs with arbitrary ontologies and bounded-leaf CQs, NDL-rewritings are constructed and evaluated in LOGCFL. We experimentally show the feasibility and scalability of our rewritings compared to previously proposed NDL-rewritings. On the negative side, we prove that answering OMQs with tree-shaped CQs is not fixed-parameter tractable if the ontology depth or the number of leaves in the CQs is regarded as the parameter, and that answering OMQs with a fixed ontology (of infinite depth) is NP-complete for tree-shaped and LOGCFL for bounded-leaf CQs. 
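The i.i.d. baseline in the entropy-accumulation abstract above has a simple classical analogue that can be checked numerically: for $n$ independent, identically distributed parts, the entropy of the whole is exactly the sum of the entropies of the parts. The sketch below uses the Shannon entropy of a biased coin as a toy stand-in (the paper's result concerns von Neumann entropy without any independence assumption):

```python
from math import log2
from itertools import product

def shannon(dist):
    """Shannon entropy (in bits) of a probability distribution given as a dict."""
    return -sum(p * log2(p) for p in dist.values() if p > 0)

coin = {"H": 0.3, "T": 0.7}
n = 3

# Joint distribution of n independent copies of the coin.
joint = {}
for seq in product(coin, repeat=n):
    p = 1.0
    for outcome in seq:
        p *= coin[outcome]
    joint[seq] = p

# Entropy accumulates additively in the independent case:
print(round(shannon(joint), 6) == round(n * shannon(coin), 6))  # True
```

The paper's contribution is that a suitably conditioned version of this additivity survives even when the parts are correlated.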
Smartphones have become ubiquitously integrated into our home and work environments; however, users normally rely on explicit but inefficient identification processes in a controlled environment. Therefore, when a device is stolen, a thief can gain access to the owner's personal information and services despite the stored passwords. Motivated by this scenario, this work proposes an automatic legitimate-user identification system based on gait biometrics extracted from user walking patterns captured by a smartphone. A set of preprocessing schemes is applied to calibrate noisy and invalid samples and augment the gait-induced time- and frequency-domain features, which are then further optimized using a non-linear unsupervised feature selection method. The selected features create an underlying gait biometric representation able to discriminate among individuals and identify them uniquely. Different classifiers (i.e., Support Vector Machine (SVM), K-Nearest Neighbors (KNN), Bagging, and Extreme Learning Machine (ELM)) are adopted to achieve accurate legitimate-user identification. Extensive experiments on a group of $16$ individuals in an indoor environment show the effectiveness of the proposed solution: with $5$ to $70$ samples per window, the KNN and Bagging classifiers achieve $87-99\%$ accuracy, compared with $82-98\%$ for ELM and $81-94\%$ for SVM. The proposed pipeline achieves a $100\%$ true-positive and $0\%$ false-negative rate for almost all classifiers. Due to the importance of zero-shot learning, i.e., classifying images where there is a lack of labeled training data, the number of proposed approaches has recently increased steadily. We argue that it is time to take a step back and analyze the status quo of the area. The purpose of this paper is three-fold. First, given that there is no agreed-upon zero-shot learning benchmark, we define a new benchmark by unifying both the evaluation protocols and data splits of publicly available datasets used for this task. 
This is an important contribution, as published results are often not comparable and sometimes even flawed due to, e.g., pre-training on zero-shot test classes. Moreover, we propose a new zero-shot learning dataset, the Animals with Attributes 2 (AWA2) dataset, which we make publicly available both in terms of image features and the images themselves. Second, we compare and analyze a significant number of state-of-the-art methods in depth, both in the classic zero-shot setting and in the more realistic generalized zero-shot setting. Finally, we discuss in detail the limitations of the current status of the area, which can be taken as a basis for advancing it. We introduce a family of unsupervised, domain-free, and (asymptotically) model-independent algorithms based on the principles of algorithmic probability and information theory designed to minimize the loss of algorithmic information, including a lossless-compression-based lossy compression algorithm. The methods can select and coarse-grain data in an algorithmic-complexity fashion (without the use of popular compression algorithms) by collapsing regions that may procedurally be regenerated from a computable candidate model. We show that the method can preserve the salient properties of objects and perform dimension reduction, denoising, feature selection, and network sparsification. As a validation case, we demonstrate that the method preserves all the graph-theoretic indices measured on a well-known set of synthetic and real-world networks of very different nature, ranging from degree distribution and clustering coefficient to edge betweenness and degree and eigenvector centralities, achieving equal or significantly better results than other data reduction methods and some of the leading network sparsification methods. The methods (InfoRank, MILS) can also be applied to applications such as image segmentation based on algorithmic probability. 
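The complexity-guided sparsification described in the MILS abstract above can be caricatured in a few lines. Note the deliberate substitution: the paper explicitly avoids popular compression algorithms, whereas this runnable toy uses zlib as a crude stand-in for a complexity estimate. It drops the edge whose removal changes the compressed length of the graph's description the least, i.e., the edge contributing the least information.

```python
import zlib

def clen(edges):
    """Compressed length of a canonical string encoding of the edge set."""
    data = ";".join(f"{u}-{v}" for u, v in sorted(edges)).encode()
    return len(zlib.compress(data, 9))

def sparsify_once(edges):
    """Remove the single edge contributing least to the compressed description."""
    edges = set(edges)
    full = clen(edges)
    drop = min(edges, key=lambda e: full - clen(edges - {e}))
    return edges - {drop}

g = {(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)}
h = sparsify_once(g)
print(len(h))  # 4: exactly one edge removed
```

Iterating this step yields a coarse-grained graph; the paper's point is that a proper algorithmic-complexity estimate (rather than a statistical compressor) better preserves graph-theoretic indices along the way.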
Deep learning has revolutionized data science, and recently its popularity has grown exponentially, as has the number of papers employing deep networks. Vision tasks, such as human pose estimation, did not escape this trend. There is a large number of deep models in which small changes in the network architecture or in the data pre-processing, together with the stochastic nature of the optimization procedures, produce notably different results, making it extremely difficult to sift methods that significantly outperform others. This situation motivates the current study, in which we perform a systematic evaluation and statistical analysis of vanilla deep regression, i.e., convolutional neural networks with a linear regression top layer. This is the first comprehensive analysis of deep regression techniques. We perform experiments on four vision problems, and report confidence intervals for the median performance as well as the statistical significance of the results, if any. Surprisingly, the variability due to different data pre-processing procedures generally eclipses the variability due to modifications in the network architecture. Our results reinforce the hypothesis that, in general, a general-purpose network (e.g., VGG-16 or ResNet-50) that is adequately tuned can yield results close to the state-of-the-art without having to resort to more complex and ad-hoc regression models. In classic fair division problems such as cake cutting and rent division, envy-freeness requires that each individual (weakly) prefer his allocation to anyone else's. On a conceptual level, we argue that envy-freeness also provides a compelling notion of fairness for classification tasks. Our technical focus is the generalizability of envy-free classification, i.e., understanding whether a classifier that is envy-free on a sample would be almost envy-free with respect to the underlying distribution with high probability. 
Our main result establishes that a small sample is sufficient to achieve such guarantees when the classifier in question is a mixture of deterministic classifiers that belong to a family of low Natarajan dimension. Matrix-monotonic optimization exploits the monotonic nature of positive semi-definite matrices to derive optimal diagonalizable structures for the matrix variables of matrix-variable optimization problems. Based on the optimal structures derived, the associated optimization problems can be substantially simplified and underlying physical insights can also be revealed. In our work, a comprehensive framework for the application of matrix-monotonic optimization to multiple-input multiple-output (MIMO) transceiver design is provided for a series of specific performance metrics under various linear constraints. This framework consists of two parts, i.e., Part-I for single-variable optimization and Part-II for multi-variable optimization. In this paper, single-variable matrix-monotonic optimization is investigated under various power constraints and various types of channel state information (CSI) conditions. Specifically, three cases are investigated: 1) both the transmitter and receiver have imperfect CSI; 2) perfect CSI is available at the receiver but the transmitter has no CSI; 3) perfect CSI is available at the receiver but the channel estimation error at the transmitter is norm-bounded. In all three cases, the matrix-monotonic optimization framework can be used to derive the optimal structures of the matrix variables. We propose an Euler transformation that transforms a given $d$-dimensional cell complex $K$ for $d=2,3$ into a new $d$-complex $\hat{K}$ in which every vertex is part of a uniform even number of edges. Hence every vertex in the graph $\hat{G}$ that is the $1$-skeleton of $\hat{K}$ has an even degree, which makes $\hat{G}$ Eulerian, i.e., it is guaranteed to contain an Eulerian tour. 
Meshes whose edges admit Eulerian tours are crucial in coverage problems arising in several applications, including 3D printing and robotics. For $2$-complexes in $\mathbb{R}^2$ ($d=2$), under the mild assumption that no two adjacent edges of a $2$-cell in $K$ are boundary edges, we show that the Euler transformed $2$-complex $\hat{K}$ has a geometric realization in $\mathbb{R}^2$, and that each vertex in its $1$-skeleton has degree $4$. We bound the numbers of vertices, edges, and $2$-cells in $\hat{K}$ as small scalar multiples of the corresponding numbers in $K$. We prove corresponding results for $3$-complexes in $\mathbb{R}^3$ under the additional assumption that the degree of a vertex in each $3$-cell containing it is $3$. In this setting, every vertex in $\hat{G}$ is shown to have degree $6$. We also present bounds on parameters measuring the geometric quality (aspect ratios, minimum edge length, and maximum angle) of $\hat{K}$ in terms of the corresponding parameters of $K$ (for $d=2, 3$). Finally, we illustrate a direct application of the proposed Euler transformation in additive manufacturing. We propose and analyse randomized cubature formulae for the numerical integration of functions with respect to a given probability measure $\mu$ defined on a domain $\Gamma \subseteq \mathbb{R}^d$, in any dimension $d$. Each cubature formula is conceived to be exact on a given finite-dimensional subspace $V_n \subset L^2(\Gamma,\mu)$ of dimension $n$, and uses pointwise evaluations of the integrand function $\phi : \Gamma \to \mathbb{R}$ at $m>n$ independent random points. These points are distributed according to a suitable auxiliary probability measure that depends on $V_n$. We show that, up to a logarithmic factor, a linear proportionality between $m$ and $n$ with dimension-independent constant ensures stability of the cubature formula with high probability. We also prove error estimates in probability and in expectation for any $n\geq 1$ and $m>n$, thus covering both preasymptotic and asymptotic regimes. 
Our analysis shows that the expected cubature error decays as $\sqrt{n/m}$ times the $L^2(\Gamma,\mu)$-best approximation error of $\phi$ in $V_n$. On the one hand, for fixed $n$ and $m\to \infty$, our cubature formula can be seen as a variance reduction technique for a Monte Carlo estimator, and can lead to enormous variance reduction for smooth integrand functions and subspaces $V_n$ with spectral approximation properties. On the other hand, when we let $n,m\to\infty$, our cubature becomes of high order with spectral convergence. As a further contribution, we analyse another cubature formula whose expected error decays as $\sqrt{1/m}$ times the $L^2(\Gamma,\mu)$-best approximation error of $\phi$ in $V_n$, and which is therefore asymptotically optimal but with constants that can be larger in the preasymptotic regime. Finally, we show that, under a more demanding (at least quadratic) proportionality between $m$ and $n$, the weights of the cubature are positive with high probability. We analyze the bootstrap percolation process on the stochastic block model (SBM), a natural extension of the Erd\H{o}s--R\'enyi random graph that allows representing the "community structure" observed in many real systems. In the SBM, nodes are partitioned into subsets, which represent different communities, and pairs of nodes are independently connected with a probability that depends on the communities they belong to. Under mild assumptions on the system parameters, we prove the existence of a sharp phase transition for the final number of active nodes and characterize sub-critical and super-critical regimes in terms of the number of initially active nodes, which are selected uniformly at random in each community. We provide non-asymptotic excess risk guarantees for statistical learning in a setting where the population risk with respect to which we evaluate the target parameter depends on an unknown nuisance parameter that must be estimated from data. 
We analyze a two-stage sample splitting meta-algorithm that takes as input two arbitrary estimation algorithms: one for the target parameter and one for the nuisance parameter. We show that if the population risk satisfies a condition called Neyman orthogonality, the impact of the nuisance estimation error on the excess risk bound achieved by the meta-algorithm is of second order. Our theorem is agnostic to the particular algorithms used for the target and nuisance and only makes an assumption on their individual performance. This enables the use of a plethora of existing results from statistical learning and machine learning to give new guarantees for learning with a nuisance component. Moreover, by focusing on excess risk rather than parameter estimation, we can give guarantees under weaker assumptions than in previous works and accommodate settings in which the target parameter belongs to a complex nonparametric class. We provide conditions on the metric entropy of the nuisance and target classes such that oracle rates---rates of the same order as if we knew the nuisance parameter---are achieved. We also derive new rates for specific estimation algorithms such as variance-penalized empirical risk minimization, neural network estimation and sparse high-dimensional linear model estimation. We highlight the applicability of our results in four settings of central importance: 1) heterogeneous treatment effect estimation, 2) offline policy optimization, 3) domain adaptation, and 4) learning with missing data. Deep reinforcement learning has been successful in a variety of tasks, such as game playing and robotic manipulation. However, attempting to learn \textit{tabula rasa} disregards the logical structure of many domains as well as the wealth of readily available knowledge from domain experts that could help "warm start" the learning process. 
We present a novel reinforcement learning technique that allows for intelligent initialization of neural network weights and architecture. Our approach permits encoding domain knowledge directly into a neural decision tree, and improves upon that knowledge with policy gradient updates. We empirically validate our approach on two OpenAI Gym tasks and two modified StarCraft 2 tasks, showing that our novel architecture outperforms multilayer-perceptron and recurrent architectures. Our knowledge-based framework finds superior policies compared to imitation learning-based and prior knowledge-based approaches. Importantly, we demonstrate that our approach can be used by untrained humans to initially provide a >80% increase in expected reward relative to baselines prior to training (p < 0.001), which results in a >60% increase in expected reward after policy optimization (p = 0.011). Quantum walk search may exhibit phenomena beyond the intuition from conventional random walk theory. One such example is the exceptional configuration phenomenon: it appears that it may be much harder to find any of two or more marked vertices than if only one of them is marked. In this paper, we analyze the probability of finding any of the marked vertices in such scenarios and prove upper bounds for various sets of marked vertices. We apply the upper bounds to a large collection of graphs and show that the quantum search may be slow even on real-world networks. The logarithmic negativity of a bipartite quantum state is a widely employed entanglement measure in quantum information theory, due to the fact that it is easy to compute and serves as an upper bound on distillable entanglement. 
More recently, the $\kappa$-entanglement of a bipartite state was shown to be the first entanglement measure that is both easily computable and has a precise information-theoretic meaning, being equal to the exact entanglement cost of a bipartite quantum state when the free operations are those that completely preserve the positivity of the partial transpose [Wang and Wilde, Phys. Rev. Lett. 125(4):040502, July 2020]. In this paper, we provide a non-trivial link between these two entanglement measures, by showing that they are the extremes of an ordered family of $\alpha$-logarithmic negativity entanglement measures, each of which is identified by a parameter $\alpha \in [1,\infty]$. In this family, the original logarithmic negativity is recovered as the smallest, with $\alpha=1$, and the $\kappa$-entanglement is recovered as the largest, with $\alpha=\infty$. We prove that the $\alpha$-logarithmic negativity satisfies the following properties: entanglement monotonicity, normalization, faithfulness, and subadditivity. We also prove that it is neither convex nor monogamous. Finally, we define the $\alpha$-logarithmic negativity of a quantum channel as a generalization of the notion for quantum states, and we show how to generalize many of the concepts to arbitrary resource theories. Network embedding is a method to learn low-dimensional representation vectors for nodes in complex networks. In real networks, nodes may have multiple tags, but existing methods ignore the abundant semantic and hierarchical information of tags. This information is useful to many network applications and is usually very stable. In this paper, we propose a tag representation learning model, Tag2Vec, which mixes nodes and tags into a hybrid network. First, for tag networks, we define semantic distance as the proximity between tags and design a novel strategy, parameterized random walk, to adaptively generate context with the semantic and hierarchical information of tags. 
Then, we propose a hyperbolic Skip-gram model to better express the complex hierarchical structure with lower output dimensions. We evaluate our model on the NBER U.S. patent dataset and the WordNet dataset. The results show that our model can learn tag representations with rich semantic information, and it outperforms other baselines.

Cephalometric tracing is commonly used in orthodontic diagnosis and treatment planning. In this paper, we propose a deep learning based framework to automatically detect anatomical landmarks in cephalometric X-ray images. We train a deep encoder-decoder for landmark detection, and combine the global landmark configuration with local high-resolution feature responses. The proposed framework is based on a 2-stage U-Net, regressing multi-channel heatmaps for landmark detection. In this framework, we embed an attention mechanism with the global-stage heatmaps, guiding the local-stage inference, to regress the local heatmap patches at high resolution. Besides, an Expansive Exploration strategy improves robustness during inference, expanding the search scope without increasing model complexity. We have evaluated our framework on the most widely-used public dataset for landmark detection in cephalometric X-ray images. With less computation and manual tuning, our framework achieves state-of-the-art results.

Surprise-based learning allows agents to rapidly adapt to non-stationary stochastic environments characterized by sudden changes. We show that exact Bayesian inference in a hierarchical model gives rise to a surprise-modulated trade-off between forgetting old observations and integrating them with the new ones. The modulation depends on a probability ratio, which we call the "Bayes Factor Surprise", that tests the prior belief against the current belief. We demonstrate that in several existing approximate algorithms the Bayes Factor Surprise modulates the rate of adaptation to new observations.
We derive three novel surprise-based algorithms, one in the family of particle filters, one in the family of variational learning, and one in the family of message passing, that have constant scaling in observation sequence length and particularly simple update dynamics for any distribution in the exponential family. Empirical results show that these surprise-based algorithms estimate parameters better than alternative approximate approaches and reach levels of performance comparable to computationally more expensive algorithms. The Bayes Factor Surprise is related to but different from Shannon Surprise. In two hypothetical experiments, we make testable predictions for physiological indicators that dissociate the Bayes Factor Surprise from Shannon Surprise. The theoretical insight of casting various approaches as surprise-based learning, as well as the proposed online algorithms, may be applied to the analysis of animal and human behavior, and to reinforcement learning in non-stationary environments.

The $k$-means algorithm is one of the most classical clustering methods and has been widely and successfully used in signal processing. However, due to the thin-tailed property of the Gaussian distribution, the $k$-means algorithm suffers from relatively poor performance on datasets containing heavy-tailed data or outliers. Besides, the standard $k$-means algorithm also has relatively weak stability, i.e., its results have a large variance, which reduces the model's credibility. In this paper, we propose a robust and stable $k$-means variant, dubbed $t$-$k$-means, as well as its fast version, to alleviate those problems. Theoretically, we derive $t$-$k$-means and analyze its robustness and stability from the aspect of the loss function and the expression of the clustering center, respectively. A large number of experiments are also conducted, which verify the effectiveness and efficiency of the proposed method.
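For reference, the standard $k$-means iteration that $t$-$k$-means modifies can be sketched as follows. This is a generic Lloyd's-algorithm illustration (deterministic initialization chosen for simplicity), not the paper's code; the $t$-$k$-means variant would replace the squared-Euclidean/mean steps with Student-$t$-based counterparts.

```python
import numpy as np

def kmeans(X, k, n_iter=100):
    """Plain k-means (Lloyd's algorithm), shown for reference.

    t-k-means, as described above, replaces the implicit Gaussian
    (squared-Euclidean) model below with a heavier-tailed Student-t
    one to gain robustness against outliers.
    """
    X = np.asarray(X, dtype=float)
    centers = X[:k].copy()  # deterministic init, for this sketch only
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        # Assign each point to its nearest center (squared Euclidean).
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Update each center to its cluster mean; the mean is sensitive
        # to heavy tails, which is the weakness t-k-means addresses.
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers
```

The large variance the abstract attributes to standard $k$-means comes mainly from the random initialization, which this sketch sidesteps only for reproducibility of the example.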
This paper presents a real-time non-probabilistic detection mechanism to detect load-redistribution (LR) attacks against energy management systems (EMSs). Prior studies have shown that certain LR attacks can bypass conventional bad data detectors (BDDs) and remain undetectable, which implies that the presence of a reliable and intelligent detection mechanism to flag LR attacks is imperative. Therefore, in this study a detection mechanism to enhance existing BDDs is proposed, based on fundamental knowledge of the physical laws of the electric grid. A greedy algorithm, which can optimize the core LR attack problem, is presented to enable a fast mechanism to identify the most sensitive locations for critical assets. The main contribution of this detection mechanism is its leveraging of power-systems domain insight to identify an underlying exploitable structure in the core LR attack problem, which enables the prediction of attackers' behavior. An additional contribution is the ability to combine this approach with other detection mechanisms to increase their likelihood of detection. The proposed approach is applied to the 2383-bus Polish test system to demonstrate the scalability of the greedy algorithm; it solved the attacker's problem more than 10x faster than a traditional linear optimization approach.

Policy gradient methods are among the most effective methods in challenging reinforcement learning problems with large state and/or action spaces. However, little is known about even their most basic theoretical convergence properties, including: if and how fast they converge to a globally optimal solution, and how they cope with approximation error due to using a restricted class of parametric policies. This work provides provable characterizations of the computational, approximation, and sample size properties of policy gradient methods in the context of discounted Markov Decision Processes (MDPs).
We focus on both "tabular" policy parameterizations, where the optimal policy is contained in the class and where we show global convergence to the optimal policy, and parametric policy classes (considering both log-linear and neural policy classes), which may not contain the optimal policy and where we provide agnostic learning results. One central contribution of this work is in providing approximation guarantees that are average-case -- which avoid explicit worst-case dependencies on the size of the state space -- by making a formal connection to supervised learning under distribution shift. This characterization shows an important interplay between estimation error, approximation error, and exploration (as characterized through a precisely defined condition number).

In this work we consider the numerical efficiency and convergence rates of solvers for non-convex multi-penalty formulations when reconstructing sparse signals from noisy linear measurements. We extend an existing approach, based on reduction to an augmented single-penalty formulation, to the non-convex setting and discuss its computational intractability in large-scale applications. To circumvent this limitation, we propose an alternative single-penalty reduction based on infimal convolution that shares the benefits of the augmented approach but is computationally less dependent on the problem size. We provide linear convergence rates for both approaches and characterize their dependence on design parameters. Numerical experiments substantiate our theoretical findings.

Operator-theoretic analysis of nonlinear dynamical systems has attracted much attention in a variety of engineering and scientific fields, endowed with practical estimation methods using data such as dynamic mode decomposition.
In this paper, we address a lifted representation of nonlinear dynamical systems with random noise based on transfer operators, and develop a novel Krylov subspace method for estimating the operators using finite data, with consideration of the unboundedness of the operators. For this purpose, we first consider Perron-Frobenius operators with kernel-mean embeddings for such systems. We then extend the Arnoldi method, which is the most classical type of Krylov subspace method, so that it can be applied to the current case. However, the Arnoldi method requires the assumption that the operator is bounded, which is not necessarily satisfied for transfer operators on nonlinear systems. We accordingly develop the shift-invert Arnoldi method for Perron-Frobenius operators to avoid this problem. Also, we describe an approach for evaluating the predictive accuracy of estimated operators on the basis of the maximum mean discrepancy, which is applicable, for example, to anomaly detection in complex systems. The empirical performance of our methods is investigated using synthetic and real-world healthcare data.

[New and updated results were published in Nature Chemistry, doi:10.1038/s41557-020-0544-y.] The electronic Schr\"odinger equation describes fundamental properties of molecules and materials, but can only be solved analytically for the hydrogen atom. The numerically exact full configuration-interaction method is exponentially expensive in the number of electrons. Quantum Monte Carlo is a possible way out: it scales well to large molecules, can be parallelized, and its accuracy has, as yet, only been limited by the flexibility of the wave function ansatz used. Here we propose PauliNet, a deep-learning wave function ansatz that achieves nearly exact solutions of the electronic Schr\"odinger equation.
PauliNet has a multireference Hartree-Fock solution built in as a baseline, incorporates the physics of valid wave functions, and is trained using variational quantum Monte Carlo (VMC). PauliNet outperforms comparable state-of-the-art VMC ansatzes for atoms, diatomic molecules and a strongly-correlated hydrogen chain by a margin, while remaining computationally efficient. We anticipate that, thanks to its favourable scaling with system size, this method may become a new leading method for highly accurate electronic-structure calculations on medium-sized molecular systems.

Rapid intensification (RI) of tropical cyclones often causes major destruction to human civilization due to the short response time. It is an important yet challenging task to accurately predict this kind of extreme weather event in advance. Traditionally, meteorologists tackle the task with human-driven feature extraction and predictor correction procedures. Nevertheless, these procedures do not leverage the power of modern machine learning models and abundant sensor data, such as satellite images. In addition, the human-driven nature of such an approach makes it difficult to reproduce and benchmark prediction models. In this study, we build a benchmark for RI prediction using only satellite images, which are underutilized in traditional techniques. The benchmark follows conventional data science practices, making it easier for data scientists to contribute to RI prediction. We demonstrate the usefulness of the benchmark by designing a domain-inspired spatiotemporal deep learning model. The results showcase the promising performance of deep learning in solving complex meteorological problems such as RI prediction.

A model is proposed to allocate Formula One World Championship prize money among the constructors. The methodology is based on pairwise comparison matrices, allows for the use of any weighting method, and makes it possible to tune the level of inequality.
We introduce an axiom called scale invariance, which requires the ranking of the teams to be independent of the parameter controlling inequality. The eigenvector method is revealed to violate this condition in our dataset, while the row geometric mean method always satisfies it. The revenue allocation is not influenced by the arbitrary valuation given to the race prizes in the official points scoring system of Formula One and takes the intensity of pairwise preferences into account, contrary to the standard Condorcet method. Our suggestion can be used to share revenues among groups when group members are ranked several times.

This paper presents a novel geometrical approach to investigate the convexity of a density-based cluster. Our approach is grid-based: we calibrate the value space of the cluster. However, the cluster objects come from an infinite distribution while their number is finite, so the resulting shape will not be sharp. Therefore, we set the precision of the grid such that reliable approximate boundaries of the cluster are found. After that, using the simple notion of convex sets and midpoint convexity, we investigate whether or not the density-based cluster is convex. Moreover, our experiments on synthetic datasets demonstrate the desirable performance of our method.

Detecting manipulated facial images and videos is an increasingly important topic in digital media forensics. As advanced face synthesis and manipulation methods are made available, new types of fake face representations are being created which have raised significant concerns for their use in social media. Hence, it is crucial to detect manipulated face images and localize manipulated regions.
Instead of simply using multi-task learning to simultaneously detect manipulated images and predict the manipulated mask (regions), we propose to utilize an attention mechanism to process and improve the feature maps for the classification task. The learned attention maps highlight the informative regions to further improve the binary classification (genuine face vs. fake face), and also visualize the manipulated regions. To enable our study of manipulated face detection and localization, we collect a large-scale database that contains numerous types of facial forgeries. With this dataset, we perform a thorough analysis of data-driven fake face detection. We show that the use of an attention mechanism improves facial forgery detection and manipulated region localization.

Robotic on-orbit servicing (OOS) is expected to be a key technology and concept for future sustainable space exploration. This paper develops a semi-analytical model for OOS systems analysis, responding to the growing needs and ongoing trend of robotic OOS. An OOS infrastructure system is considered whose goal is to provide responsive services to the random failures of a set of customer modular satellites distributed in space (e.g., at the geosynchronous equatorial orbit). The considered OOS architecture is comprised of a servicer that travels and provides module-replacement services to the customer satellites, an on-orbit depot to store the spares, and a series of launch vehicles to replenish the depot. The OOS system performance is analyzed by evaluating the mean waiting time before service completion for a given failure and its relationship with the depot capacity. Leveraging queueing theory and inventory management methods, the developed semi-analytical model is capable of analyzing the OOS system performance without relying on computationally costly simulations. The effectiveness of the proposed model is demonstrated through a case study compared with simulation results.
This paper is expected to provide a critical step in pushing the research frontier of analytical/semi-analytical model development for complex space systems design.

Most existing knowledge graphs suffer from incompleteness, which can be alleviated by inferring missing links based on known facts. One popular way to accomplish this is to generate low-dimensional embeddings of entities and relations, and use these to make inferences. ConvE, a recently proposed approach, applies convolutional filters on 2D reshapings of entity and relation embeddings in order to capture rich interactions between their components. However, the number of interactions that ConvE can capture is limited. In this paper, we analyze how increasing the number of these interactions affects link prediction performance, and utilize our observations to propose InteractE. InteractE is based on three key ideas -- feature permutation, a novel feature reshaping, and circular convolution. Through extensive experiments, we find that InteractE outperforms state-of-the-art convolutional link prediction baselines on FB15k-237. Further, InteractE achieves an MRR score that is 9%, 7.5%, and 23% better than ConvE on the FB15k-237, WN18RR and YAGO3-10 datasets, respectively. The results validate our central hypothesis -- that increasing feature interaction is beneficial to link prediction performance. We make the source code of InteractE available to encourage reproducible research.

The article considers functions of many ($n$) variables that are linear in each variable -- multilinear polynomials (MP). A three-step evaluation is presented that uses the minimal possible number of floating-point operations for non-sparse MPs at each step. The minimal number of additions is achieved in the algorithm for fast MP derivatives (FMPD) calculation. The cost of evaluating all first derivatives approaches only 1/8 of the cost of MP evaluation as the number of variables grows.
The FMPD algorithm structure exhibits similarity to the Fast Fourier Transform (FFT) algorithm.

Most existing pre-trained language representation models neglect to consider the linguistic knowledge of texts, which can promote language understanding in NLP tasks. To benefit the downstream tasks in sentiment analysis, we propose a novel language representation model called SentiLARE, which introduces word-level linguistic knowledge, including part-of-speech tags and sentiment polarity (inferred from SentiWordNet), into pre-trained models. We first propose a context-aware sentiment attention mechanism to acquire the sentiment polarity of each word with its part-of-speech tag by querying SentiWordNet. Then, we devise a new pre-training task called label-aware masked language model to construct knowledge-aware language representations. Experiments show that SentiLARE obtains new state-of-the-art performance on a variety of sentiment analysis tasks.

Although ADAM is a very popular algorithm for optimizing the weights of neural networks, it has recently been shown that it can diverge even in simple convex optimization examples. Several variants of ADAM have been proposed to circumvent this convergence issue. In this work, we study the ADAM algorithm for smooth nonconvex optimization under a boundedness assumption on the adaptive learning rate. The bound on the adaptive step size depends on the Lipschitz constant of the gradient of the objective function and provides safe theoretical adaptive step sizes. Under this boundedness assumption, we show a novel first-order convergence rate result in both deterministic and stochastic contexts. Furthermore, we establish convergence rates of the function value sequence using the Kurdyka-Lojasiewicz property.

Existing systems for metadata-hiding messaging that provide cryptographic privacy properties have either high communication costs, high computation costs, or both.
In this paper, we introduce Express, a metadata-hiding communication system that significantly reduces both communication and computation costs. Express is a two-server system that provides cryptographic security against an arbitrary number of malicious clients and one malicious server. In terms of communication, Express incurs only a constant-factor overhead per message sent regardless of the number of users, whereas previous cryptographically-secure systems Pung and Riposte had communication costs proportional to roughly the square root of the number of users. In terms of computation, Express only uses symmetric-key cryptographic primitives and makes both practical and asymptotic improvements on the protocols employed by prior work. These improvements enable Express to increase message throughput, reduce latency, and consume over 100x less bandwidth than Pung and Riposte, dropping the end-to-end cost of running a realistic whistleblowing application by 6x.

We introduce The Benchmark of Linguistic Minimal Pairs (shortened to BLiMP), a challenge set for evaluating what language models (LMs) know about major grammatical phenomena in English. BLiMP consists of 67 sub-datasets, each containing 1000 minimal pairs isolating specific contrasts in syntax, morphology, or semantics. The data is automatically generated according to expert-crafted grammars, and aggregate human agreement with the labels is 96.4%. We use it to evaluate n-gram, LSTM, and Transformer (GPT-2 and Transformer-XL) LMs. We find that state-of-the-art models identify morphological contrasts reliably, but they struggle with semantic restrictions on the distribution of quantifiers and negative polarity items, and with subtle syntactic phenomena such as extraction islands.

We present a method to analyse arrival directions of ultra-high-energy cosmic rays (UHECRs) using a classifier defined by a deep convolutional neural network trained on a HEALPix grid.
To illustrate the high effectiveness of the method, we employ it to estimate the prospects of detecting a large-scale anisotropy of UHECRs induced by a nearby source with an (orbital) detector having a uniform exposure of the celestial sphere, and compare the results with our earlier calculations based on the angular power spectrum. A minimal model for extragalactic cosmic rays and neutrinos by Kachelrie{\ss}, Kalashev, Ostapchenko and Semikoz (2017) is assumed for definiteness, and the nearby active galactic nuclei Centaurus A, M82, NGC 253, M87 and Fornax A are considered as possible sources of UHECRs. We demonstrate that the proposed method drastically improves the sensitivity of an experiment by decreasing the minimal required number of detected UHECRs or the minimal detectable fraction of from-source events several times compared to the approach based on the angular power spectrum. We also test the robustness of the neural networks against different models of the large-scale Galactic magnetic fields and variations of the mass composition of UHECRs, and consider situations when there are two nearby sources or the dominating source is not known a priori. In all cases, the neural networks demonstrate good performance unless the test models strongly deviate from those used for training. The method can be readily applied to the analysis of data from the Telescope Array, the Pierre Auger Observatory and other cosmic ray experiments.

We propose to use the {\L}ojasiewicz inequality as a general tool for analyzing the convergence rate of gradient descent on a Hilbert manifold, without resorting to the continuous gradient flow. Using this tool, we show that a Sobolev gradient descent method with adaptive inner product converges exponentially fast to the ground state for the Gross-Pitaevskii eigenproblem. This method can be extended to a class of general high-degree optimizations or nonlinear eigenproblems under certain conditions.
We demonstrate this generalization by several examples, in particular a nonlinear Schr\"odinger eigenproblem with an extra high-order interaction term. Numerical experiments are presented for these problems.

Given two pairs of quantum states, a fundamental question in the resource theory of asymmetric distinguishability is to determine whether there exists a quantum channel converting one pair to the other. In this work, we reframe this question in such a way that a catalyst can be used to help perform the transformation, with the only constraint on the catalyst being that its reduced state is returned unchanged, so that it can be used again to assist a future transformation. What we find here, for the special case in which the states in a given pair are commuting, and thus quasi-classical, is that this catalytic transformation can be performed if and only if the relative entropy of one pair of states is larger than that of the other pair. This result endows the relative entropy with a fundamental operational meaning that goes beyond its traditional interpretation in the setting of independent and identical resources. Our finding thus has an immediate application and interpretation in the resource theory of asymmetric distinguishability, and we expect it to find application in other domains.

We present a simple, fully-convolutional model for real-time (>30 fps) instance segmentation that achieves competitive results on MS COCO evaluated on a single Titan Xp, which is significantly faster than any previous state-of-the-art approach. Moreover, we obtain this result after training on only one GPU. We accomplish this by breaking instance segmentation into two parallel subtasks: (1) generating a set of prototype masks and (2) predicting per-instance mask coefficients. Then we produce instance masks by linearly combining the prototypes with the mask coefficients.
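The linear combination of prototypes with per-instance coefficients reduces to a single matrix product followed by a nonlinearity. A minimal NumPy sketch (the sigmoid choice follows the published YOLACT design; the function and shapes here are an illustration, not the authors' code):

```python
import numpy as np

def assemble_masks(prototypes, coefficients):
    """Combine k prototype masks (h, w, k) with per-instance
    coefficients (n, k) into n instance masks (n, h, w).

    Mirrors YOLACT's mask assembly: M = sigmoid(P C^T), where the
    linear combination runs over the prototype dimension k.
    """
    h, w, k = prototypes.shape
    n = coefficients.shape[0]
    # One matrix product per image: (h*w, k) @ (k, n) -> (h*w, n).
    logits = prototypes.reshape(h * w, k) @ coefficients.T
    masks = 1.0 / (1.0 + np.exp(-logits))  # elementwise sigmoid
    return masks.T.reshape(n, h, w)
```

Because the combination is a single matrix multiply, it adds negligible cost on top of the two parallel subtasks, which is what makes the real-time speeds quoted above plausible.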
We find that because this process doesn't depend on repooling, this approach produces very high-quality masks and exhibits temporal stability for free. Furthermore, we analyze the emergent behavior of our prototypes and show they learn to localize instances on their own in a translation-variant manner, despite being fully-convolutional. We also propose Fast NMS, a drop-in, 12 ms faster replacement for standard NMS that has only a marginal performance penalty. Finally, by incorporating deformable convolutions into the backbone network, optimizing the prediction head with better anchor scales and aspect ratios, and adding a novel fast mask re-scoring branch, our YOLACT++ model can achieve 34.1 mAP on MS COCO at 33.5 fps, which is fairly close to the state-of-the-art approaches while still running in real time.

In this work, we analyze the legal requirements on how cookie banners are supposed to be implemented to be fully compliant with the e-Privacy Directive and the General Data Protection Regulation. Our contribution resides in the definition of seventeen operational and fine-grained requirements on cookie banner design that are legally compliant; moreover, we define whether and when the verification of compliance with each requirement is technically feasible. The definition of requirements emerges from a joint interdisciplinary analysis by lawyers and computer scientists in the domain of web tracking technologies. As such, while some requirements are provided by explicitly codified legal sources, others result from the domain expertise of computer scientists. In our work, we match each requirement against the existing cookie banner designs of websites. For each requirement, we give examples of compliant and non-compliant cookie banners.
As an outcome of a technical assessment, we verify, per requirement, whether technical (with computer science tools) or manual (with any human operator) verification is needed to assess compliance of consent, and we also show which requirements are impossible to verify with certainty in the current architecture of the Web. For example, we explain how the requirement for revocable consent could be implemented in practice: when consent is revoked, the publisher should delete the consent cookie and communicate the withdrawal to all third parties who have previously received consent. With this approach we aim to support practically-minded parties (compliance officers, regulators, researchers, and computer scientists) in assessing compliance and detecting violations in cookie banner design and implementation, especially under the current revision of the European Union e-Privacy framework.

We propose and study numerically the implicit approximation in time of the Navier-Stokes equations by a Galerkin-collocation method in time combined with inf-sup stable finite element methods in space. The conceptual basis of the Galerkin-collocation approach is the establishment of a direct connection between the Galerkin method and the classical collocation methods, with the perspective of achieving the accuracy of the former with the reduced computational costs, in terms of less complex algebraic systems, of the latter. Regularity of higher order in time of the discrete solution is further ensured. As an additional ingredient, we employ Nitsche's method to impose all boundary conditions in weak form, with the perspective that evolving domains become feasible in the future. We carefully compare the performance properties of the Galerkin-collocation approach with a standard continuous Galerkin-Petrov method using piecewise linear polynomials in time, which is algebraically equivalent to the popular Crank-Nicolson scheme.
The condition number of the arising linear systems after Newton linearization, as well as the reliable approximation of the drag and lift coefficients for laminar flow around a cylinder (DFG flow benchmark with $Re=100$), are investigated. The superiority of the Galerkin-collocation approach over the linear-in-time continuous Galerkin-Petrov method is demonstrated therein.

Recent semi-supervised learning methods have been shown to achieve results comparable to their supervised counterparts while using only a small portion of labels in image classification tasks, thanks to their regularization strategies. In this paper, we take a more direct approach to semi-supervised learning and propose learning to impute the labels of unlabeled samples such that a network achieves better generalization when it is trained on these labels. We pose the problem in a learning-to-learn formulation which can easily be incorporated into state-of-the-art semi-supervised techniques and boosts their performance, especially when labels are limited. We demonstrate that our method is applicable to both classification and regression problems, including image classification and facial landmark detection tasks.

Graph link prediction is an important task in cyber-security: relationships between entities within a computer network, such as users interacting with computers, or system libraries and the corresponding processes that use them, can provide key insights into adversary behaviour. Poisson matrix factorisation (PMF) is a popular model for link prediction in large networks, particularly useful for its scalability. In this article, PMF is extended to include scenarios that are commonly encountered in cyber-security applications. Specifically, an extension is proposed to explicitly handle binary adjacency matrices and include known covariates associated with the graph nodes. A seasonal PMF model is also presented to handle seasonal networks.
To allow the methods to scale to large graphs, variational methods are discussed for performing fast inference. The results show improved performance over the standard PMF model and other statistical network models.

In this paper, we study an intelligent reflecting surface (IRS)-assisted system where a multi-antenna base station (BS) serves a single-antenna user with the help of a multi-element IRS, in the presence of interference generated by a multi-antenna BS serving its own single-antenna user. The signal and interference links via the IRS are modeled with Rician fading. To reduce the phase adjustment cost, we adopt a quasi-static phase shift design where the phase shifts do not change with the instantaneous channel state information (CSI). We investigate two cases of CSI at the BSs, namely, the instantaneous CSI case and the statistical CSI case, and apply Maximum Ratio Transmission (MRT) based on the complete CSI and the CSI of the Line-of-Sight (LoS) components, respectively. Different costs for channel estimation and beamforming adjustment are incurred in the two CSI cases. First, we obtain a tractable expression for the average rate in the instantaneous CSI case and a tractable expression for the ergodic rate in the statistical CSI case. We also provide sufficient conditions for the average rate in the instantaneous CSI case to surpass the ergodic rate in the statistical CSI case, at any phase shifts. Then, we maximize the average rate and the ergodic rate, both with respect to the phase shifts, leading to two non-convex optimization problems. For each problem, we obtain a globally optimal solution under certain system parameters, and propose an iterative algorithm based on parallel coordinate descent (PCD) to obtain a stationary point under arbitrary system parameters. Next, in each CSI case, we provide sufficient conditions under which the optimal quasi-static phase shift design is beneficial, compared to the system without an IRS.
Finally, we numerically verify the analytical results and demonstrate notable gains of the proposed solutions over existing ones. The goal of many scientific experiments, including A/B testing, is to estimate the average treatment effect (ATE), which is defined as the difference between the expected outcomes of two or more treatments. In this paper, we consider a situation where an experimenter can assign a treatment to research subjects sequentially. In adaptive experimental design, the experimenter is allowed to change the probability of assigning a treatment using past observations for estimating the ATE efficiently. However, with this approach, it is difficult to apply a standard statistical method to construct an estimator because the observations are not independent and identically distributed. We thus propose an algorithm for efficient experiments with estimators constructed from dependent samples. We also introduce a sequential testing framework using the proposed estimator. To justify our proposed approach, we provide finite and infinite sample analyses. Finally, we experimentally show that the proposed algorithm exhibits preferable performance. Compressive sensing (CS) is widely used to reduce the acquisition time of magnetic resonance imaging (MRI). Although state-of-the-art deep learning based methods have been able to obtain fast, high-quality reconstruction of CS-MR images, their main drawback is that they treat complex-valued MRI data as real-valued entities. Most methods either extract the magnitude from the complex-valued entities or concatenate them as two real-valued channels. In both cases, the phase content, which links the real and imaginary parts of the complex-valued entities, is discarded. In order to address the fundamental problem of real-valued deep networks, i.e., their inability to process complex-valued data, we propose a novel framework based on a complex-valued generative adversarial network (Co-VeGAN).
Our model can process complex-valued input, which enables it to perform high-quality reconstruction of the CS-MR images. Further, considering that phase is a crucial component of complex-valued entities, we propose a novel complex-valued activation function, which is sensitive to the phase of the input. Extensive evaluation of the proposed approach on different datasets using various sampling masks demonstrates that the proposed model significantly outperforms the existing CS-MRI reconstruction techniques in terms of peak signal-to-noise ratio as well as structural similarity index. Further, it uses significantly fewer trainable parameters to do so, as compared to the real-valued deep learning based methods. The translation equivariance of convolutional layers enables convolutional neural networks to generalize well on image problems. While translation equivariance provides a powerful inductive bias for images, we often additionally desire equivariance to other transformations, such as rotations, especially for non-image data. We propose a general method to construct a convolutional layer that is equivariant to transformations from any specified Lie group with a surjective exponential map. Incorporating equivariance to a new group requires implementing only the group exponential and logarithm maps, enabling rapid prototyping. Showcasing the simplicity and generality of our method, we apply the same model architecture to images, ball-and-stick molecular data, and Hamiltonian dynamical systems. For Hamiltonian systems, the equivariance of our models is especially impactful, leading to exact conservation of linear and angular momentum. The fast Fourier transform (FFT) is one of the most successful numerical algorithms of the 20th century and has found numerous applications in many branches of computational science and engineering. The FFT algorithm can be derived from a particular matrix decomposition of the discrete Fourier transform (DFT) matrix. 
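The radix-2 decomposition behind the FFT can be checked numerically: the DFT matrix $F_N$ factors into a butterfly matrix, a block containing two copies of $F_{N/2}$ (the Kronecker product $I_2 \otimes F_{N/2}$ that the paper builds on), and an even-odd permutation. A minimal sketch:

```python
import numpy as np

def dft_matrix(n):
    """Dense DFT matrix with entries exp(-2*pi*i*j*k/n)."""
    j, k = np.meshgrid(np.arange(n), np.arange(n))
    return np.exp(-2j * np.pi * j * k / n)

N = 8
F_N = dft_matrix(N)
F_half = dft_matrix(N // 2)

# even-odd permutation: even-indexed entries first, then odd-indexed
P = np.eye(N)[np.r_[0:N:2, 1:N:2]]
# twiddle factors on the diagonal
D = np.diag(np.exp(-2j * np.pi * np.arange(N // 2) / N))
Ih = np.eye(N // 2)

# butterfly * (I_2 kron F_{N/2}) * permutation reproduces the full DFT
B = np.block([[Ih, D], [Ih, -D]])
F2 = np.kron(np.eye(2), F_half)   # Kronecker product structure
recombined = B @ F2 @ P
```

Applying the factorization recursively to each $F_{N/2}$ yields the familiar $O(N \log N)$ algorithm.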
In this paper, we show that the quantum Fourier transform (QFT) can be derived by further decomposing the diagonal factors of the FFT matrix decomposition into products of matrices with Kronecker product structure. We analyze the implication of this Kronecker product structure on the discrete Fourier transform of rank-1 tensors on a classical computer. We also explain why such a structure can take advantage of an important quantum computer feature that enables the QFT algorithm to attain an exponential speedup on a quantum computer over the FFT algorithm on a classical computer. Further, the connection between the matrix decomposition of the DFT matrix and a quantum circuit is made. We also discuss a natural extension of a radix-2 QFT decomposition to a radix-d QFT decomposition. No prior knowledge of quantum computing is required to understand what is presented in this paper. Yet, we believe this paper may help readers to gain some rudimentary understanding of the nature of quantum computing from a matrix computation point of view. Efficient numerical solvers for sparse linear systems are crucial in science and engineering. One of the fastest methods for solving large-scale sparse linear systems is algebraic multigrid (AMG). The main challenge in the construction of AMG algorithms is the selection of the prolongation operator -- a problem-dependent sparse matrix which governs the multiscale hierarchy of the solver and is critical to its efficiency. Over many years, numerous methods have been developed for this task, and yet there is no known single right answer except in very special cases. Here we propose a framework for learning AMG prolongation operators for linear systems with sparse symmetric positive (semi-) definite matrices. We train a single graph neural network to learn a mapping from an entire class of such matrices to prolongation operators, using an efficient unsupervised loss function. 
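The role the prolongation operator plays in an AMG-style hierarchy can be illustrated with a classical (hand-crafted, not learned) two-grid cycle for the 1D Poisson matrix, using linear interpolation as $P$ and the Galerkin coarse operator $P^\top A P$; all sizes and smoothing parameters below are illustrative choices:

```python
import numpy as np

n = 31                                   # fine grid size (odd, so coarse grid nests)
A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)   # 1D Poisson stencil

# linear-interpolation prolongation: coarse node i -> fine nodes 2i, 2i+1, 2i+2
nc = (n - 1) // 2
P = np.zeros((n, nc))
for i in range(nc):
    P[2 * i, i] = 0.5
    P[2 * i + 1, i] = 1.0
    P[2 * i + 2, i] = 0.5
Ac = P.T @ A @ P                         # Galerkin coarse-grid operator

def two_grid(x, b, nu=2, w=2 / 3):
    for _ in range(nu):                  # pre-smoothing: weighted Jacobi (D = 2I)
        x = x + w * (b - A @ x) / 2.0
    r = b - A @ x
    x = x + P @ np.linalg.solve(Ac, P.T @ r)   # coarse-grid correction
    for _ in range(nu):                  # post-smoothing
        x = x + w * (b - A @ x) / 2.0
    return x

b = np.zeros(n)                          # exact solution is 0, so x itself is the error
x = np.random.default_rng(1).random(n)
err0 = np.linalg.norm(x)
for _ in range(5):
    x = two_grid(x, b)
err = np.linalg.norm(x)
```

A poorly chosen $P$ would stall this iteration, which is why learning the prolongation, as the abstract proposes, targets the solver's critical component.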
Experiments on a broad class of problems show improved convergence rates compared to classical AMG, demonstrating the potential utility of neural networks for developing sparse system solvers. Facial action unit (AU) detection and face alignment are two highly correlated tasks, since facial landmarks can provide precise AU locations to facilitate the extraction of meaningful local features for AU detection. However, most existing AU detection works handle the two tasks independently by treating face alignment as a preprocessing step, and often use landmarks to predefine a fixed region or attention for each AU. In this paper, we propose a novel end-to-end deep learning framework for joint AU detection and face alignment, which has not been explored before. In particular, multi-scale shared features are learned first, and high-level face alignment features are fed into AU detection. Moreover, to extract precise local features, we propose an adaptive attention learning module to refine the attention map of each AU adaptively. Finally, the assembled local features are integrated with the face alignment features and global features for AU detection. Extensive experiments demonstrate that our framework (i) significantly outperforms the state-of-the-art AU detection methods on the challenging BP4D, DISFA, GFT and BP4D+ benchmarks, (ii) can adaptively capture the irregular region of each AU, (iii) achieves competitive performance for face alignment, and (iv) also works well under partial occlusions and non-frontal poses. The code for our method is available at https://github.com/ZhiwenShao/PyTorch-JAANet. The algebraic reformulation of molecular Quantum Electrodynamics (mQED) at finite temperatures is applied to Nuclear Magnetic Resonance (NMR) in order to provide a foundation for the reconstruction of much more detailed molecular structures than is possible with current methods.
Conventional NMR theories are based on the effective spin model, which idealizes nuclei as fixed point particles in a lattice, while molecular vibrations, bond rotations and proton exchange cause a delocalization of nuclei. Hence, a lot of information on molecular structures remains hidden in experimental NMR data if the effective spin model is used for the investigation. In this document it is shown how the quantum mechanical probability density $|\Psi^\beta(X)|^2$ on $\mathbb{R}^{3n}$ for the continuous, spatial distribution of $n$ nuclei can be reconstructed from NMR data. To this end, it is shown how NMR spectra can be calculated directly from mQED at finite temperatures without involving the effective description. The fundamental problem of performing numerical calculations with the infinite-dimensional radiation field is solved by using a purified representation of a KMS state on a $W^*$-dynamical system. Furthermore, it is shown that the presented method corrects wrong predictions of the effective spin model. It is outlined that the presented method can be applied to any molecular system whose electronic ground state can be calculated using a common quantum chemical method. Therefore, the presented method may replace the effective spin model, which has formed the basis of NMR theory since 1950. We introduce a method for computing some pseudo-elliptic integrals in terms of elementary functions. The method is simple and fast in comparison to the algebraic case of the Risch-Trager-Bronstein algorithm. This method can quickly solve many pseudo-elliptic integrals on which other well-known computer algebra systems either fail, return an answer in terms of special functions, or require more than 20 seconds of computing time. Randomised tests showed our method solved 73.4% of the integrals that could be solved with the best implementation of the Risch-Trager-Bronstein algorithm.
Unlike the symbolic integration algorithms of Risch, Davenport, Trager, Bronstein and Miller, our method is not a decision process. The implementation of this method is less than 200 lines of Mathematica code and can be easily ported to other CAS that can solve systems of polynomial equations. Reading comprehension models often overfit to nuances of training datasets and fail at adversarial evaluation. Training with an adversarially augmented dataset improves robustness against such adversarial attacks but hurts generalization of the models. In this work, we present several effective adversaries and automated data augmentation policy search methods with the goal of making reading comprehension models more robust to adversarial evaluation, while also improving generalization to the source domain as well as new domains and languages. We first propose three new methods for generating QA adversaries that introduce multiple points of confusion within the context, show dependence on the insertion location of the distractor, and reveal the compounding effect of mixing adversarial strategies with syntactic and semantic paraphrasing methods. Next, we find that augmenting the training datasets with uniformly sampled adversaries improves robustness to the adversarial attacks but leads to a decline in performance on the original unaugmented dataset. We address this issue via RL and more efficient Bayesian policy search methods for automatically learning the best augmentation policy, i.e., the combination of transformation probabilities for each adversary, in a large search space. Using these learned policies, we show that adversarial training can lead to significant improvements in in-domain, out-of-domain, and cross-lingual (German, Russian, Turkish) generalization. We address the problem of analyzing the performance of 3D face alignment (3DFA) algorithms. Traditionally, performance analysis relies on carefully annotated datasets.
Here, these annotations correspond to the 3D coordinates of a set of pre-defined facial landmarks. However, this annotation process, be it manual or automatic, is rarely error-free, which strongly biases the analysis. In contrast, we propose a fully unsupervised methodology based on robust statistics and a parametric confidence test. We revisit the problem of robust estimation of the rigid transformation between two point sets and we describe two algorithms, one based on a mixture between a Gaussian and a uniform distribution, and another one based on the generalized Student's t-distribution. We show that these methods are robust to up to 50% outliers, which makes them suitable for mapping a face, from an unknown pose to a frontal pose, in the presence of facial expressions and occlusions. Using these methods in conjunction with large datasets of face images, we build a statistical frontal facial model and an associated parametric confidence metric, eventually used for performance analysis. We empirically show that the proposed pipeline is neither method-biased nor data-biased, and that it can be used to assess both the performance of 3DFA algorithms and the accuracy of annotations of face datasets. Previous works on Natural Language Generation (NLG) from structured data have primarily focused on surface-level descriptions of record sequences. However, for complex structured data, e.g., multi-row tables, it is often desirable for an NLG system to describe interesting facts from logical inferences across records. If only provided with the table, it is hard for existing models to produce controllable and high-fidelity logical generations. In this work, we formulate logical level NLG as generation from logical forms in order to obtain controllable, high-fidelity, and faithful generations. We present a new large-scale dataset, \textsc{Logic2Text}, with 10,753 descriptions involving common logic types paired with the underlying logical forms. 
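The robust rigid alignment step described above can be sketched as an EM-style alternation between a weighted Kabsch solve and inlier reweighting under a Gaussian-plus-uniform residual model (matching the abstract's first mixture variant); the variance annealing schedule and all constants below are illustrative assumptions, not the paper's settings:

```python
import numpy as np

def kabsch(X, Y, w):
    """Weighted least-squares rotation R and translation t with R @ x + t ~ y."""
    w = w / w.sum()
    mx, my = w @ X, w @ Y
    H = (X - mx).T @ ((Y - my) * w[:, None])
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))      # reflection fix
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    return R, my - R @ mx

def robust_align(X, Y, iters=40, u=0.05):
    """Alternate a weighted Kabsch fit with EM-style reweighting:
    inlier residuals ~ Gaussian, outliers ~ uniform (constant density u);
    the Gaussian variance is annealed from coarse to fine."""
    w = np.ones(len(X))
    sigma2 = 1.0
    for _ in range(iters):
        R, t = kabsch(X, Y, w)
        r2 = ((X @ R.T + t - Y) ** 2).sum(axis=1)
        g = np.exp(-0.5 * r2 / sigma2) / (2 * np.pi * sigma2) ** 1.5
        w = g / (g + u)                          # posterior inlier probability
        sigma2 = max(0.75 * sigma2, 1e-2)
    return R, t

rng = np.random.default_rng(2)
X = rng.random((60, 3))
theta = 0.4
R_true = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                   [np.sin(theta),  np.cos(theta), 0.0],
                   [0.0, 0.0, 1.0]])
t_true = np.array([0.2, -0.1, 0.3])
Y = X @ R_true.T + t_true
Y[:15] = rng.random((15, 3)) * 5.0               # 25% gross outliers
R_est, t_est = robust_align(X, Y)
```

The downweighting of large residuals is what lets the fit ignore occluded or mislabeled landmarks, in the spirit of the abstract's claimed tolerance to heavy outlier contamination.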
The logical forms exhibit diversified graph structures with free schema, which poses great challenges to the model's ability to understand the semantics. We experiment with (1) fully-supervised training on the full dataset, and (2) a few-shot setting with only hundreds of paired examples. We compare several popular generation models and analyze their performance. We hope our dataset can encourage research towards building an advanced NLG system capable of natural, faithful, and human-like generation. The dataset and code are available at https://github.com/czyssrs/Logic2Text. We propose transfer learning as a method for analyzing the encoding of grammatical structure in neural language models. We train LSTMs on non-linguistic data and evaluate their performance on natural language to assess which kinds of data induce generalizable structural features that LSTMs can use for natural language. We find that training on non-linguistic data with latent structure (MIDI music or Java code) improves test performance on natural language, despite no overlap in surface form or vocabulary. Training on artificial languages containing recursion (hierarchical structure) also improves performance on natural language, again with no vocabulary overlap. Surprisingly, training on artificial languages consisting of sets of separated pairs of words, but with no recursion, improves performance on natural language as well as recursive languages do. Experiments on transfer between natural languages show that zero-shot performance on a test language is highly correlated with typological syntactic similarity to the training language, suggesting that representations induced from natural languages correspond to the cross-linguistic syntactic properties studied in linguistic typology. Our results provide insights into the ways that neural models represent abstract syntactic structure, and into the kinds of structural inductive biases a learner needs to model language.
This paper studies the possibility of detecting and isolating topology failures (including link failures and node failures) of a networked system from subsystem measurements, in which subsystems are of fixed high-order linear dynamics, and the exact interaction weights among them are unknown. We prove that in such a class of networked systems with the same network topologies, the detectability and isolability of a given topology failure (set) are generic properties, indicating that it is the network topology that dominates the property of being detectable or isolable for a failure (set). We first give algebraic conditions for detectability and isolability of arbitrary parameter perturbations for a lumped plant, and then derive graph-theoretical necessary and sufficient conditions for generic detectability and isolability of topology failures for the networked systems. On the basis of these results, we consider the problems of deploying the smallest set of sensors for generic detectability and isolability. We reduce the associated sensor placement problems to hitting set problems, which can be effectively solved by greedy algorithms with guaranteed approximation performance. Aircraft performance models play a key role in airline operations, especially in planning a fuel-efficient flight. In practice, manufacturers provide guidelines which are slightly modified throughout the aircraft life cycle via the tuning of a single factor, enabling better fuel predictions. However, this has limitations; in particular, it does not reflect the evolution of each feature impacting the aircraft's performance. Our goal here is to overcome this limitation. The key contribution of the present article is to foster the use of machine learning to leverage the massive amounts of data continuously recorded during flights performed by an aircraft and provide models reflecting its actual and individual performance.
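The greedy hitting-set reduction mentioned above for sensor placement can be sketched directly; the sensors, failures, and coverage sets below are hypothetical:

```python
def greedy_hitting_set(failures, covers):
    """Greedy set cover: at each step place the sensor that detects the most
    still-undetected failures (logarithmic approximation guarantee)."""
    uncovered = set(failures)
    chosen = []
    while uncovered:
        best = max(covers, key=lambda s: len(covers[s] & uncovered))
        if not covers[best] & uncovered:
            raise ValueError("some failures cannot be detected by any sensor")
        chosen.append(best)
        uncovered -= covers[best]
    return chosen

# hypothetical example: which link failures each candidate sensor can detect
covers = {
    "s1": {"e1", "e2", "e3"},
    "s2": {"e3", "e4"},
    "s3": {"e4", "e5", "e6"},
    "s4": {"e1", "e6"},
}
failures = {"e1", "e2", "e3", "e4", "e5", "e6"}
sensors = greedy_hitting_set(failures, covers)
```

Here two sensors suffice, and the greedy choice matches the optimum; in general the greedy solution is within a $\ln$-factor of optimal, which is the guarantee the abstract alludes to.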
We illustrate our approach by focusing on the estimation of the drag and lift coefficients from recorded flight data. As these coefficients are not directly recorded, we resort to aerodynamics approximations. As a safety check, we provide bounds to assess the accuracy of both the aerodynamics approximation and the statistical performance of our approach. We provide numerical results on a collection of machine learning algorithms. We report excellent accuracy on real-life data and exhibit empirical evidence to support our modelling, in coherence with aerodynamics principles. Understanding the characteristics of public attention and sentiment is an essential prerequisite for appropriate crisis management during adverse health events. This is even more crucial during a pandemic such as COVID-19, as the primary responsibility for risk management is not centralized in a single institution but distributed across society. While numerous studies utilize Twitter data in a descriptive or predictive context during the COVID-19 pandemic, causal modeling of public attention has not been investigated. In this study, we propose a causal inference approach to discover and quantify causal relationships between pandemic characteristics (e.g. number of infections and deaths) and Twitter activity as well as public sentiment. Our results show that the proposed method can successfully capture the epidemiological domain knowledge and identify variables that affect public attention and sentiment. We believe our work contributes to the field of infodemiology by distinguishing events that correlate with public attention from events that cause public attention. A basic model for key agreement with a remote (or hidden) source is extended to a multi-user model with joint secrecy and privacy constraints over all entities that do not trust each other after key agreement.
Multiple entities that use different measurements of the same source through broadcast channels (BCs) to agree on mutually independent local secret keys are considered. Our model is the proper multi-user extension of the basic model, since the encoder and decoder pairs are not assumed to trust other pairs after key agreement, unlike what is assumed in the literature. Strong secrecy constraints imposed jointly on all secret keys, which are more stringent than the separate secrecy leakage constraints for each secret key considered in the literature, are satisfied. Inner bounds for the maximum key rate, and minimum privacy-leakage and database-storage rates, are proposed for any finite number of entities. Inner and outer bounds for degraded and less-noisy BCs are given to illustrate cases with strong privacy. A multi-enrollment model that is used for common physical unclonable functions is also considered to establish inner and outer bounds for key-leakage-storage regions that differ only in the Markov chains imposed. For this special case, the encoder and decoder measurement channels have the same channel transition matrix and secrecy leakage is measured for each secret key separately. We illustrate cases for which it is useful to have multiple enrollments as compared to a single enrollment and vice versa. Artificial neural networks (ANNs) are essential tools in machine learning that have drawn increasing attention in neuroscience. Besides offering powerful techniques for data analysis, ANNs provide a new approach for neuroscientists to build models for complex behaviors, heterogeneous neural activity and circuit connectivity, as well as to explore optimization in neural systems, in ways that traditional models are not designed for. In this pedagogical Primer, we introduce ANNs and demonstrate how they have been fruitfully deployed to study neuroscientific questions. We first discuss basic concepts and methods of ANNs.
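As a taste of the basic ANN concepts such a primer covers, here is a minimal two-layer network trained by hand-written backpropagation on XOR, the classic task that requires a hidden layer (a toy stand-in, not the Primer's PyTorch tutorials):

```python
import numpy as np

rng = np.random.default_rng(0)

# XOR: not linearly separable, so a hidden layer is essential
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0.0], [1.0], [1.0], [0.0]])

W1 = rng.normal(0, 1.0, (2, 8)); b1 = np.zeros(8)   # hidden layer, 8 units
W2 = rng.normal(0, 1.0, (8, 1)); b2 = np.zeros(1)   # output layer
lr = 0.5

for _ in range(10000):
    h = np.tanh(X @ W1 + b1)                 # forward pass: hidden activations
    p = 1 / (1 + np.exp(-(h @ W2 + b2)))     # sigmoid output probability
    g_out = (p - y) / len(X)                 # cross-entropy gradient wrt logits
    gW2 = h.T @ g_out; gb2 = g_out.sum(0)
    g_h = g_out @ W2.T * (1 - h ** 2)        # backprop through tanh
    gW1 = X.T @ g_h; gb1 = g_h.sum(0)
    W2 -= lr * gW2; b2 -= lr * gb2           # gradient descent step
    W1 -= lr * gW1; b1 -= lr * gb1

preds = (p > 0.5).astype(float)
```

The same forward/backward structure, scaled up and wrapped in automatic differentiation, underlies the models discussed throughout such a primer.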
Then, with a focus on bringing this mathematical framework closer to neurobiology, we detail how to customize the analysis, structure, and learning of ANNs to better address a wide range of challenges in brain research. To help readers garner hands-on experience, this Primer is accompanied by tutorial-style code in PyTorch and Jupyter Notebook, covering major topics. Dropout is a well-known regularization method that samples a sub-network from a larger deep neural network and trains different sub-networks on different subsets of the data. Inspired by the dropout concept, we propose EDropout as an energy-based framework for pruning neural networks in classification tasks. In this approach, a set of binary pruning state vectors (population) represents a set of corresponding sub-networks from an arbitrary provided original neural network. An energy loss function assigns a scalar energy loss value to each pruning state. The energy-based model stochastically evolves the population to find states with lower energy loss. The best pruning state is then selected and applied to the original network. Similar to dropout, the kept weights are updated using backpropagation in a probabilistic model. The energy-based model again searches for better pruning states and the cycle continues. This procedure thus alternates, in each iteration, between the energy model, which manages the pruning states, and the probabilistic model, which updates the temporarily unpruned weights. The population can dynamically converge to a pruning state. This can be interpreted as dropout leading to pruning the network. From an implementation perspective, EDropout can prune typical neural networks without modification of the network architecture. We evaluated the proposed method on different flavours of ResNets, AlexNet, and SqueezeNet on the Kuzushiji, Fashion, CIFAR-10, CIFAR-100, and Flowers datasets, and compared the pruning rate and classification performance of the models.
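The alternation at the heart of EDropout can be caricatured on a toy sparse linear model: a population of binary pruning states is evolved toward lower energy while the weights are trained in between. This sketch deliberately simplifies the method (full-gradient weight updates instead of training only the kept weights, and single-bit greedy flips as the stochastic evolution); all constants are illustrative:

```python
import numpy as np

rng = np.random.default_rng(3)

# toy "network": a linear layer whose target function uses only 3 of 12 inputs
n_feat = 12
w_true = np.zeros(n_feat)
w_true[:3] = [2.0, -1.5, 1.0]
X = rng.normal(size=(200, n_feat))
y = X @ w_true

w = rng.normal(scale=0.1, size=n_feat)          # trainable weights

def energy(state):
    """Energy of a pruning state: loss of the masked sub-network plus a
    small penalty on the number of kept weights."""
    err = y - X @ (w * state)
    return (err ** 2).mean() + 0.01 * state.sum()

pop = rng.integers(0, 2, size=(20, n_feat)).astype(float)   # pruning states
for step in range(200):
    # stochastically evolve the population toward lower energy:
    # flip one random bit per member, keep the flip if the energy decreases
    for s in pop:
        j = rng.integers(n_feat)
        s2 = s.copy()
        s2[j] = 1 - s2[j]
        if energy(s2) < energy(s):
            s[:] = s2
    # alternate with weight training (simplification: a full gradient step;
    # EDropout itself updates only the temporarily unpruned weights)
    w -= 0.1 * (-2 * X.T @ (y - X @ w) / len(X))

best = min(pop, key=energy)                      # selected pruning state
```

The sparsity penalty drives the population to drop weights whose removal barely changes the loss, which is the mechanism behind the pruning rates the abstract reports.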
On average, the networks trained with \textit{EDropout} achieved a pruning rate of more than $50\%$ of the trainable parameters, with less than $5\%$ and $1\%$ drops in Top-1 and Top-5 classification accuracy, respectively. In this paper we study the multiple-processor multitask scheduling problem in deterministic and stochastic models. We consider and analyze M-SRPT, a simple modification of the shortest remaining processing time algorithm, which always schedules jobs according to SRPT whenever possible, while processing tasks in an arbitrary order. The modified SRPT algorithm is shown to achieve a competitive ratio of $\Theta(\log \alpha + \beta)$ for minimizing flow time, where $\alpha$ denotes the ratio between the maximum and minimum job workload, and $\beta$ represents the ratio between the maximum non-preemptive task workload and the minimum job workload. The algorithm is shown to be optimal (up to a constant factor) when there is a constant number of machines. We further consider the problem under Poisson arrivals and general workload distribution, and show that M-SRPT is asymptotically optimal when the traffic intensity $\rho$ approaches $1$, if the job size distribution has bounded support. In addition, we also present a characterization of the heavy traffic behavior of work-conserving algorithms in single-task job scheduling, and prove that the average flow time in a $GI/GI/1$ system scales as $\frac{1}{1-\rho}$, if the job size distribution has bounded support, which generalizes the result in [Lin, Wierman and Zwart, 2011]. This paper presents a batch-wise density-based clustering approach for local outlier detection in massive-scale datasets. Unlike well-known traditional algorithms, which assume that all the data is memory-resident, our proposed method is scalable and processes the data chunk-by-chunk within the confines of a limited memory buffer.
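The SRPT rule underlying M-SRPT above can be sketched for a single preemptive machine (the multi-processor and non-preemptive-task aspects of M-SRPT are omitted):

```python
import heapq

def srpt_flow_time(jobs):
    """Single-machine preemptive SRPT. jobs = [(arrival, size), ...].
    Returns total flow time (completion minus arrival, summed over jobs)."""
    jobs = sorted(jobs)
    heap, t, i, flow = [], 0.0, 0, 0.0
    while heap or i < len(jobs):
        if not heap:                      # machine idle: jump to next arrival
            t = max(t, jobs[i][0])
        while i < len(jobs) and jobs[i][0] <= t:
            heapq.heappush(heap, (jobs[i][1], jobs[i][0]))  # (remaining, arrival)
            i += 1
        rem, arr = heapq.heappop(heap)    # shortest remaining processing time
        # run it until it finishes or the next arrival may preempt it
        horizon = jobs[i][0] if i < len(jobs) else float("inf")
        run = min(rem, horizon - t)
        t += run
        if run < rem:
            heapq.heappush(heap, (rem - run, arr))
        else:
            flow += t - arr
    return flow

# job of size 3 at t=0 is preempted by a size-1 job at t=1:
# short job finishes at t=2 (flow 1), long job at t=4 (flow 4)
total = srpt_flow_time([(0.0, 3.0), (1.0, 1.0)])
```

SRPT is optimal for total flow time on a single machine; M-SRPT's contribution is handling multiple processors and non-preemptive tasks while staying close to this baseline.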
At first, a temporary clustering model is built, and then it is incrementally updated by analyzing consecutive memory loads of points. Ultimately, the proposed algorithm gives each object an outlierness score, named SDCOR (Scalable Density-based Clustering Outlierness Ratio). Evaluations on real-life and synthetic datasets demonstrate that the proposed method has low, linear time complexity and is more effective and efficient compared to the best-known conventional density-based methods, which need to load all data into memory, and also to some fast distance-based methods, which can operate on data resident on disk. The key challenge in single image 3D shape reconstruction is to ensure that deep models can generalize to shapes which were not part of the training set. This is difficult because the algorithm must infer the occluded portion of the surface by leveraging the shape characteristics of the training data, and can therefore be vulnerable to overfitting. Such generalization to unseen categories of objects is a function of architecture design and training approaches. This paper introduces SDFNet, a novel shape prediction architecture and training approach which supports effective generalization. We provide an extensive investigation of the factors which influence generalization accuracy and its measurement, ranging from the consistent use of 3D shape metrics to the choice of rendering approach and the large-scale evaluation on unseen shapes using ShapeNetCore.v2 and ABC. We show that SDFNet provides state-of-the-art performance on seen and unseen shapes relative to existing baseline methods GenRe and OccNet. We provide the first large-scale experimental evaluation of generalization performance. The codebase released with this article will allow for the consistent evaluation and comparison of methods for single image shape reconstruction.
A geometric nonlinear observer algorithm for Simultaneous Localization and Mapping (SLAM) developed on the Lie group $\mathbb{SLAM}_{n}\left(3\right)$ is proposed. The presented novel solution estimates the vehicle's pose (i.e. attitude and position) with respect to landmarks while simultaneously positioning the reference features in the global frame. The proposed estimator on the manifold is characterized by predefined measures of transient and steady-state performance. Dynamically reducing boundaries guide the system's error function to decay asymptotically to the origin from its starting position within a large given set. The proposed observer has the ability to use the available velocity and feature measurements directly. Also, it compensates for an unknown constant bias attached to the velocity measurements. A unit-quaternion form of the proposed observer is presented. Numerical results reveal the effectiveness of the proposed observer. Keywords: Nonlinear filter algorithm, Nonlinear observer for Simultaneous Localization and Mapping, Nonlinear estimator, nonlinear SLAM observer on manifold, nonlinear SLAM filter on matrix Lie Group, observer design, asymptotic stability, systematic convergence, Prescribed performance function, pose estimation, attitude filter, position filter, feature filter, landmark filter, gradient based SLAM observer, gradient based observer for SLAM, adaptive estimate, SLAM observer, observer SLAM framework, equivariant observer, inertial vision unit, visual, SLAM filter, SE(3), SO(3). Machine learning becomes increasingly important to tune or even synthesize the behavior of safety-critical components in highly non-trivial environments, where the inability to understand learned components in general, and neural nets in particular, poses serious obstacles to their adoption.
Explainability and interpretability methods for learned systems have gained considerable academic attention, but the focus of current approaches on only one aspect of explanation, at a fixed level of abstraction, and with limited, if any, formal guarantees, prevents those explanations from being digestible by the relevant stakeholders (e.g., end users, certification authorities, engineers) with their diverse backgrounds and situation-specific needs. We introduce Fanoos, a flexible framework for combining formal verification techniques, heuristic search, and user interaction to explore explanations at the desired level of granularity and fidelity. We demonstrate the ability of Fanoos to produce and adjust the abstractness of explanations in response to user requests on a learned controller for an inverted double pendulum and on a learned CPU usage model. When people try to understand nuanced language, they typically process multiple input sensor modalities to complete this cognitive task. The human brain even has a specialized neuron formation, called the sagittal stratum, to help us understand sarcasm. We use this biological formation as the inspiration for designing a neural network architecture that combines predictions of different models on the same text to construct a robust, accurate and computationally efficient classifier for sentiment analysis. Experimental results on representative benchmark datasets and comparisons to other methods show the advantages of the new network architecture. Protocols in a quantum network involve multiple parties performing actions on their quantum systems in a carefully orchestrated manner over time in order to accomplish a given task. This sequence of actions over time is often referred to as a strategy, or policy. In this work, we consider policy optimization in a quantum network.
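As a concrete toy example of such a policy, the widely used memory cutoff rule (keep a generated link for at most a fixed number of time steps, then discard and retry) can be simulated with a Monte Carlo sketch; the success probability, cutoffs, and the two-link coexistence objective below are illustrative assumptions, not the paper's setup:

```python
import random

def time_to_two_links(p, cutoff, trials=10000, seed=4):
    """Monte Carlo estimate of the number of time steps until two adjacent
    elementary links are simultaneously alive, when each generation attempt
    succeeds with probability p and a stored link is discarded once its age
    reaches `cutoff` (the memory cutoff policy)."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        age = [None, None]               # age of each link, None = absent
        t = 0
        while age[0] is None or age[1] is None:
            t += 1
            for k in (0, 1):
                if age[k] is None:
                    if rng.random() < p:
                        age[k] = 0       # link generated this step
                else:
                    age[k] += 1
                    if age[k] >= cutoff:
                        age[k] = None    # memory expired: discard and retry
        total += t
    return total / trials

avg_short = time_to_two_links(0.3, cutoff=2)
avg_long = time_to_two_links(0.3, cutoff=10)
```

A longer cutoff shortens the waiting time but stores older (noisier) links, and balancing this trade-off is precisely what policy optimization over the cutoff addresses.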
Specifically, as a first step towards developing full-fledged quantum network protocols, we consider policies for generating elementary links in a quantum network. We start by casting elementary link generation as a quantum partially observable Markov decision process, as defined in [Phys. Rev. A 90, 032311 (2014)]. Then, we analyze in detail the commonly used memory cutoff policy. Under this policy, once an elementary link is established it is kept in quantum memory for some amount $t^{\star}$ of time, called the cutoff, before it is discarded and elementary link generation is reattempted. For this policy, we determine the average quantum state of the elementary link as a function of time for an arbitrary number of nodes in the link, as well as the average fidelity of the link as a function of time for any noise model for the quantum memories. Finally, we show how optimal policies can be obtained in the finite-horizon setting using dynamic programming. By casting elementary link generation as a quantum decision process, this work goes beyond the analytical results derived here by providing the theoretical framework for performing reinforcement learning of practical quantum network protocols. Statisticians have warned us since the early days of their discipline that experimental correlation between two observations by no means implies the existence of a causal relation. The question of what clues in observational data could inform us about the existence of such causal relations is nevertheless more than legitimate. It actually lies at the root of any scientific endeavor. For decades, however, the only method accepted among statisticians for elucidating causal relationships was the so-called Randomized Controlled Trial. Besides this notorious exception, causality questions remained largely taboo for many. One reason for this state of affairs was the lack of an appropriate mathematical framework to formulate such questions in an unambiguous way.
Fortunately, things have changed in recent years with the advent of the so-called Causality Revolution initiated by Judea Pearl and coworkers. The aim of this pedagogical paper is to present their ideas and methods in a compact and self-contained fashion, with concrete business examples as illustrations. We evaluate whether BERT, a widely used neural network for sentence processing, acquires an inductive bias towards forming structural generalizations through pretraining on raw data. We conduct four experiments testing its preference for structural vs. linear generalizations in different structure-dependent phenomena. We find that BERT makes a structural generalization in 3 out of 4 empirical domains---subject-auxiliary inversion, reflexive binding, and verb tense detection in embedded clauses---but makes a linear generalization when tested on NPI licensing. We argue that these results are the strongest evidence so far from artificial learners supporting the proposition that a structural bias can be acquired from raw data. If this conclusion is correct, it is tentative evidence that some linguistic universals can be acquired by learners without innate biases. However, the precise implications for human language acquisition are unclear, as humans learn language from significantly less data than BERT. Multi-task learning (MTL) aims to learn a single model that performs multiple tasks, achieving good performance on all tasks at a lower computational cost. Learning such a model requires jointly optimizing the losses of a set of tasks with different difficulty levels, magnitudes, and characteristics (e.g. cross-entropy, Euclidean loss), leading to the imbalance problem in multi-task learning. To address the imbalance problem, we propose a knowledge distillation based method in this work. We first learn a task-specific model for each task. We then learn the multi-task model by minimizing the task-specific losses and by producing the same features as the task-specific models.
As the task-specific networks encode different features, we introduce small task-specific adaptors to project multi-task features to the task-specific features. In this way, the adaptors align the task-specific feature and the multi-task feature, which enables a balanced parameter sharing across tasks. Extensive experimental results demonstrate that our method can optimize a multi-task learning model in a more balanced way and achieve better overall performance. We investigate a practical variant of the well-known polygonal visibility path (watchman) problem. For a polygon $P$, a minimum link visibility path is a polygonal visibility path in $P$ that has the minimum number of links. The problem of finding a minimum link visibility path is NP-hard for simple polygons. If the link-length (number of links) of a minimum link visibility path (tour) is $Opt$ for a simple polygon $P$ with $n$ vertices, we provide an algorithm with $O(kn^2)$ runtime that produces polygonal visibility paths (or tours) of link-length at most $(\gamma+a_l/(k-1))Opt$ (or $(\gamma+a_l/k)Opt$), where $k$ is a parameter dependent on $P$, $a_l$ is an output-sensitive parameter and $\gamma$ is the approximation factor of an $O(k^3)$ time approximation algorithm for the geometric traveling salesman problem (path or tour version). Federated learning can conduct machine learning while protecting the privacy of self-owned training data on the corresponding ends, instead of having to upload it to a central trusted data aggregation server. In mobile scenarios, a centralized trusted server may not exist, and even if it does, the delay will not be manageable, e.g., for smart driving cars. Thus, privacy-aware mobile federated learning at the edge has attracted more and more attention. This poses a problem: after data are trained on a mobile terminal to obtain a learned model, how can the model parameters be shared among others to create a more accurate and robust accumulated final model?
This kind of model sharing confronts several challenges, e.g., the sharing must be conducted without a trusted third party (autonomously), and the sharing must be fair, as model training (by training data) is valuable. To tackle the above challenges, we propose a smart contract and IPFS (InterPlanetary File System) based model sharing protocol and algorithms. The proposed protocol does not rely on a trusted third party, and individual learned models are shared/stored in the corresponding ends. Through extensive experiments, the three main steps of the proposed protocol are evaluated. The average execution times of the three steps are 0.059s, 0.060s and 0.032s, demonstrating its efficiency. Inspired by the recent COVID-19 pandemic, we study a generalization of the multi-resource allocation problem with heterogeneous demands and Leontief utilities. Unlike existing settings, we allow each agent to specify a constraint to only accept allocations from a subset of the total supply for each resource. Such constraints often arise from location constraints (e.g. among all of the volunteer nurses, only those who live nearby can work at hospital A due to commute constraints, so hospital A can only receive allocations of volunteers from a subset of the total supply). This can also model a type of substitute effect where some agents need 1 unit of resource A \emph{or} B, but some other agents specifically want A, and some specifically want B. We propose a new mechanism called Group Dominant Resource Fairness which determines the allocations by solving a small number of linear programs. The proposed method satisfies Pareto optimality, envy-freeness, strategy-proofness, and a notion of sharing incentive for our setting. To the best of our knowledge, this is the first mechanism to achieve all four properties in our setting. Furthermore, we show numerically that our method scales better to large problems than alternative approaches.
Finally, although motivated by the problem of medical resource allocation in a pandemic, our mechanism can be applied more broadly to resource allocation under Leontief utilities with accessibility constraints. Deep neural networks yield state-of-the-art results in many computer vision and human-machine interface applications such as object detection, speech recognition, etc. Since these networks are computationally expensive, customized accelerators are designed to achieve the required performance at lower cost and power. One of the key building blocks of these neural networks is the non-linear activation function, such as sigmoid, hyperbolic tangent (tanh), and ReLU. A low-complexity, accurate hardware implementation of the activation function is required to meet the performance and area targets of neural network accelerators. Even though various methods and implementations of the tanh activation function have been published, a comparative study is missing. This paper presents a comparative analysis of polynomial and rational methods and their hardware implementation. The inverse quantum scattering problem for the perturbed Bessel equation is considered. A direct and practical method for solving the problem is proposed. It allows one to reduce the inverse problem to a system of linear algebraic equations, and the potential is recovered from the first component of the solution vector of the system. The approach is based on a special form of Fourier-Jacobi series representation for the transmutation operator kernel and the Gelfand-Levitan equation, which serves for obtaining the system of linear algebraic equations. The convergence and stability of the method are proved, as well as the existence and uniqueness of the solution of the truncated system. Numerical realization of the method is discussed. Results of numerical tests are provided, revealing a remarkable accuracy and stability of the method.
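The polynomial-versus-rational tradeoff for tanh discussed earlier can be illustrated numerically. The sketch below (our illustration, not the paper's hardware designs) compares a degree-3 Taylor polynomial against the [3/2] Padé rational approximant of tanh; the rational form is markedly more accurate at the same low degree, which is why rational methods are attractive despite requiring a divider in hardware.

```python
import math

def tanh_taylor3(x: float) -> float:
    # Degree-3 Taylor polynomial of tanh about 0: x - x^3/3
    return x - x ** 3 / 3.0

def tanh_pade32(x: float) -> float:
    # [3/2] Pade approximant of tanh: x*(15 + x^2) / (15 + 6*x^2)
    return x * (15.0 + x * x) / (15.0 + 6.0 * x * x)

def max_abs_error(f, lo=-1.0, hi=1.0, steps=200):
    # Maximum deviation from math.tanh over a uniform grid on [lo, hi]
    pts = [lo + i * (hi - lo) / steps for i in range(steps + 1)]
    return max(abs(f(x) - math.tanh(x)) for x in pts)

if __name__ == "__main__":
    print("Taylor-3   max error on [-1, 1]:", max_abs_error(tanh_taylor3))
    print("Pade-[3/2] max error on [-1, 1]:", max_abs_error(tanh_pade32))
```

On [-1, 1] the Padé form is roughly two orders of magnitude more accurate than the Taylor polynomial of the same numerator degree, at the cost of one extra multiply and a division.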
Advances in Reinforcement Learning (RL) have successfully tackled sample efficiency and overestimation bias. However, these methods often fall short of scalable performance. On the other hand, genetic methods provide scalability but exhibit hyperparameter sensitivity to evolutionary operations. We present the Evolution-based Soft Actor-Critic (ESAC), a scalable RL algorithm. Our contributions are threefold: ESAC (1) abstracts exploration from exploitation by combining Evolution Strategies (ES) with Soft Actor-Critic (SAC), (2) provides dominant skill transfer between offspring by making use of soft winner selections and genetic crossovers in hindsight, and (3) improves hyperparameter sensitivity in evolutions using Automatic Mutation Tuning (AMT). AMT gradually replaces the entropy framework of SAC, allowing the population to succeed at the task while acting as randomly as possible, without making use of backpropagation updates. On a range of challenging control tasks consisting of high-dimensional action spaces and sparse rewards, ESAC demonstrates state-of-the-art performance and sample efficiency equivalent to SAC. ESAC demonstrates scalability comparable to ES on the basis of hardware resources and algorithm overhead. A complete implementation of ESAC with notes on reproducibility and videos can be found at the project website https://karush17.github.io/esac-web/. Grounded language acquisition -- learning how language-based interactions refer to the world around them -- is a major area of research in robotics, NLP, and HCI. In practice, the data used for learning consists almost entirely of textual descriptions, which tend to be cleaner, clearer, and more grammatical than actual human interactions. In this work, we present the Grounded Language Dataset (GoLD), a multimodal dataset of common household objects described by people using either spoken or written language.
We analyze the differences and present an experiment showing how the different modalities affect language learning from human input. This will enable researchers studying the intersection of robotics, NLP, and HCI to better investigate how the multiple modalities of image, text, and speech interact, as well as how differences in the vernacular of these modalities impact results. We envision a self-sovereign, grassroots, digital community that grows in a bottom-up, decentralized manner, and aim to integrate for it the following previously-proposed building blocks: a mechanism that accepts members into the community while keeping a bounded number of sybils; digital social contracts that define the possible interactions of a community bounded by such a contract; a design for a fault-tolerant distributed ledger implementation of digital social contracts; and a digital social contract for the egalitarian and just minting of digital currency, which also offers a form of universal basic income. We augment these building blocks with a mechanism that allows the community to maintain sovereignty over the economy, by making it sybil-resilient. To do so, we assume that the community has the means for exposing sybils, and we extend the basic egalitarian currency digital social contract with means to balance the economy so that money minted by sybils is eventually retrieved and burned. This leads---asymptotically---to distributive justice among the genuine agents, with the amount of money minted being equal to the number of genuine agents, multiplied by the time each agent was a member of the community. We then argue that this approach constitutes a mechanism that deters the creation of sybils and incentivizes sybil hunting. The Out-of-Distribution (OOD) generalization problem is the problem of seeking the predictor function whose performance in the worst environments is optimal. This paper makes two contributions to the OOD problem.
We first use basic results of probability to prove the Maximal Invariant Predictor (MIP) condition, a theoretical result that can be used to identify the OOD optimal solution. We then use our MIP to derive the inner-environmental Gradient Alignment (IGA) algorithm that can be used to help seek the OOD optimal predictor. Previous studies that have investigated the theoretical aspect of the OOD problem use strong structural assumptions such as a causal DAG. However, in cases involving image datasets, for example, the identification of hidden structural relations is itself a difficult problem. Our theoretical results are different from those of many previous studies in that they can be applied to cases in which the underlying structure of a dataset is difficult to analyze. We present an extensive comparison of previous theoretical approaches to the OOD problem based on the assumptions they make. We also present an extension of Colored-MNIST that can more accurately represent the pathological OOD situation than the original version, and demonstrate the superiority of IGA over previous methods on both the original and the extended version of Colored-MNIST. In recent years, emerging hardware storage technologies have focused on divergent goals: better performance or lower cost-per-bit of storage. Correspondingly, data systems that employ these new technologies are optimized either to be fast (but expensive) or cheap (but slow). We take a different approach: by combining multiple tiers of fast and low-cost storage technologies within the same system, we can achieve a Pareto-efficient balance between performance and cost-per-bit. This paper presents the design and implementation of PrismDB, a novel log-structured merge tree based key-value store that exploits a full spectrum of heterogeneous storage technologies (from 3D XPoint to QLC NAND).
We introduce the notion of "read-awareness" to log-structured merge trees, which allows hot objects to be pinned to faster storage, achieving better tiering and hot-cold separation of objects. Compared to the standard use of RocksDB on flash in datacenters today, PrismDB's average throughput on heterogeneous storage is 2.3$\times$ faster and its tail latency is more than an order of magnitude better, using hardware that is half the cost. Visual object tracking from an unmanned aerial vehicle (UAV) poses several challenges such as object occlusion or background clutter. In order to improve the robustness of on-board UAV visual object tracking, we propose a pipeline combining a visual object tracker and a sparse 3D reconstruction of the static environment. The 3D reconstruction is computed with an image-based structure-from-motion (SfM) component and enables us to leverage a state estimator in the corresponding 3D scene space. This improves the handling of occlusions and artifacts caused by background clutter. We performed an evaluation on prototypical image sequences, captured from a UAV with oblique views at low altitude, by adapting an existing dataset for visual object tracking and reconstructing the observed scene in 3D. The experimental results demonstrate that the proposed approach outperforms methods using plain visual cues as well as approaches leveraging image space-based state estimations. In a \emph{data poisoning attack}, an attacker modifies, deletes, and/or inserts some training examples to corrupt the learnt machine learning model. \emph{Bootstrap Aggregating (bagging)} is a well-known ensemble learning method, which trains multiple base models on random subsamples of a training dataset using a base learning algorithm and uses majority vote to predict labels of testing examples. We prove the intrinsic certified robustness of bagging against data poisoning attacks.
Specifically, we show that bagging with an arbitrary base learning algorithm provably predicts the same label for a testing example when the number of modified, deleted, and/or inserted training examples is bounded by a threshold. Moreover, we show that our derived threshold is tight if no assumptions on the base learning algorithm are made. We evaluate our method on MNIST and CIFAR10. For instance, our method achieves a certified accuracy of $91.1\%$ on MNIST when arbitrarily modifying, deleting, and/or inserting 100 training examples. Tumor segmentation in multimodal medical images has seen a growing trend towards deep learning based methods. Typically, studies dealing with this topic fuse multimodal image data to improve the tumor segmentation contour for a single imaging modality. However, they do not take into account that tumor characteristics are emphasized differently by each modality, which affects the tumor delineation. Thus, the tumor segmentation is modality- and task-dependent. This is especially the case for soft tissue sarcomas, where, due to necrotic tumor tissue, the segmentation differs vastly. Closing this gap, we develop a modality-specific sarcoma segmentation model that utilizes multimodal image data to improve the tumor delineation on each individual modality. We propose a simultaneous co-segmentation method, which enables multimodal feature learning through modality-specific encoder and decoder branches, and the use of resource-efficient densely connected convolutional layers. We further conduct experiments to analyze how different input modalities and encoder-decoder fusion strategies affect the segmentation result. We demonstrate the effectiveness of our approach on public soft tissue sarcoma data, which comprises MRI (T1 and T2 sequence) and PET/CT scans. The results show that our multimodal co-segmentation model provides better modality-specific tumor segmentation than models using only the PET or MRI (T1 and T2) scan as input.
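Segmentation quality in studies like the sarcoma work above is commonly reported with overlap metrics such as the Dice coefficient, $2|A \cap B| / (|A| + |B|)$. A minimal sketch on flat binary masks (our illustration, not the paper's evaluation code):

```python
def dice_score(a, b):
    # Dice coefficient between two binary masks given as flat 0/1 lists:
    # 2 * |A intersect B| / (|A| + |B|); two empty masks count as a perfect match.
    inter = sum(1 for x, y in zip(a, b) if x == 1 and y == 1)
    size = sum(a) + sum(b)
    return 2.0 * inter / size if size else 1.0
```

For example, masks that agree on half their positive pixels score 0.5, and identical masks score 1.0.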
We improve the upper bound on trace reconstruction to $\exp(\widetilde{O}(n^{1/5}))$. When we investigate a type system, it is helpful if we can establish the well-foundedness of types or terms with respect to a certain hierarchy, and the Extended Calculus of Constructions (called ECC, defined and studied comprehensively in [Luo, 1994]) is no exception. However, under a very natural hierarchy relation, the well-foundedness of terms does not hold generally. In this article, well-foundedness is established for two natural families of terms (namely, types in a valid context and terms having normal forms). Also, we give an independent proof of the existence of principal types in ECC, since it is used in the proof of well-foundedness of types in a valid context; principal-type existence is often proved by utilizing the well-foundedness of terms, which would make our argument circular if adopted. Safe deployment of deep learning systems in critical real world applications requires models to make few mistakes, and only under predictable circumstances. Development of such a model is not yet possible, in general. In this work, we address this problem with an abstaining classifier tuned to have $>$95% accuracy, and identify the determinants of abstention with LIME (the Local Interpretable Model-agnostic Explanations method). Essentially, we are training our model to learn the attributes of pathology reports that are likely to lead to incorrect classifications, albeit at the cost of reduced sensitivity. We demonstrate our method in a multitask setting to classify cancer pathology reports from the NCI SEER cancer registries on six tasks of greatest importance. For these tasks, we reduce the classification error rate by factors of 2-5 by abstaining on 25-45% of the reports.
For the specific case of cancer site, we are able to identify metastasis and reports involving lymph nodes as responsible for many of the classification mistakes, and find that the extent and types of mistakes vary systematically with cancer site (e.g., breast, lung, and prostate). When combining across three of the tasks, our model classifies 50% of the reports with an accuracy greater than 95% for three of the six tasks and greater than 85% for all six tasks on the retained samples. By using this information, we expect to define workflows that incorporate machine learning only in the areas where it is sufficiently robust and accurate, reserving human attention for areas where it is required. How can we identify the training examples that contribute most to the prediction of a tree ensemble? In this paper, we introduce TREX, an explanation system that provides instance-attribution explanations for tree ensembles, such as random forests and gradient boosted trees. TREX builds on the representer point framework previously developed for explaining deep neural networks. Since tree ensembles are non-differentiable, we define a kernel that captures the structure of the specific tree ensemble. By using this kernel in kernel logistic regression or a support vector machine, TREX builds a surrogate model that approximates the original tree ensemble. The weights in the kernel expansion of the surrogate model are used to define the global or local importance of each training example. Our experiments show that TREX's surrogate model accurately approximates the tree ensemble; its global importance weights are more effective in dataset debugging than the previous state-of-the-art; its explanations identify the most influential samples better than alternative methods under the remove and retrain evaluation framework; it runs orders of magnitude faster than alternative methods; and its local explanations can identify and explain errors due to domain mismatch.
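The tree-ensemble kernel at the heart of TREX can be sketched directly: two examples are similar in proportion to how often they land in the same leaf across the trees. The toy below is our illustration only; it abstracts each "tree" as a list of split thresholds on a scalar feature rather than a trained forest, and replaces the kernel logistic regression step with a crude nearest-example lookup.

```python
def leaf_id(thresholds, x):
    # A "tree" is abstracted as a sorted list of split thresholds on a scalar
    # feature; the leaf is the index of the interval that x falls into.
    return sum(1 for t in thresholds if x > t)

def tree_kernel(ensemble, x, z):
    # TREX-style similarity: fraction of trees in which x and z share a leaf.
    same = sum(1 for tree in ensemble if leaf_id(tree, x) == leaf_id(tree, z))
    return same / len(ensemble)

def most_similar(ensemble, train, x):
    # Crude instance attribution: the training example most similar to x under
    # the tree kernel (TREX instead learns per-example weights by fitting
    # kernel logistic regression or an SVM with this kernel).
    return max(train, key=lambda z: tree_kernel(ensemble, x, z))
```

With the ensemble `[[0.5], [0.3, 0.7], [0.6]]`, the points 0.1 and 0.2 share a leaf in every tree (kernel 1.0), while 0.1 and 0.9 share none (kernel 0.0).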
This technical communiqué aims at correcting an erroneous statement (Lemma 2.4) in an earlier paper by the same authors concerning a sufficient condition of uniform observability for a Linear Time-Varying (LTV) system. In this earlier paper, the proofs of two other lemmas, about body-pose estimation from range measurements, relied on this erroneous statement. For the sake of conciseness, only a new proof of one of these lemmas is presented, the proof of the second lemma being a simpler version of it. Optimal control of turbulent mixed-convection flows has attracted considerable attention from researchers. Numerical algorithms such as Genetic Algorithms (GAs) are powerful tools that allow one to perform global optimization. These algorithms are of particular interest in complex optimization problems where cost functionals may lack smoothness and regularity. In turbulent flow optimization, the hybridization of GAs with high-fidelity Computational Fluid Dynamics (CFD) is extremely demanding in terms of computational time and memory storage. Thus, alternative approaches aiming to alleviate these requirements are of great interest. Nowadays, data-driven approaches have gained attention due to their potential in predicting flow solutions based only on preexisting data. In the present paper, we propose a near-real-time data-driven genetic algorithm (DDGA) for inverse parameter identification problems involving turbulent flows. In this optimization framework, the parametrized flow data are used in their reduced form obtained by POD (Proper Orthogonal Decomposition), and solution prediction is made by interpolating the temporal and spatial POD subspaces through a recently developed Riemannian barycentric interpolation. The validation of the proposed optimization approach is carried out on the parameter identification problem of the turbulent mixed-convection flow in a cavity.
The objective is to determine the inflow temperature and inflow velocity corresponding to a given temperature distribution in a restricted area of the spatial domain. The results show that the proposed optimization framework is able to deliver good approximations of the optimal solutions in less than two minutes. The security of currently deployed public-key cryptography algorithms is foreseen to be vulnerable to quantum computer attacks. Hence, a community effort exists to develop post-quantum cryptography (PQC) algorithms, i.e., algorithms that are resistant to quantum attacks. In this work, we have investigated how lattice-based candidate algorithms from the NIST PQC standardization competition fare when conceived as hardware accelerators. To achieve this, we have assessed the reference implementations of selected algorithms with the goal of identifying their basic building blocks. We assume the hardware accelerators will be implemented in an application-specific integrated circuit (ASIC), and the targeted technology in our experiments is a commercial 65nm node. In order to estimate the characteristics of each algorithm, we have assessed their memory requirements, use of multipliers, and how each algorithm employs hashing functions. Furthermore, for these building blocks, we have collected area and power figures for 12 candidate algorithms. For memories, we make use of a commercial memory compiler. For logic, we make use of a standard cell library. In order to compare the candidate algorithms fairly, we select a reference frequency of operation of 500MHz. Our results reveal that our area and power numbers are comparable to the state of the art, despite targeting a higher frequency of operation and a higher security level in our experiments.
The comprehensive investigation of lattice-based NIST PQC algorithms performed in this paper can be used to guide ASIC designers when selecting an appropriate algorithm while respecting requirements and design constraints. There are five features to consider when using generative adversarial networks to apply makeup to photos of the human face. These features include (1) facial components, (2) interactive color adjustments, (3) makeup variations, (4) robustness to poses and expressions, and (5) the use of multiple reference images. Several related works have been proposed, mainly using generative adversarial networks (GANs). Unfortunately, none of them have addressed all five features simultaneously. This paper closes the gap with an innovative style- and latent-guided GAN (SLGAN). We provide a novel perceptual makeup loss and a style-invariant decoder that can transfer makeup styles based on histogram matching to avoid the identity-shift problem. In our experiments, we show that our SLGAN is better than or comparable to state-of-the-art methods. Furthermore, we show that our proposal can interpolate facial makeup images to determine the unique features, compare existing methods, and help users find desirable makeup configurations. Vision-and-language navigation (VLN) is a task in which an agent is embodied in a realistic 3D environment and follows an instruction to reach the goal node. While most of the previous studies have built and investigated a discriminative approach, we notice that there are in fact two possible approaches to building such a VLN agent: discriminative and generative. In this paper, we design and investigate a generative language-grounded policy which computes the distribution over all possible instructions given the action and the transition history. In experiments, we show that the proposed generative approach outperforms the discriminative approach on the Room-2-Room (R2R) dataset, especially in the unseen environments.
We further show that the combination of the generative and discriminative policies achieves close to the state-of-the-art results on the R2R dataset, demonstrating that the generative and discriminative policies capture the different aspects of VLN. This paper surveys the field of transfer learning in the problem setting of Reinforcement Learning (RL). RL has been a key solution to sequential decision-making problems. Along with the fast advances of RL in various domains, such as robotics and game-playing, transfer learning has arisen as an important technique to assist RL by leveraging and transferring external expertise to boost the learning process of RL. In this survey, we review the central issues of transfer learning in the RL domain, providing a systematic categorization of its state-of-the-art techniques. We analyze their goals, methodologies, applications, and the RL frameworks under which the transfer learning techniques are approachable. We discuss the relationship between transfer learning and other relevant topics from the RL perspective and also explore the potential challenges as well as future development directions for transfer learning in RL. Diffuse Large B-Cell Lymphoma (DLBCL) is the most common non-Hodgkin lymphoma. Though histologically DLBCL shows varying morphologies, no morphologic features have been consistently demonstrated to correlate with prognosis. We present a morphologic analysis of histology sections from 209 DLBCL cases with associated clinical and cytogenetic data. Duplicate tissue core sections were arranged in tissue microarrays (TMAs), and replicate sections were stained with H&E and immunohistochemical stains for CD10, BCL6, MUM1, BCL2, and MYC. The TMAs are accompanied by pathologist-annotated regions-of-interest (ROIs) that identify areas of tissue representative of DLBCL. We used a deep learning model to segment all tumor nuclei in the ROIs, and computed several geometric features for each segmented nucleus.
We fit a Cox proportional hazards model to demonstrate the utility of these geometric features in predicting survival outcome, and found that it achieved a C-index (95% CI) of 0.635 (0.574, 0.691). Our findings suggest that geometric features computed from tumor nuclei are of prognostic importance, and should be validated in prospective studies. Single layer feedforward networks with random weights are successful in a variety of classification and regression problems. These networks are known for their non-iterative and fast training algorithms. A major drawback of these networks is that they require a large number of hidden units. In this paper, we propose the use of multi-activation hidden units. Such units increase the number of tunable parameters and enable the formation of complex decision surfaces, without increasing the number of hidden units. We experimentally show that multi-activation hidden units can be used either to improve the classification accuracy or to reduce computations. For visual object tracking, it is difficult to realize an almighty online tracker due to the huge variations of target appearance depending on an image sequence. This paper proposes an online tracking method that adaptively aggregates arbitrary multiple online trackers. The performance of the proposed method is theoretically guaranteed to be comparable to that of the best tracker for any image sequence, although the best expert is unknown during tracking. The experimental study on the large variations of benchmark datasets and aggregated trackers demonstrates that the proposed method can achieve state-of-the-art performance. The code is available at https://github.com/songheony/AAA-journal. Differential privacy (DP) has emerged as a de facto standard privacy notion for a wide range of applications.
Since the meaning of data utility in different applications may vastly differ, a key challenge is to find the optimal randomization mechanism, i.e., the distribution and its parameters, for a given utility metric. Existing works have identified the optimal distributions in some special cases, while leaving all other utility metrics (e.g., usefulness and graph distance) as open problems. Since existing works mostly rely on manual analysis to examine the search space of all distributions, it would be an expensive process to repeat such efforts for each utility metric. To address this deficiency, we propose a novel approach that can automatically optimize different utility metrics found in diverse applications under a common framework. Our key idea is that, by regarding the variance of the injected noise itself as a random variable, a two-fold distribution may approximately cover the search space of all distributions. Therefore, we can automatically find distributions in this search space to optimize different utility metrics in a similar manner, simply by optimizing the parameters of the two-fold distribution. Specifically, we define a universal framework, namely, randomizing the randomization mechanism of differential privacy (R$^2$DP), and we formally analyze its privacy and utility. Our experiments show that R$^2$DP can provide better results than the baseline distribution (Laplace) for several utility metrics with no known optimal distributions, whereas our results asymptotically approach optimality for utility metrics having known optimal distributions. As a side benefit, the added degree of freedom introduced by the two-fold distribution allows R$^2$DP to accommodate the preferences of both data owners and recipients.
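The two-fold construction above can be sketched minimally: instead of fixing the scale of the Laplace noise, draw the scale itself from a second distribution. The toy below is our illustration only; the uniform range for the random scale is an arbitrary assumption, and it omits the privacy accounting that the paper uses to calibrate the two-fold distribution to an $\epsilon$ budget.

```python
import math
import random

def laplace_noise(scale, rng):
    # Inverse-CDF sampling of a Laplace(0, scale) random variable
    u = rng.random() - 0.5
    sign = 1.0 if u >= 0 else -1.0
    return -scale * sign * math.log(1.0 - 2.0 * abs(u))

def laplace_mechanism(value, sensitivity, epsilon, rng):
    # Baseline: standard Laplace mechanism with fixed scale sensitivity/epsilon
    return value + laplace_noise(sensitivity / epsilon, rng)

def twofold_mechanism(value, sensitivity, epsilon, rng):
    # R^2DP-flavored sketch: the noise scale is itself drawn at random
    # (uniform around the baseline scale -- an illustrative assumption,
    # not the paper's calibrated two-fold distribution).
    base = sensitivity / epsilon
    scale = rng.uniform(0.5 * base, 1.5 * base)
    return value + laplace_noise(scale, rng)
```

Both mechanisms add zero-mean noise, so averages of many noisy answers concentrate around the true value; they differ in the shape of the resulting noise distribution, which is exactly the degree of freedom R$^2$DP exploits.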

Causal explanation analysis (CEA) can help us understand the reasons behind daily events, which has been found very helpful for understanding the coherence of messages. In this paper, we focus on Causal Explanation Detection, an important subtask of causal explanation analysis, which determines whether a causal explanation exists in a message. We design a Pyramid Salient-Aware Network (PSAN) to detect causal explanations in messages. PSAN assists causal explanation detection by capturing the salient semantics of discourses contained in their keywords with a bottom, graph-based word-level salient network. Furthermore, PSAN can modify the dominance of discourses via a top, attention-based discourse-level salient network to enhance the explanatory semantics of messages. Experiments on the commonly used CEA dataset show that PSAN outperforms the state-of-the-art method by 1.8% F1 on the Causal Explanation Detection task.

Designing a cranial implant requires a 3D understanding of the complete skull shape. A 2D approach is therefore sub-optimal, since a 2D model lacks a holistic 3D view of both the defective and healthy skulls. Further, loading whole 3D skull shapes at their original image resolution is not feasible on commonly available GPUs. To mitigate these issues, we propose a fully convolutional network composed of two subnetworks. The first subnetwork is designed to complete the shape of the downsampled defective skull. The second subnetwork upsamples the reconstructed shape slice-wise. We train the 3D and 2D networks together end-to-end, with a hierarchical loss function. Our proposed solution accurately predicts a high-resolution 3D implant on the challenge test case in terms of Dice score and Hausdorff distance.

The notion of face refers to the public self-image of an individual that emerges both from the individual's own actions as well as from the interaction with others. Modeling face and understanding its state changes throughout a conversation is critical to the study of maintenance of basic human needs in and through interaction. Grounded in the politeness theory of Brown and Levinson (1978), we propose a generalized framework for modeling face acts in persuasion conversations, resulting in a reliable coding manual, an annotated corpus, and computational models. The framework reveals insights about differences in face act utilization between asymmetric roles in persuasion conversations. Using computational models, we are able to successfully identify face acts as well as predict a key conversational outcome (e.g. donation success). Finally, we model a latent representation of the conversational state to analyze the impact of predicted face acts on the probability of a positive conversational outcome and observe several correlations that corroborate previous findings.

Nowadays, the prevalence of sensor networks has enabled tracking of the states of dynamic objects for a wide spectrum of applications from autonomous driving to environmental monitoring and urban planning. However, tracking real-world objects often faces two key challenges: First, due to the limitation of individual sensors, state estimation needs to be solved in a collaborative and distributed manner. Second, the objects' movement behavior is unknown, and needs to be learned using sensor observations. In this work, for the first time, we formally formulate the problem of simultaneous state estimation and behavior learning in a sensor network. We then propose a simple yet effective solution to this new problem by extending the Gaussian process-based Bayes filters (GP-BayesFilters) to an online, distributed setting. The effectiveness of the proposed method is evaluated on tracking objects with unknown movement behaviors using both synthetic data and data collected from a multi-robot platform.

Salient object detection (SOD) is a fundamental computer vision task. Recently, with the revival of deep neural networks, SOD has made great progress. However, there still exist two thorny issues that cannot be well addressed by existing methods: indistinguishable regions and complex structures. To address these two issues, in this paper we propose a novel deep network for accurate SOD, named CLASS. First, in order to leverage the different advantages of low-level and high-level features, we propose a novel non-local cross-level attention (CLA), which can capture long-range feature dependencies to enhance the distinction of complete salient objects. Second, a novel cross-level supervision (CLS) is designed to learn complementary context for complex structures at the pixel, region, and object levels. The fine structures and boundaries of salient objects can then be well restored. In experiments, with the proposed CLA and CLS, our CLASS net consistently outperforms 13 state-of-the-art methods on five datasets.
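The non-local cross-level idea can be sketched roughly as follows: high-level features query low-level features so that every spatial position can attend to every other position across levels. This is an illustrative sketch only; the projections below are random placeholders for learned parameters, and the actual CLA module's shapes and normalization may differ.

```python
import numpy as np

def cross_level_attention(low, high, d_k=16, rng=None):
    """Minimal non-local cross-level attention sketch (not the paper's CLA).

    low, high: flattened feature maps of shape (H*W, C).  Random projection
    matrices stand in for learned weights.
    """
    rng = np.random.default_rng(rng)
    c = high.shape[1]
    Wq = rng.standard_normal((c, d_k)) / np.sqrt(c)
    Wk = rng.standard_normal((low.shape[1], d_k)) / np.sqrt(low.shape[1])
    Wv = rng.standard_normal((low.shape[1], c)) / np.sqrt(low.shape[1])
    q, k, v = high @ Wq, low @ Wk, low @ Wv
    scores = q @ k.T / np.sqrt(d_k)                 # all-pairs affinities
    attn = np.exp(scores - scores.max(axis=1, keepdims=True))
    attn /= attn.sum(axis=1, keepdims=True)         # row-wise softmax
    return high + attn @ v                          # residual aggregation

out = cross_level_attention(np.ones((64, 32)), np.ones((64, 32)), rng=0)
```

The softmax over all positions is what makes the attention "non-local": each output position aggregates information from the entire low-level map, not just a local neighborhood.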

We extend the λ-calculus with constructs suitable for relational and functional-logic programming: non-deterministic choice, fresh variable introduction, and unification of expressions. In order to be able to unify λ-expressions and still obtain a confluent theory, we depart from related approaches, such as λProlog, in that we do not attempt to solve higher-order unification. Instead, abstractions are decorated with a location, which may intuitively be understood as a memory address, and we impose a simple coherence invariant: abstractions in the same location must be equal. This allows us to formulate a confluent small-step operational semantics which only performs first-order unification and does not require strong evaluation (below lambdas). We study a simply typed version of the system. Moreover, a denotational semantics for the calculus is proposed, and reduction is shown to be sound with respect to it.
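The restriction to first-order unification can be illustrated with a small sketch. The encoding below is our own illustration, not the paper's calculus: terms are variables or functor applications, and abstractions appear only as atoms tagged with a location, which the coherence invariant makes safe to compare syntactically. The occurs check is omitted for brevity.

```python
# First-order unification sketch. Terms are variables ('?x') or tuples
# (functor, arg1, ..., argn). An abstraction is modeled as ('lam', location):
# since abstractions at the same location must be equal, two such atoms
# unify iff their locations match -- no higher-order unification needed.

def walk(t, s):
    # Follow substitution chains for variables.
    while isinstance(t, str) and t.startswith('?') and t in s:
        t = s[t]
    return t

def unify(a, b, s=None):
    s = dict(s or {})
    a, b = walk(a, s), walk(b, s)
    if a == b:
        return s
    if isinstance(a, str) and a.startswith('?'):
        s[a] = b            # bind variable (occurs check omitted)
        return s
    if isinstance(b, str) and b.startswith('?'):
        s[b] = a
        return s
    if isinstance(a, tuple) and isinstance(b, tuple) and len(a) == len(b):
        for x, y in zip(a, b):
            s = unify(x, y, s)
            if s is None:
                return None
        return s
    return None             # clash: distinct atoms or arities

subst = unify(('pair', '?x', ('lam', 'loc1')), ('pair', 'c', '?y'))
```

Here `subst` binds `?x` to the atom `c` and `?y` to the location-tagged abstraction, all by purely first-order means.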

Generating high-quality images from scene graphs, that is, graphs that describe multiple entities in complex relations, is a challenging task that has attracted substantial interest recently. Prior work trained such models using supervised learning, where the goal is to produce the exact target image layout for each scene graph, and relied on predicting object locations and shapes independently and in parallel. However, scene graphs are underspecified, and thus the same scene graph often occurs with many target images in the training data. This leads to generated images with high inter-object overlap, empty areas, blurry objects, and overall compromised quality. In this work, we propose a method that alleviates these issues by generating all object layouts together and reducing the reliance on such supervision. Our model predicts layouts directly from embeddings (without predicting intermediate boxes) by gradually upsampling, refining and contextualizing object layouts. It is trained with a novel adversarial loss that optimizes the interaction between object pairs. This improves coverage and removes overlaps, while maintaining sensible contours and respecting object relations. We empirically show on the COCO-STUFF dataset that our proposed approach substantially improves the quality of generated layouts as well as the overall image quality. Our evaluation shows that we improve layout coverage by almost 20 points and reduce object overlap to negligible amounts. This leads to better image generation, relation fulfillment and object quality.

To date, the SARS-CoV-2 coronavirus has caused more than 850,000 deaths and infected more than 27 million individuals in over 120 countries. Besides the principal polymerase chain reaction (PCR) tests, automatically identifying positive samples based on computed tomography (CT) scans can present a promising option for the early diagnosis of COVID-19. Recently, there have been increasing efforts to utilize deep networks for COVID-19 diagnosis based on CT scans. While these approaches mostly focus on introducing novel architectures, transfer learning techniques, or constructing large-scale datasets, we propose a novel strategy to improve the performance of several baselines by leveraging multiple useful information sources relevant to doctors' judgments. Specifically, infected regions and heat maps extracted from learned networks are integrated with the global image via an attention mechanism during the learning process. This procedure not only makes our system more robust to noise but also guides the network to focus on local lesion areas. Extensive experiments illustrate the superior performance of our approach compared to recent baselines. Furthermore, our learned network guidance offers doctors an explainable feature, as the connection between input and output can be understood in a grey-box model.

This paper presents a systematic approach for analyzing the departure-time choice equilibrium (DTCE) problem of a single bottleneck with heterogeneous commuters. The approach is based on the fact that the DTCE can be equivalently represented as a linear programming problem with a special structure, which can be solved analytically by exploiting the theory of optimal transport combined with a decomposition technique. By applying the proposed approach to several types of models with heterogeneous commuters, it is shown that (i) the essential condition for the emergence of the equilibrium "sorting patterns" known in the literature is that the schedule delay functions have the "Monge property," (ii) equilibrium problems with the Monge property can be solved analytically, and (iii) the proposed approach extends to more general problems with more than two types of heterogeneity.
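To make the Monge property concrete (a numeric illustration, not the paper's analytical development): a cost matrix is Monge when c[i,j] + c[i+1,j+1] <= c[i,j+1] + c[i+1,j] for all i, j, and for Monge costs the greedy northwest-corner rule already produces an optimal transport plan.

```python
import numpy as np

def is_monge(c, tol=1e-12):
    """Check the Monge property: c[i,j] + c[i+1,j+1] <= c[i,j+1] + c[i+1,j]."""
    c = np.asarray(c, dtype=float)
    return bool(np.all(c[:-1, :-1] + c[1:, 1:] <= c[:-1, 1:] + c[1:, :-1] + tol))

def northwest_corner(supply, demand):
    """Greedy transport plan; it is optimal when the cost matrix is Monge."""
    supply, demand = list(supply), list(demand)
    plan = np.zeros((len(supply), len(demand)))
    i = j = 0
    while i < len(supply) and j < len(demand):
        q = min(supply[i], demand[j])   # ship as much as possible at (i, j)
        plan[i, j] = q
        supply[i] -= q
        demand[j] -= q
        if supply[i] == 0:
            i += 1
        else:
            j += 1
    return plan

# c[i,j] = (i - j)^2 is a classic Monge cost.
c = np.array([[(i - j) ** 2 for j in range(4)] for i in range(4)], float)
plan = northwest_corner([2, 3], [1, 4])
```

This is exactly why Monge structure lets the equilibrium problem be solved analytically: the optimal assignment follows a predictable, sorted pattern.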

Already today, driver assistance systems help to make daily traffic more comfortable and safer. However, there are still situations that are quite rare but hard to handle at the same time. In order to cope with these situations and to bridge the gap towards fully automated driving, it becomes necessary to collect not only enormous amounts of data but, more importantly, the right data. This data can be used to develop and validate the systems through machine learning and simulation pipelines. Along this line, this paper presents a fleet learning-based architecture that enables continuous improvement of systems predicting the movement of surrounding traffic participants. Moreover, the presented architecture is applied to a testing vehicle in order to demonstrate the fundamental feasibility of the system. Finally, it is shown that the system collects meaningful data that are helpful for improving the underlying prediction systems.

A new application of subspace interpolation for the construction of nonlinear Parametric Reduced Order Models (PROMs) is proposed. The approach is based on the Riemannian geometry of the manifold formed by the quotient of the set of full-rank N-by-q matrices by the orthogonal group of dimension q. Given a set of parametrized Proper Orthogonal Decomposition (POD) subspaces of dimension q computed at trained parameter values, the subspace for a new, untrained parameter is obtained as their generalized Karcher barycenter, whose solution is found by solving a simple fixed-point problem. In contrast to existing PROM approaches, the proposed barycentric PROM is by construction easier to implement and more flexible with respect to changes in parameter values. To assess the potential of the barycentric PROM, numerical experiments are conducted on the parametric flow past a circular cylinder and the flow in a lid-driven cavity as the Reynolds number varies. It is demonstrated that the proposed barycentric PROM achieves competitive results at considerably reduced computational cost.
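A fixed-point iteration for a Karcher barycenter of subspaces can be sketched with the standard Grassmann log/exp maps. This is a simplified sketch of the kind of computation described, not the paper's exact algorithm, and the subspace dimensions below are arbitrary illustrative choices.

```python
import numpy as np

def grassmann_log(U, V):
    """Tangent vector at span(U) pointing toward span(V)."""
    M = U.T @ V
    X = (V - U @ M) @ np.linalg.inv(M)
    P, s, Qt = np.linalg.svd(X, full_matrices=False)
    return P @ np.diag(np.arctan(s)) @ Qt

def grassmann_exp(U, D):
    """Follow the geodesic from span(U) along tangent direction D."""
    P, s, Qt = np.linalg.svd(D, full_matrices=False)
    W = U @ Qt.T @ np.diag(np.cos(s)) @ Qt + P @ np.diag(np.sin(s)) @ Qt
    Q, _ = np.linalg.qr(W)          # re-orthonormalize for numerical safety
    return Q

def karcher_barycenter(subspaces, weights=None, iters=50, tol=1e-10):
    """Fixed-point iteration: move toward the weighted mean of the log maps
    until the mean tangent vanishes."""
    weights = weights or [1.0 / len(subspaces)] * len(subspaces)
    U = subspaces[0]
    for _ in range(iters):
        D = sum(w * grassmann_log(U, V) for w, V in zip(weights, subspaces))
        if np.linalg.norm(D) < tol:
            break
        U = grassmann_exp(U, D)
    return U

rng = np.random.default_rng(0)
U1, _ = np.linalg.qr(rng.standard_normal((6, 2)))
U2, _ = np.linalg.qr(rng.standard_normal((6, 2)))
B = karcher_barycenter([U1, U2])
```

Varying the weights according to the new parameter value gives an interpolated subspace without recomputing any POD.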

Deep learning applied to weather forecasting has started gaining popularity because of the progress achieved by data-driven models. The present paper compares four different deep learning architectures for weather prediction on daily data gathered from 18 cities across Europe and spanning a period of 15 years. The four proposed models investigate different types of input representations (i.e., tensorial unistream vs. multi-stream matrices) as well as different combinations of convolutional neural networks and LSTMs (i.e., cascaded vs. ConvLSTM). In particular, we show that a model that uses a multi-stream input representation and processes each lag individually, combined with a cascaded convolution and LSTM, forecasts better than the other compared models. In addition, we show that visualization techniques such as occlusion analysis and score maximization can give additional insight into the most important features and cities for predicting a particular target feature and city.

We propose a simple algorithm to locate the "corner" of an L-curve, a function often used to select the regularisation parameter for the solution of ill-posed inverse problems. The algorithm combines the Menger curvature of a circumcircle with the golden section search method. It efficiently finds the regularisation parameter value corresponding to the region of maximum positive curvature of the L-curve. The algorithm is applied to some commonly available test problems and compared to the typical way of locating the L-curve corner by means of its analytical curvature. The application of the algorithm to the data processing of an electrical resistance tomography experiment on thin conductive films is also reported.
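The two ingredients can be sketched as follows: Menger curvature gives the curvature of the circle through three L-curve points, and a golden section search over the regularisation parameter seeks the curvature maximum. This is a sketch in the spirit of the algorithm, not its exact pseudocode; the synthetic L-curve at the end is our own test function, with a corner at lam = 1 by symmetry.

```python
import numpy as np

def menger_curvature(p1, p2, p3):
    """Signed curvature of the circumcircle through three planar points."""
    (x1, y1), (x2, y2), (x3, y3) = p1, p2, p3
    cross = (x2 - x1) * (y3 - y1) - (y2 - y1) * (x3 - x1)  # 2 * signed area
    d12 = np.hypot(x2 - x1, y2 - y1)
    d23 = np.hypot(x3 - x2, y3 - y2)
    d13 = np.hypot(x3 - x1, y3 - y1)
    return 2.0 * cross / (d12 * d23 * d13)

def l_curve_corner(f, lam_min, lam_max, tol=1e-6, h=1e-2):
    """Golden-section search for the maximum-curvature ("corner") point.

    f(lam) -> (log residual norm, log solution norm).  Curvature at lam is
    estimated from three nearby L-curve points; the search assumes the
    curvature is unimodal in log(lam).
    """
    def curv(x):
        lam = np.exp(x)
        return menger_curvature(f(lam / (1 + h)), f(lam), f(lam * (1 + h)))

    phi = (1 + np.sqrt(5)) / 2
    a, b = np.log(lam_min), np.log(lam_max)
    while b - a > tol:
        c, d = b - (b - a) / phi, a + (b - a) / phi
        if curv(c) > curv(d):
            b = d
        else:
            a = c
    return np.exp((a + b) / 2)

# Synthetic, symmetric L-shaped curve whose corner sits at lam = 1.
corner = l_curve_corner(lambda lam: (np.log1p(lam), np.log1p(1 / lam)),
                        1e-4, 1e4)
```

Each search step needs only a handful of L-curve evaluations, which is the source of the method's efficiency compared to sweeping the whole parameter range.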

Modern driver assistance systems as well as autonomous vehicles base their decisions on local maps of the environment. These maps include, for example, surrounding moving objects perceived by sensors as well as routes and navigation information. Current research in the field of environment mapping is concerned with two major challenges. The first is the integration of information from different sources, e.g., on-board sensors like radar, camera, ultrasound and lidar, offline map data, or backend information. The second challenge consists in finding an abstract representation of this aggregated information with suitable interfaces for different driving functions and traffic situations. To overcome these challenges, an extended environment model is a reasonable choice. In this paper, we show that role-based motion predictions in combination with V2X-extended environment models can contribute to increased traffic safety and driving comfort. We thus combine the two research areas and show possible improvements, using the example of a threading process at a motorway access road. Furthermore, we show that an average V2X equipment penetration of 80% can already lead to a significant improvement of 0.33 m/s^2 in total acceleration and 12 m more safety distance compared to non-V2X-equipped vehicles during the threading process.

Single-layer feedforward networks with random weights are known for their non-iterative and fast training algorithms and are successful in a variety of classification and regression problems. A major drawback of these networks is that they require a large number of hidden units. In this paper, we propose a technique to reduce the number of hidden units substantially without significantly affecting the accuracy of the networks. We introduce the concept of primary and secondary hidden units. The weights for the primary hidden units are chosen randomly, while the secondary hidden units are derived using pairwise combinations of the primary hidden units. Using this technique, we show that the number of hidden units can be reduced by at least one order of magnitude. We experimentally show that this technique leads to a significant drop in computations at inference time and has only a minor impact on network accuracy. A huge reduction in computations is possible if slightly lower accuracy is acceptable.
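The primary/secondary idea can be sketched as follows. In this illustration the secondary units are products of pairs of primary activations; that combination rule is our assumption (the paper says only "pairwise combinations"), and the closed-form least-squares output fit is the standard non-iterative training for random-weight networks.

```python
import numpy as np
from itertools import combinations

def fit_random_weight_net(X, y, n_primary=10, rng=None):
    """Random-weight network sketch with primary and secondary hidden units.

    Primary units get random input weights; each secondary unit is the
    product of a pair of primary activations (illustrative choice).  Only the
    output weights are trained, via a single least-squares solve.
    """
    rng = np.random.default_rng(rng)
    W = rng.standard_normal((X.shape[1], n_primary))
    b = rng.standard_normal(n_primary)
    primary = np.tanh(X @ W + b)                        # random hidden layer
    pairs = list(combinations(range(n_primary), 2))
    secondary = np.stack([primary[:, i] * primary[:, j] for i, j in pairs],
                         axis=1)
    H = np.hstack([primary, secondary])
    beta, *_ = np.linalg.lstsq(H, y, rcond=None)        # non-iterative fit
    return (W, b, pairs, beta)

def predict(model, X):
    W, b, pairs, beta = model
    primary = np.tanh(X @ W + b)
    secondary = np.stack([primary[:, i] * primary[:, j] for i, j in pairs],
                         axis=1)
    return np.hstack([primary, secondary]) @ beta

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(200, 2))
y = np.sin(3 * X[:, 0]) + X[:, 1] ** 2
model = fit_random_weight_net(X, y, n_primary=10, rng=1)
mse = float(np.mean((predict(model, X) - y) ** 2))
```

With 10 primary units, the 45 pairwise products yield 55 effective features while only 10 random projections of the input are ever computed, which is where the inference-time savings come from.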

Prediction of stock prices has been an important area of research for a long time. While supporters of the efficient market hypothesis believe that it is impossible to predict stock prices accurately, there are formal propositions demonstrating that accurate modeling and the design of appropriate variables may lead to models with which stock prices and stock price movement patterns can be predicted very accurately. In this work, we propose a hybrid modeling approach for stock price prediction, building different machine learning and deep learning-based models. For the purpose of our study, we have used NIFTY 50 index values of the National Stock Exchange (NSE) of India for the period December 29, 2014 to July 31, 2020. We have built eight regression models using the training data, which consisted of NIFTY 50 index records from December 29, 2014 to December 28, 2018. Using these regression models, we predicted the open values of NIFTY 50 for the period December 31, 2018 to July 31, 2020. We then augment the predictive power of our forecasting framework by building four deep learning-based regression models using long short-term memory (LSTM) networks with a novel approach of walk-forward validation. We exploit the power of LSTM regression models in forecasting the future NIFTY 50 open values using four different models that differ in their architecture and in the structure of their input data. Extensive results are presented on various metrics for all the regression models. The results clearly indicate that the LSTM-based univariate model that uses one week of prior data as input for predicting the next week's open value of the NIFTY 50 time series is the most accurate model.
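The walk-forward validation scheme can be sketched generically: each model is trained on a window of past observations and validated only on the data that immediately follows it, with the window then rolled forward. The window sizes below are illustrative, not the paper's exact configuration.

```python
import numpy as np

def walk_forward_splits(n, train_size, test_size, step=None):
    """Yield (train_idx, test_idx) pairs that roll forward through time, so a
    model is always validated on data that comes strictly after its training
    window -- the essence of walk-forward validation for time series."""
    step = step or test_size
    start = 0
    while start + train_size + test_size <= n:
        train = np.arange(start, start + train_size)
        test = np.arange(start + train_size, start + train_size + test_size)
        yield train, test
        start += step

splits = list(walk_forward_splits(n=100, train_size=60, test_size=10))
```

Unlike random cross-validation, no future observation ever leaks into a training window, which is essential for honestly evaluating forecasts of a financial time series.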