cs updates on arXiv.org

Computer Science (cs) updates on the arXiv.org e-print archive



<p>In this work, a new method for designing analog circuits in deep sub-micron CMOS fabrication processes is proposed. The proposed method combines regression algorithms with the transistor circuit model to size transistors in a 0.18 um technology quickly and without using simulation software. Threshold voltage, output resistance, and the product of mobility and oxide capacitance are the key parameters of the transistor circuit model used to size a transistor. For nano-scale transistors, however, these parameters are nonlinear with respect to the electrical and physical characteristics of the transistors, so a circuit simulator is needed to find their values, which increases the design time. Regression analysis is instead utilized to predict the values of these parameters. We demonstrate the performance of the proposed method by designing a Current Feedback Instrumentation Amplifier (CFIA). We show that the presented method achieves higher than 90% accuracy in predicting the desired value of W and reduces the design time by over 97% compared to conventional methods. The circuit designed using the proposed method consumes 5.76 uW of power, has a Common Mode Rejection Ratio (CMRR) of 35.83 dB, and achieves a gain of 8.17 V/V.</p>
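<p>As an illustration of the sizing flow described above, the following is a minimal sketch, assuming a saturation square-law model and hypothetical features (Id, Vgs, Vds, L); the paper's actual regressors, features, and training data are not specified in the abstract. A regressor predicts the process parameters, and the square-law is inverted to obtain W.</p>
<pre>
# Hypothetical sketch: predict (Vth, mu*Cox) from operating-point features with
# a regressor, then solve Id = 0.5 * mu*Cox * (W/L) * (Vgs - Vth)^2 for W.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
# Placeholder training data standing in for simulator-generated samples:
# features = (Id, Vgs, Vds, L), targets = (Vth, mu*Cox).
X = rng.uniform([1e-6, 0.4, 0.2, 0.18e-6], [1e-4, 1.8, 1.8, 1.8e-6], size=(500, 4))
y = np.column_stack([0.45 + 0.05 * rng.standard_normal(500),       # Vth (V)
                     3e-4 * (1 + 0.1 * rng.standard_normal(500))])  # mu*Cox (A/V^2)
model = RandomForestRegressor(n_estimators=100).fit(X, y)

def size_transistor(Id, Vgs, Vds, L):
    """Predict (Vth, mu*Cox) and invert the square-law for W."""
    vth, mu_cox = model.predict([[Id, Vgs, Vds, L]])[0]
    return 2 * Id * L / (mu_cox * (Vgs - vth) ** 2)

print(f"W = {size_transistor(5e-5, 0.9, 1.0, 0.18e-6) * 1e6:.2f} um")
</pre>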
<p>Self-supervised learning is the backbone of state-of-the-art language modeling. It has been argued that training with a predictive loss on a self-supervised dataset causes simulators: entities that internally represent possible configurations of real-world systems. Under this assumption, a mathematical model for simulators is built based on the Cartesian frames model of embedded agents, which is extended to multi-agent worlds by scaling a two-dimensional frame to arbitrary dimensions, where prior literature instead uses operations on frames. This variant leveraging scaled dimensionality is named the Cartesian object, and it is used to represent simulations (where individual simulacra are the agents and devices in that object). Around the Cartesian object, functions like token selection and simulation complexity are accounted for in formalizing the behavior of a simulator, and used to show (through the Löbian obstacle) that a proof of alignment between simulacra by inspection of design is impossible in the simulator context. Following this, a scheme termed Partial Simulation Extrapolation is proposed, aimed at circumventing the Löbian obstacle through the evaluation of low-complexity simulations.</p>
<p>Fairness has been a popular research topic in recent years. A research topic closely related to fairness is bias and debiasing. Among the different types of bias problems, position bias is one of the most widely encountered symptoms. Position bias means that recommended items at the top of the recommendation list have a higher likelihood of being clicked than items at the bottom of the same list. To mitigate this problem, we propose a regularization technique to reduce the bias effect. In the experiment section, we show that our method is superior to other modern algorithms.</p>
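<p>The abstract does not specify the exact regularizer; as a hedged sketch of the general idea, one can add a penalty that discourages dependence between predicted scores and display positions on top of a standard fit term (all tensor names below are hypothetical):</p>
<pre>
# Sketch: penalize the squared correlation between scores and positions.
import torch

def position_debiased_loss(scores, labels, positions, lam=0.1):
    base = torch.nn.functional.mse_loss(scores, labels)   # standard fit term
    s = (scores - scores.mean()) / (scores.std() + 1e-8)
    p = (positions - positions.mean()) / (positions.std() + 1e-8)
    corr = (s * p).mean()                                 # score-position correlation
    return base + lam * corr ** 2                         # push toward independence

scores = torch.randn(32, requires_grad=True)
labels = torch.rand(32)
positions = torch.arange(32, dtype=torch.float)
position_debiased_loss(scores, labels, positions).backward()
</pre>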
<p>As legal case law databases such as HUDOC continue to grow rapidly, it has become essential for legal researchers to find efficient methods to handle such large-scale data sets. Such case law databases usually consist of the textual content of cases together with the citations between them. This paper focuses on case law from the European Court of Human Rights on Article 8 of the European Convention on Human Rights, the right to respect for private and family life, home and correspondence. In this study, we demonstrate and compare the potential of topic modelling and citation network analysis to find and organize case law on Article 8 based on its general themes and citation patterns, respectively. Additionally, we explore whether combining these two techniques leads to better results compared to the application of only one of the methods. We evaluate the effectiveness of the combined method on a unique manually collected and annotated dataset of Article 8 case law on evictions. The results of our experiments show that our combined (text- and citation-based) approach provides the best results in finding and grouping case law, providing scholars with an effective way to extract and analyse relevant cases on a specific issue.</p>
<p>Background: The COVID-19 pandemic has caused severe impacts on health systems worldwide. Its critical nature and the increased interest of individuals and organizations in developing countermeasures to the problem have led to a surge of new studies in scientific journals. Objective: We sought to develop a tool that incorporates, in a novel way, aspects of Information Retrieval (IR) and Extraction (IE) applied to the COVID-19 Open Research Dataset (CORD-19). The main focus of this paper is to provide researchers with a better search tool for COVID-19 related papers, helping them find reference papers and highlight relevant entities in text. Method: We applied Latent Dirichlet Allocation (LDA) to model, based on research aspects, the topics of all English abstracts in CORD-19. Relevant named entities of each abstract were extracted and linked to the corresponding UMLS concept. Regular expressions and the K-Nearest Neighbors algorithm were used to rank relevant papers. Results: Our tool has shown the potential to assist researchers by automating a topic-based search of CORD-19 papers. Nonetheless, we identified that more fine-tuned topic modeling parameters and increased accuracy of the research aspect classifier model could lead to a more accurate and reliable tool. Conclusion: We emphasize the need for new automated tools to help researchers find relevant COVID-19 documents, in addition to automatically extracting useful information contained in them. Our work suggests that combining different algorithms and models could lead to new ways of browsing COVID-19 paper data.</p>
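<p>A minimal sketch of the topic-based search pipeline (corpus, topic count, and ranking details are illustrative assumptions; the regular-expression and entity-linking components are omitted here):</p>
<pre>
# Fit LDA over abstracts, embed a query in topic space, rank by nearest neighbors.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.neighbors import NearestNeighbors

abstracts = [
    "coronavirus transmission dynamics and reproduction number estimation",
    "vaccine candidate immune response in clinical trials",
    "deep learning detection of pneumonia in chest radiographs",
]
vec = CountVectorizer(stop_words="english")
counts = vec.fit_transform(abstracts)
lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topics = lda.fit_transform(counts)                 # document-topic distributions

knn = NearestNeighbors(n_neighbors=2).fit(doc_topics)
query_topics = lda.transform(vec.transform(["vaccine immune trials"]))
_, idx = knn.kneighbors(query_topics)
print([abstracts[i] for i in idx[0]])                  # top-ranked abstracts
</pre>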
<p>The task of predicting conversion rates (CVR) lies at the heart of online advertising systems aiming to optimize bids to meet advertiser performance requirements. Even with the recent rise of deep neural networks, these predictions are often made by factorization machines (FM), especially in commercial settings where inference latency is key. These models are trained using the logistic regression framework on labeled tabular data formed from past user activity that is relevant to the task at hand. </p> <p>Many advertisers only care about click-attributed conversions. A major challenge in training models that predict conversions-given-clicks comes from data sparsity - clicks are rare, conversions attributed to clicks are even rarer. However, mitigating sparsity by adding conversions that are not click-attributed to the training set impairs model calibration. Since calibration is critical to achieving advertiser goals, this is infeasible. </p> <p>In this work we use the well-known idea of self-supervised pre-training, and use an auxiliary auto-encoder model trained on all conversion events, both click-attributed and not, as a feature extractor to enrich the main CVR prediction model. Since the main model does not train on non click-attributed conversions, this does not impair calibration. We adapt the basic self-supervised pre-training idea to our online advertising setup by using a loss function designed for tabular data, facilitating continual learning by ensuring auto-encoder stability, and incorporating a neural network into a large-scale real-time ad auction that ranks tens of thousands of ads, under strict latency constraints, and without incurring a major engineering cost. We show improvements both offline, during training, and in an online A/B test. Following its success in A/B tests, our solution is now fully deployed to the Yahoo native advertising system. </p>
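<p>The core pre-train-then-enrich idea can be sketched in a few lines (dimensions, the reconstruction loss, and the frozen-encoder choice are illustrative assumptions; the paper uses a loss designed for tabular data and a production-scale model):</p>
<pre>
# Pre-train an auto-encoder on ALL conversions, freeze its encoder, and feed the
# embedding as extra features to the CVR model trained only on click-attributed
# data, leaving calibration untouched.
import torch
import torch.nn as nn

enc = nn.Sequential(nn.Linear(64, 16), nn.ReLU())
dec = nn.Linear(16, 64)
all_conversions = torch.randn(1024, 64)        # click-attributed and not
opt = torch.optim.Adam([*enc.parameters(), *dec.parameters()], lr=1e-3)
for _ in range(100):                           # self-supervised reconstruction
    opt.zero_grad()
    loss = nn.functional.mse_loss(dec(enc(all_conversions)), all_conversions)
    loss.backward()
    opt.step()

for p in enc.parameters():                     # freeze for auto-encoder stability
    p.requires_grad_(False)

cvr_head = nn.Linear(64 + 16, 1)               # main model: raw + encoded features
x_clicked = torch.randn(256, 64)               # click-attributed events only
logits = cvr_head(torch.cat([x_clicked, enc(x_clicked)], dim=1))
</pre>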
<p>Within-basket recommendation (WBR) refers to the task of recommending items toward completing a non-empty shopping basket during a shopping session. While the latest innovations in this space demonstrate remarkable performance improvement on benchmark datasets, they often overlook the complexity of user behaviors in practice, such as 1) co-existence of multiple shopping intentions, 2) multi-granularity of such intentions, and 3) interleaving behavior (switching intentions) in a shopping session. This paper presents the Neural Pattern Associator (NPA), a deep item-association-mining model that explicitly models the aforementioned factors. Specifically, inspired by vector quantization, the NPA model learns to encode common user intentions (or item-combination patterns) as quantized representations (a.k.a. a codebook), which permits identification of users' shopping intentions via attention-driven lookup during the reasoning phase. This yields coherent and self-interpretable recommendations. We evaluated the proposed NPA model across multiple extensive datasets, encompassing the domains of grocery e-commerce (shopping basket completion) and music (playlist extension), where our quantitative evaluations show that the NPA model significantly outperforms a wide range of existing WBR solutions, reflecting the benefit of explicitly modeling complex user intentions.</p>
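<p>The attention-driven codebook lookup at the heart of this design can be sketched as follows (sizes and names are hypothetical, not the NPA implementation):</p>
<pre>
# A learned codebook of intention prototypes is queried with a basket context;
# the attended code is a soft-selected representation of the active intention.
import torch
import torch.nn as nn

n_codes, d = 32, 64
codebook = nn.Parameter(torch.randn(n_codes, d))  # quantized intention patterns
basket_ctx = torch.randn(8, d)                    # one context vector per session

attn = torch.softmax(basket_ctx @ codebook.T / d ** 0.5, dim=-1)  # lookup weights
intention = attn @ codebook                       # attended intention code
</pre>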
<p>An adaptive control approach for a three-phase grid-interfaced solar photovoltaic system, based on the new Adaptive Neuro-Fuzzy Inference System with Rain Optimization Algorithm (ANROA) methodology, is proposed and discussed in this manuscript. This method incorporates an Adaptive Neuro-Fuzzy Inference System (ANFIS) with a Rain Optimization Algorithm (ROA). The ANFIS controller has excellent maximum tracking capability because it includes features of both neural and fuzzy techniques. The ROA technique is in charge of controlling the switching of the voltage source converter. The major goal is to avoid power quality problems such as voltage fluctuations, harmonics, and flicker, as well as unbalanced loads and reactive power usage. Besides, the proposed method operates in zero voltage regulation and unity power factor modes. The suggested control approach has been modeled and simulated, and its performance has been assessed against existing alternative methods. A statistical analysis of the proposed and existing techniques is also presented and discussed. The simulation results demonstrate that, when compared to alternative approaches, the suggested strategy can properly and effectively identify the best global solutions. Furthermore, the system's robustness has been studied using the MATLAB/SIMULINK environment and experimentally using a Field Programmable Gate Array (FPGA)-based Hardware-in-the-Loop (HIL) controller.</p>
<p>The Burrows-Wheeler Transform (BWT) is a string transformation technique widely used in areas such as bioinformatics and file compression. Many applications combine a run-length encoding (RLE) with the BWT in a way that preserves the ability to query the compressed data efficiently. However, these methods may not take full advantage of the compressibility of the BWT, as they do not modify the alphabet ordering for the sorting step embedded in computing the BWT. Indeed, any such alteration of the alphabet ordering can have a considerable impact on the output of the BWT, in particular on the number of runs. For an alphabet $\Sigma$ containing $\sigma$ characters, the space of all alphabet orderings is of size $\sigma!$. While for small alphabets an exhaustive investigation is possible, finding the optimal ordering for larger alphabets is not feasible. Therefore, there is a need for a search strategy more informed than brute-force sampling of the entire space, which motivates a new heuristic approach. In this paper, we explore the non-trivial cases of the problem of minimizing the size of a run-length encoded BWT (RLBWT) by selecting a new ordering for the alphabet. We show that random sampling of the space of alphabet orderings usually gives sub-optimal orderings for compression and that a local search strategy can provide a large improvement in relatively few steps. We also inspect a selection of initial alphabet orderings, including ASCII, letter appearance, and letter frequency. While this alphabet ordering problem is computationally hard, we demonstrate gains in compressibility.</p>
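<p>The objective and the local search are easy to state concretely; the sketch below uses a naive rotation-sort BWT and adjacent-swap hill climbing, purely for illustration (the paper's heuristics and data structures are more sophisticated):</p>
<pre>
# Count BWT runs under a candidate alphabet ordering, then hill-climb over
# adjacent swaps of the ordering to reduce the run count.
from itertools import groupby

def bwt_runs(text, order):
    rank = {c: i for i, c in enumerate(order)}
    rank["\0"] = -1                              # sentinel sorts first
    s = text + "\0"
    rots = sorted((s[i:] + s[:i] for i in range(len(s))),
                  key=lambda rot: [rank[c] for c in rot])
    bwt = "".join(rot[-1] for rot in rots)
    return sum(1 for _ in groupby(bwt))          # number of runs

def local_search(text, order):
    order = list(order)
    best = bwt_runs(text, order)
    improved = True
    while improved:
        improved = False
        for i in range(len(order) - 1):
            order[i], order[i + 1] = order[i + 1], order[i]
            runs = bwt_runs(text, order)
            if runs < best:
                best, improved = runs, True
            else:                                # undo non-improving swap
                order[i], order[i + 1] = order[i + 1], order[i]
    return "".join(order), best

print(local_search("banana_bandana", "_abdn"))
</pre>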
<p>Weather radar is the primary tool used by forecasters to detect and warn for tornadoes in near-real time. In order to assist forecasters in warning the public, several algorithms have been developed to automatically detect tornadic signatures in weather radar observations. Recently, Machine Learning (ML) algorithms, which learn directly from large amounts of labeled data, have been shown to be highly effective for this purpose. Since tornadoes are extremely rare events within the corpus of all available radar observations, the selection and design of training datasets for ML applications is critical for the performance, robustness, and ultimate acceptance of ML algorithms. This study introduces a new benchmark dataset, TorNet, to support the development of ML algorithms for tornado detection and prediction. TorNet contains full-resolution, polarimetric, Level-II WSR-88D data sampled from 10 years of reported storm events. A number of ML baselines for tornado detection are developed and compared, including a novel deep learning (DL) architecture capable of processing raw radar imagery without the manual feature extraction required by existing ML algorithms. Despite not benefiting from manual feature engineering or other preprocessing, the DL model shows increased detection performance compared to non-DL and operational baselines. The TorNet dataset, as well as the source code and model weights of the DL baseline trained in this work, are made freely available.</p>
<p>Deep learning models like Transformers and Convolutional Neural Networks (CNNs) have revolutionized various domains, but their parameter-intensive nature hampers deployment in resource-constrained settings. In this paper, we introduce a novel concept that utilizes the column space and row space of weight matrices, allowing for a substantial reduction in model parameters without compromising performance. Leveraging this paradigm, we achieve parameter-efficient deep learning models. Our approach applies to both Bottleneck and Attention layers, effectively halving the parameters while incurring only minor performance degradation. Extensive experiments conducted on the ImageNet dataset with ViT and ResNet50 demonstrate the effectiveness of our method, showcasing competitive performance when compared to traditional models. This approach not only addresses the pressing demand for parameter-efficient deep learning solutions but also holds great promise for practical deployment in real-world scenarios.</p>
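<p>The abstract does not give the exact construction; as a hedged illustration of how a column-space/row-space parameterization can halve parameters, consider a d x d weight expressed through two rank-r bases with r = d/4:</p>
<pre>
# W = col @ row, with col spanning a column space and row a row space; the
# parameter count is 2*d*r instead of d*d, i.e. half when r = d/4.
import torch
import torch.nn as nn

class FactoredLinear(nn.Module):
    def __init__(self, d, r):
        super().__init__()
        self.col = nn.Parameter(torch.randn(d, r) / d ** 0.5)  # column-space basis
        self.row = nn.Parameter(torch.randn(r, d) / d ** 0.5)  # row-space basis

    def forward(self, x):
        return x @ self.row.T @ self.col.T       # x @ (col @ row)^T

d = 512
full = nn.Linear(d, d, bias=False)
fact = FactoredLinear(d, d // 4)
print(sum(p.numel() for p in full.parameters()),   # 262144
      sum(p.numel() for p in fact.parameters()))   # 131072
</pre>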
<p>We study the problem of auditing classifiers under the notion of statistical subgroup fairness. Kearns et al. (2018) showed that the problem of auditing combinatorial subgroup fairness is as hard as agnostic learning. Essentially all work on remedying statistical measures of discrimination against subgroups assumes access to an oracle for this problem, despite the fact that no efficient algorithms are known for it. If we assume the data distribution is Gaussian, or even merely log-concave, then a recent line of work has discovered efficient agnostic learning algorithms for halfspaces. Unfortunately, the boosting-style reductions given by Kearns et al. require the agnostic learning algorithm to succeed on reweighted distributions that may not be log-concave, even if the original data distribution was. In this work, we give positive and negative results on auditing for the Gaussian distribution. On the positive side, we present an alternative approach to leverage these advances in agnostic learning and thereby obtain the first polynomial-time approximation scheme (PTAS) for auditing nontrivial combinatorial subgroup fairness: we show how to audit statistical notions of fairness over homogeneous halfspace subgroups when the features are Gaussian. On the negative side, we find that under cryptographic assumptions, no polynomial-time algorithm can guarantee any nontrivial auditing, even under Gaussian feature distributions, for general halfspace subgroups.</p>
<p>There has been considerable recent interest in scoring properties on the basis of eviction risk. The success of methods for eviction prediction is typically evaluated using different measures of predictive accuracy. However, the underlying goal of such prediction is to direct appropriate assistance to households that may be at greater risk so they remain stably housed. Thus, we must ask the question of how useful such predictions are in targeting outreach efforts - informing action. In this paper, we investigate this question using a novel dataset that matches information on properties, evictions, and owners. We perform an eviction prediction task to produce risk scores and then use these risk scores to plan targeted outreach policies. We show that the risk scores are, in fact, useful, enabling a theoretical team of caseworkers to reach more eviction-prone properties in the same amount of time, compared to outreach policies that are either neighborhood-based or focus on buildings with a recent history of evictions. We also discuss the importance of neighborhood and ownership features in both risk prediction and targeted outreach. </p>
<p>Over the past years, a large number of fake news detection algorithms based on deep learning have emerged. However, they are often developed under different frameworks, each mandating distinct utilization methodologies, consequently hindering reproducibility. Additionally, a substantial amount of redundancy characterizes the code development of such fake news detection models. To address these concerns, we propose FaKnow, a unified and comprehensive fake news detection algorithm library. It encompasses a variety of widely used fake news detection models, categorized as content-based and social context-based approaches. This library covers the full spectrum of the model training and evaluation process, effectively organizing the data, models, and training procedures within a unified framework. Furthermore, it furnishes a series of auxiliary functionalities and tools, including visualization and logging. Our work contributes to the standardization and unification of fake news detection research, concurrently facilitating the endeavors of researchers in this field. The open-source code and documentation can be accessed at https://github.com/NPURG/FaKnow and https://faknow.readthedocs.io, respectively.</p>
<p>As VR devices become more prevalent in the consumer space, VR applications are likely to be increasingly used by users unfamiliar with VR. Detecting a user's level of familiarity with VR as an interaction medium offers the potential to provide on-demand training for acclimatization and prevents the user from being burdened by the VR environment while accomplishing their tasks. In this work, we present preliminary results of using deep classifiers to automatically detect familiarity with VR from hand tracking of users as they interact with a numeric passcode entry panel to unlock a VR door. We use a VR door as we envision it to be the first point of entry to collaborative virtual spaces, such as meeting rooms, offices, or clinics. Users who are unfamiliar with VR will nevertheless have used their hands to open doors with passcode entry panels in the real world. Thus, while a user may not be familiar with VR, they would be familiar with the task of opening the door. Using a pilot dataset consisting of 7 users familiar with VR and 7 not familiar with VR, we achieve a highest accuracy of 88.03% when 6 test users, 3 familiar and 3 not familiar, are evaluated with classifiers trained using data from the remaining 8 users. Our results indicate the potential of using user movement data to detect familiarity for the simple yet important task of secure passcode-based access.</p>
<p>Existing game AI research mainly focuses on enhancing agents' abilities to win games, but this does not inherently give humans a better experience when collaborating with these agents. For example, agents may dominate the collaboration and exhibit unintended or detrimental behaviors, leading to poor experiences for their human partners. In other words, most game AI agents are modeled in a "self-centered" manner. In this paper, we propose a "human-centered" modeling scheme for collaborative agents that aims to enhance the experience of humans. Specifically, we model the experience of humans as the goals they expect to achieve during the task. We expect agents to learn to enhance the extent to which humans achieve these goals while maintaining the agents' original abilities (e.g., winning games). To achieve this, we propose the Reinforcement Learning from Human Gain (RLHG) approach. The RLHG approach introduces a "baseline", which corresponds to the extent to which humans primitively achieve their goals, and encourages agents to learn behaviors that effectively help humans achieve their goals beyond this baseline. We evaluate the RLHG agent in the popular Multi-player Online Battle Arena (MOBA) game Honor of Kings by conducting real-world human-agent tests. Both objective performance and subjective preference results show that the RLHG agent provides participants with a better gaming experience.</p>
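<p>Read from the abstract, the reward structure amounts to the original task reward plus a bonus only for human goal achievement above the baseline; a minimal sketch (the weight and the baseline estimation are assumptions):</p>
<pre>
# Task reward plus a weighted bonus for human "gain" over the primitive baseline.
def rlhg_reward(task_reward, human_goal_value, baseline_value, w=0.5):
    gain = max(0.0, human_goal_value - baseline_value)
    return task_reward + w * gain

# e.g., human goal achievement of 0.7 where the baseline would be 0.4:
print(rlhg_reward(task_reward=1.0, human_goal_value=0.7, baseline_value=0.4))
</pre>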
<p>Large language models (LLMs), as epitomized by models like ChatGPT, have revolutionized the field of natural language processing (NLP). Along with this trend, code-based large language models such as StarCoder, WizardCoder, and CodeLlama have emerged, trained extensively on vast repositories of code data. Yet, inherent in their design, these models primarily focus on generative tasks like code generation, code completion, and comment generation, as well as general support for multiple programming languages. While the generic abilities of code LLMs are useful for many programmers, the area of high-performance computing (HPC) has a narrower set of requirements that make a smaller, more domain-specific LM a smarter choice. This paper introduces OMPGPT, a novel model meticulously designed to harness the inherent strengths of language models for OpenMP pragma generation. Furthermore, we adopt and adapt prompt engineering techniques from the NLP domain to create chain-of-OMP, an innovative strategy designed to enhance OMPGPT's effectiveness. Our extensive evaluations demonstrate that OMPGPT outperforms existing large language models specialized in OpenMP tasks while maintaining a notably smaller size, aligning it more closely with the typical hardware constraints of HPC environments. We consider our contribution a pivotal bridge connecting the advantages of language models with the specific demands of HPC tasks. The success of OMPGPT lays a solid foundation, suggesting its potential applicability and adaptability to a wider range of HPC tasks, thereby opening new avenues in the field of computational efficiency and effectiveness.</p>
<p>Fast and reliable transmission network reconfiguration is critical to improving power grid resilience to cyber-attacks. If the network reconfiguration following cyber-attacks is imperfect, secondary incidents may delay or interrupt post-attack restoration of the power grid. This paper proposes a framework for resilient transmission network reconfiguration that takes into account the impacts of cyber-attacks on the network reconfiguration process. First, the mechanism of cyber-attack propagation is analyzed based on the characteristics of network reconfiguration. Second, systematic resilience indices are extracted in which the impact of cyber-attacks on network reconfiguration is quantified. These indices are defined in terms of the restoration characteristics of the transmission power system. Third, representative cyber-attack incidents motivate an optimization-based model of resilient transmission network reconfiguration, and an optimal reconfiguration scheme is obtained. Finally, simulation results based on the IEEE 39-bus system verify the feasibility and effectiveness of the proposed framework in enhancing power grid resilience to cyber-attacks.</p>
<p>This paper presents LLM4SecHW, a novel framework for hardware debugging that leverages domain-specific Large Language Models (LLMs). Despite the success of LLMs in automating various software development tasks, their application in the hardware security domain has been limited due to the constraints of commercial LLMs and the scarcity of domain-specific data. To address these challenges, we propose a unique approach to compile a dataset of open-source hardware design defects and their remediation steps, utilizing version control data. This dataset provides a substantial foundation for training machine learning models for hardware. LLM4SecHW employs fine-tuning of medium-sized LLMs on this dataset, enabling the identification and rectification of bugs in hardware designs. This pioneering approach offers a reference workflow for fine-tuning domain-specific LLMs in other research areas. We evaluate the performance of our proposed system on various open-source hardware designs, demonstrating its efficacy in accurately identifying and correcting defects. Our work brings a new perspective on automating the quality control process in hardware design.</p>
<p>Digital Twins (DT) have become crucial for achieving sustainable and effective smart urban solutions. However, current DT modelling techniques cannot support the dynamicity of these smart city environments. This is caused by the lack of right-time data capturing in traditional approaches, resulting in inaccurate modelling and high resource and energy consumption challenges. To fill this gap, we explore spatiotemporal graphs and propose the Reinforcement Learning-based Adaptive Twining (RL-AT) mechanism with Deep Q-Networks (DQN). In doing so, our study contributes to advancing Green Cities and showcases tangible benefits in accuracy, synchronisation, resource optimization, and energy efficiency. We find that the spatiotemporal graphs offer consistent accuracy and 55% higher querying performance when implemented using graph databases. In addition, our model demonstrates right-time data capturing with 20% lower overhead and 25% lower energy consumption.</p>
<p>With the increasing need for inclusive and user-friendly technology, web accessibility is crucial to ensuring equal access to online content for individuals with disabilities, including visual, auditory, cognitive, or motor impairments. Despite the existence of accessibility guidelines and standards such as the Web Content Accessibility Guidelines (WCAG) of the W3C's Web Accessibility Initiative, over 90% of websites still fail to meet the necessary accessibility requirements. For web users with disabilities, there is a need for a tool to automatically fix web page accessibility errors. While research has demonstrated methods to find and target accessibility errors, no research has focused on effectively correcting such violations. This paper presents a novel approach to correcting accessibility violations on the web by modifying the document object model (DOM) in real time with foundation models. Leveraging accessibility error information, large language models (LLMs), and prompt engineering techniques, we achieved greater than a 51% reduction in accessibility violation errors after corrections on our novel benchmark: ACCESS. Our work demonstrates a valuable approach toward inclusive web content, and provides directions for future research to explore advanced methods to automate web accessibility.</p>
<p>Offline reinforcement learning (RL) algorithms can improve decision-making by stitching sub-optimal trajectories to obtain more optimal ones. This capability is a crucial factor in enabling RL to learn policies that are superior to the behavioral policy. Decision Transformer (DT), on the other hand, abstracts decision-making as sequence modeling and showcases competitive performance on offline RL benchmarks; however, recent studies demonstrate that DT lacks stitching capability, so endowing DT with stitching capability is vital to further improving its performance. In order to endow DT with stitching capability, we abstract trajectory stitching as expert matching and introduce our approach, ContextFormer, which integrates contextual information-based imitation learning (IL) and sequence modeling to stitch sub-optimal trajectory fragments by emulating the representations of a limited number of expert trajectories. To validate our claim, we conduct experiments from two perspectives: 1) we conduct extensive experiments on D4RL benchmarks under IL settings, and the experimental results demonstrate that ContextFormer can achieve competitive performance in multi-IL settings; 2) more importantly, we compare ContextFormer with diverse competitive DT variants using identical training datasets. The experimental results unveil ContextFormer's superiority, as it outperforms all other variants, showcasing its remarkable performance.</p>
<p>Long-term traffic prediction has always been a challenging task due to its dynamic temporal dependencies and complex spatial dependencies. In this paper, we propose a model that combines a hybrid Transformer with spatio-temporal self-supervised learning. The model enhances its robustness by applying adaptive data augmentation techniques at the sequence level and graph level of the traffic data. It utilizes the Transformer to overcome the limitations of recurrent neural networks in capturing long-term sequences, and employs Chebyshev polynomial graph convolution to capture complex spatial dependencies. Furthermore, considering the impact of spatio-temporal heterogeneity on traffic speed, we design two self-supervised learning tasks to model temporal and spatial heterogeneity, thereby improving the accuracy and generalization ability of the model. Experimental evaluations are conducted on two real-world datasets, PeMS04 and PeMS08, and the results are visualized and analyzed, demonstrating the superior performance of the proposed model.</p>
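<p>For reference, K-th order Chebyshev polynomial graph convolution (a standard construction; the paper's exact layer configuration is not reproduced here) can be sketched as:</p>
<pre>
# T_0(x) = x, T_1(x) = L~ x, T_k(x) = 2 L~ T_{k-1}(x) - T_{k-2}(x),
# with L~ the Laplacian rescaled to [-1, 1]; the terms are mixed by theta_k.
import torch
import torch.nn as nn

class ChebConv(nn.Module):
    def __init__(self, in_dim, out_dim, K):
        super().__init__()
        self.theta = nn.Parameter(torch.randn(K, in_dim, out_dim) * 0.01)

    def forward(self, x, L_tilde):               # x: (num_nodes, in_dim)
        Tx = [x, L_tilde @ x]
        for _ in range(2, self.theta.shape[0]):
            Tx.append(2 * L_tilde @ Tx[-1] - Tx[-2])
        return sum(t @ th for t, th in zip(Tx, self.theta))

n = 10
A = torch.rand(n, n); A = (A + A.T) / 2          # toy symmetric adjacency
L = torch.diag(A.sum(1)) - A                     # graph Laplacian
L_tilde = 2 * L / torch.linalg.eigvalsh(L).max() - torch.eye(n)
out = ChebConv(in_dim=4, out_dim=8, K=3)(torch.randn(n, 4), L_tilde)
</pre>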
<p>An effective multi-turn instruction-following assistant can be developed by creating a simulator that can generate useful interaction data. Apart from relying on its intrinsic weights, an ideal user simulator should also be able to rapidly bootstrap external knowledge in its raw form to simulate the multifarious diversity of text available over the internet. Previous user simulators generally lacked diversity, were mostly closed-domain, and necessitated rigid schemas, making them inefficient to scale rapidly to incorporate external knowledge. In this regard, we introduce Kaucus, a Knowledge-Augmented User Simulator framework, to outline a process of creating diverse user simulators that can seamlessly exploit external knowledge as well as benefit downstream assistant model training. Through two GPT-J based simulators, a Retrieval-Augmented Simulator and a Summary-Controlled Simulator, we generate diverse simulator-assistant interactions. Through reward and preference model-based evaluations, we find that these interactions serve as useful training data and create more helpful downstream assistants. We also find that incorporating knowledge through retrieval augmentation or summary control helps create better assistants.</p>
<p>Recently, efficient Vision Transformers have shown great performance with low latency on resource-constrained devices. Conventionally, they use 4x4 patch embeddings and a 4-stage structure at the macro level, while utilizing sophisticated attention with multi-head configuration at the micro level. This paper aims to address computational redundancy at all design levels in a memory-efficient manner. We discover that using larger-stride patchify stem not only reduces memory access costs but also achieves competitive performance by leveraging token representations with reduced spatial redundancy from the early stages. Furthermore, our preliminary analyses suggest that attention layers in the early stages can be substituted with convolutions, and several attention heads in the latter stages are computationally redundant. To handle this, we introduce a single-head attention module that inherently prevents head redundancy and simultaneously boosts accuracy by parallelly combining global and local information. Building upon our solutions, we introduce SHViT, a Single-Head Vision Transformer that obtains the state-of-the-art speed-accuracy tradeoff. For example, on ImageNet-1k, our SHViT-S4 is 3.3x, 8.1x, and 2.4x faster than MobileViTv2 x1.0 on GPU, CPU, and iPhone12 mobile device, respectively, while being 1.3% more accurate. For object detection and instance segmentation on MS COCO using Mask-RCNN head, our model achieves performance comparable to FastViT-SA12 while exhibiting 3.8x and 2.0x lower backbone latency on GPU and mobile device, respectively. </p>
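<p>A hedged sketch of a single-head attention module in the spirit described, attending over a subset of channels (global path) in parallel with the untouched remainder (local path); the split ratio and projections are illustrative assumptions, not the SHViT design:</p>
<pre>
# Single-head attention over half the channels, concatenated with the rest.
import torch
import torch.nn as nn

class SingleHeadAttention(nn.Module):
    def __init__(self, dim, attn_ratio=0.5):
        super().__init__()
        self.ad = int(dim * attn_ratio)          # channels routed to attention
        self.qkv = nn.Linear(self.ad, self.ad * 3)
        self.proj = nn.Linear(dim, dim)

    def forward(self, x):                        # x: (batch, tokens, dim)
        xa, xl = x[..., :self.ad], x[..., self.ad:]
        q, k, v = self.qkv(xa).chunk(3, dim=-1)
        attn = torch.softmax(q @ k.transpose(-2, -1) / self.ad ** 0.5, dim=-1)
        return self.proj(torch.cat([attn @ v, xl], dim=-1))

y = SingleHeadAttention(64)(torch.randn(2, 196, 64))
</pre>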
<p>Bias mitigation of Language Models has been the topic of many studies, with a recent focus on learning separate modules, such as adapters, for on-demand debiasing. Besides optimizing for a modularized debiased model, it is often critical in practice to control the degree of bias reduction at inference time, e.g., in order to tune for a desired performance-fairness trade-off in search results or to control the strength of debiasing in classification tasks. In this paper, we introduce the Controllable Gate Adapter (ConGater), a novel modular gating mechanism with adjustable sensitivity parameters, which allows for a gradual transition from the biased state of the model to the fully debiased version at inference time. We demonstrate ConGater's performance by (1) conducting adversarial debiasing experiments with three different models on three classification tasks with four protected attributes, and (2) reducing the bias of search results through fairness list-wise regularization to enable adjusting a trade-off between performance and fairness metrics. Our experiments on the classification tasks show that, compared to baselines of the same caliber, ConGater can maintain higher task performance while containing less information about the protected attributes. Our results on the retrieval task show that the fully debiased ConGater can achieve the same fairness performance while maintaining more than twice the task performance of recent strong baselines. Overall, besides its strong performance, ConGater enables continuous transitioning between the biased and debiased states of models, enhancing personalization of use and interpretability through controllability.</p>
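<p>The adjustable-gating idea can be sketched as an inference-time interpolation between the original and adapter-transformed representations (parameter names are hypothetical; ConGater's actual gating function is not reproduced here):</p>
<pre>
# A sensitivity knob omega in [0, 1] blends the biased path (omega = 0)
# with the debiasing adapter path (omega -> 1).
import torch
import torch.nn as nn

class ControllableGate(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.adapter = nn.Sequential(nn.Linear(dim, dim), nn.Tanh())

    def forward(self, h, omega):
        gate = omega * torch.sigmoid(self.adapter(h))
        return h * (1 - gate) + self.adapter(h) * gate

h = torch.randn(4, 768)
for omega in (0.0, 0.5, 1.0):                    # continuous inference-time control
    _ = ControllableGate(768)(h, omega)
</pre>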
<p>Peer-to-peer (P2P) lending has emerged as a distinctive financing mechanism, linking borrowers with lenders through online platforms. However, P2P lending faces the challenge of information asymmetry, as lenders often lack sufficient data to assess the creditworthiness of borrowers. This paper proposes a novel approach to address this issue by leveraging the textual descriptions provided by borrowers during the loan application process. Our methodology involves processing these textual descriptions using a Large Language Model (LLM), a powerful tool capable of discerning patterns and semantics within the text. Transfer learning is applied to adapt the LLM to the specific task at hand. </p> <p>Our results derived from the analysis of the Lending Club dataset show that the risk score generated by BERT, a widely used LLM, significantly improves the performance of credit risk classifiers. However, the inherent opacity of LLM-based systems, coupled with uncertainties about potential biases, underscores critical considerations for regulatory frameworks and engenders trust-related concerns among end-users, opening new avenues for future research in the dynamic landscape of P2P lending and artificial intelligence. </p>
<p>The remarkable prowess of diffusion models in image generation has spurred efforts to extend their application beyond generative tasks. However, a persistent challenge remains: the lack of a unified approach for applying diffusion models to visual perception tasks with diverse semantic granularity requirements. Our purpose is to establish a unified visual perception framework, capitalizing on the potential synergies between generative and discriminative models. In this paper, we propose Vermouth, a simple yet effective framework comprising a pre-trained Stable Diffusion (SD) model containing rich generative priors, a unified head (U-head) capable of integrating hierarchical representations, and an adapted expert providing discriminative priors. Comprehensive investigations unveil potential characteristics of Vermouth, such as the varying granularity of perception concealed in latent variables at distinct time steps and various U-net stages. We emphasize that there is no necessity to incorporate a heavyweight or intricate decoder to transform diffusion models into potent representation learners. Extensive comparative evaluations against tailored discriminative models showcase the efficacy of our approach on zero-shot sketch-based image retrieval (ZS-SBIR), few-shot classification, and open-vocabulary semantic segmentation tasks. The promising results demonstrate the potential of diffusion models as formidable learners, establishing their significance in furnishing informative and robust visual representations.</p>
<p>A multiagent system can be viewed as a society of autonomous agents, whose interactions can be effectively regulated via social norms. In general, the norms of a society are not hardcoded but emerge from the agents' interactions. Specifically, how the agents in a society react to each other's behavior and respond to the reactions of others determines which norms emerge in the society. We think of these reactions by an agent to the satisfactory or unsatisfactory behaviors of another agent as communications from the first agent to the second agent. Understanding these communications is a kind of social intelligence: these communications provide natural drivers for norm emergence by pushing agents toward certain behaviors, which can become established as norms. Whereas it is well-known that sanctioning can lead to the emergence of norms, we posit that a broader kind of social intelligence can prove more effective in promoting cooperation in a multiagent system. </p> <p>Accordingly, we develop Nest, a framework that models social intelligence in the form of a wider variety of communications and understanding of them than in previous work. To evaluate Nest, we develop a simulated pandemic environment and conduct simulation experiments to compare Nest with baselines considering a combination of three kinds of social communication: sanction, tell, and hint. </p> <p>We find that societies formed of Nest agents achieve norms faster; moreover, Nest agents effectively avoid undesirable consequences, which are negative sanctions and deviation from goals, and yield higher satisfaction for themselves than baseline agents despite requiring only an equivalent amount of information. </p>
<p>The problem of Remaining Useful Life (RUL) prediction, which aims to provide an accurate estimate of the remaining time from the current predicting moment to the complete failure of the device, has gained significant attention from researchers in recent years. In this paper, to overcome the shortcomings of the rigid combination of temporal and spatial features in most existing RUL prediction approaches, a spatial-temporal homogeneous feature extractor, named the Dual-Mixer model, is first proposed. Flexible layer-wise progressive feature fusion is employed to ensure the homogeneity of spatial-temporal features and enhance prediction accuracy. Secondly, the Feature Space Global Relationship Invariance (FSGRI) training method is introduced based on supervised contrastive learning. This method maintains the consistency of relationships among sample features and their degradation patterns during model training, simplifying the subsequent regression task in the output layer and improving the model's performance in RUL prediction. Finally, the effectiveness of the proposed method is validated through comparisons with the latest research works on the C-MAPSS dataset. The Dual-Mixer model demonstrates superiority across most metrics, while the FSGRI training method shows an average improvement of 7.00% and 2.41% in RMSE and MAPE, respectively, for all baseline models. Our experiments and model code are publicly available at https://github.com/fuen1590/PhmDeepLearningProjects.</p>
<p>Robotic hands are an important tool for replacing humans in handling toxic or radioactive materials. However, they are usually highly expensive, and in many cases, once contaminated, they cannot be re-used. Some solutions cope with this challenge by 3D printing parts of a tendon-based hand. However, fabrication then requires additional assembly steps, so a novice user may have difficulty fabricating a new hand upon contamination of the previous one. We propose the Print-N-Grip (PNG) hand, a tendon-based underactuated mechanism able to adapt to the shape of objects. The hand is fabricated through one-shot 3D printing with no additional engineering effort and can accommodate as many fingers as desired by the practitioner. Due to its low cost, the PNG hand can easily be detached from a universal base for disposal upon contamination, and replaced by a newly printed one. In addition, the PNG hand is scalable, such that one can effortlessly resize the computerized model and print. We present the design of the PNG hand along with experiments showing its capabilities and high durability.</p>
<p>Creating and maximizing influence among customers is one of the central goals of an advertiser, and hence remains an active area of research. In this advertisement technique, an advertiser approaches an influence provider for a specific number of views of their content on a payment basis. If the influence provider can deliver the required number of views or more, he receives the full payment; otherwise, he receives a partial payment. From the influence provider's perspective, offering either more or fewer views than required is a loss. This is formalized as 'Regret', and the influence provider's goal is naturally to minimize this quantity. In this paper, we solve this problem in the context of billboard advertisement and pose it as a discrete optimization problem. We propose four efficient solution approaches for this problem and analyze their time and space complexity. We implement all the solution methodologies on real-life datasets and compare the obtained results with existing solution approaches from the literature. We observe that the proposed solutions lead to less regret while taking less computational time.</p>
<p>Apparel's significant role in human appearance underscores the importance of garment digitalization for digital human creation. Recent advances in 3D content creation are pivotal here; nonetheless, garment generation from text guidance is still nascent. We introduce a text-driven 3D garment generation framework, DressCode, which aims to democratize design for novices and offers immense potential in fashion design, virtual try-on, and digital human creation. For our framework, we first introduce SewingGPT, a GPT-based architecture integrating cross-attention with text-conditioned embedding to generate sewing patterns with text guidance. We also tailor a pre-trained Stable Diffusion model for high-quality, tile-based PBR texture generation. By leveraging a large language model, our framework generates CG-friendly garments through natural language interaction. Our method also facilitates pattern completion and texture editing, simplifying the process for designers through user-friendly interaction. With comprehensive evaluations and comparisons with other state-of-the-art methods, our method showcases the best quality and alignment with input prompts. User studies further validate our high-quality rendering results, highlighting its practical utility and potential in production settings.</p>
<p>While large language models (LLMs) are increasingly being used for program synthesis, they lack the global view needed to develop useful abstractions; they generally predict programs one at a time, often repeating the same functionality. Generating redundant code from scratch is both inefficient and error-prone. To address this, we propose Refactoring for Generalizable Abstraction Learning (ReGAL), a gradient-free method for learning a library of reusable functions via code refactorization, i.e. restructuring code without changing its execution output. ReGAL learns from a small set of existing programs, iteratively verifying and refining its abstractions via execution. We find that the shared function libraries discovered by ReGAL make programs easier to predict across diverse domains. On three datasets (LOGO graphics generation, Date reasoning, and TextCraft, a Minecraft-based text game), both open-source and proprietary LLMs improve in accuracy when predicting programs with ReGAL functions. For CodeLlama-13B, ReGAL results in absolute accuracy increases of 11.5% on graphics, 26.1% on date understanding, and 8.1% on TextCraft, outperforming GPT-3.5 in two of three domains. Our analysis reveals ReGAL's abstractions encapsulate frequently-used subroutines as well as environment dynamics. </p>
<p>Image restoration is a fundamental problem that involves recovering a high-quality clean image from its degraded observation. All-In-One image restoration models can effectively restore images from various types and levels of degradation using degradation-specific information as prompts to guide the restoration model. In this work, we present the first approach that uses human-written instructions to guide the image restoration model. Given natural language prompts, our model can recover high-quality images from their degraded counterparts, considering multiple degradation types. Our method, InstructIR, achieves state-of-the-art results on several restoration tasks including image denoising, deraining, deblurring, dehazing, and (low-light) image enhancement. InstructIR improves +1dB over previous all-in-one restoration methods. Moreover, our dataset and results represent a novel benchmark for new research on text-guided image restoration and enhancement. Our code, datasets and models are available at: https://github.com/mv-lab/InstructIR </p>
<p>Text simplification aims to make technical texts more accessible to laypeople but often results in deletion of information and vagueness. This work proposes InfoLossQA, a framework to characterize and recover simplification-induced information loss in the form of question-and-answer (QA) pairs. Building on the theory of Question Under Discussion, the QA pairs are designed to help readers deepen their knowledge of a text. We conduct a range of experiments with this framework. First, we collect a dataset of 1,000 linguist-curated QA pairs derived from 104 LLM simplifications of scientific abstracts of medical studies. Our analyses of this data reveal that information loss occurs frequently, and that the QA pairs give a high-level overview of what information was lost. Second, we devise two methods for this task: end-to-end prompting of open-source and commercial language models, and a natural language inference pipeline. With a novel evaluation framework considering the correctness of QA pairs and their linguistic suitability, our expert evaluation reveals that models struggle to reliably identify information loss and to apply standards similar to those of humans for what constitutes information loss.</p>
<p>We propose a novel GPU-cluster scheduler for distributed DL (DDL) workloads that enables proximity-based consolidation of GPU resources based on the DDL jobs' sensitivities to the anticipated communication-network delays. Our scheduler consists of three major components: (i) a classical delay scheduling algorithm to facilitate job placement and consolidation; (ii) a network-sensitive job preemption strategy; and (iii) an "auto-tuner" mechanism to optimize delay timers for effective delay scheduling. Additionally, to enable a cost-effective methodology for large-scale experiments, we develop a data-driven DDL cluster simulation platform. Employing the simulation platform, we compare against several state-of-the-art alternatives on real-world workload traces to demonstrate the benefits of our design. Our scheduler can provide an improvement of up to 69% in end-to-end makespan for training all jobs compared to the prevailing consolidation-based scheduling methods, while reducing the average job completion time by up to 83% and minimizing communication overheads by up to 98% under congested networking conditions.</p>
<p>In this paper, we present an advanced approach to solving the inverse rig problem in blendshape animation, using high-quality corrective blendshapes. Our algorithm introduces novel enhancements in three key areas: ensuring high data fidelity in reconstructed meshes, achieving greater sparsity in weight distributions, and facilitating smoother frame-to-frame transitions. While the incorporation of corrective terms is a known practice, our method differentiates itself by employing a unique combination of $l_1$ norm regularization for sparsity and a temporal smoothness constraint through roughness penalty, focusing on the sum of second differences in consecutive frame weights. A significant innovation in our approach is the temporal decoupling of blendshapes, which permits simultaneous optimization across entire animation sequences. This feature sets our work apart from existing methods and contributes to a more efficient and effective solution. Our algorithm exhibits a marked improvement in maintaining data fidelity and ensuring smooth frame transitions when compared to prior approaches that either lack smoothness regularization or rely solely on linear blendshape models. In addition to superior mesh resemblance and smoothness, our method offers practical benefits, including reduced computational complexity and execution time, achieved through a novel parallelization strategy using clustering methods. Our results not only advance the state of the art in terms of fidelity, sparsity, and smoothness in inverse rigging but also introduce significant efficiency improvements. The source code will be made available upon acceptance of the paper. </p>
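<p>Read literally, the objective described above combines a data-fidelity term, an $l_1$ sparsity term, and a roughness penalty on second differences of consecutive frame weights; a plausible summary (with $f$ the rig function, $w_t$ the frame-$t$ blendshape weights including corrective terms, and $m_t$ the target meshes; notation ours, weighting unspecified) is $$\min_{w_1,\dots,w_T}\ \sum_{t=1}^{T} \lVert f(w_t) - m_t \rVert_2^2 + \lambda_1 \sum_{t=1}^{T} \lVert w_t \rVert_1 + \lambda_2 \sum_{t=2}^{T-1} \lVert w_{t+1} - 2w_t + w_{t-1} \rVert_2^2,$$ where the temporal decoupling noted by the authors is what allows all $T$ frames to be optimized simultaneously.</p>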
<p>Extracting meaningful information from high-dimensional data poses a formidable modeling challenge, particularly when the data is obscured by noise or represented through different modalities. In this research, we propose a novel non-parametric modeling approach, leveraging the Gaussian Process (GP), to characterize high-dimensional data by mapping it to a latent low-dimensional manifold. This model, named the Latent Discriminative Generative Decoder (LDGD), utilizes both the data (or its features) and associated labels (such as category or stimulus) in the manifold discovery process. To infer the latent variables, we derive a Bayesian solution, allowing LDGD to effectively capture inherent uncertainties in the data while enhancing the model's predictive accuracy and robustness. We demonstrate the application of LDGD on both synthetic and benchmark datasets. Not only does LDGD infer the manifold accurately, but its prediction accuracy in anticipating labels also surpasses state-of-the-art approaches. We introduce inducing points to reduce the computational complexity of Gaussian Processes for large datasets. This enhancement facilitates batch training, allowing for more efficient processing and scalability in handling extensive data collections. Additionally, we illustrate that LDGD achieves higher accuracy in predicting labels and operates effectively with a limited training dataset, underscoring its efficiency and effectiveness in scenarios where data availability is constrained. These attributes set the stage for the development of non-parametric modeling approaches in the analysis of high-dimensional data, especially in fields where data are both high-dimensional and complex.</p>
<p>Pneumatic systems are common in manufacturing, healthcare, transportation, robotics, and many other fields. Failures in these systems can have very serious consequences, particularly if they go undetected. In this work, we present an air-powered error detector device that can detect and respond to failures in pneumatically actuated systems. The device contains 21 monolithic membrane valves that act like transistors in a pneumatic logic "circuit" that uses vacuum to represent TRUE and atmospheric pressure as FALSE. Three pneumatic exclusive-OR (XOR) gates are used to calculate the parity bit corresponding to the values of several control bits. If the calculated value of the parity bit differs from the expected value, then an error (like a leak or a blocked air line) has been detected and the device outputs a pneumatic error signal which can in turn be used to alert a user, shut down the system, or take some other action. As a proof-of-concept, we used our pneumatic error detector to monitor the operation of a medical device, an intermittent pneumatic compression (IPC) device commonly used to prevent the formation of life-threatening blood clots in the wearer's legs. Experiments confirm that when the IPC device was damaged, the pneumatic error detector immediately recognized the error (a leak) and alerted the wearer using sound. By providing a simple and low-cost way to add fault detection to pneumatic actuation systems without using sensors, our pneumatic error detector can promote safety and reliability across the wide range of pneumatic systems. </p>
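<p>The parity-check logic that the 21-valve circuit realizes pneumatically is compact when written out in code; note that folding four control bits into a parity bit takes exactly three XOR gates, matching the device description (the four-bit width is our assumption):</p>
<pre>
# Parity check: vacuum = TRUE, atmospheric pressure = FALSE.
def xor(a, b):
    return a != b

def parity(bits):                                # cascade of XOR gates
    p = bits[0]
    for b in bits[1:]:
        p = xor(p, b)
    return p

control_bits = [True, False, True, False]
expected = parity(control_bits)
observed = parity([True, False, False, False])   # a leak/blockage flips a bit
error_signal = xor(expected, observed)           # TRUE -> raise the alarm
print(error_signal)
</pre>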
<p>This paper presents a modeling effort to explore the underlying physics of temperature evolution during additive friction stir deposition (AFSD) by a human-AI teaming approach. AFSD is an emerging solid-state additive manufacturing technology that deposits materials without melting. However, both process modeling and modeling of the AFSD tool are at an early stage. In this paper, a human-AI teaming approach is proposed to combine models based on first principles with AI. The resulting human-informed machine learning method, denoted as AFSD-Physics, can effectively learn the governing equations of temperature evolution at the tool and the build from in-process measurements. Experiments are designed and conducted to collect in-process measurements for the deposition of aluminum 7075 with a total of 30 layers. The acquired governing equations are physically interpretable models with low computational cost and high accuracy. Model predictions show good agreement with the measurements. Experimental validation with new process parameters demonstrates the model's generalizability and potential for use in tool temperature control and process optimization. </p>
<p>The growing reliance on online services underscores the crucial role of recommendation systems, especially on social media platforms seeking increased user engagement. This study investigates how recommendation systems influence the impact of personal behavioral traits on social network dynamics. It explores the interplay between homophily, users' openness to novel ideas, and recommendation-driven exposure to new opinions. Additionally, the research examines the impact of recommendation systems on the diversity of newly generated ideas, shedding light on the challenges and opportunities in designing effective systems that balance the exploration of new ideas with the risk of reinforcing biases or filtering valuable, unconventional concepts. </p>
<p>There is a growing demand for transparency in search engines to understand how search results are curated and to enhance users' trust. Prior research has introduced search result explanations with a focus on how to explain, assuming explanations are beneficial. Our study takes a step back to examine if search explanations are needed and when they are likely to provide benefits. Additionally, we summarize key characteristics of helpful explanations and share users' perspectives on explanation features provided by Google and Bing. Interviews with non-technical individuals reveal that users do not always seek or understand search explanations and mostly desire them for complex and critical tasks. They find Google's search explanations too obvious but appreciate the ability to contest search results. Based on our findings, we offer design recommendations for search engines and explanations to help users better evaluate search results and enhance their search experience. </p>
back
<p>Artificial intelligence (AI) has seen remarkable advancements across various domains, including natural language processing, computer vision, autonomous vehicles, and biology. However, the rapid expansion of AI technologies has escalated the demand for more powerful computing resources. As digital computing approaches fundamental limits, neuromorphic photonics emerges as a promising platform to complement existing digital systems. In neuromorphic photonic computing, photonic devices are controlled using analog signals. This necessitates the use of digital-to-analog converters (DAC) and analog-to-digital converters (ADC) for interfacing with these devices during inference and training. However, data movement between memory and these converters in conventional von Neumann computing architectures consumes energy. To address this, analog memory co-located with photonic computing devices is proposed. This approach aims to reduce the reliance on DACs and ADCs and minimize data movement to enhance compute efficiency. This paper demonstrates a monolithically integrated neuromorphic photonic circuit with co-located capacitive analog memory and compares various analog memory technologies for neuromorphic photonic computing using the MNIST dataset as a benchmark. </p>
back
<p>The Kinematic Theory of rapid movements, and its associated Sigma-Lognormal model, describes 2D spatiotemporal trajectories. It is constructed mainly as a temporal overlap of curves between virtual target points. Specifically, it uses an arc and a lognormal as primitives for the representation of the trajectory and velocity, respectively. This paper proposes an extension of this model, which we call the Kinematic Theory Transform, that establishes a mathematical framework allowing further primitives to be used. Mainly, we evaluate Euler curves to link virtual target points and Gaussian, Beta, Gamma, Double-bounded lognormal, and Generalized Extreme Value functions to model the bell-shaped velocity profile. Using these primitives, we report reconstruction results with spatiotemporal trajectories executed by human beings, animals, and anthropomorphic robots. </p>
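<p>For concreteness, below is a minimal numpy sketch of the lognormal speed primitive on which the Sigma-Lognormal model is built; the parameter values are illustrative, and the alternative primitives evaluated in the paper (Gaussian, Beta, Gamma, and others) would replace this function in the same role.</p>

```python
import numpy as np

def lognormal_velocity(t, D=1.0, t0=0.0, mu=-1.5, sigma=0.3):
    """Bell-shaped lognormal speed profile of a single stroke: amplitude D,
    time shift t0, log-time mean mu, and log-time deviation sigma.
    Parameter values here are illustrative, not fitted to data."""
    v = np.zeros_like(t, dtype=float)
    valid = t > t0
    tau = t[valid] - t0
    v[valid] = (D / (sigma * np.sqrt(2.0 * np.pi) * tau)
                * np.exp(-(np.log(tau) - mu) ** 2 / (2.0 * sigma ** 2)))
    return v

t = np.linspace(0.0, 1.5, 500)
profile = lognormal_velocity(t)  # one stroke's speed; a trajectory overlaps many
```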
back
<p>In the realm of Earth science, effective cloud property retrieval, encompassing cloud masking, cloud phase classification, and cloud optical thickness (COT) prediction, remains pivotal. Traditional methodologies necessitate distinct models for each sensor instrument due to their unique spectral characteristics. Recent strides in Earth science research have embraced machine learning and deep learning techniques to extract features from satellite datasets' spectral observations. However, prevailing approaches lack novel architectures accounting for hierarchical relationships among retrieval tasks. Moreover, considering the spectral diversity among existing sensors, the development of models with robust generalization capabilities over different sensor datasets is imperative. Surprisingly, there is a dearth of methodologies addressing the selection of an optimal model for diverse datasets. In response, this paper introduces MT-HCCAR, an end-to-end deep learning model employing multi-task learning to simultaneously tackle cloud masking, cloud phase retrieval (classification tasks), and COT prediction (a regression task). MT-HCCAR integrates a hierarchical classification network (HC) and a classification-assisted attention-based regression network (CAR), enhancing precision and robustness in cloud labeling and COT prediction. Additionally, a comprehensive model selection method rooted in K-fold cross-validation, the one-standard-error rule, and two introduced performance scores is proposed to select the optimal model over three simulated satellite datasets (OCI, VIIRS, and ABI). Experiments comparing MT-HCCAR with baseline methods, ablation studies, and the model selection affirm the superiority and generalization capabilities of MT-HCCAR. </p>
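<p>As an illustration of the selection principle described above, here is a generic K-fold plus one-standard-error sketch; the API, scoring function, and model ordering are assumptions for illustration, not the paper's MT-HCCAR code or its two introduced performance scores.</p>

```python
import numpy as np
from sklearn.model_selection import KFold

def one_se_select(models, X, y, fit_score, k=5, seed=0):
    """Pick the simplest model whose mean CV score is within one standard
    error of the best mean score. `models` is assumed ordered from simplest
    to most complex; `fit_score(model, Xtr, ytr, Xva, yva)` returns a
    validation score where higher is better."""
    cv = KFold(n_splits=k, shuffle=True, random_state=seed)
    means, ses = [], []
    for model in models:
        scores = [fit_score(model, X[tr], y[tr], X[va], y[va])
                  for tr, va in cv.split(X)]
        means.append(np.mean(scores))
        ses.append(np.std(scores, ddof=1) / np.sqrt(k))
    best = int(np.argmax(means))
    threshold = means[best] - ses[best]
    for i, m in enumerate(means):   # first (simplest) model over the threshold
        if m >= threshold:
            return models[i]
```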
back
<p>This work undertakes studies to evaluate interpretability methods for time-series deep learning. Sensitivity analysis assesses how input changes affect the output, constituting a key component of interpretation. Among post-hoc interpretation methods such as back-propagation, perturbation, and approximation, my work investigates perturbation-based sensitivity analysis methods on modern Transformer models to benchmark their performance. Specifically, my work answers three research questions: 1) Do different sensitivity analysis (SA) methods yield comparable outputs and attribute importance rankings? 2) Using the same sensitivity analysis method, do different Deep Learning (DL) models impact the output of the sensitivity analysis? 3) How well do the results from sensitivity analysis methods align with the ground truth? </p>
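<p>A minimal sketch of a perturbation-based sensitivity analysis of the kind benchmarked here, assuming a black-box model that maps a (timesteps, features) array to a scalar; the perturbation scale and sample count are illustrative choices.</p>

```python
import numpy as np

def perturbation_sensitivity(model, x, eps=1e-2, n_samples=32, seed=0):
    """Estimate per-feature importance of a time-series model by measuring
    how much small random perturbations of each feature change the output."""
    rng = np.random.default_rng(seed)
    base = model(x)
    importance = np.zeros(x.shape[1])
    for j in range(x.shape[1]):
        deltas = []
        for _ in range(n_samples):
            x_pert = x.copy()
            x_pert[:, j] += eps * rng.standard_normal(x.shape[0])
            deltas.append(abs(model(x_pert) - base))
        importance[j] = np.mean(deltas)
    return importance   # rank features by descending importance
```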
back
<p>Deep learning-based informative band selection methods on hyperspectral images (HSI) have recently gained intense attention to eliminate spectral correlation and redundancies. However, existing deep learning-based methods either need additional post-processing strategies to select the descriptive bands or optimize the model indirectly, because the discrete band-selection variables cannot be directly parameterized. To overcome these limitations, this work proposes a novel end-to-end network for informative band selection. The proposed network is inspired by advances in concrete autoencoders (CAE) and the dropout feature ranking strategy. Different from traditional deep learning-based methods, the proposed network is trained directly to produce the required band subset, eliminating the need for further post-processing. Experimental results on four HSI scenes show that the proposed dropout CAE achieves substantial and effective performance levels, outperforming the competing methods. </p>
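<p>The selection mechanism can be sketched with a Gumbel-softmax ("concrete") layer in PyTorch, in the spirit of a concrete autoencoder's encoder; the class name and hyperparameters below are illustrative, not the paper's architecture.</p>

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ConcreteBandSelector(nn.Module):
    """Each of the k outputs is a relaxed one-hot mixture of input bands that
    anneals toward a discrete band choice as the temperature decreases."""
    def __init__(self, n_bands: int, k: int):
        super().__init__()
        self.logits = nn.Parameter(torch.zeros(k, n_bands))

    def forward(self, x, temperature=1.0):
        # x: (batch, n_bands); each row of `weights` approaches a one-hot vector
        weights = F.gumbel_softmax(self.logits, tau=temperature, dim=-1)
        return x @ weights.t()           # (batch, k) selected-band representation

    def selected_bands(self):
        return self.logits.argmax(dim=-1)  # discrete band indices after training
```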
back
<p>FPGA technology mapping is the process of implementing a hardware design expressed in high-level HDL (hardware description language) code using the low-level, architecture-specific primitives of the target FPGA. As FPGAs become increasingly heterogeneous, achieving high performance requires hardware synthesis tools that better support mapping to complex, highly configurable primitives like digital signal processors (DSPs). Current tools support DSP mapping via handwritten special-case mapping rules, which are laborious to write, error-prone, and often overlook mapping opportunities. We introduce Lakeroad, a principled approach to technology mapping via sketch-guided program synthesis. Lakeroad leverages two techniques -- architecture-independent sketch templates and semantics extraction from HDL -- to provide extensible technology mapping with stronger correctness guarantees and higher coverage of mapping opportunities than state-of-the-art tools. Across representative microbenchmarks, Lakeroad produces 2--3.5$\times$ the number of optimal mappings compared to proprietary state-of-the-art tools and 6--44$\times$ the number of optimal mappings compared to popular open-source tools, while also providing correctness guarantees not given by any other tool. </p>
back
<p>Scheduling real-time tasks that utilize GPUs with analyzable guarantees poses a significant challenge due to the intricate interaction between CPU and GPU resources, as well as the complex GPU hardware and software stack. While much research has been conducted in the real-time research community, several limitations persist, including the absence or limited availability of preemption, extended blocking times, and/or the need for extensive modifications to program code. In this paper, we propose two novel techniques, namely the kernel thread and IOCTL-based approaches, to enable preemptive priority-based scheduling for real-time GPU tasks. Our approaches exert control over GPU context scheduling at the device driver level and enable preemptive GPU scheduling based on task priorities. The kernel thread-based approach achieves this without requiring modifications to user-level programs, while the IOCTL-based approach needs only a single macro at the boundaries of GPU access segments. In addition, we provide a comprehensive response time analysis that takes into account overlaps between different task segments, mitigating pessimism in worst-case estimates. Through empirical evaluations and case studies, we demonstrate the effectiveness of the proposed approaches in improving taskset schedulability and timeliness of real-time tasks. The results highlight significant improvements over prior work, with up to 40\% higher schedulability, while also achieving predictable worst-case behavior on Nvidia Jetson embedded platforms. </p>
back
<p>Selection of hyperparameters in deep neural networks is a challenging problem due to the wide search space and the emergence of various layers with specific hyperparameters. In particular, neural architecture selection of convolutional neural networks (CNNs) for spectrum sensing has received little attention. Here, we develop a method using reinforcement learning (Q-learning) to systematically search and evaluate various architectures for generated datasets including different signals and channels in the spectrum sensing problem. We show by extensive simulations that CNN-based detectors proposed by our developed method outperform several detectors in the literature. For the most complex dataset, the proposed approach provides a 9% enhancement in accuracy at the cost of higher computational complexity. Furthermore, a novel method using a multi-armed bandit model for selection of the sensing time is proposed to achieve higher throughput and accuracy while minimizing the consumed energy. The method dynamically adjusts the sensing time under time-varying channel conditions without prior information. We demonstrate through a simulated scenario that the proposed method improves the achieved reward by about 20% compared to conventional policies. Consequently, this study effectively manages the selection of important hyperparameters for CNN-based detectors, offering superior performance for cognitive radio networks. </p>
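<p>The sensing-time selection can be sketched as an epsilon-greedy bandit; the candidate sensing times, reward model, and exploration rate below are toy assumptions standing in for the throughput/accuracy/energy trade-off described above.</p>

```python
import numpy as np

rng = np.random.default_rng(0)
arms = [0.5, 1.0, 2.0, 5.0]          # candidate sensing times (ms), illustrative
history = [[] for _ in arms]         # observed rewards per arm

def select_arm(epsilon=0.1):
    """Epsilon-greedy: explore occasionally, otherwise pick the arm with the
    best empirical mean reward so far."""
    if rng.random() < epsilon:
        return int(rng.integers(len(arms)))
    means = [np.mean(h) if h else -np.inf for h in history]
    return int(np.argmax(means))

for step in range(1000):
    i = select_arm()
    # Toy stand-in for the true reward (throughput/accuracy minus energy cost);
    # in the paper the reward comes from the time-varying channel.
    reward = rng.normal(loc=1.0 - 0.2 * abs(arms[i] - 2.0), scale=0.1)
    history[i].append(reward)

best = int(np.argmax([np.mean(h) if h else -np.inf for h in history]))
print("preferred sensing time:", arms[best], "ms")
```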
back
<p>In high-end visual effects pipelines, a customized (and expensive) light stage system is (typically) used to scan an actor in order to acquire both geometry and texture for various expressions. Aiming towards democratization, we propose a novel pipeline for obtaining geometry and texture as well as enough expression information to build a customized person-specific animation rig without using a light stage or any other high-end hardware (or manual cleanup). A key novel idea consists of warping real-world images to align with the geometry of a template avatar and subsequently projecting the warped image into the template avatar's texture; importantly, this allows us to leverage baked-in real-world lighting/texture information in order to create surrogate facial features (and bridge the domain gap) for the sake of geometry reconstruction. Not only can our method be used to obtain a neutral expression geometry and de-lit texture, but it can also be used to improve avatars after they have been imported into an animation system (noting that such imports tend to be lossy, while also hallucinating various features). Since a default animation rig will contain template expressions that do not correctly correspond to those of a particular individual, we use a Simon Says approach to capture various expressions and build a person-specific animation rig (that moves like they do). Our aforementioned warping/projection method has high enough efficacy to reconstruct geometry corresponding to each expression. </p>
back
<p>Battery-constrained power consumption, compute limitations, and high frame rate requirements in head-mounted displays present unique challenges in the drive to present increasingly immersive and comfortable imagery in virtual reality. However, humans are not equally sensitive to all regions of the visual field, and perceptually-optimized rendering techniques are increasingly utilized to address these bottlenecks. Many of these techniques are gaze-contingent and often render reduced detail away from a user's fixation. Such techniques are dependent on spatio-temporally-accurate gaze tracking and can result in obvious visual artifacts when eye tracking is inaccurate. In this work we present a gaze-contingent rendering technique which only requires saccade detection, bypassing the need for highly-accurate eye tracking. In our first experiment, we show that visual acuity is reduced for several hundred milliseconds after a saccade. In our second experiment, we use these results to reduce the rendered image resolution after saccades in a controlled psychophysical setup, and find that observers cannot discriminate between saccade-contingent reduced-resolution rendering and full-resolution rendering. Finally, in our third experiment, we introduce a 90 pixels per degree headset and validate our saccade-contingent rendering method under typical VR viewing conditions. </p>
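<p>Because the method needs only saccade detection rather than precise gaze position, a simple velocity-threshold detector suffices conceptually; the sketch below uses a common 30 deg/s threshold, which is an illustrative choice rather than the paper's parameter.</p>

```python
import numpy as np

def detect_saccades(gaze_deg, timestamps, velocity_threshold=30.0):
    """Flag samples where angular eye velocity exceeds the threshold.
    gaze_deg: (N, 2) gaze angles in degrees; timestamps: (N,) seconds."""
    dt = np.diff(timestamps)
    speed = np.linalg.norm(np.diff(gaze_deg, axis=0), axis=1) / dt
    return speed > velocity_threshold   # True where a saccade is in flight

# After a detected saccade, rendering resolution can be reduced for a few
# hundred milliseconds while visual acuity is suppressed, then restored.
```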
back
<p>Utilizing administrative data to predict outcomes is an important application area of machine learning, particularly in healthcare. Most administrative data records are timestamped and the pattern of records over time is a key input for machine learning models. This paper explores how best to divide the observation window of a machine learning model into time segments or "bins". A computationally efficient process is presented that identifies which data features benefit most from smaller, higher resolution time segments. Results generated on healthcare and housing/homelessness administrative data demonstrate that optimizing the time bin size of these high priority features while using a single time bin for the other features achieves machine learning models that are simpler and quicker to train. This approach also achieves similar and sometimes better performance than more complex models that default to representing all data features with the same time resolution. </p>
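<p>A pandas sketch of the mixed-resolution binning idea, assuming timestamped records with person_id, feature, and days_before_cutoff columns; the column names and bin sizes are illustrative, not the paper's datasets.</p>

```python
import pandas as pd

def bin_features(records, high_res_features, window_days=365, bin_days=30):
    """Count records per person: high-priority features get small time bins
    across the observation window; all other features share a single bin."""
    r = records[records.days_before_cutoff < window_days]
    hi = r[r.feature.isin(high_res_features)].copy()
    lo = r[~r.feature.isin(high_res_features)]
    hi["bin"] = (hi.days_before_cutoff // bin_days).astype(int)
    hi_counts = hi.pivot_table(index="person_id", columns=["feature", "bin"],
                               aggfunc="size", fill_value=0)
    hi_counts.columns = [f"{f}_bin{b}" for f, b in hi_counts.columns]
    lo_counts = lo.pivot_table(index="person_id", columns="feature",
                               aggfunc="size", fill_value=0)
    return hi_counts.join(lo_counts, how="outer").fillna(0)
```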
back
<p>This work focuses on non-adaptive group testing, with a primary goal of efficiently identifying a set of at most $d$ defective elements among a given set of elements using the smallest possible number of tests. Non-adaptive combinatorial group testing often employs disjunctive codes and union-free codes. This paper discusses union-free codes with fast decoding (UFFD codes), a recently introduced class of union-free codes that combine the best of both worlds -- the linear complexity decoding of disjunctive codes and the fewest number of tests of union-free codes. In our study, we distinguish two subclasses of these codes -- one subclass, denoted as $(=d)$-UFFD codes, can be used when the number of defectives $d$ is a priori known, whereas $(\le d)$-UFFD codes work for any subset of at most $d$ defectives. Previous studies have established a lower bound on the rate of these codes for $d=2$. Our contribution lies in deriving new lower bounds on the rate for both $(=d)$- and $(\le d)$-UFFD codes for an arbitrary number $d \ge 2$ of defectives. Our results show that for $d\to\infty$, the rate of $(=d)$-UFFD codes is twice as large as the best-known lower bound on the rate of $d$-disjunctive codes. In addition, the rate of $(\le d)$-UFFD codes is shown to be better than the known lower bound on the rate of $d$-disjunctive codes for small values of $d$. </p>
back
<p>The intricate relationship between human decision-making and emotions, particularly guilt and regret, has significant implications for behavior and well-being. Yet, these emotions' subtle distinctions and interplay are often overlooked in computational models. This paper introduces a dataset tailored to dissect the relationship between guilt and regret and their unique textual markers, filling a notable gap in affective computing research. Our approach treats guilt and regret recognition as a binary classification task and employs three machine learning and six transformer-based deep learning techniques to benchmark the newly created dataset. The study further implements innovative reasoning methods like chain-of-thought and tree-of-thought to assess the models' interpretive logic. The results indicate a clear performance edge for transformer-based models, achieving a 90.4% macro F1 score compared to the 85.3% scored by the best machine learning classifier, demonstrating their superior capability in distinguishing complex emotional states. </p>
back
<p>This paper presents a fully multidimensional kernel-based reconstruction scheme for finite volume methods applied to systems of hyperbolic conservation laws, with a particular emphasis on the compressible Euler equations. Non-oscillatory reconstruction is achieved through an adaptive order weighted essentially non-oscillatory (WENO-AO) method cast into a form suited to multidimensional reconstruction. A kernel-based approach inspired by radial basis functions (RBF) and Gaussian process (GP) modeling, which we call KFVM-WENO, is presented here. This approach allows the creation of a scheme of arbitrary order of accuracy with simply defined multidimensional stencils and substencils. Furthermore, the fully multidimensional nature of the reconstruction allows for a more straightforward extension to higher spatial dimensions and removes the need for complicated boundary conditions on intermediate quantities in modified dimension-by-dimension methods. In addition, a new simple-yet-effective set of reconstruction variables is introduced, which could be useful in existing schemes with little modification. The proposed scheme is applied to a suite of stringent and informative benchmark problems to demonstrate its efficacy and utility. A highly parallel multi-GPU implementation using Kokkos and the message passing interface (MPI) is also provided. </p>
back
<p>In this study, we developed a real-time connected vehicle (CV) speed advisory application that uses public cloud services and tested it on a simulated signalized corridor for different roadway traffic conditions. First, we developed a scalable serverless cloud computing architecture leveraging public cloud services offered by Amazon Web Services (AWS) to support the requirements of a real-time CV application. Second, we developed an optimization-based real-time CV speed advisory algorithm by taking a modular design approach, which makes the application automatically scalable and deployable in the cloud using the serverless architecture. Third, we developed a cloud-in-the-loop simulation testbed using AWS and an open-source microscopic roadway traffic simulator called Simulation of Urban Mobility (SUMO). Our analyses based on different roadway traffic conditions showed that the serverless CV speed advisory application meets the latency requirement of real-time CV mobility applications. Moreover, our serverless CV speed advisory application reduced the average stopped delay (by 77%) and the aggregated risk of collision (by 21%) at the signalized intersections of a corridor. These results demonstrate the feasibility as well as the efficacy of utilizing public cloud infrastructure to implement real-time roadway traffic management applications in a CV environment. </p>
back
<p>The MPI Forum has recently adopted a Python scripting engine for generating the API text in the standard document. As a by-product, it made available reliable and rich descriptions of all MPI functions that are suited for scripting tools. Using this extracted API information, we developed a Python code generation toolbox to generate the language binding layers in MPICH. The toolbox replaces nearly 70,000 lines of manually maintained C and Fortran 2008 binding code with around 5,000 lines of Python scripts plus some simple configuration. In addition to completely eliminating code duplication in the binding layer and avoiding bugs from manual code copying, the code generation also minimizes the effort for API extension and code instrumentation. This is demonstrated in our implementation of the MPI-4 large count functions and the prototyping of a next-generation MPI profiling interface, QMPI. </p>
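<p>The flavor of such API-driven binding generation can be conveyed with a small templating sketch; the API record format and the emitted C stub below are illustrative inventions, not MPICH's actual toolbox or output.</p>

```python
# Hypothetical API record, mimicking the rich descriptions extracted from
# the MPI standard document.
API = {
    "name": "MPI_Send",
    "return": "int",
    "params": [("buf", "const void *"), ("count", "int"),
               ("datatype", "MPI_Datatype"), ("dest", "int"),
               ("tag", "int"), ("comm", "MPI_Comm")],
}

def emit_binding(api):
    """Generate a C binding-layer stub from one API record."""
    args = ", ".join(f"{t} {n}" for n, t in api["params"])
    names = ", ".join(n for n, _ in api["params"])
    return (f'{api["return"]} {api["name"]}({args})\n'
            "{\n"
            "    /* generated validation/instrumentation hooks go here */\n"
            f'    return internal_{api["name"]}({names});\n'
            "}\n")

print(emit_binding(API))
```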
back
<p>Multi-label learning is a rapidly growing research area that aims to predict multiple labels from a single input data point. In the era of big data, tasks involving multi-label classification (MLC) or ranking present significant and intricate challenges, capturing considerable attention in diverse domains. Inherent difficulties in MLC include dealing with high-dimensional data, addressing label correlations, and handling partial labels, for which conventional methods prove ineffective. Recent years have witnessed a notable increase in adopting deep learning (DL) techniques to address these challenges more effectively in MLC. Notably, there is a burgeoning effort to harness the robust learning capabilities of DL for improved modelling of label dependencies and other challenges in MLC. However, it is noteworthy that comprehensive studies specifically dedicated to DL for multi-label learning are limited. Thus, this survey aims to thoroughly review recent progress in DL for multi-label learning, along with a summary of open research problems in MLC. The review consolidates existing research efforts in DL for MLC, including deep neural networks, transformers, autoencoders, and convolutional and recurrent architectures. Finally, the study presents a comparative analysis of the existing methods to provide insightful observations and stimulate future research directions in this domain. </p>
back
<p>MPI+Threads, embodied by the MPI/OpenMP hybrid programming model, is a parallel programming paradigm where threads are used for on-node shared-memory parallelization and MPI is used for multi-node distributed-memory parallelization. OpenMP provides an incremental approach to parallelize code, while MPI, with its isolated address space and explicit messaging API, affords straightforward paths to obtain good parallel performance. However, MPI+Threads is not an ideal solution. Since MPI is unaware of the thread context, it cannot be used for interthread communication. This results in duplicated efforts to create separate and sometimes nested solutions for similar parallel tasks. In addition, because the MPI library is required to obey message-ordering semantics, mixing threads and MPI via MPI_THREAD_MULTIPLE can easily result in miserable performance due to accidental serializations. </p> <p>We propose a new MPI extension, MPIX Thread Communicator (threadcomm), that allows threads to be assigned distinct MPI ranks within thread parallel regions. The threadcomm extension combines both MPI processes and OpenMP threads to form a unified parallel environment. We show that this MPIxThreads (MPI Multiply Threads) paradigm allows OpenMP and MPI to work together in a complementary way to achieve both cleaner codes and better performance. </p>
back
<p>Database modeling is a key activity towards the fulfillment of storage requirements. Despite the availability of several database modeling tools for developers, these often come with associated costs, setup complexities, usability challenges, or dependency on specific operating systems. In this paper, we present ONDA, a web-based tool developed at the University of Coimbra that allows the creation of Entity-Relationship diagrams, visualization of physical models, and generation of SQL code for various database engines. ONDA is freely available at https://onda.dei.uc.pt and was created with the intention of supporting teaching activities in university-level database courses. At the time of writing, the tool is being used by more than three hundred university students every academic year. </p>
back
<p>Training large language models (LLMs) with a large and diverse instruction dataset aligns the models to comprehend and follow human instructions. Recent works have shown that using a small set of high-quality instructions can outperform using large yet more noisy ones. Because instructions are unlabeled and their responses are natural text, traditional active learning schemes with the model's confidence cannot be directly applied to the selection of unlabeled instructions. In this work, we propose a novel method for instruction selection, called SelectLLM, that leverages LLMs for the selection of high-quality instructions. Our high-level idea is to use LLMs to estimate the usefulness and impactfulness of each instruction without the corresponding labels (i.e., responses), via prompting. SelectLLM involves two steps: dividing the unlabeled instructions using a clustering algorithm (e.g., CoreSet) into multiple clusters, and then prompting LLMs to choose high-quality instructions within each cluster. SelectLLM showed comparable or slightly better performance on the popular instruction benchmarks, compared to the recent state-of-the-art selection methods. All code and data are publicly available (https://github.com/minnesotanlp/select-llm). </p>
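<p>The two-step procedure can be sketched as follows; `embed` and `ask_llm` are hypothetical stand-ins for an embedding model and an LLM API call, and the prompt wording is illustrative rather than taken from the released code.</p>

```python
import numpy as np
from sklearn.cluster import KMeans

def select_instructions(instructions, embed, ask_llm, n_clusters=10, per_cluster=5):
    """Cluster unlabeled instructions, then prompt an LLM to pick the most
    useful ones within each cluster (no responses/labels are needed)."""
    X = np.array([embed(s) for s in instructions])
    labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit_predict(X)
    selected = []
    for c in range(n_clusters):
        members = [s for s, l in zip(instructions, labels) if l == c]
        prompt = (f"Choose the {per_cluster} most useful and impactful "
                  "instructions from the list below:\n"
                  + "\n".join(f"- {s}" for s in members))
        selected.extend(ask_llm(prompt))   # assumed to return a list of strings
    return selected
```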
back
<p>The objective of this research is to mitigate vibrations in induction motors. To achieve this goal, a discontinuous pulse width modulation (PWM) control strategy based on carrier wave modulation is proposed for multilevel inverters. This study provides justification for the reduction of machine vibrations compared to existing control techniques documented in the technical literature. Additionally, the proposed technique offers the advantage of attenuating the Total Harmonic Distortion of the multilevel inverter's output voltage while simultaneously achieving a higher RMS value for the same DC level. By modifying a parameter of the carrier wave, the control strategy allows for variations in the electrical spectrum while avoiding natural mechanical resonance frequencies, thereby reducing motor vibrations. Laboratory results demonstrating the application of different modulation strategies in a multilevel inverter for an induction motor, together with a comparison against the presented strategy, are provided. </p>
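<p>The carrier-based idea can be sketched in a few lines; the snippet below implements generic level-shifted triangular carriers with an adjustable carrier parameter (here a phase displacement), as an illustration only and not the paper's specific discontinuous PWM strategy.</p>

```python
import numpy as np

def multilevel_pwm(t, f_ref=50.0, f_carrier=2000.0, levels=5, phase_disp=0.0):
    """Compare a sinusoidal reference against (levels - 1) level-shifted
    triangular carriers; `phase_disp` is the carrier parameter that can be
    varied to reshape the output spectrum. All values are illustrative."""
    ref = np.sin(2.0 * np.pi * f_ref * t)                 # normalized reference
    n_car = levels - 1
    tri = 2.0 * np.abs(((f_carrier * t + phase_disp) % 1.0) - 0.5)  # in [0, 1]
    out = np.zeros_like(t)
    for i in range(n_car):
        band_lo = -1.0 + 2.0 * i / n_car                  # bottom of carrier band
        carrier = band_lo + 2.0 * tri / n_car
        out += (ref > carrier).astype(float)
    return 2.0 * out / n_car - 1.0                        # stepped output in [-1, 1]
```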
back
<p>The pervasive spread of misinformation and disinformation poses a significant threat to society. Professional fact-checkers play a key role in addressing this threat, but the vast scale of the problem forces them to prioritize their limited resources. This prioritization may consider a range of factors, such as varying risks of harm posed to specific groups of people. In this work, we investigate potential implications of using a large language model (LLM) to facilitate such prioritization. Because fact-checking impacts a wide range of diverse segments of society, it is important that diverse views are represented in the claim prioritization process. This paper examines whether an LLM can reflect the views of various groups when assessing the harms of misinformation, focusing on gender as a primary variable. We pose two central questions: (1) To what extent do prompts with explicit gender references reflect gender differences in opinion in the United States on topics of social relevance? and (2) To what extent do gender-neutral prompts align with gendered viewpoints on those topics? To analyze these questions, we present the TopicMisinfo dataset, containing 160 fact-checked claims from diverse topics, supplemented by nearly 1600 human annotations with subjective perceptions and annotator demographics. Analyzing responses to gender-specific and neutral prompts, we find that GPT-3.5-Turbo reflects empirically observed gender differences in opinion but amplifies the extent of these differences. These findings illuminate AI's complex role in moderating online communication, with implications for fact-checkers, algorithm designers, and the use of crowd-workers as annotators. We also release the TopicMisinfo dataset to support continuing research in the community. </p>
back
<p>This paper describes the results of the IEEE BigData 2023 Keystroke Verification Challenge (KVC), which considers the biometric verification performance of Keystroke Dynamics (KD), captured as tweet-long sequences of variable transcript text from over 185,000 subjects. The data are obtained from two of the largest public databases of KD to date, the Aalto Desktop and Mobile Keystroke Databases, guaranteeing a minimum amount of data per subject, age and gender annotations, absence of corrupted data, and avoiding excessively unbalanced subject distributions with respect to the considered demographic attributes. Several neural architectures were proposed by the participants, leading to global Equal Error Rates (EERs) as low as 3.33% and 3.61% achieved by the best team respectively in the desktop and mobile scenarios, outperforming the current state-of-the-art biometric verification performance for KD. Hosted on CodaLab, the KVC will be made ongoing to represent a useful tool for the research community to compare different approaches under the same experimental conditions and to deepen the knowledge of the field. </p>
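<p>For readers unfamiliar with the challenge metric, a minimal sketch of how an Equal Error Rate is computed from comparison scores follows; it assumes higher scores indicate genuine (same-user) comparisons.</p>

```python
import numpy as np

def equal_error_rate(genuine_scores, impostor_scores):
    """EER: the operating point where the false rejection rate (genuine
    comparisons rejected) equals the false acceptance rate (impostor
    comparisons accepted)."""
    thresholds = np.sort(np.concatenate([genuine_scores, impostor_scores]))
    frr = np.array([(genuine_scores < t).mean() for t in thresholds])
    far = np.array([(impostor_scores >= t).mean() for t in thresholds])
    i = int(np.argmin(np.abs(frr - far)))
    return (frr[i] + far[i]) / 2.0   # e.g., 0.0333 corresponds to 3.33% EER
```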
back
<p>Manipulating deformable objects arises in daily life and numerous applications. Despite phenomenal advances in industrial robotics, manipulation of deformable objects remains mostly a manual task. This is because of the high number of internal degrees of freedom of such objects and the complexity of predicting their motion. In this paper, we apply the computationally efficient position-based dynamics method to predict object motion and distance to obstacles. This distance is incorporated in a control barrier function for the resolved motion kinematic control for one or more robots to adjust their motion to avoid colliding with the obstacles. The controller has been applied in simulations to 1D and 2D deformable objects with varying numbers of assistant agents, demonstrating its versatility across different object types and multi-agent systems. Results indicate the feasibility of real-time collision avoidance through deformable object simulation, minimizing path tracking error while maintaining a predefined minimum distance from obstacles and preventing overstretching of the deformable object. The implementation is performed in ROS, allowing ready portability to different applications. </p>
back
<p>The number of Hindi speakers on social media has increased dramatically in recent years. Regret is a common emotional experience in our everyday life. Many speakers on social media share their regretful experiences and opinions regularly. Regret might cause a re-evaluation of one's choices and a desire to choose a different option if given the chance. As a result, knowing the source of regret is critical for investigating its impact on behavior and decision-making. This study focuses on regret and how it is expressed, specifically in Hindi, on various social media platforms. In our study, we present a novel dataset from three different sources, where each sentence has been manually classified into one of three classes: "Regret by action", "Regret by inaction", and "No regret". Next, we use this dataset to investigate the linguistic expressions of regret in Hindi text and also identify the textual domains that are most frequently associated with regret. Our findings indicate that individuals on social media platforms frequently express regret for both past inactions and actions, particularly within the domain of interpersonal relationships. We use a pre-trained BERT model to generate word embeddings for the Hindi dataset and compare deep learning models with conventional machine learning models in terms of accuracy. Our results show that BERT embeddings with a CNN consistently surpassed other models. This demonstrates the effectiveness of BERT in capturing the context and meaning of words in the regret domain. </p>
back
<p>The Ego Network Model (ENM) is a model for the structural organisation of relationships, rooted in evolutionary anthropology, that is found ubiquitously in social contexts. It takes the perspective of a single user (Ego) and organises their contacts (Alters) into a series of (typically 5) concentric circles of decreasing intimacy and increasing size. Alters are sorted based on their tie strength to the Ego; however, this is difficult to measure directly. Traditionally, the interaction frequency has been used as a proxy, but this misses the qualitative aspects of connections, such as signs (i.e. polarity), which have been shown to provide extremely useful information. However, the sign of an online social relationship is usually an implicit piece of information, which needs to be estimated by interaction data from Online Social Networks (OSNs), making sign prediction in OSNs a research challenge in and of itself. This work aims to bring the ENM into the signed networks domain by investigating the interplay of signed connections with the ENM. This paper delivers two main contributions. Firstly, we propose a new and data-efficient method of signing relationships between individuals using sentiment analysis; secondly, we provide an in-depth look at the properties of Signed Ego Networks (SENs), using 9 Twitter datasets of various categories of users. We find that negative connections are generally over-represented in the active part of the Ego Networks, suggesting that Twitter greatly over-emphasises negative relationships with respect to "offline" social networks. Further, users who use social networks for professional reasons have an even greater share of negative connections. </p>
back
<p>This paper proposes a novel, more computationally efficient method for optimizing robot excitation trajectories for dynamic parameter identification, emphasizing self-collision avoidance. This addresses the challenge of obtaining high-quality training data for system identification of co-manipulated robotic arms that can be equipped with a variety of tools, a common scenario in industrial as well as clinical and research contexts. Utilizing the Unified Robotics Description Format (URDF) to implement a symbolic Python implementation of the Recursive Newton-Euler Algorithm (RNEA), the approach aids in dynamically estimating parameters such as inertia using regression analyses on data from real robots. The resulting excitation trajectory achieved optimization criteria on par with state-of-the-art reported results, which did not consider self-collision and tool calibration. Furthermore, physical Human-Robot Interaction (pHRI) admittance control experiments were conducted in a surgical context to evaluate the derived inverse dynamics model, showing a 30.1\% workload reduction as measured by the NASA TLX questionnaire. </p>
back
<p>This paper introduces a stochastic hybrid system (SHS) framework in state-space form to capture sensor, communication, and system contingencies in modern power systems (MPS). Within this new framework, the paper concentrates on the development of state estimation methods and algorithms to provide reliable state estimation under randomly intermittent and noisy sensor data. MPSs employ diversified measurement devices for monitoring system operations that are subject to random measurement errors and rely on communication networks to transmit data whose channels encounter random packet loss and interruptions. The contingency and noise form two distinct and interacting stochastic processes that have a significant impact on state estimation accuracy and reliability. This paper formulates stochastic hybrid system models for MPSs, introduces coordinated observer design algorithms for state estimation, and establishes their convergence and reliability properties. A further study reveals a fundamental design tradeoff between convergence rates and steady-state error variances. Simulation studies on the IEEE 5-bus system and IEEE 33-bus system are used to illustrate the modeling methods, observer design algorithms, convergence properties, performance evaluations, and the impact of sensor system selection. </p>
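<p>The core difficulty (randomly intermittent, noisy measurements) can be illustrated with a Kalman filter whose update step is skipped on packet loss; the matrices below are generic placeholders, not the paper's power-system model or its coordinated observer design.</p>

```python
import numpy as np

def kf_intermittent(x0, P0, A, C, Q, R, measurements, arrived):
    """Linear state estimation under intermittent sensor data: predict every
    step, update only when the measurement packet arrives (arrived[k] True)."""
    x, P = x0, P0
    estimates = []
    for y, ok in zip(measurements, arrived):
        x = A @ x                          # predict
        P = A @ P @ A.T + Q
        if ok:                             # update only on packet arrival
            S = C @ P @ C.T + R
            K = P @ C.T @ np.linalg.inv(S)
            x = x + K @ (y - C @ x)
            P = (np.eye(len(x)) - K @ C) @ P
        estimates.append(x.copy())
    return estimates
```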
back
<p>Communication with the goal of accurately conveying meaning, rather than accurately transmitting symbols, has become an area of growing interest. This paradigm, termed semantic communication, typically leverages modern developments in artificial intelligence and machine learning to improve the efficiency and robustness of communication systems. However, a standard model for capturing and quantifying the details of "meaning" is lacking, with many leading approaches to semantic communication adopting a black-box framework with little understanding of what exactly the model is learning. One solution is to utilize the conceptual spaces framework, which models meaning explicitly in a geometric manner. Though prior work studying semantic communication with conceptual spaces has shown promising results, these previous attempts involve hand-crafting a conceptual space model, severely limiting the scalability and practicality of the approach. In this work, we develop a framework for learning a domain of a conceptual space model using only the raw data with high-level property labels. In experiments using the MNIST and CelebA datasets, we show that the domains learned using the framework maintain semantic similarity relations and possess interpretable dimensions. </p>
back
<p>This study examines the use of embedded tweets in online news media. In particular, we add to the previous literature by exploring embedded tweets across reliable and unreliable news outlets. We use a mixed-method analysis to examine how the function and frequency of embedded tweets change across outlet reliability and news topic. We find that, no matter the outlet reliability, embedded tweets are most often used to relay the opinions of elites, to syndicate information from another news source, or to self-cite information an outlet previously produced. Our results also show some notable differences between reliable media and fringe media's use of tweets. Namely, fringe media embed tweets more and use those tweets as the source of news more than reliable media. Our work adds to the literature on hybrid media systems and the normalization of social media in journalism. </p>
back
<p>We study an opinion dynamics model in which each agent takes a random Bernoulli distributed action whose probability is updated at each discrete time step, and we prove that this model converges almost surely to consensus. We also provide a detailed critique of a claimed proof of this result in the literature. We generalize the result by proving that the assumption of irreducibility in the original model is not necessary. Furthermore, we prove as a corollary of the generalized result that the almost sure convergence to consensus holds also in the presence of a stubborn agent which never changes its opinion. In addition, we show that the model, in both the original and generalized cases, converges to consensus also in $r$th mean. </p>
back
<p>The dominant probing approaches rely on the zero-shot performance of image-text matching tasks to gain a finer-grained understanding of the representations learned by recent multimodal image-language transformer models. The evaluation is carried out on carefully curated datasets focusing on counting, relations, attributes, and others. This work introduces an alternative probing strategy called guided masking. The proposed approach ablates different modalities using masking and assesses the model's ability to predict the masked word with high accuracy. We focus on studying multimodal models that consider regions of interest (ROI) features obtained by object detectors as input tokens. We probe the understanding of verbs using guided masking on ViLBERT, LXMERT, UNITER, and VisualBERT and show that these models can predict the correct verb with high accuracy. This contrasts with previous conclusions drawn from image-text matching probing techniques that frequently fail in situations requiring verb understanding. The code for all experiments will be publicly available at https://github.com/ivana-13/guided_masking. </p>
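<p>A text-only illustration of the guided-masking idea using a standard masked language model is shown below; the multimodal models probed in the paper additionally consume object-detector ROI features alongside the text, which this sketch omits.</p>

```python
from transformers import pipeline

# Mask the verb and ask the model to recover it from the remaining context.
fill = pipeline("fill-mask", model="bert-base-uncased")
sentence = "A man is [MASK] a horse across the field."
for candidate in fill(sentence, top_k=3):
    print(candidate["token_str"], round(candidate["score"], 3))
```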
back
<p>Large Language Models (LLMs) have demonstrated remarkable language understanding and generation capabilities. However, training, deploying, and accessing these models pose notable challenges, including resource-intensive demands, extended training durations, and scalability issues. To address these issues, we introduce a concept of hierarchical, distributed LLM architecture that aims at enhancing the accessibility and deployability of LLMs across heterogeneous computing platforms, including general-purpose computers (e.g., laptops) and IoT-style devices (e.g., embedded systems). By introducing a "layered" approach, the proposed architecture enables on-demand accessibility to LLMs as a customizable service. This approach also ensures optimal trade-offs between the available computational resources and the user's application needs. We envision that the concept of hierarchical LLM will empower extensive, crowd-sourced user bases to harness the capabilities of LLMs, thereby fostering advancements in AI technology in general. </p>
back
<p>In radiology, Artificial Intelligence (AI) has significantly advanced report generation, but automatic evaluation of these AI-produced reports remains challenging. Current metrics, such as Conventional Natural Language Generation (NLG) and Clinical Efficacy (CE), often fall short in capturing the semantic intricacies of clinical contexts or overemphasize clinical details, undermining report clarity. To overcome these issues, our proposed method synergizes the expertise of professional radiologists with Large Language Models (LLMs), like GPT-3.5 and GPT-4. Utilizing In-Context Instruction Learning (ICIL) and Chain of Thought (CoT) reasoning, our approach aligns LLM evaluations with radiologist standards, enabling detailed comparisons between human- and AI-generated reports. This is further enhanced by a regression model that aggregates sentence evaluation scores. Experimental results show that our "Detailed GPT-4 (5-shot)" model achieves a 0.48 score, outperforming the METEOR metric by 0.19, while our "Regressed GPT-4" model shows even greater alignment with expert evaluations, exceeding the best existing metric by a 0.35 margin. Moreover, the robustness of our explanations has been validated through a thorough iterative strategy. We plan to publicly release annotations from radiology experts, setting a new standard for accuracy in future assessments. This underscores the potential of our approach in enhancing the quality assessment of AI-driven medical reports. </p>
back
<p>One-shot channel simulation has recently emerged as a promising alternative to quantization and entropy coding in machine-learning-based lossy data compression schemes. However, while there are several potential applications of channel simulation - lossy compression with realism constraints or differential privacy, to name a few - little is known about its fundamental limitations. In this paper, we restrict our attention to a subclass of channel simulation protocols called causal rejection samplers (CRS), establish new, tighter lower bounds on their expected runtime and codelength, and demonstrate the bounds' achievability. Concretely, for an arbitrary CRS, let $Q$ and $P$ denote a target and proposal distribution supplied as input, and let $K$ be the number of samples examined by the algorithm. We show that the expected runtime $\mathbb{E}[K]$ of any CRS scales at least as $\exp_2(D_\infty[Q || P])$, where $D_\infty[Q || P]$ is the R\'enyi $\infty$-divergence. Regarding the codelength, we show that $D_{KL}[Q || P] \leq D_{CS}[Q || P] \leq \mathbb{H}[K]$, where $D_{CS}[Q || P]$ is a new quantity we call the channel simulation divergence. Furthermore, we prove that our new lower bound, unlike the $D_{KL}[Q || P]$ lower bound, is achievable tightly, i.e. there is a CRS such that $\mathbb{H}[K] \leq D_{CS}[Q || P] + \log_2 (e + 1)$. Finally, we conduct numerical studies of the asymptotic scaling of the codelength of Gaussian and Laplace channel simulation algorithms. </p>
back
<p>Job shop scheduling problems are among the most important and challenging combinatorial optimization problems and have been tackled mainly by exact or approximate solution approaches. However, finding an exact solution can be infeasible for real-world problems, and even with an approximate solution approach, it can require a prohibitive amount of time to find a near-optimal solution, and the found solutions are generally not applicable to new problems. To address these challenges, we propose an attention-based reinforcement learning method for the class of job shop scheduling problems by integrating policy gradient reinforcement learning with a modified transformer architecture. An important result is that the learners trained by the proposed method can be reused to solve large-scale problems not used in training; experiments demonstrate that our approach outperforms the results of recent studies and widely adopted heuristic rules. </p>
back
<p>Translation into severely low-resource languages has both the cultural goal of saving and reviving those languages and the humanitarian goal of assisting the everyday needs of local communities, needs that have been intensified by the recent COVID-19 pandemic. In many humanitarian efforts, translation into severely low-resource languages often does not require a universal translation engine, but a dedicated text-specific translation engine. For example, healthcare records, hygienic procedures, government communication, emergency procedures and religious texts are all limited texts. While generic translation engines for all languages do not exist, translation of multilingually known limited texts into new, low-resource languages may be possible and reduce human translation effort. We attempt to leverage translation resources from rich-resource languages to efficiently produce the best possible translation quality for well-known texts, which are available in multiple languages, in a new, low-resource language. To reach this goal, we argue that in translating a closed text into low-resource languages, generalization to out-of-domain texts is not necessary, but generalization to new languages is. Performance gains come from massive source parallelism by careful choice of close-by language families, style-consistent corpus-level paraphrases within the same language, and strategic adaptation of existing large pretrained multilingual models to the domain first and then to the language. Such performance gains make it possible for machine translation systems to collaborate with human translators to expedite the translation process into new, low-resource languages. </p>
back
<p>Outsourced computation can put client data confidentiality at risk. Existing solutions are either inefficient or insufficiently secure: cryptographic techniques like fully-homomorphic encryption incur significant overheads, even with hardware assistance, while the complexity of hardware-assisted trusted execution environments has been exploited to leak secret data. </p> <p>Recent proposals such as BliMe and OISA show how dynamic information flow tracking (DIFT) enforced in hardware can protect client data efficiently. They are designed to protect CPU-only workloads. However, many outsourced computing applications, like machine learning, make extensive use of accelerators. </p> <p>We address this gap with Dolma, which applies DIFT to the Gemmini matrix multiplication accelerator, efficiently guaranteeing client data confidentiality, even in the presence of malicious/vulnerable software and side channel attacks on the server. We show that accelerators can allow DIFT logic optimizations that significantly reduce area overhead compared with general-purpose processor architectures. Dolma is integrated with the BliMe framework to achieve end-to-end security guarantees. We evaluate Dolma on an FPGA using a ResNet-50 DNN model and show that it incurs low overheads for large configurations ($4.4\%$, $16.7\%$, $16.5\%$ for performance, resource usage and power, respectively, with a 32x32 configuration). </p>
back
<p>A business process model represents the expected behavior of a set of process instances (cases). The process instances may be executed in parallel and may affect each other through data or resources. In particular, changes in values of data shared by process instances may affect a set of process instances and require some operations in response. Such potential effects do not explicitly appear in the process model. This paper addresses possible impacts that may propagate through data shared across process instances and suggests how to analyze them at design time (when the actual process instances do not yet exist). The suggested method uses both a process model and a (relational) data model in order to identify potential inter-instance data impact sets. These sets may guide process users in tracking the impacts of data changes and supporting their handling at runtime. They can also assist process designers in exploring possible constraints over data. The applicability of the method was evaluated using three different realistic processes. With the help of a process expert, we further assessed the usefulness of the method, revealing some useful insights for coping with unexpected data-related changes suggested by our approach. </p>
back
<p>Robotic pick and place stands at the heart of autonomous manipulation. When conducted in cluttered or complex environments, robots must jointly reason about the selected grasp and desired placement locations to ensure success. While several works have examined this joint pick-and-place problem, none have fully leveraged recent learning-based approaches for multi-fingered grasp planning. We present a modular algorithm for joint pick and place planning that can make use of state-of-the-art grasp classifiers for planning multi-fingered grasps for novel objects from partial view point clouds. We demonstrate our joint pick and place formulation with several costs associated with different placement tasks. Experiments on pick and place tasks with cluttered scenes using a physical robot show that our joint inference method is more successful than a sequential pick-then-place approach, while also achieving better placement configurations. </p>
back
<p>This study explores linguistic differences between human and LLM-generated dialogues, using 19.5K dialogues generated by ChatGPT-3.5 as a companion to the EmpathicDialogues dataset. The research employs Linguistic Inquiry and Word Count (LIWC) analysis, comparing ChatGPT-generated conversations with human conversations across 118 linguistic categories. Results show greater variability and authenticity in human dialogues, but ChatGPT excels in categories such as social processes, analytical style, cognition, attentional focus, and positive emotional tone, reinforcing recent findings of LLMs being "more human than human." However, no significant difference was found in positive or negative affect between ChatGPT and human dialogues. Classifier analysis of dialogue embeddings indicates implicit coding of the valence of affect despite no explicit mention of affect in the conversations. The research also contributes a novel, companion ChatGPT-generated dataset of conversations between two independent chatbots, which were designed to replicate a corpus of human conversations available for open access and used widely in AI research on language modeling. Our findings increase understanding of ChatGPT's linguistic capabilities and inform ongoing efforts to distinguish between human and LLM-generated text, which is critical in detecting AI-generated fakes, misinformation, and disinformation. </p>
back
<p>Prompt-based methods have been successfully applied to multilingual pretrained language models for zero-shot cross-lingual understanding. However, most previous studies primarily focused on sentence-level classification tasks, and only a few considered token-level labeling tasks such as Named Entity Recognition (NER) and Part-of-Speech (POS) tagging. In this paper, we propose Token-Level Prompt Decomposition (ToPro), which facilitates the prompt-based method for token-level sequence labeling tasks. The ToPro method decomposes an input sentence into single tokens and applies one prompt template to each token. Our experiments on multilingual NER and POS tagging datasets demonstrate that ToPro-based fine-tuning outperforms Vanilla fine-tuning and Prompt-Tuning in zero-shot cross-lingual transfer, especially for languages that are typologically different from the source language English. Our method also attains state-of-the-art performance when employed with the mT5 model. Besides, our exploratory study in multilingual large language models shows that ToPro performs much better than the current in-context learning method. Overall, the performance improvements show that ToPro could potentially serve as a novel and simple benchmarking method for sequence labeling tasks. </p>
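<p>The decomposition step can be sketched in a few lines; the template wording below is an illustrative stand-in, not the exact prompt used in the paper.</p>

```python
def topro_prompts(sentence: str, template: str):
    """Split the input sentence into tokens and build one prompt per token."""
    return [template.format(sentence=sentence, token=tok)
            for tok in sentence.split()]

template = ('Sentence: "{sentence}". What is the named-entity tag of the '
            'word "{token}"?')
for prompt in topro_prompts("Angela Merkel visited Paris", template):
    print(prompt)   # each prompt is sent to the model to label one token
```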
back
<p>We consider the optimization of complex performance metrics in multi-label classification under the population utility framework. We mainly focus on metrics linearly decomposable into a sum of binary classification utilities applied separately to each label with an additional requirement of exactly $k$ labels predicted for each instance. These "macro-at-$k$" metrics possess desired properties for extreme classification problems with long tail labels. Unfortunately, the at-$k$ constraint couples the otherwise independent binary classification tasks, leading to a much more challenging optimization problem than standard macro-averages. We provide a statistical framework to study this problem, prove the existence and the form of the optimal classifier, and propose a statistically consistent and practical learning algorithm based on the Frank-Wolfe method. Interestingly, our main results concern even more general metrics being non-linear functions of label-wise confusion matrices. Empirical results provide evidence for the competitive performance of the proposed approach. </p>
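<p>One way to write the metric family described above, under the notation assumptions that there are $m$ labels, $h$ is the classifier, and $\psi_j$ is a binary utility of label $j$'s confusion matrix, is the following constrained objective; the at-$k$ constraint is precisely what couples the otherwise independent per-label problems.</p>

```latex
\max_{h} \; \frac{1}{m} \sum_{j=1}^{m}
  \psi_j\bigl(\mathrm{TP}_j(h), \mathrm{FP}_j(h), \mathrm{FN}_j(h), \mathrm{TN}_j(h)\bigr)
\quad \text{s.t.} \quad \sum_{j=1}^{m} h_j(x) = k \;\; \text{for every instance } x .
```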
back
<p>This paper proposes a fully distributed termination method for distributed optimization algorithms solved by multiple agents. The proposed method guarantees terminating a distributed optimization algorithm after satisfying the global termination criterion using information from local computations and neighboring agents. The proposed method requires additional iterations after satisfying the global terminating criterion to communicate the termination status. The number of additional iterations is bounded by the diameter of the communication network. This paper also proposes a fault-tolerant extension of this termination method that prevents early termination due to faulty agents or communication errors. We provide a proof of the method's correctness and demonstrate the proposed method by solving the optimal power flow problem for electric power grids using the alternating direction method of multipliers. </p>
back
<p>The Ising model, originally developed as a spin-glass model for ferromagnetic elements, has gained popularity as a network-based model for capturing dependencies in agents' outputs. Its increasing adoption in healthcare and the social sciences has raised privacy concerns regarding the confidentiality of agents' responses. In this paper, we present a novel $(\varepsilon,\delta)$-differentially private algorithm specifically designed to protect the privacy of individual agents' outcomes. Our algorithm allows for precise estimation of the natural parameter using a single network through an objective perturbation technique. Furthermore, we establish regret bounds for this algorithm and assess its performance on synthetic datasets and two real-world networks: one involving HIV status in a social network and the other concerning the political leaning of online blogs. </p>
back
<p>Coordination of multi-robot systems requires some form of localization between agents, but most methods today rely on external infrastructure. Ultra Wide Band (UWB) sensing has gained popularity in relative localization applications, and we see many implementations that use cooperative agents augmenting UWB range measurements with other sensing modalities (e.g., VIO, IMU, VSLAM) for infrastructure-free relative localization. A less researched option is using Angle of Arrival (AoA) readings obtained from UWB antenna pairs to perform relative localization. In this paper, we present a UWB platform called ReLoki that can be used for ranging and AoA-based relative localization in 3D. ReLoki enables any message sent from a transmitting agent to be localized by using a Regular Tetrahedral Antenna Array (RTA). As a full-scale proof of concept, we deploy ReLoki on a 3-robot system and compare its performance in terms of accuracy and speed with prior methods. </p>
back
<p>Monocular depth estimation (MDE) is a critical component of many medical tracking and mapping algorithms, particularly from endoscopic or laparoscopic video. However, because ground truth depth maps cannot be acquired from real patient data, supervised learning is not a viable approach to predict depth maps for medical scenes. Although self-supervised learning for MDE has recently gained attention, the outputs are difficult to evaluate reliably and each model's generalizability to other patients and anatomies is limited. This work evaluates the zero-shot performance of the newly released Depth Anything Model on medical endoscopic and laparoscopic scenes. We compare the accuracy and inference speeds of Depth Anything with other MDE models trained on general scenes as well as in-domain models trained on endoscopic data. Our findings show that although the zero-shot capability of Depth Anything is quite impressive, it is not necessarily better than other models in both speed and performance. We hope that this study can spark further research in employing foundation models for MDE in medical scenes. </p>
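<p>Zero-shot inference of the kind evaluated here takes only a few lines with the Hugging Face depth-estimation pipeline; note that the checkpoint name and file name below are assumptions for illustration and may differ from the exact model evaluated in the paper.</p>

```python
from transformers import pipeline
from PIL import Image

# Checkpoint name is an assumption; other Depth Anything checkpoints work the same way.
depth = pipeline("depth-estimation", model="LiheYoung/depth-anything-small-hf")
frame = Image.open("laparoscopic_frame.png")   # placeholder file name
result = depth(frame)
depth_map = result["depth"]                    # PIL image of the predicted depth
```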
<p>This paper describes LeftoverLocals: a vulnerability that allows recovering data from GPU memory created by another process on Apple, Qualcomm, and AMD GPUs. LeftoverLocals impacts the security posture of GPU applications, with particular significance for LLMs and ML models that run on impacted GPUs. By recovering local memory, an optimized GPU memory region, we built a PoC in which an attacker can listen in on another user's interactive LLM session (e.g., llama.cpp) across process or container boundaries. </p>
<p>Existing modeling approaches for long-duration energy storage (LDES) are often based either on an oversimplified representation of power system operations or on a limited representation of storage technologies, e.g., evaluation of only a single application. This manuscript presents an overview of the challenges of modeling LDES technologies, as well as a discussion of the capabilities and limitations of existing approaches. We used two test power systems with high shares of solar photovoltaics and wind (70%-90% annual variable renewable energy shares) to assess LDES dispatch approaches. Our results estimate that better dispatch modeling of LDES could increase the associated operational value by 4%-14% and increase the standard capacity credit by 14%-34%. Thus, better LDES dispatch could represent significant cost-saving opportunities for electric utilities and system operators. In addition, existing LDES dispatch modeling approaches were tested in terms of both improved system value (e.g., based on production cost and standard capacity credit) and scalability (e.g., based on central processing unit time and peak memory usage). Both copper plate and nodal representations of the power system were considered. Although the end-volume-target dispatch approach, i.e., one based on mid-term scheduling, showed promising performance in terms of both improved system value and scalability, there is a need for robust and scalable dispatch approaches for LDES in transmission-constrained electric grids. Moreover, more research is required to better understand the optimal operation of LDES considering extreme climate/weather events, reliability applications, and power system operational uncertainties. </p>
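<p>To make the end-volume-target idea concrete, the toy linear program below dispatches a single storage unit against hourly prices while forcing the end-of-horizon state of charge to a value that a (hypothetical) mid-term scheduling model would prescribe; all numbers are illustrative, and real LDES dispatch models are far richer.</p>
<pre>
import numpy as np
from scipy.optimize import linprog

T = 6
demand = np.array([3., 4., 6., 8., 5., 3.])        # MW
price = np.array([20., 25., 60., 90., 40., 22.])   # $/MWh for grid energy
E_max, soc0, soc_end = 10.0, 5.0, 5.0              # MWh capacity/start/target
p_max = 4.0                                        # MW charge/discharge limit

# Decision vector: [g_1..g_T, s_1..s_T]; s_t > 0 discharges the storage.
c = np.concatenate([price, np.zeros(T)])
A_eq = np.hstack([np.eye(T), np.eye(T)])           # g_t + s_t = demand_t
b_eq = demand
L = np.tril(np.ones((T, T)))                       # cumulative net discharge
A_ub = np.vstack([np.hstack([np.zeros((T, T)), L]),    # SoC_t >= 0
                  np.hstack([np.zeros((T, T)), -L])])  # SoC_t <= E_max
b_ub = np.concatenate([np.full(T, soc0), np.full(T, E_max - soc0)])
# End-volume target: total net discharge fixed by the mid-term schedule.
A_eq = np.vstack([A_eq, np.concatenate([np.zeros(T), np.ones(T)])[None, :]])
b_eq = np.concatenate([b_eq, [soc0 - soc_end]])
bounds = [(0, None)] * T + [(-p_max, p_max)] * T

res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
print(res.x[T:].round(2))  # storage schedule: charge cheap, discharge peak
</pre>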
<p>Millions of online communities are governed by volunteer moderators, who shape their communities by setting and enforcing rules, recruiting additional moderators, and participating in the community themselves. These moderators must regularly make decisions about how to govern, yet it is challenging to determine which governance strategies are most successful, as measuring the `success' of governance is complex and nuanced. Furthermore, the incredible diversity in community topic, size, and membership all but guarantees that there is no `one-size-fits-all' solution for community governance. In this work, we measure governance by assessing how community members publicly discuss their own moderators. We quantify perceptions of moderators through 1.89 million labeled posts and comments made on reddit over an 18-month period, and relate these perceptions to characteristics of community governance and to different actions that community moderators can take. We identify key differences between different types of communities, and highlight promising strategies for moderator teams. Amongst other findings, we show that positive perceptions of moderators are associated with other measures of community health, and that strict rule enforcement is perceived more favorably for certain topics, such as news communities, than others. We investigate what kinds of moderators have the most positive impact on the community when they join the mod team, and find that moderators who are active community members before and during their mod tenures result in the largest improvement of community members' perceptions of moderators. We make all our models, datasets, and code public. </p>
<p>Integrating deep learning with the search for new electron-phonon superconductors represents a burgeoning field of research, where the primary challenge lies in the computational intensity of calculating the electron-phonon spectral function, $\alpha^2F(\omega)$, the essential ingredient of the Migdal-Eliashberg theory of superconductivity. To overcome this challenge, we adopt a two-step approach. First, we compute $\alpha^2F(\omega)$ for 818 dynamically stable materials. We then train a deep-learning model to predict $\alpha^2F(\omega)$, using an unconventional training strategy to temper the model's overfitting, enhancing predictions. Specifically, we train a Bootstrapped Ensemble of Tempered Equivariant graph neural NETworks (BETE-NET), obtaining an MAE of 0.21, 45 K, and 43 K for the Eliashberg moments derived from $\alpha^2F(\omega)$: $\lambda$, $\omega_{\log}$, and $\omega_{2}$, respectively, yielding an MAE of 2.5 K for the critical temperature, $T_c$. Further, we incorporate domain knowledge of the site-projected phonon density of states to impose inductive bias into the model's node attributes and enhance predictions. This methodological innovation decreases the MAE to 0.18, 29 K, and 28 K, respectively, yielding an MAE of 2.1 K for $T_c$. We illustrate the practical application of our model in high-throughput screening for high-$T_c$ materials. The model demonstrates an average precision nearly five times higher than random screening, highlighting the potential of ML in accelerating superconductor discovery. BETE-NET accelerates the search for high-$T_c$ superconductors while setting a precedent for applying ML in materials discovery, particularly when data is limited. </p>
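<p>For reference, the Eliashberg moments named above follow from standard quadratures of $\alpha^2F(\omega)$; the Python sketch below evaluates them, together with the Allen-Dynes estimate of $T_c$, on a toy single-peak spectrum. The grid, spectrum, and $\mu^*$ are illustrative assumptions, not the paper's data.</p>
<pre>
import numpy as np
from scipy.integrate import trapezoid

def eliashberg_moments(omega, a2F, mu_star=0.10):
    # lambda = 2 * integral of a2F(w)/w dw
    lam = 2.0 * trapezoid(a2F / omega, omega)
    # w_log = exp[(2/lambda) * integral of a2F(w) ln(w)/w dw]
    w_log = np.exp((2.0 / lam) * trapezoid(a2F * np.log(omega) / omega, omega))
    # w_2 = sqrt[(2/lambda) * integral of a2F(w) * w dw]
    w_2 = np.sqrt((2.0 / lam) * trapezoid(a2F * omega, omega))
    # Allen-Dynes estimate of Tc (same units as omega, here Kelvin)
    Tc = (w_log / 1.2) * np.exp(-1.04 * (1 + lam) /
                                (lam - mu_star * (1 + 0.62 * lam)))
    return lam, w_log, w_2, Tc

omega = np.linspace(10.0, 1000.0, 2000)       # phonon energies in K (toy grid)
a2F = np.exp(-((omega - 300.0) / 80.0) ** 2)  # toy single-peak spectrum
print(eliashberg_moments(omega, a2F))
</pre>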
<p>In inverse problems, it is widely recognized that the incorporation of a sparsity prior yields a regularization effect on the solution. This approach is grounded in the a priori assumption that the unknown can be appropriately represented in a basis with a limited number of significant components, while most coefficients are close to zero. This occurrence is frequently observed in real-world scenarios, such as with piecewise smooth signals. In this study, we propose a probabilistic sparsity prior formulated as a mixture of degenerate Gaussians, capable of modeling sparsity with respect to a generic basis. Under this premise, we design a neural network that can be interpreted as the Bayes estimator for linear inverse problems. Additionally, we put forth both a supervised and an unsupervised training strategy to estimate the parameters of this network. To evaluate the effectiveness of our approach, we conduct a numerical comparison with commonly employed sparsity-promoting regularization techniques, namely LASSO, group LASSO, iterative hard thresholding, and sparse coding/dictionary learning. Notably, our reconstructions consistently exhibit lower mean square error values across all $1$D datasets utilized for the comparisons, even in cases where the datasets significantly deviate from a Gaussian mixture model. </p>
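<p>For intuition, the Python sketch below draws signals from such a mixture-of-degenerate-Gaussians (spike-and-slab) prior: each coefficient is exactly zero with probability $1-\theta$, the degenerate component, and Gaussian otherwise, in an arbitrary basis. The sparsity level, scale, and basis are illustrative assumptions.</p>
<pre>
import numpy as np

rng = np.random.default_rng(0)

def sample_sparse_signal(n, theta=0.1, sigma=1.0, basis=None):
    support = rng.random(n) < theta                # active coefficients
    coeffs = np.where(support, rng.normal(0.0, sigma, n), 0.0)
    basis = np.eye(n) if basis is None else basis  # generic basis allowed
    return basis @ coeffs

x = sample_sparse_signal(256, theta=0.05)
print(np.count_nonzero(x), "nonzero entries out of", x.size)
</pre>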
<p>In this paper, we study linear convolutional networks with one-dimensional filters and arbitrary strides. The neuromanifold of such a network is a semialgebraic set, represented by a space of polynomials admitting specific factorizations. Introducing a recursive algorithm, we generate polynomial equations whose common zero locus corresponds to the Zariski closure of the corresponding neuromanifold. Furthermore, we explore the algebraic complexity of training these networks employing tools from metric algebraic geometry. Our findings reveal that the number of all complex critical points in the optimization of such a network is equal to the generic Euclidean distance degree of a Segre variety. Notably, this count significantly surpasses the number of critical points encountered in the training of a fully connected linear network with the same number of parameters. </p>
<p>In this paper, we present an exploration and assessment of employing a centralized deep Q-network (DQN) controller as a substitute for the prevalent use of PID controllers in the context of 6DOF swimming robots. Our primary focus centers on illustrating this transition with the specific case of underwater object tracking. DQN offers advantages such as data efficiency and off-policy learning, while remaining simpler to implement than other reinforcement learning methods. Given the absence of a dynamic model for our robot, we propose an RL agent to control this multi-input-multi-output (MIMO) system, where a centralized controller may offer more robust control than distinct PIDs. Our approach involves initially using classical controllers for safe exploration, then gradually shifting to DQN to take full control of the robot. </p> <p>We divide the underwater tracking task into vision and control modules. We use established methods for vision-based tracking and introduce a centralized DQN controller. By transmitting bounding box data from the vision module to the control module, we enable adaptation to various objects and effortless vision system replacement. Furthermore, dealing with low-dimensional data facilitates cost-effective online learning for the controller. Our experiments, conducted within a Unity-based simulator, validate the effectiveness of a centralized RL agent over separate PID controllers, showcasing the applicability of our framework for training underwater RL agents and the improved performance it achieves compared to traditional control methods. The code for both real and simulation implementations is at https://github.com/FARAZLOTFI/underwater-object-tracking. </p>
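<p>The "safe exploration" handover described above can be sketched as a blended controller whose command shifts from a classical PID action to the DQN action over training; the gains and schedule below are stand-ins, not the paper's tuning.</p>
<pre>
class PID:
    def __init__(self, kp, ki, kd):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral, self.prev_err = 0.0, 0.0

    def act(self, err, dt=0.05):
        # Classical PID on a scalar tracking error.
        self.integral += err * dt
        deriv = (err - self.prev_err) / dt
        self.prev_err = err
        return self.kp * err + self.ki * self.integral + self.kd * deriv

def blended_action(pid_action, dqn_action, step, handover_steps=10_000):
    # alpha ramps 0 -> 1: pure PID early (safe), pure DQN once trained.
    alpha = min(1.0, step / handover_steps)
    return (1 - alpha) * pid_action + alpha * dqn_action

pid = PID(kp=1.2, ki=0.1, kd=0.05)              # illustrative gains
print(blended_action(pid.act(0.3), dqn_action=-0.1, step=2_500))
</pre>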
<p>In past work (Onokpasa, Wild, Wong, DCC 2023), we showed that (a) for joint compression of RNA sequence and structure, stochastic context-free grammars are the best known compressors and (b) grammars with better compression ability also show better performance in ab initio structure prediction. Previous grammars were manually curated by human experts. In this work, we develop a framework for the automatic and systematic search for stochastic grammars with better compression (and prediction) ability for RNA. We perform an exhaustive search of small grammars and identify grammars that surpass the performance of human-expert grammars. </p>
<p>We contribute the first publicly available dataset of factual claims from different platforms and fake YouTube videos on the 2023 Israel-Hamas war for automatic fake YouTube video classification. The FakeClaim data is collected from 60 fact-checking organizations in 30 languages and enriched with metadata from the fact-checking organizations, curated by trained journalists specialized in fact-checking. Further, we classify fake videos within the subset of YouTube videos using textual information and user comments. We use a pre-trained model to classify each video with different feature combinations. Our best-performing fine-tuned language model, Universal Sentence Encoder (USE), achieves a Macro F1 of 87%, which shows that the trained model can be helpful for debunking fake videos using the comments from the user discussion. The dataset is available on GitHub: https://github.com/Gautamshahi/FakeClaim </p>
<p>Local zoning ordinances across the United States have the effect of restricting the development of energy infrastructure, including utility-scale solar photovoltaics. While these ordinances may be developed for legitimate purposes to protect public health and safety, they could impede or increase the costs of power sector decarbonization. We quantify the role of utility-scale solar zoning ordinances in power sector decarbonization across the Great Lakes region (Illinois, Indiana, Michigan, Minnesota, Ohio, and Wisconsin) by integrating 6,300 rural community zoning ordinances into a power system planning model. Relative to no ordinances, solar zoning ordinances reduce the total potential deployment of solar PV by 52% (or 1.6 TW) across our region. Currently, however, the biggest zoning barrier to deployment is ordinances that are silent on utility-scale solar. Deployment restrictions translate to up to 4 GW greater investment needs and 5.6% greater PV investment costs to achieve a 10% PV generation target. Starker shifts occur at the state level, e.g., Wisconsin sees a 40% reduction in PV investments due to zoning restrictions. Our results underscore the need for planning that aligns local zoning laws with state and regional goals. </p>
<p>This paper presents new solutions for Private Information Retrieval (PIR) with side information. This problem is motivated by PIR settings in which a client has side information about the data held by the servers and would like to leverage this information in order to improve the download rate. The problem of PIR with side information has been the subject of several recent studies that presented achievability schemes as well as converses for both multi-server and single-server settings. However, the solutions for the multi-server setting were adapted from the solutions for the single-server setting in a rather straightforward manner, relying on the concept of super-messages. Such solutions require an exponential degree of sub-packetization (in terms of the number of messages). </p> <p>This paper makes the following contributions. First, we revisit the PIR problem with side information and present a new approach to leveraging side information in the context of PIR. The key idea of our approach is a randomized algorithm that determines the linear combinations of the sub-packets that need to be recovered from each server. In addition, our approach takes advantage of the fact that the identity of the side information messages does not need to be kept private, and, as a result, the information retrieval scheme does not need to be symmetric. Second, we present schemes for PIR with side information that achieve a higher rate than previously proposed solutions and require a significantly lower degree of sub-packetization (linear in the number of servers). Our scheme not only achieves the highest known download rate for the problem at hand but also invalidates a previously claimed converse bound on the maximum achievable download rate. </p>
<p>This paper revisits the problem of multi-server Private Information Retrieval with Private Side Information (PIR-PSI). In this problem, $N$ non-colluding servers store identical copies of $K$ messages, each comprising $L$ symbols from $\mathbb{F}_q$, and a user, who knows $M$ of these messages, wants to retrieve one of the remaining $K-M$ messages. The user's goal is to retrieve the desired message by downloading the minimum amount of information from the servers while revealing no information about the identities of the desired message and side information messages to any server. The capacity of PIR-PSI, defined as the maximum achievable download rate, was previously characterized for all $N$, $K$, and $M$ when $L$ and $q$ are sufficiently large -- specifically, growing exponentially with $K$, to ensure the divisibility of each message into $N^K$ sub-packets and to guarantee the existence of an MDS code with its length and dimension being exponential in $K$. In this work, we propose a new capacity-achieving PIR-PSI scheme that is applicable to all $N$, $K$, $M$, $L$, and $q$ where $N\geq M+1$ and $N-1\mid L$. The proposed scheme operates with a sub-packetization level of $N-1$, independent of $K$, and works over any finite field without requiring an MDS code. </p>
<p>For turbulent problems of industrial scale, computational cost may become prohibitive due to the stability constraints associated with explicit time discretization of the underlying conservation laws. On the other hand, implicit methods allow for larger time-step sizes but require exorbitant computational resources. Implicit-explicit (IMEX) formulations combine both temporal approaches, using an explicit method in nonstiff portions of the domain and an implicit method in stiff portions. While these methods can be shown to be orders of magnitude faster than typical explicit discretizations, they are still limited in cost by their implicit discretization. Hybridization reduces the scaling of these systems to an effective lower dimension, which allows the system to be solved at significant speedup factors compared to standard implicit methods. This work proposes an IMEX scheme that combines hybridized and standard flux reconstruction (FR) methods to tackle geometry-induced stiffness. By using the so-called transmission conditions, an overall conservative formulation can be obtained after combining both explicit FR and hybridized implicit FR methods. We verify and apply our approach to a series of numerical examples, including a multi-element airfoil at a Reynolds number of 1.7 million. Results demonstrate speedup factors of four against standard IMEX formulations and at least 15 against standard explicit formulations for the same problem. </p>
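<p>A scalar toy version of the temporal splitting idea is sketched below: a fast-decaying stiff term is treated implicitly and a slow term explicitly with first-order IMEX Euler, at a step size far above the explicit stability limit. This illustrates only the splitting, not the hybridized flux reconstruction machinery of the paper.</p>
<pre>
import numpy as np
from scipy.optimize import fsolve

def imex_euler_step(y, dt, f_stiff, f_soft):
    # Explicit update for the nonstiff term, then solve the implicit
    # equation y_new = explicit + dt * f_stiff(y_new) for the stiff term.
    explicit = y + dt * f_soft(y)
    return fsolve(lambda z: z - explicit - dt * f_stiff(z), y)[0]

f_stiff = lambda y: -1000.0 * y   # fast decay -> stiff
f_soft = lambda y: np.sin(y)      # slow forcing -> nonstiff
y, dt = 1.0, 0.01                 # explicit Euler would need dt < 0.002
for _ in range(10):
    y = imex_euler_step(y, dt, f_stiff, f_soft)
print(y)
</pre>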
<p>The execution failure of cyber-physical systems (e.g., autonomous driving systems, unmanned aerial systems, and robotic systems) could result in the loss of life, severe injuries, large-scale environmental damage, property destruction, and major economic loss. Hence, such systems usually require a strong justification that they will effectively support critical requirements (e.g., safety, security, and reliability) for which they were designed. Thus, it is often mandatory to develop compelling assurance cases to support that justification and allow regulatory bodies to certify such systems. In such contexts, detecting assurance deficits, relying on patterns to improve the structure of assurance cases, improving existing assurance case notations, and (semi-)automating the generation of assurance cases are key to developing compelling assurance cases and fostering consumer acceptance. We therefore explore challenges related to such assurance enablers and outline some potential directions that could be explored to tackle them. </p>
<p>Active learning strategies for 3D object detection in autonomous driving datasets may help to address challenges of data imbalance, redundancy, and high-dimensional data. We demonstrate the effectiveness of entropy querying to select informative samples, aiming to reduce annotation costs and improve model performance. We experiment using the BEVFusion model for 3D object detection on the nuScenes dataset, comparing active learning to random sampling and demonstrating that entropy querying outperforms in most cases. The method is particularly effective in reducing the performance gap between majority and minority classes. Class-specific analysis reveals efficient allocation of annotated resources for limited data budgets, emphasizing the importance of selecting diverse and informative data for model training. Our findings suggest that entropy querying is a promising strategy for selecting data that enhances model learning in resource-constrained environments. </p>
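<p>The core of entropy querying is compact enough to sketch directly: score each unlabeled sample by the predictive entropy of the detector's class probabilities and send the top-$k$ most uncertain samples to annotation. The shapes and random probabilities below are stand-ins for real detector outputs.</p>
<pre>
import numpy as np

def entropy_query(probs, k):
    # probs: (num_samples, num_classes) softmax outputs.
    entropy = -np.sum(probs * np.log(probs + 1e-12), axis=1)
    return np.argsort(entropy)[-k:]   # indices of the most uncertain samples

probs = np.random.default_rng(0).dirichlet(np.ones(10), size=1000)
to_label = entropy_query(probs, k=32)
print(to_label[:5])
</pre>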
<p>Reinforcement Learning from Human Feedback (RLHF) is a widely adopted approach for aligning large language models with human values. However, RLHF relies on a reward model that is trained with a limited amount of human preference data, which could lead to inaccurate predictions. As a result, RLHF may produce outputs that are misaligned with human values. To mitigate this issue, we contribute a reward ensemble method that allows the reward model to make more accurate predictions. As using an ensemble of large language model-based reward models can be computationally expensive and resource-intensive, we explore efficient ensemble methods, including linear-layer ensembles and LoRA-based ensembles. Empirically, we run Best-of-$n$ and Proximal Policy Optimization with our ensembled reward models, and verify that our ensemble methods help improve the alignment performance of RLHF outputs. </p>
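<p>A minimal sketch of the linear-layer ensemble idea, assuming a frozen shared backbone with several scalar reward heads; the toy encoder and the mean/std aggregation are illustrative choices, not the paper's exact architecture.</p>
<pre>
import torch
import torch.nn as nn

class RewardEnsemble(nn.Module):
    def __init__(self, backbone, hidden_dim, n_heads=5):
        super().__init__()
        self.backbone = backbone
        for p in self.backbone.parameters():
            p.requires_grad = False          # only the heads are trained
        self.heads = nn.ModuleList(nn.Linear(hidden_dim, 1)
                                   for _ in range(n_heads))

    def forward(self, x):
        h = self.backbone(x)                 # (batch, hidden_dim)
        rewards = torch.cat([head(h) for head in self.heads], dim=-1)
        return rewards.mean(-1), rewards.std(-1)   # aggregate + disagreement

backbone = nn.Sequential(nn.Linear(16, 32), nn.Tanh())   # toy stand-in encoder
model = RewardEnsemble(backbone, hidden_dim=32)
mean_r, std_r = model(torch.randn(4, 16))
print(mean_r.shape, std_r.shape)
</pre>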
<p>Code completion aims to enhance programming productivity by predicting potential code based on the current programming context. Recently, pretrained language models (LMs) have become prominent in this field. Various approaches have been proposed to fine-tune LMs using supervised fine-tuning (SFT) techniques for code completion. However, the inherent exposure bias of these models can cause errors to accumulate early in the sequence completion, leading to even more errors in subsequent completions. To address this problem, deep reinforcement learning (DRL) offers an alternative technique for fine-tuning LMs for code completion, one that can improve generalization capabilities and overall performance. Nevertheless, integrating DRL-based strategies into code completion faces two major challenges: 1) The dynamic nature of the code context requires the completion model to quickly adapt to changes, which poses difficulties for conventional DRL strategies that focus on delayed rewarding of the final code state. 2) It is difficult to evaluate the correctness of partial code, so reward redistribution-based strategies cannot be adapted to code completion. To tackle these challenges, we propose IRCoCo, a code completion-specific DRL-based fine-tuning framework. This framework is designed to provide immediate rewards as feedback for detecting dynamic context changes arising from continuous edits during code completion. With the aid of immediate feedback, the fine-tuned LM can gain a more precise understanding of the current context, thereby enabling effective adjustment of the LM and optimizing code completion in a more refined manner. Experimental results demonstrate that fine-tuning pretrained LMs with IRCoCo leads to significant improvements in the code completion task, outperforming both SFT-based and other DRL-based baselines. </p>
<p>Fine-tuning large pre-trained language models (LLMs) on particular datasets is a commonly employed strategy in Natural Language Processing (NLP) classification tasks. However, this approach usually results in a loss of the model's generalizability. In this paper, we present a framework that maintains generalizability and enhances performance on the downstream task by utilizing task-specific context attribution. We show that a linear transformation of the text representation from any transformer model using the task-specific concept operator results in a projection onto the latent concept space, referred to as context attribution in this paper. The specific concept operator is optimized during the supervised learning stage via novel loss functions. The proposed framework demonstrates that context attribution of the text representation for each task objective can improve the capacity of the discriminator function and thus achieve better performance for the classification task. Experimental results on three datasets, namely HateXplain, IMDB reviews, and Social Media Attributions, illustrate that the proposed model attains superior accuracy and generalizability. Specifically, for the non-fine-tuned BERT on the HateXplain dataset, we observe an 8% improvement in accuracy and a 10% improvement in F1-score. For the IMDB dataset, the proposed model outperforms the fine-tuned state-of-the-art XLNet by 1% in both accuracy and F1-score. Furthermore, in an out-of-domain cross-dataset test, DistilBERT fine-tuned on the IMDB dataset in conjunction with the proposed model improves the F1-score on the HateXplain dataset by 7%. For the Social Media Attributions dataset of YouTube comments, we observe a 5.2% increase in F1-score. The proposed framework is implemented with PyTorch and provided open source on GitHub. </p>
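<p>A minimal sketch of how such a concept operator could sit between a frozen encoder and the classifier; the module and dimension names below are assumptions for illustration, not the paper's implementation.</p>
<pre>
import torch
import torch.nn as nn

class ContextAttribution(nn.Module):
    def __init__(self, hidden_dim, concept_dim, n_classes):
        super().__init__()
        # Learned linear operator projecting onto the latent concept space.
        self.concept_op = nn.Linear(hidden_dim, concept_dim, bias=False)
        self.classifier = nn.Linear(concept_dim, n_classes)

    def forward(self, h):          # h: frozen transformer representation
        z = self.concept_op(h)     # projection = "context attribution"
        return self.classifier(z)

model = ContextAttribution(hidden_dim=768, concept_dim=64, n_classes=2)
print(model(torch.randn(4, 768)).shape)   # (4, 2) logits
</pre>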
<p>Large language models (LLMs) have significantly advanced natural language processing, but this progress has not been equal across languages. While most LLMs are trained in high-resource languages like English, multilingual models generally underperform monolingual ones. Additionally, aspects of their multilingual foundations, such as computational demands and licensing regimes, sometimes restrict the artifacts they produce. In this study, we document the development of open-foundation models tailored for use in low-resource settings, their limitations, and their benefits: the TeenyTinyLlama pair, two compact models for Brazilian Portuguese text generation. We release them under the permissive Apache 2.0 license on GitHub and Hugging Face for community use and further development. See https://github.com/Nkluge-correa/TeenyTinyLlama </p>
<p>Online platforms such as YouTube, Instagram, and TikTok heavily rely on recommender systems to decide what content to show to which users. Content producers often aim to produce material that is likely to be shown to users and to lead them to engage with said producer. To do so, producers try to align their content with the preferences of their targeted user base. In this work, we explore the equilibrium behavior of producers who are interested in maximizing user engagement. We study two variants of the content-serving rule that the platform's recommender system uses, and we show structural results on producers' production at equilibrium. We leverage these structural results to show that, in simple settings, specialization naturally arises from the competition among producers trying to maximize user engagement. We provide a heuristic for computing equilibria of our engagement game and evaluate it experimentally. We show i) the performance and convergence of our heuristic, ii) the producer and user utilities at equilibrium, and iii) the level of producer specialization. </p>
<p>Coding theory revolves around the incorporation of redundancy into transmitted symbols, computation tasks, and stored data to guard against adversarial manipulation. However, error correction in coding theory is contingent upon a strict trust assumption. In the context of computation and storage, it is required that honest nodes outnumber adversarial ones by a certain margin. However, in several emerging real-world cases, particularly in decentralized blockchain-oriented applications, such assumptions are often unrealistic. Consequently, despite the important role of coding in addressing significant challenges within decentralized systems, its applications become constrained. Still, in decentralized platforms, a distinctive characteristic emerges, offering new avenues for secure coding beyond the constraints of conventional methods. In these scenarios, the adversary benefits when the legitimate decoder recovers the data, and preferably with a high estimation error. This incentive motivates them to act rationally, trying to maximize their gains. In this paper, we propose a game-theoretic formulation, called the game of coding, that captures this unique dynamic, in which the adversary and the data collector (decoder) each have a utility function to optimize. The utility functions reflect the fact that both the data collector and the adversary are interested in increasing the chance of the data being recoverable at the data collector. Moreover, the utility functions express the data collector's interest in estimating the input with lower estimation error, and the adversary's opposite interest. As a first, still highly non-trivial step, we characterize the equilibrium of the game for the repetition code with a repetition factor of 2, for a wide class of utility functions with minimal assumptions. </p>
<p>Scientific machine learning (SciML) has emerged as a versatile approach to address complex computational science and engineering problems. Within this field, physics-informed neural networks (PINNs) and deep operator networks (DeepONets) stand out as the leading techniques for solving partial differential equations by incorporating both physical equations and experimental data. However, training PINNs and DeepONets requires significant computational resources, including long computational times and large amounts of memory. In search of computational efficiency, training neural networks using half precision (float16) rather than the conventional single (float32) or double (float64) precision has gained substantial interest, given the inherent benefits of reduced computational time and memory consumption. However, we find that float16 cannot be applied to SciML methods, because of gradient divergence at the start of training, weight updates going to zero, and the inability to converge to a local minimum. To overcome these limitations, we explore mixed precision, an approach that combines the float16 and float32 numerical formats to reduce memory usage and increase computational speed. Our experiments showcase that mixed precision training not only substantially decreases training times and memory demands but also maintains model accuracy. We also reinforce our empirical observations with a theoretical analysis. The research has broad implications for SciML in various computational applications. </p>
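<p>The mixed-precision recipe described above corresponds to the standard PyTorch pattern below: forward passes run in float16 under autocast while a gradient scaler guards against the float16 underflow failure modes mentioned earlier. The toy model and data are illustrative, a CUDA device is assumed, and this is a generic sketch rather than the paper's training setup.</p>
<pre>
import torch
from torch import nn
from torch.cuda.amp import autocast, GradScaler

model = nn.Sequential(nn.Linear(2, 64), nn.Tanh(), nn.Linear(64, 1)).cuda()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
scaler = GradScaler()

x = torch.rand(256, 2, device="cuda")
y = torch.sin(x.sum(dim=1, keepdim=True))   # toy PINN-style regression target
for _ in range(100):
    opt.zero_grad()
    with autocast():                         # float16 compute in the forward pass
        loss = nn.functional.mse_loss(model(x), y)
    scaler.scale(loss).backward()            # scale loss so grads don't underflow
    scaler.step(opt)                         # unscale, then apply float32 update
    scaler.update()
</pre>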
<p>Autoregressive Large Language Models (LLMs) trained for next-word prediction have demonstrated remarkable proficiency at producing coherent text. But are they equally adept at forming coherent probability judgments? We use probabilistic identities and repeated judgments to assess the coherence of probability judgments made by LLMs. Our results show that the judgments produced by these models are often incoherent, displaying human-like systematic deviations from the rules of probability theory. Moreover, when prompted to judge the same event, the mean-variance relationship of probability judgments produced by LLMs shows an inverted-U shape like that seen in humans. We propose that these deviations from rationality can be explained by linking autoregressive LLMs to implicit Bayesian inference and drawing parallels with the Bayesian Sampler model of human probability judgments. </p>
<p>In this paper, we focus on the design of binary constant-weight codes that admit low-complexity encoding and decoding algorithms, and that have size as a power of $2$. We construct a family of $(n=2^\ell, M=2^k, d=2)$ constant-weight codes ${\cal C}[\ell, r]$ parameterized by integers $\ell \geq 3$ and $1 \leq r \leq \lfloor \frac{\ell+3}{4} \rfloor$, by encoding information in the gaps between successive $1$'s of a vector. The code has weight $w = \ell$ and combinatorial dimension $k$ that scales quadratically with $\ell$. The encoding time is linear in the input size $k$, and the decoding time is poly-logarithmic in the input size $n$, discounting the linear time spent on parsing the input. Encoding and decoding algorithms of similar codes known in either the information-theoretic or combinatorial literature require the computation of a large number of binomial coefficients. Our algorithms fully eliminate the need to evaluate binomial coefficients. While the code has a natural price to pay in $k$, it performs fairly well against the information-theoretic upper bound $\lfloor \log_2 {n \choose w} \rfloor$. When $\ell =3$, the code is optimal, achieving the upper bound; when $\ell=4$, it is one bit away from the upper bound, and as $\ell$ grows it is order-optimal in the sense that the ratio of $k$ to its upper bound becomes a constant $\frac{11}{16}$ when $r=\lfloor \frac{\ell+3}{4} \rfloor$. With the same or even lower complexity, we derive new codes permitting a wider range of parameters by modifying ${\cal C}[\ell, r]$ in two different ways. The code derived using the first approach has the same blocklength $n=2^\ell$, but the weight $w$ is allowed to vary from $\ell-1$ to $1$. In the second approach, the weight remains fixed at $w = \ell$, but the blocklength is reduced to $n=2^\ell - 2^r +1$. For certain selected values of parameters, these modified codes have an optimal $k$. </p>
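<p>The information-theoretic ceiling mentioned above is easy to evaluate directly; the snippet below computes $\lfloor \log_2 {n \choose w} \rfloor$ for $n = 2^\ell$, $w = \ell$ at a few illustrative values of $\ell$ (the values of $\ell$ are arbitrary examples).</p>
<pre>
from math import comb, log2, floor

def k_upper_bound(ell):
    # Maximum dimension k of any binary constant-weight code with
    # blocklength n = 2**ell and weight w = ell.
    n, w = 2 ** ell, ell
    return floor(log2(comb(n, w)))

for ell in (3, 4, 8, 16):
    print(ell, k_upper_bound(ell))
</pre>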
<p>Task-based behavioral biometric authentication of users interacting in virtual reality (VR) environments enables seamless continuous authentication by using only the motion trajectories of the person's body as a unique signature. Deep learning-based approaches for behavioral biometrics show high accuracy when using complete or near-complete portions of the user trajectory, but show lower performance when using smaller segments from the start of the task. Thus, any system designed with existing techniques is vulnerable while waiting for future segments of motion trajectories to become available. In this work, we present the first approach that predicts future user behavior using Transformer-based forecasting and uses the forecasted trajectory to perform user authentication. Our work leverages the notion that, given the current trajectory of a user in a task-based environment, we can predict the future trajectory of the user as they are unlikely to dramatically shift their behavior, since doing so would preclude the user from successfully completing their task goal. Using the publicly available 41-subject ball throwing dataset of Miller et al., we show improvement in user authentication when using forecasted data. When compared to no forecasting, our approach reduces the authentication equal error rate (EER) by an average of 23.85% and a maximum of 36.14%. </p>
<p>In continual RL, the environment of a reinforcement learning (RL) agent undergoes change. A successful system should appropriately balance the conflicting requirements of retaining agent performance on already learned tasks (stability) whilst learning new tasks (plasticity). The first-in-first-out buffer is commonly used to enhance learning in such settings but requires significant memory. We explore an augmentation to this buffer that alleviates the memory constraints, and use it with a model-based reinforcement learning algorithm built on a world model to evaluate its effectiveness in facilitating continual learning. We evaluate the effectiveness of our method on the Procgen and Atari RL benchmarks and show that the distribution-matching augmentation to the replay buffer, used in the context of latent world models, can successfully prevent catastrophic forgetting with significantly reduced computational overhead. Yet we also find that such a solution is not entirely infallible: other failure modes, such as the opposite problem of lacking plasticity and being unable to learn a new task, remain a potential limitation in continual learning systems. </p>
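<p>One simple way to keep a fixed-size buffer whose contents track the distribution of everything seen so far is reservoir sampling, sketched below; whether this matches the paper's exact distribution-matching augmentation is an assumption of the sketch.</p>
<pre>
import random

class ReservoirBuffer:
    # Fixed-memory buffer: after seeing N transitions, each one is retained
    # with equal probability capacity / N, unlike a FIFO that keeps only
    # the most recent items.
    def __init__(self, capacity):
        self.capacity = capacity
        self.data, self.seen = [], 0

    def add(self, transition):
        self.seen += 1
        if len(self.data) < self.capacity:
            self.data.append(transition)
        else:
            j = random.randrange(self.seen)
            if j < self.capacity:          # keep item w.p. capacity / seen
                self.data[j] = transition

    def sample(self, batch_size):
        return random.sample(self.data, batch_size)

buf = ReservoirBuffer(capacity=100)
for t in range(10_000):
    buf.add(t)
print(sorted(buf.sample(5)))   # roughly uniform over the whole stream
</pre>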
<p>Autonomous manipulation in robot arms is a complex and evolving field of study in robotics. This paper introduces an innovative approach to this challenge by focusing on imitation learning (IL). Unlike traditional imitation methods, our approach uses IL based on bilateral control, allowing for more precise and adaptable robot movements. Conventional bilateral-control-based IL methods have relied on Long Short-Term Memory (LSTM) networks. In this paper, we present IL for robots using position and torque information, based on Bilateral control with Transformer (ILBiT). This proposed method employs the Transformer model, known for its robust performance in handling diverse datasets and its capability to surpass LSTM's limitations, especially in tasks requiring detailed force adjustments. A standout feature of ILBiT is its high-frequency operation at 100 Hz, which significantly improves the system's adaptability and response to varying environments and objects of different hardness levels. The effectiveness of the Transformer-based ILBiT method is demonstrated through comprehensive real-world experiments. </p>
<p>BLVRUN is a command line shell script designed to offer developers within the BLV community a succinct and insightful overview of traceback errors. Its primary function involves parsing errors and utilizing a refined large language model to generate informative error summaries. In terms of performance, our model rivals that of well-known models like ChatGPT or AI-chatbot plug-ins tailored for specific Integrated Development Environments (IDEs). Importantly, BLV users can seamlessly integrate this tool into their existing development workflows, eliminating the need for any modifications or adaptations to facilitate debugging tasks. </p>
<p>We show how continuous-depth neural ODE models can be framed as single-layer, infinite-width nets using the Chen-Fliess series expansion for nonlinear ODEs. In this net, the output "weights" are taken from the signature of the control input, a tool used to represent infinite-dimensional paths as a sequence of tensors, which comprises iterated integrals of the control input over a simplex. The "features" are taken to be iterated Lie derivatives of the output function with respect to the vector fields in the controlled ODE model. The main result of this work applies this framework to derive compact expressions for the Rademacher complexity of ODE models that map an initial condition to a scalar output at some terminal time. The result leverages the straightforward analysis afforded by single-layer architectures. We conclude with examples instantiating the bound for specific systems and discuss potential follow-up work. </p>
<p>Red teaming is a common strategy for identifying weaknesses in generative language models (LMs), where adversarial prompts are produced that trigger an LM to generate unsafe responses. Red teaming is instrumental for both model alignment and evaluation, but is labor-intensive and difficult to scale when done by humans. In this paper, we present Gradient-Based Red Teaming (GBRT), a red teaming method for automatically generating diverse prompts that are likely to cause an LM to output unsafe responses. GBRT is a form of prompt learning, trained by scoring an LM response with a safety classifier and then backpropagating through the frozen safety classifier and LM to update the prompt. To improve the coherence of input prompts, we introduce two variants that add a realism loss and fine-tune a pretrained model to generate the prompts instead of learning the prompts directly. Our experiments show that GBRT is more effective at finding prompts that trigger an LM to generate unsafe responses than a strong reinforcement learning-based red teaming approach, and succeeds even when the LM has been fine-tuned to produce safer outputs. </p>
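<p>The mechanics of backpropagating through a frozen LM and safety classifier into a soft prompt can be sketched with stand-in networks, as below; the tiny GRU and linear "classifier" are placeholders so the loop runs end to end, not GBRT's actual models, losses, or realism variants.</p>
<pre>
import torch
import torch.nn as nn

emb_dim, prompt_len = 32, 8
lm = nn.GRU(emb_dim, emb_dim, batch_first=True)   # frozen stand-in "LM"
safety = nn.Linear(emb_dim, 1)                    # frozen stand-in classifier
for p in list(lm.parameters()) + list(safety.parameters()):
    p.requires_grad = False

# The soft prompt (continuous embeddings) is the only trainable object.
soft_prompt = nn.Parameter(torch.randn(1, prompt_len, emb_dim))
opt = torch.optim.Adam([soft_prompt], lr=1e-2)

for step in range(200):
    opt.zero_grad()
    response, _ = lm(soft_prompt)             # differentiable "generation"
    unsafe_score = safety(response[:, -1])    # higher = judged less safe
    loss = -unsafe_score.mean()               # maximize the unsafe score
    loss.backward()                           # grads flow through frozen nets
    opt.step()
</pre>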
<p>Simulating sampling algorithms with people has proven a useful method for efficiently probing and understanding their mental representations. We propose that the same methods can be used to study the representations of Large Language Models (LLMs). While one can always directly prompt either humans or LLMs to disclose their mental representations introspectively, we show that increased efficiency can be achieved by using LLMs as elements of a sampling algorithm. We explore the extent to which we recover human-like representations when LLMs are interrogated with Direct Sampling and Markov chain Monte Carlo (MCMC). We found a significant increase in efficiency and performance using adaptive sampling algorithms based on MCMC. We also highlight the potential of our method to yield a more general method of conducting Bayesian inference with LLMs. </p>
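<p>A minimal Metropolis-Hastings loop of the kind described, where the target density comes from a (hypothetical) llm_logprob call; the stub scoring function and word list below are placeholders for real LLM queries, not the paper's protocol.</p>
<pre>
import math
import random

def llm_logprob(item: str) -> float:
    # Hypothetical stub standing in for an LLM's log-probability judgment.
    return -abs(len(item) - 6)

def mcmc_with_llm(items, n_steps=1000):
    current = random.choice(items)
    samples = []
    for _ in range(n_steps):
        proposal = random.choice(items)        # symmetric proposal
        log_accept = llm_logprob(proposal) - llm_logprob(current)
        if random.random() < math.exp(min(0.0, log_accept)):
            current = proposal
        samples.append(current)
    return samples

words = ["cat", "jaguar", "wolf", "ocelot", "lynx", "cheetah"]
print(mcmc_with_llm(words)[-5:])   # chain concentrates on 6-letter words
</pre>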
<p>Recent studies have advocated for fully open foundation models to promote transparency and open science. As an initial step, the Open Whisper-style Speech Model (OWSM) reproduced OpenAI's Whisper using publicly available data and open-source toolkits. Because they aimed to reproduce Whisper, the previous OWSM v1 through v3 models were still based on the standard Transformer, which might lead to inferior performance compared to other state-of-the-art speech encoders. In this work, we aim to improve the performance and efficiency of OWSM without extra training data. We present E-Branchformer-based OWSM v3.1 models at two scales, i.e., 100M and 1B. The 1B model is the largest E-Branchformer-based speech model that has been made publicly available. It outperforms the previous OWSM v3 in a vast majority of evaluation benchmarks, while demonstrating up to 25% faster inference speed. We publicly release the data preparation scripts, pre-trained models, and training logs. </p>
<p>Conversational search facilitates complex information retrieval by enabling multi-turn interactions between users and the system. Supporting such interactions requires a comprehensive understanding of the conversational inputs to formulate a good search query based on historical information. In particular, the search query should include the relevant information from the previous conversation turns. However, current approaches for conversational dense retrieval primarily rely on fine-tuning a pre-trained ad-hoc retriever using the whole conversational search session, which can be lengthy and noisy. Moreover, existing approaches are limited by the amount of manual supervision signals in the existing datasets. To address the aforementioned issues, we propose a History-Aware Conversational Dense Retrieval (HAConvDR) system, which incorporates two ideas: context-denoised query reformulation and automatic mining of supervision signals based on the actual impact of historical turns. Experiments on two public conversational search datasets demonstrate the improved history modeling capability of HAConvDR, in particular for long conversations with topic shifts. </p>
<p>LiNGAM determines the variable order from cause to effect using additive noise models, but it faces challenges with confounding. Previous methods maintained LiNGAM's fundamental structure while trying to identify and address variables affected by confounding. As a result, these methods required significant computational resources regardless of the presence of confounding, and they did not ensure the detection of all confounding types. In contrast, this paper enhances LiNGAM by introducing LiNGAM-MMI, a method that quantifies the magnitude of confounding using KL divergence and arranges the variables to minimize its impact. This method efficiently achieves a globally optimal variable order through the shortest path problem formulation. LiNGAM-MMI processes data as efficiently as traditional LiNGAM in scenarios without confounding while effectively addressing confounding situations. Our experimental results suggest that LiNGAM-MMI more accurately determines the correct variable order, both in the presence and absence of confounding. </p>
<p>As consumer Virtual Reality (VR) and Mixed Reality (MR) technologies gain momentum, there is a growing focus on developing engaging interactions with 3D virtual content. Unfortunately, traditional techniques for content creation, editing, and interaction within these virtual spaces are fraught with difficulties. They tend to be not only engineering-intensive but also require extensive expertise, which adds to the frustration and inefficiency of virtual object manipulation. Our proposed VR-GS system represents a leap forward in human-centered 3D content interaction, offering a seamless and intuitive user experience. By developing a physical dynamics-aware interactive Gaussian Splatting framework in a Virtual Reality setting, and constructing a highly efficient two-level embedding strategy alongside deformable body simulations, VR-GS ensures real-time execution with highly realistic dynamic responses. The components of our Virtual Reality system are designed for high efficiency and effectiveness, starting from detailed scene reconstruction and object segmentation, advancing through multi-view image in-painting, and extending to interactive physics-based editing. The system also incorporates real-time deformation embedding and dynamic shadow casting, ensuring a comprehensive and engaging virtual experience. Our project page is available at: https://yingjiang96.github.io/VR-GS/. </p>
<p>Relationship inference from sparse data is an important task with applications ranging from product recommendation to drug discovery. A recently proposed linear model for sparse matrix completion has demonstrated surprising advantage in speed and accuracy over more sophisticated recommender systems algorithms. Here we extend the linear model to develop a shallow autoencoder for the dual neighborhood-regularized matrix completion problem. We demonstrate the speed and accuracy advantage of our approach over the existing state-of-the-art in predicting drug-target interactions and drug-disease associations. </p>
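<p>If the linear model in question resembles EASE (Steck, 2019), its closed form fits in a few lines, as sketched below; whether the paper's dual neighborhood-regularized autoencoder reduces to this is an assumption of the sketch, and the data is synthetic.</p>
<pre>
import numpy as np

def ease(X, lam=100.0):
    # X: (users, items) binary interaction matrix -> item-item weight matrix.
    G = X.T @ X + lam * np.eye(X.shape[1])
    P = np.linalg.inv(G)
    B = -P / np.diag(P)            # closed-form solution, column-normalized
    np.fill_diagonal(B, 0.0)       # forbid trivial self-reconstruction
    return B

X = (np.random.default_rng(0).random((50, 20)) < 0.2).astype(float)
scores = X @ ease(X)               # predicted affinities for completion
print(scores.shape)
</pre>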
<p>Smartphone overuse poses risks to people's physical and mental health. However, current intervention techniques mainly focus on explicitly changing screen content (i.e., output) and often fail to persistently reduce smartphone overuse due to being over-restrictive or over-flexible. We present the design and implementation of InteractOut, a suite of implicit input manipulation techniques that leverage interaction proxies to weakly inhibit the natural execution of common user gestures on mobile devices. We present a design space for input manipulations and demonstrate 8 Android implementations of input interventions. We first conducted a pilot lab study (N=30) to evaluate the usability of these interventions. Based on the results, we then performed a 5-week within-subject field experiment (N=42) to evaluate InteractOut in real-world scenarios. Compared to the traditional and common timed lockout technique, InteractOut significantly reduced the usage time by an additional 15.0% and opening frequency by 17.0% on participant-selected target apps. InteractOut also achieved a 25.4% higher user acceptance rate, and resulted in less frustration and better user experience according to participants' subjective feedback. InteractOut demonstrates a new direction for smartphone overuse intervention and serves as a strong complementary set of techniques with existing methods. </p>
<p>The rapid advancement of artificial intelligence technologies, particularly in recent years, has led to the emergence of several large-parameter artificial intelligence weather forecast models. These models represent a significant breakthrough, overcoming the limitations of traditional numerical weather prediction models and indicating a potential second revolution in weather forecasting. This study explores the evolution of these advanced artificial intelligence forecast models, and based on the identified commonalities, proposes the "Three Large Rules" for their development. We discuss the potential of artificial intelligence in revolutionizing numerical weather prediction, briefly outlining the underlying reasons for this potential. Additionally, we explore key prospects for the future development of large artificial intelligence weather forecast models across the entire numerical prediction process. Through an example that combines a large artificial intelligence model with ocean wave forecasting, we illustrate how forecasters can adapt to and leverage advanced artificial intelligence models. While acknowledging the high accuracy, computational efficiency, and ease of deployment of large artificial intelligence forecast models, we emphasize the irreplaceable value of traditional numerical forecasts. We believe that the optimal future of weather forecasting lies in achieving a seamless integration of artificial intelligence and traditional numerical models. Such a synthesis is anticipated to offer a more comprehensive and reliable approach for future weather forecasting. </p>
<p>In the rapidly evolving field of scientific research, efficiently extracting key information from the burgeoning volume of scientific papers remains a formidable challenge. This paper introduces an innovative framework designed to automate the extraction of vital data from scientific PDF documents, enabling researchers to discern future research trajectories more readily. AutoIE uniquely integrates four novel components: (1) a multi-semantic feature fusion-based approach for PDF document layout analysis; (2) advanced functional block recognition in scientific texts; (3) a synergistic technique for extracting and correlating information on molecular sieve synthesis; and (4) an online learning paradigm tailored for molecular sieve literature. Our SBERT model achieves high Macro F1 scores of 87.19 and 89.65 on the CoNLL04 and ADE datasets. In addition, a practical application of AutoIE in the petrochemical molecular sieve synthesis domain demonstrates its efficacy, evidenced by an impressive 78% accuracy rate. This research paves the way for enhanced data management and interpretation in molecular sieve synthesis. It is a valuable asset for seasoned experts and newcomers in this specialized field. </p>
<p>Large Language Models increasingly rely on distributed techniques for their training and inference. These techniques require communication across devices, which can reduce scaling efficiency as the number of devices increases. While some distributed techniques can overlap, and thus hide, this communication with independent computations, techniques such as Tensor Parallelism (TP) inherently serialize communication with model execution. One approach to hiding this serialized communication is to interleave it with the producer operation (of the communicated data) in a fine-grained manner. However, this fine-grained interleaving of communication and computation in software can be difficult. Furthermore, as with any concurrent execution, it requires compute and memory resources to be shared between computation and communication, causing resource contention that reduces overlapping efficacy. </p> <p>To overcome these challenges, we propose T3, which applies hardware-software co-design to transparently overlap serialized communication while minimizing resource contention with compute. T3 transparently fuses producer operations with the subsequent communication via a simple configuration of the producer's output address space and requires minor software changes. At the hardware level, T3 adds a lightweight track-and-trigger mechanism to orchestrate the producer's compute and communication. It further uses compute-enhanced memories for communication's attendant compute. As a result, T3 reduces resource contention and efficiently overlaps serialized communication with computation. For important Transformer models like T-NLG, T3 speeds up communication-heavy sublayers by 30% geomean (max 47%) and reduces data movement by 22% geomean (max 36%). Furthermore, T3's benefits persist as models scale: geomean 29% for sublayers in $\sim$500-billion parameter models, PALM and MT-NLG. </p>
<p>In this paper, we present a variety of classification experiments related to the task of fictional discourse detection. We utilize a diverse array of datasets, including contemporary professionally published fiction, historical fiction from the Hathi Trust, fanfiction, stories from Reddit, folk tales, GPT-generated stories, and anglophone world literature. Additionally, we introduce a new feature set of word "supersenses" that facilitate the goal of semantic generalization. The detection of fictional discourse can help enrich our knowledge of large cultural heritage archives and assist with the process of understanding the distinctive qualities of fictional storytelling more broadly. </p>
<p>Lithium-ion batteries (LIBs) have found wide applications in a variety of fields such as electrified transportation, stationary storage, and portable electronic devices. A battery management system (BMS) is critical to ensure the reliability, efficiency, and longevity of LIBs. Recent research has witnessed the emergence of model-based fault diagnosis methods in advanced BMSs. This paper provides a comprehensive review of the model-based fault diagnosis methods for LIBs. First, the widely explored battery models in the existing literature are classified into physics-based electrochemical models and electrical equivalent circuit models. Second, a general state-space representation that describes the electrical dynamics of a faulty battery is presented. The formulation of the state vectors and the identification of the parameter matrices are then elaborated. Third, the fault mechanisms of both battery faults (incl. overcharge/overdischarge faults, connection faults, and short circuit faults) and sensor faults (incl. voltage sensor faults and current sensor faults) are discussed. Furthermore, different types of modeling uncertainties, such as modeling errors, measurement noise, aging effects, and measurement outliers, are elaborated. An emphasis is then placed on observer design (incl. online state observers and offline state observers). The algorithm implementation of typical state observers for battery fault diagnosis is also put forward. Finally, discussion and outlook are offered to envision some possible future research directions. </p>
<p>In this work we introduce a manifold learning-based surrogate modeling framework for uncertainty quantification in high-dimensional stochastic systems. Our first goal is to perform data mining on the available simulation data to identify a set of low-dimensional (latent) descriptors that efficiently parameterize the response of the high-dimensional computational model. To this end, we employ Principal Geodesic Analysis on the Grassmann manifold of the response to identify a set of disjoint principal geodesic submanifolds, of possibly different dimension, that captures the variation in the data. Since operations on the Grassmann manifold require the data to be concentrated, we propose an adaptive algorithm based on Riemannian K-means and the minimization of the sample Fréchet variance on the Grassmann manifold to identify "local" principal geodesic submanifolds that represent different system behavior across the parameter space. Polynomial chaos expansion is then used to construct a mapping between the random input parameters and the projection of the response on these local principal geodesic submanifolds. The method is demonstrated on four test cases: a toy example that involves points on a hypersphere, a Lotka-Volterra dynamical system, a continuous-flow stirred-tank chemical reactor system, and a two-dimensional Rayleigh-Bénard convection problem. </p>
<p>Multimodal federated learning (FL) aims to enrich model training in FL settings where clients are collecting measurements across multiple modalities. However, key challenges to multimodal FL remain unaddressed, particularly in heterogeneous network settings where: (i) the set of modalities collected by each client will be diverse, and (ii) communication limitations prevent clients from uploading all their locally trained modality models to the server. In this paper, we propose multimodal Federated learning with joint Modality and Client selection (mmFedMC), a new FL methodology that can tackle the above-mentioned challenges in multimodal settings. The joint selection algorithm incorporates two main components: (a) A modality selection methodology for each client, which weighs (i) the impact of the modality, gauged by Shapley value analysis, (ii) the modality model size as a gauge of communication overhead, against (iii) the frequency of modality model updates, denoted recency, to enhance generalizability. (b) A client selection strategy for the server based on the local loss of modality model at each client. Experiments on five real-world datasets demonstrate the ability of mmFedMC to achieve comparable accuracy to several baselines while reducing the communication overhead by over 20x. A demo video of our methodology is available at https://liangqiy.com/mmfedmc/. </p>
<p>Collaborative learning (CL) is a distributed learning framework that aims to protect user privacy by allowing users to jointly train a model by sharing only their gradient updates. However, gradient inversion attacks (GIAs), which recover users' training data from shared gradients, impose severe privacy threats on CL. Existing defense methods adopt different techniques, e.g., differential privacy, cryptography, and perturbation defenses, to defend against the GIAs. Nevertheless, all current defense methods suffer from a poor trade-off between privacy, utility, and efficiency. To mitigate the weaknesses of existing solutions, we propose a novel defense method, Dual Gradient Pruning (DGP), based on gradient pruning, which can improve communication efficiency while preserving the utility and privacy of CL. Specifically, DGP slightly changes gradient pruning with a stronger privacy guarantee, and it also significantly improves communication efficiency, supported by a theoretical analysis of its convergence and generalization. Our extensive experiments show that DGP can effectively defend against the most powerful GIAs and reduce the communication cost without sacrificing the model's utility. </p>
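<p>The communication-saving core of gradient pruning is sketched below as magnitude-based top-$k$ selection; the "dual" construction and its stronger privacy guarantee involve additional steps not reproduced here, so this shows only the basic operation DGP builds on.</p>
<pre>
import torch

def prune_gradient(grad: torch.Tensor, keep_ratio: float = 0.1):
    # Keep only the largest-magnitude entries; everything else is zeroed,
    # so only a sparse update needs to be communicated.
    flat = grad.flatten()
    k = max(1, int(keep_ratio * flat.numel()))
    idx = flat.abs().topk(k).indices
    pruned = torch.zeros_like(flat)
    pruned[idx] = flat[idx]
    return pruned.view_as(grad)

g = torch.randn(4, 4)
print(prune_gradient(g, keep_ratio=0.25))
</pre>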
<p>In materials science, characterizing faults in periodic structures is vital for understanding material properties. To characterize magnetic labyrinthine patterns, it is necessary to accurately identify junctions and terminals, often featuring over a thousand closely packed defects per image. This study introduces a new technique called TM-CNN (Template Matching - Convolutional Neural Network), designed to detect a multitude of small objects in images, such as defects in magnetic labyrinthine patterns. TM-CNN was used to identify these structures in 444 experimental images, and the results were explored to deepen the understanding of magnetic materials. It employs a two-stage detection approach, combining template matching, used for initial detection, with a convolutional neural network, used to eliminate incorrect identifications. Training a CNN classifier normally requires a large number of annotated training images, a difficulty that prevents the use of CNNs in many practical applications. TM-CNN significantly reduces the manual workload for creating training images by automatically making most of the annotations and leaving only a small number of corrections to human reviewers. In testing, TM-CNN achieved an impressive F1 score of 0.988, far outperforming traditional template matching and CNN-based object detection algorithms. </p>
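<p>The two-stage pipeline can be sketched with OpenCV's template matching plus a stand-in classifier: stage one proposes candidate locations by normalized cross-correlation, stage two lets a CNN (stubbed here) reject false positives. The threshold, template, and synthetic image are illustrative, not the study's data.</p>
<pre>
import cv2
import numpy as np

def detect_defects(image, template, cnn_filter, threshold=0.7):
    resp = cv2.matchTemplate(image, template, cv2.TM_CCOEFF_NORMED)
    ys, xs = np.where(resp >= threshold)       # stage 1: candidate locations
    h, w = template.shape
    kept = []
    for y, x in zip(ys, xs):
        crop = image[y:y + h, x:x + w]
        if cnn_filter(crop):                   # stage 2: CNN accepts/rejects
            kept.append((x, y))
    return kept

image = np.random.randint(0, 255, (256, 256), np.uint8)
template = np.random.randint(0, 255, (16, 16), np.uint8)
image[50:66, 80:96] = template                 # plant one true match
always_yes = lambda crop: True                 # stand-in for the trained CNN
print(detect_defects(image, template, always_yes))
</pre>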
back
<p>The Standard Performance Evaluation Corporation (SPEC) CPU benchmark has been widely used as a measure of computing performance for decades. SPEC is an industry-standardized, CPU-intensive benchmark suite, and the collective data provide a proxy for the history of worldwide CPU and system performance. Past efforts have not provided or enabled answers to questions such as: How has the SPEC benchmark suite evolved empirically over time, and which micro-architecture artifacts have had the most influence on performance? Have any micro-benchmarks within the suite had undue influence on the results and comparisons among the codes? Can the answers to these questions provide insights into the future of computer system performance? To answer these questions, we detail our historical and statistical analysis of specific hardware artifacts (clock frequencies, core counts, etc.) on the performance of the SPEC benchmarks since 1995. We discuss in detail several methods to normalize across benchmark evolutions. We perform both isolated and collective sensitivity analyses for various hardware artifacts, and we identify one benchmark (libquantum) that had a somewhat undue influence on performance outcomes. We also present the use of SPEC data to predict future performance. </p>
back
<p>Deep learning has been widely adopted across various fields, but there has been little focus on evaluating the performance of deep learning pipelines. With the increased use of large datasets and complex models, it has become common to run the training process only once and compare the result to previous benchmarks. However, this procedure can lead to imprecise comparisons due to the variance in neural network evaluation metrics, which stems from the randomness inherent in the training process of deep learning pipelines. Traditional solutions, such as running the training process multiple times, are usually not feasible in deep learning due to computational limitations. In this paper, we propose a new metric framework, the Calibrated Loss Metric, that addresses this issue by reducing the variance of its vanilla counterpart. As a result, the new metric is more accurate in detecting effective modeling improvements. Our approach is supported by theoretical justifications and extensive experimental validation in the context of Deep Click-Through Rate Prediction Models. </p>
back
<p>Emerging applications, such as robot-assisted eldercare and object recognition, generally employ deep neural network (DNN) models and naturally require: i) handling streaming-in inference requests and ii) adapting to possible deployment scenario changes. Online model fine-tuning is widely adopted to satisfy these needs. However, fine-tuning involves significant energy consumption, making it challenging to deploy on edge devices. In this paper, we propose EdgeOL, an edge online learning framework that optimizes inference accuracy, fine-tuning execution time, and energy efficiency through both inter-tuning and intra-tuning optimizations. Experimental results show that, on average, EdgeOL reduces overall fine-tuning execution time by 82% and energy consumption by 74%, and improves average inference accuracy by 1.70% over the immediate online learning strategy. </p>
back
<p>Interactive visual grounding in Human-Robot Interaction (HRI) is challenging yet practical due to the inevitable ambiguity in natural language. It requires robots to disambiguate user input through active information gathering. Previous approaches often rely on predefined templates to ask disambiguation questions, resulting in performance reduction in realistic interactive scenarios. In this paper, we propose TiO, an end-to-end system for interactive visual grounding in human-robot interaction. Benefiting from a unified formulation of visual dialogue and grounding, our method can be trained jointly on extensive public datasets and shows superior generality to diversified and challenging open-world scenarios. In the experiments, we validate TiO on the GuessWhat?! and InViG benchmarks, setting new state-of-the-art performance by a clear margin. Moreover, we conduct HRI experiments on 150 carefully selected challenging scenes as well as real-robot platforms. Results show that our method demonstrates superior generality to diversified visual and language inputs with a high success rate. Code and demos are available at https://github.com/jxu124/TiO. </p>
back
<p>3D human pose estimation captures the human joint points in three-dimensional space while preserving depth information and physical structure. This is essential for applications that require precise pose information, such as human-computer interaction, scene understanding, and rehabilitation training. Due to the challenges of data collection, mainstream datasets for 3D human pose estimation are primarily composed of multi-view video data collected in laboratory environments, which contain rich spatial-temporal correlation information besides the image frame content. Given the remarkable self-attention mechanism of transformers, capable of capturing spatial-temporal correlations from multi-view video datasets, we propose a multi-stage framework for 3D sequence-to-sequence (seq2seq) human pose estimation. First, the spatial module represents the human pose features by intra-image content, while the frame-image relation module extracts temporal relationships and 3D spatial positional relationship features between the multi-perspective images. Second, the self-attention mechanism is adopted to eliminate interference from non-human body parts and reduce computing resources. Our method is evaluated on Human3.6M, a popular 3D human pose estimation dataset. Experimental results demonstrate that our approach achieves state-of-the-art performance on this dataset. </p>
back
<p>Consider the task of estimating a random vector $X$ from noisy observations $Y = X + Z$, where $Z$ is a standard normal vector, under the $L^p$ fidelity criterion. This work establishes that, for $1 \leq p \leq 2$, the optimal Bayesian estimator is linear and positive definite if and only if the prior distribution on $X$ is a (non-degenerate) multivariate Gaussian. Furthermore, for $p > 2$, it is demonstrated that there are infinitely many priors that can induce such an estimator. </p>
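For orientation, the familiar $p = 2$ (mean-squared error) instance of the "if" direction can be written out explicitly:

```latex
% MMSE (p = 2) with a zero-mean Gaussian prior X ~ N(0, Sigma), Y = X + Z:
\[
  \mathbb{E}[X \mid Y = y] \;=\; \Sigma\,(\Sigma + I)^{-1}\, y ,
\]
% a linear estimator whose matrix \Sigma(\Sigma + I)^{-1} is positive
% definite whenever \Sigma is, matching the criterion in the statement.
```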
back
<p>Existing video-language studies mainly focus on learning short video clips, leaving long-term temporal dependencies rarely explored due to the prohibitively high computational cost of modeling long videos. To address this issue, one feasible solution is to learn the correspondence between video clips and captions, which, however, inevitably encounters the multi-granularity noisy correspondence (MNC) problem. Specifically, MNC refers to clip-caption misalignment (coarse-grained) and frame-word misalignment (fine-grained), both of which hinder temporal learning and video understanding. In this paper, we propose NOise Robust Temporal Optimal traNsport (Norton), which addresses MNC in a unified optimal transport (OT) framework. In brief, Norton employs video-paragraph and clip-caption contrastive losses to capture long-term dependencies based on OT. To address coarse-grained misalignment in video-paragraph contrast, Norton filters out irrelevant clips and captions through an alignable prompt bucket and realigns asynchronous clip-caption pairs based on transport distance. To address fine-grained misalignment, Norton incorporates a soft-maximum operator to identify crucial words and key frames. Additionally, Norton exploits potential faulty negative samples in clip-caption contrast by rectifying the alignment target with the OT assignment to ensure precise temporal modeling. Extensive experiments on video retrieval, video QA, and action segmentation verify the effectiveness of our method. Code is available at https://lin-yijie.github.io/projects/Norton. </p>
back
<p>Contemporary theories and models of electric power system stability are predicated on a widely held assumption that the mechanical inertia of the rotating mass of synchronous generators provides the sole contribution to stable and synchronized operation of this class of complex networks on subsecond timescales. Here we formulate the electromagnetic momentum of the field around the transmission lines that transport energy, and present evidence from a real-world bulk power network that demonstrates its physical significance. We show that the classical stability model for power networks, known as the "swing equation", overlooks this property and may become inadequate for analyzing systems with high shares of inverter-based resources, commonly known as "low-inertia power systems". Subsequently, we introduce a plane wave dynamic model, consistent with the structural properties of emerging power systems with up to 100% inverter-based resources, which identifies the concept of inertia in power grids as a time-varying component. We leverage our theory to discuss a number of open questions in the electric power industry. Most notably, we postulate that the changing nature of power networks with a preponderance of variable renewable energy power plants could strengthen power network stability in the future; a vision that is irreconcilable with the conventional theories. </p>
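For readers outside the power systems community, the "swing equation" referenced above is the classical per-generator model:

```latex
% Classical swing equation: rotor angle delta, inertia M, damping D,
% mechanical input P_m, electrical output P_e(delta).
\[
  M \,\ddot{\delta} + D \,\dot{\delta} \;=\; P_m - P_e(\delta),
  \qquad P_e(\delta) = P_{\max}\sin\delta .
\]
% The paper's claim is that the constant M here no longer captures the
% stability physics once inverter-based resources dominate the grid.
```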
back
<p>We consider the redundancy of the exact channel synthesis problem under an i.i.d. assumption. Existing results provide an upper bound on the unnormalized redundancy that is logarithmic in the block length. We show, via an improved scheme, that the logarithmic term can be halved for most channels and eliminated for all others. For full-support discrete memoryless channels, we show that this is the best possible. </p>
back
<p>This paper introduces the multivariate beta mixture model (MBMM), a new probabilistic model for soft clustering. MBMM adapts to diverse cluster shapes because of the flexible probability density function of the multivariate beta distribution. We introduce the properties of MBMM, describe the parameter learning procedure, and present experimental results showing that MBMM fits diverse cluster shapes on synthetic and real datasets. The code is released anonymously at https://github.com/hhchen1105/mbmm/. </p>
back
<p>This paper is concerned with ordered statistic decoding with local constraints (LC-OSD) of binary linear block codes, a near maximum-likelihood decoding algorithm. Compared with conventional OSD, LC-OSD significantly reduces both the maximum and the average number of searches. The former is achieved by performing the serial list Viterbi algorithm (SLVA) or a two-way flipping pattern tree (FPT) algorithm with local constraints on the test error patterns, while the latter is achieved by incorporating tailored early termination criteria. The main objective of this paper is to explore the relationship between the performance of LC-OSD and decoding parameters such as the constraint degree and the maximum list size. To this end, we approximate the local parity-check matrix as a totally random matrix and then estimate the performance of LC-OSD by using a saddlepoint approach to analyze the performance of random codes over the channels associated with the most reliable bits (MRBs). The random coding approach enables us to derive an upper bound on the performance and to predict the average rank of the transmitted codeword in the list delivered by LC-OSD. This allows us to balance the constraint degree and the maximum list size for reducing the average (or maximum) time complexity. Simulation results show that the random coding approximation is numerically effective and powerful. Simulation results also show that RS codes decoded by LC-OSD can approach the random coding union (RCU) bounds, verifying the efficiency and universality of LC-OSD. </p>
back
<p>The human digital twin (HDT) is an emerging paradigm that bridges physical twins (PTs) with powerful virtual twins (VTs) to assist complex task execution in human-centric services. In this paper, we study a two-timescale online optimization for building HDTs under an end-edge-cloud collaborative framework. As a unique feature of HDTs, we consider that PTs' corresponding VTs are deployed on edge servers and consist not only of generic models placed by downloading experiential knowledge from the cloud but also of customized models updated by collecting personalized data from end devices. To maximize task execution accuracy under stringent energy and delay constraints, and taking into account the HDT's inherent mobility and status variation uncertainties, we jointly and dynamically optimize VTs' construction and PTs' task offloading, along with communication and computation resource allocations. Observing that the decision variables are asynchronous with different triggers, we propose a novel two-timescale accuracy-aware online optimization approach (TACO). Specifically, TACO utilizes an improved Lyapunov method to decompose the problem into multiple instantaneous ones, and then leverages piecewise McCormick envelopes and block coordinate descent based algorithms to address the two timescales alternately. Theoretical analyses and simulations show that the proposed approach achieves asymptotic optimality with polynomial-time complexity and demonstrate its superiority over counterparts. </p>
back
<p>Leveraging the rich information extracted from light field (LF) cameras is instrumental for dense prediction tasks. However, adapting light field data to enhance Salient Object Detection (SOD) still follows traditional RGB methods and remains under-explored in the community. Previous approaches predominantly employ a custom two-stream design to discover the implicit angular features within light field cameras, leading to significant information isolation between different LF representations. In this study, we propose an efficient paradigm (LF Tracy) to address this limitation. We eschew the conventional specialized fusion and decoder architecture for a dual-stream backbone in favor of a unified, single-pipeline approach. This comprises, firstly, a simple yet effective data augmentation strategy called MixLD, which bridges the connection of spatial, depth, and implicit angular information under different LF representations. A highly efficient information aggregation (IA) module is then introduced to boost asymmetric feature-wise information fusion. Owing to this innovative approach, our model surpasses the existing state-of-the-art methods, particularly demonstrating a 23% improvement over previous results on the latest large-scale PKU dataset. Using only 28.9M parameters, the model achieves a 10% increase in accuracy with only 3M additional parameters compared to its backbone using RGB images, and an 86% increase over its backbone using LF images. The source code will be made publicly available at https://github.com/FeiBryantkit/LF-Tracy. </p>
back
<p>We demonstrate that large language models can produce reasonable numerical ratings of the logical consistency of claims. We also outline a mathematical approach based on sheaf theory for lifting such ratings to hypertexts such as laws, jurisprudence, and social media and evaluating their consistency globally. This approach is a promising avenue to increasing consistency in and of government, as well as to combating mis- and disinformation and related ills. </p>
back
<p>Maintainers are now self-sabotaging their work in order to take political or economic stances, a practice referred to as "protestware". In this poster, we present our approach to understand how the discourse about such an attack went viral, how it is received by the community, and whether developers respond to the attack in a timely manner. We study two notable protestware cases, i.e., Colors.js and es5-ext, comparing with discussions of a typical security vulnerability as a baseline, i.e., Ua-parser, and perform a thematic analysis of more than two thousand protest-related posts to extract the different narratives when discussing protestware. </p>
back
<p>State estimation for legged robots is challenging due to their highly dynamic motion and limitations imposed by sensor accuracy. By integrating Kalman filtering, optimization, and learning-based modalities, we propose a hybrid solution that combines proprioceptive and exteroceptive information for estimating the state of the robot's trunk. Leveraging joint encoder and IMU measurements, our Kalman filter is enhanced through a single-rigid-body model that incorporates ground reaction force control outputs from convex Model Predictive Control optimization. The estimate is further refined through Gated Recurrent Units, which also consider semantic insights and robot height from a Vision Transformer autoencoder applied to depth images. This framework not only furnishes accurate robot state estimates, including uncertainty evaluations, but can also minimize, through learning, the nonlinear errors that arise from sensor measurements and model simplifications. The proposed methodology is evaluated in hardware using a quadruped robot on various terrains, yielding a 65% improvement in Root Mean Squared Error compared to our VIO SLAM baseline. Code example: https://github.com/AlexS28/OptiState </p>
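The Kalman filtering backbone of such an estimator reduces to the standard predict/update cycle sketched below; the matrices are generic placeholders, and the paper's additions (single-rigid-body dynamics, MPC ground-reaction-force inputs, GRU refinement) are omitted.

```python
import numpy as np

def kalman_step(x, P, z, A, H, Q, R):
    """One predict/update cycle of the Kalman filter underlying the
    estimator described above. Matrices here are generic placeholders;
    the paper's filter additionally folds in single-rigid-body dynamics
    and MPC ground-reaction-force outputs, omitted in this sketch.
    """
    # Predict: propagate state and covariance through the motion model.
    x_pred = A @ x
    P_pred = A @ P @ A.T + Q
    # Update: correct with the measurement z (e.g., encoder/IMU-derived).
    S = H @ P_pred @ H.T + R             # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)  # Kalman gain
    x_new = x_pred + K @ (z - H @ x_pred)
    P_new = (np.eye(len(x)) - K @ H) @ P_pred
    return x_new, P_new
```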
back
<p>There has been a proliferation of artificial intelligence applications, where model training is key to promising high-quality services for these applications. However, the model training process is both time-intensive and energy-intensive, inevitably affecting the user's demand for application efficiency. Layer freezing, an efficient model training technique, has been proposed to improve training efficiency. Although existing layer freezing methods demonstrate great potential to reduce model training costs, they still have shortcomings such as a lack of generalizability and compromised accuracy. For instance, existing layer freezing methods either require the freezing configurations to be manually defined before training, which does not apply to different networks, or use heuristic freezing criteria that are hard to guarantee decent accuracy in different scenarios. Therefore, a generic and intelligent layer freezing method that can automatically perform "in-situation" layer freezing for different networks during training is still lacking. To this end, we propose a generic and efficient training framework (SmartFRZ). The core technique in SmartFRZ is attention-guided layer freezing, which can automatically select the appropriate layers to freeze without compromising accuracy. Experimental results show that SmartFRZ effectively reduces the amount of computation in training, achieves significant training acceleration, and outperforms state-of-the-art layer freezing approaches. </p>
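The freezing mechanism itself is straightforward in a framework like PyTorch, as the sketch below shows; the attention-guided selection of which layers to freeze is SmartFRZ's contribution and is not reproduced here, so the chosen layer names are hypothetical.

```python
import torch
import torchvision

# The mechanics of freezing a layer during training: disable requires_grad
# so the layer is excluded from backpropagation and optimizer updates.
model = torchvision.models.resnet18(weights=None)

def freeze_layers(model: torch.nn.Module, names_to_freeze: set) -> None:
    for name, module in model.named_children():
        if name in names_to_freeze:
            for p in module.parameters():
                p.requires_grad = False  # no gradients computed or applied

freeze_layers(model, {"conv1", "layer1"})  # hypothetical selection
optimizer = torch.optim.SGD(
    (p for p in model.parameters() if p.requires_grad), lr=0.01)
```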
back
<p>In this paper, we propose a novel approach for conducting face morphing attacks, which utilizes optimal-landmark-guided image blending. Current face morphing attacks can be categorized into landmark-based and generation-based approaches. Landmark-based methods use geometric transformations to warp facial regions according to averaged landmarks but often produce morphed images with poor visual quality. Generation-based methods, which employ generation models to blend multiple face images, can achieve better visual quality but are often unsuccessful in generating morphed images that can effectively evade state-of-the-art face recognition systems (FRSs). Our proposed method overcomes the limitations of previous approaches by optimizing the morphing landmarks and using Graph Convolutional Networks (GCNs) to combine landmark and appearance features. We model facial landmarks as nodes in a fully connected bipartite graph and utilize GCNs to simulate their spatial and structural relationships. The aim is to capture variations in facial shape and enable accurate manipulation of facial appearance features during the warping process, resulting in morphed facial images that are highly realistic and visually faithful. Experiments on two public datasets prove that our method inherits the advantages of previous landmark-based and generation-based methods and generates morphed images with higher quality, posing a more significant threat to state-of-the-art FRSs. </p>
back
<p>The trajectory tracking problem is a fundamental control task in the study of mechanical systems. A key construction in tracking control is the error, or difference, between an actual and desired trajectory. This construction also lies at the heart of observer design, and recent advances in the study of equivariant systems have provided a template for global error construction that exploits the symmetry structure of a group action when such a structure exists. Hamiltonian systems are posed on the cotangent bundle of the configuration space of a mechanical system, and symmetries of the full cotangent bundle are not commonly used in geometric control theory. In this paper, we propose a group structure on the cotangent bundle of a Lie group and leverage this to define momentum and configuration errors for trajectory tracking, drawing on recent work on equivariant observer design. We show that this error definition leads to error dynamics that are themselves "Euler-Poincaré like" and use these to derive simple, almost global trajectory tracking control for fully actuated Euler-Poincaré systems on a Lie group state space. </p>
back
<p>We study variable-length feedback (VLF) codes with noiseless feedback for discrete memoryless channels. We present a novel non-asymptotic bound, which analyzes the average error probability and average decoding time of our modified Yamamoto-Itoh scheme. We then optimize the parameters of our code in the asymptotic regime where the average error probability $\epsilon$ remains constant as the average decoding time $N$ approaches infinity. Our second-order achievability bound improves on the achievability bound of Polyanskiy et al. (2011). We also universalize our code by employing the empirical mutual information in our decoding metric and derive a second-order achievability bound for universal VLF codes. Our results for both VLF and universal VLF codes are extended to the additive white Gaussian noise channel with an average power constraint, where the former yields an improvement over the achievability bound of Truong and Tan (2017). The proof of our results for universal VLF codes uses a refined version of the method of types and an asymptotic expansion from the nonlinear renewal theory literature. </p>
back
<p>In the evolving landscape of online communication, moderating hate speech (HS) presents an intricate challenge, compounded by the multimodal nature of digital content. This comprehensive survey delves into the recent strides in HS moderation, spotlighting the burgeoning role of large language models (LLMs) and large multimodal models (LMMs). Our exploration begins with a thorough analysis of current literature, revealing the nuanced interplay between textual, visual, and auditory elements in propagating HS. We uncover a notable trend towards integrating these modalities, primarily due to the complexity and subtlety with which HS is disseminated. A significant emphasis is placed on the advances facilitated by LLMs and LMMs, which have begun to redefine the boundaries of detection and moderation capabilities. We identify existing gaps in research, particularly in the context of underrepresented languages and cultures, and the need for solutions to handle low-resource settings. The survey concludes with a forward-looking perspective, outlining potential avenues for future research, including the exploration of novel AI methodologies, the ethical governance of AI in moderation, and the development of more nuanced, context-aware systems. This comprehensive overview aims to catalyze further research and foster a collaborative effort towards more sophisticated, responsible, and human-centric approaches to HS moderation in the digital era. (WARNING: This paper contains offensive examples.) </p>
back
<p>A recent study on the interpretability of real-valued convolutional neural networks (CNNs) (Stankovic and Mandic, 2023) has revealed a direct and physically meaningful link with the task of finding features in data through matched filters. However, applying this paradigm to illuminate the interpretability of complex-valued CNNs meets a formidable obstacle: the extension of matched filtering to a general class of noncircular complex-valued data, referred to here as the widely linear matched filter (WLMF), has been only implicit in the literature. To this end, to establish the interpretability of the operation of complex-valued CNNs, we introduce a general WLMF paradigm, provide its solution, and analyze its performance. For rigor, our WLMF solution is derived without imposing any assumption on the probability density of the noise. The theoretical advantages of the WLMF over its standard strictly linear counterpart (SLMF) are established in terms of their output signal-to-noise ratios (SNRs), with the WLMF consistently exhibiting enhanced SNR. Moreover, the lower bound on the SNR gain of the WLMF is derived, together with the condition to attain this bound. This serves to revisit the convolution-activation-pooling chain in complex-valued CNNs through the lens of matched filtering, which reveals the potential of WLMFs to provide physical interpretability and enhance the explainability of general complex-valued CNNs. Simulations demonstrate the agreement between the theoretical and numerical results. </p>
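A heavily simplified sketch of the widely linear filtering structure is shown below, assuming the standard augmented-vector formulation (process [y; conj(y)] instead of y alone) and a colored-noise matched-filter form h ~ C^{-1}s; the paper's actual WLMF derivation and SNR bounds are not reproduced.

```python
import numpy as np

def wl_matched_filter(s, noise_aug_cov):
    """Sketch of a widely linear matched filter (WLMF).

    A strictly linear MF processes y alone; the widely linear version
    processes the augmented vector [y; conj(y)], which is what captures
    noncircular (improper) complex noise. The filter form below follows
    the colored-noise matched filter h ~ C^{-1} s applied in the
    augmented domain; this is an assumption, not the paper's derivation.
    """
    s_aug = np.concatenate([s, np.conj(s)])
    h_aug = np.linalg.solve(noise_aug_cov, s_aug)  # C^{-1} s_aug

    def apply(y):
        y_aug = np.concatenate([y, np.conj(y)])
        return np.vdot(h_aug, y_aug)  # h^H [y; y*]

    return apply
```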
back
<p>Recent developments in transformer-based language models have allowed them to capture a wide variety of world knowledge that can be adapted to downstream tasks with limited resources. However, what pieces of information are understood in these models is unclear, and neuron-level contributions in identifying them are largely unknown. Conventional approaches to neuron explainability either depend on a finite set of pre-defined descriptors or require manual annotations for training a secondary model that can then explain the neurons of the primary model. In this paper, taking BERT as an example, we remove these constraints and propose a novel and scalable framework that ties textual descriptions to neurons. We leverage the potential of generative language models to discover human-interpretable descriptors present in a dataset and use an unsupervised approach to explain neurons with these descriptors. Through various qualitative and quantitative analyses, we demonstrate the effectiveness of this framework in generating useful data-specific descriptors with little human involvement in identifying the neurons that encode these descriptors. In particular, our experiments show that the proposed approach achieves 75% precision@2 and 50% recall@2. </p>
back
<p>This paper presents Flash, an optimized private inference (PI) hybrid protocol utilizing both homomorphic encryption (HE) and secure two-party computation (2PC), which can reduce the end-to-end PI latency for deep CNN models to less than 1 minute on CPU. To this end, Flash first proposes a low-latency convolution algorithm built upon a fast slot rotation operation and a novel data encoding scheme, which results in a 4-94x performance gain over the state-of-the-art. Second, to minimize the communication cost introduced by the standard nonlinear activation function ReLU, Flash replaces all ReLUs with the polynomial $x^2+x$ and trains deep CNN models with the new activation function. The trained models improve inference accuracy on CIFAR-10/100 and TinyImageNet by 16% on average (up to 40% for ResNet-32) compared to prior art. Last, Flash proposes an efficient 2PC-based $x^2+x$ evaluation protocol that does not require any offline communication and reduces the total communication cost of processing the activation layer by 84-196x over the state-of-the-art. As a result, the end-to-end PI latency of Flash implemented on CPU is 0.02 minutes for CIFAR-100 and 0.57 minutes for TinyImageNet classification, while the total data communication is 0.07GB for CIFAR-100 and 0.22GB for TinyImageNet. Flash improves the state-of-the-art PI by 16-45x in latency and 84-196x in communication cost. Moreover, even for ImageNet, Flash can deliver a latency of less than 1 minute on CPU with total communication of less than 1GB. </p>
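The activation replacement is simple to express; the sketch below swaps every ReLU in a PyTorch model for the polynomial $x^2+x$ prior to retraining, mirroring the strategy described above (the helper names are ours).

```python
import torch
import torch.nn as nn

class SquareAct(nn.Module):
    """The HE/2PC-friendly activation x^2 + x used in place of ReLU:
    polynomial, so it avoids the expensive nonlinear comparison
    protocols that ReLU requires under private inference."""
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x * x + x

def replace_relu(module: nn.Module) -> None:
    # Swap every ReLU for the polynomial activation before (re)training,
    # since the paper retrains models with the new activation.
    for name, child in module.named_children():
        if isinstance(child, nn.ReLU):
            setattr(module, name, SquareAct())
        else:
            replace_relu(child)
```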
back
<p>The proliferation of deep learning in natural language processing (NLP) has led to the development and release of innovative technologies capable of understanding and generating human language with remarkable proficiency. Atinuke, a Transformer-based neural network, optimises performance across various language tasks by utilising a unique configuration. The architecture interweaves layers for processing sequential data with attention mechanisms to draw meaningful affinities between inputs and outputs. Due to the configuration of its topology and hyperparameter tuning, it can emulate human-like language by extracting features and learning complex mappings. Atinuke is modular, extensible, and integrates seamlessly with existing machine learning pipelines. Advanced matrix operations like softmax, embeddings, and multi-head attention enable nuanced handling of textual, acoustic, and visual signals. By unifying modern deep learning techniques with software design principles and mathematical theory, the system achieves state-of-the-art results on natural language tasks whilst remaining interpretable and robust. </p>
back
<p>Feature matching is a crucial task in the field of computer vision, which involves finding correspondences between images. Previous studies achieve remarkable performance using learning-based feature comparison. However, the pervasive presence of matching redundancy between images gives rise to unnecessary and error-prone computations in these methods, imposing limitations on their accuracy. To address this issue, we propose MESA, a novel approach to establish precise area (or region) matches for efficient matching redundancy reduction. MESA first leverages the advanced image understanding capability of SAM, a state-of-the-art foundation model for image segmentation, to obtain image areas with implicit semantics. Then, a multi-relational graph is proposed to model the spatial structure of these areas and construct their scale hierarchy. Based on graphical models derived from the graph, the area matching is reformulated as an energy minimization task and effectively resolved. Extensive experiments demonstrate that MESA yields substantial precision improvements for multiple point matchers in indoor and outdoor downstream tasks, e.g., +13.61% for DKM in indoor pose estimation. </p>
back
<p>While generative AI is now widespread and useful in society, there are potential risks of misuse, e.g., unconsciously influencing cognitive processes or decision-making. Although this poses a security problem in the cognitive domain, there has been no research on the neural and computational mechanisms by which humans counteract the impact of malicious generative AI. We propose DecNefGAN, a novel framework that combines a generative adversarial system and a neural reinforcement model. More specifically, DecNefGAN bridges human and generative AI in a closed-loop system, with the AI creating stimuli that induce specific mental states, thus exerting external control over neural activity. The objective of the human is the opposite: to compete and reach an orthogonal mental state. This framework can contribute to elucidating how the human brain responds to and counteracts the potential influence of generative AI. </p>
back
<p>In this paper, the practical utilization of multiple distributed reconfigurable intelligent surfaces (RISs) that are able to conduct group-specific operations is investigated for multi-group multicasting systems. To tackle the inter-group interference issue in multi-group multicasting systems, block diagonalization (BD)-based beamforming is considered first. With no inter-group interference remaining after the BD operation, the multiple distributed RISs are operated to maximize the minimum rate for each group. Since the computational complexity of BD-based beamforming can be too high, a multicasting tailored zero-forcing (MTZF) beamforming technique is proposed to efficiently suppress the inter-group interference, and a novel design for the multiple RISs that makes up for the inevitable loss of MTZF beamforming is also described. Effective closed-form solutions for the loss-minimizing RIS operations are obtained with basic linear operations, making the proposed MTZF beamforming-based RIS design highly practical. Numerical results show that the BD-based approach has the ability to achieve a high sum-rate, but it is useful only when the base station deploys large antenna arrays. Even with a small number of antennas, the MTZF beamforming-based approach outperforms the other schemes in terms of sum-rate while requiring low computational complexity. The results also show that the proposed techniques can satisfy the minimum rate requirement for each group. </p>
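For context, the BD baseline's core step, placing each group's precoder in the null space of all other groups' channels, can be sketched as follows; this is the textbook BD construction, not the paper's MTZF design.

```python
import numpy as np

def bd_precoders(H_list):
    """Block-diagonalization (BD) precoders, the baseline considered above.

    H_list[k] is the (users_k x antennas) channel matrix of group k.
    Each group's precoder is drawn from the null space of the stacked
    channels of all *other* groups, so inter-group interference is zero.
    This requires enough BS antennas, which is exactly the drawback the
    MTZF scheme is designed to avoid.
    """
    precoders = []
    for k in range(len(H_list)):
        H_others = np.vstack([H for j, H in enumerate(H_list) if j != k])
        _, s, Vh = np.linalg.svd(H_others)
        rank = int(np.sum(s > 1e-10))
        null_basis = Vh[rank:].conj().T  # antennas x null-space dimension
        precoders.append(null_basis)
    return precoders
```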
back
<p>Algorithmic decisions in critical domains such as hiring, college admissions, and lending are often based on rankings. Because of the impact these decisions have on individuals, organizations, and population groups, there is a need to understand them: to know whether the decisions are abiding by the law, to help individuals improve their rankings, and to design better ranking procedures. </p> <p>In this paper, we present ShaRP (Shapley for Rankings and Preferences), a framework that explains the contributions of features to different aspects of a ranked outcome, and is based on Shapley values. Using ShaRP, we show that even when the scoring function used by an algorithmic ranker is known and linear, the weight of each feature does not correspond to its Shapley value contribution. The contributions instead depend on the feature distributions, and on the subtle local interactions between the scoring features. ShaRP builds on the Quantitative Input Influence framework, and can compute the contributions of features for multiple Quantities of Interest, including score, rank, pair-wise preference, and top-k. Because it relies on black-box access to the ranker, ShaRP can be used to explain both score-based and learned ranking models. We show results of an extensive experimental validation of ShaRP using real and synthetic datasets, showcasing its usefulness for qualitative analysis. </p>
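The underlying Shapley computation, exact but exponential in the number of features, can be written compactly; a ShaRP-style tool would rely on sampling approximations for realistic feature counts, so this brute-force version is illustrative only, and `value_fn` stands in for whichever quantity of interest (score, rank, top-k membership) is being explained.

```python
from itertools import combinations
from math import factorial

def shapley_values(features, value_fn):
    """Exact Shapley values by enumerating coalitions; feasible only for
    small feature sets. `value_fn` maps a frozenset of features to the
    quantity of interest for the item under study.
    """
    n = len(features)
    phi = {}
    for f in features:
        rest = [g for g in features if g != f]
        total = 0.0
        for r in range(n):
            for coalition in combinations(rest, r):
                S = frozenset(coalition)
                # Standard Shapley weight |S|! (n - |S| - 1)! / n!
                weight = factorial(len(S)) * factorial(n - len(S) - 1) / factorial(n)
                total += weight * (value_fn(S | {f}) - value_fn(S))
        phi[f] = total
    return phi
```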
back
<p>Large language models (LLMs) are increasingly relied upon for complex multi-turn conversations across diverse real-world applications. However, existing benchmarks predominantly focus on single-turn evaluations, overlooking the models' capabilities in multi-turn interactions. To address this gap, we introduce MT-Eval, a comprehensive benchmark designed to evaluate multi-turn conversational abilities. By analyzing human-LLM conversations, we categorize interaction patterns into four types: recollection, expansion, refinement, and follow-up. We construct multi-turn queries for each category either by augmenting existing datasets or by creating new examples with GPT-4 to avoid data leakage. To study the factors impacting multi-turn abilities, we create single-turn versions of the 1170 multi-turn queries and compare performance. Our evaluation of 11 well-known LLMs shows that while closed-source models generally surpass open-source ones, certain open-source models exceed GPT-3.5-Turbo in specific tasks. We observe significant performance degradation in multi-turn settings compared to single-turn settings in most models, which is not correlated with the models' fundamental capabilities. Moreover, we identify the distance to relevant content and susceptibility to error propagation as the key factors influencing multi-turn performance. MT-Eval is released publicly to encourage future research towards more robust conversational models. </p>
back
<p>Racism is an alarming phenomenon in our country as well as all over the world. Every day we come across racist comments in both our daily lives and our virtual lives. In virtual life (such as on social media), however, we have the means to work towards eradicating this racism. In this paper, we detect racist comments with NLP and deep learning techniques. We built a novel dataset in the Bengali language, annotated it, and conducted data label validation. After extensive application of deep learning methodologies, we achieved racist-text detection with an accuracy of 87.94% using an ensemble approach. We applied RNN and LSTM models using BERT embeddings; among these, the MCNN-LSTM model performed best. Lastly, an ensemble approach was used to combine all model results and increase overall performance. </p>
back
<p>We study communication over a Gaussian multiple-access channel (MAC) with two types of transmitters: digital transmitters hold a message from a discrete set that needs to be communicated to the receiver, while analog transmitters hold sequences of analog values, and some function of these distributed values (but not the values themselves) needs to be conveyed to the receiver. The digital messages are required to be decodable error-free at the receiver with high probability, while the recovered analog function values have to satisfy a fidelity criterion, such as an upper bound on the mean squared error (MSE) or a certain maximum error with a given confidence. For the case in which the computed function for the analog transmitters is a sum of values in [-1,1], we derive inner and outer bounds on the tradeoff between the digital and analog rates of communication under peak and average power constraints for the digital transmitters and a peak power constraint for the analog transmitters. We then extend the achievability part of our result to a larger class of functions that includes all linear functions, but also some non-linear ones. </p>
back
<p>This paper studies zero-shot anomaly classification (AC) and segmentation (AS) in industrial vision. We reveal that the abundant normal and abnormal cues implicit in unlabeled test images can be exploited for anomaly determination, which is ignored by prior methods. Our key observation is that for industrial product images, normal image patches can find a relatively large number of similar patches in other unlabeled images, while abnormal ones have only a few. We leverage this discriminative characteristic to design a novel zero-shot AC/AS method based on Mutual Scoring (MuSc) of the unlabeled images, which requires no training or prompts. Specifically, we perform Local Neighborhood Aggregation with Multiple Degrees (LNAMD) to obtain patch features capable of representing anomalies of varying sizes. We then propose the Mutual Scoring Mechanism (MSM), in which the unlabeled test images assign anomaly scores to each other. Furthermore, we present an optimization approach named Re-scoring with Constrained Image-level Neighborhood (RsCIN) for image-level anomaly classification, which suppresses the false positives caused by noise in normal images. The superior performance on the challenging MVTec AD and VisA datasets demonstrates the effectiveness of our approach. Compared with state-of-the-art zero-shot approaches, MuSc achieves a 21.1% PRO absolute gain (from 72.7% to 93.8%) on MVTec AD, a 19.4% pixel-AP gain, and a 14.7% pixel-AUROC gain on VisA. In addition, our zero-shot approach outperforms most few-shot approaches and is comparable to some one-class methods. Code is available at https://github.com/xrli-U/MuSc. </p>
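The core mutual-scoring intuition can be captured in a few lines; this toy version scores each image by its patches' distance to the nearest patches of other unlabeled images, while MuSc's LNAMD aggregation and RsCIN re-scoring are omitted.

```python
import numpy as np

def mutual_scores(patch_feats, k=5):
    """Toy version of the mutual-scoring idea: each image's patches are
    scored by their distance to the closest patches of *other* unlabeled
    images; normal patches find many near neighbors, anomalous ones few.

    patch_feats: list of (num_patches, dim) arrays, one per image.
    Returns one anomaly score per image (max over its patch scores).
    """
    scores = []
    for i, feats in enumerate(patch_feats):
        others = np.vstack([f for j, f in enumerate(patch_feats) if j != i])
        # pairwise distances: (num_patches_i, num_patches_others); fine
        # for a toy example, too memory-hungry for full-size datasets
        d = np.linalg.norm(feats[:, None, :] - others[None, :, :], axis=-1)
        knn = np.sort(d, axis=1)[:, :k]  # k closest cross-image patches
        patch_scores = knn.mean(axis=1)  # high score = few similar patches
        scores.append(patch_scores.max())
    return scores
```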
back
<p>Powered by the increasing predictive capabilities of machine learning algorithms, artificial intelligence (AI) systems have begun to be used to overrule human mistakes in many settings. We provide the first field evidence that this AI oversight carries psychological costs that can impact human decision-making. We investigate one of the highest-visibility settings in which AI oversight has occurred: the Hawk-Eye review of umpires in top tennis tournaments. We find that umpires lowered their overall mistake rate after the introduction of Hawk-Eye review, in line with rational inattention given the psychological costs of being overruled by AI. We also find that umpires increased the rate at which they called balls in, which produced a shift from making Type II errors (calling a ball out when it was in) to Type I errors (calling a ball in when it was out). We structurally estimate the psychological costs of being overruled by AI using a model of rationally inattentive umpires, and our results suggest that because of these costs, umpires cared twice as much about Type II errors under AI oversight. </p>
back
<p>Dynamical behaviors of complex interacting systems, including brain activities, financial price movements, and physical collective phenomena, are associated with underlying interactions between the systems' components. The problem of uncovering interaction relations in such systems from observable dynamics is called relational inference. In this study, we propose a Diffusion model for Relational Inference (DiffRI), inspired by a self-supervised method for probabilistic time series imputation. DiffRI learns to infer the probability of the presence of connections between components through conditional diffusion modeling. Experiments on both simulated and quasi-real datasets show that DiffRI is highly competitive compared with other state-of-the-art models in discovering ground-truth interactions in an unsupervised manner. Our code will be made public soon. </p>
back
<p>Executing deep neural networks (DNNs) on edge artificial intelligence (AI) devices enables various autonomous mobile computing applications. However, the memory budget of edge AI devices restricts the number and complexity of DNNs allowed in such applications. Existing solutions, such as model compression or cloud offloading, reduce the memory footprint of DNN inference at the cost of decreased model accuracy or autonomy. To avoid these drawbacks, we divide DNNs into blocks and swap them in and out in order, so that large DNNs can execute within a small memory budget. Nevertheless, naive swapping on edge AI devices induces significant delays due to the redundant memory operations in the DNN development ecosystem for edge AI devices. To this end, we develop SwapNet, an efficient DNN block swapping middleware for edge AI devices. We systematically eliminate the unnecessary memory operations during block swapping while remaining compatible with the deep learning frameworks, GPU backends, and hardware architectures of edge AI devices. We further showcase the utility of SwapNet via a multi-DNN scheduling scheme. Evaluations on eleven DNN inference tasks in three applications demonstrate that SwapNet achieves almost the same latency as the case with sufficient memory, even when DNNs demand 2.32x to 5.81x more memory than the available budget. The design of SwapNet also provides novel and feasible insights for deploying large language models (LLMs) on edge AI devices in the future. </p>
back
<p>We construct a system, Sandi, to bring trust to online communication between parties that share little or no context. Sandi is based on a unique "somewhat monotone" privacy-preserving reputation system with strong privacy and security properties. Registered senders request cryptographic tags from Sandi, which they attach to their messages. Message receivers do not need registered accounts, but they can use a sender's score to decide how much the sender should be trusted. If a receiver finds the message inappropriate, they can use the tag to report the sender to Sandi, thus decreasing the sender's score. The design of Sandi ensures compatibility with any communication system that allows for small binary data transmission. </p> <p>Sandi aims to benefit both senders and receivers. Senders benefit, as receivers are more likely to react to their messages with reputation scores attached. Receivers benefit, as they can make better choices about whom to interact with based on indisputable evidence from prior receivers. </p> <p>Sandi does not require senders or receivers to maintain long-term secret keys. We provide a score integrity guarantee for the senders, a full communication privacy guarantee for the senders and receivers, a report privacy guarantee to protect reporting receivers, and an unlinkability guarantee to protect senders. </p> <p>Finally, we provide a game-theoretic analysis for the sender. We prove that, for any score function satisfying a list of properties, Sandi drives rational senders towards a strategy that reduces the amount of inappropriate messages. </p>
back
<p>Weight quantization is an effective technique for compressing deep neural networks for deployment on edge devices with limited resources. Traditional loss-aware quantization methods commonly use the quantized gradient to replace the full-precision gradient. However, we discover that the gradient error leads to an unexpected zig-zagging issue in the gradient descent learning procedure, where the gradient directions rapidly oscillate, and this issue seriously slows down model convergence. Accordingly, this paper proposes a one-step-forward-and-backtrack approach to loss-aware quantization, which obtains a more accurate and stable gradient direction to mitigate this issue. During gradient descent learning, a one-step-forward search is designed to find the trial gradient of the next step, which is adopted to adjust the gradient of the current step towards the direction of fast convergence. After that, we backtrack to the current step and update the full-precision and quantized weights using both the current-step gradient and the trial gradient. A series of theoretical analyses and experiments on benchmark deep models demonstrate the effectiveness and competitiveness of the proposed method, which especially outperforms others in convergence performance. </p>
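Read literally, the update resembles an extragradient-style scheme: peek one step ahead, then combine the current and trial gradients. The sketch below takes simple averaging as the blend rule, which is our assumption; the paper's loss-aware quantized variant is more involved.

```python
import torch

def forward_backtrack_step(params, loss_fn, lr=0.01):
    """Extragradient-style sketch of the one-step-forward-and-backtrack
    idea. `params` is assumed to be a leaf tensor with requires_grad=True.
    The 0.5/0.5 blend of current and trial gradients is an illustrative
    assumption, not the paper's exact rule.
    """
    loss_fn(params).backward()
    g_now = params.grad.detach().clone()
    with torch.no_grad():
        # Look one step ahead along the current gradient direction.
        trial = (params - lr * g_now).requires_grad_(True)
    loss_fn(trial).backward()
    g_trial = trial.grad.detach()
    with torch.no_grad():
        # Backtrack: update the *current* point with the blended gradient.
        params -= lr * 0.5 * (g_now + g_trial)
    params.grad = None
```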
back
<p>Diffusion-based text-to-image personalization has achieved great success in generating user-specified subjects in various contexts. Even so, existing fine-tuning-based methods still suffer from model overfitting, which greatly harms generative diversity, especially when the given subject images are few. To this end, we propose Pick-and-Draw, a training-free semantic guidance approach to boost identity consistency and generative diversity for personalization methods. Our approach consists of two components: appearance picking guidance and layout drawing guidance. For the former, we construct an appearance palette with visual features from the reference image, from which we pick local patterns for generating the specified subject with consistent identity. For layout drawing, we outline the subject's contour by referring to a generative template from the vanilla diffusion model, and inherit its strong image prior to synthesize diverse contexts according to different text conditions. The proposed approach can be applied to any personalized diffusion model and requires as few as a single reference image. Qualitative and quantitative experiments show that Pick-and-Draw consistently improves identity consistency and generative diversity, pushing the trade-off between subject fidelity and image-text fidelity to a new Pareto frontier. </p>
back
<p>We present an overview of recent developments on the convergence analysis of numerical methods for inviscid multidimensional compressible flows that preserve underlying physical structures. We introduce the concept of generalized solutions, the so-called dissipative solutions, and explain their relationship to other commonly used solution concepts. In numerical experiments we apply K-convergence of numerical solutions and approximate turbulent solutions together with the Reynolds stress defect and the energy defect. </p>
back
<p>Witnessing the evolution of text-to-image diffusion models, significant strides have been made in text-to-3D generation. Currently, two primary paradigms dominate the field of text-to-3D: feed-forward generation solutions, capable of swiftly producing 3D assets but often yielding coarse results, and Score Distillation Sampling (SDS) based solutions, known for generating high-fidelity 3D assets albeit at a slower pace. The synergistic integration of these methods holds substantial promise for advancing 3D generation techniques. In this paper, we present BoostDream, a highly efficient plug-and-play 3D refining method designed to transform coarse 3D assets into high-quality ones. The BoostDream framework comprises three distinct processes: (1) we introduce 3D model distillation, which fits differentiable representations from the 3D assets obtained through feed-forward generation; (2) a novel multi-view SDS loss is designed, which utilizes a multi-view aware 2D diffusion model to refine the 3D assets; (3) we propose to use prompts and multi-view consistent normal maps as guidance in refinement. Our extensive experiments are conducted on different differentiable 3D representations, revealing that BoostDream excels in generating high-quality 3D assets rapidly and overcomes the Janus problem compared to conventional SDS-based methods. This breakthrough signifies a substantial advancement in both the efficiency and quality of 3D generation processes. </p>
back
<p>Large Language Models (LLMs) have become increasingly popular for their advanced text generation capabilities across various domains. However, like any software, they face security challenges, including the risk of 'jailbreak' attacks that manipulate LLMs to produce prohibited content. A particularly underexplored area is the Multilingual Jailbreak attack, where malicious questions are translated into various languages to evade safety filters. Currently, there is a lack of comprehensive empirical studies addressing this specific threat. </p> <p>To address this research gap, we conducted an extensive empirical study on Multilingual Jailbreak attacks. We developed a novel semantic-preserving algorithm to create a multilingual jailbreak dataset and conducted an exhaustive evaluation on both widely-used open-source and commercial LLMs, including GPT-4 and LLaMa. Additionally, we performed interpretability analysis to uncover patterns in Multilingual Jailbreak attacks and implemented a fine-tuning mitigation method. Our findings reveal that our mitigation strategy significantly enhances model defense, reducing the attack success rate by 96.2%. This study provides valuable insights into understanding and mitigating Multilingual Jailbreak attacks. </p>
back
<p>Deep Neural Network (DNN) models, when deployed as inference engines on executing devices, are susceptible to Fault Injection Attacks (FIAs) that manipulate model parameters and disrupt inference execution, with disastrous effects on performance. This work introduces Contrastive Learning (CL) of visual representations, i.e., a self-supervised learning approach, into the deep learning training and inference pipeline to implement DNN inference engines that are self-resilient under FIAs. Our proposed CL-based FIA Detection and Recovery (CFDR) framework features (i) real-time detection with only a single batch of testing data and (ii) fast recovery that is effective even with only a small amount of unlabeled testing data. Evaluated on the CIFAR-10 dataset with multiple types of FIAs, our CFDR shows promising detection and recovery effectiveness. </p>
back
<p>Molecular core structures and R-groups are essential concepts in drug development. Integrating these concepts with conventional graph pre-training approaches can promote a deeper understanding of molecules. We propose MolPLA, a novel pre-training framework that employs masked graph contrastive learning to understand the underlying decomposable parts in molecules that implicate their core structure and peripheral R-groups. Furthermore, we formulate an additional framework that grants MolPLA the ability to help chemists find replaceable R-groups in lead optimization scenarios. Experimental results on molecular property prediction show that MolPLA exhibits predictability comparable to current state-of-the-art models. Qualitative analysis indicates that MolPLA is capable of distinguishing core and R-group sub-structures, identifying decomposable regions in molecules, and contributing to lead optimization scenarios by rationally suggesting R-group replacements given various query core templates. The code implementation for MolPLA and its pre-trained model checkpoint is available at https://github.com/dmis-lab/MolPLA </p>
back
<p>Imitation learning is often used alongside reinforcement learning in environments where reward design is difficult or the reward is sparse, but it is difficult to imitate well in unknown states from a small amount of expert data and sampling data. Supervised learning methods such as Behavioral Cloning do not require sampling data, but usually suffer from distribution shift. Methods based on reinforcement learning, such as inverse reinforcement learning and Generative Adversarial Imitation Learning (GAIL), can learn from only a few expert demonstrations, but they typically need extensive interaction with the environment. Soft Q Imitation Learning (SQIL) addresses these problems; it was shown to learn efficiently by combining Behavioral Cloning and soft Q-learning with constant rewards. To make this algorithm more robust to distribution shift, we propose a more efficient and robust algorithm that adds to SQIL a reward function based on adversarial inverse reinforcement learning, which rewards the agent for taking actions in states similar to the demonstrations. We call this algorithm Discriminator Soft Q Imitation Learning (DSQIL) and evaluate it on MuJoCo environments. </p>
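SQIL's constant-reward rule, which DSQIL extends, is a one-liner; the adversarial (AIRL-style) reward term that DSQIL adds on top depends on a learned discriminator and is not reproduced here.

```python
def sqil_reward(transition_from_demo: bool) -> float:
    """SQIL's constant-reward rule: +1 for transitions drawn from the
    expert demonstrations, 0 for the agent's own transitions; soft
    Q-learning is then run on a replay buffer mixing both sources.
    DSQIL, as described above, additionally adds a discriminator-based
    reward for agent states resembling the demos (omitted here)."""
    return 1.0 if transition_from_demo else 0.0
```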
back
<p>Activity detection is an important task in next-generation grant-free multiple access. While a number of existing algorithms have been designed for this purpose, they mostly require precise information about the network, such as large-scale fading coefficients, small-scale fading channel statistics, noise variance at the access points, and user activity probability. Acquiring this information would incur significant overhead, and its estimated values might not be accurate. This problem is even more severe in cell-free networks, where there are many such parameters to be acquired. Therefore, this paper sets out to investigate the activity detection problem without the above-mentioned information. To handle so many unknown parameters, this paper employs the Bayesian approach, where the unknown variables are endowed with prior distributions that effectively act as regularizations. Together with the likelihood function, a maximum a posteriori (MAP) estimator and a variational inference algorithm are derived. Extensive simulations demonstrate that the proposed methods, even without knowledge of these system parameters, perform better than existing state-of-the-art methods, such as covariance-based and approximate message passing methods. </p>
back
<p>Sequential neural posterior estimation (SNPE) techniques have recently been proposed for dealing with simulation-based models with intractable likelihoods. They are devoted to learning the posterior from adaptively proposed simulations using neural network-based conditional density estimators. As an SNPE technique, the automatic posterior transformation (APT) method proposed by Greenberg et al. (2019) performs notably and scales to high-dimensional data. However, the APT method requires computing the expectation of the logarithm of an intractable normalizing constant, i.e., a nested expectation. Although atomic APT was proposed to solve this by discretizing the normalizing constant, it remains challenging to analyze the convergence of learning. In this paper, we propose a nested APT method to estimate the involved nested expectation instead, which facilitates establishing the convergence analysis. Since the nested estimators for the loss function and its gradient are biased, we make use of unbiased multi-level Monte Carlo (MLMC) estimators for debiasing. To further reduce the excessive variance of the unbiased estimators, this paper also develops some truncated MLMC estimators by taking into account the trade-off between bias and average cost. Numerical experiments on approximating complex multimodal posteriors in moderate dimensions are provided. </p>
back
<p>Due to the non-stationarity of time series, the distribution shift problem largely hinders the performance of time series forecasting. Existing solutions either fail for shifts beyond simple statistics or suffer from limited compatibility with forecasting models. In this paper, we propose a general decoupled formulation for time series forecasting, with no reliance on fixed statistics and no restriction on forecasting architectures. We then formalize this formulation as a bi-level optimization problem, enabling joint learning of the transformation (outer loop) and forecasting (inner loop). Moreover, the special requirements of expressiveness and bi-directionality for the transformation motivate us to propose instance normalization flows (IN-Flow), a novel invertible network for time series transformation. Extensive experiments demonstrate that our method consistently outperforms state-of-the-art baselines on both synthetic and real-world data. </p>
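<p>A schematic PyTorch sketch of an alternating approximation to the bi-level problem follows; a single linear layer stands in for the invertible IN-Flow transform, and all module choices and sizes are illustrative assumptions rather than the paper's architecture:</p>
<pre>
import torch
import torch.nn as nn

transform = nn.Linear(1, 1)                    # stand-in for the invertible transform
forecaster = nn.GRU(1, 16, batch_first=True)   # any forecasting backbone
head = nn.Linear(16, 1)

opt_inner = torch.optim.Adam(list(forecaster.parameters()) + list(head.parameters()), lr=1e-3)
opt_outer = torch.optim.Adam(transform.parameters(), lr=1e-4)

def forecast_loss(x, y):
    z = transform(x)          # outer-loop module: transform the raw series
    h, _ = forecaster(z)      # inner-loop modules: forecast in transformed space
    return nn.functional.mse_loss(head(h[:, -1]), y)

for step in range(100):
    x = torch.randn(32, 24, 1)                 # toy batch of 24-step windows
    y = x[:, -1] + 0.1 * torch.randn(32, 1)    # toy one-step-ahead target
    loss = forecast_loss(x, y)                 # inner update: fit forecaster
    opt_inner.zero_grad(); loss.backward(); opt_inner.step()
    loss = forecast_loss(x, y)                 # outer update: adapt transform
    opt_outer.zero_grad(); loss.backward(); opt_outer.step()
</pre>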
back
<p>In this paper, we present a signaling design for secure integrated sensing and communication (ISAC) systems comprising a dual-functional multi-input multi-output (MIMO) base station (BS) that simultaneously communicates with multiple users while detecting targets present in their vicinity, which are regarded as potential eavesdroppers. In particular, assuming that the distribution of each parameter to be estimated is known \textit{a priori}, we focus on optimizing the targets' sensing performance. To this end, we derive and minimize the Bayesian Cram\'er-Rao bound (BCRB), while ensuring certain communication quality of service (QoS) by exploiting constructive interference (CI). The latter scheme enforces that the received signals at the eavesdropping targets fall into the destructive region of the signal constellation, degrading their decoding probability and thus enhancing the ISAC system's physical-layer security (PLS) capability. To tackle the nonconvexity of the formulated problem, a tailored successive convex approximation method is proposed for its efficient solution. Our extensive numerical results verify the effectiveness of the proposed secure ISAC design, showing that the proposed algorithm outperforms block-level precoding techniques. </p>
back
<p>Complex field imaging, which captures both the amplitude and phase information of input optical fields or objects, can offer rich structural insights into samples, such as their absorption and refractive index distributions. However, conventional image sensors are intensity-based and inherently lack the capability to directly measure the phase distribution of a field. This limitation can be overcome using interferometric or holographic methods, often supplemented by iterative phase retrieval algorithms, leading to a considerable increase in hardware complexity and computational demand. Here, we present a complex field imager design that enables snapshot imaging of both the amplitude and quantitative phase information of input fields using an intensity-based sensor array without any digital processing. Our design utilizes successive deep learning-optimized diffractive surfaces that are structured to collectively modulate the input complex field, forming two independent imaging channels that perform amplitude-to-amplitude and phase-to-intensity transformations between the input and output planes within a compact optical design, axially spanning ~100 wavelengths. The intensity distributions of the output fields at these two channels on the sensor plane directly correspond to the amplitude and quantitative phase profiles of the input complex field, eliminating the need for any digital image reconstruction algorithms. We experimentally validated the efficacy of our complex field diffractive imager designs through 3D-printed prototypes operating at the terahertz spectrum, with the output amplitude and phase channel images closely aligning with our numerical simulations. We envision that this complex field imager will have various applications in security, biomedical imaging, sensing and material science, among others. </p>
back
<p>This paper provides a comprehensive review of the latest advancements in fetal motion correction in MRI. We delve into various contemporary methodologies and technological advancements aimed at overcoming the challenges posed by fetal motion. The review covers traditional 3D fetal MRI correction methods such as Slice-to-Volume Registration (SVR), deep learning-based techniques such as Convolutional Neural Networks (CNNs), Long Short-Term Memory (LSTM) networks, Transformers, Generative Adversarial Networks (GANs), and the most recent advancements in diffusion models. The insights derived from this literature review reflect a thorough understanding of both the technical intricacies and practical implications of fetal motion in MRI studies, offering a reasoned perspective on potential solutions and future improvements in this field. </p>
back
<p>Graph neural networks (GNNs) have achieved remarkable performance on graph-structured data. However, GNNs may inherit prejudice from the training data and make discriminatory predictions based on sensitive attributes, such as gender and race. Recently, there has been increasing interest in ensuring fairness on GNNs, but existing work assumes that the training and testing data follow the same distribution, i.e., that they come from the same graph. Will graph fairness performance decrease under distribution shifts? How do distribution shifts affect graph fairness learning? These open questions are largely unexplored from a theoretical perspective. To answer them, we first theoretically identify the factors that determine bias on a graph. Subsequently, we explore the factors influencing fairness on testing graphs, a noteworthy one being the representation distances of certain groups between the training and testing graphs. Motivated by our theoretical analysis, we propose our framework FatraGNN. Specifically, to guarantee fairness performance on unknown testing graphs, we propose a graph generator to produce numerous graphs with significant bias and from different distributions. We then minimize the representation distances for each group between the training graph and the generated graphs. This empowers our model to achieve high classification and fairness performance even on generated graphs with significant bias, thereby effectively handling unknown testing graphs. Experiments on real-world and semi-synthetic datasets demonstrate the effectiveness of our model in terms of both accuracy and fairness. </p>
back
<p>Support vector regression (SVR) has garnered significant popularity over the past two decades owing to its wide range of applications across various fields. Despite its versatility, SVR encounters challenges when confronted with outliers and noise, primarily due to the use of the $\varepsilon$-insensitive loss function. To address this limitation, SVR with bounded loss functions has emerged as an appealing alternative, offering enhanced generalization performance and robustness. Notably, recent developments focus on designing bounded loss functions with smooth characteristics, facilitating the adoption of gradient-based optimization algorithms. However, it is crucial to highlight that these bounded and smooth loss functions do not possess an insensitive zone. In this paper, we address the aforementioned constraints by introducing a novel symmetric loss function named the HawkEye loss function. It is worth noting that the HawkEye loss function stands out as the first loss function in the SVR literature to be bounded, smooth, and simultaneously possess an insensitive zone. Leveraging this breakthrough, we integrate the HawkEye loss function into the least squares framework of SVR, yielding a new fast and robust model termed HE-LSSVR. The optimization problem inherent to HE-LSSVR is addressed by harnessing the adaptive moment estimation (Adam) algorithm, known for its adaptive learning rate and efficacy in handling large-scale problems. To our knowledge, this is the first time Adam has been employed to solve an SVR problem. To empirically validate the proposed HE-LSSVR model, we evaluate it on UCI, synthetic, and time series datasets. The experimental outcomes unequivocally reveal the superiority of the HE-LSSVR model both in terms of its remarkable generalization performance and its efficiency in training time. </p>
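<p>As a sketch of the overall recipe, the snippet below fits a least-squares-style SVR with Adam using a loss that has the three advertised properties (bounded, smooth, and with an insensitive zone); the functional form here is an illustrative stand-in, not the actual HawkEye loss defined in the paper:</p>
<pre>
import torch

def bounded_insensitive_loss(residual, eps=0.1, sigma=0.5):
    """Illustrative loss: zero for |r| <= eps (insensitive zone),
    continuously differentiable, and bounded above by 1."""
    excess = torch.clamp(residual.abs() - eps, min=0.0)
    return 1.0 - torch.exp(-excess**2 / (2 * sigma**2))

torch.manual_seed(0)
X = torch.randn(200, 3)
y = X @ torch.tensor([1.0, -2.0, 0.5]) + 0.05 * torch.randn(200)
y[::25] += 5.0                          # inject outliers; bounded loss caps their effect

w = torch.zeros(3, requires_grad=True)
b = torch.zeros(1, requires_grad=True)
opt = torch.optim.Adam([w, b], lr=0.05)
for _ in range(500):
    loss = bounded_insensitive_loss(y - (X @ w + b)).mean() + 1e-3 * w.pow(2).sum()
    opt.zero_grad(); loss.backward(); opt.step()
</pre>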
back
<p>MediaWiki and Wikipedia authors usually use LaTeX to define mathematical formulas in the wiki text markup. In the Wikimedia ecosystem, these formulas were processed by a long cascade of web services and finally delivered to users' browsers in rendered form as SVG for visually readable representation. </p> <p>With the latest developments in support for MathML Core in Chromium-based browsers, MathML continues its path toward becoming a de facto standard markup language for mathematical notation on the web. Conveying formulas in MathML enables semantic annotation and machine readability for extended interpretation of mathematical content, for example by accessibility technologies. </p> <p>With this work, we present WikiTexVC, a novel method for validating LaTeX formulas from wiki texts and converting them to MathML, which is directly integrated into MediaWiki. This mitigates the shortcomings of previously used rendering methods in MediaWiki in terms of robustness, maintainability and performance. In addition, there is no need for a multitude of web services running in the background; processing takes place directly within MediaWiki instances. We validated this method with an extended dataset of over 300k formulas which have been incorporated as automated tests into the MediaWiki continuous integration instances. Furthermore, we conducted an evaluation with 423 formulas, comparing the tree edit distance of the produced parse trees to those of other MathML renderers. Our method has been made available as open source, is in use on the German Wikipedia, and is delivered with recent MediaWiki versions. As a practical example of enabling semantic annotations within our method, we present a new macro that adds disambiguation content to formulas to facilitate accessibility for visually impaired people. </p>
back
<p>Despite the utility of Large Language Models (LLMs) across a wide range of tasks and scenarios, developing a method for reliably evaluating LLMs across varied contexts continues to be challenging. Modern evaluation approaches often use LLMs to assess responses generated by LLMs. However, the meta-evaluation conducted to assess the effectiveness of these LLMs as evaluators is typically constrained by the coverage of existing benchmarks or requires extensive human annotation. This underscores the urgency of methods for scalable meta-evaluation that can effectively, reliably, and efficiently evaluate the performance of LLMs as evaluators across diverse tasks and scenarios, particularly in potentially new, user-defined scenarios. To fill this gap, we propose ScaleEval, an agent-debate-assisted meta-evaluation framework that leverages the capabilities of multiple communicative LLM agents. This framework supports multi-round discussions to assist human annotators in discerning the most capable LLMs as evaluators, which significantly eases their workload in cases that used to require large-scale annotations during meta-evaluation. We release the code for our framework, which is publicly available at: \url{https://github.com/GAIR-NLP/scaleeval}. </p>
back
<p>Training an effective Machine learning (ML) model is an iterative process that requires effort in multiple dimensions. Vertically, a single pipeline typically includes an initial ETL (Extract, Transform, Load) of raw datasets, a model training stage, and an evaluation stage where the practitioners obtain statistics of the model performance. Horizontally, many such pipelines may be required to find the best model within a search space of model configurations. Many practitioners resort to maintaining logs manually and writing simple glue code to automate the workflow. However, carrying out this process on the cloud is not a trivial task in terms of resource provisioning, data management, and bookkeeping of job histories to make sure the results are reproducible. We propose an end-to-end cloud-based machine learning platform, Accelerated Cloud for AI (ACAI), to help improve the productivity of ML practitioners. ACAI achieves this goal by enabling cloud-based storage of indexed, labeled, and searchable data, as well as automatic resource provisioning, job scheduling, and experiment tracking. Specifically, ACAI provides practitioners (1) a data lake for storing versioned datasets and their corresponding metadata, and (2) an execution engine for executing ML jobs on the cloud with automatic resource provisioning (auto-provision), logging and provenance tracking. To evaluate ACAI, we test the efficacy of our auto-provisioner on the MNIST handwritten digit classification task, and we study the usability of our system using experiments and interviews. We show that our auto-provisioner produces a 1.7x speed-up and 39% cost reduction, and our system reduces experiment time for ML scientists by 20% on typical ML use cases. </p>
back
<p>The Versal Adaptive Compute Acceleration Platform (ACAP) is a new architecture that combines AI Engines (AIEs) with reconfigurable fabric. This architecture offers significant acceleration potential for uniform recurrences in various domains, such as deep learning, high-performance computation, and signal processing. However, efficiently mapping these computations onto the Versal ACAP architecture while achieving high utilization of the AIEs poses a challenge. </p> <p>To address this issue, we propose a mapping scheme called \fname, which aims to accelerate uniform recurrences on the Versal ACAP architecture by leveraging the features of both the hardware and the computations. Considering the array architecture of the AIEs, our approach utilizes space-time transformations based on the polyhedral model to generate legally optimized systolic array mappings. Concurrently, we have developed a routing-aware PLIO assignment algorithm tailored for communication on the AIE array, which aims at successful compilation while maximizing array utilization. Furthermore, we introduce an automatic mapping framework designed to generate the corresponding executable code for uniform recurrences, encompassing the AIE kernel program, programmable logic bitstreams, and the host program. The experimental results validate the effectiveness of our mapping scheme. Specifically, when applying our scheme to matrix multiplication computations on the VCK5000 board, we achieve a throughput of 4.15 TOPS on the float data type, which is 1.11$\times$ higher than the state-of-the-art accelerator on the Versal ACAP architecture. </p>
back
<p>The development of feedback controllers is undergoing a paradigm shift from $\textit{modelic}$ (model-driven) control to $\textit{datatic}$ (data-driven) control. Stability, as a fundamental property in control, is less well studied in the datatic control paradigm. The difficulty is that traditional stability criteria rely on explicit system models, which are not available in systems with a datatic description. Some pioneering works explore stability criteria for datatic systems of special forms such as linear systems, homogeneous systems, and polynomial systems. However, these forms impose overly strong assumptions on the inherent connections among data points, which do not hold in general nonlinear systems. This paper proposes a stability verification algorithm for general datatic control systems called $\eta$-testing. Our stability criterion relies only on a weak assumption of Lipschitz continuity to extend information from known data points to unmeasured regions. This information restricts the time derivative of any unknown state to the intersection of a set of closed balls. Inside the intersection, the worst-case time derivative of the Lyapunov function is estimated by solving a quadratically constrained linear program (QCLP). By comparing the optimal values of the QCLPs to zero over the whole state space, a sufficient condition for system stability can be checked. We test our algorithm on three datatic control systems, including both linear and nonlinear ones. Results show that our algorithm successfully verifies the stability, instability, and critical stability of the tested systems. </p>
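<p>The QCLP step can be sketched as follows (assuming cvxpy; the data points, Lipschitz constant, and Lyapunov function are toy choices for illustration):</p>
<pre>
import numpy as np
import cvxpy as cp

def worst_case_vdot(x, grad_V, data_x, data_f, L):
    """Maximize grad_V(x)^T f over all f consistent with the data: Lipschitz
    continuity confines f(x) to the intersection of closed balls centered at
    each measured derivative f(x_i) with radius L * ||x - x_i||."""
    f = cp.Variable(len(x))
    constraints = [cp.norm(f - fi, 2) <= L * np.linalg.norm(x - xi)
                   for xi, fi in zip(data_x, data_f)]
    prob = cp.Problem(cp.Maximize(grad_V @ f), constraints)
    prob.solve()
    return prob.value  # <= 0 everywhere is a sufficient condition for stability

# Toy datatic system: samples from xdot = -x (unknown to the verifier)
data_x = [np.array([0.5, 0.0]), np.array([0.0, 0.5]), np.array([0.4, 0.4])]
data_f = [-xi for xi in data_x]
x = np.array([0.3, 0.3])
print(worst_case_vdot(x, grad_V=2 * x, data_x=data_x, data_f=data_f, L=1.0))
</pre>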
back
<p>We developed an artificial intelligence approach to predict the transfer fee of a football player. This model can help clubs make better decisions about which players to buy and sell, which can lead to improved performance and increased club budgets. We collected data on player performance, transfer fees, and other factors that might affect a player's value, and used it to train a machine learning model that accurately predicts a player's impact on the game. The model's output was then passed as one of the features to the transfer fee predictor. The resulting model can help clubs identify players who are undervalued and could be sold for a profit, and it can likewise help clubs avoid overpaying for players. We believe that our model can be a valuable tool for football clubs in player recruitment and transfer decisions. </p>
back
<p>Analyzing the health status of patients based on Electronic Health Records (EHR) is a fundamental research problem in medical informatics. The presence of extensive missing values in EHR makes it challenging for deep neural networks to directly model the patient's health status based on EHR. Existing deep learning training protocols require the use of statistical information or imputation models to reconstruct missing values; however, the protocols inject non-realistic data into downstream EHR analysis models, significantly limiting model performance. This paper introduces Learnable Prompt as Pseudo Imputation (PAI) as a new training protocol. PAI no longer introduces any imputed data but constructs a learnable prompt to model the implicit preferences of the downstream model for missing values, resulting in a significant performance improvement for all EHR analysis models. Additionally, our experiments show that PAI exhibits higher robustness in situations of data insufficiency and high missing rates. More importantly, in a real-world application involving cross-institutional data with zero-shot evaluation, PAI demonstrates stronger model generalization capabilities for non-overlapping features. </p>
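<p>A minimal PyTorch sketch of the core idea as we read it (module sizes and the downstream network are arbitrary choices): missing entries are filled by a learnable prompt vector trained end-to-end with the downstream model, rather than by an imputation model:</p>
<pre>
import torch
import torch.nn as nn

class PromptImputer(nn.Module):
    """Replaces missing EHR entries with entries of a learnable prompt vector."""
    def __init__(self, n_features):
        super().__init__()
        self.prompt = nn.Parameter(torch.zeros(n_features))

    def forward(self, x, mask):
        # mask: 1 where observed, 0 where missing
        return mask * x + (1 - mask) * self.prompt

imputer = PromptImputer(n_features=16)
downstream = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 1))
opt = torch.optim.Adam(list(imputer.parameters()) + list(downstream.parameters()), lr=1e-3)

x = torch.randn(8, 16)                       # toy EHR features
mask = (torch.rand(8, 16) > 0.4).float()     # ~40% missing
y = torch.rand(8, 1).round()                 # toy labels
loss = nn.functional.binary_cross_entropy_with_logits(downstream(imputer(x, mask)), y)
opt.zero_grad(); loss.backward(); opt.step() # the prompt learns with the model
</pre>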
back
<p>This paper presents a framework that integrates Large Language Models (LLMs) into translation validation, targeting LLVM compiler transformations where formal verification tools are insufficient. Our framework first utilizes existing formal verification frameworks for translation validation. In this work, we use Alive2, a well-known tool in LLVM compiler verification, as an example. When formal verification frameworks are unable to confirm a transformation's soundness, our framework employs fine-tuned LLMs for prediction. It then applies fuzzing to transformations predicted as potentially unsound by the LLMs due to return-value or memory inconsistencies, aiming to find counterexamples. In cases where transformations are sound, are unsound for other reasons, or no counterexamples emerge, the framework directly reports these outcomes without further fuzzing. This methodology has shown effectiveness in complex areas like deep-learning accelerator design, where traditional tools struggle. </p>
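<p>The framework's control flow can be sketched as follows; the three helper functions are hypothetical stubs standing in for Alive2, the fine-tuned LLM, and the fuzzer, not real APIs:</p>
<pre>
def run_alive2(src_ir, tgt_ir):
    return "inconclusive"        # stub: "sound" | "unsound" | "inconclusive"

def llm_predict(src_ir, tgt_ir):
    return "unsound", "memory"   # stub: prediction and suspected root cause

def fuzz_for_counterexample(src_ir, tgt_ir, reason):
    return None                  # stub: a counterexample input, or None

def validate_transformation(src_ir, tgt_ir):
    verdict = run_alive2(src_ir, tgt_ir)      # formal verification first
    if verdict != "inconclusive":
        return verdict
    prediction, reason = llm_predict(src_ir, tgt_ir)
    # fuzz only when the LLM suspects return-value or memory inconsistencies
    if prediction == "unsound" and reason in ("return_value", "memory"):
        if fuzz_for_counterexample(src_ir, tgt_ir, reason) is not None:
            return "unsound (counterexample found)"
    return f"reported: {prediction} ({reason})"  # other cases: report directly

print(validate_transformation("src.ll", "tgt.ll"))
</pre>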
back
<p>In this paper, we propose an online algorithm "mspace" for forecasting node features in temporal graphs, which adeptly captures spatial cross-correlation among different nodes as well as the temporal autocorrelation within a node. The algorithm can be used for both probabilistic and deterministic multi-step forecasting, making it applicable to estimation and generation tasks. Comparative evaluations against various baselines, including graph neural network (GNN) based models and classical Kalman filters, demonstrate that mspace performs on par with the state-of-the-art and even surpasses it on some datasets. Importantly, mspace demonstrates consistent robustness across datasets with varying training sizes, a notable advantage over GNN-based methods that require abundant training samples to learn spatiotemporal trends in the data effectively. Therefore, employing mspace is advantageous in scenarios where training sample availability is limited. Additionally, we establish theoretical bounds on the multi-step forecasting error of mspace and show that it scales as $O(q)$ for a $q$-step forecast. </p>
back
<p>Robotic arms are highly common in various automation processes such as manufacturing lines. However, these highly capable robots are usually relegated to simple repetitive tasks such as pick-and-place. On the other hand, designing an optimal robot for one specific task consumes substantial engineering time and cost. In this paper, we propose a novel concept for optimizing the fitness of a robotic arm to perform a specific task based on human demonstration. The fitness of a robot arm is a measure of its ability to follow recorded human arm and hand paths. The optimization is conducted using a modified variant of Particle Swarm Optimization adapted to the robot design problem. In the proposed approach, we generate an optimal robot design along with the required path to complete the task. The approach could reduce the time-to-market of robotic arms and enable the standardization of modular robotic parts. Novice users could easily apply a minimal robot arm to various tasks. Two test cases of common manufacturing tasks are presented, yielding optimal designs and reducing computational effort by up to 92%. </p>
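<p>For reference, here is a minimal Python sketch of standard Particle Swarm Optimization (not the paper's modified variant); in this setting the fitness function would score a candidate arm design by how closely the arm can follow the recorded human paths:</p>
<pre>
import numpy as np

def pso(fitness, dim, n_particles=30, iters=200, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize `fitness` over R^dim with a standard particle swarm."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(-1, 1, (n_particles, dim))   # candidate design parameters
    v = np.zeros_like(x)
    pbest = x.copy()
    pbest_val = np.apply_along_axis(fitness, 1, x)
    gbest = pbest[pbest_val.argmin()].copy()
    for _ in range(iters):
        r1, r2 = rng.random((2, n_particles, dim))
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
        x = x + v
        val = np.apply_along_axis(fitness, 1, x)
        better = val < pbest_val
        pbest[better], pbest_val[better] = x[better], val[better]
        gbest = pbest[pbest_val.argmin()].copy()
    return gbest, pbest_val.min()

best, score = pso(lambda p: np.sum((p - 0.5)**2), dim=4)  # toy fitness
</pre>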
back
<p>Tendon-based underactuated hands are intended to be simple, compliant and affordable. Often, they are 3D printed and do not include tactile sensors, so performing in-hand object recognition with direct touch sensing is not feasible. Adding tactile sensors can complicate the hardware and introduce extra costs to the robotic hand. Also, the common approach of visual perception may not be available due to occlusions. In this paper, we explore whether kinesthetic haptics can provide indirect information regarding the geometry of a grasped object during in-hand manipulation with an underactuated hand. By solely sensing actuator positions and torques over a period of time during motion, we show that a classifier can recognize an object from a set of trained ones with a success rate of almost 95%. Implementing a real-time majority vote during manipulation further improves recognition. Furthermore, a trained classifier is also shown to successfully distinguish between shape categories rather than just specific objects. </p>
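<p>The real-time majority vote can be realized with a sliding window over per-step classifier outputs; a minimal Python sketch (the window length is an arbitrary choice):</p>
<pre>
from collections import Counter, deque

class MajorityVote:
    """Smooths per-step object predictions during in-hand manipulation."""
    def __init__(self, window=50):
        self.window = deque(maxlen=window)

    def update(self, label):
        self.window.append(label)
        return Counter(self.window).most_common(1)[0][0]

vote = MajorityVote(window=5)
for step_label in ["cup", "ball", "cup", "cup", "box", "cup"]:
    current = vote.update(step_label)   # stabilizes on "cup"
</pre>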
back
<p>This article motivates, describes, and presents the PBSCSR dataset for studying composer style recognition of piano sheet music. Our overarching goal was to create a dataset for studying composer style recognition that is "as accessible as MNIST and as challenging as ImageNet." To achieve this goal, we sample fixed-length bootleg score fragments from piano sheet music images on IMSLP. The dataset itself contains 40,000 62x64 bootleg score images for a 9-way classification task, 100,000 62x64 bootleg score images for a 100-way classification task, and 29,310 unlabeled variable-length bootleg score images for pretraining. The labeled data is presented in a form that mirrors MNIST images, making it extremely easy to visualize and manipulate the data and to train models efficiently. Additionally, we include relevant metadata to allow access to the underlying raw sheet music images and other related data on IMSLP. We describe several research tasks that could be studied with the dataset, including variations of composer style recognition in a few-shot or zero-shot setting. For tasks with previously proposed models, we release code and baseline results for future works to compare against. We also discuss open research questions that the PBSCSR dataset is especially well suited to facilitate, as well as fruitful areas for exploration in future work. </p>
back
<p>In this paper, we distinguish two guessing algorithms for decoding binary linear codes. One is the guessing noise decoding (GND) algorithm, and the other is the guessing codeword decoding (GCD) algorithm. We prove that the GCD is a maximum likelihood (ML) decoding algorithm and that the GCD is more efficient than GND for most practical applications. We also introduce several variants of ordered statistic decoding (OSD) to trade off the complexity of the Gaussian elimination (GE) and that of the guessing, which may find applications in decoding short block codes in the high signal-to-noise ratio (SNR) region. </p>
back
<p>Large Language Models (LLMs), exemplified by ChatGPT, have significantly reshaped text generation, particularly in the realm of writing assistance. While ethical considerations underscore the importance of transparently acknowledging LLM use, especially in scientific communication, genuine acknowledgment remains infrequent. A potential avenue to encourage accurate acknowledgment of LLM-assisted writing involves employing automated detectors. Our evaluation of four cutting-edge LLM-generated text detectors reveals their suboptimal performance compared to a simple ad-hoc detector designed to identify abrupt writing style changes around the time of LLM proliferation. We contend that the development of specialized detectors exclusively dedicated to LLM-assisted writing detection is necessary. Such detectors could play a crucial role in fostering more authentic recognition of LLM involvement in scientific communication, addressing the current challenges in acknowledgment practices. </p>
back
<p>Modeling time series data remains a pervasive issue, as the temporal dimension is inherent to numerous domains. Despite significant strides in time series forecasting, high noise-to-signal ratios, non-normality, non-stationarity, and lack of data continue to challenge practitioners. In response, we leverage a simple representation augmentation technique to overcome these challenges. Our augmented representation acts as a statistical-space prior encoded at each time step, and we accordingly name our method Statistical-space Augmented Representation (SSAR). The underlying high-dimensional data-generating process inspires our representation augmentation. We rigorously examine the empirical generalization performance on two data sets with two downstream temporal learning algorithms. Our approach significantly outperforms all five up-to-date baselines. Moreover, the highly modular nature of our approach allows it to be easily applied to various settings. Lastly, theoretical perspectives are provided throughout the paper for a clear and rigorous understanding. </p>
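<p>A minimal sketch of what a statistical-space augmentation at each time step could look like; the concrete rolling statistics below are illustrative assumptions, as the abstract does not specify the exact feature set:</p>
<pre>
import numpy as np
import pandas as pd

def ssar_augment(series, window=16):
    """Append rolling statistics to each time step as a statistical-space prior."""
    s = pd.Series(series)
    r = s.rolling(window, min_periods=1)
    feats = pd.DataFrame({
        "x": s,
        "mean": r.mean(),
        "std": r.std().fillna(0.0),
        "skew": r.skew().fillna(0.0),
        "kurt": r.kurt().fillna(0.0),
    })
    return feats.to_numpy()   # shape (T, 5): raw value plus four statistics

augmented = ssar_augment(np.random.default_rng(0).standard_normal(128))
</pre>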
back
<p>To reconstruct a 3D human surface from a single image, it is important to consider human pose, shape and clothing details simultaneously. In recent years, a combination of parametric body models (such as SMPL), which capture body pose and shape priors, and neural implicit functions, which learn flexible clothing details, has been used to integrate the advantages of both approaches. However, the combined representation introduces additional computation, e.g. signed distance calculation, in 3D body feature extraction, which exacerbates the redundancy of the implicit query-and-infer process and fails to preserve the underlying body shape prior. To address these issues, we propose a novel IUVD-Feedback representation, which consists of an IUVD occupancy function and a feedback query algorithm. With this representation, the time-consuming signed distance calculation is replaced by a simple linear transformation in the IUVD space, leveraging the SMPL UV maps. Additionally, the redundant query points in the query-and-infer process are reduced through a feedback mechanism. This leads to more reasonable 3D body features and more effective query points, successfully preserving the parametric body prior. Moreover, the IUVD-Feedback representation can be embedded into any existing implicit human reconstruction pipeline without modifying the trained neural networks. Experiments on the THuman2.0 dataset demonstrate that the proposed IUVD-Feedback representation improves result robustness and achieves a threefold speed-up in the query-and-infer process. Furthermore, this representation has the potential to be used in generative applications by leveraging its inherited semantic information from the parametric body model. </p>
back
<p>The training datasets used in long-tailed recognition are extremely unbalanced, resulting in significant variation in per-class accuracy across categories. Prior works mostly used average accuracy to evaluate their algorithms, which easily ignores the worst-performing categories. In this paper, we aim to enhance the accuracy of the worst-performing categories and utilize the harmonic mean and geometric mean to assess the model's performance. We revive the balanced undersampling idea to achieve this goal. Because balanced subsets are few-shot and will surely under-fit, balanced undersampling is not used in modern long-tailed learning; however, we find that it produces a more equitable distribution of accuracy across categories, with much higher harmonic and geometric mean accuracy but lower average accuracy. Moreover, we devise a straightforward model ensemble strategy, which does not incur any additional overhead and achieves improved harmonic and geometric means while keeping the average accuracy almost intact when compared to state-of-the-art long-tailed learning methods. We validate the effectiveness of our approach on widely utilized benchmark datasets for long-tailed learning. Our code is at \href{https://github.com/yuhao318/BTM/}{https://github.com/yuhao318/BTM/}. </p>
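<p>The evaluation metrics are straightforward to compute; the toy example below shows how the harmonic and geometric means expose a weak tail class that average accuracy hides:</p>
<pre>
import numpy as np
from scipy.stats import gmean, hmean

def per_class_accuracy(y_true, y_pred, n_classes):
    return np.array([(y_pred[y_true == c] == c).mean() for c in range(n_classes)])

y_true = np.array([0] * 90 + [1] * 9 + [2] * 1)            # long-tailed test set
y_pred = np.array([0] * 90 + [0] * 5 + [1] * 4 + [2] * 1)  # tail class 1 suffers
acc = per_class_accuracy(y_true, y_pred, n_classes=3)      # [1.00, 0.44, 1.00]
print(acc.mean())              # average accuracy looks fine (~0.81)
print(hmean(acc), gmean(acc))  # means that punish the worst-performing class
</pre>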
back
<p>While subjective assessments have been the gold standard for evaluating speech generation, there is a growing need for objective metrics that are highly correlated with human subjective judgments due to their cost efficiency. This paper proposes reference-aware automatic evaluation methods for speech generation inspired by evaluation metrics in natural language processing. The proposed SpeechBERTScore computes the BERTScore for self-supervised dense speech features of the generated and reference speech, which can have different sequential lengths. We also propose SpeechBLEU and SpeechTokenDistance, which are computed on speech discrete tokens. The evaluations on synthesized speech show that our method correlates better with human subjective ratings than mel cepstral distortion and a recent mean opinion score prediction model. Also, they are effective in noisy speech evaluation and have cross-lingual applicability. </p>
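<p>A minimal numpy sketch of the BERTScore-style matching over two dense feature sequences of different lengths; the features would come from a self-supervised speech model, with random arrays used here as stand-ins:</p>
<pre>
import numpy as np

def speech_bertscore_f1(gen_feats, ref_feats):
    """Greedy soft matching between (T_gen, D) and (T_ref, D) feature sequences."""
    g = gen_feats / np.linalg.norm(gen_feats, axis=1, keepdims=True)
    r = ref_feats / np.linalg.norm(ref_feats, axis=1, keepdims=True)
    sim = g @ r.T                       # (T_gen, T_ref) cosine similarities
    precision = sim.max(axis=1).mean()  # each generated frame vs best reference frame
    recall = sim.max(axis=0).mean()     # each reference frame vs best generated frame
    return 2 * precision * recall / (precision + recall)

rng = np.random.default_rng(0)
score = speech_bertscore_f1(rng.standard_normal((120, 768)),
                            rng.standard_normal((95, 768)))
</pre>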
back
<p>We present H2O-Danube-1.8B, a 1.8B language model trained on 1T tokens following the core principles of LLama 2 and Mistral. We leverage and refine various techniques for pre-training large language models. Although our model is trained on significantly fewer total tokens compared to reference models of similar size, it exhibits highly competitive metrics across a multitude of benchmarks. We additionally release a chat model trained with supervised fine-tuning followed by direct preference optimization. We make H2O-Danube-1.8B openly available under Apache 2.0 license further democratizing LLMs to a wider audience economically. </p>
back
<p>Localizing linearly moving sound sources using microphone arrays is particularly challenging, as the transient nature of the signal leads to relatively short observation periods. Commonly, a moving focus is used, and most methods operate at least partially in the time domain. In contrast, here an inverse source localization algorithm for mono-frequent uniformly moving sources that acts entirely in the frequency domain is presented. For this, a 2.5D approach is utilized and a transfer function between the sources and a microphone grid is derived. By solving a least squares problem using the data at the microphone grid, the unknown source distribution in the moving frame can be determined. To this end, the measured time signals need to be transformed into the frequency domain using a windowed discrete Fourier transform (DFT), which leads to effects such as spectral leakage that depend on the length of the time interval and the analysis window used. To include these effects in the numerical model, the calculation of the transfer matrix is modified using the Fourier transform of the analysis window. Currently, this approach is limited to mono-frequent sources, as this allows a simplification of the calculation and reduces the computational effort. The least squares problem is solved using Tikhonov regularization, employing an L-curve approach to determine a suitable regularization parameter. As a moving source is considered, the Doppler effect makes it possible to enhance the stability of the system by combining the transfer functions for multiple frequencies in the measured signals. The performance of the approach is validated using simulated data of a moving point source with and without a reflecting ground. Numerical experiments are performed to show the effect of the choice of frequencies in the receiver spectrum, the effect of the DFT, the frequency of the source, and the distance between source and receiver. </p>
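<p>A compact numpy sketch of the Tikhonov-regularized least squares solve with a simple L-curve corner selection; the maximum-curvature proxy below is one common heuristic, not necessarily the exact criterion used in the paper:</p>
<pre>
import numpy as np

def tikhonov_lcurve(A, b, lambdas):
    """Solve min ||A x - b||^2 + lam^2 ||x||^2 over a grid of lam and pick
    the L-curve corner via discrete curvature of the log-log curve."""
    n = A.shape[1]
    sols, res_norms, sol_norms = [], [], []
    for lam in lambdas:
        x = np.linalg.solve(A.conj().T @ A + lam**2 * np.eye(n), A.conj().T @ b)
        sols.append(x)
        res_norms.append(np.linalg.norm(A @ x - b))
        sol_norms.append(np.linalg.norm(x))
    rho, eta = np.log(res_norms), np.log(sol_norms)
    d1r, d1e = np.gradient(rho), np.gradient(eta)
    d2r, d2e = np.gradient(d1r), np.gradient(d1e)
    kappa = (d1r * d2e - d2r * d1e) / (d1r**2 + d1e**2)**1.5
    best = int(np.argmax(kappa))
    return sols[best], lambdas[best]

A = np.random.default_rng(1).standard_normal((60, 40))   # stand-in transfer matrix
b = A @ np.ones(40) + 0.01 * np.random.default_rng(2).standard_normal(60)
x_reg, lam = tikhonov_lcurve(A, b, np.logspace(-6, 1, 30))
</pre>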
back
<p>Large Language Models (LLMs) have been widely deployed for their remarkable capability to generate texts resembling human language. However, they could be misused by criminals to create deceptive content, such as fake news and phishing emails, which raises ethical concerns. Watermarking is a key technique to mitigate the misuse of LLMs, which embeds a watermark (e.g., a bit string) into a text generated by an LLM. Consequently, this enables the detection of texts generated by an LLM as well as the tracing of generated texts to a specific user. The major limitation of existing watermark techniques is that they cannot accurately or efficiently extract the watermark from a text, especially when the watermark is a long bit string. This key limitation impedes their deployment for real-world applications, e.g., tracing generated texts to a specific user. </p> <p>This work introduces a novel watermarking method for LLM-generated text grounded in \textbf{error-correction codes} to address this challenge. We provide strong theoretical analysis, demonstrating that under bounded adversarial word/token edits (insertion, deletion, and substitution), our method can correctly extract watermarks, offering a provable robustness guarantee. This breakthrough is also evidenced by our extensive experimental results. The experiments show that our method substantially outperforms existing baselines in both accuracy and robustness on benchmark datasets. For instance, when embedding a bit string of length 12 into a 200-token generated text, our approach attains an impressive match rate of $98.4\%$, surpassing the performance of Yoo et al. (state-of-the-art baseline) at $85.6\%$. When subjected to a copy-paste attack involving the injection of 50 tokens into generated texts with 200 words, our method maintains a substantial match rate of $90.8\%$, while the match rate of Yoo et al. diminishes to below $65\%$. </p>
back
<p>Multi-modal large language models (MLLMs) have demonstrated remarkable success in vision and visual-language tasks within the natural image domain. However, the significant differences between natural images and remote sensing (RS) images hinder the development of MLLMs in the RS domain. Currently, a unified and powerful MLLM capable of various RS visual tasks remains under-explored. To fill the gap, a pioneering MLLM called EarthGPT is proposed for universal RS image comprehension, which integrates various multi-sensor RS interpretation tasks uniformly. More importantly, a large-scale multi-sensor multi-modal RS instruction-following dataset named MMRS is carefully constructed, which comprises 1005.842k image-text pairs based on 34 existing diverse RS datasets and includes multi-sensor images such as optical, synthetic aperture radar (SAR), and infrared. MMRS addresses the issue of MLLMs lacking RS expert knowledge and stimulates the development of MLLMs in the RS domain. Extensive experiments demonstrate EarthGPT's superior performance in various RS visual interpretation tasks compared with other specialist models and MLLMs, which proves the effectiveness of the proposed EarthGPT and provides a versatile paradigm for open-set reasoning tasks. </p>
back
<p>With the growth of coal production, the load on the production capacity of coal enterprises also increases, which leads to a concomitant increase in dust formation in both opencast and underground mining of coal deposits. Dust generated during drilling, blasting, excavation, loading, crushing and transportation of mined rock is one of the factors that negatively affects the health of mining workers and the level of environmental pollution with solid particles. Thus, increasing the efficiency of controlling the concentration of solid particles in the mine atmosphere and dust deposits is an urgent scientific and technical task. The use of modern digital technologies within the framework of the Industry 4.0 concept makes it possible to develop approaches that can significantly improve the quality of monitoring the state of the mine atmosphere at coal mining enterprises. This article provides a theoretical basis and test results for a system for continuous automatic monitoring of dust concentration in a mine atmosphere as a component of a multifunctional coal mine safety system. It is shown that the aerological safety of mine workings can be monitored in real time by a new-generation system using artificial intelligence. The ability of the proposed system to measure the basic physical parameters affecting dust deposition (disperse composition, air humidity, dust concentration and air flow velocity) is noted. </p>
back
<p>In current virtual try-on tasks, only the effect of clothing worn on a person is depicted. In practical applications, users still need to select suitable clothing from a vast array of individual clothing items, but existing clothes may not meet their needs. Additionally, some user groups may be uncertain about what clothing combinations suit them and require clothing selection recommendations. However, retrieval-based recommendation methods cannot meet users' personalized needs, so we propose the Generative Fashion Matching-aware Virtual Try-on Framework (GMVT). We generate coordinated and stylistically diverse clothing for users using the Generative Matching Module. To effectively learn matching information, we leverage a large-scale matching dataset and transfer the acquired knowledge to the current virtual try-on domain. Furthermore, we utilize the Virtual Try-on Module to visualize the generated clothing on the user's body. To validate the effectiveness of our approach, we enlisted the expertise of fashion designers for a professional evaluation, assessing the rationality and diversity of the clothing combinations and conducting an evaluation matrix analysis. Our method significantly enhances the practicality of virtual try-on, offering users a wider range of clothing choices and an improved user experience. </p>
back
<p>This work focuses on distributed linear precoding when users transmit correlated information over a fading Multiple-Input and Multiple-Output Multiple Access Channel. Precoders are optimized in order to minimize the sum-Mean Square Error (MSE) between the source and the estimated symbols. When sources are correlated, minimizing the sum-MSE results in a non-convex optimization problem. Precoders for an arbitrary number of users and transmit and receive antennas are thus obtained via a projected steepest-descent algorithm and a low-complexity heuristic approach. For the more restrictive case of two single-antenna users, a closed-form expression for the minimum sum-MSE precoders is derived. Moreover, for the scenario with a single receive antenna and any number of users, a solution is obtained by means of a semidefinite relaxation. Finally, we also consider precoding schemes where the precoders are decomposed into complex scalars and unit norm vectors. Simulation results show a significant improvement when source correlation is exploited at precoding, especially for low SNRs and when the number of receive antennas is lower than the number of transmitting nodes. </p>
back
<p>Fluidic logic circuitry analogous to its electric counterpart could potentially provide soft robots with machine intelligence due to its supreme adaptability, dexterity, and seamless compatibility with state-of-the-art additive manufacturing processes. However, conventional microfluidic channel based circuitry suffers from limited driving force, while macroscopic pneumatic logic lacks timely responsivity and desirable accuracy. Producing heavy-duty, highly responsive and integrated fluidic soft robotic circuitry for control and actuation purposes in biomedical applications has yet to be accomplished in a hydraulic manner. Here, we present a 3D printed hydraulic fluidic half-adder system, comprising three basic hydraulic fluidic logic building blocks: AND, OR, and NOT gates. Furthermore, a hydraulic soft robotic half-adder system is implemented using an XOR operation and a modified dual NOT gate system based on an electrical oscillator structure. This half-adder system possesses binary arithmetic capability, a key component of the arithmetic logic unit in modern computers. With slight modifications, it can realize control over three different directions of deformation of a three degree-of-freedom soft actuation mechanism solely by changing the states of the two fluidic inputs. This hydraulic fluidic system, which utilizes a small number of inputs to control multiple distinct outputs and can alter the internal state of the circuit solely based on external inputs, holds significant promise for the development of microfluidics, fluidic logic, and intricate internal systems of untethered soft robots with machine intelligence. </p>
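<p>The binary arithmetic realized by the circuit is that of a standard half-adder: sum = XOR of the inputs, carry = AND. The sketch below uses software stand-ins for the three gate primitives and one generic composition of XOR from them; the paper's hydraulic XOR implementation differs:</p>
<pre>
def AND(a, b): return a and b   # stand-in for the hydraulic AND gate
def OR(a, b):  return a or b    # stand-in for the hydraulic OR gate
def NOT(a):    return not a     # stand-in for the hydraulic NOT gate

def XOR(a, b):                  # one generic composition from the primitives
    return AND(OR(a, b), NOT(AND(a, b)))

def half_adder(a, b):
    """Two fluidic inputs -> (sum, carry)."""
    return XOR(a, b), AND(a, b)

for a in (False, True):
    for b in (False, True):
        s, c = half_adder(a, b)
        print(int(a), int(b), "->", int(s), int(c))
</pre>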
back
<p>This paper presents LatentPatch, a new method for generating realistic images from a small dataset of only a few images. We use a lightweight model with only a few thousand parameters. Unlike traditional few-shot generation methods that finetune pre-trained large-scale generative models, our approach is computed directly on the latent distribution by sequential feature matching, and is explainable by design. Avoiding large models based on transformers, recursive networks, or self-attention, which are not suitable for small datasets, our method is inspired by non-parametric texture synthesis and style transfer models, and ensures that generated image features are sampled from the source distribution. We extend previous single-image models to work with a few images and demonstrate that our method can generate realistic images, as well as enable conditional sampling and image editing. We conduct experiments on face datasets and show that our simplistic model is effective and versatile. </p>
back
<p>A maximal planar graph is a graph which can be embedded in the plane such that every face of the graph is a triangle. The center of a graph is the subgraph induced by the vertices of minimum eccentricity. We introduce the notion of quasi-eccentric vertices, and use this to characterize maximal planar graphs that are the center of some planar graph. We also present some easier-to-check conditions that are only necessary or only sufficient for planar and maximal planar graphs to be the center of a planar graph. Finally, we use the aforementioned characterization to prove that all maximal planar graphs of order at most 8 are the center of some planar graph -- and this bound is sharp. </p>
back
<p>Knowledge Tracing (KT) aims to predict the future performance of students by tracking the development of their knowledge states. Despite all the recent progress made in this field, the application of KT models in education systems is still restricted from a data perspective: 1) limited access to real-life data due to data protection concerns, 2) lack of diversity in public datasets, and 3) noise in benchmark datasets, such as duplicate records. To resolve these problems, we simulated student data with three statistical strategies based on public datasets and tested their performance on two KT baselines. While we observe only minor performance improvement with additional synthetic data, our work shows that using only synthetic data for training can lead to similar performance as real data. </p>
back
<p>Polar codes were originally specified for codelengths that are powers of two. In many applications, it is desired to have a code that is not restricted to such lengths. Two common strategies of modifying the length of a code are shortening and puncturing. Simple and explicit schemes for shortening and puncturing were introduced by Wang and Liu, and by Niu, Chen, and Lin, respectively. In this paper, we prove that both schemes yield polar codes that are capacity achieving. Moreover, the probability of error for both the shortened and the punctured polar codes decreases to zero at the same exponential rate as seminal polar codes. These claims hold for \emph{all} codelengths large enough. </p>
back
<p>We consider a non-conservative nonlinear Schr\"odinger equation (NCNLS) with time-dependent coefficients, inspired by a water waves problem. This problem does not have mass or energy conservation; instead, mass and energy change in time under explicit balance laws. In this paper we extend to the particular NCNLS two numerical schemes which are known to conserve energy and mass at the discrete level for the cubic NLS. Both schemes are second order accurate in time, and we prove that their extensions satisfy discrete versions of the mass and energy balance laws for the NCNLS. The first scheme is a relaxation scheme that is linearly implicit. The other scheme is a modified Delfour-Fortin-Payre scheme, and it is fully implicit. Numerical results show that both schemes robustly capture the correct values of mass and energy, even in strongly non-conservative problems. We finally compare the two numerical schemes and discuss their performance. </p>
back
<p>Nonnegative Matrix Factorization (NMF) is an important unsupervised learning method for extracting meaningful features from data. To address the NMF problem within a polynomial time framework, researchers have introduced a separability assumption, which has recently evolved into the concept of coseparability. This advancement offers a more efficient core representation of the original data. However, real-world data is often more naturally represented as a multi-dimensional array, such as images or videos. Applying NMF to high-dimensional data involves vectorization, which risks losing essential multi-dimensional correlations. To retain these inherent correlations in the data, we turn to tensors (multidimensional arrays) and leverage the tensor t-product. This approach extends coseparable NMF to the tensor setting, creating what we term coseparable Nonnegative Tensor Factorization (NTF). In this work, we provide an alternating index selection method to select the coseparable core. Furthermore, we validate the t-CUR sampling theory and integrate it with the tensor Discrete Empirical Interpolation Method (t-DEIM) to introduce an alternative, randomized index selection process. These methods have been tested on both synthetic and facial analysis datasets. The results demonstrate the efficiency of coseparable NTF when compared to coseparable NMF. </p>
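<p>For readers unfamiliar with the tensor t-product, here is a compact numpy implementation of this standard construction (circular convolution along the third mode, computed facewise in the FFT domain):</p>
<pre>
import numpy as np

def t_product(A, B):
    """t-product of A (n1 x n2 x n3) and B (n2 x n4 x n3) -> (n1 x n4 x n3)."""
    n3 = A.shape[2]
    Af = np.fft.fft(A, axis=2)
    Bf = np.fft.fft(B, axis=2)
    Cf = np.empty((A.shape[0], B.shape[1], n3), dtype=complex)
    for k in range(n3):
        Cf[:, :, k] = Af[:, :, k] @ Bf[:, :, k]   # ordinary matrix product per face
    return np.real(np.fft.ifft(Cf, axis=2))

A = np.random.rand(4, 3, 5)
B = np.random.rand(3, 2, 5)
C = t_product(A, B)   # shape (4, 2, 5)
</pre>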
back
<p>The verification of liveness conditions is an important aspect of state-based rigorous methods. This article investigates this problem in a fragment $\square$LTL of the logic LTL(EB), the integration of the UNTIL-fragment of Pnueli's linear time temporal logic (LTL) and the logic of Event-B, in which the most commonly used liveness conditions can be expressed. For this fragment a sound set of derivation rules is developed, which is also complete under mild restrictions for Event-B machines. </p>
back
<p>We present a novel software feature for the BrainScaleS-2 accelerated neuromorphic platform that facilitates the emulation of partitioned large-scale spiking neural networks. This approach is well suited for many deep spiking neural networks, where the constraint of the largest recurrent subnetwork fitting on the substrate or the limited fan-in of neurons is often not a limitation in practice. We demonstrate the training of two deep spiking neural network models, using the MNIST and EuroSAT datasets, that exceed the physical size constraints of a single-chip BrainScaleS-2 system. The ability to emulate and train networks larger than the substrate provides a pathway for accurate performance evaluation in planned or scaled systems, ultimately advancing the development and understanding of large-scale models and neuromorphic computing architectures. </p>
back
<p>Traditional neuromorphic hardware architectures rely on event-driven computation, where the asynchronous transmission of events, such as spikes, triggers local computations within synapses and neurons. While machine learning frameworks are commonly used for gradient-based training, their emphasis on dense data structures poses challenges for processing asynchronous data such as spike trains. This problem is particularly pronounced for typical tensor data structures. In this context, we present a novel library (jaxsnn) built on top of JAX, that departs from conventional machine learning frameworks by providing flexibility in the data structures used and the handling of time, while maintaining Autograd functionality and composability. Our library facilitates the simulation of spiking neural networks and gradient estimation, with a focus on compatibility with time-continuous neuromorphic backends, such as the BrainScaleS-2 system, during the forward pass. This approach opens avenues for more efficient and flexible training of spiking neural networks, bridging the gap between traditional neuromorphic architectures and contemporary machine learning frameworks. </p>
back
<p>Cybersecurity remains a critical challenge in the digital age, with network traffic flow anomaly detection being a pivotal instrument in the fight against cyber threats. In this study, we address the prevalent issue of data integrity in network traffic datasets, which are instrumental in developing machine learning (ML) models for anomaly detection. We introduce two refined versions of the CICIDS-2017 dataset, NFS-2023-nTE and NFS-2023-TE, processed using NFStream to ensure methodologically sound flow expiration and labeling. Our research contrasts the performance of the Random Forest (RF) algorithm across the original CICIDS-2017, its refined counterparts WTMC-2021 and CRiSIS-2022, and our NFStream-generated datasets, in both binary and multi-class classification contexts. We observe that the RF model exhibits exceptional robustness, achieving consistent high-performance metrics irrespective of the underlying dataset quality, which prompts a critical discussion on the actual impact of data integrity on ML efficacy. Our study underscores the importance of continual refinement and methodological rigor in dataset generation for network security research. As the landscape of network threats evolves, so must the tools and techniques used to detect and analyze them. </p>
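<p>A minimal scikit-learn sketch of the kind of RF experiment described; the file name and column names are hypothetical, as the released datasets define their own schema:</p>
<pre>
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

df = pd.read_csv("NFS-2023-TE.csv")   # hypothetical file name
X = df.drop(columns=["label"])
y = df["label"]                        # binary or multi-class labels

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3,
                                          stratify=y, random_state=42)
rf = RandomForestClassifier(n_estimators=100, n_jobs=-1, random_state=42)
rf.fit(X_tr, y_tr)
print(classification_report(y_te, rf.predict(X_te)))
</pre>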
back
<p>Congestion pricing, while adopted by many cities to alleviate traffic congestion, raises concerns about widening socioeconomic disparities due to its disproportionate impact on low-income travelers. In this study, we address this concern by proposing a new class of congestion pricing schemes that not only minimize congestion levels but also incorporate an equity objective to reduce cost disparities among travelers with different willingness-to-pay. Our analysis builds on a congestion game model with heterogeneous traveler populations. We present four pricing schemes that account for practical considerations, such as the ability to charge differentiated tolls to various traveler populations and the option to toll all or only a subset of edges in the network. We evaluate our pricing schemes in the calibrated freeway network of the San Francisco Bay Area. We demonstrate that the proposed congestion pricing schemes improve both efficiency (in terms of reduced average travel time) and equity (the disparities of travel costs experienced by different populations) compared to the current pricing scheme. Moreover, our pricing schemes also generate a total revenue comparable to the current pricing scheme. Our results further show that pricing schemes charging differentiated prices to traveler populations with varying willingness-to-pay lead to a more equitable distribution of travel costs compared to those that charge a homogeneous price to all. </p>
back
<p>Newspapers are important sources for historians interested in past societies' cultural values, social structures, and their changes. Since the 19th century, newspapers have been widely available and spread regionally. Today, historical newspapers are digitized but unavailable in a separate metadata-enhanced form. Machine-readable metadata, however, is a prerequisite for a mass statistical analysis of this source. This paper focuses on parsing the complex layout of historic newspaper pages, which today's machines do not understand well. We argue for using neural networks, which require detailed annotated data in large numbers. Our Bonn newspaper dataset consists of 486 pages of the \textit{K\"olnische Zeitung} from the years 1866 and 1924. We propose solving the newspaper-understanding problem by training a U-Net on our new dataset, which delivers satisfactory performance. </p>
back
<p>A robust multichannel speaker diarization and separation system is proposed by exploiting the spatio-temporal activity of the speakers. The system is realized in a hybrid architecture that combines the array signal processing units and the deep learning units. For speaker diarization, a spatial coherence matrix across time frames is computed based on the whitened relative transfer functions (wRTFs) of the microphone array. This serves as a robust feature for subsequent machine learning without the need for prior knowledge of the array configuration. A computationally efficient Spatial Activity-driven Speaker Diarization network (SASDnet) is constructed to estimate speaker activity directly from the spatial coherence matrix. For speaker separation, we propose the Global and Local Activity-driven Speaker Extraction network (GLASEnet) to separate speaker signals via speaker-specific global and local spatial activity functions. The local spatial activity functions depend on the coherence between the wRTFs of each time-frequency bin and the target speaker-dominant bins. The global spatial activity functions are computed from the global spatial coherence functions based on frequency-averaged local spatial activity functions. Experimental results have demonstrated superior speaker diarization, counting, and separation performance achieved by the proposed system with low computational complexity compared to the pre-selected baselines. </p>
back
<p>This paper presents a new approach that integrates deep learning with computational chess, using both the Mixture of Experts (MoE) method and Monte-Carlo Tree Search (MCTS). Our methodology employs a suite of specialized models, each designed to respond to specific changes in the game's input data. This results in a framework with sparsely activated models, which provides significant computational benefits. Our framework combines the MoE method with MCTS, in order to align it with the strategic phases of chess, thus departing from the conventional ``one-for-all'' model. Instead, we utilize distinct game phase definitions to effectively distribute computational tasks across multiple expert neural networks. Our empirical research shows a substantial improvement in playing strength, surpassing the traditional single-model framework. This validates the efficacy of our integrated approach and highlights the potential of incorporating expert knowledge and strategic principles into neural network design. The fusion of MoE and MCTS offers a promising avenue for advancing machine learning architectures. </p>
back
<p>The design of zero-delay Joint Source-Channel Coding (JSCC) schemes for the transmission of correlated information over fading Multiple Access Channels (MACs) is an interesting problem for many communication scenarios like Wireless Sensor Networks (WSNs). Among the different JSCC schemes so far proposed for this scenario, Distributed Quantizer Linear Coding (DQLC) represents an appealing solution since it is able to outperform uncoded transmissions for any correlation level at high Signal-to-Noise Ratios (SNRs) with a low computational cost. In this paper, we extend the design of DQLC-based schemes for fading MACs considering sphere decoding to make the optimal Minimum Mean Squared Error (MMSE) estimation computationally affordable for an arbitrary number of transmit users. The use of sphere decoding also allows us to formulate a practical algorithm for the optimization of DQLC-based systems. Finally, non-linear Kalman Filtering for the DQLC is considered to jointly exploit the temporal and spatial correlation of the source symbols. The results of computer experiments show that the proposed DQLC scheme with the Kalman Filter decoding approach clearly outperforms uncoded transmissions for medium and high SNRs. </p>
back
<p>This paper presents a novel solution concept, called BAR Nash Equilibrium (BARNE), and applies it to analyse the Verifier's dilemma, a fundamental problem in blockchain. Our solution concept adapts the Nash equilibrium (NE) to accommodate interactions among Byzantine, altruistic and rational agents, a setting known in the literature as BAR. We prove the existence of BARNE in a large class of games and introduce two natural refinements, global and local stability. Using this equilibrium and its refinements, we analyse the free-rider problem in the context of Byzantine consensus. We demonstrate that by incorporating fines and forced errors into a standard quorum-based blockchain protocol, we can effectively reestablish honest behavior as a globally stable BARNE. </p>
back
<p>Wasserstein distortion is a one-parameter family of distortion measures that was recently proposed to unify fidelity and realism constraints. After establishing continuity results for Wasserstein distortion in the extreme cases of pure fidelity and pure realism, we prove the first coding theorems for compression under Wasserstein distortion, focusing on the regime in which both the rate and the distortion are small. </p>
back
<p>Current image manipulation primarily centers on static manipulation, such as replacing specific regions within an image or altering its overall style. In this paper, we introduce an innovative dynamic manipulation task, subject repositioning. This task involves relocating a user-specified subject to a desired position while preserving the image's fidelity. Our research reveals that the fundamental sub-tasks of subject repositioning, which include filling the void left by the repositioned subject, reconstructing obscured portions of the subject, and blending the subject to be consistent with surrounding areas, can be effectively reformulated as a unified, prompt-guided inpainting task. Consequently, we can employ a single diffusion generative model to address these sub-tasks using various task prompts learned through our proposed task inversion technique. Additionally, we integrate pre-processing and post-processing techniques to further enhance the quality of subject repositioning. These elements together form our SEgment-gEnerate-and-bLEnd (SEELE) framework. To assess SEELE's effectiveness in subject repositioning, we assemble a real-world subject repositioning dataset called ReS. Our results on ReS demonstrate the quality of repositioned image generation. </p>
back
<p>Recently, low-resource dialogue state tracking (DST) has received increasing attention. Approaches that first obtain state values and then generate slot types based on those values have made great progress on this task. However, obtaining state values remains an under-studied problem. Existing extraction-based approaches cannot capture values that require an understanding of context and do not generalize well either. To address these issues, we propose a novel State VAlue Generation based framework (SVAG), decomposing DST into state value generation and domain slot generation. Specifically, we propose to generate state values and use self-training to further improve state value generation. Moreover, we design an estimator aiming at detecting incomplete generation and incorrect generation for pseudo-labeled data selection during self-training. Experimental results on the MultiWOZ 2.1 dataset show that our method, with fewer than 1 billion parameters, achieves state-of-the-art performance under the data ratio settings of 5%, 10%, and 25% when limited to models under 100 billion parameters. Compared to models with more than 100 billion parameters, SVAG still reaches competitive results. </p>
back
<p>This white paper outlines a long-term scientific vision for the development of digital-democracy technology. We contend that if digital democracy is to meet the ambition of enabling a participatory renewal in our societies, then a comprehensive multi-methods research effort is required that could, over the years, support its development in a democratically principled, empirically and computationally informed way. The paper is co-authored by an international and interdisciplinary team of researchers and arose from the Lorentz Center Workshop on ``Algorithmic Technology for Democracy'' (Leiden, October 2022). </p>
back
<p>Since Google introduced Kotlin as an official programming language for developing Android apps in 2017, Kotlin has gained widespread adoption in Android development. However, compared to Java, there is limited support for Kotlin code dependency analysis, which is the foundation of software analysis. To bridge this gap, we developed Depends-Kotlin to extract entities and their dependencies in Kotlin source code. Not only does Depends-Kotlin support extracting entities' dependencies in Kotlin code, but it can also extract dependency relations between Kotlin and Java. The extraction of such cross-language dependencies can help developers understand the migration process from Java to Kotlin. Additionally, we used a Java project with confirmed dependencies as a benchmark and converted this project into two projects: one Kotlin-only and one a combination of Kotlin and Java. The dependencies in these two projects were then extracted using our tool. The consistency observed among the dependency relations in all three projects confirms the accuracy of Depends-Kotlin. Furthermore, the performance of Depends-Kotlin was assessed using three additional projects of varying sizes. The source code of Depends-Kotlin and the dataset used in this demo paper have been uploaded to https://github.com/XYZboom/depends-kotlin. We also provide a screencast presenting Depends-Kotlin: https://youtu.be/daZuXOwn1Ls. </p>
back
<p>Centralized repair refers to repairing $h\geq 2$ node failures using $d$ helper nodes in a centralized way, where the repair bandwidth is counted by the total amount of data downloaded from the helper nodes. A centralized MSR code is an MDS array code with $(h,d)$-optimal repair for some $h$ and $d$. In this paper, we present several classes of centralized MSR codes with small sub-packetization. First, we construct an alternative MSR code with $(1,d_i)$-optimal repair for multiple repair degrees $d_i$ simultaneously. Based on the code structure, we are able to construct a centralized MSR code with the $(h_i,d_i)$-optimal repair property for all possible $(h_i,d_i)$ with $h_i\mid (d_i-k)$ simultaneously. The sub-packetization is no more than ${\rm lcm}(1,2,\ldots,n-k)(n-k)^n$, which is much smaller than that of a previous construction by Ye and Barg ($({\rm lcm}(1,2,\ldots,n-k))^n$). Moreover, for general parameters $2\leq h\leq n-k$ and $k\leq d\leq n-h$, we further give a centralized MSR code enabling $(h,d)$-optimal repair with sub-packetization smaller than in all previous works. </p>
back
<p>The transformation model is an essential component of any deformable image registration approach. It provides a representation of physical deformations between images, thereby defining the range and realism of registrations that can be found. Two types of transformation models have emerged as popular choices: B-spline models and mesh models. Although both models have been investigated in detail, a direct comparison has not yet been made, since the models are optimized using very different optimization methods in practice. B-spline models are predominantly optimized using gradient-descent methods, while mesh models are typically optimized using finite-element method solvers or evolutionary algorithms. Multi-objective optimization methods, which aim to find a diverse set of high-quality trade-off registrations, are increasingly acknowledged to be important in deformable image registration. Since these methods search for a diverse set of registrations, they can provide a more complete picture of the capabilities of different transformation models, making them suitable for a comparison of models. In this work, we conduct the first direct comparison between B-spline and mesh transformation models, by optimizing both models with the same state-of-the-art multi-objective optimization method, the Multi-Objective Real-Valued Gene-pool Optimal Mixing Evolutionary Algorithm (MO-RV-GOMEA). Moreover, the combination of this method with B-spline transformation models is novel. We experimentally compare both models on two different registration problems that are both based on pelvic CT scans of cervical cancer patients, featuring large deformations. Our results, on three cervical cancer patients, indicate that the choice of transformation model can have a profound impact on the diversity and quality of achieved registration outcomes. </p>
back
<p>This paper proposes a node flux-linkage synchronizing control method (NFSCM) for power systems with 100% wind power generation based on a capacitor voltage balancing scheme (CVBS). Different from conventional grid-forming controllers, NFSCM is designed to regulate inverters as virtual flux-linkage sources. Auto-synchronization of flux-linkage vectors is achieved through the CVBS-based NFSCM. The mismatch among the angular frequencies of flux-linkage vectors is eliminated by regulating the tracking errors of DC-link voltages, which establishes a negative feedback between the output frequency and active power of the inverter. NFSCM is adaptive to weak and strong grids. It avoids the excitation inrush currents in the step-up transformer of wind power generators. It also eliminates the DC components of the three-phase currents and avoids low-frequency oscillations in active power. In order to limit the short-circuit current of inverters, a logic-based bang-bang funnel control (LBFC) is designed to control the switches of inverter bridges when over-current is detected. LBFC is able to restrict various fault currents to an acceptable range in minimal time. LBFC and NFSCM are designed to operate in a switched manner according to a state-dependent switching strategy. Time-domain simulations were conducted on a 100% wind power generation test system, and the performance of NFSCM and LBFC was investigated. </p>
back
<p>RISC-V processors encounter substantial challenges in deploying multi-precision deep neural networks (DNNs) due to their restricted precision support, constrained throughput, and suboptimal dataflow design. To tackle these challenges, a scalable RISC-V vector (RVV) processor, namely SPEED, is proposed to enable efficient multi-precision DNN inference through innovations in customized instructions, hardware architecture, and dataflow mapping. Firstly, dedicated customized RISC-V instructions are proposed based on RVV extensions, providing SPEED with fine-grained control over processing precision ranging from 4 to 16 bits. Secondly, a parameterized multi-precision systolic array unit is incorporated within the scalable module to enhance parallel processing capability and data reuse opportunities. Finally, a mixed multi-precision dataflow strategy, compatible with different convolution kernels and data precisions, is proposed to effectively improve data utilization and computational efficiency. We perform synthesis of SPEED in TSMC 28nm technology. The experimental results demonstrate that SPEED achieves a peak throughput of 287.41 GOPS and an energy efficiency of 1335.79 GOPS/W at 4-bit precision. Moreover, when compared to the pioneering open-source vector processor Ara, SPEED provides an area efficiency improvement of 2.04$\times$ and 1.63$\times$ under 16-bit and 8-bit precision conditions, respectively, which shows SPEED's significant potential for efficient multi-precision DNN inference. </p>
back
<p>Linear optical quantum computing (LOQC) offers a quantum computation paradigm based on well-established and robust technology and flexible environmental conditions following DiVincenzo's criteria. Within this framework, integrated photonics can be utilized to achieve gate-based quantum computing, defining qubits by path-encoding, quantum gates through the use of Mach-Zehnder interferometers (MZIs) as fundamental building blocks, and measurements through single-photon detectors. In particular, universal two-qubit gates can be achieved by suitable structures of MZIs together with post-selection or heralding. The most resource-efficient choice is given by the post-selected CZ gate. However, this implementation is characterized by a design which has a non-regular structure and cannot be cascaded. This limits the implementation of large-scale LOQC. Starting from these issues, we suggest an approach to move toward a universal and scalable LOQC on the integrated photonic platform. First of all, choosing the post-selected CZ as universal two-qubit gate, we extend the path-encoded dual-rail qubit to a triplet of waveguides, composed of an auxiliary waveguide and the pair of waveguides corresponding to the qubit basis states. Additionally, we introduce a swap photonic network that maps the regularly-labeled structure of the new path-encoded qubits to the structure needed for the post-selected CZ. We also discuss the optical swap gate that allows the connection of non-nearest neighbor path-encoded qubits. In this way, we can deterministically exchange the locations of the qubits and execute controlled quantum gates between any path-encoded qubits. Next, by truncating the auxiliary waveguides after any post-selected CZ, we find that it is possible to cascade this optical gate when it acts on different pairs that share only one qubit. </p>
back
<p>Classification based on Zero-shot Learning (ZSL) is the ability of a model to classify inputs into novel classes on which the model has not previously seen any training examples. Providing an auxiliary descriptor in the form of a set of attributes describing the new classes involved in the ZSL-based classification is one of the favored approaches to solving this challenging task. In this work, inspired by Hyperdimensional Computing (HDC), we propose the use of stationary binary codebooks of symbol-like distributed representations inside an attribute encoder to compactly represent a computationally simple end-to-end trainable model, which we name Hyperdimensional Computing Zero-shot Classifier~(HDC-ZSC). It consists of a trainable image encoder, an attribute encoder based on HDC, and a similarity kernel. We show that HDC-ZSC can first perform zero-shot attribute extraction tasks and can later be repurposed for zero-shot classification tasks with minimal architectural changes and minimal model retraining. HDC-ZSC achieves Pareto-optimal results with a 63.8% top-1 classification accuracy on the CUB-200 dataset with only 26.6 million trainable parameters. Compared to two other state-of-the-art non-generative approaches, HDC-ZSC achieves 4.3% and 9.9% better accuracy, while they require more than 1.85x and 1.72x as many parameters as HDC-ZSC, respectively. </p>
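<p>The HDC machinery at the heart of such an attribute encoder is easy to sketch: fixed random bipolar codebooks, class prototypes formed by bundling (weighted superposition), and a cosine-similarity kernel. The dimensions below mimic CUB-200 (312 attributes, 200 classes), but the codebook size, the attribute matrix, and the query embedding are all illustrative, and the trained image encoder is omitted.</p>
<pre>
# Hyperdimensional attribute encoding for zero-shot classification (sketch).
import numpy as np

rng = np.random.default_rng(0)
D, n_attributes, n_classes = 10_000, 312, 200

# Stationary codebook: one fixed random bipolar hypervector per attribute.
codebook = rng.choice([-1.0, 1.0], size=(n_attributes, D))

# Hypothetical class-attribute matrix (continuous attribute strengths).
A = rng.random((n_classes, n_attributes))

# Class prototypes: bundle (weighted sum) of attribute hypervectors.
prototypes = A @ codebook                          # (n_classes, D)
prototypes /= np.linalg.norm(prototypes, axis=1, keepdims=True)

def classify(embedding):
    """Cosine-similarity kernel between an image embedding (assumed already
    mapped into the same D-dim space by a trained encoder) and prototypes."""
    z = embedding / np.linalg.norm(embedding)
    return int(np.argmax(prototypes @ z))

query = prototypes[42] + 0.001 * rng.standard_normal(D)  # noisy class 42
print(classify(query))                                   # -> 42
</pre>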
back
<p>Emotions are crucial in human life, influencing perceptions, relationships, behaviour, and choices. Emotion recognition using Electroencephalography (EEG) in the Brain-Computer Interface (BCI) domain presents significant challenges, particularly the need for extensive datasets. This study aims to generate synthetic EEG samples that are similar to real samples yet distinct from them by injecting additional noise into a conditional denoising diffusion probabilistic model, thus addressing the prevalent issue of data scarcity in EEG research. The proposed method is tested on the DEAP dataset, showing a 1.94% improvement in classification performance when using synthetic data, which is higher than the improvements of traditional GAN-based and DDPM-based approaches. The proposed diffusion-based approach to EEG data generation appears promising for refining the accuracy of emotion recognition systems and marks a notable contribution to EEG-based emotion recognition. Our research further evaluates the effectiveness of state-of-the-art classifiers on EEG data, employing both real and synthetic data with varying noise levels. </p>
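<p>For reference, the sketch below shows the forward (noising) side of a denoising diffusion probabilistic model, the mechanism the proposed method perturbs to keep synthetic samples similar to, yet distinct from, real ones. The schedule length and beta range follow common DDPM defaults and are assumptions here, as is the toy single-channel "EEG" signal.</p>
<pre>
# Forward diffusion q(x_t | x_0) with a linear variance schedule (sketch).
import numpy as np

T = 1000
betas = np.linspace(1e-4, 0.02, T)            # linear variance schedule
alphas_bar = np.cumprod(1.0 - betas)

def q_sample(x0, t, rng):
    """Draw x_t ~ N(sqrt(a_bar_t) * x0, (1 - a_bar_t) * I)."""
    noise = rng.standard_normal(x0.shape)
    return np.sqrt(alphas_bar[t]) * x0 + np.sqrt(1.0 - alphas_bar[t]) * noise

rng = np.random.default_rng(0)
x0 = np.sin(np.linspace(0, 8 * np.pi, 512))   # toy single-channel "EEG"
for t in (0, 250, 999):
    xt = q_sample(x0, t, rng)                 # progressively noisier sample
    snr = alphas_bar[t] / (1.0 - alphas_bar[t])
    print(f"t={t:4d}  sample std {xt.std():.2f}  SNR ~ {snr:.4f}")
</pre>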
back
<p>The considered optimal control problem of a stochastic power system is to select the set of power supply vectors that infimizes the probability that the phase-angle differences of any power flow of the network endanger the transient stability of the power system by leaving a critical subset. The set of control laws is restricted to be a periodically recomputed set of fixed power supply vectors based on predictions of power demand for the next short horizon. Neither state feedback nor output feedback is used. The associated control objective function is Lipschitz continuous, nondifferentiable, and nonconvex. The results of the paper include the existence of a minimum in the value range of the control objective function. They further include a two-step procedure to compute an approximate minimizer based on two key methods: (1) a projected generalized subgradient method for computing an initial vector, and (2) a steepest descent method for approximating a local minimizer. Finally, they include two convergence theorems showing that an approximation sequence converges to a local minimum. </p>
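<p>The first stage of the two-step procedure admits a compact sketch: a plain projected subgradient iteration with diminishing step sizes on a Lipschitz, nondifferentiable objective (a simplified variant of the generalized method the paper uses). The toy objective and the box-shaped feasible set below are stand-ins for the power-supply optimization, chosen only so the code is self-contained.</p>
<pre>
# Projected subgradient method on a nondifferentiable objective (sketch).
import numpy as np

TARGET = np.array([0.3, -0.2])

def objective(u):
    # Nondifferentiable (max-of-abs) term plus a smooth quadratic term.
    return np.max(np.abs(u - TARGET)) + 0.1 * np.sum(u ** 2)

def subgradient(u):
    # One valid subgradient of the max term, plus the smooth gradient.
    r = u - TARGET
    g = np.zeros_like(u)
    i = int(np.argmax(np.abs(r)))
    g[i] = np.sign(r[i])
    return g + 0.2 * u

def project(u, lo=-1.0, hi=1.0):
    return np.clip(u, lo, hi)      # projection onto the feasible box

u = np.array([0.9, 0.9])           # initial power-supply vector (toy)
for k in range(1, 201):
    u = project(u - (0.5 / np.sqrt(k)) * subgradient(u))  # diminishing step
print(u, objective(u))             # approaches a minimizer near TARGET
</pre>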
back
<p>We propose a local modification of the standard subdiffusion model by introducing the initial Fickian diffusion, which results in a multiscale diffusion model. The developed model resolves the incompatibility between the nonlocal operators in subdiffusion and the local initial conditions and thus eliminates the initial singularity of the solutions of the subdiffusion, while retaining its heavy tail behavior away from the initial time. The well-posedness of the model and high-order regularity estimates of its solutions are analyzed by resolvent estimates, based on which the numerical discretization and analysis are performed. Numerical experiments are carried out to substantiate the theoretical findings. </p>
back
<p>Medical image semantic segmentation techniques can help identify tumors automatically from computed tomography (CT) scans. In this paper, we propose a Contextual and Attentional feature Fusions enhanced Convolutional Neural Network (CNN) and Transformer hybrid network (CAFCT) model for liver tumor segmentation. In the proposed model, three additional modules are introduced into the network architecture: Attentional Feature Fusion (AFF), Atrous Spatial Pyramid Pooling (ASPP) of DeepLabv3, and Attention Gates (AGs), to improve contextual information related to tumor boundaries for accurate segmentation. Experimental results show that the proposed CAFCT achieves a mean Intersection over Union (IoU) of 90.38% and a Dice score of 86.78% on the Liver Tumor Segmentation Benchmark (LiTS) dataset, outperforming pure CNN or Transformer methods, e.g., Attention U-Net and PVTFormer. </p>
back
<p>Earlier papers \cite{VB2022,VB2023a} introduced the notions of a core and an index of a relation (an index being a special case of a core). A limited form of the axiom of choice was postulated -- specifically that all partial equivalence relations (pers) have an index -- and the consequences of adding the axiom to axiom systems for point-free reasoning were explored. In this paper, we define a partial ordering on relations, which we call the \textsf{thins} ordering. We show that our axiom of choice is equivalent to the property that core relations are the minimal elements of the \textsf{thins} ordering. We also postulate a novel axiom that guarantees that, when \textsf{thins} is restricted to non-empty pers, equivalence relations are maximal. This and other properties of \textsf{thins} provide further evidence that our axiom of choice is a desirable means of strengthening point-free reasoning on relations. </p> <p>Although our novel axiom is valid for concrete relations and is a sufficient condition for characterising maximality, we show that it is not a necessary condition in the abstract point-free algebra. This leaves open the problem of deriving a necessary and sufficient condition. </p>
back
<p>This paper presents a comprehensive study on using deep reinforcement learning (RL) to create dynamic locomotion controllers for bipedal robots. Going beyond focusing on a single locomotion skill, we develop a general control solution that can be used for a range of dynamic bipedal skills, from periodic walking and running to aperiodic jumping and standing. Our RL-based controller incorporates a novel dual-history architecture, utilizing both a long-term and short-term input/output (I/O) history of the robot. This control architecture, when trained through the proposed end-to-end RL approach, consistently outperforms other methods across a diverse range of skills in both simulation and the real world. The study also delves into the adaptivity and robustness introduced by the proposed RL system in developing locomotion controllers. We demonstrate that the proposed architecture can adapt to both time-invariant dynamics shifts and time-variant changes, such as contact events, by effectively using the robot's I/O history. Additionally, we identify task randomization as another key source of robustness, fostering better task generalization and compliance to disturbances. The resulting control policies can be successfully deployed on Cassie, a torque-controlled human-sized bipedal robot. This work pushes the limits of agility for bipedal robots through extensive real-world experiments. We demonstrate a diverse range of locomotion skills, including robust standing, versatile walking, fast running with a demonstration of a 400-meter dash, and a diverse set of jumping skills, such as standing long jumps and high jumps. </p>
back
<p>In the field of distributed computing by robot swarms, the literature comprises many models in which robots operate in the Euclidean plane through a sequence of look-compute-move cycles. The models under study differ in (i) the possibility of storing constant-size information, (ii) the possibility of communicating constant-size information, and (iii) the synchronization mode. By varying features (i) and (ii), we obtain the four noted base models: OBLOT (silent and oblivious robots), FSTA (silent and finite-state robots), FCOM (oblivious and finite-communication robots), and LUMI (finite-state and finite-communication robots). Combining each base model with the three main synchronization modes (fully synchronous, semi-synchronous, and asynchronous), we obtain the well-known 12 models. Extensive research has studied their computational power, proving hierarchical relations between the different models. However, only transparent robots have been considered. In this work, we study the taxonomy of the 12 models considering collision-intolerant opaque robots. We present six witness problems that prove the majority of the computational relations between the 12 models. In particular, the last witness problem depicts a peculiar issue occurring in the case of obstructed visibility and asynchrony. </p>
back
<p>Although multilingual language models exhibit impressive cross-lingual transfer capabilities on unseen languages, the performance on downstream tasks is impacted when there is a script disparity with the languages used in the multilingual model's pre-training data. Using transliteration offers a straightforward yet effective means to align the script of a resource-rich language with a target language, thereby enhancing cross-lingual transfer capabilities. However, for mixed languages, this approach is suboptimal, since only a subset of the language benefits from the cross-lingual transfer while the remainder is impeded. In this work, we focus on Maltese, a Semitic language, with substantial influences from Arabic, Italian, and English, and notably written in Latin script. We present a novel dataset annotated with word-level etymology. We use this dataset to train a classifier that enables us to make informed decisions regarding the appropriate processing of each token in the Maltese language. We contrast indiscriminate transliteration or translation with mixed processing pipelines that transliterate only words of Arabic origin, thereby resulting in text with a mixture of scripts. We fine-tune models on the processed data for four downstream tasks and show that conditional transliteration based on word etymology yields the best results, surpassing fine-tuning with raw Maltese or Maltese processed with non-selective pipelines. </p>
back
<p>Sliced optimal transport, which is basically a Radon transform followed by one-dimensional optimal transport, became popular in various applications due to its efficient computation. In this paper, we deal with sliced optimal transport on the sphere $\mathbb{S}^{d-1}$ and on the rotation group SO(3). We propose a parallel slicing procedure of the sphere which again requires only optimal transport on the line. We analyze the properties of the corresponding parallelly sliced optimal transport, which provides in particular a rotationally invariant metric on the spherical probability measures. For SO(3), we introduce a new two-dimensional Radon transform and develop its singular value decomposition. Based on this, we propose a sliced optimal transport on SO(3). </p> <p>As Wasserstein distances were extensively used in barycenter computations, we derive algorithms to compute the barycenters with respect to our new sliced Wasserstein distances and provide synthetic numerical examples on the 2-sphere that demonstrate their behavior in both the free and fixed support settings of discrete spherical measures. In terms of computational speed, they outperform the existing methods for semicircular slicing as well as the regularized Wasserstein barycenters. </p>
back
<p>Performing versatile mobile manipulation actions in human-centered environments requires highly sophisticated software frameworks that are flexible enough to handle special use cases, yet general enough to be applicable across different robotic systems, tasks, and environments. This paper presents a comprehensive memory-centered, affordance-based, and modular uni- and multi-manual grasping and mobile manipulation framework, applicable to complex robot systems with a high number of degrees of freedom such as humanoid robots. By representing mobile manipulation actions through affordances, i.e., interaction possibilities of the robot with its environment, we unify the autonomous manipulation process for known and unknown objects in arbitrary environments. Our framework is integrated and embedded into the memory-centric cognitive architecture of the ARMAR humanoid robot family. This way, robots can not only interact with the physical world but also use common knowledge about objects, and learn and adapt manipulation strategies. We demonstrate the applicability of the framework in real-world experiments, including grasping known and unknown objects, object placing, and semi-autonomous bimanual grasping of objects on two different humanoid robot platforms. </p>
back
<p>The combination of multiple-input multiple-output (MIMO) systems and intelligent reflecting surfaces (IRSs) is foreseen as a critical enabler of beyond 5G (B5G) and 6G. In this work, two different approaches are considered for the joint optimization of the IRS phase-shift matrix and MIMO precoders of an IRS-assisted multi-stream (MS) multi-user MIMO (MU-MIMO) system. Both approaches aim to maximize the system sum-rate for every channel realization. The first proposed solution is a novel contextual bandit (CB) framework with continuous state and action spaces called deep contextual bandit-oriented deep deterministic policy gradient (DCB-DDPG). The second is an innovative deep reinforcement learning (DRL) formulation where the states, actions, and rewards are selected such that the Markov decision process (MDP) property of reinforcement learning (RL) is appropriately met. Both proposals perform remarkably better than state-of-the-art heuristic methods in scenarios with high multi-user interference. </p>
back
<p>In this study, we present a secure smart contract-based Verifiable Random Function (VRF) model, addressing the shortcomings of existing systems. As quantum computing emerges, conventional public key cryptography faces potential vulnerabilities. To enhance our VRF's robustness, we employ post-quantum Ring-LWE encryption for generating pseudo-random sequences. Given the computational intensity of this approach and the associated on-chain gas costs, we propose a hybrid VRF architecture in which on-chain and off-chain components communicate in a scalable and secure way. To ensure the validity and integrity of the off-chain computations (e.g., Ring-LWE encryption), we employ a quantum-secure linkable ring signature scheme on the NTRU lattice as well as delegated key generation (DKG) with a secure key encapsulation mechanism (KEM). Our decentralized VRF employs multi-party computation (MPC) with blockchain-based decentralized identifiers (DID), ensuring that randomness and security are enhanced through collective effort. We show the security and privacy advantages of our proposed VRF model and give approximate estimates of its overall time and space complexities. We also evaluate our VRF MPC model's entropy and outline its Solidity smart contract integration. This research also provides a method to produce and verify the VRF output's proof, optimal for scenarios necessitating randomness and validation. Lastly, using the NIST SP800-22 test suite for randomness, we demonstrate strong results, with a 97.73% overall pass rate on 11 standard tests and an average p-value of 0.5459 across the total of 176 tests. </p>
back
<p>An innovative approach to hybrid analog-digital precoding for the downlink of wideband massive MIMO systems is developed. The proposed solution, termed Rank-Constrained Coordinate Ascent (RCCA), starts by seeking the full-digital precoder that maximizes the achievable sum-rate over all frequency subcarriers while constraining the rank of the overall transmit covariance matrix. The frequency-flat constraint on the analog part of the hybrid precoder and the non-convex nature of the rank constraint are circumvented by transforming the original problem into a more suitable one, where a convenient structure for the transmit covariance matrix is imposed. This structure makes the resulting full-digital precoder particularly well suited for its subsequent analog-digital factorization. An additional problem formulation to determine an appropriate power allocation policy according to the rank constraint is also provided. The numerical results show that the proposed method outperforms baseline solutions even in practical scenarios with high spatial diversity. </p>
back
<p>In [2] we show how to construct information sets for Reed-Muller codes only in terms of their basic parameters. In this work, we deal with the corresponding problem for q-ary Generalized Reed-Muller codes of first and second order. We see that for first-order codes the result for binary Reed-Muller codes remains valid, while for second-order codes, with q > 2, we must handle more complex defining sets, and we show that we obtain different information sets. We also present some examples and associated open problems. </p>
back
<p>Lattices are architected metamaterials whose properties strongly depend on their geometrical design. The analogy between lattices and graphs enables the use of graph neural networks (GNNs) as a faster surrogate model compared to traditional methods such as finite element modelling. In this work we present a higher-order GNN model trained to predict the fourth-order stiffness tensor of periodic strut-based lattices. The key features of the model are (i) SE(3) equivariance, and (ii) consistency with the thermodynamic law of conservation of energy. We compare the model to non-equivariant models based on a number of error metrics and demonstrate the benefits of the encoded equivariance and energy conservation in terms of predictive performance and reduced training requirements. </p>
back
<p>We tackle the problem of Byzantine errors in distributed gradient descent within the Byzantine-resilient gradient coding framework. Our proposed solution can recover the exact full gradient in the presence of $s$ malicious workers with a data replication factor of only $s+1$. It generalizes previous solutions to any data assignment scheme that has a regular replication over all data samples. The scheme detects malicious workers through additional interactive communication and a small number of local computations at the main node, leveraging group-wise comparisons between workers with a provably optimal grouping strategy. The scheme requires at most $s$ interactive rounds that incur a total communication cost logarithmic in the number of data samples. </p>
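<p>A naive version of the detection idea can be sketched directly: replicate each data block at s+1 workers, then flag any worker whose reply disagrees with the majority of its replica group. The cyclic assignment, the toy gradients, and the exhaustive comparison of all replies below are illustrative; the paper's grouping strategy is provably optimal and far more communication-efficient than this brute-force variant. In this toy instance each replica group happens to contain an honest majority, which the flagging rule relies on.</p>
<pre>
# Group-wise comparison under (s+1)-fold data replication (naive sketch).
import numpy as np

rng = np.random.default_rng(1)
n_workers, s, n_blocks = 7, 2, 7
grads = {b: rng.standard_normal(3) for b in range(n_blocks)}  # true values
malicious = {2, 5}

def worker_reply(w, block):
    """Worker w's reported partial gradient for one data block."""
    g = grads[block].copy()
    if w in malicious:
        g += rng.standard_normal(3)          # arbitrary corruption
    return g

# Cyclic replication: block b is held by workers b, b+1, ..., b+s (mod n).
assignment = {b: [(b + i) % n_workers for i in range(s + 1)]
              for b in range(n_blocks)}

suspects = set()
for b, group in assignment.items():
    replies = {w: worker_reply(w, b) for w in group}
    for w in group:
        agree = sum(np.allclose(replies[w], replies[v]) for v in group)
        if agree <= (s + 1) // 2:            # disagrees with the majority
            suspects.add(w)
print(sorted(suspects))                      # -> [2, 5]
</pre>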
back
<p>In this paper we extend the equal division and the equal surplus division values for transferable utility cooperative games to the more general setup of transferable utility cooperative games with a priori unions. In the case of the equal surplus division value we propose three possible extensions. We provide axiomatic characterizations of the new values. Furthermore, we apply the proposed modifications to a particular cost sharing problem and compare the numerical results with those obtained with the original values. </p>
back
<p>Providing closed-form estimates of the decoding failure rate (DFR) of iterative decoders for low- and moderate-density parity check codes has attracted significant interest in the research community over the years. This interest has risen recently due to the use of iterative decoders in post-quantum cryptosystems, where the desired decoding failure rates are impossible to estimate via Monte Carlo simulations. In this work, we propose a new technique to provide accurate estimates of the DFR of a two-iteration (parallel) bit-flipping decoder, which is also employable for cryptographic purposes. In doing so, we successfully tackle the estimation of the bit-flipping probabilities at the second decoder iteration, and provide a fitting estimate for the syndrome weight distribution at the first iteration. We numerically validate our results, providing comparisons of the modeled and simulated syndrome weight and incorrectly-guessed error bit distribution at the end of the first iteration, as well as two-iteration DFRs, both in the floor and waterfall regimes for simulatable codes. Finally, we apply our method to estimate the DFR of LEDAcrypt parameters, showing improvements by factors larger than $2^{70}$ (for NIST category $1$) with respect to previous estimation techniques. This allows for a $\approx 20$% shortening in public key and ciphertext sizes, at no security loss, making the smallest ciphertext for NIST category $1$ only $6$% larger than that of BIKE. We note that the analyzed two-iteration decoder is applicable in BIKE, where swapping it with the current black-gray decoder (and adjusting the parameters) would provide strong IND-CCA$2$ guarantees. </p>
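<p>As a baseline for what the closed-form analysis replaces, the sketch below Monte Carlo-estimates the DFR of a two-iteration parallel bit-flipping decoder on a small random code. The code parameters, flipping threshold, and error weight are toy values; for cryptographic parameters the target DFRs are precisely what makes such simulation infeasible. The simulation tracks the residual error directly, which is equivalent to running the decoder on the syndrome.</p>
<pre>
# Monte Carlo DFR of a two-iteration parallel bit-flipping decoder (toy).
import numpy as np

rng = np.random.default_rng(0)
n, m, col_w = 200, 100, 5
# Random sparse parity-check matrix with col_w ones per column (a toy
# stand-in for the structured moderate-density codes of LEDAcrypt/BIKE).
H = np.zeros((m, n), dtype=np.uint8)
for j in range(n):
    H[rng.choice(m, col_w, replace=False), j] = 1

def residual_after_bf(e, iters=2, threshold=3):
    """Flip, in parallel, every bit whose number of unsatisfied parity
    checks reaches the threshold, for a fixed number of iterations."""
    e = e.copy()
    for _ in range(iters):
        syndrome = (H @ e) % 2
        upc = H.T @ syndrome                 # unsatisfied checks per bit
        e ^= (upc >= threshold).astype(np.uint8)
    return (H @ e) % 2                       # zero syndrome == success

trials, weight, failures = 2000, 10, 0
for _ in range(trials):
    e = np.zeros(n, dtype=np.uint8)
    e[rng.choice(n, weight, replace=False)] = 1
    failures += residual_after_bf(e).any()
print(f"estimated DFR ~ {failures / trials:.3f}")
</pre>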
back
<p>This paper uses topological data analysis (TDA) tools and introduces a data-driven clustering-based stock selection strategy tailored for sparse portfolio construction. Our asset selection strategy exploits the topological features of stock price movements to select a subset of topologically similar (different) assets for a sparse index tracking (Markowitz) portfolio. We introduce new distance measures, which serve as an input to the clustering algorithm, on the space of persistence diagrams and landscapes that consider the time component of a time series. We conduct an empirical analysis on the S&P index from 2009 to 2020, including a study on the COVID-19 data to validate the robustness of our methodology. Our strategy to integrate TDA with the clustering algorithm significantly enhanced the performance of sparse portfolios across various performance measures in diverse market scenarios. </p>
back
<p>We develop a framework for learning properties of quantum states beyond the assumption of independent and identically distributed (i.i.d.) input states. We prove that, given any learning problem (under reasonable assumptions), an algorithm designed for i.i.d. input states can be adapted to handle input states of any nature, albeit at the expense of a polynomial increase in copy complexity. Furthermore, we establish that algorithms which perform non-adaptive incoherent measurements can be extended to encompass non-i.i.d. input states while maintaining comparable error probabilities. This allows us, among other applications, to generalize the classical shadows of Huang, Kueng, and Preskill to the non-i.i.d. setting at the cost of a small loss in efficiency. Additionally, we can efficiently verify any pure state using Clifford measurements, in a way that is independent of the ideal state. Our main techniques are based on de Finetti-style theorems supported by tools from information theory. In particular, we prove a new randomized local de Finetti theorem that can be of independent interest. </p>
back
<p>Integrating information from multiple modalities enhances the robustness of scene perception systems in autonomous vehicles, providing a more comprehensive and reliable sensory framework. However, modality incompleteness in multi-modal segmentation remains under-explored. In this work, we establish a task called Modality-Incomplete Scene Segmentation (MISS), which encompasses both system-level modality absence and sensor-level modality errors. To avoid the predominant modality reliance in multi-modal fusion, we introduce a Missing-aware Modal Switch (MMS) strategy to proactively manage missing modalities during training. Utilizing bit-level batch-wise sampling enhances the model's performance in both complete and incomplete testing scenarios. Furthermore, we introduce the Fourier Prompt Tuning (FPT) method to incorporate representative spectral information into a limited number of learnable prompts that maintain robustness against all MISS scenarios, achieving an effect akin to fine-tuning but with far fewer tunable parameters (1.1%). Extensive experiments prove the efficacy of our proposed approach, showcasing an improvement of 5.84% mIoU over the prior state-of-the-art parameter-efficient methods in modality missing. The source code will be publicly available at https://github.com/RuipingL/MISS. </p>
back
<p>We investigate the Witsenhausen counterexample in a continuous vector-valued context with a causal encoder and noncausal decoder. Our main result is the optimal single-letter condition that characterizes the set of achievable Witsenhausen power costs and estimation costs, leveraging a modified weak typicality approach. In particular, we accommodate our power analysis to the causal encoder constraint, and provide an improved distortion error analysis for the challenging estimation of the interim state. Interestingly, the idea of dual role of control is explicitly captured by the two auxiliary random variables. </p>
back
<p>The low-rank plus sparse (L+S) decomposition model has enabled better reconstruction of dynamic magnetic resonance imaging (dMRI) with separation into background (L) and dynamic (S) components. However, use of the low-rank prior alone may not fully explain the slow variations or smoothness of the background part at the local scale. In this paper, we propose a smoothness-regularized L+S (SR-L+S) model for dMRI reconstruction from highly undersampled k-t-space data. We exploit joint low-rank and smooth priors on the background component of dMRI to better capture both its global and local temporal correlated structures. Extending the L+S formulation, the low-rank property is encoded by the nuclear norm, while the smoothness is encoded by a general \ell_{p}-norm penalty on the local differences of the columns of L. The additional smoothness regularizer can promote piecewise local consistency between neighboring frames. By smoothing out the noise and dynamic activities, it allows accurate recovery of the background part, and subsequently more robust dMRI reconstruction. Extensive experiments on multi-coil cardiac and synthetic data show that the SR-L+S model outp </p>
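<p>The backbone that SR-L+S extends is easy to sketch: alternate singular-value thresholding (the prox of the nuclear norm) for L with entrywise soft-thresholding (the prox of the l1 norm) for S. The sketch below works on fully sampled toy data, with no smoothness regularizer and no k-t undersampling, and its thresholds are arbitrary.</p>
<pre>
# Basic L+S separation by alternating proximal steps (toy, fully sampled).
import numpy as np

def svt(X, tau):
    """Singular-value thresholding: prox of the nuclear norm."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

def soft(X, tau):
    """Entrywise soft-thresholding: prox of the l1 norm."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)

rng = np.random.default_rng(0)
pixels, frames, rank = 400, 30, 2
L_true = rng.standard_normal((pixels, rank)) @ rng.standard_normal((rank, frames))
S_true = np.zeros_like(L_true)
mask = rng.random(L_true.shape) < 0.02           # sparse dynamic component
S_true[mask] = 5.0 * rng.standard_normal(mask.sum())
Y = L_true + S_true                              # observed image series

L = np.zeros_like(Y)
S = np.zeros_like(Y)
for _ in range(50):                              # alternating prox steps
    L = svt(Y - S, tau=1.0)
    S = soft(Y - L, tau=1.0)
print(np.linalg.norm(L - L_true) / np.linalg.norm(L_true))  # rel. error of L
</pre>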
back
<p>In this paper we introduce the $\Gamma$ value, a new value for cooperative games with transferable utility. We also provide an axiomatic characterization of the $\Gamma$ value based on a property concerning the so-called necessary players. A necessary player of a game is one without whom the characteristic function is zero. We illustrate the performance of the $\Gamma$ value in a particular cost allocation problem that arises when the owners of the apartments in a building plan to install an elevator and share its installation cost; in the resulting example we compare the proposals of the $\Gamma$ value, the equal division value and the Shapley value in two different scenarios. In addition, we propose an extension of the $\Gamma$ value for cooperative games with transferable utility and with a coalition structure. Finally, we provide axiomatic characterizations of the coalitional $\Gamma$ value and of the Owen and Banzhaf-Owen values using alternative properties concerning necessary players. </p>
back
<p>The world is looking for clean and renewable energy sources that do not pollute the environment, in an attempt to reduce the greenhouse gas emissions that contribute to global warming. Wind energy has significant potential not only to reduce greenhouse gas emissions, but also to meet the ever-increasing demand for energy. To enable the effective utilization of wind energy, addressing the following three challenges in wind data analysis is crucial. Firstly, improving data resolution in various climate conditions to ensure an ample supply of information for assessing potential energy resources. Secondly, implementing dimensionality reduction techniques for data collected from sensors/simulations to efficiently manage and store large datasets. Thirdly, extrapolating wind data from one spatial specification to another, particularly in cases where data acquisition may be impractical or costly. We propose a deep learning based approach to achieve multi-modal continuous-resolution wind data prediction from discontinuous wind data, along with data dimensionality reduction. </p>
back
<p>Purpose: Wood comprises different cell types, such as fibers and vessels, defining its properties. Studying their shape, size, and arrangement in microscopic images is crucial for understanding wood samples. Typically, this involves macerating (soaking) samples in a solution to separate cells, then spreading them on slides for imaging with a microscope that covers a wide area, capturing thousands of cells. However, these cells often cluster and overlap in images, making the segmentation difficult and time-consuming using standard image-processing methods. Results: In this work, we develop an automatic deep learning segmentation approach that utilizes the one-stage YOLOv8 model for fast and accurate fiber and vessel segmentation and characterization in microscopy images. The model can analyze 32640 x 25920 pixel images and demonstrates effective cell detection and segmentation, achieving a mAP_0.5-0.95 of 78%. To assess the model's robustness, we examined fibers from a genetically modified tree line known for longer fibers. The outcomes were comparable to previous manual measurements. Additionally, we created a user-friendly web application for image analysis and provided the code for use on Google Colab. Conclusion: By leveraging YOLOv8's advances, this work provides a deep learning solution to enable efficient quantification and analysis of wood cells suitable for practical applications. </p>
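<p>For readers who want to try a similar pipeline, the snippet below shows how fine-tuning and inference can be driven with the ultralytics YOLOv8 API. The dataset YAML, file names, and training settings are hypothetical, and the very large scans would be processed as tiles, since the network itself runs at a much smaller input resolution.</p>
<pre>
# Fine-tuning and running YOLOv8 instance segmentation (ultralytics API).
from ultralytics import YOLO

model = YOLO("yolov8n-seg.pt")   # pretrained segmentation checkpoint

# Fine-tune on a YOLO-format dataset (hypothetical YAML listing images and
# classes such as "fiber" and "vessel").
model.train(data="wood_cells.yaml", epochs=100, imgsz=640)

# Inference on one tile cropped from the full microscopy scan.
results = model.predict("slide_tile_0001.png", imgsz=640, conf=0.25)
for r in results:
    if r.masks is not None:
        print(len(r.masks.data), "cell instances detected")
</pre>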
back
<p>In this paper we extend the equal division and the equal surplus division values for transferable utility cooperative games to the more general setup of transferable utility cooperative games with level structures. In the case of the equal surplus division value we propose three possible extensions, one of which has already been described in the literature. We provide axiomatic characterizations of the values considered, apply them to a particular cost sharing problem and compare them in the framework of such an application. </p>
back
<p>We consider a model of third-degree price discrimination, in which the seller has a valuation for the product which is unknown to the market designer, who aims to maximize the buyers' surplus by revealing information regarding the buyer's valuation to the seller. Our main result shows that the regret is bounded by $U^*(0)/e$, where $U^*(0)$ is the optimal buyer surplus in the case where the seller has zero valuation for the product. This bound is attained by randomly drawing a seller valuation and applying the segmentation of Bergemann et al. (2015) with respect to the drawn valuation. We show that the $U^*(0)/e$ bound is tight in the case of binary buyer valuation. </p>
back
<p>We propose a novel algorithm for online resource allocation with non-stationary customer arrivals and unknown click-through rates. We assume multiple types of customers arrive in a non-stationary stochastic fashion, with unknown arrival rates in each period, and that customers' click-through rates are unknown and can only be learned online. By leveraging results from the stochastic contextual bandit with knapsack and online matching with adversarial arrivals, we develop an online scheme to allocate the resources to non-stationary customers. We prove that under mild conditions, our scheme achieves a ``best-of-both-worlds'' result: the scheme has a sublinear regret when the customer arrivals are near-stationary, and enjoys an optimal competitive ratio under general (non-stationary) customer arrival distributions. Finally, we conduct extensive numerical experiments to show that our approach generates near-optimal revenues for all different customer scenarios. </p>
back
<p>The importance of addressing security vulnerabilities is indisputable, with software becoming crucial in sectors such as national defense and finance. Consequently, the security issues caused by software vulnerabilities cannot be ignored. Fuzz testing is an automated software testing technique that can detect vulnerabilities in software. However, most previous fuzzers face the challenge that fuzzing performance is sensitive to the initial input seeds. In the absence of high-quality initial input seeds, fuzzers may expend significant resources on program path exploration, leading to a substantial decrease in the efficiency of vulnerability detection. To address this issue, we propose WGAN-AFL. By collecting high-quality testcases, we train a generative adversarial network (GAN) to learn their features, thereby obtaining high-quality initial input seeds. To overcome drawbacks like mode collapse and training instability inherent in GANs, we utilize the Wasserstein GAN (WGAN) architecture for training, further enhancing the quality of the generated seeds. Experimental results demonstrate that WGAN-AFL significantly outperforms the original AFL in terms of code coverage, new paths, and vulnerability discovery, confirming that WGAN-AFL effectively enhances seed quality. </p>
back
<p>We present a new plug-in for the ARGoS swarm robotic simulator to implement the Crazyflie drone, including its controllers, sensors, and some expansion decks. We have based our development on the former Spiri drone, upgrading the position controller, adding a new speed controller, LED ring, onboard camera, and battery discharge model. We have compared this new plug-in in terms of accuracy and efficiency with data obtained from real Crazyflie drones. All our experiments showed that the proposed plug-in worked well, presenting high levels of accuracy. We believe that this is an important contribution to robot simulations which will extend ARGoS capabilities through the use of our proposed, open-source plug-in. </p>
back
<p>We consider the transmission of spatially correlated analog information in a wireless sensor network (WSN) through fading single-input and multiple-output (SIMO) multiple access channels (MACs) with low-latency requirements. A lattice-based analog joint source-channel coding (JSCC) approach is considered where vectors of consecutive source symbols are encoded at each sensor using an n-dimensional lattice and then transmitted to a multiantenna central node. We derive a minimum mean square error (MMSE) decoder that accounts for both the multidimensional structure of the encoding lattices and the spatial correlation. In addition, a sphere decoder is considered to simplify the required searches over the multidimensional lattices. Different lattice-based mappings are approached and the impact of their size and density on performance and latency is analyzed. Results show that, while meeting low-latency constraints, lattice-based analog JSCC provides performance gains and higher reliability with respect to the state-of-the-art JSCC schemes. </p>
back
<p>This paper presents an algorithm, called BCM-Broadcast, for the implementation of causal broadcast in distributed mobile systems in the presence of Byzantine failures. The BCM-Broadcast algorithm simultaneously focuses on three critical challenges in distributed systems: Byzantine failures, Causality, and Mobility. We first present a hierarchical architecture for BCM-Broadcast. Then, we define twelve properties for BCM-Broadcast, including validity, integrity, termination, and causality. We then show that BCM-Broadcast satisfies all these properties. We also prove the safety of BCM-Broadcast; i.e., no healthy process delivers a message from any Byzantine process, assuming that the number of Byzantine processes is less than a third of the total number of mobile nodes. Subsequently, we show that the message complexity of BCM-Broadcast is linear. Finally, using the Poisson process, we analyze the probability of the violation of the safety condition under different mobility scenarios. </p>
back
<p>This paper answers a fundamental question about the exact distribution of the signal-to-interference-plus-noise ratio (SINR) under matched-filter (MF) precoding. Specifically, we derive the exact expressions for the cumulative distribution function (CDF) and the probability density function (PDF) of SINR under MF precoding over Rayleigh fading channels. Based on the exact analysis, we then rigorously prove that the SINR converges to some specific distributions separately in high SNR and in massive MIMO. To simplify the exact result in general cases, we develop a good approximation by modelling the interference as a Beta distribution. We then shift to the exact analysis of the transmit rate, and answer the fundamental question: How does the exact rate converge to the well-known asymptotic rate in massive MIMO? After that, we propose a novel approximation for the ergodic rate, which performs better than various existing approximations. Finally, we present some numerical results to demonstrate the accuracy of the derived analytical models. </p>
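<p>The distribution in question is straightforward to estimate empirically, which is how analytical results like these are typically sanity-checked. The numpy sketch below draws i.i.d. Rayleigh channels, applies a matched-filter precoder with a global power normalization, and collects per-user SINR samples; the normalization convention and the equal power split across users are assumptions that may differ from the paper's system model.</p>
<pre>
# Monte Carlo sampling of the SINR under matched-filter precoding (sketch).
import numpy as np

rng = np.random.default_rng(0)
M, K, snr, trials = 32, 4, 10.0, 5000   # antennas, users, linear SNR
sinrs = np.empty(trials)
for t in range(trials):
    # i.i.d. Rayleigh fading: CN(0, 1) entries.
    H = (rng.standard_normal((K, M))
         + 1j * rng.standard_normal((K, M))) / np.sqrt(2)
    W = H.conj().T / np.linalg.norm(H)  # matched filter, unit total power
    G = H @ (np.sqrt(snr) * W)          # effective channel at transmit SNR
    sig = np.abs(G[0, 0]) ** 2          # desired-signal power of user 0
    interf = np.sum(np.abs(G[0, 1:]) ** 2)
    sinrs[t] = sig / (interf + 1.0)     # unit-variance noise
print(f"mean SINR {sinrs.mean():.2f}, "
      f"5%-outage SINR {np.quantile(sinrs, 0.05):.2f}")
</pre>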
back
<p>Entity alignment, which is a prerequisite for creating a more comprehensive Knowledge Graph (KG), involves pinpointing equivalent entities across disparate KGs. Contemporary methods for entity alignment have predominantly utilized knowledge embedding models to procure entity embeddings that encapsulate various similarities: structural, relational, and attributive. These embeddings are then integrated through attention-based information fusion mechanisms. Despite this progress, effectively harnessing multifaceted information remains challenging due to inherent heterogeneity. Moreover, while Large Language Models (LLMs) have exhibited exceptional performance across diverse downstream tasks by implicitly capturing entity semantics, this implicit knowledge has yet to be exploited for entity alignment. In this study, we propose a Large Language Model-enhanced Entity Alignment framework (LLMEA), integrating structural knowledge from KGs with semantic knowledge from LLMs to enhance entity alignment. Specifically, LLMEA identifies candidate alignments for a given entity by considering both embedding similarities between entities across KGs and edit distances to a virtual equivalent entity. It then engages an LLM iteratively, posing multiple multi-choice questions to draw upon the LLM's inference capability. The final prediction of the equivalent entity is derived from the LLM's output. Experiments conducted on three public datasets reveal that LLMEA surpasses leading baseline models. Additional ablation studies underscore the efficacy of our proposed framework. </p>
back
<p>The Finite Fourier Series (FFS) Shape-Based (SB) trajectory approximation method has been used to rapidly generate initial trajectories that satisfy the dynamics, the trajectory boundary conditions, and a limit on the maximum thrust acceleration. The FFS SB approach solves a nonlinear programming problem (NLP) in searching for feasible trajectories. This paper extends the FFS SB approach to generate suboptimal solutions. Specifically, the objective function of the NLP problem is modified to also include a measure of the time of flight. Numerical results presented in this paper show several solutions that differ from the original FFS SB ones. The suboptimal trajectories generated using time-of-flight minimization are shown to be physically feasible trajectories and potential candidates for direct solvers. </p>
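<p>The shape-based representation itself is compact enough to sketch: a trajectory coordinate is written as a finite Fourier series, its acceleration follows term by term, and the thrust-acceleration constraint is checked by sampling. The coefficients and the bound below are arbitrary placeholders; in the FFS SB method they are decision variables of the NLP.</p>
<pre>
# Evaluating a Finite Fourier Series trajectory and its acceleration.
import numpy as np

T = 10.0                               # time of flight
a = np.array([1.0, 0.3, -0.1])         # cosine coefficients (one coordinate)
b = np.array([0.0, 0.5, 0.05])         # sine coefficients

def ffs(t, a, b, T):
    """x(t) = a0/2 + sum_n [a_n cos(n pi t / T) + b_n sin(n pi t / T)]."""
    n = np.arange(1, len(a))
    ang = np.outer(t, n) * np.pi / T
    return a[0] / 2.0 + np.cos(ang) @ a[1:] + np.sin(ang) @ b[1:]

def ffs_accel(t, a, b, T):
    """Second time derivative of the series, obtained term by term."""
    n = np.arange(1, len(a))
    w2 = (n * np.pi / T) ** 2
    ang = np.outer(t, n) * np.pi / T
    return -(np.cos(ang) @ (w2 * a[1:]) + np.sin(ang) @ (w2 * b[1:]))

t = np.linspace(0.0, T, 200)
acc = ffs_accel(t, a, b, T)
print(f"max |x''(t)| = {np.abs(acc).max():.4f}  (check against thrust limit)")
</pre>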
back
<p>The aim of this paper is to develop hybrid non-orthogonal multiple access (NOMA) assisted downlink transmission. First, for the single-input single-output (SISO) scenario, i.e., each node is equipped with a single antenna, a novel hybrid NOMA scheme is introduced, where NOMA is implemented as an add-on of a legacy time division multiple access (TDMA) network. Because of the simplicity of the SISO scenario, analytical results can be developed to reveal important properties of downlink hybrid NOMA. For example, in the case that the users' channel gains are ordered and the durations of their time slots are the same, downlink hybrid NOMA is shown to always outperform TDMA, which is different from the existing conclusion for uplink hybrid NOMA. Second, the proposed downlink SISO hybrid NOMA scheme is extended to the multiple-input single-output (MISO) scenario, i.e., the base station has multiple antennas. For the MISO scenario, near-field communication is considered to illustrate how NOMA can be used as an add-on in legacy networks based on space division multiple access and TDMA. Simulation results verify the developed analytical results and demonstrate the superior performance of downlink hybrid NOMA compared to conventional orthogonal multiple access. </p>
back
<p>In this paper, a direct finite element method is proposed for solving interface problems on simple unfitted meshes. The fact that the two interface conditions form an $H^{\frac12}(\Gamma)\times H^{-\frac12}(\Gamma)$ pair leads to a simple and direct weak formulation with an integral term for the mutual interaction over the interface, and the well-posedness of this weak formulation is proved. Based on this formulation, a direct finite element method is proposed to solve the problem on two adjacent subdomains separated by the interface by a conforming finite element and a conforming mixed finite element, respectively. The well-posedness and an optimal a priori analysis are proved for this direct finite element method under some reasonable assumptions. A simple lowest-order direct finite element method using the linear element method and the lowest-order Raviart-Thomas element method is proposed and analyzed to admit the optimal a priori error estimate by verifying the aforementioned assumptions. Numerical tests are also conducted to verify the theoretical results and the effectiveness of the direct finite element method. </p>
back
<p>Recent approaches to automatically detecting the speaker of an utterance of direct speech often disregard general information about characters in favor of local information found in the context, such as surrounding mentions of entities. In this work, we explore stylistic representations of characters built by encoding their quotes with off-the-shelf pretrained Authorship Verification models in a large corpus of English novels (the Project Dialogism Novel Corpus). Results suggest that the combination of stylistic and topical information captured by some of these models accurately distinguishes characters from one another, but does not necessarily improve over semantic-only models when attributing quotes. However, these results vary across novels, and stylometric models tailored specifically to literary texts and the study of characters warrant further investigation. </p>
back
<p>Plagiarism is a pressing concern, even more so with the availability of large language models. Existing plagiarism detection systems reliably find copied and moderately reworded text but fail for idea plagiarism, especially in the mathematical sciences, which heavily use formal mathematical notation. We make two contributions. First, we establish a taxonomy of mathematical content reuse by annotating 122 potentially plagiarised scientific document pairs. Second, we analyze the best-performing approaches for detecting plagiarism and mathematical content similarity on the newly established taxonomy. We find that the best-performing methods for plagiarism and math content similarity achieve overall detection scores (PlagDet) of 0.06 and 0.16, respectively. The best-performing methods failed to detect most cases from all seven newly established math similarity types. The outlined contributions will benefit research on plagiarism detection systems, recommender systems, question-answering systems, and search engines. We make our experiment's code and annotated dataset available to the community: https://github.com/gipplab/Taxonomy-of-Mathematical-Plagiarism </p>
back
<p>Many High Performance Computing (HPC) facilities have developed and deployed frameworks in support of continuous monitoring and operational data analytics (MODA) to help improve efficiency and throughput. Because of the complexity and scale of systems and workflows and the need for low-latency response to address dynamic circumstances, automated feedback and response have the potential to be more effective than current human-in-the-loop approaches, which are laborious and error-prone. Progress has been limited, however, by factors such as the lack of infrastructure and feedback hooks, and successful deployment is often site- and case-specific. In this position paper we report on the outcomes and plans from a recent Dagstuhl Seminar, seeking to carve a path for community progress in the development of autonomous feedback loops for MODA, based on the established formalism of similar (MAPE-K) loops in autonomic computing and self-adaptive systems. By defining and developing such loops for significant cases experienced across HPC sites, we seek to extract commonalities and develop conventions that will facilitate interoperability and interchangeability with system hardware, software, and applications across different sites, and will motivate vendors and others to provide telemetry interfaces and feedback hooks to enable community development and pervasive deployment of MODA autonomy loops. </p>
back
<p>Multi-image super-resolution (MISR) increases the spatial resolution of a low-resolution (LR) acquisition by combining multiple images carrying complementary information in the form of sub-pixel offsets in the scene sampling, and can be significantly more effective than its single-image counterpart. Its main difficulty lies in accurately registering and fusing the multi-image information. Currently studied settings, such as burst photography, typically involve assumptions of small geometric disparity between the LR images and rely on optical flow for image registration. We study a MISR method that can increase the resolution of sets of images acquired with arbitrary, and potentially wildly different, camera positions and orientations, generalizing the currently studied MISR settings. Our proposed model, called EpiMISR, moves away from optical flow and explicitly uses the epipolar geometry of the acquisition process, together with transformer-based processing of radiance feature fields, to substantially improve over state-of-the-art MISR methods in the presence of large disparities in the LR images. </p>
back
<p>Causal discovery is the challenging task of inferring causal structure from data. Motivated by Pearl's Causal Hierarchy (PCH), which tells us that passive observations alone are not enough to distinguish correlation from causation, there has been a recent push to incorporate interventions into machine learning research. Reinforcement learning provides a convenient framework for such an active approach to learning. This paper presents CORE, a deep reinforcement learning-based approach for causal discovery and intervention planning. CORE learns to sequentially reconstruct causal graphs from data while learning to perform informative interventions. Our results demonstrate that CORE generalizes to unseen graphs and efficiently uncovers causal structures. Furthermore, CORE scales to larger graphs with up to 10 variables and outperforms existing approaches in structure estimation accuracy and sample efficiency. All relevant code and supplementary material can be found at https://github.com/sa-and/CORE </p>
back
<p>The modification of Amdahl's law for the case of an increment of processor elements in a computer system is considered. The coefficient $k$ linking the accelerations of parallel and parallel specialized computer systems is determined. The limiting values of the coefficient are investigated and its theoretical maximum is calculated. It is proved that $k > 1$ for any positive increment of processor elements. The obtained formulas are combined into a single method allowing one to determine the maximum theoretical acceleration of a parallel specialized computer system in comparison with the acceleration of a minimal parallel computer system. The method is tested on the Apriori, k-nearest neighbors, CDF 9/7, fast Fourier transform and naive Bayesian classifier algorithms. </p>
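<p>As a worked illustration of the kind of relation studied above (our own minimal reading based on Amdahl's law, not the paper's exact formulas): if $f$ is the parallelisable fraction and a specialized system adds $\Delta$ processor elements to a minimal system with $n$ elements, the ratio of the two speedups exceeds 1 whenever $\Delta > 0$. </p>
<pre>
def amdahl_speedup(f: float, n: int) -> float:
    # Amdahl's law: f = parallelisable fraction, n = number of processor elements.
    return 1.0 / ((1.0 - f) + f / n)

def k_coefficient(f: float, n: int, delta: int) -> float:
    # Ratio of the specialized system's speedup (n + delta elements)
    # to the minimal parallel system's speedup (n elements).
    return amdahl_speedup(f, n + delta) / amdahl_speedup(f, n)

if __name__ == "__main__":
    f, n, delta = 0.95, 8, 8
    print(f"S({n}) = {amdahl_speedup(f, n):.3f}")   # 5.926
    print(f"k = {k_coefficient(f, n, delta):.3f}")  # > 1 for any delta > 0
</pre>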
back
<p>This paper presents a theoretical analysis of intrinsic message passing decoding for generalized product codes (GPCs) with irregular degree distributions, a generalization of product codes that allows every code bit to be protected by a minimum of two and potentially more component codes. We derive a random hypergraph-based asymptotic performance analysis for GPCs, extending previous work that considered the case where every bit is protected by exactly two component codes. The analysis offers a new tool to guide the code design of GPCs by providing insights into the influence of degree distributions on the performance of GPCs. </p>
back
<p>Generative retrieval models encode pointers to information in a corpus as an index within the model's parameters. These models serve as part of a larger pipeline, where retrieved information conditions generation for knowledge-intensive NLP tasks. However, we identify two limitations: first, generative retrieval does not account for contextual information; second, the retrieval cannot be tuned for the downstream readers, as decoding the page title is a non-differentiable operation. This paper introduces Re3val, trained with generative reranking and reinforcement learning using limited data. Re3val leverages context acquired via Dense Passage Retrieval to rerank the retrieved page titles and utilizes REINFORCE to maximize rewards generated by constrained decoding. Additionally, we generate questions from our pre-training dataset to mitigate epistemic uncertainty and bridge the domain gap between the pre-training and fine-tuning datasets. Subsequently, we extract and rerank contexts from the KILT database using the reranked page titles. Upon grounding the top five reranked contexts, Re3val achieves the top KILT scores among all other generative retrieval models across five KILT datasets. </p>
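<p>The REINFORCE objective mentioned above can be written compactly; the sketch below shows a generic batch estimator with a mean baseline, where log-probabilities of the decoded page titles are weighted by downstream retrieval rewards. This is a standard form of the estimator, not necessarily Re3val's exact recipe. </p>
<pre>
import torch

def reinforce_loss(log_probs: torch.Tensor, rewards: torch.Tensor) -> torch.Tensor:
    # log_probs: summed token log-probabilities of each decoded page title.
    # rewards: rewards produced under constrained decoding for each title.
    baseline = rewards.mean()                  # variance-reduction baseline
    advantage = (rewards - baseline).detach()  # no gradient through rewards
    return -(advantage * log_probs).mean()
</pre>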
back
<p>Imaging Atmospheric Cherenkov Telescopes (IACTs) of the TAIGA gamma-ray observatory detect Extensive Air Showers (EASs) originating from the interactions of cosmic rays or gamma rays with the atmosphere. Thereby, the telescopes obtain images of the EASs. The ability to separate gamma-ray images from the hadronic cosmic-ray background is one of the main features of this type of detector. However, actual IACT observations require simultaneous observation of the background and the gamma-ray source. This observation mode (called wobbling) modifies the images of events, which affects the quality of selection by neural networks. </p> <p>Thus, in this work, we present the results of applying neural networks (NNs) to the image classification task on Monte Carlo (MC) images of TAIGA-IACTs. The wobbling mode is considered together with the image adaptation required for adequate analysis by NNs. We also explore several neural network structures that classify events either directly from images or from Hillas parameters extracted from the images. In addition, MC simulation data are used to evaluate the quality of the NN-based segregation of rare gamma events, taking into account all necessary image modifications. </p>
back
<p>The growing popularity of Android requires malware detection systems that can keep up with the pace of new software being released. According to a recent study, a new piece of malware appears online every 12 seconds. To address this, we treat Android malware detection as a streaming data problem and explore the use of active online learning as a means of mitigating the problem of labelling applications in a timely and cost-effective manner. Our resulting framework achieves accuracies of up to 96\%, requires as little as 24\% of the training data to be labelled, and compensates for the concept drift that occurs between the release and labelling of an application. We also consider the broader practicalities of online learning within Android malware detection, and systematically explore the trade-offs between using different static, dynamic and hybrid feature sets to classify malware. </p>
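<p>A minimal sketch of the active online learning loop described above, assuming an uncertainty-based query rule and a fixed labelling budget (both our own illustrative choices; the paper's framework and feature sets are richer). </p>
<pre>
import numpy as np
from sklearn.linear_model import SGDClassifier

def active_online_stream(stream, budget=0.24, threshold=0.2, seed=0):
    # stream yields (feature_vector, get_label) pairs; get_label() is the
    # costly oracle (e.g. manual analysis) and is called only when queried.
    clf = SGDClassifier(loss="log_loss", random_state=seed)
    labelled = total = 0
    initialised = False
    for x, get_label in stream:
        x = np.asarray(x).reshape(1, -1)
        total += 1
        if not initialised:
            clf.partial_fit(x, [get_label()], classes=[0, 1])
            initialised, labelled = True, 1
            continue
        p_malware = clf.predict_proba(x)[0, 1]
        if abs(p_malware - 0.5) < threshold and labelled / total < budget:
            clf.partial_fit(x, [get_label()])  # query the uncertain sample
            labelled += 1
    return clf, labelled / total
</pre>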
back
<p>This manuscript introduces deep learning models that simultaneously describe the dynamics of several yield curves. We aim to learn the dependence structure among the different yield curves induced by the globalization of financial markets and exploit it to produce more accurate forecasts. By combining the self-attention mechanism and nonparametric quantile regression, our model generates both point and interval forecasts of future yields. The architecture is designed to avoid quantile crossing issues affecting multiple quantile regression models. Numerical experiments conducted on two different datasets confirm the effectiveness of our approach. Finally, we explore potential extensions and enhancements by incorporating deep ensemble methods and transfer learning mechanisms. </p>
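<p>Two building blocks implied by the approach above are the pinball (quantile) loss and a construction that prevents quantile crossing. The sketch below shows both in a generic form; the monotone-increments construction is one common remedy and is our assumption, not necessarily the paper's exact mechanism. </p>
<pre>
import numpy as np

def pinball_loss(y, q_pred, tau):
    # Quantile (pinball) loss for quantile level tau in (0, 1).
    diff = y - q_pred
    return np.mean(np.maximum(tau * diff, (tau - 1.0) * diff))

def monotone_quantiles(base, increments):
    # Predict the lowest quantile plus non-negative increments, so the
    # resulting quantile curves cannot cross.
    return base[..., None] + np.cumsum(np.maximum(increments, 0.0), axis=-1)
</pre>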
back
<p>The Sustainable Development Goals (SDGs) of the United Nations provide a blueprint of a better future by 'leaving no one behind', and, to achieve the SDGs by 2030, poor countries require immense volumes of development aid. In this paper, we develop a causal machine learning framework for predicting heterogeneous treatment effects of aid disbursements to inform effective aid allocation. Specifically, our framework comprises three components: (i) a balancing autoencoder that uses representation learning to embed high-dimensional country characteristics while addressing treatment selection bias; (ii) a counterfactual generator to compute counterfactual outcomes for varying aid volumes to address small sample-size settings; and (iii) an inference model that is used to predict heterogeneous treatment-response curves. We demonstrate the effectiveness of our framework using data with official development aid earmarked to end HIV/AIDS in 105 countries, amounting to more than USD 5.2 billion. For this, we first show that our framework successfully computes heterogeneous treatment-response curves using semi-synthetic data. Then, we demonstrate our framework using real-world HIV data. Our framework points to large opportunities for a more effective aid allocation, suggesting that the total number of new HIV infections could be reduced by up to 3.3% (~50,000 cases) compared to the current allocation practice. </p>
back
<p>Large-scale image datasets are often partially labeled, where only a few categories' labels are known for each image. Assigning pseudo-labels to the unknown labels to gain additional training signals has become prevalent for training deep classification models. However, some pseudo-labels are inevitably incorrect, leading to a notable decline in model classification performance. In this paper, we propose a novel method called Category-wise Fine-Tuning (CFT), aiming to reduce model inaccuracies caused by incorrect pseudo-labels. In particular, CFT employs known labels without pseudo-labels to fine-tune the logistic regressions of trained models individually, calibrating each category's model predictions. A Genetic Algorithm, seldom used for training deep models, is also utilized in CFT to directly maximize classification performance. CFT is applied to well-trained models, unlike most existing methods that train models from scratch. Hence, CFT is general and compatible with models trained with different methods and schemes, as demonstrated through extensive experiments. CFT requires only a few seconds per category for calibration on consumer-grade GPUs. We achieve state-of-the-art results on three benchmark datasets, including the CheXpert chest X-ray competition dataset (ensemble mAUC 93.33%, single model 91.82%), partially labeled MS-COCO (average mAP 83.69%), and Open Image V3 (mAP 85.31%), outperforming the previous best results by 0.28%, 2.21%, 2.50%, and 0.91%, respectively. The single model on CheXpert has been officially evaluated by the competition server, endorsing the correctness of the result. The outstanding results and generalizability indicate that CFT could play a substantial role in classification model development. Code is available at: https://github.com/maxium0526/category-wise-fine-tuning. </p>
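<p>A minimal sketch of the per-category calibration idea: for each category, an affine rescaling of the logit is refit with only the known labels (no pseudo-labels). Plain gradient steps on the binary cross-entropy are used here for brevity; the paper additionally explores a Genetic Algorithm to optimize the evaluation metric directly. All names and hyperparameters are our own. </p>
<pre>
import numpy as np

def category_wise_finetune(logits, labels, known_mask, lr=0.1, epochs=200):
    # logits, labels, known_mask: arrays of shape (n_samples, n_categories).
    n, c = logits.shape
    a, b = np.ones(c), np.zeros(c)
    for j in range(c):
        m = known_mask[:, j]
        if not m.any():
            continue                 # nothing known for this category
        z, y = logits[m, j], labels[m, j]
        for _ in range(epochs):
            p = 1.0 / (1.0 + np.exp(-(a[j] * z + b[j])))
            g = p - y                # d(BCE)/d(pre-activation)
            a[j] -= lr * np.mean(g * z)
            b[j] -= lr * np.mean(g)
    return a, b                      # calibrated prediction: sigmoid(a*z + b)
</pre>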
back
<p>This article bridges the gap between two topics used in sharing an encryption key: (i) Key Consolidation, i.e., extracting two identical strings of bits from two information sources with similarities (common randomness). (ii) Quantum-safe Key Encapsulation by incorporating randomness in Public/Private Key pairs. In the context of Key Consolidation, the proposed scheme adds to the complexity Eve faces in extracting useful data from leaked information. In this context, it is applied to the method proposed in [1] for establishing common randomness from round-trip travel times in a packet data network. The proposed method allows adapting the secrecy level to the amount of similarity in the common randomness. It can even encapsulate a Quantum-safe encryption key in the extreme case that no common randomness is available. In the latter case, it is shown that the proposed scheme offers improvements with respect to the McEliece cryptosystem, which currently forms the foundation for Quantum-safe key encapsulation. </p> <p>[1] A. K. Khandani, "Looping for Encryption Key Generation Over the Internet: A New Frontier in Physical Layer Security," 2023 Biennial Symposium on Communications (BSC), Montreal, QC, Canada, 2023, pp. 59-64 </p>
back
<p>In this paper we study the interactions between so-called fractional relaxations of the integer programs (IPs) which encode homomorphism and isomorphism of relational structures. We give a combinatorial characterization of a certain natural linear programming (LP) relaxation of homomorphism in terms of fractional isomorphism. As a result, we show that the families of constraint satisfaction problems (CSPs) that are solvable by this linear program are precisely those that are closed under an equivalence relation which we call Weisfeiler-Leman invariance. We also generalize this result to the much broader framework of Promise Valued Constraint Satisfaction Problems, which brings together two well-studied extensions of the CSP framework. Finally, we consider the hierarchies of increasingly tighter relaxations of the homomorphism and isomorphism IPs obtained by applying the Sherali-Adams and Weisfeiler-Leman methods, respectively. We extend our combinatorial characterization of the basic LP to higher levels of the Sherali-Adams hierarchy, and we generalize a well-known logical characterization of the Weisfeiler-Leman test from graphs to relational structures. </p>
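<p>The Weisfeiler-Leman test referenced above is easy to state in code. The sketch below implements 1-WL colour refinement on graphs (the paper's setting is the more general one of relational structures) using nested tuples as canonical colours. </p>
<pre>
from collections import Counter

def wl_colours(adj, rounds=3):
    # 1-WL refinement; adj maps each vertex to its neighbour list.
    colours = {v: () for v in adj}
    for _ in range(rounds):
        colours = {v: (colours[v], tuple(sorted(colours[u] for u in adj[v])))
                   for v in adj}
    return colours

def wl_indistinguishable(adj1, adj2, rounds=3):
    # Graphs pass the 1-WL test iff their colour multisets coincide.
    return (Counter(wl_colours(adj1, rounds).values())
            == Counter(wl_colours(adj2, rounds).values()))

# Classic example: a 6-cycle vs. two disjoint triangles are 1-WL equivalent.
c6 = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
two_c3 = {0: [1, 2], 1: [0, 2], 2: [0, 1], 3: [4, 5], 4: [3, 5], 5: [3, 4]}
print(wl_indistinguishable(c6, two_c3))  # True
</pre>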
back
<p>Text generation is a compelling sub-field of natural language processing, aiming to generate human-readable text from input words. In particular, decoder-only generative models, such as the generative pre-trained transformer (GPT), are widely used for text generation, with two major computational stages: summarization and generation. Unlike the summarization stage, which can process the input tokens in parallel, the generation stage is difficult to accelerate due to its sequential generation of output tokens through iteration. Moreover, each iteration requires reading the whole model with little data-reuse opportunity. Therefore, the workload of transformer-based text generation is severely memory-bound, making external memory bandwidth the system bottleneck. In this paper, we propose a subarray-level processing-in-memory architecture named SAL-PIM, an HBM-based PIM architecture for the end-to-end acceleration of transformer-based text generation. The SAL-PIM architecture includes three architectural features. First, it utilizes higher internal bandwidth by integrating multiple subarray-level arithmetic logic units with optimized data mapping schemes. Second, it adopts LUT-based linear interpolation to perform complex non-linear functions in PIM. Third, it accelerates end-to-end inference on PIM for text generation. Furthermore, to validate the SAL-PIM architecture, we built a cycle-accurate simulator and implemented SAL-PIM's logic units in a 28-nm CMOS technology. As a result, when the input size ranges from 32 to 128 and the output size from 1 to 256, SAL-PIM achieves a maximum speedup of 4.72x and an average speedup of 1.83x for text generation based on the GPT-2 medium model compared to a server-level GPU. </p>
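<p>The LUT-based linear interpolation used for non-linear functions in PIM can be sketched in a few lines. The entry count and target function below are illustrative; the actual SAL-PIM table sizes and fixed-point formats are not reproduced here. </p>
<pre>
import numpy as np

def lut_interp(f, x, lo, hi, entries=64):
    # Build a lookup table of f over [lo, hi] and evaluate by piecewise-linear
    # interpolation between the two nearest entries.
    grid = np.linspace(lo, hi, entries)
    table = f(grid)
    x = np.clip(np.asarray(x, dtype=float), lo, hi)
    step = grid[1] - grid[0]
    idx = np.minimum(((x - lo) / step).astype(int), entries - 2)
    frac = (x - lo) / step - idx
    return table[idx] + frac * (table[idx + 1] - table[idx])

# Example: approximate GELU's tanh form, a common transformer non-linearity.
gelu = lambda v: 0.5 * v * (1 + np.tanh(0.7978845608 * (v + 0.044715 * v**3)))
print(lut_interp(gelu, [-1.0, 0.3, 2.0], -4.0, 4.0))
</pre>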
back
<p>This paper presents the results of finetuning large language models (LLMs) for the task of detecting vulnerabilities in source code. We leverage WizardCoder, a recent improvement of the state-of-the-art LLM StarCoder, and adapt it for vulnerability detection through further finetuning. To accelerate training, we modify WizardCoder's training procedure; we also investigate optimal training regimes. For the imbalanced dataset with many more negative examples than positive, we additionally explore different techniques to improve classification performance. The finetuned WizardCoder model achieves improvements in ROC AUC and F1 measures on balanced and imbalanced vulnerability datasets over a CodeBERT-like model, demonstrating the effectiveness of adapting pretrained LLMs for vulnerability detection in source code. The key contributions are finetuning the state-of-the-art code LLM WizardCoder, increasing its training speed without harming performance, optimizing the training procedure and regimes, handling class imbalance, and improving performance on difficult vulnerability detection datasets. This demonstrates the potential for transfer learning by finetuning large pretrained language models for specialized source code analysis tasks. </p>
back
<p>In this paper, we introduce two metrics, namely, age of actuation (AoA) and age of actuated information (AoAI), within a discrete-time system model that integrates data caching and energy harvesting (EH). AoA evaluates the timeliness of actions irrespective of the age of the information, while AoAI considers the freshness of the utilized data packet. We use Markov Chain analysis to model the system's evolution. Furthermore, we employ three-dimensional Markov Chain analysis to characterize the stationary distributions for AoA and AoAI and calculate their average values. Our findings from the analysis, validated by simulations, show that while AoAI consistently decreases with increased data and energy packet arrival rates, AoA presents a more complex behavior, with potential increases under conditions of limited data or energy resources. These metrics go towards the semantics of information and goal-oriented communications since they consider the timeliness of utilizing the information to perform an action. </p>
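<p>To make the two metrics concrete, the following toy discrete-time simulation contrasts them: AoA resets whenever an action is performed, while AoAI inherits the age of the cached data packet actually used. The caching and consumption rules below are our own simplified reading of the model, meant only to illustrate the definitions. </p>
<pre>
import random

def simulate(p_data=0.3, p_energy=0.4, steps=100_000, seed=0):
    rng = random.Random(seed)
    cached_age = None        # age of the freshest cached data packet
    energy = 0               # harvested energy units in the battery
    aoa = aoai = 0
    aoa_sum = aoai_sum = 0
    for _ in range(steps):
        aoa += 1
        aoai += 1
        if cached_age is not None:
            cached_age += 1
        if rng.random() < p_data:
            cached_age = 0   # fresh data arrival overwrites the cache
        if rng.random() < p_energy:
            energy += 1      # energy packet arrival
        if cached_age is not None and energy > 0:
            energy -= 1      # actuate: spend one energy unit
            aoa = 0          # action just happened
            aoai = cached_age  # freshness of the data actually used
        aoa_sum += aoa
        aoai_sum += aoai
    return aoa_sum / steps, aoai_sum / steps

print(simulate())  # (average AoA, average AoAI)
</pre>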
back
<p>This paper belongs to a body of work at the intersection of symbolic computation and group analysis aiming at the symbolic analysis of differential equations. The goal is to extract important properties without finding the explicit general solution. In this contribution, we introduce the algorithmic verification of nonlinear superposition properties and its implementation. More precisely, for a system of nonlinear ordinary differential equations of first order with a polynomial right-hand side, we check whether the differential system admits a general solution by means of a superposition rule and a certain number of particular solutions. The method is based on the theory of Newton polytopes and associated symbolic computation. It provides the basis for the identification of nonlinear superpositions within a given system and for the construction of numerical methods which preserve important algebraic properties at the numerical level. </p>
back
<p>It needs to be systematically investigated to what extent safety measures evaluate the intended performance of Deep Neural Networks (DNNs) for critical applications. Due to a lack of verification methods for high-dimensional DNNs, a trade-off is needed between accepted performance and the handling of out-of-distribution (OOD) samples. </p> <p>This work evaluates rejecting outputs from semantic segmentation DNNs by applying a Mahalanobis distance (MD) based on the most probable class-conditional Gaussian distribution for the predicted class as an OOD score. The evaluation covers three DNNs trained on the Cityscapes dataset and tested on four automotive datasets, and finds that classification risk can be drastically reduced at the cost of pixel coverage, even when the method is applied to unseen datasets. The applicability of our findings will support legitimizing safety measures and motivate their usage when arguing for the safe usage of DNNs in automotive perception. </p>
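<p>The MD-based OOD score itself is compact. Below is a generic sketch: fit one Gaussian per class in feature space (with a shared covariance, a common choice that is our assumption here) and score a sample by its squared Mahalanobis distance to the predicted class. </p>
<pre>
import numpy as np

def fit_class_gaussians(features, labels, eps=1e-6):
    # Per-class means plus a single tied covariance over centred features.
    classes = np.unique(labels)
    means = {c: features[labels == c].mean(axis=0) for c in classes}
    centred = np.concatenate([features[labels == c] - means[c] for c in classes])
    cov = np.cov(centred, rowvar=False)
    cov_inv = np.linalg.inv(cov + eps * np.eye(cov.shape[0]))
    return means, cov_inv

def mahalanobis_ood_score(x, predicted_class, means, cov_inv):
    # Squared Mahalanobis distance to the predicted class; reject if too large.
    d = x - means[predicted_class]
    return float(d @ cov_inv @ d)
</pre>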
back
<p>Extremely large aperture arrays (ELAAs) are anticipated to serve as a pivotal feature of future multiple-input multiple-output (MIMO) systems in 6G. Near-field (NF) fading channel models are essential for reliable link-level simulation and ELAA system design. In this article, we propose a framework designed to generate NF fading channels for both communication and integrated sensing and communication (ISAC) applications. The framework allows a mix of line-of-sight (LoS) and non-LoS (NLoS) links. It also considers a spherical wave model and spatially non-stationary shadow fading. Based on this framework, we propose a three-dimensional (3D) fading channel model for ELAA systems deployed with a uniform rectangular array (URA). It can capture the impact of a sensing object for ISAC applications. Moreover, all parameters involved in the framework are based on specifications or measurements from 3rd Generation Partnership Project (3GPP) documents. Therefore, the proposed framework and channel model have the potential to contribute to the standard in various aspects, including ISAC, extra-large (XL-) MIMO, and reconfigurable intelligent surface (RIS) aided MIMO systems. Finally, future directions for ELAA are presented, covering not only NF channel modeling but also the design of next-generation transceivers. </p>
back
<p>Subgraph matching has garnered increasing attention for its diverse real-world applications. Given the dynamic nature of real-world graphs, addressing evolving scenarios without incurring prohibitive overheads has been a focus of research. However, existing approaches for dynamic subgraph matching often proceed serially, retrieving incremental matches for each updated edge individually. This approach falls short when handling batch data updates, leading to a decrease in system throughput. Leveraging the parallel processing power of GPUs, which can execute a massive number of cores simultaneously, has been widely recognized for performance acceleration in various domains. Surprisingly, systematic exploration of subgraph matching in the context of batch-dynamic graphs, particularly on a GPU platform, remains untouched. In this paper, we bridge this gap by introducing an efficient framework, GAMMA (GPU-Accelerated Batch-Dynamic Subgraph Matching). Our approach features a DFS-based warp-centric batch-dynamic subgraph matching algorithm. To ensure load balance in the DFS-based search, we propose warp-level work stealing via shared memory. Additionally, we introduce coalesced search to reduce redundant computations. Comprehensive experiments demonstrate the superior performance of GAMMA. Compared to state-of-the-art algorithms, GAMMA showcases a performance improvement of up to hundreds of times. </p>
back
<p>Metamorphic testing (MT) has proven to be a successful solution to automating testing and addressing the oracle problem. However, it entails manually deriving metamorphic relations (MRs) and converting them into an executable form; these steps are time-consuming and may prevent the adoption of MT. In this paper, we propose an approach for automatically deriving executable MRs (EMRs) from requirements using large language models (LLMs). Instead of merely asking the LLM to produce EMRs, our approach relies on a few-shot prompting strategy to instruct the LLM to perform activities in the MT process, by providing requirements and API specifications, as one would do with software engineers. To assess the feasibility of our approach, we conducted a questionnaire-based survey in collaboration with Siemens Industry Software, focusing on four of their software applications. Additionally, we evaluated the accuracy of the generated EMRs for a web application. The outcomes of our study are highly promising, as they demonstrate the capability of our approach to generate MRs and EMRs that are both comprehensible and pertinent for testing purposes. </p>
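<p>The few-shot prompting strategy described above can be illustrated with a prompt builder. The decomposition into requirement, API specification, MR, and executable MR below is our own sketch of the structure described; no real LLM API is invoked, and all field names are hypothetical. </p>
<pre>
def build_emr_prompt(requirement, api_spec, examples):
    # examples: few-shot demonstrations, each a dict with keys
    # "req", "api", "mr", "emr" (all hypothetical field names).
    shots = "\n\n".join(
        f"Requirement:\n{ex['req']}\nAPI:\n{ex['api']}\n"
        f"Metamorphic relation:\n{ex['mr']}\nExecutable MR (code):\n{ex['emr']}"
        for ex in examples
    )
    return (
        "You are a software test engineer. Derive a metamorphic relation "
        "from the requirement, then turn it into executable test code.\n\n"
        f"{shots}\n\nRequirement:\n{requirement}\nAPI:\n{api_spec}\n"
        "Metamorphic relation:"
    )
</pre>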
back
<p>Moving object segmentation (MOS) provides a reliable solution for detecting traffic participants and is thus of great interest in the autonomous driving field. Dynamic capture is always critical in the MOS problem. Previous methods capture motion features from range images directly. In contrast, we argue that residual maps provide greater potential for motion information, while range images contain rich semantic guidance. Based on this intuition, we propose MF-MOS, a novel motion-focused model with a dual-branch structure for LiDAR moving object segmentation. Specifically, we decouple the spatial-temporal information by capturing the motion from residual maps and generating semantic features from range images, which are used as movable object guidance for the motion branch. Our straightforward yet distinctive solution makes the most of both range images and residual maps, thus greatly improving the performance of the LiDAR-based MOS task. Remarkably, our MF-MOS achieved a leading IoU of 76.7% on the MOS leaderboard of the SemanticKITTI dataset upon submission, demonstrating state-of-the-art performance. The implementation of our MF-MOS has been released at https://github.com/SCNU-RISLAB/MF-MOS. </p>
back
<p>Developing an automatic signature verification system is challenging and demands a large number of training samples, which is why synthetic handwriting generation is an emerging topic in document image analysis. Some handwriting synthesizers use the motor equivalence model, a well-established hypothesis from neuroscience which analyses how a human being accomplishes movement. Specifically, a motor equivalence model divides human actions into two steps: 1) the effector-independent step at the cognitive level and 2) the effector-dependent step at the motor level. Indeed, recent work reports the successful application to Western scripts of a handwriting synthesizer based on this theory. This paper aims to adapt this scheme for the generation of synthetic signatures in two Indic scripts, Bengali (Bangla) and Devanagari (Hindi). For this purpose, we use two different online and offline databases for both Bengali and Devanagari signatures. This paper reports an effective synthesizer for static and dynamic signatures written in the Devanagari or Bengali scripts. We obtain promising results with the artificially generated signatures in terms of appearance and performance when comparing the results with those for real signatures. </p>
back
<p>Deep learning models have demonstrated promising results in estimating treatment effects (TEE). However, most of them overlook the variations in treatment outcomes among subgroups with distinct characteristics. This limitation hinders their ability to provide accurate estimations and treatment recommendations for specific subgroups. In this study, we introduce a novel neural network-based framework, named SubgroupTE, which incorporates subgroup identification and treatment effect estimation. SubgroupTE identifies diverse subgroups and simultaneously estimates treatment effects for each subgroup, improving the treatment effect estimation by considering the heterogeneity of treatment responses. Comparative experiments on synthetic data show that SubgroupTE outperforms existing models in treatment effect estimation. Furthermore, experiments on a real-world dataset related to opioid use disorder (OUD) demonstrate the potential of our approach to enhance personalized treatment recommendations for OUD patients. </p>
back
<p>We investigate the prospect of reconstructing the ``cosmic distance ladder'' of the Universe using a novel deep learning framework called LADDER - Learning Algorithm for Deep Distance Estimation and Reconstruction. LADDER is trained on the apparent magnitude data from the Pantheon Type Ia supernovae compilation, incorporating the full covariance information among data points, to produce predictions along with corresponding errors. After employing several validation tests with a number of deep learning models, we pick LADDER as the best performing one. We then demonstrate applications of our method in the cosmological context, that include serving as a model-independent tool for consistency checks for other datasets like baryon acoustic oscillations, calibration of high-redshift datasets such as gamma ray bursts, use as a model-independent mock catalog generator for future probes, etc. Our analysis advocates for interesting yet cautious consideration of machine learning applications in these contexts. </p>
back
<p>One of the most critical aspects of multimodal Reinforcement Learning (RL) is the effective integration of different observation modalities. Having robust and accurate representations derived from these modalities is key to enhancing the robustness and sample efficiency of RL algorithms. However, learning representations in RL settings for visuotactile data poses significant challenges, particularly due to the high dimensionality of the data and the complexity involved in correlating visual and tactile inputs with the dynamic environment and task objectives. To address these challenges, we propose Multimodal Contrastive Unsupervised Reinforcement Learning (M2CURL). Our approach employs a novel multimodal self-supervised learning technique that learns efficient representations and contributes to faster convergence of RL algorithms. Our method is agnostic to the RL algorithm, thus enabling its integration with any available RL algorithm. We evaluate M2CURL on the Tactile Gym 2 simulator and we show that it significantly enhances the learning efficiency in different manipulation tasks. This is evidenced by faster convergence rates and higher cumulative rewards per episode, compared to standard RL algorithms without our representation learning approach. </p>
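<p>A typical building block for this kind of multimodal self-supervision is a cross-modal InfoNCE loss, where matching visual/tactile pairs are positives and all other pairings in the batch are negatives. The sketch below is a generic version of such a loss, not necessarily M2CURL's exact objective. </p>
<pre>
import torch
import torch.nn.functional as F

def cross_modal_infonce(z_vis, z_tac, temperature=0.1):
    # z_vis, z_tac: (batch, dim) embeddings from the visual and tactile encoders.
    z_vis = F.normalize(z_vis, dim=1)
    z_tac = F.normalize(z_tac, dim=1)
    logits = z_vis @ z_tac.t() / temperature        # pairwise similarities
    targets = torch.arange(z_vis.size(0), device=z_vis.device)
    # Symmetric loss: visual-to-tactile and tactile-to-visual retrieval.
    return 0.5 * (F.cross_entropy(logits, targets)
                  + F.cross_entropy(logits.t(), targets))
</pre>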
back
<p>Deep subspace clustering (DSC) networks based on the self-expressive model learn a representation matrix, often implemented as a fully connected network, in the embedded space. After the learning is finished, the representation matrix is used by a spectral clustering module to assign labels to clusters. However, such an approach ignores complementary information that exists in other layers of the encoder (including the input data themselves). Herein, we apply a selected linear subspace clustering algorithm to learn representation matrices from the representations learned by all layers of the encoder network, including the input data. Afterward, we learn a multilayer graph that, in a multi-view-like manner, integrates information from the graph Laplacians of all used layers. This further improves the performance of the selected DSC network. Furthermore, we also provide a formulation of our approach to cluster out-of-sample/test data points. We validate the proposed approach on four well-known datasets with two DSC networks as baseline models. In almost all cases, the proposed approach achieved statistically significant improvements in three performance metrics. The MATLAB code of the proposed algorithm is posted on https://github.com/lovro-sinda/MLG-DSC. </p>
back
<p>Kernel methods are applied to many problems in pattern recognition, including subspace clustering (SC). In that way, nonlinear problems in the input data space become linear in the mapped high-dimensional feature space. Thereby, computationally tractable nonlinear algorithms are enabled through implicit mapping by virtue of the kernel trick. However, kernelization of linear algorithms is possible only if the square of the Frobenius norm of the error term is used in the related optimization problem. That, however, implies a normal distribution of the error, which is not appropriate for non-Gaussian errors such as gross sparse corruptions, which are modeled by the $\ell_1$-norm. Herein, to the best of our knowledge, we propose for the first time a robust kernel sparse SC (RKSSC) algorithm for data with gross sparse corruptions. The concept, in principle, can be applied to other SC algorithms to achieve robustness to the presence of this type of corruption. We validated the proposed approach on two well-known datasets with the linear robust SSC algorithm as a baseline model. According to the Wilcoxon test, the clustering performance obtained by the RKSSC algorithm is statistically significantly better than the corresponding performance obtained by the robust SSC algorithm. The MATLAB code of the proposed RKSSC algorithm is posted on https://github.com/ikopriva/RKSSC. </p>
back
<p>The structure of data organization is widely recognized as having a substantial influence on the efficacy of machine learning algorithms, particularly in binary classification tasks. Our research provides a theoretical framework suggesting that the maximum potential of binary classifiers on a given dataset is primarily constrained by the inherent qualities of the data. Through both theoretical reasoning and empirical examination, we employ standard objective functions, evaluation metrics, and binary classifiers to arrive at two principal conclusions. First, we show that the theoretical upper bound of binary classification performance on real datasets is attainable. This upper bound represents a calculable equilibrium between the learning loss and the evaluation metric. Second, we compute the precise upper bounds for three commonly used evaluation metrics, uncovering a fundamental consistency with our overarching thesis: the upper bound is intricately linked to the dataset's characteristics, independent of the classifier in use. Additionally, our subsequent analysis uncovers a detailed relationship between the upper limit of performance and the level of class overlap within the binary classification data. This relationship is instrumental for pinpointing the most effective feature subsets for use in feature engineering. </p>
back
<p>This paper studies Bayesian optimization with noise-free observations. We introduce new algorithms rooted in scattered data approximation that rely on a random exploration step to ensure that the fill-distance of query points decays at a near-optimal rate. Our algorithms retain the ease of implementation of the classical GP-UCB algorithm and satisfy cumulative regret bounds that nearly match those conjectured in <a href="/abs/2002.05096">arXiv:2002.05096</a>, hence solving a COLT open problem. Furthermore, the new algorithms outperform GP-UCB and other popular Bayesian optimization strategies in several examples. </p>
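<p>The mechanism is simple to sketch: interleave GP-UCB steps with uniformly random queries so the fill distance of the query set keeps shrinking. The schedule, constants, and one-dimensional grid below are our own choices for illustration, not the paper's algorithms or rates. </p>
<pre>
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def bo_with_random_exploration(f, lo, hi, n_iter=30, p_random=0.3, seed=0):
    rng = np.random.default_rng(seed)
    X = rng.uniform(lo, hi, size=(2, 1))
    y = np.array([f(x[0]) for x in X])
    gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
    grid = np.linspace(lo, hi, 512).reshape(-1, 1)
    for t in range(n_iter):
        gp.fit(X, y)
        if rng.random() < p_random:
            x_next = rng.uniform(lo, hi, size=(1, 1))   # random exploration step
        else:
            mu, sd = gp.predict(grid, return_std=True)  # GP-UCB step
            beta = 2.0 * np.log((t + 2) ** 2)
            x_next = grid[[int(np.argmax(mu + np.sqrt(beta) * sd))]]
        X = np.vstack([X, x_next])
        y = np.append(y, f(x_next[0, 0]))
    best = int(np.argmax(y))
    return X[best, 0], y[best]

print(bo_with_random_exploration(lambda x: -(x - 0.3) ** 2, 0.0, 1.0))
</pre>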
back
<p>Recently, there has been increasing concern about the vulnerability of deep neural network (DNN)-based synthetic aperture radar (SAR) automatic target recognition (ATR) to adversarial attacks, where a DNN could be easily deceived by clean input with imperceptible but aggressive perturbations. This paper studies the synthetic-to-measured (S2M) transfer setting, where an attacker generates adversarial perturbation based solely on synthetic data and transfers it against victim models trained with measured data. Compared with the current measured-to-measured (M2M) transfer setting, our approach does not need direct access to the victim model or the measured SAR data. We also propose the transferability estimation attack (TEA) to uncover the adversarial risks in this more challenging and practical scenario. The TEA makes full use of the limited similarity between the synthetic and measured data pairs for blind estimation and optimization of S2M transferability, leading to feasible surrogate model enhancement without mastering the victim model and data. Comprehensive evaluations based on the publicly available synthetic and measured paired labeled experiment (SAMPLE) dataset demonstrate that the TEA outperforms state-of-the-art methods and can significantly enhance various attack algorithms in computer vision and remote sensing applications. Codes and data are available at https://github.com/scenarri/S2M-TEA. </p>
back
<p>Clarification requests are a mechanism to help solve communication problems, e.g. due to ambiguity or underspecification, in instruction-following interactions. Despite their importance, even skilful models struggle with producing or interpreting such repair acts. In this work, we test three hypotheses concerning the effects of action taking as an auxiliary task in modelling Instruction Clarification Request (iCR) policies. Contrary to initial expectations, we conclude that its contribution to learning an iCR policy is limited, but some information can still be extracted from prediction uncertainty. We present further evidence that even well-motivated, Transformer-based models fail to learn good policies for when to ask iCRs, while the task of determining what to ask about can be modelled more successfully. Considering the implications of these findings, we further discuss the shortcomings of the data-driven paradigm for learning meta-communication acts. </p>
back
<p>Recently, deep learning techniques have been gradually replacing traditional statistical and machine learning models as the first choice for price forecasting tasks. In this paper, we leverage probabilistic deep learning for inferring the volatility index VIX. We employ the probabilistic counterparts of WaveNet, the Temporal Convolutional Network (TCN), and Transformers. We show that TCN outperforms all models, with an RMSE around 0.189. In addition, it is well known that modern neural networks provide inaccurate uncertainty estimates. To address this problem, we use standard deviation scaling to calibrate the networks. Furthermore, we find that MNF with a Gaussian prior outperforms the Reparameterization Trick and Flipout models in terms of precision and uncertainty predictions. Finally, we claim that MNF with Cauchy and LogUniform prior distributions yields well-calibrated TCN and WaveNet networks, with the former best inferring the VIX values. </p>
back
<p>Retrieval-Augmented Generation (RAG) is a technique that enhances the capabilities of large language models (LLMs) by incorporating external knowledge sources. This method addresses common LLM limitations, including outdated information and the tendency to produce inaccurate "hallucinated" content. However, the evaluation of RAG systems is challenging, as existing benchmarks are limited in scope and diversity. Most of the current benchmarks predominantly assess question-answering applications, overlooking the broader spectrum of situations where RAG could prove advantageous. Moreover, they only evaluate the performance of the LLM component of the RAG pipeline in the experiments, and neglect the influence of the retrieval component and the external knowledge database. To address these issues, this paper constructs a large-scale and more comprehensive benchmark, and evaluates all the components of RAG systems in various RAG application scenarios. Specifically, we have categorized the range of RAG applications into four distinct types: Create, Read, Update, and Delete (CRUD), each representing a unique use case. "Create" refers to scenarios requiring the generation of original, varied content. "Read" involves responding to intricate questions in knowledge-intensive situations. "Update" focuses on revising and rectifying inaccuracies or inconsistencies in pre-existing texts. "Delete" pertains to the task of summarizing extensive texts into more concise forms. For each of these CRUD categories, we have developed comprehensive datasets to evaluate the performance of RAG systems. We also analyze the effects of various components of the RAG system, such as the retriever, the context length, the knowledge base construction, and the LLM. Finally, we provide useful insights for optimizing the RAG technology for different scenarios. </p>
back
<p>Multi-Agent Path Finding (MAPF) involves determining paths for multiple agents to travel simultaneously through a shared area toward particular goal locations. This problem is computationally complex, especially when dealing with large numbers of agents, as is common in realistic applications like autonomous vehicle coordination. Finding an optimal solution is often computationally infeasible, making the use of approximate algorithms essential. Adding to the complexity, agents might act in a self-interested and strategic way, possibly misrepresenting their goals to the MAPF algorithm if it benefits them. Although the field of mechanism design offers tools to align incentives, using these tools without careful consideration can fail when only having access to approximately optimal outcomes. Since approximations are crucial for scalable MAPF algorithms, this poses a significant challenge. In this work, we introduce the problem of scalable mechanism design for MAPF and propose three strategyproof mechanisms, two of which even use approximate MAPF algorithms. We test our mechanisms on realistic MAPF domains with problem sizes ranging from dozens to hundreds of agents. Our findings indicate that they improve welfare beyond a simple baseline. </p>
back
<p>The emergence of tools based on artificial intelligence has also led to the need to produce explanations which are understandable by a human being. In some approaches, the system is not transparent (it is often referred to as a "black box"), making it difficult to generate appropriate explanations. In this work, though, we consider probabilistic logic programming, a combination of logic programming (for knowledge representation) and probability (to model uncertainty). In this setting, one can say that models are interpretable, which eases their understanding. However, given a particular query, the usual notion of "explanation" is associated with a set of choices, one for each random variable of the model. Unfortunately, this set does not have a causal structure and, in fact, some of the choices are actually irrelevant to the considered query. In order to overcome these shortcomings, we present an approach to explaining explanations which is based on the definition of a query-driven inference mechanism for probabilistic logic programs. </p>
back
<p>Movable antenna (MA) provides an innovative way to arrange antennas that can contribute to improved signal quality and more effective interference management. This method is especially beneficial for full-duplex (FD) wireless, which struggles with self-interference (SI) that usually overpowers the desired incoming signals. By dynamically repositioning transmit/receive antennas, we can mitigate the SI and enhance the reception of incoming signals. Thus, this paper proposes a novel MA-enabled point-to-point FD wireless system and formulates the minimum achievable rate of two FD terminals. To maximize the minimum achievable rate and determine the near-optimal positions of the MAs, we introduce a solution based on projected particle swarm optimization (PPSO), which can circumvent common suboptimal positioning issues. Moreover, numerical results reveal that the PPSO method leads to a better performance compared to the conventional alternating position optimization (APO). The results also demonstrate that an MA-enabled FD system outperforms the one using fixed-position antennas (FPAs). </p>
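<p>The projection idea in PPSO is the key difference from plain PSO: after each velocity update, particles (candidate antenna positions) are projected back into the feasible region. Below is a generic projected PSO for a box-constrained maximization; all hyperparameters, and the box as the feasible set, are our own simplifications. </p>
<pre>
import numpy as np

def projected_pso(objective, lo, hi, n_particles=30, iters=200, seed=0):
    rng = np.random.default_rng(seed)
    dim = lo.size
    x = rng.uniform(lo, hi, size=(n_particles, dim))
    v = np.zeros_like(x)
    pbest = x.copy()
    pbest_val = np.array([objective(p) for p in x])
    g = pbest[np.argmax(pbest_val)]
    for _ in range(iters):
        r1, r2 = rng.random((2, n_particles, dim))
        v = 0.7 * v + 1.5 * r1 * (pbest - x) + 1.5 * r2 * (g - x)
        x = np.clip(x + v, lo, hi)            # projection onto the feasible box
        vals = np.array([objective(p) for p in x])
        improved = vals > pbest_val
        pbest[improved], pbest_val[improved] = x[improved], vals[improved]
        g = pbest[np.argmax(pbest_val)]
    return g, float(pbest_val.max())
</pre>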
back
<p>As computer vision continues to advance and finds widespread applications across various domains, the need for interpretability in deep learning models becomes paramount. Existing methods often resort to post-hoc techniques or prototypes to explain the decision-making process, which can be indirect and lack intrinsic illustration. In this research, we introduce ViTree, a novel approach for fine-grained visual categorization that combines the popular vision transformer as a feature extraction backbone with neural decision trees. By traversing the tree paths, ViTree effectively selects patches from transformer-processed features to highlight informative local regions, thereby refining representations in a step-wise manner. Unlike previous tree-based models that rely on soft distributions or ensembles of paths, ViTree selects a single tree path, offering a clearer and simpler decision-making process. This patch and path selectivity enhances the model interpretability of ViTree, enabling better insights into the model's inner workings. Remarkably, extensive experimentation validates that this streamlined approach surpasses various strong competitors and achieves state-of-the-art performance while maintaining exceptional interpretability, as demonstrated by multi-perspective analyses. Code can be found at https://github.com/SJTU-DeepVisionLab/ViTree. </p>
back
<p>Deep learning for tabular data has garnered increasing attention in recent years, yet employing deep models for structured data remains challenging. While these models excel with unstructured data, their efficacy with structured data has been limited. Recent research has introduced retrieval-augmented models to address this gap, demonstrating promising results in supervised tasks such as classification and regression. In this work, we investigate using retrieval-augmented models for anomaly detection on tabular data. We propose a reconstruction-based approach in which a transformer model learns to reconstruct masked features of \textit{normal} samples. We test the effectiveness of KNN-based and attention-based modules to select relevant samples to help in the reconstruction process of the target sample. Our experiments on a benchmark of 31 tabular datasets reveal that augmenting this reconstruction-based anomaly detection (AD) method with non-parametric relationships via retrieval modules may significantly boost performance. </p>
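<p>A stripped-down version of the retrieval-plus-reconstruction recipe described above, using a plain KNN retriever and mean imputation in place of the paper's transformer and attention modules (both substitutions are ours): mask some features of a test row, reconstruct them from the nearest normal training rows, and use the reconstruction error as the anomaly score. </p>
<pre>
import numpy as np

def knn_reconstruction_scores(train, test, k=5, mask_frac=0.3, seed=0):
    # train: normal samples only; test: samples to score. Higher = more anomalous.
    rng = np.random.default_rng(seed)
    d = train.shape[1]
    scores = []
    for x in test:
        mask = rng.random(d) < mask_frac
        if not mask.any():
            mask[rng.integers(d)] = True
        # Retrieve neighbours by distance on the unmasked features only.
        dist = np.linalg.norm(train[:, ~mask] - x[~mask], axis=1)
        neighbours = train[np.argsort(dist)[:k]]
        recon = neighbours[:, mask].mean(axis=0)
        scores.append(float(np.linalg.norm(recon - x[mask])))
    return np.array(scores)
</pre>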
back
<p>We present BlockFusion, a diffusion-based model that generates 3D scenes as unit blocks and seamlessly incorporates new blocks to extend the scene. BlockFusion is trained using datasets of 3D blocks that are randomly cropped from complete 3D scene meshes. Through per-block fitting, all training blocks are converted into hybrid neural fields: a tri-plane containing the geometry features, followed by a Multi-layer Perceptron (MLP) for decoding the signed distance values. A variational auto-encoder is employed to compress the tri-planes into the latent tri-plane space, on which the denoising diffusion process is performed. Diffusion applied to the latent representations allows for high-quality and diverse 3D scene generation. To expand a scene during generation, one needs only to append empty blocks to overlap with the current scene and extrapolate existing latent tri-planes to populate new blocks. The extrapolation is done by conditioning the generation process with the feature samples from the overlapping tri-planes during the denoising iterations. Latent tri-plane extrapolation produces semantically and geometrically meaningful transitions that harmoniously blend with the existing scene. A 2D layout conditioning mechanism is used to control the placement and arrangement of scene elements. Experimental results indicate that BlockFusion is capable of generating diverse, geometrically consistent and unbounded large 3D scenes with unprecedented high-quality shapes in both indoor and outdoor scenarios. </p>
back
<p>Finding obstacle-free paths in unknown environments is a major navigation challenge for visually impaired people and autonomous robots. Previous works focus on obstacle avoidance but lack a general view of the environment being traversed. New devices based on computer vision systems can help impaired people to overcome the difficulties of navigating in unknown environments in safe conditions. In this work, we propose a combination of sensors and algorithms that can lead to a navigation system for visually impaired people. Building on traditional systems that use RGB-D cameras for obstacle avoidance, we include and combine the information of a fish-eye camera, which gives a better understanding of the user's surroundings. This combination gives robustness and reliability to the system, as well as a wide field of view that allows obtaining much information from the environment. The combination of sensors is inspired by human vision, where the center of the retina (fovea) provides more accurate information than the periphery, which gives humans a wider field of view. The proposed system is mounted on a wearable device that provides the obstacle-free zones of the scene, allowing the planning of trajectories for guiding people. </p>
back
<p>We leverage the Gibbs inequality and its natural generalization to R\'enyi entropies to derive closed-form parametric expressions for the optimal lower bounds of the $\rho$th-order guessing entropy (guessing moment) of a secret taking values on a finite set, in terms of the R\'enyi-Arimoto $\alpha$-entropy. This is carried out in a non-asymptotic regime where side information may be available. The resulting bounds yield a theoretical solution to a fundamental problem in side-channel analysis: ensuring that an adversary will not gain much guessing advantage when the leakage information is sufficiently weakened by proper countermeasures in a given cryptographic implementation. Practical evaluation for classical leakage models shows that the proposed bounds greatly improve previous ones for analyzing the capability of an adversary to perform side-channel attacks. </p>
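<p>For readers unfamiliar with the quantities involved, the standard (unconditional) definitions are as follows; the paper's conditional, Arimoto-type variants extend these when side information is present. The $\rho$th-order guessing moment of a secret $X$ with finite alphabet is $G_\rho(X) = \min_\sigma \sum_{k \ge 1} k^\rho \, \mathbb{P}[X = x_{\sigma(k)}]$, where $\sigma$ ranges over the orderings of the alphabet and the minimum is attained by guessing values in decreasing order of probability, and the R\'enyi entropy of order $\alpha \neq 1$ is $H_\alpha(X) = \frac{1}{1-\alpha} \log \sum_x \mathbb{P}[X = x]^\alpha$. </p>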
back
<p>In this work we present a novel approach for 3D layout recovery of indoor environments using a non-central acquisition system. From a non-central panorama, full and scaled 3D lines can be independently recovered by geometric reasoning, without geometric or scale assumptions. However, their sensitivity to noise and complex geometric modeling has led to these panoramas being little investigated. Our new pipeline extracts the boundaries of the structural lines of an indoor environment with a neural network and exploits the properties of non-central projection systems in a new geometrical processing step to recover a scaled 3D layout. The results of our experiments show that we improve state-of-the-art methods for layout reconstruction and line extraction in non-central projection systems. We completely solve the problem in Manhattan and Atlanta environments, handling occlusions and retrieving the metric scale of the room without extra measurements. To the best of the authors' knowledge, our approach is the first work using deep learning on non-central panoramas and recovering scaled layouts from single panoramas. </p>
back
<p>In data exploration, executing complex non-aggregate queries over large databases can be time-consuming. Our paper introduces a novel approach to address this challenge, focusing on finding an optimized subset of data, referred to as the approximation set, for query execution. The goal is to maximize query result quality while minimizing execution time. We formalize this problem as Approximate Non-Aggregates Query Processing (ANAQP) and establish its NP-completeness. To tackle this, we propose an approximate solution using an advanced reinforcement learning architecture, termed ASQP-RL. This approach overcomes challenges related to the large action space and the need for generalization beyond a known query workload. Experimental results on two benchmarks demonstrate the superior performance of ASQP-RL, outperforming baselines by 30% in accuracy and achieving efficiency gains of 10-35X. Our research sheds light on the potential of reinforcement learning techniques for advancing data management tasks. </p>
back
<p>Omnidirectional and 360{\deg} images are becoming widespread in industry and in consumer society, causing omnidirectional computer vision to gain attention. Their wide field of view allows the gathering of a great amount of information about the environment from a single image. However, the distortion of these images requires the development of specific algorithms for their treatment and interpretation. Moreover, a high number of images is essential for the correct training of computer vision algorithms based on learning. In this paper, we present a tool for generating datasets of omnidirectional images with semantic and depth information. These images are synthesized from a set of captures that are acquired in a realistic virtual environment for Unreal Engine 4 through an interface plugin. We gather a variety of well-known projection models such as equirectangular and cylindrical panoramas, different fish-eye lenses, catadioptric systems, and empiric models. Furthermore, we include in our tool photorealistic non-central-projection systems as non-central panoramas and non-central catadioptric systems. To the best of our knowledge, this is the first reported tool in the literature for generating photorealistic non-central images. Moreover, since the omnidirectional images are made virtually, we provide pixel-wise information about semantics and depth as well as perfect knowledge of the calibration parameters of the cameras. This allows the creation of ground-truth information with pixel precision for training learning algorithms and testing 3D vision approaches. To validate the proposed tool, different computer vision algorithms are tested as line extractions from dioptric and catadioptric central images, 3D Layout recovery and SLAM using equirectangular panoramas, and 3D reconstruction from non-central panoramas. </p>
back
<p>This article proposes a test procedure that can be used to test ML models and ML-based systems independently of the actual training process. In this way, the typical quality statements, such as the accuracy and precision of these models and systems, can be verified independently, taking into account their black-box character and the inherent stochastic properties of ML models and their training data. The article presents first results from a set of test experiments and suggests extensions to existing test methods reflecting the stochastic nature of ML models and ML-based systems. </p>
back
<p>For most service architectures, such as OSGi and Spring, architecture-specific tools allow software developers and architects to visualize otherwise obscure configurations hidden in the project files. Such visualization tools are often used for documentation purposes and help to understand programs better than through source code alone. However, such tools often do not address project-specific peculiarities or do not exist at all for less common architectures, requiring developers to use different visualization and analysis tools within the same architecture. Furthermore, many generic modeling tools and architecture visualization tools require their users to create and maintain models manually. </p> <p>We here propose a DSL-driven approach that allows software architects to define and adapt their own project visualization tool. The approach, which we refer to as Software Project Visualization (SPViz), uses two DSLs, one to describe architectural elements and their relationships, and one to describe how these should be visualized. We demonstrate how SPViz can then automatically synthesize a customized, project-specific visualization tool that can adapt to changes in the underlying project automatically. </p> <p>We implemented our approach in an open-source library, also termed SPViz, and discuss and analyze four different tools that follow this concept, including open-source projects and projects from an industrial partner in the railway domain. </p>
back
<p>As intelligent systems become increasingly important in our daily lives, new ways of interaction are needed. Classical user interfaces pose issues for the physically impaired and are often impractical or inconvenient. Gesture recognition is an alternative, but it is often not reactive enough when conventional cameras are used. This work proposes a Spiking Convolutional Neural Network that processes event and depth data for gesture recognition. The network is simulated using the open-source neuromorphic computing framework LAVA for offline training and evaluation on an embedded system. Three open-source data sets are used for the evaluation. Since these do not represent the applied bi-modality, a new data set with synchronized event and depth data was recorded. The results show the viability of temporal encoding of depth information and that modality fusion, even of differently encoded data, benefits network performance and generalization capabilities. </p>
back
<p>We propose a simple and effective method to derive the Fundamental Diagram (FD) from platoon vehicle trajectories. Average traffic state variables are computed using Edie's generalized definitions within time-dependent trapezoidal space-time areas. To obtain a clear FD, we employ a bivariate data aggregation technique to eliminate scatter. Our findings are as follows: (i) The proposed method demonstrates a remarkably consistent relation between the traffic variables and a clear triangular shape for autonomously-driven vehicles. (ii) The FDs are invariant to several factors of heterogeneity, such as the platoon length, vehicle characteristics, road particularities, and data acquisition accuracy. (iii) ACC-driven vehicle platoons with the minimum headway setting achieve much higher capacity, roughly 90\% higher, than those with a large headway setting. (iv) Connectivity might increase capacity. (v) Human drivers have a wider near-capacity operation area, showing different behaviors at high speeds than at low ones. (vi) Safety concerns might arise due to high values of backward wave speed for ACC-driven vehicles. Comparative analysis with the state-of-the-art confirms the validity of our approach. The proposed method stands out due to its simplicity and accuracy, which paves the way for practical applications in real-time traffic flow monitoring and control within modern intelligent transportation systems. </p>
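<p>As an illustration of Edie's generalized definitions (a minimal sketch of standard traffic-flow theory, not the authors' code; the rectangular region and all names are our assumptions, whereas the paper uses time-dependent trapezoids):</p>

```python
import numpy as np

def edie_estimates(trajectories, t0, t1, x0, x1):
    """Edie's generalized traffic state variables over a space-time
    region A, here a rectangle [t0, t1] x [x0, x1] for simplicity.
    trajectories: list of (t, x) sample arrays, one per vehicle."""
    area = (t1 - t0) * (x1 - x0)           # |A| in s * m
    ttd = ttt = 0.0                        # total travel distance / time in A
    for t, x in trajectories:
        # keep segments whose both endpoints lie inside A
        seg_in = ((t[:-1] >= t0) & (t[1:] <= t1) &
                  (np.minimum(x[:-1], x[1:]) >= x0) &
                  (np.maximum(x[:-1], x[1:]) <= x1))
        ttd += np.abs(np.diff(x))[seg_in].sum()
        ttt += np.diff(t)[seg_in].sum()
    q = ttd / area                         # flow (veh/s)
    k = ttt / area                         # density (veh/m)
    v = ttd / ttt if ttt > 0 else float("nan")  # space-mean speed (m/s)
    return q, k, v
```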
back
<p>Integration of technological solutions aims to improve accuracy, precision and repeatability in farming operations, and biosensor devices are increasingly used for understanding basic biology during livestock production. The aim of this study was to design and validate a miniaturized tri-axial accelerometer for non-invasive monitoring of farmed fish with re-programmable schedule protocols. The device was attached to the operculum of gilthead sea bream and European sea bass juveniles for monitoring their physical activity through measurements of movement accelerations in the x- and y-axes, while records of operculum beats served as a measurement of respiratory frequency. Data post-processing of exercised fish in swimming test chambers revealed an exponential increase of fish accelerations as fish speed increased from 1 body-length to 4 body-lengths per second, while a close relationship between oxygen consumption and opercular frequency was consistently found. The usefulness of low computational load for data pre-processing with on-board algorithms was verified from low to submaximal exercise, a procedure that increased the autonomy of the system to up to 6 h of data recording with different programmable schedules. Visual observations regarding tissue damage, feeding behavior and circulating levels of stress markers did not reveal a short-term negative impact of device tagging. Reduced plasma levels of triglycerides revealed a transient inhibition of feed intake in small fish, but this disturbance was not detected in larger fish. Taken together, this is a proof of concept that miniaturized devices are suitable for non-invasive and reliable metabolic phenotyping of farmed fish to improve their overall performance and welfare. Further work is underway to improve the attachment procedure and the full device packaging. </p>
back
<p>Instruction-tuned Large Language Models (LLMs) have recently showcased remarkable advancements in their ability to generate fitting responses to natural language instructions. However, many current works rely on manual evaluation to judge the quality of generated responses. Since such manual evaluation is time-consuming, it does not easily scale to the evaluation of multiple models and model variants. In this short paper, we propose a straightforward but remarkably effective evaluation metric called SemScore, in which we directly compare model outputs to gold target responses using semantic textual similarity (STS). We conduct a comparative evaluation of the model outputs of 12 prominent instruction-tuned LLMs using 8 widely-used evaluation metrics for text generation. We find that our proposed SemScore metric outperforms all other, in many cases more complex, evaluation metrics in terms of correlation to human evaluation. These findings indicate the utility of our proposed metric for the evaluation of instruction-tuned LLMs. </p>
back
<p>Omnidirectional images are one of the main sources of information for learning-based scene understanding algorithms. However, annotated datasets of omnidirectional images cannot keep pace with the development of these learning-based algorithms. Among the different panoramas, and in contrast to standard central ones, non-central panoramas provide geometrical information in the distortion of the image from which we can retrieve 3D information of the environment [2]. However, due to the lack of commercial non-central devices, up until now there was no dataset of this kind of panorama. In this data paper, we present the first dataset of non-central panoramas for indoor scene understanding. The dataset is composed of {\bf 2574} RGB non-central panoramas taken in around 650 different rooms. Each panorama has an associated depth map and annotations for obtaining the layout of the room from the image: a structural edge map, the list of corners in the image, the 3D corners of the room, and the camera pose. The images are taken from photorealistic virtual environments and automatically annotated pixel-wise. </p>
back
<p>We consider the task of learning individual-specific intensities of counting processes from a set of static variables and irregularly sampled time series. We introduce a novel modeling approach in which the intensity is the solution to a controlled differential equation. We first design a neural estimator by building on neural controlled differential equations. Second, we show that our model can be linearized in the signature space under sufficient regularity conditions, yielding a signature-based estimator which we call CoxSig. We provide theoretical learning guarantees for both estimators, before showcasing the performance of our models on a vast array of simulated and real-world datasets from finance, predictive maintenance and food supply chain management. </p>
back
<p>Casting manipulation has been studied as a way to expand a robot's movable range. In this manipulation, the robot throws its end effector to reach a distant target. Usually, a special casting manipulator, consisting of rigid arm links and a specific flexible linear object, is constructed for effective casting manipulation. However, such a special manipulator cannot perform normal manipulations, such as picking and placing, grasping, and operating objects. We propose that a normal robot arm, which can perform these normal tasks, pick up an unknown string from the surrounding environment and perform casting manipulation with it. Since the properties of the string are not provided in advance, the key question is how to account for them in casting manipulation. We achieve this through three steps: motion generation for the robot arm based on a simulation of string movement, actual string manipulation by the robot arm, and string parameter estimation from the observed string movement. By repeating these three steps, the simulated string movement approximates the actual one, enabling casting manipulation with the unknown string. We confirmed the effectiveness of the proposed method through experiments. This study can enhance the performance of home service, exploration, rescue and entertainment robots. </p>
back
<p>Autonomous robot navigation within dynamic unknown environments is of crucial significance for mobile robotic applications, including last-mile delivery and robot-enabled automated supplies in industrial and hospital delivery applications. Current solutions still suffer from limitations, such as the inability to recognize unknown objects in real time and to navigate freely in dynamic, narrow, and complex environments. We propose a complete software framework for autonomous robot perception and navigation within very dense obstacles and dense human crowds. First, we propose a framework that accurately detects and segments open-world object categories in a zero-shot manner, which overcomes the over-segmentation limitation of the current SAM model. Second, we propose a distillation strategy that transfers the knowledge needed to segment the free space of the walkway for robot navigation without labels. In the meantime, we design a trimming strategy that works collaboratively with distillation to enable lightweight inference and to deploy the neural network on edge devices such as the NVIDIA TX2 or Xavier NX during autonomous navigation. Integrated into the robot navigation system, extensive experiments demonstrate that our proposed framework achieves superior performance in terms of both accuracy and efficiency in robot scene perception and autonomous robot navigation. </p>
back
<p>A multi-input multi-output (MIMO) Gaussian channel with two transmit antennas and two receive antennas is studied that is subject to an input peak-power constraint. The capacity and the capacity-achieving input distribution are unknown in general. The problem is shown to be equivalent to a channel with an identity matrix but where the input lies inside and on an ellipse with principal axis length $r_p$ and minor axis length $r_m$. If $r_p \le \sqrt{2}$, then the capacity-achieving input has support on the ellipse. A sufficient condition is derived under which a two-point distribution is optimal. Finally, if $r_m < r_p \le \sqrt{2}$, then the capacity-achieving distribution is discrete. </p>
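<p>One standard way to see the stated equivalence (our reconstruction, assuming a peak constraint $\|\mathbf{x}\| \le A$ and a full-rank channel with singular value decomposition $\mathbf{H} = \mathbf{U}\boldsymbol{\Sigma}\mathbf{V}^{\mathsf{T}}$) is to rotate the channel to the identity: writing $\tilde{\mathbf{y}} = \mathbf{U}^{\mathsf{T}}\mathbf{y} = \mathbf{u} + \tilde{\mathbf{z}}$ with $\mathbf{u} = \boldsymbol{\Sigma}\mathbf{V}^{\mathsf{T}}\mathbf{x}$, the constraint $\|\mathbf{x}\| \le A$ becomes $\|\boldsymbol{\Sigma}^{-1}\mathbf{u}\| \le A$, so the effective input $\mathbf{u}$ lies inside and on an ellipse with principal axis $r_p = \sigma_{\max} A$ and minor axis $r_m = \sigma_{\min} A$. </p>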
back
<p>Data generation is a data augmentation technique for enhancing the generalization ability of skeleton-based human action recognition. Most existing data generation methods face challenges in ensuring the temporal consistency of the dynamic information of actions. In addition, the data generated by these methods lack diversity when only a few training samples are available. To solve those problems, we propose a novel active generative network (AGN), which can adaptively learn various action categories by motion style transfer to generate new actions when only a single sample or a few samples are available for a particular action. The AGN consists of an action generation network and an uncertainty metric network (UMN). The former, with ST-GCN as the backbone, can implicitly learn the morphological features of the target action while preserving the category features of the source action. The latter guides action generation. Specifically, an action recognition model generates prediction vectors for each action, which are then scored using an uncertainty metric. Finally, the UMN provides the uncertainty sampling basis for the generated actions. </p>
back
<p>We present a new method to estimate the rate-distortion-perception function in the perfect realism regime (PR-RDPF), for multivariate continuous sources subject to a single-letter average distortion constraint. The proposed approach is able to solve not only the specific problem but also two related problems: the entropic optimal transport (EOT) and the output-constrained rate-distortion function (OC-RDF), of which the PR-RDPF represents a special case. Using copula distributions, we show that the OC-RDF can be cast as an I-projection problem on a convex set, based on which we develop a parametric solution of the optimal projection, proving that its parameters can be estimated, up to an arbitrary precision, via the solution of a convex program. Subsequently, we propose an iterative scheme via gradient methods to solve the convex program. Lastly, we characterize a Shannon lower bound (SLB) for the PR-RDPF under a mean squared error (MSE) distortion constraint. We support our theoretical findings with numerical examples, assessing the estimation performance of our iterative scheme by comparing the PR-RDPF with the obtained SLB for various sources. </p>
back
<p>The labor market is changing rapidly, prompting increased interest in the automatic extraction of occupational skills from text. With the advent of English benchmark job description datasets, there is a need for systems that handle their diversity well. We tackle the complexity of occupational skill datasets: combining and leveraging multiple datasets for skill extraction, identifying rarely observed skills within a dataset, and overcoming the scarcity of skills across datasets. In particular, we investigate the retrieval-augmentation of language models, employing an external datastore for retrieving similar skills in a dataset-unifying manner. Our proposed method, \textbf{N}earest \textbf{N}eighbor \textbf{O}ccupational \textbf{S}kill \textbf{E}xtraction (NNOSE), effectively leverages multiple datasets by retrieving neighboring skills from other datasets in the datastore. This improves skill extraction \emph{without} additional fine-tuning. Crucially, we observe a performance gain in predicting infrequent patterns, with substantial gains of up to 30\% span-F1 in cross-dataset settings. </p>
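<p>To make the retrieval-augmentation idea concrete, here is a kNN-LM-style sketch (our own illustration under assumed names; NNOSE's actual datastore is built over token representations from all datasets, and the interpolation weight lam below is introduced here for exposition):</p>

```python
import numpy as np

def knn_label_probs(query_vec, keys, labels, n_labels, k=8, temperature=1.0):
    """Retrieve the k nearest datastore entries to a token's hidden
    representation and turn their skill labels into a distribution."""
    d = np.linalg.norm(keys - query_vec, axis=1)
    nn = np.argsort(d)[:k]
    w = np.exp(-d[nn] / temperature)       # closer neighbors weigh more
    p = np.zeros(n_labels)
    np.add.at(p, labels[nn], w)            # accumulate weight per label
    return p / p.sum()

# final tagging distribution, interpolated with the tagger's own output:
# p_final = lam * knn_label_probs(...) + (1 - lam) * p_model
```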
back
<p>To leverage LLMs for visual synthesis, traditional methods convert raster image information into discrete grid tokens through specialized visual modules, thereby disrupting the model's ability to capture the true semantic representation of visual scenes. This paper posits that an alternative representation of images, vector graphics, can effectively surmount this limitation by enabling a more natural and semantically coherent segmentation of the image information. Thus, we introduce StrokeNUWA, a pioneering work exploring ''stroke tokens'', a better visual representation on vector graphics, which is inherently rich in visual semantics, naturally compatible with LLMs, and highly compressed. Equipped with stroke tokens, StrokeNUWA can significantly surpass traditional LLM-based and optimization-based methods across various metrics in the vector graphic generation task. Besides, StrokeNUWA achieves up to a 94x inference speedup over prior methods, with an exceptional SVG code compression ratio of 6.9%. </p>
back
<p>In the literature, there are many results about permutation polynomials over finite fields. However, very few permutations of vector spaces have been constructed, although it has been shown that permutations of vector spaces have many applications in cryptography, especially in constructing permutations with low differential and boomerang uniformities. </p> <p>In this paper, motivated by the butterfly structure \cite{perrin2016cryptanalysis} and the work of Qu and Li \cite{qu2023}, we investigate rotatable permutations from $\gf_{2^m}^3$ to itself built from $d$-homogeneous functions. </p> <p>Based on the theory of equations of low degree, the resultant of polynomials, and some techniques involving exponential sums, we construct five infinite classes of $3$-homogeneous rotatable permutations from $\gf_{2^m}^3$ to itself, where $m$ is odd. Moreover, we demonstrate that the corresponding permutation polynomials over $\gf_{2^{3m}}$ of our newly constructed permutations of $\gf_{2^m}^3$ are QM-inequivalent to the known ones. </p>
back
<p>This paper leverages macroscopic models and multi-source spatiotemporal data collected from automatic traffic counters and probe vehicles to accurately estimate traffic flow and travel time in links where these measurements are unavailable. This problem is critical in transportation planning applications where the sensor coverage is low and the planned interventions have network-wide impacts. The proposed model, named the Macroscopic Traffic Estimator (MaTE), can perform network-wide estimations of traffic flow and travel time only using the set of observed measurements of these quantities. Because MaTE is grounded in macroscopic flow theory, all parameters and variables are interpretable. The estimated traffic flow satisfies fundamental flow conservation constraints and exhibits an increasing monotonic relationship with the estimated travel time. Using logit-based stochastic traffic assignment as the principle for routing flow behavior makes the model fully differentiable with respect to the model parameters. This property facilitates the application of computational graphs to learn parameters from vast amounts of spatiotemporal data. We also integrate neural networks and polynomial kernel functions to capture link flow interactions and enrich the mapping of traffic flows into travel times. MaTE also adds a destination choice model and a trip generation model that uses historical data on the number of trips generated by location. Experiments on synthetic data show that the model can accurately estimate travel time and traffic flow in out-of-sample links. Results obtained using real-world multi-source data from a large-scale transportation network suggest that MaTE outperforms data-driven benchmarks, especially in travel time estimation. The estimated parameters of MaTE are also informative about the hourly change in travel demand and supply characteristics of the transportation network. </p>
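<p>The routing principle mentioned above is easy to state in code; the following minimal sketch (our illustration, with assumed names and a dispersion parameter theta) shows why logit-based stochastic assignment is differentiable in its inputs:</p>

```python
import numpy as np

def logit_assignment(route_times, demand, theta=0.5):
    """Logit route choice: flow on each route of an OD pair is the demand
    times a softmax over negative route travel times; smooth in both
    theta and the travel times, hence amenable to gradient-based learning."""
    u = -theta * np.asarray(route_times)
    p = np.exp(u - u.max())
    p /= p.sum()
    return demand * p

print(logit_assignment([10.0, 12.0, 15.0], demand=1000.0))  # flow per route
```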
back
<p>Handwritten character recognition (HCR) is a challenging problem for machine learning researchers. Unlike printed text data, handwritten character datasets have more variation due to human-introduced bias. With numerous unique character classes present, some data, such as Logographic Scripts or Sino-Korean character sequences, bring new complications to the HCR problem. The classification task on such datasets requires the model to learn high-complexity details of images that share similar features. With recent advances in computational resource availability and further computer vision theory development, some research teams have effectively addressed the arising challenges. Although known for achieving high efficiency, many common approaches are still not generalizable and use dataset-specific solutions to achieve better results. Their complex structure and high computational demands also frequently prevent these solutions from gaining popularity. This paper proposes a straightforward, generalizable, and highly effective approach (CharNet) for detailed character image classification and compares its performance to that of existing approaches. </p>
back
<p>Traditionally, Machine Translation (MT) evaluation has been treated as a regression problem -- producing an absolute translation-quality score. This approach has two limitations: i) the scores lack interpretability, and human annotators struggle to give consistent scores; ii) most scoring methods are based on (reference, translation) pairs, limiting their applicability in real-world scenarios where references are absent. In practice, we often care about whether a new MT system is better or worse than some competitors. In addition, reference-free MT evaluation is increasingly practical and necessary. Unfortunately, these two practical considerations have yet to be jointly explored. In this work, we formulate reference-free MT evaluation as a pairwise ranking problem. Given the source sentence and a pair of translations, our system predicts which translation is better. In addition to proposing this new formulation, we further show that this new paradigm can demonstrate superior correlation with human judgments by merely using indirect supervision from natural language inference and weak supervision from our synthetic data. In the context of reference-free evaluation, MT-Ranker, trained without any human annotations, achieves state-of-the-art results on the WMT Shared Metrics Task benchmarks DARR20, MQM20, and MQM21. On a more challenging benchmark, ACES, which contains fine-grained evaluation criteria such as addition, omission, and mistranslation errors, MT-Ranker sets the state of the art against reference-free as well as reference-based baselines. </p>
back
<p>The effectiveness of an IR system is gauged not just by its ability to retrieve relevant results but also by how it presents these results to users; an engaging presentation often correlates with increased user satisfaction. While existing research has delved into the link between user satisfaction, IR performance metrics, and presentation, these aspects have typically been investigated in isolation. Our research aims to bridge this gap by examining the relationship between query performance, presentation and user satisfaction. For our analysis, we conducted a between-subjects experiment comparing the effectiveness of various result card layouts for an ad-hoc news search interface. Drawing data from the TREC WaPo 2018 collection, we centered our study on four specific topics. Within each of these topics, we assessed six distinct queries with varying nDCG values. Our study involved 164 participants who were exposed to one of five distinct layouts containing result cards, such as "title", "title+image", or "title+image+summary". Our findings indicate that while nDCG is a strong predictor of user satisfaction at the query level, there exists no linear relationship between query performance, the presentation of results and user satisfaction. However, when considering the total gain on the initial result page, we observed that presentation does play a significant role in user satisfaction (at the query level) for certain result card layouts, such as title+image or title+image+summary. Our results also suggest that layout differences have complex and multifaceted impacts on satisfaction. We demonstrate the capacity to equalize user satisfaction levels between queries of varying performance by changing how results are presented. This emphasizes the necessity to harmonize both performance and presentation in IR systems, considering users' diverse preferences. </p>
back
<p>Purpose: To develop a method for automated segmentation of hypothalamus subregions informed by ultra-high resolution ex vivo magnetic resonance images (MRI), which generalizes across MRI sequences and resolutions without retraining. </p> <p>Materials and Methods: We trained our deep learning method, H-SynEx, with synthetic images derived from label maps built from ultra-high resolution ex vivo MRI scans, which enables finer-grained manual segmentation when compared with 1mm isotropic in vivo images. We validated this retrospective study using 1535 in vivo images from six datasets and six MRI sequences. The quantitative evaluation used the Dice Coefficient (DC) and Average Hausdorff Distance (AVD). Statistical analysis compared hypothalamic subregion volumes in controls, Alzheimer's disease (AD), and behavioral variant frontotemporal dementia (bvFTD) subjects using the area under the curve (AUC) and the Wilcoxon rank sum test. </p> <p>Results: H-SynEx can segment the hypothalamus across various MRI sequences, encompassing FLAIR sequences with significant slice spacing (5mm). Using hypothalamic volumes on T1w images to distinguish control from AD and bvFTD patients, we observed AUC values of 0.74 and 0.79, respectively. Additionally, an AUC of 0.66 was found for volume variation on FLAIR scans when comparing controls and patients. </p> <p>Conclusion: Our results show that H-SynEx successfully leverages information from ultra-high resolution scans to segment in vivo images from different MRI sequences such as T1w, T2w, PD, qT1, FA, and FLAIR. We also found that our automated segmentation was able to discriminate controls versus patients on FLAIR images with 5mm spacing. H-SynEx is openly available at https://github.com/liviamarodrigues/hsynex. </p>
back
<p>This paper investigates the secure resource allocation for a downlink integrated sensing and communication system with multiple legal users and potential eavesdroppers. In the considered model, the base station (BS) simultaneously transmits sensing and communication signals through beamforming design, where the sensing signals can be viewed as artificial noise to enhance the security of communication signals. To further enhance the security in the semantic layer, the semantic information is extracted from the original information before transmission. The user side can only successfully recover the received information with the help of the knowledge base shared with the BS, which is stored in advance. Our aim is to maximize the sum semantic secrecy rate of all users while maintaining the minimum quality of service for each user and guaranteeing overall sensing performance. To solve this sum semantic secrecy rate maximization problem, an iterative algorithm is proposed using the alternating optimization method. The simulation results demonstrate the superiority of the proposed algorithm in terms of secure semantic communication and reliable detection. </p>
back
<p>The field of Neural Style Transfer (NST) has witnessed remarkable progress in the past few years, with approaches being able to synthesize artistic and photorealistic images and videos of exceptional quality. To evaluate such results, a diverse landscape of evaluation methods and metrics is used, including authors' opinions based on side-by-side comparisons, human evaluation studies that quantify the subjective judgements of participants, and a multitude of quantitative computational metrics which objectively assess the different aspects of an algorithm's performance. However, there is no consensus regarding the most suitable and effective evaluation procedure that can guarantee the reliability of the results. In this review, we provide an in-depth analysis of existing evaluation techniques, identify the inconsistencies and limitations of current evaluation methods, and give recommendations for standardized evaluation practices. We believe that the development of a robust evaluation framework will not only enable more meaningful and fairer comparisons among NST methods but will also enhance the comprehension and interpretation of research findings in the field. </p>
back
<p>This note presents an upper bound of $1.252^n$ on the size of a set system that satisfies the mod-6 town rules. Under these rules, the sizes of the sets are not congruent to $0 \bmod 6$, while the sizes of all pairwise intersections are congruent to $0 \bmod 6$. </p>
back
<p>The Mersenne Twister (MT) is a pseudo-random number generator (PRNG) widely used in High Performance Computing for parallel stochastic simulations. We aim to assess the quality of common parallelization techniques used to generate large streams of MT pseudo-random numbers. We compare three techniques: sequence splitting, random spacing and the MT indexed sequence. The TestU01 Big Crush battery is used to evaluate the quality of 4096 streams for each technique on three different hardware configurations. Surprisingly, all techniques exhibited defects in almost 30% of their streams, with no technique showing better quality than the others. While failures occurred across all 106 Big Crush tests, the failure rate per stream was limited to a small number of tests (at most 6 failed tests per stream, i.e., a success rate above 94%). Thanks to 33 CPU years of computation, the high-quality streams identified are provided. They can be used for sensitive parallel simulations such as nuclear medicine and precise high-energy physics applications. </p>
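<p>Two of the three techniques can be reproduced with numpy's MT19937 bit generator (a sketch under our naming; numpy's jumped() advances the state by 2**128 draws, and we approximate random spacing with independently derived seeds):</p>

```python
import numpy as np

n_streams, seed = 4096, 12345

# Sequence splitting: successive streams are jumped a fixed distance
# apart within the same MT19937 sequence (numpy's documented pattern).
split, bg = [], np.random.MT19937(seed)
for _ in range(n_streams):
    split.append(np.random.Generator(bg))
    bg = bg.jumped()

# Random spacing (approximated): each stream starts from an
# independently derived seed, giving random offsets in the period.
children = np.random.SeedSequence(seed).spawn(n_streams)
spaced = [np.random.Generator(np.random.MT19937(s)) for s in children]

x = split[0].random(5)  # draw from the first split stream
```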
back
<p>Quantum computing shows great potential, but errors pose a significant challenge. This study explores new strategies for mitigating quantum errors using artificial neural networks (ANN) and the Yang-Baxter equation (YBE). Unlike traditional error correction methods, which are computationally intensive, we investigate artificial error mitigation. The manuscript introduces the basics of quantum error sources and explores the potential of using classical computation for error mitigation. The Yang-Baxter equation plays a crucial role, allowing us to compress time dynamics simulations into constant-depth circuits. By introducing controlled noise through the YBE, we enhance the dataset for error mitigation. We train an ANN model on partial data from quantum simulations, demonstrating its effectiveness in correcting errors in time-evolving quantum states. </p>
back
<p>Vision-based estimation of the motion of a moving target is usually formulated as a bearing-only estimation problem, where the visual measurement is modeled as a bearing vector. Although the bearing-only approach has been studied for decades, a fundamental limitation is that it requires extra lateral motion of the observer to enhance the target's observability. Unfortunately, this extra lateral motion conflicts with the desired motion of the observer in many tasks. It is well-known that, once a target has been detected in an image, a bounding box that surrounds the target can be obtained. Surprisingly, this common visual measurement, and especially its size information, has not been well explored up to now. In this paper, we propose a new bearing-angle approach to estimate the motion of a target by modeling its image bounding box as bearing-angle measurements. Both theoretical analysis and experimental results show that this approach can significantly enhance observability without relying on additional lateral motion of the observer. The benefit of the bearing-angle approach comes at no additional cost because a bounding box is a standard output of object detection algorithms. The approach simply exploits information that has not been fully exploited in the past. No additional sensing devices or special detection algorithms are required. </p>
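<p>Under a pinhole camera model, a bearing-angle measurement is straightforward to form from a detector's output; the sketch below is our own illustration (the names and the no-distortion assumption are ours, not the authors' code):</p>

```python
import numpy as np

def bearing_angle(bbox, fx, fy, cx, cy):
    """bbox = (u, v, w, h): box center (u, v) and size in pixels.
    Returns the unit bearing vector to the target in the camera frame
    and the angle subtended by the box width."""
    u, v, w, h = bbox
    ray = np.array([(u - cx) / fx, (v - cy) / fy, 1.0])
    bearing = ray / np.linalg.norm(ray)
    angle = 2.0 * np.arctan(w / (2.0 * fx))  # the size information the paper exploits
    return bearing, angle
```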
back
<p>Traditional models grounded in first principles often struggle with accuracy as the system's complexity increases. Conversely, machine learning approaches, while powerful, face challenges in interpretability and in handling physical constraints. Efforts to combine these models often stumble upon difficulties in finding a balance between accuracy and complexity. To address these issues, we propose a comprehensive framework based on a "mixture of experts" rationale. This approach enables the data-based fusion of diverse local models, leveraging the full potential of first-principle-based priors. Our solution allows independent training of experts, drawing on techniques from both machine learning and system identification, and it supports both collaborative and competitive learning paradigms. To enhance interpretability, we penalize abrupt variations in the experts' combination. Experimental results validate the effectiveness of our approach in producing an interpretable combination of models closely resembling the target phenomena. </p>
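<p>A minimal sketch of the mixture-of-experts fusion and the smoothness penalty (our reading, with assumed names; the paper's gating and penalty may differ in detail):</p>

```python
import numpy as np

def moe_predict(x, experts, gate_weights):
    """Softmax gate mixes the predictions of local expert models.
    experts: callables mapping x to a prediction;
    gate_weights: (n_experts, dim) array parameterizing the gate."""
    g = gate_weights @ x
    g = np.exp(g - g.max())
    g /= g.sum()
    y = sum(gi * e(x) for gi, e in zip(g, experts))
    return y, g

def gate_smoothness_penalty(gates):
    """Penalize abrupt variation of the mixing weights between
    consecutive samples -- one plausible form of the penalty."""
    G = np.asarray(gates)
    return np.sum(np.diff(G, axis=0) ** 2)
```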
back
<p>Landscape renderings are realistic images of landscape sites, allowing stakeholders to better perceive and evaluate design ideas. While recent advances in Generative Artificial Intelligence (GAI) enable the automated generation of landscape renderings, end-to-end methods are not compatible with common design processes, leading to insufficient alignment with design idealizations and limited cohesion of iterative landscape design. Informed by a formative study for comprehending design requirements, we present PlantoGraphy, an iterative design system that allows for interactive configuration of GAI models to accommodate human-centered design practice. A two-stage pipeline is incorporated: first, a concretization module transforms conceptual ideas into concrete scene layouts with a domain-oriented large language model; and second, an illustration module converts scene layouts into realistic landscape renderings using a fine-tuned low-rank adaptation diffusion model. PlantoGraphy has undergone a series of performance evaluations and user studies, demonstrating its effectiveness in landscape rendering generation and the high recognition of its interactive functionality. </p>
back
<p>3D neural implicit representations play a significant role in many robotic applications. However, reconstructing neural radiance fields (NeRF) from realistic event data remains a challenge due to the sparsity of events and the lack of information when only event streams are available. In this paper, we utilize motion, geometry, and density priors behind event data to impose strong physical constraints that augment NeRF training. The proposed novel pipeline can directly benefit from these priors to reconstruct 3D scenes without additional inputs. Moreover, we present a novel density-guided patch-based sampling strategy for robust and efficient learning, which not only accelerates training procedures but also improves the expression of local geometry. More importantly, we establish the first large dataset for event-based 3D reconstruction, which contains 101 objects with various materials and geometries, along with ground-truth images and depth maps for all camera viewpoints; this significantly facilitates other research in related fields. The code and dataset will be publicly available at https://github.com/Mercerai/PAEv3d. </p>
back
<p>In recent years, the need for DC distribution buses has increased considerably, as can be seen in transport, for example in the distribution systems of more-electric aircraft, ships, and electric cars. Given the complexity of such systems, more and more switched power converters are needed. The main problem with connecting multiple controlled switched converters acting as source and load is the degradation of stability that occurs on the DC distribution bus due to converter interactions. There are some well-established criteria for studying the stability of the distribution bus. These criteria require knowledge of the input impedance of the converters that act as load and the output impedance of the equipment that acts as source. In order to reduce the complexity of obtaining the input impedance, a model based on a controlled converter acting as a constant power load (CPL) is commonly used. This article studies the accuracy of this model for a topology commonly used in today's distribution systems, the Two-Level Voltage Source Converter (2L-VSC), examining different scenarios in which the model becomes inaccurate. </p>
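<p>For reference, the textbook small-signal behavior behind the CPL model (standard material, not taken from this article): an ideal CPL draws $i = P/v$, so around an operating point $V$ its incremental input impedance is $Z_{\mathrm{CPL}} = \left( \frac{\partial i}{\partial v} \right)^{-1} \Big|_{v=V} = -\frac{V^2}{P}$, a negative incremental resistance. It is this negative resistance that degrades bus stability in the impedance-based criteria mentioned above, and the article examines when a real 2L-VSC deviates from this idealization. </p>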
back
<p>Deep generative models (DGMs) have been widely developed for graph data. However, much less investigation has been carried out on understanding the latent space of such pretrained graph DGMs. Such an understanding has the potential to provide constructive guidelines for crucial tasks, such as graph controllable generation. Thus, in this work, we study this problem and propose GraphCG, a method for the unsupervised discovery of steerable factors in the latent space of pretrained graph DGMs. We first examine the representation space of three pretrained graph DGMs with six disentanglement metrics and observe that the pretrained representation space is entangled. Motivated by this observation, GraphCG learns the steerable factors by maximizing the mutual information between semantic-rich directions, where graphs moved along the same direction share the same steerable factors. We quantitatively verify that GraphCG outperforms four competitive baselines on two graph DGMs pretrained on two molecule datasets. Additionally, we qualitatively illustrate seven steerable factors learned by GraphCG on five pretrained DGMs over five graph datasets, including two for molecules and three for point clouds. </p>
back
<p>Personalized federated learning (PFL) has been widely investigated to address the challenge of data heterogeneity, especially when a single generic model is inadequate to satisfy the diverse performance requirements of local clients simultaneously. Existing PFL methods are inherently based on the idea that the relations between the generic global and personalized local models are captured by the similarity of model weights. Such a similarity is primarily based on either partitioning the model architecture into generic versus personalized components, or modeling client relationships via model weights. To better capture similar (yet distinct) generic versus personalized model representations, we propose \textit{spectral distillation}, a novel distillation method based on model spectrum information. Building upon spectral distillation, we also introduce a co-distillation framework that establishes a two-way bridge between generic and personalized model training. Moreover, to utilize the local idle time in conventional PFL, we propose a wait-free local training protocol. Through extensive experiments on multiple datasets over diverse heterogeneous data settings, we demonstrate the superior performance and efficacy of our proposed spectral co-distillation method, as well as our wait-free training protocol. </p>
back
<p>A key challenge in supporting elastic behaviour in cloud systems is to achieve good performance in the automated (de-)provisioning and scheduling of computing resources. One significant aspect is the overhead associated with deploying, terminating and maintaining resources. Therefore, due to their lower start-up and termination overhead, containers are rapidly replacing Virtual Machines (VMs) as the computation instance of choice in many cloud deployments. In this paper, we analyse the performance of Kubernetes through a Petri net-based performance model. Kubernetes is a container management system for distributed cluster environments. Our model can be characterised using data from a Kubernetes deployment, and can be exploited to support capacity planning and the design of Kubernetes-based elastic applications. </p>
back
<p>The increased application of machine learning (ML) in sensitive domains requires protecting the training data through privacy frameworks, such as differential privacy (DP). DP requires specifying a uniform privacy level $\varepsilon$ that expresses the maximum privacy loss that each data point in the entire dataset is willing to tolerate. Yet, in practice, different data points often have different privacy requirements. Having to set one uniform privacy level is usually too restrictive, often forcing a learner to guarantee the most stringent privacy requirement at a large cost to accuracy. To overcome this limitation, we introduce our novel Personalized-DP Output Perturbation method (PDP-OP), which enables training ridge regression models with individual per-data-point privacy levels. We provide rigorous privacy proofs for PDP-OP as well as accuracy guarantees for the resulting model. This work is the first to provide such theoretical accuracy guarantees for personalized DP in machine learning, whereas previous work only provided empirical evaluations. We empirically evaluate PDP-OP on synthetic and real datasets and with diverse privacy distributions. We show that by enabling each data point to specify its own privacy requirement, we can significantly improve the privacy-accuracy trade-offs in DP. We also show that PDP-OP outperforms the personalized privacy techniques of Jorgensen et al. (2015). </p>
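<p>For orientation, a non-personalized output-perturbation baseline looks as follows (a schematic sketch, not PDP-OP itself: the sensitivity bound is a Chaudhuri-style form for regularized ERM, and PDP-OP's contribution is precisely to replace the single eps with per-point privacy levels):</p>

```python
import numpy as np

def ridge_output_perturbation(X, y, lam, eps, delta=1e-5, rng=None):
    """Train ridge regression, then privatize by adding Gaussian noise
    calibrated to an L2-sensitivity bound. Assumes rows of X and the
    targets y are norm-bounded by 1; the bound below is schematic."""
    n, d = X.shape
    w = np.linalg.solve(X.T @ X + lam * n * np.eye(d), X.T @ y)
    sensitivity = 2.0 / (n * lam)
    sigma = sensitivity * np.sqrt(2.0 * np.log(1.25 / delta)) / eps
    rng = np.random.default_rng() if rng is None else rng
    return w + rng.normal(0.0, sigma, size=d)
```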
back
<p>This technical report details our work towards building an enhanced audio-visual sound event localization and detection (SELD) network. We build on top of the audio-only SELDnet23 model and adapt it to be audio-visual by merging both audio and video information prior to the gated recurrent unit (GRU) of the audio-only network. Our model leverages YOLO and DETIC object detectors. We also build a framework that implements audio-visual data augmentation and audio-visual synthetic data generation. We deliver an audio-visual SELDnet system that outperforms the existing audio-visual SELD baseline. </p>
back
<p>More than 70 years ago, Jacques Riguet suggested the existence of an ``analogie frappante'' (striking analogy) between so-called ``relations de Ferrers'' and a class of difunctional relations, members of which we call ``diagonals''. Inspired by his suggestion, we formulate an ``analogie frappante'' linking the notion of a block-ordered relation and the notion of the diagonal of a relation. We formulate several novel properties of the core/index of a diagonal, and use these properties to rephrase our ``analogie frappante''. Loosely speaking, we show that a block-ordered relation is a provisional ordering up to isomorphism and reduction to its core. (Our theorems make this informal statement precise.) Unlike Riguet (and others who follow his example), we avoid almost entirely the use of nested complements to express and reason about properties of these notions: we use factors (aka residuals) instead. The only (and inevitable) exception to this is to show that our definition of a ``staircase'' relation is equivalent to Riguet's definition of a ``relation de Ferrers''. Our ``analogie frappante'' also makes it obvious that a ``staircase'' relation is not necessarily block-ordered, in spite of the mental picture of such a relation presented by Riguet. </p>
back
<p>Singing voice conversion (SVC) automates song covers by converting one singer's singing voice into another target singer's singing voice with the original lyrics and melody. However, it raises serious concerns about copyright and civil rights infringements affecting multiple entities. This work proposes SongBsAb, the first proactive approach to mitigate unauthorized SVC-based illegal song covers. SongBsAb introduces human-imperceptible perturbations to singing voices before they are released, so that when they are used, the SVC generation process is interfered with, resulting in unexpected singing voices. SongBsAb features a dual prevention effect by causing both (singer) identity disruption and lyric disruption; namely, the SVC-covered singing voice neither imitates the target singer nor preserves the original lyrics. To improve the imperceptibility of the perturbations, we refine a psychoacoustic model-based loss with the backing track as an additional masker, a unique accompanying element of singing voices compared to ordinary speech. To enhance transferability, we propose to utilize a frame-level interaction reduction-based loss. We demonstrate the prevention effectiveness, utility, and robustness of SongBsAb on three SVC models and two datasets using both objective and human study-based subjective metrics. Our work fosters an emerging research direction for mitigating illegal automated song covers. </p>
back
<p>Principal component analysis (PCA) is widely used for data decorrelation and dimensionality reduction. However, the use of PCA may be impractical in real-time applications, or in situations where energy and computing constraints are severe. In this context, the discrete cosine transform (DCT) becomes a low-cost alternative for data decorrelation. This paper presents a method to derive computationally efficient approximations to the DCT. The proposed method aims at minimizing the angle between the rows of the exact DCT matrix and the rows of the approximated transformation matrix. The resulting transformation matrices are orthogonal and have extremely low arithmetic complexity. Considering popular performance measures, one of the proposed transformation matrices outperforms the best competitors in both matrix error and coding capabilities. Practical applications in image and video coding demonstrate the relevance of the proposed transformation. In fact, we show that the proposed approximate DCT can outperform the exact DCT for image encoding under certain compression ratios. The proposed transform and its direct competitors are also physically realized as digital prototype circuits using FPGA technology. </p>
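<p>The optimization objective is easy to state concretely; the sketch below (our illustration, using the signed DCT of Haweel as an example candidate rather than the paper's proposed matrices) computes the row angles that the method minimizes:</p>

```python
import numpy as np

def dct_matrix(N=8):
    """Orthonormal DCT-II matrix, rows indexed by frequency."""
    n, k = np.meshgrid(np.arange(N), np.arange(N))
    C = np.cos(np.pi * (2 * n + 1) * k / (2 * N))
    C[0] *= 1 / np.sqrt(2)
    return C * np.sqrt(2 / N)

def row_angles_deg(C, T):
    """Angle between corresponding rows of the exact DCT C and a
    low-complexity candidate T -- the quantity to be minimized."""
    cos = np.sum(C * T, axis=1) / (
        np.linalg.norm(C, axis=1) * np.linalg.norm(T, axis=1))
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))

C = dct_matrix(8)
T = np.sign(C)              # signed-DCT candidate, entries in {-1, +1}
print(row_angles_deg(C, T))
```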
back
<p>Brain-computer interfaces (BCIs) use brain signals such as electroencephalography to reflect user intention and enable two-way communication between computers and users. BCI technology has recently received much attention in healthcare applications, such as neurorehabilitation and diagnosis. BCI applications can also control external devices using only brain activity, which can help people with physical or mental disabilities, especially those suffering from neurological and neuromuscular diseases such as stroke and amyotrophic lateral sclerosis. Motor imagery (MI) has been widely used for BCI-based device control, but we adopted intuitive visual motion imagery to overcome the weaknesses of MI. In this study, we developed a three-dimensional (3D) BCI training platform to induce users to imagine upper-limb movements used in real-life activities (picking up a cell phone, pouring water, opening a door, and eating food). We collected intuitive visual motion imagery data and propose a deep learning network based on functional connectivity as a mind-reading technique. As a result, the proposed network achieved a high average classification performance (71.05%). Furthermore, we applied the leave-one-subject-out approach to confirm the possibility of improvements in subject-independent classification performance. This study will contribute to the development of BCI-based healthcare applications for rehabilitation, such as robotic arms and wheelchairs, and for assistance in daily life. </p>
back
<p>In this work, we verify the mutable LongMap from the Scala standard library, a hash table using open addressing within a single array, using the Stainless program verifier. As a reference implementation, we write an immutable map based on a list of tuples. We then show that LongMap's operations correspond to operations of this association list. To express the resizing of the hash table array, we introduce a new reference swapping construct in Stainless. This allows us to apply the decorator pattern without introducing aliasing. Our verification effort led us to find and fix a bug in the original implementation that manifests for large hash tables. Our performance analysis shows the verified version to be within a 1.5 factor of the original data structure. </p>
back
<p>In large-scale applications including medical imaging, collocation differential equation solvers, and estimation with differential privacy, the underlying linear inverse problem can be reformulated as a streaming problem. In theory, the streaming problem can be effectively solved using memory-efficient, exponentially-converging streaming solvers. In practice, a streaming solver's effectiveness is undermined if it is stopped before, or well after, the desired accuracy is achieved. In special cases, when the underlying linear inverse problem is finite-dimensional, streaming solvers can periodically evaluate the residual norm, at a substantial computational cost. When the underlying system is infinite-dimensional, a streaming solver can only access noisy estimates of the residual. While such noisy estimates are computationally efficient, they are useful only when their accuracy is known. In this work, we rigorously develop a general family of computationally practical residual estimators and their uncertainty sets for streaming solvers, and we demonstrate the accuracy of our methods on a number of large-scale linear problems. Thus, we further enable the practical use of streaming solvers for important classes of linear inverse problems. </p>
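<p>One computationally efficient noisy residual estimator of the kind discussed above is a Gaussian sketch (an illustrative example under our naming, not the paper's specific estimator family):</p>

```python
import numpy as np

def sketched_residual_norm(A, x, b, k=50, rng=None):
    """Estimate ||Ax - b|| from a k-dimensional Gaussian sketch:
    with S having iid N(0, 1/k) entries, E||S r||^2 = ||r||^2.
    Cheap relative to the full residual when k << m, but noisy --
    hence the need for the uncertainty sets the paper develops."""
    rng = np.random.default_rng() if rng is None else rng
    m = A.shape[0]
    S = rng.normal(0.0, 1.0 / np.sqrt(k), size=(k, m))
    return np.linalg.norm(S @ (A @ x - b))
```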
back
<p>Decentralized optimization is gaining increased traction due to its widespread applications in large-scale machine learning and multi-agent systems. However, the same mechanism that enables its success, i.e., information sharing among participating agents, also leads to the disclosure of individual agents' private information, which is unacceptable when sensitive data are involved. As differential privacy is becoming a de facto standard for privacy preservation, results have recently emerged that integrate differential privacy with distributed optimization. However, directly incorporating differential privacy design in existing distributed optimization approaches significantly compromises optimization accuracy. In this paper, we propose to redesign and tailor gradient methods for differentially-private distributed optimization, and propose two differential-privacy oriented gradient methods that can ensure both rigorous epsilon-differential privacy and optimality. The first algorithm is based on static-consensus gradient methods, and the second algorithm is based on dynamic-consensus (gradient-tracking) distributed optimization methods and, hence, is applicable to general directed interaction graph topologies. Both algorithms can simultaneously ensure almost sure convergence to an optimal solution and a finite privacy budget, even when the number of iterations goes to infinity. To our knowledge, this is the first time that both goals have been achieved simultaneously. Numerical simulations using a distributed estimation problem and experimental results on a benchmark dataset confirm the effectiveness of the proposed approaches. </p>
back
<p>We consider the dynamics of an elastic continuum under large deformation but small strain. Such systems can be described by the equations of geometrically nonlinear elastodynamics in combination with the St. Venant-Kirchhoff material law. The velocity-stress formulation of the problem turns out to have a formal port-Hamiltonian structure. In contrast to the linear case, the operators of the problem are modulated by the displacement field which can be handled as a passive variable and integrated along with the velocities. A weak formulation of the problem is derived and essential boundary conditions are incorporated via Lagrange multipliers. This variational formulation explicitly encodes the transfer between kinetic and potential energy in the interior as well as across the boundary, thus leading to a global power balance and ensuring passivity of the system. The particular geometric structure of the weak formulation can be preserved under Galerkin approximation via appropriate mixed finite elements. In addition, a fully discrete power balance can be obtained by appropriate time discretization. The main properties of the system and its discretization are shown theoretically and demonstrated by numerical tests. </p>
back
<p>Flattening, which converts multi-dimensional feature maps or images into one-dimensional vectors, is essential in computer vision. However, existing flattening approaches neglect the preservation of local smoothness, which can impact the representational learning capacity of vision models. In this paper, we propose Hilbert curve flattening as an innovative method to preserve locality in flattened matrices. We compare it with the commonly used zigzag operation and demonstrate that Hilbert curve flattening better retains the spatial relationships and local smoothness of the original grid structure, while maintaining robustness against input scale variance. We then introduce the Localformer, a vision transformer architecture that incorporates Hilbert token sampling with a token aggregator to enhance its locality bias. Extensive experiments on image classification and semantic segmentation tasks demonstrate that the Localformer consistently outperforms baseline models. We also show it brings consistent performance boosts to other popular architectures (e.g. MLP-Mixer). </p>
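<p>For readers unfamiliar with the construction, the sketch below (our illustration using the classic iterative distance-to-coordinate conversion; not the Localformer code) flattens a feature map along the Hilbert curve:</p>

```python
import numpy as np

def hilbert_d2xy(order, d):
    """Map distance d along a Hilbert curve filling a 2**order-sided
    grid to (x, y); classic iterative algorithm."""
    x = y = 0
    t, s = d, 1
    while s < (1 << order):
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:                    # rotate/flip the quadrant
            if rx == 1:
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y

def hilbert_flatten(fmap):
    """Flatten a (2**k, 2**k, C) feature map along the Hilbert curve;
    neighboring tokens stay spatial neighbors far more often than in
    row-major or zigzag order."""
    n = fmap.shape[0]
    order = n.bit_length() - 1         # assumes n is a power of two
    coords = [hilbert_d2xy(order, d) for d in range(n * n)]
    return np.stack([fmap[x, y] for x, y in coords])
```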
back
<p>Algorithmic paradigms such as divide-and-conquer (D&amp;C) are proposed to guide developers in designing efficient algorithms, but it can still be difficult to apply algorithmic paradigms to practical tasks. To ease the usage of paradigms, many research efforts have been devoted to the automatic application of algorithmic paradigms. However, most existing approaches to this problem rely on syntax-based program transformations and thus put significant restrictions on the original program. </p> <p>In this paper, we study the automatic application of D&amp;C and several similar paradigms, denoted as D&amp;C-like algorithmic paradigms, and aim to remove the restrictions of syntax-based transformations. To achieve this goal, we propose an efficient synthesizer, named AutoLifter, which does not depend on syntax-based transformations. Specifically, the main challenge of applying algorithmic paradigms stems from the large scale of the synthesized programs, and AutoLifter addresses this challenge by applying two novel decomposition methods that do not depend on the syntax of the input program, component elimination and variable elimination, to soundly divide the whole problem into simpler subtasks, each synthesizing a sub-program of the final program and each tractable with existing synthesizers. </p> <p>We evaluate AutoLifter on 96 programming tasks related to 6 different algorithmic paradigms. AutoLifter solves 82/96 tasks with an average time cost of 20.17 seconds, significantly outperforming existing approaches. </p>
back
<p>This paper explores an optimal service resource management strategy, a continuing challenge for health information services seeking to enhance service performance, optimise service resource utilisation and deliver interactive health information services. An adaptive optimal service resource management strategy was developed based on a value co-creation model for health information services, with a focus on collaboration and interaction with users. A deep reinforcement learning algorithm was embedded in the Internet of Things (IoT)-based health information service system (I-HISS) to allocate service resources by controlling service provision and service adaptation based on user engagement behaviour. Simulation experiments were conducted to evaluate the significance of the proposed algorithm under different user reactions to the health information service. </p>
back
<p>As with many other tasks, neural networks prove very effective for anomaly detection purposes. However, very few deep-learning models are suited for detecting anomalies in tabular datasets. This paper proposes a novel methodology to flag anomalies based on TracIn, an influence measure initially introduced for explainability purposes. The proposed method can augment any unsupervised deep anomaly detection method. We test our approach using Variational Autoencoders and show that the average influence of a subsample of training points on a test point can serve as a proxy for abnormality. Our model proves to be competitive with state-of-the-art approaches: it achieves comparable or better detection accuracy on medical and cyber-security tabular benchmark data. </p>
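<p>The influence proxy is simple to write down; the following sketch (our illustration on a linear model with a closed-form gradient, whereas the paper applies TracIn to Variational Autoencoders) scores a test point by its average TracIn influence:</p>

```python
import numpy as np

def tracin_anomaly_score(train_X, train_y, test_x, test_y, checkpoints, lr=0.1):
    """TracIn influence: sum over saved checkpoints of the learning rate
    times the dot product of training and test gradients, averaged over
    a subsample of training points; used here as an abnormality proxy."""
    def grad(w, x, y):                 # gradient of 0.5 * (w @ x - y)**2
        return (w @ x - y) * x
    influence = np.zeros(len(train_X))
    for w in checkpoints:              # model weights saved during training
        g_test = grad(w, test_x, test_y)
        g_train = np.array([grad(w, x, y) for x, y in zip(train_X, train_y)])
        influence += lr * g_train @ g_test
    return influence.mean()            # extreme values flag anomalies
```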
back
<p>Evaluation of intervention in a multi-agent system, e.g., when humans should intervene in autonomous driving systems and when a player should pass to teammates for a good shot, is challenging in various engineering and scientific fields. Estimating the individual treatment effect (ITE) using counterfactual long-term prediction is a practical way to evaluate such interventions. However, most conventional frameworks do not consider the time-varying, complex structure of multi-agent relationships or counterfactual prediction of covariates. This can lead to erroneous assessments of the ITE and difficulty in interpretation. Here we propose an interpretable, counterfactual recurrent network for multi-agent systems to estimate the effect of an intervention. Our model leverages graph variational recurrent neural networks and theory-based computation with domain knowledge in an ITE estimation framework based on long-term prediction of multi-agent covariates and outcomes, which can identify the circumstances under which the intervention is effective. On simulated models of an automated vehicle and biological agents with time-varying confounders, we show that our methods achieved lower estimation errors for counterfactual covariates and identified the most effective treatment timing better than the baselines. Furthermore, using real basketball data, our methods produced realistic counterfactual predictions and evaluated counterfactual passes in shot scenarios. </p>
back
<p>Optical sensors can capture dynamic environments and derive depth information in near real-time. The quality of these digital reconstructions is determined by factors like illumination, surface and texture conditions, sensing speed and other sensor characteristics, as well as sensor-object relations. Improvements can be obtained by using dynamically collected data from multiple sensors. However, matching data from multiple sensors requires a shared world coordinate system. We present a concept for transferring multi-sensor data into a commonly referenced world coordinate system: the earth's magnetic field. The steady presence of our planetary magnetic field provides a reliable world coordinate system, which can serve as a reference for a position-defined reconstruction of dynamic environments. Our approach is evaluated using the magnetic field sensors of the ZED 2 stereo camera from Stereolabs, which, like a compass, provides orientation relative to the North Pole. With the help of inertial measurement unit (IMU) information, each camera's position data can be transferred into the unified world coordinate system. Our evaluation reveals the level of quality achievable using the earth's magnetic field and provides a basis for dynamic, real-time applications of optical multi-sensor setups for environment detection. </p>
back
<p>The co-optimization of behind-the-meter distributed energy resources is considered for prosumers under the net energy metering tariff. The distributed energy resources considered include renewable generation, flexible demands, and battery energy storage systems. An energy management system co-schedules consumption and battery storage based on locally available stochastic renewables by maximizing the expected operation surplus. A stochastic dynamic programming formulation is introduced, for which structural properties of the dynamic optimization are derived. A closed-form myopic co-optimization algorithm is proposed, which achieves optimality when the storage capacity constraints are nonbinding. The proposed co-optimization algorithm has linear computation complexity and can be implemented in a decentralized fashion. The myopic co-optimization algorithm's performance and the economic benefits of the co-optimization policy to prosumers and grid operations are evaluated in numerical simulations. </p>
back
<p>The last decade has shown a growing interest in robots as well-being coaches. However, insightful guidelines for the design of robots as coaches to promote mental well-being have not yet been proposed. This paper details design and ethical recommendations based on a qualitative analysis, drawing on a grounded theory approach, of a three-step iterative design process that included user-centred design studies involving robotic well-being coaches, namely: (1) a user-centred design study conducted with 11 participants, consisting of prospective users who had participated in a Brief Solution-Focused Practice study with a human coach, as well as coaches from different disciplines; (2) semi-structured individual interview data gathered from 20 participants attending a Positive Psychology intervention study with the robotic well-being coach Pepper; and (3) a user-centred design study conducted with 3 participants of the Positive Psychology study as well as 2 relevant well-being coaches. After conducting a thematic analysis and a qualitative analysis, we collated the data gathered into convergent and divergent themes, and distilled from those results a set of design guidelines and ethical considerations. Our findings can inform researchers and roboticists about the key aspects to take into account when designing robotic mental well-being coaches. </p>
back
<p>By abstracting over well-known properties of De Bruijn's representation with nameless dummies, we design a new theory of syntax with variable binding and capture-avoiding substitution. We propose it as a simpler alternative to Fiore, Plotkin, and Turi's approach, with which we establish a strong formal link. We also show that our theory easily incorporates simple types and equations between terms. </p>
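<p>For readers unfamiliar with the nameless representation, the following standard sketch (ours, not the paper's formalization) implements De Bruijn terms with shifting and capture-avoiding substitution:</p>
<pre>
# Terms: ("var", k) | ("lam", body) | ("app", f, a); variables are indices.

def shift(t, d, cutoff=0):
    """Add d to every free variable (index >= cutoff) in t."""
    tag = t[0]
    if tag == "var":
        return ("var", t[1] + d) if t[1] >= cutoff else t
    if tag == "lam":
        return ("lam", shift(t[1], d, cutoff + 1))
    return ("app", shift(t[1], d, cutoff), shift(t[2], d, cutoff))

def subst(t, j, s):
    """Capture-avoiding substitution of s for variable j in t."""
    tag = t[0]
    if tag == "var":
        return s if t[1] == j else t
    if tag == "lam":
        return ("lam", subst(t[1], j + 1, shift(s, 1)))
    return ("app", subst(t[1], j, s), subst(t[2], j, s))

def beta(redex):
    """Reduce ("app", ("lam", body), arg) by one beta step."""
    _, (_, body), arg = redex
    return shift(subst(body, 0, shift(arg, 1)), -1)

# (\x. \y. x) z  -->  \y. z
assert beta(("app", ("lam", ("lam", ("var", 1))), ("var", 0))) == ("lam", ("var", 1))
</pre>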
back
<p>Famous people, such as celebrities and influencers, are harassed online on a daily basis. Online harassment mentally disturbs them and negatively affects society. However, few studies have been conducted on the online harassment victimization of famous people, and its effects remain unclear. We surveyed famous Japanese people ($N=213$), i.e., influential individuals who appear on television and other traditional media as well as on social media, regarding online harassment victimization, emotional injury, and actions taken against offenders, and found that various forms of online harassment are prevalent. Some victims used the anti-harassment functions provided by weblogs and social media systems (e.g., blocking/muting/reporting offender accounts and closing comment forms), talked about their victimization to people close to them, and contacted relevant authorities to take legal action (talent agencies, legal consultants, and police). By contrast, some victims felt compelled to accept the harassment and took no action against the offenses. We propose several approaches to support victims, inhibit online harassment, and educate people. Our findings can help platforms establish support systems against online harassment. </p>
back
<p>Deep Reinforcement Learning (DRL) has achieved impressive performance in robotics and autonomous systems (RAS). A key challenge to its deployment in real-life operations is the presence of spuriously unsafe DRL policies. Unexplored states may lead the agent to make wrong decisions that could result in hazards, especially in applications where DRL-trained end-to-end controllers govern the behaviour of RAS. This paper proposes a novel quantitative reliability assessment framework for DRL-controlled RAS, leveraging verification evidence generated from formal reliability analysis of neural networks. A two-level verification framework is introduced to check the safety property with respect to inaccurate observations due to, e.g., environmental noise and state changes. Reachability verification tools are leveraged locally to generate safety evidence for trajectories. At the global level, we then quantify the overall reliability as an aggregated metric of local safety evidence, corresponding to a set of distinct tasks and their occurrence probabilities. The effectiveness of the proposed verification framework is demonstrated and validated via experiments on real RAS. </p>
back
<p>We investigate composed image retrieval with text feedback. Users gradually look for the target of interest by moving from coarse to fine-grained feedback. However, existing methods merely focus on the latter, i.e., fine-grained search, by harnessing positive and negative pairs during training. This pair-based paradigm only considers the one-to-one distance between a pair of specific points, which is not aligned with the one-to-many coarse-grained retrieval process and compromises the recall rate. In an attempt to fill this gap, we introduce a unified learning approach to simultaneously model coarse- and fine-grained retrieval by considering multi-grained uncertainty. The key idea underpinning the proposed method is to integrate fine- and coarse-grained retrieval as matching data points with small and large fluctuations, respectively. Specifically, our method contains two modules: uncertainty modeling and uncertainty regularization. (1) Uncertainty modeling simulates multi-grained queries by introducing identically distributed fluctuations in the feature space. (2) Based on the uncertainty modeling, we further introduce uncertainty regularization to adapt the matching objective according to the fluctuation range. Compared with existing methods, the proposed strategy explicitly prevents the model from pushing away potential candidates in the early stage and thus improves the recall rate. On three public datasets, i.e., FashionIQ, Fashion200k, and Shoes, the proposed method achieves +4.03%, +3.38%, and +2.40% Recall@50 accuracy over a strong baseline, respectively. </p>
back
<p>We introduce and analyze an improved variant of nearest neighbors (NN) for estimation with missing data in latent factor models. We consider a matrix completion problem with missing data, where the $(i, t)$-th entry, when observed, is given by its mean $f(u_i, v_t)$ plus mean-zero noise for an unknown function $f$ and latent factors $u_i$ and $v_t$. Prior NN strategies, like unit-unit NN, for estimating the mean $f(u_i, v_t)$ rely on the existence of other rows $j$ with $u_j \approx u_i$. Similarly, the time-time NN strategy relies on the existence of columns $t'$ with $v_{t'} \approx v_t$. These strategies perform poorly when similar rows or similar columns, respectively, are not available. Our estimate is doubly robust to this deficit in two ways: (1) as long as there exist either good row or good column neighbors, our estimate is consistent; (2) furthermore, if both good row and good column neighbors exist, it provides a (near-)quadratic improvement in the non-asymptotic error and admits a significantly narrower asymptotic confidence interval compared to both unit-unit and time-time NN. </p>
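<p>A sketch of one natural doubly robust combination of row and column neighbors, in the spirit of the estimator described above (the neighbor selection rule and threshold are illustrative):</p>
<pre>
import numpy as np

def doubly_robust_nn(A, mask, i, t, eta=0.1):
    """Estimate f(u_i, v_t) from a partially observed matrix A,
    where mask[j, s] is True iff A[j, s] is observed."""
    def row_dist(j):
        common = mask[i] & mask[j]
        return np.mean((A[i, common] - A[j, common]) ** 2) if common.any() else np.inf

    def col_dist(s):
        common = mask[:, t] & mask[:, s]
        return np.mean((A[common, t] - A[common, s]) ** 2) if common.any() else np.inf

    rows = [j for j in range(A.shape[0]) if j != i and row_dist(j) <= eta]
    cols = [s for s in range(A.shape[1]) if s != t and col_dist(s) <= eta]

    # Row estimate + column estimate - cross term; consistent if either
    # good rows or good columns exist, sharper if both do.
    vals = [A[j, t] + A[i, s] - A[j, s]
            for j in rows for s in cols
            if mask[j, t] and mask[i, s] and mask[j, s]]
    return np.mean(vals) if vals else np.nan
</pre>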
back
<p>Despite the impressive advancements achieved through vision-and-language pretraining, it remains unclear whether this joint learning paradigm can help understand each individual modality. In this work, we conduct a comparative analysis of the visual representations in existing vision-and-language models and vision-only models by probing a broad range of tasks, aiming to assess the quality of the learned representations in a nuanced manner. Interestingly, our empirical observations suggest that vision-and-language models are better at label prediction tasks like object and attribute prediction, while vision-only models are stronger at dense prediction tasks that require more localized information. We hope our study sheds light on the role of language in visual learning, and serves as an empirical guide for various pretrained models. Code will be released at https://github.com/Lizw14/visual_probing </p>
back
<p>Reversible debuggers help programmers to find the causes of misbehaviours in concurrent programs more quickly, by executing a program backwards from the point where a misbehaviour was observed, and looking for the bug(s) that caused it. Reversible debuggers can be founded on the well-studied theory of causal-consistent reversibility, which only allows one to undo an action provided that its consequences, if any, are undone beforehand. Causal-consistent reversibility yields more efficient debugging by reducing the number of states to be explored when looking backwards. Till now, causal-consistent reversibility has never considered time, which is a key aspect in real-world applications. Here, we study the interplay between reversibility and time in concurrent systems via a process algebra. The Temporal Process Language (TPL) by Hennessy and Regan is a well-understood extension of CCS with discrete-time and a timeout operator. We define revTPL, a reversible extension of TPL, and we show that it satisfies the properties expected from a causal-consistent reversible calculus. We show that, alternatively, revTPL can be interpreted as an extension of reversible CCS with time. </p>
back
<p>There is increasing adoption of artificial intelligence in drug discovery. However, existing studies mainly use machine learning on the chemical structures of molecules, ignoring the vast textual knowledge available in chemistry. Incorporating textual knowledge enables us to realize new drug design objectives, adapt to text-based instructions, and predict complex biological activities. Here we present a multi-modal molecule structure-text model, MoleculeSTM, which jointly learns molecules' chemical structures and textual descriptions via a contrastive learning strategy. To train MoleculeSTM, we construct a large multi-modal dataset, PubChemSTM, with over 280,000 chemical structure-text pairs. To demonstrate the effectiveness and utility of MoleculeSTM, we design two challenging zero-shot tasks based on text instructions: structure-text retrieval and molecule editing. MoleculeSTM has two main properties: open vocabulary and compositionality via natural language. In experiments, MoleculeSTM obtains state-of-the-art generalization ability to novel biochemical concepts across various benchmarks. </p>
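<p>The contrastive objective for such structure-text pairs is typically a symmetric InfoNCE loss over a batch, as in the generic sketch below (illustrative of the strategy; MoleculeSTM's exact encoders and loss details are in the paper):</p>
<pre>
import torch
import torch.nn.functional as F

def contrastive_loss(struct_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE between molecule-structure embeddings and their
    paired text-description embeddings, both of shape (B, d)."""
    s = F.normalize(struct_emb, dim=-1)
    t = F.normalize(text_emb, dim=-1)
    logits = s @ t.T / temperature                  # (B, B) similarities
    labels = torch.arange(len(s), device=s.device)  # matching pairs on the diagonal
    return 0.5 * (F.cross_entropy(logits, labels) +
                  F.cross_entropy(logits.T, labels))
</pre>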
back
<p>The relationship between electricity demand and weather is well established in power systems, along with the importance of behavioral and social aspects such as holidays and significant events. This study explores the link between electricity demand and more nuanced information about social events, using mature Natural Language Processing (NLP) and demand forecasting techniques. The results indicate that day-ahead forecasts are improved by textual features such as word frequencies, public sentiments, topic distributions, and word embeddings. The social events captured by these features include global pandemics, politics, international conflicts, transportation, etc. Causality effects and correlations are discussed to propose explanations for the mechanisms behind the links highlighted. We believe this study brings a new perspective to traditional electricity demand analysis. It confirms the feasibility of improving forecasts from unstructured text, with potential consequences for sociology and economics. </p>
back
<p>Estimating depth from images nowadays yields outstanding results, both in terms of in-domain accuracy and generalization. However, we identify two main challenges that remain open in this field: dealing with non-Lambertian materials and effectively processing high-resolution images. To this end, we propose a novel dataset that includes accurate and dense ground-truth labels at high resolution, featuring scenes containing several specular and transparent surfaces. Our acquisition pipeline leverages a novel deep space-time stereo framework, enabling easy and accurate labeling with sub-pixel precision. The dataset is composed of 606 samples collected in 85 different scenes; each sample includes both a high-resolution pair (12 Mpx) and an unbalanced stereo pair (left: 12 Mpx, right: 1.1 Mpx), typical of modern mobile devices that mount sensors with different resolutions. Additionally, we provide manually annotated material segmentation masks and 15K unlabeled samples. The dataset comprises a train set and two test sets, the latter devoted to the evaluation of stereo and monocular depth estimation networks. Our experiments highlight the open challenges and future research directions in this field. </p>
back
<p>The Shapley value, which is arguably the most popular approach for assigning a meaningful contribution value to players in a cooperative game, has recently been used intensively in explainable artificial intelligence. Its meaningfulness is due to axiomatic properties that only the Shapley value satisfies, which, however, come at the expense of an exact computation growing exponentially with the number of agents. Accordingly, a number of works are devoted to the efficient approximation of the Shapley value, most of which revolve around the notion of an agent's marginal contribution. In this paper, we propose SVARM and Stratified SVARM, two parameter-free and domain-independent approximation algorithms based on a representation of the Shapley value detached from the notion of marginal contribution. We prove unmatched theoretical guarantees regarding their approximation quality and provide empirical results on synthetic games as well as common explainability use cases, comparing against state-of-the-art methods. </p>
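<p>For reference, the marginal-contribution form of the Shapley value that most prior approximations sample from (and whose notion of marginal contribution SVARM's representation avoids) is $$\phi_i = \sum_{S \subseteq N \setminus \{i\}} \frac{|S|!\,(n-|S|-1)!}{n!}\,\bigl(v(S \cup \{i\}) - v(S)\bigr),$$ where $N$ is the set of $n$ players and $v$ is the coalition value function. </p>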
back
<p>In this work we explore the deliberate infusion of ambiguity into the design of contracts. We show that when the agent is ambiguity-averse and chooses an action that maximizes their max-min utility, then the principal can strictly gain from using an ambiguous contract. We provide insights into the structure of optimal contracts, and establish that optimal ambiguous contracts are composed of simple contracts. We also provide a geometric characterization of ambiguity-proof classes of contracts. Finally, we show that when the agent considers mixed strategies, then there is no advantage in using an ambiguous contract. </p>
back
<p>In this paper we adapt previous work on rewriting string diagrams using hypergraphs to the case where the underlying category has a traced comonoid structure, in which wires can be forked and the outputs of a morphism can be connected to its input. Such a structure is particularly interesting because any traced Cartesian (dataflow) category has an underlying traced comonoid structure. We show that certain subclasses of hypergraphs are fully complete for traced comonoid categories: that is to say, every term in such a category has a unique corresponding hypergraph up to isomorphism, and from every hypergraph with the desired properties, a unique term in the category can be retrieved up to the axioms of traced comonoid categories. We also show how the framework of double pushout rewriting (DPO) can be adapted for traced comonoid categories by characterising the valid pushout complements for rewriting in our setting. We conclude by presenting a case study in the form of recent work on an equational theory for sequential circuits: circuits built from primitive logic gates with delay and feedback. The graph rewriting framework allows for the definition of an operational semantics for sequential circuits. </p>
back
<p>Let $G$ be an undirected graph. We say that $G$ contains a ladder of length $k$ if the $2 \times (k+1)$ grid graph is an induced subgraph of $G$ that is only connected to the rest of $G$ via its four cornerpoints. We prove that if all the ladders contained in $G$ are reduced to length 4, the treewidth remains unchanged (and that this bound is tight). Our result indicates that, when computing the treewidth of a graph, long ladders can simply be reduced, and that minimal forbidden minors for bounded treewidth graphs cannot contain long ladders. Our result also settles an open problem from algorithmic phylogenetics: the common chain reduction rule, used to simplify the comparison of two evolutionary trees, is treewidth-preserving in the display graph of the two trees. </p>
back
<p>Tennenbaum's theorem states that the only countable model of Peano arithmetic (PA) with computable arithmetical operations is the standard model of natural numbers. In this paper, we use constructive type theory as a framework to revisit, analyze and generalize this result. The chosen framework allows for a synthetic approach to computability theory, exploiting that, externally, all functions definable in constructive type theory can be shown computable. We then build on this viewpoint and furthermore internalize it by assuming a version of Church's thesis, which expresses that any function on natural numbers is representable by a formula in PA. This assumption provides for a conveniently abstract setup to carry out rigorous computability arguments, even in the theorem's mechanization. Concretely, we constructivize several classical proofs and present one inherently constructive rendering of Tennenbaum's theorem, all following arguments from the literature. Concerning the classical proofs in particular, the constructive setting allows us to highlight differences in their assumptions and conclusions which are not visible classically. All versions are accompanied by a unified mechanization in the Coq proof assistant. </p>
back
<p>In this paper, we establish novel data-dependent upper bounds on the generalization error through the lens of a "variable-size compressibility" framework that we introduce here. In this framework, the generalization error of an algorithm is linked to a variable-size 'compression rate' of its input data. This is shown to yield bounds that depend on the empirical measure of the given input data at hand, rather than its unknown distribution. The new generalization bounds we establish include tail bounds, tail bounds on the expectation, and in-expectation bounds. Moreover, it is shown that our framework also allows one to derive general bounds on any function of the input data and output hypothesis random variables. In particular, these general bounds are shown to subsume and possibly improve over several existing PAC-Bayes and data-dependent intrinsic dimension-based bounds that are recovered as special cases, thus unveiling a unifying character of our approach. For instance, a new data-dependent intrinsic dimension-based bound is established, which connects the generalization error to the optimization trajectories and reveals various interesting connections with the rate-distortion dimension of a process, the R\'enyi information dimension of a process, and the metric mean dimension. </p>
back
<p>The pressing need for digitization of historical documents has led to a strong interest in designing computerised image processing methods for automatic handwritten text recognition. However, not much attention has been paid to studying the handwritten text written in the margins, i.e. marginalia, which also forms an important source of information. Moreover, training an accurate and robust recognition system for marginalia calls for data-efficient approaches, owing to the unavailability of sufficient amounts of annotated multi-writer texts. Therefore, this work presents an end-to-end framework for automatic detection and recognition of handwritten marginalia, and leverages data augmentation and transfer learning to overcome training data scarcity. The detection phase involves investigation of R-CNN and Faster R-CNN networks. The recognition phase includes an attention-based sequence-to-sequence model, with ResNet feature extraction, bidirectional LSTM-based sequence modeling, and attention-based prediction of marginalia. The effectiveness of the proposed framework has been empirically evaluated on data from early book collections found in the Uppsala University Library in Sweden. Source code and pre-trained models are available on GitHub. </p>
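<p>For the detection phase, a minimal fine-tuning starting point with torchvision's off-the-shelf Faster R-CNN could look as follows (illustrative; the two-class head, background plus marginalia, is our assumption):</p>
<pre>
import torch
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

# Pretrained detector, with a fresh head for background + marginalia.
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
in_features = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes=2)

model.eval()
with torch.no_grad():
    page = [torch.rand(3, 800, 600)]   # a dummy page image (C, H, W) in [0, 1]
    detections = model(page)           # list of dicts: boxes, labels, scores
</pre>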
back
<p>Social network structures play an important role in the lives of animals by affecting individual fitness, and the spread of disease and information. Nevertheless, we still lack a good understanding of how these structures emerge from the behaviour of individuals. Generative network models based on empirical knowledge about animal social systems provide a powerful approach that can help close this gap. In this study: 1) we develop a general model for the emergence of social structures based on a key generative process of real animal social networks, namely social preferences for traits (such as the age, sex, etc. of social partners); 2) we use this model to investigate how different trait preferences affect social network structure and function. We find that the preferences used in a population can have far-reaching consequences for the population, via effects on the transmission of disease and information and the robustness of the social network against fragmentation when individuals disappear. The study thus shows that social preferences can have consequences that go far beyond direct benefits individuals gain from social partner selection. It also shows that these consequences depend both on the preference types, and on the types of traits they are used with. We discuss the implications of the results for social evolution. </p>
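<p>A toy version of such a trait-preference generative process, with invented traits and an invented preference rule, might look as follows:</p>
<pre>
import numpy as np

rng = np.random.default_rng(0)
n = 100
age = rng.integers(1, 10, n)    # hypothetical age trait
sex = rng.integers(0, 2, n)     # hypothetical binary sex trait

def preference(i, j, w_age=1.0, same_sex_bonus=2.0):
    """Higher preference for similar-aged, same-sex partners (one possible rule)."""
    base = np.exp(-w_age * abs(int(age[i]) - int(age[j])))
    return base * (same_sex_bonus if sex[i] == sex[j] else 1.0)

# Sample an undirected network whose tie probabilities follow the preferences.
adj = np.zeros((n, n), dtype=int)
for i in range(n):
    for j in range(i + 1, n):
        p = min(1.0, 0.4 * preference(i, j))
        adj[i, j] = adj[j, i] = int(rng.random() < p)
</pre>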
back
<p>Deep learning has emerged as a strong alternative to classical iterative methods for deformable medical image registration, where the goal is to find a mapping between the coordinate systems of two images. Popular classical image registration methods enforce the useful inductive biases of symmetry, inverse consistency, and topology preservation by construction. However, while many deep learning registration methods encourage these properties via loss functions, no earlier method enforces all of them by construction. Here, we propose a novel registration architecture based on extracting multi-resolution feature representations which is, by construction, symmetric, inverse consistent, and topology preserving. We also develop an implicit layer for memory-efficient inversion of the deformation fields. Our method achieves state-of-the-art registration accuracy on two datasets. </p>
back
<p>Finding localized correspondences across different images of the same object is crucial to understanding its geometry. In recent years, this problem has seen remarkable progress with the advent of deep learning-based local image features and learnable matchers. Still, learnable matchers often underperform when there exist only small regions of co-visibility between image pairs (i.e. wide camera baselines). To address this problem, we leverage recent progress in coarse single-view geometry estimation methods. We propose LFM-3D, a Learnable Feature Matching framework that uses models based on graph neural networks and enhances their capabilities by integrating noisy, estimated 3D signals to boost correspondence estimation. When integrating 3D signals into the matcher model, we show that a suitable positional encoding is critical to effectively make use of the low-dimensional 3D information. We experiment with two different 3D signals - normalized object coordinates and monocular depth estimates - and evaluate our method on large-scale (synthetic and real) datasets containing object-centric image pairs across wide baselines. We observe strong feature matching improvements compared to 2D-only methods, with up to +6% total recall and +28% precision at fixed recall. Additionally, we demonstrate that the resulting improved correspondences lead to much higher relative posing accuracy for in-the-wild image pairs, with a gain of up to 8.6% compared to the 2D-only approach. </p>
back
<p>Inverse Reinforcement Learning (IRL) is a powerful set of techniques for imitation learning that aims to learn a reward function that rationalizes expert demonstrations. Unfortunately, traditional IRL methods suffer from a computational weakness: they require repeatedly solving a hard reinforcement learning (RL) problem as a subroutine. This is counter-intuitive from the viewpoint of reductions: we have reduced the easier problem of imitation learning to repeatedly solving the harder problem of RL. Another thread of work has proved that access to the side-information of the distribution of states where a strong policy spends time can dramatically reduce the sample and computational complexities of solving an RL problem. In this work, we demonstrate for the first time a more informed imitation learning reduction where we utilize the state distribution of the expert to alleviate the global exploration component of the RL subroutine, providing an exponential speedup in theory. In practice, we find that we are able to significantly speed up the prior art on continuous control tasks. </p>
back
<p>Diffusion-weighted (DW) MRI measures the direction and scale of the local diffusion process in every voxel through its spectrum in q-space, typically acquired in one or more shells. Recent developments in micro-structure imaging and multi-tissue decomposition have sparked renewed attention to the radial b-value dependence of the signal. Applications in tissue classification and micro-architecture estimation, therefore, require a signal representation that extends over the radial as well as angular domain. Multiple approaches have been proposed that can model the non-linear relationship between the DW-MRI signal and biological microstructure. In the past few years, many deep learning-based methods have been developed towards faster inference speed and higher inter-scan consistency compared with traditional model-based methods (e.g., multi-shell multi-tissue constrained spherical deconvolution). However, a multi-stage learning strategy is typically required since the learning process relies on various middle representations, such as simple harmonic oscillator reconstruction (SHORE) representation. In this work, we present a unified dynamic network with a single-stage spherical convolutional neural network, which allows efficient fiber orientation distribution function (fODF) estimation through heterogeneous multi-shell diffusion MRI sequences. We study the Human Connectome Project (HCP) young adults with test-retest scans. From the experimental results, the proposed single-stage method outperforms prior multi-stage approaches in repeated fODF estimation with shell dropoff and single-shell DW-MRI sequences. </p>
back
<p>This study introduces an approach to estimate the uncertainty in bibliometric indicator values that is caused by data errors. This approach utilizes Bayesian regression models, estimated from empirical data samples, which are used to predict error-free data. Through direct Monte Carlo simulation -- drawing predicted data from the estimated regression models a large number of times for the same input data -- probability distributions for indicator values can be obtained, which provide the information on their uncertainty due to data errors. It is demonstrated how uncertainty in base quantities, such as the number of publications of a unit of certain document types and the number of citations of a publication, can be propagated along a measurement model into final indicator values. This method can be used to estimate the uncertainty of indicator values due to sources of errors with known error distributions. The approach is demonstrated with simple synthetic examples for instructive purposes and real bibliometric research evaluation data to show its possible application in practice. </p>
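<p>The direct Monte Carlo step can be schematized as below; the error model here is a simple stand-in for the paper's empirically estimated Bayesian regressions.</p>
<pre>
import numpy as np

rng = np.random.default_rng(42)

def draw_error_free(observed, rng):
    """Stand-in for a posterior-predictive draw from a regression of
    error-free citation counts on observed ones."""
    noise = rng.normal(0.0, 0.05 * np.maximum(observed, 1.0))
    return np.maximum(observed + noise, 0.0)

observed = np.array([3, 0, 12, 7, 25], dtype=float)   # a unit's citation counts

# Draw many error-free datasets and recompute the indicator each time.
draws = [draw_error_free(observed, rng).mean() for _ in range(10_000)]
lo, hi = np.percentile(draws, [2.5, 97.5])
print(f"mean citation rate: {np.mean(draws):.2f}, 95% interval: [{lo:.2f}, {hi:.2f}]")
</pre>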
back
<p>We study the problem of federated stochastic multi-arm contextual bandits with unknown contexts, in which $M$ agents face different bandits and collaborate to learn. The communication model consists of a central server, and the agents share their estimates with the central server periodically to learn to choose optimal actions and minimize the total regret. We assume that the exact contexts are not observable and that the agents observe only a distribution over the contexts. Such a situation arises, for instance, when the context itself is a noisy measurement or based on a prediction mechanism. Our goal is to develop a distributed and federated algorithm that facilitates collaborative learning among the agents to select a sequence of optimal actions so as to maximize the cumulative reward. By performing a feature vector transformation, we propose an elimination-based algorithm and prove a regret bound for linearly parametrized reward functions. Finally, we validate the performance of our algorithm and compare it with a baseline approach using numerical simulations on synthetic data and on the real-world MovieLens dataset. </p>
back
<p>In smart electrical grids, fault detection tasks may have a high impact on society due to their economic and critical implications. In recent years, numerous smart grid applications, such as defect detection and load forecasting, have embraced data-driven methodologies. The purpose of this study is to investigate the challenges associated with the security of machine learning (ML) applications in the smart grid scenario. Indeed, the robustness and security of these data-driven algorithms have not been extensively studied in relation to all power grid applications. We first demonstrate that the deep neural network method used in the smart grid is susceptible to adversarial perturbation. Then, we highlight how studies on fault localization and type classification illustrate the weaknesses of present ML algorithms in smart grids to various adversarial attacks. </p>
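<p>The perturbations involved belong to the standard gradient-based family; a minimal Fast Gradient Sign Method (FGSM) sketch against a generic PyTorch classifier is shown below (illustrative of the attack family, not the paper's exact setup):</p>
<pre>
import torch

def fgsm(model, loss_fn, x, y, eps=0.01):
    """One-step perturbation of input x (e.g., measured grid signals)
    in the direction that increases the model's loss."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = loss_fn(model(x_adv), y)
    loss.backward()
    return (x_adv + eps * x_adv.grad.sign()).detach()
</pre>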
back
<p>This paper presents a fully multidimensional kernel-based reconstruction scheme for finite volume methods applied to systems of hyperbolic conservation laws, with a particular emphasis on the compressible Euler equations. Non-oscillatory reconstruction is achieved through an adaptive order weighted essentially non-oscillatory (WENO-AO) method cast into a form suited to multidimensional stencils and reconstruction. A kernel-based approach inspired by Gaussian process (GP) modeling is presented here. This approach allows the creation of a scheme of arbitrary order with simply defined multidimensional stencils and substencils. Furthermore, the fully multidimensional nature of the reconstruction allows a more straightforward extension to higher spatial dimensions and removes the need for complicated boundary conditions on intermediate quantities in modified dimension-by-dimension methods. In addition, a new simple-yet-effective set of reconstruction variables is introduced, as well as an easy-to-implement effective limiter for positivity preservation, both of which could be useful in existing schemes with little modification. The proposed scheme is applied to a suite of stringent and informative benchmark problems to demonstrate its efficacy and utility. </p>
back
<p>Analysis of geospatial data has traditionally been model-based, with a mean model, customarily specified as a linear regression on the covariates, and a covariance model, encoding the spatial dependence. We relax the strong assumption of linearity and propose embedding neural networks directly within the traditional geostatistical models to accommodate non-linear mean functions while retaining all other advantages including use of Gaussian Processes to explicitly model the spatial covariance, enabling inference on the covariate effect through the mean and on the spatial dependence through the covariance, and offering predictions at new locations via kriging. We propose NN-GLS, a new neural network estimation algorithm for the non-linear mean in GP models that explicitly accounts for the spatial covariance through generalized least squares (GLS), the same loss used in the linear case. We show that NN-GLS admits a representation as a special type of graph neural network (GNN). This connection facilitates use of standard neural network computational techniques for irregular geospatial data, enabling novel and scalable mini-batching, backpropagation, and kriging schemes. Theoretically, we show that NN-GLS will be consistent for irregularly observed spatially correlated data processes. To our knowledge this is the first asymptotic consistency result for any neural network algorithm for spatial data. We demonstrate the methodology through simulated and real datasets. </p>
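<p>At the core of NN-GLS, the usual mean-squared error is replaced by a covariance-weighted quadratic form; a sketch with a fixed spatial covariance matrix for readability (the paper estimates the covariance) is:</p>
<pre>
import torch

def gls_loss(y, m_x, cov):
    """(y - m(X))^T Cov^{-1} (y - m(X)) via a Cholesky solve.
    y: (n,) responses; m_x: (n,) network outputs; cov: (n, n) spatial covariance."""
    r = (y - m_x).unsqueeze(1)          # residual column vector
    L = torch.linalg.cholesky(cov)
    z = torch.cholesky_solve(r, L)      # Cov^{-1} r
    return (r * z).sum()
</pre>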
back
<p>Translated texts bear several hallmarks distinct from texts originating in the language. Though individual translated texts are often fluent and preserve meaning, at a large scale, translated texts have statistical tendencies which distinguish them from text originally written in the language ("translationese") and can affect model performance. We frame the novel task of translationese reduction and hypothesize that Abstract Meaning Representation (AMR), a graph-based semantic representation which abstracts away from the surface form, can be used as an interlingua to reduce the amount of translationese in translated texts. By parsing English translations into an AMR and then generating text from that AMR, the result more closely resembles originally English text across three quantitative macro-level measures, without severely compromising fluency or adequacy. We compare our AMR-based approach against three other techniques based on machine translation or paraphrase generation. This work makes strides towards reducing translationese in text and highlights the utility of AMR as an interlingua. </p>
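<p>The parse-then-generate round trip could be sketched with the open-source amrlib package and its default pretrained models (our illustration under that assumption, not the authors' pipeline code):</p>
<pre>
import amrlib

stog = amrlib.load_stog_model()   # sentence-to-graph (AMR parsing)
gtos = amrlib.load_gtos_model()   # graph-to-sentence (AMR generation)

translations = ["The decision was taken by the committee yesterday."]
graphs = stog.parse_sents(translations)   # abstract away the surface form
regenerated, _ = gtos.generate(graphs)    # re-express as English text
print(regenerated[0])
</pre>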
back
<p>We propose a method to explore the flavor structure of quarks and leptons with reinforcement learning. As a concrete model, we utilize a basic value-based algorithm for models with $U(1)$ flavor symmetry. By training neural networks on the $U(1)$ charges of quarks and leptons, the agent finds 21 models consistent with the experimentally measured masses and mixing angles of quarks and leptons. In particular, the intrinsic value of normal ordering tends to be larger than that of inverted ordering, and normal ordering fits the current experimental data well, in contrast to inverted ordering. A specific value of the effective mass for neutrinoless double beta decay and a sizable leptonic CP violation induced by an angular component of the flavon field are predicted by the autonomous behavior of the agent. Our results indicate that reinforcement learning can be a new method for understanding flavor structure. </p>
back
<p>Orbit recovery problems are a class of problems that often arise in practice in various forms. In these problems, we aim to estimate an unknown function after it has been distorted by a group action and observed via a known operator. Typically, the observations are contaminated with a non-trivial level of noise. Two orbit recovery problems of particular interest in this paper are multireference alignment and single-particle cryo-EM modelling. To suppress the noise, we suggest using the method of moments approach for both problems while introducing deep neural network priors. In particular, our neural networks output the signals and the distribution of group elements, taking the moments as input. In the multireference alignment case, we demonstrate the advantage of using the neural network to accelerate the convergence of signal reconstruction from the moments. Finally, we use our method to reconstruct simulated and biological volumes in the cryo-EM setting. </p>
back
<p>As semiconductor power density is no longer constant as the process technology scales down, modern CPUs are integrating capable data accelerators on chip, aiming to improve performance and efficiency for a wide range of applications and usages. One such accelerator is the Intel Data Streaming Accelerator (DSA) introduced in Intel 4th Generation Xeon Scalable CPUs (Sapphire Rapids). DSA targets data movement operations in memory that are common sources of overhead in datacenter workloads and infrastructure. In addition, it is made much more versatile by supporting a wider range of operations on streaming data, such as CRC32 calculations, delta record creation/merging, and data integrity field (DIF) operations. This paper sets out to introduce the latest features supported by DSA, deep-dive into its versatility, and analyze its throughput benefits through a comprehensive evaluation. Along with the analysis of its characteristics and the rich software ecosystem of DSA, we summarize several insights and guidelines for programmers to make the most out of DSA, and use an in-depth case study of DPDK Vhost to demonstrate how these guidelines benefit a real application. </p>
back
<p>Acquiring new knowledge without forgetting what has been learned in a sequence of tasks is the central focus of continual learning (CL). While tasks arrive sequentially, the training data are often prepared and annotated independently, leading to the CL of incoming supervised learning tasks. This paper considers the under-explored problem of active continual learning (ACL) for a sequence of active learning (AL) tasks, where each incoming task includes a pool of unlabelled data and an annotation budget. We investigate the effectiveness of and interplay between several AL and CL algorithms in the domain-, class-, and task-incremental scenarios. Our experiments reveal the trade-off between two contrasting goals: not forgetting old knowledge, the focus of CL, and the ability to quickly learn new knowledge, the focus of AL. While conditioning the AL query strategy on the annotations collected for the previous tasks leads to improved task performance in the domain- and task-incremental scenarios, our proposed forgetting-learning profile suggests a gap in balancing the effect of AL and CL for the class-incremental scenario. </p>
back
<p>Integrating gas and district heating with the electrical grid in a multi-energy grid has been shown to provide flexibility and prevent bottlenecks in the operation of electrical distribution grids. This integration, however, presents new challenges, including uncertainties in demand prediction, energy prices, and renewable energy availability. In response to these challenges, this paper proposes a novel approach that applies robust optimization methods to the integrated planning of multi-energy grids, to reduce the risk of investment in grid expansion and to optimize the use of different carbon-neutral energy carriers. The uncertainty in energy prices is modeled using interval uncertainty with a proportional deviation. This allows planners, operators, and regulators to prioritize the expansion of specific grids in certain areas of a city. By minimizing a cost function subject to various constraints, the strategy ensures robustness against uncertainties in energy prices. This robust optimization approach is applied to Hamburg as a case study. The study concludes that district heating expansion in high-density areas is a low-risk investment for carbon neutrality. In less dense areas, electrification supports decentralized heat pumps. Meanwhile, hydrogen gas grids are viable where electric expansion is impractical. Increased uncertainty leads to more conservative solutions. This novel approach can be implemented promptly and practically by grid planners and is an important component of a new holistic integrated planning process for multi-energy grids. </p>
back
<p>Principled accountability in the aftermath of harms is essential to the trustworthy design and governance of algorithmic decision making. Legal theory offers a paramount method for assessing culpability: putting the agent 'on the stand' to subject their actions and intentions to cross-examination. We show that under minimal assumptions automated reasoning can rigorously interrogate algorithmic behaviors as in the adversarial process of legal fact finding. We model accountability processes, such as trials or review boards, as Counterfactual-Guided Logic Exploration and Abstraction Refinement (CLEAR) loops. We use the formal methods of symbolic execution and satisfiability modulo theories (SMT) solving to discharge queries about agent behavior in factual and counterfactual scenarios, as adaptively formulated by a human investigator. In order to do so, for a decision algorithm $\mathcal{A}$ we use symbolic execution to represent its logic as a statement $\Pi$ in the decidable theory $\texttt{QF\_FPBV}$. We implement our framework and demonstrate its utility on an illustrative car crash scenario. </p>
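<p>A toy version of discharging factual and counterfactual queries with an SMT solver, using the z3 Python bindings (the crash predicates and numbers are invented for illustration):</p>
<pre>
from z3 import Bool, Implies, Not, Real, Solver, sat

distance = Real("distance")   # metres to the obstacle at decision time
braked = Bool("braked")       # the algorithm's decision
crash = Bool("crash")

# Symbolic summary of the decision logic and its physics abstraction.
logic = [
    braked == (distance < 30),        # the algorithm brakes iff the obstacle is close
    Implies(braked, Not(crash)),      # braking in time avoids the crash
    Implies(Not(braked), crash),      # otherwise the crash occurs
]

# Factual query: is a crash consistent with an observed distance of 25 m?
s = Solver()
s.add(logic + [distance == 25, crash])
print("factual crash possible:", s.check() == sat)        # unsat exculpates the agent

# Counterfactual query: at 35 m, could the crash have been avoided?
s = Solver()
s.add(logic + [distance == 35, Not(crash)])
print("counterfactual no-crash possible:", s.check() == sat)
</pre>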
back
<p>Cross-platform recommendation aims to improve recommendation accuracy by gathering heterogeneous features from different platforms. However, such cross-silo collaborations between platforms are restricted by increasingly stringent privacy protection regulations, so data cannot be aggregated for training. Federated learning (FL) is a practical solution to the data silo problem in recommendation scenarios. Existing cross-silo FL methods transmit model information to collaboratively build a global model by leveraging the data of overlapped users. However, in reality, the number of overlapped users is often very small, largely limiting the performance of such approaches. Moreover, transmitting model information during training incurs high communication costs and may cause serious privacy leakage. In this paper, we propose a novel privacy-preserving double distillation framework named FedPDD for cross-silo federated recommendation, which efficiently transfers knowledge when overlapped users are limited. Specifically, our double distillation strategy enables local models to learn not only explicit knowledge from the other party but also implicit knowledge from their own past predictions. Moreover, to ensure privacy and high efficiency, we employ an offline training scheme to reduce communication needs and privacy leakage risk. In addition, we adopt differential privacy to further protect the transmitted information. Experiments on two real-world recommendation datasets, HetRec-MovieLens and Criteo, demonstrate the effectiveness of FedPDD compared to state-of-the-art approaches. </p>
back
<p>Model generalizability to unseen datasets, concerned with in-the-wild robustness, is less studied for indoor single-image depth prediction. We leverage gradient-based meta-learning for higher generalizability in zero-shot cross-dataset inference. Unlike image classification, the most-studied setting in meta-learning, depth consists of pixel-level continuous range values, and the mapping from each image to depth varies widely across environments, so no explicit task boundaries exist. We instead propose a fine-grained task formulation that treats each RGB-D pair as a task in our meta-optimization. We first show that meta-learning on limited data induces a much better prior (up to +29.4\%). Using meta-learned weights as initialization for subsequent supervised learning, without involving extra data or information, consistently outperforms baselines trained without the method. Compared to most indoor-depth methods that only train/test on a single dataset, we propose zero-shot cross-dataset protocols, closely evaluate robustness, and show consistently higher generalizability and accuracy with our meta-initialization. This work at the intersection of depth estimation and meta-learning potentially brings both research streams a step closer to practical use. </p>
back
<p>$\textbf{Objectives}$: Large Language Models (LLMs) such as ChatGPT and Med-PaLM have excelled in various medical question-answering tasks. However, these English-centric models encounter challenges in non-English clinical settings, primarily due to limited clinical knowledge in the respective languages, a consequence of imbalanced training corpora. We systematically evaluate LLMs in the Chinese medical context and develop a novel in-context learning framework to enhance their performance. </p> <p>$\textbf{Materials and Methods}$: The latest China National Medical Licensing Examination (CNMLE-2022) served as the benchmark. We collected 53 medical books and 381,149 medical questions to construct the medical knowledge base and question bank. The proposed Knowledge and Few-shot Enhancement In-context Learning (KFE) framework leverages the in-context learning ability of LLMs to integrate diverse external clinical knowledge sources. We evaluated KFE with ChatGPT (GPT-3.5), GPT-4, Baichuan2 (BC2)-7B, and BC2-13B on CNMLE-2022 and investigated the effectiveness of different pathways for incorporating medical knowledge into LLMs from 7 perspectives. </p> <p>$\textbf{Results}$: Directly applying ChatGPT failed to qualify for the CNMLE-2022, with a score of 51. Combined with KFE, LLMs of varying sizes yielded consistent and significant improvements: ChatGPT's performance surged to 70.04, and GPT-4 achieved the highest score of 82.59. This surpasses the qualification threshold (60) and exceeds the average human score of 68.70. KFE also enabled the smaller BC2-13B to pass the examination, showcasing great potential in low-resource settings. </p> <p>$\textbf{Conclusion}$: By synergizing medical knowledge through in-context learning, LLMs can extend clinical insight beyond language barriers, significantly reducing language-related disparities in LLM applications and helping to ensure global benefit in healthcare. </p>
back
<p>The IoT is vulnerable to network attacks, and Intrusion Detection Systems (IDS) can provide high attack detection accuracy and are easily installed on IoT servers. However, IDS are seldom evaluated under operational conditions, which are seriously impaired by attack overload. Thus, a Local Area Network testbed is used to evaluate the impact of UDP Flood Attacks on an IoT server whose first line of defence is an accurate IDS. We show that attacks overload the multi-core server and paralyze its IDS. A mitigation scheme that detects attacks rapidly and drops packets within milliseconds after the attack begins is therefore proposed and experimentally evaluated. </p>
back
<p>Humans work together to solve common problems by having discussions, explaining, and agreeing or disagreeing with each other. Similarly, if a system can have discussions with humans when solving tasks, it can improve the system's performance and reliability. In previous research on explainability, it has only been possible for the system to make predictions and for humans to ask questions about them rather than having a mutual exchange of opinions. This research aims to create a dataset and computational framework for systems that discuss and refine their predictions through dialogue. Through experiments, we show that the proposed system can have beneficial discussions with humans improving the accuracy by up to 25 points in the natural language inference task. </p>
back
<p>Representations from transformer-based unidirectional language models are known to be effective at predicting brain responses to natural language. However, most studies comparing language models to brains have used GPT-2 or similarly sized language models. Here we tested whether larger open-source models such as those from the OPT and LLaMA families are better at predicting brain responses recorded using fMRI. Mirroring scaling results from other contexts, we found that brain prediction performance scales logarithmically with model size from 125M to 30B parameter models, with ~15% increased encoding performance as measured by correlation with a held-out test set across 3 subjects. Similar logarithmic behavior was observed when scaling the size of the fMRI training set. We also characterized scaling for acoustic encoding models that use HuBERT, WavLM, and Whisper, and we found comparable improvements with model size. A noise ceiling analysis of these large, high-performance encoding models showed that performance is nearing the theoretical maximum for brain areas such as the precuneus and higher auditory cortex. These results suggest that increasing scale in both models and data will yield incredibly effective models of language processing in the brain, enabling better scientific understanding as well as applications such as decoding. </p>
back
<p>Speech language models (SpeechLMs) process and generate acoustic data only, without textual supervision. In this work, we propose TWIST, a method for training SpeechLMs using a warm start from a pretrained textual language model. We show using both automatic and human evaluations that TWIST outperforms a cold-start SpeechLM across the board. We empirically analyze the effect of different model design choices such as the speech tokenizer, the pretrained textual model, and the dataset size. We find that model and dataset scale both play an important role in constructing better-performing SpeechLMs. Based on our observations, we present the largest (to the best of our knowledge) SpeechLM both in terms of number of parameters and training data. We additionally introduce two spoken versions of the StoryCloze textual benchmark to further improve model evaluation and advance future research in the field. We make speech samples, code and models publicly available: https://pages.cs.huji.ac.il/adiyoss-lab/twist/ . </p>
back
<p>Training generally capable agents that thoroughly explore their environment and learn new and diverse skills is a long-term goal of robot learning. Quality Diversity Reinforcement Learning (QD-RL) is an emerging research area that blends the best aspects of both fields -- Quality Diversity (QD) provides a principled form of exploration and produces collections of behaviorally diverse agents, while Reinforcement Learning (RL) provides a powerful performance improvement operator enabling generalization across tasks and dynamic environments. Existing QD-RL approaches have been constrained to sample efficient, deterministic off-policy RL algorithms and/or evolution strategies, and struggle with highly stochastic environments. In this work, we, for the first time, adapt on-policy RL, specifically Proximal Policy Optimization (PPO), to the Differentiable Quality Diversity (DQD) framework and propose additional improvements over prior work that enable efficient optimization and discovery of novel skills on challenging locomotion tasks. Our new algorithm, Proximal Policy Gradient Arborescence (PPGA), achieves state-of-the-art results, including a 4x improvement in best reward over baselines on the challenging humanoid domain. </p>
back
<p>Diffusion models, as a kind of powerful generative model, have given impressive results on image super-resolution (SR) tasks. However, due to the randomness introduced in the reverse process of diffusion models, the performance of diffusion-based SR models fluctuates from one sampling run to the next, especially for samplers with few resampled steps. This inherent randomness results in ineffectiveness and instability, making it challenging for users to guarantee the quality of SR results. Our work, however, treats this randomness as an opportunity: fully analyzing and leveraging it leads to the construction of an effective plug-and-play sampling method with the potential to benefit a series of diffusion-based SR methods. In more detail, we propose to steadily sample high-quality SR images from pre-trained diffusion-based SR models by solving diffusion ordinary differential equations (diffusion ODEs) with optimal boundary conditions (BCs), and we analyze the relationship between the choice of BC and the corresponding SR result. Our analysis shows the route to an approximately optimal BC via an efficient exploration of the whole space. The quality of SR results sampled by the proposed method with fewer steps outperforms that of results sampled by current methods with randomness from the same pre-trained diffusion-based SR model, which means that our sampling method "boosts" current diffusion-based SR models without any additional training. </p>
back
<p>This paper proposes a new easy-to-implement parameter-free gradient-based optimizer: DoWG (Distance over Weighted Gradients). We prove that DoWG is efficient -- matching the convergence rate of optimally tuned gradient descent in convex optimization up to a logarithmic factor without tuning any parameters, and universal -- automatically adapting to both smooth and nonsmooth problems. While popular algorithms following the AdaGrad framework compute a running average of the squared gradients to use for normalization, DoWG maintains a new distance-based weighted version of the running average, which is crucial to achieve the desired properties. To complement our theory, we also show empirically that DoWG trains at the edge of stability, and validate its effectiveness on practical machine learning tasks. </p>
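<p>As we read the update from the paper's description, DoWG maintains a running maximum distance from the initial point and a distance-weighted sum of squared gradient norms; the sketch below should be treated as illustrative rather than authoritative.</p>
<pre>
import numpy as np

def dowg(grad, x0, steps=1000, r0=1e-4):
    """Parameter-free gradient descent in the spirit of DoWG.
    grad: function returning the gradient at x; x0: starting point."""
    x0 = np.asarray(x0, dtype=float)
    x = x0.copy()
    r = r0          # running max distance from x0, seeded with a tiny value
    v = 0.0         # distance-weighted running sum of squared gradient norms
    for _ in range(steps):
        g = grad(x)
        r = max(r, float(np.linalg.norm(x - x0)))
        v += (r ** 2) * float(g @ g)
        x = x - (r ** 2 / np.sqrt(v)) * g
    return x

# Example: minimize a simple convex quadratic with no tuned step size.
x_star = dowg(lambda x: 2.0 * (x - 3.0), x0=np.zeros(1))
</pre>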
back
<p>Unsupervised domain adaptation (UDA) involves adapting a model trained on a label-rich source domain to an unlabeled target domain. However, in real-world scenarios, the absence of target-domain labels makes it challenging to evaluate the performance of UDA models. Furthermore, prevailing UDA methods relying on adversarial training and self-training could lead to model degeneration and negative transfer, further exacerbating the evaluation problem. In this paper, we propose a novel metric called the \textit{Transfer Score} to address these issues. The proposed metric enables the unsupervised evaluation of UDA models by assessing the spatial uniformity of the classifier via model parameters, as well as the transferability and discriminability of deep representations. Based on the metric, we achieve three novel objectives without target-domain labels: (1) selecting the best UDA method from a range of available options, (2) optimizing hyperparameters of UDA models to prevent model degeneration, and (3) identifying which checkpoint of UDA model performs optimally. Our work bridges the gap between data-level UDA research and practical UDA scenarios, enabling a realistic assessment of UDA model performance. We validate the effectiveness of our metric through extensive empirical studies on UDA datasets of different scales and imbalanced distributions. The results demonstrate that our metric robustly achieves the aforementioned goals. </p>
back
<p>Understanding how social situations unfold in people's daily lives is relevant to designing mobile systems that can support users in their personal goals, well-being, and activities. As an alternative to questionnaires, some studies have used passively collected smartphone sensor data to infer social context (i.e., being alone or not) with machine learning models. However, the few existing studies have focused on specific daily life occasions and limited geographic cohorts in one or two countries. This limits our understanding of how well inference models generalize to everyday life occasions and to multiple countries. In this paper, we used a novel, large-scale, and multimodal smartphone sensing dataset with over 216K self-reports collected from 581 young adults in five countries (Mongolia, Italy, Denmark, UK, Paraguay), first to understand whether social context inference is feasible with sensor data, and then to examine how behavioral and country-level diversity affects inferences. We found that several sensors are informative of social context, that partially personalized multi-country models (trained and tested with data from all countries) and country-specific models (trained and tested within countries) can achieve similar performance above 90% AUC, and that models do not generalize well to unseen countries regardless of geographic proximity. These findings confirm the importance of diversity in mobile data for better understanding social context inference models in different countries. </p>
back
<p>Audio bandwidth extension involves the realistic reconstruction of high-frequency spectra from bandlimited observations. In cases where the lowpass degradation is unknown, such as in restoring historical audio recordings, this becomes a blind problem. This paper introduces a novel method called BABE (Blind Audio Bandwidth Extension) that addresses the blind problem in a zero-shot setting, leveraging the generative priors of a pre-trained unconditional diffusion model. During the inference process, BABE utilizes a generalized version of diffusion posterior sampling, where the degradation operator is unknown but parametrized and inferred iteratively. The performance of the proposed method is evaluated using objective and subjective metrics, and the results show that BABE surpasses state-of-the-art blind bandwidth extension baselines and achieves competitive performance compared to informed methods when tested with synthetic data. Moreover, BABE exhibits robust generalization capabilities when enhancing real historical recordings, effectively reconstructing the missing high-frequency content while maintaining coherence with the original recording. Subjective preference tests confirm that BABE significantly improves the audio quality of historical music recordings. Examples of historical recordings restored with the proposed method are available on the companion webpage: (<a href="http://research.spa.aalto.fi/publications/papers/ieee-taslp-babe/">this http URL</a>) </p>
back
<p>This work proposes a Momentum-Enabled Kronecker-Factor-Based Optimizer Using Rank-1 updates, called MKOR, that improves the training time and convergence properties of deep neural networks (DNNs). Second-order techniques, while enjoying higher convergence rates than their first-order counterparts, have cubic complexity in the model size and/or the training batch size. Hence they exhibit poor scalability and performance in transformer models, e.g. large language models (LLMs), because the batch sizes in these models scale with the attention-mechanism sequence length, leading to large model and batch sizes. MKOR's complexity is quadratic in the model size, alleviating the computation bottlenecks in second-order methods. Because of their high computational complexity, state-of-the-art implementations of second-order methods can only afford to update the second-order information infrequently, and thus do not fully exploit the promise of better convergence from these updates. By reducing the communication complexity of the second-order updates and achieving linear communication complexity, MKOR increases the frequency of second-order updates. We also propose a hybrid version of MKOR (called MKOR-H) that falls back mid-training to a first-order optimizer if the second-order updates no longer accelerate convergence. Our experiments show that MKOR outperforms state-of-the-art first-order methods, e.g. the LAMB optimizer, and the best implementations of second-order methods, i.e. KAISA/KFAC, by up to 2.57x and 1.85x respectively on BERT-Large-Uncased on 64 GPUs. </p>
back
<p>We tackle the task of conditional music generation. We introduce MusicGen, a single Language Model (LM) that operates over several streams of compressed discrete music representation, i.e., tokens. Unlike prior work, MusicGen comprises a single-stage transformer LM together with efficient token interleaving patterns, which eliminates the need for cascading several models, e.g., hierarchically or via upsampling. Following this approach, we demonstrate how MusicGen can generate high-quality samples, both mono and stereo, while being conditioned on textual descriptions or melodic features, allowing better control over the generated output. We conduct extensive empirical evaluation, considering both automatic and human studies, showing that the proposed approach is superior to the evaluated baselines on a standard text-to-music benchmark. Through ablation studies, we shed light on the importance of each of the components comprising MusicGen. Music samples, code, and models are available at https://github.com/facebookresearch/audiocraft </p>
back
<p>Despite advances in generative methods, accurately modeling the distribution of graphs remains a challenging task, primarily because of the absence of a predefined or inherent unique graph representation. Two main strategies have emerged to tackle this issue: 1) restricting the number of possible representations by sorting the nodes, or 2) using permutation-invariant/equivariant functions, specifically Graph Neural Networks (GNNs). </p> <p>In this paper, we introduce a new framework named Discrete Graph Auto-Encoder (DGAE), which leverages the strengths of both strategies and mitigates their respective limitations. In essence, we propose a two-step strategy. We first use a permutation-equivariant auto-encoder to convert graphs into sets of discrete latent node representations, each node being represented by a sequence of quantized vectors. In the second step, we sort the sets of discrete latent representations and learn their distribution with a specifically designed auto-regressive model based on the Transformer architecture. </p> <p>Through multiple experimental evaluations, we demonstrate the competitive performance of our model in comparison to the existing state of the art across various datasets. Several ablation studies support the merits of our method. </p>
back
<p>The rise of Generative Artificial Intelligence systems ("AI systems") has created unprecedented social engagement. AI code generation systems provide responses (output) to questions or requests by accessing the vast library of open-source code created by developers over the past few decades. However, they do so by allegedly stealing the open-source code stored in virtual libraries, known as repositories. This Article focuses on how this happens and whether there is a solution that protects innovation and avoids years of litigation. We also touch upon the array of issues raised by the relationship between AI and copyright. Looking ahead, we propose the following: (a) immediate changes to the licenses for open-source code created by developers that will limit access and/or use of any open-source code to humans only; (b) we suggest revisions to the Massachusetts Institute of Technology ("MIT") license so that AI systems are required to procure appropriate licenses from open-source code developers, which we believe will harmonize standards and build social consensus for the benefit of all of humanity, rather than promote profit-driven centers of innovation; (c) we call for urgent legislative action to protect the future of AI systems while also promoting innovation; and (d) we propose a shift in the burden of proof to AI systems in obfuscation cases. </p>
back
<p>By combining voice and touch interactions, multimodal interfaces can surpass the efficiency of either modality alone. This paper targets complex interactions, where users can issue multimodal commands that translate into one of exponentially many possible combinations of actions/function invocations. This paper presents ReactGenie, a programming framework where developers can code with simple object-oriented abstractions and labeled user-invocable primitives. ReactGenie translates multimodal user commands into ReactGenieDSL, a domain-specific language we created for this purpose, using a neural semantic parser based on large language models. The ReactGenie runtime interprets the parsed ReactGenieDSL and composes primitives to implement complex user commands. As a result, ReactGenie provides an unprecedented level of richness in user interactions. Our evaluation showed that 12 developers could learn and build a ReactGenie application in under 2.5 hours on average. In addition, compared with a traditional GUI, end users can complete tasks faster and with less task load using ReactGenie apps. </p>
back
<p>Recently, ChatGPT, a representative large language model (LLM), has gained considerable attention due to its powerful emergent abilities. Some researchers suggest that LLMs could potentially replace structured knowledge bases like knowledge graphs (KGs) and function as parameterized knowledge bases. However, while LLMs are proficient at learning probabilistic language patterns from large corpora and engaging in conversations with humans, they, like previous smaller pre-trained language models (PLMs), still have difficulty recalling facts while generating knowledge-grounded content. To overcome these limitations, researchers have proposed enhancing data-driven PLMs with knowledge-based KGs to incorporate explicit factual knowledge into PLMs, thus improving their ability to generate text that requires factual knowledge and to provide more informed responses to user queries. This paper reviews the studies on enhancing PLMs with KGs, detailing existing knowledge graph enhanced pre-trained language models (KGPLMs) as well as their applications. Inspired by existing studies on KGPLMs, this paper proposes to enhance LLMs with KGs by developing knowledge graph-enhanced large language models (KGLLMs). KGLLMs provide a solution to enhance LLMs' factual reasoning ability, opening up new avenues for LLM research. </p>
back
<p>Although deep learning has achieved great success, it often relies on a large amount of training data with accurate labels, which are expensive and time-consuming to collect. A prominent direction for reducing the cost is to learn with noisy labels, which are ubiquitous in real-world applications. A critical challenge for such a learning task is to reduce the effect of network memorization of the falsely-labeled data. In this work, we propose an iterative selection approach based on the Weibull mixture model, which identifies clean data by considering the overall learning dynamics of each data instance. In contrast to previous small-loss heuristics, we leverage the observation that deep networks memorize clean data easily and forget it only slowly. In particular, we measure the difficulty of memorization and forgetting for each instance via the transition times between being misclassified and being memorized during training, and integrate them into a novel metric for selection. Based on the proposed metric, we retain a subset of identified clean data and repeat the selection procedure to iteratively refine the clean subset, which is finally used for model training. To validate our method, we perform extensive experiments on synthetic noisy datasets and real-world web data, where our strategy outperforms existing noisy-label learning methods. </p>
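<p>The transition-time idea can be sketched as follows: given a per-epoch record of whether each example was fit by the network, count transitions between the misclassified and memorized states and treat low-transition examples as likely clean. The actual method fits a Weibull mixture model to such statistics rather than using a fixed keep ratio; this simplified version only illustrates the signal.</p>
```python
import numpy as np

def transition_times(correct):
    """correct: boolean array of shape (epochs,) -- whether the network's
    prediction matched the (possibly noisy) label at each epoch. The number
    of state changes is a rough difficulty-of-memorization/forgetting signal."""
    return int(np.sum(correct[1:] != correct[:-1]))

def select_clean(history, keep_ratio=0.5):
    # history: (n_examples, epochs) boolean matrix of per-epoch correctness.
    # Clean data tends to be memorized early and rarely forgotten, so fewer
    # transitions -> more likely clean. Keep the lowest-transition subset.
    scores = np.array([transition_times(h) for h in history])
    k = int(len(scores) * keep_ratio)
    return np.argsort(scores)[:k]   # indices of the presumed-clean subset
```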
back
<p>One of the fundamental results in quantum foundations is the Kochen-Specker (KS) theorem, which states that any theory whose predictions agree with quantum mechanics must be contextual, i.e., a quantum observation cannot be understood as revealing a pre-existing value. The theorem hinges on the existence of a mathematical object called a KS vector system. While many KS vector systems are known, the problem of finding the minimum KS vector system in three dimensions (3D) has remained stubbornly open for over 55 years. </p> <p>To address the minimum KS problem, we present a new verifiable proof-producing method based on a combination of a Boolean satisfiability (SAT) solver and a computer algebra system (CAS) that uses an isomorph-free orderly generation technique that is very effective in pruning away large parts of the search space. Our method shows that a KS system in 3D must contain at least 24 vectors. We show that our sequential and parallel Cube-and-Conquer (CnC) SAT+CAS methods are significantly faster than SAT-only, CAS-only, and a prior CAS-based method of Uijlen and Westerbaan. Further, while our parallel pipeline is somewhat slower than the parallel CnC version of the recently introduced Satisfiability Modulo Theories (SMS) method, this is in part due to the overhead of proof generation. Finally, we provide the first computer-verifiable proof certificate of a lower bound to the KS problem with a size of 42.9 TiB in order 23. </p>
back
<p>We introduce a general abstract framework for database repairing in which the repair notions are defined using formal logic. We differentiate between integrity constraints and so-called query constraints. The former are used to model consistency and desirable properties of the data (such as functional dependencies and independencies), while the latter relate two database instances according to their answers to the query constraints. The framework also admits a distinction between hard and soft queries, allowing the answers of a core set of queries to be preserved, as well as defining a distance between instances based on query answers. We exemplify how various notions of repairs from the literature can be modelled in our unifying framework. Furthermore, we initiate a complexity-theoretic analysis of the problems of consistent query answering, repair computation, and existence of a repair within the new framework. We present both coNP- and NP-hard cases that illustrate the interplay between computationally hard problems and more flexible repair notions. We show general upper bounds in NP and the second level of the polynomial hierarchy. Finally, we relate the existence of a repair to model checking of existential second-order logic. </p>
back
<p>Matroidal entropy functions are entropy functions of the form $\mathbf{h} = \log v \cdot \mathbf{r}_M$, where $v \ge 2$ is an integer and $\mathbf{r}_M$ is the rank function of a matroid $M$. They can be applied to capacity characterization and code construction for information theory problems such as network coding, secret sharing, index coding and locally repairable codes. In this paper, by constructing variable strength arrays for some matroid operations, we characterize the matroidal entropy functions induced by regular matroids and by some matroids with the same p-characteristic set as the uniform matroid $U_{2,4}$. </p>
back
<p>Transfer learning plays a key role in modern data analysis when: (1) the target data are scarce but the source data are sufficient; (2) the distributions of the source and target data are heterogeneous. This paper develops an interpretable unified transfer learning model, termed UTrans, which can detect both transferable variables and transferable source data. More specifically, we establish estimation error bounds and prove that our bounds are lower than those obtained with target data only. Besides, we propose a source detection algorithm based on hypothesis testing to exclude nontransferable data. We evaluate and compare UTrans to existing algorithms in multiple experiments. It is shown that UTrans attains much lower estimation and prediction errors than the existing methods while preserving interpretability. We finally apply it to US intergenerational mobility data and compare our proposed algorithms to classical machine learning algorithms. </p>
back
<p>The expressiveness of neural networks highly depends on the nature of the activation function, although activation functions are usually assumed predefined and fixed during the training stage. From a signal processing perspective, in this paper we present the Expressive Neural Network (ENN), a novel model in which the non-linear activation functions are modeled using the Discrete Cosine Transform (DCT) and adapted using backpropagation during training. This parametrization keeps the number of trainable parameters low, is appropriate for gradient-based schemes, and adapts to different learning tasks. This is the first non-linear model for activation functions that relies on a signal processing perspective, providing high flexibility and expressiveness to the network. We contribute insights into the explainability of the network at convergence by recovering the concept of a bump, that is, the response of each activation function in the output space. Finally, through exhaustive experiments we show that the model can adapt to classification and regression tasks. ENN outperforms state-of-the-art benchmarks, providing an accuracy gap of over 40% in some scenarios. </p>
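<p>The following sketch illustrates the idea of a DCT-parameterized activation (the paper's exact parameterization, input normalization, and initialization may differ): the nonlinearity is a truncated cosine expansion whose coefficients would be trained by backpropagation together with the network weights.</p>
```python
import numpy as np

def dct_activation(x, coeffs, x_range=4.0):
    """Activation parameterized by K DCT coefficients. The nonlinearity is
    a truncated cosine expansion over a bounded input range:
        phi(x) = sum_k c_k * cos(pi * k * t(x)),  t(x) in [0, 1]."""
    t = np.clip((x + x_range) / (2 * x_range), 0.0, 1.0)  # map input to [0, 1]
    k = np.arange(len(coeffs))
    return np.cos(np.pi * np.outer(t, k)) @ coeffs

# With suitable coefficients, the expansion can approximate familiar shapes
# (a smooth bump, an S-curve, ...); during training the c_k are updated by
# gradient descent along with the network weights.
coeffs = np.array([0.0, -0.6, 0.0, 0.1])
y = dct_activation(np.linspace(-4, 4, 9), coeffs)
```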
back
<p>As a popular channel pruning method for convolutional neural networks (CNNs), network slimming (NS) has a three-stage process: (1) it trains a CNN with $\ell_1$ regularization applied to the scaling factors of the batch normalization layers; (2) it removes channels whose scaling factors are below a chosen threshold; and (3) it retrains the pruned model to recover the original accuracy. This time-consuming, three-step process is a result of using subgradient descent to train CNNs. Because subgradient descent does not exactly train CNNs towards sparse, accurate structures, the latter two steps are necessary. Moreover, subgradient descent does not have any convergence guarantee. Therefore, we develop an alternative algorithm called proximal NS. Our proposed algorithm trains CNNs towards sparse, accurate structures, so identifying a scaling-factor threshold is unnecessary and fine-tuning the pruned CNNs is optional. Using Kurdyka-{\L}ojasiewicz assumptions, we establish global convergence of proximal NS. Lastly, we validate the efficacy of the proposed algorithm on VGGNet, DenseNet and ResNet on CIFAR-10/100. Our experiments demonstrate that after one round of training, proximal NS yields a CNN with competitive accuracy and compression. </p>
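<p>The key mechanism can be sketched with the proximal operator of the $\ell_1$ penalty, which is plain soft-thresholding: unlike a subgradient step, it produces exact zeros in the scaling factors, so prunable channels are identified without choosing a threshold. This is a sketch of the principle, not the authors' full algorithm.</p>
```python
import numpy as np

def prox_l1(gamma, lam):
    """Soft-thresholding: the proximal operator of lam * ||.||_1, applied
    here to batch-norm scaling factors gamma."""
    return np.sign(gamma) * np.maximum(np.abs(gamma) - lam, 0.0)

def proximal_step(gamma, grad_gamma, lr, lam):
    # One proximal-gradient step on the scaling factors:
    gamma = gamma - lr * grad_gamma      # gradient step on the data loss
    return prox_l1(gamma, lr * lam)      # exact shrinkage -> true zeros

# Exact zeros identify prunable channels directly, so no threshold needs to
# be chosen and fine-tuning becomes optional.
gamma = proximal_step(np.array([0.9, 0.02, -0.4]), np.zeros(3), lr=0.1, lam=0.5)
```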
back
<p>We provide two families of algorithms to compute characteristic polynomials of endomorphisms and norms of isogenies of Drinfeld modules. Our algorithms work for Drinfeld modules of any rank, defined over any base curve. When the base curve is $\mathbb P^1_{\mathbb F_q}$, we do a thorough study of the complexity, demonstrating that our algorithms are, in many cases, the most asymptotically performant. The first family of algorithms relies on the correspondence between Drinfeld modules and Anderson motives, reducing the computation to linear algebra over a polynomial ring. The second family, available only for the Frobenius endomorphism, is based on a formula expressing the characteristic polynomial of the Frobenius as a reduced norm in a central simple algebra. </p>
back
<p>In this paper, we provide a theoretical analysis of the recently introduced weakly adversarial networks (WAN) method, used to approximate partial differential equations in high dimensions. We address the existence and stability of the solution, as well as approximation bounds. More precisely, we prove the existence of discrete solutions, understood in a suitable weak sense, for which we prove a quasi-best approximation estimate similar to Cea's lemma, a result commonly found in finite element methods. We also propose two new stabilized WAN-based formulations that avoid the need for direct normalization. Furthermore, we analyze the method's effectiveness for the Dirichlet boundary problem with an implicit representation of the geometry. The key requirement for achieving the best approximation outcome is to ensure that the space of the test network satisfies a specific condition, known as the inf-sup condition, essentially requiring that the test network set is sufficiently large compared to the trial space. The method's accuracy, however, is determined only by the space of the trial network. We also devise a pseudo-time XNODE neural network class for static PDE problems, yielding significantly faster convergence than classical DNN networks. </p>
back
<p>Translational research requires data at multiple scales of biological organization. Advancements in sequencing and multi-omics technologies have increased the availability of these data, but researchers face significant integration challenges. Knowledge graphs (KGs) are used to model complex phenomena, and methods exist to construct them automatically. However, tackling complex biomedical integration problems requires flexibility in the way knowledge is modeled. Moreover, existing KG construction methods provide robust tooling at the cost of fixed or limited choices among knowledge representation models. PheKnowLator (Phenotype Knowledge Translator) is a semantic ecosystem for automating the FAIR (Findable, Accessible, Interoperable, and Reusable) construction of ontologically grounded KGs with fully customizable knowledge representation. The ecosystem includes KG construction resources (e.g., data preparation APIs), analysis tools (e.g., SPARQL endpoints and abstraction algorithms), and benchmarks (e.g., prebuilt KGs and embeddings). We evaluated the ecosystem by systematically comparing it to existing open-source KG construction methods and by analyzing its computational performance when used to construct 12 large-scale KGs. With flexible knowledge representation, PheKnowLator enables fully customizable KGs without compromising performance or usability. </p>
back
<p>Optimal Transport has sparked vivid interest in recent years, in particular thanks to the Wasserstein distance, which provides a geometrically sensible and intuitive way of comparing probability measures. For computational reasons, the Sliced Wasserstein (SW) distance was introduced as an alternative to the Wasserstein distance, and has been used for training generative Neural Networks (NNs). While convergence of Stochastic Gradient Descent (SGD) has been observed in practice in such a setting, there is to our knowledge no theoretical guarantee for this observation. Leveraging recent work on the convergence of SGD on non-smooth and non-convex functions by Bianchi et al. (2022), we aim to bridge that knowledge gap, and provide a realistic context under which fixed-step SGD trajectories for the SW loss on NN parameters converge. More precisely, we show that the trajectories approach the set of (sub)-gradient flow equations as the step decreases. Under stricter assumptions, we show a much stronger convergence result for noised and projected SGD schemes, namely that the long-run limits of the trajectories approach a set of generalised critical points of the loss function. </p>
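<p>For readers unfamiliar with the loss under study, here is a standard Monte-Carlo estimate of the Sliced Wasserstein distance (random one-dimensional projections plus the closed-form sorted coupling); in the setting analyzed above, an estimate of this kind is minimized with SGD over the parameters of a generative network.</p>
```python
import numpy as np

def sliced_wasserstein(x, y, n_proj=128, seed=0):
    """Monte-Carlo estimate of the squared Sliced Wasserstein-2 distance
    between two point clouds x, y of shape (n, d): project both onto random
    unit directions, then use the closed-form 1D optimal coupling (sorting)."""
    rng = np.random.default_rng(seed)
    theta = rng.standard_normal((n_proj, x.shape[1]))
    theta /= np.linalg.norm(theta, axis=1, keepdims=True)  # unit directions
    px, py = x @ theta.T, y @ theta.T                      # (n, n_proj)
    px.sort(axis=0); py.sort(axis=0)                       # 1D optimal coupling
    return np.mean((px - py) ** 2)

rng = np.random.default_rng(1)
sw = sliced_wasserstein(rng.standard_normal((256, 2)),
                        rng.standard_normal((256, 2)) + 1.0)
```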
back
<p>This paper unveils CG-Eval, the first-ever comprehensive and automated evaluation framework designed for assessing the generative capabilities of large Chinese language models across a spectrum of academic disciplines. CG-Eval stands out for its automated process, which critically assesses models based on their proficiency in generating precise and contextually relevant responses to a diverse array of questions within six key domains: Science and Engineering, Humanities and Social Sciences, Mathematical Calculations, Medical Practitioner Qualification Examination, Judicial Examination, and Certified Public Accountant Examination. Alongside this, we introduce Gscore, an innovative composite index developed from a weighted sum of multiple metrics. Gscore uniquely automates the quality measurement of a model's text generation against reference standards, providing a detailed and nuanced assessment of model performance. This automation not only enhances the efficiency and scalability of the evaluation process but also ensures objective and consistent assessment across various models. The detailed test data and results, highlighting the robust capabilities and comparative performance of the evaluated models, are accessible at <a href="http://cgeval.besteasy.com/.">this http URL</a> </p>
back
<p>Self-supervised learning (SSL) has gained significant interest in recent years as a solution to address the challenges posed by sparse and noisy data in recommender systems. Despite the growing number of SSL algorithms designed to provide state-of-the-art performance in various recommendation scenarios (e.g., graph collaborative filtering, sequential recommendation, social recommendation, KG-enhanced recommendation), there is still a lack of unified frameworks that integrate recommendation algorithms across different domains. Such a framework could serve as the cornerstone for self-supervised recommendation algorithms, unifying the validation of existing methods and driving the design of new ones. To address this gap, we introduce SSLRec, a novel benchmark platform that provides a standardized, flexible, and comprehensive framework for evaluating various SSL-enhanced recommenders. The SSLRec framework features a modular architecture that allows users to easily evaluate state-of-the-art models and a complete set of data augmentation and self-supervised toolkits to help create SSL recommendation models with specific needs. Furthermore, SSLRec simplifies the process of training and evaluating different recommendation models with consistent and fair settings. Our SSLRec platform covers a comprehensive set of state-of-the-art SSL-enhanced recommendation models across different scenarios, enabling researchers to evaluate these cutting-edge models and drive further innovation in the field. Our implemented SSLRec framework is available at the source code repository https://github.com/HKUDS/SSLRec. </p>
back
<p>The recent proliferation of electric vehicle (EV) charging events has placed considerable stress on power grid operation. Because EV charging behaviors are stochastic and volatile, the induced charging loads are extremely uncertain, posing modeling and control challenges for grid operators and charging management. Generating EV charging scenarios can help by synthesizing a myriad of realistic charging profiles. To this end, we propose DiffCharge, a novel denoising diffusion-based charging scenario generation model that is capable of generating a broad variety of realistic EV charging profiles with distinctive temporal properties. It progressively converts simple Gaussian noise into genuine charging time-series data by learning a parameterized reversal of a forward diffusion process. Besides, we leverage multi-head self-attention and prior conditions to capture the temporal correlations and the unique information associated with EV or charging-station types in real charging profiles. Moreover, we demonstrate the superiority of DiffCharge on extensive real-world charging datasets, as well as its efficacy for EV integration in power distribution grids. </p>
back
<p>The increasing versatility of language models (LMs) has given rise to a new class of benchmarks that comprehensively assess a broad range of capabilities. Such benchmarks are associated with massive computational costs, reaching thousands of GPU hours per model. However, the efficiency of these evaluation efforts has received little discussion in the literature. In this work, we present the problem of Efficient Benchmarking, namely, intelligently reducing the computation costs of LM evaluation without compromising reliability. Using the HELM benchmark as a test case, we investigate how different benchmark design choices affect the computation-reliability tradeoff. We propose to evaluate the reliability of such decisions using a new measure, Decision Impact on Reliability (DIoR). We find, for example, that the current leader on HELM may change merely by removing a low-ranked model from the benchmark, and we observe that a handful of examples suffice to obtain the correct benchmark ranking. Conversely, a slightly different choice of HELM scenarios changes the ranking widely. Based on our findings, we outline a set of concrete recommendations for more efficient benchmark design and utilization practices, leading to dramatic cost savings with minimal loss of benchmark reliability, often reducing computation by 100x or more. </p>
back
<p>Deep learning (DL) has become one of the mainstream and effective methods for point cloud analysis tasks such as detection, segmentation and classification. To reduce overfitting when training DL models and to improve model performance, especially when the amount and/or diversity of training data is limited, augmentation is often crucial. Although various point cloud data augmentation methods have been widely used in different point cloud processing tasks, there are currently no published systematic surveys or reviews of these methods. Therefore, this article surveys these methods, categorizing them into a taxonomy framework that comprises basic and advanced point cloud data augmentation methods, according to their level of complexity. Through a comprehensive evaluation of these augmentation methods, this article identifies their potential and limitations, serving as a useful reference for choosing appropriate augmentation methods. In addition, potential directions for future research are recommended. This survey contributes a holistic overview of the current state of point cloud data augmentation, promoting its wider application and development. </p>
back
<p>Graph Neural Networks (GNNs) have emerged as promising solutions for collaborative filtering (CF) through the modeling of user-item interaction graphs. The nucleus of existing GNN-based recommender systems involves recursive message passing along user-item interaction edges to refine encoded embeddings. Despite their demonstrated effectiveness, current GNN-based methods encounter the challenges of limited receptive fields and the presence of noisy ``interest-irrelevant'' connections. In contrast, Transformer-based methods excel in aggregating information adaptively and globally. Nevertheless, their application to large-scale interaction graphs is hindered by inherent complexities and challenges in capturing intricate, entangled structural information. In this paper, we propose TransGNN, a novel model that integrates Transformer and GNN layers in an alternating fashion to mutually enhance their capabilities. Specifically, TransGNN leverages Transformer layers to broaden the receptive field and disentangle information aggregation from edges, aggregating information from more relevant nodes and thereby enhancing the message passing of GNNs. Additionally, to capture graph structure information effectively, positional encoding is meticulously designed and integrated into the GNN layers to encode structural knowledge into node attributes, thus enhancing the Transformer's performance on graphs. Efficiency concerns are also addressed by sampling the most relevant nodes for the Transformer, along with two efficient sample-update strategies that reduce complexity. Furthermore, theoretical analysis demonstrates that TransGNN offers increased expressiveness compared to GNNs, with only a marginal increase in linear complexity. Extensive experiments on five public datasets validate the effectiveness and efficiency of TransGNN. </p>
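<p>A minimal sketch of the alternating design, assuming standard PyTorch building blocks (this is not the authors' implementation; positional encoding and node sampling are omitted): a Transformer layer provides global, adaptive aggregation, and a message-passing step re-grounds the representations in the interaction graph.</p>
```python
import torch
import torch.nn as nn

class TransGNNBlock(nn.Module):
    """Sketch of one alternating Transformer/GNN block."""
    def __init__(self, dim, heads=4):
        super().__init__()
        self.attn = nn.TransformerEncoderLayer(dim, heads, batch_first=True)

    def forward(self, h, adj):
        # h: (n_nodes, dim); adj: (n_nodes, n_nodes) normalized adjacency.
        h = self.attn(h.unsqueeze(0)).squeeze(0)  # global, adaptive aggregation
        return adj @ h                            # message passing along edges

n, d = 100, 32
adj = torch.eye(n)                                # placeholder graph
h = torch.randn(n, d)
for block in [TransGNNBlock(d) for _ in range(2)]:
    h = block(h, adj)
```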
back
<p>During the past decade, deep neural networks have led to fast-paced progress and significant achievements in computer vision problems, for both academia and industry. Yet despite their success, state-of-the-art image classification approaches fail to generalize well in previously unseen visual contexts, as required by many real-world applications. In this paper, we focus on this domain generalization (DG) problem and argue that the generalization ability of deep convolutional neural networks can be improved by taking advantage of multi-layer and multi-scaled representations of the network. We introduce a framework that aims at improving domain generalization of image classifiers by combining both low-level and high-level features at multiple scales, enabling the network to implicitly disentangle representations in its latent space and learn domain-invariant attributes of the depicted objects. Additionally, to further facilitate robust representation learning, we propose a novel objective function, inspired by contrastive learning, which aims at constraining the extracted representations to remain invariant under distribution shifts. We demonstrate the effectiveness of our method by evaluating on the domain generalization datasets of PACS, VLCS, Office-Home and NICO. Through extensive experimentation, we show that our model is able to surpass the performance of previous DG methods and consistently produce competitive and state-of-the-art results in all datasets. </p>
back
<p>High-definition (HD) maps play a crucial role in autonomous driving systems. Recent methods have attempted to construct HD maps in real-time using vehicle onboard sensors. Due to the inherent limitations of onboard sensors, which include sensitivity to detection range and susceptibility to occlusion by nearby vehicles, the performance of these methods significantly declines in complex scenarios and long-range detection tasks. In this paper, we explore a new perspective that boosts HD map construction through the use of satellite maps to complement onboard sensors. We initially generate the satellite map tiles for each sample in nuScenes and release a complementary dataset for further research. To enable better integration of satellite maps with existing methods, we propose a hierarchical fusion module, which includes feature-level fusion and BEV-level fusion. The feature-level fusion, composed of a mask generator and a masked cross-attention mechanism, is used to refine the features from onboard sensors. The BEV-level fusion mitigates the coordinate differences between features obtained from onboard sensors and satellite maps through an alignment module. The experimental results on the augmented nuScenes showcase the seamless integration of our module into three existing HD map construction methods. The satellite maps and our proposed module notably enhance their performance in both HD map semantic segmentation and instance detection tasks. </p>
back
<p>Recommender models excel at providing domain-specific item recommendations by leveraging extensive user behavior data. Despite their ability to act as lightweight domain experts, they struggle to perform versatile tasks such as providing explanations and engaging in conversations. On the other hand, large language models (LLMs) represent a significant step towards artificial general intelligence, showcasing remarkable capabilities in instruction comprehension, commonsense reasoning, and human interaction. However, LLMs lack the knowledge of domain-specific item catalogs and behavioral patterns, particularly in areas that diverge from general world knowledge, such as online e-commerce. Finetuning LLMs for each domain is neither economical nor efficient. </p> <p>In this paper, we bridge the gap between recommender models and LLMs, combining their respective strengths to create a versatile and interactive recommender system. We introduce an efficient framework called \textbf{InteRecAgent}, which employs LLMs as the brain and recommender models as tools. We first outline a minimal set of essential tools required to transform LLMs into InteRecAgent. We then propose an efficient workflow within InteRecAgent for task execution, incorporating key components such as memory components, dynamic demonstration-augmented task planning, and reflection. InteRecAgent enables traditional recommender systems, such as ID-based matrix factorization models, to become interactive systems with a natural language interface through the integration of LLMs. Experimental results on several public datasets show that InteRecAgent achieves satisfying performance as a conversational recommender system, outperforming general-purpose LLMs. The source code of InteRecAgent is released at https://aka.ms/recagent. </p>
back
<p>How to solve high-dimensional linear programs (LPs) efficiently is a fundamental question. Recently, there has been a surge of interest in reducing LP sizes using \textit{random projections}, which can accelerate solving LPs independently of improving LP solvers. </p> <p>In this paper, we explore a new direction of \emph{data-driven projections}, which use projection matrices learned from data instead of random projection matrices. Given data of past $n$-dimensional LPs, we learn an $n\times k$ projection matrix such that $n &gt; k$. When addressing a future LP instance, we reduce its dimensionality from $n$ to $k$ via the learned projection matrix, solve the resulting LP to obtain a $k$-dimensional solution, and apply the learned matrix to it to recover an $n$-dimensional solution. </p> <p>On the theoretical side, a natural question is: how much data is sufficient to ensure the quality of recovered solutions? We address this question based on the framework of \textit{data-driven algorithm design}, which connects the amount of data sufficient for establishing generalization bounds to the \textit{pseudo-dimension} of performance metrics. We obtain an $\tilde{\mathrm{O}}(nk^2)$ upper bound on the pseudo-dimension, where $\tilde{\mathrm{O}}$ compresses logarithmic factors. We also provide an $\Omega(nk)$ lower bound, implying our result is tight up to an $\tilde{\mathrm{O}}(k)$ factor. </p> <p>On the practical side, we explore two natural methods for learning projection matrices: PCA- and gradient-based methods. While the former is simple and efficient, the latter can sometimes lead to better solution quality. Our experiments confirm the practical benefit of learning projection matrices from data, achieving significantly higher solution quality than the existing random projection while greatly reducing the time for solving LPs. </p>
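<p>The reduce-solve-recover pipeline is easy to state in code. The sketch below uses SciPy's LP solver and a random orthonormal matrix as a stand-in for the learned projection (in practice $P$ would come from PCA on past optimal solutions or from gradient-based training); note that the recovered $x = Py$ is feasible for the original inequality constraints by construction, since $APy \le b$ is enforced directly.</p>
```python
import numpy as np
from scipy.optimize import linprog

def solve_projected_lp(c, A_ub, b_ub, P):
    """Solve the reduced LP  min (P^T c)^T y  s.t.  (A P) y <= b, then map
    the k-dimensional solution back with x = P y. Box bounds on y keep the
    reduced problem bounded in this toy setting."""
    res = linprog(P.T @ c, A_ub=A_ub @ P, b_ub=b_ub, bounds=(-10, 10))
    return P @ res.x

rng = np.random.default_rng(0)
n, k, m = 50, 5, 80
A, b = rng.standard_normal((m, n)), rng.random(m) + 1.0
c = rng.standard_normal(n)
P, _ = np.linalg.qr(rng.standard_normal((n, k)))  # stand-in for a learned matrix
x = solve_projected_lp(c, A, b, P)                # satisfies A x <= b by construction
```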
back
<p>Simulating turbulent flows is crucial for a wide range of applications, and machine learning-based solvers are gaining increasing relevance. However, achieving temporal stability when generalizing to longer rollout horizons remains a persistent challenge for learned PDE solvers. In this work, we analyze if fully data-driven fluid solvers that utilize an autoregressive rollout based on conditional diffusion models are a viable option to address this challenge. We investigate accuracy, posterior sampling, spectral behavior, and temporal stability, while requiring that methods generalize to flow parameters beyond the training regime. To quantitatively and qualitatively benchmark the performance of a range of flow prediction approaches, three challenging scenarios including incompressible and transonic flows, as well as isotropic turbulence are employed. We find that even simple diffusion-based approaches can outperform multiple established flow prediction methods in terms of accuracy and temporal stability, while being on par with state-of-the-art stabilization techniques like unrolling at training time. Such traditional architectures are superior in terms of inference speed, however, the probabilistic nature of diffusion approaches allows for inferring multiple predictions that align with the statistics of the underlying physics. Overall, our benchmark contains three carefully chosen data sets that are suitable for probabilistic evaluation alongside various established flow prediction architectures. </p>
back
<p>Two CNF formulas are called ucp-equivalent, if they behave in the same way with respect to the unit clause propagation (UCP). A formula is called ucp-irredundant, if removing any clause leads to a formula which is not ucp-equivalent to the original one. As a consequence of known results, the ratio of the size of a ucp-irredundant formula and the size of a smallest ucp-equivalent formula is at most $n^2$, where $n$ is the number of the variables. We demonstrate an example of a ucp-irredundant formula for a symmetric definite Horn function which is larger than a smallest ucp-equivalent formula by a factor $\Omega(n/\ln n)$ and, hence, a general upper bound on the above ratio cannot be smaller than this. </p>
back
<p>In this paper, we explore the dynamic behavior of threshold networks on undirected signed graphs. Much attention has been dedicated to understanding the convergence and long-term behavior of this model. Yet an open question persists: how does the underlying graph structure impact network dynamics? Similar studies have been carried out for threshold networks and other types of networks, but these primarily focus on unsigned networks. Here, we address the question for signed threshold networks. We introduce the stability index of a graph, related to the concepts of frustration and balance in signed graphs, to establish a connection between the structure and the dynamics of such networks. We show that graphs with a negative stability index exhibit stable dynamics, i.e., the dynamics converge to fixed points regardless of the threshold parameters. Conversely, if at least one subgraph has a positive stability index, oscillations in the long-term behavior may appear. Furthermore, we generalize the analysis to network dynamics under periodic update modes and explore the case where some subgraph has a positive stability index, for which we find that attractors whose period is super-polynomial in the size of the network may appear. </p>
back
<p>Composed image retrieval, a task involving the search for a target image using a reference image and a complementary text as the query, has witnessed significant advancements owing to the progress made in cross-modal modeling. Unlike the general image-text retrieval problem with only one alignment relation, i.e., image-text, we argue for the existence of two types of relations in composed image retrieval. The explicit relation pertains to the reference image &amp; complementary text-target image, which is commonly exploited by existing methods. Besides this intuitive relation, our observations in practice have uncovered another implicit yet crucial relation, i.e., reference image &amp; target image-complementary text, since we found that the complementary text can be inferred by studying the relation between the target image and the reference image. Regrettably, existing methods largely focus on leveraging the explicit relation to learn their networks, while overlooking the implicit relation. In response to this weakness, we propose a new framework for composed image retrieval, termed dual relation alignment, which integrates both explicit and implicit relations to fully exploit the correlations among the triplets. Specifically, we design a vision compositor to first fuse the reference image and the target image; the resulting representation then serves two roles: (1) a counterpart for semantic alignment with the complementary text, and (2) compensation for the complementary text to boost the explicit relation modeling, thereby implanting the implicit relation into the alignment learning. Our method is evaluated on two popular datasets, CIRR and FashionIQ, through extensive experiments. The results confirm the effectiveness of our dual-relation learning in substantially enhancing composed image retrieval performance. </p>
back
<p>Physical systems can often be described via a continuous-time dynamical system. In practice, the true system is often unknown and has to be learned from measurement data. Since data is typically collected in discrete time, e.g. by sensors, most methods in Gaussian process (GP) dynamics model learning are trained on one-step-ahead predictions. This can become problematic in several scenarios, e.g. if measurements are provided at irregularly-sampled time steps or physical system properties have to be conserved. Thus, we aim for a GP model of the true continuous-time dynamics. Higher-order numerical integrators provide the necessary tools to address this problem by discretizing the dynamics function with arbitrary accuracy. Many higher-order integrators require dynamics evaluations at intermediate time steps, making exact GP inference intractable. In previous work, this problem is often tackled by approximating the GP posterior with variational inference. However, exact GP inference is preferable in many scenarios, e.g. due to its mathematical guarantees. In order to make direct inference tractable, we propose to leverage multistep and Taylor integrators. We demonstrate how to derive flexible inference schemes for these types of integrators. Further, we derive tailored sampling schemes that allow consistent dynamics functions to be drawn from the learned posterior. This is crucial for sampling consistent predictions from the dynamics model. We demonstrate empirically and theoretically that our approach yields an accurate representation of the continuous-time system. </p>
back
<p>Dynamics model learning deals with the task of inferring unknown dynamics from measurement data and predicting the future behavior of the system. A typical approach to address this problem is to train recurrent models. However, predictions with these models are often not physically meaningful. Further, they suffer from deteriorating behavior over time due to accumulating errors. Often, simulators built on first principles are available, and these are physically meaningful by design. However, modeling simplifications typically cause inaccuracies in these models. Consequently, hybrid modeling is an emerging trend that aims to combine the best of both worlds. In this paper, we propose a new approach to hybrid modeling, where we inform the latent states of a learned model via a black-box simulator. This makes it possible to control the predictions via the simulator, preventing them from accumulating errors. This is especially challenging since, in contrast to previous approaches, access to the simulator's latent states is not available. We tackle the task by leveraging observers, a well-known concept from control theory, to infer unknown latent states from observations and dynamics over time. In our learning-based setting, we jointly learn the dynamics and an observer that infers the latent states via the simulator. Thus, the simulator constantly corrects the latent states, compensating for modeling mismatch caused by learning. To maintain flexibility, we train an RNN-based residuum for the latent states that cannot be informed by the simulator. </p>
back
<p>We explore the impact of coarse quantization on low-rank matrix sensing in the extreme scenario of dithered one-bit sampling, where the high-resolution measurements are compared with random time-varying threshold levels. To recover the low-rank matrix of interest from the highly-quantized collected data, we offer an enhanced randomized Kaczmarz algorithm that efficiently solves the emerging highly-overdetermined feasibility problem. Additionally, we provide theoretical guarantees in terms of the convergence and sample size requirements. Our numerical results demonstrate the effectiveness of the proposed methodology. </p>
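<p>The basic projection idea behind a Kaczmarz solver for the resulting feasibility problem can be sketched as follows (on a vector unknown for simplicity; the paper works with low-rank matrices and an enhanced variant): each one-bit measurement defines a halfspace, and a violated constraint is fixed by projecting onto its bounding hyperplane.</p>
```python
import numpy as np

def kaczmarz_one_bit(A, tau, y, n_iter=20000, seed=0):
    """Randomized Kaczmarz sketch for one-bit feasibility: find x with
    sign(<a_i, x> - tau_i) = y_i for all i. When a randomly picked
    constraint is violated, project x onto its bounding hyperplane."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    x = np.zeros(n)
    for _ in range(n_iter):
        i = rng.integers(m)
        r = A[i] @ x - tau[i]
        if y[i] * r < 0:                     # halfspace constraint violated
            x -= (r / (A[i] @ A[i])) * A[i]  # project onto the hyperplane
    return x

rng = np.random.default_rng(1)
m, n = 2000, 20
A = rng.standard_normal((m, n))
x_true = rng.standard_normal(n)
tau = rng.uniform(-3, 3, m)                  # dithered, time-varying thresholds
y = np.sign(A @ x_true - tau)
x_hat = kaczmarz_one_bit(A, tau, y)          # should align with x_true
```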
back
<p>There are now many explainable AI methods for understanding the decisions of a machine learning model. Among these are those based on counterfactual reasoning, which involve simulating feature changes and observing the impact on the prediction. This article proposes to view this simulation process as a source of knowledge that can be stored and used later in different ways. This process is illustrated for the additive model and, more specifically, for the naive Bayes classifier, whose interesting properties for this purpose are shown. </p>
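<p>A small illustration of the storage idea with a naive Bayes classifier from scikit-learn (the dataset and feature layout are made up for illustration): each simulated single-feature change is recorded together with its effect on the prediction, forming a reusable knowledge base.</p>
```python
from itertools import product
import numpy as np
from sklearn.naive_bayes import CategoricalNB

# Toy binary data: the label is the OR of the first two features.
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(200, 3))
y = (X[:, 0] | X[:, 1]).astype(int)
clf = CategoricalNB().fit(X, y)

knowledge = []                               # the stored simulation results
x0 = np.array([[0, 1, 0]])
base = clf.predict(x0)[0]
for j, v in product(range(3), [0, 1]):       # simulate single-feature changes
    x_cf = x0.copy()
    x_cf[0, j] = v
    pred = clf.predict(x_cf)[0]
    knowledge.append({"feature": j, "value": v, "prediction": int(pred),
                      "flips_decision": bool(pred != base)})
```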
back
<p>Language models often exhibit behaviors that improve performance on a pre-training objective but harm performance on downstream tasks. We propose a novel approach to removing undesirable behaviors by ablating a small number of causal pathways between model components, with the intention of disabling the computational circuit responsible for the bad behavior. Given a small dataset of inputs where the model behaves poorly, we learn to ablate a small number of important causal pathways. In the setting of reducing GPT-2 toxic language generation, we find ablating just 12 of the 11.6K causal edges mitigates toxic generation with minimal degradation of performance on other inputs. </p>
back
<p>Video-based remote physiological measurement utilizes facial videos to measure the blood volume change signal, which is also called remote photoplethysmography (rPPG). Supervised methods for rPPG measurements have been shown to achieve good performance. However, the drawback of these methods is that they require facial videos with ground truth (GT) physiological signals, which are often costly and difficult to obtain. In this paper, we propose Contrast-Phys+, a method that can be trained in both unsupervised and weakly-supervised settings. We employ a 3DCNN model to generate multiple spatiotemporal rPPG signals and incorporate prior knowledge of rPPG into a contrastive loss function. We further incorporate the GT signals into contrastive learning to adapt to partial or misaligned labels. The contrastive loss encourages rPPG/GT signals from the same video to be grouped together, while pushing those from different videos apart. We evaluate our methods on five publicly available datasets that include both RGB and Near-infrared videos. Contrast-Phys+ outperforms the state-of-the-art supervised methods, even when using partially available or misaligned GT signals, or no labels at all. Additionally, we highlight the advantages of our methods in terms of computational efficiency, noise robustness, and generalization. Our code is available at https://github.com/zhaodongsun/contrast-phys. </p>
back
<p>In-context learning (ICL), i.e., showing LLMs only a few task-specific demonstrations, has led to downstream gains with no task-specific fine-tuning required. However, LLMs are sensitive to the choice of prompts, and therefore a crucial research question is how to select good demonstrations for ICL. One effective strategy is leveraging semantic similarity between the ICL demonstrations and test inputs by using a text retriever, which, however, is sub-optimal as it does not consider the LLM's existing knowledge about the task. From prior work (Lyu et al., 2023), we already know that labels paired with the demonstrations bias the model predictions. This leads us to our hypothesis: can considering the LLM's existing knowledge about the task, especially with respect to the output label space, yield a better demonstration selection strategy? Through extensive experimentation on three text classification tasks, we find that it is beneficial to not only choose semantically similar ICL demonstrations but also to choose those demonstrations that help resolve the inherent label ambiguity surrounding the test example. Interestingly, we find that including demonstrations that the LLM previously misclassified and that also fall on the test example's decision boundary brings the most performance gain. </p>
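<p>A sketch of the resulting selection strategy (our reading of the recipe above, with embeddings and the model's ambiguous label set assumed to be computed elsewhere): rank candidates by semantic similarity, then prefer demonstrations whose gold label lies in the test example's ambiguous label set.</p>
```python
import numpy as np

def select_demos(test_emb, cand_embs, cand_labels, ambiguous_labels, k=4):
    """Rank candidate demonstrations by cosine similarity to the test input,
    then prefer those whose gold label falls in the test example's ambiguous
    label set (e.g., the LLM's top-2 most likely labels)."""
    sims = cand_embs @ test_emb / (
        np.linalg.norm(cand_embs, axis=1) * np.linalg.norm(test_emb) + 1e-9)
    order = np.argsort(-sims)                        # most similar first
    preferred = [i for i in order if cand_labels[i] in ambiguous_labels]
    rest = [i for i in order if cand_labels[i] not in ambiguous_labels]
    return (preferred + rest)[:k]                    # indices of chosen demos
```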
back
<p>Language models (LMs) have already demonstrated remarkable abilities in understanding and generating both natural and formal language. Despite these advances, their integration with real-world environments such as large-scale knowledge bases (KBs) remains an underdeveloped area, affecting applications such as semantic parsing and giving rise to "hallucinated" information. This paper is an experimental investigation aimed at uncovering the robustness challenges that LMs encounter when tasked with knowledge base question answering (KBQA). The investigation covers scenarios with inconsistent data distribution between training and inference, such as generalization to unseen domains, adaptation to various language variations, and transferability across different datasets. Our comprehensive experiments reveal that, even when employed with our proposed data augmentation techniques, advanced small and large language models exhibit poor performance in various dimensions. While the LM is a promising technology, the robustness of its current form in dealing with complex environments is fragile and of limited practicality because of the data distribution issue. This calls for future research on data collection and LM learning paradigms. </p>
back
<p>In recent years, predicting mobile app usage has become increasingly important for areas like app recommendation, user behaviour analysis, and mobile resource management. Existing models, however, struggle with the heterogeneous nature of contextual data and the user cold start problem. This study introduces a novel prediction model, Mobile App Prediction Leveraging Large Language Model Embeddings (MAPLE), which employs Large Language Models (LLMs) and installed app similarity to overcome these challenges. MAPLE utilises the power of LLMs to process contextual data and discern intricate relationships within it effectively. Additionally, we explore the use of installed app similarity to address the cold start problem, facilitating the modelling of user preferences and habits, even for new users with limited historical data. In essence, our research presents MAPLE as a novel, potent, and practical approach to app usage prediction, making significant strides in resolving issues faced by existing models. MAPLE stands out as a comprehensive and effective solution, setting a new benchmark for more precise and personalised app usage predictions. In tests on two real-world datasets, MAPLE surpasses contemporary models in both standard and cold start scenarios. These outcomes validate MAPLE's capacity for precise app usage predictions and its resilience against the cold start problem. This enhanced performance stems from the model's proficiency in capturing complex temporal patterns and leveraging contextual information. As a result, MAPLE can potentially improve personalised mobile app usage predictions and user experiences markedly. </p>
back
<p>With the increasing frequency of high-profile privacy breaches in various online platforms, users are becoming more concerned about their privacy. Recommender systems are the core component of online platforms for providing personalized service; consequently, their privacy preservation has attracted great attention. As the gold standard of privacy protection, differential privacy has been widely adopted to preserve privacy in recommender systems. However, existing differentially private recommender systems only consider static and independent interactions, so they cannot be applied to sequential recommendation, where behaviors are dynamic and dependent. Meanwhile, little attention has been paid to the privacy risk of sensitive user features; most existing systems only protect user feedback. In this work, we propose a novel DIfferentially Private Sequential recommendation framework with a noisy Graph Neural Network approach (denoted as DIPSGNN) to address these limitations. To the best of our knowledge, we are the first to achieve differential privacy in sequential recommendation with dependent interactions. Specifically, in DIPSGNN, we first leverage the piecewise mechanism to protect sensitive user features. Then, we innovatively add calibrated noise into the aggregation step of the graph neural network, based on the aggregation perturbation mechanism. This noisy graph neural network can protect sequentially dependent interactions and capture user preferences simultaneously. Extensive experiments demonstrate the superiority of our method over state-of-the-art differentially private recommender systems in terms of a better balance between privacy and accuracy. </p>
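<p>The aggregation-perturbation step can be sketched as adding calibrated noise to the neighbor aggregation of a GNN layer. Calibrating the noise scale to the aggregation's sensitivity and the privacy budget is the technical core of the method and is omitted here; the sketch only shows where the noise enters.</p>
```python
import numpy as np

def dp_aggregate(adj, h, sigma, rng=None):
    """Sketch of aggregation perturbation: add Gaussian noise to the
    neighbor-aggregation step of a GNN so that the edges (sequentially
    dependent interactions) are protected. sigma must be calibrated to the
    sensitivity of the aggregation and the privacy budget."""
    rng = rng or np.random.default_rng(0)
    agg = adj @ h                                    # sum over neighbors
    return agg + rng.normal(0.0, sigma, size=agg.shape)

n, d = 50, 16
adj = (np.random.default_rng(1).random((n, n)) < 0.05).astype(float)
h = np.random.default_rng(2).standard_normal((n, d))
h_next = dp_aggregate(adj, h, sigma=1.0)
```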
back
<p>The development of large language models (LLMs) capable of following instructions and engaging in conversational interactions sparked increased interest in their utilization across various support tools. We investigate the utility of modern LLMs in assisting professional writers via an empirical user study (n=30). The design of our collaborative writing interface is grounded in the cognitive process model of writing that views writing as a goal-oriented thinking process encompassing non-linear cognitive activities: planning, translating, and reviewing. Participants are asked to submit a post-completion survey to provide feedback on the potential and pitfalls of LLMs as writing collaborators. Upon analyzing the writer-LLM interactions, we find that while writers seek LLM's help across all three types of cognitive activities, they find LLMs more helpful in translation and reviewing. Our findings from analyzing both the interactions and the survey responses highlight future research directions in creative writing assistance using LLMs. </p>
back
<p>Large language models are becoming increasingly practical for translating code across programming languages, a process known as \textit{transpiling}. Even though automated transpilation significantly boosts developer productivity, a key concern is whether the generated code is correct. Existing work initially used manually crafted test suites to test the translations of a small corpus of programs; these test suites were later automated. In contrast, we devise the first approach for automated, functional, property-based testing of code translation models. Our general, user-provided specifications about the transpiled code capture a range of properties, from purely syntactic to purely semantic ones. As shown by our experiments, this approach is very effective in detecting property violations in popular code translation models, and therefore, in evaluating model quality with respect to given properties. We also go a step further and explore the usage scenario where a user simply aims to obtain a correct translation of some code with respect to certain properties without necessarily being concerned about the overall quality of the model. To this purpose, we develop the first property-guided search procedure for code translation models, where a model is repeatedly queried with slightly different parameters to produce alternative and potentially more correct translations. Our results show that this search procedure helps to obtain significantly better code translations. </p>
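<p>The toy sketch below shows the shape of property-based differential testing of a translation: random inputs are fed to the source program and a candidate translation, and a user-provided property is checked. The translated function here is a hand-written stand-in for model output.</p>
<pre>
# Hedged sketch of property-based testing for code translation:
# check a user-provided property (here, output equality) on random inputs.
import random

def source(xs):                       # original program
    return sorted(set(xs))

def translated(xs):                   # stand-in for a transpiler's output
    out = []
    for x in xs:
        if x not in out:
            out.append(x)
    out.sort()
    return out

def check(prop, n_trials=1000):
    for _ in range(n_trials):
        xs = [random.randint(-50, 50) for _ in range(random.randint(0, 20))]
        if not prop(xs):
            return f"violation on input {xs}"
    return "no violation found"

print(check(lambda xs: source(xs) == translated(xs)))
</pre>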
back
<p>Indoor monocular depth estimation has attracted increasing research interest. Most previous works have focused on methodology, primarily experimenting with the NYU-Depth-V2 (NYUv2) Dataset, and only concentrated on the overall performance over the test set. However, little is known regarding robustness and generalization when it comes to applying monocular depth estimation methods to real-world scenarios where highly varying and diverse functional \textit{space types} are present, such as libraries or kitchens. A performance breakdown across space types is essential for understanding a pretrained model's performance variance. To facilitate our investigation of robustness and address the limitations of previous works, we collect InSpaceType, a high-quality and high-resolution RGBD dataset for general indoor environments. We benchmark 12 recent methods on InSpaceType and find that they severely suffer from performance imbalance concerning space types, which reveals their underlying bias. We extend our analysis to 4 other datasets, 3 mitigation approaches, and the ability to generalize to unseen space types. Our work marks the first in-depth investigation of performance imbalance across space types for indoor monocular depth estimation, drawing attention to potential safety concerns for model deployment without considering space types, and further shedding light on potential ways to improve robustness. See \url{https://depthcomputation.github.io/DepthPublic} for data and the supplementary document. The benchmark list on the GitHub project page is kept up to date with the latest monocular depth estimation methods. </p>
back
<p>Advanced Driver-Assistance Systems (ADAS) have successfully integrated learning-based techniques into vehicle perception and decision-making. However, their application in 3D lane detection for effective driving environment perception is hindered by the lack of comprehensive LiDAR datasets. The sparse nature of LiDAR point cloud data prevents an efficient manual annotation process. To solve this problem, we present LiSV-3DLane, a large-scale 3D lane dataset that comprises 20k frames of surround-view LiDAR point clouds with enriched semantic annotation. Unlike existing datasets confined to a frontal perspective, LiSV-3DLane provides a full 360-degree spatial panorama around the ego vehicle, capturing complex lane patterns in both urban and highway environments. We leverage the geometric traits of lane lines and the intrinsic spatial attributes of LiDAR data to design a simple yet effective automatic annotation pipeline for generating finer lane labels. To propel future research, we propose a novel LiDAR-based 3D lane detection model, LiLaDet, incorporating the spatial geometry learning of the LiDAR point cloud into Bird's Eye View (BEV) based lane identification. Experimental results indicate that LiLaDet outperforms existing camera- and LiDAR-based approaches in the 3D lane detection task on the K-Lane dataset and our LiSV-3DLane. </p>
back
<p>Distributed deep neural networks (DNNs) have been shown to reduce the computational burden of mobile devices and decrease the end-to-end inference latency in edge computing scenarios. While distributed DNNs have been studied, to the best of our knowledge the resilience of distributed DNNs to adversarial action still remains an open problem. In this paper, we fill the existing research gap by rigorously analyzing the robustness of distributed DNNs against adversarial action. We cast this problem in the context of information theory and introduce two new measurements for distortion and robustness. Our theoretical findings indicate that (i) assuming the same level of information distortion, latent features are always more robust than input representations; (ii) the adversarial robustness is jointly determined by the feature dimension and the generalization capability of the DNN. To test our theoretical findings, we perform extensive experimental analysis by considering 6 different DNN architectures, 6 different approaches for distributed DNN and 10 different adversarial attacks on the ImageNet-1K dataset. Our experimental results support our theoretical findings by showing that the compressed latent representations can reduce the success rate of adversarial attacks by 88% in the best case and by 57% on average compared to attacks on the input space. </p>
back
<p>Imitation learning, which learns agent policy by mimicking expert demonstration, has shown promising results in many applications such as medical treatment regimes and self-driving vehicles. However, it remains a difficult task to interpret control policies learned by the agent. Difficulties mainly come from two aspects: 1) agents in imitation learning are usually implemented as deep neural networks, which are black-box models and lack interpretability; 2) the latent causal mechanism behind agents' decisions may vary along the trajectory, rather than staying static throughout time steps. To increase transparency and offer better interpretability of the neural agent, we propose to expose its captured knowledge in the form of a directed acyclic causal graph, with nodes being action and state variables and edges denoting the causal relations behind predictions. Furthermore, we design this causal discovery process to be state-dependent, enabling it to model the dynamics in latent causal graphs. Concretely, we conduct causal discovery from the perspective of Granger causality and propose a self-explainable imitation learning framework. The proposed framework is composed of three parts: a dynamic causal discovery module, a causality encoding module, and a prediction module, and is trained in an end-to-end manner. After the model is learned, we can obtain causal relations among states and action variables behind its decisions, exposing the policies it has learned. Experimental results on both synthetic and real-world datasets demonstrate the effectiveness of the proposed framework in learning dynamic causal graphs for understanding the decision-making of imitation learning while maintaining high prediction accuracy. </p>
back
<p>The generation of lyrics tightly connected to accompanying melodies involves establishing a mapping between musical notes and syllables of lyrics. This process requires a deep understanding of music constraints and semantic patterns at the syllable, word, and sentence levels. However, pre-trained language models specifically designed at the syllable level are publicly unavailable. To solve these challenging issues, we propose to exploit fine-tuned character-level language models for syllable-level lyrics generation from symbolic melody. In particular, our method endeavors to incorporate the linguistic knowledge of the language model into the beam search process of a syllable-level Transformer generator network. Additionally, by exploring ChatGPT-based evaluation of the generated lyrics, along with human subjective evaluation, we demonstrate that our approach enhances the coherence and correctness of the generated lyrics, eliminating the need to train expensive new language models. </p>
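<p>A hedged sketch of the fusion idea: during beam search, the syllable generator's score is combined with a character-level language model score. Both scorers and the fusion weight below are toy assumptions, not the paper's trained models.</p>
<pre>
# Hedged sketch of shallow score fusion inside beam search.
# gen_score and char_lm_score are toy stand-ins for the syllable
# generator and the character-level LM; lam is an assumed weight.
def gen_score(prefix, syl):           # toy syllable-generator log-prob
    return -len(syl) * 0.1

def char_lm_score(text):              # toy character-level LM log-prob
    return -sum(0.05 for _ in text)

def beam_step(beams, candidates, lam=0.5, width=3):
    scored = []
    for prefix, s in beams:
        for syl in candidates:
            total = s + gen_score(prefix, syl) + lam * char_lm_score(syl)
            scored.append((prefix + syl, total))
    return sorted(scored, key=lambda t: -t[1])[:width]

beams = [("", 0.0)]
for _ in range(3):
    beams = beam_step(beams, ["la", "di", "da"])
print(beams[0])
</pre>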
back
<p>Driven by the appealing properties of neural fields for storing and communicating 3D data, the problem of directly processing them to address tasks such as classification and part segmentation has emerged and has been investigated in recent works. Early approaches employ neural fields parameterized by shared networks trained on the whole dataset, achieving good task performance but sacrificing reconstruction quality. To improve the latter, later methods focus on individual neural fields parameterized as large Multi-Layer Perceptrons (MLPs), which are, however, challenging to process due to the high dimensionality of the weight space, intrinsic weight space symmetries, and sensitivity to random initialization. Hence, results turn out to be significantly inferior to those achieved by processing explicit representations, e.g., point clouds or meshes. In the meantime, hybrid representations, in particular based on tri-planes, have emerged as a more effective and efficient alternative to realize neural fields, but their direct processing has not been investigated yet. In this paper, we show that the tri-plane discrete data structure encodes rich information, which can be effectively processed by standard deep-learning machinery. We define an extensive benchmark covering a diverse set of fields such as occupancy, signed/unsigned distance, and, for the first time, radiance fields. While processing a field with the same reconstruction quality, we achieve task performance far superior to frameworks that process large MLPs and, for the first time, almost on par with architectures handling explicit representations. </p>
back
<p>Existing Self-Supervised Learning (SSL) models for speech typically process speech signals at a fixed resolution of 20 milliseconds. This approach overlooks the varying informational content present at different resolutions in speech signals. In contrast, this paper aims to incorporate multi-resolution information into speech self-supervised representation learning. We introduce an SSL model that leverages a hierarchical Transformer architecture, complemented by HuBERT-style masked prediction objectives, to process speech at multiple resolutions. Experimental results indicate that the proposed model not only achieves more efficient inference but also exhibits superior or comparable performance to the original HuBERT model over various tasks. Specifically, significant performance improvements over the original HuBERT have been observed in fine-tuning experiments on the LibriSpeech speech recognition benchmark as well as in evaluations using the Speech Universal PERformance Benchmark (SUPERB) and Multilingual SUPERB (ML-SUPERB). </p>
back
<p>Generative adversarial networks (GANs) have advanced remarkably in diverse domains, especially image generation and editing. However, the misuse of GANs for generating deceptive images, such as face replacement, raises significant security concerns, which have gained widespread attention. Therefore, it is urgent to develop effective detection methods to distinguish between real and fake images. Current research centers around the application of transfer learning. Nevertheless, it encounters challenges such as knowledge forgetting from the original dataset and inadequate performance when dealing with imbalanced data during training. To alleviate these issues, this paper introduces a novel GAN-generated image detection algorithm called X-Transfer, which enhances transfer learning by utilizing two neural networks that employ interleaved parallel gradient transmission. In addition, we combine AUC loss and cross-entropy loss to improve the model's performance. We carry out comprehensive experiments on multiple facial image datasets. The results show that our model outperforms the general transfer approach, with the best metric reaching 99.04%, an improvement of approximately 10%. Furthermore, the model also performs well on non-face datasets, validating its generality and broader application prospects. </p>
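<p>A minimal sketch of the combined objective, assuming a pairwise hinge surrogate for AUC added to cross-entropy; the exact surrogate and weighting used by X-Transfer are assumptions here.</p>
<pre>
# Hedged sketch: cross-entropy plus a pairwise AUC surrogate.
# The hinge surrogate and alpha weighting are illustrative assumptions.
import torch
import torch.nn.functional as F

def auc_hinge(scores, labels, margin=1.0):
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    # penalize positive scores that fail to exceed negatives by the margin
    return F.relu(margin - (pos.unsqueeze(1) - neg.unsqueeze(0))).mean()

def combined_loss(logits, labels, alpha=0.5):
    ce = F.binary_cross_entropy_with_logits(logits, labels.float())
    return ce + alpha * auc_hinge(logits, labels)

logits = torch.randn(8)
labels = torch.tensor([1, 0, 1, 0, 1, 0, 1, 0])
print(combined_loss(logits, labels).item())
</pre>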
back
<p>Neural language models are probabilistic models of human text. They are predominantly trained using maximum likelihood estimation (MLE), which is equivalent to minimizing the forward cross-entropy between the empirical data distribution and the model distribution. However, various degeneration phenomena are still widely observed when decoding from the distributions learned by such models. We establish that the forward cross-entropy is suboptimal as a distance metric for aligning human and model distributions due to its (1) recall-prioritization, (2) negative diversity ignorance, and (3) train-test mismatch. In this paper, we propose Earth Mover Distance Optimization (EMO) for auto-regressive language modeling. EMO capitalizes on the inherent properties of earth mover distance to address the aforementioned challenges. Due to the high complexity of direct computation, we further introduce a feasible upper bound for EMO to ease end-to-end training. Upon extensive evaluation of language models trained using EMO and MLE, we find that EMO demonstrates consistently better language modeling performance than MLE across domains. Moreover, EMO demonstrates noteworthy enhancements in downstream performance with minimal fine-tuning on merely 25,000 sentences. This highlights the tremendous potential of EMO as a lightweight calibration method for enhancing large-scale pre-trained language models. </p>
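<p>As a toy illustration of an earth-mover-style training signal: when the data distribution at a position is a point mass on the gold token, the earth mover distance from the model distribution reduces to an expected transport cost under a token-embedding cost matrix. EMO's feasible upper bound is more involved and is not reproduced here.</p>
<pre>
# Hedged toy illustration: EMD to a point mass on the gold token equals
# the expected cost of moving the model distribution onto that token.
# The embedding-based cost matrix is an illustrative assumption.
import torch
import torch.nn.functional as F

V, d = 10, 16
emb = torch.randn(V, d)
cost = torch.cdist(emb, emb, p=2)          # c(i, j): embedding distance

def emd_to_point_mass(logits, gold):
    q = F.softmax(logits, dim=-1)          # model distribution over the vocab
    return (q * cost[gold]).sum()          # expected transport cost to gold

logits = torch.randn(V, requires_grad=True)
loss = emd_to_point_mass(logits, gold=3)
loss.backward()                            # differentiable training signal
print(loss.item())
</pre>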
back
<p>The history of artificial intelligence (AI) has witnessed the significant impact of high-quality data on various deep learning models, such as ImageNet for AlexNet and ResNet. Recently, instead of designing more complex neural architectures as model-centric approaches, the attention of the AI community has shifted to data-centric approaches, which focus on better processing of data to strengthen the ability of neural models. Graph learning, which operates on ubiquitous topological data, also plays an important role in the era of deep learning. In this survey, we comprehensively review graph learning approaches from the data-centric perspective, and aim to answer three crucial questions: (1) when to modify graph data, (2) what part of the graph data needs modification to unlock the potential of various graph models, and (3) how to safeguard graph models from problematic data influence. Accordingly, we propose a novel taxonomy based on the stages in the graph learning pipeline, and highlight the processing methods for the different data structures in graph data, i.e., topology, features, and labels. Furthermore, we analyze some potential problems embedded in graph data and discuss how to solve them in a data-centric manner. Finally, we provide some promising future directions for data-centric graph learning. </p>
back
<p>Synchronous condensers (SCs) have been reported to improve the overall stability and short-circuit power of a power system. SCs are also being integrated into offshore wind power plants (WPPs) for the same reason. This paper investigates the effect of synchronous condensers on an offshore wind power plant with grid-following (GFL) and grid-forming (GFM) converter controls. Primarily, the effect of synchronous condensers can be two-fold: (1) overall stability enhancement of the WPP by providing reactive power support, (2) contribution to the effective short circuit ratio (SCR) of the WPP by fault current support. Therefore, this paper focuses on studies concerning these effects on an aggregated model of a WPP connected to the grid. To that end, a state-space model of the test system is developed for small-signal stability assessment and the synchronous condenser's effect on its stability. In addition, a mathematical explanation of SCR enhancement with a synchronous condenser is provided and verified with time-domain electromagnetic transient simulations. </p>
back
<p>Various Large Language Models (LLMs) from the Generative Pretrained Transformer (GPT) family have achieved outstanding performance in a wide range of text generation tasks. However, their enormous model sizes have hindered their practical use in real-world applications due to high inference latency. Therefore, improving the efficiency of LLMs through quantization, pruning, and other means has been a key issue in LLM studies. In this work, we propose a method based on Hessian sensitivity-aware mixed sparsity pruning to prune LLMs to at least 50% sparsity without the need for any retraining. It allocates sparsity adaptively based on sensitivity, allowing us to reduce pruning-induced error while maintaining the overall sparsity level. The advantages of the proposed method become even more pronounced when the sparsity is extremely high. Furthermore, our method is compatible with quantization, enabling further compression of LLMs. </p>
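<p>A hedged sketch of the overall recipe: allocate per-layer sparsity from a sensitivity score while keeping the average sparsity fixed, then magnitude-prune each layer. The inverse-sensitivity allocation rule and the per-layer scores below are assumptions standing in for the paper's Hessian-based sensitivity.</p>
<pre>
# Hedged sketch: sensitivity-aware sparsity allocation + magnitude pruning.
# The sensitivity values are a simple proxy, not a Hessian computation.
import numpy as np

def allocate_sparsity(sens, target=0.5):
    """Less sparsity for sensitive layers, same average overall (pre-clip)."""
    inv = 1.0 / np.asarray(sens)
    s = target * inv * len(inv) / inv.sum()
    return np.clip(s, 0.0, 0.95)

def magnitude_prune(W, sparsity):
    k = int(W.size * sparsity)
    if k == 0:
        return W
    thresh = np.partition(np.abs(W).ravel(), k - 1)[k - 1]
    return np.where(np.abs(W) <= thresh, 0.0, W)

rng = np.random.default_rng(0)
layers = [rng.normal(size=(32, 32)) for _ in range(3)]
sens = [2.0, 1.0, 0.5]                 # proxy for per-layer sensitivity
for W, s in zip(layers, allocate_sparsity(sens)):
    print(f"sparsity {s:.2f} -> zero fraction {(magnitude_prune(W, s) == 0).mean():.2f}")
</pre>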
back
<p>Diffusion models for text-to-image (T2I) synthesis, such as Stable Diffusion (SD), have recently demonstrated exceptional capabilities for generating high-quality content. However, this progress has raised several concerns about potential misuse, particularly in creating copyrighted, prohibited, and restricted content, or NSFW (not safe for work) images. While efforts have been made to mitigate such problems, either by implementing a safety filter at the evaluation stage or by fine-tuning models to eliminate undesirable concepts or styles, the effectiveness of these safety measures in dealing with a wide range of prompts remains largely unexplored. In this work, we aim to investigate these safety mechanisms by proposing a novel concept retrieval algorithm for evaluation. We introduce Ring-A-Bell, a model-agnostic red-teaming tool for T2I diffusion models, where the whole evaluation can be prepared in advance without prior knowledge of the target model. Specifically, Ring-A-Bell first performs concept extraction to obtain holistic representations for sensitive and inappropriate concepts. Subsequently, by leveraging the extracted concept, Ring-A-Bell automatically identifies problematic prompts for diffusion models with the corresponding generation of inappropriate content, allowing the user to assess the reliability of deployed safety mechanisms. Finally, we empirically validate our method by testing online services such as Midjourney and various methods of concept removal. Our results show that Ring-A-Bell, by manipulating safe prompting benchmarks, can transform prompts that were originally regarded as safe to evade existing safety mechanisms, thus revealing the defects of these so-called safety mechanisms, which could practically lead to the generation of harmful content. </p>
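<p>A heavily simplified sketch of concept extraction as an embedding difference: a concept direction is estimated from prompt pairs with and without the concept, then added to a benign prompt's embedding. The encoder is a random stub, not the CLIP-style encoder a real T2I pipeline would use, and the strength factor is an assumption.</p>
<pre>
# Hedged sketch: concept direction = mean(embeddings with concept)
#                                  - mean(embeddings without concept).
# embed() is a deterministic random stub, not a real text encoder.
import numpy as np

def embed(text):
    rs = np.random.default_rng(abs(hash(text)) % (2**32))
    return rs.normal(size=32)

with_c = ["a violent scene", "a violent fight"]
without = ["a scene", "a fight"]
concept = (np.mean([embed(t) for t in with_c], axis=0)
           - np.mean([embed(t) for t in without], axis=0))

benign = embed("a park at noon")
adv = benign + 1.5 * concept           # strength 1.5 is a tunable assumption
print(np.linalg.norm(adv - benign))
</pre>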
back
<p>Graph Neural Networks (GNNs), especially message-passing neural networks (MPNNs), have emerged as powerful architectures for learning on graphs in diverse applications. However, MPNNs face challenges when modeling non-local interactions in graphs, such as in large conjugated molecules and social networks, due to oversmoothing and oversquashing. Although Spectral GNNs and traditional neural networks such as recurrent neural networks and transformers mitigate these challenges, they often lack generalizability, or fail to capture detailed structural relationships or symmetries in the data. To address these concerns, we introduce Matrix Function Neural Networks (MFNs), a novel architecture that parameterizes non-local interactions through analytic matrix equivariant functions. Employing resolvent expansions offers a straightforward implementation and the potential for linear scaling with system size. The MFN architecture achieves state-of-the-art performance in standard graph benchmarks, such as the ZINC and TU datasets, and is able to capture intricate non-local interactions in quantum systems, paving the way to new state-of-the-art force fields. </p>
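<p>A minimal sketch of a matrix-function layer, assuming node features are mixed through a learned combination of resolvents of a normalized adjacency matrix; MFN's equivariant design and linear-scaling implementation are not reproduced.</p>
<pre>
# Hedged sketch: f(A) H with f(A) = sum_k c_k (z_k I - A)^{-1},
# a resolvent-expansion parameterization of non-local mixing.
# Shifts and coefficients are illustrative stand-ins for learned values.
import numpy as np

def resolvent_mix(A, H, shifts, coeffs):
    n = A.shape[0]
    out = np.zeros_like(H)
    for z, c in zip(shifts, coeffs):
        out += c * np.linalg.solve(z * np.eye(n) - A, H)
    return out

rng = np.random.default_rng(0)
A = np.array([[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]], dtype=float)
A /= np.abs(np.linalg.eigvals(A)).max()   # keep spectrum inside the shifts
H = rng.normal(size=(4, 8))
print(resolvent_mix(A, H, shifts=[1.5, -1.5], coeffs=[0.7, 0.3]).shape)
</pre>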
back
<p>This work proposes an optimal power management strategy for shipboard microgrids equipped with diesel generators, a fuel cell and a battery energy storage system. The optimization aims to determine both the unit commitment and the optimal power dispatch for all resources to ensure a reliable power supply at minimum cost and with minimal environmental impact. This strategy takes into account the zero-emission capability of the ship and incorporates a soft constraint related to the ship's speed. The optimization is performed by solving a mixed integer linear programming problem, where the constraints are defined according to the operational limits of the resources when a contingency occurs. The algorithm is tested on a notional all-electric ship where the electrical load is generated through a Markov chain, modelled on real measurement data. The results show that the proposed power management strategy successfully maximizes fuel and emission savings while ensuring blackout prevention capability. </p>
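<p>A toy unit-commitment sketch in the same spirit, using PuLP: two units with on/off binaries are dispatched to meet a single load snapshot at minimum cost. The paper's MILP additionally models storage, emissions, contingencies, and the ship-speed soft constraint; the unit data below are invented.</p>
<pre>
# Hedged toy MILP: binary commitment + continuous dispatch for one snapshot.
import pulp

load = 8.0
pmax = {"dg": 6.0, "fc": 4.0}              # hypothetical diesel gen / fuel cell
cost = {"dg": 3.0, "fc": 5.0}              # hypothetical cost per unit power

prob = pulp.LpProblem("dispatch", pulp.LpMinimize)
u = {g: pulp.LpVariable(f"u_{g}", cat="Binary") for g in pmax}
p = {g: pulp.LpVariable(f"p_{g}", lowBound=0) for g in pmax}

prob += pulp.lpSum(cost[g] * p[g] for g in pmax)      # fuel cost objective
prob += pulp.lpSum(p[g] for g in pmax) == load        # power balance
for g in pmax:
    prob += p[g] <= pmax[g] * u[g]                    # only committed units run

prob.solve(pulp.PULP_CBC_CMD(msg=False))
print({g: p[g].value() for g in pmax})
</pre>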
back
<p>Finding bugs is key to the correctness of compilers in wide use today. If the behaviour of a compiled program, as allowed by its architecture memory model, is not a behaviour of the source program under its source model, then there is a bug. This holds for all programs, but we focus on concurrency bugs that occur only with two or more threads of execution. We focus on testing techniques that detect such bugs in C/C++ compilers. </p> <p>We seek a testing technique that automatically covers concurrency bugs up to fixed bounds on program sizes and that scales to find bugs in compiled programs with many lines of code. Otherwise, a testing technique can miss bugs. Unfortunately, the state-of-the-art techniques are yet to satisfy all of these properties. </p> <p>We present the T\'el\'echat compiler testing tool for concurrent programs. T\'el\'echat compiles a concurrent C/C++ program and compares source and compiled program behaviours using source and architecture memory models. We make three claims: T\'el\'echat improves the state-of-the-art at finding bugs in code generation for multi-threaded execution, it is the first public description of a compiler testing tool for concurrency that is deployed in industry, and it is the first tool that takes a significant step towards the desired properties. We provide experimental evidence suggesting T\'el\'echat finds bugs missed by other state-of-the-art techniques, case studies indicating that T\'el\'echat satisfies the properties, and reports of our experience deploying T\'el\'echat in industry regression testing. </p>
back
<p>The expressivity of Graph Neural Networks (GNNs) can be entirely characterized by appropriate fragments of first order logic. Namely, any query of the two variable fragment of graded modal logic (GC2) interpreted over labeled graphs can be expressed using a GNN whose size depends only on the depth of the query. As pointed out by [Barcelo &amp; Al., 2020, Grohe, 2021], this description holds for a family of activation functions, leaving the possibility of a hierarchy of logics expressible by GNNs depending on the chosen activation function. In this article, we show that such a hierarchy indeed exists by proving that GC2 queries cannot be expressed by GNNs with polynomial activation functions. This implies a separation between polynomial and popular non-polynomial activations (such as Rectified Linear Units) and answers an open question formulated by [Grohe, 2021]. </p>
back
<p>This work is the first study on the effects of attacks on cryptocurrencies as expressed in the sentiments and emotions of social media users. Our goals are to design the methodologies for the study including data collection, conduct volumetric and temporal analyses of the data, and profile the sentiments and emotions that emerge from the data. As a first step, we have created a first-of-its-kind comprehensive list of 31 events of 51% attacks on various PoW cryptocurrencies, showing that these events are quite common contrary to the general perception. We have gathered Twitter data on the events as well as benchmark data during normal times for comparison. We have defined parameters for profiling the datasets based on their sentiments and emotions. We have studied the variation of these sentiment and emotion profiles when a cryptocurrency is under attack and the benchmark otherwise, between multiple attack events of the same cryptocurrency, and between different cryptocurrencies. Our results confirm some expected overall behaviour and reactions while providing nuanced insights that may not be obvious or may even be considered surprising. Our code and datasets are publicly accessible. </p>
back
<p>Rule-based models, e.g., decision trees, are widely used in scenarios demanding high model interpretability for their transparent inner structures and good model expressivity. However, rule-based models are hard to optimize, especially on large data sets, due to their discrete parameters and structures. Ensemble methods and fuzzy/soft rules are commonly used to improve performance, but they sacrifice the model interpretability. To obtain both good scalability and interpretability, we propose a new classifier, named Rule-based Representation Learner (RRL), that automatically learns interpretable non-fuzzy rules for data representation and classification. To train the non-differentiable RRL effectively, we project it to a continuous space and propose a novel training method, called Gradient Grafting, that can directly optimize the discrete model using gradient descent. A novel design of logical activation functions is also devised to increase the scalability of RRL and enable it to discretize the continuous features end-to-end. Exhaustive experiments on ten small and four large data sets show that RRL outperforms the competitive interpretable approaches and can be easily adjusted to obtain a trade-off between classification accuracy and model complexity for different scenarios. Our code is available at: https://github.com/12wang3/rrl. </p>
back
<p>Restless multi-arm bandits (RMABs), a class of resource allocation problems with broad application in areas such as healthcare, online advertising, and anti-poaching, have recently been studied from a multi-agent reinforcement learning perspective. Prior RMAB research suffers from several limitations, e.g., it fails to adequately address continuous states, and requires retraining from scratch when arms opt-in and opt-out over time, a common challenge in many real world applications. We address these limitations by developing a neural network-based pre-trained model (PreFeRMAB) that has general zero-shot ability on a wide range of previously unseen RMABs, and which can be fine-tuned on specific instances in a more sample-efficient way than retraining from scratch. Our model also accommodates general multi-action settings and discrete or continuous state spaces. To enable fast generalization, we learn a novel single policy network model that utilizes feature information and employs a training procedure in which arms opt-in and out over time. We derive a new update rule for a crucial $\lambda$-network with theoretical convergence guarantees and empirically demonstrate the advantages of our approach on several challenging, real-world inspired problems. </p>
back
<p>A spectrum-sharing satellite-ground integrated network is conceived, consisting of a pair of non-geostationary orbit (NGSO) constellations and multiple terrestrial base stations, which impose the co-frequency interference (CFI) on each other. The CFI may increase upon increasing the number of satellites. To manage the potentially severe interference, we propose to rely on joint multi-domain resource aided interference management (JMDR-IM). Specifically, the coverage overlap of the constellations considered is analyzed. Then, multi-domain resources - including both the beam-domain and power-domain - are jointly utilized for managing the CFI in an overlapping coverage region. This joint resource utilization is performed by relying on our specifically designed beam-shut-off and switching based beam scheduling, as well as on long short-term memory based joint autoregressive moving average assisted deep Q network aided power scheduling. Moreover, the outage probability (OP) of the proposed JMDR-IM scheme is derived, and the asymptotic analysis of the OP is also provided. Our performance evaluations demonstrate the superiority of the proposed JMDR-IM scheme in terms of its increased throughput and reduced OP. </p>
back
<p>With the availability of large-scale video datasets and the advances of diffusion models, text-driven video generation has achieved substantial progress. However, existing video generation models are typically trained on a limited number of frames, resulting in the inability to generate high-fidelity long videos during inference. Furthermore, these models only support single-text conditions, whereas real-life scenarios often require multi-text conditions as the video content changes over time. To tackle these challenges, this study explores the potential of extending the text-driven capability to generate longer videos conditioned on multiple texts. 1) We first analyze the impact of initial noise in video diffusion models. Then, building on this observation of noise, we propose FreeNoise, a tuning-free and time-efficient paradigm to enhance the generative capabilities of pretrained video diffusion models while preserving content consistency. Specifically, instead of initializing noises for all frames, we reschedule a sequence of noises for long-range correlation and perform temporal attention over them via a window-based function. 2) Additionally, we design a novel motion injection method to support the generation of videos conditioned on multiple text prompts. Extensive experiments validate the superiority of our paradigm in extending the generative capabilities of video diffusion models. Notably, whereas the previous best-performing method incurred about 255% extra time cost, our method incurs a negligible time cost of approximately 17%. Generated video samples are available at our website: http://haonanqiu.com/projects/FreeNoise.html </p>
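<p>A hedged sketch of noise rescheduling: rather than drawing fresh i.i.d. noise per frame, a base window of noises is reused with local shuffling to preserve long-range correlation. FreeNoise's window-based temporal attention and motion injection are not shown, and the window size is an assumption.</p>
<pre>
# Hedged sketch: extend a base noise sequence by reuse + local shuffling.
import torch

def reschedule_noise(base, target_len, window=4, seed=0):
    g = torch.Generator().manual_seed(seed)
    frames = [base[i % base.shape[0]] for i in range(target_len)]
    noise = torch.stack(frames)                 # repeated base noises
    for s in range(0, target_len, window):      # shuffle within each window
        e = min(s + window, target_len)
        perm = torch.randperm(e - s, generator=g) + s
        noise[s:e] = noise[perm]
    return noise

base = torch.randn(16, 4, 8, 8)                 # (frames, C, H, W)
print(reschedule_noise(base, target_len=64).shape)  # torch.Size([64, 4, 8, 8])
</pre>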
back
<p>In scheduling problems common in industry and various real-world scenarios, responding in real-time to disruptive events is essential. Recent methods propose the use of deep reinforcement learning (DRL) to learn policies capable of generating solutions under this constraint. The objective of this paper is to introduce a new DRL method for solving the flexible job-shop scheduling problem, particularly for large instances. The approach is based on heterogeneous graph neural networks, which provide a more informative graph representation of the problem. This novel modeling of the problem enhances the policy's ability to capture state information and improves its decision-making capacity. Additionally, we introduce two novel approaches to enhance the performance of the DRL approach: the first involves generating a diverse set of scheduling policies, while the second combines DRL with dispatching rules (DRs) constraining the action space. Experimental results on two public benchmarks show that our approach outperforms DRs and achieves superior results compared to three state-of-the-art DRL methods, particularly for large instances. </p>
back
<p>The use of large language models for code generation is a rapidly growing trend in software development. However, without effective methods for ensuring the correctness of generated code, this trend could lead to any number of undesirable outcomes. In this paper, we lay out a vision for addressing this challenge: the Clover paradigm, short for Closed-Loop Verifiable Code Generation, which reduces correctness checking to the more accessible problem of consistency checking. At the core of Clover lies a checker that performs consistency checks among code, docstrings, and formal annotations. The checker is implemented using a novel integration of formal verification tools and large language models. We provide a theoretical analysis to support our thesis that Clover should be effective at consistency checking. We also empirically investigate its feasibility on a hand-designed dataset (CloverBench) featuring annotated Dafny programs at a textbook level of difficulty. Experimental results show that for this dataset, (i) LLMs are reasonably successful at automatically generating formal specifications; and (ii) our consistency checker achieves a promising acceptance rate (up to 87%) for correct instances while maintaining zero tolerance for incorrect ones (no false positives). </p>
back
<p>Sound over-approximation methods have been proved effective for guaranteeing the absence of errors, but inevitably they produce false alarms that can hamper the programmers. Conversely, under-approximation methods are aimed at bug finding and are free from false alarms. We introduce Sufficient Incorrectness Logic (SIL), a new under-approximating, triple-based program logic to reason about program errors. SIL is designed to set apart the initial states leading to errors. We prove that SIL is correct and complete for a minimal set of rules, and we study additional rules that can facilitate program analyses. We formally compare SIL to existing triple-based program logics. Incorrectness Logic and SIL both perform under-approximations, but while the former exposes only true errors, the latter locates the set of initial states that lead to such errors. Hoare Logic performs over-approximations and as such cannot capture the set of initial states leading to errors in nondeterministic programs -- for deterministic and terminating programs, Hoare Logic and SIL coincide. Finally, we instantiate SIL with Separation Logic formulae (Separation SIL) to handle pointers and dynamic allocation and we prove its correctness and, for loop-free programs, also its completeness. We argue that in some cases Separation SIL can yield more succinct postconditions and provide stronger guarantees than Incorrectness Separation Logic and can support effective backward reasoning. </p>
back
<p>The growing computing power over the years has enabled simulations to become more complex and accurate. While immensely valuable for scientific discovery and problem-solving, high-fidelity simulations come with significant computational demands. As a result, it is common to run a low-fidelity model with a subgrid-scale model to reduce the computational cost, but selecting the appropriate subgrid-scale models and tuning them are challenging. We propose a novel method for learning the subgrid-scale model effects when simulating partial differential equations augmented by neural ordinary differential operators in the context of discontinuous Galerkin (DG) spatial discretization. Our approach learns the missing scales of the low-order DG solver at a continuous level and hence improves the accuracy of the low-order DG approximations as well as accelerates the filtered high-order DG simulations with a certain degree of precision. We demonstrate the performance of our approach through multidimensional Taylor-Green vortex examples at different Reynolds numbers and times, which cover laminar, transitional, and turbulent regimes. The proposed method not only reconstructs the subgrid-scale from the low-order (1st-order) approximation but also speeds up the filtered high-order DG (6th-order) simulation by two orders of magnitude. </p>
back
<p>Emotion recognition is a crucial task for human conversation understanding. It becomes more challenging with the notion of multimodal data, e.g., language, voice, and facial expressions. As a typical solution, the global and local context information is exploited to predict the emotional label for every single sentence, i.e., utterance, in the dialogue. Specifically, the global representation could be captured via modeling of cross-modal interactions at the conversation level. The local one is often inferred using the temporal information of speakers or emotional shifts, which neglects vital factors at the utterance level. Additionally, most existing approaches take fused features of multiple modalities in a unified input without leveraging modality-specific representations. Motivated by these problems, we propose the Relational Temporal Graph Neural Network with Auxiliary Cross-Modality Interaction (CORECT), a novel neural network framework that effectively captures conversation-level cross-modality interactions and utterance-level temporal dependencies in a modality-specific manner for conversation understanding. Extensive experiments demonstrate the effectiveness of CORECT via its state-of-the-art results on the IEMOCAP and CMU-MOSEI datasets for the multimodal ERC task. </p>
back
<p>Regev recently introduced a quantum factoring algorithm that may be perceived as a $d$-dimensional variation of Shor's factoring algorithm. In this work, we extend Regev's factoring algorithm to an algorithm for computing discrete logarithms in a natural way. Furthermore, we discuss natural extensions of Regev's factoring algorithm to order finding, and to factoring completely via order finding. For all of these algorithms, we discuss various practical implementation considerations, including in particular the robustness of the post-processing. </p>
back
<p>With the booming popularity of smartphones, threats related to these devices are increasingly on the rise. Smishing, a combination of SMS (Short Message Service) and phishing, has emerged as a treacherous cyber threat used by malicious actors to deceive users, aiming to steal sensitive information or money, or to install malware on their mobile devices. Despite the increase in smishing attacks in recent years, there are very few studies aimed at understanding the factors that contribute to a user's ability to differentiate real from fake messages. To address this gap in knowledge, we conducted an online survey on smishing detection with 214 participants. In this study, we presented them with 16 SMS screenshots and evaluated how different factors affect their decision-making process in smishing detection. Next, we conducted a follow-up survey to garner information on the participants' security attitudes, behavior, and knowledge. Our results highlighted that attention and security behavioral scores had a significant impact on participants' accuracy in identifying smishing messages. Interestingly, we found that participants had more difficulty identifying real messages than fake ones, with an accuracy of 65.6% with fake messages and 44.6% with real messages. Our study is crucial in developing proactive strategies to counter and mitigate smishing attacks. By understanding what factors influence smishing detection, we aim to bolster users' resilience against such threats and create a safer digital environment for all. </p>
back
<p>We study the design of a goal-oriented sampling and scheduling strategy through a channel with highly variable two-way random delay, which can exhibit memory (e.g., Delay and Disruption Tolerant Networks). The objective of the communication is to optimize the performance of remote inference, where an inference algorithm (e.g., a trained neural network) on the receiver side predicts a time-varying target signal using the data samples transmitted by a sensor. Previous formulations of this problem either assumed a channel with IID transmission delay, neglecting feedback delay, or considered the monotonic relation that the performance only gets worse as the input information ages. We show how, with delayed feedback, one can effectively exploit the knowledge about delay memory through an index-based threshold policy. This policy minimizes the expected time-average inference error that can be monotone or non-monotone in age. The index function is expressed in terms of the Age of Information (AoI) on the receiver side and a parameter regarding the distribution of subsequent transmission delay, both of which can readily be tracked. </p>
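<p>A schematic sketch of an index-based threshold policy: the sensor samples when an index of the receiver-side AoI and a delay-memory statistic crosses a threshold. The index form and threshold below are illustrative assumptions, not the paper's derived quantities.</p>
<pre>
# Hedged sketch: sample when index(AoI, delay memory) >= threshold.
# The subtraction-style index and the threshold value are assumptions.
from collections import deque

class ThresholdSampler:
    def __init__(self, threshold=3.0, memory=5):
        self.threshold = threshold
        self.delays = deque(maxlen=memory)   # recent two-way delays (memory)

    def observe_delay(self, d):
        self.delays.append(d)

    def should_sample(self, aoi):
        mean_d = sum(self.delays) / len(self.delays) if self.delays else 1.0
        return (aoi - mean_d) >= self.threshold

s = ThresholdSampler()
for d in [1.0, 2.5, 4.0]:
    s.observe_delay(d)
for aoi in [2, 5, 8]:
    print(aoi, s.should_sample(aoi))
</pre>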
back
<p>This paper presents a comprehensive study of Meta Prompting, an innovative technique reshaping the utilization of large language models (LLMs), multi-modal foundation models, and AI systems in problem-solving and data interpretation. Grounded in type theory and category theory, Meta Prompting emphasizes the structure and syntax of information over traditional content-centric methods. The paper explores the formal definitions of Meta Prompting (MP), sets it apart from Few-Shot Prompting, and underlines its effectiveness in various AI applications. A key focus is applying Meta Prompting for complex reasoning (MP-CR) tasks, showing how it effectively deconstructs intricate problems into simpler sub-problems, enhancing token efficiency, and enabling more equitable problem-solving comparisons, especially against few-shot example methods. Additionally, the paper introduces Meta Prompting for prompting tasks, allowing LLMs to self-generate new prompts in a recursive, metaprogramming-like manner. This approach marks a significant leap in AI's autonomous and adaptive capabilities. The paper also introduces the integration of Meta Prompting into multi-modal foundation model settings, tackling the challenges and opportunities of incorporating varied data types such as images, audio, and video within the structured Meta Prompting framework. Empirical experiments, including solving the Game of 24 tasks, demonstrate the MP-CR Agent's enhanced reasoning capabilities, achieving high accuracy and efficiency, and showcasing Meta Prompting's transformative impact on AI problem-solving. (The code is available at https://github.com/meta-prompting/meta-prompting) </p>
back
<p>The emergent abilities of Large Language Models (LLMs), which power tools like ChatGPT and Bard, have produced both excitement and worry about how AI will impact academic writing. In response to rising concerns about AI use, authors of academic publications may decide to voluntarily disclose any AI tools they use to revise their manuscripts, and journals and conferences could begin mandating disclosure and/or turn to using detection services, as many teachers have done with student writing in class settings. Given these looming possibilities, we investigate whether academics view it as necessary to report AI use in manuscript preparation and how detectors react to the use of AI in academic writing. </p>
back
<p>We present four main contributions to enhance the performance of Large Language Models (LLMs) in generating domain-specific code: (i) utilizing LLM-based data splitting and data renovation techniques to improve the semantic representation of embeddings' space; (ii) introducing the Chain of Density for Renovation Credibility (CoDRC), driven by LLMs, and the Adaptive Text Renovation (ATR) algorithm for assessing data renovation reliability; (iii) developing the Implicit Knowledge Expansion and Contemplation (IKEC) Prompt technique; and (iv) effectively refactoring existing scripts to generate new and high-quality scripts with LLMs. By using engineering simulation software RedHawk-SC as a case study, we demonstrate the effectiveness of our data pre-processing method for expanding and categorizing scripts. When combined with IKEC, these techniques enhance the Retrieval-Augmented Generation (RAG) method in retrieving more relevant information, ultimately achieving a 73.33% "Percentage of Correct Lines" for code generation problems in MapReduce applications. </p>
back
<p>Can we avoid wars at the crossroads of history? This question has been pursued by individuals, scholars, policymakers, and organizations throughout human history. In this research, we attempt to answer the question based on the recent advances of Artificial Intelligence (AI) and Large Language Models (LLMs). We propose \textbf{WarAgent}, an LLM-powered multi-agent AI system, to simulate the participating countries, their decisions, and the consequences, in historical international conflicts, including World War I (WWI), World War II (WWII), and the Warring States Period (WSP) in Ancient China. By evaluating the simulation effectiveness, we examine the advancements and limitations of cutting-edge AI systems' abilities in studying complex collective human behaviors such as international conflicts under diverse settings. In these simulations, the emergent interactions among agents also offer a novel perspective for examining the triggers and conditions that lead to war. Our findings offer data-driven and AI-augmented insights that can redefine how we approach conflict resolution and peacekeeping strategies. The implications stretch beyond historical analysis, offering a blueprint for using AI to understand human history and possibly prevent future international conflicts. Code and data are available at \url{https://github.com/agiresearch/WarAgent}. </p>
back
<p>This work introduces self-infilling code generation, a general framework that incorporates infilling operations into auto-regressive decoding. Our approach capitalizes on the observation that recent infilling-capable code language models can self-infill: whereas infilling operations aim to fill in the middle based on a predefined prefix and suffix, self-infilling sequentially generates both such surrounding context and the infilled content. We utilize this capability to introduce novel interruption and looping mechanisms in conventional decoding, evolving it into a non-monotonic process. Interruptions allow for postponing the generation of specific code until a definitive suffix is established, enhancing control over the output. Meanwhile, the looping mechanism, which leverages the complementary nature of self-infilling and left-to-right decoding, can iteratively update and synchronize each piece of generation cyclically. Extensive experiments are conducted to demonstrate that our proposed decoding process is effective in enhancing both regularity and quality across several code generation benchmarks. </p>
back
<p>Enhancing the domain generalization performance of Face Anti-Spoofing (FAS) techniques has emerged as a research focus. Existing methods are dedicated to extracting domain-invariant features from various training domains. Despite the promising performance, the extracted features inevitably contain residual style feature bias (e.g., illumination, capture device), resulting in inferior generalization performance. In this paper, we propose an alternative and effective solution, the Textually Guided Domain Generalization (TeG-DG) framework, which can effectively leverage text information for cross-domain alignment. Our core insight is that text, as a more abstract and universal form of expression, can capture the commonalities and essential characteristics across various attacks, bridging the gap between different image domains. Contrary to existing vision-language models, the proposed framework is elaborately designed to enhance the domain generalization ability of the FAS task. Concretely, we first design a Hierarchical Attention Fusion (HAF) module to enable adaptive aggregation of visual features at different levels; Then, a Textual-Enhanced Visual Discriminator (TEVD) is proposed for not only better alignment between the two modalities but also to regularize the classifier with unbiased text features. TeG-DG significantly outperforms previous approaches, especially in situations with extremely limited source domain data (~14% and ~12% improvements on HTER and AUC respectively), showcasing impressive few-shot performance. </p>
back
<p>Creativity is core to being human. Generative artificial intelligence (GenAI) -- including ever more powerful large language models (LLMs) -- holds promise for humans to be more creative by offering new ideas, or less creative by anchoring on GenAI ideas. We study the causal impact of GenAI ideas on the production of a short story in an online experimental study where some writers could obtain story ideas from a GenAI platform. We find that access to GenAI ideas causes stories to be evaluated as more creative, better written, and more enjoyable, especially among less creative writers. However, GenAI-enabled stories are more similar to each other than stories by humans alone. These results point to an increase in individual creativity at the risk of losing collective novelty. This dynamic resembles a social dilemma: with GenAI, individual writers are better off, but collectively a narrower scope of novel content may be produced. Our results have implications for researchers, policy-makers and practitioners interested in bolstering creativity. </p>
back
<p>The widespread adoption of REST APIs, coupled with their growing complexity and size, has led to the need for automated REST API testing tools. Current tools focus on the structured data in REST API specifications but often neglect valuable insights available in unstructured natural-language descriptions in the specifications, which leads to suboptimal test coverage. Recently, to address this gap, researchers have developed techniques that extract rules from these human-readable descriptions and query knowledge bases to derive meaningful input values. However, these techniques are limited in the types of rules they can extract and prone to produce inaccurate results. This paper presents RESTGPT, an innovative approach that leverages the power and intrinsic context-awareness of Large Language Models (LLMs) to improve REST API testing. RESTGPT takes as input an API specification, extracts machine-interpretable rules, and generates example parameter values from natural-language descriptions in the specification. It then augments the original specification with these rules and values. Our evaluations indicate that RESTGPT outperforms existing techniques in both rule extraction and value generation. Given these promising results, we outline future research directions for advancing REST API testing through LLMs. </p>
back
<p>Graphs can inherently model interconnected objects on the Web, thereby facilitating a series of Web applications, such as web analysis and content recommendation. Recently, Graph Neural Networks (GNNs) have emerged as a mainstream technique for graph representation learning. However, their efficacy within an end-to-end supervised framework is significantly tied to the availability of task-specific labels. To mitigate labeling costs and enhance robustness in few-shot settings, pre-training on self-supervised tasks has emerged as a promising method, while prompting has been proposed to further narrow the objective gap between pretext and downstream tasks. Although there has been some initial exploration of prompt-based learning on graphs, it primarily leverages a single pretext task, resulting in a limited subset of general knowledge that could be learned from the pre-training data. Hence, in this paper, we propose MultiGPrompt, a novel multi-task pre-training and prompting framework to exploit multiple pretext tasks for more comprehensive pre-trained knowledge. First, in pre-training, we design a set of pretext tokens to synergize multiple pretext tasks. Second, we propose a dual-prompt mechanism consisting of composed and open prompts to leverage task-specific and global pre-training knowledge, to guide downstream tasks in few-shot settings. Finally, we conduct extensive experiments on six public datasets to evaluate and analyze MultiGPrompt. </p>
back
<p>Large language models (LLMs) encapsulate a vast amount of factual information within their pre-trained weights, as evidenced by their ability to answer diverse questions across different domains. However, this knowledge is inherently limited, relying heavily on the characteristics of the training data. Consequently, using external datasets to incorporate new information or refine the capabilities of LLMs on previously seen information poses a significant challenge. In this study, we compare two common approaches: unsupervised fine-tuning and retrieval-augmented generation (RAG). We evaluate both approaches on a variety of knowledge-intensive tasks across different topics. Our findings reveal that while unsupervised fine-tuning offers some improvement, RAG consistently outperforms it, both for existing knowledge encountered during training and entirely new knowledge. Moreover, we find that LLMs struggle to learn new factual information through unsupervised fine-tuning, and that exposing them to numerous variations of the same fact during training could alleviate this problem. </p>
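<p>For concreteness, a toy sketch of the RAG side of the comparison: retrieve the most similar passages (bag-of-words vectors here instead of learned embeddings) and prepend them to the question before querying the LLM, which is stubbed out entirely.</p>
<pre>
# Hedged sketch of a retrieval-augmented prompt. The corpus, the
# bag-of-words retriever, and the prompt template are toy assumptions.
import numpy as np

docs = ["Paris is the capital of France.",
        "The Nile is a river in Africa.",
        "Mount Everest is the highest mountain."]

vocab = sorted({w.lower().strip(".") for d in docs for w in d.split()})

def bow(text):
    words = [w.lower().strip(".?") for w in text.split()]
    v = np.array([words.count(w) for w in vocab], dtype=float)
    n = np.linalg.norm(v)
    return v / n if n else v

def retrieve(question, k=1):
    sims = [float(bow(d) @ bow(question)) for d in docs]
    return [docs[i] for i in np.argsort(sims)[::-1][:k]]

question = "What is the capital of France?"
context = " ".join(retrieve(question))
prompt = f"Context: {context}\nQuestion: {question}\nAnswer:"
print(prompt)                          # this would be sent to the LLM
</pre>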
back
<p>The detection of toxic language in the Arabic language has emerged as an active area of research in recent years, and reviewing the existing datasets employed for training the developed solutions has become a pressing need. This paper offers a comprehensive survey of Arabic datasets focused on online toxic language. We systematically gathered a total of 54 available datasets and their corresponding papers and conducted a thorough analysis, considering 18 criteria across four primary dimensions: availability details, content, annotation process, and reusability. This analysis enabled us to identify existing gaps and make recommendations for future research works. For the convenience of the research community, the list of the analysed datasets is maintained in a GitHub repository (https://github.com/Imene1/Arabic-toxic-language). </p>
back
<p>Zero-shot voice conversion (VC) aims to transfer the source speaker timbre to arbitrary unseen target speaker timbre, while keeping the linguistic content unchanged. Although the voice of generated speech can be controlled by providing the speaker embedding of the target speaker, the speaker similarity still lags behind the ground truth recordings. In this paper, we propose SEF-VC, a speaker embedding free voice conversion model, which is designed to learn and incorporate speaker timbre from reference speech via a powerful position-agnostic cross-attention mechanism, and then reconstruct waveform from HuBERT semantic tokens in a non-autoregressive manner. The concise design of SEF-VC enhances its training stability and voice conversion performance. Objective and subjective evaluations demonstrate the superiority of SEF-VC to generate high-quality speech with better similarity to target reference than strong zero-shot VC baselines, even for very short reference speeches. </p>
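<p>A minimal sketch of position-agnostic cross-attention, assuming a single head and no positional encoding on the reference side, so timbre is absorbed independently of frame order; dimensions and the residual form are illustrative.</p>
<pre>
# Hedged sketch: content tokens attend to reference-speech frames with
# no positional encoding on the reference side (position-agnostic).
import torch

def cross_attend(content, reference, d=32):
    Wq = torch.nn.Linear(d, d, bias=False)
    Wk = torch.nn.Linear(d, d, bias=False)
    Wv = torch.nn.Linear(d, d, bias=False)
    q, k, v = Wq(content), Wk(reference), Wv(reference)
    attn = torch.softmax(q @ k.transpose(-2, -1) / d ** 0.5, dim=-1)
    return content + attn @ v                # residual timbre injection

content = torch.randn(1, 50, 32)             # semantic tokens (e.g., HuBERT units)
reference = torch.randn(1, 120, 32)          # reference-speech frames
print(cross_attend(content, reference).shape)
</pre>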
back
<p>In this paper, we investigate the existence of self-dual MRD codes $C\subset L^n$, where $L/F$ is an arbitrary field extension of degree $m\geq n$. We then apply our results to the case of finite fields, and prove that if $m=n$ and $F=\mathbb{F}_q$, a self-dual MRD code exists if and only if $q \equiv n \equiv 3 \pmod{4}$. </p>
back
<p>We propose a reinforcement learning (RL)-based system that would automatically prescribe a hypothetical patient medication that may help the patient with their mental health-related speech disfluency, and adjust the medication and the dosages in response to zero-cost frequent measurement of the fluency of the patient. We demonstrate the components of the system: a module that detects and evaluates speech disfluency on a large dataset we built, and an RL algorithm that automatically finds good combinations of medications. To support the two modules, we collect data on the effect of psychiatric medications for speech disfluency from the literature, and build a plausible patient simulation system. We demonstrate that the RL system is, under some circumstances, able to converge to a good medication regime. We collect and label a dataset of people with possible speech disfluency and demonstrate our methods using that dataset. Our work is a proof of concept: we show that there is promise in the idea of using automatic data collection to address speech disfluency. </p>
back
<p>In this work, we examined how fact-checkers prioritize which claims to inspect for further investigation and publishing, and what tools may assist them in their efforts. Specifically, through a series of interviews with 23 professional fact-checkers from around the world, we validated that harm assessment is a central component of how fact-checkers triage their work. First, we clarify what aspects of misinformation they considered to create urgency or importance. These often revolved around the potential for the claim to harm others. We also clarify the processes behind collective fact-checking decisions and gather suggestions for tools that could help with these processes. </p> <p>In addition, to address the needs articulated by these fact-checkers and others, we present a five-dimension framework of questions to help fact-checkers negotiate the priority of claims. Our FABLE Framework of Misinformation Harms incorporates five dimensions of magnitude -- (social) Fragmentation, Actionability, Believability, Likelihood of spread, and Exploitativeness -- that can help determine the potential urgency of a specific message or post when considering misinformation as harm. This effort was further validated by additional interviews with expert fact-checkers. The result is a questionnaire, a practical and conceptual tool to support fact-checkers and other content moderators as they make strategic decisions to prioritize their efforts. </p>
back
<p>We find the location of factual knowledge in large language models by exploring the residual stream and analyzing subvalues in vocabulary space. We also explain why subvalues correspond to human-interpretable concepts when projected into vocabulary space. Because the before-softmax values of subvalues are combined by an additive function, the probability of top tokens in vocabulary space increases. Based on this, we find that using the log probability increase to compute the significance of layers and subvalues is better than using the probability increase, since the curve of the log probability increase has a linear, monotonically increasing shape. Moreover, we calculate inner products to evaluate how much a feed-forward network (FFN) subvalue is activated by previous layers. Based on our methods, we find where the factual knowledge &lt;France, capital, Paris&gt; is stored. Specifically, attention layers store "Paris is related to France", while FFN layers store "Paris is a capital/city", activated by attention subvalues related to "capital". We apply our method to Baevski-18, GPT2-medium, Llama-7B, and Llama-13B. Overall, we provide a new method for understanding the mechanism of transformers. We will release our code on GitHub. </p>
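<p>As a rough illustration of the kind of measurement described here, the following logit-lens-style sketch projects intermediate residual streams through the unembedding to track the log-probability increase of a target token across layers; it is a simplified proxy, not the authors' code, and the prompt is illustrative. </p>
<pre>
# Hedged sketch: per-layer log-probability increase of " Paris" via the
# "logit lens" (projecting each residual stream through ln_f and lm_head).
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tok = GPT2Tokenizer.from_pretrained("gpt2-medium")
model = GPT2LMHeadModel.from_pretrained("gpt2-medium").eval()

ids = tok("The capital of France is", return_tensors="pt").input_ids
target_id = tok.encode(" Paris")[0]

with torch.no_grad():
    out = model(ids, output_hidden_states=True)

prev = None
for layer, h in enumerate(out.hidden_states):  # embedding output + each block
    logits = model.lm_head(model.transformer.ln_f(h[0, -1]))
    logp = torch.log_softmax(logits, dim=-1)[target_id].item()
    if prev is not None:
        print(f"layer {layer:2d}: log p increase {logp - prev:+.3f}")
    prev = logp
</pre>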
back
<p>We consider the problem of approximating a function from $L^2$ by an element of a given $m$-dimensional space $V_m$, associated with some feature map $\varphi$, using evaluations of the function at random points $x_1,\dots,x_n$. After recalling some results on optimal weighted least-squares using independent and identically distributed points, we consider weighted least-squares using projection determinantal point processes (DPP) or volume sampling. These distributions introduce dependence between the points that promotes diversity in the selected features $\varphi(x_i)$. We first provide a generalized version of volume-rescaled sampling yielding quasi-optimality results in expectation with a number of samples $n = O(m\log(m))$, meaning that the expected $L^2$ error is bounded by a constant times the best approximation error in $L^2$. Further assuming that the function is in some normed vector space $H$ continuously embedded in $L^2$, we prove that the approximation is almost surely bounded by the best approximation error measured in the $H$-norm. This includes the cases of functions from $L^\infty$ or reproducing kernel Hilbert spaces. Finally, we present an alternative strategy consisting of independent repetitions of projection DPP (or volume sampling), yielding similar error bounds as i.i.d. or volume sampling, but in practice with a much lower number of samples. Numerical experiments illustrate the performance of the different strategies. </p>
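<p>For readers unfamiliar with the setup, the weighted least-squares projection referenced above typically takes the form $$\hat f = \arg\min_{v\in V_m} \frac{1}{n}\sum_{i=1}^{n} w(x_i)\,\big(f(x_i)-v(x_i)\big)^2,$$ where the weights $w(x_i)$ compensate for the sampling distribution so that the empirical semi-norm is an unbiased estimator of the $L^2$ norm; the precise normalization here is a standard convention and may differ from the paper's. </p>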
back
<p>Emergency and non-emergency response systems are essential services provided by local governments and are critical to protecting lives, the environment, and property. The effective handling of (non-)emergency calls is critical for public safety and well-being. By diverting non-emergency callers, residents in critical need of assistance through 911 can receive a fast and effective response. Collaborating with the Department of Emergency Communications (DEC) in Nashville, we analyzed 11,796 non-emergency call recordings and developed Auto311, the first automated system to handle 311 non-emergency calls, which (1) effectively and dynamically predicts ongoing non-emergency incident types to generate tailored case reports during the call; (2) itemizes essential information from dialogue contexts to complete the generated reports; and (3) strategically structures system-caller dialogues with optimized confidence. We used real-world data to evaluate the system's effectiveness and deployability. The experimental results indicate that the system effectively predicts incident type with an average F-1 score of 92.54%. Moreover, the system successfully itemizes critical information from relevant contexts to complete reports, achieving a 0.93 average consistency score against the ground truth. Additionally, emulations demonstrate that the system effectively decreases conversation turns as utterances grow longer and categorizes the ongoing call with 94.49% mean accuracy. </p>
back
<p>This paper proposes a novel method for demand forecasting in a pricing context. Here, modeling the causal relationship between price, as an input variable, and demand is crucial because retailers aim to set prices in a (profit-)optimal manner in a downstream decision-making problem. Our method brings together the Double Machine Learning methodology for causal inference and state-of-the-art transformer-based forecasting models. In extensive empirical experiments, we show on the one hand that our method estimates the causal effect better in a fully controlled setting via synthetic, yet realistic, data. On the other hand, we demonstrate on real-world data that our method outperforms forecasting methods in off-policy settings (i.e., when there is a change in the pricing policy) while only slightly trailing in the on-policy setting. </p>
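<p>To make the Double Machine Learning ingredient concrete, below is a generic partialling-out sketch on synthetic data; the nuisance learners, the linear final stage, and the variable names are illustrative assumptions, not the paper's pipeline. </p>
<pre>
# Illustrative Double ML (partialling-out) estimate of a price effect on
# demand with observed confounders; true coefficient is -1.5 by construction.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(0)
n = 5000
X = rng.normal(size=(n, 5))                   # confounders (season, promo, ...)
price = X[:, 0] + rng.normal(size=n)          # price depends on confounders
demand = -1.5 * price + X @ np.ones(5) + rng.normal(size=n)

# Stage 1: cross-fitted nuisance predictions E[demand|X] and E[price|X]
m_hat = cross_val_predict(GradientBoostingRegressor(), X, demand, cv=5)
g_hat = cross_val_predict(GradientBoostingRegressor(), X, price, cv=5)

# Stage 2: regress demand residuals on price residuals -> causal coefficient
u, v = price - g_hat, demand - m_hat
theta = (u @ v) / (u @ u)
print(f"estimated price coefficient: {theta:.2f}")  # close to -1.5
</pre>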
back
<p>Machine unlearning refers to the process of mitigating the influence of specific training data on machine learning models based on removal requests from data owners. However, one important area that has been largely overlooked in unlearning research is reinforcement learning. Reinforcement learning focuses on training an agent to make optimal decisions within an environment to maximize its cumulative rewards. During training, the agent tends to memorize features of the environment, which raises a significant concern about privacy. As per data protection regulations, the owner of the environment holds the right to revoke access to the agent's training data, thus necessitating the development of a novel and pressing research field, known as \emph{reinforcement unlearning}. Reinforcement unlearning focuses on revoking entire environments rather than individual data samples. This unique characteristic presents three distinct challenges: 1) how to propose unlearning schemes for environments; 2) how to avoid degrading the agent's performance in remaining environments; and 3) how to evaluate the effectiveness of unlearning. To tackle these challenges, we propose two reinforcement unlearning methods. The first method is based on decremental reinforcement learning, which aims to erase the agent's previously acquired knowledge gradually. The second method leverages environment poisoning attacks, which encourage the agent to learn new, albeit incorrect, knowledge to remove the unlearning environment. In particular, to tackle the third challenge, we introduce the concept of an ``environment inference attack'' to evaluate the unlearning outcomes. The source code is available at \url{https://anonymous.4open.science/r/Reinforcement-Unlearning-D347}. </p>
back
<p>Recent advancements in Text-to-Image (T2I) diffusion models have demonstrated impressive success in generating high-quality images with zero-shot generalization capabilities. Yet, current models struggle to closely adhere to prompt semantics, often misrepresenting or overlooking specific attributes. To address this, we propose a simple, training-free approach that modulates the guidance direction of diffusion models during inference. We first decompose the prompt semantics into a set of concepts, and monitor the guidance trajectory in relation to each concept. Our key observation is that deviations in the model's adherence to prompt semantics are highly correlated with divergence of the guidance from one or more of these concepts. Based on this observation, we devise a technique to steer the guidance direction towards any concept from which the model diverges. Extensive experimentation validates that our method improves the semantic alignment of images generated by diffusion models in response to prompts. Project page is available at: https://korguy.github.io/ </p>
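<p>A toy sketch of the steering idea, reduced to plain vectors, is shown below: if the combined guidance drifts away from a concept's direction, some of that concept's guidance is blended back in. The cosine threshold and blend weight are invented for illustration and do not come from the paper. </p>
<pre>
# Toy concept-aware guidance steering on plain numpy vectors (illustrative).
import numpy as np

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8)

def steer(guidance, concept_guidances, tau=0.5, alpha=0.3):
    for g_c in concept_guidances:        # one guidance vector per concept
        if cosine(guidance, g_c) < tau:  # model diverges from this concept
            guidance = guidance + alpha * g_c  # steer back toward it
    return guidance

rng = np.random.default_rng(1)
g = rng.normal(size=64)                  # combined guidance direction
concepts = [rng.normal(size=64) for _ in range(3)]
print(round(cosine(steer(g, concepts), concepts[0]), 3))
</pre>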
back
<p>Text-guided image-to-video (I2V) generation aims to generate a coherent video that preserves the identity of the input image and semantically aligns with the input prompt. Existing methods typically augment pretrained text-to-video (T2V) models by either concatenating the image with noised video frames channel-wise before feeding them into the model or injecting the image embedding produced by pretrained image encoders into cross-attention modules. However, the former approach often necessitates altering the fundamental weights of pretrained T2V models, thus restricting the model's compatibility within the open-source communities and disrupting the model's prior knowledge. Meanwhile, the latter typically fails to preserve the identity of the input image. We present I2V-Adapter to overcome such limitations. I2V-Adapter adeptly propagates the unnoised input image to subsequent noised frames through a cross-frame attention mechanism, maintaining the identity of the input image without any changes to the pretrained T2V model. Notably, I2V-Adapter introduces only a few trainable parameters, significantly alleviating the training cost, while also ensuring compatibility with existing community-driven personalized models and control tools. Moreover, we propose a novel Frame Similarity Prior to balance the motion amplitude and the stability of generated videos through two adjustable control coefficients. Our experimental results demonstrate that I2V-Adapter is capable of producing high-quality videos. This performance, coupled with its agility and adaptability, represents a substantial advancement in the field of I2V, particularly for personalized and controllable applications. </p>
back
<p>Inspired by human driving focus, this research pioneers networks augmented with Focusing Sampling, Partial Field of View Evaluation, an Enhanced FPN architecture, and a Directional IoU Loss - targeted innovations addressing obstacles to precise lane detection for autonomous driving. Experiments demonstrate that our Focusing Sampling strategy, which emphasizes vital distant details unlike uniform approaches, significantly boosts both benchmark performance and the practical curved/distant lane recognition accuracy essential for safety. While FENetV1 achieves state-of-the-art performance on conventional metrics via enhancements isolating perspective-aware contexts that mimic driver vision, FENetV2 proves most reliable under the proposed Partial Field analysis. Hence we specifically recommend V2 for practical lane navigation despite a fractional degradation on standard entire-image measures. Future directions include collecting on-road data and integrating complementary dual frameworks to drive further breakthroughs guided by human perception principles. The code is available at https://github.com/HanyangZhong/FENet. </p>
back
<p>Recent advances in large language models (LLMs) have led to the development of various evaluation benchmarks. These benchmarks typically rely on a single instruction template for evaluating all LLMs on a specific task. In this paper, we comprehensively analyze the brittleness of results obtained via single-prompt evaluations across 6.5M instances, involving 20 different LLMs and 39 tasks from 3 benchmarks. To improve the robustness of the analysis, we propose to evaluate LLMs with a set of diverse prompts instead. We discuss tailored evaluation metrics for specific use cases (e.g., LLM developers vs. developers interested in a specific downstream task), ensuring a more reliable and meaningful assessment of LLM capabilities. We then implement these criteria and conduct evaluations of multiple models, providing insights into the true strengths and limitations of current LLMs. </p>
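<p>The multi-prompt protocol can be pictured with a few lines of code: score the same model over a set of paraphrased templates and report the spread rather than a single-template number. The templates and the placeholder scorer below are invented for illustration. </p>
<pre>
# Minimal multi-prompt evaluation sketch; replace `model_accuracy` with a
# real benchmark run that formats each instance with the given template.
import random
import statistics

templates = [
    "Answer the following question: {q}",
    "Q: {q}\nA:",
    "Please respond to this question.\n{q}",
    "{q} Provide only the final answer.",
]

def model_accuracy(template: str) -> float:
    # Placeholder: deterministic dummy score standing in for a real run.
    return 0.70 + random.Random(template).uniform(-0.05, 0.05)

scores = [model_accuracy(t) for t in templates]
print(f"mean={statistics.mean(scores):.3f} min={min(scores):.3f} "
      f"max={max(scores):.3f} stdev={statistics.stdev(scores):.3f}")
</pre>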
back
<p>This paper introduces HAAQI-Net, a non-intrusive deep learning model for music quality assessment tailored to hearing aid users. In contrast to traditional methods like the Hearing Aid Audio Quality Index (HAAQI), HAAQI-Net utilizes a Bidirectional Long Short-Term Memory (BLSTM) network with attention. It takes an assessed music sample and a hearing loss pattern as input and generates a predicted HAAQI score. The model employs the pre-trained Bidirectional Encoder representation from Audio Transformers (BEATs) for acoustic feature extraction. Compared against ground-truth scores, HAAQI-Net achieves a Longitudinal Concordance Correlation (LCC) of 0.9368, a Spearman's Rank Correlation Coefficient (SRCC) of 0.9486, and a Mean Squared Error (MSE) of 0.0064. Notably, this high performance comes with a substantial reduction in inference time: from 62.52 seconds (by HAAQI) to 2.54 seconds (by HAAQI-Net), making it an efficient music quality assessment model for hearing aid users. </p>
back
<p>3D human body shape and pose estimation from RGB images is a challenging problem with potential applications in augmented/virtual reality, healthcare and fitness technology, and virtual retail. Recent solutions have focused on three types of inputs: i) single images, ii) multi-view images, and iii) videos. In this study, we surveyed and compared 3D body shape and pose estimation methods for contemporary dance and performing arts, with a special focus on human body pose and dressing, camera viewpoint, illumination conditions, and background conditions. We demonstrated that multi-frame methods, such as PHALP, provide better results than single-frame methods for pose estimation when dancers are performing contemporary dances. </p>
back
<p>In this paper, we present a novel transformer architecture tailored for learning robust power system state representations, which strives to optimize power dispatch for power flow adjustment across different transmission sections. Specifically, our proposed approach, named Powerformer, develops a dedicated section-adaptive attention mechanism, setting it apart from the self-attention used in conventional transformers. This mechanism effectively integrates power system states with transmission section information, which facilitates the development of robust state representations. Furthermore, by considering the graph topology of the power system and the electrical attributes of bus nodes, we introduce two customized strategies to further enhance expressiveness: graph neural network propagation and a multi-factor attention mechanism. Extensive evaluations are conducted on three power system scenarios, including the IEEE 118-bus system, a realistic 300-bus system in China, and a large-scale European system with 9241 buses, where Powerformer demonstrates its superior performance over several baseline methods. </p>
back
<p>This paper introduces RAISE (Reasoning and Acting through Scratchpad and Examples), an advanced architecture enhancing the integration of Large Language Models (LLMs) like GPT-4 into conversational agents. RAISE, an enhancement of the ReAct framework, incorporates a dual-component memory system, mirroring human short-term and long-term memory, to maintain context and continuity in conversations. It entails a comprehensive agent construction scenario, including phases like Conversation Selection, Scene Extraction, CoT Completion, and Scene Augmentation, leading to the LLMs Training phase. This approach appears to enhance agent controllability and adaptability in complex, multi-turn dialogues. Our preliminary evaluations in a real estate sales context suggest that RAISE has some advantages over traditional agents, indicating its potential for broader applications. This work contributes to the AI field by providing a robust framework for developing more context-aware and versatile conversational agents. </p>
back
<p>This paper addresses the challenge of example-based non-stationary texture synthesis. We introduce a novel two-step approach wherein users first modify a reference texture using standard image editing tools, yielding an initial rough target for the synthesis. Subsequently, our proposed method, termed "self-rectification", automatically refines this target into a coherent, seamless texture, while faithfully preserving the distinct visual characteristics of the reference exemplar. Our method leverages a pre-trained diffusion network and uses self-attention mechanisms to gradually align the synthesized texture with the reference, ensuring the retention of the structures in the provided target. Through experimental validation, our approach exhibits exceptional proficiency in handling non-stationary textures, demonstrating significant advancements in texture synthesis when compared to existing state-of-the-art techniques. Code is available at https://github.com/xiaorongjun000/Self-Rectification </p>
back
<p>There has been a growing interest in the task of generating sound for silent videos, primarily because of its practicality in streamlining video post-production. However, existing methods for video-sound generation attempt to create sound directly from visual representations, which can be challenging due to the difficulty of aligning visual representations with audio representations. In this paper, we present SonicVisionLM, a novel framework aimed at generating a wide range of sound effects by leveraging vision language models (VLMs). Instead of generating audio directly from video, we use the capabilities of powerful VLMs. When provided with a silent video, our approach first identifies events within the video using a VLM to suggest possible sounds that match the video content. This shift in approach transforms the challenging task of aligning image and audio into the better-studied sub-problems of aligning image-to-text and text-to-audio through popular diffusion models. To improve the quality of audio recommendations with LLMs, we have collected an extensive dataset that maps text descriptions to specific sound effects and developed temporally controlled audio adapters. Our approach surpasses current state-of-the-art methods for converting video to audio, resulting in enhanced synchronization with the visuals and improved alignment between audio and video components. Project page: https://yusiissy.github.io/SonicVisionLM.github.io/ </p>
back
<p>The success of drug discovery and development relies on the precise prediction of molecular activities and properties. While in silico molecular property prediction has shown remarkable potential, its use has so far been limited to assays for which large amounts of data are available. In this study, we use a fine-tuned large language model to integrate biological assays based on their textual information, coupled with Barlow Twins, a Siamese neural network using a novel self-supervised learning approach. This architecture uses both assay information and molecular fingerprints to extract the true molecular information. TwinBooster enables the prediction of properties of unseen bioassays and molecules, providing state-of-the-art zero-shot learning capabilities. Remarkably, our artificial intelligence pipeline shows excellent performance on the FS-Mol benchmark. This breakthrough demonstrates the application of deep learning to critical property prediction tasks where data is typically scarce. By accelerating the early identification of active molecules in drug discovery and development, this method has the potential to help streamline the identification of novel therapeutics. </p>
back
<p>In this paper, for any fixed integer $q&gt;2$, we construct $q$-ary codes correcting a burst of at most $t$ deletions with redundancy $\log n+8\log\log n+o(\log\log n)+\gamma_{q,t}$ bits and near-linear encoding/decoding complexity, where $n$ is the message length and $\gamma_{q,t}$ is a constant that depends only on $q$ and $t$. Previous works give constructions of such codes with redundancy $\log n+O(\log q\log\log n)$ bits or $\log n+O(t^2\log\log n)+O(t\log q)$ bits. The redundancy of our new construction is independent of $q$ and $t$ in the second term. </p>
back
<p>We propose Compact and Swift Segmenting 3D Gaussians (CoSSegGaussians), a method for compact 3D-consistent scene segmentation at fast rendering speed with only RGB images as input. Previous NeRF-based segmentation methods have relied on time-consuming neural scene optimization. While recent 3D Gaussian Splatting has notably improved speed, existing Gaussian-based segmentation methods struggle to produce compact masks, especially in zero-shot segmentation. This issue probably stems from their straightforward assignment of learnable parameters to each Gaussian, resulting in a lack of robustness against cross-view inconsistent 2D machine-generated labels. Our method aims to address this problem by employing a Dual Feature Fusion Network as the Gaussians' segmentation field. Specifically, we first optimize 3D Gaussians under RGB supervision. After Gaussian locating, DINO features extracted from images are applied through explicit unprojection and are further incorporated with spatial features from an efficient point cloud processing network. Feature aggregation is utilized to fuse them in a global-to-local strategy for compact segmentation features. Experimental results show that our model outperforms baselines on both semantic and panoptic zero-shot segmentation tasks, while consuming less than 10% of the inference time of NeRF-based methods. Code and more results will be available at https://David-Dou.github.io/CoSSegGaussians </p>
back
<p>Adversarial generative models, such as Generative Adversarial Networks (GANs), are widely applied for generating various types of data, e.g., images, text, and audio. Accordingly, their promising performance has led to GAN-based adversarial attack methods in white-box and black-box attack scenarios. The importance of transferable black-box attacks lies in their ability to be effective across different models and settings, which aligns more closely with real-world applications. However, it remains challenging for such methods to retain performance in terms of transferable adversarial examples. Meanwhile, we observe that some enhanced gradient-based transferable adversarial attack algorithms require prolonged time for adversarial sample generation. Thus, in this work, we propose a novel algorithm named GE-AdvGAN to enhance the transferability of adversarial samples whilst improving the algorithm's efficiency. The main approach is to optimise the training process of the generator parameters. With functional and characteristic similarity analysis, we introduce a novel gradient editing (GE) mechanism and verify its feasibility in generating transferable samples on various models. Moreover, by exploring frequency domain information to determine the gradient editing direction, GE-AdvGAN can generate highly transferable adversarial samples while minimizing execution time in comparison to state-of-the-art transferable adversarial attack algorithms. The performance of GE-AdvGAN is comprehensively evaluated through large-scale experiments on different datasets, and the results demonstrate the superiority of our algorithm. The code for our algorithm is available at: https://github.com/LMBTough/GE-advGAN </p>
back
<p>Multi-modal large language models have demonstrated impressive performance across various tasks in different modalities. However, existing multi-modal models primarily emphasize capturing global information within each modality while neglecting the importance of perceiving local information across modalities. Consequently, these models lack the ability to effectively understand the fine-grained details of input data, limiting their performance in tasks that require a more nuanced understanding. To address this limitation, there is a compelling need to develop models that enable fine-grained understanding across multiple modalities, thereby enhancing their applicability to a wide range of tasks. In this paper, we propose GroundingGPT, a language enhanced multi-modal grounding model. Beyond capturing global information like other multi-modal models, our proposed model excels at tasks demanding a detailed understanding of local information within the input. It demonstrates precise identification and localization of specific regions in images or moments in videos. To achieve this objective, we design a diversified dataset construction pipeline, resulting in a multi-modal, multi-granularity dataset for model training. The code, dataset, and demo of our model can be found at https://github.com/lzw-lzw/GroundingGPT. </p>
back
<p>We give complete presentations for the dagger-compact props of affine Lagrangian and coisotropic relations over an arbitrary field. This provides a unified family of graphical languages for both affinely constrained classical mechanical systems, as well as odd-prime-dimensional stabiliser quantum circuits. To this end, we present affine Lagrangian relations by a particular class of undirected coloured graphs. In order to reason about composite systems, we introduce a powerful scalable notation where the vertices of these graphs are themselves coloured by graphs. In the setting of stabiliser quantum mechanics, this scalable notation gives an extremely concise description of graph states, which can be composed via ``phased spider fusion.'' Likewise, in the classical mechanical setting of electrical circuits, we show that impedance matrices for reciprocal networks are presented in essentially the same way. </p>
back
<p>In this chapter, we investigate the mathematical foundation of the modeling and design of reconfigurable intelligent surfaces (RIS) in both the far- and near-field regimes. More specifically, we first present RIS-assisted wireless channel models for the far- and near-field regimes, discussing relevant phenomena, such as line-of-sight (LOS) and non-LOS links, rich and poor scattering, channel correlation, and array manifold. Subsequently, we introduce two general approaches for the RIS reflective beam design, namely optimization-based and analytical, which offer different degrees of design flexibility and computational complexity. Furthermore, we provide a comprehensive set of simulation results for the performance evaluation of the studied RIS beam designs and the investigation of the impact of the system parameters. </p>
back
<p>There are two common ways in which developers incorporate proprietary and domain-specific data when building applications of Large Language Models (LLMs): Retrieval-Augmented Generation (RAG) and fine-tuning. RAG augments the prompt with the external data, while fine-tuning incorporates the additional knowledge into the model itself. However, the pros and cons of both approaches are not well understood. In this paper, we propose a pipeline for fine-tuning and RAG, and present the tradeoffs of both for multiple popular LLMs, including Llama2-13B, GPT-3.5, and GPT-4. Our pipeline consists of multiple stages, including extracting information from PDFs, generating questions and answers, using them for fine-tuning, and leveraging GPT-4 for evaluating the results. We propose metrics to assess the performance of different stages of the RAG and fine-tuning pipeline. We conduct an in-depth study on an agricultural dataset. Agriculture as an industry has not seen much penetration of AI, and we study a potentially disruptive application - what if we could provide location-specific insights to a farmer? Our results show the effectiveness of our dataset generation pipeline in capturing geographic-specific knowledge, and the quantitative and qualitative benefits of RAG and fine-tuning. We see an accuracy increase of over 6 p.p. when fine-tuning the model, and this is cumulative with RAG, which increases accuracy by a further 5 p.p. In one particular experiment, we also demonstrate that the fine-tuned model leverages information from across geographies to answer specific questions, increasing answer similarity from 47% to 72%. Overall, the results point to how systems built using LLMs can be adapted to respond to and incorporate knowledge across a dimension that is critical for a specific industry, paving the way for further applications of LLMs in other industrial domains. </p>
back
<p>Human participants play a central role in the development of modern artificial intelligence (AI) technology, in psychological science, and in user research. Recent advances in generative AI have attracted growing interest in the possibility of replacing human participants in these domains with AI surrogates. We survey several such "substitution proposals" to better understand the arguments for and against substituting human participants with modern generative AI. Our scoping review indicates that the recent wave of these proposals is motivated by goals such as reducing the costs of research and development work and increasing the diversity of collected data. However, these proposals ignore and ultimately conflict with foundational values of work with human participants: representation, inclusion, and understanding. This paper critically examines the principles and goals underlying human participation to help chart out paths for future work that truly centers and empowers participants. </p>
back
<p>As deep neural networks are more commonly deployed in high-stakes domains, their lack of interpretability makes uncertainty quantification challenging. We investigate the effects of presenting conformal prediction sets -- a method for generating valid confidence sets in distribution-free uncertainty quantification -- to express uncertainty in AI-advised decision-making. Through a large online experiment, we compare the utility of conformal prediction sets to displays of Top-$1$ and Top-$k$ predictions for AI-advised image labeling. We find that the utility of prediction sets for accuracy varies with the difficulty of the task: while they result in accuracy on par with or less than Top-$1$ and Top-$k$ displays for easy images, prediction sets excel at assisting humans in labeling out-of-distribution (OOD) images, especially when the set size is small. Our results empirically pinpoint the practical challenges of conformal prediction sets and provide implications for how to incorporate them into real-world decision-making. </p>
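<p>For readers new to the technique, below is a standard split-conformal construction of prediction sets for classification; it is a generic sketch of the method the study presents to participants, not the experiment's interface, and the synthetic probabilities are illustrative. </p>
<pre>
# Split-conformal prediction sets: calibrate a score threshold on held-out
# data, then include every class whose score clears it at test time.
import numpy as np

def conformal_sets(cal_probs, cal_labels, test_probs, alpha=0.1):
    n = len(cal_labels)
    scores = 1.0 - cal_probs[np.arange(n), cal_labels]  # 1 - p(true class)
    # Finite-sample-corrected quantile gives >= 1-alpha marginal coverage.
    q = np.quantile(scores, np.ceil((n + 1) * (1 - alpha)) / n, method="higher")
    return [np.where(1.0 - p <= q)[0] for p in test_probs]

rng = np.random.default_rng(0)
cal_probs = rng.dirichlet(np.ones(10) * 0.3, size=500)   # stand-in softmax
cal_labels = np.array([rng.choice(10, p=p) for p in cal_probs])
test_probs = rng.dirichlet(np.ones(10) * 0.3, size=3)
for s in conformal_sets(cal_probs, cal_labels, test_probs):
    print(s)  # small sets on confident examples, large on ambiguous ones
</pre>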
back
<p>Despite the advancements in large language models (LLMs) for mathematical reasoning, solving competition-level math problems remains a significant challenge, especially for open-source LLMs without external tools. We introduce the MMIQC dataset, comprising a mixture of processed web data and synthetic question-response pairs, aimed at enhancing the mathematical reasoning capabilities of base language models. Models fine-tuned on MMIQC consistently surpass their counterparts in performance on the MATH benchmark across various model sizes. Notably, Qwen-72B-MMIQC achieves a 45.0% accuracy, exceeding the previous open-source state-of-the-art by 8.2% and outperforming the initial version of GPT-4 released in 2023. Extensive evaluation results on Hungarian high school finals suggest that such improvement can generalize to unseen data. Our ablation study on MMIQC reveals that a large part of the improvement can be attributed to our novel augmentation method, Iterative Question Composing (IQC), which involves iteratively composing new questions from seed problems using an LLM and applying rejection sampling through another LLM. The MMIQC dataset is available on the HuggingFace hub at https://huggingface.co/datasets/Vivacem/MMIQC. Our code is available at https://github.com/iiis-ai/IterativeQuestionComposing. </p>
back
<p>Whilst spectral Graph Neural Networks (GNNs) are theoretically well-founded in the spectral domain, their practical reliance on polynomial approximation implies a profound linkage to the spatial domain. As previous studies rarely examine spectral GNNs from the spatial perspective, their spatial-domain interpretability remains elusive, e.g., what information is essentially encoded by spectral GNNs in the spatial domain? In this paper, to answer this question, we establish a theoretical connection between spectral filtering and spatial aggregation, unveiling an intrinsic interaction: spectral filtering implicitly transforms the original graph into an adapted new graph, which is explicitly computed for spatial aggregation. Both theoretical and empirical investigations reveal that the adapted new graph not only exhibits non-locality but also accommodates signed edge weights to reflect label consistency among nodes. These findings thus highlight the interpretable role of spectral GNNs in the spatial domain and inspire us to rethink graph spectral filters beyond fixed-order polynomials, which neglect global information. Built upon the theoretical findings, we revisit state-of-the-art spectral GNNs and propose a novel Spatially Adaptive Filtering (SAF) framework, which leverages the adapted new graph from spectral filtering for an auxiliary non-local aggregation. Notably, our proposed SAF comprehensively models both node similarity and dissimilarity from a global perspective, thereby alleviating persistent deficiencies of GNNs related to long-range dependencies and graph heterophily. Extensive experiments over 13 node classification benchmarks demonstrate the superiority of our proposed framework over state-of-the-art models. </p>
back
<p>E-commerce customers frequently seek detailed product information for purchase decisions, commonly contacting sellers directly with extended queries. This manual response requirement imposes additional costs and disrupts buyers' shopping experience, with response times fluctuating from hours to days. We seek to automate buyer inquiries to sellers in a leading e-commerce store using a domain-specific federated Question Answering (QA) system. The main challenge is adapting current QA systems, designed for single questions, to address detailed customer queries. We address this with a low-latency, sequence-to-sequence approach, MESSAGE-TO-QUESTION (M2Q). It reformulates buyer messages into succinct questions by identifying and extracting the most salient information from a message. Evaluation against baselines shows that M2Q yields relative increases of 757% in question understanding and 1,746% in answering rate from the federated QA system. Live deployment shows that automatic answering saves sellers from manually responding to millions of messages per year, and also accelerates customer purchase decisions by eliminating the need for buyers to wait for a reply. </p>
back
<p>Study Objectives: Polysomnography (PSG) currently serves as the benchmark for evaluating sleep disorders. Its discomfort, impracticality for home use, and introduction of bias in sleep quality assessment necessitate the exploration of less invasive, cost-effective, and portable alternatives. One promising contender is the in-ear-EEG sensor, which offers advantages in terms of comfort, fixed electrode positions, resistance to electromagnetic interference, and user-friendliness. This study aims to establish a methodology to assess the similarity between the in-ear-EEG signal and standard PSG. </p> <p>Methods: We assess the agreement between the PSG- and in-ear-EEG-derived hypnograms. We extract features in the time and frequency domains from PSG and in-ear-EEG 30-second epochs. We only consider the epochs where the PSG scorers and the in-ear-EEG scorers were in agreement. We introduce a methodology to quantify the similarity between PSG derivations and the single-channel in-ear-EEG. The approach relies on a comparison of distributions of selected features -- extracted for each sleep stage and subject on both PSG and the in-ear-EEG signals -- via a Jensen-Shannon Divergence Feature-based Similarity Index (JSD-FSI). </p> <p>Results: We found a high intra-scorer variability, mainly due to the uncertainty the scorers had in evaluating the in-ear-EEG signals. We show that the similarity between PSG and in-ear-EEG signals is high (JSD-FSI: 0.61 +/- 0.06 in awake, 0.60 +/- 0.07 in NREM and 0.51 +/- 0.08 in REM), and in line with the similarity values computed independently on standard PSG channel combinations. </p> <p>Conclusions: In-ear-EEG is a valuable solution for home-based sleep monitoring; however, further studies with a larger and more heterogeneous dataset are needed. </p>
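<p>A hedged sketch of how such a Jensen-Shannon-divergence-based similarity index can be computed for one feature is given below; the histogramming choices and the 1 - JSD convention are assumptions for illustration, not the paper's exact definition. </p>
<pre>
# Illustrative JSD-based feature similarity between two signals' feature
# distributions (e.g., band power per 30-s epoch); higher = more similar.
import numpy as np
from scipy.spatial.distance import jensenshannon

def jsd_fsi(feat_a, feat_b, bins=50):
    lo = min(feat_a.min(), feat_b.min())
    hi = max(feat_a.max(), feat_b.max())
    p, _ = np.histogram(feat_a, bins=bins, range=(lo, hi))
    q, _ = np.histogram(feat_b, bins=bins, range=(lo, hi))
    # jensenshannon normalizes its inputs and returns a distance in [0, 1]
    # for base 2, so 1 - distance acts as a similarity index.
    return 1.0 - jensenshannon(p, q, base=2)

rng = np.random.default_rng(0)
psg = rng.normal(0.0, 1.0, 3000)     # feature values from a PSG derivation
inear = rng.normal(0.2, 1.1, 3000)   # feature values from in-ear-EEG
print(f"similarity: {jsd_fsi(psg, inear):.2f}")
</pre>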
back
<p>Tactics, Techniques and Procedures (TTPs) represent sophisticated attack patterns in the cybersecurity domain, described encyclopedically in textual knowledge bases. Identifying TTPs in cybersecurity writing, often called TTP mapping, is an important and challenging task. Conventional learning approaches often target the problem in the classical multi-class or multilabel classification setting. This setting hinders the learning ability of the model due to a large number of classes (i.e., TTPs), the inevitable skewness of the label distribution and the complex hierarchical structure of the label space. We formulate the problem in a different learning paradigm, where the assignment of a text to a TTP label is decided by the direct semantic similarity between the two, thus reducing the complexity of competing solely over the large labeling space. To that end, we propose a neural matching architecture with an effective sampling-based learn-to-compare mechanism, facilitating the learning process of the matching model despite constrained resources. </p>
back
<p>This study explores the nuanced capabilities and inherent biases of Recommender Systems using Large Language Models (RecLLMs), with a focus on ChatGPT-based systems. It examines the contrasting behaviors of generative models and traditional collaborative filtering (CF) models in movie recommendations. The research primarily investigates prompt design strategies and their impact on various aspects of recommendation quality, including accuracy, provider fairness, diversity, stability, genre dominance, and temporal freshness (recency). </p> <p>Our experimental analysis reveals that the introduction of specific 'system roles' and 'prompt strategies' in RecLLMs significantly influences their performance. For instance, role-based prompts enhance fairness and diversity in recommendations, mitigating popularity bias. We find that while GPT-based models do not always match the performance of CF baselines, they exhibit a unique tendency to recommend newer and more diverse movie genres. Notably, GPT-based models tend to recommend more recent films, particularly those released post-2000, and show a preference for genres like 'Drama', Comedy, and Romance (whereas CF models favor Action and Adventure), presumably because the RecLLMs' training on varied data sets allows them to capture recent trends and discussions more effectively than CF models. Interestingly, our results demonstrate that the 'Simple' and 'Chain of Thought (COT)' paradigms yield the highest accuracy. These findings imply the potential of combining these strategies with scenarios that favor more recent content, thereby offering a more balanced and up-to-date recommendation experience. This study contributes significantly to the understanding of emerging RecLLMs, particularly in the context of harms and biases within these systems. </p>
back
<p>Electroencephalography (EEG) signals are frequently used for various Brain-Computer Interface (BCI) tasks. While Deep Learning (DL) techniques have shown promising results, they are hindered by the substantial data requirements. By leveraging data from multiple subjects, transfer learning enables more effective training of DL models. A technique that is gaining popularity is Euclidean Alignment (EA) due to its ease of use, low computational complexity, and compatibility with Deep Learning models. However, few studies evaluate its impact on the training performance of shared and individual DL models. In this work, we systematically evaluate the effect of EA combined with DL for decoding BCI signals. We used EA to train shared models with data from multiple subjects and evaluated its transferability to new subjects. Our experimental results show that it improves decoding in the target subject by 4.33% and decreases convergence time by more than 70%. We also trained individual models for each subject to use as a majority-voting ensemble classifier. In this scenario, using EA improved the 3-model ensemble accuracy by 3.7%. However, when compared to the shared model with EA, the ensemble accuracy was 3.62% lower. </p>
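<p>Euclidean Alignment itself is compact enough to sketch: each subject's trials are whitened by the inverse square root of their mean covariance so that data from different subjects become more comparable. The sketch below is a generic EA implementation, not the paper's training code, and the data shapes are illustrative. </p>
<pre>
# Euclidean Alignment (EA) sketch: after alignment, the mean covariance of a
# subject's trials is approximately the identity matrix.
import numpy as np
from scipy.linalg import inv, sqrtm

def euclidean_alignment(trials):
    # trials: (n_trials, n_channels, n_samples) for one subject
    covs = np.array([t @ t.T / t.shape[1] for t in trials])
    R = covs.mean(axis=0)                 # reference (mean) covariance
    R_inv_sqrt = np.real(inv(sqrtm(R)))   # R^{-1/2}
    return np.array([R_inv_sqrt @ t for t in trials])

rng = np.random.default_rng(0)
subject = rng.normal(size=(40, 22, 250))  # e.g., 22-channel EEG trials
aligned = euclidean_alignment(subject)
mean_cov = np.mean([t @ t.T / t.shape[1] for t in aligned], axis=0)
print(np.round(mean_cov[:3, :3], 2))      # ~ identity block
</pre>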
back
<p>Embedding methods transform a knowledge graph into a continuous, low-dimensional space, facilitating inference and completion tasks. Existing methods are mainly divided into two types: translational distance models and semantic matching models. A key challenge for translational distance models is their inability to effectively differentiate between 'head' and 'tail' entities in graphs. To address this problem, a novel location-sensitive embedding (LSE) method has been developed. LSE innovatively modifies the head entity using relation-specific mappings, conceptualizing relations as linear transformations rather than mere translations. The theoretical foundations of LSE, including its representational capabilities and its connections to existing models, have been thoroughly examined. A more streamlined variant, LSE-d, which employs a diagonal matrix for transformations to enhance practical efficiency, is also proposed. Experiments conducted on four large-scale KG datasets for link prediction show that LSE-d either outperforms or is competitive with state-of-the-art related works. </p>
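<p>The contrast with pure translation models can be written in one line per scoring function; the sketch below uses an L2 norm and random vectors purely for illustration, and only mirrors the general form described above rather than the paper's exact parameterization. </p>
<pre>
# Toy scoring functions: translation only (TransE-style) vs. a
# location-sensitive variant that first maps the head entity through a
# relation-specific linear transformation (full matrix, or diagonal as in
# the LSE-d simplification).
import numpy as np

d = 8
rng = np.random.default_rng(0)
h, r, t = rng.normal(size=(3, d))    # head, relation, tail embeddings
M_r = rng.normal(size=(d, d))        # relation-specific matrix (LSE-like)
m_r = rng.normal(size=d)             # diagonal stored as a vector (LSE-d-like)

print(np.linalg.norm(h + r - t))         # translation only
print(np.linalg.norm(M_r @ h + r - t))   # linear transformation + translation
print(np.linalg.norm(m_r * h + r - t))   # diagonal matrix == elementwise scale
</pre>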
back
<p>In the current landscape of online abuses and harms, effective content moderation is necessary to cultivate safe and inclusive online spaces. Yet, the effectiveness of many moderation interventions is still unclear. Here, we assess the effectiveness of The Great Ban, a massive deplatforming operation that affected nearly 2,000 communities on Reddit. By analyzing 16M comments posted by 17K users during 14 months, we provide nuanced results on the effects, both desired and otherwise, of the ban. Among our main findings is that 15.6% of the affected users left Reddit and that those who remained reduced their toxicity by 6.6% on average. The ban also caused 5% of users to increase their toxicity by more than 70% of their pre-ban level. However, these resentful users likely had limited impact on Reddit due to low activity and little support from peers. Overall, our multifaceted results provide new insights into the efficacy of deplatforming. Our findings can inform the development of future moderation interventions and the policing of online platforms. </p>
back
<p>Graph classification is an important learning task for graph-structured data. Graph neural networks (GNNs) have recently gained growing attention in graph learning and have shown significant improvements on many important graph problems. Despite their state-of-the-art performance, existing GNNs only use local information from a very limited neighborhood around each node, suffering from loss of multi-modal information and excessive computational overhead. To address these issues, we propose a novel Tensor-view Topological Graph Neural Network (TTG-NN), a class of simple yet effective topological deep learning built upon persistent homology, graph convolution, and tensor operations. This new method incorporates tensor learning to simultaneously capture Tensor-view Topological (TT), as well as Tensor-view Graph (TG), structural information on both local and global levels. Computationally, to fully exploit graph topology and structure, we propose two flexible TT and TG representation learning modules that disentangle feature tensor aggregation and transformation and learn to preserve multi-modal structure with less computation. Theoretically, we derive high probability bounds on both the out-of-sample and in-sample mean squared approximation errors for our proposed Tensor Transformation Layer (TTL). Real data experiments show that the proposed TTG-NN outperforms 20 state-of-the-art methods on various graph benchmarks. </p>
back
<p>Computational modeling of the melt pool dynamics in laser-based powder bed fusion metal additive manufacturing (PBF-LB/M) promises to shed light on fundamental defect generation mechanisms. These processes are typically accompanied by rapid evaporation, so that the evaporation-induced recoil pressure and cooling arise as major driving forces for fluid dynamics and temperature evolution. The magnitude of these interface fluxes depends exponentially on the melt pool surface temperature, which, therefore, has to be predicted with high accuracy. The present work utilizes a diffuse interface model based on a continuum surface flux (CSF) description of the interfaces to study dimensionally reduced thermal two-phase problems representing PBF-LB/M in a finite element framework. It is demonstrated that the extreme temperature gradients, combined with the high ratios of material properties between metal and ambient gas, lead to significant errors in the interface temperatures and fluxes when classical CSF approaches, along with typical interface thicknesses and discretizations, are applied. A novel parameter-scaled CSF approach is proposed, which is constructed to yield a smoother temperature rate in the diffuse interface region, significantly increasing the solution accuracy. The interface thickness required to predict the temperature field with a given level of accuracy is less restrictive by at least one order of magnitude for the proposed parameter-scaled CSF approach compared to classical CSF, drastically reducing computational costs. Finally, we showcase the general applicability of the parameter-scaled CSF to a three-dimensional simulation of stationary laser melting of PBF-LB/M considering the fully coupled thermo-hydrodynamic multi-phase problem, including phase change. </p>
back
<p>In the open source software (OSS) ecosystem, there exists a complex software supply chain, where developers upstream and downstream widely borrow and reuse code. This results in the widespread occurrence of recurring defects, missing fixes, and propagation issues. These are collectively referred to as cognate defects, and their scale and threats have not received extensive attention or systematic research. Software composition analysis and code clone detection methods are unable to cover the various variant issues in the supply chain scenario, while code static analysis, or static application security testing (SAST), techniques struggle to target specific defects. In this paper, we propose a novel technique for detecting cognate defects in OSS through the automatic generation of SAST rules. Specifically, it extracts key syntax and semantic information from pre- and post-patch versions of code through structural comparison and control-flow-to-data-flow analysis, and generates rules that match these key elements. We have implemented a prototype tool called Patch2QL and applied it to fundamental OSS in C/C++. In experiments, we discovered 7 new vulnerabilities with medium to critical severity in the most popular upstream software, as well as numerous potential security issues. When analyzing downstream projects in the supply chain, we found a significant number of representative cognate defects, clarifying the threat posed by this issue. Additionally, compared to general-purpose SAST and signature-based mechanisms, the generated rules perform better at discovering all variants of cognate defects. </p>
back
<p>The dynamic nature of language, particularly evident in the realm of slang and memes on the Internet, poses serious challenges to the adaptability of large language models (LLMs). Traditionally anchored to static datasets, these models often struggle to keep up with the rapid linguistic evolution characteristic of online communities. This research addresses the critical need to bridge this gap, aiming to enhance LLMs' comprehension of evolving new concepts on the internet without the high cost of continual retraining. To address this issue, we propose a new benchmark, $\textbf{SLANG}$, which autonomously integrates novel data to keep the dataset up-to-date, to assess LLMs' capability in comprehending emerging concepts, and an approach, $\textbf{FOCUS}$, which uses causal inference to enhance LLMs' understanding of new phrases and their colloquial context. The benchmark and approach involve digesting real-world instances of linguistic shifts, serving as contextual beacons, to form more precise and contextually relevant connections between newly emerging expressions and their intended meanings. The empirical analysis shows that our causal inference-based approach outperforms traditional models in terms of precision and relevance in the interpretation of internet slang and memes. </p>
back
<p>Multiview clustering (MVC) segregates data samples into meaningful clusters by synthesizing information across multiple views. Moreover, deep learning-based methods have demonstrated their strong feature learning capabilities in MVC scenarios. However, effectively generalizing feature representations while maintaining consistency is still an intractable problem. In addition, most existing deep clustering methods based on contrastive learning overlook the consistency of the clustering representations during the clustering process. In this paper, we show how the above problems can be overcome and propose a consistent enhancement-based deep MVC method via contrastive learning (CCEC). Specifically, semantic connection blocks are incorporated into a feature representation to preserve the consistent information among multiple views. Furthermore, the representation process for clustering is enhanced through spectral clustering, and the consistency across multiple views is improved. Experiments conducted on five datasets demonstrate the effectiveness and superiority of our method in comparison with the state-of-the-art (SOTA) methods. The code for this method can be accessed at https://anonymous.4open.science/r/CCEC-E84E/. </p>
back
<p>Gaze interaction presents a promising avenue in Virtual Reality (VR) due to its intuitive and efficient user experience. Yet, the depth control inherent in our visual system remains underutilized in current methods. In this study, we introduce FocusFlow, a hands-free interaction method that capitalizes on human visual depth perception within the 3D scenes of Virtual Reality. We first develop a binocular visual depth detection algorithm to understand eye input characteristics. We then propose a layer-based user interface and introduce the concept of a 'Virtual Window' that offers an intuitive and robust gaze-depth VR interaction, despite the limited accuracy and precision of visual depth estimation at farther distances. Finally, to help novice users actively manipulate their visual depth, we propose two learning strategies that use different visual cues to help users master visual depth control. Our user studies with 24 participants demonstrate the usability of our proposed virtual window concept as a gaze-depth interaction method. In addition, our findings reveal that the user experience can be enhanced through an effective learning process with adaptive visual cues, helping users to develop muscle memory for this brand-new input mechanism. We conclude the paper by discussing strategies to optimize learning and potential research topics of gaze-depth interaction. </p>
back
<p>Despite much progress, achieving real-time high-fidelity head avatar animation is still difficult and existing methods have to trade-off between speed and quality. 3DMM based methods often fail to model non-facial structures such as eyeglasses and hairstyles, while neural implicit models suffer from deformation inflexibility and rendering inefficiency. Although 3D Gaussian has been demonstrated to possess promising capability for geometry representation and radiance field reconstruction, applying 3D Gaussian in head avatar creation remains a major challenge since it is difficult for 3D Gaussian to model the head shape variations caused by changing poses and expressions. In this paper, we introduce PSAvatar, a novel framework for animatable head avatar creation that utilizes discrete geometric primitive to create a parametric morphable shape model and employs 3D Gaussian for fine detail representation and high fidelity rendering. The parametric morphable shape model is a Point-based Morphable Shape Model (PMSM) which uses points instead of meshes for 3D representation to achieve enhanced representation flexibility. The PMSM first converts the FLAME mesh to points by sampling on the surfaces as well as off the meshes to enable the reconstruction of not only surface-like structures but also complex geometries such as eyeglasses and hairstyles. By aligning these points with the head shape in an analysis-by-synthesis manner, the PMSM makes it possible to utilize 3D Gaussian for fine detail representation and appearance modeling, thus enabling the creation of high-fidelity avatars. We show that PSAvatar can reconstruct high-fidelity head avatars of a variety of subjects and the avatars can be animated in real-time ($\ge$ 25 fps at a resolution of 512 $\times$ 512 ). </p>
back
<p>We introduce Coverage Axis++, a novel and efficient approach to 3D shape skeletonization. The current state-of-the-art approaches for this task often rely on the watertightness of the input or suffer from substantial computational costs, thereby limiting their practicality. To address this challenge, Coverage Axis++ proposes a heuristic algorithm to select skeletal points, offering a high-accuracy approximation of the Medial Axis Transform (MAT) while significantly mitigating computational intensity for various shape representations. We introduce a simple yet effective strategy that considers both shape coverage and uniformity to derive skeletal points. The selection procedure enforces consistency with the shape structure while favoring the dominant medial balls, which thus introduces a compact underlying shape representation in terms of MAT. As a result, Coverage Axis++ allows for skeletonization for various shape representations (e.g., water-tight meshes, triangle soups, point clouds), specification of the number of skeletal points, few hyperparameters, and highly efficient computation with improved reconstruction accuracy. Extensive experiments across a wide range of 3D shapes validate the efficiency and effectiveness of Coverage Axis++. The code will be publicly available once the paper is published. </p>
back
<p>Large-scale neural language models exhibit a remarkable capacity for in-context learning (ICL): they can infer novel functions from datasets provided as input. Most of our current understanding of when and how ICL arises comes from LMs trained on extremely simple learning problems like linear regression and associative recall. There remains a significant gap between these model problems and the "real" ICL exhibited by LMs trained on large text corpora, which involves not just retrieval and function approximation but free-form generation of language and other structured outputs. In this paper, we study ICL through the lens of a new family of model problems we term in-context language learning (ICLL). In ICLL, LMs are presented with a set of strings from a formal language, and must generate additional strings from the same language. We focus on in-context learning of regular languages generated by random finite automata. We evaluate a diverse set of neural sequence models (including several RNNs, Transformers, and state-space model variants) on regular ICLL tasks, aiming to answer three questions: (1) Which model classes are empirically capable of ICLL? (2) What algorithmic solutions do successful models implement to perform ICLL? (3) What architectural changes can improve ICLL in less performant models? We first show that Transformers significantly outperform neural sequence models with recurrent or convolutional representations on ICLL tasks. Next, we provide evidence that their ability to do so relies on specialized "n-gram heads" (higher-order variants of induction heads) that compute input-conditional next-token distributions. Finally, we show that hard-wiring these heads into neural models improves performance not just on ICLL, but on natural language modeling -- improving the perplexity of 340M-parameter models by up to 1.14 points (6.7%) on the SlimPajama dataset. </p>
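<p>To make the ICLL setup concrete, the sketch below samples a random DFA and rejection-samples strings that it accepts, the kind of in-context examples the paper describes; state count, alphabet, and sampling scheme are illustrative choices. </p>
<pre>
# Minimal ICLL-style task instance: random DFA + strings from its language.
import random

def random_dfa(n_states=4, alphabet="ab", seed=0):
    rng = random.Random(seed)
    trans = {(s, a): rng.randrange(n_states)
             for s in range(n_states) for a in alphabet}
    accept = {s for s in range(n_states) if rng.random() < 0.5} or {0}
    return trans, accept

def sample_string(trans, accept, alphabet="ab", max_len=12, seed=0, tries=1000):
    rng = random.Random(seed)
    for _ in range(tries):  # rejection-sample walks ending in accepting states
        state, out = 0, []
        for _ in range(rng.randrange(1, max_len)):
            symbol = rng.choice(alphabet)
            out.append(symbol)
            state = trans[(state, symbol)]
        if state in accept:
            return "".join(out)
    return None

trans, accept = random_dfa()
print([sample_string(trans, accept, seed=i) for i in range(5)])
</pre>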
back
<p>Large Language Models (LLMs) have demonstrated remarkable success in various natural language processing and software engineering tasks, such as code generation. LLMs are mainly utilized in the prompt-based zero/few-shot paradigm, which guides the model in accomplishing the task. GPT-based models are among the most popular ones studied for tasks such as code comment generation or test generation. These tasks are `generative' tasks. However, there is limited research on the usage of LLMs for `non-generative' tasks such as classification using the prompt-based paradigm. In this preliminary exploratory study, we investigated the applicability of LLMs for Code Clone Detection (CCD), a non-generative task. By building a mono-lingual and cross-lingual CCD dataset derived from CodeNet, we first investigated two different prompts using ChatGPT to detect Type-4 code clones in Java-Java and Java-Ruby pairs in a zero-shot setting. We then conducted an analysis to understand the strengths and weaknesses of ChatGPT in CCD. ChatGPT surpasses the baselines in cross-language CCD, attaining an F1-score of 0.877, and achieves comparable performance to fully fine-tuned models for mono-lingual CCD, with an F1-score of 0.878. Also, the prompt and the difficulty level of the problems have an impact on the performance of ChatGPT. Finally, we provide insights and future directions based on our initial analysis. </p>
back
<p>Social media feeds are deeply personal spaces that reflect individual values and preferences. However, top-down, platform-wide content algorithms can reduce users' sense of agency and fail to account for nuanced experiences and values. Drawing on the paradigm of interactive machine teaching (IMT), an interaction framework for non-expert algorithmic adaptation, we map out a design space for teachable social media feed experiences to empower agential, personalized feed curation. To do so, we conducted a think-aloud study (N=24) featuring four social media platforms -- Instagram, Mastodon, TikTok, and Twitter -- to understand key signals users leveraged to determine the value of a post in their feed. We synthesized users' signals into taxonomies that, when combined with user interviews, inform five design principles that extend IMT into the social media setting. Finally, we embody our principles in three feed designs that we present as sensitizing concepts for teachable feed experiences moving forward. </p>
back
<p>Large-scale text-to-image generative models have made impressive strides, showcasing their ability to synthesize a vast array of high-quality images. However, adapting these models for artistic image editing presents two significant challenges. Firstly, users struggle to craft textual prompts that meticulously detail visual elements of the input image. Secondly, prevalent models, when effecting modifications in specific zones, frequently disrupt the overall artistic style, complicating the attainment of cohesive and aesthetically unified artworks. To surmount these obstacles, we build the innovative unified framework CreativeSynth, which is based on a diffusion model with the ability to coordinate multimodal inputs and multitask in the field of artistic image generation. By integrating multimodal features with customized attention mechanisms, CreativeSynth facilitates the importation of real-world semantic content into the domain of art through inversion and real-time style transfer. This allows for the precise manipulation of image style and content while maintaining the integrity of the original model parameters. Rigorous qualitative and quantitative evaluations underscore that CreativeSynth excels in enhancing artistic images' fidelity and preserves their innate aesthetic essence. By bridging the gap between generative models and artistic finesse, CreativeSynth becomes a custom digital palette. </p>
back
<p>Indexed Linear Logic, introduced by Ehrhard and Bucciarelli, can be seen as a logical presentation of non-idempotent intersection types, extended through the relational semantics to full linear logic. We introduce an idempotent variant of Indexed Linear Logic. We give a fine-grained reformulation of the syntax by exposing implicit parameters and by unifying several operations on formulae via the notion of base change. Idempotency is achieved by means of an appropriate subtyping relation. We carry out an in-depth study of indLL as a logic, showing how it determines a refinement of classical linear logic and establishing a terminating cut-elimination procedure. Cut-elimination is proved to be confluent up to an appropriate congruence induced by the subtyping relation. </p>
back
<p>This work considers an asynchronous $\textsf{K}_\text{a}$-active-user unsourced multiple access channel (AUMAC) with the worst-case asynchronicity. The transmitted messages must be decoded within $n$ channel uses, while some codewords are not completely received due to asynchronicities. We impose a constraint on the largest allowed transmission delay. The AUMAC lacks the permutation-invariant property of the synchronous UMAC since different permutations of the same codewords with a fixed asynchronicity are distinguishable. Hence, the analyses require calculating all $2^{\textsf{K}_\text{a}}-1$ combinations of erroneously decoded messages. Moreover, transmitters cannot adapt the corresponding codebooks according to asynchronicity due to a lack of information on asynchronicities. To overcome this challenge, a uniform bound of the per-user probability of error (PUPE) is derived by investigating the worst case of the asynchronous patterns with the delay constraint. Numerical results show the trade-off between the energy-per-bit and the number of active users for different delay constraints. In addition, although the asynchronous transmission reduces interference, the required energy-per-bit increases as the receiver decodes with incompletely received codewords, compared to the synchronous case. </p>
back
<p>We study the (quantum) security of pseudorandom generators (PRGs) constructed from random oracles. We prove a "lifting theorem" showing, roughly, that if such a PRG is unconditionally secure against classical adversaries making polynomially many queries to the random oracle, then it is also (unconditionally) secure against quantum adversaries in the same sense. As a result of independent interest, we also show that any pseudo-deterministic quantum-oracle algorithm (i.e., a quantum algorithm that with high probability returns the same value on repeated executions) can be simulated by a computationally unbounded but query-bounded classical-oracle algorithm with only a polynomial blowup in the number of queries. This implies as a corollary that our lifting theorem holds even for PRGs that themselves make quantum queries to the random oracle. </p>
back
<p>Recent TTS models with decoder-only Transformer architecture, such as SPEAR-TTS and VALL-E, achieve impressive naturalness and demonstrate the ability for zero-shot adaptation given a speech prompt. However, such decoder-only TTS models lack monotonic alignment constraints, sometimes leading to hallucination issues such as mispronunciation, word skipping and repeating. To address this limitation, we propose VALL-T, a generative Transducer model that introduces shifting relative position embeddings for the input phoneme sequence, explicitly indicating the monotonic generation process while maintaining the architecture of decoder-only Transformer. Consequently, VALL-T retains the capability of prompt-based zero-shot adaptation and demonstrates better robustness against hallucinations with a relative reduction of 28.3% in the word error rate. Furthermore, the controllability of alignment in VALL-T during decoding facilitates the use of untranscribed speech prompts, even in unknown languages. It also enables the synthesis of lengthy speech by utilizing an aligned context window. </p>
back
<p>While Bangla is considered a language with limited resources, sentiment analysis has been a subject of extensive research in the literature. Nevertheless, there is a scarcity of exploration into sentiment analysis specifically in the realm of noisy Bangla texts. In this paper, we introduce a dataset (NC-SentNoB) that we annotated manually to identify ten different types of noise found in a pre-existing sentiment analysis dataset comprising around 15K noisy Bangla texts. First, given a noisy input text, we identify the noise type, addressing this as a multi-label classification task. Then, we introduce baseline noise reduction methods to alleviate noise prior to conducting sentiment analysis. Finally, we assess the performance of fine-tuned sentiment analysis models with both noisy and noise-reduced texts to make comparisons. The experimental findings indicate that the noise reduction methods utilized are not satisfactory, highlighting the need for more suitable noise reduction methods in future research endeavors. We have made the implementation and dataset presented in this paper publicly available at https://github.com/ktoufiquee/A-Comparative-Analysis-of-Noise-Reduction-Methods-in-Sentiment-Analysis-on-Noisy-Bangla-Texts </p>
back
<p>Prompt design and engineering has become an important discipline in just the past few months. In this paper, we provide an introduction to the main concepts and design approaches. We also cover more advanced techniques, up to those needed to design LLM-based agents. We finish by providing a list of existing tools for prompt engineering. </p>
back
<p>Finding a concise and interpretable mathematical formula that accurately describes the relationship between each variable and the predicted value in the data is a crucial task in scientific research, as well as a significant challenge in artificial intelligence. This problem is referred to as symbolic regression, which is an NP-hard problem. In the past year, a novel symbolic regression methodology utilizing Monte Carlo Tree Search (MCTS) was proposed, achieving state-of-the-art results on a diverse range of datasets. Although this algorithm has shown considerable improvement in recovering target expressions compared to previous methods, the lack of guidance during the MCTS process severely hampers its search efficiency. Recently, some algorithms have added a pre-trained policy network to guide the search of MCTS, but the pre-trained policy network generalizes poorly. To optimize the trade-off between efficiency and versatility, we introduce SR-GPT, a novel algorithm for symbolic regression that integrates Monte Carlo Tree Search (MCTS) with a Generative Pre-Trained Transformer (GPT). By using GPT to guide the MCTS, the search efficiency of MCTS is significantly improved. Next, we utilize the MCTS results to further refine the GPT, enhancing its capabilities and providing more accurate guidance for the MCTS. MCTS and GPT are coupled together and optimize each other until the target expression is successfully determined. We conducted extensive evaluations of SR-GPT using 222 expressions sourced from over 10 different symbolic regression datasets. The experimental results demonstrate that SR-GPT outperforms existing state-of-the-art algorithms in accurately recovering symbolic expressions both with and without added noise. </p>
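To make the guidance mechanism concrete, here is a minimal sketch of policy-guided MCTS selection in the spirit of SR-GPT; the PUCT rule and the `Node` interface are standard illustrative assumptions, with the priors standing in for the GPT policy's output:

```python
import math

class Node:
    def __init__(self, priors):
        self.priors = priors      # action -> prior probability from the guiding model
        self.children = {}        # action -> Node
        self.visits = 0
        self.value_sum = 0.0

def puct_select(node, c_puct=1.5):
    """Pick the child maximizing Q + c * prior * sqrt(total_visits) / (1 + visits)."""
    total = sum(ch.visits for ch in node.children.values())
    best, best_score = None, -math.inf
    for action, child in node.children.items():
        q = child.value_sum / child.visits if child.visits else 0.0
        u = c_puct * node.priors[action] * math.sqrt(total + 1) / (1 + child.visits)
        if q + u > best_score:
            best, best_score = action, q + u
    return best

root = Node({"x": 0.6, "+": 0.4})
root.children = {a: Node({}) for a in root.priors}
print(puct_select(root))  # with no visits yet, the higher-prior action "x" wins
```

A stronger prior from the guiding network biases exploration toward promising expression tokens, which is how policy guidance speeds up the search.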
back
<p>People employ expressive behaviors to effectively communicate and coordinate their actions with others, such as nodding to acknowledge a person glancing at them or saying "excuse me" to pass people in a busy corridor. We would like robots to also demonstrate expressive behaviors in human-robot interaction. Prior work proposes rule-based methods that struggle to scale to new communication modalities or social situations, while data-driven methods require specialized datasets for each social situation the robot is used in. We propose to leverage the rich social context available in large language models (LLMs), and their ability to generate motion from instructions or user preferences, to produce expressive robot motion that is adaptable and composable, with behaviors that build upon one another. Our approach utilizes few-shot chain-of-thought prompting to translate human language instructions into parametrized control code using the robot's available and learned skills. Through user studies and simulation experiments, we demonstrate that our approach produces behaviors that users found to be competent and easy to understand. Supplementary material can be found at https://generative-expressive-motion.github.io/. </p>
back
<p>This work delves into the expanding role of large language models (LLMs) in generating artificial data. LLMs are increasingly employed to create a variety of outputs, including annotations, preferences, instruction prompts, simulated dialogues, and free text. As these forms of LLM-generated data often intersect in their application, they exert mutual influence on each other and raise significant concerns about the quality and diversity of the artificial data incorporated into training cycles, leading to an artificial data ecosystem. To the best of our knowledge, this is the first study to aggregate various types of LLM-generated text data, from more tightly constrained data like "task labels" to more lightly constrained "free-form text". We then stress test the quality and implications of LLM-generated artificial data, comparing it with human data across various existing benchmarks. Despite artificial data's capability to match human performance, this paper reveals significant hidden disparities, especially in complex tasks where LLMs often miss the nuanced understanding of intrinsic human-generated content. This study critically examines diverse LLM-generated data and emphasizes the need for ethical practices in data creation and when using LLMs. It highlights the LLMs' shortcomings in replicating human traits and behaviors, underscoring the importance of addressing biases and artifacts produced in LLM-generated content for future research and development. All data and code are available on our project page. </p>
back
<p>In high-dimensional data analysis, such as financial index tracking or biomedical applications, it is crucial to select the few relevant variables while maintaining control over the false discovery rate (FDR). In these applications, strong dependencies often exist among the variables (e.g., stock returns), which can undermine the FDR control property of existing methods like the model-X knockoff method or the T-Rex selector. To address this issue, we have expanded the T-Rex framework to accommodate overlapping groups of highly correlated variables. This is achieved by integrating a nearest neighbors penalization mechanism into the framework, which provably controls the FDR at the user-defined target level. A real-world example of sparse index tracking demonstrates the proposed method's ability to accurately track the S&P 500 index over the past 20 years based on a small number of stocks. An open-source implementation is provided within the R package TRexSelector on CRAN. </p>
back
<p>In recent years, deep learning-based solutions have proven successful in the domain of image enhancement. This paper introduces LYT-Net, or Lightweight YUV Transformer-based Network, as a novel approach for low-light image enhancement. The proposed architecture, distinct from conventional Retinex-based models, leverages the YUV color space's natural separation of luminance (Y) and chrominance (U and V) to simplify the intricate task of disentangling light and color information in images. By utilizing the strengths of transformers, known for their capability to capture long-range dependencies, LYT-Net ensures a comprehensive contextual understanding of the image while maintaining reduced model complexity. By employing a novel hybrid loss function, our proposed method achieves state-of-the-art results on low-light image enhancement datasets, all while being considerably more compact than its counterparts. The source code and pre-trained models are available at https://github.com/albrateanu/LYT-Net </p>
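For readers unfamiliar with the color transform the model builds on, a minimal sketch of the YUV separation (using standard BT.601 weights; the transformer blocks themselves are not reproduced here):

```python
import numpy as np

def rgb_to_yuv(rgb):  # rgb: (..., 3) floats in [0, 1]
    m = np.array([[ 0.299,  0.587,  0.114],   # Y: luminance
                  [-0.147, -0.289,  0.436],   # U: blue-difference chroma
                  [ 0.615, -0.515, -0.100]])  # V: red-difference chroma
    return rgb @ m.T

img = np.random.rand(4, 4, 3)
yuv = rgb_to_yuv(img)
print(yuv[..., 0].shape)  # the Y channel isolates the light information to enhance
```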
back
<p>Automating visual inspection to capture defects in the appearance of civil structures is crucial, as manual inspection is labour-intensive and time-consuming. An important aspect of automated inspection is image acquisition, which is rapid and cost-effective considering the pervasive developments in both software and hardware computing in recent years. Previous studies largely focused on concrete and asphalt, with less attention to masonry cracks, which also lack publicly available datasets. In this paper, we first present a corresponding dataset for instance segmentation with 1,300 annotated images (640 x 640 pixels), named MCrack1300, covering bricks, broken bricks, and cracks. We then test several leading algorithms for benchmarking, including the latest large-scale model, the prompt-based Segment Anything Model (SAM). We fine-tune the encoder using Low-Rank Adaptation (LoRA) and propose two novel methods for automating SAM execution. The first method abandons the prompt encoder and connects the SAM encoder to other decoders, while the second introduces a learnable self-generating prompter. To ensure the seamless integration of the two proposed methods with the SAM encoder, we redesign the feature extractor. Both proposed methods exceed state-of-the-art performance, surpassing the best benchmark by approximately 3% for all classes and around 6% for cracks specifically. Based on successful detection, we propose a method based on a monocular camera and the Hough Line Transform to automatically transform images into orthographic projection maps. By incorporating known real sizes of brick units, we accurately estimate crack dimensions, with results differing by less than 10% from those obtained by laser scanning. Overall, we address important research gaps in automated masonry crack detection and size estimation. </p>
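A minimal OpenCV sketch of the rectification step, under stated assumptions: the input is a synthetic stand-in image, and the four corner points are placeholders (the paper's corner recovery from Hough line intersections is elided):

```python
import cv2
import numpy as np

# Synthetic stand-in for a wall photo: a bright quadrilateral on black.
img = np.zeros((480, 640), np.uint8)
cv2.rectangle(img, (40, 40), (600, 440), 255, 2)

# Detect dominant (mortar) lines with the probabilistic Hough transform.
edges = cv2.Canny(img, 50, 150)
lines = cv2.HoughLinesP(edges, 1, np.pi / 180, threshold=80,
                        minLineLength=100, maxLineGap=10)

# Four corners would come from line intersections; placeholders here.
src = np.float32([[40, 40], [600, 40], [600, 440], [40, 440]])
dst = np.float32([[0, 0], [640, 0], [640, 480], [0, 480]])
M = cv2.getPerspectiveTransform(src, dst)
ortho = cv2.warpPerspective(img, M, (640, 480))
print(0 if lines is None else len(lines), ortho.shape)
# Known brick dimensions then give a pixels-per-mm scale for crack sizing.
```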
back
<p>In this paper, we consider multi-robot localization problems with a focus on cooperative localization and observability analysis of relative pose estimation. For cooperative localization, extra information is available to each robot via the communication network and message passing. If odometry data of a target robot can be transmitted to the ego-robot, then the observability of their relative pose estimation can be achieved by range-only or bearing-only measurements, provided both of their linear velocities are non-zero. If odometry data of a target robot is not directly transmitted but estimated by the ego-robot, then there must be both range and bearing measurements to guarantee the observability of relative pose estimation. In ROS/Gazebo simulations, we consider four different sensing and communication structures in which extended Kalman filtering (EKF) and pose graph optimization (PGO) estimation with different robust loss functions (filtering, and smoothing with different sliding-window batch sizes) are compared in terms of estimation accuracy. In hardware experiments, two Turtlebot3 robots equipped with UWB modules are used for real-world inter-robot relative pose estimation, in which both EKF and PGO are applied and compared. </p>
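As an illustration of the filtering side, below is a minimal range-only EKF measurement update for a relative-position state; the two-dimensional state, Jacobian, and noise value are illustrative assumptions, and the paper's full setup also covers bearing measurements, odometry exchange, and PGO smoothing:

```python
import numpy as np

def ekf_range_update(x, P, z, R=0.05):
    """x: relative position [dx, dy]; P: covariance; z: measured UWB range."""
    r = np.linalg.norm(x)
    H = (x / r).reshape(1, 2)            # Jacobian of h(x) = ||x||
    S = H @ P @ H.T + R                  # innovation covariance (1x1)
    K = P @ H.T / S                      # Kalman gain
    x = x + (K * (z - r)).ravel()        # correct the state with the range residual
    P = (np.eye(2) - K @ H) @ P
    return x, P

x, P = np.array([2.0, 1.0]), np.eye(2) * 0.5
x, P = ekf_range_update(x, P, z=2.3)
print(x)
```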
back
<p>Large language models (LLMs) have exhibited an array of reasoning capabilities but face challenges like error propagation and hallucination, particularly in specialised areas like finance, where data is heterogeneous, and precision is paramount. We explore the potential of language model augmentation with external tools to mitigate these limitations and offload certain reasoning steps to external tools that are more suited for the task, instead of solely depending on the LLM's inherent abilities. More concretely, using financial domain question-answering datasets, we apply supervised fine-tuning on a LLaMA-2 13B Chat model to act both as a 'task router' and 'task solver'. The 'task router' dynamically directs a question to either be answered internally by the LLM or externally via the right tool from the tool set. Our tool-equipped SFT model, Raven, demonstrates an improvement of 35.2% and 5.06% over the base model and SFT-only baselines, respectively, and is highly competitive with strong GPT-3.5 results. To the best of our knowledge, our work is the first that investigates tool augmentation of language models for the finance domain. </p>
back
<p>Learning about and understanding religions is challenging because of the complexity and depth of religious doctrines and teachings. Chatbots, as question-answering systems, can help address these challenges. LLM chatbots use NLP techniques to establish connections between topics and accurately respond to complex questions, which makes them well suited as question-answering chatbots for enlightenment on religion. However, LLMs also have a tendency to generate false information, known as hallucination, and chatbot responses can include content that insults personal religious beliefs, inflames interfaith conflicts, or touches on controversial or sensitive topics. Such cases must be avoided without promoting hate speech or offending certain groups of people or their beliefs. This study uses a vector-database-based Retrieval Augmented Generation (RAG) approach to enhance the accuracy and transparency of LLMs. Our question-answering system is called "MufassirQAS". We created a vector database from several open-access books with Turkish content, namely Turkish translations of, and interpretations of, Islamic sources. We carefully crafted system prompts that instruct the model to avoid harmful, offensive, or disrespectful responses. We also tested MufassirQAS and ChatGPT with sensitive questions, and our system achieved better performance. The study and enhancements are still in progress; results and future work are presented. </p>
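A minimal sketch of the retrieve-then-generate step such a system performs; the embedding function, corpus, and `ask_llm` below are placeholders, since the abstract does not pin down the exact stack:

```python
import numpy as np

def cosine_top_k(query_vec, doc_vecs, k=3):
    sims = doc_vecs @ query_vec / (
        np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(query_vec) + 1e-9)
    return np.argsort(-sims)[:k]

def answer(question, embed, passages, doc_vecs, ask_llm):
    idx = cosine_top_k(embed(question), doc_vecs)
    context = "\n".join(passages[i] for i in idx)
    # Constrain the model to the retrieved sources to curb hallucination.
    prompt = ("Answer only from the sources below, citing them.\n"
              f"Sources:\n{context}\nQuestion: {question}")
    return ask_llm(prompt)

passages = ["Passage on interpretation A.", "Passage on interpretation B."]
doc_vecs = np.random.randn(2, 64)
print(answer("What do the sources say?", lambda q: np.random.randn(64),
             passages, doc_vecs, ask_llm=lambda p: p[:80]))
```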
back
<p>Diffusion planning has been recognized as an effective decision-making paradigm in various domains. Its capability for high-quality conditional generation of long-horizon trajectories makes it a promising research direction. However, existing diffusion planning methods suffer from low decision-making frequencies because of the expensive iterative sampling cost. To address this issue, we introduce DiffuserLite, a fast and lightweight diffusion planning framework. DiffuserLite employs a planning refinement process (PRP) to generate coarse-to-fine-grained trajectories, significantly reducing the modeling of redundant information and leading to notable increases in decision-making frequency. Our experimental results demonstrate that DiffuserLite needs only $0.88\%$ of the runtime cost compared to previous frameworks, achieves an average decision-making frequency of $122$Hz, and reaches state-of-the-art performance on D4RL benchmarks. In addition, our clean DiffuserLite framework can serve as a flexible plugin to enhance decision frequency in other diffusion planning algorithms, providing a structural design reference for future works. More details and visualizations are available at [project website](https://diffuserlite.github.io/). </p>
back
<p>The synthesis of 3D facial animations from speech has garnered considerable attention. Due to the scarcity of high-quality 4D facial data and abundant, well-annotated multi-modality labels, previous methods often suffer from limited realism and a lack of flexible conditioning. We address this challenge through a trilogy. We first introduce Generalized Neural Parametric Facial Asset (GNPFA), an efficient variational auto-encoder mapping facial geometry and images to a highly generalized expression latent space, decoupling expressions and identities. Then, we utilize GNPFA to extract high-quality expressions and accurate head poses from a large array of videos. This presents the M2F-D dataset, a large, diverse, and scan-level co-speech 3D facial animation dataset with well-annotated emotional and style labels. Finally, we propose Media2Face, a diffusion model in GNPFA latent space for co-speech facial animation generation, accepting rich multi-modality guidances from audio, text, and image. Extensive experiments demonstrate that our model not only achieves high fidelity in facial animation synthesis but also broadens the scope of expressiveness and style adaptability in 3D facial animation. </p>
back
<p>Despite significant advancements in text-to-image models for generating high-quality images, these methods still struggle to ensure the controllability of text prompts over images in the context of complex text prompts, especially when it comes to retaining object attributes and relationships. In this paper, we propose CompAgent, a training-free approach for compositional text-to-image generation, with a large language model (LLM) agent as its core. The fundamental idea underlying CompAgent is premised on a divide-and-conquer methodology. Given a complex text prompt containing multiple concepts including objects, attributes, and relationships, the LLM agent initially decomposes it, which entails the extraction of individual objects, their associated attributes, and the prediction of a coherent scene layout. These individual objects can then be independently conquered. Subsequently, the agent performs reasoning by analyzing the text, plans and employs the tools to compose these isolated objects. The verification and human feedback mechanism is finally incorporated into our agent to further correct the potential attribute errors and refine the generated images. Guided by the LLM agent, we propose a tuning-free multi-concept customization model and a layout-to-image generation model as the tools for concept composition, and a local image editing method as the tool to interact with the agent for verification. The scene layout controls the image generation process among these tools to prevent confusion among multiple objects. Extensive experiments demonstrate the superiority of our approach for compositional text-to-image generation: CompAgent achieves more than 10\% improvement on T2I-CompBench, a comprehensive benchmark for open-world compositional T2I generation. The extension to various related tasks also illustrates the flexibility of our CompAgent for potential applications. </p>
back
<p>Solving feedback Stackelberg games with nonlinear dynamics and coupled constraints, a common scenario in practice, presents significant challenges. This work introduces an efficient method for computing local feedback Stackelberg policies in multi-player general-sum dynamic games, with continuous state and action spaces. Different from existing (approximate) dynamic programming solutions that are primarily designed for unconstrained problems, our approach involves reformulating a feedback Stackelberg dynamic game into a sequence of nested optimization problems, enabling the derivation of Karush-Kuhn-Tucker (KKT) conditions and the establishment of a second-order sufficient condition for local feedback Stackelberg policies. We propose a Newton-style primal-dual interior point method for solving constrained linear quadratic (LQ) feedback Stackelberg games, offering provable convergence guarantees. Our method is further extended to compute local feedback Stackelberg policies for more general nonlinear games by iteratively approximating them using LQ games, ensuring that their KKT conditions are locally aligned with those of the original nonlinear games. We prove the exponential convergence of our algorithm in constrained nonlinear games. In a feedback Stackelberg game with nonlinear dynamics and (nonconvex) coupled costs and constraints, our experimental results reveal the algorithm's ability to handle infeasible initial conditions and achieve exponential convergence towards an approximate local feedback Stackelberg equilibrium. </p>
back
<p>Training machine learning and statistical models often involves optimizing a data-driven risk criterion. The risk is usually computed with respect to the empirical data distribution, but this may result in poor and unstable out-of-sample performance due to distributional uncertainty. In the spirit of distributionally robust optimization, we propose a novel robust criterion by combining insights from Bayesian nonparametric (i.e., Dirichlet Process) theory and recent decision-theoretic models of smooth ambiguity-averse preferences. First, we highlight novel connections with standard regularized empirical risk minimization techniques, including Ridge and LASSO regression. Then, we theoretically demonstrate the existence of favorable finite-sample and asymptotic statistical guarantees on the performance of the robust optimization procedure. For practical implementation, we propose and study tractable approximations of the criterion based on well-known Dirichlet Process representations. We also show that the smoothness of the criterion naturally leads to standard gradient-based numerical optimization. Finally, we provide insights into the workings of our method by applying it to high-dimensional sparse linear regression and robust location parameter estimation tasks. </p>
back
<p>Conducting real road testing for autonomous driving algorithms can be expensive and sometimes impractical, particularly for small startups and research institutes. Thus, simulation becomes an important method for evaluating these algorithms. However, the availability of free and open-source simulators is limited, and the installation and configuration process can be daunting for beginners and interdisciplinary researchers. We introduce an autonomous driving simulator with photorealistic scenes while keeping a user-friendly workflow. The simulator is able to communicate with external algorithms through ROS2 or Socket.IO, making it compatible with existing software stacks. Furthermore, we implement a highly accurate vehicle dynamics model within the simulator to enhance the realism of the vehicle's physical effects. The simulator is able to serve various functions, including generating synthetic data and driving with machine learning-based algorithms. Moreover, we prioritize simplicity in the deployment process, ensuring that beginners find it approachable and user-friendly. </p>
back
<p>PageRank is a widely used centrality measure that assesses the significance of vertices in a graph by considering their connections and the importance of those connections. Efficiently updating PageRank on dynamic graphs is essential for various applications due to the increasing scale of datasets. This technical report introduces our improved Dynamic Frontier (DF) and Dynamic Frontier with Pruning (DF-P) approaches. Given a batch update comprising edge insertions and deletions, these approaches iteratively identify vertices likely to change their ranks with minimal overhead. On a server featuring a 64-core AMD EPYC-7742 processor, our approaches outperform Static and Dynamic Traversal PageRank by 5.2x/15.2x and 1.3x/3.5x, respectively, on real-world dynamic graphs, and by 7.2x/9.6x and 4.0x/5.6x on large static graphs with random batch updates. Furthermore, our approaches improve performance at a rate of 1.8x/1.7x for every doubling of threads. </p>
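A rough sketch of the frontier idea (a simplification for intuition, not the paper's exact DF/DF-P algorithms): after a batch update, start from the endpoints of changed edges and only recompute ranks that can actually move:

```python
def frontier_pagerank(graph, ranks, changed_edges, d=0.85, tol=1e-10):
    """graph: {u: set of out-neighbors}, assumed to have no dangling vertices;
    ranks: PageRank values from the previous snapshot."""
    n = len(graph)
    rev = {u: set() for u in graph}                   # in-neighbors
    for u, outs in graph.items():
        for v in outs:
            rev[v].add(u)
    frontier = {v for e in changed_edges for v in e}  # seeds: endpoints of updates
    while frontier:
        nxt = set()
        for u in frontier:
            new = (1 - d) / n + d * sum(ranks[v] / len(graph[v]) for v in rev[u])
            if abs(new - ranks[u]) > tol:
                ranks[u] = new
                nxt |= graph[u]                       # a changed rank can move successors
        frontier = nxt
    return ranks

g = {0: {1}, 1: {2}, 2: {0}}
r = {u: 1 / 3 for u in g}
print(frontier_pagerank(g, r, changed_edges=[(0, 1)]))
```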
back
<p>In the field of causal modeling, potential outcomes (PO) and structural causal models (SCMs) stand as the predominant frameworks. However, these frameworks face notable challenges in practically modeling counterfactuals, formalized as parameters of the joint distribution of potential outcomes. Counterfactual reasoning holds paramount importance in contemporary decision-making processes, especially in scenarios that demand personalized incentives based on the joint values of $(Y(0), Y(1))$. This paper begins with an investigation of the PO and SCM frameworks for modeling counterfactuals. Through the analysis, we identify an inherent model capacity limitation, termed the ``degenerative counterfactual problem'', emerging from the consistency rule that is the cornerstone of both frameworks. To address this limitation, we introduce a novel \textit{distribution-consistency} assumption, and in alignment with it, we propose the Distribution-consistency Structural Causal Models (DiscoSCMs) offering enhanced capabilities to model counterfactuals. To concretely reveal the enhanced model capacity, we introduce a new identifiable causal parameter, \textit{the probability of consistency}, which holds practical significance within DiscoSCM alone, showcased with a personalized incentive example. Furthermore, we provide a comprehensive set of theoretical results about the ``Ladder of Causation'' within the DiscoSCM framework. We hope it opens new avenues for future research on counterfactual modeling, ultimately enhancing our understanding of causality and its real-world applications. </p>
back
<p>Recently, DNA storage has surfaced as a promising alternative for data storage, presenting notable benefits in terms of storage capacity, cost-effectiveness in maintenance, and the capability for parallel replication. Mathematically, the DNA storage process can be conceptualized as an insertion, deletion, and substitution (IDS) channel. Due to the mathematical complexity associated with the Levenshtein distance, creating a code that corrects IDS errors remains a challenging task. In this paper, we propose a bottom-up generation approach that grows the required codebook based on the computation of an Edit Computational Graph (ECG), which differs from algebraic constructions by incorporating a Derivative-Free Optimization (DFO) method. Notably, this approach is agnostic to the type of errors. Compared with prior work on 1-substitution-1-deletion and 2-deletion codes, the redundancy is reduced by about 30 bits and 60 bits, respectively. To the best of our knowledge, our method is the first IDS-correcting code designed using classical Natural Language Processing (NLP) techniques, marking a turning point in error-correcting code research. Based on the codebooks generated by our method, there may be significant breakthroughs in the complexity of encoding and decoding algorithms. </p>
back
<p>This study investigates self-supervised learning techniques to obtain representations of event sequences, a key modality in various applications including banking, e-commerce, and healthcare. </p> <p>We perform a comprehensive study of generative and contrastive approaches in self-supervised learning, applying each independently. We find that no single method is uniformly superior. Consequently, we explore the potential benefits of combining these approaches. To achieve this goal, we introduce a novel method that aligns generative and contrastive embeddings as distinct modalities, drawing inspiration from contemporary multimodal research. </p> <p>Generative and contrastive approaches are often treated as mutually exclusive, leaving a gap for their combined exploration. Our results demonstrate that this aligned model performs at least on par with, and mostly surpasses, existing methods and is more universal across a variety of tasks. Furthermore, we demonstrate that self-supervised methods consistently outperform the supervised approach on our datasets. </p>
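A minimal sketch of the alignment idea, treating the generative and contrastive embeddings of the same sequence as two "modalities" pulled together by a symmetric InfoNCE loss, as in multimodal pretraining; the shapes and temperature are illustrative assumptions:

```python
import numpy as np

def info_nce(a, b, temperature=0.07):
    """Symmetric InfoNCE: matched rows of a and b are positives."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    logits = a @ b.T / temperature               # (batch, batch) similarities

    def xent(l):                                 # cross-entropy, diagonal targets
        log_p = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -np.diag(log_p).mean()

    return (xent(logits) + xent(logits.T)) / 2

gen = np.random.randn(8, 32)   # generative embeddings of 8 event sequences
con = np.random.randn(8, 32)   # contrastive embeddings of the same sequences
print(info_nce(gen, con))
```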
back
<p>Guessing random additive noise decoding (GRAND) is a recently proposed decoding paradigm particularly suitable for codes with short length and high rate. Among its variants, ordered reliability bits GRAND (ORBGRAND) exploits soft information in a simple and effective fashion to schedule its queries, thereby allowing efficient hardware implementation. Compared with maximum likelihood (ML) decoding, however, ORBGRAND still exhibits a noticeable performance gap in terms of block error rate (BLER). In order to improve the performance of ORBGRAND while still retaining its amenability to hardware implementation, a new variant of ORBGRAND termed RS-ORBGRAND is proposed, whose basic idea is to reshuffle the queries of ORBGRAND so that the expected number of queries is minimized. Numerical simulations show that RS-ORBGRAND leads to noticeable gains compared with ORBGRAND and its existing variants, and is only 0.1 dB away from ML decoding, for BLER as low as $10^{-6}$. </p>
back
<p>We consider channels with synchronization errors modeled as insertions and deletions. A classical result is that such channels are information stable, and hence their Shannon capacity exists, when the synchronization errors are memoryless. In this paper, we extend this result to the case where the insertions and deletions have memory. Specifically, we assume that the synchronization errors are governed by a stationary and ergodic finite-state Markov chain, and prove that the mutual information capacity of such channels exists and is equal to the coding capacity, showing that there exists a coding scheme which achieves this limit. </p>
back
<p>The share of online video traffic in global carbon dioxide emissions is growing steadily. To meet the demand for video media, dedicated compression techniques are continuously optimized, but at the expense of increasingly higher computational demands and thus rising energy consumption at the video encoder side. In order to find the best trade-off between compression and energy consumption, modeling encoding energy for a wide range of encoding parameters is crucial. We propose an encoding time and energy model for SVT-AV1 based on empirical relations between the encoding time and video parameters as well as encoder configurations. Furthermore, we model the influence of video content by established content descriptors such as spatial and temporal information. We then use the predicted encoding time to estimate the required energy demand and achieve a prediction error of 19.6% for encoding time and 20.9% for encoding energy. </p>
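A minimal sketch of the modeling recipe, under the assumption of synthetic features and an illustrative average-power figure: fit encoding time from encoder parameters and content descriptors by least squares, then convert predicted time to energy:

```python
import numpy as np

rng = np.random.default_rng(0)
# columns: log(resolution), preset index, spatial info, temporal info (illustrative)
X = rng.random((50, 4))
t_true = X @ np.array([3.0, -1.2, 0.8, 0.5]) + 0.1 * rng.standard_normal(50)

X1 = np.hstack([X, np.ones((50, 1))])           # add intercept
coef, *_ = np.linalg.lstsq(X1, t_true, rcond=None)

def predict_energy(features, avg_power_watts=40.0):
    t_pred = np.append(features, 1.0) @ coef    # predicted encoding time [s]
    return avg_power_watts * t_pred             # energy [J] = P * t

print(predict_energy(X[0]))
```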
back
<p>Instruction finetuning on a variety of image-text instruction data is the key to obtaining a versatile Multimodal Large Language Model (MLLM), and different configurations of the instruction data can lead to finetuned models with different capabilities. However, we have discovered that data conflicts are inevitable when mixing instruction data from distinct domains, which can result in performance drops for tasks of a specific domain. To address this issue, we propose to apply an efficient Mixture of Experts (MoE) design, which is a sparse Mixture of LoRA Experts (MoLE) for instruction finetuning MLLMs. Within the Transformer layers, we extend the popular Low-Rank Adaptation (LoRA) method by creating a set of LoRA experts specifically for the MLP layer, and route each token to the top-1 expert based on a routing function, allowing adaptive choices for tokens from different domains. Since the LoRA experts are sparsely activated, the training and inference costs are kept roughly constant compared to the original LoRA method. By replacing the plain-LoRA of LLaVA-1.5 with our MoE design, our final model is named LLaVA-MoLE. Extensive experiments show that LLaVA-MoLE effectively mitigates the data conflict issue when mixing multiple distinct instruction datasets with various configurations, and achieves consistent performance gains over the strong plain-LoRA baselines. Most importantly, on the mixed datasets, LLaVA-MoLE can even outperform the plain-LoRA baseline trained with twice the samples. </p>
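A minimal numpy sketch of top-1 LoRA-expert routing; the dimensions, router, and expert count are illustrative, and the real model applies this inside the language model's MLP blocks:

```python
import numpy as np

d, r, n_experts = 16, 4, 3
W = np.random.randn(d, d) * 0.02                              # frozen base weight
A = [np.random.randn(d, r) * 0.02 for _ in range(n_experts)]  # LoRA down-proj
B = [np.zeros((r, d)) for _ in range(n_experts)]              # LoRA up-proj (zero init, as in LoRA)
W_router = np.random.randn(d, n_experts) * 0.02

def mole_layer(x):
    """x: (tokens, d). Each token gets the base output plus its top-1 LoRA expert."""
    expert = (x @ W_router).argmax(axis=1)       # top-1 routing per token
    out = x @ W
    for e in range(n_experts):                   # only the chosen expert fires per token
        mask = expert == e
        out[mask] += x[mask] @ A[e] @ B[e]
    return out

print(mole_layer(np.random.randn(5, d)).shape)   # (5, 16)
```

Because only one expert is applied per token, the extra compute stays roughly constant regardless of the number of experts, which is the sparsity argument made above.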
back
<p>This paper discusses the error and cost aspects of ill-posed integral equations when given discrete noisy point evaluations on a fine grid. Standard solution methods usually employ discretization schemes that are directly induced by the measurement points. Thus, they may scale unfavorably with the number of evaluation points, which can result in computational inefficiency. To address this issue, we propose an algorithm that achieves the same level of accuracy while significantly reducing computational costs. Our approach involves an initial averaging procedure to sparsify the underlying grid. To keep the exposition simple, we focus only on one-dimensional ill-posed integral equations that have sufficient smoothness. However, the approach can be generalized to more complicated two- and three-dimensional problems with appropriate modifications. </p>
back
<p>Federated learning enhanced by differential privacy has emerged as a popular approach to better safeguard the privacy of client-side data by protecting clients' contributions during the training process. Existing solutions typically assume a uniform privacy budget for all records and provide one-size-fits-all solutions that may not be adequate to meet each record's privacy requirement. In this paper, we explore the uncharted territory of cross-silo FL with record-level personalized differential privacy. We devise a novel framework named rPDP-FL, employing a two-stage hybrid sampling scheme with both client-level sampling and non-uniform record-level sampling to accommodate varying privacy requirements. A critical and non-trivial problem is to select the ideal per-record sampling probability $q$ given the personalized privacy budget $\epsilon$. We introduce a versatile solution named Simulation-CurveFitting, allowing us to uncover a significant insight into the nonlinear correlation between $q$ and $\epsilon$ and derive an elegant mathematical model to tackle the problem. Our evaluation demonstrates that our solution can provide significant performance gains over the baselines that do not consider personalized privacy preservation. </p>
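A minimal sketch of the two-stage hybrid sampling scheme; the mapping from per-record budgets $\epsilon_i$ to probabilities $q_i$ (the Simulation-CurveFitting step) is only stubbed here as given per-record values:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_round(clients, client_rate=0.5):
    chosen = [c for c in clients if rng.random() < client_rate]  # stage 1: clients
    batches = {}
    for c in chosen:                                             # stage 2: records
        q = c["q"]                     # per-record probabilities, derived from eps_i
        mask = rng.random(len(q)) < q  # Poisson-style inclusion per record
        batches[c["id"]] = np.flatnonzero(mask)
    return batches

clients = [{"id": i, "q": rng.uniform(0.1, 0.9, size=20)} for i in range(4)]
print(sample_round(clients))
```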
back
<p>Process modeling and discovery techniques aim to construct sound and valid process models for different types of processes, i.e., process orchestrations and collaboration processes. Orchestrations represent behavior of cases within one process. Collaboration processes represent behavior of collaborating cases within multiple process orchestrations that interact via collaboration concepts such as organizations, agents, objects, and services. The heterogeneity of collaboration concepts and types such as message exchange and resource sharing has led to different representations and discovery techniques for collaboration process models, but a standard model class is lacking. We propose collaboration Petri nets (cPN) to achieve comparability between techniques, to enable approach and property transfer, and to build a standardized collaboration mining pipeline similar to process mining. For cPN, we require desirable modeling power, decision power, modeling convenience, and relations to existing model classes. We show the representation of collaboration types, structural characterization as workflow nets, automatic verification of soundness, bisimulation equivalence to existing model classes, and application in a general discovery framework. As empirical evidence for the discoverability of cPN, we conduct a comparative evaluation of three discovery techniques on a set of existing collaboration event logs. </p>
back
<p>This paper delves into the degradability of quantum channels, with a specific focus on high-dimensional extensions of qubit depolarizing channels in low-noise regimes. We build upon the foundation of $\eta$-approximate degradable channels, as established by Sutter et al. and Leditzky et al., to introduce and examine the Modified Landau-Streater (MLS) channels. These channels expand upon the qubit depolarizing and the recently proposed modified Werner-Holevo channels by Roofeh and Karimipour, extending them to higher-dimensional Hilbert spaces (with dimension $d=2j+1$, where $j$ are positive half-integers). Our investigation centers on their conformity to the $O(\varepsilon^2)$ degradability pattern, aligning with and extending Leditzky et al.'s findings in the $d=2$ case. By replacing the SU($2$) generators with SU($d$) in our treatment, we may explore the potential inclusion of generalized Gell-Mann matrices in future research. Our results enhance the understanding of super-additivity in quantum channels within the low-noise regime and lay the groundwork for future explorations into conditions and structures that could lead to $O(\varepsilon^2)$ degradability across a broader spectrum of quantum channels. </p>
back
<p>Simulating realistic time-domain observations of gravitational waves (GWs) and GW detector glitches can help in advancing GW data analysis. Simulated data can be used in downstream tasks by augmenting datasets for signal searches, balancing data sets for machine learning, and validating detection schemes. In this work, we present Conditional Derivative GAN (cDVGAN), a novel conditional model in the Generative Adversarial Network framework for simulating multiple classes of time-domain observations that represent gravitational waves (GWs) and detector glitches. cDVGAN can also generate generalized hybrid samples that span the variation between classes through interpolation in the conditioned class vector. cDVGAN introduces an additional player into the typical 2-player adversarial game of GANs, where an auxiliary discriminator analyzes the first-order derivative time-series. Our results show that this provides synthetic data that better captures the features of the original data. cDVGAN conditions on three classes, two denoised from LIGO blip and tomte glitch events from its 3rd observing run (O3), and the third representing binary black hole (BBH) mergers. Our proposed cDVGAN outperforms 4 different baseline GAN models in replicating the features of the three classes. Specifically, our experiments show that training convolutional neural networks (CNNs) with our cDVGAN-generated data improves the detection of samples embedded in detector noise beyond the synthetic data from other state-of-the-art GAN models. Our best synthetic dataset yields as much as a 4.2% increase in area-under-the-curve (AUC) performance compared to synthetic datasets from baseline GANs. Moreover, training the CNN with hybrid samples from our cDVGAN outperforms CNNs trained only on the standard classes, when identifying real samples embedded in LIGO detector background (4% AUC improvement for cDVGAN). </p>
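A minimal sketch of the input to the extra "derivative player": the auxiliary discriminator sees the first-order difference of each time series, which emphasizes the high-frequency structure a sample-space discriminator can miss; the shapes are illustrative:

```python
import numpy as np

def first_order_channel(batch):
    """batch: (n_samples, length) time series -> their discrete derivatives."""
    return np.diff(batch, axis=1)

waveforms = np.random.randn(4, 1024)        # stand-ins for glitch/GW samples
d_aux_input = first_order_channel(waveforms)
print(d_aux_input.shape)                    # (4, 1023), fed to the auxiliary critic
```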
back
<p>In this work we present FreDSNet, a deep learning solution which obtains semantic 3D understanding of indoor environments from single panoramas. Omnidirectional images reveal task-specific advantages when addressing scene understanding problems due to the 360-degree contextual information they provide about the entire environment. However, the inherent characteristics of omnidirectional images add further difficulty to obtaining an accurate detection and segmentation of objects or a good depth estimation. To overcome these problems, we exploit convolutions in the frequency domain, obtaining a wider receptive field in each convolutional layer. These convolutions allow us to leverage the whole context information from omnidirectional images. FreDSNet is the first network that jointly provides monocular depth estimation and semantic segmentation from a single panoramic image exploiting fast Fourier convolutions. Our experiments show that FreDSNet performs comparably to specialized state-of-the-art methods for semantic segmentation and depth estimation. The FreDSNet code is publicly available at https://github.com/Sbrunoberenguel/FreDSNet </p>
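A minimal sketch of why Fourier-domain convolution yields a global receptive field (the principle behind fast Fourier convolutions, not FreDSNet's actual layers): pointwise multiplication of spectra mixes information from the whole image at once:

```python
import numpy as np

def fourier_conv(x, w_hat):
    """x: (H, W) feature map; w_hat: (H, W) learnable spectral weights."""
    x_hat = np.fft.rfft2(x)
    y_hat = x_hat * w_hat[:, : x_hat.shape[1]]   # pointwise product in frequency
    return np.fft.irfft2(y_hat, s=x.shape)       # back to the spatial domain

x = np.random.randn(32, 64)
w_hat = np.random.randn(32, 64)                  # every output pixel depends on all of x
print(fourier_conv(x, w_hat).shape)
```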
back
<p>Feature attribution methods (FAs) are popular approaches for providing insights into the model reasoning process of making predictions. The more faithful a FA is, the more accurately it reflects which parts of the input are more important for the prediction. Widely used faithfulness metrics, such as sufficiency and comprehensiveness, use a hard erasure criterion, i.e. entirely removing or retaining the most important tokens as ranked by a given FA and observing the changes in predictive likelihood. However, this hard criterion ignores the importance of each individual token, treating them all equally for computing sufficiency and comprehensiveness. In this paper, we propose a simple yet effective soft erasure criterion. Instead of entirely removing or retaining tokens from the input, we randomly mask parts of the token vector representations proportionately to their FA importance. Extensive experiments across various natural language processing tasks and different FAs show that our soft-sufficiency and soft-comprehensiveness metrics consistently prefer more faithful explanations compared to hard sufficiency and comprehensiveness. Our code: https://github.com/casszhao/SoftFaith </p>
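A minimal sketch of the soft erasure criterion, where each embedding dimension is kept with probability proportional to its token's attribution score; the normalization choice here is an assumption:

```python
import numpy as np

def soft_erase(embeddings, importance, rng=np.random.default_rng(0)):
    """embeddings: (tokens, dim); importance: (tokens,) attribution scores."""
    p_keep = importance / (importance.max() + 1e-9)        # scale scores to [0, 1]
    keep = rng.random(embeddings.shape) < p_keep[:, None]  # per-dimension Bernoulli mask
    # Keeping proportionally to importance gives a soft-sufficiency input;
    # inverting the mask would give the soft-comprehensiveness counterpart.
    return embeddings * keep

emb = np.random.randn(6, 8)
scores = np.array([0.9, 0.1, 0.4, 0.7, 0.05, 0.3])
print(soft_erase(emb, scores))
```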
back
<p>Context: Software model optimization is a process that automatically generates design alternatives, typically to enhance quantifiable non-functional properties of software systems, such as performance and reliability. Multi-objective evolutionary algorithms have been shown to be effective in this context for assisting the designer in identifying trade-offs between the desired non-functional properties. Objective: In this work, we investigate the effects of imposing a time budget to limit the search for design alternatives, which inevitably affects the quality of the resulting alternatives. Method: The effects of time budgets are analyzed by investigating both the quality of the generated design alternatives and their structural features when varying the budget and the genetic algorithm (NSGA-II, PESA2, SPEA2). This is achieved by employing multi-objective quality indicators and a tree-based representation of the search space. Results: The study reveals that the time budget significantly affects the quality of Pareto fronts, especially for performance and reliability. NSGA-II is the fastest algorithm, while PESA2 generates the highest-quality solutions. The imposition of a time budget results in structurally distinct models compared to those obtained without a budget, indicating that the search process is influenced by both the budget and algorithm selection. Conclusions: In software model optimization, imposing a time budget can be effective in saving optimization time, but designers should carefully consider the trade-off between time and solution quality in the Pareto front, along with the structural characteristics of the generated models. By making informed choices about the specific genetic algorithm, designers can achieve different trade-offs. </p>